WO2020124400A1 - Multi-branch jump processing device and method, and processor

Info

Publication number
WO2020124400A1
Authority
WO
WIPO (PCT)
Prior art keywords
branch
branch jump
predicate register
jump
instruction
Prior art date
Application number
PCT/CN2018/121901
Other languages
English (en)
Chinese (zh)
Inventor
刘建军
刘国丁
赖晓飞
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201880100044.8A (patent CN113168327B)
Priority to PCT/CN2018/121901 (patent WO2020124400A1)
Publication of WO2020124400A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32: Address formation of the next instruction, e.g. by incrementing the instruction counter

Definitions

  • the present application relates to the technical field of processors, in particular to a multi-branch jump processing device and method, and a processor.
  • A multi-branch jump structure includes at least two branches and a judgment condition corresponding to each branch (also called a branch jump condition). Generally, the processor jumps to the address of a branch and executes it only when the judgment result of the branch jump condition corresponding to that branch is true. That is to say, before executing the multi-branch jump structure, the processor generally has difficulty predicting the branch address to jump to; the multi-branch jump structure is a type of structure that can change the flow of instructions.
  • The key to implementing a multi-branch jump for a multi-branch jump structure is to determine the storage address of the index information of the branch to be executed in the multi-branch jump structure.
  • The processor uses a ternary content addressable memory (TCAM) to determine the storage address of the index information of the branch to be executed in the multi-branch jump structure.
  • The TCAM stores TCAM entries corresponding to each branch in the multi-branch jump structure, and each branch corresponds to at least one TCAM entry. The more complicated the branch jump condition corresponding to a branch, the greater the number of TCAM entries corresponding to that branch.
  • The processor can determine the storage address of the index information of the branch corresponding to the successfully matched TCAM entry, and then the processor or another device executes the corresponding branch according to the determined storage address, so as to realize the multi-branch jump for the multi-branch jump structure.
  • Because the processor must store a large number of TCAM entries in the TCAM in order to determine the storage address of the index information of the branch to be executed in the multi-branch jump structure, this process brings challenges to the storage space of the TCAM.
  • Embodiments of the present application provide a multi-branch jump processing device, method, and processor.
  • an embodiment of the present application provides a multi-branch jump processing device, including: an arithmetic unit, configured to obtain the judgment result of each branch jump condition in a multi-branch jump structure according to the operation relationship of that branch jump condition itself, and to store the judgment result of each branch jump condition in a predicate register, where the judgment result is 0 or 1 and the multi-branch jump structure includes multiple branch jump conditions; and a branch jump unit, configured to determine a target address according to the storage base address of the multi-branch jump structure and the judgment results of all branch jump conditions in the multi-branch jump structure stored in the predicate register, where the target address includes index information of the branch to be executed in the multi-branch jump structure. Since the multi-branch jump processing device of the present application can realize a branch jump without occupying TCAM storage space, it helps reduce the demand for TCAM storage space.
  • the branch jump unit is specifically configured to: find a target bit in the predicate register, where the target bit is the first bit in the predicate register that is set to a preset value; and determine the target address based on the storage base address and the target bit.
  • This implementation mode refines the functions of the branch jump unit and increases the operability of the embodiments of the present application.
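  • The target-bit search described above can be illustrated with a minimal sketch. The application only states that the target address is determined from the storage base address and the target bit; the scan direction (from bit 0 upward), the 16-bit default width, the fixed `entry_size`, and the formula `base + bit * entry_size` are all assumptions made here for illustration, as are the function names.

```python
from typing import Optional


def find_target_bit(predicate: int, width: int = 16, preset: int = 1) -> Optional[int]:
    """Return the index of the first bit equal to `preset`, scanning from bit 0 (assumed order)."""
    for index in range(width):
        if (predicate >> index) & 1 == preset:
            return index
    return None  # no branch jump condition produced the preset value


def determine_target_address(base: int, predicate: int, entry_size: int = 4) -> Optional[int]:
    """Assumed mapping: one fixed-size index entry per branch, laid out upward from `base`."""
    bit = find_target_bit(predicate)
    if bit is None:
        return None
    return base + bit * entry_size
```

  • The `preset` parameter reflects that the preset value may be either 1 or 0, depending on which value is chosen to represent a true judgment result.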
  • the branch jump unit is specifically used to: find multiple valid bits in the predicate register, where the storage bits in the predicate register of the judgment results of all branch jump conditions in the multi-branch jump structure are all valid bits; and find, among the multiple valid bits, the first bit that is set to the preset value as the target bit.
  • the branch jump unit searches for the target bit among the valid bits, which helps improve the accuracy of the multi-branch jump.
  • the branch jump condition includes N relational expressions arranged serially in a determined order, every two adjacent relational expressions are connected by a logical operator, and N is an integer greater than or equal to 2;
  • the branch jump condition also includes N logical expressions: the first relational expression is the first logical expression, and the operation result of the i-th logical expression and the (i+1)-th relational expression are connected by the logical operator located between the i-th relational expression and the (i+1)-th relational expression to obtain the (i+1)-th logical expression, where i is an integer greater than or equal to 1 and less than N;
  • the arithmetic unit is used to sequentially execute the N logical expressions and to store the operation result of the N-th logical expression in the predicate register, where the operation result of the N-th logical expression is the judgment result of the branch jump condition.
  • This implementation mode refines the functions of the arithmetic unit
  • in a fourth possible implementation manner of the first aspect, after determining the target address, the branch jump unit is also used to execute the branch to be executed according to the target address.
  • This implementation manner increases the achievability of the embodiment of the present application by adding the step of executing the branch to be executed according to the target address.
  • an embodiment of the present application provides a multi-branch jump processing method, including: a multi-branch jump processing device obtains the judgment result of each branch jump condition in a multi-branch jump structure according to the operation relationship of that branch jump condition itself, and stores the judgment result of each branch jump condition in a predicate register, where the judgment result is 0 or 1 and the multi-branch jump structure includes multiple branch jump conditions; and the multi-branch jump processing device determines a target address according to the storage base address of the multi-branch jump structure and the judgment results of all branch jump conditions in the multi-branch jump structure stored in the predicate register, where the target address includes index information of the branch to be executed in the multi-branch jump structure. Since the multi-branch jump processing method of the present application can realize a branch jump without occupying TCAM storage space, it helps reduce the demand for TCAM storage space.
  • the multi-branch jump processing device determining the target address according to the storage base address of the multi-branch jump structure and the judgment results of all branch jump conditions in the multi-branch jump structure stored in the predicate register includes: the multi-branch jump processing device searches for a target bit in the predicate register, the target bit being the first bit in the predicate register that is set to a preset value; and the multi-branch jump processing device determines the target address based on the storage base address and the target bit.
  • This implementation mode refines the step of searching for the target bit in the predicate register, which increases the operability of the embodiments of the present application.
  • the multi-branch jump processing device searching for the target bit in the predicate register includes: the multi-branch jump processing device searches for multiple valid bits in the predicate register, where the storage bits in the predicate register of the judgment results of all branch jump conditions in the multi-branch jump structure are all valid bits; and the multi-branch jump processing device searches, among the multiple valid bits, for the first bit that is set to the preset value as the target bit. This implementation finds the target bit among the valid bits, which helps improve the accuracy of the multi-branch jump.
  • the branch jump condition includes N relational expressions arranged serially in a determined order, every two adjacent relational expressions are connected by a logical operator, and N is an integer greater than or equal to 2;
  • the branch jump condition also includes N logical expressions: the first relational expression is the first logical expression, and the operation result of the i-th logical expression and the (i+1)-th relational expression are connected by the logical operator located between the i-th relational expression and the (i+1)-th relational expression to obtain the (i+1)-th logical expression, where i is an integer greater than or equal to 1 and less than N;
  • the multi-branch jump processing device obtaining the judgment result of each branch jump condition in the multi-branch jump structure according to the operation relationship of that branch jump condition itself, and storing the judgment result of each branch jump condition in the predicate register, includes: the multi-branch jump processing device
  • after the multi-branch jump processing device determines the target address, the method further includes: the multi-branch jump processing device executes the branch to be executed according to the target address.
  • This implementation manner increases the achievability of the embodiment of the present application by adding the step of executing the branch to be executed according to the target address.
  • an embodiment of the present application provides a processor, including: an instruction cache, a decoder, a predicate register, and an execution core. The instruction cache is used to cache the instructions corresponding to a multi-branch jump structure, and is coupled to the decoder to provide the instructions to the decoder. The decoder is used to decode the instructions, and is coupled to the execution core to control the execution core to execute the instructions. The predicate register is coupled to the execution core and is used to provide the execution core with read and write operations on the return value of a predicate function. The execution core is used to perform, during execution of the instructions, the method of the second aspect or of any possible implementation manner of the second aspect.
  • an embodiment of the present application provides a multi-branch jump processing system, including a processor, peripheral devices, an external memory, and a power supply. The power supply is used to power the processor, the peripheral devices, and the external memory; the processor is coupled to the external memory and to one or more peripheral devices; the external memory can be used to store programs; and when the processor executes the programs stored in the external memory, it implements the method of the second aspect or of any possible implementation manner of the second aspect.
  • Figure 1 is a schematic diagram of the structure of the computer
  • FIG. 2 is a schematic structural diagram of a computer used to implement multi-branch jump in the prior art
  • FIG. 3 is a schematic diagram of an embodiment of a computer of this application.
  • FIG. 4 is a schematic diagram of an embodiment of a multi-branch jump processing method of the present application.
  • A schematic diagram of a possible detailed step of step S200 in the multi-branch jump processing method of the present application.
  • A schematic diagram of a possible detailed step of step S210 in the multi-branch jump processing method of the present application.
  • A schematic diagram of a possible detailed step of step S100 in the multi-branch jump processing method of the present application.
  • FIG. 8 is a schematic diagram of an embodiment of a multi-branch jump processing system of the present application.
  • Embodiments of the present application provide a multi-branch jump processing device, method, and processor.
  • "At least one of a, b, or c" can represent: a, b, c, ab, ac, bc, or abc, where each of a, b, and c can be single or multiple.
  • A computer generally includes a processor 1, a program memory 2, and a compiler 3. The program memory 2 is used to store programs, the compiler 3 is used to compile the programs stored in the program memory 2 to obtain instructions, and the processor 1 is used to execute the programs stored in the program memory 2 according to the instructions obtained by the compiler 3.
  • the program stored in the program memory 2 of a computer, especially a network computer (such as a router) generally includes a large number of multi-branch jump structures. Therefore, the processor 1 in the computer needs to perform a large number of multi-branch jumps.
  • The processor 1 determines whether the logical value of each branch jump condition in the multi-branch jump structure 1 is true. When the judgment result of a branch jump condition is true, the processor 1 determines the storage address of the index information of the branch corresponding to that branch jump condition, and then executes the branch according to the determined storage address; otherwise, the program continues to execute. In this way, the processor 1 realizes the multi-branch jump for the multi-branch jump structure 1.
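  • As a point of reference, the behaviour of a multi-branch jump structure can be sketched in software as an ordered list of conditions and branches. This is only an illustrative model; the `select_branch` helper, the sample conditions, and the branch labels are hypothetical and do not come from the application.

```python
from typing import Callable, List, Optional, Tuple

# Each entry pairs a branch jump condition with a label standing in for the branch body.
Branch = Tuple[Callable[[int], bool], str]


def select_branch(branches: List[Branch], x: int) -> Optional[str]:
    """Return the label of the first branch whose condition is true, else None."""
    for condition, label in branches:
        if condition(x):
            return label
    return None  # no condition held, so the program simply continues


# Hypothetical three-way structure.
branches: List[Branch] = [
    (lambda x: x < 0, "negative"),
    (lambda x: x == 0, "zero"),
    (lambda x: x > 100, "large"),
]
```

  • For example, `select_branch(branches, 500)` selects the third branch, while an input such as 7 satisfies no condition and falls through.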
  • The key to implementing a multi-branch jump for a multi-branch jump structure is to determine the storage address of the index information of the branch to be executed in the multi-branch jump structure.
  • Currently, a TCAM 4 needs to be set in the computer to store the TCAM entries corresponding to each branch in the multi-branch jump structure.
  • The processor 1 uses the TCAM 4 to determine the storage address of the index information of the branch to be executed in the multi-branch jump structure. Specifically, the processor 1 matches the compiled branch jump condition against the TCAM entries in the TCAM 4 and, when the matching is successful, reads the storage address of the index information of the branch to be executed corresponding to the successfully matched TCAM entry.
  • The TCAM entries corresponding to the first branch may include: {1, 0}, {1, 1}, {1, 2} and {1, 3}.
  • When determining the storage address of the index information of the branch to be executed in the multi-branch jump structure, the processor 1 needs to use the TCAM 4 in the computer to store the TCAM entries corresponding to each branch in the multi-branch jump structure, which brings challenges to the storage space of the TCAM 4. Especially when the number of branch jump conditions in the multi-branch jump structure is large, or the branch jump conditions are complicated, the number of TCAM entries tends to grow explosively. This not only requires more storage space in the TCAM 4 but also leads to a large number of matches, which in turn lowers the matching efficiency and reduces the efficiency with which the processor 1 executes the multi-branch jump structure.
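  • To make the TCAM-based lookup concrete, the following is a minimal software model of ternary matching. The entry layout (value, care mask, storage address), the `tcam_lookup` name, and the sample table are assumptions for illustration only; a hardware TCAM compares all entries in parallel, whereas this sketch scans them in order.

```python
from typing import List, Optional, Tuple

# Hypothetical entry layout: (value, care_mask, storage_address).
# A key matches when it agrees with `value` on every bit that is set in `care_mask`.
TcamEntry = Tuple[int, int, int]


def tcam_lookup(entries: List[TcamEntry], key: int) -> Optional[int]:
    """Return the storage address of the first matching entry, or None if nothing matches."""
    for value, care_mask, address in entries:
        if (key & care_mask) == (value & care_mask):
            return address
    return None


# Illustrative table: a single branch may need several entries.
entries: List[TcamEntry] = [
    (0b0100, 0b1100, 0x1000),  # branch 0: key bits 3..2 equal 01
    (0b1000, 0b1100, 0x1004),  # branch 1: key bits 3..2 equal 10
    (0b1100, 0b1100, 0x1004),  # branch 1 again: key bits 3..2 equal 11
]
```

  • Even in this small table, the second branch already needs two entries, which hints at how the entry count can grow as conditions become more complicated.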
  • the present application provides a specific structure of the processor 1 to determine the storage address of the index information of the branch to be executed in the multi-branch jump structure.
  • the determined storage address is used to implement the multi-branch jump structure.
  • the processor 1 may include an instruction cache 11, a decoder 12, a predicate register 14 and an execution core 13.
  • the execution core 13 includes an operation unit 131 and a branch jump unit 132.
  • The instruction cache 11 is used to obtain the instructions that the compiler 3 produces by compiling the multi-branch jump structure in the program memory 2, and to cache the obtained instructions; it is coupled to the decoder 12 to provide the instructions to the decoder 12.
  • the decoder 12 is used to decode the instruction obtained from the instruction cache 11, and is coupled to the execution core 13 to control the execution core 13 to execute the corresponding instruction.
  • the predicate register 14 is coupled to the execution core 13 to provide the execution core 13 with a read/write operation on the return value of the predicate function.
  • the predicate function refers to a function whose return value is a logical value.
  • In general, the return value of a function may be a number, a string, a date, and so on, whereas the return value of a predicate function is a logical value (generally 0 or 1, where 1 usually represents true and 0 represents false). The return value of a predicate function occupies 1 bit in the predicate register 14.
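  • A minimal software model of such a register is sketched below, assuming one bit per predicate result. The `PredicateRegister` class, its `write` and `read` methods, the default width of 16, and the `is_positive` predicate are hypothetical names and values chosen for illustration.

```python
class PredicateRegister:
    """Bit-addressable register model; each predicate result occupies exactly one bit."""

    def __init__(self, width: int = 16) -> None:
        self.width = width  # assumed default width
        self.value = 0

    def _check(self, bit: int) -> None:
        if not 0 <= bit < self.width:
            raise IndexError(f"bit {bit} is outside a {self.width}-bit register")

    def write(self, bit: int, result: bool) -> None:
        """Store a logical value in the given bit position."""
        self._check(bit)
        if result:
            self.value |= 1 << bit
        else:
            self.value &= ~(1 << bit)

    def read(self, bit: int) -> int:
        """Return the stored logical value (0 or 1) of the given bit position."""
        self._check(bit)
        return (self.value >> bit) & 1


def is_positive(x: int) -> bool:
    """Example predicate function: its return value is a logical value."""
    return x > 0
```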
  • the execution core 13 is used to execute corresponding instructions under the control of the decoder 12 to determine the storage address of the index information of the branch to be executed in the multi-branch jump structure.
  • The execution of the instruction by the execution core 13 may be specifically embodied in the execution of the instruction by the arithmetic unit 131 and the branch jump unit 132. The arithmetic unit 131 is used to obtain the judgment result of each branch jump condition in the multi-branch jump structure according to the operation relationship of that branch jump condition itself, and to write the judgment result of each branch jump condition into the predicate register 14. The process by which the arithmetic unit 131 obtains the judgment result of a branch jump condition can be understood as a calculation performed by the arithmetic unit 131; the judgment result of the branch jump condition is a logical value, that is, 0 or 1.
  • The branch jump unit 132 is used to determine the target address according to the storage base address of the multi-branch jump structure and the judgment results of all branch jump conditions in the multi-branch jump structure stored in the predicate register 14, where the target address includes index information of the branch to be executed in the multi-branch jump structure.
  • The branch jump unit 132 may also be used to execute the branch to be executed according to the target address. Specifically, the index information stored at the target address may be determined, and the branch corresponding to that index information may then be executed.
  • Since the processor provided by the present application does not need to use a TCAM to determine the storage address of the index information of the branch to be executed in the multi-branch jump structure, it helps reduce the requirement for TCAM storage space. Moreover, since the processor provided by the present application does not need to match TCAM entries to achieve a multi-branch jump, there is no problem of a large number of matches even if the complexity of the multi-branch jump structure is high, which helps reduce the impact of that complexity on the execution efficiency of the processor.
  • The branch jump unit 132 may be specifically used to search for the first bit in the predicate register 14 that is set to a preset value. For ease of description, the first bit in the predicate register 14 that is set to the preset value is called the target bit. The branch jump unit 132 may then determine the target address according to the storage base address of the multi-branch jump structure and the found target bit.
  • The process by which the branch jump unit 132 searches for the target bit in the predicate register 14 may specifically include: the branch jump unit 132 searches for multiple valid bits in the predicate register 14, and then searches, among the multiple valid bits, for the first bit set to the preset value as the target bit.
  • The storage bits in the predicate register 14 of the judgment results of all branch jump conditions in the multi-branch jump structure are all valid bits.
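  • Restricting the search to valid bits can be sketched with a bit mask. The sketch assumes that the judgment results occupy bits 0 to `branch_count - 1`, that the preset value is 1, and that "first" means the lowest-numbered bit; none of these details is fixed by the application, and `first_valid_set_bit` is a hypothetical name.

```python
from typing import Optional


def first_valid_set_bit(predicate: int, branch_count: int) -> Optional[int]:
    """Return the lowest set bit among the valid bits, or None if no valid bit is set."""
    valid_mask = (1 << branch_count) - 1  # assumed: results occupy bits 0..branch_count-1
    candidates = predicate & valid_mask   # ignore stale bits outside the valid range
    if candidates == 0:
        return None
    lowest = candidates & -candidates     # isolate the lowest set bit
    return lowest.bit_length() - 1
```

  • Masking first means that bits left over from an earlier, wider multi-branch jump structure cannot be mistaken for the target bit.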
  • The branch jump conditions of some multi-branch jump structures are more complicated. In one type of multi-branch jump structure, a branch jump condition includes both relational operations and logical operations; this type of branch jump condition is called a compound branch jump condition.
  • the processor 1 provided by the present application may determine a storage address of index information of a branch to be executed of a multi-branch jump structure including a compound branch jump condition.
  • Assume that a certain branch jump condition in the multi-branch jump structure to be executed in the program memory 2 is a compound branch jump condition. The compound branch jump condition includes N relational expressions arranged in series in a determined order, where N is an integer greater than or equal to 2 and every two adjacent relational expressions are connected by a logical operator.
  • The compound branch jump condition can further be understood as including N logical expressions. The first relational expression is the first logical expression. The operation result of the i-th logical expression and the (i+1)-th relational expression are connected by the logical operator between the i-th relational expression and the (i+1)-th relational expression to obtain the (i+1)-th logical expression, where i is an integer greater than or equal to 1 and less than N. The operation result of the N-th logical expression is the judgment result of the compound branch jump condition.
  • The arithmetic unit 131 may obtain the judgment result according to the operation relationship of the compound branch jump condition itself, and store the judgment result in the predicate register. In a possible implementation manner, the arithmetic unit 131 may sequentially execute the N logical expressions in the compound branch jump condition, and store the operation result of the N-th logical expression in the predicate register.
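  • The chain of N logical expressions amounts to a strict left-to-right fold over the relational results. The sketch below illustrates that reading; the `evaluate_compound` function and its list-based inputs are hypothetical, and the relational expressions are assumed to have been evaluated to booleans already.

```python
import operator
from typing import Callable, List

LogicalOp = Callable[[bool, bool], bool]


def evaluate_compound(relations: List[bool], operators: List[LogicalOp]) -> int:
    """Fold N relational results with the N - 1 logical operators between them."""
    if len(relations) < 2 or len(operators) != len(relations) - 1:
        raise ValueError("need N >= 2 relational results and N - 1 logical operators")
    result = relations[0]  # the first logical expression is the first relational expression
    for op, relation in zip(operators, relations[1:]):
        result = op(result, relation)  # the (i+1)-th logical expression
    return int(result)  # judgment result, 0 or 1
```

  • For a condition such as `(a > 3) and (b == 2) or (c < 0)` with a = 5, b = 1, c = -1, the relational results are `[True, False, True]`, and `evaluate_compound([True, False, True], [operator.and_, operator.or_])` folds them from left to right without applying any operator precedence.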
  • The compiler 3 can compile the multi-branch jump structure stored in the program memory 2 to obtain the first instruction and the second instruction, and the instruction cache 11 can sequentially obtain the first instruction and the second instruction and store them. The decoder 12 may decode the acquired first instruction and second instruction to control the arithmetic unit 131 to execute the first instruction, and then control the branch jump unit 132 to execute the second instruction.
  • The compiler 3 can sequentially compile each branch jump condition in the multi-branch jump structure. Compiling each branch jump condition yields one first instruction, which the instruction cache 11 can obtain and store.
  • Sequential compilation refers to compiling in accordance with the order of the branch jump conditions in the multi-branch jump structure. The compiler 3 compiles the branch jump conditions in the multi-branch jump structure in sequence to obtain multiple first instructions, and the instruction cache 11 may sequentially acquire the multiple first instructions and store them. After that, the instruction cache 11 may sequentially provide the first instructions to the decoder 12, and the decoder 12 may decode each obtained first instruction to control the arithmetic unit 131 to execute it.
  • the first branch jump condition in the multi-branch jump structure will be compiled, cached, and decoded first, and then executed by the arithmetic unit 131 first.
  • the arithmetic unit 131 can execute the first instruction under the control of the decoder 12. Specifically, the arithmetic unit 131 can obtain the judgment result of the branch jump condition corresponding to the first instruction.
  • The judgment result of the branch jump condition is a logical value represented by 0 or 1. Generally, 1 represents a true logical value and 0 represents a false logical value; in actual use, 0 may instead represent true and 1 represent false, which is not specifically limited here.
  • the arithmetic unit 131 can then store the judgment result of the branch jump condition corresponding to the first instruction in the predicate register 14.
  • the judgment result of a branch jump condition occupies one bit in the predicate register 14.
  • A predicate register 14 with a bit width of 16 can store the judgment results of up to 16 branch jump conditions; that is, it can support a multi-branch jump structure with up to 16 branches.
  • The decoder 12 may then obtain the next first instruction corresponding to the multi-branch jump structure from the instruction cache 11 and decode it, and the arithmetic unit 131 may execute that first instruction under the control of the decoder 12. In this way, the arithmetic unit 131 can sequentially execute each first instruction corresponding to the multi-branch jump structure, thereby sequentially storing the judgment result of each branch jump condition in the multi-branch jump structure in the predicate register 14.
  • After obtaining the first instructions from the instruction cache 11, the decoder 12 can control the branch jump unit 132 to execute the second instruction.
  • the second instruction may include the storage base address of the multi-branch jump structure.
  • The branch jump unit 132 may determine the target address according to the storage base address of the multi-branch jump structure and the judgment results of all branch jump conditions in the multi-branch jump structure stored in the predicate register 14, where the target address includes index information of the branch to be executed in the multi-branch jump structure.
  • The compiler 3 can compile the multi-branch jump structure to obtain the first instruction and the second instruction. The arithmetic unit 131 in the processor 1 can execute the first instruction to write the judgment results of all branch jump conditions into the predicate register 14, and then the branch jump unit 132 can determine the storage address of the index information of the branch to be executed in the multi-branch jump structure by executing the second instruction.
  • The processor 1, or other devices in the computer, may execute the branch to be executed in the multi-branch jump structure according to the storage address determined by the branch jump unit 132, thereby implementing the multi-branch jump for the multi-branch jump structure.
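  • The overall flow (first instructions fill the predicate register, the second instruction resolves the target address, and the branch is then executed) can be sketched end to end. Everything concrete here is an assumption made for illustration: the function names, bit k holding the result of the k-th condition, a preset value of 1, the 4-byte `entry_size`, and the dictionary `index_memory` standing in for the memory that holds the index information.

```python
from typing import Callable, Dict, List, Optional


def run_first_instructions(conditions: List[Callable[[], bool]]) -> int:
    """Evaluate each branch jump condition in order; the k-th result goes to bit k."""
    predicate = 0
    for bit, condition in enumerate(conditions):
        if condition():
            predicate |= 1 << bit
    return predicate


def run_second_instruction(base: int, predicate: int, branch_count: int,
                           entry_size: int = 4) -> Optional[int]:
    """Find the first set valid bit and map it to an address (assumed layout)."""
    for bit in range(branch_count):
        if (predicate >> bit) & 1:
            return base + bit * entry_size
    return None


def multi_branch_jump(conditions: List[Callable[[], bool]], base: int,
                      index_memory: Dict[int, int],
                      branches: List[Callable[[], str]]) -> Optional[str]:
    """Run the whole sequence and execute the selected branch, if any."""
    predicate = run_first_instructions(conditions)
    address = run_second_instruction(base, predicate, len(conditions))
    if address is None:
        return None  # no condition held; the program continues
    index = index_memory[address]  # index information stored at the target address
    return branches[index]()
```

  • Note that no table of match entries is consulted at any point; the only lookup is a single read at the computed target address.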
  • Assume that a branch jump condition in the multi-branch jump structure to be executed in the program memory 2 is a compound branch jump condition. The compound branch jump condition includes N relational expressions arranged in series in a determined order, where N is an integer greater than or equal to 2 and every two adjacent relational expressions are connected by a logical operator. According to the previous description of the compound branch jump condition, the compound branch jump condition can be understood as including N logical expressions.
  • The first instruction obtained after the compiler 3 compiles the compound branch jump condition may include N comparison instructions in a determined order. The N comparison instructions are obtained by the compiler 3 compiling the N logical expressions in the branch jump condition in turn; that is, the N comparison instructions correspond one-to-one to the N logical expressions in the branch jump condition, and the i-th comparison instruction corresponds to the i-th logical expression in the branch jump condition, where i is any positive integer not greater than N. In other words, the first comparison instruction is the first one compiled and corresponds to the first logical expression in the branch jump condition; the second comparison instruction is the second one compiled and corresponds to the second logical expression in the branch jump condition; and so on, until the N-th comparison instruction, which is the N-th one compiled and corresponds to the N-th logical expression in the branch jump condition.
  • Each comparison instruction includes an operator field and an operand field, and the arithmetic unit 131 can perform a comparison operation according to the comparison instruction to obtain an operation result.
  • the comparison operation includes logical operation and relational operation. Therefore, the operator field in the comparison instruction may include the logical operator field and the relational operator field.
  • the relational operator field is used to indicate the type of the relational operation corresponding to the comparison instruction, that is, the type of the relational operation corresponding to the i+1th relational expression in the compound branch jump condition, and It can be understood as the type of relational operation corresponding to the i+1th logical expression;
  • the logical operator field is used to indicate the type of logical operation corresponding to the comparison instruction, that is, the ith relational expression and
  • the type of logical operator between the i+1th relational expression can also be understood as the type of logical operation corresponding to the i+1th logical expression.
  • the operand field in the comparison instruction includes the source operand field and the destination operand field.
  • the source operand field is used to indicate the source operand of the comparison operation.
  • the destination operand field is used to index the destination operand of the comparison operation in the predicate register 14 Storage location.
  • the source operand field in the comparison instruction may specifically include a first source operand field and a second source operand field.
  • the first source operand field is used to indicate the source operand of the relational operation corresponding to the comparison instruction, that is, the source operation in the i+1th relational expression in the compound branch jump condition
  • the second source operand field is used to index the source operand of the logical operation corresponding to the comparison instruction, that is, the destination operand of the i-th comparison instruction.
  • the second source operand field in the i+1th comparison instruction may point to the same storage address as the destination operand field in the ith comparison instruction.
  • the first source operand field in the i+1th comparison instruction may correspond to an immediate number, or may correspond to the storage address of the source operand in the i+1th relational expression.
  • the processor 1 may further include a general register 15, and the first source operand field in the comparison instruction may point to the general register 15. The general register 15 is coupled to the arithmetic unit 131 and is used to provide the source operand of the relational operation to the arithmetic unit 131.
  • the arithmetic unit 131 may sequentially execute the N comparison instructions. Taking the execution of the i+1th comparison instruction as an example: first, the arithmetic unit 131 performs a relational operation based on the relational operator field and the first source operand field in the i+1th comparison instruction to obtain the result of the relational operation corresponding to the i+1th comparison instruction; next, the arithmetic unit 131 performs a logical operation based on that relational operation result, the logical operator field, and the second source operand field in the i+1th comparison instruction to obtain the result of the logical operation corresponding to the i+1th comparison instruction; finally, the arithmetic unit 131 stores that logical operation result in the predicate register 14 according to the destination operand field in the i+1th comparison instruction.
  • after the arithmetic unit 131 executes the Nth comparison instruction, it has completed the execution of the first instruction. The result of the logical operation corresponding to the Nth comparison instruction is the destination operand of the first instruction, that is, the judgment result of the branch jump condition corresponding to the first instruction. In other words, the storage location (or bit) in the predicate register 14 indicated by the destination operand field in the Nth comparison instruction stores the judgment result of the branch jump condition corresponding to the first instruction.
  • when the storage content in the storage location corresponding to the destination operand field in the ith comparison instruction is no longer used, then, to save storage space in the predicate register 14, in a possible implementation the destination operand field and the second source operand field of the i+1th comparison instruction can point to the same storage address in the predicate register 14, that is, the same bit in the predicate register 14 is reused.
  • the arithmetic unit 131 only needs to calculate the operation result of the first relational expression according to the first comparison instruction, take that result as the destination operand of the first comparison instruction, and store it in the predicate register 14 according to the destination operand field in the first comparison instruction; or, to unify the format of the comparison instructions, the first comparison instruction can still include the logical operator field and the second source operand field. In that case, the arithmetic unit 131 still performs the logical operation according to the logical operator field and the second source operand field in the first comparison instruction, and the result of that logical operation is the result of the relational operation corresponding to the first comparison instruction.
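  • The sequential execution of the N comparison instructions described above can be sketched roughly as follows. This is a minimal Python model rather than the hardware itself: the class `CompareInstr`, the function `execute_first_instruction`, and the operator mnemonics (`EQ`, `GT`, `AND`, and so on) are hypothetical names chosen for illustration, and the predicate register 14 is modeled as a plain list of bits. The first comparison instruction is assumed to skip the logical operation.

```python
import operator
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mnemonics; the real encodings are defined by the instruction set.
RELATIONAL_OPS = {"EQ": operator.eq, "NE": operator.ne,
                  "LT": operator.lt, "GT": operator.gt}
LOGICAL_OPS = {"AND": operator.and_, "OR": operator.or_}


@dataclass
class CompareInstr:
    rel_type: str          # relational operator field
    src1: int              # first source operand field, left operand
    src2: int              # first source operand field, right operand
    cond: Optional[str]    # logical operator field; None for the first instruction
    pm: int                # second source operand field: predicate bit to read
    pn: int                # destination operand field: predicate bit to write


def execute_first_instruction(instrs: List[CompareInstr], pred: List[int]) -> int:
    """Run the comparison instructions in order and return the final judgment (0 or 1)."""
    for ins in instrs:
        # Step 1: relational operation on the first source operands.
        rel = int(RELATIONAL_OPS[ins.rel_type](ins.src1, ins.src2))
        # Step 2: logical operation with the previous result read from the predicate bit.
        if ins.cond is None:
            result = rel
        else:
            result = LOGICAL_OPS[ins.cond](pred[ins.pm], rel)
        # Step 3: store the result at the bit named by the destination operand field.
        pred[ins.pn] = result
    return pred[instrs[-1].pn]


# (a == 1) AND (b > 2) OR (c != 3), evaluated left to right with a=1, b=5, c=3.
pred = [0] * 8
chain = [
    CompareInstr("EQ", 1, 1, None, 0, 0),
    CompareInstr("GT", 5, 2, "AND", 0, 0),
    CompareInstr("NE", 3, 3, "OR", 0, 0),
]
judgment = execute_first_instruction(chain, pred)
```

  • In this sketch every comparison instruction reads and writes the same list index, which mirrors the bit reuse described above: only one bit of the modeled predicate register is consumed per branch jump condition.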
  • the process of the first instruction executed by the arithmetic unit 131 is exemplified below.
  • the multi-branch jump structure to be executed is an 8-way multi-branch jump structure 2, including at least one compound branch jump condition
  • the first branch jump condition of the multi-branch jump structure 2 is the compound branch jump condition
  • the compiler 3 compiles the branch jump conditions in the multi-branch jump structure 2 in sequence, and can obtain 8 first instructions
  • the first branch jump condition corresponds to the first first instruction
  • the second branch jump condition corresponds to the 2nd first instruction
  • the 8th branch jump condition corresponds to the 8th first instruction.
  • the instruction cache 11 can sequentially obtain and cache the eight first instructions produced by the compiler 3 and sequentially provide them to the decoder 12, and the decoder 12 can control the arithmetic unit 131 to sequentially execute the eight first instructions.
  • the specific process of the operation unit 131 executing the first first instruction is exemplified below.
  • condition2_1 includes 3 relational expressions, or can be understood as including 3 logical expressions
  • the first instruction may include 3 comparison instructions.
  • the comparison instruction in this application may be as shown in Table 1 (the first row in Table 1 is used to represent storage bits, and the second row is used to represent storage contents corresponding to corresponding storage bits):
  • src1 and src2 correspond to the first source operand field
  • Pn corresponds to the destination operand field
  • Type corresponds to the relational operator field
  • Pm corresponds to the second source operand field
  • cond corresponds to the logical operator field
  • opcode corresponds to the opcode field
  • the opcode in the opcode field specifies which instruction (the first instruction or the second instruction, usually represented by a code) is to be executed, that is, what kind of operation the instruction performs.
  • Pm of a comparison instruction and Pn of the previous comparison instruction point to the same storage location in the predicate register 14.
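  • To make the field structure of the comparison instruction more concrete, the following sketch packs and unpacks the named fields into a single instruction word. The field order and all bit widths here are assumptions made purely for illustration (the actual layout is the one given in Table 1), and `encode` and `decode` are hypothetical helper names.

```python
# Hypothetical layout: (field name, width in bits), most significant field first.
# The real widths and ordering are defined by Table 1, not by this sketch.
FIELDS = [("src1", 8), ("src2", 8), ("pn", 3), ("type", 3),
          ("pm", 3), ("cond", 2), ("opcode", 5)]


def encode(values: dict) -> int:
    """Pack the named fields into one instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word: int) -> dict:
    """Unpack an instruction word back into its named fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


example = {"src1": 17, "src2": 42, "pn": 1, "type": 2,
           "pm": 1, "cond": 1, "opcode": 9}
word = encode(example)
```

  • Here `pm` and `pn` hold the same value, corresponding to the case in which the second source operand field and the destination operand field point to the same bit of the predicate register 14.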
  • the relational operator field can be designed as shown in Table 2:
  • the logical operator field can be designed as shown in Table 3:
  • if the comparison instruction corresponding to the first logical expression in the branch jump condition still includes the logical operator field and the second source operand field, the logical operator field of that comparison instruction may correspond to the logical operator "UN" in Table 3. The result of the logical operation that the arithmetic unit 131 performs according to the logical operator field and the second source operand field of that comparison instruction is still the result of the first relational operation.
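  • One way to read the "UN" operator is as a pass-through: whatever the bit indexed by the second source operand field holds, the logical step returns the relational result unchanged. The sketch below illustrates that reading; the `AND` and `OR` mnemonics and the function `logical_step` are hypothetical names, and only the pass-through behavior of "UN" is taken from the description.

```python
import operator

LOGICAL_OPS = {
    "AND": operator.and_,   # hypothetical mnemonic
    "OR": operator.or_,     # hypothetical mnemonic
    # "UN": ignore the previous predicate bit and pass the relational result through.
    "UN": lambda prev_bit, rel_result: rel_result,
}


def logical_step(cond: str, prev_bit: int, rel_result: int) -> int:
    """Combine the previous predicate bit with the current relational result."""
    return LOGICAL_OPS[cond](prev_bit, rel_result)
```

  • With this convention every comparison instruction, including the first, can share one format, and stale content in the predicate bit cannot influence the first result.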
  • Table 4 (in each table, the first row is used to indicate storage bits, and the second row is used to indicate the storage contents corresponding to the corresponding storage bits)
  • Pm1, Pn1, Pn2, and Pn3 correspond to the storage locations in the predicate register 14, and opcode_1 corresponds to the opcode field of the comparison operation.
  • a, b, c, A1, B1, and C1 may be immediate data, or may be the storage address of the first source operand.
  • the process of the arithmetic unit 131 executing the first first instruction is the process of sequentially executing the above three comparison instructions.
  • the arithmetic unit 131 first executes the first comparison instruction shown in Table 4.
  • the result "1" is stored in the bit in the predicate register 14 indicated by Pn1.
  • the storage content in the predicate register 14 can refer to Table 7 (the first row in Table 7 is used to indicate the bits in the predicate register 14, and the second row is used to indicate the storage content of the corresponding bit):
  • after the arithmetic unit 131 executes the first comparison instruction, it can execute the second comparison instruction.
  • the arithmetic unit 131 can store the destination operand "1" of the second comparison instruction in the first bit in the predicate register 14. After the arithmetic unit 131 executes the second comparison instruction, the storage content in the predicate register 14 can still refer to Table 7.
  • after the arithmetic unit 131 executes the second comparison instruction, it may continue to execute the third comparison instruction.
  • the specific process may refer to the above process of executing the second comparison instruction, which will not be repeated here.
  • the storage content in the predicate register 14 can still refer to Table 7.
  • the arithmetic unit 131 completes the execution of the first first instruction, and the judgment result of the first branch jump condition condition2_1 corresponding to that first instruction is "1", which is stored in the first bit of the predicate register 14, as shown in Table 7.
  • the arithmetic unit 131 can sequentially execute the second first instruction, the third first instruction, ..., and the eighth first instruction under the control of the decoder 12, and store the judgment result of each branch jump condition in the predicate register 14. Assume that the storage content in the predicate register 14 is as shown in Table 8 (the first row in Table 8 is used to indicate the bits in the predicate register 14, and the second row is used to indicate the storage content of the corresponding bit):
  • the decoder 12 may obtain the second instruction from the instruction cache 11.
  • the second instruction may include a third source operand field.
  • the third source operand field is used to indicate a base address corresponding to the multi-branch jump structure.
  • the base address corresponding to the multi-branch jump structure may point to random access memory.
  • the branch jump unit 132 can execute the second instruction. Specifically, the branch jump unit 132 can search for the first bit in the predicate register 14 that is set to the preset value, that is, the target bit.
  • the preset value may be 1 or 0, which is not specifically limited here.
  • the present application uses the preset value of 1 as an example for description.
  • the branch jump unit 132 may determine the offset address of the branch to be executed according to the found target bit, and may then determine the target address according to that offset address and the base address corresponding to the third source operand field in the second instruction.
  • after the arithmetic unit 131 stores the judgment result of each branch jump condition in the multi-branch jump structure in the predicate register 14, if the bit width of the predicate register 14 is equal to the number of branch jump conditions in the multi-branch jump structure, each bit in the predicate register 14 corresponds to the judgment result of one branch jump condition. In this case, the target bit can be searched for among all the bits in the predicate register 14.
  • for example, after the arithmetic unit 131 stores the judgment result of each branch jump condition in the 8-way multi-branch jump structure in the predicate register 14, and continuing with the above example, the storage content of each bit in the predicate register 14 can be referred to Table 8.
  • the first bit set to 1 in the predicate register 14 is the first bit, that is, the target bit is the first bit, and the target bit corresponds to the first branch jump condition in the multi-branch jump structure (that is, condition2_1)
  • the first branch corresponding to the first branch jump condition is the branch to be executed
  • the offset address of the branch to be executed on the basis of the base address is 1.
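  • The search for the target bit and the resulting target address can be sketched as follows. The bit contents used here are assumed for illustration (the description only implies that the first and third bits are set and the second is not), the base address value is hypothetical, and combining base address and offset address by addition is an assumption of this sketch. Returning `None` when no bit matches is likewise a choice made here, since the description does not cover that case.

```python
from typing import List, Optional


def find_target_bit(pred: List[int], preset: int = 1) -> Optional[int]:
    """Return the 1-based position of the first bit equal to `preset`, or None."""
    for position, bit in enumerate(pred, start=1):
        if bit == preset:
            return position
    return None


def target_address(base: int, pred: List[int]) -> Optional[int]:
    """Offset equals the target bit position when every bit holds a judgment result."""
    offset = find_target_bit(pred)
    if offset is None:
        return None
    return base + offset  # assumption: base address plus offset address


pred = [1, 0, 1, 0, 0, 0, 0, 0]   # assumed predicate register contents
base = 0x1000                      # hypothetical base address
target = target_address(base, pred)
```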
  • if the bit width of the predicate register 14 is greater than the number of branch jump conditions in the multi-branch jump structure, then some of the bits in the predicate register 14 do not correspond to the judgment result of any branch jump condition.
  • for example, the arithmetic unit 131 stores the judgment result of each branch jump condition in a 3-way multi-branch jump structure in the predicate register 14, and the storage content of each bit in the predicate register 14 is still as shown in Table 8.
  • the judgment results of the three branch jump conditions of the multi-branch jump structure are stored in the second, third, and fourth bits of the predicate register 14, respectively. If the target bit is searched for among all the bits in the predicate register 14, then the first bit is found as the target bit, and the offset address determined from it is 1.
  • the target bit should, however, be the third bit.
  • since the third bit is in the second position among the second, third, and fourth bits, the target bit should correspond to the second branch jump condition in the multi-branch jump structure, and the offset address of the branch to be executed should be 2.
  • the second instruction may further include a fourth source operand field.
  • the fourth source operand field in the second instruction is used to indicate the valid bit in the predicate register 14, and the valid bit corresponds to the judgment result of the branch jump condition
  • the second instruction in this application may be as shown in Table 9 (the first row in Table 9 is used to represent storage bits, and the second row is used to represent storage contents corresponding to corresponding storage bits):
  • src1 corresponds to the third source operand field of the second instruction, used to indicate the base address corresponding to the multi-branch jump structure
  • src2 corresponds to the fourth source operand field of the second instruction, used to indicate the valid bits in the predicate register 14
  • the opcode corresponds to the opcode field, and it is assumed that the opcode corresponding to the second instruction is opcode_2.
  • the fourth source operand field in the second instruction may be an immediate number.
  • the bit width of the fourth source operand field src2 is the same as the bit width of the predicate register 14. A valid bit can be indicated by setting the position in src2 that corresponds to that valid bit in the predicate register 14 to a preset value (such as 1). Assuming that the bit width of the predicate register 14 is 8 and that the judgment results of the branch jump conditions of the 3-way multi-branch jump structure are stored in the second, third, and fourth bits of the predicate register 14, the second instruction may be as shown in Table 10 (the first row in Table 10 is used to represent storage bits, and the second row is used to represent storage contents corresponding to corresponding storage bits):
  • the arithmetic unit 131 stores the judgment results of the branch jump conditions in the multi-branch jump structure in the predicate register 14, assuming that the storage content of the predicate register 14 is shown in Table 8, and the second instruction is shown in Table 10.
  • the process of the branch jump unit 132 executing the second instruction may specifically include: first, the branch jump unit 132 determines, according to the fourth source operand field in the second instruction, that the valid bits in the predicate register 14 are the second, third, and fourth bits, and then looks among the valid bits of the predicate register 14 for the first bit set to the preset value, which is the target bit (that is, the third bit).
  • the branch jump unit 132 can determine, according to the valid bits (that is, the second, third, and fourth bits) and the target bit it has found (that is, the third bit), that the offset address of the branch to be executed is 2. After that, the branch jump unit 132 can determine the target address according to that offset address and the third source operand field (src1) in the second instruction; the target address stores the index information of the branch to be executed.
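  • The execution of the second instruction with a valid-bit mask can be sketched with integer bit operations. This sketch assumes that the "first bit" of the predicate register 14 is the least significant bit of an integer, that the target address is the base address plus the offset address, and that `None` is returned when no valid bit is set; `branch_offset` and `execute_second_instruction` are hypothetical names, and the predicate contents are assumed values.

```python
from typing import Optional


def branch_offset(valid_mask: int, pred: int) -> Optional[int]:
    """Rank of the first set valid bit among the valid bits (1-based), or None."""
    hits = pred & valid_mask          # keep only bits that hold judgment results
    if hits == 0:
        return None
    lowest = hits & -hits             # isolate the lowest set bit: the target bit
    upto_target = valid_mask & ((lowest << 1) - 1)
    return bin(upto_target).count("1")  # valid bits up to and including the target


def execute_second_instruction(src1_base: int, src2_mask: int, pred: int) -> Optional[int]:
    """Target address from the base address (src1) and the valid-bit mask (src2)."""
    offset = branch_offset(src2_mask, pred)
    if offset is None:
        return None
    return src1_base + offset         # assumption: base address plus offset address


pred = 0b00000101        # assumed contents: first and third bits set
src2 = 0b00001110        # second, third, and fourth bits are valid
offset = branch_offset(src2, pred)
target = execute_second_instruction(0x1000, src2, pred)
```

  • With all eight bits marked valid (`0b11111111`), the same function returns an offset of 1 for these contents, which matches the unmasked search over the whole predicate register.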
  • the main part of the processor 1 that executes the multi-branch jump processing is the execution core 13, so the execution core 13 in the processor 1 of the present application is also called a multi-branch jump processing device.
  • an embodiment of the present application further provides a multi-branch jump processing method.
  • an embodiment of the multi-branch jump processing method of the present application includes the following steps:
  • the multi-branch jump structure includes multiple branch jump conditions.
  • the multi-branch jump processing device can obtain the judgment result of each branch jump condition according to the operation relationship of that branch jump condition in the multi-branch jump structure, and store the obtained judgment result of each branch jump condition in the predicate register, wherein the judgment result of a branch jump condition is 0 or 1.
  • after the multi-branch jump processing device stores the judgment result of each branch jump condition in the multi-branch jump structure in the predicate register, it can determine the target address based on the storage base address of the multi-branch jump structure and the judgment results of all branch jump conditions in the multi-branch jump structure stored in the predicate register. The target address includes index information of the branch to be executed in the multi-branch jump structure.
  • step S100 may be performed by the arithmetic unit 131 in the execution core 13 described above, and step S200 may be performed by the branch jump unit 132 in the execution core 13.
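  • Taken together, steps S100 and S200 form a two-stage pipeline: evaluate every branch jump condition into the predicate register, then derive the target address from the stored bits. The sketch below models the conditions as Python callables; the function names, the 8-bit register width, the base address, and the use of addition to form the target address are all assumptions of this illustration.

```python
from typing import Callable, List, Optional, Sequence


def step_s100(conditions: Sequence[Callable[[], bool]], width: int = 8) -> List[int]:
    """Evaluate each branch jump condition and store 0 or 1 in the predicate register."""
    if len(conditions) > width:
        raise ValueError("more conditions than predicate bits")
    pred = [0] * width
    for i, condition in enumerate(conditions):
        pred[i] = int(bool(condition()))
    return pred


def step_s200(base: int, pred: List[int], n_conditions: int) -> Optional[int]:
    """Find the first set bit among the stored results and form the target address."""
    for i in range(n_conditions):
        if pred[i] == 1:
            return base + i + 1   # assumption: offset is the 1-based branch number
    return None


def multi_branch_jump(base: int, conditions: Sequence[Callable[[], bool]]) -> Optional[int]:
    pred = step_s100(conditions)
    return step_s200(base, pred, len(conditions))


x = 7
conditions = [lambda: x < 0, lambda: x == 7, lambda: x > 5]
target = multi_branch_jump(0x2000, conditions)
```

  • In this example the second and third conditions are both true, and the earlier one determines the branch, reflecting the rule that the first bit set to the preset value is the target bit.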
  • a possible refinement step of step S200 may include:
  • after the multi-branch jump processing device stores the judgment result of each branch jump condition in the multi-branch jump structure in the predicate register, it can search for the first bit in the predicate register that is set to the preset value. This bit in the predicate register is called the target bit.
  • the multi-branch jump processing device can determine the target address according to the storage base address and the target bit of the multi-branch jump structure.
  • a possible refinement step of step S210 may include:
  • after the multi-branch jump processing device stores the judgment result of each branch jump condition in the multi-branch jump structure in the predicate register, all valid bits can be found in the predicate register. The storage positions in the predicate register of the judgment results of all branch jump conditions in the multi-branch jump structure are the valid bits; in other words, the storage position in the predicate register of the judgment result of each branch jump condition is taken as a valid bit. For example, an 8-way multi-branch jump structure produces 8 valid bits in the predicate register. Since the multi-branch jump structure includes multiple branch jump conditions, the multi-branch jump processing device can find multiple valid bits in the predicate register.
  • the multi-branch jump processing device may search for the first target bit set to a preset value from the multiple valid bits.
  • the preset value can be 1 or 0.
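  • The refinement of step S210 can be sketched as a search restricted to the valid bits, with the preset value as a parameter. The function name `find_target_among_valid`, the representation of valid bits as a list of 1-based positions, and the predicate contents are hypothetical choices for this illustration.

```python
from typing import List, Optional, Sequence, Tuple


def find_target_among_valid(pred: List[int],
                            valid_positions: Sequence[int],
                            preset: int = 1) -> Optional[Tuple[int, int]]:
    """Return (target bit position, rank among valid bits), both 1-based, or None."""
    for rank, position in enumerate(sorted(valid_positions), start=1):
        if pred[position - 1] == preset:
            return position, rank
    return None


pred = [1, 0, 1, 0, 0, 0, 0, 0]      # assumed predicate register contents
valid = [2, 3, 4]                     # bits holding the 3-way structure's results
found_with_one = find_target_among_valid(pred, valid, preset=1)
found_with_zero = find_target_among_valid(pred, valid, preset=0)
```

  • The rank returned alongside the position is what distinguishes a restricted search from a search over the whole register: the first bit is ignored here because it is not a valid bit.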
  • a possible refinement step of step S100 may include:
  • if the branch jump condition includes N logical expressions, execute the N logical expressions in sequence
  • the multi-branch jump processing device can obtain the judgment result of the corresponding branch jump condition according to the operation relationship of each branch jump condition in the multi-branch jump structure.
  • the process by which the multi-branch jump processing device obtains the judgment result may specifically be: executing the N logical expressions in sequence.
  • a branch jump condition including N logical expressions includes N relational expressions arranged in series in a determined order, and every two adjacent relational expressions are connected by a logical operator, where N is an integer greater than or equal to 2. The first relational expression is the first logical expression. The operation result of the ith logical expression and the i+1th relational expression are connected by the logical operator located between the ith relational expression and the i+1th relational expression to obtain the i+1th logical expression, where i is an integer greater than or equal to 1 and less than N.
  • the operation result of the Nth logical expression can be stored in the predicate register.
  • the operation result of the Nth logical expression is the judgment result of the branch jump condition, and storing the operation result of the Nth logical expression in the predicate register means that the judgment result of the branch jump condition is stored in the predicate register.
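  • The chaining of relational and logical expressions can be written compactly as a recurrence. The symbols below are notation introduced here for illustration only: $R_i$ denotes the result of the ith relational expression, $L_i$ the operation result of the ith logical expression, and $\odot_i$ the logical operator located between the ith and the i+1th relational expressions.

```latex
L_1 = R_1, \qquad
L_{i+1} = L_i \odot_i R_{i+1} \quad (1 \le i < N), \qquad
\text{judgment result} = L_N \in \{0, 1\}
```

  • Under this reading the condition is evaluated strictly left to right, so only the most recent intermediate result $L_i$ needs to be kept at any time.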
  • a possible multi-branch jump processing system of the present application may include a processor 81, a peripheral device 82, an external memory 83, and a power supply 84.
  • the power supply 84 is used to supply power to the processor 81, the peripheral device 82, and the external memory 83.
  • the processor 81 is coupled to the external memory 83 and one or more peripheral devices 82.
  • the external memory 83 may be used to store programs, and may include any type of memory, such as static random-access memory (SRAM), read-only memory (ROM), and so on.
  • the peripheral device 82 may include any desired circuits, and the multi-branch jump processing system may be any type of computing system, such as a desktop computer, a workstation, or a network set-top box. In a possible implementation, the multi-branch jump processing system may be a mobile device (such as a smartphone or a tablet computer), and the peripheral device 82 may include devices used for various types of wireless communication, such as Wi-Fi devices, Bluetooth devices, cellular devices, and global positioning system devices. The peripheral device 82 may also include additional storage devices, as well as user interface devices such as a display screen, a keyboard, and a microphone.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

Disclosed are a multi-branch jump processing device and method, and a processor, used to solve the problem that an existing multi-branch jump processing device wastes TCAM storage space. The multi-branch jump processing device comprises: an arithmetic unit (131), used to obtain the judgment result of each branch jump condition according to the operation relationship of that branch jump condition in a multi-branch jump structure, and to store the judgment result of each branch jump condition in a predicate register (14), the judgment result being 0 or 1, and the multi-branch jump structure comprising multiple branch jump conditions; and a branch jump unit (132), used to determine a target address according to a storage base address of the multi-branch jump structure and the judgment results of all branch jump conditions in the multi-branch jump structure stored in the predicate register (14), the target address comprising index information of a branch to be executed in the multi-branch jump structure.
PCT/CN2018/121901 2018-12-19 2018-12-19 Appareil et procédé de traitement de saut à branches multiples et processeur WO2020124400A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880100044.8A CN113168327B (zh) 2018-12-19 一种多分支跳转处理装置和方法、处理器
PCT/CN2018/121901 WO2020124400A1 (fr) 2018-12-19 2018-12-19 Appareil et procédé de traitement de saut à branches multiples et processeur

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/121901 WO2020124400A1 (fr) 2018-12-19 2018-12-19 Appareil et procédé de traitement de saut à branches multiples et processeur

Publications (1)

Publication Number Publication Date
WO2020124400A1 (fr) 2020-06-25

Family

ID=71102438

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/121901 WO2020124400A1 (fr) 2018-12-19 2018-12-19 Appareil et procédé de traitement de saut à branches multiples et processeur

Country Status (1)

Country Link
WO (1) WO2020124400A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104468357A (zh) * 2013-09-16 2015-03-25 中兴通讯股份有限公司 流表的多级化方法、多级流表处理方法及装置
US20150370561A1 (en) * 2014-06-20 2015-12-24 Netronome Systems, Inc. Skip instruction to skip a number of instructions on a predicate
CN107018078A (zh) * 2017-01-25 2017-08-04 华为技术有限公司 多分支跳转协处理方法及装置
CN107239260A (zh) * 2017-05-11 2017-10-10 中国电子科技集团公司第三十八研究所 一种面向数字信号处理器的多谓词控制及编译优化方法

Also Published As

Publication number Publication date
CN113168327A (zh) 2021-07-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18943610

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18943610

Country of ref document: EP

Kind code of ref document: A1