WO2021120713A1 - Data processing method, decoding circuit, and processor - Google Patents

Data processing method, decoding circuit, and processor

Info

Publication number
WO2021120713A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
operand
address
repetition
type
Prior art date
Application number
PCT/CN2020/114004
Other languages
French (fr)
Chinese (zh)
Other versions
WO2021120713A8 (en
Inventor
陈庆
Original Assignee
成都海光微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都海光微电子技术有限公司
Publication of WO2021120713A1
Publication of WO2021120713A8

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017Runtime instruction translation, e.g. macros
    • G06F9/30178Runtime instruction translation, e.g. macros of compressed or encrypted instructions

Definitions

  • This application belongs to the field of computer technology, and specifically relates to a data processing method, a decoding circuit, and a processor.
  • Computer instructions are instructions and commands that direct the work of a machine.
  • A program is a series of instructions arranged in a certain order. Executing the program is the working process of the computer.
  • When a computer executes an instruction (program), it first needs to read the instruction from the instruction cache (Cache). An instruction cache miss (Cache Miss) causes serious performance problems. For example, fetching instructions takes a long time, which significantly lengthens the processing cycle of an instruction sequence and reduces performance.
  • During a miss, the current instruction sequence is stalled and waiting. If there are not enough active instruction sequences, the entire computing unit may stall, which significantly reduces performance.
  • An instruction block is the collection of instructions in one cache line (Cache Line). Since each cache line is only 512 bits and a 3-operand operation instruction uses 64 bits, each cache line, and therefore each instruction block, can store only 8 such instructions. Processing large operations therefore requires reading thousands of instruction blocks, which is not conducive to power optimization.
  • An embodiment of the present application provides a data processing method, including: determining whether an acquired instruction is a compressed instruction; if so, acquiring key information in the compressed instruction, where the key information includes an instruction repetition type and an instruction repetition number, the instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2;
  • and decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number, so that it is expanded into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
  • When the acquired instruction is a compressed instruction, the key information in the compressed instruction is acquired, and the compressed instruction is then decompressed according to the instruction repetition type and the instruction repetition number in the key information, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
  • Decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number includes: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition number; when the updated instruction repetition number is determined to be greater than a preset threshold, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID corresponding to the operand, and updating the instruction repetition number again; determining whether the re-updated instruction repetition number is equal to the preset threshold; and if so, determining that decompression of the compressed instruction is complete and obtaining the multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
  • After each instruction is generated, the instruction repetition number is updated and compared with the preset threshold. If they are not equal, the address ID corresponding to the operand is updated, an instruction is generated according to the updated address ID, the instruction repetition number is updated again, and the comparison is repeated, until the updated instruction repetition number equals the preset threshold and decompression of the compressed instruction is complete.
  • Decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number includes: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording the number of generated instructions (the generation count); when the generation count is determined to be less than the instruction repetition number, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID corresponding to the operand, and updating the generation count; determining whether the updated generation count is equal to the instruction repetition number; and if so, determining that decompression of the compressed instruction is complete and obtaining the multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
  • When the compressed instruction is decompressed according to the instruction repetition type and the instruction repetition number, the generation count is recorded after each instruction is generated and compared with the instruction repetition number. If they are not equal, the address ID corresponding to the operand is updated, an instruction is generated according to the updated address ID, the generation count is updated, and the comparison is repeated, until the updated generation count equals the instruction repetition number and decompression is complete. In this process, a counter records the number of generated instructions and is updated after each instruction is generated; when it equals the instruction repetition number, decompression of the compressed instruction is complete.
  • Updating the address ID corresponding to the operand includes: updating the address ID according to the source type of the operand pointed to by that address ID.
  • Because the address ID is updated according to the operand source type it points to, different operand source types can use different update rules.
  • Updating the address ID corresponding to the operand includes: updating the address ID according to the data type of the data stored in the operand source pointed to by that address ID.
  • Because the address ID is updated according to the data type of the data stored in the operand source it points to, different data types can use different update rules.
  • When the operand in the instruction repetition type is the destination operand, the key information further includes the destination pass-through (DF) field. Before the address ID corresponding to the operand is updated, the method further includes: determining that the value in the DF field is not a set threshold.
  • Before obtaining the key information in the compressed instruction, the method further includes: determining that the compressed instruction is valid.
  • An embodiment of the present application also provides a decoding circuit, including a decoder and an instruction decompression module. The decoder is configured to determine whether an acquired instruction is a compressed instruction and, if so, to acquire key information in the compressed instruction.
  • The key information includes an instruction repetition type and an instruction repetition number, where the instruction repetition type indicates the type of instruction to be repeated and the instruction repetition number is a positive integer greater than or equal to 2. The instruction decompression module is configured to decompress the compressed instruction according to the instruction repetition type and the instruction repetition number, expanding it into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
  • The instruction decompression module includes: a controller configured to obtain the address ID corresponding to the operand in the instruction repetition type; and an instruction generator configured to generate an instruction according to that address ID. The controller is further configured to update the instruction repetition number after the instruction generator generates the instruction, and, when the updated instruction repetition number is determined to be greater than a preset threshold, to update the address ID corresponding to the operand and send the updated address ID to the instruction generator.
  • The instruction generator is further configured to generate an instruction according to the updated address ID corresponding to the operand. The controller is also configured to update the instruction repetition number again after the instruction generator generates that instruction, and to determine whether the re-updated instruction repetition number is equal to the preset threshold; if so, it is determined that the decompression of the compressed instruction is complete.
  • The instruction decompression module includes: a controller configured to obtain the address ID corresponding to the operand in the instruction repetition type; and an instruction generator configured to generate an instruction according to that address ID. The controller is further configured to record the generation count after the instruction generator generates the instruction, and, when the generation count is determined to be less than the instruction repetition number, to update the address ID corresponding to the operand and send the updated address ID to the instruction generator. The instruction generator is further configured to generate an instruction according to the updated address ID. The controller is also configured to update the generation count after the instruction generator generates that instruction, and to determine whether the updated generation count is equal to the instruction repetition number; if so, it determines that decompression of the compressed instruction is complete and obtains the multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
  • the controller is configured to update the address ID corresponding to the operand according to the source type of the operand pointed to by the address ID corresponding to the operand.
  • the controller is configured to update the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand .
  • When the operand in the instruction repetition type is the destination operand, the key information further includes the destination pass-through (DF) field, and the controller is also configured to determine, before updating the address ID corresponding to the operand, that the value in the DF field is not a set threshold.
  • When the source type of the operand pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, the instruction decompression module further includes a configuration register. The configuration register is configured to store the address of the source operand in the LDS and, after the corresponding source operand is read from the LDS according to the current address, to automatically update its own address to the address of the next source operand. Accordingly, the controller is configured to update the address ID corresponding to the operand according to the address currently indicated by the configuration register, where the address ID corresponding to the operand is the same as the address currently indicated by the configuration register.
  • The decoder is further configured to determine that the compressed instruction is valid before obtaining the key information in the compressed instruction.
  • The instruction decompression module is further configured to, upon receiving the key information sent by the decoder, send the decoder an instruction that prevents it from obtaining instructions from the instruction distribution unit, and, when decompression of the compressed instruction is determined to be complete, send the decoder an instruction that allows it to obtain instructions from the instruction distribution unit.
  • An embodiment of the present application further provides a processor, including: an instruction distribution unit, an instruction execution unit, and the decoding circuit provided by the foregoing second aspect embodiment and/or any possible implementation manner in combination with the foregoing second aspect embodiment.
  • The instruction distribution unit and the instruction execution unit are both connected to the decoding circuit.
  • FIG. 1 shows a schematic diagram of each field in a VOP3R instruction provided by an embodiment of the present application.
  • FIG. 2 shows a schematic structural diagram of a decoding circuit provided by an embodiment of the present application.
  • FIG. 3 shows a schematic structural diagram of another decoding circuit provided by an embodiment of the present application.
  • FIG. 4 shows a schematic structural diagram of another decoding circuit provided by an embodiment of the present application.
  • FIG. 5 shows a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of a processor provided by an embodiment of the present application.
  • An instruction block can accommodate only 8 three-operand instructions, which is not enough for power optimization. Therefore, the embodiments of this application provide an efficient instruction compression method that can compress 64 3-operand instructions into 64 bits, so each cache line can store up to 512 3-operand instructions. This not only improves computing performance but also significantly reduces the number of instruction cache misses.
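  • The capacity figures quoted here follow from simple arithmetic. The sketch below uses only the numbers given in the text (a 512-bit cache line, a 64-bit instruction, and up to 64 instructions per compressed instruction); the variable names are illustrative.

```python
CACHE_LINE_BITS = 512    # size of one cache line, per the text
INSTRUCTION_BITS = 64    # size of one 3-operand instruction, per the text
MAX_REPEAT = 64          # instructions represented by one compressed instruction

# Number of 64-bit slots in one cache line (one instruction block).
slots_per_line = CACHE_LINE_BITS // INSTRUCTION_BITS

# Without compression each slot holds one instruction; with compression
# each slot can stand for up to MAX_REPEAT instructions.
uncompressed_capacity = slots_per_line
compressed_capacity = slots_per_line * MAX_REPEAT

print(uncompressed_capacity, compressed_capacity)
```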
  • VOP3R (Vector Operation with 3 Operand and Repeat): a vector operation with 3 operands and repetition.
  • The set type "110010" indicates that the instruction is a VOP3R instruction, as shown in Figure 1.
  • the VOP3R instruction defines the following special fields, as shown in Table 1.
  • Repeat_Enable: repeat enable field, 4 bits. Each bit indicates repetition of one of the source operands (Operand0, Operand1, Operand2) or the destination operand (also called Result): B[59:59] (or B[0:0]): repeat Operand0; B[60:60] (or B[1:1]): repeat Operand1; B[61:61] (or B[2:2]): repeat Operand2; B[62:62] (or B[3:3]): repeat destination.
  • VGPR: Vector General Purpose Register
  • SGPR: Scalar General Purpose Register
  • LDS_DIRECT: Local Data Share
  • An embodiment of the present application provides a decoding circuit, as shown in FIG. 2. After the decoding circuit obtains an instruction from the instruction dispatch unit (Instruction Dispatch), it determines whether the instruction is a compressed instruction.
  • If not, the decoding circuit sends the instruction directly to the instruction execution unit (Instruction Execution), which executes it. If so, that is, when the current instruction is a compressed instruction, the decoding circuit obtains the key information in the compressed instruction and then decompresses the compressed instruction according to the instruction repetition type and the instruction repetition number in the key information, expanding it into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
  • the key information includes: instruction repetition type and instruction repetition number.
  • the instruction repetition type is used to indicate the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2.
  • the instruction repeat type is obtained according to the repeat enable field (Repeat_Enable) in the compressed instruction, and the instruction repeat number is obtained according to the repeat count field (Repeat_Counter).
  • the detailed parameters of the key information are shown in Table 2.
  • the compressed instruction is:
  • Decompressing the compressed instruction yields 62 instructions that correspond to the instruction repetition type (repeat Operand0 and Operand1), equal in number to the instruction repetition number (62).
  • the obtained instructions are as follows:
  • Repeat Enable represents the instruction repetition type, where 0x3 means that the two operands Operand0 and Operand1 are repeated; RepeatCounter represents the instruction repetition number, here 62, so decompressing the compressed instruction yields 62 instructions.
  • In the above, repeating Operand0 and Operand1 is used as an example of the type of instruction to be repeated.
  • The type of instruction to be repeated can be any combination of at least one of the four operands Result (destination operand), Operand0, Operand1 and Operand2, giving 15 combinations. Different values are defined to indicate different repeat types. For example, Repeat Enable (0x1) indicates repeating Operand0, Repeat Enable (0x2) indicates repeating Operand1, and Repeat Enable (0x3) indicates repeating both Operand0 and Operand1.
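  • The mapping from a Repeat Enable value to the repeated operands can be sketched as a bitmask decode. The bit assignments follow the Repeat_Enable field description (bit 0 for Operand0 through bit 3 for the destination); the function and table names are illustrative and not taken from the patent.

```python
# Bit 0 -> Operand0, bit 1 -> Operand1, bit 2 -> Operand2, bit 3 -> Result.
REPEAT_BITS = {
    0x1: "Operand0",
    0x2: "Operand1",
    0x4: "Operand2",
    0x8: "Result",
}


def repeated_operands(repeat_enable: int) -> list[str]:
    """Return the names of the operands whose repeat bit is set."""
    if not 0 <= repeat_enable <= 0xF:
        raise ValueError("Repeat_Enable is a 4-bit field")
    return [name for mask, name in REPEAT_BITS.items() if repeat_enable & mask]


# Every non-zero value of the 4-bit field, i.e. the 15 combinations.
combinations = {value: repeated_operands(value) for value in range(1, 16)}

print(repeated_operands(0x3))
```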
  • the instruction logic of the corresponding hardware includes regular mode and repeat mode.
  • The execution logic obtains instructions from the instruction distribution unit and executes them.
  • By working with compressed instructions, the foregoing decoding circuit lets one instruction block accommodate more three-operand instructions, which not only effectively reduces the probability of instruction cache misses but also improves efficiency.
  • The process by which the decoding circuit decompresses the compressed instruction according to the instruction repetition type and the instruction repetition number may be: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition number; when the updated instruction repetition number is determined to be greater than the preset threshold, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID, and updating the instruction repetition number again; determining whether the re-updated instruction repetition number is equal to the preset threshold; if so, completing the decompression of the compressed instruction and obtaining the multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number; if not, repeating the operation (update the address ID corresponding to the operand; generate an instruction according to the updated address ID and update the instruction repetition number again; determine whether the updated instruction repetition number is equal to the preset threshold) until the updated instruction repetition number is equal to the preset threshold.
  • Operand0_id = OperandRepeat(Operand0_id, Repeat_Enable & 0x1); // address update function of Operand0
  • Operand1_id = OperandRepeat(Operand1_id, Repeat_Enable & 0x2); // address update function of Operand1
  • Operand2_id = OperandRepeat(Operand2_id, Repeat_Enable & 0x4); // address update function of Operand2
  • Result_ID = OperandRepeat(Result_ID, Repeat_Enable & 0x8); // address update function of Result
  • the preset threshold such as 0
  • The address ID corresponding to the operand is updated (address 2), an instruction is generated according to the updated address ID, and the instruction repetition number is updated again; it is then determined whether the updated instruction repetition number is equal to the preset threshold. If not, the address ID corresponding to the operand is updated again, an instruction is generated according to the updated address ID, and the instruction repetition number is updated again (it is now 60); it is then determined whether the updated instruction repetition number (60) is equal to the preset threshold, and so on.
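  • The countdown variant described in this embodiment can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: instructions are modeled as tuples of address IDs, the address update is a uniform +1 step (the text says the real rule depends on the operand source type and data type), and `operand_repeat` and `decompress_countdown` are hypothetical names.

```python
def operand_repeat(address_id: int, enabled: int, step: int = 1) -> int:
    """Advance an address ID only when its repeat bit is set.

    The +1 step is an assumption made for illustration.
    """
    return address_id + step if enabled else address_id


def decompress_countdown(op0, op1, op2, result, repeat_enable, repeat_counter,
                         threshold=0):
    """Expand one compressed instruction by counting the repetition number down."""
    if repeat_counter < 2 or threshold >= repeat_counter:
        raise ValueError("need repeat_counter >= 2 and threshold below it")
    instructions = []
    remaining = repeat_counter
    while True:
        instructions.append((result, op0, op1, op2))  # generate an instruction
        remaining -= 1                                # update the repetition number
        if remaining == threshold:                    # decompression is complete
            return instructions
        # Still above the threshold: update the enabled address IDs.
        op0 = operand_repeat(op0, repeat_enable & 0x1)
        op1 = operand_repeat(op1, repeat_enable & 0x2)
        op2 = operand_repeat(op2, repeat_enable & 0x4)
        result = operand_repeat(result, repeat_enable & 0x8)


# Repeat Operand0 and Operand1 (0x3) 62 times, starting from made-up address IDs.
expanded = decompress_countdown(op0=0, op1=100, op2=7, result=50,
                                repeat_enable=0x3, repeat_counter=62)
print(len(expanded))
```

  • With a threshold of 0, the repetition number reaches 60 after the second instruction has been generated, which matches the value mentioned in the walkthrough.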
  • The process by which the decoding circuit decompresses the compressed instruction according to the instruction repetition type and the instruction repetition number may be: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording the generation count; when the generation count is determined to be less than the instruction repetition number, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID, and updating the generation count; determining whether the updated generation count is equal to the instruction repetition number; if so, completing the decompression of the compressed instruction and obtaining the multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number; otherwise, repeating the operation (update the address ID corresponding to the operand; generate an instruction according to the updated address ID and update the generation count; determine whether the updated generation count is equal to the instruction repetition number) until the updated generation count is equal to the instruction repetition number.
  • The principle of this embodiment is the same as that of the previous embodiment. The difference is that in the first embodiment, after an instruction is generated, the instruction repetition number is updated and compared with the preset threshold (for example, 0) to determine whether decompression of the compressed instruction is complete.
  • In this embodiment, the number of generated instructions is recorded, and decompression of the compressed instruction is complete when that number equals the instruction repetition number. That is, a counter is needed to count the generated instructions: each time an instruction is generated, the counter is incremented by one.
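  • The counter-based variant can be sketched in the same spirit. As before, this is only an illustration: the tuple layout (Operand0_id, Operand1_id, Operand2_id, Result_ID), the uniform +1 step, and the function name are assumptions rather than details from the patent.

```python
def decompress_with_counter(operand_ids, repeat_enable, repeat_count, step=1):
    """Expand a compressed instruction using a generation counter.

    operand_ids is (Operand0_id, Operand1_id, Operand2_id, Result_ID);
    bit i of repeat_enable enables the update of operand_ids[i].
    """
    if repeat_count < 2:
        raise ValueError("the repetition number must be at least 2")
    ids = list(operand_ids)
    generated = []
    count = 0                               # number of instructions generated so far
    while True:
        generated.append(tuple(ids))        # generate an instruction
        count += 1                          # record the generation
        if count == repeat_count:           # all repetitions have been produced
            return generated
        for bit in range(4):                # update the enabled address IDs
            if repeat_enable & (1 << bit):
                ids[bit] += step


expanded = decompress_with_counter((0, 100, 7, 50), repeat_enable=0x3, repeat_count=62)
print(len(expanded))
```

  • Compared with the countdown form, this version leaves the repetition number untouched and pays for it with one extra counter.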
  • the operand in the above instruction repetition type may be at least one of the four operands of Result, Operand0, Operand1, Operand2.
  • When the address ID corresponding to the operand is updated, the update can be based on the operand source type pointed to by that address ID (such as VGPR/SGPR/LDS_DIRECT); the rules for updating the address ID may differ between operand source types. For example, the rule for updating the address ID when the operand source is VGPR differs from the rule when the operand source is SGPR.
  • When the operand source pointed to by the address ID corresponding to the operand is LDS_DIRECT, the rule for updating the address ID differs from the VGPR/SGPR case. In this mode, the hardware reads data from the LDS as the operand, and the access address and data type are determined by a configuration register, such as the M0 register (a 32-bit dedicated hardware internal register whose low 16 bits are used as the address by LDS_DIRECT).
  • the address field of the M0 register needs to be automatically updated.
  • The address pointed to by the address ID is the address stored in the M0 register, and that address is used to read the source operand stored in the LDS. That is, the M0 register is configured to store the address of the source operand in the LDS (such as an element of each row in a matrix), and after the corresponding element is read from the LDS according to the current address, the address in the M0 register needs to be updated to the address of the next element.
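  • The read-then-advance behavior of the M0 address field can be sketched as below. Only the 32-bit width and the use of the low 16 bits as the LDS address come from the text; the class, the byte-array model of the LDS, the caller-supplied stride, and the wrap-around at 16 bits are assumptions made for illustration.

```python
ADDRESS_MASK = 0xFFFF  # low 16 bits of M0 hold the LDS address, per the text


class M0Register:
    """Hypothetical model of the 32-bit M0 configuration register."""

    def __init__(self, value: int):
        self.value = value & 0xFFFFFFFF

    @property
    def address(self) -> int:
        return self.value & ADDRESS_MASK

    def advance(self, stride: int) -> None:
        """Move the address field to the next element, keeping the upper 16 bits.

        Wrapping at 16 bits is an assumption; the text does not describe overflow.
        """
        new_address = (self.address + stride) & ADDRESS_MASK
        self.value = (self.value & ~ADDRESS_MASK & 0xFFFFFFFF) | new_address


def read_operands(lds: bytes, m0: M0Register, count: int, stride: int) -> list[bytes]:
    """Read `count` elements of `stride` bytes from the LDS, updating M0 after each read."""
    operands = []
    for _ in range(count):
        operands.append(lds[m0.address:m0.address + stride])
        m0.advance(stride)
    return operands


lds = bytes(range(16))
m0 = M0Register(0xABCD0004)
print(read_operands(lds, m0, count=3, stride=4), hex(m0.value))
```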
  • The address ID corresponding to the operand can also be updated according to the data type of the data stored in the operand source pointed to by that address ID. Different data types correspond to different address update rules, for example:
  • Address_{i+1} = Address_i + 0x1; // The data type is unsigned byte;
  • Address_{i+1} = Address_i + 0x2; // The data type is unsigned short;
  • Address_{i+1} = Address_i + 0x4; // The data type is DWord;
  • Address_{i+1} = Address_i + 0x0; // The data type is Default (Reserved);
  • Address_{i+1} = Address_i + 0x1; // The data type is signed byte;
  • Address_{i+1} = Address_i + 0x2; // The data type is signed short;
  • Address_{i+1} = Address_i + 0x8; // The data type is Qword;
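  • These rules amount to a lookup from data type to address stride. The sketch below encodes the strides listed above; the string keys and function names are illustrative, and the 2-byte unsigned entry is treated as an unsigned short, which is an assumption based on its stride.

```python
# Stride in bytes added to the address for each data type, per the list above.
STRIDE_BY_TYPE = {
    "unsigned byte": 0x1,
    "unsigned short": 0x2,   # assumed name for the 2-byte unsigned entry
    "dword": 0x4,
    "default": 0x0,          # Default (Reserved): the address does not move
    "signed byte": 0x1,
    "signed short": 0x2,
    "qword": 0x8,
}


def next_address(address: int, data_type: str) -> int:
    """Return Address_{i+1} given Address_i and the data type."""
    return address + STRIDE_BY_TYPE[data_type]


def address_sequence(start: int, data_type: str, count: int) -> list[int]:
    """Return the first `count` addresses visited for one data type."""
    addresses = [start]
    for _ in range(count - 1):
        addresses.append(next_address(addresses[-1], data_type))
    return addresses


print(address_sequence(0x100, "dword", 4))
```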
  • the destination pass-through DF field can be used to determine whether the source type of the operand pointed to by the address ID corresponding to the destination operand is a temporary register for data pass-through.
  • When the DF field indicates pass-through, the Result_ID is forwarding (pass-through).
  • In that case the address does not need to be updated; forwarding is simply kept.
  • Each instruction is generated based on the default Result_ID in the compressed instruction.
  • The Result_ID in every generated instruction is therefore the same.
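  • The effect of the DF field on the destination address can be sketched as a guard in front of the Result_ID update. The value that marks pass-through is not given in the text, so `passthrough_value` below is a placeholder, and the +1 step and the function name are likewise assumptions.

```python
def next_result_id(result_id: int, repeat_enable: int, df_value: int,
                   passthrough_value: int = 1, step: int = 1) -> int:
    """Return the Result_ID to use for the next generated instruction.

    passthrough_value stands in for the "set threshold" of the DF field;
    its real value is not stated in the text, so 1 is only a placeholder.
    """
    if df_value == passthrough_value:
        return result_id            # forwarding: keep the default Result_ID
    if repeat_enable & 0x8:
        return result_id + step     # destination repeat enabled: advance
    return result_id                # destination is not repeated


print(next_result_id(10, repeat_enable=0x8, df_value=1))
print(next_result_id(10, repeat_enable=0x8, df_value=0))
```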
  • The decoding circuit can also determine whether the compressed instruction is valid before obtaining the key information in the compressed instruction. Only after the compressed instruction is determined to be valid does it obtain the key information and decompress the compressed instruction according to the instruction repetition type and the instruction repetition number in the key information.
  • Whether the compressed instruction is valid can be determined from the repeat enable fields in the compressed instruction that characterize the source operands, or from the repeat enable field that characterizes the destination operand. The compressed instruction is valid if at least one of the following is true:
  • the repeat enable field of at least one source operand is not zero and the corresponding address ID points to a specified operand source type (such as VGPR/SGPR/LDS_Direct); or the repeat enable field of the destination operand is not zero.
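  • The validity rule can be sketched as a small predicate. The bit layout follows the Repeat_Enable description (bits 0 to 2 for the source operands, bit 3 for the destination); the string encoding of source types, the function name, and the "OTHER" placeholder used in the example are assumptions.

```python
SPECIFIED_SOURCE_TYPES = {"VGPR", "SGPR", "LDS_DIRECT"}


def is_valid_compressed(repeat_enable: int, source_types: tuple[str, str, str]) -> bool:
    """Check the validity rule for a compressed instruction.

    source_types gives the operand source type that the address IDs of
    Operand0, Operand1 and Operand2 point to.
    """
    if repeat_enable & 0x8:              # destination repeat bit is set
        return True
    for bit, source_type in enumerate(source_types):
        enabled = repeat_enable & (1 << bit)
        if enabled and source_type in SPECIFIED_SOURCE_TYPES:
            return True
    return False


# "OTHER" is a made-up placeholder for a source type outside the specified set.
print(is_valid_compressed(0x3, ("VGPR", "SGPR", "OTHER")))
print(is_valid_compressed(0x4, ("VGPR", "SGPR", "OTHER")))
```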
  • The decoding circuit includes a repeat decoder and an instruction decompression module, and the decoder is connected to the instruction decompression module.
  • the decoder is configured to determine whether the acquired instruction is a compressed instruction, and if it is not, it sends the instruction to the instruction execution unit to execute the instruction, and if it is, it acquires key information in the compressed instruction.
  • the decoder is further configured to determine that the compressed instruction is valid before obtaining key information in the compressed instruction.
  • The decoder is configured to determine that the compressed instruction is valid as follows: validity is determined from the repeat enable field in the compressed instruction that characterizes the source operand, or from the repeat enable field that characterizes the destination operand. When the repeat enable field characterizing the source operand is not zero and the address ID corresponding to the source operand points to a specified operand source type, or when the repeat enable field characterizing the destination operand is not zero, the compressed instruction is valid.
  • the instruction decompression module is configured to decompress the compressed instruction according to the instruction repetition type and the number of instruction repetitions, so as to decompress the compressed instruction into multiple instructions corresponding to the instruction repetition type and the same number of instruction repetitions.
  • the instruction decompression module is also configured to, upon receiving the key information sent by the decoder, send a signal to the decoder that prevents it from obtaining instructions from the instruction distribution unit, and, upon completing decompression of the compressed instruction, send a signal to the decoder that allows it to obtain instructions from the instruction distribution unit again.
  • the instruction decompression module includes: a controller and an instruction generator.
  • the controller is respectively connected with the instruction generator and the decoder.
  • the controller is configured to obtain the address ID corresponding to the operand in the instruction repetition type; the instruction generator is configured to generate an instruction according to that address ID; the controller is also configured to update the instruction repetition number after the instruction generator generates the instruction, and, when the updated instruction repetition number is determined to be greater than a preset threshold, to update the address ID corresponding to the operand and send the updated address ID to the instruction generator; the instruction generator is also configured to generate an instruction according to the updated address ID; the controller is also configured to update the instruction repetition number again after the instruction generator generates the instruction according to the updated address ID, and to determine whether the re-updated instruction repetition number is equal to the preset threshold; if yes, decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
  • the controller is configured to obtain the address ID corresponding to the operand in the instruction repetition type; the instruction generator is configured to generate an instruction according to that address ID; the controller is also configured to record the number of generated instructions after the instruction generator generates the instruction, and, when the generation count is determined to be less than the instruction repetition number, to update the address ID corresponding to the operand and send the updated address ID to the instruction generator; the instruction generator is also configured to generate an instruction according to the updated address ID; the controller is also configured to update the generation count after the instruction generator generates the instruction according to the updated address ID, and to determine whether the updated generation count is equal to the instruction repetition number; if yes, decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
  • when the controller updates the address ID corresponding to the operand, it is further configured to update the address ID according to the source type of the operand pointed to by that address ID.
  • when the controller updates the address ID corresponding to the operand, it is also configured to update the address ID according to the data type of the data stored in the operand source pointed to by that address ID.
  • the operand in the instruction repetition type is the destination operand, and the key information also includes the destination pass-through (DF) field. The controller is also configured to determine, before updating the address ID corresponding to the operand, that the value in the destination pass-through DF field is not the set threshold (such as 1).
  • when the operand source type pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, the instruction decompression module also includes a configuration register, which is configured to store the address used to fetch the source operand from the LDS and, after the corresponding source operand has been read from the LDS at the current address, to automatically update its own address to the address of the next source operand.
  • when the controller updates the address ID corresponding to the operand, it is also configured to update the address ID according to the address currently indicated by the configuration register, so that the address ID is the same as the address currently indicated by the configuration register.
  • the instruction decompression module includes: a controller, a configuration register (M0 register), and an instruction generator. The controller is respectively connected with the decoder, the instruction generator and the configuration register.
  • instructions are compressed through VOP3R, so that each cache line (512bit) can accommodate 512 3-operand instructions, which not only effectively reduces the probability of instruction cache misses, but also optimizes efficiency.
  • the following describes the method provided in the embodiment of the present application using matrix multiplication as an example.
  • a 64×64 matrix is taken as an example: $C_{64\times 64} = A_{64\times 64} \times B_{64\times 64}$, where the 64×64 matrix size is only an example and is not a limitation.
  • each arithmetic operation unit has a 200x64bit VGPR space.
  • A(0,0) → LDS(Address0); //A(0,0) is stored at Address0 of the LDS;
  • A(0,1) → LDS(Address1); //A(0,1) is stored at Address1 of the LDS;
  • A(0,2) → LDS(Address2); //A(0,2) is stored at Address2 of the LDS;
  • Matrix B is loaded into the VGPR space, as shown in Table 4.
  • different VGPRs store different rows.
  • the elements of matrix A are loaded one by one into the 64 ALUs in parallel and multiplied by the corresponding elements of the columns stored in each of the 64 vector general-purpose registers; the 64 ALUs accumulate, in parallel and in sequence, the products of the elements in the same row of matrix A and the corresponding elements of matrix B to obtain all elements in the same row of matrix C, thereby completing the multiplication of matrix A and matrix B.
  • the instruction to calculate matrix C in normal mode is as follows:
  • M0_register = start_address; //Initial address of the M0 register. The M0 register is configured to store the address of the element of matrix A to be read; after the 64 ALUs read the corresponding element of matrix A from the LDS in parallel based on the current address in the M0 register, the register is automatically updated to the address of the next element.
  • Block_Start::Forwarding = LDS_Direct(M0_register)*B(0,ALU_Index);
  • Block_End::C(0,ALU_Index) = LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
  • Block_Start::Forwarding = LDS_Direct(M0_register)*B(0,ALU_Index);
  • Block_End::C(1,ALU_Index) = LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
  • Block_Start::Forwarding = LDS_Direct(M0_register)*B(0,ALU_Index);
  • Block_End::C(63,ALU_Index) = LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
  • Block_Start::Forwarding = LDS_Direct(M0_register)*B(0,ALU_Index);
  • Block_End::C(0,ALU_Index) = LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
  • Block_Start::Forwarding = LDS_Direct(M0_register)*B(0,ALU_Index);
  • Block_End::C(1,ALU_Index) = LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
  • Block_Start::Forwarding = LDS_Direct(M0_register)*B(0,ALU_Index);
  • Block_End::C(63,ALU_Index) = LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
  • FIG. 5 shows a data processing method provided by an embodiment of this application. The steps involved are described below in conjunction with FIG. 5.
  • Step S101: Determine whether the acquired instruction is a compressed instruction.
  • if yes, execute step S102; if no, send the acquired instruction to the instruction execution unit.
  • Step S102: Acquire key information in the compressed instruction, where the key information includes: instruction repetition type and instruction repetition number.
  • the instruction repetition type is used to indicate the instruction type to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2.
  • Step S103: Decompress the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
  • before obtaining the key information in the compressed instruction, the method further includes: determining that the compressed instruction is valid.
  • the process of decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number may be: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and update the instruction repetition number; when the updated instruction repetition number is determined to be greater than a preset threshold, update the address ID corresponding to the operand; generate an instruction according to the updated address ID, and update the instruction repetition number again; determine whether the re-updated instruction repetition number is equal to the preset threshold; if yes, determine that decompression of the compressed instruction is complete, obtaining multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
  • the process of decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number may alternatively be: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and record the number of generated instructions; when the generation count is determined to be less than the instruction repetition number, update the address ID corresponding to the operand; generate an instruction according to the updated address ID, and update the generation count; determine whether the updated generation count is equal to the instruction repetition number; if yes, determine that decompression of the compressed instruction is complete, obtaining multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
  • the process of updating the address ID corresponding to the operand may be: updating the address ID corresponding to the operand according to the source type of the operand pointed to by the address ID corresponding to the operand.
  • the process of updating the address ID corresponding to the operand may also be: updating the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by that address ID.
  • the operand in the instruction repetition type is the destination operand, and the key information further includes the destination pass-through (DF) field. Before updating the address ID corresponding to the operand, the method also includes: determining that the value in the destination pass-through DF field is not a set threshold.
  • the embodiment of the present application also provides a processor, as shown in FIG. 6.
  • the processor includes the decoding circuit of any of the foregoing embodiments, an instruction execution unit, and an instruction distribution unit. Both the instruction distribution unit and the instruction execution unit are connected to the decoding circuit.
  • the instruction distribution unit is configured to store instructions so that the decoding circuit can obtain instructions from the instruction distribution unit.
  • the instruction execution unit is configured to execute instructions issued by the decoding circuit.
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • the foregoing processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), a graphics processing unit (GPU), etc.; a general-purpose processor may be a microprocessor, or the processor may also be any conventional processor or the like.
  • the data processing method, decoding circuit, and processor provided in this application determine whether the acquired instruction is a compressed instruction; if yes, they acquire key information in the compressed instruction, the key information including the instruction repetition type and the instruction repetition number, where the instruction repetition type indicates the type of instruction to be repeated and the instruction repetition number is a positive integer greater than or equal to 2; the compressed instruction is then decompressed according to the instruction repetition type and the instruction repetition number into multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
  • one instruction block can accommodate more three-operand instructions, which not only effectively reduces the probability of instruction cache misses, but also optimizes efficiency.
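The matrix-multiplication example earlier in this list can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware described in the application: the function name `matmul_lanes` is hypothetical, each list index `j` stands in for one ALU lane (`ALU_Index`), the list `lds` stands in for the LDS, the integer `m0` stands in for the auto-incrementing M0 register, and square matrices are assumed.

```python
from typing import List

Matrix = List[List[int]]

def matmul_lanes(A: Matrix, B: Matrix) -> Matrix:
    """Model of C = A * B where each lane j accumulates one column of C."""
    n = len(A)
    # Matrix A is stored element by element in the LDS (row-major, assumed).
    lds = [A[i][k] for i in range(n) for k in range(n)]
    m0 = 0  # stands in for the M0 register holding the next LDS address
    C: Matrix = []
    for _ in range(n):                  # one Block_Start..Block_End per row of C
        forwarding = [0] * n            # per-lane running sum ("Forwarding")
        for k in range(n):
            a = lds[m0]                 # LDS_Direct(M0_register): broadcast to all lanes
            m0 += 1                     # M0 advances to the next element after the read
            # Lane j multiplies by B(k, j), held in the VGPR for row k of B.
            forwarding = [forwarding[j] + a * B[k][j] for j in range(n)]
        C.append(forwarding)            # Block_End writes one row of C
    return C

print(matmul_lanes([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each inner-loop iteration corresponds to one three-operand instruction in the normal-mode listing, which is why a repetition-based compressed encoding shortens the instruction stream so much for this workload.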


Abstract

A data processing method, a decoding circuit, and a processor, which belong to the technical field of computers. The method comprises: determining whether an acquired instruction is a compressed instruction (S101); when the acquired instruction is a compressed instruction, acquiring key information in the compressed instruction, the key information comprising an instruction repetition type and the number of times the instruction is repeated (S102), wherein the instruction repetition type is used to indicate an instruction type to be repeated, and the number of times the instruction is repeated is a positive integer greater than or equal to 2; and decompressing the compressed instruction according to the instruction repetition type and the number of times the instruction is repeated so as to decompress the compressed instruction into multiple instructions corresponding to the instruction repetition type and being the same number as the number of times the instruction is repeated (S103). By compressing an instruction, one instruction block can accommodate more three operand instructions, which effectively reduces the probability of instruction cache miss, while also optimizing efficiency.

Description

Data processing method, decoding circuit, and processor
Cross-reference to related applications
This application claims priority to Chinese patent application No. 2019113025118, titled "A data processing method, decoding circuit and processor", filed with the Chinese Patent Office on December 16, 2019, the entire content of which is incorporated herein by reference.
Technical field
This application belongs to the field of computer technology, and specifically relates to a data processing method, a decoding circuit, and a processor.
Background
Computer instructions are the directions and commands that tell a machine what to do. A program is a series of instructions arranged in a certain order, and executing the program is the working process of the computer. When a computer executes an instruction (program), it first reads the instruction from the instruction cache. An instruction cache miss causes serious performance problems: fetching the instruction takes a long time, which significantly lengthens the processing of an instruction sequence and reduces performance. While the miss is being serviced, the current instruction sequence is stopped and waiting; if there are not enough active instruction sequences, the entire computing unit may stall, which significantly reduces performance.
An instruction block is the set of instructions within one cache line. Since each cache line is only 512 bits and a three-operand operation instruction uses 64 bits, each cache line can store only 8 such instructions, so an instruction block can hold only 8 three-operand instructions. Processing a large operation therefore requires reading thousands of instruction blocks, which is clearly unfavorable for power optimization.
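The arithmetic behind the 8-instructions-per-block figure can be checked with a short sketch. The function names are hypothetical, and the workload of 64 × 64 = 4096 instructions is an illustrative assumption (one multiply-accumulate instruction per element of A in a 64×64 matrix multiplication), not a figure stated in this section.

```python
import math

CACHE_LINE_BITS = 512    # cache line size stated in the text
INSTRUCTION_BITS = 64    # size of one three-operand instruction, per the text

def instructions_per_block(line_bits: int = CACHE_LINE_BITS,
                           instr_bits: int = INSTRUCTION_BITS) -> int:
    """Number of uncompressed instructions that fit in one cache line."""
    return line_bits // instr_bits

def blocks_needed(total_instructions: int) -> int:
    """Number of instruction blocks that must be fetched without compression."""
    return math.ceil(total_instructions / instructions_per_block())

print(instructions_per_block())  # 8
print(blocks_needed(64 * 64))    # 512
```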
Summary of the invention
The embodiments of this application are implemented as follows:
In a first aspect, an embodiment of the present application provides a data processing method, including: determining whether an acquired instruction is a compressed instruction; if yes, acquiring key information in the compressed instruction, the key information including an instruction repetition type and an instruction repetition number, where the instruction repetition type indicates the type of instruction to be repeated and the instruction repetition number is a positive integer greater than or equal to 2; and decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number. In the embodiment of the present application, when the acquired instruction is a compressed instruction, the key information in the compressed instruction is acquired, and the compressed instruction is then decompressed according to the instruction repetition type and the instruction repetition number in the key information into multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
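The three steps of the first aspect can be sketched as below. This is a minimal model under stated assumptions: the `Instruction` fields, the opcode strings, the use of `repeat=None` to mark an uncompressed instruction, and the rule that the source address ID advances by one per generated instruction are all hypothetical choices made for illustration; the application does not fix an encoding here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Instruction:
    opcode: str                   # instruction type to repeat (hypothetical names)
    src_id: int                   # address ID of a source operand
    dst_id: int                   # address ID of the destination operand
    repeat: Optional[int] = None  # None marks an uncompressed instruction (assumption)

def process(instr: Instruction) -> List[Instruction]:
    # Determine whether the acquired instruction is compressed.
    if instr.repeat is None:
        return [instr]            # pass through to the execution unit unchanged
    # Acquire the key information: repetition type and repetition number.
    if instr.repeat < 2:
        raise ValueError("instruction repetition number must be >= 2")
    # Decompress into `repeat` instructions of the repeated type.
    # Advancing src_id by 1 each time is an illustrative assumption.
    return [Instruction(instr.opcode, instr.src_id + i, instr.dst_id)
            for i in range(instr.repeat)]

for ins in process(Instruction("v_mac", src_id=4, dst_id=0, repeat=3)):
    print(ins.opcode, ins.src_id, ins.dst_id)
```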
With reference to a possible implementation of the first aspect, decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number includes: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition number; when the updated instruction repetition number is determined to be greater than a preset threshold, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID, and updating the instruction repetition number again; determining whether the re-updated instruction repetition number is equal to the preset threshold; and if yes, determining that decompression of the compressed instruction is complete, obtaining multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number. In the embodiment of the present application, each time an instruction is generated during decompression, the instruction repetition number is updated and compared with the preset threshold. If they are not equal, the address ID corresponding to the operand is updated, an instruction is generated according to the updated address ID, and the instruction repetition number is updated and compared again; this continues until the updated instruction repetition number equals the preset threshold, at which point decompression of the compressed instruction is complete.
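The countdown variant above can be sketched as a loop that decrements the repetition number after each generated instruction. The function name, the tuple representation of an instruction, a threshold of 0, and an address step of 1 are assumptions for illustration only.

```python
from typing import List, Tuple

def decompress_countdown(opcode: str, addr_id: int, repeat_count: int,
                         threshold: int = 0, step: int = 1) -> List[Tuple[str, int]]:
    """Generate instructions until the remaining count reaches the threshold."""
    if repeat_count < 2:
        raise ValueError("instruction repetition number must be >= 2")
    generated: List[Tuple[str, int]] = []
    remaining = repeat_count
    while True:
        generated.append((opcode, addr_id))  # generate from the current address ID
        remaining -= 1                       # update the repetition number
        if remaining > threshold:
            addr_id += step                  # update the operand's address ID
        else:
            break                            # reached the threshold: done
    return generated

print(decompress_countdown("v_mac", 8, 3))
# [('v_mac', 8), ('v_mac', 9), ('v_mac', 10)]
```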
With reference to a possible implementation of the first aspect, decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number includes: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording the number of generated instructions; when the generation count is determined to be less than the instruction repetition number, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID, and updating the generation count; determining whether the updated generation count is equal to the instruction repetition number; and if yes, determining that decompression of the compressed instruction is complete, obtaining multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number. In the embodiment of the present application, each time an instruction is generated during decompression, the generation count is recorded and compared with the instruction repetition number. If they are not equal, the address ID corresponding to the operand is updated, an instruction is generated according to the updated address ID, and the generation count is updated and compared again; this continues until the updated generation count equals the instruction repetition number, at which point decompression is complete. In this process a counter records the number of generated instructions and is updated after every generated instruction.
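The counter-based variant can be sketched as a generator that counts up instead of down. As before, the function name, the tuple representation, and the address step of 1 are illustrative assumptions.

```python
from typing import Iterator, Tuple

def decompress_counting(opcode: str, addr_id: int, repeat_count: int,
                        step: int = 1) -> Iterator[Tuple[str, int]]:
    """Yield instructions while a counter tracks how many were generated."""
    if repeat_count < 2:
        raise ValueError("instruction repetition number must be >= 2")
    generated = 0                       # counter of generated instructions
    while True:
        yield (opcode, addr_id)         # generate from the current address ID
        generated += 1                  # record / update the generation count
        if generated == repeat_count:
            return                      # count reached: decompression complete
        addr_id += step                 # count still smaller: update address ID

print(list(decompress_counting("v_mac", 8, 4)))
# [('v_mac', 8), ('v_mac', 9), ('v_mac', 10), ('v_mac', 11)]
```

Unlike the countdown variant, this form leaves the repetition number from the key information untouched and compares a separate counter against it.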
With reference to a possible implementation of the first aspect, updating the address ID corresponding to the operand includes: updating the address ID according to the source type of the operand pointed to by that address ID. In the embodiment of the present application, because the address ID is updated according to the operand source type it points to, different operand source types can follow different rules when the address ID is updated.
With reference to a possible implementation of the first aspect, updating the address ID corresponding to the operand includes: updating the address ID according to the data type of the data stored in the operand source pointed to by that address ID. In the embodiment of the present application, because the address ID is updated according to the data type of the stored data, different data types can correspond to different update rules.
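A sketch of how the two update rules above might combine is shown below. The concrete rules are assumptions, since the text only says that rules may differ by source type and data type: here register sources (VGPR/SGPR) advance by the number of 32-bit registers one element occupies, and an LDS source takes its next address from the configuration register, as the application describes for the LDS case. The names `Source`, `REGISTER_BITS`, and `next_address_id` are hypothetical.

```python
from enum import Enum
from typing import Optional

class Source(Enum):
    VGPR = "VGPR"
    SGPR = "SGPR"
    LDS_DIRECT = "LDS_Direct"

REGISTER_BITS = 32  # assumed width of one general-purpose register

def next_address_id(addr_id: int, source: Source, data_bits: int = 32,
                    m0_address: Optional[int] = None) -> int:
    """Return the updated address ID for the next generated instruction."""
    if source is Source.LDS_DIRECT:
        # For LDS, the address ID follows the configuration register.
        if m0_address is None:
            raise ValueError("LDS source needs the configuration register address")
        return m0_address
    # Register sources: stride depends on the data type's width (assumption).
    return addr_id + max(1, data_bits // REGISTER_BITS)

print(next_address_id(10, Source.VGPR))                       # 11
print(next_address_id(10, Source.VGPR, data_bits=64))         # 12
print(next_address_id(0, Source.LDS_DIRECT, m0_address=40))   # 40
```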
With reference to a possible implementation of the first aspect, the operand in the instruction repetition type is the destination operand, and the key information further includes a destination pass-through (DF) field; before updating the address ID corresponding to the operand, the method further includes: determining that the value in the destination pass-through DF field is not a set threshold.
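The DF check can be sketched as a guard on the destination update. The constant `DF_HOLD = 1` follows the "such as 1" example given elsewhere in the application for the set threshold; the function name and the step of 1 are assumptions.

```python
DF_HOLD = 1  # example "set threshold" value; the actual value is implementation-defined

def next_destination_id(dst_id: int, df: int, step: int = 1) -> int:
    """Advance the destination address ID only when DF is not the set threshold."""
    if df == DF_HOLD:
        return dst_id        # pass-through: destination address ID is left unchanged
    return dst_id + step     # otherwise the destination address ID is updated

print(next_destination_id(5, df=0))  # 6
print(next_destination_id(5, df=1))  # 5
```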
With reference to a possible implementation of the first aspect, before acquiring the key information in the compressed instruction, the method further includes: determining that the compressed instruction is valid.
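The validity rule stated elsewhere in the application (a source operand with a non-zero repetition enable field whose address ID points to a specified source type, or a destination operand with a non-zero repetition enable field) can be sketched as follows. The `SourceField` structure and the string labels for source types are hypothetical; "Constant" in the usage lines is an invented label for a source type outside the specified set.

```python
from dataclasses import dataclass
from typing import Sequence

# Source types named in the application as examples of "specified" types.
ALLOWED_SOURCE_TYPES = {"VGPR", "SGPR", "LDS_Direct"}

@dataclass
class SourceField:
    repeat_enable: int   # repetition enable field of this source operand
    source_type: str     # source type the operand's address ID points to

def is_valid(sources: Sequence[SourceField], dst_repeat_enable: int) -> bool:
    """A compressed instruction is valid if any source or the destination qualifies."""
    source_ok = any(s.repeat_enable != 0 and s.source_type in ALLOWED_SOURCE_TYPES
                    for s in sources)
    return source_ok or dst_repeat_enable != 0

print(is_valid([SourceField(1, "VGPR")], dst_repeat_enable=0))      # True
print(is_valid([SourceField(1, "Constant")], dst_repeat_enable=0))  # False
```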
In a second aspect, an embodiment of the present application also provides a decoding circuit, including a decoder and an instruction decompression module. The decoder is configured to determine whether an acquired instruction is a compressed instruction and, if yes, to acquire key information in the compressed instruction, the key information including an instruction repetition type and an instruction repetition number, where the instruction repetition type indicates the type of instruction to be repeated and the instruction repetition number is a positive integer greater than or equal to 2. The instruction decompression module is configured to decompress the compressed instruction according to the instruction repetition type and the instruction repetition number into multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
With reference to a possible implementation of the second aspect, the instruction decompression module includes: a controller, configured to obtain the address ID corresponding to the operand in the instruction repetition type; and an instruction generator, configured to generate an instruction according to that address ID. The controller is further configured to update the instruction repetition number after the instruction generator generates the instruction, and, when the updated instruction repetition number is determined to be greater than a preset threshold, to update the address ID corresponding to the operand and send the updated address ID to the instruction generator. The instruction generator is further configured to generate an instruction according to the updated address ID. The controller is further configured to update the instruction repetition number again after the instruction generator generates the instruction according to the updated address ID, and to determine whether the re-updated instruction repetition number is equal to the preset threshold; if yes, it determines that decompression of the compressed instruction is complete, obtaining multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
With reference to a possible implementation of the second aspect, the instruction decompression module includes: a controller, configured to obtain the address ID corresponding to the operand in the instruction repetition type; and an instruction generator, configured to generate an instruction according to that address ID. The controller is further configured to record the number of generated instructions after the instruction generator generates the instruction, and, when the generation count is determined to be less than the instruction repetition number, to update the address ID corresponding to the operand and send the updated address ID to the instruction generator. The instruction generator is further configured to generate an instruction according to the updated address ID. The controller is further configured to update the generation count after the instruction generator generates the instruction according to the updated address ID, and to determine whether the updated generation count is equal to the instruction repetition number; if yes, it determines that decompression of the compressed instruction is complete, obtaining multiple instructions that correspond to the instruction repetition type and whose quantity equals the instruction repetition number.
With reference to a possible implementation of the second-aspect embodiment, the controller is configured to update the address ID corresponding to the operand according to the operand source type pointed to by that address ID.
With reference to a possible implementation of the second-aspect embodiment, the controller is configured to update the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by that address ID.
With reference to a possible implementation of the second-aspect embodiment, the operand in the instruction repetition type is a destination operand, and the key information further includes a destination forwarding (DF) field; the controller is further configured to determine, before updating the address ID corresponding to the operand, that the value in the DF field is not a set threshold.
With reference to a possible implementation of the second-aspect embodiment, the operand source type pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, and the instruction decompression module further includes a configuration register. The configuration register is configured to store the address used to fetch a source operand from the LDS and, after the corresponding source operand has been read from the LDS at the current address, to automatically update its address to the address of the next source operand. Accordingly, the controller is configured to update the address ID corresponding to the operand according to the address currently indicated by the configuration register, where the address ID corresponding to the operand is the same as the address currently indicated by the configuration register.
With reference to a possible implementation of the second-aspect embodiment, the decoder is further configured to determine that the compressed instruction is valid before obtaining the key information in the compressed instruction.
With reference to a possible implementation of the second-aspect embodiment, the instruction decompression module is further configured to, upon receiving the key information sent by the decoder, send the decoder an indication that blocks it from fetching instructions from the instruction dispatch unit, and, upon determining that decompression of the compressed instruction is complete, send the decoder an indication that allows it to fetch instructions from the instruction dispatch unit.
In a third aspect, an embodiment of the present application further provides a processor including an instruction dispatch unit, an instruction execution unit, and the decoding circuit provided by the second-aspect embodiment and/or any possible implementation of the second-aspect embodiment, where the instruction dispatch unit and the instruction execution unit are both connected to the decoding circuit.
Other features and advantages of the present application will be set forth in the description that follows, and will in part become apparent from the description or be understood by practicing the embodiments of the present application. The objectives and other advantages of the present application can be realized and obtained through the structures particularly pointed out in the written description and the drawings.
Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort. The above and other objectives, features, and advantages of the present application will become clearer from the drawings. The same reference numerals denote the same parts throughout the drawings. The drawings are not deliberately drawn to scale; the emphasis is on illustrating the gist of the present application.
Fig. 1 is a schematic diagram of the fields of a VOP3R instruction provided by an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a decoding circuit provided by an embodiment of the present application.
Fig. 3 is a schematic structural diagram of another decoding circuit provided by an embodiment of the present application.
Fig. 4 is a schematic structural diagram of yet another decoding circuit provided by an embodiment of the present application.
Fig. 5 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a processor provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
It should be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In the description of the present application, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Furthermore, the term "and/or" in the present application merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone.
At present, one cache line can store only 8 three-operand arithmetic instructions, so an instruction block can hold only 8 three-operand instructions if instruction cache misses are to be avoided, which is far from sufficient for power optimization. The embodiments of the present application therefore provide an efficient instruction compression method that allows 64 three-operand instructions to be compressed into 64 bits, so that each cache line can store up to 512 three-operand instructions. This not only improves computing performance but can also significantly reduce instruction cache misses.
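The capacity figures above can be checked with simple arithmetic. The sketch below assumes, beyond what the text states explicitly, that an uncompressed three-operand instruction and a compressed instruction word each occupy 64 bits, so that a cache line holding 8 instructions is 512 bits wide.

```latex
\begin{aligned}
\text{cache line width} &= 8 \times 64\ \text{bit} = 512\ \text{bit} \\
\text{compressed words per cache line} &= 512\ \text{bit} / 64\ \text{bit} = 8 \\
\text{instructions represented per cache line} &\le 8 \times 64 = 512
\end{aligned}
```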
To support compressing 64 three-operand instructions into 64 bits, the present application introduces a VOP3R (Vector Operation with 3 Operand and Repeat) instruction, whose type encoding is set to "110010"; that is, 110010 indicates that the instruction is a VOP3R instruction, as shown in Fig. 1. The VOP3R instruction defines the special fields listed in Table 1.
Table 1
Figure PCTCN2020114004-appb-000001
Figure PCTCN2020114004-appb-000002
It should be noted that the bit width of each field in Table 1 is relatively fixed, while its position may vary. For example, Repeat_Enable need not occupy bits [62:59]; it may instead occupy bits [3:0], and the same applies to the other fields.
Repeat_Enable is the repeat enable field (4 bits). Each bit indicates whether one of the source operands (Operand0, Operand1, Operand2) or the destination operand (also called the result, Result) is repeated: B[59:59] (or B[0:0]): repeat Operand0; B[60:60] (or B[1:1]): repeat Operand1; B[61:61] (or B[2:2]): repeat Operand2; B[62:62] (or B[3:3]): repeat destination. Note that repetition applies only to source operands that come from a vector general purpose register (VGPR), a scalar general purpose register (SGPR), or the local data share (LDS_DIRECT), and to destination operands that come from a VGPR or SGPR; all other cases are ignored.
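The bit assignments of Repeat_Enable can be illustrated with a small C sketch. It uses the low-bit numbering (B[0:0] to B[3:3]) given as an alternative in the text; the enum names, the operand_source_t type, and both function names are hypothetical and are not taken from the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One mask per bit of the 4-bit Repeat_Enable field (bit 0 = Operand0 ... bit 3 = Result). */
enum {
    REPEAT_OPERAND0 = 0x1,
    REPEAT_OPERAND1 = 0x2,
    REPEAT_OPERAND2 = 0x4,
    REPEAT_RESULT   = 0x8
};

/* Hypothetical classification of where an operand comes from. */
typedef enum { SRC_VGPR, SRC_SGPR, SRC_LDS_DIRECT, SRC_OTHER } operand_source_t;

/* A source operand repeats only if its enable bit is set and it comes
 * from VGPR, SGPR or LDS_DIRECT; every other case is ignored. */
bool source_operand_repeats(uint8_t repeat_enable, uint8_t mask, operand_source_t src)
{
    assert(repeat_enable <= 0xF); /* the field is 4 bits wide */
    if ((repeat_enable & mask) == 0) {
        return false;
    }
    return src == SRC_VGPR || src == SRC_SGPR || src == SRC_LDS_DIRECT;
}

/* The destination repeats only if bit 3 is set and it is a VGPR or SGPR. */
bool result_repeats(uint8_t repeat_enable, operand_source_t dst)
{
    assert(repeat_enable <= 0xF);
    if ((repeat_enable & REPEAT_RESULT) == 0) {
        return false;
    }
    return dst == SRC_VGPR || dst == SRC_SGPR;
}
```

With Repeat_Enable set to 0x3, for instance, Operand0 and Operand1 are candidates for repetition while Operand2 and the result are not.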
To support this kind of instruction repetition in hardware, an embodiment of the present application provides a decoding circuit, as shown in Fig. 2. After the decoding circuit obtains an instruction from the instruction dispatch unit (Instruction Dispatch), it determines whether the instruction is a compressed instruction. If it is not, the decoding circuit sends the instruction directly to the instruction execution unit (Instruction Execution), which executes it. If it is, the decoding circuit obtains the key information in the compressed instruction and then decompresses the compressed instruction according to the instruction repetition type and the instruction repetition count in the key information, so that the compressed instruction is expanded into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
The key information includes the instruction repetition type and the instruction repetition count. The instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition count is a positive integer greater than or equal to 2. The instruction repetition type is obtained from the repeat enable field (Repeat_Enable) of the compressed instruction, and the instruction repetition count from the repeat count field (Repeat_Counter). Whether the current instruction is a compressed instruction can be determined from the Repeat_Counter field: if Repeat_Counter != 0x0, it is a compressed instruction; if Repeat_Counter == 0x0 (hexadecimal 0), it is not. The detailed parameters of the key information are shown in Table 2.
Table 2
Field             Width (bits)
Operation_code    10
Repeat_Counter    6
Result_ID         8
Repeat_Enable     4
Operand2_ID       9
Operand1_ID       9
Operand0_ID       9
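As a rough illustration of the key information in Table 2, the following C sketch holds each field in an ordinary struct member, checks that the values fit the stated widths, and applies the Repeat_Counter test for a compressed instruction. The struct layout and the names key_info_t, key_info_in_range, and is_compressed are assumptions made for this example; the actual bit positions come from Table 1, which is only available as an image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Key information extracted from a compressed instruction (widths per Table 2). */
typedef struct {
    uint16_t operation_code; /* 10 bits */
    uint8_t  repeat_counter; /*  6 bits */
    uint8_t  result_id;      /*  8 bits */
    uint8_t  repeat_enable;  /*  4 bits */
    uint16_t operand2_id;    /*  9 bits */
    uint16_t operand1_id;    /*  9 bits */
    uint16_t operand0_id;    /*  9 bits */
} key_info_t;

/* Checks that every field fits its width; result_id is 8 bits and always fits a uint8_t. */
bool key_info_in_range(const key_info_t *k)
{
    assert(k != NULL);
    return k->operation_code < (1u << 10)
        && k->repeat_counter < (1u << 6)
        && k->repeat_enable  < (1u << 4)
        && k->operand2_id    < (1u << 9)
        && k->operand1_id    < (1u << 9)
        && k->operand0_id    < (1u << 9);
}

/* An instruction is treated as compressed when Repeat_Counter is non-zero. */
bool is_compressed(const key_info_t *k)
{
    assert(k != NULL);
    return k->repeat_counter != 0x0;
}
```

The seven widths add up to 55 bits, which fits within the 64-bit instruction word.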
For ease of understanding, consider a concrete example in which the compressed instruction is:
Repeat Enable(0x3),Repeat Counter(62)::
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Decompressing this compressed instruction according to the instruction repetition type and the instruction repetition count yields 62 instructions that correspond to the instruction repetition type (repeat Operand0 and Operand1) and whose number equals the instruction repetition count (62). The resulting instructions are as follows:
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
...
Forwarding=LDS_Direct(M0_register)*B(61,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(62,ALU_Index)+Forwarding;
Here, Repeat Enable denotes the instruction repetition type, and 0x3 means that the two operands Operand0 and Operand1 are repeated; Repeat Counter denotes the instruction repetition count, and 62 is the number of repetitions, so decompressing this compressed instruction yields 62 instructions. Note that repeating Operand0 and Operand1 is only an example. The instruction repetition type may repeat at least one of the four operands Result (the destination operand), Operand0, Operand1, and Operand2, which gives 15 combinations. Different field values denote different repetition types: Repeat Enable(0x1) repeats Operand0, Repeat Enable(0x2) repeats Operand1, and Repeat Enable(0x3) repeats both Operand0 and Operand1.
Because instructions are divided into regular (single) instructions and compressed instructions, the corresponding hardware instruction logic has a regular mode and a repeat mode. Repeat_Counter == 0 indicates regular mode, in which the execution logic fetches instructions from the instruction dispatch unit and executes them. Repeat_Counter != 0 indicates repeat mode, in which the decoding circuit stops fetching instructions from the instruction dispatch unit. Once the decoding circuit has finished decompressing the compressed instruction, that is, when Repeat_Counter == 0 again, it switches back to regular mode.
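The switch between regular mode and repeat mode can be sketched as a very small state machine in C. This is only an illustration under the assumption that the mode is fully determined by the number of instructions still to be generated; the type decoder_mode_t and the three function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Remaining instructions to generate; 0 means regular mode, non-zero means repeat mode. */
typedef struct {
    unsigned remaining;
} decoder_mode_t;

/* Fetching from the instruction dispatch unit is allowed only in regular mode. */
bool fetch_allowed(const decoder_mode_t *m)
{
    assert(m != NULL);
    return m->remaining == 0;
}

/* Called when a new instruction has been decoded; a non-zero counter enters repeat mode. */
void on_instruction_decoded(decoder_mode_t *m, unsigned repeat_counter)
{
    assert(m != NULL && m->remaining == 0);
    m->remaining = repeat_counter;
}

/* Called after each generated instruction; reaching 0 returns to regular mode. */
void on_instruction_generated(decoder_mode_t *m)
{
    assert(m != NULL);
    if (m->remaining > 0) {
        m->remaining--;
    }
}
```

A decoder built this way would poll fetch_allowed before asking the instruction dispatch unit for the next instruction.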
By supporting compressed instructions, the above decoding circuit allows one instruction block to hold more three-operand instructions, which both effectively reduces the probability of instruction cache misses and improves efficiency.
In one implementation, the decoding circuit decompresses the compressed instruction according to the instruction repetition type and the instruction repetition count as follows: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and update the instruction repetition count; upon determining that the updated instruction repetition count is greater than a preset threshold, update the address ID corresponding to the operand; generate an instruction according to the updated address ID, and update the instruction repetition count again; determine whether the newly updated instruction repetition count equals the preset threshold. If it does, decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count. If it does not, repeat these operations (update the address ID corresponding to the operand; generate an instruction according to the updated address ID and update the instruction repetition count again; determine whether the newly updated instruction repetition count equals the preset threshold) until the updated instruction repetition count equals the preset threshold.
This process can be expressed in code as follows:
if (repeat_count != 0x0)
{   // Repeat one instruction as follows:
    Operand0_id = OperandRepeat(Operand0_id, Repeat_Enable & 0x1); // address update function for Operand0
    Operand1_id = OperandRepeat(Operand1_id, Repeat_Enable & 0x2); // address update function for Operand1
    Operand2_id = OperandRepeat(Operand2_id, Repeat_Enable & 0x4); // address update function for Operand2
    Result_ID = OperandRepeat(Result_ID, Repeat_Enable & 0x8);     // address update function for Result
    GenerateRepeatInstruction(Result_ID, Operand0_id, Operand1_id, Operand2_id); // generate an instruction from the new addresses
    repeat_count--; // update the instruction repetition count
    if (repeat_count == 0)
    {
        Exit; // decompression of the compressed instruction is complete
    }
}
In this implementation, the first instruction is generated according to the address ID carried in the compressed instruction itself. In the example above, the instruction "Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding" is generated from the default address ID (address 1) in the compressed instruction, and the instruction repetition count is then updated (it is now 61). Upon determining that the updated count (61) is greater than the preset threshold (for example, 0), the address ID corresponding to the operand is updated (address 2), an instruction is generated according to the updated address ID, and the instruction repetition count is updated again (it is now 60). It is then determined whether the updated count (60) equals the preset threshold. As long as the count is still greater, the above operations are repeated (update the address ID corresponding to the operand, generate an instruction according to the updated address ID, update the instruction repetition count again, and compare the updated count with the preset threshold) until the updated count (0) equals the preset threshold (for example, 0). At that point 62 instructions have been obtained, and decompression of the compressed instruction is complete.
Throughout the process of determining whether decompression of the compressed instruction is complete, the above decoding circuit needs no additional component (such as a counter); it simply updates the instruction repetition count directly after each instruction is generated. This simplifies the processing flow as far as possible and saves cost while preserving accuracy.
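The count-down variant described above can be sketched in C as follows. This is a minimal model, not the circuit itself: it assumes every enabled operand comes from a VGPR/SGPR and is advanced by 1, uses a preset threshold of 0, and writes the generated instructions into a caller-supplied array. The names instr_t and decompress_countdown are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address IDs of one generated instruction. */
typedef struct {
    uint16_t result_id;
    uint16_t operand0_id;
    uint16_t operand1_id;
    uint16_t operand2_id;
} instr_t;

/* Expands one compressed instruction into out[] and returns how many were generated.
 * The first instruction uses the IDs carried by the compressed instruction itself. */
size_t decompress_countdown(instr_t first, uint8_t repeat_enable,
                            uint8_t repeat_count, instr_t *out, size_t cap)
{
    assert(out != NULL);
    size_t n = 0;
    instr_t cur = first;
    while (repeat_count != 0 && n < cap) {
        out[n++] = cur;   /* generate an instruction from the current IDs */
        repeat_count--;   /* update the instruction repetition count */
        if (repeat_count == 0) {
            break;        /* count equals the preset threshold (0): done */
        }
        /* Update the address IDs of the enabled operands (+1 is an assumption). */
        if (repeat_enable & 0x1) cur.operand0_id++;
        if (repeat_enable & 0x2) cur.operand1_id++;
        if (repeat_enable & 0x4) cur.operand2_id++;
        if (repeat_enable & 0x8) cur.result_id++;
    }
    return n;
}
```

No separate generation counter is kept; the repetition count itself serves as the loop state, which mirrors the point made in the preceding paragraph.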
In another implementation, the decoding circuit decompresses the compressed instruction according to the instruction repetition type and the instruction repetition count as follows: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and record the number of generated instructions (the generation count); upon determining that the generation count is less than the instruction repetition count, update the address ID corresponding to the operand; generate an instruction according to the updated address ID, and update the generation count; determine whether the updated generation count equals the instruction repetition count. If it does, decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count. If it does not, repeat these operations (update the address ID corresponding to the operand; generate an instruction according to the updated address ID and update the generation count; determine whether the updated generation count equals the instruction repetition count) until the updated generation count equals the instruction repetition count.
The principle of this implementation is the same as that of the preceding one. The difference is that in the first implementation, after an instruction is generated, the instruction repetition count is updated and compared with a preset threshold (for example, 0) to decide whether decompression is complete, whereas in this implementation the generation count is recorded and compared with the instruction repetition count. In other words, this implementation uses a counter to count generated instructions: each time an instruction is generated, the counter is incremented by one, and whether more instructions need to be generated is decided by checking whether the recorded count equals the instruction repetition count (62). When the generation count equals the instruction repetition count, decompression of the compressed instruction is complete. This provides an alternative approach and broadens the applicability of the scheme.
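The counter-based variant, with its split between a controller and an instruction generator, can be sketched in C as below. It is an illustrative model under the same assumption as before (enabled operands advance by 1); the callback type instruction_generator_fn, the recorder used as a stand-in generator, and all other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address IDs handed from the controller to the instruction generator. */
typedef struct {
    uint16_t result_id;
    uint16_t operand0_id;
    uint16_t operand1_id;
    uint16_t operand2_id;
} ids_t;

/* Stand-in for the instruction generator: receives the IDs of one instruction. */
typedef void (*instruction_generator_fn)(const ids_t *ids, void *ctx);

/* Controller: counts generated instructions upward until the repetition count is reached. */
unsigned controller_decompress(ids_t ids, uint8_t repeat_enable, unsigned repeat_count,
                               instruction_generator_fn generate, void *ctx)
{
    assert(generate != NULL);
    if (repeat_count == 0) {
        return 0;                      /* not a compressed instruction */
    }
    unsigned generated = 0;
    generate(&ids, ctx);               /* first instruction uses the default IDs */
    generated = 1;
    while (generated < repeat_count) {
        /* Update the enabled address IDs (+1 is an assumption) and send them on. */
        if (repeat_enable & 0x1) ids.operand0_id++;
        if (repeat_enable & 0x2) ids.operand1_id++;
        if (repeat_enable & 0x4) ids.operand2_id++;
        if (repeat_enable & 0x8) ids.result_id++;
        generate(&ids, ctx);
        generated++;                   /* update the generation count */
    }
    return generated;
}

/* Example generator that only records how often it was called and the last IDs seen. */
typedef struct {
    unsigned calls;
    ids_t last;
} recorder_t;

void record_instruction(const ids_t *ids, void *ctx)
{
    recorder_t *r = ctx;
    r->calls++;
    r->last = *ids;
}
```

Compared with the count-down variant, the repetition count is left untouched and an extra counter is kept, which is the trade-off the text describes.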
During decompression of the compressed instruction, each instruction is issued to the instruction execution unit as soon as it is generated.
The operand in the instruction repetition type may be at least one of the four operands Result, Operand0, Operand1, and Operand2. In one implementation, the address ID corresponding to an operand is updated according to the operand source type pointed to by that address ID (for example, VGPR/SGPR/LDS_DIRECT). Different operand source types may have different rules for updating the address ID; for example, the rule for a VGPR source may differ from the rule for an SGPR source.
The following takes as an example the case where the rule for updating the address ID is the same for VGPR and SGPR sources. When the operand source pointed to by the operand's ID is a VGPR/SGPR, the address ID can be updated by the rule Operand_ID++ (or Result_ID++), that is, the updated address equals the address before the update plus one. For ease of understanding, take Operand1 as an example: if Operand1_ID points to a VGPR/SGPR, it is repeated as follows:
if (Operand1_ID is SGPR or VGPR)
{
    // Operand1 is enabled by bit 1 of Repeat_Enable (mask 0x2)
    Operand1_ID = ((Repeat_Enable & 0x2) != 0) ? Operand1_ID + 1 : Operand1_ID;
}
That is, if the operand source pointed to by Operand1_ID is a VGPR/SGPR and Repeat_Enable[60] is 1, the address of operand 1 (Operand1_ID) is increased by 1; if the bit is 0, the address of operand 1 remains unchanged. Note that incrementing the address by 1 is only an example. The address may instead be decremented, and the step need not be 1; this depends mainly on whether the data is stored in ascending or descending order and whether it is stored contiguously. The example should therefore not be understood as limiting the present application.
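A possible shape for the OperandRepeat address update is sketched below in C. The signature is an assumption made for this example (the application does not define one): it takes the enable field, the operand's mask, the source type, and a signed step so that both ascending and descending layouts can be expressed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of a register-based operand source. */
typedef enum { SOURCE_VGPR, SOURCE_SGPR, SOURCE_OTHER } reg_source_t;

/* Returns the next address ID for one operand.
 * The ID changes only if the operand's Repeat_Enable bit is set and the
 * source is a VGPR or SGPR; step may be negative for descending storage. */
int32_t operand_repeat(int32_t id, uint8_t repeat_enable, uint8_t mask,
                       reg_source_t src, int32_t step)
{
    assert(repeat_enable <= 0xF); /* Repeat_Enable is a 4-bit field */
    if ((repeat_enable & mask) == 0) {
        return id;                /* repetition not enabled for this operand */
    }
    if (src != SOURCE_VGPR && src != SOURCE_SGPR) {
        return id;                /* other sources are handled elsewhere or ignored */
    }
    return id + step;
}
```

For Operand1 the mask would be 0x2, so operand_repeat(id, Repeat_Enable, 0x2, SOURCE_VGPR, 1) corresponds to the increment-by-one rule in the text.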
When the operand source pointed to by the operand's ID is LDS_DIRECT, the rule for updating the address ID differs from the VGPR/SGPR case. In this mode, the hardware reads data from the LDS as the operand, and the access address and data type are determined by a configuration register such as the M0 register (a dedicated 32-bit hardware-internal register whose low 16 bits are used by LDS_DIRECT as the address). The 32-bit definition of the M0 register is shown in Table 3.
Table 3
Figure PCTCN2020114004-appb-000003
Therefore, when a source operand comes from LDS_DIRECT, updating the address ID means automatically updating the address field of the M0 register. Correspondingly, the address that the address ID points to is the address stored in the M0 register, which is used to read the source operand stored in the LDS. That is, the M0 register is configured to store the address used to read a source operand (for example, an element of a row of a matrix) from the LDS, and after the corresponding element has been read from the LDS at the current address, the address in the M0 register must be updated to the address of the next element.
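The read-then-advance behaviour of the M0 register can be modelled with a short C sketch. It assumes a DWord (4-byte) element, treats the LDS as a plain byte array, and leaves the upper 16 bits of M0 untouched because their layout (Table 3) is not reproduced in the text. The names M0_ADDR_MASK and lds_direct_read_dword are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The low 16 bits of M0 hold the LDS address used by LDS_DIRECT. */
#define M0_ADDR_MASK 0xFFFFu

/* Reads one 4-byte element at the address held in M0, then advances the
 * address field of M0 to the next element. Upper bits of M0 are preserved. */
uint32_t lds_direct_read_dword(const uint8_t *lds, size_t lds_size, uint32_t *m0)
{
    assert(lds != NULL && m0 != NULL);
    uint32_t addr = *m0 & M0_ADDR_MASK;
    assert((size_t)addr + sizeof(uint32_t) <= lds_size);

    uint32_t value;
    memcpy(&value, lds + addr, sizeof value);    /* read the source operand */

    uint32_t next = (addr + 4u) & M0_ADDR_MASK;  /* address of the next element */
    *m0 = (*m0 & ~M0_ADDR_MASK) | next;          /* automatic address update */
    return value;
}
```

Because the update happens inside the read, every generated instruction can name the same LDS_Direct(M0_register) operand and still receive consecutive elements.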
As yet another implementation, in addition to updating the operand's address ID according to the operand source type indicated by that address ID, the address ID may also be updated according to the data type of the data stored in the operand source. Different data types have different address update rules, for example:
Address[i+1] = Address[i] + 0x1; // data type is unsigned byte
Address[i+1] = Address[i] + 0x2; // data type is unsigned short
Address[i+1] = Address[i] + 0x4; // data type is DWord
Address[i+1] = Address[i] + 0x0; // data type is Default (Reserved)
Address[i+1] = Address[i] + 0x1; // data type is signed byte
Address[i+1] = Address[i] + 0x2; // data type is signed short
Address[i+1] = Address[i] + 0x8; // data type is Qword
Taking an operand source of LDS_DIRECT as an example, the automatic update of the address field of the M0 register must also take into account the data type of the data stored in the LDS. If the data type is unsigned byte, the address is updated according to Address[i+1] = Address[i] + 0x1.
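The per-type rules listed above amount to a lookup from data type to address stride. A minimal sketch is given below; the names `STRIDE_BYTES` and `next_address` are hypothetical, and the 0x2 entry is taken to be unsigned short, which is an assumption made by analogy with the signed byte / signed short pair.

```python
# Address stride per data type, taken from the list of update rules.
STRIDE_BYTES = {
    "unsigned byte": 0x1,
    "unsigned short": 0x2,      # assumed type name for the 0x2 rule
    "DWord": 0x4,
    "Default (Reserved)": 0x0,
    "signed byte": 0x1,
    "signed short": 0x2,
    "Qword": 0x8,
}


def next_address(address, data_type):
    """Advance an LDS address by the stride of the stored data type."""
    return address + STRIDE_BYTES[data_type]
```

With this table, consecutive DWord elements starting at address 0x10 are read from 0x10, 0x14, 0x18, and so on.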
When the operand is the destination operand (Result), it must additionally be ensured, before the address is updated, that the operand source type indicated by the destination operand's address ID is not a temporary register used for data forwarding. The destination forwarding (DF) field can be used to make this determination. When DF==1, Result_ID is forwarding; the address does not need to be updated and simply remains forwarding. In this case, instructions are generated from the default Result_ID in the compressed instruction, so every generated instruction has the same Result_ID. Conversely, when DF is not 1, the operand source type indicated by the destination operand's address ID is not a temporary register used for data forwarding (for example, it is VGPR/SGPR), and the address is updated in the manner described above.
In the above implementation, when the operand is the destination operand, it must be determined before updating the operand's address ID that the value in the destination forwarding (DF) field is not the set threshold, so as to avoid interfering with data forwarding.
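The DF guard on the destination operand can be sketched as a small function. The name `next_result_id` and the `step` parameter are hypothetical; the value 1 for the DF threshold follows the example given in the text.

```python
def next_result_id(result_id, df, repeat_enabled, step=1):
    """Return the Result_ID to use for the next generated instruction."""
    if df == 1:
        # Destination is the forwarding register: never advance the ID.
        return result_id
    if repeat_enabled:
        # Ordinary register destination (e.g. VGPR/SGPR): advance as usual.
        return result_id + step
    return result_id
```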
To improve efficiency and avoid wasting resources on decompressing an erroneous compressed instruction, the decoding circuit may first determine whether the compressed instruction is valid before obtaining the key information in it. Only after the compressed instruction is determined to be valid is the key information obtained, and the compressed instruction is then decompressed according to the instruction repetition type and the instruction repetition count in the key information.
As one implementation, whether the compressed instruction is valid can be determined from the repetition enable field for a source operand, or the repetition enable field for the destination operand, in the compressed instruction. The compressed instruction is valid when the repetition enable field of a source operand is nonzero and the address ID of that source operand indicates a specified operand source type (such as VGPR/SGPR/LDS_DIRECT), or when the repetition enable field of the destination operand is nonzero. That is, the compressed instruction is valid if at least one of the following is true:
If (Repeat_Enable[59:59] != 0x0) and operand0_ID is VGPR/SGPR/LDS_DIRECT;
If (Repeat_Enable[60:60] != 0x0) and operand1_ID is VGPR/SGPR/LDS_DIRECT;
If (Repeat_Enable[61:61] != 0x0) and operand2_ID is VGPR/SGPR/LDS_DIRECT;
If (Repeat_Enable[62:62] != 0x0);
That is, the compressed instruction is valid if the repetition enable field of at least one source operand is nonzero and the corresponding address ID indicates a specified operand source type, or if the repetition enable field of the destination operand is nonzero.
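The validity conditions above can be expressed as a short check. In this sketch the instruction word is modeled as a Python integer with the repetition enable bits at positions 59 to 62, as in the conditions listed; the function name, the `source_types` argument, and the string labels for source types are hypothetical.

```python
REPEATABLE_SOURCES = {"VGPR", "SGPR", "LDS_DIRECT"}


def is_valid_compressed(word, source_types):
    """word: instruction word as an int; source_types: sources of operands 0..2."""
    # Bits 59, 60, 61 enable repetition of source operands 0, 1, 2.
    for index, bit in enumerate((59, 60, 61)):
        enabled = (word >> bit) & 1
        if enabled and source_types[index] in REPEATABLE_SOURCES:
            return True
    # Bit 62 enables repetition of the destination operand.
    return bool((word >> 62) & 1)
```

A word with only bit 59 set is valid when operand 0 comes from a VGPR, but not when operand 0 comes from some other source such as a constant.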
The above description is given from the perspective of the decoding circuit as a whole. To make the information exchange among the components of the decoding circuit easier to understand, the steps performed by each component are described below. As shown in FIG. 1, the decoding circuit includes a decoder (Repeat Decoder) and an instruction decompression module, which are connected to each other.
The decoder is configured to determine whether an acquired instruction is a compressed instruction. If it is not, the decoder sends the instruction to the instruction execution unit for execution; if it is, the decoder obtains the key information in the compressed instruction.
To improve efficiency, in some possible implementations the decoder is further configured to determine that the compressed instruction is valid before obtaining the key information in it. As one implementation, the decoder determines validity from the repetition enable field for a source operand, or the repetition enable field for the destination operand, in the compressed instruction: the compressed instruction is valid when the repetition enable field of a source operand is nonzero and the address ID of that source operand indicates a specified operand source type, or when the repetition enable field of the destination operand is nonzero.
The instruction decompression module is configured to decompress the compressed instruction according to the instruction repetition type and the instruction repetition count, so that the compressed instruction is decompressed into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
In some possible implementations, the instruction decompression module is further configured to send the decoder, upon receiving the key information from the decoder, an indication that prevents the decoder from fetching instructions from the instruction distribution unit, and to send the decoder, upon completing decompression of the compressed instruction, an indication that allows it to fetch instructions from the instruction distribution unit.
In one implementation, as shown in FIG. 3, the instruction decompression module includes a controller and an instruction generator. The controller is connected to the instruction generator and to the decoder.
In one implementation, the controller is configured to obtain the address ID of the operand in the instruction repetition type. The instruction generator is configured to generate an instruction according to that address ID. The controller is further configured to update the instruction repetition count after the instruction generator has generated the instruction and, when the updated instruction repetition count is greater than a preset threshold, to update the operand's address ID and send the updated address ID to the instruction generator. The instruction generator is further configured to generate an instruction according to the updated address ID. The controller is further configured to update the instruction repetition count again after that instruction has been generated and to determine whether the updated count equals the preset threshold; if so, decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
In another implementation, the controller is configured to obtain the address ID of the operand in the instruction repetition type. The instruction generator is configured to generate an instruction according to that address ID. The controller is further configured to record the number of instructions generated after the instruction generator has generated the instruction and, when the generation count is less than the instruction repetition count, to update the operand's address ID and send the updated address ID to the instruction generator. The instruction generator is further configured to generate an instruction according to the updated address ID. The controller is further configured to update the generation count after that instruction has been generated and to determine whether the updated generation count equals the instruction repetition count; if so, decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
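The two controller variants described above (counting the repetition count down to a preset threshold, and counting generated instructions up to the repetition count) can be sketched as two loops that produce the same sequence of address IDs. The function names, the `step` parameter, and the choice of 0 as the preset threshold are assumptions for illustration; each returned list entry stands for one generated instruction.

```python
def decompress_countdown(first_id, repeat_count, threshold=0, step=1):
    """Variant 1: decrement the repetition count until it reaches the threshold."""
    ids = [first_id]                 # generate the first instruction
    remaining = repeat_count - 1     # update the repetition count
    current = first_id
    while remaining > threshold:     # more instructions are still owed
        current += step              # update the operand's address ID
        ids.append(current)          # generate from the updated ID
        remaining -= 1               # update the repetition count again
    return ids                       # remaining == threshold: done


def decompress_countup(first_id, repeat_count, step=1):
    """Variant 2: count generated instructions up to the repetition count."""
    ids = [first_id]
    generated = 1                    # record the generation count
    current = first_id
    while generated < repeat_count:
        current += step
        ids.append(current)
        generated += 1
    return ids                       # generated == repeat_count: done
```

Both variants yield exactly `repeat_count` instructions; they differ only in which counter the controller keeps.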
In one implementation, when updating the operand's address ID, the controller is further configured to update it according to the operand source type indicated by the address ID.
In one implementation, when updating the operand's address ID, the controller is further configured to update it according to the data type of the data stored in the operand source indicated by the address ID.
In some possible implementations, the operand in the instruction repetition type is the destination operand, and the key information further includes a destination forwarding (DF) field. Before updating the operand's address ID, the controller is further configured to determine that the value in the DF field is not the set threshold (for example, 1).
In some possible implementations, the controller is further configured to send the decoder, while a compressed instruction is being decompressed, an indication that prevents the decoder from fetching instructions from the instruction distribution unit; the decoder then no longer fetches instructions from the instruction distribution unit. When decompression is complete, the controller sends the decoder an indication that allows it to fetch instructions from the instruction distribution unit again. In other words, the controller has a normal mode and a repeat mode. In normal mode (Repeat_Count==0), the controller allows the decoder to fetch instructions from the instruction distribution unit and execute them. In repeat mode (Repeat_Count!=0), the controller prevents the decoder from fetching instructions from the instruction distribution unit; once the controller has finished decompressing the compressed instruction, that is, when Repeat_Count==0, it switches back to normal mode.
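The normal/repeat mode switch can be modeled as a tiny state machine keyed on Repeat_Count. The class and method names below are hypothetical, and the model ignores timing; it only shows when the decoder is allowed to fetch.

```python
class Controller:
    """Toy model of the controller's normal/repeat modes."""

    def __init__(self):
        self.repeat_count = 0        # 0 means normal mode

    @property
    def decoder_may_fetch(self):
        # The decoder may fetch only in normal mode (Repeat_Count == 0).
        return self.repeat_count == 0

    def start(self, repeat_count):
        # Key information received: enter repeat mode.
        self.repeat_count = repeat_count

    def on_instruction_generated(self):
        # One decompressed instruction was emitted.
        if self.repeat_count > 0:
            self.repeat_count -= 1
```

After `start(2)` the decoder is stalled; after two calls to `on_instruction_generated()` the controller is back in normal mode and fetching resumes.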
When the operand source type indicated by the address ID of the operand in the instruction repetition type is LDS, the instruction decompression module further includes a configuration register. The configuration register is configured to store the address for fetching a source operand from the LDS and, after the corresponding source operand has been read from the LDS at the current address, to automatically update its own address to the address of the next source operand. In this case, when updating the operand's address ID, the controller is further configured to update it according to the address currently indicated by the configuration register, so that the address ID equals that address. As shown in FIG. 4, the instruction decompression module then includes a controller, a configuration register (the M0 register), and an instruction generator. The controller is connected to the decoder, the instruction generator, and the configuration register.
For anything not mentioned in the description of the functions of the individual components, reference may be made to the corresponding parts of the foregoing embodiments, in which the decoding circuit is described as a whole. Those parts have been described in detail above and, for brevity, are not repeated here.
In the embodiments of this application, instructions are compressed by means of VOP3R, so that each cache line (512 bit) can accommodate 512 3-operand instructions, which both reduces the probability of instruction cache misses and improves efficiency. For ease of understanding, the method provided in the embodiments of this application is illustrated below by applying it to matrix multiplication. A 64x64 matrix is used as an example, C_{64x64} = A_{64x64} * B_{64x64}; the 64x64 size is only an example and is not limiting. Assume there are 64 arithmetic logic units (ALUs), each with a 200x64-bit VGPR space.
The calculation process is roughly as follows:
1) Matrix A is loaded into the LDS in linear mode:
A(0,0)→LDS(Address0); // A(0,0) is stored at Address0 of the LDS
A(0,1)→LDS(Address1); // A(0,1) is stored at Address1 of the LDS
A(0,2)→LDS(Address2); // A(0,2) is stored at Address2 of the LDS
...
2) Matrix B is loaded into the VGPR space, as shown in Table 4.
Table 4
ALU0  | ALU1  | ALU2  | ... | ALU62  | ALU63
B0,0  | B0,1  | B0,2  | ... | B0,62  | B0,63
B1,0  | B1,1  | B1,2  | ... | B1,62  | B1,63
...   | ...   | ...   | ... | ...    | ...
B63,0 | B63,1 | B63,2 | ... | B63,62 | B63,63
Here, different VGPRs store different rows, so that each ALU holds one column of matrix B. During computation, the elements of matrix A are loaded one by one, in parallel, into the 64 ALUs and multiplied by the corresponding elements of the column stored in each ALU's vector general-purpose registers. The 64 ALUs, in parallel, successively accumulate the products of the elements in one row of matrix A with the corresponding elements of matrix B, yielding all elements of the same row of matrix C and thereby completing the multiplication of matrix A and matrix B.
3) Calculate matrix C:
The instructions for calculating matrix C in normal mode are as follows:
M0_register=start_address; // Initial address of the M0 register. The M0 register is configured to store the address for reading each element of matrix A; after the 64 ALUs read, in parallel, the element of matrix A at M0's current address from the LDS, M0 is automatically updated to the address of the next element.
//-----------------------------------------
// Calculate the first row of Matrix C:
// C(0,0) is calculated on ALU_Index0: ALU_Index=0.
// C(0,1) is calculated on ALU_Index1: ALU_Index=1.
// ......
//-----------------------------------------
// Each ALU executes the following instructions to compute its element of the first row of matrix C:
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
...
Block_End::C(0,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
//-----------------------------------------
// Calculate the second row of Matrix C:
// C(1,0) is calculated on ALU_Index0.
// C(1,1) is calculated on ALU_Index1.
// ......
//-----------------------------------------
// Each ALU executes the following instructions to compute its element of the second row of matrix C:
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
...
Block_End::C(1,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
...
//-----------------------------------------
// Calculate the last row of Matrix C:
// C(63,0) is calculated on ALU_Index0.
// C(63,1) is calculated on ALU_Index1.
// ......
//-----------------------------------------
// Each ALU executes the following instructions to compute its element of the last row of matrix C:
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
...
Block_End::C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
The above is the normal mode, without instruction compression. With the instruction compression method provided by this application, the normal instruction list above can be compressed as follows:
M0_register=start_address;
//-----------------------------------------
// Calculate the first row of Matrix C:
// C(0,0) is calculated on ALU_Index0: ALU_Index=0.
// C(0,1) is calculated on ALU_Index1: ALU_Index=1.
// ......
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
RepeatEnable(0x3),RepeatCounter(62)::
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding; // Repeat Operand0 and Operand1
Block_End::C(0,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
//-----------------------------------------
// Calculate the second row of Matrix C:
// C(1,0) is calculated on ALU_Index0.
// C(1,1) is calculated on ALU_Index1.
// ......
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
RepeatEnable(0x3),RepeatCounter(62)::
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding; // Repeat Operand0 and Operand1
Block_End::C(1,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
...
//-----------------------------------------
// Calculate the last row of Matrix C:
// C(63,0) is calculated on ALU_Index0.
// C(63,1) is calculated on ALU_Index1.
// ......
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
RepeatEnable(0x3),RepeatCounter(62)::
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding; // Repeat Operand0 and Operand1
Block_End::C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
As can be seen from the above, completing C_{64x64} = A_{64x64} * B_{64x64} in the normal instruction mode requires 64x64 = 4096 instructions, whereas with the instruction compression of this application only 3x64 = 192 instructions are required, which significantly improves efficiency.
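The instruction counts and the row-by-row computation can be checked with a small simulation. This is a sketch under simplifying assumptions, not a model of the actual hardware: the ALUs are iterated sequentially rather than run in parallel, M0 is treated as a plain element index into a linear LDS, and all function names are hypothetical.

```python
def expand_row(n):
    """Rows of B touched by the three compressed instructions for one row of C."""
    block_start = [0]                          # Block_Start uses B(0, ALU_Index)
    repeated = [1 + k for k in range(n - 2)]   # RepeatCounter(n - 2), Operand1 ID += 1
    block_end = [n - 1]                        # Block_End uses B(n-1, ALU_Index)
    return block_start + repeated + block_end


def conventional_count(n):
    """Instructions needed without compression: one per expanded instruction."""
    return n * len(expand_row(n))


def compressed_count(n):
    """Instructions needed with compression: three per row of C."""
    return 3 * n


def matmul_by_rows(A, B):
    n = len(A)
    lds = [a for row in A for a in row]        # matrix A stored linearly in the LDS
    m0 = 0                                     # M0 start address
    C = []
    for _ in range(n):                         # one block per row of C
        forwarding = [0] * n                   # one forwarding register per ALU
        for k in expand_row(n):
            a = lds[m0]                        # LDS_Direct(M0_register)
            m0 += 1                            # M0 auto-updates after the read
            for alu in range(n):               # parallel in hardware
                forwarding[alu] += a * B[k][alu]
        C.append(forwarding)
    return C
```

For n = 64, `conventional_count` gives 4096 and `compressed_count` gives 192, matching the figures above.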
Please refer to FIG. 5, which shows a data processing method provided by an embodiment of this application. The steps it comprises are described below with reference to FIG. 5.
Step S101: Determine whether the acquired instruction is a compressed instruction.
If so, step S102 is performed; if not, the acquired instruction is sent to the instruction execution unit.
Step S102: Obtain the key information in the compressed instruction, the key information including an instruction repetition type and an instruction repetition count.
The instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition count is a positive integer greater than or equal to 2.
Step S103: Decompress the compressed instruction according to the instruction repetition type and the instruction repetition count, so that the compressed instruction is decompressed into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
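Steps S101 to S103 can be sketched end to end as follows. The dictionary layout is purely hypothetical (in the patent the repetition enable bits sit at positions 59 to 62 of the instruction word; here bit k of `repeat_enable` simply marks operand k), and every repeated operand ID is assumed to advance by 1.

```python
def process(instr):
    """Return the list of instructions to hand to the execution unit."""
    # S101: pass ordinary instructions straight through.
    if not instr.get("compressed", False):
        return [instr]
    # S102: extract the key information.
    repeat_enable = instr["repeat_enable"]     # which operands repeat
    repeat_count = instr["repeat_count"]       # how many instructions to emit
    # S103: expand into repeat_count instructions.
    ids = list(instr["operand_ids"])
    expanded = []
    for _ in range(repeat_count):
        expanded.append({"opcode": instr["opcode"], "operand_ids": tuple(ids)})
        ids = [i + 1 if (repeat_enable >> k) & 1 else i
               for k, i in enumerate(ids)]
    return expanded
```

For example, a compressed instruction with `repeat_enable=0b10`, `repeat_count=3`, and operand IDs `(7, 1, 9)` expands to three instructions whose second operand ID runs 1, 2, 3 while the others stay fixed.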
在一些可能的实现方式中,在获取所述压缩指令中的关键信息之前,所述方法还包括:确定所述压缩指令有效。In some possible implementation manners, before obtaining the key information in the compression instruction, the method further includes: determining that the compression instruction is valid.
其中,一种实施方式下,根据所述指令重复类型和所述指令重复次数对所述压缩指令进行解压的过程可以是:根据所述指令重复类型中的操作数对应的地址ID生成指令,并更新所述指令重复次数;在确定更新后的所述指令重复次数大于预设阈值时,更新所述操作数对应的地址ID;根据更新后的所述操作数对应的地址ID生成指令,并再次更新所述指令重复次数;判断再次更新后的所述指令重复次数是否等于所述预设阈值;在为是时,确定对所述压缩指令的解压结束,得到与所述指令重复类型对应的,且与所述指令重复次数数量相同的多条指令。Wherein, in an implementation manner, the process of decompressing the compressed instruction according to the instruction repetition type and the number of instruction repetitions may be: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and Update the number of repetitions of the instruction; when it is determined that the number of repetitions of the instruction after the update is greater than a preset threshold, update the address ID corresponding to the operand; generate the instruction according to the updated address ID corresponding to the operand, and again Update the number of instruction repetitions; determine whether the re-updated instruction repetition number is equal to the preset threshold; if yes, determine that the decompression of the compressed instruction is completed, and obtain the corresponding instruction repetition type, And multiple instructions with the same number of repetitions as the instructions.
In another implementation, the process of decompressing the compressed instruction according to the instruction repetition type and the instruction repetition count may be as follows: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and record the number of instructions generated (the generation count); when the generation count is determined to be less than the instruction repetition count, update the address ID corresponding to the operand; generate an instruction according to the updated address ID corresponding to the operand, and update the generation count; determine whether the updated generation count equals the instruction repetition count; if so, determine that decompression of the compressed instruction has finished, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
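The count-up variant differs only in which counter is kept. A minimal sketch, written as a generator so that each instruction is handed on as soon as it is produced; the `Instr` type, the function name, and the fixed stride `step` are hypothetical names introduced for illustration.

```python
from typing import Iterator, NamedTuple


class Instr(NamedTuple):
    """Hypothetical model of one decompressed instruction."""
    opcode: str
    address_id: int


def decompress_countup(opcode: str, address_id: int, repeat_count: int,
                       step: int = 1) -> Iterator[Instr]:
    """Expand one compressed instruction by counting generated instructions up.

    Assumption (not from the source): the address ID advances by a fixed `step`.
    """
    if repeat_count < 2:
        raise ValueError("the instruction repetition count must be >= 2")

    generated = 0
    while True:
        yield Instr(opcode, address_id)  # generate an instruction
        generated += 1                   # record / update the generation count
        if generated == repeat_count:    # equal: decompression has finished
            return
        address_id += step               # fewer than needed: update the address ID
```

For example, `list(decompress_countup("v_mul", 10, 4))` produces four instructions whose address IDs run from 10 to 13.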
In some possible implementations, the process of updating the address ID corresponding to the operand may be: updating the address ID according to the type of operand source that the address ID points to.
In some possible implementations, the process of updating the address ID corresponding to the operand may also be: updating the address ID according to the data type of the data stored in the operand source that the address ID points to.
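The two update rules can be combined in one small function. The passage does not state how far the address ID moves in each case, so every stride below is an assumption chosen for illustration: register sources are taken to be 32 bits wide and indexed by register number, and a memory-like source is taken to be byte-addressed. The source-type names `"register"` and `"memory"` are likewise hypothetical.

```python
REGISTER_BITS = 32  # assumed width of one register slot


def next_address_id(address_id: int, source_type: str, data_bits: int = 32) -> int:
    """Return the updated address ID for the next repeated instruction.

    Assumed rules (not from the source):
      - "register": advance by the number of 32-bit slots the data occupies,
        and by at least one slot for narrower data;
      - "memory": advance by the data size in bytes.
    """
    if source_type == "register":
        return address_id + max(1, data_bits // REGISTER_BITS)
    if source_type == "memory":
        return address_id + data_bits // 8
    raise ValueError(f"unknown operand source type: {source_type!r}")
```

Under these assumptions a 64-bit register operand at ID 4 moves to ID 6, while a 32-bit memory operand at address 0x100 moves to 0x104.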
In some possible implementations, the operand in the instruction repetition type is a destination operand, and the key information further includes a destination pass-through (DF) field. Before the address ID corresponding to the operand is updated, the method further includes: determining that the value in the DF field is not a set threshold.
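The DF check acts as a guard in front of the destination update. The sketch below assumes that a DF value equal to the set threshold means the destination address ID is left unchanged; the passage only says the update is preceded by confirming the value is not the threshold, so that behaviour, the threshold value of 1, and the stride are assumptions.

```python
def update_destination_id(dst_id: int, df_value: int,
                          df_hold_value: int = 1, step: int = 1) -> int:
    """Update the destination address ID only if the DF field permits it.

    Assumptions (not from the source): `df_hold_value` is the set threshold,
    and when the DF field equals it the destination ID is kept as-is.
    """
    if df_value == df_hold_value:
        return dst_id         # DF equals the set threshold: no update
    return dst_id + step      # otherwise advance to the next destination
```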
The implementation principle and technical effects of the method provided in the embodiments of this application are the same as those of the foregoing apparatus embodiments. For brevity, where the method embodiments are silent, refer to the corresponding content in the foregoing apparatus embodiments.
An embodiment of this application further provides a processor, as shown in FIG. 6. The processor includes an instruction execution unit, an instruction distribution unit, and the decoding circuit of any of the foregoing embodiments. The instruction distribution unit and the instruction execution unit are both connected to the decoding circuit. The instruction distribution unit is configured to store instructions so that the decoding circuit can fetch instructions from it. The instruction execution unit is configured to execute the instructions issued by the decoding circuit.
The processor may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), a graphics processing unit (GPU), and the like; the general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
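The connection between the three units can be modelled in software as a simple pipeline. This is a behavioural sketch only: the class names, the `Instruction` record, the convention that a repetition count of 1 marks an ordinary instruction, and the stride of 1 are assumptions for illustration and do not describe the hardware of FIG. 6.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Instruction:
    opcode: str
    dst_id: int
    repeat_count: int = 1  # assumed: >= 2 marks a compressed instruction


class DistributionUnit:
    """Stores instructions for the decoding circuit to fetch."""

    def __init__(self, program: Iterable[Instruction]) -> None:
        self._queue = deque(program)

    def fetch(self) -> Optional[Instruction]:
        return self._queue.popleft() if self._queue else None


class ExecutionUnit:
    """Records the instructions issued to it, standing in for execution."""

    def __init__(self) -> None:
        self.executed: List[Tuple[str, int]] = []

    def execute(self, instr: Instruction) -> None:
        self.executed.append((instr.opcode, instr.dst_id))


class DecodingCircuit:
    """Fetches, expands compressed instructions, and issues the results."""

    def __init__(self, distribution: DistributionUnit, execution: ExecutionUnit) -> None:
        self.distribution = distribution
        self.execution = execution

    def run(self) -> None:
        while (instr := self.distribution.fetch()) is not None:
            # An ordinary instruction (count 1) is issued once; a compressed
            # one is expanded, advancing the destination ID by an assumed 1.
            for i in range(instr.repeat_count):
                self.execution.execute(Instruction(instr.opcode, instr.dst_id + i))


program = [Instruction("v_mov", 0), Instruction("v_add", 4, repeat_count=3)]
execution_unit = ExecutionUnit()
DecodingCircuit(DistributionUnit(program), execution_unit).run()
```

After the run, `execution_unit.executed` holds one `v_mov` followed by three `v_add` entries, although only two instructions were stored in the distribution unit.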
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced.
The foregoing describes only specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Industrial Applicability
The data processing method, decoding circuit, and processor provided in this application determine whether an acquired instruction is a compressed instruction; if so, key information is obtained from the compressed instruction, the key information including an instruction repetition type and an instruction repetition count, where the instruction repetition type indicates the type of instruction to be repeated and the instruction repetition count is a positive integer greater than or equal to 2; the compressed instruction is then decompressed according to the instruction repetition type and the instruction repetition count into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count. In the embodiments of this application, compressing instructions allows one instruction block to hold more three-operand instructions, which both reduces the probability of instruction cache misses and improves efficiency.

Claims (20)

  1. A data processing method, characterized by comprising:
    determining whether an acquired instruction is a compressed instruction;
    if so, obtaining key information from the compressed instruction, the key information comprising an instruction repetition type and an instruction repetition count, wherein the instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition count is a positive integer greater than or equal to 2; and
    decompressing the compressed instruction according to the instruction repetition type and the instruction repetition count, so as to decompress the compressed instruction into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
  2. The method according to claim 1, characterized in that decompressing the compressed instruction according to the instruction repetition type and the instruction repetition count comprises:
    generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition count;
    when the updated instruction repetition count is determined to be greater than a preset threshold, updating the address ID corresponding to the operand;
    generating an instruction according to the updated address ID corresponding to the operand, and updating the instruction repetition count again;
    determining whether the re-updated instruction repetition count equals the preset threshold; and
    if so, determining that decompression of the compressed instruction has finished, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
  3. The method according to claim 1, characterized in that decompressing the compressed instruction according to the instruction repetition type and the instruction repetition count comprises:
    generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording a generation count of generated instructions;
    when the generation count is determined to be less than the instruction repetition count, updating the address ID corresponding to the operand;
    generating an instruction according to the updated address ID corresponding to the operand, and updating the generation count;
    determining whether the updated generation count equals the instruction repetition count; and
    if so, determining that decompression of the compressed instruction has finished, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
  4. The method according to claim 2 or 3, characterized in that updating the address ID corresponding to the operand comprises:
    updating the address ID corresponding to the operand according to the type of operand source that the address ID points to.
  5. The method according to claim 2 or 3, characterized in that updating the address ID corresponding to the operand comprises:
    updating the address ID corresponding to the operand according to the data type of the data stored in the operand source that the address ID points to.
  6. The method according to claim 2 or 3, characterized in that the operand in the instruction repetition type is a destination operand, the key information further comprises a destination pass-through (DF) field, and, before the address ID corresponding to the operand is updated, the method further comprises:
    determining that the value in the DF field is not a set threshold.
  7. The method according to claim 1, characterized in that, before the key information in the compressed instruction is obtained, the method further comprises:
    determining that the compressed instruction is valid.
  8. The method according to claim 7, characterized in that the compressed instruction comprises a repetition enable field and a repetition count field, and the step of determining that the compressed instruction is valid comprises:
    if the repetition enable field characterizing a source operand is not 0 and the address ID corresponding to the source operand points to a specified operand source type, determining that the compressed instruction is valid;
    if the repetition enable field characterizing a destination operand is not 0, determining that the compressed instruction is valid.
  9. The method according to any one of claims 1-8, characterized in that the compressed instruction comprises a repetition enable field and a repetition count field, and the step of obtaining the key information in the compressed instruction comprises:
    obtaining the instruction repetition type according to the repetition enable field;
    obtaining the instruction repetition count according to the repetition count field.
  10. The method according to claim 1, characterized in that the method further comprises:
    during decompression of the compressed instruction, issuing each instruction as soon as it is generated.
  11. A decoding circuit, characterized by comprising:
    a decoder configured to determine whether an acquired instruction is a compressed instruction and, if so, to obtain key information from the compressed instruction, the key information comprising an instruction repetition type and an instruction repetition count, wherein the instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition count is a positive integer greater than or equal to 2; and
    an instruction decompression module configured to decompress the compressed instruction according to the instruction repetition type and the instruction repetition count, so as to decompress the compressed instruction into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
  12. The decoding circuit according to claim 11, characterized in that the instruction decompression module comprises:
    a controller configured to obtain the address ID corresponding to the operand in the instruction repetition type;
    an instruction generator configured to generate an instruction according to the address ID corresponding to the operand in the instruction repetition type;
    the controller being further configured to update the instruction repetition count after the instruction generator generates an instruction according to the address ID corresponding to the operand in the instruction repetition type, and, when the updated instruction repetition count is determined to be greater than a preset threshold, to update the address ID corresponding to the operand and send the updated address ID to the instruction generator;
    the instruction generator being further configured to generate an instruction according to the updated address ID corresponding to the operand;
    the controller being further configured to update the instruction repetition count again after the instruction generator generates an instruction according to the updated address ID, and to determine whether the re-updated instruction repetition count equals the preset threshold; and, if so, to determine that decompression of the compressed instruction has finished, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
  13. The decoding circuit according to claim 11, characterized in that the instruction decompression module comprises:
    a controller configured to obtain the address ID corresponding to the operand in the instruction repetition type;
    an instruction generator configured to generate an instruction according to the address ID corresponding to the operand in the instruction repetition type;
    the controller being further configured to record a generation count of generated instructions after the instruction generator generates an instruction according to the address ID corresponding to the operand in the instruction repetition type, and, when the generation count is determined to be less than the instruction repetition count, to update the address ID corresponding to the operand and send the updated address ID to the instruction generator;
    the instruction generator being further configured to generate an instruction according to the updated address ID corresponding to the operand;
    the controller being further configured to update the generation count after the instruction generator generates an instruction according to the updated address ID, and to determine whether the updated generation count equals the instruction repetition count; and, if so, to determine that decompression of the compressed instruction has finished, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition count.
  14. The decoding circuit according to claim 12 or 13, characterized in that the controller is configured to update the address ID corresponding to the operand according to the type of operand source that the address ID points to.
  15. The decoding circuit according to claim 12 or 13, characterized in that the controller is configured to update the address ID corresponding to the operand according to the data type of the data stored in the operand source that the address ID points to.
  16. The decoding circuit according to claim 12 or 13, characterized in that the operand in the instruction repetition type is a destination operand, the key information further comprises a destination pass-through (DF) field, and the controller is further configured to determine, before updating the address ID corresponding to the operand, that the value in the DF field is not a set threshold.
  17. The decoding circuit according to claim 12 or 13, characterized in that the operand source type pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, and the instruction decompression module further comprises a configuration register, the configuration register being configured to store the address used to fetch a source operand from the LDS and, after the corresponding source operand has been read from the LDS according to the current address, to automatically update its stored address to the address of the next source operand;
    correspondingly, the controller is configured to update the address ID corresponding to the operand according to the address currently indicated by the configuration register, wherein the address ID corresponding to the operand is the same as the address currently indicated by the configuration register.
  18. The decoding circuit according to claim 11, characterized in that the decoder is further configured to determine that the compressed instruction is valid before obtaining the key information in the compressed instruction.
  19. The decoding circuit according to claim 11, characterized in that the instruction decompression module is further configured to, upon receiving the key information sent by the decoder, send the decoder an indication that blocks it from fetching instructions from the instruction distribution unit, and, upon determining that decompression of the compressed instruction has finished, send the decoder an indication that allows it to fetch instructions from the instruction distribution unit.
  20. A processor, characterized by comprising: an instruction distribution unit, an instruction execution unit, and the decoding circuit according to any one of claims 11-19, wherein the instruction distribution unit and the instruction execution unit are both connected to the decoding circuit.
PCT/CN2020/114004 2019-12-16 2020-09-08 Data processing method, decoding circuit, and processor WO2021120713A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911302511.8A CN111124495B (en) 2019-12-16 2019-12-16 Data processing method, decoding circuit and processor
CN201911302511.8 2019-12-16

Publications (2)

Publication Number Publication Date
WO2021120713A1 (en) 2021-06-24
WO2021120713A8 (en) 2021-08-05

Family

ID=70499328

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/114004 WO2021120713A1 (en) 2019-12-16 2020-09-08 Data processing method, decoding circuit, and processor

Country Status (2)

Country Link
CN (1) CN111124495B (en)
WO (1) WO2021120713A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124495B (en) * 2019-12-16 2021-02-12 海光信息技术股份有限公司 Data processing method, decoding circuit and processor
CN112929379B (en) * 2021-02-22 2023-03-24 深圳供电局有限公司 Intelligent recorder remote operation and maintenance instruction defense method and system
CN116225538A (en) * 2023-05-06 2023-06-06 苏州萨沙迈半导体有限公司 Processor and pipeline structure and instruction execution method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1684104A (en) * 2004-07-26 2005-10-19 威盛电子股份有限公司 Method and device for compressing and decompressing instrument in computer system
CN1735860A (en) * 2003-01-09 2006-02-15 国际商业机器公司 Method and apparatus for instruction compression
US20120110307A1 (en) * 2010-11-01 2012-05-03 Fujitsu Semiconductor Limited Compressed instruction processing device and compressed instruction generation device
US20160321076A1 (en) * 2015-04-28 2016-11-03 Intel Corporation Method and apparatus for speculative decompression
CN111124495A (en) * 2019-12-16 2020-05-08 海光信息技术有限公司 Data processing method, decoding circuit and processor

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1685310A (en) * 2002-09-24 2005-10-19 皇家飞利浦电子股份有限公司 Apparatus, method, and compiler enabling processing of load immediate instructions in a very long instruction word processor
US8281109B2 (en) * 2007-12-27 2012-10-02 Intel Corporation Compressed instruction format
CA2783829C (en) * 2009-12-11 2018-07-31 Aerial Robotics, Inc. Transparent network substrate system
US20120254592A1 (en) * 2011-04-01 2012-10-04 Jesus Corbal San Adrian Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location
US9672041B2 (en) * 2013-08-01 2017-06-06 Andes Technology Corporation Method for compressing variable-length instructions including PC-relative instructions and processor for executing compressed instructions using an instruction table
CN107729054B (en) * 2017-10-18 2020-07-24 珠海市杰理科技股份有限公司 Method and device for realizing execution of processor on loop body
CN111708574B (en) * 2020-05-28 2023-03-31 中国科学院信息工程研究所 Instruction stream compression and decompression method and device

Also Published As

Publication number Publication date
CN111124495A (en) 2020-05-08
WO2021120713A8 (en) 2021-08-05
CN111124495B (en) 2021-02-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20902110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20902110

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 280323)
