WO2021120713A1

WO2021120713A1 - Data processing method, decoding circuit, and processor

Info

Publication number: WO2021120713A1
Application number: PCT/CN2020/114004
Authority: WO
Inventors: 陈庆
Original assignee: 成都海光微电子技术有限公司
Priority date: 2019-12-16
Filing date: 2020-09-08
Publication date: 2021-06-24
Also published as: CN111124495A; WO2021120713A8; CN111124495B

Abstract

A data processing method, a decoding circuit, and a processor, which belong to the technical field of computers. The method comprises: determining whether an acquired instruction is a compressed instruction (S101); when the acquired instruction is a compressed instruction, acquiring key information in the compressed instruction, the key information comprising an instruction repetition type and an instruction repetition number (S102), wherein the instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2; and decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number (S103). By compressing instructions, one instruction block can accommodate more three-operand instructions, which effectively reduces the probability of an instruction cache miss while also optimizing efficiency.

Description

Data processing method, decoding circuit and processor

Cross-references to related applications

This application claims the priority of the Chinese patent application with the application number 2019113025118 and titled "A data processing method, decoding circuit and processor", filed with the Chinese Patent Office on December 16, 2019, the entire content of which is incorporated into this application by reference.

Technical field

This application belongs to the field of computer technology, and specifically relates to a data processing method, a decoding circuit, and a processor.

Background art

Computer instructions are commands that direct the work of a machine. A program is a series of instructions arranged in a certain order, and executing the program is the working process of the computer. When a computer executes an instruction (program), it first needs to read the instruction from the instruction cache (Cache). An instruction cache miss (Cache Miss) causes serious performance problems. For example, fetching the instruction takes a long time, which significantly lengthens the processing cycle of an instruction sequence and reduces performance. While an instruction is missing, the current instruction sequence is stopped and waiting. If there are not enough active instruction sequences, the entire computing unit may stall, which significantly reduces performance.

An instruction block is the collection of instructions in one cache line (Cache Line). Since each cache line is only 512 bits and a 3-operand operation instruction uses 64 bits, each cache line can store only 8 such instructions, so an instruction block can accommodate only 8 three-operand instructions. Processing large operations therefore requires reading thousands of instruction blocks, which is not conducive to power optimization.

Summary of the invention

The embodiments of this application are implemented as follows:

In the first aspect, an embodiment of the present application provides a data processing method, including: judging whether the acquired instruction is a compressed instruction; if yes, acquiring key information in the compressed instruction, the key information including an instruction repetition type and an instruction repetition number, wherein the instruction repetition type is used to indicate the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2; and decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number. In the embodiment of the present application, when the acquired instruction is a compressed instruction, the key information in the compressed instruction is acquired, and the compressed instruction is then decompressed, according to the instruction repetition type and the instruction repetition number in the key information, into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.

With reference to a possible implementation manner of the embodiment of the first aspect, decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number includes: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition number; when it is determined that the updated instruction repetition number is greater than a preset threshold, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID corresponding to the operand, and updating the instruction repetition number again; determining whether the re-updated instruction repetition number is equal to the preset threshold; and if yes, determining that the decompression of the compressed instruction is complete, and obtaining multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number. In the embodiment of the present application, when the compressed instruction is decompressed according to the instruction repetition type and the instruction repetition number, the instruction repetition number is updated after each instruction is generated, and it is determined whether the updated instruction repetition number is equal to the preset threshold. If not, the address ID corresponding to the operand is updated, an instruction is generated according to the updated address ID, the instruction repetition number is updated again, and the comparison is repeated, until the updated instruction repetition number equals the preset threshold, at which point the decompression of the compressed instruction is complete.

With reference to a possible implementation manner of the embodiment of the first aspect, decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number includes: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording the number of generations of the generated instruction; when it is determined that the number of generations is less than the instruction repetition number, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID corresponding to the operand, and updating the number of generations; determining whether the updated number of generations is equal to the instruction repetition number; and if yes, determining that the decompression of the compressed instruction is complete, and obtaining multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number. In the embodiment of the present application, when the compressed instruction is decompressed according to the instruction repetition type and the instruction repetition number, the number of generations is recorded after each instruction is generated, and it is judged whether the recorded number of generations is equal to the instruction repetition number. If not, the address ID corresponding to the operand is updated, an instruction is generated according to the updated address ID, the number of generations is updated, and the comparison is repeated, until the updated number of generations equals the instruction repetition number, at which point the compressed instruction has been decompressed. In this process, a counter records the number of generated instructions: it is updated after each instruction is generated, and when it equals the instruction repetition number, the decompression of the compressed instruction is complete.

With reference to a possible implementation manner of the embodiment of the first aspect, updating the address ID corresponding to the operand includes: updating the address ID corresponding to the operand according to the source type of the operand pointed to by the address ID corresponding to the operand. In the embodiment of the present application, the address ID corresponding to the operand is updated according to the operand source type that it points to, so that different operand source types can use different rules when the address ID is updated.

With reference to a possible implementation manner of the embodiment of the first aspect, updating the address ID corresponding to the operand includes: updating the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand. In the embodiment of the present application, the address ID corresponding to the operand is updated according to the data type of the data stored in the operand source that it points to, so that different data types can correspond to different update rules.

With reference to a possible implementation manner of the embodiment of the first aspect, the operand in the instruction repetition type is the destination operand, and the key information further includes a destination pass-through (DF) field; before updating the address ID corresponding to the operand, the method further includes: determining that the value in the destination pass-through DF field is not a set threshold.

With reference to a possible implementation manner of the embodiment of the first aspect, before obtaining the key information in the compressed instruction, the method further includes: determining that the compressed instruction is valid.

In the second aspect, an embodiment of the present application also provides a decoding circuit, including: a decoder and an instruction decompression module; the decoder is configured to determine whether the acquired instruction is a compressed instruction and, if yes, acquire key information in the compressed instruction, the key information including an instruction repetition type and an instruction repetition number, wherein the instruction repetition type is used to indicate the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2; the instruction decompression module is configured to decompress the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.

With reference to a possible implementation manner of the embodiment of the second aspect, the instruction decompression module includes: a controller configured to obtain an address ID corresponding to an operand in the instruction repetition type; and an instruction generator configured to generate an instruction according to the address ID corresponding to the operand in the instruction repetition type; the controller is further configured to update the instruction repetition number after the instruction generator generates the instruction according to the address ID corresponding to the operand in the instruction repetition type, and, when it is determined that the updated instruction repetition number is greater than a preset threshold, update the address ID corresponding to the operand and send the updated address ID corresponding to the operand to the instruction generator; the instruction generator is further configured to generate an instruction according to the updated address ID corresponding to the operand; the controller is further configured to, after the instruction generator generates the instruction according to the updated address ID corresponding to the operand, update the instruction repetition number again and determine whether the re-updated instruction repetition number is equal to the preset threshold; and if yes, determine that the decompression of the compressed instruction is complete, obtaining multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.

With reference to a possible implementation manner of the embodiment of the second aspect, the instruction decompression module includes: a controller configured to obtain an address ID corresponding to an operand in the instruction repetition type; and an instruction generator configured to generate an instruction according to the address ID corresponding to the operand in the instruction repetition type; the controller is further configured to record the number of generations of the generated instruction after the instruction generator generates the instruction according to the address ID corresponding to the operand in the instruction repetition type, and, when it is determined that the number of generations is less than the instruction repetition number, update the address ID corresponding to the operand and send the updated address ID corresponding to the operand to the instruction generator; the instruction generator is further configured to generate an instruction according to the updated address ID corresponding to the operand; the controller is further configured to, after the instruction generator generates the instruction according to the updated address ID corresponding to the operand, update the number of generations and determine whether the updated number of generations is equal to the instruction repetition number; and if yes, determine that the decompression of the compressed instruction is complete, obtaining multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.

With reference to a possible implementation manner of the embodiment of the second aspect, the controller is configured to update the address ID corresponding to the operand according to the source type of the operand pointed to by the address ID corresponding to the operand.

With reference to a possible implementation manner of the embodiment of the second aspect, the controller is configured to update the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand.

With reference to a possible implementation manner of the embodiment of the second aspect, the operand in the instruction repetition type is the destination operand, the key information further includes a destination pass-through (DF) field, and the controller is further configured to determine, before updating the address ID corresponding to the operand, that the value in the destination pass-through DF field is not a set threshold.

With reference to a possible implementation manner of the embodiment of the second aspect, the source type of the operand pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, and the instruction decompression module further includes a configuration register. The configuration register is configured to store the address of the source operand in the LDS and, after the corresponding source operand has been read from the LDS according to the current address, to automatically update its own address to the address corresponding to the next source operand. Accordingly, the controller is configured to update the address ID corresponding to the operand according to the address currently indicated by the configuration register, wherein the address ID corresponding to the operand is the same as the address currently indicated by the configuration register.
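The interaction between the configuration register and the controller can be sketched in C as follows. This is only an illustrative model, not the circuit itself: the names `ConfigRegister`, `lds_read_and_advance` and `controller_update_operand_id`, the `stride` member, and the array standing in for the LDS are all assumptions introduced here.

```c
#include <stdint.h>

/* Hypothetical software model of the configuration register. */
typedef struct {
    uint32_t address; /* current LDS address of the source operand */
    uint32_t stride;  /* assumed distance to the next source operand */
} ConfigRegister;

/* Read the operand at the current address from a toy LDS array, then
 * let the register advance itself to the next source operand. */
static uint32_t lds_read_and_advance(ConfigRegister *reg, const uint32_t *lds)
{
    uint32_t value = lds[reg->address];
    reg->address += reg->stride;
    return value;
}

/* Controller side: the operand's address ID mirrors the address that
 * the configuration register currently indicates. */
static uint32_t controller_update_operand_id(const ConfigRegister *reg)
{
    return reg->address;
}
```

In this model the controller never computes the next LDS address itself; it only copies whatever the register indicates after each read.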

With reference to a possible implementation manner of the embodiment of the second aspect, the decoder is further configured to determine that the compressed instruction is valid before obtaining the key information in the compressed instruction.

With reference to a possible implementation manner of the embodiment of the second aspect, the instruction decompression module is further configured to: when receiving the key information sent by the decoder, send to the decoder an instruction that prevents it from obtaining instructions from the instruction distribution unit; and when it is determined that the decompression of the compressed instruction is complete, send to the decoder an instruction that allows it to obtain instructions from the instruction distribution unit.

In a third aspect, an embodiment of the present application further provides a processor, including: an instruction distribution unit, an instruction execution unit, and the decoding circuit provided by the foregoing second aspect embodiment and/or any possible implementation manner in combination with the foregoing second aspect embodiment, wherein the instruction distribution unit and the instruction execution unit are both connected to the decoding circuit.

Other features and advantages of the present application will be described in the following description, and partly become obvious from the description, or can be understood by implementing the embodiments of the present application. The purpose and other advantages of the present application can be realized and obtained through the structure specifically pointed out in the written description and the drawings.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, without creative work, other drawings can be obtained from these drawings. The above and other objectives, features and advantages of the present application will be clearer through the drawings. The same reference numerals indicate the same parts in all the drawings. The drawings are not deliberately scaled to the actual size and proportions, and the focus is to show the main point of the application.

FIG. 1 shows a schematic diagram of each field in a VOP3R instruction provided by an embodiment of the present application.

FIG. 2 shows a schematic structural diagram of a decoding circuit provided by an embodiment of the present application.

FIG. 3 shows a schematic structural diagram of another decoding circuit provided by an embodiment of the present application.

FIG. 4 shows a schematic structural diagram of another decoding circuit provided by an embodiment of the present application.

FIG. 5 shows a schematic flowchart of a data processing method provided by an embodiment of the present application.

FIG. 6 shows a schematic structural diagram of a processor provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application.

It should be noted that similar reference numerals and letters indicate similar items in the following figures. Therefore, once an item is defined in one figure, it does not need to be further defined or explained in subsequent figures. In addition, in the description of this application, relational terms such as "first" and "second" are only used to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article or device. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

Furthermore, the term "and/or" in this application only describes an association relationship between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone.

As noted above, a cache line can store only 8 three-operand arithmetic instructions, so an instruction block can accommodate only 8 three-operand instructions. This is insufficient both for avoiding instruction cache misses (Cache Miss) and for power optimization. Therefore, the embodiments of this application provide an efficient instruction compression method that can compress 64 3-operand instructions into 64 bits, so that each cache line can store up to 512 3-operand instructions, which not only improves computing performance but also significantly reduces the number of instruction cache misses.
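The capacity figures above follow from simple arithmetic, which the following C sketch spells out. The constant and function names are assumptions chosen for illustration; the numeric values are the ones stated in the text.

```c
#include <stdint.h>

enum {
    CACHE_LINE_BITS  = 512, /* size of one cache line */
    INSTRUCTION_BITS = 64,  /* size of one 3-operand instruction */
    MAX_EXPANSION    = 64   /* instructions represented by one compressed instruction */
};

/* Number of 64-bit instruction slots in one cache line. */
static uint32_t slots_per_cache_line(void)
{
    return CACHE_LINE_BITS / INSTRUCTION_BITS;
}

/* Upper bound on 3-operand instructions per cache line when every
 * slot holds a compressed instruction that expands to 64 instructions. */
static uint32_t max_instructions_per_cache_line(void)
{
    return slots_per_cache_line() * MAX_EXPANSION;
}
```

With these values, `slots_per_cache_line()` evaluates to 8 and `max_instructions_per_cache_line()` to 512, matching the figures in the description.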

In order to support the compression of 64 3-operand instructions into 64 bits, this application introduces a VOP3R (Vector Operation with 3 Operand and Repeat) instruction, whose type is set to "110010"; that is, 110010 indicates that the instruction is a VOP3R instruction, as shown in FIG. 1. The VOP3R instruction defines the special fields shown in Table 1.

Table 1

It should be noted that the number of bits (bit width) of each field in Table 1 is relatively fixed, while its position can be changed. For example, Repeat_Enable need not occupy bits [62:59]; it can instead occupy bits [3:0]. The same applies to the other fields.

Repeat_Enable is the repeat enable field (4 bits). Each bit indicates whether one of the source operands (Operand0, Operand1, Operand2) or the destination operand (also called Result) is repeated: B[59:59] (or B[0:0]): repeat Operand0; B[60:60] (or B[1:1]): repeat Operand1; B[61:61] (or B[2:2]): repeat Operand2; B[62:62] (or B[3:3]): repeat the destination. It should be noted that repetition applies only when the source operand comes from a Vector General Purpose Register (VGPR), a Scalar General Purpose Register (SGPR) or Local Data Share (LDS_DIRECT), and when the destination operand comes from a VGPR/SGPR; other cases are directly ignored.
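As an illustration of the bit assignment just described, the sketch below decodes a 4-bit Repeat_Enable value that has already been extracted from the instruction (that is, in its [3:0] form). The mask names, the `RepeatFlags` type and the function name are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* One mask per Repeat_Enable bit, in the [3:0] form of the field. */
enum {
    REPEAT_OPERAND0 = 0x1, /* bit 0: repeat Operand0 */
    REPEAT_OPERAND1 = 0x2, /* bit 1: repeat Operand1 */
    REPEAT_OPERAND2 = 0x4, /* bit 2: repeat Operand2 */
    REPEAT_RESULT   = 0x8  /* bit 3: repeat the destination (Result) */
};

typedef struct {
    bool operand0;
    bool operand1;
    bool operand2;
    bool result;
} RepeatFlags;

/* Split the 4-bit field into one flag per operand. */
static RepeatFlags decode_repeat_enable(uint8_t repeat_enable)
{
    RepeatFlags flags;
    flags.operand0 = (repeat_enable & REPEAT_OPERAND0) != 0;
    flags.operand1 = (repeat_enable & REPEAT_OPERAND1) != 0;
    flags.operand2 = (repeat_enable & REPEAT_OPERAND2) != 0;
    flags.result   = (repeat_enable & REPEAT_RESULT) != 0;
    return flags;
}
```

For example, `decode_repeat_enable(0x3)` sets only the Operand0 and Operand1 flags. The sketch does not model the rule that repetition is ignored for operands outside VGPR/SGPR/LDS_DIRECT.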

To support this kind of instruction repetition in hardware, an embodiment of the present application provides a decoding circuit, as shown in FIG. 2. After the decoding circuit obtains an instruction from the instruction dispatch unit (Instruction Dispatch), it determines whether the instruction is a compressed instruction. If not, the decoding circuit sends the instruction directly to the instruction execution unit (Instruction Execution), which executes it. If so, the decoding circuit obtains the key information in the compressed instruction and then decompresses the compressed instruction, according to the instruction repetition type and the instruction repetition number in the key information, into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.

The key information includes the instruction repetition type and the instruction repetition number. The instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2. The instruction repetition type is obtained from the repeat enable field (Repeat_Enable) in the compressed instruction, and the instruction repetition number is obtained from the repeat count field (Repeat_Counter). Whether the current instruction is a compressed instruction can be judged from the Repeat_Counter field: if Repeat_Counter != 0x0, it is a compressed instruction; if Repeat_Counter == 0x0 (hexadecimal 0), it is a non-compressed instruction. The detailed parameters of the key information are shown in Table 2.

Table 2

Field	Number of bits
Operation_code	10
Repeat_Counter	6
Result_ID	8
Repeat_Enable	4
Operand2_ID	9
Operand1_ID	9
Operand0_ID	9
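The key information in Table 2 and the compressed-instruction test can be modelled as in the C sketch below. The `KeyInfo` type and `is_compressed` function are hypothetical names. The bit-field widths follow Table 2, but the member order and packing are chosen by the compiler and do not represent the actual positions of the fields in the 64-bit instruction.

```c
#include <stdbool.h>

/* Key information with the bit widths listed in Table 2.
 * Member order here is illustrative, not the instruction encoding. */
typedef struct {
    unsigned int operation_code : 10;
    unsigned int repeat_counter : 6;
    unsigned int result_id      : 8;
    unsigned int repeat_enable  : 4;
    unsigned int operand2_id    : 9;
    unsigned int operand1_id    : 9;
    unsigned int operand0_id    : 9;
} KeyInfo;

/* An instruction is treated as compressed when Repeat_Counter != 0x0. */
static bool is_compressed(const KeyInfo *info)
{
    return info->repeat_counter != 0x0;
}
```

A 6-bit Repeat_Counter can hold values from 0 to 63, so in this model the value 0 is reserved for non-compressed instructions.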

To facilitate understanding, a specific example is given. Suppose the compressed instruction is:

Repeat Enable(0x3), Repeat Counter(62)::

Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;

Decompressing this compressed instruction according to the instruction repetition type and the instruction repetition number yields 62 instructions, that is, instructions that correspond to the instruction repetition type (repeat Operand0 and Operand1) and whose number equals the instruction repetition number (62). The obtained instructions are as follows:

Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;

...

Forwarding=LDS_Direct(M0_register)*B(61,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(62,ALU_Index)+Forwarding;

Here, Repeat Enable represents the instruction repetition type, and 0x3 indicates that the two operands Operand0 and Operand1 are repeated; Repeat Counter represents the instruction repetition number, here 62, so that decompressing the compressed instruction yields 62 instructions. It should be noted that repeating Operand0 and Operand1 is only an example. The instruction type to be repeated can be any combination of at least one of the four operands Result (destination operand), Operand0, Operand1 and Operand2, giving 15 combinations in total. Different values are defined to indicate different repetition types. For example, Repeat Enable(0x1) indicates repeating Operand0, Repeat Enable(0x2) indicates repeating Operand1, and Repeat Enable(0x3) indicates repeating both Operand0 and Operand1.
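The listing in this example can be reproduced with a short C sketch that formats the expanded instructions as text. It only mirrors the printed form of the example: the function names are assumptions, and the index inside B(...) is simply the 1-based position of the instruction in the expansion, as in the listing.

```c
#include <stdio.h>

enum { REPEAT_COUNTER = 62 }; /* instruction repetition number of the example */

/* Write the step-th expanded instruction (1-based) into buf.
 * Returns the value of snprintf. */
static int format_expanded_instruction(int step, char *buf, size_t size)
{
    return snprintf(buf, size,
                    "Forwarding=LDS_Direct(M0_register)*B(%d,ALU_Index)+Forwarding;",
                    step);
}

/* Print the whole expansion and return how many lines were emitted. */
static int print_expansion(void)
{
    char line[96];
    int emitted = 0;
    for (int step = 1; step <= REPEAT_COUNTER; ++step) {
        format_expanded_instruction(step, line, sizeof line);
        puts(line);
        ++emitted;
    }
    return emitted;
}
```

The text of Operand0, LDS_Direct(M0_register), is the same in every line of the listing, so the sketch varies only the index of Operand1.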

Because instructions are divided into regular instructions (single instructions) and compressed instructions, the instruction logic of the corresponding hardware includes a regular mode and a repeat mode. Repeat_Count == 0 indicates the regular mode, in which the execution logic obtains instructions from the instruction distribution unit and executes them. Repeat_Count != 0 indicates the repeat mode, in which the decoding circuit stops fetching instructions from the instruction distribution unit. When the decoding circuit completes the decompression of the compressed instruction, that is, when Repeat_Count == 0, it switches back to the regular mode.

With the foregoing decoding circuit, instructions can be compressed so that one instruction block accommodates more three-operand instructions, which not only effectively reduces the probability of instruction cache misses but also optimizes efficiency.
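The switching between the regular mode and the repeat mode can be pictured as a two-state machine. The C sketch below is a simplified software model under that assumption; `DecoderState`, `accept_instruction`, `on_instruction_generated` and `may_fetch` are hypothetical names, not part of the described circuit.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODE_REGULAR, MODE_REPEAT } DecodeMode;

typedef struct {
    DecodeMode mode;
    uint8_t repeat_count; /* remaining repetitions in repeat mode */
} DecoderState;

/* A newly fetched instruction selects the mode from its repeat count. */
static void accept_instruction(DecoderState *state, uint8_t repeat_counter)
{
    state->repeat_count = repeat_counter;
    state->mode = (repeat_counter != 0) ? MODE_REPEAT : MODE_REGULAR;
}

/* After each generated instruction in repeat mode, count down and
 * fall back to regular mode once the count reaches zero. */
static void on_instruction_generated(DecoderState *state)
{
    if (state->mode == MODE_REPEAT && --state->repeat_count == 0)
        state->mode = MODE_REGULAR;
}

/* Fetching from the instruction distribution unit is allowed only
 * in regular mode. */
static bool may_fetch(const DecoderState *state)
{
    return state->mode == MODE_REGULAR;
}
```

In this model, a compressed instruction with a repeat count of 2 blocks fetching until `on_instruction_generated` has been called twice.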

As one implementation manner, the process by which the decoding circuit decompresses the compressed instruction according to the instruction repetition type and the instruction repetition number may be: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition number; when it is determined that the updated instruction repetition number is greater than the preset threshold, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID corresponding to the operand, and updating the instruction repetition number again; and determining whether the re-updated instruction repetition number is equal to the preset threshold. If yes, the decompression of the compressed instruction is complete, and multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number are obtained. If not, the operation is repeated (update the address ID corresponding to the operand; generate an instruction according to the updated address ID corresponding to the operand, and update the instruction repetition number again; determine whether the updated instruction repetition number is equal to the preset threshold) until the updated instruction repetition number equals the preset threshold.

The code for this process is as follows:

if (repeat_count != 0x0)
{
    // Repeat mode: each pass generates one instruction, as follows.
    Operand0_id = OperandRepeat(Operand0_id, Repeat_Enable & 0x1); // address update for Operand0
    Operand1_id = OperandRepeat(Operand1_id, Repeat_Enable & 0x2); // address update for Operand1
    Operand2_id = OperandRepeat(Operand2_id, Repeat_Enable & 0x4); // address update for Operand2
    Result_ID = OperandRepeat(Result_ID, Repeat_Enable & 0x8); // address update for Result
    GenerateRepeatInstruction(Result_ID, Operand0_id, Operand1_id, Operand2_id); // generate an instruction from the new addresses
    repeat_count--; // update the instruction repetition number
    if (repeat_count == 0)
    {
        Exit; // decompression of the compressed instruction is complete
    }
}

In this implementation, that is, when the instruction is generated for the first time, the instruction is generated according to the address ID included in the compressed instruction, as in the above example, "Forwarding=LDS_Direct(M0_register)*B(1, ALU_Index) +Forwarding” this instruction is generated based on the default address ID (address 1) in the compressed instruction, and then update the instruction repetition number (the instruction repetition number at this time is 61), after confirming that the updated instruction repetition number (61) is greater than When the preset threshold (such as 0), the address ID (address 2) corresponding to the operand is updated, the instruction is generated according to the updated address ID, and the instruction repetition times are updated again, and then it is judged whether the updated instruction repetition times is equal to the preset threshold value If yes, update the address ID corresponding to the operand again, generate instructions based on the updated address ID, and update the number of instruction repetitions again (the number of instruction repetitions at this time is 60), and then determine the updated instruction repetition number (60 ) Is equal to the preset threshold, if it is still greater than, repeat the above operation (update the address ID corresponding to the operand, generate the instruction according to the updated address ID, and update the instruction repetition times again, and then judge whether the updated instruction repetition times Equal to the preset threshold), until the updated instruction repetition number (0) equals the preset threshold (such as 0), it ends. When the updated instruction repetition number is equal to the preset threshold, 62 instructions of the operand are obtained. That is to complete the decompression of the compression command.
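The countdown procedure described above can be sketched as a self-contained C function. This is a minimal model under stated assumptions, not the hardware implementation: the `Instruction` type, `operand_repeat` and `decompress_countdown` are hypothetical names, and the update rule (an enabled operand's address ID advances by one) is assumed purely for illustration, since the actual rule depends on the operand source type and data type.

```c
#include <stddef.h>
#include <stdint.h>

/* Repeat_Enable bit masks, as used in the pseudocode above. */
enum {
    REPEAT_OPERAND0 = 0x1,
    REPEAT_OPERAND1 = 0x2,
    REPEAT_OPERAND2 = 0x4,
    REPEAT_RESULT   = 0x8
};

typedef struct {
    uint16_t result_id;
    uint16_t operand0_id;
    uint16_t operand1_id;
    uint16_t operand2_id;
} Instruction;

/* Assumed update rule (hypothetical): an enabled operand advances by one. */
static uint16_t operand_repeat(uint16_t id, unsigned enabled)
{
    return enabled ? (uint16_t)(id + 1u) : id;
}

/* Expand `first` into out[] by counting repeat_count down to the
 * threshold. Returns the number of instructions written. */
static size_t decompress_countdown(Instruction first, uint8_t repeat_enable,
                                   uint8_t repeat_count,
                                   Instruction *out, size_t capacity)
{
    const uint8_t threshold = 0; /* preset threshold from the text */
    Instruction cur = first;
    size_t written = 0;          /* output index only, not a loop counter */

    if (repeat_count == threshold)
        return 0;                /* not a compressed instruction */

    while (written < capacity) {
        out[written++] = cur;    /* generate from the current address IDs */
        repeat_count--;          /* update the instruction repetition number */
        if (repeat_count == threshold)
            break;               /* decompression is complete */
        cur.operand0_id = operand_repeat(cur.operand0_id, repeat_enable & REPEAT_OPERAND0);
        cur.operand1_id = operand_repeat(cur.operand1_id, repeat_enable & REPEAT_OPERAND1);
        cur.operand2_id = operand_repeat(cur.operand2_id, repeat_enable & REPEAT_OPERAND2);
        cur.result_id   = operand_repeat(cur.result_id, repeat_enable & REPEAT_RESULT);
    }
    return written;
}
```

With Repeat_Enable 0x3 and a repetition number of 62, the function writes 62 instructions; Operand1 runs from its initial address ID to that value plus 61, while Operand2 and Result stay unchanged. Termination depends only on `repeat_count`; the `capacity` check merely guards the output buffer.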

Throughout this process, the decoding circuit needs no other component (such as a counter) to determine whether decompression of the compressed instruction is complete; it only has to update the number of instruction repetitions after each instruction is generated. This keeps the result accurate while simplifying the processing flow and saving cost.

As yet another implementation manner, the process in which the decoding circuit decompresses the compressed instruction according to the instruction repetition type and the number of instruction repetitions may be: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and record the number of generated instructions; when it is determined that the number of generated instructions is less than the number of instruction repetitions, update the address ID corresponding to the operand; generate an instruction according to the updated address ID, and update the number of generated instructions; determine whether the updated number of generated instructions is equal to the number of instruction repetitions. If yes, the decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the number of instruction repetitions. Otherwise, the operation is repeated (update the address ID corresponding to the operand, generate an instruction according to the updated address ID, update the number of generated instructions, and determine whether it is equal to the number of instruction repetitions) until the updated number of generated instructions equals the number of instruction repetitions.

The principle of this embodiment is the same as that of the previous one. The difference is that in the previous embodiment, after an instruction is generated, the number of instruction repetitions is updated, and decompression is judged complete when the updated number equals the preset threshold (for example, 0). In this embodiment, after an instruction is generated, the number of generated instructions is recorded, and decompression is judged complete when that number equals the number of instruction repetitions. That is, this embodiment uses a counter to count the generated instructions: each time an instruction is generated, the counter is incremented by one, and it is determined whether the recorded count equals the number of instruction repetitions (62); if not, instruction generation continues. This provides a further feasible way of completing the decompression and broadens the applicability of the scheme.
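The two ways of detecting the end of decompression can be compared side by side. The following is a minimal C sketch, not the circuit itself: the `Instr` record, the function names, and the choice to advance only the Operand1 address ID by one per repetition are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one generated instruction; only the address IDs
   matter for this sketch. */
typedef struct {
    unsigned result_id;
    unsigned operand_id[3];
} Instr;

/* Variant 1: count the number of instruction repetitions down to the preset
   threshold (0). No separate counter is needed.
   Returns the number of instructions written to out. */
size_t expand_countdown(Instr first, unsigned repeat_count, Instr *out, size_t cap)
{
    size_t n = 0;
    Instr cur = first;
    assert(repeat_count >= 2);        /* the repetition number is at least 2 */
    while (n < cap) {
        out[n++] = cur;               /* generate from the current address IDs */
        repeat_count--;               /* update the number of repetitions */
        if (repeat_count == 0)        /* equals the preset threshold: done */
            break;
        cur.operand_id[1]++;          /* assumed update rule: Operand1_ID + 1 */
    }
    return n;
}

/* Variant 2: keep the number of repetitions fixed and count generated
   instructions upward with a separate counter. */
size_t expand_countup(Instr first, unsigned repeat_count, Instr *out, size_t cap)
{
    size_t generated = 0;             /* the counter of generated instructions */
    Instr cur = first;
    assert(repeat_count >= 2);
    while (generated < cap) {
        out[generated++] = cur;       /* generate and count once */
        if (generated == repeat_count)
            break;                    /* count equals the repetition number */
        cur.operand_id[1]++;          /* assumed update rule: Operand1_ID + 1 */
    }
    return generated;
}
```

With a repetition number of 62 and a first Operand1 address ID of 1, both functions write 62 instructions whose Operand1 address IDs run from 1 to 62; they differ only in which value is compared against which.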

During decompression of the compressed instruction, each instruction is issued to the instruction execution unit as soon as it is generated.

The operand in the above instruction repetition type may be at least one of the four operands Result, Operand0, Operand1, and Operand2. In one implementation, the address ID corresponding to an operand is updated according to the operand source type that the address ID points to (such as VGPR/SGPR/LDS_DIRECT), and the update rule may differ between source types. For example, the rule for updating an address ID whose operand source is VGPR may differ from the rule for one whose operand source is SGPR.

The following example assumes that the update rule is the same for VGPR and SGPR operand sources. When the operand source pointed to by the address ID corresponding to the operand is VGPR/SGPR, the address ID can be updated by the rule Operand_ID++ (or Result_ID++), that is, the updated address equals the address before the update plus one. For ease of understanding, take Operand1 as an example. If Operand1_ID points to VGPR/SGPR, it is updated on each repetition as follows:

if(Operand1_ID is SGPR or VGPR)

{

Operand1_ID=((Repeat_Enable&0x8)!=0)?(Operand1_ID+1):Operand1_ID;//Add 1 only when the repeat enable bit of Operand1 is set;

}

That is, if the operand source pointed to by Operand1_ID is VGPR/SGPR and Repeat_Enable[60] is 1, the address of operand 1 (Operand1_ID) is increased by 1; if Repeat_Enable[60] is 0, the address of operand 1 remains unchanged. Note that the above example uses address self-increment with an increment of 1 only as an illustration. The address may instead self-decrement, and the step need not be 1; this mainly depends on whether the data is stored in increasing or decreasing address order, whether it is stored contiguously, and so on. Therefore, this example should not be understood as a limitation of the application.
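The register-source update rule, including the decrement and non-unit step cases mentioned above, can be sketched in C as follows. The `SrcType` enumeration and the function name are hypothetical; the enable mask is passed in by the caller rather than fixed here, so the sketch makes no claim about how the repeat enable bits are numbered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the operand source type an address ID points to. */
typedef enum { SRC_VGPR, SRC_SGPR, SRC_LDS_DIRECT, SRC_OTHER } SrcType;

/* Returns the address ID to use for the next generated instruction.
   The ID changes only if the source is VGPR/SGPR and the operand's repeat
   enable bit (selected by enable_mask) is set. step is +1 in the example
   above; a negative step models address self-decrement. */
unsigned next_register_id(unsigned id, SrcType src, unsigned repeat_enable,
                          unsigned enable_mask, int step)
{
    bool is_register = (src == SRC_VGPR || src == SRC_SGPR);
    assert(step != 0);                    /* a zero step would never advance */
    if (is_register && (repeat_enable & enable_mask) != 0)
        return id + (unsigned)step;       /* unsigned wraparound makes a
                                             negative step subtract */
    return id;                            /* disabled or non-register source */
}
```

For example, `next_register_id(5, SRC_VGPR, 0x8, 0x8, 1)` yields 6, while the same call with a repeat enable value of 0 leaves the ID at 5.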

When the operand source pointed to by the address ID corresponding to the operand is LDS_DIRECT, the rule for updating the address ID differs from the VGPR/SGPR case. In this mode, the hardware reads data from the LDS as the operand, and the access address and data type are determined by a configuration register, such as the M0 register (a 32-bit dedicated hardware-internal register whose low 16 bits are used as the address by LDS_DIRECT). The 32-bit definition of the M0 register is shown in Table 3.

Table 3

Therefore, when the source operand comes from LDS_DIRECT, updating the address ID means automatically updating the address field of the M0 register. Correspondingly, the address pointed to by the address ID is the address stored in the M0 register, and this address is used to read the source operand stored in the LDS. That is, the M0 register is configured to store the address in the LDS of the source operand (such as an element of a row of a matrix), and after the corresponding element is read from the LDS according to the current address, the address in the M0 register is updated to the address of the next element.

As yet another implementation manner, in addition to updating the address ID according to the operand source type, the address ID corresponding to the operand can also be updated according to the data type of the data stored in the operand source pointed to by that address ID. Different data types correspond to different address update rules, for example:

Address_{i+1} = Address_i + 0x1; //The data type is unsigned byte;

Address_{i+1} = Address_i + 0x2; //The data type is unsigned short;

Address_{i+1} = Address_i + 0x4; //The data type is DWord;

Address_{i+1} = Address_i + 0x0; //The data type is Default (Reserved);

Address_{i+1} = Address_i + 0x1; //The data type is signed byte;

Address_{i+1} = Address_i + 0x2; //The data type is signed short;

Address_{i+1} = Address_i + 0x8; //The data type is QWord;

Take the case where the operand source pointed to by the address ID corresponding to the operand is LDS_DIRECT as an example. The address field of the M0 register is automatically updated, and the data type of the data stored in the LDS is also taken into account. If the data type is unsigned byte, the address is updated according to the rule Address_{i+1} = Address_i + 0x1.
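The data-type-dependent update of the M0 address field can be sketched in C as follows. This is an illustration under stated assumptions: the `DataType` enumeration values are hypothetical (the encoding in Table 3 is not reproduced here), the 0x2 stride is assumed to apply to both 16-bit types, and the address is assumed to wrap within the low 16 bits while the upper 16 bits of M0 are left untouched.

```c
#include <stdint.h>

/* Hypothetical names for the data types listed above; the numeric encoding
   used by the hardware is not assumed. */
typedef enum {
    DT_UNSIGNED_BYTE, DT_UNSIGNED_SHORT, DT_DWORD, DT_DEFAULT,
    DT_SIGNED_BYTE, DT_SIGNED_SHORT, DT_QWORD
} DataType;

/* Address increment, in bytes, for one element of the given data type. */
uint32_t stride_for(DataType t)
{
    switch (t) {
    case DT_UNSIGNED_BYTE:
    case DT_SIGNED_BYTE:    return 0x1;
    case DT_UNSIGNED_SHORT:
    case DT_SIGNED_SHORT:   return 0x2;
    case DT_DWORD:          return 0x4;
    case DT_QWORD:          return 0x8;
    case DT_DEFAULT:        return 0x0;   /* reserved: address unchanged */
    }
    return 0x0;
}

/* Advances the low 16-bit address field of a 32-bit M0 value to the next
   element and returns the new M0 value. */
uint32_t m0_advance(uint32_t m0, DataType t)
{
    uint32_t addr = (m0 & 0xFFFFu) + stride_for(t);
    return (m0 & 0xFFFF0000u) | (addr & 0xFFFFu);   /* assumed 16-bit wrap */
}
```

For a DWord operand, an M0 value of 0x00010010 becomes 0x00010014 after one read.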

When the operand is the destination operand (Result), before the address is updated it is also necessary to ensure that the operand source type pointed to by the address ID corresponding to the destination operand is not a temporary register used for data pass-through (forwarding). The destination pass-through (DF) field is used for this determination. When DF == 1, Result_ID is forwarding: the address does not need to be updated, and every instruction is generated with the default Result_ID in the compressed instruction, so the Result_ID is the same in all generated instructions. Conversely, if DF is not 1, the operand source type pointed to by the address ID of the destination operand is not a pass-through temporary register (it is, for example, VGPR/SGPR), and the address can be updated in the manner described above.

In the foregoing embodiment, when the operand is the destination operand, it must be determined before the address ID corresponding to the operand is updated that the value in the destination pass-through (DF) field is not the set threshold (such as 1), so that data pass-through is not affected.
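The destination-operand check can be expressed as a small guard placed in front of the address update. In this C sketch the function name and the boolean parameter are hypothetical, and the plus-one rule is the illustrative VGPR/SGPR rule from earlier rather than a fixed requirement.

```c
#include <stdbool.h>

/* Returns the Result_ID for the next generated instruction.
   df is the destination pass-through field; result_repeat_enabled says
   whether the repeat enable bit of the destination operand is set. */
unsigned next_result_id(unsigned result_id, unsigned df,
                        bool result_repeat_enabled)
{
    if (df == 1)
        return result_id;        /* forwarding: keep the default Result_ID */
    if (!result_repeat_enabled)
        return result_id;        /* destination is not being repeated */
    return result_id + 1;        /* assumed rule: Result_ID + 1 */
}
```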

To improve efficiency and avoid wasting resources on decompressing an erroneous compressed instruction, the decoding circuit can also determine whether the compressed instruction is valid before obtaining the key information in it. Only after the compressed instruction is determined to be valid does the decoding circuit obtain the key information and decompress the compressed instruction according to the instruction repetition type and the number of instruction repetitions in that key information.

As an implementation manner, whether the compressed instruction is valid can be determined from the repeat enable field that characterizes a source operand, or the repeat enable field that characterizes the destination operand, in the compressed instruction. The compressed instruction is valid when the repeat enable field of a source operand is not zero and the address ID corresponding to that source operand points to a specified operand source type (such as VGPR/SGPR/LDS_DIRECT), or when the repeat enable field of the destination operand is not zero. That is, the compressed instruction is valid if at least one of the following is true:

if (Repeat_Enable[59:59] != 0x0) and Operand0_ID is VGPR/SGPR/LDS_DIRECT;

if (Repeat_Enable[60:60] != 0x0) and Operand1_ID is VGPR/SGPR/LDS_DIRECT;

if (Repeat_Enable[61:61] != 0x0) and Operand2_ID is VGPR/SGPR/LDS_DIRECT;

if (Repeat_Enable[62:62] != 0x0);

That is, the compressed instruction is valid when the repeat enable field of at least one source operand is not zero and the corresponding address ID points to a specified operand source type, or when the repeat enable field of the destination operand is not zero.
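The validity conditions above can be written as one predicate. The following C sketch assumes that the compressed instruction is held in a 64-bit word and that the indices 59 to 62 are bit positions within that word; the `SrcType` enumeration and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the operand source type an address ID points to. */
typedef enum { SRC_VGPR, SRC_SGPR, SRC_LDS_DIRECT, SRC_OTHER } SrcType;

/* Tests one bit of the instruction word. */
static bool bit(uint64_t word, unsigned pos)
{
    return ((word >> pos) & 1u) != 0;
}

/* True for the source types that a repeated source operand may point to. */
static bool repeatable_source(SrcType s)
{
    return s == SRC_VGPR || s == SRC_SGPR || s == SRC_LDS_DIRECT;
}

/* src[i] is the source type that Operand<i>_ID points to.
   Bits 59..61 enable repetition of Operand0..Operand2; bit 62 enables
   repetition of the destination operand (assumed bit positions). */
bool compressed_instruction_valid(uint64_t instr, const SrcType src[3])
{
    for (unsigned i = 0; i < 3; i++)
        if (bit(instr, 59 + i) && repeatable_source(src[i]))
            return true;         /* an enabled source operand qualifies */
    return bit(instr, 62);       /* otherwise only the destination counts */
}
```

An instruction with only bit 59 set is valid when Operand0_ID points to a VGPR, but not when it points to any other kind of source.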

The above description is from the perspective of the decoding circuit as a whole. To make the information interaction between the components of the decoding circuit easier to understand, the steps performed by each component are described below. As shown in FIG. 1, the decoding circuit includes a repeat decoder and an instruction decompression module, and the decoder is connected to the instruction decompression module.

Wherein, the decoder is configured to determine whether the acquired instruction is a compressed instruction, and if it is not, it sends the instruction to the instruction execution unit to execute the instruction, and if it is, it acquires key information in the compressed instruction.

To improve efficiency, in some possible implementations the decoder is further configured to determine that the compressed instruction is valid before obtaining the key information in it. As an implementation manner, the decoder makes this determination from the repeat enable field that characterizes a source operand, or the repeat enable field that characterizes the destination operand, in the compressed instruction: the compressed instruction is valid when the repeat enable field of a source operand is not zero and the address ID corresponding to that source operand points to a specified operand source type, or when the repeat enable field of the destination operand is not zero.

The instruction decompression module is configured to decompress the compressed instruction according to the instruction repetition type and the number of instruction repetitions, so as to decompress the compressed instruction into multiple instructions corresponding to the instruction repetition type and the same number of instruction repetitions.

In some possible implementations, the instruction decompression module is also configured to, upon receiving the key information sent by the decoder, send the decoder an indication that prevents it from obtaining instructions from the instruction distribution unit, and, when decompression of the compressed instruction is complete, send the decoder an indication that allows it to obtain instructions from the instruction distribution unit.

In one implementation, as shown in FIG. 3, the instruction decompression module includes: a controller and an instruction generator. The controller is respectively connected with the instruction generator and the decoder.

In some possible implementations, the controller is configured to obtain the address ID corresponding to the operand in the instruction repetition type, and the instruction generator is configured to generate an instruction according to that address ID. The controller is also configured to update the number of instruction repetitions after the instruction generator generates the instruction, to update the address ID corresponding to the operand when it determines that the updated number is greater than a preset threshold, and to send the updated address ID to the instruction generator. The instruction generator is also configured to generate an instruction according to the updated address ID. The controller is also configured to update the number of instruction repetitions again after that instruction is generated, and to determine whether the re-updated number is equal to the preset threshold; if yes, the decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the number of instruction repetitions.

In yet another embodiment, the controller is configured to obtain the address ID corresponding to the operand in the instruction repetition type, and the instruction generator is configured to generate an instruction according to that address ID. The controller is also configured to record the number of generated instructions after the instruction generator generates the instruction, to update the address ID corresponding to the operand when it determines that the number of generated instructions is less than the number of instruction repetitions, and to send the updated address ID to the instruction generator. The instruction generator is also configured to generate an instruction according to the updated address ID. The controller is also configured to update the number of generated instructions after that instruction is generated, and to determine whether the updated number is equal to the number of instruction repetitions; if yes, the decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the number of instruction repetitions.

In an implementation manner, when the controller updates the address ID corresponding to the operand, it is further configured to update the address ID corresponding to the operand according to the source type of the operand pointed to by the address ID corresponding to the operand.

In one embodiment, when the controller updates the address ID corresponding to the operand, it is also configured to update the address ID according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand.

In some possible implementations, the operand in the instruction repetition type is the destination operand, and the key information also includes the destination pass-through (DF) field. The controller is also configured to determine, before updating the address ID corresponding to the operand, that the value in the DF field is not the set threshold (such as 1).

In some possible implementations, the controller is also configured to send the decoder an indication that prevents the decoder from obtaining instructions from the instruction distribution unit while the compressed instruction is being decompressed; during this time the decoder does not obtain instructions from the instruction distribution unit. When decompression is complete, the controller sends the decoder an indication that allows it to obtain instructions from the instruction distribution unit again. That is, the controller has a normal mode and a repeat mode. In normal mode (Repeat_Count == 0), the controller allows the decoder to obtain instructions from the instruction distribution unit and have them executed. In repeat mode (Repeat_Count != 0), the controller prevents the decoder from obtaining instructions from the instruction distribution unit. After the controller completes the decompression of the compressed instruction, that is, when Repeat_Count == 0, it switches back to normal mode.
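The two controller modes amount to a small state machine keyed on Repeat_Count. The following C sketch is illustrative only; the `Controller` structure and the function names are hypothetical, and the signalling between controller and decoder is reduced to a single query function.

```c
#include <stdbool.h>

/* Hypothetical controller state: repeat_count == 0 means normal mode,
   any other value means repeat mode. */
typedef struct {
    unsigned repeat_count;
} Controller;

/* The decoder may obtain instructions from the instruction distribution
   unit only in normal mode. */
bool decoder_may_fetch(const Controller *c)
{
    return c->repeat_count == 0;
}

/* Called when key information arrives from the decoder: enter repeat mode. */
void controller_start(Controller *c, unsigned repeat_count)
{
    c->repeat_count = repeat_count;
}

/* Called after each instruction is generated; when the count reaches zero
   the controller is back in normal mode. */
void controller_on_generated(Controller *c)
{
    if (c->repeat_count > 0)
        c->repeat_count--;
}
```

Starting from normal mode, a call to `controller_start` with a count of 2 blocks the decoder until `controller_on_generated` has been called twice.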

When the operand source type pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, the instruction decompression module also includes a configuration register. The configuration register is configured to store the address of the source operand in the LDS and, after the corresponding source operand is read from the LDS according to the current address, to automatically update its own address to the address of the next source operand. In this case, when the controller updates the address ID corresponding to the operand, it is also configured to update it according to the address currently indicated by the configuration register, so that the address ID is the same as that address. As shown in FIG. 4, the instruction decompression module then includes a controller, a configuration register (M0 register), and an instruction generator, and the controller is connected to the decoder, the instruction generator, and the configuration register, respectively.

For functions of each component not mentioned here, refer to the corresponding parts of the foregoing description of the decoding circuit as a whole. They have been described in detail above and, for brevity of the specification, are not repeated here.

In the embodiments of the present application, instructions are compressed through VOP3R, so that each cache line (512 bits) can accommodate 512 3-operand instructions, which not only effectively reduces the probability of instruction cache misses but also improves efficiency. To facilitate understanding, the following describes the method provided in the embodiments of the present application as applied to matrix multiplication. A 64x64 matrix is taken as an example, C_64x64 = A_64x64 * B_64x64; the 64x64 matrix size is only an example and is not a limitation. Assume that there are 64 arithmetic operation units (ALUs), each with a 200x64-bit VGPR space.

The calculation process is roughly as follows:

1) Matrix A is loaded into LDS in linear mode:

A(0,0)→LDS(Address0);//A(0,0) is stored in the Address0 location of LDS;

A(0,1)→LDS(Address1);//A(0,1) is stored in Address1 of LDS;

A(0,2)→LDS(Address2);//A(0,2) is stored in the location of Address2 of LDS;

...

2) Matrix B is loaded into the VGPR space, as shown in Table 4.

Table 4

ALU0	ALU1	ALU2	...	ALU62	ALU63
B0,0	B0,1	B0,2	...	B0,62	B0,63
B1,0	B1,1	B1,2	...	B1,62	B1,63
...	...	...	...	...	...
B63,0	B63,1	B63,2	...	B63,62	B63,63

Different VGPRs store different rows of matrix B. During calculation, the elements of matrix A are loaded one by one into the 64 ALUs in parallel, and each is multiplied by the corresponding element stored in each ALU's vector general-purpose register. The 64 ALUs accumulate, in parallel, the products of the elements in one row of matrix A with the corresponding elements of matrix B, yielding all elements in the same row of matrix C and thereby completing the multiplication of matrix A and matrix B.

3) Calculate matrix C:

The instruction to calculate matrix C in normal mode is as follows:

M0_register=start_address; //Set the initial address of the M0 register. The M0 register is configured to store the address of each element of matrix A to be read; after the 64 ALUs read, in parallel, the corresponding element of matrix A from the LDS at the current address in the M0 register, the address is automatically updated to that of the next element.

//-----------------------------------------

//Calculate the first row of matrix C:

//C(0,0) is calculated on ALU_Index0: ALU_Index=0.

//C(0,1) is calculated on ALU_Index1: ALU_Index=1.

//...

//-----------------------------------------

//The instructions each ALU executes to calculate its element of the first row of matrix C are as follows:

Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);

Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;

...

Block_End::C(0,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;

//-----------------------------------------

//Calculate the second row of matrix C:

//C(1,0) is calculated on ALU_Index0.

//C(1,1) is calculated on ALU_Index1.

//...

//-----------------------------------------

//The instructions each ALU executes to calculate its element of the second row of matrix C are as follows:

Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);

Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;

...

Block_End::C(1,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;

...

//-----------------------------------------

//Calculate the last row of matrix C:

//C(63,0) is calculated on ALU_Index0.

//C(63,1) is calculated on ALU_Index1.

//...

//-----------------------------------------

//The instructions each ALU executes to calculate its element of the last row of matrix C are as follows:

Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);

Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;

Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;

...

Block_End::C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;

The above is the conventional mode without instruction compression. With the instruction compression method provided by this application, the above conventional instruction list is compressed as follows:

M0_register=start_address;

//-----------------------------------------

//Calculate the first row of matrix C:

//C(0,0) is calculated on ALU_Index0: ALU_Index=0.

//C(0,1) is calculated on ALU_Index1: ALU_Index=1.

//...

//-----------------------------------------

Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);

RepeatEnable(0x3), RepeatCounter(62)::

Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;//Repeat Operand0 and Operand1;

Block_End::C(0,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;

//-----------------------------------------

//Calculate the second row of matrix C:

//C(1,0) is calculated on ALU_Index0.

//C(1,1) is calculated on ALU_Index1.

//...........

//-----------------------------------------

Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);

RepeatEnable(0x3), RepeatCounter(62)::

Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;//Repeat Operand0 and Operand1;

Block_End::C(1,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;

...

//-----------------------------------------

//Calculate the last row of matrix C:

//C(63,0) is calculated on ALU_Index0.

//C(63,1) is calculated on ALU_Index1.

//...........

//-----------------------------------------

Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);

RepeatEnable(0x3), RepeatCounter(62)::

Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;//Repeat Operand0 and Operand1;

Block_End::C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;

It can be seen from the above that completing C_64x64 = A_64x64 * B_64x64 in the conventional instruction mode requires 64x64 = 4096 instructions, whereas with the instruction compression of this application only 3x64 = 192 instructions are required, which significantly improves efficiency.
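The instruction counts and the equivalence of the two listings can be checked with a small software model. The following C sketch is an assumption-laden illustration, not the hardware: it uses integer elements, models the M0 address field as an element index `m0` that advances by one after every LDS read, models the 64 ALUs as a loop over `alu`, and expands each `RepeatCounter(62)` instruction with the count-down rule described earlier. All function and variable names are hypothetical.

```c
#include <stddef.h>

#define N 64   /* matrix dimension and ALU count used in the example */

/* Instructions fetched per the two listings: N per row conventionally,
   3 per row (Block_Start, one compressed instruction, Block_End) compressed. */
size_t conventional_count(void) { return (size_t)N * N; }
size_t compressed_count(void)   { return (size_t)3 * N; }

/* Executes the compressed listing. lds_a holds matrix A in row-major order,
   b[k][alu] is the VGPR k of ALU alu. Returns the number of instructions
   issued to the execution unit after decompression. */
size_t run_compressed(const int *lds_a, int b[N][N], int c[N][N])
{
    size_t issued = 0;
    size_t m0 = 0;                         /* stand-in for the M0 address */
    for (int row = 0; row < N; row++) {
        int fwd[N];                        /* per-ALU forwarding value */
        int a = lds_a[m0++];               /* Block_Start */
        for (int alu = 0; alu < N; alu++)
            fwd[alu] = a * b[0][alu];
        issued++;

        unsigned repeat_count = N - 2;     /* RepeatCounter(62) when N is 64 */
        int b_row = 1;                     /* default Operand1 address ID */
        for (;;) {
            a = lds_a[m0++];               /* Operand0: LDS_Direct(M0) */
            for (int alu = 0; alu < N; alu++)
                fwd[alu] = a * b[b_row][alu] + fwd[alu];
            issued++;
            repeat_count--;
            if (repeat_count == 0)
                break;
            b_row++;                       /* Operand1_ID + 1 */
        }

        a = lds_a[m0++];                   /* Block_End */
        for (int alu = 0; alu < N; alu++)
            c[row][alu] = a * b[N - 1][alu] + fwd[alu];
        issued++;
    }
    return issued;
}
```

In this model the decoder fetches 192 instructions but still issues 4096 to the execution unit, so the saving is in instruction storage and fetch rather than in the arithmetic performed.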

Please refer to FIG. 5 for a data processing method provided by an embodiment of this application. The steps involved will be described below in conjunction with FIG. 5.

Step S101: Determine whether the acquired instruction is a compressed instruction.

If yes, execute step S102; if no, send the acquired instruction to the instruction execution unit.

Step S102: Acquire key information in the compressed instruction, where the key information includes: instruction repetition type and instruction repetition number.

The instruction repetition type is used to indicate the instruction type to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2.

Step S103: Decompress the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.

In some possible implementation manners, before obtaining the key information in the compression instruction, the method further includes: determining that the compression instruction is valid.

In an implementation manner, the process of decompressing the compressed instruction according to the instruction repetition type and the number of instruction repetitions may be: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and update the number of instruction repetitions; when it is determined that the updated number of instruction repetitions is greater than a preset threshold, update the address ID corresponding to the operand; generate an instruction according to the updated address ID, and update the number of instruction repetitions again; determine whether the re-updated number of instruction repetitions is equal to the preset threshold; if yes, determine that the decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the number of instruction repetitions.

In another implementation, the process of decompressing the compressed instruction according to the instruction repetition type and the number of instruction repetitions may be: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and record the number of generated instructions; when it is determined that the number of generated instructions is less than the number of instruction repetitions, update the address ID corresponding to the operand; generate an instruction according to the updated address ID, and update the number of generated instructions; determine whether the updated number of generated instructions is equal to the number of instruction repetitions; if yes, determine that the decompression of the compressed instruction is complete, yielding multiple instructions that correspond to the instruction repetition type and whose number equals the number of instruction repetitions.

In some possible implementation manners, the process of updating the address ID corresponding to the operand may be: updating the address ID corresponding to the operand according to the source type of the operand pointed to by the address ID corresponding to the operand.

In some possible implementations, the process of updating the address ID corresponding to the operand may also be: updating the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by that address ID.

In some possible implementations, the operand in the instruction repetition type is the destination operand, and the key information further includes the destination pass-through (DF) field. Before updating the address ID corresponding to the operand, the method further includes: determining that the value in the DF field is not a set threshold.

The implementation principles and technical effects of the method provided in the embodiments of the present application are the same as those of the foregoing device embodiments. For brevity, for parts not mentioned in the method embodiments, refer to the corresponding content in the foregoing device embodiments.

An embodiment of the present application also provides a processor, as shown in FIG. 6. The processor includes the decoding circuit of any of the foregoing embodiments, an instruction execution unit, and an instruction distribution unit. Both the instruction distribution unit and the instruction execution unit are connected to the decoding circuit. The instruction distribution unit is configured to store instructions so that the decoding circuit can obtain instructions from it. The instruction execution unit is configured to execute instructions issued by the decoding circuit.

The processor may be an integrated circuit chip with signal processing capabilities. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), a graphics processing unit (GPU), etc.; a general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

It should be noted that the various embodiments in this specification are described in a progressive manner, and each embodiment focuses on its differences from the other embodiments. For parts that are the same or similar among the embodiments, the embodiments may be referred to one another.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Industrial applicability

The data processing method, decoding circuit, and processor provided in this application determine whether an acquired instruction is a compressed instruction; if so, acquire key information in the compressed instruction, the key information including an instruction repetition type and an instruction repetition number, wherein the instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2; and decompress the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number. In the embodiments of the present application, by compressing instructions, one instruction block can accommodate more three-operand instructions, which not only effectively reduces the probability of instruction cache misses but also improves efficiency.

Claims

A data processing method, characterized in that it comprises:

Determining whether an acquired instruction is a compressed instruction;

If so, acquiring key information in the compressed instruction, the key information including an instruction repetition type and an instruction repetition number, wherein the instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2;

Decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
The method according to claim 1, wherein the decompressing the compressed instruction according to the instruction repetition type and the number of instruction repetitions comprises:

Generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition number;

When it is determined that the updated instruction repetition number is greater than a preset threshold, updating the address ID corresponding to the operand;

Generating an instruction according to the updated address ID corresponding to the operand, and updating the instruction repetition number again;

Determining whether the re-updated instruction repetition number is equal to the preset threshold;

If so, determining that decompression of the compressed instruction is complete, thereby obtaining multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
The method according to claim 1, wherein the decompressing the compressed instruction according to the instruction repetition type and the number of instruction repetitions comprises:

Generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording the number of instructions generated as a generation count;

When it is determined that the generation count is less than the instruction repetition number, updating the address ID corresponding to the operand;

Generating an instruction according to the updated address ID corresponding to the operand, and updating the generation count;

Determining whether the updated generation count is equal to the instruction repetition number;

If so, determining that decompression of the compressed instruction is complete, thereby obtaining multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
The method according to claim 2 or 3, wherein updating the address ID corresponding to the operand comprises:

The address ID corresponding to the operand is updated according to the source type of the operand pointed to by the address ID corresponding to the operand.
The method according to claim 2 or 3, wherein updating the address ID corresponding to the operand comprises:

The address ID corresponding to the operand is updated according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand.
The method according to claim 2 or 3, wherein the operand in the instruction repetition type is a destination operand, and the key information further includes a destination pass-through DF field; before updating the address ID corresponding to the operand, the method further includes:

Determining that the value in the destination pass-through DF field is not a set threshold.
The method according to claim 1, wherein before acquiring the key information in the compressed instruction, the method further comprises:

Determining that the compressed instruction is valid.
The method according to claim 7, wherein the compressed instruction includes a repetition enable field and a repetition count field, and the step of determining that the compressed instruction is valid includes:

If the repetition enable field characterizing the source operand is not 0, and the address ID corresponding to the source operand points to a specified operand source type, determining that the compressed instruction is valid;

If the repetition enable field characterizing the destination operand is not 0, determining that the compressed instruction is valid.
The method according to any one of claims 1-8, wherein the compressed instruction includes a repetition enable field and a repetition count field, and the step of acquiring the key information in the compressed instruction comprises:

Obtaining the instruction repetition type according to the repetition enable field;

Obtaining the instruction repetition number according to the repetition count field.
The method according to any one of claims 1-9, wherein the method further comprises:

In the process of decompressing the compressed instruction, each time an instruction is generated, the generated instruction is issued.
A decoding circuit, characterized in that it comprises:

A decoder configured to determine whether an acquired instruction is a compressed instruction and, if so, to acquire key information in the compressed instruction, the key information including an instruction repetition type and an instruction repetition number, wherein the instruction repetition type indicates the type of instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2;

An instruction decompression module configured to decompress the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
The decoding circuit according to claim 11, wherein the instruction decompression module comprises:

A controller configured to obtain the address ID corresponding to the operand in the instruction repetition type;

An instruction generator configured to generate an instruction according to the address ID corresponding to the operand in the instruction repetition type;

The controller is further configured to update the instruction repetition number after the instruction generator generates the instruction according to the address ID corresponding to the operand in the instruction repetition type, and, when it is determined that the updated instruction repetition number is greater than a preset threshold, to update the address ID corresponding to the operand and send the updated address ID corresponding to the operand to the instruction generator;

The instruction generator is further configured to generate an instruction according to the updated address ID corresponding to the operand;

The controller is further configured to update the instruction repetition number again after the instruction generator generates an instruction according to the updated address ID corresponding to the operand, and to determine whether the re-updated instruction repetition number is equal to the preset threshold; if so, to determine that decompression of the compressed instruction is complete, thereby obtaining multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
The decoding circuit according to claim 11, wherein the instruction decompression module comprises:

A controller configured to obtain the address ID corresponding to the operand in the instruction repetition type;

An instruction generator configured to generate an instruction according to the address ID corresponding to the operand in the instruction repetition type;

The controller is further configured to record, after the instruction generator generates an instruction according to the address ID corresponding to the operand in the instruction repetition type, the number of instructions generated as a generation count, and, when it is determined that the generation count is less than the instruction repetition number, to update the address ID corresponding to the operand and send the updated address ID corresponding to the operand to the instruction generator;

The instruction generator is further configured to generate an instruction according to the updated address ID corresponding to the operand;

The controller is further configured to update the generation count after the instruction generator generates an instruction according to the updated address ID corresponding to the operand, and to determine whether the updated generation count is equal to the instruction repetition number; if so, to determine that decompression of the compressed instruction is complete, thereby obtaining multiple instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
The decoding circuit according to claim 12 or 13, wherein the controller is configured to update the address ID corresponding to the operand according to the source type of the operand pointed to by the address ID corresponding to the operand.
The decoding circuit according to claim 12 or 13, wherein the controller is configured to update the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand.
The decoding circuit according to claim 12 or 13, wherein the operand in the instruction repetition type is a destination operand, the key information further includes a destination pass-through DF field, and the controller is further configured to determine, before updating the address ID corresponding to the operand, that the value in the destination pass-through DF field is not a set threshold.
The decoding circuit according to claim 12 or 13, wherein the source type of the operand pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, and the instruction decompression module further includes a configuration register, the configuration register being configured to store the address of the source operand in the LDS and, after the corresponding source operand has been read from the LDS according to the current address, to automatically update the stored address to the address corresponding to the next source operand;

Correspondingly, the controller is configured to update the address ID corresponding to the operand according to the address currently indicated by the configuration register, wherein the address ID corresponding to the operand is the same as the address currently indicated by the configuration register.
The decoding circuit according to claim 11, wherein the decoder is further configured to determine that the compressed instruction is valid before acquiring the key information in the compressed instruction.
The decoding circuit according to claim 11, wherein the instruction decompression module is further configured to send to the decoder, upon receiving the key information sent by the decoder, an instruction that prevents the decoder from obtaining instructions from the instruction distribution unit, and, when it is determined that decompression of the compressed instruction has ended, to send to the decoder an instruction that allows the decoder to obtain instructions from the instruction distribution unit.
A processor, characterized by comprising: an instruction distribution unit, an instruction execution unit, and the decoding circuit according to any one of claims 11-19, wherein the instruction distribution unit and the instruction execution unit are both connected to the decoding circuit.