CN111124495B - Data processing method, decoding circuit and processor - Google Patents

Data processing method, decoding circuit and processor

Info

Publication number
CN111124495B
Authority
CN
China
Prior art keywords
instruction
operand
address
repetition
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911302511.8A
Other languages
Chinese (zh)
Other versions
CN111124495A (en)
Inventor
陈庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Haiguang Microelectronics Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN201911302511.8A priority Critical patent/CN111124495B/en
Publication of CN111124495A publication Critical patent/CN111124495A/en
Priority to PCT/CN2020/114004 priority patent/WO2021120713A1/en
Application granted granted Critical
Publication of CN111124495B publication Critical patent/CN111124495B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017 Runtime instruction translation, e.g. macros
    • G06F9/30178 Runtime instruction translation, e.g. macros of compressed or encrypted instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to a data processing method, a decoding circuit and a processor, and belongs to the technical field of computers. The method comprises: determining whether an obtained instruction is a compressed instruction; if so, obtaining key information from the compressed instruction, the key information comprising an instruction repetition type, which indicates the type of instruction to be repeated, and an instruction repetition number, which is a positive integer greater than or equal to 2; and decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number. Because instructions are compressed, one instruction block can hold more three-operand instructions, which effectively reduces the probability of instruction cache misses and improves efficiency.

Description

Data processing method, decoding circuit and processor
Technical Field
The application belongs to the technical field of computers, and particularly relates to a data processing method, a decoding circuit and a processor.
Background
Computer instructions are the commands that direct a machine to work; a program is a series of instructions arranged in a certain order, and executing the program is the working process of the computer. To execute an instruction, the computer must read it from the instruction cache, and an instruction cache miss causes a serious performance problem: fetching the instruction takes a long time, which significantly lengthens the processing cycle of the instruction sequence. When an instruction cache miss occurs, the current instruction sequence stalls and waits, and if there are not enough active instruction sequences, the entire compute unit may stall, which significantly degrades performance.
An instruction block is the set of instructions within one cache line. Since each cache line is only 512 bits and a three-operand operation instruction uses 64 bits, each cache line, and therefore each instruction block, can hold only 8 such instructions. Handling large operations thus requires reading thousands of instruction blocks, which is detrimental to power optimization.
Disclosure of Invention
In view of the above, an object of the present application is to provide a data processing method, a decoding circuit and a processor that address the problem that a conventional instruction block can hold only 8 three-operand instructions, so that many instructions must be issued during task execution, which is unfavorable for power optimization.
The embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a data processing method, comprising: determining whether an obtained instruction is a compressed instruction; if so, obtaining key information from the compressed instruction, the key information comprising an instruction repetition type, which indicates the type of instruction to be repeated, and an instruction repetition number, which is a positive integer greater than or equal to 2; and decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number. Because instructions are compressed in this way, one instruction block can hold more three-operand instructions, which effectively reduces the probability of instruction cache misses and improves efficiency.
With reference to one possible implementation of the first aspect, decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number comprises: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition number; when the updated instruction repetition number is determined to be greater than a preset threshold, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID, and updating the instruction repetition number again; determining whether the instruction repetition number after this update equals the preset threshold; and if so, determining that decompression of the compressed instruction is complete, thereby obtaining a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
In this embodiment, the instruction repetition number is updated after each instruction is generated and is then compared with the preset threshold. If the two are not equal, the address ID corresponding to the operand is updated, an instruction is generated from the updated address ID, and the instruction repetition number is updated and compared again. This repeats until the updated instruction repetition number equals the preset threshold, at which point decompression is complete. No additional element (such as a counter) is needed to decide whether decompression has finished, because the instruction repetition number itself is updated after each generated instruction. This simplifies the processing flow as far as possible while preserving accuracy, and saves cost.
With reference to one possible implementation of the first aspect, decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number comprises: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording the number of instructions generated; when the generation count is determined to be less than the instruction repetition number, updating the address ID corresponding to the operand; generating an instruction according to the updated address ID, and updating the generation count; determining whether the updated generation count equals the instruction repetition number; and if so, determining that decompression of the compressed instruction is complete, thereby obtaining a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
In this embodiment, a counter records how many instructions have been generated. After each instruction is generated, the generation count is updated and compared with the instruction repetition number. If the two are not equal, the address ID corresponding to the operand is updated, an instruction is generated from the updated address ID, and the generation count is updated and compared again. Decompression of the compressed instruction is complete once the generation count equals the instruction repetition number. This provides another feasible approach and broadens the applicability of the scheme.
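The counter-based variant described above can be sketched in Python as follows. This is a minimal illustration, not the patented circuit: the function name `decompress_count_up` and the callbacks `make_instruction` and `update_ids` are hypothetical, and the "+1" address update used in the usage lines is an assumption chosen only to make the example concrete.

```python
def decompress_count_up(make_instruction, update_ids, ids, repeat_count):
    """Expand one compressed instruction using a separate generation counter.

    make_instruction and update_ids are hypothetical callbacks standing in
    for the instruction generator and the controller's address-ID update.
    """
    generated = [make_instruction(ids)]   # first instruction uses the IDs as given
    count = 1                             # generation count recorded by the counter
    while count < repeat_count:           # fewer instructions than requested so far
        ids = update_ids(ids)             # update the address IDs of repeated operands
        generated.append(make_instruction(ids))
        count += 1                        # update the generation count
    return generated                      # count == repeat_count: decompression done


# Usage: repeat only the second ID, stepping it by 1 (assumed update rule).
instructions = decompress_count_up(
    make_instruction=lambda ids: ("mad",) + ids,
    update_ids=lambda ids: (ids[0], ids[1] + 1),
    ids=(0, 1),
    repeat_count=62,
)
print(len(instructions), instructions[0], instructions[-1])
```

Unlike the countdown variant, the instruction repetition number is left untouched here and the termination test is made against the separate generation count.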
With reference to one possible implementation of the first aspect, updating the address ID corresponding to the operand comprises: updating the address ID according to the operand source type pointed to by the address ID. In this embodiment, the rule used to update the address ID can therefore differ from one operand source type to another.
With reference to one possible implementation of the first aspect, updating the address ID corresponding to the operand comprises: updating the address ID according to the data type of the data stored in the operand source pointed to by the address ID. In this embodiment, different data types can therefore correspond to different update rules.
With reference to one possible implementation of the first aspect, the operand in the instruction repetition type is a destination operand, and the key information further comprises a destination pass-through (DF) field; before updating the address ID corresponding to the operand, the method further comprises: determining that the value in the DF field is not a set threshold. In this embodiment, when the operand is the destination operand, the value in the DF field must be confirmed not to be the set threshold before the address ID is updated, so that data pass-through is not disturbed.
With reference to one possible implementation of the first aspect, before obtaining the key information from the compressed instruction, the method further comprises: determining that the compressed instruction is valid. Checking validity first improves efficiency and avoids wasting resources on decompressing an erroneous compressed instruction.
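The address-ID update rules above can be illustrated with a small Python sketch. Everything concrete in it is an assumption made for illustration: the text does not give the per-source-type step sizes or the DF value that blocks the update, so `STEP_BY_SOURCE` and `DF_BLOCKING_VALUE` are hypothetical placeholders, as are the function names.

```python
# Hypothetical step sizes per operand source type. The text only says that
# the rule may differ by source type; these numbers are placeholders.
STEP_BY_SOURCE = {"VGPR": 1, "SGPR": 1}

# Hypothetical DF value meaning "result is passed through, do not advance".
DF_BLOCKING_VALUE = 1


def next_address_id(address_id, source_type):
    """Advance an address ID according to its operand source type."""
    if source_type not in STEP_BY_SOURCE:
        return address_id                 # unsupported sources are left unchanged
    return address_id + STEP_BY_SOURCE[source_type]


def next_destination_id(result_id, source_type, df_value):
    """Advance the destination ID only when the DF field does not block it."""
    if df_value == DF_BLOCKING_VALUE:
        return result_id                  # keep the destination so pass-through works
    return next_address_id(result_id, source_type)


print(next_address_id(5, "VGPR"), next_destination_id(8, "VGPR", df_value=1))
```

Keeping the DF check in its own function mirrors the text, where the check applies only to the destination operand and precedes the ordinary update.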
In a second aspect, an embodiment of the present application further provides a decoding circuit, comprising a decoder and an instruction decompression module. The decoder is configured to determine whether an obtained instruction is a compressed instruction and, if so, to obtain key information from the compressed instruction, the key information comprising an instruction repetition type, which indicates the type of instruction to be repeated, and an instruction repetition number, which is a positive integer greater than or equal to 2. The instruction decompression module is configured to decompress the compressed instruction according to the instruction repetition type and the instruction repetition number into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
With reference to one possible implementation manner of the embodiment of the second aspect, the instruction decompressing module includes: the controller is used for acquiring an address ID corresponding to an operand in the instruction repetition type; the instruction generator is used for generating an instruction according to the address ID corresponding to the operand in the instruction repetition type; the controller is further configured to update the instruction repetition times after the instruction generator generates an instruction according to an address ID corresponding to an operand in the instruction repetition type, update the address ID corresponding to the operand when it is determined that the updated instruction repetition times is greater than a preset threshold, and send the updated address ID corresponding to the operand to the instruction generator; the instruction generator is further used for generating an instruction according to the updated address ID corresponding to the operand; the controller is further configured to update the instruction repetition number again after the instruction generator generates an instruction according to the updated address ID corresponding to the operand, and determine whether the instruction repetition number updated again is equal to the preset threshold; if yes, determining that decompression of the compressed instruction is finished, and obtaining a plurality of instructions which correspond to the instruction repetition types and are the same as the instruction repetition times.
With reference to one possible implementation manner of the embodiment of the second aspect, the instruction decompressing module includes: the controller is used for acquiring an address ID corresponding to an operand in the instruction repetition type; the instruction generator is used for generating an instruction according to the address ID corresponding to the operand in the instruction repetition type; the controller is further configured to record the generation times of the generated instruction after the instruction generator generates the instruction according to the address ID corresponding to the operand in the instruction repetition type, update the address ID corresponding to the operand when it is determined that the generation times is smaller than the instruction repetition times, and send the updated address ID corresponding to the operand to the instruction generator; the instruction generator is further used for generating an instruction according to the updated address ID corresponding to the operand; the controller is further configured to update the generation times and determine whether the updated generation times is equal to the instruction repetition times after the instruction generator generates an instruction according to the updated address ID corresponding to the operand; if yes, determining that decompression of the compressed instruction is finished, and obtaining a plurality of instructions which correspond to the instruction repetition types and are the same as the instruction repetition times.
In combination with a possible implementation manner of the embodiment of the second aspect, the controller is configured to update the address ID corresponding to the operand according to an operand source type pointed to by the address ID corresponding to the operand.
In combination with a possible implementation manner of the embodiment of the second aspect, the controller is configured to update the address ID corresponding to the operand according to a data type of data stored in an operand source pointed to by the address ID corresponding to the operand.
With reference to one possible implementation of the second aspect, the operand in the instruction repetition type is a destination operand, and the key information further comprises a destination pass-through (DF) field; the controller is further configured to determine, before updating the address ID corresponding to the operand, that the value in the DF field is not a set threshold.
With reference to one possible implementation of the second aspect, the operand source type pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, and the instruction decompression module further comprises a configuration register. The configuration register stores the address used to fetch a source operand from the LDS and, after the source operand at the current address has been read from the LDS, automatically updates that address to the address of the next source operand. Correspondingly, the controller is configured to update the address ID corresponding to the operand according to the address currently indicated by the configuration register, so that the address ID is the same as the address currently indicated by the configuration register.
With reference to one possible implementation of the second aspect, the decoder is further configured to determine that the compressed instruction is valid before obtaining the key information from the compressed instruction.
With reference to one possible implementation of the second aspect, the instruction decompression module is further configured to send the decoder, upon receiving the key information from the decoder, an indication that prevents the decoder from obtaining instructions from the instruction distribution unit, and to send the decoder, upon determining that decompression of the compressed instruction is complete, an indication that allows the decoder to obtain instructions from the instruction distribution unit again.
In a third aspect, an embodiment of the present application further provides a processor, comprising an instruction distribution unit, an instruction execution unit, and the decoding circuit provided in the second aspect and/or in any one of the possible implementations of the second aspect, wherein the instruction distribution unit and the instruction execution unit are each connected to the decoding circuit.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort. The foregoing and other objects, features and advantages of the application will be apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily drawn to scale; emphasis is instead placed on illustrating the subject matter of the present application.
Fig. 1 shows a schematic diagram of fields in a VOP3R instruction according to an embodiment of the present application.
Fig. 2 shows a schematic structural diagram of a decoding circuit provided in an embodiment of the present application.
Fig. 3 shows a schematic structural diagram of another decoding circuit provided in an embodiment of the present application.
Fig. 4 shows a schematic structural diagram of another decoding circuit provided in an embodiment of the present application.
Fig. 5 shows a flowchart of a data processing method provided in an embodiment of the present application.
Fig. 6 shows a schematic structural diagram of a processor provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, relational terms such as "first," "second," and the like may be used solely in the description herein to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Further, the term "and/or" in the present application merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
At present, a cache line can store only 8 three-operand operation instructions, so one instruction block can hold only 8 three-operand instructions, which is far from sufficient for power optimization and does little to avoid instruction cache misses. The embodiments of the present application therefore provide an efficient instruction compression method in which 64 three-operand instructions can be compressed into 64 bits, so that each cache line can store at most 512 three-operand instructions. This improves operation performance and significantly reduces instruction cache misses.
To support compressing 64 three-operand instructions into 64 bits, the application introduces the VOP3R (Vector Operation with 3 Operands and Repeat) instruction, whose type field is set to "110010"; that is, 110010 indicates that the instruction is a VOP3R instruction, as shown in Fig. 1. The VOP3R instruction defines the special fields listed in Table 1.
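The capacity figures in the preceding paragraph follow from simple arithmetic, shown here as a short Python check. The constant names are hypothetical; the numeric values are the ones stated in the text.

```python
CACHE_LINE_BITS = 512              # cache line size stated in the text
INSTRUCTION_BITS = 64              # size of one three-operand instruction
MAX_REPEATS_PER_COMPRESSED = 64    # instructions one compressed instruction may stand for

# Number of 64-bit instruction slots in one cache line.
slots_per_line = CACHE_LINE_BITS // INSTRUCTION_BITS

# Without compression every slot holds exactly one instruction.
uncompressed_capacity = slots_per_line

# With compression every slot may expand to up to 64 instructions.
compressed_capacity = slots_per_line * MAX_REPEATS_PER_COMPRESSED

print(uncompressed_capacity, compressed_capacity)
```

The upper figure is reached only when every slot in the cache line holds a compressed instruction with the maximum repetition number.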
TABLE 1
[Table 1: special fields of the VOP3R instruction; provided as an image in the original publication]
It should be noted that the bit width of each field in Table 1 is fixed, but its position may change. For example, Repeat_Enable need not occupy bits [62:59] and may instead occupy bits [3:0]; the same applies to the other fields.
Repeat_Enable is the repeat enable field. It is 4 bits wide, and each bit indicates whether one of the source operands (Operand0, Operand1, Operand2) or the destination operand (also referred to as the Result) is repeated: B[59:59] (or B[0:0]): repeat Operand0; B[60:60] (or B[1:1]): repeat Operand1; B[61:61] (or B[2:2]): repeat Operand2; B[62:62] (or B[3:3]): repeat the destination. It should be noted that a source operand is repeated only if it comes from a Vector General Purpose Register (VGPR), a Scalar General Purpose Register (SGPR) or the Local Data Store (LDS_DIRECT), and a destination operand is repeated only if it is a VGPR or SGPR; all other cases are ignored.
To support such instruction repetition in hardware, the embodiment of the present application provides a decoding circuit, as shown in Fig. 2. After the decoding circuit obtains an instruction from the instruction distribution unit (Instruction Dispatch), it determines whether the instruction is a compressed instruction. If not, the decoding circuit sends the instruction directly to the instruction execution unit (Instruction Execution), which executes it. If so, the decoding circuit obtains the key information from the compressed instruction and decompresses the compressed instruction, according to the instruction repetition type and the instruction repetition number in the key information, into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
The key information comprises an instruction repetition type, which indicates the type of instruction to be repeated, and an instruction repetition number, which is a positive integer greater than or equal to 2. The instruction repetition type is obtained from the repeat enable field (Repeat_Enable) of the compressed instruction, and the instruction repetition number is obtained from the repeat count field (Repeat_Counter). Whether the current instruction is a compressed instruction can be determined from the Repeat_Counter field: if Repeat_Counter != 0x0 (hexadecimal 0), the instruction is a compressed instruction; if Repeat_Counter == 0x0, it is an uncompressed instruction. The detailed parameters of the key information are shown in Table 2.
TABLE 2
Field             Number of bits
Operation_code    10
Repeat_Counter    6
Result_ID         8
Repeat_Enable     4
Operand2_ID       9
Operand1_ID       9
Operand0_ID       9
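The key information of Table 2 and the compressed-instruction test can be modelled with a short Python sketch. The class name `KeyInfo`, the helper `is_compressed` and the lower-case field names are hypothetical; the bit widths are taken from Table 2, and the sketch works on already extracted field values because the bit positions of the fields are not given in this text.

```python
from dataclasses import dataclass

# Bit widths copied from Table 2.
FIELD_BITS = {
    "operation_code": 10,
    "repeat_counter": 6,
    "result_id": 8,
    "repeat_enable": 4,
    "operand2_id": 9,
    "operand1_id": 9,
    "operand0_id": 9,
}


@dataclass
class KeyInfo:
    operation_code: int
    repeat_counter: int
    result_id: int
    repeat_enable: int
    operand2_id: int
    operand1_id: int
    operand0_id: int

    def __post_init__(self):
        # Reject values that would not fit in the field width of Table 2.
        for name, bits in FIELD_BITS.items():
            value = getattr(self, name)
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{name}={value} does not fit in {bits} bits")


def is_compressed(info):
    """An instruction is compressed when Repeat_Counter != 0x0."""
    return info.repeat_counter != 0x0


example = KeyInfo(operation_code=0, repeat_counter=62, result_id=0,
                  repeat_enable=0x3, operand2_id=0, operand1_id=1, operand0_id=0)
print(is_compressed(example))
```

The operand and opcode values in `example` are arbitrary; only `repeat_counter=62` and `repeat_enable=0x3` correspond to the worked example in the text.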
For ease of understanding, consider the following compressed instruction as an example:
Repeat Enable(0x3),Repeat Counter(62)::
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Decompressing this compressed instruction according to the instruction repetition type (repeat Operand0 and Operand1) and the instruction repetition number (62) yields the following 62 instructions:
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
……
Forwarding=LDS_Direct(M0_register)*B(61,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(62,ALU_Index)+Forwarding;
Here, Repeat Enable denotes the instruction repetition type, and 0x3 means that the two operands Operand0 and Operand1 are repeated; Repeat Counter denotes the instruction repetition number, and 62 means that decompressing the compressed instruction yields 62 instructions. It should be noted that repeating Operand0 and Operand1 is only an example. The operands to be repeated may be any non-empty subset of the four operands Result (destination operand), Operand0, Operand1 and Operand2, which gives 15 combinations. Different repetition types are represented by different field values; for example, Repeat Enable (0x1) repeats Operand0, Repeat Enable (0x2) repeats Operand1, and Repeat Enable (0x3) repeats both Operand0 and Operand1.
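The mapping from a Repeat Enable value to the set of repeated operands can be sketched in Python as follows. The names `REPEAT_BITS` and `repeated_operands` are hypothetical; the bit assignment (bit 0 for Operand0 through bit 3 for the destination) follows the description of the Repeat_Enable field.

```python
# One mask per operand, in the bit order given for Repeat_Enable.
REPEAT_BITS = (
    (0x1, "Operand0"),
    (0x2, "Operand1"),
    (0x4, "Operand2"),
    (0x8, "Result"),     # destination operand
)


def repeated_operands(repeat_enable):
    """Return the names of the operands selected by a 4-bit Repeat Enable value."""
    return [name for mask, name in REPEAT_BITS if repeat_enable & mask]


# Every non-zero 4-bit value selects a different non-empty set of operands.
combinations = [value for value in range(16) if repeated_operands(value)]

print(repeated_operands(0x3), len(combinations))
```

Counting the non-zero 4-bit values reproduces the 15 combinations mentioned above.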
Because instructions are divided into normal instructions (single instructions) and compressed instructions, the instruction logic of the corresponding hardware has a normal mode and a repeat mode. Repeat_Count = 0 indicates the normal mode, in which the execution logic obtains instructions from the instruction distribution unit and executes them. Repeat_Count != 0 indicates the repeat mode. When the decoding circuit finishes decompressing the compressed instruction, that is, when Repeat_Count reaches 0, it switches back to the normal mode.
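The switch between the two modes can be illustrated with a small Python generator. This is a behavioural sketch only: `decode_stream`, `expand_simple` and the dictionary-based instruction format are hypothetical, and a software queue stands in for the instruction distribution unit.

```python
from collections import deque


def expand_simple(instr):
    """Hypothetical expander: emit repeat_counter plain copies of the operation."""
    for step in range(instr["repeat_counter"]):
        yield {"op": instr["op"], "repeat_counter": 0, "step": step}


def decode_stream(dispatch_queue, expand):
    """Yield the instructions that would reach the execution unit."""
    while dispatch_queue:
        instr = dispatch_queue.popleft()      # normal mode: fetch from dispatch
        if instr["repeat_counter"] == 0:
            yield instr                       # plain instruction, forwarded as is
            continue
        # Repeat mode: nothing more is fetched from the dispatch queue
        # until the compressed instruction has been fully expanded.
        for generated in expand(instr):
            yield generated
        # Expansion finished: the loop resumes fetching (back to normal mode).


queue = deque([
    {"op": "add", "repeat_counter": 0},
    {"op": "mad", "repeat_counter": 3},
    {"op": "mul", "repeat_counter": 0},
])
ops = [instr["op"] for instr in decode_stream(queue, expand_simple)]
print(ops)
```

In this sketch the stall is implicit in the control flow: the inner loop must finish before the next `popleft()` runs, which plays the role of blocking the fetch from the instruction distribution unit during repeat mode.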
As an embodiment, the decoding circuit may decompress the compressed instruction according to the instruction repetition type and the instruction repetition number as follows: generate an instruction according to the address ID corresponding to the operand in the instruction repetition type, and update the instruction repetition number; when the updated instruction repetition number is determined to be greater than a preset threshold, update the address ID corresponding to the operand; generate an instruction according to the updated address ID, and update the instruction repetition number again; determine whether the instruction repetition number after this update equals the preset threshold. If so, decompression of the compressed instruction is complete, and a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number have been obtained. Otherwise, the operation is repeated (update the address ID corresponding to the operand, generate an instruction according to the updated address ID, update the instruction repetition number again, and determine whether it equals the preset threshold) until the updated instruction repetition number equals the preset threshold.
The code for this process is represented as follows:
Figure BDA0002320768430000091
Figure BDA0002320768430000101
That is, in this embodiment, the first instruction is generated according to the address ID carried in the compressed instruction. In the above example, the instruction "Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding" is generated according to the default address ID (address1) in the compressed instruction, and the instruction repetition number is then updated (it is now 61). Because the updated instruction repetition number (61) is greater than the preset threshold (e.g., 0), the address ID corresponding to the operand is updated (address2), an instruction is generated according to the updated address ID, and the instruction repetition number is updated again (it is now 60). It is then judged whether the updated instruction repetition number (60) is equal to the preset threshold. Because it is still greater than the preset threshold, the operation is repeated (update the address ID corresponding to the operand, generate an instruction according to the updated address ID, update the instruction repetition number again, and judge whether it is equal to the preset threshold) until the updated instruction repetition number (0) equals the preset threshold (e.g., 0). At that point 62 instructions have been obtained for the operand, and decompression of the compressed instruction is complete.
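The count-down procedure described above can be sketched in Python as follows. This is only an illustrative model, not the code from the original figures: the function name `decompress_countdown`, the use of plain integers as address IDs, and the "ID plus one" update rule are assumptions made for the sketch.

```python
def decompress_countdown(start_id, repeat_count, threshold=0):
    """Expand one compressed instruction by counting Repeat_Count down.

    Returns the list of address IDs, one per generated instruction.
    Address IDs are modelled as plain integers (an assumption).
    """
    address_id = start_id
    issued = [address_id]      # first instruction uses the ID carried in the compressed instruction
    repeat_count -= 1          # update the instruction repetition number
    while repeat_count > threshold:
        address_id += 1        # assumed update rule: new ID = old ID + 1
        issued.append(address_id)
        repeat_count -= 1      # update the repetition number again
    return issued              # loop ends when repeat_count == threshold


ids = decompress_countdown(start_id=1, repeat_count=62)
print(len(ids), ids[0], ids[-1])
```

With a repetition number of 62 the loop body runs 61 times after the first instruction, so 62 instructions are produced in total, matching the walk-through above.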
As another embodiment, the process of the decoding circuit decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number may be: generating an instruction according to an address ID corresponding to an operand in the instruction repetition type, and recording the generation times of the generated instruction; when the generation times are determined to be less than the instruction repetition times, updating the address ID corresponding to the operand; generating an instruction according to the address ID corresponding to the updated operand, and updating the generation times; judging whether the updated generation times are equal to the instruction repetition times or not; if so, completing decompression of the compressed instruction to obtain a plurality of instructions which correspond to the instruction repetition type and have the same number as the instruction repetition times; if not, repeating the operation (updating the address ID corresponding to the operand, generating an instruction according to the address ID corresponding to the updated operand, updating the generation times, judging whether the updated generation times are equal to the instruction repetition times or not), and ending the operation until the updated generation times are equal to the instruction repetition times.
The principle of this embodiment is the same as that of the foregoing embodiment. The difference is that in the first embodiment the instruction repetition number is updated after each instruction is generated, and completion of decompression is determined by judging whether the updated instruction repetition number equals a preset threshold (for example, 0), whereas in this embodiment a counter records the number of instructions generated: it is incremented by one each time an instruction is generated, and whether generation must continue is determined by judging whether the recorded count equals the instruction repetition number (62).
While a compressed instruction is being decompressed, each instruction is issued to the instruction execution unit as soon as it is generated.
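The counter-based variant, together with issuing each instruction as soon as it is generated, might be modelled with a Python generator. The name `issue_stream` and the integer address IDs with an "ID plus one" update are assumptions for illustration only.

```python
from typing import Iterator


def issue_stream(start_id: int, repeat_count: int) -> Iterator[int]:
    """Yield one address ID per generated instruction, counting upward.

    Each `yield` stands for issuing the instruction to the execution unit
    immediately after it is generated.
    """
    address_id = start_id
    generated = 0                      # number of instructions generated so far
    while True:
        yield address_id               # generate and issue right away
        generated += 1
        if generated == repeat_count:  # generation count reached the repetition number
            return
        address_id += 1                # assumed update rule: ID + 1


print(list(issue_stream(1, 4)))
```

A generator is used here because it hands out instructions one at a time, which mirrors the per-instruction issue described in the text.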
The operand in the instruction repetition type may be at least one of four operands: Result, Operand0, Operand1 and Operand2. In one embodiment, the address ID corresponding to the operand is updated according to the operand source type (e.g., VGPR/SGPR/LDS_DIRECT) pointed to by that address ID. Different operand source types may have different update rules; for example, the rule for updating the address ID when the operand source is VGPR may differ from the rule when the operand source is SGPR.
For example, when the operand source pointed to by the address ID corresponding to the operand is VGPR/SGPR, the address ID may be updated according to the rule Operand_ID++ (or Result_ID++), that is, the updated address equals the address before the update plus one. For ease of understanding, taking Operand1 as an example, if Operand1_ID points to VGPR/SGPR, the following is repeated:
Figure BDA0002320768430000111
That is, if the operand source pointed to by Operand1_ID is VGPR/SGPR and Repeat_Enable[60] is 1, the address of Operand1 (Operand1_ID) is incremented by 1; if Repeat_Enable[60] is 0, the address of Operand1 remains unchanged. Note that the above example uses address self-increment with a step of 1 only as an illustration. The address update rule may also be self-decrement, and the step need not be 1; this mainly depends on whether data is stored in increasing or decreasing address order, whether it is stored contiguously, and so on.
When the operand source pointed to by the address ID corresponding to the operand is LDS_DIRECT, the rule for updating the address ID differs from the VGPR/SGPR case. In this mode the hardware reads data from the LDS as the operand, and the access address and data type are determined by a configuration register, such as the M0 register (a 32-bit special hardware-internal register whose lower 16 bits are used by LDS_DIRECT as the address). The 32-bit definition of the M0 register is shown in Table 3.
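A minimal sketch of this per-operand rule is given here. It assumes the repeat enable bits are passed as a 4-bit mask in which bit 0 stands for Operand0 (instruction bit 59), bit 1 for Operand1 (bit 60), bit 2 for Operand2 (bit 61) and bit 3 for Result (bit 62); that mapping, the function name `update_ids` and the dictionary layout are illustrative assumptions.

```python
REGISTER_SOURCES = {"VGPR", "SGPR"}
# Assumed mapping from operand name to its bit in the 4-bit repeat enable mask.
ENABLE_BIT = {"Operand0": 0, "Operand1": 1, "Operand2": 2, "Result": 3}


def update_ids(ids, sources, repeat_enable, step=1):
    """Return a copy of `ids` with every enabled VGPR/SGPR operand advanced by `step`.

    ids:     operand name -> current address ID
    sources: operand name -> operand source type string
    step:    +1 models Operand_ID++; a negative or larger step models the
             self-decrement or non-unit variants mentioned in the text.
    """
    updated = dict(ids)
    for name, bit in ENABLE_BIT.items():
        enabled = (repeat_enable >> bit) & 1
        if enabled and sources.get(name) in REGISTER_SOURCES:
            updated[name] = ids[name] + step
    return updated


ids = {"Operand0": 5, "Operand1": 10, "Operand2": 20, "Result": 30}
sources = {"Operand0": "LDS_DIRECT", "Operand1": "VGPR", "Operand2": "VGPR", "Result": "VGPR"}
print(update_ids(ids, sources, repeat_enable=0b0011))
```

In this usage only Operand1 changes: Operand0 is enabled but reads from LDS_DIRECT, which follows a different rule, and Operand2 and Result are not enabled.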
TABLE 3
Figure BDA0002320768430000121
Thus, when the source operand comes from LDS_DIRECT, the address field of the M0 register needs to be updated automatically when the address ID is updated. Correspondingly, the address pointed to by the address ID is the address stored in the M0 register, which is used to read the source operand stored in the LDS. That is, the M0 register stores the address used to read the source operand (e.g., the elements of each row in the matrix) from the LDS, and after the corresponding element is read from the LDS according to the current address, the address in the M0 register needs to be updated to the address of the next element.
In yet another embodiment, in addition to updating the address ID corresponding to the operand according to the operand source type pointed to by the address ID, the address ID may also be updated according to the data type of the data stored in the operand source pointed to by the address ID. Different data types have different address update rules, for example:
Address(i+1) = Address(i) + 0x1; // data type is unsigned byte;
Address(i+1) = Address(i) + 0x2; // data type is unsigned short;
Address(i+1) = Address(i) + 0x4; // data type is DWord;
Address(i+1) = Address(i) + 0x0; // data type is Default (reserved);
Address(i+1) = Address(i) + 0x1; // data type is signed byte;
Address(i+1) = Address(i) + 0x2; // data type is signed short;
Address(i+1) = Address(i) + 0x8; // data type is Qword;
Taking the case where the operand source pointed to by the address ID corresponding to the operand is LDS_DIRECT as an example, the address field of the M0 register is updated automatically, and the data type of the data stored in the LDS is also taken into account: if the data type is unsigned byte, the address is updated according to the rule Address(i+1) = Address(i) + 0x1.
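The data-type-dependent update of the M0 address field can be illustrated with the following sketch. Only the fact that the lower 16 bits of M0 hold the address comes from the description; the data type names used as dictionary keys, the function name `next_m0`, and the wrap-around of the address within 16 bits are assumptions.

```python
# Step added to the address for each data type, following the rules listed above.
STRIDE_BYTES = {
    "unsigned_byte": 0x1,
    "unsigned_short": 0x2,
    "dword": 0x4,
    "default": 0x0,      # reserved: address unchanged
    "signed_byte": 0x1,
    "signed_short": 0x2,
    "qword": 0x8,
}
ADDR_MASK = 0xFFFF       # lower 16 bits of M0 are the LDS_DIRECT address


def next_m0(m0: int, data_type: str) -> int:
    """Return M0 with its address field advanced for `data_type`.

    The upper 16 bits are preserved; the address is assumed to wrap at 16 bits.
    """
    address = ((m0 & ADDR_MASK) + STRIDE_BYTES[data_type]) & ADDR_MASK
    upper = m0 & 0xFFFF0000
    return upper | address


print(hex(next_m0(0x00010010, "dword")))
```

Keeping the step sizes in a table makes it easy to see that signed and unsigned variants of the same width share one step.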
When the operand is the destination operand (Result), it must additionally be ensured, before the address is updated, that the operand source type pointed to by the address ID corresponding to the destination operand is not a temporary register used for data pass-through. Whether it is such a temporary register can be judged from the destination pass-through (DF) field. When DF equals 1, Result_ID is Forwarding; in this case the address does not need to be updated and Forwarding is kept. Otherwise (DF is not 1), the operand source type pointed to by the address ID corresponding to the destination operand is not a temporary register for data pass-through, and if it is VGPR/SGPR, the address is updated in the manner described above.
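The DF guard on the destination operand can be expressed in a few lines. The function name `update_result_id`, the integer address IDs and the unit step are assumptions for this sketch.

```python
def update_result_id(result_id: int, df: int, source: str, step: int = 1) -> int:
    """Advance the destination address ID unless the result is being forwarded."""
    if df == 1:
        return result_id            # DF = 1: keep Forwarding, no address update
    if source in ("VGPR", "SGPR"):
        return result_id + step     # ordinary register destination: ID + step
    return result_id                # other source types are left untouched here


print(update_result_id(30, df=1, source="VGPR"), update_result_id(30, df=0, source="VGPR"))
```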
In order to improve efficiency, before obtaining the key information in the compressed instruction, the decoding circuit may further determine whether the compressed instruction is valid, obtain the key information in the compressed instruction only after determining that the compressed instruction is valid, and decompress the compressed instruction according to the instruction repetition type and the instruction repetition number in the key information.
As an embodiment, whether the compressed instruction is valid may be determined from the repeat enable field of a source operand or the repeat enable field of the destination operand in the compressed instruction: the compressed instruction is valid when the repeat enable field of a source operand is not zero and the address ID corresponding to that source operand points to a specified operand source type (such as VGPR/SGPR/LDS_DIRECT), or when the repeat enable field of the destination operand is not zero. The compressed instruction is valid if at least one of the following holds:
If (Repeat_Enable[59:59] != 0x0) and Operand0_ID is VGPR/SGPR/LDS_DIRECT;
If (Repeat_Enable[60:60] != 0x0) and Operand1_ID is VGPR/SGPR/LDS_DIRECT;
If (Repeat_Enable[61:61] != 0x0) and Operand2_ID is VGPR/SGPR/LDS_DIRECT;
If (Repeat_Enable[62:62] != 0x0);
In other words, if the repeat enable field of at least one source operand is not zero and the corresponding address ID points to a specified operand source type, or if the repeat enable field of the destination operand is not zero, the compressed instruction is valid.
The above is described from the perspective of the entire decoding circuit. To make the information exchange between the elements of the decoding circuit easier to understand, the steps performed by each element are described below. As shown in fig. 1, the decoding circuit includes a decoder (Repeat Decoder) and an instruction decompression module, which are connected to each other.
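The four validity conditions can be combined into one predicate, sketched here. The 4-bit mask layout (bit 0 for Operand0 through bit 3 for the destination operand), the function name `is_valid_compressed`, and the source type string "CONST" used in the usage line are hypothetical.

```python
VALID_SOURCES = {"VGPR", "SGPR", "LDS_DIRECT"}


def is_valid_compressed(repeat_enable: int, source_types) -> bool:
    """Check the validity conditions for a compressed instruction.

    repeat_enable: assumed 4-bit mask, bits 0..2 for Operand0..Operand2 and
                   bit 3 for the destination operand.
    source_types:  operand source type strings for Operand0, Operand1, Operand2.
    """
    for bit, source in enumerate(source_types[:3]):
        if (repeat_enable >> bit) & 1 and source in VALID_SOURCES:
            return True                      # an enabled source operand with a valid source type
    return bool((repeat_enable >> 3) & 1)    # or the destination repeat enable is set


print(is_valid_compressed(0x3, ["LDS_DIRECT", "VGPR", "CONST"]))
```

Note that an enabled source operand alone is not enough: its address ID must also point to one of the specified source types.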
The decoder is used for judging whether the obtained instruction is a compression instruction or not, if not, the instruction is sent to the instruction execution unit to execute the instruction, and if so, key information in the compression instruction is obtained.
To improve efficiency, the decoder is optionally further configured to determine that the compressed instruction is valid before obtaining the key information in the compressed instruction. As an embodiment, the decoder is configured to determine whether the compressed instruction is valid from the repeat enable field of a source operand or the repeat enable field of the destination operand in the compressed instruction: the compressed instruction is valid when the repeat enable field of a source operand is not zero and the address ID corresponding to that source operand points to a specified operand source type, or when the repeat enable field of the destination operand is not zero.
The instruction decompression module is used for decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number, so as to decompress the compressed instruction into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
Optionally, the instruction decompressing module is further configured to send, to the decoder, an instruction to prevent the decoder from obtaining the instruction from the instruction distributing unit when the key information sent by the decoder is received, and send, to the decoder, an instruction to allow the decoder to obtain the instruction from the instruction distributing unit when decompression of the compressed instruction is completed.
Under one implementation, as shown in fig. 3, the instruction decompressing module includes: a controller and an instruction generator. The controller is connected with the instruction generator and the decoder respectively.
In one embodiment, optionally, the controller is configured to obtain an address ID corresponding to an operand in the instruction repeat type; the instruction generator is used for generating an instruction according to the address ID corresponding to the operand in the instruction repetition type; the controller is further used for updating the instruction repetition times after the instruction generator generates the instruction according to the address ID corresponding to the operand in the instruction repetition type, updating the address ID corresponding to the operand when the updated instruction repetition times is determined to be larger than a preset threshold value, and sending the address ID corresponding to the updated operand to the instruction generator; the instruction generator is also used for generating an instruction according to the address ID corresponding to the updated operand; the controller is further used for updating the instruction repetition times again after the instruction generator generates the instruction according to the address ID corresponding to the updated operand, and judging whether the instruction repetition times after updating again is equal to a preset threshold value or not; if yes, decompression of the compressed instruction is completed, and a plurality of instructions which correspond to the instruction repetition type and are the same as the instruction repetition times are obtained.
In another embodiment, the controller is configured to obtain an address ID corresponding to an operand in an instruction repeat type; the instruction generator is used for generating an instruction according to the address ID corresponding to the operand in the instruction repetition type; the controller is further used for recording the generation times of the generated instruction after the instruction generator generates the instruction according to the address ID corresponding to the operand in the instruction repetition type, updating the address ID corresponding to the operand when the generation times are determined to be smaller than the instruction repetition times, and sending the address ID corresponding to the updated operand to the instruction generator; the instruction generator is also used for generating an instruction according to the address ID corresponding to the updated operand; the controller is also used for updating the generation times after the instruction generator generates the instruction according to the address ID corresponding to the updated operand, and judging whether the updated generation times are equal to the instruction repetition times or not; if yes, decompression of the compressed instruction is completed, and a plurality of instructions which correspond to the instruction repetition type and are the same as the instruction repetition times are obtained.
In one embodiment, when updating the address ID corresponding to the operand, the controller is further configured to update the address ID corresponding to the operand according to the operand source type pointed to by the address ID corresponding to the operand.
In one embodiment, when the controller updates the address ID corresponding to the operand, the controller is further configured to update the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed by the address ID corresponding to the operand.
Optionally, the operand in the instruction repetition type is a destination operand, and the key information further includes a destination pass-through (DF) field. Before updating the address ID corresponding to the operand, the controller is further configured to determine that the value in the DF field is not a set threshold (e.g., 1).
Optionally, the controller is further configured to send the decoder an indication that prevents it from fetching instructions from the instruction distribution unit while the compressed instruction is being decompressed; during this time the decoder does not fetch instructions from the instruction distribution unit. When decompression is complete, the controller sends the decoder an indication that allows it to fetch instructions from the instruction distribution unit, and the decoder can then fetch instructions again. That is, the controller has a normal mode and a Repeat mode. Repeat_Count = 0 indicates the normal mode, in which the decoder is allowed to acquire instructions from the instruction distribution unit for execution. Repeat_Count != 0 indicates the Repeat mode, in which the controller prevents the decoder from fetching instructions from the instruction distribution unit. When the controller completes decompression of the compressed instruction, that is, when Repeat_Count = 0, it switches back to the normal mode.
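The mode switching driven by Repeat_Count might be modelled as a small state holder. The class name `RepeatController` and its method names are hypothetical; only the rule "Repeat_Count = 0 means normal mode, otherwise Repeat mode" is taken from the description.

```python
class RepeatController:
    """Toy model of the controller's normal/Repeat mode switching."""

    def __init__(self) -> None:
        self.repeat_count = 0          # 0 means normal mode

    @property
    def mode(self) -> str:
        return "normal" if self.repeat_count == 0 else "repeat"

    def decoder_may_fetch(self) -> bool:
        # The decoder may fetch from the distribution unit only in normal mode.
        return self.repeat_count == 0

    def load(self, repeat_count: int) -> None:
        # Key information received: enter Repeat mode and block the decoder.
        self.repeat_count = repeat_count

    def on_instruction_generated(self) -> None:
        # Each generated instruction decrements the count; at 0 the decoder is released.
        if self.repeat_count > 0:
            self.repeat_count -= 1


controller = RepeatController()
controller.load(3)
for _ in range(3):
    controller.on_instruction_generated()
print(controller.mode)
```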
When the operand source type pointed by the address ID corresponding to the operand in the instruction repeat type is LDS, the instruction decompressing module further includes: and the configuration register is used for storing and acquiring the address of the source operand in the LDS, and automatically updating the address of the configuration register to the address corresponding to the next source operand after reading the corresponding source operand from the LDS according to the current address. At this time, when updating the address ID corresponding to the operand, the controller is further configured to update the address ID corresponding to the operand according to the address currently indicated by the configuration register, where the address ID is the same as the address currently indicated by the configuration register. In this case, as shown in fig. 4, the instruction decompressing module includes: a controller, a configuration register (M0 register), and an instruction generator. The controller is connected with the decoder, the instruction generator and the configuration register respectively.
For the functions of the respective components, reference may be made to the corresponding parts of the foregoing embodiments, in which the decoding circuit is described as a whole. Those parts have been described in detail above and, for brevity, are not repeated here.
In the embodiment of the application, instructions are compressed through VOP3R, so that each cache line (512 bits) can accommodate 512 3-operand instructions, which effectively reduces the probability of instruction cache misses and improves efficiency. For ease of understanding, the method provided by the embodiments of the present application is illustrated with matrix multiplication. Take 64x64 matrices as an example: C(64x64) = A(64x64) * B(64x64). The matrix size 64x64 is merely an example and is not limiting. Assume that there are 64 arithmetic operation units, each having a VGPR space of 200x64 bits.
The calculation process is roughly as follows:
1) Matrix A is loaded to the LDS in linear mode:
A(0,0) → LDS(Address0); // A(0,0) is stored at the location of Address0 of LDS;
A(0,1) → LDS(Address1); // A(0,1) is stored at the location of Address1 of LDS;
A(0,2) → LDS(Address2); // A(0,2) is stored at the location of Address2 of LDS;
……
2) matrix B is loaded into the VGPR space as shown in Table 4.
TABLE 4
ALU0 ALU1 ALU2 …… ALU62 ALU63
B0,0 B0,1 B0,2 …… B0,62 B0,63
B1,0 B1,1 B1,2 …… B1,62 B1,63
…… …… …… …… …… ……
B63,0 B63,1 B63,2 …… B63,62 B63,63
During calculation, the elements of matrix A are loaded one by one into the 64 ALUs in parallel and multiplied by the corresponding column elements stored in the 64 vector general registers. The 64 ALUs accumulate, in parallel and in sequence, the products of the elements in the same row of matrix A and the corresponding elements of matrix B, obtaining all elements in the same row of matrix C and thereby completing the multiplication of matrix A and matrix B.
3) Calculating matrix C:
The instructions for calculating matrix C in the normal mode are as follows:
M0_register=start_address; // Initial address of the M0 register. The M0 register stores the address used to read each element of matrix A and is automatically updated to the address of the next element after the 64 ALUs read, in parallel, the corresponding element of matrix A from the LDS according to the current address in the M0 register.
//-----------------------------------------
// Calculate the first row of Matrix C:
// C(0,0) is calculated on ALU_Index0: ALU_Index=0.
// C(0,1) is calculated on ALU_Index1: ALU_Index=1.
//......
The execution instruction for each ALU to compute a corresponding element in the first row of the matrix C is as follows:
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
……
Block_End::C(0,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
//-----------------------------------------
// Calculate the second row of Matrix C:
// C(1,0) is calculated on ALU_Index0.
// C(1,1) is calculated on ALU_Index1.
//......
The execution instruction for each ALU to compute a corresponding element in the second row of the matrix C is as follows:
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
……
Block_End::C(1,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
……
//-----------------------------------------
// Calculate the last row of Matrix C:
// C(63,0) is calculated on ALU_Index0.
// C(63,1) is calculated on ALU_Index1.
//......
The execution instruction for each ALU to compute a corresponding element in the last row of the matrix C is as follows:
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
……
Block_End::C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
The above is the conventional mode without instruction compression. With the instruction compression method provided by the present application, the above conventional instruction list can be compressed as follows:
M0_register=start_address;
//-----------------------------------------
// Calculate the first row of Matrix C:
// C(0,0) is calculated on ALU_Index0: ALU_Index=0.
// C(0,1) is calculated on ALU_Index1: ALU_Index=1.
//......
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
RepeatEnable(0x3),RepeatCounter(62)::
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;//Repeat Operand0 and Operand1;
Block_End::C(0,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
//-----------------------------------------
// Calculate the second row of Matrix C:
// C(1,0) is calculated on ALU_Index0.
// C(1,1) is calculated on ALU_Index1.
//...........
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
RepeatEnable(0x3),RepeatCounter(62)::
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;//Repeat Operand0 and Operand1;
Block_End::C(1,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
……
//-----------------------------------------
// Calculate the last row of Matrix C:
// C(63,0) is calculated on ALU_Index0.
// C(63,1) is calculated on ALU_Index1.
//...........
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
RepeatEnable(0x3),RepeatCounter(62)::
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;//Repeat Operand0 and Operand1;
Block_End::C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
As can be seen from the above, completing C(64x64) = A(64x64) * B(64x64) in the conventional instruction mode requires 64x64 = 4096 instructions, whereas only 3x64 = 192 instructions are needed after the instruction compression of the present application is applied, so that efficiency is significantly improved.
A data processing method according to an embodiment of the present application is described below with reference to fig. 5.
Step S101: judging whether the acquired instruction is a compressed instruction.
If yes, step S102 is executed, and if no, the acquired instruction is sent to the instruction execution unit.
Step S102: obtaining key information in the compressed instruction, wherein the key information comprises: an instruction repetition type and an instruction repetition number.
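The instruction counts can be checked with a short sketch that expands one compressed row block (Block_Start, one repeated instruction, Block_End) back into the B-row indices it touches. The function name `expand_row` and the representation of an instruction by its B-row index are assumptions for illustration.

```python
def expand_row(n: int):
    """Return the B-row index used by each instruction of one expanded row block.

    The compressed block holds three instructions: Block_Start on B(0, ...),
    one instruction repeated n - 2 times starting at B(1, ...), and Block_End
    on B(n - 1, ...).
    """
    block_start = [0]
    repeated = [1 + k for k in range(n - 2)]   # RepeatCounter(n - 2), row index + 1 each time
    block_end = [n - 1]
    return block_start + repeated + block_end


N = 64
compressed_total = 3 * N                                     # three stored instructions per row of C
expanded_total = sum(len(expand_row(N)) for _ in range(N))   # instructions actually executed
print(compressed_total, expanded_total)
```

For N = 64 the repeated instruction is expanded 62 times, so each row still executes 64 instructions while only 3 are stored.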
The instruction repetition type is used for indicating the type of the instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2.
Step S103: decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number so as to decompress the compressed instruction into a plurality of instructions which correspond to the instruction repetition type and have the same number as the instruction repetition number.
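Steps S101 to S103 can be summarized in a compact sketch. The `Instr` record, its `compressed` flag (standing in for whatever encoding check identifies a compressed instruction) and the "ID plus one" expansion rule are hypothetical and serve only to show the control flow.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Instr:
    opcode: str
    operand_id: int
    compressed: bool = False   # hypothetical flag standing in for the encoding check
    repeat_count: int = 0      # instruction repetition number (>= 2 when compressed)


def process(instr: Instr) -> List[Instr]:
    """Return the instructions sent to the execution unit for one fetched instruction."""
    # S101: an ordinary instruction is passed through unchanged.
    if not instr.compressed:
        return [instr]
    # S102: obtain the key information (here only the repetition number).
    count = instr.repeat_count
    if count < 2:
        raise ValueError("instruction repetition number must be at least 2")
    # S103: expand into `count` ordinary instructions, assuming an ID + 1 update rule.
    return [
        replace(instr, operand_id=instr.operand_id + k, compressed=False, repeat_count=0)
        for k in range(count)
    ]


print([i.operand_id for i in process(Instr("mac", 4, compressed=True, repeat_count=3))])
```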
Optionally, before obtaining the key information in the compressed instruction, the method further includes: determining that the compressed instruction is valid.
In an embodiment, the process of decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number may be: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition times; updating the address ID corresponding to the operand when the updated instruction repetition times are determined to be larger than a preset threshold; generating an instruction according to the address ID corresponding to the updated operand, and updating the instruction repetition times again; judging whether the instruction repetition times after being updated again is equal to the preset threshold value or not; if yes, determining that decompression of the compressed instruction is finished, and obtaining a plurality of instructions which correspond to the instruction repetition types and are the same as the instruction repetition times.
In one embodiment, the process of decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number may be: generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording the generation times of the generated instruction; when the generation times are determined to be less than the instruction repetition times, updating the address ID corresponding to the operand; generating an instruction according to the address ID corresponding to the updated operand, and updating the generation times; judging whether the updated generation times are equal to the instruction repetition times or not; if yes, determining that decompression of the compressed instruction is finished, and obtaining a plurality of instructions which correspond to the instruction repetition types and are the same as the instruction repetition times.
Optionally, the process of updating the address ID corresponding to the operand may be: updating the address ID corresponding to the operand according to the operand source type pointed to by the address ID corresponding to the operand.
Optionally, the process of updating the address ID corresponding to the operand may also be: updating the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand.
Optionally, the operand in the instruction repetition type is a destination operand, and the key information further includes a destination pass-through (DF) field, wherein before updating the address ID corresponding to the operand, the method further comprises: determining that the value in the destination pass-through (DF) field is not a set threshold.
The method provided by the embodiment of the present application has the same implementation principle and technical effect as the foregoing device embodiment. For brevity, where the method embodiment does not mention a detail, reference may be made to the corresponding content in the foregoing device embodiment.
The embodiment of the application also provides a processor, as shown in fig. 6. The processor comprises the decoding circuit of any one of the above embodiments, an instruction execution unit and an instruction distribution unit. The instruction distribution unit and the instruction execution unit are both connected with the decoding circuit. The instruction distribution unit is used for storing instructions so that the decoding circuit can fetch them from the instruction distribution unit. The instruction execution unit is used for executing the instructions issued by the decoding circuit.
The processor may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU), and the like; a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
The above description covers only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope of the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (17)

1. A data processing method, comprising:
judging whether an obtained instruction is a compressed instruction;
if yes, obtaining key information in the compressed instruction, the key information comprising an instruction repetition type and an instruction repetition number, wherein the instruction repetition type indicates the type of the instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2;
decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number, so that the compressed instruction is decompressed into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
2. The method of claim 1, wherein decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number comprises:
generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and updating the instruction repetition number;
updating the address ID corresponding to the operand when the updated instruction repetition number is determined to be greater than a preset threshold;
generating an instruction according to the updated address ID corresponding to the operand, and updating the instruction repetition number again;
judging whether the instruction repetition number after being updated again is equal to the preset threshold;
if yes, determining that decompression of the compressed instruction is finished, thereby obtaining a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
3. The method of claim 1, wherein decompressing the compressed instruction according to the instruction repetition type and the instruction repetition number comprises:
generating an instruction according to the address ID corresponding to the operand in the instruction repetition type, and recording a generation count of the generated instructions;
updating the address ID corresponding to the operand when the generation count is determined to be less than the instruction repetition number;
generating an instruction according to the updated address ID corresponding to the operand, and updating the generation count;
judging whether the updated generation count is equal to the instruction repetition number;
if yes, determining that decompression of the compressed instruction is finished, thereby obtaining a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
4. The method of claim 2 or 3, wherein updating the address ID corresponding to the operand comprises:
updating the address ID corresponding to the operand according to the operand source type pointed to by the address ID corresponding to the operand.
5. The method of claim 2 or 3, wherein updating the address ID corresponding to the operand comprises:
updating the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand.
6. The method of claim 2 or 3, wherein the operand in the instruction repetition type is a destination operand, and the key information further comprises a destination pass-through DF field, wherein before updating the address ID corresponding to the operand, the method further comprises:
determining that the value in the destination pass-through DF field is not a set threshold.
7. The method of claim 1, wherein prior to obtaining the key information in the compressed instruction, the method further comprises:
determining that the compressed instruction is valid.
8. A decoding circuit, comprising:
a decoder configured to judge whether an obtained instruction is a compressed instruction and, if so, to obtain key information in the compressed instruction, the key information comprising an instruction repetition type and an instruction repetition number, wherein the instruction repetition type indicates the type of the instruction to be repeated, and the instruction repetition number is a positive integer greater than or equal to 2;
and an instruction decompression module configured to decompress the compressed instruction according to the instruction repetition type and the instruction repetition number, so that the compressed instruction is decompressed into a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
9. The decoding circuit of claim 8, wherein the instruction decompression module comprises:
a controller configured to obtain the address ID corresponding to the operand in the instruction repetition type;
an instruction generator configured to generate an instruction according to the address ID corresponding to the operand in the instruction repetition type;
wherein the controller is further configured to update the instruction repetition number after the instruction generator generates the instruction according to the address ID corresponding to the operand in the instruction repetition type, to update the address ID corresponding to the operand when the updated instruction repetition number is determined to be greater than a preset threshold, and to send the updated address ID corresponding to the operand to the instruction generator;
the instruction generator is further configured to generate an instruction according to the updated address ID corresponding to the operand;
the controller is further configured to update the instruction repetition number again after the instruction generator generates the instruction according to the updated address ID corresponding to the operand, and to judge whether the instruction repetition number after being updated again is equal to the preset threshold; and if yes, to determine that decompression of the compressed instruction is finished, thereby obtaining a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
10. The decoding circuit of claim 8, wherein the instruction decompression module comprises:
a controller configured to obtain the address ID corresponding to the operand in the instruction repetition type;
an instruction generator configured to generate an instruction according to the address ID corresponding to the operand in the instruction repetition type;
wherein the controller is further configured to record a generation count of the generated instructions after the instruction generator generates the instruction according to the address ID corresponding to the operand in the instruction repetition type, to update the address ID corresponding to the operand when the generation count is determined to be less than the instruction repetition number, and to send the updated address ID corresponding to the operand to the instruction generator;
the instruction generator is further configured to generate an instruction according to the updated address ID corresponding to the operand;
the controller is further configured to update the generation count after the instruction generator generates the instruction according to the updated address ID corresponding to the operand, and to judge whether the updated generation count is equal to the instruction repetition number; and if yes, to determine that decompression of the compressed instruction is finished, thereby obtaining a plurality of instructions that correspond to the instruction repetition type and whose number equals the instruction repetition number.
11. The decoding circuit according to claim 9 or 10, wherein the controller is configured to update the address ID corresponding to the operand according to an operand source type pointed to by the address ID corresponding to the operand.
12. The decoding circuit according to claim 9 or 10, wherein the controller is configured to update the address ID corresponding to the operand according to the data type of the data stored in the operand source pointed to by the address ID corresponding to the operand.
13. The decoding circuit of claim 9 or 10, wherein the operand in the instruction repetition type is a destination operand, and the key information further comprises a destination pass-through DF field, wherein the controller is further configured to determine that the value in the destination pass-through DF field is not a set threshold before updating the address ID corresponding to the operand.
14. The decoding circuit according to claim 9 or 10, wherein the operand source type pointed to by the address ID corresponding to the operand in the instruction repetition type is LDS, and the instruction decompression module further comprises: a configuration register configured to store the address used to obtain a source operand from the LDS, and to automatically update the stored address to the address corresponding to the next source operand after the corresponding source operand has been read from the LDS according to the current address;
correspondingly, the controller is configured to update the address ID corresponding to the operand according to the address currently indicated by the configuration register, so that the address ID corresponding to the operand is the same as the address currently indicated by the configuration register.
15. The decoding circuit of claim 8, wherein the decoder is further configured to determine that the compressed instruction is valid before the key information in the compressed instruction is obtained.
16. The decoding circuit according to claim 8, wherein the instruction decompression module is further configured to send, to the decoder, an instruction that prevents the decoder from fetching instructions from the instruction dispatch unit when the key information sent by the decoder is received, and to send, to the decoder, an instruction that allows the decoder to fetch instructions from the instruction dispatch unit when it is determined that decompression of the compressed instruction is finished.
17. A processor, comprising: an instruction dispatch unit, an instruction execution unit, and the decoding circuit of any one of claims 8-16, the instruction dispatch unit and the instruction execution unit both being connected to the decoding circuit.
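The two decompression procedures recited in claims 2 and 3 can be compared with a short sketch. It is an illustration under assumptions, not the claimed circuit: the function names, the tuple representation of a generated instruction, the opcode `v_add`, the preset threshold of 0, and the address stride of 1 are all hypothetical. The first function counts the repetition number down to the threshold, and the second counts generated instructions up to the repetition number.

```python
def decompress_count_down(repeat_type, address_id, repeat_count,
                          threshold=0, stride=1):
    """Claim 2 style: decrement the repetition number until it hits the threshold."""
    if repeat_count < 2:
        raise ValueError("instruction repetition number must be >= 2")
    remaining = repeat_count
    generated = []
    while True:
        generated.append((repeat_type, address_id))  # generate one instruction
        remaining -= 1                               # update the repetition number
        if remaining > threshold:
            address_id += stride                     # update the address ID
        else:
            return generated                         # decompression finished


def decompress_count_up(repeat_type, address_id, repeat_count, stride=1):
    """Claim 3 style: count generated instructions up to the repetition number."""
    if repeat_count < 2:
        raise ValueError("instruction repetition number must be >= 2")
    count = 0
    generated = []
    while True:
        generated.append((repeat_type, address_id))  # generate one instruction
        count += 1                                   # record the generation count
        if count < repeat_count:
            address_id += stride                     # update the address ID
        else:
            return generated                         # decompression finished


expanded = decompress_count_down("v_add", 10, 3)
print(expanded)  # [('v_add', 10), ('v_add', 11), ('v_add', 12)]
```

With a threshold of 0, both variants produce the same sequence, and its length equals the instruction repetition number.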
CN201911302511.8A 2019-12-16 2019-12-16 Data processing method, decoding circuit and processor Active CN111124495B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911302511.8A CN111124495B (en) 2019-12-16 2019-12-16 Data processing method, decoding circuit and processor
PCT/CN2020/114004 WO2021120713A1 (en) 2019-12-16 2020-09-08 Data processing method, decoding circuit, and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911302511.8A CN111124495B (en) 2019-12-16 2019-12-16 Data processing method, decoding circuit and processor

Publications (2)

Publication Number Publication Date
CN111124495A (en) 2020-05-08
CN111124495B (en) 2021-02-12

Family

ID=70499328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911302511.8A Active CN111124495B (en) 2019-12-16 2019-12-16 Data processing method, decoding circuit and processor

Country Status (2)

Country Link
CN (1) CN111124495B (en)
WO (1) WO2021120713A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124495B (en) * 2019-12-16 2021-02-12 海光信息技术股份有限公司 Data processing method, decoding circuit and processor
CN112929379B (en) * 2021-02-22 2023-03-24 深圳供电局有限公司 Intelligent recorder remote operation and maintenance instruction defense method and system
CN116225538A (en) * 2023-05-06 2023-06-06 苏州萨沙迈半导体有限公司 Processor and pipeline structure and instruction execution method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101533342A (en) * 2007-12-27 2009-09-16 英特尔公司 Compressed instruction format
CN103562855A (en) * 2011-04-01 2014-02-05 英特尔公司 Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004029796A2 (en) * 2002-09-24 2004-04-08 Koninklijke Philips Electronics N.V. Apparatus, method ,and compiler enabling processing of load immediate instructions in a very long instruction word processor
US20040139298A1 (en) * 2003-01-09 2004-07-15 International Business Machines Corporation Method and apparatus for instruction compression and decompression in a cache memory
US7552316B2 (en) * 2004-07-26 2009-06-23 Via Technologies, Inc. Method and apparatus for compressing instructions to have consecutively addressed operands and for corresponding decompression in a computer system
US9135055B2 (en) * 2009-12-11 2015-09-15 Aerial Robotics, Inc. Transparent network substrate system
JP2012098893A (en) * 2010-11-01 2012-05-24 Fujitsu Semiconductor Ltd Compression instruction processing device and compression instruction generation device
US9672041B2 (en) * 2013-08-01 2017-06-06 Andes Technology Corporation Method for compressing variable-length instructions including PC-relative instructions and processor for executing compressed instructions using an instruction table
US9513919B2 (en) * 2015-04-28 2016-12-06 Intel Corporation Method and apparatus for speculative decompression
CN107729054B (en) * 2017-10-18 2020-07-24 珠海市杰理科技股份有限公司 Method and device for realizing execution of processor on loop body
CN111124495B (en) * 2019-12-16 2021-02-12 海光信息技术股份有限公司 Data processing method, decoding circuit and processor
CN111708574B (en) * 2020-05-28 2023-03-31 中国科学院信息工程研究所 Instruction stream compression and decompression method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
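Fast intersection algorithm based on single-instruction-level parallelism (基于单指令级并行的快速求交算法); 宋省身; Journal of Shandong University (《山东大学学报》); 2018-03-30; full text *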

Also Published As

Publication number Publication date
CN111124495A (en) 2020-05-08
WO2021120713A8 (en) 2021-08-05
WO2021120713A1 (en) 2021-06-24

Similar Documents

Publication Publication Date Title
EP3602278B1 (en) Systems, methods, and apparatuses for tile matrix multiplication and accumulation
US11714875B2 (en) Apparatuses, methods, and systems for instructions of a matrix operations accelerator
CN111124495B (en) Data processing method, decoding circuit and processor
US11847185B2 (en) Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements
KR101515311B1 (en) Performing a multiply-multiply-accumulate instruction
US20200210516A1 (en) Apparatuses, methods, and systems for fast fourier transform configuration and computation instructions
JP4817185B2 (en) Computer instruction value field with embedded code
JP7481069B2 (en) System and method for performing chained tile operations - Patents.com
US10922077B2 (en) Apparatuses, methods, and systems for stencil configuration and computation instructions
JP5341163B2 (en) Instruction cache with a fixed number of variable-length instructions
EP3757769B1 (en) Systems and methods to skip inconsequential matrix operations
US20190347099A1 (en) Arithmetic operation with shift
Molina et al. Dynamic removal of redundant computations
US9792121B2 (en) Microprocessor that fuses if-then instructions
CN111782270B (en) Data processing method and device and storage medium
CN113703832A (en) Method, device and medium for executing immediate data transfer instruction
CN116097212A (en) Apparatus, method, and system for a 16-bit floating point matrix dot product instruction
CN116339832A (en) Data processing device, method and processor
EP4278256B1 (en) Parallel decode instruction set computer architecture with variable-length instructions
CN115328547A (en) Data processing method, electronic equipment and storage medium
EP3757822A1 (en) Apparatuses, methods, and systems for enhanced matrix multiplier architecture
Cheresiz et al. The CSI multimedia architecture
CN109683959B (en) Instruction execution method of processor and processor thereof
CN118276951B (en) RISC-V based instruction expansion method and implementation device
CN113064841B (en) Data storage method, processing method, computing device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 300450 Tianjin Binhai New Area Huayuan Industrial Zone Haitai West Road 18 North 2-204 Industrial Incubation-3-8

Applicant after: Haiguang Information Technology Co., Ltd

Address before: 1809-1810, block B, blue talent port, No.1, Intelligent Island Road, high tech Zone, Qingdao, Shandong Province

Applicant before: HAIGUANG INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210407

Address after: 610000 China (Sichuan) pilot Free Trade Zone, Chengdu high tech Zone

Patentee after: CHENGDU HAIGUANG MICROELECTRONICS TECHNOLOGY Co.,Ltd.

Address before: Industrial incubation-3-8, North 2-204, No. 18, Haitai West Road, Huayuan Industrial Zone, Binhai New Area, Tianjin 300450

Patentee before: Haiguang Information Technology Co., Ltd