WO2023093335A1 - Data processing circuit, artificial intelligence chip, and data processing method and apparatus - Google Patents


Info

Publication number
WO2023093335A1
WO2023093335A1 PCT/CN2022/124509 CN2022124509W WO2023093335A1 WO 2023093335 A1 WO2023093335 A1 WO 2023093335A1 CN 2022124509 W CN2022124509 W CN 2022124509W WO 2023093335 A1 WO2023093335 A1 WO 2023093335A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
information
sent
read
queue
Prior art date
Application number
PCT/CN2022/124509
Other languages
French (fr)
Chinese (zh)
Inventor
李越
朱志岐
王文强
徐宁仪
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023093335A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/30: Circuit design
    • G06F30/32: Circuit design at the digital level
    • G06F30/33: Design verification, e.g. functional simulation or model checking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/30: Circuit design
    • G06F30/32: Circuit design at the digital level
    • G06F30/337: Design optimisation

Definitions

  • the present disclosure relates to the technical field of integrated circuit design, in particular to a data processing circuit, an artificial intelligence chip, a data processing method and a device.
  • the disclosure provides a data processing circuit, an artificial intelligence chip, a data processing method and a device.
  • a data processing circuit, the circuit including a plurality of processing units, each of which includes: an instruction queue, configured to cache a received first instruction; a sent instruction queue, configured to cache information of each second instruction of at least one second instruction that has been sent; and a detection unit, configured to detect, based on the information of the first instruction and the information of each second instruction, whether the first instruction satisfies an instruction sending condition, and, if the instruction sending condition is satisfied, to take the first instruction out of the instruction queue and send it; the information includes at least one of read/write type information and address information.
  • the detection unit is specifically configured to: when the sent instruction queue is not full, detect whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction, the read/write type information of the second instruction, the address information of the first instruction, and the address information of the second instruction; and when the sent instruction queue is full, detect whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction and the read/write type information of the second instruction.
  • the detection unit is specifically configured to: determine that the first instruction satisfies the instruction sending condition when the address information of each second instruction in the sent instruction queue is different from the address information of the first instruction; determine that the first instruction satisfies the instruction sending condition when the sent instruction queue contains at least one potential conflicting instruction whose address information is the same as that of the first instruction, but the read/write type of each potential conflicting instruction is the same as that of the first instruction; and determine that the first instruction does not satisfy the instruction sending condition when the sent instruction queue contains at least one potential conflicting instruction and at least one of the potential conflicting instructions has a read/write type different from that of the first instruction.
  • the detection unit is specifically configured to: determine that the first instruction does not satisfy the instruction sending condition when the sent instruction queue contains at least one second instruction whose read/write type information is different from that of the first instruction; and determine that the first instruction satisfies the instruction sending condition when the read/write type information of each second instruction in the sent instruction queue is the same as that of the first instruction.
  • different processing units are used to process instructions sent by different thread groups.
  • the circuit further includes a first instruction distribution unit, configured to: receive the first instructions sent by each thread group, where each first instruction carries identification information of the corresponding thread group; and distribute the first instruction sent by each thread group to the corresponding processing unit based on the identification information carried in that first instruction.
  • the circuit further includes an instruction arbitration unit, configured to receive the first instructions sent by each processing unit and, based on the priority of the first instruction sent by each processing unit, send the received first instructions in sequence.
  • the first instruction includes a storage address of the bypass information, and the bypass information corresponding to the first instruction is stored under the storage address of the bypass information;
  • the circuit further includes: a second instruction distribution unit, configured to decouple an original instruction carrying the bypass information to obtain the decoupled original instruction and the bypass information, store the bypass information at the storage address of the bypass information, generate the first instruction based on the decoupled original instruction and the storage address of the bypass information, and send the first instruction to the instruction queue in the processing unit; and a bus control unit, configured to, if the first instruction satisfies the instruction sending condition, generate a target instruction based on the first instruction and the storage address of the bypass information, and send the target instruction.
  • the bus control unit is further configured to: clear the information of the processed second instruction from the sent instruction queue.
  • the first instruction includes a write instruction
  • the bypass information includes first bypass information corresponding to the write instruction
  • the circuit further includes: a first storage unit, configured to store the first bypass information.
  • the bus control unit is configured to: extract the storage address of the first bypass information from the write instruction, and acquire the first bypass information from the storage address of the first bypass information , generating the target instruction based on the first bypass information and the write instruction, and sending the target instruction.
  • the first instruction includes a read instruction
  • the bypass information includes second bypass information corresponding to the read instruction
  • the circuit further includes: a second storage unit, configured to store the second bypass information.
  • the bus control unit is further configured to: use the read instruction as the target instruction.
  • the bus control unit is further configured to: receive the target data read by the read instruction, the target data carrying the storage address of the second bypass information; write the target data into the In the storage address of the second bypass information in the second storage unit.
  • the bus control unit is further configured to: clear the bypass information corresponding to the first instruction from the storage address of the bypass information when the processing of the first instruction is completed.
  • the circuit further includes a statistical unit for counting the following information: the total number of the second instructions; the number of second instructions that have been processed; and the read/write type information of each second instruction;
  • the detection unit is configured to: when all items of information in the sent instruction queue are invalid and there is a second instruction whose read/write type information is different from that of the first instruction, detect whether the first instruction satisfies the instruction sending condition based on the information counted by the statistical unit.
  • the information of the first instruction is written into the sent instruction queue when the first instruction is sent successfully.
  • an artificial intelligence chip including: the data processing circuit described in any embodiment of the present disclosure; and a control unit, configured to send instructions to the data processing circuit.
  • a data processing method, applied to the detection unit included in each processing unit of the data processing circuit described in any embodiment of the present disclosure, the method including: detecting whether the first instruction satisfies the instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit; and, when the first instruction satisfies the instruction sending condition, taking the first instruction out of the instruction queue and sending it; the information includes at least one of read/write type information and address information.
  • a data processing device, applied to the detection unit included in each processing unit of the data processing circuit according to any embodiment of the present disclosure, the device including: a detection module, configured to detect whether the first instruction satisfies the instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit; and a sending module, configured to take the first instruction out of the instruction queue and send it when the first instruction satisfies the instruction sending condition; the information includes at least one of read/write type information and address information.
  • a plurality of processing units are used to process instructions, and each processing unit independently judges, based on the information of the first instruction received by that processing unit and the information of the second instruction sent by that processing unit, whether there is a data hazard between the instructions it processes. When there are data hazards between instructions processed by some of the processing units, instructions can still be sent through the other processing units, thereby reducing instruction congestion.
  • FIG. 1 is a schematic diagram of a data processing circuit of an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of the detection principle of the detection unit of the embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a data processing circuit according to another embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of instruction over-issuance according to an embodiment of the present disclosure.
  • FIGS. 5A and 5B are schematic diagrams of the decoupling and merging of instructions, respectively, in an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of an instruction sending process of an embodiment of the present disclosure.
  • FIG. 7 is an overall flowchart of an embodiment of the present disclosure.
  • FIG. 8 is a block diagram of an artificial intelligence chip according to an embodiment of the present disclosure.
  • FIG. 9 is a flowchart of a data processing method according to an embodiment of the present disclosure.
  • FIG. 10 is a block diagram of a data processing device of an embodiment of the present disclosure.
  • although terms such as first, second, and third may be used in the present disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word “if” as used herein may be interpreted as “at the time of”, “when”, or “in response to a determination”.
  • for example, suppose address 1 in the external storage space stores data d1,
  • and the control unit for data transfer generates two instructions: one is a read instruction indicating that data is to be read from address 1 and written into a register, and the other is a write instruction indicating that data d2 is to be written to address 1.
  • the external storage space may include, but is not limited to, double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR) or high bandwidth memory (High Bandwidth Memory, HBM).
  • the control unit may include, but is not limited to, a central processing unit (Central Processing Unit, CPU) or a graphics processing unit (Graphics Processing Unit, GPU). Executing these two instructions in different orders produces different results. If the read instruction is executed first and the write instruction second, the data written to the register is the original data d1 at address 1. If the write instruction is executed first and the read instruction second, the data written to the register is the data d2 that the write instruction stored at address 1. A data read/write error of this kind, which may arise when instructions of different read/write types are executed on the same address, is called a data hazard. To reduce data hazards, the sending of read instructions and write instructions must be controlled. However, the related art generally only judges whether there is a data hazard between two adjacent instructions. Once a data hazard occurs, subsequent instructions cannot be issued, which easily leads to instruction congestion.
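  • the ordering effect described above can be illustrated with a minimal Python sketch. The dictionary standing in for the external storage space and the `run` helper are hypothetical names introduced only for this illustration; they are not part of the disclosed circuit.

```python
def run(order):
    """Execute a read and a write on address 1 in the given order.

    Returns the value that ends up in the register.
    """
    memory = {1: "d1"}   # address 1 initially stores d1
    register = None
    for op in order:
        if op == "read":
            register = memory[1]   # read address 1 into the register
        elif op == "write":
            memory[1] = "d2"       # write d2 to address 1
    return register


read_first = run(["read", "write"])    # register receives the original data
write_first = run(["write", "read"])   # register receives the modified data
```

  • the two orders leave different values in the register, which is the kind of order-dependent result the text calls a data hazard.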
  • an embodiment of the present disclosure provides a data processing circuit 100.
  • the circuit 100 includes a plurality of processing units 101, and each processing unit 101 in the plurality of processing units includes:
  • an instruction queue 1011, configured to cache a received first instruction;
  • a sent instruction queue 1012, configured to cache information of each second instruction of at least one second instruction that has been sent; and
  • a detection unit 1013, configured to detect whether the first instruction satisfies an instruction sending condition based on the information of the first instruction and the information of each second instruction, and, if the instruction sending condition is satisfied, to take the first instruction out of the instruction queue 1011 and send it;
  • the information includes at least one of read-write type information and address information.
  • in the related art, instruction sending is generally stopped when there is a data hazard, and cannot resume until the processing of the sent instruction is completed.
  • because the embodiment of the present disclosure adopts multiple independent processing units 101, even if there is a data hazard among the instructions processed by one or some of the processing units, instructions can still be sent through the other processing units. Instruction sending needs to stop only if there is a data hazard between the instructions processed by every processing unit. Therefore, the embodiments of the present disclosure can effectively improve instruction sending efficiency and reduce instruction congestion.
  • each processing unit independently judges whether there is a data hazard between the instructions processed by the processing unit based on the information of the first instruction received by the processing unit and the information of the second instruction sent by the processing unit. In the case of data hazards between instructions processed by some of the processing units, instructions can still be sent through other processing units, thereby reducing instruction congestion. It should be noted that when two instructions indicate to write data to the same address, a write conflict problem may occur, and the data processing circuit in the embodiment of the present disclosure may further include a processing module for avoiding the write conflict problem.
  • the above instruction queue 1011 may be a First In First Out (FIFO) queue, and each time the instruction queue 1011 receives the first instruction sent by the superior control unit, it may cache the first instruction.
  • the upper-level control unit is, for example, the instruction distribution unit 102 shown in FIG. 3 , or the control unit 802 shown in FIG. 8 .
  • the order in which the first instructions are cached in the instruction queue 1011 may be the same as the order in which the instruction queue 1011 receives the first instructions.
  • the first instruction may include a read instruction for reading data from a certain address in the external storage space (for example, a hard disk).
  • the first instruction may also include a write instruction for writing data to a certain address in the external storage space.
  • the first instruction may include information corresponding to the first instruction, where the information includes address information and read/write type information.
  • the address information is used to indicate the address requested to be accessed by the first instruction.
  • for a read instruction, the address information indicates the address where the data to be read is located (i.e., the data source); for a write instruction, the address information indicates the address to which the data is to be written (i.e., the data destination).
  • the read/write type information is used to indicate whether the first instruction is a read instruction or a write instruction.
  • the information of each second instruction may be cached in the above-mentioned sent instruction queue 1012 .
  • the second instruction refers to an instruction that has been successfully sent but whose processing has not yet completed. After a notification message indicating that a second instruction has been processed is received, that instruction may be determined to have been processed, and its information is cleared from the sent instruction queue 1012.
  • the return condition of the notification message can be set based on actual conditions. For example, for a read command, the notification message may be returned by the data requester when the data requested by the read command is returned to the data requester. For another example, for a write command, when the data carried in the write command is written to the specified data receiver, the data receiver may return the notification message. Of course, the actual situation is not limited to the methods listed above.
  • the information of the second command may also include address information and read/write type information.
  • for the meaning of the address information and read/write type information of the second instruction, refer to the description of the address information and read/write type information of the first instruction, which is not repeated here.
  • the address information and read/write type information in the sent instruction queue 1012 may be cached correspondingly, that is, the address information and read/write type information of the same second instruction are cached as one piece of information.
  • the detecting unit 1013 may respectively compare the information of the first instruction with each piece of information cached in the sent instruction queue 1012, so as to determine whether there is a data hazard between the first instruction and the sent second instruction. If there is a data hazard between the first instruction and any second instruction, the first instruction is not sent. The first instruction is sent only when there is no data hazard between the first instruction and any second instruction. In the case of a data hazard, the detection unit 1013 may detect whether there is a data hazard again after a certain time interval until there is no data hazard and take the first instruction out of the instruction queue 1011 for transmission. Wherein, the time interval may be one clock cycle, or other durations.
  • the detection unit 1013 may determine whether there is a data hazard based on at least one of the read/write type information and the address information. Specifically, it may first be determined whether the sent instruction queue 1012 is full (step 201). When the sent instruction queue is not full, whether the first instruction satisfies the instruction sending condition may be detected based on the read/write type information of the first instruction, the read/write type information of the second instruction, the address information of the first instruction, and the address information of the second instruction.
  • when the address information of each second instruction in the sent instruction queue 1012 is different from the address information of the first instruction, it may be determined that the first instruction satisfies the instruction sending condition (step 202), and the first instruction is sent (step 207).
  • the information included in the sent instruction queue 1012 is as shown in Table 1:
  • instruction 1, instruction 2 and instruction 3 are the second instructions
  • the information {A1, read} included in the sent instruction queue 1012 is the information of instruction 1
  • {A2, write} is the information of instruction 2
  • {A3, …} is the information of instruction 3
  • suppose the address information in the first instruction is A3, which is different from A1 and from A2. Since the address information in the first instruction is different from the address information of the sent second instructions, and data reads and writes at different addresses are independent of each other, there is no data hazard between the first instruction and any second instruction regardless of the read/write type of the first instruction and of each second instruction. That is, the first instruction satisfies the instruction sending condition and can be sent.
  • still assuming that the information included in the sent instruction queue 1012 is as shown in Table 1, suppose the address information in the first instruction is A1. Since the first instruction and instruction 1 target the same address, sending the first instruction may cause a data read/write error, that is, a data hazard, if its read/write type information is different from that of instruction 1. Therefore, in this case, the read/write type information and the address information must be considered together to determine whether the first instruction satisfies the instruction sending condition.
  • when the sent instruction queue 1012 contains target address information that is the same as the address information of the first instruction and target read/write type information that is different from the read/write type information of the first instruction, and the target read/write type information and the target address information belong to the same second instruction, it can be determined that the first instruction does not satisfy the instruction sending condition (step 203), so the first instruction is not sent (step 206).
  • otherwise, it may be determined that the first instruction satisfies the instruction sending condition (step 202), and the first instruction is sent (step 207).
  • for example, if the read/write type information of the first instruction is write, then since the address information of the first instruction is the same as the address information of instruction 1 and the read/write type information of the first instruction is different from that of instruction 1, there is a data hazard, and the first instruction does not satisfy the instruction sending condition.
  • if the read/write type information of the first instruction is read, there is no second instruction whose address information is the same as that of the first instruction and whose read/write type information is different from that of the first instruction. Therefore, there is no data hazard, and the first instruction satisfies the instruction sending condition.
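  • the detection unit may also determine whether the first instruction satisfies the instruction sending condition as follows: when the address information of each second instruction in the sent instruction queue is different from the address information of the first instruction, it is determined that the first instruction satisfies the instruction sending condition; when the sent instruction queue contains at least one potential conflicting instruction whose address information is the same as that of the first instruction, but the read/write type of each potential conflicting instruction is the same as that of the first instruction, it is determined that the first instruction satisfies the instruction sending condition; and when at least one of the potential conflicting instructions has a read/write type different from that of the first instruction, it is determined that the first instruction does not satisfy the instruction sending condition.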
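  • the check used while the sent instruction queue is not full can be sketched in Python as follows. This is a software illustration under stated assumptions, not the disclosed hardware: the `Info` record and the `can_send_not_full` function are hypothetical names, and the example queue holds only the two fully specified entries, {A1, read} and {A2, write}.

```python
from typing import NamedTuple, Sequence


class Info(NamedTuple):
    """Information of one instruction (hypothetical record layout)."""
    address: str
    rw: str  # "read" or "write"


def can_send_not_full(first: Info, sent: Sequence[Info]) -> bool:
    """Return True if `first` has no data hazard with any sent instruction.

    A hazard exists only when some sent instruction has the same address
    AND a different read/write type.
    """
    for second in sent:
        if second.address == first.address and second.rw != first.rw:
            return False
    return True


# Example sent instruction queue (assumed contents for illustration).
sent_queue = [Info("A1", "read"), Info("A2", "write")]

different_address = can_send_not_full(Info("A3", "write"), sent_queue)
same_address_other_type = can_send_not_full(Info("A1", "write"), sent_queue)
same_address_same_type = can_send_not_full(Info("A1", "read"), sent_queue)
```

  • a first instruction to a new address passes, a write to A1 is held back because of the outstanding read of A1, and a read of A1 passes because its type matches.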
  • the sent instruction queue 1012 When the sent instruction queue 1012 is full, it may be detected whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction and the read/write type information of the second instruction. Specifically, in the case where the read/write type information of each second instruction in the sent instruction queue 1012 is the same as the read/write type information of the first instruction, it is determined that the first instruction satisfies the instruction sending condition ( Step 204), and send the first instruction (step 207).
  • for example, when every item of read/write type information in the sent instruction queue 1012 is write and the read/write type information of the first instruction is also write, there is no data hazard between the write instructions regardless of whether their address information (address 1, address 2, ..., address n) is the same. Therefore, in this case, the first instruction can be sent directly.
  • similarly, when every item of read/write type information in the sent instruction queue 1012 is read and the read/write type information of the first instruction is also read, the first instruction can likewise be sent directly.
  • in this case, the number of sent instructions exceeds the length of the sent instruction queue 1012; this situation may be called instruction over-issue.
  • through instruction over-issue, the sending efficiency of multiple consecutive instructions of the same read/write type can be improved, further reducing instruction congestion.
  • conversely, when the sent instruction queue 1012 is full, the first instruction is not sent in any of the following three cases, regardless of the address information in the first instruction and the second instructions: the read/write type information in the sent instruction queue 1012 includes both read and write; or every item of read/write type information in the sent instruction queue 1012 is read, but the read/write type information of the first instruction is write; or every item of read/write type information in the sent instruction queue 1012 is write, but the read/write type information of the first instruction is read.
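  • the full-queue rule compares read/write types only and ignores addresses. A minimal Python sketch follows; `can_send_full` is a hypothetical name, and the queue is modelled simply as a list of type strings.

```python
from typing import Sequence


def can_send_full(first_rw: str, sent_rw_types: Sequence[str]) -> bool:
    """Full-queue check: send only if every sent type equals the new type.

    Addresses are deliberately not examined, matching the three rejected
    cases (mixed types, all reads vs. a write, all writes vs. a read).
    """
    return all(rw == first_rw for rw in sent_rw_types)


all_writes_then_write = can_send_full("write", ["write"] * 4)
all_reads_then_write = can_send_full("write", ["read"] * 4)
mixed_then_read = can_send_full("read", ["read", "write", "read", "read"])
```

  • when the check passes on a full queue, the instruction is sent even though its information cannot be cached, which is the over-issue situation described in the text.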
  • in the case of instruction over-issue, the number of instructions that have been sent but not yet processed exceeds the length of the sent instruction queue 1012.
  • for example, the number of instructions that have been sent but not yet processed is 5, while the sent instruction queue 1012 can cache the information of only 4 instructions (that is, the length of the sent instruction queue 1012 is 4).
  • the information of an over-issued instruction cannot be stored in the sent instruction queue 1012 after that instruction is sent, so whether there is a data hazard cannot be reliably determined from the information in the sent instruction queue 1012 alone.
  • for example, suppose the over-issued instruction S0 is a read instruction for address A0, the information {A0, read} of instruction S0 is not cached in the sent instruction queue 1012, and the information in the sent instruction queue 1012 is as shown in Table 2:
  • after the information {A1, read} is cleared from the sent instruction queue 1012, the sent instruction queue 1012 is no longer full. If the instruction queue 1011 then receives an instruction Sk whose information is {A0, write}, the sent instruction queue 1012 contains no address information equal to that of instruction Sk. Judging by the method described above from the information in the sent instruction queue 1012 alone, instruction Sk would therefore be determined to satisfy the instruction sending condition, even though it writes to the same address A0 that the over-issued instruction S0 reads.
  • to handle this, all items of information in the sent instruction queue may be set to invalid.
  • whether the first instruction satisfies the instruction sending condition may then be detected based on statistical information about the sent instructions. In this case, the first instruction is sent only when all the sent second instructions have been processed; as long as there are unprocessed second instructions, the first instruction is not sent.
  • the invalid setting for each item of information in the sent instruction queue can be canceled.
  • the circuit further includes a statistical unit 1014, configured to count the following information: the total number of the second instructions; the number of second instructions that have been processed; and the read/write type information of each second instruction.
  • when all items of information in the sent instruction queue 1012 are invalid and there is a second instruction whose read/write type information is different from that of the first instruction, the detection unit 1013 may detect whether the first instruction satisfies the instruction sending condition based on the information counted by the statistical unit 1014.
  • the detection unit 1013 may first determine whether the read/write type information of the first instruction is the same as that of the sent second instructions. If the read/write type information is different, the total number of sent second instructions and the number of second instructions that have been processed are obtained from the statistical unit 1014. Only when the total number of sent second instructions is equal to the number of processed instructions is it determined that the first instruction satisfies the instruction sending condition; otherwise, it is determined that the first instruction does not satisfy the instruction sending condition.
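  • the statistics-based check can be sketched in Python as below. The `Statistics` class and its method names are hypothetical. The sketch also assumes that a first instruction whose type matches every recorded second instruction may be sent without consulting the counters, which the text implies but does not state outright; the type list is never pruned, so the sketch errs on the conservative side.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statistics:
    """Counters kept by a hypothetical software model of the statistical unit."""
    total_sent: int = 0
    processed: int = 0
    rw_types: List[str] = field(default_factory=list)

    def record_sent(self, rw: str) -> None:
        self.total_sent += 1
        self.rw_types.append(rw)

    def record_processed(self) -> None:
        self.processed += 1


def can_send_with_statistics(first_rw: str, stats: Statistics) -> bool:
    """Check used while the sent-queue entries are marked invalid."""
    if all(rw == first_rw for rw in stats.rw_types):
        return True  # assumption: same-type instructions may keep flowing
    # Types differ: wait until every sent instruction has been processed.
    return stats.total_sent == stats.processed


stats = Statistics()
for _ in range(5):
    stats.record_sent("read")       # five reads sent (one over-issued)
for _ in range(3):
    stats.record_processed()        # three of them already processed

write_while_pending = can_send_with_statistics("write", stats)
read_while_pending = can_send_with_statistics("read", stats)

stats.record_processed()
stats.record_processed()            # now all five are processed
write_after_drain = can_send_with_statistics("write", stats)
```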
  • processing units 101 are used to process instructions sent by different thread groups. Wherein, each processing unit 101 may be used to process instructions sent by one or more thread groups, and the thread groups responsible for different processing units may be the same or different. For example, processing unit 0 is used to process instructions sent by thread group 0, processing unit 1 is used to process instructions sent by thread group 1 and thread group 2, and processing unit 2 is used to process instructions sent by thread group 3 and thread group 4.
  • the circuit further includes a first instruction distribution unit, configured to receive the first instructions sent by each thread group, and the first instructions sent by each thread group carry Corresponding to the identification information of the thread group; distributing the first instruction sent by each thread group to the corresponding processing unit 101 based on the identification information carried in the first instruction sent by each thread group respectively. Identification information such as thread group number.
  • instructions sent by different processing units 101 have different priorities; the circuit further includes: an instruction arbitration unit 103, configured to receive the first instruction sent by each processing unit 101, and based on each processing unit 101 The priority of the first instruction to be sent is to send the first instructions sent by each processing unit 101 in sequence.
  • the first instruction includes a storage address of the bypass information, and the bypass information corresponding to the first instruction is stored under the storage address of the bypass information;
  • the circuit further includes: a second instruction distribution unit, configured to decouple the original instruction carrying the bypass information to obtain the decoupled original instruction and the bypass information, store the bypass information under the storage address of the bypass information, generate the first instruction based on the decoupled original instruction and the storage address of the bypass information, and send the first instruction to the instruction queue; and a bus control unit, configured to generate a target instruction based on the first instruction and send the target instruction.
  • the first instruction distribution unit for distributing instructions to each processing unit 101 and the second instruction distribution unit for decoupling original instructions may be the same instruction distribution unit 102; however, in practical applications, instruction distribution and instruction decoupling may also be performed separately by different instruction distribution units.
  • original instructions may be issued by thread groups.
  • the original instruction may carry some bypass information, which has nothing to do with the process of judging whether the first instruction satisfies the instruction sending condition.
  • the bypass information may include, but is not limited to, data to be written, identification information for identifying valid bits of the data to be written, and the like.
  • the bypass information may include, but is not limited to, a target address for data reading, an address of a register storing the target address, and the like. If the bypass information were always carried in the instruction, each processing unit would need additional storage space to store the bypass information, which would increase the area and power consumption of the data processing circuit.
  • the bypass information is decoupled from the instruction and stored separately, and data hazard detection is performed based on the part of the instruction other than the bypass information, thereby reducing the area and power consumption of the circuit while also reducing crossbar (crossbar matrix) complexity. Only when it is determined that the first instruction satisfies the instruction sending condition is the bypass information combined with the first instruction again to obtain the target instruction, which is then sent.
  • the instruction distribution unit 102 can extract the bypass information from the original instruction, and send the bypass information to the bypass information storage unit.
  • the bypass information corresponding to the read command and the bypass information corresponding to the write command may be stored separately.
  • the bypass information corresponding to the write instruction may be stored in the first storage unit 105.
  • the bypass information corresponding to the read instruction may be stored in the second storage unit 106.
  • the bypass information storage unit may return the storage address of the bypass information to the instruction distribution unit 102.
  • for a write instruction, the storage address of the bypass information is the storage address, in the first storage unit 105, of the bypass information corresponding to the write instruction; for a read instruction, the storage address of the bypass information is the storage address, in the second storage unit 106, of the bypass information corresponding to the read instruction.
  • the instruction distribution unit 102 may combine the instruction information with the storage address of the bypass information to generate a first instruction, and send the first instruction to the bus control unit 104 through the instruction arbitration unit 103.
  • the bus control unit 104 can generate the final target instruction and send it. For write instructions and read instructions, the bus control unit 104 may generate the target instruction in different ways. Specifically, for a write instruction, bypass information such as the data to be written needs to be sent to the target address together with the instruction so that the data can be written to the target address. The bus control unit 104 can therefore extract, from the write instruction, the storage address of the first bypass information corresponding to the write instruction, obtain the first bypass information from that storage address, generate the target instruction based on the first bypass information and the instruction information of the write instruction, and send the target instruction, as shown in FIG. 5A.
  • the data storage unit that stores the data to be read does not need to know where the read data will ultimately be delivered. Therefore, the bus control unit 104 can directly send the storage address of the bypass information together with the instruction information of the read instruction as the target instruction, as shown in FIG. 5B.
  • the bus control unit 104 may also receive target data read by the read command, where the target data may carry a storage address of the second bypass information corresponding to the read command.
  • the bus control unit 104 may write the target data into the storage address of the second bypass information corresponding to the read command, so that the data requester (for example, a register) can read the target data from the storage address of the second bypass information.
  • the bus control unit 104 is further configured to clear the bypass information corresponding to the first instruction from the storage address of the bypass information when the processing of the first instruction is completed, so that the storage unit for storing bypass information can free up storage space to store the bypass information of other instructions. When the processing of a read instruction is completed, the bypass information corresponding to the read instruction can be cleared from the storage address of the bypass information corresponding to the read instruction; likewise, when the processing of a write instruction is completed, the bypass information corresponding to the write instruction can be cleared from the storage address of the bypass information corresponding to the write instruction.
  • the sent instruction queue includes the information of instruction 1, instruction 2, and instruction 3, and the sent instruction queue is not full; the instruction queue includes instruction 4, instruction 5, and instruction 6. The information of the instruction at the front of the instruction queue (that is, instruction 4) can then be extracted, and whether instruction 4 satisfies the instruction sending condition can be checked based on the information of instruction 4 and the information of instruction 1, instruction 2, and instruction 3. If the condition is satisfied, instruction 4 is sent. If instruction 4 is sent successfully, the information of instruction 4 is stored in the sent instruction queue in clock cycle T2.
  • the information of the processed instruction (assumed to be the information of instruction 1) can also be cleared from the sent instruction queue.
  • the clock period T2 may be before the clock period T3 or after the clock period T3, which is not limited in the present disclosure.
  • the processing completion time of an instruction sent earlier may be earlier or later than the processing completion time of an instruction sent later; that is, the order in which the information of each instruction is stored in the sent instruction queue is not necessarily the same as the order in which the information of each instruction is cleared from the sent instruction queue.
  • the bus control unit 104 is further configured to clear the information of the processed second instruction from the sent instruction queue.
  • the sent instruction queue 1012 may send the cached information of the second instruction and/or the cache address of that information to the bus control unit 104.
  • the bus control unit 104 may send an enable signal to the sent instruction queue 1012, and the enable signal may carry information such as the cache address, in the sent instruction queue 1012, of the second instruction whose processing is completed, so that the sent instruction queue 1012 can clear the information at the corresponding cache address in response to the enable signal.
  • the instruction distribution unit 102 may distribute instructions to the instruction queues 1011 in each processing unit 101 according to the thread group number (S1).
  • the instruction distribution unit 102 may also decouple the instruction information and the bypass information in the instruction, and store the decoupled bypass information into the storage address of the bypass information (S2).
  • the storage address and instruction information are combined to generate a first instruction.
  • the detection unit 1013 can judge whether there is a data hazard (that is, whether the first instruction satisfies the instruction sending condition) based on the information of the first instruction and the information of the second instructions stored in the sent instruction queue, or based on the information counted by the statistical unit 1014 (S3). If there is a data hazard, the first instruction remains cached in the instruction queue 1011, and whether there is a data hazard is re-judged periodically (S4). If there is no data hazard, the first instruction is taken from the instruction queue 1011 and sent to the instruction arbitration unit 103 (S5). The processing flow of each processing unit 101 is the same and is not described one by one here.
  • the instruction arbitration unit 103 may sequentially send each first instruction to the bus control unit 104 according to the priorities of the first instructions sent by each processing unit 101 (S6).
  • the bus control unit 104 may generate a target command based on the first command received from the command arbitration unit 103, and send the target command to a corresponding target address (S7).
  • the bus control unit 104 may also, when receiving processing completion information for a certain second instruction, clear the information corresponding to that second instruction from the sent instruction queue 1012 (S8) and clear the bypass information (S9).
  • the detection unit 1013 may also write the information of the first instruction into the sent instruction queue 1012 when the first instruction is successfully sent through the bus control unit 104 (S10).
  • the execution order of the above steps is not limited to the order shown in the figure, for example, the order of steps S8 and S9 can be interchanged, the order of step S10 and step S8 or S9 can be interchanged, etc.
  • multiple processing units process data hazards in parallel. When some processing units have data hazards, other processing units without data hazards can still send instructions.
  • the present disclosure allows consecutive over-issuance of instructions of the same read/write type. Instruction over-issuance can effectively improve system memory access performance.
  • the disclosed method enables multiple processing units to handle data hazards efficiently and improves the memory access performance of the system; the scalability and variants of the disclosed method can further be used to reduce power consumption and crossbar complexity.
  • the present disclosure also provides an artificial intelligence chip, including: a data processing circuit 801; and a control unit 802 configured to send instructions to the data processing circuit 801.
  • the data processing circuit 801 may adopt the data processing circuit described in any embodiment of the present disclosure.
  • for details of the data processing circuit 801 in this embodiment, refer to the foregoing embodiments; they are not repeated here.
  • an embodiment of the present disclosure also provides a data processing method, which is applied to the detection unit included in each processing unit in the data processing circuit described in any embodiment of the present disclosure, the method comprising:
  • Step 901: Based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit, detect whether the first instruction satisfies an instruction sending condition;
  • Step 902 If the first instruction satisfies the instruction sending condition, take the first instruction from the instruction queue and send it;
  • the information includes at least one of read-write type information and address information.
  • the detecting, based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit, whether the first instruction satisfies the instruction sending condition includes: when the sent instruction queue is not full, detecting whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction, the read/write type information of the second instruction, the address information of the first instruction, and the address information of the second instruction; and when the sent instruction queue is full, detecting whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction and the read/write type information of the second instruction.
  • when the sent instruction queue is not full, the detecting, based on the read/write type information of the first instruction, the read/write type information of the second instruction, the address information of the first instruction, and the address information of the second instruction, whether the first instruction satisfies the instruction sending condition includes: determining that the first instruction satisfies the instruction sending condition when the address information of each second instruction in the sent instruction queue is different from the address information of the first instruction; determining that the first instruction satisfies the instruction sending condition when the sent instruction queue contains at least one potentially conflicting instruction whose address information is the same as the address information of the first instruction, but the read/write type of each potentially conflicting instruction is the same as the read/write type of the first instruction; and determining that the first instruction does not satisfy the instruction sending condition when the sent instruction queue contains at least one potentially conflicting instruction and the at least one potentially conflicting instruction includes at least one conflicting instruction whose read/write type is different from that of the first instruction.
  • when the sent instruction queue is full, the detecting, based on the read/write type information of the first instruction and the read/write type information of the second instruction, whether the first instruction satisfies the instruction sending condition includes: determining that the first instruction does not satisfy the instruction sending condition when the sent instruction queue contains at least one second instruction whose read/write type information is different from the read/write type information of the first instruction; and determining that the first instruction satisfies the instruction sending condition when the read/write type information of each second instruction in the sent instruction queue is the same as the read/write type information of the first instruction.
  • processing units are used to process instructions sent by different thread groups.
  • the circuit further includes a statistical unit for counting the following information: the total number of the second instructions; the number of second instructions that have been processed; and the read/write type information of each second instruction. The detecting, based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit, whether the first instruction satisfies the instruction sending condition includes: when all items of information in the sent instruction queue are invalid and there is a second instruction whose read/write type information is different from that of the first instruction, detecting, based on the information counted by the statistical unit, whether the first instruction satisfies the instruction sending condition.
  • the detecting, based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit, whether the first instruction satisfies the instruction sending condition includes: when all items of information in the sent instruction queue are invalid and there is a second instruction whose read/write type information is different from that of the first instruction, determining that the first instruction satisfies the instruction sending condition if every second instruction has been processed.
  • the information of the first instruction is written into the sent instruction queue when the first instruction is sent successfully.
  • an embodiment of the present disclosure also provides a data processing device, which is applied to the detection unit included in each processing unit in the data processing circuit described in any embodiment of the present disclosure, and the device includes:
  • a detection module 1001, configured to detect, based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit, whether the first instruction satisfies an instruction sending condition;
  • the information includes at least one of read-write type information and address information.
  • the functions or modules included in the device provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for their specific implementation, refer to the description of the method embodiments above. For brevity, details are not repeated here.
  • a typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or any combination of these devices.
  • each embodiment in this specification is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiment.
  • the device embodiments described above are only illustrative, and the modules described as separate components may or may not be physically separated, and the functions of each module may be integrated in the same or multiple software and/or hardware implementations. Part or all of the modules can also be selected according to actual needs to achieve the purpose of the solution of this embodiment. It can be understood and implemented by those skilled in the art without creative effort.

Abstract

Provided in the embodiments of the present disclosure are a data processing circuit, an artificial intelligence chip, and a data processing method and apparatus. Instructions are processed by means of a plurality of processing units, and each processing unit independently determines, on the basis of information of a first instruction received by the processing unit and information of a second instruction sent by the processing unit, whether there are data hazards between the instructions processed by the processing units. When there are data hazards between the instructions processed by some of the processing units, the instructions can still be sent by means of other processing units, such that instruction congestion is reduced.

Description

Data processing circuit and artificial intelligence chip, data processing method and apparatus
Cross-Reference to Related Disclosure
The present disclosure claims priority to Chinese patent application No. 202111435830.3, filed on November 29, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of integrated circuit design, and in particular to a data processing circuit, an artificial intelligence chip, and a data processing method and apparatus.
Background
With the continuous development of artificial intelligence and high-performance computing, the amount of data that the corresponding processing systems need to process keeps growing. During processing, a large amount of data needs to be moved between internal storage space and external storage space. Data hazards may occur during this movement; that is, at least two instructions respectively read data from and write data to the same storage address, which may lead to data read/write errors. Therefore, read instructions and write instructions need to be controlled to prevent such errors. The related art generally only determines whether a data hazard exists between two adjacent instructions. Once a data hazard occurs, subsequent instructions cannot be issued, which easily leads to instruction congestion.
Summary
The present disclosure provides a data processing circuit, an artificial intelligence chip, and a data processing method and apparatus.
According to a first aspect of the embodiments of the present disclosure, a data processing circuit is provided. The circuit includes a plurality of processing units, and each of the plurality of processing units includes: an instruction queue, configured to cache a received first instruction; a sent instruction queue, configured to cache the information of each of at least one second instruction that has been sent; and a detection unit, configured to detect, based on the information of the first instruction and the information of each second instruction, whether the first instruction satisfies an instruction sending condition, and, if the instruction sending condition is satisfied, to take the first instruction out of the instruction queue and send it. The information includes at least one of read/write type information and address information.
Optionally, the detection unit is specifically configured to: when the sent instruction queue is not full, detect whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction, the read/write type information of the second instruction, the address information of the first instruction, and the address information of the second instruction; and when the sent instruction queue is full, detect whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction and the read/write type information of the second instruction.
Optionally, when the sent instruction queue is not full, the detection unit is specifically configured to: determine that the first instruction satisfies the instruction sending condition when the address information of every second instruction in the sent instruction queue is different from the address information of the first instruction; determine that the first instruction satisfies the instruction sending condition when the sent instruction queue contains at least one potentially conflicting instruction whose address information is the same as that of the first instruction, but the read/write type of each potentially conflicting instruction is the same as that of the first instruction; and determine that the first instruction does not satisfy the instruction sending condition when the sent instruction queue contains at least one potentially conflicting instruction and the at least one potentially conflicting instruction includes at least one conflicting instruction whose read/write type is different from that of the first instruction.
Optionally, when the sent instruction queue is full, the detection unit is specifically configured to: determine that the first instruction does not satisfy the instruction sending condition when the sent instruction queue contains at least one second instruction whose read/write type information is different from that of the first instruction; and determine that the first instruction satisfies the instruction sending condition when the read/write type information of every second instruction in the sent instruction queue is the same as that of the first instruction.
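As an illustration of the two detection cases above, the following Python sketch models the check performed by the detection unit. It is only a software analogy under stated assumptions: the names `InstrInfo`, `satisfies_send_condition` and `capacity` are hypothetical and do not appear in the disclosure, and a plain list stands in for the sent instruction queue.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InstrInfo:
    """Hypothetical record of the information compared by the detection unit."""
    address: int    # address information
    is_write: bool  # read/write type information (True = write, False = read)


def satisfies_send_condition(first: InstrInfo,
                             sent: List[InstrInfo],
                             capacity: int) -> bool:
    """Return True if `first` may be sent given the sent-instruction entries."""
    if len(sent) < capacity:
        # Queue not full: only entries with the same address are potential
        # conflicts, and they conflict only if their read/write type differs.
        potential = [s for s in sent if s.address == first.address]
        return all(s.is_write == first.is_write for s in potential)
    # Queue full: compare read/write types only, ignoring addresses.
    return all(s.is_write == first.is_write for s in sent)


# Queue not full, no entry with the same address: the instruction may be sent.
print(satisfies_send_condition(InstrInfo(0xA0, True), [InstrInfo(0xA1, False)], 4))
```

In this sketch a write to an address with an outstanding read is held back, while two reads (or two writes) to the same address are allowed to proceed.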
Optionally, different processing units are configured to process instructions sent by different thread groups.
Optionally, the circuit further includes a first instruction distribution unit, configured to: receive the first instructions sent by each thread group, where the first instruction sent by each thread group carries the identification information of the corresponding thread group; and distribute the first instruction sent by each thread group to the corresponding processing unit based on the identification information carried in that first instruction.
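The distribution by thread group identification can be sketched in Python as follows. The table `GROUP_TO_UNIT` and the function `distribute` are hypothetical names; the mapping itself reuses the example assignment given in this document (thread group 0 to processing unit 0, thread groups 1 and 2 to processing unit 1, thread groups 3 and 4 to processing unit 2).

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

# Hypothetical mapping: thread group number -> processing unit index.
GROUP_TO_UNIT: Dict[int, int] = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}

Instruction = Tuple[int, str]  # (thread group number, instruction payload)


def distribute(instructions: Iterable[Instruction],
               num_units: int = 3) -> List[Deque[Instruction]]:
    """Append each instruction to the instruction queue of its processing unit."""
    queues: List[Deque[Instruction]] = [deque() for _ in range(num_units)]
    for group_id, payload in instructions:
        # The identification information carried by the instruction selects the unit.
        queues[GROUP_TO_UNIT[group_id]].append((group_id, payload))
    return queues


queues = distribute([(0, "a"), (2, "b"), (3, "c"), (1, "d")])
print([list(q) for q in queues])
```

Each queue preserves arrival order, so instructions from thread groups 1 and 2 share one queue in the order they were received.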
Optionally, instructions sent by different processing units have different priorities; the circuit further includes: an instruction arbitration unit, configured to receive the first instructions sent by each processing unit and, based on the priority of the first instruction sent by each processing unit, send the received first instructions in sequence.
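A minimal Python sketch of priority-based arbitration follows. The disclosure does not specify how priorities are encoded, so the convention that a smaller number means a higher priority, as well as the names `arbitrate`, `pending` and `priority`, are assumptions made for illustration.

```python
from typing import Dict, List


def arbitrate(pending: Dict[int, str], priority: Dict[int, int]) -> List[str]:
    """Order the first instructions offered by the processing units.

    pending:  processing unit index -> instruction offered by that unit
    priority: processing unit index -> priority value
              (assumption: a smaller value means a higher priority)
    """
    order = sorted(pending, key=lambda unit: priority[unit])
    return [pending[unit] for unit in order]


# Unit 1 has the highest priority here, then unit 2, then unit 0.
print(arbitrate({0: "i0", 1: "i1", 2: "i2"}, {0: 2, 1: 0, 2: 1}))
```

Units that offer no instruction in a round are simply absent from `pending`, so they do not block the others.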
Optionally, the first instruction includes a storage address of bypass information, and the bypass information corresponding to the first instruction is stored under that storage address; the circuit further includes: a second instruction distribution unit, configured to decouple an original instruction carrying the bypass information to obtain the decoupled original instruction and the bypass information, store the bypass information under the storage address of the bypass information, generate the first instruction based on the decoupled original instruction and the storage address of the bypass information, and issue the first instruction to the instruction queue in the processing unit; and a bus control unit, configured to, when the first instruction satisfies the instruction sending condition, generate a target instruction based on the first instruction and the storage address of the bypass information, and send the target instruction.
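The decoupling step can be sketched in Python as below. The classes `OriginalInstr`, `FirstInstr` and `BypassStore` and the function `decouple` are hypothetical; in particular, modelling the storage address as a slot index in a fixed-size list is an assumption, not something the disclosure specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OriginalInstr:
    address: int
    is_write: bool
    bypass: bytes  # e.g. data to be written, or destination register info


@dataclass(frozen=True)
class FirstInstr:
    address: int
    is_write: bool
    bypass_slot: int  # storage address of the bypass information


class BypassStore:
    """Hypothetical fixed-size storage unit for bypass information."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[bytes]] = [None] * size

    def store(self, bypass: bytes) -> int:
        slot = self.slots.index(None)  # raises ValueError if no slot is free
        self.slots[slot] = bypass
        return slot


def decouple(orig: OriginalInstr, write_store: BypassStore,
             read_store: BypassStore) -> FirstInstr:
    """Store the bypass part separately and keep only its storage address."""
    store = write_store if orig.is_write else read_store
    slot = store.store(orig.bypass)
    return FirstInstr(orig.address, orig.is_write, slot)


ws, rs = BypassStore(2), BypassStore(2)
print(decouple(OriginalInstr(0x10, True, b"\x01\x02"), ws, rs))
```

The resulting `FirstInstr` carries only the fields needed for hazard detection plus a small slot index, which is the point of keeping the bypass payload out of the per-unit queues.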
Optionally, the bus control unit is further configured to: clear the information of a second instruction that has been processed from the sent instruction queue.
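Because instructions may complete in a different order from the one in which they were sent, clearing is naturally addressed by the entry's cache address rather than by queue position. The Python sketch below shows one way to model this; the class `SentInstructionQueue` and its slot-index addressing are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

Info = Tuple[str, str]  # (address information, read/write type information)


class SentInstructionQueue:
    """Hypothetical model: fixed slots, each either empty (None) or holding info."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[Info]] = [None] * capacity

    def is_full(self) -> bool:
        return all(slot is not None for slot in self.slots)

    def push(self, info: Info) -> int:
        """Store info in the first free slot and return its cache address."""
        slot = self.slots.index(None)  # raises ValueError if the queue is full
        self.slots[slot] = info
        return slot

    def clear(self, slot: int) -> None:
        """Clear the entry at the given cache address (completion may be out of order)."""
        self.slots[slot] = None

    def entries(self) -> List[Info]:
        return [slot for slot in self.slots if slot is not None]


q = SentInstructionQueue(3)
a = q.push(("A0", "read"))
b = q.push(("A1", "write"))
q.clear(a)  # the first-sent entry happens to complete first here
print(q.entries())
```

A freed slot is reused by the next `push`, which mirrors storage space being released for the information of later instructions.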
可选地,所述第一指令包括写指令,所述旁路信息包括所述写指令对应的第一旁路信息;所述电路还包括:第一存储单元,用于存储所述第一旁路信息。Optionally, the first instruction includes a write instruction, and the bypass information includes first bypass information corresponding to the write instruction; the circuit further includes: a first storage unit, configured to store the first bypass information road information.
可选地,所述总线控制单元用于:从所述写指令中提取所述第一旁路信息的存储地址,从所述第一旁路信息的存储地址中获取所述第一旁路信息,基于所述第一旁路信息与所述写指令生成所述目标指令,并对所述目标指令进行发送。Optionally, the bus control unit is configured to: extract the storage address of the first bypass information from the write instruction, and acquire the first bypass information from the storage address of the first bypass information , generating the target instruction based on the first bypass information and the write instruction, and sending the target instruction.
可选地,所述第一指令包括读指令,所述旁路信息包括所述读指令对应的第二旁路信息;所述电路还包括:第二存储单元,用于存储所述第二旁路信息。Optionally, the first instruction includes a read instruction, and the bypass information includes second bypass information corresponding to the read instruction; the circuit further includes: a second storage unit, configured to store the second bypass information road information.
可选地,所述总线控制单元还用于:将所述读指令作为所述目标指令。Optionally, the bus control unit is further configured to: use the read instruction as the target instruction.
可选地,所述总线控制单元还用于:接收所述读指令读取到的目标数据,所述目标数据中携带所述第二旁路信息的存储地址;将所述目标数据写入所述第二存储单元中所述第二旁路信息的存储地址中。Optionally, the bus control unit is further configured to: receive the target data read by the read instruction, the target data carrying the storage address of the second bypass information; write the target data into the In the storage address of the second bypass information in the second storage unit.
可选地,所述总线控制单元还用于:在所述第一指令处理完成的情况下,从所述旁路信息的存储地址中清除所述第一指令对应的旁路信息。Optionally, the bus control unit is further configured to: clear the bypass information corresponding to the first instruction from the storage address of the bypass information when the processing of the first instruction is completed.
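The decoupling and merging of bypass information for a write instruction, as recited above, can be pictured with a minimal sketch. It treats the bypass information as an opaque value; the dictionary fields (`kind`, `addr`, `bypass`, `bypass_addr`), the function names, and the use of a Python dict as the first storage unit are all assumptions made for illustration and do not come from the disclosure.

```python
# Stand-in for the first storage unit: storage address -> bypass information.
bypass_store = {}

def decouple(original, store_addr):
    """Split the bypass information off an original instruction (hypothetical layout).

    The bypass information is parked at store_addr, and the returned first
    instruction carries only that storage address.
    """
    bypass_store[store_addr] = original["bypass"]
    return {"kind": original["kind"], "addr": original["addr"],
            "bypass_addr": store_addr}

def build_target(first):
    """Merge the stored bypass information back in when the instruction may be sent."""
    return {"kind": first["kind"], "addr": first["addr"],
            "bypass": bypass_store[first["bypass_addr"]]}

def on_completed(first):
    """Clear the bypass information once the first instruction has been processed."""
    bypass_store.pop(first["bypass_addr"], None)

original = {"kind": "write", "addr": 0x100, "bypass": "opaque-bypass-info"}
first = decouple(original, store_addr=7)
target = build_target(first)
on_completed(first)
```

In this sketch the instruction that waits in the instruction queue stays small, and the bypass information is touched only twice: once when it is stored and once when the target instruction is built.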
Optionally, when the total number of second instructions exceeds the length of the sent instruction queue, every item of information in the sent instruction queue is set to invalid.
Optionally, the circuit further includes a statistics unit configured to record the following information: the total number of second instructions; the number of second instructions whose processing has been completed; and the read/write type information of each second instruction. The detection unit is configured to: when every item of information in the sent instruction queue is invalid and there is a second instruction whose read/write type information differs from that of the first instruction, detect whether the first instruction satisfies the instruction sending condition based on the information recorded by the statistics unit.
Optionally, when every item of information in the sent instruction queue is invalid and there is a second instruction whose read/write type information differs from that of the first instruction, the first instruction is determined to satisfy the instruction sending condition if all second instructions have been processed.
Optionally, the information of the first instruction is written into the sent instruction queue when the first instruction is sent successfully.
According to a second aspect of the embodiments of the present disclosure, an artificial intelligence chip is provided, including: the data processing circuit according to any embodiment of the present disclosure; and a control unit configured to send instructions to the data processing circuit.
According to a third aspect of the embodiments of the present disclosure, a data processing method is provided, applied to the detection unit included in each processing unit of the data processing circuit according to any embodiment of the present disclosure. The method includes: detecting whether the first instruction satisfies an instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit; and, when the first instruction satisfies the instruction sending condition, taking the first instruction out of the instruction queue and sending it. The information includes at least one of read/write type information and address information.
According to a fourth aspect of the embodiments of the present disclosure, a data processing apparatus is provided, applied to the detection unit included in each processing unit of the data processing circuit according to any embodiment of the present disclosure. The apparatus includes: a detection module configured to detect whether the first instruction satisfies an instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit; and a sending module configured to, when the first instruction satisfies the instruction sending condition, take the first instruction out of the instruction queue and send it. The information includes at least one of read/write type information and address information.
In the embodiments of the present disclosure, instructions are processed by multiple processing units. Each processing unit independently determines, based on the information of the first instruction it has received and the information of the second instructions it has sent, whether a data hazard exists among the instructions it handles. When a data hazard exists among the instructions handled by some of the processing units, instructions can still be sent through the other processing units, which reduces instruction congestion.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure.
FIG. 1 is a schematic diagram of a data processing circuit according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of the detection principle of the detection unit according to an embodiment of the present disclosure.
FIG. 3 is a schematic diagram of a data processing circuit according to another embodiment of the present disclosure.
FIG. 4 is a schematic diagram of instruction over-issue according to an embodiment of the present disclosure.
FIG. 5A and FIG. 5B are schematic diagrams of the decoupling and the merging of instructions, respectively, according to an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of an instruction sending process according to an embodiment of the present disclosure.
FIG. 7 is an overall flowchart according to an embodiment of the present disclosure.
FIG. 8 is a block diagram of an artificial intelligence chip according to an embodiment of the present disclosure.
FIG. 9 is a flowchart of a data processing method according to an embodiment of the present disclosure.
FIG. 10 is a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a", "an", and "the" used in the present disclosure and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of multiple items, or any combination of at least two of multiple items.
It should be understood that although the terms first, second, third, and so on may be used in the present disclosure to describe various kinds of information, the information should not be limited by these terms. These terms are used only to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
To help those skilled in the art better understand the technical solutions in the embodiments of the present disclosure, and to make the above objects, features, and advantages of the embodiments clearer, the technical solutions in the embodiments are described in further detail below with reference to the accompanying drawings.
When data is moved between an internal storage space and an external storage space, a data hazard may occur: at least two instructions respectively read from and write to the same storage address, which may cause data read/write errors. Read instructions and write instructions therefore need to be controlled to prevent such errors. For example, suppose address 1 in the external storage space stores data d1, and the control unit responsible for data movement generates two instructions: a read instruction that reads the data at address 1 and writes it into a register, and a write instruction that writes data d2 to address 1. The external storage space may include, but is not limited to, double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR) or high bandwidth memory (HBM). The control unit may include, but is not limited to, a central processing unit (CPU) or a graphics processing unit (GPU). Executing these two instructions in different orders produces different results. If the read instruction is executed before the write instruction, the data written to the register is the original data d1 at address 1. If the write instruction is executed before the read instruction, the data written to the register is the data d2 that the write instruction stored at address 1. A situation in which executing instructions of different read/write types on the same address may lead to data read/write errors is called a data hazard. To reduce data hazards, the sending of read instructions and write instructions needs to be controlled. However, the related art generally only checks whether a data hazard exists between two adjacent instructions. Once a data hazard occurs, subsequent instructions cannot be issued, which easily leads to instruction congestion.
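The order dependence described above can be reproduced with a few lines of code. The following is a minimal sketch that models the external storage space as a Python dictionary and the register as a local variable; the function name `run` and this modelling are illustrative assumptions, not part of the disclosure.

```python
def run(order):
    """Execute one read and one write against address 1 in the given order."""
    memory = {1: "d1"}   # external storage: address 1 initially holds d1
    register = None      # destination of the read instruction
    for op in order:
        if op == "read":
            register = memory[1]   # read address 1 into the register
        elif op == "write":
            memory[1] = "d2"       # write d2 to address 1
    return register

read_first = run(["read", "write"])    # the register ends up holding d1
write_first = run(["write", "read"])   # the register ends up holding d2
```

The same two instructions leave different values in the register depending only on their order, which is why the sending order of read and write instructions to the same address has to be controlled.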
On this basis, an embodiment of the present disclosure provides a data processing circuit 100. Referring to FIG. 1 and FIG. 3, the circuit 100 includes multiple processing units 101, and each processing unit 101 includes:
an instruction queue 1011, configured to buffer a received first instruction;
a sent instruction queue 1012, configured to buffer the information of each second instruction among at least one second instruction that has been sent;
a detection unit 1013, configured to detect, based on the information of the first instruction and the information of each second instruction, whether the first instruction satisfies an instruction sending condition, and, when the instruction sending condition is satisfied, take the first instruction out of the instruction queue and send it;
where the information includes at least one of read/write type information and address information.
In the related art, when a data hazard exists, instruction sending is generally stopped and can resume only after the processing of the already-sent instructions is completed. The embodiments of the present disclosure use multiple mutually independent processing units 101, so even if a data hazard exists among the instructions handled by one or some of the processing units, instructions can still be sent through the other processing units. Instruction sending needs to stop only when a data hazard exists among the instructions handled by every processing unit. The embodiments of the present disclosure can therefore effectively improve instruction sending efficiency and reduce instruction congestion.
Each processing unit independently determines, based on the information of the first instruction it has received and the information of the second instructions it has sent, whether a data hazard exists among the instructions it handles. When a data hazard exists among the instructions handled by some of the processing units, instructions can still be sent through the other processing units, which reduces instruction congestion. It should be noted that a write conflict may occur when two instructions both write data to the same address; the data processing circuit in the embodiments of the present disclosure may further include a processing module for avoiding such write conflicts.
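The benefit of independent processing units can be illustrated with a small software model. This is a sketch under simplifying assumptions: each instruction is a `(kind, address)` tuple, the capacity of the sent instruction queue is ignored, and the class and function names (`ProcessingUnit`, `tick`) are hypothetical.

```python
from collections import deque

class ProcessingUnit:
    """Software stand-in for one processing unit 101 (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.instruction_queue = deque()  # pending first instructions (FIFO)
        self.sent_queue = []              # info of sent, unfinished second instructions

    def head_has_hazard(self):
        # Hazard: same address as an in-flight instruction, different read/write type.
        kind, addr = self.instruction_queue[0]
        return any(a == addr and k != kind for k, a in self.sent_queue)

    def try_send(self):
        # Send the head instruction only if the queue is non-empty and hazard-free.
        if not self.instruction_queue or self.head_has_hazard():
            return None
        instr = self.instruction_queue.popleft()
        self.sent_queue.append(instr)
        return instr

def tick(units):
    """One round in which every unit independently tries to send its head instruction."""
    return {u.name: u.try_send() for u in units}

u0, u1 = ProcessingUnit("u0"), ProcessingUnit("u1")
u0.sent_queue.append(("read", 0xA1))          # an in-flight read of address 0xA1
u0.instruction_queue.append(("write", 0xA1))  # hazard with that read: u0 must wait
u1.instruction_queue.append(("write", 0xB0))  # no hazard in u1
sent = tick([u0, u1])
```

In this round `u0` holds its write back while `u1` still sends, so a hazard in one unit does not stall the whole circuit.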
The instruction queue 1011 may be a first in, first out (FIFO) queue. Each time the instruction queue 1011 receives a first instruction sent by an upper-level control unit, it may buffer that first instruction. In the embodiments of the present disclosure, the upper-level control unit is, for example, the instruction distribution unit 102 shown in FIG. 3 or the control unit 802 shown in FIG. 8. The order in which first instructions are buffered in the instruction queue 1011 may be the same as the order in which the instruction queue 1011 receives them. The first instruction may include a read instruction, used to read data from an address in an external storage space (for example, a hard disk). The first instruction may also include a write instruction, used to write data to an address in the external storage space. The first instruction may include its corresponding information, which includes address information and read/write type information. The address information indicates the address that the first instruction requests to access. For a read instruction, the address information is the address where the data to be read is located (the data source); for a write instruction, the address information is the address to which the data is to be written (the data destination). The read/write type information indicates whether the first instruction is a read instruction or a write instruction.
The sent instruction queue 1012 may buffer the information of every second instruction. A second instruction is an instruction that has been sent successfully but whose processing has not yet been completed. A second instruction may be determined to have been processed when a notification message indicating completion is returned for it, and the information of that second instruction is then cleared from the sent instruction queue 1012. The condition for returning the notification message may be set according to the actual situation. For example, for a read instruction, the data requester may return the notification message once the data requested by the read instruction has been returned to it. As another example, for a write instruction, the data receiver may return the notification message once the data carried in the write instruction has been written to the designated data receiver. The actual situation is of course not limited to the cases listed above.
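The life cycle of an entry in the sent instruction queue (recorded on a successful send, cleared on a completion notification) can be sketched as follows. The class name, the `instr_id` key used to match a notification to its entry, and the fixed `capacity` are assumptions made for illustration; the disclosure does not specify how notifications are matched to entries.

```python
class SentInstructionQueue:
    """Illustrative model of the sent instruction queue 1012."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # instr_id (hypothetical) -> (address, read/write type)

    def is_full(self):
        return len(self.entries) >= self.capacity

    def record(self, instr_id, address, kind):
        """Store the information of an instruction that was just sent."""
        if self.is_full():
            raise OverflowError("sent instruction queue is full")
        self.entries[instr_id] = (address, kind)

    def on_completion(self, instr_id):
        """Handle a completion notification by clearing the matching entry, if any."""
        self.entries.pop(instr_id, None)

queue = SentInstructionQueue(capacity=2)
queue.record(1, "A1", "read")
queue.record(2, "A2", "write")
was_full = queue.is_full()
queue.on_completion(1)   # the notification for instruction 1 arrives
```

After the notification only the entry for instruction 2 remains, so a later instruction that targets A1 is no longer compared against the finished read.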
The information of a second instruction may likewise include address information and read/write type information, whose meanings are the same as those of the address information and read/write type information in the first instruction and are not repeated here. The address information and read/write type information in the sent instruction queue 1012 may be buffered in correspondence with each other; that is, the address information and the read/write type information of the same second instruction are buffered as one item of information.
The detection unit 1013 may compare the information of the first instruction with each item of information buffered in the sent instruction queue 1012 to determine whether a data hazard exists between the first instruction and any second instruction that has been sent. If a data hazard exists between the first instruction and any one of the second instructions, the first instruction is not sent. The first instruction is sent only when no data hazard exists between it and every second instruction. When a data hazard exists, the detection unit 1013 may check again after a certain time interval, and repeat this until no data hazard exists, at which point the first instruction is taken out of the instruction queue 1011 and sent. The time interval may be one clock cycle or some other duration.
The way of determining whether a data hazard exists is described below with reference to FIG. 2. The detection unit 1013 may determine whether a data hazard exists based on at least one of the read/write type information and the address information. Specifically, it may first determine whether the sent instruction queue 1012 is full (step 201). When the sent instruction queue is not full, whether the first instruction satisfies the instruction sending condition may be detected based on the read/write type information of the first instruction, the read/write type information of the second instructions, the address information of the first instruction, and the address information of the second instructions.
Specifically, when the address information of every second instruction in the sent instruction queue 1012 differs from the address information of the first instruction, the first instruction may be determined to satisfy the instruction sending condition (step 202), and the first instruction is sent (step 207). For example, suppose the information in the sent instruction queue 1012 is as shown in Table 1:
Table 1: Sent instruction queue
Instruction | Address information | Read/write type information
Instruction 1 | A1 | Read
Instruction 2 | A2 | Write
Instruction 3 | A2 | Read
... | ... | ...
Here, instruction 1, instruction 2, and instruction 3 are all second instructions. In the sent instruction queue 1012, {A1, read} is the information of instruction 1, {A2, write} is the information of instruction 2, and {A2, read} is the information of instruction 3. Suppose further that the address information in the first instruction is A3, where A3 differs from both A1 and A2. Because the address information in the first instruction differs from the address information in every second instruction that has been sent, and reads and writes to different addresses are independent of each other, no data hazard exists between the first instruction and any second instruction, whatever their read/write types are. The first instruction therefore satisfies the instruction sending condition and can be sent.
Still assuming that the information in the sent instruction queue 1012 is as shown in Table 1, suppose now that the address information in the first instruction is A1. Because the first instruction and instruction 1 target the same address, sending the first instruction at this point may cause a data read/write error if its read/write type information differs from that of instruction 1; that is, a data hazard exists. In this case, the read/write type information and the address information must be considered together to determine whether the first instruction satisfies the instruction sending condition.
When the sent instruction queue 1012 contains target address information that is the same as the address information of the first instruction, and the sent instruction queue 1012 also contains target read/write type information that differs from the read/write type information of the first instruction, where the target read/write type information and the target address information belong to the same second instruction, the first instruction may be determined not to satisfy the instruction sending condition (step 203), and the first instruction is not sent (step 206).
When the sent instruction queue 1012 contains target address information that is the same as the address information of the first instruction, but contains no target read/write type information that both differs from the read/write type information of the first instruction and belongs to the same second instruction as the target address information, the first instruction may be determined to satisfy the instruction sending condition (step 202), and the first instruction is sent (step 207).
For example, in the above embodiment, suppose the read/write type information of the first instruction is write. Because the address information of the first instruction is the same as that of instruction 1 and the read/write type information of the first instruction differs from that of instruction 1, a data hazard exists and the first instruction does not satisfy the instruction sending condition. Suppose instead that the read/write type information of the first instruction is read. Then no second instruction has both the same address information as the first instruction and different read/write type information from the first instruction, so no data hazard exists and the first instruction satisfies the instruction sending condition.
In addition, when the read/write type information of every second instruction in the sent instruction queue 1012 is the same as that of the first instruction, the first instruction may be determined to satisfy the instruction sending condition. In a feasible implementation, when the sent instruction queue is not full, the detection unit may also determine whether the first instruction satisfies the instruction sending condition as follows: when the address information of every second instruction in the sent instruction queue differs from the address information of the first instruction, the first instruction is determined to satisfy the instruction sending condition; when the sent instruction queue contains at least one potentially conflicting instruction whose address information is the same as that of the first instruction, but the read/write type of every potentially conflicting instruction is the same as that of the first instruction, the first instruction is determined to satisfy the instruction sending condition; when the sent instruction queue contains at least one potentially conflicting instruction, and at least one of the potentially conflicting instructions is a conflicting instruction whose read/write type differs from that of the first instruction, the first instruction is determined not to satisfy the instruction sending condition.
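The rule for a sent instruction queue that is not full can be written as a single predicate. The following minimal sketch represents each item of information as an `(address, kind)` tuple and uses the Table 1 contents as test data; the function and variable names are illustrative assumptions.

```python
def satisfies_send_condition(first, sent_queue):
    """Check the sending condition for a sent instruction queue that is not full.

    A conflict is a second instruction with the same address as the first
    instruction and a different read/write type.
    """
    first_addr, first_kind = first
    potential = [(a, k) for a, k in sent_queue if a == first_addr]  # same address
    conflicts = [(a, k) for a, k in potential if k != first_kind]   # and different type
    return not conflicts

# Contents of Table 1: instruction 1, instruction 2, instruction 3.
TABLE_1 = [("A1", "read"), ("A2", "write"), ("A2", "read")]

different_address = satisfies_send_condition(("A3", "write"), TABLE_1)  # no shared address
same_addr_write = satisfies_send_condition(("A1", "write"), TABLE_1)    # conflicts with instruction 1
same_addr_read = satisfies_send_condition(("A1", "read"), TABLE_1)      # same type as instruction 1
```

Only the second case is blocked: the address matches instruction 1 and the read/write type differs.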
When the sent instruction queue 1012 is full, whether the first instruction satisfies the instruction sending condition may be detected based on the read/write type information of the first instruction and the read/write type information of the second instructions. Specifically, when the read/write type information of every second instruction in the sent instruction queue 1012 is the same as that of the first instruction, the first instruction is determined to satisfy the instruction sending condition (step 204), and the first instruction is sent (step 207).
For example, referring to (1) in the upper part of FIG. 4, suppose every item of read/write type information in the sent instruction queue 1012 is write and the read/write type information of the first instruction is also write. Whether or not the address information in the write instructions (address 1, address 2, ..., address n) is the same, no data hazard exists among the write instructions, so in this case the first instruction can be sent directly. Similarly, referring to (2) in the upper part of FIG. 4, suppose every item of read/write type information in the sent instruction queue 1012 is read and the read/write type information of the first instruction is also read; the first instruction can again be sent directly. If the first instruction is sent while the sent instruction queue 1012 is full, the number of instructions sent exceeds the length of the sent instruction queue 1012; this is called instruction over-issue. Instruction over-issue improves the sending efficiency of multiple consecutive instructions of the same read/write type and further reduces instruction congestion.
When the sent instruction queue 1012 contains at least one second instruction (for example, the j-th second instruction) whose read/write type information differs from that of the first instruction, the first instruction is determined not to satisfy the instruction sending condition (step 205), and the first instruction is not sent (step 206). For example, the items of read/write type information in the sent instruction queue 1012 include both read and write; or every item in the sent instruction queue 1012 is read but the read/write type information of the first instruction is write; or every item in the sent instruction queue 1012 is write but the read/write type information of the first instruction is read. In all three cases, the first instruction is not sent, regardless of the address information in the first instruction and the second instructions.
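The two branches of the check in FIG. 2 (queue not full, queue full) can be combined into one function. This is a sketch under the same assumptions as before: items are `(address, kind)` tuples, `capacity` stands for the length of the sent instruction queue, and the name `can_send` is hypothetical.

```python
def can_send(first, sent_queue, capacity):
    """Decide whether the first instruction may be sent, following the flow of FIG. 2."""
    addr, kind = first
    if len(sent_queue) < capacity:
        # Queue not full: block only on same address with a different read/write type.
        return not any(a == addr and k != kind for a, k in sent_queue)
    # Queue full: only read/write types are compared; over-issue is allowed
    # when every buffered item has the same type as the first instruction.
    return all(k == kind for _, k in sent_queue)

all_writes = [("A1", "write"), ("A2", "write"), ("A3", "write")]
mixed = [("A1", "read"), ("A2", "write"), ("A3", "read")]

over_issue_ok = can_send(("A9", "write"), all_writes, capacity=3)     # same type: over-issue
blocked_by_type = can_send(("A9", "read"), all_writes, capacity=3)    # type differs
blocked_by_mix = can_send(("A9", "read"), mixed, capacity=3)          # queue holds both types
not_full_ok = can_send(("A9", "read"), mixed, capacity=4)             # not full, address unused
```

Note that the full-queue branch is deliberately conservative: with the same `mixed` contents, the instruction is blocked when the queue is full but allowed when one slot is free, because only the not-full branch looks at addresses.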
It should be noted that, when instructions are over-issued, the number of instructions that have been sent but not yet processed exceeds the length of the sent instruction queue 1012. For example, five instructions have been sent and not yet processed, while the sent instruction queue 1012 can cache the information of only four instructions (that is, the length of the sent instruction queue 1012 is 4). In this case, the information of an over-issued instruction cannot be stored in the sent instruction queue 1012 after the instruction is sent, and whether a data hazard exists cannot be determined from the information in the sent instruction queue 1012 alone. For example, assume that the over-issued instruction S0 is a read instruction for address A0, that its information {A0, read} is not cached in the sent instruction queue 1012, and that the sent instruction queue 1012 holds the information shown in Table 2:
Table 2
Address information    Read/write type information
A1                     read
A2                     read
A3                     read
When the processing of the second instruction corresponding to the information {A1, read} is completed, the information {A1, read} is cleared from the sent instruction queue 1012, and the queue is then no longer full. If the instruction queue 1011 receives an instruction Sk whose information is {A0, write}, then, under the data hazard check described above, the sent instruction queue 1012 contains no address information equal to that of instruction Sk, so a judgment based on the information in the sent instruction queue 1012 would conclude that instruction Sk satisfies the instruction sending condition. In fact, however, among the instructions that have been sent but not yet processed there is an instruction S0 whose address information is the same as that of instruction Sk and whose read/write type information is different. A data hazard therefore exists between S0 and Sk, and Sk does not actually satisfy the instruction sending condition. In other words, under instruction over-issue, judging whether the first instruction satisfies the instruction sending condition solely from the information in the sent instruction queue 1012 may give a wrong result.
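The S0/Sk scenario can be reproduced in a few lines of Python. This is only an illustrative sketch: the tuple layout `(address, type)`, the helper `has_hazard`, and the `in_flight` list (which models what the hardware would know if it could track every outstanding instruction) are assumptions made for the example.

```python
def has_hazard(first, entries):
    """A hazard exists if some entry has the same address but a different type."""
    addr, rw = first
    return any(a == addr and t != rw for a, t in entries)


# Contents of the sent instruction queue as in Table 2.
sent_queue = [("A1", "read"), ("A2", "read"), ("A3", "read")]
# All instructions actually outstanding: S0 was over-issued and never recorded.
in_flight = [("A0", "read")] + sent_queue

# The instruction {A1, read} completes and is cleared.
sent_queue.remove(("A1", "read"))
in_flight.remove(("A1", "read"))

sk = ("A0", "write")
queue_says_hazard = has_hazard(sk, sent_queue)  # what the queue alone reports
actual_hazard = has_hazard(sk, in_flight)       # what is really outstanding

print(queue_says_hazard, actual_hazard)
```

The two results disagree, which is exactly the misjudgment that motivates invalidating the queue contents during over-issue.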
To improve the accuracy of the judgment under instruction over-issue, when the total number of second instructions exceeds the length of the sent instruction queue, every item of information in the sent instruction queue may be marked invalid. When every item of information in the sent instruction queue is invalid and there is a second instruction whose read/write type information differs from that of the first instruction, whether the first instruction satisfies the instruction sending condition may be detected based on statistical information about the sent instructions. In this case, the first instruction is sent only after every sent second instruction has been processed; as long as any second instruction remains unprocessed, the first instruction is not sent. In the embodiments of the present disclosure, the invalid marking of the items in the sent instruction queue may be cancelled after all sent second instructions have been processed or, alternatively, after all over-issued instructions have been processed.
In some embodiments, the circuit further includes a statistics unit 1014 configured to collect the following information: the total number of second instructions; the number of second instructions that have been processed; and the read/write type information of each second instruction. When every item of information in the sent instruction queue 1012 is invalid and there is a second instruction whose read/write type information differs from that of the first instruction, the detection unit 1013 may detect whether the first instruction satisfies the instruction sending condition based on the information collected by the statistics unit 1014.
The detection unit 1013 may first determine whether the read/write type information of the first instruction is the same as that of each sent second instruction. If the read/write type information differs, the detection unit obtains from the statistics unit 1014 the total number of sent second instructions and the number of those second instructions that have been processed. The first instruction is determined to satisfy the instruction sending condition only when the total number of sent second instructions equals the number of processed instructions; otherwise, it is determined that the first instruction does not satisfy the instruction sending condition.
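The counter-based check performed while the queue contents are invalid might be sketched as follows. The class `SentStats`, its field names, and the choice to keep the set of types seen since the queue was invalidated are assumptions for illustration; the disclosure only states which quantities the statistics unit collects.

```python
from dataclasses import dataclass, field


@dataclass
class SentStats:
    """Hypothetical model of the statistics unit 1014."""
    total_sent: int = 0                            # total number of second instructions
    completed: int = 0                             # second instructions already processed
    sent_types: set = field(default_factory=set)   # read/write types that were sent

    def on_send(self, rw: str) -> None:
        self.total_sent += 1
        self.sent_types.add(rw)

    def on_complete(self) -> None:
        self.completed += 1


def can_send_with_invalid_queue(first_type: str, stats: SentStats) -> bool:
    # Same type as everything sent so far: over-issue may continue.
    if all(t == first_type for t in stats.sent_types):
        return True
    # Different type: wait until every sent instruction has been processed.
    return stats.total_sent == stats.completed


stats = SentStats()
for _ in range(5):
    stats.on_send("read")
for _ in range(3):
    stats.on_complete()
print(can_send_with_invalid_queue("write", stats))  # two reads still outstanding
```

Comparing two counters avoids storing per-instruction addresses for over-issued instructions, at the cost of a conservative wait whenever the type changes.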
In some embodiments, different processing units 101 are used to process instructions sent by different thread groups. Each processing unit 101 may process instructions sent by one or more thread groups, and the thread groups handled by different processing units may be the same or different. For example, processing unit 0 processes instructions sent by thread group 0, processing unit 1 processes instructions sent by thread groups 1 and 2, and processing unit 2 processes instructions sent by thread groups 3 and 4. To distribute the instructions sent by different thread groups to the corresponding processing units, the circuit further includes a first instruction distribution unit configured to receive the first instructions sent by the thread groups, each of which carries identification information of the thread group that sent it, and to distribute each first instruction to the corresponding processing unit 101 based on the identification information it carries. The identification information is, for example, a thread group number.
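Distribution by thread group number can be illustrated with a lookup table. The mapping below mirrors the example in the preceding paragraph; the dictionary-based instruction format and the names `GROUP_TO_UNIT` and `distribute` are hypothetical.

```python
# Thread group number -> processing unit index, following the example above.
GROUP_TO_UNIT = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}


def distribute(instructions, num_units=3):
    """Append each instruction to the instruction queue of the processing unit
    responsible for its thread group, preserving arrival order per unit."""
    queues = [[] for _ in range(num_units)]
    for inst in instructions:
        queues[GROUP_TO_UNIT[inst["group"]]].append(inst)
    return queues


incoming = [
    {"group": 0, "name": "i0"},
    {"group": 2, "name": "i1"},
    {"group": 3, "name": "i2"},
    {"group": 1, "name": "i3"},
]
queues = distribute(incoming)
print([[inst["name"] for inst in q] for q in queues])
```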
In some embodiments, instructions sent by different processing units 101 have different priorities. The circuit further includes an instruction arbitration unit 103 configured to receive the first instructions sent by the processing units 101 and to send them one after another according to their priorities.
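A simple fixed-priority arbiter is one way to picture this step. The disclosure does not specify the arbitration policy, so the convention that a smaller number means higher priority, and the names `arbitrate`, `pending`, and `priority`, are assumptions.

```python
def arbitrate(pending, priority):
    """Return the pending first instructions in the order they would be sent.

    pending:  dict mapping processing-unit id -> first instruction it wants to send
    priority: dict mapping processing-unit id -> priority (smaller = sent earlier)
    """
    order = sorted(pending, key=lambda unit: priority[unit])
    return [pending[unit] for unit in order]


pending = {2: "inst_c", 0: "inst_a", 1: "inst_b"}
priority = {0: 1, 1: 0, 2: 2}
send_order = arbitrate(pending, priority)
print(send_order)
```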
In some embodiments, the first instruction includes a storage address of bypass information, and the bypass information corresponding to the first instruction is stored at that storage address. The circuit further includes: a second instruction distribution unit configured to decouple an original instruction carrying the bypass information into a decoupled original instruction and the bypass information, store the bypass information at the storage address of the bypass information, generate the first instruction based on the decoupled original instruction and the storage address of the bypass information, and deliver the first instruction to the instruction queue; and a bus control unit configured to generate a target instruction based on the first instruction and to send the target instruction.
It should be noted that, in the above embodiments, the first instruction distribution unit used to distribute instructions to the processing units 101 and the second instruction distribution unit used to decouple the original instructions may be the same instruction distribution unit 102. In practical applications, however, instruction distribution and instruction decoupling may also be performed by different instruction distribution units. In one possible implementation, the original instruction is sent by a thread group.
The original instruction may carry bypass information that is irrelevant to judging whether the first instruction satisfies the instruction sending condition. For a write instruction, the bypass information may include, but is not limited to, the data to be written and identification information marking the valid bits of the data to be written. For a read instruction, the bypass information may include, but is not limited to, the target address of the data read and the address of the register storing that target address. If the bypass information were always carried in the instruction, every processing unit would need additional storage space for it, which would increase the area and power consumption of the data processing circuit. This embodiment therefore decouples the bypass information from the instruction, stores the bypass information separately, and performs data hazard detection on the remaining part of the instruction. This reduces the area and power consumption of the circuit and also reduces the complexity of the crossbar (crossbar switch matrix). Only after the first instruction is determined to satisfy the instruction sending condition is the bypass information merged with the first instruction again to obtain the target instruction, which is then sent.
Referring to FIG. 5A and FIG. 5B, after receiving an original instruction that includes bypass information and instruction information, the instruction distribution unit 102 may first extract the bypass information from the original instruction and send it to a bypass information storage unit. The bypass information corresponding to read instructions and the bypass information corresponding to write instructions may be stored separately: referring to FIG. 3, the bypass information corresponding to a write instruction may be stored in the first storage unit 105, and the bypass information corresponding to a read instruction may be stored in the second storage unit 106. The bypass information storage unit may return the storage address of the bypass information to the instruction distribution unit 102. For a write instruction, this is the storage address of the corresponding bypass information in the first storage unit 105; for a read instruction, it is the storage address of the corresponding bypass information in the second storage unit 106. The instruction distribution unit 102 may merge the instruction information with the storage address of the bypass information to generate the first instruction, which is sent to the bus control unit 104 through the instruction arbitration unit 103.
The bus control unit 104 may generate the final target instruction and send it. The bus control unit 104 may generate the target instruction differently for write instructions and read instructions. For a write instruction, bypass information such as the data to be written must be sent to the target address together with the instruction so that the data can be written there. The bus control unit 104 may therefore extract from the write instruction the storage address of the corresponding first bypass information, fetch the first bypass information from that storage address, generate the target instruction from the first bypass information and the instruction information of the write instruction, and send the target instruction, as shown in FIG. 5A. For a read instruction, the data storage unit that holds the data to be read does not need to know where the data will be delivered. The bus control unit 104 may therefore send the storage address of the bypass information together with the instruction information of the read instruction directly as the target instruction, as shown in FIG. 5B.
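The decoupling and re-merging of bypass information described above can be sketched as follows. `BypassStore` stands in for the first and second storage units, and the dictionary fields (`addr`, `rw`, `bypass`, `bypass_slot`) are hypothetical names chosen for the example rather than the actual instruction format.

```python
class BypassStore:
    """Toy storage unit: a slot number plays the role of the storage address."""

    def __init__(self):
        self._slots = {}
        self._next_slot = 0

    def put(self, info):
        slot = self._next_slot
        self._next_slot += 1
        self._slots[slot] = info
        return slot

    def get(self, slot):
        return self._slots[slot]

    def clear(self, slot):
        del self._slots[slot]  # frees the slot once the instruction completes


def decouple(original, write_store, read_store):
    """Strip the bypass information and keep only its storage address."""
    store = write_store if original["rw"] == "write" else read_store
    slot = store.put(original["bypass"])
    return {"addr": original["addr"], "rw": original["rw"], "bypass_slot": slot}


def make_target(first, write_store):
    """Writes get their data merged back in; reads keep only the storage address."""
    if first["rw"] == "write":
        return {"addr": first["addr"], "rw": "write",
                "bypass": write_store.get(first["bypass_slot"])}
    return dict(first)


write_store, read_store = BypassStore(), BypassStore()
w = decouple({"addr": 0x10, "rw": "write", "bypass": {"data": 0xAB}},
             write_store, read_store)
r = decouple({"addr": 0x20, "rw": "read", "bypass": {"dest_reg": 3}},
             write_store, read_store)
print(make_target(w, write_store))
print(make_target(r, write_store))
```

Only the small `{addr, rw, bypass_slot}` record travels through the hazard check, which is the source of the area and power savings the text describes.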
Further, for a read instruction, the bus control unit 104 may also receive the target data read by the read instruction; the target data may carry the storage address of the second bypass information corresponding to the read instruction. The bus control unit 104 may write the target data to the storage address of the second bypass information corresponding to the read instruction, so that the data requester (for example, a register) can read the target data from that storage address.
In some embodiments, the bus control unit 104 is further configured to clear the bypass information corresponding to the first instruction from the storage address of the bypass information once the first instruction has been processed, so that the storage unit holding bypass information can free space for the bypass information of other instructions. When a read instruction has been processed, its bypass information may be cleared from the storage address of the bypass information corresponding to that read instruction; when a write instruction has been processed, its bypass information may be cleared from the storage address of the bypass information corresponding to that write instruction.
In some embodiments, when the first instruction is sent successfully, the information of the first instruction is written into the sent instruction queue. Referring to FIG. 6, assume that, in clock cycle T1 of the sending process, the sent instruction queue contains the information of instruction 1, instruction 2, and instruction 3 and is not full, and the instruction queue contains instruction 4, instruction 5, and instruction 6. The information of the instruction at the head of the instruction queue (instruction 4) can then be extracted and compared with the information of instructions 1, 2, and 3 to detect whether instruction 4 satisfies the instruction sending condition. If it does, instruction 4 is sent. If instruction 4 is sent successfully, its information is stored in the sent instruction queue in clock cycle T2. In clock cycle T3, the information of an instruction that has been processed (assumed to be instruction 1) may also be cleared from the sent instruction queue. It should be noted that clock cycle T2 may come before or after clock cycle T3; the present disclosure does not limit this. In addition, an instruction sent earlier may finish processing earlier or later than an instruction sent later. That is, the order in which instruction information is stored in the sent instruction queue is not necessarily the order in which it is cleared from the queue.
In some embodiments, the bus control unit 104 is further configured to clear the information of processed second instructions from the sent instruction queue. The sent instruction queue 1012 may send the cached information of a second instruction and/or the cache address of that information to the bus control unit 104. When the second instruction to which the information belongs has been processed, the bus control unit 104 may send an enable signal to the sent instruction queue 1012. The enable signal may carry, for example, the cache address in the sent instruction queue 1012 of the information of the processed second instruction, so that the sent instruction queue 1012 can clear the information at that cache address in response to the enable signal.
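Because entries may be cleared in a different order from the one in which they were stored, the sent instruction queue behaves more like a small table of slots than a strict FIFO. The sketch below models that behavior under assumed names (`SentQueue`, `record`, `clear`); the slot index plays the role of the cache address carried by the enable signal.

```python
class SentQueue:
    """Fixed-capacity table of sent-instruction records, cleared by slot index."""

    def __init__(self, capacity):
        self.slots = [None] * capacity

    def is_full(self):
        return all(s is not None for s in self.slots)

    def record(self, info):
        """Store info in the first free slot and return its cache address.
        Raises ValueError if the queue is full."""
        idx = self.slots.index(None)
        self.slots[idx] = info
        return idx

    def clear(self, idx):
        """Model of the enable signal: free the slot at the given cache address."""
        self.slots[idx] = None

    def entries(self):
        return [s for s in self.slots if s is not None]


q = SentQueue(4)
a1 = q.record("inst1")
a2 = q.record("inst2")
a3 = q.record("inst3")
a4 = q.record("inst4")   # queue is now full
q.clear(a2)              # inst2 finishes before inst1: out-of-order completion
print(q.entries())
```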
FIG. 7 is an overall flowchart of an embodiment of the present disclosure. The flow can be implemented by the circuit shown in FIG. 3. First, the instruction distribution unit 102 may distribute instructions to the instruction queues 1011 of the processing units 101 according to thread group number (S1). The instruction distribution unit 102 may also decouple the instruction information from the bypass information in each instruction and store the decoupled bypass information at the storage address of the bypass information (S2). The storage address is merged with the instruction information to generate the first instruction. The detection unit 1013 may judge whether a data hazard exists (that is, whether the first instruction satisfies the instruction sending condition) based on the information of the first instruction and the information of the second instructions stored in the sent instruction queue, or based on the information collected by the statistics unit 1014 (S3). If a data hazard exists, the first instruction remains cached in the instruction queue 1011, and the check is repeated periodically (S4). If no data hazard exists, the first instruction is taken from the instruction queue 1011 and sent to the instruction arbitration unit 103 (S5). The processing flow is the same for every processing unit 101 and is not described separately for each.
The instruction arbitration unit 103 may send the first instructions to the bus control unit 104 one after another according to the priorities of the first instructions sent by the processing units 101 (S6). The bus control unit 104 may generate a target instruction based on the first instruction received from the instruction arbitration unit 103 and send the target instruction to the corresponding target address (S7).
In addition, when the bus control unit 104 receives processing completion information for a second instruction, it may clear the information corresponding to that second instruction from the sent instruction queue 1012 (S8) and clear the corresponding bypass information (S9). When the first instruction has been sent successfully through the bus control unit 104, the detection unit 1013 may write the information of the first instruction into the sent instruction queue 1012 (S10).
The execution order of the above steps is not limited to the order shown in the figure. For example, steps S8 and S9 may be interchanged, and step S10 may be interchanged with step S8 or S9.
In the embodiments of the present disclosure, multiple processing units handle data hazards in parallel. When some processing units encounter a data hazard, the other processing units, which have no data hazard, can still send instructions. In addition, the present disclosure allows consecutive instructions of the same read/write type to be over-issued, which can effectively improve the memory access performance of the system. The method of the present disclosure thus enables multiple processing units to handle data hazards efficiently and improves memory access performance, and extensions and variants of the method can reduce power consumption and crossbar complexity.
As shown in FIG. 8, the present disclosure also provides an artificial intelligence chip, including: a data processing circuit 801; and a control unit 802 configured to send instructions to the data processing circuit 801.
The data processing circuit 801 may be the data processing circuit described in any embodiment of the present disclosure. For details of the data processing circuit 801 in this embodiment, refer to the foregoing embodiments; they are not repeated here.
Referring to FIG. 9, an embodiment of the present disclosure also provides a data processing method applied to the detection unit included in each processing unit of the data processing circuit described in any embodiment of the present disclosure. The method includes:
Step 901: detecting whether the first instruction satisfies the instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit;
Step 902: when the first instruction satisfies the instruction sending condition, taking the first instruction out of the instruction queue and sending it;
where the information includes at least one of read/write type information and address information.
Optionally, detecting whether the first instruction satisfies the instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit includes: when the sent instruction queue is not full, detecting whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction, the read/write type information of the second instructions, the address information of the first instruction, and the address information of the second instructions; and when the sent instruction queue is full, detecting whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction and the read/write type information of the second instructions.
Optionally, when the sent instruction queue is not full, detecting whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction, the read/write type information of the second instructions, the address information of the first instruction, and the address information of the second instructions includes: when the address information of every second instruction in the sent instruction queue differs from the address information of the first instruction, determining that the first instruction satisfies the instruction sending condition; when the sent instruction queue contains at least one potentially conflicting instruction whose address information is the same as that of the first instruction, but the read/write type of every potentially conflicting instruction is the same as that of the first instruction, determining that the first instruction satisfies the instruction sending condition; and when the sent instruction queue contains at least one potentially conflicting instruction and at least one of the potentially conflicting instructions is a conflicting instruction whose read/write type differs from that of the first instruction, determining that the first instruction does not satisfy the instruction sending condition.
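The three branches of the not-full case reduce to a single test: among the recorded instructions that share the first instruction's address, all must have the same read/write type. A minimal sketch, assuming each record is an `(address, type)` tuple and using a hypothetical function name:

```python
def can_send_when_queue_not_full(first, sent_entries):
    """Return True if no recorded instruction has the same address as the
    first instruction but a different read/write type."""
    addr, rw = first
    # Potentially conflicting instructions: same address as the first instruction.
    potential = [(a, t) for a, t in sent_entries if a == addr]
    # With no potential conflicts, all() over an empty list is True.
    return all(t == rw for _, t in potential)


sent = [("A1", "read"), ("A2", "write"), ("A3", "read")]
print(can_send_when_queue_not_full(("A4", "write"), sent))  # no address match
print(can_send_when_queue_not_full(("A1", "read"), sent))   # same address, same type
print(can_send_when_queue_not_full(("A1", "write"), sent))  # same address, different type
```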
Optionally, when the sent instruction queue is full, detecting whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction and the read/write type information of the second instructions includes: when the sent instruction queue contains at least one second instruction whose read/write type information differs from that of the first instruction, determining that the first instruction does not satisfy the instruction sending condition; and when the read/write type information of every second instruction in the sent instruction queue is the same as that of the first instruction, determining that the first instruction satisfies the instruction sending condition.
Optionally, different processing units are used to process instructions sent by different thread groups.
Optionally, the circuit further includes a statistics unit configured to collect the following information: the total number of second instructions; the number of second instructions that have been processed; and the read/write type information of each second instruction. Detecting whether the first instruction satisfies the instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit includes: when every item of information in the sent instruction queue is invalid and there is a second instruction whose read/write type information differs from that of the first instruction, detecting whether the first instruction satisfies the instruction sending condition based on the information collected by the statistics unit.
Optionally, detecting whether the first instruction satisfies the instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit includes: when every item of information in the sent instruction queue is invalid and there is a second instruction whose read/write type information differs from that of the first instruction, determining that the first instruction satisfies the instruction sending condition if every second instruction has been processed.
Optionally, the information of the first instruction is written into the sent instruction queue when the first instruction is sent successfully.
For details of the above method embodiments, refer to the foregoing embodiments of the data processing circuit; they are not repeated here.
Referring to FIG. 10, an embodiment of the present disclosure further provides a data processing apparatus, applied to the detection unit included in each processing unit of the data processing circuit described in any embodiment of the present disclosure. The apparatus includes:
a detection module 1001, configured to detect, based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit, whether the first instruction satisfies the instruction sending condition;
a sending module 1002, configured to take the first instruction out of the instruction queue and send it when the instruction sending condition is satisfied;
wherein the information includes at least one of read/write type information and address information.
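As a non-limiting sketch, the cooperation of the detection module 1001 and the sending module 1002 can be modelled in Python as shown below. The function names, the `condition` callback, and the `bus_send` callback are hypothetical and introduced only for this example. Leaving the instruction at the head of the queue after a failed send is an assumption. Recording the instruction in the sent instruction queue only after a successful send follows the optional feature described above.

```python
from collections import deque
from typing import Callable, Deque, NamedTuple


class InstrInfo(NamedTuple):
    is_write: bool   # read/write type information
    address: int     # address information


Condition = Callable[[InstrInfo, Deque[InstrInfo]], bool]


def detect(first: InstrInfo, sent_queue: Deque[InstrInfo],
           condition: Condition) -> bool:
    """Detection module 1001: test the first instruction against the sent entries."""
    return condition(first, sent_queue)


def send(instr_queue: Deque[InstrInfo], sent_queue: Deque[InstrInfo],
         bus_send: Callable[[InstrInfo], bool]) -> bool:
    """Sending module 1002: take the head instruction out and send it."""
    first = instr_queue[0]
    if not bus_send(first):
        return False            # assumption: a failed send leaves it queued
    instr_queue.popleft()
    sent_queue.append(first)    # recorded only after a successful send
    return True


def step(instr_queue: Deque[InstrInfo], sent_queue: Deque[InstrInfo],
         condition: Condition, bus_send: Callable[[InstrInfo], bool]) -> bool:
    """One detect-then-send attempt; returns True if an instruction was sent."""
    if not instr_queue:
        return False
    if not detect(instr_queue[0], sent_queue, condition):
        return False
    return send(instr_queue, sent_queue, bus_send)
```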
In some embodiments, the functions or modules of the apparatus provided by the embodiments of the present disclosure can be used to perform the methods described in the method embodiments above. For their specific implementation, refer to the description of those method embodiments; for brevity, it is not repeated here.
From the description of the above implementations, those skilled in the art will clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes over the related art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this specification or in certain parts of those embodiments.
The systems, apparatuses, modules, or units set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding description of the method embodiments. The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and when implementing the solutions of the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and such improvements and refinements shall also be regarded as falling within the scope of protection of the embodiments of this specification.

Claims (17)

  1. A data processing circuit, characterized in that the circuit comprises a plurality of processing units, each of the plurality of processing units comprising:
    an instruction queue, configured to buffer a received first instruction;
    a sent instruction queue, configured to buffer information of each of at least one second instruction that has been sent, wherein the information comprises at least one of read/write type information and address information;
    a detection unit, configured to detect, based on the information of the first instruction and the information of each second instruction, whether the first instruction satisfies an instruction sending condition, and, when the instruction sending condition is satisfied, to take the first instruction out of the instruction queue and send it.
  2. The circuit according to claim 1, characterized in that the detection unit is specifically configured to:
    when the sent instruction queue is not full, detect whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction, the read/write type information of the second instruction, the address information of the first instruction, and the address information of the second instruction;
    when the sent instruction queue is full, detect whether the first instruction satisfies the instruction sending condition based on the read/write type information of the first instruction and the read/write type information of the second instruction.
  3. The circuit according to claim 2, characterized in that, when the sent instruction queue is not full, the detection unit is specifically configured to:
    determine that the first instruction satisfies the instruction sending condition when the address information of every second instruction in the sent instruction queue differs from the address information of the first instruction;
    determine that the first instruction satisfies the instruction sending condition when the sent instruction queue contains at least one potentially conflicting instruction whose address information is the same as that of the first instruction, but the read/write type of every potentially conflicting instruction is the same as the read/write type of the first instruction;
    determine that the first instruction does not satisfy the instruction sending condition when the sent instruction queue contains at least one potentially conflicting instruction, and the at least one potentially conflicting instruction includes at least one conflicting instruction whose read/write type differs from the read/write type of the first instruction.
  4. The circuit according to claim 2 or 3, characterized in that, when the sent instruction queue is full, the detection unit is specifically configured to:
    determine that the first instruction does not satisfy the instruction sending condition when the read/write type information of at least one second instruction in the sent instruction queue differs from the read/write type information of the first instruction;
    determine that the first instruction satisfies the instruction sending condition when the read/write type information of every second instruction in the sent instruction queue is the same as the read/write type information of the first instruction.
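For illustration only, the two checks recited in claims 2 to 4 can be sketched in Python as follows. The names `InstrInfo`, `check_not_full`, `check_full`, and `meets_send_condition`, as well as the explicit `capacity` parameter, are hypothetical and are not taken from the disclosure.

```python
from typing import NamedTuple, Sequence


class InstrInfo(NamedTuple):
    is_write: bool   # read/write type information
    address: int     # address information


def check_not_full(first: InstrInfo, sent: Sequence[InstrInfo]) -> bool:
    """Queue not full: compare both address and read/write type (claim 3)."""
    potential_conflicts = [s for s in sent if s.address == first.address]
    # No same-address entry, or only same-type entries: the instruction may go.
    return all(s.is_write == first.is_write for s in potential_conflicts)


def check_full(first: InstrInfo, sent: Sequence[InstrInfo]) -> bool:
    """Queue full: compare read/write type only (claim 4)."""
    return all(s.is_write == first.is_write for s in sent)


def meets_send_condition(first: InstrInfo, sent: Sequence[InstrInfo],
                         capacity: int) -> bool:
    """Select the check according to whether the sent queue is full (claim 2)."""
    if len(sent) < capacity:
        return check_not_full(first, sent)
    return check_full(first, sent)
```

With a sent queue holding a write to 0x100 and a read to 0x200, a read to 0x100 is held back while the queue has spare capacity, whereas a read to 0x300 is allowed; once the queue is full, any mix of types among the sent entries blocks the first instruction regardless of address.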
  5. The circuit according to any one of claims 1 to 4, characterized in that different processing units are configured to process instructions sent by different thread groups, and the circuit further comprises a first instruction distribution unit configured to:
    receive the first instructions sent by the respective thread groups, wherein the first instruction sent by each thread group carries identification information of that thread group;
    distribute the first instruction sent by each thread group to the corresponding processing unit based on the identification information carried in that first instruction.
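A minimal sketch of such a distribution step is given below, assuming one processing unit per thread group and a direct lookup from the group identifier to that unit's instruction queue. The `Distributor` class and the `group_id` field are hypothetical names for this example.

```python
from collections import deque
from typing import Deque, Dict, Iterable, NamedTuple


class Instr(NamedTuple):
    group_id: int    # identification information of the issuing thread group
    is_write: bool
    address: int


class Distributor:
    """Routes each first instruction to the queue of its thread group's unit."""

    def __init__(self, group_ids: Iterable[int]) -> None:
        # Assumption: exactly one processing unit (one queue) per thread group.
        self.queues: Dict[int, Deque[Instr]] = {g: deque() for g in group_ids}

    def dispatch(self, instr: Instr) -> None:
        # A KeyError here would mean an instruction from an unknown group.
        self.queues[instr.group_id].append(instr)
```

Because each group has its own queue, the order of instructions within a thread group is preserved while different groups proceed independently.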
  6. The circuit according to any one of claims 1 to 5, characterized in that the circuit further comprises:
    an instruction arbitration unit, configured to receive the first instructions sent by the respective processing units and to send the received first instructions in turn based on the priorities of the first instructions sent by the respective processing units.
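The claim does not fix a particular priority scheme. Purely as an example, the sketch below assumes that a smaller number means a higher priority and that ties are broken in arrival order; the `Arbiter` class and its methods are hypothetical.

```python
import heapq
from itertools import count
from typing import Any, List, Tuple


class Arbiter:
    """Collects instructions from the processing units and emits them by priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._arrival = count()   # tie-breaker that preserves arrival order

    def receive(self, priority: int, instr: Any) -> None:
        # Assumption: lower value means higher priority.
        heapq.heappush(self._heap, (priority, next(self._arrival), instr))

    def drain(self) -> List[Any]:
        """Return all pending instructions in the order they would be sent."""
        ordered = []
        while self._heap:
            _, _, instr = heapq.heappop(self._heap)
            ordered.append(instr)
        return ordered
```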
  7. The circuit according to any one of claims 1 to 6, characterized in that the circuit further comprises:
    a second instruction distribution unit, configured to decouple an original instruction carrying bypass information to obtain a decoupled original instruction and the bypass information, store the bypass information at a storage address of the bypass information, generate the first instruction based on the decoupled original instruction and the storage address of the bypass information, and deliver the first instruction to the instruction queue in the processing unit;
    a bus control unit, configured to, when the first instruction satisfies the instruction sending condition, generate a target instruction based on the first instruction and the storage address of the bypass information, and send the target instruction.
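The decoupling step can be pictured with the following sketch, which is offered under several assumptions: the bypass information is modelled as opaque bytes, the storage address is a small slot index, and the target instruction carries that slot index as a tag. None of the type or function names below come from the disclosure.

```python
from typing import Dict, List, NamedTuple


class OriginalInstr(NamedTuple):
    is_write: bool
    address: int
    bypass: bytes        # bypass information carried by the original instruction


class FirstInstr(NamedTuple):
    is_write: bool
    address: int
    bypass_slot: int     # storage address of the bypass information


class TargetInstr(NamedTuple):
    is_write: bool
    address: int
    tag: int             # assumption: the storage address travels as a tag


class BypassStore:
    """Storage unit for bypass information, addressed by slot index."""

    def __init__(self, size: int) -> None:
        self.slots: Dict[int, bytes] = {}
        self._free: List[int] = list(range(size))

    def put(self, info: bytes) -> int:
        slot = self._free.pop(0)   # raises IndexError when the store is full
        self.slots[slot] = info
        return slot


def decouple(orig: OriginalInstr, store: BypassStore) -> FirstInstr:
    """Second distribution unit: park the bypass info, keep only its address."""
    slot = store.put(orig.bypass)
    return FirstInstr(orig.is_write, orig.address, slot)


def make_target(first: FirstInstr) -> TargetInstr:
    """Bus control unit: build the instruction that is actually sent."""
    return TargetInstr(first.is_write, first.address, tag=first.bypass_slot)
```

Keeping only a slot index in the queued instruction means the instruction queue entries stay narrow while the bypass information waits in its own storage.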
  8. The circuit according to claim 7, characterized in that the bus control unit is further configured to:
    clear the information of processed second instructions from the sent instruction queue;
    and/or
    when processing of the first instruction is completed, clear the bypass information corresponding to the first instruction from the storage address of the bypass information.
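As a hedged illustration of this clean-up, the sketch below assumes that each sent entry carries a tag equal to the storage address of its bypass information, so that a single completion event can drive both clearing actions. The `SentEntry` type and the `on_completed` function are hypothetical.

```python
from typing import Dict, List, NamedTuple


class SentEntry(NamedTuple):
    tag: int         # assumption: equals the storage address of the bypass info
    is_write: bool
    address: int


def on_completed(tag: int, sent_queue: List[SentEntry],
                 bypass_store: Dict[int, bytes]) -> None:
    """Handle completion of the instruction identified by `tag`."""
    # Clear the information of the processed instruction from the sent queue.
    sent_queue[:] = [entry for entry in sent_queue if entry.tag != tag]
    # Clear the corresponding bypass information, if any is still stored.
    bypass_store.pop(tag, None)
```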
  9. The circuit according to claim 7 or 8, characterized in that the first instruction comprises a write instruction, and the bypass information comprises first bypass information corresponding to the write instruction; the circuit further comprises a first storage unit configured to store the first bypass information;
    and/or
    the first instruction comprises a read instruction, and the bypass information comprises second bypass information corresponding to the read instruction; the circuit further comprises a second storage unit configured to store the second bypass information.
  10. The circuit according to claim 9, characterized in that the bus control unit is further configured to:
    receive target data read by the read instruction, the target data carrying the storage address of the second bypass information;
    write the target data into the storage address of the second bypass information in the second storage unit.
  11. The circuit according to any one of claims 7 to 10, characterized in that, when the total number of second instructions exceeds the length of the sent instruction queue, every entry in the sent instruction queue is set to invalid.
  12. The circuit according to claim 11, characterized in that the circuit further comprises a statistics unit configured to count the following information:
    the total number of second instructions;
    the number of second instructions that have been processed; and
    the read/write type information of each second instruction;
    the detection unit is configured to: when every entry in the sent instruction queue is invalid and there is a second instruction whose read/write type information differs from that of the first instruction, detect whether the first instruction satisfies the instruction sending condition based on the information counted by the statistics unit.
  13. The circuit according to claim 12, characterized in that the detection unit is further configured to:
    when every entry in the sent instruction queue is invalid and there is a second instruction whose read/write type information differs from that of the first instruction, determine that the first instruction satisfies the instruction sending condition if all of the second instructions have been processed.
  14. The circuit according to any one of claims 1 to 13, characterized in that the information of the first instruction is written into the sent instruction queue when the first instruction is sent successfully.
  15. An artificial intelligence chip, characterized by comprising:
    the data processing circuit according to any one of claims 1 to 14; and
    a control unit, configured to send instructions to the data processing circuit.
  16. A data processing method, characterized in that the method is applied to the detection unit included in each processing unit of the data processing circuit according to any one of claims 1 to 14, the method comprising:
    detecting whether the first instruction satisfies the instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit, wherein the information comprises at least one of read/write type information and address information;
    when the first instruction satisfies the instruction sending condition, taking the first instruction out of the instruction queue and sending it.
  17. A data processing apparatus, characterized in that the apparatus is applied to the detection unit included in each processing unit of the data processing circuit according to any one of claims 1 to 14, the apparatus comprising:
    a detection unit, configured to detect whether the first instruction satisfies the instruction sending condition based on the information of the first instruction in the instruction queue corresponding to the processing unit and the information of each second instruction in the sent instruction queue corresponding to the processing unit, wherein the information comprises at least one of read/write type information and address information;
    a sending unit, configured to take the first instruction out of the instruction queue and send it when the first instruction satisfies the instruction sending condition.
PCT/CN2022/124509 2021-11-29 2022-10-11 Data processing circuit, artificial intelligence chip, and data processing method and apparatus WO2023093335A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111435830.3A CN114091384A (en) 2021-11-29 2021-11-29 Data processing circuit, artificial intelligence chip, data processing method and device
CN202111435830.3 2021-11-29

Publications (1)

Publication Number Publication Date
WO2023093335A1 true WO2023093335A1 (en) 2023-06-01

Family

ID=80305502

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/124509 WO2023093335A1 (en) 2021-11-29 2022-10-11 Data processing circuit, artificial intelligence chip, and data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN114091384A (en)
WO (1) WO2023093335A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114091384A (en) * 2021-11-29 2022-02-25 上海阵量智能科技有限公司 Data processing circuit, artificial intelligence chip, data processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160378715A1 (en) * 2015-06-26 2016-12-29 Intel Corporation Hardware processors and methods for tightly-coupled heterogeneous computing
US20190220283A1 (en) * 2006-09-29 2019-07-18 Arm Finance Overseas Limited Load/store unit for a processor, and applications thereof
CN113326066A (en) * 2021-04-13 2021-08-31 腾讯科技(深圳)有限公司 Quantum control microarchitecture, quantum control processor and instruction execution method
CN114091384A (en) * 2021-11-29 2022-02-25 上海阵量智能科技有限公司 Data processing circuit, artificial intelligence chip, data processing method and device

Also Published As

Publication number Publication date
CN114091384A (en) 2022-02-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897420

Country of ref document: EP

Kind code of ref document: A1