WO2023093260A1 - Instruction processing device, method, computer equipment, and storage medium

Info

Publication number
WO2023093260A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
result data
current
write address
current instruction
Prior art date
Application number
PCT/CN2022/120992
Other languages
English (en)
French (fr)
Inventor
霍冠廷
王文强
徐宁仪
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023093260A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022 Mechanisms to release resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the technical field of electronic circuits, and in particular, to an instruction processing device, method, data processing chip, computer equipment, and computer-readable storage medium.
  • the current data processing method has a problem of high power consumption.
  • Embodiments of the present disclosure at least provide an instruction processing device, a method, a data processing chip, a computer device, and a computer-readable storage medium.
  • an embodiment of the present disclosure provides an instruction processing device, including a controller, an instruction processing circuit, and an operation unit connected in sequence. The controller is configured to send at least one instruction to be processed to the instruction processing circuit. The instruction processing circuit is configured to: determine, according to the address information of a current instruction, whether to write the result data corresponding to the current instruction into the external memory, wherein the current instruction is the instruction that is first in issue order among the at least one instruction to be processed; and, in response to determining that the result data does not need to be written into the external memory, generate, based on the current instruction, a first target instruction indicating that the result data does not need to be written into the external memory, and send the first target instruction to the operation unit. The operation unit is configured to, in response to receiving the first target instruction, execute the first target instruction to obtain the result data.
  • the instruction processing circuit is further configured to send the current instruction to the operation unit in response to determining that the result data corresponding to the current instruction needs to be written into the external memory;
  • the operation unit is further configured to, in response to receiving the current instruction, execute the current instruction, obtain the result data, and write the result data into the external memory.
  • the instruction processing circuit is further configured to, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, generate, based on the current instruction, a second target instruction indicating that the result data needs to be written into the external memory, and send the second target instruction to the operation unit; the operation unit is further configured to, in response to receiving the second target instruction, execute the second target instruction, obtain the result data, and write the result data into the external memory.
  • the at least one instruction to be processed includes a plurality of instructions to be processed;
  • the address information of the current instruction includes a first write address;
  • the instruction processing circuit, when determining whether to write the result data corresponding to the current instruction into the external memory, is configured to: compare the first write address of the current instruction with the second write address of a first instruction; and, in response to the first write address being the same as the second write address, determine that the result data corresponding to the current instruction does not need to be written into the external memory; wherein the first instruction is one of the plurality of pending instructions other than the current instruction.
  • the instruction processing circuit is further configured to determine the first instruction from the plurality of pending instructions based on the type of the current instruction.
  • the instruction processing circuit, when determining that the result data corresponding to the current instruction does not need to be written into the external memory, is configured to: in response to the first write address being the same as the second write address, determine a second instruction from the plurality of pending instructions, wherein the issue order of the second instruction among the plurality of pending instructions lies between that of the current instruction and that of the first instruction; compare the read address of the second instruction with the first write address of the current instruction; and, in response to the read address of the second instruction being different from the first write address, determine that the result data corresponding to the current instruction does not need to be written into the external memory.
  • the instruction processing circuit includes: a buffer stack, whose write end is connected to the controller and which is used to buffer the plurality of pending instructions; a comparison circuit, which is connected to the output end of the buffer stack and is used to read the first write address of the current instruction and the second write address of the first instruction from the buffer stack, compare the first write address with the second write address, and output a first control signal in response to the first write address being the same as the second write address; and an instruction modification circuit, which is connected to the output end of the buffer stack and to the comparison circuit and is used to, in response to reading the current instruction from the buffer stack and receiving the first control signal sent by the comparison circuit, generate the first target instruction based on the current instruction and send the first target instruction to the operation unit.
  • in this way, the components of the instruction processing circuit together carry out the instruction processing, which reduces write accesses to the external memory, saves power, and reduces problems such as storage resource conflicts.
  • the instruction modification circuit, when generating the first target instruction based on the current instruction, is configured to: modify an output control bit in the current instruction to a preset value; or add a preset bit at a preset position of the current instruction and set the value of the preset bit to the preset value; wherein the preset value is used to indicate that the result data corresponding to the current instruction does not need to be written into the external memory.
  • the comparison circuit, when outputting the first control signal in response to the first write address being the same as the second write address, is configured to: determine a target buffer stack from the buffer stack based on the positions, in the buffer stack, of the registers corresponding to the current instruction and the first instruction respectively; read the read address corresponding to the second instruction from the target buffer stack; compare the read address of the second instruction with the first write address; and, in response to the read address of the second instruction being different from the first write address, output the first control signal to the instruction modification circuit.
  • in this way, the comparison circuit can more accurately identify instructions whose result data does not need to be written into the external memory.
  • the comparison circuit is further configured to output a second control signal to the instruction modification circuit in response to the first write address being different from the second write address; the instruction modification circuit is further configured to send the current instruction to the operation unit in response to reading the current instruction from the buffer stack and receiving the second control signal sent by the comparison circuit.
  • an embodiment of the present disclosure provides an instruction processing method applied to an instruction processing device, the instruction processing device including a controller, an instruction processing circuit, and an operation unit connected in sequence. The instruction processing method includes: the controller sends at least one instruction to be processed to the instruction processing circuit; the instruction processing circuit determines, according to the address information of a current instruction, whether to write the result data corresponding to the current instruction into the external memory, wherein the current instruction is the instruction that is first in issue order among the at least one instruction to be processed; the instruction processing circuit, in response to determining that the result data does not need to be written into the external memory, generates, based on the current instruction, a first target instruction indicating that the result data does not need to be written into the external memory, and sends the first target instruction to the operation unit; and the operation unit, in response to receiving the first target instruction, executes the first target instruction to obtain the result data.
  • the instruction processing method further includes: the instruction processing circuit, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, sends the current instruction to the operation unit; the operation unit, in response to receiving the current instruction, executes the current instruction, obtains the result data, and writes the result data into the external memory.
  • the instruction processing circuit, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, generates, based on the current instruction, a second target instruction indicating that the result data needs to be written into the external memory.
  • the at least one instruction to be processed includes a plurality of instructions to be processed; the address information of the current instruction includes a first write address; and the instruction processing circuit determining, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory includes: comparing the first write address of the current instruction with the second write address of a first instruction; and, in response to the first write address being the same as the second write address, determining that the result data corresponding to the current instruction does not need to be written into the external memory; wherein the first instruction is one of the plurality of pending instructions other than the current instruction.
  • the instruction processing method further includes: the instruction processing circuit determines the first instruction from the plurality of pending instructions based on the type of the current instruction.
  • the instruction processing circuit determining, in response to the first write address being the same as the second write address, that the result data corresponding to the current instruction does not need to be written into the external memory includes: in response to the first write address being the same as the second write address, determining a second instruction from the plurality of pending instructions, wherein the issue order of the second instruction among the plurality of pending instructions lies between that of the current instruction and that of the first instruction; comparing the read address of the second instruction with the first write address of the current instruction; and, in response to the read address of the second instruction being different from the first write address, determining that the result data corresponding to the current instruction does not need to be written into the external memory.
  • the instruction processing circuit includes a buffer stack, a comparison circuit, and an instruction modification circuit, wherein the write end of the buffer stack is connected to the controller and the buffer stack is used to buffer the plurality of pending instructions, and the output end of the buffer stack is connected to the comparison circuit and to the instruction modification circuit. The instruction processing method further includes: the comparison circuit reads the first write address of the current instruction and the second write address of the first instruction from the buffer stack, compares the first write address with the second write address, and outputs a first control signal in response to the first write address being the same as the second write address; the instruction modification circuit, in response to reading the current instruction from the buffer stack and receiving the first control signal sent by the comparison circuit, generates the first target instruction based on the current instruction and sends the first target instruction to the operation unit.
  • the instruction modification circuit generating the first target instruction based on the current instruction includes: modifying an output control bit in the current instruction to a preset value; or adding a preset bit at a preset position of the current instruction and setting the value of the preset bit to the preset value; wherein the preset value is used to indicate that the result data corresponding to the current instruction does not need to be written into the external memory.
  • the comparison circuit outputting the first control signal in response to the first write address being the same as the second write address includes: determining a target buffer stack from the buffer stack based on the positions, in the buffer stack, of the registers corresponding to the current instruction and the first instruction respectively; reading the read address corresponding to the second instruction from the target buffer stack; comparing the read address of the second instruction with the first write address; and outputting the first control signal to the instruction modification circuit in response to the read address of the second instruction being different from the first write address.
  • the instruction processing method further includes: the comparison circuit outputs a second control signal to the instruction modification circuit in response to the first write address being different from the second write address; the instruction modification circuit sends the current instruction to the operation unit in response to reading the current instruction from the buffer stack and receiving the second control signal sent by the comparison circuit.
  • an embodiment of the present disclosure further provides a data processing chip, including the instruction processing device according to any one of the first aspect.
  • an embodiment of the present disclosure further provides a computer device, including the instruction processing apparatus according to any one of the first aspect.
  • the embodiment of the present disclosure further provides a computer device, including the data processing chip as described in the third aspect.
  • the embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the instruction processing method according to any one of the above-mentioned second aspect are executed.
  • FIG. 1 shows a schematic diagram of an instruction processing device provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of an instruction processing circuit in an instruction processing device provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of an instruction processing circuit in an instruction processing device provided by an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of an instruction processing method provided by an embodiment of the present disclosure
  • CNN Convolutional Neural Networks
  • Convolutional neural networks are efficient recognition algorithms that have been widely used in pattern recognition, image processing, and other fields in recent years. They feature a simple structure, few training parameters, and strong adaptability to translation, rotation, and scaling. Although convolutional neural networks deliver impressive results across a range of computer vision and machine learning tasks, they are computationally demanding, which limits their deployability. To resolve the tension between computation amount and speed, an Application Specific Integrated Circuit (ASIC) can be used to accelerate neural network computation.
  • ASIC Application Specific Integrated Circuit
  • Convolution in a convolutional neural network means defining a convolution kernel, sliding it over the feature map, multiplying the values at corresponding positions, and accumulating the products.
  • the final result of the accumulation is the captured local spatial feature.
  • the intermediate accumulated values are used only for subsequent calculation and should not be output; outputting them incurs additional power consumption and storage resource conflicts.
  • the convolution expansion instruction includes the operation type and the operands, but has no control bit indicating whether an accumulation result should be output. When the instruction set does not control result output and the circuit does not handle it either, the calculated intermediate values are continually output and overwritten, which introduces additional power consumption and aggravates storage resource conflicts.
  • the present disclosure provides an instruction processing device in which the instruction processing circuit processes at least one instruction to be processed sent by the controller and determines whether to write the result data corresponding to the current instruction into the external memory. When that result data does not need to be written into the external memory, the circuit generates a corresponding first target instruction; the operation unit executes the first target instruction to obtain the result data, and the result data is not written into the external memory. This reduces write accesses to the external memory, lowers power consumption during data processing, and reduces problems such as storage resource conflicts.
  • FIG. 1 is a schematic diagram of an instruction processing device provided by an embodiment of the present disclosure.
  • the instruction processing device includes: a controller 110, an instruction processing circuit 120, and an operation unit 130 connected in sequence.
  • the controller 110 is configured to send at least one instruction to be processed to the instruction processing circuit 120 .
  • the instruction processing circuit 120 is configured to: determine, according to the address information of a current instruction, whether to write the result data corresponding to the current instruction into the external memory, wherein the current instruction is the instruction that is first in issue order among the at least one instruction to be processed; and, in response to determining that the result data does not need to be written into the external memory, generate, based on the current instruction, a first target instruction indicating that the result data does not need to be written into the external memory, and send the first target instruction to the operation unit.
  • the operation unit 130 is configured to execute the first target instruction to obtain the result data in response to receiving the first target instruction.
  • the instruction processing circuit 120 is further configured to send the current instruction to the operation unit in response to determining that result data corresponding to the current instruction needs to be written into the external memory.
  • the operation unit 130 is further configured to execute the current instruction in response to receiving the current instruction, obtain the result data, and write the result data into the external memory.
  • The controller 110, the instruction processing circuit 120, and the operation unit 130 are each described in detail below.
  • the controller 110 is also configured to receive commands issued by a command issuing device.
  • the command issuing device may include, for example, any one of the following: a host, a virtual machine, a container, an application program, or different functions within an application program.
  • the controller 110 in the instruction processing device can parse a command and generate one or more instructions to be processed. Executing the one or more pending instructions carries out the command issued by the command issuing device.
  • the controller 110 may be a central processing unit (CPU) in a terminal device; the instruction processing circuit 120 may be a plurality of logic circuits in the terminal device; and the operation unit 130 may be a processing device in the terminal device that performs calculations, and so on, which is not limited here.
  • the controller 110 receives the issued command and converts it into instructions that can be processed by the instruction processing circuit 120; the operation unit 130 performs the final calculation to obtain the corresponding calculation result.
  • signals between the command issuing device, the controller 110, the instruction processing circuit 120, and the operation unit 130 may be transmitted as electrical signals, optical signals, and so on, which is not limited here.
  • Each instruction to be processed may include at least one of the following information: operation code and operand.
  • the operand can be represented by its read address. Using the read address, the operand can be read from the storage location corresponding to the read address, and the operation corresponding to the operation code can be performed on the operand to obtain the result data corresponding to the pending instruction.
  • each instruction to be processed may also include a write address, and the write address is used to write the result of the instruction to be processed into a storage location corresponding to the write address.
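  • The instruction fields described above (operation code, operand read addresses, write address) can be sketched as a small record plus an interpreter. This is only an illustrative model: the `Instruction` class, the `ADD` and `MUL` opcodes, and the dictionary used as memory are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Instruction:
    opcode: str                  # operation code
    read_addrs: Tuple[int, int]  # read addresses of the two operands
    write_addr: int              # write address for the result data


# Hypothetical opcodes, chosen only for illustration.
OPS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
}


def execute(instr: Instruction, memory: Dict[int, int]) -> int:
    """Read the operands, apply the opcode, and store the result at the write address."""
    a, b = (memory[addr] for addr in instr.read_addrs)
    result = OPS[instr.opcode](a, b)
    memory[instr.write_addr] = result
    return result
```

  • For instance, with `memory = {0: 2, 1: 3}`, executing `Instruction("MUL", (0, 1), 2)` reads both operands through their read addresses and stores the product at address 2.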
  • the result data of some instructions is only intermediate result data.
  • when a command corresponding to a convolution operator is sent to the data processing device, it can be parsed into multiple pending instructions. While the convolution is performed, a large amount of intermediate result data is produced; at different processing stages this data is stored in the same storage location of the external memory, and it is eventually overwritten by the convolution result data of the convolution operator.
  • if the result data of a pending instruction is only intermediate result data that will be overwritten by the result data of a subsequent pending instruction and will not be used as an operand of any other pending instruction, it does not need to be stored in the external memory.
  • a processing method for the result data of each instruction to be processed can be determined.
  • MACC is the multiply-accumulate operation instruction
  • dst is the write address
  • rs1 and rs2 are the read addresses.
  • MACC also denotes a measure of the amount of computation in a CNN
  • the total MACC count of a network equals the sum of the MACC counts of its layers.
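  • As a worked illustration of MACC as a measure of computation, the sketch below counts multiply-accumulate operations for a standard convolution layer and sums them over layers. The per-layer formula (kernel height × kernel width × input channels × output channels × output height × output width, with no grouping and no bias) is the usual convention and is not stated in the text; the function names are hypothetical.

```python
from typing import Iterable


def conv_maccs(k_h: int, k_w: int, c_in: int, c_out: int, h_out: int, w_out: int) -> int:
    """MACC count of one standard convolution layer (no groups, bias ignored)."""
    return k_h * k_w * c_in * c_out * h_out * w_out


def network_maccs(layer_maccs: Iterable[int]) -> int:
    """Total MACC count of a network: the sum over its layers."""
    return sum(layer_maccs)
```

  • Each of these multiply-accumulate steps produces an intermediate value, which is why suppressing unnecessary writes of intermediate results matters for power consumption.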
  • the embodiments of the present disclosure determine whether to write the result data corresponding to the current instruction into the external memory according to the address information of the current instruction.
  • the address information of the current instruction includes a first write address.
  • the instruction processing circuit 120, when determining whether to write the result data corresponding to the current instruction into the external memory according to the address information of the current instruction, is configured to: compare the first write address of the current instruction with the second write address of a first instruction; and, in response to the first write address being the same as the second write address, determine that the result data corresponding to the current instruction does not need to be written into the external memory; wherein the first instruction is one of the plurality of pending instructions other than the current instruction.
  • the instruction processing circuit 120 is further configured to: determine the first instruction from the plurality of pending instructions based on the type of the current instruction.
  • any one of the following manners (1) or (2) may be adopted to determine the first instruction from the multiple pending instructions.
  • the pending instruction a1 is the current instruction.
  • the pending instruction a2 is taken as the current first instruction, and the first write address of the current instruction a1 is compared with the second write address of a2. If the two addresses differ, the pending instruction a3 is taken as the current first instruction and the first write address of a1 is compared with the second write address of a3; if they differ again, a4 is taken as the current first instruction, and so on, up to comparing the first write address of the current instruction a1 with the second write address of the current first instruction an.
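  • The sequential comparison of a1 against a2, a3, ..., an can be sketched as a linear scan over write addresses. This is a minimal illustration that assumes the pending instructions are available as a list of write addresses in issue order; the function name is hypothetical.

```python
from typing import Optional, Sequence


def find_first_instruction(write_addrs: Sequence[int]) -> Optional[int]:
    """Return the index of the earliest later instruction whose write address
    equals that of the current instruction (index 0), or None if there is none.
    """
    current = write_addrs[0]
    for i in range(1, len(write_addrs)):
        if write_addrs[i] == current:
            return i  # stop at the first match; later instructions are not compared
    return None
```

  • Here index 0 stands for a1 and index i for a(i+1), so a return value of 4 means that a5 writes to the same address as a1.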
  • the result data may be overwritten by the result data of another pending instruction, yet before it is overwritten it may be read as an operand in order to execute a further pending instruction.
  • for example, although the result data of the current instruction a1 will be overwritten by the result data of the pending instruction a5, the result data of a1 must be read as an operand when the pending instruction a4 is executed. Therefore, even if the first write address of a1 is the same as the second write address of a5, because a4 is executed before a5, the result data of a1 still needs to be written into the storage space corresponding to the first write address.
  • the instruction processing circuit 120, when determining that the result data corresponding to the current instruction does not need to be written into the external memory, may also be specifically configured to: in response to the first write address being the same as the second write address, determine a second instruction from the plurality of pending instructions based on the issue order of the first instruction and the current instruction; compare the read address of the second instruction with the first write address of the current instruction; and, in response to the read address of the second instruction being different from the first write address, determine that the result data corresponding to the current instruction does not need to be written into the external memory.
  • any one of the following (3) or (4) methods may be adopted.
  • each instruction to be processed is taken as the first instruction in turn, and the first write address of the current instruction and the current The second write address of the first instruction is compared. If the second write address of the current first instruction is the same as the first write address of the current instruction, the subsequent comparison will be stopped. Therefore, only the second The current first instruction whose write address is the same as the first write address is determined as a reference instruction, each pending instruction whose emission position in the emission sequence is between the current instruction and the reference instruction is determined as a second instruction, and the second The read address of the instruction is compared with the first write address of the current instruction.
  • the first write address of the current instruction a1 is the same as the second write address of the i-th instruction to be processed as the first instruction ai, then the pending instruction a2 to the pending instruction a( i-1) are all determined as the second instruction, and the read addresses of the pending instruction a2 to a(i-1) are compared with the first write address of the current instruction a1 respectively.
  • the result data corresponding to the current instruction needs to be written into the external memory. If the read addresses of all second instructions are different from the first write address of the current instruction, it is determined that the result data corresponding to the current instruction does not need to be written into the external memory.
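The decision described above can be sketched in Python as follows. This is a minimal software illustration, not the circuit itself: the `Instr` record, the function name `can_skip_external_write`, and the example addresses are assumptions introduced here. The sketch follows method (3): it scans in issue order for the nearest later instruction with the same write address (the reference instruction) and then checks only the instructions strictly between the current instruction and that reference.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical pending instruction: one write address, several read addresses."""
    name: str
    write_addr: int
    read_addrs: Tuple[int, ...] = ()


def can_skip_external_write(pending: Sequence[Instr]) -> bool:
    """pending[0] is the current instruction; the rest follow in issue order."""
    current = pending[0]
    for i in range(1, len(pending)):
        if pending[i].write_addr == current.write_addr:
            # pending[i] is the reference instruction; pending[1:i] are the
            # second instructions issued between it and the current one.
            second = pending[1:i]
            return all(current.write_addr not in s.read_addrs for s in second)
    # No later instruction overwrites the result, so it must be written out.
    return False


a1 = Instr("a1", write_addr=0x10, read_addrs=(0x20, 0x24))
a2 = Instr("a2", write_addr=0x30, read_addrs=(0x40,))
a3_other = Instr("a3", write_addr=0x34, read_addrs=(0x44,))
a3_reader = Instr("a3", write_addr=0x34, read_addrs=(0x10,))
a4 = Instr("a4", write_addr=0x10, read_addrs=(0x20, 0x24))

print(can_skip_external_write([a1, a2, a3_other, a4]))   # no intervening read of 0x10
print(can_skip_external_write([a1, a2, a3_reader, a4]))  # a3 reads 0x10 before a4 overwrites it
```

In the first call the write of a1 can be suppressed; in the second it cannot, because a3 consumes the value at 0x10 before a4 replaces it.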
  • the embodiment of the present disclosure also provides a specific example of the instruction processing circuit 120 .
  • the instruction processing circuit 120 includes a buffer stack 121, a comparison circuit 122, and an instruction modification circuit 123.
  • the write end of the buffer stack 121 is connected to the controller 110 and is used to buffer the plurality of pending instructions; the output end of the buffer stack 121 is connected to the comparison circuit 122 and to the instruction modification circuit 123, respectively; the comparison circuit 122 is configured to read the first write address of the current instruction and the second write address of the first instruction from the buffer stack 121, compare the first write address with the second write address, and, in response to the first write address being the same as the second write address, output a first control signal to the instruction modification circuit 123; the instruction modification circuit 123 is configured to, in response to reading the current instruction from the buffer stack 121 and receiving the first control signal sent by the comparison circuit 122, generate the first target instruction based on the current instruction and send the first target instruction to the arithmetic unit 130.
  • the buffer stack 121 may receive a plurality of pending instructions issued by the controller 110, and store the plurality of pending instructions in the buffer stack 121, to be called by the comparison circuit 122 and the instruction modification circuit 123 .
  • the comparison circuit 122 is the main processing circuit of the instruction processing circuit 120. It compares the address information of the current instruction and the first instruction, and outputs the first control signal to the instruction modification circuit 123 when the first write address is the same as the second write address, or outputs the second control signal to the instruction modification circuit 123 when the first write address and the second write address are different.
  • when outputting the first control signal to the instruction modification circuit 123 in response to the first write address being the same as the second write address, the comparison circuit 122 is configured to: determine a target buffer stack from the buffer stack 121 based on the positions, in the buffer stack 121, of the registers corresponding respectively to the current instruction and the first instruction; read the read address corresponding to the second instruction from the target buffer stack; compare the read address of the second instruction with the first write address; and, in response to the read address of the second instruction differing from the first write address, output the first control signal to the instruction modification circuit 123.
  • the processing mechanism of the comparison circuit 122 is similar to that of the instruction processing circuit 120 described above, and is not repeated here.
  • the instruction modification circuit 123 can determine the modified content of the current instruction based on the comparison result obtained by the comparison circuit 122, and generate a corresponding target instruction.
  • when generating the target instruction based on the current instruction, the instruction modification circuit 123 is configured to: modify the output control bit in the current instruction to a preset value; or add a preset bit at a preset position of the current instruction and set the value of the preset bit to the preset value; wherein the preset value indicates that the result data corresponding to the current instruction need not be written into the external memory.
  • the current instruction is stored in the buffer stack 121; when the current instruction is output from the buffer stack 121, its write address is compared with the write addresses of the other instructions of the same type stored in the buffer stack 121. If they are the same, the preset bit at the preset position in the current instruction is set to 0, indicating that the result data corresponding to the current instruction does not need to be written into the external memory. Alternatively, if the current instruction has no preset bit at the preset position, a 1-bit signal may be added to the current instruction to indicate that the result data corresponding to the current instruction is not written to the external memory.
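The two ways of marking an instruction can be illustrated with a short Python sketch. The instruction encoding is not given in the text, so the 32-bit word, the bit position `OUTPUT_CTRL_BIT`, and both function names are hypothetical. Only the preset value 0, meaning that the result is not written out, comes from the description.

```python
from typing import Tuple

OUTPUT_CTRL_BIT = 31  # hypothetical position of the output control bit
PRESET_VALUE = 0      # 0 means "do not write the result to external memory"


def clear_output_control(word: int) -> int:
    """Option 1: the encoding has an output control bit; force it to the preset value."""
    return word & ~(1 << OUTPUT_CTRL_BIT)


def attach_sideband_flag(word: int) -> Tuple[int, int]:
    """Option 2: the encoding has no such bit; carry a separate 1-bit signal with it."""
    return (word, PRESET_VALUE)


encoded = 0x8000_00FF  # made-up instruction word with the control bit set
print(hex(clear_output_control(encoded)))
print(attach_sideband_flag(0x0000_0012))
```

Option 1 leaves every other field of the word untouched, which is why a single masked AND is enough.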
  • the comparison circuit 122 is further configured to output a second control signal to the instruction modification circuit 123 in response to the first write address differing from the second write address; the instruction modification circuit 123 is further configured to send the current instruction to the arithmetic unit 130 in response to reading the current instruction from the buffer stack 121 and receiving the second control signal sent by the comparison circuit 122. That is, when it is determined that the external memory is to be accessed, the corresponding access operation is performed.
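How the buffer stack, the comparison circuit, and the instruction modification circuit cooperate can be modelled in software. The sketch below is an assumption-laden illustration: the names `Ctrl`, `Instr`, `comparison_circuit`, `modification_circuit`, and `drain` are invented here, the `write_out` field stands in for the output control bit, and only the basic write-address comparison among instructions of the same type is modelled (the read-address check is left out).

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List, Tuple


class Ctrl(Enum):
    FIRST = 1   # suppress the external write
    SECOND = 2  # forward the current instruction unchanged


@dataclass(frozen=True)
class Instr:
    opcode: str
    write_addr: int
    read_addrs: Tuple[int, ...]
    write_out: bool = True  # stand-in for the output control bit


def comparison_circuit(buffer_stack: List[Instr]) -> Ctrl:
    """Compare the head of the buffer with the other buffered instructions."""
    current, others = buffer_stack[0], buffer_stack[1:]
    same = any(
        o.opcode == current.opcode and o.write_addr == current.write_addr
        for o in others
    )
    return Ctrl.FIRST if same else Ctrl.SECOND


def modification_circuit(current: Instr, ctrl: Ctrl) -> Instr:
    """Produce the first target instruction, or pass the instruction through."""
    if ctrl is Ctrl.FIRST:
        return replace(current, write_out=False)
    return current


def drain(buffer_stack: List[Instr]) -> List[Instr]:
    """Issue the buffered instructions one by one, in issue order."""
    queue = list(buffer_stack)
    issued = []
    while queue:
        ctrl = comparison_circuit(queue)
        issued.append(modification_circuit(queue.pop(0), ctrl))
    return issued


macc = Instr("MACC", write_addr=0x10, read_addrs=(0x20, 0x24))
print([i.write_out for i in drain([macc, macc, macc])])
```

With three instructions that share a write address, only the last one keeps its write enabled.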
  • referring to FIG. 3, which is another schematic diagram of the instruction processing circuit 120 in the instruction processing apparatus provided by the embodiment of the present disclosure.
  • among the pending instructions, if any pending instruction other than the current instruction satisfies both of the following conditions, it is determined that the result data corresponding to the current instruction does not need to be written into the external memory. First, the write address of that pending instruction is the same as the write address of the current instruction, for example the current instruction 0 and the pending instruction 5. Second, no other pending instruction between that pending instruction and the current instruction has a read address that is the same as the write address of the current instruction, for example the read addresses of pending instruction 1 to pending instruction 4 all differ from the write address of the current instruction 0.
  • the preset bit at the preset position in the current instruction can be set to 0. If the current instruction does not include the preset position, a 1-bit signal is added to indicate that the result data corresponding to the current instruction is not written to the external memory.
  • the arithmetic unit 130 includes, for example, at least one two-dimensional processing engine (Processing Engine, PE) array and a register array (local register file), and each PE includes computing elements such as multipliers and adders for performing specific computing tasks.
  • the arithmetic unit 130 executes the first target instruction to obtain the result data.
  • when it is determined that the result data corresponding to the current instruction needs to be written into the external memory, the current instruction is sent to the arithmetic unit 130; in response to receiving the current instruction, the arithmetic unit 130 executes the current instruction, obtains the result data, and writes the result data into the external memory.
  • the arithmetic unit 130 can perform read/write access to the connected external memory; correspondingly, the external memory can store the data transferred during read/write access by the connected arithmetic unit 130.
  • the arithmetic unit 130 includes at least one PE array; each PE may be connected to a different external memory, or multiple PEs may be connected to the same external memory.
  • the first target instruction includes a calculation instruction for obtaining the result data computed according to the current instruction, and an indication of whether to write to the external memory.
  • when it is determined that the result data corresponding to the current instruction does not need to be written into the external memory, the arithmetic unit 130 receives the generated first target instruction.
  • in this case, the first target instruction includes an instruction to calculate the result data and an instruction not to write to the external memory; the arithmetic unit 130 therefore obtains the result data without writing it into the external memory, and instead keeps the result data in the arithmetic unit 130 pending subsequent processing.
  • when it is determined that the result data corresponding to the current instruction needs to be written into the external memory, the arithmetic unit 130 receives the generated second target instruction. In this case, the second target instruction includes an instruction to calculate the result data and an instruction to write to the external memory; the arithmetic unit 130 therefore writes the calculated result data into the external memory while calculating it.
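The behaviour of the arithmetic unit under the two kinds of target instruction can be sketched as a small Python model. Everything named here is an assumption made for illustration: the class `ArithmeticUnit`, its `local_regs` dictionary (standing in for the local register file), the `write_out` flag, and the multiply-accumulate operation chosen as the example computation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ArithmeticUnit:
    """Toy model: results always stay local; external writes are optional."""
    external_memory: Dict[int, int] = field(default_factory=dict)
    local_regs: Dict[int, int] = field(default_factory=dict)
    external_writes: int = 0

    def execute(self, a: int, b: int, write_addr: int, write_out: bool) -> int:
        # Multiply-accumulate onto the partial result held locally, if any.
        result = self.local_regs.get(write_addr, 0) + a * b
        self.local_regs[write_addr] = result
        if write_out:
            # Second target instruction (or unmodified instruction): write out.
            self.external_memory[write_addr] = result
            self.external_writes += 1
        return result


unit = ArithmeticUnit()
unit.execute(1, 2, write_addr=0x10, write_out=False)  # first target instruction
unit.execute(3, 4, write_addr=0x10, write_out=False)  # first target instruction
unit.execute(5, 6, write_addr=0x10, write_out=True)   # final result is written
print(unit.external_memory, unit.external_writes)
```

Only the final accumulated value reaches the external memory, and it does so with a single write.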
  • the controller 110 sends a plurality of pending instructions to the instruction processing circuit 120. The instruction processing circuit 120 processes the plurality of pending instructions sent by the controller 110 to determine whether to write the result data corresponding to the current instruction among them into the external memory. When it determines that the result data corresponding to the current instruction does not need to be written into the external memory, it generates a corresponding first target instruction, and the arithmetic unit 130 executes the first target instruction to obtain the corresponding result data. The result data is not written into the external memory, which reduces write accesses to the external memory, reduces power consumption during data processing, and reduces problems such as storage resource conflicts.
  • based on the same inventive concept, the embodiment of the present disclosure also provides an instruction processing method corresponding to the instruction processing apparatus. Since the principle by which the method solves the problem is similar to that of the instruction processing apparatus described above, the implementation of the method may refer to the implementation of the apparatus, and repeated descriptions are omitted.
  • the execution subject of the instruction processing method provided by the embodiments of the present disclosure is generally a computer device with certain computing capabilities, such as a terminal device, a server, or another processing device, and the instruction processing method may be implemented by a processor calling computer-readable instructions.
  • FIG. 4 is a flowchart of an instruction processing method provided by an embodiment of the present disclosure.
  • the instruction processing method is applied to an instruction processing device, and the instruction processing device includes a sequentially connected controller, an instruction processing circuit, and a computing unit.
  • the instruction processing method includes the following steps S401-S403.
  • S401 The controller sends at least one instruction to be processed to the instruction processing circuit.
  • the instruction processing circuit determines, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory; in response to determining that the result data does not need to be written into the external memory, it generates, based on the current instruction, a first target instruction indicating that the result data need not be written into the external memory, and sends the first target instruction to the arithmetic unit, wherein the current instruction is the instruction that is first in issue order among the at least one pending instruction.
  • the instruction processing method further includes: the instruction processing circuit, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, sends the current instruction to the arithmetic unit; the arithmetic unit, in response to receiving the current instruction, executes the current instruction, obtains the result data, and writes the result data into the external memory.
  • the instruction processing circuit, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, generates, based on the current instruction, a second target instruction indicating that the result data needs to be written into the external memory, and sends the second target instruction to the arithmetic unit; the arithmetic unit, in response to receiving the second target instruction, executes the second target instruction to obtain the result data and writes the result data into the external memory.
  • the at least one pending instruction includes a plurality of pending instructions; the address information of the current instruction includes a first write address; and the instruction processing circuit determining, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory includes: comparing the first write address of the current instruction with the second write address of a first instruction; and, in response to the first write address and the second write address being the same, determining that the result data corresponding to the current instruction need not be written into the external memory; wherein the first instruction includes the instructions among the plurality of pending instructions other than the current instruction.
  • the instruction processing method further includes: the instruction processing circuit determines the first instruction from the plurality of pending instructions based on the type of the current instruction.
  • the instruction processing circuit determining, in response to the first write address being the same as the second write address, that the result data corresponding to the current instruction does not need to be written into the external memory includes: in response to the first write address being the same as the second write address, determining a second instruction from the plurality of pending instructions based on the issue order of the first instruction and the current instruction; comparing the read address of the second instruction with the first write address of the current instruction; and, in response to the read address of the second instruction differing from the first write address, determining that the result data corresponding to the current instruction need not be written into the external memory.
  • the instruction processing circuit includes a buffer stack, a comparison circuit, and an instruction modification circuit; wherein the write end of the buffer stack is connected to the controller and is used to buffer the plurality of pending instructions; the output end of the buffer stack is connected to the comparison circuit and to the instruction modification circuit, respectively; the instruction processing method further includes: the comparison circuit reads the first write address of the current instruction and the second write address of the first instruction from the buffer stack, compares the first write address with the second write address, and, in response to the first write address and the second write address being the same, outputs a first control signal to the instruction modification circuit; the instruction modification circuit, in response to reading the current instruction from the buffer stack and receiving the first control signal sent by the comparison circuit, generates the first target instruction based on the current instruction and sends the first target instruction to the arithmetic unit.
  • the instruction modification circuit generating the first target instruction based on the current instruction includes: modifying an output control bit in the current instruction to a preset value; or adding a preset bit at a preset position of the current instruction and setting the value of the preset bit to the preset value; wherein the preset value indicates that the result data corresponding to the current instruction need not be written into the external memory.
  • the comparison circuit outputting a first control signal to the instruction modification circuit in response to the first write address being the same as the second write address includes: determining a target buffer stack from the buffer stack based on the positions, in the buffer stack, of the registers corresponding respectively to the current instruction and the first instruction; reading the read address corresponding to the second instruction from the target buffer stack; comparing the read address of the second instruction with the first write address; and, in response to the read address of the second instruction differing from the first write address, outputting the first control signal to the instruction modification circuit.
  • the instruction processing method further includes: the comparison circuit outputs a second control signal to the instruction modification circuit in response to the first write address differing from the second write address; the instruction modification circuit sends the current instruction to the arithmetic unit in response to reading the current instruction from the buffer stack and receiving the second control signal sent by the comparison circuit.
  • the instruction processing circuit processes at least one pending instruction sent by the controller and determines whether to write the result data corresponding to the current instruction among the at least one pending instruction into the external memory; when it determines that the result data corresponding to the current instruction does not need to be written into the external memory, it generates the corresponding first target instruction, and the arithmetic unit executes the first target instruction to obtain the corresponding result data. In this way, write accesses to the external memory can be reduced, which saves power and reduces problems such as storage resource conflicts.
  • the order in which the steps are written does not imply a strict execution order, nor does it constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
  • An embodiment of the present disclosure further provides a data processing chip, including the instruction processing device according to any one of the embodiments of the present disclosure.
  • the data processing chip provided by the embodiments of the present disclosure may include a graphics processor, an AI chip, and the like.
  • the embodiment of the present disclosure also provides a computer device, including an instruction memory and the instruction processing device provided in the embodiment of the present disclosure.
  • the embodiment of the present disclosure also provides a computer device, including the data processing chip provided by the embodiment of the present disclosure.
  • the computer device provided by the embodiment of the present disclosure may include a smart terminal such as a mobile phone, or may also be other devices, servers, etc. that can be used for instruction processing, which is not limited here.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which a computer program is stored.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • the embodiment of the present disclosure also provides a computer program product, the computer program product carries a program code, and the instructions included in the program code can be used to execute the steps of the instruction processing method described in the above method embodiment, for details, please refer to the above method The embodiment will not be repeated here.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • in one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are realized in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • the technical solution of the present disclosure, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Advance Control (AREA)

Abstract

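The present disclosure provides an instruction processing apparatus and method, a computer device, and a storage medium. The instruction processing apparatus includes a controller, an instruction processing circuit, and an arithmetic unit connected in sequence. The controller is configured to send at least one pending instruction to the instruction processing circuit. The instruction processing circuit is configured to: determine, according to address information of a current instruction, whether to write result data corresponding to the current instruction into an external memory, the current instruction being the instruction that is first in issue order among the at least one pending instruction; and, in response to determining that the result data need not be written into the external memory, generate, based on the current instruction, a first target instruction indicating that the result data need not be written into the external memory, and send the first target instruction to the arithmetic unit. The arithmetic unit is configured to, in response to receiving the first target instruction, execute the first target instruction to obtain the result data.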

Description

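Instruction processing apparatus, method, computer device, and storage medium
Cross-reference statement
This application claims priority to Chinese patent application No. 202111433398.4, filed with the China Patent Office on November 29, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of electronic circuits, and in particular to an instruction processing apparatus and method, a data processing chip, a computer device, and a computer-readable storage medium.
Background
The development of cloud computing, big data, and artificial intelligence (AI) technologies has brought a continuously growing demand for computing power. Current data processing approaches suffer from high power consumption.
Summary
Embodiments of the present disclosure provide at least an instruction processing apparatus and method, a data processing chip, a computer device, and a computer-readable storage medium.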
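In a first aspect, an embodiment of the present disclosure provides an instruction processing apparatus, including a controller, an instruction processing circuit, and an arithmetic unit connected in sequence. The controller is configured to send at least one pending instruction to the instruction processing circuit. The instruction processing circuit is configured to: determine, according to address information of a current instruction, whether to write result data corresponding to the current instruction into an external memory, wherein the current instruction is the instruction that is first in issue order among the at least one pending instruction; and, in response to determining that the result data need not be written into the external memory, generate, based on the current instruction, a first target instruction indicating that the result data need not be written into the external memory, and send the first target instruction to the arithmetic unit. The arithmetic unit is configured to, in response to receiving the first target instruction, execute the first target instruction to obtain the result data.
In this way, write accesses to the external memory can be reduced, which saves power and reduces problems such as storage resource conflicts.
In an optional implementation, the instruction processing circuit is further configured to send the current instruction to the arithmetic unit in response to determining that the result data corresponding to the current instruction needs to be written into the external memory; the arithmetic unit is further configured to, in response to receiving the current instruction, execute the current instruction to obtain the result data, and write the result data into the external memory.
In this way, data determined to require writing can be processed normally.
In an optional implementation, the instruction processing circuit is further configured to, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, generate, based on the current instruction, a second target instruction indicating that the result data needs to be written into the external memory, and send the second target instruction to the arithmetic unit; the arithmetic unit is further configured to, in response to receiving the second target instruction, execute the second target instruction to obtain the result data, and write the result data into the external memory.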
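In an optional implementation, the at least one pending instruction includes a plurality of pending instructions, and the address information of the current instruction includes a first write address. When determining, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory, the instruction processing circuit is configured to: compare the first write address of the current instruction with a second write address of a first instruction; and, in response to the first write address and the second write address being the same, determine that the result data corresponding to the current instruction need not be written into the external memory; wherein the first instruction includes the instructions among the plurality of pending instructions other than the current instruction.
In this way, whether to write the result data corresponding to the current instruction into the external memory can be determined by checking whether the write addresses are the same, which reduces the power consumed by result data that does not need to be written.
In an optional implementation, the instruction processing circuit is further configured to determine the first instruction from the plurality of pending instructions based on the type of the current instruction.
In this way, pending instructions of the same type can be processed together, which reduces erroneous processing.
In an optional implementation, when determining, in response to the first write address and the second write address being the same, that the result data corresponding to the current instruction need not be written into the external memory, the instruction processing circuit is configured to: in response to the first write address and the second write address being the same, determine a second instruction from the plurality of pending instructions, wherein the position of the second instruction in the issue order of the plurality of pending instructions lies between those of the first instruction and the current instruction; compare the read address of the second instruction with the first write address of the current instruction; and, in response to the read address of the second instruction differing from the first write address, determine that the result data corresponding to the current instruction need not be written into the external memory.
In this way, the instructions determined not to require writing are identified more accurately, which reduces the impact on subsequent processing.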
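In an optional implementation, the instruction processing circuit includes: a buffer stack, whose write end is connected to the controller and which is used to buffer the plurality of pending instructions; a comparison circuit, which is connected to the output end of the buffer stack and is configured to read the first write address of the current instruction and the second write address of the first instruction from the buffer stack, compare the first write address with the second write address, and, in response to the first write address and the second write address being the same, output a first control signal; and an instruction modification circuit, which is connected to the output end of the buffer stack and to the comparison circuit and is configured to, in response to reading the current instruction from the buffer stack and receiving the first control signal sent by the comparison circuit, generate the first target instruction based on the current instruction and send the first target instruction to the arithmetic unit.
In this way, the components of the instruction processing circuit together complete the processing of the instructions, which reduces write accesses to the external memory, saves power, and reduces problems such as storage resource conflicts.
In an optional implementation, when generating the first target instruction based on the current instruction, the instruction modification circuit is configured to: modify the output control bit in the current instruction to a preset value; or add a preset bit at a preset position of the current instruction and set the value of the preset bit to the preset value; wherein the preset value indicates that the result data corresponding to the current instruction need not be written into the external memory.
In this way, when the first target instruction is processed, the external memory is not accessed, which saves power and reduces problems such as storage resource conflicts.
In an optional implementation, when outputting the first control signal in response to the first write address and the second write address being the same, the comparison circuit is configured to: determine a target buffer stack from the buffer stack based on the positions, in the buffer stack, of the registers corresponding respectively to the current instruction and the first instruction; read the read address corresponding to a second instruction from the target buffer stack; compare the read address of the second instruction with the first write address; and, in response to the read address of the second instruction differing from the first write address, output the first control signal to the instruction modification circuit.
In this way, the comparison circuit can determine more precisely which instructions do not need to write to the external memory.
In an optional implementation, the comparison circuit is further configured to output a second control signal to the instruction modification circuit in response to the first write address and the second write address being different; the instruction modification circuit is further configured to send the current instruction to the arithmetic unit in response to reading the current instruction from the buffer stack and receiving the second control signal sent by the comparison circuit.
In this way, the result data of instructions that do need to write to the external memory can be written.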
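In a second aspect, an embodiment of the present disclosure provides an instruction processing method applied to an instruction processing apparatus, the instruction processing apparatus including a controller, an instruction processing circuit, and an arithmetic unit connected in sequence. The instruction processing method includes: the controller sends at least one pending instruction to the instruction processing circuit; the instruction processing circuit determines, according to address information of a current instruction, whether to write result data corresponding to the current instruction into an external memory, wherein the current instruction is the instruction that is first in issue order among the at least one pending instruction; the instruction processing circuit, in response to determining that the result data need not be written into the external memory, generates, based on the current instruction, a first target instruction indicating that the result data need not be written into the external memory, and sends the first target instruction to the arithmetic unit; and the arithmetic unit, in response to receiving the first target instruction, executes the first target instruction to obtain the result data.
In an optional implementation, the instruction processing method further includes: the instruction processing circuit, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, sends the current instruction to the arithmetic unit; the arithmetic unit, in response to receiving the current instruction, executes the current instruction to obtain the result data, and writes the result data into the external memory.
In an optional implementation, the instruction processing circuit, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, generates, based on the current instruction, a second target instruction indicating that the result data needs to be written into the external memory, and sends the second target instruction to the arithmetic unit; the arithmetic unit, in response to receiving the second target instruction, executes the second target instruction to obtain the result data, and writes the result data into the external memory.
In an optional implementation, the at least one pending instruction includes a plurality of pending instructions, and the address information of the current instruction includes a first write address. The instruction processing circuit determining, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory includes: comparing the first write address of the current instruction with a second write address of a first instruction; and, in response to the first write address and the second write address being the same, determining that the result data corresponding to the current instruction need not be written into the external memory; wherein the first instruction includes the instructions among the plurality of pending instructions other than the current instruction.
In an optional implementation, the instruction processing method further includes: the instruction processing circuit determines the first instruction from the plurality of pending instructions based on the type of the current instruction.
In an optional implementation, the instruction processing circuit determining, in response to the first write address and the second write address being the same, that the result data corresponding to the current instruction need not be written into the external memory includes: in response to the first write address and the second write address being the same, determining a second instruction from the plurality of pending instructions, wherein the position of the second instruction in the issue order of the plurality of pending instructions lies between those of the first instruction and the current instruction; comparing the read address of the second instruction with the first write address of the current instruction; and, in response to the read address of the second instruction differing from the first write address, determining that the result data corresponding to the current instruction need not be written into the external memory.
In an optional implementation, the instruction processing circuit includes a buffer stack, a comparison circuit, and an instruction modification circuit; wherein the write end of the buffer stack is connected to the controller and is used to buffer the plurality of pending instructions, and the output end of the buffer stack is connected to the comparison circuit and to the instruction modification circuit, respectively. The instruction processing method further includes: the comparison circuit reads the first write address of the current instruction and the second write address of the first instruction from the buffer stack, compares the first write address with the second write address, and, in response to the first write address and the second write address being the same, outputs a first control signal; the instruction modification circuit, in response to reading the current instruction from the buffer stack and receiving the first control signal sent by the comparison circuit, generates the first target instruction based on the current instruction and sends the first target instruction to the arithmetic unit.
In an optional implementation, the instruction modification circuit generating the first target instruction based on the current instruction includes: modifying the output control bit in the current instruction to a preset value; or adding a preset bit at a preset position of the current instruction and setting the value of the preset bit to the preset value; wherein the preset value indicates that the result data corresponding to the current instruction need not be written into the external memory.
In an optional implementation, the comparison circuit outputting the first control signal in response to the first write address and the second write address being the same includes: determining a target buffer stack from the buffer stack based on the positions, in the buffer stack, of the registers corresponding respectively to the current instruction and the first instruction; reading the read address corresponding to a second instruction from the target buffer stack; comparing the read address of the second instruction with the first write address; and, in response to the read address of the second instruction differing from the first write address, outputting the first control signal to the instruction modification circuit.
In an optional implementation, the instruction processing method further includes: the comparison circuit outputs a second control signal to the instruction modification circuit in response to the first write address and the second write address being different; the instruction modification circuit sends the current instruction to the arithmetic unit in response to reading the current instruction from the buffer stack and receiving the second control signal sent by the comparison circuit.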
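In a third aspect, an embodiment of the present disclosure further provides a data processing chip, including the instruction processing apparatus according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer device, including the instruction processing apparatus according to any implementation of the first aspect.
In a fifth aspect, an embodiment of the present disclosure further provides a computer device, including the data processing chip according to the third aspect.
In a sixth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when run by a processor, executes the steps of the instruction processing method according to any implementation of the second aspect.
To make the above objects, features, and advantages of the present disclosure more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.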
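Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly introduced below. These drawings show embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
FIG. 1 is a schematic diagram of an instruction processing apparatus provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the instruction processing circuit in the instruction processing apparatus provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the instruction processing circuit in the instruction processing apparatus provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of an instruction processing method provided by an embodiment of the present disclosure.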
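Detailed description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and shown in the drawings, can be arranged and designed in a variety of different configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the claimed scope of the present disclosure. All other embodiments obtained by those skilled in the art from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
The term "and/or" herein merely describes an association and indicates that three relationships may exist; for example, "A and/or B" can mean A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set consisting of A, B, and C.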
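Research shows that the convolutional neural network (CNN) is a class of feedforward neural network that involves convolution and has a deep structure, and is one of the representative algorithms of deep learning. CNNs are efficient recognition algorithms that have been widely applied in recent years to pattern recognition, image processing, and other fields; they are characterized by a simple structure, few training parameters, strong adaptability, and invariance to translation, rotation, and scaling. Although CNNs deliver impressive results across a range of computer vision and machine learning tasks, they are computationally demanding, which limits their deployability. To resolve the tension between computational load and speed, an application-specific integrated circuit (ASIC) can be used to accelerate neural network computation.
Taking the convolutional neural network as an example, convolution means defining a convolution kernel, sliding it over the feature map, multiplying the values at corresponding positions, and accumulating the products; the final accumulated result is the captured local spatial feature. The intermediate values produced during accumulation are used only in subsequent computation and do not need to be output; outputting them would incur additional power consumption and storage resource conflicts. For example, a convolution extension instruction contains an operation type and operands, but has no control bit specifying whether a given accumulation is output. When the instruction set does not control result output and the circuit does nothing about it, the computed intermediate values are repeatedly output and overwritten, which introduces additional power consumption and aggravates conflicts over already scarce storage resources.
Based on the above research, the present disclosure provides an instruction processing apparatus in which an instruction processing circuit processes at least one pending instruction sent by a controller and determines whether to write the result data corresponding to the current instruction into an external memory. When it determines that the result data corresponding to the current instruction need not be written into the external memory, it generates a corresponding first target instruction, and an arithmetic unit executes the first target instruction to obtain the corresponding result data. The result data is not written into the external memory, which reduces write accesses to the external memory, reduces power consumption during data processing, and reduces problems such as storage resource conflicts.
The defects of the above schemes were all identified by the inventors through practice and careful study; therefore, both the process of discovering the above problems and the solutions proposed below by the present disclosure should be regarded as contributions made by the inventors in the course of the present disclosure.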
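To facilitate understanding of this embodiment, an instruction processing apparatus provided by an embodiment of the present disclosure is first described in detail.
FIG. 1 is a schematic diagram of an instruction processing apparatus provided by an embodiment of the present disclosure. As shown in FIG. 1, the instruction processing apparatus includes a controller 110, an instruction processing circuit 120, and an arithmetic unit 130 connected in sequence.
The controller 110 is configured to send at least one pending instruction to the instruction processing circuit 120.
The instruction processing circuit 120 is configured to: determine, according to address information of a current instruction, whether to write result data corresponding to the current instruction into an external memory, wherein the current instruction is the instruction that is first in issue order among the at least one pending instruction; and, in response to determining that the result data need not be written into the external memory, generate, based on the current instruction, a first target instruction indicating that the result data need not be written into the external memory, and send the first target instruction to the arithmetic unit.
The arithmetic unit 130 is configured to, in response to receiving the first target instruction, execute the first target instruction to obtain the result data.
In an embodiment of the present disclosure, the instruction processing circuit 120 is further configured to send the current instruction to the arithmetic unit in response to determining that the result data corresponding to the current instruction needs to be written into the external memory.
The arithmetic unit 130 is further configured to, in response to receiving the current instruction, execute the current instruction to obtain the result data, and write the result data into the external memory.
The controller 110, the instruction processing circuit 120, and the arithmetic unit 130 are each described in detail below.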
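The controller 110 is further configured to receive commands issued by a command issuing device. The command issuing device may include, for example, any of the following: a host, a virtual machine, a container, an application, or different functions within an application. After the command issuing device issues a command to the instruction processing apparatus, the controller 110 in the instruction processing apparatus can parse the command and generate one or more pending instructions. Executing the one or more pending instructions carries out the command issued by the command issuing device.
The controller 110 may be a central processing unit (CPU) in a terminal device, the instruction processing circuit 120 may be a plurality of logic circuits in the terminal device, and the arithmetic unit 130 may be a device in the terminal device that performs arithmetic processing; none of these is limited here. After the command issuing device issues a command, the controller 110 receives it and converts it into instructions that the instruction processing circuit 120 can process; after the instruction processing circuit 120 has processed the instructions sent by the controller 110, the arithmetic unit 130 performs the final computation and obtains the corresponding result.
Here, signals may be transmitted among the command issuing device, the controller 110, the instruction processing circuit 120, and the arithmetic unit 130 by electrical signals, optical signals, or the like, which is not limited here.
There may be a plurality of pending instructions; they may originate from the same command or from several commands, and different pending instructions may be of the same type or of different types. Each pending instruction may include at least one of the following: an opcode and an operand. The operand may be expressed as a read address: using the read address, the operand is read from the corresponding storage location, and the operation corresponding to the opcode is performed on it to obtain the result data of the pending instruction. Each pending instruction may also include a write address, which is used to write the result of the pending instruction to the corresponding storage location.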
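The instruction layout described above (an opcode, operands given as read addresses, and a write address) can be illustrated with a small Python sketch. The class name `PendingInstruction`, the opcode table `OPS`, and the flat dictionary used as memory are assumptions made for illustration; the actual encoding is not specified in the text.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Tuple

OPS = {"ADD": operator.add, "MUL": operator.mul}  # hypothetical opcodes


@dataclass(frozen=True)
class PendingInstruction:
    opcode: str
    read_addrs: Tuple[int, int]  # where the two operands are fetched from
    write_addr: int              # where the result data is stored


def execute(instr: PendingInstruction, memory: Dict[int, int]) -> int:
    """Read the operands, apply the opcode, and store the result."""
    operands = [memory[addr] for addr in instr.read_addrs]
    result = OPS[instr.opcode](*operands)
    memory[instr.write_addr] = result
    return result


memory = {0x00: 3, 0x04: 4}
print(execute(PendingInstruction("MUL", (0x00, 0x04), 0x08), memory))
print(memory[0x08])
```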
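When two pending instructions have the same write address, the result data of the instruction executed first is stored at the corresponding storage location and is then overwritten by the result data of the instruction executed later.
In deep learning, the result data of some instructions is often only intermediate result data. For example, after the command corresponding to a convolution operator is issued to the data processing apparatus, it can be parsed into a plurality of pending instructions. Convolution produces a large amount of intermediate result data, which is stored at the same location in the external memory at different processing stages and is finally overwritten by the convolution result of the operator.
If the result data of a pending instruction is only intermediate result data that will be overwritten by the result data of later pending instructions and will not be used as an operand of any other pending instruction, it need not be stored in the external memory. Thus, by analyzing whether the result data of each pending instruction is intermediate result data, the way to handle the result data of each pending instruction can be determined.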
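For example, taking multiply-accumulate computation, suppose the following command exists:
for(i=0;i<n;i++)
MACC(dst,rs1,rs2).
Here, MACC is a multiply-accumulate (Multiply-accumulate operations) instruction, dst is the write address, and rs1 and rs2 are read addresses. MACC is also a measure of the computational cost of a CNN: the total MACC count of a network equals the sum of the MACC counts of its layers. Each layer of the neural network has a write address and a read address, and one or more computations are needed for each write address/read address, so a neural network with many layers involves a large amount of computation.
This command repeats the MACC operation many times; that is, the command can be parsed into pending instructions corresponding to the individual MACC operations, all with the same write address. The location written by one MACC operation is overwritten repeatedly by the following MACC operations, and the result of the last MACC operation is the final result of the command. During computation, therefore, when several pending MACC instructions share a write address, each later instruction overwrites the result data written by the earlier ones, so the earlier writes have no lasting effect and only incur additional power consumption and storage resource conflicts. If the way instructions write their result data can be changed so that, among pending instructions with the same write address, only the result data of the last MACC operation is kept, power consumption and problems such as storage resource conflicts can be effectively reduced.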
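The saving can be made concrete with a short Python sketch. It is only an illustration: the dictionary representation of an instruction and the names `expand_macc_loop` and `count_external_writes` are assumptions introduced here, and only the write-address rule is applied. For a loop of n MACC operations on one destination, the number of external writes drops from n to 1.

```python
from typing import Dict, List


def expand_macc_loop(n: int, dst: int, rs1: int, rs2: int) -> List[Dict]:
    """Parse the loop command into n pending MACC instructions."""
    return [{"op": "MACC", "dst": dst, "rs1": rs1, "rs2": rs2} for _ in range(n)]


def count_external_writes(pending: List[Dict], suppress_overwritten: bool) -> int:
    """Count writes, optionally skipping results that a later instruction overwrites."""
    writes = 0
    for i, cur in enumerate(pending):
        overwritten = any(p["dst"] == cur["dst"] for p in pending[i + 1:])
        if not (suppress_overwritten and overwritten):
            writes += 1
    return writes


pending = expand_macc_loop(n=8, dst=0x100, rs1=0x200, rs2=0x300)
print(count_external_writes(pending, suppress_overwritten=False))  # 8
print(count_external_writes(pending, suppress_overwritten=True))   # 1
```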
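Based on this principle, embodiments of the present disclosure determine, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory.
For example, the address information of the current instruction includes a first write address.
When determining, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory, the instruction processing circuit 120 is configured to: compare the first write address of the current instruction with a second write address of a first instruction; and, in response to the first write address and the second write address being the same, determine that the result data corresponding to the current instruction need not be written into the external memory, where the first instruction includes the instructions other than the current instruction among the multiple pending instructions.
In a specific implementation, the instruction processing circuit 120 is further configured to determine the first instruction from the multiple pending instructions based on the type of the current instruction.
For example, the first instruction may be determined from the multiple pending instructions in either of the following ways (1) or (2).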
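(1) Perform the following comparison step at least once, until the second write address of the current first instruction is the same as the first write address of the current instruction, or until every other pending instruction has been taken as the first instruction. According to the issue position of the current instruction and the issue positions of the other pending instructions that have not yet been taken as the first instruction, select from those other pending instructions the one whose issue position is closest to that of the current instruction as the current first instruction, and compare its second write address with the first write address of the current instruction.
For example, assume the multiple pending instructions are, in issue order, a1, a2, a3, ..., an, and a1 is the current instruction. First, a2 is taken as the current first instruction and the first write address of a1 is compared with the second write address of a2. If they differ, a3 is taken as the current first instruction and compared in the same way; if they differ again, a4 is taken, and so on, until the first write address of a1 is compared with the second write address of an. If these also differ, it is determined that the result data of a1 needs to be written into the external memory. If instead the first write address of a1 is the same as the second write address of some first instruction ai, it is determined that the result data of a1 need not be written into the external memory.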
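Way (1) amounts to a sequential scan that stops at the first match. A minimal software sketch, assuming the pending instructions are represented only by their write addresses in issue order; the function name `find_first_instruction` is hypothetical, and the disclosure describes a circuit rather than software.

```python
from typing import List, Optional


def find_first_instruction(write_addrs: List[int],
                           current: int = 0) -> Optional[int]:
    """Scan later instructions in issue order, nearest first.

    Returns the index of the nearest instruction whose write address equals
    that of the current instruction, or None if no such instruction exists
    (in which case the current result must go to external memory).
    """
    for i in range(current + 1, len(write_addrs)):
        if write_addrs[i] == write_addrs[current]:
            return i          # stop at the first match
    return None


# a1..a5 in issue order; a1 (index 0) is the current instruction.
match = find_first_instruction([0x20, 0x30, 0x40, 0x20, 0x20])
no_match = find_first_instruction([0x20, 0x30, 0x40])
```

Because the scan stops at the nearest match, the index it returns can also serve as the reference point for any later read-address check.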
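(2) Take all pending instructions other than the current instruction as first instructions, and compare the first write address of the current instruction with the second write address of each first instruction. If the second write address of any first instruction is the same as the first write address of the current instruction, it is determined that the result data of the current instruction need not be written into the external memory. If none of the second write addresses is the same as the first write address, it is determined that the result of the current instruction needs to be written into the external memory.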
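Way (2) compares the current write address against every other pending instruction rather than stopping at the first match. The sketch below models that as a single pass that collects all matches; the function names are hypothetical, and the suggestion that hardware would do these comparisons side by side is an assumption, not something the text states.

```python
from typing import List


def matching_first_instructions(write_addrs: List[int],
                                current: int = 0) -> List[int]:
    """Indices of all other pending instructions sharing the current write address."""
    target = write_addrs[current]
    return [i for i, addr in enumerate(write_addrs)
            if i != current and addr == target]


def needs_external_write(write_addrs: List[int], current: int = 0) -> bool:
    """The result must be written externally only if no other instruction matches."""
    return not matching_first_instructions(write_addrs, current)


matches = matching_first_instructions([0x20, 0x30, 0x20, 0x20])
must_write = needs_external_write([0x20, 0x30, 0x40])
```

Unlike the sequential scan, this variant can return several matches, so a later step has to pick the one closest to the current instruction in issue order.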
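For some pending instructions, the result data will be overwritten by the result data of another pending instruction, but before that happens it is read as an operand by yet another pending instruction and used to execute that instruction.
In the example above, suppose the result data of the current instruction a1 will be overwritten by the result data of pending instruction a5, but pending instruction a4 needs to read the result data of a1 as an operand. Even though the first write address of a1 is the same as the second write address of a5, a4 is executed before a5, so the result data of a1 still has to be written to the storage space corresponding to the first write address.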
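Therefore, in another embodiment of the present disclosure, when determining, in response to the first write address and the second write address being the same, that the result data corresponding to the current instruction need not be written into the external memory, the instruction processing circuit 120 may further be configured to: in response to the first write address and the second write address being the same, determine a second instruction from the multiple pending instructions based on the issue order of the first instruction and the current instruction; compare the read address of the second instruction with the first write address of the current instruction; and, in response to the read address of the second instruction being different from the first write address, determine that the result data corresponding to the current instruction need not be written into the external memory.
In a specific implementation, the second instruction may be determined from the multiple pending instructions, based on the issue order of the first instruction and the current instruction, in either of the following ways (3) or (4).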
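(3) When the first instruction is determined in way (1), the pending instructions are taken as the first instruction one after another in issue order, and the comparison stops as soon as the second write address of the current first instruction equals the first write address of the current instruction. It is therefore sufficient to take that first instruction as the reference instruction, take every pending instruction whose issue position lies between the current instruction and the reference instruction as a second instruction, and compare the read address of each second instruction with the first write address of the current instruction.
For example, in the example for (1), if the first write address of the current instruction a1 is the same as the second write address of the i-th pending instruction ai taken as the first instruction, pending instructions a2 to a(i-1) are all taken as second instructions, and their read addresses are each compared with the first write address of a1.
(4) When the first instruction is determined in way (2), all pending instructions other than the current instruction are taken as first instructions and their second write addresses are each compared with the first write address of the current instruction. If m of the first instructions have a second write address equal to the first write address, the one among those m whose issue position is closest to that of the current instruction is taken as the reference instruction, every pending instruction whose issue position lies between the current instruction and the reference instruction is taken as a second instruction, and the read address of each second instruction is compared with the first write address of the current instruction.
Among the second instructions so determined, if the read address of any second instruction is the same as the first write address of the current instruction, the result data corresponding to the current instruction needs to be written into the external memory. If none of the read addresses is the same as the first write address, it is determined that the result data need not be written into the external memory.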
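The combined decision (find the reference instruction, then check the read addresses of the instructions between it and the current instruction) can be sketched as follows. The `Inst` class and `can_skip_external_write` are hypothetical names; as in the text, only instructions strictly between the current and reference instructions are checked for reads.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Inst:
    write_addr: int
    read_addrs: Tuple[int, ...]


def can_skip_external_write(pending: List[Inst]) -> bool:
    """pending[0] is the current instruction; the rest follow in issue order."""
    current = pending[0]
    # Reference instruction: nearest later instruction with the same write address.
    ref = next((i for i in range(1, len(pending))
                if pending[i].write_addr == current.write_addr), None)
    if ref is None:
        return False          # nothing overwrites the result, so it must be written
    # Second instructions: everything between the current and reference instruction.
    for second in pending[1:ref]:
        if current.write_addr in second.read_addrs:
            return False      # the result is read before it is overwritten
    return True


# a1 writes 0x20, a4 reads 0x20, a5 overwrites 0x20: the write cannot be skipped.
hazard = [Inst(0x20, (0x10, 0x14)), Inst(0x30, (0x10, 0x14)),
          Inst(0x34, (0x10, 0x14)), Inst(0x40, (0x20, 0x14)),
          Inst(0x20, (0x10, 0x14))]
# Same sequence, but a4 no longer reads 0x20: the write can be skipped.
safe = hazard[:3] + [Inst(0x40, (0x10, 0x14))] + hazard[4:]
```

The first list mirrors the a1/a4/a5 situation discussed earlier, where the result of a1 has to be preserved for a4.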
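Referring to FIG. 2 and FIG. 3, embodiments of the present disclosure further provide a specific example of the instruction processing circuit 120. The instruction processing circuit 120 includes a buffer bank 121, a comparison circuit 122, and an instruction modification circuit 123.
The write end of the buffer bank 121 is connected to the controller 110, and the buffer bank is used to buffer the multiple pending instructions; the output end of the buffer bank 121 is connected to both the comparison circuit 122 and the instruction modification circuit 123. The comparison circuit 122 is configured to read the first write address of the current instruction and the second write address of the first instruction from the buffer bank 121, compare the first write address with the second write address, and, in response to the two being the same, output a first control signal to the instruction modification circuit 123. The instruction modification circuit 123 is configured to, in response to reading the current instruction from the buffer bank and receiving the first control signal from the comparison circuit 122, generate the first target instruction based on the current instruction and send the first target instruction to the operation unit.
For example, the buffer bank 121 may receive the multiple pending instructions sent by the controller 110 and store them for use by the comparison circuit 122 and the instruction modification circuit 123.
The comparison circuit 122 is the main processing circuit of the instruction processing circuit 120. It compares the address information of the current instruction and the first instruction, and outputs the first control signal to the instruction modification circuit 123 when the first write address and the second write address are the same, or a second control signal when they differ.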
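Specifically, when outputting the first control signal to the instruction modification circuit 123 in response to the first write address and the second write address being the same, the comparison circuit 122 is configured to: determine a target buffer bank from the buffer bank 121 based on the positions, in the buffer bank 121, of the registers corresponding to the current instruction and the first instruction; read the read address corresponding to a second instruction from the target buffer bank; compare the read address of the second instruction with the first write address; and, in response to the read address of the second instruction being different from the first write address, output the first control signal to the instruction modification circuit 123. The processing performed by the comparison circuit 122 is similar to that described above for the instruction processing circuit 120 and is not repeated here.
The instruction modification circuit 123 may determine how to modify the current instruction based on the comparison result from the comparison circuit 122, and generate the corresponding target instruction.
Specifically, when generating the target instruction based on the current instruction, the instruction modification circuit 123 is configured to: modify an output control bit in the current instruction to a preset value; or add a preset bit at a preset position of the current instruction and set the value of the preset bit to the preset value, where the preset value indicates that the result data corresponding to the current instruction need not be written into the external memory.
For example, the current instruction is stored in the buffer bank 121. When the current instruction is output from the buffer bank 121, its write address is compared with the write addresses of the other instructions of the same type stored in the buffer bank 121. If a match is found, the preset bit at the preset position of the current instruction is set to 0 to indicate that the result data corresponding to the current instruction need not be written into the external memory. Alternatively, if the current instruction has no preset bit at a preset position, a 1-bit signal may be attached to the current instruction to indicate that its result data does not access the external memory.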
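The two modification options (clearing an existing output control bit, or attaching an extra bit) can be illustrated with integer bit operations. Everything concrete here is assumed for illustration: the 32-bit instruction word, bit position 31 for the output control bit, placing the attached bit above the most significant bit, and the function names. The text itself only fixes 0 as the value meaning "do not write to external memory".

```python
from typing import Tuple

OUTPUT_CTRL_BIT = 31  # hypothetical position of the output control bit


def clear_output_control(word: int) -> int:
    """Option 1: set the existing output control bit to the preset value 0."""
    return word & ~(1 << OUTPUT_CTRL_BIT)


def attach_flag_bit(word: int, width: int, flag: int = 0) -> Tuple[int, int]:
    """Option 2: attach one extra bit above the instruction's current width.

    Returns the widened word and its new width; flag 0 means the result
    data is not written to external memory.
    """
    return word | (flag << width), width + 1


cleared = clear_output_control(0x80000001)
widened = attach_flag_bit(0b101, 3, flag=0)
```

With option 2 and a flag of 0 the numeric value of the word is unchanged and only its width grows, which is why the sketch returns the width alongside the word.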
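In another embodiment of the present disclosure, the comparison circuit 122 is further configured to output a second control signal to the instruction modification circuit 123 in response to the first write address and the second write address being different; and the instruction modification circuit 123 is further configured to send the current instruction to the operation unit in response to reading the current instruction from the buffer bank and receiving the second control signal from the comparison circuit 122. That is, when it is determined that the external memory is to be accessed, the corresponding access is performed.
FIG. 3 is another schematic diagram of the instruction processing circuit 120 in the instruction processing apparatus provided by embodiments of the present disclosure. As shown in FIG. 3, pending instructions 1, 2, ..., n are stored in turn in the n instruction buffers of the buffer bank 121. Specifically, it is determined that the result data corresponding to the current instruction need not be written into the external memory if some pending instruction other than the current instruction satisfies both of the following conditions. First, the write address of that pending instruction is the same as the write address of the current instruction; for example, current instruction 0 and pending instruction 5 have the same write address. Second, no pending instruction between the current instruction and that pending instruction has a read address equal to the write address of the current instruction; for example, the read addresses of pending instructions 1 to 4 all differ from the write address of current instruction 0. In that case the bit at the preset position of the current instruction may be set to 0; if the current instruction does not contain such a position, a 1-bit signal is attached to indicate that the result data corresponding to the current instruction does not access the external memory.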
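The operation unit 130 includes, for example, at least one two-dimensional processing engine (PE) array and a local register file. Each PE includes computing elements such as multiply-adders for performing specific computation tasks.
In response to receiving the first target instruction, the operation unit 130 executes the first target instruction to obtain the result data.
In another embodiment of the present disclosure, when it is determined that the result data corresponding to the current instruction needs to be written into the external memory, the current instruction is sent to the operation unit; in response to receiving the current instruction, the operation unit 130 executes it to obtain the result data and writes the result data into the external memory.
The operation unit 130 can perform read/write accesses to the external memory connected to it, and the external memory stores the data transferred during those accesses. For example, the operation unit includes at least one PE array; each PE may be connected to a different external memory, or multiple PEs may be connected to the same external memory.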
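In embodiments of the present disclosure, the first target instruction includes a computation instruction for obtaining the result data from the current instruction, and an indication of whether to write to the external memory.
Specifically, when it is determined that the result data corresponding to the current instruction need not be written into the external memory, the operation unit 130 receives the generated first target instruction, which includes the instruction to compute the result data and the indication not to write to the external memory. The operation unit 130 therefore computes the result data but does not write it to the external memory; the result data is kept in the operation unit 130 for subsequent processing.
When it is determined that the result data corresponding to the current instruction needs to be written into the external memory, the operation unit 130 receives the generated second target instruction, which includes the instruction to compute the result data and the indication to write to the external memory. The operation unit 130 therefore computes the result data and writes it to the external memory.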
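How the operation unit might act on the two kinds of target instruction can be sketched as below. This is a toy model under assumptions: the `OperationUnit` class, its dictionary-based local storage and external memory, and the multiply-accumulate semantics are illustrative choices, not the disclosed hardware.

```python
class OperationUnit:
    """Toy model: results always stay local; external writes are optional."""

    def __init__(self) -> None:
        self.local = {}           # stands in for the local register file
        self.external = {}        # stands in for the external memory
        self.external_writes = 0

    def execute(self, x: int, y: int, write_addr: int,
                write_external: bool) -> int:
        # Multiply-accumulate into the locally held value for this address.
        result = self.local.get(write_addr, 0) + x * y
        self.local[write_addr] = result
        if write_external:        # a second target instruction, in the text's terms
            self.external[write_addr] = result
            self.external_writes += 1
        return result


unit = OperationUnit()
unit.execute(1, 4, 0x20, write_external=False)   # first target instruction
unit.execute(2, 5, 0x20, write_external=False)   # first target instruction
final = unit.execute(3, 6, 0x20, write_external=True)
```

Only the last call touches the external memory in this model, while the intermediate sums remain available locally for the following steps.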
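In the instruction processing apparatus provided by embodiments of the present disclosure, the controller 110 sends multiple pending instructions to the instruction processing circuit 120, which processes them to determine whether the result data corresponding to the current instruction among them is to be written into the external memory. When the result data need not be written into the external memory, the instruction processing circuit generates the corresponding first target instruction, and the operation unit 130 executes it to obtain the result data without writing that data to the external memory. This reduces write accesses to the external memory, lowers power consumption during data processing, and reduces storage resource conflicts.
Based on the same inventive concept, embodiments of the present disclosure further provide an instruction processing method corresponding to the instruction processing apparatus. Because the method solves the problem on a principle similar to that of the apparatus described above, the implementation of the method may refer to the implementation of the apparatus, and repeated description is omitted.
The instruction processing method provided by embodiments of the present disclosure is generally performed by a computer device with a certain computing capability, including, for example, a terminal device, a server, or another processing device. The method may be implemented by a processor invoking computer-readable instructions stored in a memory.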
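FIG. 4 is a flowchart of the instruction processing method provided by embodiments of the present disclosure. The method is applied to an instruction processing apparatus that includes a controller, an instruction processing circuit, and an operation unit connected in sequence. The method includes the following steps S401 to S403.
S401: The controller sends at least one pending instruction to the instruction processing circuit.
S402: The instruction processing circuit determines, according to address information of a current instruction, whether to write result data corresponding to the current instruction into an external memory; in response to determining that the result data need not be written into the external memory, it generates, based on the current instruction, a first target instruction indicating that the result data need not be written into the external memory, and sends the first target instruction to the operation unit, where the current instruction is the instruction that is first in issue order among the at least one pending instruction.
S403: In response to receiving the first target instruction, the operation unit executes the first target instruction to obtain the result data.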
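Steps S401 to S403 can be strung together in a short end-to-end sketch in which each pending instruction becomes the current instruction in turn. The function names, the dictionary representation of an instruction, and the `write_external` field are hypothetical; the skip rule follows the write-address and read-address comparisons described for the apparatus.

```python
from collections import deque
from typing import Deque, Dict, List


def controller(n: int, dst: int, rs1: int, rs2: int) -> Deque[Dict]:
    """S401: parse one command into n pending MACC instructions."""
    return deque({"op": "MACC", "dst": dst, "rs": (rs1, rs2)} for _ in range(n))


def instruction_processing_circuit(pending: Deque[Dict]) -> Dict:
    """S402: take the instruction first in issue order and tag its write-back."""
    current = pending.popleft()
    skip = False
    for later in pending:
        if later["dst"] == current["dst"]:
            skip = True       # overwritten before any intervening read
            break
        if current["dst"] in later["rs"]:
            break             # read before being overwritten: must write
    return dict(current, write_external=not skip)


def run(n: int) -> List[bool]:
    """Return the write_external flag of each target instruction, in order."""
    pending = controller(n, dst=0x20, rs1=0x10, rs2=0x14)
    flags = []
    while pending:
        target = instruction_processing_circuit(pending)
        flags.append(target["write_external"])  # S403 would execute target here
    return flags


flags = run(4)
```

For a command parsed into four MACC instructions with the same write address, only the last target instruction is tagged for an external write in this model.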
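In an optional implementation, the instruction processing method further includes: the instruction processing circuit, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, sending the current instruction to the operation unit; and the operation unit, in response to receiving the current instruction, executing the current instruction to obtain the result data and writing the result data into the external memory.
In an optional implementation, the instruction processing method further includes: the instruction processing circuit, in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, generating, based on the current instruction, a second target instruction indicating that the result data needs to be written into the external memory, and sending the second target instruction to the operation unit; and the operation unit, in response to receiving the second target instruction, executing the second target instruction to obtain the result data and writing the result data into the external memory.
In an optional implementation, the at least one pending instruction includes multiple pending instructions, and the address information of the current instruction includes a first write address. The instruction processing circuit determining, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory includes: comparing the first write address of the current instruction with a second write address of a first instruction; and, in response to the first write address and the second write address being the same, determining that the result data corresponding to the current instruction need not be written into the external memory, where the first instruction includes the instructions other than the current instruction among the multiple pending instructions.
In an optional implementation, the instruction processing method further includes: the instruction processing circuit determining the first instruction from the multiple pending instructions based on the type of the current instruction.
In an optional implementation, the instruction processing circuit determining, in response to the first write address and the second write address being the same, that the result data corresponding to the current instruction need not be written into the external memory includes: in response to the first write address and the second write address being the same, determining a second instruction from the multiple pending instructions based on the issue order of the first instruction and the current instruction; comparing the read address of the second instruction with the first write address of the current instruction; and, in response to the read address of the second instruction being different from the first write address, determining that the result data corresponding to the current instruction need not be written into the external memory.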
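In an optional implementation, the instruction processing circuit includes a buffer bank, a comparison circuit, and an instruction modification circuit. The write end of the buffer bank is connected to the controller, and the buffer bank is used to buffer the multiple pending instructions; the output end of the buffer bank is connected to both the comparison circuit and the instruction modification circuit. The instruction processing method further includes: the comparison circuit reading the first write address of the current instruction and the second write address of the first instruction from the buffer bank, comparing the first write address with the second write address, and, in response to the two being the same, outputting a first control signal to the instruction modification circuit; and the instruction modification circuit, in response to reading the current instruction from the buffer bank and receiving the first control signal from the comparison circuit, generating the first target instruction based on the current instruction and sending the first target instruction to the operation unit.
In an optional implementation, the instruction modification circuit generating the first target instruction based on the current instruction includes: modifying an output control bit in the current instruction to a preset value; or adding a preset bit at a preset position of the current instruction and setting the value of the preset bit to the preset value, where the preset value indicates that the result data corresponding to the current instruction need not be written into the external memory.
In an optional implementation, the comparison circuit outputting the first control signal to the instruction modification circuit in response to the first write address and the second write address being the same includes: determining a target buffer bank from the buffer bank based on the positions, in the buffer bank, of the registers corresponding to the current instruction and the first instruction; reading the read address corresponding to a second instruction from the target buffer bank; comparing the read address of the second instruction with the first write address; and, in response to the read address of the second instruction being different from the first write address, outputting the first control signal to the instruction modification circuit.
In an optional implementation, the instruction processing method further includes: the comparison circuit, in response to the first write address and the second write address being different, outputting a second control signal to the instruction modification circuit; and the instruction modification circuit, in response to reading the current instruction from the buffer bank and receiving the second control signal from the comparison circuit, sending the current instruction to the operation unit.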
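In embodiments of the present disclosure, the instruction processing circuit processes the at least one pending instruction sent by the controller to determine whether the result data corresponding to the current instruction among them is to be written into the external memory. When the result data need not be written into the external memory, the corresponding first target instruction is generated and the operation unit executes it to obtain the result data. This reduces write accesses to the external memory, saves power, and reduces storage resource conflicts.
Those skilled in the art will understand that, in the methods of the specific implementations above, the order in which the steps are written does not imply a strict order of execution or limit the implementation in any way; the actual order of execution of the steps should be determined by their functions and possible internal logic.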
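Embodiments of the present disclosure further provide a data processing chip including the instruction processing apparatus according to any embodiment of the present disclosure.
The data processing chip provided by embodiments of the present disclosure may include a graphics processing unit, an AI chip, or the like.
Embodiments of the present disclosure further provide a computer device including an instruction memory and the instruction processing apparatus provided by embodiments of the present disclosure.
Embodiments of the present disclosure further provide a computer device including the data processing chip provided by embodiments of the present disclosure.
The computer device provided by embodiments of the present disclosure may include a smart terminal such as a mobile phone, or may be another device, server, or the like that can be used for instruction processing, which is not limited here.
Embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the instruction processing method described in the method embodiments above. The storage medium may be a volatile or non-volatile computer-readable storage medium.
Embodiments of the present disclosure further provide a computer program product carrying program code, the instructions of which can be used to perform the steps of the instruction processing method described in the method embodiments above; see those embodiments for details, which are not repeated here.
The computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment the computer program product is embodied as a computer storage medium; in another optional embodiment it is embodied as a software product, such as a software development kit (SDK).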
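Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the present disclosure is not limited to them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent replacements of some of their technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.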

Claims (15)

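  1. An instruction processing apparatus, comprising a controller, an instruction processing circuit, and an operation unit connected in sequence, wherein:
    the controller is configured to send at least one pending instruction to the instruction processing circuit;
    the instruction processing circuit is configured to:
    determine, according to address information of a current instruction, whether to write result data corresponding to the current instruction into an external memory, wherein the current instruction is the instruction that is first in issue order among the at least one pending instruction;
    in response to determining that the result data need not be written into the external memory, generate, based on the current instruction, a first target instruction indicating that the result data need not be written into the external memory; and
    send the first target instruction to the operation unit; and
    the operation unit is configured to, in response to receiving the first target instruction, execute the first target instruction to obtain the result data.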
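  2. The apparatus according to claim 1, wherein:
    the instruction processing circuit is further configured to send the current instruction to the operation unit in response to determining that the result data corresponding to the current instruction needs to be written into the external memory; and
    the operation unit is further configured to:
    in response to receiving the current instruction, execute the current instruction to obtain the result data; and
    write the result data into the external memory.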
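  3. The apparatus according to claim 1, wherein:
    the instruction processing circuit is further configured to:
    in response to determining that the result data corresponding to the current instruction needs to be written into the external memory, generate, based on the current instruction, a second target instruction indicating that the result data needs to be written into the external memory; and
    send the second target instruction to the operation unit; and
    the operation unit is further configured to:
    in response to receiving the second target instruction, execute the second target instruction to obtain the result data; and
    write the result data into the external memory.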
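  4. The apparatus according to any one of claims 1 to 3, wherein the at least one pending instruction comprises multiple pending instructions; the address information of the current instruction comprises a first write address; and the instruction processing circuit, when determining, according to the address information of the current instruction, whether to write the result data corresponding to the current instruction into the external memory, is configured to:
    compare the first write address of the current instruction with a second write address of a first instruction; and
    in response to the first write address and the second write address being the same, determine that the result data corresponding to the current instruction need not be written into the external memory;
    wherein the first instruction comprises the instructions other than the current instruction among the multiple pending instructions.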
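  5. The apparatus according to claim 4, wherein the instruction processing circuit is further configured to:
    determine the first instruction from the multiple pending instructions based on the type of the current instruction.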
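  6. The apparatus according to claim 4 or 5, wherein the instruction processing circuit, when determining, in response to the first write address and the second write address being the same, that the result data corresponding to the current instruction need not be written into the external memory, is configured to:
    in response to the first write address and the second write address being the same, determine a second instruction from the multiple pending instructions, wherein the issue order of the second instruction among the multiple pending instructions lies between the issue orders of the first instruction and the current instruction;
    compare the read address of the second instruction with the first write address of the current instruction; and
    in response to the read address of the second instruction being different from the first write address, determine that the result data corresponding to the current instruction need not be written into the external memory.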
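  7. The apparatus according to any one of claims 4 to 6, wherein the instruction processing circuit comprises:
    a buffer bank, a write end of which is connected to the controller, configured to buffer the multiple pending instructions;
    a comparison circuit connected to an output end of the buffer bank and configured to:
    read the first write address of the current instruction and the second write address of the first instruction from the buffer bank;
    compare the first write address with the second write address; and
    in response to the first write address and the second write address being the same, output a first control signal; and
    an instruction modification circuit connected to the output end of the buffer bank and to the comparison circuit and configured to:
    in response to reading the current instruction from the buffer bank and receiving the first control signal from the comparison circuit, generate the first target instruction based on the current instruction; and
    send the first target instruction to the operation unit.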
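  8. The apparatus according to claim 7, wherein the instruction modification circuit, when generating the first target instruction based on the current instruction, is configured to:
    modify an output control bit in the current instruction to a preset value; or
    add a preset bit at a preset position of the current instruction and set the value of the preset bit to the preset value;
    wherein the preset value indicates that the result data corresponding to the current instruction need not be written into the external memory.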
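  9. The apparatus according to claim 7 or 8, wherein the comparison circuit, when outputting the first control signal in response to the first write address and the second write address being the same, is configured to:
    determine a target buffer bank from the buffer bank based on the positions, in the buffer bank, of the registers corresponding to the current instruction and the first instruction;
    read the read address corresponding to a second instruction from the target buffer bank;
    compare the read address of the second instruction with the first write address; and
    in response to the read address of the second instruction being different from the first write address, output the first control signal to the instruction modification circuit.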
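  10. The apparatus according to claim 7 or 8, wherein:
    the comparison circuit is further configured to output a second control signal to the instruction modification circuit in response to the first write address and the second write address being different; and
    the instruction modification circuit is further configured to send the current instruction to the operation unit in response to reading the current instruction from the buffer bank and receiving the second control signal from the comparison circuit.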
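  11. An instruction processing method, applied to an instruction processing apparatus comprising a controller, an instruction processing circuit, and an operation unit connected in sequence, the method comprising:
    sending, by the controller, at least one pending instruction to the instruction processing circuit;
    determining, by the instruction processing circuit according to address information of a current instruction, whether to write result data corresponding to the current instruction into an external memory, wherein the current instruction is the instruction that is first in issue order among the at least one pending instruction;
    in response to determining that the result data need not be written into the external memory, generating, by the instruction processing circuit based on the current instruction, a first target instruction indicating that the result data need not be written into the external memory, and sending the first target instruction to the operation unit; and
    in response to receiving the first target instruction, executing, by the operation unit, the first target instruction to obtain the result data.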
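  12. A data processing chip, comprising the instruction processing apparatus according to any one of claims 1 to 10.
  13. A computer device, comprising the instruction processing apparatus according to any one of claims 1 to 10.
  14. A computer device, comprising the data processing chip according to claim 12.
  15. A computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the instruction processing method according to claim 11.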
PCT/CN2022/120992 2021-11-29 2022-09-23 Instruction processing apparatus and method, computer device, and storage medium WO2023093260A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111433398.4 2021-11-29
CN202111433398.4A CN114090466A (zh) 2021-11-29 2021-11-29 Instruction processing apparatus and method, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023093260A1 true WO2023093260A1 (zh) 2023-06-01

Family

ID=80305675

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120992 WO2023093260A1 (zh) 2021-11-29 2022-09-23 Instruction processing apparatus and method, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN114090466A (zh)
WO (1) WO2023093260A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114090466A (zh) * 2021-11-29 2022-02-25 上海阵量智能科技有限公司 Instruction processing apparatus and method, computer device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308808A1 (en) * 2016-04-26 2017-10-26 Paypal, Inc Machine learning system
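CN111026540A (zh) * 2018-10-10 2020-04-17 上海寒武纪信息科技有限公司 Task processing method, task scheduler, and task processing apparatus
CN111783958A (zh) * 2020-07-03 2020-10-16 中用科技有限公司 Data processing system, method, apparatus, and storage medium
CN114090466A (zh) * 2021-11-29 2022-02-25 上海阵量智能科技有限公司 Instruction processing apparatus and method, computer device, and storage medium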

Also Published As

Publication number Publication date
CN114090466A (zh) 2022-02-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897346

Country of ref document: EP

Kind code of ref document: A1