CN114090466A - Instruction processing device and method, computer equipment and storage medium - Google Patents


Info

Publication number
CN114090466A
Authority
CN
China
Prior art keywords
instruction
processed
result data
write address
external memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111433398.4A
Other languages
Chinese (zh)
Inventor
霍冠廷
王文强
徐宁仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Power Tensors Intelligent Technology Co Ltd
Original Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Power Tensors Intelligent Technology Co Ltd filed Critical Shanghai Power Tensors Intelligent Technology Co Ltd
Priority to CN202111433398.4A priority Critical patent/CN114090466A/en
Publication of CN114090466A publication Critical patent/CN114090466A/en
Priority to PCT/CN2022/120992 priority patent/WO2023093260A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0238Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure provides an instruction processing apparatus and method, a computer device, and a storage medium. The instruction processing apparatus includes a controller, an instruction processing circuit and an arithmetic unit connected in sequence. The controller is configured to send an instruction to be processed to the instruction processing circuit. The instruction processing circuit is configured to determine, according to address information of a current instruction to be processed, whether to write result data corresponding to the current instruction to be processed into an external memory; and, in response to determining that the result data does not need to be written into the external memory, to generate, based on the current instruction to be processed, a first target instruction representing that the result data does not need to be written into the external memory, and to send the first target instruction to the arithmetic unit. The arithmetic unit is configured to, in response to receiving the target instruction, execute the target instruction to obtain the result data. Embodiments of the present disclosure can reduce write accesses to the external memory, reduce power consumption, and mitigate problems such as storage resource conflicts.

Description

Instruction processing device and method, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of electronic circuit technologies, and in particular, to an instruction processing apparatus, an instruction processing method, a computer device, and a storage medium.
Background
With the development of cloud computing, big data and artificial intelligence technologies, the demand for computing power keeps increasing. Current data processing approaches suffer from high power consumption.
Disclosure of Invention
The embodiment of the disclosure at least provides an instruction processing device, an instruction processing method, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an instruction processing apparatus, including: a controller, an instruction processing circuit and an arithmetic unit connected in sequence;
the controller is used for sending an instruction to be processed to the instruction processing circuit;
the instruction processing circuit is used for determining whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed; in response to determining that the result data does not need to be written into an external memory, generating a first target instruction representing that the result data does not need to be written into the external memory based on the current instruction to be processed, and sending the first target instruction to the arithmetic unit;
and the arithmetic unit is used for responding to the received target instruction and executing the target instruction to obtain the result data.
Therefore, the write access process to the external memory can be reduced, the power consumption is saved, and the problems of storage resource conflict and the like are reduced.
In an optional implementation manner, the instruction processing circuit is further configured to, in response to determining that result data corresponding to the current instruction to be processed needs to be written into an external memory, send the current instruction to be processed to the arithmetic unit;
the arithmetic unit is further configured to execute the current instruction to be processed in response to receiving the current instruction to be processed, obtain the result data, and write the result data into the external memory.
In this way, data determined to need to be written can be processed normally.
In an optional implementation manner, the instruction processing circuit is further configured to, in response to determining that result data corresponding to the current instruction to be processed needs to be written into an external memory, generate, based on the current instruction to be processed, a second target instruction representing that the result data needs to be written into the external memory, and send the second target instruction to the arithmetic unit;
the arithmetic unit is further configured to execute the second target instruction in response to receiving the second target instruction, obtain the result data, and write the result data into the external memory.
In an alternative embodiment, there are multiple said instructions to be processed; the address information of the current instruction to be processed comprises: a first write address;
the instruction processing circuit, when determining whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed, is configured to:
comparing the first write address of the current instruction to be processed with the second write address of the first instruction; in response to that the first write address is the same as the second write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory;
wherein the first instruction comprises: at least some of the instructions to be processed other than the current instruction to be processed.
Therefore, whether the result data corresponding to the current instruction to be processed is written into the external memory or not can be determined by judging whether the write addresses are the same, and power consumption caused by the result data corresponding to the instruction which does not need to be written is reduced.
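The write-address comparison described above can be sketched as follows. This is only an illustrative model, not the disclosed circuit: the record type `Instr`, its `write_addr` field and the function `can_skip_write` are hypothetical names, and the sketch assumes that each instruction is compared only against instructions issued after it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    write_addr: int  # hypothetical field holding the write address


def can_skip_write(current, first_instrs):
    """Return True when any compared instruction has the same write address."""
    return any(other.write_addr == current.write_addr for other in first_instrs)


# Assumption: instructions are listed in issue order, and each instruction is
# compared only against the instructions issued after it.
pending = [Instr("mac", 0x100), Instr("mac", 0x100), Instr("add", 0x200)]
skip_flags = [can_skip_write(cur, pending[i + 1:]) for i, cur in enumerate(pending)]
print(skip_flags)  # [True, False, False]
```

Only the first instruction is flagged, because a later instruction writes to the same address 0x100.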
In an alternative embodiment, the instruction processing circuit is further configured to: determining the first instruction from a plurality of the instructions to be processed based on the type of the current instruction to be processed.
Therefore, the comparison can be restricted to instructions to be processed of the same type, which reduces erroneous processing.
In an optional implementation manner, when it is determined that result data corresponding to the current instruction to be processed does not need to be written into the external memory in response to the first write address being the same as the second write address, the instruction processing circuit is configured to:
in response to the first write address being the same as the second write address, determining a second instruction from the instructions to be processed based on an issue order of the first instruction and the current instruction to be processed;
comparing the read address of the second instruction with the first write address of the current instruction to be processed;
and in response to that the read address is different from the first write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory.
Therefore, the determined instruction which does not need to be written is more accurate, and the influence on subsequent processing is reduced.
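A minimal sketch of this refined check is given below. It rests on assumptions that the text does not state explicitly: the pending instructions are held in issue order, the "first instruction" is the next later instruction with the same write address, and the "second instructions" are those issued between the two. The names `Instr`, `read_addrs` and `can_skip_write` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    opcode: str
    read_addrs: tuple  # addresses the operands are read from
    write_addr: int    # address the result would be written to


def can_skip_write(pending, i):
    """Decide whether pending[i] may skip its external write.

    Assumption: `pending` is in issue order; the write may be skipped only
    if a later instruction writes the same address and no instruction
    issued in between reads that address.
    """
    cur = pending[i]
    for j in range(i + 1, len(pending)):
        if pending[j].write_addr == cur.write_addr:
            between = pending[i + 1:j]
            return all(cur.write_addr not in s.read_addrs for s in between)
    return False


read_between = [
    Instr("mul", (0x10, 0x14), 0x100),
    Instr("add", (0x100, 0x20), 0x200),  # reads 0x100 before it is rewritten
    Instr("mul", (0x18, 0x1C), 0x100),
]
no_read_between = [
    Instr("mul", (0x10, 0x14), 0x100),
    Instr("add", (0x20, 0x24), 0x200),
    Instr("mul", (0x18, 0x1C), 0x100),
]
print(can_skip_write(read_between, 0))     # False
print(can_skip_write(no_read_between, 0))  # True
```

In the first list the intermediate `add` reads address 0x100, so the earlier result still has to be written; in the second list nothing reads it before it is overwritten.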
In an alternative embodiment, the instruction processing circuit includes: a buffer stack, a comparison circuit and an instruction modification circuit;
the write-in end of the buffer stack is connected with the controller and used for buffering the instruction to be processed; the output end of the buffer stack is respectively connected with the comparison circuit and the instruction modification circuit;
the comparison circuit is used for reading a first write address of the current instruction to be processed and a second write address of the first instruction from the buffer stack; comparing the first write address with the second write address; in response to the first write address being the same as the second write address, outputting a first control signal to the instruction modification circuit;
the instruction modification circuit is used for, in response to reading the current instruction to be processed from the buffer stack and receiving the first control signal sent by the comparison circuit, generating the target instruction based on the current instruction to be processed, and sending the target instruction to the arithmetic unit.
Therefore, the processing process of the instruction is completed through each device in the instruction processing circuit, the writing access process to the external memory is reduced, the power consumption is saved, and the problems of storage resource conflict and the like are reduced.
In an alternative embodiment, the instruction modification circuit, when generating the target instruction based on the pending instruction, is configured to:
modifying an output control bit in the instruction to be processed into a preset value; or,
adding a preset bit at a preset position of the instruction to be processed, and setting the value of the preset bit as the preset value;
the preset value is used for indicating that result data corresponding to the current instruction to be processed does not need to be written into an external memory.
Therefore, by processing the target instruction, the external memory is not accessed, the power consumption is saved, and the problems of storage resource conflict and the like are reduced.
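The two ways of generating the target instruction can be illustrated with plain bit manipulation. The sketch below assumes a 32-bit instruction word; the bit positions `OUTPUT_CTRL_BIT` and `EXTRA_BIT_POS` and the preset value `NO_WRITE` are hypothetical choices made for illustration and are not specified by the disclosure.

```python
OUTPUT_CTRL_BIT = 0  # hypothetical position of an existing output control bit
EXTRA_BIT_POS = 32   # hypothetical position of an added bit (word is 32 bits)
NO_WRITE = 1         # hypothetical preset value meaning "do not write externally"


def set_bit(word, pos, value):
    """Return `word` with the bit at `pos` forced to `value` (0 or 1)."""
    return (word & ~(1 << pos)) | ((value & 1) << pos)


def modify_output_control(word):
    """Option 1: overwrite an existing output control bit with the preset value."""
    return set_bit(word, OUTPUT_CTRL_BIT, NO_WRITE)


def add_preset_bit(word):
    """Option 2: extend the word with one extra bit set to the preset value."""
    return set_bit(word, EXTRA_BIT_POS, NO_WRITE)


instr = 0x000000F0
print(hex(modify_output_control(instr)))  # 0xf1
print(hex(add_preset_bit(instr)))         # 0x1000000f0
```

The first option keeps the instruction width unchanged, while the second widens the word by one bit, so the arithmetic unit would need to accept the extended format.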
In an alternative embodiment, the comparison circuit, when outputting a first control signal to the instruction modification circuit in response to the first write address being the same as the second write address, is configured to:
determining a target register from the buffer stack based on the positions, in the buffer stack, of the registers respectively corresponding to the current instruction to be processed and the first instruction;
reading a read address corresponding to a second instruction from the target register;
comparing the read address with the first write address;
in response to the read address being different from the first write address, outputting a first control signal to the instruction modification circuit.
In this way, an instruction that is not required to be written to the external memory can be determined more accurately by the comparison circuit.
In an optional implementation, the comparison circuit is further configured to: in response to the first write address and the second write address being different, outputting a second control signal to the instruction modification circuit;
the instruction modification circuit is further configured to send the current instruction to be processed to the arithmetic unit in response to reading the current instruction to be processed from the buffer stack and receiving the second control signal sent by the comparison circuit.
In this way, the result data corresponding to the instruction that needs to be written to the external memory can be written.
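The interaction between the buffer stack, the comparison circuit and the instruction modification circuit can be modeled in software roughly as follows. This is a behavioral sketch under stated assumptions, not a description of the actual hardware: instructions are `(word, write_addr)` tuples, the buffer is a `deque` in issue order, the signals are the strings `"first"` and `"second"`, and `SUPPRESS_BIT` is a hypothetical flag marking a target instruction. Only the write-address comparison is modeled.

```python
from collections import deque

SUPPRESS_BIT = 1 << 31  # hypothetical flag marking "no external write"


def compare(current, later):
    """Comparison circuit: emit the first control signal if a later buffered
    instruction has the same write address, otherwise the second signal."""
    same = any(waddr == current[1] for _, waddr in later)
    return "first" if same else "second"


def modify(current, signal):
    """Instruction modification circuit: flag the instruction on the first
    signal, pass it through unchanged on the second signal."""
    word, waddr = current
    if signal == "first":
        return (word | SUPPRESS_BIT, waddr)
    return current


def run(buffer):
    """Drain the buffer in issue order and return what reaches the arithmetic unit."""
    issued = []
    while buffer:
        current = buffer.popleft()
        issued.append(modify(current, compare(current, buffer)))
    return issued


buffer = deque([(0x01, 0x100), (0x02, 0x200), (0x03, 0x100)])
issued = run(buffer)
print([(hex(w), hex(a)) for w, a in issued])
```

Here only the first instruction is flagged, since the third instruction later writes to the same address 0x100.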
In a second aspect, an embodiment of the present disclosure provides an instruction processing method, which is applied to an instruction processing apparatus, where the instruction processing apparatus includes: a controller, an instruction processing circuit and an arithmetic unit connected in sequence; the instruction processing method comprises the following steps:
the controller sends an instruction to be processed to the instruction processing circuit;
the instruction processing circuit determines whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed; in response to determining that the result data does not need to be written into an external memory, generating a first target instruction representing that the result data does not need to be written into the external memory based on the current instruction to be processed, and sending the first target instruction to the arithmetic unit;
and the arithmetic unit responds to the received target instruction and executes the target instruction to obtain the result data.
In an optional embodiment, the method further comprises:
the instruction processing circuit responds to the fact that the result data corresponding to the current instruction to be processed needs to be written into an external memory, and sends the current instruction to be processed to the arithmetic unit;
and the arithmetic unit responds to the received current instruction to be processed, executes the current instruction to be processed to obtain the result data, and writes the result data into the external memory.
In an optional implementation manner, in response to determining that result data corresponding to the current instruction to be processed needs to be written into an external memory, the instruction processing circuit generates, based on the current instruction to be processed, a second target instruction representing that the result data needs to be written into the external memory, and sends the second target instruction to the arithmetic unit;
and the arithmetic unit responds to the second target instruction, executes the second target instruction, obtains the result data, and writes the result data into the external memory.
In an alternative embodiment, there are multiple said instructions to be processed; the address information of the current instruction to be processed comprises: a first write address;
the instruction processing circuit determines whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed, and the method comprises the following steps:
comparing the first write address of the current instruction to be processed with the second write address of the first instruction; in response to that the first write address is the same as the second write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory;
wherein the first instruction comprises: at least some of the instructions to be processed other than the current instruction to be processed.
In an optional embodiment, the method further comprises:
the instruction processing circuit determines the first instruction from a plurality of the instructions to be processed based on a type of the current instruction to be processed.
In an optional implementation manner, the determining that the result data corresponding to the current instruction to be processed does not need to be written into the external memory in response to the first write address being the same as the second write address includes:
in response to the first write address being the same as the second write address, determining a second instruction from the instructions to be processed based on an issue order of the first instruction and the current instruction to be processed;
comparing the read address of the second instruction with the first write address of the current instruction to be processed;
and in response to that the read address is different from the first write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory.
In an alternative embodiment, the instruction processing circuit includes: a buffer stack, a comparison circuit and an instruction modification circuit; the write-in end of the buffer stack is connected with the controller and used for buffering the instruction to be processed; the output end of the buffer stack is respectively connected with the comparison circuit and the instruction modification circuit; the instruction processing method further comprises:
the comparison circuit reads a first write address of the current instruction to be processed and a second write address of the first instruction from the buffer stack; compares the first write address with the second write address; and, in response to the first write address being the same as the second write address, outputs a first control signal to the instruction modification circuit;
the instruction modification circuit, in response to reading the current instruction to be processed from the buffer stack and receiving the first control signal sent by the comparison circuit, generates the target instruction based on the current instruction to be processed and sends the target instruction to the arithmetic unit.
In an alternative embodiment, the instruction modification circuit generates the target instruction based on the pending instruction, including:
modifying an output control bit in the instruction to be processed into a preset value; or,
adding a preset bit at a preset position of the instruction to be processed, and setting the value of the preset bit as the preset value;
the preset value is used for indicating that result data corresponding to the current instruction to be processed does not need to be written into an external memory.
In an alternative embodiment, the comparison circuit outputting a first control signal to the instruction modification circuit in response to the first write address being the same as the second write address includes:
determining a target register from the buffer stack based on the positions, in the buffer stack, of the registers respectively corresponding to the current instruction to be processed and the first instruction;
reading a read address corresponding to a second instruction from the target register;
comparing the read address with the first write address;
in response to the read address being different from the first write address, outputting a first control signal to the instruction modification circuit.
In an optional embodiment, the method further comprises:
the comparison circuit outputs a second control signal to the instruction modification circuit in response to the first write address and the second write address being different;
the instruction modification circuit, in response to reading the current instruction to be processed from the buffer stack and receiving the second control signal sent by the comparison circuit, sends the current instruction to be processed to the arithmetic unit.
In a third aspect, an embodiment of the present disclosure further provides a data processing chip, including: the instruction processing apparatus according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer device, including: the instruction processing apparatus according to any implementation of the first aspect, or the data processing chip according to the third aspect.
In a fifth aspect, the disclosed embodiments also provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the steps in the second aspect or any one of the possible implementation manners of the second aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an instruction processing apparatus provided by an embodiment of the present disclosure;
FIG. 2 is a first schematic diagram of an instruction processing circuit in the instruction processing apparatus provided by an embodiment of the present disclosure;
FIG. 3 is a second schematic diagram of an instruction processing circuit in the instruction processing apparatus provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of an instruction processing method provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the group consisting of A, B and C.
A convolutional neural network is a feedforward neural network that involves convolution computation and has a deep structure, and is one of the representative algorithms of deep learning. It is an efficient recognition algorithm that has been widely applied in pattern recognition, image processing and other fields in recent years, and is characterized by a simple structure, few training parameters, strong adaptability, and tolerance to translation, rotation, scaling and the like. While convolutional neural networks deliver impressive results in a range of computer vision and machine learning tasks, they are computationally demanding, which limits their deployability. To resolve the conflict between computation amount and speed, an Application Specific Integrated Circuit (ASIC) can be used to accelerate neural network computation.
Taking a convolutional neural network as an example, convolution refers to defining a convolution kernel, sliding it over a feature map, multiplying the values at corresponding positions, and accumulating the products; the final accumulated result is the captured local spatial feature. The intermediate accumulated values are used only for subsequent calculation and do not need to be output; outputting them causes extra power consumption and storage resource conflicts. For example, a convolution extension instruction includes an operation type and operands, but no control bit is provided to determine whether the result of a given accumulation is output. When the instruction set does not control result output and the circuit does not handle it either, the intermediate values keep being output and overwritten, which introduces additional power consumption and conflicts over already tight storage resources.
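To make the cost of outputting intermediate values concrete, the following sketch counts external writes for a one-dimensional multiply-accumulate convolution. It is an illustrative model only: the function `conv1d_accumulate` and the flag `write_intermediates` are hypothetical, and a "write" is simply a counter increment standing in for an external memory access.

```python
def conv1d_accumulate(feature, kernel, write_intermediates):
    """Slide `kernel` over `feature`, multiply-accumulate, and count how many
    times a value would be written out to external memory."""
    external_writes = 0
    outputs = []
    for start in range(len(feature) - len(kernel) + 1):
        acc = 0
        for k, weight in enumerate(kernel):
            acc += feature[start + k] * weight
            is_final = k == len(kernel) - 1
            # Intermediate sums are written only when not suppressed.
            if write_intermediates or is_final:
                external_writes += 1
        outputs.append(acc)
    return outputs, external_writes


feature = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
print(conv1d_accumulate(feature, kernel, write_intermediates=True))   # ([-2, -2, -2], 9)
print(conv1d_accumulate(feature, kernel, write_intermediates=False))  # ([-2, -2, -2], 3)
```

Both runs produce the same outputs, but suppressing the intermediate writes reduces the count from one write per accumulation step to one write per output element.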
Based on the above research, the present disclosure provides an instruction processing apparatus in which an instruction processing circuit processes the instructions to be processed sent by a controller and determines whether to write the result data corresponding to the current instruction to be processed into an external memory. When it determines that the result data does not need to be written into the external memory, it generates a corresponding target instruction; an arithmetic unit then executes the target instruction to obtain the result data without writing it into the external memory. This reduces write accesses to the external memory, reduces power consumption during data processing, and mitigates problems such as storage resource conflicts.
The above drawbacks were identified by the inventors through practice and careful study; therefore, both the process of discovering the above problems and the solutions proposed below by the present disclosure should be regarded as contributions made by the inventors in the course of the present disclosure.
To facilitate understanding of the present embodiment, a detailed description is first given of an instruction processing apparatus provided in an embodiment of the present disclosure.
Referring to fig. 1, a schematic diagram of an instruction processing apparatus according to an embodiment of the disclosure is shown. As shown in fig. 1, the instruction processing apparatus includes: a controller 110, an instruction processing circuit 120, and an arithmetic unit 130 connected in this order;
the controller 110 is configured to send a to-be-processed instruction to the instruction processing circuit 120;
the instruction processing circuit 120 is configured to determine whether to write result data corresponding to a current instruction to be processed into an external memory according to address information of the current instruction to be processed; in response to determining that the result data does not need to be written into an external memory, generating a first target instruction representing that the result data does not need to be written into the external memory based on the current instruction to be processed, and sending the first target instruction to the arithmetic unit;
the arithmetic unit 130 is configured to, in response to receiving a target instruction, execute the target instruction to obtain the result data.
In another embodiment of the present disclosure, the instruction processing circuit 120 is further configured to send the current instruction to be processed to the arithmetic unit in response to determining that result data corresponding to the current instruction to be processed needs to be written into an external memory;
the arithmetic unit 130 is further configured to, in response to receiving the current instruction to be processed, execute the current instruction to be processed to obtain the result data, and write the result data into the external memory.
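The two behaviors of the arithmetic unit 130 can be sketched as a single function that always computes the result but writes it out only when the instruction is not marked. The dictionary-based instruction format, the `no_write` key and the use of `sum` as the operation are hypothetical simplifications chosen for illustration.

```python
def execute(instr, operands, external_memory):
    """Model of the arithmetic unit: always computes the result, and writes it
    to external memory only when the instruction is not marked `no_write`."""
    result = sum(operands)  # stand-in for the real operation
    if not instr.get("no_write", False):
        external_memory[instr["write_addr"]] = result
    return result


memory = {}
# Target instruction: result is produced but not written out.
r1 = execute({"write_addr": 0x100, "no_write": True}, [1, 2], memory)
writes_after_first = len(memory)
# Unmodified instruction: result is produced and written out.
r2 = execute({"write_addr": 0x100}, [1, 2, 3], memory)
print(r1, writes_after_first, r2, memory)
```

After both calls the external memory holds a single entry at address 0x100, written by the second instruction only.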
Next, the controller 110, the instruction processing circuit 120, and the arithmetic unit 130 will be described in detail.
A: for the controller 110: the controller 110 is further configured to receive a command issued by a command issuing device; the command issuing device may include, for example, any one of the following: host virtual machines, containers, applications, or different functions in an application. After the command issuing device issues the command to the instruction processing apparatus, the controller 110 in the instruction processing apparatus can parse the command to generate a plurality of instructions to be processed. By executing the plurality of instructions to be processed, the command issued by the command issuing device can be carried out.
The controller 110 may be a Central Processing Unit (CPU) in a terminal device, the instruction processing circuit 120 may be a plurality of logic circuits in the terminal device, and the arithmetic unit 130 may be a device for executing arithmetic processing in the terminal device, which is not limited herein. After the command issuing device issues the command, the controller 110 receives the command and converts it into instructions that can be processed by the instruction processing circuit 120; after the instruction processing circuit 120 processes the instructions sent by the controller 110, the arithmetic unit 130 performs the final arithmetic processing to obtain the corresponding operation result.
Here, the command issuing device and the controller 110, the instruction processing circuit 120, and the arithmetic unit 130 may transmit signals by electric signals, optical signals, or the like, and are not limited thereto.
B: Regarding the instruction processing circuit 120: there may be multiple instructions to be processed, and they may come from the same command or from multiple commands; the types of different instructions to be processed may be the same or different. An instruction to be processed includes at least one of the following: an opcode and an operand. The operand may be represented by its read address: the operand is read from the storage location corresponding to the read address, and the operation corresponding to the opcode is performed on it to obtain the result data corresponding to the instruction to be processed. In addition, the instruction to be processed may further include a write address, which is used to write the result data of the instruction to the storage location corresponding to the write address.
When two instructions to be processed have the same write address, the result data of the earlier-executed instruction is first stored in the corresponding storage location, and the result data of the later-executed instruction then overwrites it.
In the deep learning field, the result of an instruction is in many cases only an intermediate result. For example, after a command corresponding to a convolution operator is issued to a data processing apparatus, the command can be parsed into a plurality of instructions to be processed. A large number of intermediate results arise while the convolution is executed; these may be stored at the same storage location of the external memory at different processing stages and are finally overwritten by the convolution result of the convolution operator.
If the result data of an instruction to be processed is only an intermediate result, that is, it will be overwritten by the result data of a subsequent instruction to be processed and will not be used as an operand of any other instruction to be processed, it does not need to be stored in the external memory. Thus, by analyzing whether the result data of each of the plurality of instructions to be processed is intermediate data, the way to handle the result data of each instruction to be processed can be determined.
Illustratively, taking a multiply-accumulate calculation as an example, assume that the following command exists:
for(i=0;i<n;i++)
MACC(dst,rs1,rs2)
where MACC is the multiply-accumulate instruction, dst is the write address, and rs1 and rs2 are the read addresses. MACC (multiply-accumulate operations) is also used as a measure of the computational cost of a Convolutional Neural Network (CNN): the total MACC count of a network equals the sum of the MACC counts of its layers. Each layer of the neural network has a write address and read addresses, and one or more calculations are needed for each write address and read address, so a neural network with many layers generates a large amount of computation.
In this command, the MACC operation is repeated multiple times; that is, the command can be parsed into instructions to be processed corresponding to the multiple MACC operations, all of which have the same write address. The result at the write address of one MACC operation is overwritten repeatedly by the results of subsequent MACC operations, and the result of the last MACC operation is the final result of the command. Therefore, when the instructions to be processed corresponding to multiple MACC operations share a write address, each later instruction overwrites the result data written by the earlier one. The earlier write has no substantive effect, yet it incurs additional power consumption and storage resource conflicts. If the way instructions write result data is changed so that, among multiple instructions with the same write address, only the result data of the last MACC operation is written, power consumption can be effectively reduced and problems such as storage resource conflicts can be alleviated.
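The redundant writes described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name run_macc_loop, the dictionary standing in for the external memory, and the accumulator held inside the arithmetic unit are all hypothetical and are not taken from the present disclosure.

```python
def run_macc_loop(a, b, suppress_intermediate_writes):
    """Simulate n MACC operations that all target the same write address.

    `a` and `b` are hypothetical stand-ins for the operands found at rs1/rs2.
    Returns (external_memory, number_of_external_writes).
    """
    external_memory = {}   # hypothetical model of the external memory
    dst = "dst"            # the single write address shared by every MACC
    acc = 0                # running value kept inside the arithmetic unit
    writes = 0
    n = len(a)
    for i in range(n):
        acc += a[i] * b[i]              # one multiply-accumulate step
        is_last = (i == n - 1)
        if is_last or not suppress_intermediate_writes:
            external_memory[dst] = acc  # overwrites the same location each time
            writes += 1
    return external_memory, writes
```

In this sketch both modes leave the same final value at dst; only the number of external writes differs (n without suppression, one with suppression).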
Based on the above principle, the embodiment of the present disclosure determines whether to write result data corresponding to a current instruction to be processed into an external memory according to address information of the current instruction to be processed.
Illustratively, the address information of the current instruction to be processed includes: the first write address.
The instruction processing circuit 120, when determining whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed, is configured to:
comparing the first write address of the current instruction to be processed with the second write address of the first instruction; in response to that the first write address is the same as the second write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory;
wherein the first instruction comprises at least some of the instructions to be processed other than the current instruction to be processed.
In a specific implementation, the instruction processing circuit 120 is further configured to: determining the first instruction from a plurality of the instructions to be processed based on the type of the current instruction to be processed.
For example, the first instruction may be determined from the plurality of instructions to be processed in any one of the following manners (1) or (2):
(1) Performing the following comparison step at least once, until the second write address of the current first instruction is the same as the first write address of the current instruction to be processed, or until all of the other instructions to be processed have been taken as the first instruction:
based on the issue position of the current instruction to be processed in the issue order of all instructions to be processed, and on the issue positions of the other instructions to be processed that have not yet been taken as the first instruction, selecting from those other instructions the one whose issue position is closest to that of the current instruction to be processed as the current first instruction, and comparing the second write address of the current first instruction with the first write address of the current instruction to be processed.
For example, assume that the instructions to be processed are, in issue order: a1, a2, a3, ..., an.
a1 is the current instruction to be processed. First, a2 is taken as the current first instruction, and the first write address of a1 is compared with the second write address of a2;
if the first write address of a1 is not the same as the second write address of a2, a3 is taken as the current first instruction, and the first write address of a1 is compared with the second write address of a3;
if the first write address of a1 is not the same as the second write address of a3, a4 is taken as the current first instruction, and the first write address of a1 is compared with the second write address of a4;
……
finally, the first write address of a1 is compared with the second write address of an; if the two are different, it is confirmed that the result data of a1 needs to be written into the external memory.
If the first write address of a1 and the second write address of the ith first instruction ai are the same in the above example, it is confirmed that there is no need to write the result data of a1 in the external memory.
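The nearest-first comparison of manner (1) can be sketched as follows. This is a minimal illustration, not the disclosed circuit: the function name find_matching_first_instruction is hypothetical, the instructions are represented only by a list of write addresses in issue order, and it is assumed (as in the a1 to an example) that only instructions issued after the current one are compared.

```python
def find_matching_first_instruction(write_addrs, current=0):
    """Manner (1) sketch: scan later instructions in issue order, nearest first.

    `write_addrs[i]` is the write address of the i-th instruction in issue
    order. Returns the index of the first later instruction whose write
    address equals that of `current`, or None if no such instruction exists
    (in which case the result data of `current` must be written out).
    """
    first_write_addr = write_addrs[current]
    for i in range(current + 1, len(write_addrs)):
        # instruction i is the "current first instruction" of this step
        if write_addrs[i] == first_write_addr:
            return i       # stop at the first match; no further comparison
    return None
```

For instance, with write addresses [0x10, 0x20, 0x10, 0x10] and current index 0, the scan stops at index 2 and never examines index 3.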
(2) Determining other instructions to be processed except the current instruction to be processed in the instructions to be processed as first instructions, and then respectively comparing the first write address of the current instruction to be processed with the second write addresses of the first instructions.
And if the second write address of any one first instruction is the same as the first write address of the current instruction to be processed, confirming that the result data of the current instruction to be processed is not required to be written into the external memory.
And if the second write addresses of all the first instructions are different from the first write address of the current instruction to be processed, confirming that the result of the current instruction to be processed needs to be written into the external memory.
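Manner (2) differs from manner (1) in that all candidate first instructions are compared rather than stopping at the first match. A minimal sketch follows; the function names are hypothetical, and the restriction to instructions issued after the current one is an assumption made here for illustration, not a statement of the disclosure.

```python
def matching_first_instructions(write_addrs, current):
    """Manner (2) sketch: compare the current write address against every
    candidate first instruction and collect all matches.

    Assumption: only instructions issued after `current` are candidates.
    """
    first_write_addr = write_addrs[current]
    return [i for i in range(current + 1, len(write_addrs))
            if write_addrs[i] == first_write_addr]


def needs_external_write(write_addrs, current):
    """True when no candidate first instruction shares the write address."""
    return not matching_first_instructions(write_addrs, current)
```

In hardware, such comparisons could be carried out in parallel by one comparator per buffered instruction; the list comprehension here merely stands in for that.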
For some instructions to be processed, although the result data will be overwritten by the result data of another instruction to be processed, it is read as an operand by a further instruction to be processed before being overwritten, so that the further instruction can be executed.
For example, in the above example, the result data of the current instruction to be processed a1 may be overwritten by the result data of the instruction to be processed a5, but executing the instruction to be processed a4 requires reading the result data of a1 as an operand. Therefore, even though the first write address of a1 is the same as the second write address of a5, when a4 is executed before a5 the result data of a1 still needs to be written into the storage space corresponding to the first write address.
Therefore, in another embodiment of the present disclosure, when determining that it is not necessary to write result data corresponding to the current instruction to be processed into the external memory in response to that the first write address is the same as the second write address, the instruction processing circuit 120 may be further specifically configured to:
in response to the first write address being the same as the second write address, determining a second instruction from the instructions to be processed based on the issue order of the first instruction and the current instruction to be processed;
comparing the read address of the second instruction with the first write address of the current instruction to be processed;
and in response to that the read address is different from the first write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory.
In a specific implementation, the second instruction may be determined from the instructions to be processed, based on the issue order of the first instruction and the current instruction to be processed, in either of the following manners (3) or (4):
(3) Where the first instruction is determined in manner (1) above: each instruction to be processed is taken in turn as the first instruction according to the issue order, the first write address of the current instruction to be processed is compared with the second write address of the current first instruction, and the comparison stops as soon as the two are the same. It is therefore only necessary to take the current first instruction whose second write address equals the first write address as a reference instruction, determine each instruction to be processed whose issue position lies between the current instruction to be processed and the reference instruction as a second instruction, and compare the read address of each second instruction with the first write address of the current instruction to be processed.
For example, in the example of (1) above, if the first write address of a1 is the same as the second write address of the i-th first instruction ai, the instructions to be processed a2 to a(i-1) are all determined as second instructions, and the read addresses of a2 to a(i-1) are compared with the first write address of the current instruction to be processed a1.
(4) Where the first instruction is determined in manner (2) above: all instructions to be processed other than the current instruction to be processed are taken as first instructions, and the second write address of each first instruction is compared with the first write address of the current instruction to be processed. If the second write addresses of m first instructions are the same as the first write address, the first instruction among those m whose issue position is closest to that of the current instruction to be processed can be determined as a reference instruction; each instruction to be processed whose issue position lies between the current instruction to be processed and the reference instruction is determined as a second instruction, and the read address of each second instruction is compared with the first write address of the current instruction to be processed.
In the determined second instruction, if the read address of any second instruction is the same as the first write address of the current instruction to be processed, the result data corresponding to the current instruction to be processed needs to be written into the external memory.
And if the read addresses of all the second instructions are different from the first write address of the current instruction to be processed, determining that the result data corresponding to the current instruction to be processed does not need to be written into the external memory.
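The combined write-address and read-address check can be sketched as below. This is an illustrative model only: the Instr dataclass, its field names, and the function result_must_be_written are hypothetical, and the reference instruction is taken to be the nearest later instruction with the same write address, as in manners (3) and (4).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical minimal view of an instruction to be processed."""
    write_addr: int
    read_addrs: Tuple[int, ...] = ()


def result_must_be_written(instrs: List[Instr], current: int) -> bool:
    """Decide whether the result data of instrs[current] must go to external memory."""
    cur = instrs[current]

    # Step 1: find the reference instruction, i.e. the nearest later
    # instruction whose write address equals the current write address.
    ref: Optional[int] = None
    for i in range(current + 1, len(instrs)):
        if instrs[i].write_addr == cur.write_addr:
            ref = i
            break
    if ref is None:
        return True   # nothing overwrites the result, so it must be written

    # Step 2: the second instructions are those issued strictly between the
    # current instruction and the reference instruction. If any of them
    # reads the current write address, the result is still needed.
    for j in range(current + 1, ref):
        if cur.write_addr in instrs[j].read_addrs:
            return True
    return False
```

Applied to the a1, a4, a5 example above, the function reports that the result of a1 must be written when a4 reads the write address of a1, and that it need not be written when no instruction between a1 and a5 reads it.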
Referring to fig. 2 and fig. 3, the embodiment of the present disclosure further provides a specific example of the instruction processing circuit 120, where the instruction processing circuit 120 includes: a register file 121, a comparison circuit 122, and an instruction modification circuit 123;
a write end of the register file 121 is connected to the controller 110, and the register file 121 is configured to buffer the instructions to be processed; the output end of the register file 121 is connected to the comparison circuit 122 and the instruction modification circuit 123 respectively;
the comparison circuit 122 is configured to read a first write address of the current instruction to be processed and a second write address of the first instruction from the register file 121; compare the first write address with the second write address; and output a first control signal to the instruction modification circuit 123 in response to the first write address being the same as the second write address;
the instruction modification circuit 123 is configured to, in response to reading the current instruction to be processed from the register file 121 and receiving the first control signal sent by the comparison circuit 122, generate the target instruction based on the instruction to be processed, and send the target instruction to the arithmetic unit.
For example, the register file 121 may receive a plurality of instructions to be processed issued by the controller 110 and store them so that they can be read by the comparison circuit 122 and the instruction modification circuit 123.
The comparison circuit 122 is the main processing circuit of the instruction processing circuit 120. It is configured to compare the address information of the current instruction to be processed with that of the first instruction, and may output a first control signal to the instruction modification circuit 123 when the first write address is the same as the second write address, or output a second control signal to the instruction modification circuit 123 when the first write address is different from the second write address.
Specifically, when outputting a first control signal to the instruction modification circuit 123 in response to the first write address being the same as the second write address, the comparison circuit 122 is configured to: determine target registers in the register file 121 based on the positions, within the register file 121, of the registers corresponding respectively to the current instruction to be processed and the first instruction; read the read address corresponding to a second instruction from the target registers; compare the read address with the first write address; and, in response to the read address being different from the first write address, output the first control signal to the instruction modification circuit 123. Here, the processing mechanism of the comparison circuit 122 is similar to that described above for the instruction processing circuit 120 and is not described again.
The instruction modification circuit 123 may determine the modification content of the current instruction to be processed based on the comparison result obtained by the comparison circuit 122, and generate a corresponding target instruction.
Specifically, the instruction modification circuit 123, when generating the target instruction based on the instruction to be processed, is configured to:
modifying an output control bit in the instruction to be processed into a preset value; or,
adding a preset bit at a preset position of the instruction to be processed, and setting the value of the preset bit as the preset value;
the preset value is used for indicating that result data corresponding to the current instruction to be processed does not need to be written into an external memory.
Illustratively, the current instruction to be processed is stored in the register file 121. When it is output from the register file 121, its write address is compared with the write addresses of the other instructions of the same type stored in the register file 121; if the write addresses are the same, the preset bit at the preset position in the current instruction to be processed is set to 0, indicating that the result data corresponding to the current instruction to be processed does not need to be written into the external memory. Alternatively, if the current instruction to be processed has no preset bit at the preset position, a 1-bit signal may be added to the current instruction to be processed to indicate that its result data is not written to the external memory.
In another embodiment of the present disclosure, the comparison circuit 122 is further configured to output a second control signal to the instruction modification circuit 123 in response to the first write address and the second write address being different; the instruction modification circuit 123 is further configured to send the current instruction to be processed to the arithmetic unit in response to reading the instruction to be processed from the register file 121 and receiving the second control signal sent by the comparison circuit 122. That is, when it is determined that the external memory needs to be accessed, the corresponding access operation is performed.
Fig. 3 is a second schematic diagram of the instruction processing circuit 120 in the instruction processing apparatus according to the embodiment of the present disclosure. As shown in fig. 3, the instructions to be processed 1, 2, …, n are stored sequentially in the n instruction buffers of the register file 121. Specifically, if any instruction other than the current instruction to be processed satisfies the following two conditions, it is determined that the result data corresponding to the current instruction to be processed does not need to be written into the external memory. First, the write address of that instruction is the same as the write address of the current instruction to be processed; for example, instruction 0 has the same write address as instruction 5. Second, no instruction between that instruction and the current instruction to be processed has a read address that is the same as the write address of the current instruction to be processed; for example, the read addresses of instructions 1 to 4 are all different from the write address of instruction 0. In that case, the preset bit in the current instruction to be processed can be set to 0; if the instruction does not contain the preset bit, an additional 1-bit signal indicates that the result data corresponding to the current instruction to be processed is not written to the external memory.
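The two ways of generating the target instruction can be sketched with integer bit operations. This is only an illustration: the instruction encoding, the bit position, and the function names set_output_control_bit and add_preset_bit are assumptions, since the disclosure does not fix an instruction format.

```python
def set_output_control_bit(word, bit_pos, preset_value):
    """Variant 1: overwrite an existing output control bit with the preset value.

    `word` is the instruction encoded as a non-negative integer and `bit_pos`
    is the (hypothetical) position of its output control bit.
    """
    cleared = word & ~(1 << bit_pos)                  # clear the control bit
    return cleared | ((preset_value & 1) << bit_pos)  # write the preset value


def add_preset_bit(word, width, preset_value):
    """Variant 2: the instruction has no control bit, so one bit is appended.

    The new bit is placed just above the original `width` bits (an assumed
    preset position). Returns (new_word, new_width).
    """
    original = word & ((1 << width) - 1)              # keep the original bits
    return ((preset_value & 1) << width) | original, width + 1
```

For example, clearing bit 31 of a 32-bit instruction word corresponds to the case in the text where the preset bit is set to 0 to indicate that no write to the external memory is needed.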
C: Regarding the arithmetic unit 130: the arithmetic unit 130 includes, for example, a two-dimensional Processing Engine (PE) array and a register array (local register file); each PE includes computing elements, such as a multiplier-adder, for performing specific computing tasks.
The arithmetic unit 130, in response to receiving the target instruction, executes the target instruction, resulting in the result data.
In another embodiment of the present disclosure, when it is determined that result data corresponding to the current instruction to be processed needs to be written into the external memory, the current instruction to be processed is sent to the arithmetic unit; in response to receiving the current instruction to be processed, the arithmetic unit 130 executes the current instruction to be processed to obtain the result data, and writes the result data into the external memory.
The arithmetic unit 130 may perform read/write access on the connected external memory, and correspondingly, the external memory may store the data transmitted during the read/write access performed by the connected arithmetic unit 130. For example, the arithmetic unit includes at least one PE array; each PE may be connected to a different external memory, or multiple PEs may be connected to the same external memory.
In the embodiment of the present disclosure, the target instruction includes a calculation instruction for calculating the result data corresponding to the instruction to be processed, and an indication of whether to write the result data into the external memory.
Specifically, after it is determined that the result data corresponding to the current instruction to be processed does not need to be written into the external memory, the arithmetic unit 130 receives the generated target instruction, which includes an instruction for calculating the result data and an indication not to write it into the external memory. The arithmetic unit 130 therefore calculates the corresponding result data but does not write it into the external memory; instead, the result data is retained in the arithmetic unit 130 for subsequent processing.
After it is determined that the result data corresponding to the current instruction to be processed needs to be written into the external memory, the arithmetic unit 130 receives the generated target instruction, which includes an instruction for calculating the result data and an indication to write it into the external memory. The arithmetic unit 130 therefore calculates the corresponding result data and also writes the obtained result data into the external memory.
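The behaviour of the arithmetic unit for the two kinds of target instruction can be modelled as follows. This is a minimal sketch: the class ArithmeticUnit, its attributes, and the execute signature are hypothetical, and dictionaries stand in for the local register file and the external memory.

```python
class ArithmeticUnit:
    """Hypothetical software model of the arithmetic unit 130."""

    def __init__(self):
        self.local_registers = {}   # stand-in for the local register file
        self.external_memory = {}   # stand-in for the external memory
        self.external_writes = 0    # counts write accesses to external memory

    def execute(self, op, operands, write_addr, write_external):
        """Compute the result; write it out only if the instruction says so."""
        result = op(*operands)
        # The result is always retained locally for subsequent processing.
        self.local_registers[write_addr] = result
        if write_external:
            self.external_memory[write_addr] = result
            self.external_writes += 1
        return result
```

A target instruction marked as not needing an external write leaves external_memory untouched while its result remains available in local_registers.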
In the instruction processing apparatus provided in the embodiment of the present disclosure, the controller 110 sends the instructions to be processed to the instruction processing circuit 120. The instruction processing circuit 120 processes the instructions sent by the controller 110, determines whether to write the result data corresponding to the current instruction to be processed into the external memory, and, when it determines that the result data does not need to be written into the external memory, generates a corresponding target instruction. The arithmetic unit 130 executes the target instruction to obtain the corresponding result data, which is not written into the external memory. This reduces write accesses to the external memory, lowers power consumption during data processing, and alleviates problems such as storage resource conflicts.
Based on the same inventive concept, an instruction processing method corresponding to the instruction processing apparatus is also provided in the embodiments of the present disclosure, and since the principle of solving the problem of the method in the embodiments of the present disclosure is similar to that of the instruction processing apparatus in the embodiments of the present disclosure, the implementation of the method can refer to the implementation of the apparatus, and repeated details are not described again.
An execution subject of the instruction processing method provided by the embodiment of the present disclosure is generally a computer device with certain computing capability, and the computer device includes, for example: terminal equipment or a server or other processing equipment, and the instruction processing method can be realized by calling computer readable instructions stored in a memory by a processor.
Referring to fig. 4, fig. 4 is a flowchart of an instruction processing method provided by the embodiment of the present disclosure. The instruction processing method is applied to an instruction processing apparatus that includes a controller, an instruction processing circuit, and an arithmetic unit connected in sequence. The instruction processing method comprises steps S401 to S403, wherein:
S401: the controller sends an instruction to be processed to the instruction processing circuit;
S402: the instruction processing circuit determines, according to the address information of the current instruction to be processed, whether to write result data corresponding to the current instruction to be processed into an external memory; in response to determining that the result data does not need to be written into the external memory, it generates, based on the current instruction to be processed, a first target instruction representing that the result data does not need to be written into the external memory, and sends the first target instruction to the arithmetic unit;
S403: the arithmetic unit, in response to receiving the target instruction, executes the target instruction to obtain the result data.
In an optional embodiment, the method further comprises:
the instruction processing circuit responds to the fact that the result data corresponding to the current instruction to be processed needs to be written into an external memory, and sends the current instruction to be processed to the arithmetic unit;
and the arithmetic unit responds to the received current instruction to be processed, executes the current instruction to be processed to obtain the result data, and writes the result data into the external memory.
In an optional implementation manner, the instruction processing circuit is further configured to, in response to determining that result data corresponding to the current instruction to be processed needs to be written into an external memory, generate, based on the current instruction to be processed, a second target instruction representing that the result data needs to be written into the external memory, and send the second target instruction to the arithmetic unit;
the arithmetic unit is further configured to execute the second target instruction in response to receiving the second target instruction, obtain the result data, and write the result data into the external memory.
In an alternative embodiment, there are multiple said instructions to be processed; the address information of the current instruction to be processed comprises: a first write address;
the instruction processing circuit determines whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed, and the method comprises the following steps:
comparing the first write address of the current instruction to be processed with the second write address of the first instruction; in response to that the first write address is the same as the second write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory;
wherein the first instruction comprises at least some of the instructions to be processed other than the current instruction to be processed.
In an optional embodiment, the method further comprises:
the instruction processing circuit determines the first instruction from a plurality of the instructions to be processed based on a type of the current instruction to be processed.
In an optional implementation manner, the determining that the result data corresponding to the current instruction to be processed does not need to be written into the external memory in response to the first write address being the same as the second write address includes:
in response to the first write address being the same as the second write address, determining a second instruction from the instructions to be processed based on the issue order of the first instruction and the current instruction to be processed;
comparing the read address of the second instruction with the first write address of the current instruction to be processed;
and in response to that the read address is different from the first write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory.
In an alternative embodiment, the instruction processing circuit includes: a register file, a comparison circuit, and an instruction modification circuit; the write end of the register file is connected with the controller, and the register file is used for buffering the instructions to be processed; the output end of the register file is connected with the comparison circuit and the instruction modification circuit respectively; the instruction processing method further comprises:
the comparison circuit reads a first write address of the current instruction to be processed and a second write address of the first instruction from the register file; compares the first write address with the second write address; and, in response to the first write address being the same as the second write address, outputs a first control signal to the instruction modification circuit;
the instruction modification circuit, in response to reading the current instruction to be processed from the register file and receiving the first control signal sent by the comparison circuit, generates the target instruction based on the instruction to be processed and sends the target instruction to the arithmetic unit.
In an alternative embodiment, the instruction modification circuit generates the target instruction based on the pending instruction, including:
modifying an output control bit in the instruction to be processed into a preset value; or,
adding a preset bit at a preset position of the instruction to be processed, and setting the value of the preset bit as the preset value;
the preset value is used for indicating that result data corresponding to the current instruction to be processed does not need to be written into an external memory.
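The two ways of generating the target instruction can be illustrated with simple bit operations. The sketch below is hypothetical: the 32-bit instruction width, the bit positions, and the choice of `1` as the preset value are assumptions made for illustration and are not specified by the disclosure.

```python
INSTR_WIDTH = 32       # assumed width of the original instruction word
OUTPUT_CTRL_POS = 31   # assumed position of an existing output control bit
PRESET_VALUE = 1       # assumed encoding: 1 = "do not write to external memory"


def set_output_control_bit(instr: int) -> int:
    """Option 1: overwrite an existing output control bit with the preset value."""
    cleared = instr & ~(1 << OUTPUT_CTRL_POS)
    return cleared | (PRESET_VALUE << OUTPUT_CTRL_POS)


def append_preset_bit(instr: int) -> int:
    """Option 2: add a new bit at a preset position (here, just above the word)."""
    original = instr & ((1 << INSTR_WIDTH) - 1)
    return (PRESET_VALUE << INSTR_WIDTH) | original


def skips_external_write(target: int, pos: int) -> bool:
    """Check whether the bit at `pos` carries the preset value."""
    return ((target >> pos) & 1) == PRESET_VALUE


instr = 0x0000_1234
t1 = set_output_control_bit(instr)   # 0x8000_1234
t2 = append_preset_bit(instr)        # 0x1_0000_1234
```

With the second option the original instruction word is left intact (`t2 & 0xFFFFFFFF == instr`), at the cost of a wider target instruction.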
In an alternative embodiment, the comparison circuit outputting a first control signal to the instruction modification circuit in response to the first write address being the same as the second write address includes:
determining a target buffer stack from the buffer stack based on the positions, in the buffer stack, of the registers respectively corresponding to the current instruction to be processed and the first instruction;
reading a read address corresponding to a second instruction from the target buffer stack;
comparing the read address with the first write address;
in response to the read address being different from the first write address, outputting the first control signal to the instruction modification circuit.
In an optional embodiment, the method further comprises:
the comparison circuit outputs a second control signal to the instruction modification circuit in response to the first write address and the second write address being different;
the instruction modification circuit, in response to reading the current instruction to be processed from the buffer stack and receiving the second control signal sent by the comparison circuit, sends the current instruction to be processed to the arithmetic unit.
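The interplay of the buffer stack, the comparison circuit, and the instruction modification circuit can be modeled in software roughly as follows. This is a simplified sketch: the names `Control`, `compare`, `modify`, and `dispatch` are hypothetical, the buffer stack is modeled as a `deque`, the target instruction is modeled by a boolean flag rather than a control bit, and the read-address refinement described above is deliberately omitted.

```python
from collections import deque
from dataclasses import dataclass, replace
from enum import Enum
from typing import Deque, List


class Control(Enum):
    FIRST = 1    # write addresses match: generate a target instruction
    SECOND = 2   # write addresses differ: forward the instruction unchanged


@dataclass(frozen=True)
class Instruction:
    opcode: str
    write_addr: int
    write_external: bool = True  # stand-in for the output control bit


def compare(buffer: Deque[Instruction]) -> Control:
    """Comparison circuit: compare the head's write address with the others."""
    current, rest = buffer[0], list(buffer)[1:]
    same = any(i.write_addr == current.write_addr for i in rest)
    return Control.FIRST if same else Control.SECOND


def modify(current: Instruction, signal: Control) -> Instruction:
    """Modification circuit: rewrite the instruction only on the first signal."""
    if signal is Control.FIRST:
        return replace(current, write_external=False)
    return current


def dispatch(instructions: List[Instruction]) -> List[Instruction]:
    """Drain the buffer, returning what would be sent to the arithmetic unit."""
    buffer = deque(instructions)
    issued = []
    while buffer:
        signal = compare(buffer)
        issued.append(modify(buffer.popleft(), signal))
    return issued


program = [Instruction("conv", 0x100), Instruction("relu", 0x100),
           Instruction("pool", 0x200)]
flags = [i.write_external for i in dispatch(program)]  # [False, True, True]
```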
According to the embodiments of the present disclosure, the instruction processing circuit processes the instruction to be processed sent by the controller and determines whether the result data corresponding to the current instruction to be processed needs to be written into the external memory; when it determines that the result data does not need to be written into the external memory, it generates a corresponding target instruction, and the arithmetic unit executes the target instruction to obtain the corresponding result data.
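To make the effect on the external memory concrete, the toy model below executes the same instruction sequence with and without a skipped write. It is an assumption-laden sketch, not the disclosed arithmetic unit: `Op`, `execute`, and the use of a dictionary as "external memory" are hypothetical, and the computation itself is replaced by a constant value.

```python
from typing import Dict, List, NamedTuple, Tuple


class Op(NamedTuple):
    write_addr: int
    value: int                   # stand-in for the computed result data
    write_external: bool = True  # False models a target instruction


def execute(program: List[Op]) -> Tuple[Dict[int, int], int]:
    """Run the program; return the external memory contents and write count."""
    external: Dict[int, int] = {}
    writes = 0
    for op in program:
        result = op.value  # the result data is always produced
        if op.write_external:
            external[op.write_addr] = result
            writes += 1
    return external, writes


baseline = [Op(0x100, 1), Op(0x100, 2), Op(0x200, 3)]
optimized = [Op(0x100, 1, False), Op(0x100, 2), Op(0x200, 3)]

mem_a, writes_a = execute(baseline)
mem_b, writes_b = execute(optimized)
```

In this model both runs leave the external memory in the same final state, while the second run performs two external writes instead of three.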
It will be understood by those skilled in the art that, in the above method, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible inherent logic.
An embodiment of the present disclosure further provides a data processing chip, including: an instruction processing apparatus as claimed in any one of the embodiments of the present disclosure.
The data processing chip provided by the embodiment of the present disclosure may include a graphics processor, an Artificial Intelligence (AI) chip, and the like.
An embodiment of the present disclosure further provides a computer device, including: an instruction memory, and either the instruction processing apparatus provided by the embodiments of the present disclosure or the data processing chip provided by the embodiments of the present disclosure.
The computer device provided by the embodiment of the present disclosure may include an intelligent terminal such as a mobile phone, or may also be other devices, servers, and the like that may be used for performing instruction processing, and is not limited herein.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the instruction processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute steps of the instruction processing method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the scope of protection of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and should be construed as being included therein. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.

Claims (14)

1. An instruction processing apparatus, comprising: a controller, an instruction processing circuit, and an arithmetic unit connected in sequence;
the controller is used for sending an instruction to be processed to the instruction processing circuit;
the instruction processing circuit is used for determining whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed; in response to determining that the result data does not need to be written into an external memory, generating a first target instruction representing that the result data does not need to be written into the external memory based on the current instruction to be processed, and sending the first target instruction to the arithmetic unit;
and the arithmetic unit is used for, in response to receiving the target instruction, executing the target instruction to obtain the result data.
2. The apparatus according to claim 1, wherein the instruction processing circuit is further configured to, in response to determining that result data corresponding to the current instruction to be processed needs to be written into an external memory, send the current instruction to be processed to the arithmetic unit;
the arithmetic unit is further configured to execute the current instruction to be processed in response to receiving the current instruction to be processed, obtain the result data, and write the result data into the external memory.
3. The apparatus according to claim 1, wherein the instruction processing circuit is further configured to, in response to determining that result data corresponding to the current instruction to be processed needs to be written into an external memory, generate, based on the current instruction to be processed, a second target instruction that indicates that the result data needs to be written into the external memory, and send the second target instruction to the arithmetic unit;
the arithmetic unit is further configured to execute the second target instruction in response to receiving the second target instruction, obtain the result data, and write the result data into the external memory.
4. The apparatus according to any one of claims 1-3, wherein there are a plurality of said pending instructions; the address information of the current instruction to be processed comprises: a first write address;
the instruction processing circuit, when determining whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed, is configured to:
comparing the first write address of the current instruction to be processed with the second write address of the first instruction; in response to the first write address being the same as the second write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory;
wherein the first instruction comprises: at least some of the instructions to be processed other than the current instruction to be processed.
5. The apparatus of claim 4, wherein the instruction processing circuit is further configured to: determining the first instruction from a plurality of the instructions to be processed based on the type of the current instruction to be processed.
6. The apparatus according to claim 4 or 5, wherein the instruction processing circuit, when determining that it is not necessary to write the result data corresponding to the current instruction to be processed into the external memory in response to the first write address being the same as the second write address, is configured to:
in response to the first write address being the same as the second write address, determining a second instruction from the instructions to be processed based on the issue order of the first instruction and the current instruction to be processed;
comparing the read address of the second instruction with the first write address of the current instruction to be processed;
and, in response to the read address being different from the first write address, determining that result data corresponding to the current instruction to be processed does not need to be written into the external memory.
7. The apparatus of any of claims 4-6, wherein the instruction processing circuit comprises: a buffer stack, a comparison circuit, and an instruction modification circuit;
the write end of the buffer stack is connected with the controller and is used for buffering the instructions to be processed; the output end of the buffer stack is connected with the comparison circuit and the instruction modification circuit respectively;
the comparison circuit is used for reading a first write address of the current instruction to be processed and a second write address of the first instruction from the buffer stack; comparing the first write address with the second write address; and, in response to the first write address being the same as the second write address, outputting a first control signal to the instruction modification circuit;
the instruction modification circuit is used for, in response to reading the current instruction to be processed from the buffer stack and receiving the first control signal sent by the comparison circuit, generating the target instruction based on the current instruction to be processed and sending the target instruction to the arithmetic unit.
8. The apparatus of claim 7, wherein the instruction modification circuit, in generating the target instruction based on the instruction to be processed, is to:
modifying an output control bit in the instruction to be processed into a preset value; or,
adding a preset bit at a preset position of the instruction to be processed, and setting the value of the preset bit as the preset value;
the preset value is used for indicating that result data corresponding to the current instruction to be processed does not need to be written into an external memory.
9. The apparatus of claim 7 or 8, wherein the comparison circuit, when outputting a first control signal to the instruction modification circuit in response to the first write address being the same as the second write address, is to:
determining a target buffer stack from the buffer stack based on the positions, in the buffer stack, of the registers respectively corresponding to the current instruction to be processed and the first instruction;
reading a read address corresponding to a second instruction from the target buffer stack;
comparing the read address with the first write address;
in response to the read address being different from the first write address, outputting the first control signal to the instruction modification circuit.
10. The apparatus of any of claims 7-8, wherein the comparison circuit is further configured to: in response to the first write address and the second write address being different, output a second control signal to the instruction modification circuit;
the instruction modification circuit is further configured to send the current instruction to be processed to the arithmetic unit in response to reading the current instruction to be processed from the buffer stack and receiving the second control signal sent by the comparison circuit.
11. An instruction processing method, applied to an instruction processing apparatus, wherein the instruction processing apparatus comprises: a controller, an instruction processing circuit, and an arithmetic unit connected in sequence; the instruction processing method comprises the following steps:
the controller sends an instruction to be processed to the instruction processing circuit;
the instruction processing circuit determines whether to write result data corresponding to the current instruction to be processed into an external memory according to the address information of the current instruction to be processed; in response to determining that the result data does not need to be written into an external memory, generating a first target instruction representing that the result data does not need to be written into the external memory based on the current instruction to be processed, and sending the first target instruction to the arithmetic unit;
and the arithmetic unit, in response to receiving the target instruction, executes the target instruction to obtain the result data.
12. A data processing chip, comprising: an instruction processing apparatus as claimed in any one of claims 1 to 10.
13. A computer device, comprising: an instruction processing apparatus as claimed in any one of claims 1 to 10, or comprising a data processing chip as claimed in claim 12.
14. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the instruction processing method according to claim 11.
CN202111433398.4A 2021-11-29 2021-11-29 Instruction processing device and method, computer equipment and storage medium Pending CN114090466A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111433398.4A CN114090466A (en) 2021-11-29 2021-11-29 Instruction processing device and method, computer equipment and storage medium
PCT/CN2022/120992 WO2023093260A1 (en) 2021-11-29 2022-09-23 Instruction processing apparatus and method, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111433398.4A CN114090466A (en) 2021-11-29 2021-11-29 Instruction processing device and method, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114090466A (en) 2022-02-25

Family

ID=80305675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111433398.4A Pending CN114090466A (en) 2021-11-29 2021-11-29 Instruction processing device and method, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114090466A (en)
WO (1) WO2023093260A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023093260A1 (en) * 2021-11-29 2023-06-01 上海商汤智能科技有限公司 Instruction processing apparatus and method, and computer device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308808A1 (en) * 2016-04-26 2017-10-26 Paypal, Inc Machine learning system
CN111026540B (en) * 2018-10-10 2022-12-02 上海寒武纪信息科技有限公司 Task processing method, task scheduler and task processing device
CN111783958A (en) * 2020-07-03 2020-10-16 中用科技有限公司 Data processing system, method, device and storage medium
CN114090466A (en) * 2021-11-29 2022-02-25 上海阵量智能科技有限公司 Instruction processing device and method, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2023093260A1 (en) 2023-06-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40061873; Country of ref document: HK)