WO2020078446A1 - Computation method and apparatus, and related product - Google Patents

Computation method and apparatus, and related product

Info

Publication number
WO2020078446A1
WO2020078446A1 (PCT/CN2019/111852; CN2019111852W)
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
macro instruction
macro
running
data
Prior art date
Application number
PCT/CN2019/111852
Other languages
French (fr)
Chinese (zh)
Inventor
兰慧盈
杜子东
Original Assignee
中科寒武纪科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201811221763.3A external-priority patent/CN111079912B/en
Priority claimed from CN201811222476.4A external-priority patent/CN111079907B/en
Priority claimed from CN201811222530.5A external-priority patent/CN111079915B/en
Priority claimed from CN201811221685.7A external-priority patent/CN111079911B/en
Priority claimed from CN201811222454.8A external-priority patent/CN111078280B/en
Priority claimed from CN201811222553.6A external-priority patent/CN111078125B/en
Priority claimed from CN201811221780.7A external-priority patent/CN111079913B/en
Priority claimed from CN201811221806.8A external-priority patent/CN111079925B/en
Priority claimed from CN201811221778.XA external-priority patent/CN111078291B/en
Priority claimed from CN201811221788.3A external-priority patent/CN111078282B/en
Priority claimed from CN201811220957.1A external-priority patent/CN111079924B/en
Priority claimed from CN201811221789.8A external-priority patent/CN111078283B/en
Priority claimed from CN201811221783.0A external-priority patent/CN111078281B/en
Priority claimed from CN201811222491.9A external-priority patent/CN111078284B/en
Priority claimed from CN201811222555.5A external-priority patent/CN111078285B/en
Priority claimed from CN201811220923.2A external-priority patent/CN111079910B/en
Priority claimed from CN201811222523.5A external-priority patent/CN111079914B/en
Priority claimed from CN201811222531.XA external-priority patent/CN111079916B/en
Priority claimed from CN201811220922.8A external-priority patent/CN111079909B/en
Priority claimed from CN201811221643.3A external-priority patent/CN111078293B/en
Application filed by 中科寒武纪科技股份有限公司 filed Critical 中科寒武纪科技股份有限公司
Publication of WO2020078446A1 publication Critical patent/WO2020078446A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present disclosure relates to the field of information processing technology, and in particular, to a neural network instruction generation method, device, and related products.
  • With the continuous development of technology, neural network algorithms are used more and more widely, and have been applied successfully in image recognition, speech recognition, natural language processing, and other fields.
  • However, running a large-scale neural network model based on a Graphics Processing Unit (GPU) or a Central Processing Unit (CPU) requires a large amount of computation time and consumes a lot of power.
  • Existing methods of accelerating the processing speed of a neural network model have problems such as an inability to process across platforms, low processing efficiency, high development cost, and proneness to error.
  • a neural network instruction generation device comprising:
  • a device determination module configured to determine a running device that executes the macro instruction according to the received macro instruction
  • an instruction generation module, used to generate a running instruction according to the macro instruction and the running device.
  • a machine learning computing device comprising:
  • one or more of the neural network instruction generation devices described in the first aspect above, used to obtain the data to be computed and the control information from other processing devices, perform the specified machine learning operation, and pass the execution result to the other processing devices through the I/O interface;
  • the machine learning operation device includes a plurality of the neural network instruction generation devices
  • the plurality of the neural network instruction generation devices can be connected and transmit data through a specific structure
  • a plurality of the neural network instruction generation devices may be interconnected and transmit data through a fast external device interconnect bus to support larger-scale machine learning operations; the plurality of devices may share the same control system or have their own control systems; they may share memory or each have their own memory; and the interconnection topology of the plurality of devices may be arbitrary.
  • a combined processing device comprising:
  • the machine learning operation device interacts with the other processing device to jointly complete the calculation operation specified by the user.
  • a machine learning chip including the machine learning computing device described in the second aspect above or the combined processing device described in the third aspect above.
  • a machine learning chip packaging structure including the machine learning chip described in the fourth aspect above.
  • a board card including the machine learning chip packaging structure described in the fifth aspect above.
  • an electronic device including the machine learning chip described in the above fourth aspect or the board described in the sixth aspect above.
  • a neural network instruction generation method comprising:
  • an operation instruction is generated.
  • the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, headphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
  • the vehicle includes an airplane, a ship, and/or a car;
  • the household appliance includes a TV, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood;
  • the medical equipment includes an MRI machine, a B-mode ultrasound scanner, and/or an electrocardiograph.
  • the neural network instruction generation method, device, and related products provided by the embodiments of the present disclosure involve a device determination module and an instruction generation module.
  • the device determination module is used to determine a running device that executes the macro instruction based on the received macro instruction.
  • the instruction generation module is used to generate the running instruction according to the macro instruction and the running device.
  • the method, device, and related products can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low probability of error, and low labor and material costs for development.
  • FIG. 1 shows a schematic diagram of a processor of an instruction generation and processing method according to an embodiment of the present disclosure.
  • FIG. 2 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
  • FIG. 3 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
  • FIG. 4 shows a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure.
  • FIG. 5 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure.
  • FIG. 6 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure.
  • FIG. 7 shows a flowchart of a neural network instruction processing method according to an embodiment of the present disclosure.
  • FIGS. 8a and 8b are schematic diagrams illustrating application scenarios of a neural network instruction generation device and a processing system according to an embodiment of the present disclosure.
  • FIGS. 9a and 9b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
  • FIG. 10 shows a schematic structural diagram of a board according to an embodiment of the present disclosure.
  • the term “if” may be interpreted as “when” or “once” or “in response to a determination” or “in response to a detection” depending on the context.
  • the phrase “if determined” or “if [the described condition or event] is detected” may be interpreted, depending on the context, to mean “once determined”, “in response to a determination”, “once [the described condition or event] is detected”, or “in response to detection of [the described condition or event]”.
  • the instruction generation and instruction processing methods may be applied to a processor, which may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) used to perform artificial intelligence operations. Artificial intelligence operations can include machine learning operations, brain-like operations, and so on. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, etc.
  • the artificial intelligence processor may include, for example, one or a combination of GPU (Graphics Processing Unit), NPU (Neural-network Processing Unit), DSP (Digital Signal Processor), and FPGA (Field-Programmable Gate Array) chips. The present disclosure does not limit the specific type of processor.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit may independently run various assigned tasks, such as convolution operation tasks, pooling tasks, or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks executed by the processing unit.
  • FIG. 1 shows a schematic diagram of a processor of an instruction generation and processing method according to an embodiment of the present disclosure.
  • the processor 100 includes a plurality of processing units 101 and a storage unit 102.
  • the plurality of processing units 101 are used to execute an instruction sequence
  • the storage unit 102 is used to store data, and may include random access memory (RAM) and a register file.
  • the multiple processing units 101 in the processor 100 can share a part of the storage space, for example, share a part of the RAM storage space and the register file, and can also have their own storage spaces at the same time.
  • the device includes a device determination module 11 and an instruction generation module 12.
  • the device determination module 11 is used to determine a running device that executes the macro instruction according to the received macro instruction.
  • the instruction generation module 12 is used to generate an operation instruction according to the macro instruction and the operation device.
  • A macro instruction (macro) is a term from batch processing.
  • A macro instruction can be a rule, pattern, or syntactic replacement: whenever the macro instruction is encountered, it is automatically replaced according to that rule or pattern.
  • A macro instruction can be formed by integrating instructions to be executed that are commonly used for calculation, control, and data transfer.
  • the macro instruction may include at least one of the following: a calculation macro instruction, a control macro instruction, and a data handling macro instruction.
  • the calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction.
  • the control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction.
  • the data handling macro instruction may include at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction.
  • the write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
  • the macro instruction may include at least one of the following: the identification of the designated device, the operation type, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters.
  • the running instruction may include at least one of the following: the operation type, the input address, the output address, the operands, and the instruction parameters.
  • the identification of the designated device may be the physical address, IP address, name, or number of the designated device.
  • the identification may include one or any combination of numbers, letters, and symbols.
  • the input address may be an input address of data, a read address, etc. to obtain data
  • the output address may be an output address of processed data, a write address, etc. to store data.
  • the input amount may be information characterizing the size of the data, such as the input scale and input length of the data.
  • the output amount may be information characterizing the size of the data, such as the output scale and output length of the data.
  • Operands can include the length of the register, the address of the register, the identification of the register, the immediate value, and so on.
  • an immediate is a number given directly in an instruction that uses the immediate addressing mode.
  • the instruction parameter may refer to a parameter corresponding to the macro instruction and related to its execution.
  • the instruction parameter may be the address and length of the second operand.
  • the instruction parameters may be the size of the convolution kernel, the step size of the convolution kernel, the filling of the convolution kernel, and so on.
  • A macro instruction must include an operation code and at least one operation field, where the operation code is the operation type and the operation field includes the identification of the specified device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters.
  • The operation code may be the part or field of the instruction (usually represented by a code) that specifies the operation to be performed; it is an instruction sequence number used to inform the device executing the instruction which operation is required.
  • The operation field may be the source of all data required to execute the corresponding instruction, including the parameter data, the data to be operated on or processed, and the corresponding operation method, or the addresses at which the parameter data, the data to be operated on or processed, and the corresponding operation method are stored.
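The field layout described above can be sketched as a small data structure. This is a minimal illustration, not the patent's actual encoding; all field names and the example values are invented for clarity:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MacroInstruction:
    """Illustrative layout: an operation code plus an operation field
    holding the remaining (all optional) components."""
    op_type: str                          # operation code, e.g. "CONV"
    device_id: Optional[str] = None       # identification of the designated device
    input_addr: Optional[int] = None      # read address of the data to be processed
    output_addr: Optional[int] = None     # write address for the processed data
    input_amount: Optional[int] = None    # input scale/length (data-size hint)
    output_amount: Optional[int] = None   # output scale/length (data-size hint)
    operands: List[str] = field(default_factory=list)  # registers, immediates, ...
    params: List[int] = field(default_factory=list)    # e.g. kernel size, stride, padding

# A hypothetical convolution macro instruction designated to device "09"
m = MacroInstruction(op_type="CONV", device_id="09",
                     input_addr=0x1000, output_addr=0x2000,
                     input_amount=4096, params=[3, 1, 1])
assert m.op_type == "CONV" and m.device_id == "09"
```

A running instruction would use the same layout minus the device identification and amounts, per the list above.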
  • the device determination module 11 may determine one or more running devices according to the macro instruction.
  • the instruction generation module 12 may generate one or more running instructions. When multiple running instructions are generated, they may be executed on the same running device or on different running devices, which is not limited in the present disclosure.
  • the neural network instruction generation device includes a device determination module and an instruction generation module.
  • the device determination module is used to determine a running device that executes the macro instruction according to the received macro instruction.
  • the instruction generation module is used to generate the operation instruction according to the macro instruction and the operation equipment.
  • the device can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low probability of error, and low labor and material costs for development.
  • FIG. 3 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
  • the device may further include a macro instruction generation module 13.
  • the macro generation module 13 is used to receive the instruction to be executed, and generate a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the designated device may be determined according to the operation type, input amount, output amount, etc. of the instruction to be executed.
  • the received instruction to be executed may be one or multiple instructions.
  • the instruction to be executed may include at least one of the following: a calculation instruction to be executed, a control instruction to be executed, and a data handling instruction to be executed.
  • the calculation instruction to be executed may include at least one of a neural network calculation instruction to be executed, a vector logic calculation instruction to be executed, a matrix vector calculation instruction to be executed, a scalar calculation instruction to be executed and a scalar logic calculation instruction to be executed.
  • the control instruction to be executed may include at least one of an unconditional jump instruction to be executed and a conditional jump instruction to be executed.
  • the data carrying instruction to be executed may include at least one of a reading instruction to be executed and a writing instruction to be executed.
  • the read instruction to be executed may include at least one of a read neuron instruction to be executed, a read synapse instruction to be executed and a read scalar instruction to be executed.
  • the write instruction to be executed may include at least one of a write neuron instruction to be executed, a write synapse instruction to be executed and a write scalar instruction to be executed.
  • the instruction to be executed may include at least one of the following options: operation type, input address, output address, input amount, output amount, operand, and instruction parameter.
  • the determined identifier of the designated device may be added to the instruction to be executed to generate a macro instruction.
  • For example, a certain instruction m to be executed is "XXX ... param", where XXX is the operation type and param is the instruction parameter.
  • The designated device m-1 can be determined according to the operation type "XXX" of the instruction m to be executed. Then the identifier of the designated device m-1 (for example, 09) is added to the instruction m, generating the macro instruction M "XXX09, ... param" corresponding to the instruction m.
  • When there are multiple instructions to be executed, the identifier of the specified device corresponding to each instruction to be executed can be added to that instruction, and then a single macro instruction can be generated from the plurality of instructions carrying the identifiers, or multiple corresponding macro instructions can be generated.
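The "XXX ... param" example above can be sketched as follows. The device lookup table and the string format are assumptions made for illustration; the patent does not specify how the designated device is stored or how the identifier is spliced in:

```python
# Hypothetical mapping from operation type to designated-device identifier
DESIGNATED_DEVICE = {"XXX": "09", "CONV": "02"}

def generate_macro(instr: str) -> str:
    """Sketch of the macro generation module: insert the identifier of
    the designated device (chosen from the operation type) into the
    instruction to be executed, yielding the macro instruction."""
    op_type, _, rest = instr.partition(" ")
    dev_id = DESIGNATED_DEVICE[op_type]          # designated device, e.g. m-1
    return f"{op_type}{dev_id}, {rest}" if rest else f"{op_type}{dev_id}"

print(generate_macro("XXX param"))  # -> XXX09, param
```

With multiple instructions to be executed, the same step could be applied per instruction before merging them into one macro instruction or emitting one macro instruction each.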
  • the device determination module 11 may include a first determination submodule 111.
  • the first determining submodule 111 is used to determine that the designated device is the running device when it is determined that the macro instruction includes the identifier of the designated device and the resources of the designated device satisfy the execution conditions for executing the macro instruction.
  • the execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
  • the macro instruction may include the identifier of one or more designated devices that execute the macro instruction.
  • the first determination submodule 111 can directly determine the specified device as the running device, saving the time of generating the running instruction based on the macro instruction and ensuring that the generated running instruction can be executed by the corresponding running device.
  • the device may further include a resource acquisition module 14.
  • the device determination module 11 may further include a second determination sub-module 112.
  • the resource acquisition module 14 is used to acquire resource information of candidate devices.
  • the second determining submodule 112 is used to determine, when it is determined that the macro instruction does not contain the identifier of a specified device, the running device for executing the macro instruction from among the candidate devices according to the received macro instruction and the resource information of the candidate devices.
  • the resource information may include the instruction set included in the candidate device.
  • the instruction set included in a candidate device may be the instruction set corresponding to the operation type of one or more macro instructions. The more instruction sets a candidate device contains, the more types of macro instructions that candidate device can execute.
  • the second determination submodule 112 may determine one or more running devices capable of executing the macro instruction from the candidate devices.
  • the instruction set of the determined running device includes the instruction set corresponding to the macro instruction. For example, if the received macro instruction is a neural network calculation macro instruction, a candidate device containing the instruction set corresponding to neural network calculation macro instructions may be determined as the running device, ensuring that it can execute the generated running instruction.
  • the device determination module 11 may further include a third determination sub-module 113.
  • the third determining submodule 113 is used to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction contains the identifier of a designated device but the resources of the designated device do not satisfy the execution conditions for executing the macro instruction.
  • When the third determining submodule 113 determines that the macro instruction includes the identifier of a designated device but the resources of the designated device do not satisfy the execution conditions, it may conclude that the designated device does not have the ability to execute the macro instruction.
  • The third determining submodule 113 may then determine the running device from among the candidate devices, for example by determining a candidate device containing the instruction set corresponding to the macro instruction as the running device.
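The three determining submodules can be read as one decision procedure: use the designated device when it satisfies the execution condition (first submodule), otherwise select from the candidates by instruction set (second submodule when no identifier is present, third submodule when the designated device is unable). A minimal sketch, with the instruction-set check reduced to an operation-type lookup and all names invented:

```python
from types import SimpleNamespace

def determine_running_devices(macro, candidates):
    """Sketch of the device determination module.  `candidates` maps a
    device identifier to the set of operation types whose instruction
    sets that device contains (an illustrative simplification)."""
    def satisfies(dev_id):
        # execution condition: the device contains the instruction set
        # corresponding to the macro instruction
        return macro.op_type in candidates.get(dev_id, set())

    if macro.device_id is not None and satisfies(macro.device_id):
        # first submodule: the designated device becomes the running device
        return [macro.device_id]
    # second submodule (no identifier) / third submodule (identifier
    # present but condition unmet): choose running devices from candidates
    return [d for d, iset in candidates.items() if macro.op_type in iset]

macro = SimpleNamespace(op_type="CONV", device_id="09")
candidates = {"09": {"CONV", "POOL"}, "10": {"CONV"}, "11": {"FC"}}
assert determine_running_devices(macro, candidates) == ["09"]
```

A real implementation would also weigh storage capacity and other resource information, as described below.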
  • the macro instruction may include at least one of an input amount and an output amount
  • the instruction generation module 12 is further used to determine the data amount of the macro instruction, and to generate the running instruction according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running device.
  • the data amount of the macro instruction may be determined according to at least one of the input amount and the output amount, and the resource information of the running device may further include at least one of its storage capacity and remaining storage capacity.
  • the storage capacity of the running device may refer to the amount of binary information that the memory of the running device can hold.
  • the remaining storage capacity of the running device may refer to the storage capacity currently available for instruction operations after subtracting the occupied storage capacity.
  • the resource information of the running device characterizes its running capability: the larger the storage capacity and the remaining storage capacity, the stronger the running capability of the device.
  • the instruction generation module 12 may determine the specific method of splitting the macro instruction according to the resource information of each running device, the data amount of the macro instruction, and so on, so as to split the macro instruction and generate the running instructions corresponding to each running device.
  • the instruction generation module 12 may include a first instruction generation sub-module 121.
  • the first instruction generation sub-module 121 is used, when it is determined that there is one running device and the resources of the running device do not meet the capacity condition for executing the macro instruction, to split the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount of the macro instruction, so that the running device executes the multiple running instructions in sequence.
  • the running data amount of the running device may be determined according to the resource information of the running device; each running instruction may include at least one of a running input amount and a running output amount, both of which may be determined based on the running data amount.
  • the amount of operation data of the operation device may be determined according to the storage capacity or remaining storage capacity of the operation device.
  • the capacity condition may be that the running data amount of the running device is greater than or equal to the data amount of the macro instruction.
  • Conversely, that the resources of the running device do not meet the capacity condition may mean that the running data amount of the running device is less than the data amount of the macro instruction.
  • The running input amount and running output amount must be less than or equal to the running data amount, to ensure that the generated running instructions can be executed by the running device.
  • the running input amounts (or running output amounts) of different running instructions among the plurality of running instructions may be the same or different, which is not limited by the present disclosure.
  • When the first instruction generation sub-module 121 determines that there is only one running device and the resources of that device meet the capacity condition for executing the macro instruction, it can directly convert the macro instruction into a running instruction, or it can still split the macro instruction into multiple running instructions; this disclosure does not limit the choice.
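The capacity condition and the splitting step can be sketched numerically. Here a macro instruction is reduced to its data amount and a running instruction to its chunk size; real splitting would also rewrite addresses and operands. The function name and greedy chunking strategy are assumptions for illustration:

```python
def split_macro(macro_data_amount: int, run_data_amount: int):
    """Sketch of the first instruction generation sub-module: if the
    running device's running data amount covers the macro instruction's
    data amount (the capacity condition), one running instruction
    suffices; otherwise split into chunks the device can hold."""
    if macro_data_amount <= run_data_amount:   # capacity condition met
        return [macro_data_amount]
    chunks, remaining = [], macro_data_amount
    while remaining > 0:
        # each running input/output amount stays <= the running data amount
        step = min(run_data_amount, remaining)
        chunks.append(step)
        remaining -= step
    return chunks

assert split_macro(10, 4) == [4, 4, 2]   # split into three running instructions
assert split_macro(3, 4) == [3]          # capacity condition met: no split
```

For multiple running devices, the same chunking would be applied per device using each device's own running data amount.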
  • the instruction generation module 12 may include a second instruction generation sub-module 122.
  • the second instruction generation sub-module 122 is configured, when it is determined that there are multiple running devices, to split the macro instruction according to the running data amount of each running device and the data amount of the macro instruction, and generate the running instruction corresponding to each running device.
  • the running data amount of each running device can be determined according to the resource information of that device, and each running instruction can include at least one of a running input amount and a running output amount.
  • The running input amount and running output amount are determined based on the running data amount of the running device that executes the running instruction.
  • The running input amount and running output amount need to be less than or equal to the running data amount, to ensure that the generated running instruction can be executed by the running device.
  • the second instruction generation sub-module 122 may generate, according to the running data amount of each running device, one or more running instructions for each running device to execute.
  • When a running instruction includes at least one of the running input amount and the running output amount, this not only limits the data amount of the running instruction so that it can be executed by the corresponding running device, but can also meet the special requirements that different running instructions place on the running input amount and/or running output amount.
  • Alternatively, the running input amount and/or running output amount may be omitted from the running instruction, with a default running input amount and a default running output amount preset, so that when the running device finds no running input amount or running output amount in a received running instruction, it uses the preset defaults as the input and output amounts of that instruction.
  • the default input amount and the default output amount for different types of macro instructions may be preset.
  • When a macro instruction contains neither an input amount nor an output amount, the corresponding preset default input amount and default output amount can be used as the input amount and output amount of the macro instruction.
  • In this case, the data amount of the macro instruction is determined according to the default input amount and/or the default output amount,
  • and the running instruction is generated according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device.
  • The generated running instruction may include neither the running input amount nor the running output amount, or may include at least one of them.
  • The running device may then execute the running instruction according to the preset default running input amount and/or default running output amount.
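The default-amount fallback described above can be sketched as a per-type lookup. The table values and function name are invented for illustration; the patent only says the defaults are preset per macro instruction type:

```python
# Hypothetical preset defaults: (default input amount, default output amount)
DEFAULTS = {"CONV": (1024, 1024), "POOL": (512, 512)}

def effective_amounts(op_type, input_amount=None, output_amount=None):
    """When a macro (or running) instruction omits an amount, the preset
    default for its operation type is used in its place."""
    d_in, d_out = DEFAULTS[op_type]
    return (input_amount if input_amount is not None else d_in,
            output_amount if output_amount is not None else d_out)

assert effective_amounts("CONV") == (1024, 1024)           # both defaulted
assert effective_amounts("POOL", input_amount=64) == (64, 512)
```

The resulting amounts then feed the data-amount calculation used for splitting, as described above.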
  • the instruction generation module 12 may also split the macro instruction to generate a running instruction according to the macro instruction and the preset macro instruction splitting rule.
  • the macro instruction splitting rule may be determined according to conventional macro instruction splitting methods (for example, splitting according to the processing of the macro instruction), combined with a threshold on the running data amount of instructions that all candidate devices can execute.
  • For example, the storage capacities (or remaining storage capacities) of all candidate devices can be compared, and the minimum storage capacity (or remaining storage capacity) can be taken as the threshold on the running data amount of instructions that all candidate devices can execute.
  • The running instruction generated by the instruction generation module according to the macro instruction may be an instruction to be executed, or one or more parsed instructions obtained by parsing the instruction to be executed, which is not limited in the present disclosure.
  • the device may further include a queue construction module 15.
  • the queue construction module 15 is used to sort the running instructions according to queue sorting rules and to construct, from the sorted running instructions, an instruction queue corresponding to each running device.
  • an instruction queue uniquely corresponding to each running device can be constructed.
  • the running instructions in the instruction queue can be ordered, and each running instruction can be sent to its uniquely corresponding running device; alternatively, the whole instruction queue can be sent to the running device, so that the running device executes the running instructions in the order in which they appear in the queue.
  • having the running device execute the running instructions according to the instruction queue prevents the running instructions from being executed erroneously, delayed, or missed.
  • the queue sorting rules may be determined based on information related to the running instructions themselves, such as the estimated execution time of each running instruction, its generation time, the operation input amount, the operation output amount, and the operation type; the present disclosure does not limit this.
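By way of a hedged sketch, one possible queue sorting rule orders running instructions by estimated execution time, breaking ties by generation time. The field names below are assumptions made for illustration, not part of the present disclosure:

```python
# Illustrative queue-sorting rule: shortest estimated execution time first,
# earlier generation time breaking ties.

def build_instruction_queue(run_instructions):
    return sorted(
        run_instructions,
        key=lambda ins: (ins["estimated_time"], ins["generated_at"]),
    )

queue = build_instruction_queue([
    {"id": "i2", "estimated_time": 5, "generated_at": 1},
    {"id": "i0", "estimated_time": 2, "generated_at": 3},
    {"id": "i1", "estimated_time": 2, "generated_at": 2},
])
print([ins["id"] for ins in queue])  # -> ['i1', 'i0', 'i2']
```

Because `sorted` is stable, instructions with identical keys keep their original relative order.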
  • the device may further include an instruction dispatch module 16.
  • the instruction dispatch module 16 is used to send the operation instruction to the operation device, so that the operation device executes the operation instruction.
  • when there is only one operation instruction to be executed by the operation device, the operation instruction may be sent to the operation device directly.
  • when there are multiple operation instructions, all of them may be sent to the operation device at once, so that the operation device executes them in sequence; alternatively, the operation instructions may be sent to the corresponding operation device one by one, where each time the operation device completes the current operation instruction, the next operation instruction corresponding to that device is sent to it.
  • a person skilled in the art may set the manner of sending the operation instruction to the operation device, and the disclosure does not limit this.
  • the instruction dispatch module 16 may include an instruction assembly submodule 161, an assembly translation submodule 162, and an instruction sending submodule 163.
  • the instruction assembly submodule 161 is used to generate an assembly file according to the operation instruction.
  • the assembly translation submodule 162 is used to translate the assembly file into a binary file.
  • the instruction sending submodule 163 is used to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the data amount of the running instruction can be reduced, the time for sending the running instruction to the running device can be saved, and the conversion and execution speed of the macro instruction can be improved.
  • the running device can decode the received binary file to obtain a corresponding running instruction, and execute the obtained running instruction to obtain an execution result.
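The assemble, translate, and send pipeline above can be sketched as follows. The mnemonics, opcode table, and byte layout are invented for illustration and are not the encoding mandated by the present disclosure:

```python
# Hedged sketch of the dispatch pipeline: render run instructions as an
# assembly-like text file, translate it to a compact binary form, and hand
# the bytes to the running device.

OPCODES = {"LOAD": 0x01, "COMPUTE": 0x02, "STORE": 0x03}

def assemble(run_instructions):
    # One mnemonic plus operand per line, mimicking an assembly file.
    return "\n".join(f"{op} {arg}" for op, arg in run_instructions)

def translate(assembly_text):
    # Each line becomes a 1-byte opcode followed by a 1-byte operand.
    out = bytearray()
    for line in assembly_text.splitlines():
        mnemonic, arg = line.split()
        out += bytes([OPCODES[mnemonic], int(arg)])
    return bytes(out)

def send(binary, device_buffer):
    # Stand-in for the transport to the running device.
    device_buffer.extend(binary)

program = [("LOAD", 16), ("COMPUTE", 2), ("STORE", 32)]
buf = bytearray()
send(translate(assemble(program)), buf)
print(buf.hex())  # -> '011002020320'
```

Sending the binary form rather than the textual assembly is what reduces the data amount transferred to the running device.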
  • the running device may be one of, or any combination of, a CPU, a GPU, and an embedded neural network processor (Neural-network Processing Unit, NPU for short). In this way, the speed at which the device generates the running instruction according to the macro instruction is increased.
  • the device may be set in the CPU and / or NPU.
  • by carrying out the process of generating the running instruction according to the macro instruction through the CPU and / or NPU, more possible ways are provided for implementing the device.
  • the present disclosure provides a running device for executing the running command generated by the neural network command generating device.
  • the running device includes a control module and an execution module.
  • the control module is used to obtain data, a neural network model, and running instructions; it can also be used to parse the running instructions to obtain multiple parsing instructions, and to send the multiple parsing instructions and the data to the execution module.
  • the execution module is used to execute multiple parsing instructions based on the data to obtain the execution result.
  • the running device further includes a storage module.
  • the storage module may include at least one of a register and a cache, and the cache may include a high-speed temporary storage cache.
  • the cache can be used to store the data, and the registers can be used to store scalar data within the data.
  • control module may include an instruction storage submodule and an instruction processing submodule.
  • the instruction storage sub-module is used to store operation instructions.
  • the instruction processing submodule is used to parse the running instruction and obtain multiple parsed instructions.
  • control module may further include a storage queue sub-module.
  • the storage queue sub-module is used to store a running instruction queue, and the running instruction queue contains the running instructions to be executed by the running device and multiple parsing instructions. In the running instruction queue, all instructions are arranged in the order of execution.
  • the execution module may further include a dependency processing sub-module.
  • the dependency processing submodule is used to cache the first parsing instruction in the instruction storage submodule when it determines that there is an association between the first parsing instruction and a zeroth parsing instruction preceding it; after the execution of the zeroth parsing instruction is completed, the first parsing instruction is extracted from the instruction storage submodule and sent to the execution module.
  • the association between the first parsing instruction and the zeroth parsing instruction preceding it may be that a first storage address interval storing the data required by the first parsing instruction overlaps a zeroth storage address interval storing the data required by the zeroth parsing instruction. Conversely, the absence of an association between the two may be that the first storage address interval and the zeroth storage address interval have no overlapping area.
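A minimal sketch of this dependency test follows. The `(start, end)` half-open interval representation and the `addr_range` field name are assumptions made for illustration:

```python
# Sketch of the association test described above: two parsed instructions
# are associated when their storage address intervals overlap.

def intervals_overlap(first, zeroth):
    f_start, f_end = first
    z_start, z_end = zeroth
    # Half-open intervals [start, end) overlap iff each starts before the
    # other ends.
    return f_start < z_end and z_start < f_end

def has_dependency(first_ins, zeroth_ins):
    # The first instruction must be cached until the zeroth finishes if
    # they touch overlapping storage regions.
    return intervals_overlap(first_ins["addr_range"], zeroth_ins["addr_range"])

print(has_dependency({"addr_range": (0, 64)}, {"addr_range": (32, 96)}))   # -> True
print(has_dependency({"addr_range": (0, 64)}, {"addr_range": (64, 128)}))  # -> False
```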
  • the present disclosure provides a neural network instruction processing system.
  • the system includes the aforementioned neural network instruction generating device and the aforementioned operating device.
  • FIG. 4 shows a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure.
  • the method is applied to the above neural network instruction generating device, and the method includes step S41 and step S42.
  • in step S41, the running device that executes the macro instruction is determined according to the received macro instruction.
  • in step S42, a running instruction is generated according to the macro instruction and the running device.
  • step S41 may include: when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution condition of executing the macro instruction, determining the designated device as the running device.
  • the execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
  • the method may further include: acquiring resource information of the candidate device.
  • step S41 may further include: when it is determined that the macro instruction does not contain the identifier of a specified device, determining, according to the received macro instruction and the resource information of the candidate devices, a running device for executing the macro instruction from among the candidate devices.
  • the resource information may include the instruction set included in the candidate device.
  • step S41 may further include: when it is determined that the macro instruction includes the identifier of the specified device, but the resources of the specified device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
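The three branches of step S41 can be sketched together in Python. The data structures, field names, and the "first matching candidate wins" fallback are assumptions for illustration, not the selection policy mandated by the present disclosure:

```python
# Illustrative device selection following step S41: prefer the device named
# in the macro instruction when it supports the required instruction set;
# otherwise fall back to any candidate device that does.

def select_running_device(macro, candidates):
    needed = macro["instruction_set"]
    specified = macro.get("device_id")
    if specified is not None:
        dev = candidates.get(specified)
        # Execution condition: the specified device contains an instruction
        # set corresponding to the macro instruction.
        if dev is not None and needed in dev["instruction_sets"]:
            return specified
    # No usable specified device: choose from candidates by resource info.
    for dev_id, dev in candidates.items():
        if needed in dev["instruction_sets"]:
            return dev_id
    return None

candidates = {
    "cpu0": {"instruction_sets": {"scalar"}},
    "npu0": {"instruction_sets": {"scalar", "neural"}},
}
print(select_running_device({"instruction_set": "neural", "device_id": "cpu0"},
                            candidates))  # -> 'npu0'
```

Here the specified device `cpu0` fails the execution condition, so the selection falls through to the candidate devices.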
  • the macro instruction may include at least one of input quantity and output quantity.
  • Step S42 may include: determining the data amount of the macro instruction, and generating an operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operating device.
  • the data amount may be determined according to at least one of the input amount and the output amount.
  • the resource information of the operating device may further include at least one item of storage capacity and remaining storage capacity.
  • generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: when it is determined that there is one operation device and the resources of the operation device do not satisfy the capacity condition for executing the macro instruction, splitting the macro instruction into multiple operation instructions according to the operation data amount of the operation device and the data amount, so that the operation device executes the multiple operation instructions in sequence.
  • the operation data amount of the operation device may be determined according to the resource information of the operation device; each operation instruction may include at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
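A minimal sketch of this single-device case follows. The chunking-by-offset scheme and parameter names are assumptions for illustration only:

```python
# Illustrative split of one macro instruction into sequential run
# instructions when a single device cannot hold the whole data amount.

def split_macro(total_amount, run_data_amount, base_addr=0):
    """Yield (input_address, run_input_amount) pairs covering total_amount."""
    chunks = []
    offset = 0
    while offset < total_amount:
        # Each chunk takes at most the device's per-instruction data amount.
        amount = min(run_data_amount, total_amount - offset)
        chunks.append((base_addr + offset, amount))
        offset += amount
    return chunks

print(split_macro(10, 4))  # -> [(0, 4), (4, 4), (8, 2)]
```

The device then executes the resulting run instructions in sequence, each within its capacity.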
  • generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: when it is determined that there are multiple operation devices, splitting the macro instruction according to the operation data amount of each operation device and the data amount, to generate an operation instruction corresponding to each operation device.
  • the operation data amount of each operation device can be determined according to the resource information of that operation device, and each operation instruction can include at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
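The multi-device case can be sketched as follows. The round-robin assignment and the `device -> run data amount` mapping are illustrative assumptions, not the splitting algorithm mandated by the present disclosure:

```python
# Illustrative split of a macro instruction across several operation
# devices, each piece sized by that device's run-data amount.

def split_across_devices(total_amount, device_amounts):
    """Assign (device, amount) pieces until total_amount is covered.

    device_amounts maps device id -> run data amount that device can take
    per instruction (assumed positive); devices are filled round-robin in
    dict order.
    """
    pieces = []
    remaining = total_amount
    while remaining > 0:
        for dev, cap in device_amounts.items():
            take = min(cap, remaining)
            if take:
                pieces.append((dev, take))
                remaining -= take
            if remaining == 0:
                break
    return pieces

print(split_across_devices(10, {"npu0": 4, "npu1": 3}))
# -> [('npu0', 4), ('npu1', 3), ('npu0', 3)]
```

Each piece becomes one operation instruction whose input and output amounts fit the device that will execute it.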
  • the method may further include: sorting the execution instructions according to the queue sorting rule, and constructing an instruction queue corresponding to the execution device according to the sorted execution instructions.
  • the method may further include: receiving an instruction to be executed, and generating a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the method may further include: sending the operation instruction to the operation device, so that the operation device executes the operation instruction.
  • sending the running instruction to the running device so that the running device executes the running instruction includes: generating an assembly file according to the running instruction; translating the assembly file into a binary file; and sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the resource information may include at least one of the storage capacity of the candidate device, the remaining storage capacity, and the instruction set included in the candidate device.
  • the running device may be one or any combination of CPU, GPU, and NPU.
  • the method can be applied to the CPU and / or NPU.
  • the macro instruction may include at least one of the following instructions: a calculation macro instruction, a control macro instruction, and a data handling macro instruction.
  • the calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction.
  • the control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction.
  • the data handling macro instruction may include at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction.
  • the write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
  • the macro instruction may include at least one of the following options: the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameters.
  • the run instruction can contain at least one of the following options: operation type, input address, output address, operand, and instruction parameters.
  • the method for generating a neural network instruction determines the operation device for executing the macro instruction according to the received macro instruction; and generates the operation instruction according to the macro instruction and the operation device.
  • the method can be used across platforms, has good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material cost for development.
  • the present disclosure also provides a neural network instruction execution method.
  • the method is applied to the above operation device.
  • the method includes: acquiring data, a neural network model, and a running instruction through the running device; parsing the running instruction to obtain multiple parsing instructions; and executing the multiple parsing instructions based on the data to obtain an execution result.
  • the method may further include: storing data and scalar data in the data through the running device.
  • the running device includes a storage module
  • the storage module includes one or more of a register and a cache
  • the cache includes a high-speed temporary storage cache.
  • the cache is used to store the data, and the registers are used to store scalar data within the data.
  • the method may further include:
  • the running instruction queue is stored by the running device.
  • the running instruction queue includes the running instruction and a plurality of parsing instructions.
  • in the running instruction queue, the running instruction and the multiple parsing instructions are arranged according to the order in which they are to be executed.
  • the method may further include:
  • when the running device determines that there is an association between the first parsing instruction and a zeroth parsing instruction preceding it, the first parsing instruction is cached, and after the execution of the zeroth parsing instruction is completed, the cached first parsing instruction is executed.
  • the association between the first parsing instruction and the zeroth parsing instruction preceding it includes: a first storage address interval storing the data required by the first parsing instruction overlapping a zeroth storage address interval storing the data required by the zeroth parsing instruction.
  • the method for executing a neural network instruction obtains data, a neural network model, and a running instruction through a running device, parses the running instruction to obtain multiple parsing instructions, and executes the multiple parsing instructions based on the data to obtain an execution result.
  • the method can be used across platforms, has good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material cost for development.
  • the present disclosure also provides a neural network instruction processing method.
  • the method is applied to a neural network instruction processing system.
  • the neural network instruction processing system includes the aforementioned neural network instruction generating device and the aforementioned operating device.
  • the method includes the above-mentioned neural network instruction generation method applied to the neural network instruction generation device and the neural network instruction execution method applied to the running equipment.
  • the method can be used across platforms, has good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material cost for development.
  • a neural network instruction generating device comprising:
  • a device determination module configured to determine a running device that executes the macro instruction according to the received macro instruction
  • the instruction generation module is used to generate an operation instruction according to the macro instruction and the operation device.
  • the device determination module includes:
  • the first determining submodule is configured to determine the designated device as the running device when it is determined that the macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the macro instruction ,
  • the execution condition includes: the specified device contains an instruction set corresponding to the macro instruction.
  • Clause B3 The device according to Clause B2, the device further comprising:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module also includes:
  • a second determining submodule, configured to determine, from among the candidate devices, a running device for executing the macro instruction based on the received macro instruction and the resource information of the candidate devices when it is determined that the macro instruction does not include the identifier of the designated device,
  • the resource information includes an instruction set included in the candidate device.
  • Clause B4 The apparatus according to Clause B3, the device determination module, further comprising:
  • a third determining submodule, configured to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution condition for executing the macro instruction.
  • Clause B5. The device according to any one of Clauses B2-B4, the macro instruction includes at least one of an input quantity and an output quantity,
  • the instruction generation module is also used to determine the data amount of the macro instruction, and generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operating device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes:
  • the first instruction generating submodule is used to, when it is determined that there is one operation device and the resources of the operation device do not satisfy the capacity condition for executing the macro instruction, split the macro instruction into multiple running instructions according to the operation data amount of the operation device and the data amount, to enable the running device to execute the multiple running instructions in sequence,
  • the operation data amount of the operation device is determined according to the resource information of the operation device; each operation instruction includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • the instruction generation module includes:
  • a second instruction generation submodule, used to split the macro instruction according to the operation data amount of each operation device and the data amount when it is determined that there are a plurality of operation devices, and to generate a running instruction corresponding to each operation device,
  • the operation data amount of each operation device is determined according to the resource information of that operation device, and the operation instruction includes at least one of an operation input amount and an operation output amount; the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause B8 The device according to Clause B1, the device further comprising:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause B9 The device according to Clause B2, the device further comprising:
  • the macro generation module is used for receiving the instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • Clause B10 The device according to Clause B1, the device further comprising:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the device is set in the CPU and / or NPU;
  • the macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling macro instruction,
  • the calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction
  • the data handling macro instruction includes at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction.
  • the write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
  • the macro instruction includes at least one of the following options: the identification, operation type, input address, output address, input amount, output amount, operand and instruction parameter of the designated device for executing the macro instruction,
  • the run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
  • Clause B12. A machine learning computing device, the device comprising:
  • one or more neural network instruction generation devices as described in any one of Clause B1 to Clause B11, used to obtain data to be calculated and control information from other processing devices, perform a designated machine learning operation, and pass the execution result to other processing devices through an I / O interface;
  • the machine learning operation device includes a plurality of the neural network instruction generation devices
  • the plurality of the neural network instruction generation devices can be connected and transmit data through a specific structure
  • a plurality of the neural network instruction generation devices may interconnect and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of neural network instruction generation devices may share the same control system or have their own respective control systems; they may share memory or have their own memories; and the interconnection topology of the plurality of neural network instruction generation devices may be arbitrary.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause B12, a general interconnection interface, and other processing devices;
  • the machine learning operation device interacts with the other processing device to jointly complete the calculation operation specified by the user.
  • Clause B14 the combined processing device according to Clause B13, further comprising: a storage device, which is connected to the machine learning computing device and the other processing device, respectively, for storing the machine learning computing device and the Data from other processing devices.
  • Clause B15. A machine learning chip, the machine learning chip includes:
  • Clause B16. An electronic device, the electronic device comprising:
  • a board card includes: a storage device, an interface device and a control device, and a machine learning chip as described in Clause B15;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • the storage device includes: multiple sets of storage units, each of which is connected to the machine learning chip through a bus, and the storage unit is: DDR SDRAM;
  • the machine learning chip includes: a DDR controller for controlling data transmission and data storage of each storage unit;
  • the interface device is: a standard PCIE interface.
  • a neural network instruction generation method comprising:
  • an operation instruction is generated.
  • determining a running device that executes the macro instruction includes:
  • when it is determined that the macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution condition for executing the macro instruction, determining the designated device as the running device,
  • the execution condition includes: the specified device contains an instruction set corresponding to the macro instruction.
  • Clause B21 The method according to Clause B20, the method further comprising:
  • determining a running device for executing the macro instruction includes:
  • a running device for executing the macro instruction is determined from among the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • determining a running device that executes the macro instruction includes:
  • when it is determined that the macro instruction includes the identifier of the designated device, and the resources of the designated device do not satisfy the execution condition for executing the macro instruction, the running device is determined according to the macro instruction and the resource information of the candidate devices.
  • the macro instruction includes at least one of an input quantity and an output quantity
  • generating an operation instruction includes:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device includes:
  • the macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
  • the operation data amount of the operation device is determined according to the resource information of the operation device; each operation instruction includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device includes:
  • the macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
  • the operation data amount of each operation device is determined according to the resource information of that operation device, and the operation instruction includes at least one of an operation input amount and an operation output amount; the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause B26 The method according to Clause B19, the method further comprising:
  • the running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
  • Clause B27 The method according to Clause B20, the method further comprising:
  • Clause B28 The method according to Clause B19, the method further comprising:
  • sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
  • the running device is one or any combination of CPU, GPU and NPU;
  • the method is applied to CPU and / or NPU;
  • the macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling macro instruction,
  • the calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction
  • the data handling macro instruction includes at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction.
  • the write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
  • the macro instruction includes at least one of the following options: an identifier of a designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and an instruction parameter,
  • the run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
  • FIG. 5 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure. As shown in FIG. 5, the system includes an instruction generation device 100 and an operation device 200.
  • the instruction generation device 100 includes a device determination module 11 and an instruction generation module 12.
  • the device determination module 11 is used to determine a running device that executes the macro instruction according to the received macro instruction.
  • the instruction generation module 12 is used to generate an operation instruction according to the macro instruction and the operation device.
  • the operation device 200 includes a control module 21 and an execution module 22.
  • the control module 21 is used to obtain the required data, the neural network model, and the operation instructions, analyze the operation instructions, and obtain multiple analysis instructions.
  • the execution module 22 is used to execute multiple parsing instructions according to the data to obtain the execution result.
  • a macro instruction is a term for batch processing.
  • a macro instruction can be a rule, a pattern, or a syntax replacement.
  • a macro instruction processing device, system, etc. will automatically perform this rule or pattern replacement when it encounters the macro instruction.
  • the macro instruction can be formed by integrating the instructions to be executed that are commonly used for calculation, control and transportation of data.
  • the macro instruction may include at least one of the following: a calculation macro instruction, a control macro instruction, and a data handling macro instruction.
  • the calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction.
  • the control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction.
  • the data handling macro instruction may include at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction.
  • the write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
  • the macro instruction may include at least one of the following options: the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameter.
  • the run instruction can contain at least one of the following options: operation type, input address, output address, operand, and instruction parameters.
  • the identifier of the designated device may be the physical address, IP address, name, or number of the designated device.
  • the identification may include one or any combination of numbers, letters, and symbols.
  • the input address may be an input address of the data, a read address for obtaining the data, etc.;
  • the output address may be an output address of the processed data, a write address for storing the data, etc.
  • the input amount may be information characterizing the size of the data such as the input scale and input length of the data.
  • the output amount may be information indicating the size of the data, such as the output scale and output length of the data.
  • the operand may include one or more of the length of the register, the address of the register, the identification of the register, the immediate value, and so on.
  • the immediate value is a number given directly in an instruction that uses the immediate addressing mode.
  • the instruction parameter may refer to a parameter corresponding to the macro instruction and related to its execution.
  • the instruction parameter may be the address and length of the second operand.
  • the instruction parameters may be the size of the convolution kernel, the step size of the convolution kernel, the filling of the convolution kernel, and so on.
  • a macro instruction must include an operation code and at least one operation field, where the operation code is the operation type, and the operation field includes the identifier of the designated device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters.
  • the operation code may be the part of the instruction or field (usually represented by a code) specified in a computer program to perform an operation, and is an instruction sequence number used to inform the device executing the instruction which instruction needs to be executed.
  • the operation field may be the source of all data required to execute the corresponding instruction, including parameter data, the data to be operated on or processed, and the corresponding operation method, or the addresses where the parameter data, the data to be operated on or processed, and the corresponding operation method are stored.
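The operation code plus operation fields described above can be pictured with a minimal sketch. All field names and the example values below are illustrative assumptions for exposition, not the patent's concrete encoding:

```python
# Hypothetical representation of a macro instruction: an operation code
# (op_type) plus operation fields. Field names are assumptions.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MacroInstruction:
    op_type: str                      # operation code, e.g. "CONV", "JUMP"
    device_id: Optional[str] = None   # identifier of the designated device
    input_addr: Optional[int] = None
    output_addr: Optional[int] = None
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operands: tuple = ()
    params: dict = field(default_factory=dict)  # instruction parameters

m = MacroInstruction(op_type="CONV", device_id="09",
                     input_addr=0x1000, output_addr=0x2000,
                     input_amount=1024, output_amount=512)
```

Only `op_type` is mandatory here, mirroring the requirement that a macro instruction must include an operation code and at least one operation field.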
  • the device determination module 11 may determine one or more running devices according to the macro instruction.
  • the instruction generation module 12 may generate one or more running instructions. When multiple running instructions are generated, they may be executed in the same running device or in different running devices, which is not limited in the present disclosure.
  • the neural network instruction processing system includes an instruction generation device and an operation device.
  • the instruction generating device includes a device determining module for determining a running device that executes the macro instruction based on the received macro instruction; an instruction generating module is used for generating a running instruction based on the macro instruction and the running device.
  • the operation device includes a control module for acquiring the required data, a neural network model, and operation instructions, and analyzing the operation instructions to obtain multiple analysis instructions; and an execution module for executing the multiple analysis instructions based on the data to obtain execution results.
  • the neural network instruction processing system provided by the embodiments of the present disclosure can be used across platforms, has good applicability, fast instruction conversion speed, high processing efficiency, low error probability, and low development labor and material costs.
  • the instruction generation device 100 may further include a macro instruction generation module 13.
  • the macro instruction generation module 13 is used to receive the instruction to be executed, and to generate a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the designated device may be determined according to the operation type, input amount, output amount, etc. of the instruction to be executed.
  • the received instruction to be executed may be one or multiple instructions.
  • the instruction to be executed may include at least one of the following: a calculation instruction to be executed, a control instruction to be executed, and a data handling instruction to be executed.
  • the calculation instruction to be executed may include at least one of a neural network calculation instruction to be executed, a vector logic calculation instruction to be executed, a matrix vector calculation instruction to be executed, a scalar calculation instruction to be executed and a scalar logic calculation instruction to be executed.
  • the control instruction to be executed may include at least one of an unconditional jump instruction to be executed and a conditional jump instruction to be executed.
  • the data carrying instruction to be executed may include at least one of a reading instruction to be executed and a writing instruction to be executed.
  • the read instruction to be executed may include at least one of a read neuron instruction to be executed, a read synapse instruction to be executed and a read scalar instruction to be executed.
  • the write instruction to be executed may include at least one of a write neuron instruction to be executed, a write synapse instruction to be executed and a write scalar instruction to be executed.
  • the instruction to be executed may include at least one of the following options: operation type, input address, output address, input amount, output amount, operand, and instruction parameter.
  • the determined identifier of the designated device may be added to the instruction to be executed to generate a macro instruction.
  • a certain instruction m to be executed is "XXX ..., param".
  • XXX is the operation type
  • param is the instruction parameter.
  • the designated device m-1 can be determined according to the operation type "XXX" of the instruction m to be executed. Then, the identifier (for example, 09) of the designated device m-1 is added to the instruction m to be executed, and a macro instruction M "XXX09, ... param" corresponding to the instruction m to be executed is generated.
  • for multiple instructions to be executed, the identifier of the designated device corresponding to each instruction to be executed can be added to that instruction, and one macro instruction can be generated from the multiple instructions to be executed that carry the identifiers of the designated devices, or multiple corresponding macro instructions can be generated.
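The "XXX" to "XXX09, ... param" example above can be sketched as a string rewrite. The text format and the operation-type-to-device mapping are assumptions for illustration only:

```python
# Illustrative sketch: prepend a designated-device identifier to an
# instruction to be executed, as in the "XXX" -> "XXX09, ..." example.
# DEVICE_BY_OP is a hypothetical mapping from operation type to device id.
DEVICE_BY_OP = {"XXX": "09"}

def to_macro(instr: str) -> str:
    op, _, rest = instr.partition(",")
    dev = DEVICE_BY_OP[op.strip()]          # device determined from operation type
    return f"{op.strip()}{dev},{rest}" if rest else f"{op.strip()}{dev}"

macro = to_macro("XXX, param")              # -> "XXX09, param"
```

The same loop applied per instruction would cover the multiple-instruction case, producing either one combined macro instruction or one macro per instruction.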
  • the device determination module 11 may include a first determination submodule 111.
  • the first determining submodule 111 is used to determine that the designated device is an operating device when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution conditions for executing the macro instruction.
  • the execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
  • the macro instruction may include the identifier of one or more designated devices that execute the macro instruction.
  • the first determination submodule 111 can directly determine the designated device as the running device, saving the time of generating the running instruction based on the macro instruction, and ensuring that the generated running instruction can be executed by the corresponding running device.
  • the instruction generation device 100 may further include a resource acquisition module 14.
  • the device determination module 11 may further include a second determination sub-module 112.
  • the resource acquisition module 14 is used to acquire resource information of candidate devices.
  • the second determining submodule 112 is used to determine, from the candidate devices, the running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices, when it is determined that the macro instruction does not contain the identifier of the designated device.
  • the resource information may include the instruction set included in the candidate device.
  • the instruction set included in a candidate device may be the instruction set corresponding to the operation types of one or more macro instructions. The more instruction sets a candidate device contains, the more types of macro instructions the candidate device can execute.
  • the second determination submodule 112 may determine one or more running devices capable of executing the macro instruction from the candidate devices.
  • the determined instruction set of the running device includes an instruction set corresponding to the macro instruction. For example, if the received macro instruction is a neural network calculation macro instruction, the candidate device containing the instruction set corresponding to the neural network calculation macro instruction may be determined as the operation device to ensure that the operation device can execute the generated operation instruction.
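The selection rule above can be sketched as a filter over candidate devices. Representing each device's instruction set as a set of operation types is an assumption; the device names are hypothetical:

```python
# Sketch: pick the candidate devices whose instruction sets include the
# instruction set corresponding to the macro instruction's operation type.
def select_running_devices(macro_op: str, candidates: dict) -> list:
    """candidates maps device id -> set of supported operation types."""
    return [dev for dev, isa in candidates.items() if macro_op in isa]

candidates = {"cpu0": {"SCALAR", "JUMP"}, "npu0": {"CONV", "MATMUL"}}
chosen = select_running_devices("CONV", candidates)  # -> ["npu0"]
```

For a neural network calculation macro instruction, only candidates supporting that operation type survive the filter, ensuring the running device can execute the generated running instruction.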
  • the device determination module 11 may further include a third determination sub-module 113.
  • the third determining submodule 113 is used to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution conditions for executing the macro instruction.
  • when the macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution conditions, the third determining submodule 113 may determine that the designated device does not have the ability to execute the macro instruction.
  • the third determining submodule 113 may then determine the running device from the candidate devices, for example by determining a candidate device containing the instruction set corresponding to the macro instruction as the running device.
  • the macro instruction may include at least one of the input amount and the output amount
  • the instruction generation module 12 is further used to determine the data amount of the macro instruction, and to generate the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device.
  • the data volume of the macro instruction may be determined according to at least one of the input volume and the output volume, and the resource information of the running device may further include at least one of storage capacity and remaining storage capacity.
  • the storage capacity of the running device may refer to the amount of binary information that the memory of the running device can hold.
  • the remaining storage capacity of the running device may refer to the storage capacity that the running device can currently use for command operation after removing the occupied storage capacity.
  • the resource information of the running device can characterize the running capability of the running device: the larger the storage capacity and the remaining storage capacity, the stronger the running capability of the running device.
  • the instruction generation module 12 may determine the specific way of splitting the macro instruction according to the resource information of each running device, the data amount of the macro instruction, etc., so as to split the macro instruction and generate the running instructions corresponding to the running devices.
  • the instruction generation module 12 may include a first instruction generation sub-module 121.
  • the first instruction generation sub-module 121 is used, when it is determined that there is one operation device and the resources of the operation device do not meet the capacity condition for executing the macro instruction, to split the macro instruction into multiple operation instructions according to the operation data amount of the operation device and the data amount of the macro instruction, so that the operation device executes the multiple operation instructions in sequence.
  • the operation data amount of the operation device may be determined according to the resource information of the operation device; each operation instruction may include at least one of an operation input amount and an operation output amount, and the operation input amount and operation output amount may be determined according to the operation data amount.
  • the amount of operation data of the operation device may be determined according to the storage capacity or remaining storage capacity of the operation device.
  • the capacity condition may be that the amount of running data of the running device is greater than or equal to the amount of data of the macro instruction.
  • the resource of the running device does not meet the capacity condition of executing the macro instruction, it may mean that the amount of running data of the running device is less than the amount of data of the macro instruction.
  • the operation input amount and operation output amount must be less than or equal to the operation data amount, to ensure that the generated operation instructions can be executed by the operation device.
  • the operation input quantity (or operation output quantity) of different operation instructions among the plurality of operation instructions may be the same or different, and the disclosure does not limit this.
  • when the first instruction generation sub-module 121 determines that there is only one running device and the resources of the running device meet the capacity condition for executing the macro instruction, it can directly convert the macro instruction into one running instruction, or split the macro instruction into multiple running instructions, which is not limited in this disclosure.
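The single-device splitting step can be sketched as chunking the macro instruction's data amount by the device's running data amount. Chunking by a flat count with per-chunk offsets is an assumption; the patent only requires each running instruction's input/output amount to be no larger than the device's running data amount:

```python
# Hedged sketch: split a macro instruction whose data amount exceeds the
# running device's running data amount into multiple run instructions,
# each covering at most `run_capacity` units of data.
def split_macro(total_amount: int, run_capacity: int) -> list:
    chunks = []
    offset = 0
    while offset < total_amount:
        size = min(run_capacity, total_amount - offset)
        chunks.append({"offset": offset, "amount": size})  # one run instruction
        offset += size
    return chunks

parts = split_macro(1000, 300)  # amounts: 300, 300, 300, 100
```

The device then executes the resulting run instructions in sequence, each within its capacity.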
  • the instruction generation module 12 may include a second instruction generation sub-module 122.
  • the second instruction generation sub-module 122 is configured to split the macro instruction according to the operation data amount and data amount of each operation device when it is determined that there are multiple operation devices, and generate an operation instruction corresponding to each operation device.
  • the operation data volume of each operation device can be determined according to the resource information of each operation device, and the operation instruction can include at least one of the operation input amount and the operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • the operation input amount and the operation output amount need to be less than or equal to the operation data amount to ensure that the generated operation instruction can be executed by the operation device.
  • the second instruction generation sub-module 122 may generate one or more operation instructions for each operation device according to the operation data amount of each operation device for the corresponding operation device to execute.
  • the operation instruction includes at least one of the operation input amount and the operation output amount.
  • in addition to limiting the data amount of the operation instructions so that they can be executed by the corresponding operation devices, this can also meet special limitation requirements on the operation input amount and/or operation output amount of different operation instructions.
  • the operation instruction may not include the operation input amount and/or operation output amount; a default operation input amount and a default operation output amount may be set in advance.
  • the default operation input amount and default operation output amount enable the operation device, when it determines that the received operation instruction contains no operation input amount or operation output amount, to use the default operation input amount and default operation output amount as the operation input amount and operation output amount of the operation instruction.
  • the default input amount and the default output amount for different types of macro instructions may be preset.
  • the corresponding preset default input quantity and default output quantity can be used as the input quantity and output quantity of the macro instruction.
  • the data amount of the macro instruction is determined according to the default input amount and / or the default output amount
  • the operation instruction is generated according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device.
  • the generated operation instruction may not include the operation input quantity and the operation output quantity, or may include at least one of the operation input quantity and the operation output quantity.
  • the operation device may execute the operation instruction according to the preset default operation input value and / or default operation output value.
  • the instruction generation module 12 may also split the macro instruction to generate a running instruction according to the macro instruction and the preset macro instruction splitting rule.
  • the macro instruction splitting rule may be determined according to a conventional macro instruction splitting method (for example, splitting according to the processing procedure of the macro instruction), combined with a threshold on the running data amount of the instructions that all candidate devices can execute.
  • the storage capacities (or remaining storage capacities) of all candidate devices can be compared, and the determined minimum storage capacity (or remaining storage capacity) can be used as the threshold on the running data amount of the instructions that all candidate devices can execute.
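The threshold described above reduces to taking the minimum capacity over the candidate devices. The device names and capacity units below are illustrative assumptions:

```python
# Minimal sketch: the splitting threshold is the smallest (remaining)
# storage capacity among all candidate devices, so every candidate can
# execute an instruction whose data amount stays under the threshold.
def split_threshold(capacities: dict) -> int:
    """capacities maps device id -> storage (or remaining) capacity."""
    return min(capacities.values())

t = split_threshold({"cpu0": 4096, "gpu0": 8192, "npu0": 2048})  # -> 2048
```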
  • the running instruction generated by the instruction generating module according to the macro instruction may be an instruction to be executed, or one or more parsed instructions obtained by parsing the instruction to be executed, which is not limited in the present disclosure.
  • the instruction generation device 100 may further include a queue construction module 15.
  • the queue construction module 15 is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • an instruction queue uniquely corresponding to each running device can be constructed.
  • the running instructions in the instruction queue are ordered; the running instructions can be sent one by one from the instruction queue to the uniquely corresponding running device, or the instruction queue can be sent to the running device so that the running device executes the running instructions in the order of the queue.
  • having the running device execute the running instructions according to the instruction queue prevents running instructions from being executed out of order or delayed, and prevents running instructions from being missed.
  • the queue sorting rules may be determined based on the estimated execution time of the running instruction, the generation time of the running instruction, the operation input amount, the operation output amount, the operation type, and other information related to the running instruction itself; this disclosure does not limit this.
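One possible queue sorting rule from the candidates listed above can be sketched as sorting by a composite key. The choice of key (estimated execution time, then generation time) and the record fields are assumptions; the disclosure leaves the concrete rule open:

```python
# Illustrative queue construction: order run instructions by estimated
# execution time, breaking ties by generation time (an assumed rule).
def build_instruction_queue(run_instrs: list) -> list:
    return sorted(run_instrs,
                  key=lambda r: (r["est_time"], r["gen_time"]))

queue = build_instruction_queue([
    {"id": "r1", "est_time": 5, "gen_time": 2},
    {"id": "r2", "est_time": 1, "gen_time": 3},
    {"id": "r3", "est_time": 1, "gen_time": 1},
])
# resulting order: r3, r2, r1
```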
  • the instruction generation device 100 may further include an instruction dispatch module 16.
  • the instruction dispatch module 16 is used to send the operation instruction to the operation equipment, so that the operation equipment executes the operation instruction.
  • when there is only one operation instruction to be executed by the operation device, the operation instruction may be sent to the operation device directly.
  • when there are multiple operation instructions, all of them may be sent to the operation device so that the operation device executes them in sequence; alternatively, the operation instructions may be sent to the corresponding operation device one by one, where each time the operation device completes the current operation instruction, the next operation instruction corresponding to the operation device is sent to it.
  • a person skilled in the art may set the manner of sending the operation instruction to the operation device, and the disclosure does not limit this.
  • the instruction dispatch module 16 may include an instruction assembly submodule 161, an assembly translation submodule 162 and an instruction sending submodule 163.
  • the instruction assembly submodule 161 is used to generate an assembly file according to the operation instruction.
  • the assembly translation submodule 162 is used to translate the assembly file into a binary file.
  • the instruction sending submodule 163 is used to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the data amount of the running instruction can be reduced, the time for sending the running instruction to the running device can be saved, and the conversion and execution speed of the macro instruction can be improved.
  • the running device can decode the received binary file to obtain a corresponding running instruction, and execute the obtained running instruction to obtain an execution result.
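The dispatch path above (run instructions, then an assembly file, then a binary file the running device decodes) can be sketched with a toy round trip. The UTF-8 text-to-bytes encoding is only a placeholder for a real assembler and translator:

```python
# Toy sketch of the dispatch path: assemble run instructions into an
# "assembly file", translate it to binary, and let the running device
# decode the binary back into run instructions. Encoding is a placeholder.
def assemble(run_instrs: list) -> str:
    return "\n".join(run_instrs)            # instruction assembly submodule

def translate(asm: str) -> bytes:
    return asm.encode("utf-8")              # assembly translation submodule

def decode(binary: bytes) -> list:
    return binary.decode("utf-8").split("\n")  # running device side

instrs = ["CONV 0x1000 0x2000", "ADD 0x2000 0x3000"]
recovered = decode(translate(assemble(instrs)))  # round trip preserves them
```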
  • the running device may be one or any combination of CPU, GPU, and embedded neural network processor (Neural-network Processing Unit, NPU for short). In this way, the speed at which the instruction generation device generates the operation instruction according to the macro instruction is increased.
  • the instruction generating device 100 may be set in the CPU and/or the NPU, so that the process of generating the running instruction according to the macro instruction is realized through the CPU and/or the NPU; this provides more possible ways to implement the instruction generating device.
  • the running device 200 further includes a storage module 23.
  • the storage module 23 may include at least one of a register and a cache, and the cache may include a high-speed temporary storage cache.
  • the cache can be used to store data, and the registers can be used to store scalar data in the data.
  • control module 21 may include an instruction storage submodule 211 and an instruction processing submodule 212.
  • the instruction storage sub-module 211 is used to store operation instructions.
  • the instruction processing sub-module 212 is used to parse the running instruction to obtain multiple parsed instructions.
  • control module 21 may further include a storage queue sub-module 213.
  • the storage queue sub-module 213 is used to store a running instruction queue, and the running instruction queue includes a running instruction to be executed by the running device and a plurality of parsing instructions. In the running instruction queue, all instructions are arranged in the order of execution.
  • the execution module 22 may further include a dependency relationship processing sub-module 221.
  • the dependency processing submodule 221 is used to cache the first parsing instruction in the instruction storage submodule when it is determined that the first parsing instruction is associated with a zeroth parsing instruction before it, and, after the execution of the zeroth parsing instruction is completed, to extract the first parsing instruction from the instruction storage submodule and send it to the execution module.
  • the association between the first parsing instruction and the zeroth parsing instruction before it may include: the first storage address interval storing the data required by the first parsing instruction has an overlapping area with the zeroth storage address interval storing the data required by the zeroth parsing instruction. Conversely, the absence of an association between the first parsing instruction and the zeroth parsing instruction may mean that there is no overlapping area between the first storage address interval and the zeroth storage address interval.
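The association check above is an interval-overlap test. Representing each storage address interval as a half-open `(start, end)` pair is an assumption for illustration:

```python
# Sketch of the dependency check: two parsing instructions are associated
# when their storage address intervals overlap. Intervals are assumed
# half-open [start, end).
def has_dependency(first: tuple, zeroth: tuple) -> bool:
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end  # True iff the intervals overlap

overlap = has_dependency((0x100, 0x200), (0x180, 0x280))   # overlapping -> True
disjoint = has_dependency((0x100, 0x200), (0x200, 0x300))  # adjacent -> False
```

When the check returns true, the first parsing instruction is held in the instruction storage submodule until the zeroth finishes.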
  • FIG. 7 shows a flowchart of a neural network instruction processing method according to an embodiment of the present disclosure. As shown in FIG. 7, the method is applied to the neural network instruction processing system including the instruction generation device and the operation device. The method includes step S51 and step S52.
  • Step S51: the instruction generation device determines the running device that executes the macro instruction according to the received macro instruction, and generates the running instruction according to the macro instruction and the running device.
  • Step S52: the running device obtains the data, the neural network model, and the running instruction, analyzes the running instruction to obtain multiple analysis instructions, and executes the multiple analysis instructions according to the data to obtain the execution result.
  • step S51 may include: when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution condition of executing the macro instruction, determining the designated device as the running device.
  • the execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
  • the method may further include: acquiring resource information of the candidate device.
  • step S51 may also include: when it is determined that the macro instruction does not contain the identifier of the designated device, determining the running device for executing the macro instruction from the candidate devices based on the received macro instruction and the resource information of the candidate devices.
  • the resource information may include the instruction set included in the candidate device.
  • step S51 may further include: when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution conditions for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
  • the macro instruction may include at least one of the input amount and the output amount.
  • generating the operation instruction according to the macro instruction and the operation device may include: determining the data amount of the macro instruction, and generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device.
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: determining that the operation device is one and the resource of the operation device does not satisfy the execution of the macro instruction Under the capacity condition, the macro instruction is divided into multiple operation instructions according to the operation data amount and data amount of the operation device, so that the operation device executes multiple operation instructions in sequence.
  • the operation data amount of the operation device may be determined according to the resource information of the operation device, and each operation instruction may include at least one of an operation input amount and an operation output amount; the operation input amount and the operation output amount are determined according to the operation data amount.
  • generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: when it is determined that there are multiple operation devices, splitting the macro instruction according to the operation data amount of each operation device and the data amount, to generate an operation instruction corresponding to each operation device.
  • the operation data amount of each operation device can be determined according to the resource information of each operation device, and the operation instruction can include at least one of an operation input amount and an operation output amount; the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
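The splitting described in the preceding clauses can be illustrated with a small sketch. The record layout and names (`split_macro`, address arithmetic in element units) are assumptions for demonstration only:

```python
# Hypothetical sketch: split one macro instruction whose data amount exceeds
# the running device's per-run operation data amount into several run
# instructions, each covering a chunk and shifting the input/output addresses.

def split_macro(total_amount: int, run_amount: int,
                in_addr: int, out_addr: int) -> list:
    """Split a macro instruction covering total_amount elements into run
    instructions of at most run_amount elements each."""
    ops = []
    offset = 0
    while offset < total_amount:
        chunk = min(run_amount, total_amount - offset)
        ops.append({
            "input_address": in_addr + offset,
            "output_address": out_addr + offset,
            "amount": chunk,  # operation input/output amount of this chunk
        })
        offset += chunk
    return ops

# A 10-element macro instruction on a device that handles 4 elements per run
# yields three run instructions with amounts 4, 4, 2, executed in sequence.
print([op["amount"] for op in split_macro(10, 4, 0x100, 0x200)])  # [4, 4, 2]
```

The multi-device case in the clauses is the same idea, with each chunk sized by the operation data amount of the device that will execute it.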
  • the method may further include: sorting, by the instruction generation device, the operation instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the operation device according to the sorted operation instructions.
  • the method may further include: receiving the instruction to be executed through the instruction generating device, and generating a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the method may further include: sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction.
  • sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction may include:
  • the method may further include: storing, by the running device, the data and the scalar data in the data.
  • the running device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed temporary storage cache; the cache is used to store the data, and the register is used to store scalar data in the data.
  • the method may further include:
  • the running instruction queue is stored by the running device, where the running instruction queue includes the running instruction and multiple parsing instructions.
  • the running instruction and the multiple parsing instructions are arranged in order according to the order in which they are executed.
  • the method may further include:
  • when the running device determines that there is an association relationship between a first parsing instruction and a zeroth parsing instruction before the first parsing instruction, the first parsing instruction is cached, and after the execution of the zeroth parsing instruction is completed, the cached first parsing instruction is executed.
  • the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes: a first storage address interval storing data required by the first parsing instruction and a zeroth storage address interval storing data required by the zeroth parsing instruction have an overlapping area.
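The association test above amounts to an interval-overlap check. A minimal sketch, where the half-open `(start, end)` tuple representation of a storage address interval is an assumption:

```python
def intervals_overlap(first: tuple, zeroth: tuple) -> bool:
    """Return True when the first parsing instruction's storage address
    interval [start, end) overlaps the zeroth's, i.e. the instructions are
    associated and the first must wait for the zeroth to finish."""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end

# The first instruction needs data in [100, 200); the zeroth needs [150, 250):
# the intervals overlap, so the first parsing instruction is cached until the
# zeroth completes.
print(intervals_overlap((100, 200), (150, 250)))  # True
print(intervals_overlap((100, 200), (200, 300)))  # False (adjacent, no overlap)
```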
  • the running device may be one or any combination of CPU, GPU, and NPU.
  • the instruction generating device is set in the CPU and / or NPU.
  • the macro instruction may include at least one of the following instructions: a calculation macro instruction, a control macro instruction, and a data handling macro instruction.
  • the calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction.
  • the control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction.
  • the data handling macro instruction may include at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction.
  • the write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
  • the macro instruction may include at least one of the following options: the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameters.
  • the run instruction can contain at least one of the following options: operation type, input address, output address, operand, and instruction parameters.
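The optional fields listed for macro and run instructions can be modeled as simple records. This sketch is illustrative only; the field names and types are assumptions, since the clauses leave the encoding open:

```python
# Illustrative records for the optional fields named in the clauses above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MacroInstruction:
    operation_type: str                    # e.g. "CONV", "JUMP", "READ"
    device_id: Optional[int] = None        # identifier of the designated device
    input_address: Optional[int] = None
    output_address: Optional[int] = None
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operand: Optional[int] = None
    instruction_params: Optional[dict] = None

@dataclass
class RunInstruction:                      # the narrower run-instruction fields
    operation_type: str
    input_address: Optional[int] = None
    output_address: Optional[int] = None
    operand: Optional[int] = None
    instruction_params: Optional[dict] = None

m = MacroInstruction("CONV", device_id=0, input_address=0x100,
                     output_address=0x200, input_amount=1024)
print(m.operation_type, m.input_amount)  # CONV 1024
```

Note that the run instruction drops the device identifier and input/output amounts: by the time it exists, the device has been chosen and the amounts have been fixed by the splitting step.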
  • in the neural network instruction processing method, the instruction generation device determines the operation device that executes the macro instruction according to the received macro instruction, and generates the operation instruction according to the macro instruction and the operation device;
  • the operation device obtains the data, the neural network model, and the operation instruction, parses the operation instruction to obtain multiple parsing instructions, and executes the multiple parsing instructions based on the data to obtain the execution result.
  • the method can be used across platforms, has good applicability, fast instruction conversion, high processing efficiency, and a low error probability, and requires low labor and material costs for development.
  • a neural network instruction processing system, the system including an instruction generation device and an operation device,
  • the instruction generating device includes:
  • a device determination module configured to determine a running device that executes the macro instruction according to the received macro instruction
  • An instruction generation module configured to generate an operation instruction according to the macro instruction and the operation device
  • the operation equipment includes:
  • the control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
  • the execution module is configured to execute the multiple parsing instructions according to the data to obtain an execution result.
  • the instruction generating device further includes:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module includes at least one of the following sub-modules:
  • the first determining submodule is configured to determine the designated device as the running device when it is determined that the macro instruction includes the identifier of the designated device and the resources of the designated device satisfy the execution conditions for executing the macro instruction;
  • a second determining submodule configured to determine from the candidate device based on the received macro instruction and the resource information of the candidate device when it is determined that the macro instruction does not include the identifier of the designated device A running device for executing the macro instruction;
  • a third determining submodule configured to, when it is determined that the macro instruction includes the identifier of the specified device and the resources of the specified device do not satisfy the execution condition for executing the macro instruction, determine the running device according to the macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the macro instruction includes at least one of an input quantity and an output quantity
  • the instruction generation module is also used to determine the data amount of the macro instruction, and generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operating device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes at least one of the following sub-modules:
  • the first instruction generation submodule is used to, when it is determined that there is one operation device and the resources of the operation device do not satisfy the capacity condition for executing the macro instruction, split the macro instruction into multiple running instructions according to the operation data amount of the operation device and the data amount, so that the running device executes the multiple running instructions in sequence;
  • a second instruction generation submodule used to, when it is determined that there are multiple operation devices, split the macro instruction according to the operation data amount of each operation device and the data amount, and generate a running instruction corresponding to each operation device,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction includes at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • the instruction generating device further includes:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
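A minimal illustration of the queue construction module's job follows. The sorting rule used here (a per-instruction sequence number, grouped by target device) is an assumed example; the clauses leave the queue sorting rule open:

```python
# Illustrative sketch: sort run instructions by an assumed 'seq' field and
# build one instruction queue per running device.
from collections import defaultdict

def build_queues(run_instructions):
    """Group run instructions by target device, keeping each device's
    queue in ascending 'seq' order."""
    queues = defaultdict(list)
    for ins in sorted(run_instructions, key=lambda i: i["seq"]):
        queues[ins["device"]].append(ins)
    return dict(queues)

ops = [{"device": "NPU", "seq": 2, "type": "CONV"},
       {"device": "CPU", "seq": 1, "type": "SCALAR"},
       {"device": "NPU", "seq": 0, "type": "READ"}]
q = build_queues(ops)
print([i["type"] for i in q["NPU"]])  # ['READ', 'CONV']
```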
  • the instruction generating device further includes:
  • the macro generation module is used for receiving the instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the instruction generating device further includes:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
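The dispatch path formed by these three submodules (assemble, translate to binary, send) might be sketched as a toy pipeline. The textual "assembly" syntax, the opcode table, and the byte encoding below are pure assumptions invented for illustration:

```python
# Toy sketch of the dispatch path: run instruction -> assembly text ->
# binary -> running device. Syntax and encoding are invented.
import struct

OPCODES = {"READ": 1, "WRITE": 2, "CONV": 3}  # assumed opcode table

def to_assembly(op: dict) -> str:
    """Instruction assembly submodule: render a run instruction as text."""
    return f'{op["type"]} {op["in"]} {op["out"]}'

def assemble(asm: str) -> bytes:
    """Assembly translation submodule: encode one line as opcode + addresses."""
    mnemonic, in_addr, out_addr = asm.split()
    return struct.pack("<BII", OPCODES[mnemonic], int(in_addr), int(out_addr))

def dispatch(op: dict, send) -> None:
    """Instruction sending submodule: hand the binary to the running device,
    which decodes and executes it."""
    send(assemble(to_assembly(op)))

received = []
dispatch({"type": "CONV", "in": 256, "out": 512}, received.append)
print(received[0].hex())
```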
  • Clause A8. The system according to Clause A1, the operation device further comprising:
  • a storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • control module includes:
  • An instruction storage sub-module for storing the operation instruction
  • An instruction processing sub-module which is used to parse the running instruction to obtain the multiple parsing instructions
  • a storage queue submodule for storing a running instruction queue, the running instruction queue including the running instruction and the plurality of parsing instructions; in the running instruction queue, the running instruction and the multiple parsing instructions are arranged in the order in which they are to be executed.
  • the execution module includes:
  • the dependency processing submodule is used to cache the first parsing instruction in the instruction storage submodule when it is determined that there is an association between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction, and, after the execution of the zeroth parsing instruction is completed, to extract the first parsing instruction from the instruction storage submodule and send it to the execution module,
  • association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and / or NPU;
  • the macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling instruction,
  • the calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction
  • the data handling macro instruction includes at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction.
  • the write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
  • the macro instruction includes at least one of the following options: the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameter,
  • the run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
  • a machine learning computing device comprising:
  • the machine learning computing device includes a plurality of the neural network instruction processing systems
  • the plurality of the neural network instruction processing systems may be connected and transmit data through a specific structure
  • a plurality of the neural network instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the neural network instruction processing systems share the same control system or have their own respective control systems; multiple neural network instruction processing systems share memory or have their own memories; and the multiple neural network instruction processing systems are interconnected in an arbitrary topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause A12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a machine learning chip, the machine learning chip including:
  • Clause A15. An electronic device, the electronic device comprising:
  • a board card includes: a storage device, an interface device and a control device, and a machine learning chip as described in Clause A14;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a neural network instruction processing method is applied to a neural network instruction processing system.
  • the system includes an instruction generation device and an operation device.
  • the method includes:
  • obtaining data, a neural network model, and an operation instruction through the operation device, parsing the operation instruction to obtain a plurality of parsing instructions, and executing the plurality of parsing instructions according to the data to obtain an execution result.
  • Clause A18 The method according to Clause A17, the method further comprising:
  • the instruction generating device determines the operating device that executes the macro instruction according to the received macro instruction, including at least one of the following:
  • when it is determined that the macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the macro instruction, the designated device is determined as the running device;
  • when it is determined that the macro instruction does not include the identifier of a designated device, a running device for executing the macro instruction is determined from the candidate devices;
  • when the macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction, the running device is determined according to the macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the macro instruction includes at least one of an input quantity and an output quantity
  • generating an operation instruction includes: determining the data amount of the macro instruction, and generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • Clause A20 According to the method of Clause A19, generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device, including at least one of the following:
  • the macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions;
  • the macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction includes at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause A21 The method according to Clause A17, the method further comprising:
  • the instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause A22 The method according to Clause A18, the method further comprising:
  • the instruction generating device receives the instruction to be executed, and generates the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • Clause A23 The method according to Clause A17, the method further comprising:
  • sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction, includes:
  • Clause A24 The method according to Clause A17, the method further comprising:
  • the running device includes a storage module
  • the storage module includes one or more of a register and a cache
  • the cache includes a high-speed temporary storage cache
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • Clause A25 The method according to Clause A17, the method further comprising:
  • the running instruction queue is stored by the running device, the running instruction queue including the running instruction and the plurality of parsing instructions; in the running instruction queue, the running instruction and the plurality of parsing instructions are arranged in the order in which they are to be executed.
  • Clause A26 The method according to Clause A25, the method further comprising:
  • when the running device determines that there is an association between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction, the first parsing instruction is cached, and after the execution of the zeroth parsing instruction is completed, the cached first parsing instruction is executed,
  • association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and / or NPU;
  • the macro instruction includes at least one of the following instructions: a calculation macro instruction, a control macro instruction, and a data handling macro instruction,
  • the calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction
  • the data handling macro instruction includes at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction.
  • the write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
  • the macro instruction includes at least one of the following options:
  • the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameter,
  • the run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
  • For calculation instructions, the following calculation instruction generation devices and methods, as well as calculation instruction processing systems and methods, can be used for processing; the foregoing can be better understood in accordance with the following clauses:
  • a calculation instruction generating device comprising:
  • a device determination module configured to determine a running device that executes the calculation macro instruction based on the received calculation macro instruction
  • An instruction generation module for generating an operation instruction according to the calculation macro instruction and the operation device
  • the calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
  • the calculation macro instruction includes an operation type, an input address and an output address
  • the operation instruction includes the operation type, an operation input address, and an operation output address; the operation input address and the operation output address are determined according to the input address and the output address.
  • the device determination module includes:
  • the first determining submodule is configured to determine the designated device as the running device when it is determined that the calculation macro instruction includes the identifier of the designated device and the resources of the designated device satisfy the execution condition for executing the calculation macro instruction,
  • the execution condition includes that the designated device contains an instruction set corresponding to the calculation macro instruction.
  • Clause C3 The device according to Clause C2, the device further comprising:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module also includes:
  • a second determining submodule configured to determine, from the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • the device determination module further includes:
  • a third determining submodule configured to, when determining that the calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the calculation macro instruction, determine the running device according to the calculation macro instruction and the resource information of the candidate devices.
  • Clause C5 The device according to any one of Clauses C2-C4, the calculation macro instruction further includes at least one of an input quantity and an output quantity,
  • the instruction generation module is further used to determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes:
  • the first instruction generation submodule is configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, split the calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the operation data amount of the operation device is determined according to the resource information of the operation device; each operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • the instruction generation module includes:
  • the second instruction generation submodule is used to, when it is determined that there are multiple operation devices, split the calculation macro instruction according to the operation data amount of each operation device and the data amount, and generate a running instruction corresponding to each operation device,
  • the operation data amount of each operation device is determined according to the resource information of each operation device; the operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause C8 The device according to Clause C1, the device further comprising:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause C9 The device according to Clause C2, the device further comprising:
  • the macro generation module is used to receive the calculation instruction to be executed, and generate the calculation macro instruction according to the determined identifier of the designated device and the calculation instruction to be executed.
  • Clause C10 The device according to Clause C1, the device further comprising:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the device is set in the CPU and / or NPU;
  • the calculation macro instruction includes at least one of the following instructions: a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction;
  • the calculation macro instruction also includes at least one of the following options: the identifier of the designated device for executing the calculation macro instruction, the input amount, the output amount, the operand, and the instruction parameter.
  • a machine learning computing device comprising:
  • the machine learning operation device includes a plurality of the calculation instruction generation devices
  • the plurality of the calculation instruction generation devices may be connected and transmit data through a specific structure
  • a plurality of the calculation instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the calculation instruction generation devices share the same control system or have their own respective control systems; a plurality of the calculation instruction generation devices share memory or have their own memories; and the interconnection mode of the plurality of calculation instruction generation devices is an arbitrary interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device described in Clause C12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card including: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause C12 or the combined processing device described in Clause C13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • the calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
  • the calculation macro instruction includes an operation type, an input address and an output address
  • the operation instruction includes the operation type, an operation input address and an operation output address
  • the operation input address and the operation output address are determined based on the input address and the output address, respectively.
  • determining an operating device that executes the calculation macro instruction includes:
  • when the calculation macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the calculation macro instruction, determining the designated device as the running device,
  • the execution condition includes that the designated device contains an instruction set corresponding to the calculation macro instruction.
  • Clause C17 The method according to Clause C16, the method further comprising:
  • determining a running device that executes the calculation macro instruction includes:
  • when the calculation macro instruction does not include the identifier of a designated device, determining, from among the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • determining the operating device that executes the calculation macro instruction includes:
  • when the calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the calculation macro instruction, determining the running device according to the calculation macro instruction and the resource information of the candidate devices.
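The device-determination clauses above amount to a two-step rule: prefer the designated device when it satisfies the execution condition (its instruction set covers the macro instruction), otherwise select from the candidate devices by their resource information. A sketch under the assumption that devices and macro instructions are simple dictionaries and that the execution condition is a set-membership test:

```python
# Sketch of running-device determination: designated device first,
# candidate devices as fallback. Record shapes are illustrative.

def supports(device, macro):
    """Execution condition: the device's instruction set covers the macro's operation type."""
    return macro["op_type"] in device["instruction_set"]

def determine_running_device(macro, candidates):
    designated_id = macro.get("device_id")  # identifier of the designated device, if any
    if designated_id is not None:
        designated = next((d for d in candidates if d["id"] == designated_id), None)
        if designated is not None and supports(designated, macro):
            return designated  # designated device satisfies the execution condition
    # No suitable designated device: choose from candidates by resource information.
    for device in candidates:
        if supports(device, macro):
            return device
    raise RuntimeError("no device can execute this macro instruction")
```

This also illustrates the third clause: when the designated device's resources fail the execution condition, selection falls through to the candidate-device scan.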
  • the calculation macro instruction further includes at least one of an input quantity and an output quantity
  • Generating an operation instruction according to the calculation macro instruction and the operation device including:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the operation device includes:
  • the calculation macro instruction is split into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
  • the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount.
  • generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the operation device includes:
  • the calculation macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
  • the operation data amount of each operation device is determined according to the resource information of each operation device; the operation instruction further includes at least one of an operation input amount and an operation output amount, the operation input amount and the operation output amount being determined according to the operation data amount of the operation device that executes the operation instruction.
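The splitting clauses above can be made concrete: when the macro instruction's data amount exceeds the running device's per-instruction running data amount, the macro is cut into sequential running instructions, each carrying its own addresses and input/output amounts. A sketch, with field names and the address-advance rule as illustrative assumptions:

```python
# Sketch of macro-instruction splitting: one running device whose
# running data amount is smaller than the macro's data amount.
# Field names are assumptions; addresses advance by the chunk size.

def split_macro(macro, running_data_amount):
    data_amount = macro["input_amount"]  # data amount determined from the input amount
    run_instrs = []
    offset = 0
    while offset < data_amount:
        chunk = min(running_data_amount, data_amount - offset)
        run_instrs.append({
            "op_type": macro["op_type"],
            "op_input_addr": macro["input_addr"] + offset,
            "op_output_addr": macro["output_addr"] + offset,
            "op_input_amount": chunk,   # running input amount
            "op_output_amount": chunk,  # running output amount
        })
        offset += chunk
    return run_instrs
```

The multi-device variant of the clauses would apply the same loop per device, bounding each chunk by that device's own running data amount.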
  • Clause C22 The method according to Clause C15, the method further comprising:
  • the running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
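The queue-construction clause above pairs each running device with an instruction queue whose entries are ordered by a queue sorting rule. The clauses leave the rule itself open; the sketch below assumes a per-macro sequence number as the sorting key:

```python
# Sketch of instruction-queue construction: group running instructions
# by device and sort each group by a queue sorting rule. The sorting
# key ("seq", a sequence number) is an assumption, not specified here.
from collections import defaultdict

def build_instruction_queues(run_instrs):
    queues = defaultdict(list)
    for instr in run_instrs:
        queues[instr["device_id"]].append(instr)
    for device_id in queues:
        queues[device_id].sort(key=lambda i: i["seq"])  # queue sorting rule
    return dict(queues)
```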
  • Clause C23 The method according to Clause C16, the method further comprising:
  • Clause C24 The method according to Clause C15, the method further comprising:
  • sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
  • the running device is one or any combination of CPU, GPU and NPU;
  • the method is applied to a CPU and/or an NPU;
  • the calculation macro instruction includes at least one of the following instructions: at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter.
  • a computing instruction processing system the system includes an instruction generating device and an operating device,
  • the instruction generating device includes:
  • a device determination module configured to determine a running device that executes the calculation macro instruction based on the received calculation macro instruction
  • An instruction generation module configured to generate an operation instruction according to the calculation macro instruction and the operation device
  • the operation equipment includes:
  • the control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
  • An execution module configured to execute the multiple parsing instructions based on the data to obtain an execution result
  • the calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
  • the calculation macro instruction includes an operation type, an input address and an output address
  • the operation instruction includes the operation type, an operation input address and an operation output address
  • the operation input address and the operation output address are determined based on the input address and the output address, respectively.
  • the instruction generating device further includes:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module includes at least one of the following sub-modules:
  • a first determining submodule configured to, when it is determined that the calculation macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the calculation macro instruction, determine the designated device as the running device;
  • a second determining submodule configured to, when the calculation macro instruction does not include the identifier of a designated device, determine, from among the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices;
  • a third determining submodule configured to, when it is determined that the calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the calculation macro instruction, determine the running device according to the calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the calculation macro instruction further includes at least one of an input quantity and an output quantity
  • the instruction generation module is further used to determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes at least one of the following sub-modules:
  • a first instruction generating submodule configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, split the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
  • a second instruction generating submodule configured to, when it is determined that there are a plurality of running devices, split the calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
  • the running data amount of each running device is determined according to the resource information of the running device, and the running instruction further includes at least one of a running input amount and a running output amount,
  • the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
  • the instruction generating device further includes:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • the instruction generating device further includes:
  • the macro generation module is used to receive the calculation instruction to be executed, and generate the calculation macro instruction according to the determined identifier of the designated device and the calculation instruction to be executed.
  • the instruction generating device further includes:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • Clause D8 The system according to Clause D1, the operation device, further comprising:
  • a storage module including one or more of a register and a cache, the cache including a high-speed scratchpad memory,
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • control module includes:
  • An instruction storage sub-module for storing the operation instruction
  • An instruction processing sub-module which is used to parse the running instruction to obtain the multiple parsing instructions
  • a storage queue submodule configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsing instructions, wherein the running instruction and the plurality of parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
  • the execution module includes:
  • a dependency processing submodule configured to, when it is determined that there is an association between a first parsing instruction and a zeroth parsing instruction preceding the first parsing instruction, cache the first parsing instruction in the instruction storage submodule, and, after execution of the zeroth parsing instruction is completed, extract the first parsing instruction from the instruction storage submodule and send it to the execution module,
  • the association between the first parsing instruction and the zeroth parsing instruction preceding the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
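The dependency rule above is a storage-overlap test: the first parsing instruction depends on the zeroth when the address section holding its required data overlaps the section holding the zeroth instruction's data, and the dependent instruction is held back until its predecessor completes. A sketch, modeling address sections as half-open [start, end) ranges (an illustrative choice):

```python
# Sketch of the overlap-based dependency check between a first parsing
# instruction and the zeroth parsing instruction preceding it.
# Sections are half-open (start, end) address ranges, an assumption.

def sections_overlap(first, zeroth):
    """True when two [start, end) address sections share any addresses."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

def has_dependency(first_instr, zeroth_instr):
    """The first instruction must wait if any of its data sections overlap the zeroth's."""
    return any(
        sections_overlap(a, b)
        for a in first_instr["sections"]
        for b in zeroth_instr["sections"]
    )
```

A scheduler using this check would cache an instruction for which `has_dependency` is true and release it once the zeroth instruction retires.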
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and/or the NPU;
  • the calculation macro instruction includes at least one of the following instructions: neural network calculation macro instruction, vector logic calculation macro instruction, matrix vector calculation macro instruction, scalar calculation macro instruction, and scalar logic calculation macro instruction;
  • the calculation macro instruction includes at least one of the following options: an identifier of a designated device for executing the calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter.
  • a machine learning computing device comprising:
  • the plurality of calculation instruction processing systems may be connected and transmit data through a specific structure
  • a plurality of the computing instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the computing instruction processing systems share the same control system or each have their own control system; a plurality of the computing instruction processing systems share memory or each have their own memory; and the interconnection mode of the plurality of computing instruction processing systems is an arbitrary interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device described in Clause D12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card including: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause D12 or the combined processing device described in Clause D13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a method for processing calculation instructions is applied to a calculation instruction processing system.
  • the system includes an instruction generation device and an operation device.
  • the method includes:
  • the calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
  • the calculation macro instruction includes an operation type, an input address and an output address
  • the operation instruction includes the operation type, an operation input address and an operation output address
  • the operation input address and the operation output address are determined based on the input address and the output address, respectively.
  • Clause D16 The method according to Clause D15, the method further comprising:
  • the instruction generating device determines the operating device that executes the calculation macro instruction according to the received calculation macro instruction, including at least one of the following:
  • when the calculation macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the calculation macro instruction, the designated device is determined as the running device;
  • when the calculation macro instruction does not include the identifier of a designated device, determining, from among the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices;
  • when the calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the calculation macro instruction, determining the running device according to the calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the calculation macro instruction further includes at least one of an input quantity and an output quantity
  • Generating an operation instruction according to the calculation macro instruction and the operation device including:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • Clause D18 according to the method of Clause D17, generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device, including at least one of the following:
  • the calculation macro instruction is split into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
  • the calculation macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount,
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause D19 The method according to Clause D15, the method further comprising:
  • the instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause D20 The method according to Clause D16, the method further comprising:
  • the calculation instruction to be executed is received by the instruction generation device, and the calculation macro instruction is generated according to the determined identifier of the designated device and the calculation instruction to be executed.
  • Clause D21 The method according to Clause D15, the method further comprising:
  • sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction includes:
  • Clause D22 The method according to Clause D15, the method further comprising:
  • the operation device includes a storage module
  • the storage module includes one or more of a register and a cache
  • the cache includes a high-speed scratchpad memory
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • Clause D23 The method according to Clause D15, the method further comprising:
  • a running instruction queue including the running instruction and the plurality of parsing instructions, wherein the running instruction and the plurality of parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
  • Clause D24 The method according to Clause D23, the method further comprising:
  • when the running device determines that there is an association between a first parsing instruction and a zeroth parsing instruction preceding the first parsing instruction, the first parsing instruction is cached, and after execution of the zeroth parsing instruction is completed, the cached first parsing instruction is executed,
  • the association between the first parsing instruction and the zeroth parsing instruction preceding the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and/or the NPU;
  • the calculation macro instruction includes at least one of the following instructions: neural network calculation macro instruction, vector logic calculation macro instruction, matrix vector calculation macro instruction, scalar calculation macro instruction, and scalar logic calculation macro instruction;
  • the calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter.
  • Neural network computing instructions can be processed by the following computing instruction generating devices and methods, and neural network computing instruction processing systems, methods and other related products. The foregoing can be better understood according to the following clauses:
  • a neural network computing instruction generation device comprising:
  • a device determination module configured to determine a running device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction
  • an instruction generation module configured to generate a running instruction according to the neural network calculation macro instruction and the running device
  • the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm
  • the neural network calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address
  • the operation input address and the operation output address are determined based on the input address and the output address, respectively.
  • the device determination module includes:
  • a first determining submodule configured to, when it is determined that the neural network calculation macro instruction includes the identifier of a specified device and the resources of the specified device satisfy the execution condition for executing the neural network calculation macro instruction, determine the specified device as the running device,
  • the execution condition includes: the specified device contains an instruction set corresponding to the neural network calculation macro instruction.
  • Clause E3 The device according to Clause E2, the device further comprising:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module also includes:
  • a second determining submodule configured to, when the neural network calculation macro instruction does not include the identifier of a specified device, determine, from among the candidate devices, a running device for executing the neural network calculation macro instruction according to the received neural network calculation macro instruction and the resource information of the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • Clause E4 The apparatus according to Clause E3, the device determination module, further comprising:
  • a third determining submodule configured to, when it is determined that the neural network calculation macro instruction includes the identifier of the specified device and the resources of the specified device do not satisfy the execution condition for executing the neural network calculation macro instruction, determine the running device according to the neural network calculation macro instruction and the resource information of the candidate devices.
  • the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity
  • the instruction generation module is also used to determine the data amount of the neural network calculation macroinstruction, and generate according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device Run instruction,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes:
  • a first instruction generating submodule configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macro instruction, split the neural network calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
  • the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount.
  • the instruction generation module includes:
  • a second instruction generating submodule configured to, when it is determined that there are a plurality of running devices, split the neural network calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
  • the running data amount of each running device is determined according to the resource information of each running device; the running instruction further includes at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount of the running device that executes the running instruction.
  • Clause E8 The device according to Clause E1, the device further comprising:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause E9 The device according to Clause E2, the device further comprising:
  • the macro generation module is used to receive the neural network calculation instruction to be executed, and generate the neural network calculation macro instruction according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
  • Clause E10 The device according to Clause E1, the device further comprising:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the running device is one or any combination of CPU, GPU and NPU
  • the device is set in the CPU and/or the NPU;
  • the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction,
  • the neural network calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the neural network calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, the instruction parameter including at least one of a size of a convolution kernel, a stride of the convolution kernel, and a padding of the convolution kernel.
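The clause above enumerates the fields a convolution-type macro instruction may carry. A sketch of one possible record layout, where every field name and default is an illustrative assumption (the clause lists only the categories of information, not a concrete format):

```python
# Illustrative structure of a neural network calculation macro
# instruction: operation type plus optional designated-device
# identifier, input/output amounts, and convolution parameters
# (kernel size, stride, padding). Names and defaults are assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ConvMacroInstruction:
    op_type: str                      # operation type, e.g. "CONV"
    input_addr: int                   # input address
    output_addr: int                  # output address
    device_id: Optional[str] = None   # identifier of the designated device
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    kernel_size: Tuple[int, int] = (3, 3)   # size of the convolution kernel
    stride: Tuple[int, int] = (1, 1)        # stride of the convolution kernel
    padding: Tuple[int, int] = (0, 0)       # padding of the convolution kernel
```

Because every field beyond the operation type and addresses is optional here, the same record shape can also represent a pooling macro instruction that simply omits the convolution parameters.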
  • a machine learning computing device comprising:
  • One or more neural network computing instruction generation devices as described in any one of Clauses E1 to E11, used to obtain data to be computed and control information from other processing devices, perform the specified machine learning operation, and transmit the execution result to other processing devices through the I/O interface;
  • the machine learning computing device includes a plurality of the neural network computing instruction generating devices
  • the plurality of the neural network computing instruction generating devices may be connected and transmit data through a specific structure
  • a plurality of the neural network computing instruction generating devices may interconnect and transmit data through a fast external device interconnect (PCIE) bus to support larger-scale machine learning operations; a plurality of the neural network computing instruction generating devices may share the same control system or have their own control systems; a plurality of the neural network computing instruction generating devices may share memory or have their own memories; and the interconnection mode of the plurality of neural network computing instruction generating devices may be any interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause E12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card, comprising: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause E12 or the combined processing device described in Clause E13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a neural network computing instruction generation method comprising:
  • according to the received neural network calculation macro instruction, determining a running device that executes the neural network calculation macro instruction;
  • the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm
  • the neural network calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address
  • the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • determining a running device that executes the neural network calculation macroinstruction includes:
  • when it is determined that the neural network calculation macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the neural network calculation macro instruction, determining the designated device as the running device,
  • the execution condition includes: the specified device contains an instruction set corresponding to the neural network calculation macro instruction.
  • Clause E17 The method according to Clause E16, the method further comprising:
  • determining a running device for executing the neural network calculation macro instruction includes:
  • the resource information includes an instruction set included in the candidate device.
  • determining a running device that executes the neural network calculation macroinstruction includes:
  • when it is determined that the neural network calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution conditions for executing the neural network calculation macro instruction, determining the running device according to the neural network calculation macro instruction and the resource information of the candidate devices.
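Clauses E16 through E18 together describe one selection policy: use the specified device when its instruction set satisfies the execution condition, and otherwise fall back to a candidate device whose resource information shows support. A minimal sketch under assumed field names:

```python
def select_running_device(macro, candidates):
    """Pick the running device for a macro instruction.

    `macro` carries an optional specified-device id and an operation type;
    `candidates` maps device id -> resource info, whose 'instruction_set'
    lists the operation types that device supports (the execution condition).
    """
    designated = macro.get("device_id")
    supports = lambda dev: macro["operation_type"] in candidates[dev]["instruction_set"]
    # The specified device wins if its resources satisfy the execution condition.
    if designated is not None and designated in candidates and supports(designated):
        return designated
    # Otherwise choose among candidate devices by their resource information.
    for dev in candidates:
        if supports(dev):
            return dev
    return None  # no device satisfies the execution condition

candidates = {0: {"instruction_set": ["ADD"]},
              1: {"instruction_set": ["CONV", "POOL"]}}
```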
  • the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity
  • generating an operation instruction includes:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the neural network calculation macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, each operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • the neural network calculation macro instruction is split according to the operating data amount of each operating device and the data amount to generate an operating instruction corresponding to each operating device,
  • the operation data amount of each operation device is determined according to the resource information of each operation device, the operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
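The splitting described in Clauses E20 and E21 can be sketched as follows: when a running device's running data amount is smaller than the macro instruction's data amount, the macro instruction is cut into running instructions whose addresses and amounts each cover one slice of the data. Field names are assumptions for illustration:

```python
def split_macro(macro, run_data_amount):
    """Split one macro instruction into running instructions executed in sequence."""
    total = macro["data_amount"]
    if run_data_amount >= total:
        return [macro]  # the device can handle the whole macro at once; no split
    run_instructions = []
    offset = 0
    while offset < total:
        chunk = min(run_data_amount, total - offset)
        run_instructions.append({
            "operation_type": macro["operation_type"],
            # Each running instruction addresses its own slice of the input/output data.
            "run_input_address": macro["input_address"] + offset,
            "run_output_address": macro["output_address"] + offset,
            "run_input_amount": chunk,
            "run_output_amount": chunk,
        })
        offset += chunk
    return run_instructions
```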
  • Clause E22 The method according to Clause E15, the method further comprising:
  • the running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
  • Clause E23 The method according to Clause E16, the method further comprising:
  • the neural network calculation instruction to be executed is received, and the neural network calculation macro instruction is generated according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
  • Clause E24 The method according to Clause E15, the method further comprising:
  • sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
  • the running device is one or any combination of CPU, GPU and NPU;
  • the method is applied to CPU and/or NPU;
  • the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction,
  • the neural network calculation macro instruction also includes at least one of the following options: the identifier of the designated device for executing the neural network calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, the instruction parameter including at least one of the size of the convolution kernel, the step size of the convolution kernel, and the padding of the convolution kernel.
  • a neural network computing instruction processing system, the system including an instruction generating device and an operating device,
  • the instruction generating device includes:
  • a device determination module configured to determine a running device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction
  • An instruction generation module configured to calculate a macro instruction and the operation device according to the neural network, and generate an operation instruction
  • the operation equipment includes:
  • the control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
  • An execution module configured to execute the multiple parsing instructions based on the data to obtain an execution result
  • the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm
  • the neural network calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address
  • the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • the instruction generating device further includes:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module includes at least one of the following sub-modules:
  • the first determining submodule is configured to, when it is determined that the neural network calculation macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution conditions for executing the neural network calculation macro instruction, determine the specified device as the running device;
  • the second determining submodule is configured to, when it is determined that the neural network calculation macro instruction does not include the identifier of the specified device, determine, from among the candidate devices, a running device for executing the neural network calculation macro instruction according to the received neural network calculation macro instruction and the resource information of the candidate devices;
  • a third determining submodule configured to, when it is determined that the neural network calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution conditions for executing the neural network calculation macro instruction, determine the running device according to the neural network calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the neural network calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the instruction generation module is also used to determine the data amount of the neural network calculation macro instruction, and to generate the running instruction according to the data amount of the neural network calculation macro instruction, the neural network calculation macro instruction, and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes at least one of the following sub-modules:
  • the first instruction generating submodule is used for, when it is determined that the number of the running devices is one and the running data amount of the running device is less than the data amount of the neural network calculation macro instruction, splitting the neural network calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
  • the second instruction generation sub-module is used for, when it is determined that there are a plurality of operation devices, splitting the neural network calculation macro instruction according to the operation data amount of each operation device and the data amount, to generate an operation instruction corresponding to each operation device,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • the instruction generating device further includes:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • the instruction generating device further includes:
  • the macro generation module is used to receive the neural network calculation instruction to be executed, and generate the neural network calculation macro instruction according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
  • the instruction generation device further includes:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • Clause F8 The system according to Clause F1, the operation device, further comprising:
  • a storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • control module includes:
  • An instruction storage sub-module for storing the operation instruction
  • An instruction processing sub-module which is used to parse the running instruction to obtain the multiple parsing instructions
  • a storage queue sub-module for storing a running instruction queue, the running instruction queue including the running instruction and the plurality of parsing instructions, wherein the running instruction and the plurality of parsing instructions are arranged in the order in which they are to be executed.
  • the execution module includes:
  • the dependency processing sub-module is configured to, when it is determined that there is an association between a first parsing instruction and a zeroth parsing instruction before the first parsing instruction, cache the first parsing instruction in the instruction storage sub-module, and after the execution of the zeroth parsing instruction is completed, extract the first parsing instruction from the instruction storage sub-module and send it to the execution module,
  • the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
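The association relationship defined above reduces to an interval-overlap test between the two storage address sections; a minimal sketch:

```python
def sections_overlap(first_start, first_len, zeroth_start, zeroth_len):
    """True when the first parsing instruction's storage address section overlaps
    the zeroth's, i.e. the first instruction must wait for the zeroth to finish."""
    first_end = first_start + first_len
    zeroth_end = zeroth_start + zeroth_len
    # Half-open intervals [start, end) overlap iff each starts before the other ends.
    return first_start < zeroth_end and zeroth_start < first_end
```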
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and/or NPU;
  • the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction;
  • the neural network calculation macro instruction includes at least one of the following options: the identifier of the designated device used to execute the neural network calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, the instruction parameter including at least one of the size of the convolution kernel, the step size of the convolution kernel, and the padding of the convolution kernel.
  • a machine learning computing device comprising:
  • the multiple neural network computing instruction processing systems may be connected and transmit data through a specific structure
  • a plurality of the neural network computing instruction processing systems may interconnect and transmit data through a fast external device interconnect (PCIE) bus to support larger-scale machine learning operations; a plurality of the neural network computing instruction processing systems may share the same control system or have their own control systems; a plurality of the neural network computing instruction processing systems may share memory or have their own memories; and the interconnection mode of the plurality of neural network computing instruction processing systems may be any interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause F12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card, comprising: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause F12 or the combined processing device described in Clause F13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a neural network computing instruction processing method is applied to a neural network computing instruction processing system.
  • the system includes an instruction generating device and an operating device.
  • the method includes:
  • the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm
  • the neural network calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address
  • the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • Clause F16 The method according to Clause F15, the method further comprising:
  • the instruction generating device determines the operating device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction, including at least one of the following:
  • when it is determined that the neural network calculation macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the neural network calculation macro instruction, determining the designated device as the running device;
  • when it is determined that the neural network calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution conditions for executing the neural network calculation macro instruction, determining the running device according to the neural network calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the neural network calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity
  • generating an operation instruction includes:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • Clause F18 According to the method of Clause F17, according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device, generating the running instruction, including at least one of the following:
  • the neural network calculation macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions;
  • the neural network calculation macro instruction is split according to the operating data amount of each operating device and the data amount to generate an operating instruction corresponding to each operating device,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause F19 The method according to Clause F15, the method further comprising:
  • the instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause F20 The method according to Clause F16, the method further comprising:
  • the instruction generating device receives the neural network calculation instruction to be executed, and generates the neural network calculation macro instruction according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
  • Clause F21 The method according to Clause F15, the method further comprising:
  • sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction includes:
  • Clause F22 The method according to Clause F15, the method further comprising:
  • the operation device includes a storage module
  • the storage module includes one or more of a register and a cache
  • the cache includes a high-speed temporary storage cache
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • Clause F23 The method according to Clause F15, the method further comprising:
  • the operation instruction queue including the operation instruction and the plurality of parsing instructions, wherein the operation instruction and the plurality of parsing instructions are arranged in the order in which they are to be executed.
  • Clause F24 The method according to Clause F23, the method further comprising:
  • when the running device determines that there is an association between a first parsing instruction and a zeroth parsing instruction before the first parsing instruction, the running device caches the first parsing instruction, and executes the cached first parsing instruction after the execution of the zeroth parsing instruction is completed,
  • the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and/or NPU;
  • the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction,
  • the neural network calculation macro instruction also includes at least one of the following options: the identifier of the designated device for executing the neural network calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, the instruction parameter including at least one of the size of the convolution kernel, the step size of the convolution kernel, and the padding of the convolution kernel.
  • a vector logic calculation instruction generating device comprising:
  • a device determination module configured to determine a running device that executes the vector logic calculation macro instruction according to the received vector logic calculation macro instruction
  • An instruction generation module used to calculate a macro instruction and the running device according to the vector logic, and generate a running instruction
  • the vector logic calculation macro instruction refers to a macro instruction used to perform logic operation on a vector
  • the vector logic calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address.
  • the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • the device determination module includes:
  • the first determining submodule is configured to, when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution condition for executing the vector logic calculation macro instruction, determine the specified device as the running device,
  • the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction.
  • Clause G3 The device according to Clause G2, the device further comprising:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module also includes:
  • the second determining submodule is configured to, when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determine, from among the candidate devices, a running device for executing the vector logic calculation macro instruction according to the received vector logic calculation macro instruction and the resource information of the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • Clause G4 The apparatus according to Clause G3, the device determination module, further comprising:
  • a third determining submodule configured to, when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determine the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices.
  • the vector logic calculation macro instruction further includes at least one of an input quantity and an output quantity
  • the instruction generation module is further used to determine the data amount of the vector logic calculation macro instruction, generate the operation instruction according to the data amount, the vector logic calculation macro instruction and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes:
  • the first instruction generating submodule is used for, when it is determined that the number of the running devices is one and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, splitting the vector logic calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, each operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • the instruction generation module includes:
  • the second instruction generation sub-module is used for, when it is determined that there are a plurality of operation devices, splitting the vector logic calculation macro instruction according to the operation data amount of each operation device and the data amount, to generate an operation instruction corresponding to each operation device,
  • the operation data amount of each operation device is determined according to the resource information of each operation device, the operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause G8 The device according to Clause G1, the device further comprising:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause G9 The device according to Clause G2, the device further comprising:
  • the macro generation module is configured to receive a vector logic calculation instruction to be executed, and generate the vector logic calculation macro instruction according to the determined identifier of the designated device and the vector logic calculation instruction to be executed.
  • Clause G10 The device according to Clause G1, the device further comprising:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
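The three dispatch submodules above (assembly, translation, sending) form a pipeline that can be sketched as below. The textual assembly format and the UTF-8 encoding are invented for illustration; the patent does not specify either.

```python
# Illustrative sketch of the dispatch pipeline: running instruction -> assembly
# file -> binary file -> sent to the running device. Formats are assumptions.

def assemble(run_instruction):
    """Instruction assembly submodule: emit one assembly line per field."""
    return "\n".join(f"{k} {v}" for k, v in run_instruction.items())

def translate(assembly_text):
    """Assembly translation submodule: encode the assembly file as bytes."""
    return assembly_text.encode("utf-8")

def dispatch(run_instruction, send):
    """Instruction sending submodule: push the binary file to the device."""
    binary = translate(assemble(run_instruction))
    send(binary)
    return binary
```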
  • the running device is one or any combination of CPU, GPU and NPU
  • the device is set in the CPU and / or NPU;
  • the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
  • the vector logic calculation macro instruction further includes at least one of the following options: an identifier of the designated device used to execute the vector logic calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, where the instruction parameter includes at least one of an address and a length of a second operand.
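As a reading aid, the fields listed above for a vector logic calculation macro instruction could be modeled as follows. Field names are assumptions, and the optional fields default to None because the clause marks them as optional.

```python
# Illustrative model (field names are assumptions, not from the patent) of the
# fields a vector logic calculation macro instruction may carry.
from dataclasses import dataclass
from typing import Optional

@dataclass
class VectorLogicMacro:
    operation_type: str                 # e.g. "VAND", "VOR", "VNOT", "VCMP"
    input_address: int
    output_address: int
    device_id: Optional[str] = None     # identifier of the designated device
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operand: Optional[int] = None
    second_operand_address: Optional[int] = None  # instruction parameter
    second_operand_length: Optional[int] = None   # instruction parameter
```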
  • a machine learning computing device comprising:
  • one or more vector logic calculation instruction generation devices as described in any one of Clauses G1 to G11, used to obtain data to be operated on and control information from other processing apparatuses, perform a designated machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
  • the machine learning operation device includes a plurality of the vector logic calculation instruction generation devices
  • the plurality of vector logic calculation instruction generation devices can be connected and transmit data through a specific structure
  • a plurality of the vector logic calculation instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the vector logic calculation instruction generation devices share the same control system or have their own control systems; a plurality of the vector logic calculation instruction generation devices share memory or have their own memories; and the interconnection mode of the plurality of vector logic calculation instruction generation devices is an arbitrary interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause G12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card comprising: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device described in Clause G12 or the combined processing device described in Clause G13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a vector logic calculation instruction generation method including:
  • the vector logic calculation macro instruction refers to a macro instruction used to perform logic operation on a vector
  • the vector logic calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, an operation input address, and an operation output address, where the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • determining a running device that executes the vector logic calculation macro instruction includes:
  • when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution condition for executing the vector logic calculation macro instruction, determining the specified device as the running device,
  • the execution condition includes that the specified device contains an instruction set corresponding to the vector logic calculation macro instruction.
  • Clause G17 The method according to Clause G16, the method further comprising:
  • determining a running device that executes the vector logic calculation macro instruction includes:
  • when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determining, according to the received vector logic calculation macro instruction and the resource information of the candidate devices, a running device among the candidate devices to execute the vector logic calculation macro instruction,
  • the resource information includes an instruction set included in the candidate device.
  • determining a running device that executes the vector logic calculation macro instruction includes:
  • when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determining the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices.
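Clauses G16 to G18 together describe a three-way selection of the running device, sketched below. The execution condition checked is the one the clauses name (the device's instruction set must contain the macro instruction's operation); the function and field names are illustrative assumptions.

```python
# Illustrative sketch of the three-way running-device selection: (1) the macro
# names a device that satisfies the execution condition; (2) no device is
# named; (3) the named device fails the condition. Cases 2 and 3 both fall
# back to the candidate devices' resource information.

def choose_running_device(macro, candidates):
    """macro: dict with 'operation_type' and optionally 'device_id';
    candidates: dict mapping device id -> resource info with 'instruction_set'."""
    def satisfies(dev_id):
        info = candidates.get(dev_id)
        return info is not None and macro["operation_type"] in info["instruction_set"]

    specified = macro.get("device_id")
    if specified is not None and satisfies(specified):
        return specified                     # case 1: use the specified device
    for dev_id in candidates:                # cases 2 and 3: scan candidates
        if satisfies(dev_id):
            return dev_id
    return None                              # no candidate can execute the macro
```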
  • the vector logic calculation macro instruction further includes at least one of an input quantity and an output quantity
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • Clause G20 The method according to Clause G19, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes:
  • splitting the vector logic calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the operation data amount of the operation device is determined according to the resource information of the operation device; each operation instruction includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
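The sequential split of Clause G20 — one running device whose running data amount is smaller than the macro instruction's data amount — can be sketched as follows; the fixed-size chunking policy shown is an assumption.

```python
# Illustrative sketch: when the running device's running data amount is less
# than the macro instruction's data amount, split the macro into running
# instructions that the device executes one after another.

def split_for_one_device(total_data, running_data_amount):
    """Yield (offset, amount) chunks no larger than the device's capacity."""
    chunks = []
    offset = 0
    while offset < total_data:
        amount = min(running_data_amount, total_data - offset)
        chunks.append((offset, amount))   # one sequential running instruction
        offset += amount
    return chunks
```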
  • Clause G21 The method according to Clause G19, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes:
  • when it is determined that there are a plurality of running devices, splitting the vector logic calculation macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
  • the operation data amount of each operation device is determined according to the resource information of each operation device; the operation instruction includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause G22 The method according to Clause G15, the method further comprising:
  • the running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
  • Clause G23 The method according to Clause G16, the method further comprising:
  • Clause G24 The method according to Clause G15, the method further comprising:
  • sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
  • the running device is one or any combination of CPU, GPU and NPU;
  • the method is applied to CPU and / or NPU;
  • the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
  • the vector logic calculation macro instruction further includes at least one of the following options: an identifier of the designated device used to execute the vector logic calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, where the instruction parameter includes at least one of an address and a length of a second operand.
  • a vector logic calculation instruction processing system, the system including an instruction generation device and an operation device,
  • the instruction generating device includes:
  • a device determination module configured to determine a running device that executes the vector logic calculation macro instruction according to the received vector logic calculation macro instruction
  • an instruction generation module used to generate a running instruction according to the vector logic calculation macro instruction and the running device;
  • the operation equipment includes:
  • the control module is used to obtain required data, a neural network model, and the running instruction, and to parse the running instruction to obtain multiple parsing instructions;
  • An execution module configured to execute the multiple parsing instructions based on the data to obtain an execution result
  • the vector logic calculation macro instruction refers to a macro instruction used to perform logic operation on a vector
  • the vector logic calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, an operation input address, and an operation output address, where the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • the instruction generating device further includes:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module includes at least one of the following sub-modules:
  • the first determining submodule is configured to, when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device and the resources of the specified device satisfy the execution condition for executing the vector logic calculation macro instruction, determine the specified device as the running device;
  • the second determining submodule is used to, when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determine, according to the received vector logic calculation macro instruction and the resource information of the candidate devices, a running device among the candidate devices to execute the vector logic calculation macro instruction;
  • the third determining submodule is used to, when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device and the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determine the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the vector logic calculation macro instruction, and the resource information includes the instruction set included in the candidate device.
  • the vector logic calculation macro instruction further includes at least one of an input quantity and an output quantity
  • the instruction generation module is further used to determine the data amount of the vector logic calculation macro instruction, and generate the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes at least one of the following sub-modules:
  • the first instruction generating submodule is used to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, split the vector logic calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the second instruction generation submodule is used to, when it is determined that there are a plurality of operation devices, split the vector logic calculation macro instruction according to the operation data amount of each operation device and the data amount, and generate an operation instruction corresponding to each operation device,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount, where the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • the instruction generating device further includes:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • the instruction generating device further includes:
  • the macro generation module is configured to receive a vector logic calculation instruction to be executed, and generate the vector logic calculation macro instruction according to the determined identifier of the designated device and the vector logic calculation instruction to be executed.
  • the instruction generating device further includes:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • Clause H8 The system according to Clause H1, the operation device further comprising:
  • a storage module including one or more of a register and a cache, where the cache includes a high-speed temporary storage cache,
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • the control module includes:
  • An instruction storage sub-module for storing the operation instruction
  • An instruction processing sub-module which is used to parse the running instruction to obtain the multiple parsing instructions
  • a storage queue submodule for storing a running instruction queue, the running instruction queue including the running instruction and the multiple parsing instructions, where the running instruction and the multiple parsing instructions in the running instruction queue are arranged in order of execution.
  • the execution module includes:
  • the dependency processing submodule is used to, when it is determined that there is an association between a first parsing instruction and a zeroth parsing instruction before the first parsing instruction, cache the first parsing instruction in the instruction storage submodule, and after execution of the zeroth parsing instruction is completed, extract the first parsing instruction from the instruction storage submodule and send it to the execution module,
  • where the association between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • a first storage address section storing the data required by the first parsing instruction and a zeroth storage address section storing the data required by the zeroth parsing instruction have an overlapping area.
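The association relationship above reduces to an interval-overlap test on storage address sections. A minimal sketch, with illustrative field names:

```python
# Illustrative sketch: a first parsing instruction depends on a zeroth one
# when the storage address section holding the data it requires overlaps the
# section the zeroth instruction uses.

def intervals_overlap(start0, length0, start1, length1):
    """True if [start0, start0+length0) and [start1, start1+length1) overlap."""
    return start0 < start1 + length1 and start1 < start0 + length0

def has_dependency(first_instr, zeroth_instr):
    """Each instruction carries 'addr' and 'length' for its required data."""
    return intervals_overlap(zeroth_instr["addr"], zeroth_instr["length"],
                             first_instr["addr"], first_instr["length"])
```

When `has_dependency` is true, the first parsing instruction is held back until the zeroth one has finished executing.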
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and / or NPU;
  • the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
  • the vector logic calculation macro instruction further includes at least one of the following options: an identifier of the designated device used to execute the vector logic calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, where the instruction parameter includes at least one of an address and a length of a second operand.
  • a machine learning computing device comprising:
  • the machine learning operation device includes a plurality of the vector logic calculation instruction processing systems
  • the plurality of vector logic calculation instruction processing systems may be connected and transmit data through a specific structure
  • a plurality of the vector logic calculation instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the vector logic calculation instruction processing systems share the same control system or have their own control systems; a plurality of the vector logic calculation instruction processing systems share memory or have their own memories; and the interconnection mode of the plurality of vector logic calculation instruction processing systems is an arbitrary interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause H12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card comprising: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device described in Clause H12 or the combined processing device described in Clause H13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a vector logic calculation instruction processing method is applied to a vector logic calculation instruction processing system.
  • the system includes an instruction generation device and an operation device.
  • the method includes:
  • the vector logic calculation macro instruction refers to a macro instruction used to perform logic operation on a vector
  • the vector logic calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, an operation input address, and an operation output address, where the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • Clause H16 The method according to Clause H15, the method further comprising:
  • the instruction generating device determines the running device that executes the vector logic calculation macro instruction according to the received vector logic calculation macro instruction, including at least one of the following:
  • when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution condition for executing the vector logic calculation macro instruction, determining the specified device as the running device;
  • when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determining, according to the received vector logic calculation macro instruction and the resource information of the candidate devices, a running device among the candidate devices to execute the vector logic calculation macro instruction;
  • the execution condition includes: the specified device includes an instruction set corresponding to the vector logic calculation macro instruction, and the resource information includes the instruction set included in the candidate device.
  • the vector logic calculation macro instruction further includes at least one of an input quantity and an output quantity
  • determining the data amount of the vector logic calculation macro instruction, and generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.


Abstract

A computation method and apparatus, and a related product. A combined processing apparatus therein comprises: a machine learning computation apparatus, a universal interconnection interface, and other processing apparatuses; the machine learning computation apparatus interacts with the other processing apparatuses to collectively implement calculation operations specified by a user, and the combined processing apparatus also comprises: a storage apparatus, the storage apparatus respectively being connected to the machine learning computation apparatus and the other processing apparatuses, and being used for storing data of the machine learning computation apparatus and the other processing apparatuses. The present computation method and apparatus and related product can be used across platforms, and have good usability, rapid command conversion speed, high processing efficiency, low probability of error, and low labour and material costs for development.

Description

Calculation method, device and related products
Technical field
The present disclosure relates to the field of information processing technology, and in particular to a neural network instruction generation method, device, and related products.
Background
With the continuous development of technology, neural network algorithms are used more and more widely, and have been applied well in fields such as image recognition, speech recognition, and natural language processing. However, as the complexity of neural network algorithms increases, the scale of their models keeps growing. Running a large-scale neural network model on a Graphics Processing Unit (GPU) or a Central Processing Unit (CPU) takes a large amount of computation time and consumes a large amount of power. In the related art, approaches to accelerating the processing of neural network models suffer from problems such as the inability to work across platforms, low processing efficiency, high development cost, and proneness to errors.
Summary of the invention
Based on this, in view of the above technical problems, it is necessary to provide a neural network instruction generation method, device, and related products that can be used across platforms, improve processing efficiency, and reduce the probability of errors and the cost of development.
According to a first aspect of the present disclosure, there is provided a neural network instruction generation device, the device including:
a device determination module, configured to determine, according to a received macro instruction, a running device that executes the macro instruction;
an instruction generation module, configured to generate a running instruction according to the macro instruction and the running device.
According to a second aspect of the present disclosure, there is provided a machine learning operation device, the device including:
one or more neural network instruction generation devices described in the first aspect above, used to obtain data to be operated on and control information from other processing devices, perform a designated machine learning operation, and transmit the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes a plurality of the neural network instruction generation devices, the plurality of neural network instruction generation devices can be connected to each other and transmit data through a specific structure;
where a plurality of the neural network instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the neural network instruction generation devices share the same control system or have their own control systems; a plurality of the neural network instruction generation devices share memory or have their own memories; and the interconnection mode of the plurality of neural network instruction generation devices is an arbitrary interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing device, the device including:
the machine learning operation device described in the second aspect above, a general interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete a calculation operation specified by a user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip, the machine learning chip including the machine learning operation device described in the second aspect above or the combined processing device described in the third aspect above.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip packaging structure, the structure including the machine learning chip described in the fourth aspect above.
According to a sixth aspect of the present disclosure, there is provided a board card, the board card including the machine learning chip packaging structure described in the fifth aspect above.
According to a seventh aspect of the present disclosure, there is provided an electronic device, the electronic device including the machine learning chip described in the fourth aspect above or the board card described in the sixth aspect above.
According to an eighth aspect of the present disclosure, there is provided a neural network instruction generation method, the method including:
determining, according to a received macro instruction, a running device that executes the macro instruction;
generating a running instruction according to the macro instruction and the running device.
In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a still camera, a video camera, a projector, a watch, headphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; and the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument, and/or an electrocardiograph.
本公开实施例所提供的神经网络指令生成方法、装置及相关产品,该装置包括设备确定模块和指令生成模块,设备确定模块用于根据接收到的宏指令,确定执行宏指令的运行设备。指令生成模块用于根据宏指令和运行设备,生成运行指令。该方法、装置及相关产品可跨平台使用,适用性好,指令转换的速度快、处理效率高、出错几率低,且开发的人力、物力成本低。The neural network instruction generation method, device and related products provided by the embodiments of the present disclosure include a device determination module and an instruction generation module. The device determination module is used to determine a running device that executes the macro instruction based on the received macro instruction. The instruction generation module is used to generate the operation instruction according to the macro instruction and the operation equipment. The method, device and related products can be used across platforms, with good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material costs for development.
根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
附图说明 BRIEF DESCRIPTION OF THE DRAWINGS
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本公开的示例性实施例、特征和方面,并且用于解释本公开的原理。The drawings included in the specification and forming a part of the specification together with the specification show exemplary embodiments, features, and aspects of the present disclosure, and are used to explain the principles of the present disclosure.
图1示出根据本公开实施例的指令生成、处理方法的处理器的示意图。FIG. 1 shows a schematic diagram of a processor of an instruction generation and processing method according to an embodiment of the present disclosure.
图2示出根据本公开一实施例的神经网络指令生成装置的框图。2 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
图3示出根据本公开一实施例的神经网络指令生成装置的框图。FIG. 3 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
图4示出根据本公开一实施例的神经网络指令生成方法的流程图。FIG. 4 shows a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure.
图5示出根据本公开一实施例的神经网络指令处理系统的框图。5 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure.
图6示出根据本公开一实施例的神经网络指令处理系统的框图。6 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure.
图7示出根据本公开一实施例的神经网络指令处理方法的流程图。7 shows a flowchart of a neural network instruction processing method according to an embodiment of the present disclosure.
图8a、图8b示出根据本公开一实施例的神经网络指令生成装置、处理系统的应用场景的示意图。8a and 8b are schematic diagrams illustrating application scenarios of a neural network instruction generation device and a processing system according to an embodiment of the present disclosure.
图9a、图9b示出根据本公开一实施例的组合处理装置的框图。9a and 9b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
图10示出根据本公开一实施例的板卡的结构示意图。FIG. 10 shows a schematic structural diagram of a board according to an embodiment of the present disclosure.
具体实施方式 DETAILED DESCRIPTION
下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。The technical solutions in the embodiments of the present disclosure will be described clearly and completely in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are part of the embodiments of the present disclosure, but not all of the embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.
应当理解，本公开的权利要求、说明书及附图中的术语“第一”、“第二”、“第零”等是用于区别不同对象，而不是用于描述特定顺序。本公开的说明书和权利要求书中使用的术语“包括”和“包含”指示所描述特征、整体、步骤、操作、元素和/或组件的存在，但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。It should be understood that the terms "first", "second", "zeroth", etc. in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, not to describe a specific order. The terms "comprising" and "including" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
还应当理解,在此本公开说明书中所使用的术语仅仅是出于描述特定实施例的目的,而并不意在限定本公开。如在本公开说明书和权利要求书中所使用的那样,除非上下文清楚地指明其它情况,否则单数形式的“一”、“一个”及“该”意在包括复数形式。还应当进一步理解,在本公开说明书和权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合,并且包括这些组合。It should also be understood that the terminology used in the present specification of the disclosure is for the purpose of describing particular embodiments only, and is not intended to limit the disclosure. As used in this disclosure specification and claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly dictates otherwise. It should also be further understood that the term "and / or" used in the present specification and claims refers to any and all possible combinations of one or more of the associated listed items and includes these combinations.
如在本说明书和权利要求书中所使用的那样，术语“如果”可以依据上下文被解释为“当...时”或“一旦”或“响应于确定”或“响应于检测到”。类似地，短语“如果确定”或“如果检测到[所描述条件或事件]”可以依据上下文被解释为意指“一旦确定”或“响应于确定”或“一旦检测到[所描述条件或事件]”或“响应于检测到[所描述条件或事件]”。As used in this specification and claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
根据本公开实施例的指令生成、指令处理方法可应用于处理器中，该处理器可以是通用处理器，例如CPU(Central Processing Unit，中央处理器)，也可以是用于执行人工智能运算的人工智能处理器(IPU)。人工智能运算可包括机器学习运算，类脑运算等。其中，机器学习运算包括神经网络运算、k-means运算、支持向量机运算等。该人工智能处理器可例如包括GPU(Graphics Processing Unit，图形处理单元)、NPU(Neural-Network Processing Unit，神经网络处理单元)、DSP(Digital Signal Process，数字信号处理单元)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)芯片中的一种或组合。本公开对处理器的具体类型不作限制。The instruction generation and instruction processing methods according to the embodiments of the present disclosure may be applied to a processor. The processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, and so on; machine learning operations include neural network operations, k-means operations, support vector machine operations, etc. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and an FPGA (Field-Programmable Gate Array) chip. The present disclosure does not limit the specific type of the processor.
在一种可能的实现方式中，本公开中所提及的处理器可包括多个处理单元，每个处理单元可以独立运行所分配到的各种任务，如：卷积运算任务、池化任务或全连接任务等。本公开对处理单元及处理单元所运行的任务不作限制。In a possible implementation manner, the processor mentioned in the present disclosure may include multiple processing units, and each processing unit may independently run the various tasks assigned to it, such as a convolution operation task, a pooling task, or a fully connected task. The present disclosure does not limit the processing units or the tasks run by the processing units.
图1示出根据本公开实施例的指令生成、处理方法的处理器的示意图。如图1所示,处理器100包括多个处理单元101以及存储单元102,多个处理单元101用于执行指令序列,存储单元102用于存储数据,可包括随机存储器(RAM,Random Access Memory)和寄存器堆。处理器100中的多个处理单元101既可共用部分存储空间,例如共用部分RAM存储空间和寄存器堆,又可同时拥有各自的存储空间。FIG. 1 shows a schematic diagram of a processor of an instruction generation and processing method according to an embodiment of the present disclosure. As shown in FIG. 1, the processor 100 includes a plurality of processing units 101 and a storage unit 102. The plurality of processing units 101 are used to execute an instruction sequence, and the storage unit 102 is used to store data, which may include a random access memory (RAM, Random Access Memory) And register file. The multiple processing units 101 in the processor 100 can share a part of the storage space, for example, share a part of the RAM storage space and the register file, and can also have their own storage spaces at the same time.
图2示出根据本公开一实施例的神经网络指令生成装置的框图。如图2所示,该装置包括设备确定模块11和指令生成模块12。设备确定模块11用于根据接收到的宏指令,确定执行宏指令的运行设备。指令生成模块12用于根据宏指令和运行设备,生成运行指令。2 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure. As shown in FIG. 2, the device includes a device determination module 11 and an instruction generation module 12. The device determination module 11 is used to determine a running device that executes the macro instruction according to the received macro instruction. The instruction generation module 12 is used to generate an operation instruction according to the macro instruction and the operation device.
在该实现方式中,宏指令是一种批量处理的称谓,宏指令可以是一种规则或模式,或称语法替换,在遇到宏指令时会自动进行这一规则或模式的替换。宏指令可以是对常用的用于对数据进行计算、控制和搬运等处理的待执行指令整合形成的。In this implementation mode, a macro instruction is a term for batch processing. The macro instruction can be a rule or pattern, or grammatical replacement. When a macro instruction is encountered, the rule or pattern is automatically replaced. The macro can be formed by integrating the to-be-executed instructions that are commonly used for calculation, control, and transportation of data.
在一种可能的实现方式中,宏指令可以包括以下至少一种:计算宏指令、控制宏指令和数据搬运宏指令。其中,计算宏指令可以包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种。控制宏指令可以包括无条件跳转宏指令和有条件跳转宏指令中的至少一种。数据搬运宏指令可以包括读宏指令和写宏指令中的至少一种。读宏指令可以包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。写宏指令可以包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种。In a possible implementation manner, the macro instruction may include at least one of the following: a calculation macro instruction, a control macro instruction, and a data handling macro instruction. The calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data handling macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction. The write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
在一种可能的实现方式中，宏指令可以包含以下选项中的至少一项：用于执行宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数。运行指令可以包含以下选项中的至少一项：操作类型、输入地址、输出地址、操作数和指令参数。In a possible implementation manner, the macro instruction may include at least one of the following options: an identifier of the specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, operands, and instruction parameters. The run instruction may include at least one of the following options: an operation type, an input address, an output address, operands, and instruction parameters.
其中，指定设备的标识可以是指定设备的物理地址、IP地址、名称、编号等标识。标识可以包括数字、字母、符号中的其中一种或任意组合。在宏指令的指定设备的标识的位置为空时，确定该宏指令无指定设备；或者，在宏指令中不包含“指定设备的标识”这个字段时，确定该宏指令无指定设备。操作类型可以是指该宏指令对数据所进行操作的类型，表征该宏指令的具体类型，如在某宏指令的操作类型为“XXX”时，可以根据“XXX”确定该宏指令对数据所进行的操作的具体类型。根据操作类型可以确定执行该宏指令所需的指令集合，如在某宏指令的操作类型为“XXX”时，其所需的指令集合为进行“XXX”所对应的处理所需的所有指令集。输入地址可以是数据的输入地址、读取地址等获得数据的地址，输出地址可以是被处理后的数据的输出地址、写入地址等存储数据的地址。输入量可以是数据的输入规模、输入长度等表征其数据量大小的信息。输出量可以是数据的输出规模、输出长度等表征其数据量的大小的信息。操作数可以包括寄存器的长度、寄存器的地址、寄存器的标识、立即数等。立即数为在立即寻址方式指令中给出的数。指令参数可以是指对应于该宏指令、与其执行相关的参数。例如，指令参数可以是第二个操作数的地址和长度等。指令参数可以是卷积核的大小、卷积核的步长和卷积核的填充等。The identifier of the specified device may be the physical address, IP address, name, number, or another identifier of the specified device, and may include one or any combination of digits, letters, and symbols. When the identifier field of the specified device in a macro instruction is empty, or when the macro instruction does not contain an "identifier of the specified device" field at all, the macro instruction is determined to have no specified device. The operation type refers to the type of operation the macro instruction performs on data and characterizes the specific type of the macro instruction; for example, when the operation type of a macro instruction is "XXX", the specific operation performed on the data can be determined from "XXX". The instruction set required to execute the macro instruction can also be determined from the operation type: when the operation type is "XXX", the required instruction set is all instructions needed for the processing corresponding to "XXX". The input address may be an address from which data is obtained, such as a data input address or a read address; the output address may be an address at which data is stored, such as an output address or a write address of the processed data. The input amount may be information characterizing the size of the data, such as its input scale or input length; the output amount may likewise be information characterizing the size of the data, such as its output scale or output length. Operands may include the length of a register, the address of a register, the identifier of a register, an immediate value, and so on; an immediate value is a number given directly in an immediate-addressing instruction. Instruction parameters are parameters corresponding to the macro instruction and related to its execution; for example, an instruction parameter may be the address and length of a second operand, or the size, stride, and padding of a convolution kernel.
在该实现方式中，对于一个宏指令，其必须包括操作码和至少一个操作域，其中操作码即为操作类型，操作域包括指定设备的标识、输入地址、输出地址、输入量、输出量、操作数和指令参数。操作码可以是计算机程序中所规定的要执行操作的那一部分指令或字段（通常用代码表示），是指令序列号，用来告知执行指令的装置具体需要执行哪一条指令。操作域可以是执行对应的指令所需的所有数据的来源，执行对应的指令所需的所有数据包括参数数据、待运算或待处理的数据、对应的运算方法，或者存储参数数据、待运算或待处理的数据、对应的运算方法的地址等等。In this implementation, a macro instruction must include an opcode and at least one operation field, where the opcode is the operation type and the operation fields include the identifier of the specified device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters. The opcode is the part of an instruction or field (usually represented by a code) that specifies the operation to be performed; it is an instruction sequence number that tells the device executing the instruction which instruction to execute. An operation field may be the source of all data required to execute the corresponding instruction, including parameter data, the data to be operated on or processed, and the corresponding operation method, or the addresses at which the parameter data, the data to be operated on or processed, and the corresponding operation method are stored.
应当理解的是,本领域技术人员可以根据需要对宏指令的指令格式以及所包含的内容进行设置,本公开对此不作限制。It should be understood that, those skilled in the art can set the instruction format and the content of the macro instruction as needed, and this disclosure does not limit this.
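The macro-instruction fields described above can be sketched as a simple data structure. The following Python sketch is purely illustrative and not part of the disclosure; the field names (op_type, device_id, etc.) and types are assumptions. Per the text, an empty device-identifier field means the macro instruction has no specified device.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative model of a macro instruction: an opcode (operation type)
# plus operation fields. All names and types are hypothetical.
@dataclass
class MacroInstruction:
    op_type: str                       # operation type (the opcode), e.g. "CONV"
    device_id: Optional[int] = None    # identifier of the specified device; None = none specified
    input_addr: Optional[int] = None   # address from which input data is obtained
    output_addr: Optional[int] = None  # address at which processed data is stored
    input_size: Optional[int] = None   # input amount (scale/length of the data)
    output_size: Optional[int] = None  # output amount
    operands: List[int] = field(default_factory=list)  # register ids, immediates, ...
    params: List[int] = field(default_factory=list)    # e.g. kernel size, stride, padding

    def has_specified_device(self) -> bool:
        # An empty device-identifier field means no specified device.
        return self.device_id is not None

# Example: a convolution macro targeting device 9, with kernel size 3, stride 1, padding 1.
conv = MacroInstruction(op_type="CONV", device_id=9, input_addr=0x1000,
                        output_addr=0x2000, input_size=1024, params=[3, 1, 1])
```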
在本实施例中,设备确定模块11可以根据宏指令确定一个或多个运行设备。指令生成模块12可以生成一个或多个运行指令。在生成的运行指令为多个时,多个运行指令可以在同一个运行设备中被执行,也可以在不同的运行设备中被执行,本公开对此不作限制。In this embodiment, the device determination module 11 may determine one or more running devices according to the macro instruction. The instruction generation module 12 may generate one or more execution instructions. When there are multiple generated running instructions, the multiple running instructions may be executed in the same running device, or may be executed in different running devices, which is not limited in the present disclosure.
本公开实施例所提供的神经网络指令生成装置包括设备确定模块和指令生成模块,设备确定模块用于根据接收到的宏指令,确定执行宏指令的运行设备。指令生成模块用于根据宏指令和运行设备,生成运行指令。该装置可跨平台使用,适用性好,指令转换的速度快、处理效率高、出错几率低,且开发的人力、物力成本低。The neural network instruction generation device provided by the embodiment of the present disclosure includes a device determination module and an instruction generation module. The device determination module is used to determine a running device that executes the macro instruction according to the received macro instruction. The instruction generation module is used to generate the operation instruction according to the macro instruction and the operation equipment. The device can be used across platforms, has good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material cost for development.
图3示出根据本公开一实施例的神经网络指令生成装置的框图。在一种可能的实现方式中,如图3所示,该装置还可以包括宏指令生成模块13。宏指令生成模块13用于接收待执行指令,根据确定的指定设备的标识和待执行指令生成宏指令。FIG. 3 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure. In a possible implementation manner, as shown in FIG. 3, the device may further include a macro instruction generation module 13. The macro generation module 13 is used to receive the instruction to be executed, and generate a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
在该实现方式中,指定设备可以是根据待执行指令的操作类型、输入量、输出量等确定的。所接收到的待执行指令可以是一条,也可以是多条。In this implementation, the designated device may be determined according to the operation type, input amount, output amount, etc. of the instruction to be executed. The received instruction to be executed may be one or multiple instructions.
待执行指令可以包括以下至少一种:待执行计算指令、待执行控制指令和待执行数据搬运指令。其中,待执行计算指令可以包括待执行神经网络计算指令、待执行向量逻辑计算指令、待执行矩阵向量计算指令、待执行标量计算指令和待执行标量逻辑计算指令中的至少一种。待执行控制指令可以包括待执行无条件跳转指令和待执行有条件跳转指令中的至少一种。待执行数据搬运指令可以包括待执行读指令和待执行写指令中的至少一种。待执行读指令可以包括待执行读神经元指令、待执行读突触指令和待执行读标量指令中的至少一种。待执行写指令可以包括待执行写神经元指令、待执行写突触指令和待执行写标量指令中的至少一种。The instruction to be executed may include at least one of the following: a calculation instruction to be executed, a control instruction to be executed, and a data handling instruction to be executed. The calculation instruction to be executed may include at least one of a neural network calculation instruction to be executed, a vector logic calculation instruction to be executed, a matrix vector calculation instruction to be executed, a scalar calculation instruction to be executed and a scalar logic calculation instruction to be executed. The control instruction to be executed may include at least one of an unconditional jump instruction to be executed and a conditional jump instruction to be executed. The data carrying instruction to be executed may include at least one of a reading instruction to be executed and a writing instruction to be executed. The read instruction to be executed may include at least one of a read neuron instruction to be executed, a read synapse instruction to be executed and a read scalar instruction to be executed. The write instruction to be executed may include at least one of a write neuron instruction to be executed, a write synapse instruction to be executed and a write scalar instruction to be executed.
待执行指令可以包含以下选项中的至少一项:操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数。The instruction to be executed may include at least one of the following options: operation type, input address, output address, input amount, output amount, operand, and instruction parameter.
在该实现方式中，在待执行指令为一个时，可以将确定的指定设备的标识添加到待执行指令中，生成宏指令。举例来说，某待执行指令m为“XXX……param”。其中，XXX为操作类型，param为指令参数。可以根据该待执行指令m的操作类型“XXX”确定其指定设备m-1。然后，在待执行指令m中添加指定设备m-1的标识（例如，09），生成对应该待执行指令m的宏指令M“XXX 09,……param”。在待执行指令为多个时，可以将确定的每个待执行指令所对应的指定设备的标识添加到待执行指令中，根据带有指定设备的标识的多个待执行指令，生成一个宏指令，或者生成对应的多个宏指令。In this implementation manner, when there is a single instruction to be executed, the determined identifier of the specified device may be added to that instruction to generate a macro instruction. For example, a certain instruction m to be executed is "XXX ... param", where XXX is the operation type and param is an instruction parameter. The specified device m-1 can be determined according to the operation type "XXX" of the instruction m. Then, the identifier of the specified device m-1 (for example, 09) is added to the instruction m to generate the macro instruction M "XXX 09, ... param" corresponding to the instruction m. When there are multiple instructions to be executed, the identifier of the specified device determined for each instruction can be added to that instruction, and one macro instruction, or multiple corresponding macro instructions, can be generated from the multiple instructions carrying device identifiers.
应当理解的是,本领域技术人员可以根据需要对待执行指令的指令格式以及所包含的内容进行设置,本公开对此不作限制。It should be understood that those skilled in the art can set the instruction format and the content of the instructions to be executed as needed, and this disclosure does not limit this.
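The example above — adding the identifier "09" of the specified device to a pending instruction "XXX ... param" to obtain the macro "XXX 09, ... param" — can be sketched as a small string transformation. The textual instruction format below is an assumption for illustration only; the disclosure does not fix a concrete encoding.

```python
def to_macro(pending_instruction: str, device_id: str) -> str:
    """Insert the determined device identifier right after the operation type.

    Mirrors the example in the text: pending instruction "XXX ... param"
    plus device id "09" yields the macro "XXX 09, ... param".
    """
    op_type, _, rest = pending_instruction.partition(" ")
    return f"{op_type} {device_id}, {rest}" if rest else f"{op_type} {device_id}"

macro_m = to_macro("XXX ...param", "09")  # -> "XXX 09, ...param"
```

When several pending instructions are received, the same transformation would simply be applied per instruction, yielding one macro per pending instruction (or a single combined macro, under the scheme described above).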
在一种可能的实现方式中,如图3所示,设备确定模块11可以包括第一确定子模块111。第一确定子模块111用于在确定宏指令中包含指定设备的标识,且指定设备的资源满足执行宏指令的执行条件时,将指定设备确定为运行设备。其中,执行条件可以包括:指定设备中包含与宏指令相对应的指令集。In a possible implementation manner, as shown in FIG. 3, the device determination module 11 may include a first determination submodule 111. The first determining submodule 111 is used to determine that the designated device is an operating device when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution conditions for executing the macro instruction. The execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
在该实现方式中,宏指令中可以包含执行宏指令的一个或多个指定设备的标识。在宏指令中包含指定设备的标识,且指定设备的资源满足执行条件时,第一确定子模块111可以直接将指定设备确定为运行设备,节省基于宏指令生成运行指令的生成时间,且可以保证所生成的运行指令能够被对应的运行设备所执行。In this implementation, the macro instruction may include the identifier of one or more designated devices that execute the macro instruction. When the identifier of the specified device is included in the macro instruction, and the resources of the specified device meet the execution conditions, the first determination submodule 111 can directly determine the specified device as the running device, saving the generation time of generating the running instruction based on the macro instruction, and ensuring that The generated operation instruction can be executed by the corresponding operation device.
在一种可能的实现方式中,如图3所示,该装置还可以包括资源获取模块14。设备确定模块11还可以包括第二确定子模块112。资源获取模块14用于获取备选设备的资源信息。第二确定子模块112用于在确定宏指令中不包含指定设备的标识时,根据接收到的宏指令和备选设备的资源信息,从备选设备中确定出用于执行宏指令的运行设备。其中,资源信息可以包括备选设备所包含的指令集。备选设备所包含的指令集可以是对应于一种或多种宏指令的操作类型的指令集合。备选设备所包含的指令集越多,备选设备能够执行的宏指令的类型越多。In a possible implementation manner, as shown in FIG. 3, the device may further include a resource acquisition module 14. The device determination module 11 may further include a second determination sub-module 112. The resource acquisition module 14 is used to acquire resource information of candidate devices. The second determining submodule 112 is used to determine the running device for executing the macro instruction from the candidate device according to the received macro instruction and the resource information of the candidate device when it is determined that the macro instruction does not contain the identifier of the specified device . The resource information may include the instruction set included in the candidate device. The instruction set included in the alternative device may be an instruction set corresponding to the operation type of one or more macroinstructions. The more instruction sets the candidate device contains, the more types of macro instructions the candidate device can execute.
在该实现方式中，第二确定子模块112在确定宏指令中不包含指定设备的标识时，可以从备选设备中确定出能够执行宏指令的一个或多个运行设备。其中，所确定的运行设备的指令集中包括与宏指令相对应的指令集合。例如，接收到的宏指令为神经网络计算宏指令，可将包含对应于神经网络计算宏指令的指令集的备选设备确定为运行设备，以保证其可以运行生成的运行指令。In this implementation manner, when determining that the macro instruction does not include the identifier of a specified device, the second determination submodule 112 may determine, from the candidate devices, one or more running devices capable of executing the macro instruction, where the instruction set of each determined running device includes the instruction set corresponding to the macro instruction. For example, if the received macro instruction is a neural network calculation macro instruction, a candidate device containing the instruction set corresponding to neural network calculation macro instructions may be determined as a running device, ensuring that it can execute the generated run instructions.
在一种可能的实现方式中，如图3所示，设备确定模块11还可以包括第三确定子模块113。第三确定子模块113在确定宏指令中包含指定设备的标识，且指定设备的资源不满足执行宏指令的执行条件时，根据宏指令和备选设备的资源信息，确定运行设备。In a possible implementation manner, as shown in FIG. 3, the device determination module 11 may further include a third determination submodule 113. When it is determined that the macro instruction contains the identifier of a specified device but the resources of the specified device do not satisfy the execution condition for executing the macro instruction, the third determination submodule 113 determines the running device according to the macro instruction and the resource information of the candidate devices.
在该实现方式中,第三确定子模块113在确定宏指令中包含指定设备的标识,且指定设备的资源不满足执行条件时,可以认定该宏指令的指定设备不具备执行宏指令的能力。第三确定子模块113可以从备选设备中确定运行设备,可以将包含与宏指令相对应的指令集的备选设备确定为运行设备。In this implementation manner, when the third determining submodule 113 determines that the macro instruction includes the identifier of the designated device, and the resources of the designated device do not satisfy the execution conditions, it may be determined that the designated device of the macro instruction does not have the ability to execute the macro instruction. The third determining sub-module 113 may determine the running device from the candidate devices, and may determine the candidate device containing the instruction set corresponding to the macro instruction as the running device.
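The selection logic of the first, second, and third determination submodules described above can be summarized in one sketch: if a specified device exists and its instruction set covers the macro's operation type, it becomes the running device; otherwise any candidate whose instruction set covers the macro is eligible. The data shapes below (dicts with "op_type", "device_id", "id", "isa") are assumptions for illustration, not the disclosure's format.

```python
def determine_running_devices(macro: dict, candidates: list) -> list:
    """Return the ids of running devices selected for a macro instruction.

    The execution condition is modeled as: the device's instruction set
    ("isa", a set of operation types) contains the macro's operation type.
    """
    def satisfies_condition(dev: dict) -> bool:
        return macro["op_type"] in dev["isa"]

    specified = macro.get("device_id")
    if specified is not None:
        # First case: the specified device exists and meets the execution condition.
        for dev in candidates:
            if dev["id"] == specified and satisfies_condition(dev):
                return [dev["id"]]
        # Third case: specified device cannot execute the macro; fall through.
    # Second/third case: choose candidates whose instruction set covers the macro.
    return [dev["id"] for dev in candidates if satisfies_condition(dev)]

devices = [{"id": 9, "isa": {"CONV", "POOL"}},
           {"id": 10, "isa": {"CONV"}}]
```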
在一种可能的实现方式中，如图3所示，宏指令可以包含输入量和输出量中的至少一项，指令生成模块12还用于确定宏指令的数据量，根据宏指令的数据量、宏指令和运行设备的资源信息，生成运行指令。其中，宏指令的数据量可以是根据输入量和输出量中的至少一项确定的，运行设备的资源信息还可以包括存储容量、剩余存储容量的至少一项。In a possible implementation manner, as shown in FIG. 3, the macro instruction may include at least one of an input amount and an output amount, and the instruction generation module 12 is further configured to determine the data amount of the macro instruction and to generate run instructions according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running device. The data amount of the macro instruction may be determined according to at least one of the input amount and the output amount, and the resource information of the running device may further include at least one of a storage capacity and a remaining storage capacity.
其中,运行设备的存储容量可以是指运行设备的存储器可以容纳的二进制信息量。运行设备的剩余存储容量可以是指去除被占用的存储容量之后,运行设备当前所能用于指令运行的存储容量。运行设备的资源信息能够表征该运行设备的运行能力。存储容量越大、剩余存储容量越大,运行设备的运行能力越强。The storage capacity of the running device may refer to the amount of binary information that the memory of the running device can hold. The remaining storage capacity of the running device may refer to the storage capacity that the running device can currently use for command operation after removing the occupied storage capacity. The resource information of the running device can characterize the running capability of the running device. The larger the storage capacity and the larger the remaining storage capacity, the stronger the operating capability of the operating equipment.
在该实现方式中，指令生成模块12可以根据每个运行设备的资源信息、宏指令的数据量等，确定拆分宏指令的具体方式，以对宏指令进行拆分，生成与运行设备相对应的运行指令。In this implementation manner, the instruction generation module 12 may determine a specific way of splitting the macro instruction according to the resource information of each running device, the data amount of the macro instruction, and the like, so as to split the macro instruction and generate run instructions corresponding to the running devices.
在一种可能的实现方式中，如图3所示，指令生成模块12可以包括第一指令生成子模块121。第一指令生成子模块121用于在确定运行设备为一个，且在运行设备的资源不满足执行宏指令的容量条件时，根据运行设备的运行数据量和数据量将宏指令拆分成多条运行指令，以使运行设备依次执行多条运行指令。其中，运行设备的运行数据量可以是根据运行设备的资源信息确定的，每条运行指令可以包含运行输入量和运行输出量中的至少一项，运行输入量和运行输出量可以是根据运行数据量确定的。In a possible implementation manner, as shown in FIG. 3, the instruction generation module 12 may include a first instruction generation sub-module 121. The first instruction generation sub-module 121 is configured to, when it is determined that there is a single running device and the resources of the running device do not satisfy the capacity condition for executing the macro instruction, split the macro instruction into multiple run instructions according to the run data amount of the running device and the data amount of the macro instruction, so that the running device executes the multiple run instructions in sequence. The run data amount of the running device may be determined according to the resource information of the running device; each run instruction may contain at least one of a run input amount and a run output amount, which may be determined according to the run data amount.
在该实现方式中,运行设备的运行数据量可以是根据运行设备的存储容量或剩余存储容量确定的。容量条件可以是运行设备的运行数据量大于或等于宏指令的数据量,换言之,运行设备的资源不满足执行宏指令的容量条件可以是指:运行设备的运行数据量小于宏指令的数据量。运行输入量和运行输出量需小于或等于运行数据量,以保证所生成的运行指令可以被运行设备执行。多个运行指令中不同运行指令的运行输入量(或运行输出量)可以相同,也可以不同,本公开对此不作限制。In this implementation, the amount of operation data of the operation device may be determined according to the storage capacity or remaining storage capacity of the operation device. The capacity condition may be that the amount of running data of the running device is greater than or equal to the amount of data of the macro instruction. In other words, if the resource of the running device does not meet the capacity condition of executing the macro instruction, it may mean that the amount of running data of the running device is less than the amount of data of the macro instruction. The operation input and operation output must be less than or equal to the operation data to ensure that the generated operation instructions can be executed by the operation equipment. The operation input quantity (or operation output quantity) of different operation instructions among the plurality of operation instructions may be the same or different, and the disclosure does not limit this.
在该实现方式中，第一指令生成子模块121在确定运行设备为一个，且在运行设备的资源满足执行宏指令的容量条件时，可以直接将宏指令转化为一个运行指令，还可以将宏指令拆分为多个运行指令，本公开对此不作限制。In this implementation manner, when the first instruction generation sub-module 121 determines that there is a single running device and that the resources of the running device satisfy the capacity condition for executing the macro instruction, it may directly convert the macro instruction into a single run instruction, or it may still split the macro instruction into multiple run instructions; the present disclosure does not limit this.
在一种可能的实现方式中,如图3所示,指令生成模块12可以包括第二指令生成子模块122。第二指令生成子模块122用于在确定运行设备为多个时,根据每个运行设备的运行数据量和数据量对宏指令进行拆分,生成对应于每个运行设备的运行指令。其中,每个运行设备的运行数据量可以是根据每个运行设备的资源信息确定的,运行指令可以包含运行输入量和运行输出量中的至少一项,运行输入量和运行输出量是根据执行运行指令的运行设备的运行数据量确定的。In a possible implementation manner, as shown in FIG. 3, the instruction generation module 12 may include a second instruction generation sub-module 122. The second instruction generation sub-module 122 is configured to split the macro instruction according to the operation data amount and data amount of each operation device when it is determined that there are multiple operation devices, and generate an operation instruction corresponding to each operation device. The operation data volume of each operation device can be determined according to the resource information of each operation device, and the operation instruction can include at least one of the operation input amount and the operation output amount. The operation input amount and the operation output amount are based on the execution The operation data of the operation equipment of the operation instruction is determined.
在该实现方式中,运行输入量和运行输出量需小于或等于运行数据量,以保证所生成的运行指令可以运行设备执行。第二指令生成子模块122可以根据每个运行设备的运行数据量,为每个运行设备生成一个或多个运行指令,以供对应的运行设备执行。In this implementation, the operation input amount and the operation output amount need to be less than or equal to the operation data amount to ensure that the generated operation instruction can be executed by the operation device. The second instruction generation sub-module 122 may generate one or more operation instructions for each operation device according to the operation data amount of each operation device for the corresponding operation device to execute.
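The splitting described above — generating run instructions whose run input/output amounts do not exceed the device's run data amount — can be sketched as follows. The greedy chunking strategy and the chunk layout are assumptions for illustration; the disclosure does not fix a particular splitting scheme.

```python
def split_for_device(data_amount: int, run_capacity: int) -> list:
    """Split a macro instruction's data amount into per-run-instruction amounts.

    Each generated run instruction carries a run amount no larger than the
    device's run data amount (run_capacity), so the device can execute it.
    """
    chunks = []
    offset = 0
    while offset < data_amount:
        size = min(run_capacity, data_amount - offset)
        chunks.append({"offset": offset, "run_amount": size})
        offset += size
    return chunks

# A macro with a data amount of 1000 on a device whose run data amount is 256
# would be split into four run instructions executed in sequence.
run_insts = split_for_device(1000, 256)
```

For multiple running devices, the same idea applies per device: each device's share of the macro is chunked against that device's own run data amount.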
In the above implementation, the running instruction includes at least one of the running input amount and the running output amount. Besides limiting the data amount of the running instruction so that it can be executed by the corresponding running device, this also satisfies any special requirements that different running instructions may place on the running input amount and/or running output amount.
In a possible implementation, for running instructions that place no special requirements on the running input amount and/or running output amount, these amounts may be omitted. A default running input amount and a default running output amount may be preset, so that when the running device determines that a received running instruction contains no running input amount or running output amount, it uses the defaults as that instruction's running input amount and running output amount. Presetting default amounts simplifies the generation of running instructions and saves generation time.
In a possible implementation, default input amounts and default output amounts may be preset for different types of macro instructions. When a macro instruction contains no input amount or output amount, the corresponding preset defaults may be used as the macro instruction's input amount and output amount. The data amount of the macro instruction is then determined according to the default input amount and/or default output amount, and running instructions are generated according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running device. When the macro instruction contains no input amount or output amount, the generated running instructions may likewise omit the running input amount and running output amount, or may include at least one of them. When a running instruction contains no running input amount and/or running output amount, the running device may execute it according to the preset default running input amount and/or default running output amount.
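The per-type default fallback can be sketched as follows. The type names and default values in the table are hypothetical placeholders, not values given by the disclosure.

```python
# Illustrative preset defaults per macro-instruction type (values are
# assumptions for the example, not part of the disclosure).
DEFAULT_INPUT_AMOUNT = {"NN_COMPUTE": 256, "VECTOR_LOGIC": 128}

def effective_input_amount(macro):
    """Return the macro's input amount, falling back to the preset
    default for its macro-instruction type when none is carried."""
    if macro.get("input_amount") is not None:
        return macro["input_amount"]
    return DEFAULT_INPUT_AMOUNT[macro["op_type"]]
```

A macro instruction that carries its own input amount keeps it; one that omits the field is resolved through the preset table.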
In a possible implementation, the instruction generation module 12 may also split the macro instruction into running instructions according to the macro instruction and a preset macro instruction splitting rule. The splitting rule may be determined from a conventional macro instruction splitting scheme (for example, splitting according to the processing flow of the macro instruction) combined with a running data amount threshold for instructions executable by all candidate devices. The macro instruction is split into running instructions whose running input amounts and running output amounts are all less than or equal to the threshold, so that each generated running instruction can be executed on its corresponding running device (the running device being any one of the candidate devices). The threshold may be obtained by comparing the storage capacities (or remaining storage capacities) of all candidate devices and taking the minimum as the running data amount threshold for instructions executable by all candidate devices.
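The threshold rule above amounts to a minimum over candidate capacities followed by chunking. The sketch below is illustrative; the function names and the simple equal-chunking strategy are assumptions.

```python
def run_data_threshold(candidate_capacities):
    # The threshold is the smallest (remaining) storage capacity among
    # all candidate devices, so any generated chunk fits on any device.
    return min(candidate_capacities)

def split_by_rule(total_amount, candidate_capacities):
    """Split a macro's data amount into chunks no larger than the
    threshold derived from all candidate devices."""
    threshold = run_data_threshold(candidate_capacities)
    chunks, remaining = [], total_amount
    while remaining > 0:
        chunk = min(threshold, remaining)
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```

With candidate capacities of 4, 6, and 5, the threshold is 4, and a macro with data amount 10 splits into chunks of 4, 4, and 2, each executable on any candidate device.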
It should be understood that a person skilled in the art may set the manner of generating running instructions according to actual needs; the present disclosure places no limitation on this.
In this embodiment, the running instruction generated by the instruction generation module from the macro instruction may be an instruction awaiting execution, or one or more parsed instructions obtained by parsing an instruction awaiting execution; the present disclosure places no limitation on this.
In a possible implementation, as shown in FIG. 3, the apparatus may further include a queue construction module 15. The queue construction module 15 is configured to sort the running instructions according to a queue sorting rule and to construct an instruction queue corresponding to the running device from the sorted running instructions.
In this implementation, an instruction queue uniquely corresponding to each running device may be constructed. The running instructions may be sent one by one, in queue order, to the running device uniquely corresponding to the queue; or the entire instruction queue may be sent to the running device, so that the device executes the running instructions in queue order. In this way, the running device is guaranteed to execute the running instructions according to the instruction queue, preventing running instructions from being executed incorrectly, with delay, or not at all.
In this implementation, the queue sorting rule may be determined from information such as the estimated execution duration of a running instruction, its generation time, and the running input amount, running output amount, and operation type of the running instruction itself; the present disclosure places no limitation on this.
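Since the disclosure leaves the queue sorting rule open, the sketch below shows only one possible rule (generation time first, estimated duration as tie-breaker); the field names and the rule itself are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class RunInstruction:
    op_type: str
    generated_at: float        # generation time of the running instruction
    estimated_duration: float  # estimated execution duration
    run_input_amount: int = 0
    run_output_amount: int = 0

def build_instruction_queue(instructions):
    # One possible queue sorting rule: order by generation time, then
    # by estimated execution duration. Other rules (by operation type,
    # by input/output amount, ...) are equally permitted.
    return sorted(instructions,
                  key=lambda i: (i.generated_at, i.estimated_duration))
```

The resulting list is the instruction queue for one running device, consumed front to back.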
In a possible implementation, as shown in FIG. 3, the apparatus may further include an instruction dispatch module 16. The instruction dispatch module 16 is configured to send the running instructions to the running device, so that the running device executes the running instructions.
In this implementation, when the running device is to execute a single running instruction, that instruction may be sent to it directly. When the running device is to execute multiple running instructions, all of them may be sent at once, so that the device executes them in sequence; or they may be sent one at a time, the next running instruction being sent only after the running device has finished executing the current one. A person skilled in the art may set the manner of sending running instructions to the running device; the present disclosure places no limitation on this.
In a possible implementation, as shown in FIG. 3, the instruction dispatch module 16 may include an instruction assembly sub-module 161, an assembly translation sub-module 162, and an instruction sending sub-module 163. The instruction assembly sub-module 161 is configured to generate an assembly file from the running instructions. The assembly translation sub-module 162 is configured to translate the assembly file into a binary file. The instruction sending sub-module 163 is configured to send the binary file to the running device, so that the running device executes the running instructions according to the binary file.
In this way, the data amount of the running instructions can be reduced, the time for sending them to the running device saved, and the conversion and execution of macro instructions sped up.
In this implementation, after the binary file has been sent to the running device, the running device may decode the received binary file to obtain the corresponding running instructions, execute them, and obtain an execution result.
In a possible implementation, the running device may be one of, or any combination of, a CPU, a GPU, and a neural-network processing unit (NPU). This increases the speed at which the apparatus generates running instructions from macro instructions.
In a possible implementation, the apparatus may be disposed in a CPU and/or an NPU, so that the process of generating running instructions from macro instructions is carried out by the CPU and/or the NPU, providing more possible ways of implementing the apparatus.
The present disclosure provides a running device configured to execute the running instructions generated by the above neural network instruction generation apparatus. The running device includes a control module and an execution module. The control module is configured to obtain data, a neural network model, and running instructions, and may further be configured to parse the running instructions into multiple parsed instructions and to send the parsed instructions and the data to the execution module. The execution module is configured to execute the multiple parsed instructions on the data to obtain an execution result.
In a possible implementation, the running device further includes a storage module. The storage module may include at least one of a register and a cache, and the cache may include a scratchpad cache. The cache may be used to store the data; the register may be used to store scalar data within the data.
In a possible implementation, the control module may include an instruction storage sub-module and an instruction processing sub-module. The instruction storage sub-module is configured to store the running instructions. The instruction processing sub-module is configured to parse a running instruction into multiple parsed instructions.
In a possible implementation, the control module may further include a storage queue sub-module. The storage queue sub-module is configured to store a running instruction queue containing the running instructions to be executed by the running device and the multiple parsed instructions, all arranged in order of execution.
In a possible implementation, the execution module may further include a dependency processing sub-module. The dependency processing sub-module is configured to, upon determining that a first parsed instruction has a dependency on a zeroth parsed instruction preceding it, cache the first parsed instruction in the instruction storage sub-module, and, after the zeroth parsed instruction has finished executing, fetch the first parsed instruction from the instruction storage sub-module and send it to the execution module.
Here, the first parsed instruction having a dependency on the preceding zeroth parsed instruction may include: the first storage address interval holding the data required by the first parsed instruction overlaps with the zeroth storage address interval holding the data required by the zeroth parsed instruction. Conversely, the absence of a dependency between the first parsed instruction and the zeroth parsed instruction may mean that the first storage address interval and the zeroth storage address interval have no overlapping region.
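The dependency condition above is a plain interval-overlap test. The sketch below assumes half-open `(start, end)` address intervals, which is one reasonable reading of "storage address interval"; the disclosure does not fix the boundary convention.

```python
def intervals_overlap(first_interval, zeroth_interval):
    """Dependency test between parsed instructions.

    A first parsed instruction depends on the zeroth parsed instruction
    exactly when the storage address interval of the data it needs
    overlaps with that of the zeroth instruction. Intervals are taken
    as half-open (start, end) pairs here.
    """
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start < z_end and z_start < f_end
```

When this test is true, the first parsed instruction is cached and only dispatched after the zeroth has finished; when false, the two may proceed independently.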
The present disclosure provides a neural network instruction processing system, including the above neural network instruction generation apparatus and the above running device.
It should be noted that, although the neural network instruction generation apparatus, the running device, and the neural network instruction processing system have been introduced above by way of the foregoing embodiments, those skilled in the art will appreciate that the present disclosure should not be limited thereto. In fact, a user may configure the modules flexibly according to personal preference and/or the actual application scenario, provided the technical solutions of the present disclosure are complied with.
FIG. 4 shows a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure. As shown in FIG. 4, the method is applied to the above neural network instruction generation apparatus and includes step S41 and step S42. In step S41, a running device for executing a received macro instruction is determined according to the macro instruction. In step S42, running instructions are generated according to the macro instruction and the running device.
In a possible implementation, step S41 may include: when it is determined that the macro instruction contains an identifier of a designated device and that the resources of the designated device satisfy the execution condition for executing the macro instruction, determining the designated device as the running device. The execution condition may include: the designated device contains an instruction set corresponding to the macro instruction.
In a possible implementation, the method may further include: obtaining resource information of candidate devices. Step S41 may further include: when it is determined that the macro instruction contains no identifier of a designated device, determining, from among the candidate devices, a running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices. The resource information may include the instruction sets contained in the candidate devices.
In a possible implementation, step S41 may further include: when it is determined that the macro instruction contains the identifier of a designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
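The three branches of step S41 (designated device satisfying the execution condition, no designated device, designated device failing the condition) can be sketched as one selection routine. The dictionary field names are assumptions for illustration only.

```python
def determine_running_device(macro, candidates):
    """Select a running device for a macro instruction (sketch of S41).

    `macro` may carry a "device_id"; each candidate carries an "id" and
    the "instruction_sets" it contains. Field names are illustrative.
    """
    def satisfies(dev):
        # Execution condition: the device contains an instruction set
        # corresponding to the macro instruction.
        return macro["op_type"] in dev["instruction_sets"]

    designated = macro.get("device_id")
    if designated is not None:
        for dev in candidates:
            if dev["id"] == designated and satisfies(dev):
                return dev          # designated device can execute it
    # No designated device, or the designated device cannot execute the
    # macro instruction: fall back to the candidates' resource info.
    for dev in candidates:
        if satisfies(dev):
            return dev
    return None
```

A macro that names a device lacking the required instruction set falls through to the candidate scan, matching the behavior of the third determining sub-module.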
In a possible implementation, the macro instruction may contain at least one of an input amount and an output amount. Step S42 may include: determining the data amount of the macro instruction, and generating running instructions according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device. The data amount may be determined according to at least one of the input amount and the output amount, and the resource information of the running device may further include at least one of a storage capacity and a remaining storage capacity.
In a possible implementation, generating running instructions according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device may include: when it is determined that there is a single running device and that its resources do not satisfy the capacity condition for executing the macro instruction, splitting the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence. The running data amount of the running device may be determined according to the resource information of the running device; each running instruction may contain at least one of a running input amount and a running output amount, both determined according to the running data amount.
In a possible implementation, generating running instructions according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device may include: when it is determined that there are multiple running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating running instructions corresponding to each running device. The running data amount of each running device may be determined according to the resource information of that running device; a running instruction may contain at least one of a running input amount and a running output amount, both determined according to the running data amount of the running device that executes the running instruction.
In a possible implementation, the method may further include: sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
In a possible implementation, the method may further include: receiving an instruction to be executed, and generating a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
In a possible implementation, the method may further include: sending the running instructions to the running device, so that the running device executes the running instructions.
In a possible implementation, sending the running instructions to the running device so that it executes them includes: generating an assembly file from the running instructions; translating the assembly file into a binary file; and sending the binary file to the running device, so that the running device executes the running instructions according to the binary file.
In a possible implementation, the resource information may include at least one of the storage capacity of a candidate device, its remaining storage capacity, and the instruction set it contains.
In a possible implementation, the running device may be one of, or any combination of, a CPU, a GPU, and an NPU.
In a possible implementation, the method may be applied in a CPU and/or an NPU.
In a possible implementation, the macro instruction may include at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction.
The computation macro instruction may include at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data transfer macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction. The write macro instruction may include at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
In a possible implementation, the macro instruction may contain at least one of the following fields: the identifier of the designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, operands, and instruction parameters. The running instruction may contain at least one of the following fields: an operation type, an input address, an output address, operands, and instruction parameters.
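The field list above can be collected into a simple record type. This is a sketch of the macro-instruction layout only; field types and the choice of which fields are optional are assumptions, since the disclosure says each field is merely "at least one of" the options.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MacroInstruction:
    # Fields a macro instruction may carry, per the description above.
    # Everything except the operation type is optional in this sketch.
    op_type: str                     # computation / jump / read / write, ...
    device_id: Optional[int] = None  # identifier of the designated device
    input_addr: Optional[int] = None
    output_addr: Optional[int] = None
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operands: Tuple = ()
    params: Tuple = ()
```

A running instruction would carry the same fields minus `device_id`, `input_amount`, and `output_amount` when those are resolved at generation time.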
The neural network instruction generation method provided by the embodiments of the present disclosure determines, according to a received macro instruction, a running device for executing the macro instruction, and generates running instructions according to the macro instruction and the running device. The method can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low error rate, and low labor and material costs for development.
The present disclosure also provides a neural network instruction execution method applied to the above running device. The method includes: obtaining, by the running device, data, a neural network model, and running instructions; parsing the running instructions to obtain multiple parsed instructions; and executing the multiple parsed instructions on the data to obtain an execution result.
In a possible implementation, the method may further include: storing, by the running device, the data and scalar data within the data. The running device includes a storage module; the storage module includes one or more of a register and a cache, the cache including a scratchpad cache. The cache is used to store the data; the register is used to store scalar data within the data.
In a possible implementation, the method may further include:
storing, by the running device, the running instructions;
parsing, by the running device, the running instructions to obtain multiple parsed instructions;
storing, by the running device, a running instruction queue, the queue including the running instructions and the multiple parsed instructions, all arranged in the order in which they are to be executed.
In a possible implementation, the method may further include:
caching, by the running device, a first parsed instruction upon determining that it has a dependency on a zeroth parsed instruction preceding it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing.
Here, the first parsed instruction having a dependency on the preceding zeroth parsed instruction includes: the first storage address interval holding the data required by the first parsed instruction overlaps with the zeroth storage address interval holding the data required by the zeroth parsed instruction.
The neural network instruction execution method provided by the embodiments of the present disclosure obtains, by the running device, data, a neural network model, and running instructions, parses the running instructions to obtain multiple parsed instructions, and executes the multiple parsed instructions on the data to obtain an execution result. The method can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low error rate, and low labor and material costs for development.
The present disclosure also provides a neural network instruction processing method applied to a neural network instruction processing system that includes the above neural network instruction generation apparatus and the above running device. The method includes the above neural network instruction generation method applied to the neural network instruction generation apparatus and the above neural network instruction execution method applied to the running device. The method can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low error rate, and low labor and material costs for development.
The foregoing may be better understood in light of the following clauses:
Clause B1. A neural network instruction generation apparatus, the apparatus comprising:
a device determination module configured to determine, according to a received macro instruction, a running device for executing the macro instruction; and
an instruction generation module configured to generate running instructions according to the macro instruction and the running device.
Clause B2. The apparatus according to Clause B1, wherein the device determination module comprises:
a first determination sub-module configured to determine a designated device as the running device when it is determined that the macro instruction contains an identifier of the designated device and that the resources of the designated device satisfy an execution condition for executing the macro instruction,
wherein the execution condition comprises: the designated device contains an instruction set corresponding to the macro instruction.
Clause B3. The apparatus according to Clause B2, the apparatus further comprising:
a resource acquisition module configured to obtain resource information of candidate devices,
wherein the device determination module further comprises:
a second determination sub-module configured to, when it is determined that the macro instruction contains no identifier of the designated device, determine, from among the candidate devices, a running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices,
wherein the resource information comprises the instruction sets contained in the candidate devices.
Clause B4. The apparatus according to Clause B3, wherein the device determination module further comprises:
a third determination sub-module configured to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction.
条款B5、根据条款B2-条款B4任一项所述的装置,所述宏指令包含输入量和输出量中的至少一项,Clause B5. The device according to any one of Clauses B2-B4, the macro instruction includes at least one of an input quantity and an output quantity,
所述指令生成模块,还用于确定所述宏指令的数据量,根据所述宏指令的数据量、所述宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is also used to determine the data amount of the macro instruction, and generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operating device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款B6、根据条款B5所述的装置,所述指令生成模块,包括:Clause B6. The apparatus according to Clause B5, the instruction generation module includes:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的资源不满足执行所述宏指令的容量条件时，根据所述运行设备的运行数据量和所述数据量将所述宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，A first instruction generating submodule, configured to split the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount when it is determined that there is one running device and the resources of the running device do not satisfy the capacity condition for executing the macro instruction, so that the running device executes the multiple running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein, the running data amount of the running device is determined according to the resource information of the running device, each running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
条款B7、根据条款B5所述的装置,所述指令生成模块,包括:Clause B7. The apparatus according to Clause B5, the instruction generation module includes:
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述宏指令进行拆分，生成对应于每个运行设备的运行指令，A second instruction generating submodule, configured to split the macro instruction according to the running data amount of each running device and the data amount when it is determined that there are multiple running devices, and generate a running instruction corresponding to each running device,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of each running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款B8、根据条款B1所述的装置,所述装置还包括:Clause B8. The device according to Clause B1, the device further comprising:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款B9、根据条款B2所述的装置,所述装置还包括:Clause B9. The device according to Clause B2, the device further comprising:
宏指令生成模块,用于接收待执行指令,根据确定的指定设备的标识和所述待执行指令生成所述宏指令。The macro generation module is used for receiving the instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
条款B10、根据条款B1所述的装置,所述装置还包括:Clause B10. The device according to Clause B1, the device further comprising:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款B11、根据条款B1所述的装置,Clause B11, the device according to Clause B1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述装置设置于CPU和/或NPU中;The device is set in the CPU and / or NPU;
所述宏指令包括以下指令中的至少一种:计算宏指令、控制宏指令和数据搬运宏指令,The macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling macro instruction,
其中,所述计算宏指令包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种,The calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
数据搬运宏指令包括读宏指令和写宏指令中的至少一种，所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种，所述写宏指令包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种；The data handling macro instruction includes at least one of a read macro instruction and a write macro instruction. The read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction. The write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
所述宏指令包含以下选项中的至少一项:用于执行所述宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数,The macro instruction includes at least one of the following options: the identification, operation type, input address, output address, input amount, output amount, operand and instruction parameter of the designated device for executing the macro instruction,
所述运行指令包含以下选项中的至少一项:所述操作类型、所述输入地址、所述输出地址、所述操作数和所述指令参数。The run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
条款B12、一种机器学习运算装置,所述装置包括:Article B12. A machine learning computing device, the device comprising:
一个或多个如条款B1-条款B11任一项所述的神经网络指令生成装置，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more neural network instruction generation devices according to any one of Clauses B1-B11, configured to obtain data to be operated on and control information from other processing devices, perform a designated machine learning operation, and transmit the execution result to the other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述神经网络指令生成装置时,所述多个所述神经网络指令生成装置间可以通过特定的结构进行连接并传输数据;When the machine learning operation device includes a plurality of the neural network instruction generation devices, the plurality of the neural network instruction generation devices can be connected and transmit data through a specific structure;
其中，多个所述神经网络指令生成装置通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述神经网络指令生成装置共享同一控制系统或拥有各自的控制系统；多个所述神经网络指令生成装置共享内存或者拥有各自的内存；多个所述神经网络指令生成装置的互联方式是任意互联拓扑。Wherein, the multiple neural network instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple neural network instruction generation devices share the same control system or have their own control systems; the multiple neural network instruction generation devices share memory or have their own memories; and the interconnection manner of the multiple neural network instruction generation devices is an arbitrary interconnection topology.
条款B13、一种组合处理装置,所述装置包括:Clause B13. A combined processing device, the device comprising:
如条款B12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing devices, general interconnection interfaces and other processing devices as described in clause B12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作。The machine learning operation device interacts with the other processing device to jointly complete the calculation operation specified by the user.
条款B14、根据条款B13所述的组合处理装置，还包括：存储装置，该存储装置分别与所述机器学习运算装置和所述其他处理装置连接，用于保存所述机器学习运算装置和所述其他处理装置的数据。Clause B14. The combined processing device according to Clause B13, further comprising: a storage device, which is connected to the machine learning computing device and the other processing device, respectively, and is configured to store data of the machine learning computing device and the other processing device.
条款B15、一种机器学习芯片,所述机器学习芯片包括:Article B15. A machine learning chip, the machine learning chip includes:
如条款B12所述的机器学习运算装置或如条款B13所述的组合处理装置。The machine learning arithmetic device according to clause B12 or the combined processing device according to clause B13.
条款B16、一种电子设备,所述电子设备包括:Article B16. An electronic device, the electronic device comprising:
如条款B15所述的机器学习芯片。Machine learning chip as described in clause B15.
条款B17、一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及如条款B15所述的机器学习芯片;Clause B17, a board card, the board card includes: a storage device, an interface device and a control device, and a machine learning chip as described in Clause B15;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款B18、根据条款B17所述的板卡,Clause B18, the board described in Clause B17,
所述存储器件包括:多组存储单元,每一组所述存储单元与所述机器学习芯片通过总线连接,所述存储单元为:DDR SDRAM;The storage device includes: multiple sets of storage units, each of which is connected to the machine learning chip through a bus, and the storage unit is: DDR SDRAM;
所述机器学习芯片包括:DDR控制器,用于控制每个所述存储单元的数据传输与数据存储;The machine learning chip includes: a DDR controller for controlling data transmission and data storage of each storage unit;
所述接口装置为:标准PCIE接口。The interface device is: a standard PCIE interface.
条款B19、一种神经网络指令生成方法,所述方法包括:Clause B19. A neural network instruction generation method, the method comprising:
根据接收到的宏指令,确定执行所述宏指令的运行设备;According to the received macro instruction, determine the execution device for executing the macro instruction;
根据所述宏指令和所述运行设备,生成运行指令。According to the macro instruction and the operation device, an operation instruction is generated.
条款B20、根据条款B19所述的方法,根据接收到的宏指令,确定执行所述宏指令的运行设备,包括:Clause B20. According to the method described in Clause B19, based on the received macro instruction, determining a running device that executes the macro instruction includes:
在确定所述宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution condition for executing the macro instruction, determining the designated device as the running device,
其中,所述执行条件包括:所述指定设备中包含与所述宏指令相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the macro instruction.
条款B21、根据条款B20所述的方法,所述方法还包括:Clause B21. The method according to Clause B20, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的宏指令,确定执行所述宏指令的运行设备,包括:Wherein, according to the received macro instruction, determining a running device for executing the macro instruction includes:
在确定所述宏指令中不包含所述指定设备的标识时，根据接收到的宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述宏指令的运行设备，When it is determined that the macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款B22、根据条款B21所述的方法,根据接收到的宏指令,确定执行所述宏指令的运行设备,包括:Clause B22. According to the method described in Clause B21, based on the received macro instruction, determining a running device that executes the macro instruction includes:
在确定所述宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述宏指令的执行条件时，根据所述宏指令和所述备选设备的资源信息，确定运行设备。When it is determined that the macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
条款B23、根据条款B20-条款B22任一项所述的方法，所述宏指令包含输入量和输出量中的至少一项，Clause B23. The method according to any one of Clauses B20-B22, the macro instruction includes at least one of an input quantity and an output quantity,
根据所述宏指令和所述运行设备,生成运行指令,包括:According to the macro instruction and the operation device, generating an operation instruction includes:
确定所述宏指令的数据量,根据所述宏指令的数据量、所述宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the macro instruction, and generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款B24、根据条款B23所述的方法,根据所述宏指令的数据量、所述宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause B24. According to the method described in Clause B23, generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device includes:
在确定所述运行设备为一个，且所述运行设备的资源不满足执行所述宏指令的容量条件时，根据所述运行设备的运行数据量和所述数据量将所述宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，When it is determined that there is one running device and the resources of the running device do not satisfy the capacity condition for executing the macro instruction, splitting the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein, the running data amount of the running device is determined according to the resource information of the running device, each running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
条款B25、根据条款B23所述的方法,根据所述宏指令的数据量、所述宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause B25. According to the method described in Clause B23, generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operation devices, the macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of each running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款B26、根据条款B19所述的方法,所述方法还包括:Clause B26. The method according to Clause B19, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款B27、根据条款B20所述的方法,所述方法还包括:Clause B27. The method according to Clause B20, the method further comprising:
接收待执行指令,根据确定的指定设备的标识和所述待执行指令生成所述宏指令。Receiving the instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
条款B28、根据条款B19所述的方法,所述方法还包括:Clause B28. The method according to Clause B19, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款B29、根据条款B19所述的方法,Clause B29, according to the method described in Clause B19,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述宏指令包括以下指令中的至少一种:计算宏指令、控制宏指令和数据搬运宏指令,The macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling macro instruction,
其中,所述计算宏指令包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种,The calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
数据搬运宏指令包括读宏指令和写宏指令中的至少一种，所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种，所述写宏指令包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种；The data handling macro instruction includes at least one of a read macro instruction and a write macro instruction. The read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction. The write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
所述宏指令包含以下选项中的至少一项:用于执行所述宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数,The macro instruction includes at least one of the following options: the identification, operation type, input address, output address, input amount, output amount, operand and instruction parameter of the designated device for executing the macro instruction,
所述运行指令包含以下选项中的至少一项:所述操作类型、所述输入地址、所述输出地址、所述操作数和所述指令参数。The run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
图5示出根据本公开一实施例的神经网络指令处理系统的框图。如图5所示，该系统包括指令生成设备100和运行设备200。FIG. 5 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure. As shown in FIG. 5, the system includes an instruction generation device 100 and a running device 200.
指令生成设备100包括设备确定模块11和指令生成模块12。设备确定模块11用于根据接收到的宏指令,确定执行宏指令的运行设备。指令生成模块12用于根据宏指令和运行设备,生成运行指令。The instruction generation device 100 includes a device determination module 11 and an instruction generation module 12. The device determination module 11 is used to determine a running device that executes the macro instruction according to the received macro instruction. The instruction generation module 12 is used to generate an operation instruction according to the macro instruction and the operation device.
运行设备200包括控制模块21和执行模块22。控制模块21用于获取所需数据、神经网络模型以及运行指令，对运行指令进行解析，获得多个解析指令。执行模块22用于根据数据执行多个解析指令，得到执行结果。The running device 200 includes a control module 21 and an execution module 22. The control module 21 is configured to obtain required data, a neural network model, and running instructions, and to parse the running instructions to obtain multiple parsed instructions. The execution module 22 is configured to execute the multiple parsed instructions according to the data to obtain an execution result.
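The interaction between the two devices of FIG. 5 can be sketched as follows. This is a minimal illustrative sketch only; the function names, dictionary layout, and device table are assumptions of this example and are not defined by the disclosure.

```python
# Sketch of the FIG. 5 pipeline: the instruction generation device
# determines a running device for a received macro instruction and
# generates running instructions for it.

def determine_running_device(macro, devices):
    """Device determination module: if the macro names a designated
    device, use it; otherwise pick a candidate device whose
    instruction set covers the macro's operation type."""
    if macro.get("device_id") is not None:
        return macro["device_id"]
    for dev_id, info in devices.items():
        if macro["op_type"] in info["instruction_set"]:
            return dev_id
    raise RuntimeError("no candidate device supports " + macro["op_type"])

def generate_running_instructions(macro, device_id):
    """Instruction generation module: drop the device identifier and
    emit one running instruction (splitting by data amount is a
    separate step, not shown here)."""
    run = {k: v for k, v in macro.items() if k != "device_id"}
    return [(device_id, run)]

devices = {"npu0": {"instruction_set": {"CONV", "MATMUL"}}}
macro = {"device_id": None, "op_type": "CONV",
         "in_addr": 0x100, "out_addr": 0x200}
dev = determine_running_device(macro, devices)
for target, instr in generate_running_instructions(macro, dev):
    print(target, instr["op_type"])
```

The running device would then parse each running instruction into multiple parsed instructions and execute them against the data, which is omitted from this sketch.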
在该实现方式中，宏指令是一种批量处理的称谓，宏指令可以是一种规则或模式，或称语法替换，宏指令的处理装置、系统等在遇到宏指令时会自动进行这一规则或模式的替换。宏指令可以是对常用的用于对数据进行计算、控制和搬运等处理的待执行指令进行整合形成的。In this implementation, "macro instruction" is a term for batch processing: a macro instruction may be a rule or pattern, also called a syntax replacement, and a device or system that processes macro instructions automatically performs this rule or pattern replacement when it encounters a macro instruction. A macro instruction may be formed by integrating commonly used to-be-executed instructions for computing on, controlling, and moving data.
在一种可能的实现方式中,宏指令可以包括以下至少一种:计算宏指令、控制宏指令和数据搬运宏指令。其中,计算宏指令可以包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种。控制宏指令可以包括无条件跳转宏指令和有条件跳转宏指令中的至少一种。数据搬运宏指令可以包括读宏指令和写宏指令中的至少一种。读宏指令可以包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。写宏指令可以包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种。In a possible implementation manner, the macro instruction may include at least one of the following: a calculation macro instruction, a control macro instruction, and a data handling macro instruction. The calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data handling macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction. The write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
在一种可能的实现方式中，宏指令可以包含以下选项中的至少一项：用于执行宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数。运行指令可以包含以下选项中的至少一项：操作类型、输入地址、输出地址、操作数和指令参数。In a possible implementation, the macro instruction may include at least one of the following options: an identifier of a designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and an instruction parameter. The running instruction may include at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
其中，指定设备的标识可以是指定设备的物理地址、IP地址、名称、编号等标识。标识可以包括数字、字母、符号中的其中一种或任意组合。在宏指令的指定设备的标识的位置为空时，确定该宏指令无指定设备；或者，在宏指令中不包含"指定设备的标识"这个字段时，确定该宏指令无指定设备。操作类型可以是指该宏指令对数据所进行操作的类型，表征该宏指令的具体类型，如在某宏指令的操作类型为"XXX"时，可以根据"XXX"确定该宏指令对数据所进行的操作的具体类型。根据操作类型可以确定执行该宏指令所需的指令集合，如在某宏指令的操作类型为"XXX"时，其所需的指令集合为进行"XXX"所对应的处理所需的所有指令集。输入地址可以是数据的输入地址、读取地址等获得数据的地址，输出地址可以是被处理后的数据的输出地址、写入地址等存储数据的地址。输入量可以是数据的输入规模、输入长度等表征其数据量大小的信息。输出量可以是数据的输出规模、输出长度等表征其数据量的大小的信息。操作数可以包括寄存器的长度、寄存器的地址、寄存器的标识、立即数等中的一个或多个。立即数为在立即寻址方式指令中给出的数。指令参数可以是指对应于该宏指令、与其执行相关的参数。例如，指令参数可以是第二个操作数的地址和长度等。指令参数可以是卷积核的大小、卷积核的步长和卷积核的填充等。The identifier of the designated device may be an identifier such as the physical address, IP address, name, or number of the designated device, and may include one or any combination of numbers, letters, and symbols. When the position of the identifier of the designated device in the macro instruction is empty, it is determined that the macro instruction has no designated device; or, when the macro instruction does not include the field "identifier of the designated device", it is determined that the macro instruction has no designated device. The operation type may refer to the type of operation the macro instruction performs on data and characterizes the specific type of the macro instruction; for example, when the operation type of a macro instruction is "XXX", the specific type of operation the macro instruction performs on data can be determined according to "XXX". The instruction set required to execute the macro instruction can be determined according to the operation type; for example, when the operation type of a macro instruction is "XXX", the required instruction set is all the instructions required for the processing corresponding to "XXX". The input address may be an address from which data is obtained, such as an input address or a read address of the data, and the output address may be an address at which data is stored, such as an output address or a write address of the processed data. The input amount may be information characterizing the size of the data, such as its input scale or input length, and the output amount may be information characterizing the size of the data, such as its output scale or output length. The operand may include one or more of a register length, a register address, a register identifier, an immediate value, and so on, where an immediate value is a number given in an immediate-addressing-mode instruction. The instruction parameter may refer to a parameter corresponding to the macro instruction and related to its execution; for example, the instruction parameter may be the address and length of a second operand, or the size, stride, and padding of a convolution kernel.
在该实现方式中，对于一个宏指令，其必须包括操作码和至少一个操作域，其中操作码即为操作类型，操作域包括指定设备的标识、输入地址、输出地址、输入量、输出量、操作数和指令参数。操作码可以是计算机程序中所规定的要执行操作的那一部分指令或字段（通常用代码表示），是指令序列号，用来告知执行指令的装置具体需要执行哪一条指令。操作域可以是执行对应的指令所需的所有数据的来源，执行对应的指令所需的所有数据包括参数数据、待运算或待处理的数据、对应的运算方法，或者存储参数数据、待运算或待处理的数据、对应的运算方法的地址等等。In this implementation, a macro instruction must include an operation code and at least one operation field, where the operation code is the operation type and the operation field includes the identifier of the designated device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters. The operation code may be the part of an instruction or field (usually represented by a code) specified in a computer program to perform an operation; it is an instruction sequence number used to inform the device executing the instruction which instruction specifically needs to be executed. The operation field may be the source of all data required to execute the corresponding instruction, including parameter data, data to be operated on or processed, and the corresponding operation methods, or the addresses where the parameter data, the data to be operated on or processed, and the corresponding operation methods are stored, and so on.
应当理解的是,本领域技术人员可以根据需要对宏指令的指令格式以及所包含的内容进行设置,本公开对此不作限制。It should be understood that, those skilled in the art can set the instruction format and the content of the macro instruction as needed, and this disclosure does not limit this.
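The macro-instruction format described above (an operation code plus an operation field whose options are all optional except the operation type) can be illustrated with a small data structure. This is a hedged sketch only; the field names are assumptions chosen for readability and are not part of the disclosure, which leaves the instruction format open to the implementer.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative encoding of a macro instruction: the operation code
# (operation type) is mandatory; every operation-field option is
# optional, matching "at least one of the following options" above.
@dataclass
class MacroInstruction:
    op_type: str                      # operation code, e.g. "CONV"
    device_id: Optional[str] = None   # identifier of the designated device
    in_addr: Optional[int] = None     # input (read) address
    out_addr: Optional[int] = None    # output (write) address
    in_size: Optional[int] = None     # input amount (scale/length)
    out_size: Optional[int] = None    # output amount
    operands: list = field(default_factory=list)  # registers, immediates
    params: dict = field(default_factory=dict)    # e.g. kernel size/stride

m = MacroInstruction("CONV", device_id="09", in_addr=0x1000,
                     params={"kernel": 3, "stride": 1})
print(m.op_type, m.device_id)
```

A macro with `device_id=None` corresponds to the case above where the identifier position is empty and the macro instruction has no designated device.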
在本实施例中,设备确定模块11可以根据宏指令确定一个或多个运行设备。指令生成模块12可以生成一个或多个运行指令。在生成的运行指令为多个时,多个运行指令可以在同一个运行设备中被执行,也可以在不同的运行设备中被执行,本公开对此不作限制。In this embodiment, the device determination module 11 may determine one or more running devices according to the macro instruction. The instruction generation module 12 may generate one or more execution instructions. When there are multiple generated running instructions, the multiple running instructions may be executed in the same running device, or may be executed in different running devices, which is not limited in the present disclosure.
本公开实施例所提供的神经网络指令处理系统，该系统包括指令生成设备和运行设备。指令生成设备包括设备确定模块用于根据接收到的宏指令，确定执行宏指令的运行设备；指令生成模块用于根据宏指令和运行设备，生成运行指令。运行设备包括控制模块用于获取所需数据、神经网络模型以及运行指令，对运行指令进行解析，获得多个解析指令；执行模块用于根据所述数据执行所述多个解析指令，得到执行结果。本公开实施例所提供的神经网络指令处理系统，可跨平台使用，适用性好，指令转换的速度快、处理效率高、出错几率低，且开发的人力、物力成本低。The neural network instruction processing system provided by the embodiments of the present disclosure includes an instruction generation device and a running device. The instruction generation device includes a device determination module configured to determine, based on a received macro instruction, a running device that executes the macro instruction, and an instruction generation module configured to generate running instructions based on the macro instruction and the running device. The running device includes a control module configured to obtain required data, a neural network model, and the running instructions, and to parse the running instructions to obtain multiple parsed instructions, and an execution module configured to execute the multiple parsed instructions according to the data to obtain an execution result. The neural network instruction processing system provided by the embodiments of the present disclosure can be used across platforms, has good applicability, fast instruction conversion, high processing efficiency, and a low error probability, and incurs low development labor and material costs.
图6示出根据本公开一实施例的神经网络指令处理系统的框图。在一种可能的实现方式中，如图6所示，指令生成设备100还可以包括宏指令生成模块13。宏指令生成模块13用于接收待执行指令，根据确定的指定设备的标识和待执行指令生成宏指令。FIG. 6 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 6, the instruction generation device 100 may further include a macro instruction generation module 13. The macro instruction generation module 13 is configured to receive a to-be-executed instruction, and to generate the macro instruction according to the determined identifier of the designated device and the to-be-executed instruction.
In this implementation, the designated device may be determined according to the operation type, input amount, output amount, and other attributes of the instruction to be executed. One or more instructions to be executed may be received.
The instruction to be executed may include at least one of the following: a computation instruction to be executed, a control instruction to be executed, and a data-transfer instruction to be executed. The computation instruction to be executed may include at least one of a neural network computation instruction, a vector logic computation instruction, a matrix-vector computation instruction, a scalar computation instruction, and a scalar logic computation instruction. The control instruction to be executed may include at least one of an unconditional jump instruction and a conditional jump instruction. The data-transfer instruction to be executed may include at least one of a read instruction and a write instruction. The read instruction may include at least one of a read-neuron instruction, a read-synapse instruction, and a read-scalar instruction. The write instruction may include at least one of a write-neuron instruction, a write-synapse instruction, and a write-scalar instruction.
The instruction to be executed may include at least one of the following fields: operation type, input address, output address, input amount, output amount, operand, and instruction parameter.
In this implementation, when there is a single instruction to be executed, the identifier of the determined designated device may be added to that instruction to generate a macro instruction. For example, suppose an instruction m to be executed is "XXX ... param", where XXX is the operation type and param is the instruction parameter. Its designated device m-1 can be determined according to the operation type "XXX". The identifier of the designated device m-1 (for example, 09) is then added to instruction m, generating the macro instruction M "XXX 09, ... param" corresponding to instruction m. When there are multiple instructions to be executed, the identifier of the designated device determined for each instruction may be added to that instruction, and a single macro instruction, or multiple corresponding macro instructions, may be generated from the instructions carrying the device identifiers.
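The step above, inserting a device identifier after the operation type, can be sketched as follows. This is an illustrative sketch only, not the disclosure's implementation: the textual instruction format and the operation-type-to-device mapping (`OP_TYPE_TO_DEVICE`) are assumptions made for the example.

```python
# Hypothetical sketch: build a macro instruction by inserting the identifier
# of the designated device after the operation type. The mapping below and
# the textual instruction format are illustrative assumptions.
OP_TYPE_TO_DEVICE = {"CONV": "09", "VADD": "03", "JUMP": "01"}

def make_macro(instruction: str) -> str:
    op_type, _, rest = instruction.partition(" ")
    device_id = OP_TYPE_TO_DEVICE[op_type]  # designated device for this op type
    return f"{op_type} {device_id}, {rest}" if rest else f"{op_type} {device_id}"

print(make_macro("CONV in_addr=0x10, out_addr=0x20"))
# CONV 09, in_addr=0x10, out_addr=0x20
```

For multiple instructions to be executed, the same function could be applied to each one before combining the results into one or several macro instructions, as described above.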
It should be understood that a person skilled in the art may set the instruction format of the instruction to be executed and the content it contains as needed; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the device determination module 11 may include a first determination submodule 111. The first determination submodule 111 is configured to determine the designated device as the running device when the macro instruction contains the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the macro instruction. The execution condition may include: the designated device contains the instruction set corresponding to the macro instruction.
In this implementation, the macro instruction may contain the identifiers of one or more designated devices for executing it. When the macro instruction contains the identifier of a designated device and the resources of that device satisfy the execution condition, the first determination submodule 111 can directly determine the designated device as the running device. This saves the time of generating running instructions from the macro instruction and guarantees that the generated running instructions can be executed by the corresponding running device.
In a possible implementation, as shown in FIG. 6, the instruction generation device 100 may further include a resource acquisition module 14, and the device determination module 11 may further include a second determination submodule 112. The resource acquisition module 14 is configured to acquire resource information of candidate devices. The second determination submodule 112 is configured to, when the macro instruction does not contain the identifier of a designated device, determine the running device for executing the macro instruction from among the candidate devices according to the received macro instruction and the resource information of the candidate devices. The resource information may include the instruction sets contained in the candidate devices. An instruction set contained in a candidate device may correspond to the operation types of one or more macro instructions; the more instruction sets a candidate device contains, the more types of macro instructions it can execute.
In this implementation, when the macro instruction does not contain the identifier of a designated device, the second determination submodule 112 may determine, from among the candidate devices, one or more running devices capable of executing the macro instruction, where the instruction sets of the determined running device include the instruction set corresponding to the macro instruction. For example, if the received macro instruction is a neural network computation macro instruction, a candidate device containing the instruction set corresponding to neural network computation macro instructions may be determined as the running device, ensuring that the running device can execute the generated running instructions.
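The selection step described above, matching the macro instruction's operation type against the instruction sets of the candidate devices, can be sketched as follows. The device names and instruction-set contents are illustrative assumptions, not part of the disclosure:

```python
# Hypothetical sketch: pick running devices whose instruction sets cover the
# macro instruction's operation type. Device names and sets are assumptions.
DEVICES = {
    "cpu0": {"scalar", "control", "data_transfer"},
    "npu0": {"neural_network", "matrix_vector", "vector_logic"},
}

def select_running_devices(macro_op_type: str) -> list:
    # A device qualifies if it contains the instruction set for this op type.
    return [name for name, isa in DEVICES.items() if macro_op_type in isa]

print(select_running_devices("neural_network"))  # ['npu0']
```

Under this sketch, a macro instruction whose operation type matches no candidate device would yield an empty list, corresponding to the case where no running device can be determined.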
In a possible implementation, as shown in FIG. 6, the device determination module 11 may further include a third determination submodule 113. The third determination submodule 113 determines the running device according to the macro instruction and the resource information of the candidate devices when the macro instruction contains the identifier of a designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction.
In this implementation, when the macro instruction contains the identifier of a designated device but the resources of that device do not satisfy the execution condition, the third determination submodule 113 may conclude that the designated device is not capable of executing the macro instruction. The third determination submodule 113 may then determine the running device from among the candidate devices, for example by determining a candidate device containing the instruction set corresponding to the macro instruction as the running device.
In a possible implementation, as shown in FIG. 6, the macro instruction may contain at least one of an input amount and an output amount, and the instruction generation module 12 is further configured to determine the data amount of the macro instruction and generate running instructions according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running devices. The data amount of the macro instruction may be determined according to at least one of the input amount and the output amount, and the resource information of a running device may further include at least one of its storage capacity and its remaining storage capacity.
Here, the storage capacity of a running device may refer to the amount of binary information that the memory of the device can hold. The remaining storage capacity of a running device may refer to the storage capacity currently available for instruction execution after the occupied storage capacity is deducted. The resource information of a running device characterizes its execution capability: the larger the storage capacity and the remaining storage capacity, the stronger the device's execution capability.
In this implementation, the instruction generation module 12 may determine a specific way of splitting the macro instruction according to the resource information of each running device, the data amount of the macro instruction, and so on, so as to split the macro instruction and generate the running instructions corresponding to each running device.
In a possible implementation, as shown in FIG. 6, the instruction generation module 12 may include a first instruction generation submodule 121. The first instruction generation submodule 121 is configured to, when there is a single running device and its resources do not satisfy the capacity condition for executing the macro instruction, split the macro instruction into multiple running instructions according to the running data amount of the device and the data amount of the macro instruction, so that the running device executes the multiple running instructions in sequence. The running data amount of the device may be determined according to its resource information; each running instruction may contain at least one of a running input amount and a running output amount, which may be determined according to the running data amount.
In this implementation, the running data amount of a running device may be determined according to its storage capacity or remaining storage capacity. The capacity condition may be that the running data amount of the device is greater than or equal to the data amount of the macro instruction; in other words, the resources of the running device failing to satisfy the capacity condition may mean that its running data amount is smaller than the data amount of the macro instruction. The running input amount and the running output amount must be less than or equal to the running data amount to ensure that the generated running instructions can be executed by the device. The running input amounts (or running output amounts) of different running instructions may be the same or different; the present disclosure does not limit this.
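The splitting step above can be sketched as follows: when the macro instruction's data amount exceeds the device's running data amount, it is cut into chunks no larger than that running data amount. This is a minimal sketch; the field names (`offset`, `amount`) and chunk layout are illustrative assumptions, not the disclosure's instruction format:

```python
# Hypothetical sketch: split a macro instruction whose data amount exceeds the
# device's running data amount into chunk-sized running instructions, each
# with a running input/output amount <= the running data amount.
def split_macro(total_amount: int, running_amount: int) -> list:
    chunks = []
    offset = 0
    while offset < total_amount:
        size = min(running_amount, total_amount - offset)  # per-instruction amount
        chunks.append({"offset": offset, "amount": size})
        offset += size
    return chunks

print(split_macro(100, 32))
# [{'offset': 0, 'amount': 32}, {'offset': 32, 'amount': 32},
#  {'offset': 64, 'amount': 32}, {'offset': 96, 'amount': 4}]
```

As the example shows, the per-instruction amounts may differ (the last chunk is smaller), consistent with the statement that running input amounts of different running instructions may be the same or different.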
In this implementation, when there is a single running device and its resources satisfy the capacity condition for executing the macro instruction, the first instruction generation submodule 121 may directly convert the macro instruction into a single running instruction, or may still split it into multiple running instructions; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the instruction generation module 12 may include a second instruction generation submodule 122. The second instruction generation submodule 122 is configured to, when there are multiple running devices, split the macro instruction according to the running data amount of each device and the data amount of the macro instruction, generating the running instructions corresponding to each device. The running data amount of each device may be determined according to its resource information; a running instruction may contain at least one of a running input amount and a running output amount, which are determined according to the running data amount of the device that executes the instruction.
In this implementation, the running input amount and the running output amount must be less than or equal to the running data amount to ensure that the generated running instructions can be executed by the device. The second instruction generation submodule 122 may generate one or more running instructions for each running device, according to its running data amount, for that device to execute.
In the above implementations, a running instruction contains at least one of a running input amount and a running output amount. Besides bounding the data amount of the running instruction so that it can be executed by the corresponding device, this also satisfies any special requirements that different running instructions place on the running input amount and/or the running output amount.
In a possible implementation, for running instructions that place no special requirement on the running input amount and/or running output amount, these amounts may be omitted from the instruction, and a default running input amount and a default running output amount may be set in advance. When the running device determines that a received running instruction contains no running input amount or running output amount, it uses the defaults as the instruction's running input amount and running output amount. Presetting defaults in this way simplifies the generation of running instructions and saves generation time.
In a possible implementation, default input amounts and default output amounts may be preset for different types of macro instructions. When a macro instruction contains no input amount or output amount, the corresponding preset defaults are used as its input amount and output amount; the data amount of the macro instruction is then determined from the default input amount and/or the default output amount, and running instructions are generated according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running devices. When the macro instruction contains no input amount or output amount, the generated running instructions may likewise omit the running input amount and running output amount, or may contain at least one of them. When a running instruction contains no running input amount and/or running output amount, the running device may execute it according to the preset default running input amount and/or default running output amount.
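The default-fallback behavior described above can be sketched as follows. The per-type default table and its values are illustrative assumptions made for the example:

```python
# Hypothetical sketch: fall back to preset per-type defaults when a macro
# instruction omits its input/output amounts. The table is an assumption.
DEFAULTS = {"neural_network": (256, 256), "vector_logic": (64, 64)}

def resolve_amounts(op_type, input_amount=None, output_amount=None):
    default_in, default_out = DEFAULTS[op_type]
    # Any amount missing from the instruction is taken from the defaults.
    return (input_amount if input_amount is not None else default_in,
            output_amount if output_amount is not None else default_out)

print(resolve_amounts("vector_logic"))            # (64, 64)
print(resolve_amounts("vector_logic", 32, None))  # (32, 64)
```

The same pattern would apply on the running-device side for a running instruction that omits its running input amount and/or running output amount.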
In a possible implementation, the instruction generation module 12 may also split the macro instruction into running instructions according to the macro instruction and a preset macro instruction splitting rule. The splitting rule may be determined from a conventional splitting approach (for example, splitting according to the processing steps of the macro instruction) combined with a running data amount threshold for instructions that all candidate devices can execute. The macro instruction is split into running instructions whose running input amounts and running output amounts are all less than or equal to the threshold, ensuring that each generated running instruction can be executed on its corresponding running device (which may be any of the candidate devices). The threshold may be obtained by comparing the storage capacities (or remaining storage capacities) of all candidate devices and taking the minimum as the running data amount threshold for instructions that all candidate devices can execute.
It should be understood that a person skilled in the art may set the way running instructions are generated according to actual needs; the present disclosure does not limit this.
In this embodiment, the running instructions generated by the instruction generation module from the macro instruction may be instructions to be executed, or one or more parsed instructions obtained by parsing instructions to be executed; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the instruction generation device 100 may further include a queue construction module 15. The queue construction module 15 is configured to sort the running instructions according to a queue ordering rule and construct, from the sorted running instructions, the instruction queue corresponding to each running device.
In this implementation, an instruction queue uniquely corresponding to each running device may be constructed. The running instructions may be sent, in queue order, to the device uniquely corresponding to the queue; alternatively, the whole instruction queue may be sent to the device so that it executes the running instructions in queue order. This ensures that the running device executes the running instructions according to the instruction queue, preventing instructions from being executed incorrectly, with delay, or not at all.
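The queue construction described above can be sketched as follows, ordering here by generation time, which is one of the ordering criteria mentioned below; the record fields (`device`, `op`, `gen_time`) are illustrative assumptions:

```python
# Hypothetical sketch: build one instruction queue per running device,
# ordered by generation time. Field names are illustrative assumptions.
from collections import defaultdict

def build_queues(instructions):
    queues = defaultdict(list)
    for ins in sorted(instructions, key=lambda i: i["gen_time"]):
        queues[ins["device"]].append(ins["op"])  # queue unique to each device
    return dict(queues)

ins = [{"device": "npu0", "op": "CONV_1", "gen_time": 2},
       {"device": "npu0", "op": "CONV_0", "gen_time": 1},
       {"device": "cpu0", "op": "JUMP_0", "gen_time": 3}]
print(build_queues(ins))  # {'npu0': ['CONV_0', 'CONV_1'], 'cpu0': ['JUMP_0']}
```

Each resulting queue could then either be drained instruction-by-instruction toward its device or sent to the device as a whole, matching the two dispatch options described above.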
In this implementation, the queue ordering rule may be determined from information such as the estimated execution time of a running instruction, its generation time, and the running input amount, running output amount, and operation type of the instruction itself; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the instruction generation device 100 may further include an instruction dispatch module 16. The instruction dispatch module 16 is configured to send the running instructions to the running device so that the running device executes them.
In this implementation, when a running device executes a single running instruction, that instruction may be sent to it directly. When a running device executes multiple running instructions, all of them may be sent at once so that the device executes them in sequence; alternatively, they may be sent one by one, with the next instruction sent only after the device finishes executing the current one. A person skilled in the art may set the way running instructions are sent to the running device; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the instruction dispatch module 16 may include an instruction assembly submodule 161, an assembly translation submodule 162, and an instruction sending submodule 163. The instruction assembly submodule 161 is configured to generate an assembly file from the running instructions. The assembly translation submodule 162 is configured to translate the assembly file into a binary file. The instruction sending submodule 163 is configured to send the binary file to the running device so that the running device executes the running instructions according to the binary file.
In this way, the data amount of the running instructions can be reduced, the time to send them to the running device can be saved, and the conversion and execution speed of the macro instruction can be improved.
In this implementation, after the binary file is sent to the running device, the running device can decode it to obtain the corresponding running instructions, execute them, and obtain the execution result.
In a possible implementation, the running device may be one of, or any combination of, a CPU, a GPU, and an embedded neural network processor (Neural-network Processing Unit, NPU for short). This increases the speed at which the instruction generation device generates running instructions from macro instructions.
In a possible implementation, the instruction generation device 100 may be provided in a CPU and/or an NPU, so that the process of generating running instructions from macro instructions is carried out by the CPU and/or NPU, providing more possible ways of implementing the instruction generation device.
In a possible implementation, the running device 200 further includes a storage module 23. The storage module 23 may include at least one of a register and a cache, and the cache may include a scratchpad cache. The cache may be used to store data, and the register may be used to store scalar data within that data.
In a possible implementation, the control module 21 may include an instruction storage submodule 211 and an instruction processing submodule 212. The instruction storage submodule 211 is configured to store the running instructions. The instruction processing submodule 212 is configured to parse the running instructions to obtain multiple parsed instructions.
In a possible implementation, the control module 21 may further include a storage queue submodule 213. The storage queue submodule 213 is configured to store a running instruction queue containing the running instructions to be executed by the running device and the multiple parsed instructions, all arranged in order of execution.
In a possible implementation, the execution module 22 may further include a dependency processing submodule 221. The dependency processing submodule 221 is configured to, when a first parsed instruction is determined to have an association with a zeroth parsed instruction preceding it, cache the first parsed instruction in the instruction storage submodule, and after execution of the zeroth parsed instruction completes, fetch the first parsed instruction from the instruction storage submodule and send it to the execution module.
Here, the first parsed instruction having an association with the preceding zeroth parsed instruction may include: the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction. Conversely, the absence of an association between the first and zeroth parsed instructions may mean that the first and zeroth storage address intervals have no overlapping region.
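The overlap test above can be sketched as a standard interval-intersection check. The half-open `[start, end)` convention used here is an illustrative assumption; the disclosure does not specify how the address intervals are encoded:

```python
# Hypothetical sketch: two parsed instructions are associated (so the first
# must wait for the zeroth) if their storage address intervals overlap.
# Intervals are half-open [start, end); that convention is an assumption.
def has_dependency(first_interval, zeroth_interval):
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start < z_end and z_start < f_end

print(has_dependency((0x100, 0x200), (0x180, 0x280)))  # True  (overlap)
print(has_dependency((0x100, 0x200), (0x200, 0x300)))  # False (disjoint)
```

When the check returns True, the dependency processing submodule would cache the first parsed instruction until the zeroth finishes; when it returns False, the two instructions can proceed independently.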
需要说明的是,尽管本文以上述实施例作为示例对神经网络指令处理系统进行了如上介绍,但本领域技术人员能够理解,本公开应不限于此。事实上,用户完全可根据个人喜好和/或实际应用场景灵活设定各模块,只要符合本公开的技术方案即可。It should be noted that although the above embodiment is taken as an example to introduce the neural network instruction processing system as described above, those skilled in the art can understand that the present disclosure should not be limited to this. In fact, the user can set various modules flexibly according to personal preferences and / or actual application scenarios, as long as the technical solutions of the present disclosure are met.
图7示出根据本公开一实施例的神经网络指令处理方法的流程图。如图7所示,该方法应用于上述包括指令生成设备和运行设备的神经网络指令处理系统,该方法包括步骤S51和步骤S52。7 shows a flowchart of a neural network instruction processing method according to an embodiment of the present disclosure. As shown in FIG. 7, the method is applied to the neural network instruction processing system including the instruction generation device and the operation device. The method includes step S51 and step S52.
步骤S51:通过指令生成设备根据接收到的宏指令,确定执行宏指令的运行设备,并根据宏指令和运行设备,生成运行指令。Step S51: The instruction generation device determines the operation device that executes the macro instruction according to the received macro instruction, and generates the operation instruction according to the macro instruction and the operation device.
步骤S52:通过运行设备获取数据、神经网络模型以及运行指令,对运行指令进行解析,获得多个解析指令,并根据数据执行多个解析指令,得到执行结果。Step S52: Obtain data, neural network model and operation instruction through the operation device, analyze the operation instruction, obtain multiple analysis instructions, and execute the multiple analysis instructions according to the data to obtain the execution result.
在一种可能的实现方式中,步骤S51可以包括:在确定宏指令中包含指定设备的标识,且指定设备的资源满足执行宏指令的执行条件时,将指定设备确定为运行设备。其中,执行条件可以包括:指定设备中包含与宏指令相对应的指令集。In a possible implementation manner, step S51 may include: when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution condition of executing the macro instruction, determining the designated device as the running device. The execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
在一种可能的实现方式中,该方法还可以包括:获取备选设备的资源信息。In a possible implementation manner, the method may further include: acquiring resource information of the candidate device.
其中,步骤S51还可以包括:在确定宏指令中不包含指定设备的标识时,根据接收到的宏指令和备选设备的资源信息,从备选设备中确定出用于执行宏指令的运行设备。其中,资源信息可以包括备选设备所包含的指令集。Wherein, step S51 may also include: when it is determined that the macro instruction does not contain the identifier of the designated device, determine the running device for executing the macro instruction from the candidate device based on the received macro instruction and the resource information of the candidate device . The resource information may include the instruction set included in the candidate device.
在一种可能的实现方式中,步骤S51还可以包括:在确定宏指令中包含指定设备的标识,且指定设备的资源不满足执行宏指令的执行条件时,根据宏指令和备选设备的资源信息,确定运行设备。In a possible implementation, step S51 may further include: when it is determined that the macro instruction contains the identifier of the specified device, and the resource of the specified device does not satisfy the execution condition of executing the macro instruction, according to the macro instruction and the resources of the alternative device Information to determine the operating equipment.
在一种可能的实现方式中,宏指令可以包含输入量和输出量中的至少一项,步骤S51中根据宏指令和运行设备,生成运行指令,可以包括:确定宏指令的数据量,根据宏指令的数据量、宏指令和运行设备的资源信息,生成运行指令。其中,数据量是根据输入量和输出量中的至少一项确定的,运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。In a possible implementation manner, the macro instruction may include at least one of the input amount and the output amount. In step S51, generating the operation instruction according to the macro instruction and the operation device may include: determining the data amount of the macro instruction, according to the macro The data volume of the instruction, the macro instruction and the resource information of the operation equipment generate the operation instruction. Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
在一种可能的实现方式中,根据宏指令的数据量、宏指令和运行设备的资源信息,生成运行指令,可以包括:在确定运行设备为一个,且运行设备的资源不满足执行宏指令的容量条件时,根据运行设备的运行数据量和数据量将宏指令拆分成多条运行指令,以使运行设备依次执行多条运行指令。其中,运行设备的运行数据量可以是根据运行设备的资源信息确定的,每条运行指令可以包含运行输入量和运行输出量中的至少一项,运行输入量和运行输出量是根据运行数据量确定的。In a possible implementation, generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: determining that the operation device is one and the resource of the operation device does not satisfy the execution of the macro instruction Under the capacity condition, the macro instruction is divided into multiple operation instructions according to the operation data amount and data amount of the operation device, so that the operation device executes multiple operation instructions in sequence. The operation data volume of the operation equipment may be determined according to the resource information of the operation equipment, and each operation instruction may include at least one of the operation input quantity and the operation output quantity, and the operation input quantity and the operation output quantity are based on the operation data quantity definite.
In one possible implementation, generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device may include: when it is determined that there are a plurality of running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device. The running data amount of each running device may be determined according to that device's resource information. Each running instruction may include at least one of a running input amount and a running output amount, both of which are determined according to the running data amount of the running device that executes the running instruction.
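The two splitting strategies above can be pictured with a short sketch. It is a minimal illustration only: the function and field names (`run_input_amount`, `input_offset`, and so on) are hypothetical, since the text prescribes the splitting rule but no concrete data layout, and the data amount is assumed to be a count of contiguous elements.

```python
def split_for_one_device(data_amount, run_data_amount):
    # Single running device whose resources cannot cover the macro
    # instruction: emit a sequence of running instructions, each covering
    # at most run_data_amount units, to be executed in order.
    instructions = []
    offset = 0
    while offset < data_amount:
        size = min(run_data_amount, data_amount - offset)
        instructions.append({"input_offset": offset, "run_input_amount": size})
        offset += size
    return instructions

def split_for_many_devices(data_amount, run_data_amounts):
    # Several running devices: split the macro instruction's data amount
    # in proportion to each device's own running data amount, producing
    # one running instruction per device (the last takes the remainder).
    total = sum(run_data_amounts.values())
    devices = list(run_data_amounts)
    instructions, assigned = {}, 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            size = data_amount - assigned
        else:
            size = data_amount * run_data_amounts[dev] // total
        instructions[dev] = {"input_offset": assigned, "run_input_amount": size}
        assigned += size
    return instructions
```

For example, a macro instruction with a data amount of 10 on a single device whose running data amount is 4 would be split into three running instructions covering 4, 4, and 2 units.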
In one possible implementation, the method may further include: sorting, by the instruction generation device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
In one possible implementation, the method may further include: receiving, by the instruction generation device, an instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
In one possible implementation, the method may further include: sending, by the instruction generation device, the running instruction to the running device, so that the running device executes the running instruction.
In one possible implementation, sending the running instruction to the running device through the instruction generation device, so that the running device executes the running instruction, may include:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
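The three dispatch steps above can be sketched as a small pipeline. The assembler, translator, and transport are injected as callables because the text names no concrete toolchain; everything here is a hedged stand-in, not a prescribed implementation.

```python
def dispatch_running_instructions(instructions, assemble, translate, send):
    # running instructions -> assembly file -> binary file -> running device
    assembly_file = assemble(instructions)   # text of the assembly file
    binary_file = translate(assembly_file)   # assembly translated to binary
    send(binary_file)                        # running device executes the binary
    return binary_file

# Usage with stand-in callables (the instruction mnemonics are invented):
sent = []
binary = dispatch_running_instructions(
    ["LOAD a", "COMPUTE b"],
    assemble=lambda ins: "\n".join(ins),
    translate=lambda asm: asm.encode("utf-8"),
    send=sent.append,
)
```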
In one possible implementation, the method may further include: storing, by the running device, the data and the scalar data within the data. The running device includes a storage module, and the storage module includes one or more of a register and a cache, where the cache includes a scratchpad cache. The cache is used to store the data; the register is used to store the scalar data within the data.
In one possible implementation, the method may further include:
storing the running instruction by the running device;
parsing the running instruction by the running device to obtain a plurality of parsed instructions;
storing a running instruction queue by the running device, where the running instruction queue includes the running instruction and the plurality of parsed instructions, arranged in the order in which they are to be executed.
In one possible implementation, the method may further include:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction has a dependency on a zeroth parsed instruction that precedes it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing.
The first parsed instruction has a dependency on the preceding zeroth parsed instruction when the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction.
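The overlap test that defines this dependency can be written directly. A minimal sketch, assuming half-open `[start, end)` address intervals (the text does not specify how interval endpoints are represented):

```python
def has_dependency(first_interval, zeroth_interval):
    # The first parsed instruction depends on the preceding zeroth parsed
    # instruction exactly when the storage address interval of the data it
    # requires overlaps the interval of the data the zeroth one requires.
    first_start, first_end = first_interval
    zeroth_start, zeroth_end = zeroth_interval
    return first_start < zeroth_end and zeroth_start < first_end
```

When this test is true, the running device caches the first parsed instruction and issues it only after the zeroth parsed instruction has completed.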
In one possible implementation, the running device may be one of, or any combination of, a CPU, a GPU, and an NPU.
In one possible implementation, the instruction generation device is provided in the CPU and/or the NPU.
In one possible implementation, the macro instruction may include at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction.
The computation macro instruction may include at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data transfer macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction. The write macro instruction may include at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
In one possible implementation, the macro instruction may include at least one of the following options: the identifier of the designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and instruction parameters. The running instruction may include at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameters.
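The two option lists above can be pictured as two record types. This is a sketch only: the field names are illustrative rather than mandated, and every field is optional because each instruction "may include at least one" of the options.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class MacroInstruction:
    # Options a macro instruction may carry (names are illustrative).
    device_id: Optional[str] = None        # identifier of the designated device
    operation_type: Optional[str] = None   # e.g. a compute / control / transfer type
    input_address: Optional[int] = None
    output_address: Optional[int] = None
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operand: Optional[Any] = None
    instruction_params: dict = field(default_factory=dict)

@dataclass
class RunningInstruction:
    # Per the list above, a running instruction carries the operation type,
    # addresses, operand, and parameters; the device identifier is resolved
    # during instruction generation and so does not appear here.
    operation_type: Optional[str] = None
    input_address: Optional[int] = None
    output_address: Optional[int] = None
    operand: Optional[Any] = None
    instruction_params: dict = field(default_factory=dict)
```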
In the neural network instruction processing method provided by the embodiments of the present disclosure, the instruction generation device determines, according to a received macro instruction, the running device that will execute the macro instruction, and generates a running instruction according to the macro instruction and the running device; the running device obtains data, a neural network model, and the running instruction, parses the running instruction to obtain a plurality of parsed instructions, and executes the plurality of parsed instructions on the data to obtain an execution result. The method can be used across platforms and offers good applicability, fast instruction conversion, high processing efficiency, and a low error rate, while requiring little development labor and material cost.
The foregoing may be better understood in light of the following clauses:
Clause A1. A neural network instruction processing system, the system including an instruction generation device and a running device,
the instruction generation device including:
a device determination module configured to determine, according to a received macro instruction, the running device that will execute the macro instruction;
an instruction generation module configured to generate a running instruction according to the macro instruction and the running device;
the running device including:
a control module configured to obtain required data, a neural network model, and the running instruction, and to parse the running instruction to obtain a plurality of parsed instructions;
an execution module configured to execute the plurality of parsed instructions on the data to obtain an execution result.
Clause A2. The system of Clause A1, wherein the instruction generation device further includes:
a resource obtaining module configured to obtain resource information of candidate devices,
and wherein the device determination module includes at least one of the following sub-modules:
a first determination sub-module configured to determine a designated device as the running device when it is determined that the macro instruction contains an identifier of the designated device and the resources of the designated device satisfy an execution condition for executing the macro instruction;
a second determination sub-module configured to determine, when it is determined that the macro instruction does not contain the identifier of the designated device, the running device for executing the macro instruction from among the candidate devices according to the received macro instruction and the resource information of the candidate devices;
a third determination sub-module configured to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause A3. The system of Clause A2, wherein the macro instruction includes at least one of an input amount and an output amount,
the instruction generation module is further configured to determine the data amount of the macro instruction and to generate the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
and wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause A4. The system of Clause A3, wherein the instruction generation module includes at least one of the following sub-modules:
a first instruction generation sub-module configured to split the macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount when it is determined that there is a single running device and its resources do not satisfy the capacity condition for executing the macro instruction, so that the running device executes the plurality of running instructions in sequence,
a second instruction generation sub-module configured to split the macro instruction according to the running data amount of each running device and the data amount when it is determined that there are a plurality of running devices, generating a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of that running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause A5. The system of Clause A1, wherein the instruction generation device further includes:
a queue construction module configured to sort the running instructions according to a queue sorting rule and to construct an instruction queue corresponding to the running device from the sorted running instructions.
Clause A6. The system of Clause A2, wherein the instruction generation device further includes:
a macro instruction generation module configured to receive an instruction to be executed and to generate the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
Clause A7. The system of Clause A1, wherein the instruction generation device further includes:
an instruction dispatch module configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly sub-module configured to generate an assembly file according to the running instruction;
an assembly translation sub-module configured to translate the assembly file into a binary file;
an instruction sending sub-module configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause A8. The system of Clause A1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data within the data.
Clause A9. The system of Clause A1, wherein the control module includes:
an instruction storage sub-module configured to store the running instruction;
an instruction processing sub-module configured to parse the running instruction to obtain the plurality of parsed instructions;
a storage queue sub-module configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions arranged in the order in which they are to be executed.
Clause A10. The system of Clause A9, wherein the execution module includes:
a dependency processing sub-module configured to cache a first parsed instruction in the instruction storage sub-module when it is determined that the first parsed instruction has a dependency on a zeroth parsed instruction that precedes it, and, after the zeroth parsed instruction has finished executing, to fetch the first parsed instruction from the instruction storage sub-module and send it to the execution module,
wherein the first parsed instruction has a dependency on the preceding zeroth parsed instruction when:
the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction.
Clause A11. The system of Clause A1, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is provided in the CPU and/or the NPU;
the macro instruction includes at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction,
wherein the computation macro instruction includes at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction,
the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
and the data transfer macro instruction includes at least one of a read macro instruction and a write macro instruction, the read macro instruction including at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction including at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
the macro instruction includes at least one of the following options: the identifier of the designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and instruction parameters,
and the running instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameters.
Clause A12. A machine learning computation apparatus, the apparatus including:
one or more neural network instruction processing systems according to any one of Clauses A1 to A11, configured to obtain data to be computed and control information from other processing apparatuses, perform specified machine learning computations, and pass execution results to the other processing apparatuses through an I/O interface;
when the machine learning computation apparatus includes a plurality of the neural network instruction processing systems, the plurality of neural network instruction processing systems may be connected to one another through a specific structure to transmit data;
wherein the plurality of neural network instruction processing systems are interconnected and transmit data through a fast external device interconnect bus to support larger-scale machine learning computations; the plurality of neural network instruction processing systems share a single control system or have their own control systems; the plurality of neural network instruction processing systems share memory or have their own memories; and the plurality of neural network instruction processing systems are interconnected in an arbitrary interconnection topology.
Clause A13. A combined processing apparatus, the apparatus including:
the machine learning computation apparatus of Clause A12, a universal interconnection interface, and other processing apparatuses;
the machine learning computation apparatus interacting with the other processing apparatuses to jointly complete computation operations specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning computation apparatus and the other processing apparatuses, respectively, and configured to save data of the machine learning computation apparatus and the other processing apparatuses.
Clause A14. A machine learning chip, the chip including:
the machine learning computation apparatus of Clause A12 or the combined processing apparatus of Clause A13.
Clause A15. An electronic device, the device including:
the machine learning chip of Clause A14.
Clause A16. A board card, the board card including: a storage device, an interface apparatus, a control device, and the machine learning chip of Clause A14;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
and the control device is configured to monitor the state of the machine learning chip.
Clause A17. A neural network instruction processing method applied to a neural network instruction processing system, the system including an instruction generation device and a running device, the method including:
determining, by the instruction generation device according to a received macro instruction, the running device that will execute the macro instruction, and generating a running instruction according to the macro instruction and the running device;
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions on the data to obtain an execution result.
Clause A18. The method of Clause A17, further including:
obtaining, by the instruction generation device, resource information of candidate devices,
wherein determining, by the instruction generation device according to the received macro instruction, the running device that will execute the macro instruction includes at least one of the following:
when it is determined that the macro instruction contains an identifier of a designated device and the resources of the designated device satisfy an execution condition for executing the macro instruction, determining the designated device as the running device;
when it is determined that the macro instruction does not contain the identifier of the designated device, determining the running device for executing the macro instruction from among the candidate devices according to the received macro instruction and the resource information of the candidate devices;
when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause A19. The method of Clause A18, wherein the macro instruction includes at least one of an input amount and an output amount,
and generating the running instruction according to the macro instruction and the running device includes:
determining the data amount of the macro instruction, and generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause A20. The method of Clause A19, wherein generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device includes at least one of the following:
when it is determined that there is a single running device and its resources do not satisfy the capacity condition for executing the macro instruction, splitting the macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
when it is determined that there are a plurality of running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of that running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause A21. The method of Clause A17, further including:
sorting, by the instruction generation device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
Clause A22. The method of Clause A18, further including:
receiving, by the instruction generation device, an instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
Clause A23. The method of Clause A17, further including:
sending, by the instruction generation device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device through the instruction generation device, so that the running device executes the running instruction, includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause A24. The method of Clause A17, further including:
storing, by the running device, the data and the scalar data within the data,
wherein the running device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a scratchpad cache,
the cache being used to store the data;
the register being used to store the scalar data within the data.
Clause A25. The method of Clause A17, further including:
storing the running instruction by the running device;
parsing the running instruction by the running device to obtain the plurality of parsed instructions;
storing a running instruction queue by the running device, the running instruction queue including the running instruction and the plurality of parsed instructions arranged in the order in which they are to be executed.
Clause A26. The method of Clause A25, further including:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction has a dependency on a zeroth parsed instruction that precedes it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing,
wherein the first parsed instruction has a dependency on the preceding zeroth parsed instruction when:
the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction.
条款A27、根据条款A17所述的方法,Clause A27, according to the method described in Clause A17,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述宏指令包括以下指令中的至少一种:The macro instruction includes at least one of the following instructions:
计算宏指令、控制宏指令和数据搬运指令,Calculation macro instruction, control macro instruction and data handling instruction,
其中,所述计算宏指令包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种,The calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
数据搬运宏指令包括读宏指令和写宏指令中的至少一种，所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种，所述写宏指令包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种；The data handling macro instruction includes at least one of a read macro instruction and a write macro instruction, where the read macro instruction includes at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction includes at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
所述宏指令包括以下选项中的至少一项:The macro instruction includes at least one of the following options:
用于执行所述宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数,The identification, operation type, input address, output address, input amount, output amount, operand and instruction parameter of the designated device for executing the macro instruction,
所述运行指令包括以下选项中的至少一项:所述操作类型、所述输入地址、所述输出地址、所述操作数和所述指令参数。The run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
对于不同的指令上述装置、系统和对应的方法，可以针对指令的不同进行不同的处理，为更清晰的描述针对不同指令上述装置、系统和对应方法的工作过程和原理，可以结合以下条款进行理解。The above devices, systems, and corresponding methods may process different instructions differently. To describe more clearly their working process and principles for each kind of instruction, they can be understood with reference to the following clauses.
对于计算指令,可以通过下述计算指令生成装置、方法,以及计算指令处理系统、方法等相关产品进行处理,依据以下条款可更好地理解前述内容:For calculation instructions, the following calculation instruction generation devices and methods, as well as calculation instruction processing systems and methods, can be used for processing, and the foregoing can be better understood in accordance with the following clauses:
条款C1、一种计算指令生成装置,所述装置包括:Clause C1, a calculation instruction generating device, the device comprising:
设备确定模块,用于根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备;A device determination module, configured to determine a running device that executes the calculation macro instruction based on the received calculation macro instruction;
指令生成模块,用于根据所述计算宏指令和所述运行设备,生成运行指令,An instruction generation module for generating an operation instruction according to the calculation macro instruction and the operation device,
其中,所述计算宏指令是指用于进行数据计算的宏指令,所述数据计算包括机器学习计算、向量逻辑计算、矩阵向量计算、标量计算和标量逻辑计算中的至少一种,The calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
所述计算宏指令包含操作类型、输入地址和输出地址，所述运行指令包含所述操作类型、运行输入地址和运行输出地址，所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address, where the running input address and the running output address are determined according to the input address and the output address, respectively.
条款C2、根据条款C1所述的装置,所述设备确定模块,包括:Clause C2. The apparatus according to Clause C1, the device determination module includes:
第一确定子模块，用于在确定所述计算宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述计算宏指令的执行条件时，将所述指定设备确定为所述运行设备，The first determination submodule is configured to determine the specified device as the running device when it is determined that the calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy the execution condition for executing the calculation macro instruction,
其中,所述执行条件包括所述指定设备中包含与所述计算宏指令相对应的指令集。Wherein, the execution condition includes that the designated device contains an instruction set corresponding to the calculation macro instruction.
条款C3、根据条款C2所述的装置,所述装置还包括:Clause C3. The device according to Clause C2, the device further comprising:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,还包括:The device determination module also includes:
第二确定子模块，用于在确定所述计算宏指令中不包含所述指定设备的标识时，根据接收到的计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述计算宏指令的运行设备，The second determination submodule is configured to, when it is determined that the calculation macro instruction does not contain the identifier of the specified device, determine, from the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款C4、根据条款C3所述的装置,所述设备确定模块,还包括:Clause C4. The apparatus according to Clause C3, the device determination module further includes:
第三确定子模块，在确定所述计算宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述计算宏指令的执行条件时，根据所述计算宏指令和所述备选设备的资源信息，确定运行设备。The third determination submodule is configured to, when it is determined that the calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the calculation macro instruction, determine the running device according to the calculation macro instruction and the resource information of the candidate devices.
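The three determination submodules of Clauses C2 to C4 together form one selection policy: prefer the specified device if it can execute the macro instruction, otherwise fall back to the candidate devices whose resource information (the instruction sets they contain) satisfies the execution condition. A minimal sketch of that policy, under assumed field names (`device_id`, `op_type`, `instruction_sets` are illustrative, not from the claims):

```python
def determine_running_device(macro, candidates):
    """Select a running device for a calculation macro instruction.

    macro: dict that may carry the identifier of a specified device.
    candidates: dict mapping device id -> resource info, where the
    resource info lists the instruction sets that device contains.
    """
    op = macro["op_type"]
    dev_id = macro.get("device_id")
    if dev_id is not None:
        # Clause C2: the specified device satisfies the execution condition
        if op in candidates[dev_id]["instruction_sets"]:
            return dev_id
        # Clause C4: specified device cannot execute it; fall through
    # Clauses C3/C4: choose from candidates by their resource information
    for cid, info in candidates.items():
        if op in info["instruction_sets"]:
            return cid
    raise RuntimeError("no candidate device supports " + op)

devices = {"cpu0": {"instruction_sets": {"SCALAR", "VECTOR"}},
           "npu0": {"instruction_sets": {"CONV", "MATMUL"}}}
print(determine_running_device({"op_type": "CONV"}, devices))  # npu0
```

A macro instruction that names `cpu0` but requests `MATMUL` would be redirected to `npu0`, matching the Clause C4 fallback.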
条款C5、根据条款C2-条款C4任一项所述的装置,所述计算宏指令还包含输入量和输出量中的至少一项,Clause C5. The device according to any one of Clauses C2-C4, the calculation macro instruction further includes at least one of an input quantity and an output quantity,
所述指令生成模块,还用于确定所述计算宏指令的数据量,根据所述计算宏指令的数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is further used to determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款C6、根据条款C5所述的装置,所述指令生成模块,包括:Clause C6. The device according to Clause C5, the instruction generation module includes:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述计算宏指令拆分成多条运行指令，以使所述运行设备依次执行所述多条运行指令，The first instruction generation submodule is configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, split the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
条款C7、根据条款C5所述的装置,所述指令生成模块,包括:Clause C7. The device according to Clause C5, the instruction generation module includes:
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述计算宏指令进行拆分，生成对应于每个运行设备的运行指令，The second instruction generation submodule is configured to, when it is determined that there are multiple running devices, split the calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein the running data amount of each running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
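The splitting in Clauses C6 and C7 amounts to chunking the macro instruction's data amount by the per-run data amount of the target device, advancing the input and output addresses as data is consumed. A sketch under assumed fields (`input_amount`, `input_addr`, `output_addr` are illustrative; the claims do not fix an address layout):

```python
def split_macro(macro, chunk):
    """Split one calculation macro instruction into running instructions
    whose input amounts each fit the device's running data amount 'chunk'.
    Addresses advance by the amount consumed (a simplifying assumption)."""
    instrs = []
    offset = 0
    total = macro["input_amount"]
    while offset < total:
        size = min(chunk, total - offset)
        instrs.append({"op_type": macro["op_type"],
                       "input_addr": macro["input_addr"] + offset,
                       "output_addr": macro["output_addr"] + offset,
                       "input_amount": size})
        offset += size
    return instrs

macro = {"op_type": "VECTOR_ADD", "input_addr": 0x1000,
         "output_addr": 0x2000, "input_amount": 1000}
parts = split_macro(macro, 256)
print(len(parts))                             # 4
print(sum(p["input_amount"] for p in parts))  # 1000
```

For the single-device case (Clause C6) the chunks are executed in sequence on one device; for the multi-device case (Clause C7) each chunk would instead be sized by the running data amount of the device it is assigned to.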
条款C8、根据条款C1所述的装置,所述装置还包括:Clause C8. The device according to Clause C1, the device further comprising:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款C9、根据条款C2所述的装置,所述装置还包括:Clause C9. The device according to Clause C2, the device further comprising:
宏指令生成模块，用于接收待执行计算指令，根据确定的指定设备的标识和所述待执行计算指令生成所述计算宏指令。The macro instruction generation module is configured to receive a calculation instruction to be executed, and generate the calculation macro instruction according to the determined identifier of the specified device and the calculation instruction to be executed.
条款C10、根据条款C1所述的装置,所述装置还包括:Clause C10. The device according to Clause C1, the device further comprising:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
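The dispatch module in Clause C10 is a three-stage pipeline: running instructions are first assembled into text, the text is translated into a binary file, and the binary is sent to the running device. A toy sketch of that flow (the assembly syntax and encoding below are placeholders, not the claimed formats):

```python
def dispatch(run_instrs, send):
    """Clause C10 pipeline: running instructions -> assembly -> binary -> device."""
    # Instruction assembly submodule: render each instruction as one assembly line
    asm = "\n".join("{op_type} {input_addr:#x} {output_addr:#x}".format(**i)
                    for i in run_instrs)
    # Assembly translation submodule: translate the assembly file into a binary file
    binary = asm.encode("utf-8")
    # Instruction sending submodule: deliver the binary to the running device
    send(binary)

sent = []
dispatch([{"op_type": "VECTOR_ADD", "input_addr": 0x1000,
           "output_addr": 0x2000}], sent.append)
print(sent[0].decode())  # VECTOR_ADD 0x1000 0x2000
```

In a real implementation the translation step would emit the running device's machine encoding rather than UTF-8 text, and `send` would be the device's transport interface.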
条款C11、根据条款C1所述的装置,Clause C11, the device according to Clause C1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述装置设置于CPU和/或NPU中;The device is set in the CPU and / or NPU;
所述计算宏指令包括以下指令中的至少一种:神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种;The calculation macro instruction includes at least one of the following instructions: at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction;
所述计算宏指令还包含以下选项中的至少一项:用于执行所述计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数。The calculation macro instruction also includes at least one of the following options: the identification, input amount, output amount, operand, and instruction parameter of the designated device for executing the calculation macro instruction.
条款C12、一种机器学习运算装置,所述装置包括:Clause C12. A machine learning computing device, the device comprising:
一个或多个如条款C1-条款C11任一项所述的计算指令生成装置，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more calculation instruction generation devices according to any one of Clauses C1 to C11, configured to obtain data to be operated on and control information from other processing devices, perform specified machine learning operations, and pass the execution results to other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述计算指令生成装置时,所述多个所述计算指令生成装置间可以通过特定的结构进行连接并传输数据;When the machine learning operation device includes a plurality of the calculation instruction generation devices, the plurality of the calculation instruction generation devices may be connected and transmit data through a specific structure;
其中，多个所述计算指令生成装置通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述计算指令生成装置共享同一控制系统或拥有各自的控制系统；多个所述计算指令生成装置共享内存或者拥有各自的内存；多个所述计算指令生成装置的互联方式是任意互联拓扑。Wherein the multiple calculation instruction generation devices are interconnected and transmit data through a PCIE (peripheral component interconnect express) bus to support larger-scale machine learning operations; the multiple calculation instruction generation devices share the same control system or have their own control systems; they share memory or have their own memories; and the interconnection mode of the multiple calculation instruction generation devices is an arbitrary interconnection topology.
条款C13、一种组合处理装置,所述装置包括:Clause C13. A combined processing device, the device comprising:
如条款C12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing device, general interconnection interface and other processing devices as described in Clause C12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款C14、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款C12所述的机器学习运算装置或如条款C13所述的组合处理装置；Clause C14. A board card, the board card including: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning operation device according to Clause C12 or the combined processing device according to Clause C13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款C15、一种计算指令生成方法,所述方法包括:Clause C15, a method for generating calculation instructions, the method comprising:
根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备;According to the received calculation macro instruction, determine a running device that executes the calculation macro instruction;
根据所述计算宏指令和所述运行设备,生成运行指令,Generate an operation instruction according to the calculation macro instruction and the operation device,
其中,所述计算宏指令是指用于进行数据计算的宏指令,所述数据计算包括机器学习计算、向量逻辑计算、矩阵向量计算、标量计算和标量逻辑计算中的至少一种,The calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
所述计算宏指令包含操作类型、输入地址和输出地址，所述运行指令包含所述操作类型、运行输入地址和运行输出地址，所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address, where the running input address and the running output address are determined according to the input address and the output address, respectively.
条款C16、根据条款C15所述的方法,根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备,包括:Clause C16. According to the method described in Clause C15, based on the received calculation macro instruction, determining an operating device that executes the calculation macro instruction includes:
在确定所述计算宏指令包括指定设备的标识,且所述指定设备的资源满足执行所述计算宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the calculation macro instruction includes the identification of the designated device, and the resource of the designated device satisfies the execution condition for executing the calculation macro instruction, determining the designated device as the running device,
其中,所述执行条件包括所述指定设备中包含与所述计算宏指令相对应的指令集。Wherein, the execution condition includes that the designated device contains an instruction set corresponding to the calculation macro instruction.
条款C17、根据条款C16所述的方法,所述方法还包括:Clause C17. The method according to Clause C16, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备,包括:Wherein, according to the received calculation macro instruction, determining a running device that executes the calculation macro instruction includes:
在确定所述计算宏指令中不包含所述指定设备的标识时，根据接收到的计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述计算宏指令的运行设备，When it is determined that the calculation macro instruction does not contain the identifier of the specified device, determining, from the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款C18、根据条款C17所述的方法,根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备,包括:Clause C18. According to the method described in Clause C17, based on the received calculation macro instruction, determining the operating device that executes the calculation macro instruction includes:
在确定所述计算宏指令包括所述指定设备的标识，且所述指定设备的资源不满足执行所述计算宏指令的执行条件时，根据所述计算宏指令和所述备选设备的资源信息，确定运行设备。When it is determined that the calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the calculation macro instruction, determining the running device according to the calculation macro instruction and the resource information of the candidate devices.
条款C19、根据条款C16-条款C18任一项所述的方法,所述计算宏指令还包含输入量和输出量中的至少一项,Clause C19, the method according to any one of Clause C16-C18, the calculation macro instruction further includes at least one of an input quantity and an output quantity,
根据所述计算宏指令和所述运行设备,生成运行指令,包括:Generating an operation instruction according to the calculation macro instruction and the operation device, including:
确定所述计算宏指令的数据量,根据所述数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount, the calculation macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款C20、根据条款C19所述的方法,根据所述计算宏指令的数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause C20. According to the method described in Clause C19, generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the operation device includes:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，When it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, splitting the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
条款C21、根据条款C19所述的方法,根据所述计算宏指令的数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause C21. According to the method described in Clause C19, generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the operation device includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述计算宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operation devices, the calculation macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein the running data amount of each running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款C22、根据条款C15所述的方法,所述方法还包括:Clause C22. The method according to Clause C15, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款C23、根据条款C16所述的方法,所述方法还包括:Clause C23. The method according to Clause C16, the method further comprising:
接收待执行计算指令,根据确定的指定设备的标识和所述待执行计算指令生成所述计算宏指令。Receiving the calculation instruction to be executed, and generating the calculation macro instruction according to the determined identifier of the designated device and the calculation instruction to be executed.
条款C24、根据条款C15所述的方法,所述方法还包括:Clause C24. The method according to Clause C15, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款C25、根据条款C15所述的方法,Clause C25, according to the method described in Clause C15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述计算宏指令包括以下指令中的至少一种:神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种,The calculation macro instruction includes at least one of the following instructions: at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
所述计算宏指令还包含以下选项中的至少一项:用于执行所述计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数。The calculation macro instruction also includes at least one of the following options: the identification, input amount, output amount, operand, and instruction parameter of the designated device for executing the calculation macro instruction.
条款D1、一种计算指令处理系统,所述系统包括指令生成设备和运行设备,Clause D1, a computing instruction processing system, the system includes an instruction generating device and an operating device,
所述指令生成设备,包括:The instruction generating device includes:
设备确定模块,用于根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备;A device determination module, configured to determine a running device that executes the calculation macro instruction based on the received calculation macro instruction;
指令生成模块,用于根据所述计算宏指令和所述运行设备,生成运行指令;An instruction generation module, configured to generate an operation instruction according to the calculation macro instruction and the operation device;
所述运行设备,包括:The operation equipment includes:
控制模块,用于获取所需数据、神经网络模型以及所述运行指令,对所述运行指令进行解析,获得多个解析指令;The control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
执行模块,用于根据所述数据执行所述多个解析指令,得到执行结果,An execution module, configured to execute the multiple parsing instructions based on the data to obtain an execution result,
其中,所述计算宏指令是指用于进行数据计算的宏指令,所述数据计算包括机器学习计算、向量逻辑计算、矩阵向量计算、标量计算和标量逻辑计算中的至少一种,The calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
所述计算宏指令包含操作类型、输入地址和输出地址，所述运行指令包含所述操作类型、运行输入地址和运行输出地址，所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address, where the running input address and the running output address are determined according to the input address and the output address, respectively.
条款D2、根据条款D1所述的系统,所述指令生成设备还包括:Clause D2. The system according to Clause D1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块，用于在确定所述计算宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述计算宏指令的执行条件时，将所述指定设备确定为所述运行设备；The first determination submodule is configured to determine the specified device as the running device when it is determined that the calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy the execution condition for executing the calculation macro instruction;
第二确定子模块，用于在确定所述计算宏指令中不包含所述指定设备的标识时，根据接收到的计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述计算宏指令的运行设备；The second determination submodule is configured to, when it is determined that the calculation macro instruction does not contain the identifier of the specified device, determine, from the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices;
第三确定子模块，在确定所述计算宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述计算宏指令的执行条件时，根据所述计算宏指令和所述备选设备的资源信息，确定运行设备，The third determination submodule is configured to, when it is determined that the calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the calculation macro instruction, determine the running device according to the calculation macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
条款D3、根据条款D2所述的系统,所述计算宏指令还包含输入量和输出量中的至少一项,Clause D3. The system according to Clause D2, the calculation macro instruction further includes at least one of an input quantity and an output quantity,
所述指令生成模块,还用于确定所述计算宏指令的数据量,根据所述计算宏指令的数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is further used to determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款D4、根据条款D3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause D4. The system according to Clause D3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，The first instruction generation submodule is configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, split the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述计算宏指令进行拆分，生成对应于每个运行设备的运行指令，The second instruction generation submodule is configured to, when it is determined that there are multiple running devices, split the calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein the running data amount of a running device is determined according to the resource information of the running device; the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款D5、根据条款D1所述的系统,所述指令生成设备还包括:Clause D5. The system according to Clause D1, the instruction generating device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款D6、根据条款D2所述的系统,所述指令生成设备还包括:Clause D6. The system according to Clause D2, the instruction generating device further includes:
宏指令生成模块，用于接收待执行计算指令，根据确定的指定设备的标识和所述待执行计算指令生成所述计算宏指令。The macro instruction generation module is configured to receive a calculation instruction to be executed, and generate the calculation macro instruction according to the determined identifier of the specified device and the calculation instruction to be executed.
Clause D7. The system according to Clause D1, wherein the instruction generating device further includes:
an instruction dispatching module, configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule, configured to generate an assembly file according to the running instruction;
an assembly translating submodule, configured to translate the assembly file into a binary file; and
an instruction sending submodule, configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
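The three submodules above form an assemble-translate-send pipeline. A toy end-to-end sketch, under the assumption of a trivial text format and encoding (a real assembler and device transport would replace both stand-ins):

```python
# Hypothetical dispatch pipeline: running instructions -> assembly text ->
# binary blob -> running device. The instruction format and the utf-8
# "translation" are assumptions for illustration only.

def to_assembly(run_instructions):
    """Instruction assembling submodule: render instructions as assembly text."""
    return "\n".join(f'{ins["op"]} {ins["src"]}, {ins["dst"]}'
                     for ins in run_instructions)

def to_binary(assembly_text):
    """Assembly translating submodule: stand-in for a real assembler."""
    return assembly_text.encode("utf-8")

def dispatch(binary_blob, device):
    """Instruction sending submodule: stand-in for an I/O interface send."""
    device.append(binary_blob)

device_mailbox = []  # stands in for the running device's receive buffer
asm = to_assembly([{"op": "CONV", "src": "0x1000", "dst": "0x2000"}])
dispatch(to_binary(asm), device_mailbox)
```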
Clause D8. The system according to Clause D1, wherein the running device further includes:
a storage module including one or more of a register and a cache, the cache including a scratchpad cache,
the cache being configured to store the data, and
the register being configured to store scalar data within the data.
Clause D9. The system according to Clause D1, wherein the control module includes:
an instruction storage submodule, configured to store the running instruction;
an instruction processing submodule, configured to parse the running instruction to obtain the plurality of parsed instructions; and
a storage queue submodule, configured to store a running instruction queue including the running instruction and the plurality of parsed instructions, arranged in the order in which they are to be executed.
Clause D10. The system according to Clause D9, wherein the execution module includes:
a dependency processing submodule, configured to, when it is determined that a first parsed instruction is associated with a zeroth parsed instruction preceding it, cache the first parsed instruction in the instruction storage submodule and, after the zeroth parsed instruction has finished executing, fetch the first parsed instruction from the instruction storage submodule and send it to the execution module,
wherein the association between the first parsed instruction and the preceding zeroth parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
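The association test above reduces to an interval-overlap check on storage addresses. A minimal sketch, assuming half-open [start, end) intervals (the clause does not specify the interval convention):

```python
# Sketch of the dependency test in Clause D10: a first parsed instruction
# depends on a zeroth one when their required storage address intervals
# overlap. The [start, end) representation is an assumption.

def intervals_overlap(a, b):
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def has_dependency(first_instr, zeroth_instr):
    return intervals_overlap(first_instr["addr_range"],
                             zeroth_instr["addr_range"])

dep = has_dependency({"addr_range": (0x100, 0x200)},
                     {"addr_range": (0x180, 0x280)})
# overlapping region [0x180, 0x200): the first instruction would be cached
# until the zeroth finishes executing
```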
Clause D11. The system according to Clause D1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is provided in the CPU and/or the NPU;
the calculation macro instruction includes at least one of: a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix-vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction; and
the calculation macro instruction includes at least one of the following options: an identifier of the specified device for executing the calculation macro instruction, an input amount, an output amount, operands, and instruction parameters.
Clause D12. A machine learning operation apparatus, including:
one or more calculation instruction processing systems according to any one of Clauses D1 to D11, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transfer the execution result to the other processing apparatuses through an I/O interface,
wherein, when the machine learning operation apparatus includes a plurality of the calculation instruction processing systems, the plurality of calculation instruction processing systems may be connected to, and transmit data to, one another through a specific structure,
wherein the plurality of calculation instruction processing systems are interconnected and transmit data through a PCIe (peripheral component interconnect express) bus to support larger-scale machine learning operations; the plurality of calculation instruction processing systems share one control system or have their own control systems; the plurality of calculation instruction processing systems share memory or have their own memories; and the interconnection of the plurality of calculation instruction processing systems may be any interconnection topology.
Clause D13. A combined processing apparatus, including:
the machine learning operation apparatus according to Clause D12, a universal interconnection interface, and another processing apparatus,
the machine learning operation apparatus interacting with the other processing apparatus to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatus respectively, and configured to save data of the machine learning operation apparatus and of the other processing apparatus.
Clause D14. A board card, including: a storage device, an interface apparatus, a control device, and a machine learning chip, the machine learning chip including the machine learning operation apparatus according to Clause D12 or the combined processing apparatus according to Clause D13,
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor a state of the machine learning chip.
Clause D15. A calculation instruction processing method, applied to a calculation instruction processing system including an instruction generating device and a running device, the method including:
determining, by the instruction generating device according to a received calculation macro instruction, a running device for executing the calculation macro instruction, and generating a running instruction according to the calculation macro instruction and the running device; and
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions on the data to obtain an execution result,
wherein the calculation macro instruction is a macro instruction for performing data calculation, the data calculation including at least one of machine learning calculation, vector logic calculation, matrix-vector calculation, scalar calculation, and scalar logic calculation, and
the calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address respectively.
Clause D16. The method according to Clause D15, further including:
obtaining, by the instruction generating device, resource information of candidate devices,
wherein determining, by the instruction generating device according to the received calculation macro instruction, the running device for executing the calculation macro instruction includes at least one of:
when it is determined that the calculation macro instruction includes an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the calculation macro instruction, determining the specified device as the running device;
when it is determined that the calculation macro instruction does not include the identifier of the specified device, determining, from among the candidate devices, the running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices; and
when it is determined that the calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the calculation macro instruction, determining the running device according to the calculation macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the specified device containing an instruction set corresponding to the calculation macro instruction; and the resource information includes the instruction set contained in each candidate device.
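The three branches of the selection rule above can be sketched as one function. Data shapes and names are assumptions; the only logic taken from the text is "prefer the specified device if it has the required instruction set, otherwise fall back to the candidate devices' resource information":

```python
# Illustrative device-selection sketch for the rule in Clause D16.
# `candidates` maps a device name to the set of instruction sets it contains
# (its resource information); the macro may name a specified device.

def choose_running_device(macro, candidates):
    needed = macro["instruction_set"]
    specified = macro.get("specified_device")
    # Branch 1: specified device present and satisfies the execution condition.
    if specified is not None and needed in candidates.get(specified, set()):
        return specified
    # Branches 2 and 3: no specified device, or its resources fall short;
    # pick a candidate whose instruction sets include the required one.
    for device, isa in candidates.items():
        if needed in isa:
            return device
    return None

candidates = {"cpu0": {"scalar"}, "npu0": {"neural", "vector"}}
chosen = choose_running_device(
    {"instruction_set": "neural", "specified_device": "cpu0"}, candidates)
# cpu0 lacks the "neural" instruction set, so selection falls back to npu0
```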
Clause D17. The method according to Clause D16, wherein the calculation macro instruction further includes at least one of an input amount and an output amount, and
generating the running instruction according to the calculation macro instruction and the running device includes:
determining a data amount of the calculation macro instruction, and generating the running instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause D18. The method according to Clause D17, wherein generating the running instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the running device includes at least one of:
when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, splitting the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence; and
when it is determined that there are a plurality of running devices, splitting the calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of the running device, the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause D19. The method according to Clause D15, further including:
sorting, by the instruction generating device, the running instructions according to a queue ordering rule, and constructing, from the sorted running instructions, an instruction queue corresponding to the running device.
Clause D20. The method according to Clause D16, further including:
receiving, by the instruction generating device, a calculation instruction to be executed, and generating the calculation macro instruction according to the determined identifier of the specified device and the calculation instruction to be executed.
Clause D21. The method according to Clause D15, further including:
sending, by the instruction generating device, the running instruction to the running device so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause D22. The method according to Clause D15, further including:
storing, by the running device, the data and the scalar data within the data,
wherein the running device includes a storage module including one or more of a register and a cache, the cache including a scratchpad cache,
the cache being configured to store the data, and
the register being configured to store the scalar data within the data.
Clause D23. The method according to Clause D15, further including:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions; and
storing, by the running device, a running instruction queue including the running instruction and the plurality of parsed instructions, arranged in the order in which they are to be executed.
Clause D24. The method according to Clause D23, further including:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction is associated with a zeroth parsed instruction preceding it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing,
wherein the association between the first parsed instruction and the preceding zeroth parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause D25. The method according to Clause D15, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is provided in the CPU and/or the NPU;
the calculation macro instruction includes at least one of: a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix-vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction; and
the calculation macro instruction further includes at least one of the following options: an identifier of the specified device for executing the calculation macro instruction, an input amount, an output amount, operands, and instruction parameters.
Neural network calculation instructions can be processed by the calculation instruction generating apparatus and method described below, and by related products such as the neural network calculation instruction processing system and method. The foregoing can be better understood with reference to the following clauses:
Clause E1. A neural network calculation instruction generating apparatus, including:
a device determining module, configured to determine, according to a received neural network calculation macro instruction, a running device for executing the neural network calculation macro instruction; and
an instruction generating module, configured to generate a running instruction according to the neural network calculation macro instruction and the running device,
wherein the neural network calculation macro instruction is a macro instruction for computing a neural network algorithm, and
the neural network calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address respectively.
Clause E2. The apparatus according to Clause E1, wherein the device determining module includes:
a first determining submodule, configured to, when it is determined that the neural network calculation macro instruction includes an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the neural network calculation macro instruction, determine the specified device as the running device,
wherein the execution condition includes: the specified device containing an instruction set corresponding to the neural network calculation macro instruction.
Clause E3. The apparatus according to Clause E2, further including:
a resource obtaining module, configured to obtain resource information of candidate devices,
wherein the device determining module further includes:
a second determining submodule, configured to, when it is determined that the neural network calculation macro instruction does not include the identifier of the specified device, determine, from among the candidate devices, the running device for executing the neural network calculation macro instruction according to the received neural network calculation macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
Clause E4. The apparatus according to Clause E3, wherein the device determining module further includes:
a third determining submodule, configured to, when it is determined that the neural network calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the neural network calculation macro instruction, determine the running device according to the neural network calculation macro instruction and the resource information of the candidate devices.
Clause E5. The apparatus according to any one of Clauses E2 to E4, wherein the neural network calculation macro instruction further includes at least one of an input amount and an output amount,
the instruction generating module is further configured to determine a data amount of the neural network calculation macro instruction and to generate the running instruction according to the data amount of the neural network calculation macro instruction, the neural network calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause E6. The apparatus according to Clause E5, wherein the instruction generating module includes:
a first instruction generating submodule, configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macro instruction, split the neural network calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
Clause E7. The apparatus according to Clause E5, wherein the instruction generating module includes:
a second instruction generating submodule, configured to, when it is determined that there are a plurality of running devices, split the neural network calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause E8. The apparatus according to Clause E1, further including:
a queue construction module, configured to sort the running instructions according to a queue ordering rule and to construct, from the sorted running instructions, an instruction queue corresponding to the running device.
Clause E9. The apparatus according to Clause E2, further including:
a macro instruction generating module, configured to receive a neural network calculation instruction to be executed and to generate the neural network calculation macro instruction according to the determined identifier of the specified device and the neural network calculation instruction to be executed.
Clause E10. The apparatus according to Clause E1, further including:
an instruction dispatching module, configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule, configured to generate an assembly file according to the running instruction;
an assembly translating submodule, configured to translate the assembly file into a binary file; and
an instruction sending submodule, configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause E11. The apparatus according to Clause E1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is provided in the CPU and/or the NPU;
the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction; and
the neural network calculation macro instruction further includes at least one of the following options: an identifier of the specified device for executing the neural network calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, the instruction parameters including at least one of a size of a convolution kernel, a stride of the convolution kernel, and a padding of the convolution kernel.
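The instruction parameters listed above (kernel size, stride, padding) determine the output amount of a convolution calculation macro instruction. The standard sizing formula, shown here for illustration (it is textbook convolution arithmetic, not a formula quoted from this patent text):

```python
# Per-dimension output size of a convolution, given the instruction
# parameters named in Clause E11: kernel size, stride, and padding.
# output = floor((input + 2*padding - kernel) / stride) + 1

def conv_output_size(input_size, kernel, stride, padding):
    return (input_size + 2 * padding - kernel) // stride + 1

# A 3x3 kernel with stride 1 and padding 1 preserves a 32-wide input
out_same = conv_output_size(input_size=32, kernel=3, stride=1, padding=1)  # 32
# A 5x5 kernel with stride 1 and no padding shrinks a 28-wide input
out_valid = conv_output_size(input_size=28, kernel=5, stride=1, padding=0)  # 24
```

Such a formula is one way an instruction generating module could derive the output amount from the input amount and the instruction parameters carried in the macro instruction.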
Clause E12. A machine learning operation apparatus, including:
one or more neural network calculation instruction generating apparatuses according to any one of Clauses E1 to E11, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transfer the execution result to the other processing apparatuses through an I/O interface,
wherein, when the machine learning operation apparatus includes a plurality of the neural network calculation instruction generating apparatuses, the plurality of neural network calculation instruction generating apparatuses may be connected to, and transmit data to, one another through a specific structure,
wherein the plurality of neural network calculation instruction generating apparatuses are interconnected and transmit data through a PCIe (peripheral component interconnect express) bus to support larger-scale machine learning operations; the plurality of neural network calculation instruction generating apparatuses share one control system or have their own control systems; the plurality of neural network calculation instruction generating apparatuses share memory or have their own memories; and the interconnection of the plurality of neural network calculation instruction generating apparatuses may be any interconnection topology.
Clause E13. A combined processing apparatus, including:
the machine learning operation apparatus according to Clause E12, a universal interconnection interface, and another processing apparatus,
the machine learning operation apparatus interacting with the other processing apparatus to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatus respectively, and configured to save data of the machine learning operation apparatus and of the other processing apparatus.
Clause E14. A board card, including: a storage device, an interface apparatus, a control device, and a machine learning chip, the machine learning chip including the machine learning operation apparatus according to Clause E12 or the combined processing apparatus according to Clause E13,
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor a state of the machine learning chip.
条款E15、一种神经网络计算指令生成方法,所述方法包括:Clause E15. A neural network computing instruction generation method, the method comprising:
根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备;According to the received neural network calculation macroinstruction, determine a running device that executes the neural network calculation macroinstruction;
根据所述神经网络计算宏指令和所述运行设备，生成运行指令，Generating a running instruction according to the neural network calculation macroinstruction and the running device,
其中,所述神经网络计算宏指令是指用于计算神经网络算法的宏指令,Wherein, the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm,
所述神经网络计算宏指令包含操作类型、输入地址和输出地址，所述运行指令包含所述操作类型、运行输入地址和运行输出地址，所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The neural network calculation macroinstruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address, respectively.
条款E16、根据条款E15所述的方法,根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备,包括:Clause E16. According to the method described in Clause E15, according to the received neural network calculation macroinstruction, determining a running device that executes the neural network calculation macroinstruction includes:
在确定所述神经网络计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述神经网络计算宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the neural network computing macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the neural network computing macro instruction, determining the designated device as the running device,
其中,所述执行条件包括:所述指定设备中包含与所述神经网络计算宏指令相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the neural network calculation macro instruction.
条款E17、根据条款E16所述的方法,所述方法还包括:Clause E17. The method according to Clause E16, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备,包括:Wherein, according to the received neural network calculation macro instruction, determining a running device for executing the neural network calculation macro instruction includes:
在确定所述神经网络计算宏指令中不包含所述指定设备的标识时，根据接收到的神经网络计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述神经网络计算宏指令的运行设备，When it is determined that the neural network calculation macroinstruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the neural network calculation macroinstruction according to the received neural network calculation macroinstruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款E18、根据条款E17所述的方法,根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备,包括:Clause E18. According to the method described in Clause E17, based on the received neural network calculation macroinstruction, determining a running device that executes the neural network calculation macroinstruction includes:
在确定所述神经网络计算宏指令包括所述指定设备的标识，且所述指定设备的资源不满足执行所述神经网络计算宏指令的执行条件时，根据所述神经网络计算宏指令和所述备选设备的资源信息，确定运行设备。When it is determined that the neural network calculation macroinstruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the neural network calculation macroinstruction, determining the running device according to the neural network calculation macroinstruction and the resource information of the candidate devices.
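Clauses E16 to E18 together describe a three-way device-selection rule: use the designated device if it is named and qualified, otherwise fall back to the candidate devices' resource information. The following is a minimal illustrative sketch, not the claimed implementation; all field names (`device_id`, `instruction_sets`, `required_set`) are assumptions made for the example.

```python
# Hypothetical sketch of the device-selection rule in Clauses E16-E18.
# All dictionary keys are illustrative assumptions, not the patented format.

def select_running_device(macro, candidates):
    """Pick the running device for a neural network calculation macroinstruction.

    macro:      dict with a "required_set" key and an optional "device_id"
                (the identifier of the designated device)
    candidates: list of dicts, each with "device_id" and "instruction_sets"
    """
    required = macro["required_set"]

    def satisfies(device):
        # Execution condition: the device contains the instruction set
        # corresponding to the macroinstruction.
        return required in device["instruction_sets"]

    designated_id = macro.get("device_id")
    if designated_id is not None:
        designated = next(
            (d for d in candidates if d["device_id"] == designated_id), None)
        if designated is not None and satisfies(designated):
            # E16: designated device is named and meets the execution condition.
            return designated

    # E17/E18: no usable designated device - determine the running device
    # from the candidate devices' resource information instead.
    for device in candidates:
        if satisfies(device):
            return device
    return None
```

A designated device that lacks the required instruction set is thus silently replaced by any qualifying candidate, mirroring the fallback described in Clause E18.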
条款E19、根据条款E16-条款E18任一项所述的方法,所述神经网络计算宏指令还包含输入量和输出量中的至少一项,Clause E19. The method according to any one of Clause E16-Clause E18, the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity,
根据所述神经网络计算宏指令和所述运行设备,生成运行指令,包括:According to the neural network calculation macro instruction and the operation device, generating an operation instruction includes:
确定所述神经网络计算宏指令的数据量，根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息，生成运行指令，Determining the data amount of the neural network calculation macroinstruction, and generating the running instruction according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction, and the resource information of the running device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款E20、根据条款E19所述的方法,根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause E20. According to the method described in Clause E19, according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device, generating the running instruction includes:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述神经网络计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述神经网络计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，When it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macroinstruction, splitting the neural network calculation macroinstruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
其中,所述运行设备的运行数据量是根据所述运行设备的资源信息确定的,每条运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein, the operation data amount of the operation device is determined according to the resource information of the operation device, each operation instruction further includes at least one of operation input amount and operation output amount, the operation input amount and the operation The amount of output is determined according to the amount of operating data.
条款E21、根据条款E19所述的方法,根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause E21. According to the method described in Clause E19, according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device, generating the running instruction includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述神经网络计算宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operating devices, the neural network calculation macro instruction is split according to the operating data amount of each operating device and the data amount to generate an operating instruction corresponding to each operating device,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
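The splitting described in Clauses E20 and E21 can be sketched as chunking the macroinstruction's data amount by each device's run data amount. The sketch below is illustrative only: it assumes addresses are plain integer offsets, the data amount is a flat element count, and input and output advance in lockstep, none of which the clauses mandate.

```python
# Illustrative sketch of splitting a macroinstruction by data amount
# (Clauses E20/E21). Address arithmetic and field names are assumptions.

def split_macro(op_type, in_addr, out_addr, total, chunk):
    """Split one macroinstruction into a list of running instructions.

    total: data amount of the neural network calculation macroinstruction
    chunk: run data amount the device can process per running instruction
    Each running instruction carries the operation type together with a
    running input address, running output address and running amount.
    """
    instructions = []
    offset = 0
    while offset < total:
        amount = min(chunk, total - offset)  # last piece may be smaller
        instructions.append({
            "op": op_type,
            "run_in_addr": in_addr + offset,
            "run_out_addr": out_addr + offset,
            "run_amount": amount,
        })
        offset += amount
    return instructions
```

For the single-device case (Clause E20) the resulting list is executed in sequence by one device; for the multi-device case (Clause E21) each piece would instead be sized by the corresponding device's own run data amount.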
条款E22、根据条款E15所述的方法,所述方法还包括:Clause E22. The method according to Clause E15, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款E23、根据条款E16所述的方法,所述方法还包括:Clause E23. The method according to Clause E16, the method further comprising:
接收待执行神经网络计算指令,根据确定的指定设备的标识和所述待执行神经网络计算指令生成所述神经网络计算宏指令。The neural network calculation instruction to be executed is received, and the neural network calculation macro instruction is generated according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
条款E24、根据条款E15所述的方法,所述方法还包括:Clause E24. The method according to Clause E15, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
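The dispatch path in Clause E24 (running instruction, then assembly file, then binary file sent to the running device) can be caricatured as a two-stage pipeline. The textual mnemonic format and the byte encoding below are invented purely for illustration; the patent does not specify either.

```python
# Toy dispatch pipeline for Clause E24: running instruction -> assembly
# text -> binary bytes. Mnemonics and encoding are invented assumptions.
import struct

OPCODES = {"CONV": 1, "POOL": 2}  # assumed opcode table

def to_assembly(instr):
    # One assembly line per running instruction: OP in_addr, out_addr, amount
    return "{op} {run_in_addr:#x}, {run_out_addr:#x}, {run_amount}".format(**instr)

def assemble(line):
    # "Translate the assembly file into a binary file": here one
    # fixed-width little-endian record per line (1 opcode byte + 3 x u32).
    op, rest = line.split(" ", 1)
    in_addr, out_addr, amount = (int(tok.strip(), 0) for tok in rest.split(","))
    return struct.pack("<BIII", OPCODES[op], in_addr, out_addr, amount)
```

The binary record would then be what is "sent to the running device"; the running device's decode step is the inverse `struct.unpack`.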
条款E25、根据条款E15所述的方法,Clause E25, according to the method described in Clause E15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述神经网络计算宏指令包括卷积计算宏指令、池化计算宏指令中的至少一种,The neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooled calculation macro instruction,
所述神经网络计算宏指令还包含以下选项中的至少一项：用于执行所述神经网络计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数，所述指令参数包括卷积核的大小、所述卷积核的步长和所述卷积核的填充中的至少一种。The neural network calculation macroinstruction further includes at least one of the following options: an identifier of a designated device for executing the neural network calculation macroinstruction, an input amount, an output amount, an operand, and instruction parameters, where the instruction parameters include at least one of the size of a convolution kernel, the stride of the convolution kernel, and the padding of the convolution kernel.
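The fields enumerated in Clause E25 can be grouped into a simple record. The sketch below is only one possible grouping; the field names, types, and defaults are assumptions, not the claimed instruction encoding.

```python
# Illustrative record for the macroinstruction fields named in Clause E25.
# Names, types and defaults are assumptions for the example.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ConvMacroInstruction:
    op_type: str                        # operation type, e.g. "CONV" or "POOL"
    input_addr: int                     # input address
    output_addr: int                    # output address
    device_id: Optional[str] = None     # identifier of the designated device
    input_amount: Optional[int] = None  # input amount (optional per E25)
    output_amount: Optional[int] = None # output amount (optional per E25)
    # Instruction parameters: kernel size, stride, padding (assumed layout).
    kernel_size: Tuple[int, int] = (3, 3)
    stride: Tuple[int, int] = (1, 1)
    padding: Tuple[int, int] = (0, 0)
```

Only `op_type`, `input_addr` and `output_addr` are mandatory here, matching the clause's statement that the remaining fields are optional "at least one of" items.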
条款F1、一种神经网络计算指令处理系统,所述系统包括指令生成设备和运行设备,Clause F1, a neural network computing instruction processing system, the system includes an instruction generating device and an operating device,
所述指令生成设备,包括:The instruction generating device includes:
设备确定模块,用于根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备;A device determination module, configured to determine a running device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction;
指令生成模块,用于根据所述神经网络计算宏指令和所述运行设备,生成运行指令;An instruction generation module, configured to calculate a macro instruction and the operation device according to the neural network, and generate an operation instruction;
所述运行设备,包括:The operation equipment includes:
控制模块，用于获取所需数据、神经网络模型以及所述运行指令，对所述运行指令进行解析，获得多个解析指令；A control module, configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsing instructions;
执行模块,用于根据所述数据执行所述多个解析指令,得到执行结果,An execution module, configured to execute the multiple parsing instructions based on the data to obtain an execution result,
其中,所述神经网络计算宏指令是指用于计算神经网络算法的宏指令,Wherein, the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm,
所述神经网络计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The neural network calculation macro instruction includes the operation type, input address and output address, and the operation instruction includes the operation type, operation input address and operation output address, and the operation input address and the operation output address are respectively The input address and the output address are determined.
条款F2、根据条款F1所述的系统,所述指令生成设备还包括:Clause F2. The system according to Clause F1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块，用于在确定所述神经网络计算宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述神经网络计算宏指令的执行条件时，将所述指定设备确定为所述运行设备；A first determination sub-module, configured to determine the designated device as the running device when it is determined that the neural network calculation macroinstruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the neural network calculation macroinstruction;
第二确定子模块，用于在确定所述神经网络计算宏指令中不包含所述指定设备的标识时，根据接收到的神经网络计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述神经网络计算宏指令的运行设备；A second determination sub-module, configured to, when it is determined that the neural network calculation macroinstruction does not include the identifier of the designated device, determine, from the candidate devices, a running device for executing the neural network calculation macroinstruction according to the received neural network calculation macroinstruction and the resource information of the candidate devices;
第三确定子模块，在确定所述神经网络计算宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述神经网络计算宏指令的执行条件时，根据所述神经网络计算宏指令和所述备选设备的资源信息，确定运行设备，A third determination sub-module, configured to determine the running device according to the neural network calculation macroinstruction and the resource information of the candidate devices when it is determined that the neural network calculation macroinstruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the neural network calculation macroinstruction,
其中,所述执行条件包括:所述指定设备中包含与所述神经网络计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the neural network calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
条款F3、根据条款F2所述的系统,所述神经网络计算宏指令还包含输入量和输出量中的至少一项,Clause F3. The system according to Clause F2, wherein the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity,
所述指令生成模块，还用于确定所述神经网络计算宏指令的数据量，根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息，生成运行指令，The instruction generation module is further configured to determine the data amount of the neural network calculation macroinstruction, and generate the running instruction according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction, and the resource information of the running device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款F4、根据条款F3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause F4. The system according to Clause F3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述神经网络计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述神经网络计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，A first instruction generation sub-module, configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macroinstruction, split the neural network calculation macroinstruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述神经网络计算宏指令进行拆分，生成对应于每个运行设备的运行指令，A second instruction generation sub-module, configured to, when it is determined that there are a plurality of running devices, split the neural network calculation macroinstruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款F5、根据条款F1所述的系统,所述指令生成设备还包括:Clause F5. The system according to Clause F1, the instruction generating device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款F6、根据条款F2所述的系统,所述指令生成设备还包括:Clause F6. The system according to Clause F2, the instruction generating device further includes:
宏指令生成模块,用于接收待执行神经网络计算指令,根据确定的指定设备的标识和所述待执行神经网络计算指令生成所述神经网络计算宏指令。The macro generation module is used to receive the neural network calculation instruction to be executed, and generate the neural network calculation macro instruction according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
条款F7、根据条款F1所述的系统,所述指令生成设备还包括:Clause F7. The system according to Clause F1, the instruction generation device further includes:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款F8、根据条款F1所述的系统,所述运行设备,还包括:Clause F8. The system according to Clause F1, the operation device, further comprising:
存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,A storage module, the storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款F9、根据条款F1所述的系统,所述控制模块,包括:Clause F9. The system according to Clause F1, the control module includes:
指令存储子模块,用于存储所述运行指令;An instruction storage sub-module for storing the operation instruction;
指令处理子模块,用于对所述运行指令进行解析,得到所述多个解析指令;An instruction processing sub-module, which is used to parse the running instruction to obtain the multiple parsing instructions;
存储队列子模块，用于存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。A storage queue sub-module, configured to store a running instruction queue, where the running instruction queue includes the running instruction and the plurality of parsing instructions, and the running instruction and the plurality of parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款F10、根据条款F9所述的系统,所述执行模块,包括:Clause F10. The system according to Clause F9, the execution module includes:
依赖关系处理子模块，用于在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，将所述第一解析指令缓存在所述指令存储子模块中，在所述第零解析指令执行完毕后，从所述指令存储子模块中提取所述第一解析指令发送至所述执行模块，A dependency processing sub-module, configured to, when it is determined that a first parsing instruction has an association with a zeroth parsing instruction preceding the first parsing instruction, cache the first parsing instruction in the instruction storage sub-module, and after the zeroth parsing instruction is executed, extract the first parsing instruction from the instruction storage sub-module and send it to the execution module,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
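The association test in Clause F10 reduces to an interval-overlap check between the storage address ranges of the two parsing instructions. A minimal sketch, assuming ranges are half-open `[start, end)` pairs of integer addresses (the clause itself does not fix a representation):

```python
# Interval-overlap test behind the dependency rule in Clause F10.
# Ranges are assumed half-open: [start, end).

def ranges_overlap(first, zeroth):
    """True if the first storage address range (data needed by the first
    parsing instruction) overlaps the zeroth storage address range, i.e.
    the first instruction must wait until the zeroth finishes."""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end
```

When this returns true, the dependency processing sub-module would buffer the first parsing instruction until the zeroth completes; otherwise the two may be issued without ordering.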
条款F11、根据条款F1所述的系统,Clause F11, the system according to Clause F1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述神经网络计算宏指令包括卷积计算宏指令、池化计算宏指令中的至少一种;The neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooled calculation macro instruction;
所述神经网络计算宏指令包含以下选项中的至少一项：用于执行所述神经网络计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数，所述指令参数包括卷积核的大小、所述卷积核的步长和所述卷积核的填充中的至少一种。The neural network calculation macroinstruction includes at least one of the following options: an identifier of a designated device for executing the neural network calculation macroinstruction, an input amount, an output amount, an operand, and instruction parameters, where the instruction parameters include at least one of the size of a convolution kernel, the stride of the convolution kernel, and the padding of the convolution kernel.
条款F12、一种机器学习运算装置,所述装置包括:Clause F12. A machine learning computing device, the device comprising:
一个或多个如条款F1-条款F11任一项所述的神经网络计算指令处理系统，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more neural network computing instruction processing systems according to any one of Clause F1 to Clause F11, configured to obtain data to be computed and control information from other processing devices, perform specified machine learning operations, and transfer execution results to other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述神经网络计算指令处理系统时,所述多个所述神经网络计算指令处理系统间可以通过特定的结构进行连接并传输数据;When the machine learning computing device includes multiple neural network computing instruction processing systems, the multiple neural network computing instruction processing systems may be connected and transmit data through a specific structure;
其中，多个所述神经网络计算指令处理系统通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述神经网络计算指令处理系统共享同一控制系统或拥有各自的控制系统；多个所述神经网络计算指令处理系统共享内存或者拥有各自的内存；多个所述神经网络计算指令处理系统的互联方式是任意互联拓扑。Wherein, the plurality of neural network computing instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of neural network computing instruction processing systems share the same control system or have their own control systems; the plurality of neural network computing instruction processing systems share memory or have their own memories; and the interconnection manner of the plurality of neural network computing instruction processing systems is any interconnection topology.
条款F13、一种组合处理装置,所述组合处理装置包括:Clause F13. A combined processing device, the combined processing device comprising:
如条款F12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing devices, general interconnection interfaces and other processing devices as described in clause F12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款F14、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款F12所述的机器学习运算装置或如条款F13所述的组合处理装置；Clause F14. A board card, the board card comprising: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause F12 or the combined processing device described in Clause F13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款F15、一种神经网络计算指令处理方法,所述方法应用于神经网络计算指令处理系统,所述系统包括指令生成设备和运行设备,所述方法包括:Clause F15. A neural network computing instruction processing method. The method is applied to a neural network computing instruction processing system. The system includes an instruction generating device and an operating device. The method includes:
通过所述指令生成设备根据接收到的神经网络计算宏指令，确定执行所述神经网络计算宏指令的运行设备，并根据所述神经网络计算宏指令和所述运行设备，生成运行指令；Determining, by the instruction generating device according to a received neural network calculation macroinstruction, a running device for executing the neural network calculation macroinstruction, and generating a running instruction according to the neural network calculation macroinstruction and the running device;
通过所述运行设备获取数据、神经网络模型以及运行指令，对所述运行指令进行解析，获得多个解析指令，并根据所述数据执行所述多个解析指令，得到执行结果，Acquiring, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsing instructions, and executing the plurality of parsing instructions according to the data to obtain an execution result,
其中,所述神经网络计算宏指令是指用于计算神经网络算法的宏指令,Wherein, the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm,
所述神经网络计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The neural network calculation macro instruction includes the operation type, input address and output address, and the operation instruction includes the operation type, operation input address and operation output address, and the operation input address and the operation output address are respectively The input address and the output address are determined.
条款F16、根据条款F15所述的方法,所述方法还包括:Clause F16. The method according to Clause F15, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备,包括以下至少一项:Wherein, the instruction generating device determines the operating device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction, including at least one of the following:
在确定所述神经网络计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述神经网络计算宏指令的执行条件时,将所述指定设备确定为所述运行设备;When it is determined that the neural network computing macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the neural network computing macro instruction, determine the designated device as the running device;
在确定所述神经网络计算宏指令中不包含所述指定设备的标识时，根据接收到的神经网络计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述神经网络计算宏指令的运行设备；When it is determined that the neural network calculation macroinstruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the neural network calculation macroinstruction according to the received neural network calculation macroinstruction and the resource information of the candidate devices;
在确定所述神经网络计算宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述神经网络计算宏指令的执行条件时，根据所述神经网络计算宏指令和所述备选设备的资源信息，确定运行设备，When it is determined that the neural network calculation macroinstruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the neural network calculation macroinstruction, determining the running device according to the neural network calculation macroinstruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述神经网络计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the neural network calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
条款F17、根据条款F16所述的方法,所述神经网络计算宏指令还包含输入量和输出量中的至少一项,Clause F17. According to the method of Clause F16, the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity,
根据所述神经网络计算宏指令和所述运行设备,生成运行指令,包括:According to the neural network calculation macro instruction and the operation device, generating an operation instruction includes:
确定所述神经网络计算宏指令的数据量,根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the neural network calculation macro instruction, generate an operation instruction according to the data amount of the neural network calculation macro instruction, the neural network calculation macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款F18、根据条款F17所述的方法,根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息,生成运行指令,包括以下至少一项:Clause F18. According to the method of Clause F17, according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device, generating the running instruction, including at least one of the following:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述神经网络计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述神经网络计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令；When it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macroinstruction, splitting the neural network calculation macroinstruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述神经网络计算宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operating devices, the neural network calculation macro instruction is split according to the operating data amount of each operating device and the data amount to generate an operating instruction corresponding to each operating device,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款F19、根据条款F15所述的方法,所述方法还包括:Clause F19. The method according to Clause F15, the method further comprising:
通过所述指令生成设备根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
Clause F20. The method according to Clause F16, further comprising:
receiving, by the instruction generating device, a neural network calculation instruction to be executed, and generating the neural network calculation macro instruction according to the determined identifier of the specified device and the neural network calculation instruction to be executed.
Clause F21. The method according to Clause F15, further comprising:
sending, by the instruction generating device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device by the instruction generating device, so that the running device executes the running instruction, includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
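The dispatch path of Clause F21 can be sketched in a few lines. This is an illustrative sketch only: the textual instruction format, the UTF-8 encoding used as the "translation", and all function names are assumptions for the example, not the claimed assembler or binary format.

```python
# Hypothetical sketch of Clause F21: serialize running instructions into an
# assembly-style text, translate the text into a binary file, and send the
# binary to the running device.
def to_assembly(running_instructions):
    # One instruction per line, as a stand-in for a real assembly file.
    return "\n".join(running_instructions)

def to_binary(assembly_text):
    # Stand-in for assembler translation into a binary file.
    return assembly_text.encode("utf-8")

def dispatch(running_instructions, send):
    send(to_binary(to_assembly(running_instructions)))

sent = []
dispatch(["CONV in=0x100 out=0x200", "POOL in=0x200 out=0x300"], sent.append)
```

The running device then decodes and executes the received binary; only the last step crosses the device boundary, so the generation and translation stages can run entirely on the instruction generating device.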
Clause F22. The method according to Clause F15, further comprising:
storing, by the running device, the data and scalar data in the data,
wherein the running device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a scratchpad cache (a high-speed temporary storage cache),
the cache is configured to store the data, and
the register is configured to store scalar data in the data.
Clause F23. The method according to Clause F15, further comprising:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions; and
storing, by the running device, a running instruction queue, where the running instruction queue includes the running instruction and the plurality of parsed instructions, and the running instruction and the plurality of parsed instructions in the running instruction queue are arranged in sequence according to the order in which they are to be executed.
Clause F24. The method according to Clause F23, further comprising:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction has an association relationship with a zeroth parsed instruction preceding the first parsed instruction, and executing the cached first parsed instruction after execution of the zeroth parsed instruction is completed,
wherein the first parsed instruction having an association relationship with the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction has an overlapping area with a zeroth storage address interval storing data required by the zeroth parsed instruction.
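The association test of Clause F24 reduces to an interval-overlap check. This is an illustrative sketch only: modeling the storage address intervals as half-open [start, end) ranges is an assumption of the example, not something the clause fixes.

```python
# Hypothetical sketch of Clause F24's dependency test: the first parsed
# instruction depends on the zeroth one when the storage address intervals
# of the data they require have an overlapping area. A dependent instruction
# is cached until the earlier one finishes executing.
def has_dependency(first_interval, zeroth_interval):
    """Each interval is a half-open (start, end) address range."""
    start1, end1 = first_interval
    start0, end0 = zeroth_interval
    # Two intervals overlap iff each one starts before the other ends.
    return start1 < end0 and start0 < end1
```

For instance, intervals [0x100, 0x200) and [0x180, 0x280) overlap, so the later instruction would be cached; [0x100, 0x200) and [0x200, 0x300) do not, so both may proceed.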
Clause F25. The method according to Clause F15, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is provided in a CPU and/or an NPU;
the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction; and
the neural network calculation macro instruction further includes at least one of the following options: an identifier of a specified device for executing the neural network calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of a size of a convolution kernel, a stride of the convolution kernel, and a padding of the convolution kernel.
Vector logic calculation instructions may be processed by the following vector logic calculation instruction generating apparatus and method, and by related products such as the vector logic calculation instruction processing system and method. The foregoing content can be better understood with reference to the following clauses:
Clause G1. A vector logic calculation instruction generating apparatus, comprising:
a device determining module configured to determine, according to a received vector logic calculation macro instruction, a running device for executing the vector logic calculation macro instruction; and
an instruction generating module configured to generate a running instruction according to the vector logic calculation macro instruction and the running device,
wherein the vector logic calculation macro instruction is a macro instruction for performing a logic operation on vectors, and
the vector logic calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address, respectively.
Clause G2. The apparatus according to Clause G1, wherein the device determining module includes:
a first determining submodule configured to determine the specified device as the running device when it is determined that the vector logic calculation macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the vector logic calculation macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction.
Clause G3. The apparatus according to Clause G2, further comprising:
a resource obtaining module configured to obtain resource information of candidate devices,
wherein the device determining module further includes:
a second determining submodule configured to determine, when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, a running device for executing the vector logic calculation macro instruction from among the candidate devices according to the received vector logic calculation macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
Clause G4. The apparatus according to Clause G3, wherein the device determining module further includes:
a third determining submodule configured to determine the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction.
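The three determining submodules of Clauses G2 to G4 together implement one selection rule, which can be sketched as a single function. This is an illustrative sketch only: reducing a device's resource information to the instruction set it supports, and all names and data shapes, are assumptions of the example.

```python
# Hypothetical sketch of Clauses G2-G4: prefer the specified device when it
# satisfies the execution condition (its instruction set contains the
# corresponding instruction); otherwise fall back to the first candidate
# device whose instruction set does.
def determine_running_device(macro, candidate_isas):
    """macro: dict with 'op' and optional 'device'; candidate_isas: device id -> instruction set."""
    op = macro["op"]
    specified = macro.get("device")
    # Clause G2: the specified device satisfies the execution condition.
    if specified is not None and op in candidate_isas.get(specified, set()):
        return specified
    # Clauses G3/G4: no identifier, or the specified device does not
    # satisfy the condition - choose from the candidate devices instead.
    for device_id, isa in candidate_isas.items():
        if op in isa:
            return device_id
    return None
```

With candidates {"NPU0": {"VAND", "VOR"}, "GPU0": {"VAND"}}, a vector-OR macro that names GPU0 is redirected to NPU0, because GPU0's instruction set lacks the corresponding instruction.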
Clause G5. The apparatus according to any one of Clauses G2 to G4, wherein the vector logic calculation macro instruction further includes at least one of an input amount and an output amount,
the instruction generating module is further configured to determine a data amount of the vector logic calculation macro instruction, and generate the running instruction according to the data amount, the vector logic calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause G6. The apparatus according to Clause G5, wherein the instruction generating module includes:
a first instruction generating submodule configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, split the vector logic calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
Clause G7. The apparatus according to Clause G5, wherein the instruction generating module includes:
a second instruction generating submodule configured to, when it is determined that there are a plurality of running devices, split the vector logic calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of each running device, the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause G8. The apparatus according to Clause G1, further comprising:
a queue constructing module configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
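The queue constructing module of Clause G8 can be sketched as a sort followed by a per-device grouping. This is an illustrative sketch only: the claims do not fix a particular queue sorting rule, so the explicit priority key and the dictionary shapes below are assumptions of the example.

```python
# Hypothetical sketch of Clause G8: sort running instructions by a queue
# sorting rule (modeled as a priority key) and build one instruction queue
# per running device from the sorted instructions.
from collections import defaultdict

def build_instruction_queues(running_instructions, priority):
    queues = defaultdict(list)
    for inst in sorted(running_instructions, key=priority):
        queues[inst["device"]].append(inst["op"])
    return dict(queues)

queues = build_instruction_queues(
    [
        {"device": "NPU0", "op": "VOR", "seq": 2},
        {"device": "NPU0", "op": "VAND", "seq": 1},
        {"device": "GPU0", "op": "VAND", "seq": 3},
    ],
    priority=lambda inst: inst["seq"],
)
```

Each running device then consumes only its own queue, with the instructions already in the order the sorting rule imposed.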
Clause G9. The apparatus according to Clause G2, further comprising:
a macro instruction generating module configured to receive a vector logic calculation instruction to be executed, and generate the vector logic calculation macro instruction according to the determined identifier of the specified device and the vector logic calculation instruction to be executed.
Clause G10. The apparatus according to Clause G1, further comprising:
an instruction dispatching module configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule configured to generate an assembly file according to the running instruction;
an assembly translating submodule configured to translate the assembly file into a binary file; and
an instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause G11. The apparatus according to Clause G1, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is provided in a CPU and/or an NPU;
the vector logic calculation macro instruction includes at least one of the following instructions: a vector-AND calculation macro instruction, a vector-OR calculation macro instruction, a vector-NOT calculation macro instruction, and a vector comparison calculation macro instruction; and
the vector logic calculation macro instruction further includes at least one of the following options: an identifier of a specified device for executing the vector logic calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of an address and a length of a second operand.
Clause G12. A machine learning operation apparatus, comprising:
one or more vector logic calculation instruction generating apparatuses according to any one of Clauses G1 to G11, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transfer the execution result to the other processing apparatuses through an I/O interface,
wherein, when the machine learning operation apparatus includes a plurality of the vector logic calculation instruction generating apparatuses, the plurality of vector logic calculation instruction generating apparatuses may be connected to, and transmit data between, one another through a specific structure,
and wherein the plurality of vector logic calculation instruction generating apparatuses are interconnected and transmit data through a PCIe (peripheral component interconnect express) bus to support larger-scale machine learning operations; the plurality of vector logic calculation instruction generating apparatuses share the same control system or have their own control systems; the plurality of vector logic calculation instruction generating apparatuses share memory or have their own memories; and the plurality of vector logic calculation instruction generating apparatuses may be interconnected in any interconnection topology.
Clause G13. A combined processing apparatus, comprising:
the machine learning operation apparatus according to Clause G12, a universal interconnection interface, and other processing apparatuses,
wherein the machine learning operation apparatus interacts with the other processing apparatuses to jointly complete a calculation operation specified by a user,
and wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatuses, respectively, and configured to save data of the machine learning operation apparatus and the other processing apparatuses.
Clause G14. A board card, comprising: a storage device, an interface apparatus, a control device, and a machine learning chip, where the machine learning chip includes the machine learning operation apparatus according to Clause G12 or the combined processing apparatus according to Clause G13,
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor the state of the machine learning chip.
Clause G15. A vector logic calculation instruction generating method, comprising:
determining, according to a received vector logic calculation macro instruction, a running device for executing the vector logic calculation macro instruction; and
generating a running instruction according to the vector logic calculation macro instruction and the running device,
wherein the vector logic calculation macro instruction is a macro instruction for performing a logic operation on vectors, and
the vector logic calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address, respectively.
Clause G16. The method according to Clause G15, wherein determining, according to the received vector logic calculation macro instruction, the running device for executing the vector logic calculation macro instruction includes:
when it is determined that the vector logic calculation macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the vector logic calculation macro instruction, determining the specified device as the running device,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction.
Clause G17. The method according to Clause G16, further comprising:
obtaining resource information of candidate devices,
wherein determining, according to the received vector logic calculation macro instruction, the running device for executing the vector logic calculation macro instruction includes:
when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determining, from among the candidate devices, a running device for executing the vector logic calculation macro instruction according to the received vector logic calculation macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
Clause G18. The method according to Clause G17, wherein determining, according to the received vector logic calculation macro instruction, the running device for executing the vector logic calculation macro instruction includes:
when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determining the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices.
Clause G19. The method according to any one of Clauses G16 to G18, wherein the vector logic calculation macro instruction further includes at least one of an input amount and an output amount,
generating the running instruction according to the vector logic calculation macro instruction and the running device includes:
determining a data amount of the vector logic calculation macro instruction, and generating the running instruction according to the data amount, the vector logic calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause G20. The method according to Clause G19, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes:
when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, splitting the vector logic calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
Clause G21. The method according to Clause G19, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes:
when it is determined that there are a plurality of running devices, splitting the vector logic calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of each running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause G22. The method according to Clause G15, further comprising:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause G23. The method according to Clause G16, further comprising:
receiving a vector logic calculation instruction to be executed, and generating the vector logic calculation macro instruction according to the determined identifier of the specified device and the vector logic calculation instruction to be executed.
Clause G24. The method according to Clause G15, further comprising:
sending the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device, so that the running device executes the running instruction, includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause G25. The method according to Clause G15, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU;
the vector logic calculation macro instruction includes at least one of the following instructions: a vector-AND calculation macro instruction, a vector-OR calculation macro instruction, a vector-NOT calculation macro instruction, and a vector comparison calculation macro instruction; and
the vector logic calculation macro instruction further includes at least one of the following options: an identifier of a specified device for executing the vector logic calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of an address and a length of a second operand.
Clause H1. A vector logic calculation instruction processing system, including an instruction generating device and a running device,
wherein the instruction generating device includes:
a device determining module configured to determine, according to a received vector logic calculation macro instruction, a running device for executing the vector logic calculation macro instruction; and
an instruction generating module configured to generate a running instruction according to the vector logic calculation macro instruction and the running device;
and the running device includes:
a control module configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsed instructions; and
an execution module configured to execute the plurality of parsed instructions according to the data to obtain an execution result,
wherein the vector logic calculation macro instruction is a macro instruction for performing a logic operation on vectors, and
the vector logic calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address, respectively.
Clause H2. The system according to Clause H1, wherein the instruction generating device further includes:
a resource obtaining module configured to obtain resource information of candidate devices,
and the device determining module includes at least one of the following submodules:
a first determining submodule configured to determine the specified device as the running device when it is determined that the vector logic calculation macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the vector logic calculation macro instruction;
a second determining submodule configured to determine, when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, a running device for executing the vector logic calculation macro instruction from among the candidate devices according to the received vector logic calculation macro instruction and the resource information of the candidate devices; and
a third determining submodule configured to determine the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause H3. The system according to Clause H2, wherein the vector logic calculation macro instruction further includes at least one of an input amount and an output amount,
the instruction generating module is further configured to determine a data amount of the vector logic calculation macro instruction, and generate the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
条款H4、根据条款H3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause H4. The system according to Clause H3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块,用于在确定所述运行设备为一个,且所述运行设备的运行数据量小于所述向量逻辑计算宏指令的数据量时,根据所述运行设备的运行数据量和所述数据量将所述向量逻辑计算宏指令拆分成多条运行指令,以使所述运行设备依次执行多条运行指令,The first instruction generating submodule is used for determining that the number of running devices is one, and the amount of running data of the running device is less than the amount of data of the vector logic calculation macro instruction, according to the amount of running data of the running device and The amount of data splits the vector logic calculation macro instruction into multiple running instructions, so that the running device sequentially executes multiple running instructions,
第二指令生成子模块,用于在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述向量逻辑计算宏指令进行拆分,生成对应于每个运行设备的运行指令,The second instruction generation sub-module is used to split the vector logic calculation macro instruction according to the amount of operation data of each operation device and the amount of data when it is determined that there are a plurality of operation devices, and generate corresponding to each Operating instructions for operating equipment,
其中,运行设备的运行数据量是根据运行设备的资源信息确定的,所述运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。The operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount. The operation input amount and the operation output amount are based on The amount of operation data of the operation device that executes the operation instruction is determined.
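The splitting behavior of Clause H4 can be illustrated with a short sketch. This is an illustrative model only, not the patented implementation; the `RunInstruction` fields and the proportional multi-device split are assumptions:

```python
from dataclasses import dataclass

@dataclass
class RunInstruction:
    op_type: str
    run_input_addr: int   # derived from the macro instruction's input address
    run_output_addr: int  # derived from the macro instruction's output address
    run_amount: int       # running input/output amount for this instruction

def split_macro_single_device(op_type, input_addr, output_addr,
                              total_amount, run_capacity):
    """One running device whose running data amount (run_capacity) is less
    than the macro instruction's data amount: emit run instructions that
    the device executes in sequence."""
    instructions, offset = [], 0
    while offset < total_amount:
        amount = min(run_capacity, total_amount - offset)
        instructions.append(RunInstruction(op_type,
                                           input_addr + offset,
                                           output_addr + offset,
                                           amount))
        offset += amount
    return instructions

def split_macro_multi_device(op_type, input_addr, output_addr,
                             total_amount, run_capacities):
    """Several running devices: give each device a slice sized from its
    running data amount (here, proportionally), one run instruction each."""
    total_capacity = sum(run_capacities)
    instructions, offset = [], 0
    for i, capacity in enumerate(run_capacities):
        if i == len(run_capacities) - 1:
            amount = total_amount - offset  # last device takes the remainder
        else:
            amount = total_amount * capacity // total_capacity
        instructions.append(RunInstruction(op_type,
                                           input_addr + offset,
                                           output_addr + offset,
                                           amount))
        offset += amount
    return instructions
```

For instance, a macro instruction over 10 elements on one device whose running data amount is 4 yields three run instructions with running amounts 4, 4, and 2, each with correspondingly shifted running input and output addresses.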
Clause H5. The system according to Clause H1, wherein the instruction generating device further includes:
a queue construction module, configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause H6. The system according to Clause H2, wherein the instruction generating device further includes:
a macro instruction generating module, configured to receive a vector logic calculation instruction to be executed, and generate the vector logic calculation macro instruction according to the determined identifier of the specified device and the vector logic calculation instruction to be executed.
Clause H7. The system according to Clause H1, wherein the instruction generating device further includes:
an instruction dispatching module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule, configured to generate an assembly file according to the running instruction;
an assembly translating submodule, configured to translate the assembly file into a binary file;
an instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
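The dispatch path of Clause H7 (run instruction, assembly file, binary file, running device) can be sketched as follows. The textual assembly syntax and the byte encoding here are placeholders; a real toolchain would emit device-specific opcodes:

```python
from dataclasses import dataclass

@dataclass
class RunInstruction:
    op_type: str
    run_input_addr: int
    run_output_addr: int
    run_amount: int

def build_assembly_file(run_instructions):
    """Instruction assembling submodule: one assembly line per run instruction."""
    return "\n".join(
        f"{inst.op_type} {inst.run_input_addr:#x}, "
        f"{inst.run_output_addr:#x}, {inst.run_amount}"
        for inst in run_instructions)

def translate_to_binary(assembly_file):
    """Assembly translating submodule: here just the bytes of the text; a
    real assembler would encode each mnemonic into the device's instruction set."""
    return assembly_file.encode("utf-8")

def dispatch(run_instructions, send):
    """Instruction sending submodule: deliver the binary file to the running
    device via a caller-supplied `send` callback."""
    send(translate_to_binary(build_assembly_file(run_instructions)))
```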
Clause H8. The system according to Clause H1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data in the data.
Clause H9. The system according to Clause H1, wherein the control module includes:
an instruction storage submodule, configured to store the running instruction;
an instruction processing submodule, configured to parse the running instruction to obtain the plurality of parsed instructions;
a storage queue submodule, configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, the running instruction and the plurality of parsed instructions in the running instruction queue being arranged in the order in which they are to be executed.
Clause H10. The system according to Clause H9, wherein the execution module includes:
a dependency processing submodule, configured to, when it is determined that a first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, cache the first parsed instruction in the instruction storage submodule, and, after execution of the zeroth parsed instruction is completed, fetch the first parsed instruction from the instruction storage submodule and send it to the execution module,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlaps a zeroth storage address interval storing data required by the zeroth parsed instruction.
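The association rule of Clause H10 is an address-interval overlap test, and the dependency processing submodule amounts to holding back a conflicting instruction until its predecessor retires. A minimal sketch, assuming end-exclusive `(start, end)` intervals (the clauses do not fix the bound convention):

```python
def intervals_overlap(first, zeroth):
    """Clause H10's association test: do the two required-data storage
    address intervals share a region?"""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end

class DependencyBuffer:
    """Minimal model of the dependency processing submodule: a parsed
    instruction whose data interval overlaps an in-flight one is cached
    and re-issued only after the earlier instruction completes."""
    def __init__(self):
        self.in_flight = []  # (instruction_id, interval) pairs
        self.cached = []

    def issue(self, instruction_id, interval):
        if any(intervals_overlap(interval, busy) for _, busy in self.in_flight):
            # held back, like caching in the instruction storage submodule
            self.cached.append((instruction_id, interval))
            return False
        self.in_flight.append((instruction_id, interval))
        return True

    def complete(self, instruction_id):
        self.in_flight = [(i, iv) for i, iv in self.in_flight
                          if i != instruction_id]
        pending, self.cached = self.cached, []
        for i, iv in pending:  # re-issue; remaining conflicts are re-cached
            self.issue(i, iv)
```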
Clause H11. The system according to Clause H1, wherein
the running device is one or any combination of a CPU, a GPU, and an NPU;
the instruction generating device is arranged in the CPU and/or the NPU;
the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
the vector logic calculation macro instruction further contains at least one of the following options: an identifier of the specified device for executing the vector logic calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, the instruction parameters including at least one of an address and a length of a second operand.
Clause H12. A machine learning operation apparatus, the apparatus including:
one or more vector logic calculation instruction processing systems according to any one of Clause H1 to Clause H11, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transfer the execution result to the other processing apparatuses through an I/O interface;
when the machine learning operation apparatus contains a plurality of the vector logic calculation instruction processing systems, the plurality of vector logic calculation instruction processing systems may be connected through a specific structure to transfer data;
wherein the plurality of vector logic calculation instruction processing systems are interconnected through a PCIe (peripheral component interconnect express) bus and transfer data over it, so as to support larger-scale machine learning operations; the plurality of vector logic calculation instruction processing systems share one control system or have their own control systems; the plurality of vector logic calculation instruction processing systems share a memory or have their own memories; and the plurality of vector logic calculation instruction processing systems may be interconnected in any interconnection topology.
Clause H13. A combined processing apparatus, the combined processing apparatus including:
the machine learning operation apparatus according to Clause H12, a universal interconnection interface, and other processing apparatuses;
the machine learning operation apparatus interacting with the other processing apparatuses to jointly complete a computing operation specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatuses respectively, and configured to store data of the machine learning operation apparatus and the other processing apparatuses.
Clause H14. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, the machine learning chip including the machine learning operation apparatus according to Clause H12 or the combined processing apparatus according to Clause H13;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transfer between the machine learning chip and an external device;
the control device is configured to monitor a state of the machine learning chip.
Clause H15. A vector logic calculation instruction processing method, the method being applied to a vector logic calculation instruction processing system, the system including an instruction generating device and a running device, the method including:
determining, by the instruction generating device according to a received vector logic calculation macro instruction, a running device for executing the vector logic calculation macro instruction, and generating a running instruction according to the vector logic calculation macro instruction and the running device;
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions according to the data to obtain an execution result,
wherein the vector logic calculation macro instruction refers to a macro instruction for performing a logic operation on vectors,
the vector logic calculation macro instruction contains an operation type, an input address, and an output address; the running instruction contains the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address respectively.
Clause H16. The method according to Clause H15, the method further including:
obtaining, by the instruction generating device, resource information of candidate devices,
wherein determining, by the instruction generating device according to the received vector logic calculation macro instruction, the running device for executing the vector logic calculation macro instruction includes at least one of the following:
when it is determined that the vector logic calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the vector logic calculation macro instruction, determining the specified device as the running device;
when it is determined that the vector logic calculation macro instruction does not contain the identifier of the specified device, determining, from the candidate devices, the running device for executing the vector logic calculation macro instruction according to the received vector logic calculation macro instruction and the resource information of the candidate devices;
when it is determined that the vector logic calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determining the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction; and the resource information includes the instruction set contained in the candidate devices.
Clause H17. The method according to Clause H16, wherein the vector logic calculation macro instruction further contains at least one of an input amount and an output amount,
and generating the running instruction according to the vector logic calculation macro instruction and the running device includes:
determining the data amount of the vector logic calculation macro instruction, and generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause H18. The method according to Clause H17, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes at least one of the following:
when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, splitting the vector logic calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
when it is determined that there are a plurality of running devices, splitting the vector logic calculation macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of the running device; the running instruction further contains at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount of the running device that executes the running instruction.
Clause H19. The method according to Clause H15, the method further including:
sorting, by the instruction generating device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause H20. The method according to Clause H16, the method further including:
receiving, by the instruction generating device, a vector logic calculation instruction to be executed, and generating the vector logic calculation macro instruction according to the determined identifier of the specified device and the vector logic calculation instruction to be executed.
Clause H21. The method according to Clause H15, the method further including:
sending, by the instruction generating device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending, by the instruction generating device, the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause H22. The method according to Clause H15, the method further including:
storing, by the running device, the data and scalar data in the data,
wherein the running device includes a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data in the data.
Clause H23. The method according to Clause H15, the method further including:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions;
storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, the running instruction and the plurality of parsed instructions in the running instruction queue being arranged in the order in which they are to be executed.
Clause H24. The method according to Clause H23, the method further including:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, and executing the cached first parsed instruction after execution of the zeroth parsed instruction is completed,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlaps a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause H25. The method according to Clause H15, wherein
the running device is one or any combination of a CPU, a GPU, and an NPU;
the instruction generating device is arranged in the CPU and/or the NPU;
the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
the vector logic calculation macro instruction further includes at least one of the following options: an identifier of the specified device for executing the vector logic calculation macro instruction, an input amount, an output amount, operands, and instruction parameters.
Matrix vector calculation instructions can be processed by the matrix vector calculation instruction generating apparatus and method, the matrix vector calculation instruction processing system and method, and other related products described below. The foregoing can be better understood according to the following clauses:
Clause I1. A matrix vector calculation instruction generating apparatus, the apparatus including:
a device determination module, configured to determine, according to a received matrix vector calculation macro instruction, a running device for executing the matrix vector calculation macro instruction;
an instruction generation module, configured to generate a running instruction according to the matrix vector calculation macro instruction and the running device,
wherein the matrix vector calculation macro instruction refers to a macro instruction for performing calculations on matrices and vectors,
the matrix vector calculation macro instruction contains an operation type, an input address, and an output address; the running instruction contains the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address respectively.
Clause I2. The apparatus according to Clause I1, wherein the device determination module includes:
a first determining submodule, configured to, when it is determined that the matrix vector calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the matrix vector calculation macro instruction, determine the specified device as the running device,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the matrix vector calculation macro instruction.
Clause I3. The apparatus according to Clause I2, the apparatus further including:
a resource obtaining module, configured to obtain resource information of candidate devices,
the device determination module further including:
a second determining submodule, configured to, when it is determined that the matrix vector calculation macro instruction does not contain the identifier of the specified device, determine, from the candidate devices, the running device for executing the matrix vector calculation macro instruction according to the received matrix vector calculation macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in the candidate devices.
Clause I4. The apparatus according to Clause I3, the device determination module further including:
a third determining submodule, configured to, when it is determined that the matrix vector calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the matrix vector calculation macro instruction, determine the running device according to the matrix vector calculation macro instruction and the resource information of the candidate devices.
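Taken together, the three determining submodules of Clauses I2 through I4 amount to a simple selection policy: honor the specified device when its resources satisfy the execution condition, otherwise fall back to any candidate whose instruction set matches. A sketch under assumed names (`device_id`, `instruction_sets`, and `required_isa` do not appear in the clauses themselves):

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass
class CandidateDevice:
    device_id: int
    instruction_sets: FrozenSet[str]  # resource information: supported instruction sets

@dataclass
class MacroInstruction:
    required_isa: str                 # instruction set the macro instruction needs
    device_id: Optional[int] = None   # identifier of the specified device, if any

def select_running_device(macro, candidates):
    by_id = {c.device_id: c for c in candidates}
    specified = by_id.get(macro.device_id)
    # First submodule: the specified device satisfies the execution condition.
    if specified is not None and macro.required_isa in specified.instruction_sets:
        return specified
    # Second submodule (no identifier) and third submodule (identifier present
    # but condition unmet): pick any candidate whose instruction set
    # corresponds to the macro instruction.
    for candidate in candidates:
        if macro.required_isa in candidate.instruction_sets:
            return candidate
    raise LookupError("no candidate device supports this macro instruction")
```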
Clause I5. The apparatus according to any one of Clause I2 to Clause I4, wherein the matrix vector calculation macro instruction further contains at least one of an input amount and an output amount,
the instruction generation module being further configured to determine the data amount of the matrix vector calculation macro instruction, and generate the running instruction according to the data amount, the matrix vector calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause I6. The apparatus according to Clause I5, wherein the instruction generation module includes:
a first instruction generating submodule, configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the matrix vector calculation macro instruction, split the matrix vector calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device; each running instruction further contains at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount.
Clause I7. The apparatus according to Clause I5, wherein the instruction generation module includes:
a second instruction generating submodule, configured to, when it is determined that there are a plurality of running devices, split the matrix vector calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of each running device; the running instruction further contains at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount of the running device that executes the running instruction.
Clause I8. The apparatus according to Clause I1, the apparatus further including:
a queue construction module, configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause I9. The apparatus according to Clause I2, the apparatus further including:
a macro instruction generating module, configured to receive a matrix vector calculation instruction to be executed, and generate the matrix vector calculation macro instruction according to the determined identifier of the specified device and the matrix vector calculation instruction to be executed.
Clause I10. The apparatus according to Clause I1, the apparatus further including:
an instruction dispatching module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule, configured to generate an assembly file according to the running instruction;
an assembly translating submodule, configured to translate the assembly file into a binary file;
an instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause I11. The apparatus according to Clause I1, wherein
the running device is one or any combination of a CPU, a GPU, and an NPU;
the apparatus is arranged in the CPU and/or the NPU;
the matrix vector calculation macro instruction includes at least one of the following instructions: a matrix-multiply-vector calculation macro instruction, a vector-multiply-matrix calculation macro instruction, a tensor calculation macro instruction, a matrix addition calculation macro instruction, and a matrix subtraction calculation macro instruction;
the matrix vector calculation macro instruction further contains at least one of the following options: an identifier of the specified device for executing the matrix vector calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, the instruction parameters including at least one of an address and a length of a second operand.
条款I12、一种机器学习运算装置,所述装置包括:Clause I12. A machine learning computing device, the device comprising:
一个或多个如条款I1-条款I11任一项所述的矩阵向量计算指令生成装置,用于从其他处理装置中获取待运算数据和控制信息,并执行指定的机器学习运算,将执行结果通过I/O接口传递给其他处理装置;One or more matrix vector calculation instruction generating devices according to any one of Clause I1 to Clause I11, configured to obtain data to be operated on and control information from other processing devices, perform a designated machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述矩阵向量计算指令生成装置时,所述多个所述矩阵向量计算指令生成装置间可以通过特定的结构进行连接并传输数据;When the machine learning operation device includes a plurality of matrix vector calculation instruction generating devices, the plurality of matrix vector calculation instruction generating devices may be connected and transmit data through a specific structure;
其中,多个所述矩阵向量计算指令生成装置通过快速外部设备互连总线进行互联并传输数据,以支持更大规模的机器学习的运算;多个所述矩阵向量计算指令生成装置共享同一控制系统或拥有各自的控制系统;多个所述矩阵向量计算指令生成装置共享内存或者拥有各自的内存;多个所述矩阵向量计算指令生成装置的互联方式是任意互联拓扑。Wherein, the plurality of matrix vector calculation instruction generating devices are interconnected and transmit data through a fast peripheral interconnect bus to support larger-scale machine learning operations; the plurality of matrix vector calculation instruction generating devices share the same control system or have their own control systems; the plurality of matrix vector calculation instruction generating devices share memory or have their own memories; and the interconnection mode of the plurality of matrix vector calculation instruction generating devices is an arbitrary interconnection topology.
条款I13、一种组合处理装置,所述装置包括:Clause I13. A combined processing device, the device comprising:
如条款I12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing devices, general interconnection interfaces and other processing devices as described in clause I12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款I14、一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及机器学习芯片,所述机器学习芯片包括条款I12所述的机器学习运算装置或如条款I13所述的组合处理装置;Clause I14. A board card, the board card comprising: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device according to Clause I12 or the combined processing device according to Clause I13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款I15、一种矩阵向量计算指令生成方法,所述方法包括:Clause I15. A method for generating a matrix vector calculation instruction, the method comprising:
根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备;Determining, according to a received matrix vector calculation macro instruction, a running device for executing the matrix vector calculation macro instruction;
根据所述矩阵向量计算宏指令和所述运行设备,生成运行指令,Generating a running instruction according to the matrix vector calculation macro instruction and the running device,
其中,所述矩阵向量计算宏指令是指用于对矩阵和向量进行计算的宏指令,The matrix vector calculation macro instruction refers to a macro instruction used to calculate matrix and vector,
所述矩阵向量计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The matrix vector calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address, respectively.
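The relationship in Clause I15 between a macro instruction (operation type, input address, output address) and the run instruction derived from it can be sketched as below. This is an illustrative data-structure sketch only; the class and field names, and the simple base-offset address mapping, are assumptions standing in for whatever device-specific mapping an implementation would use.

```python
# Hypothetical sketch of Clause I15: a matrix vector calculation macro
# instruction carries an operation type plus input/output addresses, and the
# generated run instruction carries run input/output addresses derived from them.
from dataclasses import dataclass

@dataclass
class MacroInstruction:
    op_type: str        # e.g. "MMV" for matrix-multiply-vector (illustrative)
    input_addr: int
    output_addr: int

@dataclass
class RunInstruction:
    op_type: str
    run_input_addr: int
    run_output_addr: int

def generate_run_instruction(macro: MacroInstruction, base: int = 0) -> RunInstruction:
    """Derive the run input/output addresses from the macro instruction's
    addresses; a simple base offset stands in for device-specific mapping."""
    return RunInstruction(macro.op_type,
                          macro.input_addr + base,
                          macro.output_addr + base)

ri = generate_run_instruction(MacroInstruction("MMV", 0x1000, 0x2000), base=0x100)
```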
条款I16、根据条款I15所述的方法,根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,包括:Clause I16. According to the method described in Clause I15, according to the received matrix vector calculation macroinstruction, determining a running device that executes the matrix vector calculation macroinstruction includes:
在确定所述矩阵向量计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述矩阵向量计算宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the matrix vector calculation macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the matrix vector calculation macro instruction, determining the designated device as the running device,
其中,所述执行条件包括:所述指定设备中包含与所述矩阵向量计算宏指令相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the matrix vector calculation macro instruction.
条款I17、根据条款I16所述的方法,所述方法还包括:Clause I17. The method according to Clause I16, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,包括:Wherein, according to the received matrix vector calculation macro instruction, determining a running device that executes the matrix vector calculation macro instruction includes:
在确定所述矩阵向量计算宏指令中不包含所述指定设备的标识时,根据接收到的矩阵向量计算宏指令和所述备选设备的资源信息,从所述备选设备中确定出用于执行所述矩阵向量计算宏指令的运行设备,When it is determined that the matrix vector calculation macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the matrix vector calculation macro instruction according to the received matrix vector calculation macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款I18、根据条款I17所述的方法,根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,包括:Clause I18. According to the method described in Clause I17, determining a running device that executes the matrix vector calculation macroinstruction according to the received matrix vector calculation macroinstruction includes:
在确定所述矩阵向量计算宏指令中包含所述指定设备的标识,且所述指定设备的资源不满足执行所述矩阵向量计算宏指令的执行条件时,根据所述矩阵向量计算宏指令和所述备选设备的资源信息,确定运行设备。When it is determined that the matrix vector calculation macro instruction includes the identifier of the designated device, and the resources of the designated device do not satisfy the execution conditions for executing the matrix vector calculation macro instruction, determining the running device according to the matrix vector calculation macro instruction and the resource information of the candidate devices.
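The device-selection logic of Clauses I16 to I18 (prefer a designated device whose instruction set satisfies the execution condition, otherwise fall back to the candidate devices' resource information) can be sketched as below. This is a minimal sketch under stated assumptions: the dict-based representation of macro instructions and resource information, and every name in it, are illustrative rather than claimed.

```python
# Hypothetical sketch of Clauses I16-I18: choosing the running device.
# The execution condition here is that the device's instruction set ("isa")
# contains an instruction corresponding to the macro instruction's type.

def select_running_device(macro, devices):
    """macro: dict with operation "type" and an optional designated "device_id";
    devices: device id -> resource info (here just the supported instruction set)."""
    designated = macro.get("device_id")
    if designated is not None:
        info = devices.get(designated)
        # Clause I16: designated device named and execution condition satisfied.
        if info and macro["type"] in info["isa"]:
            return designated
    # Clauses I17/I18: no designated device, or its resources do not satisfy
    # the execution condition -> choose from candidates by resource information.
    for dev_id, info in devices.items():
        if macro["type"] in info["isa"]:
            return dev_id
    return None

devices = {"cpu0": {"isa": {"ADD"}}, "npu0": {"isa": {"MMV", "ADD"}}}
```

For example, a macro instruction designating `cpu0` but requiring `MMV` falls back to `npu0`, since `cpu0` lacks the corresponding instruction set.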
条款I19、根据条款I16-条款I18任一项所述的方法,所述矩阵向量计算宏指令还包含输入量和输出量中的至少一项,Clause I19. The method according to any one of Clause I16-Clause I18, the matrix vector calculation macro instruction further includes at least one of an input quantity and an output quantity,
根据所述矩阵向量计算宏指令和所述运行设备,生成运行指令,包括:Generating the running instruction according to the matrix vector calculation macro instruction and the running device includes:
确定所述矩阵向量计算宏指令的数据量,根据所述数据量、所述矩阵向量计算宏指令和所述运行设备的资源信息,生成运行指令,Determining the data amount of the matrix vector calculation macro instruction, and generating the running instruction according to the data amount, the matrix vector calculation macro instruction, and the resource information of the running device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款I20、根据条款I19所述的方法,根据所述矩阵向量计算宏指令的数据量、所述矩阵向量计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause I20. The method according to Clause I19, where generating the running instruction according to the data amount of the matrix vector calculation macro instruction, the matrix vector calculation macro instruction, and the resource information of the running device includes:
在确定所述运行设备为一个,且所述运行设备的运行数据量小于所述矩阵向量计算宏指令的数据量时,根据所述运行设备的运行数据量和所述数据量将所述矩阵向量计算宏指令拆分成多条运行指令,以使所述运行设备依次执行多条运行指令,When it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the matrix vector calculation macro instruction, splitting the matrix vector calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
其中,所述运行设备的运行数据量是根据所述运行设备的资源信息确定的,每条运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein, the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount.
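The single-device case of Clause I20 amounts to chunking: when the device's per-run data amount is smaller than the macro instruction's data amount, the macro instruction is cut into several run instructions executed in turn. The sketch below is illustrative only; contiguous addressing and the function name are assumptions.

```python
# Hypothetical sketch of Clause I20: split one macro instruction into several
# run instructions sized to the single running device's run data amount.

def split_for_one_device(total, capacity, input_addr, output_addr):
    """total: data amount of the macro instruction; capacity: the device's
    run data amount. Returns (run_input_addr, run_output_addr, run_amount)
    tuples covering the whole data amount in execution order."""
    chunks = []
    offset = 0
    while offset < total:
        amount = min(capacity, total - offset)  # last chunk may be smaller
        chunks.append((input_addr + offset, output_addr + offset, amount))
        offset += amount
    return chunks
```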
条款I21、根据条款I19所述的方法,根据所述矩阵向量计算宏指令的数据量、所述矩阵向量计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause I21. The method according to Clause I19, where generating the running instruction according to the data amount of the matrix vector calculation macro instruction, the matrix vector calculation macro instruction, and the resource information of the running device includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述矩阵向量计算宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of running devices, splitting the matrix vector calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
其中,每个运行设备的运行数据量是根据每个运行设备的资源信息确定的,所述运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of each running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
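The multi-device case of Clause I21 partitions the macro instruction's data amount across devices, each taking at most its own run data amount. As with the single-device sketch, the contiguous addressing and all names below are illustrative assumptions, not the claimed method itself.

```python
# Hypothetical sketch of Clause I21: split one macro instruction across
# several running devices, one run instruction per device, each sized by
# that device's own run data amount (from its resource information).

def split_across_devices(total, capacities, input_addr, output_addr):
    """capacities: device id -> run data amount.
    Returns device id -> (run_input_addr, run_output_addr, run_amount)."""
    plan = {}
    offset = 0
    for dev_id, cap in capacities.items():
        if offset >= total:
            break  # remaining devices get no run instruction
        amount = min(cap, total - offset)
        plan[dev_id] = (input_addr + offset, output_addr + offset, amount)
        offset += amount
    return plan
```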
条款I22、根据条款I15所述的方法,所述方法还包括:Clause I22. The method according to Clause I15, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款I23、根据条款I16所述的方法,所述方法还包括:Clause I23. The method according to Clause I16, the method further comprising:
接收待执行矩阵向量计算指令,根据确定的指定设备的标识和所述待执行矩阵向量计算指令生成所述矩阵向量计算宏指令。Receiving the matrix vector calculation instruction to be executed, and generating the matrix vector calculation macro instruction according to the determined identifier of the designated device and the matrix vector calculation instruction to be executed.
条款I24、根据条款I15所述的方法,所述方法还包括:Clause I24. The method according to Clause I15, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款I25、根据条款I15所述的方法,Clause I25, the method according to Clause I15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述矩阵向量计算宏指令包括以下指令中的至少一种:矩阵乘向量计算宏指令、向量乘矩阵计算宏指令、张量计算宏指令、矩阵相加计算宏指令、矩阵相减计算宏指令;The matrix vector calculation macro instruction includes at least one of the following instructions: matrix multiply vector calculation macro instruction, vector multiply matrix calculation macro instruction, tensor calculation macro instruction, matrix addition calculation macro instruction, and matrix subtraction calculation macro instruction;
所述矩阵向量计算宏指令还包含以下选项中的至少一项:用于执行所述矩阵向量计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数,所述指令参数包括第二个操作数的地址、长度中的至少一种。The matrix vector calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the matrix vector calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of the address and the length of the second operand.
条款J1、一种矩阵向量计算指令处理系统,所述系统包括指令生成设备和运行设备,Clause J1, a matrix vector calculation instruction processing system, the system includes an instruction generation device and an operation device,
所述指令生成设备,包括:The instruction generating device includes:
设备确定模块,用于根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备;A device determination module, configured to determine, according to a received matrix vector calculation macro instruction, a running device for executing the matrix vector calculation macro instruction;
指令生成模块,用于根据所述矩阵向量计算宏指令和所述运行设备,生成运行指令;An instruction generation module, configured to generate a running instruction according to the matrix vector calculation macro instruction and the running device;
所述运行设备,包括:The operation equipment includes:
控制模块,用于获取所需数据、神经网络模型以及所述运行指令,对所述运行指令进行解析,获得多个解析指令;A control module, configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsing instructions;
执行模块,用于根据所述数据执行所述多个解析指令,得到执行结果,An execution module, configured to execute the plurality of parsing instructions according to the data to obtain an execution result,
其中,所述矩阵向量计算宏指令是指用于对矩阵和向量进行计算的宏指令,The matrix vector calculation macro instruction refers to a macro instruction used to calculate matrix and vector,
所述矩阵向量计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The matrix vector calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address, respectively.
条款J2、根据条款J1所述的系统,所述指令生成设备还包括:Clause J2. The system according to Clause J1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块,用于在确定所述矩阵向量计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述矩阵向量计算宏指令的执行条件时,将所述指定设备确定为所述运行设备;A first determination submodule, configured to determine the designated device as the running device when it is determined that the matrix vector calculation macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution conditions for executing the matrix vector calculation macro instruction;
第二确定子模块,用于在确定所述矩阵向量计算宏指令中不包含所述指定设备的标识时,根据接收到的矩阵向量计算宏指令和所述备选设备的资源信息,从所述备选设备中确定出用于执行所述矩阵向量计算宏指令的运行设备;A second determination submodule, configured to, when it is determined that the matrix vector calculation macro instruction does not include the identifier of the designated device, determine, from the candidate devices, a running device for executing the matrix vector calculation macro instruction according to the received matrix vector calculation macro instruction and the resource information of the candidate devices;
第三确定子模块,在确定所述矩阵向量计算宏指令中包含所述指定设备的标识,且所述指定设备的资源不满足执行所述矩阵向量计算宏指令的执行条件时,根据所述矩阵向量计算宏指令和所述备选设备的资源信息,确定运行设备,A third determination submodule, configured to, when it is determined that the matrix vector calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution conditions for executing the matrix vector calculation macro instruction, determine the running device according to the matrix vector calculation macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述矩阵向量计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the matrix vector calculation macro instruction, and the resource information includes the instruction set included in the candidate device.
条款J3、根据条款J2所述的系统,所述矩阵向量计算宏指令还包含输入量和输出量中的至少一项,Clause J3. The system according to Clause J2, the matrix vector calculation macroinstruction further includes at least one of input and output,
所述指令生成模块,还用于确定所述矩阵向量计算宏指令的数据量,根据所述矩阵向量计算宏指令的数据量、所述矩阵向量计算宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is further configured to determine the data amount of the matrix vector calculation macro instruction, and generate the running instruction according to the data amount of the matrix vector calculation macro instruction, the matrix vector calculation macro instruction, and the resource information of the running device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款J4、根据条款J3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause J4. The system according to Clause J3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块,用于在确定所述运行设备为一个,且所述运行设备的运行数据量小于所述矩阵向量计算宏指令的数据量时,根据所述运行设备的运行数据量和所述数据量将所述矩阵向量计算宏指令拆分成多条运行指令,以使所述运行设备依次执行多条运行指令,A first instruction generation submodule, configured to, when it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the matrix vector calculation macro instruction, split the matrix vector calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
第二指令生成子模块,用于在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述矩阵向量计算宏指令进行拆分,生成对应于每个运行设备的运行指令,A second instruction generation submodule, configured to, when it is determined that there are a plurality of running devices, split the matrix vector calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
其中,运行设备的运行数据量是根据运行设备的资源信息确定的,所述运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of the running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款J5、根据条款J1所述的系统,所述指令生成设备还包括:Clause J5. The system according to Clause J1, the instruction generating device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款J6、根据条款J2所述的系统,所述指令生成设备还包括:Clause J6. The system according to Clause J2, the instruction generating device further includes:
宏指令生成模块,用于接收待执行矩阵向量计算指令,根据确定的指定设备的标识和所述待执行矩阵向量计算指令生成所述矩阵向量计算宏指令。The macro instruction generation module is configured to receive a matrix vector calculation instruction to be executed, and generate the matrix vector calculation macro instruction according to the determined identifier of the designated device and the matrix vector calculation instruction to be executed.
条款J7、根据条款J1所述的系统,所述指令生成设备还包括:Clause J7. The system according to Clause J1, the instruction generating device further includes:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款J8、根据条款J1所述的系统,所述运行设备,还包括:Clause J8. The system according to Clause J1, the operation device, further comprising:
存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,A storage module, the storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款J9、根据条款J1所述的系统,所述控制模块,包括:Clause J9. The system according to Clause J1, the control module includes:
指令存储子模块,用于存储所述运行指令;An instruction storage sub-module for storing the operation instruction;
指令处理子模块,用于对所述运行指令进行解析,得到所述多个解析指令;An instruction processing sub-module, which is used to parse the running instruction to obtain the multiple parsing instructions;
存储队列子模块,用于存储运行指令队列,所述运行指令队列包括所述运行指令和所述多个解析指令,所述运行指令队列中所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。A storage queue submodule, configured to store a running instruction queue, where the running instruction queue includes the running instruction and the plurality of parsing instructions, and in the running instruction queue the running instruction and the plurality of parsing instructions are arranged in the order in which they are to be executed.
条款J10、根据条款J9所述的系统,所述执行模块,包括:Clause J10. The system according to Clause J9, the execution module includes:
依赖关系处理子模块,用于在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时,将所述第一解析指令缓存在所述指令存储子模块中,在所述第零解析指令执行完毕后,从所述指令存储子模块中提取所述第一解析指令发送至所述执行模块,A dependency processing submodule, configured to cache the first parsing instruction in the instruction storage submodule when it is determined that the first parsing instruction is associated with a zeroth parsing instruction preceding the first parsing instruction, and, after the zeroth parsing instruction has been executed, extract the first parsing instruction from the instruction storage submodule and send it to the execution module,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
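The dependency test of Clause J10 reduces to an interval-overlap check: the first parsing instruction is associated with the zeroth one when the storage address intervals of the data they need overlap. The sketch below is illustrative; the half-open `(start, end)` interval convention and the dict representation are assumptions.

```python
# Hypothetical sketch of the dependency test in Clause J10: two parsing
# instructions are associated when the storage address intervals of the
# data they need have an overlapping region.

def intervals_overlap(first, zeroth):
    """Each interval is (start, end) with start < end, treated as half-open."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

def has_dependency(first_ins, zeroth_ins):
    """If True, the first instruction must wait until the zeroth completes."""
    return intervals_overlap(first_ins["addr_range"], zeroth_ins["addr_range"])
```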
条款J11、根据条款J1所述的系统,Clause J11, the system according to Clause J1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述矩阵向量计算宏指令包括以下指令中的至少一种:矩阵乘向量计算宏指令、向量乘矩阵计算宏指令、张量计算宏指令、矩阵相加计算宏指令、矩阵相减计算宏指令;The matrix vector calculation macro instruction includes at least one of the following instructions: matrix multiply vector calculation macro instruction, vector multiply matrix calculation macro instruction, tensor calculation macro instruction, matrix addition calculation macro instruction, and matrix subtraction calculation macro instruction;
所述矩阵向量计算宏指令还包含以下选项中的至少一项:用于执行所述矩阵向量计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数,所述指令参数包括第二个操作数的地址、长度中的至少一种。The matrix vector calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the matrix vector calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of the address and the length of the second operand.
条款J12、一种机器学习运算装置,所述装置包括:Clause J12. A machine learning computing device, the device comprising:
一个或多个如条款J1-条款J11任一项所述的矩阵向量计算指令处理系统,用于从其他处理装置中获取待运算数据和控制信息,并执行指定的机器学习运算,将执行结果通过I/O接口传递给其他处理装置;One or more matrix vector calculation instruction processing systems according to any one of Clause J1 to Clause J11, configured to obtain data to be operated on and control information from other processing devices, perform a designated machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述矩阵向量计算指令处理系统时,所述多个所述矩阵向量计算指令处理系统间可以通过特定的结构进行连接并传输数据;When the machine learning computing device includes a plurality of the matrix vector calculation instruction processing systems, the plurality of matrix vector calculation instruction processing systems may be connected and transmit data through a specific structure;
其中,多个所述矩阵向量计算指令处理系统通过快速外部设备互连总线进行互联并传输数据,以支持更大规模的机器学习的运算;多个所述矩阵向量计算指令处理系统共享同一控制系统或拥有各自的控制系统;多个所述矩阵向量计算指令处理系统共享内存或者拥有各自的内存;多个所述矩阵向量计算指令处理系统的互联方式是任意互联拓扑。Wherein, the plurality of matrix vector calculation instruction processing systems are interconnected and transmit data through a fast peripheral interconnect bus to support larger-scale machine learning operations; the plurality of matrix vector calculation instruction processing systems share the same control system or have their own control systems; the plurality of matrix vector calculation instruction processing systems share memory or have their own memories; and the interconnection mode of the plurality of matrix vector calculation instruction processing systems is an arbitrary interconnection topology.
条款J13、一种组合处理装置,所述组合处理装置包括:Clause J13. A combined processing device, the combined processing device comprising:
如条款J12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing device, general interconnection interface and other processing devices as described in clause J12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款J14、一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及机器学习芯片,所述机器学习芯片包括条款J12所述的机器学习运算装置或如条款J13所述的组合处理装置;Clause J14. A board card, the board card comprising: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device according to Clause J12 or the combined processing device according to Clause J13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款J15、一种矩阵向量计算指令处理方法,所述方法应用于矩阵向量计算指令处理系统,所述系统包括指令生成设备和运行设备,所述方法包括:Clause J15. A matrix vector calculation instruction processing method. The method is applied to a matrix vector calculation instruction processing system. The system includes an instruction generation device and an operation device. The method includes:
通过所述指令生成设备根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,并根据所述矩阵向量计算宏指令和所述运行设备,生成运行指令;Determining, by the instruction generation device according to a received matrix vector calculation macro instruction, a running device for executing the matrix vector calculation macro instruction, and generating a running instruction according to the matrix vector calculation macro instruction and the running device;
通过所述运行设备获取数据、神经网络模型以及运行指令,对所述运行指令进行解析,获得多个解析指令,并根据所述数据执行所述多个解析指令,得到执行结果,Acquiring, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsing instructions, and executing the plurality of parsing instructions according to the data to obtain an execution result,
其中,所述矩阵向量计算宏指令是指用于对矩阵和向量进行计算的宏指令,The matrix vector calculation macro instruction refers to a macro instruction used to calculate matrix and vector,
所述矩阵向量计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The matrix vector calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address, respectively.
条款J16、根据条款J15所述的方法,所述方法还包括:Clause J16. The method according to Clause J15, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,包括以下至少一项:Wherein, determining, by the instruction generation device according to the received matrix vector calculation macro instruction, the running device for executing the matrix vector calculation macro instruction includes at least one of the following:
在确定所述矩阵向量计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述矩阵向量计算宏指令的执行条件时,将所述指定设备确定为所述运行设备;When it is determined that the matrix vector calculation macro instruction includes the identification of the designated device, and the resources of the designated device satisfy the execution conditions for executing the matrix vector calculation macro instruction, the designated device is determined as the running device;
在确定所述矩阵向量计算宏指令中不包含所述指定设备的标识时,根据接收到的矩阵向量计算宏指令和所述备选设备的资源信息,从所述备选设备中确定出用于执行所述矩阵向量计算宏指令的运行设备;When it is determined that the matrix vector calculation macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the matrix vector calculation macro instruction according to the received matrix vector calculation macro instruction and the resource information of the candidate devices;
在确定所述矩阵向量计算宏指令中包含所述指定设备的标识,且所述指定设备的资源不满足执行所述矩阵向量计算宏指令的执行条件时,根据所述矩阵向量计算宏指令和所述备选设备的资源信息,确定运行设备,When it is determined that the matrix vector calculation macro instruction includes the identifier of the designated device, and the resources of the designated device do not satisfy the execution conditions for executing the matrix vector calculation macro instruction, determining the running device according to the matrix vector calculation macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述矩阵向量计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the matrix vector calculation macro instruction, and the resource information includes the instruction set included in the candidate device.
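The three device-selection branches of Clause J16 can be sketched as follows. This is a minimal, non-limiting illustration, not the claimed implementation; the names `Device`, `MacroInstruction`, and `select_running_device` are hypothetical, and "resource information" is reduced here to the set of supported instruction-set identifiers:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Device:
    device_id: str
    instruction_sets: set = field(default_factory=set)  # resource information

@dataclass
class MacroInstruction:
    op_type: str                         # e.g. a matrix-multiply-vector operation
    designated_id: Optional[str] = None  # identifier of the designated device, if any

def satisfies_condition(device: Device, macro: MacroInstruction) -> bool:
    # Execution condition: the device contains an instruction set
    # corresponding to the macro instruction.
    return macro.op_type in device.instruction_sets

def select_running_device(macro: MacroInstruction, candidates: list) -> Device:
    designated = next((d for d in candidates
                       if d.device_id == macro.designated_id), None)
    # Branch 1: a designated device whose resources satisfy the condition.
    if designated is not None and satisfies_condition(designated, macro):
        return designated
    # Branches 2 and 3: no identifier, or the designated device is unsuitable --
    # choose from the candidate devices by their resource information.
    for d in candidates:
        if satisfies_condition(d, macro):
            return d
    raise RuntimeError("no candidate device supports this macro instruction")
```

Under these assumptions, a macro instruction naming an unsuitable device falls through to the candidate-device search, matching the third branch of the clause.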
Clause J17. The method according to Clause J16, wherein the matrix-vector calculation macro instruction further includes at least one of an input amount and an output amount,
and generating the running instruction according to the matrix-vector calculation macro instruction and the running device includes:
determining a data amount of the matrix-vector calculation macro instruction, and generating the running instruction according to the data amount of the matrix-vector calculation macro instruction, the matrix-vector calculation macro instruction, and resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause J18. The method according to Clause J17, wherein generating the running instruction according to the data amount of the matrix-vector calculation macro instruction, the matrix-vector calculation macro instruction, and the resource information of the running device includes at least one of the following:
when it is determined that there is one running device and a running data amount of the running device is smaller than the data amount of the matrix-vector calculation macro instruction, splitting the matrix-vector calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
when it is determined that there are a plurality of running devices, splitting the matrix-vector calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of that running device, the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
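The splitting described in Clause J18 can be sketched as follows. This is an illustrative simplification, not the claimed method: it assumes addresses advance by one unit per data element and that input and output regions are split with the same offsets; `split_macro` is a hypothetical helper name:

```python
def split_macro(total_amount, input_addr, output_addr, chunk):
    """Split one macro instruction into running instructions, each covering at
    most `chunk` elements (the running device's running data amount)."""
    runs = []
    offset = 0
    while offset < total_amount:
        amount = min(chunk, total_amount - offset)
        runs.append({
            "input_addr": input_addr + offset,    # running input address
            "output_addr": output_addr + offset,  # running output address
            "amount": amount,                     # running input/output amount
        })
        offset += amount
    return runs
```

With a data amount of 10 and a running data amount of 4, the sketch yields three running instructions covering 4, 4, and 2 elements, executed in sequence as the clause describes.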
Clause J19. The method according to Clause J15, further comprising:
sorting, by the instruction generating device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
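Clause J19 leaves the queue sorting rule open; one minimal way to picture the step (a hypothetical sketch, with the rule supplied as a key function) is:

```python
from collections import deque

def build_instruction_queue(run_instructions, key):
    """Sort running instructions by a caller-supplied queue sorting rule,
    then build the per-device instruction queue in FIFO order."""
    return deque(sorted(run_instructions, key=key))
```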
Clause J20. The method according to Clause J16, further comprising:
receiving, by the instruction generating device, a matrix-vector calculation instruction to be executed, and generating the matrix-vector calculation macro instruction according to a determined identifier of the designated device and the matrix-vector calculation instruction to be executed.
Clause J21. The method according to Clause J15, further comprising:
sending, by the instruction generating device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending, by the instruction generating device, the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
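The three-step dispatch of Clause J21 (assembly file, binary file, send) can be pictured with the following toy sketch. The textual "assembly" and UTF-8 "binary" stand in for real assembler output and machine encoding, and `dispatch` is a hypothetical name, not part of the claims:

```python
def dispatch(run_instruction, send):
    # Step 1: generate an "assembly file" (here, one textual line per instruction).
    asm = f"{run_instruction['op']} {run_instruction['in']} {run_instruction['out']}"
    # Step 2: translate the assembly into a "binary file" (UTF-8 bytes stand in
    # for a real machine encoding in this sketch).
    binary = asm.encode("utf-8")
    # Step 3: send the binary to the running device.
    send(binary)
    return binary
```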
Clause J22. The method according to Clause J15, further comprising:
storing, by the running device, the data and scalar data in the data,
wherein the running device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data in the data.
Clause J23. The method according to Clause J15, further comprising:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions;
storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, where the running instruction and the plurality of parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
Clause J24. The method according to Clause J23, further comprising:
when the running device determines that a first parsed instruction has an association with a zeroth parsed instruction preceding the first parsed instruction, caching the first parsed instruction, and after execution of the zeroth parsed instruction is completed, executing the cached first parsed instruction,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
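The overlap test that defines the association in Clause J24 can be sketched as follows (an illustrative sketch; intervals are assumed to be half-open `(start, end)` pairs, and the function names are hypothetical):

```python
def intervals_overlap(first, zeroth):
    """first and zeroth are (start, end) storage address intervals, end exclusive."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

def has_dependency(first_instr, zeroth_instr):
    # The first parsed instruction depends on the zeroth one when the storage
    # address intervals holding their required data overlap; in that case the
    # first instruction is cached until the zeroth finishes executing.
    return intervals_overlap(first_instr["interval"], zeroth_instr["interval"])
```

Adjacent but non-overlapping intervals carry no association, so the first instruction need not wait.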
Clause J25. The method according to Clause J15, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is disposed in the CPU and/or the NPU;
the matrix-vector calculation macro instruction includes at least one of the following instructions: a matrix-multiply-vector calculation macro instruction, a vector-multiply-matrix calculation macro instruction, a tensor calculation macro instruction, a matrix addition calculation macro instruction, and a matrix subtraction calculation macro instruction;
the matrix-vector calculation macro instruction further includes at least one of the following options: an identifier of the designated device for executing the matrix-vector calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, the instruction parameters including at least one of an address and a length of a second operand.
Scalar calculation instructions may be processed by the scalar calculation instruction generating apparatus and method, the scalar calculation instruction processing system and method, and other related products described below. The foregoing can be better understood in accordance with the following clauses:
Clause K1. A scalar calculation instruction generating apparatus, the apparatus including:
a device determination module configured to determine, according to a received scalar calculation macro instruction, a running device for executing the scalar calculation macro instruction;
an instruction generation module configured to generate a running instruction according to the scalar calculation macro instruction and the running device,
wherein the scalar calculation macro instruction refers to a macro instruction for performing an arithmetic operation on scalars,
the scalar calculation macro instruction includes an operation type, a first operand, a second operand, and an output address; the running instruction includes the operation type, a first running operand, a second running operand, and a running output address; and the first running operand, the second running operand, and the running output address are determined according to the first operand, the second operand, and the output address, respectively.
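The field mapping from a scalar calculation macro instruction to a running instruction in Clause K1 can be pictured with this minimal sketch. It assumes the trivial case where a scalar operation needs no splitting, so each running field maps one-to-one from the macro field; the class and function names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ScalarMacro:
    op_type: str        # e.g. "SADD" for scalar addition
    operand1: int       # first operand
    operand2: int       # second operand
    output_addr: int    # output address

@dataclass
class RunningInstruction:
    op_type: str
    run_operand1: int
    run_operand2: int
    run_output_addr: int

def generate_running_instruction(macro: ScalarMacro) -> RunningInstruction:
    # A scalar operation touches a single element, so in this sketch the
    # running operands and running output address are taken directly from
    # the macro instruction's first operand, second operand, and output address.
    return RunningInstruction(macro.op_type, macro.operand1,
                              macro.operand2, macro.output_addr)
```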
Clause K2. The apparatus according to Clause K1, wherein the device determination module includes:
a first determination submodule configured to, when it is determined that the scalar calculation macro instruction includes an identifier of a designated device and resources of the designated device satisfy an execution condition for executing the scalar calculation macro instruction, determine the designated device as the running device,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the scalar calculation macro instruction.
Clause K3. The apparatus according to Clause K2, further including:
a resource acquisition module configured to acquire resource information of candidate devices,
the device determination module further including:
a second determination submodule configured to, when it is determined that the scalar calculation macro instruction does not include the identifier of the designated device, determine, from the candidate devices according to the received scalar calculation macro instruction and the resource information of the candidate devices, the running device for executing the scalar calculation macro instruction,
wherein the resource information includes the instruction set contained in each candidate device.
Clause K4. The apparatus according to Clause K3, wherein the device determination module further includes:
a third determination submodule configured to, when it is determined that the scalar calculation macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the scalar calculation macro instruction, determine the running device according to the scalar calculation macro instruction and the resource information of the candidate devices.
Clause K5. The apparatus according to Clause K1, further including:
a queue construction module configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause K6. The apparatus according to Clause K2, further including:
a macro instruction generation module configured to receive a scalar calculation instruction to be executed, and generate the scalar calculation macro instruction according to a determined identifier of the designated device and the scalar calculation instruction to be executed.
Clause K7. The apparatus according to Clause K1, further including:
an instruction dispatch module configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly submodule configured to generate an assembly file according to the running instruction;
an assembly translation submodule configured to translate the assembly file into a binary file;
an instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause K8. The apparatus according to Clause K1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is disposed in the CPU and/or the NPU;
the scalar calculation macro instruction includes at least one of the following instructions: a scalar addition calculation macro instruction, a scalar subtraction calculation macro instruction, a scalar multiplication calculation macro instruction, and a scalar division calculation macro instruction.
Clause K9. A machine learning operation apparatus, the apparatus including:
one or more scalar calculation instruction generating apparatuses according to any one of Clauses K1 to K8, configured to acquire data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and pass the execution result to other processing apparatuses through an I/O interface;
when the machine learning operation apparatus includes a plurality of the scalar calculation instruction generating apparatuses, the plurality of scalar calculation instruction generating apparatuses may be connected to each other through a specific structure and transmit data;
wherein the plurality of scalar calculation instruction generating apparatuses are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of scalar calculation instruction generating apparatuses share the same control system or have their own control systems; the plurality of scalar calculation instruction generating apparatuses share memory or have their own memories; and the interconnection manner of the plurality of scalar calculation instruction generating apparatuses is an arbitrary interconnection topology.
Clause K10. A combined processing apparatus, the apparatus including:
the machine learning operation apparatus according to Clause K9, a universal interconnection interface, and other processing apparatuses;
the machine learning operation apparatus interacting with the other processing apparatuses to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatuses, respectively, and configured to store data of the machine learning operation apparatus and the other processing apparatuses.
Clause K11. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, the machine learning chip including the machine learning operation apparatus according to Clause K9 or the combined processing apparatus according to Clause K10;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
the control device is configured to monitor a state of the machine learning chip.
Clause K12. A scalar calculation instruction generating method, the method including:
determining, according to a received scalar calculation macro instruction, a running device for executing the scalar calculation macro instruction;
generating a running instruction according to the scalar calculation macro instruction and the running device,
wherein the scalar calculation macro instruction refers to a macro instruction for performing an arithmetic operation on scalars,
the scalar calculation macro instruction includes an operation type, a first operand, a second operand, and an output address,
the running instruction includes the operation type, a first running operand, a second running operand, and a running output address, and the first running operand, the second running operand, and the running output address are determined according to the first operand, the second operand, and the output address, respectively.
Clause K13. The method according to Clause K12, wherein determining, according to the received scalar calculation macro instruction, the running device for executing the scalar calculation macro instruction includes:
when it is determined that the scalar calculation macro instruction includes an identifier of a designated device and resources of the designated device satisfy an execution condition for executing the scalar calculation macro instruction, determining the designated device as the running device,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the scalar calculation macro instruction.
Clause K14. The method according to Clause K13, further comprising:
acquiring resource information of candidate devices,
wherein determining, according to the received scalar calculation macro instruction, the running device for executing the scalar calculation macro instruction includes:
when it is determined that the scalar calculation macro instruction does not include the identifier of the designated device, determining, from the candidate devices according to the received scalar calculation macro instruction and the resource information of the candidate devices, the running device for executing the scalar calculation macro instruction,
wherein the resource information includes the instruction set contained in each candidate device.
Clause K15. The method according to Clause K14, wherein determining, according to the received scalar calculation macro instruction, the running device for executing the scalar calculation macro instruction includes:
when it is determined that the scalar calculation macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the scalar calculation macro instruction, determining the running device according to the scalar calculation macro instruction and the resource information of the candidate devices.
Clause K16. The method according to Clause K12, further comprising:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause K17. The method according to Clause K13, further comprising:
receiving a scalar calculation instruction to be executed, and generating the scalar calculation macro instruction according to a determined identifier of the designated device and the scalar calculation instruction to be executed.
Clause K18. The method according to Clause K12, further comprising:
sending the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause K19. The method according to Clause K12, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU;
the scalar calculation macro instruction includes at least one of the following instructions: a scalar addition calculation macro instruction, a scalar subtraction calculation macro instruction, a scalar multiplication calculation macro instruction, and a scalar division calculation macro instruction.
Clause L1. A scalar calculation instruction processing system, the system including an instruction generating device and a running device,
the instruction generating device including:
a device determination module configured to determine, according to a received scalar calculation macro instruction, a running device for executing the scalar calculation macro instruction;
an instruction generation module configured to generate a running instruction according to the scalar calculation macro instruction and the running device;
the running device including:
a control module configured to acquire required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsed instructions;
an execution module configured to execute the plurality of parsed instructions according to the data to obtain an execution result,
wherein the scalar calculation macro instruction refers to a macro instruction for performing an arithmetic operation on scalars,
the scalar calculation macro instruction includes an operation type, a first operand, a second operand, and an output address; the running instruction includes the operation type, a first running operand, a second running operand, and a running output address; and the first running operand, the second running operand, and the running output address are determined according to the first operand, the second operand, and the output address, respectively.
Clause L2. The system according to Clause L1, wherein the instruction generating device further includes:
a resource acquisition module configured to acquire resource information of candidate devices,
the device determination module including at least one of the following submodules:
a first determination submodule configured to, when it is determined that the scalar calculation macro instruction includes an identifier of a designated device and resources of the designated device satisfy an execution condition for executing the scalar calculation macro instruction, determine the designated device as the running device;
a second determination submodule configured to, when it is determined that the scalar calculation macro instruction does not include the identifier of the designated device, determine, from the candidate devices according to the received scalar calculation macro instruction and the resource information of the candidate devices, the running device for executing the scalar calculation macro instruction;
a third determination submodule configured to, when it is determined that the scalar calculation macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the scalar calculation macro instruction, determine the running device according to the scalar calculation macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the scalar calculation macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause L3. The system according to Clause L1, wherein the instruction generating device further includes:
a queue construction module configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause L4. The system according to Clause L2, wherein the instruction generating device further includes:
a macro instruction generation module configured to receive a scalar calculation instruction to be executed, and generate the scalar calculation macro instruction according to a determined identifier of the designated device and the scalar calculation instruction to be executed.
Clause L5. The system according to Clause L1, wherein the instruction generating device further includes:
an instruction dispatch module configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly submodule configured to generate an assembly file according to the running instruction;
an assembly translation submodule configured to translate the assembly file into a binary file;
an instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause L6. The system according to Clause L1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data in the data.
Clause L7. The system according to Clause L1, wherein the control module includes:
an instruction storage submodule configured to store the running instruction;
an instruction processing submodule configured to parse the running instruction to obtain the plurality of parsed instructions;
a storage queue submodule configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, where the running instruction and the plurality of parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
Clause L8. The system according to Clause L7, wherein the execution module includes:
a dependency processing submodule configured to, when it is determined that a first parsed instruction has an association with a zeroth parsed instruction preceding the first parsed instruction, cache the first parsed instruction in the instruction storage submodule, and after execution of the zeroth parsed instruction is completed, extract the first parsed instruction from the instruction storage submodule and send it to the execution module,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause L9. The system according to Clause L1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is disposed in the CPU and/or the NPU;
the scalar calculation macro instruction includes at least one of the following instructions: a scalar addition calculation macro instruction, a scalar subtraction calculation macro instruction, a scalar multiplication calculation macro instruction, and a scalar division calculation macro instruction.
Clause L10. A machine learning operation apparatus, the apparatus comprising:
one or more scalar calculation instruction processing systems according to any one of Clauses L1 to L9, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
when the machine learning operation apparatus comprises a plurality of the scalar calculation instruction processing systems, the plurality of scalar calculation instruction processing systems may be connected through a specific structure and transmit data;
wherein the plurality of scalar calculation instruction processing systems are interconnected and transmit data through a peripheral component interconnect express (PCIe) bus to support larger-scale machine learning operations; the plurality of scalar calculation instruction processing systems share a single control system or have their own control systems; the plurality of scalar calculation instruction processing systems share memory or have their own memories; and the plurality of scalar calculation instruction processing systems may be interconnected in an arbitrary interconnection topology.
Clause L11. A combined processing apparatus, the combined processing apparatus comprising:
the machine learning operation apparatus according to Clause L10, a universal interconnection interface, and another processing apparatus;
the machine learning operation apparatus interacting with the other processing apparatus to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further comprises: a storage apparatus, the storage apparatus being connected to the machine learning operation apparatus and the other processing apparatus respectively, and configured to store data of the machine learning operation apparatus and of the other processing apparatus.
Clause L12. A board card, the board card comprising: a storage device, an interface apparatus, a control device, and a machine learning chip, wherein the machine learning chip comprises the machine learning operation apparatus according to Clause L10 or the combined processing apparatus according to Clause L11;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
and the control device is configured to monitor the state of the machine learning chip.
Clause L13. A scalar calculation instruction processing method, the method being applied to a scalar calculation instruction processing system comprising an instruction generation device and a running device, the method comprising:
determining, by the instruction generation device according to a received scalar calculation macro instruction, a running device to execute the scalar calculation macro instruction, and generating a running instruction according to the scalar calculation macro instruction and the running device;
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions according to the data to obtain an execution result,
wherein the scalar calculation macro instruction is a macro instruction for performing an arithmetic operation on scalars,
the scalar calculation macro instruction comprises an operation type, a first operand, a second operand, and an output address,
and the running instruction comprises the operation type, a first running operand, a second running operand, and a running output address, the first running operand, the second running operand, and the running output address being determined according to the first operand, the second operand, and the output address respectively.
Clause L14. The method according to Clause L13, further comprising:
obtaining, by the instruction generation device, resource information of candidate devices,
wherein determining, by the instruction generation device according to the received scalar calculation macro instruction, the running device to execute the scalar calculation macro instruction comprises at least one of the following:
when it is determined that the scalar calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the scalar calculation macro instruction, determining the specified device as the running device;
when it is determined that the scalar calculation macro instruction does not contain the identifier of the specified device, determining, from the candidate devices according to the received scalar calculation macro instruction and the resource information of the candidate devices, a running device for executing the scalar calculation macro instruction;
when it is determined that the scalar calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar calculation macro instruction, determining the running device according to the scalar calculation macro instruction and the resource information of the candidate devices,
wherein the execution condition comprises: the specified device contains an instruction set corresponding to the scalar calculation macro instruction, and the resource information comprises the instruction sets contained in the candidate devices.
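The three determination branches of Clause L14 can be sketched as a single selection routine. This is a hedged illustration only: the dictionary shapes, device identifiers, and instruction-set names are assumptions, not the patent's encoding.

```python
# Hypothetical sketch of Clause L14 device selection. A macro instruction may
# name a specified device; the device is used only if it supports the required
# instruction set, otherwise a candidate device is chosen by resource information.

def pick_running_device(macro, candidates):
    """macro: {'specified': device id or None, 'isa': required instruction-set name}.
    candidates: {device_id: {'isa': set of supported instruction-set names}}."""
    specified = macro.get('specified')
    required = macro['isa']
    # Branch 1: a device is specified and satisfies the execution condition.
    if specified is not None and required in candidates.get(specified, {}).get('isa', set()):
        return specified
    # Branches 2 and 3: no device specified, or the specified device lacks the
    # required instruction set -> select from the candidates' resource information.
    for device_id, info in candidates.items():
        if required in info['isa']:
            return device_id
    return None  # no device satisfies the execution condition

candidates = {'cpu0': {'isa': {'scalar'}}, 'npu0': {'isa': {'scalar', 'matrix'}}}
print(pick_running_device({'specified': 'npu0', 'isa': 'scalar'}, candidates))  # npu0
print(pick_running_device({'specified': 'cpu0', 'isa': 'matrix'}, candidates))  # npu0
```

The second call shows branch 3: `cpu0` is specified but lacks the required instruction set, so the selection falls through to a qualifying candidate.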
Clause L15. The method according to Clause L13, further comprising:
sorting, by the instruction generation device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause L16. The method according to Clause L14, further comprising:
receiving, by the instruction generation device, a scalar calculation instruction to be executed, and generating the scalar calculation macro instruction according to the determined identifier of the specified device and the scalar calculation instruction to be executed.
Clause L17. The method according to Clause L13, further comprising:
sending, by the instruction generation device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending, by the instruction generation device, the running instruction to the running device so that the running device executes the running instruction comprises:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
and sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
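The assembly-to-binary dispatch path of Clause L17 can be sketched as follows. The mnemonics, text format, and one-byte-per-field encoding below are invented purely for illustration; a real toolchain would use the running device's own assembler and encoding.

```python
# Minimal sketch of the Clause L17 pipeline: render running instructions as
# assembly text, "translate" the text into binary, and hand the bytes to the
# running device. All mnemonics and the byte layout are assumptions.

OPCODES = {'ADD': 0x01, 'SUB': 0x02, 'MUL': 0x03, 'DIV': 0x04}  # assumed encoding

def to_assembly(run_instrs):
    # Each running instruction: (operation type, operand 1, operand 2, output address).
    return '\n'.join(f'{op} {a}, {b}, {hex(out)}' for op, a, b, out in run_instrs)

def assemble(asm_text):
    # Translate each assembly line into four bytes: opcode, two operands, address.
    binary = bytearray()
    for line in asm_text.splitlines():
        op, rest = line.split(' ', 1)
        a, b, out = (int(tok.strip(), 0) for tok in rest.split(','))
        binary += bytes([OPCODES[op], a, b, out])
    return bytes(binary)

asm = to_assembly([('ADD', 3, 4, 0x10)])
print(asm)                  # ADD 3, 4, 0x10
print(assemble(asm).hex())  # 01030410
```

Sending `assemble(asm)` to the running device corresponds to the final step of the clause; the device would decode the bytes and execute the running instruction.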
Clause L18. The method according to Clause L13, further comprising:
storing, by the running device, the data and the scalar data in the data,
wherein the running device comprises a storage module, the storage module comprises one or more of a register and a cache, and the cache comprises a scratchpad cache,
the cache being configured to store the data,
and the register being configured to store the scalar data in the data.
Clause L19. The method according to Clause L13, further comprising:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions;
and storing, by the running device, a running instruction queue, the running instruction queue comprising the running instruction and the plurality of parsed instructions, wherein the running instruction and the plurality of parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
Clause L20. The method according to Clause L19, further comprising:
when the running device determines that a first parsed instruction has an association with a zeroth parsed instruction preceding the first parsed instruction, caching the first parsed instruction, and after the zeroth parsed instruction has been executed, executing the cached first parsed instruction,
wherein the first parsed instruction has an association with the zeroth parsed instruction preceding it when:
a first storage address interval storing data required by the first parsed instruction overlaps a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause L21. The method according to Clause L13, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is provided in a CPU and/or an NPU;
the scalar calculation macro instruction comprises at least one of the following instructions: a scalar addition macro instruction, a scalar subtraction macro instruction, a scalar multiplication macro instruction, and a scalar division macro instruction.
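The four scalar calculation macro instruction types of Clause L21, each carrying the (operation type, first operand, second operand, output address) fields defined in Clause L13, can be illustrated with a toy interpreter. The mnemonics and the dictionary memory model are assumptions for illustration only.

```python
# Toy interpreter for the scalar calculation macro instructions (add, subtract,
# multiply, divide). The tuple layout mirrors the clause's fields; mnemonics
# such as 'SADD' and the memory dictionary are invented for this sketch.
import operator

OPS = {'SADD': operator.add, 'SSUB': operator.sub,
       'SMUL': operator.mul, 'SDIV': operator.truediv}  # assumed mnemonics

def execute_scalar_macro(macro, memory):
    op_type, first, second, out_addr = macro
    memory[out_addr] = OPS[op_type](first, second)  # write result to output address
    return memory[out_addr]

memory = {}
print(execute_scalar_macro(('SADD', 6, 7, 0x20), memory))  # 13
print(execute_scalar_macro(('SDIV', 9, 3, 0x24), memory))  # 3.0
```

In the system of the clauses, such a macro instruction would first be translated into a running instruction and parsed instructions; this sketch only shows the arithmetic semantics the macro encodes.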
Scalar logic calculation instructions may be processed by the scalar logic calculation instruction generation apparatus and method, the scalar logic calculation instruction processing system and method, and the related products described below; the foregoing may be better understood with reference to the following clauses:
Clause M1. A scalar logic calculation instruction generation apparatus, the apparatus comprising:
a device determination module, configured to determine, according to a received scalar logic calculation macro instruction, a running device to execute the scalar logic calculation macro instruction;
an instruction generation module, configured to generate a running instruction according to the scalar logic calculation macro instruction and the running device,
wherein the scalar logic calculation macro instruction is a macro instruction for performing a logical operation on scalars,
the scalar logic calculation macro instruction comprises an operation type, a first operand, a second operand, and an output address,
and the running instruction comprises the operation type, a first running operand, a second running operand, and a running output address, the first running operand, the second running operand, and the running output address being determined according to the first operand, the second operand, and the output address respectively.
Clause M2. The apparatus according to Clause M1, wherein the device determination module comprises:
a first determination sub-module, configured to determine the specified device as the running device when it is determined that the scalar logic calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the scalar logic calculation macro instruction,
wherein the execution condition comprises: the specified device contains an instruction set corresponding to the scalar logic calculation macro instruction.
Clause M3. The apparatus according to Clause M2, further comprising:
a resource obtaining module, configured to obtain resource information of candidate devices,
wherein the device determination module further comprises:
a second determination sub-module, configured to determine, from the candidate devices according to the received scalar logic calculation macro instruction and the resource information of the candidate devices, a running device for executing the scalar logic calculation macro instruction when it is determined that the scalar logic calculation macro instruction does not contain the identifier of the specified device,
wherein the resource information comprises the instruction sets contained in the candidate devices.
Clause M4. The apparatus according to Clause M3, wherein the device determination module further comprises:
a third determination sub-module, configured to determine the running device according to the scalar logic calculation macro instruction and the resource information of the candidate devices when it is determined that the scalar logic calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar logic calculation macro instruction.
Clause M5. The apparatus according to Clause M1, further comprising:
a queue construction module, configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause M6. The apparatus according to Clause M2, further comprising:
a macro instruction generation module, configured to receive a scalar logic calculation instruction to be executed, and generate the scalar logic calculation macro instruction according to the determined identifier of the specified device and the scalar logic calculation instruction to be executed.
Clause M7. The apparatus according to Clause M1, further comprising:
an instruction dispatch module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module comprises:
an instruction assembly sub-module, configured to generate an assembly file according to the running instruction;
an assembly translation sub-module, configured to translate the assembly file into a binary file;
and an instruction sending sub-module, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause M8. The apparatus according to Clause M1, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is provided in a CPU and/or an NPU;
the scalar logic calculation macro instruction comprises at least one of the following instructions: a scalar AND macro instruction, a scalar OR macro instruction, a scalar NOT macro instruction, and a scalar comparison macro instruction.
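The four scalar logic macro instruction types of Clause M8 can be illustrated with a toy dispatch table over the same (operation type, first operand, second operand, output address) fields defined in Clause M1. The mnemonics are invented, the NOT operation ignores its second operand in this sketch, and the comparison is assumed to be "greater than"; none of these details come from the patent.

```python
# Toy dispatch for the scalar logic macro instructions (AND, OR, NOT, compare).
# Mnemonics, the memory dictionary, and the comparison semantics are assumptions.

LOGIC_OPS = {
    'SAND': lambda a, b: a & b,
    'SOR':  lambda a, b: a | b,
    'SNOT': lambda a, _: int(not a),   # second operand unused in this sketch
    'SCMP': lambda a, b: int(a > b),   # assumed "greater than" comparison
}

def execute_logic_macro(macro, memory):
    op_type, first, second, out_addr = macro
    memory[out_addr] = LOGIC_OPS[op_type](first, second)  # write result to output address
    return memory[out_addr]

memory = {}
print(execute_logic_macro(('SAND', 0b1100, 0b1010, 0x30), memory))  # 8
print(execute_logic_macro(('SCMP', 5, 3, 0x34), memory))            # 1
```

As with the scalar arithmetic case, the apparatus of these clauses would translate such a macro instruction into a running instruction for the selected device; the sketch only shows the logical semantics the macro encodes.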
Clause M9. A machine learning operation apparatus, the apparatus comprising:
one or more scalar logic calculation instruction generation apparatuses according to any one of Clauses M1 to M8, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
when the machine learning operation apparatus comprises a plurality of the scalar logic calculation instruction generation apparatuses, the plurality of scalar logic calculation instruction generation apparatuses may be connected through a specific structure and transmit data;
wherein the plurality of scalar logic calculation instruction generation apparatuses are interconnected and transmit data through a peripheral component interconnect express (PCIe) bus to support larger-scale machine learning operations; the plurality of scalar logic calculation instruction generation apparatuses share a single control system or have their own control systems; the plurality of scalar logic calculation instruction generation apparatuses share memory or have their own memories; and the plurality of scalar logic calculation instruction generation apparatuses may be interconnected in an arbitrary interconnection topology.
Clause M10. A combined processing apparatus, the apparatus comprising:
the machine learning operation apparatus according to Clause M9, a universal interconnection interface, and another processing apparatus;
the machine learning operation apparatus interacting with the other processing apparatus to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further comprises: a storage apparatus, the storage apparatus being connected to the machine learning operation apparatus and the other processing apparatus respectively, and configured to store data of the machine learning operation apparatus and of the other processing apparatus.
Clause M11. A board card, the board card comprising: a storage device, an interface apparatus, a control device, and a machine learning chip, wherein the machine learning chip comprises the machine learning operation apparatus according to Clause M9 or the combined processing apparatus according to Clause M10;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
and the control device is configured to monitor the state of the machine learning chip.
Clause M12. A scalar logic calculation instruction generation method, the method comprising:
determining, according to a received scalar logic calculation macro instruction, a running device to execute the scalar logic calculation macro instruction;
generating a running instruction according to the scalar logic calculation macro instruction and the running device,
wherein the scalar logic calculation macro instruction is a macro instruction for performing a logical operation on scalars, the scalar logic calculation macro instruction comprising an operation type, a first operand, a second operand, and an output address,
and the running instruction comprises the operation type, a first running operand, a second running operand, and a running output address, the first running operand, the second running operand, and the running output address being determined according to the first operand, the second operand, and the output address respectively.
Clause M13. The method according to Clause M12, wherein determining, according to the received scalar logic calculation macro instruction, the running device to execute the scalar logic calculation macro instruction comprises:
when it is determined that the scalar logic calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the scalar logic calculation macro instruction, determining the specified device as the running device,
wherein the execution condition comprises: the specified device contains an instruction set corresponding to the scalar logic calculation macro instruction.
Clause M14. The method according to Clause M13, further comprising:
obtaining resource information of candidate devices,
wherein determining, according to the received scalar logic calculation macro instruction, the running device to execute the scalar logic calculation macro instruction comprises:
when it is determined that the scalar logic calculation macro instruction does not contain the identifier of the specified device, determining, from the candidate devices according to the received scalar logic calculation macro instruction and the resource information of the candidate devices, a running device for executing the scalar logic calculation macro instruction,
wherein the resource information comprises the instruction sets contained in the candidate devices.
Clause M15. The method according to Clause M14, wherein determining, according to the received scalar logic calculation macro instruction, the running device to execute the scalar logic calculation macro instruction comprises:
when it is determined that the scalar logic calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar logic calculation macro instruction, determining the running device according to the scalar logic calculation macro instruction and the resource information of the candidate devices.
Clause M16. The method according to Clause M12, further comprising:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause M17. The method according to Clause M13, further comprising:
receiving a scalar logic calculation instruction to be executed, and generating the scalar logic calculation macro instruction according to the determined identifier of the specified device and the scalar logic calculation instruction to be executed.
Clause M18. The method according to Clause M12, further comprising:
sending the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction comprises:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
and sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause M19. The method according to Clause M12, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU;
the scalar logic calculation macro instruction comprises at least one of the following instructions: a scalar AND macro instruction, a scalar OR macro instruction, a scalar NOT macro instruction, and a scalar comparison macro instruction.
Clause N1. A scalar logic calculation instruction processing system, the system comprising an instruction generation device and a running device,
the instruction generation device comprising:
a device determination module, configured to determine, according to a received scalar logic calculation macro instruction, a running device to execute the scalar logic calculation macro instruction;
an instruction generation module, configured to generate a running instruction according to the scalar logic calculation macro instruction and the running device;
the running device comprising:
a control module, configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsed instructions;
an execution module, configured to execute the plurality of parsed instructions according to the data to obtain an execution result,
wherein the scalar logic calculation macro instruction is a macro instruction for performing a logical operation on scalars,
the scalar logic calculation macro instruction comprises an operation type, a first operand, a second operand, and an output address,
and the running instruction comprises the operation type, a first running operand, a second running operand, and a running output address, the first running operand, the second running operand, and the running output address being determined according to the first operand, the second operand, and the output address respectively.
Clause N2. The system according to Clause N1, wherein the instruction generation device further comprises:
a resource obtaining module, configured to obtain resource information of candidate devices,
and the device determination module comprises at least one of the following sub-modules:
a first determination sub-module, configured to determine the specified device as the running device when it is determined that the scalar logic calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the scalar logic calculation macro instruction;
a second determination sub-module, configured to determine, from the candidate devices according to the received scalar logic calculation macro instruction and the resource information of the candidate devices, a running device for executing the scalar logic calculation macro instruction when it is determined that the scalar logic calculation macro instruction does not contain the identifier of the specified device;
a third determination sub-module, configured to determine the running device according to the scalar logic calculation macro instruction and the resource information of the candidate devices when it is determined that the scalar logic calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar logic calculation macro instruction,
wherein the execution condition comprises: the specified device contains an instruction set corresponding to the scalar logic calculation macro instruction, and the resource information comprises the instruction sets contained in the candidate devices.
Clause N3. The system according to Clause N1, wherein the instruction generation device further includes:
a queue construction module configured to sort the running instructions according to a queue sorting rule and construct an instruction queue corresponding to the running device from the sorted running instructions.
Clause N4. The system according to Clause N2, wherein the instruction generation device further includes:
a macro instruction generation module configured to receive a scalar logic calculation instruction to be executed and generate the scalar logic calculation macro instruction according to the determined identifier of the specified device and the scalar logic calculation instruction to be executed.
Clause N5. The system according to Clause N1, wherein the instruction generation device further includes:
an instruction dispatch module configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly sub-module configured to generate an assembly file according to the running instruction;
an assembly translation sub-module configured to translate the assembly file into a binary file; and
an instruction sending sub-module configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
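The dispatch chain in Clause N5 (running instruction → assembly file → binary file → running device) can be illustrated with a minimal sketch. All names and the trivial text encoding here are assumptions for illustration; the patent does not specify an assembly syntax or binary format.

```python
# Hypothetical sketch of the three dispatch sub-modules of Clause N5.

def to_assembly(running_instruction):
    # Instruction assembly sub-module: render the running instruction's
    # fields as one assembly line (format assumed for the sketch).
    return " ".join(str(field) for field in running_instruction)

def to_binary(assembly_file):
    # Assembly translation sub-module: translate the assembly text into
    # a binary file (here simply its byte encoding).
    return assembly_file.encode("utf-8")

def dispatch(running_instruction, send):
    # Instruction sending sub-module: forward the binary to the running
    # device via the provided `send` channel.
    return send(to_binary(to_assembly(running_instruction)))

received = []  # stands in for the running device's input channel
dispatch(("AND", "r1", "r2", "r3"), received.append)
print(received[0])
```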
Clause N6. The system according to Clause N1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
wherein the cache is configured to store the data, and
the register is configured to store scalar data in the data.
Clause N7. The system according to Clause N1, wherein the control module includes:
an instruction storage sub-module configured to store the running instruction;
an instruction processing sub-module configured to parse the running instruction to obtain the multiple parsed instructions; and
a storage queue sub-module configured to store a running instruction queue, the running instruction queue including the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions are arranged in the running instruction queue in the order in which they are to be executed.
Clause N8. The system according to Clause N7, wherein the execution module includes:
a dependency processing sub-module configured to cache a first parsed instruction in the instruction storage sub-module when it is determined that the first parsed instruction is associated with a zeroth parsed instruction preceding it, and, after the zeroth parsed instruction has finished executing, extract the first parsed instruction from the instruction storage sub-module and send it to the execution module,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding it includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
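The association test in Clause N8 reduces to an interval-overlap check on the two instructions' storage address ranges. A minimal sketch follows; representing a storage address interval as a half-open `(start, end)` pair is an assumption of this sketch, since the patent does not fix a representation.

```python
# Dependency test from Clause N8: the first parsed instruction must wait
# for the zeroth one when their required data's storage address intervals
# share any region.

def intervals_overlap(first, zeroth):
    """True when half-open ranges [start, end) share at least one address."""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end

# First parsed instruction reads 0x100..0x140; zeroth writes 0x120..0x160.
# The regions overlap, so the first instruction is cached until the
# zeroth finishes executing.
print(intervals_overlap((0x100, 0x140), (0x120, 0x160)))
# Adjacent but non-overlapping ranges carry no association.
print(intervals_overlap((0x100, 0x120), (0x120, 0x160)))
```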
Clause N9. The system according to Clause N1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is provided in a CPU and/or an NPU; and
the scalar logic calculation macro instruction includes at least one of the following: a scalar AND calculation macro instruction, a scalar OR calculation macro instruction, a scalar NOT calculation macro instruction, and a scalar comparison calculation macro instruction.
Clause N10. A machine learning operation device, the device including:
one or more scalar logic calculation instruction processing systems according to any one of Clauses N1 to N9, configured to obtain data to be operated on and control information from other processing devices, perform a specified machine learning operation, and transfer the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes multiple scalar logic calculation instruction processing systems, the multiple scalar logic calculation instruction processing systems may be connected to one another and transmit data through a specific structure,
wherein the multiple scalar logic calculation instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple scalar logic calculation instruction processing systems share one control system or have their own control systems; the multiple scalar logic calculation instruction processing systems share memory or have their own memories; and the multiple scalar logic calculation instruction processing systems may be interconnected in an arbitrary interconnection topology.
Clause N11. A combined processing device, the combined processing device including:
the machine learning operation device according to Clause N10, a universal interconnection interface, and another processing device,
wherein the machine learning operation device interacts with the other processing device to jointly complete a computing operation specified by a user,
and wherein the combined processing device further includes: a storage device connected to both the machine learning operation device and the other processing device and configured to save data of the machine learning operation device and the other processing device.
Clause N12. A board card, the board card including: a storage component, an interface device, a control component, and a machine learning chip, the machine learning chip including the machine learning operation device according to Clause N10 or the combined processing device according to Clause N11,
wherein the machine learning chip is connected to the storage component, the control component, and the interface device respectively;
the storage component is configured to store data;
the interface device is configured to implement data transmission between the machine learning chip and an external device; and
the control component is configured to monitor a state of the machine learning chip.
Clause N13. A scalar logic calculation instruction processing method, the method being applied to a scalar logic calculation instruction processing system, the system including an instruction generation device and a running device, and the method including:
determining, by the instruction generation device according to a received scalar logic calculation macro instruction, a running device for executing the scalar logic calculation macro instruction, and generating a running instruction according to the scalar logic calculation macro instruction and the running device; and
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain multiple parsed instructions, and executing the multiple parsed instructions according to the data to obtain an execution result,
wherein the scalar logic calculation macro instruction refers to a macro instruction used to perform a logical operation on scalars,
the scalar logic calculation macro instruction includes an operation type, a first operand, a second operand, and an output address, and
the running instruction includes the operation type, a first running operand, a second running operand, and a running output address, where the first running operand, the second running operand, and the running output address are determined according to the first operand, the second operand, and the output address respectively.
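The field layout in Clause N13 can be illustrated as a small record transformation: the operation type is carried over unchanged, while the running operands and running output address are derived from the macro instruction's fields for the chosen running device. The flat base-offset relocation below is an assumption of this sketch; the clause only says the running fields are "determined according to" the macro fields.

```python
# Illustrative expansion of a scalar logic calculation macro instruction
# (operation type, first operand, second operand, output address) into a
# running instruction for a chosen running device.
from collections import namedtuple

Macro = namedtuple("Macro", "op_type operand1 operand2 out_addr")
Running = namedtuple("Running", "op_type operand1 operand2 out_addr")

def to_running_instruction(macro, device_base):
    # Operation type is preserved; the output address is rebased into the
    # running device's address space (relocation scheme assumed).
    return Running(macro.op_type,
                   macro.operand1,
                   macro.operand2,
                   macro.out_addr + device_base)

run = to_running_instruction(Macro("scalar_and", 1, 0, 0x40),
                             device_base=0x1000)
print(run.op_type, hex(run.out_addr))
```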
Clause N14. The method according to Clause N13, further including:
obtaining, by the instruction generation device, resource information of candidate devices,
wherein determining, by the instruction generation device according to the received scalar logic calculation macro instruction, the running device for executing the scalar logic calculation macro instruction includes at least one of the following:
determining the specified device as the running device when it is determined that the scalar logic calculation macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the scalar logic calculation macro instruction;
determining, from the candidate devices, a running device for executing the scalar logic calculation macro instruction according to the received scalar logic calculation macro instruction and the resource information of the candidate devices when it is determined that the scalar logic calculation macro instruction does not include the identifier of the specified device; and
determining the running device according to the scalar logic calculation macro instruction and the resource information of the candidate devices when it is determined that the scalar logic calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar logic calculation macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the scalar logic calculation macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause N15. The method according to Clause N13, further including:
sorting, by the instruction generation device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
Clause N16. The method according to Clause N14, further including:
receiving, by the instruction generation device, a scalar logic calculation instruction to be executed, and generating the scalar logic calculation macro instruction according to the determined identifier of the specified device and the scalar logic calculation instruction to be executed.
Clause N17. The method according to Clause N13, further including:
sending, by the instruction generation device, the running instruction to the running device so that the running device executes the running instruction,
wherein sending, by the instruction generation device, the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause N18. The method according to Clause N13, further including:
storing, by the running device, the data and the scalar data in the data,
wherein the running device includes a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
the cache being configured to store the data, and
the register being configured to store the scalar data in the data.
Clause N19. The method according to Clause N13, further including:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the multiple parsed instructions; and
storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions are arranged in the running instruction queue in the order in which they are to be executed.
Clause N20. The method according to Clause N19, further including:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction is associated with a zeroth parsed instruction preceding it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding it includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause N21. The method according to Clause N13, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is provided in a CPU and/or an NPU; and
the scalar logic calculation macro instruction includes at least one of the following: a scalar AND calculation macro instruction, a scalar OR calculation macro instruction, a scalar NOT calculation macro instruction, and a scalar comparison calculation macro instruction.
Control instructions can be processed by the following control instruction generation device and method, control instruction processing system and method, and related products; the foregoing can be better understood with reference to the following clauses:
Clause O1. A control instruction generation device, the device including:
a device determination module configured to determine, according to a received control macro instruction, a running device for executing the control macro instruction; and
an instruction generation module configured to generate a running instruction according to the control macro instruction and the running device,
wherein the control macro instruction refers to a macro instruction used to make the instruction flow jump to a target jump position, and
the control macro instruction includes an operation type and a target jump position, the running instruction includes the operation type and a running target jump position, and the running target jump position is determined according to the target jump position.
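Clause O1's generation step can be sketched as follows: the running instruction keeps the macro's operation type while its running target jump position is derived from the macro's target. Representing jump positions as instruction-queue indices, and deriving the running target by a queue offset, are both assumptions of this sketch.

```python
# Hedged sketch of generating a running instruction from a control macro
# instruction per Clause O1. A control macro is a (operation_type,
# target_jump_position) pair; the running target jump position is
# determined from the macro's target (here, rebased by the position of
# the instruction sequence inside the device's instruction queue).

def generate_running_jump(control_macro, queue_offset):
    op_type, target = control_macro
    # Operation type carried over; running target derived from target.
    return (op_type, target + queue_offset)

running = generate_running_jump(("jump", 5), queue_offset=2)
print(running)
```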
Clause O2. The device according to Clause O1, wherein the device determination module includes:
a first determination sub-module configured to determine the specified device as the running device when it is determined that the control macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the control macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the control macro instruction.
Clause O3. The device according to Clause O2, further including:
a resource obtaining module configured to obtain resource information of candidate devices,
wherein the device determination module further includes:
a second determination sub-module configured to determine, from the candidate devices, a running device for executing the control macro instruction according to the operation type of the received control macro instruction and the resource information of the candidate devices when it is determined that the control macro instruction does not include the identifier of the specified device,
wherein the resource information includes the instruction set contained in each candidate device.
Clause O4. The device according to Clause O3, wherein the device determination module further includes:
a third determination sub-module configured to determine the running device according to the control macro instruction and the resource information of the candidate devices when it is determined that the control macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the control macro instruction.
Clause O5. The device according to Clause O1, further including:
a queue construction module configured to sort the running instructions according to a queue sorting rule and construct an instruction queue corresponding to the running device from the sorted running instructions.
Clause O6. The device according to Clause O2, further including:
a macro instruction generation module configured to receive a control instruction to be executed and generate the control macro instruction according to the determined identifier of the specified device and the control instruction to be executed.
Clause O7. The device according to Clause O1, further including:
an instruction dispatch module configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly sub-module configured to generate an assembly file according to the running instruction;
an assembly translation sub-module configured to translate the assembly file into a binary file; and
an instruction sending sub-module configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause O8. The device according to Clause O1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the device is provided in a CPU and/or an NPU; and
the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction, the conditional jump macro instruction including a jump condition.
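The distinction Clause O8 draws between the two jump kinds can be shown with a toy instruction-flow loop: an unconditional jump always redirects the flow to its target position, while a conditional jump does so only when its jump condition holds. The instruction encoding and the single boolean flag standing in for the jump condition are hypothetical.

```python
# Toy interpreter illustrating unconditional vs conditional jump macro
# instructions redirecting the instruction flow (Clause O8).

def run(program, flag):
    """Execute `program` and return the trace of visited positions.

    program: list of tuples; ("jump", t) is unconditional, ("jump_if", t)
    jumps only when `flag` (the stand-in jump condition) is true.
    """
    pc, trace = 0, []
    while pc < len(program):
        instr = program[pc]
        trace.append(pc)
        if instr[0] == "jump":                # unconditional: always taken
            pc = instr[1]
        elif instr[0] == "jump_if" and flag:  # conditional: taken only if
            pc = instr[1]                     # the jump condition holds
        else:                                 # otherwise fall through
            pc += 1
    return trace

program = [("nop",), ("jump_if", 3), ("nop",), ("halt",)]
print(run(program, flag=True))   # condition holds: position 2 is skipped
print(run(program, flag=False))  # condition fails: flow falls through
```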
Clause O9. A machine learning operation device, the device including:
one or more control instruction generation devices according to any one of Clauses O1 to O8, configured to obtain data to be operated on and control information from other processing devices, perform a specified machine learning operation, and transfer the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes multiple control instruction generation devices, the multiple control instruction generation devices may be connected to one another and transmit data through a specific structure,
wherein the multiple control instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple control instruction generation devices share one control system or have their own control systems; the multiple control instruction generation devices share memory or have their own memories; and the multiple control instruction generation devices may be interconnected in an arbitrary interconnection topology.
Clause O10. A combined processing device, the device including:
the machine learning operation device according to Clause O9, a universal interconnection interface, and another processing device,
wherein the machine learning operation device interacts with the other processing device to jointly complete a computing operation specified by a user,
and wherein the combined processing device further includes: a storage device connected to both the machine learning operation device and the other processing device and configured to save data of the machine learning operation device and the other processing device.
Clause O11. A board card, the board card including: a storage component, an interface device, a control component, and a machine learning chip, the machine learning chip including the machine learning operation device according to Clause O9 or the combined processing device according to Clause O10,
wherein the machine learning chip is connected to the storage component, the control component, and the interface device respectively;
the storage component is configured to store data;
the interface device is configured to implement data transmission between the machine learning chip and an external device; and
the control component is configured to monitor a state of the machine learning chip.
Clause O12. A control instruction generation method, the method including:
determining, according to a received control macro instruction, a running device for executing the control macro instruction; and
generating a running instruction according to the control macro instruction and the running device,
wherein the control macro instruction refers to a macro instruction used to make the instruction flow jump to a target jump position, and
the control macro instruction includes an operation type and a target jump position, the running instruction includes the operation type and a running target jump position, and the running target jump position is determined according to the target jump position.
Clause O13. The method according to Clause O12, wherein determining, according to the received control macro instruction, the running device for executing the control macro instruction includes:
determining the specified device as the running device when it is determined that the control macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the control macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the control macro instruction.
Clause O14. The method according to Clause O13, further including:
obtaining resource information of candidate devices,
wherein determining, according to the received control macro instruction, the running device for executing the control macro instruction includes:
determining, from the candidate devices, a running device for executing the control macro instruction according to the received control macro instruction and the resource information of the candidate devices when it is determined that the control macro instruction does not include the identifier of the specified device,
wherein the resource information includes the instruction set contained in each candidate device.
Clause O15. The method according to Clause O14, wherein determining, according to the received control macro instruction, the running device for executing the control macro instruction includes:
determining the running device according to the control macro instruction and the resource information of the candidate devices when it is determined that the control macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the control macro instruction.
Clause O16. The method according to Clause O12, further including:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
Clause O17. The method according to Clause O13, further including:
receiving a control instruction to be executed, and generating the control macro instruction according to the determined identifier of the specified device and the control instruction to be executed.
Clause O18. The method according to Clause O12, further including:
sending the running instruction to the running device so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause O19. The method according to Clause O12, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU; and
the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction, the conditional jump macro instruction including a jump condition.
Clause P1. A control instruction processing system, the system including an instruction generation device and a running device,
the instruction generation device including:
a device determination module configured to determine, according to a received control macro instruction, a running device for executing the control macro instruction; and
an instruction generation module configured to generate a running instruction according to the control macro instruction and the running device;
the running device including:
a control module configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain multiple parsed instructions; and
an execution module configured to execute the multiple parsed instructions according to the data to obtain an execution result,
wherein the control macro instruction refers to a macro instruction used to make the instruction flow jump to a target jump position, and
the control macro instruction includes an operation type and a target jump position, the running instruction includes the operation type and a running target jump position, and the running target jump position is determined according to the target jump position.
条款P2、根据条款P1所述的系统,所述指令生成设备还包括:Clause P2. The system according to Clause P1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块，用于在确定所述控制宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述控制宏指令的执行条件时，将所述指定设备确定为所述运行设备；The first determining sub-module is configured to determine the designated device as the running device when it is determined that the control macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the control macro instruction;
第二确定子模块，用于在确定所述控制宏指令中不包含所述指定设备的标识时，根据接收到的控制宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述控制宏指令的运行设备；The second determining sub-module is configured to, when it is determined that the control macro instruction does not include the identifier of the designated device, determine, from the candidate devices, a running device for executing the control macro instruction according to the received control macro instruction and the resource information of the candidate devices;
第三确定子模块，在确定所述控制宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述控制宏指令的执行条件时，根据所述控制宏指令和所述备选设备的资源信息，确定运行设备，The third determining sub-module is configured to, when it is determined that the control macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the control macro instruction, determine the running device according to the control macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述控制宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the control macro instruction, and the resource information includes an instruction set included in the candidate device.
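The device-determination logic of Clause P2 (designated device first, then fallback to candidates whose instruction set meets the execution condition) can be sketched as follows. This is a minimal illustration; all field names (`required_isa`, `designated_id`, `isa`) are assumptions, not terms from the clauses.

```python
def determine_running_device(macro, candidates):
    """Select the device that executes a control macro instruction (Clause P2 sketch).

    A device satisfies the execution condition when its instruction set
    contains the instruction set corresponding to the macro instruction.
    """
    def satisfies(device):
        # resource information: the instruction set the device supports
        return macro["required_isa"] <= device["isa"]

    designated = macro.get("designated_id")
    if designated is not None and designated in candidates:
        # first determining sub-module: the designated device meets the condition
        if satisfies(candidates[designated]):
            return designated
    # second / third determining sub-modules: pick a running device from
    # the candidate devices according to their resource information
    for dev_id, device in candidates.items():
        if satisfies(device):
            return dev_id
    raise RuntimeError("no candidate device can execute the control macro instruction")
```

For example, if the designated device lacks the required instruction set, the fallback path selects another candidate that has it.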
条款P3、根据条款P1所述的系统,所述指令生成设备还包括:Clause P3. The system according to Clause P1, the instruction generating device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款P4、根据条款P2所述的系统,所述指令生成设备还包括:Clause P4. The system according to Clause P2, the instruction generating device further includes:
宏指令生成模块，用于接收待执行控制指令，根据确定的指定设备的标识和所述待执行控制指令生成所述控制宏指令。The macro instruction generation module is configured to receive a to-be-executed control instruction and generate the control macro instruction according to the determined identifier of the designated device and the to-be-executed control instruction.
条款P5、根据条款P1所述的系统,所述指令生成设备还包括:Clause P5. The system according to Clause P1, the instruction generating device further includes:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
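Clause P5's dispatch module is a three-stage pipeline: assemble the running instructions into an assembly file, translate the assembly into a binary file, and send the binary to the running device. The sketch below illustrates that pipeline shape only; the textual assembly format and the UTF-8 "translation" are stand-in assumptions, not the real toolchain.

```python
def dispatch(run_instructions, send_to_device):
    """Sketch of the instruction dispatch module of Clause P5."""
    # instruction assembly sub-module: one assembly line per running instruction
    assembly = "\n".join(" ".join([op, *map(str, args)]) for op, *args in run_instructions)
    # assembly translation sub-module: translate the assembly file into a binary file
    binary = assembly.encode("utf-8")
    # instruction sending sub-module: deliver the binary file to the running device
    send_to_device(binary)
    return binary
```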
条款P6、根据条款P1所述的系统,所述运行设备,还包括:Clause P6. The system according to Clause P1, the operation device, further comprising:
存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,A storage module, the storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款P7、根据条款P1所述的系统,所述控制模块,包括:Clause P7. The system according to Clause P1, the control module includes:
指令存储子模块,用于存储所述运行指令;An instruction storage sub-module for storing the operation instruction;
指令处理子模块,用于对所述运行指令进行解析,得到所述多个解析指令;An instruction processing sub-module, which is used to parse the running instruction to obtain the multiple parsing instructions;
存储队列子模块，用于存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中的所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。A storage queue sub-module is configured to store a running instruction queue, the running instruction queue including the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款P8、根据条款P7所述的系统,所述执行模块,包括:Clause P8. The system according to Clause P7, the execution module includes:
依赖关系处理子模块，用于在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，将所述第一解析指令缓存在所述指令存储子模块中，在所述第零解析指令执行完毕后，从所述指令存储子模块中提取所述第一解析指令发送至所述执行模块，The dependency processing sub-module is configured to, when it is determined that a first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, cache the first parsed instruction in the instruction storage sub-module, and after the zeroth parsed instruction has been executed, fetch the first parsed instruction from the instruction storage sub-module and send it to the execution module,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
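The association test of Clause P8 reduces to an interval-overlap check on storage addresses. A minimal sketch, assuming intervals are `(start, end)` pairs with an exclusive end (the clause does not specify a representation):

```python
def has_dependency(first_interval, zeroth_interval):
    """Clause P8 sketch: the first parsed instruction must wait for the zeroth
    one when the storage address interval holding the data it needs overlaps
    the interval holding the zeroth instruction's data."""
    first_start, first_end = first_interval
    zeroth_start, zeroth_end = zeroth_interval
    # the two address intervals have a non-empty overlapping region
    return first_start < zeroth_end and zeroth_start < first_end
```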
条款P9、根据条款P1所述的系统,Clause P9, the system described in Clause P1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,所述有条件跳转宏指令包含跳转条件。The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction, and the conditional jump macro instruction includes a jump condition.
条款P10、一种机器学习运算装置,所述装置包括:Clause P10. A machine learning computing device, the device comprising:
一个或多个如条款P1-条款P9任一项所述的控制指令处理系统，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more control instruction processing systems according to any one of Clause P1 to Clause P9, configured to obtain to-be-computed data and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
当所述机器学习运算装置包含多个所述控制指令处理系统时,所述多个所述控制指令处理系统间可以通过特定的结构进行连接并传输数据;When the machine learning computing device includes a plurality of the control instruction processing systems, the plurality of the control instruction processing systems can be connected and transmit data through a specific structure;
其中，多个所述控制指令处理系统通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述控制指令处理系统共享同一控制系统或拥有各自的控制系统；多个所述控制指令处理系统共享内存或者拥有各自的内存；多个所述控制指令处理系统的互联方式是任意互联拓扑。The multiple control instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple control instruction processing systems share one control system or have their own control systems; the multiple control instruction processing systems share memory or have their own memories; and the interconnection mode of the multiple control instruction processing systems is any interconnection topology.
条款P11、一种组合处理装置,所述组合处理装置包括:Clause P11. A combined processing device, the combined processing device comprising:
如条款P10所述的机器学习运算装置、通用互联接口和其他处理装置；The machine learning computation apparatus according to Clause P10, a universal interconnection interface, and other processing apparatuses;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款P12、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款P10所述的机器学习运算装置或如条款P11所述的组合处理装置；Clause P12. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computation apparatus according to Clause P10 or the combined processing apparatus according to Clause P11;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款P13、一种控制指令处理方法,所述方法应用于控制指令处理系统,所述系统包括指令生成设备和运行设备,所述方法包括:Clause P13. A control instruction processing method. The method is applied to a control instruction processing system. The system includes an instruction generation device and an operation device. The method includes:
通过所述指令生成设备根据接收到的控制宏指令,确定执行所述控制宏指令的运行设备,并根据所述控制宏指令和所述运行设备,生成运行指令;Determining, by the instruction generating device, a running device that executes the control macro instruction according to the received control macro instruction, and generating a running instruction according to the control macro instruction and the running device;
通过所述运行设备获取数据、神经网络模型以及运行指令,对所述运行指令进行解析,获得多个解析指令,并根据所述数据执行所述多个解析指令,得到执行结果,Acquiring data, a neural network model, and operating instructions through the operating device, parsing the operating instructions, obtaining multiple analytical instructions, and executing the multiple analytical instructions based on the data, to obtain an execution result,
其中,所述控制宏指令是指用于控制指令流跳转至目标跳转位置的宏指令,Wherein, the control macro instruction refers to a macro instruction used to control the instruction flow to jump to the target jump position,
所述控制宏指令包含操作类型和目标跳转位置,所述运行指令包含所述操作类型和运行目标跳转位置,所述运行目标跳转位置是根据所述目标跳转位置确定的。The control macro instruction includes an operation type and a target jump position, and the run instruction includes the operation type and a run target jump position, and the run target jump position is determined according to the target jump position.
条款P14、根据条款P13所述的方法,所述方法还包括:Clause P14. The method according to Clause P13, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的控制宏指令,确定执行所述控制宏指令的运行设备,包括以下至少一项:Wherein, the instruction generating device determines the operating device that executes the control macro instruction according to the received control macro instruction, including at least one of the following:
在确定所述控制宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述控制宏指令的执行条件时,将所述指定设备确定为所述运行设备;When it is determined that the control macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution conditions for executing the control macro instruction, the designated device is determined as the running device;
在确定所述控制宏指令中不包含所述指定设备的标识时，根据接收到的控制宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述控制宏指令的运行设备；When it is determined that the control macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the control macro instruction according to the received control macro instruction and the resource information of the candidate devices;
在确定所述控制宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述控制宏指令的执行条件时，根据所述控制宏指令和所述备选设备的资源信息，确定运行设备，When it is determined that the control macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the control macro instruction, determining the running device according to the control macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述控制宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the control macro instruction, and the resource information includes an instruction set included in the candidate device.
条款P15、根据条款P13所述的方法,所述方法还包括:Clause P15. The method according to Clause P13, the method further comprising:
通过所述指令生成设备根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
条款P16、根据条款P14所述的方法,所述方法还包括:Clause P16. The method according to Clause P14, the method further comprising:
通过所述指令生成设备接收待执行控制指令,根据确定的指定设备的标识和所述待执行控制指令生成所述控制宏指令。Receiving the control instruction to be executed through the instruction generating device, and generating the control macro instruction according to the determined identifier of the designated device and the control instruction to be executed.
条款P17、根据条款P13所述的方法,所述方法还包括:Clause P17. The method according to Clause P13, the method further comprising:
通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction,
其中,通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction, includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款P18、根据条款P13所述的方法,所述方法还包括:Clause P18. The method according to Clause P13, the method further comprising:
通过所述运行设备存储所述数据以及所述数据中的标量数据,Storing the data and the scalar data in the data through the operating device,
其中,所述运行设备包括存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,Wherein, the operation device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款P19、根据条款P13所述的方法,所述方法还包括:Clause P19. The method according to Clause P13, the method further comprising:
通过所述运行设备存储所述运行指令;Storing the operation instruction through the operation device;
通过所述运行设备对所述运行指令进行解析,得到所述多个解析指令;Analyzing the operation instruction by the operation device to obtain the multiple analysis instructions;
通过所述运行设备存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中的所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。Storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款P20、根据条款P19所述的方法,所述方法还包括:Clause P20. The method according to Clause P19, the method further comprising:
通过所述运行设备在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，缓存所述第一解析指令，在所述第零解析指令执行完毕后，执行缓存的所述第一解析指令，Caching, by the running device, the first parsed instruction when it is determined that a first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, and executing the cached first parsed instruction after the zeroth parsed instruction has been executed,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
条款P21、根据条款P13所述的方法,Clause P21, according to the method described in Clause P13,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,所述有条件跳转宏指令包含跳转条件。The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction, and the conditional jump macro instruction includes a jump condition.
对于数据读入指令，可以通过下述数据读入指令生成装置、方法，以及数据读入指令处理系统、方法等相关产品进行处理，依据以下条款可更好地理解前述内容：A data read-in instruction may be processed by the following data read-in instruction generation apparatus and method, as well as the data read-in instruction processing system, method, and other related products; the foregoing content can be better understood according to the following clauses:
条款Q1、一种数据读入指令生成装置,所述装置包括:Clause Q1, a data reading instruction generation device, the device comprising:
设备确定模块,用于根据接收到的读宏指令,确定执行所述读宏指令的运行设备;A device determination module, configured to determine a running device that executes the read macro instruction based on the received read macro instruction;
指令生成模块,用于根据所述读宏指令和所述运行设备,生成运行指令,An instruction generation module, configured to generate an operation instruction based on the read macro instruction and the operation device,
其中,所述读宏指令是指用于进行数据读入的宏指令,Wherein, the read macro instruction refers to a macro instruction used for data reading,
所述读宏指令包含操作类型、数据读入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据读入地址和运行数据加密方式地址，所述运行数据读入地址和运行数据加密方式地址分别是根据所述数据读入地址和所述数据加密方式地址确定的。The read macro instruction includes an operation type, a data read-in address, and a data encryption mode address; the running instruction includes the operation type, a run data read-in address, and a run data encryption mode address, where the run data read-in address and the run data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
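Clause Q1 lowers a read macro instruction into a running instruction that carries the operation type plus run-time addresses "determined according to" the macro's addresses. The sketch below illustrates one way that determination could look; rebasing by a device-local offset and all field names are purely illustrative assumptions.

```python
def generate_run_instruction(read_macro, device_base=0):
    """Clause Q1 sketch: produce a running instruction from a read macro instruction."""
    return {
        "op": read_macro["op"],  # operation type carried over unchanged
        # run data read-in address, derived from the macro's data read-in address
        "run_read_addr": read_macro["read_addr"] + device_base,
        # run data encryption mode address, derived from the macro's encryption mode address
        "run_encrypt_addr": read_macro["encrypt_addr"] + device_base,
    }
```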
条款Q2、根据条款Q1所述的装置,所述设备确定模块,包括:Clause Q2. The apparatus according to Clause Q1, the device determination module includes:
第一确定子模块，用于在确定所述读宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述读宏指令的执行条件时，将所述指定设备确定为所述运行设备，The first determining sub-module is configured to determine the designated device as the running device when it is determined that the read macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the read macro instruction,
其中,所述执行条件包括:所述指定设备中包含与所述读宏指令的操作类型相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the read macro instruction.
条款Q3、根据条款Q2所述的装置,所述装置还包括:Clause Q3. The device according to Clause Q2, the device further comprising:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,还包括:The device determination module also includes:
第二确定子模块，用于在确定所述读宏指令中不包含所述指定设备的标识时，根据接收到的读宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述读宏指令的运行设备，The second determining sub-module is configured to, when it is determined that the read macro instruction does not include the identifier of the designated device, determine, from the candidate devices, a running device for executing the read macro instruction according to the received read macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款Q4、根据条款Q3所述的装置,所述设备确定模块,还包括:Clause Q4. The apparatus according to Clause Q3, the device determination module, further comprising:
第三确定子模块，在确定所述读宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述读宏指令的执行条件时，根据所述读宏指令和所述备选设备的资源信息，确定运行设备。The third determining sub-module is configured to, when it is determined that the read macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the read macro instruction, determine the running device according to the read macro instruction and the resource information of the candidate devices.
条款Q5、根据条款Q2-条款Q4任一项所述的装置,所述读宏指令还包含读入量,Clause Q5. The device according to any one of Clause Q2-Clause Q4, the read macro instruction further includes a read-in amount,
所述指令生成模块,还用于确定所述读宏指令的数据量,根据所述数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is also used to determine the data amount of the read macro instruction, and generate an operation instruction according to the data amount, the read macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述读入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to the read-in amount, and the resource information of the operating device further includes at least one of storage capacity and remaining storage capacity.
条款Q6、根据条款Q5所述的装置,所述指令生成模块,包括:Clause Q6. The apparatus according to Clause Q5, the instruction generation module includes:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述读宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述读宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，The first instruction generation sub-module is configured to, when it is determined that there is one running device and the run data amount of the running device is smaller than the data amount of the read macro instruction, split the read macro instruction into multiple running instructions according to the run data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令还包含运行读入量，所述运行读入量是根据所述运行数据量确定的。The run data amount of the running device is determined according to the resource information of the running device, and each running instruction further includes a run read-in amount, the run read-in amount being determined according to the run data amount.
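The single-device split of Clause Q6 can be sketched as chunking the macro's total data amount into pieces no larger than the device's run data amount, each piece becoming one running instruction with its own run read-in amount. The `offset` field is an illustrative assumption; the clause does not define the instruction layout.

```python
def split_read_macro(total_data_amount, run_data_amount):
    """Clause Q6 sketch: split one read macro instruction into sequential
    running instructions for a single running device."""
    run_instructions = []
    offset = 0
    while offset < total_data_amount:
        # each running instruction reads at most the device's run data amount
        amount = min(run_data_amount, total_data_amount - offset)
        run_instructions.append({"offset": offset, "run_read_amount": amount})
        offset += amount
    return run_instructions
```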
条款Q7、根据条款Q5所述的装置,所述指令生成模块,包括:Clause Q7. The device according to Clause Q5, the instruction generation module includes:
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述读宏指令进行拆分，生成对应于每个运行设备的运行指令，The second instruction generation sub-module is configured to, when it is determined that there are multiple running devices, split the read macro instruction according to the run data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行读入量，所述运行读入量是根据执行所述运行指令的运行设备的运行数据量确定的。The run data amount of each running device is determined according to the resource information of each running device, and the running instruction further includes a run read-in amount, the run read-in amount being determined according to the run data amount of the running device executing the running instruction.
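For the multi-device case of Clause Q7, one possible split assigns each running device a run read-in amount bounded by that device's own run data amount until the macro's total data amount is covered. The greedy, in-order allocation policy below is an illustrative assumption; the clause mandates only that the split respects each device's run data amount.

```python
def split_across_devices(total_data_amount, run_data_amounts):
    """Clause Q7 sketch: split a read macro instruction across several
    running devices, one running instruction per device."""
    plan, remaining = {}, total_data_amount
    for device_id, capacity in run_data_amounts.items():
        if remaining == 0:
            break
        # run read-in amount is capped by this device's run data amount
        amount = min(capacity, remaining)
        plan[device_id] = amount
        remaining -= amount
    if remaining:
        raise ValueError("combined run data amounts cannot cover the read macro instruction")
    return plan
```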
条款Q8、根据条款Q1所述的装置,所述装置还包括:Clause Q8. The device according to Clause Q1, the device further comprising:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款Q9、根据条款Q2所述的装置,所述装置还包括:Clause Q9. The device according to Clause Q2, the device further comprising:
宏指令生成模块,用于接收待执行读指令,根据确定的指定设备的标识和所述待执行读指令生成所述读宏指令。The macro generation module is used to receive the read instruction to be executed, and generate the read macro instruction according to the determined identifier of the designated device and the read instruction to be executed.
条款Q10、根据条款Q1所述的装置,所述装置还包括:Clause Q10. The device according to Clause Q1, the device further comprising:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款Q11、根据条款Q1所述的装置,Clause Q11, the device according to Clause Q1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述装置设置于CPU和/或NPU中;The device is set in the CPU and / or NPU;
所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。The read macro instruction includes at least one of read neuron macro instruction, read synapse macro instruction and read scalar macro instruction.
条款Q12、一种机器学习运算装置,所述装置包括:Clause Q12. A machine learning computing device, the device comprising:
一个或多个如条款Q1-条款Q11任一项所述的数据读入指令生成装置，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more data read-in instruction generation apparatuses according to any one of Clause Q1 to Clause Q11, configured to obtain to-be-computed data and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
当所述机器学习运算装置包含多个所述数据读入指令生成装置时,所述多个所述数据读入指令生成装置间可以通过特定的结构进行连接并传输数据;When the machine learning operation device includes a plurality of the data reading instruction generating devices, the plurality of data reading instruction generating devices may be connected and transmit data through a specific structure;
其中，多个所述数据读入指令生成装置通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述数据读入指令生成装置共享同一控制系统或拥有各自的控制系统；多个所述数据读入指令生成装置共享内存或者拥有各自的内存；多个所述数据读入指令生成装置的互联方式是任意互联拓扑。The multiple data read-in instruction generation apparatuses are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple data read-in instruction generation apparatuses share one control system or have their own control systems; the multiple data read-in instruction generation apparatuses share memory or have their own memories; and the interconnection mode of the multiple data read-in instruction generation apparatuses is any interconnection topology.
条款Q13、一种组合处理装置,所述装置包括:Clause Q13. A combined processing device, the device comprising:
如条款Q12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing devices, general interconnection interfaces and other processing devices as described in clause Q12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款Q14、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款Q12所述的机器学习运算装置或如条款Q13所述的组合处理装置；Clause Q14. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computation apparatus according to Clause Q12 or the combined processing apparatus according to Clause Q13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款Q15、一种数据读入指令生成方法,所述方法包括:Article Q15. A method for generating a data reading instruction. The method includes:
根据接收到的读宏指令,确定执行所述读宏指令的运行设备;According to the received macro read command, determine the operating device that executes the macro read command;
根据所述读宏指令和所述运行设备,生成运行指令,Generate an operation instruction according to the read macro instruction and the operation device,
其中,所述读宏指令是指用于进行数据读入的宏指令,Wherein, the read macro instruction refers to a macro instruction used for data reading,
所述读宏指令包含操作类型、数据读入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据读入地址和运行数据加密方式地址，所述运行数据读入地址和运行数据加密方式地址分别是根据所述数据读入地址和所述数据加密方式地址确定的。The read macro instruction includes an operation type, a data read-in address, and a data encryption mode address; the running instruction includes the operation type, a run data read-in address, and a run data encryption mode address, where the run data read-in address and the run data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
条款Q16、根据条款Q15所述的方法,根据接收到的读宏指令,确定执行所述读宏指令的运行设备,包括:Clause Q16. According to the method described in Clause Q15, based on the received read macro instruction, determining the operating device that executes the read macro instruction includes:
在确定所述读宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述读宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the read macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution conditions for executing the read macro instruction, determine the specified device as the running device,
其中,所述执行条件包括:所述指定设备中包含与所述读宏指令的操作类型相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the read macro instruction.
条款Q17、根据条款Q16所述的方法,所述方法还包括:Clause Q17. The method according to Clause Q16, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的读宏指令,确定执行所述读宏指令的运行设备,包括:Wherein, according to the received macro reading instruction, determining the running device for executing the macro reading instruction includes:
在确定所述读宏指令中不包含所述指定设备的标识时，根据接收到的读宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述读宏指令的运行设备，When it is determined that the read macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the read macro instruction according to the received read macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款Q18、根据条款Q17所述的方法,根据接收到的读宏指令,确定执行所述读宏指令的运行设备,包括:Clause Q18. According to the method described in Clause Q17, based on the received read macro instruction, determine the operating device that executes the read macro instruction, including:
在确定所述读宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述读宏指令的执行条件时，根据所述读宏指令和所述备选设备的资源信息，确定运行设备。When it is determined that the read macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the read macro instruction, the running device is determined according to the read macro instruction and the resource information of the candidate devices.
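The running-device determination of Clauses Q16 to Q18 can be summarized with the following illustrative sketch. It is an aid to understanding only, not part of the claims; the field names (`device_id`, `op_type`, `instruction_sets`) and device names are hypothetical.

```python
# Illustrative sketch of Clauses Q16-Q18: choosing the running device for a
# read macro instruction. All field and device names are hypothetical.

def determine_running_device(macro, candidates):
    """macro: dict with an 'op_type' and an optional 'device_id';
    candidates: dict mapping device id -> resource info (instruction sets)."""
    specified = macro.get("device_id")
    if specified is not None:
        resources = candidates.get(specified, {})
        # Execution condition: the specified device contains an instruction
        # set corresponding to the macro's operation type (Clause Q16).
        if macro["op_type"] in resources.get("instruction_sets", ()):
            return specified
        # Specified device unsuitable: fall through to candidate
        # selection, as in Clause Q18.
    # No usable specified device: select from candidates by their
    # resource information (instruction sets), as in Clause Q17.
    for dev, resources in candidates.items():
        if macro["op_type"] in resources.get("instruction_sets", ()):
            return dev
    return None
```

For example, a macro that names a device lacking the required instruction set would fall back to the first candidate that has it.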
条款Q19、根据条款Q16-条款Q18任一项所述的方法，所述读宏指令还包含读入量，Clause Q19. The method according to any one of Clauses Q16 to Q18, wherein the read macro instruction further includes a read-in amount,
根据所述读宏指令和所述运行设备,生成运行指令,包括:Generating an operation instruction according to the read macro instruction and the operation device, including:
确定所述读宏指令的数据量,根据所述数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the read macro instruction, and generate an operation instruction according to the data amount, the read macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述读入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to the read-in amount, and the resource information of the operating device further includes at least one of storage capacity and remaining storage capacity.
条款Q20、根据条款Q19所述的方法,根据所述读宏指令的数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause Q20. According to the method of Clause Q19, generating an operation instruction according to the data amount of the read macro instruction, the read macro instruction, and the resource information of the operation device includes:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述读宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述读宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，When it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the read macro instruction, the read macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
其中,所述运行设备的运行数据量是根据所述运行设备的资源信息确定的,每条运行指令还包含运行读入量,所述运行读入量是根据所述运行数据量确定的。Wherein, the operation data amount of the operation device is determined according to the resource information of the operation device, and each operation instruction further includes an operation reading amount, and the operation reading amount is determined according to the operation data amount.
条款Q21、根据条款Q19所述的方法,根据所述读宏指令的数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause Q21. According to the method described in Clause Q19, generating an operation instruction according to the data amount of the read macro instruction, the read macro instruction, and the resource information of the operation device includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述读宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operation devices, the read macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行读入量，所述运行读入量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of that running device; the running instruction further includes a running read-in amount, which is determined according to the running data amount of the running device that executes the running instruction.
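The splitting described in Clauses Q20 and Q21 can be sketched as follows. This is an interpretive example only; the representation of a run instruction as a `(device_index, read_in_amount)` pair is an assumption, not language from the claims.

```python
# Illustrative sketch of Clauses Q20-Q21: splitting one read macro
# instruction into several running instructions when a device's per-run
# data amount is smaller than the macro's total data amount.

def split_read_macro(total_amount, per_device_amounts):
    """Return a list of (device_index, read_in_amount) run instructions.

    total_amount: data amount of the read macro instruction.
    per_device_amounts: the running data amount each running device can
    handle per instruction, determined from its resource information.
    """
    runs, remaining, i = [], total_amount, 0
    n = len(per_device_amounts)
    while remaining > 0:
        dev = i % n  # cycle devices; with one device this repeats device 0
        # Each run instruction carries a running read-in amount bounded by
        # the device's running data amount (Clause Q20 / Q21).
        chunk = min(per_device_amounts[dev], remaining)
        runs.append((dev, chunk))
        remaining -= chunk
        i += 1
    return runs
```

With a single device of capacity 4 and a macro of data amount 10, the macro is split into three sequential run instructions; with two devices, the work is distributed across them.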
条款Q22、根据条款Q15所述的方法,所述方法还包括:Clause Q22. The method according to Clause Q15, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款Q23、根据条款Q16所述的方法,所述方法还包括:Clause Q23. The method according to Clause Q16, the method further comprising:
接收待执行读指令,根据确定的指定设备的标识和所述待执行读指令生成所述读宏指令。Receiving the read instruction to be executed, and generating the read macro instruction according to the determined identifier of the designated device and the read instruction to be executed.
条款Q24、根据条款Q15所述的方法,所述方法还包括:Clause Q24. The method according to Clause Q15, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
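The three-step dispatch of Clause Q24 (run instruction → assembly file → binary file → running device) can be sketched as below. The mnemonic format, the byte encoding, and the inbox abstraction are all hypothetical stand-ins; the claims do not specify any particular assembly syntax or transport.

```python
# Illustrative sketch of Clause Q24: render the running instruction as
# assembly, translate it to binary, and send it to the running device.
# Mnemonics, encoding, and the "inbox" transport are hypothetical.

def to_assembly(run_instruction):
    # Render one assembly line, e.g. "READ src=0x1000 enc=0x2000 amount=64".
    return "READ " + " ".join(f"{k}={v}" for k, v in run_instruction.items())

def assemble(asm_text):
    # Stand-in "translation" of the assembly file into a binary file:
    # here simply the UTF-8 bytes of the text.
    return asm_text.encode("utf-8")

def dispatch(binary, device_inbox):
    # Stand-in for sending the binary file to the running device.
    device_inbox.append(binary)

device_inbox = []
asm = to_assembly({"src": "0x1000", "enc": "0x2000", "amount": 64})
dispatch(assemble(asm), device_inbox)
```

The running device would then decode and execute the received binary according to the run instruction it encodes.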
条款Q25、根据条款Q15所述的方法,Clause Q25, the method according to Clause Q15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。The read macro instruction includes at least one of read neuron macro instruction, read synapse macro instruction and read scalar macro instruction.
条款R1、一种数据读入处理系统,所述系统包括指令生成设备和运行设备,Clause R1, a data reading processing system, the system includes an instruction generating device and an operating device,
所述指令生成设备,包括:The instruction generating device includes:
设备确定模块,用于根据接收到的读宏指令,确定执行所述读宏指令的运行设备;A device determination module, configured to determine a running device that executes the read macro instruction based on the received read macro instruction;
指令生成模块,用于根据所述读宏指令和所述运行设备,生成运行指令;An instruction generation module, configured to generate an operation instruction according to the read macro instruction and the operation device;
所述运行设备,包括:The operation equipment includes:
控制模块,用于获取所需数据、神经网络模型以及所述运行指令,对所述运行指令进行解析,获得多个解析指令;The control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
执行模块,用于根据所述数据执行所述多个解析指令,得到执行结果,An execution module, configured to execute the multiple parsing instructions based on the data to obtain an execution result,
其中,所述读宏指令是指用于进行数据读入的宏指令,Wherein, the read macro instruction refers to a macro instruction used for data reading,
所述读宏指令包含操作类型、数据读入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据读入地址和运行数据加密方式地址，所述运行数据读入地址和运行数据加密方式地址分别是根据所述数据读入地址和所述数据加密方式地址确定的。The read macro instruction includes an operation type, a data read-in address, and a data encryption mode address. The running instruction includes the operation type, a running data read-in address, and a running data encryption mode address, where the running data read-in address and the running data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
条款R2、根据条款R1所述的系统,所述指令生成设备还包括:Clause R2. The system according to Clause R1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块，用于在确定所述读宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述读宏指令的执行条件时，将所述指定设备确定为所述运行设备；A first determining submodule, configured to determine the designated device as the running device when it is determined that the read macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the read macro instruction;
第二确定子模块，用于在确定所述读宏指令中不包含所述指定设备的标识时，根据接收到的读宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述读宏指令的运行设备；A second determining submodule, configured to determine, from the candidate devices, the running device for executing the read macro instruction according to the received read macro instruction and the resource information of the candidate devices when it is determined that the read macro instruction does not include the identifier of the designated device;
第三确定子模块，在确定所述读宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述读宏指令的执行条件时，根据所述读宏指令和所述备选设备的资源信息，确定运行设备，A third determining submodule, configured to determine the running device according to the read macro instruction and the resource information of the candidate devices when it is determined that the read macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the read macro instruction,
其中,所述执行条件包括:所述指定设备中包含与所述读宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the read macro instruction, and the resource information includes an instruction set included in the candidate device.
条款R3、根据条款R2所述的系统,所述读宏指令还包含读入量,Clause R3, the system according to Clause R2, the read macro instruction also includes the read-in amount,
所述指令生成模块,还用于确定所述读宏指令的数据量,根据所述数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is also used to determine the data amount of the read macro instruction, and generate an operation instruction according to the data amount, the read macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述读入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to the read-in amount, and the resource information of the operating device further includes at least one of storage capacity and remaining storage capacity.
条款R4、根据条款R3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause R4. The system according to Clause R3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述读宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述读宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，A first instruction generating submodule, configured to, when it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the read macro instruction, split the read macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述读宏指令进行拆分，生成对应于每个运行设备的运行指令，A second instruction generating submodule, configured to, when it is determined that there are multiple running devices, split the read macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行读入量，所述运行读入量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of the running device; the running instruction further includes a running read-in amount, which is determined according to the running data amount of the running device that executes the running instruction.
条款R5、根据条款R1所述的系统,所述指令生成设备还包括:Clause R5. The system according to Clause R1, the instruction generation device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款R6、根据条款R2所述的系统,所述指令生成设备还包括:Clause R6. The system according to Clause R2, the instruction generating device further includes:
宏指令生成模块,用于接收待执行读指令,根据确定的指定设备的标识和所述待执行读指令生成所述读宏指令。The macro generation module is used to receive the read instruction to be executed, and generate the read macro instruction according to the determined identifier of the designated device and the read instruction to be executed.
条款R7、根据条款R1所述的系统,所述指令生成设备还包括:Clause R7. The system according to Clause R1, the instruction generating device further includes:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款R8、根据条款R1所述的系统,所述运行设备,还包括:Clause R8. The system according to Clause R1, the operating device, further comprising:
存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,A storage module, the storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款R9、根据条款R1所述的系统,所述控制模块,包括:Clause R9. The system according to Clause R1, the control module includes:
指令存储子模块,用于存储所述运行指令;An instruction storage sub-module for storing the operation instruction;
指令处理子模块,用于对所述运行指令进行解析,得到所述多个解析指令;An instruction processing sub-module, which is used to parse the running instruction to obtain the multiple parsing instructions;
存储队列子模块，用于存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中，所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。A storage queue submodule, configured to store a running instruction queue, the running instruction queue including the running instruction and the multiple parsing instructions, where the running instruction and the multiple parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款R10、根据条款R9所述的系统,所述执行模块,包括:Clause R10. The system according to Clause R9, the execution module includes:
依赖关系处理子模块，用于在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，将所述第一解析指令缓存在所述指令存储子模块中，在所述第零解析指令执行完毕后，从所述指令存储子模块中提取所述第一解析指令发送至所述执行模块，A dependency processing submodule, configured to buffer the first parsing instruction in the instruction storage submodule when it is determined that a first parsing instruction has an association relationship with a zeroth parsing instruction preceding the first parsing instruction, and, after the zeroth parsing instruction has been executed, to fetch the first parsing instruction from the instruction storage submodule and send it to the execution module,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
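The association relationship of Clause R10 reduces to an overlap test on two storage address intervals, which can be sketched as below. This is an interpretive example; the half-open `[start, end)` convention is an assumption, since the claims do not state whether interval endpoints are inclusive.

```python
# Illustrative sketch of Clause R10: the first parsing instruction depends
# on the zeroth one when the storage address interval of the data it needs
# overlaps the zeroth instruction's interval. Intervals are modeled here as
# half-open [start, end) ranges (an assumption, not claim language).

def intervals_overlap(first, zeroth):
    (a_start, a_end), (b_start, b_end) = first, zeroth
    # Two half-open intervals share a region iff each starts before the
    # other ends.
    return a_start < b_end and b_start < a_end

def must_wait(first_interval, zeroth_interval):
    # The first instruction is buffered until the zeroth finishes whenever
    # the two address intervals have an overlapping region.
    return intervals_overlap(first_interval, zeroth_interval)
```

For instance, intervals `[0x100, 0x200)` and `[0x180, 0x280)` overlap, so the later instruction would be buffered; adjacent intervals that merely touch do not.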
条款R11、根据条款R1所述的系统,Clause R11, the system according to Clause R1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。The read macro instruction includes at least one of read neuron macro instruction, read synapse macro instruction and read scalar macro instruction.
条款R12、一种机器学习运算装置,所述装置包括:Clause R12. A machine learning computing device, the device comprising:
一个或多个如条款R1-条款R11任一项所述的数据读入指令处理系统，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more data read-in instruction processing systems according to any one of Clauses R1 to R11, configured to obtain data to be operated on and control information from other processing devices, perform the specified machine learning operations, and transfer the execution results to the other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述数据读入指令处理系统时,所述多个所述数据读入指令处理系统间可以通过特定的结构进行连接并传输数据;When the machine learning computing device includes a plurality of the data reading instruction processing systems, the plurality of data reading instruction processing systems may be connected and transmit data through a specific structure;
其中，多个所述数据读入指令处理系统通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述数据读入指令处理系统共享同一控制系统或拥有各自的控制系统；多个所述数据读入指令处理系统共享内存或者拥有各自的内存；多个所述数据读入指令处理系统的互联方式是任意互联拓扑。Wherein, the multiple data read-in instruction processing systems are interconnected and transfer data through a peripheral component interconnect express (PCIE) bus to support larger-scale machine learning operations; the multiple data read-in instruction processing systems share the same control system or have their own control systems; the multiple data read-in instruction processing systems share memory or have their own memories; and the interconnection mode of the multiple data read-in instruction processing systems is an arbitrary interconnection topology.
条款R13、一种组合处理装置,所述组合处理装置包括:Clause R13. A combined processing device, the combined processing device comprising:
如条款R12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing device, general interconnection interface and other processing devices as described in clause R12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款R14、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款R12所述的机器学习运算装置或如条款R13所述的组合处理装置；Clause R14. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device according to Clause R12 or the combined processing device according to Clause R13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款R15、一种数据读入指令处理方法,所述方法应用于数据读入指令处理系统,所述系统包括指令生成设备和运行设备,所述方法包括:Clause R15. A data reading instruction processing method. The method is applied to a data reading instruction processing system. The system includes an instruction generating device and an operating device. The method includes:
通过所述指令生成设备根据接收到的读宏指令,确定执行所述读宏指令的运行设备,并根据所述读宏指令和所述运行设备,生成运行指令;Determining, by the instruction generating device, a running device that executes the read macro instruction according to the received read macro instruction, and generating a running instruction based on the read macro instruction and the running device;
通过所述运行设备获取数据、神经网络模型以及运行指令,对所述运行指令进行解析,获得多个解析指令,并根据所述数据执行所述多个解析指令,得到执行结果,Acquiring data, a neural network model, and operating instructions through the operating device, parsing the operating instructions, obtaining multiple analytical instructions, and executing the multiple analytical instructions based on the data, to obtain an execution result,
其中,所述读宏指令是指用于进行数据读入的宏指令,Wherein, the read macro instruction refers to a macro instruction used for data reading,
所述读宏指令包含操作类型、数据读入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据读入地址和运行数据加密方式地址，所述运行数据读入地址和运行数据加密方式地址分别是根据所述数据读入地址和所述数据加密方式地址确定的。The read macro instruction includes an operation type, a data read-in address, and a data encryption mode address. The running instruction includes the operation type, a running data read-in address, and a running data encryption mode address, where the running data read-in address and the running data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
条款R16、根据条款R15所述的方法,所述方法还包括:Clause R16. The method according to Clause R15, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的读宏指令,确定执行所述读宏指令的运行设备,包括以下至少一项:Wherein, the instruction generating device determines the running device that executes the read macro instruction according to the received read macro instruction, including at least one of the following:
在确定所述读宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述读宏指令的执行条件时,将所述指定设备确定为所述运行设备;When it is determined that the read macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution conditions for executing the read macro instruction, determine the specified device as the running device;
在确定所述读宏指令中不包含所述指定设备的标识时，根据接收到的读宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述读宏指令的运行设备；When it is determined that the read macro instruction does not include the identifier of the designated device, determining, from the candidate devices, the running device for executing the read macro instruction according to the received read macro instruction and the resource information of the candidate devices;
在确定所述读宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述读宏指令的执行条件时，根据所述读宏指令和所述备选设备的资源信息，确定运行设备，When it is determined that the read macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the read macro instruction, determining the running device according to the read macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述读宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the read macro instruction, and the resource information includes an instruction set included in the candidate device.
条款R17、根据条款R16所述的方法,所述读宏指令还包含读入量,Clause R17. According to the method described in Clause R16, the read macro instruction further includes a read-in amount,
根据所述读宏指令和所述运行设备,生成运行指令,包括:Generating an operation instruction according to the read macro instruction and the operation device, including:
确定所述读宏指令的数据量,根据所述读宏指令的数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,Determining the data amount of the read macro instruction, and generating an operation instruction according to the data amount of the read macro instruction, the read macro instruction and the resource information of the operating device,
其中,所述数据量是根据所述读入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to the read-in amount, and the resource information of the operating device further includes at least one of storage capacity and remaining storage capacity.
条款R18、根据条款R17所述的方法,根据所述读宏指令的数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,包括以下至少一项:Clause R18. According to the method described in Clause R17, generate an operation instruction according to the data amount of the read macro instruction, the read macro instruction, and the resource information of the operation device, including at least one of the following:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述读宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述读宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令；When it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the read macro instruction, splitting the read macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence;
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述读宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operation devices, the read macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行读入量，所述运行读入量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of the running device; the running instruction further includes a running read-in amount, which is determined according to the running data amount of the running device that executes the running instruction.
条款R19、根据条款R15所述的方法,所述方法还包括:Clause R19. The method according to Clause R15, the method further comprising:
通过所述指令生成设备根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
条款R20、根据条款R16所述的方法,所述方法还包括:Clause R20. The method according to Clause R16, the method further comprising:
通过所述指令生成设备接收待执行读指令,根据确定的指定设备的标识和所述待执行读指令生成所述读宏指令。Receiving the read instruction to be executed through the instruction generating device, and generating the read macro instruction according to the determined identifier of the designated device and the read instruction to be executed.
条款R21、根据条款R15所述的方法,所述方法还包括:Clause R21. The method according to Clause R15, the method further comprising:
通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction,
其中,通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction, includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款R22、根据条款R15所述的方法,所述方法还包括:Clause R22. The method according to Clause R15, the method further comprising:
通过所述运行设备存储所述数据以及所述数据中的标量数据,Storing the data and the scalar data in the data through the operating device,
其中,所述运行设备包括存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,Wherein, the operation device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款R23、根据条款R15所述的方法,所述方法还包括:Clause R23. The method according to Clause R15, the method further comprising:
通过所述运行设备存储所述运行指令;Storing the operation instruction through the operation device;
通过所述运行设备对所述运行指令进行解析,得到所述多个解析指令;Analyzing the operation instruction by the operation device to obtain the multiple analysis instructions;
通过所述运行设备存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中，所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。Storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the multiple parsing instructions, where the running instruction and the multiple parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款R24、根据条款R23所述的方法,所述方法还包括:Clause R24. The method according to Clause R23, the method further comprising:
通过所述运行设备在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，缓存所述第一解析指令，在所述第零解析指令执行完毕后，执行缓存的所述第一解析指令，Buffering, by the running device, a first parsing instruction when it is determined that the first parsing instruction has an association relationship with a zeroth parsing instruction preceding the first parsing instruction, and executing the buffered first parsing instruction after the zeroth parsing instruction has been executed,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
条款R25、根据条款R15所述的方法,Clause R25, according to the method described in Clause R15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。The read macro instruction includes at least one of read neuron macro instruction, read synapse macro instruction and read scalar macro instruction.
对于数据写入指令,可以通过下述数据写入指令生成装置、方法,以及数据写入指令处理系统、方法等相关产品进行处理,依据以下条款可更好地理解前述内容:For the data writing instruction, it can be processed by the following data writing instruction generating device and method, as well as data writing instruction processing system, method and other related products, and the foregoing can be better understood according to the following clauses:
条款S1、一种数据写入指令生成装置,所述装置包括:Clause S1, a data writing instruction generating device, the device comprising:
设备确定模块,用于根据接收到的写宏指令,确定执行所述写宏指令的运行设备;A device determining module, configured to determine a running device that executes the macro writing instruction according to the received macro writing instruction;
指令生成模块,用于根据所述写宏指令和所述运行设备,生成运行指令,An instruction generation module, configured to generate an operation instruction based on the write macro instruction and the operation device,
其中,所述写宏指令是指用于进行数据写入的宏指令,Wherein, the write macro instruction refers to a macro instruction for writing data,
所述写宏指令包含操作类型、数据写入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据写入地址和运行数据加密方式地址，所述运行数据写入地址和运行数据加密方式地址分别是根据所述数据写入地址和所述数据加密方式地址确定的。The write macro instruction includes an operation type, a data write address, and a data encryption mode address. The running instruction includes the operation type, a running data write address, and a running data encryption mode address, where the running data write address and the running data encryption mode address are determined according to the data write address and the data encryption mode address, respectively.
条款S2、根据条款S1所述的装置,所述设备确定模块,包括:Clause S2. The apparatus according to Clause S1, the device determination module includes:
第一确定子模块，用于在确定所述写宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述写宏指令的执行条件时，将所述指定设备确定为所述运行设备，A first determining submodule, configured to determine the designated device as the running device when it is determined that the write macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the write macro instruction,
其中，所述执行条件包括：所述指定设备中包含与所述写宏指令的操作类型相对应的指令集。Wherein, the execution condition includes: the designated device contains an instruction set corresponding to the operation type of the write macro instruction.
条款S3、根据条款S2所述的装置,所述装置还包括:Clause S3. The device according to Clause S2, the device further comprising:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,还包括:The device determination module also includes:
第二确定子模块，用于在确定所述写宏指令中不包含所述指定设备的标识时，根据接收到的写宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述写宏指令的运行设备，A second determining submodule, configured to determine, from the candidate devices, the running device for executing the write macro instruction according to the received write macro instruction and the resource information of the candidate devices when it is determined that the write macro instruction does not include the identifier of the designated device,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
Clause S4. The apparatus according to Clause S3, wherein the device determination module further includes:
a third determination submodule, configured to, upon determining that the write macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the write macro instruction, determine the running device according to the write macro instruction and the resource information of the candidate devices.
Clause S5. The apparatus according to any one of Clauses S2 to S4, wherein the write macro instruction further includes a write amount,
and the instruction generation module is further configured to determine the data amount of the write macro instruction and generate the running instruction according to the data amount, the write macro instruction, and the resource information of the running device,
wherein the data amount is determined according to the write amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause S6. The apparatus according to Clause S5, wherein the instruction generation module includes:
a first instruction generation submodule, configured to, upon determining that there is one running device and that the running data amount of the running device is smaller than the data amount of the write macro instruction, split the write macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction further includes a running write amount, and the running write amount is determined according to the running data amount.
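The splitting rule of Clause S6 can be sketched as follows: when a single running device's per-instruction running data amount is smaller than the macro instruction's total data amount, the macro is split into several running instructions, each carrying its own running write amount. This is a minimal illustration under assumed integer byte counts; the function name and signature are not from the patent.

```python
# Illustrative sketch of Clause S6: split one write macro instruction's
# total data amount into per-running-instruction write amounts bounded
# by the running device's capacity.
def split_write_macro(total_amount, device_capacity):
    """Return the list of running write amounts, in execution order."""
    if device_capacity <= 0:
        raise ValueError("device capacity must be positive")
    amounts = []
    remaining = total_amount
    while remaining > 0:
        chunk = min(device_capacity, remaining)  # one running instruction
        amounts.append(chunk)
        remaining -= chunk
    return amounts
```

For example, a macro writing 10 units on a device that handles 4 units per instruction would yield three running instructions with write amounts 4, 4, and 2, executed in sequence.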
Clause S7. The apparatus according to Clause S5, wherein the instruction generation module includes:
a second instruction generation submodule, configured to, upon determining that there are a plurality of running devices, split the write macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction further includes a running write amount, and the running write amount is determined according to the running data amount of the running device that executes the running instruction.
Clause S8. The apparatus according to Clause S1, wherein the apparatus further includes:
a queue construction module, configured to sort the running instructions according to a queue ordering rule and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause S9. The apparatus according to Clause S2, wherein the apparatus further includes:
a macro instruction generation module, configured to receive a write instruction to be executed and generate the write macro instruction according to the determined identifier of the specified device and the write instruction to be executed.
Clause S10. The apparatus according to Clause S1, wherein the apparatus further includes:
an instruction dispatch module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly submodule, configured to generate an assembly file according to the running instruction;
an assembly translation submodule, configured to translate the assembly file into a binary file; and
an instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
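The three-stage dispatch pipeline of Clause S10 (assemble, translate, send) can be sketched as below. The "assembler" and "translator" here are stand-ins (plain text joined and UTF-8 encoded), not a real toolchain; the mnemonics and the callback-based sender are assumptions for illustration only.

```python
# Minimal sketch of the Clause S10 dispatch pipeline: running
# instructions -> assembly file -> binary file -> running device.
def assemble(instructions):
    # instruction assembly submodule: one mnemonic per line
    return "\n".join(instructions)

def translate(asm_text):
    # assembly translation submodule: stand-in binary encoding
    return asm_text.encode("utf-8")

def dispatch(instructions, send):
    # instruction sending submodule: hand the binary to the device
    binary = translate(assemble(instructions))
    send(binary)
    return binary
```

A real implementation would invoke the target device's assembler and transport layer; the point of the sketch is only the ordering of the three submodules.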
Clause S11. The apparatus according to Clause S1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is disposed in a CPU and/or an NPU; and
the write macro instruction includes at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
Clause S12. A machine learning operation device, including:
one or more data write instruction generation apparatuses according to any one of Clauses S1 to S11, configured to obtain data to be operated on and control information from other processing devices, execute a specified machine learning operation, and transfer the execution result to other processing devices through an I/O interface;
when the machine learning operation device includes a plurality of the data write instruction generation apparatuses, the plurality of data write instruction generation apparatuses may be connected to and transmit data with one another through a specific structure,
wherein the plurality of data write instruction generation apparatuses are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of data write instruction generation apparatuses share one control system or have their own respective control systems; the plurality of data write instruction generation apparatuses share memory or have their own respective memories; and the interconnection manner of the plurality of data write instruction generation apparatuses is an arbitrary interconnection topology.
Clause S13. A combined processing device, including:
the machine learning operation device according to Clause S12, a universal interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete a computing operation specified by a user,
wherein the combined processing device further includes a storage device, which is connected to the machine learning operation device and the other processing devices, respectively, and is configured to save data of the machine learning operation device and the other processing devices.
Clause S14. A board card, including: a storage device, an interface apparatus, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning operation device according to Clause S12 or the combined processing device according to Clause S13;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor the state of the machine learning chip.
Clause S15. A data write instruction generation method, including:
determining, according to a received write macro instruction, a running device for executing the write macro instruction; and
generating a running instruction according to the write macro instruction and the running device,
wherein the write macro instruction refers to a macro instruction for performing data writing,
the write macro instruction includes an operation type, a data write address, and a data encryption mode address; the running instruction includes the operation type, a running data write address, and a running data encryption mode address; and the running data write address and the running data encryption mode address are determined according to the data write address and the data encryption mode address, respectively.
Clause S16. The method according to Clause S15, wherein determining, according to the received write macro instruction, the running device for executing the write macro instruction includes:
upon determining that the write macro instruction includes an identifier of a specified device and that the resources of the specified device satisfy an execution condition for executing the write macro instruction, determining the specified device as the running device,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the write macro instruction.
Clause S17. The method according to Clause S16, further including:
obtaining resource information of candidate devices,
wherein determining, according to the received write macro instruction, the running device for executing the write macro instruction includes:
upon determining that the write macro instruction does not include the identifier of the specified device, determining, from among the candidate devices, the running device for executing the write macro instruction according to the received write macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
Clause S18. The method according to Clause S17, wherein determining, according to the received write macro instruction, the running device for executing the write macro instruction includes:
upon determining that the write macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the write macro instruction, determining the running device according to the write macro instruction and the resource information of the candidate devices.
Clause S19. The method according to any one of Clauses S16 to S18, wherein the write macro instruction further includes a write amount,
and generating the running instruction according to the write macro instruction and the running device includes:
determining the data amount of the write macro instruction, and generating the running instruction according to the data amount, the write macro instruction, and the resource information of the running device,
wherein the data amount is determined according to the write amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause S20. The method according to Clause S19, wherein generating the running instruction according to the data amount of the write macro instruction, the write macro instruction, and the resource information of the running device includes:
upon determining that there is one running device and that the running data amount of the running device is smaller than the data amount of the write macro instruction, splitting the write macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction further includes a running write amount, and the running write amount is determined according to the running data amount.
Clause S21. The method according to Clause S19, wherein generating the running instruction according to the data amount of the write macro instruction, the write macro instruction, and the resource information of the running device includes:
upon determining that there are a plurality of running devices, splitting the write macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction further includes a running write amount, and the running write amount is determined according to the running data amount of the running device that executes the running instruction.
Clause S22. The method according to Clause S15, further including:
sorting the running instructions according to a queue ordering rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause S23. The method according to Clause S16, further including:
receiving a write instruction to be executed, and generating the write macro instruction according to the determined identifier of the specified device and the write instruction to be executed.
Clause S24. The method according to Clause S15, further including:
sending the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause S25. The method according to Clause S15, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU; and
the write macro instruction includes at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
Clause T1. A data write processing system, including an instruction generation device and a running device,
wherein the instruction generation device includes:
a device determination module, configured to determine, according to a received write macro instruction, the running device for executing the write macro instruction; and
an instruction generation module, configured to generate a running instruction according to the write macro instruction and the running device;
and the running device includes:
a control module, configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsed instructions; and
an execution module, configured to execute the plurality of parsed instructions according to the data to obtain an execution result,
wherein the write macro instruction refers to a macro instruction for performing data writing,
the write macro instruction includes an operation type, a data write address, and a data encryption mode address; the running instruction includes the operation type, a running data write address, and a running data encryption mode address; and the running data write address and the running data encryption mode address are determined according to the data write address and the data encryption mode address, respectively.
Clause T2. The system according to Clause T1, wherein the instruction generation device further includes:
a resource obtaining module, configured to obtain resource information of candidate devices,
and the device determination module includes at least one of the following submodules:
a first determination submodule, configured to, upon determining that the write macro instruction includes an identifier of a specified device and that the resources of the specified device satisfy an execution condition for executing the write macro instruction, determine the specified device as the running device;
a second determination submodule, configured to, upon determining that the write macro instruction does not include the identifier of the specified device, determine, from among the candidate devices, the running device for executing the write macro instruction according to the received write macro instruction and the resource information of the candidate devices; and
a third determination submodule, configured to, upon determining that the write macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the write macro instruction, determine the running device according to the write macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the write macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause T3. The system according to Clause T2, wherein the write macro instruction further includes a write amount,
and the instruction generation module is further configured to determine the data amount of the write macro instruction and generate the running instruction according to the data amount, the write macro instruction, and the resource information of the running device,
wherein the data amount is determined according to the write amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause T4. The system according to Clause T3, wherein the instruction generation module includes at least one of the following submodules:
a first instruction generation submodule, configured to, upon determining that there is one running device and that the running data amount of the running device is smaller than the data amount of the write macro instruction, split the write macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence; and
a second instruction generation submodule, configured to, upon determining that there are a plurality of running devices, split the write macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of that running device, the running instruction further includes a running write amount, and the running write amount is determined according to the running data amount of the running device that executes the running instruction.
Clause T5. The system according to Clause T1, wherein the instruction generation device further includes:
a queue construction module, configured to sort the running instructions according to a queue ordering rule and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause T6. The system according to Clause T2, wherein the instruction generation device further includes:
a macro instruction generation module, configured to receive a write instruction to be executed and generate the write macro instruction according to the determined identifier of the specified device and the write instruction to be executed.
Clause T7. The system according to Clause T1, wherein the instruction generation device further includes:
an instruction dispatch module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly submodule, configured to generate an assembly file according to the running instruction;
an assembly translation submodule, configured to translate the assembly file into a binary file; and
an instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause T8. The system according to Clause T1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed temporary storage cache,
wherein the cache is configured to store the data; and
the register is configured to store scalar data among the data.
Clause T9. The system according to Clause T1, wherein the control module includes:
an instruction storage submodule, configured to store the running instruction;
an instruction processing submodule, configured to parse the running instruction to obtain the plurality of parsed instructions; and
a storage queue submodule, configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, wherein in the running instruction queue the running instruction and the plurality of parsed instructions are arranged in the order in which they are to be executed.
Clause T10. The system according to Clause T9, wherein the execution module includes:
a dependency processing submodule, configured to, upon determining that a first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, buffer the first parsed instruction in the instruction storage submodule, and after execution of the zeroth parsed instruction is completed, extract the first parsed instruction from the instruction storage submodule and send it to the execution module,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding it includes:
a first storage address interval storing data required by the first parsed instruction and a zeroth storage address interval storing data required by the zeroth parsed instruction have an overlapping region.
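The dependency test of Clause T10 reduces to an interval-overlap check: the first parsed instruction must be buffered until the zeroth completes whenever their storage address intervals share a region. The sketch below assumes half-open `[start, end)` address intervals, a convention the text does not fix; the function names are illustrative.

```python
# Sketch of the Clause T10 dependency rule: two parsed instructions are
# associated when their storage address intervals overlap.
def intervals_overlap(first, zeroth):
    """Each interval is a (start, end) pair, treated as [start, end)."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

def must_wait(first_interval, zeroth_interval):
    # the first parsed instruction is held in the instruction storage
    # submodule until the zeroth finishes whenever the regions overlap
    return intervals_overlap(first_interval, zeroth_interval)
```

Under the half-open convention, two instructions touching adjacent but non-overlapping regions (e.g. `[0x100, 0x200)` and `[0x200, 0x300)`) carry no dependency and may execute without waiting.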
Clause T11. The system according to Clause T1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is disposed in a CPU and/or an NPU; and
the write macro instruction includes at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
Clause T12. A machine learning operation device, including:
one or more data write instruction processing systems according to any one of Clauses T1 to T11, configured to obtain data to be operated on and control information from other processing devices, execute a specified machine learning operation, and transfer the execution result to other processing devices through an I/O interface;
when the machine learning operation device includes a plurality of the data write instruction processing systems, the plurality of data write instruction processing systems may be connected to and transmit data with one another through a specific structure,
wherein the plurality of data write instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of data write instruction processing systems share one control system or have their own respective control systems; the plurality of data write instruction processing systems share memory or have their own respective memories; and the interconnection manner of the plurality of data write instruction processing systems is an arbitrary interconnection topology.
Clause T13. A combined processing device, including:
the machine learning operation device according to Clause T12, a universal interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete a computing operation specified by a user,
wherein the combined processing device further includes a storage device, which is connected to the machine learning operation device and the other processing devices, respectively, and is configured to save data of the machine learning operation device and the other processing devices.
Clause T14. A board card, including: a storage device, an interface apparatus, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning operation device according to Clause T12 or the combined processing device according to Clause T13;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor the state of the machine learning chip.
Clause T15. A data write instruction processing method, applied to a data write instruction processing system, the system including an instruction generation device and a running device, the method including:
determining, by the instruction generation device according to a received write macro instruction, the running device for executing the write macro instruction, and generating a running instruction according to the write macro instruction and the running device; and
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions according to the data to obtain an execution result,
wherein the write macro instruction refers to a macro instruction for performing data writing,
the write macro instruction includes an operation type, a data write address, and a data encryption mode address; the running instruction includes the operation type, a running data write address, and a running data encryption mode address; and the running data write address and the running data encryption mode address are determined according to the data write address and the data encryption mode address, respectively.
条款T16、根据条款T15所述的方法,所述方法还包括:Clause T16. The method according to Clause T15, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的写宏指令,确定执行所述写宏指令的运行设备,包括以下至少一项:Wherein, the instruction generating device determines the running device that executes the macro writing instruction according to the received macro writing instruction, including at least one of the following:
在确定所述写宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述写宏指令的执行条件时,将所述指定设备确定为所述运行设备;Determining that the designated device is the running device when it is determined that the write macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution conditions for executing the write macro instruction;
在确定所述写宏指令中不包含所述指定设备的标识时，根据接收到的写宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述写宏指令的运行设备；When it is determined that the write macro instruction does not include the identifier of the designated device, determining, from among the candidate devices, the running device for executing the write macro instruction according to the received write macro instruction and the resource information of the candidate devices;
在确定所述写宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述写宏指令的执行条件时，根据所述写宏指令和所述备选设备的资源信息，确定运行设备，When it is determined that the write macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the write macro instruction, determining the running device according to the write macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述写宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the write macro instruction, and the resource information includes an instruction set included in the candidate device.
条款T17、根据条款T16所述的方法,所述写宏指令还包含写入量,Clause T17, according to the method of Clause T16, the write macro instruction also includes the write amount,
根据所述写宏指令和所述运行设备,生成运行指令,包括:According to the macro instruction and the operation device, generating an operation instruction includes:
确定所述写宏指令的数据量,根据所述写宏指令的数据量、所述写宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the write macro instruction, and generate an operation instruction according to the data amount of the write macro instruction, the write macro instruction and the resource information of the running device,
其中,所述数据量是根据所述写入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data volume is determined according to the write volume, and the resource information of the running device further includes at least one of storage capacity and remaining storage capacity.
条款T18、根据条款T17所述的方法,根据所述写宏指令的数据量、所述写宏指令和所述运行设备的资源信息,生成运行指令,包括以下至少一项:Clause T18. According to the method described in Clause T17, generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device, including at least one of the following:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述写宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述写宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令；When it is determined that there is a single running device and the running data amount of the running device is less than the data amount of the write macro instruction, splitting the write macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence;
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述写宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operating devices, the write macro instruction is split according to the operating data amount of each operating device and the data amount, and an operating command corresponding to each operating device is generated,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行写入量，所述运行写入量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein the running data amount of a running device is determined according to the resource information of that running device; the running instruction further includes a running write amount, which is determined according to the running data amount of the running device that executes the running instruction.
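The splitting rule of clause T18 can be sketched as follows. This is a minimal illustrative sketch, not part of the disclosed implementation; the function name and the (offset, size) chunk representation are hypothetical.

```python
def split_write_macro(total_amount, run_amount):
    # Split a write macro instruction's total data amount into chunks,
    # each bounded by the running device's run data amount, so that one
    # running instruction can be generated per chunk (clause-T18 sketch).
    if run_amount <= 0:
        raise ValueError("run data amount must be positive")
    chunks = []
    offset = 0
    while offset < total_amount:
        size = min(run_amount, total_amount - offset)
        chunks.append((offset, size))
        offset += size
    return chunks
```

With a data amount of 10 and a run data amount of 4, the macro would be split into three running instructions covering (0, 4), (4, 4), and (8, 2); when the run data amount already covers the whole write, a single running instruction results.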
条款T19、根据条款T15所述的方法,所述方法还包括:Clause T19. The method according to Clause T15, the method further comprising:
通过所述指令生成设备根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
条款T20、根据条款T16所述的方法,所述方法还包括:Clause T20. The method according to Clause T16, the method further comprising:
通过所述指令生成设备接收待执行写指令,根据确定的指定设备的标识和所述待执行写指令生成所述写宏指令。Receiving the write instruction to be executed through the instruction generating device, and generating the write macro instruction according to the determined identifier of the designated device and the write instruction to be executed.
条款T21、根据条款T15所述的方法,所述方法还包括:Clause T21, the method according to Clause T15, the method further comprising:
通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction,
其中,通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction, includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
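The three steps of clause T21 above (generate an assembly file, translate it to binary, send it to the running device) can be sketched as below. This is an illustrative stand-in, not the disclosed implementation: the textual "assembly" and the UTF-8 "binary" encoding substitute for a real assembler and instruction encoding.

```python
def dispatch(run_instructions, send):
    # Clause-T21 pipeline sketch: emit an assembly file from the running
    # instructions, translate it into a binary file, and deliver the
    # binary to the running device via the `send` callback.
    assembly = "\n".join(run_instructions)   # generate assembly file
    binary = assembly.encode("utf-8")        # translate into binary file
    send(binary)                             # send to the running device
    return binary
```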
条款T22、根据条款T15所述的方法,所述方法还包括:Clause T22. The method according to Clause T15, the method further comprising:
通过所述运行设备存储所述数据以及所述数据中的标量数据,Storing the data and the scalar data in the data through the operating device,
其中,所述运行设备包括存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,Wherein, the operation device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款T23、根据条款T15所述的方法,所述方法还包括:Clause T23. The method according to Clause T15, the method further comprising:
通过所述运行设备存储所述运行指令;Storing the operation instruction through the operation device;
通过所述运行设备对所述运行指令进行解析,得到所述多个解析指令;Analyzing the operation instruction by the operation device to obtain the multiple analysis instructions;
通过所述运行设备存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。Storing, by the running device, a running instruction queue that includes the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions in the queue are arranged in the order in which they are to be executed.
条款T24、根据条款T23所述的方法,所述方法还包括:Clause T24. The method according to Clause T23, the method further comprising:
通过所述运行设备在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，缓存所述第一解析指令，在所述第零解析指令执行完毕后，执行缓存的所述第一解析指令，Caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction has an association with a zeroth parsed instruction preceding it, and executing the cached first parsed instruction after execution of the zeroth parsed instruction is completed,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
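The overlap condition of clause T24 amounts to a standard interval-intersection test. The sketch below assumes half-open address intervals [start, end); the function name and interval representation are hypothetical.

```python
def has_dependency(first_interval, zeroth_interval):
    # True when the storage address interval required by the first
    # parsed instruction overlaps the interval required by the zeroth
    # parsed instruction (clause-T24 association test, half-open ranges).
    first_start, first_end = first_interval
    zeroth_start, zeroth_end = zeroth_interval
    return first_start < zeroth_end and zeroth_start < first_end
```

When this returns true, the first parsed instruction is cached until the zeroth one has finished executing; adjacent but non-overlapping intervals carry no dependency.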
条款T25、根据条款T15所述的方法,Clause T25, the method according to Clause T15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述写宏指令包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种。The write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction.
在一种可能的实现方式中,计算宏指令是指用于进行数据计算的宏指令,数据计算可以包括机器学习计算、神经网络计算、向量逻辑计算、矩阵向量计算、标量计算和标量逻辑计算中的至少一种。In a possible implementation, the calculation macro instruction refers to a macro instruction used for data calculation. The data calculation may include machine learning calculation, neural network calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation. At least one.
神经网络计算宏指令可以是指用于对神经网络算法进行计算的宏指令。例如,对卷积运算(convolutional computation)、池化运算(Pooling)等神经网络算法进行计算的卷积计算宏指令、池化计算宏指令等。不同类型的神经网络计算宏指令对应于不同的操作类型。例如,卷积计算宏指令所对应的操作类型可以为CONV。The neural network calculation macro instruction may refer to a macro instruction used to calculate a neural network algorithm. For example, convolution calculation macros and pooling calculation macros that perform calculations on neural network algorithms such as convolutional computation and pooling. Different types of neural network calculation macros correspond to different types of operations. For example, the operation type corresponding to the convolution calculation macro instruction may be CONV.
向量逻辑计算宏指令可以是指用于对向量进行逻辑运算的宏指令。例如,对向量进行“与”、“比较”、“或”等逻辑计算的宏指令。不同类型的向量逻辑计算宏指令对应于不同的操作类型。例如,向量与计算宏指令所对应的操作类型可以为VAND、向量或计算宏指令所对应的操作类型可以为VOR。The vector logic calculation macro instruction may refer to a macro instruction used to perform logic operation on a vector. For example, a macro instruction that performs logical calculations such as "and", "comparison", or "or" on a vector. Different types of vector logic calculation macro instructions correspond to different types of operations. For example, the operation type corresponding to the vector and the calculation macro instruction may be VAND, and the operation type corresponding to the vector or the calculation macro instruction may be VOR.
矩阵向量计算宏指令可以是指用于对矩阵和向量进行计算的宏指令。例如,对矩阵和向量进行矩阵乘向量计算、向量乘矩阵计算、张量计算、矩阵相加计算、矩阵相减计算等计算的宏指令。不同类型的矩阵向量计算宏指令对应于不同的操作类型,例如,矩阵相加计算宏指令所对应的操作类型为MADD,矩阵乘向量计算宏指令所对应的操作类型为MMV。The matrix vector calculation macro instruction may refer to a macro instruction used to calculate a matrix and a vector. For example, macro instructions for matrix and vector calculations such as matrix multiply vector calculation, vector multiply matrix calculation, tensor calculation, matrix addition calculation, and matrix subtraction calculation. Different types of matrix vector calculation macro instructions correspond to different operation types. For example, the operation type corresponding to the matrix addition calculation macro instruction is MADD, and the operation type corresponding to the matrix multiplication vector calculation macro instruction is MMV.
标量计算宏指令可以是指用于对标量进行算术运算的宏指令。例如,对标量进行标量相加、标量相减、标量相乘、标量相除等计算的宏指令。不同类型的标量计算宏指令对应于不同的操作类型。例如,标量相减宏指令所对应的操作类型为SSUB、标量相加宏指令所对应的操作类型为SADD。A scalar calculation macro instruction may refer to a macro instruction used for arithmetic operation on a scalar. For example, a macro instruction for scalar addition, scalar subtraction, scalar multiplication, and scalar division calculations. Different types of scalar calculation macros correspond to different types of operations. For example, the operation type corresponding to the scalar subtraction macro instruction is SSUB, and the operation type corresponding to the scalar addition macro instruction is SADD.
标量逻辑计算宏指令可以是用于对标量进行逻辑运算的宏指令。例如，对标量进行"与"、"比较"、"或"、"非"等逻辑运算的宏指令。不同类型的标量逻辑计算宏指令对应于不同的操作类型，例如，标量与计算宏指令所对应的操作类型可以为SAND、标量或计算宏指令所对应的操作类型可以为SOR。A scalar logic calculation macro instruction may be a macro instruction for performing logical operations on scalars, for example, logical operations such as "AND", "compare", "OR", and "NOT". Different types of scalar logic calculation macro instructions correspond to different operation types; for example, the operation type corresponding to the scalar AND calculation macro instruction may be SAND, and the operation type corresponding to the scalar OR calculation macro instruction may be SOR.
在一种可能的实现方式中,控制宏指令是指用于控制指令流跳转至目标跳转位置的宏指令。无条件跳转宏指令可以是用于对指令流进行控制,使其无条件的跳转至指定位置的宏指令。有条件跳转宏指令可以是用于对指令流进行控制,使有条件跳转宏指令在其需要满足的条件为真时,跳转至指定位置的宏指令。In a possible implementation, the control macro instruction refers to a macro instruction used to control the instruction flow to jump to the target jump position. The unconditional jump macro instruction may be a macro instruction used to control the instruction flow so that it unconditionally jumps to a specified position. The conditional jump macro instruction may be a macro instruction used to control the instruction flow so that the conditional jump macro instruction jumps to a specified position when the condition that it needs to satisfy is true.
在一种可能的实现方式中,数据搬运宏指令是指用于对数据进行读入、写入等搬运处理的宏指令。读宏指令可以是指将数据从内存中读入到存储数据的位置的宏指令,可以是指用于进行数据读入的宏指令。根据数据类型的不同,读宏指令可以包括用于读入神经元数据的读神经元宏指令、用于读入突触数据的读突触宏指令和用于读入标量数据的读标量宏指令。写宏指令可以是指将数据从其存储位置写入到内存中的宏指令,可以是指用于进行数据写入的宏指令。根据数据类型的不同,写宏指令可以包括用于写入神经元数据的写神经元宏指令、用于写入突触数据的写突触宏指令和用于写入标量数据的写标量宏指令。其中,神经元数据即为神经网络算法中的输入神经元、输出神经元,突触数据即为神经网络算法中的权值。In a possible implementation manner, the data handling macro instruction refers to a macro instruction used for carrying out handling processing such as reading and writing of data. The read macro instruction may refer to a macro instruction that reads data from a memory to a location where data is stored, or may refer to a macro instruction that is used to read data. Depending on the data type, read macros may include read neuron macros for reading neuron data, read synapse macros for reading synapse data, and read scalar macros for reading scalar data . Writing a macro instruction may refer to a macro instruction that writes data from its storage location to a memory, or may refer to a macro instruction used to write data. According to different data types, the write macro instruction may include a write neuron macro instruction for writing neuron data, a write synapse macro instruction for writing synapse data, and a write scalar macro instruction for writing scalar data. . Among them, the neuron data is the input neuron and output neuron in the neural network algorithm, and the synapse data is the weight in the neural network algorithm.
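The operation-type mnemonics scattered through the category descriptions above can be collected into a single lookup table. The dictionary below is an illustrative subset assembled from the text, not an exhaustive instruction set of the disclosure.

```python
# Operation-type mnemonics from the text, grouped by macro category.
OP_TYPES = {
    "CONV": "convolution calculation",   # neural network calculation
    "VAND": "vector AND",                # vector logic calculation
    "VOR": "vector OR",
    "MADD": "matrix addition",           # matrix-vector calculation
    "MMV": "matrix-multiply-vector",
    "SADD": "scalar addition",           # scalar calculation
    "SSUB": "scalar subtraction",
    "SAND": "scalar AND",                # scalar logic calculation
    "SOR": "scalar OR",
    "NLOAD": "read neuron data",         # data handling
    "WLOAD": "read synapse data",
    "SLOAD": "read scalar data",
}
```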
在一种可能的实现方式中，对宏指令的指令格式、待执行指令的指令格式及运行设备执行运行指令的过程进行描述，以下为具体示例。In a possible implementation manner, the instruction format of the macro instruction, the instruction format of the instruction to be executed, and the process by which the running device executes the running instruction are described below with specific examples.
宏指令的指令格式可以为如下格式示例。The command format of the macro command can be the following format example.
神经网络计算宏指令的指令格式可以是:The instruction format of the neural network calculation macro instruction can be:
Type device_id,input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,[param1,param2,…]Type device_id, input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, [param1, param2,…]
其中，Type为操作类型，device_id为指定设备的标识，input_addr为输入地址，output_addr为输出地址，input_h、input_w、input_c为输入的神经元规模（即输入量），output_h、output_w、output_c为输出的神经元规模（即输出量），param1、param2为指令参数。Here, Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_h, input_w, input_c are the input neuron scale (i.e., the input amount), output_h, output_w, output_c are the output neuron scale (i.e., the output amount), and param1, param2 are instruction parameters.
对于神经网络计算宏指令，其必须包含操作类型、输入地址和输出地址，且根据神经网络计算宏指令所生成的运行指令也须包含操作类型、运行输入地址和运行输出地址，运行输入地址和运行输出地址是分别根据输入地址、输出地址确定的。A neural network calculation macro instruction must include the operation type, the input address, and the output address, and the running instruction generated from it must also include the operation type, a running input address, and a running output address; the running input address and the running output address are determined according to the input address and the output address, respectively.
以卷积计算宏指令为例,其指令格式为:CONV device_id,input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,kernel,stride,pad。调用时卷积计算宏指令可以为如下示例:Taking the convolution calculation macro instruction as an example, the instruction format is: CONV device_id, input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, kernel, stride, pad. The convolution calculation macro instruction when called can be the following example:
@CONV#0,#4,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0@ CONV # 0, # 4, # 500, # 5, # 5, # 32, # 3, # 3, # 16, # 3, # 1, # 0
其中，该卷积计算宏指令的操作类型为CONV。指定设备为设备0。数据的输入地址为地址4。数据的输出地址为地址500。数据的输入量为5x5x32。数据的输出量为3x3x16。卷积核的大小为3，卷积核的步长为1，卷积核的填充为0。The operation type of this convolution calculation macro instruction is CONV; the designated device is device 0; the data input address is address 4 and the data output address is address 500; the input amount is 5x5x32 and the output amount is 3x3x16; the convolution kernel size is 3, the stride is 1, and the padding is 0.
若根据上述卷积计算宏指令示例生成的运行指令为"@CONV#4,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0"为例。运行设备在接收到该运行指令之后，其执行过程为：从地址4处获取到输入量为5x5x32的数据。按照该运行指令中卷积核的大小（3）、步长（1）和填充（0）对输入量为5x5x32的数据进行卷积运算，得到数据量为3x3x16的执行结果时，将该执行结果存储至输出地址500。Take as an example the running instruction "@CONV#4,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0" generated from the above convolution calculation macro instruction example. After receiving this running instruction, the running device executes it as follows: fetch the 5x5x32 input data from address 4; perform a convolution on that data with the kernel size (3), stride (1), and padding (0) given in the running instruction; and when the 3x3x16 execution result is obtained, store it at output address 500.
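The call format shown in these examples (an "@"-prefixed operation type followed by "#"-prefixed operands) lends itself to a simple parser. The sketch below is illustrative only and not part of the disclosure; the function name is hypothetical, and it assumes all operands are non-negative decimal integers as in the examples.

```python
import re

def parse_macro(text):
    # Parse a macro instruction such as "@CONV#0,#4,#500,..." into its
    # operation type and a list of integer operands, following the call
    # format used in the examples in the text.
    m = re.fullmatch(r"@([A-Z]+)((?:#\d+,?)*)", text)
    if m is None:
        raise ValueError("not a macro instruction: %r" % text)
    op = m.group(1)
    operands = [int(tok) for tok in m.group(2).replace(",", "").split("#") if tok]
    return op, operands
```

Applied to the convolution example above, the operation type is CONV and the operands are [0, 4, 500, 5, 5, 32, 3, 3, 16, 3, 1, 0]; the kernel, stride, and padding fields (3, 1, 0) are consistent with the stated output height (5 - 3 + 2*0) // 1 + 1 = 3.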
向量逻辑计算宏指令的指令格式可以是:The instruction format of the vector logic calculation macro instruction can be:
Type device_id,input_addr,output_addr,input_size,output_size,[param1,param2,…]Type device_id, input_addr, output_addr, input_size, output_size, [param1, param2,…]
其中，Type为操作类型，device_id为指定设备的标识，input_addr为输入地址，output_addr为输出地址，input_size为输入向量的大小（即输入量），output_size为输出向量的大小（即输出量），param1、param2为指令参数。指令参数可以是第二个操作数的地址和长度。Here, Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (i.e., the input amount), output_size is the size of the output vector (i.e., the output amount), and param1, param2 are instruction parameters. An instruction parameter may be the address and length of the second operand.
对于向量逻辑计算宏指令，其必须包含操作类型、输入地址和输出地址，且根据向量逻辑计算宏指令所生成的运行指令也须包含操作类型、运行输入地址和运行输出地址，运行输入地址和运行输出地址是分别根据输入地址、输出地址确定的。A vector logic calculation macro instruction must include the operation type, the input address, and the output address, and the running instruction generated from it must also include the operation type, a running input address, and a running output address; the running input address and the running output address are determined according to the input address and the output address, respectively.
以根据某个向量逻辑计算宏指令所生成的运行指令为“@VAND#501,#7,#33,#4”为例。运行设备在接收到该运行指令后,其执行过程为:从输入地址501处获取到大小为33的输入向量,对该输入向量进行“与”逻辑运算,获得大小为4的输出向量,并将该大小为4的输出向量作为执行结果存储至输出地址7处。Take the running instruction generated by a vector logic calculation macro instruction as "@ VAND # 501, # 7, # 33, # 4" as an example. After the running device receives the running instruction, its execution process is: obtaining an input vector of size 33 from the input address 501, performing an AND logic operation on the input vector to obtain an output vector of size 4, and The output vector of size 4 is stored as the execution result at output address 7.
矩阵向量计算宏指令的指令格式可以是:The instruction format of the matrix vector calculation macro instruction can be:
Type device_id,input_addr,output_addr,input_size,output_size,[param1,param2,…]Type device_id, input_addr, output_addr, input_size, output_size, [param1, param2,…]
其中，Type为操作类型，device_id为指定设备的标识，input_addr为输入地址，output_addr为输出地址，input_size为输入向量的大小（即输入量），output_size为输出向量的大小（即输出量），param1、param2为指令参数。指令参数可以是第二个操作数的地址和长度。Here, Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (i.e., the input amount), output_size is the size of the output vector (i.e., the output amount), and param1, param2 are instruction parameters. An instruction parameter may be the address and length of the second operand.
对于矩阵向量计算宏指令，其必须包含操作类型、输入地址和输出地址，且根据矩阵向量计算宏指令所生成的运行指令也须包含操作类型、运行输入地址和运行输出地址，运行输入地址和运行输出地址是分别根据输入地址、输出地址确定的。A matrix-vector calculation macro instruction must include the operation type, the input address, and the output address, and the running instruction generated from it must also include the operation type, a running input address, and a running output address; the running input address and the running output address are determined according to the input address and the output address, respectively.
以根据某个矩阵向量计算宏指令生成的运行指令为“@MADD#502,#8,#34,#5”为例。运行设备在接收到该运行指令后,其执行过程为:从输入地址502处获得大小为34的输入矩阵向量。对该输入矩阵向量进行矩阵相加计算,获得大小为5的输出矩阵向量。将该大小为5的输出矩阵向量作为执行结果存储至存储地址8处。Take the running instruction generated by calculating a macro instruction according to a matrix vector as "@ MADD # 502, # 8, # 34, # 5" as an example. After the running device receives the running instruction, its execution process is: obtaining an input matrix vector of size 34 from the input address 502. Matrix addition calculation is performed on the input matrix vector to obtain an output matrix vector of size 5. The output matrix vector of size 5 is stored as the execution result at the storage address 8.
标量计算宏指令的指令格式可以是:The instruction format of the scalar calculation macro instruction can be:
Type device_id,op1,op2,ansType device_id, op1, op2, ans
其中,Type为操作类型,device_id为指定设备的标识,op1、op2为两个操作数。Ans为标量计算宏指令计算结果的存放地址或者用于存放计算结果的寄存器的标识。其中,所获取的标量的大小、对获取的标量进行计算所输出的标量的大小可以预先设定。Among them, Type is the operation type, device_id is the identifier of the specified device, and op1 and op2 are the two operands. Ans is the storage address of the scalar calculation macro instruction calculation result or the identifier of the register used to store the calculation result. The size of the acquired scalar and the size of the scalar output by calculating the acquired scalar can be set in advance.
对于标量计算宏指令,其必须包含操作类型、第一操作数、第二操作数和输出地址。且根据标量计算宏指令所生成的运行指令也须包含操作类型、第一运行操作数、第二运行操作数和运行输出地址。其中,第一运行操作数、第二运行操作数和运行输出地址是分别根据第一操作数、第二操作数和输出地址确定的。For a scalar calculation macro, it must contain the operation type, first operand, second operand, and output address. And the run instruction generated according to the scalar calculation macro instruction must also include the operation type, the first run operand, the second run operand and the run output address. Among them, the first operation operand, the second operation operand, and the operation output address are determined according to the first operand, the second operand, and the output address, respectively.
以根据某个标量计算宏指令生成的运行指令为"@SADD#503,#504,#3"为例。运行设备在接收到该运行指令后，其执行过程为：从寄存器的地址503中获取第一标量以及从寄存器的地址504中获取第二标量，将第一标量和第二标量相加，并将相加计算所获得的结果作为执行结果存储至寄存器的存储地址3处。Take as an example the running instruction "@SADD#503,#504,#3" generated from a scalar calculation macro instruction. After receiving this running instruction, the running device executes it as follows: fetch a first scalar from register address 503 and a second scalar from register address 504, add the two scalars, and store the result of the addition as the execution result at register storage address 3.
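The SADD execution process above can be sketched against a toy register file. This is illustrative only; the register contents (7 and 5) and the function name are hypothetical.

```python
registers = {503: 7, 504: 5}  # hypothetical register-file contents

def exec_sadd(src1, src2, dst):
    # Sketch of executing "@SADD#503,#504,#3": read the two scalars
    # from register addresses src1 and src2, add them, and store the
    # sum as the execution result at register address dst.
    registers[dst] = registers[src1] + registers[src2]
    return registers[dst]
```

With the hypothetical initial values, exec_sadd(503, 504, 3) leaves 12 at register address 3.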
标量逻辑计算宏指令的指令格式可以是:The instruction format of the scalar logic calculation macro instruction can be:
Type device_id,op1,op2,ansType device_id, op1, op2, ans
其中,Type为操作类型,device_id为指定设备的标识,op1、op2为两个操作数。Ans为标量逻辑计算宏指令计算结果的存放地址或者用于存放计算结果的寄存器的标识。其中,所获取的标量的大小、对获取的标量进行计算所输出的标量的大小可以预先设定。Among them, Type is the operation type, device_id is the identifier of the specified device, and op1 and op2 are the two operands. Ans is the storage address of the scalar logic calculation macro instruction calculation result or the identifier of the register used to store the calculation result. The size of the acquired scalar and the size of the scalar output by calculating the acquired scalar can be set in advance.
对于标量逻辑计算宏指令,其必须包含操作类型、第一操作数、第二操作数和输出地址。且根据标量逻辑计算宏指令所生成的运行指令也须包含操作类型、第一运行操作数、第二运行操作数和运行输出地址。其中,第一运行操作数、第二运行操作数和运行输出地址是分别根据第一操作数、第二操作数和输出地址确定的。For a scalar logic calculation macro instruction, it must contain the operation type, first operand, second operand, and output address. And the operation instruction generated by calculating the macro instruction according to the scalar logic must also include the operation type, the first operation operand, the second operation operand and the operation output address. Among them, the first operation operand, the second operation operand, and the operation output address are determined according to the first operand, the second operand, and the output address, respectively.
以根据某个标量逻辑计算宏指令生成的运行指令为“@SAND#703,#704,#8”为例。运行设备在接收到该运行指令后,其执行过程为:从寄存器的地址703中获取第一标量以及从寄存器的地址704中获取第二标量,对第一标量和第二标量进行“与”逻辑运算,并将所获得的结果作为执行结果存储至寄存器的存储地址8处。Take the running instruction generated by a macro instruction calculated according to a scalar logic as "@ SAND # 703, # 704, # 8" as an example. After the running device receives the running instruction, its execution process is: acquiring the first scalar from the register address 703 and the second scalar from the register address 704, and performing an AND logic on the first scalar and the second scalar Calculate and store the obtained result as the execution result to the storage address 8 of the register.
无条件跳转宏指令的指令格式可以是:The instruction format of an unconditional jump macro instruction can be:
Jump device_id,srcJump device_id, src
其中,Jump为无条件跳转宏指令所对应的操作类型,device_id为指定设备的标识,src为指令流所需跳转到的目标跳转位置。目标跳转位置可以是寄存器的长度、寄存器的地址、寄存器的标识、立即数等。Among them, Jump is the operation type corresponding to the unconditional jump macro instruction, device_id is the identifier of the specified device, and src is the target jump position to which the instruction stream needs to jump. The target jump position may be the length of the register, the address of the register, the identifier of the register, the immediate value, and so on.
对于无条件跳转宏指令,其必须包含操作类型和目标跳转位置,且根据无条件跳转宏指令所生成的运行指令也须包含操作类型和运行目标跳转位置。其中,运行目标跳转位置是根据目标跳转位置确定的。For an unconditional jump macro instruction, it must include the operation type and target jump position, and the run instruction generated according to the unconditional jump macro instruction must also include the operation type and run target jump position. Among them, the running target jump position is determined according to the target jump position.
以根据某个无条件跳转宏指令所生成的运行指令为“@Jump#505”为例。运行设备在接收到该运行指令后,其执行过程为:将当前指令流跳转至地址505处继续执行。Take the running instruction generated according to an unconditional jump macro instruction as "@ Jump # 505" as an example. After the running device receives the running instruction, its execution process is: jump the current instruction stream to the address 505 to continue execution.
有条件跳转宏指令的指令格式可以是:The instruction format of the conditional jump macro instruction can be:
CB device_id,src,conditionCB device_id, src, condition
其中,CB为有条件跳转宏指令所对应的操作类型,device_id为指定设备的标识,src为指令流所需跳转到的目标跳转位置,condition为跳转的条件。例如,condition可以为“寄存器的值为零是否为真”,在寄存器的值为零时,可以跳转至目标跳转位置。目标跳转位置可以是寄存器的长度、寄存器的地址、寄存器的标识、立即数等。Among them, CB is the operation type corresponding to the conditional jump macro instruction, device_id is the identifier of the specified device, src is the target jump position to which the instruction stream needs to jump, and condition is the jump condition. For example, condition can be "whether the value of the register is zero is true", when the value of the register is zero, you can jump to the target jump position. The target jump position may be the length of the register, the address of the register, the identifier of the register, the immediate value, and so on.
对于有条件跳转宏指令,其必须包含操作类型和目标跳转位置,且根据有条件跳转宏指令所生成的运行指令也须包含操作类型和运行目标跳转位置。其中,运行目标跳转位置是根据目标跳转位置确定的。For the conditional jump macro instruction, it must include the operation type and target jump position, and the run instruction generated according to the conditional jump macro instruction must also include the operation type and run target jump position. Among them, the running target jump position is determined according to the target jump position.
以根据某个有条件跳转宏指令所生成的运行指令为“@CB#506#h”为例。运行设备在接收到该运行指令后,其执行过程为:判断其跳转条件“h”是否为真,在“h”为真时,将当前指令流跳转至地址506处继续执行。Take the running instruction generated according to a conditional jump macro as "@ CB # 506 # h" as an example. After the running device receives the running instruction, its execution process is: judging whether its jump condition "h" is true, and when "h" is true, jump the current instruction stream to the address 506 to continue execution.
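The instruction-flow control described for the conditional jump can be sketched as a program-counter update. This is an illustrative sketch, not the disclosed implementation; the (op, target) instruction tuple and the unit program-counter increment are assumptions.

```python
def next_pc(pc, instruction, condition_holds):
    # Sketch of "@CB#506#h": when the jump condition holds, the
    # instruction stream jumps to the target address; otherwise it
    # falls through to the next instruction.
    op, target = instruction
    if op == "CB" and condition_holds:
        return target
    return pc + 1
```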
读神经元宏指令的指令格式可以是:The instruction format for reading neuron macro instructions can be:
NLOAD device_id,src_addr,des_addr,size NLOAD device_id, src_addr, des_addr, size
其中，NLOAD为读神经元宏指令所对应的操作类型，device_id为指定设备的标识，src_addr为读取神经元数据的数据读入地址，des_addr为存储读取神经元数据所需的加密方式的数据加密方式地址，size为神经元数据的读入量。Here, NLOAD is the operation type corresponding to the read-neuron macro instruction, device_id is the identifier of the designated device, src_addr is the data read-in address for the neuron data, des_addr is the data encryption mode address storing the encryption method required to read the neuron data, and size is the read-in amount of the neuron data.
以根据某个读神经元宏指令所生成的运行指令为“@NLOAD#505#506#9”为例。运行设备在接收到该运行指令后,其执行过程为:从地址506处获取到神经元数据的加密方式,根据该加密方式从地址505处读取读入量为9的神经元数据。Take the running instruction generated according to a certain read neuron macro instruction as "@ NLOAD # 505 # 506 # 9" as an example. After the operation device receives the operation instruction, its execution process is: obtaining the encryption method of the neuron data from the address 506, and reading the neuron data with a read amount of 9 from the address 505 according to the encryption method.
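The two-step NLOAD execution (look up the encryption method, then read the data) can be sketched against a toy memory. The memory contents and function name are hypothetical, and the encryption method is treated as an opaque tag rather than applied.

```python
memory = {505: list(range(20)), 506: "xor"}  # hypothetical contents

def exec_nload(src_addr, des_addr, size):
    # Sketch of executing "@NLOAD#505#506#9": first fetch the encryption
    # method stored at the data-encryption-mode address, then read
    # `size` neuron values from the data read-in address.
    method = memory[des_addr]
    data = memory[src_addr][:size]
    return method, data
```

The WLOAD and SLOAD instructions described below follow the same pattern, with synapse and scalar data in place of neuron data.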
读突触宏指令的指令格式可以是:The command format for reading synaptic macros can be:
WLOAD device_id,src_addr,des_addr,size WLOAD device_id, src_addr, des_addr, size
其中，WLOAD为读突触宏指令所对应的操作类型，device_id为指定设备的标识，src_addr为读取突触数据的数据读入地址，des_addr为存储读取突触数据所需的加密方式的数据加密方式地址，size为突触数据的读入量。Here, WLOAD is the operation type corresponding to the read-synapse macro instruction, device_id is the identifier of the designated device, src_addr is the data read-in address for the synapse data, des_addr is the data encryption mode address storing the encryption method required to read the synapse data, and size is the read-in amount of the synapse data.
以根据某个读突触宏指令所生成的运行指令为“@WLOAD#507#508#10”为例。运行设备在接收到该运行指令后,其执行过程为:从地址508处获取到读入突触数据的加密方式,根据该加密方式从地址507处读取读入量为10的突触数据。Take the operation command generated according to a certain read synaptic macro as "@ WLOAD # 507 # 508 # 10" as an example. After the operation device receives the operation instruction, its execution process is: acquiring the encryption method of reading synaptic data from the address 508, and reading the synaptic data with the reading amount of 10 from the address 507 according to the encryption method.
读标量宏指令的指令格式可以是:The command format of the read scalar macro command can be:
SLOAD device_id,src,desSLOAD device_id, src, des
其中,SLOAD为读标量宏指令所对应的操作类型,device_id为指定设备的标识,src为读取标量的数据读入地址,des为存储读取标量数据所需的加密方式的数据加密方式地址。Among them, SLOAD is the operation type corresponding to the read scalar macro instruction, device_id is the identification of the specified device, src is the data read-in address for reading scalar data, and des is the data encryption method address for storing the encryption method required for reading scalar data.
以根据某个读标量宏指令所生成的运行指令为“@SLOAD#601#602”为例。运行设备在接收到该运行指令后,其执行过程为:从地址602处获取到标量数据的加密方式,根据该加密方式从地址601处读取其存储的标量数据。Take the operation command generated according to a certain read scalar macro command as "@ SLOAD # 601 # 602" as an example. After the operation device receives the operation instruction, its execution process is: acquiring the encryption method of the scalar data from the address 602, and reading the stored scalar data from the address 601 according to the encryption method.
其中,读神经元宏指令、读突触宏指令和读标量宏指令中所包含的数据读入地址以及数据加密方式地址,可以是寄存器的地址、编号、名称等标识。对于读神经元宏指令、读突触宏指令和读标量宏指令,其必须包含操作类型、数据读入地址、数据加密方式地址,运行指令中也须包含操作类型、运行数据读入地址和运行数据加密方式地址。其中,运行数据读入地址和运行数据加密方式地址分别是根据数据读入地址和数据加密方式地址确定的。Among them, the data read-in address and the data encryption method address contained in the read neuron macro instruction, the read synapse macro instruction and the read scalar macro instruction may be identifiers such as the address, number or name of a register. A read neuron macro instruction, read synapse macro instruction or read scalar macro instruction must include the operation type, the data read-in address and the data encryption method address, and the corresponding running instruction must also include the operation type, the running-data read-in address and the running-data encryption method address. Among them, the running-data read-in address and the running-data encryption method address are determined according to the data read-in address and the data encryption method address, respectively.
写神经元宏指令的指令格式可以是:The instruction format for writing a neuron macro instruction can be:
NSTORE device_id,src_addr,des_addr,sizeNSTORE device_id, src_addr, des_addr, size
其中,NSTORE为写神经元宏指令所对应的操作类型,device_id为指定设备的标识,src_addr为写入神经元数据的数据写入地址,des_addr为存储写入神经元数据所需的加密方式的数据加密方式地址,size为数据的写入量。Among them, NSTORE is the operation type corresponding to the write neuron macro instruction, device_id is the identifier of the specified device, src_addr is the data write address for writing neuron data, des_addr is the data encryption method address that stores the encryption method required for writing the neuron data, and size is the write amount of the data.
以根据某个写神经元宏指令所生成的运行指令为“@NSTORE#603#604#14”为例。运行设备在接收到该运行指令后,其执行过程为:从地址604处获取到神经元数据的加密方式,根据该加密方式将写入量为14的待写入神经元数据写入地址603处。Take the running instruction "@NSTORE#603#604#14" generated according to a certain write neuron macro instruction as an example. After receiving the running instruction, the running device executes it as follows: the encryption method of the neuron data is obtained from address 604, and according to the encryption method the neuron data to be written, with a write amount of 14, is written to address 603.
写突触宏指令的指令格式可以是:The command format for writing synaptic macros can be:
WSTORE device_id,src_addr,des_addr,sizeWSTORE device_id, src_addr, des_addr, size
其中,WSTORE为写突触宏指令所对应的操作类型,device_id为指定设备的标识,src_addr为写入突触数据的数据写入地址,des_addr为存储写入突触数据所需的加密方式的数据加密方式地址,size为数据的写入量。Among them, WSTORE is the operation type corresponding to the write synapse macro instruction, device_id is the identifier of the specified device, src_addr is the data write address for writing synapse data, des_addr is the data encryption method address that stores the encryption method required for writing the synapse data, and size is the write amount of the data.
以根据某个写突触宏指令所生成的运行指令为“@WSTORE#605#606#15”为例。运行设备在接收到该运行指令后,其执行过程为:从地址606处获取到突触数据的加密方式,根据该加密方式将写入量为15的待写入突触数据写入地址605处。Take the running instruction "@WSTORE#605#606#15" generated according to a certain write synapse macro instruction as an example. After receiving the running instruction, the running device executes it as follows: the encryption method of the synapse data is obtained from address 606, and according to the encryption method the synapse data to be written, with a write amount of 15, is written to address 605.
写标量宏指令的指令格式可以是:The instruction format for writing a scalar macro instruction can be:
SSTORE device_id,src,desSSTORE device_id, src, des
其中,SSTORE为写标量宏指令所对应的操作类型,device_id为指定设备的标识,src为写入标量数据的数据写入地址,des为存储写入标量数据所需的加密方式的数据加密方式地址。Among them, SSTORE is the operation type corresponding to the write scalar macro instruction, device_id is the identifier of the specified device, src is the data write address for writing scalar data, and des is the data encryption method address that stores the encryption method required for writing the scalar data.
以根据某个写标量宏指令所生成的运行指令为“@SSTORE#607#608”为例。运行设备在接收到该运行指令后,其执行过程为:从地址608处获取到标量数据的加密方式,根据该加密方式将待写入的标量数据写入地址607处。Take the running instruction "@SSTORE#607#608" generated according to a certain write scalar macro instruction as an example. After receiving the running instruction, the running device executes it as follows: the encryption method of the scalar data is obtained from address 608, and according to the encryption method the scalar data to be written is written to address 607.
其中,写神经元宏指令、写突触宏指令和写标量宏指令中所包含的数据写入地址和数据加密方式地址可以是寄存器的地址、编号、名称等标识。对于写神经元宏指令、写突触宏指令和写标量宏指令,其必须包含操作类型、数据写入地址、数据加密方式地址,运行指令中也须包含操作类型、运行数据写入地址和运行数据加密方式地址。其中,运行数据写入地址和运行数据加密方式地址分别是根据数据写入地址和数据加密方式地址确定的。Among them, the data write address and the data encryption method address contained in the write neuron macro instruction, the write synapse macro instruction and the write scalar macro instruction may be identifiers such as the address, number or name of a register. A write neuron macro instruction, write synapse macro instruction or write scalar macro instruction must include the operation type, the data write address and the data encryption method address, and the corresponding running instruction must also include the operation type, the running-data write address and the running-data encryption method address. Among them, the running-data write address and the running-data encryption method address are determined according to the data write address and the data encryption method address, respectively.
待执行指令的指令格式可以为如下格式示例。The instruction format of the instruction to be executed may be the following format example.
待执行神经网络计算指令的指令格式可以是:The instruction format of the neural network calculation instruction to be executed may be:
Type input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,[param1,param2,…]Type input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, [param1, param2,…]
其中,Type为操作类型,input_addr为输入地址,output_addr为输出地址,input_h、input_w、input_c为输入的神经元规模(即输入量),output_h、output_w、output_c为输出的神经元规模(即输出量),param1、param2为指令参数。Among them, Type is the operation type, input_addr is the input address, output_addr is the output address, input_h, input_w and input_c are the input neuron scale (i.e. the input amount), output_h, output_w and output_c are the output neuron scale (i.e. the output amount), and param1, param2 are instruction parameters.
以待执行卷积指令为例,其指令格式为CONV input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,kernel,stride,pad。调用时待执行卷积指令可以为:Taking the convolution instruction to be executed as an example, the instruction format is CONV input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, kernel, stride, pad. The convolution instruction to be executed when called can be:
@CONV#6,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0@ CONV # 6, # 500, # 5, # 5, # 32, # 3, # 3, # 16, # 3, # 1, # 0
其中,该待执行卷积指令的操作类型为卷积神经网络计算。数据的输入地址为地址6。数据的输出地址为地址500。数据的输入量为5x5x32。数据的输出量为3x3x16。卷积核的大小为3,卷积核的步长为1,卷积核的填充为0。The operation type of the convolution instruction to be executed is convolution neural network calculation. The input address of the data is address 6. The output address of the data is address 500. The input amount of data is 5x5x32. The data output is 3x3x16. The size of the convolution kernel is 3, the step size of the convolution kernel is 1, and the padding of the convolution kernel is 0.
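上述卷积示例中的输出空间规模可以用标准的卷积输出尺寸公式验证,下述Python代码仅为示意(非专利原文内容)。The output spatial scale in the convolution example above can be checked with the standard convolution output-size formula; the following Python sketch is illustrative only (not part of the original disclosure).

```python
def conv_output_size(input_size, kernel, stride, pad):
    """Standard convolution output-size formula:
    floor((input - kernel + 2 * pad) / stride) + 1."""
    return (input_size - kernel + 2 * pad) // stride + 1

# Example above: input 5x5x32, kernel 3, stride 1, pad 0
# -> spatial output (5 - 3 + 0) // 1 + 1 = 3, i.e. 3x3
h = conv_output_size(5, 3, 1, 0)
```

输出通道数16由卷积核的个数决定,与该空间尺寸公式无关。The 16 output channels are determined by the number of convolution kernels, not by this spatial-size formula.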
待执行向量逻辑计算指令的指令格式可以是:The instruction format of the vector logic calculation instruction to be executed may be:
Type input_addr,output_addr,input_size,output_size,[param1,param2,…]Type input_addr, output_addr, input_size, output_size, [param1, param2,…]
其中,Type为操作类型,input_addr为输入地址,output_addr为输出地址,input_size为输入向量的大小(即输入量),output_size为输出向量的大小(即输出量),param1、param2为指令参数。指令参数可以是第二个操作数的地址和长度。Among them, Type is the operation type, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (that is, the input amount), output_size is the size of the output vector (that is, the output amount), and param1 and param2 are the instruction parameters. The instruction parameter can be the address and length of the second operand.
待执行矩阵向量计算指令的指令格式可以是:The instruction format of the matrix vector calculation instruction to be executed may be:
Type input_addr,output_addr,input_size,output_size,[param1,param2,…]Type input_addr, output_addr, input_size, output_size, [param1, param2,…]
其中,Type为操作类型,input_addr为输入地址,output_addr为输出地址,input_size为输入向量的大小(即输入量),output_size为输出向量的大小(即输出量),param1、param2为指令参数。指令 参数可以是第二个操作数的地址和长度。Among them, Type is the operation type, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (that is, the input amount), output_size is the size of the output vector (that is, the output amount), and param1 and param2 are the instruction parameters. The instruction parameter can be the address and length of the second operand.
待执行标量计算指令的指令格式可以是:The instruction format of the scalar calculation instruction to be executed may be:
Type op1,op2,ansType op1, op2, ans
其中,Type为操作类型,op1、op2为两个操作数。Ans为待执行标量计算指令计算结果的存放地址或者用于存放计算结果的寄存器的标识。Among them, Type is the operation type, op1, op2 are two operands. Ans is the storage address of the calculation result of the scalar calculation instruction to be executed or the identifier of the register used to store the calculation result.
待执行标量逻辑计算指令的指令格式可以是:The instruction format of the scalar logic calculation instruction to be executed may be:
Type op1,op2,ansType op1, op2, ans
其中,Type为操作类型,op1、op2为两个操作数。Ans为待执行标量逻辑计算指令计算结果的存放地址或者用于存放计算结果的寄存器的标识。Among them, Type is the operation type, op1, op2 are two operands. Ans is the storage address of the calculation result of the scalar logic calculation instruction to be executed or the identifier of the register used to store the calculation result.
待执行无条件跳转指令的指令格式可以是:The instruction format of the unconditional jump instruction to be executed may be:
Jump srcJump src
其中,Jump为待执行无条件跳转指令所对应的操作类型,src为指令流所需跳转到的目标跳转位置。Jump is the type of operation corresponding to the unconditional jump instruction to be executed, and src is the target jump position to which the instruction stream needs to jump.
待执行有条件跳转指令的指令格式可以是:The instruction format of the conditional jump instruction to be executed may be:
CB src,conditionCB src, condition
其中,CB为待执行有条件跳转指令所对应的操作类型,src为指令流所需跳转到的目标跳转位置,condition为跳转的条件。例如,condition可以为“寄存器的值为零是否为真”,在寄存器的值为零时,可以跳转至目标跳转位置。Among them, CB is the operation type corresponding to the conditional jump instruction to be executed, src is the target jump position to which the instruction stream needs to jump, and condition is the jump condition. For example, condition may be "whether the value of the register is zero"; when the value of the register is zero, the instruction stream jumps to the target jump position.
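无条件跳转与有条件跳转对指令流位置的影响可以用如下Python代码示意(指令的元组表示方式为便于说明的假设,非专利原文内容)。The effect of the unconditional and conditional jump instructions on the position in the instruction stream can be sketched as follows (the tuple representation of instructions is an assumption for illustration, not part of the original disclosure).

```python
def next_position(pc, instr, registers):
    """Return the next instruction-stream position.

    instr is a hypothetical tuple: ("Jump", target) or
    ("CB", target, register_name). The CB condition modeled here is
    "the value of the register is zero", as in the example above.
    """
    op = instr[0]
    if op == "Jump":
        return instr[1]                       # unconditional jump to target
    if op == "CB":
        target, reg = instr[1], instr[2]
        # jump only when the condition holds (register value is zero)
        return target if registers.get(reg, 0) == 0 else pc + 1
    return pc + 1                             # fall through to next instruction
```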
待执行读神经元指令的指令格式可以是:The instruction format of the read neuron instruction to be executed may be:
NLOAD src_addr,des_addr,sizeNLOAD src_addr, des_addr, size
其中,NLOAD为待执行读神经元指令所对应的操作类型,src_addr为读取神经元数据的数据读入地址,des_addr为存储读取神经元数据所需的加密方式的数据加密方式地址,size为神经元数据的读入量。Among them, NLOAD is the operation type corresponding to the read neuron instruction to be executed, src_addr is the data read-in address for reading neuron data, des_addr is the data encryption method address that stores the encryption method required for reading the neuron data, and size is the read-in amount of the neuron data.
待执行读突触指令的指令格式可以是:The command format of the read synaptic command to be executed may be:
WLOAD src_addr,des_addr,sizeWLOAD src_addr, des_addr, size
其中,WLOAD为待执行读突触指令所对应的操作类型,src_addr为读取突触数据的数据读入地址,des_addr为存储读取突触数据所需的加密方式的数据加密方式地址,size为突触数据的读入量。Among them, WLOAD is the operation type corresponding to the read synapse instruction to be executed, src_addr is the data read-in address for reading synapse data, des_addr is the data encryption method address that stores the encryption method required for reading the synapse data, and size is the read-in amount of the synapse data.
待执行读标量指令的指令格式可以是:The instruction format of the read scalar instruction to be executed may be:
SLOAD src,desSLOAD src, des
其中,SLOAD为待执行读标量指令所对应的操作类型,src为读取标量数据的数据读入地址,des为存储读取标量数据所需的加密方式的数据加密方式地址。Among them, SLOAD is the type of operation corresponding to the scalar read instruction to be executed, src is the data read-in address for reading scalar data, and des is the data encryption method address for storing the encryption method required for reading scalar data.
待执行写神经元指令的指令格式可以是:The format of the instruction to be executed to write the neuron may be:
NSTORE src_addr,des_addr,sizeNSTORE src_addr, des_addr, size
其中,NSTORE为待执行写神经元指令所对应的操作类型,src_addr为写入神经元数据的数据写入地址,des_addr为存储写入神经元数据所需的加密方式的数据加密方式地址,size为神经元数据的写入量。Among them, NSTORE is the operation type corresponding to the write neuron instruction to be executed, src_addr is the data write address for writing neuron data, des_addr is the data encryption method address that stores the encryption method required for writing the neuron data, and size is the write amount of the neuron data.
待执行写突触指令的指令格式可以是:The instruction format of the write synaptic instruction to be executed may be:
WSTORE src_addr,des_addr,sizeWSTORE src_addr, des_addr, size
其中,WSTORE为待执行写突触指令所对应的操作类型,src_addr为写入突触数据的数据写入地址,des_addr为存储写入突触数据所需的加密方式的数据加密方式地址,size为突触数据的写入量。Among them, WSTORE is the operation type corresponding to the write synapse instruction to be executed, src_addr is the data write address for writing synapse data, des_addr is the data encryption method address that stores the encryption method required for writing the synapse data, and size is the write amount of the synapse data.
待执行写标量指令的指令格式可以是:The instruction format of the write scalar instruction to be executed may be:
SSTORE src,desSSTORE src, des
其中,SSTORE为待执行写标量指令所对应的操作类型,src为写入标量数据的数据写入地址,des为存储写入标量数据所需的加密方式的数据加密方式地址。Among them, SSTORE is the operation type corresponding to the write scalar instruction to be executed, src is the data write address for writing scalar data, and des is the data encryption method address that stores the encryption method required for writing the scalar data.
应当理解的是,本文中宏指令中的“@”、“#”仅用于分隔宏指令中所记载的参数,是为了便于技术人员理解,在实际使用过程中,“@”、“#”并不是宏指令中所需包含的内容。It should be understood that the "@" and "#" in the macro instructions herein are only used to separate the parameters recorded in the macro instructions, so as to facilitate understanding by those skilled in the art; in actual use, "@" and "#" are not content that the macro instructions need to contain.
应用示例Application examples
图8a、图8b示出根据本公开一实施例的神经网络指令生成装置、处理系统的应用场景的示意图。如图8a、图8b所示,用于执行宏指令的备选设备可以为多个,备选设备可以为CPU-1、CPU-2、…、CPU-n,NPU-1、NPU-2、…、NPU-n和GPU-1、GPU-2、…、GPU-n。8a and 8b are schematic diagrams illustrating application scenarios of a neural network instruction generation device and a processing system according to an embodiment of the present disclosure. As shown in FIGS. 8a and 8b, there may be multiple candidate devices for executing macro instructions. The candidate devices may be CPU-1, CPU-2, ..., CPU-n, NPU-1, NPU-2, ..., NPU-n and GPU-1, GPU-2, ..., GPU-n.
示例1Example 1
以下结合“神经网络指令生成装置根据宏指令生成运行指令的工作过程”作为一个示例性应用场景,给出根据本公开实施例的应用示例,以便于理解神经网络指令生成装置的流程。本领域技术人员应理解,以下应用示例仅仅是出于便于理解本公开实施例的目的,不应视为对本公开实施例的限制。Taking "the working process of the neural network instruction generation device generating running instructions according to macro instructions" as an exemplary application scenario, an application example according to an embodiment of the present disclosure is given below to facilitate understanding of the flow of the neural network instruction generation device. Those skilled in the art should understand that the following application example is only for the purpose of facilitating understanding of the embodiments of the present disclosure, and should not be considered as a limitation to the embodiments of the present disclosure.
神经网络指令生成装置根据某宏指令生成运行指令的工作过程及原理如下。The working process and principle of the neural network command generating device generating the running command according to a certain macro command are as follows.
资源获取模块14 Resource acquisition module 14
获取备选设备的资源信息,该资源信息包括备选设备的剩余存储容量、存储容量和备选设备所包含的指令集。资源获取模块14将获取到的备选设备的资源信息发送至设备确定模块11和指令生成模块12。Acquire resource information of the candidate device, the resource information includes the remaining storage capacity, storage capacity of the candidate device, and the instruction set included in the candidate device. The resource acquisition module 14 sends the acquired resource information of the candidate device to the device determination module 11 and the instruction generation module 12.
设备确定模块11(包括第一确定子模块111、第二确定子模块112和第三确定子模块113)Device determination module 11 (including first determination sub-module 111, second determination sub-module 112, and third determination sub-module 113)
在接收到宏指令时,根据接收到的宏指令,确定执行宏指令的运行设备。例如,接收到如下宏指令。其中,宏指令可以是来自不同的平台的。When a macro instruction is received, the operating device that executes the macro instruction is determined according to the received macro instruction. For example, the following macro command is received. Among them, the macro instructions can be from different platforms.
宏指令1:@XXX#01……Macro 1: @ XXX # 01 ...
宏指令2:@SSS#02……Macro 2: @ SSS # 02 ...
宏指令3:@DDD#04……Macro 3: @ DDD # 04 ...
宏指令4:@NNN……Macro 4: @NNN ...
第一确定子模块111在确定在宏指令中包含指定设备的标识,且确定该指定设备中包含与宏指令相对应的指令集时,第一确定子模块111可以将该指定设备确定为执行宏指令的运行设备,并将确定的运行设备的标识发送至指令生成模块12。例如,第一确定子模块111可以将标识01所对应的指定设备如CPU-2(CPU-2中包含与宏指令1相对应的指令集)确定为用于执行宏指令1的运行设备。可以将标识02所对应的指定设备如CPU-1(CPU-1中包含与宏指令2相对应的指令集)确定为用于执行宏指令2的运行设备。When the first determining sub-module 111 determines that the macro instruction contains the identifier of a specified device, and determines that the specified device contains the instruction set corresponding to the macro instruction, the first determining sub-module 111 may determine the specified device as the running device for executing the macro instruction, and send the identifier of the determined running device to the instruction generation module 12. For example, the first determining sub-module 111 may determine the specified device corresponding to identifier 01, such as CPU-2 (CPU-2 contains the instruction set corresponding to macro instruction 1), as the running device for executing macro instruction 1, and may determine the specified device corresponding to identifier 02, such as CPU-1 (CPU-1 contains the instruction set corresponding to macro instruction 2), as the running device for executing macro instruction 2.
第三确定子模块113在确定在宏指令中包含指定设备的标识,且确定该指定设备中不包含与宏指令相对应的指令集时,第三确定子模块113可以将包含与宏指令相对应的指令集的备选设备确定为运行设备,并将确定的运行设备的标识发送至指令生成模块12。例如,第三确定子模块113在确定标识04所对应的指定设备中不包含与宏指令3相对应的指令集时,可以将包含与宏指令3的操作类型DDD相对应的指令集的备选设备如NPU-n、NPU-2,确定为用于执行宏指令3的运行设备。When the third determining sub-module 113 determines that the macro instruction contains the identifier of a specified device, but determines that the specified device does not contain the instruction set corresponding to the macro instruction, the third determining sub-module 113 may determine the candidate devices containing the instruction set corresponding to the macro instruction as the running devices, and send the identifiers of the determined running devices to the instruction generation module 12. For example, when the third determining sub-module 113 determines that the specified device corresponding to identifier 04 does not contain the instruction set corresponding to macro instruction 3, it may determine the candidate devices containing the instruction set corresponding to the operation type DDD of macro instruction 3, such as NPU-n and NPU-2, as the running devices for executing macro instruction 3.
第二确定子模块112在确定宏指令中不存在指定设备的标识(指定设备的标识所对应的位置为空,或者在宏指令中不包含“指定设备的标识”这个字段)时,第二确定子模块112可以根据该宏指令和备选设备的资源信息,从备选设备中确定出运行设备(具体确定过程详见上文第二确定子模块112的相关描述),并将确定出的运行设备的标识发送至指令生成模块12。例如,由于宏指令4中不存在指定设备的标识,第二确定子模块112可以根据宏指令4的操作类型NNN和备选设备的资源信息(所包含的指令集),从备选设备中确定出用于执行宏指令4的运行设备,例如,GPU-n(GPU-n中包含与操作类型NNN相对应的指令集)。When the second determining sub-module 112 determines that the identifier of a specified device does not exist in the macro instruction (the position corresponding to the identifier of the specified device is empty, or the macro instruction does not contain the "identifier of the specified device" field), the second determining sub-module 112 may determine the running device from the candidate devices according to the macro instruction and the resource information of the candidate devices (see the related description of the second determining sub-module 112 above for the specific determination process), and send the identifier of the determined running device to the instruction generation module 12. For example, since the identifier of a specified device does not exist in macro instruction 4, the second determining sub-module 112 may determine, from the candidate devices, the running device for executing macro instruction 4 according to the operation type NNN of macro instruction 4 and the resource information (the contained instruction sets) of the candidate devices, for example, GPU-n (GPU-n contains the instruction set corresponding to the operation type NNN).
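设备确定模块的三个分支可以用如下Python代码概括(非专利原文内容;字段名与数据结构均为便于说明的假设)。The three branches of the device determination module can be summarized by the following Python sketch (not part of the original disclosure; the field names and data structures are assumptions for illustration).

```python
def determine_running_devices(macro, candidates):
    """candidates maps a device identifier to the set of operation types
    (instruction sets) it supports; macro is a dict that may carry the
    identifier of a specified device and carries its operation type."""
    specified = macro.get("device_id")
    op = macro["op"]
    # First branch: the specified device supports the macro's instruction set.
    if specified is not None and op in candidates.get(specified, set()):
        return [specified]
    # Second branch (no specified device) and third branch (specified device
    # lacks the instruction set): fall back to all candidates that support op.
    return [dev for dev, ops in candidates.items() if op in ops]
```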
指令生成模块12(包括第一指令生成模块121和第二指令生成模块122)Instruction generation module 12 (including the first instruction generation module 121 and the second instruction generation module 122)
第一指令生成模块121在运行设备为一个,且运行设备的资源不满足执行宏指令的容量条件时,根据运行设备的运行数据量和宏指令的数据量将宏指令拆分成多条运行指令,并将多条运行指令发送至队列构建模块15。例如,根据宏指令2的数据量和运行设备CPU-1的运行数据量生成多条运行指令2-1、2-2、…、2-n。根据宏指令4的数据量和运行设备GPU-n的运行数据量生成多条运行指令4-1、4-2、…、4-n。When there is one running device and the resources of the running device do not satisfy the capacity condition for executing the macro instruction, the first instruction generation module 121 splits the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount of the macro instruction, and sends the multiple running instructions to the queue construction module 15. For example, multiple running instructions 2-1, 2-2, ..., 2-n are generated according to the data amount of macro instruction 2 and the running data amount of the running device CPU-1, and multiple running instructions 4-1, 4-2, ..., 4-n are generated according to the data amount of macro instruction 4 and the running data amount of the running device GPU-n.
第一指令生成模块121在确定运行设备为一个,且运行设备的资源满足执行宏指令的容量条件时, 可以根据宏指令生成一条运行指令,并将其发送至队列构建模块15。例如,根据宏指令1的数据量和运行设备CPU-2的运行数据量生成一条运行指令1-1。When the first instruction generation module 121 determines that there is only one running device, and the resources of the running device meet the capacity condition for executing the macro instruction, it may generate a running instruction according to the macro instruction and send it to the queue construction module 15. For example, an operation instruction 1-1 is generated based on the data amount of the macro instruction 1 and the operation data amount of the operation device CPU-2.
第二指令生成模块122在确定运行设备为多个时,根据每个运行设备的运行数据量和宏指令的数据量对宏指令进行拆分,生成对应于每个运行设备的运行指令,并将其发送至队列构建模块15。例如,根据宏指令3的数据量和运行设备NPU-n的运行数据量、运行设备NPU-2的运行数据量,为运行设备NPU-n生成多条运行指令3-1、3-2、…、3-n,为运行设备NPU-2生成多条运行指令3’-1、3’-2、…、3’-n。When determining that there are multiple running devices, the second instruction generation module 122 splits the macro instruction according to the running data amount of each running device and the data amount of the macro instruction, generates running instructions corresponding to each running device, and sends them to the queue construction module 15. For example, according to the data amount of macro instruction 3, the running data amount of the running device NPU-n and the running data amount of the running device NPU-2, multiple running instructions 3-1, 3-2, ..., 3-n are generated for the running device NPU-n, and multiple running instructions 3'-1, 3'-2, ..., 3'-n are generated for the running device NPU-2.
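按运行设备单次可处理的运行数据量对宏指令的数据量进行拆分,可以用如下Python代码示意(非专利原文内容,仅为便于理解的草图)。Splitting the macro instruction's data amount by the running data amount a device can process per run can be sketched as follows (not part of the original disclosure; an illustrative sketch only).

```python
def split_macro_data(total_data_amount, per_run_amount):
    """Split the macro instruction's data amount into chunks no larger
    than the running device's per-run data amount; each chunk would
    correspond to one running instruction."""
    chunks = []
    remaining = total_data_amount
    while remaining > 0:
        chunks.append(min(per_run_amount, remaining))
        remaining -= chunks[-1]
    return chunks
```

当宏指令的数据量不超过运行数据量时,仅生成一个分块,对应于生成一条运行指令的情形。When the macro's data amount does not exceed the per-run amount, a single chunk results, corresponding to the case where one running instruction is generated.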
队列构建模块15 Queue building module 15
在接收到运行指令时,根据队列排序规则对每个运行设备所需执行的所有运行指令进行排序,根据排序后的运行指令为每个运行设备构建与之唯一对应的指令队列,并将指令队列发送至指令分派模块16。具体地,When receiving the running instructions, the queue construction module sorts all the running instructions to be executed by each running device according to the queue sorting rule, constructs a uniquely corresponding instruction queue for each running device according to the sorted running instructions, and sends the instruction queues to the instruction dispatch module 16. Specifically:
对于被运行设备CPU-2执行的一条运行指令1-1。所构建的对应于运行设备CPU-2的指令队列CPU-2”仅包括运行指令1-1。For a running instruction 1-1 executed by the running device CPU-2. The constructed instruction queue CPU-2 "corresponding to the execution device CPU-2 includes only the execution instruction 1-1.
对于被运行设备CPU-1执行的多条运行指令2-1、2-2、…、2-n。根据队列排序规则对多条运行指令2-1、2-2、…、2-n进行排序,根据排序后的多条运行指令2-1、2-2、…、2-n构建与运行设备CPU-1相对应的指令队列CPU-1”。For multiple execution instructions 2-1, 2-2, ..., 2-n executed by the operation device CPU-1. Sort multiple running instructions 2-1, 2-2, ..., 2-n according to the queue sorting rules, and build and run equipment based on the sorted multiple running instructions 2-1, 2-2, ..., 2-n CPU-1 corresponds to the instruction queue CPU-1 ".
对于被运行设备NPU-n执行的多条运行指令3-1、3-2、…、3-n。根据队列排序规则对多条运行指令3-1、3-2、…、3-n进行排序,根据排序后的多条运行指令3-n、…、3-2、3-1构建与运行设备NPU-n相对应的指令队列NPU-n”。For multiple running instructions 3-1, 3-2, ..., 3-n executed by the running device NPU-n. Sort multiple running instructions 3-1, 3-2, ..., 3-n according to the queue sorting rules, and build and run equipment based on the sorted multiple running instructions 3-n, ..., 3-2, 3-1 The instruction queue NPU-n corresponding to NPU-n ".
对于被运行设备NPU-2执行的多条运行指令3’-1、3’-2、…、3’-n。根据队列排序规则对多条运行指令3’-1、3’-2、…、3’-n进行排序,根据排序后的多条运行指令3’-n、…、3’-2、3’-1构建与运行设备NPU-2相对应的指令队列NPU-2”。For a plurality of running instructions 3'-1, 3'-2, ..., 3'-n executed by the running device NPU-2. Sort the multiple running instructions 3'-1, 3'-2, ..., 3'-n according to the queue sorting rules, and sort the multiple running instructions 3'-n, ..., 3'-2, 3 ' -1 Construct the instruction queue NPU-2 "corresponding to the running device NPU-2".
对于被运行设备GPU-n执行的多条运行指令4-1、4-2、…、4-n。根据队列排序规则对多条运行指令4-1、4-2、…、4-n进行排序,根据排序后的多条运行指令4-1、4-2、…、4-n构建与运行设备GPU-n相对应的指令队列GPU-n”。For multiple execution instructions 4-1, 4-2, ..., 4-n executed by the execution device GPU-n. Sort multiple operating instructions 4-1, 4-2, ..., 4-n according to the queue sorting rules, and build and run equipment based on the sorted multiple operating instructions 4-1, 4-2, ..., 4-n GPU-n corresponding instruction queue GPU-n ".
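上述按运行设备分组并排序运行指令的过程可以用如下Python代码示意(非专利原文内容;以可比较的排序键代替实际的队列排序规则,属便于说明的假设)。The grouping and sorting of running instructions per running device described above can be sketched as follows (not part of the original disclosure; a comparable sort key stands in for the actual queue sorting rule, as an assumption for illustration).

```python
from collections import defaultdict

def build_instruction_queues(run_instructions, sort_key=None):
    """run_instructions is a list of (device, instruction) pairs; the
    result maps each running device to its own ordered instruction queue."""
    queues = defaultdict(list)
    for device, instruction in run_instructions:
        queues[device].append(instruction)
    # sort each device's queue by the (placeholder) queue-ordering rule
    return {dev: sorted(q, key=sort_key) for dev, q in queues.items()}
```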
指令分派模块16 Instruction dispatch module 16
在接收到指令队列之后,将每个指令队列中的运行指令,依次发送至对应的运行设备中,以使运行设备执行运行指令。例如,将指令队列CPU-2”中包括的运行指令1-1发送至其对应的运行设备CPU-2。将指令队列CPU-1”中的多条运行指令2-1、2-2、…、2-n,依次发送至其对应的运行设备CPU-1。将指令队列NPU-n”中的多条运行指令3-n、…、3-2、3-1,依次发送至其对应的运行设备NPU-n。将指令队列NPU-2”中的多条运行指令3’-n、…、3’-2、3’-1,依次发送至其对应的运行设备NPU-2。将队列GPU-n”中的多条运行指令4-1、4-2、…、4-n,依次发送至其对应的运行设备GPU-n。After receiving the instruction queue, the operation instructions in each instruction queue are sequentially sent to the corresponding operation device, so that the operation device executes the operation instruction. For example, the execution instruction 1-1 included in the instruction queue CPU-2 "is sent to its corresponding operation device CPU-2. The multiple execution instructions 2-1, 2-2, ... in the instruction queue CPU-1" , 2-n, sent to its corresponding running device CPU-1 in turn. Send multiple running instructions 3-n, ..., 3-2, 3-1 in the instruction queue NPU-n "to their corresponding running device NPU-n in turn. Send multiple instructions in the instruction queue NPU-2" The operating instructions 3'-n, ..., 3'-2, 3'-1 are sent to its corresponding operating device NPU-2 in sequence. Multiple running instructions 4-1, 4-2, ..., 4-n in the queue GPU-n "are sequentially sent to its corresponding running device GPU-n.
其中,上述运行设备CPU-2、运行设备CPU-1、运行设备NPU-n和运行设备NPU-2在接收到指令队列之后,按照指令队列中运行指令的排列顺序,依次执行运行指令。以运行设备CPU-2为例,描述其执行所接收到的运行指令的具体过程。运行设备CPU-2包括控制模块、执行模块和存储模块。其中,控制模块包括指令存储子模块、指令处理子模块和存储队列子模块,执行模块包括依赖关系处理子模块,详见上文关于运行设备的相关描述。Wherein, the above-mentioned running device CPU-2, running device CPU-1, running device NPU-n and running device NPU-2 execute the running instructions in sequence according to the order of running instructions in the command queue after receiving the instruction queue. Taking the running device CPU-2 as an example, the specific process of executing the received running instruction is described. The operating device CPU-2 includes a control module, an execution module, and a storage module. The control module includes an instruction storage sub-module, an instruction processing sub-module, and a storage queue sub-module, and the execution module includes a dependency relationship processing sub-module. For details, see the related description about operating devices above.
假定根据宏指令1所生成的运行指令1-1为“@XXX……”。运行设备CPU-2在接收到运行指令1-1之后,执行运行指令1-1的过程如下:Assume that the run instruction 1-1 generated according to the macro instruction 1 is "@XXX ...". After receiving the operation instruction 1-1, the operation device CPU-2 executes the operation instruction 1-1 as follows:
运行设备CPU-2的控制模块获取数据、神经网络模型以及运行指令1-1。其中,指令存储子模块用于存储运行指令1-1。指令处理子模块用于对运行指令1-1进行解析,获得多个解析指令如解析指令0、解析指令1和解析指令2,并将多个解析指令发送至存储队列子模块和执行模块。存储队列子模块用于存储运行指令队列,运行指令队列中包含运行设备CPU-2所需执行的解析指令0、解析指令1和解析指令2以及其他运行指令,在运行指令队列中所有指令按照执行的先后顺序依次排列。例如,获得的多个解析指令的执行的先后顺序为解析指令0、解析指令1和解析指令2,且解析指令1与解析指令0之间存在关联关系。The control module of the running device CPU-2 acquires the data, the neural network model and running instruction 1-1. Among them, the instruction storage sub-module is used to store running instruction 1-1. The instruction processing sub-module is used to parse running instruction 1-1 to obtain multiple parsed instructions, such as parsed instruction 0, parsed instruction 1 and parsed instruction 2, and to send the multiple parsed instructions to the storage queue sub-module and the execution module. The storage queue sub-module is used to store the running instruction queue, which contains parsed instruction 0, parsed instruction 1, parsed instruction 2 and other running instructions to be executed by the running device CPU-2; all instructions in the running instruction queue are arranged in order of execution. For example, the execution order of the obtained parsed instructions is parsed instruction 0, parsed instruction 1 and parsed instruction 2, and there is a dependency between parsed instruction 1 and parsed instruction 0.
运行设备CPU-2的执行模块接收到多个解析指令后,依赖关系处理子模块判断多个解析指令之间是否存在关联关系。依赖关系处理子模块确定出解析指令1与解析指令0存在关联关系,则将解析指令1缓存至指令存储子模块中,并在确定解析指令0执行完毕之后,从缓存中提取出解析指令1发送至执行模块,以供执行模块执行。After the execution module of the running device CPU-2 receives the multiple analysis commands, the dependency processing sub-module determines whether there is an association relationship between the multiple analysis commands. The dependency processing sub-module determines that there is an association relationship between the parsing instruction 1 and the parsing instruction 0, then caches the parsing instruction 1 in the instruction storage sub-module, and after determining that the parsing instruction 0 has been executed, extracts the parsing instruction 1 from the cache and sends it To the execution module for execution by the execution module.
执行模块接收并执行解析指令0、解析指令1和解析指令2,以完成运行指令1-1的运行。The execution module receives and executes the parsing instruction 0, parsing instruction 1, and parsing instruction 2 to complete the operation of the running instruction 1-1.
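依赖关系处理子模块“缓存有依赖的指令、待前序指令执行完毕后再执行”的行为可以用如下Python代码示意(非专利原文内容;以映射表示依赖关系是便于说明的假设)。The dependency processing sub-module's behavior of buffering a dependent instruction until its predecessor finishes can be sketched as follows (not part of the original disclosure; representing dependencies as a mapping is an assumption for illustration).

```python
def execute_with_dependencies(parsed_instructions, depends_on):
    """Execute parsed instructions, buffering any instruction whose
    predecessor (per depends_on) has not finished yet, and releasing
    it from the buffer once the predecessor completes."""
    finished, order = set(), []
    pending = list(parsed_instructions)
    while pending:
        for instr in pending:
            pred = depends_on.get(instr)
            if pred is None or pred in finished:
                order.append(instr)     # execute this instruction now
                finished.add(instr)
                pending.remove(instr)
                break                   # rescan the buffer from the start
        else:
            raise RuntimeError("circular or unsatisfiable dependency")
    return order
```

例如,解析指令1依赖解析指令0时,解析指令0先被执行,随后解析指令1从缓存中取出执行。For example, when parsed instruction 1 depends on parsed instruction 0, instruction 0 executes first and instruction 1 is then released from the buffer.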
For the working process of each of the above modules, refer to the relevant description above.

In this way, the device can be used across platforms, offers good applicability, fast instruction conversion, high processing efficiency, and a low error probability, and reduces the labor and material costs of development.
Example 2
The neural network instruction processing system may include the foregoing candidate devices and the instruction generation device. As shown in FIG. 8a and FIG. 8b, there may be multiple candidate devices for executing macro instructions, such as CPU-1, CPU-2, …, CPU-n, NPU-1, NPU-2, …, NPU-n, and GPU-1, GPU-2, …, GPU-n; a candidate device selected to execute the corresponding run instructions is a running device. The working process and principle by which the neural network instruction processing system generates and executes run instructions from a given macro instruction are as follows.
Resource acquisition module 14

Acquires the resource information of the candidate devices, including each candidate device's remaining storage capacity, storage capacity, and the instruction set it contains. The resource acquisition module 14 sends the acquired resource information of the candidate devices to the device determination module 11 and the instruction generation module 12.
Device determination module 11 (including the first determination submodule 111, the second determination submodule 112, and the third determination submodule 113)

When a macro instruction is received, the running device for executing the macro instruction is determined according to the received macro instruction. For example, the following macro instructions, which may come from different platforms, are received.
Macro instruction 1: @XXX#01…

Macro instruction 2: @SSS#02…

Macro instruction 3: @DDD#04…

Macro instruction 4: @NNN…
When the first determination submodule 111 determines that the macro instruction contains the identifier of a specified device, and that the specified device contains the instruction set corresponding to the macro instruction, it may determine the specified device as the running device for executing the macro instruction and send the identifier of the determined running device to the instruction generation module 12. For example, the first determination submodule 111 may determine the specified device corresponding to identifier 01, such as CPU-2 (which contains the instruction set corresponding to macro instruction 1), as the running device for executing macro instruction 1, and the specified device corresponding to identifier 02, such as CPU-1 (which contains the instruction set corresponding to macro instruction 2), as the running device for executing macro instruction 2.

When the third determination submodule 113 determines that the macro instruction contains the identifier of a specified device, but that the specified device does not contain the instruction set corresponding to the macro instruction, it may determine the candidate devices that do contain that instruction set as the running devices and send the identifiers of the determined running devices to the instruction generation module 12. For example, on determining that the specified device corresponding to identifier 04 does not contain the instruction set corresponding to macro instruction 3, the third determination submodule 113 may determine the candidate devices containing the instruction set corresponding to operation type DDD of macro instruction 3, such as NPU-n and NPU-2, as the running devices for executing macro instruction 3.

When the second determination submodule 112 determines that the macro instruction contains no specified-device identifier (the position corresponding to the identifier of the specified device is empty, or the macro instruction contains no "specified device identifier" field), it may determine the running device from the candidate devices according to the macro instruction and the resource information of the candidate devices (for the specific determination process, see the description of the second determination submodule 112 above), and send the identifier of the determined running device to the instruction generation module 12. For example, since macro instruction 4 contains no specified-device identifier, the second determination submodule 112 may determine the running device for executing macro instruction 4 from the candidate devices according to the operation type NNN of macro instruction 4 and the resource information of the candidate devices (the instruction sets they contain), for example GPU-n (which contains the instruction set corresponding to operation type NNN).
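The three determination rules above can be summarized in a short sketch: rule 1 (specified and capable device), rule 3 (specified but incapable, so fall back to capable candidates), and rule 2 (no specified device, so select from the candidates by resource information). The dictionary layout, field names, and tie-breaking choice are all assumptions for illustration.

```python
def pick_devices(macro, devices):
    """macro: dict with an 'op' type and optional 'device_id';
    devices: dict mapping device id -> set of supported op types."""
    dev = macro.get("device_id")
    if dev is not None:
        if macro["op"] in devices[dev]:
            return [dev]                       # rule 1: specified and capable
        # rule 3: specified but incapable -> all capable candidate devices
        return [d for d, ops in devices.items() if macro["op"] in ops]
    # rule 2: no specified device -> choose a capable candidate (first here)
    return [d for d, ops in devices.items() if macro["op"] in ops][:1]

devices = {"CPU-2": {"XXX"}, "CPU-1": {"SSS"},
           "NPU-2": {"DDD"}, "NPU-n": {"DDD"}, "GPU-n": {"NNN"}}
print(pick_devices({"device_id": "CPU-2", "op": "XXX"}, devices))  # ['CPU-2']
print(pick_devices({"op": "NNN"}, devices))                        # ['GPU-n']
```

For macro instruction 3 (operation type DDD with an incapable specified device), the same function returns both NPU-2 and NPU-n, matching the multi-device case above.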
Instruction generation module 12 (including the first instruction generation module 121 and the second instruction generation module 122)

When there is one running device and its resources do not satisfy the capacity condition for executing the macro instruction, the first instruction generation module 121 splits the macro instruction into multiple run instructions according to the running data amount of the running device and the data amount of the macro instruction, and sends the run instructions to the queue construction module 15. For example, multiple run instructions 2-1, 2-2, …, 2-n are generated according to the data amount of macro instruction 2 and the running data amount of the running device CPU-1, and multiple run instructions 4-1, 4-2, …, 4-n are generated according to the data amount of macro instruction 4 and the running data amount of the running device GPU-n.

When there is one running device and its resources satisfy the capacity condition for executing the macro instruction, the first instruction generation module 121 may generate a single run instruction according to the macro instruction and send it to the queue construction module 15. For example, one run instruction 1-1 is generated according to the data amount of macro instruction 1 and the running data amount of the running device CPU-2.

When there are multiple running devices, the second instruction generation module 122 splits the macro instruction according to the running data amount of each running device and the data amount of the macro instruction, generates the run instructions corresponding to each running device, and sends them to the queue construction module 15. For example, according to the data amount of macro instruction 3 and the running data amounts of the running devices NPU-n and NPU-2, multiple run instructions 3-1, 3-2, …, 3-n are generated for the running device NPU-n, and multiple run instructions 3'-1, 3'-2, …, 3'-n are generated for the running device NPU-2.
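A minimal sketch of the splitting step: when the macro instruction's data amount exceeds the device's per-run data amount, it is divided into pieces that each fit. The contiguous-chunk rule shown here is an assumption — the text only states that the split depends on the macro's data amount and the device's running data amount.

```python
def split_macro(total, per_run):
    """Return (offset, length) pieces covering `total` data units
    in chunks of at most `per_run` units each."""
    return [(off, min(per_run, total - off))
            for off in range(0, total, per_run)]

print(split_macro(10, 4))  # [(0, 4), (4, 4), (8, 2)]
```

When the device's capacity condition is satisfied (`per_run >= total`), the same rule yields a single piece, matching the one-run-instruction case for macro instruction 1.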
Queue construction module 15

When the run instructions are received, all run instructions to be executed by each running device are sorted according to the queue-ordering rule, and an instruction queue uniquely corresponding to each running device is constructed from the sorted run instructions and sent to the instruction dispatch module 16. Specifically:

For the single run instruction 1-1 executed by the running device CPU-2, the constructed instruction queue CPU-2” corresponding to the running device CPU-2 includes only run instruction 1-1.

For the run instructions 2-1, 2-2, …, 2-n executed by the running device CPU-1, they are sorted according to the queue-ordering rule, and the instruction queue CPU-1” corresponding to the running device CPU-1 is constructed from the sorted run instructions 2-1, 2-2, …, 2-n.

For the run instructions 3-1, 3-2, …, 3-n executed by the running device NPU-n, they are sorted according to the queue-ordering rule, and the instruction queue NPU-n” corresponding to the running device NPU-n is constructed from the sorted run instructions 3-n, …, 3-2, 3-1.

For the run instructions 3'-1, 3'-2, …, 3'-n executed by the running device NPU-2, they are sorted according to the queue-ordering rule, and the instruction queue NPU-2” corresponding to the running device NPU-2 is constructed from the sorted run instructions 3'-n, …, 3'-2, 3'-1.

For the run instructions 4-1, 4-2, …, 4-n executed by the running device GPU-n, they are sorted according to the queue-ordering rule, and the instruction queue GPU-n” corresponding to the running device GPU-n is constructed from the sorted run instructions 4-1, 4-2, …, 4-n.
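The queue construction step above can be sketched as: group the run instructions by target device, then sort each group by the queue-ordering rule. The ordering key used here (a per-instruction sequence number) is an assumption, since the text does not specify the rule itself.

```python
from collections import defaultdict

def build_queues(run_instructions):
    """run_instructions: list of (device, seq_no, name) tuples.
    Returns one ordered instruction queue per running device."""
    grouped = defaultdict(list)
    for dev, seq, name in run_instructions:
        grouped[dev].append((seq, name))
    # sort each device's instructions by the assumed ordering key
    return {dev: [name for _, name in sorted(items)]
            for dev, items in grouped.items()}

q = build_queues([("CPU-1", 2, "2-2"), ("CPU-1", 1, "2-1"),
                  ("CPU-2", 1, "1-1")])
print(q["CPU-1"])  # ['2-1', '2-2']
```

Each device thus receives exactly one queue uniquely corresponding to it, as the module description requires.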
Instruction dispatch module 16

After the instruction queues are received, the run instructions in each instruction queue are sent in sequence to the corresponding running device, so that the running device executes them. For example, run instruction 1-1 in the instruction queue CPU-2” is sent to its corresponding running device CPU-2. The run instructions 2-1, 2-2, …, 2-n in the instruction queue CPU-1” are sent in sequence to the corresponding running device CPU-1. The run instructions 3-n, …, 3-2, 3-1 in the instruction queue NPU-n” are sent in sequence to the corresponding running device NPU-n. The run instructions 3'-n, …, 3'-2, 3'-1 in the instruction queue NPU-2” are sent in sequence to the corresponding running device NPU-2. The run instructions 4-1, 4-2, …, 4-n in the queue GPU-n” are sent in sequence to the corresponding running device GPU-n.

After receiving their instruction queues, the running devices CPU-2, CPU-1, NPU-n, and NPU-2 execute the run instructions in sequence in the order in which they are arranged in the queue. Taking the running device CPU-2 as an example, the specific process of executing a received run instruction is described below. The running device CPU-2 includes a control module 21, an execution module 22, and a storage module 23, where the control module 21 includes an instruction storage submodule 211, an instruction processing submodule 212, and a storage queue submodule 213, and the execution module 22 includes a dependency processing submodule 221; see the description of the running device above for details.
Assume that run instruction 1-1 generated from macro instruction 1 is "@XXX…". After receiving run instruction 1-1, the running device CPU-2 executes it as follows.

The control module 21 of the running device CPU-2 acquires the data, the neural network model, and run instruction 1-1. The instruction storage submodule 211 stores run instruction 1-1. The instruction processing submodule 212 parses run instruction 1-1 to obtain multiple parsed instructions, such as parsed instruction 0, parsed instruction 1, and parsed instruction 2, and sends them to the storage queue submodule 213 and the execution module 22. The storage queue submodule 213 stores a run instruction queue containing parsed instructions 0, 1, and 2 and the other run instructions to be executed by the running device CPU-2, with all instructions arranged in order of execution. For example, the execution order of the obtained parsed instructions is parsed instruction 0, parsed instruction 1, parsed instruction 2, and there is a dependency between parsed instruction 1 and parsed instruction 0.

After the execution module 22 of the running device CPU-2 receives the parsed instructions, the dependency processing submodule 221 determines whether any of them are related. On determining that parsed instruction 1 depends on parsed instruction 0, it caches parsed instruction 1 in the instruction storage submodule 211; after parsed instruction 0 has finished executing, it fetches parsed instruction 1 from the cache and sends it to the execution module 22 for execution.

The execution module 22 receives and executes parsed instruction 0, parsed instruction 1, and parsed instruction 2 to complete the execution of run instruction 1-1.
For the working process of each of the above modules, refer to the relevant description above.

In this way, the system can be used across platforms, offers good applicability, fast instruction conversion, high processing efficiency, and a low error probability, and reduces the labor and material costs of development.
The present disclosure provides a machine learning computation device. The machine learning computation device may include one or more of the foregoing neural network instruction generation devices (or one or more of the foregoing neural network instruction processing systems), configured to acquire data to be computed and control information from other processing devices and perform specified machine learning computations. The machine learning computation device may obtain macro instructions or instructions to be executed from other machine learning computation devices or non-machine-learning computation devices, and transfer the execution results to peripheral devices (which may also be called other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one neural network instruction generation device (or neural network instruction processing system) is included, the devices (or systems) may be linked and transmit data through a specific structure, for example interconnected through a peripheral component interconnect express bus (that is, a PCIe bus), to support larger-scale neural network computations. In this case, they may share the same control system or have their own independent control systems; they may share memory, or each accelerator may have its own memory. In addition, their interconnection may be any interconnection topology.

The machine learning computation device has high compatibility and can be connected to various types of servers through a PCIe interface.
FIG. 9a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in FIG. 9a, the combined processing device includes the above machine learning computation device, a universal interconnection interface, and other processing devices. The machine learning computation device interacts with the other processing devices to jointly complete operations specified by the user.

The other processing devices include one or more types of general-purpose or special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning computation device and external data and control, including data transfer, and perform basic control of the machine learning computation device such as starting and stopping it; the other processing devices may also cooperate with the machine learning computation device to jointly complete computation tasks.

The universal interconnection interface is used to transmit data and control instructions between the machine learning computation device and the other processing devices. The machine learning computation device acquires the required input data from the other processing devices and writes it into on-chip storage of the machine learning computation device; it may acquire control instructions from the other processing devices and write them into an on-chip control cache of the machine learning computation device; it may also read data from the storage module of the machine learning computation device and transmit it to the other processing devices.
FIG. 9b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 9b, the combined processing device may further include a storage device connected to the machine learning computation device and the other processing devices, respectively. The storage device is used to store data of the machine learning computation device and the other processing devices, and is particularly suitable for data to be computed that cannot be entirely held in the internal storage of the machine learning computation device or the other processing devices.

The combined processing device can serve as a system-on-chip (SoC) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the core area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device, such as a camera, display, mouse, keyboard, network card, or Wi-Fi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning computation device or combined processing device.

The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.

The present disclosure provides a board card. FIG. 10 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in FIG. 10, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may include other supporting components, including but not limited to a storage device 390, an interface device 391, and a control device 392.
The storage device 390 is connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure) through a bus and is used to store data. The storage device 390 may include multiple groups of storage units 393, each group connected to the machine learning chip 389 through a bus. It can be understood that each group of storage units 393 may be DDR SDRAM (double data rate synchronous dynamic random access memory).

DDR doubles the speed of SDRAM without increasing the clock frequency: it allows data to be read on both the rising and falling edges of the clock pulse, making it twice as fast as standard SDRAM.
In one embodiment, the storage device 390 may include four groups of storage units 393, each of which may include multiple DDR4 dies (chips). In one embodiment, the machine learning chip 389 may internally include four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 dies are used in each group of storage units 393, the theoretical bandwidth of data transmission can reach 25600 MB/s.

In one embodiment, each group of storage units 393 includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the machine learning chip 389 to control the data transmission and data storage of each group of storage units 393.
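The 25600 MB/s figure quoted above follows directly from the DDR4-3200 data rate: 3200 million transfers per second over a 64-bit (8-byte) data bus. A quick check, with an illustrative helper name:

```python
def ddr_bandwidth_mb_s(transfers_per_s_millions, bus_bits):
    """Peak bandwidth in MB/s: transfer rate (MT/s) times bus width in bytes."""
    return transfers_per_s_millions * (bus_bits // 8)

# DDR4-3200 on the 64-bit data portion of a 72-bit controller
print(ddr_bandwidth_mb_s(3200, 64))  # 25600
```

The 8 ECC bits of the 72-bit controller carry check data rather than payload, which is why the calculation uses 64 bits.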
The interface device 391 is electrically connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure) and is used to implement data transmission between the machine learning chip 389 and an external device (for example, a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIe interface, and the data to be processed is transferred from the server to the machine learning chip 389 through the standard PCIe interface to accomplish the data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface device can implement the transfer function. In addition, the computation results of the machine learning chip are transmitted back to the external device (for example, a server) by the interface device.
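The 16000 MB/s figure is the common rounding of PCIe 3.0 x16 throughput: each lane signals at 8 GT/s with 128b/130b encoding, giving roughly 985 MB/s of payload per lane. A sketch of that arithmetic (the function name is illustrative):

```python
def pcie3_bandwidth_gb_s(lanes):
    """Approximate PCIe 3.0 payload bandwidth: 8 GT/s per lane,
    128b/130b line encoding, 8 bits per byte."""
    per_lane_bytes = 8e9 * (128 / 130) / 8   # payload bytes/s per lane
    return per_lane_bytes * lanes / 1e9

print(round(pcie3_bandwidth_gb_s(16), 2))  # 15.75
```

About 15.75 GB/s, which is conventionally rounded to the 16000 MB/s quoted above.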
The control device 392 is electrically connected to the machine learning chip 389 and is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). The machine learning chip 389 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and may drive multiple loads; it can therefore be in different working states such as heavy load and light load. Through the control device, the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the machine learning chip can be regulated.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.

The electronic device may include a data processing apparatus, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle may include an airplane, a ship, and/or a car. The household appliance may include a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and/or range hood. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
It should be noted that, for the sake of concise description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of actions, because according to the present disclosure, certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

It should be further noted that although the steps in the flowcharts of FIG. 4 and FIG. 7 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 4 and FIG. 7 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and whose execution order is not necessarily sequential: they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
应该理解,上述的装置实施例仅是示意性的,本公开的装置还可通过其它的方式实现。例如,上述实施例中所述单元/模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。例如,多个单元、模块或组件可以结合,或者可以集成到另一个系统,或一些特征可以忽略或不执行。It should be understood that the above device embodiments are only schematic, and the device of the present disclosure may also be implemented in other ways. For example, the division of the units / modules in the above embodiments is only a division of logical functions, and there may be other divisions in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be ignored or not implemented.
另外,若无特别说明,在本公开各个实施例中的各功能单元/模块可以集成在一个单元/模块中,也可以是各个单元/模块单独物理存在,也可以两个或两个以上单元/模块集成在一起。上述集成的单元/模块既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。In addition, unless otherwise specified, each functional unit / module in each embodiment of the present disclosure may be integrated into one unit / module, or each unit / module may exist alone physically, or two or more units / The modules are integrated together. The above integrated units / modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, or the like. Physical implementations of the hardware structure include, but are not limited to, transistors, memristors, and so on. Unless otherwise specified, the artificial intelligence processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure — in essence, or the part contributing to the prior art, or all or part of the technical solution — may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of the present disclosure further provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to call the instructions stored in the memory to perform the above method.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present disclosure. The description of the above embodiments is only intended to help in understanding the method of the present disclosure and its core ideas. Meanwhile, changes or modifications made by those skilled in the art based on the ideas of the present disclosure, its specific implementations, and its scope of application all fall within the scope of protection of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (19)

  1. A neural network instruction generation device, characterized in that the device comprises:
    a device determination module, configured to determine, according to a received macro instruction, a running device for executing the macro instruction; and
    an instruction generation module, configured to generate a running instruction according to the macro instruction and the running device.
  2. The device according to claim 1, characterized in that the device determination module comprises:
    a first determination submodule, configured to determine a specified device as the running device when it is determined that the macro instruction contains an identifier of the specified device and the resources of the specified device satisfy an execution condition for executing the macro instruction,
    wherein the execution condition comprises: the specified device contains an instruction set corresponding to the macro instruction.
  3. The device according to claim 2, characterized in that the device further comprises:
    a resource acquisition module, configured to acquire resource information of candidate devices, the resource information comprising the instruction sets contained in the candidate devices,
    wherein the device determination module further comprises a second determination submodule and/or a third determination submodule,
    the second determination submodule being configured to determine, from the candidate devices, the running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices, when it is determined that the macro instruction does not contain the identifier of the specified device, and
    the third determination submodule being configured to determine the running device according to the macro instruction and the resource information of the candidate devices, when it is determined that the macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the macro instruction.
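The selection logic of claims 2 and 3 can be illustrated with a minimal sketch. All names here (`Device`, `select_running_device`, the `device_id` and `type` fields) are hypothetical stand-ins for the claimed modules, not part of the disclosed apparatus:

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    """Hypothetical model of a candidate device and its resource information."""
    name: str
    instruction_sets: set = field(default_factory=set)

def select_running_device(macro, devices):
    """Pick the running device for a macro instruction.

    macro:   dict with a required 'type' (needed instruction set) and an
             optional 'device_id' (identifier of the specified device)
    devices: mapping from device identifier to Device
    """
    required = macro["type"]
    specified = macro.get("device_id")

    # Claim 2: a specified device meeting the execution condition is chosen.
    if specified is not None:
        dev = devices.get(specified)
        if dev is not None and required in dev.instruction_sets:
            return dev

    # Claim 3: otherwise fall back to any candidate whose resource
    # information contains a matching instruction set.
    for dev in devices.values():
        if required in dev.instruction_sets:
            return dev
    return None

devices = {
    "cpu0": Device("cpu0", {"scalar"}),
    "npu0": Device("npu0", {"neural_network", "vector_logic"}),
}
picked = select_running_device({"type": "neural_network"}, devices)
print(picked.name)  # the NPU is the only candidate with the needed instruction set
```

Note how a specified device that lacks the required instruction set is silently bypassed in favor of a suitable candidate, mirroring the third determination submodule.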
  4. The device according to claim 2 or 3, characterized in that the macro instruction contains at least one of an input amount and an output amount,
    the instruction generation module is further configured to determine a data amount of the macro instruction, and to generate the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
    wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further comprises at least one of a storage capacity and a remaining storage capacity.
  5. The device according to claim 4, characterized in that the instruction generation module comprises a first instruction generation submodule and/or a second instruction generation submodule,
    the first instruction generation submodule being configured to, when it is determined that there is one running device and the resources of the running device do not satisfy a capacity condition for executing the macro instruction, split the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence; and
    the second instruction generation submodule being configured to, when it is determined that there are multiple running devices, split the macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
    wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction contains at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
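The splitting behavior of claim 5 can be sketched as a capacity-driven chunking loop. This is an illustration under assumed semantics only — the `split_macro` helper and the interpretation of "running data amount" as a per-device chunk size are hypothetical:

```python
def split_macro(total_amount, run_amounts):
    """Split a macro instruction's data amount across running instructions.

    total_amount: the macro's data amount (e.g. number of input elements)
    run_amounts:  per-device running data amounts derived from resource info;
                  a single-element list models claim 5's first submodule
                  (one device executing several running instructions in turn).
    """
    pieces = []
    remaining = total_amount
    while remaining > 0:
        for cap in run_amounts:
            if remaining <= 0:
                break
            chunk = min(cap, remaining)
            pieces.append(chunk)  # one running instruction per chunk
            remaining -= chunk
    return pieces

# One device whose capacity condition allows only 100 elements at a time,
# handling a 250-element macro instruction:
print(split_macro(250, [100]))  # -> [100, 100, 50]
# Two running devices with different running data amounts sharing the macro:
print(split_macro(250, [100, 60]))
```

Each chunk would carry its own running input/output amounts, bounded by the running data amount of the device that executes it.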
  6. The device according to claim 1, characterized in that the device further comprises at least one of a queue construction module, a macro instruction generation module, and an instruction dispatch module,
    the queue construction module being configured to sort the running instructions according to a queue sorting rule, and to construct an instruction queue corresponding to the running device according to the sorted running instructions;
    the macro instruction generation module being configured to receive an instruction to be executed, and to generate the macro instruction according to the determined identifier of the specified device and the instruction to be executed; and
    the instruction dispatch module being configured to send the running instruction to the running device so that the running device executes the running instruction,
    wherein the instruction dispatch module comprises:
    an instruction assembly submodule, configured to generate an assembly file according to the running instruction;
    an assembly translation submodule, configured to translate the assembly file into a binary file; and
    an instruction sending submodule, configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
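The three-stage dispatch of claim 6 (assemble, translate, send) can be sketched as follows. The textual "assembly" format, the field names, and returning the payload in place of an actual transfer are all assumptions made for illustration:

```python
def dispatch(run_instructions):
    """Toy three-stage dispatch pipeline: assemble, translate to binary, send.

    A real implementation would emit the running device's actual ISA and
    transfer the binary over a device interface; here each stage is a stub.
    """
    # Instruction assembly submodule: render each running instruction as text.
    assembly = "\n".join(
        f"{ins['op']} {ins['in_addr']}, {ins['out_addr']}"
        for ins in run_instructions
    )
    # Assembly translation submodule: turn the text into a binary file image.
    binary = assembly.encode("utf-8")
    # Instruction sending submodule: return the payload that would be sent
    # to the running device for execution.
    return binary

payload = dispatch([
    {"op": "CONV", "in_addr": "0x1000", "out_addr": "0x2000"},
    {"op": "POOL", "in_addr": "0x2000", "out_addr": "0x3000"},
])
print(payload.decode("utf-8"))
```

The point of the staging is that the running device only ever consumes the binary file, so the assembly form can be inspected or reordered (per the queue construction module) before translation.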
  7. The device according to claim 1, characterized in that
    the running device is one or any combination of a CPU, a GPU, and an NPU;
    the device is arranged in a CPU and/or an NPU;
    the macro instruction comprises at least one of the following instructions: a computation macro instruction, a control macro instruction, and a data transfer macro instruction,
    wherein the computation macro instruction comprises at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction,
    the control macro instruction comprises at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
    the data transfer macro instruction comprises at least one of a read macro instruction and a write macro instruction, the read macro instruction comprising at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction comprising at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
    the macro instruction contains at least one of the following options: an identifier of the specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, operands, and instruction parameters; and
    the running instruction contains at least one of the following options: the operation type, the input address, the output address, the operands, and the instruction parameters.
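The optional fields of the macro instruction listed in claim 7 can be modeled as a simple record; since every field other than the operation type is optional, defaults stand in for absent options. The `MacroInstruction` name and all field names are hypothetical, chosen only to mirror the claim language:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MacroInstruction:
    """Hypothetical container for the optional fields listed in claim 7."""
    op_type: str                        # operation type, e.g. 'matrix_vector'
    device_id: Optional[str] = None     # identifier of the specified device
    in_addr: Optional[int] = None       # input address
    out_addr: Optional[int] = None      # output address
    in_amount: Optional[int] = None     # input amount
    out_amount: Optional[int] = None    # output amount
    operands: tuple = ()                # operands
    params: Optional[dict] = None       # instruction parameters

m = MacroInstruction(op_type="matrix_vector", device_id="npu0",
                     in_addr=0x1000, out_addr=0x2000, in_amount=1024)
print(m.op_type, m.device_id)
```

A running instruction would carry the subset of these fields named in the final clause of the claim (operation type, addresses, operands, parameters), with the amounts replaced by per-device running amounts.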
  8. A machine learning operation device, characterized in that the device comprises:
    one or more neural network instruction generation devices according to any one of claims 1 to 6, configured to acquire data to be operated on and control information from other processing devices, perform specified machine learning operations, and transfer execution results to the other processing devices through an I/O interface;
    when the machine learning operation device contains multiple neural network instruction generation devices, the multiple neural network instruction generation devices can be connected to and transmit data to one another through a specific structure,
    wherein the multiple neural network instruction generation devices are interconnected and transmit data through a peripheral component interconnect express (PCIe) bus to support larger-scale machine learning operations; the multiple neural network instruction generation devices share the same control system or have their own control systems; the multiple neural network instruction generation devices share memory or have their own memories; and the interconnection mode of the multiple neural network instruction generation devices is an arbitrary interconnection topology.
  9. A combined processing device, characterized in that the device comprises:
    the machine learning operation device according to claim 8, a universal interconnection interface, and other processing devices,
    wherein the machine learning operation device interacts with the other processing devices to jointly complete computation operations specified by a user,
    and wherein the device further comprises a storage device, which is connected to the machine learning operation device and the other processing devices respectively, and is configured to store data of the machine learning operation device and the other processing devices.
  10. A machine learning chip, characterized in that the machine learning chip comprises:
    the machine learning operation device according to claim 8 or the combined processing device according to claim 9.
  11. An electronic device, characterized in that the electronic device comprises:
    the machine learning chip according to claim 10.
  12. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the machine learning chip according to claim 10,
    wherein the machine learning chip is connected to the storage device, the control device, and the interface device respectively;
    the storage device is configured to store data;
    the interface device is configured to implement data transmission between the machine learning chip and an external device; and
    the control device is configured to monitor the state of the machine learning chip.
  13. A neural network instruction generation method, characterized in that the method comprises:
    determining, according to a received macro instruction, a running device for executing the macro instruction; and
    generating a running instruction according to the macro instruction and the running device.
  14. The method according to claim 13, characterized in that determining, according to the received macro instruction, the running device for executing the macro instruction comprises:
    determining a specified device as the running device when it is determined that the macro instruction contains an identifier of the specified device and the resources of the specified device satisfy an execution condition for executing the macro instruction,
    wherein the execution condition comprises: the specified device contains an instruction set corresponding to the macro instruction.
  15. The method according to claim 14, characterized in that the method further comprises:
    acquiring resource information of candidate devices, the resource information comprising the instruction sets contained in the candidate devices,
    wherein determining, according to the received macro instruction, the running device for executing the macro instruction comprises either of the following processes:
    when it is determined that the macro instruction does not contain the identifier of the specified device, determining, from the candidate devices, the running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices; or
    when it is determined that the macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
  16. The method according to claim 14 or 15, characterized in that the macro instruction contains at least one of an input amount and an output amount,
    and generating the running instruction according to the macro instruction and the running device comprises:
    determining a data amount of the macro instruction, and generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
    wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further comprises at least one of a storage capacity and a remaining storage capacity.
  17. The method according to claim 16, characterized in that generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device comprises either of the following processes:
    when it is determined that there is one running device and the resources of the running device do not satisfy a capacity condition for executing the macro instruction, splitting the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence; or
    when it is determined that there are multiple running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
    wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction contains at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
  18. The method according to claim 13, characterized in that the method further comprises at least one of the following processes:
    sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions;
    receiving an instruction to be executed, and generating the macro instruction according to the determined identifier of the specified device and the instruction to be executed; and
    sending the running instruction to the running device so that the running device executes the running instruction,
    wherein sending the running instruction to the running device so that the running device executes the running instruction comprises:
    generating an assembly file according to the running instruction;
    translating the assembly file into a binary file; and
    sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
  19. The method according to claim 13, characterized in that
    the running device is one or any combination of a CPU, a GPU, and an NPU;
    the method is applied to a CPU and/or an NPU;
    the macro instruction comprises at least one of the following instructions: a computation macro instruction, a control macro instruction, and a data transfer macro instruction,
    wherein the computation macro instruction comprises at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction,
    the control macro instruction comprises at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
    the data transfer macro instruction comprises at least one of a read macro instruction and a write macro instruction, the read macro instruction comprising at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction comprising at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
    the macro instruction contains at least one of the following options: an identifier of the specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, operands, and instruction parameters; and
    the running instruction contains at least one of the following options: the operation type, the input address, the output address, the operands, and the instruction parameters.
PCT/CN2019/111852 2018-10-19 2019-10-18 Computation method and apparatus, and related product WO2020078446A1 (en)

Applications Claiming Priority (40)

Application Number Priority Date Filing Date Title
CN201811222476.4A CN111079907B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811222476.4 2018-10-19
CN201811221643.3 2018-10-19
CN201811220922.8 2018-10-19
CN201811222553.6 2018-10-19
CN201811221780.7 2018-10-19
CN201811222531.X 2018-10-19
CN201811221685.7A CN111079911B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811222523.5 2018-10-19
CN201811222454.8 2018-10-19
CN201811222454.8A CN111078280B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811222553.6A CN111078125B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221780.7A CN111079913B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811220923.2 2018-10-19
CN201811221806.8A CN111079925B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221783.0A CN111078281B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221788.3A CN111078282B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221763.3A CN111079912B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811222530.5A CN111079915B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221778.XA CN111078291B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811222491.9A CN111078284B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221763.3 2018-10-19
CN201811222555.5A CN111078285B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221789.8 2018-10-19
CN201811220923.2A CN111079910B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221783.0 2018-10-19
CN201811222555.5 2018-10-19
CN201811222523.5A CN111079914B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221778.X 2018-10-19
CN201811221788.3 2018-10-19
CN201811222531.XA CN111079916B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221685.7 2018-10-19
CN201811221806.8 2018-10-19
CN201811220957.1 2018-10-19
CN201811220922.8A CN111079909B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221643.3A CN111078293B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811222530.5 2018-10-19
CN201811221789.8A CN111078283B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811222491.9 2018-10-19
CN201811220957.1A CN111079924B (en) 2018-10-19 2018-10-19 Operation method, system and related product

Publications (1)

Publication Number Publication Date
WO2020078446A1 true WO2020078446A1 (en) 2020-04-23

Family

ID=70283666

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/111852 WO2020078446A1 (en) 2018-10-19 2019-10-18 Computation method and apparatus, and related product

Country Status (1)

Country Link
WO (1) WO2020078446A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160179434A1 (en) * 2014-12-19 2016-06-23 Intel Corporation Storage device and method for performing convolution operations
CN107016175A (en) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 It is applicable the Automation Design method, device and the optimization method of neural network processor
CN107103113A (en) * 2017-03-23 2017-08-29 中国科学院计算技术研究所 Towards the Automation Design method, device and the optimization method of neural network processor
CN107886167A (en) * 2016-09-29 2018-04-06 北京中科寒武纪科技有限公司 Neural network computing device and method
CN108416431A (en) * 2018-01-19 2018-08-17 上海兆芯集成电路有限公司 Neural network microprocessor and macro instruction processing method


Similar Documents

Publication Publication Date Title
EP3451151B1 (en) Apparatus and method for executing vector comparison operation
WO2017185404A1 (en) Apparatus and method for performing vector logical operation
CN107315565B (en) Device and method for generating random vectors obeying certain distribution
WO2020078446A1 (en) Computation method and apparatus, and related product
CN111079909B (en) Operation method, system and related product
CN112465116B (en) Compiling method, operation method, electronic device, and storage medium
CN111079925B (en) Operation method, device and related product
CN111078284B (en) Operation method, system and related product
WO2020073925A1 (en) Operation method and apparatus, computer device and storage medium
CN111078291B (en) Operation method, system and related product
KR102467544B1 (en) arithmetic device and its operating method
CN111079924B (en) Operation method, system and related product
CN111338694B (en) Operation method, device, computer equipment and storage medium
WO2022001438A1 (en) Computing apparatus, integrated circuit chip, board card, device and computing method
CN111026440B (en) Operation method, operation device, computer equipment and storage medium
CN111339060B (en) Operation method, device, computer equipment and storage medium
WO2020192587A1 (en) Artificial intelligence computing device and related product
CN111078285B (en) Operation method, system and related product
WO2022253287A1 (en) Method for generating random number, and related product thereof
WO2020073923A1 (en) Operation method and device, computer equipment, and storage medium
CN111079914B (en) Operation method, system and related product
WO2022001496A1 (en) Computing apparatus, integrated circuit chip, board card, electronic device, and computing method
CN111078125B (en) Operation method, device and related product
CN111078282B (en) Operation method, device and related product
CN111078280B (en) Operation method, device and related product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19872873

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19872873

Country of ref document: EP

Kind code of ref document: A1