WO2020078446A1 - Computation method and apparatus, and related product - Google Patents

Computation method and apparatus, and related product

Info

Publication number
WO2020078446A1
WO2020078446A1 (PCT/CN2019/111852; CN2019111852W)
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
macro instruction
macro
running
data
Prior art date
Application number
PCT/CN2019/111852
Other languages
French (fr)
Chinese (zh)
Inventor
兰慧盈
杜子东
Original Assignee
中科寒武纪科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201811221763.3A external-priority patent/CN111079912B/en
Priority claimed from CN201811222476.4A external-priority patent/CN111079907B/en
Priority claimed from CN201811222530.5A external-priority patent/CN111079915B/en
Priority claimed from CN201811221685.7A external-priority patent/CN111079911B/en
Priority claimed from CN201811222454.8A external-priority patent/CN111078280B/en
Priority claimed from CN201811222553.6A external-priority patent/CN111078125B/en
Priority claimed from CN201811221780.7A external-priority patent/CN111079913B/en
Priority claimed from CN201811221806.8A external-priority patent/CN111079925B/en
Priority claimed from CN201811221778.XA external-priority patent/CN111078291B/en
Priority claimed from CN201811221788.3A external-priority patent/CN111078282B/en
Priority claimed from CN201811220957.1A external-priority patent/CN111079924B/en
Priority claimed from CN201811221789.8A external-priority patent/CN111078283B/en
Priority claimed from CN201811221783.0A external-priority patent/CN111078281B/en
Priority claimed from CN201811222491.9A external-priority patent/CN111078284B/en
Priority claimed from CN201811222555.5A external-priority patent/CN111078285B/en
Priority claimed from CN201811220923.2A external-priority patent/CN111079910B/en
Priority claimed from CN201811222523.5A external-priority patent/CN111079914B/en
Priority claimed from CN201811222531.XA external-priority patent/CN111079916B/en
Priority claimed from CN201811220922.8A external-priority patent/CN111079909B/en
Priority claimed from CN201811221643.3A external-priority patent/CN111078293B/en
Application filed by 中科寒武纪科技股份有限公司 filed Critical 中科寒武纪科技股份有限公司
Publication of WO2020078446A1 publication Critical patent/WO2020078446A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present disclosure relates to the field of information processing technology, and in particular, to a neural network instruction generation method, device, and related products.
  • With the continuous development of technology, neural network algorithms are used more and more widely, and have been applied successfully in image recognition, speech recognition, natural language processing, and other fields.
  • However, running a large-scale neural network model based on a Graphics Processing Unit (GPU) or a Central Processing Unit (CPU) requires a large amount of computation time and consumes a lot of power.
  • Existing methods of accelerating the processing speed of a neural network model have problems such as an inability to process across platforms, low processing efficiency, high development cost, and proneness to error.
  • a neural network instruction generation device comprising:
  • a device determination module configured to determine a running device that executes the macro instruction according to the received macro instruction
  • an instruction generation module, used to generate a running instruction according to the macro instruction and the running device.
  • a machine learning computing device comprising:
  • one or more of the neural network instruction generation devices described in the first aspect above, used to obtain the data to be computed and the control information from other processing devices, perform the specified machine learning operation, and pass the execution result to the other processing devices through the I/O interface;
  • the machine learning operation device includes a plurality of the neural network instruction generation devices
  • the plurality of the neural network instruction generation devices can be connected and transmit data through a specific structure
  • a plurality of the neural network instruction generation devices may be interconnected and transmit data through a fast external device interconnect bus to support larger-scale machine learning operations; the plurality of devices may share the same control system or have their own control systems; they may share memory or each have their own memory; and the interconnection topology of the plurality of devices may be arbitrary.
  • a combined processing device comprising:
  • the machine learning operation device interacts with the other processing device to jointly complete the calculation operation specified by the user.
  • a machine learning chip including the machine learning computing device described in the second aspect above or the combined processing device described in the third aspect above.
  • a machine learning chip packaging structure including the machine learning chip described in the fourth aspect above.
  • a board card including the machine learning chip packaging structure described in the fifth aspect above.
  • an electronic device including the machine learning chip described in the above fourth aspect or the board described in the sixth aspect above.
  • a neural network instruction generation method comprising:
  • an operation instruction is generated.
  • the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, headphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
  • the vehicle includes an airplane, a ship, and/or a car;
  • the household appliance includes a TV, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood;
  • the medical equipment includes an MRI machine, a B-mode ultrasound scanner, and/or an electrocardiograph.
  • the neural network instruction generation method, device, and related products provided by the embodiments of the present disclosure involve a device determination module and an instruction generation module.
  • the device determination module is used to determine a running device that executes the macro instruction based on the received macro instruction.
  • the instruction generation module is used to generate the running instruction according to the macro instruction and the running device.
  • the method, device, and related products can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low probability of error, and low labor and material costs for development.
  • FIG. 1 shows a schematic diagram of a processor of an instruction generation and processing method according to an embodiment of the present disclosure.
  • FIG. 2 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
  • FIG. 3 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
  • FIG. 4 shows a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure.
  • FIG. 5 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure.
  • FIG. 6 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure.
  • FIG. 7 shows a flowchart of a neural network instruction processing method according to an embodiment of the present disclosure.
  • FIGS. 8a and 8b are schematic diagrams illustrating application scenarios of a neural network instruction generation device and a processing system according to an embodiment of the present disclosure.
  • FIGS. 9a and 9b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
  • FIG. 10 shows a schematic structural diagram of a board according to an embodiment of the present disclosure.
  • the term “if” may be interpreted as “when” or “once” or “in response to a determination” or “in response to a detection” depending on the context.
  • the phrase “if determined” or “if [the described condition or event] is detected” may be interpreted, depending on the context, to mean “once determined”, “in response to a determination”, “once [the described condition or event] is detected”, or “in response to detection of [the described condition or event]”.
  • the instruction generation and instruction processing methods may be applied to a processor, which may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) used to perform artificial intelligence operations. Artificial intelligence operations can include machine learning operations, brain-like operations, and so on. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, etc.
  • the artificial intelligence processor may include, for example, one or a combination of GPU (Graphics Processing Unit), NPU (Neural-network Processing Unit), DSP (Digital Signal Processor), and FPGA (Field-Programmable Gate Array) chips. The present disclosure does not limit the specific type of processor.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit may independently run various assigned tasks, such as convolution operation tasks, pooling tasks, or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks executed by the processing unit.
  • FIG. 1 shows a schematic diagram of a processor of an instruction generation and processing method according to an embodiment of the present disclosure.
  • the processor 100 includes a plurality of processing units 101 and a storage unit 102.
  • the plurality of processing units 101 are used to execute an instruction sequence
  • the storage unit 102 is used to store data, and may include random access memory (RAM) and a register file.
  • the multiple processing units 101 in the processor 100 can share a part of the storage space, for example, share a part of the RAM storage space and the register file, and can also have their own storage spaces at the same time.
  • the device includes a device determination module 11 and an instruction generation module 12.
  • the device determination module 11 is used to determine a running device that executes the macro instruction according to the received macro instruction.
  • the instruction generation module 12 is used to generate an operation instruction according to the macro instruction and the operation device.
  • A macro instruction (macro) is a term from batch processing.
  • A macro instruction can be a rule, pattern, or syntactic replacement: whenever the macro instruction is encountered, it is automatically replaced according to that rule or pattern.
  • A macro instruction can be formed by integrating instructions to be executed that are commonly used for calculation, control, and data transfer.
  • the macro instruction may include at least one of the following: a calculation macro instruction, a control macro instruction, and a data handling macro instruction.
  • the calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction.
  • the control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction.
  • the data handling macro instruction may include at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction.
  • the write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
  • the macro instruction may include at least one of the following: the identification of the designated device, the operation type, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters.
  • the running instruction may include at least one of the following: the operation type, the input address, the output address, the operands, and the instruction parameters.
  • the identification of the designated device may be the physical address, IP address, name, or number of the designated device.
  • the identification may include one or any combination of numbers, letters, and symbols.
  • the input address may be an input address of data, a read address, etc. to obtain data
  • the output address may be an output address of processed data, a write address, etc. to store data.
  • the input amount may be information characterizing the size of the data, such as the input scale and input length of the data.
  • the output amount may be information characterizing the size of the data, such as the output scale and output length of the data.
  • Operands can include the length of the register, the address of the register, the identification of the register, the immediate value, and so on.
  • an immediate is a number given directly in an instruction that uses the immediate addressing mode.
  • the instruction parameter may refer to a parameter corresponding to the macro instruction and related to its execution.
  • the instruction parameter may be the address and length of the second operand.
  • the instruction parameters may be the size of the convolution kernel, the step size of the convolution kernel, the filling of the convolution kernel, and so on.
  • A macro instruction must include an operation code and at least one operation field, where the operation code is the operation type and the operation field includes the identification of the specified device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters.
  • The operation code may be the part or field of the instruction (usually represented by a code) that specifies the operation to be performed; it is an instruction sequence number used to inform the device executing the instruction which operation is required.
  • The operation field may be the source of all data required to execute the corresponding instruction, including the parameter data, the data to be operated on or processed, and the corresponding operation method, or the addresses at which the parameter data, the data to be operated on or processed, and the corresponding operation method are stored.
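The field layout described above can be sketched as a small data structure. This is a minimal illustration, not the patent's actual encoding; all field names and the example values are invented for clarity:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MacroInstruction:
    """Illustrative layout: an operation code plus an operation field
    holding the remaining (all optional) components."""
    op_type: str                          # operation code, e.g. "CONV"
    device_id: Optional[str] = None       # identification of the designated device
    input_addr: Optional[int] = None      # read address of the data to be processed
    output_addr: Optional[int] = None     # write address for the processed data
    input_amount: Optional[int] = None    # input scale/length (data-size hint)
    output_amount: Optional[int] = None   # output scale/length (data-size hint)
    operands: List[str] = field(default_factory=list)  # registers, immediates, ...
    params: List[int] = field(default_factory=list)    # e.g. kernel size, stride, padding

# A hypothetical convolution macro instruction designated to device "09"
m = MacroInstruction(op_type="CONV", device_id="09",
                     input_addr=0x1000, output_addr=0x2000,
                     input_amount=4096, params=[3, 1, 1])
assert m.op_type == "CONV" and m.device_id == "09"
```

A running instruction would use the same layout minus the device identification and amounts, per the list above.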
  • the device determination module 11 may determine one or more running devices according to the macro instruction.
  • the instruction generation module 12 may generate one or more running instructions. When multiple running instructions are generated, they may be executed on the same running device or on different running devices, which is not limited in the present disclosure.
  • the neural network instruction generation device includes a device determination module and an instruction generation module.
  • the device determination module is used to determine a running device that executes the macro instruction according to the received macro instruction.
  • the instruction generation module is used to generate the operation instruction according to the macro instruction and the operation equipment.
  • the device can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low probability of error, and low labor and material costs for development.
  • FIG. 3 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
  • the device may further include a macro instruction generation module 13.
  • the macro generation module 13 is used to receive the instruction to be executed, and generate a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the designated device may be determined according to the operation type, input amount, output amount, etc. of the instruction to be executed.
  • the received instruction to be executed may be one or multiple instructions.
  • the instruction to be executed may include at least one of the following: a calculation instruction to be executed, a control instruction to be executed, and a data handling instruction to be executed.
  • the calculation instruction to be executed may include at least one of a neural network calculation instruction to be executed, a vector logic calculation instruction to be executed, a matrix vector calculation instruction to be executed, a scalar calculation instruction to be executed and a scalar logic calculation instruction to be executed.
  • the control instruction to be executed may include at least one of an unconditional jump instruction to be executed and a conditional jump instruction to be executed.
  • the data carrying instruction to be executed may include at least one of a reading instruction to be executed and a writing instruction to be executed.
  • the read instruction to be executed may include at least one of a read neuron instruction to be executed, a read synapse instruction to be executed and a read scalar instruction to be executed.
  • the write instruction to be executed may include at least one of a write neuron instruction to be executed, a write synapse instruction to be executed and a write scalar instruction to be executed.
  • the instruction to be executed may include at least one of the following options: operation type, input address, output address, input amount, output amount, operand, and instruction parameter.
  • the determined identifier of the designated device may be added to the instruction to be executed to generate a macro instruction.
  • For example, a certain instruction m to be executed is "XXX ... param", where XXX is the operation type and param is the instruction parameter.
  • The designated device m-1 can be determined according to the operation type "XXX" of the instruction m to be executed. Then the identifier of the designated device m-1 (for example, 09) is added to the instruction m, generating the macro instruction M "XXX09, ... param" corresponding to the instruction m.
  • When there are multiple instructions to be executed, the identifier of the specified device corresponding to each instruction to be executed can be added to that instruction, and then a single macro instruction can be generated from the plurality of instructions carrying the identifiers, or multiple corresponding macro instructions can be generated.
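The "XXX ... param" example above can be sketched as follows. The device lookup table and the string format are assumptions made for illustration; the patent does not specify how the designated device is stored or how the identifier is spliced in:

```python
# Hypothetical mapping from operation type to designated-device identifier
DESIGNATED_DEVICE = {"XXX": "09", "CONV": "02"}

def generate_macro(instr: str) -> str:
    """Sketch of the macro generation module: insert the identifier of
    the designated device (chosen from the operation type) into the
    instruction to be executed, yielding the macro instruction."""
    op_type, _, rest = instr.partition(" ")
    dev_id = DESIGNATED_DEVICE[op_type]          # designated device, e.g. m-1
    return f"{op_type}{dev_id}, {rest}" if rest else f"{op_type}{dev_id}"

print(generate_macro("XXX param"))  # -> XXX09, param
```

With multiple instructions to be executed, the same step could be applied per instruction before merging them into one macro instruction or emitting one macro instruction each.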
  • the device determination module 11 may include a first determination submodule 111.
  • the first determining submodule 111 is used to determine that the designated device is the running device when it is determined that the macro instruction includes the identifier of the designated device and the resources of the designated device satisfy the execution conditions for executing the macro instruction.
  • the execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
  • the macro instruction may include the identifier of one or more designated devices that execute the macro instruction.
  • the first determination submodule 111 can directly determine the specified device as the running device, saving the time of generating the running instruction based on the macro instruction and ensuring that the generated running instruction can be executed by the corresponding running device.
  • the device may further include a resource acquisition module 14.
  • the device determination module 11 may further include a second determination sub-module 112.
  • the resource acquisition module 14 is used to acquire resource information of candidate devices.
  • the second determining submodule 112 is used to determine, when it is determined that the macro instruction does not contain the identifier of a specified device, the running device for executing the macro instruction from among the candidate devices according to the received macro instruction and the resource information of the candidate devices.
  • the resource information may include the instruction set included in the candidate device.
  • the instruction set included in a candidate device may be the instruction set corresponding to the operation type of one or more macro instructions. The more instruction sets a candidate device contains, the more types of macro instructions that candidate device can execute.
  • the second determination submodule 112 may determine one or more running devices capable of executing the macro instruction from the candidate devices.
  • the instruction set of the determined running device includes the instruction set corresponding to the macro instruction. For example, if the received macro instruction is a neural network calculation macro instruction, a candidate device containing the instruction set corresponding to neural network calculation macro instructions may be determined as the running device, ensuring that it can execute the generated running instruction.
  • the device determination module 11 may further include a third determination sub-module 113.
  • the third determining submodule 113 is used to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction contains the identifier of a designated device but the resources of the designated device do not satisfy the execution conditions for executing the macro instruction.
  • When the third determining submodule 113 determines that the macro instruction includes the identifier of a designated device but the resources of the designated device do not satisfy the execution conditions, it may conclude that the designated device does not have the ability to execute the macro instruction.
  • The third determining submodule 113 may then determine the running device from among the candidate devices, for example by determining a candidate device containing the instruction set corresponding to the macro instruction as the running device.
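The three determining submodules can be read as one decision procedure: use the designated device when it satisfies the execution condition (first submodule), otherwise select from the candidates by instruction set (second submodule when no identifier is present, third submodule when the designated device is unable). A minimal sketch, with the instruction-set check reduced to an operation-type lookup and all names invented:

```python
from types import SimpleNamespace

def determine_running_devices(macro, candidates):
    """Sketch of the device determination module.  `candidates` maps a
    device identifier to the set of operation types whose instruction
    sets that device contains (an illustrative simplification)."""
    def satisfies(dev_id):
        # execution condition: the device contains the instruction set
        # corresponding to the macro instruction
        return macro.op_type in candidates.get(dev_id, set())

    if macro.device_id is not None and satisfies(macro.device_id):
        # first submodule: the designated device becomes the running device
        return [macro.device_id]
    # second submodule (no identifier) / third submodule (identifier
    # present but condition unmet): choose running devices from candidates
    return [d for d, iset in candidates.items() if macro.op_type in iset]

macro = SimpleNamespace(op_type="CONV", device_id="09")
candidates = {"09": {"CONV", "POOL"}, "10": {"CONV"}, "11": {"FC"}}
assert determine_running_devices(macro, candidates) == ["09"]
```

A real implementation would also weigh storage capacity and other resource information, as described below.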
  • the macro instruction may include at least one of an input amount and an output amount
  • the instruction generation module 12 is further used to determine the data amount of the macro instruction, and to generate the running instruction according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running device.
  • the data amount of the macro instruction may be determined according to at least one of the input amount and the output amount, and the resource information of the running device may further include at least one of its storage capacity and remaining storage capacity.
  • the storage capacity of the running device may refer to the amount of binary information that the memory of the running device can hold.
  • the remaining storage capacity of the running device may refer to the storage capacity currently available for instruction operations after subtracting the occupied storage capacity.
  • the resource information of the running device characterizes its running capability: the larger the storage capacity and the remaining storage capacity, the stronger the running capability of the device.
  • the instruction generation module 12 may determine the specific method of splitting the macro instruction according to the resource information of each running device, the data amount of the macro instruction, and so on, so as to split the macro instruction and generate the running instructions corresponding to each running device.
  • the instruction generation module 12 may include a first instruction generation sub-module 121.
  • the first instruction generation sub-module 121 is used, when it is determined that there is one running device and the resources of the running device do not meet the capacity condition for executing the macro instruction, to split the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount of the macro instruction, so that the running device executes the multiple running instructions in sequence.
  • the running data amount of the running device may be determined according to the resource information of the running device; each running instruction may include at least one of a running input amount and a running output amount, both of which may be determined based on the running data amount.
  • the amount of operation data of the operation device may be determined according to the storage capacity or remaining storage capacity of the operation device.
  • the capacity condition may be that the running data amount of the running device is greater than or equal to the data amount of the macro instruction.
  • Conversely, that the resources of the running device do not meet the capacity condition may mean that the running data amount of the running device is less than the data amount of the macro instruction.
  • The running input amount and running output amount must be less than or equal to the running data amount, to ensure that the generated running instructions can be executed by the running device.
  • the running input amounts (or running output amounts) of different running instructions among the plurality of running instructions may be the same or different, which is not limited by the present disclosure.
  • When the first instruction generation sub-module 121 determines that there is only one running device and the resources of that device meet the capacity condition for executing the macro instruction, it can directly convert the macro instruction into a running instruction, or it can still split the macro instruction into multiple running instructions; this disclosure does not limit the choice.
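The capacity condition and the splitting step can be sketched numerically. Here a macro instruction is reduced to its data amount and a running instruction to its chunk size; real splitting would also rewrite addresses and operands. The function name and greedy chunking strategy are assumptions for illustration:

```python
def split_macro(macro_data_amount: int, run_data_amount: int):
    """Sketch of the first instruction generation sub-module: if the
    running device's running data amount covers the macro instruction's
    data amount (the capacity condition), one running instruction
    suffices; otherwise split into chunks the device can hold."""
    if macro_data_amount <= run_data_amount:   # capacity condition met
        return [macro_data_amount]
    chunks, remaining = [], macro_data_amount
    while remaining > 0:
        # each running input/output amount stays <= the running data amount
        step = min(run_data_amount, remaining)
        chunks.append(step)
        remaining -= step
    return chunks

assert split_macro(10, 4) == [4, 4, 2]   # split into three running instructions
assert split_macro(3, 4) == [3]          # capacity condition met: no split
```

For multiple running devices, the same chunking would be applied per device using each device's own running data amount.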
  • the instruction generation module 12 may include a second instruction generation sub-module 122.
  • the second instruction generation sub-module 122 is configured, when it is determined that there are multiple running devices, to split the macro instruction according to the running data amount of each running device and the data amount of the macro instruction, and generate the running instruction corresponding to each running device.
  • the running data amount of each running device can be determined according to the resource information of that device, and each running instruction can include at least one of a running input amount and a running output amount.
  • The running input amount and running output amount are determined based on the running data amount of the running device that executes the running instruction.
  • The running input amount and running output amount need to be less than or equal to the running data amount, to ensure that the generated running instruction can be executed by the running device.
  • the second instruction generation sub-module 122 may generate, according to the running data amount of each running device, one or more running instructions for each running device to execute.
  • When a running instruction includes at least one of the running input amount and the running output amount, this not only limits the data amount of the running instruction so that it can be executed by the corresponding running device, but can also meet the special requirements that different running instructions place on the running input amount and/or running output amount.
  • Alternatively, the running input amount and/or running output amount may be omitted from the running instruction, with a default running input amount and a default running output amount preset, so that when the running device finds no running input amount or running output amount in a received running instruction, it uses the preset defaults as the input and output amounts of that instruction.
  • the default input amount and the default output amount for different types of macro instructions may be preset.
  • When a macro instruction contains neither an input amount nor an output amount, the corresponding preset default input amount and default output amount can be used as the input amount and output amount of the macro instruction.
  • In this case, the data amount of the macro instruction is determined according to the default input amount and/or the default output amount,
  • and the running instruction is generated according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device.
  • The generated running instruction may include neither the running input amount nor the running output amount, or may include at least one of them.
  • The running device may then execute the running instruction according to the preset default running input amount and/or default running output amount.
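The default-amount fallback described above can be sketched as a per-type lookup. The table values and function name are invented for illustration; the patent only says the defaults are preset per macro instruction type:

```python
# Hypothetical preset defaults: (default input amount, default output amount)
DEFAULTS = {"CONV": (1024, 1024), "POOL": (512, 512)}

def effective_amounts(op_type, input_amount=None, output_amount=None):
    """When a macro (or running) instruction omits an amount, the preset
    default for its operation type is used in its place."""
    d_in, d_out = DEFAULTS[op_type]
    return (input_amount if input_amount is not None else d_in,
            output_amount if output_amount is not None else d_out)

assert effective_amounts("CONV") == (1024, 1024)           # both defaulted
assert effective_amounts("POOL", input_amount=64) == (64, 512)
```

The resulting amounts then feed the data-amount calculation used for splitting, as described above.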
  • the instruction generation module 12 may also split the macro instruction to generate a running instruction according to the macro instruction and the preset macro instruction splitting rule.
  • the macro instruction splitting rule may be determined according to conventional macro instruction splitting methods (for example, splitting according to the processing of the macro instruction), combined with a threshold on the running data amount of instructions that all candidate devices can execute.
  • For example, the storage capacities (or remaining storage capacities) of all candidate devices can be compared, and the minimum storage capacity (or remaining storage capacity) can be taken as the threshold on the running data amount of instructions that all candidate devices can execute.
  • The running instruction generated by the instruction generation module according to the macro instruction may be an instruction to be executed, or one or more parsed instructions obtained by parsing the instruction to be executed, which is not limited in the present disclosure.
  • the device may further include a queue construction module 15.
  • the queue construction module 15 is used to sort the running instructions according to queue sorting rules and to construct, from the sorted running instructions, an instruction queue corresponding to each running device.
  • an instruction queue uniquely corresponding to each running device can be constructed.
  • the running instructions in the instruction queue can be ordered, and each running instruction can be sent to its uniquely corresponding running device; alternatively, the whole instruction queue can be sent to the running device, so that the running device executes the running instructions in the order in which they appear in the queue.
  • having the running device execute the running instructions according to the instruction queue prevents the running instructions from being executed erroneously, delayed, or missed.
  • the queue sorting rules may be determined based on information related to the running instructions themselves, such as the estimated execution time of each running instruction, its generation time, the operation input amount, the operation output amount, and the operation type; the present disclosure does not limit this.
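By way of a hedged sketch, one possible queue sorting rule orders running instructions by estimated execution time, breaking ties by generation time. The field names below are assumptions made for illustration, not part of the present disclosure:

```python
# Illustrative queue-sorting rule: shortest estimated execution time first,
# earlier generation time breaking ties.

def build_instruction_queue(run_instructions):
    return sorted(
        run_instructions,
        key=lambda ins: (ins["estimated_time"], ins["generated_at"]),
    )

queue = build_instruction_queue([
    {"id": "i2", "estimated_time": 5, "generated_at": 1},
    {"id": "i0", "estimated_time": 2, "generated_at": 3},
    {"id": "i1", "estimated_time": 2, "generated_at": 2},
])
print([ins["id"] for ins in queue])  # -> ['i1', 'i0', 'i2']
```

Because `sorted` is stable, instructions with identical keys keep their original relative order.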
  • the device may further include an instruction dispatch module 16.
  • the instruction dispatch module 16 is used to send the operation instruction to the operation device, so that the operation device executes the operation instruction.
  • when there is only one operation instruction to be executed by the operation device, the operation instruction may be sent to the operation device directly.
  • when there are multiple operation instructions, all of them may be sent to the operation device at once, so that the operation device executes them in sequence; alternatively, the operation instructions may be sent to the corresponding operation device one by one, where each time the operation device completes the current operation instruction, the next operation instruction corresponding to that device is sent to it.
  • a person skilled in the art may set the manner of sending the operation instruction to the operation device, and the disclosure does not limit this.
  • the instruction dispatch module 16 may include an instruction assembly submodule 161, an assembly translation submodule 162, and an instruction sending submodule 163.
  • the instruction assembly submodule 161 is used to generate an assembly file according to the operation instruction.
  • the assembly translation submodule 162 is used to translate the assembly file into a binary file.
  • the instruction sending submodule 163 is used to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the data amount of the running instruction can be reduced, the time for sending the running instruction to the running device can be saved, and the conversion and execution speed of the macro instruction can be improved.
  • the running device can decode the received binary file to obtain a corresponding running instruction, and execute the obtained running instruction to obtain an execution result.
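The assemble, translate, and send pipeline above can be sketched as follows. The mnemonics, opcode table, and byte layout are invented for illustration and are not the encoding mandated by the present disclosure:

```python
# Hedged sketch of the dispatch pipeline: render run instructions as an
# assembly-like text file, translate it to a compact binary form, and hand
# the bytes to the running device.

OPCODES = {"LOAD": 0x01, "COMPUTE": 0x02, "STORE": 0x03}

def assemble(run_instructions):
    # One mnemonic plus operand per line, mimicking an assembly file.
    return "\n".join(f"{op} {arg}" for op, arg in run_instructions)

def translate(assembly_text):
    # Each line becomes a 1-byte opcode followed by a 1-byte operand.
    out = bytearray()
    for line in assembly_text.splitlines():
        mnemonic, arg = line.split()
        out += bytes([OPCODES[mnemonic], int(arg)])
    return bytes(out)

def send(binary, device_buffer):
    # Stand-in for the transport to the running device.
    device_buffer.extend(binary)

program = [("LOAD", 16), ("COMPUTE", 2), ("STORE", 32)]
buf = bytearray()
send(translate(assemble(program)), buf)
print(buf.hex())  # -> '011002020320'
```

Sending the binary form rather than the textual assembly is what reduces the data amount transferred to the running device.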
  • the running device may be one of, or any combination of, a CPU, a GPU, and an embedded neural network processor (Neural-network Processing Unit, NPU for short). In this way, the speed at which the device generates the running instruction according to the macro instruction is increased.
  • the device may be set in the CPU and / or NPU.
  • by carrying out the process of generating the running instruction according to the macro instruction through the CPU and / or NPU, more possible ways are provided for implementing the device.
  • the present disclosure provides a running device for executing the running command generated by the neural network command generating device.
  • the running device includes a control module and an execution module.
  • the control module is used to obtain data, a neural network model, and running instructions; it can also be used to parse the running instructions to obtain multiple parsing instructions, and to send the multiple parsing instructions and the data to the execution module.
  • the execution module is used to execute multiple parsing instructions based on the data to obtain the execution result.
  • the running device further includes a storage module.
  • the storage module may include at least one of a register and a cache, and the cache may include a high-speed temporary storage cache.
  • the cache can be used to store the data, and the registers can be used to store scalar data within the data.
  • control module may include an instruction storage submodule and an instruction processing submodule.
  • the instruction storage sub-module is used to store operation instructions.
  • the instruction processing submodule is used to parse the running instruction and obtain multiple parsed instructions.
  • control module may further include a storage queue sub-module.
  • the storage queue sub-module is used to store a running instruction queue, and the running instruction queue contains the running instructions to be executed by the running device and multiple parsing instructions. In the running instruction queue, all instructions are arranged in the order of execution.
  • the execution module may further include a dependency processing sub-module.
  • the dependency processing submodule is used to cache the first parsing instruction in the instruction storage submodule when it determines that there is an association between the first parsing instruction and a zeroth parsing instruction preceding it; after the execution of the zeroth parsing instruction is completed, the first parsing instruction is extracted from the instruction storage submodule and sent to the execution module.
  • the association between the first parsing instruction and the zeroth parsing instruction preceding it may be that a first storage address interval storing the data required by the first parsing instruction overlaps a zeroth storage address interval storing the data required by the zeroth parsing instruction. Conversely, the absence of an association between the two may be that the first storage address interval and the zeroth storage address interval have no overlapping area.
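A minimal sketch of this dependency test follows. The `(start, end)` half-open interval representation and the `addr_range` field name are assumptions made for illustration:

```python
# Sketch of the association test described above: two parsed instructions
# are associated when their storage address intervals overlap.

def intervals_overlap(first, zeroth):
    f_start, f_end = first
    z_start, z_end = zeroth
    # Half-open intervals [start, end) overlap iff each starts before the
    # other ends.
    return f_start < z_end and z_start < f_end

def has_dependency(first_ins, zeroth_ins):
    # The first instruction must be cached until the zeroth finishes if
    # they touch overlapping storage regions.
    return intervals_overlap(first_ins["addr_range"], zeroth_ins["addr_range"])

print(has_dependency({"addr_range": (0, 64)}, {"addr_range": (32, 96)}))   # -> True
print(has_dependency({"addr_range": (0, 64)}, {"addr_range": (64, 128)}))  # -> False
```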
  • the present disclosure provides a neural network instruction processing system.
  • the system includes the aforementioned neural network instruction generating device and the aforementioned operating device.
  • FIG. 4 shows a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure.
  • the method is applied to the above neural network instruction generating device, and the method includes step S41 and step S42.
  • in step S41, the running device that executes the macro instruction is determined according to the received macro instruction.
  • in step S42, a running instruction is generated according to the macro instruction and the running device.
  • step S41 may include: when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution condition of executing the macro instruction, determining the designated device as the running device.
  • the execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
  • the method may further include: acquiring resource information of the candidate device.
  • step S41 may further include: when it is determined that the macro instruction does not contain the identifier of a specified device, determining, according to the received macro instruction and the resource information of the candidate devices, a running device for executing the macro instruction from among the candidate devices.
  • the resource information may include the instruction set included in the candidate device.
  • step S41 may further include: when it is determined that the macro instruction includes the identifier of the specified device, but the resources of the specified device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
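The three branches of step S41 can be sketched together in Python. The data structures, field names, and the "first matching candidate wins" fallback are assumptions for illustration, not the selection policy mandated by the present disclosure:

```python
# Illustrative device selection following step S41: prefer the device named
# in the macro instruction when it supports the required instruction set;
# otherwise fall back to any candidate device that does.

def select_running_device(macro, candidates):
    needed = macro["instruction_set"]
    specified = macro.get("device_id")
    if specified is not None:
        dev = candidates.get(specified)
        # Execution condition: the specified device contains an instruction
        # set corresponding to the macro instruction.
        if dev is not None and needed in dev["instruction_sets"]:
            return specified
    # No usable specified device: choose from candidates by resource info.
    for dev_id, dev in candidates.items():
        if needed in dev["instruction_sets"]:
            return dev_id
    return None

candidates = {
    "cpu0": {"instruction_sets": {"scalar"}},
    "npu0": {"instruction_sets": {"scalar", "neural"}},
}
print(select_running_device({"instruction_set": "neural", "device_id": "cpu0"},
                            candidates))  # -> 'npu0'
```

Here the specified device `cpu0` fails the execution condition, so the selection falls through to the candidate devices.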
  • the macro instruction may include at least one of input quantity and output quantity.
  • Step S42 may include: determining the data amount of the macro instruction, and generating an operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operating device.
  • the data amount may be determined according to at least one of the input amount and the output amount.
  • the resource information of the operating device may further include at least one item of storage capacity and remaining storage capacity.
  • generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: when it is determined that there is one operation device and the resources of the operation device do not satisfy the capacity condition for executing the macro instruction, splitting the macro instruction into multiple operation instructions according to the operation data amount of the operation device and the data amount, so that the operation device executes the multiple operation instructions in sequence.
  • the operation data amount of the operation device may be determined according to the resource information of the operation device; each operation instruction may include at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
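A minimal sketch of this single-device case follows. The chunking-by-offset scheme and parameter names are assumptions for illustration only:

```python
# Illustrative split of one macro instruction into sequential run
# instructions when a single device cannot hold the whole data amount.

def split_macro(total_amount, run_data_amount, base_addr=0):
    """Yield (input_address, run_input_amount) pairs covering total_amount."""
    chunks = []
    offset = 0
    while offset < total_amount:
        # Each chunk takes at most the device's per-instruction data amount.
        amount = min(run_data_amount, total_amount - offset)
        chunks.append((base_addr + offset, amount))
        offset += amount
    return chunks

print(split_macro(10, 4))  # -> [(0, 4), (4, 4), (8, 2)]
```

The device then executes the resulting run instructions in sequence, each within its capacity.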
  • generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: when it is determined that there are multiple operation devices, splitting the macro instruction according to the operation data amount of each operation device and the data amount, to generate an operation instruction corresponding to each operation device.
  • the operation data amount of each operation device can be determined according to the resource information of that operation device, and each operation instruction can include at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
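The multi-device case can be sketched as follows. The round-robin assignment and the `device -> run data amount` mapping are illustrative assumptions, not the splitting algorithm mandated by the present disclosure:

```python
# Illustrative split of a macro instruction across several operation
# devices, each piece sized by that device's run-data amount.

def split_across_devices(total_amount, device_amounts):
    """Assign (device, amount) pieces until total_amount is covered.

    device_amounts maps device id -> run data amount that device can take
    per instruction (assumed positive); devices are filled round-robin in
    dict order.
    """
    pieces = []
    remaining = total_amount
    while remaining > 0:
        for dev, cap in device_amounts.items():
            take = min(cap, remaining)
            if take:
                pieces.append((dev, take))
                remaining -= take
            if remaining == 0:
                break
    return pieces

print(split_across_devices(10, {"npu0": 4, "npu1": 3}))
# -> [('npu0', 4), ('npu1', 3), ('npu0', 3)]
```

Each piece becomes one operation instruction whose input and output amounts fit the device that will execute it.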
  • the method may further include: sorting the execution instructions according to the queue sorting rule, and constructing an instruction queue corresponding to the execution device according to the sorted execution instructions.
  • the method may further include: receiving an instruction to be executed, and generating a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the method may further include: sending the operation instruction to the operation device, so that the operation device executes the operation instruction.
  • sending the running instruction to the running device so that the running device executes the running instruction includes: generating an assembly file according to the running instruction; translating the assembly file into a binary file; and sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the resource information may include at least one of the storage capacity of the candidate device, the remaining storage capacity, and the instruction set included in the candidate device.
  • the running device may be one or any combination of CPU, GPU, and NPU.
  • the method can be applied to the CPU and / or NPU.
  • the macro instruction may include at least one of the following instructions: a calculation macro instruction, a control macro instruction, and a data handling macro instruction.
  • the calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction.
  • the control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction.
  • the data handling macro instruction may include at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction.
  • the write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
  • the macro instruction may include at least one of the following options: the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameters.
  • the run instruction can contain at least one of the following options: operation type, input address, output address, operand, and instruction parameters.
  • the method for generating a neural network instruction determines the operation device for executing the macro instruction according to the received macro instruction; and generates the operation instruction according to the macro instruction and the operation device.
  • the method can be used across platforms, has good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material cost for development.
  • the present disclosure also provides a neural network instruction execution method.
  • the method is applied to the above operation device.
  • the method includes: acquiring data, a neural network model, and a running instruction through the running device; parsing the running instruction to obtain multiple parsing instructions; and executing the multiple parsing instructions based on the data to obtain an execution result.
  • the method may further include: storing data and scalar data in the data through the running device.
  • the running device includes a storage module
  • the storage module includes one or more of a register and a cache
  • the cache includes a high-speed temporary storage cache.
  • the cache is used to store the data, and the registers are used to store scalar data within the data.
  • the method may further include:
  • the running instruction queue is stored by the running device.
  • the running instruction queue includes the running instruction and a plurality of parsing instructions.
  • in the running instruction queue, the running instruction and the multiple parsing instructions are arranged according to the order in which they are to be executed.
  • the method may further include:
  • when the running device determines that there is an association between the first parsing instruction and a zeroth parsing instruction preceding it, the first parsing instruction is cached, and after the execution of the zeroth parsing instruction is completed, the cached first parsing instruction is executed.
  • the association between the first parsing instruction and the zeroth parsing instruction preceding it includes: a first storage address interval storing the data required by the first parsing instruction overlapping a zeroth storage address interval storing the data required by the zeroth parsing instruction.
  • the method for executing a neural network instruction obtains data, a neural network model, and a running instruction through a running device, parses the running instruction to obtain multiple parsing instructions, and executes the multiple parsing instructions based on the data to obtain an execution result.
  • the method can be used across platforms, has good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material cost for development.
  • the present disclosure also provides a neural network instruction processing method.
  • the method is applied to a neural network instruction processing system.
  • the neural network instruction processing system includes the aforementioned neural network instruction generating device and the aforementioned operating device.
  • the method includes the above-mentioned neural network instruction generation method applied to the neural network instruction generation device and the neural network instruction execution method applied to the running equipment.
  • the method can be used across platforms, has good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material cost for development.
  • a neural network instruction generating device comprising:
  • a device determination module configured to determine a running device that executes the macro instruction according to the received macro instruction
  • the instruction generation module is used to generate an operation instruction according to the macro instruction and the operation device.
  • the device determination module includes:
  • the first determining submodule is configured to determine the designated device as the running device when it is determined that the macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the macro instruction ,
  • the execution condition includes: the specified device contains an instruction set corresponding to the macro instruction.
  • Clause B3 The device according to Clause B2, the device further comprising:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module also includes:
  • a second determining submodule, configured to determine, from among the candidate devices, a running device for executing the macro instruction based on the received macro instruction and the resource information of the candidate devices when it is determined that the macro instruction does not include the identifier of the designated device,
  • the resource information includes an instruction set included in the candidate device.
  • Clause B4 The apparatus according to Clause B3, the device determination module, further comprising:
  • a third determining submodule, configured to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution condition for executing the macro instruction.
  • Clause B5. The device according to any one of Clauses B2-B4, the macro instruction includes at least one of an input quantity and an output quantity,
  • the instruction generation module is also used to determine the data amount of the macro instruction, and generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operating device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes:
  • the first instruction generating submodule is used to, when it is determined that there is one operation device and the resources of the operation device do not satisfy the capacity condition for executing the macro instruction, split the macro instruction into multiple running instructions according to the operation data amount of the operation device and the data amount, to enable the running device to execute the multiple running instructions in sequence,
  • the operation data amount of the operation device is determined according to the resource information of the operation device; each operation instruction includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • the instruction generation module includes:
  • a second instruction generation submodule, used to split the macro instruction according to the operation data amount of each operation device and the data amount when it is determined that there are a plurality of operation devices, and to generate a running instruction corresponding to each operation device,
  • the operation data amount of each operation device is determined according to the resource information of that operation device, and the operation instruction includes at least one of an operation input amount and an operation output amount; the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause B8 The device according to Clause B1, the device further comprising:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause B9 The device according to Clause B2, the device further comprising:
  • the macro generation module is used for receiving the instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • Clause B10 The device according to Clause B1, the device further comprising:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the device is set in the CPU and / or NPU;
  • the macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling macro instruction,
  • the calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction
  • the data handling macro instruction includes at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction.
  • the write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
  • the macro instruction includes at least one of the following options: the identification, operation type, input address, output address, input amount, output amount, operand and instruction parameter of the designated device for executing the macro instruction,
  • the run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
  • Clause B12. A machine learning computing device, the device comprising:
  • one or more neural network instruction generation devices as described in any one of Clause B1 to Clause B11, used to obtain data to be calculated and control information from other processing devices, perform a designated machine learning operation, and pass the execution result to other processing devices through an I / O interface;
  • the machine learning operation device includes a plurality of the neural network instruction generation devices
  • the plurality of the neural network instruction generation devices can be connected and transmit data through a specific structure
  • a plurality of the neural network instruction generation devices may interconnect and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of neural network instruction generation devices may share the same control system or have their own respective control systems; they may share memory or have their own memories; and the interconnection topology of the plurality of neural network instruction generation devices may be arbitrary.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause B12, a general interconnection interface, and other processing devices;
  • the machine learning operation device interacts with the other processing device to jointly complete the calculation operation specified by the user.
  • Clause B14 the combined processing device according to Clause B13, further comprising: a storage device, which is connected to the machine learning computing device and the other processing device, respectively, for storing the machine learning computing device and the Data from other processing devices.
  • Clause B15. A machine learning chip, the machine learning chip includes:
  • Clause B16. An electronic device, the electronic device comprising:
  • a board card includes: a storage device, an interface device and a control device, and a machine learning chip as described in Clause B15;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • the storage device includes: multiple sets of storage units, each of which is connected to the machine learning chip through a bus, and the storage unit is: DDR SDRAM;
  • the machine learning chip includes: a DDR controller for controlling data transmission and data storage of each storage unit;
  • the interface device is: a standard PCIE interface.
  • a neural network instruction generation method comprising:
  • an operation instruction is generated.
  • determining a running device that executes the macro instruction includes:
  • when it is determined that the macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution condition for executing the macro instruction, determining the designated device as the running device,
  • the execution condition includes: the specified device contains an instruction set corresponding to the macro instruction.
  • Clause B21 The method according to Clause B20, the method further comprising:
  • determining a running device for executing the macro instruction includes:
  • a running device for executing the macro instruction is determined from among the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • determining a running device that executes the macro instruction includes:
  • when it is determined that the macro instruction includes the identifier of the designated device, and the resources of the designated device do not satisfy the execution condition for executing the macro instruction, the running device is determined according to the macro instruction and the resource information of the candidate devices.
  • the macro instruction includes at least one of an input quantity and an output quantity
  • generating an operation instruction includes:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device includes:
  • the macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
  • the operation data amount of the operation device is determined according to the resource information of the operation device; each operation instruction includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device includes:
  • the macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
  • the operation data amount of each operation device is determined according to the resource information of that operation device, and the operation instruction includes at least one of an operation input amount and an operation output amount; the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause B26 The method according to Clause B19, the method further comprising:
  • the running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
  • Clause B27 The method according to Clause B20, the method further comprising:
  • Clause B28 The method according to Clause B19, the method further comprising:
  • sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
  • the running device is one or any combination of CPU, GPU and NPU;
  • the method is applied to CPU and / or NPU;
  • the macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling macro instruction,
  • the calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction
  • the data handling macro instruction includes at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction.
  • the write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
  • the macro instruction includes at least one of the following options: an identifier of a designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and an instruction parameter,
  • the run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
  • FIG. 5 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure. As shown in FIG. 5, the system includes an instruction generation device 100 and an operation device 200.
  • the instruction generation device 100 includes a device determination module 11 and an instruction generation module 12.
  • the device determination module 11 is used to determine a running device that executes the macro instruction according to the received macro instruction.
  • the instruction generation module 12 is used to generate an operation instruction according to the macro instruction and the operation device.
  • the operation device 200 includes a control module 21 and an execution module 22.
  • the control module 21 is used to obtain the required data, the neural network model, and the operation instructions, analyze the operation instructions, and obtain multiple analysis instructions.
  • the execution module 22 is used to execute multiple parsing instructions according to the data to obtain the execution result.
  • a macro instruction is a term for batch processing.
  • a macro instruction can be a rule, a pattern, or a syntax replacement.
  • a macro instruction processing device, system, etc. will automatically perform this rule or pattern replacement when it encounters the macro instruction.
  • the macro instruction can be formed by integrating the instructions to be executed that are commonly used for calculation, control and transportation of data.
  • the macro instruction may include at least one of the following: a calculation macro instruction, a control macro instruction, and a data handling macro instruction.
  • the calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction.
  • the control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction.
  • the data handling macro instruction may include at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction.
  • the write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
  • the macro instruction may include at least one of the following options: the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameter.
  • the run instruction can contain at least one of the following options: operation type, input address, output address, operand, and instruction parameters.
  • the identifier of the designated device may be the physical address, IP address, name, or number of the designated device.
  • the identification may include one or any combination of numbers, letters, and symbols.
  • the input address may be an input address of the data, a read address for obtaining the data, etc.;
  • the output address may be an output address of the processed data, a write address for storing the data, etc.
  • the input amount may be information characterizing the size of the data such as the input scale and input length of the data.
  • the output amount may be information indicating the size of the data, such as the output scale and output length of the data.
  • the operand may include one or more of the length of the register, the address of the register, the identification of the register, the immediate value, and so on.
  • the immediate value is a number given directly in an instruction that uses the immediate addressing mode.
  • the instruction parameter may refer to a parameter corresponding to the macro instruction and related to its execution.
  • the instruction parameter may be the address and length of the second operand.
  • the instruction parameters may be the size of the convolution kernel, the step size of the convolution kernel, the filling of the convolution kernel, and so on.
  • a macro instruction must include an operation code and at least one operation field, where the operation code is the operation type, and the operation field includes the identifier of the designated device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters.
  • the operation code may be the part of the instruction or field (usually represented by a code) specified in a computer program to perform an operation, and is an instruction sequence number used to inform the device executing the instruction which instruction needs to be executed.
  • the operation field may be the source of all data required to execute the corresponding instruction, including parameter data, the data to be operated on or processed, and the corresponding operation method, or the addresses where the parameter data, the data to be operated on or processed, and the corresponding operation method are stored.
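The operation code plus operation fields described above can be pictured with a minimal sketch. All field names and the example values below are illustrative assumptions for exposition, not the patent's concrete encoding:

```python
# Hypothetical representation of a macro instruction: an operation code
# (op_type) plus operation fields. Field names are assumptions.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MacroInstruction:
    op_type: str                      # operation code, e.g. "CONV", "JUMP"
    device_id: Optional[str] = None   # identifier of the designated device
    input_addr: Optional[int] = None
    output_addr: Optional[int] = None
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operands: tuple = ()
    params: dict = field(default_factory=dict)  # instruction parameters

m = MacroInstruction(op_type="CONV", device_id="09",
                     input_addr=0x1000, output_addr=0x2000,
                     input_amount=1024, output_amount=512)
```

Only `op_type` is mandatory here, mirroring the requirement that a macro instruction must include an operation code and at least one operation field.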
  • the device determination module 11 may determine one or more running devices according to the macro instruction.
  • the instruction generation module 12 may generate one or more running instructions. When multiple running instructions are generated, they may be executed in the same running device or in different running devices, which is not limited in the present disclosure.
  • the neural network instruction processing system includes an instruction generation device and an operation device.
  • the instruction generating device includes a device determining module for determining a running device that executes the macro instruction based on the received macro instruction; an instruction generating module is used for generating a running instruction based on the macro instruction and the running device.
  • the operation device includes a control module for acquiring the required data, a neural network model, and operation instructions, and analyzing the operation instructions to obtain multiple analysis instructions; and an execution module for executing the multiple analysis instructions based on the data to obtain execution results.
  • the neural network instruction processing system provided by the embodiments of the present disclosure can be used across platforms, has good applicability, fast instruction conversion speed, high processing efficiency, low error probability, and low development labor and material costs.
  • the instruction generation device 100 may further include a macro instruction generation module 13.
  • the macro instruction generation module 13 is used to receive the instruction to be executed, and to generate a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the designated device may be determined according to the operation type, input amount, output amount, etc. of the instruction to be executed.
  • the received instruction to be executed may be one or multiple instructions.
  • the instruction to be executed may include at least one of the following: a calculation instruction to be executed, a control instruction to be executed, and a data handling instruction to be executed.
  • the calculation instruction to be executed may include at least one of a neural network calculation instruction to be executed, a vector logic calculation instruction to be executed, a matrix vector calculation instruction to be executed, a scalar calculation instruction to be executed and a scalar logic calculation instruction to be executed.
  • the control instruction to be executed may include at least one of an unconditional jump instruction to be executed and a conditional jump instruction to be executed.
  • the data carrying instruction to be executed may include at least one of a reading instruction to be executed and a writing instruction to be executed.
  • the read instruction to be executed may include at least one of a read neuron instruction to be executed, a read synapse instruction to be executed and a read scalar instruction to be executed.
  • the write instruction to be executed may include at least one of a write neuron instruction to be executed, a write synapse instruction to be executed and a write scalar instruction to be executed.
  • the instruction to be executed may include at least one of the following options: operation type, input address, output address, input amount, output amount, operand, and instruction parameter.
  • the determined identifier of the designated device may be added to the instruction to be executed to generate a macro instruction.
  • a certain instruction m to be executed is "XXX ..., param".
  • XXX is the operation type
  • param is the instruction parameter.
  • the designated device m-1 can be determined according to the operation type "XXX" of the instruction m to be executed. Then, the identifier (for example, 09) of the designated device m-1 is added to the instruction m to be executed, and a macro instruction M "XXX09, ... param" corresponding to the instruction m to be executed is generated.
  • for multiple instructions to be executed, the identifier of the designated device corresponding to each instruction to be executed can be added to that instruction, and one macro instruction can be generated from the multiple instructions to be executed that carry the identifiers of the designated devices, or multiple corresponding macro instructions can be generated.
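The "XXX" to "XXX09, ... param" example above can be sketched as a string rewrite. The text format and the operation-type-to-device mapping are assumptions for illustration only:

```python
# Illustrative sketch: prepend a designated-device identifier to an
# instruction to be executed, as in the "XXX" -> "XXX09, ..." example.
# DEVICE_BY_OP is a hypothetical mapping from operation type to device id.
DEVICE_BY_OP = {"XXX": "09"}

def to_macro(instr: str) -> str:
    op, _, rest = instr.partition(",")
    dev = DEVICE_BY_OP[op.strip()]          # device determined from operation type
    return f"{op.strip()}{dev},{rest}" if rest else f"{op.strip()}{dev}"

macro = to_macro("XXX, param")              # -> "XXX09, param"
```

The same loop applied per instruction would cover the multiple-instruction case, producing either one combined macro instruction or one macro per instruction.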
  • the device determination module 11 may include a first determination submodule 111.
  • the first determining submodule 111 is used to determine that the designated device is an operating device when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution conditions for executing the macro instruction.
  • the execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
  • the macro instruction may include the identifier of one or more designated devices that execute the macro instruction.
  • the first determination submodule 111 can directly determine the designated device as the running device, saving the time of generating the running instruction based on the macro instruction, and ensuring that the generated running instruction can be executed by the corresponding running device.
  • the instruction generation device 100 may further include a resource acquisition module 14.
  • the device determination module 11 may further include a second determination sub-module 112.
  • the resource acquisition module 14 is used to acquire resource information of candidate devices.
  • the second determining submodule 112 is used to determine, from the candidate devices, the running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices, when it is determined that the macro instruction does not contain the identifier of the designated device.
  • the resource information may include the instruction set included in the candidate device.
  • the instruction set included in a candidate device may be the instruction set corresponding to the operation types of one or more macro instructions. The more instruction sets a candidate device contains, the more types of macro instructions the candidate device can execute.
  • the second determination submodule 112 may determine one or more running devices capable of executing the macro instruction from the candidate devices.
  • the determined instruction set of the running device includes an instruction set corresponding to the macro instruction. For example, if the received macro instruction is a neural network calculation macro instruction, the candidate device containing the instruction set corresponding to the neural network calculation macro instruction may be determined as the operation device to ensure that the operation device can execute the generated operation instruction.
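The selection rule above can be sketched as a filter over candidate devices. Representing each device's instruction set as a set of operation types is an assumption; the device names are hypothetical:

```python
# Sketch: pick the candidate devices whose instruction sets include the
# instruction set corresponding to the macro instruction's operation type.
def select_running_devices(macro_op: str, candidates: dict) -> list:
    """candidates maps device id -> set of supported operation types."""
    return [dev for dev, isa in candidates.items() if macro_op in isa]

candidates = {"cpu0": {"SCALAR", "JUMP"}, "npu0": {"CONV", "MATMUL"}}
chosen = select_running_devices("CONV", candidates)  # -> ["npu0"]
```

For a neural network calculation macro instruction, only candidates supporting that operation type survive the filter, ensuring the running device can execute the generated running instruction.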
  • the device determination module 11 may further include a third determination sub-module 113.
  • the third determining submodule 113 is used to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution conditions for executing the macro instruction.
  • when the macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution conditions, the third determining submodule 113 may determine that the designated device does not have the ability to execute the macro instruction.
  • the third determining submodule 113 may then determine the running device from the candidate devices, for example by determining a candidate device containing the instruction set corresponding to the macro instruction as the running device.
  • the macro instruction may include at least one of the input amount and the output amount
  • the instruction generation module 12 is further used to determine the data amount of the macro instruction, and to generate the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device.
  • the data volume of the macro instruction may be determined according to at least one of the input volume and the output volume, and the resource information of the running device may further include at least one of storage capacity and remaining storage capacity.
  • the storage capacity of the running device may refer to the amount of binary information that the memory of the running device can hold.
  • the remaining storage capacity of the running device may refer to the storage capacity that the running device can currently use for command operation after removing the occupied storage capacity.
  • the resource information of the running device can characterize the running capability of the running device: the larger the storage capacity and the remaining storage capacity, the stronger the running capability of the running device.
  • the instruction generation module 12 may determine the specific way of splitting the macro instruction according to the resource information of each running device, the data amount of the macro instruction, etc., so as to split the macro instruction and generate the running instructions corresponding to the running devices.
  • the instruction generation module 12 may include a first instruction generation sub-module 121.
  • the first instruction generation sub-module 121 is used, when it is determined that there is one operation device and the resources of the operation device do not meet the capacity condition for executing the macro instruction, to split the macro instruction into multiple operation instructions according to the operation data amount of the operation device and the data amount of the macro instruction, so that the operation device executes the multiple operation instructions in sequence.
  • the operation data amount of the operation device may be determined according to the resource information of the operation device; each operation instruction may include at least one of an operation input amount and an operation output amount, and the operation input amount and operation output amount may be determined according to the operation data amount.
  • the amount of operation data of the operation device may be determined according to the storage capacity or remaining storage capacity of the operation device.
  • the capacity condition may be that the amount of running data of the running device is greater than or equal to the amount of data of the macro instruction.
  • the resource of the running device does not meet the capacity condition of executing the macro instruction, it may mean that the amount of running data of the running device is less than the amount of data of the macro instruction.
  • the operation input amount and operation output amount must be less than or equal to the operation data amount, to ensure that the generated operation instructions can be executed by the operation device.
  • the operation input quantity (or operation output quantity) of different operation instructions among the plurality of operation instructions may be the same or different, and the disclosure does not limit this.
  • when the first instruction generation sub-module 121 determines that there is only one running device and the resources of the running device meet the capacity condition for executing the macro instruction, it can directly convert the macro instruction into one running instruction, or split the macro instruction into multiple running instructions, which is not limited in this disclosure.
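The single-device splitting step can be sketched as chunking the macro instruction's data amount by the device's running data amount. Chunking by a flat count with per-chunk offsets is an assumption; the patent only requires each running instruction's input/output amount to be no larger than the device's running data amount:

```python
# Hedged sketch: split a macro instruction whose data amount exceeds the
# running device's running data amount into multiple run instructions,
# each covering at most `run_capacity` units of data.
def split_macro(total_amount: int, run_capacity: int) -> list:
    chunks = []
    offset = 0
    while offset < total_amount:
        size = min(run_capacity, total_amount - offset)
        chunks.append({"offset": offset, "amount": size})  # one run instruction
        offset += size
    return chunks

parts = split_macro(1000, 300)  # amounts: 300, 300, 300, 100
```

The device then executes the resulting run instructions in sequence, each within its capacity.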
  • the instruction generation module 12 may include a second instruction generation sub-module 122.
  • the second instruction generation sub-module 122 is configured to split the macro instruction according to the operation data amount and data amount of each operation device when it is determined that there are multiple operation devices, and generate an operation instruction corresponding to each operation device.
  • the operation data volume of each operation device can be determined according to the resource information of each operation device, and the operation instruction can include at least one of the operation input amount and the operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • the operation input amount and the operation output amount need to be less than or equal to the operation data amount to ensure that the generated operation instruction can be executed by the operation device.
  • the second instruction generation sub-module 122 may generate one or more operation instructions for each operation device according to the operation data amount of each operation device for the corresponding operation device to execute.
  • the operation instruction includes at least one of the operation input amount and the operation output amount.
  • in addition to limiting the data amount of the operation instructions so that they can be executed by the corresponding operation devices, this can also meet special limitation requirements on the operation input amount and/or operation output amount of different operation instructions.
  • the operation instruction may not include the operation input amount and/or operation output amount; a default operation input amount and a default operation output amount may be set in advance.
  • the default operation input amount and default operation output amount enable the operation device, when it determines that the received operation instruction contains no operation input amount or operation output amount, to use the default operation input amount and default operation output amount as the operation input amount and operation output amount of the operation instruction.
  • the default input amount and the default output amount for different types of macro instructions may be preset.
  • the corresponding preset default input quantity and default output quantity can be used as the input quantity and output quantity of the macro instruction.
  • the data amount of the macro instruction is determined according to the default input amount and / or the default output amount
  • the operation instruction is generated according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device.
  • the generated operation instruction may not include the operation input quantity and the operation output quantity, or may include at least one of the operation input quantity and the operation output quantity.
  • the operation device may execute the operation instruction according to the preset default operation input value and / or default operation output value.
  • the instruction generation module 12 may also split the macro instruction to generate a running instruction according to the macro instruction and the preset macro instruction splitting rule.
  • the macro instruction splitting rule may be determined according to a conventional macro instruction splitting method (for example, splitting according to the processing procedure of the macro instruction), combined with a threshold on the running data amount of the instructions that all candidate devices can execute.
  • the storage capacities (or remaining storage capacities) of all candidate devices can be compared, and the determined minimum storage capacity (or remaining storage capacity) can be used as the threshold on the running data amount of the instructions that all candidate devices can execute.
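The threshold described above reduces to taking the minimum capacity over the candidate devices. The device names and capacity units below are illustrative assumptions:

```python
# Minimal sketch: the splitting threshold is the smallest (remaining)
# storage capacity among all candidate devices, so every candidate can
# execute an instruction whose data amount stays under the threshold.
def split_threshold(capacities: dict) -> int:
    """capacities maps device id -> storage (or remaining) capacity."""
    return min(capacities.values())

t = split_threshold({"cpu0": 4096, "gpu0": 8192, "npu0": 2048})  # -> 2048
```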
  • the running instruction generated by the instruction generating module according to the macro instruction may be an instruction to be executed, or one or more parsed instructions obtained by parsing the instruction to be executed, which is not limited in the present disclosure.
  • the instruction generation device 100 may further include a queue construction module 15.
  • the queue construction module 15 is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • an instruction queue uniquely corresponding to each running device can be constructed.
  • the running instructions in the instruction queue are ordered; the running instructions can be sent one by one from the instruction queue to the uniquely corresponding running device, or the instruction queue can be sent to the running device so that the running device executes the running instructions in the order of the queue.
  • having the running device execute the running instructions according to the instruction queue prevents running instructions from being executed out of order or delayed, and prevents running instructions from being missed.
  • the queue sorting rules may be determined based on the estimated execution time of the running instruction, the generation time of the running instruction, the operation input amount, the operation output amount, the operation type, and other information related to the running instruction itself; this disclosure does not limit this.
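One possible queue sorting rule from the candidates listed above can be sketched as sorting by a composite key. The choice of key (estimated execution time, then generation time) and the record fields are assumptions; the disclosure leaves the concrete rule open:

```python
# Illustrative queue construction: order run instructions by estimated
# execution time, breaking ties by generation time (an assumed rule).
def build_instruction_queue(run_instrs: list) -> list:
    return sorted(run_instrs,
                  key=lambda r: (r["est_time"], r["gen_time"]))

queue = build_instruction_queue([
    {"id": "r1", "est_time": 5, "gen_time": 2},
    {"id": "r2", "est_time": 1, "gen_time": 3},
    {"id": "r3", "est_time": 1, "gen_time": 1},
])
# resulting order: r3, r2, r1
```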
  • the instruction generation device 100 may further include an instruction dispatch module 16.
  • the instruction dispatch module 16 is used to send the operation instruction to the operation equipment, so that the operation equipment executes the operation instruction.
  • when there is only one operation instruction to be executed by the operation device, the operation instruction may be sent to the operation device directly.
  • when there are multiple operation instructions, all of them may be sent to the operation device so that the operation device executes them in sequence; alternatively, the operation instructions may be sent to the corresponding operation device one by one, where each time the operation device completes the current operation instruction, the next operation instruction corresponding to the operation device is sent to it.
  • a person skilled in the art may set the manner of sending the operation instruction to the operation device, and the disclosure does not limit this.
  • the instruction dispatch module 16 may include an instruction assembly submodule 161, an assembly translation submodule 162 and an instruction sending submodule 163.
  • the instruction assembly submodule 161 is used to generate an assembly file according to the operation instruction.
  • the assembly translation submodule 162 is used to translate the assembly file into a binary file.
  • the instruction sending submodule 163 is used to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the data amount of the running instruction can be reduced, the time for sending the running instruction to the running device can be saved, and the conversion and execution speed of the macro instruction can be improved.
  • the running device can decode the received binary file to obtain a corresponding running instruction, and execute the obtained running instruction to obtain an execution result.
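The dispatch path above (run instructions, then an assembly file, then a binary file the running device decodes) can be sketched with a toy round trip. The UTF-8 text-to-bytes encoding is only a placeholder for a real assembler and translator:

```python
# Toy sketch of the dispatch path: assemble run instructions into an
# "assembly file", translate it to binary, and let the running device
# decode the binary back into run instructions. Encoding is a placeholder.
def assemble(run_instrs: list) -> str:
    return "\n".join(run_instrs)            # instruction assembly submodule

def translate(asm: str) -> bytes:
    return asm.encode("utf-8")              # assembly translation submodule

def decode(binary: bytes) -> list:
    return binary.decode("utf-8").split("\n")  # running device side

instrs = ["CONV 0x1000 0x2000", "ADD 0x2000 0x3000"]
recovered = decode(translate(assemble(instrs)))  # round trip preserves them
```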
  • the running device may be one or any combination of CPU, GPU, and embedded neural network processor (Neural-network Processing Unit, NPU for short). In this way, the speed at which the instruction generation device generates the operation instruction according to the macro instruction is increased.
  • the instruction generating device 100 may be set in the CPU and/or the NPU, so that the process of generating the running instruction according to the macro instruction is realized through the CPU and/or the NPU; this provides more possible ways to implement the instruction generating device.
  • the running device 200 further includes a storage module 23.
  • the storage module 23 may include at least one of a register and a cache, and the cache may include a high-speed temporary storage cache.
  • the cache can be used to store data, and the registers can be used to store scalar data in the data.
  • control module 21 may include an instruction storage submodule 211 and an instruction processing submodule 212.
  • the instruction storage sub-module 211 is used to store operation instructions.
  • the instruction processing sub-module 212 is used to parse the running instruction to obtain multiple parsed instructions.
  • control module 21 may further include a storage queue sub-module 213.
  • the storage queue sub-module 213 is used to store a running instruction queue, and the running instruction queue includes a running instruction to be executed by the running device and a plurality of parsing instructions. In the running instruction queue, all instructions are arranged in the order of execution.
  • the execution module 22 may further include a dependency relationship processing sub-module 221.
  • the dependency processing submodule 221 is used to cache the first parsing instruction in the instruction storage submodule when it is determined that the first parsing instruction is associated with a zeroth parsing instruction before it, and, after the execution of the zeroth parsing instruction is completed, to extract the first parsing instruction from the instruction storage submodule and send it to the execution module.
  • the association between the first parsing instruction and the zeroth parsing instruction before it may include: the first storage address interval storing the data required by the first parsing instruction has an overlapping area with the zeroth storage address interval storing the data required by the zeroth parsing instruction. Conversely, the absence of an association between the first parsing instruction and the zeroth parsing instruction may mean that there is no overlapping area between the first storage address interval and the zeroth storage address interval.
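The association check above is an interval-overlap test. Representing each storage address interval as a half-open `(start, end)` pair is an assumption for illustration:

```python
# Sketch of the dependency check: two parsing instructions are associated
# when their storage address intervals overlap. Intervals are assumed
# half-open [start, end).
def has_dependency(first: tuple, zeroth: tuple) -> bool:
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end  # True iff the intervals overlap

overlap = has_dependency((0x100, 0x200), (0x180, 0x280))   # overlapping -> True
disjoint = has_dependency((0x100, 0x200), (0x200, 0x300))  # adjacent -> False
```

When the check returns true, the first parsing instruction is held in the instruction storage submodule until the zeroth finishes.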
  • FIG. 7 shows a flowchart of a neural network instruction processing method according to an embodiment of the present disclosure. As shown in FIG. 7, the method is applied to the neural network instruction processing system including the instruction generation device and the operation device. The method includes step S51 and step S52.
  • Step S51: the instruction generation device determines the running device that executes the macro instruction according to the received macro instruction, and generates the running instruction according to the macro instruction and the running device.
  • Step S52: the running device obtains the data, the neural network model, and the running instruction, analyzes the running instruction to obtain multiple analysis instructions, and executes the multiple analysis instructions according to the data to obtain the execution result.
  • step S51 may include: when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution condition of executing the macro instruction, determining the designated device as the running device.
  • the execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
  • the method may further include: acquiring resource information of the candidate device.
  • step S51 may also include: when it is determined that the macro instruction does not contain the identifier of the designated device, determining the running device for executing the macro instruction from the candidate devices based on the received macro instruction and the resource information of the candidate devices.
  • the resource information may include the instruction set included in the candidate device.
  • step S51 may further include: when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution conditions for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
  • the macro instruction may include at least one of the input amount and the output amount.
  • generating the operation instruction according to the macro instruction and the operation device may include: determining the data amount of the macro instruction, and generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device.
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: determining that the operation device is one and the resource of the operation device does not satisfy the execution of the macro instruction Under the capacity condition, the macro instruction is divided into multiple operation instructions according to the operation data amount and data amount of the operation device, so that the operation device executes multiple operation instructions in sequence.
  • the operation data amount of the operation device may be determined according to the resource information of the operation device, and each operation instruction may include at least one of an operation input amount and an operation output amount; the operation input amount and the operation output amount are determined according to the operation data amount.
  • generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: when it is determined that there are multiple operation devices, splitting the macro instruction according to the operation data amount of each operation device and the data amount, to generate an operation instruction corresponding to each operation device.
  • the operation data amount of each operation device can be determined according to the resource information of each operation device, and the operation instruction can include at least one of an operation input amount and an operation output amount; the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
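The splitting described in the preceding clauses can be illustrated with a small sketch. The record layout and names (`split_macro`, address arithmetic in element units) are assumptions for demonstration only:

```python
# Hypothetical sketch: split one macro instruction whose data amount exceeds
# the running device's per-run operation data amount into several run
# instructions, each covering a chunk and shifting the input/output addresses.

def split_macro(total_amount: int, run_amount: int,
                in_addr: int, out_addr: int) -> list:
    """Split a macro instruction covering total_amount elements into run
    instructions of at most run_amount elements each."""
    ops = []
    offset = 0
    while offset < total_amount:
        chunk = min(run_amount, total_amount - offset)
        ops.append({
            "input_address": in_addr + offset,
            "output_address": out_addr + offset,
            "amount": chunk,  # operation input/output amount of this chunk
        })
        offset += chunk
    return ops

# A 10-element macro instruction on a device that handles 4 elements per run
# yields three run instructions with amounts 4, 4, 2, executed in sequence.
print([op["amount"] for op in split_macro(10, 4, 0x100, 0x200)])  # [4, 4, 2]
```

The multi-device case in the clauses is the same idea, with each chunk sized by the operation data amount of the device that will execute it.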
  • the method may further include: sorting, by the instruction generation device, the operation instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the operation device according to the sorted operation instructions.
  • the method may further include: receiving the instruction to be executed through the instruction generating device, and generating a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the method may further include: sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction.
  • sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction may include:
  • the method may further include: storing, by the running device, the data and the scalar data in the data.
  • the running device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed temporary storage cache; the cache is used to store the data, and the register is used to store scalar data in the data.
  • the method may further include:
  • the running instruction queue is stored by the running device, where the running instruction queue includes the running instruction and multiple parsing instructions.
  • the running instruction and the multiple parsing instructions are arranged in order according to the order in which they are executed.
  • the method may further include:
  • when the running device determines that there is an association relationship between a first parsing instruction and a zeroth parsing instruction before the first parsing instruction, the first parsing instruction is cached, and after the execution of the zeroth parsing instruction is completed, the cached first parsing instruction is executed.
  • the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes: a first storage address interval storing data required by the first parsing instruction and a zeroth storage address interval storing data required by the zeroth parsing instruction have an overlapping area.
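The association test above amounts to an interval-overlap check. A minimal sketch, where the half-open `(start, end)` tuple representation of a storage address interval is an assumption:

```python
def intervals_overlap(first: tuple, zeroth: tuple) -> bool:
    """Return True when the first parsing instruction's storage address
    interval [start, end) overlaps the zeroth's, i.e. the instructions are
    associated and the first must wait for the zeroth to finish."""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end

# The first instruction needs data in [100, 200); the zeroth needs [150, 250):
# the intervals overlap, so the first parsing instruction is cached until the
# zeroth completes.
print(intervals_overlap((100, 200), (150, 250)))  # True
print(intervals_overlap((100, 200), (200, 300)))  # False (adjacent, no overlap)
```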
  • the running device may be one or any combination of CPU, GPU, and NPU.
  • the instruction generating device is set in the CPU and / or NPU.
  • the macro instruction may include at least one of the following instructions: a calculation macro instruction, a control macro instruction, and a data handling macro instruction.
  • the calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction.
  • the control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction.
  • the data handling macro instruction may include at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction.
  • the write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
  • the macro instruction may include at least one of the following options: the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameters.
  • the run instruction can contain at least one of the following options: operation type, input address, output address, operand, and instruction parameters.
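The optional fields listed for macro and run instructions can be modeled as simple records. This sketch is illustrative only; the field names and types are assumptions, since the clauses leave the encoding open:

```python
# Illustrative records for the optional fields named in the clauses above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MacroInstruction:
    operation_type: str                    # e.g. "CONV", "JUMP", "READ"
    device_id: Optional[int] = None        # identifier of the designated device
    input_address: Optional[int] = None
    output_address: Optional[int] = None
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operand: Optional[int] = None
    instruction_params: Optional[dict] = None

@dataclass
class RunInstruction:                      # the narrower run-instruction fields
    operation_type: str
    input_address: Optional[int] = None
    output_address: Optional[int] = None
    operand: Optional[int] = None
    instruction_params: Optional[dict] = None

m = MacroInstruction("CONV", device_id=0, input_address=0x100,
                     output_address=0x200, input_amount=1024)
print(m.operation_type, m.input_amount)  # CONV 1024
```

Note that the run instruction drops the device identifier and input/output amounts: by the time it exists, the device has been chosen and the amounts have been fixed by the splitting step.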
  • in the neural network instruction processing method, the instruction generation device determines the operation device that executes the macro instruction according to the received macro instruction, and generates the operation instruction according to the macro instruction and the operation device;
  • the operation device obtains the data, the neural network model, and the operation instruction, parses the operation instruction to obtain multiple parsing instructions, and executes the multiple parsing instructions based on the data to obtain the execution result.
  • the method can be used across platforms, has good applicability, fast instruction conversion, high processing efficiency, and a low error probability, and requires low labor and material costs for development.
  • a neural network instruction processing system, the system including an instruction generation device and an operation device,
  • the instruction generating device includes:
  • a device determination module configured to determine a running device that executes the macro instruction according to the received macro instruction
  • An instruction generation module configured to generate an operation instruction according to the macro instruction and the operation device
  • the operation equipment includes:
  • the control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
  • the execution module is configured to execute the multiple parsing instructions according to the data to obtain an execution result.
  • the instruction generating device further includes:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module includes at least one of the following sub-modules:
  • the first determining submodule is configured to determine the designated device as the running device when it is determined that the macro instruction includes the identifier of the designated device and the resources of the designated device satisfy the execution conditions for executing the macro instruction;
  • a second determining submodule configured to determine from the candidate device based on the received macro instruction and the resource information of the candidate device when it is determined that the macro instruction does not include the identifier of the designated device A running device for executing the macro instruction;
  • a third determining submodule configured to, when it is determined that the macro instruction includes the identifier of the specified device and the resources of the specified device do not satisfy the execution condition for executing the macro instruction, determine the running device according to the macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the macro instruction includes at least one of an input quantity and an output quantity
  • the instruction generation module is also used to determine the data amount of the macro instruction, and generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operating device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes at least one of the following sub-modules:
  • the first instruction generation submodule is used to, when it is determined that there is one operation device and the resources of the operation device do not satisfy the capacity condition for executing the macro instruction, split the macro instruction into multiple running instructions according to the operation data amount of the operation device and the data amount, so that the running device executes the multiple running instructions in sequence;
  • a second instruction generation submodule used to, when it is determined that there are multiple operation devices, split the macro instruction according to the operation data amount of each operation device and the data amount, and generate a running instruction corresponding to each operation device,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction includes at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • the instruction generating device further includes:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
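A minimal illustration of the queue construction module's job follows. The sorting rule used here (a per-instruction sequence number, grouped by target device) is an assumed example; the clauses leave the queue sorting rule open:

```python
# Illustrative sketch: sort run instructions by an assumed 'seq' field and
# build one instruction queue per running device.
from collections import defaultdict

def build_queues(run_instructions):
    """Group run instructions by target device, keeping each device's
    queue in ascending 'seq' order."""
    queues = defaultdict(list)
    for ins in sorted(run_instructions, key=lambda i: i["seq"]):
        queues[ins["device"]].append(ins)
    return dict(queues)

ops = [{"device": "NPU", "seq": 2, "type": "CONV"},
       {"device": "CPU", "seq": 1, "type": "SCALAR"},
       {"device": "NPU", "seq": 0, "type": "READ"}]
q = build_queues(ops)
print([i["type"] for i in q["NPU"]])  # ['READ', 'CONV']
```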
  • the instruction generating device further includes:
  • the macro generation module is used for receiving the instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • the instruction generating device further includes:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
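The dispatch path formed by these three submodules (assemble, translate to binary, send) might be sketched as a toy pipeline. The textual "assembly" syntax, the opcode table, and the byte encoding below are pure assumptions invented for illustration:

```python
# Toy sketch of the dispatch path: run instruction -> assembly text ->
# binary -> running device. Syntax and encoding are invented.
import struct

OPCODES = {"READ": 1, "WRITE": 2, "CONV": 3}  # assumed opcode table

def to_assembly(op: dict) -> str:
    """Instruction assembly submodule: render a run instruction as text."""
    return f'{op["type"]} {op["in"]} {op["out"]}'

def assemble(asm: str) -> bytes:
    """Assembly translation submodule: encode one line as opcode + addresses."""
    mnemonic, in_addr, out_addr = asm.split()
    return struct.pack("<BII", OPCODES[mnemonic], int(in_addr), int(out_addr))

def dispatch(op: dict, send) -> None:
    """Instruction sending submodule: hand the binary to the running device,
    which decodes and executes it."""
    send(assemble(to_assembly(op)))

received = []
dispatch({"type": "CONV", "in": 256, "out": 512}, received.append)
print(received[0].hex())
```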
  • Clause A8. The system according to Clause A1, the operation device further comprising:
  • a storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • control module includes:
  • An instruction storage sub-module for storing the operation instruction
  • An instruction processing sub-module which is used to parse the running instruction to obtain the multiple parsing instructions
  • a storage queue submodule for storing a running instruction queue, the running instruction queue including the running instruction and the plurality of parsing instructions; in the running instruction queue, the running instruction and the multiple parsing instructions are arranged in the order in which they are to be executed.
  • the execution module includes:
  • the dependency processing submodule is used to cache the first parsing instruction in the instruction storage submodule when it is determined that there is an association between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction, and, after the execution of the zeroth parsing instruction is completed, to extract the first parsing instruction from the instruction storage submodule and send it to the execution module,
  • association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and / or NPU;
  • the macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling instruction,
  • the calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction
  • the data handling macro instruction includes at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction.
  • the write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
  • the macro instruction includes at least one of the following options: the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameter,
  • the run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
  • a machine learning computing device comprising:
  • the machine learning computing device includes a plurality of the neural network instruction processing systems
  • the plurality of the neural network instruction processing systems may be connected and transmit data through a specific structure
  • a plurality of the neural network instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the neural network instruction processing systems share the same control system or have their own respective control systems; multiple neural network instruction processing systems share memory or have their own memories; and the multiple neural network instruction processing systems are interconnected in an arbitrary topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause A12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a machine learning chip, the machine learning chip including:
  • Clause A15. An electronic device, the electronic device comprising:
  • a board card includes: a storage device, an interface device and a control device, and a machine learning chip as described in Clause A14;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a neural network instruction processing method is applied to a neural network instruction processing system.
  • the system includes an instruction generation device and an operation device.
  • the method includes:
  • obtaining data, a neural network model, and an operation instruction through the operation device, parsing the operation instruction to obtain a plurality of parsing instructions, and executing the plurality of parsing instructions according to the data to obtain an execution result.
  • Clause A18 The method according to Clause A17, the method further comprising:
  • the instruction generating device determines the operating device that executes the macro instruction according to the received macro instruction, including at least one of the following:
  • when it is determined that the macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the macro instruction, the designated device is determined as the running device;
  • when it is determined that the macro instruction does not include the identifier of a designated device, a running device for executing the macro instruction is determined from the candidate devices;
  • when the macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction, the running device is determined according to the macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the macro instruction includes at least one of an input quantity and an output quantity
  • generating an operation instruction includes: determining the data amount of the macro instruction, and generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • Clause A20 According to the method of Clause A19, generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device, including at least one of the following:
  • the macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions;
  • the macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction includes at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause A21 The method according to Clause A17, the method further comprising:
  • the instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause A22 The method according to Clause A18, the method further comprising:
  • the instruction generating device receives the instruction to be executed, and generates the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
  • Clause A23 The method according to Clause A17, the method further comprising:
  • sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction, includes:
  • Clause A24 The method according to Clause A17, the method further comprising:
  • the running device includes a storage module
  • the storage module includes one or more of a register and a cache
  • the cache includes a high-speed temporary storage cache
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • Clause A25 The method according to Clause A17, the method further comprising:
  • the running instruction queue is stored by the running device, the running instruction queue including the running instruction and the plurality of parsing instructions; in the running instruction queue, the running instruction and the plurality of parsing instructions are arranged in the order in which they are to be executed.
  • Clause A26 The method according to Clause A25, the method further comprising:
  • when the running device determines that there is an association between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction, the first parsing instruction is cached, and after the execution of the zeroth parsing instruction is completed, the cached first parsing instruction is executed,
  • association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and / or NPU;
  • the macro instruction includes at least one of the following instructions: a calculation macro instruction, a control macro instruction, and a data handling macro instruction,
  • the calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction
  • the data handling macro instruction includes at least one of a read macro instruction and a write macro instruction.
  • the read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction.
  • the write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
  • the macro instruction includes at least one of the following options:
  • the identifier of the designated device for executing the macro instruction, the operation type, the input address, the output address, the input amount, the output amount, the operand, and the instruction parameter,
  • the run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
  • For calculation instructions, the following calculation instruction generation devices and methods, as well as calculation instruction processing systems and methods, can be used for processing; the foregoing can be better understood in accordance with the following clauses:
  • a calculation instruction generating device comprising:
  • a device determination module configured to determine a running device that executes the calculation macro instruction based on the received calculation macro instruction
  • An instruction generation module for generating an operation instruction according to the calculation macro instruction and the operation device
  • the calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
  • the calculation macro instruction includes an operation type, an input address and an output address
  • the operation instruction includes the operation type, an operation input address, and an operation output address; the operation input address and the operation output address are determined according to the input address and the output address.
  • the device determination module includes:
  • the first determining submodule is configured to determine the designated device as the running device when it is determined that the calculation macro instruction includes the identifier of the designated device and the resources of the designated device satisfy the execution condition for executing the calculation macro instruction,
  • the execution condition includes that the designated device contains an instruction set corresponding to the calculation macro instruction.
  • Clause C3 The device according to Clause C2, the device further comprising:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module also includes:
  • a second determining submodule configured to determine, from the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • the device determination module further includes:
  • a third determining submodule configured to, when determining that the calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the calculation macro instruction, determine the running device according to the calculation macro instruction and the resource information of the candidate devices.
  • Clause C5 The device according to any one of Clauses C2-C4, the calculation macro instruction further includes at least one of an input quantity and an output quantity,
  • the instruction generation module is further used to determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes:
  • the first instruction generation submodule is configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, split the calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the operation data amount of the operation device is determined according to the resource information of the operation device; each operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • the instruction generation module includes:
  • the second instruction generation submodule is used to, when it is determined that there are multiple operation devices, split the calculation macro instruction according to the operation data amount of each operation device and the data amount, and generate a running instruction corresponding to each operation device,
  • the operation data amount of each operation device is determined according to the resource information of each operation device; the operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause C8 The device according to Clause C1, the device further comprising:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause C9 The device according to Clause C2, the device further comprising:
  • the macro generation module is used to receive the calculation instruction to be executed, and generate the calculation macro instruction according to the determined identifier of the designated device and the calculation instruction to be executed.
  • Clause C10 The device according to Clause C1, the device further comprising:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the device is set in the CPU and / or NPU;
  • the calculation macro instruction includes at least one of the following instructions: a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction;
  • the calculation macro instruction also includes at least one of the following options: the identifier of the designated device for executing the calculation macro instruction, the input amount, the output amount, the operand, and the instruction parameter.
  • a machine learning computing device comprising:
  • the machine learning operation device includes a plurality of the calculation instruction generation devices
  • the plurality of the calculation instruction generation devices may be connected and transmit data through a specific structure
  • a plurality of the calculation instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the calculation instruction generation devices share the same control system or have their own respective control systems; a plurality of the calculation instruction generation devices share memory or have their own memories; and the interconnection mode of the plurality of calculation instruction generation devices is an arbitrary interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device described in Clause C12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card including: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause C12 or the combined processing device described in Clause C13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • the calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
  • the calculation macro instruction includes an operation type, an input address and an output address
  • the operation instruction includes the operation type, an operation input address and an operation output address
  • the operation input address and the operation output address are determined based on the input address and the output address, respectively.
  • determining an operating device that executes the calculation macro instruction includes:
  • when the calculation macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the calculation macro instruction, determining the designated device as the running device,
  • the execution condition includes that the designated device contains an instruction set corresponding to the calculation macro instruction.
  • Clause C17 The method according to Clause C16, the method further comprising:
  • determining a running device that executes the calculation macro instruction includes:
  • when the calculation macro instruction does not include the identifier of a designated device, determining, from among the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • determining the operating device that executes the calculation macro instruction includes:
  • when the calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the calculation macro instruction, determining the running device according to the calculation macro instruction and the resource information of the candidate devices.
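The device-determination clauses above amount to a two-step rule: prefer the designated device when it satisfies the execution condition (its instruction set covers the macro instruction), otherwise select from the candidate devices by their resource information. A sketch under the assumption that devices and macro instructions are simple dictionaries and that the execution condition is a set-membership test:

```python
# Sketch of running-device determination: designated device first,
# candidate devices as fallback. Record shapes are illustrative.

def supports(device, macro):
    """Execution condition: the device's instruction set covers the macro's operation type."""
    return macro["op_type"] in device["instruction_set"]

def determine_running_device(macro, candidates):
    designated_id = macro.get("device_id")  # identifier of the designated device, if any
    if designated_id is not None:
        designated = next((d for d in candidates if d["id"] == designated_id), None)
        if designated is not None and supports(designated, macro):
            return designated  # designated device satisfies the execution condition
    # No suitable designated device: choose from candidates by resource information.
    for device in candidates:
        if supports(device, macro):
            return device
    raise RuntimeError("no device can execute this macro instruction")
```

This also illustrates the third clause: when the designated device's resources fail the execution condition, selection falls through to the candidate-device scan.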
  • the calculation macro instruction further includes at least one of an input quantity and an output quantity
  • Generating an operation instruction according to the calculation macro instruction and the operation device including:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the operation device includes:
  • the calculation macro instruction is split into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
  • the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount.
  • generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the operation device includes:
  • the calculation macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
  • the operation data amount of each operation device is determined according to the resource information of each operation device; the operation instruction further includes at least one of an operation input amount and an operation output amount, the operation input amount and the operation output amount being determined according to the operation data amount of the operation device that executes the operation instruction.
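The splitting clauses above can be made concrete: when the macro instruction's data amount exceeds the running device's per-instruction running data amount, the macro is cut into sequential running instructions, each carrying its own addresses and input/output amounts. A sketch, with field names and the address-advance rule as illustrative assumptions:

```python
# Sketch of macro-instruction splitting: one running device whose
# running data amount is smaller than the macro's data amount.
# Field names are assumptions; addresses advance by the chunk size.

def split_macro(macro, running_data_amount):
    data_amount = macro["input_amount"]  # data amount determined from the input amount
    run_instrs = []
    offset = 0
    while offset < data_amount:
        chunk = min(running_data_amount, data_amount - offset)
        run_instrs.append({
            "op_type": macro["op_type"],
            "op_input_addr": macro["input_addr"] + offset,
            "op_output_addr": macro["output_addr"] + offset,
            "op_input_amount": chunk,   # running input amount
            "op_output_amount": chunk,  # running output amount
        })
        offset += chunk
    return run_instrs
```

The multi-device variant of the clauses would apply the same loop per device, bounding each chunk by that device's own running data amount.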
  • Clause C22 The method according to Clause C15, the method further comprising:
  • the running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
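The queue-construction clause above pairs each running device with an instruction queue whose entries are ordered by a queue sorting rule. The clauses leave the rule itself open; the sketch below assumes a per-macro sequence number as the sorting key:

```python
# Sketch of instruction-queue construction: group running instructions
# by device and sort each group by a queue sorting rule. The sorting
# key ("seq", a sequence number) is an assumption, not specified here.
from collections import defaultdict

def build_instruction_queues(run_instrs):
    queues = defaultdict(list)
    for instr in run_instrs:
        queues[instr["device_id"]].append(instr)
    for device_id in queues:
        queues[device_id].sort(key=lambda i: i["seq"])  # queue sorting rule
    return dict(queues)
```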
  • Clause C23 The method according to Clause C16, the method further comprising:
  • Clause C24 The method according to Clause C15, the method further comprising:
  • sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
  • the running device is one or any combination of CPU, GPU and NPU;
  • the method is applied to a CPU and/or an NPU;
  • the calculation macro instruction includes at least one of the following instructions: at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
  • the calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter.
  • a computing instruction processing system the system includes an instruction generating device and an operating device,
  • the instruction generating device includes:
  • a device determination module configured to determine a running device that executes the calculation macro instruction based on the received calculation macro instruction
  • An instruction generation module configured to generate an operation instruction according to the calculation macro instruction and the operation device
  • the operation equipment includes:
  • the control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
  • An execution module configured to execute the multiple parsing instructions based on the data to obtain an execution result
  • the calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
  • the calculation macro instruction includes an operation type, an input address and an output address
  • the operation instruction includes the operation type, an operation input address and an operation output address
  • the operation input address and the operation output address are determined based on the input address and the output address, respectively.
  • the instruction generating device further includes:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module includes at least one of the following sub-modules:
  • a first determining submodule configured to, when it is determined that the calculation macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the calculation macro instruction, determine the designated device as the running device;
  • a second determining submodule configured to, when the calculation macro instruction does not include the identifier of a designated device, determine, from among the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices;
  • a third determining submodule configured to, when it is determined that the calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the calculation macro instruction, determine the running device according to the calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the calculation macro instruction further includes at least one of an input quantity and an output quantity
  • the instruction generation module is further used to determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes at least one of the following sub-modules:
  • a first instruction generating submodule configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, split the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
  • a second instruction generating submodule configured to, when it is determined that there are a plurality of running devices, split the calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
  • the running data amount of each running device is determined according to the resource information of the running device, and the running instruction further includes at least one of a running input amount and a running output amount,
  • the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
  • the instruction generating device further includes:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • the instruction generating device further includes:
  • the macro generation module is used to receive the calculation instruction to be executed, and generate the calculation macro instruction according to the determined identifier of the designated device and the calculation instruction to be executed.
  • the instruction generating device further includes:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • Clause D8 The system according to Clause D1, the operation device, further comprising:
  • a storage module including one or more of a register and a cache, the cache including a high-speed scratchpad memory,
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • control module includes:
  • An instruction storage sub-module for storing the operation instruction
  • An instruction processing sub-module which is used to parse the running instruction to obtain the multiple parsing instructions
  • a storage queue submodule configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsing instructions, wherein the running instruction and the plurality of parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
  • the execution module includes:
  • a dependency processing submodule configured to, when it is determined that there is an association between a first parsing instruction and a zeroth parsing instruction preceding the first parsing instruction, cache the first parsing instruction in the instruction storage submodule, and, after execution of the zeroth parsing instruction is completed, extract the first parsing instruction from the instruction storage submodule and send it to the execution module,
  • the association between the first parsing instruction and the zeroth parsing instruction preceding the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
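The dependency rule above is a storage-overlap test: the first parsing instruction depends on the zeroth when the address section holding its required data overlaps the section holding the zeroth instruction's data, and the dependent instruction is held back until its predecessor completes. A sketch, modeling address sections as half-open [start, end) ranges (an illustrative choice):

```python
# Sketch of the overlap-based dependency check between a first parsing
# instruction and the zeroth parsing instruction preceding it.
# Sections are half-open (start, end) address ranges, an assumption.

def sections_overlap(first, zeroth):
    """True when two [start, end) address sections share any addresses."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

def has_dependency(first_instr, zeroth_instr):
    """The first instruction must wait if any of its data sections overlap the zeroth's."""
    return any(
        sections_overlap(a, b)
        for a in first_instr["sections"]
        for b in zeroth_instr["sections"]
    )
```

A scheduler using this check would cache an instruction for which `has_dependency` is true and release it once the zeroth instruction retires.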
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and/or the NPU;
  • the calculation macro instruction includes at least one of the following instructions: neural network calculation macro instruction, vector logic calculation macro instruction, matrix vector calculation macro instruction, scalar calculation macro instruction, and scalar logic calculation macro instruction;
  • the calculation macro instruction includes at least one of the following options: an identifier of a designated device for executing the calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter.
  • a machine learning computing device comprising:
  • the plurality of calculation instruction processing systems may be connected and transmit data through a specific structure
  • a plurality of the computing instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the computing instruction processing systems share the same control system or each have their own control system; a plurality of the computing instruction processing systems share memory or each have their own memory; and the interconnection mode of the plurality of computing instruction processing systems is an arbitrary interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device described in Clause D12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card including: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause D12 or the combined processing device described in Clause D13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a method for processing calculation instructions is applied to a calculation instruction processing system.
  • the system includes an instruction generation device and an operation device.
  • the method includes:
  • the calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
  • the calculation macro instruction includes an operation type, an input address and an output address
  • the operation instruction includes the operation type, an operation input address and an operation output address
  • the operation input address and the operation output address are determined based on the input address and the output address, respectively.
  • Clause D16 The method according to Clause D15, the method further comprising:
  • the instruction generating device determines the operating device that executes the calculation macro instruction according to the received calculation macro instruction, including at least one of the following:
  • when the calculation macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the calculation macro instruction, the designated device is determined as the running device;
  • when the calculation macro instruction does not include the identifier of a designated device, determining, from among the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices;
  • when the calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the calculation macro instruction, determining the running device according to the calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the calculation macro instruction further includes at least one of an input quantity and an output quantity
  • Generating an operation instruction according to the calculation macro instruction and the operation device including:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • Clause D18 according to the method of Clause D17, generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device, including at least one of the following:
  • the calculation macro instruction is split into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
  • the calculation macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount,
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause D19 The method according to Clause D15, the method further comprising:
  • the instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause D20 The method according to Clause D16, the method further comprising:
  • the calculation instruction to be executed is received by the instruction generation device, and the calculation macro instruction is generated according to the determined identifier of the designated device and the calculation instruction to be executed.
  • Clause D21 The method according to Clause D15, the method further comprising:
  • sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction includes:
  • Clause D22 The method according to Clause D15, the method further comprising:
  • the operation device includes a storage module
  • the storage module includes one or more of a register and a cache
  • the cache includes a high-speed scratchpad memory
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • Clause D23 The method according to Clause D15, the method further comprising:
  • a running instruction queue including the running instruction and the plurality of parsing instructions, wherein the running instruction and the plurality of parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
  • Clause D24 The method according to Clause D23, the method further comprising:
  • when the running device determines that there is an association between a first parsing instruction and a zeroth parsing instruction preceding the first parsing instruction, the first parsing instruction is cached, and after execution of the zeroth parsing instruction is completed, the cached first parsing instruction is executed,
  • the association between the first parsing instruction and the zeroth parsing instruction preceding the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and/or the NPU;
  • the calculation macro instruction includes at least one of the following instructions: neural network calculation macro instruction, vector logic calculation macro instruction, matrix vector calculation macro instruction, scalar calculation macro instruction, and scalar logic calculation macro instruction;
  • the calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter.
  • Neural network computing instructions can be processed by the following computing instruction generating devices and methods, and neural network computing instruction processing systems, methods and other related products. The foregoing can be better understood according to the following clauses:
  • a neural network computing instruction generation device comprising:
  • a device determination module configured to determine a running device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction
  • an instruction generation module configured to generate a running instruction according to the neural network calculation macro instruction and the running device
  • the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm
  • the neural network calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address
  • the operation input address and the operation output address are determined based on the input address and the output address, respectively.
  • the device determination module includes:
  • a first determining submodule configured to, when it is determined that the neural network calculation macro instruction includes the identifier of a specified device and the resources of the specified device satisfy the execution condition for executing the neural network calculation macro instruction, determine the specified device as the running device,
  • the execution condition includes: the specified device contains an instruction set corresponding to the neural network calculation macro instruction.
  • Clause E3 The device according to Clause E2, the device further comprising:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module also includes:
  • a second determining submodule configured to, when the neural network calculation macro instruction does not include the identifier of a specified device, determine, from among the candidate devices, a running device for executing the neural network calculation macro instruction according to the received neural network calculation macro instruction and the resource information of the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • Clause E4 The apparatus according to Clause E3, the device determination module, further comprising:
  • a third determining submodule configured to, when it is determined that the neural network calculation macro instruction includes the identifier of the specified device and the resources of the specified device do not satisfy the execution condition for executing the neural network calculation macro instruction, determine the running device according to the neural network calculation macro instruction and the resource information of the candidate devices.
  • the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity
  • the instruction generation module is also used to determine the data amount of the neural network calculation macroinstruction, and generate according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device Run instruction,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes:
  • a first instruction generating submodule configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macro instruction, split the neural network calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
  • the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount.
  • the instruction generation module includes:
  • a second instruction generating submodule configured to, when it is determined that there are a plurality of running devices, split the neural network calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
  • the running data amount of each running device is determined according to the resource information of each running device; the running instruction further includes at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount of the running device that executes the running instruction.
  • Clause E8 The device according to Clause E1, the device further comprising:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause E9 The device according to Clause E2, the device further comprising:
  • the macro generation module is used to receive the neural network calculation instruction to be executed, and generate the neural network calculation macro instruction according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
  • Clause E10 The device according to Clause E1, the device further comprising:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • the running device is one or any combination of CPU, GPU and NPU
  • the device is set in the CPU and/or the NPU;
  • the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction,
  • the neural network calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the neural network calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, the instruction parameter including at least one of a size of a convolution kernel, a stride of the convolution kernel, and a padding of the convolution kernel.
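The clause above enumerates the fields a convolution-type macro instruction may carry. A sketch of one possible record layout, where every field name and default is an illustrative assumption (the clause lists only the categories of information, not a concrete format):

```python
# Illustrative structure of a neural network calculation macro
# instruction: operation type plus optional designated-device
# identifier, input/output amounts, and convolution parameters
# (kernel size, stride, padding). Names and defaults are assumptions.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ConvMacroInstruction:
    op_type: str                      # operation type, e.g. "CONV"
    input_addr: int                   # input address
    output_addr: int                  # output address
    device_id: Optional[str] = None   # identifier of the designated device
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    kernel_size: Tuple[int, int] = (3, 3)   # size of the convolution kernel
    stride: Tuple[int, int] = (1, 1)        # stride of the convolution kernel
    padding: Tuple[int, int] = (0, 0)       # padding of the convolution kernel
```

Because every field beyond the operation type and addresses is optional here, the same record shape can also represent a pooling macro instruction that simply omits the convolution parameters.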
  • a machine learning computing device comprising:
  • One or more neural network computing instruction generation devices as described in any one of Clauses E1 to E11, used to obtain data to be computed and control information from other processing devices, perform the specified machine learning operation, and transmit the execution result to other processing devices through the I/O interface;
  • the machine learning computing device includes a plurality of the neural network computing instruction generating devices
  • the plurality of the neural network computing instruction generating devices may be connected and transmit data through a specific structure
  • a plurality of the neural network computing instruction generating devices may interconnect and transmit data through a fast external device interconnect (PCIE) bus to support larger-scale machine learning operations; a plurality of the neural network computing instruction generating devices may share the same control system or have their own control systems; a plurality of the neural network computing instruction generating devices may share memory or have their own memories; and the interconnection mode of the plurality of neural network computing instruction generating devices may be any interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause E12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card, comprising: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause E12 or the combined processing device described in Clause E13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a neural network computing instruction generation method comprising:
  • according to the received neural network calculation macro instruction, determining a running device that executes the neural network calculation macro instruction;
  • the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm
  • the neural network calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address
  • the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • determining a running device that executes the neural network calculation macroinstruction includes:
  • when it is determined that the neural network calculation macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the neural network calculation macro instruction, determining the designated device as the running device,
  • the execution condition includes: the specified device contains an instruction set corresponding to the neural network calculation macro instruction.
  • Clause E17 The method according to Clause E16, the method further comprising:
  • determining a running device for executing the neural network calculation macro instruction includes:
  • the resource information includes an instruction set included in the candidate device.
  • determining a running device that executes the neural network calculation macroinstruction includes:
  • when it is determined that the neural network calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution conditions for executing the neural network calculation macro instruction, determining the running device according to the neural network calculation macro instruction and the resource information of the candidate devices.
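Clauses E16 through E18 together describe one selection policy: use the specified device when its instruction set satisfies the execution condition, and otherwise fall back to a candidate device whose resource information shows support. A minimal sketch under assumed field names:

```python
def select_running_device(macro, candidates):
    """Pick the running device for a macro instruction.

    `macro` carries an optional specified-device id and an operation type;
    `candidates` maps device id -> resource info, whose 'instruction_set'
    lists the operation types that device supports (the execution condition).
    """
    designated = macro.get("device_id")
    supports = lambda dev: macro["operation_type"] in candidates[dev]["instruction_set"]
    # The specified device wins if its resources satisfy the execution condition.
    if designated is not None and designated in candidates and supports(designated):
        return designated
    # Otherwise choose among candidate devices by their resource information.
    for dev in candidates:
        if supports(dev):
            return dev
    return None  # no device satisfies the execution condition

candidates = {0: {"instruction_set": ["ADD"]},
              1: {"instruction_set": ["CONV", "POOL"]}}
```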
  • the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity
  • generating an operation instruction includes:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the neural network calculation macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, each operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • the neural network calculation macro instruction is split according to the operating data amount of each operating device and the data amount to generate an operating instruction corresponding to each operating device,
  • the operation data amount of each operation device is determined according to the resource information of each operation device, the operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
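The splitting described in Clauses E20 and E21 can be sketched as follows: when a running device's running data amount is smaller than the macro instruction's data amount, the macro instruction is cut into running instructions whose addresses and amounts each cover one slice of the data. Field names are assumptions for illustration:

```python
def split_macro(macro, run_data_amount):
    """Split one macro instruction into running instructions executed in sequence."""
    total = macro["data_amount"]
    if run_data_amount >= total:
        return [macro]  # the device can handle the whole macro at once; no split
    run_instructions = []
    offset = 0
    while offset < total:
        chunk = min(run_data_amount, total - offset)
        run_instructions.append({
            "operation_type": macro["operation_type"],
            # Each running instruction addresses its own slice of the input/output data.
            "run_input_address": macro["input_address"] + offset,
            "run_output_address": macro["output_address"] + offset,
            "run_input_amount": chunk,
            "run_output_amount": chunk,
        })
        offset += chunk
    return run_instructions
```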
  • Clause E22 The method according to Clause E15, the method further comprising:
  • the running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
  • Clause E23 The method according to Clause E16, the method further comprising:
  • the neural network calculation instruction to be executed is received, and the neural network calculation macro instruction is generated according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
  • Clause E24 The method according to Clause E15, the method further comprising:
  • sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
  • the running device is one or any combination of CPU, GPU and NPU;
  • the method is applied to CPU and/or NPU;
  • the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction,
  • the neural network calculation macro instruction also includes at least one of the following options: the identifier of the designated device for executing the neural network calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, the instruction parameter including at least one of the size of the convolution kernel, the step size of the convolution kernel, and the padding of the convolution kernel.
  • a neural network computing instruction processing system, the system including an instruction generating device and an operating device,
  • the instruction generating device includes:
  • a device determination module configured to determine a running device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction
  • An instruction generation module configured to calculate a macro instruction and the operation device according to the neural network, and generate an operation instruction
  • the operation equipment includes:
  • the control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
  • An execution module configured to execute the multiple parsing instructions based on the data to obtain an execution result
  • the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm
  • the neural network calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address
  • the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • the instruction generating device further includes:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module includes at least one of the following sub-modules:
  • the first determining submodule is configured to, when it is determined that the neural network calculation macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution conditions for executing the neural network calculation macro instruction, determine the specified device as the running device;
  • the second determining submodule is configured to, when it is determined that the neural network calculation macro instruction does not include the identifier of the specified device, determine, from among the candidate devices, a running device for executing the neural network calculation macro instruction according to the received neural network calculation macro instruction and the resource information of the candidate devices;
  • a third determining submodule configured to, when it is determined that the neural network calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution conditions for executing the neural network calculation macro instruction, determine the running device according to the neural network calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the neural network calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the instruction generation module is also used to determine the data amount of the neural network calculation macro instruction, and to generate the running instruction according to the data amount of the neural network calculation macro instruction, the neural network calculation macro instruction, and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes at least one of the following sub-modules:
  • the first instruction generating submodule is used for, when it is determined that the number of the running devices is one and the running data amount of the running device is less than the data amount of the neural network calculation macro instruction, splitting the neural network calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
  • the second instruction generation sub-module is used for, when it is determined that there are a plurality of operation devices, splitting the neural network calculation macro instruction according to the operation data amount of each operation device and the data amount, to generate an operation instruction corresponding to each operation device,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • the instruction generating device further includes:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • the instruction generating device further includes:
  • the macro generation module is used to receive the neural network calculation instruction to be executed, and generate the neural network calculation macro instruction according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
  • the instruction generation device further includes:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • Clause F8 The system according to Clause F1, the operation device, further comprising:
  • a storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • control module includes:
  • An instruction storage sub-module for storing the operation instruction
  • An instruction processing sub-module which is used to parse the running instruction to obtain the multiple parsing instructions
  • a storage queue sub-module for storing a running instruction queue, the running instruction queue including the running instruction and the plurality of parsing instructions, wherein the running instruction and the plurality of parsing instructions are arranged in the order in which they are to be executed.
  • the execution module includes:
  • the dependency processing sub-module is configured to, when it is determined that there is an association between a first parsing instruction and a zeroth parsing instruction before the first parsing instruction, cache the first parsing instruction in the instruction storage sub-module, and after the execution of the zeroth parsing instruction is completed, extract the first parsing instruction from the instruction storage sub-module and send it to the execution module,
  • the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
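The association relationship defined above reduces to an interval-overlap test between the two storage address sections; a minimal sketch:

```python
def sections_overlap(first_start, first_len, zeroth_start, zeroth_len):
    """True when the first parsing instruction's storage address section overlaps
    the zeroth's, i.e. the first instruction must wait for the zeroth to finish."""
    first_end = first_start + first_len
    zeroth_end = zeroth_start + zeroth_len
    # Half-open intervals [start, end) overlap iff each starts before the other ends.
    return first_start < zeroth_end and zeroth_start < first_end
```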
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and/or NPU;
  • the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction;
  • the neural network calculation macro instruction includes at least one of the following options: the identifier of the designated device used to execute the neural network calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, the instruction parameter including at least one of the size of the convolution kernel, the step size of the convolution kernel, and the padding of the convolution kernel.
  • a machine learning computing device comprising:
  • the multiple neural network computing instruction processing systems may be connected and transmit data through a specific structure
  • a plurality of the neural network computing instruction processing systems may interconnect and transmit data through a fast external device interconnect (PCIE) bus to support larger-scale machine learning operations; a plurality of the neural network computing instruction processing systems may share the same control system or have their own control systems; a plurality of the neural network computing instruction processing systems may share memory or have their own memories; and the interconnection mode of the plurality of neural network computing instruction processing systems may be any interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause F12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card, comprising: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause F12 or the combined processing device described in Clause F13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a neural network computing instruction processing method is applied to a neural network computing instruction processing system.
  • the system includes an instruction generating device and an operating device.
  • the method includes:
  • the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm
  • the neural network calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address
  • the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • Clause F16 The method according to Clause F15, the method further comprising:
  • the instruction generating device determines the operating device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction, including at least one of the following:
  • when it is determined that the neural network calculation macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the neural network calculation macro instruction, determining the designated device as the running device;
  • when it is determined that the neural network calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution conditions for executing the neural network calculation macro instruction, determining the running device according to the neural network calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the neural network calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
  • the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity
  • generating an operation instruction includes:
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • Clause F18 According to the method of Clause F17, according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device, generating the running instruction, including at least one of the following:
  • the neural network calculation macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions;
  • the neural network calculation macro instruction is split according to the operating data amount of each operating device and the data amount to generate an operating instruction corresponding to each operating device,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount.
  • the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause F19 The method according to Clause F15, the method further comprising:
  • the instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause F20 The method according to Clause F16, the method further comprising:
  • the instruction generating device receives the neural network calculation instruction to be executed, and generates the neural network calculation macro instruction according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
  • Clause F21 The method according to Clause F15, the method further comprising:
  • sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction includes:
  • Clause F22 The method according to Clause F15, the method further comprising:
  • the operation device includes a storage module
  • the storage module includes one or more of a register and a cache
  • the cache includes a high-speed temporary storage cache
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • Clause F23 The method according to Clause F15, the method further comprising:
  • the operation instruction queue including the operation instruction and the plurality of parsing instructions, wherein the operation instruction and the plurality of parsing instructions are arranged in the order in which they are to be executed.
  • Clause F24 The method according to Clause F23, the method further comprising:
  • when the running device determines that there is an association between a first parsing instruction and a zeroth parsing instruction before the first parsing instruction, the running device caches the first parsing instruction, and executes the cached first parsing instruction after the execution of the zeroth parsing instruction is completed,
  • the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • the first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and/or NPU;
  • the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction,
  • the neural network calculation macro instruction also includes at least one of the following options: the identifier of the designated device for executing the neural network calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, the instruction parameter including at least one of the size of the convolution kernel, the step size of the convolution kernel, and the padding of the convolution kernel.
  • a vector logic calculation instruction generating device comprising:
  • a device determination module configured to determine a running device that executes the vector logic calculation macro instruction according to the received vector logic calculation macro instruction
  • An instruction generation module used to calculate a macro instruction and the running device according to the vector logic, and generate a running instruction
  • the vector logic calculation macro instruction refers to a macro instruction used to perform logic operation on a vector
  • the vector logic calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, operation input address and operation output address.
  • the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • the device determination module includes:
  • the first determining submodule is configured to, when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution condition for executing the vector logic calculation macro instruction, determine the specified device as the running device,
  • the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction.
  • Clause G3 The device according to Clause G2, the device further comprising:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module also includes:
  • the second determining submodule is configured to, when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determine, from among the candidate devices, a running device for executing the vector logic calculation macro instruction according to the received vector logic calculation macro instruction and the resource information of the candidate devices,
  • the resource information includes an instruction set included in the candidate device.
  • Clause G4 The apparatus according to Clause G3, the device determination module, further comprising:
  • a third determining submodule configured to, when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determine the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices.
  • the vector logic calculation macro instruction further includes at least one of an input quantity and an output quantity
  • the instruction generation module is further used to determine the data amount of the vector logic calculation macro instruction, generate the operation instruction according to the data amount, the vector logic calculation macro instruction and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes:
  • the first instruction generating submodule is used for, when it is determined that the number of the running devices is one and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, splitting the vector logic calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, each operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
  • the instruction generation module includes:
  • the second instruction generation sub-module is used for, when it is determined that there are a plurality of operation devices, splitting the vector logic calculation macro instruction according to the operation data amount of each operation device and the data amount, to generate an operation instruction corresponding to each operation device,
  • the operation data amount of each operation device is determined according to the resource information of each operation device, the operation instruction further includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause G8 The device according to Clause G1, the device further comprising:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • Clause G9 The device according to Clause G2, the device further comprising:
  • the macro generation module is configured to receive a vector logic calculation instruction to be executed, and generate the vector logic calculation macro instruction according to the determined identifier of the designated device and the vector logic calculation instruction to be executed.
  • Clause G10 The device according to Clause G1, the device further comprising:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
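The three dispatch submodules above (assembly, translation, sending) form a pipeline that can be sketched as below. The textual assembly format and the UTF-8 encoding are invented for illustration; the patent does not specify either.

```python
# Illustrative sketch of the dispatch pipeline: running instruction -> assembly
# file -> binary file -> sent to the running device. Formats are assumptions.

def assemble(run_instruction):
    """Instruction assembly submodule: emit one assembly line per field."""
    return "\n".join(f"{k} {v}" for k, v in run_instruction.items())

def translate(assembly_text):
    """Assembly translation submodule: encode the assembly file as bytes."""
    return assembly_text.encode("utf-8")

def dispatch(run_instruction, send):
    """Instruction sending submodule: push the binary file to the device."""
    binary = translate(assemble(run_instruction))
    send(binary)
    return binary
```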
  • the running device is one or any combination of CPU, GPU and NPU
  • the device is set in the CPU and / or NPU;
  • the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
  • the vector logic calculation macro instruction further includes at least one of the following options: an identifier of the designated device used to execute the vector logic calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, where the instruction parameter includes at least one of an address and a length of a second operand.
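As a reading aid, the fields listed above for a vector logic calculation macro instruction could be modeled as follows. Field names are assumptions, and the optional fields default to None because the clause marks them as optional.

```python
# Illustrative model (field names are assumptions, not from the patent) of the
# fields a vector logic calculation macro instruction may carry.
from dataclasses import dataclass
from typing import Optional

@dataclass
class VectorLogicMacro:
    operation_type: str                 # e.g. "VAND", "VOR", "VNOT", "VCMP"
    input_address: int
    output_address: int
    device_id: Optional[str] = None     # identifier of the designated device
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operand: Optional[int] = None
    second_operand_address: Optional[int] = None  # instruction parameter
    second_operand_length: Optional[int] = None   # instruction parameter
```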
  • a machine learning computing device comprising:
  • one or more vector logic calculation instruction generation devices as described in any one of Clauses G1 to G11, used to obtain data to be operated on and control information from other processing apparatuses, perform a designated machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
  • the machine learning operation device includes a plurality of the vector logic calculation instruction generation devices
  • the plurality of vector logic calculation instruction generation devices can be connected and transmit data through a specific structure
  • a plurality of the vector logic calculation instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the vector logic calculation instruction generation devices share the same control system or have their own control systems; a plurality of the vector logic calculation instruction generation devices share memory or have their own memories; and the interconnection mode of the plurality of vector logic calculation instruction generation devices is an arbitrary interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause G12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card comprising: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device described in Clause G12 or the combined processing device described in Clause G13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a vector logic calculation instruction generation method including:
  • the vector logic calculation macro instruction refers to a macro instruction used to perform logic operation on a vector
  • the vector logic calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, an operation input address, and an operation output address, where the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • determining a running device that executes the vector logic calculation macro instruction includes:
  • when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution condition for executing the vector logic calculation macro instruction, determining the specified device as the running device,
  • the execution condition includes that the specified device contains an instruction set corresponding to the vector logic calculation macro instruction.
  • Clause G17 The method according to Clause G16, the method further comprising:
  • determining a running device that executes the vector logic calculation macro instruction includes:
  • when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determining, according to the received vector logic calculation macro instruction and the resource information of the candidate devices, a running device among the candidate devices to execute the vector logic calculation macro instruction,
  • the resource information includes an instruction set included in the candidate device.
  • determining a running device that executes the vector logic calculation macro instruction includes:
  • when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determining the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices.
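Clauses G16 to G18 together describe a three-way selection of the running device, sketched below. The execution condition checked is the one the clauses name (the device's instruction set must contain the macro instruction's operation); the function and field names are illustrative assumptions.

```python
# Illustrative sketch of the three-way running-device selection: (1) the macro
# names a device that satisfies the execution condition; (2) no device is
# named; (3) the named device fails the condition. Cases 2 and 3 both fall
# back to the candidate devices' resource information.

def choose_running_device(macro, candidates):
    """macro: dict with 'operation_type' and optionally 'device_id';
    candidates: dict mapping device id -> resource info with 'instruction_set'."""
    def satisfies(dev_id):
        info = candidates.get(dev_id)
        return info is not None and macro["operation_type"] in info["instruction_set"]

    specified = macro.get("device_id")
    if specified is not None and satisfies(specified):
        return specified                     # case 1: use the specified device
    for dev_id in candidates:                # cases 2 and 3: scan candidates
        if satisfies(dev_id):
            return dev_id
    return None                              # no candidate can execute the macro
```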
  • the vector logic calculation macro instruction further includes at least one of an input quantity and an output quantity
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • Clause G20 The method according to Clause G19, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes:
  • splitting the vector logic calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the operation data amount of the operation device is determined according to the resource information of the operation device; each operation instruction includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount.
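The sequential split of Clause G20 — one running device whose running data amount is smaller than the macro instruction's data amount — can be sketched as follows; the fixed-size chunking policy shown is an assumption.

```python
# Illustrative sketch: when the running device's running data amount is less
# than the macro instruction's data amount, split the macro into running
# instructions that the device executes one after another.

def split_for_one_device(total_data, running_data_amount):
    """Yield (offset, amount) chunks no larger than the device's capacity."""
    chunks = []
    offset = 0
    while offset < total_data:
        amount = min(running_data_amount, total_data - offset)
        chunks.append((offset, amount))   # one sequential running instruction
        offset += amount
    return chunks
```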
  • Clause G21 The method according to Clause G19, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes:
  • when it is determined that there are a plurality of running devices, splitting the vector logic calculation macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
  • the operation data amount of each operation device is determined according to the resource information of each operation device; the operation instruction includes at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • Clause G22 The method according to Clause G15, the method further comprising:
  • the running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
  • Clause G23 The method according to Clause G16, the method further comprising:
  • Clause G24 The method according to Clause G15, the method further comprising:
  • sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
  • the running device is one or any combination of CPU, GPU and NPU;
  • the method is applied to CPU and / or NPU;
  • the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
  • the vector logic calculation macro instruction further includes at least one of the following options: an identifier of the designated device used to execute the vector logic calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, where the instruction parameter includes at least one of an address and a length of a second operand.
  • a vector logic calculation instruction processing system, the system including an instruction generation device and an operation device,
  • the instruction generating device includes:
  • a device determination module configured to determine a running device that executes the vector logic calculation macro instruction according to the received vector logic calculation macro instruction
  • an instruction generation module used to generate a running instruction according to the vector logic calculation macro instruction and the running device;
  • the operation equipment includes:
  • the control module is used to obtain required data, a neural network model, and the running instruction, and to parse the running instruction to obtain multiple parsing instructions;
  • An execution module configured to execute the multiple parsing instructions based on the data to obtain an execution result
  • the vector logic calculation macro instruction refers to a macro instruction used to perform logic operation on a vector
  • the vector logic calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, an operation input address, and an operation output address, where the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • the instruction generating device further includes:
  • Resource acquisition module used to obtain resource information of candidate devices
  • the device determination module includes at least one of the following sub-modules:
  • the first determining submodule is configured to, when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device and the resources of the specified device satisfy the execution condition for executing the vector logic calculation macro instruction, determine the specified device as the running device;
  • the second determining submodule is used to, when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determine, according to the received vector logic calculation macro instruction and the resource information of the candidate devices, a running device among the candidate devices to execute the vector logic calculation macro instruction;
  • the third determining submodule is used to, when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device and the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determine the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices,
  • the execution condition includes: the specified device includes an instruction set corresponding to the vector logic calculation macro instruction, and the resource information includes the instruction set included in the candidate device.
  • the vector logic calculation macro instruction further includes at least one of an input quantity and an output quantity
  • the instruction generation module is further used to determine the data amount of the vector logic calculation macro instruction, and generate the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
  • the instruction generation module includes at least one of the following sub-modules:
  • the first instruction generating submodule is used to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, split the vector logic calculation macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device sequentially executes the multiple running instructions,
  • the second instruction generation submodule is used to, when it is determined that there are a plurality of operation devices, split the vector logic calculation macro instruction according to the operation data amount of each operation device and the data amount, and generate an operation instruction corresponding to each operation device,
  • the operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount, where the operation input amount and the operation output amount are determined according to the operation data amount of the operation device that executes the operation instruction.
  • the instruction generating device further includes:
  • the queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
  • the instruction generating device further includes:
  • the macro generation module is configured to receive a vector logic calculation instruction to be executed, and generate the vector logic calculation macro instruction according to the determined identifier of the designated device and the vector logic calculation instruction to be executed.
  • the instruction generating device further includes:
  • An instruction dispatching module configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction
  • the instruction dispatching module includes:
  • An instruction assembly submodule used to generate an assembly file according to the operation instruction
  • An assembly translation submodule used to translate the assembly file into a binary file
  • An instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
  • Clause H8 The system according to Clause H1, the operation device further comprising:
  • a storage module including one or more of a register and a cache, where the cache includes a high-speed temporary storage cache,
  • the cache is used to store the data
  • the register is used to store scalar data in the data.
  • the control module includes:
  • An instruction storage sub-module for storing the operation instruction
  • An instruction processing sub-module which is used to parse the running instruction to obtain the multiple parsing instructions
  • a storage queue submodule for storing a running instruction queue, the running instruction queue including the running instruction and the multiple parsing instructions, where the running instruction and the multiple parsing instructions in the running instruction queue are arranged in order of execution.
  • the execution module includes:
  • the dependency processing submodule is used to, when it is determined that there is an association between a first parsing instruction and a zeroth parsing instruction before the first parsing instruction, cache the first parsing instruction in the instruction storage submodule, and after execution of the zeroth parsing instruction is completed, extract the first parsing instruction from the instruction storage submodule and send it to the execution module,
  • where the association between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
  • a first storage address section storing the data required by the first parsing instruction and a zeroth storage address section storing the data required by the zeroth parsing instruction have an overlapping area.
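The association relationship above reduces to an interval-overlap test on storage address sections. A minimal sketch, with illustrative field names:

```python
# Illustrative sketch: a first parsing instruction depends on a zeroth one
# when the storage address section holding the data it requires overlaps the
# section the zeroth instruction uses.

def intervals_overlap(start0, length0, start1, length1):
    """True if [start0, start0+length0) and [start1, start1+length1) overlap."""
    return start0 < start1 + length1 and start1 < start0 + length0

def has_dependency(first_instr, zeroth_instr):
    """Each instruction carries 'addr' and 'length' for its required data."""
    return intervals_overlap(zeroth_instr["addr"], zeroth_instr["length"],
                             first_instr["addr"], first_instr["length"])
```

When `has_dependency` is true, the first parsing instruction is held back until the zeroth one has finished executing.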
  • the running device is one or any combination of CPU, GPU and NPU;
  • the instruction generating device is set in the CPU and / or NPU;
  • the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
  • the vector logic calculation macro instruction further includes at least one of the following options: an identifier of the designated device used to execute the vector logic calculation macro instruction, an input amount, an output amount, an operand, and an instruction parameter, where the instruction parameter includes at least one of an address and a length of a second operand.
  • a machine learning computing device comprising:
  • the machine learning operation device includes a plurality of the vector logic calculation instruction processing systems
  • the plurality of vector logic calculation instruction processing systems may be connected and transmit data through a specific structure
  • a plurality of the vector logic calculation instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the vector logic calculation instruction processing systems share the same control system or have their own control systems; a plurality of the vector logic calculation instruction processing systems share memory or have their own memories; and the interconnection mode of the plurality of vector logic calculation instruction processing systems is an arbitrary interconnection topology.
  • a combined processing device comprising:
  • the machine learning computing device as described in Clause H12, a general interconnection interface, and other processing devices;
  • the machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
  • the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
  • a board card comprising: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device described in Clause H12 or the combined processing device described in Clause H13;
  • the machine learning chip is respectively connected to the storage device, the control device and the interface device;
  • the storage device is used for storing data
  • the interface device is used to realize data transmission between the machine learning chip and an external device
  • the control device is used for monitoring the state of the machine learning chip.
  • a vector logic calculation instruction processing method is applied to a vector logic calculation instruction processing system.
  • the system includes an instruction generation device and an operation device.
  • the method includes:
  • the vector logic calculation macro instruction refers to a macro instruction used to perform logic operation on a vector
  • the vector logic calculation macro instruction includes the operation type, input address and output address
  • the operation instruction includes the operation type, an operation input address, and an operation output address, where the operation input address and the operation output address are determined according to the input address and the output address, respectively.
  • Clause H16 The method according to Clause H15, the method further comprising:
  • the instruction generating device determines the running device that executes the vector logic calculation macro instruction according to the received vector logic calculation macro instruction, including at least one of the following:
  • when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution condition for executing the vector logic calculation macro instruction, determining the specified device as the running device;
  • when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determining, according to the received vector logic calculation macro instruction and the resource information of the candidate devices, a running device among the candidate devices to execute the vector logic calculation macro instruction;
  • the execution condition includes: the specified device includes an instruction set corresponding to the vector logic calculation macro instruction, and the resource information includes the instruction set included in the candidate device.
  • the vector logic calculation macro instruction further includes at least one of an input quantity and an output quantity
  • determining the data amount of the vector logic calculation macro instruction, and generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device,
  • the data amount is determined according to at least one of the input amount and the output amount
  • the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.


Abstract

A computation method and apparatus, and a related product. A combined processing apparatus therein comprises: a machine learning computation apparatus, a universal interconnection interface, and other processing apparatuses; the machine learning computation apparatus interacts with the other processing apparatuses to collectively implement calculation operations specified by a user, and the combined processing apparatus also comprises: a storage apparatus, the storage apparatus respectively being connected to the machine learning computation apparatus and the other processing apparatuses, and being used for storing data of the machine learning computation apparatus and the other processing apparatuses. The present computation method and apparatus and related product can be used across platforms, and have good usability, rapid command conversion speed, high processing efficiency, low probability of error, and low labour and material costs for development.

Description

Calculation method, device and related products
Technical field
The present disclosure relates to the field of information processing technology, and in particular to a neural network instruction generation method, device, and related products.
Background
With the continuous development of technology, neural network algorithms are used more and more widely, and have been applied well in fields such as image recognition, speech recognition, and natural language processing. However, as the complexity of neural network algorithms increases, the scale of their models keeps growing. Running a large-scale neural network model on a Graphics Processing Unit (GPU) or a Central Processing Unit (CPU) takes a large amount of computation time and consumes a large amount of power. In the related art, approaches to accelerating the processing of neural network models suffer from problems such as the inability to work across platforms, low processing efficiency, high development cost, and proneness to errors.
Summary of the invention
Based on this, in view of the above technical problems, it is necessary to provide a neural network instruction generation method, device, and related products that can be used across platforms, improve processing efficiency, and reduce the probability of errors and the cost of development.
According to a first aspect of the present disclosure, there is provided a neural network instruction generation device, the device including:
a device determination module, configured to determine, according to a received macro instruction, a running device that executes the macro instruction;
an instruction generation module, configured to generate a running instruction according to the macro instruction and the running device.
According to a second aspect of the present disclosure, there is provided a machine learning operation device, the device including:
one or more neural network instruction generation devices described in the first aspect above, used to obtain data to be operated on and control information from other processing devices, perform a designated machine learning operation, and transmit the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes a plurality of the neural network instruction generation devices, the plurality of neural network instruction generation devices can be connected to each other and transmit data through a specific structure;
where a plurality of the neural network instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; a plurality of the neural network instruction generation devices share the same control system or have their own control systems; a plurality of the neural network instruction generation devices share memory or have their own memories; and the interconnection mode of the plurality of neural network instruction generation devices is an arbitrary interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing device, the device including:
the machine learning operation device described in the second aspect above, a general interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete a calculation operation specified by a user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip, the machine learning chip including the machine learning operation device described in the second aspect above or the combined processing device described in the third aspect above.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip packaging structure, the structure including the machine learning chip described in the fourth aspect above.
According to a sixth aspect of the present disclosure, there is provided a board card, the board card including the machine learning chip packaging structure described in the fifth aspect above.
According to a seventh aspect of the present disclosure, there is provided an electronic device, the electronic device including the machine learning chip described in the fourth aspect above or the board card described in the sixth aspect above.
According to an eighth aspect of the present disclosure, there is provided a neural network instruction generation method, the method including:
determining, according to a received macro instruction, a running device that executes the macro instruction;
generating a running instruction according to the macro instruction and the running device.
In some embodiments, the electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a still camera, a video camera, a projector, a watch, headphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; and the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument, and/or an electrocardiograph.
本公开实施例所提供的神经网络指令生成方法、装置及相关产品,该装置包括设备确定模块和指令生成模块,设备确定模块用于根据接收到的宏指令,确定执行宏指令的运行设备。指令生成模块用于根据宏指令和运行设备,生成运行指令。该方法、装置及相关产品可跨平台使用,适用性好,指令转换的速度快、处理效率高、出错几率低,且开发的人力、物力成本低。The neural network instruction generation method, device and related products provided by the embodiments of the present disclosure include a device determination module and an instruction generation module. The device determination module is used to determine a running device that executes the macro instruction based on the received macro instruction. The instruction generation module is used to generate the operation instruction according to the macro instruction and the operation equipment. The method, device and related products can be used across platforms, with good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material costs for development.
根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
附图说明 BRIEF DESCRIPTION OF THE DRAWINGS
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本公开的示例性实施例、特征和方面,并且用于解释本公开的原理。The drawings included in the specification and forming a part of the specification together with the specification show exemplary embodiments, features, and aspects of the present disclosure, and are used to explain the principles of the present disclosure.
图1示出根据本公开实施例的指令生成、处理方法的处理器的示意图。FIG. 1 shows a schematic diagram of a processor of an instruction generation and processing method according to an embodiment of the present disclosure.
图2示出根据本公开一实施例的神经网络指令生成装置的框图。2 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
图3示出根据本公开一实施例的神经网络指令生成装置的框图。FIG. 3 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure.
图4示出根据本公开一实施例的神经网络指令生成方法的流程图。FIG. 4 shows a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure.
图5示出根据本公开一实施例的神经网络指令处理系统的框图。5 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure.
图6示出根据本公开一实施例的神经网络指令处理系统的框图。6 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure.
图7示出根据本公开一实施例的神经网络指令处理方法的流程图。7 shows a flowchart of a neural network instruction processing method according to an embodiment of the present disclosure.
图8a、图8b示出根据本公开一实施例的神经网络指令生成装置、处理系统的应用场景的示意图。8a and 8b are schematic diagrams illustrating application scenarios of a neural network instruction generation device and a processing system according to an embodiment of the present disclosure.
图9a、图9b示出根据本公开一实施例的组合处理装置的框图。9a and 9b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
图10示出根据本公开一实施例的板卡的结构示意图。FIG. 10 shows a schematic structural diagram of a board according to an embodiment of the present disclosure.
具体实施方式 DETAILED DESCRIPTION
下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。The technical solutions in the embodiments of the present disclosure will be described clearly and completely in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are part of the embodiments of the present disclosure, but not all of the embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.
应当理解，本公开的权利要求、说明书及附图中的术语“第一”、“第二”、“第零”等是用于区别不同对象，而不是用于描述特定顺序。本公开的说明书和权利要求书中使用的术语“包括”和“包含”指示所描述特征、整体、步骤、操作、元素和/或组件的存在，但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。It should be understood that the terms "first", "second", "zeroth", etc. in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, not to describe a specific order. The terms "comprising" and "including" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
还应当理解,在此本公开说明书中所使用的术语仅仅是出于描述特定实施例的目的,而并不意在限定本公开。如在本公开说明书和权利要求书中所使用的那样,除非上下文清楚地指明其它情况,否则单数形式的“一”、“一个”及“该”意在包括复数形式。还应当进一步理解,在本公开说明书和权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合,并且包括这些组合。It should also be understood that the terminology used in the present specification of the disclosure is for the purpose of describing particular embodiments only, and is not intended to limit the disclosure. As used in this disclosure specification and claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly dictates otherwise. It should also be further understood that the term "and / or" used in the present specification and claims refers to any and all possible combinations of one or more of the associated listed items and includes these combinations.
如在本说明书和权利要求书中所使用的那样，术语“如果”可以依据上下文被解释为“当...时”或“一旦”或“响应于确定”或“响应于检测到”。类似地，短语“如果确定”或“如果检测到[所描述条件或事件]”可以依据上下文被解释为意指“一旦确定”或“响应于确定”或“一旦检测到[所描述条件或事件]”或“响应于检测到[所描述条件或事件]”。As used in this specification and claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
根据本公开实施例的指令生成、指令处理方法可应用于处理器中，该处理器可以是通用处理器，例如CPU(Central Processing Unit，中央处理器)，也可以是用于执行人工智能运算的人工智能处理器(IPU)。人工智能运算可包括机器学习运算，类脑运算等。其中，机器学习运算包括神经网络运算、k-means运算、支持向量机运算等。该人工智能处理器可例如包括GPU(Graphics Processing Unit，图形处理单元)、NPU(Neural-Network Processing Unit，神经网络处理单元)、DSP(Digital Signal Process，数字信号处理单元)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)芯片中的一种或组合。本公开对处理器的具体类型不作限制。The instruction generation and instruction processing methods according to the embodiments of the present disclosure may be applied to a processor. The processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, and so on; machine learning operations include neural network operations, k-means operations, support vector machine operations, etc. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and an FPGA (Field-Programmable Gate Array) chip. The present disclosure does not limit the specific type of the processor.
在一种可能的实现方式中，本公开中所提及的处理器可包括多个处理单元，每个处理单元可以独立运行所分配到的各种任务，如：卷积运算任务、池化任务或全连接任务等。本公开对处理单元及处理单元所运行的任务不作限制。In a possible implementation manner, the processor mentioned in the present disclosure may include multiple processing units, and each processing unit may independently run the various tasks assigned to it, such as a convolution operation task, a pooling task, or a fully connected task. The present disclosure does not limit the processing units or the tasks run by the processing units.
图1示出根据本公开实施例的指令生成、处理方法的处理器的示意图。如图1所示,处理器100包括多个处理单元101以及存储单元102,多个处理单元101用于执行指令序列,存储单元102用于存储数据,可包括随机存储器(RAM,Random Access Memory)和寄存器堆。处理器100中的多个处理单元101既可共用部分存储空间,例如共用部分RAM存储空间和寄存器堆,又可同时拥有各自的存储空间。FIG. 1 shows a schematic diagram of a processor of an instruction generation and processing method according to an embodiment of the present disclosure. As shown in FIG. 1, the processor 100 includes a plurality of processing units 101 and a storage unit 102. The plurality of processing units 101 are used to execute an instruction sequence, and the storage unit 102 is used to store data, which may include a random access memory (RAM, Random Access Memory) And register file. The multiple processing units 101 in the processor 100 can share a part of the storage space, for example, share a part of the RAM storage space and the register file, and can also have their own storage spaces at the same time.
图2示出根据本公开一实施例的神经网络指令生成装置的框图。如图2所示,该装置包括设备确定模块11和指令生成模块12。设备确定模块11用于根据接收到的宏指令,确定执行宏指令的运行设备。指令生成模块12用于根据宏指令和运行设备,生成运行指令。2 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure. As shown in FIG. 2, the device includes a device determination module 11 and an instruction generation module 12. The device determination module 11 is used to determine a running device that executes the macro instruction according to the received macro instruction. The instruction generation module 12 is used to generate an operation instruction according to the macro instruction and the operation device.
在该实现方式中,宏指令是一种批量处理的称谓,宏指令可以是一种规则或模式,或称语法替换,在遇到宏指令时会自动进行这一规则或模式的替换。宏指令可以是对常用的用于对数据进行计算、控制和搬运等处理的待执行指令整合形成的。In this implementation mode, a macro instruction is a term for batch processing. The macro instruction can be a rule or pattern, or grammatical replacement. When a macro instruction is encountered, the rule or pattern is automatically replaced. The macro can be formed by integrating the to-be-executed instructions that are commonly used for calculation, control, and transportation of data.
在一种可能的实现方式中,宏指令可以包括以下至少一种:计算宏指令、控制宏指令和数据搬运宏指令。其中,计算宏指令可以包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种。控制宏指令可以包括无条件跳转宏指令和有条件跳转宏指令中的至少一种。数据搬运宏指令可以包括读宏指令和写宏指令中的至少一种。读宏指令可以包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。写宏指令可以包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种。In a possible implementation manner, the macro instruction may include at least one of the following: a calculation macro instruction, a control macro instruction, and a data handling macro instruction. The calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data handling macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction. The write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
在一种可能的实现方式中，宏指令可以包含以下选项中的至少一项：用于执行宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数。运行指令可以包含以下选项中的至少一项：操作类型、输入地址、输出地址、操作数和指令参数。In a possible implementation manner, the macro instruction may include at least one of the following options: an identifier of the specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, operands, and instruction parameters. The run instruction may include at least one of the following options: an operation type, an input address, an output address, operands, and instruction parameters.
其中，指定设备的标识可以是指定设备的物理地址、IP地址、名称、编号等标识。标识可以包括数字、字母、符号中的其中一种或任意组合。在宏指令的指定设备的标识的位置为空时，确定该宏指令无指定设备；或者，在宏指令中不包含“指定设备的标识”这个字段时，确定该宏指令无指定设备。操作类型可以是指该宏指令对数据所进行操作的类型，表征该宏指令的具体类型，如在某宏指令的操作类型为“XXX”时，可以根据“XXX”确定该宏指令对数据所进行的操作的具体类型。根据操作类型可以确定执行该宏指令所需的指令集合，如在某宏指令的操作类型为“XXX”时，其所需的指令集合为进行“XXX”所对应的处理所需的所有指令集。输入地址可以是数据的输入地址、读取地址等获得数据的地址，输出地址可以是被处理后的数据的输出地址、写入地址等存储数据的地址。输入量可以是数据的输入规模、输入长度等表征其数据量大小的信息。输出量可以是数据的输出规模、输出长度等表征其数据量的大小的信息。操作数可以包括寄存器的长度、寄存器的地址、寄存器的标识、立即数等。立即数为在立即寻址方式指令中给出的数。指令参数可以是指对应于该宏指令、与其执行相关的参数。例如，指令参数可以是第二个操作数的地址和长度等。指令参数可以是卷积核的大小、卷积核的步长和卷积核的填充等。The identifier of the specified device may be the physical address, IP address, name, number, or another identifier of the specified device, and may include one or any combination of digits, letters, and symbols. When the identifier field of the specified device in a macro instruction is empty, or when the macro instruction does not contain an "identifier of the specified device" field at all, the macro instruction is determined to have no specified device. The operation type refers to the type of operation the macro instruction performs on data and characterizes the specific type of the macro instruction; for example, when the operation type of a macro instruction is "XXX", the specific operation performed on the data can be determined from "XXX". The instruction set required to execute the macro instruction can also be determined from the operation type: when the operation type is "XXX", the required instruction set is all instructions needed for the processing corresponding to "XXX". The input address may be an address from which data is obtained, such as a data input address or a read address; the output address may be an address at which data is stored, such as an output address or a write address of the processed data. The input amount may be information characterizing the size of the data, such as its input scale or input length; the output amount may likewise be information characterizing the size of the data, such as its output scale or output length. Operands may include the length of a register, the address of a register, the identifier of a register, an immediate value, and so on; an immediate value is a number given directly in an immediate-addressing instruction. Instruction parameters are parameters corresponding to the macro instruction and related to its execution; for example, an instruction parameter may be the address and length of a second operand, or the size, stride, and padding of a convolution kernel.
在该实现方式中，对于一个宏指令，其必须包括操作码和至少一个操作域，其中操作码即为操作类型，操作域包括指定设备的标识、输入地址、输出地址、输入量、输出量、操作数和指令参数。操作码可以是计算机程序中所规定的要执行操作的那一部分指令或字段（通常用代码表示），是指令序列号，用来告知执行指令的装置具体需要执行哪一条指令。操作域可以是执行对应的指令所需的所有数据的来源，执行对应的指令所需的所有数据包括参数数据、待运算或待处理的数据、对应的运算方法，或者存储参数数据、待运算或待处理的数据、对应的运算方法的地址等等。In this implementation, a macro instruction must include an opcode and at least one operation field, where the opcode is the operation type and the operation fields include the identifier of the specified device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters. The opcode is the part of an instruction or field (usually represented by a code) that specifies the operation to be performed; it is an instruction sequence number that tells the device executing the instruction which instruction to execute. An operation field may be the source of all data required to execute the corresponding instruction, including parameter data, the data to be operated on or processed, and the corresponding operation method, or the addresses at which the parameter data, the data to be operated on or processed, and the corresponding operation method are stored.
应当理解的是,本领域技术人员可以根据需要对宏指令的指令格式以及所包含的内容进行设置,本公开对此不作限制。It should be understood that, those skilled in the art can set the instruction format and the content of the macro instruction as needed, and this disclosure does not limit this.
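The macro-instruction fields described above can be sketched as a simple data structure. The following Python sketch is purely illustrative and not part of the disclosure; the field names (op_type, device_id, etc.) and types are assumptions. Per the text, an empty device-identifier field means the macro instruction has no specified device.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative model of a macro instruction: an opcode (operation type)
# plus operation fields. All names and types are hypothetical.
@dataclass
class MacroInstruction:
    op_type: str                       # operation type (the opcode), e.g. "CONV"
    device_id: Optional[int] = None    # identifier of the specified device; None = none specified
    input_addr: Optional[int] = None   # address from which input data is obtained
    output_addr: Optional[int] = None  # address at which processed data is stored
    input_size: Optional[int] = None   # input amount (scale/length of the data)
    output_size: Optional[int] = None  # output amount
    operands: List[int] = field(default_factory=list)  # register ids, immediates, ...
    params: List[int] = field(default_factory=list)    # e.g. kernel size, stride, padding

    def has_specified_device(self) -> bool:
        # An empty device-identifier field means no specified device.
        return self.device_id is not None

# Example: a convolution macro targeting device 9, with kernel size 3, stride 1, padding 1.
conv = MacroInstruction(op_type="CONV", device_id=9, input_addr=0x1000,
                        output_addr=0x2000, input_size=1024, params=[3, 1, 1])
```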
在本实施例中,设备确定模块11可以根据宏指令确定一个或多个运行设备。指令生成模块12可以生成一个或多个运行指令。在生成的运行指令为多个时,多个运行指令可以在同一个运行设备中被执行,也可以在不同的运行设备中被执行,本公开对此不作限制。In this embodiment, the device determination module 11 may determine one or more running devices according to the macro instruction. The instruction generation module 12 may generate one or more execution instructions. When there are multiple generated running instructions, the multiple running instructions may be executed in the same running device, or may be executed in different running devices, which is not limited in the present disclosure.
本公开实施例所提供的神经网络指令生成装置包括设备确定模块和指令生成模块,设备确定模块用于根据接收到的宏指令,确定执行宏指令的运行设备。指令生成模块用于根据宏指令和运行设备,生成运行指令。该装置可跨平台使用,适用性好,指令转换的速度快、处理效率高、出错几率低,且开发的人力、物力成本低。The neural network instruction generation device provided by the embodiment of the present disclosure includes a device determination module and an instruction generation module. The device determination module is used to determine a running device that executes the macro instruction according to the received macro instruction. The instruction generation module is used to generate the operation instruction according to the macro instruction and the operation equipment. The device can be used across platforms, has good applicability, fast command conversion speed, high processing efficiency, low error probability, and low labor and material cost for development.
图3示出根据本公开一实施例的神经网络指令生成装置的框图。在一种可能的实现方式中,如图3所示,该装置还可以包括宏指令生成模块13。宏指令生成模块13用于接收待执行指令,根据确定的指定设备的标识和待执行指令生成宏指令。FIG. 3 shows a block diagram of a neural network instruction generation device according to an embodiment of the present disclosure. In a possible implementation manner, as shown in FIG. 3, the device may further include a macro instruction generation module 13. The macro generation module 13 is used to receive the instruction to be executed, and generate a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
在该实现方式中,指定设备可以是根据待执行指令的操作类型、输入量、输出量等确定的。所接收到的待执行指令可以是一条,也可以是多条。In this implementation, the designated device may be determined according to the operation type, input amount, output amount, etc. of the instruction to be executed. The received instruction to be executed may be one or multiple instructions.
待执行指令可以包括以下至少一种:待执行计算指令、待执行控制指令和待执行数据搬运指令。其中,待执行计算指令可以包括待执行神经网络计算指令、待执行向量逻辑计算指令、待执行矩阵向量计算指令、待执行标量计算指令和待执行标量逻辑计算指令中的至少一种。待执行控制指令可以包括待执行无条件跳转指令和待执行有条件跳转指令中的至少一种。待执行数据搬运指令可以包括待执行读指令和待执行写指令中的至少一种。待执行读指令可以包括待执行读神经元指令、待执行读突触指令和待执行读标量指令中的至少一种。待执行写指令可以包括待执行写神经元指令、待执行写突触指令和待执行写标量指令中的至少一种。The instruction to be executed may include at least one of the following: a calculation instruction to be executed, a control instruction to be executed, and a data handling instruction to be executed. The calculation instruction to be executed may include at least one of a neural network calculation instruction to be executed, a vector logic calculation instruction to be executed, a matrix vector calculation instruction to be executed, a scalar calculation instruction to be executed and a scalar logic calculation instruction to be executed. The control instruction to be executed may include at least one of an unconditional jump instruction to be executed and a conditional jump instruction to be executed. The data carrying instruction to be executed may include at least one of a reading instruction to be executed and a writing instruction to be executed. The read instruction to be executed may include at least one of a read neuron instruction to be executed, a read synapse instruction to be executed and a read scalar instruction to be executed. The write instruction to be executed may include at least one of a write neuron instruction to be executed, a write synapse instruction to be executed and a write scalar instruction to be executed.
待执行指令可以包含以下选项中的至少一项:操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数。The instruction to be executed may include at least one of the following options: operation type, input address, output address, input amount, output amount, operand, and instruction parameter.
在该实现方式中，在待执行指令为一个时，可以将确定的指定设备的标识添加到待执行指令中，生成宏指令。举例来说，某待执行指令m为“XXX……param”。其中，XXX为操作类型，param为指令参数。可以根据该待执行指令m的操作类型“XXX”确定其指定设备m-1。然后，在待执行指令m中添加指定设备m-1的标识（例如，09），生成对应该待执行指令m的宏指令M“XXX 09,……param”。在待执行指令为多个时，可以将确定的每个待执行指令所对应的指定设备的标识添加到待执行指令中，根据带有指定设备的标识的多个待执行指令，生成一个宏指令，或者生成对应的多个宏指令。In this implementation manner, when there is a single instruction to be executed, the determined identifier of the specified device may be added to that instruction to generate a macro instruction. For example, a certain instruction m to be executed is "XXX ... param", where XXX is the operation type and param is an instruction parameter. The specified device m-1 can be determined according to the operation type "XXX" of the instruction m. Then, the identifier of the specified device m-1 (for example, 09) is added to the instruction m to generate the macro instruction M "XXX 09, ... param" corresponding to the instruction m. When there are multiple instructions to be executed, the identifier of the specified device determined for each instruction can be added to that instruction, and one macro instruction, or multiple corresponding macro instructions, can be generated from the multiple instructions carrying device identifiers.
应当理解的是,本领域技术人员可以根据需要对待执行指令的指令格式以及所包含的内容进行设置,本公开对此不作限制。It should be understood that those skilled in the art can set the instruction format and the content of the instructions to be executed as needed, and this disclosure does not limit this.
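The example above — adding the identifier "09" of the specified device to a pending instruction "XXX ... param" to obtain the macro "XXX 09, ... param" — can be sketched as a small string transformation. The textual instruction format below is an assumption for illustration only; the disclosure does not fix a concrete encoding.

```python
def to_macro(pending_instruction: str, device_id: str) -> str:
    """Insert the determined device identifier right after the operation type.

    Mirrors the example in the text: pending instruction "XXX ... param"
    plus device id "09" yields the macro "XXX 09, ... param".
    """
    op_type, _, rest = pending_instruction.partition(" ")
    return f"{op_type} {device_id}, {rest}" if rest else f"{op_type} {device_id}"

macro_m = to_macro("XXX ...param", "09")  # -> "XXX 09, ...param"
```

When several pending instructions are received, the same transformation would simply be applied per instruction, yielding one macro per pending instruction (or a single combined macro, under the scheme described above).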
在一种可能的实现方式中,如图3所示,设备确定模块11可以包括第一确定子模块111。第一确定子模块111用于在确定宏指令中包含指定设备的标识,且指定设备的资源满足执行宏指令的执行条件时,将指定设备确定为运行设备。其中,执行条件可以包括:指定设备中包含与宏指令相对应的指令集。In a possible implementation manner, as shown in FIG. 3, the device determination module 11 may include a first determination submodule 111. The first determining submodule 111 is used to determine that the designated device is an operating device when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution conditions for executing the macro instruction. The execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
在该实现方式中,宏指令中可以包含执行宏指令的一个或多个指定设备的标识。在宏指令中包含指定设备的标识,且指定设备的资源满足执行条件时,第一确定子模块111可以直接将指定设备确定为运行设备,节省基于宏指令生成运行指令的生成时间,且可以保证所生成的运行指令能够被对应的运行设备所执行。In this implementation, the macro instruction may include the identifier of one or more designated devices that execute the macro instruction. When the identifier of the specified device is included in the macro instruction, and the resources of the specified device meet the execution conditions, the first determination submodule 111 can directly determine the specified device as the running device, saving the generation time of generating the running instruction based on the macro instruction, and ensuring that The generated operation instruction can be executed by the corresponding operation device.
在一种可能的实现方式中,如图3所示,该装置还可以包括资源获取模块14。设备确定模块11还可以包括第二确定子模块112。资源获取模块14用于获取备选设备的资源信息。第二确定子模块112用于在确定宏指令中不包含指定设备的标识时,根据接收到的宏指令和备选设备的资源信息,从备选设备中确定出用于执行宏指令的运行设备。其中,资源信息可以包括备选设备所包含的指令集。备选设备所包含的指令集可以是对应于一种或多种宏指令的操作类型的指令集合。备选设备所包含的指令集越多,备选设备能够执行的宏指令的类型越多。In a possible implementation manner, as shown in FIG. 3, the device may further include a resource acquisition module 14. The device determination module 11 may further include a second determination sub-module 112. The resource acquisition module 14 is used to acquire resource information of candidate devices. The second determining submodule 112 is used to determine the running device for executing the macro instruction from the candidate device according to the received macro instruction and the resource information of the candidate device when it is determined that the macro instruction does not contain the identifier of the specified device . The resource information may include the instruction set included in the candidate device. The instruction set included in the alternative device may be an instruction set corresponding to the operation type of one or more macroinstructions. The more instruction sets the candidate device contains, the more types of macro instructions the candidate device can execute.
在该实现方式中，第二确定子模块112在确定宏指令中不包含指定设备的标识时，可以从备选设备中确定出能够执行宏指令的一个或多个运行设备。其中，所确定的运行设备的指令集中包括与宏指令相对应的指令集合。例如，接收到的宏指令为神经网络计算宏指令，可将包含对应于神经网络计算宏指令的指令集的备选设备确定为运行设备，以保证其可以运行生成的运行指令。In this implementation manner, when determining that the macro instruction does not include the identifier of a specified device, the second determination submodule 112 may determine, from the candidate devices, one or more running devices capable of executing the macro instruction, where the instruction set of each determined running device includes the instruction set corresponding to the macro instruction. For example, if the received macro instruction is a neural network calculation macro instruction, a candidate device containing the instruction set corresponding to neural network calculation macro instructions may be determined as a running device, ensuring that it can execute the generated run instructions.
在一种可能的实现方式中，如图3所示，设备确定模块11还可以包括第三确定子模块113。第三确定子模块113在确定宏指令中包含指定设备的标识，且指定设备的资源不满足执行宏指令的执行条件时，根据宏指令和备选设备的资源信息，确定运行设备。In a possible implementation manner, as shown in FIG. 3, the device determination module 11 may further include a third determination submodule 113. When it is determined that the macro instruction contains the identifier of a specified device but the resources of the specified device do not satisfy the execution condition for executing the macro instruction, the third determination submodule 113 determines the running device according to the macro instruction and the resource information of the candidate devices.
在该实现方式中,第三确定子模块113在确定宏指令中包含指定设备的标识,且指定设备的资源不满足执行条件时,可以认定该宏指令的指定设备不具备执行宏指令的能力。第三确定子模块113可以从备选设备中确定运行设备,可以将包含与宏指令相对应的指令集的备选设备确定为运行设备。In this implementation manner, when the third determining submodule 113 determines that the macro instruction includes the identifier of the designated device, and the resources of the designated device do not satisfy the execution conditions, it may be determined that the designated device of the macro instruction does not have the ability to execute the macro instruction. The third determining sub-module 113 may determine the running device from the candidate devices, and may determine the candidate device containing the instruction set corresponding to the macro instruction as the running device.
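The selection logic of the first, second, and third determination submodules described above can be summarized in one sketch: if a specified device exists and its instruction set covers the macro's operation type, it becomes the running device; otherwise any candidate whose instruction set covers the macro is eligible. The data shapes below (dicts with "op_type", "device_id", "id", "isa") are assumptions for illustration, not the disclosure's format.

```python
def determine_running_devices(macro: dict, candidates: list) -> list:
    """Return the ids of running devices selected for a macro instruction.

    The execution condition is modeled as: the device's instruction set
    ("isa", a set of operation types) contains the macro's operation type.
    """
    def satisfies_condition(dev: dict) -> bool:
        return macro["op_type"] in dev["isa"]

    specified = macro.get("device_id")
    if specified is not None:
        # First case: the specified device exists and meets the execution condition.
        for dev in candidates:
            if dev["id"] == specified and satisfies_condition(dev):
                return [dev["id"]]
        # Third case: specified device cannot execute the macro; fall through.
    # Second/third case: choose candidates whose instruction set covers the macro.
    return [dev["id"] for dev in candidates if satisfies_condition(dev)]

devices = [{"id": 9, "isa": {"CONV", "POOL"}},
           {"id": 10, "isa": {"CONV"}}]
```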
在一种可能的实现方式中，如图3所示，宏指令可以包含输入量和输出量中的至少一项，指令生成模块12还用于确定宏指令的数据量，根据宏指令的数据量、宏指令和运行设备的资源信息，生成运行指令。其中，宏指令的数据量可以是根据输入量和输出量中的至少一项确定的，运行设备的资源信息还可以包括存储容量、剩余存储容量的至少一项。In a possible implementation manner, as shown in FIG. 3, the macro instruction may include at least one of an input amount and an output amount, and the instruction generation module 12 is further configured to determine the data amount of the macro instruction and to generate run instructions according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running device. The data amount of the macro instruction may be determined according to at least one of the input amount and the output amount, and the resource information of the running device may further include at least one of a storage capacity and a remaining storage capacity.
其中,运行设备的存储容量可以是指运行设备的存储器可以容纳的二进制信息量。运行设备的剩余存储容量可以是指去除被占用的存储容量之后,运行设备当前所能用于指令运行的存储容量。运行设备的资源信息能够表征该运行设备的运行能力。存储容量越大、剩余存储容量越大,运行设备的运行能力越强。The storage capacity of the running device may refer to the amount of binary information that the memory of the running device can hold. The remaining storage capacity of the running device may refer to the storage capacity that the running device can currently use for command operation after removing the occupied storage capacity. The resource information of the running device can characterize the running capability of the running device. The larger the storage capacity and the larger the remaining storage capacity, the stronger the operating capability of the operating equipment.
在该实现方式中，指令生成模块12可以根据每个运行设备的资源信息、宏指令的数据量等，确定拆分宏指令的具体方式，以对宏指令进行拆分，生成与运行设备相对应的运行指令。In this implementation manner, the instruction generation module 12 may determine a specific way of splitting the macro instruction according to the resource information of each running device, the data amount of the macro instruction, and the like, so as to split the macro instruction and generate run instructions corresponding to the running devices.
在一种可能的实现方式中，如图3所示，指令生成模块12可以包括第一指令生成子模块121。第一指令生成子模块121用于在确定运行设备为一个，且在运行设备的资源不满足执行宏指令的容量条件时，根据运行设备的运行数据量和数据量将宏指令拆分成多条运行指令，以使运行设备依次执行多条运行指令。其中，运行设备的运行数据量可以是根据运行设备的资源信息确定的，每条运行指令可以包含运行输入量和运行输出量中的至少一项，运行输入量和运行输出量可以是根据运行数据量确定的。In a possible implementation manner, as shown in FIG. 3, the instruction generation module 12 may include a first instruction generation sub-module 121. The first instruction generation sub-module 121 is configured to, when it is determined that there is a single running device and the resources of the running device do not satisfy the capacity condition for executing the macro instruction, split the macro instruction into multiple run instructions according to the run data amount of the running device and the data amount of the macro instruction, so that the running device executes the multiple run instructions in sequence. The run data amount of the running device may be determined according to the resource information of the running device; each run instruction may contain at least one of a run input amount and a run output amount, which may be determined according to the run data amount.
在该实现方式中,运行设备的运行数据量可以是根据运行设备的存储容量或剩余存储容量确定的。容量条件可以是运行设备的运行数据量大于或等于宏指令的数据量,换言之,运行设备的资源不满足执行宏指令的容量条件可以是指:运行设备的运行数据量小于宏指令的数据量。运行输入量和运行输出量需小于或等于运行数据量,以保证所生成的运行指令可以被运行设备执行。多个运行指令中不同运行指令的运行输入量(或运行输出量)可以相同,也可以不同,本公开对此不作限制。In this implementation, the amount of operation data of the operation device may be determined according to the storage capacity or remaining storage capacity of the operation device. The capacity condition may be that the amount of running data of the running device is greater than or equal to the amount of data of the macro instruction. In other words, if the resource of the running device does not meet the capacity condition of executing the macro instruction, it may mean that the amount of running data of the running device is less than the amount of data of the macro instruction. The operation input and operation output must be less than or equal to the operation data to ensure that the generated operation instructions can be executed by the operation equipment. The operation input quantity (or operation output quantity) of different operation instructions among the plurality of operation instructions may be the same or different, and the disclosure does not limit this.
在该实现方式中，第一指令生成子模块121在确定运行设备为一个，且在运行设备的资源满足执行宏指令的容量条件时，可以直接将宏指令转化为一个运行指令，还可以将宏指令拆分为多个运行指令，本公开对此不作限制。In this implementation manner, when the first instruction generation sub-module 121 determines that there is a single running device and that the resources of the running device satisfy the capacity condition for executing the macro instruction, it may directly convert the macro instruction into a single run instruction, or it may still split the macro instruction into multiple run instructions; the present disclosure does not limit this.
在一种可能的实现方式中,如图3所示,指令生成模块12可以包括第二指令生成子模块122。第二指令生成子模块122用于在确定运行设备为多个时,根据每个运行设备的运行数据量和数据量对宏指令进行拆分,生成对应于每个运行设备的运行指令。其中,每个运行设备的运行数据量可以是根据每个运行设备的资源信息确定的,运行指令可以包含运行输入量和运行输出量中的至少一项,运行输入量和运行输出量是根据执行运行指令的运行设备的运行数据量确定的。In a possible implementation manner, as shown in FIG. 3, the instruction generation module 12 may include a second instruction generation sub-module 122. The second instruction generation sub-module 122 is configured to split the macro instruction according to the operation data amount and data amount of each operation device when it is determined that there are multiple operation devices, and generate an operation instruction corresponding to each operation device. The operation data volume of each operation device can be determined according to the resource information of each operation device, and the operation instruction can include at least one of the operation input amount and the operation output amount. The operation input amount and the operation output amount are based on the execution The operation data of the operation equipment of the operation instruction is determined.
在该实现方式中,运行输入量和运行输出量需小于或等于运行数据量,以保证所生成的运行指令可以运行设备执行。第二指令生成子模块122可以根据每个运行设备的运行数据量,为每个运行设备生成一个或多个运行指令,以供对应的运行设备执行。In this implementation, the operation input amount and the operation output amount need to be less than or equal to the operation data amount to ensure that the generated operation instruction can be executed by the operation device. The second instruction generation sub-module 122 may generate one or more operation instructions for each operation device according to the operation data amount of each operation device for the corresponding operation device to execute.
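The splitting described above — generating run instructions whose run input/output amounts do not exceed the device's run data amount — can be sketched as follows. The greedy chunking strategy and the chunk layout are assumptions for illustration; the disclosure does not fix a particular splitting scheme.

```python
def split_for_device(data_amount: int, run_capacity: int) -> list:
    """Split a macro instruction's data amount into per-run-instruction amounts.

    Each generated run instruction carries a run amount no larger than the
    device's run data amount (run_capacity), so the device can execute it.
    """
    chunks = []
    offset = 0
    while offset < data_amount:
        size = min(run_capacity, data_amount - offset)
        chunks.append({"offset": offset, "run_amount": size})
        offset += size
    return chunks

# A macro with a data amount of 1000 on a device whose run data amount is 256
# would be split into four run instructions executed in sequence.
run_insts = split_for_device(1000, 256)
```

For multiple running devices, the same idea applies per device: each device's share of the macro is chunked against that device's own run data amount.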
In the above implementation, the running instruction includes at least one of the running input amount and the running output amount. Besides limiting the data amount of the running instruction so that it can be executed by the corresponding running device, this also satisfies any special requirements that different running instructions may place on the running input amount and/or running output amount.
In a possible implementation, for running instructions that place no special requirements on the running input amount and/or running output amount, these amounts may be omitted. A default running input amount and a default running output amount may be preset, so that when the running device determines that a received running instruction contains no running input amount or running output amount, it uses the defaults as that instruction's running input amount and running output amount. Presetting default amounts simplifies the generation of running instructions and saves generation time.
In a possible implementation, default input amounts and default output amounts may be preset for different types of macro instructions. When a macro instruction contains no input amount or output amount, the corresponding preset defaults may be used as the macro instruction's input amount and output amount. The data amount of the macro instruction is then determined according to the default input amount and/or default output amount, and running instructions are generated according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running device. When the macro instruction contains no input amount or output amount, the generated running instructions may likewise omit the running input amount and running output amount, or may include at least one of them. When a running instruction contains no running input amount and/or running output amount, the running device may execute it according to the preset default running input amount and/or default running output amount.
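The per-type default fallback can be sketched as follows. The type names and default values in the table are hypothetical placeholders, not values given by the disclosure.

```python
# Illustrative preset defaults per macro-instruction type (values are
# assumptions for the example, not part of the disclosure).
DEFAULT_INPUT_AMOUNT = {"NN_COMPUTE": 256, "VECTOR_LOGIC": 128}

def effective_input_amount(macro):
    """Return the macro's input amount, falling back to the preset
    default for its macro-instruction type when none is carried."""
    if macro.get("input_amount") is not None:
        return macro["input_amount"]
    return DEFAULT_INPUT_AMOUNT[macro["op_type"]]
```

A macro instruction that carries its own input amount keeps it; one that omits the field is resolved through the preset table.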
In a possible implementation, the instruction generation module 12 may also split the macro instruction into running instructions according to the macro instruction and a preset macro instruction splitting rule. The splitting rule may be determined from a conventional macro instruction splitting scheme (for example, splitting according to the processing flow of the macro instruction) combined with a running data amount threshold for instructions executable by all candidate devices. The macro instruction is split into running instructions whose running input amounts and running output amounts are all less than or equal to the threshold, so that each generated running instruction can be executed on its corresponding running device (the running device being any one of the candidate devices). The threshold may be obtained by comparing the storage capacities (or remaining storage capacities) of all candidate devices and taking the minimum as the running data amount threshold for instructions executable by all candidate devices.
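The threshold rule above amounts to a minimum over candidate capacities followed by chunking. The sketch below is illustrative; the function names and the simple equal-chunking strategy are assumptions.

```python
def run_data_threshold(candidate_capacities):
    # The threshold is the smallest (remaining) storage capacity among
    # all candidate devices, so any generated chunk fits on any device.
    return min(candidate_capacities)

def split_by_rule(total_amount, candidate_capacities):
    """Split a macro's data amount into chunks no larger than the
    threshold derived from all candidate devices."""
    threshold = run_data_threshold(candidate_capacities)
    chunks, remaining = [], total_amount
    while remaining > 0:
        chunk = min(threshold, remaining)
        chunks.append(chunk)
        remaining -= chunk
    return chunks
```

With candidate capacities of 4, 6, and 5, the threshold is 4, and a macro with data amount 10 splits into chunks of 4, 4, and 2, each executable on any candidate device.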
It should be understood that a person skilled in the art may set the manner of generating running instructions according to actual needs; the present disclosure places no limitation on this.
In this embodiment, the running instruction generated by the instruction generation module from the macro instruction may be an instruction awaiting execution, or one or more parsed instructions obtained by parsing an instruction awaiting execution; the present disclosure places no limitation on this.
In a possible implementation, as shown in FIG. 3, the apparatus may further include a queue construction module 15. The queue construction module 15 is configured to sort the running instructions according to a queue sorting rule and to construct an instruction queue corresponding to the running device from the sorted running instructions.
In this implementation, an instruction queue uniquely corresponding to each running device may be constructed. The running instructions may be sent one by one, in queue order, to the running device uniquely corresponding to the queue; or the entire instruction queue may be sent to the running device, so that the device executes the running instructions in queue order. In this way, the running device is guaranteed to execute the running instructions according to the instruction queue, preventing running instructions from being executed incorrectly, with delay, or not at all.
In this implementation, the queue sorting rule may be determined from information such as the estimated execution duration of a running instruction, its generation time, and the running input amount, running output amount, and operation type of the running instruction itself; the present disclosure places no limitation on this.
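Since the disclosure leaves the queue sorting rule open, the sketch below shows only one possible rule (generation time first, estimated duration as tie-breaker); the field names and the rule itself are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class RunInstruction:
    op_type: str
    generated_at: float        # generation time of the running instruction
    estimated_duration: float  # estimated execution duration
    run_input_amount: int = 0
    run_output_amount: int = 0

def build_instruction_queue(instructions):
    # One possible queue sorting rule: order by generation time, then
    # by estimated execution duration. Other rules (by operation type,
    # by input/output amount, ...) are equally permitted.
    return sorted(instructions,
                  key=lambda i: (i.generated_at, i.estimated_duration))
```

The resulting list is the instruction queue for one running device, consumed front to back.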
In a possible implementation, as shown in FIG. 3, the apparatus may further include an instruction dispatch module 16. The instruction dispatch module 16 is configured to send the running instructions to the running device, so that the running device executes the running instructions.
In this implementation, when the running device is to execute a single running instruction, that instruction may be sent to it directly. When the running device is to execute multiple running instructions, all of them may be sent at once, so that the device executes them in sequence; or they may be sent one at a time, the next running instruction being sent only after the running device has finished executing the current one. A person skilled in the art may set the manner of sending running instructions to the running device; the present disclosure places no limitation on this.
In a possible implementation, as shown in FIG. 3, the instruction dispatch module 16 may include an instruction assembly sub-module 161, an assembly translation sub-module 162, and an instruction sending sub-module 163. The instruction assembly sub-module 161 is configured to generate an assembly file from the running instructions. The assembly translation sub-module 162 is configured to translate the assembly file into a binary file. The instruction sending sub-module 163 is configured to send the binary file to the running device, so that the running device executes the running instructions according to the binary file.
In this way, the data amount of the running instructions can be reduced, the time for sending them to the running device saved, and the conversion and execution of macro instructions sped up.
In this implementation, after the binary file has been sent to the running device, the running device may decode the received binary file to obtain the corresponding running instructions, execute them, and obtain an execution result.
In a possible implementation, the running device may be one of, or any combination of, a CPU, a GPU, and a neural-network processing unit (NPU). This increases the speed at which the apparatus generates running instructions from macro instructions.
In a possible implementation, the apparatus may be disposed in a CPU and/or an NPU, so that the process of generating running instructions from macro instructions is carried out by the CPU and/or the NPU, providing more possible ways of implementing the apparatus.
The present disclosure provides a running device configured to execute the running instructions generated by the above neural network instruction generation apparatus. The running device includes a control module and an execution module. The control module is configured to obtain data, a neural network model, and running instructions, and may further be configured to parse the running instructions into multiple parsed instructions and to send the parsed instructions and the data to the execution module. The execution module is configured to execute the multiple parsed instructions on the data to obtain an execution result.
In a possible implementation, the running device further includes a storage module. The storage module may include at least one of a register and a cache, and the cache may include a scratchpad cache. The cache may be used to store the data; the register may be used to store scalar data within the data.
In a possible implementation, the control module may include an instruction storage sub-module and an instruction processing sub-module. The instruction storage sub-module is configured to store the running instructions. The instruction processing sub-module is configured to parse a running instruction into multiple parsed instructions.
In a possible implementation, the control module may further include a storage queue sub-module. The storage queue sub-module is configured to store a running instruction queue containing the running instructions to be executed by the running device and the multiple parsed instructions, all arranged in order of execution.
In a possible implementation, the execution module may further include a dependency processing sub-module. The dependency processing sub-module is configured to, upon determining that a first parsed instruction has a dependency on a zeroth parsed instruction preceding it, cache the first parsed instruction in the instruction storage sub-module, and, after the zeroth parsed instruction has finished executing, fetch the first parsed instruction from the instruction storage sub-module and send it to the execution module.
Here, the first parsed instruction having a dependency on the preceding zeroth parsed instruction may include: the first storage address interval holding the data required by the first parsed instruction overlaps with the zeroth storage address interval holding the data required by the zeroth parsed instruction. Conversely, the absence of a dependency between the first parsed instruction and the zeroth parsed instruction may mean that the first storage address interval and the zeroth storage address interval have no overlapping region.
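The dependency condition above is a plain interval-overlap test. The sketch below assumes half-open `(start, end)` address intervals, which is one reasonable reading of "storage address interval"; the disclosure does not fix the boundary convention.

```python
def intervals_overlap(first_interval, zeroth_interval):
    """Dependency test between parsed instructions.

    A first parsed instruction depends on the zeroth parsed instruction
    exactly when the storage address interval of the data it needs
    overlaps with that of the zeroth instruction. Intervals are taken
    as half-open (start, end) pairs here.
    """
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start < z_end and z_start < f_end
```

When this test is true, the first parsed instruction is cached and only dispatched after the zeroth has finished; when false, the two may proceed independently.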
The present disclosure provides a neural network instruction processing system, including the above neural network instruction generation apparatus and the above running device.
It should be noted that, although the neural network instruction generation apparatus, the running device, and the neural network instruction processing system have been introduced above by way of the foregoing embodiments, those skilled in the art will appreciate that the present disclosure should not be limited thereto. In fact, a user may configure the modules flexibly according to personal preference and/or the actual application scenario, provided the technical solutions of the present disclosure are complied with.
FIG. 4 shows a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure. As shown in FIG. 4, the method is applied to the above neural network instruction generation apparatus and includes step S41 and step S42. In step S41, a running device for executing a received macro instruction is determined according to the macro instruction. In step S42, running instructions are generated according to the macro instruction and the running device.
In a possible implementation, step S41 may include: when it is determined that the macro instruction contains an identifier of a designated device and that the resources of the designated device satisfy the execution condition for executing the macro instruction, determining the designated device as the running device. The execution condition may include: the designated device contains an instruction set corresponding to the macro instruction.
In a possible implementation, the method may further include: obtaining resource information of candidate devices. Step S41 may further include: when it is determined that the macro instruction contains no identifier of a designated device, determining, from among the candidate devices, a running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices. The resource information may include the instruction sets contained in the candidate devices.
In a possible implementation, step S41 may further include: when it is determined that the macro instruction contains the identifier of a designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
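The three branches of step S41 (designated device satisfying the execution condition, no designated device, designated device failing the condition) can be sketched as one selection routine. The dictionary field names are assumptions for illustration only.

```python
def determine_running_device(macro, candidates):
    """Select a running device for a macro instruction (sketch of S41).

    `macro` may carry a "device_id"; each candidate carries an "id" and
    the "instruction_sets" it contains. Field names are illustrative.
    """
    def satisfies(dev):
        # Execution condition: the device contains an instruction set
        # corresponding to the macro instruction.
        return macro["op_type"] in dev["instruction_sets"]

    designated = macro.get("device_id")
    if designated is not None:
        for dev in candidates:
            if dev["id"] == designated and satisfies(dev):
                return dev          # designated device can execute it
    # No designated device, or the designated device cannot execute the
    # macro instruction: fall back to the candidates' resource info.
    for dev in candidates:
        if satisfies(dev):
            return dev
    return None
```

A macro that names a device lacking the required instruction set falls through to the candidate scan, matching the behavior of the third determining sub-module.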
In a possible implementation, the macro instruction may contain at least one of an input amount and an output amount. Step S42 may include: determining the data amount of the macro instruction, and generating running instructions according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device. The data amount may be determined according to at least one of the input amount and the output amount, and the resource information of the running device may further include at least one of a storage capacity and a remaining storage capacity.
In a possible implementation, generating running instructions according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device may include: when it is determined that there is a single running device and that its resources do not satisfy the capacity condition for executing the macro instruction, splitting the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence. The running data amount of the running device may be determined according to the resource information of the running device; each running instruction may contain at least one of a running input amount and a running output amount, both determined according to the running data amount.
In a possible implementation, generating running instructions according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device may include: when it is determined that there are multiple running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating running instructions corresponding to each running device. The running data amount of each running device may be determined according to the resource information of that running device; a running instruction may contain at least one of a running input amount and a running output amount, both determined according to the running data amount of the running device that executes the running instruction.
In a possible implementation, the method may further include: sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
In a possible implementation, the method may further include: receiving an instruction to be executed, and generating a macro instruction according to the determined identifier of the designated device and the instruction to be executed.
In a possible implementation, the method may further include: sending the running instructions to the running device, so that the running device executes the running instructions.
In a possible implementation, sending the running instructions to the running device so that it executes them includes: generating an assembly file from the running instructions; translating the assembly file into a binary file; and sending the binary file to the running device, so that the running device executes the running instructions according to the binary file.
In a possible implementation, the resource information may include at least one of the storage capacity of a candidate device, its remaining storage capacity, and the instruction set it contains.
In a possible implementation, the running device may be one of, or any combination of, a CPU, a GPU, and an NPU.
In a possible implementation, the method may be applied in a CPU and/or an NPU.
In a possible implementation, the macro instruction may include at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction.
The computation macro instruction may include at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data transfer macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction. The write macro instruction may include at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
In a possible implementation, the macro instruction may contain at least one of the following fields: the identifier of the designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, operands, and instruction parameters. The running instruction may contain at least one of the following fields: an operation type, an input address, an output address, operands, and instruction parameters.
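The field list above can be collected into a simple record type. This is a sketch of the macro-instruction layout only; field types and the choice of which fields are optional are assumptions, since the disclosure says each field is merely "at least one of" the options.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MacroInstruction:
    # Fields a macro instruction may carry, per the description above.
    # Everything except the operation type is optional in this sketch.
    op_type: str                     # computation / jump / read / write, ...
    device_id: Optional[int] = None  # identifier of the designated device
    input_addr: Optional[int] = None
    output_addr: Optional[int] = None
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operands: Tuple = ()
    params: Tuple = ()
```

A running instruction would carry the same fields minus `device_id`, `input_amount`, and `output_amount` when those are resolved at generation time.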
The neural network instruction generation method provided by the embodiments of the present disclosure determines, according to a received macro instruction, a running device for executing the macro instruction, and generates running instructions according to the macro instruction and the running device. The method can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low error rate, and low labor and material costs for development.
The present disclosure also provides a neural network instruction execution method applied to the above running device. The method includes: obtaining, by the running device, data, a neural network model, and running instructions; parsing the running instructions to obtain multiple parsed instructions; and executing the multiple parsed instructions on the data to obtain an execution result.
In a possible implementation, the method may further include: storing, by the running device, the data and scalar data within the data. The running device includes a storage module; the storage module includes one or more of a register and a cache, the cache including a scratchpad cache. The cache is used to store the data; the register is used to store scalar data within the data.
In a possible implementation, the method may further include:
storing, by the running device, the running instructions;
parsing, by the running device, the running instructions to obtain multiple parsed instructions;
storing, by the running device, a running instruction queue, the queue including the running instructions and the multiple parsed instructions, all arranged in the order in which they are to be executed.
In a possible implementation, the method may further include:
caching, by the running device, a first parsed instruction upon determining that it has a dependency on a zeroth parsed instruction preceding it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing.
Here, the first parsed instruction having a dependency on the preceding zeroth parsed instruction includes: the first storage address interval holding the data required by the first parsed instruction overlaps with the zeroth storage address interval holding the data required by the zeroth parsed instruction.
The neural network instruction execution method provided by the embodiments of the present disclosure obtains, by the running device, data, a neural network model, and running instructions, parses the running instructions to obtain multiple parsed instructions, and executes the multiple parsed instructions on the data to obtain an execution result. The method can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low error rate, and low labor and material costs for development.
The present disclosure also provides a neural network instruction processing method applied to a neural network instruction processing system that includes the above neural network instruction generation apparatus and the above running device. The method includes the above neural network instruction generation method applied to the neural network instruction generation apparatus and the above neural network instruction execution method applied to the running device. The method can be used across platforms, with good applicability, fast instruction conversion, high processing efficiency, a low error rate, and low labor and material costs for development.
The foregoing may be better understood in light of the following clauses:
Clause B1. A neural network instruction generation apparatus, the apparatus comprising:
a device determination module configured to determine, according to a received macro instruction, a running device for executing the macro instruction; and
an instruction generation module configured to generate running instructions according to the macro instruction and the running device.
Clause B2. The apparatus according to Clause B1, wherein the device determination module comprises:
a first determination sub-module configured to determine a designated device as the running device when it is determined that the macro instruction contains an identifier of the designated device and that the resources of the designated device satisfy an execution condition for executing the macro instruction,
wherein the execution condition comprises: the designated device contains an instruction set corresponding to the macro instruction.
Clause B3. The apparatus according to Clause B2, the apparatus further comprising:
a resource acquisition module configured to obtain resource information of candidate devices,
wherein the device determination module further comprises:
a second determination sub-module configured to, when it is determined that the macro instruction contains no identifier of the designated device, determine, from among the candidate devices, a running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices,
wherein the resource information comprises the instruction sets contained in the candidate devices.
Clause B4. The apparatus according to Clause B3, wherein the device determination module further comprises:
a third determination sub-module configured to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction.
条款B5、根据条款B2-条款B4任一项所述的装置,所述宏指令包含输入量和输出量中的至少一项,Clause B5. The device according to any one of Clauses B2-B4, the macro instruction includes at least one of an input quantity and an output quantity,
所述指令生成模块,还用于确定所述宏指令的数据量,根据所述宏指令的数据量、所述宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is also used to determine the data amount of the macro instruction, and generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operating device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款B6、根据条款B5所述的装置,所述指令生成模块,包括:Clause B6. The apparatus according to Clause B5, the instruction generation module includes:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的资源不满足执行所述宏指令的容量条件时，根据所述运行设备的运行数据量和所述数据量将所述宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，A first instruction generating submodule, configured to split the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount when it is determined that there is one running device and the resources of the running device do not satisfy the capacity condition for executing the macro instruction, so that the running device executes the multiple running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein, the running data amount of the running device is determined according to the resource information of the running device, each running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
条款B7、根据条款B5所述的装置,所述指令生成模块,包括:Clause B7. The apparatus according to Clause B5, the instruction generation module includes:
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述宏指令进行拆分，生成对应于每个运行设备的运行指令，A second instruction generating submodule, configured to split the macro instruction according to the running data amount of each running device and the data amount when it is determined that there are multiple running devices, and generate a running instruction corresponding to each running device,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of each running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款B8、根据条款B1所述的装置,所述装置还包括:Clause B8. The device according to Clause B1, the device further comprising:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款B9、根据条款B2所述的装置,所述装置还包括:Clause B9. The device according to Clause B2, the device further comprising:
宏指令生成模块,用于接收待执行指令,根据确定的指定设备的标识和所述待执行指令生成所述宏指令。The macro generation module is used for receiving the instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
条款B10、根据条款B1所述的装置,所述装置还包括:Clause B10. The device according to Clause B1, the device further comprising:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款B11、根据条款B1所述的装置,Clause B11, the device according to Clause B1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述装置设置于CPU和/或NPU中;The device is set in the CPU and / or NPU;
所述宏指令包括以下指令中的至少一种:计算宏指令、控制宏指令和数据搬运宏指令,The macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling macro instruction,
其中,所述计算宏指令包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种,The calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
数据搬运宏指令包括读宏指令和写宏指令中的至少一种，所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种，所述写宏指令包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种；The data handling macro instruction includes at least one of a read macro instruction and a write macro instruction. The read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction. The write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
所述宏指令包含以下选项中的至少一项:用于执行所述宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数,The macro instruction includes at least one of the following options: the identification, operation type, input address, output address, input amount, output amount, operand and instruction parameter of the designated device for executing the macro instruction,
所述运行指令包含以下选项中的至少一项:所述操作类型、所述输入地址、所述输出地址、所述操作数和所述指令参数。The run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
条款B12、一种机器学习运算装置,所述装置包括:Article B12. A machine learning computing device, the device comprising:
一个或多个如条款B1-条款B11任一项所述的神经网络指令生成装置，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more neural network instruction generation devices according to any one of Clauses B1-B11, configured to obtain data to be operated on and control information from other processing devices, perform a designated machine learning operation, and transmit the execution result to the other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述神经网络指令生成装置时,所述多个所述神经网络指令生成装置间可以通过特定的结构进行连接并传输数据;When the machine learning operation device includes a plurality of the neural network instruction generation devices, the plurality of the neural network instruction generation devices can be connected and transmit data through a specific structure;
其中，多个所述神经网络指令生成装置通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述神经网络指令生成装置共享同一控制系统或拥有各自的控制系统；多个所述神经网络指令生成装置共享内存或者拥有各自的内存；多个所述神经网络指令生成装置的互联方式是任意互联拓扑。Wherein, the multiple neural network instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple neural network instruction generation devices share the same control system or have their own control systems; the multiple neural network instruction generation devices share memory or have their own memories; and the interconnection manner of the multiple neural network instruction generation devices is an arbitrary interconnection topology.
条款B13、一种组合处理装置,所述装置包括:Clause B13. A combined processing device, the device comprising:
如条款B12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing devices, general interconnection interfaces and other processing devices as described in clause B12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作。The machine learning operation device interacts with the other processing device to jointly complete the calculation operation specified by the user.
条款B14、根据条款B13所述的组合处理装置，还包括：存储装置，该存储装置分别与所述机器学习运算装置和所述其他处理装置连接，用于保存所述机器学习运算装置和所述其他处理装置的数据。Clause B14. The combined processing device according to Clause B13, further comprising: a storage device, which is connected to the machine learning computing device and the other processing device, respectively, and is configured to store data of the machine learning computing device and the other processing device.
条款B15、一种机器学习芯片,所述机器学习芯片包括:Article B15. A machine learning chip, the machine learning chip includes:
如条款B12所述的机器学习运算装置或如条款B13所述的组合处理装置。The machine learning arithmetic device according to clause B12 or the combined processing device according to clause B13.
条款B16、一种电子设备,所述电子设备包括:Article B16. An electronic device, the electronic device comprising:
如条款B15所述的机器学习芯片。Machine learning chip as described in clause B15.
条款B17、一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及如条款B15所述的机器学习芯片;Clause B17, a board card, the board card includes: a storage device, an interface device and a control device, and a machine learning chip as described in Clause B15;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款B18、根据条款B17所述的板卡,Clause B18, the board described in Clause B17,
所述存储器件包括:多组存储单元,每一组所述存储单元与所述机器学习芯片通过总线连接,所述存储单元为:DDR SDRAM;The storage device includes: multiple sets of storage units, each of which is connected to the machine learning chip through a bus, and the storage unit is: DDR SDRAM;
所述机器学习芯片包括:DDR控制器,用于控制每个所述存储单元的数据传输与数据存储;The machine learning chip includes: a DDR controller for controlling data transmission and data storage of each storage unit;
所述接口装置为:标准PCIE接口。The interface device is: a standard PCIE interface.
条款B19、一种神经网络指令生成方法,所述方法包括:Clause B19. A neural network instruction generation method, the method comprising:
根据接收到的宏指令,确定执行所述宏指令的运行设备;According to the received macro instruction, determine the execution device for executing the macro instruction;
根据所述宏指令和所述运行设备,生成运行指令。According to the macro instruction and the operation device, an operation instruction is generated.
条款B20、根据条款B19所述的方法,根据接收到的宏指令,确定执行所述宏指令的运行设备,包括:Clause B20. According to the method described in Clause B19, based on the received macro instruction, determining a running device that executes the macro instruction includes:
在确定所述宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution condition for executing the macro instruction, determining the designated device as the running device,
其中,所述执行条件包括:所述指定设备中包含与所述宏指令相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the macro instruction.
条款B21、根据条款B20所述的方法,所述方法还包括:Clause B21. The method according to Clause B20, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的宏指令,确定执行所述宏指令的运行设备,包括:Wherein, according to the received macro instruction, determining a running device for executing the macro instruction includes:
在确定所述宏指令中不包含所述指定设备的标识时，根据接收到的宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述宏指令的运行设备，When it is determined that the macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款B22、根据条款B21所述的方法,根据接收到的宏指令,确定执行所述宏指令的运行设备,包括:Clause B22. According to the method described in Clause B21, based on the received macro instruction, determining a running device that executes the macro instruction includes:
在确定所述宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述宏指令的执行条件时，根据所述宏指令和所述备选设备的资源信息，确定运行设备。When it is determined that the macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
条款B23、根据条款B20-条款B22任一项所述的方法，所述宏指令包含输入量和输出量中的至少一项，Clause B23. The method according to any one of Clauses B20-B22, the macro instruction includes at least one of an input quantity and an output quantity,
根据所述宏指令和所述运行设备,生成运行指令,包括:According to the macro instruction and the operation device, generating an operation instruction includes:
确定所述宏指令的数据量,根据所述宏指令的数据量、所述宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the macro instruction, and generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款B24、根据条款B23所述的方法,根据所述宏指令的数据量、所述宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause B24. According to the method described in Clause B23, generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device includes:
在确定所述运行设备为一个，且所述运行设备的资源不满足执行所述宏指令的容量条件时，根据所述运行设备的运行数据量和所述数据量将所述宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，When it is determined that there is one running device and the resources of the running device do not satisfy the capacity condition for executing the macro instruction, splitting the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein, the running data amount of the running device is determined according to the resource information of the running device, each running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
条款B25、根据条款B23所述的方法,根据所述宏指令的数据量、所述宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause B25. According to the method described in Clause B23, generating an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operation devices, the macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of each running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款B26、根据条款B19所述的方法,所述方法还包括:Clause B26. The method according to Clause B19, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款B27、根据条款B20所述的方法,所述方法还包括:Clause B27. The method according to Clause B20, the method further comprising:
接收待执行指令,根据确定的指定设备的标识和所述待执行指令生成所述宏指令。Receiving the instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
条款B28、根据条款B19所述的方法,所述方法还包括:Clause B28. The method according to Clause B19, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款B29、根据条款B19所述的方法,Clause B29, according to the method described in Clause B19,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述宏指令包括以下指令中的至少一种:计算宏指令、控制宏指令和数据搬运宏指令,The macro instruction includes at least one of the following instructions: calculation macro instruction, control macro instruction and data handling macro instruction,
其中,所述计算宏指令包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种,The calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
数据搬运宏指令包括读宏指令和写宏指令中的至少一种，所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种，所述写宏指令包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种；The data handling macro instruction includes at least one of a read macro instruction and a write macro instruction. The read macro instruction includes at least one of a read neuron macro instruction, a read synapse macro instruction, and a read scalar macro instruction. The write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction;
所述宏指令包含以下选项中的至少一项:用于执行所述宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数,The macro instruction includes at least one of the following options: the identification, operation type, input address, output address, input amount, output amount, operand and instruction parameter of the designated device for executing the macro instruction,
所述运行指令包含以下选项中的至少一项:所述操作类型、所述输入地址、所述输出地址、所述操作数和所述指令参数。The run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
图5示出根据本公开一实施例的神经网络指令处理系统的框图。如图5所示，该系统包括指令生成设备100和运行设备200。FIG. 5 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure. As shown in FIG. 5, the system includes an instruction generation device 100 and a running device 200.
指令生成设备100包括设备确定模块11和指令生成模块12。设备确定模块11用于根据接收到的宏指令,确定执行宏指令的运行设备。指令生成模块12用于根据宏指令和运行设备,生成运行指令。The instruction generation device 100 includes a device determination module 11 and an instruction generation module 12. The device determination module 11 is used to determine a running device that executes the macro instruction according to the received macro instruction. The instruction generation module 12 is used to generate an operation instruction according to the macro instruction and the operation device.
运行设备200包括控制模块21和执行模块22。控制模块21用于获取所需数据、神经网络模型以及运行指令，对运行指令进行解析，获得多个解析指令。执行模块22用于根据数据执行多个解析指令，得到执行结果。The running device 200 includes a control module 21 and an execution module 22. The control module 21 is configured to obtain required data, a neural network model, and running instructions, and to parse the running instructions to obtain multiple parsed instructions. The execution module 22 is configured to execute the multiple parsed instructions according to the data to obtain an execution result.
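The interaction between the two devices of FIG. 5 can be sketched as follows. This is a minimal illustrative sketch only; the function names, dictionary layout, and device table are assumptions of this example and are not defined by the disclosure.

```python
# Sketch of the FIG. 5 pipeline: the instruction generation device
# determines a running device for a received macro instruction and
# generates running instructions for it.

def determine_running_device(macro, devices):
    """Device determination module: if the macro names a designated
    device, use it; otherwise pick a candidate device whose
    instruction set covers the macro's operation type."""
    if macro.get("device_id") is not None:
        return macro["device_id"]
    for dev_id, info in devices.items():
        if macro["op_type"] in info["instruction_set"]:
            return dev_id
    raise RuntimeError("no candidate device supports " + macro["op_type"])

def generate_running_instructions(macro, device_id):
    """Instruction generation module: drop the device identifier and
    emit one running instruction (splitting by data amount is a
    separate step, not shown here)."""
    run = {k: v for k, v in macro.items() if k != "device_id"}
    return [(device_id, run)]

devices = {"npu0": {"instruction_set": {"CONV", "MATMUL"}}}
macro = {"device_id": None, "op_type": "CONV",
         "in_addr": 0x100, "out_addr": 0x200}
dev = determine_running_device(macro, devices)
for target, instr in generate_running_instructions(macro, dev):
    print(target, instr["op_type"])
```

The running device would then parse each running instruction into multiple parsed instructions and execute them against the data, which is omitted from this sketch.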
在该实现方式中，宏指令是一种批量处理的称谓，宏指令可以是一种规则或模式，或称语法替换，宏指令的处理装置、系统等在遇到宏指令时会自动进行这一规则或模式的替换。宏指令可以是对常用的用于对数据进行计算、控制和搬运等处理的待执行指令进行整合形成的。In this implementation, "macro instruction" is a term for batch processing: a macro instruction may be a rule or pattern, also called a syntax replacement, and a device or system that processes macro instructions automatically performs this rule or pattern replacement when it encounters a macro instruction. A macro instruction may be formed by integrating commonly used to-be-executed instructions for computing on, controlling, and moving data.
在一种可能的实现方式中,宏指令可以包括以下至少一种:计算宏指令、控制宏指令和数据搬运宏指令。其中,计算宏指令可以包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种。控制宏指令可以包括无条件跳转宏指令和有条件跳转宏指令中的至少一种。数据搬运宏指令可以包括读宏指令和写宏指令中的至少一种。读宏指令可以包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。写宏指令可以包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种。In a possible implementation manner, the macro instruction may include at least one of the following: a calculation macro instruction, a control macro instruction, and a data handling macro instruction. The calculation macro instruction may include at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data handling macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of read neuron macro instruction, read synapse macro instruction, and read scalar macro instruction. The write macro instruction may include at least one of write neuron macro instruction, write synapse macro instruction, and write scalar macro instruction.
在一种可能的实现方式中，宏指令可以包含以下选项中的至少一项：用于执行宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数。运行指令可以包含以下选项中的至少一项：操作类型、输入地址、输出地址、操作数和指令参数。In a possible implementation, the macro instruction may include at least one of the following options: an identifier of a designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and an instruction parameter. The running instruction may include at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
其中，指定设备的标识可以是指定设备的物理地址、IP地址、名称、编号等标识。标识可以包括数字、字母、符号中的其中一种或任意组合。在宏指令的指定设备的标识的位置为空时，确定该宏指令无指定设备；或者，在宏指令中不包含"指定设备的标识"这个字段时，确定该宏指令无指定设备。操作类型可以是指该宏指令对数据所进行操作的类型，表征该宏指令的具体类型，如在某宏指令的操作类型为"XXX"时，可以根据"XXX"确定该宏指令对数据所进行的操作的具体类型。根据操作类型可以确定执行该宏指令所需的指令集合，如在某宏指令的操作类型为"XXX"时，其所需的指令集合为进行"XXX"所对应的处理所需的所有指令集。输入地址可以是数据的输入地址、读取地址等获得数据的地址，输出地址可以是被处理后的数据的输出地址、写入地址等存储数据的地址。输入量可以是数据的输入规模、输入长度等表征其数据量大小的信息。输出量可以是数据的输出规模、输出长度等表征其数据量的大小的信息。操作数可以包括寄存器的长度、寄存器的地址、寄存器的标识、立即数等中的一个或多个。立即数为在立即寻址方式指令中给出的数。指令参数可以是指对应于该宏指令、与其执行相关的参数。例如，指令参数可以是第二个操作数的地址和长度等。指令参数可以是卷积核的大小、卷积核的步长和卷积核的填充等。The identifier of the designated device may be an identifier such as the physical address, IP address, name, or number of the designated device, and may include one or any combination of numbers, letters, and symbols. When the position of the identifier of the designated device in the macro instruction is empty, it is determined that the macro instruction has no designated device; or, when the macro instruction does not include the field "identifier of the designated device", it is determined that the macro instruction has no designated device. The operation type may refer to the type of operation the macro instruction performs on data and characterizes the specific type of the macro instruction; for example, when the operation type of a macro instruction is "XXX", the specific type of operation the macro instruction performs on data can be determined according to "XXX". The instruction set required to execute the macro instruction can be determined according to the operation type; for example, when the operation type of a macro instruction is "XXX", the required instruction set is all the instructions required for the processing corresponding to "XXX". The input address may be an address from which data is obtained, such as an input address or a read address of the data, and the output address may be an address at which data is stored, such as an output address or a write address of the processed data. The input amount may be information characterizing the size of the data, such as its input scale or input length, and the output amount may be information characterizing the size of the data, such as its output scale or output length. The operand may include one or more of a register length, a register address, a register identifier, an immediate value, and so on, where an immediate value is a number given in an immediate-addressing-mode instruction. The instruction parameter may refer to a parameter corresponding to the macro instruction and related to its execution; for example, the instruction parameter may be the address and length of a second operand, or the size, stride, and padding of a convolution kernel.
在该实现方式中，对于一个宏指令，其必须包括操作码和至少一个操作域，其中操作码即为操作类型，操作域包括指定设备的标识、输入地址、输出地址、输入量、输出量、操作数和指令参数。操作码可以是计算机程序中所规定的要执行操作的那一部分指令或字段（通常用代码表示），是指令序列号，用来告知执行指令的装置具体需要执行哪一条指令。操作域可以是执行对应的指令所需的所有数据的来源，执行对应的指令所需的所有数据包括参数数据、待运算或待处理的数据、对应的运算方法，或者存储参数数据、待运算或待处理的数据、对应的运算方法的地址等等。In this implementation, a macro instruction must include an operation code and at least one operation field, where the operation code is the operation type and the operation field includes the identifier of the designated device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters. The operation code may be the part of an instruction or field (usually represented by a code) specified in a computer program to perform an operation; it is an instruction sequence number used to inform the device executing the instruction which instruction specifically needs to be executed. The operation field may be the source of all data required to execute the corresponding instruction, including parameter data, data to be operated on or processed, and the corresponding operation methods, or the addresses where the parameter data, the data to be operated on or processed, and the corresponding operation methods are stored, and so on.
应当理解的是,本领域技术人员可以根据需要对宏指令的指令格式以及所包含的内容进行设置,本公开对此不作限制。It should be understood that, those skilled in the art can set the instruction format and the content of the macro instruction as needed, and this disclosure does not limit this.
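The macro-instruction format described above (an operation code plus an operation field whose options are all optional except the operation type) can be illustrated with a small data structure. This is a hedged sketch only; the field names are assumptions chosen for readability and are not part of the disclosure, which leaves the instruction format open to the implementer.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative encoding of a macro instruction: the operation code
# (operation type) is mandatory; every operation-field option is
# optional, matching "at least one of the following options" above.
@dataclass
class MacroInstruction:
    op_type: str                      # operation code, e.g. "CONV"
    device_id: Optional[str] = None   # identifier of the designated device
    in_addr: Optional[int] = None     # input (read) address
    out_addr: Optional[int] = None    # output (write) address
    in_size: Optional[int] = None     # input amount (scale/length)
    out_size: Optional[int] = None    # output amount
    operands: list = field(default_factory=list)  # registers, immediates
    params: dict = field(default_factory=dict)    # e.g. kernel size/stride

m = MacroInstruction("CONV", device_id="09", in_addr=0x1000,
                     params={"kernel": 3, "stride": 1})
print(m.op_type, m.device_id)
```

A macro with `device_id=None` corresponds to the case above where the identifier position is empty and the macro instruction has no designated device.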
在本实施例中,设备确定模块11可以根据宏指令确定一个或多个运行设备。指令生成模块12可以生成一个或多个运行指令。在生成的运行指令为多个时,多个运行指令可以在同一个运行设备中被执行,也可以在不同的运行设备中被执行,本公开对此不作限制。In this embodiment, the device determination module 11 may determine one or more running devices according to the macro instruction. The instruction generation module 12 may generate one or more execution instructions. When there are multiple generated running instructions, the multiple running instructions may be executed in the same running device, or may be executed in different running devices, which is not limited in the present disclosure.
本公开实施例所提供的神经网络指令处理系统，该系统包括指令生成设备和运行设备。指令生成设备包括设备确定模块用于根据接收到的宏指令，确定执行宏指令的运行设备；指令生成模块用于根据宏指令和运行设备，生成运行指令。运行设备包括控制模块用于获取所需数据、神经网络模型以及运行指令，对运行指令进行解析，获得多个解析指令；执行模块用于根据所述数据执行所述多个解析指令，得到执行结果。本公开实施例所提供的神经网络指令处理系统，可跨平台使用，适用性好，指令转换的速度快、处理效率高、出错几率低，且开发的人力、物力成本低。The neural network instruction processing system provided by the embodiments of the present disclosure includes an instruction generation device and a running device. The instruction generation device includes a device determination module configured to determine, based on a received macro instruction, a running device that executes the macro instruction, and an instruction generation module configured to generate running instructions based on the macro instruction and the running device. The running device includes a control module configured to obtain required data, a neural network model, and the running instructions, and to parse the running instructions to obtain multiple parsed instructions, and an execution module configured to execute the multiple parsed instructions according to the data to obtain an execution result. The neural network instruction processing system provided by the embodiments of the present disclosure can be used across platforms, has good applicability, fast instruction conversion, high processing efficiency, and a low error probability, and incurs low development labor and material costs.
图6示出根据本公开一实施例的神经网络指令处理系统的框图。在一种可能的实现方式中，如图6所示，指令生成设备100还可以包括宏指令生成模块13。宏指令生成模块13用于接收待执行指令，根据确定的指定设备的标识和待执行指令生成宏指令。FIG. 6 shows a block diagram of a neural network instruction processing system according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 6, the instruction generation device 100 may further include a macro instruction generation module 13. The macro instruction generation module 13 is configured to receive a to-be-executed instruction, and to generate the macro instruction according to the determined identifier of the designated device and the to-be-executed instruction.
In this implementation, the designated device may be determined according to the operation type, input amount, output amount, and other attributes of the instruction to be executed. One or more instructions to be executed may be received.
The instruction to be executed may include at least one of the following: a computation instruction to be executed, a control instruction to be executed, and a data-transfer instruction to be executed. The computation instruction to be executed may include at least one of a neural network computation instruction, a vector logic computation instruction, a matrix-vector computation instruction, a scalar computation instruction, and a scalar logic computation instruction. The control instruction to be executed may include at least one of an unconditional jump instruction and a conditional jump instruction. The data-transfer instruction to be executed may include at least one of a read instruction and a write instruction. The read instruction may include at least one of a read-neuron instruction, a read-synapse instruction, and a read-scalar instruction. The write instruction may include at least one of a write-neuron instruction, a write-synapse instruction, and a write-scalar instruction.
The instruction to be executed may include at least one of the following fields: operation type, input address, output address, input amount, output amount, operand, and instruction parameter.
In this implementation, when there is a single instruction to be executed, the identifier of the determined designated device may be added to that instruction to generate a macro instruction. For example, suppose an instruction m to be executed is "XXX ... param", where XXX is the operation type and param is the instruction parameter. Its designated device m-1 can be determined according to the operation type "XXX". The identifier of the designated device m-1 (for example, 09) is then added to instruction m, generating the macro instruction M "XXX 09, ... param" corresponding to instruction m. When there are multiple instructions to be executed, the identifier of the designated device determined for each instruction may be added to that instruction, and a single macro instruction, or multiple corresponding macro instructions, may be generated from the instructions carrying the device identifiers.
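The step above, inserting a device identifier after the operation type, can be sketched as follows. This is an illustrative sketch only, not the disclosure's implementation: the textual instruction format and the operation-type-to-device mapping (`OP_TYPE_TO_DEVICE`) are assumptions made for the example.

```python
# Hypothetical sketch: build a macro instruction by inserting the identifier
# of the designated device after the operation type. The mapping below and
# the textual instruction format are illustrative assumptions.
OP_TYPE_TO_DEVICE = {"CONV": "09", "VADD": "03", "JUMP": "01"}

def make_macro(instruction: str) -> str:
    op_type, _, rest = instruction.partition(" ")
    device_id = OP_TYPE_TO_DEVICE[op_type]  # designated device for this op type
    return f"{op_type} {device_id}, {rest}" if rest else f"{op_type} {device_id}"

print(make_macro("CONV in_addr=0x10, out_addr=0x20"))
# CONV 09, in_addr=0x10, out_addr=0x20
```

For multiple instructions to be executed, the same function could be applied to each one before combining the results into one or several macro instructions, as described above.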
It should be understood that a person skilled in the art may set the instruction format of the instruction to be executed and the content it contains as needed; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the device determination module 11 may include a first determination submodule 111. The first determination submodule 111 is configured to determine the designated device as the running device when the macro instruction contains the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the macro instruction. The execution condition may include: the designated device contains the instruction set corresponding to the macro instruction.
In this implementation, the macro instruction may contain the identifiers of one or more designated devices for executing it. When the macro instruction contains the identifier of a designated device and the resources of that device satisfy the execution condition, the first determination submodule 111 can directly determine the designated device as the running device. This saves the time of generating running instructions from the macro instruction and guarantees that the generated running instructions can be executed by the corresponding running device.
In a possible implementation, as shown in FIG. 6, the instruction generation device 100 may further include a resource acquisition module 14, and the device determination module 11 may further include a second determination submodule 112. The resource acquisition module 14 is configured to acquire resource information of candidate devices. The second determination submodule 112 is configured to, when the macro instruction does not contain the identifier of a designated device, determine the running device for executing the macro instruction from among the candidate devices according to the received macro instruction and the resource information of the candidate devices. The resource information may include the instruction sets contained in the candidate devices. An instruction set contained in a candidate device may correspond to the operation types of one or more macro instructions; the more instruction sets a candidate device contains, the more types of macro instructions it can execute.
In this implementation, when the macro instruction does not contain the identifier of a designated device, the second determination submodule 112 may determine, from among the candidate devices, one or more running devices capable of executing the macro instruction, where the instruction sets of the determined running device include the instruction set corresponding to the macro instruction. For example, if the received macro instruction is a neural network computation macro instruction, a candidate device containing the instruction set corresponding to neural network computation macro instructions may be determined as the running device, ensuring that the running device can execute the generated running instructions.
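The selection step described above, matching the macro instruction's operation type against the instruction sets of the candidate devices, can be sketched as follows. The device names and instruction-set contents are illustrative assumptions, not part of the disclosure:

```python
# Hypothetical sketch: pick running devices whose instruction sets cover the
# macro instruction's operation type. Device names and sets are assumptions.
DEVICES = {
    "cpu0": {"scalar", "control", "data_transfer"},
    "npu0": {"neural_network", "matrix_vector", "vector_logic"},
}

def select_running_devices(macro_op_type: str) -> list:
    # A device qualifies if it contains the instruction set for this op type.
    return [name for name, isa in DEVICES.items() if macro_op_type in isa]

print(select_running_devices("neural_network"))  # ['npu0']
```

Under this sketch, a macro instruction whose operation type matches no candidate device would yield an empty list, corresponding to the case where no running device can be determined.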
In a possible implementation, as shown in FIG. 6, the device determination module 11 may further include a third determination submodule 113. The third determination submodule 113 determines the running device according to the macro instruction and the resource information of the candidate devices when the macro instruction contains the identifier of a designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction.
In this implementation, when the macro instruction contains the identifier of a designated device but the resources of that device do not satisfy the execution condition, the third determination submodule 113 may conclude that the designated device is not capable of executing the macro instruction. The third determination submodule 113 may then determine the running device from among the candidate devices, for example by determining a candidate device containing the instruction set corresponding to the macro instruction as the running device.
In a possible implementation, as shown in FIG. 6, the macro instruction may contain at least one of an input amount and an output amount, and the instruction generation module 12 is further configured to determine the data amount of the macro instruction and generate running instructions according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running devices. The data amount of the macro instruction may be determined according to at least one of the input amount and the output amount, and the resource information of a running device may further include at least one of its storage capacity and its remaining storage capacity.
Here, the storage capacity of a running device may refer to the amount of binary information that the memory of the device can hold. The remaining storage capacity of a running device may refer to the storage capacity currently available for instruction execution after the occupied storage capacity is deducted. The resource information of a running device characterizes its execution capability: the larger the storage capacity and the remaining storage capacity, the stronger the device's execution capability.
In this implementation, the instruction generation module 12 may determine a specific way of splitting the macro instruction according to the resource information of each running device, the data amount of the macro instruction, and so on, so as to split the macro instruction and generate the running instructions corresponding to each running device.
In a possible implementation, as shown in FIG. 6, the instruction generation module 12 may include a first instruction generation submodule 121. The first instruction generation submodule 121 is configured to, when there is a single running device and its resources do not satisfy the capacity condition for executing the macro instruction, split the macro instruction into multiple running instructions according to the running data amount of the device and the data amount of the macro instruction, so that the running device executes the multiple running instructions in sequence. The running data amount of the device may be determined according to its resource information; each running instruction may contain at least one of a running input amount and a running output amount, which may be determined according to the running data amount.
In this implementation, the running data amount of a running device may be determined according to its storage capacity or remaining storage capacity. The capacity condition may be that the running data amount of the device is greater than or equal to the data amount of the macro instruction; in other words, the resources of the running device failing to satisfy the capacity condition may mean that its running data amount is smaller than the data amount of the macro instruction. The running input amount and the running output amount must be less than or equal to the running data amount to ensure that the generated running instructions can be executed by the device. The running input amounts (or running output amounts) of different running instructions may be the same or different; the present disclosure does not limit this.
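The splitting step above can be sketched as follows: when the macro instruction's data amount exceeds the device's running data amount, it is cut into chunks no larger than that running data amount. This is a minimal sketch; the field names (`offset`, `amount`) and chunk layout are illustrative assumptions, not the disclosure's instruction format:

```python
# Hypothetical sketch: split a macro instruction whose data amount exceeds the
# device's running data amount into chunk-sized running instructions, each
# with a running input/output amount <= the running data amount.
def split_macro(total_amount: int, running_amount: int) -> list:
    chunks = []
    offset = 0
    while offset < total_amount:
        size = min(running_amount, total_amount - offset)  # per-instruction amount
        chunks.append({"offset": offset, "amount": size})
        offset += size
    return chunks

print(split_macro(100, 32))
# [{'offset': 0, 'amount': 32}, {'offset': 32, 'amount': 32},
#  {'offset': 64, 'amount': 32}, {'offset': 96, 'amount': 4}]
```

As the example shows, the per-instruction amounts may differ (the last chunk is smaller), consistent with the statement that running input amounts of different running instructions may be the same or different.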
In this implementation, when there is a single running device and its resources satisfy the capacity condition for executing the macro instruction, the first instruction generation submodule 121 may directly convert the macro instruction into a single running instruction, or may still split it into multiple running instructions; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the instruction generation module 12 may include a second instruction generation submodule 122. The second instruction generation submodule 122 is configured to, when there are multiple running devices, split the macro instruction according to the running data amount of each device and the data amount of the macro instruction, generating the running instructions corresponding to each device. The running data amount of each device may be determined according to its resource information; a running instruction may contain at least one of a running input amount and a running output amount, which are determined according to the running data amount of the device that executes the instruction.
In this implementation, the running input amount and the running output amount must be less than or equal to the running data amount to ensure that the generated running instructions can be executed by the device. The second instruction generation submodule 122 may generate one or more running instructions for each running device, according to its running data amount, for that device to execute.
In the above implementations, a running instruction contains at least one of a running input amount and a running output amount. Besides bounding the data amount of the running instruction so that it can be executed by the corresponding device, this also satisfies any special requirements that different running instructions place on the running input amount and/or the running output amount.
In a possible implementation, for running instructions that place no special requirement on the running input amount and/or running output amount, these amounts may be omitted from the instruction, and a default running input amount and a default running output amount may be set in advance. When the running device determines that a received running instruction contains no running input amount or running output amount, it uses the defaults as the instruction's running input amount and running output amount. Presetting defaults in this way simplifies the generation of running instructions and saves generation time.
In a possible implementation, default input amounts and default output amounts may be preset for different types of macro instructions. When a macro instruction contains no input amount or output amount, the corresponding preset defaults are used as its input amount and output amount; the data amount of the macro instruction is then determined from the default input amount and/or the default output amount, and running instructions are generated according to the data amount of the macro instruction, the macro instruction itself, and the resource information of the running devices. When the macro instruction contains no input amount or output amount, the generated running instructions may likewise omit the running input amount and running output amount, or may contain at least one of them. When a running instruction contains no running input amount and/or running output amount, the running device may execute it according to the preset default running input amount and/or default running output amount.
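The default-fallback behavior described above can be sketched as follows. The per-type default table and its values are illustrative assumptions made for the example:

```python
# Hypothetical sketch: fall back to preset per-type defaults when a macro
# instruction omits its input/output amounts. The table is an assumption.
DEFAULTS = {"neural_network": (256, 256), "vector_logic": (64, 64)}

def resolve_amounts(op_type, input_amount=None, output_amount=None):
    default_in, default_out = DEFAULTS[op_type]
    # Any amount missing from the instruction is taken from the defaults.
    return (input_amount if input_amount is not None else default_in,
            output_amount if output_amount is not None else default_out)

print(resolve_amounts("vector_logic"))            # (64, 64)
print(resolve_amounts("vector_logic", 32, None))  # (32, 64)
```

The same pattern would apply on the running-device side for a running instruction that omits its running input amount and/or running output amount.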
In a possible implementation, the instruction generation module 12 may also split the macro instruction into running instructions according to the macro instruction and a preset macro instruction splitting rule. The splitting rule may be determined from a conventional splitting approach (for example, splitting according to the processing steps of the macro instruction) combined with a running data amount threshold for instructions that all candidate devices can execute. The macro instruction is split into running instructions whose running input amounts and running output amounts are all less than or equal to the threshold, ensuring that each generated running instruction can be executed on its corresponding running device (which may be any of the candidate devices). The threshold may be obtained by comparing the storage capacities (or remaining storage capacities) of all candidate devices and taking the minimum as the running data amount threshold for instructions that all candidate devices can execute.
It should be understood that a person skilled in the art may set the way running instructions are generated according to actual needs; the present disclosure does not limit this.
In this embodiment, the running instructions generated by the instruction generation module from the macro instruction may be instructions to be executed, or one or more parsed instructions obtained by parsing instructions to be executed; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the instruction generation device 100 may further include a queue construction module 15. The queue construction module 15 is configured to sort the running instructions according to a queue ordering rule and construct, from the sorted running instructions, the instruction queue corresponding to each running device.
In this implementation, an instruction queue uniquely corresponding to each running device may be constructed. The running instructions may be sent, in queue order, to the device uniquely corresponding to the queue; alternatively, the whole instruction queue may be sent to the device so that it executes the running instructions in queue order. This ensures that the running device executes the running instructions according to the instruction queue, preventing instructions from being executed incorrectly, with delay, or not at all.
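The queue construction described above can be sketched as follows, ordering here by generation time, which is one of the ordering criteria mentioned below; the record fields (`device`, `op`, `gen_time`) are illustrative assumptions:

```python
# Hypothetical sketch: build one instruction queue per running device,
# ordered by generation time. Field names are illustrative assumptions.
from collections import defaultdict

def build_queues(instructions):
    queues = defaultdict(list)
    for ins in sorted(instructions, key=lambda i: i["gen_time"]):
        queues[ins["device"]].append(ins["op"])  # queue unique to each device
    return dict(queues)

ins = [{"device": "npu0", "op": "CONV_1", "gen_time": 2},
       {"device": "npu0", "op": "CONV_0", "gen_time": 1},
       {"device": "cpu0", "op": "JUMP_0", "gen_time": 3}]
print(build_queues(ins))  # {'npu0': ['CONV_0', 'CONV_1'], 'cpu0': ['JUMP_0']}
```

Each resulting queue could then either be drained instruction-by-instruction toward its device or sent to the device as a whole, matching the two dispatch options described above.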
In this implementation, the queue ordering rule may be determined from information such as the estimated execution time of a running instruction, its generation time, and the running input amount, running output amount, and operation type of the instruction itself; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the instruction generation device 100 may further include an instruction dispatch module 16. The instruction dispatch module 16 is configured to send the running instructions to the running device so that the running device executes them.
In this implementation, when a running device executes a single running instruction, that instruction may be sent to it directly. When a running device executes multiple running instructions, all of them may be sent at once so that the device executes them in sequence; alternatively, they may be sent one by one, with the next instruction sent only after the device finishes executing the current one. A person skilled in the art may set the way running instructions are sent to the running device; the present disclosure does not limit this.
In a possible implementation, as shown in FIG. 6, the instruction dispatch module 16 may include an instruction assembly submodule 161, an assembly translation submodule 162, and an instruction sending submodule 163. The instruction assembly submodule 161 is configured to generate an assembly file from the running instructions. The assembly translation submodule 162 is configured to translate the assembly file into a binary file. The instruction sending submodule 163 is configured to send the binary file to the running device so that the running device executes the running instructions according to the binary file.
In this way, the data amount of the running instructions can be reduced, the time to send them to the running device can be saved, and the conversion and execution speed of the macro instruction can be improved.
In this implementation, after the binary file is sent to the running device, the running device can decode it to obtain the corresponding running instructions, execute them, and obtain the execution result.
In a possible implementation, the running device may be one of, or any combination of, a CPU, a GPU, and an embedded neural network processor (Neural-network Processing Unit, NPU for short). This increases the speed at which the instruction generation device generates running instructions from macro instructions.
In a possible implementation, the instruction generation device 100 may be provided in a CPU and/or an NPU, so that the process of generating running instructions from macro instructions is carried out by the CPU and/or NPU, providing more possible ways of implementing the instruction generation device.
In a possible implementation, the running device 200 further includes a storage module 23. The storage module 23 may include at least one of a register and a cache, and the cache may include a scratchpad cache. The cache may be used to store data, and the register may be used to store scalar data within that data.
In a possible implementation, the control module 21 may include an instruction storage submodule 211 and an instruction processing submodule 212. The instruction storage submodule 211 is configured to store the running instructions. The instruction processing submodule 212 is configured to parse the running instructions to obtain multiple parsed instructions.
In a possible implementation, the control module 21 may further include a storage queue submodule 213. The storage queue submodule 213 is configured to store a running instruction queue containing the running instructions to be executed by the running device and the multiple parsed instructions, all arranged in order of execution.
In a possible implementation, the execution module 22 may further include a dependency processing submodule 221. The dependency processing submodule 221 is configured to, when a first parsed instruction is determined to have an association with a zeroth parsed instruction preceding it, cache the first parsed instruction in the instruction storage submodule, and after execution of the zeroth parsed instruction completes, fetch the first parsed instruction from the instruction storage submodule and send it to the execution module.
Here, the first parsed instruction having an association with the preceding zeroth parsed instruction may include: the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction. Conversely, the absence of an association between the first and zeroth parsed instructions may mean that the first and zeroth storage address intervals have no overlapping region.
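The overlap test above can be sketched as a standard interval-intersection check. The half-open `[start, end)` convention used here is an illustrative assumption; the disclosure does not specify how the address intervals are encoded:

```python
# Hypothetical sketch: two parsed instructions are associated (so the first
# must wait for the zeroth) if their storage address intervals overlap.
# Intervals are half-open [start, end); that convention is an assumption.
def has_dependency(first_interval, zeroth_interval):
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start < z_end and z_start < f_end

print(has_dependency((0x100, 0x200), (0x180, 0x280)))  # True  (overlap)
print(has_dependency((0x100, 0x200), (0x200, 0x300)))  # False (disjoint)
```

When the check returns True, the dependency processing submodule would cache the first parsed instruction until the zeroth finishes; when it returns False, the two instructions can proceed independently.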
需要说明的是,尽管本文以上述实施例作为示例对神经网络指令处理系统进行了如上介绍,但本领域技术人员能够理解,本公开应不限于此。事实上,用户完全可根据个人喜好和/或实际应用场景灵活设定各模块,只要符合本公开的技术方案即可。It should be noted that although the above embodiment is taken as an example to introduce the neural network instruction processing system as described above, those skilled in the art can understand that the present disclosure should not be limited to this. In fact, the user can set various modules flexibly according to personal preferences and / or actual application scenarios, as long as the technical solutions of the present disclosure are met.
图7示出根据本公开一实施例的神经网络指令处理方法的流程图。如图7所示,该方法应用于上述包括指令生成设备和运行设备的神经网络指令处理系统,该方法包括步骤S51和步骤S52。7 shows a flowchart of a neural network instruction processing method according to an embodiment of the present disclosure. As shown in FIG. 7, the method is applied to the neural network instruction processing system including the instruction generation device and the operation device. The method includes step S51 and step S52.
步骤S51:通过指令生成设备根据接收到的宏指令,确定执行宏指令的运行设备,并根据宏指令和运行设备,生成运行指令。Step S51: The instruction generation device determines the operation device that executes the macro instruction according to the received macro instruction, and generates the operation instruction according to the macro instruction and the operation device.
步骤S52:通过运行设备获取数据、神经网络模型以及运行指令,对运行指令进行解析,获得多个解析指令,并根据数据执行多个解析指令,得到执行结果。Step S52: Obtain data, neural network model and operation instruction through the operation device, analyze the operation instruction, obtain multiple analysis instructions, and execute the multiple analysis instructions according to the data to obtain the execution result.
在一种可能的实现方式中,步骤S51可以包括:在确定宏指令中包含指定设备的标识,且指定设备的资源满足执行宏指令的执行条件时,将指定设备确定为运行设备。其中,执行条件可以包括:指定设备中包含与宏指令相对应的指令集。In a possible implementation manner, step S51 may include: when it is determined that the macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution condition of executing the macro instruction, determining the designated device as the running device. The execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
在一种可能的实现方式中,该方法还可以包括:获取备选设备的资源信息。In a possible implementation manner, the method may further include: acquiring resource information of the candidate device.
其中,步骤S51还可以包括:在确定宏指令中不包含指定设备的标识时,根据接收到的宏指令和备选设备的资源信息,从备选设备中确定出用于执行宏指令的运行设备。其中,资源信息可以包括备选设备所包含的指令集。Wherein, step S51 may also include: when it is determined that the macro instruction does not contain the identifier of the designated device, determine the running device for executing the macro instruction from the candidate device based on the received macro instruction and the resource information of the candidate device . The resource information may include the instruction set included in the candidate device.
在一种可能的实现方式中,步骤S51还可以包括:在确定宏指令中包含指定设备的标识,且指定设备的资源不满足执行宏指令的执行条件时,根据宏指令和备选设备的资源信息,确定运行设备。In a possible implementation, step S51 may further include: when it is determined that the macro instruction contains the identifier of the specified device, and the resource of the specified device does not satisfy the execution condition of executing the macro instruction, according to the macro instruction and the resources of the alternative device Information to determine the operating equipment.
在一种可能的实现方式中,宏指令可以包含输入量和输出量中的至少一项,步骤S51中根据宏指令和运行设备,生成运行指令,可以包括:确定宏指令的数据量,根据宏指令的数据量、宏指令和运行设备的资源信息,生成运行指令。其中,数据量是根据输入量和输出量中的至少一项确定的,运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。In a possible implementation manner, the macro instruction may include at least one of the input amount and the output amount. In step S51, generating the operation instruction according to the macro instruction and the operation device may include: determining the data amount of the macro instruction, according to the macro The data volume of the instruction, the macro instruction and the resource information of the operation equipment generate the operation instruction. Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
在一种可能的实现方式中,根据宏指令的数据量、宏指令和运行设备的资源信息,生成运行指令,可以包括:在确定运行设备为一个,且运行设备的资源不满足执行宏指令的容量条件时,根据运行设备的运行数据量和数据量将宏指令拆分成多条运行指令,以使运行设备依次执行多条运行指令。其中,运行设备的运行数据量可以是根据运行设备的资源信息确定的,每条运行指令可以包含运行输入量和运行输出量中的至少一项,运行输入量和运行输出量是根据运行数据量确定的。In a possible implementation, generating the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operation device may include: determining that the operation device is one and the resource of the operation device does not satisfy the execution of the macro instruction Under the capacity condition, the macro instruction is divided into multiple operation instructions according to the operation data amount and data amount of the operation device, so that the operation device executes multiple operation instructions in sequence. The operation data volume of the operation equipment may be determined according to the resource information of the operation equipment, and each operation instruction may include at least one of the operation input quantity and the operation output quantity, and the operation input quantity and the operation output quantity are based on the operation data quantity definite.
In one possible implementation, generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device may include: when it is determined that there are a plurality of running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device. The running data amount of each running device may be determined according to that device's resource information. Each running instruction may include at least one of a running input amount and a running output amount, both of which are determined according to the running data amount of the running device that executes the running instruction.
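The two splitting strategies above can be pictured with a short sketch. It is a minimal illustration only: the function and field names (`run_input_amount`, `input_offset`, and so on) are hypothetical, since the text prescribes the splitting rule but no concrete data layout, and the data amount is assumed to be a count of contiguous elements.

```python
def split_for_one_device(data_amount, run_data_amount):
    # Single running device whose resources cannot cover the macro
    # instruction: emit a sequence of running instructions, each covering
    # at most run_data_amount units, to be executed in order.
    instructions = []
    offset = 0
    while offset < data_amount:
        size = min(run_data_amount, data_amount - offset)
        instructions.append({"input_offset": offset, "run_input_amount": size})
        offset += size
    return instructions

def split_for_many_devices(data_amount, run_data_amounts):
    # Several running devices: split the macro instruction's data amount
    # in proportion to each device's own running data amount, producing
    # one running instruction per device (the last takes the remainder).
    total = sum(run_data_amounts.values())
    devices = list(run_data_amounts)
    instructions, assigned = {}, 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            size = data_amount - assigned
        else:
            size = data_amount * run_data_amounts[dev] // total
        instructions[dev] = {"input_offset": assigned, "run_input_amount": size}
        assigned += size
    return instructions
```

For example, a macro instruction with a data amount of 10 on a single device whose running data amount is 4 would be split into three running instructions covering 4, 4, and 2 units.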
In one possible implementation, the method may further include: sorting, by the instruction generation device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
In one possible implementation, the method may further include: receiving, by the instruction generation device, an instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
In one possible implementation, the method may further include: sending, by the instruction generation device, the running instruction to the running device, so that the running device executes the running instruction.
In one possible implementation, sending the running instruction to the running device through the instruction generation device, so that the running device executes the running instruction, may include:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
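The three dispatch steps above can be sketched as a small pipeline. The assembler, translator, and transport are injected as callables because the text names no concrete toolchain; everything here is a hedged stand-in, not a prescribed implementation.

```python
def dispatch_running_instructions(instructions, assemble, translate, send):
    # running instructions -> assembly file -> binary file -> running device
    assembly_file = assemble(instructions)   # text of the assembly file
    binary_file = translate(assembly_file)   # assembly translated to binary
    send(binary_file)                        # running device executes the binary
    return binary_file

# Usage with stand-in callables (the instruction mnemonics are invented):
sent = []
binary = dispatch_running_instructions(
    ["LOAD a", "COMPUTE b"],
    assemble=lambda ins: "\n".join(ins),
    translate=lambda asm: asm.encode("utf-8"),
    send=sent.append,
)
```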
In one possible implementation, the method may further include: storing, by the running device, the data and the scalar data within the data. The running device includes a storage module, and the storage module includes one or more of a register and a cache, where the cache includes a scratchpad cache. The cache is used to store the data; the register is used to store the scalar data within the data.
In one possible implementation, the method may further include:
storing the running instruction by the running device;
parsing the running instruction by the running device to obtain a plurality of parsed instructions;
storing a running instruction queue by the running device, where the running instruction queue includes the running instruction and the plurality of parsed instructions, arranged in the order in which they are to be executed.
In one possible implementation, the method may further include:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction has a dependency on a zeroth parsed instruction that precedes it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing.
The first parsed instruction has a dependency on the preceding zeroth parsed instruction when the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction.
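The overlap test that defines this dependency can be written directly. A minimal sketch, assuming half-open `[start, end)` address intervals (the text does not specify how interval endpoints are represented):

```python
def has_dependency(first_interval, zeroth_interval):
    # The first parsed instruction depends on the preceding zeroth parsed
    # instruction exactly when the storage address interval of the data it
    # requires overlaps the interval of the data the zeroth one requires.
    first_start, first_end = first_interval
    zeroth_start, zeroth_end = zeroth_interval
    return first_start < zeroth_end and zeroth_start < first_end
```

When this test is true, the running device caches the first parsed instruction and issues it only after the zeroth parsed instruction has completed.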
In one possible implementation, the running device may be one of, or any combination of, a CPU, a GPU, and an NPU.
In one possible implementation, the instruction generation device is provided in the CPU and/or the NPU.
In one possible implementation, the macro instruction may include at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction.
The computation macro instruction may include at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data transfer macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction. The write macro instruction may include at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
In one possible implementation, the macro instruction may include at least one of the following options: the identifier of the designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and instruction parameters. The running instruction may include at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameters.
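The two option lists above can be pictured as two record types. This is a sketch only: the field names are illustrative rather than mandated, and every field is optional because each instruction "may include at least one" of the options.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class MacroInstruction:
    # Options a macro instruction may carry (names are illustrative).
    device_id: Optional[str] = None        # identifier of the designated device
    operation_type: Optional[str] = None   # e.g. a compute / control / transfer type
    input_address: Optional[int] = None
    output_address: Optional[int] = None
    input_amount: Optional[int] = None
    output_amount: Optional[int] = None
    operand: Optional[Any] = None
    instruction_params: dict = field(default_factory=dict)

@dataclass
class RunningInstruction:
    # Per the list above, a running instruction carries the operation type,
    # addresses, operand, and parameters; the device identifier is resolved
    # during instruction generation and so does not appear here.
    operation_type: Optional[str] = None
    input_address: Optional[int] = None
    output_address: Optional[int] = None
    operand: Optional[Any] = None
    instruction_params: dict = field(default_factory=dict)
```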
In the neural network instruction processing method provided by the embodiments of the present disclosure, the instruction generation device determines, according to a received macro instruction, the running device that will execute the macro instruction, and generates a running instruction according to the macro instruction and the running device; the running device obtains data, a neural network model, and the running instruction, parses the running instruction to obtain a plurality of parsed instructions, and executes the plurality of parsed instructions on the data to obtain an execution result. The method can be used across platforms and offers good applicability, fast instruction conversion, high processing efficiency, and a low error rate, while requiring little development labor and material cost.
The foregoing may be better understood in light of the following clauses:
Clause A1. A neural network instruction processing system, the system including an instruction generation device and a running device,
the instruction generation device including:
a device determination module configured to determine, according to a received macro instruction, the running device that will execute the macro instruction;
an instruction generation module configured to generate a running instruction according to the macro instruction and the running device;
the running device including:
a control module configured to obtain required data, a neural network model, and the running instruction, and to parse the running instruction to obtain a plurality of parsed instructions;
an execution module configured to execute the plurality of parsed instructions on the data to obtain an execution result.
Clause A2. The system of Clause A1, wherein the instruction generation device further includes:
a resource obtaining module configured to obtain resource information of candidate devices,
and wherein the device determination module includes at least one of the following sub-modules:
a first determination sub-module configured to determine a designated device as the running device when it is determined that the macro instruction contains an identifier of the designated device and the resources of the designated device satisfy an execution condition for executing the macro instruction;
a second determination sub-module configured to determine, when it is determined that the macro instruction does not contain the identifier of the designated device, the running device for executing the macro instruction from among the candidate devices according to the received macro instruction and the resource information of the candidate devices;
a third determination sub-module configured to determine the running device according to the macro instruction and the resource information of the candidate devices when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause A3. The system of Clause A2, wherein the macro instruction includes at least one of an input amount and an output amount,
the instruction generation module is further configured to determine the data amount of the macro instruction and to generate the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
and wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause A4. The system of Clause A3, wherein the instruction generation module includes at least one of the following sub-modules:
a first instruction generation sub-module configured to split the macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount when it is determined that there is a single running device and its resources do not satisfy the capacity condition for executing the macro instruction, so that the running device executes the plurality of running instructions in sequence,
a second instruction generation sub-module configured to split the macro instruction according to the running data amount of each running device and the data amount when it is determined that there are a plurality of running devices, generating a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of that running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause A5. The system of Clause A1, wherein the instruction generation device further includes:
a queue construction module configured to sort the running instructions according to a queue sorting rule and to construct an instruction queue corresponding to the running device from the sorted running instructions.
Clause A6. The system of Clause A2, wherein the instruction generation device further includes:
a macro instruction generation module configured to receive an instruction to be executed and to generate the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
Clause A7. The system of Clause A1, wherein the instruction generation device further includes:
an instruction dispatch module configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly sub-module configured to generate an assembly file according to the running instruction;
an assembly translation sub-module configured to translate the assembly file into a binary file;
an instruction sending sub-module configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause A8. The system of Clause A1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data within the data.
Clause A9. The system of Clause A1, wherein the control module includes:
an instruction storage sub-module configured to store the running instruction;
an instruction processing sub-module configured to parse the running instruction to obtain the plurality of parsed instructions;
a storage queue sub-module configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions arranged in the order in which they are to be executed.
Clause A10. The system of Clause A9, wherein the execution module includes:
a dependency processing sub-module configured to cache a first parsed instruction in the instruction storage sub-module when it is determined that the first parsed instruction has a dependency on a zeroth parsed instruction that precedes it, and, after the zeroth parsed instruction has finished executing, to fetch the first parsed instruction from the instruction storage sub-module and send it to the execution module,
wherein the first parsed instruction has a dependency on the preceding zeroth parsed instruction when:
the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction.
Clause A11. The system of Clause A1, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is provided in the CPU and/or the NPU;
the macro instruction includes at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction,
wherein the computation macro instruction includes at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction,
the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
and the data transfer macro instruction includes at least one of a read macro instruction and a write macro instruction, the read macro instruction including at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction including at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
the macro instruction includes at least one of the following options: the identifier of the designated device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and instruction parameters,
and the running instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameters.
Clause A12. A machine learning computation apparatus, the apparatus including:
one or more neural network instruction processing systems according to any one of Clauses A1 to A11, configured to obtain data to be computed and control information from other processing apparatuses, perform specified machine learning computations, and pass execution results to the other processing apparatuses through an I/O interface;
when the machine learning computation apparatus includes a plurality of the neural network instruction processing systems, the plurality of neural network instruction processing systems may be connected to one another through a specific structure to transmit data;
wherein the plurality of neural network instruction processing systems are interconnected and transmit data through a fast external device interconnect bus to support larger-scale machine learning computations; the plurality of neural network instruction processing systems share a single control system or have their own control systems; the plurality of neural network instruction processing systems share memory or have their own memories; and the plurality of neural network instruction processing systems are interconnected in an arbitrary interconnection topology.
Clause A13. A combined processing apparatus, the apparatus including:
the machine learning computation apparatus of Clause A12, a universal interconnection interface, and other processing apparatuses;
the machine learning computation apparatus interacting with the other processing apparatuses to jointly complete computation operations specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning computation apparatus and the other processing apparatuses, respectively, and configured to save data of the machine learning computation apparatus and the other processing apparatuses.
Clause A14. A machine learning chip, the chip including:
the machine learning computation apparatus of Clause A12 or the combined processing apparatus of Clause A13.
Clause A15. An electronic device, the device including:
the machine learning chip of Clause A14.
Clause A16. A board card, the board card including: a storage device, an interface apparatus, a control device, and the machine learning chip of Clause A14;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
and the control device is configured to monitor the state of the machine learning chip.
Clause A17. A neural network instruction processing method applied to a neural network instruction processing system, the system including an instruction generation device and a running device, the method including:
determining, by the instruction generation device according to a received macro instruction, the running device that will execute the macro instruction, and generating a running instruction according to the macro instruction and the running device;
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions on the data to obtain an execution result.
Clause A18. The method of Clause A17, further including:
obtaining, by the instruction generation device, resource information of candidate devices,
wherein determining, by the instruction generation device according to the received macro instruction, the running device that will execute the macro instruction includes at least one of the following:
when it is determined that the macro instruction contains an identifier of a designated device and the resources of the designated device satisfy an execution condition for executing the macro instruction, determining the designated device as the running device;
when it is determined that the macro instruction does not contain the identifier of the designated device, determining the running device for executing the macro instruction from among the candidate devices according to the received macro instruction and the resource information of the candidate devices;
when it is determined that the macro instruction contains the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause A19. The method of Clause A18, wherein the macro instruction includes at least one of an input amount and an output amount,
and generating the running instruction according to the macro instruction and the running device includes:
determining the data amount of the macro instruction, and generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause A20. The method of Clause A19, wherein generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device includes at least one of the following:
when it is determined that there is a single running device and its resources do not satisfy the capacity condition for executing the macro instruction, splitting the macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
when it is determined that there are a plurality of running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of that running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause A21. The method of Clause A17, further including:
sorting, by the instruction generation device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
Clause A22. The method of Clause A18, further including:
receiving, by the instruction generation device, an instruction to be executed, and generating the macro instruction according to the determined identifier of the designated device and the instruction to be executed.
Clause A23. The method of Clause A17, further including:
sending, by the instruction generation device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device through the instruction generation device, so that the running device executes the running instruction, includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause A24. The method of Clause A17, further including:
storing, by the running device, the data and the scalar data within the data,
wherein the running device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a scratchpad cache,
the cache being used to store the data;
the register being used to store the scalar data within the data.
Clause A25. The method of Clause A17, further including:
storing the running instruction by the running device;
parsing the running instruction by the running device to obtain the plurality of parsed instructions;
storing a running instruction queue by the running device, the running instruction queue including the running instruction and the plurality of parsed instructions arranged in the order in which they are to be executed.
Clause A26. The method of Clause A25, further including:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction has a dependency on a zeroth parsed instruction that precedes it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing,
wherein the first parsed instruction has a dependency on the preceding zeroth parsed instruction when:
the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction.
条款A27、根据条款A17所述的方法,Clause A27, according to the method described in Clause A17,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述宏指令包括以下指令中的至少一种:The macro instruction includes at least one of the following instructions:
计算宏指令、控制宏指令和数据搬运指令,Calculation macro instruction, control macro instruction and data handling instruction,
其中,所述计算宏指令包括神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种,The calculation macro instruction includes at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
数据搬运宏指令包括读宏指令和写宏指令中的至少一种，所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种，所述写宏指令包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种；The data handling macro instruction includes at least one of a read macro instruction and a write macro instruction, where the read macro instruction includes at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction includes at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
所述宏指令包括以下选项中的至少一项:The macro instruction includes at least one of the following options:
用于执行所述宏指令的指定设备的标识、操作类型、输入地址、输出地址、输入量、输出量、操作数和指令参数,The identification, operation type, input address, output address, input amount, output amount, operand and instruction parameter of the designated device for executing the macro instruction,
所述运行指令包括以下选项中的至少一项:所述操作类型、所述输入地址、所述输出地址、所述操作数和所述指令参数。The run instruction includes at least one of the following options: the operation type, the input address, the output address, the operand, and the instruction parameter.
对于不同的指令上述装置、系统和对应的方法，可以针对指令的不同进行不同的处理，为更清晰的描述针对不同指令上述装置、系统和对应方法的工作过程和原理，可以结合以下条款进行理解。The above devices, systems, and corresponding methods may process different instructions differently. To describe more clearly their working process and principles for each kind of instruction, they can be understood with reference to the following clauses.
对于计算指令,可以通过下述计算指令生成装置、方法,以及计算指令处理系统、方法等相关产品进行处理,依据以下条款可更好地理解前述内容:For calculation instructions, the following calculation instruction generation devices and methods, as well as calculation instruction processing systems and methods, can be used for processing, and the foregoing can be better understood in accordance with the following clauses:
条款C1、一种计算指令生成装置,所述装置包括:Clause C1, a calculation instruction generating device, the device comprising:
设备确定模块,用于根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备;A device determination module, configured to determine a running device that executes the calculation macro instruction based on the received calculation macro instruction;
指令生成模块,用于根据所述计算宏指令和所述运行设备,生成运行指令,An instruction generation module for generating an operation instruction according to the calculation macro instruction and the operation device,
其中,所述计算宏指令是指用于进行数据计算的宏指令,所述数据计算包括机器学习计算、向量逻辑计算、矩阵向量计算、标量计算和标量逻辑计算中的至少一种,The calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
所述计算宏指令包含操作类型、输入地址和输出地址，所述运行指令包含所述操作类型、运行输入地址和运行输出地址，所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address, where the running input address and the running output address are determined according to the input address and the output address, respectively.
条款C2、根据条款C1所述的装置,所述设备确定模块,包括:Clause C2. The apparatus according to Clause C1, the device determination module includes:
第一确定子模块，用于在确定所述计算宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述计算宏指令的执行条件时，将所述指定设备确定为所述运行设备，The first determination submodule is configured to determine the specified device as the running device when it is determined that the calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy the execution condition for executing the calculation macro instruction,
其中,所述执行条件包括所述指定设备中包含与所述计算宏指令相对应的指令集。Wherein, the execution condition includes that the designated device contains an instruction set corresponding to the calculation macro instruction.
条款C3、根据条款C2所述的装置,所述装置还包括:Clause C3. The device according to Clause C2, the device further comprising:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,还包括:The device determination module also includes:
第二确定子模块，用于在确定所述计算宏指令中不包含所述指定设备的标识时，根据接收到的计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述计算宏指令的运行设备，The second determination submodule is configured to, when it is determined that the calculation macro instruction does not contain the identifier of the specified device, determine, from the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款C4、根据条款C3所述的装置,所述设备确定模块,还包括:Clause C4. The apparatus according to Clause C3, the device determination module further includes:
第三确定子模块，在确定所述计算宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述计算宏指令的执行条件时，根据所述计算宏指令和所述备选设备的资源信息，确定运行设备。The third determination submodule is configured to, when it is determined that the calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the calculation macro instruction, determine the running device according to the calculation macro instruction and the resource information of the candidate devices.
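The three determination submodules of Clauses C2 to C4 together form one selection policy: prefer the specified device if it can execute the macro instruction, otherwise fall back to the candidate devices whose resource information (the instruction sets they contain) satisfies the execution condition. A minimal sketch of that policy, under assumed field names (`device_id`, `op_type`, `instruction_sets` are illustrative, not from the claims):

```python
def determine_running_device(macro, candidates):
    """Select a running device for a calculation macro instruction.

    macro: dict that may carry the identifier of a specified device.
    candidates: dict mapping device id -> resource info, where the
    resource info lists the instruction sets that device contains.
    """
    op = macro["op_type"]
    dev_id = macro.get("device_id")
    if dev_id is not None:
        # Clause C2: the specified device satisfies the execution condition
        if op in candidates[dev_id]["instruction_sets"]:
            return dev_id
        # Clause C4: specified device cannot execute it; fall through
    # Clauses C3/C4: choose from candidates by their resource information
    for cid, info in candidates.items():
        if op in info["instruction_sets"]:
            return cid
    raise RuntimeError("no candidate device supports " + op)

devices = {"cpu0": {"instruction_sets": {"SCALAR", "VECTOR"}},
           "npu0": {"instruction_sets": {"CONV", "MATMUL"}}}
print(determine_running_device({"op_type": "CONV"}, devices))  # npu0
```

A macro instruction that names `cpu0` but requests `MATMUL` would be redirected to `npu0`, matching the Clause C4 fallback.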
条款C5、根据条款C2-条款C4任一项所述的装置,所述计算宏指令还包含输入量和输出量中的至少一项,Clause C5. The device according to any one of Clauses C2-C4, the calculation macro instruction further includes at least one of an input quantity and an output quantity,
所述指令生成模块,还用于确定所述计算宏指令的数据量,根据所述计算宏指令的数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is further used to determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款C6、根据条款C5所述的装置,所述指令生成模块,包括:Clause C6. The device according to Clause C5, the instruction generation module includes:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述计算宏指令拆分成多条运行指令，以使所述运行设备依次执行所述多条运行指令，The first instruction generation submodule is configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, split the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
条款C7、根据条款C5所述的装置,所述指令生成模块,包括:Clause C7. The device according to Clause C5, the instruction generation module includes:
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述计算宏指令进行拆分，生成对应于每个运行设备的运行指令，The second instruction generation submodule is configured to, when it is determined that there are multiple running devices, split the calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein the running data amount of each running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
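The splitting in Clauses C6 and C7 amounts to chunking the macro instruction's data amount by the per-run data amount of the target device, advancing the input and output addresses as data is consumed. A sketch under assumed fields (`input_amount`, `input_addr`, `output_addr` are illustrative; the claims do not fix an address layout):

```python
def split_macro(macro, chunk):
    """Split one calculation macro instruction into running instructions
    whose input amounts each fit the device's running data amount 'chunk'.
    Addresses advance by the amount consumed (a simplifying assumption)."""
    instrs = []
    offset = 0
    total = macro["input_amount"]
    while offset < total:
        size = min(chunk, total - offset)
        instrs.append({"op_type": macro["op_type"],
                       "input_addr": macro["input_addr"] + offset,
                       "output_addr": macro["output_addr"] + offset,
                       "input_amount": size})
        offset += size
    return instrs

macro = {"op_type": "VECTOR_ADD", "input_addr": 0x1000,
         "output_addr": 0x2000, "input_amount": 1000}
parts = split_macro(macro, 256)
print(len(parts))                             # 4
print(sum(p["input_amount"] for p in parts))  # 1000
```

For the single-device case (Clause C6) the chunks are executed in sequence on one device; for the multi-device case (Clause C7) each chunk would instead be sized by the running data amount of the device it is assigned to.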
条款C8、根据条款C1所述的装置,所述装置还包括:Clause C8. The device according to Clause C1, the device further comprising:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款C9、根据条款C2所述的装置,所述装置还包括:Clause C9. The device according to Clause C2, the device further comprising:
宏指令生成模块，用于接收待执行计算指令，根据确定的指定设备的标识和所述待执行计算指令生成所述计算宏指令。The macro instruction generation module is configured to receive a calculation instruction to be executed, and generate the calculation macro instruction according to the determined identifier of the specified device and the calculation instruction to be executed.
条款C10、根据条款C1所述的装置,所述装置还包括:Clause C10. The device according to Clause C1, the device further comprising:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
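The dispatch module in Clause C10 is a three-stage pipeline: running instructions are first assembled into text, the text is translated into a binary file, and the binary is sent to the running device. A toy sketch of that flow (the assembly syntax and encoding below are placeholders, not the claimed formats):

```python
def dispatch(run_instrs, send):
    """Clause C10 pipeline: running instructions -> assembly -> binary -> device."""
    # Instruction assembly submodule: render each instruction as one assembly line
    asm = "\n".join("{op_type} {input_addr:#x} {output_addr:#x}".format(**i)
                    for i in run_instrs)
    # Assembly translation submodule: translate the assembly file into a binary file
    binary = asm.encode("utf-8")
    # Instruction sending submodule: deliver the binary to the running device
    send(binary)

sent = []
dispatch([{"op_type": "VECTOR_ADD", "input_addr": 0x1000,
           "output_addr": 0x2000}], sent.append)
print(sent[0].decode())  # VECTOR_ADD 0x1000 0x2000
```

In a real implementation the translation step would emit the running device's machine encoding rather than UTF-8 text, and `send` would be the device's transport interface.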
条款C11、根据条款C1所述的装置,Clause C11, the device according to Clause C1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述装置设置于CPU和/或NPU中;The device is set in the CPU and / or NPU;
所述计算宏指令包括以下指令中的至少一种:神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种;The calculation macro instruction includes at least one of the following instructions: at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction;
所述计算宏指令还包含以下选项中的至少一项:用于执行所述计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数。The calculation macro instruction also includes at least one of the following options: the identification, input amount, output amount, operand, and instruction parameter of the designated device for executing the calculation macro instruction.
条款C12、一种机器学习运算装置,所述装置包括:Clause C12. A machine learning computing device, the device comprising:
一个或多个如条款C1-条款C11任一项所述的计算指令生成装置，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more calculation instruction generation devices according to any one of Clauses C1 to C11, configured to obtain data to be operated on and control information from other processing devices, perform specified machine learning operations, and pass the execution results to other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述计算指令生成装置时,所述多个所述计算指令生成装置间可以通过特定的结构进行连接并传输数据;When the machine learning operation device includes a plurality of the calculation instruction generation devices, the plurality of the calculation instruction generation devices may be connected and transmit data through a specific structure;
其中，多个所述计算指令生成装置通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述计算指令生成装置共享同一控制系统或拥有各自的控制系统；多个所述计算指令生成装置共享内存或者拥有各自的内存；多个所述计算指令生成装置的互联方式是任意互联拓扑。Wherein the multiple calculation instruction generation devices are interconnected and transmit data through a PCIE (peripheral component interconnect express) bus to support larger-scale machine learning operations; the multiple calculation instruction generation devices share the same control system or have their own control systems; they share memory or have their own memories; and the interconnection mode of the multiple calculation instruction generation devices is an arbitrary interconnection topology.
条款C13、一种组合处理装置,所述装置包括:Clause C13. A combined processing device, the device comprising:
如条款C12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing device, general interconnection interface and other processing devices as described in Clause C12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款C14、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款C12所述的机器学习运算装置或如条款C13所述的组合处理装置；Clause C14. A board card, the board card including: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning operation device according to Clause C12 or the combined processing device according to Clause C13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款C15、一种计算指令生成方法,所述方法包括:Clause C15, a method for generating calculation instructions, the method comprising:
根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备;According to the received calculation macro instruction, determine a running device that executes the calculation macro instruction;
根据所述计算宏指令和所述运行设备,生成运行指令,Generate an operation instruction according to the calculation macro instruction and the operation device,
其中,所述计算宏指令是指用于进行数据计算的宏指令,所述数据计算包括机器学习计算、向量逻辑计算、矩阵向量计算、标量计算和标量逻辑计算中的至少一种,The calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
所述计算宏指令包含操作类型、输入地址和输出地址，所述运行指令包含所述操作类型、运行输入地址和运行输出地址，所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address, where the running input address and the running output address are determined according to the input address and the output address, respectively.
条款C16、根据条款C15所述的方法,根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备,包括:Clause C16. According to the method described in Clause C15, based on the received calculation macro instruction, determining an operating device that executes the calculation macro instruction includes:
在确定所述计算宏指令包括指定设备的标识,且所述指定设备的资源满足执行所述计算宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the calculation macro instruction includes the identification of the designated device, and the resource of the designated device satisfies the execution condition for executing the calculation macro instruction, determining the designated device as the running device,
其中,所述执行条件包括所述指定设备中包含与所述计算宏指令相对应的指令集。Wherein, the execution condition includes that the designated device contains an instruction set corresponding to the calculation macro instruction.
条款C17、根据条款C16所述的方法,所述方法还包括:Clause C17. The method according to Clause C16, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备,包括:Wherein, according to the received calculation macro instruction, determining a running device that executes the calculation macro instruction includes:
在确定所述计算宏指令中不包含所述指定设备的标识时，根据接收到的计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述计算宏指令的运行设备，When it is determined that the calculation macro instruction does not contain the identifier of the specified device, determining, from the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款C18、根据条款C17所述的方法,根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备,包括:Clause C18. According to the method described in Clause C17, based on the received calculation macro instruction, determining the operating device that executes the calculation macro instruction includes:
在确定所述计算宏指令包括所述指定设备的标识，且所述指定设备的资源不满足执行所述计算宏指令的执行条件时，根据所述计算宏指令和所述备选设备的资源信息，确定运行设备。When it is determined that the calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the calculation macro instruction, determining the running device according to the calculation macro instruction and the resource information of the candidate devices.
条款C19、根据条款C16-条款C18任一项所述的方法,所述计算宏指令还包含输入量和输出量中的至少一项,Clause C19, the method according to any one of Clause C16-C18, the calculation macro instruction further includes at least one of an input quantity and an output quantity,
根据所述计算宏指令和所述运行设备,生成运行指令,包括:Generating an operation instruction according to the calculation macro instruction and the operation device, including:
确定所述计算宏指令的数据量,根据所述数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount, the calculation macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款C20、根据条款C19所述的方法,根据所述计算宏指令的数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause C20. According to the method described in Clause C19, generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the operation device includes:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，When it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, splitting the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
条款C21、根据条款C19所述的方法,根据所述计算宏指令的数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause C21. According to the method described in Clause C19, generating an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the operation device includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述计算宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operation devices, the calculation macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein the running data amount of each running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款C22、根据条款C15所述的方法,所述方法还包括:Clause C22. The method according to Clause C15, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款C23、根据条款C16所述的方法,所述方法还包括:Clause C23. The method according to Clause C16, the method further comprising:
接收待执行计算指令,根据确定的指定设备的标识和所述待执行计算指令生成所述计算宏指令。Receiving the calculation instruction to be executed, and generating the calculation macro instruction according to the determined identifier of the designated device and the calculation instruction to be executed.
条款C24、根据条款C15所述的方法,所述方法还包括:Clause C24. The method according to Clause C15, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款C25、根据条款C15所述的方法,Clause C25, according to the method described in Clause C15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述计算宏指令包括以下指令中的至少一种:神经网络计算宏指令、向量逻辑计算宏指令、矩阵向量计算宏指令、标量计算宏指令和标量逻辑计算宏指令中的至少一种,The calculation macro instruction includes at least one of the following instructions: at least one of a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction,
所述计算宏指令还包含以下选项中的至少一项:用于执行所述计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数。The calculation macro instruction also includes at least one of the following options: the identification, input amount, output amount, operand, and instruction parameter of the designated device for executing the calculation macro instruction.
条款D1、一种计算指令处理系统,所述系统包括指令生成设备和运行设备,Clause D1, a computing instruction processing system, the system includes an instruction generating device and an operating device,
所述指令生成设备,包括:The instruction generating device includes:
设备确定模块,用于根据接收到的计算宏指令,确定执行所述计算宏指令的运行设备;A device determination module, configured to determine a running device that executes the calculation macro instruction based on the received calculation macro instruction;
指令生成模块,用于根据所述计算宏指令和所述运行设备,生成运行指令;An instruction generation module, configured to generate an operation instruction according to the calculation macro instruction and the operation device;
所述运行设备,包括:The operation equipment includes:
控制模块,用于获取所需数据、神经网络模型以及所述运行指令,对所述运行指令进行解析,获得多个解析指令;The control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
执行模块,用于根据所述数据执行所述多个解析指令,得到执行结果,An execution module, configured to execute the multiple parsing instructions based on the data to obtain an execution result,
其中,所述计算宏指令是指用于进行数据计算的宏指令,所述数据计算包括机器学习计算、向量逻辑计算、矩阵向量计算、标量计算和标量逻辑计算中的至少一种,The calculation macro instruction refers to a macro instruction used for data calculation, and the data calculation includes at least one of machine learning calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation,
所述计算宏指令包含操作类型、输入地址和输出地址，所述运行指令包含所述操作类型、运行输入地址和运行输出地址，所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address, where the running input address and the running output address are determined according to the input address and the output address, respectively.
条款D2、根据条款D1所述的系统,所述指令生成设备还包括:Clause D2. The system according to Clause D1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块，用于在确定所述计算宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述计算宏指令的执行条件时，将所述指定设备确定为所述运行设备；The first determination submodule is configured to determine the specified device as the running device when it is determined that the calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy the execution condition for executing the calculation macro instruction;
第二确定子模块，用于在确定所述计算宏指令中不包含所述指定设备的标识时，根据接收到的计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述计算宏指令的运行设备；The second determination submodule is configured to, when it is determined that the calculation macro instruction does not contain the identifier of the specified device, determine, from the candidate devices, a running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices;
第三确定子模块，在确定所述计算宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述计算宏指令的执行条件时，根据所述计算宏指令和所述备选设备的资源信息，确定运行设备，The third determination submodule is configured to, when it is determined that the calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the calculation macro instruction, determine the running device according to the calculation macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
条款D3、根据条款D2所述的系统,所述计算宏指令还包含输入量和输出量中的至少一项,Clause D3. The system according to Clause D2, the calculation macro instruction further includes at least one of an input quantity and an output quantity,
所述指令生成模块,还用于确定所述计算宏指令的数据量,根据所述计算宏指令的数据量、所述计算宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is further used to determine the data amount of the calculation macro instruction, and generate an operation instruction according to the data amount of the calculation macro instruction, the calculation macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款D4、根据条款D3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause D4. The system according to Clause D3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，The first instruction generation submodule is configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, split the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述计算宏指令进行拆分，生成对应于每个运行设备的运行指令，The second instruction generation submodule is configured to, when it is determined that there are multiple running devices, split the calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein the running data amount of a running device is determined according to the resource information of the running device; the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款D5、根据条款D1所述的系统,所述指令生成设备还包括:Clause D5. The system according to Clause D1, the instruction generating device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款D6、根据条款D2所述的系统,所述指令生成设备还包括:Clause D6. The system according to Clause D2, the instruction generating device further includes:
宏指令生成模块，用于接收待执行计算指令，根据确定的指定设备的标识和所述待执行计算指令生成所述计算宏指令。The macro instruction generation module is configured to receive a calculation instruction to be executed, and generate the calculation macro instruction according to the determined identifier of the specified device and the calculation instruction to be executed.
Clause D7. The system according to Clause D1, wherein the instruction generating device further includes:
an instruction dispatching module, configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule, configured to generate an assembly file according to the running instruction;
an assembly translating submodule, configured to translate the assembly file into a binary file; and
an instruction sending submodule, configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
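The three submodules above form an assemble-translate-send pipeline. A toy end-to-end sketch, under the assumption of a trivial text format and encoding (a real assembler and device transport would replace both stand-ins):

```python
# Hypothetical dispatch pipeline: running instructions -> assembly text ->
# binary blob -> running device. The instruction format and the utf-8
# "translation" are assumptions for illustration only.

def to_assembly(run_instructions):
    """Instruction assembling submodule: render instructions as assembly text."""
    return "\n".join(f'{ins["op"]} {ins["src"]}, {ins["dst"]}'
                     for ins in run_instructions)

def to_binary(assembly_text):
    """Assembly translating submodule: stand-in for a real assembler."""
    return assembly_text.encode("utf-8")

def dispatch(binary_blob, device):
    """Instruction sending submodule: stand-in for an I/O interface send."""
    device.append(binary_blob)

device_mailbox = []  # stands in for the running device's receive buffer
asm = to_assembly([{"op": "CONV", "src": "0x1000", "dst": "0x2000"}])
dispatch(to_binary(asm), device_mailbox)
```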
Clause D8. The system according to Clause D1, wherein the running device further includes:
a storage module including one or more of a register and a cache, the cache including a scratchpad cache,
the cache being configured to store the data, and
the register being configured to store scalar data within the data.
Clause D9. The system according to Clause D1, wherein the control module includes:
an instruction storage submodule, configured to store the running instruction;
an instruction processing submodule, configured to parse the running instruction to obtain the plurality of parsed instructions; and
a storage queue submodule, configured to store a running instruction queue including the running instruction and the plurality of parsed instructions, arranged in the order in which they are to be executed.
Clause D10. The system according to Clause D9, wherein the execution module includes:
a dependency processing submodule, configured to, when it is determined that a first parsed instruction is associated with a zeroth parsed instruction preceding it, cache the first parsed instruction in the instruction storage submodule and, after the zeroth parsed instruction has finished executing, fetch the first parsed instruction from the instruction storage submodule and send it to the execution module,
wherein the association between the first parsed instruction and the preceding zeroth parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
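The association test above reduces to an interval-overlap check on storage addresses. A minimal sketch, assuming half-open [start, end) intervals (the clause does not specify the interval convention):

```python
# Sketch of the dependency test in Clause D10: a first parsed instruction
# depends on a zeroth one when their required storage address intervals
# overlap. The [start, end) representation is an assumption.

def intervals_overlap(a, b):
    """True if half-open intervals a = (start, end) and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def has_dependency(first_instr, zeroth_instr):
    return intervals_overlap(first_instr["addr_range"],
                             zeroth_instr["addr_range"])

dep = has_dependency({"addr_range": (0x100, 0x200)},
                     {"addr_range": (0x180, 0x280)})
# overlapping region [0x180, 0x200): the first instruction would be cached
# until the zeroth finishes executing
```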
Clause D11. The system according to Clause D1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is provided in the CPU and/or the NPU;
the calculation macro instruction includes at least one of: a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix-vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction; and
the calculation macro instruction includes at least one of the following options: an identifier of the specified device for executing the calculation macro instruction, an input amount, an output amount, operands, and instruction parameters.
Clause D12. A machine learning operation apparatus, including:
one or more calculation instruction processing systems according to any one of Clauses D1 to D11, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transfer the execution result to the other processing apparatuses through an I/O interface,
wherein, when the machine learning operation apparatus includes a plurality of the calculation instruction processing systems, the plurality of calculation instruction processing systems may be connected to, and transmit data to, one another through a specific structure,
wherein the plurality of calculation instruction processing systems are interconnected and transmit data through a PCIe (peripheral component interconnect express) bus to support larger-scale machine learning operations; the plurality of calculation instruction processing systems share one control system or have their own control systems; the plurality of calculation instruction processing systems share memory or have their own memories; and the interconnection of the plurality of calculation instruction processing systems may be any interconnection topology.
Clause D13. A combined processing apparatus, including:
the machine learning operation apparatus according to Clause D12, a universal interconnection interface, and another processing apparatus,
the machine learning operation apparatus interacting with the other processing apparatus to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatus respectively, and configured to save data of the machine learning operation apparatus and of the other processing apparatus.
Clause D14. A board card, including: a storage device, an interface apparatus, a control device, and a machine learning chip, the machine learning chip including the machine learning operation apparatus according to Clause D12 or the combined processing apparatus according to Clause D13,
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor a state of the machine learning chip.
Clause D15. A calculation instruction processing method, applied to a calculation instruction processing system including an instruction generating device and a running device, the method including:
determining, by the instruction generating device according to a received calculation macro instruction, a running device for executing the calculation macro instruction, and generating a running instruction according to the calculation macro instruction and the running device; and
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions on the data to obtain an execution result,
wherein the calculation macro instruction is a macro instruction for performing data calculation, the data calculation including at least one of machine learning calculation, vector logic calculation, matrix-vector calculation, scalar calculation, and scalar logic calculation, and
the calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address respectively.
Clause D16. The method according to Clause D15, further including:
obtaining, by the instruction generating device, resource information of candidate devices,
wherein determining, by the instruction generating device according to the received calculation macro instruction, the running device for executing the calculation macro instruction includes at least one of:
when it is determined that the calculation macro instruction includes an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the calculation macro instruction, determining the specified device as the running device;
when it is determined that the calculation macro instruction does not include the identifier of the specified device, determining, from among the candidate devices, the running device for executing the calculation macro instruction according to the received calculation macro instruction and the resource information of the candidate devices; and
when it is determined that the calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the calculation macro instruction, determining the running device according to the calculation macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the specified device containing an instruction set corresponding to the calculation macro instruction; and the resource information includes the instruction set contained in each candidate device.
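The three branches of the selection rule above can be sketched as one function. Data shapes and names are assumptions; the only logic taken from the text is "prefer the specified device if it has the required instruction set, otherwise fall back to the candidate devices' resource information":

```python
# Illustrative device-selection sketch for the rule in Clause D16.
# `candidates` maps a device name to the set of instruction sets it contains
# (its resource information); the macro may name a specified device.

def choose_running_device(macro, candidates):
    needed = macro["instruction_set"]
    specified = macro.get("specified_device")
    # Branch 1: specified device present and satisfies the execution condition.
    if specified is not None and needed in candidates.get(specified, set()):
        return specified
    # Branches 2 and 3: no specified device, or its resources fall short;
    # pick a candidate whose instruction sets include the required one.
    for device, isa in candidates.items():
        if needed in isa:
            return device
    return None

candidates = {"cpu0": {"scalar"}, "npu0": {"neural", "vector"}}
chosen = choose_running_device(
    {"instruction_set": "neural", "specified_device": "cpu0"}, candidates)
# cpu0 lacks the "neural" instruction set, so selection falls back to npu0
```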
Clause D17. The method according to Clause D16, wherein the calculation macro instruction further includes at least one of an input amount and an output amount, and
generating the running instruction according to the calculation macro instruction and the running device includes:
determining a data amount of the calculation macro instruction, and generating the running instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause D18. The method according to Clause D17, wherein generating the running instruction according to the data amount of the calculation macro instruction, the calculation macro instruction, and the resource information of the running device includes at least one of:
when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the calculation macro instruction, splitting the calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence; and
when it is determined that there are a plurality of running devices, splitting the calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of the running device, the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause D19. The method according to Clause D15, further including:
sorting, by the instruction generating device, the running instructions according to a queue ordering rule, and constructing, from the sorted running instructions, an instruction queue corresponding to the running device.
Clause D20. The method according to Clause D16, further including:
receiving, by the instruction generating device, a calculation instruction to be executed, and generating the calculation macro instruction according to the determined identifier of the specified device and the calculation instruction to be executed.
Clause D21. The method according to Clause D15, further including:
sending, by the instruction generating device, the running instruction to the running device so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause D22. The method according to Clause D15, further including:
storing, by the running device, the data and the scalar data within the data,
wherein the running device includes a storage module including one or more of a register and a cache, the cache including a scratchpad cache,
the cache being configured to store the data, and
the register being configured to store the scalar data within the data.
Clause D23. The method according to Clause D15, further including:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions; and
storing, by the running device, a running instruction queue including the running instruction and the plurality of parsed instructions, arranged in the order in which they are to be executed.
Clause D24. The method according to Clause D23, further including:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction is associated with a zeroth parsed instruction preceding it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing,
wherein the association between the first parsed instruction and the preceding zeroth parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause D25. The method according to Clause D15, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is provided in the CPU and/or the NPU;
the calculation macro instruction includes at least one of: a neural network calculation macro instruction, a vector logic calculation macro instruction, a matrix-vector calculation macro instruction, a scalar calculation macro instruction, and a scalar logic calculation macro instruction; and
the calculation macro instruction further includes at least one of the following options: an identifier of the specified device for executing the calculation macro instruction, an input amount, an output amount, operands, and instruction parameters.
Neural network calculation instructions can be processed by the calculation instruction generating apparatus and method described below, and by related products such as the neural network calculation instruction processing system and method. The foregoing can be better understood with reference to the following clauses:
Clause E1. A neural network calculation instruction generating apparatus, including:
a device determining module, configured to determine, according to a received neural network calculation macro instruction, a running device for executing the neural network calculation macro instruction; and
an instruction generating module, configured to generate a running instruction according to the neural network calculation macro instruction and the running device,
wherein the neural network calculation macro instruction is a macro instruction for computing a neural network algorithm, and
the neural network calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address respectively.
Clause E2. The apparatus according to Clause E1, wherein the device determining module includes:
a first determining submodule, configured to, when it is determined that the neural network calculation macro instruction includes an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the neural network calculation macro instruction, determine the specified device as the running device,
wherein the execution condition includes: the specified device containing an instruction set corresponding to the neural network calculation macro instruction.
Clause E3. The apparatus according to Clause E2, further including:
a resource obtaining module, configured to obtain resource information of candidate devices,
wherein the device determining module further includes:
a second determining submodule, configured to, when it is determined that the neural network calculation macro instruction does not include the identifier of the specified device, determine, from among the candidate devices, the running device for executing the neural network calculation macro instruction according to the received neural network calculation macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
Clause E4. The apparatus according to Clause E3, wherein the device determining module further includes:
a third determining submodule, configured to, when it is determined that the neural network calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the neural network calculation macro instruction, determine the running device according to the neural network calculation macro instruction and the resource information of the candidate devices.
Clause E5. The apparatus according to any one of Clauses E2 to E4, wherein the neural network calculation macro instruction further includes at least one of an input amount and an output amount,
the instruction generating module is further configured to determine a data amount of the neural network calculation macro instruction and to generate the running instruction according to the data amount of the neural network calculation macro instruction, the neural network calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause E6. The apparatus according to Clause E5, wherein the instruction generating module includes:
a first instruction generating submodule, configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macro instruction, split the neural network calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
Clause E7. The apparatus according to Clause E5, wherein the instruction generating module includes:
a second instruction generating submodule, configured to, when it is determined that there are a plurality of running devices, split the neural network calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause E8. The apparatus according to Clause E1, further including:
a queue construction module, configured to sort the running instructions according to a queue ordering rule and to construct, from the sorted running instructions, an instruction queue corresponding to the running device.
Clause E9. The apparatus according to Clause E2, further including:
a macro instruction generating module, configured to receive a neural network calculation instruction to be executed and to generate the neural network calculation macro instruction according to the determined identifier of the specified device and the neural network calculation instruction to be executed.
Clause E10. The apparatus according to Clause E1, further including:
an instruction dispatching module, configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule, configured to generate an assembly file according to the running instruction;
an assembly translating submodule, configured to translate the assembly file into a binary file; and
an instruction sending submodule, configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause E11. The apparatus according to Clause E1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is provided in the CPU and/or the NPU;
the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction; and
the neural network calculation macro instruction further includes at least one of the following options: an identifier of the specified device for executing the neural network calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, the instruction parameters including at least one of a size of a convolution kernel, a stride of the convolution kernel, and a padding of the convolution kernel.
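The instruction parameters listed above (kernel size, stride, padding) determine the output amount of a convolution calculation macro instruction. The standard sizing formula, shown here for illustration (it is textbook convolution arithmetic, not a formula quoted from this patent text):

```python
# Per-dimension output size of a convolution, given the instruction
# parameters named in Clause E11: kernel size, stride, and padding.
# output = floor((input + 2*padding - kernel) / stride) + 1

def conv_output_size(input_size, kernel, stride, padding):
    return (input_size + 2 * padding - kernel) // stride + 1

# A 3x3 kernel with stride 1 and padding 1 preserves a 32-wide input
out_same = conv_output_size(input_size=32, kernel=3, stride=1, padding=1)  # 32
# A 5x5 kernel with stride 1 and no padding shrinks a 28-wide input
out_valid = conv_output_size(input_size=28, kernel=5, stride=1, padding=0)  # 24
```

Such a formula is one way an instruction generating module could derive the output amount from the input amount and the instruction parameters carried in the macro instruction.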
Clause E12. A machine learning operation apparatus, including:
one or more neural network calculation instruction generating apparatuses according to any one of Clauses E1 to E11, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transfer the execution result to the other processing apparatuses through an I/O interface,
wherein, when the machine learning operation apparatus includes a plurality of the neural network calculation instruction generating apparatuses, the plurality of neural network calculation instruction generating apparatuses may be connected to, and transmit data to, one another through a specific structure,
wherein the plurality of neural network calculation instruction generating apparatuses are interconnected and transmit data through a PCIe (peripheral component interconnect express) bus to support larger-scale machine learning operations; the plurality of neural network calculation instruction generating apparatuses share one control system or have their own control systems; the plurality of neural network calculation instruction generating apparatuses share memory or have their own memories; and the interconnection of the plurality of neural network calculation instruction generating apparatuses may be any interconnection topology.
Clause E13. A combined processing apparatus, including:
the machine learning operation apparatus according to Clause E12, a universal interconnection interface, and another processing apparatus,
the machine learning operation apparatus interacting with the other processing apparatus to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatus respectively, and configured to save data of the machine learning operation apparatus and of the other processing apparatus.
Clause E14. A board card, including: a storage device, an interface apparatus, a control device, and a machine learning chip, the machine learning chip including the machine learning operation apparatus according to Clause E12 or the combined processing apparatus according to Clause E13,
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor a state of the machine learning chip.
条款E15、一种神经网络计算指令生成方法,所述方法包括:Clause E15. A neural network computing instruction generation method, the method comprising:
根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备;According to the received neural network calculation macroinstruction, determine a running device that executes the neural network calculation macroinstruction;
根据所述神经网络计算宏指令和所述运行设备，生成运行指令，Generating a running instruction according to the neural network calculation macroinstruction and the running device,
其中,所述神经网络计算宏指令是指用于计算神经网络算法的宏指令,Wherein, the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm,
所述神经网络计算宏指令包含操作类型、输入地址和输出地址，所述运行指令包含所述操作类型、运行输入地址和运行输出地址，所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The neural network calculation macroinstruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address, respectively.
条款E16、根据条款E15所述的方法,根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备,包括:Clause E16. According to the method described in Clause E15, according to the received neural network calculation macroinstruction, determining a running device that executes the neural network calculation macroinstruction includes:
在确定所述神经网络计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述神经网络计算宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the neural network computing macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the neural network computing macro instruction, determining the designated device as the running device,
其中,所述执行条件包括:所述指定设备中包含与所述神经网络计算宏指令相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the neural network calculation macro instruction.
条款E17、根据条款E16所述的方法,所述方法还包括:Clause E17. The method according to Clause E16, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备,包括:Wherein, according to the received neural network calculation macro instruction, determining a running device for executing the neural network calculation macro instruction includes:
在确定所述神经网络计算宏指令中不包含所述指定设备的标识时，根据接收到的神经网络计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述神经网络计算宏指令的运行设备，When it is determined that the neural network calculation macroinstruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the neural network calculation macroinstruction according to the received neural network calculation macroinstruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款E18、根据条款E17所述的方法,根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备,包括:Clause E18. According to the method described in Clause E17, based on the received neural network calculation macroinstruction, determining a running device that executes the neural network calculation macroinstruction includes:
在确定所述神经网络计算宏指令包括所述指定设备的标识，且所述指定设备的资源不满足执行所述神经网络计算宏指令的执行条件时，根据所述神经网络计算宏指令和所述备选设备的资源信息，确定运行设备。When it is determined that the neural network calculation macroinstruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the neural network calculation macroinstruction, determining the running device according to the neural network calculation macroinstruction and the resource information of the candidate devices.
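Clauses E16 to E18 together describe a three-way device-selection rule: use the designated device if it is named and qualified, otherwise fall back to the candidate devices' resource information. The following is a minimal illustrative sketch, not the claimed implementation; all field names (`device_id`, `instruction_sets`, `required_set`) are assumptions made for the example.

```python
# Hypothetical sketch of the device-selection rule in Clauses E16-E18.
# All dictionary keys are illustrative assumptions, not the patented format.

def select_running_device(macro, candidates):
    """Pick the running device for a neural network calculation macroinstruction.

    macro:      dict with a "required_set" key and an optional "device_id"
                (the identifier of the designated device)
    candidates: list of dicts, each with "device_id" and "instruction_sets"
    """
    required = macro["required_set"]

    def satisfies(device):
        # Execution condition: the device contains the instruction set
        # corresponding to the macroinstruction.
        return required in device["instruction_sets"]

    designated_id = macro.get("device_id")
    if designated_id is not None:
        designated = next(
            (d for d in candidates if d["device_id"] == designated_id), None)
        if designated is not None and satisfies(designated):
            # E16: designated device is named and meets the execution condition.
            return designated

    # E17/E18: no usable designated device - determine the running device
    # from the candidate devices' resource information instead.
    for device in candidates:
        if satisfies(device):
            return device
    return None
```

A designated device that lacks the required instruction set is thus silently replaced by any qualifying candidate, mirroring the fallback described in Clause E18.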
条款E19、根据条款E16-条款E18任一项所述的方法,所述神经网络计算宏指令还包含输入量和输出量中的至少一项,Clause E19. The method according to any one of Clause E16-Clause E18, the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity,
根据所述神经网络计算宏指令和所述运行设备,生成运行指令,包括:According to the neural network calculation macro instruction and the operation device, generating an operation instruction includes:
确定所述神经网络计算宏指令的数据量，根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息，生成运行指令，Determining the data amount of the neural network calculation macroinstruction, and generating the running instruction according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction, and the resource information of the running device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款E20、根据条款E19所述的方法,根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause E20. According to the method described in Clause E19, according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device, generating the running instruction includes:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述神经网络计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述神经网络计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，When it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macroinstruction, splitting the neural network calculation macroinstruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
其中,所述运行设备的运行数据量是根据所述运行设备的资源信息确定的,每条运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein, the operation data amount of the operation device is determined according to the resource information of the operation device, each operation instruction further includes at least one of operation input amount and operation output amount, the operation input amount and the operation The amount of output is determined according to the amount of operating data.
条款E21、根据条款E19所述的方法,根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause E21. According to the method described in Clause E19, according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device, generating the running instruction includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述神经网络计算宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operating devices, the neural network calculation macro instruction is split according to the operating data amount of each operating device and the data amount to generate an operating instruction corresponding to each operating device,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
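The splitting described in Clauses E20 and E21 can be sketched as chunking the macroinstruction's data amount by each device's run data amount. The sketch below is illustrative only: it assumes addresses are plain integer offsets, the data amount is a flat element count, and input and output advance in lockstep, none of which the clauses mandate.

```python
# Illustrative sketch of splitting a macroinstruction by data amount
# (Clauses E20/E21). Address arithmetic and field names are assumptions.

def split_macro(op_type, in_addr, out_addr, total, chunk):
    """Split one macroinstruction into a list of running instructions.

    total: data amount of the neural network calculation macroinstruction
    chunk: run data amount the device can process per running instruction
    Each running instruction carries the operation type together with a
    running input address, running output address and running amount.
    """
    instructions = []
    offset = 0
    while offset < total:
        amount = min(chunk, total - offset)  # last piece may be smaller
        instructions.append({
            "op": op_type,
            "run_in_addr": in_addr + offset,
            "run_out_addr": out_addr + offset,
            "run_amount": amount,
        })
        offset += amount
    return instructions
```

For the single-device case (Clause E20) the resulting list is executed in sequence by one device; for the multi-device case (Clause E21) each piece would instead be sized by the corresponding device's own run data amount.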
条款E22、根据条款E15所述的方法,所述方法还包括:Clause E22. The method according to Clause E15, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款E23、根据条款E16所述的方法,所述方法还包括:Clause E23. The method according to Clause E16, the method further comprising:
接收待执行神经网络计算指令,根据确定的指定设备的标识和所述待执行神经网络计算指令生成所述神经网络计算宏指令。The neural network calculation instruction to be executed is received, and the neural network calculation macro instruction is generated according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
条款E24、根据条款E15所述的方法,所述方法还包括:Clause E24. The method according to Clause E15, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
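The dispatch path in Clause E24 (running instruction, then assembly file, then binary file sent to the running device) can be caricatured as a two-stage pipeline. The textual mnemonic format and the byte encoding below are invented purely for illustration; the patent does not specify either.

```python
# Toy dispatch pipeline for Clause E24: running instruction -> assembly
# text -> binary bytes. Mnemonics and encoding are invented assumptions.
import struct

OPCODES = {"CONV": 1, "POOL": 2}  # assumed opcode table

def to_assembly(instr):
    # One assembly line per running instruction: OP in_addr, out_addr, amount
    return "{op} {run_in_addr:#x}, {run_out_addr:#x}, {run_amount}".format(**instr)

def assemble(line):
    # "Translate the assembly file into a binary file": here one
    # fixed-width little-endian record per line (1 opcode byte + 3 x u32).
    op, rest = line.split(" ", 1)
    in_addr, out_addr, amount = (int(tok.strip(), 0) for tok in rest.split(","))
    return struct.pack("<BIII", OPCODES[op], in_addr, out_addr, amount)
```

The binary record would then be what is "sent to the running device"; the running device's decode step is the inverse `struct.unpack`.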
条款E25、根据条款E15所述的方法,Clause E25, according to the method described in Clause E15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述神经网络计算宏指令包括卷积计算宏指令、池化计算宏指令中的至少一种,The neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooled calculation macro instruction,
所述神经网络计算宏指令还包含以下选项中的至少一项：用于执行所述神经网络计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数，所述指令参数包括卷积核的大小、所述卷积核的步长和所述卷积核的填充中的至少一种。The neural network calculation macroinstruction further includes at least one of the following options: an identifier of a designated device for executing the neural network calculation macroinstruction, an input amount, an output amount, an operand, and instruction parameters, where the instruction parameters include at least one of the size of a convolution kernel, the stride of the convolution kernel, and the padding of the convolution kernel.
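The fields enumerated in Clause E25 can be grouped into a simple record. The sketch below is only one possible grouping; the field names, types, and defaults are assumptions, not the claimed instruction encoding.

```python
# Illustrative record for the macroinstruction fields named in Clause E25.
# Names, types and defaults are assumptions for the example.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ConvMacroInstruction:
    op_type: str                        # operation type, e.g. "CONV" or "POOL"
    input_addr: int                     # input address
    output_addr: int                    # output address
    device_id: Optional[str] = None     # identifier of the designated device
    input_amount: Optional[int] = None  # input amount (optional per E25)
    output_amount: Optional[int] = None # output amount (optional per E25)
    # Instruction parameters: kernel size, stride, padding (assumed layout).
    kernel_size: Tuple[int, int] = (3, 3)
    stride: Tuple[int, int] = (1, 1)
    padding: Tuple[int, int] = (0, 0)
```

Only `op_type`, `input_addr` and `output_addr` are mandatory here, matching the clause's statement that the remaining fields are optional "at least one of" items.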
条款F1、一种神经网络计算指令处理系统,所述系统包括指令生成设备和运行设备,Clause F1, a neural network computing instruction processing system, the system includes an instruction generating device and an operating device,
所述指令生成设备,包括:The instruction generating device includes:
设备确定模块,用于根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备;A device determination module, configured to determine a running device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction;
指令生成模块,用于根据所述神经网络计算宏指令和所述运行设备,生成运行指令;An instruction generation module, configured to calculate a macro instruction and the operation device according to the neural network, and generate an operation instruction;
所述运行设备,包括:The operation equipment includes:
控制模块，用于获取所需数据、神经网络模型以及所述运行指令，对所述运行指令进行解析，获得多个解析指令；A control module, configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsing instructions;
执行模块,用于根据所述数据执行所述多个解析指令,得到执行结果,An execution module, configured to execute the multiple parsing instructions based on the data to obtain an execution result,
其中,所述神经网络计算宏指令是指用于计算神经网络算法的宏指令,Wherein, the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm,
所述神经网络计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The neural network calculation macro instruction includes the operation type, input address and output address, and the operation instruction includes the operation type, operation input address and operation output address, and the operation input address and the operation output address are respectively The input address and the output address are determined.
条款F2、根据条款F1所述的系统,所述指令生成设备还包括:Clause F2. The system according to Clause F1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块，用于在确定所述神经网络计算宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述神经网络计算宏指令的执行条件时，将所述指定设备确定为所述运行设备；A first determination sub-module, configured to determine the designated device as the running device when it is determined that the neural network calculation macroinstruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the neural network calculation macroinstruction;
第二确定子模块，用于在确定所述神经网络计算宏指令中不包含所述指定设备的标识时，根据接收到的神经网络计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述神经网络计算宏指令的运行设备；A second determination sub-module, configured to, when it is determined that the neural network calculation macroinstruction does not include the identifier of the designated device, determine, from the candidate devices, a running device for executing the neural network calculation macroinstruction according to the received neural network calculation macroinstruction and the resource information of the candidate devices;
第三确定子模块，在确定所述神经网络计算宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述神经网络计算宏指令的执行条件时，根据所述神经网络计算宏指令和所述备选设备的资源信息，确定运行设备，A third determination sub-module, configured to determine the running device according to the neural network calculation macroinstruction and the resource information of the candidate devices when it is determined that the neural network calculation macroinstruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the neural network calculation macroinstruction,
其中,所述执行条件包括:所述指定设备中包含与所述神经网络计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the neural network calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
条款F3、根据条款F2所述的系统,所述神经网络计算宏指令还包含输入量和输出量中的至少一项,Clause F3. The system according to Clause F2, wherein the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity,
所述指令生成模块，还用于确定所述神经网络计算宏指令的数据量，根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息，生成运行指令，The instruction generation module is further configured to determine the data amount of the neural network calculation macroinstruction, and generate the running instruction according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction, and the resource information of the running device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款F4、根据条款F3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause F4. The system according to Clause F3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述神经网络计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述神经网络计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，A first instruction generation sub-module, configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macroinstruction, split the neural network calculation macroinstruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述神经网络计算宏指令进行拆分，生成对应于每个运行设备的运行指令，A second instruction generation sub-module, configured to, when it is determined that there are a plurality of running devices, split the neural network calculation macroinstruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款F5、根据条款F1所述的系统,所述指令生成设备还包括:Clause F5. The system according to Clause F1, the instruction generating device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款F6、根据条款F2所述的系统,所述指令生成设备还包括:Clause F6. The system according to Clause F2, the instruction generating device further includes:
宏指令生成模块,用于接收待执行神经网络计算指令,根据确定的指定设备的标识和所述待执行神经网络计算指令生成所述神经网络计算宏指令。The macro generation module is used to receive the neural network calculation instruction to be executed, and generate the neural network calculation macro instruction according to the determined identifier of the designated device and the neural network calculation instruction to be executed.
条款F7、根据条款F1所述的系统,所述指令生成设备还包括:Clause F7. The system according to Clause F1, the instruction generation device further includes:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款F8、根据条款F1所述的系统,所述运行设备,还包括:Clause F8. The system according to Clause F1, the operation device, further comprising:
存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,A storage module, the storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款F9、根据条款F1所述的系统,所述控制模块,包括:Clause F9. The system according to Clause F1, the control module includes:
指令存储子模块,用于存储所述运行指令;An instruction storage sub-module for storing the operation instruction;
指令处理子模块,用于对所述运行指令进行解析,得到所述多个解析指令;An instruction processing sub-module, which is used to parse the running instruction to obtain the multiple parsing instructions;
存储队列子模块，用于存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。A storage queue sub-module, configured to store a running instruction queue, where the running instruction queue includes the running instruction and the plurality of parsing instructions, and the running instruction and the plurality of parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款F10、根据条款F9所述的系统,所述执行模块,包括:Clause F10. The system according to Clause F9, the execution module includes:
依赖关系处理子模块，用于在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，将所述第一解析指令缓存在所述指令存储子模块中，在所述第零解析指令执行完毕后，从所述指令存储子模块中提取所述第一解析指令发送至所述执行模块，A dependency processing sub-module, configured to, when it is determined that a first parsing instruction has an association with a zeroth parsing instruction preceding the first parsing instruction, cache the first parsing instruction in the instruction storage sub-module, and after the zeroth parsing instruction is executed, extract the first parsing instruction from the instruction storage sub-module and send it to the execution module,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
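The association test in Clause F10 reduces to an interval-overlap check between the storage address ranges of the two parsing instructions. A minimal sketch, assuming ranges are half-open `[start, end)` pairs of integer addresses (the clause itself does not fix a representation):

```python
# Interval-overlap test behind the dependency rule in Clause F10.
# Ranges are assumed half-open: [start, end).

def ranges_overlap(first, zeroth):
    """True if the first storage address range (data needed by the first
    parsing instruction) overlaps the zeroth storage address range, i.e.
    the first instruction must wait until the zeroth finishes."""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end
```

When this returns true, the dependency processing sub-module would buffer the first parsing instruction until the zeroth completes; otherwise the two may be issued without ordering.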
条款F11、根据条款F1所述的系统,Clause F11, the system according to Clause F1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述神经网络计算宏指令包括卷积计算宏指令、池化计算宏指令中的至少一种;The neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooled calculation macro instruction;
所述神经网络计算宏指令包含以下选项中的至少一项：用于执行所述神经网络计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数，所述指令参数包括卷积核的大小、所述卷积核的步长和所述卷积核的填充中的至少一种。The neural network calculation macroinstruction includes at least one of the following options: an identifier of a designated device for executing the neural network calculation macroinstruction, an input amount, an output amount, an operand, and instruction parameters, where the instruction parameters include at least one of the size of a convolution kernel, the stride of the convolution kernel, and the padding of the convolution kernel.
条款F12、一种机器学习运算装置,所述装置包括:Clause F12. A machine learning computing device, the device comprising:
一个或多个如条款F1-条款F11任一项所述的神经网络计算指令处理系统，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more neural network computing instruction processing systems according to any one of Clause F1 to Clause F11, configured to obtain data to be computed and control information from other processing devices, perform specified machine learning operations, and transfer execution results to other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述神经网络计算指令处理系统时,所述多个所述神经网络计算指令处理系统间可以通过特定的结构进行连接并传输数据;When the machine learning computing device includes multiple neural network computing instruction processing systems, the multiple neural network computing instruction processing systems may be connected and transmit data through a specific structure;
其中，多个所述神经网络计算指令处理系统通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述神经网络计算指令处理系统共享同一控制系统或拥有各自的控制系统；多个所述神经网络计算指令处理系统共享内存或者拥有各自的内存；多个所述神经网络计算指令处理系统的互联方式是任意互联拓扑。Wherein, the plurality of neural network computing instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of neural network computing instruction processing systems share the same control system or have their own control systems; the plurality of neural network computing instruction processing systems share memory or have their own memories; and the interconnection manner of the plurality of neural network computing instruction processing systems is any interconnection topology.
条款F13、一种组合处理装置,所述组合处理装置包括:Clause F13. A combined processing device, the combined processing device comprising:
如条款F12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing devices, general interconnection interfaces and other processing devices as described in clause F12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款F14、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款F12所述的机器学习运算装置或如条款F13所述的组合处理装置；Clause F14. A board card, the board card comprising: a storage device, an interface device, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning computing device described in Clause F12 or the combined processing device described in Clause F13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款F15、一种神经网络计算指令处理方法,所述方法应用于神经网络计算指令处理系统,所述系统包括指令生成设备和运行设备,所述方法包括:Clause F15. A neural network computing instruction processing method. The method is applied to a neural network computing instruction processing system. The system includes an instruction generating device and an operating device. The method includes:
通过所述指令生成设备根据接收到的神经网络计算宏指令，确定执行所述神经网络计算宏指令的运行设备，并根据所述神经网络计算宏指令和所述运行设备，生成运行指令；Determining, by the instruction generating device according to a received neural network calculation macroinstruction, a running device for executing the neural network calculation macroinstruction, and generating a running instruction according to the neural network calculation macroinstruction and the running device;
通过所述运行设备获取数据、神经网络模型以及运行指令，对所述运行指令进行解析，获得多个解析指令，并根据所述数据执行所述多个解析指令，得到执行结果，Acquiring, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsing instructions, and executing the plurality of parsing instructions according to the data to obtain an execution result,
其中,所述神经网络计算宏指令是指用于计算神经网络算法的宏指令,Wherein, the neural network calculation macro instruction refers to a macro instruction used to calculate a neural network algorithm,
所述神经网络计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The neural network calculation macro instruction includes the operation type, input address and output address, and the operation instruction includes the operation type, operation input address and operation output address, and the operation input address and the operation output address are respectively The input address and the output address are determined.
条款F16、根据条款F15所述的方法,所述方法还包括:Clause F16. The method according to Clause F15, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的神经网络计算宏指令,确定执行所述神经网络计算宏指令的运行设备,包括以下至少一项:Wherein, the instruction generating device determines the operating device that executes the neural network calculation macro instruction according to the received neural network calculation macro instruction, including at least one of the following:
在确定所述神经网络计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述神经网络计算宏指令的执行条件时,将所述指定设备确定为所述运行设备;When it is determined that the neural network computing macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the neural network computing macro instruction, determine the designated device as the running device;
在确定所述神经网络计算宏指令中不包含所述指定设备的标识时，根据接收到的神经网络计算宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述神经网络计算宏指令的运行设备；When it is determined that the neural network calculation macroinstruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the neural network calculation macroinstruction according to the received neural network calculation macroinstruction and the resource information of the candidate devices;
在确定所述神经网络计算宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述神经网络计算宏指令的执行条件时，根据所述神经网络计算宏指令和所述备选设备的资源信息，确定运行设备，When it is determined that the neural network calculation macroinstruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution condition for executing the neural network calculation macroinstruction, determining the running device according to the neural network calculation macroinstruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述神经网络计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the neural network calculation macro instruction, and the resource information includes an instruction set included in the candidate device.
条款F17、根据条款F16所述的方法,所述神经网络计算宏指令还包含输入量和输出量中的至少一项,Clause F17. According to the method of Clause F16, the neural network calculation macroinstruction further includes at least one of an input quantity and an output quantity,
根据所述神经网络计算宏指令和所述运行设备,生成运行指令,包括:According to the neural network calculation macro instruction and the operation device, generating an operation instruction includes:
确定所述神经网络计算宏指令的数据量,根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the neural network calculation macro instruction, generate an operation instruction according to the data amount of the neural network calculation macro instruction, the neural network calculation macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款F18、根据条款F17所述的方法,根据所述神经网络计算宏指令的数据量、所述神经网络计算宏指令和所述运行设备的资源信息,生成运行指令,包括以下至少一项:Clause F18. According to the method of Clause F17, according to the data amount of the neural network calculation macroinstruction, the neural network calculation macroinstruction and the resource information of the running device, generating the running instruction, including at least one of the following:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述神经网络计算宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述神经网络计算宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令；When it is determined that there is one running device and the running data amount of the running device is less than the data amount of the neural network calculation macroinstruction, splitting the neural network calculation macroinstruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述神经网络计算宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operating devices, the neural network calculation macro instruction is split according to the operating data amount of each operating device and the data amount to generate an operating instruction corresponding to each operating device,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行输入量和运行输出量中的至少一项，所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of that running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款F19、根据条款F15所述的方法,所述方法还包括:Clause F19. The method according to Clause F15, the method further comprising:
通过所述指令生成设备根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
Clause F20. The method according to Clause F16, further comprising:
receiving, by the instruction generating device, a neural network calculation instruction to be executed, and generating the neural network calculation macro instruction according to the determined identifier of the specified device and the neural network calculation instruction to be executed.
Clause F21. The method according to Clause F15, further comprising:
sending, by the instruction generating device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device by the instruction generating device, so that the running device executes the running instruction, includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
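The dispatch path of Clause F21 can be sketched in a few lines. This is an illustrative sketch only: the textual instruction format, the UTF-8 encoding used as the "translation", and all function names are assumptions for the example, not the claimed assembler or binary format.

```python
# Hypothetical sketch of Clause F21: serialize running instructions into an
# assembly-style text, translate the text into a binary file, and send the
# binary to the running device.
def to_assembly(running_instructions):
    # One instruction per line, as a stand-in for a real assembly file.
    return "\n".join(running_instructions)

def to_binary(assembly_text):
    # Stand-in for assembler translation into a binary file.
    return assembly_text.encode("utf-8")

def dispatch(running_instructions, send):
    send(to_binary(to_assembly(running_instructions)))

sent = []
dispatch(["CONV in=0x100 out=0x200", "POOL in=0x200 out=0x300"], sent.append)
```

The running device then decodes and executes the received binary; only the last step crosses the device boundary, so the generation and translation stages can run entirely on the instruction generating device.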
Clause F22. The method according to Clause F15, further comprising:
storing, by the running device, the data and scalar data in the data,
wherein the running device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a scratchpad cache (a high-speed temporary storage cache),
the cache is configured to store the data, and
the register is configured to store scalar data in the data.
Clause F23. The method according to Clause F15, further comprising:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions; and
storing, by the running device, a running instruction queue, where the running instruction queue includes the running instruction and the plurality of parsed instructions, and the running instruction and the plurality of parsed instructions in the running instruction queue are arranged in sequence according to the order in which they are to be executed.
Clause F24. The method according to Clause F23, further comprising:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction has an association relationship with a zeroth parsed instruction preceding the first parsed instruction, and executing the cached first parsed instruction after execution of the zeroth parsed instruction is completed,
wherein the first parsed instruction having an association relationship with the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction has an overlapping area with a zeroth storage address interval storing data required by the zeroth parsed instruction.
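The association test of Clause F24 reduces to an interval-overlap check. This is an illustrative sketch only: modeling the storage address intervals as half-open [start, end) ranges is an assumption of the example, not something the clause fixes.

```python
# Hypothetical sketch of Clause F24's dependency test: the first parsed
# instruction depends on the zeroth one when the storage address intervals
# of the data they require have an overlapping area. A dependent instruction
# is cached until the earlier one finishes executing.
def has_dependency(first_interval, zeroth_interval):
    """Each interval is a half-open (start, end) address range."""
    start1, end1 = first_interval
    start0, end0 = zeroth_interval
    # Two intervals overlap iff each one starts before the other ends.
    return start1 < end0 and start0 < end1
```

For instance, intervals [0x100, 0x200) and [0x180, 0x280) overlap, so the later instruction would be cached; [0x100, 0x200) and [0x200, 0x300) do not, so both may proceed.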
Clause F25. The method according to Clause F15, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is provided in a CPU and/or an NPU;
the neural network calculation macro instruction includes at least one of a convolution calculation macro instruction and a pooling calculation macro instruction; and
the neural network calculation macro instruction further includes at least one of the following options: an identifier of a specified device for executing the neural network calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of a size of a convolution kernel, a stride of the convolution kernel, and a padding of the convolution kernel.
Vector logic calculation instructions may be processed by the following vector logic calculation instruction generating apparatus and method, and by related products such as the vector logic calculation instruction processing system and method. The foregoing content can be better understood with reference to the following clauses:
Clause G1. A vector logic calculation instruction generating apparatus, comprising:
a device determining module configured to determine, according to a received vector logic calculation macro instruction, a running device for executing the vector logic calculation macro instruction; and
an instruction generating module configured to generate a running instruction according to the vector logic calculation macro instruction and the running device,
wherein the vector logic calculation macro instruction is a macro instruction for performing a logic operation on vectors, and
the vector logic calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address, respectively.
Clause G2. The apparatus according to Clause G1, wherein the device determining module includes:
a first determining submodule configured to determine the specified device as the running device when it is determined that the vector logic calculation macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the vector logic calculation macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction.
Clause G3. The apparatus according to Clause G2, further comprising:
a resource obtaining module configured to obtain resource information of candidate devices,
wherein the device determining module further includes:
a second determining submodule configured to determine, when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, a running device for executing the vector logic calculation macro instruction from among the candidate devices according to the received vector logic calculation macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
Clause G4. The apparatus according to Clause G3, wherein the device determining module further includes:
a third determining submodule configured to determine the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction.
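The three determining submodules of Clauses G2 to G4 together implement one selection rule, which can be sketched as a single function. This is an illustrative sketch only: reducing a device's resource information to the instruction set it supports, and all names and data shapes, are assumptions of the example.

```python
# Hypothetical sketch of Clauses G2-G4: prefer the specified device when it
# satisfies the execution condition (its instruction set contains the
# corresponding instruction); otherwise fall back to the first candidate
# device whose instruction set does.
def determine_running_device(macro, candidate_isas):
    """macro: dict with 'op' and optional 'device'; candidate_isas: device id -> instruction set."""
    op = macro["op"]
    specified = macro.get("device")
    # Clause G2: the specified device satisfies the execution condition.
    if specified is not None and op in candidate_isas.get(specified, set()):
        return specified
    # Clauses G3/G4: no identifier, or the specified device does not
    # satisfy the condition - choose from the candidate devices instead.
    for device_id, isa in candidate_isas.items():
        if op in isa:
            return device_id
    return None
```

With candidates {"NPU0": {"VAND", "VOR"}, "GPU0": {"VAND"}}, a vector-OR macro that names GPU0 is redirected to NPU0, because GPU0's instruction set lacks the corresponding instruction.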
Clause G5. The apparatus according to any one of Clauses G2 to G4, wherein the vector logic calculation macro instruction further includes at least one of an input amount and an output amount,
the instruction generating module is further configured to determine a data amount of the vector logic calculation macro instruction, and generate the running instruction according to the data amount, the vector logic calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause G6. The apparatus according to Clause G5, wherein the instruction generating module includes:
a first instruction generating submodule configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, split the vector logic calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
Clause G7. The apparatus according to Clause G5, wherein the instruction generating module includes:
a second instruction generating submodule configured to, when it is determined that there are a plurality of running devices, split the vector logic calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of each running device, the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause G8. The apparatus according to Clause G1, further comprising:
a queue constructing module configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
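The queue constructing module of Clause G8 can be sketched as a sort followed by a per-device grouping. This is an illustrative sketch only: the claims do not fix a particular queue sorting rule, so the explicit priority key and the dictionary shapes below are assumptions of the example.

```python
# Hypothetical sketch of Clause G8: sort running instructions by a queue
# sorting rule (modeled as a priority key) and build one instruction queue
# per running device from the sorted instructions.
from collections import defaultdict

def build_instruction_queues(running_instructions, priority):
    queues = defaultdict(list)
    for inst in sorted(running_instructions, key=priority):
        queues[inst["device"]].append(inst["op"])
    return dict(queues)

queues = build_instruction_queues(
    [
        {"device": "NPU0", "op": "VOR", "seq": 2},
        {"device": "NPU0", "op": "VAND", "seq": 1},
        {"device": "GPU0", "op": "VAND", "seq": 3},
    ],
    priority=lambda inst: inst["seq"],
)
```

Each running device then consumes only its own queue, with the instructions already in the order the sorting rule imposed.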
Clause G9. The apparatus according to Clause G2, further comprising:
a macro instruction generating module configured to receive a vector logic calculation instruction to be executed, and generate the vector logic calculation macro instruction according to the determined identifier of the specified device and the vector logic calculation instruction to be executed.
Clause G10. The apparatus according to Clause G1, further comprising:
an instruction dispatching module configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule configured to generate an assembly file according to the running instruction;
an assembly translating submodule configured to translate the assembly file into a binary file; and
an instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause G11. The apparatus according to Clause G1, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is provided in a CPU and/or an NPU;
the vector logic calculation macro instruction includes at least one of the following instructions: a vector-AND calculation macro instruction, a vector-OR calculation macro instruction, a vector-NOT calculation macro instruction, and a vector comparison calculation macro instruction; and
the vector logic calculation macro instruction further includes at least one of the following options: an identifier of a specified device for executing the vector logic calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of an address and a length of a second operand.
Clause G12. A machine learning operation apparatus, comprising:
one or more vector logic calculation instruction generating apparatuses according to any one of Clauses G1 to G11, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transfer the execution result to the other processing apparatuses through an I/O interface,
wherein, when the machine learning operation apparatus includes a plurality of the vector logic calculation instruction generating apparatuses, the plurality of vector logic calculation instruction generating apparatuses may be connected to, and transmit data between, one another through a specific structure,
and wherein the plurality of vector logic calculation instruction generating apparatuses are interconnected and transmit data through a PCIe (peripheral component interconnect express) bus to support larger-scale machine learning operations; the plurality of vector logic calculation instruction generating apparatuses share the same control system or have their own control systems; the plurality of vector logic calculation instruction generating apparatuses share memory or have their own memories; and the plurality of vector logic calculation instruction generating apparatuses may be interconnected in any interconnection topology.
Clause G13. A combined processing apparatus, comprising:
the machine learning operation apparatus according to Clause G12, a universal interconnection interface, and other processing apparatuses,
wherein the machine learning operation apparatus interacts with the other processing apparatuses to jointly complete a calculation operation specified by a user,
and wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatuses, respectively, and configured to save data of the machine learning operation apparatus and the other processing apparatuses.
Clause G14. A board card, comprising: a storage device, an interface apparatus, a control device, and a machine learning chip, where the machine learning chip includes the machine learning operation apparatus according to Clause G12 or the combined processing apparatus according to Clause G13,
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor the state of the machine learning chip.
Clause G15. A vector logic calculation instruction generating method, comprising:
determining, according to a received vector logic calculation macro instruction, a running device for executing the vector logic calculation macro instruction; and
generating a running instruction according to the vector logic calculation macro instruction and the running device,
wherein the vector logic calculation macro instruction is a macro instruction for performing a logic operation on vectors, and
the vector logic calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address, respectively.
Clause G16. The method according to Clause G15, wherein determining, according to the received vector logic calculation macro instruction, the running device for executing the vector logic calculation macro instruction includes:
when it is determined that the vector logic calculation macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the vector logic calculation macro instruction, determining the specified device as the running device,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction.
Clause G17. The method according to Clause G16, further comprising:
obtaining resource information of candidate devices,
wherein determining, according to the received vector logic calculation macro instruction, the running device for executing the vector logic calculation macro instruction includes:
when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, determining, from among the candidate devices, a running device for executing the vector logic calculation macro instruction according to the received vector logic calculation macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
Clause G18. The method according to Clause G17, wherein determining, according to the received vector logic calculation macro instruction, the running device for executing the vector logic calculation macro instruction includes:
when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determining the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices.
Clause G19. The method according to any one of Clauses G16 to G18, wherein the vector logic calculation macro instruction further includes at least one of an input amount and an output amount,
generating the running instruction according to the vector logic calculation macro instruction and the running device includes:
determining a data amount of the vector logic calculation macro instruction, and generating the running instruction according to the data amount, the vector logic calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause G20. The method according to Clause G19, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes:
when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, splitting the vector logic calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
Clause G21. The method according to Clause G19, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes:
when it is determined that there are a plurality of running devices, splitting the vector logic calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of each running device, the running instruction includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
Clause G22. The method according to Clause G15, further comprising:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause G23. The method according to Clause G16, further comprising:
receiving a vector logic calculation instruction to be executed, and generating the vector logic calculation macro instruction according to the determined identifier of the specified device and the vector logic calculation instruction to be executed.
Clause G24. The method according to Clause G15, further comprising:
sending the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device, so that the running device executes the running instruction, includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause G25. The method according to Clause G15, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU;
the vector logic calculation macro instruction includes at least one of the following instructions: a vector-AND calculation macro instruction, a vector-OR calculation macro instruction, a vector-NOT calculation macro instruction, and a vector comparison calculation macro instruction; and
the vector logic calculation macro instruction further includes at least one of the following options: an identifier of a specified device for executing the vector logic calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of an address and a length of a second operand.
Clause H1. A vector logic calculation instruction processing system, including an instruction generating device and a running device,
wherein the instruction generating device includes:
a device determining module configured to determine, according to a received vector logic calculation macro instruction, a running device for executing the vector logic calculation macro instruction; and
an instruction generating module configured to generate a running instruction according to the vector logic calculation macro instruction and the running device;
and the running device includes:
a control module configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsed instructions; and
an execution module configured to execute the plurality of parsed instructions according to the data to obtain an execution result,
wherein the vector logic calculation macro instruction is a macro instruction for performing a logic operation on vectors, and
the vector logic calculation macro instruction includes an operation type, an input address, and an output address, the running instruction includes the operation type, a running input address, and a running output address, and the running input address and the running output address are determined according to the input address and the output address, respectively.
Clause H2. The system according to Clause H1, wherein the instruction generating device further includes:
a resource obtaining module configured to obtain resource information of candidate devices,
and the device determining module includes at least one of the following submodules:
a first determining submodule configured to determine the specified device as the running device when it is determined that the vector logic calculation macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the vector logic calculation macro instruction;
a second determining submodule configured to determine, when it is determined that the vector logic calculation macro instruction does not include the identifier of the specified device, a running device for executing the vector logic calculation macro instruction from among the candidate devices according to the received vector logic calculation macro instruction and the resource information of the candidate devices; and
a third determining submodule configured to determine the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices when it is determined that the vector logic calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause H3. The system according to Clause H2, wherein the vector logic calculation macro instruction further includes at least one of an input amount and an output amount,
the instruction generating module is further configured to determine a data amount of the vector logic calculation macro instruction, and generate the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
条款H4、根据条款H3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause H4. The system according to Clause H3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块,用于在确定所述运行设备为一个,且所述运行设备的运行数据量小于所述向量逻辑计算宏指令的数据量时,根据所述运行设备的运行数据量和所述数据量将所述向量逻辑计算宏指令拆分成多条运行指令,以使所述运行设备依次执行多条运行指令,The first instruction generating submodule is used for determining that the number of running devices is one, and the amount of running data of the running device is less than the amount of data of the vector logic calculation macro instruction, according to the amount of running data of the running device and The amount of data splits the vector logic calculation macro instruction into multiple running instructions, so that the running device sequentially executes multiple running instructions,
第二指令生成子模块,用于在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述向量逻辑计算宏指令进行拆分,生成对应于每个运行设备的运行指令,The second instruction generation sub-module is used to split the vector logic calculation macro instruction according to the amount of operation data of each operation device and the amount of data when it is determined that there are a plurality of operation devices, and generate corresponding to each Operating instructions for operating equipment,
其中,运行设备的运行数据量是根据运行设备的资源信息确定的,所述运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。The operation data amount of the operation device is determined according to the resource information of the operation device, and the operation instruction further includes at least one of an operation input amount and an operation output amount. The operation input amount and the operation output amount are based on The amount of operation data of the operation device that executes the operation instruction is determined.
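The splitting behavior of Clause H4 can be illustrated with a short sketch. This is an illustrative model only, not the patented implementation; the `RunInstruction` fields and the proportional multi-device split are assumptions:

```python
from dataclasses import dataclass

@dataclass
class RunInstruction:
    op_type: str
    run_input_addr: int   # derived from the macro instruction's input address
    run_output_addr: int  # derived from the macro instruction's output address
    run_amount: int       # running input/output amount for this instruction

def split_macro_single_device(op_type, input_addr, output_addr,
                              total_amount, run_capacity):
    """One running device whose running data amount (run_capacity) is less
    than the macro instruction's data amount: emit run instructions that
    the device executes in sequence."""
    instructions, offset = [], 0
    while offset < total_amount:
        amount = min(run_capacity, total_amount - offset)
        instructions.append(RunInstruction(op_type,
                                           input_addr + offset,
                                           output_addr + offset,
                                           amount))
        offset += amount
    return instructions

def split_macro_multi_device(op_type, input_addr, output_addr,
                             total_amount, run_capacities):
    """Several running devices: give each device a slice sized from its
    running data amount (here, proportionally), one run instruction each."""
    total_capacity = sum(run_capacities)
    instructions, offset = [], 0
    for i, capacity in enumerate(run_capacities):
        if i == len(run_capacities) - 1:
            amount = total_amount - offset  # last device takes the remainder
        else:
            amount = total_amount * capacity // total_capacity
        instructions.append(RunInstruction(op_type,
                                           input_addr + offset,
                                           output_addr + offset,
                                           amount))
        offset += amount
    return instructions
```

For instance, a macro instruction over 10 elements on one device whose running data amount is 4 yields three run instructions with running amounts 4, 4, and 2, each with correspondingly shifted running input and output addresses.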
Clause H5. The system according to Clause H1, wherein the instruction generating device further includes:
a queue construction module, configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause H6. The system according to Clause H2, wherein the instruction generating device further includes:
a macro instruction generating module, configured to receive a vector logic calculation instruction to be executed, and generate the vector logic calculation macro instruction according to the determined identifier of the specified device and the vector logic calculation instruction to be executed.
Clause H7. The system according to Clause H1, wherein the instruction generating device further includes:
an instruction dispatching module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule, configured to generate an assembly file according to the running instruction;
an assembly translating submodule, configured to translate the assembly file into a binary file;
an instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
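The dispatch path of Clause H7 (run instruction, assembly file, binary file, running device) can be sketched as follows. The textual assembly syntax and the byte encoding here are placeholders; a real toolchain would emit device-specific opcodes:

```python
from dataclasses import dataclass

@dataclass
class RunInstruction:
    op_type: str
    run_input_addr: int
    run_output_addr: int
    run_amount: int

def build_assembly_file(run_instructions):
    """Instruction assembling submodule: one assembly line per run instruction."""
    return "\n".join(
        f"{inst.op_type} {inst.run_input_addr:#x}, "
        f"{inst.run_output_addr:#x}, {inst.run_amount}"
        for inst in run_instructions)

def translate_to_binary(assembly_file):
    """Assembly translating submodule: here just the bytes of the text; a
    real assembler would encode each mnemonic into the device's instruction set."""
    return assembly_file.encode("utf-8")

def dispatch(run_instructions, send):
    """Instruction sending submodule: deliver the binary file to the running
    device via a caller-supplied `send` callback."""
    send(translate_to_binary(build_assembly_file(run_instructions)))
```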
Clause H8. The system according to Clause H1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data in the data.
Clause H9. The system according to Clause H1, wherein the control module includes:
an instruction storage submodule, configured to store the running instruction;
an instruction processing submodule, configured to parse the running instruction to obtain the plurality of parsed instructions;
a storage queue submodule, configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, the running instruction and the plurality of parsed instructions in the running instruction queue being arranged in the order in which they are to be executed.
Clause H10. The system according to Clause H9, wherein the execution module includes:
a dependency processing submodule, configured to, when it is determined that a first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, cache the first parsed instruction in the instruction storage submodule, and, after execution of the zeroth parsed instruction is completed, fetch the first parsed instruction from the instruction storage submodule and send it to the execution module,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlaps a zeroth storage address interval storing data required by the zeroth parsed instruction.
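The association rule of Clause H10 is an address-interval overlap test, and the dependency processing submodule amounts to holding back a conflicting instruction until its predecessor retires. A minimal sketch, assuming end-exclusive `(start, end)` intervals (the clauses do not fix the bound convention):

```python
def intervals_overlap(first, zeroth):
    """Clause H10's association test: do the two required-data storage
    address intervals share a region?"""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end

class DependencyBuffer:
    """Minimal model of the dependency processing submodule: a parsed
    instruction whose data interval overlaps an in-flight one is cached
    and re-issued only after the earlier instruction completes."""
    def __init__(self):
        self.in_flight = []  # (instruction_id, interval) pairs
        self.cached = []

    def issue(self, instruction_id, interval):
        if any(intervals_overlap(interval, busy) for _, busy in self.in_flight):
            # held back, like caching in the instruction storage submodule
            self.cached.append((instruction_id, interval))
            return False
        self.in_flight.append((instruction_id, interval))
        return True

    def complete(self, instruction_id):
        self.in_flight = [(i, iv) for i, iv in self.in_flight
                          if i != instruction_id]
        pending, self.cached = self.cached, []
        for i, iv in pending:  # re-issue; remaining conflicts are re-cached
            self.issue(i, iv)
```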
Clause H11. The system according to Clause H1, wherein
the running device is one or any combination of a CPU, a GPU, and an NPU;
the instruction generating device is arranged in the CPU and/or the NPU;
the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
the vector logic calculation macro instruction further contains at least one of the following options: an identifier of the specified device for executing the vector logic calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, the instruction parameters including at least one of an address and a length of a second operand.
Clause H12. A machine learning operation apparatus, the apparatus including:
one or more vector logic calculation instruction processing systems according to any one of Clause H1 to Clause H11, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transfer the execution result to the other processing apparatuses through an I/O interface;
when the machine learning operation apparatus contains a plurality of the vector logic calculation instruction processing systems, the plurality of vector logic calculation instruction processing systems may be connected through a specific structure to transfer data;
wherein the plurality of vector logic calculation instruction processing systems are interconnected through a PCIe (peripheral component interconnect express) bus and transfer data over it, so as to support larger-scale machine learning operations; the plurality of vector logic calculation instruction processing systems share one control system or have their own control systems; the plurality of vector logic calculation instruction processing systems share a memory or have their own memories; and the plurality of vector logic calculation instruction processing systems may be interconnected in any interconnection topology.
Clause H13. A combined processing apparatus, the combined processing apparatus including:
the machine learning operation apparatus according to Clause H12, a universal interconnection interface, and other processing apparatuses;
the machine learning operation apparatus interacting with the other processing apparatuses to jointly complete a computing operation specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatuses respectively, and configured to store data of the machine learning operation apparatus and the other processing apparatuses.
Clause H14. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, the machine learning chip including the machine learning operation apparatus according to Clause H12 or the combined processing apparatus according to Clause H13;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transfer between the machine learning chip and an external device;
the control device is configured to monitor a state of the machine learning chip.
Clause H15. A vector logic calculation instruction processing method, the method being applied to a vector logic calculation instruction processing system, the system including an instruction generating device and a running device, the method including:
determining, by the instruction generating device according to a received vector logic calculation macro instruction, a running device for executing the vector logic calculation macro instruction, and generating a running instruction according to the vector logic calculation macro instruction and the running device;
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions according to the data to obtain an execution result,
wherein the vector logic calculation macro instruction refers to a macro instruction for performing a logic operation on vectors,
the vector logic calculation macro instruction contains an operation type, an input address, and an output address; the running instruction contains the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address respectively.
Clause H16. The method according to Clause H15, the method further including:
obtaining, by the instruction generating device, resource information of candidate devices,
wherein determining, by the instruction generating device according to the received vector logic calculation macro instruction, the running device for executing the vector logic calculation macro instruction includes at least one of the following:
when it is determined that the vector logic calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the vector logic calculation macro instruction, determining the specified device as the running device;
when it is determined that the vector logic calculation macro instruction does not contain the identifier of the specified device, determining, from the candidate devices, the running device for executing the vector logic calculation macro instruction according to the received vector logic calculation macro instruction and the resource information of the candidate devices;
when it is determined that the vector logic calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the vector logic calculation macro instruction, determining the running device according to the vector logic calculation macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the vector logic calculation macro instruction; and the resource information includes the instruction set contained in the candidate devices.
Clause H17. The method according to Clause H16, wherein the vector logic calculation macro instruction further contains at least one of an input amount and an output amount,
and generating the running instruction according to the vector logic calculation macro instruction and the running device includes:
determining the data amount of the vector logic calculation macro instruction, and generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause H18. The method according to Clause H17, wherein generating the running instruction according to the data amount of the vector logic calculation macro instruction, the vector logic calculation macro instruction, and the resource information of the running device includes at least one of the following:
when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the vector logic calculation macro instruction, splitting the vector logic calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
when it is determined that there are a plurality of running devices, splitting the vector logic calculation macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of the running device; the running instruction further contains at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount of the running device that executes the running instruction.
Clause H19. The method according to Clause H15, the method further including:
sorting, by the instruction generating device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause H20. The method according to Clause H16, the method further including:
receiving, by the instruction generating device, a vector logic calculation instruction to be executed, and generating the vector logic calculation macro instruction according to the determined identifier of the specified device and the vector logic calculation instruction to be executed.
Clause H21. The method according to Clause H15, the method further including:
sending, by the instruction generating device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending, by the instruction generating device, the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause H22. The method according to Clause H15, the method further including:
storing, by the running device, the data and scalar data in the data,
wherein the running device includes a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data in the data.
Clause H23. The method according to Clause H15, the method further including:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions;
storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, the running instruction and the plurality of parsed instructions in the running instruction queue being arranged in the order in which they are to be executed.
Clause H24. The method according to Clause H23, the method further including:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, and executing the cached first parsed instruction after execution of the zeroth parsed instruction is completed,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlaps a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause H25. The method according to Clause H15, wherein
the running device is one or any combination of a CPU, a GPU, and an NPU;
the instruction generating device is arranged in the CPU and/or the NPU;
the vector logic calculation macro instruction includes at least one of the following instructions: a vector AND calculation macro instruction, a vector OR calculation macro instruction, a vector NOT calculation macro instruction, and a vector comparison calculation macro instruction;
the vector logic calculation macro instruction further includes at least one of the following options: an identifier of the specified device for executing the vector logic calculation macro instruction, an input amount, an output amount, operands, and instruction parameters.
Matrix vector calculation instructions can be processed by the matrix vector calculation instruction generating apparatus and method, the matrix vector calculation instruction processing system and method, and other related products described below. The foregoing can be better understood according to the following clauses:
Clause I1. A matrix vector calculation instruction generating apparatus, the apparatus including:
a device determination module, configured to determine, according to a received matrix vector calculation macro instruction, a running device for executing the matrix vector calculation macro instruction;
an instruction generation module, configured to generate a running instruction according to the matrix vector calculation macro instruction and the running device,
wherein the matrix vector calculation macro instruction refers to a macro instruction for performing calculations on matrices and vectors,
the matrix vector calculation macro instruction contains an operation type, an input address, and an output address; the running instruction contains the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address respectively.
Clause I2. The apparatus according to Clause I1, wherein the device determination module includes:
a first determining submodule, configured to, when it is determined that the matrix vector calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the matrix vector calculation macro instruction, determine the specified device as the running device,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the matrix vector calculation macro instruction.
Clause I3. The apparatus according to Clause I2, the apparatus further including:
a resource obtaining module, configured to obtain resource information of candidate devices,
the device determination module further including:
a second determining submodule, configured to, when it is determined that the matrix vector calculation macro instruction does not contain the identifier of the specified device, determine, from the candidate devices, the running device for executing the matrix vector calculation macro instruction according to the received matrix vector calculation macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in the candidate devices.
Clause I4. The apparatus according to Clause I3, the device determination module further including:
a third determining submodule, configured to, when it is determined that the matrix vector calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the matrix vector calculation macro instruction, determine the running device according to the matrix vector calculation macro instruction and the resource information of the candidate devices.
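Taken together, the three determining submodules of Clauses I2 through I4 amount to a simple selection policy: honor the specified device when its resources satisfy the execution condition, otherwise fall back to any candidate whose instruction set matches. A sketch under assumed names (`device_id`, `instruction_sets`, and `required_isa` do not appear in the clauses themselves):

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass
class CandidateDevice:
    device_id: int
    instruction_sets: FrozenSet[str]  # resource information: supported instruction sets

@dataclass
class MacroInstruction:
    required_isa: str                 # instruction set the macro instruction needs
    device_id: Optional[int] = None   # identifier of the specified device, if any

def select_running_device(macro, candidates):
    by_id = {c.device_id: c for c in candidates}
    specified = by_id.get(macro.device_id)
    # First submodule: the specified device satisfies the execution condition.
    if specified is not None and macro.required_isa in specified.instruction_sets:
        return specified
    # Second submodule (no identifier) and third submodule (identifier present
    # but condition unmet): pick any candidate whose instruction set
    # corresponds to the macro instruction.
    for candidate in candidates:
        if macro.required_isa in candidate.instruction_sets:
            return candidate
    raise LookupError("no candidate device supports this macro instruction")
```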
Clause I5. The apparatus according to any one of Clause I2 to Clause I4, wherein the matrix vector calculation macro instruction further contains at least one of an input amount and an output amount,
the instruction generation module being further configured to determine the data amount of the matrix vector calculation macro instruction, and generate the running instruction according to the data amount, the matrix vector calculation macro instruction, and the resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause I6. The apparatus according to Clause I5, wherein the instruction generation module includes:
a first instruction generating submodule, configured to, when it is determined that there is one running device and the running data amount of the running device is less than the data amount of the matrix vector calculation macro instruction, split the matrix vector calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device; each running instruction further contains at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount.
Clause I7. The apparatus according to Clause I5, wherein the instruction generation module includes:
a second instruction generating submodule, configured to, when it is determined that there are a plurality of running devices, split the matrix vector calculation macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of each running device; the running instruction further contains at least one of a running input amount and a running output amount, the running input amount and the running output amount being determined according to the running data amount of the running device that executes the running instruction.
Clause I8. The apparatus according to Clause I1, the apparatus further including:
a queue construction module, configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause I9. The apparatus according to Clause I2, the apparatus further including:
a macro instruction generating module, configured to receive a matrix vector calculation instruction to be executed, and generate the matrix vector calculation macro instruction according to the determined identifier of the specified device and the matrix vector calculation instruction to be executed.
Clause I10. The apparatus according to Clause I1, the apparatus further including:
an instruction dispatching module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatching module includes:
an instruction assembling submodule, configured to generate an assembly file according to the running instruction;
an assembly translating submodule, configured to translate the assembly file into a binary file;
an instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause I11. The apparatus according to Clause I1, wherein
the running device is one or any combination of a CPU, a GPU, and an NPU;
the apparatus is arranged in the CPU and/or the NPU;
the matrix vector calculation macro instruction includes at least one of the following instructions: a matrix-multiply-vector calculation macro instruction, a vector-multiply-matrix calculation macro instruction, a tensor calculation macro instruction, a matrix addition calculation macro instruction, and a matrix subtraction calculation macro instruction;
the matrix vector calculation macro instruction further contains at least one of the following options: an identifier of the specified device for executing the matrix vector calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, the instruction parameters including at least one of an address and a length of a second operand.
条款I12、一种机器学习运算装置,所述装置包括:Clause I12. A machine learning computing device, the device comprising:
一个或多个如条款I1-条款I11任一项所述的矩阵向量计算指令生成装置,用于从其他处理装置中获取待运算数据和控制信息,并执行指定的机器学习运算,将执行结果通过I/O接口传递给其他处理装置;One or more matrix vector calculation instruction generating devices according to any one of Clause I1 to Clause I11, configured to obtain data to be operated on and control information from other processing devices, perform a designated machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述矩阵向量计算指令生成装置时,所述多个所述矩阵向量计算指令生成装置间可以通过特定的结构进行连接并传输数据;When the machine learning operation device includes a plurality of matrix vector calculation instruction generating devices, the plurality of matrix vector calculation instruction generating devices may be connected and transmit data through a specific structure;
其中,多个所述矩阵向量计算指令生成装置通过快速外部设备互连总线进行互联并传输数据,以支持更大规模的机器学习的运算;多个所述矩阵向量计算指令生成装置共享同一控制系统或拥有各自的控制系统;多个所述矩阵向量计算指令生成装置共享内存或者拥有各自的内存;多个所述矩阵向量计算指令生成装置的互联方式是任意互联拓扑。Wherein, the plurality of matrix vector calculation instruction generating devices are interconnected and transmit data through a fast peripheral interconnect bus to support larger-scale machine learning operations; the plurality of matrix vector calculation instruction generating devices share the same control system or have their own control systems; the plurality of matrix vector calculation instruction generating devices share memory or have their own memories; and the interconnection mode of the plurality of matrix vector calculation instruction generating devices is an arbitrary interconnection topology.
条款I13、一种组合处理装置,所述装置包括:Clause I13. A combined processing device, the device comprising:
如条款I12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing devices, general interconnection interfaces and other processing devices as described in clause I12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款I14、一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及机器学习芯片,所述机器学习芯片包括条款I12所述的机器学习运算装置或如条款I13所述的组合处理装置;Clause I14. A board card, the board card comprising: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device according to Clause I12 or the combined processing device according to Clause I13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款I15、一种矩阵向量计算指令生成方法,所述方法包括:Clause I15. A method for generating a matrix vector calculation instruction, the method comprising:
根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备;Determining, according to a received matrix vector calculation macro instruction, a running device for executing the matrix vector calculation macro instruction;
根据所述矩阵向量计算宏指令和所述运行设备,生成运行指令,Generating a running instruction according to the matrix vector calculation macro instruction and the running device,
其中,所述矩阵向量计算宏指令是指用于对矩阵和向量进行计算的宏指令,The matrix vector calculation macro instruction refers to a macro instruction used to calculate matrix and vector,
所述矩阵向量计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The matrix vector calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address, respectively.
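The relationship in Clause I15 between a macro instruction (operation type, input address, output address) and the run instruction derived from it can be sketched as below. This is an illustrative data-structure sketch only; the class and field names, and the simple base-offset address mapping, are assumptions standing in for whatever device-specific mapping an implementation would use.

```python
# Hypothetical sketch of Clause I15: a matrix vector calculation macro
# instruction carries an operation type plus input/output addresses, and the
# generated run instruction carries run input/output addresses derived from them.
from dataclasses import dataclass

@dataclass
class MacroInstruction:
    op_type: str        # e.g. "MMV" for matrix-multiply-vector (illustrative)
    input_addr: int
    output_addr: int

@dataclass
class RunInstruction:
    op_type: str
    run_input_addr: int
    run_output_addr: int

def generate_run_instruction(macro: MacroInstruction, base: int = 0) -> RunInstruction:
    """Derive the run input/output addresses from the macro instruction's
    addresses; a simple base offset stands in for device-specific mapping."""
    return RunInstruction(macro.op_type,
                          macro.input_addr + base,
                          macro.output_addr + base)

ri = generate_run_instruction(MacroInstruction("MMV", 0x1000, 0x2000), base=0x100)
```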
条款I16、根据条款I15所述的方法,根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,包括:Clause I16. According to the method described in Clause I15, according to the received matrix vector calculation macroinstruction, determining a running device that executes the matrix vector calculation macroinstruction includes:
在确定所述矩阵向量计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述矩阵向量计算宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the matrix vector calculation macro instruction includes the identifier of the designated device, and the resources of the designated device satisfy the execution conditions for executing the matrix vector calculation macro instruction, determining the designated device as the running device,
其中,所述执行条件包括:所述指定设备中包含与所述矩阵向量计算宏指令相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the matrix vector calculation macro instruction.
条款I17、根据条款I16所述的方法,所述方法还包括:Clause I17. The method according to Clause I16, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,包括:Wherein, according to the received matrix vector calculation macro instruction, determining a running device that executes the matrix vector calculation macro instruction includes:
在确定所述矩阵向量计算宏指令中不包含所述指定设备的标识时,根据接收到的矩阵向量计算宏指令和所述备选设备的资源信息,从所述备选设备中确定出用于执行所述矩阵向量计算宏指令的运行设备,When it is determined that the matrix vector calculation macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the matrix vector calculation macro instruction according to the received matrix vector calculation macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款I18、根据条款I17所述的方法,根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,包括:Clause I18. According to the method described in Clause I17, determining a running device that executes the matrix vector calculation macroinstruction according to the received matrix vector calculation macroinstruction includes:
在确定所述矩阵向量计算宏指令中包含所述指定设备的标识,且所述指定设备的资源不满足执行所述矩阵向量计算宏指令的执行条件时,根据所述矩阵向量计算宏指令和所述备选设备的资源信息,确定运行设备。When it is determined that the matrix vector calculation macro instruction includes the identifier of the designated device, and the resources of the designated device do not satisfy the execution conditions for executing the matrix vector calculation macro instruction, determining the running device according to the matrix vector calculation macro instruction and the resource information of the candidate devices.
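The device-selection logic of Clauses I16 to I18 (prefer a designated device whose instruction set satisfies the execution condition, otherwise fall back to the candidate devices' resource information) can be sketched as below. This is a minimal sketch under stated assumptions: the dict-based representation of macro instructions and resource information, and every name in it, are illustrative rather than claimed.

```python
# Hypothetical sketch of Clauses I16-I18: choosing the running device.
# The execution condition here is that the device's instruction set ("isa")
# contains an instruction corresponding to the macro instruction's type.

def select_running_device(macro, devices):
    """macro: dict with operation "type" and an optional designated "device_id";
    devices: device id -> resource info (here just the supported instruction set)."""
    designated = macro.get("device_id")
    if designated is not None:
        info = devices.get(designated)
        # Clause I16: designated device named and execution condition satisfied.
        if info and macro["type"] in info["isa"]:
            return designated
    # Clauses I17/I18: no designated device, or its resources do not satisfy
    # the execution condition -> choose from candidates by resource information.
    for dev_id, info in devices.items():
        if macro["type"] in info["isa"]:
            return dev_id
    return None

devices = {"cpu0": {"isa": {"ADD"}}, "npu0": {"isa": {"MMV", "ADD"}}}
```

For example, a macro instruction designating `cpu0` but requiring `MMV` falls back to `npu0`, since `cpu0` lacks the corresponding instruction set.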
条款I19、根据条款I16-条款I18任一项所述的方法,所述矩阵向量计算宏指令还包含输入量和输出量中的至少一项,Clause I19. The method according to any one of Clause I16-Clause I18, the matrix vector calculation macro instruction further includes at least one of an input quantity and an output quantity,
根据所述矩阵向量计算宏指令和所述运行设备,生成运行指令,包括:Generating the running instruction according to the matrix vector calculation macro instruction and the running device includes:
确定所述矩阵向量计算宏指令的数据量,根据所述数据量、所述矩阵向量计算宏指令和所述运行设备的资源信息,生成运行指令,Determining the data amount of the matrix vector calculation macro instruction, and generating the running instruction according to the data amount, the matrix vector calculation macro instruction, and the resource information of the running device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款I20、根据条款I19所述的方法,根据所述矩阵向量计算宏指令的数据量、所述矩阵向量计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause I20. The method according to Clause I19, where generating the running instruction according to the data amount of the matrix vector calculation macro instruction, the matrix vector calculation macro instruction, and the resource information of the running device includes:
在确定所述运行设备为一个,且所述运行设备的运行数据量小于所述矩阵向量计算宏指令的数据量时,根据所述运行设备的运行数据量和所述数据量将所述矩阵向量计算宏指令拆分成多条运行指令,以使所述运行设备依次执行多条运行指令,When it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the matrix vector calculation macro instruction, splitting the matrix vector calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
其中,所述运行设备的运行数据量是根据所述运行设备的资源信息确定的,每条运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据所述运行数据量确定的。Wherein, the running data amount of the running device is determined according to the resource information of the running device; each running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount.
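The single-device case of Clause I20 amounts to chunking: when the device's per-run data amount is smaller than the macro instruction's data amount, the macro instruction is cut into several run instructions executed in turn. The sketch below is illustrative only; contiguous addressing and the function name are assumptions.

```python
# Hypothetical sketch of Clause I20: split one macro instruction into several
# run instructions sized to the single running device's run data amount.

def split_for_one_device(total, capacity, input_addr, output_addr):
    """total: data amount of the macro instruction; capacity: the device's
    run data amount. Returns (run_input_addr, run_output_addr, run_amount)
    tuples covering the whole data amount in execution order."""
    chunks = []
    offset = 0
    while offset < total:
        amount = min(capacity, total - offset)  # last chunk may be smaller
        chunks.append((input_addr + offset, output_addr + offset, amount))
        offset += amount
    return chunks
```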
条款I21、根据条款I19所述的方法,根据所述矩阵向量计算宏指令的数据量、所述矩阵向量计算宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause I21. The method according to Clause I19, where generating the running instruction according to the data amount of the matrix vector calculation macro instruction, the matrix vector calculation macro instruction, and the resource information of the running device includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述矩阵向量计算宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of running devices, splitting the matrix vector calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
其中,每个运行设备的运行数据量是根据每个运行设备的资源信息确定的,所述运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of each running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
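The multi-device case of Clause I21 partitions the macro instruction's data amount across devices, each taking at most its own run data amount. As with the single-device sketch, the contiguous addressing and all names below are illustrative assumptions, not the claimed method itself.

```python
# Hypothetical sketch of Clause I21: split one macro instruction across
# several running devices, one run instruction per device, each sized by
# that device's own run data amount (from its resource information).

def split_across_devices(total, capacities, input_addr, output_addr):
    """capacities: device id -> run data amount.
    Returns device id -> (run_input_addr, run_output_addr, run_amount)."""
    plan = {}
    offset = 0
    for dev_id, cap in capacities.items():
        if offset >= total:
            break  # remaining devices get no run instruction
        amount = min(cap, total - offset)
        plan[dev_id] = (input_addr + offset, output_addr + offset, amount)
        offset += amount
    return plan
```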
条款I22、根据条款I15所述的方法,所述方法还包括:Clause I22. The method according to Clause I15, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款I23、根据条款I16所述的方法,所述方法还包括:Clause I23. The method according to Clause I16, the method further comprising:
接收待执行矩阵向量计算指令,根据确定的指定设备的标识和所述待执行矩阵向量计算指令生成所述矩阵向量计算宏指令。Receiving the matrix vector calculation instruction to be executed, and generating the matrix vector calculation macro instruction according to the determined identifier of the designated device and the matrix vector calculation instruction to be executed.
条款I24、根据条款I15所述的方法,所述方法还包括:Clause I24. The method according to Clause I15, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款I25、根据条款I15所述的方法,Clause I25, the method according to Clause I15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述矩阵向量计算宏指令包括以下指令中的至少一种:矩阵乘向量计算宏指令、向量乘矩阵计算宏指令、张量计算宏指令、矩阵相加计算宏指令、矩阵相减计算宏指令;The matrix vector calculation macro instruction includes at least one of the following instructions: matrix multiply vector calculation macro instruction, vector multiply matrix calculation macro instruction, tensor calculation macro instruction, matrix addition calculation macro instruction, and matrix subtraction calculation macro instruction;
所述矩阵向量计算宏指令还包含以下选项中的至少一项:用于执行所述矩阵向量计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数,所述指令参数包括第二个操作数的地址、长度中的至少一种。The matrix vector calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the matrix vector calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of the address and the length of the second operand.
条款J1、一种矩阵向量计算指令处理系统,所述系统包括指令生成设备和运行设备,Clause J1, a matrix vector calculation instruction processing system, the system includes an instruction generation device and an operation device,
所述指令生成设备,包括:The instruction generating device includes:
设备确定模块,用于根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备;A device determination module, configured to determine, according to a received matrix vector calculation macro instruction, a running device for executing the matrix vector calculation macro instruction;
指令生成模块,用于根据所述矩阵向量计算宏指令和所述运行设备,生成运行指令;An instruction generation module, configured to generate a running instruction according to the matrix vector calculation macro instruction and the running device;
所述运行设备,包括:The operation equipment includes:
控制模块,用于获取所需数据、神经网络模型以及所述运行指令,对所述运行指令进行解析,获得多个解析指令;A control module, configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsing instructions;
执行模块,用于根据所述数据执行所述多个解析指令,得到执行结果,An execution module, configured to execute the plurality of parsing instructions according to the data to obtain an execution result,
其中,所述矩阵向量计算宏指令是指用于对矩阵和向量进行计算的宏指令,The matrix vector calculation macro instruction refers to a macro instruction used to calculate matrix and vector,
所述矩阵向量计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The matrix vector calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address, respectively.
条款J2、根据条款J1所述的系统,所述指令生成设备还包括:Clause J2. The system according to Clause J1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块,用于在确定所述矩阵向量计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述矩阵向量计算宏指令的执行条件时,将所述指定设备确定为所述运行设备;A first determination submodule, configured to determine the designated device as the running device when it is determined that the matrix vector calculation macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution conditions for executing the matrix vector calculation macro instruction;
第二确定子模块,用于在确定所述矩阵向量计算宏指令中不包含所述指定设备的标识时,根据接收到的矩阵向量计算宏指令和所述备选设备的资源信息,从所述备选设备中确定出用于执行所述矩阵向量计算宏指令的运行设备;A second determination submodule, configured to, when it is determined that the matrix vector calculation macro instruction does not include the identifier of the designated device, determine, from the candidate devices, a running device for executing the matrix vector calculation macro instruction according to the received matrix vector calculation macro instruction and the resource information of the candidate devices;
第三确定子模块,在确定所述矩阵向量计算宏指令中包含所述指定设备的标识,且所述指定设备的资源不满足执行所述矩阵向量计算宏指令的执行条件时,根据所述矩阵向量计算宏指令和所述备选设备的资源信息,确定运行设备,A third determination submodule, configured to, when it is determined that the matrix vector calculation macro instruction includes the identifier of the designated device and the resources of the designated device do not satisfy the execution conditions for executing the matrix vector calculation macro instruction, determine the running device according to the matrix vector calculation macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述矩阵向量计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the matrix vector calculation macro instruction, and the resource information includes the instruction set included in the candidate device.
条款J3、根据条款J2所述的系统,所述矩阵向量计算宏指令还包含输入量和输出量中的至少一项,Clause J3. The system according to Clause J2, the matrix vector calculation macroinstruction further includes at least one of input and output,
所述指令生成模块,还用于确定所述矩阵向量计算宏指令的数据量,根据所述矩阵向量计算宏指令的数据量、所述矩阵向量计算宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is further configured to determine the data amount of the matrix vector calculation macro instruction, and generate the running instruction according to the data amount of the matrix vector calculation macro instruction, the matrix vector calculation macro instruction, and the resource information of the running device,
其中,所述数据量是根据所述输入量和所述输出量中的至少一项确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the operating device further includes at least one item of storage capacity and remaining storage capacity.
条款J4、根据条款J3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause J4. The system according to Clause J3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块,用于在确定所述运行设备为一个,且所述运行设备的运行数据量小于所述矩阵向量计算宏指令的数据量时,根据所述运行设备的运行数据量和所述数据量将所述矩阵向量计算宏指令拆分成多条运行指令,以使所述运行设备依次执行多条运行指令,A first instruction generation submodule, configured to, when it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the matrix vector calculation macro instruction, split the matrix vector calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
第二指令生成子模块,用于在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述矩阵向量计算宏指令进行拆分,生成对应于每个运行设备的运行指令,A second instruction generation submodule, configured to, when it is determined that there are a plurality of running devices, split the matrix vector calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
其中,运行设备的运行数据量是根据运行设备的资源信息确定的,所述运行指令还包含运行输入量和运行输出量中的至少一项,所述运行输入量和所述运行输出量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of the running device; the running instruction further includes at least one of a running input amount and a running output amount; and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
条款J5、根据条款J1所述的系统,所述指令生成设备还包括:Clause J5. The system according to Clause J1, the instruction generating device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款J6、根据条款J2所述的系统,所述指令生成设备还包括:Clause J6. The system according to Clause J2, the instruction generating device further includes:
宏指令生成模块,用于接收待执行矩阵向量计算指令,根据确定的指定设备的标识和所述待执行矩阵向量计算指令生成所述矩阵向量计算宏指令。The macro instruction generation module is configured to receive a matrix vector calculation instruction to be executed, and generate the matrix vector calculation macro instruction according to the determined identifier of the designated device and the matrix vector calculation instruction to be executed.
条款J7、根据条款J1所述的系统,所述指令生成设备还包括:Clause J7. The system according to Clause J1, the instruction generating device further includes:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款J8、根据条款J1所述的系统,所述运行设备,还包括:Clause J8. The system according to Clause J1, the operation device, further comprising:
存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,A storage module, the storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款J9、根据条款J1所述的系统,所述控制模块,包括:Clause J9. The system according to Clause J1, the control module includes:
指令存储子模块,用于存储所述运行指令;An instruction storage sub-module for storing the operation instruction;
指令处理子模块,用于对所述运行指令进行解析,得到所述多个解析指令;An instruction processing sub-module, which is used to parse the running instruction to obtain the multiple parsing instructions;
存储队列子模块,用于存储运行指令队列,所述运行指令队列包括所述运行指令和所述多个解析指令,所述运行指令队列中所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。A storage queue submodule, configured to store a running instruction queue, where the running instruction queue includes the running instruction and the plurality of parsing instructions, and in the running instruction queue the running instruction and the plurality of parsing instructions are arranged in the order in which they are to be executed.
条款J10、根据条款J9所述的系统,所述执行模块,包括:Clause J10. The system according to Clause J9, the execution module includes:
依赖关系处理子模块,用于在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时,将所述第一解析指令缓存在所述指令存储子模块中,在所述第零解析指令执行完毕后,从所述指令存储子模块中提取所述第一解析指令发送至所述执行模块,A dependency processing submodule, configured to cache the first parsing instruction in the instruction storage submodule when it is determined that the first parsing instruction is associated with a zeroth parsing instruction preceding the first parsing instruction, and, after the zeroth parsing instruction has been executed, extract the first parsing instruction from the instruction storage submodule and send it to the execution module,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
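The dependency test of Clause J10 reduces to an interval-overlap check: the first parsing instruction is associated with the zeroth one when the storage address intervals of the data they need overlap. The sketch below is illustrative; the half-open `(start, end)` interval convention and the dict representation are assumptions.

```python
# Hypothetical sketch of the dependency test in Clause J10: two parsing
# instructions are associated when the storage address intervals of the
# data they need have an overlapping region.

def intervals_overlap(first, zeroth):
    """Each interval is (start, end) with start < end, treated as half-open."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

def has_dependency(first_ins, zeroth_ins):
    """If True, the first instruction must wait until the zeroth completes."""
    return intervals_overlap(first_ins["addr_range"], zeroth_ins["addr_range"])
```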
条款J11、根据条款J1所述的系统,Clause J11, the system according to Clause J1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述矩阵向量计算宏指令包括以下指令中的至少一种:矩阵乘向量计算宏指令、向量乘矩阵计算宏指令、张量计算宏指令、矩阵相加计算宏指令、矩阵相减计算宏指令;The matrix vector calculation macro instruction includes at least one of the following instructions: matrix multiply vector calculation macro instruction, vector multiply matrix calculation macro instruction, tensor calculation macro instruction, matrix addition calculation macro instruction, and matrix subtraction calculation macro instruction;
所述矩阵向量计算宏指令还包含以下选项中的至少一项:用于执行所述矩阵向量计算宏指令的指定设备的标识、输入量、输出量、操作数和指令参数,所述指令参数包括第二个操作数的地址、长度中的至少一种。The matrix vector calculation macro instruction further includes at least one of the following options: an identifier of a designated device for executing the matrix vector calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, where the instruction parameters include at least one of the address and the length of the second operand.
条款J12、一种机器学习运算装置,所述装置包括:Clause J12. A machine learning computing device, the device comprising:
一个或多个如条款J1-条款J11任一项所述的矩阵向量计算指令处理系统,用于从其他处理装置中获取待运算数据和控制信息,并执行指定的机器学习运算,将执行结果通过I/O接口传递给其他处理装置;One or more matrix vector calculation instruction processing systems according to any one of Clause J1 to Clause J11, configured to obtain data to be operated on and control information from other processing devices, perform a designated machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述矩阵向量计算指令处理系统时,所述多个所述矩阵向量计算指令处理系统间可以通过特定的结构进行连接并传输数据;When the machine learning computing device includes a plurality of the matrix vector calculation instruction processing systems, the plurality of matrix vector calculation instruction processing systems may be connected and transmit data through a specific structure;
其中,多个所述矩阵向量计算指令处理系统通过快速外部设备互连总线进行互联并传输数据,以支持更大规模的机器学习的运算;多个所述矩阵向量计算指令处理系统共享同一控制系统或拥有各自的控制系统;多个所述矩阵向量计算指令处理系统共享内存或者拥有各自的内存;多个所述矩阵向量计算指令处理系统的互联方式是任意互联拓扑。Wherein, the plurality of matrix vector calculation instruction processing systems are interconnected and transmit data through a fast peripheral interconnect bus to support larger-scale machine learning operations; the plurality of matrix vector calculation instruction processing systems share the same control system or have their own control systems; the plurality of matrix vector calculation instruction processing systems share memory or have their own memories; and the interconnection mode of the plurality of matrix vector calculation instruction processing systems is an arbitrary interconnection topology.
条款J13、一种组合处理装置,所述组合处理装置包括:Clause J13. A combined processing device, the combined processing device comprising:
如条款J12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing device, general interconnection interface and other processing devices as described in clause J12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款J14、一种板卡,所述板卡包括:存储器件、接口装置和控制器件以及机器学习芯片,所述机器学习芯片包括条款J12所述的机器学习运算装置或如条款J13所述的组合处理装置;Clause J14. A board card, the board card comprising: a storage device, an interface device, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device according to Clause J12 or the combined processing device according to Clause J13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款J15、一种矩阵向量计算指令处理方法,所述方法应用于矩阵向量计算指令处理系统,所述系统包括指令生成设备和运行设备,所述方法包括:Clause J15. A matrix vector calculation instruction processing method. The method is applied to a matrix vector calculation instruction processing system. The system includes an instruction generation device and an operation device. The method includes:
通过所述指令生成设备根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,并根据所述矩阵向量计算宏指令和所述运行设备,生成运行指令;Determining, by the instruction generation device according to a received matrix vector calculation macro instruction, a running device for executing the matrix vector calculation macro instruction, and generating a running instruction according to the matrix vector calculation macro instruction and the running device;
通过所述运行设备获取数据、神经网络模型以及运行指令,对所述运行指令进行解析,获得多个解析指令,并根据所述数据执行所述多个解析指令,得到执行结果,Acquiring, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsing instructions, and executing the plurality of parsing instructions according to the data to obtain an execution result,
其中,所述矩阵向量计算宏指令是指用于对矩阵和向量进行计算的宏指令,The matrix vector calculation macro instruction refers to a macro instruction used to calculate matrix and vector,
所述矩阵向量计算宏指令包含操作类型、输入地址和输出地址,所述运行指令包含所述操作类型、运行输入地址和运行输出地址,所述运行输入地址和所述运行输出地址是分别根据所述输入地址、所述输出地址确定的。The matrix vector calculation macro instruction includes an operation type, an input address, and an output address; the running instruction includes the operation type, a running input address, and a running output address; and the running input address and the running output address are determined according to the input address and the output address, respectively.
条款J16、根据条款J15所述的方法,所述方法还包括:Clause J16. The method according to Clause J15, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的矩阵向量计算宏指令,确定执行所述矩阵向量计算宏指令的运行设备,包括以下至少一项:Wherein, determining, by the instruction generation device according to the received matrix vector calculation macro instruction, the running device for executing the matrix vector calculation macro instruction includes at least one of the following:
在确定所述矩阵向量计算宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述矩阵向量计算宏指令的执行条件时,将所述指定设备确定为所述运行设备;When it is determined that the matrix vector calculation macro instruction includes the identification of the designated device, and the resources of the designated device satisfy the execution conditions for executing the matrix vector calculation macro instruction, the designated device is determined as the running device;
在确定所述矩阵向量计算宏指令中不包含所述指定设备的标识时,根据接收到的矩阵向量计算宏指令和所述备选设备的资源信息,从所述备选设备中确定出用于执行所述矩阵向量计算宏指令的运行设备;When it is determined that the matrix vector calculation macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the matrix vector calculation macro instruction according to the received matrix vector calculation macro instruction and the resource information of the candidate devices;
在确定所述矩阵向量计算宏指令中包含所述指定设备的标识,且所述指定设备的资源不满足执行所述矩阵向量计算宏指令的执行条件时,根据所述矩阵向量计算宏指令和所述备选设备的资源信息,确定运行设备,When it is determined that the matrix vector calculation macro instruction includes the identifier of the designated device, and the resources of the designated device do not satisfy the execution conditions for executing the matrix vector calculation macro instruction, determining the running device according to the matrix vector calculation macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述矩阵向量计算宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the matrix vector calculation macro instruction, and the resource information includes the instruction set included in the candidate device.
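The three device-selection branches of Clause J16 can be sketched as follows. This is a minimal, non-limiting illustration, not the claimed implementation; the names `Device`, `MacroInstruction`, and `select_running_device` are hypothetical, and "resource information" is reduced here to the set of supported instruction-set identifiers:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Device:
    device_id: str
    instruction_sets: set = field(default_factory=set)  # resource information

@dataclass
class MacroInstruction:
    op_type: str                         # e.g. a matrix-multiply-vector operation
    designated_id: Optional[str] = None  # identifier of the designated device, if any

def satisfies_condition(device: Device, macro: MacroInstruction) -> bool:
    # Execution condition: the device contains an instruction set
    # corresponding to the macro instruction.
    return macro.op_type in device.instruction_sets

def select_running_device(macro: MacroInstruction, candidates: list) -> Device:
    designated = next((d for d in candidates
                       if d.device_id == macro.designated_id), None)
    # Branch 1: a designated device whose resources satisfy the condition.
    if designated is not None and satisfies_condition(designated, macro):
        return designated
    # Branches 2 and 3: no identifier, or the designated device is unsuitable --
    # choose from the candidate devices by their resource information.
    for d in candidates:
        if satisfies_condition(d, macro):
            return d
    raise RuntimeError("no candidate device supports this macro instruction")
```

Under these assumptions, a macro instruction naming an unsuitable device falls through to the candidate-device search, matching the third branch of the clause.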
Clause J17. The method according to Clause J16, wherein the matrix-vector calculation macro instruction further includes at least one of an input amount and an output amount,
and generating the running instruction according to the matrix-vector calculation macro instruction and the running device includes:
determining a data amount of the matrix-vector calculation macro instruction, and generating the running instruction according to the data amount of the matrix-vector calculation macro instruction, the matrix-vector calculation macro instruction, and resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause J18. The method according to Clause J17, wherein generating the running instruction according to the data amount of the matrix-vector calculation macro instruction, the matrix-vector calculation macro instruction, and the resource information of the running device includes at least one of the following:
when it is determined that there is one running device and a running data amount of the running device is smaller than the data amount of the matrix-vector calculation macro instruction, splitting the matrix-vector calculation macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence;
when it is determined that there are a plurality of running devices, splitting the matrix-vector calculation macro instruction according to the running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of that running device, the running instruction further includes at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
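The splitting described in Clause J18 can be sketched as follows. This is an illustrative simplification, not the claimed method: it assumes addresses advance by one unit per data element and that input and output regions are split with the same offsets; `split_macro` is a hypothetical helper name:

```python
def split_macro(total_amount, input_addr, output_addr, chunk):
    """Split one macro instruction into running instructions, each covering at
    most `chunk` elements (the running device's running data amount)."""
    runs = []
    offset = 0
    while offset < total_amount:
        amount = min(chunk, total_amount - offset)
        runs.append({
            "input_addr": input_addr + offset,    # running input address
            "output_addr": output_addr + offset,  # running output address
            "amount": amount,                     # running input/output amount
        })
        offset += amount
    return runs
```

With a data amount of 10 and a running data amount of 4, the sketch yields three running instructions covering 4, 4, and 2 elements, executed in sequence as the clause describes.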
Clause J19. The method according to Clause J15, further comprising:
sorting, by the instruction generating device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
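Clause J19 leaves the queue sorting rule open; one minimal way to picture the step (a hypothetical sketch, with the rule supplied as a key function) is:

```python
from collections import deque

def build_instruction_queue(run_instructions, key):
    """Sort running instructions by a caller-supplied queue sorting rule,
    then build the per-device instruction queue in FIFO order."""
    return deque(sorted(run_instructions, key=key))
```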
Clause J20. The method according to Clause J16, further comprising:
receiving, by the instruction generating device, a matrix-vector calculation instruction to be executed, and generating the matrix-vector calculation macro instruction according to a determined identifier of the designated device and the matrix-vector calculation instruction to be executed.
Clause J21. The method according to Clause J15, further comprising:
sending, by the instruction generating device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending, by the instruction generating device, the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
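The three-step dispatch of Clause J21 (assembly file, binary file, send) can be pictured with the following toy sketch. The textual "assembly" and UTF-8 "binary" stand in for real assembler output and machine encoding, and `dispatch` is a hypothetical name, not part of the claims:

```python
def dispatch(run_instruction, send):
    # Step 1: generate an "assembly file" (here, one textual line per instruction).
    asm = f"{run_instruction['op']} {run_instruction['in']} {run_instruction['out']}"
    # Step 2: translate the assembly into a "binary file" (UTF-8 bytes stand in
    # for a real machine encoding in this sketch).
    binary = asm.encode("utf-8")
    # Step 3: send the binary to the running device.
    send(binary)
    return binary
```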
Clause J22. The method according to Clause J15, further comprising:
storing, by the running device, the data and scalar data in the data,
wherein the running device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data in the data.
Clause J23. The method according to Clause J15, further comprising:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions;
storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, where the running instruction and the plurality of parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
Clause J24. The method according to Clause J23, further comprising:
when the running device determines that a first parsed instruction has an association with a zeroth parsed instruction preceding the first parsed instruction, caching the first parsed instruction, and after execution of the zeroth parsed instruction is completed, executing the cached first parsed instruction,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
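The overlap test that defines the association in Clause J24 can be sketched as follows (an illustrative sketch; intervals are assumed to be half-open `(start, end)` pairs, and the function names are hypothetical):

```python
def intervals_overlap(first, zeroth):
    """first and zeroth are (start, end) storage address intervals, end exclusive."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

def has_dependency(first_instr, zeroth_instr):
    # The first parsed instruction depends on the zeroth one when the storage
    # address intervals holding their required data overlap; in that case the
    # first instruction is cached until the zeroth finishes executing.
    return intervals_overlap(first_instr["interval"], zeroth_instr["interval"])
```

Adjacent but non-overlapping intervals carry no association, so the first instruction need not wait.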
Clause J25. The method according to Clause J15, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is disposed in the CPU and/or the NPU;
the matrix-vector calculation macro instruction includes at least one of the following instructions: a matrix-multiply-vector calculation macro instruction, a vector-multiply-matrix calculation macro instruction, a tensor calculation macro instruction, a matrix addition calculation macro instruction, and a matrix subtraction calculation macro instruction;
the matrix-vector calculation macro instruction further includes at least one of the following options: an identifier of the designated device for executing the matrix-vector calculation macro instruction, an input amount, an output amount, operands, and instruction parameters, the instruction parameters including at least one of an address and a length of a second operand.
Scalar calculation instructions may be processed by the scalar calculation instruction generating apparatus and method, the scalar calculation instruction processing system and method, and other related products described below. The foregoing can be better understood in accordance with the following clauses:
Clause K1. A scalar calculation instruction generating apparatus, the apparatus including:
a device determination module configured to determine, according to a received scalar calculation macro instruction, a running device for executing the scalar calculation macro instruction;
an instruction generation module configured to generate a running instruction according to the scalar calculation macro instruction and the running device,
wherein the scalar calculation macro instruction refers to a macro instruction for performing an arithmetic operation on scalars,
the scalar calculation macro instruction includes an operation type, a first operand, a second operand, and an output address; the running instruction includes the operation type, a first running operand, a second running operand, and a running output address; and the first running operand, the second running operand, and the running output address are determined according to the first operand, the second operand, and the output address, respectively.
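The field mapping from a scalar calculation macro instruction to a running instruction in Clause K1 can be pictured with this minimal sketch. It assumes the trivial case where a scalar operation needs no splitting, so each running field maps one-to-one from the macro field; the class and function names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ScalarMacro:
    op_type: str        # e.g. "SADD" for scalar addition
    operand1: int       # first operand
    operand2: int       # second operand
    output_addr: int    # output address

@dataclass
class RunningInstruction:
    op_type: str
    run_operand1: int
    run_operand2: int
    run_output_addr: int

def generate_running_instruction(macro: ScalarMacro) -> RunningInstruction:
    # A scalar operation touches a single element, so in this sketch the
    # running operands and running output address are taken directly from
    # the macro instruction's first operand, second operand, and output address.
    return RunningInstruction(macro.op_type, macro.operand1,
                              macro.operand2, macro.output_addr)
```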
Clause K2. The apparatus according to Clause K1, wherein the device determination module includes:
a first determination submodule configured to, when it is determined that the scalar calculation macro instruction includes an identifier of a designated device and resources of the designated device satisfy an execution condition for executing the scalar calculation macro instruction, determine the designated device as the running device,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the scalar calculation macro instruction.
Clause K3. The apparatus according to Clause K2, further including:
a resource acquisition module configured to acquire resource information of candidate devices,
the device determination module further including:
a second determination submodule configured to, when it is determined that the scalar calculation macro instruction does not include the identifier of the designated device, determine, from the candidate devices according to the received scalar calculation macro instruction and the resource information of the candidate devices, the running device for executing the scalar calculation macro instruction,
wherein the resource information includes the instruction set contained in each candidate device.
Clause K4. The apparatus according to Clause K3, wherein the device determination module further includes:
a third determination submodule configured to, when it is determined that the scalar calculation macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the scalar calculation macro instruction, determine the running device according to the scalar calculation macro instruction and the resource information of the candidate devices.
Clause K5. The apparatus according to Clause K1, further including:
a queue construction module configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause K6. The apparatus according to Clause K2, further including:
a macro instruction generation module configured to receive a scalar calculation instruction to be executed, and generate the scalar calculation macro instruction according to a determined identifier of the designated device and the scalar calculation instruction to be executed.
Clause K7. The apparatus according to Clause K1, further including:
an instruction dispatch module configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly submodule configured to generate an assembly file according to the running instruction;
an assembly translation submodule configured to translate the assembly file into a binary file;
an instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause K8. The apparatus according to Clause K1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is disposed in the CPU and/or the NPU;
the scalar calculation macro instruction includes at least one of the following instructions: a scalar addition calculation macro instruction, a scalar subtraction calculation macro instruction, a scalar multiplication calculation macro instruction, and a scalar division calculation macro instruction.
Clause K9. A machine learning operation apparatus, the apparatus including:
one or more scalar calculation instruction generating apparatuses according to any one of Clauses K1 to K8, configured to acquire data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and pass the execution result to other processing apparatuses through an I/O interface;
when the machine learning operation apparatus includes a plurality of the scalar calculation instruction generating apparatuses, the plurality of scalar calculation instruction generating apparatuses may be connected to each other through a specific structure and transmit data;
wherein the plurality of scalar calculation instruction generating apparatuses are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of scalar calculation instruction generating apparatuses share the same control system or have their own control systems; the plurality of scalar calculation instruction generating apparatuses share memory or have their own memories; and the interconnection manner of the plurality of scalar calculation instruction generating apparatuses is an arbitrary interconnection topology.
Clause K10. A combined processing apparatus, the apparatus including:
the machine learning operation apparatus according to Clause K9, a universal interconnection interface, and other processing apparatuses;
the machine learning operation apparatus interacting with the other processing apparatuses to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further includes: a storage apparatus connected to the machine learning operation apparatus and the other processing apparatuses, respectively, and configured to store data of the machine learning operation apparatus and the other processing apparatuses.
Clause K11. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, the machine learning chip including the machine learning operation apparatus according to Clause K9 or the combined processing apparatus according to Clause K10;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
the control device is configured to monitor a state of the machine learning chip.
Clause K12. A scalar calculation instruction generating method, the method including:
determining, according to a received scalar calculation macro instruction, a running device for executing the scalar calculation macro instruction;
generating a running instruction according to the scalar calculation macro instruction and the running device,
wherein the scalar calculation macro instruction refers to a macro instruction for performing an arithmetic operation on scalars,
the scalar calculation macro instruction includes an operation type, a first operand, a second operand, and an output address,
the running instruction includes the operation type, a first running operand, a second running operand, and a running output address, and the first running operand, the second running operand, and the running output address are determined according to the first operand, the second operand, and the output address, respectively.
Clause K13. The method according to Clause K12, wherein determining, according to the received scalar calculation macro instruction, the running device for executing the scalar calculation macro instruction includes:
when it is determined that the scalar calculation macro instruction includes an identifier of a designated device and resources of the designated device satisfy an execution condition for executing the scalar calculation macro instruction, determining the designated device as the running device,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the scalar calculation macro instruction.
Clause K14. The method according to Clause K13, further comprising:
acquiring resource information of candidate devices,
wherein determining, according to the received scalar calculation macro instruction, the running device for executing the scalar calculation macro instruction includes:
when it is determined that the scalar calculation macro instruction does not include the identifier of the designated device, determining, from the candidate devices according to the received scalar calculation macro instruction and the resource information of the candidate devices, the running device for executing the scalar calculation macro instruction,
wherein the resource information includes the instruction set contained in each candidate device.
Clause K15. The method according to Clause K14, wherein determining, according to the received scalar calculation macro instruction, the running device for executing the scalar calculation macro instruction includes:
when it is determined that the scalar calculation macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the scalar calculation macro instruction, determining the running device according to the scalar calculation macro instruction and the resource information of the candidate devices.
Clause K16. The method according to Clause K12, further comprising:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause K17. The method according to Clause K13, further comprising:
receiving a scalar calculation instruction to be executed, and generating the scalar calculation macro instruction according to a determined identifier of the designated device and the scalar calculation instruction to be executed.
Clause K18. The method according to Clause K12, further comprising:
sending the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause K19. The method according to Clause K12, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU;
the scalar calculation macro instruction includes at least one of the following instructions: a scalar addition calculation macro instruction, a scalar subtraction calculation macro instruction, a scalar multiplication calculation macro instruction, and a scalar division calculation macro instruction.
Clause L1. A scalar calculation instruction processing system, the system including an instruction generating device and a running device,
the instruction generating device including:
a device determination module configured to determine, according to a received scalar calculation macro instruction, a running device for executing the scalar calculation macro instruction;
an instruction generation module configured to generate a running instruction according to the scalar calculation macro instruction and the running device;
the running device including:
a control module configured to acquire required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsed instructions;
an execution module configured to execute the plurality of parsed instructions according to the data to obtain an execution result,
wherein the scalar calculation macro instruction refers to a macro instruction for performing an arithmetic operation on scalars,
the scalar calculation macro instruction includes an operation type, a first operand, a second operand, and an output address; the running instruction includes the operation type, a first running operand, a second running operand, and a running output address; and the first running operand, the second running operand, and the running output address are determined according to the first operand, the second operand, and the output address, respectively.
Clause L2. The system according to Clause L1, wherein the instruction generating device further includes:
a resource acquisition module configured to acquire resource information of candidate devices,
the device determination module including at least one of the following submodules:
a first determination submodule configured to, when it is determined that the scalar calculation macro instruction includes an identifier of a designated device and resources of the designated device satisfy an execution condition for executing the scalar calculation macro instruction, determine the designated device as the running device;
a second determination submodule configured to, when it is determined that the scalar calculation macro instruction does not include the identifier of the designated device, determine, from the candidate devices according to the received scalar calculation macro instruction and the resource information of the candidate devices, the running device for executing the scalar calculation macro instruction;
a third determination submodule configured to, when it is determined that the scalar calculation macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the scalar calculation macro instruction, determine the running device according to the scalar calculation macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the designated device contains an instruction set corresponding to the scalar calculation macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause L3. The system according to Clause L1, wherein the instruction generating device further includes:
a queue construction module configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause L4. The system according to Clause L2, wherein the instruction generating device further includes:
a macro instruction generation module configured to receive a scalar calculation instruction to be executed, and generate the scalar calculation macro instruction according to a determined identifier of the designated device and the scalar calculation instruction to be executed.
Clause L5. The system according to Clause L1, wherein the instruction generating device further includes:
an instruction dispatch module configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly submodule configured to generate an assembly file according to the running instruction;
an assembly translation submodule configured to translate the assembly file into a binary file;
an instruction sending submodule configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause L6. The system according to Clause L1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
the cache being configured to store the data;
the register being configured to store scalar data in the data.
Clause L7. The system according to Clause L1, wherein the control module includes:
an instruction storage submodule configured to store the running instruction;
an instruction processing submodule configured to parse the running instruction to obtain the plurality of parsed instructions;
a storage queue submodule configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, where the running instruction and the plurality of parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
Clause L8. The system according to Clause L7, wherein the execution module includes:
a dependency processing submodule configured to, when it is determined that a first parsed instruction has an association with a zeroth parsed instruction preceding the first parsed instruction, cache the first parsed instruction in the instruction storage submodule, and after execution of the zeroth parsed instruction is completed, extract the first parsed instruction from the instruction storage submodule and send it to the execution module,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding the first parsed instruction includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause L9. The system according to Clause L1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generating device is disposed in the CPU and/or the NPU;
the scalar calculation macro instruction includes at least one of the following instructions: a scalar addition calculation macro instruction, a scalar subtraction calculation macro instruction, a scalar multiplication calculation macro instruction, and a scalar division calculation macro instruction.
Clause L10. A machine learning operation apparatus, the apparatus comprising:
one or more scalar calculation instruction processing systems according to any one of Clauses L1 to L9, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
when the machine learning operation apparatus comprises a plurality of the scalar calculation instruction processing systems, the plurality of scalar calculation instruction processing systems may be connected through a specific structure and transmit data;
wherein the plurality of scalar calculation instruction processing systems are interconnected and transmit data through a peripheral component interconnect express (PCIe) bus to support larger-scale machine learning operations; the plurality of scalar calculation instruction processing systems share a single control system or have their own control systems; the plurality of scalar calculation instruction processing systems share memory or have their own memories; and the plurality of scalar calculation instruction processing systems may be interconnected in an arbitrary interconnection topology.
Clause L11. A combined processing apparatus, the combined processing apparatus comprising:
the machine learning operation apparatus according to Clause L10, a universal interconnection interface, and another processing apparatus;
the machine learning operation apparatus interacting with the other processing apparatus to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further comprises: a storage apparatus, the storage apparatus being connected to the machine learning operation apparatus and the other processing apparatus respectively, and configured to store data of the machine learning operation apparatus and of the other processing apparatus.
Clause L12. A board card, the board card comprising: a storage device, an interface apparatus, a control device, and a machine learning chip, wherein the machine learning chip comprises the machine learning operation apparatus according to Clause L10 or the combined processing apparatus according to Clause L11;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
and the control device is configured to monitor the state of the machine learning chip.
Clause L13. A scalar calculation instruction processing method, the method being applied to a scalar calculation instruction processing system comprising an instruction generation device and a running device, the method comprising:
determining, by the instruction generation device according to a received scalar calculation macro instruction, a running device to execute the scalar calculation macro instruction, and generating a running instruction according to the scalar calculation macro instruction and the running device;
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions according to the data to obtain an execution result,
wherein the scalar calculation macro instruction is a macro instruction for performing an arithmetic operation on scalars,
the scalar calculation macro instruction comprises an operation type, a first operand, a second operand, and an output address,
and the running instruction comprises the operation type, a first running operand, a second running operand, and a running output address, the first running operand, the second running operand, and the running output address being determined according to the first operand, the second operand, and the output address respectively.
Clause L14. The method according to Clause L13, further comprising:
obtaining, by the instruction generation device, resource information of candidate devices,
wherein determining, by the instruction generation device according to the received scalar calculation macro instruction, the running device to execute the scalar calculation macro instruction comprises at least one of the following:
when it is determined that the scalar calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the scalar calculation macro instruction, determining the specified device as the running device;
when it is determined that the scalar calculation macro instruction does not contain the identifier of the specified device, determining, from the candidate devices according to the received scalar calculation macro instruction and the resource information of the candidate devices, a running device for executing the scalar calculation macro instruction;
when it is determined that the scalar calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar calculation macro instruction, determining the running device according to the scalar calculation macro instruction and the resource information of the candidate devices,
wherein the execution condition comprises: the specified device contains an instruction set corresponding to the scalar calculation macro instruction, and the resource information comprises the instruction sets contained in the candidate devices.
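The three determination branches of Clause L14 can be sketched as a single selection routine. This is a hedged illustration only: the dictionary shapes, device identifiers, and instruction-set names are assumptions, not the patent's encoding.

```python
# Hypothetical sketch of Clause L14 device selection. A macro instruction may
# name a specified device; the device is used only if it supports the required
# instruction set, otherwise a candidate device is chosen by resource information.

def pick_running_device(macro, candidates):
    """macro: {'specified': device id or None, 'isa': required instruction-set name}.
    candidates: {device_id: {'isa': set of supported instruction-set names}}."""
    specified = macro.get('specified')
    required = macro['isa']
    # Branch 1: a device is specified and satisfies the execution condition.
    if specified is not None and required in candidates.get(specified, {}).get('isa', set()):
        return specified
    # Branches 2 and 3: no device specified, or the specified device lacks the
    # required instruction set -> select from the candidates' resource information.
    for device_id, info in candidates.items():
        if required in info['isa']:
            return device_id
    return None  # no device satisfies the execution condition

candidates = {'cpu0': {'isa': {'scalar'}}, 'npu0': {'isa': {'scalar', 'matrix'}}}
print(pick_running_device({'specified': 'npu0', 'isa': 'scalar'}, candidates))  # npu0
print(pick_running_device({'specified': 'cpu0', 'isa': 'matrix'}, candidates))  # npu0
```

The second call shows branch 3: `cpu0` is specified but lacks the required instruction set, so the selection falls through to a qualifying candidate.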
Clause L15. The method according to Clause L13, further comprising:
sorting, by the instruction generation device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause L16. The method according to Clause L14, further comprising:
receiving, by the instruction generation device, a scalar calculation instruction to be executed, and generating the scalar calculation macro instruction according to the determined identifier of the specified device and the scalar calculation instruction to be executed.
Clause L17. The method according to Clause L13, further comprising:
sending, by the instruction generation device, the running instruction to the running device, so that the running device executes the running instruction,
wherein sending, by the instruction generation device, the running instruction to the running device so that the running device executes the running instruction comprises:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
and sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
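The assembly-to-binary dispatch path of Clause L17 can be sketched as follows. The mnemonics, text format, and one-byte-per-field encoding below are invented purely for illustration; a real toolchain would use the running device's own assembler and encoding.

```python
# Minimal sketch of the Clause L17 pipeline: render running instructions as
# assembly text, "translate" the text into binary, and hand the bytes to the
# running device. All mnemonics and the byte layout are assumptions.

OPCODES = {'ADD': 0x01, 'SUB': 0x02, 'MUL': 0x03, 'DIV': 0x04}  # assumed encoding

def to_assembly(run_instrs):
    # Each running instruction: (operation type, operand 1, operand 2, output address).
    return '\n'.join(f'{op} {a}, {b}, {hex(out)}' for op, a, b, out in run_instrs)

def assemble(asm_text):
    # Translate each assembly line into four bytes: opcode, two operands, address.
    binary = bytearray()
    for line in asm_text.splitlines():
        op, rest = line.split(' ', 1)
        a, b, out = (int(tok.strip(), 0) for tok in rest.split(','))
        binary += bytes([OPCODES[op], a, b, out])
    return bytes(binary)

asm = to_assembly([('ADD', 3, 4, 0x10)])
print(asm)                  # ADD 3, 4, 0x10
print(assemble(asm).hex())  # 01030410
```

Sending `assemble(asm)` to the running device corresponds to the final step of the clause; the device would decode the bytes and execute the running instruction.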
Clause L18. The method according to Clause L13, further comprising:
storing, by the running device, the data and the scalar data in the data,
wherein the running device comprises a storage module, the storage module comprises one or more of a register and a cache, and the cache comprises a scratchpad cache,
the cache being configured to store the data,
and the register being configured to store the scalar data in the data.
Clause L19. The method according to Clause L13, further comprising:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the plurality of parsed instructions;
and storing, by the running device, a running instruction queue, the running instruction queue comprising the running instruction and the plurality of parsed instructions, wherein the running instruction and the plurality of parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
Clause L20. The method according to Clause L19, further comprising:
when the running device determines that a first parsed instruction has an association with a zeroth parsed instruction preceding the first parsed instruction, caching the first parsed instruction, and after the zeroth parsed instruction has been executed, executing the cached first parsed instruction,
wherein the first parsed instruction has an association with the zeroth parsed instruction preceding it when:
a first storage address interval storing data required by the first parsed instruction overlaps a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause L21. The method according to Clause L13, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is provided in a CPU and/or an NPU;
the scalar calculation macro instruction comprises at least one of the following instructions: a scalar addition macro instruction, a scalar subtraction macro instruction, a scalar multiplication macro instruction, and a scalar division macro instruction.
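The four scalar calculation macro instruction types of Clause L21, each carrying the (operation type, first operand, second operand, output address) fields defined in Clause L13, can be illustrated with a toy interpreter. The mnemonics and the dictionary memory model are assumptions for illustration only.

```python
# Toy interpreter for the scalar calculation macro instructions (add, subtract,
# multiply, divide). The tuple layout mirrors the clause's fields; mnemonics
# such as 'SADD' and the memory dictionary are invented for this sketch.
import operator

OPS = {'SADD': operator.add, 'SSUB': operator.sub,
       'SMUL': operator.mul, 'SDIV': operator.truediv}  # assumed mnemonics

def execute_scalar_macro(macro, memory):
    op_type, first, second, out_addr = macro
    memory[out_addr] = OPS[op_type](first, second)  # write result to output address
    return memory[out_addr]

memory = {}
print(execute_scalar_macro(('SADD', 6, 7, 0x20), memory))  # 13
print(execute_scalar_macro(('SDIV', 9, 3, 0x24), memory))  # 3.0
```

In the system of the clauses, such a macro instruction would first be translated into a running instruction and parsed instructions; this sketch only shows the arithmetic semantics the macro encodes.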
Scalar logic calculation instructions may be processed by the scalar logic calculation instruction generation apparatus and method, the scalar logic calculation instruction processing system and method, and the related products described below; the foregoing may be better understood with reference to the following clauses:
Clause M1. A scalar logic calculation instruction generation apparatus, the apparatus comprising:
a device determination module, configured to determine, according to a received scalar logic calculation macro instruction, a running device to execute the scalar logic calculation macro instruction;
an instruction generation module, configured to generate a running instruction according to the scalar logic calculation macro instruction and the running device,
wherein the scalar logic calculation macro instruction is a macro instruction for performing a logical operation on scalars,
the scalar logic calculation macro instruction comprises an operation type, a first operand, a second operand, and an output address,
and the running instruction comprises the operation type, a first running operand, a second running operand, and a running output address, the first running operand, the second running operand, and the running output address being determined according to the first operand, the second operand, and the output address respectively.
Clause M2. The apparatus according to Clause M1, wherein the device determination module comprises:
a first determination sub-module, configured to determine the specified device as the running device when it is determined that the scalar logic calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the scalar logic calculation macro instruction,
wherein the execution condition comprises: the specified device contains an instruction set corresponding to the scalar logic calculation macro instruction.
Clause M3. The apparatus according to Clause M2, further comprising:
a resource obtaining module, configured to obtain resource information of candidate devices,
wherein the device determination module further comprises:
a second determination sub-module, configured to determine, from the candidate devices according to the received scalar logic calculation macro instruction and the resource information of the candidate devices, a running device for executing the scalar logic calculation macro instruction when it is determined that the scalar logic calculation macro instruction does not contain the identifier of the specified device,
wherein the resource information comprises the instruction sets contained in the candidate devices.
Clause M4. The apparatus according to Clause M3, wherein the device determination module further comprises:
a third determination sub-module, configured to determine the running device according to the scalar logic calculation macro instruction and the resource information of the candidate devices when it is determined that the scalar logic calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar logic calculation macro instruction.
Clause M5. The apparatus according to Clause M1, further comprising:
a queue construction module, configured to sort the running instructions according to a queue sorting rule, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause M6. The apparatus according to Clause M2, further comprising:
a macro instruction generation module, configured to receive a scalar logic calculation instruction to be executed, and generate the scalar logic calculation macro instruction according to the determined identifier of the specified device and the scalar logic calculation instruction to be executed.
Clause M7. The apparatus according to Clause M1, further comprising:
an instruction dispatch module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module comprises:
an instruction assembly sub-module, configured to generate an assembly file according to the running instruction;
an assembly translation sub-module, configured to translate the assembly file into a binary file;
and an instruction sending sub-module, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause M8. The apparatus according to Clause M1, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is provided in a CPU and/or an NPU;
the scalar logic calculation macro instruction comprises at least one of the following instructions: a scalar AND macro instruction, a scalar OR macro instruction, a scalar NOT macro instruction, and a scalar comparison macro instruction.
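The four scalar logic macro instruction types of Clause M8 can be illustrated with a toy dispatch table over the same (operation type, first operand, second operand, output address) fields defined in Clause M1. The mnemonics are invented, the NOT operation ignores its second operand in this sketch, and the comparison is assumed to be "greater than"; none of these details come from the patent.

```python
# Toy dispatch for the scalar logic macro instructions (AND, OR, NOT, compare).
# Mnemonics, the memory dictionary, and the comparison semantics are assumptions.

LOGIC_OPS = {
    'SAND': lambda a, b: a & b,
    'SOR':  lambda a, b: a | b,
    'SNOT': lambda a, _: int(not a),   # second operand unused in this sketch
    'SCMP': lambda a, b: int(a > b),   # assumed "greater than" comparison
}

def execute_logic_macro(macro, memory):
    op_type, first, second, out_addr = macro
    memory[out_addr] = LOGIC_OPS[op_type](first, second)  # write result to output address
    return memory[out_addr]

memory = {}
print(execute_logic_macro(('SAND', 0b1100, 0b1010, 0x30), memory))  # 8
print(execute_logic_macro(('SCMP', 5, 3, 0x34), memory))            # 1
```

As with the scalar arithmetic case, the apparatus of these clauses would translate such a macro instruction into a running instruction for the selected device; the sketch only shows the logical semantics the macro encodes.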
Clause M9. A machine learning operation apparatus, the apparatus comprising:
one or more scalar logic calculation instruction generation apparatuses according to any one of Clauses M1 to M8, configured to obtain data to be operated on and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
when the machine learning operation apparatus comprises a plurality of the scalar logic calculation instruction generation apparatuses, the plurality of scalar logic calculation instruction generation apparatuses may be connected through a specific structure and transmit data;
wherein the plurality of scalar logic calculation instruction generation apparatuses are interconnected and transmit data through a peripheral component interconnect express (PCIe) bus to support larger-scale machine learning operations; the plurality of scalar logic calculation instruction generation apparatuses share a single control system or have their own control systems; the plurality of scalar logic calculation instruction generation apparatuses share memory or have their own memories; and the plurality of scalar logic calculation instruction generation apparatuses may be interconnected in an arbitrary interconnection topology.
Clause M10. A combined processing apparatus, the apparatus comprising:
the machine learning operation apparatus according to Clause M9, a universal interconnection interface, and another processing apparatus;
the machine learning operation apparatus interacting with the other processing apparatus to jointly complete a calculation operation specified by a user,
wherein the combined processing apparatus further comprises: a storage apparatus, the storage apparatus being connected to the machine learning operation apparatus and the other processing apparatus respectively, and configured to store data of the machine learning operation apparatus and of the other processing apparatus.
Clause M11. A board card, the board card comprising: a storage device, an interface apparatus, a control device, and a machine learning chip, wherein the machine learning chip comprises the machine learning operation apparatus according to Clause M9 or the combined processing apparatus according to Clause M10;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device;
and the control device is configured to monitor the state of the machine learning chip.
Clause M12. A scalar logic calculation instruction generation method, the method comprising:
determining, according to a received scalar logic calculation macro instruction, a running device to execute the scalar logic calculation macro instruction;
generating a running instruction according to the scalar logic calculation macro instruction and the running device,
wherein the scalar logic calculation macro instruction is a macro instruction for performing a logical operation on scalars, the scalar logic calculation macro instruction comprising an operation type, a first operand, a second operand, and an output address,
and the running instruction comprises the operation type, a first running operand, a second running operand, and a running output address, the first running operand, the second running operand, and the running output address being determined according to the first operand, the second operand, and the output address respectively.
Clause M13. The method according to Clause M12, wherein determining, according to the received scalar logic calculation macro instruction, the running device to execute the scalar logic calculation macro instruction comprises:
when it is determined that the scalar logic calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the scalar logic calculation macro instruction, determining the specified device as the running device,
wherein the execution condition comprises: the specified device contains an instruction set corresponding to the scalar logic calculation macro instruction.
Clause M14. The method according to Clause M13, further comprising:
obtaining resource information of candidate devices,
wherein determining, according to the received scalar logic calculation macro instruction, the running device to execute the scalar logic calculation macro instruction comprises:
when it is determined that the scalar logic calculation macro instruction does not contain the identifier of the specified device, determining, from the candidate devices according to the received scalar logic calculation macro instruction and the resource information of the candidate devices, a running device for executing the scalar logic calculation macro instruction,
wherein the resource information comprises the instruction sets contained in the candidate devices.
Clause M15. The method according to Clause M14, wherein determining, according to the received scalar logic calculation macro instruction, the running device to execute the scalar logic calculation macro instruction comprises:
when it is determined that the scalar logic calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar logic calculation macro instruction, determining the running device according to the scalar logic calculation macro instruction and the resource information of the candidate devices.
Clause M16. The method according to Clause M12, further comprising:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause M17. The method according to Clause M13, further comprising:
receiving a scalar logic calculation instruction to be executed, and generating the scalar logic calculation macro instruction according to the determined identifier of the specified device and the scalar logic calculation instruction to be executed.
Clause M18. The method according to Clause M12, further comprising:
sending the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction comprises:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
and sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause M19. The method according to Clause M12, wherein:
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU;
the scalar logic calculation macro instruction comprises at least one of the following instructions: a scalar AND macro instruction, a scalar OR macro instruction, a scalar NOT macro instruction, and a scalar comparison macro instruction.
Clause N1. A scalar logic calculation instruction processing system, the system comprising an instruction generation device and a running device,
the instruction generation device comprising:
a device determination module, configured to determine, according to a received scalar logic calculation macro instruction, a running device to execute the scalar logic calculation macro instruction;
an instruction generation module, configured to generate a running instruction according to the scalar logic calculation macro instruction and the running device;
the running device comprising:
a control module, configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsed instructions;
an execution module, configured to execute the plurality of parsed instructions according to the data to obtain an execution result,
wherein the scalar logic calculation macro instruction is a macro instruction for performing a logical operation on scalars,
the scalar logic calculation macro instruction comprises an operation type, a first operand, a second operand, and an output address,
and the running instruction comprises the operation type, a first running operand, a second running operand, and a running output address, the first running operand, the second running operand, and the running output address being determined according to the first operand, the second operand, and the output address respectively.
Clause N2. The system according to Clause N1, wherein the instruction generation device further comprises:
a resource obtaining module, configured to obtain resource information of candidate devices,
and the device determination module comprises at least one of the following sub-modules:
a first determination sub-module, configured to determine the specified device as the running device when it is determined that the scalar logic calculation macro instruction contains an identifier of a specified device and the resources of the specified device satisfy an execution condition for executing the scalar logic calculation macro instruction;
a second determination sub-module, configured to determine, from the candidate devices according to the received scalar logic calculation macro instruction and the resource information of the candidate devices, a running device for executing the scalar logic calculation macro instruction when it is determined that the scalar logic calculation macro instruction does not contain the identifier of the specified device;
a third determination sub-module, configured to determine the running device according to the scalar logic calculation macro instruction and the resource information of the candidate devices when it is determined that the scalar logic calculation macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar logic calculation macro instruction,
wherein the execution condition comprises: the specified device contains an instruction set corresponding to the scalar logic calculation macro instruction, and the resource information comprises the instruction sets contained in the candidate devices.
Clause N3. The system according to Clause N1, wherein the instruction generation device further includes:
a queue construction module configured to sort the running instructions according to a queue sorting rule and construct an instruction queue corresponding to the running device from the sorted running instructions.
Clause N4. The system according to Clause N2, wherein the instruction generation device further includes:
a macro instruction generation module configured to receive a scalar logic calculation instruction to be executed and generate the scalar logic calculation macro instruction according to the determined identifier of the specified device and the scalar logic calculation instruction to be executed.
Clause N5. The system according to Clause N1, wherein the instruction generation device further includes:
an instruction dispatch module configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly sub-module configured to generate an assembly file according to the running instruction;
an assembly translation sub-module configured to translate the assembly file into a binary file; and
an instruction sending sub-module configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
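The dispatch chain in Clause N5 (running instruction → assembly file → binary file → running device) can be illustrated with a minimal sketch. All names and the trivial text encoding here are assumptions for illustration; the patent does not specify an assembly syntax or binary format.

```python
# Hypothetical sketch of the three dispatch sub-modules of Clause N5.

def to_assembly(running_instruction):
    # Instruction assembly sub-module: render the running instruction's
    # fields as one assembly line (format assumed for the sketch).
    return " ".join(str(field) for field in running_instruction)

def to_binary(assembly_file):
    # Assembly translation sub-module: translate the assembly text into
    # a binary file (here simply its byte encoding).
    return assembly_file.encode("utf-8")

def dispatch(running_instruction, send):
    # Instruction sending sub-module: forward the binary to the running
    # device via the provided `send` channel.
    return send(to_binary(to_assembly(running_instruction)))

received = []  # stands in for the running device's input channel
dispatch(("AND", "r1", "r2", "r3"), received.append)
print(received[0])
```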
Clause N6. The system according to Clause N1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
wherein the cache is configured to store the data, and
the register is configured to store scalar data in the data.
Clause N7. The system according to Clause N1, wherein the control module includes:
an instruction storage sub-module configured to store the running instruction;
an instruction processing sub-module configured to parse the running instruction to obtain the multiple parsed instructions; and
a storage queue sub-module configured to store a running instruction queue, the running instruction queue including the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions are arranged in the running instruction queue in the order in which they are to be executed.
Clause N8. The system according to Clause N7, wherein the execution module includes:
a dependency processing sub-module configured to cache a first parsed instruction in the instruction storage sub-module when it is determined that the first parsed instruction is associated with a zeroth parsed instruction preceding it, and, after the zeroth parsed instruction has finished executing, extract the first parsed instruction from the instruction storage sub-module and send it to the execution module,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding it includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
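The association test in Clause N8 reduces to an interval-overlap check on the two instructions' storage address ranges. A minimal sketch follows; representing a storage address interval as a half-open `(start, end)` pair is an assumption of this sketch, since the patent does not fix a representation.

```python
# Dependency test from Clause N8: the first parsed instruction must wait
# for the zeroth one when their required data's storage address intervals
# share any region.

def intervals_overlap(first, zeroth):
    """True when half-open ranges [start, end) share at least one address."""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end

# First parsed instruction reads 0x100..0x140; zeroth writes 0x120..0x160.
# The regions overlap, so the first instruction is cached until the
# zeroth finishes executing.
print(intervals_overlap((0x100, 0x140), (0x120, 0x160)))
# Adjacent but non-overlapping ranges carry no association.
print(intervals_overlap((0x100, 0x120), (0x120, 0x160)))
```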
Clause N9. The system according to Clause N1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is provided in a CPU and/or an NPU; and
the scalar logic calculation macro instruction includes at least one of the following: a scalar AND calculation macro instruction, a scalar OR calculation macro instruction, a scalar NOT calculation macro instruction, and a scalar comparison calculation macro instruction.
Clause N10. A machine learning operation device, the device including:
one or more scalar logic calculation instruction processing systems according to any one of Clauses N1 to N9, configured to obtain data to be operated on and control information from other processing devices, perform a specified machine learning operation, and transfer the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes multiple scalar logic calculation instruction processing systems, the multiple scalar logic calculation instruction processing systems may be connected to one another and transmit data through a specific structure,
wherein the multiple scalar logic calculation instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple scalar logic calculation instruction processing systems share one control system or have their own control systems; the multiple scalar logic calculation instruction processing systems share memory or have their own memories; and the multiple scalar logic calculation instruction processing systems may be interconnected in an arbitrary interconnection topology.
Clause N11. A combined processing device, the combined processing device including:
the machine learning operation device according to Clause N10, a universal interconnection interface, and another processing device,
wherein the machine learning operation device interacts with the other processing device to jointly complete a computing operation specified by a user,
and wherein the combined processing device further includes: a storage device connected to both the machine learning operation device and the other processing device and configured to save data of the machine learning operation device and the other processing device.
Clause N12. A board card, the board card including: a storage component, an interface device, a control component, and a machine learning chip, the machine learning chip including the machine learning operation device according to Clause N10 or the combined processing device according to Clause N11,
wherein the machine learning chip is connected to the storage component, the control component, and the interface device respectively;
the storage component is configured to store data;
the interface device is configured to implement data transmission between the machine learning chip and an external device; and
the control component is configured to monitor a state of the machine learning chip.
Clause N13. A scalar logic calculation instruction processing method, the method being applied to a scalar logic calculation instruction processing system, the system including an instruction generation device and a running device, and the method including:
determining, by the instruction generation device according to a received scalar logic calculation macro instruction, a running device for executing the scalar logic calculation macro instruction, and generating a running instruction according to the scalar logic calculation macro instruction and the running device; and
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain multiple parsed instructions, and executing the multiple parsed instructions according to the data to obtain an execution result,
wherein the scalar logic calculation macro instruction refers to a macro instruction used to perform a logical operation on scalars,
the scalar logic calculation macro instruction includes an operation type, a first operand, a second operand, and an output address, and
the running instruction includes the operation type, a first running operand, a second running operand, and a running output address, where the first running operand, the second running operand, and the running output address are determined according to the first operand, the second operand, and the output address respectively.
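The field layout in Clause N13 can be illustrated as a small record transformation: the operation type is carried over unchanged, while the running operands and running output address are derived from the macro instruction's fields for the chosen running device. The flat base-offset relocation below is an assumption of this sketch; the clause only says the running fields are "determined according to" the macro fields.

```python
# Illustrative expansion of a scalar logic calculation macro instruction
# (operation type, first operand, second operand, output address) into a
# running instruction for a chosen running device.
from collections import namedtuple

Macro = namedtuple("Macro", "op_type operand1 operand2 out_addr")
Running = namedtuple("Running", "op_type operand1 operand2 out_addr")

def to_running_instruction(macro, device_base):
    # Operation type is preserved; the output address is rebased into the
    # running device's address space (relocation scheme assumed).
    return Running(macro.op_type,
                   macro.operand1,
                   macro.operand2,
                   macro.out_addr + device_base)

run = to_running_instruction(Macro("scalar_and", 1, 0, 0x40),
                             device_base=0x1000)
print(run.op_type, hex(run.out_addr))
```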
Clause N14. The method according to Clause N13, further including:
obtaining, by the instruction generation device, resource information of candidate devices,
wherein determining, by the instruction generation device according to the received scalar logic calculation macro instruction, the running device for executing the scalar logic calculation macro instruction includes at least one of the following:
determining the specified device as the running device when it is determined that the scalar logic calculation macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the scalar logic calculation macro instruction;
determining, from the candidate devices, a running device for executing the scalar logic calculation macro instruction according to the received scalar logic calculation macro instruction and the resource information of the candidate devices when it is determined that the scalar logic calculation macro instruction does not include the identifier of the specified device; and
determining the running device according to the scalar logic calculation macro instruction and the resource information of the candidate devices when it is determined that the scalar logic calculation macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the scalar logic calculation macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the scalar logic calculation macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause N15. The method according to Clause N13, further including:
sorting, by the instruction generation device, the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
Clause N16. The method according to Clause N14, further including:
receiving, by the instruction generation device, a scalar logic calculation instruction to be executed, and generating the scalar logic calculation macro instruction according to the determined identifier of the specified device and the scalar logic calculation instruction to be executed.
Clause N17. The method according to Clause N13, further including:
sending, by the instruction generation device, the running instruction to the running device so that the running device executes the running instruction,
wherein sending, by the instruction generation device, the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause N18. The method according to Clause N13, further including:
storing, by the running device, the data and the scalar data in the data,
wherein the running device includes a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed scratchpad cache,
the cache being configured to store the data, and
the register being configured to store the scalar data in the data.
Clause N19. The method according to Clause N13, further including:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain the multiple parsed instructions; and
storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions are arranged in the running instruction queue in the order in which they are to be executed.
Clause N20. The method according to Clause N19, further including:
caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction is associated with a zeroth parsed instruction preceding it, and executing the cached first parsed instruction after the zeroth parsed instruction has finished executing,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding it includes:
a first storage address interval storing data required by the first parsed instruction overlapping a zeroth storage address interval storing data required by the zeroth parsed instruction.
Clause N21. The method according to Clause N13, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is provided in a CPU and/or an NPU; and
the scalar logic calculation macro instruction includes at least one of the following: a scalar AND calculation macro instruction, a scalar OR calculation macro instruction, a scalar NOT calculation macro instruction, and a scalar comparison calculation macro instruction.
Control instructions can be processed by the following control instruction generation device and method, control instruction processing system and method, and related products; the foregoing can be better understood with reference to the following clauses:
Clause O1. A control instruction generation device, the device including:
a device determination module configured to determine, according to a received control macro instruction, a running device for executing the control macro instruction; and
an instruction generation module configured to generate a running instruction according to the control macro instruction and the running device,
wherein the control macro instruction refers to a macro instruction used to make the instruction flow jump to a target jump position, and
the control macro instruction includes an operation type and a target jump position, the running instruction includes the operation type and a running target jump position, and the running target jump position is determined according to the target jump position.
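Clause O1's generation step can be sketched as follows: the running instruction keeps the macro's operation type while its running target jump position is derived from the macro's target. Representing jump positions as instruction-queue indices, and deriving the running target by a queue offset, are both assumptions of this sketch.

```python
# Hedged sketch of generating a running instruction from a control macro
# instruction per Clause O1. A control macro is a (operation_type,
# target_jump_position) pair; the running target jump position is
# determined from the macro's target (here, rebased by the position of
# the instruction sequence inside the device's instruction queue).

def generate_running_jump(control_macro, queue_offset):
    op_type, target = control_macro
    # Operation type carried over; running target derived from target.
    return (op_type, target + queue_offset)

running = generate_running_jump(("jump", 5), queue_offset=2)
print(running)
```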
Clause O2. The device according to Clause O1, wherein the device determination module includes:
a first determination sub-module configured to determine the specified device as the running device when it is determined that the control macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the control macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the control macro instruction.
Clause O3. The device according to Clause O2, further including:
a resource obtaining module configured to obtain resource information of candidate devices,
wherein the device determination module further includes:
a second determination sub-module configured to determine, from the candidate devices, a running device for executing the control macro instruction according to the operation type of the received control macro instruction and the resource information of the candidate devices when it is determined that the control macro instruction does not include the identifier of the specified device,
wherein the resource information includes the instruction set contained in each candidate device.
Clause O4. The device according to Clause O3, wherein the device determination module further includes:
a third determination sub-module configured to determine the running device according to the control macro instruction and the resource information of the candidate devices when it is determined that the control macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the control macro instruction.
Clause O5. The device according to Clause O1, further including:
a queue construction module configured to sort the running instructions according to a queue sorting rule and construct an instruction queue corresponding to the running device from the sorted running instructions.
Clause O6. The device according to Clause O2, further including:
a macro instruction generation module configured to receive a control instruction to be executed and generate the control macro instruction according to the determined identifier of the specified device and the control instruction to be executed.
Clause O7. The device according to Clause O1, further including:
an instruction dispatch module configured to send the running instruction to the running device so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly sub-module configured to generate an assembly file according to the running instruction;
an assembly translation sub-module configured to translate the assembly file into a binary file; and
an instruction sending sub-module configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause O8. The device according to Clause O1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the device is provided in a CPU and/or an NPU; and
the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction, the conditional jump macro instruction including a jump condition.
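The distinction Clause O8 draws between the two jump kinds can be shown with a toy instruction-flow loop: an unconditional jump always redirects the flow to its target position, while a conditional jump does so only when its jump condition holds. The instruction encoding and the single boolean flag standing in for the jump condition are hypothetical.

```python
# Toy interpreter illustrating unconditional vs conditional jump macro
# instructions redirecting the instruction flow (Clause O8).

def run(program, flag):
    """Execute `program` and return the trace of visited positions.

    program: list of tuples; ("jump", t) is unconditional, ("jump_if", t)
    jumps only when `flag` (the stand-in jump condition) is true.
    """
    pc, trace = 0, []
    while pc < len(program):
        instr = program[pc]
        trace.append(pc)
        if instr[0] == "jump":                # unconditional: always taken
            pc = instr[1]
        elif instr[0] == "jump_if" and flag:  # conditional: taken only if
            pc = instr[1]                     # the jump condition holds
        else:                                 # otherwise fall through
            pc += 1
    return trace

program = [("nop",), ("jump_if", 3), ("nop",), ("halt",)]
print(run(program, flag=True))   # condition holds: position 2 is skipped
print(run(program, flag=False))  # condition fails: flow falls through
```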
Clause O9. A machine learning operation device, the device including:
one or more control instruction generation devices according to any one of Clauses O1 to O8, configured to obtain data to be operated on and control information from other processing devices, perform a specified machine learning operation, and transfer the execution result to the other processing devices through an I/O interface;
when the machine learning operation device includes multiple control instruction generation devices, the multiple control instruction generation devices may be connected to one another and transmit data through a specific structure,
wherein the multiple control instruction generation devices are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple control instruction generation devices share one control system or have their own control systems; the multiple control instruction generation devices share memory or have their own memories; and the multiple control instruction generation devices may be interconnected in an arbitrary interconnection topology.
Clause O10. A combined processing device, the device including:
the machine learning operation device according to Clause O9, a universal interconnection interface, and another processing device,
wherein the machine learning operation device interacts with the other processing device to jointly complete a computing operation specified by a user,
and wherein the combined processing device further includes: a storage device connected to both the machine learning operation device and the other processing device and configured to save data of the machine learning operation device and the other processing device.
Clause O11. A board card, the board card including: a storage component, an interface device, a control component, and a machine learning chip, the machine learning chip including the machine learning operation device according to Clause O9 or the combined processing device according to Clause O10,
wherein the machine learning chip is connected to the storage component, the control component, and the interface device respectively;
the storage component is configured to store data;
the interface device is configured to implement data transmission between the machine learning chip and an external device; and
the control component is configured to monitor a state of the machine learning chip.
Clause O12. A control instruction generation method, the method including:
determining, according to a received control macro instruction, a running device for executing the control macro instruction; and
generating a running instruction according to the control macro instruction and the running device,
wherein the control macro instruction refers to a macro instruction used to make the instruction flow jump to a target jump position, and
the control macro instruction includes an operation type and a target jump position, the running instruction includes the operation type and a running target jump position, and the running target jump position is determined according to the target jump position.
Clause O13. The method according to Clause O12, wherein determining, according to the received control macro instruction, the running device for executing the control macro instruction includes:
determining the specified device as the running device when it is determined that the control macro instruction includes an identifier of a specified device and resources of the specified device satisfy an execution condition for executing the control macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the control macro instruction.
Clause O14. The method according to Clause O13, further including:
obtaining resource information of candidate devices,
wherein determining, according to the received control macro instruction, the running device for executing the control macro instruction includes:
determining, from the candidate devices, a running device for executing the control macro instruction according to the received control macro instruction and the resource information of the candidate devices when it is determined that the control macro instruction does not include the identifier of the specified device,
wherein the resource information includes the instruction set contained in each candidate device.
Clause O15. The method according to Clause O14, wherein determining, according to the received control macro instruction, the running device for executing the control macro instruction includes:
determining the running device according to the control macro instruction and the resource information of the candidate devices when it is determined that the control macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the control macro instruction.
Clause O16. The method according to Clause O12, further including:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
Clause O17. The method according to Clause O13, further including:
receiving a control instruction to be executed, and generating the control macro instruction according to the determined identifier of the specified device and the control instruction to be executed.
Clause O18. The method according to Clause O12, further including:
sending the running instruction to the running device so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
Clause O19. The method according to Clause O12, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU; and
the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction, the conditional jump macro instruction including a jump condition.
Clause P1. A control instruction processing system, the system including an instruction generation device and a running device,
the instruction generation device including:
a device determination module configured to determine, according to a received control macro instruction, a running device for executing the control macro instruction; and
an instruction generation module configured to generate a running instruction according to the control macro instruction and the running device;
the running device including:
a control module configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain multiple parsed instructions; and
an execution module configured to execute the multiple parsed instructions according to the data to obtain an execution result,
wherein the control macro instruction refers to a macro instruction used to make the instruction flow jump to a target jump position, and
the control macro instruction includes an operation type and a target jump position, the running instruction includes the operation type and a running target jump position, and the running target jump position is determined according to the target jump position.
条款P2、根据条款P1所述的系统,所述指令生成设备还包括:Clause P2. The system according to Clause P1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块，用于在确定所述控制宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述控制宏指令的执行条件时，将所述指定设备确定为所述运行设备；The first determining sub-module is configured to determine the designated device as the running device when it is determined that the control macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the control macro instruction;
第二确定子模块，用于在确定所述控制宏指令中不包含所述指定设备的标识时，根据接收到的控制宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述控制宏指令的运行设备；The second determining sub-module is configured to, when it is determined that the control macro instruction does not include the identifier of the designated device, determine, from the candidate devices, a running device for executing the control macro instruction according to the received control macro instruction and the resource information of the candidate devices;
第三确定子模块，在确定所述控制宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述控制宏指令的执行条件时，根据所述控制宏指令和所述备选设备的资源信息，确定运行设备，The third determining sub-module is configured to, when it is determined that the control macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the control macro instruction, determine the running device according to the control macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述控制宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the control macro instruction, and the resource information includes an instruction set included in the candidate device.
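The device-determination logic of Clause P2 (designated device first, then fallback to candidates whose instruction set meets the execution condition) can be sketched as follows. This is a minimal illustration; all field names (`required_isa`, `designated_id`, `isa`) are assumptions, not terms from the clauses.

```python
def determine_running_device(macro, candidates):
    """Select the device that executes a control macro instruction (Clause P2 sketch).

    A device satisfies the execution condition when its instruction set
    contains the instruction set corresponding to the macro instruction.
    """
    def satisfies(device):
        # resource information: the instruction set the device supports
        return macro["required_isa"] <= device["isa"]

    designated = macro.get("designated_id")
    if designated is not None and designated in candidates:
        # first determining sub-module: the designated device meets the condition
        if satisfies(candidates[designated]):
            return designated
    # second / third determining sub-modules: pick a running device from
    # the candidate devices according to their resource information
    for dev_id, device in candidates.items():
        if satisfies(device):
            return dev_id
    raise RuntimeError("no candidate device can execute the control macro instruction")
```

For example, if the designated device lacks the required instruction set, the fallback path selects another candidate that has it.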
条款P3、根据条款P1所述的系统,所述指令生成设备还包括:Clause P3. The system according to Clause P1, the instruction generating device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款P4、根据条款P2所述的系统,所述指令生成设备还包括:Clause P4. The system according to Clause P2, the instruction generating device further includes:
宏指令生成模块，用于接收待执行控制指令，根据确定的指定设备的标识和所述待执行控制指令生成所述控制宏指令。The macro instruction generation module is configured to receive a to-be-executed control instruction and generate the control macro instruction according to the determined identifier of the designated device and the to-be-executed control instruction.
条款P5、根据条款P1所述的系统,所述指令生成设备还包括:Clause P5. The system according to Clause P1, the instruction generating device further includes:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
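Clause P5's dispatch module is a three-stage pipeline: assemble the running instructions into an assembly file, translate the assembly into a binary file, and send the binary to the running device. The sketch below illustrates that pipeline shape only; the textual assembly format and the UTF-8 "translation" are stand-in assumptions, not the real toolchain.

```python
def dispatch(run_instructions, send_to_device):
    """Sketch of the instruction dispatch module of Clause P5."""
    # instruction assembly sub-module: one assembly line per running instruction
    assembly = "\n".join(" ".join([op, *map(str, args)]) for op, *args in run_instructions)
    # assembly translation sub-module: translate the assembly file into a binary file
    binary = assembly.encode("utf-8")
    # instruction sending sub-module: deliver the binary file to the running device
    send_to_device(binary)
    return binary
```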
条款P6、根据条款P1所述的系统,所述运行设备,还包括:Clause P6. The system according to Clause P1, the operation device, further comprising:
存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,A storage module, the storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款P7、根据条款P1所述的系统,所述控制模块,包括:Clause P7. The system according to Clause P1, the control module includes:
指令存储子模块,用于存储所述运行指令;An instruction storage sub-module for storing the operation instruction;
指令处理子模块,用于对所述运行指令进行解析,得到所述多个解析指令;An instruction processing sub-module, which is used to parse the running instruction to obtain the multiple parsing instructions;
存储队列子模块，用于存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中的所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。A storage queue sub-module is configured to store a running instruction queue, the running instruction queue including the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款P8、根据条款P7所述的系统,所述执行模块,包括:Clause P8. The system according to Clause P7, the execution module includes:
依赖关系处理子模块，用于在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，将所述第一解析指令缓存在所述指令存储子模块中，在所述第零解析指令执行完毕后，从所述指令存储子模块中提取所述第一解析指令发送至所述执行模块，The dependency processing sub-module is configured to, when it is determined that a first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, cache the first parsed instruction in the instruction storage sub-module, and after the zeroth parsed instruction has been executed, fetch the first parsed instruction from the instruction storage sub-module and send it to the execution module,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
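The association test of Clause P8 reduces to an interval-overlap check on storage addresses. A minimal sketch, assuming intervals are `(start, end)` pairs with an exclusive end (the clause does not specify a representation):

```python
def has_dependency(first_interval, zeroth_interval):
    """Clause P8 sketch: the first parsed instruction must wait for the zeroth
    one when the storage address interval holding the data it needs overlaps
    the interval holding the zeroth instruction's data."""
    first_start, first_end = first_interval
    zeroth_start, zeroth_end = zeroth_interval
    # the two address intervals have a non-empty overlapping region
    return first_start < zeroth_end and zeroth_start < first_end
```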
条款P9、根据条款P1所述的系统,Clause P9, the system described in Clause P1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,所述有条件跳转宏指令包含跳转条件。The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction, and the conditional jump macro instruction includes a jump condition.
条款P10、一种机器学习运算装置,所述装置包括:Clause P10. A machine learning computing device, the device comprising:
一个或多个如条款P1-条款P9任一项所述的控制指令处理系统，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more control instruction processing systems according to any one of Clause P1 to Clause P9, configured to obtain to-be-computed data and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
当所述机器学习运算装置包含多个所述控制指令处理系统时,所述多个所述控制指令处理系统间可以通过特定的结构进行连接并传输数据;When the machine learning computing device includes a plurality of the control instruction processing systems, the plurality of the control instruction processing systems can be connected and transmit data through a specific structure;
其中，多个所述控制指令处理系统通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述控制指令处理系统共享同一控制系统或拥有各自的控制系统；多个所述控制指令处理系统共享内存或者拥有各自的内存；多个所述控制指令处理系统的互联方式是任意互联拓扑。The multiple control instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple control instruction processing systems share one control system or have their own control systems; the multiple control instruction processing systems share memory or have their own memories; and the interconnection mode of the multiple control instruction processing systems is any interconnection topology.
条款P11、一种组合处理装置,所述组合处理装置包括:Clause P11. A combined processing device, the combined processing device comprising:
如条款P10所述的机器学习运算装置、通用互联接口和其他处理装置；The machine learning computation apparatus according to Clause P10, a universal interconnection interface, and other processing apparatuses;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款P12、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款P10所述的机器学习运算装置或如条款P11所述的组合处理装置；Clause P12. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computation apparatus according to Clause P10 or the combined processing apparatus according to Clause P11;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款P13、一种控制指令处理方法,所述方法应用于控制指令处理系统,所述系统包括指令生成设备和运行设备,所述方法包括:Clause P13. A control instruction processing method. The method is applied to a control instruction processing system. The system includes an instruction generation device and an operation device. The method includes:
通过所述指令生成设备根据接收到的控制宏指令,确定执行所述控制宏指令的运行设备,并根据所述控制宏指令和所述运行设备,生成运行指令;Determining, by the instruction generating device, a running device that executes the control macro instruction according to the received control macro instruction, and generating a running instruction according to the control macro instruction and the running device;
通过所述运行设备获取数据、神经网络模型以及运行指令,对所述运行指令进行解析,获得多个解析指令,并根据所述数据执行所述多个解析指令,得到执行结果,Acquiring data, a neural network model, and operating instructions through the operating device, parsing the operating instructions, obtaining multiple analytical instructions, and executing the multiple analytical instructions based on the data, to obtain an execution result,
其中,所述控制宏指令是指用于控制指令流跳转至目标跳转位置的宏指令,Wherein, the control macro instruction refers to a macro instruction used to control the instruction flow to jump to the target jump position,
所述控制宏指令包含操作类型和目标跳转位置,所述运行指令包含所述操作类型和运行目标跳转位置,所述运行目标跳转位置是根据所述目标跳转位置确定的。The control macro instruction includes an operation type and a target jump position, and the run instruction includes the operation type and a run target jump position, and the run target jump position is determined according to the target jump position.
条款P14、根据条款P13所述的方法,所述方法还包括:Clause P14. The method according to Clause P13, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的控制宏指令,确定执行所述控制宏指令的运行设备,包括以下至少一项:Wherein, the instruction generating device determines the operating device that executes the control macro instruction according to the received control macro instruction, including at least one of the following:
在确定所述控制宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述控制宏指令的执行条件时,将所述指定设备确定为所述运行设备;When it is determined that the control macro instruction includes the identifier of the designated device, and the resource of the designated device satisfies the execution conditions for executing the control macro instruction, the designated device is determined as the running device;
在确定所述控制宏指令中不包含所述指定设备的标识时，根据接收到的控制宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述控制宏指令的运行设备；When it is determined that the control macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the control macro instruction according to the received control macro instruction and the resource information of the candidate devices;
在确定所述控制宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述控制宏指令的执行条件时，根据所述控制宏指令和所述备选设备的资源信息，确定运行设备，When it is determined that the control macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the control macro instruction, determining the running device according to the control macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述控制宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the control macro instruction, and the resource information includes an instruction set included in the candidate device.
条款P15、根据条款P13所述的方法,所述方法还包括:Clause P15. The method according to Clause P13, the method further comprising:
通过所述指令生成设备根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
条款P16、根据条款P14所述的方法,所述方法还包括:Clause P16. The method according to Clause P14, the method further comprising:
通过所述指令生成设备接收待执行控制指令,根据确定的指定设备的标识和所述待执行控制指令生成所述控制宏指令。Receiving the control instruction to be executed through the instruction generating device, and generating the control macro instruction according to the determined identifier of the designated device and the control instruction to be executed.
条款P17、根据条款P13所述的方法,所述方法还包括:Clause P17. The method according to Clause P13, the method further comprising:
通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction,
其中,通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction, includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款P18、根据条款P13所述的方法,所述方法还包括:Clause P18. The method according to Clause P13, the method further comprising:
通过所述运行设备存储所述数据以及所述数据中的标量数据,Storing the data and the scalar data in the data through the operating device,
其中,所述运行设备包括存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,Wherein, the operation device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款P19、根据条款P13所述的方法,所述方法还包括:Clause P19. The method according to Clause P13, the method further comprising:
通过所述运行设备存储所述运行指令;Storing the operation instruction through the operation device;
通过所述运行设备对所述运行指令进行解析,得到所述多个解析指令;Analyzing the operation instruction by the operation device to obtain the multiple analysis instructions;
通过所述运行设备存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中的所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。Storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款P20、根据条款P19所述的方法,所述方法还包括:Clause P20. The method according to Clause P19, the method further comprising:
通过所述运行设备在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，缓存所述第一解析指令，在所述第零解析指令执行完毕后，执行缓存的所述第一解析指令，Caching, by the running device, the first parsed instruction when it is determined that a first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, and executing the cached first parsed instruction after the zeroth parsed instruction has been executed,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
条款P21、根据条款P13所述的方法,Clause P21, according to the method described in Clause P13,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述控制宏指令包括无条件跳转宏指令和有条件跳转宏指令中的至少一种,所述有条件跳转宏指令包含跳转条件。The control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction, and the conditional jump macro instruction includes a jump condition.
对于数据读入指令，可以通过下述数据读入指令生成装置、方法，以及数据读入指令处理系统、方法等相关产品进行处理，依据以下条款可更好地理解前述内容：A data read-in instruction may be processed by the following data read-in instruction generation apparatus and method, as well as the data read-in instruction processing system, method, and other related products; the foregoing content can be better understood according to the following clauses:
条款Q1、一种数据读入指令生成装置,所述装置包括:Clause Q1, a data reading instruction generation device, the device comprising:
设备确定模块,用于根据接收到的读宏指令,确定执行所述读宏指令的运行设备;A device determination module, configured to determine a running device that executes the read macro instruction based on the received read macro instruction;
指令生成模块,用于根据所述读宏指令和所述运行设备,生成运行指令,An instruction generation module, configured to generate an operation instruction based on the read macro instruction and the operation device,
其中,所述读宏指令是指用于进行数据读入的宏指令,Wherein, the read macro instruction refers to a macro instruction used for data reading,
所述读宏指令包含操作类型、数据读入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据读入地址和运行数据加密方式地址，所述运行数据读入地址和运行数据加密方式地址分别是根据所述数据读入地址和所述数据加密方式地址确定的。The read macro instruction includes an operation type, a data read-in address, and a data encryption mode address; the running instruction includes the operation type, a run data read-in address, and a run data encryption mode address, where the run data read-in address and the run data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
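Clause Q1 lowers a read macro instruction into a running instruction that carries the operation type plus run-time addresses "determined according to" the macro's addresses. The sketch below illustrates one way that determination could look; rebasing by a device-local offset and all field names are purely illustrative assumptions.

```python
def generate_run_instruction(read_macro, device_base=0):
    """Clause Q1 sketch: produce a running instruction from a read macro instruction."""
    return {
        "op": read_macro["op"],  # operation type carried over unchanged
        # run data read-in address, derived from the macro's data read-in address
        "run_read_addr": read_macro["read_addr"] + device_base,
        # run data encryption mode address, derived from the macro's encryption mode address
        "run_encrypt_addr": read_macro["encrypt_addr"] + device_base,
    }
```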
条款Q2、根据条款Q1所述的装置,所述设备确定模块,包括:Clause Q2. The apparatus according to Clause Q1, the device determination module includes:
第一确定子模块，用于在确定所述读宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述读宏指令的执行条件时，将所述指定设备确定为所述运行设备，The first determining sub-module is configured to determine the designated device as the running device when it is determined that the read macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the read macro instruction,
其中,所述执行条件包括:所述指定设备中包含与所述读宏指令的操作类型相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the read macro instruction.
条款Q3、根据条款Q2所述的装置,所述装置还包括:Clause Q3. The device according to Clause Q2, the device further comprising:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,还包括:The device determination module also includes:
第二确定子模块，用于在确定所述读宏指令中不包含所述指定设备的标识时，根据接收到的读宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述读宏指令的运行设备，The second determining sub-module is configured to, when it is determined that the read macro instruction does not include the identifier of the designated device, determine, from the candidate devices, a running device for executing the read macro instruction according to the received read macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款Q4、根据条款Q3所述的装置,所述设备确定模块,还包括:Clause Q4. The apparatus according to Clause Q3, the device determination module, further comprising:
第三确定子模块，在确定所述读宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述读宏指令的执行条件时，根据所述读宏指令和所述备选设备的资源信息，确定运行设备。The third determining sub-module is configured to, when it is determined that the read macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the read macro instruction, determine the running device according to the read macro instruction and the resource information of the candidate devices.
条款Q5、根据条款Q2-条款Q4任一项所述的装置,所述读宏指令还包含读入量,Clause Q5. The device according to any one of Clause Q2-Clause Q4, the read macro instruction further includes a read-in amount,
所述指令生成模块,还用于确定所述读宏指令的数据量,根据所述数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is also used to determine the data amount of the read macro instruction, and generate an operation instruction according to the data amount, the read macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述读入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to the read-in amount, and the resource information of the operating device further includes at least one of storage capacity and remaining storage capacity.
条款Q6、根据条款Q5所述的装置,所述指令生成模块,包括:Clause Q6. The apparatus according to Clause Q5, the instruction generation module includes:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述读宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述读宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，The first instruction generation sub-module is configured to, when it is determined that there is one running device and the run data amount of the running device is smaller than the data amount of the read macro instruction, split the read macro instruction into multiple running instructions according to the run data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
其中，所述运行设备的运行数据量是根据所述运行设备的资源信息确定的，每条运行指令还包含运行读入量，所述运行读入量是根据所述运行数据量确定的。The run data amount of the running device is determined according to the resource information of the running device, and each running instruction further includes a run read-in amount, the run read-in amount being determined according to the run data amount.
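The single-device split of Clause Q6 can be sketched as chunking the macro's total data amount into pieces no larger than the device's run data amount, each piece becoming one running instruction with its own run read-in amount. The `offset` field is an illustrative assumption; the clause does not define the instruction layout.

```python
def split_read_macro(total_data_amount, run_data_amount):
    """Clause Q6 sketch: split one read macro instruction into sequential
    running instructions for a single running device."""
    run_instructions = []
    offset = 0
    while offset < total_data_amount:
        # each running instruction reads at most the device's run data amount
        amount = min(run_data_amount, total_data_amount - offset)
        run_instructions.append({"offset": offset, "run_read_amount": amount})
        offset += amount
    return run_instructions
```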
条款Q7、根据条款Q5所述的装置,所述指令生成模块,包括:Clause Q7. The device according to Clause Q5, the instruction generation module includes:
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述读宏指令进行拆分，生成对应于每个运行设备的运行指令，The second instruction generation sub-module is configured to, when it is determined that there are multiple running devices, split the read macro instruction according to the run data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行读入量，所述运行读入量是根据执行所述运行指令的运行设备的运行数据量确定的。The run data amount of each running device is determined according to the resource information of each running device, and the running instruction further includes a run read-in amount, the run read-in amount being determined according to the run data amount of the running device executing the running instruction.
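For the multi-device case of Clause Q7, one possible split assigns each running device a run read-in amount bounded by that device's own run data amount until the macro's total data amount is covered. The greedy, in-order allocation policy below is an illustrative assumption; the clause mandates only that the split respects each device's run data amount.

```python
def split_across_devices(total_data_amount, run_data_amounts):
    """Clause Q7 sketch: split a read macro instruction across several
    running devices, one running instruction per device."""
    plan, remaining = {}, total_data_amount
    for device_id, capacity in run_data_amounts.items():
        if remaining == 0:
            break
        # run read-in amount is capped by this device's run data amount
        amount = min(capacity, remaining)
        plan[device_id] = amount
        remaining -= amount
    if remaining:
        raise ValueError("combined run data amounts cannot cover the read macro instruction")
    return plan
```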
条款Q8、根据条款Q1所述的装置,所述装置还包括:Clause Q8. The device according to Clause Q1, the device further comprising:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款Q9、根据条款Q2所述的装置,所述装置还包括:Clause Q9. The device according to Clause Q2, the device further comprising:
宏指令生成模块,用于接收待执行读指令,根据确定的指定设备的标识和所述待执行读指令生成所述读宏指令。The macro generation module is used to receive the read instruction to be executed, and generate the read macro instruction according to the determined identifier of the designated device and the read instruction to be executed.
条款Q10、根据条款Q1所述的装置,所述装置还包括:Clause Q10. The device according to Clause Q1, the device further comprising:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款Q11、根据条款Q1所述的装置,Clause Q11, the device according to Clause Q1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述装置设置于CPU和/或NPU中;The device is set in the CPU and / or NPU;
所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。The read macro instruction includes at least one of read neuron macro instruction, read synapse macro instruction and read scalar macro instruction.
条款Q12、一种机器学习运算装置,所述装置包括:Clause Q12. A machine learning computing device, the device comprising:
一个或多个如条款Q1-条款Q11任一项所述的数据读入指令生成装置，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more data read-in instruction generation apparatuses according to any one of Clause Q1 to Clause Q11, configured to obtain to-be-computed data and control information from other processing apparatuses, perform a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
当所述机器学习运算装置包含多个所述数据读入指令生成装置时,所述多个所述数据读入指令生成装置间可以通过特定的结构进行连接并传输数据;When the machine learning operation device includes a plurality of the data reading instruction generating devices, the plurality of data reading instruction generating devices may be connected and transmit data through a specific structure;
其中，多个所述数据读入指令生成装置通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述数据读入指令生成装置共享同一控制系统或拥有各自的控制系统；多个所述数据读入指令生成装置共享内存或者拥有各自的内存；多个所述数据读入指令生成装置的互联方式是任意互联拓扑。The multiple data read-in instruction generation apparatuses are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the multiple data read-in instruction generation apparatuses share one control system or have their own control systems; the multiple data read-in instruction generation apparatuses share memory or have their own memories; and the interconnection mode of the multiple data read-in instruction generation apparatuses is any interconnection topology.
条款Q13、一种组合处理装置,所述装置包括:Clause Q13. A combined processing device, the device comprising:
如条款Q12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing devices, general interconnection interfaces and other processing devices as described in clause Q12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款Q14、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款Q12所述的机器学习运算装置或如条款Q13所述的组合处理装置；Clause Q14. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computation apparatus according to Clause Q12 or the combined processing apparatus according to Clause Q13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款Q15、一种数据读入指令生成方法,所述方法包括:Article Q15. A method for generating a data reading instruction. The method includes:
根据接收到的读宏指令,确定执行所述读宏指令的运行设备;According to the received macro read command, determine the operating device that executes the macro read command;
根据所述读宏指令和所述运行设备,生成运行指令,Generate an operation instruction according to the read macro instruction and the operation device,
其中,所述读宏指令是指用于进行数据读入的宏指令,Wherein, the read macro instruction refers to a macro instruction used for data reading,
所述读宏指令包含操作类型、数据读入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据读入地址和运行数据加密方式地址，所述运行数据读入地址和运行数据加密方式地址分别是根据所述数据读入地址和所述数据加密方式地址确定的。The read macro instruction includes an operation type, a data read-in address, and a data encryption mode address; the running instruction includes the operation type, a run data read-in address, and a run data encryption mode address, where the run data read-in address and the run data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
条款Q16、根据条款Q15所述的方法,根据接收到的读宏指令,确定执行所述读宏指令的运行设备,包括:Clause Q16. According to the method described in Clause Q15, based on the received read macro instruction, determining the operating device that executes the read macro instruction includes:
在确定所述读宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述读宏指令的执行条件时,将所述指定设备确定为所述运行设备,When it is determined that the read macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution conditions for executing the read macro instruction, determine the specified device as the running device,
其中,所述执行条件包括:所述指定设备中包含与所述读宏指令的操作类型相对应的指令集。Wherein, the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the read macro instruction.
条款Q17、根据条款Q16所述的方法,所述方法还包括:Clause Q17. The method according to Clause Q16, the method further comprising:
获取备选设备的资源信息,Obtain resource information of alternative devices,
其中,根据接收到的读宏指令,确定执行所述读宏指令的运行设备,包括:Wherein, according to the received macro reading instruction, determining the running device for executing the macro reading instruction includes:
在确定所述读宏指令中不包含所述指定设备的标识时，根据接收到的读宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述读宏指令的运行设备，When it is determined that the read macro instruction does not include the identifier of the designated device, determining, from the candidate devices, a running device for executing the read macro instruction according to the received read macro instruction and the resource information of the candidate devices,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
条款Q18、根据条款Q17所述的方法,根据接收到的读宏指令,确定执行所述读宏指令的运行设备,包括:Clause Q18. According to the method described in Clause Q17, based on the received read macro instruction, determine the operating device that executes the read macro instruction, including:
在确定所述读宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述读宏指令的执行条件时，根据所述读宏指令和所述备选设备的资源信息，确定运行设备。When it is determined that the read macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the read macro instruction, the running device is determined according to the read macro instruction and the resource information of the candidate devices.
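The running-device determination of Clauses Q16 to Q18 can be summarized with the following illustrative sketch. It is an aid to understanding only, not part of the claims; the field names (`device_id`, `op_type`, `instruction_sets`) and device names are hypothetical.

```python
# Illustrative sketch of Clauses Q16-Q18: choosing the running device for a
# read macro instruction. All field and device names are hypothetical.

def determine_running_device(macro, candidates):
    """macro: dict with an 'op_type' and an optional 'device_id';
    candidates: dict mapping device id -> resource info (instruction sets)."""
    specified = macro.get("device_id")
    if specified is not None:
        resources = candidates.get(specified, {})
        # Execution condition: the specified device contains an instruction
        # set corresponding to the macro's operation type (Clause Q16).
        if macro["op_type"] in resources.get("instruction_sets", ()):
            return specified
        # Specified device unsuitable: fall through to candidate
        # selection, as in Clause Q18.
    # No usable specified device: select from candidates by their
    # resource information (instruction sets), as in Clause Q17.
    for dev, resources in candidates.items():
        if macro["op_type"] in resources.get("instruction_sets", ()):
            return dev
    return None
```

For example, a macro that names a device lacking the required instruction set would fall back to the first candidate that has it.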
条款Q19、根据条款Q16-条款Q18任一项所述的方法，所述读宏指令还包含读入量，Clause Q19. The method according to any one of Clauses Q16 to Q18, wherein the read macro instruction further includes a read-in amount,
根据所述读宏指令和所述运行设备,生成运行指令,包括:Generating an operation instruction according to the read macro instruction and the operation device, including:
确定所述读宏指令的数据量,根据所述数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the read macro instruction, and generate an operation instruction according to the data amount, the read macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述读入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to the read-in amount, and the resource information of the operating device further includes at least one of storage capacity and remaining storage capacity.
条款Q20、根据条款Q19所述的方法,根据所述读宏指令的数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause Q20. According to the method of Clause Q19, generating an operation instruction according to the data amount of the read macro instruction, the read macro instruction, and the resource information of the operation device includes:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述读宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述读宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，When it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the read macro instruction, the read macro instruction is split into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
其中,所述运行设备的运行数据量是根据所述运行设备的资源信息确定的,每条运行指令还包含运行读入量,所述运行读入量是根据所述运行数据量确定的。Wherein, the operation data amount of the operation device is determined according to the resource information of the operation device, and each operation instruction further includes an operation reading amount, and the operation reading amount is determined according to the operation data amount.
条款Q21、根据条款Q19所述的方法,根据所述读宏指令的数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,包括:Clause Q21. According to the method described in Clause Q19, generating an operation instruction according to the data amount of the read macro instruction, the read macro instruction, and the resource information of the operation device includes:
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述读宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operation devices, the read macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
其中，每个运行设备的运行数据量是根据每个运行设备的资源信息确定的，所述运行指令还包含运行读入量，所述运行读入量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of each running device is determined according to the resource information of that running device; the running instruction further includes a running read-in amount, which is determined according to the running data amount of the running device that executes the running instruction.
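The splitting described in Clauses Q20 and Q21 can be sketched as follows. This is an interpretive example only; the representation of a run instruction as a `(device_index, read_in_amount)` pair is an assumption, not language from the claims.

```python
# Illustrative sketch of Clauses Q20-Q21: splitting one read macro
# instruction into several running instructions when a device's per-run
# data amount is smaller than the macro's total data amount.

def split_read_macro(total_amount, per_device_amounts):
    """Return a list of (device_index, read_in_amount) run instructions.

    total_amount: data amount of the read macro instruction.
    per_device_amounts: the running data amount each running device can
    handle per instruction, determined from its resource information.
    """
    runs, remaining, i = [], total_amount, 0
    n = len(per_device_amounts)
    while remaining > 0:
        dev = i % n  # cycle devices; with one device this repeats device 0
        # Each run instruction carries a running read-in amount bounded by
        # the device's running data amount (Clause Q20 / Q21).
        chunk = min(per_device_amounts[dev], remaining)
        runs.append((dev, chunk))
        remaining -= chunk
        i += 1
    return runs
```

With a single device of capacity 4 and a macro of data amount 10, the macro is split into three sequential run instructions; with two devices, the work is distributed across them.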
条款Q22、根据条款Q15所述的方法,所述方法还包括:Clause Q22. The method according to Clause Q15, the method further comprising:
根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The running instructions are sorted according to the queue sorting rules, and an instruction queue corresponding to the running device is constructed according to the sorted running instructions.
条款Q23、根据条款Q16所述的方法,所述方法还包括:Clause Q23. The method according to Clause Q16, the method further comprising:
接收待执行读指令,根据确定的指定设备的标识和所述待执行读指令生成所述读宏指令。Receiving the read instruction to be executed, and generating the read macro instruction according to the determined identifier of the designated device and the read instruction to be executed.
条款Q24、根据条款Q15所述的方法,所述方法还包括:Clause Q24. The method according to Clause Q15, the method further comprising:
将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device to make the operation device execute the operation instruction includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
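The three-step dispatch of Clause Q24 (run instruction → assembly file → binary file → running device) can be sketched as below. The mnemonic format, the byte encoding, and the inbox abstraction are all hypothetical stand-ins; the claims do not specify any particular assembly syntax or transport.

```python
# Illustrative sketch of Clause Q24: render the running instruction as
# assembly, translate it to binary, and send it to the running device.
# Mnemonics, encoding, and the "inbox" transport are hypothetical.

def to_assembly(run_instruction):
    # Render one assembly line, e.g. "READ src=0x1000 enc=0x2000 amount=64".
    return "READ " + " ".join(f"{k}={v}" for k, v in run_instruction.items())

def assemble(asm_text):
    # Stand-in "translation" of the assembly file into a binary file:
    # here simply the UTF-8 bytes of the text.
    return asm_text.encode("utf-8")

def dispatch(binary, device_inbox):
    # Stand-in for sending the binary file to the running device.
    device_inbox.append(binary)

device_inbox = []
asm = to_assembly({"src": "0x1000", "enc": "0x2000", "amount": 64})
dispatch(assemble(asm), device_inbox)
```

The running device would then decode and execute the received binary according to the run instruction it encodes.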
条款Q25、根据条款Q15所述的方法,Clause Q25, the method according to Clause Q15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述方法应用于CPU和/或NPU中;The method is applied to CPU and / or NPU;
所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。The read macro instruction includes at least one of read neuron macro instruction, read synapse macro instruction and read scalar macro instruction.
条款R1、一种数据读入处理系统,所述系统包括指令生成设备和运行设备,Clause R1, a data reading processing system, the system includes an instruction generating device and an operating device,
所述指令生成设备,包括:The instruction generating device includes:
设备确定模块,用于根据接收到的读宏指令,确定执行所述读宏指令的运行设备;A device determination module, configured to determine a running device that executes the read macro instruction based on the received read macro instruction;
指令生成模块,用于根据所述读宏指令和所述运行设备,生成运行指令;An instruction generation module, configured to generate an operation instruction according to the read macro instruction and the operation device;
所述运行设备,包括:The operation equipment includes:
控制模块,用于获取所需数据、神经网络模型以及所述运行指令,对所述运行指令进行解析,获得多个解析指令;The control module is used to obtain the required data, the neural network model and the operation instruction, analyze the operation instruction, and obtain multiple analysis instructions;
执行模块,用于根据所述数据执行所述多个解析指令,得到执行结果,An execution module, configured to execute the multiple parsing instructions based on the data to obtain an execution result,
其中,所述读宏指令是指用于进行数据读入的宏指令,Wherein, the read macro instruction refers to a macro instruction used for data reading,
所述读宏指令包含操作类型、数据读入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据读入地址和运行数据加密方式地址，所述运行数据读入地址和运行数据加密方式地址分别是根据所述数据读入地址和所述数据加密方式地址确定的。The read macro instruction includes an operation type, a data read-in address, and a data encryption mode address. The running instruction includes the operation type, a running data read-in address, and a running data encryption mode address, where the running data read-in address and the running data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
条款R2、根据条款R1所述的系统,所述指令生成设备还包括:Clause R2. The system according to Clause R1, the instruction generating device further includes:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,包括以下至少一个子模块:The device determination module includes at least one of the following sub-modules:
第一确定子模块，用于在确定所述读宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述读宏指令的执行条件时，将所述指定设备确定为所述运行设备；A first determining submodule, configured to determine the designated device as the running device when it is determined that the read macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the read macro instruction;
第二确定子模块，用于在确定所述读宏指令中不包含所述指定设备的标识时，根据接收到的读宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述读宏指令的运行设备；A second determining submodule, configured to determine, from the candidate devices, the running device for executing the read macro instruction according to the received read macro instruction and the resource information of the candidate devices when it is determined that the read macro instruction does not include the identifier of the designated device;
第三确定子模块，在确定所述读宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述读宏指令的执行条件时，根据所述读宏指令和所述备选设备的资源信息，确定运行设备，A third determining submodule, configured to determine the running device according to the read macro instruction and the resource information of the candidate devices when it is determined that the read macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the read macro instruction,
其中,所述执行条件包括:所述指定设备中包含与所述读宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the read macro instruction, and the resource information includes an instruction set included in the candidate device.
条款R3、根据条款R2所述的系统,所述读宏指令还包含读入量,Clause R3, the system according to Clause R2, the read macro instruction also includes the read-in amount,
所述指令生成模块,还用于确定所述读宏指令的数据量,根据所述数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,The instruction generation module is also used to determine the data amount of the read macro instruction, and generate an operation instruction according to the data amount, the read macro instruction and the resource information of the operation device,
其中,所述数据量是根据所述读入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to the read-in amount, and the resource information of the operating device further includes at least one of storage capacity and remaining storage capacity.
条款R4、根据条款R3所述的系统,所述指令生成模块,包括以下至少一个子模块:Clause R4. The system according to Clause R3, the instruction generation module includes at least one of the following sub-modules:
第一指令生成子模块，用于在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述读宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述读宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令，A first instruction generating submodule, configured to, when it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the read macro instruction, split the read macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence,
第二指令生成子模块，用于在确定所述运行设备为多个时，根据每个运行设备的运行数据量和所述数据量对所述读宏指令进行拆分，生成对应于每个运行设备的运行指令，A second instruction generating submodule, configured to, when it is determined that there are multiple running devices, split the read macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行读入量，所述运行读入量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of the running device; the running instruction further includes a running read-in amount, which is determined according to the running data amount of the running device that executes the running instruction.
条款R5、根据条款R1所述的系统,所述指令生成设备还包括:Clause R5. The system according to Clause R1, the instruction generation device further includes:
队列构建模块,用于根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The queue construction module is used to sort the running instructions according to the queue sorting rules, and construct an instruction queue corresponding to the running device according to the sorted running instructions.
条款R6、根据条款R2所述的系统,所述指令生成设备还包括:Clause R6. The system according to Clause R2, the instruction generating device further includes:
宏指令生成模块,用于接收待执行读指令,根据确定的指定设备的标识和所述待执行读指令生成所述读宏指令。The macro generation module is used to receive the read instruction to be executed, and generate the read macro instruction according to the determined identifier of the designated device and the read instruction to be executed.
条款R7、根据条款R1所述的系统,所述指令生成设备还包括:Clause R7. The system according to Clause R1, the instruction generating device further includes:
指令分派模块,用于将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,An instruction dispatching module, configured to send the operation instruction to the operation device, so that the operation device executes the operation instruction,
其中,所述指令分派模块,包括:Wherein, the instruction dispatching module includes:
指令汇编子模块,用于根据所述运行指令生成汇编文件;An instruction assembly submodule, used to generate an assembly file according to the operation instruction;
汇编翻译子模块,用于将所述汇编文件翻译成二进制文件;An assembly translation submodule, used to translate the assembly file into a binary file;
指令发送子模块,用于将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。An instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款R8、根据条款R1所述的系统,所述运行设备,还包括:Clause R8. The system according to Clause R1, the operating device, further comprising:
存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,A storage module, the storage module includes one or more of a register and a cache, the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款R9、根据条款R1所述的系统,所述控制模块,包括:Clause R9. The system according to Clause R1, the control module includes:
指令存储子模块,用于存储所述运行指令;An instruction storage sub-module for storing the operation instruction;
指令处理子模块,用于对所述运行指令进行解析,得到所述多个解析指令;An instruction processing sub-module, which is used to parse the running instruction to obtain the multiple parsing instructions;
存储队列子模块，用于存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中，所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。A storage queue submodule, configured to store a running instruction queue, the running instruction queue including the running instruction and the multiple parsing instructions, where the running instruction and the multiple parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款R10、根据条款R9所述的系统,所述执行模块,包括:Clause R10. The system according to Clause R9, the execution module includes:
依赖关系处理子模块，用于在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，将所述第一解析指令缓存在所述指令存储子模块中，在所述第零解析指令执行完毕后，从所述指令存储子模块中提取所述第一解析指令发送至所述执行模块，A dependency processing submodule, configured to buffer the first parsing instruction in the instruction storage submodule when it is determined that a first parsing instruction has an association relationship with a zeroth parsing instruction preceding the first parsing instruction, and, after the zeroth parsing instruction has been executed, to fetch the first parsing instruction from the instruction storage submodule and send it to the execution module,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
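The association relationship of Clause R10 reduces to an overlap test on two storage address intervals, which can be sketched as below. This is an interpretive example; the half-open `[start, end)` convention is an assumption, since the claims do not state whether interval endpoints are inclusive.

```python
# Illustrative sketch of Clause R10: the first parsing instruction depends
# on the zeroth one when the storage address interval of the data it needs
# overlaps the zeroth instruction's interval. Intervals are modeled here as
# half-open [start, end) ranges (an assumption, not claim language).

def intervals_overlap(first, zeroth):
    (a_start, a_end), (b_start, b_end) = first, zeroth
    # Two half-open intervals share a region iff each starts before the
    # other ends.
    return a_start < b_end and b_start < a_end

def must_wait(first_interval, zeroth_interval):
    # The first instruction is buffered until the zeroth finishes whenever
    # the two address intervals have an overlapping region.
    return intervals_overlap(first_interval, zeroth_interval)
```

For instance, intervals `[0x100, 0x200)` and `[0x180, 0x280)` overlap, so the later instruction would be buffered; adjacent intervals that merely touch do not.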
条款R11、根据条款R1所述的系统,Clause R11, the system according to Clause R1,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。The read macro instruction includes at least one of read neuron macro instruction, read synapse macro instruction and read scalar macro instruction.
条款R12、一种机器学习运算装置,所述装置包括:Clause R12. A machine learning computing device, the device comprising:
一个或多个如条款R1-条款R11任一项所述的数据读入指令处理系统，用于从其他处理装置中获取待运算数据和控制信息，并执行指定的机器学习运算，将执行结果通过I/O接口传递给其他处理装置；One or more data read-in instruction processing systems according to any one of Clauses R1 to R11, configured to obtain data to be operated on and control information from other processing devices, perform the specified machine learning operations, and transfer the execution results to the other processing devices through an I/O interface;
当所述机器学习运算装置包含多个所述数据读入指令处理系统时,所述多个所述数据读入指令处理系统间可以通过特定的结构进行连接并传输数据;When the machine learning computing device includes a plurality of the data reading instruction processing systems, the plurality of data reading instruction processing systems may be connected and transmit data through a specific structure;
其中，多个所述数据读入指令处理系统通过快速外部设备互连总线进行互联并传输数据，以支持更大规模的机器学习的运算；多个所述数据读入指令处理系统共享同一控制系统或拥有各自的控制系统；多个所述数据读入指令处理系统共享内存或者拥有各自的内存；多个所述数据读入指令处理系统的互联方式是任意互联拓扑。Wherein, the multiple data read-in instruction processing systems are interconnected and transfer data through a peripheral component interconnect express (PCIE) bus to support larger-scale machine learning operations; the multiple data read-in instruction processing systems share the same control system or have their own control systems; the multiple data read-in instruction processing systems share memory or have their own memories; and the interconnection mode of the multiple data read-in instruction processing systems is an arbitrary interconnection topology.
条款R13、一种组合处理装置,所述组合处理装置包括:Clause R13. A combined processing device, the combined processing device comprising:
如条款R12所述的机器学习运算装置、通用互联接口和其他处理装置;Machine learning computing device, general interconnection interface and other processing devices as described in clause R12;
所述机器学习运算装置与所述其他处理装置进行交互,共同完成用户指定的计算操作,The machine learning computing device interacts with the other processing device to jointly complete the calculation operation specified by the user,
其中,所述组合处理装置还包括:存储装置,该存储装置分别与所述机器学习运算装置和所述其他处理装置连接,用于保存所述机器学习运算装置和所述其他处理装置的数据。Wherein, the combined processing device further includes: a storage device, which is respectively connected to the machine learning computing device and the other processing device, and is used for storing data of the machine learning computing device and the other processing device.
条款R14、一种板卡，所述板卡包括：存储器件、接口装置和控制器件以及机器学习芯片，所述机器学习芯片包括条款R12所述的机器学习运算装置或如条款R13所述的组合处理装置；Clause R14. A board card, the board card including: a storage device, an interface apparatus, a control device, and a machine learning chip, where the machine learning chip includes the machine learning computing device according to Clause R12 or the combined processing device according to Clause R13;
其中,所述机器学习芯片与所述存储器件、所述控制器件以及所述接口装置分别连接;Wherein, the machine learning chip is respectively connected to the storage device, the control device and the interface device;
所述存储器件,用于存储数据;The storage device is used for storing data;
所述接口装置,用于实现所述机器学习芯片与外部设备之间的数据传输;The interface device is used to realize data transmission between the machine learning chip and an external device;
所述控制器件,用于对所述机器学习芯片的状态进行监控。The control device is used for monitoring the state of the machine learning chip.
条款R15、一种数据读入指令处理方法,所述方法应用于数据读入指令处理系统,所述系统包括指令生成设备和运行设备,所述方法包括:Clause R15. A data reading instruction processing method. The method is applied to a data reading instruction processing system. The system includes an instruction generating device and an operating device. The method includes:
通过所述指令生成设备根据接收到的读宏指令,确定执行所述读宏指令的运行设备,并根据所述读宏指令和所述运行设备,生成运行指令;Determining, by the instruction generating device, a running device that executes the read macro instruction according to the received read macro instruction, and generating a running instruction based on the read macro instruction and the running device;
通过所述运行设备获取数据、神经网络模型以及运行指令,对所述运行指令进行解析,获得多个解析指令,并根据所述数据执行所述多个解析指令,得到执行结果,Acquiring data, a neural network model, and operating instructions through the operating device, parsing the operating instructions, obtaining multiple analytical instructions, and executing the multiple analytical instructions based on the data, to obtain an execution result,
其中,所述读宏指令是指用于进行数据读入的宏指令,Wherein, the read macro instruction refers to a macro instruction used for data reading,
所述读宏指令包含操作类型、数据读入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据读入地址和运行数据加密方式地址，所述运行数据读入地址和运行数据加密方式地址分别是根据所述数据读入地址和所述数据加密方式地址确定的。The read macro instruction includes an operation type, a data read-in address, and a data encryption mode address. The running instruction includes the operation type, a running data read-in address, and a running data encryption mode address, where the running data read-in address and the running data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
条款R16、根据条款R15所述的方法,所述方法还包括:Clause R16. The method according to Clause R15, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的读宏指令,确定执行所述读宏指令的运行设备,包括以下至少一项:Wherein, the instruction generating device determines the running device that executes the read macro instruction according to the received read macro instruction, including at least one of the following:
在确定所述读宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述读宏指令的执行条件时,将所述指定设备确定为所述运行设备;When it is determined that the read macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution conditions for executing the read macro instruction, determine the specified device as the running device;
在确定所述读宏指令中不包含所述指定设备的标识时，根据接收到的读宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述读宏指令的运行设备；When it is determined that the read macro instruction does not include the identifier of the designated device, determining, from the candidate devices, the running device for executing the read macro instruction according to the received read macro instruction and the resource information of the candidate devices;
在确定所述读宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述读宏指令的执行条件时，根据所述读宏指令和所述备选设备的资源信息，确定运行设备，When it is determined that the read macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the read macro instruction, determining the running device according to the read macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述读宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the read macro instruction, and the resource information includes an instruction set included in the candidate device.
条款R17、根据条款R16所述的方法,所述读宏指令还包含读入量,Clause R17. According to the method described in Clause R16, the read macro instruction further includes a read-in amount,
根据所述读宏指令和所述运行设备,生成运行指令,包括:Generating an operation instruction according to the read macro instruction and the operation device, including:
确定所述读宏指令的数据量,根据所述读宏指令的数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,Determining the data amount of the read macro instruction, and generating an operation instruction according to the data amount of the read macro instruction, the read macro instruction and the resource information of the operating device,
其中,所述数据量是根据所述读入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data amount is determined according to the read-in amount, and the resource information of the operating device further includes at least one of storage capacity and remaining storage capacity.
条款R18、根据条款R17所述的方法,根据所述读宏指令的数据量、所述读宏指令和所述运行设备的资源信息,生成运行指令,包括以下至少一项:Clause R18. According to the method described in Clause R17, generate an operation instruction according to the data amount of the read macro instruction, the read macro instruction, and the resource information of the operation device, including at least one of the following:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述读宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述读宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令；When it is determined that there is one running device and the running data amount of the running device is smaller than the data amount of the read macro instruction, splitting the read macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence;
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述读宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operation devices, the read macro instruction is split according to the operation data amount of each operation device and the data amount, and an operation instruction corresponding to each operation device is generated,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行读入量，所述运行读入量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein, the running data amount of a running device is determined according to the resource information of the running device; the running instruction further includes a running read-in amount, which is determined according to the running data amount of the running device that executes the running instruction.
条款R19、根据条款R15所述的方法,所述方法还包括:Clause R19. The method according to Clause R15, the method further comprising:
通过所述指令生成设备根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
条款R20、根据条款R16所述的方法,所述方法还包括:Clause R20. The method according to Clause R16, the method further comprising:
通过所述指令生成设备接收待执行读指令,根据确定的指定设备的标识和所述待执行读指令生成所述读宏指令。Receiving the read instruction to be executed through the instruction generating device, and generating the read macro instruction according to the determined identifier of the designated device and the read instruction to be executed.
条款R21、根据条款R15所述的方法,所述方法还包括:Clause R21. The method according to Clause R15, the method further comprising:
通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction,
其中,通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction, includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
条款R22、根据条款R15所述的方法,所述方法还包括:Clause R22. The method according to Clause R15, the method further comprising:
通过所述运行设备存储所述数据以及所述数据中的标量数据,Storing the data and the scalar data in the data through the operating device,
其中,所述运行设备包括存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,Wherein, the operation device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款R23、根据条款R15所述的方法,所述方法还包括:Clause R23. The method according to Clause R15, the method further comprising:
通过所述运行设备存储所述运行指令;Storing the operation instruction through the operation device;
通过所述运行设备对所述运行指令进行解析,得到所述多个解析指令;Analyzing the operation instruction by the operation device to obtain the multiple analysis instructions;
通过所述运行设备存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中，所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。Storing, by the running device, a running instruction queue, the running instruction queue including the running instruction and the multiple parsing instructions, where the running instruction and the multiple parsing instructions in the running instruction queue are arranged in the order in which they are to be executed.
条款R24、根据条款R23所述的方法,所述方法还包括:Clause R24. The method according to Clause R23, the method further comprising:
通过所述运行设备在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，缓存所述第一解析指令，在所述第零解析指令执行完毕后，执行缓存的所述第一解析指令，Buffering, by the running device, a first parsing instruction when it is determined that the first parsing instruction has an association relationship with a zeroth parsing instruction preceding the first parsing instruction, and executing the buffered first parsing instruction after the zeroth parsing instruction has been executed,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
条款R25、根据条款R15所述的方法,Clause R25, according to the method described in Clause R15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述读宏指令包括读神经元宏指令、读突触宏指令和读标量宏指令中的至少一种。The read macro instruction includes at least one of read neuron macro instruction, read synapse macro instruction and read scalar macro instruction.
对于数据写入指令,可以通过下述数据写入指令生成装置、方法,以及数据写入指令处理系统、方法等相关产品进行处理,依据以下条款可更好地理解前述内容:For the data writing instruction, it can be processed by the following data writing instruction generating device and method, as well as data writing instruction processing system, method and other related products, and the foregoing can be better understood according to the following clauses:
条款S1、一种数据写入指令生成装置,所述装置包括:Clause S1, a data writing instruction generating device, the device comprising:
设备确定模块,用于根据接收到的写宏指令,确定执行所述写宏指令的运行设备;A device determining module, configured to determine a running device that executes the macro writing instruction according to the received macro writing instruction;
指令生成模块,用于根据所述写宏指令和所述运行设备,生成运行指令,An instruction generation module, configured to generate an operation instruction based on the write macro instruction and the operation device,
其中,所述写宏指令是指用于进行数据写入的宏指令,Wherein, the write macro instruction refers to a macro instruction for writing data,
所述写宏指令包含操作类型、数据写入地址、数据加密方式地址，所述运行指令中包含所述操作类型、运行数据写入地址和运行数据加密方式地址，所述运行数据写入地址和运行数据加密方式地址分别是根据所述数据写入地址和所述数据加密方式地址确定的。The write macro instruction includes an operation type, a data write address, and a data encryption mode address. The running instruction includes the operation type, a running data write address, and a running data encryption mode address, where the running data write address and the running data encryption mode address are determined according to the data write address and the data encryption mode address, respectively.
条款S2、根据条款S1所述的装置,所述设备确定模块,包括:Clause S2. The apparatus according to Clause S1, the device determination module includes:
第一确定子模块，用于在确定所述写宏指令中包含指定设备的标识，且所述指定设备的资源满足执行所述写宏指令的执行条件时，将所述指定设备确定为所述运行设备，A first determining submodule, configured to determine the designated device as the running device when it is determined that the write macro instruction includes the identifier of a designated device and the resources of the designated device satisfy the execution condition for executing the write macro instruction,
其中，所述执行条件包括：所述指定设备中包含与所述写宏指令的操作类型相对应的指令集。Wherein, the execution condition includes: the designated device contains an instruction set corresponding to the operation type of the write macro instruction.
条款S3、根据条款S2所述的装置,所述装置还包括:Clause S3. The device according to Clause S2, the device further comprising:
资源获取模块,用于获取备选设备的资源信息,Resource acquisition module, used to obtain resource information of candidate devices,
所述设备确定模块,还包括:The device determination module also includes:
第二确定子模块，用于在确定所述写宏指令中不包含所述指定设备的标识时，根据接收到的写宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述写宏指令的运行设备，A second determining submodule, configured to determine, from the candidate devices, the running device for executing the write macro instruction according to the received write macro instruction and the resource information of the candidate devices when it is determined that the write macro instruction does not include the identifier of the designated device,
其中,所述资源信息包括所述备选设备所包含的指令集。Wherein, the resource information includes an instruction set included in the candidate device.
Clause S4. The apparatus according to Clause S3, wherein the device determination module further includes:
a third determination submodule, configured to, upon determining that the write macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the write macro instruction, determine the running device according to the write macro instruction and the resource information of the candidate devices.
Clause S5. The apparatus according to any one of Clauses S2 to S4, wherein the write macro instruction further includes a write amount,
and the instruction generation module is further configured to determine the data amount of the write macro instruction and generate the running instruction according to the data amount, the write macro instruction, and the resource information of the running device,
wherein the data amount is determined according to the write amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause S6. The apparatus according to Clause S5, wherein the instruction generation module includes:
a first instruction generation submodule, configured to, upon determining that there is one running device and that the running data amount of the running device is smaller than the data amount of the write macro instruction, split the write macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction further includes a running write amount, and the running write amount is determined according to the running data amount.
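The splitting rule of Clause S6 can be sketched as follows: when a single running device's per-instruction running data amount is smaller than the macro instruction's total data amount, the macro is split into several running instructions, each carrying its own running write amount. This is a minimal illustration under assumed integer byte counts; the function name and signature are not from the patent.

```python
# Illustrative sketch of Clause S6: split one write macro instruction's
# total data amount into per-running-instruction write amounts bounded
# by the running device's capacity.
def split_write_macro(total_amount, device_capacity):
    """Return the list of running write amounts, in execution order."""
    if device_capacity <= 0:
        raise ValueError("device capacity must be positive")
    amounts = []
    remaining = total_amount
    while remaining > 0:
        chunk = min(device_capacity, remaining)  # one running instruction
        amounts.append(chunk)
        remaining -= chunk
    return amounts
```

For example, a macro writing 10 units on a device that handles 4 units per instruction would yield three running instructions with write amounts 4, 4, and 2, executed in sequence.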
Clause S7. The apparatus according to Clause S5, wherein the instruction generation module includes:
a second instruction generation submodule, configured to, upon determining that there are a plurality of running devices, split the write macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction further includes a running write amount, and the running write amount is determined according to the running data amount of the running device that executes the running instruction.
Clause S8. The apparatus according to Clause S1, wherein the apparatus further includes:
a queue construction module, configured to sort the running instructions according to a queue ordering rule and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause S9. The apparatus according to Clause S2, wherein the apparatus further includes:
a macro instruction generation module, configured to receive a write instruction to be executed and generate the write macro instruction according to the determined identifier of the specified device and the write instruction to be executed.
Clause S10. The apparatus according to Clause S1, wherein the apparatus further includes:
an instruction dispatch module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly submodule, configured to generate an assembly file according to the running instruction;
an assembly translation submodule, configured to translate the assembly file into a binary file; and
an instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
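The three-stage dispatch pipeline of Clause S10 (assemble, translate, send) can be sketched as below. The "assembler" and "translator" here are stand-ins (plain text joined and UTF-8 encoded), not a real toolchain; the mnemonics and the callback-based sender are assumptions for illustration only.

```python
# Minimal sketch of the Clause S10 dispatch pipeline: running
# instructions -> assembly file -> binary file -> running device.
def assemble(instructions):
    # instruction assembly submodule: one mnemonic per line
    return "\n".join(instructions)

def translate(asm_text):
    # assembly translation submodule: stand-in binary encoding
    return asm_text.encode("utf-8")

def dispatch(instructions, send):
    # instruction sending submodule: hand the binary to the device
    binary = translate(assemble(instructions))
    send(binary)
    return binary
```

A real implementation would invoke the target device's assembler and transport layer; the point of the sketch is only the ordering of the three submodules.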
Clause S11. The apparatus according to Clause S1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the apparatus is disposed in a CPU and/or an NPU; and
the write macro instruction includes at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
Clause S12. A machine learning operation device, including:
one or more data write instruction generation apparatuses according to any one of Clauses S1 to S11, configured to obtain data to be operated on and control information from other processing devices, execute a specified machine learning operation, and transfer the execution result to other processing devices through an I/O interface;
when the machine learning operation device includes a plurality of the data write instruction generation apparatuses, the plurality of data write instruction generation apparatuses may be connected to and transmit data with one another through a specific structure,
wherein the plurality of data write instruction generation apparatuses are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of data write instruction generation apparatuses share one control system or have their own respective control systems; the plurality of data write instruction generation apparatuses share memory or have their own respective memories; and the interconnection manner of the plurality of data write instruction generation apparatuses is an arbitrary interconnection topology.
Clause S13. A combined processing device, including:
the machine learning operation device according to Clause S12, a universal interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete a computing operation specified by a user,
wherein the combined processing device further includes a storage device, which is connected to the machine learning operation device and the other processing devices, respectively, and is configured to save data of the machine learning operation device and the other processing devices.
Clause S14. A board card, including: a storage device, an interface apparatus, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning operation device according to Clause S12 or the combined processing device according to Clause S13;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor the state of the machine learning chip.
Clause S15. A data write instruction generation method, including:
determining, according to a received write macro instruction, a running device for executing the write macro instruction; and
generating a running instruction according to the write macro instruction and the running device,
wherein the write macro instruction refers to a macro instruction for performing data writing,
the write macro instruction includes an operation type, a data write address, and a data encryption mode address; the running instruction includes the operation type, a running data write address, and a running data encryption mode address; and the running data write address and the running data encryption mode address are determined according to the data write address and the data encryption mode address, respectively.
Clause S16. The method according to Clause S15, wherein determining, according to the received write macro instruction, the running device for executing the write macro instruction includes:
upon determining that the write macro instruction includes an identifier of a specified device and that the resources of the specified device satisfy an execution condition for executing the write macro instruction, determining the specified device as the running device,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the operation type of the write macro instruction.
Clause S17. The method according to Clause S16, further including:
obtaining resource information of candidate devices,
wherein determining, according to the received write macro instruction, the running device for executing the write macro instruction includes:
upon determining that the write macro instruction does not include the identifier of the specified device, determining, from among the candidate devices, the running device for executing the write macro instruction according to the received write macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
Clause S18. The method according to Clause S17, wherein determining, according to the received write macro instruction, the running device for executing the write macro instruction includes:
upon determining that the write macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the write macro instruction, determining the running device according to the write macro instruction and the resource information of the candidate devices.
Clause S19. The method according to any one of Clauses S16 to S18, wherein the write macro instruction further includes a write amount,
and generating the running instruction according to the write macro instruction and the running device includes:
determining the data amount of the write macro instruction, and generating the running instruction according to the data amount, the write macro instruction, and the resource information of the running device,
wherein the data amount is determined according to the write amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause S20. The method according to Clause S19, wherein generating the running instruction according to the data amount of the write macro instruction, the write macro instruction, and the resource information of the running device includes:
upon determining that there is one running device and that the running data amount of the running device is smaller than the data amount of the write macro instruction, splitting the write macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction further includes a running write amount, and the running write amount is determined according to the running data amount.
Clause S21. The method according to Clause S19, wherein generating the running instruction according to the data amount of the write macro instruction, the write macro instruction, and the resource information of the running device includes:
upon determining that there are a plurality of running devices, splitting the write macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction further includes a running write amount, and the running write amount is determined according to the running data amount of the running device that executes the running instruction.
Clause S22. The method according to Clause S15, further including:
sorting the running instructions according to a queue ordering rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions.
Clause S23. The method according to Clause S16, further including:
receiving a write instruction to be executed, and generating the write macro instruction according to the determined identifier of the specified device and the write instruction to be executed.
Clause S24. The method according to Clause S15, further including:
sending the running instruction to the running device, so that the running device executes the running instruction,
wherein sending the running instruction to the running device so that the running device executes the running instruction includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file; and
sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause S25. The method according to Clause S15, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU; and
the write macro instruction includes at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
Clause T1. A data write processing system, including an instruction generation device and a running device,
wherein the instruction generation device includes:
a device determination module, configured to determine, according to a received write macro instruction, the running device for executing the write macro instruction; and
an instruction generation module, configured to generate a running instruction according to the write macro instruction and the running device;
and the running device includes:
a control module, configured to obtain required data, a neural network model, and the running instruction, and parse the running instruction to obtain a plurality of parsed instructions; and
an execution module, configured to execute the plurality of parsed instructions according to the data to obtain an execution result,
wherein the write macro instruction refers to a macro instruction for performing data writing,
the write macro instruction includes an operation type, a data write address, and a data encryption mode address; the running instruction includes the operation type, a running data write address, and a running data encryption mode address; and the running data write address and the running data encryption mode address are determined according to the data write address and the data encryption mode address, respectively.
Clause T2. The system according to Clause T1, wherein the instruction generation device further includes:
a resource obtaining module, configured to obtain resource information of candidate devices,
and the device determination module includes at least one of the following submodules:
a first determination submodule, configured to, upon determining that the write macro instruction includes an identifier of a specified device and that the resources of the specified device satisfy an execution condition for executing the write macro instruction, determine the specified device as the running device;
a second determination submodule, configured to, upon determining that the write macro instruction does not include the identifier of the specified device, determine, from among the candidate devices, the running device for executing the write macro instruction according to the received write macro instruction and the resource information of the candidate devices; and
a third determination submodule, configured to, upon determining that the write macro instruction includes the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the write macro instruction, determine the running device according to the write macro instruction and the resource information of the candidate devices,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the write macro instruction, and the resource information includes the instruction set contained in each candidate device.
Clause T3. The system according to Clause T2, wherein the write macro instruction further includes a write amount,
and the instruction generation module is further configured to determine the data amount of the write macro instruction and generate the running instruction according to the data amount, the write macro instruction, and the resource information of the running device,
wherein the data amount is determined according to the write amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
Clause T4. The system according to Clause T3, wherein the instruction generation module includes at least one of the following submodules:
a first instruction generation submodule, configured to, upon determining that there is one running device and that the running data amount of the running device is smaller than the data amount of the write macro instruction, split the write macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence; and
a second instruction generation submodule, configured to, upon determining that there are a plurality of running devices, split the write macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
wherein the running data amount of a running device is determined according to the resource information of that running device, the running instruction further includes a running write amount, and the running write amount is determined according to the running data amount of the running device that executes the running instruction.
Clause T5. The system according to Clause T1, wherein the instruction generation device further includes:
a queue construction module, configured to sort the running instructions according to a queue ordering rule and construct an instruction queue corresponding to the running device according to the sorted running instructions.
Clause T6. The system according to Clause T2, wherein the instruction generation device further includes:
a macro instruction generation module, configured to receive a write instruction to be executed and generate the write macro instruction according to the determined identifier of the specified device and the write instruction to be executed.
Clause T7. The system according to Clause T1, wherein the instruction generation device further includes:
an instruction dispatch module, configured to send the running instruction to the running device, so that the running device executes the running instruction,
wherein the instruction dispatch module includes:
an instruction assembly submodule, configured to generate an assembly file according to the running instruction;
an assembly translation submodule, configured to translate the assembly file into a binary file; and
an instruction sending submodule, configured to send the binary file to the running device, so that the running device executes the running instruction according to the binary file.
Clause T8. The system according to Clause T1, wherein the running device further includes:
a storage module, the storage module including one or more of a register and a cache, the cache including a high-speed temporary storage cache,
wherein the cache is configured to store the data; and
the register is configured to store scalar data among the data.
Clause T9. The system according to Clause T1, wherein the control module includes:
an instruction storage submodule, configured to store the running instruction;
an instruction processing submodule, configured to parse the running instruction to obtain the plurality of parsed instructions; and
a storage queue submodule, configured to store a running instruction queue, the running instruction queue including the running instruction and the plurality of parsed instructions, wherein in the running instruction queue the running instruction and the plurality of parsed instructions are arranged in the order in which they are to be executed.
Clause T10. The system according to Clause T9, wherein the execution module includes:
a dependency processing submodule, configured to, upon determining that a first parsed instruction is associated with a zeroth parsed instruction preceding the first parsed instruction, buffer the first parsed instruction in the instruction storage submodule, and after execution of the zeroth parsed instruction is completed, extract the first parsed instruction from the instruction storage submodule and send it to the execution module,
wherein the association between the first parsed instruction and the zeroth parsed instruction preceding it includes:
a first storage address interval storing data required by the first parsed instruction and a zeroth storage address interval storing data required by the zeroth parsed instruction have an overlapping region.
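The dependency test of Clause T10 reduces to an interval-overlap check: the first parsed instruction must be buffered until the zeroth completes whenever their storage address intervals share a region. The sketch below assumes half-open `[start, end)` address intervals, a convention the text does not fix; the function names are illustrative.

```python
# Sketch of the Clause T10 dependency rule: two parsed instructions are
# associated when their storage address intervals overlap.
def intervals_overlap(first, zeroth):
    """Each interval is a (start, end) pair, treated as [start, end)."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

def must_wait(first_interval, zeroth_interval):
    # the first parsed instruction is held in the instruction storage
    # submodule until the zeroth finishes whenever the regions overlap
    return intervals_overlap(first_interval, zeroth_interval)
```

Under the half-open convention, two instructions touching adjacent but non-overlapping regions (e.g. `[0x100, 0x200)` and `[0x200, 0x300)`) carry no dependency and may execute without waiting.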
Clause T11. The system according to Clause T1, wherein
the running device is one of, or any combination of, a CPU, a GPU, and an NPU;
the instruction generation device is disposed in a CPU and/or an NPU; and
the write macro instruction includes at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
Clause T12. A machine learning operation device, including:
one or more data write instruction processing systems according to any one of Clauses T1 to T11, configured to obtain data to be operated on and control information from other processing devices, execute a specified machine learning operation, and transfer the execution result to other processing devices through an I/O interface;
when the machine learning operation device includes a plurality of the data write instruction processing systems, the plurality of data write instruction processing systems may be connected to and transmit data with one another through a specific structure,
wherein the plurality of data write instruction processing systems are interconnected and transmit data through a fast external device interconnection bus to support larger-scale machine learning operations; the plurality of data write instruction processing systems share one control system or have their own respective control systems; the plurality of data write instruction processing systems share memory or have their own respective memories; and the interconnection manner of the plurality of data write instruction processing systems is an arbitrary interconnection topology.
Clause T13. A combined processing device, including:
the machine learning operation device according to Clause T12, a universal interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete a computing operation specified by a user,
wherein the combined processing device further includes a storage device, which is connected to the machine learning operation device and the other processing devices, respectively, and is configured to save data of the machine learning operation device and the other processing devices.
Clause T14. A board card, including: a storage device, an interface apparatus, a control device, and a machine learning chip, wherein the machine learning chip includes the machine learning operation device according to Clause T12 or the combined processing device according to Clause T13;
wherein the machine learning chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is configured to store data;
the interface apparatus is configured to implement data transmission between the machine learning chip and an external device; and
the control device is configured to monitor the state of the machine learning chip.
Clause T15. A data write instruction processing method, applied to a data write instruction processing system, the system including an instruction generation device and a running device, the method including:
determining, by the instruction generation device according to a received write macro instruction, the running device for executing the write macro instruction, and generating a running instruction according to the write macro instruction and the running device; and
obtaining, by the running device, data, a neural network model, and the running instruction, parsing the running instruction to obtain a plurality of parsed instructions, and executing the plurality of parsed instructions according to the data to obtain an execution result,
wherein the write macro instruction refers to a macro instruction for performing data writing,
the write macro instruction includes an operation type, a data write address, and a data encryption mode address; the running instruction includes the operation type, a running data write address, and a running data encryption mode address; and the running data write address and the running data encryption mode address are determined according to the data write address and the data encryption mode address, respectively.
条款T16、根据条款T15所述的方法,所述方法还包括:Clause T16. The method according to Clause T15, the method further comprising:
通过所述指令生成设备获取备选设备的资源信息,Obtain the resource information of the candidate device through the instruction generating device,
其中,通过所述指令生成设备根据接收到的写宏指令,确定执行所述写宏指令的运行设备,包括以下至少一项:Wherein, the instruction generating device determines the running device that executes the macro writing instruction according to the received macro writing instruction, including at least one of the following:
在确定所述写宏指令中包含指定设备的标识,且所述指定设备的资源满足执行所述写宏指令的执行条件时,将所述指定设备确定为所述运行设备;Determining that the designated device is the running device when it is determined that the write macro instruction includes the identifier of the specified device, and the resources of the specified device satisfy the execution conditions for executing the write macro instruction;
在确定所述写宏指令中不包含所述指定设备的标识时，根据接收到的写宏指令和所述备选设备的资源信息，从所述备选设备中确定出用于执行所述写宏指令的运行设备；When it is determined that the write macro instruction does not include the identifier of the designated device, determining, from among the candidate devices, the running device for executing the write macro instruction according to the received write macro instruction and the resource information of the candidate devices;
在确定所述写宏指令中包含所述指定设备的标识，且所述指定设备的资源不满足执行所述写宏指令的执行条件时，根据所述写宏指令和所述备选设备的资源信息，确定运行设备，When it is determined that the write macro instruction includes the identifier of the designated device but the resources of the designated device do not satisfy the execution condition for executing the write macro instruction, determining the running device according to the write macro instruction and the resource information of the candidate devices,
其中,所述执行条件包括:所述指定设备中包含与所述写宏指令相对应的指令集,所述资源信息包括所述备选设备所包含的指令集。Wherein, the execution condition includes: the specified device includes an instruction set corresponding to the write macro instruction, and the resource information includes an instruction set included in the candidate device.
条款T17、根据条款T16所述的方法,所述写宏指令还包含写入量,Clause T17, according to the method of Clause T16, the write macro instruction also includes the write amount,
根据所述写宏指令和所述运行设备,生成运行指令,包括:According to the macro instruction and the operation device, generating an operation instruction includes:
确定所述写宏指令的数据量,根据所述写宏指令的数据量、所述写宏指令和所述运行设备的资源信息,生成运行指令,Determine the data amount of the write macro instruction, and generate an operation instruction according to the data amount of the write macro instruction, the write macro instruction and the resource information of the running device,
其中,所述数据量是根据所述写入量确定的,所述运行设备的资源信息还包括存储容量、剩余存储容量的至少一项。Wherein, the data volume is determined according to the write volume, and the resource information of the running device further includes at least one of storage capacity and remaining storage capacity.
条款T18、根据条款T17所述的方法,根据所述写宏指令的数据量、所述写宏指令和所述运行设备的资源信息,生成运行指令,包括以下至少一项:Clause T18. According to the method described in Clause T17, generate an operation instruction according to the data amount of the macro instruction, the macro instruction and the resource information of the operation device, including at least one of the following:
在确定所述运行设备为一个，且所述运行设备的运行数据量小于所述写宏指令的数据量时，根据所述运行设备的运行数据量和所述数据量将所述写宏指令拆分成多条运行指令，以使所述运行设备依次执行多条运行指令；When it is determined that there is a single running device and the running data amount of the running device is less than the data amount of the write macro instruction, splitting the write macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence;
在确定所述运行设备为多个时,根据每个运行设备的运行数据量和所述数据量对所述写宏指令进行拆分,生成对应于每个运行设备的运行指令,When it is determined that there are a plurality of operating devices, the write macro instruction is split according to the operating data amount of each operating device and the data amount, and an operating command corresponding to each operating device is generated,
其中，运行设备的运行数据量是根据运行设备的资源信息确定的，所述运行指令还包含运行写入量，所述运行写入量是根据执行所述运行指令的运行设备的运行数据量确定的。Wherein the running data amount of a running device is determined according to the resource information of that running device; the running instruction further includes a running write amount, which is determined according to the running data amount of the running device that executes the running instruction.
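The splitting rule of clause T18 can be sketched as follows. This is a minimal illustrative sketch, not part of the disclosed implementation; the function name and the (offset, size) chunk representation are hypothetical.

```python
def split_write_macro(total_amount, run_amount):
    # Split a write macro instruction's total data amount into chunks,
    # each bounded by the running device's run data amount, so that one
    # running instruction can be generated per chunk (clause-T18 sketch).
    if run_amount <= 0:
        raise ValueError("run data amount must be positive")
    chunks = []
    offset = 0
    while offset < total_amount:
        size = min(run_amount, total_amount - offset)
        chunks.append((offset, size))
        offset += size
    return chunks
```

With a data amount of 10 and a run data amount of 4, the macro would be split into three running instructions covering (0, 4), (4, 4), and (8, 2); when the run data amount already covers the whole write, a single running instruction results.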
条款T19、根据条款T15所述的方法,所述方法还包括:Clause T19. The method according to Clause T15, the method further comprising:
通过所述指令生成设备根据队列排序规则对所述运行指令进行排序,根据排序后的运行指令构建与所述运行设备相对应的指令队列。The instruction generating device sorts the running instructions according to a queue sorting rule, and constructs an instruction queue corresponding to the running device according to the sorted running instructions.
条款T20、根据条款T16所述的方法,所述方法还包括:Clause T20. The method according to Clause T16, the method further comprising:
通过所述指令生成设备接收待执行写指令,根据确定的指定设备的标识和所述待执行写指令生成所述写宏指令。Receiving the write instruction to be executed through the instruction generating device, and generating the write macro instruction according to the determined identifier of the designated device and the write instruction to be executed.
条款T21、根据条款T15所述的方法,所述方法还包括:Clause T21, the method according to Clause T15, the method further comprising:
通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,Sending the operation instruction to the operation device through the instruction generation device, so that the operation device executes the operation instruction,
其中,通过所述指令生成设备将所述运行指令发送至所述运行设备,以使所述运行设备执行所述运行指令,包括:Wherein, sending the operation instruction to the operation device by the instruction generation device, so that the operation device executes the operation instruction, includes:
根据所述运行指令生成汇编文件;Generate an assembly file according to the operation instruction;
将所述汇编文件翻译成二进制文件;Translate the assembly file into a binary file;
将所述二进制文件发送至所述运行设备,以使所述运行设备根据所述二进制文件执行所述运行指令。Sending the binary file to the running device, so that the running device executes the running instruction according to the binary file.
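The three steps of clause T21 above (generate an assembly file, translate it to binary, send it to the running device) can be sketched as below. This is an illustrative stand-in, not the disclosed implementation: the textual "assembly" and the UTF-8 "binary" encoding substitute for a real assembler and instruction encoding.

```python
def dispatch(run_instructions, send):
    # Clause-T21 pipeline sketch: emit an assembly file from the running
    # instructions, translate it into a binary file, and deliver the
    # binary to the running device via the `send` callback.
    assembly = "\n".join(run_instructions)   # generate assembly file
    binary = assembly.encode("utf-8")        # translate into binary file
    send(binary)                             # send to the running device
    return binary
```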
条款T22、根据条款T15所述的方法,所述方法还包括:Clause T22. The method according to Clause T15, the method further comprising:
通过所述运行设备存储所述数据以及所述数据中的标量数据,Storing the data and the scalar data in the data through the operating device,
其中,所述运行设备包括存储模块,所述存储模块包括寄存器、缓存中的一个或多个,所述缓存包括高速暂存缓存,Wherein, the operation device includes a storage module, the storage module includes one or more of a register and a cache, and the cache includes a high-speed temporary storage cache,
所述缓存,用于存储所述数据;The cache is used to store the data;
所述寄存器,用于存储所述数据中标量数据。The register is used to store scalar data in the data.
条款T23、根据条款T15所述的方法,所述方法还包括:Clause T23. The method according to Clause T15, the method further comprising:
通过所述运行设备存储所述运行指令;Storing the operation instruction through the operation device;
通过所述运行设备对所述运行指令进行解析,得到所述多个解析指令;Analyzing the operation instruction by the operation device to obtain the multiple analysis instructions;
通过所述运行设备存储运行指令队列，所述运行指令队列包括所述运行指令和所述多个解析指令，所述运行指令队列中所述运行指令和所述多个解析指令按照被执行的先后顺序依次排列。Storing, by the running device, a running instruction queue that includes the running instruction and the multiple parsed instructions, where the running instruction and the multiple parsed instructions in the queue are arranged in the order in which they are to be executed.
条款T24、根据条款T23所述的方法,所述方法还包括:Clause T24. The method according to Clause T23, the method further comprising:
通过所述运行设备在确定第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系时，缓存所述第一解析指令，在所述第零解析指令执行完毕后，执行缓存的所述第一解析指令，Caching, by the running device, a first parsed instruction when it is determined that the first parsed instruction has an association with a zeroth parsed instruction preceding it, and executing the cached first parsed instruction after execution of the zeroth parsed instruction is completed,
其中,所述第一解析指令与所述第一解析指令之前的第零解析指令存在关联关系包括:Wherein, the association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction includes:
存储所述第一解析指令所需数据的第一存储地址区间与存储所述第零解析指令所需数据的第零存储地址区间具有重叠的区域。The first storage address section storing data required by the first parsing instruction and the zeroth storage address section storing data required by the zeroth parsing instruction have overlapping areas.
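The overlap condition of clause T24 amounts to a standard interval-intersection test. The sketch below assumes half-open address intervals [start, end); the function name and interval representation are hypothetical.

```python
def has_dependency(first_interval, zeroth_interval):
    # True when the storage address interval required by the first
    # parsed instruction overlaps the interval required by the zeroth
    # parsed instruction (clause-T24 association test, half-open ranges).
    first_start, first_end = first_interval
    zeroth_start, zeroth_end = zeroth_interval
    return first_start < zeroth_end and zeroth_start < first_end
```

When this returns true, the first parsed instruction is cached until the zeroth one has finished executing; adjacent but non-overlapping intervals carry no dependency.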
条款T25、根据条款T15所述的方法,Clause T25, the method according to Clause T15,
所述运行设备为CPU、GPU和NPU中的其中一种或任意组合;The running device is one or any combination of CPU, GPU and NPU;
所述指令生成设备设置于CPU和/或NPU中;The instruction generating device is set in the CPU and / or NPU;
所述写宏指令包括写神经元宏指令、写突触宏指令和写标量宏指令中的至少一种。The write macro instruction includes at least one of a write neuron macro instruction, a write synapse macro instruction, and a write scalar macro instruction.
在一种可能的实现方式中,计算宏指令是指用于进行数据计算的宏指令,数据计算可以包括机器学习计算、神经网络计算、向量逻辑计算、矩阵向量计算、标量计算和标量逻辑计算中的至少一种。In a possible implementation, the calculation macro instruction refers to a macro instruction used for data calculation. The data calculation may include machine learning calculation, neural network calculation, vector logic calculation, matrix vector calculation, scalar calculation, and scalar logic calculation. At least one.
神经网络计算宏指令可以是指用于对神经网络算法进行计算的宏指令。例如,对卷积运算(convolutional computation)、池化运算(Pooling)等神经网络算法进行计算的卷积计算宏指令、池化计算宏指令等。不同类型的神经网络计算宏指令对应于不同的操作类型。例如,卷积计算宏指令所对应的操作类型可以为CONV。The neural network calculation macro instruction may refer to a macro instruction used to calculate a neural network algorithm. For example, convolution calculation macros and pooling calculation macros that perform calculations on neural network algorithms such as convolutional computation and pooling. Different types of neural network calculation macros correspond to different types of operations. For example, the operation type corresponding to the convolution calculation macro instruction may be CONV.
向量逻辑计算宏指令可以是指用于对向量进行逻辑运算的宏指令。例如,对向量进行“与”、“比较”、“或”等逻辑计算的宏指令。不同类型的向量逻辑计算宏指令对应于不同的操作类型。例如,向量与计算宏指令所对应的操作类型可以为VAND、向量或计算宏指令所对应的操作类型可以为VOR。The vector logic calculation macro instruction may refer to a macro instruction used to perform logic operation on a vector. For example, a macro instruction that performs logical calculations such as "and", "comparison", or "or" on a vector. Different types of vector logic calculation macro instructions correspond to different types of operations. For example, the operation type corresponding to the vector and the calculation macro instruction may be VAND, and the operation type corresponding to the vector or the calculation macro instruction may be VOR.
矩阵向量计算宏指令可以是指用于对矩阵和向量进行计算的宏指令。例如,对矩阵和向量进行矩阵乘向量计算、向量乘矩阵计算、张量计算、矩阵相加计算、矩阵相减计算等计算的宏指令。不同类型的矩阵向量计算宏指令对应于不同的操作类型,例如,矩阵相加计算宏指令所对应的操作类型为MADD,矩阵乘向量计算宏指令所对应的操作类型为MMV。The matrix vector calculation macro instruction may refer to a macro instruction used to calculate a matrix and a vector. For example, macro instructions for matrix and vector calculations such as matrix multiply vector calculation, vector multiply matrix calculation, tensor calculation, matrix addition calculation, and matrix subtraction calculation. Different types of matrix vector calculation macro instructions correspond to different operation types. For example, the operation type corresponding to the matrix addition calculation macro instruction is MADD, and the operation type corresponding to the matrix multiplication vector calculation macro instruction is MMV.
标量计算宏指令可以是指用于对标量进行算术运算的宏指令。例如,对标量进行标量相加、标量相减、标量相乘、标量相除等计算的宏指令。不同类型的标量计算宏指令对应于不同的操作类型。例如,标量相减宏指令所对应的操作类型为SSUB、标量相加宏指令所对应的操作类型为SADD。A scalar calculation macro instruction may refer to a macro instruction used for arithmetic operation on a scalar. For example, a macro instruction for scalar addition, scalar subtraction, scalar multiplication, and scalar division calculations. Different types of scalar calculation macros correspond to different types of operations. For example, the operation type corresponding to the scalar subtraction macro instruction is SSUB, and the operation type corresponding to the scalar addition macro instruction is SADD.
标量逻辑计算宏指令可以是用于对标量进行逻辑运算的宏指令。例如，对标量进行"与"、"比较"、"或"、"非"等逻辑运算的宏指令。不同类型的标量逻辑计算宏指令对应于不同的操作类型，例如，标量与计算宏指令所对应的操作类型可以为SAND、标量或计算宏指令所对应的操作类型可以为SOR。A scalar logic calculation macro instruction may be a macro instruction for performing logical operations on scalars, for example, logical operations such as "AND", "compare", "OR", and "NOT". Different types of scalar logic calculation macro instructions correspond to different operation types; for example, the operation type corresponding to the scalar AND calculation macro instruction may be SAND, and the operation type corresponding to the scalar OR calculation macro instruction may be SOR.
在一种可能的实现方式中,控制宏指令是指用于控制指令流跳转至目标跳转位置的宏指令。无条件跳转宏指令可以是用于对指令流进行控制,使其无条件的跳转至指定位置的宏指令。有条件跳转宏指令可以是用于对指令流进行控制,使有条件跳转宏指令在其需要满足的条件为真时,跳转至指定位置的宏指令。In a possible implementation, the control macro instruction refers to a macro instruction used to control the instruction flow to jump to the target jump position. The unconditional jump macro instruction may be a macro instruction used to control the instruction flow so that it unconditionally jumps to a specified position. The conditional jump macro instruction may be a macro instruction used to control the instruction flow so that the conditional jump macro instruction jumps to a specified position when the condition that it needs to satisfy is true.
在一种可能的实现方式中,数据搬运宏指令是指用于对数据进行读入、写入等搬运处理的宏指令。读宏指令可以是指将数据从内存中读入到存储数据的位置的宏指令,可以是指用于进行数据读入的宏指令。根据数据类型的不同,读宏指令可以包括用于读入神经元数据的读神经元宏指令、用于读入突触数据的读突触宏指令和用于读入标量数据的读标量宏指令。写宏指令可以是指将数据从其存储位置写入到内存中的宏指令,可以是指用于进行数据写入的宏指令。根据数据类型的不同,写宏指令可以包括用于写入神经元数据的写神经元宏指令、用于写入突触数据的写突触宏指令和用于写入标量数据的写标量宏指令。其中,神经元数据即为神经网络算法中的输入神经元、输出神经元,突触数据即为神经网络算法中的权值。In a possible implementation manner, the data handling macro instruction refers to a macro instruction used for carrying out handling processing such as reading and writing of data. The read macro instruction may refer to a macro instruction that reads data from a memory to a location where data is stored, or may refer to a macro instruction that is used to read data. Depending on the data type, read macros may include read neuron macros for reading neuron data, read synapse macros for reading synapse data, and read scalar macros for reading scalar data . Writing a macro instruction may refer to a macro instruction that writes data from its storage location to a memory, or may refer to a macro instruction used to write data. According to different data types, the write macro instruction may include a write neuron macro instruction for writing neuron data, a write synapse macro instruction for writing synapse data, and a write scalar macro instruction for writing scalar data. . Among them, the neuron data is the input neuron and output neuron in the neural network algorithm, and the synapse data is the weight in the neural network algorithm.
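The operation-type mnemonics scattered through the category descriptions above can be collected into a single lookup table. The dictionary below is an illustrative subset assembled from the text, not an exhaustive instruction set of the disclosure.

```python
# Operation-type mnemonics from the text, grouped by macro category.
OP_TYPES = {
    "CONV": "convolution calculation",   # neural network calculation
    "VAND": "vector AND",                # vector logic calculation
    "VOR": "vector OR",
    "MADD": "matrix addition",           # matrix-vector calculation
    "MMV": "matrix-multiply-vector",
    "SADD": "scalar addition",           # scalar calculation
    "SSUB": "scalar subtraction",
    "SAND": "scalar AND",                # scalar logic calculation
    "SOR": "scalar OR",
    "NLOAD": "read neuron data",         # data handling
    "WLOAD": "read synapse data",
    "SLOAD": "read scalar data",
}
```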
在一种可能的实现方式中，对宏指令的指令格式、待执行指令的指令格式及运行设备执行运行指令的过程进行描述，以下为具体示例。In a possible implementation manner, the instruction format of the macro instruction, the instruction format of the instruction to be executed, and the process by which the running device executes the running instruction are described below with specific examples.
宏指令的指令格式可以为如下格式示例。The command format of the macro command can be the following format example.
神经网络计算宏指令的指令格式可以是:The instruction format of the neural network calculation macro instruction can be:
Type device_id,input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,[param1,param2,…]Type device_id, input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, [param1, param2,…]
其中，Type为操作类型，device_id为指定设备的标识，input_addr为输入地址，output_addr为输出地址，input_h、input_w、input_c为输入的神经元规模（即输入量），output_h、output_w、output_c为输出的神经元规模（即输出量），param1、param2为指令参数。Here, Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_h, input_w, input_c are the input neuron scale (i.e., the input amount), output_h, output_w, output_c are the output neuron scale (i.e., the output amount), and param1, param2 are instruction parameters.
对于神经网络计算宏指令，其必须包含操作类型、输入地址和输出地址，且根据神经网络计算宏指令所生成的运行指令也须包含操作类型、运行输入地址和运行输出地址，运行输入地址和运行输出地址是分别根据输入地址、输出地址确定的。A neural network calculation macro instruction must include the operation type, the input address, and the output address, and the running instruction generated from it must also include the operation type, a running input address, and a running output address; the running input address and the running output address are determined according to the input address and the output address, respectively.
以卷积计算宏指令为例,其指令格式为:CONV device_id,input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,kernel,stride,pad。调用时卷积计算宏指令可以为如下示例:Taking the convolution calculation macro instruction as an example, the instruction format is: CONV device_id, input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, kernel, stride, pad. The convolution calculation macro instruction when called can be the following example:
@CONV#0,#4,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0@ CONV # 0, # 4, # 500, # 5, # 5, # 32, # 3, # 3, # 16, # 3, # 1, # 0
其中，该卷积计算宏指令的操作类型为CONV。指定设备为设备0。数据的输入地址为地址4。数据的输出地址为地址500。数据的输入量为5x5x32。数据的输出量为3x3x16。卷积核的大小为3，卷积核的步长为1，卷积核的填充为0。The operation type of this convolution calculation macro instruction is CONV; the designated device is device 0; the data input address is address 4 and the data output address is address 500; the input amount is 5x5x32 and the output amount is 3x3x16; the convolution kernel size is 3, the stride is 1, and the padding is 0.
若根据上述卷积计算宏指令示例生成的运行指令为"@CONV#4,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0"为例。运行设备在接收到该运行指令之后，其执行过程为：从地址4处获取到输入量为5x5x32的数据。按照该运行指令中卷积核的大小（3）、步长（1）和填充（0）对输入量为5x5x32的数据进行卷积运算，得到数据量为3x3x16的执行结果时，将该执行结果存储至输出地址500。Take as an example the running instruction "@CONV#4,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0" generated from the above convolution calculation macro instruction example. After receiving this running instruction, the running device executes it as follows: fetch the 5x5x32 input data from address 4; perform a convolution on that data with the kernel size (3), stride (1), and padding (0) given in the running instruction; and when the 3x3x16 execution result is obtained, store it at output address 500.
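The call format shown in these examples (an "@"-prefixed operation type followed by "#"-prefixed operands) lends itself to a simple parser. The sketch below is illustrative only and not part of the disclosure; the function name is hypothetical, and it assumes all operands are non-negative decimal integers as in the examples.

```python
import re

def parse_macro(text):
    # Parse a macro instruction such as "@CONV#0,#4,#500,..." into its
    # operation type and a list of integer operands, following the call
    # format used in the examples in the text.
    m = re.fullmatch(r"@([A-Z]+)((?:#\d+,?)*)", text)
    if m is None:
        raise ValueError("not a macro instruction: %r" % text)
    op = m.group(1)
    operands = [int(tok) for tok in m.group(2).replace(",", "").split("#") if tok]
    return op, operands
```

Applied to the convolution example above, the operation type is CONV and the operands are [0, 4, 500, 5, 5, 32, 3, 3, 16, 3, 1, 0]; the kernel, stride, and padding fields (3, 1, 0) are consistent with the stated output height (5 - 3 + 2*0) // 1 + 1 = 3.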
向量逻辑计算宏指令的指令格式可以是:The instruction format of the vector logic calculation macro instruction can be:
Type device_id,input_addr,output_addr,input_size,output_size,[param1,param2,…]Type device_id, input_addr, output_addr, input_size, output_size, [param1, param2,…]
其中，Type为操作类型，device_id为指定设备的标识，input_addr为输入地址，output_addr为输出地址，input_size为输入向量的大小（即输入量），output_size为输出向量的大小（即输出量），param1、param2为指令参数。指令参数可以是第二个操作数的地址和长度。Here, Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (i.e., the input amount), output_size is the size of the output vector (i.e., the output amount), and param1, param2 are instruction parameters. An instruction parameter may be the address and length of the second operand.
对于向量逻辑计算宏指令，其必须包含操作类型、输入地址和输出地址，且根据向量逻辑计算宏指令所生成的运行指令也须包含操作类型、运行输入地址和运行输出地址，运行输入地址和运行输出地址是分别根据输入地址、输出地址确定的。A vector logic calculation macro instruction must include the operation type, the input address, and the output address, and the running instruction generated from it must also include the operation type, a running input address, and a running output address; the running input address and the running output address are determined according to the input address and the output address, respectively.
以根据某个向量逻辑计算宏指令所生成的运行指令为“@VAND#501,#7,#33,#4”为例。运行设备在接收到该运行指令后,其执行过程为:从输入地址501处获取到大小为33的输入向量,对该输入向量进行“与”逻辑运算,获得大小为4的输出向量,并将该大小为4的输出向量作为执行结果存储至输出地址7处。Take the running instruction generated by a vector logic calculation macro instruction as "@ VAND # 501, # 7, # 33, # 4" as an example. After the running device receives the running instruction, its execution process is: obtaining an input vector of size 33 from the input address 501, performing an AND logic operation on the input vector to obtain an output vector of size 4, and The output vector of size 4 is stored as the execution result at output address 7.
矩阵向量计算宏指令的指令格式可以是:The instruction format of the matrix vector calculation macro instruction can be:
Type device_id,input_addr,output_addr,input_size,output_size,[param1,param2,…]Type device_id, input_addr, output_addr, input_size, output_size, [param1, param2,…]
其中，Type为操作类型，device_id为指定设备的标识，input_addr为输入地址，output_addr为输出地址，input_size为输入向量的大小（即输入量），output_size为输出向量的大小（即输出量），param1、param2为指令参数。指令参数可以是第二个操作数的地址和长度。Here, Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (i.e., the input amount), output_size is the size of the output vector (i.e., the output amount), and param1, param2 are instruction parameters. An instruction parameter may be the address and length of the second operand.
对于矩阵向量计算宏指令，其必须包含操作类型、输入地址和输出地址，且根据矩阵向量计算宏指令所生成的运行指令也须包含操作类型、运行输入地址和运行输出地址，运行输入地址和运行输出地址是分别根据输入地址、输出地址确定的。A matrix-vector calculation macro instruction must include the operation type, the input address, and the output address, and the running instruction generated from it must also include the operation type, a running input address, and a running output address; the running input address and the running output address are determined according to the input address and the output address, respectively.
以根据某个矩阵向量计算宏指令生成的运行指令为“@MADD#502,#8,#34,#5”为例。运行设备在接收到该运行指令后,其执行过程为:从输入地址502处获得大小为34的输入矩阵向量。对该输入矩阵向量进行矩阵相加计算,获得大小为5的输出矩阵向量。将该大小为5的输出矩阵向量作为执行结果存储至存储地址8处。Take the running instruction generated by calculating a macro instruction according to a matrix vector as "@ MADD # 502, # 8, # 34, # 5" as an example. After the running device receives the running instruction, its execution process is: obtaining an input matrix vector of size 34 from the input address 502. Matrix addition calculation is performed on the input matrix vector to obtain an output matrix vector of size 5. The output matrix vector of size 5 is stored as the execution result at the storage address 8.
标量计算宏指令的指令格式可以是:The instruction format of the scalar calculation macro instruction can be:
Type device_id,op1,op2,ansType device_id, op1, op2, ans
其中,Type为操作类型,device_id为指定设备的标识,op1、op2为两个操作数。Ans为标量计算宏指令计算结果的存放地址或者用于存放计算结果的寄存器的标识。其中,所获取的标量的大小、对获取的标量进行计算所输出的标量的大小可以预先设定。Among them, Type is the operation type, device_id is the identifier of the specified device, and op1 and op2 are the two operands. Ans is the storage address of the scalar calculation macro instruction calculation result or the identifier of the register used to store the calculation result. The size of the acquired scalar and the size of the scalar output by calculating the acquired scalar can be set in advance.
对于标量计算宏指令,其必须包含操作类型、第一操作数、第二操作数和输出地址。且根据标量计算宏指令所生成的运行指令也须包含操作类型、第一运行操作数、第二运行操作数和运行输出地址。其中,第一运行操作数、第二运行操作数和运行输出地址是分别根据第一操作数、第二操作数和输出地址确定的。For a scalar calculation macro, it must contain the operation type, first operand, second operand, and output address. And the run instruction generated according to the scalar calculation macro instruction must also include the operation type, the first run operand, the second run operand and the run output address. Among them, the first operation operand, the second operation operand, and the operation output address are determined according to the first operand, the second operand, and the output address, respectively.
以根据某个标量计算宏指令生成的运行指令为"@SADD#503,#504,#3"为例。运行设备在接收到该运行指令后，其执行过程为：从寄存器的地址503中获取第一标量以及从寄存器的地址504中获取第二标量，将第一标量和第二标量相加，并将相加计算所获得的结果作为执行结果存储至寄存器的存储地址3处。Take as an example the running instruction "@SADD#503,#504,#3" generated from a scalar calculation macro instruction. After receiving this running instruction, the running device executes it as follows: fetch a first scalar from register address 503 and a second scalar from register address 504, add the two scalars, and store the result of the addition as the execution result at register storage address 3.
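The SADD execution process above can be sketched against a toy register file. This is illustrative only; the register contents (7 and 5) and the function name are hypothetical.

```python
registers = {503: 7, 504: 5}  # hypothetical register-file contents

def exec_sadd(src1, src2, dst):
    # Sketch of executing "@SADD#503,#504,#3": read the two scalars
    # from register addresses src1 and src2, add them, and store the
    # sum as the execution result at register address dst.
    registers[dst] = registers[src1] + registers[src2]
    return registers[dst]
```

With the hypothetical initial values, exec_sadd(503, 504, 3) leaves 12 at register address 3.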
标量逻辑计算宏指令的指令格式可以是:The instruction format of the scalar logic calculation macro instruction can be:
Type device_id,op1,op2,ansType device_id, op1, op2, ans
其中,Type为操作类型,device_id为指定设备的标识,op1、op2为两个操作数。Ans为标量逻辑计算宏指令计算结果的存放地址或者用于存放计算结果的寄存器的标识。其中,所获取的标量的大小、对获取的标量进行计算所输出的标量的大小可以预先设定。Among them, Type is the operation type, device_id is the identifier of the specified device, and op1 and op2 are the two operands. Ans is the storage address of the scalar logic calculation macro instruction calculation result or the identifier of the register used to store the calculation result. The size of the acquired scalar and the size of the scalar output by calculating the acquired scalar can be set in advance.
对于标量逻辑计算宏指令,其必须包含操作类型、第一操作数、第二操作数和输出地址。且根据标量逻辑计算宏指令所生成的运行指令也须包含操作类型、第一运行操作数、第二运行操作数和运行输出地址。其中,第一运行操作数、第二运行操作数和运行输出地址是分别根据第一操作数、第二操作数和输出地址确定的。For a scalar logic calculation macro instruction, it must contain the operation type, first operand, second operand, and output address. And the operation instruction generated by calculating the macro instruction according to the scalar logic must also include the operation type, the first operation operand, the second operation operand and the operation output address. Among them, the first operation operand, the second operation operand, and the operation output address are determined according to the first operand, the second operand, and the output address, respectively.
以根据某个标量逻辑计算宏指令生成的运行指令为“@SAND#703,#704,#8”为例。运行设备在接收到该运行指令后,其执行过程为:从寄存器的地址703中获取第一标量以及从寄存器的地址704中获取第二标量,对第一标量和第二标量进行“与”逻辑运算,并将所获得的结果作为执行结果存储至寄存器的存储地址8处。Take the running instruction generated by a macro instruction calculated according to a scalar logic as "@ SAND # 703, # 704, # 8" as an example. After the running device receives the running instruction, its execution process is: acquiring the first scalar from the register address 703 and the second scalar from the register address 704, and performing an AND logic on the first scalar and the second scalar Calculate and store the obtained result as the execution result to the storage address 8 of the register.
无条件跳转宏指令的指令格式可以是:The instruction format of an unconditional jump macro instruction can be:
Jump device_id,srcJump device_id, src
其中,Jump为无条件跳转宏指令所对应的操作类型,device_id为指定设备的标识,src为指令流所需跳转到的目标跳转位置。目标跳转位置可以是寄存器的长度、寄存器的地址、寄存器的标识、立即数等。Among them, Jump is the operation type corresponding to the unconditional jump macro instruction, device_id is the identifier of the specified device, and src is the target jump position to which the instruction stream needs to jump. The target jump position may be the length of the register, the address of the register, the identifier of the register, the immediate value, and so on.
对于无条件跳转宏指令,其必须包含操作类型和目标跳转位置,且根据无条件跳转宏指令所生成的运行指令也须包含操作类型和运行目标跳转位置。其中,运行目标跳转位置是根据目标跳转位置确定的。For an unconditional jump macro instruction, it must include the operation type and target jump position, and the run instruction generated according to the unconditional jump macro instruction must also include the operation type and run target jump position. Among them, the running target jump position is determined according to the target jump position.
以根据某个无条件跳转宏指令所生成的运行指令为“@Jump#505”为例。运行设备在接收到该运行指令后,其执行过程为:将当前指令流跳转至地址505处继续执行。Take the running instruction generated according to an unconditional jump macro instruction as "@ Jump # 505" as an example. After the running device receives the running instruction, its execution process is: jump the current instruction stream to the address 505 to continue execution.
有条件跳转宏指令的指令格式可以是:The instruction format of the conditional jump macro instruction can be:
CB device_id,src,conditionCB device_id, src, condition
其中,CB为有条件跳转宏指令所对应的操作类型,device_id为指定设备的标识,src为指令流所需跳转到的目标跳转位置,condition为跳转的条件。例如,condition可以为“寄存器的值为零是否为真”,在寄存器的值为零时,可以跳转至目标跳转位置。目标跳转位置可以是寄存器的长度、寄存器的地址、寄存器的标识、立即数等。Among them, CB is the operation type corresponding to the conditional jump macro instruction, device_id is the identifier of the specified device, src is the target jump position to which the instruction stream needs to jump, and condition is the jump condition. For example, condition can be "whether the value of the register is zero is true", when the value of the register is zero, you can jump to the target jump position. The target jump position may be the length of the register, the address of the register, the identifier of the register, the immediate value, and so on.
对于有条件跳转宏指令,其必须包含操作类型和目标跳转位置,且根据有条件跳转宏指令所生成的运行指令也须包含操作类型和运行目标跳转位置。其中,运行目标跳转位置是根据目标跳转位置确定的。For the conditional jump macro instruction, it must include the operation type and target jump position, and the run instruction generated according to the conditional jump macro instruction must also include the operation type and run target jump position. Among them, the running target jump position is determined according to the target jump position.
以根据某个有条件跳转宏指令所生成的运行指令为“@CB#506#h”为例。运行设备在接收到该运行指令后,其执行过程为:判断其跳转条件“h”是否为真,在“h”为真时,将当前指令流跳转至地址506处继续执行。Take the running instruction generated according to a conditional jump macro as "@ CB # 506 # h" as an example. After the running device receives the running instruction, its execution process is: judging whether its jump condition "h" is true, and when "h" is true, jump the current instruction stream to the address 506 to continue execution.
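The instruction-flow control described for the conditional jump can be sketched as a program-counter update. This is an illustrative sketch, not the disclosed implementation; the (op, target) instruction tuple and the unit program-counter increment are assumptions.

```python
def next_pc(pc, instruction, condition_holds):
    # Sketch of "@CB#506#h": when the jump condition holds, the
    # instruction stream jumps to the target address; otherwise it
    # falls through to the next instruction.
    op, target = instruction
    if op == "CB" and condition_holds:
        return target
    return pc + 1
```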
读神经元宏指令的指令格式可以是:The instruction format for reading neuron macro instructions can be:
NLOAD device_id,src_addr,des_addr,size NLOAD device_id, src_addr, des_addr, size
其中，NLOAD为读神经元宏指令所对应的操作类型，device_id为指定设备的标识，src_addr为读取神经元数据的数据读入地址，des_addr为存储读取神经元数据所需的加密方式的数据加密方式地址，size为神经元数据的读入量。Here, NLOAD is the operation type corresponding to the read-neuron macro instruction, device_id is the identifier of the designated device, src_addr is the data read-in address for the neuron data, des_addr is the data encryption mode address storing the encryption method required to read the neuron data, and size is the read-in amount of the neuron data.
以根据某个读神经元宏指令所生成的运行指令为“@NLOAD#505#506#9”为例。运行设备在接收到该运行指令后,其执行过程为:从地址506处获取到神经元数据的加密方式,根据该加密方式从地址505处读取读入量为9的神经元数据。Take the running instruction generated according to a certain read neuron macro instruction as "@ NLOAD # 505 # 506 # 9" as an example. After the operation device receives the operation instruction, its execution process is: obtaining the encryption method of the neuron data from the address 506, and reading the neuron data with a read amount of 9 from the address 505 according to the encryption method.
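The two-step NLOAD execution (look up the encryption method, then read the data) can be sketched against a toy memory. The memory contents and function name are hypothetical, and the encryption method is treated as an opaque tag rather than applied.

```python
memory = {505: list(range(20)), 506: "xor"}  # hypothetical contents

def exec_nload(src_addr, des_addr, size):
    # Sketch of executing "@NLOAD#505#506#9": first fetch the encryption
    # method stored at the data-encryption-mode address, then read
    # `size` neuron values from the data read-in address.
    method = memory[des_addr]
    data = memory[src_addr][:size]
    return method, data
```

The WLOAD and SLOAD instructions described below follow the same pattern, with synapse and scalar data in place of neuron data.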
读突触宏指令的指令格式可以是:The command format for reading synaptic macros can be:
WLOAD device_id,src_addr,des_addr,size WLOAD device_id, src_addr, des_addr, size
其中，WLOAD为读突触宏指令所对应的操作类型，device_id为指定设备的标识，src_addr为读取突触数据的数据读入地址，des_addr为存储读取突触数据所需的加密方式的数据加密方式地址，size为突触数据的读入量。Here, WLOAD is the operation type corresponding to the read-synapse macro instruction, device_id is the identifier of the designated device, src_addr is the data read-in address for the synapse data, des_addr is the data encryption mode address storing the encryption method required to read the synapse data, and size is the read-in amount of the synapse data.
以根据某个读突触宏指令所生成的运行指令为“@WLOAD#507#508#10”为例。运行设备在接收到该运行指令后,其执行过程为:从地址508处获取到读入突触数据的加密方式,根据该加密方式从地址507处读取读入量为10的突触数据。Take the operation command generated according to a certain read synaptic macro as "@ WLOAD # 507 # 508 # 10" as an example. After the operation device receives the operation instruction, its execution process is: acquiring the encryption method of reading synaptic data from the address 508, and reading the synaptic data with the reading amount of 10 from the address 507 according to the encryption method.
读标量宏指令的指令格式可以是:The command format of the read scalar macro command can be:
SLOAD device_id,src,desSLOAD device_id, src, des
其中,SLOAD为读标量宏指令所对应的操作类型,device_id为指定设备的标识,src为读取标量的数据读入地址,des为存储读取标量数据所需的加密方式的数据加密方式地址。Among them, SLOAD is the operation type corresponding to the read scalar macro instruction, device_id is the identification of the specified device, src is the data read-in address for reading scalar data, and des is the data encryption method address for storing the encryption method required for reading scalar data.
以根据某个读标量宏指令所生成的运行指令为“@SLOAD#601#602”为例。运行设备在接收到该运行指令后,其执行过程为:从地址602处获取到标量数据的加密方式,根据该加密方式从地址601处读取其存储的标量数据。Take the operation command generated according to a certain read scalar macro command as "@ SLOAD # 601 # 602" as an example. After the operation device receives the operation instruction, its execution process is: acquiring the encryption method of the scalar data from the address 602, and reading the stored scalar data from the address 601 according to the encryption method.
其中,读神经元宏指令、读突触宏指令和读标量宏指令中所包含的数据读入地址以及数据加密方式地址,可以是寄存器的地址、编号、名称等标识。对于读神经元宏指令、读突触宏指令和读标量宏指令,其必须包含操作类型、数据读入地址、数据加密方式地址,运行指令中也须包含操作类型、运行数据读入地址和运行数据加密方式地址。其中,运行数据读入地址和运行数据加密方式地址分别是根据数据读入地址和数据加密方式地址确定的。Among them, the data read-in address and the data encryption method address contained in the read neuron macro instruction, the read synapse macro instruction and the read scalar macro instruction may be identifiers such as the address, number or name of a register. A read neuron macro instruction, read synapse macro instruction or read scalar macro instruction must include the operation type, the data read-in address and the data encryption method address, and the corresponding running instruction must also include the operation type, the running-data read-in address and the running-data encryption method address. Among them, the running-data read-in address and the running-data encryption method address are determined according to the data read-in address and the data encryption method address, respectively.
写神经元宏指令的指令格式可以是:The instruction format for writing a neuron macro instruction can be:
NSTORE device_id,src_addr,des_addr,sizeNSTORE device_id, src_addr, des_addr, size
其中,NSTORE为写神经元宏指令所对应的操作类型,device_id为指定设备的标识,src_addr为写入神经元数据的数据写入地址,des_addr为存储写入神经元数据所需的加密方式的数据加密方式地址,size为数据的写入量。Among them, NSTORE is the operation type corresponding to the write neuron macro instruction, device_id is the identifier of the specified device, src_addr is the data write address for writing neuron data, des_addr is the data encryption method address that stores the encryption method required for writing the neuron data, and size is the write amount of the data.
以根据某个写神经元宏指令所生成的运行指令为“@NSTORE#603#604#14”为例。运行设备在接收到该运行指令后,其执行过程为:从地址604处获取到神经元数据的加密方式,根据该加密方式将写入量为14的待写入神经元数据写入地址603处。Take the running instruction "@NSTORE#603#604#14" generated according to a certain write neuron macro instruction as an example. After receiving the running instruction, the running device executes it as follows: the encryption method of the neuron data is obtained from address 604, and according to the encryption method the neuron data to be written, with a write amount of 14, is written to address 603.
写突触宏指令的指令格式可以是:The command format for writing synaptic macros can be:
WSTORE device_id,src_addr,des_addr,sizeWSTORE device_id, src_addr, des_addr, size
其中,WSTORE为写突触宏指令所对应的操作类型,device_id为指定设备的标识,src_addr为写入突触数据的数据写入地址,des_addr为存储写入突触数据所需的加密方式的数据加密方式地址,size为数据的写入量。Among them, WSTORE is the operation type corresponding to the write synapse macro instruction, device_id is the identifier of the specified device, src_addr is the data write address for writing synapse data, des_addr is the data encryption method address that stores the encryption method required for writing the synapse data, and size is the write amount of the data.
以根据某个写突触宏指令所生成的运行指令为“@WSTORE#605#606#15”为例。运行设备在接收到该运行指令后,其执行过程为:从地址606处获取到突触数据的加密方式,根据该加密方式将写入量为15的待写入突触数据写入地址605处。Take the running instruction "@WSTORE#605#606#15" generated according to a certain write synapse macro instruction as an example. After receiving the running instruction, the running device executes it as follows: the encryption method of the synapse data is obtained from address 606, and according to the encryption method the synapse data to be written, with a write amount of 15, is written to address 605.
写标量宏指令的指令格式可以是:The instruction format for writing a scalar macro instruction can be:
SSTORE device_id,src,desSSTORE device_id, src, des
其中,SSTORE为写标量宏指令所对应的操作类型,device_id为指定设备的标识,src为写入标量数据的数据写入地址,des为存储写入标量数据所需的加密方式的数据加密方式地址。Among them, SSTORE is the operation type corresponding to the write scalar macro instruction, device_id is the identifier of the specified device, src is the data write address for writing scalar data, and des is the data encryption method address that stores the encryption method required for writing the scalar data.
以根据某个写标量宏指令所生成的运行指令为“@SSTORE#607#608”为例。运行设备在接收到该运行指令后,其执行过程为:从地址608处获取到标量数据的加密方式,根据该加密方式将待写入的标量数据写入地址607处。Take the running instruction "@SSTORE#607#608" generated according to a certain write scalar macro instruction as an example. After receiving the running instruction, the running device executes it as follows: the encryption method of the scalar data is obtained from address 608, and according to the encryption method the scalar data to be written is written to address 607.
其中,写神经元宏指令、写突触宏指令和写标量宏指令中所包含的数据写入地址和数据加密方式地址可以是寄存器的地址、编号、名称等标识。对于写神经元宏指令、写突触宏指令和写标量宏指令,其必须包含操作类型、数据写入地址、数据加密方式地址,运行指令中也须包含操作类型、运行数据写入地址和运行数据加密方式地址。其中,运行数据写入地址和运行数据加密方式地址分别是根据数据写入地址和数据加密方式地址确定的。Among them, the data write address and the data encryption method address contained in the write neuron macro instruction, the write synapse macro instruction and the write scalar macro instruction may be identifiers such as the address, number or name of a register. A write neuron macro instruction, write synapse macro instruction or write scalar macro instruction must include the operation type, the data write address and the data encryption method address, and the corresponding running instruction must also include the operation type, the running-data write address and the running-data encryption method address. Among them, the running-data write address and the running-data encryption method address are determined according to the data write address and the data encryption method address, respectively.
待执行指令的指令格式可以为如下格式示例。The instruction format of the instruction to be executed may be the following format example.
待执行神经网络计算指令的指令格式可以是:The instruction format of the neural network calculation instruction to be executed may be:
Type input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,[param1,param2,…]Type input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, [param1, param2,…]
其中,Type为操作类型,input_addr为输入地址,output_addr为输出地址,input_h、input_w、input_c为输入的神经元规模(即输入量),output_h、output_w、output_c为输出的神经元规模(即输出量),param1、param2为指令参数。Among them, Type is the operation type, input_addr is the input address, output_addr is the output address, input_h, input_w and input_c are the input neuron scale (i.e. the input amount), output_h, output_w and output_c are the output neuron scale (i.e. the output amount), and param1, param2 are instruction parameters.
以待执行卷积指令为例,其指令格式为CONV input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,kernel,stride,pad。调用时待执行卷积指令可以为:Taking the convolution instruction to be executed as an example, the instruction format is CONV input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, kernel, stride, pad. The convolution instruction to be executed when called can be:
@CONV#6,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0@ CONV # 6, # 500, # 5, # 5, # 32, # 3, # 3, # 16, # 3, # 1, # 0
其中,该待执行卷积指令的操作类型为卷积神经网络计算。数据的输入地址为地址6。数据的输出地址为地址500。数据的输入量为5x5x32。数据的输出量为3x3x16。卷积核的大小为3,卷积核的步长为1,卷积核的填充为0。The operation type of the convolution instruction to be executed is convolution neural network calculation. The input address of the data is address 6. The output address of the data is address 500. The input amount of data is 5x5x32. The data output is 3x3x16. The size of the convolution kernel is 3, the step size of the convolution kernel is 1, and the padding of the convolution kernel is 0.
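上述卷积示例中的输出空间规模可以用标准的卷积输出尺寸公式验证,下述Python代码仅为示意(非专利原文内容)。The output spatial scale in the convolution example above can be checked with the standard convolution output-size formula; the following Python sketch is illustrative only (not part of the original disclosure).

```python
def conv_output_size(input_size, kernel, stride, pad):
    """Standard convolution output-size formula:
    floor((input - kernel + 2 * pad) / stride) + 1."""
    return (input_size - kernel + 2 * pad) // stride + 1

# Example above: input 5x5x32, kernel 3, stride 1, pad 0
# -> spatial output (5 - 3 + 0) // 1 + 1 = 3, i.e. 3x3
h = conv_output_size(5, 3, 1, 0)
```

输出通道数16由卷积核的个数决定,与该空间尺寸公式无关。The 16 output channels are determined by the number of convolution kernels, not by this spatial-size formula.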
待执行向量逻辑计算指令的指令格式可以是:The instruction format of the vector logic calculation instruction to be executed may be:
Type input_addr,output_addr,input_size,output_size,[param1,param2,…]Type input_addr, output_addr, input_size, output_size, [param1, param2,…]
其中,Type为操作类型,input_addr为输入地址,output_addr为输出地址,input_size为输入向量的大小(即输入量),output_size为输出向量的大小(即输出量),param1、param2为指令参数。指令参数可以是第二个操作数的地址和长度。Among them, Type is the operation type, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (that is, the input amount), output_size is the size of the output vector (that is, the output amount), and param1 and param2 are the instruction parameters. The instruction parameter can be the address and length of the second operand.
待执行矩阵向量计算指令的指令格式可以是:The instruction format of the matrix vector calculation instruction to be executed may be:
Type input_addr,output_addr,input_size,output_size,[param1,param2,…]Type input_addr, output_addr, input_size, output_size, [param1, param2,…]
其中,Type为操作类型,input_addr为输入地址,output_addr为输出地址,input_size为输入向量的大小(即输入量),output_size为输出向量的大小(即输出量),param1、param2为指令参数。指令 参数可以是第二个操作数的地址和长度。Among them, Type is the operation type, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (that is, the input amount), output_size is the size of the output vector (that is, the output amount), and param1 and param2 are the instruction parameters. The instruction parameter can be the address and length of the second operand.
待执行标量计算指令的指令格式可以是:The instruction format of the scalar calculation instruction to be executed may be:
Type op1,op2,ansType op1, op2, ans
其中,Type为操作类型,op1、op2为两个操作数。Ans为待执行标量计算指令计算结果的存放地址或者用于存放计算结果的寄存器的标识。Among them, Type is the operation type, op1, op2 are two operands. Ans is the storage address of the calculation result of the scalar calculation instruction to be executed or the identifier of the register used to store the calculation result.
待执行标量逻辑计算指令的指令格式可以是:The instruction format of the scalar logic calculation instruction to be executed may be:
Type op1,op2,ansType op1, op2, ans
其中,Type为操作类型,op1、op2为两个操作数。Ans为待执行标量逻辑计算指令计算结果的存放地址或者用于存放计算结果的寄存器的标识。Among them, Type is the operation type, op1, op2 are two operands. Ans is the storage address of the calculation result of the scalar logic calculation instruction to be executed or the identifier of the register used to store the calculation result.
待执行无条件跳转指令的指令格式可以是:The instruction format of the unconditional jump instruction to be executed may be:
Jump srcJump src
其中,Jump为待执行无条件跳转指令所对应的操作类型,src为指令流所需跳转到的目标跳转位置。Jump is the type of operation corresponding to the unconditional jump instruction to be executed, and src is the target jump position to which the instruction stream needs to jump.
待执行有条件跳转指令的指令格式可以是:The instruction format of the conditional jump instruction to be executed may be:
CB src,conditionCB src, condition
其中,CB为待执行有条件跳转指令所对应的操作类型,src为指令流所需跳转到的目标跳转位置,condition为跳转的条件。例如,condition可以为“寄存器的值为零是否为真”,在寄存器的值为零时,可以跳转至目标跳转位置。Among them, CB is the operation type corresponding to the conditional jump instruction to be executed, src is the target jump position to which the instruction stream needs to jump, and condition is the jump condition. For example, condition may be "whether the value of the register is zero"; when the value of the register is zero, the instruction stream jumps to the target jump position.
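无条件跳转与有条件跳转对指令流位置的影响可以用如下Python代码示意(指令的元组表示方式为便于说明的假设,非专利原文内容)。The effect of the unconditional and conditional jump instructions on the position in the instruction stream can be sketched as follows (the tuple representation of instructions is an assumption for illustration, not part of the original disclosure).

```python
def next_position(pc, instr, registers):
    """Return the next instruction-stream position.

    instr is a hypothetical tuple: ("Jump", target) or
    ("CB", target, register_name). The CB condition modeled here is
    "the value of the register is zero", as in the example above.
    """
    op = instr[0]
    if op == "Jump":
        return instr[1]                       # unconditional jump to target
    if op == "CB":
        target, reg = instr[1], instr[2]
        # jump only when the condition holds (register value is zero)
        return target if registers.get(reg, 0) == 0 else pc + 1
    return pc + 1                             # fall through to next instruction
```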
待执行读神经元指令的指令格式可以是:The instruction format of the read neuron instruction to be executed may be:
NLOAD src_addr,des_addr,sizeNLOAD src_addr, des_addr, size
其中,NLOAD为待执行读神经元指令所对应的操作类型,src_addr为读取神经元数据的数据读入地址,des_addr为存储读取神经元数据所需的加密方式的数据加密方式地址,size为神经元数据的读入量。Among them, NLOAD is the operation type corresponding to the read neuron instruction to be executed, src_addr is the data read-in address for reading neuron data, des_addr is the data encryption method address that stores the encryption method required for reading the neuron data, and size is the read-in amount of the neuron data.
待执行读突触指令的指令格式可以是:The command format of the read synaptic command to be executed may be:
WLOAD src_addr,des_addr,sizeWLOAD src_addr, des_addr, size
其中,WLOAD为待执行读突触指令所对应的操作类型,src_addr为读取突触数据的数据读入地址,des_addr为存储读取突触数据所需的加密方式的数据加密方式地址,size为突触数据的读入量。Among them, WLOAD is the operation type corresponding to the read synapse instruction to be executed, src_addr is the data read-in address for reading synapse data, des_addr is the data encryption method address that stores the encryption method required for reading the synapse data, and size is the read-in amount of the synapse data.
待执行读标量指令的指令格式可以是:The instruction format of the read scalar instruction to be executed may be:
SLOAD src,desSLOAD src, des
其中,SLOAD为待执行读标量指令所对应的操作类型,src为读取标量数据的数据读入地址,des为存储读取标量数据所需的加密方式的数据加密方式地址。Among them, SLOAD is the type of operation corresponding to the scalar read instruction to be executed, src is the data read-in address for reading scalar data, and des is the data encryption method address for storing the encryption method required for reading scalar data.
待执行写神经元指令的指令格式可以是:The format of the instruction to be executed to write the neuron may be:
NSTORE src_addr,des_addr,sizeNSTORE src_addr, des_addr, size
其中,NSTORE为待执行写神经元指令所对应的操作类型,src_addr为写入神经元数据的数据写入地址,des_addr为存储写入神经元数据所需的加密方式的数据加密方式地址,size为神经元数据的写入量。Among them, NSTORE is the operation type corresponding to the write neuron instruction to be executed, src_addr is the data write address for writing neuron data, des_addr is the data encryption method address that stores the encryption method required for writing the neuron data, and size is the write amount of the neuron data.
待执行写突触指令的指令格式可以是:The instruction format of the write synaptic instruction to be executed may be:
WSTORE src_addr,des_addr,sizeWSTORE src_addr, des_addr, size
其中,WSTORE为待执行写突触指令所对应的操作类型,src_addr为写入突触数据的数据写入地址,des_addr为存储写入突触数据所需的加密方式的数据加密方式地址,size为突触数据的写入量。Among them, WSTORE is the operation type corresponding to the write synapse instruction to be executed, src_addr is the data write address for writing synapse data, des_addr is the data encryption method address that stores the encryption method required for writing the synapse data, and size is the write amount of the synapse data.
待执行写标量指令的指令格式可以是:The instruction format of the write scalar instruction to be executed may be:
SSTORE src,desSSTORE src, des
其中,SSTORE为待执行写标量指令所对应的操作类型,src为写入标量数据的数据写入地址,des为存储写入标量数据所需的加密方式的数据加密方式地址。Among them, SSTORE is the operation type corresponding to the write scalar instruction to be executed, src is the data write address for writing scalar data, and des is the data encryption method address that stores the encryption method required for writing the scalar data.
应当理解的是,本文中宏指令中的“@”、“#”仅用于分隔宏指令中所记载的参数,是为了便于技术人员理解,在实际使用过程中,“@”、“#”并不是宏指令中所需包含的内容。It should be understood that the "@" and "#" in the macro instructions herein are only used to separate the parameters recorded in the macro instructions, so as to facilitate understanding by those skilled in the art; in actual use, "@" and "#" are not content that the macro instructions need to contain.
应用示例Application examples
图8a、图8b示出根据本公开一实施例的神经网络指令生成装置、处理系统的应用场景的示意图。如图8a、图8b所示,用于执行宏指令的备选设备可以为多个,备选设备可以为CPU-1、CPU-2、…、CPU-n,NPU-1、NPU-2、…、NPU-n和GPU-1、GPU-2、…、GPU-n。8a and 8b are schematic diagrams illustrating application scenarios of a neural network instruction generation device and a processing system according to an embodiment of the present disclosure. As shown in FIGS. 8a and 8b, there may be multiple candidate devices for executing macro instructions. The candidate devices may be CPU-1, CPU-2, ..., CPU-n, NPU-1, NPU-2, ..., NPU-n and GPU-1, GPU-2, ..., GPU-n.
示例1Example 1
以下结合“神经网络指令生成装置根据宏指令生成运行指令的工作过程”作为一个示例性应用场景,给出根据本公开实施例的应用示例,以便于理解神经网络指令生成装置的流程。本领域技术人员应理解,以下应用示例仅仅是出于便于理解本公开实施例的目的,不应视为对本公开实施例的限制。Taking "the working process of the neural network instruction generation device generating running instructions according to macro instructions" as an exemplary application scenario, an application example according to an embodiment of the present disclosure is given below to facilitate understanding of the flow of the neural network instruction generation device. Those skilled in the art should understand that the following application example is only for the purpose of facilitating understanding of the embodiments of the present disclosure, and should not be considered as a limitation to the embodiments of the present disclosure.
神经网络指令生成装置根据某宏指令生成运行指令的工作过程及原理如下。The working process and principle of the neural network command generating device generating the running command according to a certain macro command are as follows.
资源获取模块14 Resource acquisition module 14
获取备选设备的资源信息,该资源信息包括备选设备的剩余存储容量、存储容量和备选设备所包含的指令集。资源获取模块14将获取到的备选设备的资源信息发送至设备确定模块11和指令生成模块12。Acquire resource information of the candidate device, the resource information includes the remaining storage capacity, storage capacity of the candidate device, and the instruction set included in the candidate device. The resource acquisition module 14 sends the acquired resource information of the candidate device to the device determination module 11 and the instruction generation module 12.
设备确定模块11(包括第一确定子模块111、第二确定子模块112和第三确定子模块113)Device determination module 11 (including first determination sub-module 111, second determination sub-module 112, and third determination sub-module 113)
在接收到宏指令时,根据接收到的宏指令,确定执行宏指令的运行设备。例如,接收到如下宏指令。其中,宏指令可以是来自不同的平台的。When a macro instruction is received, the operating device that executes the macro instruction is determined according to the received macro instruction. For example, the following macro command is received. Among them, the macro instructions can be from different platforms.
宏指令1:@XXX#01……Macro 1: @ XXX # 01 ...
宏指令2:@SSS#02……Macro 2: @ SSS # 02 ...
宏指令3:@DDD#04……Macro 3: @ DDD # 04 ...
宏指令4:@NNN……Macro 4: @NNN ...
第一确定子模块111在确定在宏指令中包含指定设备的标识,且确定该指定设备中包含与宏指令相对应的指令集时,第一确定子模块111可以将该指定设备确定为执行宏指令的运行设备,并将确定的运行设备的标识发送至指令生成模块12。例如,第一确定子模块111可以将标识01所对应的指定设备如CPU-2(CPU-2中包含与宏指令1相对应的指令集)确定为用于执行宏指令1的运行设备。可以将标识02所对应的指定设备如CPU-1(CPU-1中包含与宏指令2相对应的指令集)确定为用于执行宏指令2的运行设备。When the first determining sub-module 111 determines that the macro instruction contains the identifier of a specified device, and determines that the specified device contains the instruction set corresponding to the macro instruction, the first determining sub-module 111 may determine the specified device as the running device for executing the macro instruction, and send the identifier of the determined running device to the instruction generation module 12. For example, the first determining sub-module 111 may determine the specified device corresponding to identifier 01, such as CPU-2 (CPU-2 contains the instruction set corresponding to macro instruction 1), as the running device for executing macro instruction 1, and may determine the specified device corresponding to identifier 02, such as CPU-1 (CPU-1 contains the instruction set corresponding to macro instruction 2), as the running device for executing macro instruction 2.
第三确定子模块113在确定在宏指令中包含指定设备的标识,且确定该指定设备中不包含与宏指令相对应的指令集时,第三确定子模块113可以将包含与宏指令相对应的指令集的备选设备确定为运行设备,并将确定的运行设备的标识发送至指令生成模块12。例如,第三确定子模块113在确定标识04所对应的指定设备中不包含与宏指令3相对应的指令集时,可以将包含与宏指令3的操作类型DDD相对应的指令集的备选设备如NPU-n、NPU-2,确定为用于执行宏指令3的运行设备。When the third determining sub-module 113 determines that the macro instruction contains the identifier of a specified device, but determines that the specified device does not contain the instruction set corresponding to the macro instruction, the third determining sub-module 113 may determine the candidate devices containing the instruction set corresponding to the macro instruction as the running devices, and send the identifiers of the determined running devices to the instruction generation module 12. For example, when the third determining sub-module 113 determines that the specified device corresponding to identifier 04 does not contain the instruction set corresponding to macro instruction 3, it may determine the candidate devices containing the instruction set corresponding to the operation type DDD of macro instruction 3, such as NPU-n and NPU-2, as the running devices for executing macro instruction 3.
第二确定子模块112在确定宏指令中不存在指定设备的标识(指定设备的标识所对应的位置为空,或者在宏指令中不包含“指定设备的标识”这个字段)时,第二确定子模块112可以根据该宏指令和备选设备的资源信息,从备选设备中确定出运行设备(具体确定过程详见上文第二确定子模块112的相关描述),并将确定出的运行设备的标识发送至指令生成模块12。例如,由于宏指令4中不存在指定设备的标识,第二确定子模块112可以根据宏指令4的操作类型NNN和备选设备的资源信息(所包含的指令集),从备选设备中确定出用于执行宏指令4的运行设备,例如,GPU-n(GPU-n中包含与操作类型NNN相对应的指令集)。When the second determining sub-module 112 determines that the identifier of a specified device does not exist in the macro instruction (the position corresponding to the identifier of the specified device is empty, or the macro instruction does not contain the "identifier of the specified device" field), the second determining sub-module 112 may determine the running device from the candidate devices according to the macro instruction and the resource information of the candidate devices (see the related description of the second determining sub-module 112 above for the specific determination process), and send the identifier of the determined running device to the instruction generation module 12. For example, since the identifier of a specified device does not exist in macro instruction 4, the second determining sub-module 112 may determine, from the candidate devices, the running device for executing macro instruction 4 according to the operation type NNN of macro instruction 4 and the resource information (the contained instruction sets) of the candidate devices, for example, GPU-n (GPU-n contains the instruction set corresponding to the operation type NNN).
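设备确定模块的三个分支可以用如下Python代码概括(非专利原文内容;字段名与数据结构均为便于说明的假设)。The three branches of the device determination module can be summarized by the following Python sketch (not part of the original disclosure; the field names and data structures are assumptions for illustration).

```python
def determine_running_devices(macro, candidates):
    """candidates maps a device identifier to the set of operation types
    (instruction sets) it supports; macro is a dict that may carry the
    identifier of a specified device and carries its operation type."""
    specified = macro.get("device_id")
    op = macro["op"]
    # First branch: the specified device supports the macro's instruction set.
    if specified is not None and op in candidates.get(specified, set()):
        return [specified]
    # Second branch (no specified device) and third branch (specified device
    # lacks the instruction set): fall back to all candidates that support op.
    return [dev for dev, ops in candidates.items() if op in ops]
```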
指令生成模块12(包括第一指令生成模块121和第二指令生成模块122)Instruction generation module 12 (including the first instruction generation module 121 and the second instruction generation module 122)
第一指令生成模块121在运行设备为一个,且运行设备的资源不满足执行宏指令的容量条件时,根据运行设备的运行数据量和宏指令的数据量将宏指令拆分成多条运行指令,并将多条运行指令发送至队列构建模块15。例如,根据宏指令2的数据量和运行设备CPU-1的运行数据量生成多条运行指令2-1、2-2、…、2-n。根据宏指令4的数据量和运行设备GPU-n的运行数据量生成多条运行指令4-1、4-2、…、4-n。When there is one running device and the resources of the running device do not satisfy the capacity condition for executing the macro instruction, the first instruction generation module 121 splits the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount of the macro instruction, and sends the multiple running instructions to the queue construction module 15. For example, multiple running instructions 2-1, 2-2, ..., 2-n are generated according to the data amount of macro instruction 2 and the running data amount of the running device CPU-1, and multiple running instructions 4-1, 4-2, ..., 4-n are generated according to the data amount of macro instruction 4 and the running data amount of the running device GPU-n.
第一指令生成模块121在确定运行设备为一个,且运行设备的资源满足执行宏指令的容量条件时, 可以根据宏指令生成一条运行指令,并将其发送至队列构建模块15。例如,根据宏指令1的数据量和运行设备CPU-2的运行数据量生成一条运行指令1-1。When the first instruction generation module 121 determines that there is only one running device, and the resources of the running device meet the capacity condition for executing the macro instruction, it may generate a running instruction according to the macro instruction and send it to the queue construction module 15. For example, an operation instruction 1-1 is generated based on the data amount of the macro instruction 1 and the operation data amount of the operation device CPU-2.
第二指令生成模块122在确定运行设备为多个时,根据每个运行设备的运行数据量和宏指令的数据量对宏指令进行拆分,生成对应于每个运行设备的运行指令,并将其发送至队列构建模块15。例如,根据宏指令3的数据量和运行设备NPU-n的运行数据量、运行设备NPU-2的运行数据量,为运行设备NPU-n生成多条运行指令3-1、3-2、…、3-n,为运行设备NPU-2生成多条运行指令3’-1、3’-2、…、3’-n。When determining that there are multiple running devices, the second instruction generation module 122 splits the macro instruction according to the running data amount of each running device and the data amount of the macro instruction, generates running instructions corresponding to each running device, and sends them to the queue construction module 15. For example, according to the data amount of macro instruction 3, the running data amount of the running device NPU-n and the running data amount of the running device NPU-2, multiple running instructions 3-1, 3-2, ..., 3-n are generated for the running device NPU-n, and multiple running instructions 3'-1, 3'-2, ..., 3'-n are generated for the running device NPU-2.
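按运行设备单次可处理的运行数据量对宏指令的数据量进行拆分,可以用如下Python代码示意(非专利原文内容,仅为便于理解的草图)。Splitting the macro instruction's data amount by the running data amount a device can process per run can be sketched as follows (not part of the original disclosure; an illustrative sketch only).

```python
def split_macro_data(total_data_amount, per_run_amount):
    """Split the macro instruction's data amount into chunks no larger
    than the running device's per-run data amount; each chunk would
    correspond to one running instruction."""
    chunks = []
    remaining = total_data_amount
    while remaining > 0:
        chunks.append(min(per_run_amount, remaining))
        remaining -= chunks[-1]
    return chunks
```

当宏指令的数据量不超过运行数据量时,仅生成一个分块,对应于生成一条运行指令的情形。When the macro's data amount does not exceed the per-run amount, a single chunk results, corresponding to the case where one running instruction is generated.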
队列构建模块15 Queue building module 15
在接收到运行指令时,根据队列排序规则对每个运行设备所需执行的所有运行指令进行排序,根据排序后的运行指令为每个运行设备构建与之唯一对应的指令队列,并将指令队列发送至指令分派模块16。具体地,When receiving the running instructions, the queue construction module sorts all the running instructions to be executed by each running device according to the queue sorting rule, constructs a uniquely corresponding instruction queue for each running device according to the sorted running instructions, and sends the instruction queues to the instruction dispatch module 16. Specifically:
对于被运行设备CPU-2执行的一条运行指令1-1。所构建的对应于运行设备CPU-2的指令队列CPU-2”仅包括运行指令1-1。For a running instruction 1-1 executed by the running device CPU-2. The constructed instruction queue CPU-2 "corresponding to the execution device CPU-2 includes only the execution instruction 1-1.
对于被运行设备CPU-1执行的多条运行指令2-1、2-2、…、2-n。根据队列排序规则对多条运行指令2-1、2-2、…、2-n进行排序,根据排序后的多条运行指令2-1、2-2、…、2-n构建与运行设备CPU-1相对应的指令队列CPU-1”。For multiple execution instructions 2-1, 2-2, ..., 2-n executed by the operation device CPU-1. Sort multiple running instructions 2-1, 2-2, ..., 2-n according to the queue sorting rules, and build and run equipment based on the sorted multiple running instructions 2-1, 2-2, ..., 2-n CPU-1 corresponds to the instruction queue CPU-1 ".
对于被运行设备NPU-n执行的多条运行指令3-1、3-2、…、3-n。根据队列排序规则对多条运行指令3-1、3-2、…、3-n进行排序,根据排序后的多条运行指令3-n、…、3-2、3-1构建与运行设备NPU-n相对应的指令队列NPU-n”。For multiple running instructions 3-1, 3-2, ..., 3-n executed by the running device NPU-n. Sort multiple running instructions 3-1, 3-2, ..., 3-n according to the queue sorting rules, and build and run equipment based on the sorted multiple running instructions 3-n, ..., 3-2, 3-1 The instruction queue NPU-n corresponding to NPU-n ".
对于被运行设备NPU-2执行的多条运行指令3’-1、3’-2、…、3’-n。根据队列排序规则对多条运行指令3’-1、3’-2、…、3’-n进行排序,根据排序后的多条运行指令3’-n、…、3’-2、3’-1构建与运行设备NPU-2相对应的指令队列NPU-2”。For a plurality of running instructions 3'-1, 3'-2, ..., 3'-n executed by the running device NPU-2. Sort the multiple running instructions 3'-1, 3'-2, ..., 3'-n according to the queue sorting rules, and sort the multiple running instructions 3'-n, ..., 3'-2, 3 ' -1 Construct the instruction queue NPU-2 "corresponding to the running device NPU-2".
对于被运行设备GPU-n执行的多条运行指令4-1、4-2、…、4-n。根据队列排序规则对多条运行指令4-1、4-2、…、4-n进行排序,根据排序后的多条运行指令4-1、4-2、…、4-n构建与运行设备GPU-n相对应的指令队列GPU-n”。For multiple execution instructions 4-1, 4-2, ..., 4-n executed by the execution device GPU-n. Sort multiple operating instructions 4-1, 4-2, ..., 4-n according to the queue sorting rules, and build and run equipment based on the sorted multiple operating instructions 4-1, 4-2, ..., 4-n GPU-n corresponding instruction queue GPU-n ".
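上述按运行设备分组并排序运行指令的过程可以用如下Python代码示意(非专利原文内容;以可比较的排序键代替实际的队列排序规则,属便于说明的假设)。The grouping and sorting of running instructions per running device described above can be sketched as follows (not part of the original disclosure; a comparable sort key stands in for the actual queue sorting rule, as an assumption for illustration).

```python
from collections import defaultdict

def build_instruction_queues(run_instructions, sort_key=None):
    """run_instructions is a list of (device, instruction) pairs; the
    result maps each running device to its own ordered instruction queue."""
    queues = defaultdict(list)
    for device, instruction in run_instructions:
        queues[device].append(instruction)
    # sort each device's queue by the (placeholder) queue-ordering rule
    return {dev: sorted(q, key=sort_key) for dev, q in queues.items()}
```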
指令分派模块16 Instruction dispatch module 16
在接收到指令队列之后,将每个指令队列中的运行指令,依次发送至对应的运行设备中,以使运行设备执行运行指令。例如,将指令队列CPU-2”中包括的运行指令1-1发送至其对应的运行设备CPU-2。将指令队列CPU-1”中的多条运行指令2-1、2-2、…、2-n,依次发送至其对应的运行设备CPU-1。将指令队列NPU-n”中的多条运行指令3-n、…、3-2、3-1,依次发送至其对应的运行设备NPU-n。将指令队列NPU-2”中的多条运行指令3’-n、…、3’-2、3’-1,依次发送至其对应的运行设备NPU-2。将队列GPU-n”中的多条运行指令4-1、4-2、…、4-n,依次发送至其对应的运行设备GPU-n。After receiving the instruction queue, the operation instructions in each instruction queue are sequentially sent to the corresponding operation device, so that the operation device executes the operation instruction. For example, the execution instruction 1-1 included in the instruction queue CPU-2 "is sent to its corresponding operation device CPU-2. The multiple execution instructions 2-1, 2-2, ... in the instruction queue CPU-1" , 2-n, sent to its corresponding running device CPU-1 in turn. Send multiple running instructions 3-n, ..., 3-2, 3-1 in the instruction queue NPU-n "to their corresponding running device NPU-n in turn. Send multiple instructions in the instruction queue NPU-2" The operating instructions 3'-n, ..., 3'-2, 3'-1 are sent to its corresponding operating device NPU-2 in sequence. Multiple running instructions 4-1, 4-2, ..., 4-n in the queue GPU-n "are sequentially sent to its corresponding running device GPU-n.
其中,上述运行设备CPU-2、运行设备CPU-1、运行设备NPU-n和运行设备NPU-2在接收到指令队列之后,按照指令队列中运行指令的排列顺序,依次执行运行指令。以运行设备CPU-2为例,描述其执行所接收到的运行指令的具体过程。运行设备CPU-2包括控制模块、执行模块和存储模块。其中,控制模块包括指令存储子模块、指令处理子模块和存储队列子模块,执行模块包括依赖关系处理子模块,详见上文关于运行设备的相关描述。Wherein, the above-mentioned running device CPU-2, running device CPU-1, running device NPU-n and running device NPU-2 execute the running instructions in sequence according to the order of running instructions in the command queue after receiving the instruction queue. Taking the running device CPU-2 as an example, the specific process of executing the received running instruction is described. The operating device CPU-2 includes a control module, an execution module, and a storage module. The control module includes an instruction storage sub-module, an instruction processing sub-module, and a storage queue sub-module, and the execution module includes a dependency relationship processing sub-module. For details, see the related description about operating devices above.
假定根据宏指令1所生成的运行指令1-1为“@XXX……”。运行设备CPU-2在接收到运行指令1-1之后,执行运行指令1-1的过程如下:Assume that the run instruction 1-1 generated according to the macro instruction 1 is "@XXX ...". After receiving the operation instruction 1-1, the operation device CPU-2 executes the operation instruction 1-1 as follows:
运行设备CPU-2的控制模块获取数据、神经网络模型以及运行指令1-1。其中,指令存储子模块用于存储运行指令1-1。指令处理子模块用于对运行指令1-1进行解析,获得多个解析指令如解析指令0、解析指令1和解析指令2,并将多个解析指令发送至存储队列子模块和执行模块。存储队列子模块用于存储运行指令队列,运行指令队列中包含运行设备CPU-2所需执行的解析指令0、解析指令1和解析指令2以及其他运行指令,在运行指令队列中所有指令按照执行的先后顺序依次排列。例如,获得的多个解析指令的执行的先后顺序为解析指令0、解析指令1和解析指令2,且解析指令1与解析指令0之间存在关联关系。The control module of the running device CPU-2 acquires the data, the neural network model and running instruction 1-1. Among them, the instruction storage sub-module is used to store running instruction 1-1. The instruction processing sub-module is used to parse running instruction 1-1 to obtain multiple parsed instructions, such as parsed instruction 0, parsed instruction 1 and parsed instruction 2, and to send the multiple parsed instructions to the storage queue sub-module and the execution module. The storage queue sub-module is used to store the running instruction queue, which contains parsed instruction 0, parsed instruction 1, parsed instruction 2 and other running instructions to be executed by the running device CPU-2; all instructions in the running instruction queue are arranged in order of execution. For example, the execution order of the obtained parsed instructions is parsed instruction 0, parsed instruction 1 and parsed instruction 2, and there is a dependency between parsed instruction 1 and parsed instruction 0.
运行设备CPU-2的执行模块接收到多个解析指令后,依赖关系处理子模块判断多个解析指令之间是否存在关联关系。依赖关系处理子模块确定出解析指令1与解析指令0存在关联关系,则将解析指令1缓存至指令存储子模块中,并在确定解析指令0执行完毕之后,从缓存中提取出解析指令1发送至执行模块,以供执行模块执行。After the execution module of the running device CPU-2 receives the multiple analysis commands, the dependency processing sub-module determines whether there is an association relationship between the multiple analysis commands. The dependency processing sub-module determines that there is an association relationship between the parsing instruction 1 and the parsing instruction 0, then caches the parsing instruction 1 in the instruction storage sub-module, and after determining that the parsing instruction 0 has been executed, extracts the parsing instruction 1 from the cache and sends it To the execution module for execution by the execution module.
执行模块接收并执行解析指令0、解析指令1和解析指令2,以完成运行指令1-1的运行。The execution module receives and executes the parsing instruction 0, parsing instruction 1, and parsing instruction 2 to complete the operation of the running instruction 1-1.
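依赖关系处理子模块“缓存有依赖的指令、待前序指令执行完毕后再执行”的行为可以用如下Python代码示意(非专利原文内容;以映射表示依赖关系是便于说明的假设)。The dependency processing sub-module's behavior of buffering a dependent instruction until its predecessor finishes can be sketched as follows (not part of the original disclosure; representing dependencies as a mapping is an assumption for illustration).

```python
def execute_with_dependencies(parsed_instructions, depends_on):
    """Execute parsed instructions, buffering any instruction whose
    predecessor (per depends_on) has not finished yet, and releasing
    it from the buffer once the predecessor completes."""
    finished, order = set(), []
    pending = list(parsed_instructions)
    while pending:
        for instr in pending:
            pred = depends_on.get(instr)
            if pred is None or pred in finished:
                order.append(instr)     # execute this instruction now
                finished.add(instr)
                pending.remove(instr)
                break                   # rescan the buffer from the start
        else:
            raise RuntimeError("circular or unsatisfiable dependency")
    return order
```

例如,解析指令1依赖解析指令0时,解析指令0先被执行,随后解析指令1从缓存中取出执行。For example, when parsed instruction 1 depends on parsed instruction 0, instruction 0 executes first and instruction 1 is then released from the buffer.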
For the working process of each of the above modules, refer to the relevant description above.

In this way, the device can be used across platforms, offers good applicability, fast instruction conversion, high processing efficiency, and a low error probability, and reduces the labor and material costs of development.
Example 2
The neural network instruction processing system may include the foregoing candidate devices and the instruction generation device. As shown in FIG. 8a and FIG. 8b, there may be multiple candidate devices for executing macro instructions, such as CPU-1, CPU-2, …, CPU-n, NPU-1, NPU-2, …, NPU-n, and GPU-1, GPU-2, …, GPU-n; a candidate device selected to execute the corresponding run instructions is a running device. The working process and principle by which the neural network instruction processing system generates and executes run instructions from a given macro instruction are as follows.
Resource acquisition module 14

Acquires the resource information of the candidate devices, including each candidate device's remaining storage capacity, storage capacity, and the instruction set it contains. The resource acquisition module 14 sends the acquired resource information of the candidate devices to the device determination module 11 and the instruction generation module 12.
Device determination module 11 (including the first determination submodule 111, the second determination submodule 112, and the third determination submodule 113)

When a macro instruction is received, the running device for executing the macro instruction is determined according to the received macro instruction. For example, the following macro instructions, which may come from different platforms, are received.
Macro instruction 1: @XXX#01…

Macro instruction 2: @SSS#02…

Macro instruction 3: @DDD#04…

Macro instruction 4: @NNN…
When the first determination submodule 111 determines that the macro instruction contains the identifier of a specified device, and that the specified device contains the instruction set corresponding to the macro instruction, it may determine the specified device as the running device for executing the macro instruction and send the identifier of the determined running device to the instruction generation module 12. For example, the first determination submodule 111 may determine the specified device corresponding to identifier 01, such as CPU-2 (which contains the instruction set corresponding to macro instruction 1), as the running device for executing macro instruction 1, and the specified device corresponding to identifier 02, such as CPU-1 (which contains the instruction set corresponding to macro instruction 2), as the running device for executing macro instruction 2.

When the third determination submodule 113 determines that the macro instruction contains the identifier of a specified device, but that the specified device does not contain the instruction set corresponding to the macro instruction, it may determine the candidate devices that do contain that instruction set as the running devices and send the identifiers of the determined running devices to the instruction generation module 12. For example, on determining that the specified device corresponding to identifier 04 does not contain the instruction set corresponding to macro instruction 3, the third determination submodule 113 may determine the candidate devices containing the instruction set corresponding to operation type DDD of macro instruction 3, such as NPU-n and NPU-2, as the running devices for executing macro instruction 3.

When the second determination submodule 112 determines that the macro instruction contains no specified-device identifier (the position corresponding to the identifier of the specified device is empty, or the macro instruction contains no "specified device identifier" field), it may determine the running device from the candidate devices according to the macro instruction and the resource information of the candidate devices (for the specific determination process, see the description of the second determination submodule 112 above), and send the identifier of the determined running device to the instruction generation module 12. For example, since macro instruction 4 contains no specified-device identifier, the second determination submodule 112 may determine the running device for executing macro instruction 4 from the candidate devices according to the operation type NNN of macro instruction 4 and the resource information of the candidate devices (the instruction sets they contain), for example GPU-n (which contains the instruction set corresponding to operation type NNN).
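The three determination rules above can be summarized in a short sketch: rule 1 (specified and capable device), rule 3 (specified but incapable, so fall back to capable candidates), and rule 2 (no specified device, so select from the candidates by resource information). The dictionary layout, field names, and tie-breaking choice are all assumptions for illustration.

```python
def pick_devices(macro, devices):
    """macro: dict with an 'op' type and optional 'device_id';
    devices: dict mapping device id -> set of supported op types."""
    dev = macro.get("device_id")
    if dev is not None:
        if macro["op"] in devices[dev]:
            return [dev]                       # rule 1: specified and capable
        # rule 3: specified but incapable -> all capable candidate devices
        return [d for d, ops in devices.items() if macro["op"] in ops]
    # rule 2: no specified device -> choose a capable candidate (first here)
    return [d for d, ops in devices.items() if macro["op"] in ops][:1]

devices = {"CPU-2": {"XXX"}, "CPU-1": {"SSS"},
           "NPU-2": {"DDD"}, "NPU-n": {"DDD"}, "GPU-n": {"NNN"}}
print(pick_devices({"device_id": "CPU-2", "op": "XXX"}, devices))  # ['CPU-2']
print(pick_devices({"op": "NNN"}, devices))                        # ['GPU-n']
```

For macro instruction 3 (operation type DDD with an incapable specified device), the same function returns both NPU-2 and NPU-n, matching the multi-device case above.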
Instruction generation module 12 (including the first instruction generation module 121 and the second instruction generation module 122)

When there is one running device and its resources do not satisfy the capacity condition for executing the macro instruction, the first instruction generation module 121 splits the macro instruction into multiple run instructions according to the running data amount of the running device and the data amount of the macro instruction, and sends the run instructions to the queue construction module 15. For example, multiple run instructions 2-1, 2-2, …, 2-n are generated according to the data amount of macro instruction 2 and the running data amount of the running device CPU-1, and multiple run instructions 4-1, 4-2, …, 4-n are generated according to the data amount of macro instruction 4 and the running data amount of the running device GPU-n.

When there is one running device and its resources satisfy the capacity condition for executing the macro instruction, the first instruction generation module 121 may generate a single run instruction according to the macro instruction and send it to the queue construction module 15. For example, one run instruction 1-1 is generated according to the data amount of macro instruction 1 and the running data amount of the running device CPU-2.

When there are multiple running devices, the second instruction generation module 122 splits the macro instruction according to the running data amount of each running device and the data amount of the macro instruction, generates the run instructions corresponding to each running device, and sends them to the queue construction module 15. For example, according to the data amount of macro instruction 3 and the running data amounts of the running devices NPU-n and NPU-2, multiple run instructions 3-1, 3-2, …, 3-n are generated for the running device NPU-n, and multiple run instructions 3'-1, 3'-2, …, 3'-n are generated for the running device NPU-2.
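A minimal sketch of the splitting step: when the macro instruction's data amount exceeds the device's per-run data amount, it is divided into pieces that each fit. The contiguous-chunk rule shown here is an assumption — the text only states that the split depends on the macro's data amount and the device's running data amount.

```python
def split_macro(total, per_run):
    """Return (offset, length) pieces covering `total` data units
    in chunks of at most `per_run` units each."""
    return [(off, min(per_run, total - off))
            for off in range(0, total, per_run)]

print(split_macro(10, 4))  # [(0, 4), (4, 4), (8, 2)]
```

When the device's capacity condition is satisfied (`per_run >= total`), the same rule yields a single piece, matching the one-run-instruction case for macro instruction 1.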
Queue construction module 15

When the run instructions are received, all run instructions to be executed by each running device are sorted according to the queue-ordering rule, and an instruction queue uniquely corresponding to each running device is constructed from the sorted run instructions and sent to the instruction dispatch module 16. Specifically:

For the single run instruction 1-1 executed by the running device CPU-2, the constructed instruction queue CPU-2” corresponding to the running device CPU-2 includes only run instruction 1-1.

For the run instructions 2-1, 2-2, …, 2-n executed by the running device CPU-1, they are sorted according to the queue-ordering rule, and the instruction queue CPU-1” corresponding to the running device CPU-1 is constructed from the sorted run instructions 2-1, 2-2, …, 2-n.

For the run instructions 3-1, 3-2, …, 3-n executed by the running device NPU-n, they are sorted according to the queue-ordering rule, and the instruction queue NPU-n” corresponding to the running device NPU-n is constructed from the sorted run instructions 3-n, …, 3-2, 3-1.

For the run instructions 3'-1, 3'-2, …, 3'-n executed by the running device NPU-2, they are sorted according to the queue-ordering rule, and the instruction queue NPU-2” corresponding to the running device NPU-2 is constructed from the sorted run instructions 3'-n, …, 3'-2, 3'-1.

For the run instructions 4-1, 4-2, …, 4-n executed by the running device GPU-n, they are sorted according to the queue-ordering rule, and the instruction queue GPU-n” corresponding to the running device GPU-n is constructed from the sorted run instructions 4-1, 4-2, …, 4-n.
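The queue construction step above can be sketched as: group the run instructions by target device, then sort each group by the queue-ordering rule. The ordering key used here (a per-instruction sequence number) is an assumption, since the text does not specify the rule itself.

```python
from collections import defaultdict

def build_queues(run_instructions):
    """run_instructions: list of (device, seq_no, name) tuples.
    Returns one ordered instruction queue per running device."""
    grouped = defaultdict(list)
    for dev, seq, name in run_instructions:
        grouped[dev].append((seq, name))
    # sort each device's instructions by the assumed ordering key
    return {dev: [name for _, name in sorted(items)]
            for dev, items in grouped.items()}

q = build_queues([("CPU-1", 2, "2-2"), ("CPU-1", 1, "2-1"),
                  ("CPU-2", 1, "1-1")])
print(q["CPU-1"])  # ['2-1', '2-2']
```

Each device thus receives exactly one queue uniquely corresponding to it, as the module description requires.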
Instruction dispatch module 16

After the instruction queues are received, the run instructions in each instruction queue are sent in sequence to the corresponding running device, so that the running device executes them. For example, run instruction 1-1 in the instruction queue CPU-2” is sent to its corresponding running device CPU-2. The run instructions 2-1, 2-2, …, 2-n in the instruction queue CPU-1” are sent in sequence to the corresponding running device CPU-1. The run instructions 3-n, …, 3-2, 3-1 in the instruction queue NPU-n” are sent in sequence to the corresponding running device NPU-n. The run instructions 3'-n, …, 3'-2, 3'-1 in the instruction queue NPU-2” are sent in sequence to the corresponding running device NPU-2. The run instructions 4-1, 4-2, …, 4-n in the queue GPU-n” are sent in sequence to the corresponding running device GPU-n.

After receiving their instruction queues, the running devices CPU-2, CPU-1, NPU-n, and NPU-2 execute the run instructions in sequence in the order in which they are arranged in the queue. Taking the running device CPU-2 as an example, the specific process of executing a received run instruction is described below. The running device CPU-2 includes a control module 21, an execution module 22, and a storage module 23, where the control module 21 includes an instruction storage submodule 211, an instruction processing submodule 212, and a storage queue submodule 213, and the execution module 22 includes a dependency processing submodule 221; see the description of the running device above for details.
Assume that run instruction 1-1 generated from macro instruction 1 is "@XXX…". After receiving run instruction 1-1, the running device CPU-2 executes it as follows.

The control module 21 of the running device CPU-2 acquires the data, the neural network model, and run instruction 1-1. The instruction storage submodule 211 stores run instruction 1-1. The instruction processing submodule 212 parses run instruction 1-1 to obtain multiple parsed instructions, such as parsed instruction 0, parsed instruction 1, and parsed instruction 2, and sends them to the storage queue submodule 213 and the execution module 22. The storage queue submodule 213 stores a run instruction queue containing parsed instructions 0, 1, and 2 and the other run instructions to be executed by the running device CPU-2, with all instructions arranged in order of execution. For example, the execution order of the obtained parsed instructions is parsed instruction 0, parsed instruction 1, parsed instruction 2, and there is a dependency between parsed instruction 1 and parsed instruction 0.

After the execution module 22 of the running device CPU-2 receives the parsed instructions, the dependency processing submodule 221 determines whether any of them are related. On determining that parsed instruction 1 depends on parsed instruction 0, it caches parsed instruction 1 in the instruction storage submodule 211; after parsed instruction 0 has finished executing, it fetches parsed instruction 1 from the cache and sends it to the execution module 22 for execution.

The execution module 22 receives and executes parsed instruction 0, parsed instruction 1, and parsed instruction 2 to complete the execution of run instruction 1-1.
For the working process of each of the above modules, refer to the relevant description above.

In this way, the system can be used across platforms, offers good applicability, fast instruction conversion, high processing efficiency, and a low error probability, and reduces the labor and material costs of development.
The present disclosure provides a machine learning computation device. The machine learning computation device may include one or more of the foregoing neural network instruction generation devices (or one or more of the foregoing neural network instruction processing systems), configured to acquire data to be computed and control information from other processing devices and perform specified machine learning computations. The machine learning computation device may obtain macro instructions or instructions to be executed from other machine learning computation devices or non-machine-learning computation devices, and transfer the execution results to peripheral devices (which may also be called other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one neural network instruction generation device (or neural network instruction processing system) is included, the devices (or systems) may be linked and transmit data through a specific structure, for example interconnected through a peripheral component interconnect express bus (that is, a PCIe bus), to support larger-scale neural network computations. In this case, they may share the same control system or have their own independent control systems; they may share memory, or each accelerator may have its own memory. In addition, their interconnection may be any interconnection topology.

The machine learning computation device has high compatibility and can be connected to various types of servers through a PCIe interface.
FIG. 9a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in FIG. 9a, the combined processing device includes the above machine learning computation device, a universal interconnection interface, and other processing devices. The machine learning computation device interacts with the other processing devices to jointly complete operations specified by the user.

The other processing devices include one or more types of general-purpose or special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning computation device and external data and control, including data transfer, and perform basic control of the machine learning computation device such as starting and stopping it; the other processing devices may also cooperate with the machine learning computation device to jointly complete computation tasks.

The universal interconnection interface is used to transmit data and control instructions between the machine learning computation device and the other processing devices. The machine learning computation device acquires the required input data from the other processing devices and writes it into on-chip storage of the machine learning computation device; it may acquire control instructions from the other processing devices and write them into an on-chip control cache of the machine learning computation device; it may also read data from the storage module of the machine learning computation device and transmit it to the other processing devices.
FIG. 9b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in FIG. 9b, the combined processing device may further include a storage device connected to the machine learning computation device and the other processing devices, respectively. The storage device is used to store data of the machine learning computation device and the other processing devices, and is particularly suitable for data to be computed that cannot be entirely held in the internal storage of the machine learning computation device or the other processing devices.

The combined processing device can serve as a system-on-chip (SoC) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the core area of the control portion, increasing processing speed, and reducing overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device, such as a camera, display, mouse, keyboard, network card, or Wi-Fi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning computation device or combined processing device.

The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.

The present disclosure provides a board card. FIG. 10 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in FIG. 10, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may include other supporting components, including but not limited to a storage device 390, an interface device 391, and a control device 392.
The storage device 390 is connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure) through a bus and is used to store data. The storage device 390 may include multiple groups of storage units 393, each group connected to the machine learning chip 389 through a bus. It can be understood that each group of storage units 393 may be DDR SDRAM (double data rate synchronous dynamic random access memory).

DDR doubles the speed of SDRAM without increasing the clock frequency: it allows data to be read on both the rising and falling edges of the clock pulse, making it twice as fast as standard SDRAM.
In one embodiment, the storage device 390 may include four groups of storage units 393, each of which may include multiple DDR4 dies (chips). In one embodiment, the machine learning chip 389 may internally include four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 dies are used in each group of storage units 393, the theoretical bandwidth of data transmission can reach 25600 MB/s.

In one embodiment, each group of storage units 393 includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the machine learning chip 389 to control the data transmission and data storage of each group of storage units 393.
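The 25600 MB/s figure quoted above follows directly from the DDR4-3200 data rate: 3200 million transfers per second over a 64-bit (8-byte) data bus. A quick check, with an illustrative helper name:

```python
def ddr_bandwidth_mb_s(transfers_per_s_millions, bus_bits):
    """Peak bandwidth in MB/s: transfer rate (MT/s) times bus width in bytes."""
    return transfers_per_s_millions * (bus_bits // 8)

# DDR4-3200 on the 64-bit data portion of a 72-bit controller
print(ddr_bandwidth_mb_s(3200, 64))  # 25600
```

The 8 ECC bits of the 72-bit controller carry check data rather than payload, which is why the calculation uses 64 bits.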
The interface device 391 is electrically connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure) and is used to implement data transmission between the machine learning chip 389 and an external device (for example, a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIe interface, and the data to be processed is transferred from the server to the machine learning chip 389 through the standard PCIe interface to accomplish the data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface device can implement the transfer function. In addition, the computation results of the machine learning chip are transmitted back to the external device (for example, a server) by the interface device.
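The 16000 MB/s figure is the common rounding of PCIe 3.0 x16 throughput: each lane signals at 8 GT/s with 128b/130b encoding, giving roughly 985 MB/s of payload per lane. A sketch of that arithmetic (the function name is illustrative):

```python
def pcie3_bandwidth_gb_s(lanes):
    """Approximate PCIe 3.0 payload bandwidth: 8 GT/s per lane,
    128b/130b line encoding, 8 bits per byte."""
    per_lane_bytes = 8e9 * (128 / 130) / 8   # payload bytes/s per lane
    return per_lane_bytes * lanes / 1e9

print(round(pcie3_bandwidth_gb_s(16), 2))  # 15.75
```

About 15.75 GB/s, which is conventionally rounded to the 16000 MB/s quoted above.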
The control device 392 is electrically connected to the machine learning chip 389 and is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). The machine learning chip 389 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and may drive multiple loads; it can therefore be in different working states such as heavy load and light load. Through the control device, the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the machine learning chip can be regulated.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.

The electronic device may include a data processing apparatus, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle may include an airplane, a ship, and/or a car. The household appliance may include a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and/or range hood. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
It should be noted that, for the sake of concise description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present disclosure is not limited by the described order of actions, because according to the present disclosure, certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

It should be further noted that although the steps in the flowcharts of FIG. 4 and FIG. 7 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 4 and FIG. 7 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and whose execution order is not necessarily sequential: they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
应该理解,上述的装置实施例仅是示意性的,本公开的装置还可通过其它的方式实现。例如,上述实施例中所述单元/模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。例如,多个单元、模块或组件可以结合,或者可以集成到另一个系统,或一些特征可以忽略或不执行。It should be understood that the above device embodiments are only schematic, and the device of the present disclosure may also be implemented in other ways. For example, the division of the units / modules in the above embodiments is only a division of logical functions, and there may be other divisions in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be ignored or not implemented.
另外,若无特别说明,在本公开各个实施例中的各功能单元/模块可以集成在一个单元/模块中,也可以是各个单元/模块单独物理存在,也可以两个或两个以上单元/模块集成在一起。上述集成的单元/模块既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。In addition, unless otherwise specified, each functional unit / module in each embodiment of the present disclosure may be integrated into one unit / module, or each unit / module may exist alone physically, or two or more units / The modules are integrated together. The above integrated units / modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, or the like. Physical implementations of the hardware structure include, but are not limited to, transistors, memristors, and so on. Unless otherwise specified, the artificial intelligence processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure — in essence, or the part contributing to the prior art, or all or part of the technical solution — may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
An embodiment of the present disclosure further provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to call the instructions stored in the memory to perform the above method.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present disclosure. The description of the above embodiments is only intended to help in understanding the method of the present disclosure and its core ideas. Meanwhile, changes or modifications made by those skilled in the art based on the ideas of the present disclosure, its specific implementations, and its scope of application all fall within the scope of protection of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (19)

  1. A neural network instruction generation device, characterized in that the device comprises:
    a device determination module, configured to determine, according to a received macro instruction, a running device for executing the macro instruction; and
    an instruction generation module, configured to generate a running instruction according to the macro instruction and the running device.
  2. The device according to claim 1, characterized in that the device determination module comprises:
    a first determination submodule, configured to determine a specified device as the running device when it is determined that the macro instruction contains an identifier of the specified device and the resources of the specified device satisfy an execution condition for executing the macro instruction,
    wherein the execution condition comprises: the specified device contains an instruction set corresponding to the macro instruction.
  3. The device according to claim 2, characterized in that the device further comprises:
    a resource acquisition module, configured to acquire resource information of candidate devices, the resource information comprising the instruction sets contained in the candidate devices,
    wherein the device determination module further comprises a second determination submodule and/or a third determination submodule,
    the second determination submodule being configured to determine, from the candidate devices, the running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices, when it is determined that the macro instruction does not contain the identifier of the specified device, and
    the third determination submodule being configured to determine the running device according to the macro instruction and the resource information of the candidate devices, when it is determined that the macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the macro instruction.
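The selection logic of claims 2 and 3 can be illustrated with a minimal sketch. All names here (`Device`, `select_running_device`, the `device_id` and `type` fields) are hypothetical stand-ins for the claimed modules, not part of the disclosed apparatus:

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    """Hypothetical model of a candidate device and its resource information."""
    name: str
    instruction_sets: set = field(default_factory=set)

def select_running_device(macro, devices):
    """Pick the running device for a macro instruction.

    macro:   dict with a required 'type' (needed instruction set) and an
             optional 'device_id' (identifier of the specified device)
    devices: mapping from device identifier to Device
    """
    required = macro["type"]
    specified = macro.get("device_id")

    # Claim 2: a specified device meeting the execution condition is chosen.
    if specified is not None:
        dev = devices.get(specified)
        if dev is not None and required in dev.instruction_sets:
            return dev

    # Claim 3: otherwise fall back to any candidate whose resource
    # information contains a matching instruction set.
    for dev in devices.values():
        if required in dev.instruction_sets:
            return dev
    return None

devices = {
    "cpu0": Device("cpu0", {"scalar"}),
    "npu0": Device("npu0", {"neural_network", "vector_logic"}),
}
picked = select_running_device({"type": "neural_network"}, devices)
print(picked.name)  # the NPU is the only candidate with the needed instruction set
```

Note how a specified device that lacks the required instruction set is silently bypassed in favor of a suitable candidate, mirroring the third determination submodule.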
  4. The device according to claim 2 or 3, characterized in that the macro instruction contains at least one of an input amount and an output amount,
    the instruction generation module is further configured to determine a data amount of the macro instruction, and to generate the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
    wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further comprises at least one of a storage capacity and a remaining storage capacity.
  5. The device according to claim 4, characterized in that the instruction generation module comprises a first instruction generation submodule and/or a second instruction generation submodule,
    the first instruction generation submodule being configured to, when it is determined that there is one running device and the resources of the running device do not satisfy a capacity condition for executing the macro instruction, split the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence; and
    the second instruction generation submodule being configured to, when it is determined that there are multiple running devices, split the macro instruction according to the running data amount of each running device and the data amount, and generate a running instruction corresponding to each running device,
    wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction contains at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
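The splitting behavior of claim 5 can be sketched as a capacity-driven chunking loop. This is an illustration under assumed semantics only — the `split_macro` helper and the interpretation of "running data amount" as a per-device chunk size are hypothetical:

```python
def split_macro(total_amount, run_amounts):
    """Split a macro instruction's data amount across running instructions.

    total_amount: the macro's data amount (e.g. number of input elements)
    run_amounts:  per-device running data amounts derived from resource info;
                  a single-element list models claim 5's first submodule
                  (one device executing several running instructions in turn).
    """
    pieces = []
    remaining = total_amount
    while remaining > 0:
        for cap in run_amounts:
            if remaining <= 0:
                break
            chunk = min(cap, remaining)
            pieces.append(chunk)  # one running instruction per chunk
            remaining -= chunk
    return pieces

# One device whose capacity condition allows only 100 elements at a time,
# handling a 250-element macro instruction:
print(split_macro(250, [100]))  # -> [100, 100, 50]
# Two running devices with different running data amounts sharing the macro:
print(split_macro(250, [100, 60]))
```

Each chunk would carry its own running input/output amounts, bounded by the running data amount of the device that executes it.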
  6. The device according to claim 1, characterized in that the device further comprises at least one of a queue construction module, a macro instruction generation module, and an instruction dispatch module,
    the queue construction module being configured to sort the running instructions according to a queue sorting rule, and to construct an instruction queue corresponding to the running device according to the sorted running instructions;
    the macro instruction generation module being configured to receive an instruction to be executed, and to generate the macro instruction according to the determined identifier of the specified device and the instruction to be executed; and
    the instruction dispatch module being configured to send the running instruction to the running device so that the running device executes the running instruction,
    wherein the instruction dispatch module comprises:
    an instruction assembly submodule, configured to generate an assembly file according to the running instruction;
    an assembly translation submodule, configured to translate the assembly file into a binary file; and
    an instruction sending submodule, configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
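The three-stage dispatch of claim 6 (assemble, translate, send) can be sketched as follows. The textual "assembly" format, the field names, and returning the payload in place of an actual transfer are all assumptions made for illustration:

```python
def dispatch(run_instructions):
    """Toy three-stage dispatch pipeline: assemble, translate to binary, send.

    A real implementation would emit the running device's actual ISA and
    transfer the binary over a device interface; here each stage is a stub.
    """
    # Instruction assembly submodule: render each running instruction as text.
    assembly = "\n".join(
        f"{ins['op']} {ins['in_addr']}, {ins['out_addr']}"
        for ins in run_instructions
    )
    # Assembly translation submodule: turn the text into a binary file image.
    binary = assembly.encode("utf-8")
    # Instruction sending submodule: return the payload that would be sent
    # to the running device for execution.
    return binary

payload = dispatch([
    {"op": "CONV", "in_addr": "0x1000", "out_addr": "0x2000"},
    {"op": "POOL", "in_addr": "0x2000", "out_addr": "0x3000"},
])
print(payload.decode("utf-8"))
```

The point of the staging is that the running device only ever consumes the binary file, so the assembly form can be inspected or reordered (per the queue construction module) before translation.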
  7. The device according to claim 1, characterized in that
    the running device is one or any combination of a CPU, a GPU, and an NPU;
    the device is arranged in a CPU and/or an NPU;
    the macro instruction comprises at least one of the following instructions: a computation macro instruction, a control macro instruction, and a data transfer macro instruction,
    wherein the computation macro instruction comprises at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction,
    the control macro instruction comprises at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
    the data transfer macro instruction comprises at least one of a read macro instruction and a write macro instruction, the read macro instruction comprising at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction comprising at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
    the macro instruction contains at least one of the following options: an identifier of the specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, operands, and instruction parameters; and
    the running instruction contains at least one of the following options: the operation type, the input address, the output address, the operands, and the instruction parameters.
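The optional fields of the macro instruction listed in claim 7 can be modeled as a simple record; since every field other than the operation type is optional, defaults stand in for absent options. The `MacroInstruction` name and all field names are hypothetical, chosen only to mirror the claim language:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MacroInstruction:
    """Hypothetical container for the optional fields listed in claim 7."""
    op_type: str                        # operation type, e.g. 'matrix_vector'
    device_id: Optional[str] = None     # identifier of the specified device
    in_addr: Optional[int] = None       # input address
    out_addr: Optional[int] = None      # output address
    in_amount: Optional[int] = None     # input amount
    out_amount: Optional[int] = None    # output amount
    operands: tuple = ()                # operands
    params: Optional[dict] = None       # instruction parameters

m = MacroInstruction(op_type="matrix_vector", device_id="npu0",
                     in_addr=0x1000, out_addr=0x2000, in_amount=1024)
print(m.op_type, m.device_id)
```

A running instruction would carry the subset of these fields named in the final clause of the claim (operation type, addresses, operands, parameters), with the amounts replaced by per-device running amounts.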
  8. A machine learning operation device, characterized in that the device comprises:
    one or more neural network instruction generation devices according to any one of claims 1 to 6, configured to acquire data to be operated on and control information from other processing devices, perform specified machine learning operations, and transfer execution results to the other processing devices through an I/O interface;
    when the machine learning operation device contains multiple neural network instruction generation devices, the multiple neural network instruction generation devices can be connected to and transmit data to one another through a specific structure,
    wherein the multiple neural network instruction generation devices are interconnected and transmit data through a peripheral component interconnect express (PCIe) bus to support larger-scale machine learning operations; the multiple neural network instruction generation devices share the same control system or have their own control systems; the multiple neural network instruction generation devices share memory or have their own memories; and the interconnection mode of the multiple neural network instruction generation devices is an arbitrary interconnection topology.
  9. A combined processing device, characterized in that the device comprises:
    the machine learning operation device according to claim 8, a universal interconnection interface, and other processing devices,
    wherein the machine learning operation device interacts with the other processing devices to jointly complete computation operations specified by a user,
    and wherein the device further comprises a storage device, which is connected to the machine learning operation device and the other processing devices respectively, and is configured to store data of the machine learning operation device and the other processing devices.
  10. A machine learning chip, characterized in that the machine learning chip comprises:
    the machine learning operation device according to claim 8 or the combined processing device according to claim 9.
  11. An electronic device, characterized in that the electronic device comprises:
    the machine learning chip according to claim 10.
  12. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the machine learning chip according to claim 10,
    wherein the machine learning chip is connected to the storage device, the control device, and the interface device respectively;
    the storage device is configured to store data;
    the interface device is configured to implement data transmission between the machine learning chip and an external device; and
    the control device is configured to monitor the state of the machine learning chip.
  13. A neural network instruction generation method, characterized in that the method comprises:
    determining, according to a received macro instruction, a running device for executing the macro instruction; and
    generating a running instruction according to the macro instruction and the running device.
  14. The method according to claim 13, characterized in that determining, according to the received macro instruction, the running device for executing the macro instruction comprises:
    determining a specified device as the running device when it is determined that the macro instruction contains an identifier of the specified device and the resources of the specified device satisfy an execution condition for executing the macro instruction,
    wherein the execution condition comprises: the specified device contains an instruction set corresponding to the macro instruction.
  15. The method according to claim 14, characterized in that the method further comprises:
    acquiring resource information of candidate devices, the resource information comprising the instruction sets contained in the candidate devices,
    wherein determining, according to the received macro instruction, the running device for executing the macro instruction comprises either of the following processes:
    when it is determined that the macro instruction does not contain the identifier of the specified device, determining, from the candidate devices, the running device for executing the macro instruction according to the received macro instruction and the resource information of the candidate devices; or
    when it is determined that the macro instruction contains the identifier of the specified device but the resources of the specified device do not satisfy the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
  16. The method according to claim 14 or 15, characterized in that the macro instruction contains at least one of an input amount and an output amount,
    and generating the running instruction according to the macro instruction and the running device comprises:
    determining a data amount of the macro instruction, and generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device,
    wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further comprises at least one of a storage capacity and a remaining storage capacity.
  17. The method according to claim 16, characterized in that generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device comprises either of the following processes:
    when it is determined that there is one running device and the resources of the running device do not satisfy a capacity condition for executing the macro instruction, splitting the macro instruction into multiple running instructions according to the running data amount of the running device and the data amount, so that the running device executes the multiple running instructions in sequence; or
    when it is determined that there are multiple running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating a running instruction corresponding to each running device,
    wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction contains at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
  18. The method according to claim 13, characterized in that the method further comprises at least one of the following processes:
    sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device according to the sorted running instructions;
    receiving an instruction to be executed, and generating the macro instruction according to the determined identifier of the specified device and the instruction to be executed; and
    sending the running instruction to the running device so that the running device executes the running instruction,
    wherein sending the running instruction to the running device so that the running device executes the running instruction comprises:
    generating an assembly file according to the running instruction;
    translating the assembly file into a binary file; and
    sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
  19. The method according to claim 13, characterized in that
    the running device is one or any combination of a CPU, a GPU, and an NPU;
    the method is applied to a CPU and/or an NPU;
    the macro instruction comprises at least one of the following instructions: a computation macro instruction, a control macro instruction, and a data transfer macro instruction,
    wherein the computation macro instruction comprises at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction,
    the control macro instruction comprises at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
    the data transfer macro instruction comprises at least one of a read macro instruction and a write macro instruction, the read macro instruction comprising at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction comprising at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
    the macro instruction contains at least one of the following options: an identifier of the specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, operands, and instruction parameters; and
    the running instruction contains at least one of the following options: the operation type, the input address, the output address, the operands, and the instruction parameters.
PCT/CN2019/111852 2018-10-19 2019-10-18 Computation method and apparatus, and related product WO2020078446A1 (en)

Applications Claiming Priority (40)

Application Number Priority Date Filing Date Title
CN201811222476.4A CN111079907B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811222476.4 2018-10-19
CN201811221643.3 2018-10-19
CN201811220922.8 2018-10-19
CN201811222553.6 2018-10-19
CN201811221780.7 2018-10-19
CN201811222531.X 2018-10-19
CN201811221685.7A CN111079911B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811222523.5 2018-10-19
CN201811222454.8 2018-10-19
CN201811222454.8A CN111078280B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811222553.6A CN111078125B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221780.7A CN111079913B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811220923.2 2018-10-19
CN201811221806.8A CN111079925B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221783.0A CN111078281B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221788.3A CN111078282B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221763.3A CN111079912B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811222530.5A CN111079915B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221778.XA CN111078291B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811222491.9A CN111078284B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221763.3 2018-10-19
CN201811222555.5A CN111078285B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221789.8 2018-10-19
CN201811220923.2A CN111079910B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811221783.0 2018-10-19
CN201811222555.5 2018-10-19
CN201811222523.5A CN111079914B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221778.X 2018-10-19
CN201811221788.3 2018-10-19
CN201811222531.XA CN111079916B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221685.7 2018-10-19
CN201811221806.8 2018-10-19
CN201811220957.1 2018-10-19
CN201811220922.8A CN111079909B (en) 2018-10-19 2018-10-19 Operation method, system and related product
CN201811221643.3A CN111078293B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811222530.5 2018-10-19
CN201811221789.8A CN111078283B (en) 2018-10-19 2018-10-19 Operation method, device and related product
CN201811222491.9 2018-10-19
CN201811220957.1A CN111079924B (en) 2018-10-19 2018-10-19 Operation method, system and related product

Publications (1)

Publication Number Publication Date
WO2020078446A1 true WO2020078446A1 (en) 2020-04-23

Family

ID=70283666

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/111852 WO2020078446A1 (en) 2018-10-19 2019-10-18 Computation method and apparatus, and related product

Country Status (1)

Country Link
WO (1) WO2020078446A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160179434A1 (en) * 2014-12-19 2016-06-23 Intel Corporation Storage device and method for performing convolution operations
CN107016175A (en) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 It is applicable the Automation Design method, device and the optimization method of neural network processor
CN107103113A (en) * 2017-03-23 2017-08-29 中国科学院计算技术研究所 Towards the Automation Design method, device and the optimization method of neural network processor
CN107886167A (en) * 2016-09-29 2018-04-06 北京中科寒武纪科技有限公司 Neural network computing device and method
CN108416431A (en) * 2018-01-19 2018-08-17 上海兆芯集成电路有限公司 Neural network microprocessor and macro instruction processing method


Similar Documents

Publication Publication Date Title
EP3451151B1 (en) Apparatus and method for executing vector comparison operation
WO2017185404A1 (en) Apparatus and method for performing vector logical operation
CN107315565B (en) Device and method for generating random vectors obeying certain distribution
WO2020078446A1 (en) Computation method and apparatus, and related product
CN111079909B (en) Operation method, system and related product
CN112465116B (en) Compiling method, operation method, electronic device, and storage medium
CN111079925B (en) Operation method, device and related product
CN111078284B (en) Operation method, system and related product
WO2020073925A1 (en) Operation method and apparatus, computer device and storage medium
CN111078291B (en) Operation method, system and related product
KR102467544B1 (en) arithmetic device and its operating method
CN111079924B (en) Operation method, system and related product
CN111338694B (en) Operation method, device, computer equipment and storage medium
WO2022001438A1 (en) Computing apparatus, integrated circuit chip, board card, device and computing method
CN111026440B (en) Operation method, operation device, computer equipment and storage medium
CN111339060B (en) Operation method, device, computer equipment and storage medium
WO2020192587A1 (en) Artificial intelligence computing device and related product
CN111078285B (en) Operation method, system and related product
WO2022253287A1 (en) Method for generating random number, and related product thereof
WO2020073923A1 (en) Operation method and device, computer equipment, and storage medium
CN111079914B (en) Operation method, system and related product
WO2022001496A1 (en) Computing apparatus, integrated circuit chip, board card, electronic device, and computing method
CN111078125B (en) Operation method, device and related product
CN111078282B (en) Operation method, device and related product
CN111078280B (en) Operation method, device and related product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19872873

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19872873

Country of ref document: EP

Kind code of ref document: A1