CN111079916A - Operation method, system and related product - Google Patents

Operation method, system and related product

Info

Publication number
CN111079916A
Authority
CN
China
Prior art keywords
instruction
control
macro
instructions
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811222531.XA
Other languages
Chinese (zh)
Other versions
CN111079916B (en)
Inventor
Inventor not disclosed (request not to publish the inventor's name)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201811222531.XA priority Critical patent/CN111079916B/en
Priority to PCT/CN2019/111852 priority patent/WO2020078446A1/en
Publication of CN111079916A publication Critical patent/CN111079916A/en
Application granted granted Critical
Publication of CN111079916B publication Critical patent/CN111079916B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30094 Condition code generation, e.g. Carry, Zero flag

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Advance Control (AREA)

Abstract

The present disclosure relates to an operation method, system, and related products. The system includes an instruction generating device and a running device. The instruction generating device includes: a device determining module, configured to determine, according to a received control macro instruction, the running device for executing the control macro instruction; and an instruction generating module, configured to generate an operation instruction according to the control macro instruction and the running device. The running device includes: a control module, configured to obtain required data, a neural network model, and the operation instruction, and to parse the operation instruction to obtain a plurality of parsing instructions; and an execution module, configured to execute the plurality of parsing instructions according to the data to obtain an execution result. The operation method, system, and related products provided by the embodiments of the present disclosure can be used across platforms, and offer good applicability, a high instruction conversion speed, high processing efficiency, a low error probability, and low development cost in manpower and material resources.

Description

Operation method, system and related product
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method and a system for processing a control instruction, and a related product.
Background
With the continuous development of science and technology, neural network algorithms are used more and more widely, and have been successfully applied in fields such as image recognition, speech recognition, and natural language processing. However, as the complexity of neural network algorithms increases, their scale keeps growing. Running a large-scale neural network model on a Graphics Processing Unit (GPU) or a Central Processing Unit (CPU) takes a great deal of computation time and consumes a large amount of power. Related-art approaches to accelerating the processing of neural network models suffer from problems such as the inability to work across platforms, low processing efficiency, high development cost, and proneness to error.
Disclosure of Invention
In view of this, the present disclosure provides a control instruction processing method, a control instruction processing system, and related products, which enable control instructions to be used across platforms, improve processing efficiency, and reduce the error probability and the development cost.
According to a first aspect of the present disclosure, there is provided a control instruction processing system including an instruction generating apparatus and an executing apparatus,
the instruction generating apparatus includes:
the device determining module is used for determining running devices for executing the control macro instructions according to the received control macro instructions;
the instruction generating module is used for generating an operating instruction according to the control macro instruction and the operating equipment; the operation device includes:
the control module is used for acquiring required data, a neural network model and the operation instruction, analyzing the operation instruction and acquiring a plurality of analysis instructions;
the execution module is used for executing the plurality of analysis instructions according to the data to obtain an execution result,
wherein the control macro instruction is used for controlling the instruction stream to jump to a target jump position; the control macro instruction includes an operation type and the target jump position; the operation instruction includes the operation type and an operation target jump position; and the operation target jump position is determined according to the target jump position.
According to a second aspect of the present disclosure, there is provided a machine learning arithmetic device, the device including:
one or more control instruction processing systems according to the first aspect, configured to acquire data to be operated and control information from another processing apparatus, execute a designated machine learning operation, and transmit an execution result to the other processing apparatus through an I/O interface;
when the machine learning arithmetic device comprises a plurality of control instruction processing systems, the control instruction processing systems can be connected through a specific structure and transmit data;
the control instruction processing systems may be interconnected through a Peripheral Component Interconnect Express (PCIE) bus and transmit data, so as to support larger-scale machine learning operations; the control instruction processing systems may share the same control system or have their own control systems; they may share a memory or have their own memories; and the interconnection mode of the control instruction processing systems may be any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing apparatus, the apparatus comprising:
the machine learning arithmetic device, the universal interconnect interface, and the other processing device according to the second aspect;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip, which includes the machine learning arithmetic device of the second aspect or the combined processing device of the third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure, which includes the machine learning chip of the fourth aspect.
According to a sixth aspect of the present disclosure, a board card is provided, which includes the machine learning chip packaging structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device, which includes the machine learning chip of the fourth aspect or the board of the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a control instruction processing method applied to a control instruction processing system including an instruction generating apparatus and an executing apparatus, the method including:
determining, by the instruction generating device, an operating device that executes the control macro instruction according to the received control macro instruction, and generating an operating instruction according to the control macro instruction and the operating device;
acquiring data, a neural network model and an operation instruction through the operation equipment, analyzing the operation instruction to obtain a plurality of analysis instructions, executing the plurality of analysis instructions according to the data to obtain an execution result,
wherein the control macro-instruction is a macro-instruction for controlling the instruction stream to jump to the target jump position,
the control macro instruction comprises an operation type and a target jump position, the operation instruction comprises the operation type and an operation target jump position, and the operation target jump position is determined according to the target jump position.
In some embodiments, the electronic device includes a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a camcorder, a projector, a watch, a headset, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle includes an aircraft, a ship, and/or an automobile; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and/or a range hood; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
The control instruction processing method and system and the related products provided by the embodiments of the present disclosure involve an instruction generating device and a running device. The instruction generating device includes: a device determining module, configured to determine, according to a received control macro instruction, the running device for executing the control macro instruction; and an instruction generating module, configured to generate an operation instruction according to the control macro instruction and the running device. The running device includes: a control module, configured to obtain required data, a neural network model, and the operation instruction, and to parse the operation instruction to obtain a plurality of parsing instructions; and an execution module, configured to execute the plurality of parsing instructions according to the data to obtain an execution result. The method, system, and related products can be used across platforms, with good applicability, a high instruction conversion speed, high processing efficiency, a low error probability, and low cost in development manpower and material resources.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a block diagram of a control instruction processing system according to an embodiment of the present disclosure.
FIG. 2 shows a block diagram of a control instruction processing system according to an embodiment of the present disclosure.
Fig. 3a, 3b show schematic diagrams of application scenarios of a control instruction processing system according to an embodiment of the present disclosure.
Fig. 4a, 4b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.
Fig. 6 shows a flow diagram of a control instruction processing method according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
FIG. 1 shows a block diagram of a control instruction processing system according to an embodiment of the present disclosure. As shown in fig. 1, the system includes an instruction generating apparatus 100 and an executing apparatus 200.
The instruction generating device 100 includes a device determining module 11 and an instruction generating module 12. The device determining module 11 is configured to determine, according to the received control macro, an operating device that executes the control macro. The instruction generating module 12 is configured to generate an operation instruction according to the control macro and the operation device.
The operation device 200 includes a control module 21 and an execution module 22. The control module 21 is configured to obtain required data, a neural network model, and an operation instruction, and analyze the operation instruction to obtain a plurality of analysis instructions. The execution module 22 is configured to execute a plurality of parsing instructions according to the data to obtain an execution result.
The control macro is used for controlling the instruction stream to jump to the target jump position. The control macro includes an operation type and a target jump position, and the operation instruction includes the operation type and an operation target jump position, where the operation target jump position is determined according to the target jump position.
In this embodiment, a macro (macro instruction) is a term from batch processing: a macro may be a rule, pattern, or syntax replacement that is carried out automatically whenever the macro is encountered. A control macro can be formed by integrating commonly used control instructions to be executed for computing on, controlling, and transferring data.
In this embodiment, the operation type refers to the kind of operation the control macro performs on data and identifies the specific type of the control macro. For example, when the operation type of a certain control macro is "XXX", the specific operation that the control macro performs on data can be determined from "XXX". The instruction set required for executing the control macro can also be determined according to the operation type: when the operation type of a control macro is "XXX", the required instruction set is the set of all instructions needed to carry out the processing corresponding to "XXX". The target jump position is the position to which the instruction stream needs to jump; it may be the length of a register, the address of a register, the identifier of a register, an immediate value, or the like. The control macro may also contain other parameters related to controlling instruction-stream jumps, which is not limited by this disclosure.
In one possible implementation, the control macro may include at least one of: conditional jump macro instructions and unconditional jump macro instructions. The conditional jump macro may contain a jump condition. The jump condition is a condition that needs to be satisfied to perform an instruction stream jump.
It should be understood that the instruction format and the contained content of the control macro instructions may be set as desired by those skilled in the art, and the present disclosure is not limited thereto.
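To make the fields discussed above concrete, the following is a minimal Python sketch of one possible in-memory representation of a control macro instruction. The class and field names (ControlMacro, op_type, target, device_id, condition) are illustrative assumptions and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ControlMacro:
    op_type: str                      # operation type, e.g. "Jump" (unconditional) or "CB" (conditional)
    target: str                       # target jump position: register length/address/identifier or an immediate
    device_id: Optional[str] = None   # identifier of a designated device, if the macro has one
    condition: Optional[str] = None   # jump condition, present only for conditional jump macros

# example instances for the two kinds of control macro instruction named above
unconditional = ControlMacro(op_type="Jump", target="#505")
conditional = ControlMacro(op_type="CB", target="#506", condition="h")
```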
In this embodiment, the device determining module 11 may determine one or more operating devices according to the control macro. Instruction generation module 12 may generate one or more execution instructions. When a plurality of generated operating instructions are provided, the plurality of operating instructions may be executed in the same operating device or different operating devices, and the present disclosure is not limited thereto.
In this embodiment, when there is one execution device, the instruction generation module 12 may generate one or more execution instructions according to the control macro. When there are multiple operating devices, the instruction generating module 12 may generate an operating instruction corresponding to each operating device according to the control macro, where each operating device may correspond to one or more operating instructions.
The control instruction processing system provided by the embodiment of the disclosure includes an instruction generating device and a running device. The instruction generating device includes a device determining module, configured to determine, according to a received control macro instruction, the running device for executing the control macro instruction, and an instruction generating module, configured to generate an operation instruction according to the control macro instruction and the running device. The running device includes a control module, configured to obtain required data, a neural network model, and the operation instruction, and to parse the operation instruction to obtain a plurality of parsing instructions, and an execution module, configured to execute the plurality of parsing instructions according to the data to obtain an execution result. The control instruction processing system provided by the embodiment of the disclosure can be used across platforms, and has good applicability, a high instruction conversion speed, high processing efficiency, a low error probability, and low development cost of manpower and material resources.
FIG. 2 shows a block diagram of a control instruction processing system according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2, the device determining module 11 may include a first determining sub-module 111. The first determining sub-module 111 is configured to determine the specified device as an operating device when it is determined that the control macro includes the identifier of the specified device and the resource of the specified device meets the execution condition for executing the control macro. Wherein, the execution condition may include: the designated device contains a set of instructions corresponding to the control macro.
The identifier of the specified device may be a physical address, an IP address, a name, a number, or the like of the specified device, and may consist of one or any combination of numbers, letters, and symbols. When the position reserved for the identifier of the specified device in the control macro instruction is empty, it is determined that the control macro instruction has no specified device; alternatively, when the control macro instruction does not include an "identifier of specified device" field, it is determined that the control macro instruction has no specified device.
In this implementation, the control macro may include an identification of one or more designated devices that execute the control macro. When the control macro includes the identifier of the specified device and the resource of the specified device meets the execution condition, the first determining sub-module 111 may directly determine the specified device as the operating device, so as to save the generation time for generating the operating instruction based on the control macro and ensure that the generated operating instruction can be executed by the corresponding operating device.
In one possible implementation, as shown in fig. 2, the instruction generating device 100 may further include a macro instruction generating module 13. The macro instruction generating module 13 is configured to receive a control instruction to be executed, and generate a control macro instruction according to the determined identifier of the specific device and the control instruction to be executed.
In this implementation, the specified device may be determined according to the operation type of the control instruction to be executed, or the like. The received control instruction to be executed can be one or more.
The control instructions to be executed may include at least one of: and a conditional jump instruction is to be executed and an unconditional jump instruction is to be executed.
The control instruction to be executed may include at least the following: operation type and target jump position.
In this implementation, when there is one control instruction to be executed, the determined identifier of the specified device may be added to the control instruction to be executed, so as to generate a control macro instruction. For example, suppose a control instruction m to be executed is "XXX …… param", where XXX is the operation type and param is an instruction parameter. The designated device m-1 can be determined according to the operation type "XXX" of the control instruction m to be executed. Then, the identifier of the designated device m-1 (e.g., 09) is added to the control instruction m to be executed, generating the control macro instruction M "XXX 09, …… param" corresponding to it. When there are multiple control instructions to be executed, the identifier of the specified device determined for each control instruction to be executed may be added to that instruction, and either one control macro instruction or a plurality of corresponding control macro instructions may be generated from the plurality of control instructions to be executed that carry the identifiers of their specified devices.
It should be understood that the instruction format and the contained content of the control instruction to be executed can be set by those skilled in the art according to the needs, and the present disclosure does not limit this.
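A minimal sketch of this macro-generation step is shown below. The operation-type-to-device mapping, the helper name to_control_macro, and the exact text formats are assumptions made for illustration only, loosely following the "XXX 09, …… param" example above.

```python
OP_TYPE_TO_DEVICE = {"XXX": "09"}   # hypothetical mapping from operation type to a designated device id

def to_control_macro(instruction: str) -> str:
    """Turn a to-be-executed instruction 'XXX …… param' into a macro 'XXX 09, …… param'."""
    op_type, _, params = instruction.partition(" ")
    device_id = OP_TYPE_TO_DEVICE.get(op_type)
    if device_id is None:
        return instruction              # no designated device determined: leave the instruction unchanged
    return f"{op_type} {device_id}, {params}"

print(to_control_macro("XXX …… param"))   # -> "XXX 09, …… param"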
In one possible implementation, as shown in fig. 2, the instruction generating apparatus 100 may further include a resource obtaining module 14. The device determination module 11 may also include a second determination submodule 112. The resource obtaining module 14 is configured to obtain resource information of the alternative device. The second determining submodule 112 is configured to, when it is determined that the control macro does not include the identifier of the specified device, determine, according to the received control macro and the resource information of the alternative device, an operating device for executing the control macro from the alternative device. Wherein the resource information may comprise a set of instructions contained by the alternative device. The set of instructions included by the alternative device may be a set of instructions corresponding to one or more types of operations of the control macro. The more instruction sets the alternative device contains, the more types of control macro instructions the alternative device is capable of executing.
In this implementation, the second determining submodule 112 may determine, when it is determined that the control macro does not include the identifier of the specific device, one or more operating devices capable of executing the control macro from the alternative devices. Wherein the determined instruction set of the operating device includes an instruction set corresponding to the control macro. For example, the received control macro is an unconditional control jump macro, and an alternative device containing an instruction set corresponding to the unconditional control jump macro may be determined as a running device to ensure that it can run the generated running instruction.
In one possible implementation, as shown in fig. 2, the device determining module 11 may further include a third determining sub-module 113. When it is determined that the control macro includes the identifier of the specified device and the resource of the specified device does not satisfy the execution condition for executing the control macro, the third determining sub-module 113 determines the operating device according to the control macro and the resource information of the alternative device.
In this implementation, when it is determined that the control macro includes the identifier of the specified device and the resource of the specified device does not satisfy the execution condition, the third determining sub-module 113 may determine that the specified device of the control macro does not have the capability of executing the control macro. The third determination submodule 113 may determine an operating device from among the alternative devices, and may determine an alternative device containing an instruction set corresponding to the control macro as the operating device.
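The behaviour of the first, second, and third determining sub-modules can be summarized by the following sketch, under the simplifying assumption that each alternative device is described only by the instruction sets it contains and that the operation type names the required instruction set. The function and data layout are illustrative, not part of the disclosure.

```python
def determine_running_devices(op_type, designated_id, candidates):
    """candidates: dict mapping a device identifier to the set of instruction sets it contains."""
    capable = [dev for dev, isa in candidates.items() if op_type in isa]
    if designated_id is not None:                          # the macro names a designated device
        if op_type in candidates.get(designated_id, set()):
            return [designated_id]                         # first sub-module: use the designated device
        return capable                                     # third sub-module: fall back to capable candidates
    return capable                                         # second sub-module: no designated device given

# e.g. a macro of operation type "DDD" designating device "04", which lacks the needed
# instruction set, so the capable alternative devices are returned instead
print(determine_running_devices("DDD", "04", {"04": {"XXX"}, "NPU-2": {"DDD"}, "NPU-n": {"DDD"}}))
```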
In a possible implementation manner, the instruction generating module 12 is further configured to determine a data amount of the control macro, and generate the operation instruction according to the data amount of the control macro, and resource information of the operation device. The resource information of the operating device may further include at least one of a storage capacity and a remaining storage capacity.
The storage capacity of the operating device may refer to the amount of binary information that the memory of the operating device can accommodate. The remaining storage capacity of the operating device may refer to the storage capacity currently available to the operating device for instruction execution after the occupied storage capacity is subtracted. The resource information of the operating device can characterize its operation capability: the larger the storage capacity and the remaining storage capacity, the stronger the operation capability of the operating device.
In this implementation manner, the instruction generating module 12 may determine a specific manner of splitting the control macro instruction according to the resource information of each operating device, the data size of the control macro instruction, and the like, so as to split the control macro instruction and generate an operating instruction corresponding to the operating device.
In a possible implementation manner, the instruction generating module 12 may be configured to, when it is determined that there is one execution device and the operation data amount of the execution device is smaller than the data amount of the control macro, split the control macro into a plurality of operation instructions according to the operation data amount and the data amount of the execution device, so that the execution device sequentially executes the plurality of operation instructions. Wherein, the operation data amount of the operation device may be determined according to the resource information of the operation device. The operation data amount of the operation device may be determined according to the storage capacity or the remaining storage capacity of the operation device.
In this implementation, when it is determined that there is one operating device and the data amount of the operating device is greater than or equal to the data amount of the control macro, the instruction generating module 12 may directly convert the control macro into one operating instruction, and may split the control macro into a plurality of operating instructions, which is not limited in this disclosure.
In a possible implementation manner, the instruction generating module 12 may be configured to, when it is determined that there are multiple operating devices, split the control macro according to the data amount of the control macro and the operation data amount of each operating device, and generate an operation instruction corresponding to each operating device. Wherein the operation data amount of each operation device may be determined according to the resource information of each operation device.
In this implementation, the instruction generating module 12 may generate one or more operation instructions for each operating device according to the operation data amount of each operating device, so that the corresponding operating device can execute the operation instructions.
In a possible implementation manner, the instruction generating module 12 may further split the control macro instruction according to the control macro instruction and a preset control macro instruction splitting rule, to generate the operation instructions. The control macro instruction splitting rule may be determined according to a conventional splitting manner for control macro instructions (for example, splitting according to the processing procedure of the control macro instruction) in combination with a threshold on the operation data amount of instructions that all the alternative devices can execute. The control macro instruction is then split into operation instructions whose data amount is less than or equal to the operation data amount threshold, so as to ensure that each generated operation instruction can be executed on the corresponding running device (the running device being any one of the alternative devices). The operation data amount threshold of instructions that all the alternative devices can execute may be determined by comparing the storage capacities (or remaining storage capacities) of all the alternative devices and taking the minimum.
It should be understood that a person skilled in the art can set the manner of generating operation instructions according to actual needs, and the present disclosure does not limit this.
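As one possible reading of the splitting rules above, the following sketch divides the data amount of a control macro instruction into chunks no larger than the running device's operation data amount, producing one operation instruction per chunk. The (offset, size) chunk representation is an assumption for illustration.

```python
from typing import List, Tuple

def split_macro(macro_data_amount: int, run_data_amount: int) -> List[Tuple[int, int]]:
    """Return (offset, size) pairs, one per generated operation instruction."""
    chunks = []
    offset = 0
    while offset < macro_data_amount:
        size = min(run_data_amount, macro_data_amount - offset)
        chunks.append((offset, size))
        offset += size
    return chunks

# e.g. a macro covering 1000 data units on a device whose operation data amount is 256
print(split_macro(1000, 256))   # -> [(0, 256), (256, 256), (512, 256), (768, 232)]
```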
In this embodiment, the operation instruction generated by the instruction generating module according to the control macro may be a control instruction to be executed, or may be one or more analyzed instructions obtained by analyzing the control instruction to be executed, which is not limited in this disclosure.
In one possible implementation, as shown in fig. 2, the instruction generating apparatus 100 may further include a queue building module 15. The queue building module 15 is configured to sort the operation instructions according to a queue sorting rule, and build an instruction queue corresponding to the operation device according to the sorted operation instructions.
In this implementation, an instruction queue uniquely corresponding to each running device may be constructed for that running device. The operation instructions may be sent one by one to the running device uniquely corresponding to the instruction queue, in the order of the operation instructions in the instruction queue; alternatively, the instruction queue may be sent to the running device, so that the running device executes the operation instructions in the order given by the instruction queue. In this way, the running device executes the operation instructions according to the instruction queue, which prevents operation instructions from being executed out of order or with delay, and prevents operation instructions from being omitted.
In this implementation, the queue sorting rule may be determined according to information such as an expected execution time for executing the execution instruction, a generation time of the execution instruction, and an operation type related to the execution instruction itself, which is not limited by this disclosure.
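A sketch of the queue construction described above is given below, assuming the queue sorting rule is "order by generation time" (one of the options named above); the tuple layout and function name are illustrative assumptions.

```python
from collections import defaultdict

def build_instruction_queues(run_instructions):
    """run_instructions: iterable of (device_id, generation_time, instruction) tuples."""
    queues = defaultdict(list)
    for device_id, gen_time, instr in run_instructions:
        queues[device_id].append((gen_time, instr))
    # one uniquely corresponding queue per running device, sorted by the assumed sorting rule
    return {dev: [instr for _, instr in sorted(items)] for dev, items in queues.items()}

print(build_instruction_queues([("CPU-1", 2, "2-2"), ("CPU-1", 1, "2-1"), ("GPU-n", 1, "4-1")]))
# -> {'CPU-1': ['2-1', '2-2'], 'GPU-n': ['4-1']}
```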
In one possible implementation, as shown in FIG. 2, instruction generation device 100 may also include an instruction dispatch module 16. The instruction dispatch module 16 is configured to send the execution instruction to the execution device, so that the execution device executes the execution instruction.
In this implementation, when the running device executes a single operation instruction, the operation instruction may be sent to it directly. When the running device executes multiple operation instructions, all of them may be sent to the running device at once, so that it executes them in sequence; or the operation instructions may be sent to the running device one at a time, with the next operation instruction sent only after the running device has finished the current one. A person skilled in the art may choose the manner in which operation instructions are sent to the running device, and the present disclosure does not limit this.
In one possible implementation, as shown in FIG. 2, the instruction dispatch module 16 may include an instruction assembly submodule 161, an assembly translation submodule 162, and an instruction issue submodule 163. The instruction assembling sub-module 161 is used for generating an assembling file according to the operation instruction. The assembly translation sub-module 162 is used to translate the assembly file into a binary file. The instruction sending submodule 163 is configured to send the binary file to the operating device, so that the operating device executes the operating instruction according to the binary file.
In this way, the data amount of the operation instruction can be reduced, the time for sending the operation instruction to the running device is saved, and the conversion and execution of the control macro instruction are accelerated.
In this implementation manner, after the binary file is sent to the running device, the running device may decode the received binary file to obtain a corresponding running instruction, and execute the obtained running instruction to obtain an execution result.
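A toy sketch of the assemble-translate-send pipeline of the instruction dispatch module follows. The trivial text-to-bytes "translation" here is a placeholder standing in for a real assembler, and all function names are assumptions.

```python
def assemble(run_instructions):
    """Instruction assembly sub-module: produce assembly-style text from the operation instructions."""
    return "\n".join(run_instructions)

def to_binary(assembly_text):
    """Assembly translation sub-module: placeholder translation of the text into a binary form."""
    return assembly_text.encode("utf-8")

def dispatch(run_instructions, send):
    """Instruction sending sub-module: send the binary file to the running device."""
    send(to_binary(assemble(run_instructions)))

dispatch(["@Jump #505", "@CB #506 #h"], send=lambda binary: print(len(binary), "bytes sent"))
```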
In a possible implementation manner, the running device 200 may be one or any combination of a CPU, a GPU, and an embedded Neural-Network Processing Unit (NPU). In this way, the speed at which the device generates the operating instructions from the control macro is increased.
In one possible implementation, the instruction generating apparatus 100 may be provided in the CPU and/or the NPU. To realize the process of generating the operation instruction according to the control macro by the CPU and/or NPU, providing more possible ways for the implementation of the instruction generating apparatus 100.
In a possible implementation, the execution device 200 further comprises a storage module 23. The memory module 23 may include at least one of a register and a cache, and the cache may include a scratch pad cache. The cache may be used to store data. The registers may be used to store scalar data within the data.
In one possible implementation, the control module 21 may include an instruction storage submodule 211 and an instruction processing submodule 212. The instruction storage submodule 211 is used for storing the operation instruction. The instruction processing submodule 212 is configured to parse the operation instruction to obtain a plurality of parsing instructions.
In one possible implementation, the control module 21 may further include a store queue submodule 213. The storage queue submodule 213 is configured to store an operation instruction queue, where the operation instruction queue includes an operation instruction that needs to be executed by the operation device and a plurality of parsing instructions. And all the instructions in the operation instruction queue are sequentially arranged according to the execution sequence.
In a possible implementation, the execution module 22 may further include a dependency processing sub-module 221. The dependency relationship processing submodule 221 is configured to cache the first parsing instruction in the instruction storage submodule when it is determined that the first parsing instruction has an association relationship with a zeroth parsing instruction before the first parsing instruction, and extract the first parsing instruction from the instruction storage submodule after the zeroth parsing instruction is executed and send the first parsing instruction to the execution module.
The association relationship between the first parsing instruction and the zeroth parsing instruction before the first parsing instruction may include: the first storage address interval storing the data required by the first parsing instruction overlaps the zeroth storage address interval storing the data required by the zeroth parsing instruction. Conversely, the absence of an association relationship between the first parsing instruction and the zeroth parsing instruction may mean that the first storage address interval and the zeroth storage address interval have no overlapping area.
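The overlap test used by the dependency relationship processing sub-module can be sketched as follows, assuming half-open address intervals; the interval representation is an illustrative assumption.

```python
def intervals_overlap(first, zeroth):
    """Each interval is (start_address, end_address), end exclusive (an assumption)."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]

# the first parsing instruction uses [100, 200) while the zeroth uses [150, 250):
# the intervals overlap, so the first must wait until the zeroth has finished executing
print(intervals_overlap((100, 200), (150, 250)))   # -> True
print(intervals_overlap((100, 200), (300, 400)))   # -> False (no association relationship)
```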
In one possible implementation, the control macro refers to a macro that controls the instruction stream to jump to a target jump location. The unconditional jump macro may be a macro for controlling the instruction stream to jump unconditionally to a specified location. The conditional jump macro may be a macro for controlling an instruction stream so that the conditional jump macro jumps to a specified location when a condition that needs to be satisfied is true.
In this embodiment, a control macro must include an operation code, i.e., the operation type, and at least one operation field, which may include the identifier of a specified device, the target jump position, and a jump condition. The operation code is the part of the instruction (usually denoted by a code) that specifies the operation to be performed; it tells the device executing the instruction which instruction specifically needs to be executed. The operation field is the source of all data required for executing the corresponding instruction, including parameter data, the data to be operated on or processed, the corresponding operation method, or addresses at which the parameter data, the data to be operated on or processed, and the corresponding operation method are stored.
It should be noted that, although the control instruction processing system is described above by taking the above-described embodiment as an example, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
Application example
An application example according to the embodiment of the present disclosure is given below in conjunction with "a working process of a control instruction processing system" as one exemplary application scenario to facilitate understanding of a flow of the control instruction processing system. It is to be understood by those skilled in the art that the following application examples are for the purpose of facilitating understanding of the embodiments of the present disclosure only and are not to be construed as limiting the embodiments of the present disclosure.
First, the instruction format of the control macro instruction, the instruction format of the control instruction to be executed, and the process of executing the execution instruction by the execution device are described, and the following is a specific example.
The instruction format of the unconditional jump macro may be:
Jump device_id,src
where Jump is the operation type corresponding to the unconditional jump macro instruction, device_id is the identifier of the designated device, and src is the target jump position to which the instruction stream needs to jump. The target jump position may be the length of a register, the address of a register, the identifier of a register, an immediate value, etc.
For the unconditional jump macro, the unconditional jump macro must contain an operation type and a target jump position, and the operation instruction generated according to the unconditional jump macro must also contain the operation type and the target jump position. Wherein the operation target jump position is determined according to the target jump position.
Take as an example an operation instruction "@Jump #505" generated from a certain unconditional jump macro instruction. After the running device receives the operation instruction, the execution process is as follows: the current instruction stream jumps to address 505 and execution continues from there.
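A toy interpreter sketch of this execution process is shown below, assuming the textual operation-instruction format used in the example; the parsing details are illustrative only.

```python
def execute_jump(run_instruction, program_counter):
    """Execute an unconditional jump such as "@Jump #505": return the new instruction address."""
    op, target = run_instruction.lstrip("@").split()
    assert op == "Jump"
    return int(target.lstrip("#"))   # program_counter is ignored: the jump is unconditional

print(execute_jump("@Jump #505", program_counter=7))   # -> 505
```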
The instruction format of the conditional jump macro instruction may be:
CB device_id,src,condition
where CB is the operation type corresponding to the conditional jump macro instruction, device_id is the identifier of the designated device, src is the target jump position to which the instruction stream needs to jump, and condition is the jump condition. For example, the condition may be "the value of the register is zero", and when the value of the register is zero, the jump to the target jump position is performed. The target jump position may be the length of a register, the address of a register, the identifier of a register, an immediate value, etc.
For the conditional jump macro, it must contain the operation type and the target jump position, and the operation instruction generated according to the conditional jump macro must also contain the operation type and the target jump position. Wherein the operation target jump position is determined according to the target jump position.
Take as an example an operation instruction "@CB #506 #h" generated from a conditional jump macro instruction. After the running device receives the operation instruction, the execution process is as follows: determine whether the jump condition "h" holds, and when "h" holds, jump the current instruction stream to address 506 and continue execution from there.
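A companion sketch for the conditional case follows, with the jump condition abstracted to a boolean; the real check (e.g. whether a register value is zero) is omitted as an assumption.

```python
def execute_cb(target, condition_holds, program_counter):
    """Jump to the target address only when the jump condition holds; otherwise fall through."""
    return target if condition_holds else program_counter + 1

print(execute_cb(506, condition_holds=True, program_counter=7))    # -> 506 (jump taken)
print(execute_cb(506, condition_holds=False, program_counter=7))   # -> 8 (fall through)
```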
The instruction format of the unconditional jump instruction to be executed may be:
Jump src
wherein Jump is the operation type corresponding to the unconditional Jump instruction to be executed, and src is the target Jump position to which the instruction stream needs to Jump.
The instruction format of the conditional jump instruction to be executed may be:
CB src,condition
the CB is an operation type corresponding to a conditional jump instruction to be executed, src is a target jump position to which an instruction stream needs to jump, and condition is a jump condition. For example, the condition may be "whether the value of the register is zero or not true", and when the value of the register is zero, the jump to the target jump position may be performed.
Fig. 3a and 3b show schematic diagrams of application scenarios of a control instruction processing system according to an embodiment of the present disclosure. The control instruction processing system may include the above-described alternative devices and the instruction generating device. As shown in Fig. 3a and Fig. 3b, there may be a plurality of alternative devices for executing the control macro instruction, such as CPU-1, CPU-2, …, CPU-n, NPU-1, NPU-2, …, NPU-n and GPU-1, GPU-2, …, GPU-n, and the alternative device selected to execute the corresponding operation instruction is the running device. The working process and principle by which the control instruction processing system generates and executes operation instructions according to a given control macro instruction are as follows.
Resource acquisition module 14
The resource obtaining module 14 acquires resource information of the alternative devices, including the remaining storage capacity and the storage capacity of each alternative device and the instruction sets it contains, and sends the acquired resource information of the alternative devices to the device determining module 11 and the instruction generating module 12.
The device determination module 11 (including a first determination sub-module 111, a second determination sub-module 112, and a third determination sub-module 113)
When a control macro instruction is received, the running device for executing the control macro instruction is determined according to the received control macro instruction. For example, the following control macro instructions are received, where the control macro instructions may come from different platforms.
Control macro 1: @XXX #01 ……
Control macro 2: @SSS #02 ……
Control macro 3: @DDD #04 ……
Control macro 4: @NNN ……
When the first determining sub-module 111 determines that the control macro includes the identifier of the specified device and determines that the specified device includes the instruction set corresponding to the control macro, the first determining sub-module 111 may determine the specified device as an operating device for executing the control macro, and send the identifier of the determined operating device to the instruction generating module 12. For example, the first determination sub-module 111 may determine a specific device corresponding to the identifier 01, such as CPU-2 (the CPU-2 includes an instruction set corresponding to the control macro 1), as an execution device for executing the control macro 1. A specific device such as CPU-1 (in which CPU-1 includes an instruction set corresponding to control macro 2) corresponding to the identifier 02 may be determined as an execution device for executing the control macro 2.
When the third determining sub-module 113 determines that the control macro includes the identifier of the specified device and determines that the specified device does not include the instruction set corresponding to the control macro, the third determining sub-module 113 may determine, as the operating device, the candidate device including the instruction set corresponding to the control macro, and send the identifier of the determined operating device to the instruction generating module 12. For example, when it is determined that the instruction set corresponding to the control macro 3 is not included in the designated device corresponding to the identifier 04, the third determination sub-module 113 may determine, as the running device for executing the control macro 3, the candidate device, such as NPU-n or NPU-2, including the instruction set corresponding to the operation type DDD of the control macro 3.
When the second determining submodule 112 determines that the control macro instruction does not have the identifier of the specified device (the position corresponding to the identifier of the specified device is empty, or the control macro instruction does not include the field of "identifier of specified device"), the second determining submodule 112 may determine the operating device from the alternative devices according to the control macro instruction and the resource information of the alternative devices (the specific determination process is detailed in the description related to the second determining submodule 112), and send the determined identifier of the operating device to the instruction generating module 12. For example, since the control macro 4 does not have the identifier of the specified device, the second determining sub-module 112 may determine, from the candidate devices, an operating device, for example, GPU-n (GPU-n includes an instruction set corresponding to the operation type NNN) for executing the control macro 4, according to the operation type NNN of the control macro 4 and resource information (included instruction set) of the candidate devices.
Instruction generation module 12
When it is determined that there is one running device and its operation data amount is smaller than the data amount of the control macro instruction, the control macro instruction may be split into multiple operation instructions, which are sent to the queue building module 15. For example, a plurality of operation instructions 2-1, 2-2, …, 2-n are generated based on the data amount of control macro 2 and the operation data amount of the running device CPU-1, and a plurality of operation instructions 4-1, 4-2, …, 4-n are generated based on the data amount of control macro 4 and the operation data amount of the running device GPU-n.
When it is determined that there is one running device and its operation data amount is greater than or equal to the data amount of the control macro instruction, a single operation instruction may be generated from the control macro instruction and sent to the queue building module 15. For example, one operation instruction 1-1 is generated based on the data amount of control macro 1 and the operation data amount of the running device CPU-2.
When it is determined that there are multiple running devices, the control macro instruction is split, an operation instruction corresponding to each running device is generated, and the operation instructions are sent to the queue building module 15. For example, based on the data amount of control macro 3 and the operation data amounts of the running devices NPU-n and NPU-2, a plurality of operation instructions 3-1, 3-2, …, 3-n are generated for the running device NPU-n, and a plurality of operation instructions 3′-1, 3′-2, …, 3′-n are generated for the running device NPU-2.
Queue building Block 15
When the operation instructions are received, all operation instructions to be executed by each running device are sorted according to the queue sorting rule, a uniquely corresponding instruction queue is constructed for each running device from the sorted operation instructions, and the instruction queues are sent to the instruction dispatching module 16. Specifically:
for an operation instruction 1-1 executed by the operation device CPU-2. The instruction queue CPU-2 "constructed corresponding to the execution device CPU-2 includes only the execution instructions 1-1.
For the plurality of operation instructions 2-1, 2-2, …, 2-n executed by the running device CPU-1, the operation instructions are sorted according to the queue sorting rule, and the instruction queue CPU-1′ corresponding to the running device CPU-1 is constructed from the sorted operation instructions 2-1, 2-2, …, 2-n.
For the plurality of operation instructions 3-1, 3-2, …, 3-n executed by the running device NPU-n, the operation instructions are sorted according to the queue sorting rule, and the instruction queue NPU-n′ corresponding to the running device NPU-n is constructed from the sorted operation instructions 3-n, …, 3-2, 3-1.
For the plurality of operation instructions 3′-1, 3′-2, …, 3′-n executed by the running device NPU-2, the operation instructions are sorted according to the queue sorting rule, and the instruction queue NPU-2″ corresponding to the running device NPU-2 is constructed from the sorted operation instructions 3′-n, …, 3′-2, 3′-1.
For the plurality of operation instructions 4-1, 4-2, …, 4-n executed by the running device GPU-n, the operation instructions are sorted according to the queue sorting rule, and the instruction queue GPU-n′ corresponding to the running device GPU-n is constructed from the sorted operation instructions 4-1, 4-2, …, 4-n.
Instruction dispatch module 16
After the instruction queues are received, the operation instructions in each instruction queue are sent in sequence to the corresponding running device, so that the running device executes them. For example, the operation instruction 1-1 in the instruction queue CPU-2″ is sent to its corresponding running device CPU-2; the operation instructions 2-1, 2-2, …, 2-n in the instruction queue CPU-1′ are sent in sequence to the corresponding running device CPU-1; the operation instructions 3-n, …, 3-2, 3-1 in the instruction queue NPU-n′ are sent in sequence to the corresponding running device NPU-n; the operation instructions 3′-n, …, 3′-2, 3′-1 in the instruction queue NPU-2″ are sent in sequence to the corresponding running device NPU-2; and the operation instructions 4-1, 4-2, …, 4-n in the queue GPU-n′ are sent in sequence to the corresponding running device GPU-n.
After receiving their instruction queues, the running devices CPU-2, CPU-1, NPU-n, NPU-2 and GPU-n execute the operation instructions in sequence according to the order of the operation instructions in their instruction queues. Taking the running device CPU-2 as an example, the specific process of executing a received operation instruction is described below. The running device CPU-2 includes a control module, an execution module, and a storage module; the control module includes an instruction storage submodule, an instruction processing submodule, and a storage queue submodule, and the execution module includes a dependency relationship processing submodule, as detailed in the related description of the running device above.
Assume that the operation instruction 1-1 generated according to control macro 1 is "@XXX ……". After receiving the operation instruction 1-1, the running device CPU-2 executes the operation instruction 1-1 as follows:
the control module of the operating device CPU-2 obtains data, a neural network model and an operating instruction 1-1. The instruction storage submodule is used for storing an operation instruction 1-1. The instruction processing submodule is used for analyzing the operation instruction 1-1, obtaining a plurality of analysis instructions such as an analysis instruction 0, an analysis instruction 1 and an analysis instruction 2, and sending the plurality of analysis instructions to the storage queue submodule and the execution module. The storage queue submodule is used for storing an operation instruction queue, the operation instruction queue comprises an analysis instruction 0, an analysis instruction 1, an analysis instruction 2 and other operation instructions which are required to be executed by the CPU-2 of the operation equipment, and all the instructions are sequentially arranged in the operation instruction queue according to the execution sequence. For example, the obtained sequence of the execution of the multiple analysis instructions is analysis instruction 0, analysis instruction 1, and analysis instruction 2, and there is an association relationship between analysis instruction 1 and analysis instruction 0.
After the execution module of the operating device CPU-2 receives the plurality of analysis instructions, the dependency relationship processing submodule judges whether an association relationship exists among the plurality of analysis instructions. And the dependency relationship processing submodule determines that the analysis instruction 1 and the analysis instruction 0 have an incidence relationship, caches the analysis instruction 1 into the instruction storage submodule, and extracts the analysis instruction 1 from the cache and sends the analysis instruction 1 to the execution module after determining that the analysis instruction 0 is executed, so that the execution module can execute the analysis instruction.
The execution module receives and executes the resolving instruction 0, the resolving instruction 1 and the resolving instruction 2 to complete the operation of the operation instruction 1-1.
The working process of the above modules can refer to the above related description.
Therefore, the system can be used in a cross-platform mode, the applicability is good, the instruction conversion speed is high, the processing efficiency is high, the error probability is low, and the cost of developing manpower and material resources is low.
The present disclosure provides a machine learning arithmetic device, which may include one or more of the above-described control instruction processing systems, for acquiring data to be operated and control information from other processing devices, and executing a specified machine learning operation. The machine learning arithmetic device can obtain a control macro instruction or a control instruction to be executed from other machine learning arithmetic devices or non-machine learning arithmetic devices, and transmit an execution result to peripheral equipment (also called other processing devices) through an I/O interface. Peripheral devices such as cameras, displays, mice, keyboards, network cards, wifi interfaces, servers. When more than one control command processing system is included, the control command processing systems can be linked and transmit data through a specific structure, for example, a PCIE bus is used for interconnection and data transmission, so as to support larger-scale operation of the neural network. At this time, the same control system may be shared, or there may be separate control systems; the memory may be shared or there may be separate memories for each accelerator. In addition, the interconnection mode can be any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
Fig. 4a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in fig. 4a, the combined processing device includes the machine learning arithmetic device, the universal interconnection interface, and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user.
Other processing devices include one or more of general purpose/special purpose processors such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), neural network processors, and the like. The number of processors included in the other processing devices is not limited. The other processing devices are used as interfaces of the machine learning arithmetic device and external data and control, and comprise data transportation to finish basic control of starting, stopping and the like of the machine learning arithmetic device; other processing devices may cooperate with the machine learning computing device to perform computing tasks.
And the universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and other processing devices. The machine learning arithmetic device acquires required input data from other processing devices and writes the input data into a storage device on the machine learning arithmetic device; control instructions can be obtained from other processing devices and written into a control cache on a machine learning arithmetic device chip; the data in the storage module of the machine learning arithmetic device can also be read and transmitted to other processing devices.
Fig. 4b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation manner, as shown in fig. 4b, the combined processing device may further include a storage device, and the storage device is connected to the machine learning operation device and the other processing device respectively. The storage device is used for storing data stored in the machine learning arithmetic device and the other processing device, and is particularly suitable for data which is required to be calculated and cannot be stored in the internal storage of the machine learning arithmetic device or the other processing device.
The combined processing device can be used as an SOC (system on chip) system of equipment such as a mobile phone, a robot, an unmanned aerial vehicle and video monitoring equipment, the core area of a control part is effectively reduced, the processing speed is increased, and the overall power consumption is reduced. In this case, the generic interconnect interface of the combined processing device is connected to some component of the apparatus. Some parts are such as camera, display, mouse, keyboard, network card, wifi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning arithmetic device or combined processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in fig. 5, the board includes the above-mentioned machine learning chip package structure or the above-mentioned machine learning chip. The board may include, in addition to the machine learning chip 389, other kits including, but not limited to: memory device 390, interface device 391 and control device 392.
The memory device 390 is coupled to a machine learning chip 389 (or a machine learning chip within a machine learning chip package structure) via a bus for storing data. Memory device 390 may include multiple sets of memory cells 393. Each group of memory cells 393 is coupled to a machine learning chip 389 via a bus. It is understood that each group 393 may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse. DDR is twice as fast as standard SDRAM.
In one embodiment, memory device 390 may include 4 groups of memory cells 393. Each group of memory cells 393 may include a plurality of DDR4 particles (chips). In one embodiment, the machine learning chip 389 may include 4 72-bit DDR4 controllers therein, where 64bit is used for data transmission and 8bit is used for ECC check in the 72-bit DDR4 controller. It is appreciated that when DDR4-3200 particles are used in each group of memory cells 393, the theoretical bandwidth of data transfer may reach 25600 MB/s.
In one embodiment, each group 393 of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is provided in the machine learning chip 389 for controlling data transfer and data storage of each memory unit 393.
Interface device 391 is electrically coupled to machine learning chip 389 (or a machine learning chip within a machine learning chip package). The interface device 391 is used to implement data transmission between the machine learning chip 389 and an external device (e.g., a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface. For example, the data to be processed is transmitted to the machine learning chip 289 by the server through the standard PCIE interface, so as to implement data transfer. Preferably, when PCIE 3.0X 16 interface transmission is adopted, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface, and the disclosure does not limit the specific representation of the other interface, and the interface device can implement the switching function. In addition, the calculation result of the machine learning chip is still transmitted back to the external device (e.g., server) by the interface device.
The control device 392 is electrically connected to a machine learning chip 389. The control device 392 is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a single chip Microcomputer (MCU). For example, machine learning chip 389 may include multiple processing chips, multiple processing cores, or multiple processing circuits, which may carry multiple loads. Therefore, the machine learning chip 389 can be in different operation states such as a multi-load and a light load. The control device can regulate and control the working states of a plurality of processing chips, a plurality of processing circuits and/or a plurality of processing circuits in the machine learning chip.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship, and/or a vehicle. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas cookers, and range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
Fig. 6 shows a flow diagram of a control instruction processing method according to an embodiment of the present disclosure. As shown in fig. 6, the method is applied to the control instruction processing system including the instruction generating device and the execution device described above, and includes step S41 and step S42.
In step S41, the instruction generating device determines an operation device that executes the control macro instruction based on the received control macro instruction, and generates an operation instruction based on the control macro instruction and the operation device.
In step S42, the operation device acquires the data, the neural network model, and the operation instruction, analyzes the operation instruction to obtain a plurality of analysis instructions, and executes the plurality of analysis instructions according to the data to obtain an execution result.
The control macro is used for controlling the instruction stream to jump to the target jump position. The control macro instruction comprises an operation type and a target jump position, and the operation instruction comprises an operation type and a target jump position. The operation target jump position is determined according to the target jump position.
In one possible implementation, step S41 may include: and when the control macro instruction is determined to contain the identification of the specified device, and the resource of the specified device meets the execution condition for executing the control macro instruction, determining the specified device as the operating device. Wherein, the execution condition may include: the designated device contains a set of instructions corresponding to the control macro.
In one possible implementation, the method may further include: and acquiring resource information of the alternative equipment.
Wherein, step S41 may further include: and when the control macro instruction is determined not to contain the identification of the specified equipment, determining running equipment for executing the control macro instruction from the alternative equipment according to the received control macro instruction and the resource information of the alternative equipment. Wherein the resource information may comprise a set of instructions contained by the alternative device.
In one possible implementation, step S41 may further include: and when the control macro instruction is determined to contain the identification of the specified equipment and the resource of the specified equipment does not meet the execution condition for executing the control macro instruction, determining the running equipment according to the control macro instruction and the resource information of the alternative equipment.
In one possible implementation, the method may further include: and sequencing the operating instructions according to a queue sequencing rule through the instruction generating equipment, and constructing an instruction queue corresponding to the operating equipment according to the sequenced operating instructions.
In one possible implementation, the method may further include: and receiving a control instruction to be executed through the instruction generating equipment, and generating a control macro instruction according to the determined identification of the specified equipment and the control instruction to be executed.
In one possible implementation, the method may further include: and sending the operation instruction to the operation equipment through the instruction generation equipment so as to enable the operation equipment to execute the operation instruction.
In a possible implementation manner, sending, by the instruction generating device, the execution instruction to the executing device to cause the executing device to execute the execution instruction may include:
generating an assembly file according to the operation instruction;
translating the assembly file into a binary file;
and sending the binary file to the running equipment so that the running equipment executes the running instruction according to the binary file.
In one possible implementation, the method may further include: the data and scalar data in the data are stored by the running device. The running equipment comprises a storage module, wherein the storage module comprises any combination of a register and a cache, and the cache comprises a temporary cache for high-speed storage and is used for storing data; and the register is used for storing scalar data in the data.
In one possible implementation, the method may further include:
storing the operation instruction through the operation equipment;
analyzing the operation instruction through the operation equipment to obtain a plurality of analysis instructions;
and storing an operation instruction queue through the operation equipment, wherein the operation instruction queue comprises an operation instruction and a plurality of analysis instructions, and the operation instruction queue operation instruction and the plurality of analysis instructions are sequentially arranged according to the executed sequence.
In one possible implementation, the method may further include:
the method comprises the steps that when the running equipment determines that the first analysis instruction and a zero analysis instruction before the first analysis instruction have an incidence relation, the first analysis instruction is cached, and after the execution of the zero analysis instruction is finished, the cached first analysis instruction is executed.
The method for analyzing the data comprises the following steps that an incidence relation exists between a first analysis instruction and a zeroth analysis instruction before the first analysis instruction: the first storage address interval for storing the data required by the first resolving instruction and the zeroth storage address interval for storing the data required by the zeroth resolving instruction have an overlapped area.
In one possible implementation, the running device may be one or any combination of a CPU, a GPU and an NPU.
In one possible implementation, the instruction generating device may be provided in the CPU and/or the NPU.
In one possible implementation, the control macro may include at least one of the following instructions: unconditional jump macro instructions and conditional jump macro instructions. The conditional jump macro may contain a jump condition.
According to the control instruction processing method provided by the embodiment of the disclosure, the instruction generating device determines the operating device executing the control macro instruction according to the received control macro instruction, and generates the operating instruction according to the control macro instruction and the operating device; the data, the neural network model and the operation instruction are obtained through the operation equipment, the operation instruction is analyzed to obtain a plurality of analysis instructions, and the plurality of analysis instructions are executed according to the data to obtain an execution result. The method can be used in a cross-platform mode, and is good in applicability, high in instruction conversion speed, high in processing efficiency, low in error probability, and low in development labor and material cost.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, a division of modules is merely a division of logical functions, and an actual implementation may have another division, for example, a plurality of modules may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some interfaces, and may be in an electrical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present disclosure may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of hardware or a form of a software program module.
The integrated modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (21)

1. A control instruction processing system comprising an instruction generating apparatus and an executing apparatus,
the instruction generating apparatus includes:
the device determining module is used for determining running devices for executing the control macro instructions according to the received control macro instructions;
the instruction generating module is used for generating an operating instruction according to the control macro instruction and the operating equipment;
the operation device includes:
the control module is used for acquiring required data, a neural network model and the operation instruction, analyzing the operation instruction and acquiring a plurality of analysis instructions;
the execution module is used for executing the plurality of analysis instructions according to the data to obtain an execution result,
wherein the control macro-instruction is a macro-instruction for controlling the instruction stream to jump to the target jump position,
the control macro instruction comprises an operation type and a target jump position, the operation instruction comprises the operation type and an operation target jump position, and the operation target jump position is determined according to the target jump position.
2. The system of claim 1, wherein the instruction generating device further comprises:
a resource obtaining module for obtaining resource information of the alternative device,
the device determination module comprises at least one of the following sub-modules:
the first determining submodule is used for determining the specified equipment as the running equipment when the control macro instruction is determined to contain the identification of the specified equipment and the resource of the specified equipment meets the execution condition for executing the control macro instruction;
a second determining submodule, configured to determine, when it is determined that the control macro does not include the identifier of the specified device, an operating device for executing the control macro from the candidate device according to the received control macro and the resource information of the candidate device;
a third determining submodule, configured to determine an operating device according to the control macro and the resource information of the candidate device when it is determined that the control macro includes the identifier of the specified device and the resource of the specified device does not satisfy the execution condition for executing the control macro,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the control macro, and the resource information includes an instruction set contained in the alternative device.
3. The system of claim 1, wherein the instruction generating device further comprises:
and the queue construction module is used for sequencing the operating instructions according to a queue sequencing rule and constructing an instruction queue corresponding to the operating equipment according to the sequenced operating instructions.
4. The system of claim 2, wherein the instruction generating device further comprises:
and the macro instruction generating module is used for receiving the control instruction to be executed and generating the control macro instruction according to the determined identifier of the specified equipment and the control instruction to be executed.
5. The system of claim 1, wherein the instruction generating device further comprises:
an instruction dispatching module, configured to send the execution instruction to the execution device, so that the execution device executes the execution instruction,
wherein the instruction dispatch module comprises:
the instruction assembly submodule is used for generating an assembly file according to the operation instruction;
the assembly translation submodule is used for translating the assembly file into a binary file;
and the instruction sending submodule is used for sending the binary file to the operating equipment so as to enable the operating equipment to execute the operating instruction according to the binary file.
6. The system of claim 1, wherein the operational equipment further comprises:
a storage module, the storage module comprises any combination of a register and a cache, the cache comprises a temporary cache,
the cache is used for storing the data;
and the register is used for storing scalar data in the data.
7. The system of claim 1, wherein the control module comprises:
the instruction storage submodule is used for storing the operation instruction;
the instruction processing submodule is used for analyzing the operation instruction to obtain a plurality of analysis instructions;
and the storage queue submodule is used for storing an operation instruction queue, the operation instruction queue comprises the operation instructions and the plurality of analysis instructions, and the operation instructions and the plurality of analysis instructions in the operation instruction queue are sequentially arranged according to the executed sequence.
8. The system of claim 7, wherein the execution module comprises:
the dependency relationship processing submodule is used for caching the first analysis instruction in the instruction storage submodule when determining that the first analysis instruction and a zero analysis instruction before the first analysis instruction have an incidence relationship, extracting the first analysis instruction from the instruction storage submodule after the zero analysis instruction is executed, and sending the first analysis instruction to the execution module,
wherein the association relationship between the first parsing instruction and a zeroth parsing instruction before the first parsing instruction comprises:
and a first storage address interval for storing the data required by the first resolving instruction and a zeroth storage address interval for storing the data required by the zeroth resolving instruction have an overlapped area.
9. The system of claim 1,
the running equipment is one or any combination of a CPU, a GPU and an NPU;
the instruction generation device is arranged in the CPU and/or the NPU;
the control macro-instruction includes at least one of an unconditional jump macro-instruction and a conditional jump macro-instruction, the conditional jump macro-instruction including a jump condition.
10. A machine learning arithmetic device, the device comprising:
one or more control instruction processing systems according to any one of claims 1 to 9, configured to obtain data to be operated and control information from another processing apparatus, perform a specified machine learning operation, and transmit an execution result to the other processing apparatus through an I/O interface;
when the machine learning arithmetic device comprises a plurality of control instruction processing systems, the control instruction processing systems can be connected through a specific structure and transmit data;
the control instruction processing systems are interconnected through a PCIE bus of a fast peripheral equipment interconnection bus and transmit data so as to support operation of larger-scale machine learning; the control instruction processing systems share the same control system or own respective control systems; the control instruction processing systems share a memory or own respective memories; the interconnection mode of the control instruction processing systems is any interconnection topology.
11. A combined processing apparatus, characterized in that the combined processing apparatus comprises:
the machine learning computing device of claim 110, a universal interconnect interface, and other processing devices;
the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user,
wherein the combination processing apparatus further comprises: and a storage device connected to the machine learning arithmetic device and the other processing device, respectively, for storing data of the machine learning arithmetic device and the other processing device.
12. The utility model provides a board card, its characterized in that, the board card includes: memory device, interface device and control device and machine learning chip comprising a machine learning arithmetic device according to claim 10 or a combined processing device according to claim 13;
wherein the machine learning chip is connected with the storage device, the control device and the interface device respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the machine learning chip and external equipment;
and the control device is used for monitoring the state of the machine learning chip.
13. A control instruction processing method is applied to a control instruction processing system, the system comprises an instruction generating device and an operating device, and the method comprises the following steps:
determining, by the instruction generating device, an operating device that executes the control macro instruction according to the received control macro instruction, and generating an operating instruction according to the control macro instruction and the operating device;
acquiring data, a neural network model and an operation instruction through the operation equipment, analyzing the operation instruction to obtain a plurality of analysis instructions, executing the plurality of analysis instructions according to the data to obtain an execution result,
wherein the control macro-instruction is a macro-instruction for controlling the instruction stream to jump to the target jump position,
the control macro instruction comprises an operation type and a target jump position, the operation instruction comprises the operation type and an operation target jump position, and the operation target jump position is determined according to the target jump position.
14. The method of claim 13, further comprising:
acquiring resource information of the alternative device through the instruction generating device,
the method comprises the following steps that the instruction generation equipment determines running equipment for executing the control macro instruction according to the received control macro instruction, and comprises at least one of the following steps:
when it is determined that the control macro instruction contains an identifier of a specified device and the resource of the specified device meets an execution condition for executing the control macro instruction, determining the specified device as the running device;
when the control macro instruction is determined not to contain the identifier of the specified device, determining running equipment for executing the control macro instruction from the alternative device according to the received control macro instruction and the resource information of the alternative device;
when the control macro instruction is determined to contain the identification of the specified device and the resource of the specified device does not meet the execution condition for executing the control macro instruction, determining the running device according to the control macro instruction and the resource information of the alternative device,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the control macro, and the resource information includes an instruction set contained in the alternative device.
15. The method of claim 13, further comprising:
and sequencing the operating instructions according to a queue sequencing rule through the instruction generating equipment, and constructing an instruction queue corresponding to the operating equipment according to the sequenced operating instructions.
16. The method of claim 14, further comprising:
and receiving a control instruction to be executed through the instruction generating equipment, and generating the control macro instruction according to the determined identifier of the specified equipment and the control instruction to be executed.
17. The method of claim 13, further comprising:
sending, by the instruction generating apparatus, the execution instruction to the execution apparatus to cause the execution apparatus to execute the execution instruction,
the sending of the operation instruction to the operation device through the instruction generation device so as to enable the operation device to execute the operation instruction includes:
generating an assembly file according to the operation instruction;
translating the assembly file into a binary file;
and sending the binary file to the operating equipment so that the operating equipment executes the operating instruction according to the binary file.
18. The method of claim 13, further comprising:
storing, by the runtime device, the data and scalar data in the data,
wherein the running device comprises a storage module, the storage module comprises any combination of a register and a cache, the cache comprises a temporary cache,
the cache is used for storing the data;
and the register is used for storing scalar data in the data.
19. The method of claim 13, further comprising:
storing, by the operational device, the operational instructions;
analyzing the operation instruction through the operation equipment to obtain a plurality of analysis instructions;
and storing an operation instruction queue through the operation equipment, wherein the operation instruction queue comprises the operation instructions and the plurality of analysis instructions, and the operation instructions and the plurality of analysis instructions in the operation instruction queue are sequentially arranged according to the executed sequence.
20. The method of claim 19, further comprising:
caching the first analysis instruction when the running equipment determines that the first analysis instruction and a zeroth analysis instruction before the first analysis instruction have an incidence relation, executing the cached first analysis instruction after the zeroth analysis instruction is executed,
wherein the association relationship between the first parsing instruction and a zeroth parsing instruction before the first parsing instruction comprises:
and a first storage address interval for storing the data required by the first resolving instruction and a zeroth storage address interval for storing the data required by the zeroth resolving instruction have an overlapped area.
21. The method of claim 13,
the running equipment is one or any combination of a CPU, a GPU and an NPU;
the instruction generation device is arranged in the CPU and/or the NPU;
the control macro-instruction includes at least one of an unconditional jump macro-instruction and a conditional jump macro-instruction, the conditional jump macro-instruction including a jump condition.
CN201811222531.XA 2018-10-19 2018-10-19 Operation method, system and related product Active CN111079916B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811222531.XA CN111079916B (en) 2018-10-19 2018-10-19 Operation method, system and related product
PCT/CN2019/111852 WO2020078446A1 (en) 2018-10-19 2019-10-18 Computation method and apparatus, and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811222531.XA CN111079916B (en) 2018-10-19 2018-10-19 Operation method, system and related product

Publications (2)

Publication Number Publication Date
CN111079916A true CN111079916A (en) 2020-04-28
CN111079916B CN111079916B (en) 2021-01-15

Family

ID=70308448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811222531.XA Active CN111079916B (en) 2018-10-19 2018-10-19 Operation method, system and related product

Country Status (1)

Country Link
CN (1) CN111079916B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113703836A (en) * 2021-08-20 2021-11-26 北京空间飞行器总体设计部 SCPI instruction management method for spacecraft power system evaluation

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1360693A (en) * 1999-05-13 2002-07-24 Arc国际美国控股公司 Method and apparatus for jump delay slot control in pipelined processor
CN1782989A (en) * 1996-08-27 2006-06-07 松下电器产业株式会社 Multithreaded processor for processing multiple instruction streams independently of each other by flexibly controlling throughput in each instruction stream
CN102006183A (en) * 2010-11-12 2011-04-06 百度在线网络技术(北京)有限公司 Configuration parameter based Method and configuration equipment for configuring network equipment
CN102541611A (en) * 2010-12-21 2012-07-04 无锡江南计算技术研究所 Instruction translation device and method, instruction processing device and processor
US20160034808A1 (en) * 2012-12-21 2016-02-04 International Business Machines Corporation Hardware architecture for simulating a neural network of neurons
US20160179434A1 (en) * 2014-12-19 2016-06-23 Intel Corporation Storage device and method for performing convolution operations
CN106325819A (en) * 2015-06-17 2017-01-11 华为技术有限公司 Computer instruction processing method, coprocessor and system
US20170124451A1 (en) * 2015-10-28 2017-05-04 Google Inc. Stream-based accelerator processing of computational graphs
CN107016175A (en) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 It is applicable the Automation Design method, device and the optimization method of neural network processor
CN107103113A (en) * 2017-03-23 2017-08-29 中国科学院计算技术研究所 Towards the Automation Design method, device and the optimization method of neural network processor
CN107450972A (en) * 2017-07-04 2017-12-08 阿里巴巴集团控股有限公司 A kind of dispatching method, device and electronic equipment
CN107886167A (en) * 2016-09-29 2018-04-06 北京中科寒武纪科技有限公司 Neural network computing device and method
CN108416431A (en) * 2018-01-19 2018-08-17 上海兆芯集成电路有限公司 Neural network microprocessor and macro instruction processing method

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1782989A (en) * 1996-08-27 2006-06-07 松下电器产业株式会社 Multithreaded processor for processing multiple instruction streams independently of each other by flexibly controlling throughput in each instruction stream
CN1360693A (en) * 1999-05-13 2002-07-24 Arc国际美国控股公司 Method and apparatus for jump delay slot control in pipelined processor
CN102006183A (en) * 2010-11-12 2011-04-06 百度在线网络技术(北京)有限公司 Configuration parameter based Method and configuration equipment for configuring network equipment
CN102541611A (en) * 2010-12-21 2012-07-04 无锡江南计算技术研究所 Instruction translation device and method, instruction processing device and processor
US20160034808A1 (en) * 2012-12-21 2016-02-04 International Business Machines Corporation Hardware architecture for simulating a neural network of neurons
CN107003988A (en) * 2014-12-19 2017-08-01 英特尔公司 Storage device and method for performing convolution algorithm
US20160179434A1 (en) * 2014-12-19 2016-06-23 Intel Corporation Storage device and method for performing convolution operations
CN106325819A (en) * 2015-06-17 2017-01-11 华为技术有限公司 Computer instruction processing method, coprocessor and system
US20170124451A1 (en) * 2015-10-28 2017-05-04 Google Inc. Stream-based accelerator processing of computational graphs
CN107886167A (en) * 2016-09-29 2018-04-06 北京中科寒武纪科技有限公司 Neural network computing device and method
CN107016175A (en) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 It is applicable the Automation Design method, device and the optimization method of neural network processor
CN107103113A (en) * 2017-03-23 2017-08-29 中国科学院计算技术研究所 Towards the Automation Design method, device and the optimization method of neural network processor
CN107450972A (en) * 2017-07-04 2017-12-08 阿里巴巴集团控股有限公司 A kind of dispatching method, device and electronic equipment
CN108416431A (en) * 2018-01-19 2018-08-17 上海兆芯集成电路有限公司 Neural network microprocessor and macro instruction processing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
阎强: "卷积神经网络处理器的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
陈一鸣: "基于GPU的深度神经网络优化方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113703836A (en) * 2021-08-20 2021-11-26 北京空间飞行器总体设计部 SCPI instruction management method for spacecraft power system evaluation

Also Published As

Publication number Publication date
CN111079916B (en) 2021-01-15

Similar Documents

Publication Publication Date Title
CN111079909B (en) Operation method, system and related product
CN111079916B (en) Operation method, system and related product
CN111078291B (en) Operation method, system and related product
CN111078284B (en) Operation method, system and related product
CN111079925B (en) Operation method, device and related product
CN111078293B (en) Operation method, device and related product
CN111079911B (en) Operation method, system and related product
CN111078285B (en) Operation method, system and related product
CN111078281B (en) Operation method, system and related product
CN111079914B (en) Operation method, system and related product
CN111079915B (en) Operation method, device and related product
CN111078125B (en) Operation method, device and related product
CN111079913B (en) Operation method, device and related product
CN111079912B (en) Operation method, system and related product
CN111078280B (en) Operation method, device and related product
CN111079910B (en) Operation method, device and related product
CN111079907B (en) Operation method, device and related product
CN111078283B (en) Operation method, device and related product
CN111079924B (en) Operation method, system and related product
CN111078282B (en) Operation method, device and related product
CN111353595A (en) Operation method, device and related product
CN111325331B (en) Operation method, device and related product
CN112396186A (en) Execution method, device and related product
CN112394990A (en) Floating point to half precision floating point instruction processing device and method and related products
CN111045729A (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201217

Address after: Room 611-194, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Anhui Cambrian Information Technology Co., Ltd

Address before: 100190 room 644, research complex, 6 South Road, Haidian District Science Academy, Beijing.

Applicant before: Zhongke Cambrian Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant