CN112394998A - Operation method, device and related product - Google Patents


Info

Publication number
CN112394998A
Authority
CN
China
Prior art keywords
instruction
written
write
machine learning
data
Prior art date
Legal status
Pending
Application number
CN201910744396.3A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910744396.3A
Publication of CN112394998A
Legal status: Pending (current)


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145Instruction analysis, e.g. decoding, instruction word fields
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30018Bit or string instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The disclosure relates to an operation method, an operation device and a related product. The machine learning arithmetic device comprises one or more instruction processing devices. These acquire data to be operated on and control information from other processing devices, execute a specified machine learning operation, and transmit the execution result to the other processing devices through an I/O interface. When the machine learning arithmetic device includes a plurality of instruction processing devices, they can be connected to one another through a specific structure to transfer data. They are interconnected, and transmit data, through a Peripheral Component Interconnect Express (PCIe) bus. They share one control system or each have their own control system, and they share a memory or each have their own memory. They may be interconnected in any topology. The operation method, operation device and related products provided by the embodiments of the disclosure have a wide application range and process write instructions with high efficiency and speed.

Description

Operation method, device and related product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an operation method, an operation device, and a related product.
Background
With the continuous development of science and technology, machine learning, and neural network algorithms in particular, are used more and more widely. They have achieved good results in fields such as image recognition, speech recognition and natural language processing. However, as neural network algorithms grow more complex, the types and number of data operations involved keep increasing. Existing instructions and techniques often support writing only a fixed amount of data. They cannot efficiently write a flexible, dynamically specified amount of given data, which leads to low execution efficiency and slow execution.
Disclosure of Invention
In view of the above, the present disclosure provides an operation method, an operation device and a related product to improve the efficiency and speed of data writing.
According to a first aspect of the present disclosure, there is provided a write instruction processing apparatus, the apparatus including:
a compiling module, configured to compile an acquired write instruction to obtain a compiled write instruction;
a control module, configured to parse the compiled write instruction to obtain an operation code and an operation field of the write instruction, generate a number to be written according to the operation code, and acquire a quantity parameter and a target address of the number to be written according to the operation field; and
an execution module, configured to write the number to be written to the target address according to the quantity parameter.
According to a second aspect of the present disclosure, there is provided a machine learning arithmetic device, the device including:
one or more write instruction processing apparatuses according to the first aspect, configured to acquire data to be operated on and control information from other processing apparatuses, execute a specified machine learning operation, and transmit the execution result to the other processing apparatuses through an I/O interface;
when the machine learning arithmetic device includes a plurality of write instruction processing apparatuses, the plurality of write instruction processing apparatuses can be connected through a specific structure and transmit data;
the plurality of write instruction processing apparatuses are interconnected, and transmit data, through a Peripheral Component Interconnect Express (PCIe) bus, so as to support larger-scale machine learning operations. They share one control system or each have their own control system, and they share a memory or each have their own memory. They may be interconnected in any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing apparatus, the apparatus comprising:
the machine learning arithmetic device according to the second aspect, a universal interconnect interface, and another processing apparatus;
wherein the machine learning arithmetic device interacts with the other processing apparatus to jointly complete an operation specified by a user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning arithmetic device of the second aspect or the combined processing apparatus of the third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure, which includes the machine learning chip of the fourth aspect.
According to a sixth aspect of the present disclosure, there is provided a board card, which includes the machine learning chip package structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device, which includes the machine learning chip of the fourth aspect or the board card of the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a write instruction processing method applied to a write instruction processing apparatus, the method including:
compiling an acquired write instruction to obtain a compiled write instruction;
parsing the compiled write instruction to obtain an operation code and an operation field of the write instruction, generating a number to be written according to the operation code, and acquiring, according to the operation field, a quantity parameter and a target address of the number to be written required for executing the write instruction; and
writing the number to be written to the target address according to the quantity parameter.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a dashboard camera, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a camcorder, a projector, a watch, a headset, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle comprises an aircraft, a ship, and/or a car; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and/or a range hood; and the medical device comprises a nuclear magnetic resonance instrument, a B-mode ultrasound scanner and/or an electrocardiograph.
The apparatus comprises a compiling module, a control module and an execution module. The compiling module compiles the acquired write instruction to obtain a compiled write instruction. The control module parses the compiled write instruction to obtain its operation code and operation field, generates the number to be written according to the operation code, and acquires the quantity parameter and target address required for executing the write instruction according to the operation field. The execution module writes the number to be written to the target address according to the quantity parameter. The write instruction processing method, apparatus and related products provided by the embodiments of the disclosure have a wide application range and process write instructions with high efficiency and speed.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a block diagram of a write instruction processing apparatus according to an embodiment of the present disclosure.
Figs. 1a and 1b show block diagrams of a write instruction processing apparatus according to an embodiment of the present disclosure.
Figs. 2a-2e show block diagrams of a write instruction processing apparatus according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of an application scenario of a write instruction processing apparatus according to an embodiment of the present disclosure.
Figs. 4a and 4b show block diagrams of a combined processing apparatus according to an embodiment of the present disclosure.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.
Fig. 6 shows a flowchart of a write instruction processing method according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a block diagram of a write instruction processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 1, the apparatus includes a compiling module 10, a control module 11 and an execution module 12.
The compiling module 10 is configured to compile the acquired write instruction to obtain a compiled write instruction.
The control module 11 is configured to parse the compiled write instruction to obtain an operation code and an operation field of the write instruction, generate a number to be written according to the operation code, and acquire a quantity parameter and a target address of the number to be written according to the operation field.
The execution module 12 is configured to write the number to be written to the target address according to the quantity parameter.
In this embodiment, the write instruction acquired by the compiling module 10 is an uncompiled software instruction that hardware cannot execute directly. The compiling module therefore first compiles it to obtain a compiled write instruction, which is a hardware instruction that can be executed directly. The control module 11 parses the compiled write instruction to obtain its operation code and operation field. From these, it obtains the quantity parameter and target address of the number to be written.
In this embodiment, the operation code may be the part of an instruction, or a field (usually represented by a code), that specifies the operation to be performed in a computer program. It identifies to the device executing the instruction which instruction is to be executed. The operation field may be the source of the data required to execute the corresponding instruction. That data includes parameters such as the quantity of numbers to be written and/or addresses such as data storage addresses. A write instruction must include an operation code and an operation field, and the operation field includes at least the quantity parameter of the number to be written and the target address. The quantity parameter may be the quantity of numbers to be written itself, or the address at which that quantity is stored.
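To make the instruction fields concrete, the following is a minimal Python sketch of the compile and parse steps. Everything in it is an assumption for illustration, not the disclosed format. The textual syntax (`WRZERO 64 0000`, `WRONE @1000 0000`) is hypothetical. So are the opcodes `WRZERO`/`WRONE`, where the opcode alone selects the value 0 or 1, mirroring "generating the number to be written according to the operation code". The 64-bit layout is hypothetical as well: an 8-bit opcode, a 1-bit flag marking the quantity parameter as an address, a 23-bit quantity parameter and a 32-bit target address.

```python
from dataclasses import dataclass

# Hypothetical opcodes; the opcode alone determines the number to be written.
OPCODES = {"WRZERO": 0x01, "WRONE": 0x02}
VALUE_FOR_OPCODE = {0x01: 0, 0x02: 1}

QTY_BITS = 23      # assumed width of the quantity-parameter field
TARGET_BITS = 32   # assumed width of the target-address field


@dataclass(frozen=True)
class DecodedWrite:
    opcode: int
    value: int       # number to be written, generated from the opcode
    indirect: bool   # True: quantity field is the address where the quantity is stored
    quantity: int    # the quantity itself, or the address holding it
    target: int      # target address


def compile_write(text: str) -> int:
    """Toy 'compile' of e.g. 'WRZERO 64 0000' or 'WRONE @1000 0000' into a 64-bit word."""
    mnemonic, qty, target = text.split()
    indirect = qty.startswith("@")
    qty_val = int(qty.lstrip("@"))   # decimal, as in the document's examples
    tgt_val = int(target)
    if not (0 <= qty_val < 1 << QTY_BITS and 0 <= tgt_val < 1 << TARGET_BITS):
        raise ValueError("operation field value out of range")
    return (OPCODES[mnemonic] << 56) | (int(indirect) << 55) | (qty_val << 32) | tgt_val


def decode_write(word: int) -> DecodedWrite:
    """Parse a compiled word back into its opcode and operation-field values."""
    opcode = (word >> 56) & 0xFF
    return DecodedWrite(
        opcode=opcode,
        value=VALUE_FOR_OPCODE[opcode],
        indirect=bool((word >> 55) & 1),
        quantity=(word >> 32) & ((1 << QTY_BITS) - 1),
        target=word & ((1 << TARGET_BITS) - 1),
    )


print(decode_write(compile_write("WRZERO 64 0000")))
# DecodedWrite(opcode=1, value=0, indirect=False, quantity=64, target=0)
```

A real encoding would be chosen by the hardware designer. The point here is only that the operation field carries either the quantity itself or the address of the quantity, and that a single flag is enough to tell the two cases apart.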
In one possible implementation, the quantity parameter is the quantity of numbers to be written itself. The compiling module compiles the acquired write instruction to obtain a compiled write instruction. The control module parses the compiled write instruction to obtain its operation code and operation field. It generates the number to be written according to the operation code, and acquires the quantity and the target address directly from the operation field. Suppose the number to be written is 0, the quantity is 64, and the target address is 0000. The execution module then writes 64 zeros starting at address 0000.
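The following sketch simulates the immediate case. It assumes a flat memory modelled as a Python list with one element per address. `execute_write` is a hypothetical helper, not part of the disclosure.

```python
def execute_write(memory: list, value: int, count: int, target: int) -> None:
    """Write `count` copies of `value` starting at address `target`."""
    if target < 0 or count < 0 or target + count > len(memory):
        raise IndexError("write exceeds memory bounds")
    memory[target:target + count] = [value] * count


# Immediate mode: the operation field carries the quantity (64) directly.
memory = [7] * 128   # pre-filled with 7 so the written region is easy to see
execute_write(memory, value=0, count=64, target=0)
print(memory[62:66])  # [0, 0, 7, 7]
```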
In another possible implementation, the quantity parameter is the address at which the quantity of numbers to be written is stored. The compiling module compiles the acquired write instruction. The control module parses the compiled write instruction to obtain its operation code and operation field, and generates the number to be written according to the operation code. It resolves, from the quantity parameter in the operation field, the address at which the quantity is stored, and reads the quantity from that address. It also obtains the target address from the operation field. Suppose that, from the operation code and operation field, the number to be written is 1, the address storing the quantity is 1000, and the target address is 0000. Suppose further that the quantity read from address 1000 is 256. The execution module then writes 256 ones starting at address 0000.
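The indirect case adds one lookup before the write. This is a minimal sketch under the same flat-memory assumption. `resolve_quantity` is a hypothetical name, and the value 256 is placed at address 1000 by hand to reproduce the example.

```python
def resolve_quantity(memory: list, indirect: bool, quantity_field: int) -> int:
    """Return the quantity: the field itself, or the value stored at that address."""
    return memory[quantity_field] if indirect else quantity_field


def execute_write(memory: list, value: int, count: int, target: int) -> None:
    """Write `count` copies of `value` starting at address `target`."""
    if target < 0 or count < 0 or target + count > len(memory):
        raise IndexError("write exceeds memory bounds")
    memory[target:target + count] = [value] * count


memory = [0] * 2048
memory[1000] = 256   # assumed: the quantity 256 was stored at address 1000 beforehand

count = resolve_quantity(memory, indirect=True, quantity_field=1000)
execute_write(memory, value=1, count=count, target=0)
print(count, sum(memory[:1000]))  # 256 256
```

Storing the quantity in memory lets a previous computation decide how much to write at run time, which is the flexibility the Background says fixed-size writes lack.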
It should be understood that the format of the write instruction and of the compiled write instruction, as well as the operation code and operation field they contain, may be set as required by those skilled in the art, and the disclosure is not limited thereto.
In this embodiment, the apparatus may include one or more compiling modules, one or more control modules, and one or more execution modules, and the number of the compiling modules, the control modules, and the execution modules may be set according to actual needs, which is not limited in this disclosure.
The write instruction processing apparatus provided by the embodiments of the disclosure includes a compiling module, a control module and an execution module. The compiling module compiles the acquired write instruction to obtain a compiled write instruction. The control module parses the compiled write instruction to obtain its operation code and operation field, and generates the number to be written according to the operation code. It then acquires, according to the operation field, the quantity parameter and target address required for executing the write instruction. The execution module writes the number to be written to the target address according to the quantity parameter. Because the quantity of numbers to be written is specified in the write instruction itself, the apparatus can flexibly write a specified amount of data. This gives it a wide application range and high processing efficiency and speed.
Fig. 1a shows a block diagram of a write instruction processing apparatus according to an embodiment of the present disclosure. In a possible implementation, as shown in Fig. 1a, the compiling module 10 compiles the acquired write instruction to obtain a compiled write instruction. The control module 11 parses the compiled write instruction to obtain its operation code and operation field, and generates the number to be written according to the operation code. It acquires the quantity parameter and target address of the number to be written according to the operation field. It then allocates a corresponding sub-quantity and sub-target address to each execution sub-module. The execution module 12 may include one execution sub-module 120, which writes the number to be written according to its allocated sub-quantity and sub-target address. For example, suppose 256 bits are to be written, the number to be written is 0, and the target address is 0000. The execution sub-module 120 is then allocated a sub-quantity of 256 bits and the sub-target address 0000, and therefore writes 256 zero bits starting at address 0000.
Fig. 1b shows a block diagram of a write instruction processing apparatus according to an embodiment of the present disclosure. In a possible implementation, as shown in Fig. 1b, the control module 11 parses the compiled write instruction to obtain its operation code and operation field, and generates the number to be written according to the operation code. It acquires the quantity parameter and target address of the number to be written according to the operation field. It then allocates a corresponding sub-quantity and sub-target address to each execution sub-module. The execution module 12 may include a plurality of execution sub-modules 120, which write the number to be written to their respective sub-target addresses, serially or in parallel, according to their allocated sub-quantities. For example, the execution module 12 may include 4 execution sub-modules 120, with 256 bits to be written, the number to be written being 0, and the target address being 0000. The sub-target addresses of the 4 execution sub-modules may be 0000, 0064, 0128 and 0192, each with a sub-quantity of 64 bits. The 4 execution sub-modules can then complete the zero-write operation in parallel. That is, they simultaneously write 64 zero bits each at addresses 0000, 0064, 0128 and 0192, which together write 256 zero bits starting at address 0000.
The 4 execution sub-modules can also complete the zero-write operation serially. That is, they write 64 zero bits each at addresses 0000, 0064, 0128 and 0192 one after another, again writing 256 zero bits starting at address 0000. The allocation need not be uniform. The sub-target addresses may instead be 0000, 0032, 0128 and 0192, with sub-quantities of 32, 96, 64 and 64 bits respectively. The 4 execution sub-modules can complete this zero-write operation in parallel, simultaneously writing 32, 96, 64 and 64 zeros starting at addresses 0000, 0032, 0128 and 0192. They can also complete it serially, writing the same pieces one after another. Either way, 256 zero bits are written starting at address 0000.
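The split-and-dispatch step described for Figs. 1a and 1b can be sketched as follows, assuming (as in the example) one list element per bit-address. `split_uniform`, `split_by_counts` and `run_sub_modules` are hypothetical names. Python threads merely stand in for parallel hardware sub-modules. Because the pieces are disjoint, serial and parallel execution leave memory in the same state.

```python
from concurrent.futures import ThreadPoolExecutor


def split_uniform(target: int, count: int, n_sub: int) -> list:
    """Divide one write into n_sub contiguous (sub_target, sub_count) pieces."""
    base, rem = divmod(count, n_sub)
    pieces, addr = [], target
    for i in range(n_sub):
        sub_count = base + (1 if i < rem else 0)  # spread any remainder over the first pieces
        pieces.append((addr, sub_count))
        addr += sub_count
    return pieces


def split_by_counts(target: int, counts: list) -> list:
    """Non-uniform split: consecutive pieces with the given sub-counts."""
    pieces, addr = [], target
    for sub_count in counts:
        pieces.append((addr, sub_count))
        addr += sub_count
    return pieces


def run_sub_modules(memory: list, value: int, pieces: list, parallel: bool) -> None:
    """Each piece plays the role of one execution sub-module's allocation."""
    def write(piece):
        addr, sub_count = piece
        memory[addr:addr + sub_count] = [value] * sub_count

    if parallel:
        with ThreadPoolExecutor(max_workers=len(pieces)) as pool:
            list(pool.map(write, pieces))  # consume results so exceptions surface
    else:
        for piece in pieces:
            write(piece)


print(split_uniform(0, 256, 4))              # [(0, 64), (64, 64), (128, 64), (192, 64)]
print(split_by_counts(0, [32, 96, 64, 64]))  # [(0, 32), (32, 96), (128, 64), (192, 64)]
```

With a single sub-module, as in Fig. 1a, `split_uniform(0, 256, 1)` returns the whole write as one piece.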
Fig. 2a shows a block diagram of a write instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 2a, the execution module 12 may include a master execution sub-module 121 and a plurality of slave execution sub-modules 122.
The compiling module 10 is configured to compile the acquired write instruction to obtain a compiled write instruction.
The control module 11 is further configured to parse the compiled write instruction to obtain a plurality of sub-write instructions, and to send some or all of the required data and/or addresses to the master execution sub-module 121.
The master execution sub-module 121 is configured to distribute and transmit sub-write instructions, data and/or addresses to the plurality of slave execution sub-modules. It can transmit them to the slave execution sub-modules directly, or allocate them first and then transmit them. For example, suppose the target address is 100 (in units that may be 1 byte or a number of bits, such as 4 bits), 40 numbers are to be written, there are 4 slave execution sub-modules, and each number occupies 2 units. The master execution sub-module may transmit these parameters to the slave execution sub-modules directly, or allocate them first. With uniform allocation, the first slave execution sub-module receives target address 100 and 10 numbers to write. The second receives target address 120 and 10 numbers, the third receives target address 140 and 10 numbers, and the fourth receives target address 160 and 10 numbers. Non-uniform allocation may also be used as needed. For example, the first slave execution sub-module writes 20 numbers at target address 100, and the second writes 10 numbers at target address 140. The third writes 5 numbers at target address 160, and the fourth writes 5 numbers at target address 170.
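The master sub-module's allocation in this example can be checked with a short sketch. It assumes addresses advance by `units_per_number` (2 here) for each number written. `allocate` is a hypothetical helper.

```python
def allocate(target: int, counts: list, units_per_number: int = 2) -> list:
    """Assign each slave a (target_address, count) pair, packed back to back."""
    plan, addr = [], target
    for count in counts:
        plan.append((addr, count))
        addr += count * units_per_number   # next slave starts after this one's data
    return plan


total = 40
uniform = allocate(100, [total // 4] * 4)
nonuniform = allocate(100, [20, 10, 5, 5])
print(uniform)     # [(100, 10), (120, 10), (140, 10), (160, 10)]
print(nonuniform)  # [(100, 20), (140, 10), (160, 5), (170, 5)]
```

Both plans cover the same 80 units (addresses 100 to 179). Only the division of work among the slaves differs.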
The slave execution sub-modules 122 are configured to write the corresponding quantity of the number to be written to the corresponding target addresses. They do so according to the sub-write instructions, data and/or addresses transmitted from the master execution sub-module 121. If the master execution sub-module transmits the parameters to each slave execution sub-module directly, each slave must first process them according to a known default rule before completing the write operation. If the master allocates the parameters before transmitting them, each slave can complete the write operation directly. For example, suppose again that the target address is 100 (in units that may be 1 byte or a number of bits, such as 4 bits), 40 numbers are to be written, there are 4 slave execution sub-modules, and each number occupies 2 units. If the master transmits the parameters directly and the slaves divide the task uniformly, each writes 40/4 = 10 numbers. From the transmitted data, the first slave execution sub-module computes its target address as 100 + (1 - 1) × 10 × 2 = 100. The second computes 100 + (2 - 1) × 10 × 2 = 120, the third computes 100 + (3 - 1) × 10 × 2 = 140, and the fourth computes 100 + (4 - 1) × 10 × 2 = 160. If the master allocates the parameters and transmits the allocated values, the slaves can either execute after further processing or execute directly.
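When the master broadcasts the undivided parameters, each slave derives its own share from its index, as in the calculation for the four slave execution sub-modules. This minimal sketch assumes 1-based slave indices and a quantity divisible by the number of slaves. The disclosure does not say how a remainder would be handled, so this sketch simply rejects it. `slave_share` is a hypothetical name.

```python
def slave_share(i: int, n_slaves: int, base: int, total: int, units_per_number: int = 2):
    """Return (target_address, count) computed locally by slave i (1-based)."""
    if total % n_slaves:
        raise ValueError("uneven split not covered by this sketch")
    per_slave = total // n_slaves
    # target_i = base + (i - 1) * per_slave * units_per_number
    return base + (i - 1) * per_slave * units_per_number, per_slave


print([slave_share(i, 4, 100, 40) for i in range(1, 5)])
# [(100, 10), (120, 10), (140, 10), (160, 10)]
```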
The slave execution sub-modules can write their numbers to their corresponding target addresses simultaneously in parallel, or one after another in series.
It should be noted that, a person skilled in the art may set a connection manner between the master execution sub-module and the plurality of slave execution sub-modules according to actual needs to implement the configuration setting of the execution module, for example, the configuration of the execution module may be an "H" type configuration, an array type configuration, a tree type configuration, and the like, which is not limited in this disclosure.
Fig. 2b shows a block diagram of a write instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in Fig. 2b, the execution module 12 may further include one or more branch execution sub-modules 123. These forward data, addresses and/or sub-write instructions between the master execution sub-module 121 and the slave execution sub-modules 122. The master execution sub-module 121 is connected to the one or more branch execution sub-modules 123. The master, branch and slave execution sub-modules are thus connected in an H-type structure. Because the branch execution sub-modules forward data, addresses and/or sub-write instructions, fewer resources of the master execution sub-module are occupied and instruction processing is faster.
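The benefit of the branch layer can be illustrated by counting the messages the master itself has to send. In this hypothetical model, the class names and the even split are assumptions. The master talks only to its branch sub-modules, and each branch forwards pieces to its own slaves. The master therefore sends one message per branch rather than one per slave.

```python
class Slave:
    def __init__(self, memory: list):
        self.memory = memory

    def execute(self, value: int, addr: int, count: int) -> None:
        self.memory[addr:addr + count] = [value] * count


def split_even(addr: int, count: int, parts: int) -> list:
    """Contiguous split of (addr, count) into `parts` pieces."""
    base, rem = divmod(count, parts)
    out = []
    for i in range(parts):
        c = base + (1 if i < rem else 0)
        out.append((addr, c))
        addr += c
    return out


class Branch:
    """Forwards its share of the write to the slaves attached to it."""
    def __init__(self, slaves: list):
        self.slaves = slaves

    def forward(self, value: int, addr: int, count: int) -> None:
        for slave, (a, c) in zip(self.slaves, split_even(addr, count, len(self.slaves))):
            slave.execute(value, a, c)


class Master:
    def __init__(self, branches: list):
        self.branches = branches

    def dispatch(self, value: int, addr: int, count: int) -> int:
        """Send one sub-write per branch; return how many messages the master sent."""
        sent = 0
        for branch, (a, c) in zip(self.branches, split_even(addr, count, len(self.branches))):
            branch.forward(value, a, c)
            sent += 1
        return sent


memory = [1] * 256
slaves = [Slave(memory) for _ in range(8)]
master = Master([Branch(slaves[:4]), Branch(slaves[4:])])
print(master.dispatch(0, 0, 256), memory == [0] * 256)  # 2 True
```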
Fig. 2c shows a block diagram of a write number instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2c, a plurality of slave execution submodules 122 are distributed in an array.
Each slave execution submodule 122 is connected with other adjacent slave execution submodules 122, the master execution submodule 121 is connected with k slave execution submodules 122 in the plurality of slave execution submodules 122, and the k slave execution submodules 122 are: the n slave execution sub-modules 122 of the 1 st row, the n slave execution sub-modules 122 of the m th row, and the m slave execution sub-modules 122 of the 1 st column.
As shown in fig. 2c, the k slave execution sub-modules comprise only the n slave execution sub-modules in row 1, the n slave execution sub-modules in row m and the m slave execution sub-modules in column 1; that is, the k slave execution sub-modules are those among the plurality of slave execution sub-modules that are directly connected to the master execution sub-module. The k slave execution sub-modules forward data and instructions between the master execution sub-module and the plurality of slave execution sub-modules. Arranging the slave execution sub-modules in an array therefore increases the speed at which the master execution sub-module sends data, addresses and/or sub-write number instructions to the slave execution sub-modules, further increasing instruction processing speed.
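The set of directly connected slave execution sub-modules described above can be enumerated as follows. This is a minimal sketch that assumes 1-based (row, column) positions in an m × n array; the function name `directly_connected` is hypothetical. When the three listed groups are combined as a set, positions (1, 1) and (m, 1) belong both to a row and to column 1, so for m ≥ 2 the number of distinct modules is 2n + m - 2.

```python
def directly_connected(m: int, n: int) -> set[tuple[int, int]]:
    """1-based (row, column) positions of the slave execution sub-modules wired
    directly to the master in an m x n array: row 1, row m and column 1."""
    if m < 1 or n < 1:
        raise ValueError("the array needs at least one row and one column")
    first_row = {(1, col) for col in range(1, n + 1)}
    last_row = {(m, col) for col in range(1, n + 1)}
    first_col = {(row, 1) for row in range(1, m + 1)}
    # Set union counts the shared corners (1, 1) and (m, 1) only once.
    return first_row | last_row | first_col


print(len(directly_connected(4, 3)))  # 8, i.e. 2n + m - 2
```

Interior positions such as (2, 2) are absent from the result; under this reading they receive data only through their neighbours.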
Fig. 2d shows a block diagram of a write number instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2d, the execution module may further include a tree sub-module 124. The tree sub-module 124 includes a root port 401 and a plurality of branch ports 402. The root port 401 is connected to the master execution sub-module 121, and the branch ports 402 are connected to the respective slave execution sub-modules 122. The tree sub-module 124 can both transmit and receive, and forwards data, addresses and/or sub-write number instructions between the master execution sub-module 121 and the slave execution sub-modules 122. In this way, the tree sub-module connects the execution sub-modules in a tree structure, and its forwarding function increases the speed at which the master execution sub-module sends data, addresses and/or sub-write number instructions to the slave execution sub-modules, thereby increasing instruction processing speed.
In one possible implementation, the tree sub-module 124 may be an optional structure of the apparatus and may include at least one level of nodes. The nodes are wiring structures with a forwarding function and have no operation function. The lowest-level nodes are connected to the slave execution sub-modules to forward data, addresses and/or sub-write number instructions between the master execution sub-module 121 and the slave execution sub-modules 122. In particular, if the tree sub-module has zero levels of nodes, the apparatus does not require the tree sub-module.
In one possible implementation, the tree sub-module 124 may include a plurality of nodes arranged in an n-ary tree structure, which may have multiple levels.
For example, fig. 2e shows a block diagram of a write number instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 2e, the n-ary tree structure may be a binary tree structure, and the tree sub-module comprises 2 levels of nodes 01. The lowest-level nodes 01 are connected to the slave execution sub-modules 122 to forward data, addresses and/or sub-write number instructions between the master execution sub-module 121 and the slave execution sub-modules 122.
In this implementation, the n-ary tree structure may also be a ternary tree structure or the like, where n is a positive integer greater than or equal to 2. The value of n and the number of node levels in the n-ary tree structure may be set by those skilled in the art as needed, which is not limited in this disclosure.
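One way to picture how an n-ary tree sub-module delivers a sub-write number instruction to a particular slave execution sub-module is as a sequence of branch choices, one per node level. The sketch below assumes a complete n-ary tree with `levels` node levels whose n**levels bottom ports are numbered 0, 1, ... from left to right; under that assumption the branch taken at each level is one base-n digit of the slave index. The function `route` and this numbering are illustrative and are not taken from the disclosure.

```python
def route(slave_index: int, n: int, levels: int) -> list[int]:
    """Branch taken at each node level, root first, to reach `slave_index`
    in a complete n-ary tree with `levels` levels of forwarding nodes."""
    if n < 2:
        raise ValueError("an n-ary tree needs n >= 2")
    if not 0 <= slave_index < n ** levels:
        raise ValueError("slave index out of range for this tree")
    path = []
    for _ in range(levels):
        # The remainder is the branch at the current (deepest remaining) level.
        slave_index, branch = divmod(slave_index, n)
        path.append(branch)
    # Digits were produced least-significant first; the root uses the most significant.
    return path[::-1]
```

Under these assumptions, a binary tree with 2 node levels (as in fig. 2e) has four bottom ports, and `route(2, 2, 2)` returns `[1, 0]`: take branch 1 at the upper level, then branch 0 at the lowest level.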
In one possible implementation, as shown in figs. 1, 1a, 1b and 2a-2e, the apparatus may further include a storage module 13, which is configured to store the number to be written and the quantity of numbers to be written.
In this implementation, the storage module may include one or more of a memory, a cache and a register, and the cache may include a scratchpad cache. The number to be written and the quantity of numbers to be written may be stored in the memory, the cache and/or the register of the storage module as needed, which is not limited in this disclosure.
In a possible implementation, the apparatus may further include a direct memory access (DMA) module for reading data from or storing data to the storage module.
In a possible implementation, as shown in figs. 1, 1a, 1b and 2a-2e, the control module 11 may be configured to determine whether the quantity of numbers to be written is an integer multiple of a preset number, and to execute only instructions whose quantity of numbers to be written is an integer multiple of the preset number. The preset number may be specified in advance by a user or may be determined by the hardware structure. For example, when the preset number is 64, only instructions whose quantity of numbers to be written is an integer multiple of 64, such as 64 or 128, are executed.
This implementation ensures that the quantity of numbers to be written does not conflict with the hardware structure or with the data-address unit specified by the user, which makes data addresses easier to manage.
In a possible implementation, as shown in figs. 1, 1a, 1b and 2a-2e, the control module 11 may be configured to determine whether the quantity of numbers to be written is an integer multiple of a preset number and, when it is not, to adjust the quantity to an integer multiple of the preset number. For example, when the quantity of numbers to be written is 128 and the preset number is 64, the control module 11 determines that the quantity is an integer multiple of the preset number and executes the instruction directly. When the quantity is 75 and the preset number is 32, the control module 11 determines that the quantity is not an integer multiple of the preset number and may, according to the user's setting or the hardware's requirements, adjust it upward to the smallest integer multiple of the preset number that is not less than the quantity. One way to compute this is to round the ratio of the quantity to the preset number up (ceiling) and multiply the rounded ratio by the preset number. For the example above, with a quantity of 75 and a preset number of 32, the quantity is adjusted to
$\lceil 75/32 \rceil \times 32 = 3 \times 32 = 96$.
Alternatively, the quantity may be adjusted downward to the largest integer multiple of the preset number that is not greater than the quantity. One way to compute this is to round the ratio of the quantity to the preset number down (floor, i.e. truncation) and multiply the rounded ratio by the preset number. For the example above, with a quantity of 75 and a preset number of 32, the quantity is adjusted to
$\lfloor 75/32 \rfloor \times 32 = 2 \times 32 = 64$.
In this implementation, the quantity of numbers to be written is kept consistent with the hardware structure or with the data-address unit specified by the user, which improves flexibility, reduces the programmer's burden of managing data addresses and makes data addresses easier to manage.
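The alignment rules described above (execute only exact multiples, or round the quantity up or down) can be summarized in a small function. This is a sketch: the `mode` names `"strict"`, `"up"` and `"down"` are hypothetical labels for the three behaviours, since the disclosure leaves the choice to the user setting or the hardware.

```python
from typing import Optional


def adjust_quantity(num: int, preset: int, mode: str = "strict") -> Optional[int]:
    """Align the quantity of numbers to be written to a multiple of `preset`.

    strict: return None (do not execute) unless num is already a multiple.
    up:     smallest multiple of preset that is >= num (ceiling).
    down:   largest multiple of preset that is <= num (floor / truncation)."""
    if num < 0 or preset <= 0:
        raise ValueError("num must be >= 0 and preset must be > 0")
    if num % preset == 0:
        return num                          # already aligned: execute unchanged
    if mode == "strict":
        return None                         # refuse to execute the instruction
    if mode == "up":
        return -(-num // preset) * preset   # ceiling division via negated floor
    if mode == "down":
        return (num // preset) * preset     # floor division
    raise ValueError(f"unknown mode: {mode!r}")
```

With the figures from the text, `adjust_quantity(75, 32, "up")` gives 96 and `adjust_quantity(75, 32, "down")` gives 64. Rounding down a quantity smaller than the preset number yields 0; how the hardware treats that case is not described in the source.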
In one possible implementation, as shown in figs. 1, 1a, 1b and 2a-2e, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112 and a queue storage sub-module 113.
The instruction storage sub-module 111 is configured to store compiled instructions, including compiled write number instructions.
The instruction processing sub-module 112 is configured to parse a compiled write number instruction to obtain its operation code and operation domain.
The queue storage sub-module 113 is configured to store an instruction queue comprising a plurality of compiled instructions, including write number instructions, arranged in execution order.
In this implementation, the instruction queue may be obtained by ordering the plurality of compiled instructions according to their reception time, priority level and the like, so that the compiled instructions, including the write number instructions, are executed in sequence according to the instruction queue.
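A minimal sketch of how an instruction queue might be built from the reception time and priority mentioned above. It assumes that a larger `priority` value means more urgent and that ties are broken by earlier reception; the class `PendingInstruction` and function `build_instruction_queue` are hypothetical, and instructions are shown in assembly form only for readability.

```python
from typing import NamedTuple


class PendingInstruction(NamedTuple):
    text: str          # compiled instruction, shown here in assembly form
    priority: int      # assumption: larger value = more urgent
    received_at: int   # reception timestamp or arrival sequence number


def build_instruction_queue(pending: list[PendingInstruction]) -> list[str]:
    """Order by priority (highest first), then by reception time (earliest first)."""
    ordered = sorted(pending, key=lambda ins: (-ins.priority, ins.received_at))
    return [ins.text for ins in ordered]
```

Other policies (pure arrival order, or aging low-priority entries) would only change the sort key.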
In one possible implementation, as shown in figs. 1, 1a, 1b and 2a-2e, the control module 11 may further include a dependency processing sub-module 114.
The dependency processing sub-module 114 is configured to, when determining that a first to-be-executed instruction among the plurality of to-be-executed instructions is associated with a zeroth to-be-executed instruction preceding it, cache the first to-be-executed instruction in the instruction storage sub-module 111 and, after the zeroth to-be-executed instruction has finished executing, fetch the first to-be-executed instruction from the instruction storage sub-module 111 and send it to the execution module 12.
The first to-be-executed instruction is determined to be associated with the zeroth to-be-executed instruction preceding it when a first storage address interval, which stores the data required by the first to-be-executed instruction, overlaps a zeroth storage address interval, which stores the data required by the zeroth to-be-executed instruction.
The first to-be-executed instruction may include a compiled write number instruction.
In this way, a compiled write number instruction that depends on a preceding compiled instruction is executed only after that preceding instruction has finished executing, which ensures the accuracy of the result.
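The overlap test that decides whether a first instruction must wait for a zeroth instruction can be expressed with half-open address intervals. This sketch assumes that each instruction's data occupies one contiguous interval and, for write number instructions, one address unit per number written; `AddressInterval`, `has_dependency` and `interval_of_write` are illustrative names.

```python
from typing import NamedTuple


class AddressInterval(NamedTuple):
    start: int  # first address used (inclusive)
    end: int    # one past the last address used (exclusive)


def has_dependency(first: AddressInterval, zeroth: AddressInterval) -> bool:
    """True if the two half-open intervals share at least one address."""
    return first.start < zeroth.end and zeroth.start < first.end


def interval_of_write(dst: int, quantity: int) -> AddressInterval:
    # Assumption: one address unit per number written.
    return AddressInterval(dst, dst + quantity)
```

With these assumptions, a write of 64 numbers at address 500 depends on an earlier write of 256 numbers at address 500, whereas a write of 192 numbers at address 1000 does not, so only the former would be held back.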
In one possible implementation, the instruction format of the write number instruction may be:
write_zero dst,num
where write_zero is the operation code of the write number instruction and indicates that the number to be written is 0, and dst, num is the operation domain of the write number instruction: dst is the target address, and num is the quantity parameter, i.e. the quantity of numbers to be written.
In one possible implementation, the instruction format of the write number instruction may be:
write_zero_addr dst,num_addr
where write_zero_addr is the operation code of the write number instruction and indicates that the number to be written is 0, and dst, num_addr is the operation domain of the write number instruction: dst is the target address, and num_addr is the quantity parameter, namely the address at which the quantity of numbers to be written is stored.
In one possible implementation, the instruction format of the write number instruction may be:
write_one dst,num
where write_one is the operation code of the write number instruction and indicates that the number to be written is 1, and dst, num is the operation domain of the write number instruction: dst is the target address, and num is the quantity parameter, i.e. the quantity of numbers to be written.
In one possible implementation, the instruction format of the write number instruction may be:
write_one_addr dst,num_addr
where write_one_addr is the operation code of the write number instruction and indicates that the number to be written is 1, and dst, num_addr is the operation domain of the write number instruction: dst is the target address, and num_addr is the quantity parameter, namely the address at which the quantity of numbers to be written is stored.
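To make the four instruction formats concrete, the following sketch decodes their textual form into the number to be written (implied by the operation code) and the operation domain. The `opcode dst,num` text syntax with decimal or `0x` operands is an assumption; the disclosure does not define an assembly syntax or a binary encoding, and the names `decode` and `DecodedWrite` are hypothetical.

```python
from dataclasses import dataclass

# opcode -> (number to be written, whether the second operand is an address)
OPCODES = {
    "write_zero": (0, False),
    "write_zero_addr": (0, True),
    "write_one": (1, False),
    "write_one_addr": (1, True),
}


@dataclass
class DecodedWrite:
    value: int         # number to be written, implied by the operation code
    dst: int           # target address
    num_or_addr: int   # quantity itself (num) or the address holding it (num_addr)
    num_is_addr: bool  # True for the *_addr variants


def decode(line: str) -> DecodedWrite:
    """Decode e.g. 'write_zero 500,64' (assumed textual syntax)."""
    opcode, _, operands = line.strip().partition(" ")
    if opcode not in OPCODES:
        raise ValueError(f"not a write number instruction: {opcode!r}")
    fields = [f.strip() for f in operands.split(",")]
    if len(fields) != 2:
        raise ValueError("expected exactly two operands: dst,num")
    value, num_is_addr = OPCODES[opcode]
    # Base 0 accepts decimal and 0x-prefixed hexadecimal literals.
    return DecodedWrite(value, int(fields[0], 0), int(fields[1], 0), num_is_addr)
```

Keeping the value in the opcode table mirrors the text: no operand carries the number to be written, only the operation code does.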
In one possible implementation, the compiling module 10 may be further configured to generate an assembly file from the write number instruction and translate the assembly file into a binary file, the binary file being the compiled write number instruction.
It should be understood that the positions of the operation code and the operation domain in the instruction format of the write number instruction may be set by those skilled in the art as needed, which is not limited in this disclosure.
In one possible implementation manner, the apparatus may be disposed in one or more of a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), and an embedded Neural Network Processor (NPU).
It should be noted that, although the above embodiments take the write number instruction processing apparatus as an example, those skilled in the art will understand that the present disclosure is not limited thereto. In fact, the user can flexibly configure each module according to personal preference and/or the actual application scenario, as long as the technical solution of the disclosure is satisfied.
Application example
An application example according to an embodiment of the present disclosure is given below, using "performing a write number operation with the write number instruction processing apparatus" as an exemplary application scenario, to facilitate understanding of the flow of the apparatus. Those skilled in the art will understand that the following application example is merely intended to facilitate understanding of the embodiments of the present disclosure and should not be construed as limiting them.
Fig. 3 shows a schematic diagram of an application scenario of a write number instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus processes write number instructions as follows:
In a possible implementation, the compiling module 10 compiles the obtained write number instruction 1 (e.g. write-zero instruction 1 is @write_zero #500 #64) to obtain compiled write number instruction 1. The control module 11 parses compiled write number instruction 1 to obtain its operation code and operation domain: the operation code is write 0, the target address is 500, and the quantity of numbers to be written is 64. The control module 11 generates the number to be written, i.e. 0. The execution module 12 performs the write number operation, i.e. writes 64 zeros to target address 500.
In one possible implementation, the compiling module 10 compiles the obtained write number instruction 2 (e.g. write-zero instruction 2 is @write_zero #500 #250) to obtain compiled write number instruction 2. The control module 11 parses compiled write number instruction 2 to obtain its operation code and operation domain: the operation code is write 0, the target address is 500, and the quantity of numbers to be written is 250. The hardware characteristics set the preset number to 64. The control module 11 generates the number to be written, i.e. 0, determines that the quantity is not an integer multiple of the preset number, and rounds it up to the nearest integer multiple of the preset number, i.e.
$\lceil 250/64 \rceil \times 64 = 4 \times 64 = 256$.
The execution module 12 performs the write number operation, i.e. writes 256 zeros to target address 500.
In a possible implementation, the compiling module 10 compiles the obtained write number instruction 3 (e.g. write-one instruction 3 is @write_one #1000 #250) to obtain compiled write number instruction 3. The control module 11 parses compiled write number instruction 3 to obtain its operation code and operation domain: the operation code is write 1, the target address is 1000, and the quantity of numbers to be written is 250. The user specifies a preset number of 64. The control module 11 generates the number to be written, i.e. 1, determines that the quantity is not an integer multiple of the preset number, and, in the manner specified by the user, rounds it down to the nearest integer multiple of the preset number, i.e.
$\lfloor 250/64 \rfloor \times 64 = 3 \times 64 = 192$.
The execution module 12 performs the write number operation, i.e. writes 192 ones to target address 1000.
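The three application examples can be replayed with a small simulation in which memory is modelled as a dictionary from address to value. The function `execute_write`, the one-value-per-address model, and passing the preset number and rounding direction as arguments are all assumptions made for illustration.

```python
def execute_write(memory: dict[int, int], opcode: str, dst: int, num: int,
                  preset: int, mode: str) -> int:
    """Simulate one write number instruction on a dict-based memory.

    Returns the quantity actually written after alignment to `preset`."""
    values = {"write_zero": 0, "write_one": 1}
    if opcode not in values:
        raise ValueError(f"unsupported opcode: {opcode!r}")
    if num % preset != 0:
        if mode == "up":
            num = -(-num // preset) * preset    # smallest multiple >= num
        elif mode == "down":
            num = (num // preset) * preset      # largest multiple <= num
        else:
            raise ValueError(f"unknown rounding mode: {mode!r}")
    for offset in range(num):
        memory[dst + offset] = values[opcode]
    return num


memory: dict[int, int] = {}
print(execute_write(memory, "write_zero", 500, 64, preset=64, mode="up"))     # 64
print(execute_write(memory, "write_zero", 500, 250, preset=64, mode="up"))    # 256
print(execute_write(memory, "write_one", 1000, 250, preset=64, mode="down"))  # 192
```

The preset number for instruction 1 is not stated in the example; 64 is used here, which leaves its quantity unchanged.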
For the working process of each of the above modules, refer to the related description above.
In this way, the write number instruction processing apparatus can process write number instructions efficiently and quickly.
The present disclosure provides a machine learning arithmetic device, which may include one or more of the above write number instruction processing apparatuses and is configured to acquire data to be processed and control information from other processing devices and to execute specified machine learning operations. The machine learning arithmetic device may obtain write number instructions from another machine learning arithmetic device or from a non-machine-learning arithmetic device, and transmit execution results to peripheral devices (also called other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces and servers. When more than one write number instruction processing apparatus is included, the apparatuses may be linked and transmit data through a specific structure, for example interconnected through a PCIE bus, to support larger-scale neural network operations. In this case, they may share one control system or have separate control systems, and they may share memory or each accelerator may have its own memory. In addition, any interconnection topology may be used.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
Fig. 4a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in fig. 4a, the combined processing device includes the machine learning arithmetic device, the universal interconnection interface, and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user.
The other processing devices include one or more general-purpose/special-purpose processors such as central processing units (CPUs), graphics processing units (GPUs) and neural network processors; the number of processors they contain is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control, performing data transfer and basic control such as starting and stopping the machine learning arithmetic device; they may also cooperate with the machine learning arithmetic device to complete computing tasks.
The universal interconnection interface is used to transmit data and control instructions between the machine learning arithmetic device and the other processing devices. The machine learning arithmetic device acquires the required input data from the other processing devices and writes it into the on-chip storage device of the machine learning arithmetic device; it may obtain control instructions from the other processing devices and write them into the on-chip control cache; and it may read data from its storage module and transmit the data to the other processing devices.
Fig. 4b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in fig. 4b, the combined processing device may further include a storage device connected to both the machine learning arithmetic device and the other processing devices. The storage device stores data of the machine learning arithmetic device and the other processing devices, and is particularly suitable for data that is required for computation but cannot be held entirely in the internal storage of the machine learning arithmetic device or the other processing devices.
The combined processing device can serve as the SOC (system on chip) of equipment such as mobile phones, robots, drones and video surveillance equipment, effectively reducing the core area of the control part, increasing processing speed and reducing overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, display, mouse, keyboard, network card or Wi-Fi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning arithmetic device or combined processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in fig. 5, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may include other supporting components, including but not limited to a memory device 390, an interface device 391 and a control device 392.
The memory device 390 is coupled to the machine learning chip 389 (or a machine learning chip within a machine learning chip package structure) via a bus and is used for storing data. The memory device 390 may include multiple groups of memory cells 393, each coupled to the machine learning chip 389 via a bus. It is understood that each group of memory cells 393 may be DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse. DDR is twice as fast as standard SDRAM.
In one embodiment, the memory device 390 may include 4 groups of memory cells 393, and each group of memory cells 393 may include a plurality of DDR4 chips. In one embodiment, the machine learning chip 389 may include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transmission and 8 bits for ECC checking.
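The 64 + 8 bit split and the double-data-rate transfer can be turned into a rough peak-bandwidth estimate. The disclosure does not give a DDR4 speed grade, so this sketch assumes DDR4-3200 (a 1.6 GHz I/O clock, i.e. 3200 million transfers per second); the function names are hypothetical, and sustained bandwidth in practice is lower than this theoretical peak.

```python
def transfers_per_second(io_clock_hz: float) -> float:
    # DDR transfers data on both the rising and the falling clock edge.
    return 2 * io_clock_hz


def peak_payload_bandwidth_gb_s(io_clock_hz: float, controllers: int = 4,
                                bus_bits: int = 72, ecc_bits: int = 8) -> float:
    """Theoretical peak payload bandwidth in GB/s (1 GB = 1e9 bytes).

    ECC bits travel on the bus but carry no user data, so they are excluded."""
    data_bits = bus_bits - ecc_bits        # 72 - 8 = 64 data bits per transfer
    bytes_per_transfer = data_bits / 8     # 8 bytes per controller per transfer
    return transfers_per_second(io_clock_hz) * bytes_per_transfer * controllers / 1e9


# Assumption: DDR4-3200, i.e. a 1.6 GHz I/O clock (not stated in the source).
print(peak_payload_bandwidth_gb_s(1.6e9))  # 102.4
```

Under the same assumption a single controller would peak at 25.6 GB/s, and the 8 ECC bits occupy 8/72 (about 11%) of each controller's bus width.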
In one embodiment, each group 393 of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is provided in the machine learning chip 389 for controlling data transfer and data storage of each memory unit 393.
The interface device 391 is electrically connected to the machine learning chip 389 (or a machine learning chip within a machine learning chip package structure). The interface device 391 is used to implement data transmission between the machine learning chip 389 and an external device (e.g. a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface, through which the server transmits the data to be processed to the machine learning chip 389. In another embodiment, the interface device 391 may be another interface; the disclosure does not limit its specific form, as long as it can implement the transfer function. In addition, the calculation results of the machine learning chip are likewise transmitted back to the external device (e.g. the server) by the interface device.
The control device 392 is electrically connected to the machine learning chip 389 and is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). For example, the machine learning chip 389 may include multiple processing chips, processing cores or processing circuits and can drive multiple loads, so it may be in different operating states such as heavy load and light load. The control device can regulate the operating states of the processing chips, processing cores and/or processing circuits in the machine learning chip.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship and/or an automobile. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
Fig. 6 shows a flowchart of a write number instruction processing method according to an embodiment of the present disclosure. As shown in fig. 6, the method is applied to the above-described write number instruction processing apparatus, and includes step S50, step S51, and step S52.
In step S50, the obtained write number instruction is compiled to obtain a compiled write number instruction.
In step S51, the compiled write number instruction is parsed to obtain its operation code and operation domain; the number to be written is generated according to the operation code, and the quantity parameter of the numbers to be written and the target address required to execute the write number instruction are obtained according to the operation domain.
In step S52, the number to be written is written to the target address according to the quantity parameter.
In a possible implementation, when the quantity parameter is the quantity of numbers to be written itself, step S51 may include parsing the compiled write number instruction to obtain its operation code and operation domain, generating the number to be written according to the operation code, and obtaining the quantity of numbers to be written and the target address directly from the operation domain.
In a possible implementation, when the quantity parameter is an address at which the quantity of numbers to be written is stored, step S51 may include parsing the compiled write number instruction to obtain its operation code and operation domain, and generating the number to be written according to the operation code. The storage address of the quantity is resolved from the quantity parameter in the operation domain, and the quantity of numbers to be written is read from that address. The target address is also obtained from the operation domain.
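The two variants of step S51 differ only in where the quantity comes from. A minimal sketch, assuming the storage module can be modelled as a dictionary from address to stored value; `resolve_operands` is a hypothetical name.

```python
def resolve_operands(opcode: str, dst: int, num_field: int,
                     storage: dict[int, int]) -> tuple[int, int, int]:
    """Return (number to be written, quantity of numbers to be written, target address)."""
    values = {"write_zero": 0, "write_zero_addr": 0,
              "write_one": 1, "write_one_addr": 1}
    if opcode not in values:
        raise ValueError(f"not a write number instruction: {opcode!r}")
    if opcode.endswith("_addr"):
        quantity = storage[num_field]   # num_addr: fetch the quantity from storage
    else:
        quantity = num_field            # num: the quantity is an immediate value
    return values[opcode], quantity, dst
```

For example, `resolve_operands("write_one_addr", 1000, 2048, {2048: 250})` reads the quantity 250 from address 2048 and returns `(1, 250, 1000)`.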
In one possible implementation, step S52 may include:
writing, by one execution sub-module, the quantity of numbers to be written to the target address according to the quantity parameter.
In one possible implementation, step S52 may include:
allocating, to each of a plurality of execution sub-modules, a corresponding sub-quantity of numbers to be written and a corresponding sub-target address; and
writing, by each execution sub-module, the number to be written to its sub-target address according to its sub-quantity.
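The allocation above can be sketched as dividing the quantity into contiguous chunks, one per execution sub-module. This assumes one address unit per number written and an even split in which the first `num % workers` sub-modules take one extra number; the disclosure does not specify an allocation policy, and `split_write` is a hypothetical name.

```python
def split_write(dst: int, num: int, workers: int) -> list[tuple[int, int]]:
    """Split a write of `num` numbers starting at `dst` into contiguous
    (sub_target_address, sub_quantity) pairs, one per execution sub-module."""
    if workers < 1:
        raise ValueError("need at least one execution sub-module")
    base, extra = divmod(num, workers)
    chunks = []
    addr = dst
    for i in range(workers):
        count = base + (1 if i < extra else 0)  # spread the remainder over the first sub-modules
        chunks.append((addr, count))
        addr += count                           # next sub-target starts where this one ends
    return chunks
```

For instruction 2's original quantity, `split_write(500, 250, 4)` yields `[(500, 63), (563, 63), (626, 62), (688, 62)]`, which covers addresses 500 to 749 without gaps or overlaps.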
In a possible implementation, the compiled write number instruction is parsed to obtain a plurality of sub-write number instructions and the required data and/or addresses; the sub-write number instructions, data and/or addresses are allocated and transmitted; and the number to be written is written to the target address, in the quantity to be written, according to the sub-write number instructions, data and/or addresses transmitted from the master execution sub-module.
In a possible implementation manner, the number of the to-be-written numbers is an integer multiple of a preset number.
In a possible implementation manner, whether the number of the to-be-written numbers is an integer multiple of a preset number is judged, and when the number of the to-be-written numbers is not the integer multiple, the number of the to-be-written numbers is adjusted to be the integer multiple of the preset number.
In a possible implementation manner, adjusting the number of to-be-written numbers to be an integer multiple of the preset number may include:
increasing the quantity of numbers to be written to the smallest integer multiple of the preset number that is not less than the quantity.
In a possible implementation manner, adjusting the number of to-be-written numbers to be an integer multiple of the preset number may include:
reducing the quantity of numbers to be written to the largest integer multiple of the preset number that is not greater than the quantity.
In a possible implementation, parsing the compiled write number instruction to obtain the operation code and operation domain of the write number instruction may include:
storing compiled instructions, including compiled write number instructions;
parsing the compiled write number instruction to obtain its operation code and operation domain; and
storing an instruction queue comprising a plurality of compiled instructions, including write number instructions, arranged in execution order.
In one possible implementation, the method may further include:
when it is determined that a compiled first write-number instruction among the plurality of compiled instructions is associated with a compiled zeroth instruction that precedes it, caching the compiled first write-number instruction and executing it after the compiled zeroth instruction has finished executing,
where the association between the compiled first write-number instruction and the preceding compiled zeroth instruction may include:
the first storage address interval, which stores the data required by the compiled first write-number instruction, overlapping the zeroth storage address interval, which stores the data required by the compiled zeroth instruction.
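The association test reduces to an overlap check on address intervals. The sketch below assumes half-open, non-empty intervals `[start, end)`; the names `AddressInterval` and `must_wait` are hypothetical, and a hardware dependency-processing unit would presumably perform the equivalent comparison against every earlier instruction still in flight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressInterval:
    start: int  # first address used (inclusive)
    end: int    # one past the last address used (exclusive)

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("interval must be non-empty")

    def overlaps(self, other: "AddressInterval") -> bool:
        # Two half-open intervals overlap iff each one starts before the other ends.
        return self.start < other.end and other.start < self.end


def must_wait(first: AddressInterval, earlier: list) -> bool:
    """True if the first instruction must stay cached because its data interval
    overlaps that of any earlier (zeroth) instruction that has not yet finished."""
    return any(first.overlaps(zeroth) for zeroth in earlier)
```

For instance, a write covering `[0x100, 0x140)` must wait for an earlier instruction using `[0x120, 0x160)`, but not for one using the adjacent interval `[0x140, 0x180)`.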
It should be noted that, although the above embodiments use the write-number instruction processing method as an example, those skilled in the art will understand that the present disclosure is not limited thereto. In practice, users may configure each step flexibly according to personal preference and/or the actual application scenario, as long as the configuration conforms to the technical solution of the present disclosure.
The write-number instruction processing method provided by the embodiments of the present disclosure is broadly applicable and processes write-number instructions efficiently and quickly.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present disclosure, it should be understood that the disclosed system and apparatus may be implemented in other ways. The system and apparatus embodiments described above are merely illustrative. For example, the division into devices, apparatuses, and modules is only a logical division, and an actual implementation may divide them differently: a plurality of modules may be combined or integrated into another system or apparatus, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, apparatuses, or modules, and may be electrical or of another form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present disclosure may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of hardware or a form of a software program module.
The integrated modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, which may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The foregoing may be better understood in light of the following clauses:
Clause A1, a write-number instruction processing apparatus, characterized in that the apparatus comprises:
a compiling module, configured to compile an obtained write-number instruction to obtain a compiled write-number instruction;
a control module, configured to parse the compiled write-number instruction to obtain its operation code and operation field, generate the value to be written according to the operation code, and obtain the quantity parameter and the target address of the values to be written according to the operation field; and
an execution module, configured to write the value to be written to the target address according to the quantity parameter.
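The division of work among the three modules in Clause A1 can be illustrated with a small Python model. Everything here is an assumption for illustration: the disclosure gives no textual syntax, opcode table, or memory model, so the mnemonics `WRITE_ZERO`/`WRITE_ONE`, the `'OPCODE count, target'` syntax, and the list-based memory are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical opcode table: the opcode itself determines the value to write.
OPCODE_VALUES = {"WRITE_ZERO": 0, "WRITE_ONE": 1}


@dataclass(frozen=True)
class CompiledWrite:
    opcode: str
    count: int   # quantity parameter (an immediate count in this sketch)
    target: int  # target address, modeled as an index into `memory`


def compile_write(source: str) -> CompiledWrite:
    """Compiling module: turn e.g. 'WRITE_ONE 3, 2' into a structured instruction."""
    opcode, rest = source.split(maxsplit=1)
    count_text, target_text = (part.strip() for part in rest.split(","))
    return CompiledWrite(opcode, int(count_text, 0), int(target_text, 0))


def decode(instr: CompiledWrite) -> tuple:
    """Control module: derive the value from the opcode; read count and target."""
    return OPCODE_VALUES[instr.opcode], instr.count, instr.target


def execute(memory: list, value: int, count: int, target: int) -> None:
    """Execution module: write `value` into `count` consecutive cells from `target`."""
    if count < 0 or target < 0 or target + count > len(memory):
        raise IndexError("write falls outside memory")
    memory[target:target + count] = [value] * count
```

Running `execute(memory, *decode(compile_write("WRITE_ONE 3, 2")))` on `memory = [9] * 8` leaves `[9, 9, 1, 1, 1, 9, 9, 9]`.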
Clause A2, the apparatus of Clause A1, wherein
the quantity parameter of the values to be written is either the number of values to be written or an address at which that number is stored.
Clause A3, the apparatus of Clause A2, wherein
the control module is further configured to obtain the number of values to be written from the address at which that number is stored.
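Clauses A2 and A3 allow the quantity parameter to be either the count itself or an address holding the count. A minimal sketch of resolving both forms, assuming memory is modeled as a Python list of integers; `Immediate`, `Indirect`, and `resolve_count` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Immediate:
    value: int    # the count itself, carried in the operation field


@dataclass(frozen=True)
class Indirect:
    address: int  # address of the memory cell that holds the count


CountParam = Union[Immediate, Indirect]


def resolve_count(param: CountParam, memory: list) -> int:
    """Return the number of values to write; the indirect form costs one memory read."""
    if isinstance(param, Immediate):
        return param.value
    if isinstance(param, Indirect):
        return memory[param.address]
    raise TypeError(f"unsupported quantity parameter: {param!r}")
```

One plausible use of the indirect form is a count that is only known at run time, for example one produced by an earlier instruction, rather than fixed when the instruction is compiled.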
Clause A4, the apparatus of Clause A1, wherein
the execution module comprises one or more execution sub-modules;
the control module is further configured to allocate, to each execution sub-module, a corresponding sub-count of values to be written and a corresponding sub-target address; and
each execution sub-module is configured to write the value to be written to its sub-target address according to its allocated sub-count.
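One way the control module might allocate sub-counts and sub-target addresses in Clause A4 is to cut the write into contiguous, nearly equal slices. The sketch below assumes each value occupies one address unit and that slices are assigned in order; the function `partition` and this balancing policy are illustrative assumptions, since the disclosure does not specify an allocation rule.

```python
def partition(count: int, target: int, num_units: int) -> list:
    """Split a write of `count` values starting at `target` into contiguous
    (sub_count, sub_target) slices, one per execution sub-module. When `count`
    does not divide evenly, the first `count % num_units` units get one extra value."""
    if num_units <= 0:
        raise ValueError("num_units must be positive")
    base, extra = divmod(count, num_units)
    slices = []
    next_target = target
    for unit in range(num_units):
        sub_count = base + (1 if unit < extra else 0)
        slices.append((sub_count, next_target))
        next_target += sub_count  # the next slice starts right after this one
    return slices
```

For example, `partition(10, 100, 3)` returns `[(4, 100), (3, 104), (3, 107)]`: the sub-counts sum to 10 and the slices tile addresses 100 to 109 without gaps or overlap.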
Clause A5, the apparatus of Clause A1, wherein
the execution module comprises a master execution sub-module and a plurality of slave execution sub-modules;
the control module is further configured to parse the compiled write-number instruction into a plurality of sub-write instructions and to send the required data and/or addresses to the master execution sub-module;
the master execution sub-module is configured to distribute the sub-write instructions, data and/or addresses to the plurality of slave execution sub-modules and exchange them with those sub-modules; and
each slave execution sub-module is configured to write its values to be written to the target address according to the sub-write instruction, data, and addresses transmitted by the master execution sub-module.
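The master/slave arrangement of Clause A5 can be mimicked in software by having a master function split the write into sub-write instructions and handing each one to a worker. This is a hypothetical sketch, assuming a shared Python list as memory and threads standing in for slave execution sub-modules; `SubWrite`, `master_split`, `slave_execute`, and `run` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SubWrite:
    value: int   # value to be written
    count: int   # how many copies this slave writes
    target: int  # first address of this slave's slice


def master_split(value: int, count: int, target: int, num_slaves: int) -> list:
    """Master sub-module: cut one write into at most `num_slaves` contiguous sub-writes."""
    if num_slaves <= 0:
        raise ValueError("num_slaves must be positive")
    if count <= 0:
        return []
    chunk = -(-count // num_slaves)  # ceiling division so every value is covered
    return [SubWrite(value, min(chunk, count - start), target + start)
            for start in range(0, count, chunk)]


def slave_execute(memory: list, sub: SubWrite) -> None:
    """Slave sub-module: perform one sub-write on the shared memory."""
    memory[sub.target:sub.target + sub.count] = [sub.value] * sub.count


def run(memory: list, value: int, count: int, target: int, num_slaves: int = 4) -> None:
    subs = master_split(value, count, target, num_slaves)
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        # The slices are disjoint, so the sub-writes can proceed concurrently;
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda sub: slave_execute(memory, sub), subs))
```

Because the sub-writes cover disjoint slices, the slaves need no coordination with one another; correctness rests on the master's split covering the range exactly once.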
Clause A6, the apparatus of any one of Clauses A1 to A5, further comprising:
a storage module, configured to store the values to be written.
Clause A7, the apparatus of Clause A1, wherein
the number of values to be written is an integer multiple of a preset number.
Clause A8, the apparatus of Clause A1, wherein
the control module is further configured to determine whether the number of values to be written is an integer multiple of a preset number and, if it is not, adjust the number of values to be written to an integer multiple of the preset number.
Clause A9, the apparatus of Clause A8, wherein adjusting the number of values to be written to an integer multiple of the preset number comprises:
increasing the number of values to be written to the nearest integer multiple of the preset number above it.
Clause A10, the apparatus of Clause A8, wherein adjusting the number of values to be written to an integer multiple of the preset number comprises:
decreasing the number of values to be written to the nearest integer multiple of the preset number below it.
Clause A11, the apparatus of Clause A1, wherein the control module comprises:
an instruction storage sub-module, configured to store compiled instructions, including the compiled write-number instruction;
an instruction processing sub-module, configured to parse the compiled write-number instruction to obtain its operation code and operation field; and
a queue storage sub-module, configured to store an instruction queue containing a plurality of compiled instructions, including compiled write-number instructions, arranged in execution order.
Clause A12, the apparatus of Clause A11, wherein the control module further comprises:
a dependency processing sub-module, configured to, upon determining that a first to-be-executed instruction among a plurality of to-be-executed instructions is associated with a zeroth to-be-executed instruction preceding it, cache the first to-be-executed instruction in the instruction storage sub-module and, after the zeroth to-be-executed instruction has finished executing, fetch the first to-be-executed instruction from the instruction storage sub-module and send it to the operation module,
wherein the association between the first to-be-executed instruction and the preceding zeroth to-be-executed instruction comprises:
a first storage address interval storing the data required by the first to-be-executed instruction overlapping a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction;
and the first to-be-executed instruction comprises the compiled write-number instruction.
Clause A13, the apparatus of Clause A1, wherein
the control module is further configured to generate an assembly file from the write-number instruction and translate the assembly file into a binary file,
wherein the binary file is the compiled write-number instruction.
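Clause A13 describes translating an assembly file into a binary file. The disclosure does not publish an instruction encoding, so the following Python sketch invents one purely for illustration: a 1-byte opcode followed by a 32-bit count and a 32-bit target address, little-endian, packed with the standard `struct` module. The mnemonics and opcode values are hypothetical.

```python
import struct

# Hypothetical encoding: 1-byte opcode, 32-bit count, 32-bit target, little-endian.
OPCODES = {"WRITE_ZERO": 0x01, "WRITE_ONE": 0x02}
MNEMONICS = {code: name for name, code in OPCODES.items()}
ENCODING = struct.Struct("<BII")  # '<' = little-endian with no padding: 9 bytes


def assemble(line: str) -> bytes:
    """Translate one assembly line such as 'WRITE_ZERO 16, 0x100' into binary."""
    mnemonic, rest = line.split(maxsplit=1)
    count, target = (int(field.strip(), 0) for field in rest.split(","))
    return ENCODING.pack(OPCODES[mnemonic], count, target)


def assemble_file(text: str) -> bytes:
    """Assemble every non-blank line and concatenate the results into one binary."""
    return b"".join(assemble(line) for line in text.splitlines() if line.strip())


def parse(binary: bytes) -> tuple:
    """Recover the opcode mnemonic and the operation field (count, target)."""
    code, count, target = ENCODING.unpack(binary)
    return MNEMONICS[code], (count, target)
```

A fixed-width encoding like this keeps decoding trivial, since each instruction starts at a multiple of `ENCODING.size`; a real instruction set may well use a different layout.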
Clause A14, a machine learning arithmetic device, comprising:
one or more write-number instruction processing apparatuses according to any one of Clauses A1 to A13, configured to obtain data to be operated on and control information from other processing devices, execute specified machine learning operations, and transmit execution results to the other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of write-number instruction processing apparatuses, the apparatuses may be connected to one another through a specific structure and transmit data between them;
the plurality of write-number instruction processing apparatuses are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus to support larger-scale machine learning operations; they may share a single control system or each have their own; they may share a memory or each have their own; and they may be interconnected in any topology.
Clause A15, a combined processing device, characterized in that the combined processing device comprises:
the machine learning arithmetic device of Clause A14, a universal interconnect interface, and other processing devices;
the machine learning arithmetic device interacts with the other processing devices to jointly complete operations specified by the user,
wherein the combined processing device further comprises a storage device, connected to both the machine learning arithmetic device and the other processing devices, for storing data of the machine learning arithmetic device and the other processing devices.
Clause A16, a machine learning chip, wherein the machine learning chip comprises:
the machine learning arithmetic device of Clause A14 or the combined processing device of Clause A15.
Clause A17, an electronic device, characterized in that the electronic device comprises:
the machine learning chip of Clause A16.
Clause A18, a board card, wherein the board card comprises a storage device, an interface device, a control device, and the machine learning chip of Clause A16;
wherein the machine learning chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is configured to store data;
the interface device is configured to transmit data between the machine learning chip and external equipment; and
the control device is configured to monitor the state of the machine learning chip.
Clause A19, a write-number instruction processing method, applied to a write-number instruction processing apparatus, the method comprising:
compiling an obtained write-number instruction to obtain a compiled write-number instruction;
parsing the compiled write-number instruction to obtain its operation code and operation field, generating the value to be written according to the operation code, and obtaining, according to the operation code and the operation field, the quantity parameter and the target address of the values to be written that are required to execute the write-number instruction; and
writing the value to be written to the target address according to the quantity parameter.
Clause A20, the method of Clause A19, wherein
the quantity parameter of the values to be written is either the number of values to be written or an address at which that number is stored.
Clause A21, the method of Clause A20, further comprising:
obtaining the number of values to be written from the address at which that number is stored.
Clause A22, the method of Clause A19, further comprising:
allocating, to each execution sub-module, a corresponding sub-count of values to be written and a corresponding sub-target address; and
writing the value to be written to each sub-target address according to the corresponding sub-count.
Clause A23, the method of any one of Clauses A19 to A22, further comprising:
parsing the compiled write-number instruction to obtain a plurality of sub-write instructions and the required data and/or addresses;
distributing and transmitting the sub-write instructions, data and/or addresses; and
writing the values to be written to the target address according to the sub-write instructions, data and/or addresses transmitted by the master execution sub-module.
Clause A24, the method of Clause A19, further comprising:
storing the values to be written.
Clause A25, the method of Clause A19, wherein
the number of values to be written is an integer multiple of a preset number.
Clause A26, the method of Clause A19, further comprising:
determining whether the number of values to be written is an integer multiple of a preset number and, if it is not, adjusting the number of values to be written to an integer multiple of the preset number.
Clause A27, the method of Clause A26, wherein adjusting the number of values to be written to an integer multiple of the preset number comprises:
increasing the number of values to be written to the nearest integer multiple of the preset number above it.
Clause A28, the method of Clause A26, wherein adjusting the number of values to be written to an integer multiple of the preset number comprises:
decreasing the number of values to be written to the nearest integer multiple of the preset number below it.
Clause A29, the method of Clause A19, wherein parsing the compiled write-number instruction to obtain its operation code and operation field comprises:
storing compiled instructions, including the compiled write-number instruction;
parsing the compiled write-number instruction to obtain its operation code and operation field; and
storing an instruction queue containing a plurality of compiled instructions, including compiled write-number instructions, arranged in execution order.
Clause A30, the method of Clause A29, further comprising:
upon determining that a first to-be-executed instruction among a plurality of to-be-executed instructions is associated with a zeroth to-be-executed instruction preceding it, caching the first to-be-executed instruction and, after determining that the zeroth to-be-executed instruction has finished executing, executing the first to-be-executed instruction,
wherein the association between the first to-be-executed instruction and the preceding zeroth to-be-executed instruction comprises:
a first storage address interval storing the data required by the first to-be-executed instruction overlapping a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction;
and the first to-be-executed instruction comprises the compiled write-number instruction.
Clause A31, the method of Clause A19, wherein compiling the obtained write-number instruction to obtain a compiled write-number instruction comprises:
generating an assembly file from the write-number instruction and translating the assembly file into a binary file, wherein the binary file is the compiled write-number instruction.
Clause A32, a non-transitory computer-readable storage medium having computer program instructions stored thereon that, when executed by a processor, implement the method of any one of Clauses A19 to A31.
The embodiments of the present application have been described in detail above, and specific examples have been used to illustrate its principles and implementations; the description of the embodiments is intended only to help in understanding the method and core idea of the present application. Meanwhile, a person skilled in the art may, based on the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A write-number instruction processing apparatus, characterized in that the apparatus comprises:
a compiling module, configured to compile an obtained write-number instruction to obtain a compiled write-number instruction;
a control module, configured to parse the compiled write-number instruction to obtain its operation code and operation field, generate the value to be written according to the operation code, and obtain the quantity parameter and the target address of the values to be written according to the operation field; and
an execution module, configured to write the value to be written to the target address according to the quantity parameter.
2. The apparatus of claim 1, wherein
the quantity parameter of the values to be written is either the number of values to be written or an address at which that number is stored.
3. The apparatus of claim 2, wherein
the control module is further configured to obtain the number of values to be written from the address at which that number is stored.
4. A machine learning arithmetic device, the device comprising:
one or more write-number instruction processing apparatuses according to any one of claims 1 to 3, configured to obtain data to be operated on and control information from other processing devices, execute specified machine learning operations, and transmit execution results to the other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of write-number instruction processing apparatuses, the apparatuses may be connected to one another through a specific structure and transmit data between them;
the plurality of write-number instruction processing apparatuses are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus to support larger-scale machine learning operations; they may share a single control system or each have their own; they may share a memory or each have their own; and they may be interconnected in any topology.
5. A combined processing apparatus, characterized in that the combined processing apparatus comprises:
the machine learning arithmetic device of claim 4, a universal interconnect interface, and other processing devices;
the machine learning arithmetic device interacts with the other processing devices to jointly complete computing operations specified by the user,
wherein the combined processing apparatus further comprises a storage device, connected to both the machine learning arithmetic device and the other processing devices, for storing data of the machine learning arithmetic device and the other processing devices.
6. A machine learning chip, the machine learning chip comprising:
the machine learning arithmetic device according to claim 4 or the combined processing device according to claim 5.
7. An electronic device, characterized in that the electronic device comprises:
the machine learning chip of claim 6.
8. A board card, characterized in that the board card comprises a storage device, an interface device, a control device, and the machine learning chip according to claim 6;
wherein the machine learning chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is configured to store data;
the interface device is configured to transmit data between the machine learning chip and external equipment; and
the control device is configured to monitor the state of the machine learning chip.
9. A write-number instruction processing method, applied to a write-number instruction processing apparatus, the method comprising:
compiling an obtained write-number instruction to obtain a compiled write-number instruction;
parsing the compiled write-number instruction to obtain its operation code and operation field, generating the value to be written according to the operation code, and obtaining, according to the operation code and the operation field, the quantity parameter and the target address of the values to be written that are required to execute the write-number instruction; and
writing the value to be written to the target address according to the quantity parameter.
10. A non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method of claim 9.
CN201910744396.3A 2019-08-13 2019-08-13 Operation method, device and related product Pending CN112394998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910744396.3A CN112394998A (en) 2019-08-13 2019-08-13 Operation method, device and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910744396.3A CN112394998A (en) 2019-08-13 2019-08-13 Operation method, device and related product

Publications (1)

Publication Number Publication Date
CN112394998A (en) 2021-02-23

Family

ID=74602562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910744396.3A Pending CN112394998A (en) 2019-08-13 2019-08-13 Operation method, device and related product

Country Status (1)

Country Link
CN (1) CN112394998A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0266627A (en) * 1988-08-31 1990-03-06 Texas Instr Japan Ltd Parallel instruction executing processor
JPH1196018A (en) * 1997-09-22 1999-04-09 Fujitsu Ltd Compiling device, its method and computer-readable recording medium recording compiling execution program
JP2005235007A (en) * 2004-02-20 2005-09-02 Sony Corp Data storage device and method
KR20090046568A (en) * 2007-11-06 2009-05-11 삼성전자주식회사 Flash memory system and writing method of thereof
US20100042817A1 (en) * 2008-08-15 2010-02-18 Apple Inc. Shift-in-right instructions for processing vectors
CN106940684A (en) * 2016-01-05 2017-07-11 青岛海信电器股份有限公司 A kind of method and device pressed than feature data
US20180276035A1 (en) * 2015-10-08 2018-09-27 Shanghai Zhaoxin Semiconductor Co., Ltd. Neural network unit that interrupts processing core upon condition
US20190035473A1 (en) * 2017-07-25 2019-01-31 Western Digital Technologies, Inc. Group write operations for a data storage device
CN110096309A (en) * 2018-11-14 2019-08-06 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination