CN111338694B - Operation method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN111338694B
Authority
CN
China
Prior art keywords
vector data
migration
instruction
processing
migrated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910636077.0A
Other languages
Chinese (zh)
Other versions
CN111338694A (en
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to PCT/CN2019/110167 priority Critical patent/WO2020073925A1/en
Publication of CN111338694A publication Critical patent/CN111338694A/en
Application granted granted Critical
Publication of CN111338694B publication Critical patent/CN111338694B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The present disclosure relates to an operation method, an apparatus, a computer device, and a storage medium. A combined processing device comprises a machine learning arithmetic device, a universal interconnection interface, and other processing devices; the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user. The combined processing device further comprises a storage device, which is connected to both the machine learning arithmetic device and the other processing devices and stores their data. The operation method, the operation device, the computer equipment and the storage medium provided by the embodiments of the disclosure have a wide application range, high instruction processing efficiency, and high instruction processing speed.

Description

Operation method, operation device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a vector data migration instruction processing method and apparatus, a computer device, and a storage medium.
Background
With the continuous development of science and technology, machine learning, and neural network algorithms in particular, is used more and more widely, and has been applied successfully in fields such as image recognition, speech recognition, and natural language processing. However, as the complexity of neural network algorithms grows, the types and number of data operations involved keep increasing. In the related art, the migration processing of vector data is inefficient and slow.
Disclosure of Invention
In view of the above, the present disclosure provides a vector data migration instruction processing method, apparatus, computer device and storage medium to improve efficiency and speed of migration processing on vector data.
According to a first aspect of the present disclosure, there is provided a vector data migration instruction processing apparatus, the apparatus comprising:
the control module is used for analyzing the obtained vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction, obtaining vector data to be migrated and a target address required by executing the vector data migration instruction according to the operation code and the operation domain, and determining migration parameters required by migration processing;
the processing module stores the vector data to be migrated into the target address according to the migration parameters,
the operation code is used for indicating that the processing of the vector data migration instruction on the vector data is migration processing, the operation domain comprises a vector data address to be migrated and the target address, and the migration parameters comprise an initial storage space where the vector data address to be migrated is located, a target storage space where the target address is located and a migration type of the migration processing.
According to a second aspect of the present disclosure, there is provided a machine learning arithmetic device, the device including:
one or more vector data migration instruction processing devices according to the first aspect, configured to obtain vector data to be migrated and control information from another processing device, execute a specified machine learning operation, and transmit an execution result to the other processing device through an I/O interface;
when the machine learning arithmetic device comprises a plurality of vector data migration instruction processing devices, the vector data migration instruction processing devices can be connected through a specific structure and transmit data;
the vector data migration instruction processing devices are interconnected through a Peripheral Component Interconnect Express (PCIe) bus and transmit data so as to support larger-scale machine learning operation; the vector data migration instruction processing devices share the same control system or have their own respective control systems; the vector data migration instruction processing devices share a memory or have their own respective memories; the interconnection mode of the vector data migration instruction processing devices is any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing apparatus, the apparatus comprising:
the machine learning arithmetic device, the universal interconnect interface, and the other processing device according to the second aspect;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning arithmetic device of the second aspect or the combined processing device of the third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure, which includes the machine learning chip of the fourth aspect.
According to a sixth aspect of the present disclosure, a board card is provided, which includes the machine learning chip packaging structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device, which includes the machine learning chip of the fourth aspect or the board of the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a vector data migration instruction processing method, which is applied to a vector data migration instruction processing apparatus, the apparatus including a control module and a processing module, the method including:
analyzing the obtained vector data migration instruction by using a control module to obtain an operation code and an operation domain of the vector data migration instruction, obtaining vector data to be migrated and a target address required by executing the vector data migration instruction according to the operation code and the operation domain, and determining migration parameters required by migration processing;
storing the vector data to be migrated into the target address by using a processing module according to the migration parameters,
the operation code is used for indicating that the processing of the vector data migration instruction on the vector data is migration processing, the operation domain comprises a vector data address to be migrated and the target address, and the migration parameters comprise an initial storage space where the vector data address to be migrated is located, a target storage space where the target address is located and a migration type of the migration processing.
According to a ninth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above vector data migration instruction processing method.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camcorder, a projector, a watch, a headset, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle comprises an aircraft, a ship, and/or a land vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
In the vector data migration instruction processing method, apparatus, computer equipment and storage medium provided by the embodiments of the disclosure, the apparatus comprises a control module and a processing module. The control module analyzes the obtained vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction, obtains the vector data to be migrated and the target address required for executing the vector data migration instruction according to the operation code and the operation domain, and determines the migration parameters required for the migration processing; the processing module stores the vector data to be migrated into the target address according to the migration parameters. The vector data migration instruction processing method, apparatus, computer equipment and storage medium provided by the embodiments of the disclosure have a wide application range, process vector data migration instructions with high efficiency and speed, and migrate vector data with high efficiency and speed.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 illustrates a block diagram of a vector data migration instruction processing apparatus according to an embodiment of the present disclosure.
Figs. 2a-2e show block diagrams of a vector data migration instruction processing apparatus according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating an application scenario of a vector data migration instruction processing apparatus according to an embodiment of the present disclosure.
Fig. 4a, 4b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.
Fig. 6 shows a flow diagram of a vector data migration instruction processing method according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, not all embodiments of the present disclosure. All other embodiments, which can be derived by one skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
It should be understood that the terms "zero," "first," "second," and the like in the claims, the description, and the drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
With the wide use of neural network algorithms, the computing power of computer hardware keeps improving, and the types and number of data operations involved in practical applications keep increasing. Programming languages are numerous, and in the related art there is currently no vector data migration instruction that can be widely applied across programming languages. To migrate vector data in different language environments, a technician must therefore customize one or more instructions for each programming language environment, which makes vector data migration inefficient and slow. The present disclosure provides a vector data migration instruction processing method, apparatus, computer device, and storage medium, which can implement vector data migration with only one instruction, and can significantly improve the efficiency and speed of performing vector data migration.
Fig. 1 illustrates a block diagram of a vector data migration instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 1, the apparatus includes a control module 11 and a processing module 12.
The control module 11 is configured to analyze the obtained vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction, obtain vector data to be migrated and a target address, which are required for executing the vector data migration instruction, according to the operation code and the operation domain, and determine a migration parameter required for migration processing. The operation code is used for indicating that the vector data migration instruction performs migration processing on the vector data, the operation domain includes a to-be-migrated vector data address and a target address, and the migration parameters may include an initial storage space where the to-be-migrated vector data address is located, a target storage space where the target address is located, and a migration type for performing migration processing.
And the processing module 12 stores the vector data to be migrated into the target address according to the migration parameters.
In this embodiment, there may be one or more pieces of vector data to be migrated. The migration type may indicate the vector data storage speed of the initial storage space, the vector data storage speed of the target storage space, and which of the two storage speeds is faster. In the vector data migration instruction, a different code can be set for each storage speed relationship between the target storage space and the initial storage space, so that the relationships can be distinguished. For example, the code of the migration type "the storage speed of the initial storage space is greater than that of the target storage space" may be set to "st". The code of the migration type "the storage speed of the initial storage space is equal to the storage speed of the target storage space" may be set to "mv". The code of the migration type "the storage speed of the initial storage space is smaller than that of the target storage space" may be set to "ld". The migration type and the code of the migration type can be set by those skilled in the art according to actual needs, and the disclosure does not limit this.
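As an illustration of the example codes above, the following minimal sketch maps a pair of storage speeds to "st", "mv", or "ld". The function name, the numeric speed values, and the idea of comparing speeds as plain numbers are assumptions made for this sketch; the disclosure leaves the encoding to the implementer.

```python
def migration_type_code(initial_speed: float, target_speed: float) -> str:
    """Return the example migration-type code for a pair of storage speeds."""
    if initial_speed > target_speed:
        return "st"  # initial storage space is faster than the target storage space
    if initial_speed < target_speed:
        return "ld"  # initial storage space is slower than the target storage space
    return "mv"      # both storage spaces have the same storage speed


# Hypothetical relative speeds, chosen only to exercise the three cases.
fast, slow = 3.0, 1.0
codes = [
    migration_type_code(fast, slow),
    migration_type_code(fast, fast),
    migration_type_code(slow, fast),
]
print(codes)
```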
In this embodiment, the migration parameters may include the names, numbers, and other identifications of the initial storage space and the target storage space to represent the initial storage space and the target storage space.
In this embodiment, the initial storage space and the target storage space may be an NRAM, a WRAM, a DRAM, registers, or the like used by the device for storing data, and the DRAM may include an LDRAM, a GDRAM, and the like. NRAM (Nanotube Random Access Memory) is a nonvolatile memory based on carbon nanotubes (CNT). WRAM (Window RAM) is a type of VRAM (Video RAM). DRAM (Dynamic Random Access Memory) is a dynamic random access memory. The LDRAM is a Local DRAM, which may be a DRAM private to one computing core in the device. The GDRAM is a Global DRAM, which may be shared by a plurality of computing cores in the device. A computing core is a unit, module, or the like that performs data operations in the device, and corresponds to the processing module described below.
In this embodiment, the vector data migration instruction obtained by the control module is a hardware instruction which does not need to be compiled and can be directly executed by hardware, and the control module may analyze the obtained vector data migration instruction. The control module can obtain the vector data to be migrated from the vector data address to be migrated. The control module may obtain instructions and data through a data input output unit, which may be one or more data I/O interfaces or I/O pins.
In this embodiment, the operation code may be the part of an instruction, or the field (usually represented by a code), that specifies the operation to be performed in a computer program; it is an instruction sequence number that tells the device executing the instruction which instruction is to be executed. The operation domain may be the source of all data required for executing the corresponding instruction, including the target address, the vector data address to be migrated, the migration parameters for the migration processing, and the like. A vector data migration instruction must include an operation code and an operation domain, where the operation domain includes at least the vector data address to be migrated and the target address.
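The split between operation code and operation domain can be sketched as follows. The textual layout `OPCODE target source [amount]`, the opcode name `VMIGRATE`, and all field names are hypothetical and serve only this illustration; the disclosure does not fix an instruction format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MigrationInstruction:
    opcode: str                             # tells the device which instruction to execute
    target_address: int                     # operation domain: where the data is stored
    source_address: int                     # operation domain: vector data address to be migrated
    migration_amount: Optional[int] = None  # optional operation-domain field


def parse_instruction(text: str) -> MigrationInstruction:
    """Split a hypothetical textual instruction into opcode and operation domain."""
    fields = text.split()
    if len(fields) not in (3, 4):
        raise ValueError("expected: OPCODE target source [amount]")
    opcode, target, source = fields[:3]
    # int(..., 0) accepts both decimal and 0x-prefixed hexadecimal literals.
    amount = int(fields[3], 0) if len(fields) == 4 else None
    return MigrationInstruction(opcode, int(target, 0), int(source, 0), amount)


parsed = parse_instruction("VMIGRATE 0x200 0x100 64")
print(parsed)
```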
It should be understood that one skilled in the art may set the vector data migration instruction format and the included opcodes and operation fields as needed, and the disclosure is not limited thereto.
In this embodiment, the apparatus may include one or more control modules and one or more processing modules, and the number of the control modules and the number of the processing modules may be set according to actual needs, which is not limited by this disclosure. When the apparatus includes a control module, the control module may receive the vector data migration instruction and control one or more processing modules to perform the migration process. When the apparatus includes a plurality of control modules, the plurality of control modules may receive the vector data migration instruction, respectively, and control the corresponding one or more processing modules to perform the migration processing.
The vector data migration instruction processing device provided by the embodiment of the disclosure comprises a control module and a processing module. The control module is used for analyzing the obtained vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction, obtaining vector data to be migrated and a target address required by executing the vector data migration instruction according to the operation code and the operation domain, and determining migration parameters required by migration processing. And the processing module is used for storing the vector data to be migrated into the target address according to the migration parameters. The vector data migration instruction processing device provided by the embodiment of the disclosure has a wide application range, high processing efficiency and high processing speed for vector data migration instructions, and high processing efficiency and high processing speed for vector data migration.
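The two-stage flow summarized above (the control module fetches the data and determines the migration parameters, then the processing module stores the data) can be sketched in software as follows. Modeling storage spaces as Python lists keyed by name, addressing them with `(space, offset)` pairs, and the dictionary field names are all assumptions of this sketch, not the device's actual interfaces.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]  # (storage space name, offset), a hypothetical address model


def control_module(instruction: dict, memory: Dict[str, List[float]]):
    """Fetch the vector data to be migrated and determine the migration parameters."""
    src_space, src_offset = instruction["source"]
    dst_space, _ = instruction["target"]
    amount = instruction["amount"]
    data = memory[src_space][src_offset:src_offset + amount]
    params = {
        "initial_space": src_space,
        "target_space": dst_space,
        "migration_type": instruction["type"],
    }
    return data, instruction["target"], params


def processing_module(data: List[float], target: Address, params: dict,
                      memory: Dict[str, List[float]]) -> None:
    """Store the fetched vector data at the target address."""
    space, offset = target
    if space != params["target_space"]:
        raise ValueError("target address is not in the target storage space")
    if offset + len(data) > len(memory[space]):
        raise ValueError("target storage space is too small")
    memory[space][offset:offset + len(data)] = data


memory = {"GDRAM": [1.0, 2.0, 3.0, 4.0], "NRAM": [0.0, 0.0, 0.0, 0.0]}
instruction = {"source": ("GDRAM", 1), "target": ("NRAM", 0), "amount": 2, "type": "ld"}
data, target, params = control_module(instruction, memory)
processing_module(data, target, params, memory)
print(memory["NRAM"])
```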
Fig. 2a illustrates a block diagram of a vector data migration instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2a, the processing module 12 may include a master processing sub-module 121 and a plurality of slave processing sub-modules 122.
The main processing sub-module 121 is configured to process the vector data to be migrated to obtain processed vector data to be migrated, and store the processed vector data to be migrated in the target address. The processing performed on the vector data to be migrated includes processing such as data type conversion, and the vector data to be migrated may also be stored directly without being processed, which is not limited in this disclosure.
In a possible implementation manner, the control module 11 is further configured to analyze the obtained calculation instruction to obtain an operation domain and an operation code of the calculation instruction, and obtain data to be operated, which is required for executing the calculation instruction, according to the operation domain and the operation code. The processing module 12 is further configured to perform an operation on the data to be operated according to the calculation instruction to obtain a calculation result of the calculation instruction. The processing module may include a plurality of operators for performing operations corresponding to operation types of the calculation instructions.
In this implementation, the calculation instruction may be other instructions for performing arithmetic operations, logical operations, and the like on data such as scalars, vectors, matrices, tensors, and the like, and those skilled in the art may set the calculation instruction according to actual needs, which is not limited by the present disclosure.
In this implementation, the arithmetic unit may include an adder, a divider, a multiplier, a comparator, and the like, which are capable of performing arithmetic operations, logical operations, and the like on data. The type and number of the arithmetic units may be set according to the requirements such as the size of the data amount of the arithmetic operation to be performed, the type of the arithmetic operation, the processing speed and efficiency of the arithmetic operation on the data, and the disclosure does not limit the types and the number.
In a possible implementation manner, the control module 11 is further configured to analyze the calculation instruction to obtain a plurality of operation instructions, and send the data to be operated and the plurality of operation instructions to the main processing sub-module 121.
The main processing sub-module 121 is configured to perform preamble processing on data to be operated, and transmit data and operation instructions with the plurality of slave processing sub-modules 122.
The slave processing sub-module 122 is configured to perform an intermediate operation in parallel according to the data and the operation instruction transmitted from the master processing sub-module 121 to obtain a plurality of intermediate results, and transmit the plurality of intermediate results to the master processing sub-module 121.
The main processing sub-module 121 is further configured to perform subsequent processing on the plurality of intermediate results to obtain a calculation result of the calculation instruction, and store the calculation result in the corresponding address.
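The division of work described above (preamble processing in the main processing sub-module, parallel intermediate operations in the slave processing sub-modules, then subsequent processing in the main processing sub-module) can be imitated in software. The sketch below uses a matrix-vector product as a stand-in calculation instruction and a thread pool as a stand-in for the slave sub-modules; both choices, and all function names, are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def slave_intermediate(row: List[float], vector: List[float]) -> float:
    """Intermediate operation run by one slave: a single dot product."""
    return sum(a * b for a, b in zip(row, vector))


def main_process(matrix: List[List[float]], vector: List[float],
                 num_slaves: int = 4) -> List[float]:
    # Preamble processing (here: a shape check before distributing work).
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("every matrix row must match the vector length")
    # The slaves compute intermediate results in parallel, one row each.
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        intermediates = list(pool.map(lambda row: slave_intermediate(row, vector), matrix))
    # Subsequent processing (here: the intermediate results already form the answer).
    return intermediates


result = main_process([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
print(result)
```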
In this implementation, when the calculation instruction is an operation performed on scalar or vector data, the apparatus may control the main processing sub-module to perform an operation corresponding to the calculation instruction by using an operator of the main processing sub-module. When the calculation instruction is to perform an operation on data having a dimension greater than or equal to 2, such as a matrix or a tensor, the apparatus may control the slave processing sub-module to perform an operation corresponding to the calculation instruction by using an operator therein.
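The dispatch rule in this paragraph can be sketched as a small selector. Representing scalars as numbers and vectors, matrices, and tensors as nested Python lists, and measuring the dimension by nesting depth, are assumptions of this sketch.

```python
def dimension(data) -> int:
    """Nesting depth of a list-based value: 0 for a scalar, 1 for a vector, and so on."""
    depth = 0
    while isinstance(data, (list, tuple)):
        depth += 1
        if not data:  # an empty container has no first element to descend into
            break
        data = data[0]
    return depth


def select_submodule(data) -> str:
    """Scalars and vectors go to the main sub-module; dimension >= 2 goes to the slaves."""
    return "slave" if dimension(data) >= 2 else "main"


print(select_submodule(3.0), select_submodule([1, 2]), select_submodule([[1, 2], [3, 4]]))
```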
It should be noted that, a person skilled in the art may set the connection manner between the master processing sub-module and the multiple slave processing sub-modules according to actual needs to implement the configuration setting of the processing module, for example, the configuration of the processing module may be an "H" type configuration, an array type configuration, a tree type configuration, and the like, which is not limited in this disclosure.
Fig. 2b illustrates a block diagram of a vector data migration instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2b, the processing module 12 may further include one or more branch processing sub-modules 123, where the branch processing sub-module 123 is configured to forward data and/or operation instructions between the master processing sub-module 121 and the slave processing sub-module 122. The main processing sub-module 121 is connected with one or more branch processing sub-modules 123. Therefore, the main processing sub-module, the branch processing sub-module and the slave processing sub-module in the processing module are connected in an "H"-type structure, and data and/or operation instructions are forwarded by the branch processing sub-module, which saves resources of the main processing sub-module and further improves the instruction processing speed.
Fig. 2c illustrates a block diagram of a vector data migration instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2c, a plurality of slave processing sub-modules 122 are distributed in an array.
Each slave processing sub-module 122 is connected with other adjacent slave processing sub-modules 122, and the master processing sub-module 121 is connected to k slave processing sub-modules 122 of the plurality of slave processing sub-modules 122, the k slave processing sub-modules 122 being: the n slave processing sub-modules 122 of row 1, the n slave processing sub-modules 122 of row m, and the m slave processing sub-modules 122 of column 1.
As shown in fig. 2c, the k slave processing sub-modules only include the n slave processing sub-modules in the 1 st row, the n slave processing sub-modules in the m th row, and the m slave processing sub-modules in the 1 st column, that is, the k slave processing sub-modules are slave processing sub-modules directly connected to the master processing sub-module from among the multiple slave processing sub-modules. The k slave processing sub-modules are used for forwarding data and instructions between the main processing sub-module and the plurality of slave processing sub-modules. Therefore, the plurality of slave processing sub-modules are distributed in an array, the speed of sending data and/or operation instructions from the main processing sub-module to the slave processing sub-modules can be increased, and the instruction processing speed is further increased.
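To make the array layout concrete, the sketch below enumerates the grid positions of the slave sub-modules that are directly connected to the master sub-module in an m-row, n-column array. The 1-based `(row, column)` coordinates and the function name are assumptions. Under this reading the two corner positions of column 1 are shared with rows 1 and m, so the number of distinct positions is 2n + m - 2 when m is at least 2.

```python
from typing import Set, Tuple


def directly_connected(m: int, n: int) -> Set[Tuple[int, int]]:
    """Positions (row, column), 1-based, of slaves wired directly to the master."""
    if m < 1 or n < 1:
        raise ValueError("the array needs at least one row and one column")
    first_row = {(1, col) for col in range(1, n + 1)}
    last_row = {(m, col) for col in range(1, n + 1)}
    first_column = {(row, 1) for row in range(1, m + 1)}
    return first_row | last_row | first_column


connected = directly_connected(3, 4)
print(len(connected))  # 9 distinct positions for a 3 x 4 array
```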
Figure 2d illustrates a block diagram of a vector data migration instruction processing apparatus, according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2d, the processing module may further include a tree sub-module 124. The tree submodule 124 includes a root port 401 and a plurality of branch ports 402. The root port 401 is connected to the master processing submodule 121, and the plurality of branch ports 402 are connected to the plurality of slave processing submodules 122, respectively. The tree sub-module 124 has a transceiving function, and is used for forwarding data and/or operation instructions between the master processing sub-module 121 and the slave processing sub-module 122. Therefore, the processing modules are connected in a tree-shaped structure under the action of the tree-shaped sub-modules, and the speed of sending data and/or operation instructions to the slave processing sub-modules by the main processing sub-modules can be increased by utilizing the forwarding function of the tree-shaped sub-modules, so that the processing speed of the instructions is increased.
In one possible implementation, the tree submodule 124 may be an optional structure of the apparatus and may include at least one level of nodes. The nodes are line structures with a forwarding function and have no operation function. The lowest level node is connected to the slave processing submodule to forward data and/or operation instructions between the master processing submodule 121 and the slave processing submodule 122. In particular, if the tree submodule has zero levels of nodes, the apparatus does not require the tree submodule.
In one possible implementation, the tree submodule 124 may include a plurality of nodes of an n-ary tree structure, and the plurality of nodes of the n-ary tree structure may have a plurality of layers.
For example, fig. 2e illustrates a block diagram of a vector data migration instruction processing apparatus, according to an embodiment of the present disclosure. As shown in fig. 2e, the n-ary tree structure may be a binary tree structure, with the tree sub-modules comprising 2 levels of nodes 01. The lowest node 01 is connected with the slave processing submodule 122 to forward data and/or operation instructions between the master processing submodule 121 and the slave processing submodule 122.
In this implementation, the n-ary tree structure may also be a ternary tree structure or the like, where n is a positive integer greater than or equal to 2. The value of n and the number of layers of nodes in the n-ary tree structure may be set by those skilled in the art as needed, and the disclosure is not limited thereto.
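The forwarding role of the tree submodule can be illustrated with a minimal sketch. It assumes a complete n-ary tree in which the single top-level node is attached to the root port, every node copies its input unchanged to n outputs, and each output of a lowest-level node is attached to one slave processing submodule. The function name and the fan-out rule are assumptions made for illustration; the disclosure does not fix them.

```python
def forward_through_tree(payload, n, levels):
    """Copy a payload from the root port to every slave-facing branch port.

    Assumes a complete n-ary tree of pure forwarding nodes (no computation).
    """
    if n < 2:
        raise ValueError("n must be an integer greater than or equal to 2")
    if levels < 0:
        raise ValueError("levels must be non-negative")
    frontier = [payload]  # what the root port receives from the master
    for _ in range(levels):
        # Each node forwards its input unchanged to n outputs.
        frontier = [item for item in frontier for _ in range(n)]
    return frontier


# Binary tree with 2 levels of nodes, as in the fig. 2e example.
deliveries = forward_through_tree("migrate", n=2, levels=2)
print(len(deliveries))
```

Under these assumptions a binary tree with 2 levels of nodes reaches 4 slave processing submodules, and zero levels degenerates to a direct connection, which matches the remark that the tree submodule is then not required.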
In one possible implementation, the operation domain may further include a vector data migration amount. The control module 11 is further configured to determine a vector data migration amount according to the operation domain, and obtain vector data to be migrated corresponding to the vector data migration amount from the vector data address to be migrated.
In this implementation, the vector data migration amount may be a data amount of the acquired vector data to be migrated.
In one possible implementation, a default vector data migration amount may be preset. When the vector data migration amount is not included in the operation domain, the default vector data migration amount may be determined as the vector data migration amount of the current vector data migration instruction. And then obtaining the vector data to be migrated corresponding to the vector data migration volume from the vector data address to be migrated.
In one possible implementation, when the vector data migration amount is not included in the operation domain, all the vector data to be migrated that is stored at the vector data address to be migrated may be obtained directly from that address.
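The three ways of settling the vector data migration amount described above can be sketched as follows. The memory model (a dictionary from address to a list of elements), the default value and all names are hypothetical and serve only to illustrate the selection logic.

```python
DEFAULT_MIGRATION_AMOUNT = 4  # hypothetical preset; the text fixes no value


def fetch_vector(memory, src, size=None, fall_back_to_default=True):
    """Return the vector data to be migrated that is stored at address src.

    size given         -> read exactly that many elements
    size missing       -> use the preset default, or
                          read everything stored at src
    """
    stored = memory[src]
    if size is None:
        size = DEFAULT_MIGRATION_AMOUNT if fall_back_to_default else len(stored)
    if size > len(stored):
        raise ValueError("migration amount exceeds the data stored at src")
    return stored[:size]


memory = {400: [1, 2, 3, 4, 5, 6]}
print(fetch_vector(memory, 400, 5))
print(fetch_vector(memory, 400))
print(fetch_vector(memory, 400, fall_back_to_default=False))
```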
In one possible implementation, the operation domain may also include migration parameters. Determining the migration parameters required for the migration processing may include: determining the migration parameters required for the migration processing according to the operation domain.
In one possible implementation, the operation code may also be used to indicate the migration parameters. Determining the migration parameters required for the migration processing may include: determining the migration parameters required for the migration processing according to the operation code.
In one possible implementation, default migration parameters may also be set. When the migration parameter of the current vector data migration instruction cannot be determined according to the operation domain and the operation code, the default migration parameter may be determined as the migration parameter of the current vector data migration instruction.
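A minimal sketch of this fallback chain is given below. It assumes that migration parameters are written as a `type.space1.space2` token, that the operation domain is checked before the operation code (the text does not state a priority between the two), and that the default parameters are a hypothetical preset.

```python
DEFAULT_PARAMS = ("mv", "space_a", "space_b")  # hypothetical default
MIGRATION_TYPES = ("ld", "st", "mv")


def parse_params(token):
    """Return (type, space1, space2) if token looks like type.space1.space2."""
    parts = token.split(".")
    if len(parts) == 3 and parts[0] in MIGRATION_TYPES:
        return tuple(parts)
    return None


def resolve_migration_params(opcode, operation_domain):
    """Look in the operation domain, then the opcode, then use the default."""
    for field in operation_domain:
        params = parse_params(field)
        if params is not None:
            return params
    params = parse_params(opcode)
    if params is not None:
        return params
    return DEFAULT_PARAMS


print(resolve_migration_params("vector", ["500", "400", "ld.200.300", "5"]))
print(resolve_migration_params("st.300.200", ["500", "400", "5"]))
print(resolve_migration_params("vector", ["500", "400", "5"]))
```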
In a possible implementation manner, the initial storage space and the target storage space corresponding to the vector data address to be migrated may be determined according to the data address and the target address of the vector to be migrated, and the migration parameter may be determined according to parameters such as the storage speed and the storage space type of the initial storage space and the target storage space.
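Deriving the migration parameters from the two addresses might look like the sketch below. The storage space names, address ranges and relative speeds are invented for illustration; only the mnemonics ld, st and mv and their speed-based meaning come from the instruction format described in this document.

```python
# Hypothetical storage spaces: name -> (first address, last address, relative speed)
SPACES = {
    "off_chip": (0, 999, 1),
    "cache_a": (1000, 1999, 10),
    "cache_b": (2000, 2999, 10),
}


def space_of(address):
    """Find the storage space whose address range contains the address."""
    for name, (first, last, _speed) in SPACES.items():
        if first <= address <= last:
            return name
    raise ValueError(f"address {address} is not in any known storage space")


def migration_params(src, dst):
    """Return (type, initial space, target space) for a src -> dst migration."""
    space1, space2 = space_of(src), space_of(dst)
    speed1, speed2 = SPACES[space1][2], SPACES[space2][2]
    if speed1 < speed2:
        mtype = "ld"  # initial space slower than target space
    elif speed1 > speed2:
        mtype = "st"  # initial space faster than target space
    else:
        mtype = "mv"  # both spaces equally fast
    return mtype, space1, space2


print(migration_params(src=400, dst=1500))
```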
In one possible implementation, as shown in fig. 2 a-2 e, the apparatus may further include a storage module 13. The storage module 13 is used for storing vector data to be migrated.
In this implementation, the storage module may include one or more of a cache and a register, and the cache may include a temporary cache and may further include at least one NRAM (Neuron Random Access Memory). And the cache is used for storing the data to be operated and the vector data to be migrated. And the register is used for storing scalar data in the data to be operated.
In one possible implementation, the cache may include a neuron cache. The neuron cache, that is, the neuron random access memory, may be configured to store neuron data in data to be operated on, where the neuron data includes neuron vector data.
In a possible implementation manner, the apparatus may further include a direct memory access module for reading or storing data from the storage module.
In one possible implementation, as shown in fig. 2 a-2 e, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113.
The instruction storage submodule 111 is used for storing a vector data migration instruction.
The instruction processing sub-module 112 is configured to parse the vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction.
The queue storage submodule 113 is configured to store an instruction queue, where the instruction queue includes multiple instructions to be executed that are sequentially arranged according to an execution order, and the multiple instructions to be executed may include a vector data migration instruction.
In this implementation, the instructions to be executed may also include computational instructions related or unrelated to vector data migration, which is not limited by this disclosure. The execution sequence of the multiple instructions to be executed can be arranged according to the receiving time, the priority level and the like of the instructions to be executed to obtain an instruction queue, so that the multiple instructions to be executed can be sequentially executed according to the instruction queue.
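One way to arrange such a queue is sketched below. The ordering rule (higher priority first, earlier receiving time breaking ties) is an assumption, since the text only names receiving time and priority level as possible criteria; the class and field names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingInstruction:
    text: str
    received_at: int   # receiving time, e.g. a cycle counter
    priority: int = 0  # larger value means more urgent (assumed convention)


def build_instruction_queue(pending):
    """Order instructions by priority (descending), then by receiving time."""
    return sorted(pending, key=lambda p: (-p.priority, p.received_at))


queue = build_instruction_queue([
    PendingInstruction("ld.200.300 500 400 5", received_at=2),
    PendingInstruction("some_compute_instruction", received_at=1),
    PendingInstruction("st.300.200 400 500 5", received_at=3, priority=1),
])
print([p.text for p in queue])
```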
In one possible implementation, as shown in fig. 2 a-2 e, the control module 11 may include a dependency processing sub-module 114.
The dependency relationship processing submodule 114 is configured to, when it is determined that a first to-be-executed instruction in the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction before the first to-be-executed instruction, cache the first to-be-executed instruction in the instruction storage submodule 111, and after the zeroth to-be-executed instruction is executed, extract the first to-be-executed instruction from the instruction storage submodule 111 and send the first to-be-executed instruction to the processing module 12.
The first to-be-executed instruction has an association relationship with the zeroth to-be-executed instruction preceding it when a first storage address interval storing the data required by the first to-be-executed instruction and a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction have an overlapping area. Conversely, the first to-be-executed instruction has no association relationship with the zeroth to-be-executed instruction when the first storage address interval and the zeroth storage address interval have no overlapping area.
In this way, according to the dependency relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction preceding it, the later first to-be-executed instruction is executed only after the earlier zeroth to-be-executed instruction has finished executing, which guarantees the accuracy of the result.
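The overlap test behind this dependency check can be sketched as follows. The sketch assumes that each instruction is summarized by one half-open address interval `(start, end)`; how the intervals are actually represented is not specified in the text.

```python
def intervals_overlap(first, zeroth):
    """True if two half-open address intervals (start, end) share any address."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]


def dependencies(intervals):
    """Map each instruction index to the earlier instructions it must wait for."""
    return {
        i: [j for j in range(i) if intervals_overlap(current, intervals[j])]
        for i, current in enumerate(intervals)
    }


# Instruction 2 touches addresses 403..409 and therefore overlaps instruction 0.
deps = dependencies([(400, 405), (500, 505), (403, 410)])
print(deps)
```

An instruction with a non-empty dependency list would be held back (cached) until every listed instruction has finished, and issued immediately otherwise.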
In one possible implementation, the instruction format of the vector data migration instruction may be:
vector dst src type.space1.space2 size
where vector is the operation code of the vector data migration instruction, and dst, src, type.space1.space2 and size are the operation domain of the vector data migration instruction. Here dst is the target address and src is the address of the vector data to be migrated; when there are multiple pieces of vector data to be migrated, src may include multiple addresses src0, src1, …, srcn of the vector data to be migrated, which is not limited by this disclosure. type.space1.space2 is the migration parameter: type represents the migration type, space1 represents the initial storage space where the vector data address src to be migrated is located, and space2 represents the target storage space where the target address dst is located. size is the vector data migration amount.
In one possible implementation, the instruction format of the vector data migration instruction may also be:
type.space1.space2 dst src size
where type.space1.space2 is the operation code of the vector data migration instruction, and dst, src and size are the operation domain of the vector data migration instruction. Here dst is the target address and src is the address of the vector data to be migrated; when there are multiple pieces of vector data to be migrated, src may include multiple addresses src0, src1, …, srcn of the vector data to be migrated, which is not limited by this disclosure. size is the vector data migration amount. In the operation code type.space1.space2, type represents the migration type, space1 represents the initial storage space where the vector data address src to be migrated is located, and space2 represents the target storage space where the target address dst is located.
Wherein, type can be ld, st, mv. The migration type denoted by ld is "the storage speed of the initial storage space is smaller than that of the target storage space". The migration type denoted by st is "the storage speed of the initial storage space is greater than that of the target storage space". The migration type denoted by mv is "the storage speed of the initial storage space equals the storage speed of the target storage space".
In one possible implementation, the instruction format of a vector data migration instruction whose migration type is "the storage speed of the initial storage space is less than the storage speed of the target storage space" may be set as: ld.space1.space2 dst src0 size. According to the vector data migration amount size, the initial storage space space1, the target storage space space2 and the migration type ld, vector data to be migrated whose data amount is the vector data migration amount size is obtained from the vector data address src0 to be migrated in the initial storage space space1 and is stored at the target address dst in the target storage space space2, where the storage speed of the initial storage space space1 is less than that of the target storage space space2.
In one possible implementation, the instruction format of a vector data migration instruction whose migration type is "the storage speed of the initial storage space is greater than the storage speed of the target storage space" may be set as: st.space1.space2 dst src0 size. According to the vector data migration amount size, the initial storage space space1, the target storage space space2 and the migration type st, vector data to be migrated whose data amount is the vector data migration amount size is obtained from the vector data address src0 to be migrated in the initial storage space space1 and is stored at the target address dst in the target storage space space2, where the storage speed of the initial storage space space1 is greater than that of the target storage space space2.
In one possible implementation, the instruction format of a vector data migration instruction whose migration type is "the storage speed of the initial storage space is equal to the storage speed of the target storage space" may be set as: mv.space1.space2 dst src0 size. According to the vector data migration amount size, the initial storage space space1, the target storage space space2 and the migration type mv, vector data to be migrated whose data amount is the vector data migration amount size is obtained from the vector data address src0 to be migrated in the initial storage space space1 and is stored at the target address dst in the target storage space space2, where the storage speed of the initial storage space space1 is equal to that of the target storage space space2.
It should be understood that those skilled in the art may set the operation code of the vector data migration instruction, as well as the positions of the operation code and the operation domain in the instruction format, as desired, and the disclosure is not limited thereto.
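The two instruction formats given above can be decoded with a short parser such as the sketch below. It assumes whitespace-separated fields, decimal addresses and a single source address; the `Migration` record and the function name are hypothetical.

```python
from typing import NamedTuple


class Migration(NamedTuple):
    mtype: str   # "ld", "st" or "mv"
    space1: str  # initial storage space
    space2: str  # target storage space
    dst: int     # target address
    src: int     # address of the vector data to be migrated
    size: int    # vector data migration amount


def parse_migration(text):
    """Parse 'vector dst src type.space1.space2 size'
    or 'type.space1.space2 dst src size'."""
    tokens = text.split()
    if tokens and tokens[0] == "vector":
        dst, src, params, size = tokens[1:]
    else:
        params, dst, src, size = tokens
    mtype, space1, space2 = params.split(".")
    if mtype not in ("ld", "st", "mv"):
        raise ValueError(f"unknown migration type: {mtype}")
    return Migration(mtype, space1, space2, int(dst), int(src), int(size))


print(parse_migration("ld.200.300 500 400 5"))
print(parse_migration("vector 500 400 ld.200.300 5"))
```

Both spellings decode to the same record, which reflects the point that only the position of the fields differs between the two formats.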
In one possible implementation manner, the apparatus may be disposed in one or more of a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), and an embedded Neural Network Processor (NPU).
It should be noted that, although the vector data migration instruction processing apparatus has been described above by taking the above-described embodiment as an example, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
Application example
An application example according to the embodiment of the present disclosure is given below in conjunction with "data migration using a vector data migration instruction processing apparatus" as an exemplary application scenario to facilitate understanding of a flow of the vector data migration instruction processing apparatus. It is to be understood by those skilled in the art that the following application examples are for the purpose of facilitating understanding of the embodiments of the present disclosure only and are not to be construed as limiting the embodiments of the present disclosure.
Fig. 3 is a schematic diagram illustrating an application scenario of a vector data migration instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the vector data migration instruction processing apparatus processes the vector data migration instruction as follows:
The control module 11 parses the obtained vector data migration instruction 1 (for example, the vector data migration instruction 1 is ld.200.300 500 400 5), and obtains the operation code and the operation domain of the vector data migration instruction 1. The operation code of the vector data migration instruction 1 is ld, the initial storage space is 200, the target storage space is 300, the target address is 500, the address of the vector data to be migrated is 400, and the vector data migration amount is 5. It can be determined from the operation code ld that the storage speed of the initial storage space 200 is less than that of the target storage space 300. The control module 11 obtains the vector data to be migrated with a data amount of 5 from the vector data address 400 to be migrated in the initial storage space 200. The processing module 12 stores the vector data to be migrated at the target address 500 in the target storage space 300 according to the migration parameters.
The vector data migration instruction 1 may be written as ld.200.300 500 400 5, as vector 500 400 ld.200.300 5, or the like; the processing procedures of the two are similar and will not be described again.
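The effect of this example instruction can be traced with a small simulation. It assumes that each storage space is a flat list indexed by address and that the migration amount counts elements; the sample values and names are invented for illustration.

```python
def migrate(spaces, space1, space2, dst, src, size):
    """Copy size elements from address src in space1 to address dst in space2."""
    source, target = spaces[space1], spaces[space2]
    if src + size > len(source) or dst + size > len(target):
        raise IndexError("migration would run past the end of a storage space")
    data = source[src:src + size]   # fetch step (control module)
    target[dst:dst + size] = data   # store step (processing module)
    return data


# Storage spaces 200 and 300 from the example, modeled as 1024-element lists.
spaces = {"200": [0] * 1024, "300": [0] * 1024}
spaces["200"][400:405] = [11, 12, 13, 14, 15]

# Fields of: ld.200.300 500 400 5
moved = migrate(spaces, "200", "300", dst=500, src=400, size=5)
print(moved, spaces["300"][500:505])
```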
The working process of the above modules can refer to the above related description.
Thus, the vector data migration instruction processing device can efficiently and quickly process the vector data migration instruction, and the processing efficiency and the processing speed for vector data migration are high.
The present disclosure provides a machine learning arithmetic device, which may include one or more of the above vector data migration instruction processing devices, and is configured to acquire vector data to be migrated and control information from other processing devices and to perform a specified machine learning operation. The machine learning arithmetic device can obtain the vector data migration instruction from other machine learning arithmetic devices or non-machine learning arithmetic devices, and transmit the execution result to peripheral equipment (also called other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces and servers. When more than one vector data migration instruction processing device is included, the vector data migration instruction processing devices can be linked and transmit data through a specific structure, for example, they can be interconnected and transmit data through a PCIE bus, so as to support larger-scale neural network operations. In this case, the devices may share the same control system or have separate control systems, and may share memory or each have its own memory. In addition, the interconnection mode can be any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
Fig. 4a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in fig. 4a, the combined processing device includes the machine learning arithmetic device, the universal interconnection interface, and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user.
Other processing devices include one or more of general purpose/special purpose processors such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), neural network processors, and the like. The number of processors included in the other processing devices is not limited. The other processing devices are used as interfaces of the machine learning arithmetic device and external data and control, and comprise data transportation to finish basic control of starting, stopping and the like of the machine learning arithmetic device; other processing devices may cooperate with the machine learning computing device to perform computing tasks.
And the universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and other processing devices. The machine learning arithmetic device acquires required input data from other processing devices and writes the input data into a storage device on the machine learning arithmetic device; control instructions can be obtained from other processing devices and written into a control cache on a machine learning arithmetic device chip; the data in the storage module of the machine learning arithmetic device can also be read and transmitted to other processing devices.
Fig. 4b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation manner, as shown in fig. 4b, the combined processing device may further include a storage device, and the storage device is connected to the machine learning operation device and the other processing device respectively. The storage device is used for storing data stored in the machine learning arithmetic device and the other processing device, and is particularly suitable for data which is required to be calculated and cannot be stored in the internal storage of the machine learning arithmetic device or the other processing device.
The combined processing device can be used as an SOC (system on chip) for equipment such as a mobile phone, a robot, an unmanned aerial vehicle or video monitoring equipment, which effectively reduces the core area of the control part, increases the processing speed, and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card or a Wi-Fi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning arithmetic device or combined processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in fig. 5, the board includes the above-mentioned machine learning chip package structure or the above-mentioned machine learning chip. The board may include, in addition to the machine learning chip 389, other kits including, but not limited to: memory device 390, interface device 391 and control device 392.
The memory device 390 is coupled to a machine learning chip 389 (or a machine learning chip within a machine learning chip package structure) via a bus for storing data. Memory device 390 may include multiple sets of memory cells 393. Each group of memory cells 393 is coupled to a machine learning chip 389 via a bus. It is understood that each group 393 may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse. DDR is twice as fast as standard SDRAM.
In one embodiment, memory device 390 may include 4 groups of memory cells 393. Each group of memory cells 393 may include a plurality of DDR4 particles (chips). In one embodiment, the machine learning chip 389 may include 4 72-bit DDR4 controllers therein, where 64bit is used for data transmission and 8bit is used for ECC check in the 72-bit DDR4 controller. It is appreciated that when DDR4-3200 particles are used in each group of memory cells 393, the theoretical bandwidth of data transfer may reach 25600 MB/s.
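The 25600 MB/s figure follows from the DDR4-3200 transfer rate (3200 million transfers per second) and the 64 data bits of the 72-bit controller. A short check of the arithmetic, with a hypothetical helper name, is:

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s, bus_bits, ecc_bits=0):
    """Theoretical data bandwidth in MB/s of one DDR channel.

    Only the non-ECC bits of the bus carry payload data.
    """
    data_bits = bus_bits - ecc_bits
    return mega_transfers_per_s * data_bits // 8


# DDR4-3200 on a 72-bit controller with 8 ECC bits: 3200 * 64 / 8
per_group = ddr_bandwidth_mb_per_s(3200, bus_bits=72, ecc_bits=8)
print(per_group)
```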
In one embodiment, each group 393 of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is provided in the machine learning chip 389 for controlling data transfer and data storage of each memory unit 393.
Interface device 391 is electrically coupled to machine learning chip 389 (or a machine learning chip within a machine learning chip package structure). The interface device 391 is used to implement data transmission between the machine learning chip 389 and an external device (e.g., a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface. For example, the data to be processed is transmitted by the server to the machine learning chip 389 through the standard PCIE interface, so as to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface; the disclosure does not limit the specific form of the other interface, as long as the interface device can implement the transfer function. In addition, the calculation result of the machine learning chip is likewise transmitted back to the external device (e.g., a server) by the interface device.
The control device 392 is electrically connected to the machine learning chip 389. The control device 392 is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). For example, the machine learning chip 389 may comprise a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may therefore drive a plurality of loads. The machine learning chip 389 can thus be in different operating states, such as multi-load and light-load states. The control device can regulate and control the working states of the plurality of processing chips, the plurality of processing cores and/or the plurality of processing circuits in the machine learning chip.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing apparatus, a computer device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camera, a camcorder, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship, and/or a road vehicle. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas cookers, and range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
FIG. 6 shows a flow diagram of a vector data migration instruction processing method according to an embodiment of the present disclosure. The method can be applied to computer equipment and the like comprising a memory and a processor, wherein the memory is used for storing data used in the process of executing the method; the processor is used for executing relevant processing and operation steps, such as the steps S51 and S52. As shown in fig. 6, the method is applied to the vector data migration instruction processing apparatus described above, and includes step S51 and step S52.
In step S51, the control module is used to analyze the obtained vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction, and obtain vector data to be migrated and a target address required for executing the vector data migration instruction according to the operation code and the operation domain, and determine migration parameters required for migration processing. The operation code is used for indicating that the vector data migration instruction performs migration processing on the vector data, the operation domain comprises a vector data address to be migrated and a target address, and the migration parameters comprise an initial storage space where the vector data address to be migrated is located, a target storage space where the target address is located and a migration type for migration processing.
In step S52, the processing module stores the vector data to be migrated into the target address according to the migration parameters.
In one possible implementation, the processing module includes a master processing sub-module and a plurality of slave processing sub-modules. Wherein, the step S52 may include:
processing the vector data to be migrated to obtain processed vector data to be migrated, and storing the processed vector data to be migrated into the target address.
In one possible implementation, the operation domain may also include a vector data migration amount. Obtaining the vector data to be migrated and the target address required for executing the vector data migration instruction according to the operation code and the operation domain may include:
determining the vector data migration amount according to the operation domain, and acquiring the vector data to be migrated corresponding to the vector data migration amount from the vector data address to be migrated.
In one possible implementation, the operation domain may also include migration parameters. Determining the migration parameters required for the migration processing may include: determining the migration parameters required for the migration processing according to the operation domain.
In one possible implementation, the operation code may also be used to indicate the migration parameters. Determining the migration parameters required for the migration processing may include: determining the migration parameters required for the migration processing according to the operation code.
In one possible implementation, the method further includes: storing the vector data to be migrated by using a storage module of the device,
wherein the memory module comprises at least one of a register and a cache,
the cache is used for storing data to be operated and vector data to be migrated, and comprises at least one neuron cache NRAM;
the register is used for storing scalar data in the data to be operated;
and the neuron cache is used for storing neuron data in the data to be operated, wherein the neuron data comprises neuron vector data.
In one possible implementation, step S51 may include:
storing the vector data migration instruction;
analyzing the vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction;
and storing an instruction queue, wherein the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise vector data migration instructions.
In one possible implementation, the method may further include:
when it is determined that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction preceding the first to-be-executed instruction, caching the first to-be-executed instruction, and after it is determined that the zeroth to-be-executed instruction has finished executing, controlling execution of the first to-be-executed instruction,
wherein the first to-be-executed instruction has an association relationship with the zeroth to-be-executed instruction preceding it when:
a first storage address interval storing the data required by the first to-be-executed instruction and a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction have an overlapping area.
It should be noted that, although the vector data migration instruction processing method is described above by taking the above-mentioned embodiment as an example, those skilled in the art can understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each step according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
The vector data migration instruction processing method provided by the embodiments of the present disclosure has a wide application range, and processes vector data migration instructions, and thus vector data migration itself, with high efficiency and high speed.
The present disclosure also provides a non-transitory computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the above vector data migration instruction processing method.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
It should be further noted that, although the steps in the flowchart of fig. 6 are shown in sequence as indicated by the arrows, the steps are not necessarily executed in sequence as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a portion of the steps in fig. 6 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternately with other steps or at least a portion of the sub-steps or stages of other steps.
It should be understood that the above-described apparatus embodiments are merely exemplary, and that the apparatus of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is only one logical function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented.
In addition, unless otherwise specified, each functional unit/module in the embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in hardware, the hardware may be digital circuits, analog circuits, etc. Physical implementations of hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the storage module may be any suitable magnetic storage medium or magneto-optical storage medium, such as Resistive Random Access Memory (RRAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Enhanced Dynamic Random Access Memory (EDRAM), High-Bandwidth Memory (HBM), Hybrid Memory Cube (HMC), and the like.
If the integrated units/modules are implemented in the form of software program modules and sold or used as a stand-alone product, they may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features are described, but any such combination should be considered within the scope of the present specification as long as the combined features do not contradict one another.
The foregoing may be better understood in light of the following clauses:
Clause A1, a vector data migration instruction processing apparatus, the apparatus comprising:
the control module is used for compiling the obtained vector data migration instruction to obtain a compiled vector data migration instruction, analyzing the compiled vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction, obtaining vector data to be migrated and a target address which are required for executing the vector data migration instruction according to the operation code and the operation domain, and determining migration parameters required for migration processing;
the processing module stores the vector data to be migrated into the target address according to the migration parameters,
the operation code is used for indicating that the processing of the vector data migration instruction on the vector data is migration processing, the operation domain comprises a vector data address to be migrated and the target address, and the migration parameters comprise an initial storage space where the vector data address to be migrated is located, a target storage space where the target address is located and a migration type of the migration processing.
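As a non-limiting illustration of the parsing step in clause A1, the following Python sketch shows one way a control module might split an instruction into an operation code and an operation domain. The textual form `VMOV target, source`, the mnemonic `VMOV`, and the names `MigrationInstruction` and `parse_instruction` are assumptions made for this example only; the present disclosure does not fix a concrete encoding.

```python
from dataclasses import dataclass

# Hypothetical mnemonic chosen for this sketch; not defined by the disclosure.
MIGRATION_OPCODE = "VMOV"


@dataclass(frozen=True)
class MigrationInstruction:
    opcode: str          # operation code: marks the instruction as a migration
    source_address: int  # operation domain: vector data address to be migrated
    target_address: int  # operation domain: target address


def parse_instruction(text: str) -> MigrationInstruction:
    """Split an instruction string into operation code and operation domain."""
    opcode, _, operand_text = text.strip().partition(" ")
    if opcode != MIGRATION_OPCODE:
        raise ValueError(f"not a vector data migration instruction: {opcode!r}")
    operands = [field.strip() for field in operand_text.split(",")]
    if len(operands) != 2:
        raise ValueError("expected a target address and a source address")
    # int(..., 0) accepts both decimal and 0x-prefixed hexadecimal literals.
    target_address, source_address = (int(field, 0) for field in operands)
    return MigrationInstruction(opcode, source_address, target_address)


instr = parse_instruction("VMOV 0x2000, 0x1000")
```

In this sketch the operation code only identifies the kind of processing, while the two address fields make up the operation domain, mirroring the split described in the clause.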
Clause A2, the apparatus of clause A1, the processing module comprising a master processing submodule and a plurality of slave processing submodules,
the master processing submodule is used for processing the vector data to be migrated to obtain processed vector data to be migrated, and storing the processed vector data to be migrated into the target address.
Clause A3, the apparatus of clause A1, the operation domain further comprising a vector data migration amount,
the control module is further configured to determine the vector data migration amount according to the operation domain, and acquire the vector data to be migrated corresponding to the vector data migration amount from the vector data address to be migrated.
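As a non-limiting illustration of clause A3, the sketch below models a storage space as a flat Python list and uses the vector data migration amount to decide how many elements are fetched from the vector data address to be migrated. The function names `fetch_vector` and `migrate`, and the list-based memory model, are assumptions for this example only.

```python
from typing import List


def fetch_vector(memory: List[int], source_address: int, migration_amount: int) -> List[int]:
    """Read `migration_amount` elements starting at `source_address`."""
    end = source_address + migration_amount
    if migration_amount < 0 or source_address < 0 or end > len(memory):
        raise IndexError("source range lies outside the storage space")
    return memory[source_address:end]  # slicing returns a copy


def migrate(memory: List[int], source_address: int, target_address: int,
            migration_amount: int) -> None:
    """Copy the fetched vector data to `target_address` in the same space."""
    data = fetch_vector(memory, source_address, migration_amount)
    end = target_address + migration_amount
    if target_address < 0 or end > len(memory):
        raise IndexError("target range lies outside the storage space")
    memory[target_address:end] = data


memory = list(range(8))          # [0, 1, 2, 3, 4, 5, 6, 7]
migrate(memory, source_address=0, target_address=4, migration_amount=3)
```

After the call, the three elements starting at address 0 have been written to addresses 4 to 6, and the element at address 7 is unchanged.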
Clause A4, the apparatus of clause A1, the operation domain further comprising the migration parameters,
the determining of the migration parameters required for the migration processing includes:
and determining migration parameters required by the migration processing according to the operation domain.
Clause A5, the apparatus of clause A1, the operation code being further used for indicating the migration parameters,
the determining of the migration parameters required for the migration processing includes:
and determining migration parameters required by the migration processing according to the operation codes.
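As a non-limiting illustration of the two alternatives in clauses A4 and A5, the following sketch resolves the migration parameters either from explicit fields of the operation domain or from a suffix of the operation code. The dotted opcode form `VMOV.NRAM.DRAM.FAST_TO_SLOW`, the field names, and the type label `FAST_TO_SLOW` are hypothetical and are used only to make the example concrete.

```python
from typing import Mapping, NamedTuple


class MigrationParams(NamedTuple):
    initial_space: str   # storage space holding the data to be migrated
    target_space: str    # storage space containing the target address
    migration_type: str  # label for the kind of migration


PARAM_KEYS = ("initial_space", "target_space", "migration_type")


def resolve_migration_params(opcode: str,
                             operation_domain: Mapping[str, str]) -> MigrationParams:
    """Take parameters from the operation domain if present, else from the opcode."""
    # Clause A4 style: the operation domain carries the parameters explicitly.
    if all(key in operation_domain for key in PARAM_KEYS):
        return MigrationParams(*(operation_domain[key] for key in PARAM_KEYS))
    # Clause A5 style: the opcode itself encodes them (hypothetical dotted form).
    parts = opcode.split(".")
    if len(parts) == 4:
        return MigrationParams(*parts[1:])
    raise ValueError("migration parameters not found in opcode or operation domain")


from_domain = resolve_migration_params(
    "VMOV",
    {"initial_space": "NRAM", "target_space": "DRAM", "migration_type": "FAST_TO_SLOW"},
)
from_opcode = resolve_migration_params("VMOV.NRAM.DRAM.FAST_TO_SLOW", {})
```

Both calls yield the same parameter tuple, which shows that the processing module can remain unaware of where the parameters were encoded.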
Clause A6, the apparatus of clause A1, further comprising:
a storage module for storing the vector data to be migrated,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing data to be operated and the vector data to be migrated, and comprises at least one neuron cache NRAM;
the register is used for storing scalar data in the data to be operated;
the neuron cache is used for storing neuron data in the data to be operated, wherein the neuron data comprises neuron vector data.
Clause A7, the apparatus of clause A1, the control module comprising:
the instruction storage submodule is used for storing the vector data migration instruction;
the instruction processing submodule is used for analyzing the vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction;
and the queue storage submodule is used for storing an instruction queue, the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the vector data migration instruction.
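As a non-limiting illustration of clause A7, the sketch below models the three submodules with plain Python containers: a list for instruction storage, a small parsing method for instruction processing, and a `collections.deque` holding the instructions in execution order. The class name `ControlModule`, its method names, and the mnemonics `VMOV` and `VADD` are assumptions for this example only.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

ParsedInstruction = Tuple[str, List[str]]  # (operation code, operation domain fields)


class ControlModule:
    def __init__(self) -> None:
        self.instruction_store: List[str] = []        # instruction storage submodule
        self.queue: Deque[ParsedInstruction] = deque()  # queue storage submodule

    @staticmethod
    def parse(text: str) -> ParsedInstruction:
        """Instruction processing submodule: split opcode from operation domain."""
        opcode, _, operand_text = text.strip().partition(" ")
        fields = [f.strip() for f in operand_text.split(",") if f.strip()]
        return opcode, fields

    def submit(self, text: str) -> None:
        """Store the raw instruction and append its parsed form in arrival order."""
        self.instruction_store.append(text)
        self.queue.append(self.parse(text))

    def next_instruction(self) -> Optional[ParsedInstruction]:
        """Return the next instruction to execute, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


control = ControlModule()
control.submit("VMOV 0x2000, 0x1000")
control.submit("VADD 0x3000, 0x2000")
first = control.next_instruction()
```

The queue here is strictly first-in, first-out; any reordering or stalling would be the job of a dependency check such as the one described in clause A8.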
Clause A8, the apparatus of clause A7, the control module further comprising:
the dependency relationship processing submodule is used for, upon determining that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction preceding the first to-be-executed instruction, caching the first to-be-executed instruction in the instruction storage submodule, and, after the zeroth to-be-executed instruction has been executed, extracting the first to-be-executed instruction from the instruction storage submodule and sending it to the processing module,
wherein the association relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction comprises:
and a first storage address interval for storing the data required by the first instruction to be executed and a zeroth storage address interval for storing the data required by the zeroth instruction to be executed have an overlapped area.
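As a non-limiting illustration of the association relationship in clause A8, the following sketch tests whether two storage address intervals overlap and uses the result to decide whether the first to-be-executed instruction may be issued. Representing an interval as a half-open pair `(start, end)` and the function names `intervals_overlap` and `may_issue` are assumptions for this example only.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open address interval [start, end)


def intervals_overlap(first: Interval, zeroth: Interval) -> bool:
    """Return True if the two address intervals share at least one address."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]


def may_issue(first: Interval, zeroth: Interval, zeroth_done: bool) -> bool:
    """The first instruction may issue if there is no overlap,
    or once the zeroth instruction has finished executing."""
    return zeroth_done or not intervals_overlap(first, zeroth)


zeroth_interval = (0x1000, 0x1100)
overlapping = (0x10F0, 0x1200)   # shares addresses 0x10F0..0x10FF with the zeroth
disjoint = (0x1100, 0x1200)      # starts exactly where the zeroth ends
```

With these values, an instruction using `overlapping` would be held back until the zeroth instruction completes, whereas one using `disjoint` could be issued immediately.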
Clause A9, a machine learning computing device, the device comprising:
one or more vector data migration instruction processing apparatuses according to any one of clauses A1 to A8, configured to obtain vector data to be migrated and control information from other processing devices, perform a specified machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
when the machine learning computing device comprises a plurality of the vector data migration instruction processing apparatuses, the plurality of vector data migration instruction processing apparatuses can be connected through a specific structure and transmit data;
wherein the plurality of vector data migration instruction processing apparatuses are interconnected through a Peripheral Component Interconnect Express (PCIe) bus and transmit data so as to support larger-scale machine learning operations; the plurality of vector data migration instruction processing apparatuses share the same control system or have their own respective control systems; the plurality of vector data migration instruction processing apparatuses share a memory or have their own respective memories; and the interconnection mode of the plurality of vector data migration instruction processing apparatuses is any interconnection topology.
Clause A10, a combined processing device, comprising:
the machine learning computing device of clause A9, a universal interconnect interface, and other processing devices;
the machine learning computing device interacts with the other processing devices to jointly complete a computing operation specified by the user,
wherein the combined processing device further comprises: a storage device connected to the machine learning computing device and the other processing devices, respectively, for storing data of the machine learning computing device and the other processing devices.
Clause A11, a machine learning chip, the machine learning chip comprising:
the machine learning computing device of clause A9 or the combined processing device of clause A10.
Clause A12, an electronic device, comprising:
the machine learning chip of clause A11.
Clause A13, a board card, the board card comprising: a storage device, an interface device, a control device, and the machine learning chip of clause A11;
wherein the machine learning chip is connected with the storage device, the control device and the interface device respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the machine learning chip and external equipment;
and the control device is used for monitoring the state of the machine learning chip.
Clause A14, a vector data migration instruction processing method, the method being applied to a vector data migration instruction processing apparatus, the apparatus including a control module and a processing module, the method including:
analyzing the obtained vector data migration instruction by using a control module to obtain an operation code and an operation domain of the vector data migration instruction, obtaining vector data to be migrated and a target address required by executing the vector data migration instruction according to the operation code and the operation domain, and determining migration parameters required by migration processing;
storing the vector data to be migrated into the target address by using a processing module according to the migration parameters,
the operation code is used for indicating that the processing of the vector data migration instruction on the vector data is migration processing, the operation domain comprises a vector data address to be migrated and the target address, and the migration parameters comprise an initial storage space where the vector data address to be migrated is located, a target storage space where the target address is located and a migration type of the migration processing.
Clause A15, the method of clause A14, the processing module comprising a master processing submodule and a plurality of slave processing submodules,
storing the vector data to be migrated into the target address according to the migration parameters, wherein the storing comprises:
processing the vector data to be migrated to obtain processed vector data to be migrated, and storing the processed vector data to be migrated into the target address.
Clause A16, the method of clause A14, the operation domain further comprising a vector data migration amount,
obtaining vector data to be migrated and a target address required for executing the vector data migration instruction according to the operation code and the operation domain, wherein the obtaining comprises:
determining the vector data migration amount according to the operation domain, and acquiring the vector data to be migrated corresponding to the vector data migration amount from the vector data address to be migrated.
Clause A17, the method of clause A14, the operation domain further comprising the migration parameters,
the determining of the migration parameters required for the migration processing includes:
determining the migration parameters required by the migration processing according to the operation domain.
Clause A18, the method of clause A14, the operation code being further used for indicating the migration parameters,
the determining of the migration parameters required for the migration processing includes:
determining the migration parameters required by the migration processing according to the operation code.
Clause A19, the method of clause A14, the method further comprising:
storing the vector data to be migrated using a storage module of the device,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing data to be operated and vector data to be migrated, and comprises at least one neuron cache NRAM;
the register is used for storing scalar data in the data to be operated;
the neuron cache is used for storing neuron data in the data to be operated, and the neuron data comprises neuron vector data.
Clause A20, the method of clause A14, wherein parsing the obtained vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction comprises:
storing the vector data migration instruction;
analyzing the vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction;
and storing an instruction queue, wherein the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the vector data migration instruction.
Clause A21, the method of clause A20, the method further comprising:
when determining that the first to-be-executed instruction in the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction before the first to-be-executed instruction, caching the first to-be-executed instruction, and after determining that the zeroth to-be-executed instruction is completely executed, controlling to execute the first to-be-executed instruction,
wherein the association relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction comprises:
and a first storage address interval for storing the data required by the first instruction to be executed and a zeroth storage address interval for storing the data required by the zeroth instruction to be executed have an overlapped area.
Clause A22, a non-transitory computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of clauses A14 to A21.
The embodiments of the present application have been described in detail above to illustrate the principles and implementations of the present application, and the description of the embodiments is provided only to help in understanding the method and core concept of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (22)

1. An apparatus for vector data migration instruction processing, the apparatus comprising:
the control module is used for compiling the obtained vector data migration instruction to obtain a compiled vector data migration instruction, analyzing the compiled vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction, obtaining vector data to be migrated and a target address which are required by executing the vector data migration instruction according to the operation code and the operation domain, and determining migration parameters required by migration processing;
the processing module stores the vector data to be migrated into the target address according to the migration parameters,
the operation code is used for indicating that the processing of the vector data migration instruction on the vector data is migration processing, the operation domain comprises a vector data address to be migrated and the target address, and the migration parameters comprise an initial storage space where the vector data address to be migrated is located, a target storage space where the target address is located and a migration type of the migration processing;
the migration type indicates the vector data storage speed of the initial storage space, the vector data storage speed of the target storage space, and the fast-slow relationship between the vector data storage speed of the initial storage space and the vector data storage speed of the target storage space.
2. The apparatus of claim 1, wherein the processing module comprises a master processing submodule and a plurality of slave processing submodules,
the master processing submodule is used for processing the vector data to be migrated to obtain processed vector data to be migrated, and storing the processed vector data to be migrated into the target address.
3. The apparatus of claim 1, wherein the operation domain further comprises a vector data migration amount,
the control module is further configured to determine the vector data migration amount according to the operation domain, and acquire the vector data to be migrated corresponding to the vector data migration amount from the vector data address to be migrated.
4. The apparatus of claim 1, wherein the operation domain further comprises the migration parameters,
the determining of the migration parameters required for the migration processing includes:
determining the migration parameters required by the migration processing according to the operation domain.
5. The apparatus of claim 1, wherein the operation code is further used for indicating the migration parameters,
the determining of the migration parameters required for the migration processing includes:
determining the migration parameters required by the migration processing according to the operation code.
6. The apparatus of claim 1, further comprising:
a storage module for storing the vector data to be migrated,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing data to be operated and vector data to be migrated, and comprises at least one neuron cache NRAM;
the register is used for storing scalar data in the data to be operated;
the neuron cache is used for storing neuron data in the data to be operated, wherein the neuron data comprises neuron vector data.
7. The apparatus of claim 1, wherein the control module comprises:
the instruction storage submodule is used for storing the vector data migration instruction;
the instruction processing submodule is used for analyzing the vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction;
and the queue storage submodule is used for storing an instruction queue, the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the vector data migration instruction.
8. The apparatus of claim 7, wherein the control module further comprises:
the dependency relationship processing submodule is used for, upon determining that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction preceding the first to-be-executed instruction, caching the first to-be-executed instruction in the instruction storage submodule, and, after the zeroth to-be-executed instruction has been executed, extracting the first to-be-executed instruction from the instruction storage submodule and sending it to the processing module,
wherein the association relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction comprises:
and a first storage address interval for storing the data required by the first instruction to be executed and a zeroth storage address interval for storing the data required by the zeroth instruction to be executed have an overlapped area.
9. A machine learning computing device, the device comprising:
one or more vector data migration instruction processing apparatuses according to any one of claims 1 to 8, configured to obtain vector data to be migrated and control information from other processing devices, perform a specified machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
when the machine learning computing device comprises a plurality of the vector data migration instruction processing apparatuses, the plurality of vector data migration instruction processing apparatuses can be connected through a specific structure and transmit data;
wherein the plurality of vector data migration instruction processing apparatuses are interconnected through a Peripheral Component Interconnect Express (PCIe) bus and transmit data so as to support larger-scale machine learning operations; the plurality of vector data migration instruction processing apparatuses share the same control system or have their own respective control systems; the plurality of vector data migration instruction processing apparatuses share a memory or have their own respective memories; and the interconnection mode of the plurality of vector data migration instruction processing apparatuses is any interconnection topology.
10. A combined processing device, characterized in that the combined processing device comprises:
the machine learning computing device of claim 9, a universal interconnect interface, and other processing devices;
the machine learning computing device interacts with the other processing devices to jointly complete a computing operation specified by the user,
wherein the combined processing device further comprises: a storage device connected to the machine learning computing device and the other processing devices, respectively, for storing data of the machine learning computing device and the other processing devices.
11. A machine learning chip, the machine learning chip comprising:
the machine learning computing device according to claim 9 or the combined processing device according to claim 10.
12. An electronic device, characterized in that the electronic device comprises:
the machine learning chip of claim 11.
13. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the machine learning chip according to claim 11;
wherein the machine learning chip is connected with the storage device, the control device and the interface device respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the machine learning chip and external equipment;
and the control device is used for monitoring the state of the machine learning chip.
14. A vector data migration instruction processing method is applied to a vector data migration instruction processing device, the device comprises a control module and a processing module, and the method comprises the following steps:
analyzing the obtained vector data migration instruction by using a control module to obtain an operation code and an operation domain of the vector data migration instruction, obtaining vector data to be migrated and a target address required by executing the vector data migration instruction according to the operation code and the operation domain, and determining migration parameters required by migration processing;
storing the vector data to be migrated into the target address by using a processing module according to the migration parameters,
the operation code is used for indicating that the processing of the vector data migration instruction on the vector data is migration processing, the operation domain comprises a vector data address to be migrated and the target address, and the migration parameters comprise an initial storage space where the vector data address to be migrated is located, a target storage space where the target address is located and a migration type of the migration processing;
the migration type indicates the vector data storage speed of the initial storage space, the vector data storage speed of the target storage space, and the fast-slow relationship between the vector data storage speed of the initial storage space and the vector data storage speed of the target storage space.
15. The method of claim 14, wherein the processing module comprises a master processing sub-module and a plurality of slave processing sub-modules,
storing the vector data to be migrated into the target address according to the migration parameters, wherein the storing comprises:
and processing the vector data to be migrated to obtain processed vector data to be migrated, and storing the processed vector data to be migrated into the target address.
16. The method of claim 14, wherein the operation domain further comprises a vector data migration amount,
obtaining vector data to be migrated and a target address required for executing the vector data migration instruction according to the operation code and the operation domain comprises:
determining the vector data migration amount according to the operation domain, and acquiring the vector data to be migrated corresponding to the vector data migration amount from the vector data address to be migrated.
17. The method of claim 14, wherein the operation domain further comprises the migration parameters,
the determining of the migration parameters required for the migration processing includes:
determining the migration parameters required by the migration processing according to the operation domain.
18. The method of claim 14, wherein the operation code is further used for indicating the migration parameters,
the determining of the migration parameters required for the migration processing includes:
and determining migration parameters required by the migration processing according to the operation codes.
19. The method of claim 14, further comprising:
storing the vector data to be migrated using a storage module of the device,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing data to be operated and the vector data to be migrated, and comprises at least one neuron cache NRAM;
the register is used for storing scalar data in the data to be operated;
the neuron cache is used for storing neuron data in the data to be operated, wherein the neuron data comprises neuron vector data.
20. The method of claim 14, wherein parsing the obtained vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction comprises:
storing the vector data migration instruction;
analyzing the vector data migration instruction to obtain an operation code and an operation domain of the vector data migration instruction;
and storing an instruction queue, wherein the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the vector data migration instruction.
21. The method of claim 20, further comprising:
when determining that the first to-be-executed instruction in the plurality of to-be-executed instructions is associated with a zeroth to-be-executed instruction before the first to-be-executed instruction, caching the first to-be-executed instruction, and after determining that the zeroth to-be-executed instruction is completely executed, controlling to execute the first to-be-executed instruction,
wherein the association relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction before the first to-be-executed instruction comprises:
and a first storage address interval for storing the data required by the first instruction to be executed and a zeroth storage address interval for storing the data required by the zeroth instruction to be executed have an overlapped area.
22. A non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any of claims 14 to 21.
CN201910636077.0A 2018-10-09 2019-07-15 Operation method, device, computer equipment and storage medium Active CN111338694B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/110167 WO2020073925A1 (en) 2018-10-09 2019-10-09 Operation method and apparatus, computer device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018115559558 2018-12-19
CN201811555955 2018-12-19

Publications (2)

Publication Number Publication Date
CN111338694A CN111338694A (en) 2020-06-26
CN111338694B true CN111338694B (en) 2022-05-31

Family

ID=71183287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910636077.0A Active CN111338694B (en) 2018-10-09 2019-07-15 Operation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111338694B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104603766A (en) * 2012-09-28 2015-05-06 英特尔公司 Accelerated interlane vector reduction instructions
CN104899181A (en) * 2014-03-07 2015-09-09 Arm有限公司 Data processing apparatus and method for processing vector operands
CN107315566A (en) * 2016-04-26 2017-11-03 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing vector circulant shift operation
CN107315568A (en) * 2016-04-26 2017-11-03 北京中科寒武纪科技有限公司 A kind of device for being used to perform vector logic computing
CN107341547A (en) * 2016-04-29 2017-11-10 北京中科寒武纪科技有限公司 A kind of apparatus and method for being used to perform convolutional neural networks training
CN107832804A (en) * 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 A kind of information processing method and Related product
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium

Also Published As

Publication number Publication date
CN111338694A (en) 2020-06-26

Similar Documents

Publication Publication Date Title
CN110096309B (en) Operation method, operation device, computer equipment and storage medium
CN110096310B (en) Operation method, operation device, computer equipment and storage medium
CN110119807B (en) Operation method, operation device, computer equipment and storage medium
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
CN111338694B (en) Operation method, device, computer equipment and storage medium
CN111047030A (en) Operation method, operation device, computer equipment and storage medium
CN111353124A (en) Operation method, operation device, computer equipment and storage medium
CN111061507A (en) Operation method, operation device, computer equipment and storage medium
CN111290789B (en) Operation method, operation device, computer equipment and storage medium
CN111026440B (en) Operation method, operation device, computer equipment and storage medium
CN111353595A (en) Operation method, device and related product
CN111339060B (en) Operation method, device, computer equipment and storage medium
CN111290788B (en) Operation method, operation device, computer equipment and storage medium
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN111353125B (en) Operation method, operation device, computer equipment and storage medium
CN111340202A (en) Operation method, device and related product
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN111275197B (en) Operation method, device, computer equipment and storage medium
CN112395009A (en) Operation method, operation device, computer equipment and storage medium
CN112396169B (en) Operation method, device, computer equipment and storage medium
CN111062483A (en) Operation method, operation device, computer equipment and storage medium
CN111325331B (en) Operation method, device and related product
CN112395007A (en) Operation method, operation device, computer equipment and storage medium
CN112395001A (en) Operation method, operation device, computer equipment and storage medium
CN111047028A (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant