CN111275197B - Operation method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN111275197B
Authority
CN
China
Prior art keywords
scalar
instruction
type conversion
data type
executed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910625443.2A
Other languages
Chinese (zh)
Other versions
CN111275197A (en)
Inventor
Name not published at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to PCT/CN2019/110167 (WO2020073925A1)
Publication of CN111275197A
Application granted
Publication of CN111275197B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
    • G06F9/30058 Conditional branch instructions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The present disclosure relates to an operation method, an apparatus, a computer device, and a storage medium. The combined processing device comprises a machine learning operation device, a universal interconnection interface, and other processing devices; the machine learning operation device interacts with the other processing devices to jointly complete a calculation operation specified by the user. The combined processing device further comprises a storage device connected to the machine learning operation device and to the other processing devices, respectively, for storing data of the machine learning operation device and the other processing devices. The operation method, apparatus, computer device, and storage medium provided by the embodiments of the disclosure have a wide application range, high operation processing efficiency, and high operation processing speed.

Description

Operation method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a scalar type conversion instruction operation method, apparatus, computer device, and storage medium.
Background
With the continuous development of technology, machine learning, and neural network algorithms in particular, is used more and more widely, and has been applied successfully in fields such as image recognition, speech recognition, and natural language processing. However, as neural network algorithms grow more complex, the types and number of data operations involved keep increasing. In the related art, scalar type conversion operations on scalar data are performed with low efficiency and at low speed.
Disclosure of Invention
In view of this, the present disclosure proposes an operation method, apparatus, computer device, and storage medium to improve the efficiency and speed of scalar type conversion operation on scalar data.
According to a first aspect of the present disclosure, there is provided a scalar type conversion instruction processing apparatus, the apparatus comprising:
a control module, configured to parse an obtained scalar type conversion instruction to obtain an operation code and an operation field of the scalar type conversion instruction, to obtain, according to the operation code and the operation field, a scalar to be operated on and a target address required for executing the scalar type conversion instruction, and to determine a target data type and an initial data type of the scalar to be operated on; and
an operation module, configured to perform a scalar type conversion operation on the scalar to be operated on, which is of the initial data type, according to the target data type to obtain an operation result, and to store the operation result in the target address, the data type of the operation result being the target data type,
wherein the operation code indicates that the operation the scalar type conversion instruction performs on data is a scalar type conversion operation, and the operation field comprises the address of the scalar to be operated on and the target address.
According to a second aspect of the present disclosure, there is provided a machine learning operation device, the device comprising:
one or more scalar type conversion instruction processing apparatuses according to the first aspect, configured to obtain the scalar to be operated on and control information from other processing devices, perform a specified machine learning operation, and transmit the execution result to the other processing devices through an I/O interface;
when the machine learning operation device comprises a plurality of scalar type conversion instruction processing apparatuses, the plurality of scalar type conversion instruction processing apparatuses can be connected to one another through a specific structure and transmit data;
wherein the plurality of scalar type conversion instruction processing apparatuses are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus so as to support larger-scale machine learning operations; the plurality of scalar type conversion instruction processing apparatuses share one control system or have their own control systems; the plurality of scalar type conversion instruction processing apparatuses share a memory or have their own memories; and the interconnection mode of the plurality of scalar type conversion instruction processing apparatuses is any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing device, the device comprising:
the machine learning operation device described in the second aspect, a universal interconnection interface, and other processing devices;
the machine learning operation device interacts with the other processing devices to jointly complete a calculation operation specified by the user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning arithmetic device described in the above second aspect or the combination processing device described in the above third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure including the machine learning chip of the fourth aspect described above.
According to a sixth aspect of the present disclosure, there is provided a board including the machine learning chip package structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device including the machine learning chip described in the fourth aspect or the board described in the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a scalar type conversion instruction processing method, the method being applied to a scalar type conversion instruction processing apparatus, the method comprising:
parsing an obtained scalar type conversion instruction to obtain an operation code and an operation field of the scalar type conversion instruction, obtaining, according to the operation code and the operation field, a scalar to be operated on and a target address required for executing the scalar type conversion instruction, and determining a target data type and an initial data type of the scalar to be operated on; and
performing a scalar type conversion operation on the scalar to be operated on, which is of the initial data type, according to the target data type to obtain an operation result, and storing the operation result in the target address, the data type of the operation result being the target data type,
wherein the operation code indicates that the operation the scalar type conversion instruction performs on data is a scalar type conversion operation, and the operation field comprises the address of the scalar to be operated on and the target address.
According to a ninth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the scalar type conversion instruction processing method described above.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle comprises an aircraft, a ship, and/or a road vehicle; the household appliances comprise televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.
The embodiments of the disclosure provide a scalar type conversion instruction processing method, apparatus, computer device, and storage medium. The apparatus comprises a control module and an operation module. The control module is configured to parse the obtained scalar type conversion instruction to obtain an operation code and an operation field of the instruction, to obtain, according to the operation code and the operation field, the scalar to be operated on and the target address required for executing the instruction, and to determine the target data type and the initial data type of the scalar to be operated on. The operation module is configured to perform a scalar type conversion operation on the scalar to be operated on, which is of the initial data type, according to the target data type to obtain an operation result, and to store the operation result in the target address. The scalar type conversion instruction processing method, apparatus, computer device, and storage medium have a wide application range and process scalar type conversion instructions with high efficiency and at high speed.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 illustrates a block diagram of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure.
Figs. 2a to 2f illustrate block diagrams of scalar type conversion instruction processing apparatuses according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram of an application scenario of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure.
Fig. 4a, 4b show block diagrams of a combined processing apparatus according to an embodiment of the disclosure.
Fig. 5 shows a schematic structural diagram of a board according to an embodiment of the present disclosure.
Fig. 6 illustrates a flow chart of a scalar type conversion instruction processing method according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the disclosure. All other embodiments obtained by those skilled in the art based on the embodiments in this disclosure without inventive effort fall within the scope of protection of the present disclosure.
It should be understood that the terms "zero," "first," "second," and the like in the claims, specification and drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of the present disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
With the wide use of neural network algorithms and the continuous improvement in the capability of computer hardware operators, the variety and number of data operations involved in practical applications keep increasing. Programming languages are numerous, and in the related art there is currently no scalar type conversion instruction that can be applied broadly across programming languages. To implement scalar type conversion in different language environments, technicians therefore have to customize one or more instructions for each programming language environment, which makes type conversion inefficient and slow. The present disclosure provides a scalar type conversion instruction processing method, apparatus, computer device, and storage medium that can implement scalar type conversion with only one instruction and can significantly improve the efficiency and speed of scalar type conversion.
Fig. 1 illustrates a block diagram of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 1, the apparatus includes a control module 11 and an operation module 12.
The control module 11 is configured to parse the obtained scalar type conversion instruction to obtain an operation code and an operation field of the scalar type conversion instruction, obtain a scalar to be operated on and a target address required for executing the scalar type conversion instruction according to the operation code and the operation field, and determine a target data type and an initial data type of the scalar to be operated on. The operation code indicates that the operation the scalar type conversion instruction performs on data is a scalar type conversion operation, and the operation field comprises the address of the scalar to be operated on and the target address.
The operation module 12 is configured to perform a scalar type conversion operation on the scalar to be operated on, which is of the initial data type, according to the target data type to obtain an operation result, and to store the operation result in the target address. The data type of the operation result is the target data type.
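By way of illustration only, the division of work between the control module and the operation module might be sketched as follows. The `Instruction` layout, the `cvt` opcode string, the dictionary used as memory, and the two supported type names are all hypothetical choices made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    opcode: str        # hypothetical: "cvt" marks a scalar type conversion
    scalar_addr: int   # address of the scalar to be operated on
    target_addr: int   # address where the result is stored
    initial_type: str  # hypothetical type names: "int" or "float"
    target_type: str


def control_module(instr, memory):
    """Parse the instruction, fetch the scalar, and resolve the data types."""
    if instr.opcode != "cvt":
        raise ValueError("not a scalar type conversion instruction")
    scalar = memory[instr.scalar_addr]
    return scalar, instr.target_addr, instr.initial_type, instr.target_type


def operation_module(scalar, target_addr, initial_type, target_type, memory):
    """Convert the scalar to the target type and store it at the target address."""
    converters = {("int", "float"): float, ("float", "int"): int}
    result = converters[(initial_type, target_type)](scalar)
    memory[target_addr] = result
    return result


memory = {0x10: 7}
instr = Instruction("cvt", 0x10, 0x20, "int", "float")
result = operation_module(*control_module(instr, memory), memory)
```

In this sketch the control module never touches the value itself; it only locates the operand and decides which conversion applies, which mirrors the separation described above.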
In this embodiment, the control module may obtain the scalar to be operated on from the address of the scalar to be operated on. The control module may obtain the scalar type conversion instruction and the scalar to be operated on through a data input/output unit, which may be one or more data I/O interfaces or I/O pins.
In this embodiment, the operation code may be the part of an instruction or field (usually represented by a code) specified in a computer program to perform an operation; it is an instruction sequence number that tells the apparatus executing the instruction which instruction is to be executed. The operation field may be the source of all data required to execute the corresponding instruction, including parameter data, the scalar to be operated on, the corresponding operation method, and so forth. A scalar type conversion instruction must include an operation code and an operation field, where the operation field includes at least the address of the scalar to be operated on and the target address.
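As a rough sketch of the split between operation code and operation field, a textual instruction could be parsed as shown below. The assembly-like syntax, the operand order (target address first, then scalar address), and the function name `parse_instruction` are assumptions made purely for illustration; the disclosure leaves the instruction format open.

```python
def parse_instruction(text):
    """Split a textual instruction into its opcode and operation field.

    Assumed (hypothetical) syntax: "<opcode> <target_addr>, <scalar_addr>[, extras...]"
    """
    opcode, _, rest = text.strip().partition(" ")
    operands = [tok.strip() for tok in rest.split(",") if tok.strip()]
    if len(operands) < 2:
        raise ValueError(
            "operation field needs at least a target address and a scalar address"
        )
    return {
        "opcode": opcode,
        "target_addr": int(operands[0], 0),   # base 0 accepts decimal or 0x-prefixed hex
        "scalar_addr": int(operands[1], 0),
        "extra": operands[2:],                # any further operation-field entries
    }


parsed = parse_instruction("cvt 0x20, 0x10")
```

The check on the operand count reflects the requirement that the operation field carry at least the two addresses.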
It should be appreciated that one skilled in the art may set the instruction format of the scalar type conversion instruction, as well as the opcode and operation fields involved, as desired, and this disclosure is not limited in this regard.
In this embodiment, the apparatus may include one or more control modules and one or more operation modules, and the number of control modules and operation modules may be set according to actual needs, which is not limited in this disclosure. When the apparatus includes one control module, the control module may receive the scalar type conversion instruction and control one or more operation modules to perform the scalar type conversion operation. When the apparatus includes a plurality of control modules, each control module may receive a scalar type conversion instruction and control its corresponding operation module or operation modules to perform the scalar type conversion operation.
The scalar type conversion instruction processing apparatus provided by the embodiments of the disclosure comprises a control module and an operation module. The control module is configured to parse the obtained scalar type conversion instruction to obtain an operation code and an operation field of the instruction, to obtain, according to the operation code and the operation field, the scalar to be operated on and the target address required for executing the instruction, and to determine the target data type and the initial data type of the scalar to be operated on. The operation module is configured to perform a scalar type conversion operation on the scalar to be operated on, which is of the initial data type, according to the target data type to obtain an operation result, and to store the operation result in the target address. The scalar type conversion instruction processing apparatus provided by the embodiments of the disclosure has a wide application range and processes scalar type conversion instructions with high efficiency and at high speed.
Fig. 2a shows a block diagram of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in FIG. 2a, the operation module 12 may include a plurality of scalar operators 120 for performing scalar type conversion operations.
In this implementation, the operation module may also include a single scalar operator. The number of scalar operators may be set according to the amount of data on which the scalar type conversion operation is to be performed, the required processing speed and efficiency of the scalar type conversion operation, and the like, which is not limited by the present disclosure.
Fig. 2b shows a block diagram of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2b, the operation module 12 may include a master operation sub-module 121 and a plurality of slave operation sub-modules 122. The master operation sub-module 121 may include a plurality of scalar operators 120 (not shown in the figure).
The master operation sub-module 121 is configured to perform the scalar type conversion operation by using the scalar operators 120, obtain the operation result, and store the operation result in the target address.
In one possible implementation, the control module 11 is further configured to parse an obtained calculation instruction to obtain an operation field and an operation code of the calculation instruction, and to obtain, according to the operation field and the operation code, the data to be operated on that is required for executing the calculation instruction. The operation module 12 is further configured to operate on the data to be operated on according to the calculation instruction, so as to obtain a calculation result of the calculation instruction. The operation module may include a plurality of operators for performing operations corresponding to the operation type of the calculation instruction.
In this implementation manner, the calculation instruction may be other instructions for performing arithmetic operations, logical operations, and other operations on data such as scalar, vector, matrix, tensor, etc., and those skilled in the art may set the calculation instruction according to actual needs, which is not limited in this disclosure.
In this implementation, the operators may include operators capable of performing arithmetic operations, logical operations, and the like on data, such as adders, dividers, multipliers, and comparators. The type and number of operators may be set according to the amount of data to be operated on, the operation type, the required processing speed and efficiency of the operation, and the like, and the present disclosure is not limited thereto.
In one possible implementation, the control module 11 is further configured to parse the calculation instruction to obtain a plurality of operation instructions, and to send the data to be operated on and the plurality of operation instructions to the master operation sub-module 121.
The master operation sub-module 121 is configured to perform preliminary processing on the data to be operated on, and to exchange data and operation instructions with the plurality of slave operation sub-modules 122.
The slave operation sub-modules 122 are configured to perform intermediate operations in parallel according to the data and the operation instructions transmitted from the master operation sub-module 121 to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the master operation sub-module 121.
The master operation sub-module 121 is further configured to perform subsequent processing on the plurality of intermediate results to obtain the calculation result of the calculation instruction, and to store the calculation result in a corresponding address.
In this implementation, when the calculation instruction operates on scalar or vector data, the apparatus may control the master operation sub-module to perform the operation corresponding to the calculation instruction using the operators in the master operation sub-module. When the calculation instruction operates on data whose dimension is greater than or equal to 2, such as a matrix or a tensor, the apparatus may control the slave operation sub-modules to perform the operation corresponding to the calculation instruction using the operators in the slave operation sub-modules.
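The dispatch just described can be pictured with a minimal sketch, given here only as an assumption-laden illustration: a thread pool stands in for the slave sub-modules, a summation stands in for the calculation instruction, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def slave_intermediate(row):
    """Intermediate operation run by one slave sub-module (here: a row sum)."""
    return sum(row)


def run_calculation(data, num_slaves=4):
    """Sum all elements; vectors stay on the master, matrices go to the slaves."""
    is_matrix = bool(data) and isinstance(data[0], list)
    if not is_matrix:
        # Scalar or vector data: handled by the master's own operators.
        return sum(data)
    # Data with two dimensions: rows are handed to the slaves, which work in parallel.
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        intermediate_results = list(pool.map(slave_intermediate, data))
    # Subsequent processing on the master: combine the intermediate results.
    return sum(intermediate_results)


vector_total = run_calculation([1, 2, 3])
matrix_total = run_calculation([[1, 2], [3, 4], [5, 6]])
```

The point of the sketch is the control flow (master-only path versus master, parallel slaves, then master again), not the particular arithmetic.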
It should be noted that, a person skilled in the art may set the connection manner between the master operator module and the plurality of slave operator modules according to actual needs, so as to implement the architecture setting of the operation module, for example, the architecture of the operation module may be an "H" type architecture, an array type architecture, a tree type architecture, etc., which is not limited in this disclosure.
Fig. 2c shows a block diagram of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2c, the operation module 12 may further include one or more branch operation sub-modules 123, where the branch operation sub-modules 123 are configured to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122. The master operation sub-module 121 is connected to the one or more branch operation sub-modules 123. In this way, the master operation sub-module, the branch operation sub-modules, and the slave operation sub-modules in the operation module are connected in an "H"-type architecture, and data and/or operation instructions are forwarded through the branch operation sub-modules, which reduces the resource usage of the master operation sub-module and further improves the instruction processing speed.
Fig. 2d shows a block diagram of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in FIG. 2d, a plurality of slave operator modules 122 are distributed in an array.
Each slave operation sub-module 122 is connected to the other adjacent slave operation sub-modules 122, and the master operation sub-module 121 is connected to k slave operation sub-modules 122 among the plurality of slave operation sub-modules 122, where the k slave operation sub-modules 122 are: the n slave operation sub-modules 122 of row 1, the n slave operation sub-modules 122 of row m, and the m slave operation sub-modules 122 of column 1.
As shown in fig. 2d, the k slave operation sub-modules include only the n slave operation sub-modules in row 1, the n slave operation sub-modules in row m, and the m slave operation sub-modules in column 1; that is, the k slave operation sub-modules are those slave operation sub-modules, among the plurality of slave operation sub-modules, that are directly connected to the master operation sub-module. The k slave operation sub-modules are used for forwarding data and instructions between the master operation sub-module and the plurality of slave operation sub-modules. Distributing the plurality of slave operation sub-modules in an array in this way can increase the speed at which the master operation sub-module sends data and/or operation instructions to the slave operation sub-modules, and thereby further improve the instruction processing speed.
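To make the array layout concrete, the positions of the directly connected slave sub-modules in an m-row, n-column array can be enumerated as in the sketch below. The 1-based (row, column) coordinates and the function name are assumptions of this sketch; because a set is used, the two corner modules shared by column 1 and rows 1 and m appear only once, which is a modelling choice here rather than a statement about how the disclosure counts k.

```python
def directly_connected(m, n):
    """Return 1-based (row, col) positions of slaves wired to the master.

    Covers row 1, row m, and column 1 of an m x n array of slave sub-modules.
    """
    positions = set()
    for col in range(1, n + 1):
        positions.add((1, col))   # row 1
        positions.add((m, col))   # row m
    for row in range(1, m + 1):
        positions.add((row, 1))   # column 1
    return positions


connected = directly_connected(3, 4)
```

Any position outside this set, such as (2, 2) in a 3 x 4 array, would receive its data through neighbouring slave sub-modules rather than from the master directly.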
Fig. 2e shows a block diagram of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2e, the operation module may further include a tree submodule 124. The tree submodule 124 includes a root port 401 and a plurality of branch ports 402. The root port 401 is connected to the master operator module 121, and the plurality of branch ports 402 are connected to the plurality of slave operator modules 122, respectively. The tree submodule 124 has a transceiving function and is used for forwarding data and/or operation instructions between the master operation submodule 121 and the slave operation submodule 122. Therefore, the operation module is connected in a tree-shaped structure through the action of the tree-shaped sub-module, and the forwarding function of the tree-shaped sub-module is utilized, so that the speed of transmitting data and/or operation instructions to the slave operation sub-module by the master operation sub-module can be improved, and the processing speed of the instructions is further improved.
In one possible implementation, the tree submodule 124 may be an optional structure of the apparatus and may include at least one layer of nodes. Each node is a line structure with a forwarding function and has no operation function of its own. The lowest-layer nodes are connected to the slave operation sub-modules to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122. In particular, if the tree submodule has zero layers of nodes, the apparatus does not require the tree submodule.
In one possible implementation, tree submodule 124 may include a plurality of nodes of an n-ary tree structure, which may have a plurality of layers.
For example, fig. 2f shows a block diagram of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 2f, the n-ary tree structure may be a binary tree structure, and the tree submodule includes two layers of nodes 01. The lowest-layer nodes 01 are connected to the slave operation sub-modules 122 to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122.
In this implementation, the n-ary tree structure may also be a ternary tree structure or the like, where n is a positive integer greater than or equal to 2. The value of n and the number of layers of nodes in the n-ary tree structure can be set as desired by those skilled in the art, and this disclosure is not limited in this regard.
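The forwarding-only role of the tree nodes can be sketched as follows. The nested-list representation (a list is a forwarding node, a string is a slave sub-module identifier), the identifiers `s0` to `s3`, and the way the nesting maps onto the layers of the figure are all assumptions made for this illustration.

```python
def forward(node, message, delivered=None):
    """Forward a message from the root port down to every slave sub-module.

    A list models a forwarding node (no computation, it only passes data on);
    a string models a slave sub-module identifier.
    """
    if delivered is None:
        delivered = {}
    if isinstance(node, list):
        for child in node:
            forward(child, message, delivered)
    else:
        delivered[node] = message
    return delivered


# Hypothetical binary tree: each forwarding node has two children.
binary_tree = [["s0", "s1"], ["s2", "s3"]]
received = forward(binary_tree, "operation instruction")
```

The message reaches every slave unchanged, consistent with nodes that forward but do not operate on the data.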
In one possible implementation, the operation field may also include the initial data type and the target data type. The control module 11 is further configured to determine the target data type and the initial data type of the scalar to be operated on according to the operation field.
In one possible implementation, the opcode may also be used to indicate an initial data type and a target data type. The control module 11 is further configured to determine the target data type and the initial data type of the scalar to be operated on according to the operation code.
In one possible implementation, when the initial data type and/or the target data type cannot be determined according to the operation code or the operation field, the initial data type and/or the target data type may be determined according to a default initial data type and a default target data type that are set in advance. The default initial data type set in advance may be determined as the initial data type of the current scalar type conversion instruction, and the default target data type set in advance may be determined as the target data type of the current scalar type conversion instruction. The manner of determining the target data type and the initial data type may be set by those skilled in the art according to actual needs, and the present disclosure is not limited thereto.
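The fallback described above can be sketched as follows. This is only an illustration: the default values, the function names, and the choice to consult the operation code before the operation domain are all assumptions, since the disclosure leaves these details to the implementer. The type identifiers are borrowed from the examples given elsewhere in this description.

```python
from typing import Optional

# Hypothetical defaults; the description leaves their values to the implementer.
DEFAULT_INITIAL_TYPE = "u32"
DEFAULT_TARGET_TYPE = "cvtf16"


def resolve_type(from_opcode: Optional[str], from_field: Optional[str], default: str) -> str:
    """Return the first type that could be determined, else the preset default."""
    for candidate in (from_opcode, from_field):
        if candidate is not None:
            return candidate
    return default


def resolve_types(opcode_types=(None, None), field_types=(None, None)):
    """Resolve (target, initial) independently, so either one may fall back alone."""
    target = resolve_type(opcode_types[0], field_types[0], DEFAULT_TARGET_TYPE)
    initial = resolve_type(opcode_types[1], field_types[1], DEFAULT_INITIAL_TYPE)
    return target, initial
```

Resolving the two types separately mirrors the "and/or" wording: an instruction that names only its target data type still receives the default initial data type.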
In one possible implementation, the target data type may include any one of a 16-bit floating point number, a 32-bit floating point number, a 48-bit floating point number, a 16-bit integer, a 32-bit integer, and a 48-bit integer, and the initial data type may include any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number, and a pointer data type.
In this implementation, the target data type and the initial data type may also be other data types such as a 64-bit integer, and those skilled in the art may set the target data type and the initial data type according to actual needs, as long as the target data type differs from the initial data type, which is not limited by the present disclosure.
In this implementation, identifiers (or codes) such as numbers or names may be set for the target data types and initial data types described above, so that the target data type and the initial data type indicated by a scalar type conversion instruction can be determined from the identifiers (or codes) in the instruction. For example, the identifier of a 16-bit floating point number may be set to cvtf16, the identifier of a 32-bit floating point number to cvtf32, the identifier of a 48-bit floating point number to cvtf48, the identifier of a 16-bit integer to cvti16, the identifier of a 32-bit integer to cvti32, and the identifier of a 48-bit integer to cvti48. The identifier of the 16-bit signed number may be set to s16, the identifier of the 32-bit signed number to s32, the identifier of the 48-bit signed number to s48, the identifier of the 16-bit unsigned number to u16, the identifier of the 32-bit unsigned number to u32, the identifier of the 48-bit unsigned number to u48, and the identifier of the pointer data type to ptr. The identifiers of the target data type and the initial data type may be set by those skilled in the art according to actual needs, and the present disclosure is not limited thereto.
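As an illustration, the example identifiers above can be held in two lookup tables. The table layout (a kind and a bit width per identifier) and the helper `lookup` are hypothetical; only the identifier strings come from this description.

```python
# Identifier strings are taken from the example above; the tuple layout is assumed.
TARGET_TYPES = {
    "cvtf16": ("floating point", 16),
    "cvtf32": ("floating point", 32),
    "cvtf48": ("floating point", 48),
    "cvti16": ("integer", 16),
    "cvti32": ("integer", 32),
    "cvti48": ("integer", 48),
}

INITIAL_TYPES = {
    "s16": ("signed", 16),
    "s32": ("signed", 32),
    "s48": ("signed", 48),
    "u16": ("unsigned", 16),
    "u32": ("unsigned", 32),
    "u48": ("unsigned", 48),
    "ptr": ("pointer", None),   # no bit width is stated for the pointer type
}


def lookup(identifier: str, table: dict) -> tuple:
    """Map an identifier found in an instruction to its (kind, bit width) entry."""
    try:
        return table[identifier]
    except KeyError:
        raise ValueError(f"unknown data type identifier: {identifier!r}") from None
```

Keeping target and initial identifiers in separate tables lets a decoder reject an instruction that places, say, `u32` where a target data type is expected.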
In one possible implementation, as shown in fig. 2 a-2 f, the apparatus may further comprise a storage module 13. The storage module 13 is used for storing the scalar to be operated on.
In this implementation, the storage module may include one or more of a cache and a register. The cache may include a scratchpad cache and may also include at least one NRAM (Neuron Random Access Memory). The cache may be used to store the data to be operated on, and the register may be used to store the scalar to be operated on.
In one possible implementation, the cache may comprise a neuron cache. The neuron cache, that is, the neuron random access memory mentioned above, may be used to store neuron data in the data to be operated on, and the neuron data may include neuron vector data. The data to be operated on includes data related to performing scalar type conversion and/or data related to the operations of other calculation instructions.
In one possible implementation, the apparatus may further include a direct memory access module for reading data from or storing data to the storage module.
In one possible implementation, as shown in fig. 2 a-2 f, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113.
Instruction storage sub-module 111 is used to store scalar type conversion instructions.
The instruction processing sub-module 112 is configured to parse the scalar type conversion instruction to obtain an operation code and an operation field of the scalar type conversion instruction.
The queue storage submodule 113 is configured to store an instruction queue, where the instruction queue includes a plurality of instructions to be executed, which may include scalar type conversion instructions, sequentially arranged according to an execution order.
In this implementation, the instructions to be executed may further include calculation instructions that have a certain correlation or are not correlated with the scalar type conversion instruction, and those skilled in the art may set the instructions according to actual needs, which is not limited in this disclosure. The instruction queue may be obtained by arranging the execution sequences of the plurality of instructions to be executed according to the receiving time, the priority level, and the like of the instructions to be executed, so that the plurality of instructions to be executed are sequentially executed according to the instruction queue.
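One way to arrange such a queue is sketched below. The field names, the convention that a larger priority value runs earlier, and the use of receiving time as the tie-breaker are assumptions made for illustration; the disclosure does not fix how receiving time and priority are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingInstruction:
    text: str            # instruction text, e.g. a scalar type conversion instruction
    received_at: int     # hypothetical arrival timestamp
    priority: int = 0    # hypothetical priority; larger values run earlier


def build_queue(pending):
    """Order by priority first, then by receiving time (earlier first)."""
    return sorted(pending, key=lambda ins: (-ins.priority, ins.received_at))
```

Because `sorted` is stable, instructions with equal priority and equal receiving time keep the order in which they were submitted.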
In one possible implementation, as shown in fig. 2 a-2 f, the control module 11 may include a dependency processing sub-module 114.
The dependency relationship processing sub-module 114 is configured to cache a first to-be-executed instruction in the instruction storage sub-module 111 when determining that there is an association relationship between the first to-be-executed instruction in the plurality of to-be-executed instructions and a zeroth to-be-executed instruction before the first to-be-executed instruction, and extract the first to-be-executed instruction from the instruction storage sub-module 111 and send the first to-be-executed instruction to the operation module 12 after the execution of the zeroth to-be-executed instruction is completed. The first to-be-executed instruction and the zeroth to-be-executed instruction are instructions in the plurality of to-be-executed instructions.
The association relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction before the first to-be-executed instruction includes: the first storage address interval for storing the data required by the first instruction to be executed and the zeroth storage address interval for storing the data required by the zeroth instruction to be executed have overlapping areas. Otherwise, the first to-be-executed instruction and the zeroth to-be-executed instruction have no association relationship, and the first storage address interval and the zeroth storage address interval have no overlapping area.
In this way, based on the dependency relationships among the instructions to be executed, a later instruction is executed only after the earlier instruction it depends on has finished, which ensures the accuracy of the operation results.
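The association check reduces to an interval-overlap test on storage addresses. The sketch below assumes half-open address intervals `(start, end)` and uses hypothetical names; it reports, for each instruction in a queue, which earlier instructions it has to wait for.

```python
def intervals_overlap(a, b) -> bool:
    """True if two half-open address intervals (start, end) share any address."""
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start < b_end and b_start < a_end


def blocking_predecessors(queue):
    """For each (name, interval) entry, list the earlier entries it depends on."""
    waits = {}
    for index, (name, interval) in enumerate(queue):
        waits[name] = [
            prev_name
            for prev_name, prev_interval in queue[:index]
            if intervals_overlap(interval, prev_interval)
        ]
    return waits
```

An instruction with an empty list can be sent to the operation module immediately, while one with a non-empty list stays cached until the listed instructions have completed.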
In one possible implementation, the instruction format of the scalar type conversion instruction may be:
scalar dst src0 opcode.type
where scalar is the opcode of the scalar type conversion instruction, and dst, src0, and opcode.type are the operation fields of the scalar type conversion instruction. dst is the target address, and src0 is the address of the scalar to be operated on. In opcode.type, opcode is the target data type and type is the initial data type of the scalar to be operated on.
In one possible implementation, the instruction format of the scalar type conversion instruction may also be:
opcode.scalar.type dst src0
where opcode.scalar.type is the opcode of the scalar type conversion instruction, and dst and src0 are the operation fields of the scalar type conversion instruction. In opcode.scalar.type, opcode indicates the target data type, type indicates the initial data type of the scalar to be operated on, and scalar indicates that the instruction is a scalar type conversion instruction. dst is the target address, and src0 is the address of the scalar to be operated on.
It should be appreciated that one skilled in the art may set the opcode of a scalar type conversion instruction, the location of the opcode and the operation field in the instruction format, as desired, and this disclosure is not limited thereto.
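A parser for the two instruction formats might look like the following sketch. It assumes whitespace-separated tokens, decimal addresses, and the literal keyword `scalar`; the class name `ScalarConvert`, the function `parse`, and the sample instruction in the second format are hypothetical.

```python
from typing import NamedTuple


class ScalarConvert(NamedTuple):
    dst: int            # target address
    src0: int           # address of the scalar to be operated on
    target_type: str    # e.g. "cvtf16"
    initial_type: str   # e.g. "u32"


def parse(text: str) -> ScalarConvert:
    """Parse either 'scalar dst src0 opcode.type' or 'opcode.scalar.type dst src0'."""
    tokens = text.split()
    if len(tokens) == 4 and tokens[0] == "scalar":
        _, dst, src0, types = tokens
        target, initial = types.split(".")
    elif len(tokens) == 3 and tokens[0].count(".") == 2:
        head, dst, src0 = tokens
        target, marker, initial = head.split(".")
        if marker != "scalar":
            raise ValueError(f"not a scalar type conversion instruction: {text!r}")
    else:
        raise ValueError(f"unrecognized instruction format: {text!r}")
    return ScalarConvert(int(dst), int(src0), target, initial)
```

Both formats decode to the same record, so later stages do not need to know which layout the instruction used.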
In one possible implementation, the apparatus may be disposed in one or more of a graphics processor (Graphics Processing Unit, GPU for short), a central processor (Central Processing Unit, CPU for short), and an embedded Neural network processor (Neural-network Processing Unit, NPU for short).
It should be noted that, although the scalar type conversion instruction processing apparatus is described above by way of example in the above embodiments, those skilled in the art will appreciate that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or actual application scene, so long as the technical scheme of the disclosure is met.
Application example
An application example according to an embodiment of the present disclosure is given below in conjunction with "scalar type conversion operation with scalar type conversion instruction processing apparatus" as one exemplary application scenario, in order to facilitate understanding of the flow of scalar type conversion instruction processing apparatus. It will be appreciated by those skilled in the art that the following application examples are for purposes of facilitating understanding of the embodiments of the present disclosure only and should not be construed as limiting the embodiments of the present disclosure.
Fig. 3 shows a schematic diagram of an application scenario of a scalar type conversion instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the scalar type conversion instruction processing apparatus processes a scalar type conversion instruction as follows:
The control module 11 parses the obtained scalar type conversion instruction 1 (for example, the scalar type conversion instruction 1 is scalar 500 100 cvtf16.u32) to obtain the operation code and the operation domain of the scalar type conversion instruction 1. The operation code of the scalar type conversion instruction 1 is scalar, the target address is 500, the address of the scalar to be operated on is 100, the target data type is cvtf16 (i.e. a 16-bit floating point number), and the initial data type of the scalar to be operated on is u32 (i.e. a 32-bit unsigned number). The control module 11 acquires the scalar to be operated on from the address 100.
The operation module 12 performs the scalar type conversion operation on the scalar to be operated on, which has the initial data type, according to the target data type (i.e. it converts the 32-bit unsigned scalar to be operated on into a 16-bit floating point number) to obtain an operation result, and stores the operation result at the target address 500.
For the operation of the above modules, refer to the relevant description above.
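The application example can be reproduced in software with a small sketch. It assumes that the 16-bit floating point format is IEEE 754 binary16, that values too large for that format become positive infinity, and that memory can be modelled as a dictionary from address to value; none of these details is stated in the disclosure, and the function names are hypothetical.

```python
import struct


def u32_to_f16_bits(value: int) -> int:
    """Convert a 32-bit unsigned integer to the bit pattern of an IEEE 754 binary16."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a 32-bit unsigned number")
    try:
        packed = struct.pack("<e", float(value))
    except (OverflowError, struct.error):
        # Assumed policy: values beyond the binary16 range become +infinity.
        packed = struct.pack("<e", float("inf"))
    return struct.unpack("<H", packed)[0]


def execute(memory: dict, dst: int, src0: int) -> None:
    """Read the u32 scalar at src0, convert it, and store the result at dst."""
    memory[dst] = u32_to_f16_bits(memory[src0])


memory = {100: 3}                   # the scalar to be operated on is stored at address 100
execute(memory, dst=500, src0=100)  # target address 500, as in the example
```

Binary16 has an 11-bit significand, so not every integer above 2048 is exactly representable; a real device would also need a defined rounding mode, which this sketch inherits from `struct`.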
Thus, the scalar type conversion instruction processing device can efficiently and rapidly process scalar type conversion instructions, and has high processing efficiency and high processing speed for scalar type conversion.
The present disclosure provides a machine learning operation device that may include one or more of the scalar type conversion instruction processing devices described above, and that is used for acquiring the scalar to be operated on and control information from other processing devices and performing a specified machine learning operation. The machine learning operation device may obtain scalar type conversion instructions from other machine learning operation devices or non-machine-learning operation devices, and transfer execution results to peripheral devices (also referred to as other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one scalar type conversion instruction processing device is included, the scalar type conversion instruction processing devices may be linked and transmit data through a specific structure, for example, interconnected through a PCIE bus, so as to support operations of a larger-scale neural network. In this case, the devices may share the same control system or have independent control systems, and may share memory or each have their own memory. In addition, the interconnection may use any interconnection topology.
The machine learning operation device has higher compatibility and can be connected with various types of servers through PCIE interfaces.
Fig. 4a shows a block diagram of a combined processing apparatus according to an embodiment of the disclosure. As shown in fig. 4a, the combined processing device includes the machine learning computing device, the universal interconnect interface, and other processing devices. The machine learning operation device interacts with other processing devices to jointly complete the operation designated by the user.
The other processing devices may include one or more types of general-purpose/special-purpose processors, such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, including data transfer, and perform basic control of the machine learning operation device such as starting and stopping; the other processing devices may also cooperate with the machine learning operation device to complete an operation task.
The universal interconnection interface is used for transmitting data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device acquires the required input data from the other processing devices and writes it into the on-chip storage of the machine learning operation device; it may obtain control instructions from the other processing devices and write them into the on-chip control cache of the machine learning operation device; and it may read the data in the storage module of the machine learning operation device and transmit it to the other processing devices.
Fig. 4b shows a block diagram of a combined processing apparatus according to an embodiment of the disclosure. In a possible implementation, as shown in fig. 4b, the combined processing device may further comprise a storage device, which is connected to the machine learning operation device and the other processing devices, respectively. The storage device is used for storing data of the machine learning operation device and the other processing devices, and is particularly suitable for data to be operated on that cannot be held entirely in the internal storage of the machine learning operation device or the other processing devices.
The combined processing device can be used as an SOC (system on chip) of equipment such as a mobile phone, a robot, an unmanned aerial vehicle, or video monitoring equipment, so that the core area of the control part is effectively reduced, the processing speed is improved, and the overall power consumption is reduced. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, for example, cameras, displays, mice, keyboards, network cards, and wifi interfaces.
The present disclosure provides a machine learning chip including the machine learning arithmetic device or the combination processing device described above.
The present disclosure provides a machine learning chip packaging structure including the machine learning chip described above.
The present disclosure provides a board card, and fig. 5 shows a schematic structural diagram of the board card according to an embodiment of the present disclosure. As shown in fig. 5, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may include other supporting components, including but not limited to: a memory device 390, an interface device 391 and a control device 392.
The memory device 390 is connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure) via a bus and is used for storing data. The memory device 390 may include multiple sets of memory units 393. Each set of memory units 393 is connected to the machine learning chip 389 via a bus. It is understood that each set of memory units 393 may be DDR SDRAM (Double Data Rate SDRAM, double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on both the rising and falling edges of the clock pulse. DDR is twice as fast as standard SDRAM.
In one embodiment, the memory device 390 may include 4 sets of memory units 393. Each set of memory units 393 may include a plurality of DDR4 chips. In one embodiment, the machine learning chip 389 may include four 72-bit DDR4 controllers, in which 64 bits of each 72-bit DDR4 controller are used to transfer data and 8 bits are used for ECC verification. It can be appreciated that the theoretical data transfer bandwidth can reach 25600MB/s when DDR4-3200 chips are employed in each set of memory units 393.
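The 25600MB/s figure follows from the DDR4-3200 transfer rate (3200 million transfers per second) and the 64 data bits carried per transfer; the 8 ECC bits carry no payload. The short calculation below makes this explicit. The function name is hypothetical, and the result is a theoretical peak per controller rather than a measured value.

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits / 8


CONTROLLER_BITS = 72
ECC_BITS = 8
DATA_BITS = CONTROLLER_BITS - ECC_BITS      # 64 bits carry data

peak = ddr_bandwidth_mb_per_s(3200, DATA_BITS)
```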
In one embodiment, each set of memory cells 393 includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is provided in the machine learning chip 389 for controlling data transfer and data storage for each memory unit 393.
The interface device 391 is electrically connected to the machine learning chip 389 (or the machine learning chip within the machine learning chip package structure). The interface device 391 is used to enable data transfer between the machine learning chip 389 and an external device (e.g., a server or computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface: the data to be processed is transferred from the server to the machine learning chip 389 through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000MB/s. In another embodiment, the interface device 391 may be another interface; the disclosure does not limit the specific form of the other interface, as long as the interface device can implement the transfer function. In addition, the calculation result of the machine learning chip is transmitted back to the external device (e.g., a server) by the interface device.
The control device 392 is electrically connected to the machine learning chip 389. The control device 392 is configured to monitor the status of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected via an SPI interface. The control device 392 may include a single-chip microcomputer (Micro Controller Unit, MCU). For example, the machine learning chip 389 may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads. Therefore, the machine learning chip 389 may be in different operating states such as multi-load and light load. The control device can regulate the operating states of the plurality of processing chips, processing cores, and/or processing circuits in the machine learning chip.
The present disclosure provides an electronic device including the machine learning chip or the board card described above.
The electronic device may include a data processing apparatus, a computer device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship, and/or a car. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers, and range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.
Fig. 6 illustrates a flow chart of a scalar type conversion instruction processing method according to an embodiment of the present disclosure. The method may be applied to, for example, a computer device comprising a memory and a processor, wherein the memory is used to store data used during execution of the method; the processor is configured to perform related processing and operation steps, such as performing step S51 and step S52 described below. As shown in fig. 6, the method is applied to the scalar type conversion instruction processing apparatus described above, and includes step S51 and step S52.
In step S51, the control module parses the obtained scalar type conversion instruction to obtain the operation code and operation field of the scalar type conversion instruction, obtains the scalar to be operated on and the target address required for executing the scalar type conversion instruction according to the operation code and operation field, and determines the target data type and the initial data type of the scalar to be operated on. The operation code indicates that the operation performed on data by the scalar type conversion instruction is a scalar type conversion operation, and the operation field comprises the address of the scalar to be operated on and the target address.
In step S52, the operation module performs scalar type conversion operation on the scalar to be operated of the initial data type according to the target data type, so as to obtain an operation result, and stores the operation result into the target address, where the data type of the operation result is the target data type.
In one possible implementation, performing a scalar type conversion operation on a scalar to be operated on of an initial data type according to a target data type may include:
a scalar type conversion operation is performed using a plurality of scalar operators in an operation module.
In one possible implementation, the operation module includes a master operation sub-module and a plurality of slave operation sub-modules, the master operation sub-module including a plurality of scalar operators. Wherein, step S52 may include:
and executing scalar type conversion operation by using a plurality of scalar operators in the main operation submodule to obtain an operation result, and storing the operation result into a target address.
In one possible implementation, the operation domain may further include an initial data type and a target data type, and step S51 may include: the target data type and the initial data type of the scalar to be operated are determined according to the operation domain.
In one possible implementation, the operation code is further used to indicate an initial data type and a target data type, and step S51 may include: determining the target data type and the initial data type of the scalar to be operated on according to the operation code.
In one possible implementation, the target data type may include any one of a 16-bit floating point number, a 32-bit floating point number, a 48-bit floating point number, a 16-bit integer, a 32-bit integer, and a 48-bit integer, and the initial data type may include any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number, and a pointer data type.
In one possible implementation, the method may further include: the scalar to be operated on is stored by the memory module of the device,
wherein the memory module comprises at least one of a register and a cache,
the cache is used for storing data to be operated on and comprises at least one neuron cache NRAM;
a register for storing a scalar to be operated on;
and the neuron cache is used for storing neuron data in the data to be operated, wherein the neuron data comprises neuron vector data.
In one possible implementation, step S51 may include:
storing scalar type conversion instructions;
analyzing the scalar type conversion instruction to obtain an operation code and an operation domain of the scalar type conversion instruction;
an instruction queue is stored, the instruction queue comprising a plurality of instructions to be executed, which may include scalar type conversion instructions, arranged in order of execution.
In one possible implementation, the method may further include:
when determining that the first to-be-executed instruction in the plurality of to-be-executed instructions has an association relation with the zeroth to-be-executed instruction before the first to-be-executed instruction, caching the first to-be-executed instruction, controlling the execution of the first to-be-executed instruction after determining that the zeroth to-be-executed instruction is executed,
the association relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction before the first to-be-executed instruction includes:
the first storage address interval for storing the data required by the first instruction to be executed and the zeroth storage address interval for storing the data required by the zeroth instruction to be executed have overlapping areas.
It should be noted that, although the scalar type conversion instruction processing method is described above by way of example in the above embodiments, those skilled in the art will appreciate that the present disclosure should not be limited thereto. In fact, the user can flexibly set each step according to personal preference and/or actual application scene, so long as the technical scheme of the disclosure is met.
The scalar type conversion instruction processing method provided by the embodiments of the disclosure has a wide application range and processes scalar type conversion instructions, and hence scalar type conversions, with high efficiency and high speed.
The present disclosure also provides a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the scalar type conversion instruction processing method described above.
It should be noted that, for simplicity of description, the foregoing method embodiments are all depicted as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
It should be further noted that, although the steps in the flowchart of fig. 6 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
It should be understood that the above-described device embodiments are merely illustrative and that the device of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is merely a logic function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted or not performed.
In addition, unless specifically stated, each functional unit/module in the embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules described above may be implemented either in hardware or in software program modules.
The integrated units/modules, if implemented in hardware, may be digital circuits, analog circuits, etc. Physical implementations of hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise indicated, the storage module may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), etc.
The integrated units/modules may be stored in a computer readable memory if implemented in the form of software program modules and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method described in the various embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
In the foregoing embodiments, each embodiment is described with its own emphasis, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. The technical features of the foregoing embodiments may be arbitrarily combined; for brevity, not all possible combinations of these technical features are described, but all such combinations should be considered to be within the scope of the disclosure.
The foregoing may be better understood in light of the following clauses:
Clause A1, a scalar type conversion instruction processing apparatus, the apparatus comprising:
a control module, configured to parse an obtained scalar type conversion instruction to obtain an opcode and an operation field of the scalar type conversion instruction, to obtain, according to the opcode and the operation field, a scalar to be operated on and a target address required for executing the scalar type conversion instruction, and to determine a target data type and an initial data type of the scalar to be operated on;
an operation module, configured to perform a scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type to obtain an operation result, and to store the operation result at the target address, wherein the data type of the operation result is the target data type,
wherein the opcode indicates that the operation performed on data by the scalar type conversion instruction is a scalar type conversion operation, and the operation field comprises an address of the scalar to be operated on and the target address.
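By way of illustration only, the cooperation of the control module and the operation module described in clause A1 might be sketched in Python as follows. The opcode value, the dictionary used as memory, and the use of `struct` format strings as data type tags are all assumptions made for this sketch; the disclosure does not specify a binary encoding. Carrying the data types in the operation field follows the variant of clause A4.

```python
from dataclasses import dataclass
import struct

# Hypothetical opcode value; the source does not define a binary encoding.
OPCODE_SCALAR_CONVERT = 0x2A


@dataclass
class Instruction:
    opcode: int
    # Assumed operation field layout:
    # (source address, target address, initial type, target type)
    operands: tuple


def control_parse(instr, memory):
    """Control-module step: split the instruction and fetch the scalar."""
    if instr.opcode != OPCODE_SCALAR_CONVERT:
        raise ValueError("not a scalar type conversion instruction")
    src_addr, dst_addr, initial_type, target_type = instr.operands
    return memory[src_addr], dst_addr, initial_type, target_type


def operation_convert(raw, initial_type, target_type):
    """Operation-module step: decode the raw bytes, then re-encode them."""
    value = struct.unpack(initial_type, raw)[0]
    if target_type in ("<e", "<f"):  # 16-bit and 32-bit floating point
        return struct.pack(target_type, float(value))
    return struct.pack(target_type, int(value))


def execute(instr, memory):
    """Parse, convert, and store the result at the target address."""
    raw, dst_addr, initial_type, target_type = control_parse(instr, memory)
    memory[dst_addr] = operation_convert(raw, initial_type, target_type)


# A 16-bit signed integer 7 at address 0x10 is converted to a 32-bit float at 0x20.
memory = {0x10: struct.pack("<h", 7)}
execute(Instruction(OPCODE_SCALAR_CONVERT, (0x10, 0x20, "<h", "<f")), memory)
```

In this sketch the result stored at address 0x20 decodes back to the floating point value 7.0, and its data type is the target data type, as the clause requires.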
Clause A2, the apparatus of clause A1, the operation module comprising:
a plurality of scalar operators for performing the scalar type conversion operation.
Clause A3, the apparatus of clause A2, the operation module comprising a master operation sub-module and a plurality of slave operation sub-modules, the master operation sub-module comprising the plurality of scalar operators,
the master operation sub-module is configured to perform the scalar type conversion operation using the plurality of scalar operators to obtain the operation result, and to store the operation result at the target address.
Clause A4, the apparatus of clause A1, the operation field further comprising an initial data type and a target data type,
the control module is further configured to determine the target data type and the initial data type of the scalar to be operated on according to the operation field.
Clause A5, the apparatus of clause A1, the opcode further being used to indicate an initial data type and a target data type,
the control module is further configured to determine the target data type and the initial data type of the scalar to be operated on according to the opcode.
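Clauses A4 and A5 describe two alternative ways of conveying the data types. A minimal Python sketch of both is given below, assuming hypothetical opcode numbers, type tags, and a dictionary-shaped operation field, none of which come from the disclosure.

```python
# Hypothetical type tags and opcode numbers, chosen only for illustration.
INT16, INT32, FP16, FP32 = "int16", "int32", "fp16", "fp32"

# Variant of clause A4: one generic opcode; the types travel in the operation field.
GENERIC_CONVERT = 0x40

# Variant of clause A5: each opcode implies one (initial, target) pair.
TYPED_OPCODES = {
    0x41: (INT16, FP16),
    0x42: (INT16, FP32),
    0x43: (FP32, INT32),
}


def resolve_types(opcode, operation_field):
    """Return (initial_type, target_type) for either encoding."""
    if opcode == GENERIC_CONVERT:
        return operation_field["initial_type"], operation_field["target_type"]
    if opcode in TYPED_OPCODES:
        return TYPED_OPCODES[opcode]
    raise ValueError(f"unknown opcode {opcode:#x}")
```

The first variant keeps the opcode space small at the cost of a wider operation field; the second shortens the operation field but needs one opcode per supported pair of data types.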
Clause A6, the apparatus of clause A1, the target data type comprising any of a 16-bit floating point number, a 32-bit floating point number, a 48-bit floating point number, a 16-bit integer, a 32-bit integer, and a 48-bit integer, the initial data type comprising any of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number, and a pointer data type.
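As an illustrative example of one pair of data types from clause A6, the following Python sketch converts a 16-bit signed number to a 16-bit floating point number. It assumes the 16-bit floating point format is IEEE 754 half precision, which the disclosure does not state, and it omits the 48-bit types because the standard `struct` module has no format for them.

```python
import struct


def int16_to_fp16_bits(value):
    """Encode a signed 16-bit integer as IEEE 754 half-precision bytes."""
    if not -32768 <= value <= 32767:
        raise ValueError("value does not fit in a signed 16-bit integer")
    return struct.pack("<e", float(value))


def fp16_bits_to_float(raw):
    """Decode half-precision bytes back to a Python float for inspection."""
    return struct.unpack("<e", raw)[0]


# 1024 is exactly representable in half precision.
exact = fp16_bits_to_float(int16_to_fp16_bits(1024))
# 4097 is not: between 4096 and 8192 the representable values are 4 apart.
rounded = fp16_bits_to_float(int16_to_fp16_bits(4097))
```

Under that assumption, `exact` is 1024.0 while `rounded` is 4096.0, which shows that a conversion between types of equal width can still lose precision.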
Clause A7, the apparatus of clause A1, further comprising:
a storage module for storing the scalar to be operated on,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing data to be operated on, and the cache comprises at least one neuron cache (NRAM);
the register is used for storing the scalar to be operated on;
the neuron cache is used for storing neuron data in the data to be operated on, and the neuron data comprises neuron vector data.
Clause A8, the apparatus of clause A1, the control module comprising:
an instruction storage sub-module for storing the scalar type conversion instruction;
an instruction processing sub-module for parsing the scalar type conversion instruction to obtain the opcode and the operation field of the scalar type conversion instruction;
a queue storage sub-module for storing an instruction queue, wherein the instruction queue comprises a plurality of to-be-executed instructions arranged sequentially in execution order, and the plurality of to-be-executed instructions comprise the scalar type conversion instruction.
Clause A9, the apparatus of clause A8, the control module further comprising:
a dependency processing sub-module, configured to, upon determining that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association with a zeroth to-be-executed instruction that precedes the first to-be-executed instruction, cache the first to-be-executed instruction in the instruction storage sub-module, and, after execution of the zeroth to-be-executed instruction is completed, extract the first to-be-executed instruction from the instruction storage sub-module and send it to the operation module,
wherein the association between the first to-be-executed instruction and the zeroth to-be-executed instruction that precedes it includes:
a first storage address interval storing the data required by the first to-be-executed instruction and a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction have an overlapping area.
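The overlap test of clause A9 can be sketched in Python as follows. The sketch assumes half-open address intervals written as (start, end) and, as a simplification not taken from the disclosure, compares each instruction only with the one immediately before it in the queue; the instruction names are hypothetical.

```python
def intervals_overlap(first, zeroth):
    """Half-open intervals (start, end) overlap if each starts before the other ends."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]


def issue_order(queue):
    """Yield (name, stalled) for each queue entry in execution order.

    Each entry is (name, (start, end)). An entry is marked as stalled when
    its address interval overlaps that of the entry just before it, meaning
    it would be cached until the earlier instruction has finished.
    """
    previous = None
    for name, interval in queue:
        stalled = previous is not None and intervals_overlap(interval, previous)
        yield name, stalled
        previous = interval


queue = [("i0", (0, 8)), ("i1", (4, 12)), ("i2", (32, 40))]
schedule = list(issue_order(queue))
```

Here `i1` touches addresses that `i0` also uses, so it is the only entry marked as stalled; `i2` works on a disjoint interval and can be sent to the operation module without waiting.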
Clause A10, a machine learning computing device, the device comprising:
one or more scalar type conversion instruction processing apparatuses according to any one of clauses A1 to A9, configured to obtain the scalar to be operated on and control information from other processing devices, perform a specified machine learning operation, and transfer the execution result to the other processing devices through an I/O interface;
when the machine learning computing device comprises a plurality of the scalar type conversion instruction processing apparatuses, the plurality of scalar type conversion instruction processing apparatuses can be connected through a specific structure and transmit data;
wherein the plurality of scalar type conversion instruction processing apparatuses are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus so as to support larger-scale machine learning operations; the plurality of scalar type conversion instruction processing apparatuses share one control system or have their own control systems; the plurality of scalar type conversion instruction processing apparatuses share a memory or have their own memories; and the plurality of scalar type conversion instruction processing apparatuses may be interconnected in any interconnection topology.
Clause A11, a combination processing device, the combination processing device comprising:
the machine learning computing device of clause A10, a universal interconnect interface, and other processing devices;
the machine learning computing device interacts with the other processing devices to jointly complete a computing operation specified by a user,
wherein the combination processing device further comprises: a storage device connected to the machine learning computing device and the other processing devices, respectively, for storing data of the machine learning computing device and the other processing devices.
Clause A12, a machine learning chip, the machine learning chip comprising:
the machine learning computing device of clause A10 or the combination processing device of clause A11.
Clause A13, an electronic device, comprising:
the machine learning chip of clause A12.
Clause A14, a board card, the board card comprising: a storage device, an interface device, a control device, and the machine learning chip of clause A12;
wherein the machine learning chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the machine learning chip and external equipment;
the control device is used for monitoring the state of the machine learning chip.
Clause A15, a scalar type conversion instruction processing method, the method being applied to a scalar type conversion instruction processing apparatus, the apparatus comprising a control module and an operation module, the method comprising:
parsing, by the control module, an obtained scalar type conversion instruction to obtain an opcode and an operation field of the scalar type conversion instruction, obtaining, according to the opcode and the operation field, a scalar to be operated on and a target address required for executing the scalar type conversion instruction, and determining a target data type and an initial data type of the scalar to be operated on;
performing, by the operation module, a scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type to obtain an operation result, and storing the operation result at the target address, wherein the data type of the operation result is the target data type,
wherein the opcode indicates that the operation performed on data by the scalar type conversion instruction is a scalar type conversion operation, and the operation field comprises an address of the scalar to be operated on and the target address.
Clause A16, the method of clause A15, wherein performing a scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type comprises:
performing the scalar type conversion operation using a plurality of scalar operators in the operation module.
Clause A17, the method of clause A16, the operation module comprising a master operation sub-module and a plurality of slave operation sub-modules, the master operation sub-module comprising the plurality of scalar operators,
wherein performing the scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type to obtain an operation result, and storing the operation result at the target address, comprises:
performing the scalar type conversion operation using the plurality of scalar operators in the master operation sub-module to obtain the operation result, and storing the operation result at the target address.
Clause A18, the method of clause A15, the operation field further comprising an initial data type and a target data type,
wherein determining the target data type and the initial data type of the scalar to be operated on comprises:
determining the target data type and the initial data type of the scalar to be operated on according to the operation field.
Clause A19, the method of clause A15, the opcode further being used to indicate an initial data type and a target data type,
wherein determining the target data type and the initial data type of the scalar to be operated on comprises:
determining the target data type and the initial data type of the scalar to be operated on according to the opcode.
Clause A20, the method of clause A15, the target data type comprising any of a 16-bit floating point number, a 32-bit floating point number, a 48-bit floating point number, a 16-bit integer, a 32-bit integer, and a 48-bit integer, the initial data type comprising any of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number, and a pointer data type.
Clause A21, the method of clause A16, the method further comprising:
storing the scalar to be operated on using a storage module of the apparatus,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing data to be operated on, and the cache comprises at least one neuron cache (NRAM);
the register is used for storing the scalar to be operated on;
the neuron cache is used for storing neuron data in the data to be operated on, and the neuron data comprises neuron vector data.
Clause A22, the method of clause A15, wherein parsing the scalar type conversion instruction to obtain the opcode and the operation field of the scalar type conversion instruction comprises:
storing the scalar type conversion instruction;
parsing the scalar type conversion instruction to obtain the opcode and the operation field of the scalar type conversion instruction;
storing an instruction queue, wherein the instruction queue comprises a plurality of to-be-executed instructions arranged sequentially in execution order, and the plurality of to-be-executed instructions comprise the scalar type conversion instruction.
Clause A23, the method of clause A22, the method further comprising:
upon determining that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association with a zeroth to-be-executed instruction that precedes the first to-be-executed instruction, caching the first to-be-executed instruction, and, after determining that execution of the zeroth to-be-executed instruction is completed, controlling execution of the first to-be-executed instruction,
wherein the association between the first to-be-executed instruction and the zeroth to-be-executed instruction that precedes it includes:
a first storage address interval storing the data required by the first to-be-executed instruction and a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction have an overlapping area.
Clause A24, a non-transitory computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the method of any one of clauses A15 to A23.
The embodiments of the present application have been described in detail above, and specific examples have been used to explain the principles and implementations of the application; the description of the above embodiments is intended only to aid understanding of the method of the application and its core ideas. Meanwhile, a person skilled in the art may, in accordance with the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims (22)

1. A scalar type conversion instruction processing apparatus, the apparatus comprising:
a control module, configured to parse an obtained scalar type conversion instruction to obtain an opcode and an operation field of the scalar type conversion instruction, to obtain, according to the opcode and the operation field, a scalar to be operated on and a target address required for executing the scalar type conversion instruction, and to determine a target data type and an initial data type of the scalar to be operated on;
an operation module, configured to perform a scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type to obtain an operation result, and to store the operation result at the target address, wherein the data type of the operation result is the target data type,
wherein the opcode indicates that the operation performed on data by the scalar type conversion instruction is a scalar type conversion operation, and the operation field comprises an address of the scalar to be operated on and the target address;
wherein the control module comprises:
an instruction storage sub-module for storing the scalar type conversion instruction;
an instruction processing sub-module for parsing the scalar type conversion instruction to obtain the opcode and the operation field of the scalar type conversion instruction;
a queue storage sub-module for storing an instruction queue, wherein the instruction queue comprises a plurality of to-be-executed instructions arranged sequentially in execution order, and the plurality of to-be-executed instructions comprise the scalar type conversion instruction.
2. The apparatus of claim 1, wherein the operation module comprises:
a plurality of scalar operators for performing the scalar type conversion operation.
3. The apparatus of claim 2, wherein the operation module comprises a master operation sub-module and a plurality of slave operation sub-modules, the master operation sub-module comprising the plurality of scalar operators,
the master operation sub-module is configured to perform the scalar type conversion operation using the plurality of scalar operators to obtain the operation result, and to store the operation result at the target address.
4. The apparatus of claim 1, wherein the operation field further comprises an initial data type and a target data type,
the control module is further configured to determine the target data type and the initial data type of the scalar to be operated on according to the operation field.
5. The apparatus of claim 1, wherein the opcode is further used to indicate an initial data type and a target data type,
the control module is further configured to determine the target data type and the initial data type of the scalar to be operated on according to the opcode.
6. The apparatus of claim 1, wherein the target data type comprises any one of a 16-bit floating point number, a 32-bit floating point number, a 48-bit floating point number, a 16-bit integer, a 32-bit integer, and a 48-bit integer, and the initial data type comprises any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number, and a pointer data type.
7. The apparatus of claim 1, wherein the apparatus further comprises:
a storage module for storing the scalar to be operated on,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing data to be operated on, and the cache comprises at least one neuron cache (NRAM);
the register is used for storing the scalar to be operated on;
the neuron cache is used for storing neuron data in the data to be operated on, and the neuron data comprises neuron vector data.
8. The apparatus of claim 1, wherein the control module further comprises:
a dependency processing sub-module, configured to, upon determining that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association with a zeroth to-be-executed instruction that precedes the first to-be-executed instruction, cache the first to-be-executed instruction in the instruction storage sub-module, and, after execution of the zeroth to-be-executed instruction is completed, extract the first to-be-executed instruction from the instruction storage sub-module and send it to the operation module,
wherein the association between the first to-be-executed instruction and the zeroth to-be-executed instruction that precedes it includes:
a first storage address interval storing the data required by the first to-be-executed instruction and a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction have an overlapping area.
9. A machine learning computing device, the device comprising:
one or more scalar type conversion instruction processing apparatuses according to any one of claims 1 to 8, configured to obtain the scalar to be operated on and control information from other processing devices, perform a specified machine learning operation, and transfer the execution result to the other processing devices through an I/O interface;
when the machine learning computing device comprises a plurality of the scalar type conversion instruction processing apparatuses, the plurality of scalar type conversion instruction processing apparatuses can be connected through a specific structure and transmit data;
wherein the plurality of scalar type conversion instruction processing apparatuses are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus so as to support larger-scale machine learning operations; the plurality of scalar type conversion instruction processing apparatuses share one control system or have their own control systems; the plurality of scalar type conversion instruction processing apparatuses share a memory or have their own memories; and the plurality of scalar type conversion instruction processing apparatuses may be interconnected in any interconnection topology.
10. A combination processing apparatus, characterized in that the combination processing apparatus comprises:
the machine learning computing device of claim 9, a universal interconnect interface, and other processing devices;
the machine learning computing device interacts with the other processing devices to jointly complete a computing operation specified by a user,
wherein the combination processing device further comprises: a storage device connected to the machine learning computing device and the other processing devices, respectively, for storing data of the machine learning computing device and the other processing devices.
11. A machine learning chip, the machine learning chip comprising:
the machine learning computing device of claim 9 or the combination processing device of claim 10.
12. An electronic device, the electronic device comprising:
the machine learning chip of claim 11.
13. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the machine learning chip of claim 11;
wherein the machine learning chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the machine learning chip and external equipment;
the control device is used for monitoring the state of the machine learning chip.
14. A scalar type conversion instruction processing method, the method being applied to a scalar type conversion instruction processing apparatus, the apparatus including a control module and an operation module, the method comprising:
parsing, by the control module, an obtained scalar type conversion instruction to obtain an opcode and an operation field of the scalar type conversion instruction, obtaining, according to the opcode and the operation field, a scalar to be operated on and a target address required for executing the scalar type conversion instruction, and determining a target data type and an initial data type of the scalar to be operated on;
performing, by the operation module, a scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type to obtain an operation result, and storing the operation result at the target address, wherein the data type of the operation result is the target data type,
wherein the opcode indicates that the operation performed on data by the scalar type conversion instruction is a scalar type conversion operation, and the operation field comprises an address of the scalar to be operated on and the target address;
wherein parsing the scalar type conversion instruction to obtain the opcode and the operation field of the scalar type conversion instruction comprises:
storing the scalar type conversion instruction;
parsing the scalar type conversion instruction to obtain the opcode and the operation field of the scalar type conversion instruction;
storing an instruction queue, wherein the instruction queue comprises a plurality of to-be-executed instructions arranged sequentially in execution order, and the plurality of to-be-executed instructions comprise the scalar type conversion instruction.
15. The method of claim 14, wherein performing a scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type comprises:
performing the scalar type conversion operation using a plurality of scalar operators in the operation module.
16. The method of claim 15, wherein the operation module comprises a master operation sub-module and a plurality of slave operation sub-modules, the master operation sub-module comprising the plurality of scalar operators,
wherein performing the scalar type conversion operation on the scalar to be operated on of the initial data type according to the target data type to obtain an operation result, and storing the operation result at the target address, comprises:
performing the scalar type conversion operation using the plurality of scalar operators in the master operation sub-module to obtain the operation result, and storing the operation result at the target address.
17. The method of claim 14, wherein the operation field further comprises an initial data type and a target data type,
wherein determining the target data type and the initial data type of the scalar to be operated on comprises:
determining the target data type and the initial data type of the scalar to be operated on according to the operation field.
18. The method of claim 14, wherein the opcode is further used to indicate an initial data type and a target data type,
wherein determining the target data type and the initial data type of the scalar to be operated on comprises:
determining the target data type and the initial data type of the scalar to be operated on according to the opcode.
19. The method of claim 14, wherein the target data type comprises any one of a 16-bit floating point number, a 32-bit floating point number, a 48-bit floating point number, a 16-bit integer, a 32-bit integer, and a 48-bit integer, and the initial data type comprises any one of a 16-bit signed number, a 32-bit signed number, a 48-bit signed number, a 16-bit unsigned number, a 32-bit unsigned number, a 48-bit unsigned number, and a pointer data type.
20. The method of claim 15, wherein the method further comprises:
storing the scalar to be operated on using a storage module of the apparatus,
wherein the storage module comprises at least one of a register and a cache,
the cache is used for storing data to be operated on, and the cache comprises at least one neuron cache (NRAM);
the register is used for storing the scalar to be operated on;
the neuron cache is used for storing neuron data in the data to be operated on, and the neuron data comprises neuron vector data.
21. The method of claim 14, wherein the method further comprises:
upon determining that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association with a zeroth to-be-executed instruction that precedes the first to-be-executed instruction, caching the first to-be-executed instruction, and, after determining that execution of the zeroth to-be-executed instruction is completed, controlling execution of the first to-be-executed instruction,
wherein the association between the first to-be-executed instruction and the zeroth to-be-executed instruction that precedes it includes:
a first storage address interval storing the data required by the first to-be-executed instruction and a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction have an overlapping area.
22. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 14 to 21.
CN201910625443.2A 2018-10-09 2019-07-11 Operation method, device, computer equipment and storage medium Active CN111275197B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/110167 WO2020073925A1 (en) 2018-10-09 2019-10-09 Operation method and apparatus, computer device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811481630 2018-12-05
CN201811481630X 2018-12-05

Publications (2)

Publication Number Publication Date
CN111275197A CN111275197A (en) 2020-06-12
CN111275197B CN111275197B (en) 2023-11-10

Family

ID=71002923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910625443.2A Active CN111275197B (en) 2018-10-09 2019-07-11 Operation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111275197B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620589A (en) * 2008-06-30 2010-01-06 英特尔公司 Efficient parallel floating point exception handling in a processor
WO2017185418A1 (en) * 2016-04-29 2017-11-02 北京中科寒武纪科技有限公司 Device and method for performing neural network computation and matrix/vector computation
CN107608715A (en) * 2017-07-20 2018-01-19 上海寒武纪信息科技有限公司 For performing the device and method of artificial neural network forward operation
CN107832844A (en) * 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 A kind of information processing method and Related product
CN107861757A (en) * 2017-11-30 2018-03-30 上海寒武纪信息科技有限公司 Arithmetic unit and Related product
CN108009126A (en) * 2017-12-15 2018-05-08 北京中科寒武纪科技有限公司 A kind of computational methods and Related product

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7849294B2 (en) * 2008-01-31 2010-12-07 International Business Machines Corporation Sharing data in internal and memory representations with dynamic data-driven conversion
US9411585B2 (en) * 2011-09-16 2016-08-09 International Business Machines Corporation Multi-addressable register files and format conversions associated therewith
US10762164B2 (en) * 2016-01-20 2020-09-01 Cambricon Technologies Corporation Limited Vector and matrix computing device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620589A (en) * 2008-06-30 2010-01-06 英特尔公司 Efficient parallel floating point exception handling in a processor
WO2017185418A1 (en) * 2016-04-29 2017-11-02 北京中科寒武纪科技有限公司 Device and method for performing neural network computation and matrix/vector computation
CN107608715A (en) * 2017-07-20 2018-01-19 上海寒武纪信息科技有限公司 For performing the device and method of artificial neural network forward operation
CN107992329A (en) * 2017-07-20 2018-05-04 上海寒武纪信息科技有限公司 A kind of computational methods and Related product
CN107832844A (en) * 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 A kind of information processing method and Related product
CN108874444A (en) * 2017-10-30 2018-11-23 上海寒武纪信息科技有限公司 Machine learning processor and the method for executing Outer Product of Vectors instruction using processor
CN107861757A (en) * 2017-11-30 2018-03-30 上海寒武纪信息科技有限公司 Arithmetic unit and Related product
CN108009126A (en) * 2017-12-15 2018-05-08 北京中科寒武纪科技有限公司 A kind of computational methods and Related product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
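杨一晨; 梁峰; 张国和; 何平; 吴斌; 高震霆. Design of a convolutional neural network coprocessor based on programmable logic devices. Journal of Xi'an Jiaotong University. 2018, (07), full text. *
赵秀娟; 詹春; 郎长胜. Design and implementation of a model machine based on microprogram control. Journal of Jiangxi Science and Technology Normal University. 2017, (06), full text. *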

Also Published As

Publication number Publication date
CN111275197A (en) 2020-06-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant