CN110941789B - Tensor operation method and device - Google Patents

Tensor operation method and device

Info

Publication number
CN110941789B
CN110941789B (application CN201811109603.XA)
Authority
CN
China
Prior art keywords
source data, data, tensor, target data, field
Prior art date
Legal status
Active
Application number
CN201811109603.XA
Other languages
Chinese (zh)
Other versions
CN110941789A
Inventor
谭洪贺
陈亮
凌坤
Current Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201811109603.XA
Publication of CN110941789A
Application granted
Publication of CN110941789B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A tensor operation method, a tensor operation device, and a computer are disclosed. The method comprises the following steps: receiving a tensor operation instruction, which includes an instruction type field describing an operation type, a source data structure description field describing the structure of the source data, and a source data addressing field describing the storage address of the source data within a storage space; parsing the tensor operation instruction; fetching the source data to be operated on from the storage space according to the source data structure description field and the source data addressing field; and performing the operation defined by the instruction type field on the fetched source data. In this way, by describing the operation type together with the tensor's data structure and storage address in the tensor operation instruction itself, the tensor can be fetched and operated on directly and quickly, thereby improving operation efficiency.

Description

Tensor operation method and device
Technical Field
The present application relates generally to the field of computers, and more particularly, to a tensor operation method and apparatus, and a computing device implementing the tensor operation method.
Background
In the field of artificial intelligence, a large number of computations, such as neural network computations, involve tensor operations, for example matrix operations. Existing devices for performing tensor operations are mainly general-purpose Central Processing Units (CPUs) and general-purpose Graphics Processing Units (GPUs).
Specifically, a general-purpose CPU performs tensor operations by executing general-purpose instructions through its register file and general-purpose functional units. However, because a single general-purpose CPU is designed for scalar computation, its performance on multidimensional tensor computations is low. If multiple general-purpose CPUs are used to perform operations in parallel, the communication between them can become a performance bottleneck.
When a general-purpose GPU performs tensor operations, it executes general-purpose SIMD instructions using its general register file and stream processing units. However, the on-chip cache of a GPU is small, so large-scale tensor operations require continuous off-chip data transfers, and off-chip bandwidth becomes the major performance bottleneck.
Thus, improved tensor operation schemes are desired.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the present application provide a tensor operation method, a tensor operation apparatus, and a computer that describe the operation type together with the tensor's data structure and storage address in the tensor operation instruction, so that the tensor can be fetched and operated on directly and quickly, thereby improving operation efficiency.
According to an aspect of the present application, there is provided a tensor operation method including: receiving a tensor operation instruction, wherein the tensor operation instruction comprises an instruction type field for describing operation types, a source data structure description field for describing the structure of source data, and a source data addressing field for describing the storage address of the source data in a storage space; analyzing the tensor operation instruction; according to the source data structure description field and the source data addressing field, obtaining source data to be operated from a storage space; and performing an operation defined by the instruction type field on the retrieved source data.
In some embodiments, the tensor operation instruction further includes a target data structure description field for describing a structure of target data, and a target data addressing field for describing a storage address of the target data within the storage space, and wherein the method further includes storing an operation result of the source data as target data in a data structure defined by the target data structure description field into the storage address defined by the target data addressing field.
In some embodiments, the operation types include: data handling, including data loading, data storage, and data movement; and data operations including addition, multiplication, reordering, scaling, convolution, and pooling.
In some embodiments, the tensor operation instruction includes one or more source data structure description fields, one or more source data addressing fields corresponding to the one or more source data structure description fields, one or more target data structure description fields, and one or more target data addressing fields corresponding to the one or more target data structure description fields.
In some embodiments, each of the source data addressing field and the target data addressing field includes an immediate representing a memory address, or a register number indicating a register in which the memory address is stored.
In some embodiments, the source data structure description field includes a source data dimension, a size of the source data in each dimension, and a source data type length, wherein the target data structure description field includes a target data dimension, a size of the target data in each dimension, and a target data type length, and wherein the storage address includes a start address, a dimension storage order, and a storage interval for each dimension.
In some embodiments, the tensor operation method further comprises, after parsing the tensor operation instruction and before fetching the source data to perform the operation: storing the parsed tensor operation instruction in a cache queue.
According to another aspect of the present application, there is provided a tensor operation apparatus including: an acquisition unit configured to acquire a tensor operation instruction; a parsing unit configured to parse the acquired tensor operation instruction, the tensor operation instruction comprising an instruction type field describing an operation type, a source data structure description field describing the structure of source data, and a source data addressing field describing the storage address of the source data within a storage space; a data access unit configured to fetch the source data to be operated on from the storage space according to the source data structure description field and the source data addressing field; and a calculation execution unit configured to perform the operation defined by the instruction type field on the fetched source data.
In some embodiments, the tensor operation instruction further includes a target data structure description field for describing a structure of target data, and a target data addressing field for describing a storage address of the target data within the storage space.
In some embodiments, the data access unit is further configured to store, as target data, a result of the operation of the source data, into a storage address defined by the target data addressing field according to a data structure defined by the target data structure description field.
In some embodiments, the tensor operation device further comprises a caching unit configured to store the parsed tensor operation instruction in a cache queue.
According to another aspect of the present application, there is provided a computer comprising: a processor; and a memory in which computer program instructions are stored which, when executed by the processor, cause the processor to perform the tensor operation method described above.
According to another aspect of the application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the tensor operation method described above.
Compared with the prior art, the tensor operation method, tensor operation device, and computer of the present application can receive a tensor operation instruction, wherein the tensor operation instruction comprises an instruction type field describing an operation type, a source data structure description field describing the structure of source data, and a source data addressing field describing the storage address of the source data within a storage space; parse the tensor operation instruction; fetch the source data to be operated on from the storage space according to the source data structure description field and the source data addressing field; and perform the operation defined by the instruction type field on the fetched source data. In this way, by describing the operation type together with the tensor's data structure and storage address in the tensor operation instruction, the tensor can be fetched and operated on directly and quickly, thereby improving operation efficiency.
Drawings
The above and other objects, features, and advantages of the present application will become more apparent from the following detailed description of embodiments of the present application with reference to the accompanying drawings. The drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification; they illustrate the application together with its embodiments and do not limit the application. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 illustrates a schematic diagram of the structure of a tensor operation instruction according to an embodiment of the present application.
Fig. 2 illustrates a flow chart of a tensor operation method according to an embodiment of the present application.
FIG. 3 illustrates a schematic diagram of an example of target data storage according to an embodiment of the application.
Fig. 4 illustrates a schematic diagram of a data structure description field according to an embodiment of the present application.
Fig. 5 illustrates a schematic diagram of a data addressing field according to an embodiment of the present application.
Fig. 6 illustrates a block diagram of a tensor operation device according to an embodiment of the present application.
Fig. 7 illustrates a block diagram of a computer in accordance with an embodiment of the present application.
Detailed Description
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Summary of the application
As described above, general-purpose CPUs and GPUs are inefficient at artificial-intelligence-related tensor operations, such as neural network operations, so the efficiency of tensor operations needs to be improved. For example, Google has developed the TPU, a processor dedicated to tensor operations, but it currently neither discloses the TPU's design nor sells it externally.
In view of the above technical problems, the basic idea of the present application is to directly describe the operation type and the data structure and the storage address of the tensor in the tensor operation instruction, so that the processor can directly and quickly acquire the tensor and execute the operation.
Specifically, the application provides a tensor operation method, a tensor operation device, and a computer, which first receive a tensor operation instruction comprising an instruction type field describing the operation type, a source data structure description field describing the structure of the source data, and a source data addressing field describing the storage address of the source data within a storage space; then parse the tensor operation instruction; fetch the source data to be operated on from the storage space according to these fields; and finally perform the operation defined by the instruction type field on the fetched source data. In this way, tensors can be fetched and operated on directly and quickly, improving operation efficiency.
Here, it will be understood by those skilled in the art that the tensor operation method provided by the present application may be implemented by a new hardware architecture specifically developed for executing the tensor operation instruction, or may be implemented by modifying an existing CPU, GPU, FPGA, or the like to use the tensor operation instruction.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Exemplary instruction Structure
Fig. 1 illustrates a schematic diagram of the structure of a tensor operation instruction according to an embodiment of the present application.
As shown in fig. 1, a tensor operation instruction 100 according to an embodiment of the present application includes an instruction type field 110, a source data structure description field 120, and a source data addressing field 130.
Specifically, the instruction type field 110 is used to describe the type of tensor operation involved in the tensor operation instruction 100, or the function performed by the tensor operation instruction 100, such as data movement, data operation, and so on.
The source data structure description field 120 is used to describe the structure of the source data, such as the size of the multidimensional data in each dimension, the data length, etc.
The source data addressing field 130 is used to describe the storage address of the source data within the storage space, which together with the source data structure description field 120 is used to obtain the source data from the storage space to perform the operation defined by the instruction type field 110.
It will be appreciated that the source data addressing field 130 may include an immediate representing the memory address or may also include a register number indicating the register in which the memory address is stored, so that the memory address may be retrieved by accessing the register.
In addition, the number of source data structure description fields 120 and source data addressing fields 130 corresponds to the number of source data depending on the number of source data involved in the tensor operation.
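As a concrete illustration, the instruction layout of Fig. 1 can be sketched in Python as follows. This is a minimal sketch only: the class and field names (`TensorInstruction`, `op_type`, `strides`, and so on) are assumptions, since the patent does not prescribe any concrete encoding.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TensorDescriptor:
    """Data structure description field: dimensions, per-dimension sizes, type length."""
    ndim: int
    sizes: List[int]   # size of the data in each dimension
    type_len: int      # element width in bits, e.g. 8 or 16

@dataclass
class AddressingField:
    """Data addressing field: start address, storage order, per-dimension intervals."""
    addr_st: int
    order: str         # dimension storage order, e.g. "xyz"
    strides: List[int] # storage interval for each dimension

@dataclass
class TensorInstruction:
    op_type: str                      # instruction type field, e.g. "conv" or "add"
    src_desc: List[TensorDescriptor]  # one per source operand
    src_addr: List[AddressingField]   # corresponds one-to-one with src_desc
```

A two-operand operation such as convolution would simply carry two entries in `src_desc` and `src_addr`.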
The specific structure and application of the respective fields will become more apparent in the following description of the tensor operation method according to the embodiment of the present application with reference to the accompanying drawings.
Exemplary method
Fig. 2 illustrates a flow chart of a tensor operation method according to an embodiment of the present application.
As shown in fig. 2, the tensor operation method according to an embodiment of the present application includes:
step S210, receiving a tensor operation instruction, where the tensor operation instruction may include an instruction type field for describing an operation type, a source data structure description field for describing a structure of source data, and a source data addressing field for describing a storage address of the source data in a storage space, as described above;
step S220, parsing the tensor operation instruction, so that the fields included in the instruction and their specific contents can be determined;
step S230, fetching the source data to be operated from the storage space according to the source data structure description field and the source data addressing field determined by analysis; and
in step S240, an operation defined by the instruction type field is performed on the acquired source data.
In step S210, a tensor operation instruction is received. The tensor operation instructions may be stored in a memory or an instruction cache in advance, and the fetching unit described below may fetch the tensor operation instructions from the memory or the instruction cache. Here, the structure of the tensor operation instruction has been described above with reference to fig. 1, and it will be understood by those skilled in the art that the tensor in the embodiment of the present application may include tensors having various dimensions, for example, may be one-dimensional tensors, i.e., vectors, may be two-dimensional tensors, i.e., matrices, and may also be higher-dimensional tensors such as three-dimensional, four-dimensional, and the like. Taking the calculation in the artificial intelligence field as an example, the tensor commonly used in convolution operation is a three-dimensional tensor.
Accordingly, the operation types cover various operations on tensors, typically including data handling and data operations. Examples of data handling operations include, but are not limited to, data loading, data storage, and data movement. For example, data loading may refer to moving tensor data from an external memory space, such as a hard disk or solid-state drive (SSD), to an internal memory space, such as DDR memory or an SRAM cache; data storage may refer to storing tensor data from an internal memory space to an external memory space; and data movement may be the movement of tensor data within an internal memory space or cache.
Examples of data operations include, but are not limited to, addition, multiplication, reordering, scaling, convolution, and pooling, such as vector addition, vector multiplication, and matrix multiplication. Here, reordering refers to changing the storage order of tensor data and storing it anew; convolution refers to performing a convolution operation on two tensors, for example two matrices, to obtain and store another tensor; and pooling refers to performing a pooling operation on one tensor to obtain and store another tensor. The operations described herein are not limited to any particular variant: for example, convolution may include depthwise convolution, pointwise convolution, group convolution, sparse convolution, and so on, and pooling may include max pooling, min pooling, average pooling, global pooling, and so on. Those skilled in the art will appreciate that data operations may also include more complex operations on tensors, such as dimension-reduction operations on high-dimensional data.
Various data operations, including handling and computing, are known to those skilled in the relevant arts and are not exemplified herein.
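To illustrate how an instruction type field might select among such data operations, here is a hypothetical dispatch sketch in Python; the operation names and the `execute` helper are illustrative assumptions, not part of the patent.

```python
# Hypothetical dispatch from the instruction type field to a concrete data
# operation; the operation names and helpers below are illustrative only.
def vec_add(a, b):
    """Element-wise addition of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def vec_scale(a, k):
    """Scale every element of a vector by a constant."""
    return [x * k for x in a]

OPS = {"add": vec_add, "scale": vec_scale}

def execute(op_type, *operands):
    """Perform the operation selected by the instruction type field."""
    return OPS[op_type](*operands)
```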
The details of the source data structure description field and the source data addressing field and how the source data is obtained based on these fields will be further described below.
In step S220, the tensor operation instruction is parsed. That is, by parsing the tensor operation instruction, the fields included in the instruction are determined, such as how many source data structure description fields and source data addressing fields there are, which indicates how many source data are needed for the calculation, as well as the contents of the respective fields. For example, parsing step S220 can determine the type of operation involved in the tensor operation instruction, the structure of the source data involved in the operation, its storage address in the storage space, and so on.
Optionally, after step S220, storing the parsed tensor operation instruction in a buffer queue is further included, which is especially applicable to the case of multiple tensor operation instructions. That is, after the analysis of the plurality of tensor operation instructions is completed, the analyzed plurality of tensor operation instructions may be stored in a buffer queue, and after the data access and operation corresponding to the previous tensor operation instruction is completed, the next analyzed tensor operation instruction is taken out from the buffer queue, and the corresponding data is acquired and operated.
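The buffer queue described above can be sketched with a few lines of Python; the queue and function names are assumptions for illustration only.

```python
from collections import deque

# Illustrative decoded-instruction buffer queue; names are assumptions.
instr_queue = deque()

def enqueue_parsed(instr):
    """Store a parsed tensor operation instruction in the buffer queue."""
    instr_queue.append(instr)

def dequeue_next():
    """Fetch the next parsed instruction once the previous instruction's
    data access and operation have completed; None if the queue is empty."""
    return instr_queue.popleft() if instr_queue else None
```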
Next, in step S230, respective source data to be operated on may be retrieved from the storage space according to the source data structure description field and the source data addressing field determined in the parsing step. It should be appreciated that, as previously described, the source data addressing field may include an immediate representing the memory address, or may also include a register number indicating the register in which the memory address is stored, so that the memory address may be retrieved by accessing the register. The source data may be stored in a cache, such as an SRAM memory, of the processor, for example, where the source data addressing field indicates the address of the source data in the cache. In other embodiments, the source data may also be stored in a memory external to the processor, such as DDR memory, for example, where the source data addressing field indicates the address of the source data in memory.
Specifically, a source data structure description field is used to describe the structure of source data. For example, for three-dimensional data, three-dimensional dimensions, i.e., an x-direction dimension size_x, a y-direction dimension size_y, a z-direction dimension size_z, and a data type length, such as 8 bits, 16 bits, and the like, need to be described. In some embodiments, the source data structure description field may also include a symbol flag that indicates whether the source data is signed or unsigned.
The source data addressing field, in turn, describes how the source data is stored within the storage space. For example, for three-dimensional data, it includes the start address addr_st of the data, the storage order, the x-direction storage interval stride_x, the y-direction storage interval stride_y, and the z-direction storage interval stride_z. The storage order may also be a default fixed order, such as the xyz storage order for three-dimensional data. In this way, the memory address of any part of the tensor data in the storage space can be obtained. For example, taking the xyz storage order for three-dimensional data, the address of any point (x, y, z) in the storage space is addr_st + x*stride_x + y*stride_y + z*stride_z.
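The address formula for three-dimensional data in xyz storage order can be written directly as a small Python helper (the function name is an assumption; the formula is the one given in the text):

```python
def element_address(addr_st, x, y, z, stride_x, stride_y, stride_z):
    """Storage address of point (x, y, z) of a three-dimensional tensor,
    assuming the xyz storage order described in the text:
    addr_st + x*stride_x + y*stride_y + z*stride_z."""
    return addr_st + x * stride_x + y * stride_y + z * stride_z
```

For a 9x9x3 tensor stored contiguously from address 0, the strides would be 1, 9, and 81, and point (3, 3, 0) lands at address 30.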
Here, it will be appreciated by those skilled in the art that for a fixed storage space size, a predetermined condition needs to be satisfied between the storage intervals and the data sizes. Taking three-dimensional data as an example again, assuming that n data points are stored in the storage space, then stride_x ≤ Size_x/n, stride_y ≤ Size_y/n, and stride_z ≤ Size_z/n need to be satisfied.
Thus, in the tensor operation method according to an embodiment of the present application, the source data structure description field includes a source data dimension, a size of source data in each dimension, and a source data type length, and the source data addressing field includes a start address, a dimension storage order, and a storage interval of each dimension.
Also, as described above, in the tensor operation method according to an embodiment of the present application, the tensor operation instruction may include one or more source data structure description fields and one or more source data address fields respectively corresponding to the one or more source data structure description fields, depending on the number of source data. For example, for a convolution operation, one source data may be a feature data tensor and the other source data may be a convolution kernel tensor. Here, the contents of each source data structure description field and each source data addressing field are the same as those described above, and a detailed description thereof will be omitted.
In step S240, the operation defined by the instruction type field is performed on the fetched source data. That is, after the tensor operation instruction has been parsed, the source data can be obtained quickly and directly in the manner described in step S230 above, and the corresponding operation is performed based on the determined tensor operation type, thereby improving the calculation efficiency of the tensor operation.
In addition, in the tensor operation method according to the embodiment of the present application, the tensor operation instruction may further describe a data structure and a storage address of the target data. That is, the tensor operation instruction may further include a target data structure description field for describing a structure of target data, and a target data addressing field for describing a storage address of the target data within the storage space. The target data is the result data generated by the calculation of step S240. It will be appreciated that the destination data structure description field and the destination data addressing field may be similar to the source data structure description field and the source data addressing field, respectively, except that the latter describes the operation source data, the former describes the operation result (destination) data. For example, the destination data addressing field may also include an immediate representing the destination data storage address, similar to the source data addressing field, or may also include a register number indicating a register in which the storage address is stored, so that the storage address may be retrieved by accessing the register. Accordingly, a detailed description of the target data structure description field and the target data addressing field will not be repeated here. It should be appreciated that the data structure of the target data may be different from the data structure of the source data.
In this way, the target data as a result of the operation can be further stored in the storage address defined by the target data addressing field according to the data structure defined by the target data structure description field, thereby achieving the storage of the target data in the desired manner, which is particularly advantageous in tensor-dependent operations. Taking convolution operation as an example, the calculation process involves sequentially performing convolution operations on each part of the multi-channel image data by using a convolution kernel, and the result of each convolution operation is used as a part of one image. By storing a part of the images obtained by each convolution operation according to a predetermined rule, it is possible to finally realize continuous storage of the entire image according to a predetermined order. Thus, the internal address management is facilitated, and the subsequent operation such as loading, moving, operation and the like on the image data is also facilitated, so that the operation efficiency is further improved.
Fig. 3 illustrates an example of target data according to an embodiment of the present application, which explains the storage principle described above. As shown in Fig. 3, if each convolution yields a 3×3×3 block of data, the final image data of 9×9×3 is formed, comprising 243 pixels. When storing, it is desirable that the entire image data shown in Fig. 3 be placed in memory in a predetermined order, generally the xyz order; if the start storage position is 0, the 243 pixels occupy storage addresses 0 to 242 in xyz order. For a pixel at (x, y, z), the storage location is (x-1)+(y-1)×9+(z-1)×81. If the target data obtained from a given calculation covers x=4 to 6, y=4 to 6, and z=1 to 3, as indicated by the dotted-line box in the figure, the desired storage locations of these 27 data are 30-32, 39-41, 48-50, 111-113, 120-122, 129-131, 192-194, 201-203, and 210-212. Accordingly, the storage start position addr_st of the target data can be set to 30, the storage order to xyz, the sizes size_x, size_y, and size_z in each dimension direction to 3, the storage interval stride_x in the x direction to 1, stride_y in the y direction to 9, and stride_z in the z direction to 81. In this way, the 3×3×3 target data indicated by the dotted-line box are stored at the predetermined positions, and eventually the whole image is stored in the consecutive storage locations 0-242 in xyz order, facilitating subsequent operations on the image. It will be appreciated that the calculated target data may in turn serve as source data for a next level of computation in, for example, a neural network; the data structure description fields and addressing fields of the source data and target data may therefore be similar to each other, although their values, i.e. the specific data structures and storage locations they define, may differ.
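The address arithmetic of Fig. 3 can be checked with a short sketch. This is purely illustrative (the names `pixel_address`, `addr_st`, and the stride variables mirror the description but are not from the patent text): it enumerates the 27 addresses of the dotted-line block from the start address and strides, and confirms they coincide with the xyz-order positions of pixels x=4..6, y=4..6, z=1..3.

```python
# Illustrative sketch: reproduce the address arithmetic of Fig. 3.
# A 1-indexed pixel (x, y, z) of the 9x9x3 image, stored in xyz order
# from base address 0, lives at (x-1) + (y-1)*9 + (z-1)*81.

def pixel_address(x, y, z, size_x=9, size_y=9):
    return (x - 1) + (y - 1) * size_x + (z - 1) * size_x * size_y

# Target-data descriptor for the 3x3x3 dotted-line block:
# start address 30, strides 1 (x), 9 (y), 81 (z).
addr_st, stride_x, stride_y, stride_z = 30, 1, 9, 81
addresses = [
    addr_st + i * stride_x + j * stride_y + k * stride_z
    for k in range(3) for j in range(3) for i in range(3)
]

# These must equal the whole-image positions of pixels x=4..6, y=4..6, z=1..3.
expected = [pixel_address(x, y, z)
            for z in (1, 2, 3) for y in (4, 5, 6) for x in (4, 5, 6)]
assert addresses == expected
assert addresses[:3] == [30, 31, 32]     # first row of the block
assert addresses[-3:] == [210, 211, 212]  # last row of the block
```

Storing each convolution result through such a descriptor is what lets the whole image end up contiguous at addresses 0-242.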
As described above, by specifying a data structure and a storage address for the target data, the multiple target data generated by multiple operations can each be stored at a predetermined position, so that the target data as a whole are laid out in the desired order, rather than simply being appended in the order in which each result is produced. This facilitates address management, arithmetic operations, and the like on the whole data constituted by the plurality of target data, thereby further improving the related calculation efficiency.
Fig. 4 illustrates a schematic diagram of a data structure description field according to an embodiment of the present application. As shown in Fig. 4, the data structure description field 300 includes a data dimension 310, a first dimension 320-1, a second dimension 320-2, ..., an nth dimension 320-n, and a data type length 330, which are the same as those described above and are not repeated here.
Fig. 5 illustrates a schematic diagram of a data addressing field according to an embodiment of the present application. As shown in Fig. 5, the data addressing field 400 includes a start address 410, a dimension storage order 420, a first dimension interval 430-1, a second dimension interval 430-2, ..., and an nth dimension interval 430-n. The details are the same as those described above and are not repeated here. As described above, the data structure description fields and the data addressing fields of the source data and the target data may be similar to each other.
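One possible in-memory representation of the two fields of Figs. 4 and 5 can be sketched as follows. The field names mirror the description, but the concrete types and the example values (an 8-bit element type, for instance) are assumptions for illustration, not something the patent specifies.

```python
# Illustrative sketch of the descriptor fields of Figs. 4 and 5.
from dataclasses import dataclass
from typing import List

@dataclass
class DataStructureDescription:      # Fig. 4, field 300
    n_dims: int                      # data dimension 310
    sizes: List[int]                 # first..nth dimension 320-1 .. 320-n
    type_length: int                 # data type length 330 (e.g. bits per element)

@dataclass
class DataAddressing:                # Fig. 5, field 400
    start_address: int               # start address 410
    dim_order: List[int]             # dimension storage order 420 (here: x, y, z)
    strides: List[int]               # first..nth dimension interval 430-1 .. 430-n

# Descriptor pair for the 3x3x3 target block of Fig. 3:
desc = DataStructureDescription(n_dims=3, sizes=[3, 3, 3], type_length=8)
addr = DataAddressing(start_address=30, dim_order=[0, 1, 2], strides=[1, 9, 81])
```

Source data and target data would use the same two shapes, differing only in the values carried, which matches the observation above that the fields are similar while their contents differ.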
As previously described, the source data addressing field and the target data addressing field may each include an immediate representing a data storage address, or alternatively a register number indicating the register in which the storage address is stored, so that the storage address can be retrieved by accessing that register. A register number can be represented with fewer bits than an immediate, so the instruction can be made shorter, thereby improving operation efficiency.
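The two addressing forms can be contrasted with a toy resolver. The register-file size, the register contents, and the function name are all assumptions for illustration; the point is only that with, say, 32 registers a register number needs 5 bits, versus a full-width immediate.

```python
# Illustrative sketch: an addressing field resolved either as an immediate
# or as a register number (register-indirect addressing).

REGISTER_FILE = [0] * 32    # hypothetical 32-entry register file (5-bit numbers)
REGISTER_FILE[7] = 30       # register r7 holds the storage address 30

def resolve_address(field_value, is_register):
    """Return the storage address encoded by an addressing field."""
    return REGISTER_FILE[field_value] if is_register else field_value

assert resolve_address(30, is_register=False) == 30  # immediate form
assert resolve_address(7, is_register=True) == 30    # register form: 5 bits encode the same address
```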
Exemplary apparatus
Fig. 6 illustrates a block diagram of a tensor operation device according to an embodiment of the present application.
As shown in fig. 6, the tensor operation device 500 according to an embodiment of the present application includes: a parsing unit 510, sometimes also called an instruction decoder, for parsing a received tensor operation instruction, the instruction comprising an instruction type field describing the operation type, a source data structure description field describing the structure of the source data, and a source data addressing field describing the storage address of the source data in a storage space; a data access unit 520 for obtaining the source data to be operated on from a storage space, such as a cache or a memory, according to the source data structure description field and the source data addressing field; and a calculation execution unit 530 for performing the operation defined by the instruction type field on the acquired source data.
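The parse / access / execute split of Fig. 6 can be sketched in a few lines. This is a minimal model, not the patent's implementation: the instruction is a plain dict standing in for the encoded fields, the operation name `scale2` is a made-up stand-in for one operation type, and memory is a flat Python list.

```python
# Minimal sketch of the Fig. 6 pipeline: parsing unit 510, data access
# unit 520, and calculation execution unit 530 (names and encoding assumed).

def parse(instruction):
    """Parsing unit: split the instruction into its fields."""
    return instruction["op"], instruction["src_desc"], instruction["src_addr"]

def load_source(memory, sizes, addressing):
    """Data access unit: gather a strided block from flat memory.
    sizes = extent per dimension; addressing = (start, per-dimension strides)."""
    start, strides = addressing
    return [memory[start + i * strides[0] + j * strides[1] + k * strides[2]]
            for k in range(sizes[2])
            for j in range(sizes[1])
            for i in range(sizes[0])]

def execute(op, data):
    """Calculation execution unit: apply the operation named by the type field."""
    if op == "scale2":               # toy stand-in for one operation type
        return [2 * v for v in data]
    raise NotImplementedError(op)

memory = list(range(243))            # the 9x9x3 image of Fig. 3 at addresses 0..242
insn = {"op": "scale2", "src_desc": (3, 3, 3), "src_addr": (30, (1, 9, 81))}
op, sizes, addressing = parse(insn)
result = execute(op, load_source(memory, sizes, addressing))
assert len(result) == 27
assert result[0] == 60               # element at address 30, doubled
```

Storing `result` back through a target-data descriptor, as unit 520 does below, would complete the round trip.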
In one example, in the tensor operation device 500, the tensor operation instruction further includes a target data structure description field for describing the structure of the target data, and a target data addressing field for describing the storage address of the target data in the storage space.
In one example, in the tensor operation device 500, the data access unit 520 is further configured to store, as target data, an operation result of the source data, into a storage address defined by the target data addressing field according to a data structure defined by the target data structure description field.
In one example, in the tensor operation device 500, the operation types include: data handling, which comprises data loading, data storage, and data movement; and data operations, which comprise addition, multiplication, reordering, scaling, convolution, and pooling.
In one example, in the tensor operation device 500, the tensor operation instruction includes one or more source data structure description fields, one or more source data addressing fields corresponding to the one or more source data structure description fields, one or more target data structure description fields, and one or more target data addressing fields corresponding to the one or more target data structure description fields.
In one example, in the tensor operation device 500, the source data structure description field includes a source data dimension, a size of the source data in each dimension, and a source data type length; the source data addressing field includes a start address, a dimension storage order, and a storage interval of each dimension; the target data structure description field includes a target data dimension, a size of the target data in each dimension, and a target data type length; and the target data addressing field likewise includes a start address, a dimension storage order, and a storage interval of each dimension.
In one example, the tensor computing device 500 further includes: the retrieving unit 540 is configured to retrieve the tensor operation instruction from the memory for storing the tensor operation instruction.
In one example, the tensor computing device 500 further includes: the buffer unit 550 is configured to store the parsed tensor operation instruction in a buffer queue.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above tensor operation device 500 have been described in detail in the above description with reference to fig. 1 to 5, and thus, repetitive descriptions thereof will be omitted.
As described above, the tensor operation device 500 according to the embodiment of the present application may be implemented in various terminal apparatuses, for example, a computer for performing tensor operations. In one example, the tensor operation device 500 may be integrated into the terminal device as a software module and/or a hardware module. For example, the tensor operation device 500 may be a software module in the operating system of the terminal device, or an application developed for the terminal device; of course, the tensor operation device 500 may also be one of a plurality of hardware modules of the terminal device.
Alternatively, in another example, the tensor operation device 500 and the terminal device may be separate devices, and the tensor operation device 500 may be connected to the terminal device through a wired and/or wireless network and transmit the interaction information according to an agreed data format.
Exemplary computer
Fig. 7 illustrates a block diagram of a computer in accordance with an embodiment of the present application.
As shown in fig. 7, the computer 10 includes a processor 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or another form of processing unit having data processing and/or instruction execution capabilities, such as a Graphics Processing Unit (GPU) or other types of processing units, and may control other components in the computer 10 to perform desired functions.
The memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, a Random Access Memory (RAM) and/or a cache. The non-volatile memory may include, for example, a Read-Only Memory (ROM), a hard disk, a flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the tensor operation method of the various embodiments of the present application described above and/or other desired functions. Various contents such as source data and target data may also be stored in the computer-readable storage medium.
In one example, computer 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
For example, the input device 13 may include, for example, a keyboard, a mouse, and the like.
The output device 14 may output various information to the outside, including the result of tensor operation and the like. The output means 14 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, for simplicity, only some of the components of the computer 10 relevant to the present application are shown in fig. 7; components such as buses and input/output interfaces are omitted. In addition, the computer 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in a tensor operation method according to the various embodiments of the application described in the "exemplary methods" section of this specification.
The computer program product may write program code for performing operations of embodiments of the present application in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform the steps in the tensor operation method according to the various embodiments of the present application described in the above "exemplary method" section of the present specification.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disk Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be considered as essential to the various embodiments of the present application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, and systems referred to in the present application are only illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended, mean "including but not limited to," and may be used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatus, devices and methods of the present application, the components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered as equivalent aspects of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (9)

1. A tensor operation method, comprising:
receiving a tensor operation instruction, wherein a tensor to which the tensor operation instruction applies comprises: a one-dimensional tensor, a two-dimensional tensor, a three-dimensional tensor, a four-dimensional tensor, or a tensor of higher dimensions; the tensor operation instruction comprises: an instruction type field for describing an operation type, one or more source data structure description fields for correspondingly describing structures of one or more source data to be operated on, one or more source data addressing fields, corresponding to the one or more source data structure description fields, for describing storage addresses of the one or more source data within a storage space, a target data structure description field for describing a structure of target data, and a target data addressing field for describing a storage address of the target data within the storage space; wherein the source data structure description field comprises a source data dimension, a size of the source data in each dimension, and a source data type length, the target data structure description field comprises a target data dimension, a size of the target data in each dimension, and a target data type length, and the storage address comprises a start address, a dimension storage order, and a storage interval of each dimension;
analyzing the tensor operation instruction;
acquiring each source data to be operated from a storage space according to the source data structure description field and the source data addressing field corresponding to each source data to be operated; and
performing an operation defined by the instruction type field on the retrieved source data;
and taking the operation result as target data, and storing it, according to the data structure defined by the target data structure description field, into the storage address defined by the target data addressing field.
2. The tensor operation method of claim 1, wherein the operation type includes:
data handling, wherein the data handling comprises data loading, data storage, and data movement; and
data operations including addition, multiplication, reordering, scaling, convolution, and pooling.
3. The tensor operation method of claim 1, wherein the tensor operation instruction includes: one or more target data structure description fields, and one or more target data addressing fields corresponding to the one or more target data structure description fields.
4. A tensor operation method as claimed in claim 3, wherein each of said source data addressing field and said target data addressing field comprises an immediate representing a memory address or a register number indicating a register in which a memory address is stored.
5. The tensor operation method of claim 1, wherein after parsing the tensor operation instruction and before retrieving each source data to perform an operation, the tensor operation method further comprises:
and storing the tensor operation instruction after the analysis is completed in a cache queue.
6. A tensor operation device comprising:
an acquisition unit for acquiring a tensor operation instruction, wherein a tensor to which the tensor operation instruction applies comprises: a one-dimensional tensor, a two-dimensional tensor, a three-dimensional tensor, a four-dimensional tensor, or a tensor of higher dimensions;
a parsing unit for parsing the acquired tensor operation instruction, wherein the tensor operation instruction comprises: an instruction type field for describing an operation type, one or more source data structure description fields for correspondingly describing structures of one or more source data to be operated on, one or more source data addressing fields, corresponding to the one or more source data structure description fields, for describing storage addresses of the one or more source data within a storage space, a target data structure description field for describing a structure of target data, and a target data addressing field for describing a storage address of the target data within the storage space; wherein the source data structure description field comprises a source data dimension, a size of the source data in each dimension, and a source data type length, the target data structure description field comprises a target data dimension, a size of the target data in each dimension, and a target data type length, and the storage address comprises a start address, a dimension storage order, and a storage interval of each dimension;
a data access unit, configured to obtain, from a storage space, each source data to be operated according to the source data structure description field and the source data addressing field corresponding to each source data to be operated; and
a calculation execution unit for executing the operation defined by the instruction type field on the acquired source data;
the data access unit is further configured to store the operation result as target data, according to a data structure defined by the target data structure description field, into a storage address defined by the target data addressing field.
7. The tensor computing device of claim 6, further comprising:
and the caching unit is used for storing the analyzed tensor operation instruction in a cache queue.
8. A computer, comprising:
a processor; and
a memory in which computer program instructions are stored which, when executed by the processor, cause the processor to perform the tensor operation method of any one of claims 1 to 5.
9. A computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the tensor operation method of any of claims 1-5.
CN201811109603.XA 2018-09-21 2018-09-21 Tensor operation method and device Active CN110941789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811109603.XA CN110941789B (en) 2018-09-21 2018-09-21 Tensor operation method and device


Publications (2)

Publication Number Publication Date
CN110941789A CN110941789A (en) 2020-03-31
CN110941789B true CN110941789B (en) 2023-12-15

Family

ID=69904576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811109603.XA Active CN110941789B (en) 2018-09-21 2018-09-21 Tensor operation method and device

Country Status (1)

Country Link
CN (1) CN110941789B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11275671B2 (en) * 2020-07-27 2022-03-15 Huawei Technologies Co., Ltd. Systems, methods and media for dynamically shaped tensors using liquid types
CN114489790A (en) * 2020-11-13 2022-05-13 中科寒武纪科技股份有限公司 Data processing device, data processing method and related product
CN114218152B (en) * 2021-12-06 2023-08-15 海飞科(南京)信息技术有限公司 Stream processing method, processing circuit and electronic equipment
CN115599442B (en) * 2022-12-14 2023-03-10 成都登临科技有限公司 AI chip, electronic equipment and tensor processing method
CN115658146B (en) * 2022-12-14 2023-03-31 成都登临科技有限公司 AI chip, tensor processing method and electronic equipment
CN116202760B (en) * 2023-05-05 2023-08-18 赛腾机电科技(常州)有限公司 Singular value decomposition method and system for third-order tensor for mechanical fault diagnosis
CN116861149B (en) * 2023-09-05 2024-01-09 之江实验室 Convolution operation optimization method, device and processor

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4633389A (en) * 1982-02-03 1986-12-30 Hitachi, Ltd. Vector processor system comprised of plural vector processors
JPH09198374A (en) * 1996-01-23 1997-07-31 Hitachi Ltd Vector processor
US6212622B1 (en) * 1998-08-24 2001-04-03 Advanced Micro Devices, Inc. Mechanism for load block on store address generation
CN106991077A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of matrix computations device
CN106990940A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of vector calculation device
CN107111489A (en) * 2014-11-14 2017-08-29 英特尔公司 Morton Coordinate Adjusting processor, method, system and instruction
CN107315574A (en) * 2016-04-26 2017-11-03 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing matrix multiplication
CN107608715A (en) * 2017-07-20 2018-01-19 上海寒武纪信息科技有限公司 For performing the device and method of artificial neural network forward operation
CN107766079A (en) * 2016-08-19 2018-03-06 北京百度网讯科技有限公司 Processor and the method for execute instruction on a processor
CN107977231A (en) * 2017-12-15 2018-05-01 北京中科寒武纪科技有限公司 A kind of computational methods and Related product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10275243B2 (en) * 2016-07-02 2019-04-30 Intel Corporation Interruptible and restartable matrix multiplication instructions, processors, methods, and systems


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Quantitative Performance Analysis Model of GPU Matrix Multiplication (GPU矩阵乘法的性能定量分析模型); Yin Mengjia (尹孟嘉); Computer Science (《计算机科学》); pp. 13-17 *

Also Published As

Publication number Publication date
CN110941789A (en) 2020-03-31

Similar Documents

Publication Publication Date Title
CN110941789B (en) Tensor operation method and device
EP3451162B1 (en) Device and method for use in executing matrix multiplication operations
EP3832499B1 (en) Matrix computing device
EP3407182B1 (en) Vector computing device
EP3451163A1 (en) Device and method for use in executing matrix addition/subtraction operations
EP3627437A1 (en) Data screening device and method
US11010338B2 (en) Data screening device and method
US9037798B2 (en) System and method of operating a computing device to perform memoization including transforming input/output parameters to reduce redundancies and efficiently cache data
KR20200143685A (en) Method and accelerator device for accelerating computation
US20190179635A1 (en) Method and apparatus for tensor and convolution operations
US20190197761A1 (en) Texture processor based ray tracing acceleration method and system
US20130243329A1 (en) Parallel object detection method for heterogeneous multithreaded microarchitectures
EP3832500B1 (en) Device and method for performing vector four-fundamental-rule operation
US20120221788A1 (en) Multi-dimensional array manipulation
US20130166516A1 (en) Apparatus and method for comparing a first vector of data elements and a second vector of data elements
CN107315716B (en) Device and method for executing vector outer product operation
US11429872B2 (en) Accelerated decision tree execution
US20190278574A1 (en) Techniques for transforming serial program code into kernels for execution on a parallel processor
CN111651202A (en) Device for executing vector logic operation
US10049487B2 (en) Identifying duplicate indices in an input index stream
CN113688982A (en) Processing unit, related device and method
KR20210014561A (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
Li Parallel implementation of the recursive least square for hyperspectral image compression on GPUs
US11409840B2 (en) Dynamically adaptable arrays for vector and matrix operations
KR20230124598A (en) Compressed Command Packets for High Throughput and Low Overhead Kernel Initiation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant