CN117539798A - Method for data processing, computing device and computer readable storage medium - Google Patents


Info

Publication number
CN117539798A
CN117539798A (application CN202311498474.9A)
Authority
CN
China
Prior art keywords
input data
tensor
data
continuous
output data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311498474.9A
Other languages
Chinese (zh)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bi Ren Technology Co ltd
Original Assignee
Shanghai Bi Ren Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bi Ren Technology Co ltd
Priority claimed from CN202311498474.9A
Publication of CN117539798A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F 12/0646 Configuration or reconfiguration

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a method of data processing, a computing device, and a computer-readable storage medium. The method comprises: in response to one of input data in a first format and output data in a second format being a continuous tensor and the other being a discontinuous tensor, where the first format and the second format are both proprietary data formats of an artificial intelligence chip, traversing the shape of the continuous tensor; calculating the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor, based on those logical coordinates, the stride of the continuous tensor, and the stride of the discontinuous tensor; and performing a discontinuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding discontinuous tensor. In this way, the present disclosure can at least improve the overall performance of data processing on an artificial intelligence chip.

Description

Method for data processing, computing device and computer readable storage medium
Technical Field
The present disclosure relates generally to the field of information processing, and in particular, to a method, computing device, and computer readable storage medium for data processing.
Background
Artificial intelligence chips typically use a proprietary data format (also referred to as a proprietary data layout) that differs from the data format on the Host side. For example, the host-side data format is a Plain format, in which memory addresses are contiguous within each dimension.
In an artificial intelligence chip, a deformation (View) type operator can reshape a tensor in memory without moving any data: it merely converts a continuous tensor into a discontinuous tensor logically. For example, View-type operators may include the Reshape operator, the Permute operator, the Slice operator, and the like. In some modes of artificial intelligence model training (e.g., the eager mode), there may be a large number of View-type operators, which typically convert continuous tensors to discontinuous tensors only logically in order to avoid the performance overhead of frequent data movement. When the artificial intelligence model needs to use a discontinuous tensor as continuous data, a new memory region must be allocated and a discontinuous copy (Uncontiguous Copy, also called Strided Copy) operation performed to ensure that the data is continuous in memory.
In conventional data processing schemes, when a View-type operator needs to convert data in one proprietary data format into data in another proprietary format, the conversion between the proprietary format and the Plain format is typically performed by invoking a Reorder operator, and the discontinuous copy involved is performed on Plain-format data. However, artificial intelligence chips perform poorly when processing Plain-format data; moreover, invoking the Reorder operator incurs additional overhead, further reducing the overall performance of the artificial intelligence chip.
In summary, the conventional data processing solution has the following disadvantages: the discontinuous copy is performed on Plain-format data, and a Reorder operator must be invoked before and after the discontinuous copy, resulting in poor data processing performance.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a method, a computing device, and a computer-readable storage medium for data processing that can at least improve the overall performance of an artificial intelligence chip for data processing.
According to a first aspect of the present disclosure, there is provided a method of data processing, the method comprising: in response to one of input data in a first format and output data in a second format being a continuous tensor and the other being a discontinuous tensor, where both the first format and the second format are proprietary data formats of an artificial intelligence chip, traversing the Shape of the continuous tensor; calculating the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor, based on those logical coordinates, the stride (Stride) of the continuous tensor, and the stride of the discontinuous tensor; and performing a discontinuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding discontinuous tensor.
In some embodiments, the proprietary data format of the artificial intelligence chip is a format in which memory addresses are contiguous in blocks.
In some embodiments, the method further comprises: determining whether the input data and the output data are each continuous; if the input data is continuous and the output data is discontinuous, treating the input data as the continuous tensor and the output data as the discontinuous tensor; and if the input data is discontinuous and the output data is continuous, treating the input data as the discontinuous tensor and the output data as the continuous tensor.
In some embodiments, calculating the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor includes: in response to the output data being the continuous tensor and the input data being the discontinuous tensor, performing the following: calculating the logical address of the corresponding input data based on the currently traversed logical coordinates of the output data; calculating the logical coordinates of the corresponding input data based on that logical address; and calculating the physical address of the corresponding input data based on those logical coordinates.
In some embodiments, calculating the logical address of the corresponding input data based on the currently traversed logical coordinates of the output data includes: multiplying the currently traversed logical coordinates of the output data by the stride of the input data to obtain the logical address of the corresponding input data.
In some embodiments, calculating the logical coordinates of the corresponding input data based on the logical address of the corresponding input data includes: calculating the logical coordinates of the corresponding input data based on that logical address and the stride of the input data.
In some embodiments, calculating the physical address of the corresponding input data based on the logical coordinates of the corresponding input data includes: calculating a physical address offset value of the corresponding input data based on those logical coordinates and the shape of the input data; and calculating the physical address of the corresponding input data based on the physical address offset value and the memory start address of the input data.
In some embodiments, performing the discontinuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding discontinuous tensor includes: reading target data based on the physical address of the corresponding input data; and writing the read target data into the output data at the currently traversed logical coordinates of the output data.
In some embodiments, calculating the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor includes: in response to the input data being the continuous tensor and the output data being the discontinuous tensor, performing the following: calculating the logical address of the corresponding output data based on the currently traversed logical coordinates of the input data; calculating the logical coordinates of the corresponding output data based on that logical address; and calculating the physical address of the corresponding output data based on those logical coordinates.
In some embodiments, performing the discontinuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding discontinuous tensor includes: reading target data based on the currently traversed logical coordinates of the input data; and writing the read target data into the output data at the physical address of the corresponding output data.
According to a second aspect of the present disclosure, there is provided a computing device comprising: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions, when executed by the at least one processor, causing the computing device to perform the method of the first aspect of the present disclosure.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program code which, when executed, performs the method of the first aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements.
Fig. 1 illustrates a schematic diagram of a conventional data processing procedure.
Fig. 2 illustrates a flow chart of a method of data processing according to an embodiment of the present disclosure.
FIG. 3 illustrates a flowchart of a method of calculating the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor, in accordance with an embodiment of the present disclosure.
Fig. 4A illustrates exemplary input data according to an embodiment of the present disclosure.
Fig. 4B illustrates an exemplary physical arrangement of input data on a memory according to an embodiment of the present disclosure.
Fig. 4C illustrates exemplary output data according to an embodiment of the present disclosure.
Fig. 4D illustrates an exemplary physical arrangement of output data on a memory according to an embodiment of the present disclosure.
FIG. 5 illustrates a flowchart of a method of calculating the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor, in accordance with an embodiment of the present disclosure.
Fig. 6 schematically illustrates a block diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The term "comprising" and variations thereof as used herein mean open-ended inclusion, i.e., "including but not limited to". The term "or" means "and/or" unless specifically stated otherwise. The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also appear below.
Fig. 1 illustrates a schematic diagram of a conventional data processing procedure. As shown in fig. 1, a View-type operator in an artificial intelligence chip converts input data in a four-dimensional Activation format into output data in a Matrix format, where the Activation format and the Matrix format are both proprietary data formats of the artificial intelligence chip, by performing the following steps: converting the input data in the Activation format into first intermediate data in the Plain format by reordering (e.g., invoking the Reorder operator); modifying a stride (not shown in fig. 1) associated with the first intermediate data according to the semantics of the View-type operator; performing a discontinuous copy to convert the first intermediate data in the Plain format into second intermediate data in the Plain format; and converting the second intermediate data in the Plain format into the output data in the Matrix format by reordering (e.g., invoking the Reorder operator).
In the conventional scheme shown in fig. 1, the discontinuous copy is performed on Plain-format data, and the Reorder operator must be invoked both before and after the discontinuous copy, resulting in poor performance.
To at least partially address one or more of the above problems, as well as other potential problems, example embodiments of the present disclosure propose a solution for data processing. In the embodiments of the present disclosure, the shape of the continuous tensor is traversed; the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor is calculated based on those logical coordinates, the stride of the continuous tensor, and the stride of the discontinuous tensor; and the discontinuous copy is performed based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding discontinuous tensor. The discontinuous copy is thus performed directly on data in the proprietary data format of the artificial intelligence chip, and no Reorder operator is needed before or after the copy, so that the overall performance of data processing on the artificial intelligence chip can be at least improved.
The present disclosure is illustrated by the following several specific examples. Detailed descriptions of known functions and known components may be omitted for the sake of clarity and conciseness in the following description of the embodiments of the present disclosure. When any element of an embodiment of the present disclosure appears in more than one drawing, the element is identified by the same reference numeral in each drawing.
Fig. 2 illustrates a flow chart of a method 200 of data processing according to an embodiment of the present disclosure. The method 200 may be performed by an artificial intelligence chip or by the electronic device 600 shown in fig. 6. It should be understood that method 200 may also include additional blocks not shown and/or that the blocks shown may be omitted, the scope of the disclosure being not limited in this respect.
It should be noted that the artificial intelligence chip used to implement the embodiments of the present disclosure may depend on the actual situation, and the present disclosure is not limited in this respect. For example, the artificial intelligence chip may be a GPU or GPGPU used for deep learning. As another example, the artificial intelligence chip may have one or more processing units, which may include special-purpose processing units such as GPGPUs, GPUs, FPGAs, and ASICs.
In the embodiment shown in fig. 2, in response to one of the input data in the first format and the output data in the second format being a continuous tensor and the other of the input data in the first format and the output data in the second format being a discontinuous tensor, wherein both the first format and the second format are proprietary data formats of the artificial intelligence chip, the artificial intelligence chip may proceed to steps 202 through 206.
A tensor is continuous when the storage order of its underlying one-dimensional array elements in memory matches the order of its elements flattened to one dimension in row-major order. For an example, reference may be made to the embodiments described later in connection with fig. 4C and 4D, which are not repeated here.
A tensor is discontinuous when the storage order of its underlying one-dimensional array elements in memory does not match the order of its elements flattened to one dimension in row-major order. For an example, reference may be made to the embodiments described later in connection with fig. 4A and 4B, which are not repeated here.
The proprietary data format of the artificial intelligence chip refers to a data format specifically designed for that chip, distinct from the Plain format. For example, the proprietary data format is a format in which memory addresses are contiguous in blocks. Proprietary data formats of an artificial intelligence chip may include the Matrix format, the Activation format, the convolution weight (ConvWeight) format, the two-dimensional vector (Vector) format, and the depthwise convolution weight (DWCWeight) format, among others. It should be noted that the specific proprietary data format may depend on the actual situation, and the embodiments of the present disclosure are not limited in this respect.
At step 202, the shape of the continuous tensor is traversed.
The shape of a tensor refers to the number of data elements along each dimension of the tensor. For examples, reference may be made to the embodiments described later in connection with fig. 4A and 4C, which are not repeated here.
Traversing the shape of the continuous tensor refers to visiting each element of the continuous tensor according to its shape. It should be noted that the manner, order, granularity, etc. of the traversal may depend on the actual situation, and embodiments of the present disclosure are not limited in this respect. For example, traversing the shape of the continuous tensor may include traversing it at a memory-friendly granularity.
At step 204, the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor is calculated based on those logical coordinates, the stride of the continuous tensor, and the stride of the discontinuous tensor.
Logical coordinates refer to the semantic coordinates of an element within a tensor. For examples, reference may be made to the embodiments described later in connection with fig. 3, 4A and 4C, which are not repeated here.
The stride refers to the number of elements a tensor must skip in memory to reach the next element along each dimension. For examples, reference may be made to the embodiments described later in connection with fig. 3 and fig. 4A to 4D, which are not repeated here.
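As an illustrative sketch (the function name and the row-major assumption are ours, not the patent's), the stride of a continuous tensor can be derived from its shape as follows:

```python
def row_major_strides(shape):
    # Elements skipped in the underlying buffer when advancing one
    # position along each dimension, for a row-major (continuous) layout.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

# A continuous [3, 4] tensor has stride [4, 1]: advancing one row skips
# 4 elements, advancing one column skips 1 element.
print(row_major_strides([3, 4]))  # [4, 1]
```

A discontinuous tensor is one whose stride deviates from this shape-derived pattern, for example after a Permute.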
A physical address refers to the address of a tensor element in memory. For examples, reference may be made to the embodiments described later in connection with fig. 4B and 4D, which are not repeated here.
For examples of the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor, reference may be made to the embodiments described later in connection with fig. 3 and fig. 4A to 4D, or the embodiments described later in connection with fig. 5, which are not repeated here.
At step 206, the discontinuous copy is performed based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding discontinuous tensor.
A discontinuous copy refers to the artificial intelligence chip allocating a new memory region and converting between continuous and discontinuous tensors by copying.
For examples of performing the discontinuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding discontinuous tensor, reference may likewise be made to the embodiments described later in connection with fig. 3 and fig. 4A to 4D, or fig. 5, which are not repeated here.
In the embodiments of the present disclosure, the shape of the continuous tensor is traversed; the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor is calculated based on those logical coordinates, the stride of the continuous tensor, and the stride of the discontinuous tensor; and the discontinuous copy is performed based on the currently traversed logical coordinates and the calculated physical address. The discontinuous copy is thus performed directly on data in the proprietary data format of the artificial intelligence chip, with no Reorder operator needed before or after the copy, so that the overall performance of data processing on the artificial intelligence chip can be at least improved.
It should be noted that the embodiments of the present disclosure do not limit which of the input data in the first format and the output data in the second format is the continuous tensor and which is the discontinuous tensor. For example, in some embodiments, the method of data processing further comprises: determining whether the input data and the output data are each continuous; if the input data is continuous and the output data is discontinuous, treating the input data as the continuous tensor and the output data as the discontinuous tensor; and if the input data is discontinuous and the output data is continuous, treating the input data as the discontinuous tensor and the output data as the continuous tensor.
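A minimal sketch of this continuity check, assuming row-major semantics as in the later figures (the function name is ours): a tensor is continuous exactly when its stride matches the stride implied by its shape.

```python
def is_continuous(shape, stride):
    # A tensor is continuous when its stride equals the row-major stride
    # implied by its shape (innermost dimension has stride 1, and so on).
    expected = 1
    for size, s in zip(reversed(shape), reversed(stride)):
        if s != expected:
            return False
        expected *= size
    return True

print(is_continuous([3, 4], [4, 1]))  # True, like the output tensor of fig. 4C
print(is_continuous([4, 3], [1, 3]))  # False, like the input tensor of fig. 4A
```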
An exemplary method of calculating the physical address of the discontinuous tensor corresponding to the logical coordinates currently traversed by the continuous tensor is described below in connection with fig. 3 and 4A-4D.
FIG. 3 illustrates a flowchart of a method 300 of calculating the physical address of the discontinuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor, in accordance with an embodiment of the present disclosure. The method 300 may be performed by an artificial intelligence chip or by the electronic device 600 shown in fig. 6. It should be understood that method 300 may also include additional blocks not shown and/or that the blocks shown may be omitted, the scope of the disclosure being not limited in this respect.
Fig. 4A illustrates exemplary input data according to an embodiment of the present disclosure. Fig. 4B illustrates an exemplary physical arrangement of input data on a memory according to an embodiment of the present disclosure. Fig. 4C illustrates exemplary output data according to an embodiment of the present disclosure. Fig. 4D illustrates an exemplary physical arrangement of output data on a memory according to an embodiment of the present disclosure.
In the examples shown in fig. 4A to 4D, the input data is a discontinuous tensor and the output data is a continuous tensor. The shape of the input data is [4,3], i.e., a matrix of 4 rows and 3 columns; the shape of the output data is [3,4], i.e., a matrix of 3 rows and 4 columns. The stride of the input data is [1,3], meaning, for example, that data is accessed by moving 3 elements at a time along the column dimension; the stride of the output data is [4,1], meaning that data is accessed by moving 4 elements at a time along the row dimension. Illustratively, for a stride of [n, m], advancing by one position along the first dimension moves n elements in the underlying data, and advancing by one position along the second dimension moves m elements.
In addition, in this disclosure, a tensor (or input data, or output data) of shape [n, m, p] has n data elements in its first dimension, m data elements in its second dimension, and p data elements in its third dimension, and therefore contains n×m×p data elements in total. If one of the data elements has coordinates [q, r, s], this characterizes the position of that element within the tensor: its coordinate value in the first dimension is q, in the second dimension r, and in the third dimension s.
In the embodiment depicted in FIG. 3, the artificial intelligence chip may proceed to steps 302 through 306 in response to the output data being a continuous tensor and the input data being a discontinuous tensor.
At step 302, the logical address of the corresponding input data is calculated based on the currently traversed logical coordinates of the output data.
For example, the currently traversed logical coordinates of the output data are multiplied by the stride of the input data to obtain the logical address of the corresponding input data.
For example, in connection with the embodiments described with fig. 4A to 4D, assume that the currently traversed logical coordinates of the output data are [2,1]. In this case, the currently traversed output element is the data "9" in the hatched portion of fig. 4C, and the corresponding input element is the data "9" in the hatched portion of fig. 4A. The logical address of the corresponding input data can be calculated based on the following formula (1):
logical address of input data = output logical coordinate[0] × input stride[0] + output logical coordinate[1] × input stride[1]   (1)
For example, based on formula (1), the logical address of the corresponding input data is 2 × 1 + 1 × 3 = 5.
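Formula (1) is a dot product of the output's logical coordinates with the input's stride. A small sketch (the helper name is ours, generalized beyond two dimensions):

```python
def logical_address(out_coord, in_stride):
    # Formula (1): sum over dimensions of out_coord[i] * in_stride[i]
    return sum(c * s for c, s in zip(out_coord, in_stride))

# Output coordinates [2, 1] with input stride [1, 3], as in figs. 4A-4D:
print(logical_address([2, 1], [1, 3]))  # 2*1 + 1*3 = 5
```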
At step 304, the logical coordinates of the corresponding input data are calculated based on its logical address.
For example, the logical coordinates of the corresponding input data are calculated based on that logical address and the stride of the input data.
For example, in the embodiments described in connection with fig. 4A to 4D, assume that the logical address of the corresponding input data is 5. The logical coordinates of the corresponding input data can be calculated based on the following formulas (2) and (3):
input logical coordinate[0] = logical address of input data / input stride[1]   (2)
input logical coordinate[1] = logical address of input data - input logical coordinate[0] × input stride[1]   (3)
For example, based on formulas (2) and (3), the logical coordinates of the corresponding input data are [5 / 3 = 1, 5 - 1 × 3 = 2], i.e., [1,2], where the division is integer division.
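Formulas (2) and (3) can be sketched for the two-dimensional case as follows; the integer division mirrors the 5 / 3 = 1 step of the example, and the helper name is ours:

```python
def logical_coords_2d(addr, in_stride):
    # Formula (2): first coordinate via integer division by stride[1]
    c0 = addr // in_stride[1]
    # Formula (3): remainder after removing c0's contribution
    c1 = addr - c0 * in_stride[1]
    return [c0, c1]

# Logical address 5 with input stride [1, 3], as in figs. 4A-4D:
print(logical_coords_2d(5, [1, 3]))  # [1, 2]
```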
At step 306, the physical address of the corresponding input data is calculated based on its logical coordinates.
For example, a physical address offset value of the corresponding input data is calculated based on its logical coordinates and the shape of the input data; the physical address of the corresponding input data is then calculated based on the physical address offset value and the memory start address of the input data.
For example, in the embodiments described in connection with fig. 4A to 4D, it is assumed that the logical coordinates of the corresponding input data are [1,2]. The physical address offset value of the corresponding input data may be calculated based on the following formula (4):
physical address offset value of input data = logical coordinate [0] of input data * shape [1] of input data + logical coordinate [1] of input data    formula (4)
For example, based on formula (4), the physical address offset value of the corresponding input data may be calculated as 1*3+2=5. By offsetting the memory start address of the input data by this physical address offset value 5, the physical address of the corresponding input data "9" is obtained, as shown in fig. 4B.
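Formula (4) plus the memory start address can be sketched as a row-major offset computation (illustrative; the function name and base addresses are assumptions, not part of the disclosure):

```python
def input_physical_address(coords, in_shape, base_address):
    """Formula (4): offset = coordinate [0] * shape [1] + coordinate [1];
    the physical address is the memory start address plus this offset."""
    offset = coords[0] * in_shape[1] + coords[1]
    return base_address + offset

# Worked example: coordinates [1, 2], input shape [2, 3] -> offset 1*3 + 2 = 5.
print(input_physical_address([1, 2], [2, 3], base_address=0))  # 5
```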
In the embodiment described in connection with fig. 3 and fig. 4A to 4D, performing the non-continuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding non-continuous tensor includes, for example: reading the target data based on the physical address of the corresponding input data; and writing the read target data into the output data according to the currently traversed logical coordinates of the output data.
For example, writing the read target data into the output data according to the currently traversed logical coordinates of the output data includes: calculating the physical address of the currently traversed output data from the currently traversed logical coordinates of the output data; and writing the read target data to the physical address of the currently traversed output data, as shown in fig. 4D. It should be noted that the physical address of the currently traversed output data may be calculated in a manner similar to that used for the physical address of the input data, and is not described again here.
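The read/write loop described above can be sketched end to end for the 2-dimensional case (a hypothetical illustration: a flat Python list stands in for device memory, the memory start address is taken as 0, and the shapes and step lengths are assumptions consistent with the worked example):

```python
def noncontiguous_to_contiguous_copy(src, in_shape, in_steps, out_shape):
    """Traverse the shape of the continuous output tensor; for each
    currently traversed output coordinate, compute the physical address
    of the non-continuous input via formulas (1)-(4) and copy the element."""
    out = [None] * (out_shape[0] * out_shape[1])
    for c0 in range(out_shape[0]):
        for c1 in range(out_shape[1]):
            addr = c0 * in_steps[0] + c1 * in_steps[1]   # formula (1)
            i0 = addr // in_steps[1]                     # formula (2)
            i1 = addr - i0 * in_steps[1]                 # formula (3)
            phys = i0 * in_shape[1] + i1                 # formula (4), base address 0
            out[c0 * out_shape[1] + c1] = src[phys]      # read, then write contiguously
    return out

# A [2, 3] row-major buffer read through step lengths [1, 3] yields its
# transpose as a continuous [3, 2] output.
print(noncontiguous_to_contiguous_copy([0, 1, 2, 3, 4, 5], [2, 3], [1, 3], [3, 2]))
# [0, 3, 1, 4, 2, 5]
```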
Fig. 5 illustrates a flowchart of a method 500 of calculating the physical address of the non-continuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor, in accordance with an embodiment of the present disclosure. The method 500 may be performed by an artificial intelligence chip or by the electronic device 600 shown in fig. 6. It should be understood that method 500 may also include additional blocks not shown and/or omit blocks that are shown; the scope of the disclosure is not limited in this respect.
In the embodiment described in connection with fig. 5, in response to the input data being a continuous tensor and the output data being a non-continuous tensor, the artificial intelligence chip may perform steps 502 to 506.
In step 502, a logical address of the corresponding output data is calculated based on the logical coordinates currently traversed by the input data.
In step 504, based on the logical address of the corresponding output data, the logical coordinates of the corresponding output data are calculated.
In step 506, a physical address of the corresponding output data is calculated based on the logical coordinates of the corresponding output data.
It should be noted that the specific implementation of steps 502 to 506 in fig. 5 may be similar to that of steps 302 to 306 in fig. 3 described above, with the roles of the input data and the output data exchanged, and is therefore not described again here.
In the embodiment described in connection with fig. 5, performing the non-continuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding non-continuous tensor includes: reading the target data based on the currently traversed logical coordinates of the input data; and writing the read target data into the output data according to the physical address of the output data. The specific implementation may be similar to that of the embodiment described in connection with fig. 3 and fig. 4A to 4D, and is not described again here.
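The reverse case, with the input continuous and the output non-continuous, can be sketched symmetrically (illustrative code; the names, the base address of 0, and the example values are our assumptions):

```python
def contiguous_to_noncontiguous_copy(src, in_shape, out_steps, out_shape):
    """Traverse the continuous input tensor; for each currently traversed
    input coordinate, compute the physical address of the non-continuous
    output (analogous to steps 502 to 506) and write the read element there."""
    out = [None] * (out_shape[0] * out_shape[1])
    for c0 in range(in_shape[0]):
        for c1 in range(in_shape[1]):
            target = src[c0 * in_shape[1] + c1]           # read by input logical coords
            addr = c0 * out_steps[0] + c1 * out_steps[1]  # logical address of output
            o0 = addr // out_steps[1]                     # logical coords of output
            o1 = addr - o0 * out_steps[1]
            phys = o0 * out_shape[1] + o1                 # physical address, base address 0
            out[phys] = target
    return out

# Writing a continuous [3, 2] tensor back through step lengths [1, 3]
# reconstructs the original [2, 3] row-major layout.
print(contiguous_to_noncontiguous_copy([0, 3, 1, 4, 2, 5], [3, 2], [1, 3], [2, 3]))
# [0, 1, 2, 3, 4, 5]
```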
Additionally, in some embodiments of the present disclosure, a computing device is presented, comprising: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions, when executed by the at least one processor, causing the computing device to perform the method 200, the method 300, and the method 500 described above.
Additionally, in some embodiments of the present disclosure, a computer readable storage medium is presented, having computer program code stored thereon which, when executed, performs the methods 200, 300, and 500 described above.
Fig. 6 schematically illustrates a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure. Electronic device 600 may be used to perform method 200, method 300, and method 500. As shown in fig. 6, the electronic device 600 includes a central processing unit (i.e., CPU 602) that can perform various suitable actions and processes according to computer program instructions stored in a read-only memory (i.e., ROM 604) or computer program instructions loaded from a storage unit 616 into a random access memory (i.e., RAM 606). In the RAM 606, various programs and data required for the operation of the electronic device 600 may also be stored. The CPU 602, ROM 604, and RAM 606 are connected to each other by a bus 608. An input/output interface (i.e., I/O interface 610) is also connected to bus 608.
Various components in the electronic device 600 are connected to the I/O interface 610, including the input unit 612, the output unit 614, and the storage unit 616. The CPU 602 performs the respective methods and processes described above, for example, the methods 200, 300, and 500. For example, in some embodiments, the methods 200, 300, and 500 may be implemented as computer software programs stored on a machine-readable medium, such as the storage unit 616. In some embodiments, some or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 604 and/or the communication unit 618. When the computer program is loaded into the RAM 606 and executed by the CPU 602, one or more of the operations of the method 200, the method 300, and the method 500 described above may be performed. Alternatively, in other embodiments, the CPU 602 may be configured to perform one or more actions of the method 200, the method 300, and the method 500 in any other suitable manner (e.g., by means of firmware).
It should be further appreciated that the present invention can be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to respective computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, which electronic circuitry can execute the computer readable program instructions.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor in a voice interaction device, a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvements in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A method of data processing, the method comprising:
in response to one of input data in a first format and output data in a second format being a continuous tensor, and the other of the input data in the first format and the output data in the second format being a non-continuous tensor, wherein the first format and the second format are both proprietary data formats of an artificial intelligence chip:
traversing the shape of the continuous tensor;
calculating a physical address of the non-continuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor, based on the currently traversed logical coordinates of the continuous tensor, the step length of the continuous tensor, and the step length of the non-continuous tensor; and
performing a non-continuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding non-continuous tensor.
2. The method of claim 1, wherein the private data format of the artificial intelligence chip is a block-wise sequential format of memory addresses.
3. The method according to claim 1, wherein the method further comprises:
determining whether the input data and the output data are continuous;
if the input data is continuous and the output data is non-continuous, determining the input data as the continuous tensor and the output data as the non-continuous tensor; and
if the input data is non-continuous and the output data is continuous, determining the input data as the non-continuous tensor and the output data as the continuous tensor.
4. The method of claim 1, wherein calculating the physical address of the non-continuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor comprises:
in response to the output data being the continuous tensor and the input data being the non-continuous tensor, performing the following:
calculating a logical address of the corresponding input data based on the currently traversed logical coordinates of the output data;
calculating logical coordinates of the corresponding input data based on the logical address of the corresponding input data; and
calculating a physical address of the corresponding input data based on the logical coordinates of the corresponding input data.
5. The method of claim 4, wherein calculating the logical address of the corresponding input data based on the currently traversed logical coordinates of the output data comprises:
multiplying the currently traversed logical coordinates of the output data by the step lengths of the input data to obtain the logical address of the corresponding input data.
6. The method of claim 4, wherein calculating logical coordinates of the corresponding input data based on the logical address of the corresponding input data comprises:
calculating the logical coordinates of the corresponding input data based on the logical address of the corresponding input data and the step length of the input data.
7. The method of claim 4, wherein calculating the physical address of the corresponding input data based on the logical coordinates of the corresponding input data comprises:
calculating a physical address offset value of the corresponding input data based on the logical coordinates of the corresponding input data and the shape of the input data; and
calculating the physical address of the corresponding input data based on the physical address offset value and the memory start address of the input data.
8. The method of claim 7, wherein performing the non-continuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding non-continuous tensor comprises:
reading target data based on the physical address of the corresponding input data; and
writing the read target data into the output data according to the currently traversed logical coordinates of the output data.
9. The method of claim 1, wherein calculating the physical address of the non-continuous tensor corresponding to the currently traversed logical coordinates of the continuous tensor comprises:
in response to the input data being the continuous tensor and the output data being the non-continuous tensor, performing the following:
calculating a logical address of the corresponding output data based on the currently traversed logical coordinates of the input data;
calculating logical coordinates of the corresponding output data based on the logical address of the corresponding output data; and
calculating a physical address of the corresponding output data based on the logical coordinates of the corresponding output data.
10. The method of claim 9, wherein performing the non-continuous copy based on the currently traversed logical coordinates of the continuous tensor and the calculated physical address of the corresponding non-continuous tensor comprises:
reading target data based on the currently traversed logical coordinates of the input data; and
writing the read target data into the output data according to the physical address of the output data.
11. A computing device, comprising:
at least one processor; and
at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions, when executed by the at least one processor, causing the computing device to perform the method of any one of claims 1 to 10.
12. A computer readable storage medium having stored thereon computer program code which, when executed, performs the method according to any of claims 1 to 10.
CN202311498474.9A 2023-11-10 2023-11-10 Method for data processing, computing device and computer readable storage medium Pending CN117539798A (en)

Publications (1)

CN117539798A (en), published 2024-02-09

