CN114282160A - Data processing device, integrated circuit chip, equipment and implementation method thereof

Info

Publication number: CN114282160A
Authority: CN (China)
Prior art keywords: data, matrix, row, converted, data conversion
Legal status: Pending
Application number: CN202011036325.7A
Other languages: Chinese (zh)
Inventor: Not announced (不公告发明人)
Current Assignee: Cambricon Technologies Corp Ltd
Original Assignee: Cambricon Technologies Corp Ltd
Application filed by Cambricon Technologies Corp Ltd
Priority: CN202011036325.7A; US18/013,976 (US20230297270A1); EP21871059.8A (EP4220448A1); PCT/CN2021/110357 (WO2022062682A1)
Publication of CN114282160A

Abstract

The present disclosure relates to a data processing apparatus, a method, an integrated circuit chip, an electronic device, and a board card. The data processing apparatus is included in a computing apparatus, which may in turn be included in a combined processing apparatus; the combined processing apparatus may further include a general interconnect interface and other processing apparatuses. The computing device interacts with other processing devices to jointly complete computing operations specified by a user. The combined processing device may further comprise a storage device connected to the computing device and the other processing devices, respectively, for storing data of the computing device and the other processing devices. The disclosed scheme can be widely applied to various conversions of multidimensional data and improves the efficiency of data conversion.

Description

Data processing device, integrated circuit chip, equipment and implementation method thereof
Technical Field
The present disclosure relates generally to the field of data processing. More particularly, the present disclosure relates to a data processing apparatus, an integrated circuit chip, an electronic device, a board card, and a method implemented by the data processing apparatus.
Background
Operations in the field of artificial intelligence typically involve the processing of multidimensional data (e.g., two-dimensional matrices or three-dimensional arrays). Taking the processing of a two-dimensional matrix as an example, the conversion operations may include transposition, rotation, or mirroring. Such conversion operations are currently implemented with dedicated, custom matrix operation circuits. However, these matrix operation circuits are relatively complex in design, their interfaces and functions are relatively fixed, and one type of matrix operation circuit can only handle the corresponding type of matrix conversion and cannot perform multiple conversion operations on a matrix according to actual needs. Therefore, how to obtain a data processing device capable of performing conversion operations on multidimensional data is a problem to be solved in the prior art.
Further, in a computing system, an instruction set is a set of instructions for performing computations and controlling the computing system, and plays a critical role in improving the performance of a computing chip (e.g., a processor) in the computing system. Various types of computing chips (particularly those in the field of artificial intelligence) currently utilize associated instruction sets to perform various general or specific control operations and data processing operations. However, current instruction sets suffer from a number of drawbacks. For example, existing instruction sets are constrained by their hardware architectures and offer poor flexibility. Further, current instructions also leave room for improvement in the conversion of various data types, particularly in the processing of multidimensional data.
Disclosure of Invention
To address at least the technical problems noted in the background section above, and to provide a computing architecture and instruction system for efficiently processing multidimensional data, the solutions of the present disclosure will be described in several aspects below.
In a first aspect, the present disclosure provides a data processing apparatus, which includes a data cache circuit and a data conversion circuit, wherein the data cache circuit is configured to perform data caching, and the data conversion circuit is configured to perform a store and read operation on data to be converted in the data cache circuit according to a data conversion instruction, so as to implement data conversion on the data to be converted.
In a second aspect, the present disclosure provides an integrated circuit chip comprising a data processing apparatus as described in the first aspect above.
In a third aspect, the present disclosure provides an electronic device comprising an integrated circuit chip as described in the second aspect above.
In a fourth aspect, the present disclosure provides a board card comprising the integrated circuit chip as described in the third aspect above.
In a fifth aspect, the present disclosure provides a method implemented by a data processing apparatus, wherein the data processing apparatus comprises a data caching circuit and a data conversion circuit, the method comprising: performing data caching using the data caching circuitry; and using a data conversion circuit to execute a storing operation and a reading operation on the data to be converted in the data cache circuit according to the data conversion instruction so as to realize the data conversion of the data to be converted.
With the data processing apparatus, the integrated circuit chip, the electronic device, the board card and the method provided in the foregoing aspects, the scheme of the present disclosure may implement data conversion on data, for example, multidimensional data, using a data conversion instruction. Specifically, by performing the storing and reading operations on the data to be converted in the data cache circuit by using the data conversion instruction, the scheme of the disclosure can implement various operations on the multidimensional data, such as addressing, moving, and reshaping. In addition, since the foregoing data conversion operation is implemented by way of instructions, the scheme of the present disclosure reduces modifications to the hardware architecture and improves the efficiency of data conversion.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the drawings, several embodiments of the disclosure are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts and in which:
FIG. 1 is a schematic diagram illustrating a data processing apparatus according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a computing device according to an embodiment of the present disclosure;
FIGS. 3-8 are flow diagrams respectively illustrating various types of operations of a data conversion circuit according to an embodiment of the present disclosure;
FIG. 9 is a flow diagram illustrating a method implemented by a data processing apparatus according to an embodiment of the present disclosure;
FIG. 10 is a block diagram illustrating a combined processing device according to an embodiment of the present disclosure; and
FIG. 11 is a schematic structural diagram illustrating a board card according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram illustrating a data processing apparatus 100 according to an embodiment of the present disclosure. As shown in fig. 1, the data processing apparatus 100 includes a data cache circuit 102 and a data conversion circuit 104. In one embodiment, the data caching circuitry may be configured to perform data caching. In one exemplary application scenario, data suitable for buffering by the data caching circuit of the present disclosure may be multidimensional data, including, for example, tensor data. In one embodiment, the data conversion circuit may be configured to perform a store operation and a read operation on data to be converted (e.g., multidimensional data) in the aforementioned data cache circuit according to the data conversion instruction, so as to implement data conversion on the data to be converted. For example, by performing the storing and reading on the data to be converted in different manners, the scheme of the present disclosure may perform various spatial transformation operations on the data to be converted to obtain deformed data. Taking three-dimensional data as an example, a transposition, mirroring, or multi-angle (e.g., 90° or 180°) rotation operation of the three-dimensional data may be implemented using the scheme of the present disclosure. In one application scenario, when the data to be converted is a matrix to be converted (i.e., a kind of two-dimensional data), the data cache circuit may include a cache memory array for buffering the matrix data written or transformed by the data conversion circuit in a storing operation, or for transferring the matrix data to the data conversion circuit in a reading operation, so that the data conversion circuit can transfer the appropriately converted matrix data to an external memory or a computing unit.
In one embodiment, when the data to be converted is multi-dimensional data, the data conversion instruction may include data amount information and inter-dimension offset information for performing a store and read operation with respect to each dimension of the multi-dimensional data. In an example scenario, the data amount information may include the number of data to be stored and read in each dimension, and the inter-dimension offset information includes an address interval to be spanned from a current dimension to a next dimension. In another example scenario, the address interval is determined according to the number of data in the current dimension and the footprint of each data.
As an example, when the multidimensional data is data having three dimensions of length, width, and height, then for data in the length or width direction (i.e., one dimension), the data amount information may be information in terms of the number, size, and/or occupied space of each data in the length or width direction. Further, the inter-dimension shift information may be inter-dimension shift information from one-dimensional data composed of a length or width direction to two-dimensional data composed of a length and width, or inter-dimension shift information from two-dimensional data composed of a length and width to three-dimensional data composed of a length, width, and height. For example, the inter-dimension offset information may be the number of data and/or address space offsets spanned from the previous low dimension to the next high dimension.
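As a concrete illustration of how such fields could be populated (the numbers, field names, and Python representation below are illustrative assumptions, not the instruction encoding of the present disclosure), consider a length x width x height block of 4-byte elements:

    # Illustrative sketch: data amount information and inter-dimension offset information
    # for a length x width x height block of elements. All names and values are assumptions.
    ELEMENT_BYTES = 4                                   # footprint K of each basic element
    counts = {"length": 8, "width": 4, "height": 2}     # data amount information per dimension

    # Address interval spanned when stepping from the current dimension to the next one,
    # determined by the number of data in the current dimension and the footprint of each datum.
    offset_length_to_width = counts["length"] * ELEMENT_BYTES                      # next row
    offset_width_to_height = counts["length"] * counts["width"] * ELEMENT_BYTES    # next plane

    print(offset_length_to_width, offset_width_to_height)   # 32 128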
To facilitate reading and writing multidimensional data, the present disclosure proposes to define an M-dimensional counter, i.e., M one-dimensional counters with counting periods N_1, N_2, N_3, …, N_M. In counting, when the n-th counter completes one period N_n (e.g., counts from 0 to N_n), the n-th counter is reset to zero and the (n+1)-th counter is incremented by one. Based on this definition of the M-dimensional counter, the present disclosure proposes maintaining an M-dimensional read counter and read pointer, and maintaining an M-dimensional write counter and write pointer. The M-dimensional read counter can be expressed as R_cnt(i_1, i_2, i_3, …, i_M), and accordingly the read pointer can be expressed as R_p = R_addr + i_1*s_0 + i_2*s_1 + … + i_M*s_(M-1), where R_addr is the read base address and s_0 through s_(M-1) are the per-dimension address offsets. In the reading process, the M-dimensional counter R_cnt is incremented by one after each reading of R_n0 elements. Similarly, the M-dimensional write counter can be expressed as W_cnt(i_1, i_2, i_3, …, i_M), and accordingly the write pointer can be expressed as W_p = W_addr + i_1*s_0 + i_2*s_1 + … + i_M*s_(M-1), where W_addr is the write base address.
Based on the above-described M-dimensional read and write counters, the scheme of the present disclosure may implement data conversion of data to be converted by storing R_n0 elements into the data cache circuit via the data conversion circuit, and then reading W_n0 elements from the data cache circuit via the data conversion circuit. For example, the data conversion circuit may perform selective output of partial data on multidimensional data, rotate the data at an angle, mirror or transpose the data, or the like, using the aforementioned data storing and reading operations, according to a data conversion instruction.
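A minimal software model of these counters and pointers is sketched below, assuming a flat, byte-addressable cache modeled as a Python list; the function name, the example sizes, and the choice of a transpose as the demonstration are illustrative assumptions, not the hardware implementation:

    # Sketch of the M-dimensional counters: addresses are generated as
    # base + i_1*s_0 + i_2*s_1 + ...; writing with one set of counts/offsets and
    # reading with another expresses a data conversion such as a transpose.
    def traverse(base, counts, offsets):
        """Yield base + i_1*s_0 + i_2*s_1 + ... for every value of the counters."""
        def rec(dim, addr):
            if dim < 0:
                yield addr
                return
            for i in range(counts[dim]):
                yield from rec(dim - 1, addr + i * offsets[dim])
        yield from rec(len(counts) - 1, base)

    data = [1, 2, 3, 4, 5, 6]          # a 2 x 3 matrix in row-major order
    cache = [None] * 6
    W_addr, R_addr = 0, 0

    # Store phase: 3 elements per row (offset 1), 2 rows (offset 3).
    for value, w_p in zip(data, traverse(W_addr, counts=[3, 2], offsets=[1, 3])):
        cache[w_p] = value

    # Read phase: walk each column top-to-bottom (offset 3), column by column (offset 1).
    out = [cache[r_p] for r_p in traverse(R_addr, counts=[2, 3], offsets=[3, 1])]
    print(out)    # [1, 4, 2, 5, 3, 6], i.e. the transposed matrix in row-major order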
The composition and operation of the data processing apparatus of the present disclosure are described above in connection with fig. 1. Based on the above description, those skilled in the art can understand that the data processing apparatus of the present disclosure transforms multidimensional data using data conversion instructions, which improves the execution efficiency of multidimensional data conversion. In addition, by using the data conversion circuit to perform various kinds of storing and reading operations on the multidimensional data so as to convert it, the scheme of the present disclosure reduces the complexity of multidimensional data conversion operations and speeds up the conversion. The scheme of the disclosure therefore also reduces data processing overhead; in computing scenarios requiring data conversion, computing efficiency is improved and computing overhead is reduced.
Fig. 2 is a schematic diagram illustrating a computing device 200 according to an embodiment of the present disclosure. As shown in fig. 2, the computing device 200 may include the data processing device 100 described above in connection with fig. 1, i.e., the data cache circuit 102 and the data conversion circuit 104 shown in the figure. Since the data processing apparatus of the present disclosure has been described in detail above in conjunction with fig. 1, and the detailed description about the data buffer circuit 102 and the data conversion circuit 104 also applies to the computing apparatus 200, the same contents will not be described in detail.
As shown in the figure, the computing device of the present disclosure further includes a computing circuit 204 and a storage circuit 202. The computing circuitry and memory circuitry herein may be implemented in various ways depending on the application scenario. In one embodiment, the memory circuit may take the form of a memory, such as a dynamic random access memory ("DRAM") or a double data rate synchronous dynamic random access memory ("DDR SDRAM"), which may be used to store operational data required by the computing circuit to perform operations, or data for exchange with external memory. When the computing device of the present disclosure is applied to the field of artificial intelligence, the aforementioned operation data or data to be exchanged may be data of various related fields, such as various training data, network model data and parameters in machine learning, and various types of data (such as image data and the like) to be detected.
In another embodiment, the computing circuitry may take the form of a general-purpose or special-purpose processor, or a general-purpose or special-purpose processor core, which may include various types of operators and buses (e.g., a data bus, a control bus, or a broadcast bus). When the disclosed solution is applied to the field of artificial intelligence, the computing circuit can be implemented in, or included in, a single-core or multi-core deep learning processor to implement various computing operations. In one application scenario, when the computing circuitry is implemented as a processor core, it may be packaged together with the data caching circuitry and the data conversion circuitry to form a processor. In this case, the data caching circuitry may be implemented as a cache of the computing device to store the data and instructions in memory (e.g., storage circuitry 202) that are most frequently accessed by the computing circuitry, such that the computing circuitry need not read the needed data and instructions from the relatively slow memory.
Fig. 3-8 are flowcharts respectively illustrating various types of operations of a data conversion circuit according to an embodiment of the present disclosure. As described above, the data conversion circuit of the present disclosure obtains data amount information and/or offset information between dimensions of multidimensional data according to a data conversion instruction, and cooperates with the data cache circuit to implement different types of storing and reading operations on the multidimensional data, thereby implementing data conversion on the multidimensional data. Various exemplary operations will be described in detail below in conjunction with fig. 3-8.
FIG. 3 illustrates a flow 300 performed by the data conversion circuit of the present disclosure in performing store and read operations. Specifically, at step S302, the data conversion circuit may be configured to perform store and read operations in the data cache circuit on a corresponding number of data within a dimension of the data to be converted, according to the data amount information for that dimension. In other words, the data conversion circuit of the present disclosure may write one or more data to, or read one or more data from, the data cache circuit within the same dimension, thereby processing data of a specific dimension of the multidimensional data. Further, at step S304, the data conversion circuit may be configured to address to a next dimension according to the inter-dimension offset information, so as to perform store and read operations in the data cache circuit on a corresponding number of data within the next dimension. It can be seen that, in this case, after performing the store and read operations on the data of the current dimension, the data conversion circuit may use the inter-dimension offset information to perform the store and read operations on the data of the next dimension, thereby implementing conversion operations on data of a plurality of consecutive dimensions. Further, by utilizing the inter-dimension offset information, conversion operations across dimensions are also realized.
Based on the operations illustrated in fig. 3 above, in one embodiment, the data conversion instruction may further include the aforementioned store base address information ("W_addr") and read base address information ("R_addr"), wherein in performing the store and read operations, the data conversion circuit is further configured to address to the next dimension to perform a store operation according to the store base address information and inter-dimension offset information, and to address to the next dimension to perform a read operation according to the read base address information and inter-dimension offset information. It can be seen that by utilizing base address information, the data conversion circuit can more accurately and efficiently locate multidimensional data that requires both store and read operations to be performed. Furthermore, by introducing the base address information, the ways of locating multidimensional data are extended and the addressing space is enlarged. In addition, by introducing the base address information and the inter-dimension offset information, the data processing apparatus of the present disclosure may implement various types of operations on the multidimensional data, such as one or more of a bypass operation, a multi-angle rotation operation, a mirroring operation, or a sequential transformation operation of the multidimensional data, based on the data conversion instruction.
As previously noted, in one implementation scenario, when the multi-dimensional data is implemented as a two-dimensional matrix, then the data caching circuitry of the present disclosure may include a cache memory array. In one embodiment, the size of the cache memory array may be determined according to the number of rows X and the number of columns Y of the matrix to be converted and the memory space K occupied by the basic elements in the matrix. Specifically, according to the number of rows X, the number of columns Y, and the storage space K occupied by the basic elements of the matrix to be converted, the data processing apparatus of the present disclosure may set a cache storage array that matches the size of the matrix to be converted, where the storage space occupied by the basic elements of the cache storage array is greater than or equal to K, and the number of rows of the cache storage array is greater than or equal to the larger of X and Y, and the number of columns of the cache storage array is greater than or equal to the larger of X and Y. The size of the cache memory array is set to meet the requirement that the matrix to be converted can be stored in the cache memory array according to a preset access mode. For example, when X is not equal to Y, the number of rows and columns is interchanged under the transpose operation, and the number of rows and columns of the cache memory array formed in the above arrangement can support such a change in the number of rows and columns during the matrix conversion process. Of course, when X is equal to Y, the number of rows in the cache memory array is greater than or equal to any one of X and Y, and the number of columns in the cache memory array is greater than or equal to any one of X and Y.
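The sizing rule described above can be summarized in a short sketch (the function name and example values are illustrative assumptions):

    # Sketch of the cache memory array sizing rule for an X x Y matrix to be converted
    # whose basic elements each occupy K units of storage.
    def cache_array_shape(X, Y, K, array_element_size):
        if array_element_size < K:
            raise ValueError("each basic element of the cache array must hold at least K")
        side = max(X, Y)          # rows >= max(X, Y) and columns >= max(X, Y)
        return side, side         # (rows, columns) of a matching cache storage array

    print(cache_array_shape(X=3, Y=5, K=4, array_element_size=4))   # (5, 5)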
In different scenarios, the data conversion circuit of the present disclosure can cooperate with the cache memory array to execute corresponding store and read operations on a two-dimensional matrix, thereby realizing various operations on the two-dimensional matrix. According to the present disclosure, the aforementioned various types of operations may be, for example, a transpose operation shown in fig. 4, a 270° rotation operation shown in fig. 5, a 90° rotation operation shown in fig. 6, a 180° rotation operation shown in fig. 7, and a mirror operation shown in fig. 8. These operations will be described in detail below in conjunction with fig. 4-8.
Fig. 4 is a flow chart 400 illustrating matrix transposition performed by the data conversion circuit of the present disclosure according to a data conversion instruction. As shown in fig. 4, at step S402, the data conversion circuit may store each row of the matrix to be converted into a corresponding row in the cache storage array in an intra-row order according to the data conversion instruction to form an intermediate matrix. Next, at step S404, the data conversion circuit may read each column of the intermediate matrix in order from the first column to the last column of the intermediate matrix and in order within a column in the cache storage array to be output as the first row to the last row of the matrix, so as to convert the matrix to be converted into a corresponding transpose matrix.
In particular, assume that the matrix to be converted is an X×Y matrix, where X may or may not be equal to Y. The data conversion circuit may store the 1st row of the X×Y matrix, in order from its 1st basic element to its Y-th basic element, into the 1st through Y-th basic element positions of the 1st row of the cache memory array, and may repeat this operation in order from the 1st row to the X-th row of the X×Y matrix until the X-th row of the X×Y matrix has been stored, in order from its 1st to its Y-th basic element, into the 1st through Y-th basic element positions of the X-th row of the cache memory array, forming an X×Y intermediate matrix, which may be understood as the matrix to be converted copied into the cache memory array. Then, the data conversion circuit reads the 1st basic element of each row in order from the 1st row to the X-th row of the X×Y intermediate matrix, concatenates the X basic elements thus read into one row in that order, and outputs it as the 1st row of the transposed matrix; it repeats this operation in order from the 1st basic element to the Y-th basic element until the Y-th basic elements of the 1st through X-th rows of the X×Y intermediate matrix have been read in order, concatenated into one row, and output as the Y-th row of the transposed matrix, thereby forming the transposed matrix.
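The store/read pattern above can be modeled functionally in a few lines of Python (a sketch only, with lists standing in for the cache memory array; the function name and test matrix are illustrative):

    # Transpose: store rows in intra-row order, then read columns top-to-bottom as rows.
    def transpose_via_cache(matrix):
        X, Y = len(matrix), len(matrix[0])
        cache = [list(row) for row in matrix]                        # store phase (plain copy)
        return [[cache[i][j] for i in range(X)] for j in range(Y)]   # read phase

    assert transpose_via_cache([[1, 2], [3, 4], [5, 6]]) == [[1, 3, 5], [2, 4, 6]]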
Fig. 5 is a flow diagram 500 illustrating a matrix 270° rotation operation performed by the data conversion circuitry of the present disclosure pursuant to a data conversion instruction. As shown in fig. 5, at step S502, the data conversion circuit may store each row of the matrix to be converted into a corresponding row in the cache memory array in reverse order within a row according to the data conversion instruction to form an intermediate matrix. Next, at step S504, the data conversion circuit may read each column of the intermediate matrix in order from the first column to the last column of the intermediate matrix and in order within a column in the cache memory array to be output as the first row to the last row of the matrix in turn, so as to convert the matrix to be converted into a corresponding matrix rotated by 270°.
In particular, assume that the matrix to be converted is an X×Y matrix, where X may or may not be equal to Y. The data conversion circuit may store the 1st row of the X×Y matrix, in order from its Y-th basic element to its 1st basic element, into the 1st through Y-th basic element positions of the 1st row of the cache memory array, and may repeat this operation in order from the 1st row to the X-th row of the X×Y matrix until the X-th row of the X×Y matrix has been stored, in order from its Y-th basic element to its 1st basic element, into the 1st through Y-th basic element positions of the X-th row of the cache memory array, forming an X×Y intermediate matrix, which may be understood as being formed by mirroring each row of the matrix to be converted within the row. Then, the data conversion circuit reads the 1st basic element of each row in order from the 1st row to the X-th row of the X×Y intermediate matrix, concatenates the X basic elements thus read into one row in that order, and outputs it as the 1st row of the rotated matrix; it repeats this operation in order from the 1st basic element to the Y-th basic element until the Y-th basic elements of the 1st through X-th rows of the X×Y intermediate matrix have been read in order, concatenated into one row, and output as the Y-th row of the rotated matrix, thereby forming a matrix rotated by 270°.
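Modeled the same way as the transpose sketch above (illustrative names, lists standing in for the cache memory array):

    # 270-degree rotation: store each row in reverse intra-row order, then read columns
    # top-to-bottom as rows.
    def rotate_270_via_cache(matrix):
        X, Y = len(matrix), len(matrix[0])
        cache = [list(reversed(row)) for row in matrix]              # store phase (rows mirrored)
        return [[cache[i][j] for i in range(X)] for j in range(Y)]   # read phase

    assert rotate_270_via_cache([[1, 2], [3, 4], [5, 6]]) == [[2, 4, 6], [1, 3, 5]]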
Fig. 6 is a flow diagram 600 illustrating a matrix 90° rotation operation performed by the data conversion circuitry of the present disclosure pursuant to a data conversion instruction. As shown in fig. 6, at step S602, the data conversion circuit may store each row of the matrix to be converted into a corresponding row in the cache memory array in an intra-row order to form an intermediate matrix. Next, at step S604, the data conversion circuit may read each column of the intermediate matrix in order from the first column to the last column of the intermediate matrix and in reverse order within the column in the cache memory array to be output as the first row to the last row of the matrix in order to convert the matrix to be converted into a corresponding matrix rotated by 90°.
In particular, assume that the matrix to be converted is an X×Y matrix, where X may or may not be equal to Y. The data conversion circuit may store the 1st row of the X×Y matrix, in order from its 1st basic element to its Y-th basic element, into the 1st through Y-th basic element positions of the 1st row of the cache memory array, and may repeat this operation in order from the 1st row to the X-th row of the X×Y matrix until the X-th row of the X×Y matrix has been stored, in order from its 1st to its Y-th basic element, into the 1st through Y-th basic element positions of the X-th row of the cache memory array, forming an X×Y intermediate matrix, which may be understood as the matrix to be converted copied into the cache memory array. Then, the data conversion circuit reads the 1st basic element of each row in order from the X-th row to the 1st row of the X×Y intermediate matrix, concatenates the X basic elements thus read into one row in that order, and outputs it as the 1st row of the rotated matrix; it repeats this operation in order from the 1st basic element to the Y-th basic element until the Y-th basic elements, read in order from the X-th row to the 1st row of the X×Y intermediate matrix, have been concatenated into one row and output as the Y-th row of the rotated matrix, thereby forming a matrix rotated by 90°.
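The corresponding sketch (same illustrative modeling assumptions as above):

    # 90-degree rotation: store rows in intra-row order, then read columns in reverse
    # intra-column order (bottom-to-top) as rows.
    def rotate_90_via_cache(matrix):
        X, Y = len(matrix), len(matrix[0])
        cache = [list(row) for row in matrix]                                    # store phase
        return [[cache[i][j] for i in range(X - 1, -1, -1)] for j in range(Y)]   # read phase

    assert rotate_90_via_cache([[1, 2], [3, 4], [5, 6]]) == [[5, 3, 1], [6, 4, 2]]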
Fig. 7 is a flow diagram 700 illustrating a matrix 180° rotation operation performed by the data conversion circuitry of the present disclosure pursuant to a data conversion instruction. As shown in fig. 7, at step S702, the data conversion circuit may store each row of the matrix to be converted into a corresponding row in the cache memory array in reverse order within the row to form an intermediate matrix. Next, at step S704, the data conversion circuit may read each row of the intermediate matrix sequentially as a first row to a last row of a matrix in order from a last row to a first row of the intermediate matrix and in intra-row order in the cache memory array to convert the matrix to be converted into a corresponding matrix rotated by 180°.
In particular, assume that the matrix to be converted is an X×Y matrix, where X may or may not be equal to Y. The data conversion circuit may store the 1st row of the X×Y matrix, in order from its Y-th basic element to its 1st basic element, into the 1st through Y-th basic element positions of the 1st row of the cache memory array, and may repeat this operation in order from the 1st row to the X-th row of the X×Y matrix until the X-th row of the X×Y matrix has been stored, in order from its Y-th basic element to its 1st basic element, into the 1st through Y-th basic element positions of the X-th row of the cache memory array, forming an X×Y intermediate matrix, which may be understood as being formed by mirroring each row of the matrix to be converted within the row. Then, the data conversion circuit may read the X-th row of the X×Y intermediate matrix in order from its 1st basic element to its Y-th basic element as the 1st row of the rotated matrix, and may repeat this operation in order from the X-th row to the 1st row of the intermediate matrix until the 1st row of the X×Y intermediate matrix has been read, in order from its 1st basic element to its Y-th basic element, as the X-th row of the rotated matrix, thereby forming a matrix rotated by 180°.
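The corresponding sketch (same illustrative modeling assumptions as above):

    # 180-degree rotation: store each row in reverse intra-row order, then read rows
    # from last to first in intra-row order.
    def rotate_180_via_cache(matrix):
        cache = [list(reversed(row)) for row in matrix]              # store phase (rows mirrored)
        return [cache[i] for i in range(len(matrix) - 1, -1, -1)]    # read phase (rows reversed)

    assert rotate_180_via_cache([[1, 2], [3, 4], [5, 6]]) == [[6, 5], [4, 3], [2, 1]]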
FIG. 8 is a flow chart 800 illustrating matrix mirroring operations performed by the data conversion circuitry of the present disclosure pursuant to a data conversion instruction. As shown in fig. 8, at step S802, the data conversion circuit may store each row of the matrix to be converted into a corresponding row in the cache memory array in reverse order within the row to form an intermediate matrix. Next, at step S804, each row of the intermediate matrix is read in sequence from the last row to the first row of the intermediate matrix and in reverse order within the row in the cache memory array as the first row to the last row of the matrix, so as to convert the matrix to be converted into the corresponding mirror matrix.
In particular, assume that the matrix to be converted is an X×Y matrix, where X may or may not be equal to Y. The data conversion circuit may store the 1st row of the X×Y matrix, in order from its Y-th basic element to its 1st basic element, into the 1st through Y-th basic element positions of the 1st row of the cache memory array, and may repeat this operation in order from the 1st row to the X-th row of the X×Y matrix until the X-th row of the X×Y matrix has been stored, in order from its Y-th basic element to its 1st basic element, into the 1st through Y-th basic element positions of the X-th row of the cache memory array, forming an X×Y intermediate matrix, which may be understood as being formed by mirroring each row of the matrix to be converted within the row. Then, the data conversion circuit may read the X-th row of the X×Y intermediate matrix in order from its Y-th basic element to its 1st basic element as the 1st row of the mirrored matrix, and may repeat this operation in order from the X-th row to the 1st row of the intermediate matrix until the 1st row of the X×Y intermediate matrix has been read, in order from its Y-th basic element to its 1st basic element, as the X-th row of the mirrored matrix, thereby forming the mirrored matrix.
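The corresponding sketch (same illustrative modeling assumptions as above); with this store/read pattern the result is the input flipped top-to-bottom:

    # Mirror: store each row in reverse intra-row order, then read rows from last to first,
    # each in reverse intra-row order.
    def mirror_via_cache(matrix):
        cache = [list(reversed(row)) for row in matrix]                             # store phase
        return [list(reversed(cache[i])) for i in range(len(matrix) - 1, -1, -1)]   # read phase

    assert mirror_via_cache([[1, 2], [3, 4], [5, 6]]) == [[5, 6], [3, 4], [1, 2]]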
The above description with reference to fig. 4 to 8 describes that the matrix to be converted is stored in the cache memory array in a preset manner to form an intermediate matrix, and then a read operation is performed on the intermediate matrix to obtain a converted matrix. It is understood that when the space occupied by the cache memory array is larger than the matrix to be converted (intermediate matrix), the operation on the intermediate matrix can also be regarded as the operation on the effective basic elements in the cache memory array. In addition, it should be noted that the above describes an example of five matrix conversion operations implemented by the data conversion circuit, which is used for illustrative and non-limiting purposes only, and the data conversion circuit of the present disclosure may also implement other conversions of the matrix according to the data conversion instruction.
FIG. 9 is a flow diagram illustrating a method 900 implemented by a data processing apparatus according to an embodiment of the present disclosure. It will be appreciated that the data processing apparatus herein is the data processing apparatus discussed above in connection with fig. 1-8. Therefore, the foregoing description of the data processing apparatus is also applicable to the scheme shown in fig. 9, and the same contents will not be described again.
As shown in fig. 9, at step S902, the method 900 performs data caching using a data caching circuit. The data herein may be multidimensional data, such as a two-dimensional matrix or a three-dimensional array, in accordance with various embodiments of the present disclosure. At step S904, the method 900 uses the data conversion circuit to perform a store and read operation on the data to be converted in the data cache circuit according to the data conversion instruction, so as to implement data conversion on the data to be converted. Although not shown in fig. 9, those skilled in the art will appreciate that the method 900 may perform various operations of the data processing apparatus described in conjunction with fig. 1-8.
Fig. 10 is a block diagram illustrating a combined processing device 1000 according to an embodiment of the present disclosure. As shown in fig. 10, the combined processing device 1000 includes a computing processing device 1002, an interface device 1004, other processing devices 1006, and a storage device 1008. Depending on the application scenario, one or more computing devices 1010 may be included in the computing processing device, which may include the data processing device of the present disclosure, and may be configured to perform the operations described herein in conjunction with fig. 1-9.
In various embodiments, the computing processing device of the present disclosure may be configured to perform user-specified operations. In an exemplary application, the computing processing device may be implemented as a single-core artificial intelligence processor or a multi-core artificial intelligence processor. Similarly, one or more computing devices included within a computing processing device may be implemented as an artificial intelligence processor core or as part of a hardware structure of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or as part of a hardware structure of an artificial intelligence processor core, computing processing devices of the present disclosure may be considered to have a single core structure or a homogeneous multi-core structure.
In an exemplary operation, the computing processing device of the present disclosure may interact with other processing devices through an interface device to collectively perform user-specified operations. Other processing devices of the present disclosure may include one or more types of general and/or special purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), an artificial intelligence processor, and the like, depending on the implementation. These processors may include, but are not limited to, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic, discrete hardware components, etc., and the number may be determined based on actual needs. As previously mentioned, when considered alone, the computing processing device of the present disclosure may be regarded as having a single-core structure or a homogeneous multi-core structure. However, when considered together, a computing processing device and other processing devices may be considered to form a heterogeneous multi-core structure.
In one or more embodiments, the other processing devices may interface the computing processing device of the present disclosure with external data and controls, performing basic controls including, but not limited to, data handling, starting and/or stopping of the computing device, and the like. In further embodiments, other processing devices may also cooperate with the computing processing device to collectively perform computational tasks.
In one or more embodiments, the interface device may be used to transfer data and control instructions between the computing processing device and other processing devices. For example, the computing processing device may obtain input data from other processing devices via the interface device, and write the input data into a storage device (or memory) on the computing processing device. Further, the computing processing device may obtain the control instruction from the other processing device via the interface device, and write the control instruction into an on-chip control cache of the computing processing device. Alternatively or optionally, the interface device may also read data from the memory device of the computing processing device and transmit the data to the other processing device.
Additionally or alternatively, the combined processing device of the present disclosure may further include a storage device. As shown in the figure, the storage device is connected to the computing processing device and the other processing devices, respectively. In one or more embodiments, the storage device may be used to hold data for the computing processing device and/or the other processing devices. For example, the data may be data that is not fully retained within internal or on-chip storage of a computing processing device or other processing device.
In some embodiments, the present disclosure also discloses a chip (e.g., chip 1102 shown in fig. 11). In one implementation, the Chip is a System on Chip (SoC) and is integrated with one or more combinatorial processing devices as shown in fig. 10. The chip may be connected to other associated components through an external interface device (e.g., external interface device 1106 shown in fig. 11). The relevant component may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a wifi interface. In some application scenarios, other processing units (e.g., video codecs) and/or interface modules (e.g., DRAM interfaces) and/or the like may be integrated on the chip. In some embodiments, the present disclosure also discloses a chip packaging structure, which includes the above chip. In some embodiments, the present disclosure also discloses a board card including the above chip packaging structure. The board will be described in detail below with reference to fig. 11.
Fig. 11 is a schematic diagram illustrating a structure of a board 1100 according to an embodiment of the present disclosure. As shown in FIG. 11, the card includes a memory device 1104 for storing data, which includes one or more memory cells 1110. The memory device may be coupled to and communicate data with control device 1108 and chip 1102 described above via, for example, a bus. Further, the board also includes an external interface device 1106 configured for data relay or transfer functions between the chip (or chips in a chip package) and an external device 1112 (e.g., a server or computer, etc.). For example, the data to be processed may be transferred to the chip by an external device through an external interface means. For another example, the calculation result of the chip may be transmitted back to an external device via the external interface device. According to different application scenarios, the external interface device may have different interface forms, for example, it may adopt a standard PCIE interface or the like.
In one or more embodiments, the control device in the disclosed board card may be configured to regulate the state of the chip. Therefore, in an application scenario, the control device may include a single chip Microcomputer (MCU) for controlling the operating state of the chip.
From the above description in conjunction with fig. 10 and 11, it will be understood by those skilled in the art that the present disclosure also discloses an electronic device or apparatus, which may include one or more of the above boards, one or more of the above chips and/or one or more of the above combined processing devices.
According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a PC device, an internet of things terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a visual terminal, an autopilot terminal, a means of transportation, a household appliance, and/or a medical device. The means of transportation include an airplane, a ship, and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph. The electronic device or apparatus of the present disclosure may also be applied to the fields of the internet, the internet of things, data centers, energy, transportation, public management, manufacturing, education, power grid, telecommunications, finance, retail, construction site, medical, and the like. Further, the electronic device or apparatus of the present disclosure may also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as a cloud, an edge, and a terminal. In one or more embodiments, an electronic device or apparatus with high computing power according to the present disclosure may be applied to a cloud device (e.g., a cloud server), and an electronic device or apparatus with low power consumption may be applied to a terminal device and/or an edge device (e.g., a smartphone or a camera). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate the hardware resources of the terminal device and/or the edge device according to the hardware information of the terminal device and/or the edge device, and uniform management, scheduling and cooperative work of end-cloud integration or cloud-edge-end integration can be completed.
It is noted that for the sake of brevity, this disclosure presents some methods and embodiments thereof as a series of acts or combinations thereof, but those skilled in the art will appreciate that the aspects of the disclosure are not limited by the order of the acts described. Accordingly, one of ordinary skill in the art will appreciate that certain steps may be performed in other sequences or simultaneously, in accordance with the disclosure or teachings of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in this disclosure are capable of being practiced in other than the specifically disclosed embodiments, and that the acts or modules illustrated herein are not necessarily required to practice one or more aspects of the disclosure. In addition, the present disclosure may focus on the description of some embodiments, depending on the solution. In view of the above, those skilled in the art will understand that portions of the disclosure that are not described in detail in one embodiment may also be referred to in the related description of other embodiments.
In particular implementation, based on the disclosure and teachings of the present disclosure, one skilled in the art will appreciate that several embodiments disclosed in the present disclosure may be implemented in other ways not disclosed herein. For example, as for the units in the foregoing embodiments of the electronic device or apparatus, the units are divided based on the logic functions, and there may be other dividing manners in actual implementation. Also for example, multiple units or components may be combined or integrated with another system or some features or functions in a unit or component may be selectively disabled. The connections discussed above in connection with the figures may be direct or indirect couplings between the units or components in terms of connectivity between the different units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.
In the present disclosure, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, according to actual needs, part or all of the units can be selected to achieve the purpose of the scheme of the embodiment of the disclosure. In addition, in some scenarios, multiple units in embodiments of the present disclosure may be integrated into one unit or each unit may exist physically separately.
In some implementation scenarios, the integrated units may be implemented in the form of software program modules. If implemented in the form of software program modules and sold or used as a stand-alone product, the integrated units may be stored in a computer readable memory. In this regard, when aspects of the present disclosure are embodied in the form of a software product (e.g., a computer-readable storage medium), the software product may be stored in a memory, which may include instructions for causing a computer device (e.g., a personal computer, a server, or a network device, etc.) to perform some or all of the steps of the methods described in embodiments of the present disclosure. The Memory may include, but is not limited to, a usb disk, a flash disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
In other implementation scenarios, the integrated unit may also be implemented in hardware, that is, a specific hardware circuit, which may include a digital circuit and/or an analog circuit, etc. The physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices, which may include, but are not limited to, transistors or memristors, among other devices. In view of this, the various devices described herein (e.g., computing devices or other processing devices) may be implemented by suitable hardware processors, such as CPUs, GPUs, FPGAs, DSPs, ASICs, and the like. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including magnetic storage medium or magneto-optical storage medium, etc.), and may be, for example, a variable Resistive Memory (RRAM), a Dynamic Random Access Memory (DRAM), a Static Random Access Memory (SRAM), an Enhanced Dynamic Random Access Memory (EDRAM), a High Bandwidth Memory (HBM), a Hybrid Memory Cube (HMC), a ROM, a RAM, or the like.
While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that equivalents or alternatives within the scope of these claims be covered thereby.
The foregoing may be better understood in light of the following clauses:
clause 1, a data processing apparatus comprising a data cache circuit and a data conversion circuit, wherein:
the data caching circuitry is configured to perform data caching; and
the data conversion circuit is configured to execute a storing operation and a reading operation on data to be converted in the data cache circuit according to a data conversion instruction so as to realize data conversion on the data to be converted.
Clause 2, the data processing apparatus according to clause 1, wherein the data to be converted is multidimensional data, and the data conversion instruction includes data amount information and inter-dimension offset information for performing store and read operations with respect to each dimension in the multidimensional data.
Clause 3, the data processing apparatus according to clause 2, wherein the data amount information includes the number of data to be stored and read in each dimension, and the inter-dimension offset information includes an address interval to be spanned from the current dimension to the next dimension.
Clause 4, the data processing apparatus of clause 3, wherein the address interval is determined according to the number of data in the current dimension and the footprint of each data.
Clause 5, the data processing apparatus of clause 3, wherein in performing the store and read operations, the data conversion circuitry is configured to perform the operations of:
performing store and read operations in the data cache circuit on a corresponding number of data within a dimension of the data to be converted, according to the data amount information for the dimension; and
addressing to a next dimension in accordance with the inter-dimension offset information to perform store and read operations in the data cache circuit on a corresponding number of data in the next dimension.
Clause 6, the data processing apparatus of clause 5, wherein the data conversion instruction further comprises store base address information and read base address information, wherein in performing the store and read operations, the data conversion circuitry is configured to:
addressing to the next dimension to perform a store operation according to the store base address information and the inter-dimension offset information; and
addressing to the next dimension to perform a read operation according to the read base address information and inter-dimension offset information.
Clause 7, the data processing apparatus of any of clauses 1-6, wherein the data transformation comprises performing one or more of a bypass operation, a multi-angle rotation operation, a mirroring operation, or a sequential transformation operation on the multi-dimensional data.
Clause 8, the data processing apparatus of clause 1, wherein the data to be converted is a matrix to be converted, and the data caching circuit comprises a cache storage array.
Clause 9, the data processing apparatus of clause 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instructions:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding transposed matrix.
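As a concrete illustration of the store/read ordering in clause 9 (not of the circuit itself), the sketch below models the cache storage array as a Python list of rows: rows are stored in in-row order, then columns are read out first-to-last, top-to-bottom, as the rows of the result.

    def transpose(matrix):
        # Store: each source row goes into the corresponding cache row in in-row order,
        # forming the intermediate matrix.
        cache = [list(row) for row in matrix]
        rows, cols = len(cache), len(cache[0])
        # Read: columns of the intermediate matrix, first to last and top to bottom,
        # are output as the rows of the result.
        return [[cache[r][c] for r in range(rows)] for c in range(cols)]

    assert transpose([[1, 2, 3],
                      [4, 5, 6]]) == [[1, 4], [2, 5], [3, 6]]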
Clause 10, the data processing apparatus of clause 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instructions:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 270°.
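Under the same list-of-rows model (an illustrative assumption, not the hardware), the 270° rotation of clause 10 differs from the transpose only in the store order: rows are written reversed, and columns are then read out in the same way.

    def rotate_270(matrix):
        # Store: each source row is written into its cache row in in-row reverse order.
        cache = [list(reversed(row)) for row in matrix]
        rows, cols = len(cache), len(cache[0])
        # Read: columns first to last, top to bottom, become the output rows.
        return [[cache[r][c] for r in range(rows)] for c in range(cols)]

    assert rotate_270([[1, 2],
                       [3, 4]]) == [[2, 4], [1, 3]]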
Clause 11, the data processing apparatus of clause 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instructions:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column reverse order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 90°.
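Continuing the same illustrative model, the 90° rotation of clause 11 keeps the in-row-order store of the transpose case and changes only the read order within each column (bottom to top).

    def rotate_90(matrix):
        # Store: in-row order, as in the transpose case.
        cache = [list(row) for row in matrix]
        rows, cols = len(cache), len(cache[0])
        # Read: columns first to last, bottom to top within each column.
        return [[cache[r][c] for r in range(rows - 1, -1, -1)] for c in range(cols)]

    assert rotate_90([[1, 2],
                      [3, 4]]) == [[3, 1], [4, 2]]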
Clause 12, the data processing apparatus of clause 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instructions:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each row of the intermediate matrix in the cache storage array, from the last row to the first row and in in-row order, and outputting the rows as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 180°.
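In the same illustrative model, the 180° rotation of clause 12 is a reverse-order store followed by a row read from the last row to the first.

    def rotate_180(matrix):
        # Store: each source row is written in in-row reverse order.
        cache = [list(reversed(row)) for row in matrix]
        # Read: rows last to first, each in in-row order.
        return [list(row) for row in reversed(cache)]

    assert rotate_180([[1, 2],
                       [3, 4]]) == [[4, 3], [2, 1]]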
Clause 13, the data processing apparatus of clause 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each row of the intermediate matrix in the cache storage array, from the last row to the first row and in in-row reverse order, and outputting the rows as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding mirror matrix.
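Finally, in the same illustrative model, the mirroring of clause 13 combines a reverse-order store with a reverse-order read from the last row to the first; in this model the net effect is an up-down mirror of the source matrix.

    def mirror(matrix):
        # Store: each source row is written in in-row reverse order.
        cache = [list(reversed(row)) for row in matrix]
        # Read: rows last to first, each in in-row reverse order.
        return [list(reversed(row)) for row in reversed(cache)]

    assert mirror([[1, 2],
                   [3, 4]]) == [[3, 4], [1, 2]]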
Clause 14, an integrated circuit chip comprising the data processing apparatus of any one of clauses 1-13.
Clause 15, an electronic device, comprising the integrated circuit chip of clause 14.
Clause 16, a board comprising the integrated circuit chip of clause 14.
Clause 17, a method implemented by a data processing apparatus, wherein the data processing apparatus comprises data caching circuitry and data conversion circuitry, the method comprising:
performing data caching using the data caching circuitry; and
using the data conversion circuitry to perform store and read operations on data to be converted in the data caching circuitry according to a data conversion instruction, so as to implement data conversion of the data to be converted.
Clause 18, the method according to clause 17, wherein the data to be converted is multidimensional data, and the data conversion instruction includes data amount information and inter-dimension offset information for performing store and read operations with respect to each dimension of the multidimensional data.
Clause 19, the method of clause 18, wherein the data amount information includes the number of data items to be stored and read in each dimension, and the inter-dimension offset information includes an address interval to be spanned from the current dimension to the next dimension.
Clause 20, the method of clause 19, wherein the address interval is determined according to the number of data items in the current dimension and the storage footprint of each data item.
Clause 21, the method of clause 19, wherein in performing the store and read operations, the method comprises performing the following operations using the data conversion circuitry:
performing, in the data caching circuitry, store and read operations on a corresponding number of data items in a dimension of the data to be converted, according to the data amount information for that dimension; and
addressing to a next dimension according to the inter-dimension offset information, so as to perform store and read operations in the data caching circuitry on a corresponding number of data items in the next dimension.
Clause 22, the method of clause 21, wherein the data conversion instruction further comprises store base address information and read base address information, and wherein in performing the store and read operations, the method comprises using the data conversion circuitry to perform the operations of:
addressing to the next dimension to perform the store operation according to the store base address information and the inter-dimension offset information; and
addressing to the next dimension to perform the read operation according to the read base address information and the inter-dimension offset information.
Clause 23, the method of any one of clauses 17-22, wherein the data conversion comprises performing one or more of a bypass operation, a multi-angle rotation operation, a mirroring operation, or a sequential transformation operation on the multidimensional data.
Clause 24, the method of clause 17, wherein the data to be converted is a matrix to be converted, and the data caching circuit comprises a cache storage array.
Clause 25, the method of clause 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding transposed matrix.
Clause 26, the method of clause 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 270°.
Clause 27, the method of clause 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column reverse order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 90°.
Clause 28, the method of clause 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each row of the intermediate matrix in the cache storage array, from the last row to the first row and in in-row order, and outputting the rows as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 180°.
Clause 29, the method of clause 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each row of the intermediate matrix in the cache storage array, from the last row to the first row and in in-row reverse order, and outputting the rows as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding mirror matrix.
In the above embodiments of the present disclosure, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this specification.

Claims (29)

1. A data processing apparatus comprising a data caching circuit and a data conversion circuit, wherein:
the data caching circuit is configured to perform data caching; and
the data conversion circuit is configured to perform store and read operations on data to be converted in the data caching circuit according to a data conversion instruction, so as to implement data conversion of the data to be converted.
2. The data processing apparatus according to claim 1, wherein the data to be converted is multidimensional data, and the data conversion instruction includes data amount information and inter-dimension offset information for performing store and read operations with respect to each dimension of the multidimensional data.
3. The data processing apparatus according to claim 2, wherein the data amount information includes the number of data items to be stored and read in each dimension, and the inter-dimension offset information includes an address interval to be spanned from the current dimension to the next dimension.
4. The data processing apparatus according to claim 3, wherein the address interval is determined according to the number of data items in the current dimension and the storage footprint of each data item.
5. The data processing apparatus according to claim 3, wherein in performing the store and read operations, the data conversion circuitry is configured to perform the operations of:
performing, in the data caching circuit, store and read operations on a corresponding number of data items in a dimension of the data to be converted, according to the data amount information for that dimension; and
addressing to a next dimension according to the inter-dimension offset information, so as to perform store and read operations in the data caching circuit on a corresponding number of data items in the next dimension.
6. The data processing apparatus according to claim 5, wherein the data conversion instruction further comprises store base address information and read base address information, and wherein in performing the store and read operations, the data conversion circuitry is configured to perform the operations of:
addressing to the next dimension to perform the store operation according to the store base address information and the inter-dimension offset information; and
addressing to the next dimension to perform the read operation according to the read base address information and the inter-dimension offset information.
7. The data processing apparatus according to any of claims 1-6, wherein the data conversion comprises performing one or more of a bypass operation, a multi-angle rotation operation, a mirroring operation, or a sequential transformation operation on the multidimensional data.
8. The data processing apparatus according to claim 1, wherein the data to be converted is a matrix to be converted, and the data caching circuitry comprises a cache storage array.
9. The data processing apparatus according to claim 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding transposed matrix.
10. The data processing apparatus according to claim 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 270°.
11. The data processing apparatus according to claim 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column reverse order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 90°.
12. The data processing apparatus according to claim 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each row of the intermediate matrix in the cache storage array, from the last row to the first row and in in-row order, and outputting the rows as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 180°.
13. The data processing apparatus according to claim 8, wherein the data conversion circuitry is configured to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each row of the intermediate matrix in the cache storage array, from the last row to the first row and in in-row reverse order, and outputting the rows as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding mirror matrix.
14. An integrated circuit chip comprising a data processing device according to any one of claims 1 to 13.
15. An electronic device comprising the integrated circuit chip of claim 14.
16. A board card comprising the integrated circuit chip of claim 14.
17. A method implemented by a data processing apparatus, wherein the data processing apparatus comprises a data caching circuit and a data conversion circuit, the method comprising:
performing data caching using the data caching circuitry; and
using the data conversion circuit to perform store and read operations on data to be converted in the data caching circuit according to a data conversion instruction, so as to implement data conversion of the data to be converted.
18. The method of claim 17, wherein the data to be converted is multidimensional data, and the data conversion instruction includes data amount information and inter-dimension offset information for performing store and read operations with respect to each dimension of the multidimensional data.
19. The method of claim 18, wherein the data amount information includes the number of data items to be stored and read in each dimension, and the inter-dimension offset information includes an address interval to be spanned from the current dimension to the next dimension.
20. The method of claim 19, wherein the address interval is determined according to the number of data items in the current dimension and the storage footprint of each data item.
21. The method of claim 19, wherein in performing the store and read operations, the method comprises performing the following operations using the data conversion circuitry:
performing, in the data caching circuit, store and read operations on a corresponding number of data items in a dimension of the data to be converted, according to the data amount information for that dimension; and
addressing to a next dimension according to the inter-dimension offset information, so as to perform store and read operations in the data caching circuit on a corresponding number of data items in the next dimension.
22. The method of claim 21, wherein the data conversion instruction further comprises store base address information and read base address information, and wherein in performing the store and read operations, the method comprises using the data conversion circuitry to perform the operations of:
addressing to the next dimension to perform the store operation according to the store base address information and the inter-dimension offset information; and
addressing to the next dimension to perform the read operation according to the read base address information and the inter-dimension offset information.
23. The method of any of claims 17-22, wherein the data conversion comprises performing one or more of a bypass operation, a multi-angle rotation operation, a mirroring operation, or a sequential transformation operation on the multidimensional data.
24. The method of claim 17, wherein the data to be converted is a matrix to be converted, and the data caching circuitry comprises a cache storage array.
25. The method of claim 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding transposed matrix.
26. The method of claim 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 270°.
27. The method of claim 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row order to form an intermediate matrix; and
reading each column of the intermediate matrix in the cache storage array, from the first column to the last column and in in-column reverse order, and outputting the columns as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 90°.
28. The method of claim 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each row of the intermediate matrix in the cache storage array, from the last row to the first row and in in-row order, and outputting the rows as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding matrix rotated by 180°.
29. The method of claim 24, wherein the data conversion circuitry is used to perform the following operations in accordance with the data conversion instruction:
storing each row of the matrix to be converted into a corresponding row of the cache storage array in in-row reverse order to form an intermediate matrix; and
reading each row of the intermediate matrix in the cache storage array, from the last row to the first row and in in-row reverse order, and outputting the rows as the first row to the last row of the output matrix, so as to convert the matrix to be converted into the corresponding mirror matrix.
CN202011036325.7A 2020-09-27 2020-09-27 Data processing device, integrated circuit chip, equipment and implementation method thereof Pending CN114282160A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202011036325.7A CN114282160A (en) 2020-09-27 2020-09-27 Data processing device, integrated circuit chip, equipment and implementation method thereof
US18/013,976 US20230297270A1 (en) 2020-09-27 2021-08-03 Data processing device, integrated circuit chip, device, and implementation method therefor
EP21871059.8A EP4220448A1 (en) 2020-09-27 2021-08-03 Data processing device, integrated circuit chip, device, and implementation method therefor
PCT/CN2021/110357 WO2022062682A1 (en) 2020-09-27 2021-08-03 Data processing device, integrated circuit chip, device, and implementation method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011036325.7A CN114282160A (en) 2020-09-27 2020-09-27 Data processing device, integrated circuit chip, equipment and implementation method thereof

Publications (1)

Publication Number Publication Date
CN114282160A (en) 2022-04-05

Family

ID=80867767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011036325.7A Pending CN114282160A (en) 2020-09-27 2020-09-27 Data processing device, integrated circuit chip, equipment and implementation method thereof

Country Status (1)

Country Link
CN (1) CN114282160A (en)

Similar Documents

Publication Publication Date Title
WO2023045445A1 (en) Data processing device, data processing method, and related product
CN109416755B (en) Artificial intelligence parallel processing method and device, readable storage medium and terminal
US20210150325A1 (en) Data processing method and apparatus, and related product
CN112799599B (en) Data storage method, computing core, chip and electronic equipment
CN111028136B (en) Method and equipment for processing two-dimensional complex matrix by artificial intelligence processor
KR20230017802A (en) Memory access instructions with near-memory address generation
CN112686379A (en) Integrated circuit device, electronic equipment, board card and calculation method
CN111125628A (en) Method and apparatus for processing two-dimensional data matrix by artificial intelligence processor
CN112416433A (en) Data processing device, data processing method and related product
CN111143766A (en) Method and apparatus for processing two-dimensional complex matrix by artificial intelligence processor
CN114282160A (en) Data processing device, integrated circuit chip, equipment and implementation method thereof
WO2021082723A1 (en) Operation apparatus
CN114281561A (en) Processing unit, synchronization method for a processing unit and corresponding product
CN114691353A (en) Tensor reading method and device and related product
CN114282159A (en) Data processing device, integrated circuit chip, equipment and method for realizing the same
CN114692844A (en) Data processing device, data processing method and related product
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN111061507A (en) Operation method, operation device, computer equipment and storage medium
WO2022062682A1 (en) Data processing device, integrated circuit chip, device, and implementation method therefor
CN113469333B (en) Artificial intelligence processor, method and related products for executing neural network model
CN114282161A (en) Matrix conversion circuit, matrix conversion method, integrated circuit chip, computing device and board card
CN113807489B (en) Method for performing deconvolution operation, board card and computing device thereof
CN112395002B (en) Operation method, device, computer equipment and storage medium
CN113742266B (en) Integrated circuit device, electronic apparatus, board and computing method
WO2022257980A1 (en) Computing apparatus, method for implementing convulution operation by using computing apparatus, and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination