CN116776059A

CN116776059A - Matrix operation method, device and computer equipment

Info

Publication number: CN116776059A
Application number: CN202310518943.2A
Authority: CN
Inventors: 裴京; 王松; 马骋; 李博文; 徐海峥
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2023-05-09
Filing date: 2023-05-09
Publication date: 2023-09-19

Abstract

The application relates to a matrix operation method, a device and computer equipment. The method is applied to a neuromorphic chip and comprises the following steps: first, splitting the elements in a first matrix by rows to obtain a plurality of first submatrices, and splitting the elements in a second matrix by columns to obtain a plurality of second submatrices; then storing the plurality of first submatrices and the plurality of second submatrices into different cores in the neuromorphic chip, wherein each core stores one first submatrix and one second submatrix; and finally, running the cores to obtain the operation result of the first matrix and the second matrix. The method increases the operation speed of the neuromorphic chip while saving its storage resources.

Description

Matrix operation method, device and computer equipment

Technical Field

The present application relates to the field of information processing technologies, and in particular, to a matrix operation method, a device, and a computer device.

Background

When a neuromorphic chip with a many-core architecture performs computation, the data to be processed is generally handled in matrix form to achieve fast inference, for example by performing matrix product operations.

In the related art, when performing a matrix product operation, the matrix needs to be converted into vectors, and the product of the matrices is obtained by calling the multiplier-adder multiple times.

However, the matrix operation process in the related art occupies more memory resources of the neuromorphic chip and reduces the operation speed of the neuromorphic chip.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a matrix operation method, apparatus, and computer device that can save memory resources of a neuromorphic chip and increase the operation speed of the neuromorphic chip.

In a first aspect, the present application provides a matrix operation method, applied to a neuromorphic chip, the method comprising:

performing row splitting on elements in the first matrix to obtain a plurality of first submatrices, and performing column splitting on elements in the second matrix to obtain a plurality of second submatrices;

storing the first submatrices and the second submatrices into different cores in the neuromorphic chip respectively, wherein each core stores one first submatrix and one second submatrix;

and running each core to obtain the operation result of the first matrix and the second matrix.

In one embodiment, storing the first submatrices and the second submatrices in different cores in the neuromorphic chip includes:

storing each first sub-matrix into each core according to the mode that one core corresponds to one first sub-matrix;

and under the condition that the storage of each first submatrix is completed, storing each second submatrix into each core in a mode that one core corresponds to one second submatrix.

In one embodiment, the operation result of the first matrix and the second matrix is obtained by running each kernel, including:

when the operation of each core is finished, updating the second submatrix stored in each core, and running each updated core;

each time the operation of the updated cores is finished, repeating the step of updating the second submatrix stored in each core until every core has stored every second submatrix, and obtaining the operation results of each core from all of the runs;

and determining the operation result of the first matrix and the second matrix according to the operation results of the cores.

In one embodiment, updating the second submatrix stored in each core includes:

keeping the first submatrix in each core unchanged, and cycling the second submatrices among the cores according to a preset order; the preset order includes the column order of the second submatrices in the second matrix, or the position order of the cores.

In one embodiment, obtaining the operation results of each core from all of the runs includes:

for any core, transposing the second submatrix in the core to generate a second transposed submatrix;

and obtaining the operation result of the core according to the first submatrix and the second transposed submatrix.

In one embodiment, determining the operation result of the first matrix and the second matrix according to the operation results of the cores includes:

obtaining the calculation result address of each core;

filling the operation result of each core into the corresponding position in the operation result matrix according to the calculation result address of each core to obtain the operation result of the first matrix and the second matrix; the number of rows and columns of the operation result matrix is determined according to the first matrix and the second matrix.

In one embodiment, obtaining the calculation result address of each core includes:

acquiring the row address of the first submatrix in each core and the transposed row address of the second transposed submatrix in each core;

and determining the calculation result address of each core according to the row address of each first submatrix and the transposed row address of each second transposed submatrix.

In one embodiment, before storing the first submatrices and the second submatrices in different cores in the neuromorphic chip, the matrix operation method further includes:

determining a storage space according to the first matrix and the second matrix;

determining, according to the storage space, the number of cores to be used within the neuromorphic chip.

In a second aspect, the present application also provides a matrix operation device, including:

the splitting module is used for carrying out row splitting on elements in the first matrix to obtain a plurality of first submatrices, and carrying out column splitting on elements in the second matrix to obtain a plurality of second submatrices;

the storage module is used for respectively storing the first submatrices and the second submatrices into different cores in the neuromorphic chip, wherein each core stores one first submatrix and one second submatrix;

the acquisition module is used for obtaining the operation result of the first matrix and the second matrix by running each core.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method in any of the embodiments of the first aspect described above when the computer program is executed.

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method in any of the embodiments of the first aspect described above.

In a fifth aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the steps of the method in any of the embodiments of the first aspect described above.

In the matrix operation method, device and computer equipment described above, the elements in the first matrix are first split by rows to obtain a plurality of first submatrices and the elements in the second matrix are split by columns to obtain a plurality of second submatrices; the first submatrices and the second submatrices are then stored into different cores in the neuromorphic chip, with each core storing one first submatrix and one second submatrix; finally, the operation result of the first matrix and the second matrix is obtained by running each core. Because the method splits the first matrix by rows and the second matrix by columns, the operation result of the first matrix and the second matrix is obtained from the first submatrices produced by row splitting and the second submatrices produced by column splitting. The matrix operation thus takes into account the correspondence between the row elements of the first matrix and the column elements of the second matrix, which simplifies the operation flow. In addition, each core in the neuromorphic chip stores one first submatrix and one second submatrix, which means that a core stores only the submatrices it needs to operate on and no additional matrix elements, so every matrix element held in a core is put to use. Obtaining the operation result by running the cores in this way saves memory resources in the neuromorphic chip and further increases its operation speed.
In addition, since the neuromorphic chip is usually mounted in a computer device, improving the efficiency of matrix operations on the neuromorphic chip increases the operation speed of the chip and, in turn, of the computer device in which the chip is mounted.

Drawings

FIG. 1 is a diagram of an application environment of a matrix operation method in one embodiment;

FIG. 2 is a flow chart of a matrix operation method in one embodiment;

FIG. 3 is a flow chart illustrating a matrix storage basis in one embodiment;

FIG. 4 is a flow chart of a result acquisition step in one embodiment;

FIG. 5 is a flow chart of a kernel update procedure in one embodiment;

FIG. 6 is a flowchart of a result obtaining step in another embodiment;

FIG. 7 is a diagram illustrating a vector operation step in one embodiment;

FIG. 8 is a flowchart of a result obtaining step in another embodiment;

FIG. 9 is a flowchart of a result address obtaining step in one embodiment;

FIG. 10 is a flow chart of a kernel number determination step in one embodiment;

FIG. 11 is a schematic diagram of a kernel matrix product step in one embodiment;

FIG. 12 is a block diagram showing the structure of a matrix operation device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The matrix operation method provided by the embodiment of the application can be applied to a neuromorphic chip. The neuromorphic chip may be a voice processing neuromorphic chip, a video processing neuromorphic chip, an image processing neuromorphic chip, or the like, and its internal structure may be as shown in fig. 1. The neuromorphic chip includes a processor, a memory, an input/output interface (I/O for short), and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the neuromorphic chip provides computing and control capabilities. The memory of the neuromorphic chip comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The database of the neuromorphic chip is used for storing thread stack processing data. The input/output interface of the neuromorphic chip is used to exchange information between the processor and an external device. The communication interface of the neuromorphic chip is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a matrix operation method. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely a block diagram of the part of the structure relevant to the present application and does not limit the neuromorphic chip to which the present application is applied; a particular neuromorphic chip may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In the related art, there are limitations in performing a product operation of a matrix, which at least include the following two points:

(1) Primitive instructions are limited. In the related art, the primitive instructions of the neuromorphic chip operate on vectors. To perform a matrix operation, the matrices must be converted into vectors, and the product of the matrices is obtained by calling a multiplier-adder multiple times. That is, the related art cannot multiply matrices directly, which makes the matrix product operation on the neuromorphic chip complex.

(2) Memory resources are limited. In the related art, when a neuromorphic chip performs a matrix operation, the first matrix is split by rows to obtain first submatrices, the second matrix is copied multiple times, and each first submatrix is then combined with one copy of the second matrix and stored in a different core of the neuromorphic chip. That is, multiple identical copies of the second matrix are stored in the neuromorphic chip, so the matrix operation occupies a large amount of the chip's memory resources.

In summary, the matrix operation process in the related art occupies more memory resources of the neuromorphic chip and reduces its operation speed. Based on this, the present application splits the matrices to be operated on by rows and by columns and stores the resulting submatrices independently in different cores, so that the core resources in the neuromorphic chip are fully utilized and the operation speed of the neuromorphic chip is increased.
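As a rough illustration of limitation (2), the following sketch counts how many matrix elements are held across all cores under the two layouts. The function names and the example sizes are hypothetical and chosen only for illustration; the count ignores any per-core overhead, which the application does not quantify.

```python
def elements_stored_related_art(rows_a, inner, cols_b, num_cores):
    """Related-art layout: each core holds one row block of the first matrix
    plus a full copy of the second matrix."""
    a_block = (rows_a // num_cores) * inner
    full_b = inner * cols_b
    return num_cores * (a_block + full_b)


def elements_stored_split(rows_a, inner, cols_b, num_cores):
    """Split layout: each core holds one row block of the first matrix
    and one column block of the second matrix."""
    a_block = (rows_a // num_cores) * inner
    b_block = inner * (cols_b // num_cores)
    return num_cores * (a_block + b_block)


# Hypothetical sizes: first matrix 8 x 16, second matrix 16 x 8, four cores.
related = elements_stored_related_art(8, 16, 8, 4)
split = elements_stored_split(8, 16, 8, 4)
print(related, split)
```

With these assumed sizes, the split layout stores each element of both matrices exactly once, while the related-art layout stores the second matrix once per core.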

In one embodiment, as shown in fig. 2, a matrix operation method is provided, and the method is applied to the neuromorphic chip in fig. 1 for illustration, and includes the following steps:

S201, splitting the elements in the first matrix by rows to obtain a plurality of first submatrices, and splitting the elements in the second matrix by columns to obtain a plurality of second submatrices.

It should be noted that the neuromorphic chip in the embodiment of the present application may be applied to a plurality of scenes, such as voice processing, video processing, image processing, etc., according to actual requirements. Then, according to different application scenarios, the neuromorphic chip in the embodiment of the present application may be a voice processing neuromorphic chip, a video processing neuromorphic chip, an image processing neuromorphic chip, or the like.

Further, when the neuromorphic chip works in different scenes, the acquired information is usually converted into data that a computer can readily process, the data is converted into matrices, and operations are performed on the matrices to carry out the corresponding inference.

Taking a matrix product operation as an example, two matrices are involved, which are referred to as the first matrix and the second matrix. If the neuromorphic chip is a speech processing neuromorphic chip, the first matrix and the second matrix are speech matrices generated based on audio information; if the neuromorphic chip is a video processing neuromorphic chip, the first matrix and the second matrix are video matrices generated based on video information; if the neuromorphic chip is an image processing neuromorphic chip, the first matrix and the second matrix are image matrices generated based on image information.

In the embodiment of the application, the number of rows of the first matrix is the same as the number of columns of the second matrix. The row splitting of the first matrix and the column splitting of the second matrix are both based on the number of cores in the neuromorphic chip, and both are equal splits. That is, the number of first sub-matrices is the same as the number of second sub-matrices, and the number of rows of each first sub-matrix is the same as the number of columns of each second sub-matrix.

In the embodiment of the application, under the condition that the first submatrix and the second submatrix can perform matrix operation, the number of rows of the first submatrix and the number of columns of the second submatrix are not limited, that is, each first submatrix comprises at least one row of elements in the first matrix, and each second submatrix comprises at least one column of elements in the second matrix.
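The splitting in S201 can be sketched as follows, assuming matrices are represented as plain lists of rows and that the split is equal, as stated above. The helper names `split_rows` and `split_cols` are hypothetical and do not come from the application.

```python
def split_rows(matrix, parts):
    """Split a matrix (list of rows) into `parts` equal blocks of consecutive rows."""
    if len(matrix) % parts != 0:
        raise ValueError("row count must be divisible by the number of parts")
    step = len(matrix) // parts
    return [matrix[i:i + step] for i in range(0, len(matrix), step)]


def split_cols(matrix, parts):
    """Split a matrix into `parts` equal blocks of consecutive columns."""
    n_cols = len(matrix[0])
    if n_cols % parts != 0:
        raise ValueError("column count must be divisible by the number of parts")
    step = n_cols // parts
    return [[row[j:j + step] for row in matrix] for j in range(0, n_cols, step)]


first = [[1, 2], [3, 4], [5, 6], [7, 8]]    # 4 x 2 first matrix
second = [[1, 0, 2, 0], [0, 1, 0, 2]]       # 2 x 4 second matrix
first_subs = split_rows(first, 2)           # two first sub-matrices, 2 rows each
second_subs = split_cols(second, 2)         # two second sub-matrices, 2 columns each
print(first_subs)
print(second_subs)
```

Here the number of parts plays the role of the number of cores, so both matrices yield the same number of sub-matrices.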

S202, storing the plurality of first submatrices and the plurality of second submatrices into different cores in the neuromorphic chip respectively; each core stores one first sub-matrix and one second sub-matrix.

The neuromorphic chip contains a plurality of neuromorphic cores, referred to simply as cores. As the most basic and central part of a neuromorphic chip, each core has a large number of individual "neurons" or execution units, each of which can receive input in the form of pulses from any other neuron.

Before a matrix operation is carried out on the neuromorphic chip, the number of cores is set according to the storage resources and operation resources that the matrix operation requires. In the embodiment of the application, the number of first submatrices is the same as the number of second submatrices, and the first submatrices and the second submatrices are stored into different cores in the neuromorphic chip according to the storage rule that each core stores one first submatrix and one second submatrix.

For example, if there are m first sub-matrices and m second sub-matrices, the first sub-matrices and the second sub-matrices are combined to obtain m sub-matrix combinations, and the m sub-matrix combinations are stored in m cores in the neuromorphic chip, where the first sub-matrices and the second sub-matrices in different cores are different.

For example, if there are m first sub-matrices and m second sub-matrices, the first sub-matrices and the second sub-matrices are combined to obtain m² sub-matrix combinations, and the m² sub-matrix combinations are stored in m² cores in the neuromorphic chip, where the combination of first sub-matrix and second sub-matrix differs from core to core.

S203, obtaining operation results of the first matrix and the second matrix by running each kernel.

Each core is run to obtain the matrix operation result of the first submatrix and the second submatrix in that core, and the operation result of the first matrix and the second matrix is then obtained from the matrix operation results of the cores.
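For a product operation, the relation between the per-core results and the overall result follows the standard block form of matrix multiplication. The notation below is an assumption introduced for illustration: $A$ and $B$ denote the first and second matrices, $A_i$ the first sub-matrices (row blocks), $B_j$ the second sub-matrices (column blocks), and $m$ the number of sub-matrices of each kind.

```latex
A = \begin{bmatrix} A_0 \\ A_1 \\ \vdots \\ A_{m-1} \end{bmatrix},
\qquad
B = \begin{bmatrix} B_0 & B_1 & \cdots & B_{m-1} \end{bmatrix},
\qquad
AB = \begin{bmatrix}
A_0 B_0 & \cdots & A_0 B_{m-1} \\
\vdots & \ddots & \vdots \\
A_{m-1} B_0 & \cdots & A_{m-1} B_{m-1}
\end{bmatrix}
```

Each block $A_i B_j$ depends on only one first sub-matrix and one second sub-matrix, which is why a core holding that pair can compute it independently.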

Optionally, an operation result address corresponding to each core is determined according to the first submatrix and the second submatrix in that core, and the operation results of the cores are combined according to these addresses to obtain the operation result of the first matrix and the second matrix.
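Combining per-core results by address can be sketched as below. The sketch assumes that an address is a pair of row and column offsets into the result matrix; the function name `place_block` and the example values are hypothetical.

```python
def place_block(result, block, row_offset, col_offset):
    """Copy one core's block result into the result matrix at the given address."""
    for i, row in enumerate(block):
        for j, value in enumerate(row):
            result[row_offset + i][col_offset + j] = value


# Hypothetical 4 x 4 result matrix receiving two 2 x 2 block results.
result = [[0] * 4 for _ in range(4)]
place_block(result, [[1, 2], [3, 4]], 0, 2)   # block addressed at row 0, column 2
place_block(result, [[5, 6], [7, 8]], 2, 0)   # block addressed at row 2, column 0
for row in result:
    print(row)
```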

In the embodiment of the application, the elements in the first matrix are first split by rows into a plurality of first submatrices and the elements in the second matrix are split by columns into a plurality of second submatrices; the first submatrices and the second submatrices are then stored in different cores in the neuromorphic chip, with each core storing one first submatrix and one second submatrix; finally, the operation result of the first matrix and the second matrix is obtained by running each core. Because the matrix operation method provided by the embodiment of the application splits the first matrix by rows and the second matrix by columns, the operation result of the first matrix and the second matrix is obtained from the first submatrices produced by row splitting and the second submatrices produced by column splitting. The matrix operation thus takes into account the correspondence between the row elements of the first matrix and the column elements of the second matrix, which simplifies the operation flow. In addition, each core in the neuromorphic chip stores one first submatrix and one second submatrix, which means that a core stores only the submatrices it needs to operate on and no additional matrix elements, so every matrix element held in a core is put to use. Obtaining the operation result by running the cores in this way saves memory resources in the neuromorphic chip and further increases its operation speed.
In addition, since the neuromorphic chip is mounted in a computer device, improving the efficiency of matrix operations on the neuromorphic chip with the method provided by the embodiment of the application increases the operation speed of the chip and, in turn, of the computer device in which the chip is mounted.

The foregoing embodiments describe the storage rule for the cores, namely that one first sub-matrix and one second sub-matrix are stored in each core. This rule can be satisfied by various storage methods, for example storing the first sub-matrix and the second sub-matrix in a core one after the other, or storing them simultaneously. Based on this, a storage method for the cores is described below by way of an embodiment.

In one embodiment, as shown in fig. 3, storing the first submatrices and the second submatrices in different cores in the neuromorphic chip includes:

S301, storing each first sub-matrix into a core in such a way that one core corresponds to one first sub-matrix.

The number of cores in the neuromorphic chip is the same as the number of first submatrices. One first submatrix is stored in each core, and the first submatrices in different cores are different, so that the first submatrices stored in the cores of the neuromorphic chip cover all elements in the first matrix.

S302, when the storage of each first submatrix is completed, storing each second submatrix into a core in such a way that one core corresponds to one second submatrix.

Under the condition that the storage of each first submatrix is completed, one second submatrix is respectively stored in different cores, and the second submatrices in different cores are different, so that the second submatrices stored in the cores in the neuromorphic chip cover all elements in the second matrix.

It should be noted that, in the case where one core stores one first sub-matrix and one second sub-matrix, and the first sub-matrices in different cores are different from each other and the second sub-matrices in different cores are different from each other, the storage order of the first sub-matrix and the second sub-matrix is not limited.

In the embodiment of the application, under the condition that the first submatrix is stored in each core, the second submatrix is stored, and the elements of each first submatrix and the elements of each second submatrix are completely stored in different cores in the neuromorphic chip, so that the storage mode improves the effectiveness of matrix elements in the cores and avoids the waste of core storage resources in the neuromorphic chip.
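The two-pass storage order of S301 and S302 can be sketched as follows. The `Core` class is a hypothetical stand-in for a neuromorphic core's local storage and does not model any real chip interface.

```python
class Core:
    """Hypothetical stand-in for the local storage of one core."""

    def __init__(self):
        self.first_sub = None
        self.second_sub = None


def store_submatrices(first_subs, second_subs):
    """Store one first and one second sub-matrix per core, first sub-matrices first."""
    if len(first_subs) != len(second_subs):
        raise ValueError("the two lists must contain the same number of sub-matrices")
    cores = [Core() for _ in first_subs]
    # First pass: one first sub-matrix per core.
    for core, sub in zip(cores, first_subs):
        core.first_sub = sub
    # Second pass begins only after every first sub-matrix is in place.
    if any(core.first_sub is None for core in cores):
        raise RuntimeError("first sub-matrices are not fully stored")
    for core, sub in zip(cores, second_subs):
        core.second_sub = sub
    return cores


cores = store_submatrices([[[1, 2]], [[3, 4]]], [[[5], [6]], [[7], [8]]])
print([(c.first_sub, c.second_sub) for c in cores])
```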

Since the operation of the first matrix and the second matrix needs to traverse all elements in the first matrix and all elements in the second matrix, and matrix elements stored by different cores in the neuromorphic chip are mutually independent, after the operation of the different cores is finished, part of elements in each core need to be updated so as to establish the association relation between each first sub-matrix and each second sub-matrix, thereby ensuring the integrity of the operation results of the first matrix and the second matrix. Based on this, one implementation of obtaining the operation results of the first matrix and the second matrix by updating the submatrices in the kernel will be described below by way of one embodiment.

In one embodiment, as shown in fig. 4, by running each kernel, the operation result of the first matrix and the second matrix is obtained, including:

S401, when the operation of each core is finished, updating the second submatrix stored in each core, and running each updated core.

Once it is detected that every core in the neuromorphic chip stores a first submatrix and a second submatrix, each core is run to obtain the operation result of its first submatrix and second submatrix. Then, for each core, a second target submatrix different from the second submatrix currently stored in that core is selected from the plurality of second submatrices and replaces the stored second submatrix, which completes the update of that core; each updated core is then run.

S402, each time the operation of the updated cores is finished, repeating the step of updating the second submatrix stored in each core until every core has stored every second submatrix, and obtaining the operation results of each core from all of the runs.

Each time the operation of the updated cores is finished, the step of updating the second submatrix stored in each core is executed again. Every update of the second submatrix in a core is followed by one run of that core, which yields one operation result.

Once every core has stored every second submatrix, the first submatrix in each core has been combined once with every second submatrix, and the operation results of each core are the results obtained by combining its first submatrix with each of the second submatrices.

S403, determining operation results of the first matrix and the second matrix according to operation results of the cores.

The operation results of each core comprise the results of combining its first sub-matrix with every second sub-matrix, so the operation results of all cores together comprise the results for every combination of a first sub-matrix and a second sub-matrix. The operation results of the cores are filled into the corresponding positions according to the mapping between the first sub-matrices and the first matrix and the mapping between the second sub-matrices and the second matrix, giving the operation result of the first matrix and the second matrix.

In the embodiment of the application, the operation results of the first matrix and the second matrix are obtained by continuously updating the second submatrices stored in each kernel and continuously obtaining the operation results of the updated kernels. In the process of kernel operation or kernel updating, a first submatrix and a second submatrix are always stored in each kernel, so that the operation frequency and the updating frequency of each kernel are consistent.
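The run-update-run loop of S401 to S403 can be sketched as below for a product operation. This is a minimal sketch under stated assumptions: `block_product` stands in for whatever a core actually computes, cores are simulated sequentially rather than in parallel, and the direction in which second sub-matrices are passed between cores is assumed.

```python
def block_product(a, b):
    """Plain product of two small matrices, standing in for one run of a core."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def run_until_complete(first_subs, second_subs):
    """Run all cores, update the second sub-matrices, and repeat until every
    core has held every second sub-matrix."""
    m = len(first_subs)
    held = list(range(m))   # held[c]: index of the second sub-matrix now in core c
    results = {}            # (first index, second index) -> block result
    for _ in range(m):
        for core in range(m):
            j = held[core]
            results[(core, j)] = block_product(first_subs[core], second_subs[j])
        # Update step (assumed direction): each core takes over its neighbour's block.
        held = held[1:] + held[:1]
    return results


first_subs = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]    # row blocks of a 4 x 2 matrix
second_subs = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]   # column blocks of a 2 x 4 matrix
results = run_until_complete(first_subs, second_subs)
print(sorted(results))
```

After m rounds the dictionary holds one block for every pairing of a first sub-matrix with a second sub-matrix, which is the condition under which S402 stops.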

Next, an implementation manner of "update the second submatrix stored in each core" in the foregoing embodiment will be described by way of one embodiment.

In one embodiment, updating the second submatrix stored in each core includes:

Taking the example that the first matrix is split into four first sub-matrices A0, A1, A2 and A3 and the second matrix is split into four second sub-matrices B0, B1, B2 and B3, the updating step of the second sub-matrices in each core is described.

As shown in fig. 5, the neuromorphic chip includes four independent cores. In the initial state, the first core stores A0 and B0, the second core stores A1 and B1, the third core stores A2 and B2, and the fourth core stores A3 and B3. When the operation of the four cores is finished, A0, A1, A2 and A3 are kept unchanged in their cores, and B0, B1, B2 and B3 are cycled among the cores according to a preset order.

Alternatively, the second sub-matrices may be cycled among the cores in the column order of the second sub-matrices in the second matrix, or in the position order of the cores.

In the embodiment of the application, the cycling order of the second submatrices among the cores can be the column order of the second submatrices in the second matrix or the position order of the cores, which makes the way the data stored in the cores is updated more varied and flexible. In addition, because the second submatrices in the cores are updated by cycling, each first submatrix and each second submatrix needs to be stored only once in the whole neuromorphic chip, which reduces the number of cores used in the neuromorphic chip.
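For the four-core example with A0 to A3 and B0 to B3, one possible cycling schedule can be listed as follows. The shift direction (core c holding B with index c plus the round number, modulo the core count) is an assumption made for illustration; the application only requires that the cycling follow a preset order.

```python
def rotation_schedule(num_cores):
    """List, per round, which (first, second) sub-matrix pair each core holds."""
    schedule = []
    for round_index in range(num_cores):
        pairs = [(f"A{c}", f"B{(c + round_index) % num_cores}")
                 for c in range(num_cores)]
        schedule.append(pairs)
    return schedule


schedule = rotation_schedule(4)
for round_index, pairs in enumerate(schedule):
    print(round_index, pairs)
```

Across the four rounds every first sub-matrix meets every second sub-matrix exactly once, while each core only ever holds one of each.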

The operation results of the first matrix and the second matrix are obtained according to the operation results of the cores. In view of this, the following describes the steps of acquiring the running results of the cores by one embodiment.

In one embodiment, as shown in fig. 6, the operation results of each kernel obtained by all the operations are obtained, including:

S601, for any core, transposing the second sub-matrix in the core to generate a second transposed sub-matrix.

In the neuromorphic chip, the elements of a matrix are stored by rows, and the primitive instructions of the neuromorphic chip can only multiply vectors. Taking a product operation of the first submatrix and the second submatrix as an example, a row of the first submatrix needs to be multiplied by a column of the second submatrix. The operation corresponding to the primitive instruction is shown in fig. 7: the first operand can only be one row and the second operand can only be one column. A row of the first submatrix is obtained with a single addressing operation, whereas a column of the second submatrix must be gathered through multiple addressing operations across different rows, so addressing takes a lot of time and reduces operation efficiency.

Based on this, when one first sub-matrix and one second sub-matrix are stored in the cores, the second transposed sub-matrix is generated by transposing the second sub-matrix while keeping the storage format of the first sub-matrix unchanged for any core. In this way, in the process of each kernel operation, only one addressing is needed to obtain the elements of the second sub-matrix, so that the addressing time of matrix operation is saved.

S602, obtaining an operation result of the kernel according to the first submatrix and the second transposed submatrix.

Since each row of the first submatrix has as many elements as each column of the second submatrix, and the second transposed submatrix is obtained by transposing the second submatrix, each row of the first submatrix has the same length as each row of the second transposed submatrix. The elements of each row of the first submatrix are multiplied by the corresponding elements of each row of the second transposed submatrix and the products are accumulated, which gives the operation result of the first submatrix and the second submatrix, that is, the operation result of the core.
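Written out for the product operation, the per-core computation is an inner product of two rows. The symbols are introduced here only for illustration and do not appear in the original: A is the first submatrix, B the second submatrix, B^T the second transposed submatrix, K the common row length, and C the operation result of the core.

```latex
C_{ij} \;=\; \sum_{k=1}^{K} A_{ik}\, B_{kj} \;=\; \sum_{k=1}^{K} A_{ik}\, \bigl(B^{\mathsf{T}}\bigr)_{jk}
```

The right-hand form reads row i of A and row j of B^T, so both operands are stored contiguously under row-wise storage.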

In this embodiment of the application, the second submatrix in each core is transposed so that a column of elements of the second submatrix can be obtained with a single addressing operation. This saves addressing time during the matrix operation, simplifies the operation on the first submatrix and the second transposed submatrix, and further improves the running speed of the core.

Because the storage process and the operation process of each core are independent of those of the other cores, the operation results of the cores are also separate from one another; to obtain the operation results of the first matrix and the second matrix, the operation results of the cores need to be integrated. In view of this, the following embodiment describes how the operation result of the matrices is obtained once the operation result of each core has been acquired.
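Steps S601 and S602 can be illustrated with a small sketch in plain Python. The helper names `transpose`, `row_dot` and `core_product` are hypothetical, and a Python list of rows merely stands in for the chip's row-wise storage; `row_dot` plays the role of the vector multiply primitive.

```python
def transpose(block):
    """Return the transpose of a block stored as a list of rows."""
    return [list(col) for col in zip(*block)]


def row_dot(u, v):
    """Multiply-accumulate two equally long rows (stand-in for the vector primitive)."""
    return sum(a * b for a, b in zip(u, v))


def core_product(first_sub, second_sub):
    """Multiply first_sub (m x k) by second_sub (k x n) using row reads only."""
    # S601: after transposition each former column is one contiguous row.
    second_t = transpose(second_sub)
    # S602: every result element is one row-by-row multiply-accumulate.
    return [[row_dot(a_row, b_row) for b_row in second_t] for a_row in first_sub]


first_sub = [[1, 2, 3],
             [4, 5, 6]]
second_sub = [[1, 0],
              [0, 1],
              [2, 2]]
result = core_product(first_sub, second_sub)
print(result)  # [[7, 8], [16, 17]]
```

The transposition is done once per stored submatrix, after which every column access becomes a single row read.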

In one embodiment, as shown in fig. 8, determining the operation result of the first matrix and the second matrix according to the operation result of each kernel includes:

S801, obtaining the calculation result address of each core.

It should be noted that one core corresponds to a plurality of operation results, and the calculation result address of a core refers to the position of each operation result of that core within the matrix operation result. Once the first submatrix has been stored in a core, the calculation result address of that core is determined.

S802, filling operation results of the cores into corresponding positions in an operation result matrix according to the calculation result addresses of the cores to obtain operation results of a first matrix and a second matrix; the number of rows and columns of the operation result matrix is determined according to the first matrix and the second matrix.

According to the logic of matrix operations, taking the product of the first matrix and the second matrix as an example, the operation result matrix has the same number of rows as the first matrix and the same number of columns as the second matrix.

Each time an operation result of a core is obtained, it is written to the corresponding position according to the calculation result address of that core; once the operation results of all the cores have been written, the operation results of the first matrix and the second matrix are obtained.

In this embodiment of the application, the operation result of each core is filled in according to the calculation result address of that core to obtain the operation result matrix; the result obtained is reliable and the procedure is easy to carry out. Moreover, because the operation result matrix is dimensioned according to the first matrix and the second matrix, it can hold the operation results of all the cores without wasting storage space of the neuromorphic chip.
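The filling step S802 can be sketched as follows. This is an illustration under stated assumptions: the function `fill_results` is hypothetical, and the calculation result address of a block is represented simply as a (row offset, column offset) pair into the result matrix.

```python
def fill_results(num_rows, num_cols, core_results):
    """Assemble the operation result matrix from per-core result blocks.

    core_results is a list of (row_offset, col_offset, block) tuples; the two
    offsets stand in for the calculation result address (an assumption).
    """
    # The result has as many rows as the first matrix and as many columns
    # as the second matrix.
    result = [[0] * num_cols for _ in range(num_rows)]
    for row_offset, col_offset, block in core_results:
        for i, block_row in enumerate(block):
            for j, value in enumerate(block_row):
                result[row_offset + i][col_offset + j] = value
    return result


blocks = [
    (0, 0, [[1, 2], [3, 4]]),   # one core's result, placed at columns 0..1
    (0, 2, [[5, 6], [7, 8]]),   # another result, placed at columns 2..3
]
assembled = fill_results(2, 4, blocks)
print(assembled)  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

Because each block has its own address, blocks can be written in whatever order the cores finish.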

Because the matrix stored in each core is updated cyclically, one core corresponds to a plurality of operation results, and the calculation result address of one core therefore includes the addresses of that plurality of operation results. In view of this, the following embodiment describes the step of acquiring the calculation result address of each core.

In one embodiment, as shown in fig. 9, obtaining the calculation result address of each core includes:

S901, acquiring a row address of the first submatrix in each core and a transposed row address of the second transposed submatrix in each core.

For any kernel, acquiring a row address of a first sub-matrix according to the first sub-matrix stored in the kernel; and obtaining the transposed row address of the second transposed sub-matrix according to the second transposed sub-matrix stored in the kernel.

S902, determining the calculation result address of each core according to the row address of each first submatrix and the transposed row address of each second transposed submatrix.

The calculation result address is the basis for storing the calculation results of the first submatrix and the second submatrix. The calculation result address of each core is the address of a matrix with a designated number of rows and a designated number of columns, where the designated number of rows is the number of rows of the first submatrix and the designated number of columns is the number of rows of the second transposed submatrix.

The row position, within the calculation result, of the operation result of a row element is determined according to the row address of the first submatrix in the core; the column position, within the calculation result, of the operation result of a transposed row element is determined according to the transposed row address of the second transposed submatrix in the core. In this way, when a designated row element is operated on with a designated transposed row element, the calculation result address is determined from the corresponding row position and column position in the calculation result, and the calculation result addresses for the whole first submatrix and second transposed submatrix are obtained by analogy.

In this embodiment of the application, the calculation result address of a core is determined according to the row address of the first submatrix and the transposed row address of the second transposed submatrix in the core; this process conforms to the logic of matrix operations and supports the pipelined working mode of the neuromorphic chip.
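A minimal sketch of S901 and S902, assuming that a row address is simply the index of that row in the first matrix and that a transposed row address is the index of the corresponding column in the second matrix. The function name `calculation_result_addresses` is hypothetical.

```python
def calculation_result_addresses(first_row_addrs, transposed_row_addrs):
    """Pair every row of the first submatrix with every transposed row.

    Assumption: a row address is the row's index in the first matrix, and a
    transposed row address is the index of the matching column in the second
    matrix. Each pair is then a (row, column) position in the result matrix.
    """
    return [[(r, c) for c in transposed_row_addrs] for r in first_row_addrs]


# A core holding rows 2..3 of the first matrix and, after transposition,
# the former columns 4..6 of the second matrix.
addresses = calculation_result_addresses([2, 3], [4, 5, 6])
print(addresses)
# [[(2, 4), (2, 5), (2, 6)], [(3, 4), (3, 5), (3, 6)]]
```

The address grid has as many rows as the first submatrix and as many columns as the second transposed submatrix has rows, which matches the designated row and column numbers described above.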

Before the matrix operation is performed on the neuromorphic chip, storage resources and computing resources required by the matrix operation are preset, and the number of kernels is allocated to ensure that the matrix operation can be performed smoothly. Based on this, the determination step of the number of cores is explained below by way of one embodiment.

In one embodiment, as shown in fig. 10, before storing the first submatrices and the second submatrices in different cores in the neuromorphic chip, the matrix operation method further includes:

S1001, determining a storage space according to the first matrix and the second matrix.

First, the storage space occupied by the two matrices is determined according to the dimensions of the first matrix and the dimensions of the second matrix; the storage space is then enlarged to cover the space required for performing the calculation and for holding the calculation result of the two matrices. In this embodiment of the application, twice the sum of the storage spaces of the two matrices is determined as the storage space.

Optionally, the dimensions of the first matrix and of the second matrix are both 1024×1024, so that each matrix occupies 1MB of storage space and 2MB is required in total; considering that space is still required for performing the calculation and for holding the calculation result of the first matrix and the second matrix, the storage space is enlarged to 4MB. Taking as an example the first matrix divided into four first submatrices of dimension 256×256 and the second matrix divided into four second submatrices of dimension 256×256, the product operation of the first matrix and the second matrix is shown in fig. 11, and the required time complexity is 1024×1024 multiplications.

S1002, determining the number of kernels from the neuromorphic chip according to the storage space.

In the neuromorphic chip, the size of each core is 128kb, and the number of cores is determined according to the ratio of the memory space to the core size.

Still taking the dimensions of the first matrix and the second matrix as 1024×1024 as an example, a memory space of 4MB is required, and since the size of each core is 128kb, 16 cores are determined from within the neuromorphic chip.

In this embodiment of the application, the number of cores is determined by taking into account the storage space of the first matrix and the second matrix, the space needed to operate on them, and the storage space of their operation result. The number determined in this way is reliable, and the core resources in the neuromorphic chip are allocated effectively.
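The sizing rule in S1001 and S1002 can be sketched as below. Several points are assumptions rather than statements from the text: one byte per element (inferred from a 1024×1024 matrix occupying 1MB), rounding the ratio up to a whole core, and reading "128kb" as 128 KiB. Under that last reading the ratio for the 1024×1024 example evaluates to 32 rather than the 16 stated above, so the example presumably rests on a different per-core capacity; the sketch only computes the ratio and does not try to reconcile the two figures.

```python
import math


def storage_space(rows_a, cols_a, rows_b, cols_b, bytes_per_element=1):
    """Twice the combined size of the two operand matrices, in bytes.

    bytes_per_element=1 is an assumption inferred from the 1MB figure.
    """
    operand_bytes = (rows_a * cols_a + rows_b * cols_b) * bytes_per_element
    return 2 * operand_bytes


def core_count(storage_bytes, core_bytes):
    """Ratio of required storage to per-core capacity, rounded up (assumption)."""
    return math.ceil(storage_bytes / core_bytes)


space = storage_space(1024, 1024, 1024, 1024)
print(space // (1024 * 1024))          # 4 (MiB)
print(core_count(space, 128 * 1024))   # 32 when "128kb" is read as 128 KiB
```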

In one embodiment, a matrix operation method is provided, including the steps of:

(1) Determining the storage space according to the first matrix and the second matrix.

(2) Determining the number of different cores from within the neuromorphic chip according to the storage space.

(3) Splitting the elements in the first matrix to obtain a plurality of first submatrices, and splitting the elements in the second matrix to obtain a plurality of second submatrices.

(4) Storing each first submatrix into a core in such a way that one core corresponds to one first submatrix.

(5) When the storage of each first submatrix is completed, storing each second submatrix into a core in such a way that one core corresponds to one second submatrix.

(6) If the current run of each core has finished, keeping the first submatrix in each core unchanged, rotating the second submatrices among the cores according to a preset order, and running each updated core. The preset order includes the column order of the second submatrices within the second matrix, or the position order of the cores.

(7) After each run of the updated cores has finished, continuing to execute the step of updating the second submatrix stored in each core until every core has stored every second submatrix.

(8) For any core, transposing the second submatrix in the core to generate a second transposed submatrix.

(9) Obtaining the operation result of the core according to the first submatrix and the second transposed submatrix.

(10) Obtaining the operation results of each core from all the runs.

(11) Acquiring the row address of the first submatrix in each core and the transposed row address of the second transposed submatrix in each core.

(12) Determining the calculation result address of each core according to the row address of each first submatrix and the transposed row address of each second transposed submatrix.

(13) Filling the operation result of each core into the corresponding position in the operation result matrix according to the calculation result address of each core to obtain the operation results of the first matrix and the second matrix; the number of rows and columns of the operation result matrix is determined according to the first matrix and the second matrix.

In this embodiment of the application, the elements in the first matrix are first split into a plurality of first submatrices and the elements in the second matrix are split into a plurality of second submatrices; the plurality of first submatrices and the plurality of second submatrices are then stored into different cores in the neuromorphic chip, each core storing one first submatrix and one second submatrix; finally, the operation results of the first matrix and the second matrix are obtained by running each core. On the basis of row splitting of the elements in the first matrix and column splitting of the elements in the second matrix, the operation results of the first matrix and the second matrix are obtained from the first submatrices obtained by row splitting and the second submatrices obtained by column splitting. Because the correspondence between the row elements of the first matrix and the column elements of the second matrix is taken into account during the matrix operation, the operation flow of the matrix is simplified. In addition, storing one first submatrix and one second submatrix in each of the different cores means that a core holds only the submatrices it needs to operate on and no additional matrix elements, so that every matrix element held in a core is actually used. On this basis, obtaining the operation result by running each core saves storage resources in the neuromorphic chip and further improves its operation speed.
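Steps (3) to (13) can be tied together in a small end-to-end sketch that checks the assembled result against an ordinary matrix product. It is an illustration only: all function names are hypothetical, the cores are simulated sequentially, the first matrix is split by rows and the second by columns (one reading of the text), the rotation shifts by one position per round, and the matrix dimensions are assumed to be divisible by the number of cores.

```python
def split_rows(matrix, parts):
    """Split a matrix into `parts` blocks of consecutive rows."""
    size = len(matrix) // parts
    return [matrix[p * size:(p + 1) * size] for p in range(parts)]


def split_cols(matrix, parts):
    """Split a matrix into `parts` blocks of consecutive columns."""
    size = len(matrix[0]) // parts
    return [[row[p * size:(p + 1) * size] for row in matrix] for p in range(parts)]


def block_product(first_sub, second_sub):
    """Transpose the second submatrix, then multiply-accumulate row by row."""
    second_t = [list(col) for col in zip(*second_sub)]
    return [[sum(a * b for a, b in zip(a_row, b_row)) for b_row in second_t]
            for a_row in first_sub]


def multicore_matmul(first, second, num_cores):
    """Simulate the cores: fixed first submatrices, rotating second submatrices."""
    first_subs = split_rows(first, num_cores)
    second_subs = split_cols(second, num_cores)
    row_size = len(first) // num_cores
    col_size = len(second[0]) // num_cores
    result = [[0] * len(second[0]) for _ in range(len(first))]
    for step in range(num_cores):            # one round per rotation
        for core in range(num_cores):        # cores run independently
            held = (core + step) % num_cores  # second submatrix held this round
            block = block_product(first_subs[core], second_subs[held])
            # Fill the block at its address in the result matrix.
            for i, block_row in enumerate(block):
                for j, value in enumerate(block_row):
                    result[core * row_size + i][held * col_size + j] = value
    return result


def naive_matmul(first, second):
    """Reference product used only to check the sketch."""
    return [[sum(first[i][k] * second[k][j] for k in range(len(second)))
             for j in range(len(second[0]))] for i in range(len(first))]


first = [[i + j for j in range(4)] for i in range(4)]
second = [[i * j + 1 for j in range(4)] for i in range(4)]
product = multicore_matmul(first, second, 2)
print(product == naive_matmul(first, second))  # True
```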

It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are likewise not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiment of the application also provides a matrix operation device for implementing the matrix operation method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more matrix operation device embodiments provided below, reference may be made to the limitations of the matrix operation method above, which are not repeated here.

In one embodiment, as shown in fig. 12, there is provided a matrix operation device 1200 including: a splitting module 1201, a storage module 1202 and an acquisition module 1203, wherein:

a splitting module 1201, configured to perform row splitting on elements in the first matrix to obtain a plurality of first submatrices, and perform row splitting on elements in the second matrix to obtain a plurality of second submatrices;

a storage module 1202, configured to store the plurality of first submatrices and the plurality of second submatrices into different cores in the neuromorphic chip, respectively; storing a first sub-matrix and a second sub-matrix in each core;

the acquisition module 1203 is configured to obtain an operation result of the first matrix and the second matrix by running each core.

In one embodiment, the storage module 1202 includes a first storage unit and a second storage unit, wherein:

the first storage unit is used for storing each first sub-matrix into each core in a mode that one core corresponds to one first sub-matrix;

and the second storage unit is used for storing each second submatrix into each core in a mode that one core corresponds to one second submatrix under the condition that the storage of each first submatrix is completed.

In one embodiment, the acquisition module 1203 includes a matrix update unit, an update loop unit, and a result determination unit, where:

the matrix updating unit is used for updating the second sub-matrix stored in each kernel if the current operation of each kernel is finished, and operating each updated kernel;

the updating circulation unit is used for continuously executing the step of updating the second submatrices stored in each kernel after the operation of each kernel updated each time is finished until each kernel stores each second submatrix, and obtaining the operation results of each kernel obtained by all operation;

and the result determining unit is used for determining the operation results of the first matrix and the second matrix according to the operation results of the cores.

In one embodiment, the matrix updating unit is further configured to control the first submatrix in each core to be unchanged, and cycle the second submatrix in each core according to a preset sequence; the preset sequence includes a column sequence order of each second sub-matrix in the second matrix, or a core position order.

In one embodiment, the update loop unit includes a matrix transpose subunit and a matrix operation subunit, wherein:

a matrix transposition subunit, configured to transpose, for any core, a second sub-matrix in the core, to generate a second transposed sub-matrix;

and a matrix operation subunit, configured to obtain the operation result of the core according to the first submatrix and the second transposed submatrix.

In one embodiment, the result determination unit comprises an address acquisition subunit and a result filling subunit, wherein:

an address obtaining subunit, configured to obtain a calculation result address of each core;

the result filling subunit is used for filling the operation result of each kernel to the corresponding position in the operation result matrix according to the calculation result address of each kernel to obtain the operation results of the first matrix and the second matrix; the number of rows and columns of the operation result matrix is determined according to the first matrix and the second matrix.

In one embodiment, the address obtaining subunit is further configured to obtain a row address of a first sub-matrix in each core and a transposed row address of a second transposed sub-matrix in each core, and determine the computation result address of each core according to the row address of each first sub-matrix and the transposed row address of each second transposed sub-matrix.

In one embodiment, the matrix operation device 1200 further comprises a space determination module and a number determination module, wherein:

the space determining module is used for determining a storage space according to the first matrix and the second matrix;

and the quantity determining module is used for determining the number of cores from the neuromorphic chip according to the storage space.

Each module in the matrix operation device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or be independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

In one embodiment, the processor when executing the computer program further performs the steps of:

obtaining the calculation result address of each kernel;

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

Obtaining the calculation result address of each kernel;

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

obtaining the calculation result address of each kernel;

Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The above examples represent only a few embodiments of the application and are described in some detail, but they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims

1. A matrix operation method, applied to a neuromorphic chip, the method comprising:

storing the first submatrices and the second submatrices into different kernels in a neuromorphic chip respectively; storing a first sub-matrix and a second sub-matrix in each core;

2. The method of claim 1, wherein storing the plurality of first sub-matrices and the plurality of second sub-matrices, respectively, into different cores within a neuromorphic chip comprises:

storing each first sub-matrix into each kernel in a mode that one kernel corresponds to one first sub-matrix;

and under the condition that the storage of each first submatrix is completed, storing each second submatrix into each kernel in a mode that one kernel corresponds to one second submatrix.

3. The method according to claim 1 or 2, wherein the obtaining the operation results of the first matrix and the second matrix by running each kernel includes:

if the operation of each core is finished currently, updating the second sub-matrix stored in each core, and operating each core after updating;

after the operation of each updated kernel is finished, continuing to execute the step of updating the second submatrices stored in each kernel until each kernel stores each second submatrix, and obtaining the operation results of each kernel obtained by all operations;

and determining operation results of the first matrix and the second matrix according to operation results of the kernels.

4. A method according to claim 3, wherein said updating the second sub-matrix stored in each of said cores comprises:

controlling the first submatrix in each core to be unchanged, and circulating the second submatrix in each core according to a preset sequence; the preset sequence comprises a column sequence of each second sub-matrix in the second matrix, or a core position sequence.

5. A method according to claim 3, wherein said obtaining operation results for each of said kernels for all operations comprises:

for any kernel, transposing a second sub-matrix in the kernel to generate a second transposed sub-matrix;

6. A method according to claim 3, wherein said determining the operation result of the first matrix and the second matrix according to the operation result of each of the cores comprises:

obtaining the calculation result address of each kernel;

filling the operation result of each core to the corresponding position in the operation result matrix according to the calculation result address of each core to obtain the operation results of the first matrix and the second matrix; the number of rows and columns of the operation result matrix is determined according to the first matrix and the second matrix.

7. The method of claim 6, wherein the obtaining the computation result address of each core comprises:

8. The method of claim 1 or 2, wherein prior to storing the plurality of first sub-matrices and the plurality of second sub-matrices, respectively, into different cores within a neuromorphic chip, the method further comprises:

and determining the number of kernels from the neuromorphic chip according to the storage space.

9. A data processing apparatus, the apparatus comprising:

the storage module is used for respectively storing the plurality of first submatrices and the plurality of second submatrices into different kernels in the neuromorphic chip; storing a first sub-matrix and a second sub-matrix in each core;

the acquisition module is used for acquiring operation results of the first matrix and the second matrix by operating the kernels.

10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.