CN108984115B - Data parallel writing and reading method, device and system

Data parallel writing and reading method, device and system

Info

Publication number
CN108984115B
Authority
CN
China
Prior art keywords
vector
data
read
index
written
Prior art date
Legal status
Expired - Fee Related
Application number
CN201810614178.3A
Other languages
Chinese (zh)
Other versions
CN108984115A (en)
Inventor
刘大可
苗志东
Current Assignee
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201810614178.3A
Publication of CN108984115A
Application granted
Publication of CN108984115B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/062 Securing storage systems
    • G06F3/0622 Securing storage systems in relation to access
    • G06F3/061 Improving I/O performance

Abstract

The invention provides a method, a device and a system for writing and reading data in parallel, wherein the method comprises the following steps: converting a write data index of a data vector to be written into a one-dimensional write address; acquiring a write-in data enable vector, a first storage index vector and a first storage address vector of the to-be-written data vector according to the one-dimensional write-in address and the number of preset write-in data; and reordering the write-in data enable vector, the first storage address vector and the to-be-written data vector according to the first storage index vector, and storing the reordered to-be-written data vector into the parallel memory according to the reordered write-in data enable vector and the first storage address vector. The invention supports the parallel writing of data from one or more dimensions, and improves the flexibility and writing efficiency of data writing.

Description

Data parallel writing and reading method, device and system
Technical Field
The invention belongs to the technical field of data access, and particularly relates to a method, a device and a system for writing and reading data in parallel.
Background
In recent years, artificial intelligence has been widely used in various fields. Artificial intelligence algorithms usually process large volumes of data, so accelerating their execution requires optimizing not only the computing system but also the storage system.
The GPU is a widely used hardware platform for artificial intelligence algorithms, and its storage system is a special multi-level cache structure optimized for matrix computation. In the field of embedded applications, due to constraints such as power consumption, artificial intelligence algorithms are typically implemented using custom programmable chips instead of GPUs. One such embedded chip is the vector computer chip, which is also well suited to accelerating artificial intelligence algorithms, and such chips typically use a vector memory as their storage system.
However, the GPU is large in both size and power consumption, which greatly limits its use in the embedded application field. Artificial intelligence algorithms require a large number of matrix operations, so the data they process are often one-dimensional or multi-dimensional data blocks, and different algorithms need to read and write data sequentially and in parallel from one or more dimensions. A vector memory can only access vector data of a fixed length on each access, so its data access is not flexible enough to meet the access requirements of complex and variable artificial intelligence algorithms.
Disclosure of Invention
In order to overcome, or at least partially solve, the problems of large size, high power consumption and inflexible data access in conventional data access systems, the present invention provides a data parallel writing and reading method, device and system.
According to a first aspect of the present invention, there is provided a data parallel writing method, comprising:
converting a write data index of a data vector to be written into a one-dimensional write address; the data vector to be written is a one-dimensional or multidimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;
acquiring a write-in data enable vector, a first storage index vector and a first storage address vector of the to-be-written data vector according to the one-dimensional write-in address and the number of preset write-in data; each element in the write data enable vector is used for indicating whether the element at the corresponding position in the to-be-written data vector is written or not; the first storage index vector is a vector formed by indexes of storage subunits in the parallel memory corresponding to each element in the data vector to be written; the first storage address vector is a vector formed by addresses in storage subunits corresponding to elements in the data vector to be written;
and reordering the write-in data enable vector, the first storage address vector and the to-be-written data vector according to the first storage index vector, and storing the reordered to-be-written data vector into the parallel memory according to the reordered write-in data enable vector and the first storage address vector.
According to a second aspect of the present invention, there is provided a data parallel reading method, comprising:
converting a read data index of a data vector to be read into a one-dimensional read address; the data vector to be read is a one-dimensional or multi-dimensional vector in a multi-dimensional data matrix to be read; the read data index is an index of a first element to be read in all elements of the data vector to be read in the multidimensional data matrix to be read;
acquiring a read data enabling vector, a second storage index vector and a second storage address vector of the to-be-read data vector according to the one-dimensional read address and the number of preset read data; each element in the read data enable vector is used for indicating whether an element at a corresponding position in the to-be-read data vector is read or not; the second storage index vector is a vector formed by indexes of storage subunits of elements in the data vector to be read in a parallel memory; the second storage address vector is a vector formed by addresses of elements in the data vector to be read in each storage subunit;
and reordering the read data enable vector and the second storage address vector according to the second storage index vector, reading a storage data vector from the parallel memory according to the reordered read data enable vector and the second storage address vector, and reordering the storage data vector according to the second storage index vector to obtain the data vector to be read.
According to a third aspect of the present invention, there is provided a data parallel writing apparatus comprising:
the first conversion module is used for converting a write data index of a data vector to be written into a one-dimensional write address; the data vector to be written is a one-dimensional or multidimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;
the first obtaining module is used for obtaining a write-in data enabling vector, a first storage index vector and a first storage address vector of the data vector to be written according to the one-dimensional write-in address and the number of preset write-in data; each element in the write data enable vector is used for indicating whether the element at the corresponding position in the to-be-written data vector is written or not; the first storage index vector is a vector formed by indexes of storage subunits in the parallel memory corresponding to each element in the data vector to be written; the first storage address vector is a vector formed by addresses in storage subunits corresponding to elements in the data vector to be written;
and the storing module is used for reordering the write-in data enabling vector, the first storage address vector and the to-be-written data vector according to the first storage index vector, and storing the reordered to-be-written data vector into the parallel memory according to the reordered write-in data enabling vector and the first storage address vector.
According to a fourth aspect of the present invention, there is provided a data parallel reading apparatus comprising:
the second conversion module is used for converting the read data index of the data vector to be read into a one-dimensional read address; the data vector to be read is a one-dimensional or multidimensional vector in a multidimensional data matrix to be read, and the read data index is the index, in the multidimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read;
the second obtaining module is used for obtaining a read data enabling vector, a second storage index vector and a second storage address vector of the data vector to be read according to the one-dimensional read address and the number of preset read data; each element in the read data enable vector is used for indicating whether an element at a corresponding position in the to-be-read data vector is read or not; the second storage index vector is a vector formed by indexes of storage subunits in the parallel memory corresponding to each element in the data vector to be read; the second storage address vector is a vector formed by addresses in storage subunits corresponding to elements in the data vector to be read;
and the reading module is used for reordering the read data enable vector and the second storage address vector according to the second storage index vector, reading a storage data vector from the parallel memory according to the reordered read data enable vector and the second storage address vector, reordering the storage data vector according to the second storage index vector and acquiring the data vector to be read.
According to a fifth aspect of the present invention, there is provided a data parallel read-write system, comprising:
parallel memory, and the data parallel writing device and the data parallel reading device.
The invention provides a method, a device and a system for writing and reading data in parallel, wherein the method comprises the steps of converting a written data index of a data vector to be written into a one-dimensional written address, obtaining a written data enable vector, a first storage index vector and a first storage address vector of the data vector to be written according to the one-dimensional written address and the number of preset written data, reordering the written data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector, and storing the reordered data vector to be written into a parallel memory according to the reordered written data enable vector and the first storage address vector, so that the data can be written into the parallel memory from one or more dimensions in parallel, and the flexibility and the writing efficiency of the data are improved.
Drawings
Fig. 1 is a schematic overall flow chart of a data parallel writing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating parallel writing of four-dimensional data vectors to be written in two dimensions in the data parallel writing method according to the embodiment of the present invention;
fig. 3 is a schematic diagram illustrating parallel writing of four-dimensional data vectors to be written in one dimension in the data parallel writing method according to the embodiment of the present invention;
FIG. 4 is a diagram illustrating a method for parallel data writing according to an embodiment of the present invention, in which a written data index is transformed into a one-dimensional written address;
FIG. 5 is a schematic diagram of a network structure for reordering writes in a parallel data writing method according to an embodiment of the present invention;
fig. 6 is a schematic overall flow chart of a data parallel reading method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an overall structure of a data parallel writing apparatus according to an embodiment of the present invention;
fig. 8 is a schematic view of an overall structure of a data parallel reading apparatus according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Fig. 1 is a schematic overall flow chart of a data parallel writing method provided in an embodiment of the present invention, where the method includes: S101, converting a write data index of a data vector to be written into a one-dimensional write address; the data vector to be written is a one-dimensional or multidimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;
the DATA vector to be written is a four-dimensional vector [ DIM3, DIM2, DIM1, DIM0] under the control of a control signal W _ ctrl0 and W _ ctrl1, so as to generate a one-dimensional write address, for example, the DATA vector to be written is a four-dimensional vector [ DIM3, DIM2, DIM1, DIM0], wherein DIM3, DIM2, DIM1, and DIM0 represent different dimensions, the size of the DATA vector to be written in each dimension is DIM3, DIM2, DIM1, and DIM 0. the maximum number of parallel write DATA is S, the DATA vector to be written W _ DATA is a vector with the length of N, the number of continuous writes from DIM 6 and DIM0 is continuous as shown in fig. 2, the maximum number of parallel writes from DIM0, the number of continuous writes from DIM 43 to DIM0, the maximum number of parallel writes from two dimensions is shown in parallel 3, i.e. the maximum number of parallel writes from two dimensions of parallel reads and writes is shown in 1, i.e. the maximum number of parallel writes from the parallel write parameter 1, which is defined as shown in the case of two continuous reads and parallel write parameter 3642.
S102, acquiring a write-in data enable vector, a first storage index vector and a first storage address vector of a data vector to be written according to the one-dimensional write-in address and the number of preset write-in data; each element in the write data enable vector is used for indicating whether the element at the corresponding position in the data vector to be written is written or not; the first storage index vector is a vector formed by indexes of all storage subunits in the parallel memory corresponding to all elements in the data vector to be written; the first storage address vector is a vector formed by addresses in storage subunits corresponding to elements in the data vector to be written;
the preset write data number W _ M is the number of elements needing to be written in the data vector to be written. And calculating according to the one-dimensional write address W _ base and the preset write data number W _ M to obtain a write data enable vector, a first storage index vector and a first storage address vector of the data vector to be written. The write data enable vector, the first storage index vector and the first storage address vector are vectors with the length equal to N respectively. Each element in the write DATA enable vector W _ BE is 0 or 1, and is used to indicate whether the element at the corresponding position in the DATA vector W _ DATA to BE written is written, where 1 indicates writing and 0 indicates not writing. The first storage index vector W _ BI is a vector formed by the indexes of the storage subunits to be stored in the parallel memory corresponding to each element in the DATA vector W _ DATA to be written. The first memory address vector W _ BA is a vector of addresses into which the DATA vector W _ DATA to be written is to be stored in the respective memory subunit.
S103, reordering the write-in data enable vector, the first storage address vector and the data vector to be written according to the first storage index vector, and storing the reordered data vector to be written into the parallel memory according to the reordered write-in data enable vector and the first storage address vector.
Specifically, the write data enable vector, the first storage address vector and the data vector to be written are input into a write reordering network, and the write data enable vector W_BE, the first storage address vector W_BA and the data vector to be written W_DATA are reordered according to the first storage index vector, giving the reordered write data enable vector W_BE_R, the reordered first storage address vector W_BA_R and the reordered data vector to be written W_DATA_R. Here W_BA_R holds, for each storage subunit, the address at which its element of the data vector to be written is to be stored, W_BE_R indicates whether each storage subunit is enabled, and W_DATA_R holds, for each storage subunit, the element of the data vector to be written that it is to store.
According to the embodiment, the written data index of the data vector to be written is converted into the one-dimensional written address, the written data enable vector, the first storage index vector and the first storage address vector of the data vector to be written are obtained according to the one-dimensional written address and the preset number of the written data, the written data enable vector, the first storage address vector and the data vector to be written are reordered according to the first storage index vector, and the reordered written data vector to be written is stored in the parallel memory according to the reordered written data enable vector and the first storage address vector, so that the data can be written in from one or more dimensions in parallel, and the flexibility and the writing efficiency of data writing are improved.
On the basis of the foregoing embodiment, in this embodiment, the step S101 specifically includes: reordering the written data indexes, and splitting index values corresponding to preset dimensions written in parallel in the reordered written data indexes into a plurality of index values respectively;
for example, as shown in fig. 4, the write data indices [ dim0, dim1, dim2, dim3] are reordered under the control of w _ ctrl0 to obtain reordered write data indices [ dimnp1, dimnp0, dimp1, dimp0 ]. The DIMP1 and the DIMP0 are two dimensions for parallel writing, the DIMNP1 and the DIMNP0 are two dimensions for which parallel writing is not required, and the sizes of the data vectors to be written DIMNP1, DIMNP0, DIMP1 and DIMP0 in the four dimensions are DIMNP1, DIMNP0, DIMP1 and DIMP0, respectively.
The maximum numbers of parallel writes in the two parallel write dimensions dimp1 and dimp0 are defined as K and L, respectively; K and L may be set according to the performance of the parallel memory. According to K, dimension dimp1 is split into two dimensions, dimp1_p and dimp1_b, whose index values are dimp1_p = dimp1 % K and dimp1_b = dimp1 // K; the size of the data vector to be written in dimension dimp1_p is DIMP1_P = K, and its size in dimension dimp1_b is DIMP1_B = DIMP1 // K. Similarly, according to L, dimension dimp0 is split into two dimensions, dimp0_p and dimp0_b, whose index values are dimp0_p = dimp0 % L and dimp0_b = dimp0 // L; the size of the data vector to be written in dimension dimp0_p is DIMP0_P = L, and its size in dimension dimp0_b is DIMP0_B = DIMP0 // L. The split write data index is thus [dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b, dimp0_p].
The split write data index is then reordered again under the control of w_ctrl1, and the one-dimensional write address is calculated from the re-reordered write data index.
For example, the split write data index [dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b, dimp0_p] is reordered again, resulting in the re-reordered write data index [dnp3, dnp2, dnp1, dnp0, dp1, dp0], where dp1 = dimp1_p and dp0 = dimp0_p, and dnp3, dnp2, dnp1 and dnp0 are a reordering of dimnp1, dimnp0, dimp1_b and dimp0_b. The sizes of the data vector to be written in the re-reordered dimensions are DNP3, DNP2, DNP1, DNP0, DP1 and DP0, respectively, obtained by applying the same reordering to DIMNP1, DIMNP0, DIMP1_B, DIMP1_P, DIMP0_B and DIMP0_P. The one-dimensional write address w_base is then calculated from the re-reordered write data index by the following formula:
w_base = dp0 + dp1*DP0 + dnp0*DP0*DP1 + dnp1*DP0*DP1*DNP0 + dnp2*DP0*DP1*DNP0*DNP1 + dnp3*DP0*DP1*DNP0*DNP1*DNP2. This embodiment does not limit the number of dimensions of the data vector to be written, which dimensions are split or into how many dimensions they are split, nor which dimensions are reordered when the index is reordered again.
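The splitting and linearization just described can be summarized in a short Python sketch. This is only a model of the formulas above, not the patent's hardware implementation; the function names and argument layout are illustrative assumptions.
def split_parallel_index(dimp, max_parallel):
    # Split one parallel-write index value into its "parallel" and "block" parts,
    # exactly as dimp1 and dimp0 are split by K and L above.
    return dimp % max_parallel, dimp // max_parallel

def linearize(dp0, dp1, dnp0, dnp1, dnp2, dnp3, DP0, DP1, DNP0, DNP1, DNP2):
    # Mixed-radix expansion of the re-reordered index [dnp3, dnp2, dnp1, dnp0, dp1, dp0]
    # into the one-dimensional address, following the formula above.
    return (dp0
            + dp1 * DP0
            + dnp0 * DP0 * DP1
            + dnp1 * DP0 * DP1 * DNP0
            + dnp2 * DP0 * DP1 * DNP0 * DNP1
            + dnp3 * DP0 * DP1 * DNP0 * DNP1 * DNP2)
The same linearization yields r_base on the read side, since the one-dimensional read address is generated in the same way as the write address.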
On the basis of the foregoing embodiment, in this embodiment, the step of obtaining the write data enable vector, the first storage index vector, and the first storage address vector of the data vector to be written according to the one-dimensional write address and the preset number of write data specifically includes: determining the number of elements with a median value of 1 in a write data enable vector according to the number of preset write data, and determining the number of elements with a median value of 0 in a write data enable vector according to the difference between the length of the data vector to be written and the number of the preset write data; and acquiring a first storage index vector and a first storage address vector according to the index of each element in the to-be-written data vector, the one-dimensional writing address and the length of the to-be-written data vector.
Specifically, from W_M a mathematical expression for the write data enable vector W_BE can be derived: W_BE = [W_M{1}, (N-W_M){0}]. That is, the first W_M elements of the data vector to be written W_DATA are written, and the other elements of W_DATA are not written. The first storage index vector W_BI is calculated as: W_BI = (w_base + [0,1,2,…,N-1]) % N. The first storage address vector W_BA is calculated as: W_BA = (w_base + [0,1,2,…,N-1]) // N.
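A minimal Python sketch of these three formulas, assuming N storage subunits and the Python meanings of % and // used in the text:
def write_vectors(w_base, w_m, n):
    # W_BE: the first w_m elements are written, the rest are not.
    w_be = [1] * w_m + [0] * (n - w_m)
    # W_BI: storage subunit index for each element of W_DATA.
    w_bi = [(w_base + i) % n for i in range(n)]
    # W_BA: address inside the chosen subunit for each element.
    w_ba = [(w_base + i) // n for i in range(n)]
    return w_be, w_bi, w_ba
With the example values used later (w_base = 2, W_M = 4, N = 4) this returns W_BE = [1,1,1,1], W_BI = [2,3,0,1] and W_BA = [0,0,1,1].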
On the basis of the foregoing embodiments, in this embodiment, the reordering the write data enable vector, the first storage address vector, and the to-be-written data vector according to the first storage index vector specifically includes: acquiring a write-in data index vector formed by indexes of elements corresponding to the indexes of the storage subunits in the to-be-written data vector according to the first storage index vector; and reordering the write data enable vector, the first storage address vector and the data vector to be written according to the write data index vector.
Specifically, the write data index vector W_BI_R corresponds to W_BI and represents the index vector formed by the indexes, within the data vector to be written W_DATA, of the elements to be written by each storage subunit of the parallel memory. It can be calculated from W_BI by the formula: W_BI_R = (N - W_BI[0] + [0,1,2,…,N-1]) % N. The write data enable vector W_BE, the first storage address vector W_BA and the data vector to be written W_DATA are reordered according to the write data index vector W_BI_R. In the write reordering network, W_BE, W_BA and W_DATA are each reordered independently according to W_BI_R; a schematic structure of the write reordering network is shown in fig. 5.
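The reordering can be modelled as a gather in which output position j (storage subunit j) takes input element W_BI_R[j]. This interpretation reproduces the worked example below, but it is only a sketch; the actual write reordering network of fig. 5 is a hardware routing structure.
def write_reorder(w_be, w_ba, w_data, w_bi):
    n = len(w_bi)
    # W_BI_R as defined above.
    w_bi_r = [(n - w_bi[0] + i) % n for i in range(n)]
    # Independently route element w_bi_r[j] of each input vector to output position j.
    gather = lambda vec: [vec[i] for i in w_bi_r]
    return gather(w_be), gather(w_ba), gather(w_data)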
For example, a four-dimensional data matrix [dim3, dim2, dim1, dim0] is written into a parallel memory composed of 4 storage subunits, that is, N = 4, and the sizes of the matrix in the four dimensions are DIM3 = 12, DIM2 = 10, DIM1 = 8 and DIM0 = 6. Under the control of w_ctrl0, the index values of the write data index in the four dimensions are reordered to obtain [dimnp1, dimnp0, dimp1, dimp0], with sizes DIMNP1 = 12, DIMNP0 = 10, DIMP1 = 8 and DIMP0 = 6 in the corresponding dimensions. The two parallel write dimensions dimp1 and dimp0 are then split according to the maximum parallel write numbers K and L, and the split write data index is reordered again under the control of w_ctrl1 to obtain [dnp3, dnp2, dnp1, dnp0, dp1, dp0].
When the write data index w_index is [0,2,0,0], the preset write data number W_M is 4, and the data vector to be written W_DATA is [6,7,8,9], the data parallel write process is as follows:
First, w_base is calculated from the write data index w_index, namely:
w_base = dp0 + dp1*DP0 + dnp0*DP0*DP1 + dnp1*DP0*DP1*DNP0 + dnp2*DP0*DP1*DNP0*DNP1 + dnp3*DP0*DP1*DNP0*DNP1*DNP2 = 2.
Then W_BE, W_BI, W_BI_R and W_BA are calculated from w_base and W_M, namely:
W_BE=[W_M{1},(N-W_M){0}]=[1,1,1,1];
W_BI=(w_base+[0,1,2,…,N-1])%N=[2,3,0,1];
W_BI_R=(N-W_BI[0]+[0,1,2,…,N-1])%N=[2,3,0,1];
W_BA=(w_base+[0,1,2,…,N-1])//N=[0,0,1,1];
the W _ BE, W _ BA and W _ DATA inputs are written into the reordering network, and under the control of W _ BI _ R, the output is:
W_BE_R=[1,1,1,1];
W_BA_R=[1,1,0,0];
W_DATA_R=[8,9,6,7]。
The data vector to be written is then written into the parallel memory in parallel according to W_BE_R, W_BA_R and W_DATA_R. Since every element of W_BE_R is 1, all storage subunits are enabled; W_BA_R gives the write address within each storage subunit, and W_DATA_R gives the data to be stored by each storage subunit.
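The worked example can be checked end to end with a toy model of the parallel memory (N independently addressed storage subunits, each a small array of words); the bank depth of 8 is an arbitrary assumption made only for illustration.
N = 4
banks = [[None] * 8 for _ in range(N)]       # 4 storage subunits, 8 words each

W_BE_R, W_BA_R, W_DATA_R = [1, 1, 1, 1], [1, 1, 0, 0], [8, 9, 6, 7]

for b in range(N):                            # each enabled subunit stores one element
    if W_BE_R[b]:
        banks[b][W_BA_R[b]] = W_DATA_R[b]

# Linear address a lives in subunit a % N at word a // N, so addresses 2..5
# now hold the written vector [6, 7, 8, 9].
assert [banks[a % N][a // N] for a in range(2, 6)] == [6, 7, 8, 9]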
In another embodiment of the present invention, a data parallel reading method is provided. Fig. 6 is a schematic overall flow chart of the data parallel reading method provided in the embodiment of the present invention, where the method includes: S601, converting a read data index of a data vector to be read into a one-dimensional read address; the data vector to be read is a one-dimensional or multidimensional vector in a multidimensional data matrix to be read, and the read data index is the index, in the multidimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read;
the data vector to be read is a data vector which needs to be read in parallel. The read data index is an index of a first element to be read in the data vector to be read in the multidimensional data matrix to be read. Wherein the element to be read is an element which needs to be read. The read data index is transformed under the control of the control signals r _ ctrl0 and r _ ctrl1 to generate a one-dimensional read address. The one-dimensional read address generation method is the same as the one-dimensional write address generation method.
S602, acquiring a read data enable vector, a second storage index vector and a second storage address vector of a data vector to be read according to the one-dimensional read address and the number of preset read data; each element in the read data enable vector is used for indicating whether an element at a corresponding position in the data vector to be read is read or not; the second storage index vector is a vector formed by indexes of storage subunits of elements in the data vector to be read in the parallel memory; the second storage address vector is a vector formed by addresses of elements in the data vector to be read in each storage subunit;
the preset number of the read data is the number of elements needing to be read in the data vector to be read. And calculating according to the one-dimensional reading address R _ base and the preset reading data number R _ M to obtain a reading data enabling vector, a second storage index vector and a second storage address vector of the data vector to be read. The read data enable vector, the second storage index vector and the second storage address vector are vectors with the length equal to N respectively. Each element in the read DATA enable vector R _ BE is 0 or 1, and is used to indicate whether the element at the corresponding position in the DATA vector R _ DATA to BE read is read, where 1 indicates read and 0 indicates no read. The second storage index vector R _ BI is a vector formed by storing the index of the subunit in the parallel memory for each element in the DATA vector R _ DATA to be read. The second memory address vector R _ BA is a vector formed by reading addresses of each element in the DATA vector R _ DATA to be read in each memory subunit.
S603, reordering the read data enable vector and the second storage address vector according to the second storage index vector, reading the storage data vector from the parallel memory according to the reordered read data enable vector and the second storage address vector, reordering the storage data vector according to the second storage index vector, and obtaining the data vector to be read.
Specifically, the read data enable vector and the second storage address vector are input into a read reordering network, and the read data enable vector R_BE and the second storage address vector R_BA are reordered according to the second storage index vector R_BI, giving the reordered read data enable vector R_BE_R and the reordered second storage address vector R_BA_R. Here R_BA_R holds, for each storage subunit, the address at which its element of the data vector to be read is stored, and R_BE_R indicates, for each storage subunit, whether its element of the data vector to be read is read. The stored data vector R_DATA_R is then read from the parallel memory according to R_BE_R and R_BA_R. The elements of R_DATA_R are arranged in storage-subunit order; their values are the same as those of the data vector to be read R_DATA, but in a different order. R_DATA_R is therefore reordered according to the second storage index vector R_BI to obtain R_DATA.
In this embodiment, the read data index of the data vector to be read is converted into a one-dimensional read address, the read data enable vector, the second storage index vector and the second storage address vector of the data vector to be read are obtained according to the one-dimensional read address and the preset number of read data, the read data enable vector and the second storage address vector are reordered according to the second storage index vector, the stored data vector is read from the parallel memory according to the reordered read data enable vector and second storage address vector, and the stored data vector is reordered according to the second storage index vector to obtain the data vector to be read, so that data can be read from one or more dimensions in parallel, which improves the flexibility and efficiency of data reading.
On the basis of the foregoing embodiment, in this embodiment the step S601 of converting the read data index of the data vector to be read into a one-dimensional read address specifically includes: reordering the read data index, and splitting the index values corresponding to the preset dimensions read in parallel in the reordered read data index into a plurality of index values respectively; and reordering the split read data index again, and calculating the one-dimensional read address from the re-reordered read data index.
For example, the read data index [dim0, dim1, dim2, dim3] is reordered under the control of r_ctrl0 to obtain the reordered read data index [dimnp1, dimnp0, dimp1, dimp0]. Here dimp1 and dimp0 are the two dimensions read in parallel, dimnp1 and dimnp0 are the two dimensions that do not need to be read in parallel, and the sizes of the data vector to be read in the four dimensions dimnp1, dimnp0, dimp1 and dimp0 are DIMNP1, DIMNP0, DIMP1 and DIMP0, respectively.
The maximum numbers of parallel reads in the two parallel read dimensions dimp1 and dimp0 are defined as K and L, respectively; K and L may be set according to the performance of the parallel memory. According to K, dimension dimp1 is split into two dimensions, dimp1_p and dimp1_b, whose index values are dimp1_p = dimp1 % K and dimp1_b = dimp1 // K; the size of the data vector to be read in dimension dimp1_p is DIMP1_P = K, and its size in dimension dimp1_b is DIMP1_B = DIMP1 // K. Similarly, according to L, dimension dimp0 is split into two dimensions, dimp0_p and dimp0_b, whose index values are dimp0_p = dimp0 % L and dimp0_b = dimp0 // L; the size of the data vector to be read in dimension dimp0_p is DIMP0_P = L, and its size in dimension dimp0_b is DIMP0_B = DIMP0 // L. The split read data index is thus [dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b, dimp0_p].
The split read data index is then reordered again, and the one-dimensional read address is calculated from the re-reordered read data index.
For example, the split read data index [dimnp1, dimnp0, dimp1_b, dimp1_p, dimp0_b, dimp0_p] is reordered again under the control of r_ctrl1, resulting in the re-reordered read data index [dnp3, dnp2, dnp1, dnp0, dp1, dp0], where dp1 = dimp1_p and dp0 = dimp0_p, and dnp3, dnp2, dnp1 and dnp0 are a reordering of dimnp1, dimnp0, dimp1_b and dimp0_b. The sizes of the data vector to be read in the re-reordered dimensions are DNP3, DNP2, DNP1, DNP0, DP1 and DP0, respectively, obtained by applying the same reordering to DIMNP1, DIMNP0, DIMP1_B, DIMP1_P, DIMP0_B and DIMP0_P. The one-dimensional read address r_base is then calculated from the re-reordered read data index by the following formula:
r_base = dp0 + dp1*DP0 + dnp0*DP0*DP1 + dnp1*DP0*DP1*DNP0 + dnp2*DP0*DP1*DNP0*DNP1 + dnp3*DP0*DP1*DNP0*DNP1*DNP2. This embodiment does not limit the number of dimensions of the data vector to be read, which dimensions are split or into how many dimensions they are split, nor which dimensions are reordered when the index is reordered again.
On the basis of the foregoing embodiment, in this embodiment, the step of obtaining the read data enable vector, the second storage index vector, and the second storage address vector of the data vector to be read according to the one-dimensional read address and the preset number of read data specifically includes:
determining the number of elements with the median value of 1 in a read data enabling vector according to the number of preset read data, and determining the number of elements with the median value of 0 in the read data enabling vector according to the difference value between the length of the data vector to be read and the number of the preset read data; and acquiring a second storage index vector and a second storage address vector according to the index of each element in the data vector to be read, the one-dimensional reading address and the length of the data vector to be read.
Specifically, from R_M a mathematical expression for the read data enable vector R_BE can be derived: R_BE = [R_M{1}, (N-R_M){0}]. That is, the first R_M elements of the data vector to be read R_DATA are read, and the other elements of R_DATA are not read. The second storage index vector R_BI is calculated as: R_BI = (r_base + [0,1,2,…,N-1]) % N. The second storage address vector R_BA is calculated as: R_BA = (r_base + [0,1,2,…,N-1]) // N.
On the basis of the foregoing embodiments, in this embodiment, the reordering the read data enable vector and the second memory address vector according to the second memory index vector specifically includes: acquiring a read data index vector formed by indexes of elements corresponding to the indexes of the storage subunits in the to-be-read data vector according to the second storage index vector; the read data enable vector and the second memory address vector are reordered according to the read data index vector.
Specifically, the read data index vector R_BI_R corresponds to R_BI and represents the index vector formed by the indexes, within the data vector to be read R_DATA, of the elements to be read by each storage subunit of the parallel memory. It can be calculated from R_BI by the formula: R_BI_R = (N - R_BI[0] + [0,1,2,…,N-1]) % N. The read data enable vector R_BE and the second storage address vector R_BA are reordered according to the read data index vector R_BI_R. In the read reordering network, R_BE and R_BA are each reordered independently according to R_BI_R; the structure of the read reordering network is the same as that of the write reordering network. The stored data vector R_DATA_R is read from the parallel memory according to R_BE_R and R_BA_R. The elements of R_DATA_R are arranged in storage-subunit order; their values are the same as those of the data vector to be read R_DATA, but in a different order. R_DATA_R is therefore reordered according to R_BI to obtain R_DATA.
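The read path can be sketched in the same style as the write path. Both the forward reorder of R_BE and R_BA and the final reorder of R_DATA_R back into element order are modelled here as gathers; this reproduces the worked example below but is an illustrative model, not the hardware network.
def parallel_read(banks, r_base, r_m, n):
    r_be = [1] * r_m + [0] * (n - r_m)
    r_bi = [(r_base + i) % n for i in range(n)]
    r_ba = [(r_base + i) // n for i in range(n)]
    r_bi_r = [(n - r_bi[0] + i) % n for i in range(n)]
    gather = lambda vec, idx: [vec[i] for i in idx]
    r_be_r = gather(r_be, r_bi_r)
    r_ba_r = gather(r_ba, r_bi_r)
    # Each enabled storage subunit returns the word at its reordered address.
    r_data_r = [banks[b][r_ba_r[b]] if r_be_r[b] else None for b in range(n)]
    # Reorder the subunit outputs back into element order using R_BI.
    return gather(r_data_r, r_bi)
Applied to the memory contents produced by the write example (r_base = 2, R_M = 4, N = 4), this returns [6, 7, 8, 9].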
For example, the data vector W_DATA = [6,7,8,9] written into the parallel memory in the above embodiment is read back. That is, the data vector to be read R_DATA is [6,7,8,9], the read data index r_index is [0,2,0,0], and the preset read data number R_M is 4. The data parallel reading process is as follows:
First, r_base is calculated from the read data index r_index, namely:
r_base = dp0 + dp1*DP0 + dnp0*DP0*DP1 + dnp1*DP0*DP1*DNP0 + dnp2*DP0*DP1*DNP0*DNP1 + dnp3*DP0*DP1*DNP0*DNP1*DNP2 = 2.
Then R_BE, R_BI, R_BI_R and R_BA are calculated from r_base and R_M, namely:
R_BE=[R_M{1},(N-R_M){0}]=[1,1,1,1];
R_BI=(r_base+[0,1,2,…,N-1])%N=[2,3,0,1];
R_BI_R=(N-R_BI[0]+[0,1,2,…,N-1])%N=[2,3,0,1];
R_BA=(r_base+[0,1,2,…,N-1])//N=[0,0,1,1];
R_BE and R_BA are input into the read reordering network, and under the control of R_BI_R the outputs are:
R_BE_R=[1,1,1,1];
R_BA_R=[1,1,0,0];
The stored data vector R_DATA_R is then read from the parallel memory according to R_BE_R and R_BA_R. Since every element of R_BE_R is 1, all storage subunits are enabled; R_BA_R gives the read address within each storage subunit, and the data read out is R_DATA_R = [8,9,6,7]. The values of the elements of R_DATA_R are the same as those of R_DATA but in a different order, so a reordering is required: R_DATA_R is reordered under the control of R_BI to obtain R_DATA.
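The final reorder of this example can be stated in one line; the values are taken from the example above.
R_BI, R_DATA_R = [2, 3, 0, 1], [8, 9, 6, 7]
R_DATA = [R_DATA_R[i] for i in R_BI]   # element i comes from subunit R_BI[i]
assert R_DATA == [6, 7, 8, 9]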
In another embodiment of the present invention, a data parallel writing apparatus is provided, referring to fig. 7. The device is used for realizing the embodiment of the data parallel writing method. Therefore, the description and definition in the data parallel writing method in the foregoing embodiments can be used for understanding each execution module in the embodiments of the present invention.
The data parallel writing device comprises: the first transformation module 701 is configured to transform a write data index of a data vector to be written into a one-dimensional write address; the data vector to be written is a one-dimensional or multidimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written; the first obtaining module 702 is configured to obtain a write data enable vector, a first storage index vector, and a first storage address vector of the data vector to be written according to the one-dimensional write address and a preset number of write data; each element in the write data enable vector is used for indicating whether the element at the corresponding position in the data vector to be written is written or not; the first storage index vector is a vector formed by indexes of all storage subunits in the parallel memory corresponding to all elements in the data vector to be written; the first storage address vector is a vector formed by addresses in storage subunits corresponding to elements in the data vector to be written; the storing module 703 is configured to reorder the write data enable vector, the first storage address vector, and the to-be-written data vector according to the first storage index vector, and store the reordered to-be-written data vector into the parallel memory according to the reordered write data enable vector and the first storage address vector.
On the basis of the foregoing embodiment, in this embodiment, the first transformation module is specifically configured to: reordering the written data indexes, and splitting index values corresponding to preset dimensions written in parallel in the reordered written data indexes into a plurality of index values respectively; and reordering the split written data index again, and calculating according to the reordered written data index again to obtain a one-dimensional written address.
On the basis of the foregoing embodiment, in this embodiment, the first obtaining module is specifically configured to: determining the number of elements with a median value of 1 in a write data enable vector according to the number of preset write data, and determining the number of elements with a median value of 0 in a write data enable vector according to the difference between the length of the data vector to be written and the number of the preset write data; and acquiring a first storage index vector and a first storage address vector according to the index of each element in the to-be-written data vector, the one-dimensional writing address and the length of the to-be-written data vector.
On the basis of the foregoing embodiments, the storage module in this embodiment is specifically configured to: acquiring a write-in data index vector formed by indexes of elements corresponding to the indexes of the storage subunits in the to-be-written data vector according to the first storage index vector; and reordering the write data enable vector, the first storage address vector and the data vector to be written according to the write data index vector.
The writing data index of the data vector to be written is converted into a one-dimensional writing address, the writing data enable vector, the first storage index vector and the first storage address vector of the data vector to be written are obtained according to the one-dimensional writing address and the preset number of the writing data, the writing data enable vector, the first storage address vector and the data vector to be written are reordered according to the first storage index vector, and the reordered data vector to be written is stored in the parallel memory according to the reordered writing data enable vector and the first storage address vector, so that the data can be written in from one or more dimensions in parallel, and the flexibility and the writing efficiency of data writing are improved.
In another embodiment of the present invention, a data parallel reading apparatus is provided, referring to fig. 8. The device is used for realizing the embodiments of the data parallel reading method. Therefore, the description and definition in the data parallel reading method in the foregoing embodiments can be used for understanding each execution module in the embodiments of the present invention.
The data parallel reading device comprises: the second conversion module 801 is configured to convert a read data index of a data vector to be read into a one-dimensional read address; the data vector to be read is a one-dimensional or multidimensional vector in a multidimensional data matrix to be read, and the read data index is the index, in the multidimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read; the second obtaining module 802 is configured to obtain a read data enable vector, a second storage index vector, and a second storage address vector of the data vector to be read according to the one-dimensional read address and the number of preset read data; each element in the read data enable vector is used for indicating whether an element at a corresponding position in the data vector to be read is read or not; the second storage index vector is a vector formed by indexes of all storage subunits in the parallel memory corresponding to all elements in the data vector to be read; the second storage address vector is a vector formed by addresses in the storage subunits corresponding to the elements in the data vector to be read; the reading module 803 is configured to reorder the read data enable vector and the second storage address vector according to the second storage index vector, read the storage data vector from the parallel memory according to the reordered read data enable vector and the second storage address vector, reorder the storage data vector according to the second storage index vector, and obtain the data vector to be read.
On the basis of the foregoing embodiment, the second transformation module in this embodiment is specifically configured to: reordering the read data indexes, and splitting index values corresponding to preset dimensions which are read in parallel in the reordered read data indexes into a plurality of index values respectively; and reordering the split read data index again, and calculating according to the reordered read data index again to obtain a one-dimensional read address.
On the basis of the foregoing embodiment, the second obtaining module in this embodiment is specifically configured to: determining the number of elements with the median value of 1 in a read data enabling vector according to the number of preset read data, and determining the number of elements with the median value of 0 in the read data enabling vector according to the difference value between the length of the data vector to be read and the number of the preset read data; and acquiring a second storage index vector and a second storage address vector according to the index of each element in the data vector to be read, the one-dimensional reading address and the length of the data vector to be read.
On the basis of the foregoing embodiments, the reading module in this embodiment is specifically configured to: acquiring a read data index vector formed by indexes of elements corresponding to the indexes of the storage subunits in the to-be-read data vector according to the second storage index vector; the read data enable vector and the second memory address vector are reordered according to the read data index vector.
In this embodiment, the read data index of the data vector to be read is converted into a one-dimensional read address, the read data enable vector, the second storage index vector and the second storage address vector of the data vector to be read are obtained according to the one-dimensional read address and the preset number of read data, the read data enable vector and the second storage address vector are reordered according to the second storage index vector, the stored data vector is read from the parallel memory according to the reordered read data enable vector and second storage address vector, and the stored data vector is reordered according to the second storage index vector to obtain the data vector to be read, so that data can be read from one or more dimensions in parallel, which improves the flexibility and efficiency of data reading.
In another embodiment of the present invention, a data parallel read-write system is provided, which includes a parallel memory, any one of the above embodiments of the data parallel write devices, and any one of the above embodiments of the data parallel read devices.
Finally, the method of the present application is only a preferred embodiment and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method for parallel writing of data, comprising:
converting a write data index of a data vector to be written into a one-dimensional write address; the data vector to be written is a one-dimensional or multidimensional vector in a multidimensional data matrix to be written, and the written data index is an index, in the multidimensional data matrix to be written, of a first element to be written among all elements of the data vector to be written;
acquiring a write-in data enable vector, a first storage index vector and a first storage address vector of the to-be-written data vector according to the one-dimensional write-in address and the number of preset write-in data; each element in the write data enable vector is used for indicating whether the element at the corresponding position in the to-be-written data vector is written or not; the first storage index vector is a vector formed by indexes of storage subunits in the parallel memory corresponding to each element in the data vector to be written; the first storage address vector is a vector formed by addresses in storage subunits corresponding to elements in the data vector to be written;
and reordering the write-in data enable vector, the first storage address vector and the to-be-written data vector according to the first storage index vector, and storing the reordered to-be-written data vector into the parallel memory according to the reordered write-in data enable vector and the first storage address vector.
2. The method of claim 1, wherein transforming the write data index of the data vector to be written to a one-dimensional write address comprises:
reordering the written data indexes, and splitting index values corresponding to preset dimensions written in parallel in the reordered written data indexes into a plurality of index values respectively;
and reordering the split written data index again, and calculating according to the reordered written data index again to obtain the one-dimensional written address.
3. The method according to claim 1, wherein the step of obtaining a write data enable vector, a first storage index vector, and a first storage address vector of the to-be-written data vector according to the one-dimensional write address and a preset number of write data specifically comprises:
determining the number of elements with a median value of 1 in the write data enable vector according to the number of the preset write data, and determining the number of elements with a median value of 0 in the write data enable vector according to the difference between the length of the to-be-written data vector and the number of the preset write data;
and acquiring the first storage index vector and the first storage address vector according to the index of each element in the to-be-written data vector, the one-dimensional writing address and the length of the to-be-written data vector.
4. The method according to any of claims 1-3, wherein reordering the write data enable vector, the first memory address vector, and the to-be-written data vector according to the first memory index vector comprises:
according to the first storage index vector, obtaining a write-in data index vector formed by indexes of elements corresponding to the indexes of the storage subunits in the to-be-written-in data vector;
and reordering the write data enable vector, the first storage address vector and the to-be-written data vector according to the write data index vector.
5. A method for parallel reading of data, comprising:
converting a read data index of a data vector to be read into a one-dimensional read address; the data vector to be read is a one-dimensional or multidimensional vector in a multidimensional data matrix to be read, and the read data index is an index, in the multidimensional data matrix to be read, of a first element to be read among all elements of the data vector to be read;
acquiring a read data enabling vector, a second storage index vector and a second storage address vector of the to-be-read data vector according to the one-dimensional read address and the number of preset read data; each element in the read data enable vector is used for indicating whether an element at a corresponding position in the to-be-read data vector is read or not; the second storage index vector is a vector formed by indexes of storage subunits of elements in the data vector to be read in a parallel memory; the second storage address vector is a vector formed by addresses of elements in the data vector to be read in each storage subunit;
and reordering the read data enable vector and the second storage address vector according to the second storage index vector, reading a storage data vector from the parallel memory according to the reordered read data enable vector and the second storage address vector, and reordering the storage data vector according to the second storage index vector to obtain the data vector to be read.
6. The method of claim 5, wherein converting the read data index of the data vector to be read into a one-dimensional read address comprises:
reordering the read data index, and splitting the index value corresponding to each preset dimension to be read in parallel in the reordered read data index into a plurality of index values;
and reordering the split read data index again, and calculating the one-dimensional read address according to the re-reordered read data index.
7. The method according to claim 5, wherein the step of obtaining the read data enable vector, the second storage index vector and the second storage address vector of the to-be-read data vector according to the one-dimensional read address and a preset number of read data specifically comprises:
determining the number of elements whose value is 1 in the read data enable vector according to the preset number of read data, and determining the number of elements whose value is 0 in the read data enable vector according to the difference between the length of the to-be-read data vector and the preset number of read data;
and acquiring the second storage index vector and the second storage address vector according to the index of each element in the to-be-read data vector, the one-dimensional read address and the length of the to-be-read data vector.
8. The method according to any one of claims 5-7, wherein reordering the read data enable vector and the second storage address vector according to the second storage index vector specifically comprises:
according to the second storage index vector, acquiring a read data index vector formed by the indexes of the elements in the to-be-read data vector that correspond to the indexes of the storage subunits;
and reordering the read data enable vector and the second storage address vector according to the read data index vector.
9. A data parallel writing apparatus, comprising:
the first conversion module is used for converting a write data index of a data vector to be written into a one-dimensional write address; wherein the data vector to be written is a one-dimensional or multidimensional vector in a multidimensional data matrix to be written, and the write data index is the index, in the multidimensional data matrix to be written, of the first element to be written among all elements of the data vector to be written;
the first obtaining module is used for obtaining a write data enable vector, a first storage index vector and a first storage address vector of the to-be-written data vector according to the one-dimensional write address and a preset number of write data; wherein each element in the write data enable vector is used for indicating whether the element at the corresponding position in the to-be-written data vector is written; the first storage index vector is a vector formed by the indexes of the storage subunits, in the parallel memory, corresponding to the elements of the to-be-written data vector; and the first storage address vector is a vector formed by the addresses, within the corresponding storage subunits, of the elements of the to-be-written data vector;
and the storing module is used for reordering the write data enable vector, the first storage address vector and the to-be-written data vector according to the first storage index vector, and storing the reordered to-be-written data vector into the parallel memory according to the reordered write data enable vector and the reordered first storage address vector.
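A software analogue of this writing apparatus could wire the three modules together as below; the class and helper names come from the earlier sketches and are illustrative assumptions, not the claimed hardware.

```python
# Illustrative composition of the three modules (conversion, obtaining, storing),
# built on the hypothetical helpers sketched above.
class ParallelWriter:
    def __init__(self, banks, dims):
        self.banks = banks   # one list per storage subunit of the parallel memory
        self.dims = dims     # dimension sizes of the data matrix to be written

    def write(self, data, start_index, num_valid):
        flat = to_flat_address(start_index, self.dims)            # first conversion module
        enable, bank_index, bank_address = make_access_vectors(   # first obtaining module
            flat, len(data), num_valid, len(self.banks))
        en, addr, vals = reorder_for_banks(                       # storing module
            enable, bank_index, bank_address, data)
        for k, (e, a, v) in enumerate(zip(en, addr, vals)):
            if e:
                self.banks[k][a] = v
```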
10. A data parallel reading apparatus, comprising:
the second conversion module is used for converting a read data index of a data vector to be read into a one-dimensional read address; wherein the data vector to be read is a one-dimensional or multidimensional vector in a multidimensional data matrix to be read, and the read data index is the index, in the multidimensional data matrix to be read, of the first element to be read among all elements of the data vector to be read;
the second obtaining module is used for obtaining a read data enable vector, a second storage index vector and a second storage address vector of the to-be-read data vector according to the one-dimensional read address and a preset number of read data; wherein each element in the read data enable vector is used for indicating whether the element at the corresponding position in the to-be-read data vector is read; the second storage index vector is a vector formed by the indexes of the storage subunits, in the parallel memory, corresponding to the elements of the to-be-read data vector; and the second storage address vector is a vector formed by the addresses, within the corresponding storage subunits, of the elements of the to-be-read data vector;
and the reading module is used for reordering the read data enable vector and the second storage address vector according to the second storage index vector, reading a stored data vector from the parallel memory according to the reordered read data enable vector and the reordered second storage address vector, and reordering the stored data vector according to the second storage index vector to obtain the data vector to be read.
11. A system for parallel reading and writing of data, characterized in that it comprises a parallel memory, the data parallel writing apparatus according to claim 9, and the data parallel reading apparatus according to claim 10.
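Tying the sketches together, a round trip through a toy parallel memory with four subunits might look as follows; all names and the interleaving scheme are the illustrative assumptions introduced above, not the patented system itself.

```python
# Toy round trip: write three elements of a 4 x 4 matrix starting at index (1, 2)
# (flat address 6) and read them back; plain Python lists stand in for memories.
banks = [[0] * 4 for _ in range(4)]
writer = ParallelWriter(banks, dims=(4, 4))
writer.write(data=[10, 11, 12, 0], start_index=(1, 2), num_valid=3)
print(parallel_read(banks, base_address=6, vector_length=4, num_valid=3))
# -> [10, 11, 12, None]
```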

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810614178.3A CN108984115B (en) 2018-06-14 2018-06-14 Data parallel writing and reading method, device and system

Publications (2)

Publication Number Publication Date
CN108984115A (en) 2018-12-11
CN108984115B (en) 2020-07-28

Family

ID=64540446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810614178.3A Expired - Fee Related CN108984115B (en) 2018-06-14 2018-06-14 Data parallel writing and reading method, device and system

Country Status (1)

Country Link
CN (1) CN108984115B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9507601B2 (en) * 2014-02-19 2016-11-29 Mediatek Inc. Apparatus for mutual-transposition of scalar and vector data sets and related method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1971537A (en) * 2005-11-25 2007-05-30 杭州中天微系统有限公司 Access method of matrix data and storage device of the matrix data
CN101478608A (en) * 2009-01-09 2009-07-08 南京联创科技股份有限公司 Fast operating method for mass data based on two-dimensional hash
CN104978148A (en) * 2014-04-09 2015-10-14 瑞萨电子(中国)有限公司 Data writing method and device and data reading method and device

Also Published As

Publication number Publication date
CN108984115A (en) 2018-12-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20200728
Termination date: 20210614