CN114201443A - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114201443A
CN114201443A (application CN202111542673.6A)
Authority
CN
China
Prior art keywords
data, vector, dimension, determining, number corresponding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111542673.6A
Other languages
Chinese (zh)
Inventor
裴京
王松
谢天天
于秋爽
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202111542673.6A priority Critical patent/CN114201443A/en
Publication of CN114201443A publication Critical patent/CN114201443A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 - Digital computers in general; Data processing equipment in general
    • G06F 15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F 15/163 - Interprocessor communication
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 - Partitioning or combining of resources
    • G06F 9/5066 - Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Complex Calculations (AREA)

Abstract

The disclosure relates to a data processing method and apparatus, an electronic device, and a storage medium, applied to a multi-processor many-core system. The processing method includes: dividing data into a plurality of first data and determining the vector numbers corresponding to the first data, where each first data includes at least one vector; determining the vector number corresponding to each vector within each first data based on the vector numbers corresponding to the first data; and sequentially allocating the vectors in each first data to the corresponding computing cores based on the vector numbers corresponding to those vectors. The processing method achieves flexible allocation of the first data. In addition, the processing method is not limited by the number of dimensions; allocating the first data by vector number simplifies the flow of distributing multidimensional data to the computing cores, provides high flexibility and convenience in data allocation, and improves the processing efficiency of the multi-processor many-core system.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a method and an apparatus for processing data, an electronic device, and a storage medium.
Background
The brain-like computing chip adopts a decentralized many-core parallel processing architecture in which each computing core can operate and exchange data independently. In such an architecture, which integrates storage and computation, the computing resources of each core are usually limited. How the data to be processed is divided, and how the divided data is distributed to the computing cores, therefore directly affects the data processing efficiency of the computing cores.
Disclosure of Invention
According to a first aspect of the present disclosure, there is provided a data processing method applied to a multi-processor many-core system, the multi-processor many-core system including a plurality of processors, each processor including a plurality of computing cores, the processing method including: dividing data into a plurality of first data, and determining a vector number corresponding to the first data, wherein the first data comprises at least one vector; determining a vector number corresponding to each vector within each first data based on the vector number corresponding to the first data; and sequentially allocating the vectors in each first data to corresponding computing cores based on the vector numbers corresponding to the vectors in each first data.
In a possible implementation manner, the vector includes a plurality of second data, and the allocation sequence numbers corresponding to the second data in the same vector are consecutive, and the allocation sequence numbers are used to indicate the sequence in which the second data are allocated to the computing core.
In a possible implementation, the determining a vector number corresponding to the first data includes: in a case where the number of distribution dimensions of the data is greater than or equal to two, performing the following: determining a first dimension which is not the first distribution order and a second dimension which is not the last distribution order according to the distribution order of the data in the distribution dimension; and determining all vector numbers corresponding to the first data based on the length of the first data in the first dimension and the number of the first data in the second dimension.
In a possible implementation, the sequentially allocating the vectors in each first data to the corresponding computing cores based on the corresponding vector numbers of the vectors in each first data includes: determining a computing core corresponding to each first data; and according to the sequence of the vector number corresponding to each first datum, sequentially distributing the vectors corresponding to the vector numbers to the corresponding computing cores.
In one possible embodiment, the dividing the data into a plurality of first data includes: dividing the data into a plurality of first data based on preset dividing parameters; wherein the partitioning parameter is used to determine a number of the first data in each dimension; and generating coordinates corresponding to each first datum based on the dividing parameters.
In a possible implementation manner, the determining, based on the vector number corresponding to the first data, the vector number corresponding to the vector within each first data includes: in a case where the number of distribution dimensions of the data is greater than or equal to two, performing the following: determining a vector number corresponding to a first vector in each first datum based on the length of the first datum in a first dimension, the number of the first datum in a second dimension and corresponding coordinates; and determining all vector numbers corresponding to each first datum based on the first vector number corresponding to each first datum and all vector numbers in the first datum.
In a possible implementation manner, the determining, based on the vector number corresponding to the first data, the vector number corresponding to the vector within each first data includes: and determining the vector number corresponding to each first data based on the number of the first data in the distribution dimensions and the vector number corresponding to the first data when the number of the distribution dimensions of the data is equal to one.
According to a second aspect of the present disclosure, there is provided a data processing apparatus applied to a multi-processor many-core system including a plurality of processors, each processor including a plurality of computing cores, the processing apparatus comprising: a data dividing module, configured to divide data into a plurality of first data and determine a vector number corresponding to the first data, wherein the first data comprises at least one vector; a vector distribution module, configured to determine a vector number corresponding to each vector within each first data based on the vector number corresponding to the first data; and a data distribution module, configured to sequentially allocate the vectors in each first data to the corresponding computing cores based on the vector numbers corresponding to the vectors in each first data.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the data processing method of any one of the above.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method of processing data as described in any one of the above.
The embodiment of the disclosure provides a data processing method applied to a multi-processor many-core system. The method divides data into a plurality of first data, determines the vector numbers corresponding to the first data, determines the vector number corresponding to each vector within each first data based on the vector numbers corresponding to the first data, and sequentially allocates the vectors in each first data to the corresponding computing cores based on those vector numbers, thereby realizing flexible allocation of the first data. In addition, the processing method is not limited by the number of dimensions; allocating the first data by vector number simplifies the flow of distributing multidimensional data to the computing cores, provides high flexibility and convenience in data allocation, and improves the processing efficiency of the multi-processor many-core system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a reference diagram illustrating data partitioning in the related art.
FIG. 2 is a reference schematic diagram of a multi-processor many-core system provided in accordance with an embodiment of the present disclosure.
Fig. 3 is a flowchart of a data processing method according to an embodiment of the present disclosure.
Fig. 4 is a reference diagram illustrating a processing method of two-dimensional data according to an embodiment of the present disclosure.
Fig. 5 is a reference diagram illustrating a processing method for processing three-dimensional data according to an embodiment of the disclosure.
Fig. 6 is a reference diagram illustrating a processing method for processing three-dimensional data according to an embodiment of the disclosure.
Fig. 7 is a reference diagram illustrating a four-dimensional data processing method according to an embodiment of the disclosure.
Fig. 8 is a reference diagram illustrating a processing method of two-dimensional data according to an embodiment of the present disclosure.
Fig. 9 is a reference diagram of a data processing method according to an embodiment of the disclosure.
Fig. 10 is a block diagram of a data processing apparatus provided according to an embodiment of the present disclosure.
Fig. 11 is a block diagram of an electronic device 1200 provided according to an embodiment of the present disclosure.
Fig. 12 is a block diagram of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating data partitioning in the related art. Core_0 to Core_5 in fig. 1 represent six computing cores, each small square represents one piece of sub-data (for example, one frame in a one-dimensional time-series signal), and 1 to 24 represent the storage sequence number of each sub-data in the storage medium (during allocation, this can also be understood as the sequence number in which the sub-data is allocated to the computing cores). In the related art, one-dimensional data is usually distributed by averaging, that is, the total number of sub-data is divided by the number of computing cores to be allocated. In fig. 1 the number of computing cores is 6 and the number of sub-data is 24, so each computing core is allocated 24/6 = 4 sub-data. With reference to fig. 1, the inventors found that this distribution scheme is inflexible: it requires that the storage sequence numbers of the sub-data within each computing core be consecutive and that the storage sequence numbers of the sub-data in adjacent computing cores be consecutive. For example, the storage sequence numbers of the sub-data in Core_0 are 1 to 4, and the storage sequence number of the last sub-data allocated to Core_0, namely 4, is consecutive with the storage sequence number 5 of the first sub-data in Core_1. In addition, this distribution scheme does not generalize well to multidimensional data, resulting in low flexibility when distributing data to the computing cores.
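As a concrete illustration, the related-art even-split scheme can be sketched as follows (a minimal sketch; the function name is ours, and the values are those of the Fig. 1 example):

```python
def even_split(n_sub: int, n_cores: int) -> list[list[int]]:
    """Related-art scheme: divide sub-data evenly among cores in storage order."""
    per_core = n_sub // n_cores
    # Each core receives a contiguous run of 1-based storage sequence numbers.
    return [list(range(c * per_core + 1, (c + 1) * per_core + 1))
            for c in range(n_cores)]

cores = even_split(24, 6)   # Fig. 1: 24 sub-data, 6 computing cores
# Core_0 holds sequence numbers 1 to 4, and the last number in Core_0 (4)
# is immediately followed by the first number in Core_1 (5), which is exactly
# the rigid continuity constraint criticized above.
```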
In view of this, the present disclosure provides a data processing method applied to a multi-processor many-core system. The method divides data into a plurality of first data, determines the vector numbers corresponding to the first data, then determines the vector number corresponding to each vector within each first data based on the vector numbers corresponding to the first data, and then sequentially allocates the vectors in each first data to the corresponding computing cores based on those vector numbers, thereby achieving flexible allocation of the first data. In addition, the processing method is not limited by the number of dimensions; allocating the first data by vector number simplifies the flow of distributing multidimensional data to the computing cores, provides high flexibility and convenience in data allocation, and improves the processing efficiency of the multi-processor many-core system.
Referring to FIG. 2, FIG. 2 is a schematic diagram of a multiprocessor many-core system according to an embodiment of the present disclosure. As shown in FIG. 2, a multi-processor many-core system may include multiple processors.
In one possible implementation, as shown in fig. 2, each processor may include multiple computing cores, with data transfer enabled between computing cores within each processor, and between computing cores of different processors; wherein each computing core includes a storage component for storing data for transmission with other computing cores.
In one possible implementation, as shown in FIG. 2, each compute core may include a processing component and a storage component. The processing component may include a dendrite unit, an axon unit, a soma unit, and a routing unit. The storage component may include a plurality of storage units.
In a possible implementation manner, a plurality of processors may also be integrated into a brain-like computing chip, i.e., a neuromorphic circuit that takes the processing mode of the brain as a reference and, by simulating the transmission and processing of information by neurons in the brain, improves processing efficiency and reduces power consumption. Each processor can comprise a plurality of computing cores, and the computing cores can independently process different tasks or process the same task in parallel, thereby improving processing efficiency. Information transmission between the computing cores can be carried out through the routing unit in each computing core.
Within a computing core, processing components and storage components may be provided. The processing component may comprise a dendrite unit, an axon unit, a soma unit, and a routing unit. The processing component can simulate the way the neurons of the brain process information: the dendrite unit is used for receiving signals, the axon unit is used for sending spike signals, the soma unit is used for integrated transformation of signals, and the routing unit is used for information transmission with other computing cores. The processing component in a computing core can perform read-write access on the plurality of storage units of the storage component to exchange data with the storage component within the computing core, and the units can respectively undertake their own data processing tasks and/or data transmission tasks to obtain data processing results, or communicate with other computing cores. Communicating with other computing cores includes communicating with other computing cores within the same processor, as well as with computing cores of other processors.
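As an illustrative, non-normative sketch of the structure described above, the components might be modeled as follows; the number of storage units per core, the class names, and the field names are all assumptions for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class StorageUnit:
    """One on-core memory bank, e.g. an SRAM (capacity assumed)."""
    capacity_bytes: int = 12 * 1024

@dataclass
class ComputeCore:
    """A computing core with a processing component and a storage component."""
    # Processing component: the four units named in the text.
    processing_units: tuple = ("dendrite", "axon", "soma", "routing")
    # Storage component: a plurality of storage units (count of 4 is assumed).
    storage_units: list = field(default_factory=lambda: [StorageUnit() for _ in range(4)])

core = ComputeCore()
```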
In one possible implementation manner, the storage component includes a plurality of storage units, where the storage units may be static random access memories (SRAMs). For example, an SRAM with a read/write width of 16 B and a capacity of 12 KB may be included. The capacity and bit width of the storage units are not limited in the present disclosure.
Referring to fig. 3 based on the multi-processor many-core system, fig. 3 is a flowchart of a data processing method according to an embodiment of the disclosure. With reference to fig. 3, the processing method includes:
Step S100, dividing the data into a plurality of first data, and determining the vector numbers corresponding to the first data, where each first data includes at least one vector. The vector numbers are used to determine the order in which the data is allocated to the compute cores. For example, a vector may include a plurality of second data, where the allocation sequence numbers corresponding to the second data in the same vector are consecutive, and the allocation sequence numbers indicate the order in which the second data are allocated to the computing cores. In one example, the data may be stored in an external memory or in one processor A of a multi-processor many-core system, and processor A may be coupled to another processor B to transfer data to some or all of the computing cores of processor B. The second data may be the basic units constituting the data; for example, in one-dimensional time-series data, each second data may be one frame of the time-series signal, and in two-dimensional and three-dimensional images, the second data may be pixel points.
Referring to fig. 4, fig. 4 is a reference schematic diagram illustrating a processing method for processing two-dimensional data according to an embodiment of the disclosure.
Referring to fig. 4, each first data is a rectangle with a width (i.e., side length in the horizontal direction) of 4 and a height (i.e., side length in the vertical direction) of 2, and each second data is a small square with a width and height of 1; that is, the data in fig. 4 includes 18 first data, and each first data includes 8 second data. Each first data may be assigned to a corresponding computing core. For example, the second data with allocation sequence numbers 1, 2, 3, 4, 13, 14, 15, and 16 belong to the first first data, whose corresponding computing core is Core_0; the second data with consecutive allocation sequence numbers 1 to 4 form one vector, and the second data with consecutive allocation sequence numbers 13 to 16 form another vector.
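The Fig. 4 layout can be reproduced numerically. In the sketch below, the overall width and height (12 x 12) are read off the figure and should be treated as assumptions; each first data is a 4 x 2 block, and each of its rows is one vector of consecutive allocation sequence numbers:

```python
W, H = 12, 12            # overall data size in Fig. 4 (assumed from the figure)
DIV_FX, DIV_FY = 4, 2    # width and height of one first data

def first_data_vectors(j: int, i: int) -> list[list[int]]:
    """Vectors (rows of consecutive allocation sequence numbers) of the
    first data at width-coordinate j and height-coordinate i."""
    vectors = []
    for y in range(DIV_FY):
        row = i * DIV_FY + y
        start = row * W + j * DIV_FX + 1   # allocation numbers are 1-based
        vectors.append(list(range(start, start + DIV_FX)))
    return vectors

# The first first data (j=0, i=0) contains the two vectors named in the text.
```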
Continuing to refer to fig. 3, in step S200, the vector number corresponding to each vector within each first data is determined based on the vector number corresponding to the first data.
Step S300, sequentially allocating the vectors in each first data to the corresponding computing cores based on the vector numbers corresponding to the vectors in each first data. Illustratively, each first data may correspond to one compute core, and the external memory or processor A sequentially assigns the second data to each compute core of processor B based on the order of the vector numbers in the first data.
The embodiment of the disclosure provides a data processing method applied to a multi-processor many-core system. The method divides data into a plurality of first data, determines the vector numbers corresponding to the first data, determines the vector number corresponding to each vector within each first data based on the vector numbers corresponding to the first data, and sequentially allocates the vectors in each first data to the corresponding computing cores based on those vector numbers, thereby realizing flexible allocation of the first data. In addition, the processing method is not limited by the number of dimensions; allocating the first data by vector number simplifies the flow of distributing multidimensional data to the computing cores, provides high flexibility and convenience in data allocation, and improves the processing efficiency of the multi-processor many-core system.
In one possible implementation, step S100 may include: in a case where the number of distribution dimensions of the data is greater than or equal to two, performing the following: determining a first dimension which is not the first distribution order and a second dimension which is not the last distribution order according to the distribution order of the data in the distribution dimension; and determining all vector numbers corresponding to the first data based on the length of the first data in the first dimension and the number of the first data in the second dimension.
Illustratively, a distribution dimension is a dimension along which the data is distributed. The distribution order, i.e., the order in which data is distributed across the dimensions, can also be understood as the storage order of the data before it is allocated to the computing cores. For example, with reference to fig. 4, the data is two-dimensional, and its distribution dimensions include width and height. If the data is distributed along the width dimension first and then the height dimension, the distribution order is as shown by the arrangement of allocation sequence numbers in fig. 4, that is, the allocation sequence numbers within the same row (the width dimension) are consecutive. If the data were distributed along the height dimension first and then the width dimension, the allocation sequence numbers within the same column (the height dimension) would be consecutive. Taking the distribution order in fig. 4 as an example, the first dimension (not first in the distribution order) is the height dimension, and the second dimension (not last in the distribution order) is the width dimension. Based on the data length of the first data in the height dimension (i.e., 2) and the number of first data in the width dimension (i.e., 3), all vector numbers corresponding to each first data can be determined.
For ease of understanding, the first data may be numbered width first and then height: the first first data comprises allocation sequence numbers 1, 2, 3, 4, 13, 14, 15, 16; the second comprises 5, 6, 7, 8, 17, 18, 19, 20; the fourth comprises 25, 26, 27, 28, 37, 38, 39, 40; and so on. In fig. 4 each first data includes two vectors, i.e., two groups of second data with consecutive allocation sequence numbers. For example, the first first data includes the vector 1, 2, 3, 4 and the vector 13, 14, 15, 16.
For example, in the case that the data is two-dimensional data, all vector numbers corresponding to the first data are determined based on the length of the first data in the first dimension and the number of the first data in the second dimension, and may be calculated by Python pseudo code as follows:
for y in range(Div_Fy):
    Core[0].Vector_num_th = J * y
where Div_Fy is the data length of the first data in the first dimension, i.e., the height dimension in fig. 4, with value 2. "for y in range(Div_Fy)" takes each integer y from 0 up to Div_Fy (including 0, excluding Div_Fy). Core[0].Vector_num_th is a vector number pre-allocated to Core_0, that is, a vector number in the first first data. J is the number of first data in the second dimension, i.e., the width dimension in fig. 4, with value 3. That is, the vector numbers of the vectors included in a first data can be determined as the product of the number of first data in the second dimension and each integer between 0 and the data length of the first data in the first dimension.
Referring to the above pseudocode, it can be calculated that in fig. 4 the first first data includes two vectors, with vector numbers 0 (i.e., 3 * 0) and 3 (i.e., 3 * 1).
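Extending the pseudocode from Core_0 to the first data at coordinates (j, i) gives the following runnable sketch; the coordinate-offset form is our reconstruction and should be read as an assumption consistent with the Fig. 4 numbering:

```python
def vector_numbers_2d(j: int, i: int, J: int, div_fy: int) -> list[int]:
    """Vector numbers owned by the first data at width-coordinate j and
    height-coordinate i, where J is the number of first data in the width
    dimension and div_fy is the height of one first data."""
    return [J * (i * div_fy + y) + j for y in range(div_fy)]

# Fig. 4 (J = 3, Div_Fy = 2): the first first data owns vectors 0 and 3,
# matching 3 * 0 and 3 * 1 computed above.
```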
Referring to fig. 5, fig. 5 is a reference diagram illustrating a processing method for processing three-dimensional data according to an embodiment of the present disclosure. As shown in fig. 5, the data is three-dimensional, i.e., its distribution dimensions include depth, width (the x-axis in fig. 5), and height (the y-axis in fig. 5). When the storage order is depth first, then width, then height, the data is stored (which may also be understood as the order in which it is allocated to the computing cores) in accordance with the table shown in fig. 5 (i.e., the compute task data store in fig. 5). Each small cube in the figure can be considered one second data.
Where Fz is the data length of the data in the depth dimension (96 px in fig. 5), Fx is the data length of the data in the width dimension (4 px in fig. 5), M is the number of divisions of the data in the depth dimension (i.e., the number of first data in the depth dimension, 3 in fig. 5), I is the length of the data in the height dimension (4 px in fig. 5), J is the length of the data in the width dimension (4 px in fig. 5), and Fz/M is the length of each vector (32 px in fig. 5).
Referring to fig. 6, fig. 6 is a reference schematic diagram illustrating a processing method for processing three-dimensional data according to an embodiment of the present disclosure. The storage order in fig. 6 is depth first, then width, then height. Here M is the number of partitions of the data in the depth dimension (i.e., the number of first data in the depth dimension, which is 3 in fig. 6), I is the number of partitions in the height dimension (i.e., the number of first data in the height dimension, which is 2 in fig. 6), and J is the number of partitions in the width dimension (i.e., the number of first data in the width dimension, which is 2 in fig. 6). The first first data (Core_0 in the figure) includes the vectors numbered 0 (pixels with storage sequence numbers 0 to 31), 3 (i.e., M; pixels 96 to 127), 12 (i.e., Fx * M; pixels 384 to 415), and 15 (i.e., Fx * M + M; pixels 480 to 511). As another example, the first data corresponding to Core_4 in fig. 6 includes the vectors numbered 6 (pixels 192 to 223), 9 (pixels 288 to 319), 18 (pixels 576 to 607), and 21 (pixels 672 to 703).
For example, in the case that the data is three-dimensional data, all the vector numbers corresponding to the first first data (i.e., Core[0]) determined above may be calculated with the following Python pseudocode:
For x in range(Div_Fx):
    For y in range(Div_Fy):
        Core[0].Vector_num_th = M * (y * Fx + x)
where Div_Fx represents the length of the first data in the width dimension (2 px in fig. 6), Div_Fy represents the length of the first data in the height dimension (2 px in fig. 6), and Fx is the length of the data in the width dimension (4 px in fig. 6), which can be obtained as Fx = J × Div_Fx, where J is the number of first data in the width dimension (2 in fig. 6). M represents the number of first data in the depth dimension (3 in fig. 6). In conjunction with the above definitions of the first dimension (a dimension other than the first in the allocation order) and the second dimension (a dimension other than the last), in fig. 6 the first dimension includes the width and height dimensions, and the second dimension includes the depth and width dimensions.
Exemplarily, substituting the parameter values of fig. 6 into the above pseudocode yields:
For x in range(2):
    For y in range(2):
        Core[0].Vector_num_th = 3 * (y * 4 + x)
accordingly, the first first data in fig. 6 includes the vector numbers 0, 3, 12, and 15.
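A runnable version of the pseudocode above, using the fig. 6 values (Div_Fx = 2, Div_Fy = 2, M = 3, Fx = 4); variable names follow the text, and the vector numbers are collected into a list rather than overwritten:

```python
Div_Fx, Div_Fy, M, Fx = 2, 2, 3, 4  # fig. 6 parameters

vector_numbers = []
for x in range(Div_Fx):
    for y in range(Div_Fy):
        vector_numbers.append(M * (y * Fx + x))

print(sorted(vector_numbers))  # -> [0, 3, 12, 15]
```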
In a possible implementation, step S100 may further include: dividing the data into a plurality of first data based on preset dividing parameters, and generating the coordinates corresponding to each first data based on the dividing parameters. The dividing parameters determine the number of first data in each dimension. For example, I, J, and M may be dividing parameters, which may be set according to the actual capacity of the storage unit in the computing core; the embodiments of the present disclosure are not limited in this respect. The coordinates represent the positional relationship of the divided first data. Taking fig. 4 as an example, the figure includes two dimensions, a width dimension and a height dimension, and the storage order is width first, then height; the width is therefore used as the first coordinate value and the height as the second, so each first data can be expressed as (0, 0), (1, 0), (2, 0), (0, 1), (1, 1), and so on. Taking (0, 0) as an example, it represents the 0th first data in the width dimension and the 0th first data in the height dimension; in other words, each value in the coordinates indicates which first data it is along that dimension. The coordinates of each first data are subsequently used to calculate the vector numbers corresponding to that first data.
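Coordinate generation for the fig. 4 case can be sketched as follows; the partition counts J = 3 (width) and I = 2 (height) are assumptions inferred from the coordinates listed above, not values stated by the text:

```python
J, I = 3, 2  # assumed partition counts for fig. 4: width, height

# Width varies fastest, matching the "width first, then height" storage order.
coords = [(j, i) for i in range(I) for j in range(J)]

print(coords)
# -> [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```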
Referring to fig. 7, fig. 7 is a reference diagram illustrating a four-dimensional data processing method according to an embodiment of the disclosure. Illustratively, the four-dimensional data may be weight data used by the neural network in performing convolution operations. The allocation order of the four-dimensional data shown in fig. 7 is: number dimension (which can be understood as the number of convolution kernels in a convolution operation), depth dimension, width dimension, and height dimension.
Here the dividing parameters, listed in the above allocation order, are denoted N, M, J, and I; in fig. 7, N is set to 4, M to 2, J to 1, and I to 1. The weight data are stored in the order of the numbers on the small cubes, each of which may represent one value of a convolution kernel; the number after W is the number of the convolution kernel, in other words, there are 128 convolution kernels in fig. 7. Accordingly, the four-dimensional data is divided into 8 first data, and since J and I are both 1, only two coordinate values are required in this example, namely those for the M and N dimensions. In conjunction with fig. 7, the first coordinate value may correspond to M and the second to N, i.e., the first data can be represented as (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), and (1, 3). Convolution kernels 0 to 31 (i.e., W0 to W31 in fig. 7) are divided into the first data with coordinates (0, 0) and (1, 0), convolution kernels 32 to 63 (i.e., W32 to W63 in fig. 7) into those with coordinates (0, 1) and (1, 1), and so on, which is not described again here. Illustratively, as shown in fig. 7, the first first data (i.e., the first data with m = 0 and n = 0 in fig. 7) includes a vector representing the weight values with allocation order numbers 0 to 31, a vector representing those with numbers 128 to 159, a vector representing those with numbers 256 to 287, and so on.
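The eight coordinates enumerated above can be generated mechanically; this sketch assumes, as the text states, that only the M and N dimensions of fig. 7 are divided (J = I = 1):

```python
M, N = 2, 4  # fig. 7 dividing parameters for the divided dimensions

# m is the first coordinate value, n the second; n varies fastest,
# matching the order (0,0), (0,1), (0,2), (0,3), (1,0), ... in the text.
coords = [(m, n) for m in range(M) for n in range(N)]

print(coords)
# -> [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3)]
```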
In one possible implementation, step S200 may include: in a case where the number of distribution dimensions of the data is greater than or equal to two, performing the following: determining a vector number corresponding to a first vector in each first datum based on the length of the first datum in a first dimension, the number of the first datum in a second dimension and corresponding coordinates; and determining all vector numbers corresponding to each first datum based on the first vector number corresponding to each first datum and all vector numbers in the first datum.
With reference to fig. 4, in the case that the data is two-dimensional data, the vector number corresponding to the first vector in each first data may be determined from the length of the first data in the first dimension, the number of first data in the second dimension, and the corresponding coordinates, using the following Python pseudocode:
Core[X].Vector_start = Div_Fy * J * i + j
where the values of j and i are given by the coordinates (j, i) corresponding to the Xth first data; the other parameters are as explained above and are not described again here.
For example: the coordinates of the 2nd first data (i.e., the first data whose first pixel has storage sequence number 5) are (1, 0), and the vector number corresponding to its first vector is 2 × 3 × 0 + 1 = 1. As another example: the coordinates of the 6th first data (i.e., the first data whose first pixel has storage sequence number 33) are (2, 1), and the vector number corresponding to its first vector is 2 × 3 × 1 + 2 = 8.
For example, in the case that the data is two-dimensional data, all the vector numbers corresponding to each first data may be determined from the first vector number corresponding to that first data and all the vector numbers in the first first data, using the following Python pseudocode:
For i in range(core_num):
    For j in range(vector_num):
        core[i].Vector[j] = core[0].Vector[j] + core[i].Vector[0]
where core_num represents the total number of first data, and vector_num represents the number of vectors in each first data.
According to the above equation, taking the 2nd first data in fig. 4 as an example: given that the vector numbers of the first first data are 0 and 3, and that the first vector number of the 2nd first data is 1, the second vector number of the 2nd first data is 1 + 3 = 4. Likewise, when the first vector number of a first data is 4, its second vector number is 4 + 3 = 7. In other words, every vector number in a first data to be determined can be found by adding, in turn, each vector number of the first first data to the first vector number of the first data to be determined.
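The two steps can be combined into one runnable sketch for the two-dimensional case of fig. 4; Div_Fy = 2 and J = 3 are inferred from the worked examples above and are assumptions, as is the helper name:

```python
Div_Fy, J = 2, 3  # assumed fig. 4 parameters

# Vector numbers of the first (0th) first data, as given in the text.
base_vectors = [0, 3]

def vectors_of(j, i):
    """All vector numbers of the first data with coordinates (j, i)."""
    start = Div_Fy * J * i + j                 # Core[X].Vector_start
    return [start + v for v in base_vectors]   # offset every base vector

print(vectors_of(1, 0))  # 2nd first data -> [1, 4]
print(vectors_of(2, 1))  # 6th first data -> [8, 11]
```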
Referring to fig. 8, fig. 8 is a reference diagram illustrating a processing method for processing two-dimensional data according to an embodiment of the disclosure. Although the division in fig. 8 is different from that in fig. 4, the vector number can be determined by referring to the pseudo code of the two-dimensional data.
For example: in the left half of fig. 8, data is stored first in the width direction and then in the height direction. Since the data is divided in only one dimension, only one coordinate value is required to distinguish the different first data, and the omitted coordinate value is taken as 1. That is, for the first first data (i.e., Core_0 in the figure) through the sixth first data (i.e., Core_5 in the figure), the coordinates, expressed in the dimension allocation order as (number of the first data in the width dimension, number of the first data in the height dimension), are (1, 0), (1, 1), (1, 2), (1, 3), and so on; in Core[X].Vector_start = Div_Fy * J * i + j, the coordinate value of the undivided dimension is then constantly 1. Similarly, the coordinates of the right half of fig. 8 (i.e., the first and second columns in the figure), arranged according to the allocation order as (number of the first data in the height dimension, number of the first data in the width dimension), are likewise (1, 0), (1, 1), (1, 2), (1, 3), the same as for the left half. In other words, the above pseudocode is applicable to various divisions of two-dimensional data.
With reference to fig. 6, in the case that the data is three-dimensional data, the vector number corresponding to the first vector in each first data may be determined from the length of the first data in the first dimension, the number of first data in the second dimension, and the corresponding coordinates, using the following Python pseudocode:
Core[X].Vector_start = M * (Div_Fx * (Div_Fy * J * i + j)) + m
where the values of j, i, and m are given by the coordinates (m, j, i) corresponding to the Xth first data; the other parameters are as explained above and are not described again here.
For example: the coordinates of the 2nd first data (i.e., the first data whose first pixel has storage sequence number 32) are (1, 0, 0), and the vector number corresponding to its first vector is 3 × (2 × (2 × 2 × 0 + 0)) + 1 = 1. As another example: the coordinates of the 4th first data (i.e., the first data whose first pixel has storage sequence number 192) are (0, 1, 0), and the vector number corresponding to its first vector is 3 × (2 × (2 × 2 × 0 + 1)) + 0 = 6.
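The three-dimensional formula can be checked against the fig. 6 values (M = 3, Div_Fx = 2, Div_Fy = 2, J = 2) with a short sketch; the function name is illustrative:

```python
M, Div_Fx, Div_Fy, J = 3, 2, 2, 2  # fig. 6 parameters

def vector_start(m, j, i):
    """First vector number of the first data with coordinates (m, j, i)."""
    return M * (Div_Fx * (Div_Fy * J * i + j)) + m

print(vector_start(1, 0, 0))  # 2nd first data -> 1
print(vector_start(0, 1, 0))  # 4th first data -> 6
```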
When processing three-dimensional data, all the vector numbers corresponding to each first data may likewise be determined, from the first vector number corresponding to that first data and all the vector numbers in the first first data, with reference to the pseudocode for two-dimensional data; this is not described again here.
In one possible implementation, step S200 may include: in a case where the number of distribution dimensions of the data is equal to one, determining the vector number corresponding to each first data based on the number of first data in the distribution dimension and the vector number corresponding to the first first data.
For example, the vector number corresponding to each first data may be determined by sequentially adding offsets to the vector number corresponding to the first first data. For example, with reference to fig. 1, if the number of first data in the distribution dimension is 6, the vector number 0 corresponding to the first first data is added in turn to 0, 1, 2, 3, 4, and 5, yielding the vector numbers of the first to sixth first data in sequence.
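The one-dimensional case above reduces to a single list comprehension; the variable names here are illustrative:

```python
count = 6   # number of first data in the single distribution dimension
base = 0    # vector number corresponding to the first first data

# The k-th first data receives vector number base + k.
numbers = [base + k for k in range(count)]

print(numbers)  # -> [0, 1, 2, 3, 4, 5]
```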
With continued reference to fig. 3, in step S300, the vectors in each first data are sequentially allocated to the corresponding computing cores based on the vector numbers corresponding to the vectors in each first data. Illustratively, this step may include: determining the computing core corresponding to each first data, and, in the order of the vector numbers corresponding to each first data, sequentially distributing the vectors corresponding to those numbers to the corresponding computing core. The embodiments of the present disclosure do not limit the correspondence between computing cores and first data; the first data to be processed by each computing core may be determined according to any preset rule. For example, for a total of Q first data, Q computing cores may first be allocated so as to correspond one-to-one with the first data; for each computing core, vectors are then allocated to it in the order of the vector numbers in the corresponding first data (e.g., 0, 3) for operation, and the number of vectors allocated each time may be determined by the number of vectors the computing core can receive or process at once. This simplifies the process of distributing multidimensional data to the computing cores, is not limited to contiguous allocation of vectors, and improves the processing efficiency of the multi-processor many-core chip.
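The allocation rule just described can be sketched as follows, assuming the fig. 4 vector numbers and a one-to-one mapping between first data and cores; the batch size of 1 is illustrative only (real batch sizes depend on the core's capacity):

```python
# Vector numbers per first data, taken from the fig. 4 example in the text.
first_data = {0: [0, 3], 1: [1, 4]}
batch_size = 1  # assumed number of vectors a core receives at a time

dispatch_log = []
for core_id, vectors in first_data.items():
    # Each core receives its vectors in ascending vector-number order.
    for k in range(0, len(vectors), batch_size):
        dispatch_log.append((core_id, vectors[k:k + batch_size]))

print(dispatch_log)  # -> [(0, [0]), (0, [3]), (1, [1]), (1, [4])]
```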
Referring to fig. 9, fig. 9 is a reference schematic diagram of a data processing method according to an embodiment of the present disclosure.
With reference to fig. 9, data D (i.e., FMin × Dx × Dy in the figure) may be three-dimensional image data, W (i.e., FMin × FMout × K in the figure) is weight data, and O (FMout × Ox × Oy) is output data. The data D may be divided along the depth direction m, the width direction i, and the height direction j, denoted D(i; j; m) in the figure, and the weight data W may be divided along the depth direction m and the number direction n of the convolution kernels (i.e., the convolution-kernel numbering direction), denoted W(m, n) in the figure. Core(i; j; m; n) represents the coordinates of the computing core to which the divided data D and weight data W are allocated; different coordinates identify different computing cores, the values in the coordinates correspond one-to-one to i, j, m, and n of data D and weight data W, and each computing core generates output data O from its weight data W and data D, thereby completing its part of the convolution operation.
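The core-coordinate scheme of fig. 9 can be sketched by enumerating all (i, j, m, n) tuples; the partition counts below are assumptions chosen for illustration, not values taken from the figure:

```python
from itertools import product

I, J, M, N = 2, 2, 3, 4  # assumed partition counts: height, width, depth, kernel

# One computing core per coordinate (i, j, m, n).
cores = list(product(range(I), range(J), range(M), range(N)))

print(len(cores))  # -> 48 computing cores in total
print(cores[0])    # -> (0, 0, 0, 0)
```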
Referring to fig. 10, fig. 10 is a block diagram of a data processing device according to an embodiment of the disclosure.
In one possible implementation manner, with reference to fig. 10, an embodiment of the present disclosure further provides a data processing apparatus 100 applied to a multi-processor many-core system, where the multi-processor many-core system includes multiple processors, each of the processors includes multiple computing cores, and the processing apparatus includes: a data dividing module 110, configured to divide the data into a plurality of first data and determine a vector number corresponding to the first data, where the first data includes at least one vector; a vector allocation module 120, configured to determine the vector numbers corresponding to the vectors within each first data based on the vector number corresponding to the first data; and a data allocation module 130, configured to sequentially allocate the vectors in each first data to the corresponding computing cores based on the vector numbers corresponding to the vectors in each first data.
In a possible implementation, the determining a vector number corresponding to the first data includes: in a case where the number of distribution dimensions of the data is greater than or equal to two, performing the following: determining a first dimension which is not the first distribution order and a second dimension which is not the last distribution order according to the distribution order of the data in the distribution dimension; and determining all vector numbers corresponding to the first data based on the length of the first data in the first dimension and the number of the first data in the second dimension.
In a possible implementation, the sequentially allocating the vectors in each first data to the corresponding computing cores based on the corresponding vector numbers of the vectors in each first data includes: determining a computing core corresponding to each first data; and according to the sequence of the vector number corresponding to each first datum, sequentially distributing the vectors corresponding to the vector numbers to the corresponding computing cores.
In one possible embodiment, the dividing the data into a plurality of first data includes: dividing the data into a plurality of first data based on preset dividing parameters; wherein the partitioning parameter is used to determine a number of the first data in each dimension; and generating coordinates corresponding to each first datum based on the dividing parameters.
In a possible implementation manner, the determining, based on the vector number corresponding to the first data, the vector number corresponding to the vector within each first data includes: in a case where the number of distribution dimensions of the data is greater than or equal to two, performing the following: determining a vector number corresponding to a first vector in each first datum based on the length of the first datum in a first dimension, the number of the first datum in a second dimension and corresponding coordinates; and determining all vector numbers corresponding to each first datum based on the first vector number corresponding to each first datum and all vector numbers in the first datum.
In a possible implementation manner, the determining, based on the vector number corresponding to the first data, the vector number corresponding to the vector within each first data includes: and determining the vector number corresponding to each first data based on the number of the first data in the distribution dimensions and the vector number corresponding to the first data when the number of the distribution dimensions of the data is equal to one.
In some embodiments, functions of the system or modules included in the system provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, no further description is provided here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product comprising computer-readable code, or a non-transitory computer-readable storage medium carrying computer-readable code, which, when run in a processor of an electronic device, causes the processor to perform the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Referring to fig. 11, fig. 11 is a block diagram of an electronic device 1200 according to an embodiment of the disclosure. As shown in fig. 11, the electronic device 1200 includes a computing processing means 1202 (e.g., the processor system described above including a plurality of artificial intelligence chips), an interface means 1204, other processing means 1206, and a storage means 1208. Depending on the application scenario, one or more computing devices 1210 may be included in a computing processing device (e.g., artificial intelligence chips, where each chip may include multiple functional cores).
In one possible implementation, the computing processing device of the present disclosure may be configured to perform operations specified by a user. In an exemplary application, the computing processing device may be implemented as a single chip artificial intelligence processor or a multi-chip artificial intelligence processor. Similarly, one or more computing devices included within the computing processing device may be implemented as an artificial intelligence chip or as part of a hardware structure of an artificial intelligence chip. When a plurality of computing devices are implemented as artificial intelligence chips or as part of the hardware structure of artificial intelligence chips, the computing processing device of the present disclosure may be considered as having a single chip structure or a homogeneous multi-chip structure.
In an exemplary operation, the computing processing device of the present disclosure may interact with other processing devices through an interface device to collectively perform user-specified operations. Other Processing devices of the present disclosure may include one or more types of general and/or special purpose processors such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an artificial intelligence processor, and the like, depending on the implementation. These processors may include, but are not limited to, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic, discrete hardware components, etc., and the number may be determined based on actual needs. As previously mentioned, the computational processing apparatus of the present disclosure may be considered to have a single core structure or a homogeneous multi-core structure only. However, when considered together, a computing processing device and other processing devices may be considered to form a heterogeneous multi-core structure.
In one or more embodiments, the other processing devices may serve as an interface between the computing processing device of the present disclosure (which may be embodied as an artificial intelligence computing device, such as one related to neural network computing) and external data and controls, performing basic controls including, but not limited to, data handling and the starting and/or stopping of the computing device. In further embodiments, the other processing devices may also cooperate with the computing processing device to jointly complete computing tasks.
In one or more embodiments, the interface device may be used to transfer data and control instructions between the computing processing device and other processing devices. For example, the computing processing device may obtain input data from other processing devices via the interface device, and write the input data into a storage device (or memory) on the computing processing device. Further, the computing processing device may obtain the control instruction from the other processing device via the interface device, and write the control instruction into the control cache on the computing processing device slice. Alternatively or optionally, the interface device may also read data from the memory device of the computing processing device and transmit the data to the other processing device.
Additionally or alternatively, the electronic device of the present disclosure may further comprise a storage means. As shown in the figure, the storage means is connected to the computing processing means and the further processing means, respectively. In one or more embodiments, the storage device may be used to hold data for the computing processing device and/or the other processing devices. For example, the data may be data that is not fully retained within internal or on-chip storage of a computing processing device or other processing device.
According to different application scenarios, the artificial intelligence chip disclosed by the disclosure can be used for a server, a cloud server, a server cluster, a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a PC device, a terminal of the internet of things, a mobile terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an automatic driving terminal, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
Referring to fig. 12, fig. 12 is a block diagram of an electronic device according to an embodiment of the disclosure.
For example, the electronic device 1900 may be provided as a terminal device or a server. Referring to fig. 12, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing the state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, it is embodied in a software product, such as a Software Development Kit (SDK).
The foregoing description of embodiments of the present disclosure is intended to be exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies available in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method of processing data for use in a multi-processor many-core system, the multi-processor many-core system comprising a plurality of processors, each processor comprising a plurality of computing cores, the method comprising:
dividing the data into a plurality of first data, and determining a vector number corresponding to each first data, wherein each first data comprises at least one vector;
determining a vector number corresponding to each vector within each first data based on the vector number corresponding to that first data; and
sequentially distributing the vectors in each first data to the corresponding computing cores based on the vector numbers corresponding to the vectors in each first data.
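The first step of claim 1 can be illustrated with a minimal Python sketch. This is a hypothetical reading, not the patent's reference implementation: it assumes the data is a 2-D array and that each "first data" is a rectangular block of a fixed shape; the function and parameter names are invented for illustration.

```python
import numpy as np

def split_into_blocks(data, block_shape):
    """Divide a 2-D array into 'first data' blocks of shape block_shape.

    A hypothetical sketch of the dividing step of claim 1; the claim itself
    does not fix the rank of the data or the block shape.
    """
    H, W = data.shape
    h, w = block_shape
    blocks = {}
    for p in range(H // h):          # block coordinate along dimension 0
        for q in range(W // w):      # block coordinate along dimension 1
            blocks[(p, q)] = data[p * h:(p + 1) * h, q * w:(q + 1) * w]
    return blocks
```

Each block is keyed by its coordinates, which later claims (5 and 6) use to derive vector numbers.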
2. The processing method of claim 1, wherein each vector comprises a plurality of second data, and the allocation sequence numbers corresponding to the second data within the same vector are consecutive, the allocation sequence numbers indicating the order in which the second data are allocated to the computing cores.
3. The processing method of claim 1, wherein said determining a vector number corresponding to the first data comprises:
in a case where the number of distribution dimensions of the data is greater than or equal to two, performing the following:
determining a first dimension that is not first in the distribution order and a second dimension that is not last in the distribution order, according to the order in which the data is distributed across the distribution dimensions; and
determining all vector numbers corresponding to the first data based on the length of the first data in the first dimension and the number of first data in the second dimension.
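Under one hedged reading of claim 3 — treating each row segment of a block as a vector, with blocks `h` rows tall (the block's length in the first dimension) and `Q` blocks along the second distribution dimension — the vectors of a block receive global numbers spaced `Q` apart. All names here are assumptions introduced for illustration:

```python
def vector_numbers_2d(p, q, h, Q):
    """Global numbers of the h row-vectors of the block at coordinates (p, q),
    assuming row-major numbering of row segments across Q block columns.
    h = length of the first data in the first dimension (hypothetical reading);
    Q = number of first data in the second dimension (hypothetical reading)."""
    return [(p * h + i) * Q + q for i in range(h)]
```

For a 4x4 matrix split into 2x2 blocks (`h = 2`, `Q = 2`), the block at (0, 0) holds vectors 0 and 2 while the block at (0, 1) holds vectors 1 and 3, matching a row-major scan of row segments.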
4. The processing method of claim 1, wherein said sequentially distributing the vectors in each first data to the corresponding computing cores based on the vector numbers corresponding to the vectors in each first data comprises:
determining the computing core corresponding to each first data; and
sequentially distributing the vectors corresponding to the vector numbers to the corresponding computing cores, in the order of the vector numbers corresponding to each first data.
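The two steps of claim 4 can be sketched as follows. The mapping from block coordinates to a core id, and the shape of the inputs, are assumptions made for illustration; the claim does not specify them:

```python
def assign_in_number_order(block_vectors, core_of_block):
    """Sketch of claim 4: emit (core, vector_number, vector) triples.

    block_vectors maps block coordinates to a list of (vector_number, vector)
    pairs; core_of_block maps block coordinates to a compute-core id
    (both interfaces are hypothetical).
    """
    schedule = []
    for coord, pairs in block_vectors.items():
        core = core_of_block(coord)                      # step 1: core per first data
        for number, vector in sorted(pairs, key=lambda nv: nv[0]):
            schedule.append((core, number, vector))      # step 2: ascending number order
    return schedule
```

Sorting by vector number before emission realizes the "in the order of the vector numbers" requirement of the claim.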
5. The processing method of claim 1, wherein said dividing the data into a plurality of first data comprises:
dividing the data into a plurality of first data based on preset partition parameters, wherein the partition parameters determine the number of first data in each dimension; and
generating coordinates corresponding to each first data based on the partition parameters.
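The coordinate-generation step of claim 5 amounts to enumerating a Cartesian product over the per-dimension block counts. A minimal sketch, assuming the partition parameters are given as a list of counts (a name invented here):

```python
from itertools import product

def block_coordinates(counts):
    """Enumerate a coordinate tuple for every first data.

    counts[d] is the number of first data along dimension d, i.e. the
    partition parameter for that dimension (hypothetical representation).
    """
    return list(product(*(range(n) for n in counts)))
```

For example, `counts = [2, 3]` yields six coordinate tuples, from (0, 0) through (1, 2), one per block.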
6. The processing method of claim 5, wherein the determining a vector number corresponding to each vector within each first data based on the vector number corresponding to the first data comprises:
in a case where the number of distribution dimensions of the data is greater than or equal to two, performing the following:
determining the vector number corresponding to the first vector in each first data based on the length of the first data in a first dimension, the number of first data in a second dimension, and the corresponding coordinates; and
determining all vector numbers corresponding to each first data based on the first vector number corresponding to each first data and the number of vectors in the first data.
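The two steps of claim 6 can be sketched under the same hedged reading as before (row segments as vectors, `h` rows per block, `Q` blocks along the second distribution dimension; all names are assumptions):

```python
def first_vector_number(p, q, h, Q):
    """Step 1 of claim 6 (hypothetical reading): the number of the first
    vector in the block at coordinates (p, q), derived from the block's
    length h in the first dimension, the count Q of blocks in the second
    dimension, and the coordinates themselves."""
    return p * h * Q + q

def all_vector_numbers(p, q, h, Q):
    """Step 2 of claim 6: expand the first number to the whole block.
    Under row-major numbering, successive vectors in a block differ by Q."""
    first = first_vector_number(p, q, h, Q)
    return [first + i * Q for i in range(h)]
```

With `h = 2` and `Q = 2`, the block at (1, 1) starts at vector number 5 and holds vectors 5 and 7.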
7. The processing method of claim 5, wherein the determining a vector number corresponding to each vector within each first data based on the vector number corresponding to the first data comprises:
in a case where the number of distribution dimensions of the data is equal to one, determining the vector numbers corresponding to each first data based on the number of first data in the distribution dimension and the vector number corresponding to the first data.
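The one-dimensional case of claim 7 admits a simpler sketch: with a single distribution dimension, one hedged reading is that each block takes the next consecutive run of vector numbers. The block index and vectors-per-block parameter are assumptions made for illustration:

```python
def vector_numbers_1d(p, vectors_per_block):
    """Hypothetical sketch of claim 7: with one distribution dimension,
    block p receives the consecutive numbers starting at p * vectors_per_block."""
    start = p * vectors_per_block
    return list(range(start, start + vectors_per_block))
```

For instance, with three vectors per block, block 2 holds vectors 6, 7, and 8.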
8. A data processing apparatus for use in a multi-processor many-core system, the multi-processor many-core system comprising a plurality of processors, each processor comprising a plurality of computing cores, the processing apparatus comprising:
a data dividing module configured to divide the data into a plurality of first data and determine a vector number corresponding to each first data, wherein each first data comprises at least one vector;
a vector distribution module configured to determine a vector number corresponding to each vector within each first data based on the vector number corresponding to the first data; and
a data distribution module configured to sequentially distribute the vectors in each first data to the corresponding computing cores based on the vector numbers corresponding to the vectors in each first data.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of processing data of any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement a method of processing data according to any one of claims 1 to 7.
CN202111542673.6A 2021-12-16 2021-12-16 Data processing method and device, electronic equipment and storage medium Pending CN114201443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111542673.6A CN114201443A (en) 2021-12-16 2021-12-16 Data processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114201443A 2022-03-18

Family

ID=80654475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111542673.6A Pending CN114201443A (en) 2021-12-16 2021-12-16 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114201443A (en)

Similar Documents

Publication Publication Date Title
CN109284815B (en) Neural network model algorithm compiling method and device and related products
CN110597559A (en) Computing device and computing method
CN109543825B (en) Neural network model algorithm compiling method and device and related products
CN110096310B (en) Operation method, operation device, computer equipment and storage medium
CN111260766B (en) Virtual light source processing method, device, medium and electronic equipment
CN114580606A (en) Data processing method, data processing device, computer equipment and storage medium
US11900242B2 (en) Integrated circuit chip apparatus
CN114424214A (en) Hybrid data-model parallelism for efficient deep learning
CN116467061B (en) Task execution method and device, storage medium and electronic equipment
CN112463160A (en) Compiling method, compiling device, electronic equipment and storage medium
CN111767995B (en) Operation method, device and related product
CN111859775A (en) Software and hardware co-design for accelerating deep learning inference
CN113988283A (en) Mapping method and device of logic node, electronic equipment and storage medium
US20180137600A1 (en) Method and device for processing data
CN114201727A (en) Data processing method, processor, artificial intelligence chip and electronic equipment
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN114201443A (en) Data processing method and device, electronic equipment and storage medium
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
CN109558565B (en) Operation method, device and related product
CN112817898A (en) Data transmission method, processor, chip and electronic equipment
CN109543834B (en) Operation method, device and related product
CN109558564B (en) Operation method, device and related product
CN109583580B (en) Operation method, device and related product
CN109543833B (en) Operation method, device and related product
CN109558943B (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination