CN116361205A - Data processing apparatus, method, device and medium for determining tensor memory address - Google Patents


Info

Publication number
CN116361205A
CN116361205A
Authority
CN
China
Prior art keywords
operator
target
tensor
current
address space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310370023.0A
Other languages
Chinese (zh)
Inventor
曾浩伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunlun Core Beijing Technology Co., Ltd.
Original Assignee
Kunlun Core Beijing Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunlun Core Beijing Technology Co., Ltd.
Priority to CN202310370023.0A
Publication of CN116361205A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0893 Caches characterised by their organisation or structure
    • G06F 12/0897 Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10 Providing a specific technical effect
    • G06F 2212/1041 Resource optimization
    • G06F 2212/1044 Space efficiency improvement

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present disclosure provides a data processing apparatus, and relates to the technical field of artificial intelligence, in particular to the technical field of chips. The specific implementation scheme is as follows: a storage unit; and a processor configured to: perform a target operation according to at least one continuous address space in a target storage area of the storage unit, so as to determine at least one current address space of a target operator among a plurality of operators, wherein the at least one current address space respectively corresponds to at least one first to-be-addressed tensor of the target operator; determine at least one current target remaining space in the target storage area based on the lifecycle of the at least one first to-be-addressed tensor and the at least one current address space; and return to the target operation in response to determining that the sum of the capacities of the at least one current target remaining space does not meet a preset condition. The present disclosure also provides a method of determining a tensor storage address, an electronic device, and a storage medium.

Description

Data processing apparatus, method, device and medium for determining tensor memory address
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the field of chip technology. More particularly, the present disclosure provides a data processing apparatus, a method of determining a tensor storage address, an electronic device, and a storage medium.
Background
With the development of artificial intelligence technology, the artificial intelligence chip can be applied to intensive computation or diversity computation based on preset rules.
Disclosure of Invention
The present disclosure provides a data processing apparatus, a method of determining a tensor storage address, an electronic device, and a storage medium.
According to an aspect of the present disclosure, there is provided a data processing apparatus comprising: a storage unit; and a processor configured to: perform a target operation according to at least one continuous address space in a target storage area of the storage unit, so as to determine at least one current address space of a target operator among a plurality of operators, wherein the at least one current address space respectively corresponds to at least one first to-be-addressed tensor of the target operator; determine at least one current target remaining space in the target storage area according to the lifecycle of the at least one first to-be-addressed tensor and the at least one current address space, wherein the at least one current target remaining space respectively corresponds to at least one association operator associated with the target operator; and return to the target operation in response to determining that the sum of the capacities of the at least one current target remaining space does not meet a preset condition.
According to another aspect of the present disclosure, there is provided a method of determining a tensor storage address, the method comprising: performing a target operation according to at least one continuous address space in a target storage area of a storage unit, so as to determine at least one current address space of a target operator among a plurality of operators, wherein the at least one current address space respectively corresponds to at least one first to-be-addressed tensor of the target operator; determining at least one current target remaining space in the target storage area according to the lifecycle of the at least one first to-be-addressed tensor and the at least one current address space, wherein the at least one current target remaining space respectively corresponds to at least one association operator associated with the target operator; and returning to the target operation in response to determining that the sum of the capacities of the at least one current target remaining space does not meet a preset condition.
According to another aspect of the present disclosure, there is provided an electronic device including the data processing apparatus provided by the present disclosure.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1A is a schematic computational diagram of a neural network model according to one embodiment of the present disclosure;
FIG. 1B is a schematic diagram of multiple address spaces in the level-3 cache according to one embodiment of the present disclosure;
FIG. 2 is a schematic block diagram of a data processing apparatus according to one embodiment of the present disclosure;
FIG. 3A is a schematic diagram of multiple address spaces in the level-3 cache according to another embodiment of the present disclosure;
FIG. 3B is a schematic diagram of multiple address spaces in a target storage area according to another embodiment of the present disclosure;
FIGS. 3C and 3D are schematic diagrams of multiple address spaces in a target storage area according to another embodiment of the present disclosure;
FIG. 4 is a schematic effect diagram of a data processing apparatus according to one embodiment of the present disclosure;
FIG. 5 is a flowchart of a method of determining tensor storage addresses according to one embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of an electronic device according to one embodiment of the present disclosure; and
FIG. 7 is a block diagram of an electronic device to which a method of determining a tensor storage address may be applied, according to one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Taking the Kunlun Core artificial intelligence chip (XPU) as an example, the architecture of such an artificial intelligence chip is well suited to intensive computation or diversity computation based on preset rules. In an inference engine, a neural network (NN) model may be represented as a computational graph that includes operators and tensors. In the process of compiling the neural network model, the storage space of each tensor can be pre-allocated so as to reduce access latency.
The artificial intelligence chip may include multi-level storage. For example, the multi-level storage may include a level-1 cache (L1 Cache), a level-2 cache (L2 Cache), a level-3 cache (L3 Cache), and a global memory (Global Memory) unit. If the tensors are laid out in the level-3 cache reasonably, the computing performance of the artificial intelligence chip can be improved, as will be described below with reference to FIGS. 1A and 1B.
Fig. 1A is a schematic computational diagram of a neural network model, according to one embodiment of the present disclosure.
As shown in fig. 1A, the neural network model may include operators OP101 through OP114. The input tensor of the neural network model is input to the operator OP101, and a tensor T101 can be obtained. The tensor T101 is input to the operator OP102, and the tensor T102 can be obtained. The tensor T102 is input to the operator OP103, and the tensor T103 can be obtained. The tensor T103 is input to the operator OP104, and the tensor T104 can be obtained. The tensor T104 is input to the operator OP105, and the tensor T105 can be obtained. The tensor T105 is input to the operator OP106, and the tensor T106 can be obtained. The tensor T106 and the tensor T103 are input to the operator OP107, and the tensor T107 can be obtained. Tensor T107 is input to operator OP108, and tensor T108 can be obtained. Tensor T108 and tensor T107 are input to operator OP109, and tensor T109 can be obtained. The tensor T109 is input to the operator OP110, and the tensor T110 can be obtained. The tensor T110 is input to the operator OP111, and the tensor T111 can be obtained. Tensor T111 and tensor T110 are input to operator OP112, and tensor T112 can be obtained. The tensor T112 is input to the operator OP113, and the tensor T113 can be obtained. The tensor T113 is input to the operator OP114, and the tensor T114 can be obtained.
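The dataflow described above can be sketched as a small Python structure. This is purely illustrative: the operator and tensor names follow FIG. 1A, but the `GRAPH` mapping and the `consumers` helper are assumptions for exposition, not part of the disclosed apparatus.

```python
# Illustrative sketch of the Fig. 1A computational graph (names from the
# figure; the data structure itself is an assumption). Each operator maps
# to (input tensors, output tensor); "IN" stands for the model input.
GRAPH = {
    "OP101": (["IN"], "T101"),
    "OP102": (["T101"], "T102"),
    "OP103": (["T102"], "T103"),
    "OP104": (["T103"], "T104"),
    "OP105": (["T104"], "T105"),
    "OP106": (["T105"], "T106"),
    "OP107": (["T106", "T103"], "T107"),
    "OP108": (["T107"], "T108"),
    "OP109": (["T108", "T107"], "T109"),
    "OP110": (["T109"], "T110"),
    "OP111": (["T110"], "T111"),
    "OP112": (["T111", "T110"], "T112"),
    "OP113": (["T112"], "T113"),
    "OP114": (["T113"], "T114"),
}

def consumers(tensor: str) -> list[str]:
    """Operators that take `tensor` as an input, in topological order."""
    return [op for op, (ins, _) in GRAPH.items() if tensor in ins]

print(consumers("T103"))  # T103 feeds both OP104 and OP107
```

A query such as `consumers("T103")` makes the branching visible: T103 is consumed twice, which is exactly what extends its lifecycle in the discussion that follows.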
As shown in fig. 1A, the tensor T103 is an output of the operator OP103 and serves as an input of the operators OP104 and OP107, respectively. Thus, the lifecycle of the tensor T103 continues at least until the operator OP107 has been executed. It will be appreciated that the lifecycle of a tensor may be defined as the period from the time the tensor is written to the storage unit to the time the address space of the tensor is released. In addition, since the tensor T103 is input to the operator OP104 and the operator OP107, respectively, the operator OP104 has a dependency relationship with the operator OP103, and the operator OP107 likewise has a dependency relationship with the operator OP103. After a tensor is generated, it may be written to the level-3 cache. The address spaces of the plurality of tensors in the level-3 cache are shown in FIG. 1B.
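The lifecycle rule just stated can be written down directly. The following is a minimal sketch under an assumed convention (operators numbered by topological step; the `lifecycle` helper name is invented here): a tensor lives from its producer's step until its last consumer's step, after which its address space may be released.

```python
# Hedged sketch: a tensor's lifecycle spans from the step of the operator
# that produces it to the step of its last consumer; the address space can
# be released once the last consumer has run. Step numbering is assumed.
def lifecycle(producer_step: int, consumer_steps: list[int]) -> tuple[int, int]:
    # A tensor with no consumers (a model output) lives only at its
    # producer's step under this simplified convention.
    return (producer_step, max(consumer_steps, default=producer_step))

# T103 is produced by OP103 (step 3) and consumed by OP104 (step 4) and
# OP107 (step 7), so its address space stays allocated through step 7.
assert lifecycle(3, [4, 7]) == (3, 7)
```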
FIG. 1B is a schematic diagram of multiple address spaces in the level-3 cache according to one embodiment of the present disclosure.
In some embodiments, the address space of a tensor in the level-3 cache may be determined manually. The address space of each tensor may also be determined based on the topological relationship between the operators.
However, for complex and diverse neural network models, manually determining address spaces is time-consuming, resulting in inefficient model development.
Furthermore, an address space determined from the topological relationship is difficult to reuse efficiently. For example, after the tensor T101 is generated, it may be written to the level-3 cache. After the tensor T101 is input to the operator OP102, its address space may be released. However, other tensors are still written and released at predetermined addresses, making it difficult to reuse the address space of the tensor T101.
Thus, to improve the performance of an artificial intelligence chip, the present disclosure provides a data processing apparatus, as will be described below.
Fig. 2 is a schematic block diagram of a data processing apparatus according to one embodiment of the present disclosure.
As shown in fig. 2, the data processing apparatus 200 may include a storage unit 210 and a processor 220.
The storage unit 210 may include the level-3 cache described above. For example, the storage unit may include a target storage area for storing a plurality of tensors.
The processor 220 may be configured to: perform the target operation based on at least one continuous address space in the target storage area of the storage unit to determine at least one current address space of a target operator among the plurality of operators. In an embodiment of the present disclosure, the at least one current address space respectively corresponds to at least one first to-be-addressed tensor of the target operator. For example, the target operator may be any one of the plurality of operators. The operators may, for example, perform convolution processing, attention-mechanism-based fusion processing, or the like. The data amount of a first to-be-addressed tensor may be smaller than the capacity of one continuous address space. The capacity of a current address space may be less than or equal to the capacity of the continuous address space, and the current address space may store the first to-be-addressed tensor. Taking the operator OP103 described above as the target operator, both the tensor T103 output by the operator OP103 and the tensor T102 input to the operator OP103 can serve as first to-be-addressed tensors.
The processor 220 may be further configured to: at least one current target remaining space is determined in the target storage area based on the lifecycle of the at least one first to-be-addressed tensor and the at least one current address space.
Taking the operator OP103 described above as the target operator, the tensor T103 output by the operator OP103 may serve as an input of the operators OP104 and OP107. After being processed by the operator OP107, the address space of the tensor T103 may be released; during the run time of the operators OP104 to OP107, the tensor T103 may be stored in the storage unit. Further, the tensor T102 output by the operator OP102 may be an input of the operator OP103. Thus, the at least one operator associated with the operator OP103 may comprise: the operator OP102, the operator OP104, the operator OP105, the operator OP106, and the operator OP107.
The first address of the current address space of the tensor T102 may coincide with the first address of the target storage area, and the first address of the current address space of the tensor T103 may be determined by the last address of the current address space of the tensor T102. In this case, the unoccupied continuous address space in the target storage area may be taken as the current target remaining space corresponding to the operator OP104.
The processor 220 may be further configured to: return to the target operation in response to determining that the sum of the capacities of the at least one current target remaining space does not meet the preset condition. In the embodiment of the present disclosure, the preset condition may be: the sum of the capacities of the at least one current target remaining space is greater than or equal to a preset capacity threshold. The sum of the capacities of the five current target remaining spaces respectively corresponding to the operators OP102, OP104, OP105, OP106, and OP107 may be determined. If the sum of the capacities is less than the preset capacity threshold, the current round of iteration may be ended. Next, the processor may return to the target operation to perform a subsequent round of iteration, determining at least one subsequent address space of the operator OP103 and then re-determining the subsequent target remaining space corresponding to each association operator. It will be appreciated that the first address of the subsequent address space of the tensor T102 may not coincide with the first address of the target storage area.
With the embodiments of the present disclosure, each of the plurality of operators may serve as the target operator. For each operator, the target operation is performed at least once, i.e., at least one round of iteration is performed. In this way, the sum of the capacities of the current target remaining spaces can be made to approach the preset capacity threshold, and the available storage space of each association operator can be made as large as possible. This effectively avoids leaving large amounts of free space in the target storage area and also helps improve the reuse rate of addresses in the target storage area.
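The round-based search described above can be sketched as a simple accept-or-retry loop. All names below (`choose_placement`, `remaining_capacity`) are assumptions for illustration; the sketch only captures the control flow of "perform the target operation, check the preset condition, and return to the target operation if it is not met."

```python
# Illustrative sketch (assumed names): repeat the target operation, i.e.
# propose candidate address-space placements for the target operator's
# tensors, until the summed capacity of the current target remaining
# spaces meets the preset threshold, or the candidates are exhausted.
def choose_placement(candidates, remaining_capacity, threshold):
    """candidates: iterable of placements; remaining_capacity(p) returns
    the sum of the current target remaining spaces under placement p."""
    best, best_cap = None, -1
    for placement in candidates:
        cap = remaining_capacity(placement)
        if cap >= threshold:        # preset condition met: accept and stop
            return placement
        if cap > best_cap:          # otherwise remember the best seen so far
            best, best_cap = placement, cap
    return best                     # fall back to the best candidate

# Toy usage: capacities stand in for "sum of current target remaining spaces".
caps = {"p1": 3, "p2": 8, "p3": 5}
assert choose_placement(["p1", "p2", "p3"], caps.get, 6) == "p2"
```

Note the early exit: as soon as one placement satisfies the threshold, no further rounds are needed, matching the "take the current address space as the first address space" step described next.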
In an embodiment of the present disclosure, the processor 220 may be further configured to: in response to determining that the sum of the capacities of the at least one current target remaining space meets the preset condition, take the at least one current address space as at least one first address space, respectively. After the target operation has been repeated a number of times, if it is determined that the sum of the capacities of the at least one current target remaining space is greater than or equal to the preset capacity threshold, the current address spaces may be taken as the first address spaces. Next, at least one round of iteration may be performed with the operator OP104 as the target operator.
It will be appreciated that while the apparatus of the present disclosure has been described above, some ways of determining the target storage area will be described below.
In some embodiments, the processor may be further configured to: determine, in the storage unit, an occupied space corresponding to an operator based on the sum of the data amounts of the at least one tensor associated with the operator. For example, the tensors associated with an operator may include the input tensors and the output tensor of the operator. Taking the operator OP103 described above as an example, the tensors associated with the operator OP103 may include the tensor T102 and the tensor T103. From the sum of the data amounts of the tensor T102 and the tensor T103, the occupied space corresponding to the operator OP103 can be determined.
In some embodiments, the processor may be further configured to: and determining a target storage area in the storage unit according to the occupied space corresponding to the first operator in the plurality of operators.
In an embodiment of the disclosure, the capacity of the occupied space corresponding to the first operator is greater than or equal to the capacity of the occupied space corresponding to any one of the plurality of operators. For example, among the operators OP101 to OP114, if the occupied space corresponding to the operator OP103 is the largest, the operator OP103 may be taken as the first operator, and the occupied space corresponding to the operator OP103 may be taken as the target storage area. According to the embodiment of the disclosure, the operator with the largest occupied space is determined, and the target storage area is determined according to the occupied space of that operator. The capacity of the target storage area is thereby more reasonable, which helps improve the robustness of the operations performed by the processor while saving as much space as possible.
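The sizing rule above amounts to a max-over-operators computation. The following is a minimal sketch under stated assumptions: tensor sizes are given in bytes, each operator lists its associated tensors, and the helper names are invented for illustration.

```python
# Hedged sketch (assumed names and sizes): the occupied space of an
# operator is the sum of the byte sizes of its associated tensors; the
# operator with the largest occupied space (the "first operator") sizes
# the target storage area.
def occupied_space(tensor_sizes: dict[str, int], associated: list[str]) -> int:
    return sum(tensor_sizes[t] for t in associated)

def target_area_capacity(tensor_sizes: dict[str, int],
                         op_tensors: dict[str, list[str]]) -> int:
    return max(occupied_space(tensor_sizes, ts) for ts in op_tensors.values())

# Toy example: OP103's footprint (64 + 96) exceeds OP104's (96 + 32),
# so the target storage area gets 160 bytes.
sizes = {"T102": 64, "T103": 96, "T104": 32}
ops = {"OP103": ["T102", "T103"], "OP104": ["T103", "T104"]}
assert target_area_capacity(sizes, ops) == 160
```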
It will be appreciated that some of the ways in which the target storage area is determined are described above, and that the processor of the present disclosure will be further described below.
FIG. 3A is a schematic diagram of multiple address spaces in the level-3 cache according to another embodiment of the present disclosure.
As shown in fig. 3A, the neural network model may include, for example, operators OP30 through OP39. Taking the operator OP33 as the first operator, the at least one tensor associated with the operator OP33 may include the tensor T30, the tensor T31, the tensor T32, the tensor T33, and the tensor T34. The tensor T30 and the tensor T32 may be input tensors of the operator OP33. The tensor T33 may be the output tensor of the operator OP33. The tensor T31 and the tensor T34 may be output tensors of the operator OP31. The respective lifecycles of the tensors T31 and T34 overlap with the operation period of the operator OP33. Thus, when the operator OP33 is running, the tensor T31 and the tensor T34 can be stored in the storage unit.
As shown in fig. 3A, the width of the tensor may characterize the data volume of the tensor. The height of the tensor may characterize the lifecycle of the tensor. The data amount of the tensor T31 may be smaller than the data amount of the tensor T30. The length of the lifecycle of the tensor T33 may be greater than the length of the lifecycle of the tensor T30.
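The width-and-height picture just described is the usual rectangle view of tensor allocation: each tensor occupies a rectangle whose width is an address range and whose height is a lifecycle, and two tensors may share addresses only if their lifecycles do not overlap. The sketch below encodes that invariant; the `Placement` type and the example numbers are assumptions, not from the disclosure.

```python
# Sketch of the Fig. 3A picture (all names and numbers assumed): each
# tensor is a rectangle with width = data amount (address range) and
# height = lifecycle (operator steps). A placement is invalid if two
# rectangles overlap in both dimensions at once.
from dataclasses import dataclass

@dataclass
class Placement:
    addr: int   # first address within the target storage area
    size: int   # data amount in bytes (rectangle width)
    start: int  # first operator step of the lifecycle (rectangle bottom)
    end: int    # last operator step of the lifecycle (rectangle top)

def conflicts(a: Placement, b: Placement) -> bool:
    addr_overlap = a.addr < b.addr + b.size and b.addr < a.addr + a.size
    life_overlap = a.start <= b.end and b.start <= a.end
    return addr_overlap and life_overlap

t30 = Placement(addr=0, size=64, start=0, end=3)
# Reusing T30's addresses while T30 is still live (steps meet at 3) conflicts;
# reusing them strictly after T30's lifecycle ends does not.
assert conflicts(t30, Placement(addr=0, size=64, start=3, end=8))
assert not conflicts(t30, Placement(addr=0, size=64, start=4, end=8))
```

This overlap test is what makes address reuse safe: disjoint lifecycles let later tensors inherit earlier tensors' addresses, which is the reuse the disclosure aims to maximize.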
In case the target operator is the first operator, the tensor T30, the tensor T31, the tensor T32, the tensor T33 and the tensor T34 may all be the first tensor to be addressed.
In an embodiment of the present disclosure, the processor may be configured to: the target operation is performed based on at least one contiguous address space in a target storage area of the storage unit to determine at least one current address space of a target operator of the plurality of operators. As shown in fig. 3, in the case where the target operator is the first operator, the entire address space of the target storage area a30 may be regarded as one continuous address space. From this continuous address space, a plurality of current address spaces may be determined. The current address space may store a first tensor to be addressed. As shown in fig. 3, the first address of the current address space of the tensor T30 may be consistent with the first address of the target storage area, and the last address of the current address space of the tensor T34 may be consistent with the last address of the target storage area. The current address space of the tensor T30, the current address space of the tensor T31, the current address space of the tensor T32, the current address space of the tensor T33, and the current address space of the tensor T34 may be sequentially located within the target storage area a 30.
In an embodiment of the present disclosure, the processor may be further configured to: determine at least one current target remaining space in the target storage area based on the lifecycle of the at least one first to-be-addressed tensor and the at least one current address space. The at least one current target remaining space respectively corresponds to at least one association operator associated with the target operator. For example, the at least one association operator may be determined based on the lifecycles of the first to-be-addressed tensors. As shown in fig. 3A, the lifecycle of the tensor T30 overlaps with the operation periods of the operators OP30, OP31, OP32, and OP33. The lifecycle of the tensor T33 overlaps with the operation periods of the operators OP33, OP34, OP35, OP36, OP37, and OP38. Thus, the operators OP30, OP31, OP32, OP34, OP35, OP36, OP37, and OP38 can each serve as an association operator of the operator OP33.
In an embodiment of the present disclosure, the processor may be further configured to: determine at least one current remaining address space of an association operator within the target storage area based on the current address space of the at least one first to-be-addressed tensor associated with that association operator. For example, a tensor associated with an association operator may be a tensor whose lifecycle overlaps with the operator's operation period. As shown in fig. 3A, taking the operator OP34 as an example, the tensor T31, the tensor T33, and the tensor T34 may be associated with the operator OP34. Two current remaining address spaces of the operator OP34 may be determined in the target storage area: the current remaining address space A341 and the current remaining address space A342. The capacity of the current remaining address space A341 may be greater than that of the current remaining address space A342.
In an embodiment of the present disclosure, the processor may be further configured to: take the current remaining address space with the largest capacity as the current target remaining space of the association operator. For example, the current remaining address space A341 may be taken as the current target remaining space of the operator OP34. According to the embodiment of the present disclosure, using the current remaining address space with the largest capacity as the current target remaining space of the association operator can accelerate iteration convergence and improve the efficiency of determining the first address space.
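Finding the remaining address spaces of an association operator reduces to a gap scan over the occupied intervals, then keeping the largest gap. The sketch below is an assumed implementation of that scan; the function names and the 160-byte example are illustrative only.

```python
# Hedged sketch (assumed names): given the address spaces occupied while
# an association operator runs, list the free gaps in the target storage
# area and keep the largest one as that operator's current target
# remaining space.
def free_gaps(capacity: int,
              occupied: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """occupied: (first_address, size) pairs; returns (first_address, size)
    pairs for each unoccupied gap, scanning addresses left to right."""
    gaps, cursor = [], 0
    for addr, size in sorted(occupied):
        if addr > cursor:
            gaps.append((cursor, addr - cursor))
        cursor = max(cursor, addr + size)
    if cursor < capacity:
        gaps.append((cursor, capacity - cursor))
    return gaps

def current_target_remaining_space(capacity, occupied):
    gaps = free_gaps(capacity, occupied)
    return max(gaps, key=lambda g: g[1], default=(0, 0))

# Toy example: live tensors occupy [0, 32) and [96, 128) in a 160-byte
# area, leaving gaps (32, 64) and (128, 32); the larger gap is kept.
assert current_target_remaining_space(160, [(0, 32), (96, 32)]) == (32, 64)
```

Keeping only the largest gap per association operator mirrors the A341-over-A342 choice above, and summing those per-operator gaps yields the quantity compared against the preset capacity threshold.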
In some embodiments, the processor may be further configured to: it is determined whether a sum of capacities of the at least one current target remaining space satisfies a preset condition. For example, the preset conditions may include: the sum of the capacities of the at least one current target remaining space is greater than or equal to a preset capacity threshold.
In the disclosed embodiment, in response to determining that the sum of the capacities of the at least one current target remaining space does not meet the preset condition, the processor returns to the target operation. For example, the sum of the capacities of the current target remaining spaces of the operators OP30, OP31, OP32, OP34, OP35, OP36, OP37, and OP38 may be determined. If the sum of the capacities is less than the preset capacity threshold, the current round of iteration may be ended. Returning to the target operation, a subsequent round of iteration is performed to re-determine the current address space of each of the plurality of first to-be-addressed tensors, as will be further described below in connection with FIG. 3B.
FIG. 3B is a schematic diagram of multiple address spaces in a target storage area according to another embodiment of the present disclosure.
As shown in fig. 3B, after the respective address spaces of the plurality of first to-be-addressed tensors are re-determined, the first address of the current address space of the tensor T34 may coincide with the first address of the target storage area, and the last address of the current address space of the tensor T31 may coincide with the last address of the target storage area. The current address space of the tensor T34, the current address space of the tensor T33, the current address space of the tensor T30, the current address space of the tensor T32, and the current address space of the tensor T31 may be located sequentially within the target storage area A30.
Next, at least one current target remaining space may be determined in the target storage area. As shown in fig. 3B, taking the operator OP34 as an example, the tensor T31, the tensor T33, and the tensor T34 may be related to the operator OP 34. One current remaining address space of the operator OP34, namely the current remaining address space a343, may be determined in the target storage area. The current remaining address space a343 may be the current target remaining space corresponding to the operator OP 34.
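A current remaining address space of an operator is a free gap in the target storage area not occupied by the address spaces of tensors live for that operator, and the current target remaining space is the largest such gap. A minimal sketch of this computation, assuming half-open integer intervals `(start, end)` as the representation of address spaces (this representation and the function names are illustrative, not from the disclosure):

```python
def remaining_spaces(area, occupied):
    """Return the free gaps inside the target storage area `area`,
    given the occupied address intervals of the relevant tensors."""
    start, end = area
    gaps, cursor = [], start
    for lo, hi in sorted(occupied):
        if lo > cursor:
            gaps.append((cursor, lo))  # free gap before this interval
        cursor = max(cursor, hi)
    if cursor < end:
        gaps.append((cursor, end))     # trailing free gap
    return gaps

def current_target_remaining_space(area, occupied):
    """The current target remaining space of an operator is taken as
    its largest current remaining address space."""
    gaps = remaining_spaces(area, occupied)
    return max(gaps, key=lambda g: g[1] - g[0], default=None)
```

For operator OP34 in fig. 3B there is a single gap, so the current target remaining space coincides with the only current remaining address space.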
Next, it may be re-determined whether the sum of the capacities of the at least one current target remaining space satisfies a preset condition.
In some embodiments, the processor may be further configured to: in response to determining that the sum of the capacities of the at least one current target remaining space meets a preset condition, the at least one current address space is respectively used as at least one first address space, wherein the at least one first address space corresponds to at least one first tensor to be addressed respectively. For example, in this iterative process, the sum of the capacities of the current target remaining space of the operator OP30, the current target remaining space of the operator OP31, the current target remaining space of the operator OP32, the current target remaining space of the operator OP34, the current target remaining space of the operator OP35, the current target remaining space of the operator OP36, the current target remaining space of the operator OP37, and the current target remaining space of the operator OP38 may be determined. If the sum of the capacities is greater than the preset capacity threshold, the current address space of the tensor T30 may be used as the first address space of the tensor T30, the current address space of the tensor T31 may be used as the first address space of the tensor T31, the current address space of the tensor T32 may be used as the first address space of the tensor T32, the current address space of the tensor T33 may be used as the first address space of the tensor T33, and the current address space of the tensor T34 may be used as the first address space of the tensor T34.
It will be appreciated that the processor of the present disclosure has been described above using the example in which the target operator is the first operator. The present disclosure is not limited thereto and the target operator may be an operator other than the first operator. As will be further described below.
In an embodiment of the disclosure, in the case where the target operator is an operator other than the first operator of the plurality of operators, there is an addressed tensor in the target storage area. As shown in fig. 3B, operator OP34 may be considered a target operator. For operator OP34, there are 3 addressed tensors in the target storage area, tensor T34, tensor T33 and tensor T31, respectively.
In an embodiment of the present disclosure, the processor may be configured to: determine at least one consecutive address interval according to the target storage area and the addressed tensors. As shown in fig. 3B, from the target storage area a30, the first address space of the tensor T34, the first address space of the tensor T33, and the first address space of the tensor T31, a continuous address space corresponding to the operator OP34 can be determined. The continuous address space may be, for example, the current remaining address space a343 described above. According to the embodiments of the present disclosure, the continuous address space is determined according to the address spaces of the addressed tensors, so that the addresses of the addressed tensors can be effectively maintained and the number of tensors to be addressed is reduced, which can further reduce the resource consumption of the processor and improve the performance of the processor.
It will be appreciated that the present disclosure has been described above taking as an example a continuous address space whose capacity is greater than the data amount of the tensor to be addressed. The present disclosure is not limited thereto, and the case where the capacity of the continuous address space is smaller than the data amount of the tensor to be addressed will be described below.
In embodiments of the present disclosure, after determining the contiguous address space corresponding to the target operator, it may be determined whether the capacity of the contiguous address space is greater than the amount of data of the tensor to be addressed. As shown in fig. 3B, the tensor output by the operator OP34 may be the tensor to be addressed. If the amount of data of the tensor to be addressed is larger than the capacity of the continuous address space, the tensor to be addressed can be taken as a second tensor to be addressed.
In an embodiment of the present disclosure, the processor may be further configured to: determine, in the storage unit, a second address space of a second tensor to be addressed of the target operator. The second address space is located outside the target storage area, and the data amount of the second tensor to be addressed may be greater than the capacity of any one of the continuous address spaces. For example, the second address space may be determined outside the target storage area a 30.
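The distinction between first and second tensors to be addressed can be sketched as a simple classification: a tensor that fits into some continuous address space of the target storage area is a first tensor and can be placed there (for example, at the head address of that space, as in the embodiments above); a tensor larger than every continuous address space is a second tensor and is addressed outside the target storage area. The interval representation and names below are assumptions for illustration only.

```python
def classify_tensor(data_amount, continuous_spaces):
    """Classify a to-be-addressed tensor against the continuous address
    spaces (half-open (start, end) intervals) of the target storage area."""
    for lo, hi in continuous_spaces:
        if hi - lo >= data_amount:
            # a first tensor to be addressed: place it at the head
            # address of a continuous address space that can hold it
            return ("first", (lo, lo + data_amount))
    # a second tensor to be addressed: its address space must lie
    # outside the target storage area
    return ("second", None)
```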
It will be appreciated that the current target remaining space was determined above based on the life cycle of the first to-be-addressed tensor and the current address space, but the disclosure is not so limited and will be further described below in connection with fig. 3C.
Fig. 3C and 3D are schematic diagrams of multiple address spaces in a target storage area according to another embodiment of the present disclosure.
After determining the second address space of the tensor output by the operator OP34, the operator OP35 may be taken as the target operator. The processor may determine at least one consecutive address interval according to the target storage area and the addressed tensors. As shown in fig. 3C, from the target storage area a30, the first address space of the tensor T33, and the first address space of the tensor T34, the continuous address space a351 can be determined. The operator OP35 may output a tensor T36. The data amount of the tensor T36 may be, for example, smaller than the capacity of the continuous address space a351. The tensor T36 may be the first tensor to be addressed of the operator OP 35. According to the continuous address space a351, the target operation may be performed to determine the current address space of the operator OP 35. The current address space may store the tensor T36 output by the operator OP 35. The head address of the current address space may, for example, coincide with the head address of the continuous address space a351.
Next, in an embodiment of the present disclosure, the processor may be further configured to: determine at least one current target remaining space in the target storage area according to the life cycle of the at least one first tensor to be addressed, the at least one current address space, and the address space of the addressed tensor. As shown in fig. 3D, for the operator OP36 associated with the operator OP35, the tensors T33 and T34 may be the addressed tensors. From the first address space of the tensor T33, the first address space of the tensor T34, and the current address space of the tensor T36, the current remaining address space a361 of the operator OP36 can be determined in the target storage area. The current remaining address space a361 may be regarded as the current target remaining space of the operator OP 36. Through the embodiments of the present disclosure, the first tensors to be addressed of the plurality of operators can efficiently multiplex the address space within the target storage area. In turn, the bandwidth advantage of the third-level cache unit can be fully utilized, the memory access efficiency of the processor is improved, and the performance of the artificial intelligence chip can be effectively improved.
In other embodiments of the present disclosure, the preset condition may further include: the sum of the capacities of the at least one current target remaining space is maximum. The preset capacity threshold may be a value smaller than this maximum sum. For example, the number of iterations or the iteration duration may be preset, and after the predetermined number of iterations is reached, the iteration round in which the sum of the capacities of the at least one current target remaining space is maximum may be determined as the target iteration round. The current address space determined in the target iteration round is taken as the first address space of the tensor.
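Under this "maximum sum" preset condition, a fixed budget of rounds is run and the best round is kept. A hypothetical sketch, where `run_round` stands in for one round of the target operation and remaining-space determination (the callable and return shape are assumptions, not the disclosed interface):

```python
def best_iteration(run_round, num_rounds):
    """Run a fixed number of iteration rounds and keep the round whose
    sum of current target remaining space capacities is largest; that
    round is the target iteration round."""
    best_sum, best_spaces = -1, None
    for _ in range(num_rounds):
        spaces, remaining_capacities = run_round()
        total = sum(remaining_capacities)
        if total > best_sum:
            best_sum, best_spaces = total, spaces
    # address spaces determined in the target iteration round become
    # the first address spaces of the tensors
    return best_spaces
```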
It will be appreciated that the processor of the present disclosure has been described above and that the memory unit of the present disclosure will be further described below.
In some embodiments, the storage unit may store association relationship information (Layer connections) related to the neural network model. The association relationship information may indicate an association relationship between different operators. For example, the association relationship information may indicate that the operator OP33 is associated with the operator OP30, the operator OP31, the operator OP32, the operator OP34, the operator OP35, the operator OP36, the operator OP37, the operator OP 38.
In some embodiments, the storage unit may store data amount information (layer size) related to the operator. The data amount information may indicate a sum of data amounts of at least one tensor associated with the operator. For example, the data amount information of the operator OP33 may indicate the sum of the data amount of the tensor T30, the data amount of the tensor T31, the data amount of the tensor T32, the data amount of the tensor T33, and the data amount of the tensor T34.
In some embodiments, the storage unit may store operator ordering index information. The operator ordering index information (layer sort idx) may correspond to the operator indexes ordered according to the data amount information. From this operator ordering index information, it can be determined that, among the plurality of operators, the operator OP33 has the largest sum of the data amounts of its related tensors.
In some embodiments, the storage unit may store tensor ordering index information (data sort idx) of the operator. For example, the tensor ordering index information may be generated after the target operation. The tensor ordering index information may indicate the at least one address space. The tensor ordering index information may include at least one piece of tensor location information (tensor location). The tensor location information may indicate the tail address of a tensor, which may be the tail address of the current address space. The tensor ordering index information is adjusted multiple times during the multiple iterations in order to determine the first address space.
In some embodiments, the storage unit may store tensor addressing information (tensor allocation) of the operator. The tensor addressing information may indicate the tail address of the tensor. The tail address may be the tail address of the first address space of the tensor.
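The per-operator records described above can be summarized as a small data structure. The field names echo the terms above (layer connections, layer size, tensor location, tensor allocation), but this concrete layout is an illustrative assumption, not the disclosed storage format.

```python
from dataclasses import dataclass, field

@dataclass
class OperatorRecord:
    """Per-operator records kept in the storage unit (illustrative)."""
    connections: list                                      # ids of associated operators
    layer_size: int                                        # sum of data amounts of related tensors
    tensor_locations: dict = field(default_factory=dict)   # tensor id -> tail address of current address space
    tensor_allocation: dict = field(default_factory=dict)  # tensor id -> tail address of first address space

def layer_sort_idx(records):
    """Operator ids ordered by descending layer size, so the first entry
    is the operator whose related tensors have the largest total data
    amount (the 'first operator' of the embodiments above)."""
    return sorted(records, key=lambda op: records[op].layer_size, reverse=True)
```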
It will be appreciated that the data processing apparatus of the present disclosure has been described above in connection with operator OP30 to operator OP 39. Effects of the data processing apparatus of the present disclosure will be described below in conjunction with the above-described operators OP101 to OP 114.
Fig. 4 is a schematic effect diagram of a data processing apparatus according to one embodiment of the present disclosure.
As shown in fig. 4, after determining the address space of the tensor to be addressed for each operator using the apparatus 200, the address space of the first tensor to be addressed may be located in the target storage area a40. The address space of the plurality of second tensors to be addressed may be located in the first storage area a41 and the second storage area a42.
As shown in fig. 4, the addresses in the target storage area a40 can be efficiently multiplexed, and thus the storage space required for running multiple operators of the neural network model can be greatly reduced, which is beneficial to improving the utilization rate of the third-level cache.
It will be appreciated that while the processing apparatus of the present disclosure has been described above, the method of determining a tensor storage address of the present disclosure will be described below.
FIG. 5 is a flowchart of a method of determining tensor storage addresses according to one embodiment of the present disclosure.
As shown in fig. 5, the method 500 may include operations S510 to S530.
In operation S510, a target operation is performed according to at least one continuous address space in a target storage area of a storage unit to determine at least one current address space of a target operator of a plurality of operators.
In an embodiment of the present disclosure, the at least one current address space corresponds to at least one first tensor to be addressed of the target operator, respectively.
In operation S520, at least one current target remaining space is determined in the target storage area according to the lifecycle of the at least one first to-be-addressed tensor and the at least one current address space.
In an embodiment of the present disclosure, the at least one current target remaining space corresponds to at least one correlation operator associated with the target operator, respectively.
In operation S530, in response to determining that the sum of the capacities of the at least one current target remaining space does not meet the preset condition, returning to the target operation.
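Operations S510 to S530 together form an iterative loop. The following Python sketch is illustrative only: the callables stand in for the steps described above, and the loop bound and return convention are assumptions, not part of the claimed method.

```python
def method_500(target_operation, remaining_spaces_of, condition_met, max_rounds=100):
    """Sketch of operations S510-S530 of the method of determining
    tensor storage addresses."""
    for _ in range(max_rounds):
        current_spaces = target_operation()                   # S510
        remaining = remaining_spaces_of(current_spaces)       # S520
        if condition_met(remaining):                          # preset condition met
            return current_spaces                             # used as first address spaces
        # S530: condition not met, return to the target operation
    raise RuntimeError("preset condition not met within max_rounds")
```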
It will be appreciated that the method 500 may be implemented using the processor described above.
In some embodiments, the method 500 may further comprise: in response to determining that the sum of the capacities of the at least one current target remaining space meets a preset condition, the at least one current address space is respectively taken as at least one first address space. For example, the at least one first address space corresponds to at least one first tensor to be addressed, respectively.
In some embodiments, the preset conditions include at least one of: the sum of the capacities of the at least one current target remaining space is maximized. The sum of the capacities of the at least one current target remaining space is greater than or equal to a preset capacity threshold.
In some embodiments, determining at least one current target remaining space in the target storage area may include: determining at least one current remaining address space of the correlation operator within the target storage area according to the current address space of the at least one first to-be-addressed tensor associated with the correlation operator; and taking the current remaining address space with the largest capacity as the current target remaining space of the correlation operator.
In some embodiments, performing the target operation according to at least one contiguous address space in the target storage area of the storage unit may include: determining, in the storage unit, an occupied space corresponding to the operator according to a sum of data amounts of at least one tensor associated with the operator; and determining the target storage area in the storage unit according to the occupied space corresponding to a first operator of the plurality of operators. For example, the capacity of the occupied space corresponding to the first operator is greater than or equal to the capacity of the occupied space corresponding to any one of the plurality of operators.
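The choice of the target storage area can be sketched directly from this description: each operator's occupied space is the sum of the data amounts of its associated tensors, and the first operator is the one with the largest occupied space. The base address and return shape below are illustrative assumptions.

```python
def choose_target_area(operator_tensor_sizes, base=0):
    """Pick the target storage area as the occupied space of the first
    operator, i.e. the operator whose associated tensors have the
    largest total data amount."""
    footprints = {op: sum(sizes) for op, sizes in operator_tensor_sizes.items()}
    first_op = max(footprints, key=footprints.get)
    # the target storage area has the capacity of the first operator's
    # occupied space, modeled here as a half-open interval from `base`
    return first_op, (base, base + footprints[first_op])
```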
In some embodiments, where the target operator is an operator other than the first operator of the plurality of operators, there is an addressed tensor in the target storage area. Determining at least one current target remaining space in the target storage area may include: at least one current target remaining space is determined in the target storage area based on the lifecycle of the at least one first to-be-addressed tensor, the at least one current address space, and the address space of the addressed tensor.
In some embodiments, the method 500 may further comprise: a second address space of a second tensor to be addressed of the target operator is determined in the memory location. For example, the second address space is located outside the target storage area, and the amount of data of the second tensor to be addressed is greater than the capacity of any one of the consecutive address spaces.
In some embodiments, performing the target operation according to at least one contiguous address space in the target storage area of the storage unit may further comprise: at least one consecutive address interval is determined based on the target storage area and the addressed tensor.
It will be appreciated that the method of determining a tensor storage address of the present disclosure has been described above and that an electronic device comprising data processing means will be described below.
Fig. 6 is a block diagram of an electronic device according to another embodiment of the present disclosure.
As shown in fig. 6, the electronic device 60 may include a data processing apparatus 600 provided by the present disclosure. The data processing device 600 may be, for example, the device 200 described above.
In the technical solution of the present disclosure, the processes of collecting, storing, using, processing, transmitting, providing, and disclosing the personal information of users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 701 performs the respective methods and processes described above, for example, a method of determining a tensor storage address. For example, in some embodiments, the method of determining tensor storage addresses may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When a computer program is loaded into RAM 703 and executed by the computing unit 701, one or more steps of the method of determining a tensor storage address described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method of determining the tensor storage address by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) display or an LCD (liquid crystal display)) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (20)

1. A data processing apparatus comprising:
a storage unit;
a processor configured to:
performing target operation according to at least one continuous address space in a target storage area of the storage unit so as to determine at least one current address space of a target operator in a plurality of operators, wherein at least one current address space corresponds to at least one first tensor to be addressed of the target operator respectively;
determining at least one current target residual space in the target storage area according to the life cycle of at least one first tensor to be addressed and at least one current address space, wherein at least one current target residual space corresponds to at least one correlation operator associated with the target operator respectively; and
returning to the target operation in response to determining that the sum of the capacities of at least one of the current target remaining spaces does not meet a preset condition.
2. The apparatus of claim 1, wherein the processor is further configured to:
in response to determining that the sum of the capacities of the at least one current target residual space meets a preset condition, respectively taking the at least one current address space as at least one first address space, wherein the at least one first address space corresponds to the at least one first tensor to be addressed respectively.
3. The apparatus of claim 1, wherein the preset conditions comprise at least one of:
the sum of the capacities of at least one of the current target remaining spaces is maximum; and
the sum of the capacities of at least one of the current target remaining spaces is greater than or equal to a preset capacity threshold.
4. The apparatus of claim 1, wherein the processor is further configured to:
determining at least one current remaining address space of the correlation operator in the target storage area according to the current address space of at least one first tensor to be addressed related to the correlation operator; and
taking the current residual address space with the largest capacity as the current target residual space of the correlation operator.
5. The apparatus of claim 1, wherein the processor is further configured to:
determining, in the storage unit, a footprint corresponding to the operator according to a sum of data amounts of at least one tensor associated with the operator; and
determining the target storage area in the storage unit according to the occupied space corresponding to a first operator of the plurality of operators, wherein the capacity of the occupied space corresponding to the first operator is greater than or equal to the capacity of the occupied space corresponding to any one of the plurality of operators.
6. The apparatus of claim 4, wherein in the case where the target operator is an operator other than the first operator of the plurality of operators, there is an addressed tensor in the target storage area,
the processor is further configured to:
at least one of the current target remaining spaces is determined in the target storage area based on a lifecycle of at least one of the first to-be-addressed tensor, at least one of the current address spaces, and an address space of the addressed tensor.
7. The apparatus of claim 4, wherein the processor is further configured to:
determining a second address space of a second tensor to be addressed of the target operator in the storage unit, wherein the second address space is located outside the target storage area, and the data volume of the second tensor to be addressed is larger than the capacity of any one of the continuous address spaces.
8. The apparatus of claim 6, wherein the processor is further configured to:
at least one of the consecutive address intervals is determined from the target storage area and the addressed tensor.
9. A method of determining a tensor storage address, comprising:
performing target operation according to at least one continuous address space in a target storage area of a storage unit so as to determine at least one current address space of a target operator in a plurality of operators, wherein at least one current address space corresponds to at least one first tensor to be addressed of the target operator respectively;
determining at least one current target residual space in the target storage area according to the life cycle of at least one first tensor to be addressed and at least one current address space, wherein at least one current target residual space corresponds to at least one correlation operator associated with the target operator respectively; and
returning to the target operation in response to determining that the sum of the capacities of at least one of the current target remaining spaces does not meet a preset condition.
10. The method of claim 9, further comprising:
in response to determining that the sum of the capacities of the at least one current target residual space meets a preset condition, respectively taking the at least one current address space as at least one first address space, wherein the at least one first address space corresponds to the at least one first tensor to be addressed respectively.
11. The method of claim 9, wherein the preset condition comprises at least one of:
the sum of the capacities of the at least one current target remaining space is at a maximum; and
the sum of the capacities of the at least one current target remaining space is greater than or equal to a preset capacity threshold.
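Claims 9 to 11 describe an iterative placement loop that assigns each first tensor to be addressed a continuous address space, using tensor life cycles so that tensors whose life cycles do not overlap may share addresses. The following is a minimal, non-authoritative sketch of such a loop; the function names, the greedy lowest-offset strategy, and the interval representation are all assumptions for illustration, not taken from the claims.

```python
# Illustrative sketch of the loop in claims 9-11 (all names assumed).
# A tensor is (size, lifetime); address spaces and lifetimes are
# half-open (start, end) intervals.

def overlaps(a, b):
    """True when two half-open intervals intersect."""
    return a[0] < b[1] and b[0] < a[1]

def candidate_offsets(capacity, size, placed):
    """Target operation: candidate start addresses for one tensor -
    address 0 plus the end of every already placed address space."""
    starts = {0} | {end for (_, end), _ in placed}
    return sorted(s for s in starts if s + size <= capacity)

def place(capacity, tensors):
    """Repeat the target operation per tensor: pick the lowest offset
    that does not collide with any placed tensor whose life cycle
    overlaps, leaving the largest remaining space at high addresses."""
    placed = []  # list of ((start, end), lifetime)
    for size, life in tensors:
        for start in candidate_offsets(capacity, size, placed):
            span = (start, start + size)
            if all(not (overlaps(span, sp) and overlaps(life, lf))
                   for sp, lf in placed):
                placed.append((span, life))
                break
        else:
            return None  # preset condition unmet: no feasible address space
    return [sp for sp, _ in placed]

# Tensors 1 and 2 have disjoint life cycles and share addresses 0-40;
# tensor 3 overlaps both in time and lands after them.
print(place(100, [(40, (0, 2)), (40, (2, 4)), (20, (1, 3))]))
# → [(0, 40), (0, 40), (40, 60)]
```

Under this sketch the "preset condition" of claim 11 is modeled only implicitly, as feasibility of the placement; a maximizing variant would compare the remaining capacities of several candidate placements.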
12. The method of claim 9, wherein the determining at least one current target remaining space in the target storage area comprises:
determining at least one current remaining address space of the correlation operator in the target storage area according to the current address space of the at least one first tensor to be addressed related to the correlation operator; and
taking the current remaining address space with the largest capacity as the current target remaining space of the correlation operator.
13. The method of claim 9, wherein the performing the target operation according to at least one continuous address space in the target storage area of the storage unit comprises:
determining, in the storage unit, a footprint corresponding to each operator according to a sum of the data amounts of at least one tensor associated with the operator; and
determining the target storage area in the storage unit according to the footprint corresponding to a first operator of the plurality of operators, wherein the capacity of the footprint corresponding to the first operator is greater than or equal to the capacity of the footprint corresponding to any operator of the plurality of operators.
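Claim 13 sizes the target storage area from the largest operator footprint. A minimal sketch of that computation follows; the operator names, tensor sizes, and helper functions are illustrative assumptions, not drawn from the patent.

```python
# Hypothetical sketch of claim 13: the target storage area is sized to the
# footprint of the "first operator", i.e. the operator whose footprint is
# greater than or equal to that of any other operator.

def operator_footprint(tensor_sizes):
    """Footprint of one operator: the sum of the data amounts (bytes)
    of the tensors associated with it."""
    return sum(tensor_sizes)

def target_storage_capacity(operators):
    """Capacity of the target storage area: the maximum footprint
    over all operators."""
    return max(operator_footprint(sizes) for sizes in operators.values())

ops = {"conv": [4096, 4096, 8192], "relu": [8192, 8192], "fc": [2048, 512]}
print(target_storage_capacity(ops))  # 16384
```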
14. The method of claim 12, wherein, in a case where the target operator is an operator other than the first operator among the plurality of operators, an addressed tensor is present in the target storage area, and
the determining at least one current target remaining space in the target storage area comprises:
determining the at least one current target remaining space in the target storage area according to the life cycle of the at least one first tensor to be addressed, the at least one current address space, and the address space of the addressed tensor.
15. The method of claim 12, further comprising:
determining, in the storage unit, a second address space for a second tensor to be addressed of the target operator, wherein the second address space is located outside the target storage area, and the data volume of the second tensor to be addressed is larger than the capacity of any one of the continuous address spaces.
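Claim 15 routes a tensor outside the target storage area exactly when its data volume exceeds every continuous address space. As a hypothetical illustration of that predicate only (the gap representation and names are assumptions, not from the claim):

```python
# Hypothetical check for claim 15: a second tensor to be addressed gets an
# address space outside the target storage area when its data volume exceeds
# the capacity of every continuous address space. Gaps are assumed to be
# half-open (start, end) byte intervals.

def needs_outside_allocation(tensor_size, gaps):
    """True when the tensor fits in none of the continuous address spaces."""
    return all(end - start < tensor_size for start, end in gaps)

gaps = [(0, 10), (30, 70)]                 # free spaces of width 10 and 40
print(needs_outside_allocation(50, gaps))  # True: 50 > 40, allocate outside
print(needs_outside_allocation(35, gaps))  # False: fits in the 40-wide space
```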
16. The method of claim 14, wherein the performing the target operation according to at least one continuous address space in the target storage area of the storage unit further comprises:
determining the at least one continuous address space according to the target storage area and the addressed tensor.
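Claims 12 and 16 both turn on the continuous free address spaces left in the target storage area once some tensors are placed, with claim 12 selecting the one of largest capacity. A minimal sketch of that computation, under assumed names and a half-open interval representation, could be:

```python
# Illustrative sketch of claims 12 and 16: compute the continuous free
# intervals of a storage area given the occupied address spaces, and take
# the largest one as the current target remaining space. All names assumed.

def free_spaces(capacity, occupied):
    """Continuous free [start, end) intervals of a storage area of the
    given capacity, given occupied [start, end) intervals."""
    gaps, cursor = [], 0
    for start, end in sorted(occupied):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < capacity:
        gaps.append((cursor, capacity))
    return gaps

def largest_remaining_space(capacity, occupied):
    """Current target remaining space: the free interval of largest capacity."""
    return max(free_spaces(capacity, occupied),
               key=lambda g: g[1] - g[0], default=None)

print(free_spaces(100, [(10, 30), (50, 60)]))
# → [(0, 10), (30, 50), (60, 100)]
print(largest_remaining_space(100, [(10, 30), (50, 60)]))  # → (60, 100)
```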
17. An electronic device comprising the data processing apparatus of any one of claims 1 to 8.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 9 to 16.
19. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 9 to 16.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 9 to 16.
CN202310370023.0A 2023-04-07 2023-04-07 Data processing apparatus, method, device and medium for determining tensor memory address Pending CN116361205A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310370023.0A CN116361205A (en) 2023-04-07 2023-04-07 Data processing apparatus, method, device and medium for determining tensor memory address

Publications (1)

Publication Number Publication Date
CN116361205A true CN116361205A (en) 2023-06-30

Family

ID=86907733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310370023.0A Pending CN116361205A (en) 2023-04-07 2023-04-07 Data processing apparatus, method, device and medium for determining tensor memory address

Country Status (1)

Country Link
CN (1) CN116361205A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093509A (en) * 2023-10-18 2023-11-21 上海为旌科技有限公司 On-chip memory address allocation method and system based on greedy algorithm
CN117093509B (en) * 2023-10-18 2024-01-26 上海为旌科技有限公司 On-chip memory address allocation method and system based on greedy algorithm

Similar Documents

Publication Publication Date Title
CN109783237B (en) Resource allocation method and device
CN112948079B (en) Task scheduling method, device, equipment and computer storage medium
CN111984400A (en) Memory allocation method and device of neural network
US11620049B2 (en) Method, electronic device and computer program product for managing storage space
CN116451174A (en) Task execution device, method, electronic device, and storage medium
CN116361205A (en) Data processing apparatus, method, device and medium for determining tensor memory address
JP7408741B2 (en) Multitasking deployment methods, equipment, electronic equipment and storage media
JP2022136234A (en) Federated learning method and apparatus, electronic apparatus, storage medium, and computer program
CN109412865B (en) Virtual network resource allocation method, system and electronic equipment
CN117519946A (en) Memory resource scheduling method, device, equipment and medium in deep learning network
CN115878332B (en) Memory resource allocation method, device, equipment and medium in deep learning network
CN115438007A (en) File merging method and device, electronic equipment and medium
CN115065366A (en) Compression method, device and equipment of time sequence data and storage medium
CN114860411A (en) Multitask learning method and device, electronic equipment and storage medium
CN115081607A (en) Reverse calculation method, device and equipment based on embedded operator and storage medium
CN114564292A (en) Distributed gridding processing method, device, equipment and medium for data
CN114048863A (en) Data processing method, data processing device, electronic equipment and storage medium
CN116737605B (en) Data prefetching method, device, equipment and medium based on chip multilevel storage
CN114331379B (en) Method for outputting task to be handled, model training method and device
CN117273115B (en) Static generation method, device, equipment and medium of reverse calculation graph
CN116795771A (en) Data processing device, method, electronic device, and storage medium
CN113239296B (en) Method, device, equipment and medium for displaying small program
CN117829373A (en) Method and device for data processing and click rate prediction
CN118673212A (en) Code recommendation method and device, electronic equipment and storage medium
CN117992196A (en) Task scheduling method, device, equipment and medium for memory and calculation integrated chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination