CN115759204A - Data processing method of neural network model, storage medium and electronic device - Google Patents

Info

Publication number
CN115759204A
Authority
CN
China
Prior art keywords
data
processed
size
memory
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211475780.6A
Other languages
Chinese (zh)
Inventor
高峰
许礼武
黄敦博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN202211475780.6A
Publication of CN115759204A
Legal status: Pending (current)

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to the field of computer technology, and discloses a data processing method of a neural network model, a storage medium and an electronic device. The method includes: acquiring data to be processed; predicting the size of a first storage space that the data to be processed will occupy when the neural network model is run; segmenting the data to be processed, based on the relation between the size of the first storage space and the size of a second storage space of a storage unit of a static memory, to obtain a plurality of pieces of sub data to be processed; and inputting the plurality of pieces of sub data to be processed into the neural network model for operation, and storing the resulting sub calculation results corresponding to the pieces of sub data into a plurality of storage units of the static memory. Data reading and writing during the operation of the neural network model are thus carried out through the static memory; because the static memory has low read-write latency, reading and writing data from the static memory improves the processing efficiency of the neural network model.

Description

Data processing method of neural network model, storage medium and electronic device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method for a neural network model, a storage medium, and an electronic device.
Background
With the rapid development of Artificial Intelligence (AI), neural network models are applied more and more widely in the field of artificial intelligence. In order to increase the operation speed of a neural network, the various operations in the neural network model, such as convolution, element-by-element operation and concatenation, may be implemented by an operation unit of the neural network model, such as a Neural-network Processing Unit (NPU). When the NPU performs an operation, a large number of intermediate results are generated, and these intermediate results need to be read and written repeatedly. For example, after a convolution operation is performed, the convolution result needs to be written into a memory, and when the subsequent element-by-element operation is performed, the convolution result needs to be read from the memory again.
However, when the amount of data processed by the neural network is large, each intermediate result generated during operation is usually written into a dynamic memory, such as a Double Data Rate Synchronous Dynamic Random Access Memory (DDR). Because the dynamic memory has high read-write latency and high power consumption, it is difficult to guarantee the performance of the neural network model.
Disclosure of Invention
The embodiment of the application provides a data processing method of a neural network model, a storage medium and electronic equipment.
In a first aspect, the present application provides a data processing method of a neural network model, applied to an electronic device, including: acquiring data to be processed; predicting the size of a first storage space occupied by data to be processed when the neural network model is operated; segmenting data to be processed based on the relation between the size of the first storage space and the size of a second storage space of a storage unit of a static memory in the electronic equipment to obtain a plurality of sub data to be processed; and inputting the plurality of sub-data to be processed into the neural network model for operation, and respectively storing the obtained sub-calculation results corresponding to the plurality of sub-data to be processed into a plurality of storage units of the static memory.
In the embodiments of the present application, the size of the first storage space is referred to as the first memory, and the size of the second storage space as the second memory. The data to be processed of the neural network model is segmented: data with a large data volume is divided into a plurality of pieces of sub data with a small data volume, and these smaller pieces of sub data are operated on in the neural network model, so that the sub data can be read and written through a plurality of storage units of a static memory during the operation, which improves the efficiency with which the neural network model processes data with a large data volume.
In one possible implementation of the first aspect, the neural network model includes a plurality of operation layers; predicting the size of a first storage space occupied by data to be processed when the neural network model is operated, wherein the predicting comprises the following steps: respectively predicting the size of a plurality of third storage spaces occupied by the data to be processed when the data to be processed is operated in each operation layer; and taking the maximum value of the sizes of the plurality of third storage spaces as the size of the first storage space.
In the embodiments of the application, the segmentation is determined by the maximum memory occupied by the data to be processed across the operation layers of the neural network model. The data to be processed is segmented based on this maximum occupied memory and the storage capacity of a single storage unit of the static memory, yielding a plurality of pieces of sub data with a smaller data volume, so that data can be read and written through the storage units of the static memory when the sub data are processed in the operation layer corresponding to the maximum occupied memory. Moreover, since reads and writes can be handled by the static memory at the layer that occupies the most memory, they can also be handled by the static memory at every other operation layer, which occupies no more memory.
In a possible implementation of the first aspect, the predicting sizes of a plurality of third storage spaces that will be occupied by the data to be processed when the data to be processed is operated in each operation layer respectively includes: and respectively predicting the size of each third storage space based on the sum of the sizes of the storage spaces occupied by the input data and the output data of each operation layer during operation.
In one possible implementation of the first aspect, the operation layer includes at least one or more of the following: cutting layers, coiling layers, element-by-element operation layers and splicing layers.
In a possible implementation of the first aspect, the segmenting the data to be processed based on a relationship between a size of the first storage space and a size of a second storage space of a storage unit of a static memory inside the electronic device includes: a slicing number of the slicing is determined based on a ratio of a size of the first storage space to a size of the second storage space.
In a possible implementation of the first aspect, the determining a slicing number of the slicing based on a ratio of the size of the first storage space to the size of the second storage space includes: the number of cuts is N, and N is a positive number; when N is an integer, the data to be processed is segmented based on N to obtain N pieces of sub data to be processed; when N includes a decimal part, the number of cuts is M, where M is the integer part of N plus 1, and the data to be processed is segmented based on M to obtain M pieces of sub data to be processed.
In a possible implementation of the first aspect, the segmenting the to-be-processed data to obtain a plurality of to-be-processed sub-data includes: the to-be-processed subdata comprises a first characteristic; the first characteristic is used for representing the position relation of the sub-data to be processed in the data to be processed.
In a possible implementation of the first aspect, the size of the second storage space is determined by: the size of the second storage space is determined by a minimum storage unit of a static memory inside the electronic device.
In a second aspect, the present application provides a readable storage medium, on which instructions are stored, and when executed on an electronic device, the instructions cause the electronic device to implement the first aspect and any one of the data processing methods of the neural network model provided by various possible implementations of the first aspect.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory to store instructions for execution by one or more processors of an electronic device; and a processor, which is one of the processors of the electronic device, configured to execute the instructions stored in the memory to implement the data processing method of the first aspect and any one of the neural network models provided in the various possible implementations of the first aspect.
In a fourth aspect, the present application provides a program product, where the program product includes instructions that, when executed by an electronic device, can cause the electronic device to implement the first aspect and any one of the data processing methods of the neural network model provided in the various possible implementations of the first aspect.
Drawings
FIG. 1 illustrates a schematic diagram of a neural network model 10, according to some embodiments of the present application;
FIG. 2 illustrates a flow diagram of a data processing method of a neural network model, according to some embodiments of the present application;
FIG. 3 illustrates a schematic structural diagram of a neural network model 20, according to some embodiments of the present application;
FIG. 4 illustrates a block diagram of a data processing apparatus 200, according to some embodiments of the present application;
fig. 5 shows a schematic structural diagram of an electronic device 100, according to some embodiments of the present application.
Detailed Description
The illustrative embodiments of the present application include, but are not limited to, a data processing method of a neural network model, a storage medium, and an electronic device.
For a clearer understanding of the present application, the structure of the neural network model will now be described.
Fig. 1 illustrates a schematic diagram of a neural network model 10, according to some embodiments of the present application. As shown in fig. 1, the neural network model 10 includes an input layer 101, a convolutional layer 1021, an Eltwise layer 103, a convolutional layer 1022, and an output layer 104.
The input layer 101 is used for receiving data to be processed and writing the data to be processed into the memory 101A. For example, the input layer receives the feature maps A and D and writes the feature maps A and D into the memory 101A.
The convolutional layer 1021 is used for reading the output data of the input layer 101 from the memory 101A as the input data of the convolutional layer 1021, and writing the convolution result into the memory 1021A after convolution processing as the output data of the convolutional layer 1021. For example, the convolutional layer 1021 reads the feature map D from the memory 101A, and after obtaining the feature map B by convolution processing, writes the feature map B into the memory 1021A.
The Eltwise layer 103 is configured to read output data of the input layer 101 and the convolution layer 1021 from the memory 101A and the memory 1021A as input data of the Eltwise layer 103, perform arithmetic processing on the input data by operations such as product (dot product), sum (sum), max (maximum value), and the like, obtain an arithmetic result, and write the arithmetic result in the memory 103A as output data of the Eltwise layer 103.
It is understood that the Eltwise layer 103 has at least two inputs, and the product (dot product), sum, max (maximum value) and similar operations are calculated over these inputs.
For example, the Eltwise layer 103 reads the feature map A and the feature map B from the memory 101A and the memory 1021A, respectively, so that the feature map A and the feature map B serve as the input data of the Eltwise layer 103. The feature map A and the feature map B are 3 × 3 matrices (shown as an image in the original and omitted here).
In some embodiments, when the Eltwise layer 103 performs the sum operation, the output data of the Eltwise layer is the feature map C, which is simultaneously written into the memory 103A. The feature map C is the element-wise sum of the feature map A and the feature map B (the matrix is shown as an image in the original and omitted here).
In other embodiments, when the Eltwise layer 103 performs the product operation, the output data is the feature map C', which is simultaneously written into the memory 103A. The feature map C' is the element-wise product of the feature map A and the feature map B (the matrix is shown as an image in the original and omitted here).
it can be understood that the dimensions of the input feature maps and the dimensions of the output feature maps of the Eltwise layer 103 are the same, for example, the dimensions of feature map a, feature map B and feature map C are all 3 × 3.
It is understood that the Eltwise layer shown in fig. 1 has two inputs and one output, and in other embodiments, the Eltwise layer may further include more inputs, and is not limited in particular.
The convolution layer 1022 is configured to read output data of the Eltwise layer 103 from the memory 103A as input data of the convolution layer 1022, and write a convolution result into the memory 1022A after convolution processing as output data of the convolution layer 1022. For example, the convolution layer 1022 reads the feature map C from the memory 103A, obtains the feature map E through convolution processing, and writes the feature map E into the memory 1022A. Further, the output layer 104 is used for reading data from the memory 1022A and outputting the data.
It is understood that the structure of the neural network model 10 is merely an example, and in other embodiments, the neural network model may include more or less layers than the neural network model 10, such as an activation layer, a normalization layer, and the like, and the application is not limited thereto.
As described above, each operation of the neural network model requires writing a corresponding operation result into the memory, and the next operation requires reading data from the memory written by the previous operation and then performing the operation. However, when the amount of data processed by the neural network model is large, each operation result generated in the operation process is usually written into a dynamic memory, for example, DDR, and it is difficult to ensure the performance of the neural network model due to high read-write delay and large power consumption of the dynamic memory.
It can be understood that the data read/write delay of the static memory is smaller than that of the dynamic memory, that is, the static memory has the characteristic of faster data read/write speed compared with the dynamic memory. Therefore, in order to improve the data operation efficiency of the neural network model, the data reading and writing of the neural network model in the data operation process can be realized through the static memory. However, since the storage capacity of a single memory cell of the static memory is small, data having a large data amount cannot be written in the memory cell of the static memory.
For this reason, in the present application the data to be processed of the neural network model is divided into a plurality of pieces of sub data with a smaller data volume, and these smaller pieces of sub data are operated on in the neural network model, so that the sub data can be read and written through a plurality of storage units of the static memory during the operation, which improves the efficiency with which the neural network model processes data with a larger data volume.
In some embodiments, the segmentation is determined by the maximum memory occupied by the data to be processed across the operation layers of the neural network model: the data to be processed is segmented based on this maximum occupied memory and the storage capacity of a single storage unit of the static memory to obtain a plurality of pieces of sub data with a smaller data size, so that the sub data can be read and written through the storage units of the static memory when the operation layer corresponding to the maximum occupied memory performs its operation. Since data reads and writes can be handled by the static memory at the layer that occupies the most memory, they can likewise be handled by the static memory at every other operation layer, which occupies no more memory.
For example, if the maximum memory occupied by the data to be processed across the operation layers of the neural network model is 30 MB and the storage capacity of a single storage unit of the static memory is 5 MB, the data to be processed can be divided into 6 pieces of 5 MB sub data through the segmentation processing, and the 6 pieces of 5 MB sub data are used for the operations in each operation layer of the neural network model, so that data can be read and written through the storage units of the static memory during the operation.
In some embodiments, the memory occupied by the data to be processed during operation in each operation layer of the neural network model may be determined by the input data and the output data of the corresponding operation layer.
For example, the input data of the Eltwise layer 103 in fig. 1 are the feature map A and the feature map B, and the output data is the feature map C. Since the dimensions of the feature map A, the feature map B and the feature map C are the same, their data sizes are also the same; for example, with 10 MB feature maps, the operation of the Eltwise layer 103 occupies 30 MB of memory.
For another example, in fig. 1, the input data of the convolution layer 1021 is the feature map D, and the output data is the feature map B, so that the memory occupied by the convolution layer 1021 during operation can be determined by the feature map D and the feature map B.
In some embodiments, to ensure that the sizes of the input data and the output data of a convolutional layer are the same, the input data of the convolutional layer usually needs to be padded, i.e., the data size of the padded input data of the convolutional layer is larger than the data size of the output data. For example, the input data of the convolutional layer 1021 is the feature map D and the output data is the feature map B, and the data sizes of the feature map D and the feature map B are both 10 MB; the convolutional layer 1021 pads the feature map D when acquiring it, so the data size of the input data of the convolutional layer 1021 is usually greater than 10 MB. For example, if the data size of the padded feature map D is 12 MB, the memory occupied by the convolutional layer 1021 during operation is 22 MB. The data size of the input data of the convolutional layer 1021 depends on the padding size of the padding processing and is not specifically limited. Similarly, the memory occupied by the convolutional layer 1022 during operation is 22 MB.
In some embodiments, the feature map D may be padded by zero padding. For example, the output data of the input layer 101 is the feature map D, whose size is 3 × 3:

D = | d11 d12 d13 |
    | d21 d22 d23 |
    | d31 d32 d33 |
further, for example, if the convolution kernel of the convolutional layer 1021 is 3x3 and the step size is 1, the padding size in the padding process is 1, and the output data of the convolutional layer 1021 can also be 3x3. And filling the characteristic diagram D to obtain a characteristic diagram D0, wherein the characteristic diagram D0 is as follows:
Figure BDA0003959285320000052
it can be understood that, by filling the feature map in a zero-filling manner, the result of the weighted operation performed by the convolution kernel of the convolutional layer 1021 on the filled portion is 0, thereby avoiding the influence of the filling process on the operation result of the convolutional layer.
Some embodiments of the present application are described below with reference to the block diagram of the neural network model 10 shown in fig. 1.
Fig. 2 is a schematic flow chart diagram illustrating a data processing method of a neural network model according to some embodiments of the present application, where the main execution subject of the flow chart diagram is an electronic device. As shown in fig. 2, the process includes the following steps:
s21: and acquiring data to be processed.
In some embodiments, the data to be processed is the data that the neural network model is to process, such as the data input by the input layer 101 in the neural network model 10 shown in fig. 1, namely the feature maps A and D.
S22: and taking the maximum memory occupied by the data to be processed when the data to be processed is operated in each operation layer of the neural network model as a first memory.
In some embodiments, the neural network model includes a plurality of operation layers; for example, the neural network model 10 shown in fig. 1 includes an input layer 101, a convolutional layer 1021, an Eltwise layer 103, a convolutional layer 1022 and an output layer 104. Each operation layer occupies a certain amount of memory when processing its input data, and the memory occupied by an operation layer can be determined from its input data and output data. For example, as shown in fig. 1, the memory occupied by the convolutional layer 1021/1022 when the feature map B/E is obtained through operation is 22 MB, and the memory occupied by the operation of the Eltwise layer 103 is 30 MB; the 30 MB occupied by the Eltwise layer 103 is therefore taken as the first memory.
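As a concrete illustration of step S22, the following Python sketch estimates the memory each operation layer occupies as the sum of its input and output sizes and takes the maximum as the first memory; the layer-description format and helper names are assumptions made here for illustration, not part of the claimed method.

```python
# Minimal sketch, assuming each layer is described by the shapes of its
# input and output feature maps (the dict format is illustrative).
import numpy as np

def tensor_bytes(shape, dtype=np.float32):
    """Size in bytes of a feature map with the given shape."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize

def predict_first_memory(layers):
    """Estimate each layer's occupied memory as input size + output size
    (the third storage spaces in the text) and return the maximum,
    which is taken as the first memory."""
    per_layer = []
    for layer in layers:
        in_bytes = sum(tensor_bytes(s) for s in layer["input_shapes"])
        out_bytes = sum(tensor_bytes(s) for s in layer["output_shapes"])
        per_layer.append(in_bytes + out_bytes)
    return max(per_layer)

# Example loosely mirroring fig. 1: the Eltwise layer reads two maps and
# writes one of the same shape, so it dominates.
layers = [
    {"name": "conv1021",
     "input_shapes": [(1, 64, 112, 112)], "output_shapes": [(1, 64, 112, 112)]},
    {"name": "eltwise103",
     "input_shapes": [(1, 64, 112, 112), (1, 64, 112, 112)],
     "output_shapes": [(1, 64, 112, 112)]},
]
first_memory = predict_first_memory(layers)
```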
S23: acquiring a storage unit of the static memory for storing data; the storage capacity of the storage unit is taken as a second memory.
In some embodiments, the static memory may be an SRAM, and the storage unit is the smallest storage unit in the SRAM. In this application, the smallest storage unit in the SRAM is taken to have a storage capacity of 5 MB as an example to describe some embodiments.
S24: determining the number of splits based on the ratio of the first memory to the second memory.
In some embodiments, the first memory is used as the dividend and the second memory as the divisor, and the quotient of the first memory divided by the second memory is used as the number of splits.
It can be understood that, since the number of splits of the data to be processed can only be a positive integer, when the quotient of the first memory and the second memory is a fraction or includes a decimal part, the integer part of the quotient is taken and 1 is added to it to serve as the number of splits.
For example, as described with fig. 1, if the first memory is 30 MB and the second memory is 5 MB, the quotient is 6, and 6 is used as the number of splits. For another example, in some embodiments, if the quotient is 5.1 or 5.6, the integer part 5 of the quotient is taken, 1 is added to it, and 6 is used as the number of splits. In other embodiments, when the quotient of the first memory and the second memory is a fraction or includes a decimal part, the integer part of the quotient may also be increased by any positive integer greater than 1 to serve as the number of splits; this is not specifically limited.
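A minimal sketch of this rule, with the memory sizes expressed in MB as in the examples above (the function name is illustrative):

```python
def split_count(first_memory_mb: float, second_memory_mb: float) -> int:
    """Number of pieces the data to be processed is cut into."""
    n = first_memory_mb / second_memory_mb
    # An integer quotient is used directly; otherwise the integer part plus 1.
    return int(n) if n == int(n) else int(n) + 1

assert split_count(30, 5) == 6    # exact quotient
assert split_count(30.5, 5) == 7  # 6.1 -> integer part 6, plus 1
```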
S25: and segmenting the data to be processed based on the segmentation number to obtain a plurality of sub data to be processed.
In some embodiments, the data to be processed is divided evenly according to the number of splits, obtaining a corresponding number of pieces of sub data to be processed. The memory occupied by each piece of sub data during operation is smaller than or equal to the second memory: when the quotient in step S24 is a positive integer, it is equal to the second memory; when the quotient is a fraction or includes a decimal part, it is smaller than the second memory.
It can be understood that, because the memory occupied by each piece of sub data is smaller than or equal to the second memory, the sub data to be processed can be written into and read from the static memory. Since the latency of reading and writing data in the static memory is lower than that of the dynamic memory, reading and writing data from the static memory improves the processing efficiency of the neural network model.
For example, fig. 3 illustrates a schematic structural diagram of the neural network model 20, according to some embodiments of the present application. As shown in fig. 3, the neural network model 20 includes an input layer 201, a slicing layer 202, a convolutional layer 2031, an Eltwise layer 204, a convolutional layer 2032, a stitching layer 205, and an output layer 206.
With reference to fig. 1, fig. 3 uses 6 as the number of splits and segments the output data of the input layer 201 through the slicing layer 202, so that each piece of sub data to be processed obtained by segmentation can be written into the static memory, which increases the data read-write speed; meanwhile, the segmented data are spliced by the splicing layer 205, which ensures that the output data of the model are complete.
In some embodiments, before the slicing layer 202 segments the data to be processed, a label is preset for each element in the data to be processed. The position of each element in the data to be processed is determined through the label, so that the subsequent Eltwise layer 204 can identify its input data through the labels when performing the element-by-element operation, and the subsequent splicing layer 205 can determine each splicing position through the labels. The label may be determined in a row-column manner; for example, consider the feature map:

| 1 2 |
| 4 5 |
since the element 1 is located in the first row and the first column, the label of the element 1 is set to 11, and the rest elements 2, 4, and 5 are the same, specifically:
Figure BDA0003959285320000072
in other embodiments, the tag may also be determined in other ways, and is not limited in particular.
In some embodiments, taking labels determined in the row-column manner as an example, as shown in fig. 3, the slicing layer 202 may segment the feature map D by vertical slicing to obtain a plurality of pieces of sub data to be processed, specifically:

D1 = | d11 |   D2 = | d12 |   D3 = | d13 |
     | d21 |        | d22 |        | d23 |
     | d31 |        | d32 |        | d33 |
In other embodiments, the slicing layer 202 may instead segment the feature map D by transverse (row-wise) slicing to obtain a plurality of pieces of sub data to be processed, specifically:

D1 = | d11 d12 d13 |   D2 = | d21 d22 d23 |   D3 = | d31 d32 d33 |
it is understood that feature map D has a data size of 10 megabits, and feature maps D1, D2, and D3 have a data size of 10/3 megabits. The data size of the characteristic diagrams A1, A2 and A3 is 10/3 million.
It can be understood that the sizes and data sizes of the feature maps D1, D2 and D3 obtained by segmentation based on the number of splits are the same, and the manner in which the feature map D is segmented according to the number of splits is not specifically limited.
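A sketch of the slicing step, assuming the number of splits divides the map evenly; vertical slicing cuts along columns and transverse slicing along rows, as in the two examples above.

```python
import numpy as np

def split_feature_map(feature_map, parts, axis):
    """Split a feature map into `parts` equal sub-maps.
    axis=1 cuts columns (vertical slicing); axis=0 cuts rows (transverse slicing)."""
    return np.array_split(feature_map, parts, axis=axis)

D = np.array([["d11", "d12", "d13"],
              ["d21", "d22", "d23"],
              ["d31", "d32", "d33"]])

D1, D2, D3 = split_feature_map(D, 3, axis=1)  # vertical: one column each
R1, R2, R3 = split_feature_map(D, 3, axis=0)  # transverse: one row each
```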
S26: and performing operation on the neural network model based on the plurality of pieces of to-be-processed sub data, so that the to-be-processed sub data and an operation result of the to-be-processed sub data can be written into the storage unit.
In some embodiments, as shown in fig. 3, since the data to be processed input by the input layer 201, such as the feature map A and the feature map D, have not been split and cannot be stored in the small-capacity static memory, the input layer 201 needs to write the data to be processed into the dynamic memory when outputting them.
Further, after the slicing layer 202 reads the data to be processed from the dynamic memory and performs labelling and segmentation, multiple pieces of sub data to be processed are obtained. The resulting feature maps A1, A2 and A3 or feature maps D1, D2 and D3 satisfy the storage condition of the static memory: feature maps of 10/3 MB can be stored in a static memory whose minimum storage unit is 5 MB. That is, the sub data to be processed can be written into the static memory when the slicing layer 202 outputs them.
In some embodiments, the convolutional layer 2031 needs to pad its input data when acquiring it, so that the sizes of the input data and the output data of the convolutional layer 2031 are the same. Since the input feature map D of the convolutional layer 2031 has been segmented into the feature maps D1, D2 and D3, in order to ensure that the result of convolving D1, D2 and D3 is the same as the result of convolving the unsplit feature map D, the cut side of each slice may be filled with the adjacent slice when the feature maps D1, D2 and D3 are padded. For example, the feature maps D1, D2 and D3 obtained by vertically slicing the feature map D are as follows:

D1 = | d11 |   D2 = | d12 |   D3 = | d13 |
     | d21 |        | d22 |        | d23 |
     | d31 |        | d32 |        | d33 |
Further, if the cut sides are filled with the adjacent slices, the feature maps D10, D20 and D30 obtained by padding the feature maps D1, D2 and D3 are as follows:

D10 = | d11 d12 |   D20 = | d11 d12 d13 |   D30 = | d12 d13 |
      | d21 d22 |         | d21 d22 d23 |         | d22 d23 |
      | d31 d32 |         | d31 d32 d33 |         | d32 d33 |
it can be understood that, by filling the segmented side with the adjacent segmented feature maps, the operation result of the convolution layer 2031 after performing convolution processing on the segmented feature maps D1, D2, and D3 is the same as the operation result of the un-segmented feature map D, thereby ensuring the operation accuracy of the neural network model 20. The filling processing principle of the convolutional layer 2032 is the same as that of the convolutional layer 2031, and details thereof are not described herein.
It is understood that the padding is an operation performed when the convolutional layer acquires its input data, so that the output data of the previous operation layer keeps its original data size. For example, the output feature maps D1, D2 and D3 of the input layer 201 are 10/3 MB each; the convolutional layer 2031 pads the feature maps D1, D2 and D3 when acquiring the output data of the input layer 201, i.e., the data size of the input data of the convolutional layer 2031 is 12/3 MB, which is the data size of the feature maps D10, D20 and D30.
In some embodiments, the convolutional layer 2031 performs convolution processing on the feature maps D1, D2 and D3 to obtain the feature maps B1, B2 and B3. The Eltwise layer 204 reads the feature maps A1, A2 and A3 and the feature maps B1, B2 and B3, and determines from the labels of the input data which data are to be computed together; for example, data carrying the same label in the feature maps A1, A2, A3 and B1, B2, B3 are computed together. When the Eltwise layer 204 performs the sum operation, the output data of the Eltwise layer 204 are the feature maps C1, C2 and C3, each being the element-wise sum of the corresponding A and B slices (the matrices are shown as images in the original and omitted here).
further, the convolutional layer 2032 obtains output data of the Eltwise layer 204, such as feature maps C1, C2, and C3, and performs padding and convolution processing on the feature maps C1, C2, and C3 to obtain feature maps E1, E2, and E3. Finally, the characteristic diagrams E1, E2 and E3 are read through the splicing layer 205, and spliced according to the label marks to obtain a processing result with a large data volume and a complete data volume, and the processing result is written into the dynamic memory, and the processing result is output through the output layer 206, so that the operation processing of the data to be processed is completed.
It can be understood that, in fig. 3, after the output data of the input layer 201 is sliced by the slicing layer 202, a plurality of sliced data with a small data size are subjected to operation processing in the neural network model 20, so that the convolutional layer 2031, the Eltwise layer 204, and the convolutional layer 2032 can all read the input data from the static memory and write the output data into the static memory.
It can be understood that the amount of data processed in fig. 1 and fig. 3 is the same. Compared with fig. 1, in which the data to be processed are not split and every operation layer has to read its input data from the dynamic memory and write its output data into the dynamic memory, fig. 3 splits the data to be processed according to the storage capacity of the static memory and the maximum memory occupied by the data to be processed during the operation of the neural network model. As a result, the dynamic memory is used only when data enter and leave the neural network model, while the input and output data of the intermediate operation steps can be handled entirely by the static memory. Since the latency of reading and writing data in the static memory is lower than that of the dynamic memory, the scheme of fig. 3 improves the processing efficiency of the neural network model by reading and writing data from the static memory.
It is understood that the input layer 201 and the output layer 206 in fig. 3 can also be other operation layers in the neural network model, such as convolutional layers, Eltwise layers and the like.
Fig. 4 shows a block diagram of a data processing apparatus 200, according to some embodiments of the present application.
As shown in fig. 4, the data processing apparatus 200 includes a memory determination module 211, a segmentation module 212, and a model processing module 213.
The memory determining module 211 is configured to determine the memory occupied by the data to be processed during operation in each operation layer of the neural network model and to take the largest occupied memory as the first memory; it is also configured to determine the storage unit of the static memory used for storing data and to take the storage capacity of that storage unit as the second memory. For the specific functions of the memory determining module 211 and the way they are implemented, reference may be made to the foregoing steps S22 and S23.
The segmentation module 212 is configured to set a label for input data of the neural network model, determine a segmentation number based on a ratio of the first memory to the second memory, and segment the input data of the neural network model based on the segmentation number, for example, segment the feature map D, the feature map a, and the feature map B. The specific function of the segmentation module 212, and the method for implementing the specific function, refer to the foregoing steps S24 and S25.
The model processing module 213 is configured to obtain data to be processed, and perform an operation on the neural network model based on the multiple sub-data to be processed obtained by the segmentation module 212. The specific functions of the model processing module 213, and the method for implementing the specific functions, refer to the foregoing steps S21 and S26.
It is understood that the structure of the data processing apparatus 200 shown in fig. 4 is only an example, and in other embodiments, the data processing apparatus 200 may include more or less modules, and may combine or split some modules, which is not limited herein.
It can be understood that the data processing method of the neural network model provided in the embodiment of the present application may be applied to any electronic device capable of operating the neural network model, including but not limited to a mobile phone, a wearable device (such as a smart watch, etc.), a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a Personal Digital Assistant (PDA), an Augmented Reality (AR)/Virtual Reality (VR) device, and the like, and the embodiment of the present application is not limited. To facilitate understanding of the technical solution of the embodiment of the present application, an electronic device 100 is taken as an example to describe a structure of an electronic device to which the data processing method of the neural network model provided in the embodiment of the present application is applied.
Further, fig. 5 illustrates a schematic structural diagram of an electronic device 100, according to some embodiments of the present application. As shown in fig. 5, the electronic device 100 includes one or more processors 101, a system Memory 102, a Non-Volatile Memory (NVM) 103, a communication interface 104, an input/output (I/O) device 105, and system control logic 106 for coupling the processors 101, the system Memory 102, the NVM 103, the communication interface 104, and the input/output (I/O) device 105. Wherein:
the Processor 101 may include one or more Processing units, for example, processing modules or Processing circuits that may include a Central Processing Unit (CPU), an image Processing Unit (GPU), a Digital Signal Processor (DSP), a Micro-programmed Control Unit (MCU), an Artificial Intelligence (AI) Processor or a Programmable logic device (FPGA), a Neural Network Processor (NPU), and the like may include one or more single-core or multi-core processors. In some embodiments, the NPU may be configured to execute instructions corresponding to the data processing method of the neural network model provided in the embodiments of the present application.
The system memory 102 is a volatile memory, such as a Random-Access Memory (RAM) or a Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM). The system memory is used to temporarily store data and/or instructions; for example, in some embodiments the system memory 102 may be used to store the above-described feature map A, feature map D and the like.
Non-volatile memory 103 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory 103 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as a Hard Disk Drive (HDD), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Solid-State Drive (SSD) and the like. In some embodiments, the non-volatile memory 103 may also be a removable storage medium, such as a Secure Digital (SD) memory card. In other embodiments, the non-volatile memory 103 may be used to store the feature map A, the feature map D and the like described above.
In particular, system memory 102 and non-volatile memory 103 may each include: a temporary copy and a permanent copy of instruction 107. The instructions 107 may include: when executed by at least one of the processors 101, causes the electronic device 100 to implement the data processing method of the neural network model provided by the embodiments of the present application.
The communication interface 104 may include a transceiver to provide a wired or wireless communication interface for the electronic device 100 to communicate with any other suitable device over one or more networks. In some embodiments, the communication interface 104 may be integrated with other components of the electronic device 100, for example the communication interface 104 may be integrated in the processor 101. In some embodiments, the electronic device 100 may communicate with other devices through the communication interface 104, for example, the electronic device 100 may obtain, through the communication interface 104, a neural network model and pending data corresponding to the neural network model, such as the feature map a and the feature map D, from other electronic devices.
Input/output (I/O) devices 105 may include input devices such as a keyboard, mouse, etc., output devices such as a display, etc., and a user may interact with electronic device 100 through input/output (I/O) devices 105.
System control logic 106 may include any suitable interface controllers to provide any suitable interfaces with other modules of electronic device 100. For example, in some embodiments, system control logic 106 may include one or more memory controllers to provide an interface to system memory 102 and non-volatile memory 103.
In some embodiments, at least one of the processors 101 may be packaged together with logic for one or more controllers of the System control logic 106 to form a System In Package (SiP). In other embodiments, at least one of the processors 101 may also be integrated on the same Chip with logic for one or more controllers of the System control logic 106 to form a System-on-Chip (SoC).
It is understood that the configuration of electronic device 100 shown in fig. 5 is merely an example, and in other embodiments, electronic device 100 may include more or fewer components than shown, or some components may be combined, or some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this Application, a processing system includes any system having a Processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, Compact Disc Read-Only Memories (CD-ROMs), magneto-optical disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), Erasable Programmable Read-Only Memories (EPROMs), Electrically Erasable Programmable Read-Only Memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet in the form of electrical, optical, acoustical or other propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodological feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments may not be included or may be combined with other features.
It should be noted that, in each device embodiment of the present application, each unit/module is a logical unit/module, and physically, one logical unit/module may be one physical unit/module, or a part of one physical unit/module, and may also be implemented by a combination of multiple physical units/modules, where the physical implementation manner of the logical unit/module itself is not the most important, and the combination of the functions implemented by the logical unit/module is the key to solving the technical problem provided by the present application. Furthermore, in order to highlight the innovative part of the present application, the above-mentioned device embodiments of the present application do not introduce units/modules which are not so closely related to solve the technical problems presented in the present application, which does not indicate that no other units/modules exist in the above-mentioned device embodiments.
It is noted that, in the examples and description of the present patent, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the application.

Claims (10)

1. A data processing method of a neural network model is applied to electronic equipment and is characterized by comprising the following steps:
acquiring data to be processed;
predicting the size of a first storage space occupied by the data to be processed when the neural network model is operated;
segmenting the data to be processed based on the relation between the size of the first storage space and the size of a second storage space of a storage unit of a static memory in the electronic equipment to obtain a plurality of sub data to be processed;
and inputting the plurality of sub-data to be processed into the neural network model for operation, and respectively storing the obtained sub-calculation results corresponding to the plurality of sub-data to be processed into a plurality of storage units of the static memory.
2. The method of claim 1, wherein the neural network model comprises a plurality of operational layers;
the predicting the size of a first storage space to be occupied by the data to be processed when the neural network model is operated comprises:
respectively predicting the size of a plurality of third storage spaces occupied by the data to be processed when the data to be processed is operated in each operation layer;
taking the maximum value of the sizes of the plurality of third storage spaces as the size of the first storage space.
3. The method according to claim 2, wherein the separately predicting sizes of a plurality of third storage spaces that will be occupied by the data to be processed when performing operations in the respective operation layers comprises:
and respectively predicting the size of each third storage space based on the sum of the sizes of the storage spaces occupied by the input data and the output data of each operation layer during operation.
4. The method of claim 2, wherein the operational layer comprises at least one or more of: cutting layers, coiling layers, element-by-element operation layers and splicing layers.
5. The method according to claim 1, wherein the slicing the data to be processed based on the relationship between the size of the first storage space and the size of the second storage space of the storage unit of the static memory inside the electronic device comprises:
determining a slicing number of the slicing based on a ratio of a size of the first storage space to a size of the second storage space.
6. The method of claim 5, wherein the determining the number of splits of the split based on the ratio of the size of the first storage space to the size of the second storage space comprises:
the number of the cuts is N, and N is a positive number;
when N is an integer, segmenting the data to be processed based on N to obtain N pieces of sub data to be processed;
and when N includes a decimal part, the number of the cuts is M, where M is the integer part of N plus 1, and segmenting the data to be processed based on M to obtain M pieces of sub data to be processed.
7. The method of claim 1, wherein the segmenting the data to be processed to obtain a plurality of sub-data to be processed comprises:
the sub data to be processed comprises a first characteristic; the first characteristic is used for representing the position relation of the sub-data to be processed in the data to be processed.
8. The method of claim 1, wherein the size of the second storage space is determined by: and determining the size of the second storage space through a minimum storage unit of a static memory inside the electronic equipment.
9. A computer-readable storage medium having instructions stored thereon, which when executed on an electronic device, cause the electronic device to implement the method of any one of claims 1 to 8.
10. An electronic device, comprising:
a memory to store instructions for execution by one or more processors of an electronic device;
and a processor, which is one of the processors of the electronic device, for executing the instructions stored in the memory to implement the method of any one of claims 1 to 8.
CN202211475780.6A 2022-11-23 2022-11-23 Data processing method of neural network model, storage medium and electronic device Pending CN115759204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211475780.6A CN115759204A (en) 2022-11-23 2022-11-23 Data processing method of neural network model, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211475780.6A CN115759204A (en) 2022-11-23 2022-11-23 Data processing method of neural network model, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN115759204A true CN115759204A (en) 2023-03-07

Family

ID=85336150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211475780.6A Pending CN115759204A (en) 2022-11-23 2022-11-23 Data processing method of neural network model, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN115759204A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination