CN117273072A - Data processing method, device, electronic equipment and storage medium - Google Patents

Data processing method, device, electronic equipment and storage medium

Info

Publication number
CN117273072A
CN117273072A CN202210700915.8A
Authority
CN
China
Prior art keywords
data
preset
processed
input data
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210700915.8A
Other languages
Chinese (zh)
Inventor
王智慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202210700915.8A priority Critical patent/CN117273072A/en
Publication of CN117273072A publication Critical patent/CN117273072A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0871Allocation or management of cache space

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the application provides a data processing method, a device, electronic equipment and a storage medium, relating to the technical field of data processing. Input data of a current to-be-processed block and of its adjacent blocks in a current feature map are obtained, where the input data of the current to-be-processed block are the data input into a convolutional neural network when a convolution operation is performed on the current to-be-processed block, and the input data of an adjacent block are the data input into the convolutional neural network when a convolution operation is performed on that adjacent block. The input data of the current to-be-processed block are divided into data of multiple preset types according to how they overlap the input data of the adjacent blocks. The data of the multiple preset types are then stored in multiple preset buffers, with data of different preset types stored in different preset buffers. This effectively reduces repeated transfers of redundant data, improves computing efficiency, and saves system power consumption.

Description

Data processing method, device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a data processing method, a data processing device, electronic equipment and a storage medium.
Background
Convolutional neural networks are widely used in computer vision tasks such as image classification and object detection. Because of the power consumption and area limitations of mobile terminals, together with their high performance requirements, a dedicated convolutional neural network processor chip is generally used to run neural network algorithms on a mobile terminal.
During the operation of a convolutional neural network, the feature map is sometimes too large for a chip with limited computing and storage resources to process at once. At present, a processor chip divides the feature map into multiple blocks of the same size with overlapping regions and, in a certain order, processes one block per clock cycle, thereby performing the operation on all the blocks step by step.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, electronic equipment and a storage medium to address the above problems.
In a first aspect, an embodiment of the present application provides a data processing method. The method comprises the following steps: acquiring input data of a current block to be processed and an adjacent block in a current feature map, wherein the input data of the current block to be processed is data input into a convolutional neural network when the convolution operation is carried out on the current block to be processed, and the input data of the adjacent block is data input into the convolutional neural network when the convolution operation is carried out on the adjacent block; dividing the input data of the current block to be processed into a plurality of preset types of data according to the overlapping condition of the input data of the current block to be processed and the input data of the adjacent block; and respectively storing the data of the multiple preset types into multiple preset buffer areas, wherein the data of different preset types correspond to different preset buffer areas.
In a second aspect, embodiments of the present application provide a data processing apparatus. The device comprises: the data acquisition module is used for acquiring input data of a current block to be processed and adjacent blocks in a current feature map, wherein the input data of the current block to be processed is data input into a convolutional neural network when the convolution operation is carried out on the current block to be processed, and the input data of the adjacent blocks is data input into the convolutional neural network when the convolution operation is carried out on the adjacent blocks; the data dividing module is used for dividing the input data of the current block to be processed into a plurality of preset types of data according to the overlapping condition of the input data of the current block to be processed and the input data of the adjacent block; the data storage module is used for respectively storing the data of the multiple preset types into multiple preset buffer areas, wherein the data of different preset types correspond to different preset buffer areas.
In a third aspect, embodiments of the present application provide an electronic device. The electronic device includes a memory, one or more processors, and one or more application programs, where the one or more application programs are stored in the memory and configured to, when invoked by the one or more processors, perform the data processing method provided by the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium. The computer-readable storage medium stores program code that, when called by a processor, performs the data processing method provided by the embodiments of the present application.
The embodiment of the application provides a data processing method, a device, an electronic device and a storage medium. The input data of a block are divided into data of multiple preset types, and the data of each preset type are stored in a corresponding preset buffer, so that data of different preset types reside in different buffers and classified storage is achieved. When a convolution operation is subsequently performed on a block, its input data can be fetched directly from the different buffers; there is no need to load data spanning the whole memory bit width and then extract the overlapping data between adjacent blocks from the loaded data. This effectively reduces repeated transfers of redundant data, improves computing efficiency, and saves system power consumption.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of input data for a block of the Conv3x3_S1 operator provided by an exemplary embodiment of the present application;
fig. 2 is a schematic diagram of an application scenario of a data processing method provided in an embodiment of the present application;
FIG. 3 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a convolution operation with an operator Conv3x3_S1 according to an exemplary embodiment of the present application;
FIG. 5 is a flow chart of a data processing method according to another embodiment of the present application;
FIG. 6 is a schematic diagram of a feature map divided into a plurality of blocks provided in an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of a current pending block of a current feature map provided by an exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of a storage case of a set of data of a second preset type provided in an exemplary embodiment of the present application;
FIG. 9 is a block diagram of a data processing apparatus according to an embodiment of the present application;
fig. 10 is a block diagram of an electronic device according to an embodiment of the present application;
fig. 11 is a block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application.
Referring to fig. 1, fig. 1 is a schematic diagram of the input data of a block of the Conv3x3_S1 operator according to an exemplary embodiment of the present application. Owing to the nature of the convolution operation, data are reused between adjacent blocks. As shown in fig. 1, the convolution kernel window of the Conv3x3_S1 operator is 3×3, the stride is 1, and there is no padding. Assuming that the size of the current to-be-processed block corresponding to the Conv3x3_S1 operator is tile_ow×tile_oh, then to obtain an output block of size tile_ow×tile_oh, input data of size (tile_ow+2)×(tile_oh+2) must be fed into the convolutional neural network. Here tile_ow is the size of the current to-be-processed block and the output block in the first direction W, and tile_oh is their size in the second direction H perpendicular to the first direction.
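The required input extent follows directly from the kernel size and stride. A minimal sketch (the function name and the example tile size are illustrative, not from the patent):

```python
def conv_input_extent(tile_out: int, kernel: int, stride: int) -> int:
    # Input samples needed along one axis to produce tile_out outputs
    # with a kernel-wide window sliding by stride, without padding.
    return (tile_out - 1) * stride + kernel

# Conv3x3_S1: kernel 3, stride 1 -> each axis needs tile_out + 2 inputs,
# matching the (tile_ow+2) x (tile_oh+2) input size described above.
tile_ow, tile_oh = 8, 8
in_w = conv_input_extent(tile_ow, kernel=3, stride=1)  # tile_ow + 2
in_h = conv_input_extent(tile_oh, kernel=3, stride=1)  # tile_oh + 2
```

For other operators the same formula applies; for example a 3×3 kernel with stride 2 would need (2·tile_out+1) inputs per axis.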
As shown in fig. 1, in the first direction W, 2×(tile_oh+2) overlapping data exist between adjacent blocks 0 and 1, and 2×(tile_oh+2) overlapping data likewise exist between adjacent blocks 1 and 2. Similarly, in the second direction H, (tile_ow+2)×2 overlapping data exist between two vertically adjacent blocks (not shown in fig. 1).
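The width of the overlapping strip between adjacent blocks can also be derived from the kernel and stride; a sketch under that assumption (the helper name is illustrative):

```python
def tile_overlap(kernel: int, stride: int) -> int:
    # Input columns (or rows) shared by two adjacent tiles;
    # zero when the stride is at least as wide as the kernel window.
    return max(kernel - stride, 0)

# Conv3x3_S1: adjacent tiles share 2 input columns in the W direction,
# i.e. a 2 x (tile_oh + 2) overlapping region, as in the figure.
ov = tile_overlap(3, 1)
```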
Taking the first direction W as an example: when the convolution operation is performed on block 0, the overlapping data between block 0 and block 1 must be loaded as part of the input for computing block 0. After block 0 has been convolved, the overlapping data between block 0 and block 1 are released. When the convolution operation is then performed on block 1, the overlapping data between block 0 and block 1 must be reloaded as part of the input for computing block 1. To obtain those overlapping data, data spanning the whole memory bit width must be loaded, after which the overlapping data between block 0 and block 1 are extracted from the loaded data. Here, the memory bit width refers to the amount of data that the memory or video memory can transfer at one time. In other words, this way of acquiring the overlapping data of adjacent blocks repeatedly transfers redundant data, which increases system power consumption and reduces computing efficiency, thereby harming processor performance.
To improve on the above problems, the present application provides a data processing method, apparatus, electronic device, and storage medium. The input data of a block are divided into data of multiple preset types, and the data of each preset type are stored in a corresponding preset buffer, so that data of different preset types reside in different buffers and classified storage is achieved. When a convolution operation is subsequently performed on a block, its input data can be fetched directly from the different buffers, without loading data spanning the whole memory bit width and extracting the overlapping data between adjacent blocks from the loaded data. This effectively reduces repeated transfers of redundant data, improves computing efficiency, and saves system power consumption.
Referring to fig. 2, fig. 2 is a schematic diagram of an application scenario of the data processing method according to the embodiment of the present application. The data processing system 10 includes a memory 11 and a processor 12. The memory 11 is connected to the processor 12 to enable data interaction between the memory 11 and the processor 12. The data processing system 10 may be disposed in a terminal device, which may be an electronic device 400 shown in fig. 10, which will be mentioned below, or may be a personal computer, a tablet computer, a smart phone, etc., which is not specifically limited herein. In some embodiments, the processor 12 may perform type classification on the input data of the current block to be processed of the feature map during the convolution operation, to obtain different types of data. In some embodiments, the memory 11 may include a plurality of buffers 111, 112, 113, 114, and the plurality of buffers 111, 112, 113, 114 are respectively used to store the different types of data described above.
Referring to fig. 3, fig. 3 is a flow chart of a data processing method according to an embodiment of the present application. The data processing method may be applied to the processor 12 in the data processing system 10 shown in fig. 2 described above, or the data processing apparatus 300 shown in fig. 9 to be mentioned below, or the processor 420 in the electronic device 400 shown in fig. 10 to be mentioned below. The data processing method may include the following steps S110 to S130.
Step S110, obtaining input data of a current block to be processed and adjacent blocks in a current feature map, wherein the input data of the current block to be processed is data input into a convolutional neural network when the current block to be processed is subjected to convolutional operation, and the input data of the adjacent blocks is data input into the convolutional neural network when the adjacent blocks are subjected to convolutional operation.
The current feature map in the embodiment of the present application may refer to the feature map to be processed next, or the feature map currently being processed and not yet finished, during the convolution operation. As an example, referring to fig. 4, fig. 4 is a schematic diagram of a convolution operation with the operator Conv3x3_S1 according to an exemplary embodiment of the present application. Feature map 0 is processed by convolution layer 0 to output feature map 1, and feature map 1 is processed by convolution layer 2 to output feature map 2. For convolution layer 0, feature map 1 is the feature map currently to be processed, and feature map 0 is its input feature map. For convolution layer 2, feature map 2 is the feature map currently to be processed, and feature map 1 is its input feature map.
The input data of the current to-be-processed block in the embodiment of the present application may refer to the block corresponding to the current to-be-processed block in the input feature map of the current feature map. As shown in fig. 4, when the current to-be-processed block is block 0, the input data of block 0 are the block corresponding to block 0 in feature map 0 (the block indicated by the solid thick frame in feature map 0; the dashed arrow in the figure points to an enlarged view of this block). When the current to-be-processed block is block 1, the input data of block 1 are the block corresponding to block 1 in feature map 0 (the block indicated by the dashed thick frame in feature map 0; the dashed arrow in the figure points to an enlarged view containing this block).
In the embodiment of the application, an adjacent block is a block adjacent to the current to-be-processed block in the current feature map. Adjacent blocks include processed blocks and/or unprocessed blocks. As shown in fig. 4, when the current to-be-processed block is block 0, the block above block 0 (not shown), the block below it (not shown), and the block to its right (block 1) are adjacent blocks of block 0; the block above block 0 has been processed, while the blocks below and to the right of block 0 are both unprocessed.
The input data of an adjacent block in the embodiment of the present application may refer to the block corresponding to that adjacent block in the input feature map of the current feature map. As described above, when the adjacent block is block 1, the input data of block 1 are the block corresponding to block 1 in feature map 0.
In some embodiments, before performing convolution operation on the current to-be-processed block of the current feature map, input data of the current to-be-processed block and the adjacent block in the current feature map may be obtained from the input feature map of the current feature map. As an example, as shown in fig. 4, the current feature map is the feature map 1, the current block to be processed is the block 0, and the adjacent blocks are the blocks located on the upper side, the lower side and the right side of the block 0. Before performing the convolution operation on the block 0, the block corresponding to the block 0 and the blocks corresponding to the blocks located on the upper side, the lower side, and the right side of the block 0 may be acquired from the feature map 0.
Step S120, dividing the input data of the current block to be processed into a plurality of preset types of data according to the overlapping condition of the input data of the current block to be processed and the input data of the adjacent block.
The convolutional neural network performs the convolution operation layer by layer on the processor. As shown in fig. 4, when the convolution operation of convolution layer 0 is performed, the data of feature map 0 need to be fetched from memory, and feature map 1 is written back to memory after the convolution operation of convolution layer 0. To make the overlapping data easy to access when the convolution operation is performed on the blocks, the block data of feature map 0 may be divided into several types, and data of different types may be stored in different buffers. Since the input data of the current to-be-processed block and the input data of an adjacent block have both overlapping and non-overlapping portions, the input data of the to-be-processed block can be classified according to how they overlap the input data of the adjacent blocks.
The to-be-processed partition in the embodiment of the application has a first direction and a second direction which are perpendicular to each other. As an example, the first direction may be the W direction shown in fig. 1, and the second direction may be the H direction shown in fig. 1.
As shown in fig. 4, taking the current to-be-processed block as block 0 as an example, the input data of block 0 and the input data of the adjacent blocks of block 0 exhibit the following four overlap cases:
Overlap case A: the input data of block 0 do not overlap the input data of any adjacent block;
Overlap case B: in the first direction W, the input data of block 0 overlap the input data of an adjacent block;
Overlap case C: in the second direction H, the input data of block 0 overlap the input data of an adjacent block;
Overlap case D: in both the first direction W and the second direction H, the input data of block 0 overlap the input data of adjacent blocks.
Accordingly, a plurality of preset types may be set as the types corresponding to the above-described overlapping cases.
That is, the plurality of preset types in the embodiment of the present application may include the following four types:
a first preset type corresponding to overlap case A: data of the first preset type are the portion of the input data of the current to-be-processed block that does not overlap the input data of any adjacent block;
a second preset type corresponding to overlap case B: data of the second preset type are the portion of the input data of the current to-be-processed block that overlaps the input data of an adjacent block in the first direction W;
a third preset type corresponding to overlap case C: data of the third preset type are the portion of the input data of the current to-be-processed block that overlaps the input data of an adjacent block in the second direction H;
a fourth preset type corresponding to overlap case D: data of the fourth preset type are the portion of the input data of the current to-be-processed block that overlaps the input data of adjacent blocks in both the first direction W and the second direction H.
In some embodiments, the input data of the current block to be processed may be divided into the above four types of data according to the overlapping condition of the input data of the current block to be processed and the input data of the adjacent block.
It should be noted that, in practical applications, preset types may be added to or removed from the plurality of preset types according to the actual application scenario; the present application does not limit the specific number of preset types. For example, according to how the input data of the current to-be-processed block overlap the input data of the adjacent blocks, the input data of the current to-be-processed block may instead be divided into only two preset types of data: (1) the portion of the input data of the current to-be-processed block that does not overlap the input data of any adjacent block; and (2) the portion of the input data of the current to-be-processed block that overlaps the input data of an adjacent block.
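Under the four-type scheme, the classification of step S120 can be sketched with array slicing. Purely for illustration it is assumed here that the overlaps with the not-yet-processed neighbours lie in the last two columns (W direction) and last two rows (H direction) of the tile's input; the function and key names are hypothetical:

```python
import numpy as np

def split_tile_input(tile_in, ov_w=2, ov_h=2):
    # Divide one tile's input into the four preset types: the last ov_w
    # columns overlap the neighbour in the W direction and the last ov_h
    # rows overlap the neighbour in the H direction (assumed layout).
    h, w = tile_in.shape
    return {
        "type1_no_overlap":   tile_in[:h - ov_h, :w - ov_w],
        "type2_overlap_w":    tile_in[:h - ov_h, w - ov_w:],
        "type3_overlap_h":    tile_in[h - ov_h:, :w - ov_w],
        "type4_overlap_both": tile_in[h - ov_h:, w - ov_w:],
    }

# a 10 x 10 input tile, i.e. (tile_oh+2) x (tile_ow+2) with tile_ow = tile_oh = 8
parts = split_tile_input(np.arange(100).reshape(10, 10))
```

The four pieces cover the input exactly once, so writing each piece to its own buffer stores every element exactly one time.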
Step S130, respectively storing a plurality of preset types of data into a plurality of preset buffers, wherein the data of different preset types correspond to different preset buffers.
The number of preset buffers is the same as the number of preset types. For example, if the preset types include the four types described above, the preset buffers accordingly include four preset buffers. As an example, the plurality of preset buffers in the embodiments of the present application may include a first preset buffer, a second preset buffer, a third preset buffer, and a fourth preset buffer, which will be mentioned below: the first preset buffer corresponds to the first preset type, the second preset buffer to the second preset type, the third preset buffer to the third preset type, and the fourth preset buffer to the fourth preset type.
It should be noted that, in the embodiment of the present application, a preset buffer refers to a storage area temporarily allocated during the convolution operation. The storage space of a preset buffer can be calculated from the size of the feature map, the size of the block, and the size of the block's input data used by the convolution operation; the specific storage space of each preset buffer is described in detail in steps S220 to S250.
In some embodiments, multiple preset types of data may be stored into multiple preset buffers. For example, the above four types of data may be stored in corresponding preset buffers, respectively.
As an example, as shown in fig. 4, for convolution layer 0 and taking block 0 as the current to-be-processed block: the data of the first preset type in the input data of block 0, corresponding to overlap case A, may be stored in the first preset buffer; the data of the second preset type, corresponding to overlap case B, in the second preset buffer; the data of the third preset type, corresponding to overlap case C, in the third preset buffer; and the data of the fourth preset type, corresponding to overlap case D, in the fourth preset buffer. In this way, when the convolution operation is later performed on block 0 in convolution layer 0, the input data of block 0 are obtained from the four different preset buffers; there is no need to load data spanning the whole memory bit width and then extract the input data of block 0 from the loaded data.
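The payoff on the fetch side can be sketched as follows: the four stored pieces are stitched back together to recover the tile's full input, with no wide memory load. The dictionary standing in for the four preset buffers and its layout are hypothetical:

```python
import numpy as np

tile_in = np.arange(100).reshape(10, 10)  # one tile's full input, 10 x 10
ov = 2                                    # overlap width for Conv3x3_S1
buffers = {                               # the four preset buffers (sketch)
    "type1": tile_in[:-ov, :-ov],         # no overlap with any neighbour
    "type2": tile_in[:-ov, -ov:],         # overlap in the first direction W
    "type3": tile_in[-ov:, :-ov],         # overlap in the second direction H
    "type4": tile_in[-ov:, -ov:],         # overlap in both directions
}
# Fetch from the four buffers and reassemble the tile's input directly,
# instead of loading a full-memory-bit-width row and extracting from it.
rebuilt = np.vstack([np.hstack([buffers["type1"], buffers["type2"]]),
                     np.hstack([buffers["type3"], buffers["type4"]])])
```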
With the data processing method described above, the input data of a block are divided into data of multiple preset types, and the data of each preset type are stored in a corresponding preset buffer, so that data of different preset types reside in different buffers and classified storage is achieved. When the convolution operation is subsequently performed on a block, its input data can be fetched directly from the different buffers, without loading data spanning the whole memory bit width and extracting the overlapping data between adjacent blocks from the loaded data. This effectively reduces repeated transfers of redundant data, improves computing efficiency, and saves system power consumption.
Referring to fig. 5, fig. 5 is a flowchart of a data processing method according to another embodiment of the present application. The data processing method may be applied to the processor 12 in the data processing system 10 shown in fig. 2 described above, or the data processing apparatus 300 shown in fig. 9 to be mentioned below, or the processor 420 in the electronic device 400 shown in fig. 10 to be mentioned below. The data processing method may include the following steps S210 to S260.
Step S210, obtaining input data of a current block to be processed and adjacent blocks in a current feature map, wherein the input data of the current block to be processed is data input into a convolutional neural network when the current block to be processed is subjected to convolutional operation, and the input data of the adjacent blocks is data input into the convolutional neural network when the adjacent blocks are subjected to convolutional operation.
In the specific description of step S210, please refer to step S110, and the description of the embodiment of the present application is omitted here.
In some embodiments, when step S210 or step S110 above is performed, the size of the current feature map, the size of the current to-be-processed block, and the size of the input data of the current to-be-processed block may also be obtained, so that the storage spaces of the first preset buffer, the second preset buffer, the third preset buffer, and the fourth preset buffer can be calculated, respectively, from the size of the current feature map, the size of the current to-be-processed block, and the size of the input data of the current to-be-processed block.
The size of the current feature map in the embodiment of the present application may be expressed as w×h, where W is the size of the current feature map in the first direction W, and H is the size of the current feature map in the second direction H.
The size of the current block to be processed in the embodiment of the present application may be expressed as tile_ow, where tile_ow is the size of the current block to be processed in the first direction W, and tile_oh is the size of the current block to be processed in the second direction H.
In this embodiment of the present application, the size of the input data of the current to-be-processed block may be expressed as (tile_ow+X)×(tile_oh+Y), where X is the difference between the size of the input data of the current to-be-processed block and the size of the current to-be-processed block in the first direction W, Y is the corresponding difference in the second direction H, (tile_ow+X) is the size of the input data of the current to-be-processed block in the first direction W, and (tile_oh+Y) is its size in the second direction H.
As described above, the plurality of preset types in the embodiment of the present application include a first preset type, a second preset type, a third preset type, and a fourth preset type. Based on these four types, a complete feature map can be divided into the form shown in fig. 6, which is a schematic diagram of a feature map divided into a plurality of blocks provided in an exemplary embodiment of the present application. The direction indicated by the dashed arrow is the convolution operation order of the blocks in the feature map: the blocks are calculated along the first direction W first; after the blocks of one row have been calculated, the process switches to the next row along the second direction H and calculates the blocks of that row along the first direction W, until all blocks have been calculated.
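The traversal order just described can be sketched as a small routine; the function and parameter names below are illustrative, not taken from the application:

```python
def tile_schedule(W, H, tile_ow, tile_oh):
    """Enumerate block origins in the order described above: sweep along
    the first direction W within a row, then move one row of blocks down
    along the second direction H and sweep again."""
    order = []
    for y in range(0, H, tile_oh):        # rows of blocks: second direction H
        for x in range(0, W, tile_ow):    # within a row: first direction W
            order.append((x, y))
    return order

# A 4x4 feature map divided into 2x2 blocks yields a row-major sweep.
print(tile_schedule(4, 4, 2, 2))  # → [(0, 0), (2, 0), (0, 2), (2, 2)]
```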
Referring to fig. 7, fig. 7 is a schematic diagram of a current block to be processed in the current feature map according to an exemplary embodiment of the present application. Analyzing the convolution operation of a block not located at an edge of the feature map, as shown in fig. 7, yields the following:
1. When the convolution operation is performed on the block, its left adjacent block has already been calculated, and the input data of the left adjacent block is no longer needed to calculate the block, so the input data of the left adjacent block can be released.
2. When the convolution operation is performed on the block, its upper adjacent block has already been calculated, but the data where the input of the upper block overlaps the input of the block is still needed to calculate the block, so that overlapping data cannot be released yet.
3. When the convolution operation is performed on the block, none of the blocks to its right has been calculated. Therefore, the data where the input of the right adjacent block overlaps the input of the block cannot be released yet, nor can the data where the inputs of the blocks to the right overlap the inputs of their upper blocks.
4. When the convolution operation is performed on the block, none of the blocks below it has been calculated. Therefore, the data where the inputs of the lower blocks overlap the inputs of the block and of all blocks to its left cannot be released yet.
Based on the above analysis, the storage spaces of the first preset buffer, the second preset buffer, the third preset buffer, and the fourth preset buffer can be calculated: the storage space of the first preset buffer is Tile_OW×(Tile_OH−Y), the storage space of the second preset buffer is X×Tile_OH, the storage space of the third preset buffer is (W+Tile_OW)×Y, and the storage space of the fourth preset buffer is X×Y.
As an example, if the storage spaces of the preset buffers are calculated for the Conv_3x3_S1 operator (X = Y = 2), the storage space of the first preset buffer is Tile_OW×(Tile_OH−2), the storage space of the second preset buffer is 2×Tile_OH, the storage space of the third preset buffer is (W+Tile_OW)×2, and the storage space of the fourth preset buffer is 2×2.
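The four storage-space formulas can be checked with a short sketch; the function name and the concrete W/Tile sizes below are illustrative assumptions:

```python
def buffer_sizes(W, tile_ow, tile_oh, X, Y):
    """Storage space (in elements) of the four preset buffers, per the
    formulas above. X and Y are the per-direction differences between a
    block's input size and the block's own size."""
    return {
        "first":  tile_ow * (tile_oh - Y),   # non-overlapping data
        "second": X * tile_oh,               # overlap in the first direction W
        "third":  (W + tile_ow) * Y,         # overlap in the second direction H
        "fourth": X * Y,                     # overlap in both directions
    }

# Conv_3x3_S1 operator: X = Y = 2, with an assumed W = 16 and 4x4 blocks.
sizes = buffer_sizes(W=16, tile_ow=4, tile_oh=4, X=2, Y=2)
print(sizes)  # → {'first': 8, 'second': 8, 'third': 40, 'fourth': 4}
```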
Step S220, determining partial data which does not overlap with the input data of the adjacent blocks in the input data of the current block to be processed as data of a first preset type, and storing the data of the first preset type into a first preset buffer zone, wherein the first preset buffer zone corresponds to the first preset type.
Step S230, determining partial data overlapped with the input data of the adjacent blocks in the first direction in the input data of the current block to be processed as data of a second preset type, and storing the data of the second preset type into a second preset buffer zone, wherein the second preset buffer zone corresponds to the second preset type.
Step S240, determining the part of the input data of the current block to be processed, which is overlapped with the input data of the adjacent block in the second direction, as data of a third preset type, and storing the data of the third preset type into a third preset buffer zone, wherein the third preset buffer zone corresponds to the third preset type.
Step S250, determining partial data overlapping with the input data of the adjacent blocks in the first direction and the second direction in the input data of the current block to be processed as data of a fourth preset type, and storing the data of the fourth preset type into a fourth preset buffer area, where the fourth preset buffer area corresponds to the fourth preset type.
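Steps S220 to S250 amount to classifying every element of a block's input by whether it lies in the overlap strips. The sketch below takes the trailing X columns and Y rows as the overlap regions, which is an illustrative convention; note the per-block counts it produces differ from the buffer capacities above, which also account for data retained across blocks:

```python
def classify_input(tile_ow, tile_oh, X, Y):
    """Count how the (Tile_OW+X) x (Tile_OH+Y) input elements of one
    block split into the four preset types."""
    counts = {"first": 0, "second": 0, "third": 0, "fourth": 0}
    for row in range(tile_oh + Y):
        for col in range(tile_ow + X):
            in_w_overlap = col >= tile_ow   # overlap in the first direction W
            in_h_overlap = row >= tile_oh   # overlap in the second direction H
            if in_w_overlap and in_h_overlap:
                counts["fourth"] += 1       # step S250
            elif in_w_overlap:
                counts["second"] += 1       # step S230
            elif in_h_overlap:
                counts["third"] += 1        # step S240
            else:
                counts["first"] += 1        # step S220
    return counts

# 4x4 block with X = Y = 2: the four counts partition the 6x6 input.
print(classify_input(4, 4, 2, 2))  # → {'first': 16, 'second': 8, 'third': 8, 'fourth': 4}
```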
The specific description of step S220 to step S250 refer to the aforementioned step S130, and the embodiments of the present application are not repeated here.
There is no fixed execution sequence among steps S220 to S250. In some embodiments, steps S220 to S250 may be performed in parallel. In other embodiments, steps S220 to S250 may be performed serially in a preset sequence. The preset sequence may be determined from the historical execution time of each step, for example by arranging the steps according to their historical execution times, or may be set by a developer according to actual requirements, which is not limited here.
In some embodiments, if the convolution operation is a multidimensional convolution operation, the data of the second preset type of different input channels located at the same plane position can be combined according to the memory bit width to form one group of data of the second preset type, where the data amount of the group equals the data amount corresponding to the memory bit width, so as to reduce the waste of storage space. Similarly, in some embodiments, the data of the fourth preset type of different input channels located at the same plane position may be combined according to the memory bit width to form one group of data of the fourth preset type, where the data amount of the group equals the data amount corresponding to the memory bit width. Here, the same plane position refers to the same coordinate position in the first direction W and the second direction H. An input channel refers to one piece of two-dimensional information input to the convolutional neural network by the feature map; for example, an RGB image inputs three pieces of two-dimensional information, R, G, and B, i.e., the three input channels R, G, and B.
Referring to fig. 8, fig. 8 is a schematic diagram illustrating the storage of a group of data of the second preset type according to an exemplary embodiment of the present application. Taking the merging of the data of the second preset type of different input channels located at the same plane position as an example, assuming that the Conv_3x3_S1 operator is used for the convolution operation and a memory with a bit width of 128 bits is used, the number of input channels that can be merged is 128/(2×8) = 8. That is, with the Conv_3x3_S1 operator and a memory bit width of 128 bits, the data of the second preset type of 8 input channels located at the same plane position can be combined to form one group of data of the second preset type whose data amount equals that corresponding to the memory bit width.
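The 128/(2×8) = 8 arithmetic generalizes to a one-line calculation; the function name and the assumption of 8-bit elements are illustrative:

```python
def mergeable_channels(memory_bit_width, overlap_elems, bits_per_elem):
    """How many input channels' second-preset-type data (one strip of
    `overlap_elems` elements per channel) fit into one memory word."""
    return memory_bit_width // (overlap_elems * bits_per_elem)

# Conv_3x3_S1 (X = 2) with a 128-bit memory and 8-bit elements:
print(mergeable_channels(128, 2, 8))  # → 8, matching 128/(2×8) above
```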
Step S260: when the convolution operation is performed on the current block to be processed, the data of the plurality of preset types corresponding to the input data of the current block to be processed are respectively taken out of the plurality of preset buffers, and the taken-out data are used as the input data of the convolution operation.
In some embodiments, when the convolution operation is performed on the current block to be processed, the data of the plurality of preset types corresponding to its input data can be taken out of the plurality of preset buffers and used directly as the input data of the convolution operation. There is no need to load data corresponding to the whole memory bit width and then determine the block's input data from the loaded data, so repeated transfer of redundant data can be reduced, calculation efficiency is effectively improved, and system power consumption is saved.
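Step S260's per-buffer fetch can be modeled as stitching the four per-type element lists back into one input tile. The function, the row-major layout of each region, and the trailing-strip convention are all assumptions for illustration:

```python
def assemble_input(first, second, third, fourth, tile_ow, tile_oh, X, Y):
    """Stitch the four per-type element lists (row-major within each
    region) into one (Tile_OH+Y) x (Tile_OW+X) input tile."""
    tile = [[None] * (tile_ow + X) for _ in range(tile_oh + Y)]
    it1, it2, it3, it4 = iter(first), iter(second), iter(third), iter(fourth)
    for row in range(tile_oh + Y):
        for col in range(tile_ow + X):
            if row < tile_oh and col < tile_ow:
                tile[row][col] = next(it1)      # first preset type
            elif row < tile_oh:
                tile[row][col] = next(it2)      # overlap in the first direction W
            elif col < tile_ow:
                tile[row][col] = next(it3)      # overlap in the second direction H
            else:
                tile[row][col] = next(it4)      # overlap in both directions
    return tile

# A 2x2 block with X = Y = 1: 4 + 2 + 2 + 1 = 9 elements form a 3x3 input.
t = assemble_input([1, 2, 3, 4], [5, 6], [7, 8], [9], 2, 2, 1, 1)
print(t)  # → [[1, 2, 5], [3, 4, 6], [7, 8, 9]]
```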
According to the data processing method above, the input data of a block is divided into data of a plurality of preset types, and the data of the plurality of preset types are stored in a plurality of preset buffers respectively, with data of different preset types stored in different buffers for classified storage. When the convolution operation is performed on a subsequent block, its input data can be taken directly out of the different buffers, without loading data corresponding to the whole memory bit width and extracting the overlapping data between adjacent blocks from the loaded data, which effectively reduces repeated transfer of redundant data, improves calculation efficiency, and saves system power consumption. In addition, by calculating the storage space of each buffer, memory can be allocated to each buffer flexibly, reducing the waste of storage space. The waste of storage space can be further reduced by merging the data of the second preset type or the data of the fourth preset type.
Referring to fig. 9, fig. 9 is a block diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 300 may be applied to the data processing system 10 shown in fig. 2. The data processing apparatus 300 includes a data acquisition module 310, a data partitioning module 320, and a data storage module 330.
The data acquisition module 310 is configured to acquire input data of a current block to be processed and an adjacent block in the current feature map, where the input data of the current block to be processed is data input to a convolutional neural network when performing a convolutional operation on the current block to be processed, and the input data of the adjacent block is data input to the convolutional neural network when performing a convolutional operation on the adjacent block;
the data dividing module 320 is configured to divide the input data of the current block to be processed into a plurality of types of data according to the overlapping situation of the input data of the current block to be processed and the input data of the adjacent block;
the data storage module 330 is configured to store the plurality of preset types of data into a plurality of preset buffers, where different preset types of data correspond to different preset buffers.
In some embodiments, the block to be processed has a first direction and a second direction perpendicular to each other, and the plurality of preset types include a first preset type, a second preset type, a third preset type, and a fourth preset type. The data dividing module 320 includes a first data dividing sub-module, a second data dividing sub-module, a third data dividing sub-module, and a fourth data dividing sub-module.
And the first data dividing sub-module is used for determining partial data which does not have data overlap with the input data of the adjacent blocks in the input data of the current block to be processed as the data of the first preset type.
And the second data dividing sub-module is used for determining partial data overlapped with the input data of the adjacent blocks in the first direction in the input data of the current block to be processed as the data of the second preset type.
And the third data dividing sub-module is used for determining the part of the input data of the current to-be-processed block, which is overlapped with the input data of the adjacent block in the second direction, as the data of the third preset type.
And a fourth data dividing sub-module, configured to determine, as the fourth preset type of data, partial data overlapping the input data of the adjacent block in the first direction and the second direction, from the input data of the current block to be processed.
In some embodiments, the plurality of preset buffers includes a first preset buffer, a second preset buffer, a third preset buffer, and a fourth preset buffer. The data storage module 330 includes a first data storage sub-module, a second data storage sub-module, a third data storage sub-module, and a fourth data storage sub-module.
The first data storage sub-module is used for storing the data of the first preset type into the first preset buffer zone, and the first preset buffer zone corresponds to the first preset type.
And the second data storage sub-module is used for storing the data of the second preset type into the second preset buffer zone, and the second preset buffer zone corresponds to the second preset type.
And the third data storage sub-module is used for storing the data of the third preset type into the third preset buffer zone, and the third preset buffer zone corresponds to the third preset type.
And the fourth data storage sub-module is used for storing the data of the fourth preset type into the fourth preset buffer zone, and the fourth preset buffer zone corresponds to the fourth preset type.
In some embodiments, the size of the current feature map is W×H, where W is the size of the current feature map in the first direction and H is the size of the current feature map in the second direction. The size of the current block to be processed is Tile_OW×Tile_OH, where Tile_OW is the size of the current block to be processed in the first direction and Tile_OH is its size in the second direction. The size of the input data of the current block to be processed is (Tile_OW+X)×(Tile_OH+Y), where (Tile_OW+X) is the size of the input data of the current block to be processed in the first direction, (Tile_OH+Y) is its size in the second direction, X is the difference between the size of the input data and the size of the current block to be processed in the first direction, and Y is the corresponding difference in the second direction. The storage space of the first preset buffer is Tile_OW×(Tile_OH−Y). The storage space of the second preset buffer is X×Tile_OH. The storage space of the third preset buffer is (W+Tile_OW)×Y. The storage space of the fourth preset buffer is X×Y.
In some embodiments, the data acquisition module 310 may be further configured to obtain the size of the current feature map, the size of the current block to be processed, and the size of the input data of the current block to be processed. The data processing apparatus 300 further comprises a storage space calculation module (not shown in the figures). The storage space calculation module is configured to calculate the storage spaces of the first preset buffer, the second preset buffer, the third preset buffer, and the fourth preset buffer respectively according to the size of the current feature map, the size of the current block to be processed, and the size of the input data of the current block to be processed.
In some embodiments, the data processing apparatus 300 may further include a convolution operation module (not shown in the figure). The convolution operation module is used for respectively taking out a plurality of preset types of data corresponding to the input data of the current block to be processed from the plurality of preset buffer areas when carrying out convolution operation on the current block to be processed, taking the taken out data as the input data of the convolution operation, and carrying out the convolution operation on the current block to be processed based on the input data.
It will be apparent to those skilled in the art that the data processing apparatus 300 provided in the embodiments of the present application may implement the data processing method provided in the embodiments of the present application. The specific working process of the above device and module may refer to a corresponding process of the data processing method in the embodiment of the present application, which is not described herein again.
In the embodiments provided herein, the modules shown or discussed may be coupled, directly coupled, or communicatively connected to each other through some interfaces, devices, or modules, and the coupling may be electrical, mechanical, or in other forms.
In addition, each functional module in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module, which is not limited in the embodiments of the present application.
Referring to fig. 10, fig. 10 is a block diagram of an electronic device according to an embodiment of the present application. The electronic device 400 may include one or more of the following components: a memory 410, one or more processors 420, and one or more applications, wherein the one or more applications may be stored in the memory 410 and configured to, when invoked by the one or more processors 420, cause the one or more processors 420 to perform the data processing method provided by the embodiments of the present application. The memory 410 may be the same as the memory 11 shown in fig. 2, and the processor 420 may be the same as the processor 12 shown in fig. 2.
The processor 420 may include one or more processing cores. The processor 420 connects various parts of the entire electronic device 400 using various interfaces and lines, and performs the various functions of the electronic device 400 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 410 and invoking the data stored in the memory 410. Alternatively, the processor 420 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 420 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), and a modem. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 420 and may instead be implemented by a separate communication chip.
The Memory 410 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (ROM). Memory 410 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 410 may include a stored program area and a stored data area. The storage program area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing the various method embodiments described above, and the like. The storage data area may store data or the like created by the electronic device 400 in use.
Referring to fig. 11, fig. 11 is a block diagram illustrating a computer readable storage medium according to an embodiment of the present application. The computer readable storage medium 500 has stored therein a program code 510, the program code 510 being configured to, when called by a processor, cause the processor to perform the above-described data processing method provided by the embodiments of the present application.
The computer readable storage medium 500 may be an electronic memory such as a flash memory, an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), a hard disk, or a ROM. Optionally, the computer readable storage medium 500 comprises a non-transitory computer-readable storage medium (Non-Transitory Computer-Readable Storage Medium). The computer readable storage medium 500 has storage space for the program code 510 that performs any of the method steps described above. The program code 510 can be read from or written to one or more computer program products. The program code 510 may be compressed in a suitable form.
In summary, the embodiment of the present application provides a data processing method, apparatus, electronic device, and storage medium, by dividing input data of a partition into multiple preset types of data, respectively storing the multiple preset types of data into multiple preset buffers, and storing the different preset types of data into different buffers, so as to implement classified storage, so that the input data of the partition can be directly taken out from the different buffers when a convolution operation is performed on the subsequent partition, without loading data corresponding to the whole memory bit width, and extracting overlapping data between adjacent partitions from the loaded data, thereby effectively reducing repeated handling of redundant data, improving computing efficiency, and saving system power consumption.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method of data processing, comprising:
acquiring input data of a current block to be processed and an adjacent block in a current feature map, wherein the input data of the current block to be processed is data input into a convolutional neural network when the convolution operation is carried out on the current block to be processed, and the input data of the adjacent block is data input into the convolutional neural network when the convolution operation is carried out on the adjacent block;
dividing the input data of the current block to be processed into a plurality of preset types of data according to the overlapping condition of the input data of the current block to be processed and the input data of the adjacent block;
and respectively storing the data of the multiple preset types into multiple preset buffer areas, wherein the data of different preset types correspond to different preset buffer areas.
2. The method according to claim 1, wherein the block to be processed has a first direction and a second direction perpendicular to each other, the plurality of preset types includes a first preset type, a second preset type, a third preset type, and a fourth preset type, and dividing the input data of the current block to be processed into a plurality of preset types of data according to the overlapping situation of the input data of the current block to be processed and the input data of the adjacent block, including:
Determining partial data which does not have data overlap with the input data of the adjacent blocks in the input data of the current block to be processed as the data of the first preset type;
determining partial data overlapped with the input data of the adjacent blocks in the first direction in the input data of the current block to be processed as the data of the second preset type;
determining partial data overlapped with the input data of the adjacent blocks in the second direction in the input data of the current block to be processed as the data of the third preset type;
and determining partial data which are overlapped with the input data of the adjacent blocks in the first direction and the second direction in the input data of the current block to be processed as the data of the fourth preset type.
3. The method of claim 2, wherein the plurality of pre-set buffers comprises a first pre-set buffer, a second pre-set buffer, a third pre-set buffer, and a fourth pre-set buffer, the storing the plurality of pre-set types of data into the plurality of pre-set buffers, respectively, comprising:
Storing the data of the first preset type into a first preset buffer zone, wherein the first preset buffer zone corresponds to the first preset type;
storing the data of the second preset type into a second preset buffer zone, wherein the second preset buffer zone corresponds to the second preset type;
storing the data of the third preset type into a third preset buffer zone, wherein the third preset buffer zone corresponds to the third preset type;
and storing the data of the fourth preset type into a fourth preset buffer zone, wherein the fourth preset buffer zone corresponds to the fourth preset type.
4. The method according to claim 3, wherein the size of the current block to be processed is Tile_OW×Tile_OH;
the storage space of the first preset buffer is: Tile_OW×(Tile_OH−Y), wherein Y is the difference between the size of the input data of the current block to be processed and the size of the current block to be processed in the second direction.
5. The method of claim 4, wherein the storage space of the second preset buffer is: X×Tile_OH, wherein X is the difference between the size of the input data of the current block to be processed and the size of the current block to be processed in the first direction.
6. The method of claim 5, wherein the storage space of the third preset buffer is: (W+Tile_OW)×Y, wherein W is the size of the current feature map in the first direction.
7. The method of claim 6, wherein the storage space of the fourth preset buffer is: X×Y.
8. A data processing apparatus, comprising:
the data acquisition module is used for acquiring input data of a current block to be processed and adjacent blocks in a current feature map, wherein the input data of the current block to be processed is data input into a convolutional neural network when the convolution operation is carried out on the current block to be processed, and the input data of the adjacent blocks is data input into the convolutional neural network when the convolution operation is carried out on the adjacent blocks;
the data dividing module is used for dividing the input data of the current block to be processed into a plurality of preset types of data according to the overlapping condition of the input data of the current block to be processed and the input data of the adjacent block;
The data storage module is used for respectively storing the data of the multiple preset types into multiple preset buffer areas, wherein the data of different preset types correspond to different preset buffer areas.
9. An electronic device, comprising:
a memory;
one or more processors;
one or more applications, wherein the one or more applications are stored in the memory and configured to, when invoked by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a program code configured to, when called by a processor, cause the processor to perform the method of any of claims 1-7.
CN202210700915.8A 2022-06-20 2022-06-20 Data processing method, device, electronic equipment and storage medium Pending CN117273072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210700915.8A CN117273072A (en) 2022-06-20 2022-06-20 Data processing method, device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117273072A true CN117273072A (en) 2023-12-22

Family

ID=89199618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210700915.8A Pending CN117273072A (en) 2022-06-20 2022-06-20 Data processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117273072A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination