CN110110849B - Line fixed data stream mapping method based on graph segmentation - Google Patents


Info

Publication number
CN110110849B
Authority
CN
China
Prior art keywords
map
column
mapping
processing
width
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910353373.XA
Other languages
Chinese (zh)
Other versions
CN110110849A (en
Inventor
Zhang Bowen
Gu Huaxi
Wang Kun
Yang Yintang
Yao Xiyue
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910353373.XA priority Critical patent/CN110110849B/en
Publication of CN110110849A publication Critical patent/CN110110849A/en
Application granted granted Critical
Publication of CN110110849B publication Critical patent/CN110110849B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a row-fixed data stream mapping method based on graph segmentation, which mainly addresses the limited application scenarios and low processing-array utilization of existing row-fixed data stream mapping methods. The implementation steps are: 1. obtain the relevant parameters of the convolutional neural network convolutional layer and of the processing array; 2. generate a mapping map from the convolutional-layer parameters and determine the map parameters; 3. segment the map according to the map parameters and the processing-array parameters; 4. generate the corresponding data stream mapping from the segmentation result. The invention segments and maps the row-fixed-data-stream map according to the processing-array scale, so that a convolutional layer of any scale can be mapped onto a processing array of any scale while the high data reusability of the row-fixed data stream is preserved. The method offers high flexibility, strong applicability, high processing-unit utilization, and strong processing performance, and can be used by convolutional neural networks to accelerate data processing.

Description

Line fixed data stream mapping method based on graph segmentation
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a row-fixed data stream mapping method that can be used by convolutional neural networks to accelerate data processing.
Background
Neural networks (NN) are the basis of modern artificial-intelligence applications. Since neural networks achieved breakthrough results in speech recognition, image recognition, and similar tasks, the number of applications using them has increased sharply. They are now applied in fields as varied as autonomous driving, cancer detection, and complex games, and in many of these fields they have surpassed human accuracy while greatly improving execution efficiency. Their good performance stems from the ability to extract high-level features from raw data once an effective representation of the input space has been learned from large amounts of data by statistical learning.
As a further development of neural networks, the convolutional neural network (CNN) extracts data features automatically through multi-layer networks, multi-dimensional convolution operations, and the merging of different data paths. Compared with a traditional multilayer neural network, a CNN greatly simplifies data processing, which has made it one of the most important tools in deep learning. The original CNN comprises two structures: the convolutional layer (CONV) and the pooling layer (PL). Units in the convolutional layer act as feature detectors: units in different layers are connected by distinct weights, weights are shared among units, and an activation function provides the necessary nonlinearity. While the convolutional layer identifies features, the pooling layer merges many fine-grained features into the same feature class by sampling. These characteristics have made CNNs highly successful in machine learning. As CNNs developed, increasing the number of layers to obtain better network performance became an important research direction, leading to the classical deep convolutional neural network (DCNN) structures such as AlexNet, VGG, GoogLeNet, ResNet, and SENet.
However, existing deep convolutional neural networks suffer from high computational complexity, large volumes of computational data, heavy memory-access demands, and high parallelism requirements; in particular, processing the corresponding data generates a large number of memory accesses, which limits the efficiency of DCNN processing.
It is worth noting, however, that although deep convolutional neural networks have the above drawbacks, their processing contains a large amount of reusable data; convolutional-layer processing in particular reuses weight/convolution-kernel data, input image data, and partial-sum data extensively. The data processing of the convolutional layer can therefore be optimized according to the logical characteristics of the convolution operation and the dependency relationships among data groups. By fully exploiting the data-reuse opportunities in DCNN convolutional-layer processing and designing a dedicated data stream format and processing mechanism, the data-movement distance and memory-access demand can be reduced, accelerating DCNN processing and improving operational efficiency.
Based on this idea, Yu-Hsin Chen et al. proposed the row-fixed (row-stationary, RS) data stream in the paper "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks". The row-fixed data stream maps a complete row of convolution-kernel data and a complete row of input image data to each processing unit; the processing unit selects the data elements required by the convolution operation, performs the operation, and generates partial-sum data, so that the kernel data and input image data are reused across multiple convolution operations. The partial sums produced by the convolution are transmitted between processing units along shortest paths according to their interdependence and are accumulated in the corresponding processing units. Convolution-kernel reuse, input-image reuse, and minimal partial-sum movement are thus achieved during row-fixed convolutional-layer processing, making the row-fixed data stream a very efficient way to handle the convolutional layers of DCNNs. However, the existing row-fixed mapping method imposes a minimum processing-array scale on the CNN processing system: it requires the processing-array width to be no less than the convolution-kernel width. Conversely, when the processing-array width exceeds the kernel width, the existing mapping method cannot exploit all processing units, leaving some idle and lowering system efficiency. These problems limit the further application of row-fixed data streams in convolutional neural networks.
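By way of illustration, the per-processing-unit primitive of the row-fixed data stream described above can be sketched as a 1-D strided convolution of a resident kernel row with a resident image row. This is an editorial sketch, not code from the patent or from Eyeriss; the function name and shapes are assumptions.

```python
def pe_row_primitive(kernel_row, image_row, stride):
    """One PE's operation under a row-fixed data stream: the PE holds one
    convolution-kernel row and one input-image row, and produces one row of
    partial sums via a 1-D strided convolution. Both rows stay resident in
    the PE, so they are reused across all output positions."""
    s_f = len(kernel_row)
    n_out = (len(image_row) - s_f) // stride + 1
    return [
        sum(kernel_row[i] * image_row[j * stride + i] for i in range(s_f))
        for j in range(n_out)
    ]

# A 3-tap kernel row sliding over a 7-element image row with stride 1
# yields 5 partial sums.
psums = pe_row_primitive([1, 0, -1], [1, 2, 3, 4, 5, 6, 7], stride=1)
```

These per-row partial sums are then forwarded between processing units and accumulated, which is the reuse pattern the mapping method below preserves.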
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a row-fixed data stream mapping method based on graph segmentation, so as to map the processing requirements of a convolutional layer of any scale efficiently onto a processing array of any scale, accelerate deep convolutional neural network processing, and improve system efficiency.
The technical idea of the invention is as follows: considering the convolution-kernel size and input-image size of the convolutional layer, generate the corresponding mapping map according to the processing pattern of the row-fixed data stream; then, comprehensively considering data reusability during CNN processing and the utilization of the processing units in the array, segment the map according to the processing-array scale. This achieves high processing-unit utilization when the mapped data stream is processed by the array, while fully exploiting data reusability to reduce memory-access pressure and data-movement overhead. The concrete implementation steps are:
(1) Obtain the convolutional neural network convolutional-layer parameters and the processing-array scale S_PE;
(2) Generate a data stream mapping map from the convolutional-layer parameters, the map having size S_M:
S_M = L_M * W_M, where L_M is the length of the generated map and W_M is its width;
(3) Segment the map according to the map parameters and the processing-array parameters:
(3a) Set the column start point of the current map to C = 1 and compare the map size S_M with the processing-array size S_PE: if S_M ≥ S_PE, execute (3b); if S_M < S_PE, divide off 1 complete column at a time, repeat L_M times, and finish the map segmentation;
(3b) Partition the map:
(3b1) In each cut, divide off C_F complete columns together with a remaining column of width R_PE and length 1, where C_F = S_PE/W_M is the number of complete columns the processing array can process at one time, R_PE = S_PE mod W_M is the number of processing units remaining after the complete columns are processed, and mod is the remainder operator of division;
(3b2) Repeat (3b1) a total of Count = L_M/(C_F + 1) times to obtain the complete-column and incomplete-column segmentation results;
(3b3) Judge whether segmentation can finish:
if R_PE > 0, first compute the remaining map size S_MN = L_MN * W_MN, where L_MN = Count is the remaining map length and W_MN = W_M - R_PE is the remaining map width; then execute (3b4);
if R_PE = 0, end the map segmentation;
(3b4) Compare the remaining map size S_MN with the processing-array size S_PE:
if S_MN ≥ S_PE, set the column start point to C_N = C_F*Count + C_R + C (where C_R = L_M mod (C_F + 1) is the number of complete columns remaining in the map after the Count cuts) and return to (3b1) to continue segmenting the remaining map;
if S_MN < S_PE, end the map segmentation;
(4) Generate the map elements according to the map segmentation result of step (3); then, on the principle that each group of processing units can bear its mapping requirement, first partition the processing units in the processing array and then generate the mapping data stream from the map elements to the corresponding processing units.
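The per-pass quantities of step (3b) reduce to integer divisions and remainders. The following is an editorial sketch (not part of the original disclosure; the function name is an assumption):

```python
def split_parameters(L_M, W_M, S_PE):
    """Per-pass segmentation parameters of step (3b):
    C_F and Count use integer division, R_PE and C_R are remainders."""
    C_F = S_PE // W_M         # complete columns the array processes per cut
    R_PE = S_PE % W_M         # PEs left over, i.e. width of the incomplete column
    Count = L_M // (C_F + 1)  # number of times cut (3b1) is repeated
    C_R = L_M % (C_F + 1)     # complete columns remaining after the Count cuts
    return C_F, R_PE, Count, C_R

# AlexNet CONV-1 on a 4 x 4 array (L_M = 55, W_M = 11, S_PE = 16)
# gives C_F = 1, R_PE = 5, Count = 27, C_R = 1.
params = split_parameters(55, 11, 16)
```

The same function applied to the remaining map of each pass reproduces the parameters of the worked example in the detailed description.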
Compared with the prior art, the invention has the following advantages:
First, the invention adopts a graph-segmentation-based row-fixed data stream mapping method for CONV processing acceleration, so that a convolutional layer of any scale can be mapped onto a processing array of any scale; the method therefore has high flexibility and strong applicability.
Second, the invention segments and maps the row-fixed-data-stream map according to the processing-array scale, fully utilizing the processing-unit resources in the array while keeping the high data reusability of the row-fixed data stream; it therefore achieves high processing-unit utilization and more efficient processing.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a row-fixed data flow map of the first convolutional layer CONV-1 of the convolutional neural network AlexNet of the present invention;
FIG. 3 is a schematic diagram of AlexNet CONV-1 to 4 x 4 processing array data stream map segmentation in accordance with the present invention;
FIG. 4 is a schematic of a first segmentation of the AlexNet CONV-1 data flow map into a 4 x 4 processing array map according to the present invention;
FIG. 5 is a schematic of a second segmentation of the AlexNet CONV-1 data flow map into a 4 x 4 processing array map according to the present invention;
FIG. 6 is a schematic of a third segmentation of the AlexNet CONV-1 data flow map into a 4 x 4 processing array map in accordance with the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
Referring to fig. 1, the specific steps of this embodiment are as follows:
Step 1: obtain the relevant parameters of the convolutional neural network convolutional layer and of the processing array.
The convolutional-layer parameters comprise:
convolution kernel scale: S_F * S_F, where the kernel length and width are both S_F;
input image scale: S_I * S_I, where the input image length and width are both S_I;
convolution step size: L.
Taking the CONV-1 layer of the convolutional neural network AlexNet as an example, the convolutional-layer parameters are:
S_F = 11;
S_I = 227;
L = 4.
The processing-array parameters comprise:
processing-array length: L_PE;
processing-array width: W_PE;
processing-array scale: S_PE = L_PE * W_PE.
Taking a 4 × 4 grid processing array as an example, the parameters are:
L_PE = 4;
W_PE = 4;
S_PE = 16.
Step 2: generate the data stream mapping map from the convolutional-layer parameters and determine the map parameters.
From the convolution-kernel, input-image, and convolution-step-size parameters, the convolution operation yields a map of length L_M = (S_I - S_F)/L + 1 and width W_M = S_F.
The map size is S_M = L_M * W_M; the map contains S_M mapping elements, each comprising one row of convolution-kernel data and one row of input-image data.
Multiplying the kernel elements with the input-image elements and accumulating generates partial-sum data; that is, each mapping element in the map yields L_M partial sums.
Taking the CONV-1 layer of AlexNet as an example, L_M = (227 - 11)/4 + 1 = 55, W_M = 11, and S_M = 55 × 11 = 605. The map comprises 605 mapping elements, each generating 55 partial sums in total, as shown in FIG. 2.
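The step-2 arithmetic can be sketched directly (an editorial sketch; the function name is an assumption, the formulas are those of the description):

```python
def map_parameters(S_I, S_F, L):
    """Map dimensions from the convolutional-layer parameters (step 2):
    length L_M = (S_I - S_F)/L + 1, width W_M = S_F, size S_M = L_M * W_M."""
    L_M = (S_I - S_F) // L + 1
    W_M = S_F
    return L_M, W_M, L_M * W_M

# AlexNet CONV-1: S_I = 227, S_F = 11, L = 4 -> a 55 x 11 map of 605 elements.
L_M, W_M, S_M = map_parameters(227, 11, 4)
```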
Step 3: segment the map according to the map parameters and the processing-array parameters.
3a) Set the column start point of the current map to C = 1 and compare the map size S_M with the processing-array size S_PE: if S_M ≥ S_PE, the processing array cannot process the map at one time, so go to step 3b; if S_M < S_PE, divide off 1 complete column at a time, repeat L_M times, and finish the map segmentation.
3b) Partition the map:
3b1) Divide off two groups of column elements from the map simultaneously:
compute the number of complete map columns the processing array can process at one time, C_F = S_PE/W_M;
compute the number of processing units remaining after the complete columns are processed at one time, R_PE = S_PE mod W_M, where mod is the remainder operator of division;
according to the computed parameters, divide off C_F complete columns together with an incomplete column of width R_PE and length 1 simultaneously.
3b2) For a map of length L_M, repeat step 3b1) Count = L_M/(C_F + 1) times and compute the number of complete columns remaining in the map, C_R = L_M mod (C_F + 1), obtaining the complete-column and incomplete-column results:
the complete-column result is: column C to column C_F*Count + C_R + C - 1, of width W_M;
the incomplete-column result is: column L_M - Count + 1 to column L_M, of width R_PE.
3b3) Judge whether segmentation can finish:
if R_PE > 0, mapping elements remain unsegmented in the map; compute the remaining map length L_MN = Count, the remaining map width W_MN = W_M - R_PE, and the remaining size S_MN = L_MN * W_MN; then execute step 3b4);
if R_PE = 0, the map segmentation ends.
3b4) Compare the remaining map size S_MN with the processing-array size S_PE:
if S_MN ≥ S_PE, the processing array cannot process the remaining map at one time; set the map column start point to C_N = C_F*Count + C_R + C, the length to L_MN, the width to W_MN, and the scale to S_MN, and return to step 3b1) to continue segmenting the remaining map;
if S_MN < S_PE, the map segmentation ends.
Taking the mapping of the CONV-1 layer of the convolutional neural network AlexNet onto a 4 × 4 grid processing array as an example, the segmentation parameters and results are as follows.
First segmentation of the map:
map column start point C = 1; map length L_M = 55, width W_M = 11, scale S_M = 605; processing-array size S_PE = 16; each cut divides off 1 complete column and an incomplete column of width 5 and length 1 simultaneously;
after repeating the cut 27 times, the complete-column result is columns 1 to 28, of width 11, and the incomplete-column result is columns 29 to 55, of width 5;
this segmentation result is shown by the diagonal-line and cross-hatched rectangles in FIG. 3.
Second segmentation (of the remaining map):
map column start point C = 29; map length L_M = 27, width W_M = 6, scale S_M = 162; processing-array size S_PE = 16; each cut divides off 2 complete columns and an incomplete column of width 4 and length 1 simultaneously;
after repeating the cut 9 times, the complete-column result is columns 29 to 46, of width 6, and the incomplete-column result is columns 47 to 55, of width 4;
this segmentation result is shown by the horizontal- and vertical-line rectangles in FIG. 3.
Third segmentation (of the map remaining after the second segmentation):
map column start point C = 47; map length L_M = 9, width W_M = 2, scale S_M = 18; processing-array size S_PE = 16; each cut divides off 8 complete columns and an incomplete column of width 0 and length 1 simultaneously;
after dividing the remaining map once, the complete-column result is columns 47 to 55, of width 2, and the incomplete-column result is column 55, of width 0;
this segmentation result is shown by the dotted rectangle in FIG. 3.
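The full step-3 loop over the worked example can be sketched as follows. This is an editorial sketch (function name and return structure are assumptions); it applies the stated formulas per pass and, for the final pass with R_PE = 0, reports only the pass parameters.

```python
def segment_map(L_M, W_M, S_PE):
    """Sketch of the step-3 segmentation loop: each pass cuts C_F complete
    columns plus one incomplete column of width R_PE, Count times, then
    recurses on the remaining Count x (W_M - R_PE) map until R_PE == 0
    or the remainder fits the array in a single pass."""
    passes = []
    C = 1  # global start column of the current (remaining) map
    while L_M * W_M >= S_PE:
        C_F = S_PE // W_M
        R_PE = S_PE % W_M
        Count = L_M // (C_F + 1)
        C_R = L_M % (C_F + 1)
        passes.append({
            "start": C, "C_F": C_F, "R_PE": R_PE, "Count": Count, "C_R": C_R,
            # (first column, last column, column width), per the (3b2) formulas
            "complete": (C, C_F * Count + C_R + C - 1, W_M),
            "incomplete": (C + L_M - Count, C + L_M - 1, R_PE),
        })
        if R_PE == 0:
            break
        C = C_F * Count + C_R + C          # new start column C_N (3b4)
        L_M, W_M = Count, W_M - R_PE       # remaining map dimensions (3b3)
    return passes

# AlexNet CONV-1 onto a 4 x 4 array reproduces the three passes above.
passes = segment_map(55, 11, 16)
```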
Step 4: generate the data stream from the map to the processing units.
Generate the map elements according to the map segmentation result of step 3;
then partition the processing units in the processing array on the principle that each group of processing units can bear its mapping requirement, and generate the data stream from the map to the processing units.
Take the grid-like processing array with the CONV-1 layer of the convolutional neural network AlexNet mapped to 4 x 4 as an example:
When the map is segmented for the first time, 1 complete column of width 11 and 1 incomplete-column element of width 5 are divided off simultaneously; accordingly, the processing units are partitioned so that 11 processing units receive the mapping of the 1 complete column and the remaining 5 processing units receive the mapping of the 1 incomplete column, as shown in FIG. 4.
When the remaining map is segmented for the second time, 2 complete columns of width 6 and 1 incomplete-column element of width 4 are divided off simultaneously; accordingly, the processing units are partitioned so that 12 processing units receive the mappings of the 2 complete columns (6 per column) and the remaining 4 processing units receive the mapping of the 1 incomplete column, as shown in FIG. 5.
When the map remaining after the second segmentation is segmented for the third time, 8 complete columns of width 2 and 1 incomplete-column element of width 0 are divided off simultaneously; accordingly, the processing units are partitioned so that the 16 processing units receive the mappings of the 8 complete columns (2 per column), as shown in FIG. 6.
After the processing units are partitioned, the data in the map elements are sent to the corresponding processing units according to the correspondence between the segmented map elements and the processing units, generating the mapping data stream from the map to the corresponding processing units.
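The processing-unit partition of each pass can be sketched as follows (an editorial sketch; the function name and the flat PE indexing are assumptions):

```python
def partition_pes(W_M, C_F, R_PE):
    """PE grouping for one segmentation pass: C_F groups of W_M processing
    units each receive one complete column, and the remaining R_PE processing
    units (if any) receive the incomplete column."""
    groups = [list(range(g * W_M, (g + 1) * W_M)) for g in range(C_F)]
    if R_PE:
        groups.append(list(range(C_F * W_M, C_F * W_M + R_PE)))
    return groups

# First AlexNet CONV-1 pass on 16 PEs: one group of 11 and one of 5 (FIG. 4);
# second pass: two groups of 6 and one of 4 (FIG. 5);
# third pass: eight groups of 2 (FIG. 6).
first_pass = partition_pes(11, 1, 5)
```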
The above description is only a specific example of the present invention and does not limit it. It will be apparent to persons skilled in the relevant art that, given the content and principles of the invention, various modifications and variations in form and detail can be made without departing from the principles and structures described herein, but such modifications and variations remain within the scope of the invention as defined by the appended claims.

Claims (5)

1. A line fixed data stream mapping method based on graph segmentation is characterized by comprising the following steps:
(1) Obtain the convolutional neural network convolutional-layer parameters and the processing-array scale S_PE;
(2) Generate a map of size S_M from the convolutional-layer parameters:
S_M = L_M * W_M, where L_M is the length of the generated map and W_M is its width;
(3) Segment the map according to the map parameters and the processing-array parameters:
(3a) set the column start point of the current map to C = 1 and compare the map size S_M with the processing-array size S_PE: if S_M ≥ S_PE, execute (3b); if S_M < S_PE, divide off 1 complete column at a time, repeat L_M times, and finish the map segmentation;
(3b) partition the map:
(3b1) in each cut, divide off C_F complete columns together with a remaining column of width R_PE and length 1, where C_F = S_PE/W_M is the number of complete columns the processing array can process at one time, R_PE = S_PE mod W_M is the number of processing units remaining after the complete columns are processed, and mod is the remainder operator of division;
(3b2) repeat (3b1) a total of Count = L_M/(C_F + 1) times to obtain the complete-column and incomplete-column segmentation results;
(3b3) judge whether segmentation can finish:
if R_PE > 0, first compute the remaining map size S_MN = L_MN * W_MN, where L_MN = Count is the remaining map length and W_MN = W_M - R_PE is the remaining map width; then execute (3b4);
if R_PE = 0, end the map segmentation;
(3b4) compare the remaining map size S_MN with the processing-array size S_PE:
if S_MN ≥ S_PE, set the column start point to C_N = C_F*Count + C_R + C and return to (3b1) to continue segmenting the remaining map, where C_R = L_M mod (C_F + 1) is the number of complete columns remaining in the map after the Count cuts;
if S_MN < S_PE, end the map segmentation;
(4) Generate the map elements according to the map segmentation result of step (3); partition the processing units in the processing array on the principle that each group of processing units can bear its mapping requirement, and generate the data stream from the map to the processing units.
2. The method of claim 1, wherein the convolutional neural network convolutional-layer parameters in (1) comprise:
convolution kernel scale: S_F * S_F, with length and width both S_F;
input image scale: S_I * S_I, with length and width both S_I;
convolution step size: L.
3. The method of claim 1, wherein the processing-array scale S_PE in (1) is expressed as follows:
S_PE = L_PE * W_PE,
where L_PE is the processing-array length and W_PE is the processing-array width.
4. The method of claim 1, wherein the data stream map in (2) is generated from the convolutional-layer parameters as follows:
generated map length: L_M = (S_I - S_F)/L + 1, where S_I is the input-image length in the convolutional layer, S_F is the convolution-kernel length, and L is the convolution step size;
generated map width: W_M = S_F, where S_F is the convolution-kernel width.
5. The method of claim 1, wherein the complete-column and incomplete-column segmentation results in (3b2) are obtained as follows:
the complete-column result is: column C to column C_F*Count + C_R + C - 1, of width W_M, where C_R = L_M mod (C_F + 1) is the number of complete columns remaining in the map after the Count cuts;
the incomplete-column result is: column L_M - Count + 1 to column L_M, of width R_PE.
CN201910353373.XA 2019-04-29 2019-04-29 Line fixed data stream mapping method based on graph segmentation Active CN110110849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910353373.XA CN110110849B (en) 2019-04-29 2019-04-29 Line fixed data stream mapping method based on graph segmentation

Publications (2)

Publication Number Publication Date
CN110110849A CN110110849A (en) 2019-08-09
CN110110849B true CN110110849B (en) 2023-04-07

Family

ID=67487450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910353373.XA Active CN110110849B (en) 2019-04-29 2019-04-29 Line fixed data stream mapping method based on graph segmentation

Country Status (1)

Country Link
CN (1) CN110110849B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11861485B2 (en) * 2019-11-22 2024-01-02 Baidu Usa Llc Data format transform method to improve AI engine MAC utilization
CN114781634B (en) * 2022-06-21 2022-11-04 之江实验室 Automatic mapping method and device of neural network array based on memristor

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A kind of restructural neural network accelerated method and framework
WO2021069211A1 (en) * 2019-10-11 2021-04-15 Robert Bosch Gmbh Method of and apparatus for processing data of a deep neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008077100A2 (en) * 2006-12-19 2008-06-26 Kla-Tencor Corporation Systems and methods for creating inspection recipes
US10614354B2 (en) * 2015-10-07 2020-04-07 Altera Corporation Method and apparatus for implementing layers on a convolutional neural network accelerator

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Deep Neural Network Compression and Acceleration; Ji Rongrong et al.; Journal of Computer Research and Development; 2018-09-15 (No. 09); full text *

Also Published As

Publication number Publication date
CN110110849A (en) 2019-08-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant