CN108205703A - Multi-input multi-output matrix average value pooling vectorization implementation method - Google Patents

Multi-input multi-output matrix average value pooling vectorization implementation method

Info

Publication number
CN108205703A
CN108205703A (application CN201711478728.5A)
Authority
CN
China
Prior art keywords
input
average value
pooling
input feature map
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711478728.5A
Other languages
Chinese (zh)
Other versions
CN108205703B (en)
Inventor
郭阳
张军阳
杨超
田希
扈啸
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201711478728.5A
Publication of CN108205703A
Application granted
Publication of CN108205703B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

A vectorization implementation method for average pooling of multi-input multi-output matrices includes the steps: S1: determine the number of input feature maps computed simultaneously by a single core of the vector processor, according to parameters such as the number M of vector processing elements (VPEs); S2: sort the input feature maps along the third dimension; S3: repeat until all input feature maps have been sorted; S4: transfer the sorted input feature maps via DMA to the AM inside the vector processor core; S5: vector-load the first data row and accumulate the following rows in turn, obtaining the average-pooling result of the pooling window at the corresponding position of each input feature map; S6: move to the next pooling window according to the horizontal stride; S7: repeat steps S5-S6; S8: repeat steps S6-S7 N/M times, finally completing the average-pooling operation over all N input feature maps. The method has the advantages of being simple to implement and convenient to operate, and of improving both the parallelism of the multi-core vector processor and the operation efficiency of the processor.

Description

Vectorization implementation method for average pooling of multi-input multi-output matrices
Technical field
The present invention relates generally to the fields of deep learning and convolutional neural networks, and in particular to a vectorization implementation method for average pooling of multi-input multi-output matrices.
Background technology
Convolutional neural networks are the most widely applied class of neural network models in current deep learning algorithms, and also the class with the best recognition performance. A convolutional neural network model generally includes matrix convolution, activation functions, max pooling or average pooling, local normalization operations, and the like.
The pooling layer follows the convolutional layer. After features have been obtained by the convolutional layer, one generally wants to use them for classification. In theory, all the extracted features could be used to train a classifier, but this faces an enormous computational challenge. Suppose there is an input image of 96 × 96 pixels and 400 features have been learned, each defined over an 8 × 8 input patch; convolving each feature with the input image yields a (96 - 8 + 1) × (96 - 8 + 1) = 7921-dimensional convolved feature. Since there are 400 features, each sample yields a convolved feature vector of 89 × 89 × 400 = 3,168,400 dimensions, and a classifier trained on features of this scale is prone to overfitting.
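The arithmetic above can be checked in a few lines of Python (an illustration added for the reader, not part of the original disclosure):

```python
# Verify the dimensionality figures quoted above.
image_size, patch_size, num_features = 96, 8, 400

conv_dim = (image_size - patch_size + 1) ** 2   # valid convolution output: 89 x 89
print(conv_dim)                                  # 7921 dimensions per feature
print(conv_dim * num_features)                   # 3168400 dimensions per sample
```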
Pooling is an important method of reducing the dimensionality of convolved feature vectors. By computing the average value (or maximum value) of a particular feature over a region of the image, one obtains summary statistics that not only have much lower dimensionality but also improve results by being less prone to overfitting.
In addition, pooling has translation invariance: after a small translation of the image, the same pooled features are still produced. This property has important application prospects in fields such as object detection, image recognition, and speech recognition. For example, when processing a digit from the MNIST dataset, no matter whether it is translated to the left or to the right, and regardless of its final position, the classifier is still expected to classify it correctly as the same digit.
In a convolutional neural network there are multiple input feature maps and multiple output feature maps, and correspondingly multiple input feature maps to be average-pooled and multiple output feature maps to be produced; how to maximally parallelize the computation of average pooling is therefore an important research topic.
As shown in Fig. 1, a vector processor is a novel architecture comprising a scalar processing unit (SPU) for scalar operations and a vector processing unit (VPU) for vector operations; with a rational division of tasks, the computational advantages of the vector processor can be fully exploited.
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems existing in the prior art, the present invention provides a vectorization implementation method for average pooling of multi-input multi-output matrices that is simple to implement and convenient to operate, and that can improve the parallelism of a multi-core vector processor and the operation efficiency of the processor.
To solve the above technical problems, the present invention adopts the following technical scheme:
A vectorization implementation method for average pooling of multi-input multi-output matrices, the steps of which are:
S1: according to the number M of vector processing elements (VPEs) in the vector processor, the number N of input feature maps, the feature-map size n × n, the average-pooling stride s, and the pooling window size k, determine the number of input feature maps computed simultaneously by a single core of the vector processor;
S2: sort M input feature maps along the third dimension;
S3: perform step S2 N/M times, until all N input feature maps have been sorted;
S4: transfer the input feature maps sorted in step S3 via DMA to the AM (array memory) inside the vector processor core;
S5: vector-load the first row in AM and accumulate it with each following data row in turn, k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average-pooling result of the k × k pooling window at the corresponding position of all M input feature maps;
S6: according to the horizontal stride s, move to the next pooling window and perform the same computation as in step S5, obtaining the average-pooling result of the second k × k pooling window of the M input feature maps;
S7: repeat steps S5-S6 until the average-pooling operation of all pooling windows of the M input feature maps is complete;
S8: repeat steps S6-S7 N/M times, finally completing the average-pooling operation of the N input feature maps.
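To make the data flow of steps S1-S8 concrete, the following NumPy sketch simulates the scheme in software: each (n, n, M) slice plays the role of the sorted AM layout, and every M-wide row stands in for one vector register spanning the M VPEs. The function name, the lane layout, and the explicit loops are illustrative assumptions, not the patented hardware kernel.

```python
import numpy as np

def avg_pool_vectorized(feats, M, k, s):
    """Average-pool N feature maps (k x k window, stride s), processing
    M maps at a time as SIMD lanes -- a software model of steps S1-S8."""
    N, n, _ = feats.shape
    assert N % M == 0, "S1 (and claim 2) assume N is an integer multiple of M"
    out_n = (n - k) // s + 1
    out = np.empty((N, out_n, out_n), dtype=np.float64)
    for b in range(N // M):                            # S8: N/M batches of M maps
        # S2-S4: group a batch along the third dimension; each element row of
        # `lanes` holds one value per VPE, mimicking the layout DMA'd into AM.
        lanes = np.stack(feats[b * M:(b + 1) * M], axis=-1)   # shape (n, n, M)
        for i in range(out_n):                         # S7: sweep windows top to bottom
            for j in range(out_n):                     # S6: move right by the stride s
                acc = np.zeros(M)
                for di in range(k):                    # S5: k*k vector loads/accumulates
                    for dj in range(k):
                        acc += lanes[i * s + di, j * s + dj]
                out[b * M:(b + 1) * M, i, j] = acc / k ** 2   # multiply by 1/k^2
    return out
```

Because every lane holds a different feature map, each window produces M independent results from one accumulation chain, which is exactly why no inter-PE shuffle or reduction is needed.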
As a further improvement of the present invention: in step S1, the number N of input feature maps is much greater than the number M of vector processing elements (VPEs), and N is an integer multiple of M.
As a further improvement of the present invention: in step S1, the input feature maps form a three-dimensional matrix, i.e., height, width, and count.
As a further improvement of the present invention: the three-dimensional matrix in step S1 is square, i.e., the height equals the width.
As a further improvement of the present invention: the stride of average pooling in step S1 is divided into a horizontal stride and a vertical stride.
As a further improvement of the present invention: in step S1, the horizontal stride is taken equal to the vertical stride.
As a further improvement of the present invention: the pooling window in step S1 is taken to be square, i.e., the pooling window is k × k.
As a further improvement of the present invention: in step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the surplus input feature maps are processed by a subset of the VPEs.
As a further improvement of the present invention: in step S6, the input feature map takes one pooling window of size k × k at a time; the window is moved first horizontally according to the stride and then vertically, the movement order being left to right, then top to bottom.
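A short helper (hypothetical, added here purely for illustration) makes the traversal order of the last improvement explicit:

```python
def window_origins(n, k, s):
    """Yield the top-left corner of every k x k pooling window over an
    n x n map: horizontal moves first, then a vertical step."""
    for top in range(0, n - k + 1, s):        # vertical move, top to bottom
        for left in range(0, n - k + 1, s):   # horizontal move, left to right
            yield top, left

print(list(window_origins(n=6, k=3, s=1))[:5])
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)] -- slide right, then drop down
```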
Compared with the prior art, the advantages of the present invention are:
1. The vectorization implementation method for average pooling of multi-input multi-output matrices of the present invention can fully exploit the multi-PE structural features of a vector processor, converting the computation of multi-input matrix average pooling, which is not easy to vectorize, into a process that is easy to compute in vectorized form.
2. The method avoids shuffle and reduction operations between different PEs of the vector processor, effectively improving the computational efficiency of average pooling over multiple input matrices. These advantages make the method simple to implement and convenient to operate, and allow it to fully exploit the instruction-, data-, and task-level parallelism of the vector processor, thereby giving full play to the high-performance computing capability of a multi-PE vector processor.
Description of the drawings
Fig. 1 is a schematic diagram of the general structure of a vector processor.
Fig. 2 is a schematic diagram of 3 × 3 average pooling.
Fig. 3 is a schematic diagram of average pooling from M input feature maps to M output feature maps.
Fig. 4 is a schematic diagram of average pooling with a 3 × 3 pooling window over 16 input feature maps in an example of the present invention.
Fig. 5 is a schematic flow diagram of the method of the present invention.
Specific embodiment
The vectorization implementation method for average pooling of multi-input multi-output matrices of the present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 5, the flow of the vectorization implementation method for average pooling of multi-input multi-output matrices of the present invention is:
S1: according to the number M of vector processing elements (VPEs) in the vector processor, the number N of input feature maps, the feature-map size n × n, the average-pooling stride s, and the pooling window size k, determine the number of input feature maps that a single core of the vector processor can compute simultaneously;
S2: sort M input feature maps along the third dimension;
S3: perform step S2 N/M times, until all N input feature maps have been sorted;
S4: transfer the input feature maps sorted in step S3 via DMA to the AM inside the vector processor core;
S5: vector-load the first row in AM and accumulate it with each following data row in turn, k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average-pooling result of the k × k pooling window at the corresponding position of the M input feature maps;
S6: according to the horizontal stride s, move to the next pooling window and perform the same computation as in step S5, obtaining the average-pooling result of the second k × k pooling window of the M input feature maps;
S7: repeat steps S5-S6 until the average-pooling operation of all pooling windows of the M input feature maps is complete.
S8: repeat steps S6-S7 N/M times, finally completing the average-pooling operation of the N input feature maps.
In a concrete application example, in the above step S1, the number N of input feature maps is typically much greater than the number M of vector processing elements (VPEs), and N is generally an integer multiple of M.
In a concrete application example, in the above step S1, the input feature maps form a three-dimensional matrix, i.e., height, width, and count; the maps are generally taken to be square, i.e., the height equals the width.
In a concrete application example, the stride of average pooling in the above step S1 is divided into a horizontal stride and a vertical stride, which are generally taken to be equal.
The pooling window in step S1 is generally taken to be square, i.e., the pooling window is k × k.
In a concrete application example, in the above step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the surplus input feature maps are processed by a subset of the VPEs.
Referring to Figs. 2-4, in a concrete application example, the number M of VPEs is 16, the number N of input feature maps is 16, the input feature map size is 6 × 6, the pooling-window stride s is 1, and the pooling window size k is 3. The 16 input feature maps are sorted along the third dimension, and the sorted result is transferred to the in-core memory space AM. The first row in AM is vector-loaded, and the following rows are accumulated in turn using multiply-accumulate instructions, 3 × 3 = 9 accumulations in total; the accumulated result is finally multiplied by 1/9, simultaneously completing the average-pooling operation of the corresponding pooling window of all 16 output feature maps. According to the horizontal stride, the pooling window is shifted right by 1, and the above step is repeated to obtain the average-pooling results of the second pooling window of the 16 output feature maps simultaneously, until the average-pooling computation of all pooling windows of the 16 input feature maps is complete.
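Under the same assumptions as the NumPy sketch given earlier (whose hypothetical avg_pool_vectorized it reuses), this concrete configuration can be checked against a direct per-map reference:

```python
import numpy as np

rng = np.random.default_rng(0)
M = N = 16                       # 16 VPEs, 16 input feature maps
n, k, s = 6, 3, 1                # 6 x 6 maps, 3 x 3 window, stride 1
feats = rng.random((N, n, n))

pooled = avg_pool_vectorized(feats, M, k, s)   # sketch defined earlier

# Reference: direct average pooling computed map by map.
out_n = (n - k) // s + 1                       # (6 - 3) / 1 + 1 = 4 windows per row
ref = np.array([[[f[i*s:i*s+k, j*s:j*s+k].mean() for j in range(out_n)]
                 for i in range(out_n)] for f in feats])

assert np.allclose(pooled, ref)   # 9 accumulations x 1/9 per window, as in the text
print(pooled.shape)               # (16, 4, 4)
```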
In conclusion, the present invention, based on the architectural features of the vector processor and on the number and scale of the input feature maps, determines an optimal implementation for multiple output feature maps and effectively improves the parallel operation of the vector processor. Different input feature maps are assigned to different PEs, with no inter-PE association operations required at all; in general, as many input feature maps can be computed simultaneously as there are PEs. These advantages make the method of the present invention simple to implement and convenient to operate, and allow it to fully exploit parallelism at all levels of the vector processor, thereby giving full play to the high-performance computing capability of a multi-PE vector processor.
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions under the concept of the present invention belong to the protection scope of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims (9)

1. A vectorization implementation method for average pooling of multi-input multi-output matrices, characterized in that the steps are:
S1: according to the number M of vector processing elements (VPEs) in the vector processor, the number N of input feature maps, the feature-map size n × n, the average-pooling stride s, and the pooling window size k, determine the number of input feature maps computed simultaneously by a single core of the vector processor;
S2: sort M input feature maps along the third dimension;
S3: perform step S2 N/M times, until all N input feature maps have been sorted;
S4: transfer the input feature maps sorted in step S3 via DMA to the AM inside the vector processor core;
S5: vector-load the first row in AM and accumulate it with each following data row in turn, k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average-pooling result of the k × k pooling window at the corresponding position of the M input feature maps;
S6: according to the horizontal stride s, move to the next pooling window and perform the same computation as in step S5, obtaining the average-pooling result of the second k × k pooling window of the M input feature maps;
S7: repeat steps S5-S6 until the average-pooling operation of all pooling windows of the M input feature maps is complete;
S8: repeat steps S6-S7 N/M times, finally completing the average-pooling operation of the N input feature maps.
2. The vectorization implementation method for average pooling of multi-input multi-output matrices according to claim 1, characterized in that, in step S1, the number N of input feature maps is much greater than the number M of vector processing elements (VPEs), and N is an integer multiple of M.
3. The vectorization implementation method for average pooling of multi-input multi-output matrices according to claim 1, characterized in that, in step S1, the input feature maps form a three-dimensional matrix, i.e., height, width, and count.
4. The vectorization implementation method for average pooling of multi-input multi-output matrices according to claim 3, characterized in that the three-dimensional matrix in step S1 is square, i.e., the height equals the width.
5. The vectorization implementation method for average pooling of multi-input multi-output matrices according to any one of claims 1-4, characterized in that the stride of average pooling in step S1 is divided into a horizontal stride and a vertical stride.
6. The vectorization implementation method for average pooling of multi-input multi-output matrices according to claim 5, characterized in that, in step S1, the horizontal stride is taken equal to the vertical stride.
7. The vectorization implementation method for average pooling of multi-input multi-output matrices according to any one of claims 1-4, characterized in that the pooling window in step S1 is taken to be square, i.e., the pooling window is k × k.
8. The vectorization implementation method for average pooling of multi-input multi-output matrices according to any one of claims 1-4, characterized in that, in step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the surplus input feature maps are processed by a subset of the VPEs.
9. The vectorization implementation method for average pooling of multi-input multi-output matrices according to any one of claims 1-4, characterized in that, in step S6, the input feature map takes one pooling window of size k × k at a time, and the window is moved first horizontally according to the stride and then vertically, the movement order being left to right, then top to bottom.
CN201711478728.5A 2017-12-29 2017-12-29 Multi-input multi-output matrix average value pooling vectorization implementation method Active CN108205703B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711478728.5A CN108205703B (en) 2017-12-29 2017-12-29 Multi-input multi-output matrix average value pooling vectorization implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711478728.5A CN108205703B (en) 2017-12-29 2017-12-29 Multi-input multi-output matrix average value pooling vectorization implementation method

Publications (2)

Publication Number Publication Date
CN108205703A 2018-06-26
CN108205703B CN108205703B (en) 2021-01-12

Family

ID=62606033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711478728.5A Active CN108205703B (en) 2017-12-29 2017-12-29 Multi-input multi-output matrix average value pooling vectorization implementation method

Country Status (1)

Country Link
CN (1) CN108205703B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130329987A1 (en) * 2012-06-11 2013-12-12 Genesis Group Inc. Video segmentation method
CN107239824A * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for implementing a sparse convolutional neural network accelerator
CN106991473A * 2017-03-30 2017-07-28 中国人民解放军国防科学技术大学 SIMD-based average pooling parallel processing method for vector processors
CN106991472A * 2017-03-30 2017-07-28 中国人民解放军国防科学技术大学 Vectorization implementation method fusing the ReLU activation function and max pooling
CN107301456A * 2017-05-26 2017-10-27 中国人民解放军国防科学技术大学 Multi-core acceleration method for deep neural networks based on a vector processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MATTHIAS JOACHIM EHRHARDT et al.: "Vector-valued image processing by parallel level sets", IEEE Transactions on Image Processing *
张兴革 (ZHANG Xingge): "Research on speech processing methods based on the convolutional neural network model", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002715A * 2018-07-05 2018-12-14 东北大学秦皇岛分校 Malware recognition method and system based on convolutional neural networks
CN109165733A (en) * 2018-07-11 2019-01-08 中国人民解放军国防科技大学 Multi-input multi-output matrix maximum pooling vectorization implementation method
CN109165734A (en) * 2018-07-11 2019-01-08 中国人民解放军国防科技大学 Matrix local response normalization vectorization implementation method
CN109165734B (en) * 2018-07-11 2021-04-02 中国人民解放军国防科技大学 Matrix local response normalization vectorization implementation method
CN110096309A (en) * 2018-11-14 2019-08-06 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN109886404A * 2019-02-01 2019-06-14 东南大学 Convolutional neural network pooling method with staggered diamond perception
CN109886404B (en) * 2019-02-01 2023-08-04 东南大学 Convolutional neural network pooling method for staggered diamond perception
CN110796236A (en) * 2019-10-21 2020-02-14 中国人民解放军国防科技大学 Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
CN110796236B (en) * 2019-10-21 2022-06-17 中国人民解放军国防科技大学 Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
CN112906829A (en) * 2021-04-13 2021-06-04 成都四方伟业软件股份有限公司 Digital recognition model construction method and device based on Mnist data set

Also Published As

Publication number Publication date
CN108205703B (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN108205703A (en) Multi-input multi-output matrix average value pooling vectorization implementation method
CN107358293B (en) Neural network training method and device
Ngiam et al. Tiled convolutional neural networks
CN101271572B Image segmentation method based on immune clonal selection clustering
CN106778745A License plate recognition method and device, and user equipment
Castello et al. Deep learning in the built environment: Automatic detection of rooftop solar panels using Convolutional Neural Networks
CN107464210A Image style transfer method based on generative adversarial networks
CN111882040B Convolutional neural network compression method based on channel number search
CN106250931A High-resolution image scene classification method based on random convolutional neural networks
CN106650744B Image object segmentation method guided by local shape transfer
CN109165733A Multi-input multi-output matrix maximum pooling vectorization implementation method
CN106959937B Vectorization implementation method of convolution matrices for GPDSP
CN107292234A Indoor scene layout estimation method based on information edges and multi-modal features
CN107292341A Adaptive multi-view clustering method based on pairwise co-regularization and NMF
CN106991472A Vectorization implementation method fusing the ReLU activation function and max pooling
CN106203444B Polarimetric SAR image classification method based on bandelets and convolutional neural networks
CN107885700A Multi-core implementation method for large-scale matrix convolution
CN109711401A Text detection method for natural scene images based on Faster R-CNN
CN109766949A Convolutional neural network lightweighting method and device, and electronic equipment
CN110222760A Fast image processing method based on the Winograd algorithm
CN111523713A Method and device for predicting residual oil saturation distribution in an oil field
CN106294288B Distributed non-negative matrix factorization method
CN108510058A Weight storage method in a neural network and processor based on the method
CN106228121A Gesture feature recognition method and device
CN110222598A Video behavior recognition method and device, storage medium, and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant