CN106991473A - SIMD-based average pooling parallel processing method for vector processors - Google Patents

SIMD-based average pooling parallel processing method for vector processors Download PDF

Info

Publication number
CN106991473A
CN106991473A (application number CN201710202133.0A)
Authority
CN
China
Prior art keywords
pooling
average value
matrix
vector
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710202133.0A
Other languages
Chinese (zh)
Inventor
郭阳
张军阳
扈啸
王慧丽
胡敏慧
王子聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201710202133.0A priority Critical patent/CN106991473A/en
Publication of CN106991473A publication Critical patent/CN106991473A/en
Pending legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

A SIMD-based average pooling parallel processing method for vector processors, whose steps are: S1: set the pooling matrix and the pooling window; S2: according to the pooling window size k, take the first k rows of pooling matrix A and perform the corresponding accumulation operations to obtain the column sums of the first k rows; S3: configure the shuffle pattern and shuffle; S4: add the corresponding results obtained in step S3; S5: repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs; S6: use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums; S7: obtain the final result vector of p/k average pooling results; S8: move down to row k+1 of pooling matrix A and repeat steps S2 to S7 until all sub-images of pooling matrix A have been traversed, obtaining the average pooling result matrix. The present invention has the advantages of simple principle, convenient implementation, efficient computation, and shortened computation time.

Description

SIMD-based average pooling parallel processing method for vector processors
Technical field
The present invention relates generally to the technical field of convolutional neural networks, and in particular to a SIMD-based average pooling parallel processing method for vector processors.
Background technology
In the 1960s, while studying neurons for local sensitivity and direction selection in the cat visual cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, and subsequently proposed the convolutional neural network (Convolutional Neural Network, CNN). At present, convolutional neural networks have become one of the research hotspots in many academic fields, particularly in the field of pattern classification; because the network avoids complicated early-stage preprocessing of images and can take the original image as direct input, it has found increasingly wide application.
Generally, a convolutional neural network computation model used for recognition includes convolutional layers, pooling layers, fully connected layers, and a subsequent classifier. A convolutional layer extracts local features of the previous layer's image using convolution kernels of different scales; once a local feature has been extracted, its positional relationship to other features is also determined. Then, feature mapping is carried out by taking local averages (also called the pooling operation) to obtain the feature information after dimensionality reduction; this feature information is output to the next convolutional layer for further processing, until the last layer (the output layer) is reached and the final output result is obtained.
The pooling operations used in current mainstream convolutional neural network models are mainly average pooling (Average Pooling) and max pooling (Max Pooling). Average pooling and max pooling are two different pooling methods, but both essentially serve to reduce the dimensionality of large-size images and reduce the amount of computation, and therefore occupy an important position in convolutional neural networks. Average pooling takes the average value within a pixel region of a certain size, such as a small n × n region, while max pooling takes the maximum value within a pixel region of a certain size. Given the important role of the pooling operation in convolutional neural networks, accelerating the parallelism of the pooling operation is particularly significant in computationally intensive convolutional neural network models.
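To make the two pooling variants concrete, the following is a minimal scalar sketch (not the patent's vectorized method) of average pooling and max pooling over a single k × k window; the function names are illustrative only.

```python
# Scalar sketch: average pooling and max pooling over one k x k window,
# assuming non-overlapping windows as in the patent (sizeX == stride).

def avg_pool_window(window):
    """Average of all elements in a k x k window (list of rows)."""
    k = len(window)
    return sum(sum(row) for row in window) / (k * k)

def max_pool_window(window):
    """Maximum element in a k x k window."""
    return max(max(row) for row in window)

window = [[1, 3],
          [5, 7]]
print(avg_pool_window(window))  # 4.0
print(max_pool_window(window))  # 7
```

Both reduce a k × k region to a single value; only the reduction operator differs.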
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems existing in the prior art, the present invention provides a SIMD-based average pooling parallel processing method for vector processors that is simple in principle, convenient to implement, and computationally efficient, that can fully exploit the multi-level parallelism of the vector processor to bring its parallel computing capability into play, and that shortens the computation time.
In order to solve the above technical problems, the present invention adopts the following technical scheme:
A SIMD-based average pooling parallel processing method for vector processors, whose steps are:
S1: Let the pooling matrix that needs to undergo the pooling operation after the convolution operation be A, with size M × N; the size of the pooling window is k × k, with M > k and N > k; the number of vector processing elements is p; and M, N, and p are all integer multiples of k;
S2: According to the pooling window size k, take the first k rows of pooling matrix A and perform the corresponding accumulation operations, obtaining the column sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the corresponding results obtained in step S3;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums from step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of pooling matrix A and repeat steps S2 to S7 until all sub-images of pooling matrix A have been traversed, obtaining the k × k average pooling result matrix of the whole pooling matrix A.
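The steps above can be emulated in scalar form as follows. This is a reference sketch only: the inter-VPE shuffle/add reduction of steps S3 to S5 is replaced by plain Python sums, and the function name is an assumption.

```python
# Scalar reference sketch of steps S2-S8, assuming non-overlapping k x k
# windows (sizeX == stride) and M, N multiples of k. The real method maps
# the horizontal sums to shuffle/add steps across VPEs; here they are
# emulated with plain sums.

def average_pool(A, k):
    M, N = len(A), len(A[0])
    assert M % k == 0 and N % k == 0
    C = []
    for r in range(0, M, k):            # S8: move down k rows at a time
        # S2: column sums of the k rows starting at row r
        col_sums = [sum(A[r + i][j] for i in range(k)) for j in range(N)]
        # S3-S5: reduce each group of k column sums to one value
        row_sums = [sum(col_sums[c:c + k]) for c in range(0, N, k)]
        # S6-S7: multiply by the immediate 1/k**2
        C.append([s / (k * k) for s in row_sums])
    return C

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(average_pool(A, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```

Each inner step corresponds to one stage of the patent's pipeline; only the data movement differs on real hardware.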
As a further improvement of the present invention: the detailed process of step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups of elements in step S2 are put together, and the last k/2 elements of each group are put together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e., the pooling window size is 3 × 3 and the number of vector processing elements is 12.
As a further improvement of the present invention: the calculation formula for one average pooling result c0,0 in step S8 is c0,0 = (1/k²) · Σ(i=0..k-1) Σ(j=0..k-1) ai,j, where c0,0 is the first element in the average pooling result matrix, k is the size of the pooling window (in convolutional neural networks the pooling window is a square matrix), and ai,j is an element of the pooling matrix A that needs to undergo average pooling.
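As a hypothetical numeric check of this formula for a 3 × 3 window (the values are chosen purely for illustration):

```python
# Numeric check of c(0,0) = (1/k^2) * sum of a(i,j) over a k x k window.
k = 3
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
c00 = (1 / k**2) * sum(a[i][j] for i in range(k) for j in range(k))
print(c00)  # 5.0 (the sum 45 divided by 9)
```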
As a further improvement of the present invention: define the size of the pooling window as sizeX and the horizontal or vertical displacement between two adjacent pooling windows as stride; the pooling windows do not overlap in the average pooling operation, i.e., sizeX = stride.
As a further improvement of the present invention: the pooling matrix A and the pooling window of size k are both square matrices.
Compared with the prior art, the advantage of the present invention is: the SIMD-based average pooling parallel processing method for vector processors of the present invention is simple to implement, low in cost, and easy to operate. It uses the vector processor's shuffle network in place of reduction operations to complete the fast rearrangement of data between VPEs, and a single pass of the operation can compute p/k average pooling results simultaneously. It can therefore fully exploit the parallel computing performance of all processing elements of the vector processor, shorten the computation time of average pooling in convolutional neural networks, offer good reliability, and reduce hardware power consumption.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the present invention.
Fig. 2 is a schematic diagram of the general structure of a vector processor.
Fig. 3 is a schematic diagram of the average pooling process of the present invention in a concrete application example.
Fig. 4 is a schematic diagram of the calculation procedure in a specific embodiment of the present invention in which the pooling window k is 2.
Fig. 5 is a schematic diagram of the non-overlapping pooling windows in the average pooling operation of the present invention in a concrete application example.
Fig. 6 is a schematic diagram of the average pooling calculation flow of the present invention for a 3 × 3 pooling window in a concrete application example.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a SIMD-based average pooling parallel processing method for vector processors of the present invention, whose steps are:
S1: Let the pooling matrix that needs to undergo the pooling operation after the convolution operation be A, with size M × N; the size of the pooling window is k × k, with M > k and N > k; the number of vector processing elements is p; and M, N, and p are all integer multiples of k. In actual convolutional neural network models, the pooling matrix A and the pooling window of size k are generally square matrices;
S2: According to the pooling window size k, take the first k rows of pooling matrix A and perform the corresponding accumulation operations, obtaining the column sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the corresponding results obtained in step S3;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums from step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of pooling matrix A and repeat steps S2 to S7 until all sub-images of pooling matrix A have been traversed; the k × k average pooling result matrix of the whole pooling matrix A can thus be obtained.
The present invention is mainly applicable to vector processors; the general structure of a vector processor is shown in Fig. 2. In a concrete application example, the detailed process of the above step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups of elements in step S2 are put together, and the last k/2 elements of each group are put together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e., the pooling window size is 3 × 3 and the number of vector processing elements is 12; the calculation process is shown in Fig. 6.
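The odd-k case can be emulated as follows for k = 3 and p = 12; the shuffle network is replaced here by plain Python grouping, and the column-sum values are stand-ins.

```python
# Emulation of the odd-k case (k = 3, p = 12): the 12 column sums form
# p/k = 4 groups of 3; each group is reduced to one lane and the sum is
# scaled by the immediate 1/k**2 = 1/9.
p, k = 12, 3
col_sums = list(range(1, 13))  # stand-in column sums held in 12 lanes
groups = [col_sums[g:g + k] for g in range(0, p, k)]
result = [sum(g) / k**2 for g in groups]
print(result)  # p/k = 4 average pooling results
```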
The calculation formula for one average pooling result c0,0 in step S8 is c0,0 = (1/k²) · Σ(i=0..k-1) Σ(j=0..k-1) ai,j, where c0,0 is the first element in the average pooling result matrix, k is the size of the pooling window (in convolutional neural networks the pooling window is generally a square matrix), and ai,j is an element of the pooling matrix A that needs to undergo average pooling; the average pooling flow is shown schematically in Fig. 3.
As shown in Fig. 5, define the size of the pooling window as sizeX and the horizontal or vertical displacement between two adjacent pooling windows as stride; the pooling windows do not overlap in the average pooling operation, i.e., sizeX = stride.
As shown in Fig. 4, in a concrete application example of the present invention the pooling window k is 2, and the detailed steps are:
S101: Let the pooling matrix that needs to undergo the pooling operation after the convolution operation be A, with size 16 × 16; the pooling size is 2 × 2; and the number p of vector processing elements is 16;
S102: According to the pooling size k = 2, take the first 2 rows of pooling matrix A and perform the corresponding accumulation operations, obtaining the column sums of the first 2 rows;
S103: Configure the shuffle pattern so that the first element of each of the 8 groups of elements in step S102 is put together and the last element of each group is put together;
S104: Add the corresponding results obtained in step S103, until the values of each group of elements have been reduced into one VPE;
S105: Use the vector VMOVI instruction to assign the immediate 1/4 to a vector register, and multiply this register element-wise with the accumulated sums from step S104;
S106: Obtain the final result vector of 8 average pooling results;
S107: Move down to row 3 of pooling matrix A and repeat steps S102 to S106 until all sub-images of pooling matrix A have been traversed; the 2 × 2 average pooling result matrix of the whole pooling matrix A can thus be calculated.
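Steps S102 to S106 for the first two rows can be emulated lane-by-lane as follows; the row data are stand-ins, and the shuffle is modeled with list slicing rather than actual VPE hardware.

```python
# Emulation of steps S102-S106 for k = 2, p = 16: the 16-lane vector of
# column sums is split by a shuffle into even-indexed and odd-indexed
# lanes, the two halves are added, and the 8 sums are scaled by the
# immediate 1/4.
p, k = 16, 2
row0 = list(range(16))            # stand-in data for row 0 of A
row1 = list(range(16, 32))        # stand-in data for row 1 of A

col_sums = [a + b for a, b in zip(row0, row1)]     # S102: vertical add
left  = col_sums[0::2]                             # S103: shuffle pattern,
right = col_sums[1::2]                             #       pair up elements
sums  = [x + y for x, y in zip(left, right)]       # S104: reduce to 8 VPEs
result = [s * (1 / k**2) for s in sums]            # S105: multiply by 1/4
print(result)  # 8 average pooling results (S106)
```

One such pass yields all 8 window averages for a row band at once, which is the source of the method's parallel speedup.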
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical schemes that fall under the idea of the present invention belong to the protection scope of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A SIMD-based average pooling parallel processing method for vector processors, characterized in that its steps are:
S1: Let the pooling matrix that needs to undergo the pooling operation after the convolution operation be A, with size M × N; the size of the pooling window is k × k, with M > k and N > k; the number of vector processing elements is p; and M, N, and p are all integer multiples of k;
S2: According to the pooling window size k, take the first k rows of pooling matrix A and perform the corresponding accumulation operations, obtaining the column sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the corresponding results obtained in step S3;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums from step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of pooling matrix A and repeat steps S2 to S7 until all sub-images of pooling matrix A have been traversed, obtaining the k × k average pooling result matrix of the whole pooling matrix A.
2. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, characterized in that the detailed process of step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups of elements in step S2 are put together, and the last k/2 elements of each group are put together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e., the pooling window size is 3 × 3 and the number of vector processing elements is 12.
3. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, characterized in that the calculation formula for one average pooling result c0,0 in step S8 is c0,0 = (1/k²) · Σ(i=0..k-1) Σ(j=0..k-1) ai,j, where c0,0 is the first element in the average pooling result matrix, k is the size of the pooling window (in convolutional neural networks the pooling window is a square matrix), and ai,j is an element of the pooling matrix A that needs to undergo average pooling.
4. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, 2 or 3, characterized in that the size of the pooling window is defined as sizeX and the horizontal or vertical displacement between two adjacent pooling windows as stride, and the pooling windows do not overlap in the average pooling operation, i.e., sizeX = stride.
5. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, 2 or 3, characterized in that the pooling matrix A and the pooling window of size k are both square matrices.
CN201710202133.0A 2017-03-30 2017-03-30 SIMD-based average pooling parallel processing method for vector processors Pending CN106991473A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710202133.0A CN106991473A (en) 2017-03-30 2017-03-30 SIMD-based average pooling parallel processing method for vector processors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710202133.0A CN106991473A (en) 2017-03-30 2017-03-30 SIMD-based average pooling parallel processing method for vector processors

Publications (1)

Publication Number Publication Date
CN106991473A true CN106991473A (en) 2017-07-28

Family

ID=59413094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710202133.0A Pending CN106991473A (en) 2017-03-30 2017-03-30 SIMD-based average pooling parallel processing method for vector processors

Country Status (1)

Country Link
CN (1) CN106991473A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205702A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Parallel processing method for multi-input multi-output matrix convolution
CN108205703A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method
CN109726803A (en) * 2019-01-10 2019-05-07 广州小狗机器人技术有限公司 Pooling method, image processing method and device
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 A pooling processing method and system applied to convolutional neural networks
CN110096309A (en) * 2018-11-14 2019-08-06 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN110163333A (en) * 2018-01-10 2019-08-23 成都信息工程大学 The parallel optimization method of convolutional neural networks
CN110232665A (en) * 2019-06-13 2019-09-13 Oppo广东移动通信有限公司 Max pooling method, apparatus, computer device and storage medium
CN110399977A (en) * 2018-04-25 2019-11-01 华为技术有限公司 Pooling operation device
CN111213125A (en) * 2017-09-08 2020-05-29 甲骨文国际公司 Efficient direct convolution using SIMD instructions
CN112651489A (en) * 2020-12-22 2021-04-13 龙芯中科(合肥)技术有限公司 Operation processing method, operation processing device and storage medium

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111213125B (en) * 2017-09-08 2023-11-07 甲骨文国际公司 Efficient direct convolution using SIMD instructions
CN111213125A (en) * 2017-09-08 2020-05-29 甲骨文国际公司 Efficient direct convolution using SIMD instructions
US11537857B2 (en) 2017-11-01 2022-12-27 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 A pooling processing method and system applied to convolutional neural networks
US11734554B2 (en) 2017-11-01 2023-08-22 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
CN108205703A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method
CN108205702A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Parallel processing method for multi-input multi-output matrix convolution
CN108205702B (en) * 2017-12-29 2020-12-01 中国人民解放军国防科技大学 Parallel processing method for multi-input multi-output matrix convolution
CN108205703B (en) * 2017-12-29 2021-01-12 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method
CN110163333A (en) * 2018-01-10 2019-08-23 成都信息工程大学 The parallel optimization method of convolutional neural networks
CN110399977A (en) * 2018-04-25 2019-11-01 华为技术有限公司 Pooling operation device
CN110096309A (en) * 2018-11-14 2019-08-06 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN109726803B (en) * 2019-01-10 2021-06-29 广州小狗机器人技术有限公司 Pooling method, image processing method and device
CN109726803A (en) * 2019-01-10 2019-05-07 广州小狗机器人技术有限公司 Pooling method, image processing method and device
CN110232665B (en) * 2019-06-13 2021-08-20 Oppo广东移动通信有限公司 Maximum pooling method and device, computer equipment and storage medium
CN110232665A (en) * 2019-06-13 2019-09-13 Oppo广东移动通信有限公司 Max pooling method, apparatus, computer device and storage medium
CN112651489A (en) * 2020-12-22 2021-04-13 龙芯中科(合肥)技术有限公司 Operation processing method, operation processing device and storage medium

Similar Documents

Publication Publication Date Title
CN106991473A (en) SIMD-based average pooling parallel processing method for vector processors
US11775836B2 (en) Hand pose estimation
JP6771018B2 (en) Improved performance of 2D array processor
US20200285446A1 (en) Arithmetic device for neural network, chip, equipment and related method
US10846591B2 (en) Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks
CN108491359B (en) Submatrix operation device and method
CN107438860B (en) Architecture for high performance power efficient programmable image processing
CN106991472A (en) A vectorized implementation method fusing the ReLU activation function and max pooling
US20210019594A1 (en) Convolutional neural network accelerating device and method
AU2017338783A1 (en) Efficient data layouts for convolutional neural networks
JP2018185847A (en) Two dimensional shift array for image processor
CN107563951B (en) Statistical operations on a two-dimensional image processor
CN108268931A (en) The methods, devices and systems of data processing
CN110188869B (en) Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm
KR102610842B1 (en) Processing element and operating method thereof in neural network
KR20070039490A (en) A bit serial processing element for a simd array processor
US20210201120A1 (en) Inference apparatus, convolution operation execution method, and program
JP7171883B2 (en) efficient convolutional engine
CN114781629B (en) Hardware accelerator of convolutional neural network based on parallel multiplexing and parallel multiplexing method
EP4071619A1 (en) Address generation method, related device and storage medium
CN112784973A (en) Convolution operation circuit, device and method
CN109447239B (en) Embedded convolutional neural network acceleration method based on ARM
CN113254391B (en) Neural network accelerator convolution calculation and data loading parallel method and device
CN113313252B (en) Depth separable convolution implementation method based on pulse array
WO2020038462A1 (en) Tongue segmentation device and method employing deep learning, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170728