CN106991473A - SIMD-based average pooling parallel processing method for vector processors - Google Patents
SIMD-based average pooling parallel processing method for vector processors
- Publication number
- CN106991473A (application CN201710202133.0A)
- Authority
- CN
- China
- Prior art keywords
- pooling
- average value
- matrix
- vector
- window
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Abstract
A SIMD-based parallel processing method for average pooling, oriented to vector processors, with the following steps. S1: set the pooling matrix and the pooling window. S2: according to the pooling window size k, take the first k rows of the pooling matrix A and accumulate them element-wise, obtaining the corresponding sums of the first k rows. S3: configure the shuffle pattern and shuffle. S4: add the results obtained in step S3 element-wise. S5: repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs. S6: use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the corresponding accumulated sums. S7: obtain the final result vector of p/k average pooling results. S8: move down to row k+1 of the pooling matrix A and repeat steps S2 to S7 until all sub-images of the pooling matrix A have been traversed, obtaining the average pooling result matrix. The invention has the advantages of a simple principle, convenient implementation, efficient computation, and a shortened computation time.
Description
Technical field
The present invention relates generally to the technical field of convolutional neural networks, and in particular to a SIMD-based average pooling parallel processing method for vector processors.
Background art
In the 1960s, while studying the neurons for local sensitivity and direction selectivity in the cat's visual cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, and on this basis the convolutional neural network (CNN) was proposed. Currently, convolutional neural networks have become one of the research hotspots in numerous scientific fields, particularly in pattern classification; because such a network avoids complicated early-stage preprocessing of images and can take the original image as direct input, it has found increasingly wide application.
Usually, a convolutional neural network computation model used for recognition includes convolutional layers, pooling layers, fully connected layers and a subsequent classifier. A convolutional layer extracts the local features of the previous layer's image by using convolution kernels of different scales; once such a local feature has been extracted, its positional relationship to the other features is also determined. Then feature mapping is carried out by taking local averages (also called the pooling operation), yielding feature information of reduced dimension; this feature information is output to the next convolutional layer for further processing, until the last layer (the output layer) is reached and the final output result is obtained.
The pooling operations used in current mainstream convolutional neural network models are mainly average pooling and max pooling. Average pooling and max pooling are two different pooling methods, but both essentially serve to reduce the dimension of large-size images and the amount of computation, and they therefore occupy an important position in convolutional neural networks. Average pooling takes the average value within a pixel region of a certain size, such as an n × n neighborhood, while max pooling takes the maximum value within such a region. In view of the important role of the pooling operation in convolutional neural networks, accelerating the pooling operation in parallel is particularly significant for convolutional neural network models with a very large amount of computation.
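For reference, the two pooling methods described above can be sketched in plain Python (a scalar illustration only; the function names are ours, and the vectorized method of the invention is not modeled here):

```python
def pool(matrix, k, op):
    """Apply op over non-overlapping k x k windows of a 2-D list."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for r in range(0, rows, k):
        out_row = []
        for c in range(0, cols, k):
            # Collect the k*k entries of one pooling window.
            window = [matrix[r + i][c + j] for i in range(k) for j in range(k)]
            out_row.append(op(window))
        out.append(out_row)
    return out

def avg(window):
    return sum(window) / len(window)

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

avg_pooled = pool(A, 2, avg)   # [[3.5, 5.5], [11.5, 13.5]]
max_pooled = pool(A, 2, max)   # [[6, 8], [14, 16]]
```

Both methods reduce a 4 × 4 input to a 2 × 2 output; only the window operator differs.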
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems of the prior art, the present invention provides a SIMD-based average pooling parallel processing method for vector processors that is simple in principle, convenient to realize and efficient to compute, can fully exploit the multi-level parallelism of a vector processor to bring its parallel computing capability into play, and shortens the computation time.
In order to solve the above technical problems, the present invention adopts the following technical solution:
A SIMD-based average pooling parallel processing method for vector processors, whose steps are:
S1: Let the pooling matrix to be pooled after the convolution operation be A, of size M × N, and let the pooling window size be k × k, with M > k and N > k; the number of vector processing elements (VPEs) is p, and M, N and p are all integer multiples of k;
S2: According to the pooling window size k, take the first k rows of the pooling matrix A and accumulate them element-wise, obtaining the corresponding sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the results obtained in step S3 element-wise;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums of step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of the pooling matrix A and repeat steps S2 to S7 until all sub-images of the pooling matrix A have been traversed, obtaining the complete k × k average pooling result matrix of the pooling matrix A.
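The per-block computation of steps S2 to S7 can be simulated in scalar Python. This is a sketch of the data flow only: the VMOVI broadcast and the shuffle network are modeled with plain list operations, and the function name is illustrative, not a hardware instruction sequence.

```python
def simd_average_pool_row_block(A_rows, k):
    """Simulate steps S2-S7 on one k-row block of the pooling matrix.

    A_rows: k rows of length p (p = number of simulated VPEs, a multiple of k).
    Returns the p//k average pooling results for this block.
    """
    p = len(A_rows[0])
    # S2: accumulate the k rows element-wise (vector adds across rows).
    col_sums = [sum(row[j] for row in A_rows) for j in range(p)]
    # S3-S5: shuffle + add until each group of k column sums has been
    # reduced into a single lane; only the net effect is modeled here.
    group_sums = [sum(col_sums[g * k:(g + 1) * k]) for g in range(p // k)]
    # S6: VMOVI broadcasts the immediate 1/k**2; multiply element-wise.
    scale = 1.0 / (k * k)
    # S7: final vector of p//k average pooling results.
    return [s * scale for s in group_sums]

block = [[1, 2, 3, 4], [5, 6, 7, 8]]          # k = 2 rows, p = 4 lanes
print(simd_average_pool_row_block(block, 2))  # -> [3.5, 5.5]
```

Each pair of adjacent lanes holds the partial sums of one 2 × 2 window, so the block yields two pooling results at once, matching the p/k count of step S7.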
As a further improvement of the present invention, the detailed process of step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups obtained in step S2 are gathered together, and the last k/2 elements of each group are gathered together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e. the pooling window size is 3 × 3 and the number of vector processing elements is 12.
As a further improvement of the present invention: the calculation formula for an average pooling result c_{0,0} in step S8 is c_{0,0} = (1/k²) · Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} a_{i,j}, where c_{0,0} is the first element of the average pooling result matrix, k is the pooling window size (in convolutional neural networks the pooling window is square), and a_{i,j} are the elements of the pooling matrix A to be average-pooled.
As a further improvement of the present invention: the pooling window size is defined as sizeX, and the horizontal or vertical displacement between two adjacent pooling windows as stride; in the average pooling operation the pooling windows do not overlap, i.e. sizeX = stride.
As a further improvement of the present invention: the pooling matrix A and the pooling window (of size k) are both square.
Compared with the prior art, the advantages of the invention are: the SIMD-based average pooling parallel processing method for vector processors of the present invention is simple to realize, low-cost and easy to operate. It uses the crossbar shuffle network of the vector processor, in place of reduction operations, to complete the fast reduction of data across VPEs, and a single pass can compute p/k average pooling results simultaneously. It can thus give full play to the parallel computing capability of all processing elements of the vector processor, shorten the computation time of average pooling in convolutional neural networks, and, with good reliability, reduce hardware power consumption.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of the general structure of a vector processor.
Fig. 3 is a schematic diagram of the average pooling process in a concrete application example of the present invention.
Fig. 4 is a schematic diagram of the calculation procedure in a specific embodiment with pooling window k = 2.
Fig. 5 is a schematic diagram of the non-overlapping pooling windows in the average pooling operation.
Fig. 6 is a schematic diagram of the average pooling calculation flow for a 3 × 3 pooling window.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the steps of the SIMD-based average pooling parallel processing method for vector processors of the present invention are:
S1: Let the pooling matrix to be pooled after the convolution operation be A, of size M × N, and let the pooling window size be k × k, with M > k and N > k; the number of vector processing elements (VPEs) is p, and M, N and p are all integer multiples of k. In actual convolutional neural network models, the pooling matrix A and the pooling window are generally square;
S2: According to the pooling window size k, take the first k rows of the pooling matrix A and accumulate them element-wise, obtaining the corresponding sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the results obtained in step S3 element-wise;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums of step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of the pooling matrix A and repeat steps S2 to S7 until all sub-images of the pooling matrix A have been traversed, from which the complete k × k average pooling result matrix of the pooling matrix A can be obtained.
The present invention is mainly intended for vector processors; Fig. 2 shows the general structure of a vector processor.
In a concrete application example, the detailed process of the above step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups obtained in step S2 are gathered together, and the last k/2 elements of each group are gathered together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e. the pooling window size is 3 × 3 and the number of vector processing elements is 12; the calculation process is shown in Fig. 6.
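The even-k case of step S301 can be modeled in scalar Python. This is only a sketch of the net effect: each round, the shuffle gathers the first halves of all groups into one operand and the second halves into the other, and a vector add halves the group size. The real crossbar shuffle patterns are hardware-specific, the odd-k pattern of Fig. 6 is not described in the text, and the sketch assumes k is a power of two so that repeated halving terminates with one lane per group.

```python
def shuffle_reduce_even(vec, k):
    """Model of S301/S3-S5 for even (power-of-two) k: repeatedly split
    each group of lanes into first and second halves via a shuffle,
    then add the halves element-wise, until one sum per group remains."""
    group = k
    while group > 1:
        half = group // 2
        n_groups = len(vec) // group
        firsts = [vec[g * group + j] for g in range(n_groups) for j in range(half)]
        seconds = [vec[g * group + half + j] for g in range(n_groups) for j in range(half)]
        vec = [a + b for a, b in zip(firsts, seconds)]  # vector add
        group = half
    return vec

# Column sums from step S2 with p = 4 lanes and k = 2: each adjacent
# pair of lanes belongs to one pooling window.
print(shuffle_reduce_even([6, 8, 10, 12], 2))  # -> [14, 22]
```

For k = 4 two shuffle+add rounds are needed; for k = 2, as in the Fig. 4 example, a single round reduces each pair of lanes to one window sum.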
The calculation formula for an average pooling result c_{0,0} in step S8 is c_{0,0} = (1/k²) · Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} a_{i,j}, where c_{0,0} is the first element of the average pooling result matrix, k is the pooling window size (in convolutional neural networks the pooling window is generally square), and a_{i,j} are the elements of the pooling matrix A to be average-pooled; the average pooling flow is shown schematically in Fig. 3.
As shown in Fig. 5, the pooling window size is defined as sizeX, and the horizontal or vertical displacement between two adjacent pooling windows as stride; in the average pooling operation the pooling windows do not overlap, i.e. sizeX = stride.
As shown in Fig. 4, in a concrete application example of the present invention with pooling window k = 2, the detailed steps are:
S101: Let the pooling matrix to be pooled after the convolution operation be A, of size 16 × 16; the pooling window size is 2 × 2, and the number p of vector processing elements is 16;
S102: According to the pooling window size k = 2, take the first 2 rows of the pooling matrix A and accumulate them element-wise, obtaining the corresponding sums of the first 2 rows;
S103: Configure the shuffle pattern so that the first element of each of the 8 groups obtained in step S102 is gathered together and the last element of each group is gathered together;
S104: Add the results obtained in step S103 element-wise, until the values of each group of elements have been reduced into one VPE;
S105: Use the vector VMOVI instruction to assign the immediate 1/k² (here 1/4) to a vector register, and multiply this register element-wise with the accumulated sums of step S104;
S106: Obtain the final result vector of 8 average pooling results;
S107: Move down to row 3 of the pooling matrix A and repeat steps S102 to S106 until all sub-images of the pooling matrix A have been traversed, from which the complete k × k average pooling result matrix of the pooling matrix A can be calculated.
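The end-to-end result of this 16 × 16, k = 2 example can be checked against a scalar reference implementation (a sketch with an illustrative test matrix; the SIMD shuffles themselves are not modeled here):

```python
def average_pool(A, k):
    """Reference average pooling with non-overlapping k x k windows
    (sizeX == stride, as in the embodiment)."""
    n = len(A)
    return [[sum(A[r + i][c + j] for i in range(k) for j in range(k)) / (k * k)
             for c in range(0, n, k)]
            for r in range(0, n, k)]

# Illustrative 16 x 16 input: entry (r, c) holds r * 16 + c.
A = [[r * 16 + c for c in range(16)] for r in range(16)]
C = average_pool(A, 2)
print(len(C), len(C[0]))  # 8 8  -> an 8 x 8 result matrix
print(C[0][0])            # 8.5  -> mean of the window {0, 1, 16, 17}
```

Each pass over two rows produces 8 results (steps S102 to S106), and 8 such passes traverse the whole matrix, matching the 8 × 8 output.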
The above are only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited to the above embodiments; all technical solutions that fall under the idea of the present invention belong to its scope of protection. It should be pointed out that, for those of ordinary skill in the art, improvements and modifications that do not depart from the principles of the present invention shall also be regarded as falling within the scope of protection of the present invention.
Claims (5)
1. A SIMD-based average pooling parallel processing method for vector processors, characterized in that its steps are:
S1: Let the pooling matrix to be pooled after the convolution operation be A, of size M × N, and let the pooling window size be k × k, with M > k and N > k; the number of vector processing elements (VPEs) is p, and M, N and p are all integer multiples of k;
S2: According to the pooling window size k, take the first k rows of the pooling matrix A and accumulate them element-wise, obtaining the corresponding sums of the first k rows;
S3: Configure the shuffle pattern and shuffle;
S4: Add the results obtained in step S3 element-wise;
S5: Repeat steps S3 and S4 until the values of each group of elements have been reduced into p/k VPEs;
S6: Use the vector VMOVI instruction to assign the immediate 1/k² to a vector register, and multiply this vector register element-wise with the accumulated sums of step S5;
S7: Obtain the final result vector of p/k average pooling results;
S8: Move down to row k+1 of the pooling matrix A and repeat steps S2 to S7 until all sub-images of the pooling matrix A have been traversed, obtaining the complete k × k average pooling result matrix of the pooling matrix A.
2. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, characterized in that the detailed process of step S3 is:
S301: If k is even, the first k/2 elements of each of the p/k groups obtained in step S2 are gathered together, and the last k/2 elements of each group are gathered together;
S302: If k is odd, take k = 3 and p = 12 as an example, i.e. the pooling window size is 3 × 3 and the number of vector processing elements is 12.
3. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, characterized in that the calculation formula for an average pooling result c_{0,0} in step S8 is c_{0,0} = (1/k²) · Σ_{i=0}^{k−1} Σ_{j=0}^{k−1} a_{i,j}, where c_{0,0} is the first element of the average pooling result matrix, k is the pooling window size (in convolutional neural networks the pooling window is square), and a_{i,j} are the elements of the pooling matrix A to be average-pooled.
4. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, 2 or 3, characterized in that the pooling window size is defined as sizeX and the horizontal or vertical displacement between two adjacent pooling windows as stride, and in the average pooling operation the pooling windows do not overlap, i.e. sizeX = stride.
5. The SIMD-based average pooling parallel processing method for vector processors according to claim 1, 2 or 3, characterized in that the pooling matrix A and the pooling window (of size k) are both square.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710202133.0A CN106991473A (en) | 2017-03-30 | 2017-03-30 | SIMD-based average pooling parallel processing method for vector processors |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710202133.0A CN106991473A (en) | 2017-03-30 | 2017-03-30 | SIMD-based average pooling parallel processing method for vector processors |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106991473A true CN106991473A (en) | 2017-07-28 |
Family
ID=59413094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710202133.0A Pending CN106991473A (en) | 2017-03-30 | 2017-03-30 | SIMD-based average pooling parallel processing method for vector processors |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106991473A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108205702A (en) * | 2017-12-29 | 2018-06-26 | 中国人民解放军国防科技大学 | Parallel processing method for multi-input multi-output matrix convolution |
CN108205703A (en) * | 2017-12-29 | 2018-06-26 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix average value pooling vectorization implementation method |
CN109726803A (en) * | 2019-01-10 | 2019-05-07 | 广州小狗机器人技术有限公司 | Pooling method, image processing method and device |
CN109754359A (en) * | 2017-11-01 | 2019-05-14 | 腾讯科技(深圳)有限公司 | Pooling processing method and system applied to convolutional neural networks |
CN110096309A (en) * | 2018-11-14 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Operation method, device, computer equipment and storage medium |
CN110163333A (en) * | 2018-01-10 | 2019-08-23 | 成都信息工程大学 | The parallel optimization method of convolutional neural networks |
CN110232665A (en) * | 2019-06-13 | 2019-09-13 | Oppo广东移动通信有限公司 | Maximum pooling method, apparatus, computer device and storage medium |
CN110399977A (en) * | 2018-04-25 | 2019-11-01 | 华为技术有限公司 | Pooling operation device |
CN111213125A (en) * | 2017-09-08 | 2020-05-29 | 甲骨文国际公司 | Efficient direct convolution using SIMD instructions |
CN112651489A (en) * | 2020-12-22 | 2021-04-13 | 龙芯中科(合肥)技术有限公司 | Operation processing method, operation processing device and storage medium |
-
2017
- 2017-03-30 CN CN201710202133.0A patent/CN106991473A/en active Pending
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111213125B (en) * | 2017-09-08 | 2023-11-07 | 甲骨文国际公司 | Efficient direct convolution using SIMD instructions |
CN111213125A (en) * | 2017-09-08 | 2020-05-29 | 甲骨文国际公司 | Efficient direct convolution using SIMD instructions |
US11537857B2 (en) | 2017-11-01 | 2022-12-27 | Tencent Technology (Shenzhen) Company Limited | Pooling processing method and system applied to convolutional neural network |
CN109754359A (en) * | 2017-11-01 | 2019-05-14 | 腾讯科技(深圳)有限公司 | Pooling processing method and system applied to convolutional neural networks |
US11734554B2 (en) | 2017-11-01 | 2023-08-22 | Tencent Technology (Shenzhen) Company Limited | Pooling processing method and system applied to convolutional neural network |
CN108205703A (en) * | 2017-12-29 | 2018-06-26 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix average value pooling vectorization implementation method |
CN108205702A (en) * | 2017-12-29 | 2018-06-26 | 中国人民解放军国防科技大学 | Parallel processing method for multi-input multi-output matrix convolution |
CN108205702B (en) * | 2017-12-29 | 2020-12-01 | 中国人民解放军国防科技大学 | Parallel processing method for multi-input multi-output matrix convolution |
CN108205703B (en) * | 2017-12-29 | 2021-01-12 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix average value pooling vectorization implementation method |
CN110163333A (en) * | 2018-01-10 | 2019-08-23 | 成都信息工程大学 | The parallel optimization method of convolutional neural networks |
CN110399977A (en) * | 2018-04-25 | 2019-11-01 | 华为技术有限公司 | Pooling operation device |
CN110096309A (en) * | 2018-11-14 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Operation method, device, computer equipment and storage medium |
CN109726803B (en) * | 2019-01-10 | 2021-06-29 | 广州小狗机器人技术有限公司 | Pooling method, image processing method and device |
CN109726803A (en) * | 2019-01-10 | 2019-05-07 | 广州小狗机器人技术有限公司 | Pooling method, image processing method and device |
CN110232665B (en) * | 2019-06-13 | 2021-08-20 | Oppo广东移动通信有限公司 | Maximum pooling method and device, computer equipment and storage medium |
CN110232665A (en) * | 2019-06-13 | 2019-09-13 | Oppo广东移动通信有限公司 | Maximum pooling method, apparatus, computer device and storage medium |
CN112651489A (en) * | 2020-12-22 | 2021-04-13 | 龙芯中科(合肥)技术有限公司 | Operation processing method, operation processing device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106991473A (en) | SIMD-based average pooling parallel processing method for vector processors | |
US11775836B2 (en) | Hand pose estimation | |
JP6771018B2 (en) | Improved performance of 2D array processor | |
US20200285446A1 (en) | Arithmetic device for neural network, chip, equipment and related method | |
US10846591B2 (en) | Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks | |
CN108491359B (en) | Submatrix operation device and method | |
CN107438860B (en) | Architecture for high performance power efficient programmable image processing | |
CN106991472A | A vectorized implementation method fusing the ReLU activation function and max pooling | |
US20210019594A1 (en) | Convolutional neural network accelerating device and method | |
AU2017338783A1 (en) | Efficient data layouts for convolutional neural networks | |
JP2018185847A (en) | Two dimensional shift array for image processor | |
CN107563951B (en) | Statistical operations on a two-dimensional image processor | |
CN108268931A (en) | The methods, devices and systems of data processing | |
CN110188869B (en) | Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm | |
KR102610842B1 (en) | Processing element and operating method thereof in neural network | |
KR20070039490A (en) | A bit serial processing element for a simd array processor | |
US20210201120A1 (en) | Inference apparatus, convolution operation execution method, and program | |
JP7171883B2 (en) | efficient convolutional engine | |
CN114781629B (en) | Hardware accelerator of convolutional neural network based on parallel multiplexing and parallel multiplexing method | |
EP4071619A1 (en) | Address generation method, related device and storage medium | |
CN112784973A (en) | Convolution operation circuit, device and method | |
CN109447239B (en) | Embedded convolutional neural network acceleration method based on ARM | |
CN113254391B (en) | Neural network accelerator convolution calculation and data loading parallel method and device | |
CN113313252B (en) | Depth separable convolution implementation method based on pulse array | |
WO2020038462A1 (en) | Tongue segmentation device and method employing deep learning, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170728 |