CN108205703A - Multi-input multi-output matrix average value pooling vectorization implementation method - Google Patents
Multi-input multi-output matrix average value pooling vectorization implementation method
- Publication number
- CN108205703A CN108205703A CN201711478728.5A CN201711478728A CN108205703A CN 108205703 A CN108205703 A CN 108205703A CN 201711478728 A CN201711478728 A CN 201711478728A CN 108205703 A CN108205703 A CN 108205703A
- Authority
- CN
- China
- Prior art keywords
- input
- average value
- pooling
- input feature
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
A vectorization implementation method for average pooling of multi-input multi-output matrices includes the steps: S1: determine the number of input feature maps computed simultaneously by a single core of the vector processor according to parameters such as the number M of vector processing units (VPEs) in the vector processor; S2: sort the input feature maps along the third dimension; S3: complete the sorting of all input feature maps; S4: transfer the sorted input feature maps into the AM of the vector processor core via DMA; S5: vector-load and accumulate with the following data rows in turn to obtain the average pooling result of the pooling window at the corresponding position of the input feature maps; S6: move to the next pooling window according to the horizontal stride; S7: repeat steps S5-S6; S8: repeat steps S6-S7 N/M times, finally completing the average pooling of the N input feature maps. The method has the advantages of being simple to implement and convenient to operate, and of improving both the parallelism and the operating efficiency of a multi-core vector processor.
Description
Technical field
The present invention relates generally to the fields of deep learning and convolutional neural networks, and in particular to a vectorization implementation method for multi-input multi-output matrix average pooling.
Background technology
Convolutional neural networks are the most widely applied class of neural network model in current deep learning algorithms, and also one of the models with the best recognition rates. A convolutional neural network model generally comprises matrix convolution, activation functions, max or average pooling, local response normalization, and similar operations.
The pooling layer comes after the convolutional layer. Once features have been obtained by the convolutional layer, one would like to use them for classification. In theory, all of the extracted features could be used to train the classifier, but this faces an enormous computational challenge. Suppose the input image is 96 × 96 pixels and 400 features are learned, each defined over an 8 × 8 input patch. Convolving each feature with the input image yields a (96 − 8 + 1) × (96 − 8 + 1) = 7921-dimensional convolution feature. With 400 features, each sample therefore yields an 89 × 89 × 400 = 3,168,400-dimensional convolution feature vector, and a classifier trained at such a scale easily over-fits.
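The dimension bookkeeping in this example can be checked with a few lines of arithmetic (the values below come from the text's example, not from the patented method itself):

```python
# Checking the dimensions quoted above.
image_size = 96      # input image: 96 x 96 pixels
patch_size = 8       # each learned feature is defined on an 8 x 8 patch
num_features = 400   # number of learned features

conv_size = image_size - patch_size + 1   # valid-convolution output edge
per_feature = conv_size * conv_size       # dimensions per convolution feature
total = per_feature * num_features        # full convolution feature vector

print(conv_size, per_feature, total)  # 89 7921 3168400
```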
Pooling is an important method for reducing the dimensionality of convolution feature vectors. By computing the average (or maximum) of a particular feature over a region of the image, these summary statistics not only have much lower dimensionality but also improve the results, being less prone to over-fitting.
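As a concrete illustration of this dimensionality reduction, average pooling of a single feature map can be sketched in plain Python (the function name and the 4 × 4 example values are illustrative, not from the patent):

```python
def avg_pool(fmap, k, s):
    """Average-pool a 2-D feature map (list of lists) with a k x k window
    and stride s; a plain-Python sketch of the operation described above."""
    n = len(fmap)
    out_n = (n - k) // s + 1  # number of windows along each edge
    out = []
    for i in range(out_n):
        row = []
        for j in range(out_n):
            # gather the k x k window at origin (i*s, j*s) and average it
            window = [fmap[i * s + a][j * s + b]
                      for a in range(k) for b in range(k)]
            row.append(sum(window) / (k * k))
        out.append(row)
    return out

fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pooled = avg_pool(fmap, k=2, s=2)
print(pooled)  # [[2.5, 4.5], [10.5, 12.5]]
```

A 4 × 4 map shrinks to 2 × 2: a four-fold reduction, matching the motivation above.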
Pooling also has translation invariance: after a small translation, an image still produces the same pooled features. This property has important application prospects in fields such as object detection, image recognition, and speech recognition. For example, when processing a digit from the MNIST dataset, whether it is translated to the left or to the right, and no matter where it finally lands, the classifier is still expected to classify it accurately as the same digit.
Since a convolutional neural network has multiple input feature maps and multiple output feature maps, there are correspondingly multiple input feature maps to be average-pooled and multiple output feature maps, and how to maximize the parallelism of the average pooling computation is an important research topic.
As shown in Fig. 1, the vector processor is a novel architecture comprising a scalar processing unit (SPU) for scalar operations and a vector processing unit (VPU) for vector operations. With a rational division of tasks, the computational advantages of the vector processor can be fully exploited.
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems existing in the prior art, the present invention provides a vectorization implementation method for multi-input multi-output matrix average pooling that is simple to implement, convenient to operate, and able to improve both the parallelism and the operating efficiency of a multi-core vector processor.
To solve the above technical problems, the present invention adopts the following technical scheme:
A vectorization implementation method for average pooling of multi-input multi-output matrices, whose steps are:
S1: According to the number M of vector processing units (VPEs) in the vector processor, the number N of input feature maps, their size n × n, the moving stride s of average pooling, and the pooling window size k, determine the number of input feature maps computed simultaneously by a single core of the vector processor;
S2: Sort M input feature maps along the third dimension;
S3: Perform step S2 N/M times until the sorting of all N input feature maps is completed;
S4: Transfer the input feature maps sorted in step S3 into the AM of the vector processor core via DMA;
S5: Vector-load the 1st row in AM and accumulate it with the following data rows in turn, for k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average pooling result of the k × k pooling window at the corresponding position of the M input feature maps;
S6: Move to the next pooling window according to the horizontal stride s and perform the same computation as in step S5 to obtain the average pooling result of the 2nd k × k pooling window of the M input feature maps;
S7: Repeat steps S5-S6 until the average pooling of all pooling windows of the M input feature maps is completed;
S8: Repeat steps S6-S7 N/M times, finally completing the average pooling of the N input feature maps.
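Steps S2-S8 can be sketched schematically as follows, modeling each M-wide vector register as a Python list. This is only an illustration of the data layout and accumulation pattern under stated assumptions, not the patented DSP implementation; the function and variable names are hypothetical:

```python
# Schematic sketch of steps S2-S8: M feature maps are interleaved along the
# "third dimension" so that each row holds the same pixel position from all
# M maps; one vector accumulation then advances the pooling of all M maps.
def pool_m_maps(maps, k, s):
    M, n = len(maps), len(maps[0])
    # S2: "sort along the third dimension" -> element (i, j) of all M maps
    # becomes one M-wide vector row (what the M VPEs would hold).
    rows = {(i, j): [maps[m][i][j] for m in range(M)]
            for i in range(n) for j in range(n)}
    out_n = (n - k) // s + 1
    out = [[[0.0] * out_n for _ in range(out_n)] for _ in range(M)]
    for oi in range(out_n):
        for oj in range(out_n):
            acc = [0.0] * M                      # S5: vector accumulator
            for a in range(k):
                for b in range(k):               # k*k vector additions
                    v = rows[(oi * s + a, oj * s + b)]
                    acc = [x + y for x, y in zip(acc, v)]
            for m in range(M):                   # multiply by 1/k^2
                out[m][oi][oj] = acc[m] / (k * k)
    return out
```

Under this layout, each k × k inner loop corresponds to the k × k vector accumulations of step S5, with no data exchange between the M lanes.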
As a further improvement of the present invention: in step S1, the number N of input feature maps is much larger than the number M of vector processing units (VPEs), and N is an integer multiple of M.
As a further improvement of the present invention: in step S1, an input feature map is a three-dimensional matrix, i.e., height, width, and count.
As a further improvement of the present invention: the three-dimensional matrix in step S1 is square, i.e., its height equals its width.
As a further improvement of the present invention: in step S1, the moving stride of average pooling is divided into a horizontal stride and a vertical stride.
As a further improvement of the present invention: in step S1, the horizontal stride and the vertical stride are taken to be the same.
As a further improvement of the present invention: the pooling window in step S1 is taken square, i.e., the pooling window is k × k.
As a further improvement of the present invention: in step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the extra input feature maps are processed by a subset of the VPEs.
As a further improvement of the present invention: in step S6, a pooling window of size k × k is taken on the input feature maps each time, and the window is moved first horizontally and then vertically according to the moving stride, in order from left to right and from top to bottom.
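The traversal order in the improvement above (horizontal first, then vertical) can be sketched as follows, using illustrative values n = 6, k = 3, s = 1 (these sizes are examples, not fixed by the patent):

```python
# Window origins in the order described above: left to right within a row,
# rows from top to bottom.
n, k, s = 6, 3, 1
origins = [(i, j)
           for i in range(0, n - k + 1, s)   # vertical moves: top to bottom
           for j in range(0, n - k + 1, s)]  # horizontal moves first: left to right
print(origins[:5])  # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]
```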
Compared with the prior art, the advantages of the present invention are:
1. The multi-input multi-output matrix average pooling vectorization implementation method of the present invention can make full use of the multi-PE structural features of the vector processor, converting the computation of multi-input matrix average pooling, which is not easy to vectorize, into a process that is easy to compute in vectorized form.
2. The method avoids shuffle and reduction operations between different PEs of the vector processor and effectively improves the computational efficiency of average pooling over multi-input matrix convolutions. These advantages make the method simple to implement and convenient to operate, able to fully exploit the instruction-level, data-level, and task-level parallelism of the vector processor, thereby giving full play to the high-performance computing capability of multi-PE vector processors.
Description of the drawings
Fig. 1 is a schematic diagram of the general structure of a vector processor.
Fig. 2 is a schematic diagram of 3 × 3 average pooling.
Fig. 3 is a schematic diagram of average pooling from M input feature maps to M output feature maps.
Fig. 4 is a schematic diagram of average pooling with a 3 × 3 pooling window over 16 input feature maps in an example of the present invention.
Fig. 5 is a flow diagram of the method of the present invention.
Specific embodiment
The vectorization implementation method for multi-input multi-output matrix average pooling of the present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 5, the flow of the vectorization implementation method for multi-input multi-output matrix average pooling of the present invention is:
S1: According to the number M of vector processing units (VPEs) in the vector processor, the number N of input feature maps, their size n × n, the moving stride s of average pooling, and the pooling window size k, determine the number of input feature maps that a single core of the vector processor can compute simultaneously;
S2: Sort M input feature maps along the third dimension;
S3: Perform step S2 N/M times until the sorting of all N input feature maps is completed;
S4: Transfer the input feature maps sorted in step S3 into the AM of the vector processor core via DMA;
S5: Vector-load the 1st row in AM and accumulate it with the following data rows in turn, for k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average pooling result of the k × k pooling window at the corresponding position of the M input feature maps;
S6: Move to the next pooling window according to the horizontal stride s and perform the same computation as in step S5 to obtain the average pooling result of the 2nd k × k pooling window of the M input feature maps;
S7: Repeat steps S5-S6 until the average pooling of all pooling windows of the M input feature maps is completed.
S8: Repeat steps S6-S7 N/M times, finally completing the average pooling of the N input feature maps.
In a concrete application example, in the above step S1, the number N of input feature maps is typically much larger than the number M of vector processing units (VPEs), and N is generally an integer multiple of M.
In a concrete application example, in the above step S1, an input feature map is a three-dimensional matrix, i.e., height, width, and count; here it is generally taken square, i.e., its height equals its width.
In a concrete application example, the moving stride of average pooling in the above step S1 is divided into a horizontal stride and a vertical stride, which are generally taken to be the same. The pooling window in step S1 is generally taken square, i.e., k × k.
In a concrete application example, in the above step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the extra input feature maps are processed by a subset of the VPEs.
Referring to Figs. 2-4, in a concrete application example, the number M of VPEs is 16, the number N of input feature maps is 16, the input feature map size is 6 × 6, the pooling window stride s is 1, and the pooling window size k is 3. The 16 input feature maps are sorted along the third dimension, and the sorted result is transferred into the in-core memory space AM. The 1st row in AM is vector-loaded and the subsequent rows are accumulated in turn using multiply-add instructions, for 3 × 3 = 9 accumulations in total; the accumulated result is finally multiplied by 1/9, simultaneously completing the average pooling of the corresponding pooling window of the 16 output feature maps. According to the horizontal stride, the pooling window moves right by 1, and the step is repeated to obtain the average pooling of the second corresponding pooling window of the 16 output feature maps simultaneously, until the computation of all average pooling windows of the 16 input feature maps is complete.
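The bookkeeping of this worked example (M = 16 maps, 6 × 6 inputs, 3 × 3 window, stride 1) can be checked with a few lines; this only verifies the counts stated above, not the DSP code itself:

```python
n, k, s, M = 6, 3, 1, 16
out_n = (n - k) // s + 1          # pooled output edge per map
accs_per_window = k * k           # multiply-add accumulations per window
windows_per_map = out_n * out_n   # pooling windows per feature map
print(out_n, accs_per_window, windows_per_map)  # 4 9 16
```

Each of the 16 maps thus yields a 4 × 4 pooled map, with 9 accumulations (then one multiply by 1/9) per window, carried out simultaneously across the 16 VPEs.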
In conclusion, based on the architectural features of the vector processor and on the number and scale of the input feature maps, the present invention determines an optimal implementation for the multiple output feature maps and effectively improves the parallelized operation of the vector processor. Different input feature maps are assigned to different PEs, with no inter-PE operations required at all; in general, as many input feature maps can be computed simultaneously as there are PEs. These advantages make the method of the present invention simple to implement and convenient to operate, able to fully exploit the parallelism of the vector processor at all levels, thereby giving full play to the high-performance computing capability of multi-PE vector processors.
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions falling under the idea of the present invention belong to its protection scope. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.
Claims (9)
1. A vectorization implementation method for average pooling of multi-input multi-output matrices, characterized in that the steps are:
S1: according to the number M of vector processing units (VPEs) in the vector processor, the number N of input feature maps, their size n × n, the moving stride s of average pooling, and the pooling window size k, determine the number of input feature maps computed simultaneously by a single core of the vector processor;
S2: sort M input feature maps along the third dimension;
S3: perform step S2 N/M times until the sorting of all N input feature maps is completed;
S4: transfer the input feature maps sorted in step S3 into the AM of the vector processor core via DMA;
S5: vector-load the 1st row in AM and accumulate it with the following data rows in turn, for k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average pooling result of the k × k pooling window at the corresponding position of the M input feature maps;
S6: move to the next pooling window according to the horizontal stride s and perform the same computation as in step S5 to obtain the average pooling result of the 2nd k × k pooling window of the M input feature maps;
S7: repeat steps S5-S6 until the average pooling of all pooling windows of the M input feature maps is completed;
S8: repeat steps S6-S7 N/M times, finally completing the average pooling of the N input feature maps.
2. The vectorization implementation method for multi-input multi-output matrix average pooling according to claim 1, characterized in that, in step S1, the number N of input feature maps is much larger than the number M of vector processing units (VPEs), and N is an integer multiple of M.
3. The vectorization implementation method for multi-input multi-output matrix average pooling according to claim 1, characterized in that, in step S1, an input feature map is a three-dimensional matrix, i.e., height, width, and count.
4. The vectorization implementation method for multi-input multi-output matrix average pooling according to claim 3, characterized in that the three-dimensional matrix in step S1 is square, i.e., its height equals its width.
5. The vectorization implementation method for multi-input multi-output matrix average pooling according to any one of claims 1-4, characterized in that, in step S1, the moving stride of average pooling is divided into a horizontal stride and a vertical stride.
6. The vectorization implementation method for multi-input multi-output matrix average pooling according to claim 5, characterized in that, in step S1, the horizontal stride and the vertical stride are taken to be the same.
7. The vectorization implementation method for multi-input multi-output matrix average pooling according to any one of claims 1-4, characterized in that the pooling window in step S1 is taken square, i.e., the pooling window is k × k.
8. The vectorization implementation method for multi-input multi-output matrix average pooling according to any one of claims 1-4, characterized in that, in step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the extra input feature maps are processed by a subset of the VPEs.
9. The vectorization implementation method for multi-input multi-output matrix average pooling according to any one of claims 1-4, characterized in that, in step S6, a pooling window of size k × k is taken on the input feature maps each time, and the window is moved first horizontally and then vertically according to the moving stride, in order from left to right and from top to bottom.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711478728.5A CN108205703B (en) | 2017-12-29 | 2017-12-29 | Multi-input multi-output matrix average value pooling vectorization implementation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108205703A true CN108205703A (en) | 2018-06-26 |
CN108205703B CN108205703B (en) | 2021-01-12 |
Family
ID=62606033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711478728.5A Active CN108205703B (en) | 2017-12-29 | 2017-12-29 | Multi-input multi-output matrix average value pooling vectorization implementation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108205703B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130329987A1 (en) * | 2012-06-11 | 2013-12-12 | Genesis Group Inc. | Video segmentation method |
CN106991473A (en) * | 2017-03-30 | 2017-07-28 | 中国人民解放军国防科学技术大学 | The average value value pond method for parallel processing based on SIMD of vector processor-oriented |
CN106991472A (en) * | 2017-03-30 | 2017-07-28 | 中国人民解放军国防科学技术大学 | A kind of fusion ReLU activation primitives and the vectorization implementation method in maximum pond |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
CN107301456A (en) * | 2017-05-26 | 2017-10-27 | 中国人民解放军国防科学技术大学 | Deep neural network multinuclear based on vector processor speeds up to method |
Non-Patent Citations (2)
Title |
---|
MATTHIAS JOACHIM EHRHARDT et al.: "Vector-valued image processing by parallel level sets", IEEE Transactions on Image Processing
ZHANG XINGGE: "Research on speech processing methods based on a convolutional neural network model", China Masters' Theses Full-text Database, Information Science and Technology
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109002715A (en) * | 2018-07-05 | 2018-12-14 | 东北大学秦皇岛分校 | A kind of Malware recognition methods and system based on convolutional neural networks |
CN109165733A (en) * | 2018-07-11 | 2019-01-08 | 中国人民解放军国防科技大学 | Multi-input multi-output matrix maximum pooling vectorization implementation method |
CN109165734A (en) * | 2018-07-11 | 2019-01-08 | 中国人民解放军国防科技大学 | Matrix local response normalization vectorization implementation method |
CN109165734B (en) * | 2018-07-11 | 2021-04-02 | 中国人民解放军国防科技大学 | Matrix local response normalization vectorization implementation method |
CN110096309A (en) * | 2018-11-14 | 2019-08-06 | 上海寒武纪信息科技有限公司 | Operation method, device, computer equipment and storage medium |
CN109886404A (en) * | 2019-02-01 | 2019-06-14 | 东南大学 | A kind of convolutional neural networks pond method of staggered diamonds perception |
CN109886404B (en) * | 2019-02-01 | 2023-08-04 | 东南大学 | Convolutional neural network pooling method for staggered diamond perception |
CN110796236A (en) * | 2019-10-21 | 2020-02-14 | 中国人民解放军国防科技大学 | Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network |
CN110796236B (en) * | 2019-10-21 | 2022-06-17 | 中国人民解放军国防科技大学 | Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network |
CN112906829A (en) * | 2021-04-13 | 2021-06-04 | 成都四方伟业软件股份有限公司 | Digital recognition model construction method and device based on Mnist data set |
Also Published As
Publication number | Publication date |
---|---|
CN108205703B (en) | 2021-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108205703A (en) | Multi-input multi-output matrix average value pooling vectorization implementation method | |
CN107358293B (en) | Neural network training method and device | |
Ngiam et al. | Tiled convolutional neural networks | |
CN101271572B (en) | Image segmentation method based on immunity clone selection clustering | |
CN106778745A (en) | A kind of licence plate recognition method and device, user equipment | |
Castello et al. | Deep learning in the built environment: Automatic detection of rooftop solar panels using Convolutional Neural Networks | |
CN107464210A (en) | A kind of image Style Transfer method based on production confrontation network | |
CN111882040B (en) | Convolutional neural network compression method based on channel number search | |
CN106250931A (en) | A kind of high-definition picture scene classification method based on random convolutional neural networks | |
CN106650744B (en) | The image object of local shape migration guidance is divided into segmentation method | |
CN109165733A (en) | Multi-input multi-output matrix maximum pooling vectorization implementation method | |
CN106959937B (en) | A kind of vectorization implementation method of the warp product matrix towards GPDSP | |
CN107292234A (en) | It is a kind of that method of estimation is laid out based on information edge and the indoor scene of multi-modal feature | |
CN107292341A (en) | Adaptive multi views clustering method based on paired collaboration regularization and NMF | |
CN106991472A (en) | A kind of fusion ReLU activation primitives and the vectorization implementation method in maximum pond | |
CN106203444B (en) | Classification of Polarimetric SAR Image method based on band wave and convolutional neural networks | |
CN107885700A (en) | Multi-core implementation method for large-scale matrix convolution | |
CN109711401A (en) | A kind of Method for text detection in natural scene image based on Faster Rcnn | |
CN109766949A (en) | Convolutional neural networks light weight method, device and electronic equipment | |
CN110222760A (en) | A kind of fast image processing method based on winograd algorithm | |
CN111523713A (en) | Method and device for predicting residual oil saturation distribution in oil field | |
CN106294288B (en) | A kind of distribution non-negative matrix factorization method | |
CN108510058A (en) | Weight storage method in neural network and the processor based on this method | |
CN106228121A (en) | Gesture feature recognition methods and device | |
CN110222598A (en) | A kind of video behavior recognition methods, device, storage medium and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||