CN108205703A - Multi-input multi-output matrix average value pooling vectorization implementation method - Google Patents

Multi-input multi-output matrix average value pooling vectorization implementation method

Info

Publication number
CN108205703A
CN108205703A (application CN201711478728.5A)
Authority
CN
China
Prior art keywords
input
average value
pooling
input feature map
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711478728.5A
Other languages
Chinese (zh)
Other versions
CN108205703B (en)
Inventor
郭阳
张军阳
杨超
田希
扈啸
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201711478728.5A
Publication of CN108205703A
Application granted
Publication of CN108205703B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Abstract

A vectorization implementation method for average pooling of multi-input multi-output matrices includes the steps: S1: determine the number of input feature maps computed simultaneously by a single core of the vector processor, according to parameters such as the number M of vector processing elements (VPEs); S2: sort the input feature maps along the third dimension; S3: repeat until all input feature maps have been sorted; S4: transfer the sorted input feature maps via DMA to the AM inside the vector processor core; S5: vector-load the first data row and accumulate the following rows in turn, obtaining the average-pooling result of the pooling window at the corresponding position of each input feature map; S6: move to the next pooling window according to the horizontal stride; S7: repeat steps S5-S6; S8: repeat steps S6-S7 N/M times, finally completing the average-pooling operation over all N input feature maps. The method has the advantages of being simple to implement and convenient to operate, and of improving both the parallelism of the multi-core vector processor and the operation efficiency of the processor.

Description

Vectorization implementation method for average pooling of multi-input multi-output matrices
Technical field
The present invention relates generally to the fields of deep learning and convolutional neural networks, and in particular to a vectorization implementation method for average pooling of multi-input multi-output matrices.
Background technology
Convolutional neural networks are the most widely applied class of neural network models in current deep learning algorithms, and also the class with the best recognition performance. A convolutional neural network model generally includes matrix convolution, activation functions, max pooling or average pooling, local normalization operations, and the like.
The pooling layer follows the convolutional layer. After features have been obtained by the convolutional layer, one generally wants to use them for classification. In theory, all the extracted features could be used to train a classifier, but this faces an enormous computational challenge. Suppose there is an input image of 96 × 96 pixels and 400 features have been learned, each defined over an 8 × 8 input patch; convolving each feature with the input image yields a (96 - 8 + 1) × (96 - 8 + 1) = 7921-dimensional convolved feature. Since there are 400 features, each sample yields a convolved feature vector of 89 × 89 × 400 = 3,168,400 dimensions, and a classifier trained on features of this scale is prone to overfitting.
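The arithmetic above can be checked in a few lines of Python (an illustration added for the reader, not part of the original disclosure):

```python
# Verify the dimensionality figures quoted above.
image_size, patch_size, num_features = 96, 8, 400

conv_dim = (image_size - patch_size + 1) ** 2   # valid convolution output: 89 x 89
print(conv_dim)                                  # 7921 dimensions per feature
print(conv_dim * num_features)                   # 3168400 dimensions per sample
```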
Pooling is an important method of reducing the dimensionality of convolved feature vectors. By computing the average value (or maximum value) of a particular feature over a region of the image, one obtains summary statistics that not only have much lower dimensionality but also improve results by being less prone to overfitting.
In addition, pooling has translation invariance: after a small translation of the image, the same pooled features are still produced. This property has important application prospects in fields such as object detection, image recognition, and speech recognition. For example, when processing a digit from the MNIST dataset, no matter whether it is translated to the left or to the right, and regardless of its final position, the classifier is still expected to classify it correctly as the same digit.
In a convolutional neural network there are multiple input feature maps and multiple output feature maps, and correspondingly multiple input feature maps to be average-pooled and multiple output feature maps to be produced; how to maximally parallelize the computation of average pooling is therefore an important research topic.
As shown in Fig. 1, a vector processor is a novel architecture comprising a scalar processing unit (SPU) for scalar operations and a vector processing unit (VPU) for vector operations; with a rational division of tasks, the computational advantages of the vector processor can be fully exploited.
Summary of the invention
The technical problem to be solved by the present invention is: in view of the technical problems existing in the prior art, the present invention provides a vectorization implementation method for average pooling of multi-input multi-output matrices that is simple to implement and convenient to operate, and that can improve the parallelism of a multi-core vector processor and the operation efficiency of the processor.
To solve the above technical problems, the present invention adopts the following technical scheme:
A vectorization implementation method for average pooling of multi-input multi-output matrices, the steps of which are:
S1: according to the number M of vector processing elements (VPEs) in the vector processor, the number N of input feature maps, the feature-map size n × n, the average-pooling stride s, and the pooling window size k, determine the number of input feature maps computed simultaneously by a single core of the vector processor;
S2: sort M input feature maps along the third dimension;
S3: perform step S2 N/M times, until all N input feature maps have been sorted;
S4: transfer the input feature maps sorted in step S3 via DMA to the AM (array memory) inside the vector processor core;
S5: vector-load the first row in AM and accumulate it with each following data row in turn, k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average-pooling result of the k × k pooling window at the corresponding position of all M input feature maps;
S6: according to the horizontal stride s, move to the next pooling window and perform the same computation as in step S5, obtaining the average-pooling result of the second k × k pooling window of the M input feature maps;
S7: repeat steps S5-S6 until the average-pooling operation of all pooling windows of the M input feature maps is complete;
S8: repeat steps S6-S7 N/M times, finally completing the average-pooling operation of the N input feature maps.
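To make the data flow of steps S1-S8 concrete, the following NumPy sketch simulates the scheme in software: each (n, n, M) slice plays the role of the sorted AM layout, and every M-wide row stands in for one vector register spanning the M VPEs. The function name, the lane layout, and the explicit loops are illustrative assumptions, not the patented hardware kernel.

```python
import numpy as np

def avg_pool_vectorized(feats, M, k, s):
    """Average-pool N feature maps (k x k window, stride s), processing
    M maps at a time as SIMD lanes -- a software model of steps S1-S8."""
    N, n, _ = feats.shape
    assert N % M == 0, "S1 (and claim 2) assume N is an integer multiple of M"
    out_n = (n - k) // s + 1
    out = np.empty((N, out_n, out_n), dtype=np.float64)
    for b in range(N // M):                            # S8: N/M batches of M maps
        # S2-S4: group a batch along the third dimension; each element row of
        # `lanes` holds one value per VPE, mimicking the layout DMA'd into AM.
        lanes = np.stack(feats[b * M:(b + 1) * M], axis=-1)   # shape (n, n, M)
        for i in range(out_n):                         # S7: sweep windows top to bottom
            for j in range(out_n):                     # S6: move right by the stride s
                acc = np.zeros(M)
                for di in range(k):                    # S5: k*k vector loads/accumulates
                    for dj in range(k):
                        acc += lanes[i * s + di, j * s + dj]
                out[b * M:(b + 1) * M, i, j] = acc / k ** 2   # multiply by 1/k^2
    return out
```

Because every lane holds a different feature map, each window produces M independent results from one accumulation chain, which is exactly why no inter-PE shuffle or reduction is needed.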
As a further improvement of the present invention: in step S1, the number N of input feature maps is much greater than the number M of vector processing elements (VPEs), and N is an integer multiple of M.
As a further improvement of the present invention: in step S1, the input feature maps form a three-dimensional matrix, i.e., height, width, and count.
As a further improvement of the present invention: the three-dimensional matrix in step S1 is square, i.e., the height equals the width.
As a further improvement of the present invention: the stride of average pooling in step S1 is divided into a horizontal stride and a vertical stride.
As a further improvement of the present invention: in step S1, the horizontal stride is taken equal to the vertical stride.
As a further improvement of the present invention: the pooling window in step S1 is taken to be square, i.e., the pooling window is k × k.
As a further improvement of the present invention: in step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the surplus input feature maps are processed by a subset of the VPEs.
As a further improvement of the present invention: in step S6, the input feature map takes one pooling window of size k × k at a time; the window is moved first horizontally according to the stride and then vertically, the movement order being left to right, then top to bottom.
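A short helper (hypothetical, added here purely for illustration) makes the traversal order of the last improvement explicit:

```python
def window_origins(n, k, s):
    """Yield the top-left corner of every k x k pooling window over an
    n x n map: horizontal moves first, then a vertical step."""
    for top in range(0, n - k + 1, s):        # vertical move, top to bottom
        for left in range(0, n - k + 1, s):   # horizontal move, left to right
            yield top, left

print(list(window_origins(n=6, k=3, s=1))[:5])
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)] -- slide right, then drop down
```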
Compared with the prior art, the advantages of the present invention are:
1. The vectorization implementation method for average pooling of multi-input multi-output matrices of the present invention can fully exploit the multi-PE structural features of a vector processor, converting the computation of multi-input matrix average pooling, which is not easy to vectorize, into a process that is easy to compute in vectorized form.
2. The method avoids shuffle and reduction operations between different PEs of the vector processor, effectively improving the computational efficiency of average pooling over multiple input matrices. These advantages make the method simple to implement and convenient to operate, and allow it to fully exploit the instruction-, data-, and task-level parallelism of the vector processor, thereby giving full play to the high-performance computing capability of a multi-PE vector processor.
Description of the drawings
Fig. 1 is a schematic diagram of the general structure of a vector processor.
Fig. 2 is a schematic diagram of 3 × 3 average pooling.
Fig. 3 is a schematic diagram of average pooling from M input feature maps to M output feature maps.
Fig. 4 is a schematic diagram of average pooling with a 3 × 3 pooling window over 16 input feature maps in an example of the present invention.
Fig. 5 is a schematic flow diagram of the method of the present invention.
Specific embodiment
The vectorization implementation method for average pooling of multi-input multi-output matrices of the present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 5, the flow of the vectorization implementation method for average pooling of multi-input multi-output matrices of the present invention is:
S1: according to the number M of vector processing elements (VPEs) in the vector processor, the number N of input feature maps, the feature-map size n × n, the average-pooling stride s, and the pooling window size k, determine the number of input feature maps that a single core of the vector processor can compute simultaneously;
S2: sort M input feature maps along the third dimension;
S3: perform step S2 N/M times, until all N input feature maps have been sorted;
S4: transfer the input feature maps sorted in step S3 via DMA to the AM inside the vector processor core;
S5: vector-load the first row in AM and accumulate it with each following data row in turn, k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average-pooling result of the k × k pooling window at the corresponding position of the M input feature maps;
S6: according to the horizontal stride s, move to the next pooling window and perform the same computation as in step S5, obtaining the average-pooling result of the second k × k pooling window of the M input feature maps;
S7: repeat steps S5-S6 until the average-pooling operation of all pooling windows of the M input feature maps is complete.
S8: repeat steps S6-S7 N/M times, finally completing the average-pooling operation of the N input feature maps.
In a concrete application example, in the above step S1, the number N of input feature maps is typically much greater than the number M of vector processing elements (VPEs), and N is generally an integer multiple of M.
In a concrete application example, in the above step S1, the input feature maps form a three-dimensional matrix, i.e., height, width, and count; the maps are generally taken to be square, i.e., the height equals the width.
In a concrete application example, the stride of average pooling in the above step S1 is divided into a horizontal stride and a vertical stride, which are generally taken to be equal.
The pooling window in step S1 is generally taken to be square, i.e., the pooling window is k × k.
In a concrete application example, in the above step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the surplus input feature maps are processed by a subset of the VPEs.
Referring to Figs. 2-4, in a concrete application example, the number M of VPEs is 16, the number N of input feature maps is 16, the input feature map size is 6 × 6, the pooling-window stride s is 1, and the pooling window size k is 3. The 16 input feature maps are sorted along the third dimension, and the sorted result is transferred to the in-core memory space AM. The first row in AM is vector-loaded, and the following rows are accumulated in turn using multiply-accumulate instructions, 3 × 3 = 9 accumulations in total; the accumulated result is finally multiplied by 1/9, simultaneously completing the average-pooling operation of the corresponding pooling window of all 16 output feature maps. According to the horizontal stride, the pooling window is shifted right by 1, and the above step is repeated to obtain the average-pooling results of the second pooling window of the 16 output feature maps simultaneously, until the average-pooling computation of all pooling windows of the 16 input feature maps is complete.
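Under the same assumptions as the NumPy sketch given earlier (whose hypothetical avg_pool_vectorized it reuses), this concrete configuration can be checked against a direct per-map reference:

```python
import numpy as np

rng = np.random.default_rng(0)
M = N = 16                       # 16 VPEs, 16 input feature maps
n, k, s = 6, 3, 1                # 6 x 6 maps, 3 x 3 window, stride 1
feats = rng.random((N, n, n))

pooled = avg_pool_vectorized(feats, M, k, s)   # sketch defined earlier

# Reference: direct average pooling computed map by map.
out_n = (n - k) // s + 1                       # (6 - 3) / 1 + 1 = 4 windows per row
ref = np.array([[[f[i*s:i*s+k, j*s:j*s+k].mean() for j in range(out_n)]
                 for i in range(out_n)] for f in feats])

assert np.allclose(pooled, ref)   # 9 accumulations x 1/9 per window, as in the text
print(pooled.shape)               # (16, 4, 4)
```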
In conclusion, the present invention, based on the architectural features of the vector processor and on the number and scale of the input feature maps, determines an optimal implementation for multiple output feature maps and effectively improves the parallel operation of the vector processor. Different input feature maps are assigned to different PEs, with no inter-PE association operations required at all; in general, as many input feature maps can be computed simultaneously as there are PEs. These advantages make the method of the present invention simple to implement and convenient to operate, and allow it to fully exploit parallelism at all levels of the vector processor, thereby giving full play to the high-performance computing capability of a multi-PE vector processor.
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions under the concept of the present invention belong to the protection scope of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims (9)

1. A vectorization implementation method for average pooling of multi-input multi-output matrices, characterized in that the steps are:
S1: according to the number M of vector processing elements (VPEs) in the vector processor, the number N of input feature maps, the feature-map size n × n, the average-pooling stride s, and the pooling window size k, determine the number of input feature maps computed simultaneously by a single core of the vector processor;
S2: sort M input feature maps along the third dimension;
S3: perform step S2 N/M times, until all N input feature maps have been sorted;
S4: transfer the input feature maps sorted in step S3 via DMA to the AM inside the vector processor core;
S5: vector-load the first row in AM and accumulate it with each following data row in turn, k × k accumulations in total; multiply the accumulated result by 1/k², simultaneously obtaining the average-pooling result of the k × k pooling window at the corresponding position of the M input feature maps;
S6: according to the horizontal stride s, move to the next pooling window and perform the same computation as in step S5, obtaining the average-pooling result of the second k × k pooling window of the M input feature maps;
S7: repeat steps S5-S6 until the average-pooling operation of all pooling windows of the M input feature maps is complete;
S8: repeat steps S6-S7 N/M times, finally completing the average-pooling operation of the N input feature maps.
2. The vectorization implementation method for average pooling of multi-input multi-output matrices according to claim 1, characterized in that, in step S1, the number N of input feature maps is much greater than the number M of vector processing elements (VPEs), and N is an integer multiple of M.
3. The vectorization implementation method for average pooling of multi-input multi-output matrices according to claim 1, characterized in that, in step S1, the input feature maps form a three-dimensional matrix, i.e., height, width, and count.
4. The vectorization implementation method for average pooling of multi-input multi-output matrices according to claim 3, characterized in that the three-dimensional matrix in step S1 is square, i.e., the height equals the width.
5. The vectorization implementation method for average pooling of multi-input multi-output matrices according to any one of claims 1-4, characterized in that the stride of average pooling in step S1 is divided into a horizontal stride and a vertical stride.
6. The vectorization implementation method for average pooling of multi-input multi-output matrices according to claim 5, characterized in that, in step S1, the horizontal stride is taken equal to the vertical stride.
7. The vectorization implementation method for average pooling of multi-input multi-output matrices according to any one of claims 1-4, characterized in that the pooling window in step S1 is taken to be square, i.e., the pooling window is k × k.
8. The vectorization implementation method for average pooling of multi-input multi-output matrices according to any one of claims 1-4, characterized in that, in step S1, if the number of input feature maps is not an integer multiple of the number of VPEs, the surplus input feature maps are processed by a subset of the VPEs.
9. The vectorization implementation method for average pooling of multi-input multi-output matrices according to any one of claims 1-4, characterized in that, in step S6, the input feature map takes one pooling window of size k × k at a time, and the window is moved first horizontally according to the stride and then vertically, the movement order being left to right, then top to bottom.
CN201711478728.5A 2017-12-29 2017-12-29 Multi-input multi-output matrix average value pooling vectorization implementation method Active CN108205703B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711478728.5A CN108205703B (en) 2017-12-29 2017-12-29 Multi-input multi-output matrix average value pooling vectorization implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711478728.5A CN108205703B (en) 2017-12-29 2017-12-29 Multi-input multi-output matrix average value pooling vectorization implementation method

Publications (2)

Publication Number Publication Date
CN108205703A 2018-06-26
CN108205703B CN108205703B (en) 2021-01-12

Family

ID=62606033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711478728.5A Active CN108205703B (en) 2017-12-29 2017-12-29 Multi-input multi-output matrix average value pooling vectorization implementation method

Country Status (1)

Country Link
CN (1) CN108205703B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130329987A1 (en) * 2012-06-11 2013-12-12 Genesis Group Inc. Video segmentation method
CN107239824A * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for implementing a sparse convolutional neural network accelerator
CN106991473A * 2017-03-30 2017-07-28 中国人民解放军国防科学技术大学 SIMD-based average pooling parallel processing method for vector processors
CN106991472A * 2017-03-30 2017-07-28 中国人民解放军国防科学技术大学 Vectorization implementation method fusing the ReLU activation function and max pooling
CN107301456A * 2017-05-26 2017-10-27 中国人民解放军国防科学技术大学 Multi-core acceleration method for deep neural networks based on a vector processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MATTHIAS JOACHIM EHRHARDT et al.: "Vector-valued image processing by parallel level sets", IEEE Transactions on Image Processing *
张兴革 (ZHANG Xingge): "Research on speech processing methods based on the convolutional neural network model", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002715A * 2018-07-05 2018-12-14 东北大学秦皇岛分校 Malware recognition method and system based on convolutional neural networks
CN109165733A (en) * 2018-07-11 2019-01-08 中国人民解放军国防科技大学 Multi-input multi-output matrix maximum pooling vectorization implementation method
CN109165734A (en) * 2018-07-11 2019-01-08 中国人民解放军国防科技大学 Matrix local response normalization vectorization implementation method
CN109165734B (en) * 2018-07-11 2021-04-02 中国人民解放军国防科技大学 Matrix local response normalization vectorization implementation method
CN110096309A (en) * 2018-11-14 2019-08-06 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium
CN109886404A * 2019-02-01 2019-06-14 东南大学 Convolutional neural network pooling method with staggered diamond perception
CN109886404B (en) * 2019-02-01 2023-08-04 东南大学 Convolutional neural network pooling method for staggered diamond perception
CN110796236A (en) * 2019-10-21 2020-02-14 中国人民解放军国防科技大学 Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
CN110796236B (en) * 2019-10-21 2022-06-17 中国人民解放军国防科技大学 Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
CN112906829A (en) * 2021-04-13 2021-06-04 成都四方伟业软件股份有限公司 Digital recognition model construction method and device based on Mnist data set

Also Published As

Publication number Publication date
CN108205703B (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN108205703A (en) Multi-input multi-output matrix average value pooling vectorization implementation method
CN107358293B (en) Neural network training method and device
Ngiam et al. Tiled convolutional neural networks
CN101271572B Image segmentation method based on immune clonal selection clustering
CN106778745A License plate recognition method and device, and user equipment
Castello et al. Deep learning in the built environment: Automatic detection of rooftop solar panels using Convolutional Neural Networks
CN107464210A Image style transfer method based on generative adversarial networks
CN111882040B Convolutional neural network compression method based on channel number search
CN106250931A High-resolution image scene classification method based on random convolutional neural networks
CN106650744B Image object segmentation method guided by local shape transfer
CN109165733A Multi-input multi-output matrix maximum pooling vectorization implementation method
CN106959937B Vectorization implementation method of convolution matrices for GPDSP
CN107292234A Indoor scene layout estimation method based on information edges and multi-modal features
CN107292341A Adaptive multi-view clustering method based on pairwise co-regularization and NMF
CN106991472A Vectorization implementation method fusing the ReLU activation function and max pooling
CN106203444B Polarimetric SAR image classification method based on bandelets and convolutional neural networks
CN107885700A Multi-core implementation method for large-scale matrix convolution
CN109711401A Text detection method for natural scene images based on Faster R-CNN
CN109766949A Convolutional neural network lightweighting method and device, and electronic equipment
CN110222760A Fast image processing method based on the Winograd algorithm
CN111523713A Method and device for predicting residual oil saturation distribution in an oil field
CN106294288B Distributed non-negative matrix factorization method
CN108510058A Weight storage method in a neural network and processor based on the method
CN106228121A Gesture feature recognition method and device
CN110222598A Video behavior recognition method and device, storage medium, and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant