CN106991472A - Vectorized implementation method fusing the ReLU activation function and max pooling - Google Patents
- Publication number: CN106991472A (application CN201710201376.2A, filed 2017-03-30, published 2017-07-28)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Abstract
The invention discloses a vectorized implementation method that fuses the ReLU activation function with max pooling. Its steps are: S1: compute the ReLU activation values of matrix A; S2: compute the max pooling of the matrix processed by the ReLU activation function in step S1; S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the whole matrix A. The invention has the advantages of simple principle and convenient implementation, and can fully exploit the parallel computing capability of a vector processor and the parallelism of the algorithm.
Description
Technical field
The present invention relates generally to the technical field of convolutional neural networks, and in particular to a vectorized implementation method fusing the ReLU activation function and max pooling.
Background technology
In the 1960s, while studying neurons responsible for local sensitivity and direction selectivity in the cat visual cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, and on this basis proposed the Convolutional Neural Network (CNN). Convolutional neural networks have since become a research hotspot in many fields, particularly pattern classification: because the network avoids complex image pre-processing and can take the raw image directly as input, it has been widely applied.
Typically, a convolutional neural network model comprises convolutional layers, pooling layers, fully connected layers, and a subsequent classifier such as a Support Vector Machine (SVM). The main types of computation involved in a CNN model are: matrix convolution; activation function processing, e.g. the linear activation function f(x) = x or nonlinear activation functions such as the sigmoid f(x) = 1/(1 + e^(-x)); and matrix pooling, including max pooling and average pooling. Finally, through matrix operations and some transcendental-function processing, the output of the CNN model is predicted and the process of object recognition is completed. Because a CNN model alternates and iterates different convolutional and pooling layers, its amount of computation is very large, and how to accelerate the computation of such models is therefore an important research topic in both academia and industry.
The activation functions used in current CNN models fall into two broad classes, linear and nonlinear, with a dozen or so in common use. The Rectified Linear Unit (ReLU) activation function is the most common of these; its mathematical expression is f(x) = max(0, x): when the input signal x is less than 0 the output is 0, and when it is greater than 0 the output equals the input. The outstanding advantages of the ReLU function are one-sided suppression and, compared with other activation functions, a wider excitation boundary and sparse activation. From the perspective of neuroscience, researchers have also found that neuron activity is sparse: in 2001, Attwell et al., based on observations of cerebral energy consumption, inferred that neuron coding works in a sparse and distributed manner, and in 2003 Lennie et al. estimated that only about 1-4% of the neurons in the brain are activated at the same time, further demonstrating the sparsity of neural activity. In signal terms, a neuron responds selectively to only a small part of its input signals, and deliberately shielding a large number of signals can improve the precision of learning and extract sparse features better and faster. From the standpoint of sparsity, therefore, the ReLU function is the model that best approximates human neurons.
In a CNN model, after image data has been processed by the activation function, the next stage of computation, the pooling operation, must be performed. Pooling mainly includes max pooling and average pooling: max pooling takes the maximum value within the pooling window as that window's output, while average pooling takes the average of all elements within the pooling window as its output. The purpose of either operation is to reduce the dimension of the image matrix as far as possible without significantly affecting the recognition accuracy of the model, reducing the amount of computation and also helping to avoid over-fitting.
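For reference, the ReLU and max pooling semantics just described can be written in a few lines of scalar C. This is only a minimal reference sketch under assumed conventions (row-major storage, non-overlapping k × k windows, an illustrative function name), not the vectorized method of the invention; it also uses the fact that ReLU output is non-negative, so the two stages fuse into a single pass:

```c
#include <stddef.h>

/* Reference semantics: ReLU f(x) = max(0, x) followed by non-overlapping
 * k x k max pooling. A is M x N row-major; C is (M/k) x (N/k); M and N
 * are assumed to be multiples of k. */
static void relu_maxpool_ref(const float *A, float *C,
                             size_t M, size_t N, size_t k)
{
    for (size_t r = 0; r < M / k; ++r) {
        for (size_t c = 0; c < N / k; ++c) {
            float m = 0.0f;            /* ReLU output is >= 0, so 0 is a safe floor */
            for (size_t i = 0; i < k; ++i)
                for (size_t j = 0; j < k; ++j) {
                    float x = A[(r * k + i) * N + (c * k + j)];
                    float relu = x > 0.0f ? x : 0.0f;   /* f(x) = max(0, x) */
                    if (relu > m) m = relu;
                }
            C[r * (N / k) + c] = m;    /* one output per pooling window */
        }
    }
}
```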
Convolutional neural networks are among the computing modules commonly used in current high-performance computing. They are typical memory-access-intensive and compute-intensive applications that place very high demands on a processor's functional units and memory bandwidth, and their computational complexity is very large. Current mainstream acceleration platforms include CNN computing platforms based on GPUs, CNN computing platforms based on FPGAs, computing platforms based on dedicated neural network accelerators, and CNN model computation accelerated on general-purpose CPUs or vector processors. A vector processor generally comprises a Vector Processing Unit (VPU) and a Scalar Processing Unit (SPU). The computing array of the VPU is composed of several Vector Processing Elements (VPEs) and is mainly responsible for vector computation; each VPE contains several homogeneous functional units, such as the multiply-accumulate units MAC0 and MAC1, an ALU, and a bit-processing (BP) unit. The SPU is mainly responsible for scalar computation and flow control, and the VPU and SPU can transmit and exchange data over data channels. A vector data access unit supports Load and Store of vector data, and a large-capacity dedicated vector memory is provided.
The content of the invention
The technical problem to be solved by the present invention is: in view of the problems existing in the prior art, the present invention provides a vectorized implementation method fusing the ReLU activation function and max pooling that is simple in principle, convenient to implement, and able to fully exploit the parallel computing capability of a vector processor and the parallelism of the algorithm. By fusing the ReLU activation function with the max pooling operation, the amount of data memory access is reduced, which in turn shortens the computation time of the convolutional neural network and improves the computational efficiency of the CNN model.
In order to solve the above technical problems, the present invention adopts the following technical scheme, a vectorized implementation method fusing the ReLU activation function and max pooling, whose steps are:
S1: compute the ReLU activation values of matrix A;
S2: compute the max pooling of the matrix processed by the ReLU activation function in step S1;
S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling operation of the whole matrix A.
As a further improvement of the present invention, the concrete steps of step S1 are:
S1.1: let the matrix requiring activation function processing after the convolution operation be A(M, N), the ReLU activation function be f(x) = max(0, x), and the number of vector processing elements (VPEs) be p; N is taken to be an integral multiple of p, kx and ky, and the max pooling window is kx × ky;
S1.2: load the first row of elements of matrix A using the vector VLOAD instruction;
S1.3: use the vector compare instruction VFCMPGD to compare the sizes of the vector registers, placing the logical values of the comparison result in a condition register;
S1.4: use the conditional vector move instruction VMOV to take the values greater than 0 from step S1.3 into a vector register;
S1.5: obtain the result after ReLU activation function processing;
S1.6: according to the max pooling window k, repeat steps S1.2 to S1.5 k times to obtain the ReLU activation results of k rows of elements of matrix A; the results are kept in vector registers and serve directly as the input values of the max pooling in step S2.
As a further improvement of the present invention, the concrete steps of step S2 are:
S2.1: take the k rows of elements computed in step S1.6 directly as the input of this computation;
S2.2: compare the 1st row of elements with the 2nd row of elements, placing the logical values of the comparison result in a condition register;
S2.3: use the conditional vector move instruction VMOV;
S2.4: obtain the element-wise maxima of the k rows of elements through k-1 comparisons;
S2.5: configure the shuffle mode and compare to obtain the maximum over each group of k adjacent columns from step S2.4;
S2.6: finally obtain, simultaneously, p/k max pooling results for pooling windows of size kx × ky.
As a further improvement of the present invention, the calculation formula of one max pooling result c0,0 in step S2.5 is:

c0,0 = max(ai,j), 0 ≤ i < kx, 0 ≤ j < ky

where c0,0 is the first element of the max pooling result matrix, kx and ky are the dimensions of the pooling window (in convolutional neural networks the pooling window is square, i.e. kx = ky = k), and ai,j are the elements of the matrix A to be max-pooled.
As a further improvement of the present invention, in the above steps the size of the pooling window is defined as sizeX, sizeY, and the horizontal or vertical displacement between two adjacent pooling windows is the stride; in the max pooling operation the pooling windows do not overlap, i.e. sizeX = sizeY = stride. For example, with sizeX = sizeY = stride = 2, a 16 × 16 matrix yields an 8 × 8 pooled result.
Compared with the prior art, the advantages of the present invention are: by fusing the ReLU activation operation and the max pooling computation into a single computation flow, the vectorized implementation method of the present invention avoids the time-consuming STORE and LOAD of intermediate results, and at the same time fully exploits the fact that the multiple parallel processing elements of the vector unit can perform the same operation simultaneously to carry out a large number of operations of the same type, thereby greatly improving the computational efficiency of the CNN model. The steps are simple and easy to implement.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the method of the present invention.
Fig. 2 is a schematic diagram of the general structure of a vector processor.
Fig. 3 is a schematic diagram of 2 × 2 max pooling in a concrete application example of the present invention.
Fig. 4 is a schematic image of the ReLU activation function used by the present invention in a concrete application example.
Fig. 5 is a schematic diagram of the vectorized implementation flow of the ReLU activation function of the present invention in a concrete application example.
Fig. 6 is a schematic diagram of the vectorized implementation flow of 2 × 2 max pooling of the present invention in a concrete application example.
Fig. 7 is a schematic diagram of non-overlapping pooling windows in the max pooling operation of the present invention in a concrete application example.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1 and Fig. 4, the steps of the vectorized implementation method fusing the ReLU activation function and max pooling of the present invention are:
S1: compute the ReLU activation values of matrix A;
S1.1: let the matrix requiring activation function processing after the convolution operation be A(M, N), the ReLU activation function be f(x) = max(0, x), and the number of VPEs be p; N is generally taken to be an integral multiple of p, kx and ky, and the max pooling window is kx × ky;
S1.2: load the first row of elements of matrix A using the vector VLOAD instruction, e.g. into vector register VR10, and initialize a vector register VR20 to 0 using the VMOVI instruction, i.e. VMOVI 0, VR20;
S1.3: use the vector compare instruction VFCMPGD to compare the sizes of vector registers VR10 and VR20, placing the logical values of the comparison result in a condition register, e.g. VR0: VFCMPGD VR10, VR20, VR0; if VR10[i] > VR20[i], 1 ≤ i ≤ p, then VR0[i] = 1, otherwise VR0[i] = 0;
S1.4: use the conditional vector move instruction VMOV to take the values greater than 0 from step S1.3 into a vector register. The computation instruction is: [VR0] VMOV VR10, VR20. This conditional vector instruction computes the ReLU activation values of p numbers simultaneously: the values in VR10 greater than 0 are moved into VR20, while the lanes holding values less than 0 remain 0;
S1.5: obtain the result VR20 after ReLU activation function processing;
S1.6: according to the max pooling window k, repeat steps S1.2 to S1.5 k times to obtain the ReLU activation results of k rows of elements of matrix A; the results are kept in vector registers and need not be stored back, serving directly as the input values of the max pooling in step S2.
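The instructions named in steps S1.2-S1.4 (VLOAD, VMOVI, VFCMPGD, conditional VMOV) belong to the target vector processor's instruction set. As a hedged illustration only, the following C sketch emulates one iteration of those steps, with an ordinary array standing in for the p-lane vector registers and an int array standing in for the condition register; the lane count P and the function name are illustrative assumptions:

```c
#define P 16  /* number of VPEs (lanes); 16 in the concrete example below */

/* Emulation of steps S1.2-S1.4 for one row of matrix A:
 * VLOAD -> VMOVI 0 -> VFCMPGD -> [VR0] VMOV. */
static void relu_row_emulated(const float *row, float *vr20)
{
    float vr10[P];
    int   vr0[P];                                 /* condition register */

    for (int i = 0; i < P; ++i) vr10[i] = row[i]; /* VLOAD: row into VR10 */
    for (int i = 0; i < P; ++i) vr20[i] = 0.0f;   /* VMOVI 0, VR20 */
    for (int i = 0; i < P; ++i)                   /* VFCMPGD VR10, VR20, VR0 */
        vr0[i] = vr10[i] > vr20[i];
    for (int i = 0; i < P; ++i)                   /* [VR0] VMOV VR10, VR20 */
        if (vr0[i]) vr20[i] = vr10[i];
    /* vr20 now holds f(x) = max(0, x) for p elements; lanes that stayed 0
     * were never overwritten, which is why VR20 is pre-initialized to 0. */
}
```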
S2: compute the max pooling of the matrix processed by the ReLU activation function in step S1;
S2.1: take the k rows of elements computed in step S1.6. Because the results of step S1.6 are kept directly in registers, they serve directly as the input of this computation; this avoids both the data-store time of step S1.6 and the data-LOAD time of step S2.2, so the computation time is reduced accordingly;
S2.2: compare the 1st row of elements with the 2nd row of elements, placing the logical values of the comparison result in a condition register, e.g. VR1: VFCMPGD VR20, VR21, VR1; if VR20[i] > VR21[i], 1 ≤ i ≤ p, then VR1[i] = 1, otherwise VR1[i] = 0;
S2.3: use the conditional vector move instruction VMOV: the values VR20[i] in the VPEs for which the condition register VR1[i] = 1 in step S2.2 are assigned to the corresponding VR21[i], while the values in VR21[i] that are larger than VR20[i] remain unchanged;
S2.4: obtain the element-wise maxima of the k rows of elements through k-1 comparisons;
S2.5: configure the shuffle mode and compare to obtain the maximum over each group of k adjacent columns from step S2.4;
S2.6: finally obtain, simultaneously, p/k max pooling results for pooling windows of size kx × ky;
S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling operation of the whole matrix A.
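Continuing the emulation above (same P lanes), the sketch below mirrors steps S2.2-S2.6 for k = 2: a compare and conditional move produce the element-wise maximum of the two ReLU'd rows, and a shuffle-and-compare reduces each pair of adjacent lanes to one 2 × 2 pooling result. The adjacent-pair shuffle pattern is an assumption; the actual processor's shuffle mode is configurable:

```c
/* Emulation of steps S2.2-S2.6 for k = 2. vr20/vr21 hold two ReLU'd rows;
 * out receives P/2 max pooling results for 2 x 2 windows. */
static void maxpool_2x2_emulated(const float *vr20, float *vr21, float *out)
{
    int   vr2[P];
    float shuf[P];

    for (int i = 0; i < P; ++i)               /* VFCMPGD VR20, VR21, VR2 */
        vr2[i] = vr20[i] > vr21[i];
    for (int i = 0; i < P; ++i)               /* [VR2] VMOV VR20, VR21 */
        if (vr2[i]) vr21[i] = vr20[i];        /* VR21: element-wise row max */

    for (int i = 0; i < P; i += 2) {          /* assumed shuffle: swap adjacent pairs */
        shuf[i]     = vr21[i + 1];
        shuf[i + 1] = vr21[i];
    }
    for (int i = 0; i < P / 2; ++i)           /* one result per 2 x 2 window */
        out[i] = vr21[2 * i] > shuf[2 * i] ? vr21[2 * i] : shuf[2 * i];
}
```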
The present invention is mainly applicable to vector processors; Fig. 2 is a schematic diagram of the general structure of a vector processor.
In a concrete application example, the calculation formula of one max pooling result c0,0 in step S2.5 is:

c0,0 = max(ai,j), 0 ≤ i < kx, 0 ≤ j < ky

where c0,0 is the first element of the max pooling result matrix, kx and ky are the dimensions of the pooling window (in convolutional neural networks the pooling window is generally square, i.e. kx = ky = k), and ai,j are the elements of the matrix A to be max-pooled. The max pooling flow is shown schematically in Fig. 3.
In a concrete application example, the size of the pooling window defined in the above steps is sizeX, sizeY, and the horizontal or vertical displacement between two adjacent pooling windows is the stride; the pooling windows do not overlap in the max pooling operation, i.e. sizeX = sizeY = stride, as shown in Fig. 7.
As shown in Fig. 5 and Fig. 6, the detailed steps of the present invention in a concrete application example are:
S100: compute the ReLU activation values of matrix A;
S1.1: let the matrix requiring activation function processing after the convolution operation be A(16, 16), the ReLU activation function be f(x) = max(0, x), the number of VPEs p = 16, and the max pooling window kx = ky = 2;
S1.2: using the vector VLOAD instruction, load the 16 elements of the 1st row of matrix A into vector register VR10 and the 16 elements of the 2nd row into VR11, and initialize the two vector registers VR20 and VR21 to 0 using the VMOVI instruction, i.e. VMOVI 0, VR20 and VMOVI 0, VR21;
S1.3: use the vector compare instruction VFCMPGD to compare VR10 with VR20 and VR11 with VR21, placing the logical values of the comparison results in the condition registers VR0 and VR1 respectively: VFCMPGD VR10, VR20, VR0 and VFCMPGD VR11, VR21, VR1; if VR10[i] > VR20[i] (1 ≤ i ≤ 16) then VR0[i] = 1, otherwise VR0[i] = 0, and similarly for VR1[i];
S1.4: use the conditional vector move instruction VMOV to take the values greater than 0 from step S1.3 into vector registers. The computation instructions are: [VR0] VMOV VR10, VR20 and [VR1] VMOV VR11, VR21; these conditional move instructions simultaneously compute the ReLU activation values of the first two rows of matrix A, 32 elements in total;
S1.5: the ReLU activation values of the first two rows of matrix A are now in vector registers VR20 and VR21;
S200: compute the max pooling of the matrix processed by the ReLU activation function in step S100;
S2.1: according to the max pooling window size kx = ky = 2, take the two rows of elements computed in step S1.5, i.e. VR20 and VR21, as the input of the max pooling layer;
S2.2: compare the 1st row of elements VR20 with the 2nd row of elements VR21, placing the logical values of the comparison result in condition register VR2. The computation instruction is: VFCMPGD VR20, VR21, VR2; if VR20[i] > VR21[i], 1 ≤ i ≤ p, then VR2[i] = 1, otherwise VR2[i] = 0;
S2.3: use the conditional vector move instruction VMOV: the values VR20[i] in the VPEs for which the condition register VR2[i] = 1 in step S2.2 are assigned to the corresponding VR21[i], while the values in VR21[i] that are larger than VR20[i] remain unchanged;
S2.4: after this single comparison, the element-wise maxima of the 2 rows of elements are obtained;
S2.5: configure the corresponding shuffle mode and compare to obtain the maxima of each pair of adjacent columns from step S2.4;
S2.6: finally obtain, simultaneously, 8 (= 16/2) max pooling results for 2 × 2 pooling windows;
S300: repeat steps S100 and S200 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling operation of the whole matrix A.
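To make the data flow of the A(16, 16), p = 16, kx = ky = 2 example concrete, here is a hypothetical driver for the two emulation sketches above (assumed to be in the same file); the input values are arbitrary mixed-sign test data:

```c
#include <stdio.h>

int main(void)
{
    float A[16][16], C[8][8];

    for (int i = 0; i < 16; ++i)               /* arbitrary mixed-sign test data */
        for (int j = 0; j < 16; ++j)
            A[i][j] = (float)((i * 16 + j) % 7 - 3);

    for (int r = 0; r < 16; r += 2) {          /* S300: traverse two rows at a time */
        float vr20[P], vr21[P], out[P / 2];
        relu_row_emulated(A[r],     vr20);     /* S100: ReLU of rows r and r+1 */
        relu_row_emulated(A[r + 1], vr21);
        maxpool_2x2_emulated(vr20, vr21, out); /* S200: 8 results at once */
        for (int c = 0; c < 8; ++c)
            C[r / 2][c] = out[c];
    }
    printf("C[0][0] = %g\n", C[0][0]);         /* first 2 x 2 pooled value */
    return 0;
}
```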
The above are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical schemes falling under the idea of the present invention belong to its protection scope. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principles of the present invention should also be regarded as falling within the protection scope of the present invention.
Claims (5)
1. A vectorized implementation method fusing the ReLU activation function and max pooling, characterized in that its steps are:
S1: compute the ReLU activation values of matrix A;
S2: compute the max pooling of the matrix processed by the ReLU activation function in step S1;
S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling operation of the whole matrix A.
2. The vectorized implementation method fusing the ReLU activation function and max pooling according to claim 1, characterized in that the concrete steps of step S1 are:
S1.1: let the matrix requiring activation function processing after the convolution operation be A(M, N), the ReLU activation function be f(x) = max(0, x), and the number of vector processing elements (VPEs) be p; N is taken to be an integral multiple of p, kx and ky, and the max pooling window is kx × ky;
S1.2: load the first row of elements of matrix A using the vector VLOAD instruction;
S1.3: use the vector compare instruction VFCMPGD to compare the sizes of the vector registers, placing the logical values of the comparison result in a condition register;
S1.4: use the conditional vector move instruction VMOV to take the values greater than 0 from step S1.3 into a vector register;
S1.5: obtain the result after ReLU activation function processing;
S1.6: according to the max pooling window k, repeat steps S1.2 to S1.5 k times to obtain the ReLU activation results of k rows of elements of matrix A; the results are kept in vector registers and serve directly as the input of the max pooling in step S2.
3. The vectorized implementation method fusing the ReLU activation function and max pooling according to claim 2, characterized in that the concrete steps of step S2 are:
S2.1: take the k rows of elements computed in step S1.6 directly as the input of this computation;
S2.2: compare the 1st row of elements with the 2nd row of elements, placing the logical values of the comparison result in a condition register;
S2.3: use the conditional vector move instruction VMOV;
S2.4: obtain the element-wise maxima of the k rows of elements through k-1 comparisons;
S2.5: configure the shuffle mode and compare to obtain the maximum over each group of k adjacent columns from step S2.4;
S2.6: finally obtain, simultaneously, p/k max pooling results for pooling windows of size kx × ky.
4. The vectorized implementation method fusing the ReLU activation function and max pooling according to claim 1, 2 or 3, characterized in that the calculation formula of one max pooling result c0,0 in step S2.5 is:

c0,0 = max(ai,j), 0 ≤ i < kx, 0 ≤ j < ky

where c0,0 is the first element of the max pooling result matrix, kx and ky are the dimensions of the pooling window (in convolutional neural networks the pooling window is square, i.e. kx = ky = k), and ai,j are the elements of the matrix A to be max-pooled.
5. The vectorized implementation method fusing the ReLU activation function and max pooling according to claim 1, 2 or 3, characterized in that the size of the pooling window defined in the above steps is sizeX, sizeY, the horizontal or vertical displacement between two adjacent pooling windows is the stride, and the pooling windows do not overlap in the max pooling operation, i.e. sizeX = sizeY = stride.
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- RJ01: Rejection of invention patent application after publication (application publication date: 2017-07-28)