CN106991472A - Vectorized implementation method fusing the ReLU activation function and max pooling - Google Patents

Vectorized implementation method fusing the ReLU activation function and max pooling

Info

Publication number
CN106991472A
CN106991472A CN201710201376.2A
Authority
CN
China
Prior art keywords
maximum
pooling
matrix
ReLU activation
max pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710201376.2A
Other languages
Chinese (zh)
Inventor
郭阳
张军阳
扈啸
王慧丽
胡敏慧
王子聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201710201376.2A priority Critical patent/CN106991472A/en
Publication of CN106991472A publication Critical patent/CN106991472A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a vectorized implementation method fusing the ReLU activation function and max pooling, the steps of which are: S1: compute the ReLU activation values of matrix A; S2: compute the max pooling of the matrix processed by the ReLU activation function in step S1; S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the whole matrix A. The invention has the advantages of being simple in principle and convenient to implement, and can fully exploit the parallel computing capability of a vector processor and the parallelism of the algorithm.

Description

Vectorized implementation method fusing the ReLU activation function and max pooling
Technical field
The present invention relates generally to the technical field of convolutional neural networks, and in particular to a vectorized implementation method fusing the ReLU activation function and max pooling.
Background technology
In the 1960s, while studying neurons responsible for local sensitivity and direction selection in the cat visual cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, and subsequently proposed the convolutional neural network (Convolutional Neural Network, CNN). Convolutional neural networks have since become one of the research hotspots in many fields, particularly pattern classification: because the network avoids complex image pre-processing and can take raw images directly as input, it has found increasingly wide application.
Typically, a convolutional neural network computation model comprises convolutional layers, pooling layers, fully connected layers, and a subsequent classifier such as a support vector machine (Support Vector Machine, SVM). The main types of computation involved in a convolutional neural network model are: matrix convolution; activation function processing, for example the linear activation function f(x) = x or nonlinear activation functions such as the sigmoid f(x) = 1/(1 + e^(-x)); and matrix pooling operations, including max pooling and average pooling. Finally, matrix operations and some transcendental-function processing produce the prediction output of the convolutional neural network model, completing the object recognition process. Because a convolutional neural network model alternates and iterates over different convolutional and pooling layers, its computational load is enormous, and how to accelerate the computation of such models is therefore an important research topic in both academia and industry.
The activation functions used in current convolutional neural network models fall mainly into two broad classes, linear and nonlinear, numbering roughly a dozen in all. The rectified linear unit (Rectified Linear Units, ReLU) is one of the most common; its mathematical expression is f(x) = max(0, x): when the input signal x is less than 0 the output is 0, and when it is greater than 0 the output equals the input. The outstanding advantages of the ReLU function are one-sided suppression and, relative to other activation functions, a wide excitation boundary and sparse activation. Neuroscientists have likewise observed the sparse activity of neurons: in 2001, based on observations of cerebral energy consumption, Attwell et al. inferred that neural coding is sparse and distributed, and in 2003 Lennie et al. estimated that only about 1-4% of the neurons in the brain are activated at any one time, further demonstrating the sparseness of neural activity. In signal terms, a neuron responds selectively to only a small part of its input signals; deliberately shielding a large number of signals improves learning precision and extracts sparse features better and faster. From this sparseness perspective, the ReLU function approximates the best model of the human neuron.
In a convolutional neural network model, after the image data has been processed by the activation function, the next stage of computation, the pooling operation, must be carried out. Pooling mainly comprises max pooling and average pooling: max pooling takes the maximum value within the pooling window as that window's output, while average pooling takes the average of all elements within the pooling window as its output. Whether average or max pooling, the purpose is to reduce the dimensionality of the image matrix as far as possible without significantly affecting model recognition accuracy, cutting the computational load and also guarding against over-fitting.
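As a worked instance (values chosen purely for illustration): for a 2 × 2 pooling window containing the values (1, -3, 2, 0), max pooling outputs max(1, -3, 2, 0) = 2, while average pooling outputs (1 - 3 + 2 + 0) / 4 = 0.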
Convolutional neural networks are one of the computing modules commonly used in current high-performance computing. They are typical memory-access-intensive and compute-intensive applications, placing very high demands on a processor's computing units and memory bandwidth, and their computational complexity is very high. Current mainstream acceleration platforms include convolutional neural network computing platforms based on GPUs, platforms based on FPGAs, platforms based on dedicated neural network accelerators, and acceleration of convolutional neural network models on general-purpose CPUs or vector processors. A vector processor generally comprises a vector processing unit (Vector Processing Unit, VPU) and a scalar processing unit (Scalar Processing Unit, SPU). The vector processing unit consists of a computing array of several vector processing elements (Vector Processing Element, VPE) and is mainly responsible for vector computation; each VPE contains multiple homogeneous functional units such as MAC0, MAC1, an ALU, and a bit-processing (BP) unit. The scalar processing unit is mainly responsible for scalar computation and flow control, and the VPU and SPU can transfer and exchange data over data channels. A vector data access unit supports Load and Store of vector data and provides a large-capacity dedicated vector memory.
Content of the invention
The technical problem to be solved by the present invention is: in view of the technical problems of the prior art, to provide a vectorized implementation method fusing the ReLU activation function and max pooling that is simple in principle, convenient to implement, and able to fully exploit the parallel computing capability of a vector processor and the parallelism of the algorithm; that is, by fusing the ReLU activation function with the max pooling operation, the method reduces the amount of data memory access, thereby shortening the computation time of the convolutional neural network and improving the computational efficiency of the convolutional neural network model.
In order to solve the above technical problems, the present invention adopts the following technical scheme:
A vectorized implementation method fusing the ReLU activation function and max pooling, the steps of which are (a reference sketch in C follows these steps):
S1: compute the ReLU activation values of matrix A;
S2: compute the max pooling of the matrix processed by the ReLU activation function in step S1;
S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the whole matrix A.
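For concreteness, the following is a minimal scalar C sketch of this fused flow, not the patented vector-instruction sequence: it applies f(x) = max(0, x) and takes the window maximum in a single pass over each kx × ky sub-block, so the activation result is never stored to memory. The function name fused_relu_maxpool, the fixed sizes M, N, K, and the sample matrix are illustrative assumptions.

#include <stdio.h>

#define M 4            /* matrix rows (assumed multiple of K) */
#define N 4            /* matrix cols (assumed multiple of K) */
#define K 2            /* pooling window size, kx = ky = K    */

/* Fused ReLU + max pooling in a single pass: for each K x K window,
 * apply f(x) = max(0, x) to every element and keep the window maximum,
 * so the ReLU result is never stored to memory between the two stages. */
static void fused_relu_maxpool(const float a[M][N], float c[M / K][N / K])
{
    for (int bi = 0; bi < M; bi += K) {
        for (int bj = 0; bj < N; bj += K) {
            float best = 0.0f;                   /* ReLU floor: max(0, x) >= 0 */
            for (int i = 0; i < K; i++) {
                for (int j = 0; j < K; j++) {
                    float v = a[bi + i][bj + j];
                    float r = v > 0.0f ? v : 0.0f;   /* ReLU, step S1     */
                    if (r > best)
                        best = r;                    /* max pool, step S2 */
                }
            }
            c[bi / K][bj / K] = best;
        }
    }
}

int main(void)
{
    const float a[M][N] = {
        { 1.0f, -3.0f,  2.0f,  0.5f },
        { 2.0f,  0.0f, -1.0f,  4.0f },
        {-2.0f,  5.0f,  0.0f, -0.5f },
        { 3.0f, -4.0f,  1.0f,  2.0f },
    };
    float c[M / K][N / K];

    fused_relu_maxpool(a, c);                /* step S3 traverses all sub-blocks */
    for (int i = 0; i < M / K; i++)
        for (int j = 0; j < N / K; j++)
            printf("c[%d][%d] = %.1f\n", i, j, c[i][j]);
    return 0;
}

Because max(0, x) >= 0 for every x, initializing the running maximum to 0 folds the ReLU into the pooling reduction itself; this is the source of the STORE/LOAD traffic the invention saves.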
As a further improvement of the present invention, the specific steps of step S1 are:
S1.1: let the matrix requiring activation processing after the convolution operation be A(M, N), the ReLU activation function be f(x) = max(0, x), and the number of vector processing elements (VPEs) be p; N is taken to be an integral multiple of p and of kx, ky, where the max pooling window is kx × ky;
S1.2: use the vector VLOAD instruction to load the first row of elements of matrix A;
S1.3: use the vector compare instruction VFCMPGD to compare the vector registers, placing the logical result of the comparison in the condition register;
S1.4: use the conditional vector move instruction VMOV to move the values greater than 0 from step S1.3 into a vector register;
S1.5: obtain the result after ReLU activation processing;
S1.6: according to the max pooling window size k, repeat steps S1.2 to S1.5 k times to obtain the ReLU activation results of k rows of matrix A; the results are kept in vector registers and serve directly as the input values for max pooling in step S2 (see the sketch after this list).
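The steps above target a proprietary vector ISA (VLOAD, VFCMPGD, conditional VMOV), so they cannot be reproduced verbatim here. As a sketch under that caveat, the compare-then-conditional-move pattern of steps S1.2 to S1.5 can be modeled in portable C, with a lane loop standing in for the p parallel VPEs; P, the function name relu_row, and the register names in the comments are illustrative assumptions.

#define P 16                                 /* assumed number of VPE lanes */

/* Model of S1.2-S1.5: VR20 starts at 0 (VMOVI 0, VR20); VFCMPGD sets a
 * per-lane condition flag where VR10[i] > VR20[i]; the conditional VMOV
 * copies VR10[i] into VR20[i] only where the flag is set, leaving 0
 * elsewhere, i.e. VR20[i] = max(0, VR10[i]) across all p lanes at once. */
void relu_row(const float vr10[P], float vr20[P])
{
    int vr0[P];                              /* condition register VR0   */
    for (int i = 0; i < P; i++)
        vr20[i] = 0.0f;                      /* VMOVI 0, VR20            */
    for (int i = 0; i < P; i++)
        vr0[i] = (vr10[i] > vr20[i]);        /* VFCMPGD VR10, VR20, VR0  */
    for (int i = 0; i < P; i++)
        if (vr0[i])
            vr20[i] = vr10[i];               /* [VR0] VMOV VR10, VR20    */
}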
As a further improvement of the present invention, the specific steps of step S2 are:
S2.1: take the k rows of elements computed in step S1.6 directly as the input of this computation;
S2.2: compare the elements of row 1 with those of row 2, placing the logical result of the comparison in the condition register;
S2.3: use the conditional vector move instruction VMOV;
S2.4: obtain, for each element position, the maximum over the k rows by performing k-1 comparisons;
S2.5: configure the shuffle pattern and, by comparison, obtain the maxima over the corresponding k columns of the results of step S2.4;
S2.6: finally obtain, simultaneously, p/k max pooling results for pooling windows of size kx × ky (a sketch of this reduction follows).
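Likewise as an illustrative model only (the shuffle configuration of step S2.5 is hardware-specific), the reduction of steps S2.2 to S2.6 can be sketched in C for a square k × k window: the k rows of ReLU results are first reduced element-wise with k-1 compare/conditional-move passes, then each run of k adjacent lanes, the role played by the configured shuffle, is reduced to a single pooled output, yielding p/k results at once. maxpool_rows and P are assumptions carried over from the previous sketch.

#define P 16                                 /* assumed number of VPE lanes */

/* Model of S2.2-S2.6 for a k x k window: rows[0..k-1] hold k rows of
 * ReLU results (one element per lane). First reduce across rows with
 * k-1 element-wise max passes (compare + conditional move), then reduce
 * each run of k adjacent lanes (the shuffle's job), giving p/k outputs. */
void maxpool_rows(float rows[][P], int k, float out[/* P/k */])
{
    float acc[P];
    for (int i = 0; i < P; i++)
        acc[i] = rows[0][i];
    for (int r = 1; r < k; r++)              /* k-1 row comparisons       */
        for (int i = 0; i < P; i++)
            if (rows[r][i] > acc[i])
                acc[i] = rows[r][i];         /* conditional VMOV          */
    for (int w = 0; w < P / k; w++) {        /* column (shuffle) reduction */
        float best = acc[w * k];
        for (int j = 1; j < k; j++)
            if (acc[w * k + j] > best)
                best = acc[w * k + j];
        out[w] = best;
    }
}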
As a further improvement of the present invention, the formula for computing a max pooling result c_{0,0} in step S2.5 is:
$$c_{0,0} = \max_{0 \le i \le k_x - 1}\left(\max_{0 \le j \le k_y - 1} a_{i,j}\right)$$
where c_{0,0} is the first element of the max pooling result matrix, kx and ky give the size of the pooling window (in convolutional neural networks the pooling window is square, i.e. kx = ky = k), and a_{i,j} is an element of the matrix A to be max pooled.
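For example, with a square 2 × 2 window (kx = ky = 2) the formula expands to c_{0,0} = max(a_{0,0}, a_{0,1}, a_{1,0}, a_{1,1}), i.e. the largest of the four elements under the window.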
As a further improvement of the present invention, in the above steps the pooling window size is defined as sizeX, sizeY, and the horizontal or vertical displacement between two adjacent pooling windows is stride; in the max pooling operation the pooling windows do not overlap, i.e. sizeX = sizeY = stride.
Compared with the prior art, the advantage of the present invention is that the vectorized implementation method fusing the ReLU activation function and max pooling fuses the ReLU activation operation and the max pooling computation into a single computation flow, avoiding the time-consuming STORE and LOAD of intermediate results. At the same time, it makes full use of the fact that the multiple parallel processing elements of the vector unit in a vector processor can carry out the same operation simultaneously, performing large numbers of operations of the same type at once and thereby greatly improving the computational efficiency of the convolutional neural network model. The steps are simple and easy to implement.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the method of the present invention.
Fig. 2 is a schematic diagram of the general structure of a vector processor.
Fig. 3 is a schematic diagram of 2 × 2 max pooling in a concrete application example of the present invention.
Fig. 4 is a schematic image of the ReLU activation function used in a concrete application example of the present invention.
Fig. 5 is a schematic diagram of the vectorized implementation flow of the ReLU activation function in a concrete application example of the present invention.
Fig. 6 is a schematic diagram of the vectorized implementation flow of 2 × 2 max pooling in a concrete application example of the present invention.
Fig. 7 is a schematic diagram of non-overlapping pooling windows in the max pooling operation in a concrete application example of the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1 and Fig. 4, the steps of the vectorized implementation method fusing the ReLU activation function and max pooling of the present invention are:
S1: compute the ReLU activation values of matrix A;
S1.1: let the matrix requiring activation processing after the convolution operation be A(M, N), the ReLU activation function be f(x) = max(0, x), and the number of VPEs be p; N is typically taken to be an integral multiple of p and of kx, ky, with the max pooling window kx × ky;
S1.2: use the vector VLOAD instruction to load the first row of elements of matrix A, for example into vector register VR10, and use the VMOVI instruction to initialize a vector register VR20 to 0, i.e., VMOVI 0, VR20;
S1.3: use the vector compare instruction VFCMPGD to compare vector registers VR10 and VR20, placing the logical result of the comparison in a condition register such as VR0: VFCMPGD VR10, VR20, VR0; if VR10[i] > VR20[i], 1 ≤ i ≤ p, then VR0[i] = 1, otherwise VR0[i] = 0;
S1.4: use the conditional vector move instruction VMOV to move the values greater than 0 from step S1.3 into a vector register. The instruction is: [VR0] VMOV VR10, VR20. This single conditional vector instruction computes the ReLU activation values of p elements simultaneously: values in VR10 greater than 0 are placed in VR20, and values less than 0 are left as 0;
S1.5: obtain the result of ReLU activation processing in VR20;
S1.6: according to the max pooling window size k, repeat steps S1.2 to S1.5 k times to obtain the ReLU activation results of k rows of matrix A; the results are kept in vector registers and need not be stored to memory, serving directly as the input values for max pooling in step S2.
S2: compute the max pooling of the matrix processed by the ReLU activation function in step S1;
S2.1: take the k rows of elements computed in step S1.6; because the results of step S1.6 are kept directly in registers, they serve directly as the input of this computation. This avoids both the data-store time of step S1.6 and the data-LOAD time of step S2.2, reducing the computation time accordingly;
S2.2: compare the elements of row 1 with those of row 2, placing the logical result of the comparison in a condition register such as VR1: VFCMPGD VR20, VR21, VR1; if VR20[i] > VR21[i], 1 ≤ i ≤ p, then VR1[i] = 1, otherwise VR1[i] = 0;
S2.3: use the conditional vector move instruction VMOV to assign the values VR20[i] in the VPEs for which the condition register from step S2.2 holds VR1[i] = 1 to the corresponding VR21[i]; values in VR21[i] larger than VR20[i] remain unchanged;
S2.4: obtain, for each element position, the maximum over the k rows by performing k-1 comparisons;
S2.5: configure the shuffle pattern and, by comparison, obtain the maxima over the corresponding k columns of the results of step S2.4;
S2.6: finally obtain, simultaneously, p/k max pooling results for pooling windows of size kx × ky;
S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the whole matrix A.
The present invention is mainly applicable to vector processors; Fig. 2 is a schematic diagram of the general structure of a vector processor. In a concrete application example, the formula for computing a max pooling result c_{0,0} in step S2.5 is:
$$c_{0,0} = \max_{0 \le i \le k_x - 1}\left(\max_{0 \le j \le k_y - 1} a_{i,j}\right)$$
where c_{0,0} is the first element of the max pooling result matrix, kx and ky give the size of the pooling window (in convolutional neural networks the pooling window is generally square, i.e. kx = ky = k), and a_{i,j} is an element of the matrix A to be max pooled; the max pooling flow is shown schematically in Fig. 3.
In a concrete application example, the pooling window size defined in the above steps is sizeX, sizeY, and the horizontal or vertical displacement between two adjacent pooling windows is stride; in the max pooling operation the pooling windows do not overlap, i.e. sizeX = sizeY = stride, as shown in Fig. 7. For example, pooling a 16 × 16 matrix with 2 × 2 windows and stride 2 yields an 8 × 8 result matrix.
As shown in Fig. 5 and Fig. 6, in a concrete application example of the present invention, the detailed steps are:
S100: compute the ReLU activation values of matrix A;
S1.1: let the matrix requiring activation processing after the convolution operation be A(16, 16), the ReLU activation function be f(x) = max(0, x), the number of VPEs p be 16, and the max pooling window kx = ky = 2;
S1.2: use vector VLOAD instructions to load the 16 elements of row 1 of matrix A into vector register VR10 and the 16 elements of row 2 into VR11, and use the vector move instruction VMOVI to initialize two vector registers VR20 and VR21 to 0, i.e., VMOVI 0, VR20 and VMOVI 0, VR21;
S1.3: use the vector compare instruction VFCMPGD to compare vector registers VR10 with VR20 and VR11 with VR21, placing the logical results in condition registers VR0 and VR1 respectively: VFCMPGD VR10, VR20, VR0 and VFCMPGD VR11, VR21, VR1; if VR10[i] > VR20[i] (1 ≤ i ≤ 16), then VR0[i] = 1, otherwise VR0[i] = 0; similarly VR1[i] = 1 if VR11[i] > VR21[i], otherwise VR1[i] = 0;
S1.4: use the conditional vector move instruction VMOV to move the values greater than or equal to 0 from step S1.3 into vector registers. The instructions are: [VR0] VMOV VR10, VR20 and [VR1] VMOV VR11, VR21; these conditional move instructions simultaneously compute the ReLU activation values of the first two rows of matrix A, 32 elements in total;
S1.5: the ReLU activation values of the first two rows of matrix A are now held in vector registers VR20 and VR21;
S200: compute the max pooling of the matrix processed by the ReLU activation function in step S100;
S2.1: according to the max pooling window size kx = ky = 2, take the two rows of elements computed in step S1.5, i.e. VR20 and VR21, as the input of the max pooling layer;
S2.2: compare row 1 (VR20) with row 2 (VR21), placing the logical result of the comparison in condition register VR2. The instruction is: VFCMPGD VR20, VR21, VR2; if VR20[i] > VR21[i], 1 ≤ i ≤ p, then VR2[i] = 1, otherwise VR2[i] = 0;
S2.3: use the conditional vector move instruction VMOV to assign the values VR20[i] in the VPEs for which the condition register from step S2.2 holds VR2[i] = 1 to the corresponding VR21[i]; values in VR21[i] larger than VR20[i] remain unchanged;
S2.4: with this single comparison, obtain the maximum over the 2 rows at each element position;
S2.5: configure the corresponding shuffle pattern and, by comparison, obtain the maxima of each pair of adjacent columns of the results of step S2.4;
S2.6: finally obtain, simultaneously, 16/2 = 8 max pooling results for 2 × 2 pooling windows;
S300: repeat steps S100 and S200 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the whole matrix A.
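Tying the embodiment together, a hypothetical driver under the same assumptions as the sketches above (p = 16 lanes, k = 2, and the helper functions relu_row and maxpool_rows defined earlier) would walk the 16 × 16 matrix two rows at a time:

#define P 16
#define K 2

/* Assumed driver for the 16 x 16 embodiment: for each pair of rows, the
 * ReLU model of step S100 fills two register-like arrays, then the
 * row/column reduction of step S200 yields 8 pooled values; 8 row pairs
 * cover the whole matrix, which is the traversal of step S300. */
extern void relu_row(const float vr10[P], float vr20[P]);
extern void maxpool_rows(float rows[][P], int k, float out[]);

void pool_16x16(const float a[16][16], float c[8][8])
{
    float rows[K][P];
    for (int pair = 0; pair < 16 / K; pair++) {
        for (int r = 0; r < K; r++)
            relu_row(a[pair * K + r], rows[r]);  /* S100: ReLU of 2 rows   */
        maxpool_rows(rows, K, c[pair]);          /* S200: 8 pooled values  */
    }
}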
The above are only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited to the above embodiments: all technical schemes falling under the idea of the present invention belong to its scope of protection. It should be pointed out that, for those of ordinary skill in the art, improvements and modifications that do not depart from the principles of the present invention should also be regarded as within the scope of protection of the present invention.

Claims (5)

1. A vectorized implementation method fusing the ReLU activation function and max pooling, characterized in that its steps are:
S1: compute the ReLU activation values of matrix A;
S2: compute the max pooling of the matrix processed by the ReLU activation function in step S1;
S3: repeat steps S1 and S2 until all sub-blocks of matrix A have been traversed, finally completing the ReLU activation processing and max pooling of the whole matrix A.
2. The vectorized implementation method fusing the ReLU activation function and max pooling according to claim 1, characterized in that the specific steps of step S1 are:
S1.1: let the matrix requiring activation processing after the convolution operation be A(M, N), the ReLU activation function be f(x) = max(0, x), and the number of vector processing elements (VPEs) be p; N is taken to be an integral multiple of p and of kx, ky, where the max pooling window is kx × ky;
S1.2: use the vector VLOAD instruction to load the first row of elements of matrix A;
S1.3: use the vector compare instruction VFCMPGD to compare the vector registers, placing the logical result of the comparison in the condition register;
S1.4: use the conditional vector move instruction VMOV to move the values greater than 0 from step S1.3 into a vector register;
S1.5: obtain the result after ReLU activation processing;
S1.6: according to the max pooling window size k, repeat steps S1.2 to S1.5 k times to obtain the ReLU activation results of k rows of matrix A; the results are kept in vector registers and serve directly as the input values for max pooling in step S2.
3. The vectorized implementation method fusing the ReLU activation function and max pooling according to claim 2, characterized in that the specific steps of step S2 are:
S2.1: take the k rows of elements computed in step S1.6 directly as the input of this computation;
S2.2: compare the elements of row 1 with those of row 2, placing the logical result of the comparison in the condition register;
S2.3: use the conditional vector move instruction VMOV;
S2.4: obtain, for each element position, the maximum over the k rows by performing k-1 comparisons;
S2.5: configure the shuffle pattern and, by comparison, obtain the maxima over the corresponding k columns of the results of step S2.4;
S2.6: finally obtain, simultaneously, p/k max pooling results for pooling windows of size kx × ky.
4. The vectorized implementation method fusing the ReLU activation function and max pooling according to claim 1, 2 or 3, characterized in that the formula for computing a max pooling result c_{0,0} in step S2.5 is:
$$c_{0,0} = \max_{0 \le i \le k_x - 1}\left(\max_{0 \le j \le k_y - 1} a_{i,j}\right)$$
where c_{0,0} is the first element of the max pooling result matrix, kx and ky give the size of the pooling window (in convolutional neural networks the pooling window is square, i.e. kx = ky = k), and a_{i,j} is an element of the matrix A to be max pooled.
5. The vectorized implementation method fusing the ReLU activation function and max pooling according to claim 1, 2 or 3, characterized in that in the above steps the pooling window size is defined as sizeX, sizeY, the horizontal or vertical displacement between two adjacent pooling windows is stride, and in the max pooling operation the pooling windows do not overlap, i.e. sizeX = sizeY = stride.
CN201710201376.2A 2017-03-30 2017-03-30 Vectorized implementation method fusing the ReLU activation function and max pooling Pending CN106991472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710201376.2A CN106991472A (en) 2017-03-30 2017-03-30 Vectorized implementation method fusing the ReLU activation function and max pooling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710201376.2A CN106991472A (en) 2017-03-30 2017-03-30 Vectorized implementation method fusing the ReLU activation function and max pooling

Publications (1)

Publication Number Publication Date
CN106991472A true CN106991472A (en) 2017-07-28

Family

ID=59411852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710201376.2A Pending CN106991472A (en) 2017-03-30 2017-03-30 Vectorized implementation method fusing the ReLU activation function and max pooling

Country Status (1)

Country Link
CN (1) CN106991472A (en)


Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583561B (en) * 2017-09-28 2021-05-07 杭州海康威视数字技术股份有限公司 Activation quantization method and device for deep neural network
CN109583561A (en) * 2017-09-28 2019-04-05 杭州海康威视数字技术股份有限公司 Activation quantization method and device for a deep neural network
CN109685058A (en) * 2017-10-18 2019-04-26 杭州海康威视数字技术股份有限公司 Image recognition method and apparatus, and computer device
US11347977B2 (en) 2017-10-18 2022-05-31 Hangzhou Hikvision Digital Technology Co., Ltd. Lateral and longitudinal feature based image object recognition method, computer device, and non-transitory computer readable storage medium
CN109754359A (en) * 2017-11-01 2019-05-14 腾讯科技(深圳)有限公司 Pooling processing method and system applied to convolutional neural networks
US11537857B2 (en) 2017-11-01 2022-12-27 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
US11734554B2 (en) 2017-11-01 2023-08-22 Tencent Technology (Shenzhen) Company Limited Pooling processing method and system applied to convolutional neural network
CN108205703A (en) * 2017-12-29 2018-06-26 中国人民解放军国防科技大学 Multi-input multi-output matrix average value pooling vectorization implementation method
CN109165733A (en) * 2018-07-11 2019-01-08 中国人民解放军国防科技大学 Multi-input multi-output matrix maximum pooling vectorization implementation method
US10970604B2 (en) 2018-09-27 2021-04-06 Industrial Technology Research Institute Fusion-based classifier, classification method, and classification system
CN109727376A (en) * 2018-12-29 2019-05-07 北京沃东天骏信息技术有限公司 Method and apparatus for generating a configuration file, and vending device
CN113892092A (en) * 2019-02-06 2022-01-04 瀚博控股公司 Method and system for convolution model hardware accelerator
WO2021035621A1 (en) * 2019-08-29 2021-03-04 深圳市大疆创新科技有限公司 Extreme point extraction method and apparatus, and computer-readable storage medium
CN110796236B (en) * 2019-10-21 2022-06-17 中国人民解放军国防科技大学 Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
CN110796236A (en) * 2019-10-21 2020-02-14 中国人民解放军国防科技大学 Vectorization implementation method for pooling of multi-sample multi-channel convolutional neural network
CN113762452A (en) * 2020-06-04 2021-12-07 合肥君正科技有限公司 Method for quantizing PRELU activation function
CN113762452B (en) * 2020-06-04 2024-01-02 合肥君正科技有限公司 Method for quantizing PRELU activation function
CN112598640B (en) * 2020-12-22 2021-09-14 哈尔滨市科佳通用机电股份有限公司 Water filling port cover plate loss detection method based on deep learning
CN112598640A (en) * 2020-12-22 2021-04-02 哈尔滨市科佳通用机电股份有限公司 Water filling port cover plate loss detection method based on deep learning

Similar Documents

Publication Publication Date Title
CN106991472A (en) A kind of fusion ReLU activation primitives and the vectorization implementation method in maximum pond
TWI759361B (en) An architecture, method, computer-readable medium, and apparatus for sparse neural network acceleration
CN105892989B (en) Neural network accelerator and operational method thereof
US10846591B2 (en) Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks
TWI775605B (en) Deep vision processor
CN107578098B (en) Neural network processor based on systolic array
Cong et al. Minimizing computation in convolutional neural networks
EP3407266B1 (en) Artificial neural network calculating device and method for sparse connection
CN107609641A (en) Sparse neural network framework and its implementation
CN106970896A (en) The vectorization implementation method of the two-dimensional matrix convolution of vector processor-oriented
CN107239824A (en) Apparatus and method for realizing sparse convolution neutral net accelerator
CN106529670A (en) Neural network processor based on weight compression, design method, and chip
CN108090565A (en) Accelerated method is trained in a kind of convolutional neural networks parallelization
CN108629406B (en) Arithmetic device for convolutional neural network
CN106529668A (en) Operation device and method of accelerating chip which accelerates depth neural network algorithm
CN107341541A (en) A kind of apparatus and method for performing full articulamentum neural metwork training
CN109190756A (en) Arithmetic unit based on Winograd convolution and the neural network processor comprising the device
CN105930902A (en) Neural network processing method and system
CN106991665A (en) Method based on CUDA image co-registration parallel computations
WO2022067508A1 (en) Neural network accelerator, and acceleration method and device
CN107704921A (en) The algorithm optimization method and device of convolutional neural networks based on Neon instructions
CN108388537A (en) A kind of convolutional neural networks accelerator and method
CN110163333A (en) The parallel optimization method of convolutional neural networks
US20220019408A1 (en) Method and apparatus with neural network processing
CN108205703A (en) Multi-input multi-output matrix average value pooling vectorization implementation method

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170728)