CN107704921A - The algorithm optimization method and device of convolutional neural networks based on Neon instructions - Google Patents

The algorithm optimization method and device of convolutional neural networks based on Neon instructions

Info

Publication number
CN107704921A
CN107704921A (application CN201710974484.3A)
Authority
CN
China
Prior art keywords
matrixes
convolution
matrix
row
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710974484.3A
Other languages
Chinese (zh)
Inventor
朱明
曾建平
张智鹏
耿磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhi Xinyuandong Science And Technology Ltd
Original Assignee
Beijing Zhi Xinyuandong Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhi Xinyuandong Science And Technology Ltd filed Critical Beijing Zhi Xinyuandong Science And Technology Ltd
Priority to CN201710974484.3A priority Critical patent/CN107704921A/en
Publication of CN107704921A publication Critical patent/CN107704921A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G06F17/153 Multidimensional correlation or convolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides an algorithm optimization method for convolutional neural networks based on Neon instructions. The method includes: converting the convolution kernel images of a convolutional layer into a corresponding matrix A, with the number of columns of matrix A padded to a multiple of 4; inputting an image to be convolved and converting it into a corresponding matrix B, with the number of rows of matrix B padded to a multiple of 4; transposing matrix B to obtain the transposed matrix Bt; computing the row-by-row dot products of matrix A and matrix Bt; and performing parallel optimization using Neon instructions. Compared with the prior art, the invention can effectively improve the computational performance of convolutional neural networks.

Description

The algorithm optimization method and device of convolutional neural networks based on Neon instructions
Technical field
The present invention relates to image processing, video surveillance, and convolutional neural networks, and more particularly to an algorithm optimization method and device for convolutional neural networks based on Neon instructions.
Background technology
With the rapid development of artificial intelligence, deep learning has been increasingly introduced into the fields of image processing and pattern recognition, and performs well in solving the related problems. Among deep learning models, the convolutional neural network (CNN) is a model structure that is particularly good at processing images, especially machine learning problems involving large images, and is the most widely applied and most deeply studied.
However, in practical image processing and pattern recognition applications, convolutional neural networks are usually implemented with many network layers, so their computational complexity is high and they contain a large number of dense image convolution operations. The resulting long running time directly affects the performance of algorithms based on convolutional neural networks and limits their application, particularly on embedded front-end video surveillance devices such as ARM platforms.
From the current technical standpoint of convolutional neural network algorithm optimization, convolution operations are mainly accelerated through matrix acceleration: the convolution kernels and the input image are converted into two large matrices, and the convolution result is obtained from the product of the large matrices. Once convolution is converted into matrix operations, matrix acceleration can be realized on platforms that support third-party matrix acceleration libraries, and the performance of convolutional neural network algorithms is greatly improved. However, on embedded ARM platforms that do not support third-party matrix acceleration libraries, convolutional neural network algorithms remain very time-consuming and real-time performance is poor.
Neon is a 128-bit SIMD (Single Instruction, Multiple Data) extension architecture for ARM Cortex-A series processors. From smartphones and mobile computing devices to HDTV, it has been recognized as one of the most capable processor technologies in the multimedia application field. The dedicated design of the Neon instructions simplifies software porting between different platforms and provides low-power, flexible acceleration for intensive multimedia applications such as Dolby Mobile.
In summary, there is a present need for a Neon instruction-based convolutional neural network algorithm optimization method, applicable to ARM platforms, that can effectively reduce computation time.
Summary of the invention
In view of this, a primary object of the present invention is to reduce computing resource consumption and optimize convolutional neural network algorithms.
To achieve the above object, according to a first aspect of the present invention, an algorithm optimization method for convolutional neural networks based on Neon instructions is provided. The method includes:
a first step of converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of matrix A to a multiple of 4;
a second step of inputting an image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
a third step of transposing matrix B to obtain the transposed matrix Bt;
a fourth step of computing the row-by-row dot products of matrix A and matrix Bt; and
a fifth step of performing parallel optimization using Neon instructions.
Further, the first step includes: for the CNum convolution kernel images of size N × N in the convolutional layer, taking each convolution kernel image in turn as one row of matrix data to obtain matrix A with CNum rows and N × N columns; and expanding the number of columns of matrix A to a multiple of 4, with the values in each padded column set to 0.
Further, the second step includes: inputting the image to be convolved by the convolutional layer; performing sliding-window convolution with the N × N convolution kernel to obtain MNum convolution feature sub-images; taking each convolution feature sub-image in turn as one column of matrix data to obtain matrix B with N × N rows and MNum columns; and expanding the number of rows of matrix B to a multiple of 4, with the values in each padded row set to 0.
Further, in the third step, the rows and columns of matrix B are transposed to obtain matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
Further, the fifth step includes: in the Neon instruction set, using the load instruction vld1q_f32 to load 4 floating-point numbers; using the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; using the add instruction vaddq_f32 to add 4 floating-point numbers; using the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and using the pairwise-add instruction vpadd_f32 to first add the pairs of floating-point numbers from vget_low_f32 and vget_high_f32 and then accumulate the adjacent results.
According to another aspect of the present invention, an algorithm optimization device for convolutional neural networks based on Neon instructions is provided. The device includes:
a convolution kernel image matrix processing module, for converting the convolution kernel images of a convolutional layer into a corresponding matrix A and padding the number of columns of matrix A to a multiple of 4;
a to-be-convolved input image matrix processing module, for inputting an image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
a matrix transposition module, for transposing matrix B to obtain the transposed matrix Bt;
a matrix row-by-row dot product module, for computing the row-by-row dot products of matrix A and matrix Bt; and
a Neon optimization processing module, for performing parallel optimization using Neon instructions.
Further, the convolution kernel image matrix processing module is configured to: for the CNum convolution kernel images of size N × N in the convolutional layer, take each convolution kernel image in turn as one row of matrix data to obtain matrix A with CNum rows and N × N columns; and expand the number of columns of matrix A to a multiple of 4, with the values in each padded column set to 0.
Further, the to-be-convolved input image matrix processing module is configured to: input the image to be convolved by the convolutional layer; perform sliding-window convolution with the N × N convolution kernel to obtain MNum convolution feature sub-images; take each convolution feature sub-image in turn as one column of matrix data to obtain matrix B with N × N rows and MNum columns; and expand the number of rows of matrix B to a multiple of 4, with the values in each padded row set to 0.
The matrix transposition module is configured to transpose the rows and columns of matrix B to obtain matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
Further, the Neon optimization processing module is configured to: in the Neon instruction set, use the load instruction vld1q_f32 to load 4 floating-point numbers; use the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; use the add instruction vaddq_f32 to add 4 floating-point numbers; use the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and use the pairwise-add instruction vpadd_f32 to first add the pairs of floating-point numbers from vget_low_f32 and vget_high_f32 and then accumulate the adjacent results.
Compared with existing convolutional neural network algorithm optimization methods, the Neon instruction-based algorithm optimization method of the present invention effectively improves the computational performance of convolutional neural networks through the matrix conversion of the convolution kernel images and of the image to be convolved, together with parallel optimization using the Neon instructions of the ARM platform.
Brief description of the drawings
Fig. 1 shows a flow chart of an embodiment of the algorithm optimization method for convolutional neural networks based on Neon instructions according to the present invention.
Fig. 2 shows a structural schematic diagram of an embodiment of the algorithm optimization device for convolutional neural networks based on Neon instructions according to the present invention.
Detailed description of the embodiments
To enable the examiner to further understand the structure, features, and other objects of the present invention, the preferred embodiments are described in detail below with reference to the accompanying drawings. The illustrated preferred embodiments serve only to explain the technical solutions of the invention and do not limit the invention.
Fig. 1 gives a flow chart of a first embodiment of the algorithm optimization method for convolutional neural networks based on Neon instructions according to the present invention. As shown in Fig. 1, the method includes:
First step S1: convert the convolution kernel images of a convolutional layer into a corresponding matrix A, and pad the number of columns of matrix A to a multiple of 4;
Second step S2: input an image to be convolved, convert it into a corresponding matrix B, and pad the number of rows of matrix B to a multiple of 4;
Third step S3: transpose matrix B to obtain the transposed matrix Bt;
Fourth step S4: compute the row-by-row dot products of matrix A and matrix Bt; and
Fifth step S5: perform parallel optimization using Neon instructions.
Further, the first step S1 includes: for the CNum convolution kernel images of size N × N in the convolutional layer, taking each convolution kernel image in turn as one row of matrix data to obtain matrix A with CNum rows and N × N columns; and expanding the number of columns of matrix A to a multiple of 4, with the values in each padded column set to 0.
In one embodiment, for the 16 convolution kernel images of size 3 × 3 in the convolutional layer, the i-th convolution kernel image is taken as the matrix data of the i-th row, i = {0, 1, 2, …, 15}, yielding matrix A with 16 rows and 9 columns; the number of columns of matrix A is then expanded to the next multiple of 4, namely 12, with the values in each padded column set to 0.
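As a minimal sketch in C (an illustration added here, not part of the patent text; the function and parameter names are assumptions made for illustration), this kernel-to-matrix step could look as follows:
#include <string.h>
/* Hypothetical sketch: flatten CNum N x N kernels into matrix A in
   row-major order, padding the column count up to a multiple of 4
   with zeros. With cnum = 16, n = 3 and cols_padded = 12 this
   reproduces the 16 x 12 matrix A of the embodiment above. */
void kernels_to_A(const float *kernels, int cnum, int n,
                  float *A, int cols_padded)
{
    int k2 = n * n;                       /* true column count, e.g. 9 */
    for (int i = 0; i < cnum; ++i) {
        /* copy the i-th flattened kernel into row i */
        memcpy(&A[i * cols_padded], &kernels[i * k2], k2 * sizeof(float));
        /* zero the alignment columns, e.g. columns 9..11 */
        for (int j = k2; j < cols_padded; ++j)
            A[i * cols_padded + j] = 0.0f;
    }
}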
Further, the second step S2 includes: inputting the image to be convolved by the convolutional layer; performing sliding-window convolution with the N × N convolution kernel to obtain MNum convolution feature sub-images; taking each convolution feature sub-image in turn as one column of matrix data to obtain matrix B with N × N rows and MNum columns; and expanding the number of rows of matrix B to a multiple of 4, with the values in each padded row set to 0.
In one embodiment, 3 × 3 sliding-window convolution is applied to the input image to obtain the convolution feature sub-images; the i-th convolution feature sub-image is taken as the matrix data of the i-th column, i = {0, 1, 2, …, MNum−1}, yielding matrix B with 9 rows and MNum columns; the number of rows of matrix B is then expanded to the next multiple of 4, namely 12, with the values in each padded row set to 0.
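A corresponding sketch of this sliding-window construction of matrix B, again with assumed names, and assuming stride 1 with no border padding (so that MNum = (img_h − n + 1) × (img_w − n + 1)):
/* Hypothetical sketch: build matrix B column by column. Column j
   holds the j-th N x N sliding window of the input image, flattened;
   rows n*n .. rows_padded-1 are the zero-filled alignment rows. */
void image_to_B(const float *img, int img_w, int img_h, int n,
                float *B, int rows_padded)
{
    int mnum = (img_h - n + 1) * (img_w - n + 1);
    int col = 0;
    for (int y = 0; y + n <= img_h; ++y) {
        for (int x = 0; x + n <= img_w; ++x, ++col) {
            int row = 0;
            for (int ky = 0; ky < n; ++ky)
                for (int kx = 0; kx < n; ++kx, ++row)
                    B[row * mnum + col] = img[(y + ky) * img_w + (x + kx)];
            for (; row < rows_padded; ++row)
                B[row * mnum + col] = 0.0f;   /* alignment rows */
        }
    }
}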
In the third step S3, the rows and columns of matrix B are transposed to obtain matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
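The transposition of step S3 is a plain row/column swap; a sketch under the same assumed row-major layout:
/* Hypothetical sketch: Bt[j][i] = B[i][j]. After the transposition
   each row of Bt holds rows_padded contiguous floats (a multiple of
   4), so step S5 can consume it 4 floats at a time with vld1q_f32. */
void transpose_B(const float *B, int rows_padded, int mnum, float *Bt)
{
    for (int i = 0; i < rows_padded; ++i)
        for (int j = 0; j < mnum; ++j)
            Bt[j * rows_padded + i] = B[i * mnum + j];
}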
Further, the fifth step S5 includes: in the Neon instruction set, using the load instruction vld1q_f32 to load 4 floating-point numbers; using the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; using the add instruction vaddq_f32 to add 4 floating-point numbers; using the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and using the pairwise-add instruction vpadd_f32 to first add the pairs of floating-point numbers from vget_low_f32 and vget_high_f32 and then accumulate the adjacent results.
In one embodiment, for an 8 × 8 matrix A and matrix Bt, let the first row vector of matrix A be [a1 a2 a3 … a8] and the first row vector of matrix Bt be [b1 b2 b3 … b8]. The Neon load instruction vld1q_f32 performs parallel accesses, loading 4 floating-point numbers in a single instruction: a 128-bit register Va stores the 4 floating-point numbers a1, a2, a3, a4 or a5, a6, a7, a8, and a 128-bit register Vb stores the 4 floating-point numbers b1, b2, b3, b4 or b5, b6, b7, b8. The multiply instruction vmulq_f32 computes the element-wise products Va×b = [a1×b1, a2×b2, a3×b3, a4×b4] or Va×b = [a5×b5, a6×b6, a7×b7, a8×b8]. The add instruction vaddq_f32 then adds the two product vectors: Va+b = [a1×b1+a5×b5, a2×b2+a6×b6, a3×b3+a7×b7, a4×b4+a8×b8]. The split instruction vget_low_f32 obtains the two floating-point numbers a1×b1+a5×b5 and a2×b2+a6×b6, and vget_high_f32 obtains the two floating-point numbers a3×b3+a7×b7 and a4×b4+a8×b8. The pairwise-add instruction vpadd_f32 first computes the sum of the two numbers from vget_low_f32, a1×b1+a5×b5+a2×b2+a6×b6, and the sum of the two numbers from vget_high_f32, a3×b3+a7×b7+a4×b4+a8×b8, and then accumulates these to obtain the result: Result = a1×b1+a2×b2+a3×b3+a4×b4+a5×b5+a6×b6+a7×b7+a8×b8.
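This instruction sequence can be expressed with the standard Neon intrinsics from arm_neon.h. The following sketch (an added illustration, not patent text) computes the 8-term dot product in exactly the order described; the final lane extraction via vget_lane_f32 is an added assumption, since the patent text does not name that intrinsic:
#include <arm_neon.h>
/* Dot product of two 8-float rows a = [a1..a8], b = [b1..b8]. */
static float dot8_neon(const float *a, const float *b)
{
    float32x4_t va_lo = vld1q_f32(a);        /* a1 a2 a3 a4 */
    float32x4_t va_hi = vld1q_f32(a + 4);    /* a5 a6 a7 a8 */
    float32x4_t vb_lo = vld1q_f32(b);        /* b1 b2 b3 b4 */
    float32x4_t vb_hi = vld1q_f32(b + 4);    /* b5 b6 b7 b8 */
    /* element-wise products of each half */
    float32x4_t p_lo = vmulq_f32(va_lo, vb_lo);
    float32x4_t p_hi = vmulq_f32(va_hi, vb_hi);
    /* [a1b1+a5b5, a2b2+a6b6, a3b3+a7b7, a4b4+a8b8] */
    float32x4_t sum4 = vaddq_f32(p_lo, p_hi);
    /* split into low and high 2-float halves, then pairwise add */
    float32x2_t pair = vpadd_f32(vget_low_f32(sum4), vget_high_f32(sum4));
    pair = vpadd_f32(pair, pair);            /* fold the two lanes */
    return vget_lane_f32(pair, 0);           /* full 8-term sum */
}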
Fig. 2 gives a structural schematic diagram of a first embodiment of the algorithm optimization device for convolutional neural networks based on Neon instructions according to the present invention. As shown in Fig. 2, the device includes:
a convolution kernel image matrix processing module 1, for converting the convolution kernel images of a convolutional layer into a corresponding matrix A and padding the number of columns of matrix A to a multiple of 4;
a to-be-convolved input image matrix processing module 2, for inputting an image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
a matrix transposition module 3, for transposing matrix B to obtain the transposed matrix Bt;
a matrix row-by-row dot product module 4, for computing the row-by-row dot products of matrix A and matrix Bt; and
a Neon optimization processing module 5, for performing parallel optimization using Neon instructions.
Further, the convolution kernel image matrix processing module 1 is configured to: for the CNum convolution kernel images of size N × N in the convolutional layer, take each convolution kernel image in turn as one row of matrix data to obtain matrix A with CNum rows and N × N columns; and expand the number of columns of matrix A to a multiple of 4, with the values in each padded column set to 0.
Further, the to-be-convolved input image matrix processing module 2 is configured to: input the image to be convolved by the convolutional layer; perform sliding-window convolution with the N × N convolution kernel to obtain MNum convolution feature sub-images; take each convolution feature sub-image in turn as one column of matrix data to obtain matrix B with N × N rows and MNum columns; and expand the number of rows of matrix B to a multiple of 4, with the values in each padded row set to 0.
The matrix transposition module 3 is configured to transpose the rows and columns of matrix B to obtain matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
Further, the Neon optimization processing module 5 is configured to: in the Neon instruction set, use the load instruction vld1q_f32 to load 4 floating-point numbers; use the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; use the add instruction vaddq_f32 to add 4 floating-point numbers; use the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and use the pairwise-add instruction vpadd_f32 to first add the pairs of floating-point numbers from vget_low_f32 and vget_high_f32 and then accumulate the adjacent results.
Compared with existing convolutional neural network algorithm optimization methods, the Neon instruction-based algorithm optimization method of the present invention effectively improves the computational performance of convolutional neural networks through the matrix conversion of the convolution kernel images and of the image to be convolved, together with parallel optimization using the Neon instructions of the ARM platform.
The foregoing is only a description of the preferred embodiments of the present invention and is not intended to limit its scope. It should be understood that the invention is not limited to the implementations described herein, which are described to help those skilled in the art practice the invention. Those skilled in the art can readily make further improvements and refinements without departing from the spirit and scope of the invention; the invention is therefore limited only by the content and scope of its claims, which are intended to cover all alternatives and equivalents falling within the spirit and scope of the invention.

Claims (10)

1. An algorithm optimization method for convolutional neural networks based on Neon instructions, characterized in that the method comprises:
a first step of converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of matrix A to a multiple of 4;
a second step of inputting an image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
a third step of transposing matrix B to obtain the transposed matrix Bt;
a fourth step of computing the row-by-row dot products of matrix A and matrix Bt; and
a fifth step of performing parallel optimization using Neon instructions.
2. The method according to claim 1, characterized in that the first step comprises: for the CNum convolution kernel images of size N × N in the convolutional layer, taking each convolution kernel image in turn as one row of matrix data to obtain matrix A with CNum rows and N × N columns; and expanding the number of columns of matrix A to a multiple of 4, with the values in each padded column set to 0.
3. The method according to claim 1, characterized in that the second step comprises: inputting the image to be convolved by the convolutional layer; performing sliding-window convolution with the N × N convolution kernel to obtain MNum convolution feature sub-images; taking each convolution feature sub-image in turn as one column of matrix data to obtain matrix B with N × N rows and MNum columns; and expanding the number of rows of matrix B to a multiple of 4, with the values in each padded row set to 0.
4. The method according to claim 1, wherein in the third step the rows and columns of matrix B are transposed to obtain matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
5. The method according to claim 1, characterized in that the fifth step comprises: in the Neon instruction set, using the load instruction vld1q_f32 to load 4 floating-point numbers; using the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; using the add instruction vaddq_f32 to add 4 floating-point numbers; using the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and using the pairwise-add instruction vpadd_f32 to first add the pairs of floating-point numbers from vget_low_f32 and vget_high_f32 and then accumulate the adjacent results.
6. An algorithm optimization device for convolutional neural networks based on Neon instructions, characterized in that the device comprises: a convolution kernel image matrix processing module, for converting the convolution kernel images of a convolutional layer into a corresponding matrix A and padding the number of columns of matrix A to a multiple of 4;
a to-be-convolved input image matrix processing module, for inputting an image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
a matrix transposition module, for transposing matrix B to obtain the transposed matrix Bt;
a matrix row-by-row dot product module, for computing the row-by-row dot products of matrix A and matrix Bt; and
a Neon optimization processing module, for performing parallel optimization using Neon instructions.
7. The device according to claim 6, characterized in that the convolution kernel image matrix processing module is configured to: for the CNum convolution kernel images of size N × N in the convolutional layer, take each convolution kernel image in turn as one row of matrix data to obtain matrix A with CNum rows and N × N columns; and expand the number of columns of matrix A to a multiple of 4, with the values in each padded column set to 0.
8. The device according to claim 6, characterized in that the to-be-convolved input image matrix processing module is configured to: input the image to be convolved by the convolutional layer; perform sliding-window convolution with the N × N convolution kernel to obtain MNum convolution feature sub-images; take each convolution feature sub-image in turn as one column of matrix data to obtain matrix B with N × N rows and MNum columns; and expand the number of rows of matrix B to a multiple of 4, with the values in each padded row set to 0.
9. The device according to claim 6, wherein the matrix transposition module is configured to transpose the rows and columns of matrix B to obtain matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
10. The device according to claim 6, characterized in that the Neon optimization processing module is configured to: in the Neon instruction set, use the load instruction vld1q_f32 to load 4 floating-point numbers; use the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; use the add instruction vaddq_f32 to add 4 floating-point numbers; use the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and use the pairwise-add instruction vpadd_f32 to first add the pairs of floating-point numbers from vget_low_f32 and vget_high_f32 and then accumulate the adjacent results.
CN201710974484.3A 2017-10-19 2017-10-19 The algorithm optimization method and device of convolutional neural networks based on Neon instructions Pending CN107704921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710974484.3A CN107704921A (en) 2017-10-19 2017-10-19 The algorithm optimization method and device of convolutional neural networks based on Neon instructions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710974484.3A CN107704921A (en) 2017-10-19 2017-10-19 The algorithm optimization method and device of convolutional neural networks based on Neon instructions

Publications (1)

Publication Number Publication Date
CN107704921A true CN107704921A (en) 2018-02-16

Family

ID=61181715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710974484.3A Pending CN107704921A (en) 2017-10-19 2017-10-19 The algorithm optimization method and device of convolutional neural networks based on Neon instructions

Country Status (1)

Country Link
CN (1) CN107704921A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549931A (en) * 2018-04-25 2018-09-18 济南浪潮高新科技投资发展有限公司 A kind of accelerator and method of convolutional neural networks
CN109447239A (en) * 2018-09-26 2019-03-08 华南理工大学 A kind of embedded convolutional neural networks accelerated method based on ARM
CN109493300A (en) * 2018-11-15 2019-03-19 湖南鲲鹏智汇无人机技术有限公司 The real-time defogging method of Aerial Images and unmanned plane based on FPGA convolutional neural networks
CN109558944A (en) * 2018-12-13 2019-04-02 北京智芯原动科技有限公司 The algorithm optimization method and device of convolutional neural networks based on configurable convolutional layer
CN109615066A (en) * 2019-01-30 2019-04-12 新疆爱华盈通信息技术有限公司 A kind of method of cutting out of the convolutional neural networks for NEON optimization
CN109784372A (en) * 2018-12-17 2019-05-21 北京理工大学 A kind of objective classification method based on convolutional neural networks
CN110188869A (en) * 2019-05-05 2019-08-30 北京中科汇成科技有限公司 A kind of integrated circuit based on convolutional neural networks algorithm accelerates the method and system of calculating
CN110263909A (en) * 2018-03-30 2019-09-20 腾讯科技(深圳)有限公司 Image-recognizing method and device
CN110399971A (en) * 2019-07-03 2019-11-01 Oppo广东移动通信有限公司 A kind of convolutional neural networks accelerating method and device, storage medium
CN111178505A (en) * 2019-12-23 2020-05-19 福建星网视易信息系统有限公司 Acceleration method of convolutional neural network, computer-readable storage medium and application
WO2020135602A1 (en) * 2018-12-29 2020-07-02 北京市商汤科技开发有限公司 Image processing method and device, intelligent driving system, and vehicle-mounted computing platform
CN111754409A (en) * 2019-03-27 2020-10-09 北京沃东天骏信息技术有限公司 Image processing method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150286858A1 (en) * 2015-03-18 2015-10-08 Looksery, Inc. Emotion recognition in video conferencing
CN105184278A (en) * 2015-09-30 2015-12-23 深圳市商汤科技有限公司 Human face detection method and device
CN105320495A (en) * 2014-07-22 2016-02-10 英特尔公司 Weight-shifting mechanism for convolutional neural network
CN107003989A (en) * 2014-12-19 2017-08-01 英特尔公司 For the distribution and the method and apparatus of Collaboration computing in artificial neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320495A (en) * 2014-07-22 2016-02-10 英特尔公司 Weight-shifting mechanism for convolutional neural network
CN107003989A (en) * 2014-12-19 2017-08-01 英特尔公司 For the distribution and the method and apparatus of Collaboration computing in artificial neural network
US20150286858A1 (en) * 2015-03-18 2015-10-08 Looksery, Inc. Emotion recognition in video conferencing
CN105184278A (en) * 2015-09-30 2015-12-23 深圳市商汤科技有限公司 Human face detection method and device

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263909B (en) * 2018-03-30 2022-10-28 腾讯科技(深圳)有限公司 Image recognition method and device
CN110263909A (en) * 2018-03-30 2019-09-20 腾讯科技(深圳)有限公司 Image-recognizing method and device
CN108549931A (en) * 2018-04-25 2018-09-18 济南浪潮高新科技投资发展有限公司 A kind of accelerator and method of convolutional neural networks
CN109447239A (en) * 2018-09-26 2019-03-08 华南理工大学 A kind of embedded convolutional neural networks accelerated method based on ARM
CN109447239B (en) * 2018-09-26 2022-03-25 华南理工大学 Embedded convolutional neural network acceleration method based on ARM
CN109493300A (en) * 2018-11-15 2019-03-19 湖南鲲鹏智汇无人机技术有限公司 The real-time defogging method of Aerial Images and unmanned plane based on FPGA convolutional neural networks
CN109558944A (en) * 2018-12-13 2019-04-02 北京智芯原动科技有限公司 The algorithm optimization method and device of convolutional neural networks based on configurable convolutional layer
CN109558944B (en) * 2018-12-13 2021-02-19 北京智芯原动科技有限公司 Algorithm optimization method and device of convolutional neural network based on configurable convolutional layer
CN109784372B (en) * 2018-12-17 2020-11-13 北京理工大学 Target classification method based on convolutional neural network
CN109784372A (en) * 2018-12-17 2019-05-21 北京理工大学 A kind of objective classification method based on convolutional neural networks
WO2020135602A1 (en) * 2018-12-29 2020-07-02 北京市商汤科技开发有限公司 Image processing method and device, intelligent driving system, and vehicle-mounted computing platform
CN109615066A (en) * 2019-01-30 2019-04-12 新疆爱华盈通信息技术有限公司 A kind of method of cutting out of the convolutional neural networks for NEON optimization
CN111754409A (en) * 2019-03-27 2020-10-09 北京沃东天骏信息技术有限公司 Image processing method, device, equipment and storage medium
CN110188869B (en) * 2019-05-05 2021-08-10 北京中科汇成科技有限公司 Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm
CN110188869A (en) * 2019-05-05 2019-08-30 北京中科汇成科技有限公司 A kind of integrated circuit based on convolutional neural networks algorithm accelerates the method and system of calculating
CN110399971A (en) * 2019-07-03 2019-11-01 Oppo广东移动通信有限公司 A kind of convolutional neural networks accelerating method and device, storage medium
CN111178505A (en) * 2019-12-23 2020-05-19 福建星网视易信息系统有限公司 Acceleration method of convolutional neural network, computer-readable storage medium and application
CN111178505B (en) * 2019-12-23 2023-04-07 福建星网视易信息系统有限公司 Acceleration method of convolutional neural network and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN107704921A (en) The algorithm optimization method and device of convolutional neural networks based on Neon instructions
AU2022200600B2 (en) Superpixel methods for convolutional neural networks
JP7394104B2 (en) Executing kernel strides in hardware
US10394929B2 (en) Adaptive execution engine for convolution computing systems
US20190340510A1 (en) Sparsifying neural network models
CN108765247A (en) Image processing method, device, storage medium and equipment
US20190303757A1 (en) Weight skipping deep learning accelerator
TW201706917A (en) Rotating data for neural network computations
US11164032B2 (en) Method of performing data processing operation
CN107516131A (en) Acceleration method and device, electronic equipment and the storage medium of convolutional calculation
CN109447239B (en) Embedded convolutional neural network acceleration method based on ARM
CN109558944A (en) The algorithm optimization method and device of convolutional neural networks based on configurable convolutional layer
Zeng et al. Optimizing frequency domain implementation of CNNs on FPGAs
CN116980277B (en) Data processing method, device, computer equipment and storage medium
Chen et al. A TSQR Based Krylov Basis Computation Method on Hybrid GPU Cluster
CN116820577A (en) Parallel processing method and device for model, first computing equipment and electronic equipment
CN117413280A (en) Convolution with kernel expansion and tensor accumulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180216