CN107704921A - Algorithm optimization method and device for convolutional neural networks based on Neon instructions - Google Patents
Algorithm optimization method and device for convolutional neural networks based on Neon instructions
- Publication number
- CN107704921A (application CN201710974484.3A)
- Authority
- CN
- China
- Prior art keywords
- matrixes
- convolution
- matrix
- row
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/153—Multidimensional correlation or convolution
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Complex Calculations (AREA)
Abstract
The invention provides an algorithm optimization method for convolutional neural networks based on Neon instructions. The method includes: converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of A to a multiple of 4; inputting the image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of B to a multiple of 4; transposing B to obtain the transposed matrix Bt; computing the dot products between the rows of A and the rows of Bt; and performing parallel optimization using Neon instructions. Compared with the prior art, the invention effectively improves the computational performance of convolutional neural networks.
Description
Technical field
The present invention relates to image processing, video surveillance and convolutional neural networks, and more particularly to an algorithm optimization method and device for convolutional neural networks based on Neon instructions.
Background
With the rapid development of artificial intelligence, deep learning has been increasingly introduced into image processing and pattern recognition, and performs well on problems in these fields. Among deep learning model structures, the convolutional neural network (CNN) is particularly good at processing images, especially learning problems involving large images, and is the most widely applied and most intensively studied.
However, in practical image processing and pattern recognition applications, convolutional neural networks are usually implemented with many network layers. Their computational complexity is high, they contain a large number of dense image convolution operations, and they are time-consuming. This directly affects the performance of algorithms based on convolutional neural networks and limits their application, particularly on embedded front-end video surveillance devices such as ARM platforms.
From the technical standpoint of convolutional neural network algorithm optimization, current optimizations of the convolution operation mainly use matrix acceleration: the convolution kernels and the input image are each converted into a large matrix, and the convolution result is obtained as a large matrix product. By turning convolution into matrix operations, matrix acceleration can be realized on platforms that support a third-party matrix acceleration library, greatly improving the performance of convolutional neural network algorithms. However, on embedded ARM platforms that do not support a third-party matrix acceleration library, convolutional neural network algorithms remain time-consuming and real-time performance is poor.
Neon is a 128-bit SIMD (Single Instruction, Multiple Data) extension architecture for ARM Cortex-A series processors. From smartphones and mobile computing devices to HDTVs, it is recognized as one of the most capable processor extensions in the multimedia application field. The Neon instruction set uses a dedicated design that simplifies porting of software between platforms, and provides low-power, flexible acceleration for intensive multimedia applications such as Dolby Mobile.
In summary, there is a need for a convolutional neural network algorithm optimization method based on Neon instructions that is suitable for ARM platforms and can effectively reduce computation time.
Summary of the invention
In view of this, the main object of the present invention is to reduce the consumption of computing resources and realize algorithm optimization of convolutional neural networks.
To achieve the above object, according to a first aspect of the present invention, there is provided an algorithm optimization method for convolutional neural networks based on Neon instructions, the method including:
A first step of converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of matrix A to a multiple of 4;
A second step of inputting the image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
A third step of transposing matrix B to obtain the transposed matrix Bt;
A fourth step of computing the dot products between the rows of matrix A and the rows of matrix Bt; and
A fifth step of performing parallel optimization processing using Neon instructions.
Further, the first step includes: for the CNum convolution kernel images of size N × N in the convolutional layer, taking each convolution kernel image in turn as one row of matrix data to obtain a matrix A with CNum rows and N × N columns; and expanding the number of columns of matrix A to a multiple of 4, with every padded column set to 0.
Further, the second step includes: inputting the image to be convolved that the convolutional layer needs to process; performing sliding-window convolution with the N × N kernel to obtain MNum convolution feature sub-images; taking each convolution feature sub-image in turn as one column of matrix data to obtain a matrix B with N × N rows and MNum columns; and expanding the number of rows of matrix B to a multiple of 4, with every padded row set to 0.
Further, the third step transposes the rows and columns of matrix B to obtain a matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
Further, the fifth step includes: among the Neon instructions, using the load instruction vld1q_f32 to load 4 floating-point numbers; using the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; using the add instruction vaddq_f32 to add 4 floating-point numbers; using the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and using the pairwise-add instruction vpadd_f32 to first add the 2 floating-point numbers obtained from vget_low_f32 and from vget_high_f32, and then accumulate the adjacent results.
According to another aspect of the present invention, there is provided an algorithm optimization device for convolutional neural networks based on Neon instructions, the device including:
A convolution kernel image matrix processing module, for converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of matrix A to a multiple of 4;
An input image matrix processing module, for inputting the image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
A matrix transposition module, for transposing matrix B to obtain the transposed matrix Bt;
A matrix row dot-product module, for computing the dot products between the rows of matrix A and the rows of matrix Bt; and
A Neon optimization processing module, for performing parallel optimization processing using Neon instructions.
Further, the convolution kernel image matrix processing module is configured to: for the CNum convolution kernel images of size N × N in the convolutional layer, take each convolution kernel image in turn as one row of matrix data to obtain a matrix A with CNum rows and N × N columns; and expand the number of columns of matrix A to a multiple of 4, with every padded column set to 0.
Further, the input image matrix processing module is configured to: input the image to be convolved that the convolutional layer needs to process; perform sliding-window convolution with the N × N kernel to obtain MNum convolution feature sub-images; take each convolution feature sub-image in turn as one column of matrix data to obtain a matrix B with N × N rows and MNum columns; and expand the number of rows of matrix B to a multiple of 4, with every padded row set to 0.
The matrix transposition module is configured to transpose the rows and columns of matrix B to obtain a matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
Further, the Neon optimization processing module is configured to: among the Neon instructions, use the load instruction vld1q_f32 to load 4 floating-point numbers; use the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; use the add instruction vaddq_f32 to add 4 floating-point numbers; use the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and use the pairwise-add instruction vpadd_f32 to first add the 2 floating-point numbers from vget_low_f32 and from vget_high_f32, and then accumulate the adjacent results.
Compared with existing convolutional neural network algorithm optimization methods, the algorithm optimization method of the present invention, through the matrixization of the convolution kernel images and the image to be convolved together with parallel optimization using the Neon instructions of the ARM platform, can effectively improve the computational performance of convolutional neural networks.
Brief description of the drawings
Fig. 1 shows a flow chart of an embodiment of the algorithm optimization method for convolutional neural networks based on Neon instructions according to the present invention.
Fig. 2 shows a structural schematic diagram of an embodiment of the algorithm optimization device for convolutional neural networks based on Neon instructions according to the present invention.
Embodiment
To enable the examiner to further understand the structure, features and other objects of the present invention, the preferred embodiments are described in detail below with reference to the accompanying drawings. The illustrated preferred embodiments only illustrate the technical solution of the invention and do not limit it.
Fig. 1 gives the flow chart of a first embodiment of the algorithm optimization method for convolutional neural networks based on Neon instructions according to the present invention. As shown in Fig. 1, the method includes:
A first step S1 of converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of matrix A to a multiple of 4;
A second step S2 of inputting the image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
A third step S3 of transposing matrix B to obtain the transposed matrix Bt;
A fourth step S4 of computing the dot products between the rows of matrix A and the rows of matrix Bt; and
A fifth step S5 of performing parallel optimization processing using Neon instructions.
Further, the first step S1 includes: for the CNum convolution kernel images of size N × N in the convolutional layer, taking each convolution kernel image in turn as one row of matrix data to obtain a matrix A with CNum rows and N × N columns; and expanding the number of columns of matrix A to a multiple of 4, with every padded column set to 0.
As an embodiment, for 16 convolution kernel images of size 3 × 3 in the convolutional layer, take the i-th convolution kernel image as the matrix data of the i-th row, i = {0, 1, 2, …, 15}, obtaining a matrix A with 16 rows and 9 columns; then expand the number of columns of A to the next multiple of 4, i.e. 12, with every padded column set to 0.
Further, the second step S2 includes: inputting the image to be convolved that the convolutional layer needs to process; performing sliding-window convolution with the N × N kernel to obtain MNum convolution feature sub-images; taking each convolution feature sub-image in turn as one column of matrix data to obtain a matrix B with N × N rows and MNum columns; and expanding the number of rows of matrix B to a multiple of 4, with every padded row set to 0.
As an embodiment, perform 3 × 3 sliding-window convolution on the input image to obtain the convolution feature sub-images; take the i-th convolution feature sub-image as the matrix data of the i-th column, i = {0, 1, 2, …, MNum−1}, obtaining a matrix B with 9 rows and MNum columns; then expand the number of rows of B to the next multiple of 4, i.e. 12, with every padded row set to 0.
The third step S3 transposes the rows and columns of matrix B to obtain a matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
Further, the fifth step S5 includes: among the Neon instructions, using the load instruction vld1q_f32 to load 4 floating-point numbers; using the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; using the add instruction vaddq_f32 to add 4 floating-point numbers; using the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and using the pairwise-add instruction vpadd_f32 to first add the 2 floating-point numbers obtained from vget_low_f32 and from vget_high_f32, and then accumulate the adjacent results.
As an embodiment, for 8 × 8 matrices A and Bt, let the first row vector of A be [a1 a2 a3 … a8] and the first row vector of Bt be [b1 b2 b3 … b8]. The load instruction vld1q_f32 of the Neon instruction set accesses memory in parallel, loading 4 floating-point numbers with a single instruction: a 128-bit register Va stores the 4 floating-point numbers a1, a2, a3, a4 (or a5, a6, a7, a8), and a 128-bit register Vb stores b1, b2, b3, b4 (or b5, b6, b7, b8). The multiply instruction vmulq_f32 realizes the multiplication of 4 floating-point numbers, Va×b = [a1×b1 a2×b2 a3×b3 a4×b4] or Va×b = [a5×b5 a6×b6 a7×b7 a8×b8]. The add instruction vaddq_f32 adds the two product vectors, Va+b = [a1×b1+a5×b5 a2×b2+a6×b6 a3×b3+a7×b7 a4×b4+a8×b8]. The split instruction vget_low_f32 obtains the two floating-point numbers a1×b1+a5×b5 and a2×b2+a6×b6, and vget_high_f32 obtains the two floating-point numbers a3×b3+a7×b7 and a4×b4+a8×b8. The pairwise-add instruction vpadd_f32 first computes the sums a1×b1+a5×b5+a2×b2+a6×b6 and a3×b3+a7×b7+a4×b4+a8×b8, which are then accumulated to obtain the final result Result = a1×b1+a2×b2+a3×b3+a4×b4+a5×b5+a6×b6+a7×b7+a8×b8.
Fig. 2 gives the structural schematic diagram of a first embodiment of the algorithm optimization device for convolutional neural networks based on Neon instructions according to the present invention. As shown in Fig. 2, the device includes:
A convolution kernel image matrix processing module 1, for converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of matrix A to a multiple of 4;
An input image matrix processing module 2, for inputting the image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
A matrix transposition module 3, for transposing matrix B to obtain the transposed matrix Bt;
A matrix row dot-product module 4, for computing the dot products between the rows of matrix A and the rows of matrix Bt; and
A Neon optimization processing module 5, for performing parallel optimization processing using Neon instructions.
Further, the convolution kernel image matrix processing module 1 is configured to: for the CNum convolution kernel images of size N × N in the convolutional layer, take each convolution kernel image in turn as one row of matrix data to obtain a matrix A with CNum rows and N × N columns; and expand the number of columns of matrix A to a multiple of 4, with every padded column set to 0.
Further, the input image matrix processing module 2 is configured to: input the image to be convolved that the convolutional layer needs to process; perform sliding-window convolution with the N × N kernel to obtain MNum convolution feature sub-images; take each convolution feature sub-image in turn as one column of matrix data to obtain a matrix B with N × N rows and MNum columns; and expand the number of rows of matrix B to a multiple of 4, with every padded row set to 0.
The matrix transposition module 3 is configured to transpose the rows and columns of matrix B to obtain a matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
Further, the Neon optimization processing module 5 is configured to: among the Neon instructions, use the load instruction vld1q_f32 to load 4 floating-point numbers; use the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; use the add instruction vaddq_f32 to add 4 floating-point numbers; use the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and use the pairwise-add instruction vpadd_f32 to first add the 2 floating-point numbers from vget_low_f32 and from vget_high_f32, and then accumulate the adjacent results.
Compared with existing convolutional neural network algorithm optimization methods, the algorithm optimization method of the present invention, through the matrixization of the convolution kernel images and the image to be convolved together with parallel optimization using the Neon instructions of the ARM platform, can effectively improve the computational performance of convolutional neural networks.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope. It should be understood that the present invention is not limited to the implementations described herein; these implementations are described to help those skilled in the art practice the invention. Any skilled person can easily make further improvements and refinements without departing from the spirit and scope of the invention. The invention is therefore limited only by the content and scope of its claims, which are intended to cover all alternatives and equivalent schemes included within the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. An algorithm optimization method for convolutional neural networks based on Neon instructions, characterized in that the method comprises:
a first step of converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of matrix A to a multiple of 4;
a second step of inputting the image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
a third step of transposing matrix B to obtain the transposed matrix Bt;
a fourth step of computing the dot products between the rows of matrix A and the rows of matrix Bt; and
a fifth step of performing parallel optimization processing using Neon instructions.
2. The method according to claim 1, characterized in that the first step comprises: for the CNum convolution kernel images of size N × N in the convolutional layer, taking each convolution kernel image in turn as one row of matrix data to obtain a matrix A with CNum rows and N × N columns; and expanding the number of columns of matrix A to a multiple of 4, with every padded column set to 0.
3. The method according to claim 1, characterized in that the second step comprises: inputting the image to be convolved that the convolutional layer needs to process; performing sliding-window convolution with the N × N kernel to obtain MNum convolution feature sub-images; taking each convolution feature sub-image in turn as one column of matrix data to obtain a matrix B with N × N rows and MNum columns; and expanding the number of rows of matrix B to a multiple of 4, with every padded row set to 0.
4. The method according to claim 1, wherein the third step transposes the rows and columns of matrix B to obtain a matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
5. The method according to claim 1, characterized in that the fifth step comprises: among the Neon instructions, using the load instruction vld1q_f32 to load 4 floating-point numbers; using the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; using the add instruction vaddq_f32 to add 4 floating-point numbers; using the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and using the pairwise-add instruction vpadd_f32 to first add the 2 floating-point numbers from vget_low_f32 and from vget_high_f32, and then accumulate the adjacent results.
6. An algorithm optimization device for convolutional neural networks based on Neon instructions, characterized in that the device comprises:
a convolution kernel image matrix processing module, for converting the convolution kernel images of a convolutional layer into a corresponding matrix A, and padding the number of columns of matrix A to a multiple of 4;
an input image matrix processing module, for inputting the image to be convolved, converting it into a corresponding matrix B, and padding the number of rows of matrix B to a multiple of 4;
a matrix transposition module, for transposing matrix B to obtain the transposed matrix Bt;
a matrix row dot-product module, for computing the dot products between the rows of matrix A and the rows of matrix Bt; and
a Neon optimization processing module, for performing parallel optimization processing using Neon instructions.
7. The device according to claim 6, characterized in that the convolution kernel image matrix processing module is configured to: for the CNum convolution kernel images of size N × N in the convolutional layer, take each convolution kernel image in turn as one row of matrix data to obtain a matrix A with CNum rows and N × N columns; and expand the number of columns of matrix A to a multiple of 4, with every padded column set to 0.
8. The device according to claim 6, characterized in that the input image matrix processing module is configured to: input the image to be convolved that the convolutional layer needs to process; perform sliding-window convolution with the N × N kernel to obtain MNum convolution feature sub-images; take each convolution feature sub-image in turn as one column of matrix data to obtain a matrix B with N × N rows and MNum columns; and expand the number of rows of matrix B to a multiple of 4, with every padded row set to 0.
9. The device according to claim 6, wherein the matrix transposition module is configured to transpose the rows and columns of matrix B to obtain a matrix Bt with MNum rows and N × N columns padded to a multiple of 4.
10. The device according to claim 6, characterized in that the Neon optimization processing module is configured to: among the Neon instructions, use the load instruction vld1q_f32 to load 4 floating-point numbers; use the multiply instruction vmulq_f32 to multiply 4 floating-point numbers; use the add instruction vaddq_f32 to add 4 floating-point numbers; use the split instructions vget_low_f32 and vget_high_f32 to obtain 2 floating-point numbers each; and use the pairwise-add instruction vpadd_f32 to first add the 2 floating-point numbers from vget_low_f32 and from vget_high_f32, and then accumulate the adjacent results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710974484.3A CN107704921A (en) | 2017-10-19 | 2017-10-19 | The algorithm optimization method and device of convolutional neural networks based on Neon instructions |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710974484.3A CN107704921A (en) | 2017-10-19 | 2017-10-19 | The algorithm optimization method and device of convolutional neural networks based on Neon instructions |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107704921A true CN107704921A (en) | 2018-02-16 |
Family
ID=61181715
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710974484.3A Pending CN107704921A (en) | 2017-10-19 | 2017-10-19 | The algorithm optimization method and device of convolutional neural networks based on Neon instructions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107704921A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549931A (en) * | 2018-04-25 | 2018-09-18 | 济南浪潮高新科技投资发展有限公司 | A kind of accelerator and method of convolutional neural networks |
CN109447239A (en) * | 2018-09-26 | 2019-03-08 | 华南理工大学 | A kind of embedded convolutional neural networks accelerated method based on ARM |
CN109493300A (en) * | 2018-11-15 | 2019-03-19 | 湖南鲲鹏智汇无人机技术有限公司 | The real-time defogging method of Aerial Images and unmanned plane based on FPGA convolutional neural networks |
CN109558944A (en) * | 2018-12-13 | 2019-04-02 | 北京智芯原动科技有限公司 | The algorithm optimization method and device of convolutional neural networks based on configurable convolutional layer |
CN109615066A (en) * | 2019-01-30 | 2019-04-12 | 新疆爱华盈通信息技术有限公司 | A kind of method of cutting out of the convolutional neural networks for NEON optimization |
CN109784372A (en) * | 2018-12-17 | 2019-05-21 | 北京理工大学 | A kind of objective classification method based on convolutional neural networks |
CN110188869A (en) * | 2019-05-05 | 2019-08-30 | 北京中科汇成科技有限公司 | A kind of integrated circuit based on convolutional neural networks algorithm accelerates the method and system of calculating |
CN110263909A (en) * | 2018-03-30 | 2019-09-20 | 腾讯科技(深圳)有限公司 | Image-recognizing method and device |
CN110399971A (en) * | 2019-07-03 | 2019-11-01 | Oppo广东移动通信有限公司 | A kind of convolutional neural networks accelerating method and device, storage medium |
CN111178505A (en) * | 2019-12-23 | 2020-05-19 | 福建星网视易信息系统有限公司 | Acceleration method of convolutional neural network, computer-readable storage medium and application |
WO2020135602A1 (en) * | 2018-12-29 | 2020-07-02 | 北京市商汤科技开发有限公司 | Image processing method and device, intelligent driving system, and vehicle-mounted computing platform |
CN111754409A (en) * | 2019-03-27 | 2020-10-09 | 北京沃东天骏信息技术有限公司 | Image processing method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150286858A1 (en) * | 2015-03-18 | 2015-10-08 | Looksery, Inc. | Emotion recognition in video conferencing |
CN105184278A (en) * | 2015-09-30 | 2015-12-23 | 深圳市商汤科技有限公司 | Human face detection method and device |
CN105320495A (en) * | 2014-07-22 | 2016-02-10 | 英特尔公司 | Weight-shifting mechanism for convolutional neural network |
CN107003989A (en) * | 2014-12-19 | 2017-08-01 | 英特尔公司 | Method and apparatus for distribution and collaborative computing in an artificial neural network |
2017
- 2017-10-19 CN CN201710974484.3A patent/CN107704921A/en active Pending
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110263909B (en) * | 2018-03-30 | 2022-10-28 | 腾讯科技(深圳)有限公司 | Image recognition method and device |
CN110263909A (en) * | 2018-03-30 | 2019-09-20 | 腾讯科技(深圳)有限公司 | Image recognition method and device |
CN108549931A (en) * | 2018-04-25 | 2018-09-18 | 济南浪潮高新科技投资发展有限公司 | Convolutional neural network accelerator and acceleration method |
CN109447239A (en) * | 2018-09-26 | 2019-03-08 | 华南理工大学 | ARM-based embedded convolutional neural network acceleration method |
CN109447239B (en) * | 2018-09-26 | 2022-03-25 | 华南理工大学 | Embedded convolutional neural network acceleration method based on ARM |
CN109493300A (en) * | 2018-11-15 | 2019-03-19 | 湖南鲲鹏智汇无人机技术有限公司 | Real-time aerial image defogging method based on an FPGA convolutional neural network, and unmanned aerial vehicle |
CN109558944A (en) * | 2018-12-13 | 2019-04-02 | 北京智芯原动科技有限公司 | Algorithm optimization method and device for a convolutional neural network based on a configurable convolutional layer |
CN109558944B (en) * | 2018-12-13 | 2021-02-19 | 北京智芯原动科技有限公司 | Algorithm optimization method and device of convolutional neural network based on configurable convolutional layer |
CN109784372B (en) * | 2018-12-17 | 2020-11-13 | 北京理工大学 | Target classification method based on convolutional neural network |
CN109784372A (en) * | 2018-12-17 | 2019-05-21 | 北京理工大学 | Target classification method based on a convolutional neural network |
WO2020135602A1 (en) * | 2018-12-29 | 2020-07-02 | 北京市商汤科技开发有限公司 | Image processing method and device, intelligent driving system, and vehicle-mounted computing platform |
CN109615066A (en) * | 2019-01-30 | 2019-04-12 | 新疆爱华盈通信息技术有限公司 | Pruning method for a NEON-optimized convolutional neural network |
CN111754409A (en) * | 2019-03-27 | 2020-10-09 | 北京沃东天骏信息技术有限公司 | Image processing method, apparatus, device and storage medium |
CN110188869B (en) * | 2019-05-05 | 2021-08-10 | 北京中科汇成科技有限公司 | Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm |
CN110188869A (en) * | 2019-05-05 | 2019-08-30 | 北京中科汇成科技有限公司 | Method and system for integrated-circuit accelerated computation based on a convolutional neural network algorithm |
CN110399971A (en) * | 2019-07-03 | 2019-11-01 | Oppo广东移动通信有限公司 | Convolutional neural network acceleration method and apparatus, and storage medium |
CN111178505A (en) * | 2019-12-23 | 2020-05-19 | 福建星网视易信息系统有限公司 | Acceleration method of convolutional neural network, computer-readable storage medium, and application |
CN111178505B (en) * | 2019-12-23 | 2023-04-07 | 福建星网视易信息系统有限公司 | Acceleration method of convolutional neural network and computer-readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107704921A (en) | Algorithm optimization method and device for a convolutional neural network based on Neon instructions | |
AU2022200600B2 (en) | Superpixel methods for convolutional neural networks | |
JP7394104B2 (en) | Executing kernel strides in hardware | |
US10394929B2 (en) | Adaptive execution engine for convolution computing systems | |
US20190340510A1 (en) | Sparsifying neural network models | |
CN108765247A (en) | Image processing method, device, storage medium and equipment | |
US20190303757A1 (en) | Weight skipping deep learning accelerator | |
TW201706917A (en) | Rotating data for neural network computations | |
US11164032B2 (en) | Method of performing data processing operation | |
CN107516131A (en) | Convolution computation acceleration method and device, electronic device, and storage medium | |
CN109447239B (en) | Embedded convolutional neural network acceleration method based on ARM | |
CN109558944A (en) | Algorithm optimization method and device for a convolutional neural network based on a configurable convolutional layer | |
Zeng et al. | Optimizing frequency domain implementation of CNNs on FPGAs | |
CN116980277B (en) | Data processing method, device, computer equipment and storage medium | |
Chen et al. | A TSQR Based Krylov Basis Computation Method on Hybrid GPU Cluster | |
CN116820577A (en) | Parallel processing method and device for model, first computing equipment and electronic equipment | |
CN117413280A (en) | Convolution with kernel expansion and tensor accumulation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2018-02-16 |