CN110533164B - Winograd convolution splitting method for convolution neural network accelerator - Google Patents

Info

Publication number
CN110533164B
CN110533164B (application CN201910717929.9A)
Authority
CN
China
Prior art keywords
convolution
size
input
convolution kernel
winograd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910717929.9A
Other languages
Chinese (zh)
Other versions
CN110533164A (en)
Inventor
杨晨
王逸洲
王小力
耿莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201910717929.9A
Publication of CN110533164A
Application granted
Publication of CN110533164B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a Winograd convolution splitting method for a convolutional neural network accelerator, comprising the following steps: 1) read an input and a convolution kernel of arbitrary size from the cache of the convolutional neural network accelerator; 2) judge, from the convolution kernel size and the input size, whether convolution splitting is required, and if so, proceed to the next step; 3) split the convolution kernel according to its size and the stride, and split the input according to the input size and the stride; 4) combine and zero-pad the split elements according to the convolution kernel size, and combine and zero-pad the split elements according to the input size; 5) perform Winograd convolution on each pair of split input and convolution kernel; 6) accumulate the Winograd convolution results of each pair of input and convolution kernel; 7) store the accumulated result in the cache of the convolutional neural network accelerator. The invention enables a convolutional neural network accelerator to support convolutions of various shapes with a single Winograd acceleration unit.

Description

Winograd convolution splitting method for convolution neural network accelerator
Technical Field
The invention belongs to the field of convolutional neural network algorithms, and particularly relates to a Winograd convolution splitting method for a convolutional neural network accelerator.
Background
Convolutional neural networks (CNNs) are widely used in computer vision tasks such as object detection and image classification. As network models evolve, recognition accuracy keeps improving, but the computation and data volumes grow with it, demanding high-performance, low-power hardware; at the same time, the hardware must remain flexible enough to support a variety of network models.
Convolutional neural network accelerators are widely used to accelerate CNN algorithms on both mobile and server platforms. To improve accelerator performance, the Winograd algorithm is applied at the algorithm level to reduce the number of multiplications, so that throughput can be increased with the same number of hardware multipliers. However, accelerators adopting the Winograd algorithm face a serious problem: because each Winograd operation unit has fixed parameters, it can only accelerate convolutions matching those parameters. Extending the accelerator's flexibility requires Winograd units with several different parameter sets, which increases resource consumption and power. Moreover, units with different parameters produce data streams of different shapes, which lowers the utilization of the accelerator's operation units and severely degrades performance.
Disclosure of Invention
The present invention aims to provide a Winograd convolution splitting method for a convolutional neural network accelerator that overcomes the above defects of the prior art, enabling the accelerator to support convolutions of various shapes with a single Winograd acceleration unit. The focus of the invention is to split convolutions of different shapes and convert them into a unified data stream.
The invention is realized by adopting the following technical scheme:
a Winograd convolution splitting method for a convolution neural network accelerator comprises the following steps:
1) Reading input and convolution kernels with any size from a cache of a convolution neural network accelerator;
2) Judging whether to carry out convolution splitting or not according to the size of the convolution kernel and the input size, and if the convolution splitting is required, carrying out the next step;
3) Splitting the convolution kernel according to the size and the step length of the convolution kernel, and splitting the input according to the input size and the step length;
4) Combining and zero-filling the split elements according to the size of the convolution kernel, and combining and zero-filling the split elements according to the input size;
5) Carrying out Winograd convolution on each pair of split input and convolution kernel;
6) Accumulating Winograd convolution results of each team of input and convolution kernels;
7) And storing the accumulation result in a cache of the convolutional neural network accelerator.
The invention has the further improvement that the specific judgment method in the step 2) is as follows:
If the convolution kernel size is smaller than the set convolution kernel size and the input size is smaller than the set input size, no convolution splitting is performed; the kernel and input are directly zero-padded to the set convolution kernel size and input size. If the convolution kernel size is larger than the set convolution kernel size and the input size is larger than the set input size, convolution splitting is performed.
The further improvement of the invention is that the specific implementation method of the step 3) is as follows:
30) Splitting the original convolution kernel into several convolution kernels no larger than the set convolution kernel size, where adjacent elements of each split kernel are one convolution stride apart in the horizontal and vertical directions;
31) Splitting the original input into several inputs no larger than the set input size, where adjacent elements of each split input are one convolution stride apart in the horizontal and vertical directions.
The further improvement of the invention is that the specific implementation method of the step 30) is as follows:
301) Take the element in the upper-left corner of the original convolution kernel as the first element;
302) Take the next elements in the horizontal and vertical directions at intervals of the convolution stride, until the set convolution kernel size is reached;
303) Combine all taken elements, by position, into a new convolution kernel, completing the first split kernel;
304) Take the upper-left-most element not taken before as the first element of the second convolution kernel;
305) Take the remaining elements in the same way, until a number less than or equal to the set convolution kernel size is obtained;
306) Repeat the above steps until the remaining convolution kernel elements are all split.
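The traversal in steps 301)-306) can be sketched in Python. The function below splits any 2-D matrix (it applies to convolution kernels and inputs alike) using a taken-element mask; this is one plausible reading of the translated steps rather than the patent's authoritative procedure, and the name `split_matrix` and the returned offset triples are illustrative.

```python
def split_matrix(mat, r, stride):
    """Split a 2-D matrix (kernel or input) into sub-matrices of at most
    r x r elements, following steps 301)-306): elements within one
    sub-matrix sit `stride` apart, starting from the upper-left-most
    element not yet taken. Returns (row_offset, col_offset, sub) triples."""
    n_rows, n_cols = len(mat), len(mat[0])
    taken = [[False] * n_cols for _ in range(n_rows)]
    pieces = []
    for i0 in range(n_rows):
        for j0 in range(n_cols):
            if taken[i0][j0]:
                continue
            # Up to r strided indices per direction, staying inside the
            # original boundary (step 302).
            rows = [i0 + a * stride for a in range(r) if i0 + a * stride < n_rows]
            cols = [j0 + b * stride for b in range(r) if j0 + b * stride < n_cols]
            sub = [[mat[i][j] for j in cols] for i in rows]
            for i in rows:              # mark elements as used so that no
                for j in cols:          # element is repeated (step 303)
                    taken[i][j] = True
            pieces.append((i0, j0, sub))
    return pieces
```

For a 5 x 5 kernel with r = 3 and stride 1 this yields four sub-kernels: one full 3 x 3 block plus three smaller remainders, which step 4) then zero-pads to 3 x 3.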
The further improvement of the invention is that the specific implementation method of the step 31) is as follows:
311) Take the element in the upper-left corner of the original input as the first element;
312) Take the next elements in the horizontal and vertical directions at intervals of the convolution stride, until the set input size is reached;
313) Combine all taken elements, by position, into a new input, completing the first split input;
314) Take the upper-left-most element not taken before as the first element of the second input;
315) Take the remaining elements in the same way, until a number less than or equal to the set input size is obtained;
316) Repeat the above steps until the remaining input elements are all split.
The further improvement of the invention is that the specific implementation method of the step 4) is as follows:
if a split convolution kernel already has the set convolution kernel size, no zero padding is performed; if it is smaller, it is zero-padded at the upper left to the set convolution kernel size;
if a split input already has the set input size, no padding is performed; if it is smaller, it is zero-padded at the upper left to the set input size.
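A minimal sketch of the zero-padding in step 4): the split elements are kept together in one corner and the remaining positions are filled with zeros. The translated text places the padding "at the upper left side"; the exact pad position is treated as an assumption here (the sketch keeps the elements top-left), since it does not change the sizes involved. `pad_to` is an illustrative name.

```python
def pad_to(sub, size):
    """Zero-pad a split sub-matrix to size x size (step 4). The split
    elements stay in a contiguous block; missing rows and columns are
    filled with zeros. A sub-matrix already at the set size is returned
    unchanged in value."""
    padded = [[0] * size for _ in range(size)]
    for i, row in enumerate(sub):
        for j, v in enumerate(row):
            padded[i][j] = v
    return padded
```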
The further improvement of the invention is that the specific implementation method of the step 5) is as follows:
performing Winograd kernel transformation on the convolution kernel at the set convolution kernel size;
performing Winograd input transformation on the input at the set input size;
performing the dot-product operation, at the set convolution kernel and input sizes, on the transformed input and convolution kernel;
performing Winograd output transformation on the dot-product result at the set output size.
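The four operations above can be illustrated with the standard 1-D Winograd algorithm F(2,3) (4-element input tile, 3-tap kernel, two outputs): kernel transform, input transform, element-wise product, output transform. The constant matrices below are the well-known F(2,3) set from the literature, not taken from the patent itself.

```python
def matvec(M, v):
    # Multiply a constant matrix by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Standard constant matrices for 1-D Winograd F(2,3).
G  = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
Bt = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
At = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_f23(d, g):
    """The four steps of step 5) in one dimension."""
    U = matvec(G, g)                    # Winograd kernel transform
    V = matvec(Bt, d)                   # Winograd input transform
    M = [u * v for u, v in zip(U, V)]   # dot-product stage: 4 multiplies
    return matvec(At, M)                # output transform -> 2 outputs
```

Sliding a 3-tap kernel over 4 samples needs 2 x 3 = 6 multiplications directly, but only 4 in the dot-product stage here.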
The invention has the following beneficial technical effects:
According to the Winograd convolution splitting method for a convolutional neural network accelerator, inputs and convolution kernels of different sizes are read from the accelerator's cache, kernels and inputs of different shapes are split before the Winograd operation and converted into a uniform data stream, and the Winograd operation is then performed. The algorithm lets an accelerator equipped with a Winograd computing unit of a single parameter set accelerate convolutions of any shape. Compared with the conventional convolution algorithm, acceleration performance, power consumption and flexibility are all improved.
The invention has the main characteristics that:
1. Convolutions of different shapes are split and converted into a unified data stream.
2. Different splitting rules are applied according to the convolution stride.
The main advantages are as follows:
1. Compared with the conventional convolution algorithm, this algorithm greatly reduces the number of multiplications required for most convolution shapes, especially for the most widely used small-kernel, small-stride convolutions.
2. Compared with the conventional Winograd algorithm, this algorithm gives a hardware accelerator better flexibility.
The conventional Winograd algorithm is illustrated in figure 1, and its calculation formula is given in formula (1):
U = GFG^T, V = B^T TB
Out = A^T [U⊙V]A (1)
Here, W is the input Tile side length, H is the input/output side length, C1 is the number of input channels, C2 the number of output channels, and R the convolution kernel size. A, B and G are constant transformation matrices that depend on W and the convolution stride S. The main idea is to transform the input (T) and the convolution kernel (F), perform a matrix dot product, and transform the result to obtain the final output. For example, with a 5 x 5 Tile and a 3 x 3 convolution kernel, conventional convolution requires 81 multiplications (ignoring data reuse), whereas the Winograd algorithm requires only 25 multiplications plus some additions, with no loss of accuracy. However, convolutional neural network accelerators adopting the Winograd algorithm usually support only convolution kernels of size three or smaller, since supporting large kernels degrades accuracy, and they do not support multiple strides. Designing a convolutional neural network accelerator based on the Winograd algorithm therefore faces the following main problems:
1. How to accelerate convolutions whose kernels have different sizes.
2. How to accelerate convolutions with different strides.
3. How to keep the data stream uniform while supporting convolutions of different shapes.
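Formula (1) can be checked numerically for the parameter set the accelerator described later uses (W=4, r=3, stride 1), i.e. the standard F(2x2, 3x3) Winograd algorithm. The sketch below uses the widely published constant matrices for this case (an assumption, since the patent does not print them) and verifies the result against a direct 3 x 3 convolution; the element-wise product stage uses 16 multiplications versus 36 for the direct method.

```python
def matmul(A, B):
    # Dense matrix product over nested lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(r) for r in zip(*M)]

# Standard constant matrices for F(2x2, 3x3): W = 4 input tile, r = 3 kernel.
G  = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
Bt = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
At = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_2d(T, F):
    """Formula (1): U = G F G^T, V = B^T T B, Out = A^T (U ⊙ V) A."""
    U = matmul(matmul(G, F), transpose(G))
    V = matmul(matmul(Bt, T), transpose(Bt))
    M = [[u * v for u, v in zip(ur, vr)] for ur, vr in zip(U, V)]  # 16 multiplies
    return matmul(matmul(At, M), transpose(At))

def direct_conv(T, F):
    """Plain valid 3x3 convolution over a 4x4 tile: 2x2 output, 36 multiplies."""
    return [[sum(T[i + a][j + b] * F[a][b] for a in range(3) for b in range(3))
             for j in range(2)] for i in range(2)]
```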
Drawings
Fig. 1 is a flowchart of the conventional Winograd algorithm.
Fig. 2 is a calculation flowchart of the Winograd-oriented splitting algorithm.
FIG. 3 is a schematic diagram of the calculation process of the Winograd convolution splitting algorithm with the step size of 1.
FIG. 4 is a schematic diagram of the calculation process of the Winograd convolution splitting algorithm with the step size of 3.
Fig. 5 is a schematic diagram of the input and convolution kernel conversion module, applicable to W=4, R=3 or 2.
Fig. 6 is a schematic diagram of the output conversion module, applicable to W=4, R=3 or 2.
FIG. 7 is a schematic diagram of a unified PE array and conversion module.
Detailed Description
The invention is further described below with reference to the figures and examples.
The flow of the Winograd convolution splitting method for a convolutional neural network accelerator is shown in figure 2, where w and r are the set Winograd algorithm parameters. The difference from the conventional Winograd algorithm is that, before Winograd convolution acceleration, the input feature values and the convolution kernel are split and zero-padded according to the stride and the kernel size; Winograd acceleration is then applied to each split result separately, and the output results are accumulated.
The invention provides a Winograd convolution splitting method and a Winograd addition algorithm for a convolutional neural network accelerator. Specifically, elements at different positions are split, combined and zero-padded according to the convolution parameters. The implementation steps are as follows:
1. When the convolution kernel size is smaller than r, zero-pad the convolution kernel directly.
2. When the convolution kernel size is larger than r and the stride is 1, combine adjacent elements of the input and the convolution kernel, and zero-pad to the set input size w and kernel size r.
3. When the convolution kernel size is larger than r and the stride is larger than 1, combine input and kernel elements at intervals of the stride, and zero-pad to the set input size w and kernel size r.
4. Perform Winograd convolution and accumulation on the split, combined and zero-padded inputs and convolution kernels.
The splitting method provided by the invention proceeds as follows:
When the convolution kernel size is smaller than r:
1. Zero-pad the convolution kernel to the set r x r size.
2. Zero-pad the input to a Tile of size w x w.
When the convolution kernel size is larger than r and the stride is 1:
1. Starting from the upper-left corner of the original convolution kernel, combine adjacent kernel elements into r x r convolution kernels; the elements of each combined kernel are not repeated and may not cross the boundary of the original kernel.
2. Zero-pad the split and combined kernels smaller than r to r x r.
3. Starting from the upper-left corner of the original input, combine adjacent input elements into Tiles of size w x w; the elements of each combined Tile are not repeated and may not cross the boundary of the original input.
4. Zero-pad the split and combined Tiles smaller than w to w x w (4 x 4 in the example). The result is shown in FIG. 3.
When the convolution kernel size is larger than r and the stride is larger than 1:
1. Starting from the upper-left corner of the original convolution kernel, combine kernel elements spaced by the set stride into convolution kernels of size r x r; the elements of each combined kernel are not repeated and may not cross the boundary of the original kernel.
2. Zero-pad the split and combined kernels smaller than r to r x r.
3. Starting from the upper-left corner of the original input, combine input elements spaced by the set stride into Tiles of size w x w; the elements of each combined Tile are not repeated and may not cross the original input boundary.
4. Zero-pad the split and combined Tiles smaller than w to w x w. The result is shown in FIG. 4.
The Winograd addition algorithm provided by the invention is carried out according to the following steps:
After splitting, the original input and convolution kernel are converted into several groups of data streams, each with a w x w Tile and an r x r convolution kernel. Within each group, the Tile and the kernel undergo a Winograd convolution operation with unified parameters, and the output results are added together.
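Why this accumulation reproduces the original convolution can be seen from the polyphase identity: a stride-S convolution equals the sum of the stride-1 convolutions of the matching (input phase, kernel phase) pairs. The sketch below verifies that identity with plain direct convolutions standing in for the Winograd units (an assumption made to keep the example short; in the method above each pair goes through the Winograd unit instead).

```python
def conv2d(X, K, stride=1):
    """Plain valid 2-D convolution (correlation form, as in the document)."""
    kh, kw = len(K), len(K[0])
    oh = (len(X) - kh) // stride + 1
    ow = (len(X[0]) - kw) // stride + 1
    return [[sum(X[i * stride + a][j * stride + b] * K[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

def phase(M, p, q, S):
    """Polyphase component: every S-th element starting at (p, q)."""
    return [row[q::S] for row in M[p::S]]

def split_conv(X, K, S):
    """Accumulate the per-phase stride-1 convolutions (step 6). Each
    (phase of K, phase of X) pair is what one Winograd unit would get;
    a direct stride-1 convolution stands in for the unit here."""
    out = None
    for p in range(S):
        for q in range(S):
            part = conv2d(phase(X, p, q, S), phase(K, p, q, S), 1)
            if out is None:
                out = part
            else:
                out = [[o + c for o, c in zip(orow, crow)]
                       for orow, crow in zip(out, part)]
    return out
```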
Comparison of the Performance of the present invention with existing methods
For the Winograd convolution splitting method for a convolutional neural network accelerator provided by the invention, the number of multiplications introduced under different parameters is given by formula (1), where S is the stride, m is the output matrix size, r is the set convolution kernel size, and W is the input matrix size:
[Formula (1) is printed as an image (BDA0002156103050000071) in the original publication and is not reproduced here.]
The multiplication times introduced under different parameters of the traditional Winograd algorithm are shown in a formula (2):
NumMult_Convention = r^2 × (m - W + 1)^2 (2)
Table 1 shows the multiplication saving rate of this algorithm, relative to the conventional convolution algorithm, for a stride of 1. For most cases, i.e. where the convolution kernel size r is larger than S, the saving rate reaches 36%-55%. Where the kernel size r is smaller than S, the algorithm is inferior to conventional convolution, but convolutions with such parameters are essentially unused in CNN models. The effect of the algorithm is therefore significant.
TABLE 1 multiplication saving rate for Winograd convolution splitting algorithm with step size of 1
[Table 1 is printed as an image (BDA0002156103050000072) in the original publication and is not reproduced here.]
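As a rough cross-check of the reported saving range (independent of the unprinted table, and not the patent's formula (1)): for a single tile at stride 1, direct convolution needs m^2 * r^2 multiplications while one Winograd tile needs W^2, with m = W - r + 1. The helper name `winograd_saving` is illustrative.

```python
def winograd_saving(W, r):
    """Per-tile multiply saving of Winograd versus direct convolution at
    stride 1, with output size m = W - r + 1. A back-of-the-envelope
    check only; not the patent's formula (1)."""
    m = W - r + 1
    direct = m * m * r * r  # r*r multiplies per output element
    wino = W * W            # one multiply per element of the W x W tile
    return 1 - wino / direct
```

For W=4, r=3 this gives 1 - 16/36, about 55.6%, near the top of the 36%-55% range quoted above; for W=5, r=3 it reproduces the 25-versus-81 example from the description (about 69% saved).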
Table 2 shows the multiplication saving rate of this algorithm, relative to conventional convolution, for a stride of 3. Compared with stride 1, the benefit for small convolution kernels is modest, while the saving rate for large kernels remains high. When the stride is 3 and the kernel is small, a conventional convolution algorithm may therefore be preferred in the accelerator.
TABLE 2 Winograd-oriented convolution splitting algorithm multiplication saving rate with step length of 3
[Table 2 is printed as an image (BDA0002156103050000081) in the original publication and is not reproduced here.]
Examples
The invention can be realized in a PE array of a convolutional neural network accelerator.
Most current convolutional neural network accelerators that use the Winograd algorithm require two optimizations to adopt the present invention. First, the conversion module: a conventional accelerator would implement several Winograd conversion modules with different parameters to support multiple convolution shapes, whereas the invention supports any convolution shape with a single conversion module. As shown in figs. 5 and 6, the conversion module supports both W=4, R=2 and W=4, R=3 through resource multiplexing; the number of multiplications introduced by the convolution splitting algorithm, compared with conventional convolution (stride S=1), is shown in table 3. Second, the PE array: since the algorithm and the conversion module reduce a convolution of any shape to 4 x 4 dot-product operations, the PE array design is also simplified, as shown in fig. 7.
TABLE 3 Multiplication saving rate of the Winograd-oriented convolution splitting algorithm with a stride of 1
[Table 3 is printed as an image (BDA0002156103050000091) in the original publication and is not reproduced here.]
Compared with the conventional convolution algorithm, the multiplication saving rate reaches 51.8% on average for a stride of 1.

Claims (2)

1. A Winograd convolution splitting method for a convolutional neural network accelerator, characterized in that the method splits convolutions of different shapes and converts them into a unified data stream, while simultaneously supporting the two Winograd parameter sets W=4, R=2 and W=4, R=3, and comprises the following steps:
1) Reading an input and a convolution kernel of arbitrary size from the cache of the convolutional neural network accelerator; the convolutional neural network accelerator simultaneously supports the two Winograd parameter sets W=4, R=2 and W=4, R=3, implemented as follows:
for the input and convolution kernel conversion modules, a single conversion module supports any convolution shape with W=4, R=3 or 2 through circuit resource multiplexing;
for the output conversion module, a single conversion module supports any convolution shape with W=4, R=3 or 2 through circuit resource multiplexing;
for the PE array computing the dot products, convolutions with Winograd parameters W=4, R=3 or 2 are uniformly converted into 4 x 4 dot-product operations through convolution splitting, so the PE array design is simplified to 16 unified multipliers;
2) Judging whether to carry out convolution splitting or not according to the size of the convolution kernel and the input size, and if the convolution splitting is required, carrying out the next step; the specific judgment method is as follows:
if the convolution kernel size is smaller than the set convolution kernel size and the input size is smaller than the set input size, no convolution splitting is performed and the kernel and input are directly zero-padded to the set convolution kernel size and input size; if the convolution kernel size is larger than the set convolution kernel size and the input size is larger than the set input size, convolution splitting is performed; the number of multiplications introduced by the convolution splitting method under different parameters is given by formula (1), where S is the stride, m is the output matrix size, r is the set convolution kernel size, and W is the input matrix size;
[Formula (1) is printed as an image (FDA0003970310070000011) in the original publication and is not reproduced here.]
the convolution splitting method simultaneously supports the Winograd algorithm parameters W=4, R=2 and W=4, R=3;
3) Splitting the convolution kernel according to the size and the step length of the convolution kernel, and splitting the input according to the size and the step length of the input; the specific implementation method comprises the following steps:
30) Splitting the original convolution kernel into several convolution kernels no larger than the set convolution kernel size, where adjacent elements of each split kernel are one convolution stride apart in the horizontal and vertical directions; the specific implementation method comprises the following steps:
301) taking the element in the upper-left corner of the original convolution kernel as the first element;
302) taking the next elements in the horizontal and vertical directions at intervals of the convolution stride, until the set convolution kernel size is reached;
303) combining all taken elements, by position, into a new convolution kernel, completing the first split kernel;
304) taking the upper-left-most element not taken before as the first element of the second convolution kernel;
305) taking the remaining elements in the same way, until a number less than or equal to the set convolution kernel size is obtained;
306) repeating the above steps until the remaining convolution kernel elements are all split;
31) Splitting the original input into several inputs no larger than the set input size, where adjacent elements of each split input are one convolution stride apart in the horizontal and vertical directions; the specific implementation method comprises the following steps:
311) taking the element in the upper-left corner of the original input as the first element;
312) taking the next elements in the horizontal and vertical directions at intervals of the convolution stride, until the set input size is reached;
313) combining all taken elements, by position, into a new input, completing the first split input;
314) taking the upper-left-most element not taken before as the first element of the second input;
315) taking the remaining elements in the same way, until a number less than or equal to the set input size is obtained;
316) repeating the above steps until the remaining input elements are all split;
4) Combining and zero-filling the split elements according to the size of the convolution kernel, and combining and zero-filling the split elements according to the input size;
5) Carrying out Winograd convolution on each pair of split input and convolution kernel; the specific implementation method comprises the following steps:
performing Winograd kernel transformation on the convolution kernel at the set convolution kernel size;
performing Winograd input transformation on the input at the set input size;
performing the dot-product operation, at the set convolution kernel and input sizes, on the transformed input and convolution kernel;
performing Winograd output transformation on the dot-product result at the set output size;
6) Accumulating the Winograd convolution results of each pair of input and convolution kernel;
7) And storing the accumulation result in a cache of the convolutional neural network accelerator.
2. The Winograd convolution splitting method for the convolutional neural network accelerator as claimed in claim 1, wherein the specific implementation method of step 4) is as follows:
if a split convolution kernel already has the set convolution kernel size, no zero padding is performed; if it is smaller, it is zero-padded at the upper left to the set convolution kernel size;
if a split input already has the set input size, no padding is performed; if it is smaller, it is zero-padded at the upper left to the set input size.
CN201910717929.9A 2019-08-05 2019-08-05 Winograd convolution splitting method for convolution neural network accelerator Active CN110533164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910717929.9A CN110533164B (en) 2019-08-05 2019-08-05 Winograd convolution splitting method for convolution neural network accelerator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910717929.9A CN110533164B (en) 2019-08-05 2019-08-05 Winograd convolution splitting method for convolution neural network accelerator

Publications (2)

Publication Number Publication Date
CN110533164A (en) 2019-12-03
CN110533164B (en) 2023-04-07

Family

ID=68661412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910717929.9A Active CN110533164B (en) 2019-08-05 2019-08-05 Winograd convolution splitting method for convolution neural network accelerator

Country Status (1)

Country Link
CN (1) CN110533164B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10990648B2 (en) * 2017-08-07 2021-04-27 Intel Corporation System and method for an optimized winograd convolution accelerator
CN112765538B (en) * 2019-11-01 2024-03-29 中科寒武纪科技股份有限公司 Data processing method, device, computer equipment and storage medium
CN113033813B (en) * 2019-12-09 2024-04-26 中科寒武纪科技股份有限公司 Data processing method, device, computer equipment and storage medium
CN111260037B (en) * 2020-02-11 2023-10-13 深圳云天励飞技术股份有限公司 Convolution operation method and device of image data, electronic equipment and storage medium
EP4213070A4 (en) * 2020-09-29 2023-10-25 Huawei Technologies Co., Ltd. Neural network accelerator, and acceleration method and device
CN112215745A (en) * 2020-09-30 2021-01-12 深圳云天励飞技术股份有限公司 Image processing method and device and electronic equipment
CN112862091B (en) * 2021-01-26 2022-09-27 合肥工业大学 Resource multiplexing type neural network hardware accelerating circuit based on quick convolution
CN112927124A (en) * 2021-03-31 2021-06-08 成都商汤科技有限公司 Data processing method, device, equipment and storage medium
CN113269302A (en) * 2021-05-11 2021-08-17 中山大学 Winograd processing method and system for 2D and 3D convolutional neural networks
CN113283587B (en) * 2021-05-28 2023-09-19 西安交通大学 Winograd convolution operation acceleration method and acceleration module
CN113627592B (en) * 2021-08-02 2023-09-19 西安交通大学 Winograd-oriented convolution tensor optimization method and system with adjustable parameters
CN113780544B (en) * 2021-11-10 2022-04-05 南京风兴科技有限公司 Large convolution kernel hardware implementation method, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388777A (en) * 2017-08-07 2019-02-26 英特尔公司 System and method for optimized Winograd convolution accelerator
CN109886400A (en) * 2019-02-19 2019-06-14 合肥工业大学 The convolutional neural networks hardware accelerator system and its calculation method split based on convolution kernel

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388777A (en) * 2017-08-07 2019-02-26 英特尔公司 System and method for optimized Winograd convolution accelerator
CN109886400A (en) * 2019-02-19 2019-06-14 合肥工业大学 The convolutional neural networks hardware accelerator system and its calculation method split based on convolution kernel

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Yang et al.; "WRA: A 2.2-to-6.3 TOPS Highly Unified Dynamically Reconfigurable Accelerator Using a Novel Winograd Decomposition Algorithm for Convolutional Neural Networks"; IEEE Transactions on Circuits and Systems I: Regular Papers; 2019-07-29; vol. 66, no. 09; pp. 3480-3493 *

Also Published As

Publication number Publication date
CN110533164A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110533164B (en) Winograd convolution splitting method for convolution neural network accelerator
CN105681628B (en) A kind of convolutional network arithmetic element and restructural convolutional neural networks processor and the method for realizing image denoising processing
CN107229598A (en) A kind of low power consumption voltage towards convolutional neural networks is adjustable convolution computing module
CN112465110B (en) Hardware accelerator for convolution neural network calculation optimization
CN110688088A (en) General nonlinear activation function computing device and method for neural network
CN112633490B (en) Data processing device, method and related product for executing neural network model
CN108154229A (en) Accelerate the image processing method of convolutional neural networks frame based on FPGA
CN108205703B (en) Multi-input multi-output matrix average value pooling vectorization implementation method
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN112862091B (en) Resource multiplexing type neural network hardware accelerating circuit based on quick convolution
US20230068450A1 (en) Method and apparatus for processing sparse data
CN110598844A (en) Parallel convolution neural network accelerator based on FPGA and acceleration method
Jiang et al. A high-throughput full-dataflow mobilenetv2 accelerator on edge FPGA
CN114138231A (en) Method, circuit and SOC for executing matrix multiplication operation
CN102970545A (en) Static image compression method based on two-dimensional discrete wavelet transform algorithm
CN115982418B (en) Method for improving super-division operation performance of AI (advanced technology attachment) computing chip
Xu et al. Design and implementation of an efficient CNN accelerator for low-cost FPGAs
CN103092559A (en) Multiplying unit structure for discrete cosine transformation (DCT)/inverse discrete cosine transformation (IDCT) circuit under high efficiency video coding (HEVC) standard
Xiao et al. A mobilenet accelerator with high processing-element-efficiency on fpga
CN115640833A (en) Accelerator and acceleration method for sparse convolutional neural network
CN113407904B (en) Winograd processing method, system and medium compatible with multi-dimensional convolutional neural network
CN115983343A (en) YOLOv4 convolutional neural network lightweight method based on FPGA
CN113744220B (en) PYNQ-based detection system without preselection frame
CN110737869B (en) DCT/IDCT multiplier circuit optimization method and application
CN102722470A (en) Single-machine parallel solving method for linear equation group

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant