CN110807513A - Convolutional neural network accelerator based on Winograd sparse algorithm - Google Patents

Convolutional neural network accelerator based on Winograd sparse algorithm

Info

Publication number
CN110807513A
Authority
CN
China
Prior art keywords
data
winograd
module
weight
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911013112.XA
Other languages
Chinese (zh)
Inventor
郭阳
徐睿
马胜
刘胜
陈海燕
王耀华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201911013112.XA priority Critical patent/CN110807513A/en
Publication of CN110807513A publication Critical patent/CN110807513A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a convolutional neural network accelerator based on the Winograd sparse algorithm, comprising: a control module, responsible for data movement; a buffer module, for temporarily storing loaded data; and an operation module, for performing the Winograd sparse-algorithm computation. In the read phase, the control module issues addresses, and the input buffer and the weight buffer read data from the external DRAM. In the computation phase, the operation module reads input data, weight data and weight indexes from the buffer module to complete the convolution operation. In the send phase, once the output has completed the final accumulation, it is written back to the external DRAM through the output buffer, completing the computation. The invention has the advantages of a simple structure, ease of implementation and a good acceleration effect.

Description

Convolutional neural network accelerator based on Winograd sparse algorithm
Technical Field
The invention relates generally to the technical field of convolutional neural networks, and in particular to a convolutional neural network accelerator based on a Winograd sparse algorithm.
Background
Convolutional neural networks are now widely used across computing fields such as image recognition, recommendation systems and language processing. However, the time required to train convolutional neural networks and to run inference on them can be prohibitive. The cause is the convolutional layers themselves: they raise the computational complexity of the network and bring an enormous workload, which current CPUs and embedded processors struggle to handle.
Many schemes have been proposed to address this problem, such as GPU acceleration, or performing the convolution on hardware such as FPGAs and custom ASICs. Most of these schemes exploit the parallelism of the convolutional neural network algorithm to improve the efficiency of the convolution computation. However, constraints on GPU area, power consumption and platform suitability make GPUs difficult to deploy widely on popular mobile or embedded terminals. Customized hardware design on an FPGA or ASIC, which achieves acceleration while keeping power consumption and area under control, is therefore a highly efficient solution.
At present, however, most FPGA and ASIC schemes use direct convolution when handling convolutional-layer operations, and few schemes employ other algorithms. This optimizes computational efficiency through hardware alone, while ignoring the optimization space at the software or algorithm level. Given the trend toward ever deeper convolutional network topologies, and the higher computational complexity they bring, it is necessary to select other acceleration schemes to obtain more benefit from limited hardware resources.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the technical problems in the prior art, the invention provides a convolutional neural network accelerator based on the Winograd sparse algorithm that is simple in structure, easy to implement and delivers a good acceleration effect.
In order to solve the technical problems, the invention adopts the following technical scheme:
a convolutional neural network accelerator based on Winograd sparse algorithm, comprising:
the control module is used for taking charge of moving the data;
a buffer module buffer for temporary storage of load data,
the operation module is used for finishing the operation of the Winograd sparse algorithm;
in the reading stage, the control module sends an address, and the input cache and the weight cache read data in the external DRAM; in the data operation stage, the operation module reads input data, weight data and weight indexes from the buffer module to complete convolution operation; in the sending stage, when the output finishes the final accumulation operation, the output is sent to the external DRAM through the output cache, and the calculation is finished finally.
As a further improvement of the invention, the control module comprises:
a conversion module, for converting the data to be processed into the Winograd domain;
a zero-skipping module, for skipping all registers whose input value is 0 before feeding the remaining inputs into the parallel multiplier array, thereby reducing the pressure on the multipliers during computation;
a compression coding unit, for providing sparse storage support; and
a weight compression-coding reading unit, for storing the data and the index separately, with reads of the data and the index guided by the buffer position and the first entry of the index.
As a further improvement of the invention, the compression coding unit targets a 4 × 4 sparse matrix with a density of about 0.4. In the data structure, the two-dimensional matrix is stored linearly in a one-dimensional form: the values of the non-zero elements are stored in the vector Data, and the position of each non-zero element is stored in the vector Index, encoded as (r × 4 + c), where r is the element's row in the matrix and c is its column. The total number of non-zero elements in the matrix is stored in the first entry of Index.
As a further improvement of the invention, the weight compression-coding reading unit operates on 4 × 4 convolution weights that have been trained with sparsity added. Since the maximum value stored in Index is 16, each Index entry is stored as a 5-bit unsigned integer; Data is stored in 16-bit fixed point.
As a further improvement of the invention, the conversion module converts the input into the Winograd domain; the element-wise multiplication between the matrices is then performed, and finally the multiplier output is converted from the Winograd domain back into the spatial domain to produce the result.
As a further improvement of the invention, the buffer module uses a linear buffer unit that delivers data directly in the required one-dimensional form; when data reuse occurs during reading, the read pointer re-reads within the reuse region to reuse the data.
As a further improvement of the invention, the operation module comprises:
a processing engine, for performing the convolution of the input feature data and the weight data under the Winograd sparse algorithm; and
a processing unit, for processing the input feature data and the weight data.
As a further improvement of the invention, the processing unit (PU) is composed of sub-module processing engines (PEs) and an accumulator. The PU processes four groups of data at a time, corresponding to four input channels; four groups of weight data are loaded, and the computation produces one group of output data corresponding to the output feature map of a single channel. The four groups of inputs are distributed to the four corresponding PEs in the PU, the four PEs compute in parallel, and the results are accumulated to produce the output.
Compared with the prior art, the invention has the advantages that:
the convolutional neural network accelerator based on the Winograd sparse algorithm has the advantages of simple structure, easiness in realization and good acceleration effect, and can quickly convert data to be processed into a Winograd domain by utilizing the conversion module through the simplest addition operation. Then by skipping the module by 0, the pressure to use the multiplier in the calculation process can be reduced. And the convolution operation of the input characteristic data and the weight data under a Winograd sparse algorithm is completed by utilizing the processing engine and the processing unit, and the input characteristic data and the weight data are processed. The linear buffer design in the invention can reuse the input characteristic data to the maximum extent.
Drawings
Fig. 1 is a schematic diagram of the topology of the present invention.
FIG. 2 is a schematic diagram of the structure of a processing unit, together with pseudocode for its operation, in an embodiment of the present invention.
FIG. 3 is a diagram of compression coding in a specific application example of the present invention.
FIG. 4 is a diagram illustrating weight data storage and reading in an embodiment of the present invention.
FIG. 5 is a schematic diagram of the processing engine in a specific application example.
FIG. 6 is a schematic diagram of the structural principle of the conversion module in a specific application example of the present invention.
FIG. 7 is a schematic diagram of a linear buffer unit in an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings and specific examples.
As shown in fig. 1, the convolutional neural network accelerator based on the Winograd sparse algorithm of the present invention comprises:
a control module (top control), responsible for data movement;
a buffer module (buffer), for temporarily storing loaded data; and
an operation module (PUs), for performing the Winograd sparse-algorithm computation.
In the read phase, the control module (top control) issues addresses, and the input buffer and the weight buffer read data from the external DRAM;
in the computation phase, the operation module reads input data, weight data and weight indexes from the buffer module to complete the convolution operation;
in the send phase, once the output has completed the final accumulation, it is written back to the external DRAM through the output buffer, completing the computation.
In a specific application example, the control module of the invention comprises:
a conversion module, for converting the data to be processed into the Winograd domain; in this example, the conversion module completes the conversion very quickly using only simple additions;
a zero-skipping module, for skipping all registers whose input value is 0 before feeding the remaining inputs into the parallel multiplier array, thereby reducing the pressure on the multipliers during computation;
a compression coding unit, for providing sparse storage support: the data is no longer stored in its original matrix form; instead, the two-dimensional matrix is stored linearly in a one-dimensional form, the values of the non-zero elements are stored in the vector Data, and the position information of the non-zero elements is stored in the vector Index; and
a weight compression-coding reading unit, for storing the data and the index separately so as to save buffer space, with reads of the data and the index guided by the buffer position and the first entry of the index.
In a specific application example, the buffer module of the invention uses a linear buffer unit. Because the processing engine flattens the two-dimensional matrix into a one-dimensional vector when handling input data, a buffer with a linear structure allows the data to be delivered directly to the processing unit in the required one-dimensional form. Data reuse also occurs during reading: the read pointer re-reads within the reuse region to reuse data.
In a specific application example, the operation module of the invention comprises:
a processing engine, for performing the convolution of the input feature data and the weight data under the Winograd sparse algorithm; and
a processing unit, for processing the input feature data and the weight data; in this example the processing unit, which contains the processing engines and the accumulator, is a key building block of the computation module.
As shown in fig. 2, the processing unit (PU) is not the basic unit of the computation module; it is itself composed of sub-module processing engines (PEs) and an accumulator. The PU processes four groups of data at a time, corresponding to four input channels; accordingly, four groups of weight data are loaded, and the computation produces one group of output data corresponding to the output feature map of a single channel. To reduce the number of weight-data reads, the dataflow is weight-stationary: the next group of weights is loaded only once the current weights have been fully used. The four groups of inputs are distributed to the four corresponding PEs in the PU, the four PEs compute in parallel, and the results are accumulated to produce the output; the computation can be followed in the pseudocode of fig. 2. Note that the weights referred to here are compression-encoded data, comprising weight values and a weight index.
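To make this dataflow concrete, the following Python sketch models one PU: each PE transforms its input tile into the Winograd domain, multiplies it element-wise by its stationary Winograd-domain weight tile, and inverse-transforms the product; the accumulator sums the four 2 × 2 PE outputs into one output tile. The function names and the use of the standard Winograd F(2 × 2, 3 × 3) transform matrices are illustrative assumptions, not the patent's exact implementation:

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (an assumption
# consistent with the 4x4 tiles used in this design): B^T maps a 4x4
# spatial input tile into the Winograd domain, A^T maps a 4x4
# element-wise product back to a 2x2 spatial output tile.
B_T = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], dtype=np.float64)
A_T = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=np.float64)

def pe(d, w_wd):
    """One PE: transform the 4x4 input tile d, multiply element-wise by the
    (compression-decoded) Winograd-domain weight tile, inverse-transform."""
    return A_T @ ((B_T @ d @ B_T.T) * w_wd) @ A_T.T

def pu_forward(inputs, weights_wd):
    """One PU: four PEs, one per input channel, run in parallel in hardware;
    the accumulator sums their 2x2 outputs into one tile of the output
    feature map. The weights stay loaded (weight-stationary) while
    successive input tiles stream through."""
    return sum(pe(inputs[ch], weights_wd[ch]) for ch in range(4))
```

Because the inverse transform is linear, the accumulation could equally be done in the Winograd domain before a single inverse transform; the sketch follows the per-PE structure described in the text.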
As shown in fig. 3, the compression coding scheme in a specific application example of the present invention works as follows. Because a Winograd-domain sparse network structure is used, the invention proposes its own compression coding scheme, drawing on current mainstream sparse-matrix coding formats and combining them with its own hardware features and computing requirements, and targets a 4 × 4 sparse matrix with a density of about 0.4. First, in the data structure, the data is no longer stored in its original matrix form; instead, the two-dimensional matrix is stored linearly in a one-dimensional form. The values of the non-zero elements are then stored in the vector Data, and the position of each non-zero element is stored in the vector Index, encoded as (r × 4 + c), where r is the element's row in the matrix and c is its column. In addition, the total number of non-zero elements in the matrix is stored in the first entry of Index.
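A minimal Python sketch of this encoding (the function name and the list representation are assumptions for illustration):

```python
import numpy as np

def encode_4x4(w):
    """Compression-encode a 4x4 sparse tile as described above: Data holds
    the non-zero values in row-major order; Index holds the non-zero count
    in its first entry, followed by each position encoded as r*4 + c.
    """
    flat = np.asarray(w).reshape(-1)      # store the 2-D matrix linearly
    nz = np.flatnonzero(flat)             # positions of the non-zero elements
    data = flat[nz].tolist()              # vector Data
    index = [len(nz)] + nz.tolist()       # vector Index, count first
    return data, index
```

For a 4 × 4 tile at density 0.4 (six or seven non-zero elements), Data then has six or seven entries and Index one more, which is the layout the reading procedure below relies on.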
Fig. 4 is a schematic diagram of weight storage and reading in an embodiment of the present invention. Consider 4 × 4 convolution weights that have been trained with sparsity added: the maximum value stored in Index is 16, so each Index entry is stored as a 5-bit unsigned integer, while Data is stored in 16-bit fixed point, which reduces storage while preserving accuracy. Because the two use different data widths, the data and the index are stored in separate buffers in the hardware design. The association between the data and the index depends on the buffer position and the first entry of the index: since the first entry stores the number of non-zero elements, it also gives the length of the weight-data vector and of the remaining index vector for the same group. When the first group of weights is read, the first entry in the buffer is read; in the figure its value is 6, so the read pointer advances 6 entries and reads them sequentially (the dark part of the figure); the first entry of the next group is then read to obtain that group's length, and the process repeats (the light part of the figure). By means of the first index entry, the location of each group of weights in the buffer can thus be easily found and the weight data associated with its index.
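The group-wise read can be sketched as follows, under the assumption that the separately stored index and data buffers behave as simple sequential lists:

```python
def read_weight_groups(index_buf, data_buf):
    """Walk the separately stored buffers group by group: each group starts
    with a length entry in the index buffer, and that length locates both
    the group's remaining index entries and its weight data.
    """
    groups, ip, dp = [], 0, 0
    while ip < len(index_buf):
        n = index_buf[ip]                          # first entry: non-zero count
        positions = index_buf[ip + 1: ip + 1 + n]  # this group's r*4+c positions
        values = data_buf[dp: dp + n]              # this group's weight values
        groups.append(list(zip(positions, values)))
        ip += 1 + n                                # pointer jumps past the group
        dp += n
    return groups
```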
As shown in fig. 5, a schematic diagram of the processing-engine structure in a specific application example of the present invention, the module works in three steps: first, processing the input feature map; second, completing the computation under the sparse data structure, guided by the index; and third, inversely transforming the output result, converting it from the Winograd domain back into the spatial domain.
Fig. 6 is a schematic diagram of the conversion module in an embodiment of the invention. In the first stage of the processing engine, the input must be transformed into the Winograd domain, and the conversion module performs this operation. The parameter matrices involved in the Winograd transform are very simple, involving only sign changes on the data and addition/subtraction, so no complex computation module is required.
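The patent does not spell out the transform matrices, but for 4 × 4 input tiles advanced with a stride of 2, the standard Winograd F(2 × 2, 3 × 3) parameters apply, and their input-transform matrix B^T contains only 0 and ±1 — consistent with the statement that only sign changes and additions/subtractions are needed. A minimal sketch under that assumption (matrix and function names are illustrative):

```python
import numpy as np

# Input-transform matrix B^T of the standard Winograd F(2x2, 3x3)
# algorithm (assumed here). Every entry is 0 or +-1, so the hardware
# transform needs only sign changes and add/subtract, no multipliers.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float64)

def input_transform(d):
    """Transform one 4x4 spatial input tile d into the Winograd domain."""
    return B_T @ d @ B_T.T    # computes B^T d B
```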
In the second stage, the element-wise multiplication between the matrices is completed. Because the weight data has been compression-encoded, the input and output data are represented as one-dimensional vectors throughout the computation flow. To let the compression encoding reduce the workload, the index is used as a control signal to gate the input and output registers that participate in the computation; all registers holding an input value of 0 are skipped, and the remaining inputs are fed into the parallel multiplier array to complete the computation and produce the output. Because the compression-coding method skips most of the multiplications, the design of the present invention places only 8 multipliers in each PE for the 16-bit inputs; later experiments show that 8 multipliers are sufficient.
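A behavioral sketch of this index-gated multiplication follows (a software model only; in hardware the surviving products are spread across the 8-multiplier array in parallel). Here the gating is driven by the weight index from the compression encoding, which already skips every zero weight; gating away zero-valued inputs as well, as the zero-skipping module does, would suppress additional products. All names are assumptions:

```python
def sparse_multiply(input_wd, weight_data, weight_index):
    """Index-gated element-wise multiply of a flattened 16-entry
    Winograd-domain input tile with one compression-encoded weight group.
    Only the positions listed in the index reach a multiplier; all other
    output registers keep their default value of 0.
    """
    out = [0] * 16
    n = weight_index[0]                  # first entry: non-zero count
    for i in range(n):                   # at most n multiplications issued
        pos = weight_index[1 + i]        # position r*4 + c in the tile
        out[pos] = input_wd[pos] * weight_data[i]
    return out
```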
In the last stage, the multiplier output is transformed by the conversion module from the Winograd domain back into the spatial domain, completing the output of the result.
Fig. 7 shows the buffer module for the input features in an embodiment of the present invention. Because the processing unit flattens the two-dimensional matrix into a one-dimensional vector when handling input data, the invention designs and uses a buffer with a linear structure, so that data can be delivered directly to the processing module in the required one-dimensional form. Taking the figure as an example: viewed horizontally, the storage is linear, and the different data blocks under the same channel of the feature map are read into the linear buffer, corresponding to the small squares in the figure; viewed vertically, the data of the different channels (TN) are stored. Because the Winograd algorithm in this design processes one 4 × 4 matrix block (TH × TW in the figure) at a time, data under the same channel is selected, during the read from DRAM to buffer, by sliding a 4 × 4 window over the data vertically with a stride of 2. This process naturally reuses data: after the first data block has been read, only the next two rows of data need to be read (the dark part of the figure is the reused part). The reused data corresponds to the dotted part of the linear buffer; the read pointer re-reads within that region to reuse the data, and the size of the reusable data is about H × TW.
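The overlap that creates this reuse can be sketched directly from the tiling parameters in the text (4 × 4 tiles, stride 2; function and argument names are assumptions):

```python
import numpy as np

def winograd_tiles(channel, tile=4, stride=2):
    """Yield the overlapping 4x4 tiles that a 4x4 window sliding with
    stride 2 selects from one input channel (a 2-D numpy array).
    Consecutive vertical tiles share two rows, so after the first tile
    only two new rows must be fetched from DRAM; those shared rows are
    what the linear buffer re-reads instead of re-fetching.
    """
    h, w = channel.shape
    for r in range(0, h - tile + 1, stride):
        for c in range(0, w - tile + 1, stride):
            yield channel[r:r + tile, c:c + tile]

# e.g. list(winograd_tiles(np.arange(36).reshape(6, 6))) yields four tiles,
# each overlapping its vertical neighbour by two rows.
```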
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to the above embodiments; all technical solutions falling under the idea of the present invention belong to the scope of protection of the present invention. It should be noted that modifications and refinements that do not depart from the principle of the invention will occur to those skilled in the art and are also within the scope of protection of the invention.

Claims (8)

1. A convolutional neural network accelerator based on a Winograd sparse algorithm, characterized by comprising:
a control module, responsible for data movement;
a buffer module, for temporarily storing loaded data; and
an operation module, for performing the Winograd sparse-algorithm computation;
wherein, in a read phase, the control module issues addresses and the input buffer and the weight buffer read data from the external DRAM; in a computation phase, the operation module reads input data, weight data and weight indexes from the buffer module to complete the convolution operation; and in a send phase, once the output has completed the final accumulation, it is written back to the external DRAM through the output buffer, completing the computation.
2. The convolutional neural network accelerator based on the Winograd sparse algorithm according to claim 1, wherein the control module comprises:
a conversion module, for converting the data to be processed into the Winograd domain;
a zero-skipping module, for skipping all registers whose input value is 0 before feeding the remaining inputs into the parallel multiplier array, thereby reducing the pressure on the multipliers during computation;
a compression coding unit, for providing sparse storage support; and
a weight compression-coding reading unit, for storing the data and the index separately, with reads of the data and the index guided by the buffer position and the first entry of the index.
3. The convolutional neural network accelerator based on the Winograd sparse algorithm according to claim 2, wherein the compression coding unit targets a 4 × 4 sparse matrix with a density of about 0.4; the two-dimensional matrix is stored linearly in a one-dimensional form, the values of the non-zero elements are stored in the vector Data, and the position of each non-zero element is stored in the vector Index, encoded as (r × 4 + c), where r is the element's row in the matrix and c is its column; and the total number of non-zero elements in the matrix is stored in the first entry of Index.
4. The convolutional neural network accelerator based on the Winograd sparse algorithm according to claim 2, wherein the weight compression-coding reading unit operates on 4 × 4 convolution weights that have been trained with sparsity added; the maximum value stored in Index is 16, so each Index entry is stored as a 5-bit unsigned integer, and Data is stored in 16-bit fixed point.
5. The convolutional neural network accelerator based on the Winograd sparse algorithm according to claim 2, wherein the conversion module is configured to convert the input into the Winograd domain; the element-wise multiplication between the matrices is then performed, and finally the multiplier output is converted from the Winograd domain back into the spatial domain to produce the result.
6. The convolutional neural network accelerator based on the Winograd sparse algorithm according to any one of claims 1-5, wherein the buffer module uses a linear buffer unit that delivers data directly in the required one-dimensional form; and when data reuse occurs during reading, the read pointer re-reads within the reuse region to reuse the data.
7. The convolutional neural network accelerator based on the Winograd sparse algorithm according to any one of claims 1-5, wherein the operation module comprises:
a processing engine, for performing the convolution of the input feature data and the weight data under the Winograd sparse algorithm; and
a processing unit, for processing the input feature data and the weight data.
8. The convolutional neural network accelerator based on the Winograd sparse algorithm according to claim 7, wherein the processing unit (PU) is composed of sub-module processing engines (PEs) and an accumulator; the PU processes four groups of data at a time, corresponding to four input channels; four groups of weight data are loaded, and the computation produces one group of output data corresponding to the output feature map of a single channel; and the four groups of inputs are distributed to the four corresponding PEs in the PU, the four PEs compute in parallel, and the results are accumulated to produce the output.
CN201911013112.XA 2019-10-23 2019-10-23 Convolutional neural network accelerator based on Winograd sparse algorithm Pending CN110807513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911013112.XA CN110807513A (en) 2019-10-23 2019-10-23 Convolutional neural network accelerator based on Winograd sparse algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911013112.XA CN110807513A (en) 2019-10-23 2019-10-23 Convolutional neural network accelerator based on Winograd sparse algorithm

Publications (1)

Publication Number Publication Date
CN110807513A true CN110807513A (en) 2020-02-18

Family

ID=69488998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911013112.XA Pending CN110807513A (en) 2019-10-23 2019-10-23 Convolutional neural network accelerator based on Winograd sparse algorithm

Country Status (1)

Country Link
CN (1) CN110807513A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐睿 et al.: "Design and research of a convolutional neural network accelerator based on the Winograd sparse algorithm", 《计算机工程与科学》 (Computer Engineering & Science) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882028A (en) * 2020-06-08 2020-11-03 北京大学深圳研究生院 Convolution operation device for convolution neural network
WO2022067508A1 (en) * 2020-09-29 2022-04-07 华为技术有限公司 Neural network accelerator, and acceleration method and device
CN112949845A (en) * 2021-03-08 2021-06-11 内蒙古大学 Deep convolutional neural network accelerator based on FPGA
CN113077047A (en) * 2021-04-08 2021-07-06 华南理工大学 Convolutional neural network accelerator based on feature map sparsity
CN113077047B (en) * 2021-04-08 2023-08-22 华南理工大学 Convolutional neural network accelerator based on feature map sparsity
CN113592702A (en) * 2021-08-06 2021-11-02 厘壮信息科技(苏州)有限公司 Image algorithm accelerator, system and method based on deep convolutional neural network
CN113835758A (en) * 2021-11-25 2021-12-24 之江实验室 Winograd convolution implementation method based on vector instruction accelerated computation
CN115878957A (en) * 2022-12-29 2023-03-31 珠海市欧冶半导体有限公司 Matrix multiplication accelerating device and method
CN115878957B (en) * 2022-12-29 2023-08-29 珠海市欧冶半导体有限公司 Matrix multiplication acceleration device and method
CN116032432A (en) * 2023-02-17 2023-04-28 重庆邮电大学 Density adjustment method based on sparse network coding

Similar Documents

Publication Publication Date Title
CN110807513A (en) Convolutional neural network accelerator based on Winograd sparse algorithm
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
Jiao et al. Accelerating low bit-width convolutional neural networks with embedded FPGA
US10810484B2 (en) Hardware accelerator for compressed GRU on FPGA
CN107239829B (en) Method for optimizing artificial neural network
US10691996B2 (en) Hardware accelerator for compressed LSTM
EP4258182A2 (en) Accelerated mathematical engine
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
WO2020073211A1 (en) Operation accelerator, processing method, and related device
KR20180073118A (en) Convolutional neural network processing method and apparatus
CN112200300B (en) Convolutional neural network operation method and device
CN112673383A (en) Data representation of dynamic precision in neural network cores
CN112257844B (en) Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN116362312A (en) Neural network acceleration device, method, equipment and computer storage medium
US20210357734A1 (en) Z-first reference neural processing unit for mapping winograd convolution and a method thereof
CN110910434A (en) Method for realizing deep learning parallax estimation algorithm based on FPGA (field programmable Gate array) high energy efficiency
CN111652359B (en) Multiplier array for matrix operations and multiplier array for convolution operations
CN115496181A (en) Chip adaptation method, device, chip and medium of deep learning model
CN108184127A (en) A kind of configurable more dimension D CT mapping hardware multiplexing architectures
KR101722215B1 (en) Apparatus and method for discrete cosine transform
CN114897133A (en) Universal configurable Transformer hardware accelerator and implementation method thereof
CN115913245A (en) Data encoding method, data decoding method, and data processing apparatus
KR101527103B1 (en) Device and method for discrete cosine transform
Wang et al. Acceleration and implementation of convolutional neural network based on FPGA
Moon et al. Multipurpose Deep-Learning Accelerator for Arbitrary Quantization With Reduction of Storage, Logic, and Latency Waste

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200218

RJ01 Rejection of invention patent application after publication