CN111914999B - Method and equipment for reducing calculation bandwidth of neural network accelerator


Info

Publication number: CN111914999B
Application number: CN202010753645.8A
Authority: CN (China)
Prior art keywords: output, characteristic data, output point, neural network, network accelerator
Legal status: Active (granted)
Other versions: CN111914999A
Other languages: Chinese (zh)
Inventor: 尹昆 (Yin Kun)
Current Assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Application filed by Unisound Intelligent Technology Co Ltd and Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority/Filing date: 2020-07-30 (CN202010753645.8A)
Publication of CN111914999A: 2020-11-10
Publication of CN111914999B (grant): 2024-04-19


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a method and a device for reducing the computation bandwidth of a neural network accelerator. For the output points in each column, the feature data required to compute the first output point are flattened to one dimension and transferred from the external memory to the fast memory of the neural network accelerator, where the output data corresponding to the first output point are computed. For each subsequent output point, the multiplexed (reusable) feature data are moved to the head of the fast memory, only the remaining feature data are flattened to one dimension and transferred from the external memory to the rear part of the fast memory, and the output data corresponding to the current output point are computed, until the output data of the last output point in the column are obtained.

Description

Method and equipment for reducing calculation bandwidth of neural network accelerator
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and apparatus for reducing the computational bandwidth of a neural network accelerator.
Background
In most existing neural network systems, the amount of feature data and convolution kernel data is huge. These data are generally stored in external memory, whose access speed is low for cost reasons. During convolution computation, a main control core (such as an ARM core) must transfer the feature data and the convolution kernels required for the computation from the slow external memory to the fast internal memory of the neural network accelerator through DMA (Direct Memory Access); the accelerator then reads the data and performs the convolution computation.
In a typical computation, the feature data required by the first output point are transferred, in order, from the external memory to the fast memory of the neural network accelerator, and the first output data are computed; when the second output point is needed, the data it requires are transferred from the external memory to the fast memory of the neural network accelerator and the computation is performed.
That is, when computing the second output data point, the required feature data must again be carried from the external memory to the fast memory of the accelerator before the computation can proceed. Because the feature data are huge, this makes the data bandwidth between the neural network accelerator and the external memory large.
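To make the bandwidth problem concrete, the following minimal sketch (our illustration of the conventional scheme described above, not code from the patent; all sizes and names are assumptions) models the per-output-point transfers for one column of outputs with kernel size 3 and stride 2:

    import numpy as np

    H, W, C = 9, 3, 4                  # Height x Width x Channel (illustrative)
    K, S = 3, 2                        # kernel size and stride
    feat = np.zeros(H * W * C)         # flattened feature map in slow memory

    external_reads = 0
    for h0 in range(0, H - K + 1, S):  # naive scheme: refetch the full window
        window = np.empty(K * K * C)
        for n, (i, j) in enumerate([(i, j) for i in range(K) for j in range(K)]):
            off = (h0 + i) * W * C + j * C
            window[n * C:(n + 1) * C] = feat[off:off + C]  # external access
            external_reads += C
    print(external_reads)              # 4 windows x 9 blocks x 4 channels = 144

Of these 144 external reads, 36 are redundant: rows 2, 4 and 6 are each fetched twice, so every output point after the first re-reads 3 of its 9 blocks. This redundancy is exactly what the invention removes.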
Disclosure of Invention
To address these deficiencies in the prior art, the invention provides a method and a device for reducing the computation bandwidth of a neural network accelerator. The data reusability introduced by the convolution stride is fully exploited, so that the amount of data the neural network accelerator reads from the slow external memory is reduced and the efficiency of the neural network accelerator is improved.
Specifically, the present invention proposes the following specific embodiments:
An embodiment of the invention provides a method for reducing the computation bandwidth of a neural network accelerator. The method is applied to a feature map comprising a plurality of output points and a plurality of feature data, wherein the output points are distributed over different columns of the feature map and the output points in each column are arranged from top to bottom along the Height direction; each output point corresponds to feature data whose position distribution matches that of the output point. The method comprises the following steps:
for the output points in each column, flattening the feature data required to compute the first output point to one dimension, transferring them from the external memory to the fast memory of the neural network accelerator, and computing the output data corresponding to the first output point;
moving the multiplexed feature data to the head of the fast memory, flattening the remaining feature data to one dimension, transferring them from the external memory to the rear part of the fast memory, and computing the output data corresponding to the current output point, until the output data of the last output point in the column are obtained;
wherein the multiplexed feature data are the part shared by the feature data of the previously computed output point and of the current output point, the previous output point and the current output point being adjacent; and the multiplexed feature data and the remaining feature data together form the feature data required to compute the current output point.
In a specific embodiment, before the feature data required to compute the first output point are flattened to one dimension and transferred from the external memory to the fast memory of the neural network accelerator, the method further includes:
flattening the convolution kernel of the neural network system to one dimension and transferring it from the external memory to the fast memory of the neural network accelerator.
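One-dimensional flattening here simply means laying a tensor stored in [Height, Width, Channel] order out as one contiguous vector before the DMA transfer. A trivial illustration (not the patent's code; shapes are assumptions):

    import numpy as np

    kernel = np.random.rand(3, 3, 4).astype(np.float32)  # K x K x Channel
    kernel_1d = kernel.reshape(-1)   # flattened to one dimension for transfer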
In a specific embodiment, the size of the feature data corresponding to each output point is consistent with the size of the convolution kernel.
In a specific embodiment, the output data is calculated based on the characteristic data and the convolution kernel.
In a specific embodiment,
the positions of the feature data corresponding to adjacent output points in each column differ by a preset stride in the feature map;
and the feature data at the positions newly covered by the preset stride are the remaining feature data.
An embodiment of the invention further provides a device for reducing the computation bandwidth of a neural network accelerator, applied to a feature map comprising a plurality of output points and a plurality of feature data, wherein the output points are distributed over different columns of the feature map and the output points in each column are arranged from top to bottom along the Height direction; each output point corresponds to feature data whose position distribution matches that of the output point. The device includes:
a first processing module, configured to, for the output points in each column, flatten the feature data required to compute the first output point to one dimension, transfer them from the external memory to the fast memory of the neural network accelerator, and compute the output data corresponding to the first output point;
a second processing module, configured to move the multiplexed feature data to the head of the fast memory, flatten the remaining feature data to one dimension, transfer them from the external memory to the rear part of the fast memory, and compute the output data corresponding to the current output point, until the output data of the last output point in the column are obtained;
wherein the multiplexed feature data are the part shared by the feature data of the previously computed output point and of the current output point, the previous output point and the current output point being adjacent; and the multiplexed feature data and the remaining feature data together form the feature data required to compute the current output point.
In a specific embodiment, the device further comprises:
a convolution kernel module, configured to flatten the convolution kernel of the neural network system to one dimension and transfer it from the external memory to the fast memory of the neural network accelerator.
In a specific embodiment, the size of the feature data corresponding to each output point is consistent with the size of the convolution kernel.
In a specific embodiment, the output data is calculated based on the characteristic data and the convolution kernel.
In a specific embodiment,
the positions of the feature data corresponding to adjacent output points in each column differ by a preset stride in the feature map;
and the feature data at the positions newly covered by the preset stride are the remaining feature data.
Compared with the prior art, the invention optimizes the computation order of the convolutional neural network so as to fully exploit the data reusability introduced by the stride, reducing the amount of data the neural network accelerator reads from the slow external memory and thereby improving the efficiency of the neural network accelerator.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; other related drawings may be derived from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method for reducing the computational bandwidth of a neural network accelerator according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a method for reducing the computational bandwidth of a neural network accelerator according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating characteristic data handling in a method for reducing computing bandwidth of a neural network accelerator according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of convolutional kernel handling in a method for reducing computational bandwidth of a neural network accelerator according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the output result in a method for reducing the computational bandwidth of a neural network accelerator according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an apparatus for reducing the computational bandwidth of a neural network accelerator according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus for reducing the calculation bandwidth of a neural network accelerator according to an embodiment of the present invention.
Detailed Description
Hereinafter, various embodiments of the present disclosure will be described more fully. The present disclosure is capable of various embodiments and of modifications and variations therein. However, it should be understood that there is no intention to limit the various embodiments of the disclosure to the specific embodiments disclosed herein; rather, the disclosure is to be interpreted as covering all modifications, equivalents, and/or alternatives falling within the spirit and scope of the various embodiments of the disclosure.
The terminology used in the various embodiments of the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of those embodiments. As used herein, the singular is intended to include the plural as well, unless the context clearly indicates otherwise. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of this disclosure belong. Terms such as those defined in commonly used dictionaries will be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined in the various embodiments of the disclosure.
Example 1
Embodiment 1 of the invention discloses a method for reducing the computation bandwidth of a neural network accelerator. The method is applied to a feature map comprising a plurality of output points and a plurality of feature data, wherein the output points are distributed over different columns of the feature map and the output points in each column are arranged from top to bottom along the Height direction; each output point corresponds to feature data whose position distribution matches that of the output point. As shown in figs. 1-5, the method comprises the following steps:
Step 101: for the output points in each column, flatten the feature data required to compute the first output point to one dimension, transfer them from the external memory to the fast memory of the neural network accelerator, and compute the output data corresponding to the first output point.
Specifically, before step 101, as shown in fig. 4, the method further includes: flattening the convolution kernel of the neural network system to one dimension and transferring it from the external memory to the fast memory of the neural network accelerator. In a specific embodiment, the size of the feature data corresponding to each output point is consistent with the size of the convolution kernel, and the output data are computed based on the feature data and the convolution kernel.
Specifically, the feature data and the convolution kernel are generally stored in [Height, Width, Channel (depth)] format (HWC for short). The feature map is shown in figs. 2 and 3. Taking convolution kernel size = 3 and stride = 2 as an example, the feature data corresponding to each output point cover a 3×3 window in the HW plane. With P denoting the offset of one row and C the offset of one pixel along the Width direction in the flattened buffer, the first output point in column 1 thus corresponds to the channel data at positions 0, C, C*2, P+0, P+C, P+C*2, P*2+0, P*2+C and P*2+C*2 in fig. 3. These data are flattened to one dimension, transferred from the external memory to the fast internal memory of the neural network accelerator, and used for the computation; the resulting output, shown in fig. 5, is the data at position 0.
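How these flattened offsets arise in an HWC buffer can be sketched as follows (our illustration, not the patent's code; the window position and sizes are assumptions):

    def window_offsets(h0, w0, k, width, channels):
        # Start offsets of the channel blocks a k x k window reads from a
        # buffer flattened in [Height, Width, Channel] order.
        P = width * channels  # offset of one full row
        C = channels          # offset of one step along Width
        return [(h0 + i) * P + (w0 + j) * C for i in range(k) for j in range(k)]

    # First output point of column 1 with kernel size 3 (cf. fig. 3):
    # [0, C, C*2, P, P+C, P+C*2, P*2, P*2+C, P*2+C*2]
    print(window_offsets(0, 0, 3, width=8, channels=4))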
Step 102: move the multiplexed feature data to the head of the fast memory, flatten the remaining feature data to one dimension, transfer them from the external memory to the rear part of the fast memory, and compute the output data corresponding to the current output point, until the output data of the last output point in the column are obtained;
wherein the multiplexed feature data are the part shared by the feature data of the previously computed output point and of the current output point, the previous output point and the current output point being adjacent; and the multiplexed feature data and the remaining feature data together form the feature data required to compute the current output point.
In a specific embodiment, the positions of the feature data corresponding to adjacent output points in each column differ by a preset stride in the feature map, and the feature data at the positions newly covered by the preset stride are the remaining feature data.
After the output data of the first output point have been computed, the second output point is computed. The second output point is adjacent to the first along the Height direction, so part of the data can be multiplexed. Still taking fig. 3 as an example, the second output point in column 1 corresponds to the channel data at positions P*2+0, P*2+C, P*2+C*2, P*3+0, P*3+C, P*3+C*2, P*4+0, P*4+C and P*4+C*2. The part shared with the first output point is the channel data at P*2+0, P*2+C and P*2+C*2. Therefore, when computing the second output point, the data at P*2+0, P*2+C and P*2+C*2 are moved to the head of the fast memory, and only the remaining feature data (the channel data at P*3+0, P*3+C, P*3+C*2, P*4+0, P*4+C and P*4+C*2) are transferred from the external memory to the rear part of the fast memory. The convolution kernel, the multiplexed feature data and the remaining feature data are then combined to compute the second output point; the resulting output, shown in fig. 5, is the data at position P+0.
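The buffer update itself can be pictured with the following sketch (our illustration; the patent does not prescribe an implementation, and all names are assumptions). The reusable tail of the previous window is moved to the head of the fast buffer with a cheap on-chip copy, and only the new blocks are fetched from external memory:

    import numpy as np

    def advance_window(fast, feature_1d, new_block_offsets, reused_blocks, C):
        # Move the multiplexed data (the last `reused_blocks` channel blocks
        # of the previous window) to the head of the fast buffer, then fetch
        # only the remaining blocks from the slow flattened feature buffer.
        n = reused_blocks * C
        fast[:n] = fast[-n:].copy()          # on-chip move, no external traffic
        for i, off in enumerate(new_block_offsets):
            dst = n + i * C
            fast[dst:dst + C] = feature_1d[off:off + C]  # external reads
        return fast

For the fig. 3 example (kernel 3, stride 2), reused_blocks is 3 (the three blocks of row P*2) and new_block_offsets lists the six blocks of rows P*3 and P*4.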
The third and fourth output points of column 1 are computed in the same way until all output points in the column have been computed. The output points of column 2 are then computed; the specific computation for column 2 is the same as for column 1, i.e. steps 101-102 are performed, and finally all output points of all columns are computed.
Example 2
Embodiment 2 of the invention further discloses a method for reducing the computation bandwidth of a neural network accelerator; as shown in figs. 2-5, the method includes the following steps:
Step 1: flatten the convolution kernel to one dimension and transfer it from the external memory to the fast memory of the neural network accelerator;
Step 2: flatten the feature data required to compute the first output point in fig. 3 (the channel data at positions 0, C, C*2, P+0, P+C, P+C*2, P*2+0, P*2+C, P*2+C*2) to one dimension and transfer them from the external memory to the fast memory of the neural network accelerator;
Step 3: compute the matrix convolution as a vector inner product (or by another calculation method) to obtain the first output data;
Step 4A: the part of the feature data reusable between the first and the second output point (the channel data at P*2+0, P*2+C and P*2+C*2 in fig. 3) is moved by the neural network accelerator within the fast memory itself to the head of the internal fast memory, which is far more efficient than accessing the external memory;
Step 4B: the remaining feature data required to compute the second output point (the channel data at P*3+0, P*3+C, P*3+C*2, P*4+0, P*4+C and P*4+C*2 in fig. 3), of size 2×3×Channel, are flattened to one dimension and transferred from the external memory to the second half of the fast memory of the neural network accelerator;
Step 5: compute the matrix convolution as a vector inner product (or by another calculation method) to obtain the second output;
The above operations are repeated for the other output points until the whole feature map has been computed; a compact sketch of this loop follows.
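Putting the steps together, here is an end-to-end model of this embodiment for one column (our Python sketch under the stated assumptions: kernel 3, stride 2, HWC layout; the DMA transfers are modelled as array slicing and the inner product as a dot product, so sizes and names are illustrative only):

    import numpy as np

    H, W, Cc = 9, 3, 4                 # feature map: Height x Width x Channel
    K, S = 3, 2                        # kernel size and stride
    P, C = W * Cc, Cc                  # row offset and per-pixel offset
    feat = np.arange(H * W * Cc, dtype=np.float32)   # flattened HWC features
    kernel = np.ones(K * K * Cc, dtype=np.float32)   # flattened kernel (step 1)

    fast = np.empty(K * K * Cc, dtype=np.float32)    # fast (on-chip) memory
    outputs = []
    for h0 in range(0, H - K + 1, S):                # one column of outputs
        if h0 == 0:                                  # step 2: full window load
            offs = [i * P + j * C for i in range(K) for j in range(K)]
            base = 0
        else:                                        # steps 4A/4B: reuse overlap
            keep = (K - S) * K * Cc                  # multiplexed tail size
            fast[:keep] = fast[-keep:].copy()        # move to head (on-chip)
            offs = [(h0 + i) * P + j * C for i in range(K - S, K) for j in range(K)]
            base = keep
        for i, off in enumerate(offs):               # external reads
            fast[base + i * Cc : base + (i + 1) * Cc] = feat[off:off + Cc]
        outputs.append(float(fast @ kernel))         # steps 3/5: inner product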
With this method, the interaction between the neural network accelerator and the external memory for each subsequent output point is reduced by 1 - (2×3×Channel)/(3×3×Channel) = 1/3. The example above computes the output data points one by one, but the practical application of the method is not limited to this case; it is equally applicable when output data points are computed in parallel or serially in groups of several points.
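More generally (our extrapolation of the example, not a formula stated in the patent), for kernel size k and stride s with s < k, each subsequent output point in a column needs only s new rows of its k-row window, so the external traffic per point shrinks by (k - s)/k:

    def bandwidth_reduction(k: int, s: int) -> float:
        # Fraction of external reads saved per subsequent output point in a
        # column: only s*k of the k*k channel blocks must be fetched anew.
        return 1 - (s * k) / (k * k)   # equals (k - s) / k

    print(bandwidth_reduction(3, 2))   # 0.333..., the 1/3 of the example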
Example 3
Embodiment 3 of the invention further discloses a device for reducing the computation bandwidth of a neural network accelerator, applied to a feature map comprising a plurality of output points and a plurality of feature data, wherein the output points are distributed over different columns of the feature map and the output points in each column are arranged from top to bottom along the Height direction; each output point corresponds to feature data whose position distribution matches that of the output point. As shown in fig. 6, the device includes:
a first processing module 201, configured to, for the output points in each column, flatten the feature data required to compute the first output point to one dimension, transfer them from the external memory to the fast memory of the neural network accelerator, and compute the output data corresponding to the first output point;
a second processing module 202, configured to move the multiplexed feature data to the head of the fast memory, flatten the remaining feature data to one dimension, transfer them from the external memory to the rear part of the fast memory, and compute the output data corresponding to the current output point, until the output data of the last output point in the column are obtained;
wherein the multiplexed feature data are the part shared by the feature data of the previously computed output point and of the current output point, the previous output point and the current output point being adjacent; and the multiplexed feature data and the remaining feature data together form the feature data required to compute the current output point.
In a specific embodiment, as shown in fig. 7, the device further comprises:
a convolution kernel module 203, configured to flatten the convolution kernel of the neural network system to one dimension and transfer it from the external memory to the fast memory of the neural network accelerator.
In a specific embodiment, the size of the feature data corresponding to each output point is consistent with the size of the convolution kernel.
In a specific embodiment, the output data is calculated based on the characteristic data and the convolution kernel.
In a specific embodiment,
the positions of the feature data corresponding to adjacent output points in each column differ by a preset stride in the feature map;
and the feature data at the positions newly covered by the preset stride are the remaining feature data.
Those skilled in the art will appreciate that the drawings are merely schematic illustrations of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to practice the invention.
Those skilled in the art will appreciate that the modules of an apparatus in an implementation scenario may be distributed among the apparatus of that scenario as described, or may be relocated, with corresponding changes, in one or more apparatuses different from the present scenario. The modules of the implementation scenario may be combined into one module, or further split into a plurality of sub-modules.
The above sequence numbers of the invention are for description only and do not represent the merits of the implementation scenarios.
The foregoing disclosure is merely illustrative of some embodiments of the invention, and the invention is not limited thereto, as modifications may be made by those skilled in the art without departing from the scope of the invention.

Claims (10)

1. A method for reducing the computation bandwidth of a neural network accelerator, characterized in that the method is applied to a feature map comprising a plurality of output points and a plurality of feature data, wherein the output points are distributed over different columns of the feature map and the output points in each column are arranged from top to bottom along the Height direction; each output point corresponds to feature data whose position distribution matches that of the output point; the method comprises the following steps:
for the output points in each column, flattening the feature data required to compute the first output point to one dimension, transferring them from the external memory to the fast memory of the neural network accelerator, and computing the output data corresponding to the first output point;
after the output data of the first output point have been computed, computing a second output point, the second output point being adjacent to the first output point along the Height direction and having a multiplexed portion; moving the multiplexed feature data to the head of the fast memory, flattening the remaining feature data to one dimension, transferring them from the external memory to the rear part of the fast memory, and computing the output data corresponding to the current output point, until the output data of the last output point in the column are obtained;
wherein the multiplexed feature data are the part shared by the feature data of the previously computed output point and of the current output point, the previous output point and the current output point being adjacent; and the multiplexed feature data and the remaining feature data together form the feature data required to compute the current output point.
2. The method for reducing the computation bandwidth of a neural network accelerator of claim 1, characterized in that before the feature data required to compute the first output point are flattened to one dimension and transferred from the external memory to the fast memory of the neural network accelerator, the method further comprises:
flattening the convolution kernel of the neural network system to one dimension and transferring it from the external memory to the fast memory of the neural network accelerator.
3. The method for reducing the computation bandwidth of a neural network accelerator of claim 2, characterized in that the size of the feature data corresponding to each output point is consistent with the size of the convolution kernel.
4. The method for reducing the computation bandwidth of a neural network accelerator of claim 2, characterized in that the output data are computed based on the feature data and the convolution kernel.
5. The method for reducing the computation bandwidth of a neural network accelerator of claim 1, characterized in that
the positions of the feature data corresponding to adjacent output points in each column differ by a preset stride in the feature map;
and the feature data at the positions newly covered by the preset stride are the remaining feature data.
6. A device for reducing the computation bandwidth of a neural network accelerator, characterized in that the device is applied to a feature map comprising a plurality of output points and a plurality of feature data, wherein the output points are distributed over different columns of the feature map and the output points in each column are arranged from top to bottom along the Height direction; each output point corresponds to feature data whose position distribution matches that of the output point; the device comprises:
a first processing module, configured to, for the output points in each column, flatten the feature data required to compute the first output point to one dimension, transfer them from the external memory to the fast memory of the neural network accelerator, and compute the output data corresponding to the first output point;
a second processing module, configured to compute a second output point after the output data of the first output point have been computed, the second output point being adjacent to the first output point along the Height direction and having a multiplexed portion; and to move the multiplexed feature data to the head of the fast memory, flatten the remaining feature data to one dimension, transfer them from the external memory to the rear part of the fast memory, and compute the output data corresponding to the current output point, until the output data of the last output point in the column are obtained;
wherein the multiplexed feature data are the part shared by the feature data of the previously computed output point and of the current output point, the previous output point and the current output point being adjacent; and the multiplexed feature data and the remaining feature data together form the feature data required to compute the current output point.
7. The device for reducing the computation bandwidth of a neural network accelerator of claim 6, further comprising:
a convolution kernel module, configured to flatten the convolution kernel of the neural network system to one dimension and transfer it from the external memory to the fast memory of the neural network accelerator.
8. The device for reducing the computation bandwidth of a neural network accelerator of claim 7, characterized in that the size of the feature data corresponding to each output point is consistent with the size of the convolution kernel.
9. The device for reducing the computation bandwidth of a neural network accelerator of claim 7, characterized in that the output data are computed based on the feature data and the convolution kernel.
10. The device for reducing the computation bandwidth of a neural network accelerator of claim 6, characterized in that
the positions of the feature data corresponding to adjacent output points in each column differ by a preset stride in the feature map;
and the feature data at the positions newly covered by the preset stride are the remaining feature data.
CN202010753645.8A (filed 2020-07-30, priority date 2020-07-30) - Method and equipment for reducing calculation bandwidth of neural network accelerator - Active - granted as CN111914999B

Priority Applications (1)

Application Number: CN202010753645.8A
Priority Date: 2020-07-30
Filing Date: 2020-07-30
Title: Method and equipment for reducing calculation bandwidth of neural network accelerator

Applications Claiming Priority (1)

Application Number: CN202010753645.8A
Priority Date: 2020-07-30
Filing Date: 2020-07-30
Title: Method and equipment for reducing calculation bandwidth of neural network accelerator

Publications (2)

Publication Number - Publication Date
CN111914999A - 2020-11-10
CN111914999B - 2024-04-19

Family

ID=73286459

Family Applications (1)

Application Number - Title - Priority Date - Filing Date
CN202010753645.8A (Active, granted as CN111914999B) - Method and equipment for reducing calculation bandwidth of neural network accelerator - 2020-07-30 - 2020-07-30

Country Status (1)

Country - Publication
CN - CN111914999B


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11227214B2 (en) * 2017-11-14 2022-01-18 Advanced Micro Devices, Inc. Memory bandwidth reduction techniques for low power convolutional neural network inference applications

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321064A (en) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Computing platform realization method and system for neural network
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA
CN109993297A (en) * 2019-04-02 2019-07-09 南京吉相传感成像技术研究院有限公司 A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing
CN110705702A (en) * 2019-09-29 2020-01-17 东南大学 Dynamic extensible convolutional neural network accelerator

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of an FPGA-based Convolutional Neural Network Accelerator; Qiu Yue; Ma Wentao; Chai Zhilei; Microelectronics & Computer; 2018-08-05 (08); full text *
Design of a Deep Neural Network Accelerator Supporting Sparse Convolution; Zhou Guofei; Electronic Technology & Software Engineering; 2020-02-15 (04); full text *

Also Published As

Publication number Publication date
CN111914999A (en) 2020-11-10

Similar Documents

Publication - Title
KR102523263B1 (en) Systems and methods for hardware-based pooling
US10977001B2 (en) Asymmetric quantization of multiple-and-accumulate operations in deep learning processing
EP3816824A1 (en) High throughput matrix processor with support for concurrently processing multiple matrices
EP3407203A2 (en) Statically schedulable feed and drain structure for systolic array architecture
KR20180060149A (en) Convolution processing apparatus and method
CN112232426B (en) Training method, device and equipment of target detection model and readable storage medium
US11537865B2 (en) Mapping convolution to a channel convolution engine
CN112183295A (en) Pedestrian re-identification method and device, computer equipment and storage medium
US20230068450A1 (en) Method and apparatus for processing sparse data
EP3835949A1 (en) Hardware accelerated matrix manipulation operations using processor instructions
CN112215345B (en) Convolutional neural network operation method and device based on Tenscorore
CN111914213B (en) Sparse matrix vector multiplication operation time prediction method and system
CN111860276A (en) Human body key point detection method, device, network equipment and storage medium
CN111914999B (en) Method and equipment for reducing calculation bandwidth of neural network accelerator
CN110796229B (en) Device and method for realizing convolution operation
CN109324984B (en) Method and apparatus for using circular addressing in convolution operations
KR102470027B1 (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
KR20230081697A (en) Method and apparatus for accelerating dilatational convolution calculation
CN111047025B (en) Convolution calculation method and device
CN111325281B (en) Training method and device for deep learning network, computer equipment and storage medium
CN114037054A (en) Data processing method, device, chip, equipment and medium
US20130018773A1 (en) Order matching
CN110930290B (en) Data processing method and device
CN112825151A (en) Data processing method, device and equipment
CN111475304A (en) Feature extraction acceleration method and system

Legal Events

Code - Title
PB01 - Publication
SE01 - Entry into force of request for substantive examination
GR01 - Patent grant