CN108647779B - Reconfigurable computing unit of low-bit-width convolutional neural network


Info

Publication number
CN108647779B
CN108647779B (application CN201810318783.6A)
Authority
CN
China
Prior art keywords
register
data
shift
current period
reconfigurable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810318783.6A
Other languages
Chinese (zh)
Other versions
CN108647779A (en
Inventor
曹伟
王伶俐
罗成
谢亮
范锡添
周学功
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201810318783.6A priority Critical patent/CN108647779B/en
Publication of CN108647779A publication Critical patent/CN108647779A/en
Application granted granted Critical
Publication of CN108647779B publication Critical patent/CN108647779B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a reconfigurable computing unit for a low-bit-width convolutional neural network. The unit includes a plurality of reconfigurable shift-accumulation modules, a multiplexer and a quantization processing module. Each reconfigurable shift-accumulation module comprises a controller, a first register, a second register, a third register and a shift accumulator, constructed to exploit the sparsity of the network. The controller judges whether the fixed-point data and the exponential weight of the current period are zero; once either is detected to be zero, the third register is controlled to output the shift-accumulated data of the current period according to a first trigger signal sent by the first register or a second trigger signal sent by the second register. The invention realizes flexible 4-bit and 8-bit fixed-point multiply-accumulate operation, improves the shift-accumulate operation speed, and reduces the memory and power consumption occupied by the operation.

Description

Reconfigurable computing unit of low-bit-width convolutional neural network
Technical Field
The invention relates to the technical field of reconfigurable computing, in particular to a reconfigurable computing unit of a low-bit-width convolutional neural network.
Background
With the development of artificial intelligence, deep learning has achieved great success in fields such as speech recognition, computer vision and automatic driving, and has in turn promoted the further development of those fields. The core technology driving deep learning research is the convolutional neural network (CNN). A target recognition system using a convolutional neural network defeated traditional image recognition methods in the large-scale image recognition competition ILSVRC 2012 held in 2012, announcing the arrival of the deep learning era. With the continuous development of deep learning technology, the structure of convolutional neural networks has been continuously optimized and their recognition performance continuously improved. In the large-scale image recognition competition ILSVRC 2015 held in 2015, a convolutional neural network surpassed human image recognition capability for the first time. This milestone marks the great success of deep learning techniques.
As the performance of convolutional neural networks improves, their structures become more and more complex, with correspondingly greater computing and storage requirements. To support convolutional neural network computation, the network processing flow is generally run on servers and in data centers; interacting with a data center requires transmitting a large amount of data, which introduces significant latency and hinders the application of convolutional neural networks in embedded devices such as smartphones and smart cars. To address this problem, academia and industry began to study how to deploy convolutional neural networks onto accelerators in embedded hardware systems. Many effective convolutional neural network accelerators have therefore been designed with specialized computational units (PEs), typically using fixed computational units across different convolutional neural network models. Because of the diversity of convolutional neural networks, a fixed computational unit may no longer be suitable when the network model changes, which increases data movement and degrades power efficiency. Moreover, such fixed convolution mapping methods do not scale well across convolution parameters, and mismatches between network shapes and computational resources can occur, reducing resource utilization and performance. How to design reconfigurable computing units for different networks has therefore become a topic of intense research in the art.
Existing reconfigurable computing units basically adopt dedicated DSP (Digital Signal Processing) blocks for computation, but DSP computing units are designed for floating-point operation. In common floating-point convolutional neural network hardware designs, a DSP unit is usually used for the multiply-accumulate operation (MAC), and one DSP can complete one multiply-accumulate operation in one clock cycle. However, the DSP computing unit is not well suited to low-bit-width multiply-accumulate operation, and this disadvantage prevents it from exerting its full capability in low-bit-width hardware designs.
To solve this problem, Xilinx introduced a dedicated DSP mapping technique: for Xilinx FPGA chips, each DSP computing unit can perform two eight-bit multiply-accumulate operations in parallel. This technique exploits the full computing power of the DSP blocks on the FPGA chip and improves the area and power consumption of the design. However, its application range is too narrow: it supports only eight-bit fixed-point multiply-accumulate operation and cannot serve the special operation requirements of exponential convolutional neural networks. How to overcome these limitations is a problem to be solved in the art.
Disclosure of Invention
The invention aims to provide a reconfigurable computing unit for a low-bit-width convolutional neural network that meets the operation requirements of exponential convolutional neural networks: it realizes flexible 4-bit and 8-bit fixed-point multiply-accumulate operation, improves the shift-accumulate operation rate, and reduces the memory and power consumption occupied by the operation.
To achieve the above object, the present invention provides a low-bit-width convolutional neural network reconfigurable computing unit, applied to the shift-accumulation operation of an exponential convolutional neural network, which includes: a plurality of reconfigurable shift-accumulation modules, a multiplexer and a quantization processing module;
the multiplexer is connected to each reconfigurable shift-accumulation module and is used for selecting the shift-accumulated data of the current period output by a reconfigurable shift-accumulation module; the quantization processing module is connected to the multiplexer and is used for performing quantization on the shift-accumulated data of the current period to obtain quantized data; wherein:
each reconfigurable shift-accumulation module comprises a controller, a first register, a second register, a third register and a shift accumulator;
the controller is used for judging whether the exponential weight data of the current period is negative; if the exponential weight data of the current period is negative, no shift-accumulation operation is needed and the module waits to judge the exponential weight data of the next period; if the exponential weight data of the current period is not negative, the controller judges whether it is 0; if the exponential weight data of the current period is not 0, the controller controls the first register to store it; if the exponential weight data of the current period is 0, the controller controls the first register to send out a first trigger signal;
the controller is also used for judging whether the fixed-point data of the current period is negative; if the fixed-point data of the current period is negative, no shift-accumulation operation is needed and the module waits to judge the fixed-point data of the next period; if the fixed-point data of the current period is not negative, the controller judges whether it is 0; if the fixed-point data of the current period is not 0, the controller controls the second register to store it; if the fixed-point data of the current period is 0, the controller controls the second register to send out a second trigger signal;
the third register is connected to the first register and the second register, and outputs the shift-accumulated data of the current period in response to the first trigger signal sent by the first register or the second trigger signal sent by the second register; the third register is also used for storing the shift-accumulated data of the previous period;
the shift accumulator is connected to the first register, the second register and the third register, and is used for determining the shift-accumulated data of the current period from the exponential weight data of the previous period stored in the first register, the fixed-point data of the previous period stored in the second register and the shift-accumulated data of the previous period stored in the third register, and storing the shift-accumulated data of the current period in the third register.
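The per-period behavior described above can be sketched in Python as follows. This is a behavioral illustration only: the class and attribute names, the treatment of negative operands as "skip", and the wrap of the 18-bit accumulator are assumptions drawn from the surrounding text, not the patent's circuit description.

```python
class ShiftAccumulateModule:
    """Behavioral sketch of one reconfigurable shift-accumulation module.

    Assumed encoding (not specified at circuit level by the patent text):
    - exponential weight: a shift amount; negative means "skip this period";
    - fixed-point data: a non-negative activation; negative means "skip";
    - the third register (accumulator) is 18 bits wide.
    """

    ACC_BITS = 18

    def __init__(self):
        self.first_reg = None   # latest nonzero exponential weight
        self.second_reg = None  # latest nonzero fixed-point datum
        self.third_reg = 0      # running shift-accumulated sum

    def cycle(self, exp_weight, fixed_data):
        """Process one period. Returns the accumulated value when a zero
        operand raises the first/second trigger signal, else None."""
        # Controller: a negative operand means no shift-accumulation is
        # needed this period; wait for the next period's data.
        if exp_weight < 0 or fixed_data < 0:
            return None
        # A zero operand makes the multiplication meaningless, so the
        # third register outputs the current accumulation instead.
        if exp_weight == 0 or fixed_data == 0:
            return self.third_reg
        # Otherwise latch both operands and shift-accumulate:
        # data * 2**exp_weight is realized as a left shift.
        self.first_reg, self.second_reg = exp_weight, fixed_data
        total = self.third_reg + (self.second_reg << self.first_reg)
        self.third_reg = total & ((1 << self.ACC_BITS) - 1)  # keep 18 bits
        return None
```

For example, feeding the pair (3, 5) accumulates 5 << 3 = 40, and a subsequent zero operand emits the running total.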
Preferably, the shift accumulator includes:
the shifter is respectively connected with the first register and the second register and used for determining shift data according to the exponential weight data stored in the first register and the fixed point data stored in the second register;
and the accumulator is respectively connected with the shifter and the third register and used for determining the shift accumulated data of the current period according to the shift data determined by the shifter and the first shift accumulated data of the previous period stored by the third register.
Preferably, the low bit width reconfigurable shift accumulation module further includes:
and the output register is connected with the third register and is used for storing the shift accumulated data of the current period output by the third register.
Preferably, the exponential weight data is 4 bits.
Preferably, the fixed point number data is 8 bits.
Preferably, the shifted accumulated data is 18 bits.
Preferably, the quantization processing data is 8-bit data.
Compared with the prior art, the invention has the following technical effects:
the method comprises the steps that a controller, a first register, a second register, a third register and a shift accumulator are constructed by utilizing network discreteness, whether fixed point data and index weight of a current period are zero or not is judged through the controller, and once the fixed point data and the index weight of the current period are detected to be zero, the third register is controlled to output shift accumulated data of the current period according to a first trigger signal sent by the first register and a second trigger signal sent by the second register; the method can realize flexible fixed-point multiply-accumulate operation of 4 bits and 8 bits, improve the shift accumulation operation rate and reduce the memory and power consumption occupied by operation.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a block diagram of a reconfigurable computing unit of a low bit width convolutional neural network according to an embodiment of the present invention;
fig. 2 is a structural diagram of a low bit width reconfigurable shift accumulation module according to an embodiment of the present invention.
10: reconfigurable shift accumulation module; 11: controller; 12: first register; 13: second register; 14: shifter; 15: accumulator; 16: third register; 17: output register; 20: multiplexer; 30: quantization processing module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a reconfigurable computing unit of a low-bit-width convolutional neural network, which is used for meeting the operation requirement of an exponential convolutional neural network, not only realizing flexible fixed-point multiply-accumulate operation of 4 bits and 8 bits, but also improving the shift-accumulate operation rate and reducing the memory and power consumption occupied by operation.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a block diagram of a reconfigurable computing unit of a low-bit-width convolutional neural network according to an embodiment of the present invention; FIG. 2 is a block diagram of a low-bit-width reconfigurable shift accumulation module according to an embodiment of the present invention. As shown in FIGS. 1 and 2, the present invention provides a low-bit-width reconfigurable computing unit of a convolutional neural network, applied to the shift-accumulation operation of an exponential convolutional neural network, which includes: a plurality of reconfigurable shift accumulation modules 10, a multiplexer 20 and a quantization processing module 30.
the reconfigurable shift accumulation module 10 comprises a controller 11, a first register 12, a second register 13, a third register 16 and a shift accumulator 15.
The controller 11 is configured to judge whether the exponential weight data of the current period is negative; if so, no shift-accumulation operation is needed and the module waits to judge the exponential weight data of the next period; if not, the controller judges whether the exponential weight data of the current period is 0; if it is not 0, the controller controls the first register 12 to store it; if it is 0, the controller controls the first register 12 to send out a first trigger signal.
The controller 11 is further configured to judge whether the fixed-point data of the current period is negative; if so, no shift-accumulation operation is needed and the module waits to judge the fixed-point data of the next period; if not, the controller judges whether the fixed-point data of the current period is 0; if it is not 0, the controller controls the second register 13 to store it; if it is 0, the controller controls the second register 13 to send out a second trigger signal.
The third register 16 is connected to the first register 12 and the second register 13, and outputs the shift-accumulated data of the current period in response to the first trigger signal sent by the first register 12 or the second trigger signal sent by the second register 13; the third register 16 is also used for storing the shift-accumulated data of the previous period.
The shift accumulator 15 is connected to the first register 12, the second register 13 and the third register 16, and determines the shift-accumulated data of the current period from the exponential weight data of the previous period stored in the first register 12, the fixed-point data of the previous period stored in the second register 13 and the shift-accumulated data of the previous period stored in the third register 16, then stores the shift-accumulated data of the current period in the third register 16.
And the multiplexer 20 is respectively connected to each of the low-bit-width reconfigurable shift accumulation modules 10, and is configured to select shift accumulation data of the current period output by the low-bit-width reconfigurable shift accumulation module 10.
And the quantization processing module 30 is connected to the multiplexer 20 and configured to perform quantization processing on the shift accumulated data of the current period to obtain quantized data. The quantization processing data is 8-bit data.
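The patent states only that the quantization processing module reduces the 18-bit shift-accumulated value to 8 bits; the rescaling rule itself is not given. A minimal sketch, assuming a right-shift-and-saturate scheme with a hypothetical, layer-dependent `shift` parameter:

```python
def quantize_18_to_8(acc, shift=10):
    """Hypothetical quantization of an 18-bit shift-accumulated value to
    8 bits: arithmetic right shift to rescale, then saturation to the
    unsigned 8-bit range. The actual rescaling rule used by module 30 is
    not specified in the patent text; `shift` is an assumed parameter."""
    q = acc >> shift             # rescale by a power of two
    return max(0, min(255, q))   # saturate to [0, 255]
```

For example, with the assumed `shift` of 10, the largest 18-bit value (262143) saturates to 255, while 5 << 10 maps back to 5.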
The shift accumulator 15 of the present invention includes:
and the shifter 14 is respectively connected with the first register 12 and the second register 13, and is used for determining shift data according to the exponential weight data stored in the first register 12 and the fixed point data stored in the second register 13.
And the accumulator 15 is respectively connected with the shifter 14 and the third register 16, and is configured to determine shift accumulated data of a current period according to the shift data determined by the shifter 14 and the first shift accumulated data of a previous period stored in the third register 16.
The reconfigurable shift accumulation module 10 of the low bit width convolutional neural network of the present invention further comprises: and the output register 17 is connected with the third register 16 and is used for storing the shift accumulated data of the current period output by the third register 16.
The exponential weight data is 4 bits.
The fixed point number data is 8 bits.
The shift accumulation data of the present invention is 18 bits.
Because convolutional neural networks contain a large degree of sparsity, fully exploiting it can greatly improve the power performance of a hardware design; to further improve the performance of the reconfigurable computing unit, the invention therefore exploits the sparsity of the convolutional neural network. Research shows that about 40%-60% of the input data in a convolutional neural network are zero values, and a large part of the small values in the weight data can be pruned without affecting the accuracy of the network. Multiply-add operations involving such zero values are therefore meaningless and do not affect the output result, so once the fixed-point data or the exponential weight of a period is detected to be zero, the third register 16 is controlled to output the shift-accumulated data of the current period according to the first trigger signal sent by the first register 12 or the second trigger signal sent by the second register 13.
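A small simulation illustrates how much work this zero-skipping can avoid: with roughly half the activations equal to zero, as the 40%-60% figure suggests, more than half of the shift-accumulate operations can be bypassed. The operand distribution below is invented purely for illustration.

```python
import random

def count_skipped(pairs):
    """Count operand pairs the controller would skip: a pair is skipped
    when the weight is zero or negative (pruned) or the datum is zero."""
    return sum(1 for w, x in pairs if w <= 0 or x == 0)

random.seed(0)
# Illustrative operand stream: weights uniform in 0..7, activations zero
# about half the time (mimicking the reported CNN input sparsity).
pairs = [(random.randint(0, 7), random.choice([0, 0, 1, 200]))
         for _ in range(10000)]
skipped = count_skipped(pairs)
fraction = skipped / len(pairs)  # roughly half the MACs avoided
```

With this distribution the expected skip fraction is 1 - (7/8)(1/2) = 0.5625, so a little over half of the shift-accumulate cycles do no arithmetic at all.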
The quantization processing module 30 of the present invention quantizes the 18-bit shift accumulated data to obtain 8-bit quantization processed data.
In the shift-accumulation calculation, the widths of the previous period's shift-accumulated data and of the output shift-accumulated data of the current period are noticeably larger than the widths of the fixed-point data and the exponential weight data, because a larger numeric range is needed to avoid overflow during accumulation. The invention sets the width of the shift-accumulated data of the current period to 18 bits, which can completely accommodate the shift-accumulated data produced by all shift-accumulation operations.
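The headroom argument can be checked with simple arithmetic, under the assumption (not stated explicitly in the text) that the fixed-point data are 8-bit unsigned and the largest usable exponent weight is 7, i.e. part of the 4-bit exponent field encodes the negative/skip case:

```python
MAX_DATA = (1 << 8) - 1           # largest 8-bit fixed-point datum: 255
MAX_SHIFT = 7                     # assumed largest usable exponent weight
MAX_TERM = MAX_DATA << MAX_SHIFT  # largest single shifted product: 32640

# One worst-case term fits in 15 bits...
assert MAX_TERM < (1 << 15)
# ...so an 18-bit accumulator leaves 3 guard bits: at least 8 worst-case
# terms can be summed before any risk of overflow.
assert 8 * MAX_TERM < (1 << 18)
```

In practice operands are rarely all at their maxima, so the 18-bit register comfortably covers the accumulations of a convolution window under these assumptions.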
Tests were performed on an experiment board of model xc7z020clg400-2, with the following results: (1) The reconfigurable computing unit designed by the invention improves the shift-accumulation operation rate. In tests, a common neural network accelerator structure using an ordinary reconfigurable multiply-accumulate unit occupies 95 LUTs with a computation power consumption of 1.658 W, while the same structure using the reconfigurable computing unit designed by the invention occupies only 46 LUTs with a computation power consumption of only 1 W; the reconfigurable computing unit designed by the invention thus reaches nearly twice the operation rate of the ordinary multiply-accumulate unit. (2) The reconfigurable computing unit designed by the invention fully exploits reconfigurability, supporting network structures with multiple bit widths and configurations and realizing flexible 4-8 bit width configuration. (3) The reconfigurable computing unit designed by the invention fully exploits the sparsity of the network and further improves hardware performance. (4) The invention enables exponential convolutional neural networks to be effectively mapped onto embedded systems, further reducing area and power overhead.
TABLE 1 comparison table of reconfigurable computing unit and reconfigurable multiply-accumulate unit
(Table 1 is provided as an image in the original patent document.)
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (7)

1. A low-bit-width convolutional neural network reconfigurable computing unit, characterized in that the low-bit-width convolutional neural network reconfigurable computing unit is applied to the shift-accumulation operation of an exponential convolutional neural network and comprises: a plurality of reconfigurable shift-accumulation modules, a multiplexer and a quantization processing module; the multiplexer is connected to each reconfigurable shift-accumulation module and is used for selecting the shift-accumulated data of the current period output by a reconfigurable shift-accumulation module; the quantization processing module is connected to the multiplexer and is used for performing quantization on the shift-accumulated data of the current period to obtain quantized data; wherein:
each reconfigurable shift-accumulation module comprises a controller, a first register, a second register, a third register and a shift accumulator;
the controller is used for judging whether the exponential weight data of the current period is negative; if the exponential weight data of the current period is negative, no shift-accumulation operation is needed and the judgment of the exponential weight data of the next period is awaited; if the exponential weight data of the current period is not negative, judging whether the exponential weight data of the current period is 0; if the exponential weight data of the current period is not 0, controlling the first register to store the exponential weight data of the current period; if the exponential weight data of the current period is 0, controlling the first register to send out a first trigger signal;
the controller is also used for judging whether the fixed-point data of the current period is negative; if the fixed-point data of the current period is negative, no shift-accumulation operation is needed and the judgment of the fixed-point data of the next period is awaited; if the fixed-point data of the current period is not negative, judging whether the fixed-point data of the current period is 0; if the fixed-point data of the current period is not 0, controlling the second register to store the fixed-point data of the current period; if the fixed-point data of the current period is 0, controlling the second register to send out a second trigger signal;
the third register is connected to the first register and the second register, and is used for outputting the shift-accumulated data of the current period in response to a first trigger signal sent by the first register or a second trigger signal sent by the second register; the third register is also used for storing the shift-accumulated data of the previous period;
the shift accumulator is connected to the first register, the second register and the third register, and is used for determining the shift-accumulated data of the current period according to the exponential weight data of the previous period stored in the first register, the fixed-point data of the previous period stored in the second register and the shift-accumulated data of the previous period stored in the third register, and storing the shift-accumulated data of the current period in the third register.
2. The low bit width convolutional neural network reconfigurable computing unit of claim 1, wherein the shift accumulator comprises:
the shifter is respectively connected with the first register and the second register and used for determining shift data according to the exponential weight data stored in the first register and the fixed point data stored in the second register;
and the accumulator is respectively connected with the shifter and the third register and used for determining the shift accumulated data of the current period according to the shift data determined by the shifter and the first shift accumulated data of the previous period stored by the third register.
3. The low bit width convolutional neural network reconfigurable computing unit of claim 1, wherein the reconfigurable shift accumulation module further comprises:
and the output register is connected with the third register and is used for storing the shift accumulated data of the current period output by the third register.
4. The low bit width convolutional neural network reconfigurable computing unit of claim 1, wherein the exponential weight data is 4 bits.
5. The low bit width convolutional neural network reconfigurable computing unit of claim 1, wherein the fixed point number data is 8 bits.
6. The low bit width convolutional neural network reconfigurable computing unit of claim 1, wherein the shift accumulation data is 18 bits.
7. The low bit width convolutional neural network reconfigurable computing unit of claim 1, wherein the quantized processed data is 8-bit data.
CN201810318783.6A 2018-04-11 2018-04-11 Reconfigurable computing unit of low-bit-width convolutional neural network Active CN108647779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810318783.6A CN108647779B (en) 2018-04-11 2018-04-11 Reconfigurable computing unit of low-bit-width convolutional neural network

Publications (2)

Publication Number Publication Date
CN108647779A CN108647779A (en) 2018-10-12
CN108647779B true CN108647779B (en) 2021-06-04

Family

ID=63745967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810318783.6A Active CN108647779B (en) 2018-04-11 2018-04-11 Reconfigurable computing unit of low-bit-width convolutional neural network

Country Status (1)

Country Link
CN (1) CN108647779B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389212B (en) * 2018-12-30 2022-03-25 南京大学 Reconfigurable activation quantization pooling system for low-bit-width convolutional neural network
CN110084362B (en) * 2019-03-08 2021-07-20 中国科学院计算技术研究所 Logarithmic quantization device and method for neural network
CN110109646B (en) * 2019-03-28 2021-08-27 北京迈格威科技有限公司 Data processing method, data processing device, multiplier-adder and storage medium
CN111767980B (en) * 2019-04-02 2024-03-05 杭州海康威视数字技术股份有限公司 Model optimization method, device and equipment
CN110728365B (en) * 2019-09-12 2022-04-01 东南大学 Method for selecting calculation bit width of multi-bit-width PE array and calculation precision control circuit
US10872295B1 (en) 2019-09-19 2020-12-22 Hong Kong Applied Science and Technology Institute Company, Limited Residual quantization of bit-shift weights in an artificial neural network
CN111738427B (en) * 2020-08-14 2020-12-29 电子科技大学 Operation circuit of neural network
CN113610222B (en) * 2021-07-07 2024-02-27 绍兴埃瓦科技有限公司 Method, system and hardware device for calculating convolutional operation of neural network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8060730B2 (en) * 2008-05-30 2011-11-15 Freescale Semiconductor, Inc. Selective MISR data accumulation during exception processing
CN104539263A (en) * 2014-12-25 2015-04-22 University of Electronic Science and Technology of China Reconfigurable low-power digital FIR filter
CN106775599A (en) * 2017-01-09 2017-05-31 Nanjing Tech University Coarse-grained reconfigurable system with multiple computing units and method for recurrent neural networks
CN107077322A (en) * 2014-11-03 2017-08-18 Arm Limited Apparatus and method for performing translation operations
CN107580712A (en) * 2015-05-08 2018-01-12 Qualcomm Incorporated Reduced computational complexity for fixed-point neural networks
CN107797962A (en) * 2017-10-17 2018-03-13 Tsinghua University Computing array based on neural network
CN107844826A (en) * 2017-10-30 2018-03-27 Institute of Computing Technology, Chinese Academy of Sciences Neural network processing unit and processing system comprising the same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A quantum-implementable neural network model"; Jialin Chen et al.; Quantum Information Processing; 2017-10-31; Vol. 16, No. 10; full text *
"Research on Parallel Architectures for FPGA-Based Convolutional Neural Networks"; Lu Zhijian; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; 2014-04-15; full text *

Also Published As

Publication number Publication date
CN108647779A (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN108647779B (en) Reconfigurable computing unit of low-bit-width convolutional neural network
CN108564168B (en) Design method for neural network processor supporting multi-precision convolution
CN108171317B (en) Data multiplexing convolution neural network accelerator based on SOC
WO2020258841A1 (en) Deep neural network hardware accelerator based on power exponent quantisation
CN107451659B (en) Neural network accelerator for bit width partition and implementation method thereof
CN110390385B (en) BNRP-based configurable parallel general convolutional neural network accelerator
CN109325591B (en) Winograd convolution-oriented neural network processor
CN109190756B (en) Arithmetic device based on Winograd convolution and neural network processor comprising same
WO2019218896A1 (en) Computing method and related product
CN107423816B (en) Multi-calculation-precision neural network processing method and system
CN110689126A (en) Device for executing neural network operation
CN109409510B (en) Neuron circuit, chip, system and method thereof, and storage medium
CN109409511A (en) Convolution operation data stream scheduling method for a dynamically reconfigurable array
CN107256424B (en) Three-value weight convolution network processing system and method
CN109478144A (en) Data processing apparatus and method
CN110163358B (en) Computing device and method
CN115880132B (en) Graphics processor, matrix multiplication task processing method, device and storage medium
EP3444757A1 (en) Discrete data representation supported device and method for forward operation of artificial neural network
CN110580519A (en) Convolution operation structure and method thereof
CN109325590B (en) Device for realizing neural network processor with variable calculation precision
CN111930681B (en) Computing device and related product
EP3444758B1 (en) Discrete data representation-supporting apparatus and method for back-training of artificial neural network
CN110059797B (en) Computing device and related product
CN110059809B (en) Computing device and related product
CN111047034A (en) On-site programmable neural network array based on multiplier-adder unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant