CN111507465B - Configurable convolutional neural network processor circuit - Google Patents
- Publication number: CN111507465B
- Application number: CN202010545278.2A
- Authority
- CN
- China
- Prior art keywords
- module
- interval
- bit
- function
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation using electronic means
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/08—Learning methods
Abstract
The invention provides a configurable convolutional neural network processor circuit comprising an FIR (finite impulse response) filtering module, a windowing processing module and a neural network operation module. The neural network operation module comprises a convolutional layer, a pooling layer, a configurable activation function layer and a fully-connected layer; the configurable activation function layer comprises an absolute-value module, an interval judgment module, a first multiplexer, a configuration module, an address generation module, a RAM (random access memory), an interval expansion module and a second multiplexer, and can be configured with a sigmoid or tanh function and with an error bound, greatly improving the universality and flexibility of the processor. By combining layered quantization with saturation truncation, a configurable quantization standard for each layer of the neural network is realized and the risk of overflow is reduced. The FIR filtering function is realized by multiplexing the multiply-accumulate units of the fully-connected layer, and data are transmitted in a two-stage data transmission mode, further reducing power consumption.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a configurable convolutional neural network processor circuit.
Background
Artificial Intelligence (AI) is a strategic industry that will lead the future. The AI chip is a key technical link across the whole field of artificial intelligence, the foundation of China's artificial intelligence industry, and an important lever for achieving breakthroughs in artificial intelligence. Deep learning is an important route toward artificial intelligence; its greatest difference from traditional computing is that it requires massive parallel computation rather than large-scale logic programming, and the strong demands of this new computing paradigm have driven the emergence of new special-purpose computing chips. The maturing of deep learning algorithms, rising computing power and big data have jointly enabled the leap-forward development of artificial intelligence, and the ever-growing range of AI applications further drives the demand for computing power.
The deep convolutional neural network is one of the typical deep learning algorithms. It is currently implemented and applied on software platforms: thanks to the many available deep learning frameworks, it can be implemented simply and conveniently on a Central Processing Unit (CPU) or Graphics Processing Unit (GPU) through software programming. However, the CPU cannot exploit the parallelism of convolutional neural network algorithms well, and therefore cannot meet the low-latency and low-power requirements of most applications. A convolutional neural network implemented on a GPU can exploit this parallelism well and thus achieve good performance, but its excessive power consumption cannot meet the requirements of portable devices. Conventional Application Specific Integrated Circuit (ASIC) accelerators for artificial intelligence computation implement a particular algorithm with a dedicated circuit structure, but their poor configurability cannot keep pace with the rapid development of artificial intelligence algorithms.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a configurable convolutional neural network processor circuit. By making the activation function layer structure and the quantization standard of each neural network layer configurable, and by improving the Finite Impulse Response (FIR) filtering module and the windowing processing module, it greatly improves the universality and accuracy of the neural network processor while reducing its chip area and power consumption.
The specific technical scheme of the invention is as follows:
a configurable convolutional neural network processor circuit comprises an FIR filtering module, a windowing processing module and a neural network operation module, and is characterized in that the neural network operation module comprises a convolutional layer, a pooling layer, a configurable activation function layer and a full-connection layer, wherein the configurable activation function layer is configured with a sigmoid function or a tanh function and is also configured with an error;
the sigmoid function or tanh function fitting formula configured by the configurable activation function layer is obtained by the following method:

firstly, the input x ∈ [0, +∞) is divided into different intervals; the required error is less than ε; the sigmoid or tanh function is y(x); the activation function is y;

when x ∈ [0, x1), y(x) is subjected to first-order Taylor expansion at 0 to obtain the fitting formula y = a0·x + b0; x1 is the abscissa at which a0·x + b0 − y(x) = ε, which gives the first input interval [0, x1);

when the function y(x) = 1 − ε, the abscissa is xK+1, which gives the last input interval [xK+1, +∞); the fitting formula corresponding to the interval [xK+1, +∞) is y = 1;

from the first input interval [0, x1) and the last input interval [xK+1, +∞), the middle input interval [x1, xK+1) is obtained. The middle input interval [x1, xK+1) is divided into K segment intervals [xi, xi+1), i = 1, …, K, where K is determined according to the available logic and storage resources: the larger K is, the more comparison logic is required to judge the segment interval, but the fewer storage resources are required to store the mapping values. Each segment interval [xi, xi+1) is divided into intra-segment cells of equal length Δi = (xi+1 − xi)/Li, where Li is the number of intra-segment cells of the segment interval [xi, xi+1). The intra-segment cell length Δi is determined by the error ε: in the segment interval [xi, xi+1) the slope of y(x) is largest at xi, so Δi is taken small enough (as a power of two, so that the RAM address can later be formed by a right shift) that the approximation error within one cell does not exceed ε; the intra-segment cell length of the different segment intervals therefore increases as x increases. The intra-segment cells adopt direct mapping, i.e. all inputs falling into the same intra-segment cell are mapped to the same output value;

secondly, according to the point symmetry properties of the sigmoid function and the tanh function,

sigmoid(−x) = 1 − sigmoid(x),  tanh(−x) = −tanh(x)   (2)

the fitting formula of the sigmoid or tanh function on x ∈ (−∞, 0) is obtained, and finally the fitting formula of the sigmoid or tanh function over the whole argument interval is obtained;
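The fitting procedure described above can be sketched in software. The following Python model is an illustrative reconstruction rather than part of the patent: the error bound `eps`, the segment count `K = 4` and the number of cells per segment are assumed values, and each cell stores the sigmoid value at its midpoint as the direct-mapped output.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eps = 0.01  # required error bound (illustrative)

# First interval [0, x1): first-order Taylor expansion at 0 gives y = x/4 + 1/2.
# x1 is the abscissa where the Taylor line deviates from sigmoid by eps.
x = 0.0
while (x / 4 + 0.5) - sigmoid(x) < eps:
    x += 1e-4
x1 = x

# Last interval [xK1, +inf): y = 1.  xK1 solves sigmoid(x) = 1 - eps.
xK1 = math.log((1 - eps) / eps)

# Middle interval [x1, xK1): K segment intervals, each split into equal cells;
# every cell stores one mapped output value (direct mapping).
K = 4
seg = [x1 + i * (xK1 - x1) / K for i in range(K + 1)]
cells_per_seg = 16                              # illustrative; chosen so cell error <= eps
table, offsets = [], []
for i in range(K):
    offsets.append(len(table))                  # offset number b(i)
    d = (seg[i + 1] - seg[i]) / cells_per_seg   # intra-segment cell length
    for j in range(cells_per_seg):
        table.append(sigmoid(seg[i] + (j + 0.5) * d))  # cell -> midpoint value

def fit(x):
    ax = abs(x)
    if ax < x1:
        y = ax / 4 + 0.5                        # Taylor segment
    elif ax >= xK1:
        y = 1.0                                 # saturated segment
    else:
        i = next(k for k in range(K) if ax < seg[k + 1])
        j = int((ax - seg[i]) / ((seg[i + 1] - seg[i]) / cells_per_seg))
        y = table[offsets[i] + j]               # direct mapping lookup
    return y if x >= 0 else 1.0 - y             # point symmetry of sigmoid
```

Evaluating `fit` anywhere on the real line then stays within the error bound of the true sigmoid, while only the middle interval consumes table storage.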
further, the configurable activation function layer includes an absolute value taking module, an interval judging module, a first multiplexer, a configuration module, an address generating module, a RAM (Random Access Memory), an interval expanding module, and a second multiplexer;
the configuration process of the configurable activation function layer comprises the following steps:
firstly, the mapping values yj corresponding to the sequence numbers j of all intra-segment cells of the middle input interval [x1, xK+1) are stored sequentially in the RAM; the sequence number is the RAM address. According to whether the activation function y to be configured is a sigmoid function or a tanh function, the segment points xi (i = 1, …, K+1) of the segment intervals, the truncation number n(i) and the offset number b(i) of each segment interval, the fixed-point number obtained by quantizing "1", and a 1-bit function switching bit are loaded into the configuration module. In the truncation number n(i), F is the quantization coefficient, i.e. the number of bits occupied by the fractional part of the N-bit fixed-point number; the offset number b(i) is the sequence number of the first intra-segment cell of the segment interval [xi, xi+1) among the sequence numbers of all intra-segment cells of the middle input interval [x1, xK+1); in the 1-bit function switching bit, 1 represents the tanh function and 0 represents the sigmoid function;
secondly, the input x passes through the absolute-value module to obtain the absolute value |x| of the input x and its sign bit, where x is a signed N-bit fixed-point number. The absolute value |x| is input to the interval judgment module; combined with the segment points xi output by the configuration module to the interval judgment module, the interval in which the absolute value |x| lies is judged in the interval judgment module, and the judgment result controls the output y(|x|) of the first multiplexer, specifically:

if the interval judgment result is |x| ∈ [0, x1), the first multiplexer outputs y(|x|) = a0·|x| + b0, the first-order Taylor expansion at 0 of the sigmoid or tanh function selected by the 1-bit function switching bit;

if the interval judgment result is |x| ∈ [xK+1, +∞), the first multiplexer outputs y(|x|) = 1, where 1 is the fixed-point number obtained by quantizing "1" output by the configuration module;

if the interval judgment result is |x| ∈ [x1, xK+1), the address generation module is started, and the RAM address of the mapping value corresponding to the absolute value |x| is calculated from the truncation number n(i) and the offset number b(i) output to the address generation module by the configuration module; the RAM receives the RAM address output by the address generation module and outputs the mapping value y(|x|) through the first multiplexer.

Then the output y(|x|) of the first multiplexer is input to the second multiplexer, and the sign bit output by the absolute-value module controls whether the second multiplexer performs interval expansion on y(|x|): if the sign bit of the input x is positive, the output is y(x) = y(|x|); if the sign bit of the input x is negative, the output y(x) is obtained from y(|x|) through the interval expansion module; thus the fitting value y(x) of the sigmoid or tanh function is obtained.
The 1-bit function switching bit output by the configuration module controls the operation of the interval expansion module, which forms its result according to the point symmetry properties of the sigmoid and tanh functions shown in formula (2): if the 1-bit function switching bit is 1, the interval expansion module outputs −y(|x|) through the second multiplexer; if the 1-bit function switching bit is 0, the interval expansion module outputs 1 − y(|x|) through the second multiplexer; thus the fitting value of the sigmoid or tanh function on x ∈ (−∞, 0) is obtained, where 1 is the fixed-point number obtained by quantizing "1" output by the configuration module.
Furthermore, the configurable activation function layer works according to the bit truncation numberOffset number ofCalculating absolute valuesThe steps corresponding to the RAM address where the mapping value is located are as follows: suppose thatFall intoThen the RAM address isI.e. absolute valuesMinus saidLeft boundary of intervalAfter, right shift by the number of truncationsPlus the offset numberGet inputAnd the sequence number between cells in the located segment is the RAM address.
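As an illustration of this subtract–shift–add address calculation, the sketch below models one segment with assumed parameter values (the fractional-bit count F, truncation number, offset number and segment boundary are hypothetical, not taken from the patent):

```python
# Fixed-point RAM-address sketch: the input code has F fractional bits, and the
# cell length of this segment is 2**(n_i - F) in real terms, so shifting the
# integer code of (|x| - x_i) right by n_i gives the cell index.
F = 8                       # quantization coefficient: fractional bits (assumed)
x_i = int(1.0 * 2**F)       # left boundary of the segment, as a fixed-point code
n_i = 4                     # truncation number: cell length = 2**(n_i - F) = 1/16
b_i = 32                    # offset number: index of this segment's first cell

def ram_address(abs_x_code):
    """RAM address = ((|x| - x_i) >> n(i)) + b(i)."""
    return ((abs_x_code - x_i) >> n_i) + b_i

# |x| = 1.5 lies 0.5 above the boundary; 0.5 / (1/16) = 8 cells in,
# so the address is 32 + 8 = 40.
addr = ram_address(int(1.5 * 2**F))
```

Because the cell length is a power of two, no divider is needed in hardware; the whole address generator is a subtractor, a shifter and an adder.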
Furthermore, the neural network operation module also comprises a layered quantization configuration module. The quantization standard of each neural network layer is configured by combining layered quantization with saturation truncation, so that the hardware calculation result of each layer does not overflow as far as possible; the quantization standard of each layer is determined in advance by software testing and is then applied to each layer through the layered quantization configuration module. The configuration process of the layered quantization configuration module is as follows:
the input of the convolutional layer is a signed N-bit fixed-point number whose quantization standard is that of the previous layer, a P-bit fraction; the intermediate value of the current convolutional layer's multiply-accumulate operation is represented as a signed 2N-bit fixed-point number whose binary point lies between the 2P-th and (2P+1)-th bits counted from the least significant bit. If the quantization standard of the current layer is set to a Q-bit fraction, the Q bits after the binary point and the N−Q bits before it are extracted from the signed 2N-bit intermediate value as the signed N-bit fixed-point result of the current convolutional layer. If the extracted signed N-bit result overflows, saturation truncation is applied to the overflowing value: on positive overflow the signed N-bit result judged to overflow is set to the positive maximum value, and on negative overflow it is set to the negative minimum value.
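A behavioural sketch of this truncate-and-saturate step follows; the widths N = 8, P = 4, Q = 5 are illustrative choices, and Python integers stand in for the hardware's fixed-point codes.

```python
def requantize(acc_2n, N=8, Q=5, P=4):
    """Truncate a signed 2N-bit accumulator (2P fractional bits) to a signed
    N-bit result with Q fractional bits, saturating on overflow."""
    assert -(1 << (2 * N - 1)) <= acc_2n < (1 << (2 * N - 1))
    # The accumulator carries 2P fractional bits; the result keeps Q of them,
    # so drop the lowest 2P - Q bits (keeping Q fractional + N-Q integer bits).
    shifted = acc_2n >> (2 * P - Q)
    hi = (1 << (N - 1)) - 1      # positive maximum of a signed N-bit number
    lo = -(1 << (N - 1))         # negative minimum
    if shifted > hi:
        return hi                # positive overflow -> positive maximum
    if shifted < lo:
        return lo                # negative overflow -> negative minimum
    return shifted
```

For example, an in-range accumulator code simply loses its lowest bits, while an out-of-range one clamps to the signed 8-bit extremes instead of wrapping around.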
Furthermore, the fully-connected layer comprises a multiply-accumulate (product accumulation) operation unit, and the FIR filtering module multiplexes this unit. The operation mode of the neural network operation module is divided into a neural network calculation mode and an FIR filtering calculation mode. When the system is in the neural network calculation mode, the inputs of the multiplier in the multiply-accumulate unit of the fully-connected layer are selected to be the fully-connected layer input feature map and the fully-connected layer weights; when the system is in the FIR filtering calculation mode, the inputs are selected to be the input signal to be filtered and the FIR coefficients, and the FIR filtering module performs noise reduction on the input signal through the filter formed by the FIR coefficients and the multiply-accumulate unit.
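The mode multiplexing can be pictured with a small Python model (illustrative only; the function names and toy data are assumptions, not the patent's signals): one MAC routine plays the role of the shared datapath, and the two modes differ only in what is fed to its multiplier.

```python
def mac(a_vec, b_vec):
    """One shared multiply-accumulate datapath."""
    acc = 0
    for a, b in zip(a_vec, b_vec):
        acc += a * b
    return acc

def fc_neuron(features, weights):
    # Neural network calculation mode: the multiplier inputs are the
    # fully-connected layer's input feature map and its weights.
    return mac(features, weights)

def fir_filter(signal, coeffs):
    # FIR filtering calculation mode: the same MAC unit is fed the signal
    # samples to be filtered and the FIR coefficients:
    # y[k] = sum_j h[j] * x[k - j].
    T = len(coeffs)
    return [mac(coeffs, signal[k - T + 1:k + 1][::-1])
            for k in range(T - 1, len(signal))]
```

Because both modes reduce to the same `mac` call, the hardware needs only one multiplier array plus input multiplexers, which is where the chip-area saving comes from.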
Furthermore, the windowing processing module comprises a cache module, a windowing information calculating module and a data control module, and data are input to the neural network operation module by adopting a two-stage data transmission mode;
in the first stage of data transmission, the denoised signal output by the FIR filtering module is input simultaneously to the cache module and the windowing information calculation module. The windowing information calculation module calculates the mark position of a window from the input signal as the windowing information and outputs it to the data control module, and the data control module reads the data before the mark position from the cache module in one pass and outputs it to the neural network operation module. In the second stage of data transmission, after receiving the windowing information sent by the windowing information calculation module, the data control module determines in real time how much data the window still requires, receives the data after the mark position directly from the FIR filtering module in real time, and outputs it to the neural network operation module; finally all the data in the window have been output to the neural network operation module.
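The two-stage transfer can be modelled as follows. This is an illustrative sketch: the threshold-based mark detector and the window length are invented stand-ins for the patent's windowing information calculation.

```python
def transfer(samples, is_mark, window_len):
    """Two-stage data transmission to the NN module: cache until the mark
    position is found, flush the cached pre-mark data once, then forward
    post-mark data in real time without touching the cache."""
    cache, out, mark_found = [], [], False
    for s in samples:
        if not mark_found:
            cache.append(s)             # stage 1: buffer while mark unknown
            if is_mark(s):
                mark_found = True
                out.extend(cache)       # one-shot read of the pre-mark data
        else:
            out.append(s)               # stage 2: real-time pass-through
            if len(out) == window_len:
                break                   # whole window delivered
    return out

# Mark = first sample reaching the (assumed) threshold 5.
window = transfer([1, 2, 5, 3, 4, 1, 0], lambda s: s >= 5, window_len=5)
```

Only the pre-mark prefix ever enters the cache, so the post-mark portion of each window avoids one RAM write and one RAM read compared with buffering the whole signal.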
The invention has the beneficial effects that:
1. by adopting the configurable activation function layer, the invention can configure a sigmoid or tanh function and an error bound, and can therefore adapt to a whole family of derived neural network algorithms and apply different quantization strategies to different neural networks in different application scenarios; compared with conventional processors that support only a single network, universality and flexibility are greatly improved;
2. by combining layered quantization with saturation truncation, the invention realizes a configurable quantization standard for each neural network layer, reduces the risk of overflow and improves the accuracy of the neural network processor;
3. the invention realizes the FIR filtering function by multiplexing the multiply-accumulate operation unit of the fully-connected layer in the neural network operation module, reducing the chip area of the processor while reducing the computational complexity and power consumption of the neural network;
4. compared with caching all input signals first and then reading them into the network, the two-stage data transmission mode saves more power.
Drawings
FIG. 1 is a block diagram of a configurable convolutional neural network processor circuit of embodiment 1 of the present invention;
FIG. 2 is a block diagram of a configurable activation function layer in a configurable convolutional neural network processor circuit according to embodiment 1 of the present invention;
FIG. 3 shows the mapping relation of the activation function y in the configurable convolutional neural network processor circuit according to embodiment 1 of the present invention;
fig. 4 is a block diagram of a windowing processing module in the configurable convolutional neural network processor circuit according to embodiment 1 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the following embodiments and the accompanying drawings.
Example 1
The embodiment provides a configurable convolutional neural network processor circuit, as shown in fig. 1, which includes an FIR filtering module, a windowing processing module, and a neural network operation module, and is characterized in that the neural network operation module includes a convolutional layer, a pooling layer, a configurable activation function layer, and a full connection layer, and the configurable activation function layer configures a sigmoid function or a tanh function and also configures an error;
the sigmoid function or tanh function fitting formula configured by the configurable activation function layer is obtained by the following method:
firstly, the input x ∈ [0, +∞) is divided into different intervals; the required error is less than ε; the sigmoid or tanh function is y(x); the activation function is y;

when x ∈ [0, x1), y(x) is subjected to first-order Taylor expansion at 0 to obtain the fitting formula y = a0·x + b0; x1 is the abscissa at which a0·x + b0 − y(x) = ε, which gives the first input interval [0, x1); the fitting formula of the sigmoid function is y = x/4 + 1/2, and the fitting formula of the tanh function is y = x;

when the function y(x) = 1 − ε, the abscissa is x5, which gives the last input interval [x5, +∞); the fitting formula corresponding to the interval [x5, +∞) is y = 1;

from the first input interval [0, x1) and the last input interval [x5, +∞), the middle input interval [x1, x5) is obtained. The middle input interval [x1, x5) is divided into 4 segment intervals [xi, xi+1), i = 1, …, 4. Each segment interval [xi, xi+1) is divided into intra-segment cells of equal length Δi = (xi+1 − xi)/Li, where Li is the number of intra-segment cells of the segment interval [xi, xi+1). The intra-segment cell length Δi is determined by the error ε: in the segment interval [xi, xi+1) the slope of y(x) is largest at xi, so Δi is taken small enough that the approximation error within one cell does not exceed ε; the intra-segment cell length of the different segment intervals therefore increases as x increases. The intra-segment cells adopt direct mapping, i.e. all inputs falling into the same intra-segment cell are mapped to the same output value.

The mapping relation of the activation function y is shown in FIG. 3; it is divided into the first input interval [0, x1), the four segment intervals [x1, x2), [x2, x3), [x3, x4), [x4, x5) of the middle input interval, and the last input interval [x5, +∞).

According to the point symmetry properties of the sigmoid function and the tanh function,

sigmoid(−x) = 1 − sigmoid(x),  tanh(−x) = −tanh(x)   (2)

the fitting formula of the sigmoid or tanh function on x ∈ (−∞, 0) is obtained, and finally the fitting formula of the sigmoid or tanh function over the whole argument interval is obtained.
Further, the configurable activation function layer is shown in fig. 2 and includes an absolute value taking module, an interval judging module, a first multiplexer, a configuration module, an address generating module, a RAM, an interval expanding module, and a second multiplexer;
the configuration process of the configurable activation function layer comprises the following steps:
firstly, the mapping values yj corresponding to the sequence numbers j of all intra-segment cells of the middle input interval [x1, x5) are stored sequentially in the RAM; the sequence number is the RAM address. According to whether the activation function y to be configured is a sigmoid function or a tanh function, the segment points xi of the segment intervals, the truncation number n(i) and the offset number b(i) of each segment interval, the fixed-point number obtained by quantizing "1", and a 1-bit function switching bit are loaded into the configuration module. The truncation number is n(i) = F − |log2 Δi|, where the quantization coefficient F determines the number of bits occupied by the fractional part of the N-bit fixed-point number and Δi is the intra-segment cell length of the segment interval [xi, xi+1); the offset number b(i) is the sequence number of the first intra-segment cell of the segment interval [xi, xi+1) among the sequence numbers of all intra-segment cells of the middle input interval [x1, x5); in the 1-bit function switching bit, 1 represents the tanh function and 0 represents the sigmoid function;
secondly, the input x passes through the absolute-value module to obtain the absolute value |x| of the input x and its sign bit, where x is a signed N-bit fixed-point number. The absolute value |x| is input to the interval judgment module; combined with the segment points xi output by the configuration module to the interval judgment module, the interval in which the absolute value |x| lies is judged in the interval judgment module, and the judgment result controls the output y(|x|) of the first multiplexer, specifically:

if the interval judgment result is |x| ∈ [0, x1), the first multiplexer outputs y(|x|) = a0·|x| + b0, where the fitting formula of the sigmoid function is y = |x|/4 + 1/2 and the fitting formula of the tanh function is y = |x|;

if the interval judgment result is |x| ∈ [x5, +∞), the first multiplexer outputs y(|x|) = 1, where 1 is the fixed-point number obtained by quantizing "1" output by the configuration module;

if the interval judgment result is |x| ∈ [x1, x5), the address generation module is started, and the RAM address of the mapping value corresponding to |x| is calculated from the truncation number n(i) and the offset number b(i) output to the address generation module by the configuration module: suppose |x| falls into the segment interval [xi, xi+1); then the RAM address is ((|x| − xi) >> n(i)) + b(i), i.e. the left boundary xi of the interval is subtracted from the absolute value |x|, the result is shifted right by the truncation number n(i), and the offset number b(i) is added, which yields the sequence number of the intra-segment cell in which the input lies, i.e. the RAM address; the RAM receives the RAM address output by the address generation module and outputs the mapping value y(|x|) through the first multiplexer.

Then the output y(|x|) of the first multiplexer is input to the second multiplexer, and the sign bit output by the absolute-value module controls whether the second multiplexer performs interval expansion on y(|x|): if the sign bit of the input x is positive, the output is y(x) = y(|x|); if the sign bit of the input x is negative, the output y(x) is obtained from y(|x|) through the interval expansion module; thus the fitting value y(x) of the sigmoid or tanh function is obtained.

The 1-bit function switching bit output by the configuration module controls the operation of the interval expansion module, which forms its result according to the point symmetry properties of the sigmoid and tanh functions shown in formula (2): if the 1-bit function switching bit is 1, the interval expansion module outputs −y(|x|) through the second multiplexer; if the 1-bit function switching bit is 0, the interval expansion module outputs 1 − y(|x|) through the second multiplexer; thus the fitting value of the sigmoid or tanh function on x ∈ (−∞, 0) and finally the fitting value y(x) at the input x are obtained, where 1 is the fixed-point number obtained by quantizing "1" output by the configuration module.
Furthermore, the neural network operation module also comprises a layered quantization configuration module. The quantization standard of each neural network layer is configured by combining layered quantization with saturation truncation, so that the hardware calculation result of each layer does not overflow as far as possible; the quantization standard of each layer is determined in advance by software testing and is then applied to each layer through the layered quantization configuration module. The configuration process of the layered quantization configuration module is as follows:
the input of the convolutional layer is set as an N-bit fixed-point number whose quantization standard is that of the previous layer, a P-bit fraction; the intermediate value of the current convolutional layer's multiply-accumulate operation is represented as a 2N-bit fixed-point number whose binary point lies between the 2P-th and (2P+1)-th bits counted from the least significant bit. If the quantization standard of the current layer is a Q-bit fraction, the Q bits after the binary point and the N−Q bits before it are extracted from the 2N-bit intermediate value as the N-bit fixed-point result of the current convolutional layer. If the extracted N-bit result still overflows, saturation truncation is applied to the overflowing value: on positive overflow the N-bit result judged to overflow is set to the positive maximum value, and on negative overflow it is set to the negative minimum value.
Further, the fully-connected layer comprises a multiply-accumulate (product accumulation) operation unit, and the FIR filtering module multiplexes this unit. The operation mode of the neural network operation module is divided into a neural network calculation mode and an FIR filtering calculation mode. When the system is in the neural network calculation mode, the inputs of the multiplier in the multiply-accumulate unit of the fully-connected layer are selected to be the fully-connected layer input feature map and the fully-connected layer weights; when the system is in the FIR filtering calculation mode, the inputs are selected to be the input signal to be filtered and the FIR coefficients, and the FIR filtering module performs noise reduction on the input signal through the filter formed by the FIR coefficients and the multiply-accumulate unit, thereby effectively reducing the influence of noise on recognition accuracy while reducing the computational complexity and power consumption of the neural network.
Further, the windowing processing module, as shown in fig. 4, includes a cache module, a windowing information calculating module, and a data control module, and is configured to input data to the neural network operation module in a two-stage data transmission mode;
first-stage data transmission: the denoised signal data output by the FIR filtering module are input simultaneously to the cache module and the windowing information calculation module; the windowing information calculation module calculates the mark position of a window, such as a peak position, from the input signal as the windowing information and outputs it to the data control module; the data control module reads the data before the mark position from the cache module in one pass and outputs it to the neural network operation module;
second-stage data transmission: after receiving the windowing information sent by the windowing information calculation module, the data control module determines in real time how much data the window still requires, receives the data after the mark position directly from the FIR filtering module in real time, and outputs it to the neural network operation module.
Finally, all the data in the window have been output to the neural network operation module, saving the power that would otherwise be consumed by writing the data after the mark position into the cache module and reading it back out to the neural network operation module.
Claims (6)
1. A configurable convolutional neural network processor circuit comprises an FIR filtering module, a windowing processing module and a neural network operation module, and is characterized in that the neural network operation module comprises a convolutional layer, a pooling layer, a configurable activation function layer and a full-connection layer, wherein the configurable activation function layer is configured with a sigmoid function or a tanh function and is also configured with an error;
the sigmoid function or tanh function of the configurable activation function layer configuration is (x), and the fitting formula in x ∈ [0, + ∞) is as follows:
wherein y is an activation function;
for the first segment input interval [0, x1) And (x) is subjected to first-order Taylor expansion at 0 to obtain a fitting formula y as a0x+b0,x1Is when yAbscissa of the case- (x), where, as error, a in the fitting equation of the sigmoid function0Is composed ofb0Is composed ofFitting formula of tanh function0Is 1, b0Is 0;
for the last segment input interval [x_(K+1), +∞), the fitting formula is y = 1, where x_(K+1) is the abscissa at which y(x) = 1 − ε;
for the middle input interval [x_1, x_(K+1)), the interval is first partitioned into K segment intervals [x_i, x_(i+1)), i = 1, …, K, and each segment interval [x_i, x_(i+1)) is then divided into L_i intra-segment cells of equal length (x_(i+1) − x_i)/L_i, where L_i is the number of intra-segment cells in the segment interval [x_i, x_(i+1)); the intra-segment cells use direct mapping, so that all inputs falling into the same intra-segment cell are mapped to one and the same output value;
according to the point-symmetry properties of the sigmoid function, sigmoid(−x) = 1 − sigmoid(x), and of the tanh function, tanh(−x) = −tanh(x), the fitting formula of the sigmoid function or the tanh function on x ∈ (−∞, 0) is obtained, and with it the fitting formula of the sigmoid function or the tanh function over the whole argument interval.
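As a rough software model of this three-region fit (first-order Taylor segment near 0, direct-mapped cells in the middle, saturation at 1, and point symmetry for negative inputs), the sketch below uses illustrative constants x_1 = 0.5, x_(K+1) = 8, and a single segment of 64 equal cells whose mapping values are the sigmoid evaluated at each cell midpoint; none of these constants come from the patent.

```python
import math

def fit_sigmoid(x, x1=0.5, xk1=8.0, cells=64):
    """Piecewise fit of sigmoid(x) in the spirit of claim 1
    (illustrative breakpoints and cell count)."""
    if x < 0:                       # point symmetry: sigmoid(-x) = 1 - sigmoid(x)
        return 1.0 - fit_sigmoid(-x, x1, xk1, cells)
    if x < x1:                      # first segment: first-order Taylor at 0
        return 0.5 + 0.25 * x       # a_0 = 1/4, b_0 = 1/2 for sigmoid
    if x >= xk1:                    # last segment: saturate to 1
        return 1.0
    # middle interval: direct mapping -- every input in a cell shares one value
    step = (xk1 - x1) / cells
    j = int((x - x1) / step)        # index of the intra-segment cell
    mid = x1 + (j + 0.5) * step     # map the whole cell to its midpoint value
    return 1.0 / (1.0 + math.exp(-mid))
```

The hedge here is that the patent leaves L_i and the segment boundaries configurable; the sketch fixes them only so the mapping can be demonstrated end to end.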
2. The configurable convolutional neural network processor circuit as claimed in claim 1, wherein the configurable activation function layer comprises an absolute-value module, an interval judgment module, a first multiplexer, a configuration module, an address generation module, a RAM, an interval expansion module and a second multiplexer; the configuration process of the configurable activation function layer comprises the following steps:
firstly, the mapping values corresponding to all the intra-segment cells of the middle input interval [x_1, x_(K+1)) are stored sequentially in the RAM by serial number, the serial number being the RAM address; according to whether the activation function y to be configured is the sigmoid function or the tanh function, the configuration module is loaded with the segmentation points x_i of the segment intervals, the truncation bit number n(i) of each segment interval, the offset number b(i), the fixed-point number obtained by quantizing "1", and a 1-bit function switching bit; for the segmentation points x_i, i = 1, …, K+1, and for the truncation number n(i) and the offset number b(i), i = 1, …, K; the offset number b(i) is the serial number of the first intra-segment cell of segment interval i among the intra-segment cells of the whole middle input interval [x_1, x_(K+1)); in the 1-bit function switching bit, 1 denotes the tanh function and 0 denotes the sigmoid function;
secondly, the input x passes through the absolute-value module, which outputs its absolute value |x| and its sign bit; the absolute value |x| is fed to the interval judgment module, which, using the segmentation points x_i supplied by the configuration module, determines which segment interval |x| falls into; the first multiplexer is then controlled to output y_1 according to the interval judgment result, specifically as follows:
if the interval judgment result is |x| < x_1, the first multiplexer outputs y_1 = a_0|x| + b_0, where a_0 and b_0 come from the first-order Taylor expansion at 0 of the sigmoid or tanh function selected by the 1-bit function switching bit;
if the interval judgment result is |x| ≥ x_(K+1), the first multiplexer outputs y_1 = 1, where 1 is the quantized fixed-point number of "1" output by the configuration module;
if the interval judgment result is x_1 ≤ |x| < x_(K+1), the address generation module calculates, from the truncation number n(i) and the offset number b(i) output to it by the configuration module, the RAM address at which the mapping value for the absolute value |x| is stored; the RAM receives the RAM address output by the address generation module and outputs the mapping value ram_out, which is output as y_1 through the first multiplexer;
then, the sign bit output by the absolute-value module controls, via the second multiplexer, whether the y_1 output by the first multiplexer undergoes interval expansion: if the sign bit of the input x is positive, the output is y = y_1, giving the fitting value of the sigmoid or tanh function on x ∈ [0, +∞);
finally, if the sign bit of the input x is negative, the 1-bit function switching bit output by the configuration module controls the operation of the interval expansion module, which produces its result according to the point-symmetry properties of the sigmoid and tanh functions: if the 1-bit function switching bit is 1, the interval expansion module outputs −y_1 and the second multiplexer outputs y = −y_1; if the 1-bit function switching bit is 0, the interval expansion module outputs 1 − y_1 and the second multiplexer outputs y = 1 − y_1; this gives the fitting value of the sigmoid or tanh function on x ∈ (−∞, 0), where 1 is the quantized fixed-point number of "1" output by the configuration module.
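A bit-level software model of this datapath might look as follows. The configuration dictionary, the Q8 fixed-point format, and the RAM contents are all assumptions chosen for illustration, not values taken from the patent.

```python
def activation_datapath(x, cfg, ram):
    """Sketch of the configurable activation layer of claim 2.
    cfg holds the breakpoints xi, truncation numbers n(i), offsets b(i),
    the quantized "1", Taylor coefficients a0/b0, the fractional bit
    count, and the 1-bit function switch (1 = tanh, 0 = sigmoid)."""
    sign_neg = x < 0                          # absolute-value module
    ax = -x if sign_neg else x
    xi, n, b = cfg["xi"], cfg["n"], cfg["b"]
    ONE = cfg["one"]                          # quantized fixed-point "1"
    if ax < xi[0]:                            # first multiplexer, case 1
        y1 = (cfg["a0"] * ax >> cfg["frac"]) + cfg["b0"]
    elif ax >= xi[-1]:                        # case 2: saturate to "1"
        y1 = ONE
    else:                                     # case 3: RAM lookup
        i = max(k for k in range(len(xi) - 1) if ax >= xi[k])
        addr = ((ax - xi[i]) >> n[i]) + b[i]  # address generation module
        y1 = ram[addr]
    if not sign_neg:                          # second multiplexer: x >= 0
        return y1
    # interval expansion module: tanh -> -y1, sigmoid -> ONE - y1
    return -y1 if cfg["switch"] else ONE - y1
```

For example, with a Q8 sigmoid configuration (`a0 = 64`, `b0 = 128`, `one = 256`, one segment `[128, 512)` with `n = [6]`), an input of 0 returns 128 (i.e. 0.5) and a large negative input returns 0, matching the point-symmetry behavior of the claim.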
3. The configurable convolutional neural network processor circuit of claim 2, wherein the RAM address holding the mapping value for the absolute value |x| is calculated as follows: suppose |x| falls into x_i ≤ |x| < x_(i+1); then the RAM address is ((|x| − x_i) >> n(i)) + b(i).
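This shift-based addressing works because the cell length within segment i is a power of two, 2^n(i), so the right shift is the integer divide that indexes the cell inside the segment, and b(i) offsets to the segment's first cell in the whole middle interval. A quick numeric check with made-up segment parameters:

```python
def ram_address(ax, xi_i, n_i, b_i):
    """RAM address of the cell containing |x|, per claim 3:
    ((|x| - x_i) >> n(i)) + b(i).  All arguments are fixed-point
    integers; xi_i is the segment's left endpoint."""
    return ((ax - xi_i) >> n_i) + b_i

# Segment starting at 128 with cell length 2**6 = 64 and offset b(i) = 10:
# |x| = 300 lies in the segment's third cell, so the address is 10 + 2 = 12.
```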
4. The configurable convolutional neural network processor circuit as claimed in claim 1, wherein the neural network operation module further comprises a hierarchical quantization configuration module, which combines layer-wise quantization with saturation truncation to set the quantization standard of each neural network layer so that no layer's calculation result overflows; the hierarchical quantization configuration module distributes the per-layer quantization standards to each neural network layer and to the fully-connected layer; the configuration process of the hierarchical quantization configuration module is as follows:
the input of the convolutional layer is a signed N-bit fixed-point number whose quantization standard is that of the previous convolutional layer, a P-bit fraction; the intermediate value of the current convolutional layer's multiply-accumulate operation is represented as a signed 2N-bit fixed-point number whose binary point lies between bit 2P and bit 2P+1, counting from the least significant bit; if the quantization standard of the current layer is set to a Q-bit fraction, the Q bits after the binary point and the N−Q bits before it are cut out of the signed 2N-bit intermediate value as the current convolutional layer's signed N-bit fixed-point result; if the truncated signed N-bit result overflows, the overflowing value is saturated: on positive overflow the result is reset to the positive maximum, and on negative overflow it is reset to the negative minimum.
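A minimal software sketch of this requantization with saturation, assuming 2P ≥ Q so the truncation is a plain right shift; the bit widths in the example (N = 8, P = Q = 4) are illustrative, not from the patent:

```python
def requantize(acc_2n, n_bits, p_frac, q_frac):
    """Cut a signed 2N-bit accumulator with 2P fractional bits down to a
    signed N-bit result with Q fractional bits, saturating on overflow
    instead of wrapping around (claim 4's saturation truncation)."""
    shifted = acc_2n >> (2 * p_frac - q_frac)   # keep Q fractional bits
    hi = (1 << (n_bits - 1)) - 1                # positive maximum
    lo = -(1 << (n_bits - 1))                   # negative minimum
    return max(lo, min(hi, shifted))            # saturation truncation
```

Note that Python's `>>` on negative integers floors toward negative infinity, matching the arithmetic right shift a hardware implementation would use.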
5. The configurable convolutional neural network processor circuit of claim 1, wherein the fully-connected layer comprises multiply-accumulate units, and the FIR filtering module multiplexes the multiply-accumulate units of the fully-connected layer.
6. The configurable convolutional neural network processor circuit as claimed in claim 1, wherein the windowing processing module comprises a cache module, a windowing information calculation module and a data control module, and inputs data to the neural network operation module in a two-stage data transmission mode;
in the first stage of data transmission, the denoised signal output by the FIR filtering module is fed simultaneously to the cache module and the windowing information calculation module; the windowing information calculation module locates the mark position of the window from the input signal, takes it as the windowing information, and outputs it to the data control module; the data control module then reads the data preceding the mark position from the cache module and outputs it to the neural network operation module; in the second stage of data transmission, upon receiving the windowing information from the windowing information calculation module, the data control module determines in real time how much data the window still requires, receives the data after the mark position directly from the FIR filtering module in real time, and outputs it to the neural network operation module, so that finally all data in the window are output to the neural network operation module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010545278.2A CN111507465B (en) | 2020-06-16 | 2020-06-16 | Configurable convolutional neural network processor circuit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111507465A CN111507465A (en) | 2020-08-07 |
CN111507465B true CN111507465B (en) | 2020-10-23 |
Family
ID=71877126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010545278.2A Active CN111507465B (en) | 2020-06-16 | 2020-06-16 | Configurable convolutional neural network processor circuit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111507465B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738427B (en) * | 2020-08-14 | 2020-12-29 | 电子科技大学 | Operation circuit of neural network |
CN112651497A (en) * | 2020-12-30 | 2021-04-13 | 深圳大普微电子科技有限公司 | Hardware chip-based activation function processing method and device and integrated circuit |
CN115601692A (en) * | 2021-07-08 | 2023-01-13 | 华为技术有限公司(Cn) | Data processing method, training method and device of neural network model |
CN113705776B (en) * | 2021-08-06 | 2023-08-08 | 山东云海国创云计算装备产业创新中心有限公司 | Method, system, equipment and storage medium for realizing activation function based on ASIC |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107729984A (en) * | 2017-10-27 | 2018-02-23 | 中国科学院计算技术研究所 | A kind of computing device and method suitable for neural network activation function |
CN107886166A (en) * | 2016-09-29 | 2018-04-06 | 北京中科寒武纪科技有限公司 | A kind of apparatus and method for performing artificial neural network computing |
CN108154224A (en) * | 2018-01-17 | 2018-06-12 | 北京中星微电子有限公司 | For the method, apparatus and non-transitory computer-readable medium of data processing |
CN108898216A (en) * | 2018-05-04 | 2018-11-27 | 中国科学院计算技术研究所 | Activation processing unit applied to neural network |
CN110751280A (en) * | 2019-09-19 | 2020-02-04 | 华中科技大学 | Configurable convolution accelerator applied to convolutional neural network |
CN110852416A (en) * | 2019-09-30 | 2020-02-28 | 成都恒创新星科技有限公司 | CNN accelerated computing method and system based on low-precision floating-point data expression form |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10949736B2 (en) * | 2016-11-03 | 2021-03-16 | Intel Corporation | Flexible neural network accelerator and methods therefor |
CN110163338B (en) * | 2019-01-31 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Chip operation method and device with operation array, terminal and chip |
CN110738311A (en) * | 2019-10-14 | 2020-01-31 | 哈尔滨工业大学 | LSTM network acceleration method based on high-level synthesis |
Non-Patent Citations (1)
Title |
---|
Su Chaoyang et al. "Design of a Configurable Activation Function Module for Neural Networks." Microcontrollers & Embedded Systems, 2020, Vol. 20, No. 4. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||