CN106203617A - Acceleration processing unit and array structure based on convolutional neural networks - Google Patents
Acceleration processing unit and array structure based on convolutional neural networks
- Publication number
- CN106203617A CN106203617A CN201610482653.7A CN201610482653A CN106203617A CN 106203617 A CN106203617 A CN 106203617A CN 201610482653 A CN201610482653 A CN 201610482653A CN 106203617 A CN106203617 A CN 106203617A
- Authority
- CN
- China
- Prior art keywords
- register
- processing unit
- multiplexer
- adder
- acceleration processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/22—Microcontrol or microprogram arrangements
- G06F9/28—Enhancement of operational speed, e.g. by using several microcontrol devices operating in parallel
Abstract
The present invention discloses a convolutional neural network-based acceleration processing unit for performing convolution operations on local data, where the local data comprises multiple items of multimedia data. The acceleration processing unit includes a first register, a second register, a third register, a fourth register, a fifth register, a multiplier, an adder, a first multiplexer and a second multiplexer. Under the control of the first multiplexer and the second multiplexer, the multiplier and adder of a single acceleration processing unit are reusable, so that one acceleration processing unit needs only one multiplier and one adder to complete a convolution operation, reducing the number of multipliers and adders used. For the same convolution operation, using fewer multipliers and adders improves processing speed and reduces energy consumption, and the chip area of a single acceleration processing unit is also smaller.
Description
Technical field
The present invention relates to convolutional neural networks, and in particular to an acceleration processing unit and an array structure used in the convolutional layers of a convolutional neural network.
Background art
Deep learning, as opposed to shallow learning, refers to a machine learning rules from historical data by means of algorithms and then making intelligent recognition and prediction about things.
A convolutional neural network (Convolutional Neural Network, CNN) is one kind of deep learning network. Invented in the early 1980s, it is composed of multiple layers of arranged artificial neurons and mirrors the way the human brain processes vision. As Moore's law has made computer technology ever more powerful, convolutional neural networks can better mimic the actual operation of biological neural networks. They avoid complex image preprocessing and can take the raw image directly as input, and have therefore come to be widely applied, for example in handwritten character recognition, face recognition, human-eye detection, pedestrian detection and robot navigation.
The basic architecture of a convolutional neural network includes multiple convolutional layers; each layer consists of multiple two-dimensional planes, and each plane consists of multiple independent neurons. Each neuron performs a convolution operation on local data of the multimedia data; its input terminal is also connected to a local receptive field of the previous convolutional layer, and by convolving the data of that local receptive field it extracts the features of the receptive field.
In the prior art, acceleration processing units are also commonly used as neurons to perform convolution operations on local data of multimedia data. An existing acceleration processing unit provides one adder and one multiplier for each item of multimedia data at its input; when the unit has multiple items of local data to process, each acceleration processing unit must contain multiple adders and multiple multipliers. This design makes the chip area of the acceleration processing unit large and the power consumption high, and leaves the processing speed with room for improvement.
Summary of the invention
The present application provides a convolutional neural network-based acceleration processing unit for performing convolution operations on local data, where the local data comprises multiple items of multimedia data. The acceleration processing unit includes a first register, a second register, a third register, a fourth register, a fifth register, a multiplier, an adder, a first multiplexer and a second multiplexer.
The first register receives the multimedia data; its output terminal is connected to an input terminal of the multiplier and sends the multimedia data to the multiplier.
The second register receives the filter weights; its output terminal is connected to an input terminal of the multiplier and sends the filter weights to the multiplier.
The multiplier multiplies the multimedia data by the filter weights; its output terminal is connected to the third register and sends the product to the third register.
The output terminal of the third register is connected to the first terminal of the first multiplexer.
The second terminal of the first multiplexer is connected to the adder, and its third terminal is the partial-sum input from the previous acceleration processing unit. By switching state, the first multiplexer either connects the third register to the adder or connects the partial-sum input of the previous acceleration processing unit to the adder.
The adder is further connected to the fifth register and the fourth register; it adds the product delivered by the first multiplexer, or the partial sum of the previous acceleration processing unit, to the data in the fifth register, and outputs the sum to the fourth register.
The first and second terminals of the second multiplexer are connected to the fourth register and the fifth register, respectively; the fourth register is connected to the fifth register through the second multiplexer.
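To make the time-sharing of the single multiplier and adder concrete, the following Python sketch models the unit described above. The class and method names are my own illustration, not part of the patent; the register numbering follows the text.

```python
class AccelerationPE:
    """Sketch of the five-register, single-multiplier, single-adder unit.

    r3 holds the product, r4 the adder output, and r5 the accumulated
    partial sum, following the patent's register numbering.
    """

    def __init__(self):
        self.r3 = 0  # product register (third register)
        self.r4 = 0  # adder-output register (fourth register)
        self.r5 = 0  # accumulator register (fifth register), starts at zero

    def mac_step(self, data, weight):
        """Both multiplexers in the first state: accumulate one product."""
        self.r3 = data * weight          # the single shared multiplier
        self.r4 = self.r3 + self.r5      # the single shared adder; MUX1 selects r3
        self.r5 = self.r4                # MUX2 forwards r4 to r5

    def merge_step(self, prev_partial_sum):
        """Both multiplexers in the second state: merge and reset."""
        out = prev_partial_sum + self.r5  # MUX1 selects the neighbour's partial sum
        self.r5 = 0                       # MUX2 connects the reset terminal to r5
        return out                        # forwarded to the next unit


pe = AccelerationPE()
for d, w in zip([1, 2, 3], [4, 5, 6]):   # local data items and filter weights
    pe.mac_step(d, w)
print(pe.merge_step(10))                 # 10 + (1*4 + 2*5 + 3*6) = 42
```

In the sketch, `mac_step` corresponds to the multiplexers' first state (internal accumulation) and `merge_step` to their second state (merging the neighbouring unit's partial sum and clearing the accumulator).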
Preferably, the first multiplexer remains in a first state while the acceleration processing unit has not completed the multiply-accumulate operations on the local data, connecting the third register to the adder, and switches to a second state after the acceleration processing unit completes the multiply-accumulate operations on the local data, connecting the partial-sum input of the previous acceleration processing unit to the adder.
Preferably, the second multiplexer remains in a first state while the acceleration processing unit has not completed the multiply-accumulate operations on the local data, connecting the fourth register to the fifth register, and switches to a second state after the acceleration processing unit completes the multiply-accumulate operations on the local data, so that the fifth register is reset.
Preferably, the third terminal of the second multiplexer is a reset terminal, and after the acceleration processing unit completes the multiply-accumulate operations on the local data the second multiplexer switches to the second state, connecting the reset terminal to the fifth register.
Preferably, the acceleration processing unit further includes a first memory, a second memory and a third memory. The first memory is connected to the input terminal of the first register; it receives and stores the local data on which the convolution operation is to be performed, and sends the multiple items of multimedia data in the local data to the first register in sequence. The second memory is connected to the input terminal of the second register; it receives and stores the filter weights and sends them to the second register. The third memory is connected to the input terminal of the fourth register; it receives and stores the sum output by the adder and sends the sum to the fourth register.
Preferably, the adder also outputs the sum to the next acceleration processing unit.
The present application also provides a convolutional neural network-based array structure including multiple of the acceleration processing units described above, arranged as a matrix of 3 rows and N columns, where N is an integer greater than or equal to 1, and the acceleration processing units in each column are connected in sequence.
Preferably, in each column, the output terminal of the adder of the previous acceleration processing unit is connected to the third terminal of the first multiplexer of the next acceleration processing unit.
Preferably, acceleration processing units in the same row receive the same filter weights, and acceleration processing units on the same diagonal receive the same local data.
Preferably, acceleration processing units in different rows receive different filter weights.
The beneficial effects of the invention are as follows: under the control of the first multiplexer and the second multiplexer, the multiplier and adder of a single acceleration processing unit are reusable, so that one acceleration processing unit needs only one multiplier and one adder to complete a convolution operation, reducing the number of multipliers and adders used. For the same convolution operation, using fewer multipliers and adders improves processing speed and reduces energy consumption, and the chip area of a single acceleration processing unit is also smaller.
Brief description of the drawings
Fig. 1 is a structural block diagram of a convolutional neural network-based acceleration processing unit provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the convolution operation process of a convolutional neural network-based acceleration processing unit provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the column-wise distribution of a convolutional neural network-based array structure of an embodiment of the present invention;
Fig. 4 is a schematic diagram of the row-wise distribution of a convolutional neural network-based array structure of an embodiment of the present invention;
Fig. 5 is a schematic diagram of the diagonal distribution of a convolutional neural network-based array structure of an embodiment of the present invention.
Detailed description of the invention
The technical solutions of the present invention are described clearly and completely below through specific embodiments with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them.
Embodiment one:
Referring to Fig. 1, this embodiment provides a convolutional neural network-based acceleration processing unit. The acceleration processing unit 61 includes a first register 21, a second register 22, a third register 23, a fourth register 24, a fifth register 25, a multiplier 41, an adder 51, a first multiplexer 31 and a second multiplexer 32.
The first register 21 is connected to one input terminal of the multiplier 41; it receives multimedia data and sends the multimedia data to the multiplier 41. The second register 22 is connected to the other input terminal of the multiplier 41; it receives filter weights and sends the filter weights to the multiplier 41. The output terminal of the multiplier 41 is connected to the third register 23; the multiplier multiplies the multimedia data by the filter weights and sends the product to the third register 23.
The first terminal of the first multiplexer 31 is connected to the output terminal of the third register 23, its second terminal is connected to one input terminal of the adder 51, and its third terminal is the partial-sum input from the previous acceleration processing unit. When the first multiplexer 31 is switched to the first state (e.g. set to 0), it connects the third register 23 to the adder 51 and sends the data in the third register 23 to the adder 51; when the first multiplexer 31 is switched to the second state (e.g. set to 1), it connects its third terminal to the adder 51 and sends the partial sum of the previous acceleration processing unit to the adder 51.
The other input terminal of the adder 51 is connected to the fifth register 25, and the output terminal of the adder 51 is connected to the fourth register 24. The adder 51 takes the data in the third register 23 and the fifth register 25 as inputs, adds the data of the two registers, and outputs the sum (also called the internal partial sum) to the fourth register 24.
The first and second terminals of the second multiplexer 32 are connected to the fourth register 24 and the fifth register 25, respectively, and its third terminal is a reset terminal. When the second multiplexer 32 is switched to the first state (e.g. set to 0), it connects the fourth register 24 to the fifth register 25 and sends the internal partial sum in the fourth register 24 to the fifth register 25; when the second multiplexer 32 is switched to the second state (e.g. set to 1), it connects its third terminal to the fifth register 25 and resets the fifth register 25, clearing its data to zero.
In some embodiments, to facilitate sending data to the registers, the acceleration processing unit 61 further includes a first memory 11, a second memory 12 and a third memory 13. The first memory 11 is connected to the input terminal of the first register 21; it receives and stores the local data on which the convolution operation is to be performed, and sends the multiple items of multimedia data in the local data to the first register 21 in sequence. The second memory 12 is connected to the input terminal of the second register 22; it receives and stores the filter weights and sends them to the second register 22. The third memory 13 is connected to the input terminal of the fourth register 24; it receives and stores the internal partial sum output by the adder 51 and sends the internal partial sum to the fourth register 24.
The acceleration processing unit 61 performs convolution operations on local data; the local data includes multiple items of multimedia data, which may be video data, image data or audio data. When the multimedia data is video data, each item of multimedia data can be taken to correspond to one pixel.
The convolution operation process of the acceleration processing unit 61 is illustrated below taking image data as an example.
With reference to Fig. 1 and Fig. 2, a single convolutional neural network-based acceleration processing unit 61 works as follows:
Step 10: read the image data and the filter weights on which the convolution operation is to be performed. The image data are stored in the first memory 11 and, when needed, an image datum is fetched and sent to the first register 21 if it is nonzero; if an image datum is 0, the zero is routed directly to the first register 21 without being fetched, and a skip or gating strategy avoids the unnecessary read and computation. The filter weights are stored in the second memory 12 and sent to the second register 22 when needed. Data are fetched serially, one item at a time: in the first cycle, the acceleration processing unit 61 sends the first image datum of the local data to be convolved to the first register 21; in the second cycle, the second image datum is sent to the first register 21; and the remaining image data are read in successively in the same way. The filter weights are generated by the processor according to the requirements of the convolution operation.
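The skip-or-gate strategy for zero data mentioned in step 10 can be sketched as follows. This is a hypothetical software model of my own; in hardware the multiply would be clock-gated or skipped rather than branched around, and the function name is illustrative.

```python
def mac_with_gating(data_stream, weights):
    """Accumulate products, skipping the multiply when the datum is zero.

    Returns the accumulated sum and the number of multiplies actually
    issued, to show the savings from zero-gating.
    """
    acc = 0
    multiplies = 0
    for d, w in zip(data_stream, weights):
        if d == 0:
            continue           # the zero is routed through; no multiply issued
        acc += d * w
        multiplies += 1
    return acc, multiplies


acc, muls = mac_with_gating([3, 0, 7, 0, 2], [1, 1, 1, 1, 1])
# acc == 12, and only 3 multiplies are issued instead of 5
```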
Step 20: multiplication. The image datum in the first register 21 and the filter weight in the second register 22 are sent to the multiplier 41, which performs the multiplication; the product from the multiplier 41 is output to the third register 23.
Step 30: addition. Because the multiply-accumulate operations in the acceleration processing unit 61 have not yet finished, the first multiplexer 31 is set to 0. With the first multiplexer 31 at 0, the product in the third register 23 is sent to the adder 51, which adds it to the previous internal partial sum held in the fifth register 25. For the first internal convolution operation, the fifth register 25 holds zero; for subsequent internal convolution operations, it holds the internal partial sum of the previous convolution operation. The sum of this convolution operation (the internal partial sum) is output to the fourth register 24; this completes one internal convolution operation and yields the partial sum of the first image datum and its filter weight. Because the multiply-accumulate operations in the acceleration processing unit 61 have not yet finished, the second multiplexer 32 is also set to 0, and with the second multiplexer 32 at 0 the internal partial sum is sent from the fourth register 24 to the fifth register 25.
Step 40: the acceleration processing unit 61 determines whether the internal convolution operations on all the local data are complete. If not, steps 10, 20 and 30 are repeated: the second image datum is fetched and input to the first register 21, and the second filter weight is input to the second register 22; the image datum in the first register 21 and the filter weight in the second register 22 are both sent to the multiplier 41 and multiplied, and the product is sent to the third register 23. Because the multiply-accumulate operations in the acceleration processing unit 61 have not yet finished, the first multiplexer 31 remains at 0 and passes the data in the third register 23 to the adder 51, where they are summed with the data from the fifth register 25, yielding the partial sum of the second image datum and its filter weight. The partial sum in the adder 51 is sent to the fourth register 24; since the multiply-accumulate operations have still not finished, the second multiplexer 32 remains at 0 and sends the data in the fourth register 24 to the fifth register 25. This completes the internal convolution operation on the second image datum. The process continues until the last image datum of the local data has been fetched, multiplied by its filter weight and accumulated as above, producing the partial sum of this acceleration processing unit, which, through the same operations as before, finally enters the fifth register 25. Once all internal convolution operations are complete, step 50 is performed.
Step 50: after the multiply-accumulate operations on the local data in the acceleration processing unit 61 finish, the first multiplexer 31 and the second multiplexer 32 are set to 1. With the first multiplexer 31 at 1, the partial sum of the previous acceleration processing unit is sent through the first multiplexer 31 to the adder 51, and the final partial sum of this acceleration processing unit 61 is sent from the fifth register 25 to the adder 51. The two partial sums are summed, giving the superposed partial sum of the two acceleration processing units; this superposed partial sum is output and sent to the next acceleration processing unit. When the second multiplexer 32 switches from state 0 to 1, the fourth register 24 no longer sends data to the fifth register 25, and the data in the fifth register 25 are cleared.
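The five steps above can be collapsed into a short software trace. This is a hypothetical model of my own rather than the patent's; `run_pe` and its argument names are illustrative, and the register variables follow the text's numbering.

```python
def run_pe(local_data, weights, prev_partial_sum):
    """Trace steps 10-50 for one acceleration processing unit."""
    r5 = 0                        # fifth register starts at zero
    for d, w in zip(local_data, weights):
        # steps 10-40: both multiplexers set to 0 (internal convolution)
        r1, r2 = d, w             # step 10: fetch datum and weight
        r3 = r1 * r2              # step 20: multiply
        r4 = r3 + r5              # step 30: add the previous internal partial sum
        r5 = r4                   # MUX2 = 0: forward r4 to r5
    # step 50: both multiplexers set to 1 (merge with the previous unit)
    out = prev_partial_sum + r5   # MUX1 = 1: neighbour's partial sum to the adder
    r5 = 0                        # MUX2 = 1: reset terminal clears r5
    return out                    # superposed partial sum, sent to the next unit


# 2*1 + 3*2 + 4*3 = 20, plus the neighbour's 5, gives 25
print(run_pe([2, 3, 4], [1, 2, 3], 5))
```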
In this embodiment, under the control of the first multiplexer 31 and the second multiplexer 32, the multiplier 41 and the adder 51 of a single acceleration processing unit are reusable, so that one acceleration processing unit needs only one multiplier and one adder to complete the convolution operation, reducing the number of multipliers and adders used. For the same convolution operation, using fewer multipliers and adders improves processing speed and reduces energy consumption, and the chip area of a single acceleration processing unit is also smaller.
Embodiment two:
Referring to Figs. 3 to 5, a convolutional neural network-based array structure is shown, including multiple of the acceleration processing units described above. The multiple acceleration processing units are arranged as a matrix of M rows and N columns, where M and N are integers greater than or equal to 1, and the acceleration processing units in each column are connected in sequence.
In this embodiment, the acceleration processing units form a matrix of 3 rows and 3 columns. In each column, the output terminal of the adder of the previous acceleration processing unit is connected to the third terminal of the first multiplexer of the next acceleration processing unit.
Acceleration processing units in the same row receive the same filter weights; acceleration processing units on the same diagonal receive the same local data.
Acceleration processing units in different rows receive different filter weights.
The convolutional-layer computation process of the multiple acceleration processing units is described below with reference to the drawings.
With reference to Figs. 1 to 5, the computation process of the convolutional neural network-based array structure is as follows:
As shown in Fig. 3, the adder 51 of the previous acceleration processing unit is connected to the first multiplexer 31 of the next acceleration processing unit. The partial sums output by each row all move vertically, so the partial sums of successive acceleration processing units are accumulated; they can be read out at the top row at the end of a computation pass and are delivered to the bottom row of the array by a buffer at the start of the next pass.
For example, acceleration processing units PE1.1, PE2.1 and PE3.1 each first perform their internal convolution operations and store the final results in their respective fifth registers 25. The partial sum output by PE3.1 is then summed with the partial sum in PE2.1's fifth register 25 in the adder 51 of PE2.1, giving the first accumulated partial sum. PE2.1 sends this accumulated partial sum to PE1.1, where it is summed once more with the partial sum in PE1.1's fifth register 25 in the adder 51 of PE1.1, finally outputting the partial sum of all the acceleration processing units in the column.
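The column accumulation in this example (PE3.1 upward through PE2.1 to PE1.1) can be sketched as follows. This is a hypothetical model; the function name and the list representation of the per-unit results are my own.

```python
def column_accumulate(internal_sums):
    """Accumulate partial sums up one column, bottom row to top row.

    `internal_sums` lists each unit's fifth-register content with the
    top unit (PE1.1) first; the bottom unit's sum is passed upward and
    each unit above adds its own, as in the Fig. 3 example.
    """
    partial = 0
    for s in reversed(internal_sums):  # PE3.1 -> PE2.1 -> PE1.1
        partial = partial + s          # each unit's adder sums again
    return partial                     # output of the top unit in the column


# internal convolution results of PE1.1, PE2.1 and PE3.1
print(column_accumulate([10, 20, 30]))  # 60
```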
It should also be noted that, as shown in Figs. 4 and 5, acceleration processing units in the same row receive the same filter weights, acceleration processing units on the same diagonal receive the same image data, and acceleration processing units in different rows receive different filter weights. Because the whole image consists of several rows and each acceleration processing unit processes only a single row of it, the convolution results that the acceleration processing units produce for the individual rows must then be accumulated row by row. The input data on the same diagonal are identical, while the input image data on different diagonals differ; in effect, the inputs on different diagonals are different rows of the image. Processing different rows of image data requires different filter weights: one set of filter weights can only be used to process the image data of the first row, and processing the image data of the second row requires a new set of filter weights. Therefore the acceleration processing units of the same row can use the same filter weights, while the acceleration processing units of different rows use different filter weights.
For example, the filter weights in acceleration processing units PE1.1, PE1.2 and PE1.3 are identical; the image data input to PE2.1 and PE1.2 are identical; and the filter weights in PE1.1, PE2.2 and PE3.1 all differ.
In this way, the multimedia data of one row are processed simultaneously, different filter weights are applied to the multimedia data of different rows, and after each row of data has been processed the results for the individual rows are accumulated, so that all the multimedia data are processed quickly and reliably.
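The row/diagonal mapping can be sketched as an index rule. This is an assumption of mine consistent with Figs. 4 and 5: filter weights are constant along a row, and the image-data row is constant along a diagonal, i.e. wherever the row and column indices sum to the same value.

```python
def build_mapping(num_rows, num_cols):
    """Hypothetical mapping: unit (i, j) gets filter-weight row i and
    image-data row i + j, so data is constant on each diagonal."""
    return {(i, j): {"weight_row": i, "data_row": i + j}
            for i in range(num_rows) for j in range(num_cols)}


m = build_mapping(3, 3)
# same row -> same filter weights (e.g. PE1.1 and PE1.3)
assert m[(0, 0)]["weight_row"] == m[(0, 2)]["weight_row"]
# same diagonal -> same image data (e.g. PE2.1 and PE1.2)
assert m[(1, 0)]["data_row"] == m[(0, 1)]["data_row"]
# different rows -> different filter weights
assert m[(0, 0)]["weight_row"] != m[(1, 0)]["weight_row"]
```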
The present invention has been illustrated above by specific examples, which are intended only to aid understanding of the present invention, not to limit it. For those skilled in the art, simple deductions, variations or substitutions can also be made in accordance with the idea of the present invention.
Claims (10)
1. A convolutional neural network-based acceleration processing unit for performing convolution operations on local data, the local data comprising multiple items of multimedia data, characterised by comprising a first register, a second register, a third register, a fourth register, a fifth register, a multiplier, an adder, a first multiplexer and a second multiplexer; wherein
the first register receives the multimedia data, its output terminal being connected to an input terminal of the multiplier to send the multimedia data to the multiplier;
the second register receives the filter weights, its output terminal being connected to an input terminal of the multiplier to send the filter weights to the multiplier;
the multiplier multiplies the multimedia data by the filter weights, its output terminal being connected to the third register to send the product to the third register;
the output terminal of the third register is connected to the first terminal of the first multiplexer;
the second terminal of the first multiplexer is connected to the adder, and its third terminal is a partial-sum input from a previous acceleration processing unit; by switching state, the first multiplexer connects the third register to the adder, or connects the partial-sum input of the previous acceleration processing unit to the adder;
the adder is further connected to the fifth register and the fourth register, adds the product delivered by the first multiplexer, or the partial sum of the previous acceleration processing unit, to the data in the fifth register, and outputs the sum to the fourth register;
the first and second terminals of the second multiplexer are connected to the fourth register and the fifth register, respectively, the fourth register being connected to the fifth register through the second multiplexer.
2. The acceleration processing unit as claimed in claim 1, characterised in that the first multiplexer remains in a first state while the acceleration processing unit has not completed the multiply-accumulate operations on the local data, connecting the third register to the adder, and switches to a second state after the acceleration processing unit completes the multiply-accumulate operations on the local data, connecting the partial-sum input of the previous acceleration processing unit to the adder.
3. The acceleration processing unit as claimed in claim 1, characterised in that the second multiplexer remains in a first state while the acceleration processing unit has not completed the multiply-accumulate operations on the local data, connecting the fourth register to the fifth register, and switches to a second state after the acceleration processing unit completes the multiply-accumulate operations on the local data, so that the fifth register is reset.
4. The acceleration processing unit as claimed in claim 3, characterised in that the third terminal of the second multiplexer is a reset terminal, and the second multiplexer switches to the second state after the acceleration processing unit completes the multiply-accumulate operations on the local data, connecting the reset terminal to the fifth register.
5. The acceleration processing unit as claimed in any one of claims 1 to 4, characterised by further comprising a first memory, a second memory and a third memory; the first memory is connected to the input terminal of the first register to receive and store the local data on which the convolution operation is to be performed and to send the multiple items of multimedia data in the local data to the first register in sequence; the second memory is connected to the input terminal of the second register to receive and store the filter weights and send them to the second register; the third memory is connected to the input terminal of the fourth register to receive and store the sum output by the adder and send the sum to the fourth register.
6. The acceleration processing unit as claimed in any one of claims 1 to 4, characterised in that the adder also outputs the sum to a next acceleration processing unit.
7. A convolutional neural network-based array structure, characterised by comprising multiple acceleration processing units as claimed in any one of claims 1 to 6, the multiple acceleration processing units being arranged as a matrix of M rows and N columns, wherein M and N are integers greater than or equal to 1, and the acceleration processing units in each column are connected in sequence.
8. The array structure as claimed in claim 7, characterised in that, in each column, the output terminal of the adder of the previous acceleration processing unit is connected to the third terminal of the first multiplexer of the next acceleration processing unit.
9. The array structure as claimed in claim 7 or 8, characterised in that, in acceleration processing units of the same row, the input filter weights are identical, and in acceleration processing units on the same diagonal, the input local data are identical.
10. The array structure as claimed in claim 9, characterised in that, in acceleration processing units of different rows, the input filter weights are different.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610482653.7A CN106203617B (en) | 2016-06-27 | 2016-06-27 | A kind of acceleration processing unit and array structure based on convolutional neural networks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106203617A true CN106203617A (en) | 2016-12-07 |
CN106203617B CN106203617B (en) | 2018-08-21 |
Family
ID=57462215
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610482653.7A Expired - Fee Related CN106203617B (en) | 2016-06-27 | 2016-06-27 | A kind of acceleration processing unit and array structure based on convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106203617B (en) |
2016-06-27: CN application CN201610482653.7A granted as patent CN106203617B; status: not active (Expired - Fee Related)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0422348A2 (en) * | 1989-10-10 | 1991-04-17 | Hnc, Inc. | Two-dimensional systolic array for neural networks, and method |
US5471627A (en) * | 1989-10-10 | 1995-11-28 | Hnc, Inc. | Systolic array image processing system and method |
CN103019656A (en) * | 2012-12-04 | 2013-04-03 | 中国科学院半导体研究所 | Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system |
CN103691058A (en) * | 2013-12-10 | 2014-04-02 | 天津大学 | Deep brain stimulation FPGA (Field Programmable Gate Array) experimental platform for basal ganglia and thalamencephalon network for parkinson's disease |
EP3035204A1 (en) * | 2014-12-19 | 2016-06-22 | Intel Corporation | Storage device and method for performing convolution operations |
EP3035249A1 (en) * | 2014-12-19 | 2016-06-22 | Intel Corporation | Method and apparatus for distributed and cooperative computation in artificial neural networks |
CN104504205A (en) * | 2014-12-29 | 2015-04-08 | 南京大学 | Parallelizing two-dimensional division method of symmetrical FIR (Finite Impulse Response) algorithm and hardware structure of parallelizing two-dimensional division method |
CN105528191A (en) * | 2015-12-01 | 2016-04-27 | 中国科学院计算技术研究所 | Data accumulation apparatus and method, and digital signal processing device |
CN105681628A (en) * | 2016-01-05 | 2016-06-15 | 西安交通大学 | Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor |
Non-Patent Citations (2)
Title |
---|
Fan Baolei: "Research on the Parallelization of Convolutional Neural Networks", China Masters' Theses Full-text Database, Information Science and Technology Series * |
Lu Zhijian: "Research on FPGA-based Parallel Structures for Convolutional Neural Networks", China Doctoral Dissertations Full-text Database, Information Science and Technology Series * |
Cited By (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018108126A1 (en) * | 2016-12-14 | 2018-06-21 | 上海寒武纪信息科技有限公司 | Neural network convolution operation device and method |
WO2018107383A1 (en) * | 2016-12-14 | 2018-06-21 | 上海寒武纪信息科技有限公司 | Neural network convolution computation method and device, and computer-readable storage medium |
CN108629405B (en) * | 2017-03-22 | 2020-09-18 | 杭州海康威视数字技术股份有限公司 | Method and device for improving calculation efficiency of convolutional neural network |
CN108629405A (en) * | 2017-03-22 | 2018-10-09 | 杭州海康威视数字技术股份有限公司 | The method and apparatus for improving convolutional neural networks computational efficiency |
CN110494867A (en) * | 2017-03-23 | 2019-11-22 | 三星电子株式会社 | Method for operating the electronic device of machine learning and for operating machine learning |
CN110494867B (en) * | 2017-03-23 | 2024-06-07 | 三星电子株式会社 | Electronic device for operating machine learning and method for operating machine learning |
US11907826B2 (en) | 2017-03-23 | 2024-02-20 | Samsung Electronics Co., Ltd | Electronic apparatus for operating machine learning and method for operating machine learning |
CN108629406B (en) * | 2017-03-24 | 2020-12-18 | 展讯通信(上海)有限公司 | Arithmetic device for convolutional neural network |
CN108629406A (en) * | 2017-03-24 | 2018-10-09 | 展讯通信(上海)有限公司 | Arithmetic unit for convolutional neural networks |
US11567770B2 (en) | 2017-04-13 | 2023-01-31 | Nxp B.V. | Human-machine-interface system comprising a convolutional neural network hardware accelerator |
EP3388981A1 (en) * | 2017-04-13 | 2018-10-17 | Nxp B.V. | A human-machine-interface system |
CN107622305A (en) * | 2017-08-24 | 2018-01-23 | 中国科学院计算技术研究所 | Processor and processing method for neutral net |
CN107844826A (en) * | 2017-10-30 | 2018-03-27 | 中国科学院计算技术研究所 | Neural-network processing unit and the processing system comprising the processing unit |
CN107844826B (en) * | 2017-10-30 | 2020-07-31 | 中国科学院计算技术研究所 | Neural network processing unit and processing system comprising same |
WO2019104695A1 (en) * | 2017-11-30 | 2019-06-06 | 深圳市大疆创新科技有限公司 | Arithmetic device for neural network, chip, equipment and related method |
CN108701015A (en) * | 2017-11-30 | 2018-10-23 | 深圳市大疆创新科技有限公司 | For the arithmetic unit of neural network, chip, equipment and correlation technique |
CN107862378A (en) * | 2017-12-06 | 2018-03-30 | 芯原微电子(上海)有限公司 | Convolutional neural networks accelerated method and system, storage medium and terminal based on multinuclear |
CN107862378B (en) * | 2017-12-06 | 2020-04-24 | 芯原微电子(上海)股份有限公司 | Multi-core-based convolutional neural network acceleration method and system, storage medium and terminal |
CN108038815A (en) * | 2017-12-20 | 2018-05-15 | 深圳云天励飞技术有限公司 | Integrated circuit |
WO2019119480A1 (en) * | 2017-12-20 | 2019-06-27 | 深圳云天励飞技术有限公司 | Integrated circuit |
US10706353B2 (en) | 2017-12-20 | 2020-07-07 | Shenzhen Intellifusion Technologies Co., Ltd. | Integrated circuit |
CN109993272A (en) * | 2017-12-29 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Convolution and down-sampled arithmetic element, neural network computing unit and field programmable gate array IC |
CN108491926A (en) * | 2018-03-05 | 2018-09-04 | 东南大学 | A kind of hardware-accelerated design method of the efficient depth convolutional neural networks of low bit based on logarithmic quantization, module and system |
US11580372B2 (en) | 2018-03-13 | 2023-02-14 | Recogni Inc. | Efficient convolutional engine |
CN112236783A (en) * | 2018-03-13 | 2021-01-15 | 雷哥尼公司 | Efficient convolution engine |
US11694069B2 (en) | 2018-03-13 | 2023-07-04 | Recogni Inc. | Methods for processing data in an efficient convolutional engine with partitioned columns of convolver units |
US11694068B2 (en) | 2018-03-13 | 2023-07-04 | Recogni Inc. | Methods for processing horizontal stripes of data in an efficient convolutional engine |
US11645504B2 (en) | 2018-03-13 | 2023-05-09 | Recogni Inc. | Methods for processing vertical stripes of data in an efficient convolutional engine |
CN112236783B (en) * | 2018-03-13 | 2023-04-11 | 雷哥尼公司 | Efficient convolution engine |
US11593630B2 (en) | 2018-03-13 | 2023-02-28 | Recogni Inc. | Efficient convolutional engine |
CN110659445A (en) * | 2018-06-29 | 2020-01-07 | 龙芯中科技术有限公司 | Arithmetic device and processing method thereof |
CN110659445B (en) * | 2018-06-29 | 2022-12-30 | 龙芯中科技术股份有限公司 | Arithmetic device and processing method thereof |
CN109948784A (en) * | 2019-01-03 | 2019-06-28 | 重庆邮电大学 | A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm |
CN110059818A (en) * | 2019-04-28 | 2019-07-26 | 山东师范大学 | Neural convolution array circuit core, processor and the circuit that convolution nuclear parameter can match |
CN111144556A (en) * | 2019-12-31 | 2020-05-12 | 中国人民解放军国防科技大学 | Hardware circuit of range batch processing normalization algorithm for deep neural network training and reasoning |
CN113222126A (en) * | 2020-01-21 | 2021-08-06 | 上海商汤智能科技有限公司 | Data processing device and artificial intelligence chip |
CN112115095B (en) * | 2020-06-12 | 2022-07-08 | 苏州浪潮智能科技有限公司 | Reconfigurable hardware for Hash algorithm and operation method |
CN112115095A (en) * | 2020-06-12 | 2020-12-22 | 苏州浪潮智能科技有限公司 | Reconfigurable hardware for Hash algorithm and operation method |
CN112288085A (en) * | 2020-10-23 | 2021-01-29 | 中国科学院计算技术研究所 | Convolutional neural network acceleration method and system |
CN112288085B (en) * | 2020-10-23 | 2024-04-09 | 中国科学院计算技术研究所 | Image detection method and system based on convolutional neural network |
CN112598122A (en) * | 2020-12-23 | 2021-04-02 | 北方工业大学 | Convolutional neural network accelerator based on variable resistance random access memory |
CN112598122B (en) * | 2020-12-23 | 2023-09-05 | 北方工业大学 | Convolutional neural network accelerator based on variable resistance random access memory |
CN113361687B (en) * | 2021-05-31 | 2023-03-24 | 天津大学 | Configurable addition tree suitable for convolutional neural network training accelerator |
CN113361687A (en) * | 2021-05-31 | 2021-09-07 | 天津大学 | Configurable addition tree suitable for convolutional neural network training accelerator |
CN113591025A (en) * | 2021-08-03 | 2021-11-02 | 深圳思谋信息科技有限公司 | Feature map processing method and device, convolutional neural network accelerator and medium |
CN117369707A (en) * | 2023-12-04 | 2024-01-09 | 杭州米芯微电子有限公司 | Digital signal monitoring circuit and chip |
CN117369707B (en) * | 2023-12-04 | 2024-03-19 | 杭州米芯微电子有限公司 | Digital signal monitoring circuit and chip |
Also Published As
Publication number | Publication date |
---|---|
CN106203617B (en) | 2018-08-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106203617A (en) | A kind of acceleration processing unit based on convolutional neural networks and array structure | |
CN111684473B (en) | Improving performance of neural network arrays | |
CN105681628B (en) | A kind of convolutional network arithmetic element and restructural convolutional neural networks processor and the method for realizing image denoising processing | |
CN108416437A (en) | The processing system and method for artificial neural network for multiply-add operation | |
CN107341544A (en) | A kind of reconfigurable accelerator and its implementation based on divisible array | |
WO2019136764A1 (en) | Convolutor and artificial intelligent processing device applied thereto | |
CN107886167A (en) | Neural network computing device and method | |
EP0504932A2 (en) | A parallel data processing system | |
EP0505179A2 (en) | A parallel data processing system | |
CN106022468A (en) | Artificial neural network processor integrated circuit and design method therefor | |
CN105608490B (en) | Cellular array computing system and communication means therein | |
CN111462137A (en) | Point cloud scene segmentation method based on knowledge distillation and semantic fusion | |
CN107679522A (en) | Action identification method based on multithread LSTM | |
CN107341542A (en) | Apparatus and method for performing Recognition with Recurrent Neural Network and LSTM computings | |
CN111626184B (en) | Crowd density estimation method and system | |
TWI719512B (en) | Method and system for algorithm using pixel-channel shuffle convolution neural network | |
CN111465943A (en) | On-chip computing network | |
CN108320018A (en) | A kind of device and method of artificial neural network operation | |
CN110378250A (en) | Training method, device and the terminal device of neural network for scene cognition | |
CN110009644B (en) | Method and device for segmenting line pixels of feature map | |
CN108491924A (en) | A kind of serial stream treatment device of Neural Network Data calculated towards artificial intelligence | |
Nathan et al. | Skeletonnetv2: A dense channel attention blocks for skeleton extraction | |
KR20090086660A (en) | Computer architecture combining neural network and parallel processor, and processing method using it | |
CN111886605B (en) | Processing for multiple input data sets | |
CN113592021B (en) | Stereo matching method based on deformable and depth separable convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20180821 Termination date: 20190627 |