CN106203617A - A kind of acceleration processing unit based on convolutional neural networks and array structure - Google Patents

A kind of acceleration processing unit based on convolutional neural networks and array structure Download PDF

Info

Publication number
CN106203617A
CN106203617A (application CN201610482653.7A)
Authority
CN
China
Prior art keywords
register
processing unit
multiplexer
adder
acceleration processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610482653.7A
Other languages
Chinese (zh)
Other versions
CN106203617B (en)
Inventor
宋博扬
赵秋奇
马芝
刘记朋
韩宇菲
王明江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN INTEGRATED CIRCUIT DESIGN INDUSTRIALIZATION BASE ADMINISTRATION CENTER
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
SHENZHEN INTEGRATED CIRCUIT DESIGN INDUSTRIALIZATION BASE ADMINISTRATION CENTER
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN INTEGRATED CIRCUIT DESIGN INDUSTRIALIZATION BASE ADMINISTRATION CENTER, Shenzhen Graduate School Harbin Institute of Technology filed Critical SHENZHEN INTEGRATED CIRCUIT DESIGN INDUSTRIALIZATION BASE ADMINISTRATION CENTER
Priority to CN201610482653.7A priority Critical patent/CN106203617B/en
Publication of CN106203617A publication Critical patent/CN106203617A/en
Application granted granted Critical
Publication of CN106203617B publication Critical patent/CN106203617B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/22Microcontrol or microprogram arrangements
    • G06F9/28Enhancement of operational speed, e.g. by using several microcontrol devices operating in parallel

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention discloses an acceleration processing unit based on a convolutional neural network, used to perform convolution operations on local data, where the local data comprises multiple items of multimedia data. The acceleration processing unit includes a first register, a second register, a third register, a fourth register, a fifth register, a multiplier, an adder, a first multiplexer, and a second multiplexer. Under the control of the first and second multiplexers, the multiplier and adder of a single acceleration processing unit are reusable, so that one acceleration processing unit needs only one multiplier and one adder to complete a convolution operation. This reduces the number of multipliers and adders used; for the same convolution operation, using fewer multipliers and adders improves processing speed and reduces power consumption, and the chip area of a single acceleration processing unit is smaller.

Description

A kind of acceleration processing unit based on convolutional neural networks and array structure
Technical field
The present invention relates to convolutional neural networks, and more specifically to an acceleration processing unit and array structure for the convolutional layers of a convolutional neural network.
Background art
Deep learning, in contrast to shallow learning, refers to a machine learning rules from historical data by means of algorithms, and making intelligent recognition and predictions about things.
A convolutional neural network (CNN) is a type of deep learning network. Invented in the early 1980s, it is composed of multiple layers of artificial neurons and reflects the way the human brain processes vision. As Moore's law has driven computer technology forward, convolutional neural networks have become able to better mimic the actual operation of biological neural networks. They avoid complex image preprocessing and can take the original image directly as input, and have therefore found increasingly wide application, having been successfully applied to handwritten character recognition, face recognition, human eye detection, pedestrian detection, and robot navigation.
The basic architecture of a convolutional neural network includes multiple convolutional layers; each layer is composed of multiple two-dimensional planes, and each plane is composed of multiple independent neurons. Each neuron performs a convolution operation on local data of the multimedia data; its input is connected to a local receptive field of the previous convolutional layer, and by convolving the data of that local receptive field it extracts the features of the receptive field.
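As background, the per-neuron computation described above is a sliding-window multiply-accumulate over a local receptive field; a minimal one-dimensional sketch (illustrative only, not taken from the patent):

```python
def conv1d(data, weights):
    """Convolve a 1-D data row with a filter: each output is the
    multiply-accumulate of one local receptive field."""
    k = len(weights)
    return [
        sum(data[i + j] * weights[j] for j in range(k))
        for i in range(len(data) - k + 1)
    ]

row = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]
print(conv1d(row, kernel))  # each output covers one receptive field
```

Each output element corresponds to what the patent calls one neuron's convolution over its local data.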
In the prior art, acceleration processing units are also commonly used as neurons to perform the convolution operation on local data of multimedia data. Existing acceleration processing units are designed with one adder and one multiplier per input multimedia datum; when the local data that the acceleration processing unit must process contains many items, each acceleration processing unit must include multiple adders and multiple multipliers. This design makes the chip area of the acceleration processing unit large and its power consumption high, and its processing speed also leaves room for improvement.
Summary of the invention
The present application provides an acceleration processing unit based on a convolutional neural network, used to perform a convolution operation on local data, where the local data comprises multiple items of multimedia data. The acceleration processing unit includes a first register, a second register, a third register, a fourth register, a fifth register, a multiplier, an adder, a first multiplexer, and a second multiplexer;
The first register is used to input multimedia data; its output is connected to an input of the multiplier and sends the multimedia data to the multiplier;
The second register is used to input filter weights; its output is connected to an input of the multiplier and sends the filter weights to the multiplier;
The multiplier multiplies the multimedia data by the filter weights; its output is connected to the third register, to which the product is sent;
The output of the third register is connected to the first terminal of the first multiplexer;
The second terminal of said first multiplexer is connected to the adder, and the third terminal is the partial-sum input from the previous acceleration processing unit; by switching state, said first multiplexer either connects the third register to the adder or connects the partial-sum input of the previous acceleration processing unit to the adder;
Said adder is also connected to the fifth register and the fourth register; it adds the product, or the partial sum of the previous acceleration processing unit, delivered by the first multiplexer to the data in the fifth register, and outputs the sum to the fourth register;
The first and second terminals of said second multiplexer are connected to the fourth register and the fifth register respectively; said fourth register is connected to the fifth register through the second multiplexer.
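The datapath just described can be sketched as a small behavioral simulation. The register and multiplexer names follow the patent's description, but the code itself is an illustrative assumption, not the patent's implementation:

```python
class AccelUnit:
    """One acceleration processing unit: one multiplier, one adder,
    and two multiplexers that switch between local multiply-accumulate
    and merging a neighbouring unit's partial sum."""

    def __init__(self):
        self.r5 = 0  # fifth register: running partial sum

    def mac_step(self, data, weight):
        # mux1 = 0, mux2 = 0: multiply, add to r5 via the fourth register
        product = data * weight        # multiplier -> third register
        self.r5 = product + self.r5    # adder -> fourth -> fifth register
        return self.r5

    def merge(self, prev_partial_sum):
        # mux1 = 1: the adder takes the previous unit's partial sum
        total = prev_partial_sum + self.r5
        self.r5 = 0                    # mux2 = 1: fifth register is reset
        return total

pe = AccelUnit()
for d, w in zip([1, 2, 3], [4, 5, 6]):
    pe.mac_step(d, w)
print(pe.merge(10))  # 1*4 + 2*5 + 3*6 plus the neighbour's partial sum 10
```

The key point the sketch shows is that a single multiplier/adder pair serves every multiply-add step, with the multiplexer state deciding what reaches the adder.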
Preferably, said first multiplexer stays in a first state, connecting the third register to the adder, while the acceleration processing unit has not completed the multiply-add operations on the local data, and switches to a second state, connecting the partial-sum input of the previous acceleration processing unit to the adder, after the acceleration processing unit completes the multiply-add operations on the local data.
Preferably, said second multiplexer remains in a first state, connecting the fourth register to the fifth register, while the acceleration processing unit has not completed the multiply-add operations on the local data, and switches to a second state after the acceleration processing unit completes the multiply-add operations on the local data, so as to reset the fifth register.
Preferably, the third terminal of said second multiplexer is a reset terminal; said second multiplexer switches to the second state, connecting the reset terminal to the fifth register, after the acceleration processing unit completes the multiply-add operations on the local data.
Preferably, the unit also includes a first memory, a second memory, and a third memory. Said first memory is connected to the input of the first register; it inputs and stores the local data on which the convolution operation is to be performed, and sends the multiple items of multimedia data in the local data to the first register in turn. Said second memory is connected to the input of the second register; it inputs and stores the filter weights and sends them to the second register. Said third memory is connected to the input of the fourth register; it inputs and stores the sum output by the adder, and sends that sum to the fourth register.
Preferably, said adder also outputs the sum to the next acceleration processing unit.
The present application also provides an array structure based on a convolutional neural network, including multiple of the acceleration processing units described above. The acceleration processing units are arranged as a matrix of M rows and N columns, where M and N are integers greater than or equal to 1, and the acceleration processing units of each column are connected one after another.
Preferably, in each column, the output of the adder of the previous acceleration processing unit is connected to the third terminal of the first multiplexer of the next acceleration processing unit.
Preferably, the acceleration processing units in the same row receive the same filter weights, and the acceleration processing units on the same diagonal receive the same local data.
Preferably, the acceleration processing units in different rows receive different filter weights.
The beneficial effects of the invention are as follows: under the control of the first and second multiplexers, the multiplier and adder of a single acceleration processing unit are reusable, so that one acceleration processing unit needs only one multiplier and one adder to complete a convolution operation. This reduces the number of multipliers and adders used; for the same convolution operation, using fewer multipliers and adders improves processing speed and reduces power consumption, and the chip area of a single acceleration processing unit is smaller.
Brief description of the drawings
Fig. 1 is a structural block diagram of an acceleration processing unit based on a convolutional neural network provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the convolution operation process of an acceleration processing unit based on a convolutional neural network provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the column-wise distribution of an array structure based on a convolutional neural network according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the row-wise distribution of an array structure based on a convolutional neural network according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the diagonal distribution of an array structure based on a convolutional neural network according to an embodiment of the present invention.
Detailed description of the invention
The technical solution of the present invention is described clearly and completely below through specific embodiments in combination with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them.
Embodiment one:
Referring to Fig. 1, this embodiment provides an acceleration processing unit based on a convolutional neural network. The acceleration processing unit 61 includes a first register 21, a second register 22, a third register 23, a fourth register 24, a fifth register 25, a multiplier 41, an adder 51, a first multiplexer 31, and a second multiplexer 32.
The first register 21 is connected to one input of the multiplier 41; it inputs multimedia data and sends that data to the multiplier 41. The second register 22 is connected to the other input of the multiplier 41; it inputs filter weights and sends them to the multiplier 41. The output of the multiplier 41 is connected to the third register 23; the multiplier multiplies the multimedia data by the filter weights and sends the product to the third register 23.
The first terminal of the first multiplexer 31 is connected to the output of the third register 23, the second terminal is connected to one input of the adder 51, and the third terminal is the partial-sum input from the previous acceleration processing unit. When the first multiplexer 31 is switched to the first state (e.g. set to 0), it connects the third register 23 to the adder 51 and sends the data in the third register 23 to the adder 51; when it is switched to the second state (e.g. set to 1), it connects its third terminal to the adder 51 and sends the partial sum of the previous acceleration processing unit to the adder 51.
The other input of the adder 51 is connected to the fifth register 25, and the output of the adder 51 is connected to the fourth register 24. The adder 51 takes the data in the third register 23 and the fifth register 25, adds them, and outputs the sum (also called the internal partial sum) to the fourth register 24.
The first and second terminals of the second multiplexer 32 are connected to the fourth register 24 and the fifth register 25 respectively; the third terminal of the second multiplexer 32 is a reset terminal. When the second multiplexer 32 is switched to the first state (e.g. set to 0), it connects the fourth register 24 to the fifth register 25 and sends the internal partial sum in the fourth register 24 to the fifth register 25; when it is switched to the second state (e.g. set to 1), it connects its third terminal to the fifth register 25 and resets the fifth register 25, zeroing its data.
In some embodiments, to facilitate sending data to the registers, the acceleration processing unit 61 also includes a first memory 11, a second memory 12, and a third memory 13. The first memory 11 is connected to the input of the first register 21; it inputs and stores the local data on which the convolution operation is to be performed, and sends the multiple items of multimedia data in the local data to the first register 21 in turn. The second memory 12 is connected to the input of the second register 22; it inputs and stores the filter weights and sends them to the second register 22. The third memory 13 is connected to the input of the fourth register 24; it inputs and stores the internal partial sum output by the adder 51, and sends the internal partial sum to the fourth register 24.
The acceleration processing unit 61 performs a convolution operation on local data, and the local data comprises multiple items of multimedia data. The multimedia data may be video data, image data, or audio data. When the multimedia data is video data, each multimedia datum can be regarded as corresponding to one pixel.
Image data is taken as an example below to illustrate the convolution operation process of the acceleration processing unit 61.
With reference to Fig. 1 and Fig. 2, a single acceleration processing unit 61 based on a convolutional neural network works as follows:
Step 10: read the image data and filter weights on which the convolution operation is to be performed. If an image datum is not 0, it is stored in the first memory 11 and sent to the first register 21 when it needs to be fetched; if an image datum is 0, the value 0 is routed directly to the first register 21 without fetching, a skip-or-gate strategy that avoids unnecessary reads and computation. The filter weights are stored in the second memory 12 and sent to the second register 22 when needed. Data is fetched serially, one item at a time: in the first cycle, the acceleration processing unit 61 sends the first image datum of the local data to be convolved to the first register 21; in the second cycle, it sends the second image datum to the first register 21; and so on, reading in the image data in order. The filter weights are generated by the processor according to the needs of the convolution operation.
Step 20: multiplication. The image datum in the first register 21 and the filter weight in the second register 22 are sent to the multiplier 41, which performs the multiplication; the product is output to the third register 23.
Step 30: addition. Because the multiply-add operations in the acceleration processing unit 61 are not yet finished, the first multiplexer 31 is set to 0; with the first multiplexer 31 at 0, the data in the third register 23 is sent to the adder 51, which adds it to the previous internal partial sum in the fifth register 25. For the first internal convolution operation, the fifth register 25 holds zero; for subsequent internal convolution operations, it holds the internal partial sum of the previous convolution operation. The sum of this convolution operation (the internal partial sum) is output to the fourth register 24, completing one internal convolution operation and yielding the partial sum of the first image datum and filter weight. Because the multiply-add operations in the acceleration processing unit 61 are not yet finished, the second multiplexer 32 is set to 0; with the second multiplexer 32 at 0, the internal partial sum is sent from the fourth register 24 to the fifth register 25.
Step 40: the acceleration processing unit 61 determines whether the internal convolution operations on all the local data are complete. If not, steps 10, 20, and 30 are repeated: the second image datum is fetched and input to the first register 21, the second filter weight is input to the second register 22, both are sent to the multiplier 41 and multiplied, and the product is sent to the third register 23. Because the multiply-add operations in the acceleration processing unit 61 are not yet finished, the first multiplexer 31 is set to 0, and the data in the third register 23 is sent through the first multiplexer to the adder 51 and summed with the data from the fifth register 25, yielding the partial sum of the second image datum and filter weight. The partial sum from the adder 51 is sent to the fourth register 24; since the multiply-add operations in the acceleration processing unit are still not finished, the second multiplexer 32 is set to 0, and the data in the fourth register 24 is sent through the second multiplexer to the fifth register 25, completing the internal convolution operation on the second image datum. This continues until the last image datum of the local data has been fetched, multiplied by its filter weight, and accumulated as above, producing the partial sum of this acceleration processing unit, which, by the same operations as before, finally enters the fifth register 25. When all internal convolution operations are complete, step 50 is carried out.
Step 50: after the multiply-add operations on the local data in the acceleration processing unit 61 have finished, the first multiplexer 31 and the second multiplexer 32 are set to 1. With the first multiplexer 31 at 1, the partial sum of the previous acceleration processing unit is sent through the first multiplexer 31 to the adder 51, the final partial sum of this acceleration processing unit 61 is sent from the fifth register 25 to the adder 51, and the two are summed, giving the superposed partial sum of the two acceleration processing units; this superposed partial sum is output and sent to the next acceleration processing unit. When the second multiplexer 32 switches from state 0 to 1, the fourth register 24 no longer sends data to the fifth register 25, and the data in the fifth register 25 is cleared.
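Steps 10 through 50 can be sketched as a short loop. The multiplexer settings follow the description above, but the code is an illustrative assumption, not the patent's hardware:

```python
def convolve_unit(local_data, weights, prev_partial_sum):
    """Steps 10-50 for one acceleration processing unit:
    serial multiply-accumulate over the local data, then a merge
    with the previous unit's partial sum once both muxes switch to 1."""
    r5 = 0                       # fifth register starts at zero
    for x, w in zip(local_data, weights):
        if x == 0:
            product = 0          # step 10: zero data is skipped/gated
        else:
            product = x * w      # step 20: multiplier -> third register
        r5 = product + r5        # steps 30-40: mux1 = 0, mux2 = 0
    # step 50: mux1 = 1 feeds the neighbour's partial sum to the adder
    return prev_partial_sum + r5

print(convolve_unit([1, 0, 3], [2, 9, 4], 5))  # 1*2 + 0 + 3*4, plus 5
```

Note that the zero-skip branch mirrors the gating strategy of step 10: a zero datum contributes nothing, so neither a memory fetch nor a multiplication is needed.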
In this embodiment, under the control of the first multiplexer 31 and the second multiplexer 32, the multiplier 41 and adder 51 of a single acceleration processing unit are reusable, so that one acceleration processing unit needs only one multiplier and one adder to complete a convolution operation. This reduces the number of multipliers and adders used; for the same convolution operation, using fewer multipliers and adders improves processing speed and reduces power consumption, and the chip area of a single acceleration processing unit is smaller.
Embodiment two:
Referring to Fig. 3 to Fig. 5, an array structure based on a convolutional neural network is shown, including multiple of the acceleration processing units described above. The acceleration processing units are arranged as a matrix of M rows and N columns, where M and N are integers greater than or equal to 1, and the acceleration processing units of each column are connected one after another.
In this embodiment, the acceleration processing units are arranged as a matrix of 3 rows and 3 columns. In each column, the output of the adder of the previous acceleration processing unit is connected to the third terminal of the first multiplexer of the next acceleration processing unit.
The acceleration processing units in the same row receive the same filter weights; the acceleration processing units on the same diagonal receive the same local data.
The acceleration processing units in different rows receive different filter weights.
The convolutional-layer computation process of multiple acceleration processing units is described below with reference to the drawings.
With reference to Fig. 1 to Fig. 5, the computation process of the array structure based on a convolutional neural network is as follows:
As shown in Fig. 3, the adder 51 of each acceleration processing unit is connected to the first multiplexer 31 of the next acceleration processing unit, and the partial sums output by each row all move vertically, so that the partial sums of consecutive acceleration processing units are accumulated. The results can be read out at the top row when a computation pass ends, and are delivered by a buffer to the bottom row of the array at the start of the next computation pass.
For example, acceleration processing units PE1.1, PE2.1, and PE3.1 first perform their internal convolution operations independently, storing the final results in their respective fifth registers 25. Then the partial sum output by PE3.1 and the partial sum in the fifth register 25 of PE2.1 are summed and accumulated in the adder 51 of PE2.1, giving the first accumulated partial sum. PE2.1 sends this first accumulated partial sum to PE1.1, where it is summed again, in the adder 51 of PE1.1, with the partial sum in the fifth register 25 of PE1.1, finally outputting the partial sum of all the acceleration processing units in the column.
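The column-wise accumulation can be sketched as chaining the per-unit partial sums from the bottom row upward. The PE ordering follows Fig. 3; the code is an illustrative assumption:

```python
def column_sum(unit_partial_sums):
    """Accumulate one column bottom-up: each unit's adder combines the
    partial sum arriving from the unit below with its own
    fifth-register value, and passes the result upward."""
    total = 0
    for ps in reversed(unit_partial_sums):  # PE3.1 -> PE2.1 -> PE1.1
        total = total + ps                  # adder 51 of each unit
    return total

# fifth-register contents of PE1.1, PE2.1, PE3.1 after internal convolution
print(column_sum([10, 20, 30]))  # → 60
```

The order of traversal matters only in hardware terms (which adder fires when); the final column total is the same sum of all per-unit partial sums.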
It should also be pointed out that, as shown in Fig. 4 and Fig. 5, the acceleration processing units in the same row receive the same filter weights, the acceleration processing units on the same diagonal receive the same image data, and the acceleration processing units in different rows receive different filter weights. Since the whole image has several rows and each acceleration processing unit only processes a single row of the image, the acceleration processing units must each process one row of data, after which the convolution results of the rows are accumulated. The input data on the same diagonal are identical, while the input image data on different diagonals differ: the image data input on different diagonals correspond to different rows of the image. Processing different rows of the image requires different filter weights; for example, one filter weight can only be used to process the image data of the first row, and to process the image data of the second row a new filter weight is needed. Therefore the acceleration processing units of the same row can use the same filter weights, while the acceleration processing units of different rows use different filter weights.
For example, the filter weights in acceleration processing units PE1.1, PE1.2, and PE1.3 are identical; the image data input to PE2.1 and PE1.2 are identical; and the filter weights in PE1.1, PE2.2, and PE3.1 differ from one another.
In this way, the multimedia data of one row is processed at the same time, different filter weights are applied to the multimedia data of different rows, and after each row of data has been processed separately, the per-row results are accumulated, so that all of the multimedia data is processed quickly and reliably.
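The row/diagonal mapping above resembles a weight-stationary dataflow; a rough sketch of the index pattern for a 3×3 array follows (an interpretation for illustration, not taken from the patent):

```python
def build_mapping(rows=3, cols=3):
    """Assign each PE (r, c) a filter-weight row and an image-data row:
    PEs in the same array row share weights; PEs on the same
    anti-diagonal (constant r + c) share image data."""
    mapping = {}
    for r in range(rows):
        for c in range(cols):
            mapping[(r, c)] = {"weight_row": r, "data_row": r + c}
    return mapping

m = build_mapping()
# PE2.1 and PE1.2 (0-indexed (1, 0) and (0, 1)) sit on the same diagonal,
# so they receive the same image row
print(m[(1, 0)]["data_row"] == m[(0, 1)]["data_row"])  # → True
```

Under this indexing, "same row, same weights" and "same diagonal, same data" both fall out of the two fields of each mapping entry.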
The present invention has been illustrated above with specific examples, which are intended only to aid understanding of the present invention, not to limit it. Those skilled in the art may, according to the idea of the present invention, also make some simple deductions, variations, or substitutions.

Claims (10)

1. An acceleration processing unit based on a convolutional neural network, for performing a convolution operation on local data, said local data comprising multiple items of multimedia data, characterised by comprising a first register, a second register, a third register, a fourth register, a fifth register, a multiplier, an adder, a first multiplexer, and a second multiplexer;
the first register is used to input multimedia data, and its output is connected to an input of the multiplier, sending the multimedia data to the multiplier;
the second register is used to input filter weights, and its output is connected to an input of the multiplier, sending the filter weights to the multiplier;
the multiplier multiplies the multimedia data by the filter weights, and its output is connected to the third register, sending the product to the third register;
the output of the third register is connected to the first terminal of the first multiplexer;
the second terminal of said first multiplexer is connected to the adder, and the third terminal is the partial-sum input from the previous acceleration processing unit; by switching state, said first multiplexer connects either the third register or the partial-sum input of the previous acceleration processing unit to the adder;
said adder is also connected to the fifth register and the fourth register, adds the product, or the partial sum of the previous acceleration processing unit, delivered by the first multiplexer to the data in the fifth register, and outputs the sum to the fourth register;
the first and second terminals of said second multiplexer are connected to the fourth register and the fifth register respectively, and said fourth register is connected to the fifth register through the second multiplexer.
2. The acceleration processing unit of claim 1, characterised in that said first multiplexer stays in a first state, connecting the third register to the adder, while the acceleration processing unit has not completed the multiply-add operations on the local data, and switches to a second state, connecting the partial-sum input of the previous acceleration processing unit to the adder, after the acceleration processing unit completes the multiply-add operations on the local data.
3. The acceleration processing unit of claim 1, characterised in that said second multiplexer remains in a first state, connecting the fourth register to the fifth register, while the acceleration processing unit has not completed the multiply-add operations on the local data, and switches to a second state after the acceleration processing unit completes the multiply-add operations on the local data, so as to reset the fifth register.
4. The acceleration processing unit of claim 3, characterised in that the third terminal of said second multiplexer is a reset terminal, and said second multiplexer switches to the second state, connecting the reset terminal to the fifth register, after the acceleration processing unit completes the multiply-add operations on the local data.
5. the acceleration processing unit as according to any one of Claims 1-4, it is characterised in that also include first memory, Two memorizeies and the 3rd memorizer, the input of described first memory and the first depositor connects, and needs for inputting and storing The local data of convolution algorithm to be carried out, and the multiple multi-medium datas in local data are sent to the first depositor successively; The input of described second memory and the second depositor connects, and is used for inputting and storing filter weights, and is weighed by wave filter Value is sent to the second depositor;The input of described 3rd memorizer and the 4th depositor connects, and is used for inputting and storing addition Result after the addition of device output, and the result after will add up is sent to the 4th depositor.
6. the acceleration processing unit as according to any one of Claims 1-4, it is characterised in that described adder also will add up After result export rear one accelerate processing unit.
7. an array structure based on convolutional neural networks, it is characterised in that include multiple as arbitrary in claim 1 to 6 Acceleration processing unit described in Xiang, multiple acceleration processing units be rendered as M row N row matrix shape, wherein M and N for more than or Integer equal to 1, is connected before and after the acceleration processing unit of every string.
8. array structure as claimed in claim 7, it is characterised in that in every string, the adder of previous acceleration processing unit Outfan connect after the 3rd end of the first MUX accelerating processing unit.
9. as claimed in claim 7 or 8 array structure, it is characterised in that with in the acceleration processing unit of a line, the filter of input Ripple device weights are identical;Being positioned in the acceleration processing unit on same diagonal, the local data of input is identical.
10. array structure as claimed in claim 9, it is characterised in that in the acceleration processing unit of different rows, the filtering of input Device weights are different.
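Claims 7 through 10 describe a systolic mapping of a 2-D convolution: each array row holds one row of filter weights, units on a diagonal share one row of the input, and the chained adders of each column accumulate one row of the output. A minimal Python sketch of that mapping (illustrative only, not part of the patent; the function name is an assumption):

```python
def conv2d_rowwise(image, kernel):
    """Compute a valid 2-D convolution with the claimed dataflow:
    array row r holds kernel row r (same weights across a row, claim 9),
    the unit at (row r, column c) reads image row r + c (diagonal input
    sharing, claim 9), and each column sums its units' 1-D results via
    the chained adders (claims 7 and 8)."""
    M = len(kernel)               # array rows = kernel height
    K = len(kernel[0])            # taps per 1-D convolution
    N = len(image) - M + 1        # array columns = output rows
    W = len(image[0]) - K + 1     # output width
    out = []
    for c in range(N):            # one array column per output row
        col_sum = [0] * W         # partial sums flowing down the column
        for r in range(M):        # each row contributes one 1-D result
            row = image[r + c]    # diagonal sharing of input rows
            for x in range(W):
                for k in range(K):
                    col_sum[x] += kernel[r][k] * row[x + k]
        out.append(col_sum)
    return out
```

With a 3×3 input and a 2×2 kernel whose only nonzero taps are the main diagonal, each output element is the sum of an input element and its lower-right neighbor, which matches the column-accumulated result.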
CN201610482653.7A 2016-06-27 2016-06-27 A kind of acceleration processing unit and array structure based on convolutional neural networks Expired - Fee Related CN106203617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610482653.7A CN106203617B (en) 2016-06-27 2016-06-27 A kind of acceleration processing unit and array structure based on convolutional neural networks

Publications (2)

Publication Number Publication Date
CN106203617A true CN106203617A (en) 2016-12-07
CN106203617B CN106203617B (en) 2018-08-21

Family

ID=57462215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610482653.7A Expired - Fee Related CN106203617B (en) 2016-06-27 2016-06-27 A kind of acceleration processing unit and array structure based on convolutional neural networks

Country Status (1)

Country Link
CN (1) CN106203617B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0422348A2 (en) * 1989-10-10 1991-04-17 Hnc, Inc. Two-dimensional systolic array for neural networks, and method
US5471627A (en) * 1989-10-10 1995-11-28 Hnc, Inc. Systolic array image processing system and method
CN103019656A (en) * 2012-12-04 2013-04-03 中国科学院半导体研究所 Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system
CN103691058A (en) * 2013-12-10 2014-04-02 天津大学 Deep brain stimulation FPGA (Field Programmable Gate Array) experimental platform for basal ganglia and thalamencephalon network for parkinson's disease
EP3035204A1 (en) * 2014-12-19 2016-06-22 Intel Corporation Storage device and method for performing convolution operations
EP3035249A1 (en) * 2014-12-19 2016-06-22 Intel Corporation Method and apparatus for distributed and cooperative computation in artificial neural networks
CN104504205A (en) * 2014-12-29 2015-04-08 南京大学 Parallelizing two-dimensional division method of symmetrical FIR (Finite Impulse Response) algorithm and hardware structure of parallelizing two-dimensional division method
CN105528191A (en) * 2015-12-01 2016-04-27 中国科学院计算技术研究所 Data accumulation apparatus and method, and digital signal processing device
CN105681628A (en) * 2016-01-05 2016-06-15 西安交通大学 Convolution network arithmetic unit, reconfigurable convolution neural network processor and image de-noising method of reconfigurable convolution neural network processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
凡保磊: "Research on the Parallelization of Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology Series *
陆志坚: "Research on FPGA-based Parallel Structures for Convolutional Neural Networks", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018108126A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution operation device and method
WO2018107383A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution computation method and device, and computer-readable storage medium
CN108629405B (en) * 2017-03-22 2020-09-18 杭州海康威视数字技术股份有限公司 Method and device for improving calculation efficiency of convolutional neural network
CN108629405A (en) * 2017-03-22 2018-10-09 杭州海康威视数字技术股份有限公司 The method and apparatus for improving convolutional neural networks computational efficiency
CN110494867A (en) * 2017-03-23 2019-11-22 三星电子株式会社 Method for operating the electronic device of machine learning and for operating machine learning
CN110494867B (en) * 2017-03-23 2024-06-07 三星电子株式会社 Electronic device for operating machine learning and method for operating machine learning
US11907826B2 (en) 2017-03-23 2024-02-20 Samsung Electronics Co., Ltd Electronic apparatus for operating machine learning and method for operating machine learning
CN108629406B (en) * 2017-03-24 2020-12-18 展讯通信(上海)有限公司 Arithmetic device for convolutional neural network
CN108629406A (en) * 2017-03-24 2018-10-09 展讯通信(上海)有限公司 Arithmetic unit for convolutional neural networks
US11567770B2 (en) 2017-04-13 2023-01-31 Nxp B.V. Human-machine-interface system comprising a convolutional neural network hardware accelerator
EP3388981A1 (en) * 2017-04-13 2018-10-17 Nxp B.V. A human-machine-interface system
CN107622305A (en) * 2017-08-24 2018-01-23 中国科学院计算技术研究所 Processor and processing method for neural networks
CN107844826A (en) * 2017-10-30 2018-03-27 中国科学院计算技术研究所 Neural-network processing unit and the processing system comprising the processing unit
CN107844826B (en) * 2017-10-30 2020-07-31 中国科学院计算技术研究所 Neural network processing unit and processing system comprising same
WO2019104695A1 (en) * 2017-11-30 2019-06-06 深圳市大疆创新科技有限公司 Arithmetic device for neural network, chip, equipment and related method
CN108701015A (en) * 2017-11-30 2018-10-23 深圳市大疆创新科技有限公司 For the arithmetic unit of neural network, chip, equipment and correlation technique
CN107862378A (en) * 2017-12-06 2018-03-30 芯原微电子(上海)有限公司 Convolutional neural networks accelerated method and system, storage medium and terminal based on multinuclear
CN107862378B (en) * 2017-12-06 2020-04-24 芯原微电子(上海)股份有限公司 Multi-core-based convolutional neural network acceleration method and system, storage medium and terminal
CN108038815A (en) * 2017-12-20 2018-05-15 深圳云天励飞技术有限公司 Integrated circuit
WO2019119480A1 (en) * 2017-12-20 2019-06-27 深圳云天励飞技术有限公司 Integrated circuit
US10706353B2 (en) 2017-12-20 2020-07-07 Shenzhen Intellifusion Technologies Co., Ltd. Integrated circuit
CN109993272A (en) * 2017-12-29 2019-07-09 北京中科寒武纪科技有限公司 Convolution and down-sampled arithmetic element, neural network computing unit and field programmable gate array IC
CN108491926A (en) * 2018-03-05 2018-09-04 东南大学 A kind of hardware-accelerated design method of the efficient depth convolutional neural networks of low bit based on logarithmic quantization, module and system
US11580372B2 (en) 2018-03-13 2023-02-14 Recogni Inc. Efficient convolutional engine
CN112236783A (en) * 2018-03-13 2021-01-15 雷哥尼公司 Efficient convolution engine
US11694069B2 (en) 2018-03-13 2023-07-04 Recogni Inc. Methods for processing data in an efficient convolutional engine with partitioned columns of convolver units
US11694068B2 (en) 2018-03-13 2023-07-04 Recogni Inc. Methods for processing horizontal stripes of data in an efficient convolutional engine
US11645504B2 (en) 2018-03-13 2023-05-09 Recogni Inc. Methods for processing vertical stripes of data in an efficient convolutional engine
CN112236783B (en) * 2018-03-13 2023-04-11 雷哥尼公司 Efficient convolution engine
US11593630B2 (en) 2018-03-13 2023-02-28 Recogni Inc. Efficient convolutional engine
CN110659445A (en) * 2018-06-29 2020-01-07 龙芯中科技术有限公司 Arithmetic device and processing method thereof
CN110659445B (en) * 2018-06-29 2022-12-30 龙芯中科技术股份有限公司 Arithmetic device and processing method thereof
CN109948784A (en) * 2019-01-03 2019-06-28 重庆邮电大学 A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm
CN110059818A (en) * 2019-04-28 2019-07-26 山东师范大学 Neural convolution array circuit core, processor and the circuit that convolution nuclear parameter can match
CN111144556A (en) * 2019-12-31 2020-05-12 中国人民解放军国防科技大学 Hardware circuit of range batch processing normalization algorithm for deep neural network training and reasoning
CN113222126A (en) * 2020-01-21 2021-08-06 上海商汤智能科技有限公司 Data processing device and artificial intelligence chip
CN112115095B (en) * 2020-06-12 2022-07-08 苏州浪潮智能科技有限公司 Reconfigurable hardware for Hash algorithm and operation method
CN112115095A (en) * 2020-06-12 2020-12-22 苏州浪潮智能科技有限公司 Reconfigurable hardware for Hash algorithm and operation method
CN112288085A (en) * 2020-10-23 2021-01-29 中国科学院计算技术研究所 Convolutional neural network acceleration method and system
CN112288085B (en) * 2020-10-23 2024-04-09 中国科学院计算技术研究所 Image detection method and system based on convolutional neural network
CN112598122A (en) * 2020-12-23 2021-04-02 北方工业大学 Convolutional neural network accelerator based on variable resistance random access memory
CN112598122B (en) * 2020-12-23 2023-09-05 北方工业大学 Convolutional neural network accelerator based on variable resistance random access memory
CN113361687B (en) * 2021-05-31 2023-03-24 天津大学 Configurable addition tree suitable for convolutional neural network training accelerator
CN113361687A (en) * 2021-05-31 2021-09-07 天津大学 Configurable addition tree suitable for convolutional neural network training accelerator
CN113591025A (en) * 2021-08-03 2021-11-02 深圳思谋信息科技有限公司 Feature map processing method and device, convolutional neural network accelerator and medium
CN117369707A (en) * 2023-12-04 2024-01-09 杭州米芯微电子有限公司 Digital signal monitoring circuit and chip
CN117369707B (en) * 2023-12-04 2024-03-19 杭州米芯微电子有限公司 Digital signal monitoring circuit and chip

Also Published As

Publication number Publication date
CN106203617B (en) 2018-08-21

Similar Documents

Publication Publication Date Title
CN106203617A (en) A kind of acceleration processing unit based on convolutional neural networks and array structure
CN111684473B (en) Improving performance of neural network arrays
CN105681628B (en) A kind of convolutional network arithmetic element and restructural convolutional neural networks processor and the method for realizing image denoising processing
CN108416437A (en) The processing system and method for artificial neural network for multiply-add operation
CN107341544A (en) A kind of reconfigurable accelerator and its implementation based on divisible array
WO2019136764A1 (en) Convolutor and artificial intelligent processing device applied thereto
CN107886167A (en) Neural network computing device and method
EP0504932A2 (en) A parallel data processing system
EP0505179A2 (en) A parallel data processing system
CN106022468A (en) Artificial neural network processor integrated circuit and design method therefor
CN105608490B (en) Cellular array computing system and communication means therein
CN111462137A (en) Point cloud scene segmentation method based on knowledge distillation and semantic fusion
CN107679522A (en) Action identification method based on multithread LSTM
CN107341542A (en) Apparatus and method for performing Recognition with Recurrent Neural Network and LSTM computings
CN111626184B (en) Crowd density estimation method and system
TWI719512B (en) Method and system for algorithm using pixel-channel shuffle convolution neural network
CN111465943A (en) On-chip computing network
CN108320018A (en) A kind of device and method of artificial neural network operation
CN110378250A (en) Training method, device and the terminal device of neural network for scene cognition
CN110009644B (en) Method and device for segmenting line pixels of feature map
CN108491924A (en) A kind of serial stream treatment device of Neural Network Data calculated towards artificial intelligence
Nathan et al. Skeletonnetv2: A dense channel attention blocks for skeleton extraction
KR20090086660A (en) Computer architecture combining neural network and parallel processor, and processing method using it
CN111886605B (en) Processing for multiple input data sets
CN113592021B (en) Stereo matching method based on deformable and depth separable convolution

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180821

Termination date: 20190627