CN108537325A - Method of operating a neural network device - Google Patents
Method of operating a neural network device
- Publication number: CN108537325A
- Application number: CN201810167217.XA
- Authority
- CN
- China
- Prior art keywords
- input feature
- feature map
- index
- value
- weight
- Prior art date
- 2017-03-03
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
- G06N3/08—Learning methods
- G06F7/523—Multiplying only
- G06F7/535—Dividing only
- G06F7/5443—Sum of products
- G06F2207/4824—Indexing scheme: neural networks
Abstract
A method of operating a neural network device may generate an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value; generate an output feature index based on the input feature index corresponding to an input feature included in the input feature list and a weight index corresponding to a weight included in a weight list; and generate an output feature value corresponding to the output feature index based on the input feature value corresponding to the input feature and a weight value corresponding to the weight.
Description
[Cross-reference to related application]
This application claims the benefit of Korean Patent Application No. 10-2017-0027778, filed on March 3, 2017 in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
Technical field
The inventive concept relates to semiconductor devices and, more particularly, to neural network devices configured to perform operations based on one or more indices, and to methods of operating such neural network devices.
Background
A neural network refers to a computational architecture that models a biological brain. With the recent development of neural network technology, there has been extensive research into analyzing input data and extracting valid information by using neural network devices in various types of electronic systems.

Neural network devices may perform a relatively large number of operations ("neural network operations") on complex input data. Efficient processing of neural network operations is desired so that a neural network device can analyze high-definition input and extract information in real time.
Summary of the invention
The inventive concept provides a neural network device with an improved operating speed and reduced power consumption, and a method of operating the same.
According to some example embodiments, a method of operating a neural network device may include: generating an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature; generating an output feature index based on a first operation performed on the input feature index and a weight index of a weight list; and generating an output feature value corresponding to the output feature index based on a second operation performed on the input feature value and a weight value corresponding to the weight index.
According to other example embodiments, a method of operating a neural network device may include: generating an input feature list, the input feature list including an input feature index and an input feature value corresponding to an input feature having a nonzero value, the input feature index indicating a position of the input feature on an input feature map; generating an output feature index based on an index operation performed on the input feature index; and generating an output feature value corresponding to the output feature index based on a data operation performed on the input feature value.
According to some example embodiments, a neural network device may include a first memory storing a program of instructions, and a processor. The processor may be configured to execute the program of instructions to: perform an index operation based on an input feature index, the input feature index indicating a position of an input feature on an input feature map; generate an output feature index based on an index operation result of the index operation; perform a data operation based on an input feature value of the input feature; and generate an output feature value corresponding to the output feature index based on a data operation result of the data operation.
According to some example embodiments, a method of operating a neural network device may include: generating, using an index remapper of a processor, an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature; and performing, using the index remapper, a first operation to generate an output feature index. The first operation may include adding the input feature index and a weight index of a weight list, dividing the addition value obtained by the adding by an integer, and selecting the quotient of the division as the output feature index based on determining that no remainder remains after the division is completed.
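As a hedged illustration only, the first operation of this aspect (add the indices, divide by an integer, and keep the quotient only when the division is exact) might be sketched as below. Applying the addition and division to the row and column components independently, and the function and variable names, are assumptions made for the example, not the claimed implementation.

```python
def remap_index(in_idx, w_idx, divisor):
    """Add the input feature index and the weight index componentwise,
    divide by an integer, and select the quotient as the output feature
    index only when the division leaves no remainder."""
    row, row_rem = divmod(in_idx[0] + w_idx[0], divisor)
    col, col_rem = divmod(in_idx[1] + w_idx[1], divisor)
    if row_rem or col_rem:
        return None        # a remainder remains: no output feature index
    return (row, col)      # output feature index
```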
Description of the drawings
Example embodiments of the inventive concept will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of an electronic system according to some example embodiments of the inventive concept.
Fig. 2 is a diagram of a neural network architecture according to some example embodiments.
Fig. 3 is a diagram of an input feature list according to some example embodiments of the inventive concept.
Fig. 4 is a flowchart of an index-based neural network operation method according to some example embodiments of the inventive concept.
Fig. 5 is a flowchart of an index-based convolution operation method according to some example embodiments of the inventive concept.
Fig. 6 is a diagram of a convolution operation according to some example embodiments.
Figs. 7A, 7B, 7C, 7D, 7E, and 7F are diagrams of snapshots of valid operation results during the convolution operation shown in Fig. 6.
Figs. 8A, 8B, and 8C are diagrams for explaining an index-based convolution operation according to some example embodiments of the inventive concept.
Figs. 9A and 9B are diagrams for explaining an index-based convolution operation according to some example embodiments of the inventive concept.
Fig. 10 is a flowchart of an index-based zero-padding method according to some example embodiments of the inventive concept.
Fig. 11A is a diagram of an example of applying zero padding to an input feature map in a neural network according to some example embodiments.
Fig. 11B is a diagram for explaining an index-based zero-padding method according to some example embodiments of the inventive concept.
Fig. 12 is a flowchart of a method of using a stride in an index-based convolution operation according to some example embodiments of the inventive concept.
Figs. 13A and 13B are diagrams of output feature matrices generated when a stride is used in convolution.
Fig. 14 is a flowchart of an index-based pooling method according to some example embodiments of the inventive concept.
Fig. 15 is a diagram for explaining an index-based pooling operation according to some example embodiments of the inventive concept.
Fig. 16 is a block diagram of a neural network device according to some example embodiments of the inventive concept.
Fig. 17 is a block diagram of a neural network processor according to some example embodiments of the inventive concept.
Fig. 18 is a diagram for explaining a state in which a neural network processor according to some example embodiments of the inventive concept operates in a first operating mode.
Fig. 19 is a diagram for explaining a state in which a neural network processor according to some example embodiments of the inventive concept operates in a second operating mode.
Fig. 20 is a diagram of data flow during a convolution operation according to some example embodiments.
Figs. 21 and 22 are diagrams of data processing during an index-based convolution operation performed in a neural network according to some example embodiments of the inventive concept.
Fig. 23 is a diagram of a neural network processor according to some example embodiments of the inventive concept.
Fig. 24 is a diagram of data processing during an index-based convolution operation performed in a neural network according to some example embodiments of the inventive concept.
Fig. 25 is a diagram of a neural network processor according to some example embodiments of the inventive concept.
[Explanation of reference numerals]
10: neural network;
11: first layer/layer;
12: second layer/layer;
13: third layer/layer;
21, 21a: index remapper;
21b: address remapper;
22, 22a, 22b: first data operation circuit;
23, 23a, 23b: second data operation circuit;
24, 24a, 24b: dedicated memory;
100: electronic system;
110: central processing unit;
120: random access memory;
130, 200: neural network device;
140: memory;
150: sensor module;
160: communication module;
170: bus;
210, 210a, 210b: neural network processor;
211, 211b_0, 211b_k: processing circuit;
211a_0: first processing circuit/processing circuit;
211a_k: k-th processing circuit/processing circuit;
212: internal memory;
213: list maker;
214: compressor;
215, 215a: selector;
216: global accumulator;
220: controller;
230: system memory;
CA: second index/input feature index;
CL: class;
CH0: first channel group/channel group;
CH1: second channel group/channel group;
CH2, CH3, CH4, CH5, CH6, CH7, CH8, CHn-2, CHn-1: channel group;
CHk: channel;
D: depth;
D0,0 to D7,9, f3,2, IFB: input feature;
DATA: input feature value;
f1,1, f1,4, f4,3: input feature/nonzero input feature;
F0: first input feature;
F1: second input feature;
F2: third input feature;
F3: fourth input feature;
F4: fifth input feature;
F5: sixth input feature;
F6: seventh input feature;
F7: eighth input feature;
F8: ninth input feature;
FM1: first feature map/feature map;
FM2: second feature map/feature map;
FM3: third feature map/feature map;
H: height;
IFL: input feature list;
IFM: input feature map;
IFMa: input feature map/initial input feature list;
IFM_Z: zero-padded input feature map;
IFM_Za: input feature map/padded input feature map;
IFMX: input feature matrix;
IWL: initial weight list;
KN0: first kernel/kernel;
KN1: second kernel/kernel;
KN2: third kernel/kernel;
KN3: fourth kernel/kernel;
KN4: fifth kernel/kernel;
KNk: kernel;
LUT: look-up table;
MWL: mirrored weight list;
OFL1: first output feature list;
OFL2: second output feature list;
OFM: output feature map;
OFMX, OFMX_S1, OFMX_S3: output feature matrix;
PW: two-dimensional pooling window/pooling window/2×2 pooling window;
RA: first index/input feature index;
REC: recognition signal;
S110, S120, S130, S140, S150, S210, S220, S230, S240, S310, S320, S410, S420, S430, S440, S450, S460, S510, S520, S530: operations;
S710, S720, S730, S740, S750, S760: valid operation results;
W: width;
W0,1, W2,2: weight/nonzero weight;
WL: weight list;
WM: weight map;
WMX: weight matrix;
①: first position;
②: second position;
③: third position;
④: fourth position;
⑤: fifth position;
⑥: sixth position.
Detailed description
Fig. 1 is a block diagram of an electronic system according to some example embodiments of the inventive concept. Fig. 2 is a diagram showing an example of a neural network architecture according to some example embodiments. Fig. 3 is a diagram of an input feature list according to some example embodiments of the inventive concept.
The electronic system 100 may analyze input data in real time based on a neural network, extract valid information, and, based on the extracted information, determine a situation or control elements of an electronic device in which the electronic system 100 is installed. The electronic system 100 may be used in a drone, a robotic device such as an advanced driver assistance system (ADAS), a smart television (TV), a smartphone, a medical device, a mobile device, an image display device, a measuring device, or an Internet of Things (IoT) device, and may be mounted in any one of various other electronic devices.
Referring to Fig. 1, the electronic system 100 may include a central processing unit (CPU) 110, random access memory (RAM) 120, a neural network device 130, a memory 140, a sensor module (also referred to herein as a "sensor device") 150, and a communication (or transmit/receive (Tx/Rx)) module (also referred to herein as a "communication device", "communication interface", and/or "communication transceiver") 160. The electronic system 100 may further include an input/output module, a security module, and a power control device. Some of the elements of the electronic system 100 (that is, the CPU 110, the RAM 120, the neural network device 130, the memory 140, the sensor module 150, and the communication module 160) may be mounted on a single semiconductor chip. As shown in Fig. 1, the elements of the electronic system 100 may be coupled via a bus 170.
The CPU 110 controls the overall operation of the electronic system 100. The CPU 110 may include a single-core processor or a multi-core processor. The CPU 110 may process or execute programs and/or data stored in the memory 140. For example, the CPU 110 may execute programs ("one or more programs of instructions") stored in the memory 140 to control the neural network device 130 to implement some or all of the operations described herein.
The RAM 120 may temporarily store programs, data, or instructions. Programs and/or data stored in the memory 140 may be temporarily stored in the RAM 120 according to the control of the CPU 110 or according to boot code. The RAM 120 may be implemented as dynamic RAM (DRAM) or static RAM (SRAM).
The neural network device 130 may perform a neural network operation based on input data and may generate an information signal based on the result of the operation ("neural network operation"). Neural networks may include, but are not limited to, convolutional neural networks (CNN), recurrent neural networks (RNN), deep belief networks, and restricted Boltzmann machines.
The information signal may include one of various recognition signals such as a speech recognition signal, an object recognition signal, an image recognition signal, and a biometric recognition signal. The neural network device 130 may receive frame data included in a video stream as input data and may generate, from the frame data, a recognition signal for an object included in an image represented by the frame data. However, the inventive concept is not limited thereto. The neural network device 130 may receive various kinds ("various types") of input data according to the type or function of the electronic device in which the electronic system 100 is installed, and may generate a recognition signal according to the input data. An example of a neural network architecture will be briefly described with reference to Fig. 2.
Fig. 2 shows the structure of a convolutional neural network as an example of a neural network architecture. Referring to Fig. 2, a neural network 10 may include multiple layers, for example, a first layer 11, a second layer 12, and a third layer 13. The first layer 11 may be a convolutional layer, the second layer 12 may be a pooling layer, and the third layer 13 may be an output layer. The output layer may be a fully-connected layer. In addition to the first layer 11, the second layer 12, and the third layer 13 shown in Fig. 2, the neural network 10 may further include an activation layer and may also include another convolutional layer, another pooling layer, or another fully-connected layer.
Each of the first layer 11, the second layer 12, and the third layer 13 may receive input data or a feature map generated in a previous layer as an input feature map, and may generate an output feature map or a recognition signal REC by performing an operation on the input feature map. Here, a feature map is data representing various features of input data. The feature maps FM1, FM2, and FM3 may have a two-dimensional matrix form or a three-dimensional matrix form. Feature maps having such a multi-dimensional matrix form may be referred to as feature tensors. The feature maps FM1, FM2, and FM3 have a width (or number of columns) W, a height (or number of rows) H, and a depth D corresponding to the x-axis, y-axis, and z-axis of a coordinate system, respectively. The depth D may be referred to as the number of channels.

A position on the x-y plane of a feature map may be referred to as a spatial position. A position along the z-axis of a feature map may be referred to as a channel. A size in the x-y plane of a feature map may be referred to as a spatial size.
The first layer 11 may perform a convolution of the first feature map FM1 with a weight map WM to generate the second feature map FM2. The weight map WM filters the first feature map FM1 and may be referred to as a filter or a kernel. The depth (that is, the number of channels) of the weight map WM may be the same as the depth (that is, the number of channels) of the first feature map FM1, and the convolution may be performed on the same channels in both the weight map WM and the first feature map FM1. The weight map WM is shifted across the first feature map FM1 as a sliding window. The amount of each shift may be referred to as a "stride length" or "stride". During each shift, each weight included in the weight map WM may be multiplied by the feature values in the region where the weight map WM overlaps the first feature map FM1, and the products may be added together. As the first feature map FM1 is convolved with the weight map WM, one channel of the second feature map FM2 may be generated. Although Fig. 2 shows only one weight map WM, multiple weight maps may in practice be convolved with the first feature map FM1 to generate multiple channels of the second feature map FM2. In other words, the number of channels of the second feature map FM2 may correspond to the number of weight maps.
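For reference, the sliding-window computation described above can be sketched in a few lines. This is a minimal illustration of a conventional traversal convolution on a single channel (in its cross-correlation form, with stride 1 and no padding), not the index-based method of the inventive concept; the array shapes and 'valid'-size output are assumptions for the example.

```python
import numpy as np

def traversal_conv2d(ifm: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Shift the kernel across the input feature map and, at each shift,
    multiply the overlapped feature values by the weights and add up the
    products to produce one output feature."""
    h, w = ifm.shape
    kh, kw = kernel.shape
    ofm = np.zeros((h - kh + 1, w - kw + 1))
    for r in range(ofm.shape[0]):
        for c in range(ofm.shape[1]):
            # Zeros in the overlapped region are multiplied too, even
            # though they cannot affect the result.
            ofm[r, c] = np.sum(ifm[r:r + kh, c:c + kw] * kernel)
    return ofm
```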
The second layer 12 may perform pooling to generate the third feature map FM3. Pooling may be referred to as sampling or downsampling. A two-dimensional pooling window PW may be shifted across the second feature map FM2, and the maximum value among the feature values (or the average value of the feature values) in the region where the pooling window PW overlaps the second feature map FM2 may be selected, so that the third feature map FM3 is generated from the second feature map FM2. The number of channels of the third feature map FM3 may be the same as the number of channels of the second feature map FM2.
In some example embodiments, the pooling window PW may be shifted across the second feature map FM2 in units of the size of the pooling window PW. The amount of each shift (that is, the stride length of the pooling window PW) may be the same as the length of the pooling window PW. Accordingly, the spatial size of the third feature map FM3 may be smaller than the spatial size of the second feature map FM2. However, the inventive concept is not limited thereto. The spatial size of the third feature map FM3 may be the same as or greater than the spatial size of the second feature map FM2. The spatial size of the third feature map FM3 may be determined according to the size of the pooling window PW, the stride length, and whether zero padding has been performed.
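A minimal sketch of the pooling just described, assuming a square window shifted by its own length over a single channel whose sides divide evenly by the window size:

```python
import numpy as np

def max_pool2d(fm: np.ndarray, window: int = 2) -> np.ndarray:
    """Select the maximum feature value in each region where the pooling
    window overlaps the feature map (stride equal to the window length)."""
    h, w = fm.shape
    out = np.zeros((h // window, w // window))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            region = fm[r * window:(r + 1) * window,
                        c * window:(c + 1) * window]
            out[r, c] = region.max()  # region.mean() would give average pooling
    return out
```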
The third layer 13 may combine the features of the third feature map FM3 to classify the input data into a class CL, and may generate a recognition signal REC corresponding to the class CL. The input data may correspond to frame data included in a video stream. In this case, the third layer 13 may extract, based on the third feature map FM3 provided from the second layer 12, a class corresponding to an object included in an image represented by the frame data, recognize the object, and generate a recognition signal REC corresponding to the object.
In a neural network, low-level layers (for example, convolutional layers) may extract low-level features (for example, edges or gradients of a face image) from input data or an input feature map, and high-level layers (for example, fully-connected layers) may extract or detect high-level features (that is, classes) (for example, the eyes and nose of a face image) from an input feature map.
Referring to Fig. 1, the neural network device 130 may perform an index-based neural network operation. Here, an index indicates the spatial position of a feature or a weight. An index may include a first index and a second index respectively corresponding to a row and a column of a two-dimensional matrix. In other words, an input feature index and a weight index may each include a first index and a second index, where the first index of the input feature index corresponds to a row of an input feature matrix, the second index of the input feature index corresponds to a column of the input feature matrix, the first index of the weight index corresponds to a row of a weight matrix, and the second index of the weight index corresponds to a column of the weight matrix.
The neural network device 130 may perform, based on indices, an operation corresponding to at least one of the multiple layers of the neural network described above with reference to Fig. 2. The neural network device 130 may generate, based on an input feature map in matrix form (hereinafter referred to as an input feature matrix), an input feature list including an index and data corresponding to each input feature, and may perform operations based on the indices.
As shown in Fig. 3, the neural network device 130 may generate an input feature list from an input feature matrix. The input feature list may include a first index RA and a second index CA corresponding to the spatial position of each input feature. An index may be referred to as an address, and the first index RA and the second index CA may be referred to as a row address and a column address, respectively. The input feature list may also include data (that is, an input feature value) corresponding to each index.
An index-based neural network operation may include an index operation. An index operation performs an operation on each input feature index in the input feature list and the index of another parameter. An index operation may be referred to as index remapping. When an index operation is performed, the data operation (that is, the operation performed on input feature values) may be simplified or skipped.
As shown in Fig. 3, the input feature list may include an index and data corresponding to each of the input features f1,1, f1,4, and f4,3 having nonzero values. The neural network device 130 may perform the index-based operation on the input features having nonzero values.
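A minimal sketch of this list generation, assuming the input feature map is given as a two-dimensional array and only nonzero entries are kept; the matrix size and the feature values below are illustrative, not taken from the figure:

```python
import numpy as np

def make_input_feature_list(ifmx: np.ndarray):
    """Return (first index RA, second index CA, value DATA) triples for
    every nonzero input feature in the input feature matrix."""
    return [(r, c, ifmx[r, c])
            for r in range(ifmx.shape[0])
            for c in range(ifmx.shape[1])
            if ifmx[r, c] != 0]

ifmx = np.zeros((5, 5))
ifmx[1, 1], ifmx[1, 4], ifmx[4, 3] = 3.0, 5.0, 2.0  # f1,1, f1,4, f4,3 (made-up values)
# Yields the entries at indices (1, 1), (1, 4), and (4, 3) only.
print(make_input_feature_list(ifmx))
```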
Meanwhile, a weight map used in a convolution operation may be converted into a weight list and provided to the neural network device 130. The weight list may include an index and data corresponding to each weight having a nonzero value. To avoid confusion of terms, the index and data in the input feature list will be referred to as the input feature index and the input feature value, and the index and data in the weight list will be referred to as the weight index and the weight value.
The neural network device 130 may perform a convolution operation on the input features and the weights having nonzero values, based on the indices in the input feature list and the indices in the weight list.
A zero in a neural network operation does not affect the result of the operation. Therefore, the neural network device 130 may generate the input feature list based on input features having nonzero values and may perform operations based on the indices in the input feature list, so that the neural network device 130 performs operations only on the input features having nonzero values. Accordingly, operations on input features having a value of zero may be skipped.
However, the inventive concept is not limited thereto. The input feature list may also include indices and data corresponding to input features having a value of zero. The neural network device 130 may generate the input feature list based on input features having zero or nonzero values and may perform operations based on the indices.
Referring back to Fig. 1, the memory 140 is a storage element for storing data. The memory 140 may store an operating system (OS), various programs, and various data. The memory 140 may store intermediate results (for example, an output feature map generated in the form of an output feature list or an output feature matrix during an operation). A compressed output feature map may be stored in the memory 140. The memory 140 may also store various parameters used by the neural network device 130 (for example, a weight map or a weight list).
The memory 140 may be DRAM, but is not limited thereto. The memory 140 may include at least one of volatile memory and nonvolatile memory. The nonvolatile memory includes read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, phase-change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (RRAM), and ferroelectric RAM (FeRAM). The volatile memory may include DRAM, SRAM, synchronous DRAM (SDRAM), PRAM, MRAM, RRAM, and FeRAM. Alternatively, the memory 140 may include at least one of a hard disk drive (HDD), a solid state drive (SSD), a compact flash (CF) card, a secure digital (SD) card, a micro secure digital (micro-SD) card, a mini secure digital (mini-SD) card, an extreme digital (xD) card, and a memory stick.
The sensor module 150 collects information about the surroundings of the electronic device in which the electronic system 100 is installed. The sensor module 150 may sense or receive a signal (for example, a video signal, an audio signal, a magnetic signal, a bio-signal, or a touch signal) from outside the electronic device and may convert the sensed or received signal into data. For this operation, the sensor module 150 may include at least one of various sensing devices, for example, a microphone, an image pickup device, an image sensor, a light detection and ranging (LIDAR) sensor, an ultrasonic sensor, an infrared sensor, a bio-sensor, and a touch sensor.
The sensor module 150 may provide data to the neural network device 130 as input data. For example, the sensor module 150 may include an image sensor. In this case, the sensor module 150 may photograph the external environment of the electronic device, generate a video stream, and sequentially provide the consecutive data frames of the video stream to the neural network device 130 as input data. However, the inventive concept is not limited thereto. The sensor module 150 may provide various types of data to the neural network device 130.
The communication module 160 may include various wired or wireless interfaces for communicating with external devices. For example, the communication module 160 may include a communication interface capable of accessing a local area network (LAN); a wireless LAN (WLAN) such as wireless fidelity (Wi-Fi); a wireless personal area network (WPAN) such as Bluetooth; a wireless universal serial bus (USB); ZigBee; near field communication (NFC); radio-frequency identification (RFID); power line communication (PLC); or a mobile cellular network such as third generation (3G), fourth generation (4G), or long term evolution (LTE).
The communication module 160 may receive a weight map or a weight list from an external server. The external server may perform training based on a large amount of learning data and may provide the electronic system 100 with a weight map or a weight list that includes trained weights. The received weight map or weight list may be stored in the memory 140.
The communication module 160 may generate and/or transmit an information signal based on an operation result (for example, an output feature map generated in the form of an output feature list or an output feature matrix during an operation).
As described above, according to some example embodiments of the inventive concept, the neural network device 130 may perform neural network operations efficiently by performing them based on indices. In particular, in a sparse neural network in which nonzero values are sparse in a feature map or a weight map, the neural network device 130 may generate an input feature list corresponding to the input features having nonzero values and may perform operations on the nonzero input features based on the input feature list, thereby reducing the amount of computation. As the amount of computation decreases, the efficiency of the neural network device 130 increases, and the power consumption of the neural network device 130 and the electronic system 100 decreases. Various embodiments of the index-based neural network operation method are described in detail below.
Fig. 4 is a flowchart of an index-based neural network operation method according to some example embodiments of the inventive concept. The neural network operation method shown in Fig. 4 may be performed by the neural network device 130 and may be applied to the operations of the layers 11, 12, and 13 of the neural network 10 shown in Fig. 2.
Referring to Fig. 4, in operation S110, the neural network device 130 may generate an input feature list. For example, the neural network device 130 may generate the input feature list from an input feature map in matrix form. As described above with reference to Fig. 3, the input feature list may include an input feature index and an input feature value corresponding to each input ("input feature"). The inputs may have nonzero values. The input feature index may indicate the position of the input feature on the input feature map.
In operation S120, the neural network device 130 may perform an index operation based on an input feature index in the input feature list and may generate an output feature index based on the index operation result. The index operation result of the index operation may be the output feature index.
In operation S130, the neural network device 130 may perform a data operation based on an input feature value in the input feature list and may generate an output feature value corresponding to the output feature index based on the data operation result. Here, when the output feature index generated in operation S120 is not mapped onto the output feature map, the neural network device 130 may skip the data operation. The data operation result of the data operation may be the output feature value corresponding to the output feature index.
In operation S140, the neural network device 130 may generate an output feature list based on the output feature indices and the output feature values. The neural network device 130 performs operations S120 and S130 on every input feature in the input feature list to generate the output feature list. In other words, in operation S110, the neural network device 130 may generate an input feature list including multiple input feature indices and multiple input feature values, the multiple input feature indices corresponding to individual input features among multiple input features and the multiple input feature values corresponding to the individual input features, and the neural network device 130 may further perform a set of operations S120 and S130 for each individual input feature, generating multiple output feature indices based on the corresponding input feature indices of the input feature list and multiple output feature values based on the corresponding input feature values. As part of performing operations S120 and S130 for each input feature, the neural network device 130 may filter out a limited selection of the multiple output indices based on determining that those output indices do not affect the output result during the operation, so that the remaining selection of the multiple output indices includes only output indices that can affect the output result during the operation. The neural network device 130 may store the output feature list in a memory. The memory may be located inside the neural network device 130 or may be a memory located outside the neural network device 130 (for example, the memory 140 shown in Fig. 1). In some example embodiments, the neural network device 130 may compress the output feature list and store the compressed output feature list in the memory.

In example embodiments, if the output feature list is for the last layer of the neural network, the neural network device 130 may generate an information signal based on the output feature list.
The neural network device 130 may reduce the amount of computation by performing operations on each input feature index and each input feature value and by filtering out the output indices that do not affect the output result during the operation (such as the more limited selection among the multiple output indices). In addition, based on index operations, the neural network device 130 can easily handle various neural network operations. As a result, performing one or more of the operations described above may improve the functionality of an electronic system that includes the neural network device 130.
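Under the assumption that the index operation, the data operation, and the mapping test are supplied as functions, the flow of operations S120 through S140 can be sketched as a single pass over the input feature list. This is an interpretive sketch of the method of Fig. 4, not the device's actual implementation, and all names are illustrative:

```python
def index_based_layer(input_feature_list, index_op, data_op, is_mapped):
    """For each (input feature index, input feature value) entry: remap
    the index (S120), skip the data operation when the output feature
    index is not mapped onto the output feature map, otherwise compute
    the output feature value (S130), and collect the results (S140)."""
    output_feature_list = []
    for in_idx, in_val in input_feature_list:
        out_idx = index_op(in_idx)            # index operation (S120)
        if not is_mapped(out_idx):
            continue                          # data operation is skipped
        out_val = data_op(in_val)             # data operation (S130)
        output_feature_list.append((out_idx, out_val))
    return output_feature_list                # output feature list (S140)
```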
Fig. 5 is a flowchart of an index-based convolution operation method according to some example embodiments of the inventive concept. The operation method shown in Fig. 5 may be performed by the neural network device 130 shown in Fig. 1.
Referring to Fig. 5, in operation S210, the neural network device 130 may generate an input feature list from an input feature map (that is, an input feature matrix). The input feature list may include an input feature index and an input feature value corresponding to each of the input features of the input feature matrix. The input feature index may include a first index and a second index respectively corresponding to a row and a column of the input feature matrix. The neural network device 130 may generate an input feature list corresponding to at least one input feature of the input feature matrix having a nonzero value.
Thereafter, the neural network device 130 may perform an index-based convolution operation based on the input feature list and a pre-stored weight list.
In operation S220, the neural network device 130 may generate an output feature index based on an input feature index and a weight index. The neural network device 130 may generate the output feature index by performing an operation ("first operation") on the input feature index and the weight index.
The neural network device 130 may generate the output feature index by performing the operation on an input feature index corresponding to an input feature having a nonzero value and a weight index corresponding to a weight having a nonzero value.
Specifically, the neural network device 130 may generate the output feature index by adding the input feature index and the weight index. The neural network device 130 may add the first index of the input feature index to the first index of the weight index, and may add the second index of the input feature index to the second index of the weight index.
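In code, this first operation is simply a componentwise addition of the two index pairs (a sketch, with tuple indices assumed):

```python
def add_indices(input_feature_idx, weight_idx):
    """Add first index to first index and second index to second index."""
    ra = input_feature_idx[0] + weight_idx[0]  # first (row) index RA
    ca = input_feature_idx[1] + weight_idx[1]  # second (column) index CA
    return (ra, ca)                            # output feature index
```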
In operation S230, the neural network device 130 may generate an output feature value corresponding to the output feature index based on an input feature value and a weight value. The neural network device 130 may generate the output feature value by performing a data operation ("second operation") based on the input feature value and the weight value. The neural network device 130 may multiply the input feature value by the weight value and may generate the output feature value based on the multiplication value obtained by the multiplication. The neural network device 130 may generate the output feature value by adding up multiple multiplication values corresponding to an output feature index. The output feature value and the weight value may be nonzero.
The neural network device 130 may perform the index-based convolution operation by performing the index operation based on the input feature index and the weight index of the weight list in operation S220 and by performing the data operation based on the input feature value and the weight value in operation S230. In example embodiments, if the output feature is for the last layer of the neural network, the neural network device 130 may generate an information signal based on the output feature value.
In some example embodiments, the index-based convolution operation method may further include an operation in which the neural network device generates the weight list from a weight matrix. For example, the neural network device 130 may receive a weight matrix from the outside (for example, from outside the neural network device 130, or from a server external to the electronic device equipped with the neural network device 130) and may generate the weight list from the weight matrix. The weight list may include a weight index and a weight value corresponding to each of the weights included in the weight matrix. The neural network device 130 may generate a weight list corresponding to at least one weight of the weight matrix having a nonzero value. The neural network device 130 may store the weight list and may use the weight indices and the weight values in operations S220 and S230. However, the inventive concept is not limited thereto. The neural network device 130 may receive the weight list from the outside (for example, from outside the neural network device 130, or from a server external to the electronic device equipped with the neural network device 130), store the weight list, and then use the weight list.
Fig. 6 is a diagram of a convolution operation. Figs. 7A, 7B, 7C, 7D, 7E, and 7F are diagrams of snapshots of valid operation results during the convolution operation shown in Fig. 6.

Specifically, Fig. 6 shows a convolution operation performed on an input feature matrix and a weight matrix having sparse distributions of nonzero values. Figs. 7A, 7B, 7C, 7D, 7E, and 7F show snapshots of the valid operation results S710, S720, S730, S740, S750, and S760, respectively, during a traversal convolution operation used in a common neural network.
Referring to Fig. 6, the result of the convolution operation (denoted by "*") performed on the input feature matrix IFMX including the nonzero input features f1,1, f1,4, and f4,3 and the weight matrix WMX including the nonzero weights W0,1 and W2,2 may be expressed as an output feature matrix OFMX including output features respectively corresponding to the first through sixth positions ①, ②, ③, ④, ⑤, and ⑥.
As described above, when a convolution operation is performed, input features having a value of zero and/or weights having a value of zero do not affect the operation result. Although many snapshots may be generated during the traversal convolution operation, only the six snapshots shown in Figs. 7A through 7F affect the operation result. As shown in Figs. 7A through 7F, the output features may correspond to the results of convolving each of the nonzero input features f1,1, f1,4, and f4,3 with each of the nonzero weights W0,1 and W2,2.
Fig. 8 A, Fig. 8 B and Fig. 8 C are for explaining according to some exemplary embodiments of concept of the present invention based on index
The figure of convolution algorithm.Fig. 8 A, Fig. 8 B and Fig. 8 C show the convolution based on index executed to non-zero input feature vector and non-zero weight
Operation.
Fig. 8 A show the generation of input feature vector list IFL.With reference to Fig. 8 A, neural network device 130 can be special relative to input
The non-zero for levying matrix IFMX inputs (for example, input feature vector f1,1、f1,4And f4,3) generate input feature vector list IFL.Input feature vector
List IFL can include input feature vector index RA and CA and input feature vector value DATA for each input feature vector.
Fig. 8 B show the generation of weighted list WL.The generation of weighted list WL is similar to the generation of input feature vector list IFL.
The operation being adjusted is indexed to the weight in weighted list WL however, can extraly be executed to convolution algorithm.Shown in Fig. 8 B
Weighted list WL generation can to neural network device 130 (shown in Fig. 1) provide weight server in execute or can
It is executed based on the weight matrix provided from server in included pretreatment circuit in neural network device 130.For just
For the sake of explanation, it is assumed that weighted list WL shown in Fig. 8 B is generated in neural network device 130.
The neural network device 130 may generate an initial weight list IWL for the nonzero weights of the weight matrix WMX (for example, the weights W0,1 and W2,2). The weight indices of the initial weight list IWL indicate the spatial positions (for example, addresses) of the weights W0,1 and W2,2. Such weight indices may be referred to as "initial weight indices".

Thereafter, the initial weight indices may be adjusted to suit a particular operation. The adjustment may include the neural network device 130 generating a mirrored weight list MWL by mirroring the weight indices ("initial weight indices") of the initial weight list IWL about a weight bias index (for example, (RA, CA) = (1, 1)), the weight bias index indicating the center of the weight matrix WMX.

The neural network device 130 may bias the mirrored weight indices by subtracting the weight bias index (that is, (RA, CA) = (1, 1)) from the weight indices of the mirrored weight list MWL ("mirrored weight indices"). As a result, (1, 0) and (-1, -1) may be generated as the weight indices of the weights W0,1 and W2,2, respectively, and the weight list WL for the convolution operation may be generated.
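A sketch of this two-step adjustment, assuming a 3×3 weight matrix whose center is the weight bias index (1, 1); run on the nonzero weights of Fig. 8B it reproduces the adjusted indices (1, 0) and (-1, -1). The weight values are kept symbolic since the figure does not give them:

```python
def make_weight_list(initial_weight_list, bias=(1, 1)):
    """Mirror each initial weight index about the kernel center (the
    weight bias index), then subtract the bias from the mirrored index."""
    weight_list = []
    for (ra, ca), value in initial_weight_list:
        mirrored = (2 * bias[0] - ra, 2 * bias[1] - ca)            # mirrored weight list MWL
        adjusted = (mirrored[0] - bias[0], mirrored[1] - bias[1])  # weight list WL
        weight_list.append((adjusted, value))
    return weight_list

iwl = [((0, 1), "W01"), ((2, 2), "W22")]  # nonzero weights W0,1 and W2,2
print(make_weight_list(iwl))  # [((1, 0), 'W01'), ((-1, -1), 'W22')]
```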
Fig. 8 C show the operation carried out to input feature vector and weight based on index.With reference to Fig. 8 C, neural network device 130 can
Input feature vector value is multiplied by weighted value by input feature vector index and weight index phase adduction.
It for example, can be by each input feature vector f1,1、f1,4And f4,3Input feature vector index (1,1), (1,4) and (4,
3) each in and weight W0,1Weight index (1,0) be added so that can generate output aspect indexing (2,1), (2,4),
And (5,3).At this point, the first index RA that can index each input feature vector is added with the first index RA that weight indexes and can
The second index CA that each input feature vector indexes is added with the second index CA that weight indexes.
The input feature value of each of the input features f1,1, f1,4, and f4,3 is multiplied by the weight value of the weight W0,1, so that a first output feature list OFL1 may be generated for the weight W0,1. In addition, each of the input feature indices (1,1), (1,4), and (4,3) of the input features f1,1, f1,4, and f4,3 may be added to the weight index (-1,-1) of the weight W2,2, and the input feature value of each of the input features f1,1, f1,4, and f4,3 may be multiplied by the weight value of the weight W2,2, so that a second output feature list OFL2 may be generated for the weight W2,2.
Since Chong Die output is not present between the first output feature list OFL1 and the second output feature list OFL2
Aspect indexing, therefore the output feature in the first output feature list OFL1 and the output in the second output feature list OFL2 are special
Sign can be mapped on matrix, without additional operation.As can be seen that output eigenmatrix OFMX and Fig. 6 shown in Fig. 8 C
Shown matrix is identical.
A traversal convolution operation inherently involves redundancy because of the traversal, so it is not easy to skip the operations performed on input features and weights having the value zero (that is, meaningless operations that do not influence the output features). However, when the index-based convolution operation according to some exemplary embodiments of the inventive concept is used, as shown in Fig. 8C, the neural network device 130 executes index-based operations on non-zero inputs and non-zero weights only, thereby eliminating the meaningless operations. As a result, the amount of computation is reduced.
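The core of the index-based convolution of Fig. 8C reduces to an index addition and a value multiplication per non-zero pair. A minimal sketch, reusing the hypothetical list layout of the earlier sketches:

```python
# Index-based convolution: for every (non-zero weight, non-zero input
# feature) pair, the indexes are added and the values are multiplied, so
# zero entries never enter the computation.
def index_conv(feature_list, weight_list):
    out = []
    for (wr, wc), wval in weight_list:        # one output list per weight
        for (fr, fc), fval in feature_list:
            out.append(((fr + wr, fc + wc), fval * wval))
    return out

features = [((1, 1), 3), ((1, 4), 7), ((4, 3), 2)]
weights = [((1, 0), 5), ((-1, -1), 9)]
print(index_conv(features, weights))
# [((2, 1), 15), ((2, 4), 35), ((5, 3), 10),   <- OFL1 for W(0,1)
#  ((0, 0), 27), ((0, 3), 63), ((3, 2), 18)]   <- OFL2 for W(2,2)
```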
Figs. 9A and 9B are diagrams for explaining an index-based convolution operation according to some exemplary embodiments of the inventive concept. Fig. 9A shows the generation of input feature indexes. Fig. 9B shows an index-based convolution operation executed based on the input feature indexes shown in Fig. 9A and the weight indexes shown in Fig. 8B.

Referring to Fig. 9A, the neural network device 130 may generate the input feature list IFL for the non-zero inputs of the input feature matrix IFMX (for example, the input features f1,1, f1,4, f3,2, and f4,3). For each input feature, the input feature list IFL may include input feature indexes RA and CA and an input feature value DATA. Comparing Fig. 9A with Fig. 8A, the input feature f3,2 has been added to the input feature matrix IFMX, and accordingly the input feature index (3, 2) and the input feature value f3,2 corresponding to the input feature f3,2 have been added to the input feature list IFL.
When an index-based convolution operation is executed based on the input feature list IFL shown in Fig. 9A and the weight list WL shown in Fig. 8B, a first output feature list OFL1 for the weight W0,1 and a second output feature list OFL2 for the weight W2,2 may be generated, as shown in Fig. 9B. At this time, an overlapping output feature index (2, 1) exists between the first output feature list OFL1 and the second output feature list OFL2. The multiple feature values corresponding to the output feature index (2, 1) (that is, f1,1 × W0,1 and f3,2 × W2,2) may be added, and the addition result may be generated as the output feature value corresponding to the output feature index (2, 1).

According to the present exemplary embodiment of the inventive concept, when the index-based convolution operation is used, the neural network device 130 may generate output feature indexes using index operations and generate output feature values using data operations. However, when overlapping output feature indexes exist (that is, when multiple data operation results (that is, multiplication values) exist for one output feature index), the neural network device 130 may add the multiple multiplication values to generate the output feature value corresponding to that output feature index.
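The accumulation of overlapping output feature indexes can be sketched as follows; the numeric values in the example are illustrative only.

```python
# Accumulation step: when several multiplication results map to the same
# output feature index (e.g. (2, 1) in Fig. 9B), they are added to form
# that index's output feature value.
from collections import defaultdict

def accumulate(products):
    ofm = defaultdict(int)
    for idx, val in products:
        ofm[idx] += val
    return dict(ofm)

products = [((2, 1), 15), ((2, 1), 18), ((2, 4), 35)]
print(accumulate(products))   # {(2, 1): 33, (2, 4): 35}
```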
As described above with reference to Figs. 8A through 9B, the neural network device 130 may execute a convolution operation, based on indexes, on the input features and weights having non-zero values. Accordingly, the amount of computation required for the convolution operation may be reduced, the operating speed of the neural network device 130 may increase, and the power consumption of the neural network device 130 may decrease.
Fig. 10 is a flowchart of an index-based zero-padding method according to some exemplary embodiments of the inventive concept.

Referring to Fig. 10, in operation S310, the neural network device 130 may generate an input feature list. For example, the neural network device 130 may generate, from an input feature map in matrix form, an input feature list that includes an index and data for each input feature having a non-zero value.

In operation S320, the neural network device 130 may add a bias index to each index of the input feature list. In this way, the neural network device 130 may perform zero padding. This will be described in detail with reference to Figs. 11A and 11B.
Fig. 11A is a diagram of an example of applying zero padding to an input feature map IFM in a neural network. Fig. 11B is a diagram for explaining an index-based zero-padding method according to some exemplary embodiments of the inventive concept. In the drawings, the number at the top of each pixel is the index of the input feature and the number at the bottom of each pixel is the input feature value.

Zero padding in a neural network adds zeros to the input feature map IFM in all outward directions (that is, the row direction and the column direction). When zero padding is applied to the input feature map IFM, a zero-padded input feature map (that is, a zero-padded input feature map IFM_Z) may be generated. When one zero is added in each outward direction of the input feature map IFM, as shown in Fig. 11A, the position (that is, the index) of each input feature increases by 1. For example, the index (0, 0) of the input feature D0,0 becomes (1, 1). More generally, when "n" (where "n" is an integer of at least 1) zeros are added to the input feature map IFM in each outward direction, the index of each input feature increases by "n". The number of zeros added in each direction (hereinafter referred to as the zero-value length) "n" may vary with the type and characteristics of the operation executed on the input features after the zero padding is applied.

When zero padding is applied to the input feature map IFM in matrix form during a traversal convolution operation, an output feature map having the same size as the input feature map IFM may be generated. A neural network device that executes the traversal convolution operation needs to include control logic that adds zeros to the input feature map IFM in order to support zero padding.
Fig. 11B is a diagram for explaining an index-based zero-padding method according to some exemplary embodiments of the inventive concept. Specifically, Fig. 11B shows an input feature map IFMa of input features having non-zero values and a zero-removed padded input feature map IFM_Za generated by applying index-based zero padding to the input feature map IFMa. In Fig. 11B, the input feature maps IFMa and IFM_Za are input feature lists represented in matrix form for convenience of explanation. The map IFMa may be referred to as an initial input feature list.

In an index-based neural network operation, the operations performed on input features having the value zero can be skipped. When zero padding is used, the neural network device 130 may generate the input feature map IFMa including the input features having non-zero values (that is, the initial input feature list), and may generate the zero-removed padded input feature map IFM_Za (that is, the padded input feature list) by applying index-based zero padding to the input feature map IFMa. In other words, the neural network device 130 may generate the initial input feature list IFMa, which includes an initial input feature index corresponding to the position of each input feature and an input feature value corresponding to that input feature.

The neural network device 130, which executes index-based neural network operations, may generate the padded input feature map IFM_Za by remapping the indexes in the input feature list (that is, in the input feature map IFMa in list form) based on a bias index (z, z), also referred to herein as a "feature bias index". For example, the neural network device 130 may remap an index of an input feature of the input feature map IFMa by adding the bias index (z, z) to that index. At this time, the bias index (z, z) may be determined according to the zero-value length. For example, when one zero is added to the input feature map IFM in all outward directions of the input feature map IFM, as shown in Fig. 11A, that is, when the zero-value length is 1, the bias index (z, z) may be set to (1, 1). When the zero-value length is 2, the bias index (z, z) may be set to (2, 2). When the zero-value length is "n", the bias index (z, z) may be set to (n, n). As described above, the bias index (z, z) may be set based on the zero-value length.
Fig. 11B shows the zero-removed padded input feature map IFM_Za for the case in which one zero is added in all outward directions of the input feature map IFMa. The neural network device 130 may remap the indexes of the input features by adding the bias index (1, 1) to the indexes of the input feature map IFMa. For example, the bias index (1, 1) is added to the index (0, 0) of the input feature D0,0 of the input feature map IFMa, so that the index of the input feature D0,0 is remapped from (0, 0) to (1, 1). The bias index (1, 1) is added to the index (2, 3) of the input feature D2,3, so that the index of the input feature D2,3 is remapped from (2, 3) to (3, 4). The neural network device 130 may add the bias index (1, 1) to the index of each of the input features D0,0 through D5,5 of the input feature map IFMa, thereby generating the zero-removed padded input feature map IFM_Za.

As described above, the neural network device 130, which executes index-based neural network operations, may remap the indexes of the input feature map IFMa in list form based on the bias index (z, z) set according to the zero-value length, and may thereby easily generate the zero-removed padded input feature map IFM_Za without using separate control logic for zero padding.
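By way of illustration (and not as the claimed circuitry), index-based zero padding reduces to a single index shift:

```python
# Index-based zero padding: for a zero-value length n, the feature bias
# index (n, n) is simply added to every input feature index, so no explicit
# rows or columns of zeros are ever materialized.
def zero_pad(feature_list, n):
    return [((ra + n, ca + n), val) for (ra, ca), val in feature_list]

padded = zero_pad([((0, 0), 4), ((2, 3), 8)], 1)
print(padded)   # [((1, 1), 4), ((3, 4), 8)] -- e.g. D(0,0) remapped to (1, 1)
```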
Fig. 12 is a flowchart of a method of using a stride in an index-based convolution operation, according to some exemplary embodiments of the inventive concept. Striding may be performed during a convolution operation and may be performed in operations S220 and S230 shown in Fig. 5.

Referring to Fig. 12, the neural network device 130 may add an input feature index to a weight index in operation S410, and may divide the addition result (the summed index) by the stride length in operation S420.

In operation S430, the neural network device 130 may determine whether the division leaves a remainder. When a remainder exists, the neural network device 130 may skip the operation on the input feature value and the weight value in operation S440. When the division leaves a remainder, the summed index is not mapped onto the output feature map, and therefore the result of a data operation executed for that index would not influence the output feature map. Accordingly, the neural network device 130 may skip the operation on the input feature value and the weight value.

When the division leaves no remainder (for example, after the division is completed), the neural network device 130 may select the quotient as an output feature index in operation S450 and may execute an operation (for example, multiplication and addition) on the input feature value and the weight value in operation S460. The operation value obtained by the operation may be provided as the output feature value of the output feature index.
For example, when no remainder exists after the result of adding the input feature index of a first input feature to the weight index of a first weight is divided by the stride length, the quotient may be selected as an output feature index, and the result of an operation executed on the input feature value corresponding to the first input feature and the weight value corresponding to the first weight may be provided as the output feature value of that output feature index. When a remainder exists after the sum of the input feature index of a second input feature and the weight index of a second weight is divided by the stride length, the result of the operation on the input feature index of the second input feature and the weight index of the second weight is not selected as an output feature index. Accordingly, the operation on the input feature value corresponding to the second input feature and the weight value corresponding to the second weight may be omitted.

As described above, through operations performed on indexes, a stride can easily be used in an index-based convolution operation, and the amount of computation can be reduced.
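A minimal sketch of the stride handling of Fig. 12, reusing the hypothetical list layout of the earlier sketches:

```python
# Striding in the index domain: the summed index is divided by the stride
# length; pairs whose summed index leaves a remainder are skipped, and
# otherwise the quotient becomes the output feature index.
def strided_index_conv(feature_list, weight_list, stride):
    out = []
    for (fr, fc), fval in feature_list:
        for (wr, wc), wval in weight_list:
            sr, sc = fr + wr, fc + wc                  # summed index
            if sr % stride == 0 and sc % stride == 0:  # no remainder: keep
                out.append(((sr // stride, sc // stride), fval * wval))
    return out                                         # skipped pairs cost nothing
```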
Figs. 13A and 13B are diagrams of output feature matrices generated when a stride is used in convolution.

Fig. 13A shows an example in which a stride is applied to a matrix pixel by pixel. Fig. 13B shows an example in which a stride is applied to every third pixel on the matrix. As the stride length increases, the size of the output feature matrix decreases. Comparing the output feature matrix OFMX_S1 shown in Fig. 13A with the output feature matrix OFMX_S3 shown in Fig. 13B, it can be seen that the output features marked with dashed boxes in the output feature matrix OFMX_S1 shown in Fig. 13A constitute the output feature matrix OFMX_S3 shown in Fig. 13B, and that only the indexes of the output features change.

As described above, when the index-based convolution operation according to some exemplary embodiments of the inventive concept is used, the neural network device 130 may add an input feature index to a weight index, divide the summed index by the stride, and select the quotient as an output feature index when the division leaves no remainder.

For example, since the stride length in Fig. 13A is 1, the index of each output feature in the output feature matrix OFMX_S1 is the summed index obtained by adding an input feature index to a weight index. In the example shown in Fig. 13B, when no remainder exists after the summed index is divided by the stride length of 3, the quotient may be generated as an output feature index of the output feature matrix OFMX_S3.

The neural network device 130 may generate an output feature value by executing an operation on the input feature value and the weight value corresponding to an output feature index. The neural network device 130 does not execute operations on input feature values and weight values that do not correspond to an output feature index.
Fig. 14 is a flowchart of an index-based pooling method according to some exemplary embodiments of the inventive concept.

Referring to Fig. 14, in operation S510, the neural network device 130 may remap input feature indexes based on a sampling unit. The multiple input features included in a pooling window may be remapped to one index. The remapped index may be provided as an output feature index of the output feature map.

In operation S520, the neural network device 130 may execute a pooling operation on the input features that have the same remapped index. In other words, the pooling operation may be executed on the input features included in a pooling window. Max pooling or average pooling may be performed on the input features.

In operation S530, the neural network device 130 may provide the pooling operation value obtained by the pooling operation as the output feature value corresponding to the output feature index. The index-based pooling method will be described in detail with reference to Fig. 15.
Fig. 15 is a diagram for explaining an index-based pooling operation according to some exemplary embodiments of the inventive concept. For convenience of explanation, the feature maps are represented in matrix form.

As described above with reference to Fig. 2, the size of an input feature map may be reduced in a pooling layer. Accordingly, the number of parameters and the amount of computation of the neural network may be reduced. As shown in Fig. 15, a 2×2 pooling window PW may be applied to a 10×10 input feature map (A). When a pooling operation is executed on each 2×2 sampling unit, a 5×5 output feature map (C) may be generated. Although 2×2 sampling is shown in Fig. 15, the sampling unit may be variously changed.

According to some exemplary embodiments, the neural network device 130 may execute pooling based on indexes. The neural network device 130 may divide an input feature index by a certain (or, alternatively, predetermined) sampling length (a "sub-sampling size") and select the quotient of the division as the remapped index of the input feature (the "output feature index corresponding to the input feature"). Accordingly, as shown in the index-remapped input feature map (B), the indexes of the input features may be remapped, and multiple input features in the same sampling unit may have the same remapped index. The remapped index may be the output feature index, that is, the spatial position at which the output feature will be stored in the output feature matrix. Before an input feature value is stored at the position given by the corresponding output feature index, an operation may be executed on the input feature values according to the type of pooling.
For example, when max pooling is applied to the input feature matrix, the maximum of the input feature values included in a 2×2 sampling unit (that is, the input feature values corresponding to one output feature index) may be provided as the output feature value corresponding to that output feature index.

In another example, when average pooling is applied to the input feature matrix, the input feature values corresponding to one output feature index may be added, the addition value obtained by the adding may be divided by the number of input feature values, and the division result may be provided as the output feature value corresponding to that output feature index. However, the inventive concept is not limited to these examples, and various types of pooling may be used.

When the result of executing the pooling operation on the input features corresponding to each output feature index is provided as the output feature value, the output feature map (C) may be generated.
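By way of illustration, the index-based pooling of Figs. 14 and 15 can be sketched as follows:

```python
# Index-based pooling: each input feature index is divided by the sampling
# length (here 2 for a 2x2 window), features sharing the remapped index are
# pooled, and the pooled value becomes the output feature value.
from collections import defaultdict

def index_pool(feature_list, size=2, mode="max"):
    groups = defaultdict(list)
    for (ra, ca), val in feature_list:
        groups[(ra // size, ca // size)].append(val)  # remapped index = quotient
    if mode == "max":
        return {idx: max(vals) for idx, vals in groups.items()}
    return {idx: sum(vals) / len(vals) for idx, vals in groups.items()}  # average

features = [((0, 0), 1), ((0, 1), 6), ((1, 1), 4), ((4, 5), 9)]
print(index_pool(features))   # {(0, 0): 6, (2, 2): 9}
```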
Various embodiments of index-based neural network operation methods have been described above with reference to Figs. 4 through 15. However, the inventive concept is not limited to these embodiments. Various operations used in various neural networks may be executed based on indexes.
Fig. 16 is a block diagram of a neural network device 200 according to some exemplary embodiments of the inventive concept.

Referring to Fig. 16, in some exemplary embodiments the neural network device 200 is the neural network device 130 shown in Fig. 1. Accordingly, the description of the neural network device 130 applies to the neural network device 200.

The neural network device 200 may include a controller 220, a neural network processor 210, and a system memory 230. The neural network device 200 may further include a direct memory access (DMA) controller to store data in an external memory. The neural network processor 210, the controller 220, and the system memory 230 of the neural network device 200 may communicate with one another through a system bus. The neural network device 200 may be implemented as a semiconductor chip (for example, a system-on-chip (SoC)), but is not limited thereto; the neural network device 200 may be implemented with multiple semiconductor chips. In the present embodiment the controller 220 and the neural network processor 210 are shown as separate components, but the embodiments are not limited thereto, and the controller 220 may be included in the neural network processor 210.
The controller 220 may be implemented as a central processing unit or a microprocessor. The controller 220 may control all operations of the neural network device 200. In an exemplary embodiment, the controller 220 may control the neural network device 200 by executing a program of instructions stored in the system memory 230. The controller 220 may control the operations of the neural network processor 210 and the system memory 230. For example, the controller 220 may set and manage parameters so that the neural network processor 210 can normally execute each layer of a neural network.

The controller 220 may generate a weight list from a weight matrix and provide the weight list to the neural network processor 210. However, the inventive concept is not limited thereto. The neural network device 200 or the neural network processor 210 may include a separate processing circuit that generates the weight list from the weight matrix.
The neural network processor 210 may include multiple processing circuits 211. The processing circuits 211 may be configured to run simultaneously in parallel. Furthermore, the processing circuits 211 may run independently of one another. Each of the processing circuits 211 may be implemented as a core circuit that executes instructions. A processing circuit 211 may execute the index-based operations described above with reference to Figs. 4 through 15.

The neural network processor 210 may be implemented as hardware circuitry, for example, as an integrated circuit. The neural network processor 210 may include at least one of a central processing unit, a multi-core processor, an array processor, a vector processor, a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), an application-specific integrated circuit (ASIC), programmable logic circuitry, a video processing unit (VPU), and a graphics processing unit (GPU). However, the inventive concept is not limited thereto.

The neural network processor 210 may also include an internal memory 212. The internal memory 212 may be a cache memory of the neural network processor 210. The internal memory 212 may be static random access memory, but is not limited thereto. The internal memory 212 may be implemented as a buffer, a cache memory, or another type of memory of the neural network processor 210. The internal memory 212 may store data generated by the operations executed by the processing circuits 211, such as output feature indexes, output feature values, or various data generated during the operations.
The system memory 230 may be implemented as random access memory (for example, dynamic random access memory or static random access memory). The system memory 230 may be connected to the neural network processor 210 through a memory controller. The system memory 230 may store various types of programs and data. The system memory 230 may store a weight map provided from an external device (for example, a server or an external memory).

The system memory 230 may buffer a weight map corresponding to the next layer to be executed by the neural network processor 210. When an operation is executed using the weight map in a processing circuit 211, the weight map may be read from an external memory (for example, the memory 140 in Fig. 1) and stored in the internal memory 212 (also referred to herein as a "second memory") of the neural network processor 210, or in a private memory included in the processing circuit 211. The weight map may be stored in matrix form (that is, as a weight matrix) or in index-based list form (that is, as a weight list). The system memory 230 (also referred to herein as a "first memory") may temporarily store the weight map read from the memory 140 (also referred to herein as the "external memory") before providing it to the internal memory 212 or to the private memory included in a processing circuit 211.

The system memory 230 may also temporarily store an output feature map output from the neural network processor 210.
Fig. 17 is a block diagram of a neural network processor according to some exemplary embodiments of the inventive concept. Fig. 17 shows the neural network processor 210 of Fig. 16 in detail.

Referring to Fig. 17, the neural network processor 210 may include at least one processing circuit 211, a list maker 213, and the internal memory 212 (the "second memory"). The neural network processor 210 may further include a compressor 214 and a selector 215. The processing circuit 211 may include an index re-mapper 21, a first data operation circuit 22 (a "multiplier"), a second data operation circuit 23 (an "accumulator"), and a private memory 24 (a "third memory").

The list maker 213 may generate an input feature list from input features. The list maker 213 may identify the inputs having non-zero values and generate the input feature list of the inputs having non-zero values.

When the received input features form a compressed input feature matrix, the list maker 213 may decompress the input feature matrix and generate the input feature list based on the decompressed input feature matrix. When the received input features include a compressed input feature list, the list maker 213 may generate the input feature list by performing decompression.
The selector 215 may selectively provide either the input feature list output from the list maker 213 or an input feature list received from the internal memory 212 to the processing circuit 211. For example, in a first operating mode, the selector 215 may provide the input feature list from the list maker 213 to the processing circuit 211. The first operating mode may be a linear operation mode; for example, the first operating mode may be a convolution mode. In a second operating mode, the selector 215 may provide the input feature list from the internal memory 212 to the processing circuit 211. The second operating mode may be a pooling mode or a non-linear mode using an activation function. For example, in the second operating mode, a pooling operation may be executed or an activation function may be applied to the output feature values generated in the first operating mode.
The index re-mapper 21 may execute index operations and generate output feature indexes. The index re-mapper 21 may execute the index operations described above with reference to Figs. 4 through 15. The index re-mapper 21 may include arithmetic circuitry.

The index re-mapper 21 may receive the input feature list from the selector 215 and receive the weight list from the private memory 24. The index re-mapper 21 may add an input feature index to a weight index to generate a summed index. The index re-mapper 21 may divide the summed index by a certain (or, alternatively, predetermined) integer (for example, the stride length, or the sampling unit used in a pooling operation).

The index re-mapper 21 may filter the generated indexes so that data operations are executed only for the meaningful indexes among the generated indexes. For example, the index re-mapper 21 may classify the generated indexes into output feature indexes and other indexes, so that the first data operation circuit 22 and/or the second data operation circuit 23 execute data operations for the output feature indexes included in the output feature list. The index re-mapper 21 may control the first data operation circuit 22 and/or the second data operation circuit 23 not to execute operations for the other indexes.
The index re-mapper 21 may request that data stored in the private memory 24 be read. For example, the index re-mapper 21 may request that the private memory 24 read the weight list. As another example, the index re-mapper 21 may, in the second operating mode, transmit a read request signal to the private memory 24, the read request signal being associated with a request to read, among the multiple parameters, the parameter corresponding to a first input feature value. Alternatively, the index re-mapper 21 may request that the private memory 24 output a parameter corresponding to an input feature value (for example, an output feature value in the output feature list).

The private memory 24 may store various data used while the processing circuit 211 executes operations. For example, the private memory 24 may store the weight list. The private memory 24 may also store a lookup table that includes parameters corresponding to input feature values. In response to a request from the index re-mapper 21, the private memory 24 may provide the weight list to the index re-mapper 21 and the first data operation circuit 22. Also in response to a request from the index re-mapper 21, the private memory 24 may provide parameters to the first data operation circuit 22 and the second data operation circuit 23.
The first data operation circuit 22 and the second data operation circuit 23 may execute data operations. Together, the first data operation circuit 22 and the second data operation circuit 23 may form a data operation circuit. The first data operation circuit 22 and the second data operation circuit 23 may execute the data operations described above with reference to Figs. 4 through 15.

The first data operation circuit 22 may perform multiplication and may include a multiplier. When the processing circuit 211 executes a convolution operation, the first data operation circuit 22 may multiply an input feature value in the input feature list by a weight value in the weight list. The multiplication result may be provided to the second data operation circuit 23. The first data operation circuit 22 may be implemented as an array of multipliers.

The second data operation circuit 23 may perform addition and may also perform division. In addition, the second data operation circuit 23 may perform various other types of operations. The second data operation circuit 23 may be implemented as an accumulator or an arithmetic operation circuit. The second data operation circuit 23 may be implemented as an array of operational circuits; for example, the second data operation circuit 23 may be implemented as an array of accumulators.
The internal memory 212 may store data output from the processing circuit 211. For example, the internal memory 212 may store the output feature indexes and corresponding output feature values received from the second data operation circuit 23. In other words, the internal memory 212 may store the output feature list. In addition, the internal memory 212 may store intermediate results output from the processing circuit 211 during an operation. An intermediate result may be provided to the second data operation circuit 23 for use in the operation of the second data operation circuit 23.

The data stored in the internal memory 212 may be provided to the processing circuit 211 through the selector 215. In other words, data obtained by a current operation of the processing circuit 211 may be used in a next operation. For example, an output feature list generated by a convolution operation of the processing circuit 211 may be provided to the processing circuit 211 as an input feature list, and the processing circuit 211 may execute a pooling operation on that input feature list.

Meanwhile, the output feature list may be output from the second data operation circuit 23 to the outside (for example, to the memory 140 of the electronic system 100), or may be stored in the internal memory 212 and then output. The output feature list may be output through the compressor 214. The compressor 214 may compress the output feature list and output the compressed output feature list.

The operation of the processor according to the operating mode is described below with reference to Figs. 18 and 19.
Fig. 18 is a diagram for explaining the state in which a neural network processor according to some exemplary embodiments of the inventive concept runs in the first operating mode. The first operating mode may be a convolution operation mode.

Referring to Fig. 18, the list maker 213 may receive an input feature map IFM and generate an input feature list. The list maker 213 may provide the input feature list to the processing circuit 211.

The index re-mapper 21 and the first data operation circuit 22 may respectively receive, from the weight list stored in the private memory 24, the weight indexes and the weight values corresponding to the weight indexes. The index re-mapper 21 may receive the weight indexes and the first data operation circuit 22 may receive the weight values.

The index re-mapper 21 may execute index operations based on the input feature indexes and the weight indexes, and the first data operation circuit 22 may execute data operations on the input feature values and the weight values. The index re-mapper 21 may add an input feature index to a weight index and may also perform division on the addition value to generate an output feature index.

The index re-mapper 21 may also determine whether an output feature index is meaningful. When it is determined that an output feature index is meaningless, the index re-mapper 21 may control the first data operation circuit 22 not to execute an operation on the input feature value and weight value corresponding to that output feature index. Accordingly, the first data operation circuit 22 may execute operations only on the input feature values and weight values corresponding to meaningful output feature indexes.

The second data operation circuit 23 may add together the operation results output from the first data operation circuit 22 that correspond to the same output feature index. In this way, the first data operation circuit 22 and the second data operation circuit 23 may execute the multiplications and additions included in a convolution operation.

The second data operation circuit 23 may store the output feature list generated by the convolution operation in the internal memory 212, or the output feature list may be output through the compressor 214.
Fig. 19 is a diagram for explaining the state in which a neural network processor according to some exemplary embodiments of the inventive concept runs in the second operating mode. The second operating mode may be executed after the first operating mode. In the second operating mode, an activation function may be applied to the output feature values in the output feature list generated in the first operating mode.

Referring to Fig. 19, the result of the operation executed in the first operating mode may be stored in the internal memory 212. For example, the internal memory 212 may store the output feature list (that is, the result of executing an index-based convolution operation on the input feature list).

The index re-mapper 21 may receive input feature values (that is, the output feature values in the output feature list) from the internal memory 212. The private memory 24 (also referred to herein as a "third memory") may store a lookup table that includes parameters corresponding to input feature values. In other words, the lookup table may include multiple parameters corresponding to each of multiple feature values. A sign function, a sigmoid function, or an exponential function may be used in a neural network. These activation functions are non-linear. The lookup table may include entries that allow a non-linear activation function to be calculated as a piecewise linear function. The output "f" of the activation function for an input feature value "v" may be expressed as the result of applying a piecewise linear function to the input feature value "v", as defined in Equation 1:

f = c(v) · v + b(v)    (1)

where c(v) is a coefficient corresponding to the input feature value "v" and b(v) is a bias value corresponding to the input feature value "v". The lookup table may include parameters corresponding to different input feature values.
The index re-mapper 21 may request the parameters corresponding to the input feature value "v" from the private memory 24. This request may include transmitting a read request signal to the private memory 24, the read request signal being associated with a request to read, among the multiple parameters, the parameters corresponding to the input feature value. The received parameters may include a first parameter and a second parameter received from the private memory 24, where the first parameter and the second parameter correspond to the input feature value. Accordingly, the parameters c(v) and b(v) corresponding to the input feature value "v" may be read from the lookup table stored in the private memory 24. In other words, the output feature value may be generated based on the input feature value, the first parameter, and the second parameter.

The parameter c(v) may be provided to the first data operation circuit 22 and the parameter b(v) may be provided to the second data operation circuit 23. The first data operation circuit 22 may perform multiplication based on the input feature value "v" and the parameter c(v), and the second data operation circuit 23 may perform addition based on the operation result received from the first data operation circuit 22 and the parameter b(v). As a result, the output "f" of the activation function for the input feature value "v" may be generated. The output feature values of the activation function for the multiple input feature values may be output to the outside of the neural network processor. The output feature values of the activation function may be compressed by the compressor 214 before being output to the outside.
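By way of illustration, Equation 1 can be evaluated with a lookup table as sketched below. The segment boundaries and (c, b) parameter pairs are hypothetical, since no concrete table is specified here.

```python
# Piecewise-linear activation per Equation (1): the table maps value ranges
# to (c, b) pairs; the last segment's upper bound must be infinity so that
# every input falls into some segment.
def activate(v, lut):
    for upper, c, b in lut:          # segments ordered by upper bound
        if v <= upper:
            return c * v + b         # f = c(v) * v + b(v)

# Hypothetical three-segment approximation of a sigmoid-like function.
lut = [(-2.0, 0.0, 0.1), (2.0, 0.2, 0.5), (float("inf"), 0.0, 0.9)]
print(activate(1.0, lut))            # 0.7
```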
Fig. 20 is a diagram of the data flow during a convolution operation in a neural network.

Referring to Fig. 20, an input feature map IFM and an output feature map OFM may have three-dimensional matrix form. When a convolution operation is executed, multiple kernels KN0 through KN4 having three-dimensional matrix form may be applied to the input feature map IFM. As a result, the output feature map OFM may be generated.

The kernels KN0 through KN4 may be filters different from one another for obtaining different features from the input feature map IFM. The number of channels CH included in each of the kernels KN0 through KN4 is the same as the number of channels of the input feature map IFM.

When the convolution operation is executed, each of the kernels KN0 through KN4 may be shifted over the x-y plane of the input feature map IFM. Accordingly, the convolution operation between the input feature map IFM and the kernels KN0 through KN4 may be executed channel by channel. For example, the channel CHk of the kernels KN0 through KN4 is applied to the channel CHk of the input feature map IFM in the convolution operation. When the convolution operation is executed by applying one of the kernels KN0 through KN4 to the input feature map IFM, the convolution operation may be executed independently for each channel. The output feature values having the same spatial position (for example, output features, obtained from the convolution operation, that are at the same position on the x-y plane but correspond to different channels) may be added. Accordingly, the result of executing the convolution operation by applying one of the kernels KN0 through KN4 to the input feature map IFM may correspond to one channel of the output feature map OFM.

When the convolution operation is executed based on the multiple kernels KN0 through KN4, multiple channels may be generated. As shown in Fig. 20, when the convolution operation is executed based on the five kernels KN0 through KN4, the output feature map OFM may include five channels.

The convolution operations respectively using the kernels KN0 through KN4 may be executed simultaneously in parallel, in different processing circuits. However, this may vary with the hardware structure of the neural network.
Figs. 21 and 22 are diagrams of data processing during a convolution operation executed in an index-based neural network according to some exemplary embodiments of the inventive concept. Fig. 21 shows data processing that allows an index-based convolution operation to be executed efficiently in a sparse neural network, that is, a neural network whose input feature map and weight map have sparse non-zero values.

As described above with reference to Fig. 20, the convolution operations based on the kernels KN0 through KN4 may be executed simultaneously in parallel in different processing circuits. However, according to the present exemplary embodiment of the inventive concept, when convolution operations are executed simultaneously in parallel in different processing circuits for the individual channels of the input feature map IFM in an index-based neural network (and, in particular, in a sparse neural network), the operations on the input features having non-zero values may be executed while the operations on the input features having the value zero may be skipped. Since the input features having non-zero values occupy different spatial positions in the multiple channels of the input feature map IFM, executing operations for each channel of the input feature map IFM in a different processing circuit makes it easier to skip the operations on zero values in each channel.
As described above, in order to execute convolution operations in parallel in different processing circuits for the individual channels of the input feature map IFM, the index-based neural network may divide each kernel by channel and regroup the same channels of the kernels into one channel group.

Referring to Fig. 21, the channels of the first kernel KN0 through the fifth kernel KN4 shown in Fig. 20 may be regrouped. For example, the first channels of the kernels KN0 through KN4 may be regrouped into a first channel group CH0, and the second channels of the kernels KN0 through KN4 may be regrouped into a second channel group CH1. In this fashion, the multiple channels of the kernels KN0 through KN4 may be regrouped into different channel groups. Since the number of channels of each kernel is the same as the number of channels "n" of the input feature map, "n" channel groups CH0 through CHn-1 may be generated by the regrouping. Each channel group may be referred to as a core.
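The regrouping can be sketched as follows; the nested-list representation of the kernels is an assumption made for the illustration.

```python
# Regrouping per Fig. 21: kernels indexed [kernel][channel] are regrouped so
# that channel group g collects channel g of every kernel; group g is then
# convolved with channel g of the input feature map.
def regroup(kernels):
    num_channels = len(kernels[0])
    return [[kernel[ch] for kernel in kernels]   # one group per input channel
            for ch in range(num_channels)]

kernels = [[f"K{k}C{c}" for c in range(3)] for k in range(5)]
print(regroup(kernels)[1])  # ['K0C1', 'K1C1', 'K2C1', 'K3C1', 'K4C1'] = CH1
```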
When the convolution operation is executed, the channel group corresponding to each channel of the input feature map IFM may be used. For example, the convolution operation may be executed on the second channel of the input feature map IFM and the second channel group CH1. Each of the channel groups CH0 through CHn-1 includes channels of the kernels KN0 through KN4, and therefore the result of a convolution operation based on one of the channel groups CH0 through CHn-1 may influence all of the first through fifth channels of the output feature map OFM. The output feature map is completed when, among the convolution operation results of the "n" channel groups, the convolution results that originate from one kernel and correspond to one spatial position on the output feature map OFM are added together.
Referring to Fig. 22, the input features IFB that are in different channels but have the same spatial position (that is, the same index) may be convolved with different channel groups. Since operations are performed on non-zero values in the index-based neural network according to some exemplary embodiments of the inventive concept, no operations are executed on the input features having the value zero. Accordingly, the operations of the processing circuits respectively corresponding to the first channel, which includes the first input feature F0 having the value zero, the sixth channel, which includes the sixth input feature F5 having the value zero, and the ninth channel, which includes the ninth input feature F8 having the value zero, may be suspended. However, since the index-based neural network device 200 runs based on the indexes corresponding to the input features having non-zero values, and the input features having non-zero values are provided to the corresponding processing circuits, the processing circuits may run substantially without interruption until the operations on the input features having non-zero values in each channel of the input feature map IFM are completed.
Fig. 23 is a diagram of a neural network processor 210a according to some exemplary embodiments of the inventive concept. The neural network processor 210a may have a hardware structure suitable for the sparse neural network operation described with reference to Figs. 21 and 22 and may execute operations in parallel for the individual channels of the input feature map IFM.

Referring to Fig. 23, the neural network processor 210a may include a selector 215a, multiple processing circuits 211a_0 through 211a_k, and a global accumulator 216. The neural network processor 210a may further include a list maker and a compressor.
The neural network processor 210a may generate an input feature list for each channel of the input feature map IFM. The selector 215a may provide the input feature list of the input features included in each channel to one of the processing circuits 211a_0 through 211a_k. For example, the selector 215a may provide the input feature list of the input features included in the first channel to the first processing circuit 211a_0, and may provide the input feature list of the input features included in the k-th channel to the k-th processing circuit 211a_k.

The processing circuits 211a_0 through 211a_k may respectively correspond to the channels of the input feature map IFM. In other words, each of the processing circuits 211a_0 through 211a_k may correspond to a core (that is, one of the channel groups shown in Figs. 21 and 22). The structure of each of the processing circuits 211a_0 through 211a_k is similar to the structure of the processing circuit 211 shown in Fig. 17. However, each of the processing circuits 211a_0 through 211a_k may include multiple elements corresponding to one element of the processing circuit 211 so as to execute operations in parallel for multiple input features.

For example, the first processing circuit 211a_0 may include multiple index re-mappers 21a, multiple first data operation circuits 22a, multiple second data operation circuits 23a, and a private memory 24a.
Each of the index re-mappers 21a may include arithmetic circuitry. The first data operation circuits 22a may be an array of multipliers. The second data operation circuits 23a may be an array of adders. However, the inventive concept is not limited thereto; each of the second data operation circuits 23a may also include arithmetic circuitry.

The private memory 24a may store a weight list WL or a lookup table LUT. When the neural network processor 210a executes a convolution operation, the private memory 24a may output the weight indexes from the weight list WL to the index re-mappers 21a and output the weight values corresponding to the weights to the first data operation circuits 22a. The weight list WL may include a weight index, a weight value, and a kernel index corresponding to each weight.

When the neural network processor 210a executes a non-linear operation, the private memory 24a may provide the parameters corresponding to the input features to the first data operation circuits 22a and the second data operation circuits 23a to support a piecewise linear function.
The operation of the first processing circuit 211a_0 is similar to the operation of the processing circuit 211 described with reference to Figs. 17 through 19. However, the index re-mappers 21a may execute index operations in parallel, and the first data operation circuits 22a and the second data operation circuits 23a may execute data operations in parallel. The other processing circuits 211a_1 through 211a_k may include substantially the same elements as the first processing circuit 211a_0 and may execute substantially the same operations as the first processing circuit 211a_0.
Meanwhile certain values in the operation values exported from each processing circuit 211a_0 to 211a_k can correspond to output spy
Levy the same position of mapping graph.Therefore, global accumulator 216 can be to exporting but corresponding to output spy from different processing circuits
The operation values of same position on sign mapping graph are added.
At this point, due to the characteristic of sparse neural network, the operation values quilt that is exported from processing circuit 211a_0 to 211a_k
Be mapped in output Feature Mapping figure on position can random distribution, and from processing circuit 211a_0 to 211a_k simultaneously output fortune
It the position that calculation value is mapped to can be mutually the same on output Feature Mapping figure.When global accumulator 216 adds up from real time
When the operation values that reason circuit 211a_0 to 211a_k is exported, the load of global accumulator 216 can be excessively increased.
For this reason, the second number included in each processing circuit in processing circuit 211a_0 to 211a_k
According to computing circuit 23a can according on output Feature Mapping figure spatial position and channel come to from the first data operation circuit 22a
The operation values of output are added generates addition value to be directed to each spatial position and channel.Processing circuit 211a_0 is extremely
211a_k can be synchronized to export addition value.Each in second data operation circuit 23a may include static random-access
Memory bank (SRAM bank) with according to output Feature Mapping figure on spatial position and channel come to from the first data transport
The operation values for calculating circuit 22a outputs are added.
The addition value exported from processing circuit 211a_0 to 211a_k can be according to the corresponding position on output Feature Mapping figure
It sets and is outputted as vector data.Global accumulator 216 can add up to vector data.
Fig. 24 is a diagram of data processing during a convolution operation executed in an index-based neural network according to some exemplary embodiments of the inventive concept. Fig. 24 shows data processing that allows an index-based convolution operation to be executed efficiently in a dense neural network, that is, a neural network whose input feature map and weight map have dense non-zero values.

Since a dense neural network has few input features or weights with the value zero, operations may be executed efficiently by simplifying the operation steps rather than by skipping the operations on zero values among the operation steps.
Referring to Fig. 24, the input feature map IFM may be convolved with each of the kernels KN0 through KN4. The convolution operations based on the respective kernels KN0 through KN4 may be executed in parallel in different processing circuits.

As described above with reference to Fig. 20, when a convolution operation is executed between the input feature map IFM and one of the kernels KN0 through KN4, the convolution operation is executed for the same channel. Among the operation values obtained by the convolution operation, the operation values corresponding to an output feature index indicating one spatial position on the output feature map OFM may be added together. The convolution operation between the input feature map IFM and one kernel may form one channel of the output feature map OFM.
The input features corresponding to an input feature index indicating one spatial position may be expressed as an input feature vector, and the weights corresponding to a weight index indicating one spatial position may be expressed as a weight vector. Accordingly, the input feature map may include input feature indexes and the input feature vectors corresponding to the input feature indexes, and the weight list may include weight indexes and the weight vectors corresponding to the weight indexes. For example, each of the kernels KN0 through KN4 shown in Fig. 24 may have nine indexes, and the weight list may include the nine indexes and the weight vectors respectively corresponding to the nine indexes.

An input feature index is added to a weight index to generate an output feature index. The dot product of the feature vector and the weight vector may be output as the operation value corresponding to the output feature index. Multiple operation values may exist for one output feature index. The operation values may be added together to generate the output feature value corresponding to the output feature index.
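By way of illustration, the dense index-based convolution reduces to an index addition and a vector dot product per index pair; the list layout below is an assumption of the sketch.

```python
# Dense, vectorized variant per Fig. 24: each index carries a vector over
# channels, the indexes are added, and the dot product of the feature vector
# and the weight vector is accumulated per output feature index.
from collections import defaultdict

def dense_index_conv(feature_list, weight_list):
    ofm = defaultdict(float)
    for (fr, fc), fvec in feature_list:
        for (wr, wc), wvec in weight_list:
            ofm[(fr + wr, fc + wc)] += sum(a * b for a, b in zip(fvec, wvec))
    return dict(ofm)

features = [((1, 1), [3.0, 2.0])]
weights = [((1, 0), [0.5, 1.0])]
print(dense_index_conv(features, weights))   # {(2, 1): 3.5}
```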
Figure 25 is the figure according to the neural network processor 210b of some exemplary embodiments of concept of the present invention.Shown in Figure 25
Neural network processor 210b, which can have, is suitable for the hardware configuration with reference to intensive neural network computing described in Figure 24 and can needle
Operation is performed in parallel to each kernel.
It may include multiple processing circuit 211b_0 to 211b_k with reference to Figure 25, neural network processor 210b.Neural network
Processor 210b may also comprise by processing circuit 211b_0 to the 211b_k internal storages shared or support each processing electricity
Multiple internal storages of road 211b_0 to 211b_k.Neural network processor 210b may also comprise list maker and compression
Device.
Processing circuit 211b_0 to 211b_k can correspond respectively to different kernels.Processing circuit 211b_0 is to 211b_k's
Structure is similar to the structure of processing circuit 211 shown in Figure 17.However, since processing circuit 211b_0 to 211b_k can calculate vector
Dot product, therefore each in processing circuit 211b_0 to 211b_k may include address remapped device 21b, multiple first data fortune
Calculate circuit 22b and multiple second data operation circuit 23b.Each in processing circuit 211b_0 to 211b_k may include using
In the private memory 24b of storage weighted list.Weighted list may include weight index and weight corresponding with weight index to
Amount.
The address remapper 21b may include an arithmetic circuit. The first data operation circuit 22b may be an array of multipliers, and the second data operation circuit 23b may be an array of adders. The address remapper 21b may perform an operation on an input feature index received from the outside and a weight index provided from the dedicated memory 24b; the first data operation circuit 22b may multiply input feature values by weight values; and the second data operation circuit 23b may add the multiplication values obtained by the multiplications. Accordingly, a dot product may be performed on the input feature vector corresponding to the input feature index and the weight vector corresponding to the weight index.
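A minimal software model of one such processing circuit, under the same caveat that the names and data layout are assumptions of this sketch rather than the described hardware, might look as follows.

```python
import numpy as np

def processing_circuit(in_idx, in_vec, weight_list):
    """Model one processing circuit handling one kernel: the address
    remapper adds indices, the multiplier array forms elementwise
    products, and the adder array reduces them to an operation value."""
    results = []
    for w_idx, w_vec in weight_list:              # weight list from dedicated memory 24b
        out_idx = (in_idx[0] + w_idx[0],
                   in_idx[1] + w_idx[1])          # address remapper 21b
        products = np.multiply(in_vec, w_vec)     # multiplier array 22b
        operation_value = float(np.sum(products)) # adder array 23b
        results.append((out_idx, operation_value))
    return results
```

In this model, each of the processing circuits would run the loop for its own kernel in parallel, which is consistent with keeping each weight list in a per-circuit dedicated memory.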
While the inventive concept has been particularly shown and described with reference to embodiments thereof, it will be understood that various changes in form and details may be made therein without departing from the spirit and scope of the following claims.
Claims (20)
1. A method of operating a neural network device, the method comprising:
generating an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature;
generating an output feature index based on a first operation performed on the input feature index and a weight index of a weight list; and
generating an output feature value corresponding to the output feature index based on a second operation performed on the input feature value and a weight value corresponding to the weight index.
2. The method of claim 1, wherein the generating of the input feature list comprises generating the input feature list based on at least one input feature having a non-zero value among a plurality of input features of the input feature map.
3. The method of claim 1, wherein the weight list includes at least one weight index and at least one weight value, the at least one weight index and the at least one weight value corresponding to at least one weight among a plurality of weights of a weight map, the at least one weight having a non-zero value.
4. The method of claim 1, wherein the generating of the output feature value comprises multiplying the input feature value by the weight value.
5. The method of claim 1, wherein the generating of the output feature value comprises:
generating a multiplication value based on multiplying the input feature value by the weight value, the multiplication value corresponding to the output feature index; and
generating the output feature value based on adding a plurality of multiplication values together, the plurality of multiplication values corresponding to the output feature index.
6. The method of claim 1, wherein the generating of the output feature index comprises adding the input feature index to the weight index.
7. The method of claim 6, wherein the generating of the output feature index further comprises:
dividing an addition value obtained by the adding by an integer; and
selecting a quotient of the division as the output feature index based on a determination that no remainder exists after the division is completed.
8. The method of claim 1, wherein the generating of the input feature list comprises:
generating an initial input feature list, the initial input feature list including an initial input feature index corresponding to a position of the input feature and the input feature value corresponding to the input feature; and
generating the input feature index, which reflects zero padding, based on adding a feature offset index to the initial input feature index.
9. The method of claim 1, further comprising generating the weight list from a weight map.
10. The method of claim 9, wherein the generating of the weight list comprises:
generating an initial weight list, the initial weight list including an initial weight index corresponding to a position of a weight and a weight value of the weight from the weight map; and
adjusting the initial weight index to correspond to certain operations.
11. The method of claim 10, wherein the adjusting of the initial weight index comprises:
forming a mirror image of the initial weight index based on a weight bias index, the weight bias index indicating a center of a matrix of the weight map; and
subtracting the weight bias index from the mirrored weight index.
12. The method of claim 1, further comprising generating an information signal based on the output feature value.
13. A method of operating a neural network device, the method comprising:
generating an input feature list, the input feature list including an input feature index and an input feature value corresponding to an input feature having a non-zero value, the input feature index indicating a position of the input feature on an input feature map;
generating an output feature index based on an index operation performed on the input feature index; and
generating an output feature value corresponding to the output feature index based on a data operation performed on the input feature value.
14. The method of claim 13, wherein the generating of the output feature index comprises adding the input feature index to a weight index of a weight list, the weight index corresponding to a weight having a non-zero value.
15. The method of claim 14, wherein the generating of the output feature value comprises multiplying the input feature value by a weight value corresponding to the weight index.
16. The method of claim 13, wherein the generating of the output feature index comprises:
performing a division on the input feature index based on a certain sub-sampling size; and
selecting a quotient of the division as the output feature index, the output feature index corresponding to the input feature.
17. The method of claim 13, wherein the generating of the output feature value comprises calculating the output feature value corresponding to the output feature index as:
a maximum value among a plurality of input feature values corresponding to the output feature index, or
an average value of the plurality of input feature values.
18. A method of operating a neural network device, comprising:
generating, using a list maker of a processor, an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature; and
causing an index remapper of the processor to perform a first operation to generate an output feature index, the first operation including:
adding the input feature index to a weight index of a weight list,
dividing an addition value obtained by the adding by an integer, and
selecting a quotient of the division as the output feature index based on a determination that no remainder exists after the division is completed.
19. The method of claim 18, further comprising:
causing a data operation circuit to perform a second operation on the input feature value and a weight value corresponding to the weight index to generate an output feature value corresponding to the output feature index.
20. The method of claim 19, wherein the generating of the output feature value comprises:
generating a multiplication value based on multiplying the input feature value by the weight value, the multiplication value corresponding to the output feature index; and
generating the output feature value based on adding a plurality of multiplication values together, the plurality of multiplication values corresponding to the output feature index.
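To make the index arithmetic of claims 6, 7, and 18 concrete, the following hypothetical sketch (the 1-D index and the name `stride` are assumptions of this sketch, not the claim language) performs the first operation: add the indices, divide by an integer, and keep the quotient only when no remainder exists.

```python
def first_operation(input_index, weight_index, stride):
    """Sketch of the first operation of claims 6-7 and 18, assuming
    non-negative 1-D indices and an integer sub-sampling factor."""
    addition_value = input_index + weight_index
    quotient, remainder = divmod(addition_value, stride)
    # The quotient becomes the output feature index only when the
    # division leaves no remainder; other positions are skipped.
    return quotient if remainder == 0 else None
```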
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2017-0027778 | 2017-03-03 | ||
KR1020170027778A KR102499396B1 (en) | 2017-03-03 | 2017-03-03 | Neural network device and operating method of neural network device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108537325A true CN108537325A (en) | 2018-09-14 |
CN108537325B CN108537325B (en) | 2024-06-07 |
Family
ID=63355193
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810167217.XA Active CN108537325B (en) | 2017-03-03 | 2018-02-28 | Method of operating a neural network device |
Country Status (4)
Country | Link |
---|---|
US (2) | US11295195B2 (en) |
KR (1) | KR102499396B1 (en) |
CN (1) | CN108537325B (en) |
TW (1) | TWI765979B (en) |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10878310B2 (en) * | 2016-11-29 | 2020-12-29 | Mellanox Technologies, Ltd. | Accelerated convolution in convolutional neural networks |
US10474458B2 (en) | 2017-04-28 | 2019-11-12 | Intel Corporation | Instructions and logic to perform floating-point and integer operations for machine learning |
JP7072640B2 (en) * | 2017-05-19 | 2022-05-20 | モヴィディウス リミテッド | Methods, systems, and equipment to improve convolution efficiency |
US10489542B2 (en) * | 2018-04-24 | 2019-11-26 | Nvidia Corp. | Machine learning based post route path delay estimator from synthesis netlist |
TWI680409B (en) * | 2017-07-08 | 2019-12-21 | 英屬開曼群島商意騰科技股份有限公司 | Method for matrix by vector multiplication for use in artificial neural network |
JP2019036899A (en) * | 2017-08-21 | 2019-03-07 | 株式会社東芝 | Information processing unit, information processing method and program |
US10366322B2 (en) * | 2017-10-06 | 2019-07-30 | DeepCube LTD. | System and method for compact and efficient sparse neural networks |
DE102018203709A1 (en) * | 2018-03-12 | 2019-09-12 | Robert Bosch Gmbh | Method and device for memory-efficient operation of a neural network |
US10572568B2 (en) * | 2018-03-28 | 2020-02-25 | Intel Corporation | Accelerator for sparse-dense matrix multiplication |
US11782839B2 (en) * | 2018-08-21 | 2023-10-10 | Neuchips Corporation | Feature map caching method of convolutional neural network and system thereof |
US11467973B1 (en) * | 2018-09-28 | 2022-10-11 | Amazon Technologies, Inc. | Fine-grained access memory controller |
US11615505B2 (en) * | 2018-09-30 | 2023-03-28 | Boe Technology Group Co., Ltd. | Apparatus and method for image processing, and system for training neural network |
US11610111B2 (en) * | 2018-10-03 | 2023-03-21 | Northeastern University | Real-time cognitive wireless networking through deep learning in transmission and reception communication paths |
CN110770763A (en) * | 2018-10-08 | 2020-02-07 | 深圳市大疆创新科技有限公司 | Data storage device, method, processor and removable equipment |
US20200143226A1 (en) * | 2018-11-05 | 2020-05-07 | Samsung Electronics Co., Ltd. | Lossy compression of neural network activation maps |
KR102137151B1 (en) * | 2018-12-27 | 2020-07-24 | 엘지전자 주식회사 | Apparatus for noise canceling and method for the same |
US11488016B2 (en) | 2019-01-23 | 2022-11-01 | Google Llc | Look-up table based neural networks |
KR20200091623A (en) * | 2019-01-23 | 2020-07-31 | 삼성전자주식회사 | Method and device for performing convolution operation on neural network based on Winograd transform |
KR20200094534A (en) * | 2019-01-30 | 2020-08-07 | 삼성전자주식회사 | Neural network apparatus and method for processing multi-bits operation thereof |
US11934342B2 (en) | 2019-03-15 | 2024-03-19 | Intel Corporation | Assistance for hardware prefetch in cache access |
CN113396400A (en) | 2019-03-15 | 2021-09-14 | 英特尔公司 | System and method for providing hierarchical openly partitioned sectors and variable sector sizes for cache operations |
WO2020190807A1 (en) * | 2019-03-15 | 2020-09-24 | Intel Corporation | Systolic disaggregation within a matrix accelerator architecture |
EP4130988A1 (en) | 2019-03-15 | 2023-02-08 | INTEL Corporation | Systems and methods for cache optimization |
TWI745697B (en) * | 2019-05-24 | 2021-11-11 | 創鑫智慧股份有限公司 | Computing system and compressing method thereof for neural network parameters |
CN110163370B (en) * | 2019-05-24 | 2021-09-17 | 上海肇观电子科技有限公司 | Deep neural network compression method, chip, electronic device and medium |
US20210064987A1 (en) * | 2019-09-03 | 2021-03-04 | Nvidia Corporation | Processor and system to convert tensor operations in machine learning |
US11663452B2 (en) * | 2019-09-25 | 2023-05-30 | Intel Corporation | Processor array for processing sparse binary neural networks |
KR20210084123A (en) * | 2019-12-27 | 2021-07-07 | 삼성전자주식회사 | Electronic apparatus and controlling method thereof |
US11113601B1 (en) * | 2020-06-30 | 2021-09-07 | Moffett Technologies Co., Limited | Method and system for balanced-weight sparse convolution processing |
KR20220034520A (en) * | 2020-09-11 | 2022-03-18 | 삼성전자주식회사 | Processing apparatus, computing apparatus, and operating method of processing apparatus |
GB2599098B (en) * | 2020-09-22 | 2024-04-10 | Imagination Tech Ltd | Hardware implementation of windowed operations in three or more dimensions |
US20220108328A1 (en) * | 2020-10-06 | 2022-04-07 | Mastercard International Incorporated | Systems and methods for linking indices associated with environmental impact determinations for transactions |
CN115481713A (en) * | 2021-06-15 | 2022-12-16 | 瑞昱半导体股份有限公司 | Method for improving convolution neural network to calculate |
JPWO2023105616A1 (en) * | 2021-12-07 | 2023-06-15 |
Family Cites Families (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3246764B2 (en) | 1992-05-11 | 2002-01-15 | 株式会社東芝 | Neurochip |
JPH08305846A (en) | 1995-03-07 | 1996-11-22 | Matsushita Electric Ind Co Ltd | Neuro filter, image area dividing method, and filter device |
US6516309B1 (en) | 1998-07-17 | 2003-02-04 | Advanced Research & Technology Institute | Method and apparatus for evolving a neural network |
MXPA03005942A (en) | 2000-11-30 | 2005-02-14 | Pok Yang Ming | Neural cortex. |
US7634137B2 (en) | 2005-10-14 | 2009-12-15 | Microsoft Corporation | Unfolded convolution for fast feature extraction |
JP5184824B2 (en) * | 2007-06-15 | 2013-04-17 | キヤノン株式会社 | Arithmetic processing apparatus and method |
US10366325B2 (en) | 2011-12-07 | 2019-07-30 | Paul Burchard | Sparse neural control |
US9147154B2 (en) * | 2013-03-13 | 2015-09-29 | Google Inc. | Classifying resources using a deep network |
US9053558B2 (en) | 2013-07-26 | 2015-06-09 | Rui Shen | Method and system for fusing multiple images |
US9730643B2 (en) | 2013-10-17 | 2017-08-15 | Siemens Healthcare Gmbh | Method and system for anatomical object detection using marginal space deep neural networks |
CN104809426B (en) | 2014-01-27 | 2019-04-05 | 日本电气株式会社 | Training method, target identification method and the device of convolutional neural networks |
US10102474B2 (en) | 2014-03-28 | 2018-10-16 | International Business Machines Corporation | Event-based neural network with hierarchical addressing for routing event packets between core circuits of the neural network |
US20150286925A1 (en) | 2014-04-08 | 2015-10-08 | Qualcomm Incorporated | Modulating plasticity by global scalar values in a spiking neural network |
CN105488515B (en) | 2014-09-17 | 2019-06-25 | 富士通株式会社 | The image processing method and image processing apparatus that a kind of pair of image is classified |
EP3796235B1 (en) | 2014-12-17 | 2024-09-04 | Google LLC | Generating numeric embeddings of images |
US10515304B2 (en) | 2015-04-28 | 2019-12-24 | Qualcomm Incorporated | Filter specificity as training criterion for neural networks |
US10013652B2 (en) | 2015-04-29 | 2018-07-03 | Nuance Communications, Inc. | Fast deep neural network feature transformation via optimized memory bandwidth utilization |
US11423311B2 (en) * | 2015-06-04 | 2022-08-23 | Samsung Electronics Co., Ltd. | Automatic tuning of artificial neural networks |
WO2017031630A1 (en) * | 2015-08-21 | 2017-03-02 | 中国科学院自动化研究所 | Deep convolutional neural network acceleration and compression method based on parameter quantification |
US10366337B2 (en) * | 2016-02-24 | 2019-07-30 | Bank Of America Corporation | Computerized system for evaluating the likelihood of technology change incidents |
US11907843B2 (en) * | 2016-06-30 | 2024-02-20 | Intel Corporation | Importance-aware model pruning and re-training for efficient convolutional neural networks |
KR20180034853A (en) * | 2016-09-28 | 2018-04-05 | 에스케이하이닉스 주식회사 | Apparatus and method test operating of convolutional neural network |
US10510146B2 (en) * | 2016-10-06 | 2019-12-17 | Qualcomm Incorporated | Neural network for image processing |
WO2018073975A1 (en) * | 2016-10-21 | 2018-04-26 | Nec Corporation | Improved sparse convolution neural network |
KR20180073118A (en) * | 2016-12-22 | 2018-07-02 | 삼성전자주식회사 | Convolutional neural network processing method and apparatus |
2017
- 2017-03-03 KR KR1020170027778A patent/KR102499396B1/en active IP Right Grant
2018
- 2018-01-08 US US15/864,379 patent/US11295195B2/en active Active
- 2018-02-28 CN CN201810167217.XA patent/CN108537325B/en active Active
- 2018-03-02 TW TW107107109A patent/TWI765979B/en active
2022
- 2022-04-04 US US17/712,247 patent/US20220261615A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5812698A (en) * | 1995-05-12 | 1998-09-22 | Synaptics, Inc. | Handwriting recognition system and method |
US6674855B1 (en) * | 1999-10-06 | 2004-01-06 | Comverse Ltd. | High performance multifrequency signal detection |
US20080162385A1 (en) * | 2006-12-28 | 2008-07-03 | Yahoo! Inc. | System and method for learning a weighted index to categorize objects |
US8463591B1 (en) * | 2009-07-31 | 2013-06-11 | Google Inc. | Efficient polynomial mapping of data for use with linear support vector machines |
US20160358069A1 (en) * | 2015-06-03 | 2016-12-08 | Samsung Electronics Co., Ltd. | Neural network suppression |
Non-Patent Citations (2)
Title |
---|
HYUNSUN PARK et al.: "Zero and data reuse-aware fast convolution for deep neural networks on GPU", IEEE, 31 December 2016 (2016-12-31) *
HU Ping: "An instance retrieval model based on artificial neural networks and the nearest-neighbor algorithm" (基于人工神经网络和最近邻算法的实例检索模型), Modular Machine Tool & Automatic Manufacturing Technique (组合机床与自动化加工技术), no. 12, 20 November 2008 (2008-11-20) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109726633A (en) * | 2018-11-23 | 2019-05-07 | 成都品果科技有限公司 | A kind of face critical point detection method based on look-up table activation primitive |
CN112364032A (en) * | 2021-01-12 | 2021-02-12 | 浙江正元智慧科技股份有限公司 | Data center data query method based on Internet technology |
CN112364032B (en) * | 2021-01-12 | 2021-08-24 | 浙江正元智慧科技股份有限公司 | Data center data query method based on Internet technology |
Also Published As
Publication number | Publication date |
---|---|
US20180253635A1 (en) | 2018-09-06 |
TWI765979B (en) | 2022-06-01 |
TW201833823A (en) | 2018-09-16 |
KR102499396B1 (en) | 2023-02-13 |
US20220261615A1 (en) | 2022-08-18 |
US11295195B2 (en) | 2022-04-05 |
CN108537325B (en) | 2024-06-07 |
KR20180101055A (en) | 2018-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108537325A (en) | The method for operating neural network device | |
US20200234124A1 (en) | Winograd transform convolution operations for neural networks | |
US20180253636A1 (en) | Neural network apparatus, neural network processor, and method of operating neural network processor | |
US20230418610A1 (en) | Deep vision processor | |
US11533458B2 (en) | Image processing device including neural network processor and operating method thereof | |
KR20200066953A (en) | Semiconductor memory device employing processing in memory (PIM) and operating method for the same | |
CN109064384A (en) | Object detecting method and Related product | |
KR20180101978A (en) | Neural network device, Neural network processor and method of operating neural network processor | |
US11562046B2 (en) | Neural network processor using dyadic weight matrix and operation method thereof | |
KR20190036317A (en) | Neural network system, and Operating method of neural network system | |
JP2019102084A (en) | Method and apparatus for processing convolution operation in neural network | |
CN106445471A (en) | Processor and method for executing matrix multiplication on processor | |
KR20210154502A (en) | Neural network apparatus performing floating point operation and operating method of the same | |
CN113033790A (en) | Neural network device and method of operating the same | |
TW202014934A (en) | Electronic system and non-transitory computer-readable recording medium | |
KR20210045225A (en) | Method and apparatus for performing operation in neural network | |
CN109496319A (en) | Artificial intelligence process device hardware optimization method, system, storage medium, terminal | |
US20200159495A1 (en) | Processing apparatus and method of processing add operation therein | |
US20200057932A1 (en) | System and method for generating time-spectral diagrams in an integrated circuit solution | |
WO2022133814A1 (en) | Omni-scale convolution for convolutional neural networks | |
CN116704200A (en) | Image feature extraction and image noise reduction method and related device | |
CN109993290B (en) | Integrated circuit chip device and related product | |
KR20200062014A (en) | Apparatus for accelerating neural network using weight with dyadic matrix form and operation method thereof | |
KR20200094534A (en) | Neural network apparatus and method for processing multi-bits operation thereof | |
CN114626515A (en) | NPU device for executing convolution operation based on channel number and operation method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |