CN108537325A - Method of operating a neural network device - Google Patents

Method of operating a neural network device

Info

Publication number
CN108537325A
CN108537325A
Authority
CN
China
Prior art keywords
feature vector
input feature
index
value
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810167217.XA
Other languages
Chinese (zh)
Inventor
朴峻奭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of CN108537325A

Links

Classifications

    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N3/08 Learning methods
    • G06F7/523 Multiplying only
    • G06F7/535 Dividing only
    • G06F7/5443 Sum of products
    • G06F2207/4824 Neural networks (indexing scheme relating to groups G06F7/48 - G06F7/575)

Abstract

A method of operating a neural network device may generate an input feature list based on an input feature map, the input feature list including input feature indexes and input feature values; generate an output feature index based on an input feature index corresponding to an input feature included in the input feature list and a weight index corresponding to a weight included in a weight list; and generate an output feature value corresponding to the output feature index based on the input feature value corresponding to the input feature and the weight value corresponding to the weight.

Description

Method of operating a neural network device
[Cross-Reference to Related Application]
This application claims the benefit of Korean Patent Application No. 10-2017-0027778, filed on March 3, 2017 in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.
Technical Field
The inventive concept relates to semiconductor devices, and more particularly, to a neural network device configured to perform operations based on one or more indexes and to one or more methods of operating the neural network device.
Background
A neural network refers to a computational architecture that models a biological brain. As neural network technology has developed in recent years, there has been extensive research into using neural network devices in various kinds of electronic systems to analyze input data and extract valid information.
A neural network device may perform a relatively large number of operations ("neural network operations") on complex input data. Efficient processing of neural network operations is desired so that a neural network device can analyze high-definition input and extract information in real time.
Summary
The inventive concept provides a neural network device with improved operating speed and reduced power consumption, and a method of operating the same.
According to some example embodiments, a method of operating a neural network device may include: generating an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value that correspond to an input feature; generating an output feature index based on a first operation performed on the input feature index and a weight index of a weight list; and generating an output feature value corresponding to the output feature index based on a second operation performed on the input feature value and a weight value corresponding to the weight index.
According to other example embodiments, a method of operating a neural network device may include: generating an input feature list, the input feature list including an input feature index and an input feature value that correspond to an input feature having a non-zero value, the input feature index indicating a position of the input feature on an input feature map; generating an output feature index based on an index operation performed on the input feature index; and generating an output feature value corresponding to the output feature index based on a data operation performed on the input feature value.
According to some example embodiments, a neural network device may include a first memory storing an instruction program, and a processor. The processor may be configured to execute the instruction program to perform an index operation based on an input feature index, the input feature index indicating a position of an input feature on an input feature map; generate an output feature index based on an index operation result of the index operation; perform a data operation based on an input feature value of the input feature; and generate an output feature value corresponding to the output feature index based on a data operation result of the data operation.
According to some example embodiments, a method of operating a neural network device may include: generating, using an index remapper of a processor, an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value that correspond to an input feature; and performing, using the index remapper, a first operation to generate an output feature index. The first operation may include adding the input feature index and a weight index of a weight list, dividing the addition value obtained by the adding by an integer, and selecting the quotient of the division as the output feature index based on determining that no remainder remains after the division.
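As a rough illustration of the first operation described in the preceding paragraph (this sketch is not part of the disclosure; it assumes that indexes are (row, column) pairs and that the integer divisor plays the role of a convolution stride):

```python
def remap_index(input_index, weight_index, divisor):
    """Add an input feature index to a weight index and keep the result only when
    both coordinates are divisible by `divisor` (assumed here to act like a stride)."""
    row = input_index[0] + weight_index[0]
    col = input_index[1] + weight_index[1]
    if row % divisor == 0 and col % divisor == 0:
        return (row // divisor, col // divisor)  # quotient becomes the output feature index
    return None  # a remainder means the sum does not land on the strided output grid

# Example with a divisor (stride) of 2; indexes are hypothetical.
print(remap_index((1, 3), (1, 1), 2))  # -> (1, 2)
print(remap_index((1, 2), (1, 1), 2))  # -> None (column sum 3 leaves a remainder)
```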
Brief Description of the Drawings
Example embodiments of the inventive concept will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram of an electronic system according to some example embodiments of the inventive concept.
Fig. 2 is a diagram of a neural network architecture according to some example embodiments.
Fig. 3 is a diagram of an input feature list according to some example embodiments of the inventive concept.
Fig. 4 is a flowchart of an index-based neural network operation method according to some example embodiments of the inventive concept.
Fig. 5 is a flowchart of an index-based convolution operation method according to some example embodiments of the inventive concept.
Fig. 6 is a diagram of a convolution operation according to some example embodiments.
Fig. 7A, Fig. 7B, Fig. 7C, Fig. 7D, Fig. 7E, and Fig. 7F are diagrams of snapshots of valid operation results during the convolution operation shown in Fig. 6.
Fig. 8A, Fig. 8B, and Fig. 8C are diagrams for explaining an index-based convolution operation according to some example embodiments of the inventive concept.
Fig. 9A and Fig. 9B are diagrams for explaining an index-based convolution operation according to some example embodiments of the inventive concept.
Fig. 10 is a flowchart of an index-based zero-padding method according to some example embodiments of the inventive concept.
Fig. 11A is a diagram of an example of applying zero padding to an input feature map in a neural network according to some example embodiments.
Fig. 11B is a diagram for explaining an index-based zero-padding method according to some example embodiments of the inventive concept.
Fig. 12 is a flowchart of a method of using a stride in an index-based convolution operation according to some example embodiments of the inventive concept.
Fig. 13A and Fig. 13B are diagrams of output feature matrices generated when a stride is used in convolution.
Fig. 14 is a flowchart of an index-based pooling method according to some example embodiments of the inventive concept.
Fig. 15 is a diagram for explaining an index-based pooling operation according to some example embodiments of the inventive concept.
Fig. 16 is a block diagram of a neural network device according to some example embodiments of the inventive concept.
Fig. 17 is a block diagram of a neural network processor according to some example embodiments of the inventive concept.
Fig. 18 is a diagram for explaining a state in which a neural network processor according to some example embodiments of the inventive concept runs in a first run mode.
Fig. 19 is a diagram for explaining a state in which a neural network processor according to some example embodiments of the inventive concept runs in a second run mode.
Fig. 20 is a diagram of data flow during a convolution operation according to some example embodiments.
Fig. 21 and Fig. 22 are diagrams of data processing during a convolution operation performed in an index-based neural network according to some example embodiments of the inventive concept.
Fig. 23 is a diagram of a neural network processor according to some example embodiments of the inventive concept.
Fig. 24 is a diagram of data processing during a convolution operation performed in an index-based neural network according to some example embodiments of the inventive concept.
Fig. 25 is a diagram of a neural network processor according to some example embodiments of the inventive concept.
[Explanation of Symbols]
10: neural network;
11: first layer / layer;
12: second layer / layer;
13: third layer / layer;
21, 21a: index remapper;
21b: address remapper;
22, 22a, 22b: first data operation circuit;
23, 23a, 23b: second data operation circuit;
24, 24a, 24b: dedicated memory;
100: electronic system;
110: central processing unit;
120: random access memory;
130, 200: neural network device;
140: memory;
150: sensor module;
160: communication module;
170: bus;
210, 210a, 210b: neural network processor;
211, 211b_0, 211b_k: processing circuit;
211a_0: first processing circuit / processing circuit;
211a_k: k-th processing circuit / processing circuit;
212: internal memory;
213: list maker;
214: compressor;
215, 215a: selector;
216: global accumulator;
220: controller;
230: system memory;
CA: second index / input feature index;
CL: class;
CH0: first channel group / channel group;
CH1: second channel group / channel group;
CH2, CH3, CH4, CH5, CH6, CH7, CH8, CHn-2, CHn-1: channel group;
CHk: channel;
D: depth;
D0,0 to D7,9, f3,2, IFB: input feature;
DATA: input feature value;
f1,1, f1,4, f4,3: input feature / non-zero input feature;
F0: first input feature;
F1: second input feature;
F2: third input feature;
F3: fourth input feature;
F4: fifth input feature;
F5: sixth input feature;
F6: seventh input feature;
F7: eighth input feature;
F8: ninth input feature;
FM1: first feature map / feature map;
FM2: second feature map / feature map;
FM3: third feature map / feature map;
H: height;
IFL: input feature list;
IFM: input feature map;
IFMa: input feature map / initial input feature list;
IFM_Z: zero-padded input feature map;
IFM_Za: input feature map / padded input feature map;
IFMX: input feature matrix;
IWL: initial weight list;
KN0: first kernel / kernel;
KN1: second kernel / kernel;
KN2: third kernel / kernel;
KN3: fourth kernel / kernel;
KN4: fifth kernel / kernel;
KNk: kernel;
LUT: look-up table;
MWL: mirrored weight list;
OFL1: first output feature list;
OFL2: second output feature list;
OFM: output feature map;
OFMX, OFMX_S1, OFMX_S3: output feature matrix;
PW: two-dimensional pooling window / pooling window / 2×2 pooling window;
RA: first index / input feature index;
REC: recognition signal;
S110, S120, S130, S140, S150, S210, S220, S230, S240, S310, S320, S410, S420, S430, S440, S450, S460, S510, S520, S530: operations;
S710, S720, S730, S740, S750, S760: valid operation results;
W: width;
W0,1, W2,2: weight / non-zero weight;
WL: weight list;
WM: weight map;
WMX: weight matrix;
①: first position;
②: second position;
③: third position;
④: fourth position;
⑤: fifth position;
⑥: sixth position.
Detailed Description
Fig. 1 is a block diagram of an electronic system according to some example embodiments of the inventive concept. Fig. 2 is a diagram showing an example of a neural network architecture according to some example embodiments. Fig. 3 is a diagram of an input feature list according to some example embodiments of the inventive concept.
The electronic system 100 may analyze input data in real time based on a neural network, extract valid information, and, based on the extracted information, determine a situation or control elements of an electronic device on which the electronic system 100 is mounted. The electronic system 100 may be used in a drone, a robot device such as an advanced driver assistance system (ADAS), a smart television (TV), a smartphone, a medical device, a mobile device, an image display device, a measuring device, or an Internet of Things (IoT) device, and may be mounted in any one of various other electronic devices.
Referring to Fig. 1, the electronic system 100 may include a central processing unit (CPU) 110, a random access memory (RAM) 120, a neural network device 130, a memory 140, a sensor module (also referred to herein as a "sensor device") 150, and a communication (or transmit/receive (Tx/Rx)) module (also referred to herein as a "communication device," "communication interface," and/or "communication transceiver") 160. The electronic system 100 may further include an input/output module, a security module, and a power control device. Some of the elements of the electronic system 100 (that is, the central processing unit 110, the random access memory 120, the neural network device 130, the memory 140, the sensor module 150, and the communication module 160) may be mounted on a single semiconductor chip. As shown in Fig. 1, the elements of the electronic system may be coupled through a bus 170.
The central processing unit 110 controls the overall operation of the electronic system 100. The central processing unit 110 may include a single-core processor or a multi-core processor. The central processing unit 110 may process or execute programs and/or data stored in the memory 140. For example, the central processing unit 110 may execute programs ("one or more instruction programs") stored in the memory 140 to control the neural network device 130 in performing some or all of the operations described herein.
The random access memory 120 may temporarily store programs, data, or instructions. Programs and/or data stored in the memory 140 may be temporarily stored in the random access memory 120 under the control of the central processing unit 110 or according to boot code. The random access memory 120 may be implemented as dynamic RAM (DRAM) or static RAM (SRAM).
The neural network device 130 may perform a neural network operation based on input data and may generate an information signal based on a result of the operation ("neural network operation"). Neural networks may include convolutional neural networks (CNN), recurrent neural networks (RNN), deep belief networks, and restricted Boltzmann machines, but are not limited thereto.
The information signal may include one of various recognition signals such as a speech recognition signal, an object recognition signal, an image recognition signal, and a biometric recognition signal. The neural network device 130 may receive frame data included in a video stream as input data and may generate, from the frame data, a recognition signal for an object included in an image represented by the frame data. However, the inventive concept is not limited thereto. The neural network device 130 may receive various kinds of input data according to the type or function of the electronic device on which the electronic system 100 is mounted, and may generate a recognition signal according to the input data. An example of a neural network architecture will be briefly described with reference to Fig. 2.
Fig. 2 shows the structure of a convolutional neural network as an example of a neural network architecture. Referring to Fig. 2, the neural network 10 may include a plurality of layers, for example a first layer 11, a second layer 12, and a third layer 13. The first layer 11 may be a convolutional layer, the second layer 12 may be a pooling layer, and the third layer 13 may be an output layer. The output layer may be a fully-connected layer. In addition to the first layer 11, the second layer 12, and the third layer 13 shown in Fig. 2, the neural network 10 may further include an activation layer and may also include another convolutional layer, another pooling layer, or another fully-connected layer.
Each of the first layer 11, the second layer 12, and the third layer 13 may receive input data or a feature map generated in a preceding layer as an input feature map, and may generate an output feature map or a recognition signal REC by performing an operation on the input feature map. Here, a feature map is data in which various features of the input data are expressed. The feature maps FM1, FM2, and FM3 may have a two-dimensional matrix form or a three-dimensional matrix form. Feature maps having such a multi-dimensional matrix form may be referred to as feature tensors. The feature maps FM1, FM2, and FM3 have a width (or number of columns) W, a height (or number of rows) H, and a depth D corresponding to the x-axis, the y-axis, and the z-axis of a coordinate system, respectively. The depth D may be referred to as the number of channels.
A position on the x-y plane of a feature map may be referred to as a spatial position. A position along the z-axis of a feature map may be referred to as a channel. A size on the x-y plane of a feature map may be referred to as a spatial size.
The first layer 11 may perform a convolution on the first feature map FM1 and a weight map WM to generate the second feature map FM2. The weight map WM filters the first feature map FM1 and may be referred to as a filter or a kernel. The depth (that is, the number of channels) of the weight map WM may be the same as the depth (that is, the number of channels) of the first feature map FM1, and the convolution may be performed on the same channels of both the weight map WM and the first feature map FM1. The weight map WM is shifted by traversing the first feature map FM1 as a sliding window. The amount of shift may be referred to as a "stride length" or a "stride." During each shift, each weight included in the weight map WM may be multiplied by the feature value it overlaps in the first feature map FM1, and the products may be added together. By performing the convolution on the first feature map FM1 and the weight map WM, one channel of the second feature map FM2 may be generated. Although only one weight map WM is shown in Fig. 2, a plurality of weight maps may actually be convolved with the first feature map FM1 to generate a plurality of channels of the second feature map FM2. In other words, the number of channels of the second feature map FM2 may correspond to the number of weight maps.
The second layer 12 may perform pooling to generate the third feature map FM3. Pooling may be referred to as sampling or downsampling. A two-dimensional pooling window PW may be shifted over the second feature map FM2, and the maximum (or average) of the feature values in the region where the pooling window PW overlaps the second feature map FM2 may be selected, so that the third feature map FM3 may be generated from the second feature map FM2. The number of channels of the third feature map FM3 may be the same as the number of channels of the second feature map FM2.
In some example embodiments, the pooling window PW may be shifted over the second feature map FM2 in units of the size of the pooling window PW. The amount of shift (that is, the stride length of the pooling window PW) may be the same as the length of the pooling window PW. Therefore, the spatial size of the third feature map FM3 may be smaller than the spatial size of the second feature map FM2. However, the inventive concept is not limited thereto. The spatial size of the third feature map FM3 may be equal to or greater than the spatial size of the second feature map FM2. The spatial size of the third feature map FM3 may be determined according to the size of the pooling window PW, the stride length, and whether zero padding has been performed.
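Purely as an illustration of the pooling described above (not taken from the patent), the following Python sketch shows max pooling with a 2×2 window and a stride equal to the window size; the feature values are arbitrary:

```python
def max_pool_2x2(feature_map):
    """Max pooling with a 2x2 window and stride 2 over a 2-D feature map (list of rows)."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, h - 1, 2):
        row = []
        for c in range(0, w - 1, 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(max(window))  # keep the maximum feature value in the window
        pooled.append(row)
    return pooled

fm2 = [[1, 3, 0, 2],
       [4, 2, 1, 0],
       [0, 1, 5, 6],
       [2, 0, 7, 1]]
print(max_pool_2x2(fm2))  # -> [[4, 2], [2, 7]]
```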
The third layer 13 may combine features of the third feature map FM3 and classify the input data into a class CL. The third layer 13 may also generate a recognition signal REC corresponding to the class CL. The input data may correspond to frame data included in a video stream. In this case, the third layer 13 may extract, based on the third feature map FM3 provided from the second layer 12, a class corresponding to an object included in the image represented by the frame data, recognize the object, and generate a recognition signal REC corresponding to the object.
In a neural network, low-level layers (for example, convolutional layers) may extract low-level features (for example, edges or gradients of a face image) from input data or an input feature map, and high-level layers (for example, fully-connected layers) may extract or detect high-level features (that is, classes) (for example, eyes and a nose of a face image) from an input feature map.
Referring back to Fig. 1, the neural network device 130 may perform index-based neural network operations. Here, an index indicates the spatial position of a feature or a weight. An index may include a first index and a second index respectively corresponding to a row and a column of a two-dimensional matrix. In other words, an input feature index and a weight index may each include a first index and a second index, where the first index of the input feature index corresponds to a row of an input feature matrix, the second index of the input feature index corresponds to a column of the input feature matrix, the first index of the weight index corresponds to a row of a weight matrix, and the second index of the weight index corresponds to a column of the weight matrix.
The neural network device 130 may perform, based on indexes, an operation corresponding to at least one of the plurality of layers of the neural network described above with reference to Fig. 2. The neural network device 130 may generate, from an input feature map in matrix form (hereinafter referred to as an input feature matrix), an input feature list that includes an index and data for each input feature, and may perform operations based on the indexes.
As shown in Fig. 3, the neural network device 130 may generate an input feature list from an input feature matrix. The input feature list may include a first index RA and a second index CA corresponding to the spatial position of each input feature. An index may be referred to as an address, and the first index RA and the second index CA may be referred to as a row address and a column address, respectively. The input feature list may also include data (that is, an input feature value) corresponding to each index.
An index-based neural network operation may include an index operation. The index operation is performed on each input feature index in the input feature list and the index of another parameter. The index operation may be referred to as index remapping. When the index operation is performed, the data operation (that is, the operation performed on the input feature values) can be simplified or skipped.
As shown in Fig. 3, the input feature list may include an index and data corresponding to each of the input features f1,1, f1,4, and f4,3 having non-zero values. The neural network device 130 may perform the index-based operation on the input features having non-zero values.
Meanwhile, a weight map used in a convolution operation may be converted into a weight list and provided to the neural network device 130. The weight list may include an index and data corresponding to each weight having a non-zero value. To avoid confusion of terms, the index and the data in the input feature list will be referred to as an input feature index and an input feature value, and the index and the data in the weight list will be referred to as a weight index and a weight value.
The neural network device 130 may perform a convolution operation on the input features and the weights having non-zero values based on the indexes in the input feature list and the indexes in the weight list.
A zero in a neural network operation does not affect the result of the operation. Therefore, the neural network device 130 may generate the input feature list based on the input features having non-zero values and perform operations based on the indexes in the input feature list, so that the neural network device 130 performs operations only on the input features having non-zero values. Accordingly, operations on input features having a value of zero can be skipped.
However, the inventive concept is not limited thereto. The input feature list may also include indexes and data corresponding to input features having a value of zero. The neural network device 130 may generate the input feature list based on input features having zero or non-zero values and may perform operations based on the indexes.
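As a rough, non-authoritative illustration of the list representation described above (the function name, tuple layout, and the feature values are assumptions, not taken from the patent), an input feature list could be built from an input feature matrix as follows:

```python
def to_sparse_list(matrix):
    """Collect (first index RA, second index CA, value) entries for non-zero elements."""
    entries = []
    for ra, row in enumerate(matrix):
        for ca, value in enumerate(row):
            if value != 0:
                entries.append((ra, ca, value))
    return entries

# Input feature matrix with non-zero features at (1,1), (1,4), and (4,3), as in Fig. 3;
# the values 7, 3, and 5 are arbitrary placeholders.
ifmx = [[0, 0, 0, 0, 0],
        [0, 7, 0, 0, 3],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 5, 0]]
print(to_sparse_list(ifmx))  # -> [(1, 1, 7), (1, 4, 3), (4, 3, 5)]
```

The same helper could be applied to a weight matrix to obtain a weight list of non-zero weights.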
Referring back to Fig. 1, the memory 140 is a storage element for storing data. The memory 140 may store an operating system (OS), various programs, and various data. The memory 140 may store intermediate results, for example an output feature map generated in the form of an output feature list or an output feature matrix during an operation. A compressed output feature map may be stored in the memory 140. The memory 140 may also store various parameters used by the neural network device 130 (for example, a weight map or a weight list).
The memory 140 may be DRAM, but is not limited thereto. The memory 140 may include at least one of volatile memory and non-volatile memory. Non-volatile memory includes read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, phase-change RAM (PRAM), magnetic RAM (MRAM), resistive RAM (RRAM), and ferroelectric RAM (FeRAM). Volatile memory may include DRAM, SRAM, synchronous DRAM (SDRAM), PRAM, MRAM, RRAM, and FeRAM. Alternatively, the memory 140 may include at least one of a hard disk drive (HDD), a solid state drive (SSD), a compact flash (CF) card, a secure digital (SD) card, a micro secure digital (micro-SD) card, a mini secure digital (mini-SD) card, an extreme digital (xD) card, and a memory stick.
The sensor module 150 collects information about the surroundings of the electronic device on which the electronic system 100 is mounted. The sensor module 150 may sense or receive a signal (for example, a video signal, an audio signal, a magnetic signal, a biometric signal, or a touch signal) from outside the electronic device and may convert the sensed or received signal into data. For this operation, the sensor module 150 may include at least one of various sensing devices such as a microphone, an image pickup device, an image sensor, a light detection and ranging (LIDAR) sensor, an ultrasonic sensor, an infrared sensor, a biosensor, and a touch sensor.
The sensor module 150 may provide data to the neural network device 130 as input data. For example, the sensor module 150 may include an image sensor. In this case, the sensor module 150 may capture the external environment of the electronic device, generate a video stream, and sequentially provide consecutive data frames of the video stream to the neural network device 130 as input data. However, the inventive concept is not limited thereto. The sensor module 150 may provide various kinds of data to the neural network device 130.
The communication module 160 may include various kinds of wired or wireless interfaces for communicating with external devices. For example, the communication module 160 may include a communication interface capable of accessing a local area network (LAN); a wireless LAN (WLAN) such as wireless fidelity (Wi-Fi); a wireless personal area network (WPAN) such as Bluetooth; wireless universal serial bus (USB); ZigBee; near field communication (NFC); radio-frequency identification (RFID); power line communication (PLC); or a mobile cellular network such as third generation (3G), fourth generation (4G), or long term evolution (LTE).
The communication module 160 may receive a weight map or a weight list from an external server. The external server may perform training based on a large amount of learning data and may provide the electronic system 100 with a weight map or a weight list that includes trained weights. The received weight map or weight list may be stored in the memory 140.
The communication module 160 may generate and/or transmit an information signal based on an operation result (for example, an output feature map generated in the form of an output feature list or an output feature matrix during an operation).
As described above, according to some example embodiments of the inventive concept, the neural network device 130 can perform neural network operations efficiently by performing them based on indexes. In particular, in a sparse neural network in which non-zero values are sparsely distributed in a feature map or a weight map, the neural network device 130 may generate an input feature list corresponding to the input features having non-zero values and perform operations on the non-zero input features based on the input feature list, thereby reducing the amount of computation. As the amount of computation is reduced, the efficiency of the neural network device 130 is improved and the power consumption of the neural network device 130 and the electronic system 100 is reduced. Various embodiments of the index-based neural network operation method are described in detail below.
Fig. 4 is a flowchart of an index-based neural network operation method according to some example embodiments of the inventive concept. The neural network operation method shown in Fig. 4 may be performed in the neural network device 130 and may be applied to the operations of the layers 11, 12, and 13 of the neural network 10 shown in Fig. 2.
Referring to Fig. 4, in operation S110, the neural network device 130 may generate an input feature list. For example, the neural network device 130 may generate the input feature list from an input feature map in matrix form. As described above with reference to Fig. 3, the input feature list may include an input feature index and an input feature value corresponding to each input ("input feature"). The input may have a non-zero value. The input feature index may indicate the position of the input feature on the input feature map.
In operation S120, the neural network device 130 may perform an index operation based on the input feature indexes in the input feature list and may generate an output feature index based on the index operation result. The index operation result of the index operation may be the output feature index.
In operation S130, the neural network device 130 may perform a data operation based on the input feature values in the input feature list and may generate an output feature value corresponding to the output feature index based on the data operation result. Here, when the output feature index generated in operation S120 is not mapped onto the output feature map, the neural network device 130 may skip the data operation. The data operation result of the data operation may be the output feature value corresponding to the output feature index.
In operation S140, the neural network device 130 may generate an output feature list based on the output feature indexes and the output feature values. The neural network device 130 performs operations S120 and S130 on all input features in the input feature list to generate the output feature list. In other words, in operation S110 the neural network device 130 may generate an input feature list including a plurality of input feature indexes and a plurality of input feature values, the plurality of input feature indexes corresponding to individual input features among a plurality of input features and the plurality of input feature values corresponding to the individual input features, and the neural network device 130 may further perform a set of operations S120 and S130 for each individual input feature to generate a plurality of output feature indexes based on the corresponding input feature indexes of the input feature list and a plurality of output feature values based on the corresponding input feature values. As part of performing the set of operations S120 and S130 for each individual input feature, the neural network device 130 may filter out a limited selection of the output indexes based on determining that those output indexes do not affect the output result during the operation, so that the output indexes are filtered to the remaining selection of output indexes that can affect the output result during the operation (a sketch of this filtering appears after this paragraph group). The neural network device 130 may store the output feature list in a memory. The memory may be located inside the neural network device 130 or may be a memory located outside the neural network device 130 (for example, the memory 140 shown in Fig. 1). In some example embodiments, the neural network device 130 may compress the output feature list and store the compressed output feature list in the memory.
In an example embodiment, if the output feature list is for the last layer of the neural network, the neural network device 130 may generate an information signal based on the output feature list.
The neural network device 130 may reduce the amount of computation by performing operations on each input feature index and each input feature value and filtering out the output indexes that do not affect the output result during the operation (for example, the more limited selection of the output indexes). In addition, based on index operations, the neural network device 130 can easily handle the various operations of a neural network. As a result, by performing the one or more operations described above, the functionality of the electronic system including the neural network device 130 can be improved.
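To make the filtering step concrete, here is a small, assumed Python sketch of operations S120 through S140 (one interpretation only, not the patent's implementation): an output index that falls outside the output feature map is discarded before any data operation is spent on it. The parameter index and value are hypothetical.

```python
def index_based_layer_op(input_list, param_index, param_value, out_height, out_width):
    """S120: index operation; S130: data operation (skipped when the output index
    does not map onto the output feature map); S140: collect the output feature list."""
    output_list = []
    for in_ra, in_ca, in_val in input_list:
        out_ra, out_ca = in_ra + param_index[0], in_ca + param_index[1]   # S120
        if not (0 <= out_ra < out_height and 0 <= out_ca < out_width):
            continue                                  # filtered out: cannot affect the result
        output_list.append((out_ra, out_ca, in_val * param_value))        # S130 + S140
    return output_list

# Hypothetical example: one non-zero parameter with index (-1, -1) and value 2.
print(index_based_layer_op([(0, 2, 3), (4, 1, 5)], (-1, -1), 2, 5, 5))
# -> [(3, 0, 10)]  (the first input's output index (-1, 1) falls outside the map)
```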
Fig. 5 is a flowchart of an index-based convolution operation method according to some example embodiments of the inventive concept. The operation method shown in Fig. 5 may be performed in the neural network device 130 shown in Fig. 1.
Referring to Fig. 5, in operation S210, the neural network device 130 may generate an input feature list from an input feature map (that is, an input feature matrix). The input feature list may include an input feature index and an input feature value corresponding to each of the input features of the input feature matrix. The input feature index may include a first index and a second index respectively corresponding to a row and a column of the input feature matrix. The neural network device 130 may generate an input feature list corresponding to at least one input feature of the input feature matrix having a non-zero value.
Thereafter, the neural network device 130 may perform an index-based convolution operation based on the input feature list and a pre-stored weight list.
In operation S220, the neural network device 130 may generate an output feature index based on an input feature index and a weight index. The neural network device 130 may generate the output feature index by performing an operation (a "first operation") on the input feature index and the weight index.
The neural network device 130 may generate the output feature index by performing an operation on an input feature index corresponding to an input feature having a non-zero value and a weight index corresponding to a weight having a non-zero value.
Specifically, the neural network device 130 may generate the output feature index by adding the input feature index and the weight index. The neural network device 130 may add the first index of the input feature index to the first index of the weight index and add the second index of the input feature index to the second index of the weight index.
In operation S230, the neural network device 130 may generate an output feature value corresponding to the output feature index based on an input feature value and a weight value. The neural network device 130 may generate the output feature value by performing a data operation (a "second operation") based on the input feature value and the weight value. The neural network device 130 may multiply the input feature value by the weight value and may generate the output feature value based on the multiplication value obtained by the multiplication. The neural network device 130 may generate the output feature value by adding together the multiplication values corresponding to the same output feature index. The output feature value and the weight value may be non-zero values.
By performing the index operation on the input feature indexes and the weight indexes of the weight list in operation S220 and performing the data operation based on the input feature values and the weight values in operation S230, the neural network device 130 may perform an index-based convolution operation. In an example embodiment, if the output feature is for the last layer of the neural network, the neural network device 130 may generate an information signal based on the output feature value.
In some example embodiments, the index-based convolution operation method may also include an operation in which the neural network device generates a weight list from a weight matrix. For example, the neural network device 130 may receive a weight matrix from the outside (for example, from outside the neural network device 130, or from a server external to the electronic device equipped with the neural network device 130) and may generate a weight list from the weight matrix. The weight list may include a weight index and a weight value corresponding to each of the weights included in the weight matrix. The neural network device 130 may generate a weight list corresponding to at least one weight of the weight matrix having a non-zero value. The neural network device 130 may store the weight list and may use the weight indexes and the weight values in operations S220 and S230. However, the inventive concept is not limited thereto. The neural network device 130 may receive a weight list from the outside (for example, from outside the neural network device 130, or from a server external to the electronic device equipped with the neural network device 130), store the weight list, and then use the weight list.
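The following Python sketch is one possible reading of operations S220 and S230 (an illustration under stated assumptions, not the patent's implementation): output indexes are formed by adding indexes, output values by multiplying values, and products that land on the same output index are accumulated. The list entries and weight values are hypothetical.

```python
def index_based_convolution(input_list, weight_list):
    """input_list and weight_list hold (row index, column index, value) triples
    for non-zero entries only; returns the output feature list as a dict."""
    output = {}
    for in_ra, in_ca, in_val in input_list:
        for w_ra, w_ca, w_val in weight_list:
            out_index = (in_ra + w_ra, in_ca + w_ca)   # index operation (S220)
            product = in_val * w_val                   # data operation (S230)
            output[out_index] = output.get(out_index, 0) + product  # accumulate overlaps
    return output

# Non-zero input features and already-adjusted non-zero weight indexes; values are arbitrary.
ifl = [(1, 1, 7), (1, 4, 3), (4, 3, 5)]
wl = [(1, 0, 2), (-1, -1, 4)]
print(index_based_convolution(ifl, wl))
# -> {(2, 1): 14, (0, 0): 28, (2, 4): 6, (0, 3): 12, (5, 3): 10, (3, 2): 20}
```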
Fig. 6 is a diagram of a convolution operation. Fig. 7A, Fig. 7B, Fig. 7C, Fig. 7D, Fig. 7E, and Fig. 7F are diagrams of snapshots of valid operation results during the convolution operation shown in Fig. 6.
Specifically, Fig. 6 shows a convolution operation performed on an input feature matrix and a weight matrix with a sparse distribution of non-zero values. Fig. 7A, Fig. 7B, Fig. 7C, Fig. 7D, Fig. 7E, and Fig. 7F show snapshots of the valid operation results S710, S720, S730, S740, S750, and S760, respectively, during a traversal convolution operation used in a typical neural network.
Referring to Fig. 6, the result of the convolution operation (denoted by "*") performed on the input feature matrix IFMX, which includes the non-zero input features f1,1, f1,4, and f4,3, and the weight matrix WMX, which includes the non-zero weights W0,1 and W2,2, can be expressed as an output feature matrix OFMX that includes output features respectively corresponding to the first through sixth positions ① through ⑥.
As described above, when a convolution operation is performed, input features having a value of zero and/or weights having a value of zero do not affect the operation result. Although many snapshots may be generated during the traversal convolution operation, only the six snapshots shown in Fig. 7A, Fig. 7B, Fig. 7C, Fig. 7D, Fig. 7E, and Fig. 7F affect the operation result. As shown in Fig. 7A through Fig. 7F, an output feature may correspond to the result of convolving each of the non-zero input features f1,1, f1,4, and f4,3 with each of the non-zero weights W0,1 and W2,2.
Fig. 8A, Fig. 8B, and Fig. 8C are diagrams for explaining an index-based convolution operation according to some example embodiments of the inventive concept. Fig. 8A, Fig. 8B, and Fig. 8C show an index-based convolution operation performed on non-zero input features and non-zero weights.
Fig. 8A shows the generation of an input feature list IFL. Referring to Fig. 8A, the neural network device 130 may generate the input feature list IFL for the non-zero inputs of the input feature matrix IFMX (for example, the input features f1,1, f1,4, and f4,3). The input feature list IFL may include the input feature indexes RA and CA and the input feature value DATA for each input feature.
Fig. 8B shows the generation of a weight list WL. The generation of the weight list WL is similar to the generation of the input feature list IFL. However, for the convolution operation, an operation of adjusting the weight indexes in the weight list WL may additionally be performed. The generation of the weight list WL shown in Fig. 8B may be performed in a server that provides weights to the neural network device 130 (shown in Fig. 1), or may be performed in a preprocessing circuit included in the neural network device 130 based on a weight matrix provided from the server. For convenience of description, it is assumed that the weight list WL shown in Fig. 8B is generated in the neural network device 130.
The neural network device 130 may generate an initial weight list IWL for the non-zero weights of the weight matrix WMX (for example, the weights W0,1 and W2,2). The weight indexes of the initial weight list IWL indicate the spatial positions (for example, addresses) of the weights W0,1 and W2,2. Such weight indexes may be referred to as "initial weight indexes."
Thereafter, the initial weight indexes may be adjusted to correspond to a particular operation. The adjustment may include the neural network device 130 generating a mirrored weight list MWL by mirroring the weight indexes ("initial weight indexes") of the initial weight list IWL about a weight bias index (for example, (RA, CA) = (1, 1)), the weight bias index indicating the center of the weight matrix WMX.
The neural network device 130 may bias the mirrored weight indexes by subtracting the weight bias index (that is, (RA, CA) = (1, 1)) from the weight indexes ("mirrored weight indexes") of the mirrored weight list MWL. As a result, (1, 0) and (-1, -1) may be generated as the weight indexes of the weights W0,1 and W2,2, respectively, and the weight list WL used for the convolution operation may be generated.
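As a non-authoritative sketch of this adjustment (assuming a 3×3 weight matrix whose center, the weight bias index, is at (1, 1); each index is mirrored about the center and the center is then subtracted):

```python
def adjust_weight_indexes(initial_indexes, center=(1, 1)):
    """Mirror each initial weight index about the kernel center, then subtract the
    center (the weight bias index) so indexes become offsets around the center."""
    adjusted = []
    for ra, ca in initial_indexes:
        mirrored = (2 * center[0] - ra, 2 * center[1] - ca)  # mirror about the center
        adjusted.append((mirrored[0] - center[0], mirrored[1] - center[1]))  # subtract bias
    return adjusted

# Non-zero weights W0,1 and W2,2 of a 3x3 weight matrix, as in Fig. 8B.
print(adjust_weight_indexes([(0, 1), (2, 2)]))  # -> [(1, 0), (-1, -1)]
```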
Fig. 8C shows the operation performed on the input features and the weights based on the indexes. Referring to Fig. 8C, the neural network device 130 may add the input feature indexes and the weight indexes and multiply the input feature values by the weight values.
For example, each of the input feature indexes (1, 1), (1, 4), and (4, 3) of the input features f1,1, f1,4, and f4,3 may be added to the weight index (1, 0) of the weight W0,1, so that the output feature indexes (2, 1), (2, 4), and (5, 3) may be generated. Here, the first index RA of each input feature index may be added to the first index RA of the weight index, and the second index CA of each input feature index may be added to the second index CA of the weight index.
The input feature value of each of the input features f1,1, f1,4, and f4,3 is multiplied by the weight value of the weight W0,1, so that a first output feature list OFL1 may be generated for the weight W0,1. In addition, each of the input feature indexes (1, 1), (1, 4), and (4, 3) of the input features f1,1, f1,4, and f4,3 may be added to the weight index (-1, -1) of the weight W2,2, and the input feature value of each of the input features f1,1, f1,4, and f4,3 may be multiplied by the weight value of the weight W2,2, so that a second output feature list OFL2 may be generated for the weight W2,2.
Since there are no overlapping output feature indexes between the first output feature list OFL1 and the second output feature list OFL2, the output features in the first output feature list OFL1 and the output features in the second output feature list OFL2 can be mapped onto a matrix without an additional operation. It can be seen that the output feature matrix OFMX shown in Fig. 8C is the same as the matrix shown in Fig. 6.
A traversal convolution operation inherently involves redundancy because of the traversal. It is therefore not easy to skip operations on input features and weights having a value of zero (that is, meaningless operations that do not affect the output features). However, when the index-based convolution operation according to some example embodiments of the inventive concept is used as shown in Fig. 8C, the neural network device 130 performs index-based operations on non-zero inputs and non-zero weights, thereby eliminating meaningless operations. As a result, the amount of computation is reduced.
Fig. 9A and Fig. 9B are diagrams for explaining an index-based convolution operation according to some example embodiments of the inventive concept. Fig. 9A shows the generation of input feature indexes. Fig. 9B shows an index-based convolution operation performed based on the input feature indexes shown in Fig. 9A and the weight indexes shown in Fig. 8B.
Referring to Fig. 9A, the neural network device 130 may generate an input feature list IFL for the non-zero inputs of the input feature matrix IFMX (for example, the input features f1,1, f1,4, f3,2, and f4,3). The input feature list IFL may include the input feature indexes RA and CA and the input feature value DATA for each input feature. Compared with Fig. 8A, in Fig. 9A the input feature f3,2 is added to the input feature matrix IFMX, and therefore the input feature index (3, 2) and the input feature value f3,2 corresponding to the input feature f3,2 are added to the input feature list IFL.
When the index-based convolution operation is performed based on the input feature list IFL shown in Fig. 9A and the weight list WL shown in Fig. 8B, a first output feature list OFL1 for the weight W0,1 and a second output feature list OFL2 for the weight W2,2 may be generated, as shown in Fig. 9B. In this case, there is an overlapping output feature index (2, 1) between the first output feature list OFL1 and the second output feature list OFL2. The multiple feature values corresponding to the output feature index (2, 1) (that is, f1,1 × W0,1 and f3,2 × W2,2) may be added together, and the addition result may be generated as the output feature value corresponding to the output feature index (2, 1).
According to the present example of the inventive concept, when the index-based convolution operation is used, the neural network device 130 may generate the output feature indexes using the index operation and generate the output feature values using the data operation. However, when there are overlapping output feature indexes (that is, when there are multiple data operation results (that is, multiplication values) for one output feature index), the neural network device 130 may add the multiple multiplication values together to generate the output feature value corresponding to that output feature index.
As described above with reference to Fig. 8A through Fig. 9B, the neural network device 130 may perform an index-based convolution operation on input features and weights having non-zero values. Therefore, the amount of computation required for the convolution operation can be reduced. Accordingly, the operation speed of the neural network device 130 can be improved and the power consumption of the neural network device 130 can be reduced.
FIG. 10 is a flowchart of an index-based zero-padding method according to some example embodiments of the inventive concept.
Referring to FIG. 10, in operation S310 the neural network device 130 may generate an input feature list. For example, the neural network device 130 may generate, from an input feature map in matrix form, an input feature list that includes an index and data for each of the input features having non-zero values.
In operation S320, the neural network device 130 may add a bias index to each index of the input feature list. In this way, the neural network device 130 may perform zero padding. This will be described in detail with reference to FIGS. 11A and 11B.
FIG. 11A is a diagram of an example of applying zero padding to an input feature map IFM in a neural network. FIG. 11B is a diagram for explaining an index-based zero-padding method according to some example embodiments of the inventive concept. In the drawings, the number at the top of each pixel is the index of the input feature and the number at the bottom of each pixel is the input feature value.
Zero padding in a neural network adds zeros to the input feature map IFM in all outward directions (that is, in the row direction and the column direction). When zero padding is applied to the input feature map IFM, an input feature map with zero padding (that is, a zero-padded input feature map IFM_Z) may be generated. When one zero is added in each outward direction of the input feature map IFM, as shown in FIG. 11A, the position (that is, the index) of each input feature may increase by 1. For example, the index (0,0) of the input feature D0,0 becomes (1,1). As described above, when "n" zeros (where "n" is an integer of at least 1) are added to the input feature map IFM in each outward direction, the index of each input feature may increase by "n". The number "n" of zeros added in each direction (hereinafter referred to as the length of the zero value, or the zero-value length) may vary with the type and characteristics of the operation performed on the input features after zero padding is applied.
When zero padding is applied to the input feature map IFM in matrix form during a traversal-type convolution operation, an output feature map having the same size as the input feature map IFM may be generated. A neural network device for performing the traversal-type convolution operation needs to include control logic that adds zeros to the input feature map IFM in order to support zero padding.
FIG. 11B is a diagram for explaining the index-based zero-padding method according to some example embodiments of the inventive concept. Specifically, FIG. 11B shows an input feature map IFMa of input features having non-zero values and a zero-removed padded input feature map IFM_Za generated by applying the index-based zero padding to the input feature map IFMa. In FIG. 11B, the input feature maps IFMa and IFM_Za are input feature lists and are represented in matrix form for convenience of explanation. IFMa may be referred to as an initial input feature list.
Operations performed on input features having a zero value may be skipped in index-based neural network operations. When zero padding is used, the neural network device 130 may generate the input feature map IFMa including the input features having non-zero values (that is, the initial input feature list) and may generate the zero-removed padded input feature map IFM_Za (that is, the padded input feature list) by applying the index-based zero padding to the input feature map IFMa. In other words, the neural network device 130 may generate the initial input feature list IFMa, which includes initial input feature indices corresponding to the positions of the input features and input feature values corresponding to the input features.
The neural network device 130 for performing index-based neural network operations may generate the padded input feature map IFM_Za by remapping the indices in the input feature list (that is, in the input feature map IFMa in list form) based on a bias index (z, z), the bias index (z, z) also being referred to herein as a "feature offset index". For example, the neural network device 130 may remap the indices of the input features of the input feature map IFMa by adding the bias index (z, z) to those indices. At this time, the bias index (z, z) may be determined according to the zero-value length. For example, when one zero is added to the input feature map IFM in all outward directions, as shown in FIG. 11A, that is, when the zero-value length is 1, the bias index (z, z) may be set to (1, 1). When the zero-value length is 2, the bias index (z, z) may be set to (2, 2). When the zero-value length is "n", the bias index (z, z) may be set to (n, n). As described above, the bias index (z, z) may be set based on the zero-value length.
FIG. 11B shows the zero-removed padded input feature map IFM_Za in the case where one zero is added in every outward direction of the input feature map IFMa. The neural network device 130 may remap the indices of the input features by adding the bias index (1, 1) to the indices of the input feature map IFMa. For example, the bias index (1, 1) is added to the index (0, 0) of the input feature D0,0 of the input feature map IFMa, so that the index of the input feature D0,0 may be remapped from (0, 0) to (1, 1). The bias index (1, 1) is added to the index (2, 3) of the input feature D2,3, so that the index of the input feature D2,3 may be remapped from (2, 3) to (3, 4). The neural network device 130 may add the bias index (1, 1) to the index of each of the input features D0,0 to D5,5 of the input feature map IFMa, thereby generating the zero-removed padded input feature map IFM_Za.
As described above, the neural network device 130 for performing index-based neural network operations may remap the indices of the input feature map IFMa in list form based on the bias index (z, z) set according to the zero-value length, thereby easily generating the zero-removed padded input feature map IFM_Za without using separate control logic for zero padding.
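As an illustrative, non-limiting sketch, the index-based zero padding may be modeled in Python as follows. The list layout is an assumption made only for illustration; no zero values are inserted and no separate padding control logic is modeled.

def pad_by_index(initial_input_feature_list, zero_value_length):
    # Remap every input feature index by adding the bias index (n, n),
    # where n is the zero-value length, instead of inserting zero values.
    z = zero_value_length
    return [((row + z, col + z), value)
            for (row, col), value in initial_input_feature_list]

# With a zero-value length of 1, index (0, 0) is remapped to (1, 1) and
# index (2, 3) is remapped to (3, 4), as in FIG. 11B; the feature values are hypothetical.
padded = pad_by_index([((0, 0), 7.0), ((2, 3), 4.0)], zero_value_length=1)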
FIG. 12 is a flowchart of a method of using a stride in the index-based convolution operation according to some example embodiments of the inventive concept. The stride may be applied during the convolution operation and may be applied in operations S220 and S230 shown in FIG. 5.
Referring to FIG. 12, in operation S410 the neural network device 130 may add an input feature index and a weight index, and in operation S420 it may divide the addition result (the summed index) by the stride length.
In operation S430, the neural network device 130 may determine whether a remainder exists for the division. When a remainder exists, in operation S440 the neural network device 130 may skip the operation performed on the input feature value and the weight value. When a remainder exists for the division, the summed index is not mapped onto the output feature map, and therefore the result of the data operation for that index would not affect the output feature map. Accordingly, the neural network device 130 may skip the operation performed on the input feature value and the weight value.
When no remainder exists for the division (that is, when the division comes out even), in operation S450 the neural network device 130 may select the quotient as the output feature index, and in operation S460 it may perform operations (for example, multiplication and addition) on the input feature value and the weight value. The operation value obtained by the operations may be provided as the output feature value of the output feature index.
For example, when no remainder exists after the result of adding the input feature index of a first input feature and the weight index of a first weight is divided by the stride length, the quotient may be selected as an output feature index, and the result of performing the operation on the input feature value corresponding to the first input feature and the weight value corresponding to the first weight may be provided as the output feature value of that output feature index. When a remainder exists after the result of adding the input feature index of a second input feature and the weight index of a second weight is divided by the stride length, the result of the operation performed on the input feature index of the second input feature and the weight index of the second weight is not selected as an output feature index. Therefore, the operation on the input feature value corresponding to the second input feature and the weight value corresponding to the second weight may be omitted.
As described above, by means of operations performed on indices, a stride can easily be used in the index-based convolution operation, and the amount of computation can be reduced.
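As an illustrative, non-limiting sketch, the stride handling described above may be modeled in Python as follows; the data layout is an assumption made only for illustration.

def strided_index_conv(input_feature_list, weight_list, stride):
    # A remainder means the summed index does not land on the output feature
    # map, so the data operation for that pair of values is skipped.
    output = {}
    for (fr, fc), f_val in input_feature_list:
        for (wr, wc), w_val in weight_list:
            sr, sc = fr + wr, fc + wc                    # summed index
            if sr % stride != 0 or sc % stride != 0:     # remainder exists: skip
                continue
            out_idx = (sr // stride, sc // stride)       # quotient as output feature index
            output[out_idx] = output.get(out_idx, 0) + f_val * w_val
    return output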
FIGS. 13A and 13B are diagrams of output feature matrices generated when a stride is used in convolution.
FIG. 13A shows an example in which the stride is applied to the matrix pixel by pixel. FIG. 13B shows an example in which the stride is applied to the matrix every three pixels. As the stride length increases, the size of the output feature matrix may decrease. When the output feature matrix OFMX_S1 shown in FIG. 13A is compared with the output feature matrix OFMX_S3 shown in FIG. 13B, it can be seen that the output features marked with dashed boxes in the output feature matrix OFMX_S1 shown in FIG. 13A constitute the output feature matrix OFMX_S3 shown in FIG. 13B, and that only the indices of the output features change.
As described above, when the index-based convolution operation according to some example embodiments of the inventive concept is used, the neural network device 130 may add an input feature index and a weight index, may divide the summed index by the stride, and may select the quotient as the output feature index when no remainder exists after the division.
For example, since the stride length in FIG. 13A is 1, the index of each output feature in the output feature matrix OFMX_S1 is the summed index obtained by adding an input feature index and a weight index.
In the example shown in FIG. 13B, when no remainder exists after a summed index is divided by the stride length of 3, the quotient may be generated as an output feature index of the output feature matrix OFMX_S3.
The neural network device 130 may generate output feature values by performing operations on the input feature values and weight values corresponding to the output feature indices. The neural network device 130 may not perform operations on input feature values and weight values that do not correspond to output feature indices.
FIG. 14 is a flowchart of an index-based pooling method according to some example embodiments of the inventive concept.
Referring to FIG. 14, in operation S510 the neural network device 130 may remap input feature indices based on a sampling unit. The multiple input features included in a pooling window may be remapped to one index. The remapped index may be provided as an output feature index of the output feature map.
In operation S520, the neural network device 130 may perform a pooling operation on the input features having the same remapped index. In other words, the pooling operation may be performed on the input features included in a pooling window. Max pooling or average pooling may be performed on the input features.
In operation S530, the neural network device 130 may provide the pooling operation value obtained by the pooling operation as the output feature value corresponding to the output feature index. The index-based pooling method will be described in detail with reference to FIG. 15.
FIG. 15 is a diagram for explaining the index-based pooling operation according to some example embodiments of the inventive concept. For ease of illustration, the feature maps are represented in matrix form.
As described above with reference to FIG. 2, in a pooling layer the size of the input feature map may be reduced. Accordingly, the parameters and the amount of computation of the neural network may be reduced. As shown in FIG. 15, a 2×2 pooling window PW may be applied to a 10×10 input feature map (A). When the pooling operation is performed on each 2×2 sampling unit, a 5×5 output feature map (C) may be generated. Although 2×2 sampling is shown in FIG. 15, the sampling unit may be variously changed.
According to some example embodiments, the neural network device 130 may perform pooling based on indices. The neural network device 130 may divide an input feature index by a specific (or, alternatively, predetermined) sampling length (a "sub-sampling size") and may select the quotient of the division as the remapped index (an "output feature index corresponding to the input feature"). Accordingly, as shown in the input feature map (B) whose indices have been remapped, the input features may be remapped by index, and multiple input features may have the same remapped index according to the sampling unit. The remapped index may be an output feature index, that is, the spatial position at which an output feature will be stored in the output feature matrix. Before the input feature values are stored at the positions given by the corresponding output feature indices, operations may be performed on the input feature values according to the type of pooling.
For example, when max pooling is applied to the input feature matrix, the maximum value among the input feature values included in one 2×2 sampling unit (that is, the input feature values corresponding to one output feature index) may be provided as the output feature value corresponding to that output feature index.
As another example, when average pooling is applied to the input feature matrix, the input feature values corresponding to one output feature index may be added, the addition value obtained by the addition may be divided by the number of input feature values, and the division result may be provided as the output feature value corresponding to that output feature index. However, the inventive concept is not limited to these examples, and various types of pooling may be used.
When the result of performing the pooling operation on the input features corresponding to each output feature index is provided as the output feature value, the output feature map (C) may be generated.
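As an illustrative, non-limiting sketch, the index-based pooling described above may be modeled in Python as follows; the list layout and the 2×2 sampling unit are assumptions made only for illustration.

def index_based_pooling(input_feature_list, sub_sampling_size=2, mode="max"):
    # Index remapping: divide each input feature index by the sub-sampling
    # size and use the quotient as the output feature index.
    groups = {}  # output feature index -> input feature values in that window
    for (row, col), value in input_feature_list:
        out_idx = (row // sub_sampling_size, col // sub_sampling_size)
        groups.setdefault(out_idx, []).append(value)
    if mode == "max":                                   # max pooling
        return {idx: max(vals) for idx, vals in groups.items()}
    # average pooling: sum divided by the number of input feature values
    return {idx: sum(vals) / len(vals) for idx, vals in groups.items()}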
Various embodiments of the index-based neural network operation method have been described above with reference to FIGS. 4 to 15. However, the inventive concept is not limited to these embodiments. Various operations used in various neural networks may be performed based on indices.
FIG. 16 is a block diagram of a neural network device 200 according to some example embodiments of the inventive concept.
Referring to FIG. 16, in some example embodiments the neural network device 200 is the neural network device 130 shown in FIG. 1. Therefore, the description of the neural network device 130 may be applied to the neural network device 200.
The neural network device 200 may include a controller 220, a neural network processor 210 and a system memory 230. The neural network device 200 may also include a direct memory access (DMA) controller for storing data in an external memory. The neural network processor 210, the controller 220 and the system memory 230 of the neural network device 200 may communicate with one another through a system bus. The neural network device 200 may be implemented as a semiconductor chip (for example, a system-on-chip (SoC)), but is not limited thereto. The neural network device 200 may be implemented with a plurality of semiconductor chips. In the present embodiment, the controller 220 and the neural network processor 210 are shown as separate components, but the embodiments are not limited thereto, and the controller 220 may be included in the neural network processor 210.
The controller 220 may be implemented as a central processing unit or a microprocessor. The controller 220 may control all operations of the neural network device 200. In an example embodiment, the controller 220 may control the neural network device 200 by executing a program of instructions stored in the system memory 230. The controller 220 may control the operations of the neural network processor 210 and the system memory 230. For example, the controller 220 may set and manage parameters so that the neural network processor 210 can normally execute each layer of the neural network.
The controller 220 may generate a weight list from a weight matrix and provide the weight list to the neural network processor 210. However, the inventive concept is not limited thereto. The neural network device 200 or the neural network processor 210 may include a separate processing circuit that generates the weight list from the weight matrix.
The neural network processor 210 may include a plurality of processing circuits 211. The processing circuits 211 may be configured to run simultaneously in parallel. In addition, the processing circuits 211 may run independently of one another. Each of the processing circuits 211 may be implemented as a core circuit capable of executing instructions. The processing circuits 211 may perform the index-based operations described above with reference to FIGS. 4 to 15.
The neural network processor 210 may be implemented with hardware circuits. For example, the neural network processor 210 may be implemented as an integrated circuit. The neural network processor 210 may include at least one of a central processing unit, a multi-core processor, an array processor, a vector processor, a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), an application specific integrated circuit (ASIC), programmable logic circuitry, a video processing unit (VPU) and a graphics processing unit (GPU). However, the inventive concept is not limited thereto.
The neural network processor 210 may also include an internal memory 212. The internal memory 212 may be a cache memory of the neural network processor 210. The internal memory 212 may be a static random access memory, but is not limited thereto. The internal memory 212 may be implemented as a buffer or a cache memory of the neural network processor 210, or as one of the other types of memory of the neural network processor 210. The internal memory 212 may store data generated by the operations performed by the processing circuits 211, for example, output feature indices, output feature values or various data generated during the operations.
The system memory 230 may be implemented as random access memory (for example, dynamic random access memory or static random access memory). The system memory 230 may be connected to the neural network processor 210 through a memory controller. The system memory 230 may store various types of programs and data. The system memory 230 may store weight maps provided from an external device (for example, a server or an external memory).
The system memory 230 may buffer a weight map corresponding to the next layer to be executed by the neural network processor 210. When an operation is performed using the weight map in the processing circuit 211, the weight map may be output from an external memory (for example, the memory 140 in FIG. 1) and stored in the internal memory 212 of the neural network processor 210 (also referred to herein as a "second memory") or in a dedicated memory included in the processing circuit 211. The weight map may be stored in matrix form (that is, as a weight matrix) or in index-based list form (that is, as a weight list). The system memory 230 (also referred to herein as a "first memory") may temporarily store the weight map output from the memory 140 (also referred to herein as an "external memory") before providing the weight map to the internal memory 212 or to the dedicated memory included in the processing circuit 211.
The system memory 230 may also temporarily store an output feature map output from the neural network processor 210.
FIG. 17 is a block diagram of a neural network processor according to some example embodiments of the inventive concept. FIG. 17 shows the neural network processor 210 of FIG. 16 in detail.
Referring to FIG. 17, the neural network processor 210 may include at least one processing circuit 211, a list maker 213 and an internal memory 212 (a "second memory"). The neural network processor 210 may also include a compressor 214 and a selector 215. The processing circuit 211 may include an index remapper 21, a first data operation circuit 22 (a "multiplier"), a second data operation circuit 23 (an "accumulator") and a dedicated memory 24 (a "third memory").
The list maker 213 may generate an input feature list from the input features. The list maker 213 may identify the inputs having non-zero values and generate an input feature list of the inputs having non-zero values.
When the received input features form a compressed input feature matrix, the list maker 213 may decompress the input feature matrix and generate the input feature list based on the decompressed input feature matrix. When the received input features include a compressed input feature list, the list maker 213 may generate the input feature list by performing decompression.
The selector 215 may selectively provide either the input feature list output from the list maker 213 or an input feature list received from the internal memory 212 to the processing circuit 211. For example, in a first operating mode the selector 215 may provide the input feature list from the list maker 213 to the processing circuit 211. The first operating mode may be a linear operation mode. For example, the first operating mode may be a convolution mode. In a second operating mode, the selector 215 may provide an input feature list from the internal memory 212 to the processing circuit 211. The second operating mode may be a pooling mode or a nonlinear mode using an activation function. For example, in the second operating mode a pooling operation may be performed, or an activation function may be applied to the output feature values generated in the first operating mode.
The index remapper 21 may perform index operations and generate output feature indices. The index remapper 21 may perform the index operations described above with reference to FIGS. 4 to 15. The index remapper 21 may include an arithmetic circuit.
The index remapper 21 may receive the input feature list from the selector 215 and receive the weight list from the dedicated memory 24. The index remapper 21 may add an input feature index and a weight index to generate a summed index. The index remapper 21 may divide the summed index by a specific (or, alternatively, predetermined) integer (for example, the stride length or the sampling unit used in the pooling operation).
The index remapper 21 may filter the generated indices so that data operations are performed on the meaningful indices among the generated indices. For example, the index remapper 21 may classify the generated indices into output feature indices and other indices, so that the first data operation circuit 22 and/or the second data operation circuit 23 perform data operations on the output feature indices included in the output feature list. The index remapper 21 may control the first data operation circuit 22 and/or the second data operation circuit 23 not to perform operations on the other indices.
The index remapper 21 may request reading of data stored in the dedicated memory 24. For example, the index remapper 21 may request the dedicated memory 24 to read the weight list. As another example, in the second operating mode the index remapper 21 may transmit a read request signal to the dedicated memory 24, the read request signal being associated with a request to read, among multiple parameters, the parameter corresponding to a first input feature value. Alternatively, the index remapper 21 may request the dedicated memory 24 to output the parameter corresponding to an input feature value (for example, an output feature value in the output feature list).
The dedicated memory 24 may store various data used while the processing circuit 211 performs operations. For example, the dedicated memory 24 may store the weight list. The dedicated memory 24 may also store a lookup table that includes parameters corresponding to input feature values. The dedicated memory 24 may provide the weight list to the index remapper 21 and the first data operation circuit 22 in response to a request of the index remapper 21. The dedicated memory 24 may also provide the parameters to the first data operation circuit 22 and the second data operation circuit 23 in response to a request of the index remapper 21.
The first data operation circuit 22 and the second data operation circuit 23 may perform data operations. The first data operation circuit 22 and the second data operation circuit 23 may together form a data operation circuit. The first data operation circuit 22 and the second data operation circuit 23 may perform the data operations described above with reference to FIGS. 4 to 15.
The first data operation circuit 22 may perform multiplication operations. The first data operation circuit 22 may include a multiplier. When the processing circuit 211 performs a convolution operation, the first data operation circuit 22 may multiply an input feature value in the input feature list by a weight value in the weight list. The multiplication result may be provided to the second data operation circuit 23. The first data operation circuit 22 may be implemented as an array of multipliers.
The second data operation circuit 23 may perform addition operations and may also perform division operations. In addition, the second data operation circuit 23 may perform various other types of operations. The second data operation circuit 23 may be implemented as an accumulator or an arithmetic operation circuit. The second data operation circuit 23 may be implemented as an array of operational circuits. For example, the second data operation circuit 23 may be implemented as an array of accumulators.
The internal memory 212 may store the data output from the processing circuit 211. For example, the internal memory 212 may store the output feature indices and the corresponding output feature values received from the second data operation circuit 23. In other words, the internal memory 212 may store the output feature list. In addition, the internal memory 212 may store intermediate results output from the processing circuit 211 during operations. The intermediate results may be provided to the second data operation circuit 23 for use in the operations of the second data operation circuit 23.
The data stored in the internal memory 212 may be provided to the processing circuit 211 through the selector 215. In other words, the data obtained by the current operation of the processing circuit 211 may be used in the next operation. For example, the output feature list generated by the convolution operation of the processing circuit 211 may be provided to the processing circuit 211 as an input feature list, and the processing circuit 211 may perform a pooling operation on that input feature list.
Meanwhile, the output feature list may be output from the second data operation circuit 23 to the outside (for example, to the memory 140 of the electronic system 100), or may be stored in the internal memory 212 and then output. The output feature list may be output through the compressor 214. The compressor 214 may compress the output feature list and output the compressed output feature list.
The operation of the processor according to the operating mode will be described below with reference to FIGS. 18 and 19.
FIG. 18 is a diagram for explaining the state in which a neural network processor according to some example embodiments of the inventive concept runs in the first operating mode. The first operating mode may be a convolution operation mode.
Referring to FIG. 18, the list maker 213 may receive an input feature map IFM and generate an input feature list. The list maker 213 may provide the input feature list to the processing circuit 211.
The index remapper 21 and the first data operation circuit 22 may respectively receive, from the weight list stored in the dedicated memory 24, weight indices and the weight values corresponding to the weight indices. The index remapper 21 may receive the weight indices and the first data operation circuit 22 may receive the weight values.
The index remapper 21 may perform index operations based on the input feature indices and the weight indices, and the first data operation circuit 22 may perform data operations on the input feature values and the weight values. The index remapper 21 may add an input feature index and a weight index and may also perform division on the addition value, to generate an output feature index.
The index remapper 21 may also determine whether an output feature index is meaningful. When it is determined that an output feature index is meaningless, the index remapper 21 may control the first data operation circuit 22 not to perform an operation on the input feature value and weight value corresponding to that output feature index. Therefore, the first data operation circuit 22 may perform operations only on the input feature values and weight values corresponding to meaningful output feature indices.
The second data operation circuit 23 may add together the operation results, output from the first data operation circuit 22, that correspond to the same output feature index. Therefore, the first data operation circuit 22 and the second data operation circuit 23 may perform the multiplication and addition operations included in the convolution operation.
The second data operation circuit 23 may store the output feature list generated by the convolution operation in the internal memory 212, or the output feature list may be output through the compressor 214.
FIG. 19 is a diagram for explaining the state in which a neural network processor according to some example embodiments of the inventive concept runs in the second operating mode. The second operating mode may be executed after the first operating mode. In the second operating mode, an activation function may be applied to the output feature values in the output feature list generated in the first operating mode.
Referring to FIG. 19, the result of the operation performed in the first operating mode may be stored in the internal memory 212. For example, the internal memory 212 may store the output feature list (that is, the result of performing the index-based convolution operation on the input feature list).
The index remapper 21 may receive input feature values from the internal memory 212 (that is, the output feature values in the output feature list). The dedicated memory 24 (also referred to herein as the "third memory") may store a lookup table that includes parameters corresponding to input feature values. In other words, the lookup table may include multiple parameters corresponding to each of multiple feature values. A sign function, a sigmoid function or an exponential function may be used in the neural network. These activation functions are nonlinear. The lookup table may enable a nonlinear activation function to be computed as a piecewise linear function. The output "f" of the activation function for an input feature value "v" can be expressed as the result of applying a piecewise linear function to the input feature value "v", as defined in Equation 1:
f = c(v)·v + b(v)    (1)
where c(v) is a coefficient corresponding to the input feature value "v" and b(v) is a bias value corresponding to the input feature value "v". The lookup table may include the parameters corresponding to different input feature values.
The index remapper 21 may request the parameters corresponding to the input feature value "v" from the dedicated memory 24. This request may include transmitting a read request signal to the dedicated memory 24, the read request signal being associated with a request to read, among the multiple parameters, the parameters corresponding to the input feature value. The received parameters may include a first parameter and a second parameter received from the dedicated memory 24, where the first parameter and the second parameter correspond to the input feature value. Accordingly, the parameters corresponding to the input feature value "v" (that is, c(v) and b(v)) may be output from the lookup table stored in the dedicated memory 24. In other words, the output feature value may be generated based on the input feature value, the first parameter and the second parameter.
The parameter c(v) may be provided to the first data operation circuit 22 and the parameter b(v) may be provided to the second data operation circuit 23. The first data operation circuit 22 may perform a multiplication operation based on the input feature value "v" and the parameter c(v), and the second data operation circuit 23 may perform an addition operation based on the operation result received from the first data operation circuit 22 and the parameter b(v). As a result, the output "f" of the activation function for the input feature value "v" may be generated. The output feature values of the activation function for multiple input feature values may be output to the outside of the neural network processor. The output feature values of the activation function may be compressed by the compressor 214 before being output to the outside.
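As an illustrative, non-limiting sketch, the piecewise linear approximation of Equation 1 may be modeled in Python as follows. The segment boundaries and the parameter values in the lookup table are hypothetical and are chosen only to illustrate the c(v) and b(v) lookup; they do not correspond to a particular activation function used by the device.

LOOKUP_TABLE = [
    # (lower bound of segment, coefficient c(v), bias b(v)) -- hypothetical values
    (float("-inf"), 0.0, 0.0),
    (-2.0,          0.2, 0.4),
    ( 2.0,          0.0, 1.0),
]

def lut_activation(v):
    c, b = 0.0, 0.0
    for lower, coeff, bias in LOOKUP_TABLE:
        if v >= lower:
            c, b = coeff, bias       # first parameter c(v) and second parameter b(v)
    return c * v + b                 # multiplication, then addition, as in FIG. 19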
FIG. 20 is a diagram of the data flow during a convolution operation in a neural network.
Referring to FIG. 20, the input feature map IFM and the output feature map OFM may have a three-dimensional matrix form. When the convolution operation is performed, a plurality of kernels KN0 to KN4 having a three-dimensional matrix form may be applied to the input feature map IFM. As a result, the output feature map OFM may be generated.
The kernels KN0 to KN4 may be filters different from one another for obtaining different characteristics from the input feature map IFM. The number of channels CH included in each of the kernels KN0 to KN4 is the same as the number of channels of the input feature map IFM.
When the convolution operation is performed, each of the kernels KN0 to KN4 may be translated over the x-y plane of the input feature map IFM. Accordingly, the convolution operation may be performed channel by channel on the input feature map IFM and the kernels KN0 to KN4. For example, in the convolution operation the channel CHk of the kernels KN0 to KN4 may be applied to the channel CHk of the input feature map IFM. When the convolution operation is performed by applying one of the kernels KN0 to KN4 to the input feature map IFM, the convolution operation may be performed independently for each channel. The output feature values obtained from the convolution operation that have the same spatial position (for example, output features at the same position on the x-y plane but corresponding to different channels) may be added. Therefore, the result of performing the convolution operation by applying one of the kernels KN0 to KN4 to the input feature map IFM may correspond to one channel of the output feature map OFM.
When the convolution operation is performed based on the multiple kernels KN0 to KN4, multiple channels may be generated. As shown in FIG. 20, when the convolution operation is performed based on the five kernels KN0 to KN4, the output feature map OFM may include five channels.
The convolution operations using the respective kernels KN0 to KN4 may be performed simultaneously in parallel. The convolution operations may be performed in parallel in different processing circuits. However, this may vary with the hardware configuration of the neural network.
FIGS. 21 and 22 are diagrams of the data processing during a convolution operation performed in an index-based neural network according to some example embodiments of the inventive concept. FIG. 21 shows data processing that enables the index-based convolution operation to be performed efficiently in a sparse neural network, the sparse neural network having sparse non-zero values in the input feature map and the weight feature map.
As described above with reference to FIG. 20, the convolution operations based on the kernels KN0 to KN4 may each be performed simultaneously in parallel in different processing circuits. However, according to the present example of the inventive concept, when the convolution operation is performed simultaneously in parallel in different processing circuits for each channel of the input feature map IFM in an index-based neural network (and, in particular, in a sparse neural network), the operations on the input features having non-zero values may be performed and the operations on the input features having a zero value may be skipped. Since the input features having non-zero values occupy different spatial positions in the multiple channels of the input feature map IFM, performing operations for each channel of the input feature map IFM in different processing circuits makes it easier to skip the operations performed on the zero values.
As described above, in order to perform the convolution operation in parallel for each channel of the input feature map IFM in different processing circuits, the index-based neural network may divide each kernel and regroup the same channels of the respective kernels into one channel group.
Referring to FIG. 21, the channels of the first kernel KN0 through the fifth kernel KN4 shown in FIG. 20 may be regrouped. For example, the first channels of the kernels KN0 to KN4 may be regrouped into a first channel group CH0, and the second channels of the kernels KN0 to KN4 may be regrouped into a second channel group CH1. In this manner, the multiple channels of the kernels KN0 to KN4 may be regrouped into different channel groups. Since the number of channels of each kernel is the same as the number "n" of channels of the input feature map, "n" channel groups CH0 to CHn-1 may be generated by the regrouping. Each channel group may be referred to as a core.
When the convolution operation is performed, the channel group, among the channel groups CH0 to CHn-1, corresponding to each channel of the input feature map IFM may be used. For example, the convolution operation may be performed on the second channel of the input feature map IFM and the second channel group CH1. Each of the channel groups CH0 to CHn-1 includes channels of the kernels KN0 to KN4, and therefore the result of the convolution operation based on one of the channel groups CH0 to CHn-1 may affect all of the first through fifth channels of the output feature map OFM. When, among the convolution operation results for the "n" channel groups, the convolution operation results that are generated from one kernel and that correspond to one spatial position on the output feature map OFM are added together, the output feature map may be completed.
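As an illustrative, non-limiting sketch, the regrouping of kernel channels into channel groups may be modeled in Python as follows; kernels are assumed, for illustration only, to be given as lists of per-channel 2D weight lists.

def regroup_into_channel_groups(kernels):
    # channel_groups[c][k] is channel c of kernel k, so one processing circuit
    # can handle channel c of the input feature map with channel group c (FIG. 21).
    num_channels = len(kernels[0])
    return [[kernel[c] for kernel in kernels] for c in range(num_channels)]

The per-group convolution results that come from the same kernel and the same spatial position are then added to complete the output feature map.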
Referring to FIG. 22, input features IFB from different channels but having the same spatial position (that is, the same index) may be convolved with different channel groups. Since the index-based neural network according to some example embodiments of the inventive concept performs operations on non-zero values, no operation is performed on an input feature having a zero value. Therefore, the operations of the processing circuits respectively corresponding to the first channel including the first input feature F0 having a zero value, the sixth channel including the sixth input feature F5 having a zero value and the ninth channel including the ninth input feature F8 having a zero value may be interrupted. However, since the index-based neural network device 200 runs based on the indices corresponding to the input features having non-zero values, and the input features having non-zero values are provided to the corresponding processing circuits, the processing circuits may substantially keep running until the operations on the input features having non-zero values in each channel of the input feature map IFM are completed.
FIG. 23 is a diagram of a neural network processor 210a according to some example embodiments of the inventive concept. The neural network processor 210a may have a hardware configuration suitable for the sparse neural network operation described with reference to FIGS. 21 and 22 and may perform operations in parallel for each channel of the input feature map IFM.
Referring to FIG. 23, the neural network processor 210a may include a selector 215a, a plurality of processing circuits 211a_0 to 211a_k and a global accumulator 216. The neural network processor 210a may also include a list maker and a compressor.
The neural network processor 210a may generate an input feature list for each channel of the input feature map IFM. The selector 215a may provide the input feature list of the input features included in each channel to one of the processing circuits 211a_0 to 211a_k. For example, the selector 215a may provide the input feature list of the input features included in the first channel to the first processing circuit 211a_0, and may provide the input feature list of the input features included in the k-th channel to the k-th processing circuit 211a_k.
The processing circuits 211a_0 to 211a_k may respectively correspond to the channels of the input feature map IFM. In other words, each of the processing circuits 211a_0 to 211a_k may correspond to a core (that is, one of the channel groups shown in FIGS. 21 and 22). The structure of each of the processing circuits 211a_0 to 211a_k is similar to that of the processing circuit 211 shown in FIG. 17. However, each of the processing circuits 211a_0 to 211a_k may include multiple elements corresponding to one element of the processing circuit 211, in order to perform operations in parallel for multiple input features.
For example, the first processing circuit 211a_0 may include a plurality of index remappers 21a, a plurality of first data operation circuits 22a, a plurality of second data operation circuits 23a and a dedicated memory 24a.
Each of the index remappers 21a may include an arithmetic circuit. The first data operation circuits 22a may be an array of multipliers. The second data operation circuits 23a may be an array of adders. However, the inventive concept is not limited thereto. Each of the second data operation circuits 23a may also include an arithmetic circuit.
The dedicated memory 24a may store a weight list WL or a lookup table LUT. When the neural network processor 210a performs a convolution operation, the dedicated memory 24a may output the weight indices of the weights in the weight list WL to the index remappers 21a and may output the weight values corresponding to the weights to the first data operation circuits 22a. The weight list WL may include weight indices, weight values and kernel indices corresponding to the respective weight indices.
When the neural network processor 210a performs a nonlinear operation, the dedicated memory 24a may provide the parameters corresponding to the input features to the first data operation circuits 22a and the second data operation circuits 23a in order to support a piecewise linear function.
The operation of the first processing circuit 211a_0 is similar to the operation of the processing circuit 211 described with reference to FIGS. 17 to 19. However, the index remappers 21a may perform index operations in parallel, and the first data operation circuits 22a and the second data operation circuits 23a may perform data operations in parallel.
The other processing circuits 211a_1 to 211a_k may include substantially the same elements as the first processing circuit 211a_0 and may perform substantially the same operation as the first processing circuit 211a_0.
Meanwhile, some of the operation values output from the respective processing circuits 211a_0 to 211a_k may correspond to the same position on the output feature map. Therefore, the global accumulator 216 may add the operation values that are output from different processing circuits but correspond to the same position on the output feature map.
At this time, due to the characteristics of a sparse neural network, the positions on the output feature map onto which the operation values output from the processing circuits 211a_0 to 211a_k are mapped may be randomly distributed, and the positions on the output feature map onto which operation values simultaneously output from the processing circuits 211a_0 to 211a_k are mapped may be the same as one another. If the global accumulator 216 were to accumulate the operation values output from the processing circuits 211a_0 to 211a_k in real time, the load of the global accumulator 216 could be excessively increased.
For this reason, the second data operation circuits 23a included in each of the processing circuits 211a_0 to 211a_k may add the operation values output from the first data operation circuits 22a according to spatial position and channel on the output feature map, thereby generating an addition value for each spatial position and channel. The processing circuits 211a_0 to 211a_k may be synchronized to output the addition values. Each of the second data operation circuits 23a may include a static random access memory bank (SRAM bank) so that the operation values output from the first data operation circuits 22a can be added according to spatial position and channel on the output feature map.
The addition values output from the processing circuits 211a_0 to 211a_k may be output as vector data according to the corresponding positions on the output feature map. The global accumulator 216 may accumulate the vector data.
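As an illustrative, non-limiting sketch, the final accumulation step may be modeled in Python as follows; the data layout, in which each processing circuit has already reduced its operation values per spatial position and output channel, is an assumption made only for illustration.

def global_accumulate(per_circuit_results):
    # per_circuit_results: one dict per processing circuit, mapping
    # (spatial position, output channel) -> addition value.
    output_feature_map = {}
    for partial in per_circuit_results:
        for key, addition_value in partial.items():
            output_feature_map[key] = output_feature_map.get(key, 0) + addition_value
    return output_feature_map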
FIG. 24 is a diagram of the data processing during a convolution operation performed in an index-based neural network according to some example embodiments of the inventive concept. FIG. 24 shows data processing that enables the index-based convolution operation to be performed efficiently in a dense neural network, the dense neural network having dense non-zero values in the input feature map and the weight feature map.
Since a dense neural network has only sparse input features or weights with a zero value, operations may be performed efficiently by simplifying the operation steps rather than by skipping, within the operation steps, the operations performed on zero values.
Referring to FIG. 24, the input feature map IFM may be convolved with each of the kernels KN0 to KN4. The convolution operations based on the respective kernels KN0 to KN4 may be performed in parallel in different processing circuits.
As described above with reference to FIG. 20, when the convolution operation is performed on the input feature map IFM and one of the kernels KN0 to KN4, the convolution operation is performed channel by channel for the same channel. Among the operation values obtained by the convolution operation, the operation values corresponding to an output feature index indicating one spatial position on the output feature map OFM may be added. The convolution operation performed on the input feature map IFM and one kernel may form one channel of the output feature map OFM.
The input features corresponding to an input feature index indicating one spatial position may be expressed by an input feature vector. The weights corresponding to a weight index indicating one spatial position may be expressed by a weight vector. Therefore, the input feature map may include input feature indices and the input feature vectors corresponding to the input feature indices, and the weight list may include weight indices and the weight vectors corresponding to the weight indices. For example, each of the kernels KN0 to KN4 shown in FIG. 24 may have nine indices, and the weight list may include the nine indices and the weight vectors respectively corresponding to the nine indices.
An input feature index is added to a weight index to generate an output feature index. The dot product of the input feature vector and the weight vector may be output as the operation value corresponding to that output feature index. Multiple operation values may exist for one output feature index. The operation values may be added to generate the output feature value corresponding to that output feature index.
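As an illustrative, non-limiting sketch, the vector-based form used for a dense neural network may be modeled in Python as follows; the dictionary layouts are assumptions made only for illustration, with one kernel handled per call.

def dense_index_conv(input_feature_map, weight_list):
    # input_feature_map: {(row, col): per-channel input feature vector}
    # weight_list: {(row, col): per-channel weight vector} for one kernel.
    output = {}
    for (fr, fc), f_vec in input_feature_map.items():
        for (wr, wc), w_vec in weight_list.items():
            out_idx = (fr + wr, fc + wc)                     # output feature index
            dot = sum(f * w for f, w in zip(f_vec, w_vec))   # dot product over channels
            output[out_idx] = output.get(out_idx, 0) + dot   # accumulate per index
    return output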
FIG. 25 is a diagram of a neural network processor 210b according to some example embodiments of the inventive concept. The neural network processor 210b shown in FIG. 25 may have a hardware configuration suitable for the dense neural network operation described with reference to FIG. 24 and may perform operations in parallel for each kernel.
Referring to FIG. 25, the neural network processor 210b may include a plurality of processing circuits 211b_0 to 211b_k. The neural network processor 210b may also include an internal memory shared by the processing circuits 211b_0 to 211b_k, or multiple internal memories supporting the respective processing circuits 211b_0 to 211b_k. The neural network processor 210b may also include a list maker and a compressor.
The processing circuits 211b_0 to 211b_k may respectively correspond to different kernels. The structure of the processing circuits 211b_0 to 211b_k is similar to that of the processing circuit 211 shown in FIG. 17. However, since the processing circuits 211b_0 to 211b_k may compute vector dot products, each of the processing circuits 211b_0 to 211b_k may include an address remapper 21b, a plurality of first data operation circuits 22b and a plurality of second data operation circuits 23b. Each of the processing circuits 211b_0 to 211b_k may include a dedicated memory 24b for storing the weight list. The weight list may include weight indices and the weight vectors corresponding to the weight indices.
The address remapper 21b may include an arithmetic circuit. The first data operation circuits 22b may be an array of multipliers. The second data operation circuits 23b may be an array of adders. The address remapper 21b may perform operations on the input feature indices received from the outside and the weight indices provided from the dedicated memory 24b, the first data operation circuits 22b may multiply the input feature values by the weight values, and the second data operation circuits 23b may add the multiplication values obtained by the multiplication. Therefore, the dot product of an input feature vector corresponding to an input feature index and a weight vector corresponding to a weight index may be performed.
While the inventive concept has been particularly shown and described with reference to embodiments thereof, it will be understood that various changes in form and details may be made therein without departing from the spirit and scope of the following claims.

Claims (20)

1. A method of operating a neural network device, the method comprising:
generating an input feature list based on an input feature map, the input feature list including an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature;
generating an output feature index based on a first operation performed on the input feature index and a weight index of a weight list; and
generating an output feature value corresponding to the output feature index, based on a second operation performed on the input feature value and a weight value corresponding to the weight index.
2. The method of claim 1, wherein the generating of the input feature list includes generating the input feature list based on at least one input feature having a non-zero value among multiple input features of the input feature map.
3. The method of claim 1, wherein the weight list includes at least one weight index and at least one weight value, the at least one weight index and the at least one weight value corresponding to at least one weight among multiple weights of a weight map, the at least one weight having a non-zero value.
4. The method of claim 1, wherein the generating of the output feature value includes multiplying the input feature value by the weight value.
5. The method of claim 1, wherein the generating of the output feature value includes:
generating a multiplication value based on multiplying the input feature value by the weight value, the multiplication value corresponding to the output feature index; and
generating the output feature value based on adding multiple multiplication values, the multiple multiplication values corresponding to the output feature index.
6. according to the method described in claim 1, it is characterized in that, the generation output aspect indexing includes will be described defeated Enter aspect indexing to be added with weight index.
7. according to the method described in claim 6, it is characterized in that, the generation output aspect indexing further comprises:
It will be added obtained addition value divided by integer by described;And
Select the quotient of the division as the output feature based on determining that there is no remainder after the division is completed Index.
8. The method of claim 1, wherein generating the input feature list comprises:
generating an initial input feature list, the initial input feature list comprising an initial input feature index corresponding to a position of the input feature and the input feature value corresponding to the input feature; and
generating the input feature index, to which zero padding is applied, based on adding a feature offset index to the initial input feature index.
9. The method of claim 1, further comprising generating the weight list from a weight map.
10. The method of claim 9, wherein generating the weight list comprises:
generating an initial weight list, the initial weight list comprising an initial weight index corresponding to a position of a weight and a weight value of the weight from the weight map; and
adjusting the initial weight index to correspond to a certain operation.
11. The method of claim 10, wherein adjusting the initial weight index comprises:
forming a mirror image of the initial weight index based on a weight bias index, the weight bias index indicating a center of a matrix of the weight map; and
subtracting the weight bias index from the mirrored weight index.
12. The method of claim 1, further comprising generating an information signal based on the output feature value.
13. A method of operating a neural network device, the method comprising:
generating an input feature list, the input feature list comprising an input feature index and an input feature value corresponding to an input feature having a nonzero value, the input feature index indicating a position of the input feature on an input feature map;
generating an output feature index based on an index operation performed on the input feature index; and
generating an output feature value corresponding to the output feature index based on a data operation performed on the input feature value.
14. The method of claim 13, wherein generating the output feature index comprises adding the input feature index to a weight index of a weight list, the weight index corresponding to a weight having a nonzero value.
15. The method of claim 14, wherein generating the output feature value comprises multiplying the input feature value by a weight value corresponding to the weight index.
16. The method of claim 13, wherein generating the output feature index comprises:
performing a division on the input feature index based on a certain sub-sample size; and
selecting a quotient of the division as the output feature index, the output feature index corresponding to the input feature.
17. The method of claim 13, wherein generating the output feature value comprises calculating the output feature value corresponding to the output feature index as:
a maximum value among a plurality of input feature values corresponding to the output feature index, or
an average value of the plurality of input feature values.
18. A method of operating a neural network device, comprising:
generating, using a list maker of a processor, an input feature list based on an input feature map, the input feature list comprising an input feature index and an input feature value, the input feature index and the input feature value corresponding to an input feature; and
causing an index remapper of the processor to perform a first operation to generate an output feature index, the first operation comprising:
adding the input feature index to a weight index of a weight list,
dividing an addition value obtained by the adding by an integer, and
selecting a quotient of the dividing as the output feature index based on determining that there is no remainder after the dividing is completed.
19. The method of claim 18, further comprising:
causing a data operation circuit to perform a second operation on the input feature value and a weight value corresponding to the weight index, to generate an output feature value corresponding to the output feature index.
20. The method of claim 19, wherein generating the output feature value comprises:
generating a multiplication value based on multiplying the input feature value by the weight value, the multiplication value corresponding to the output feature index; and
generating the output feature value based on adding a plurality of multiplication values, the plurality of multiplication values corresponding to the output feature index.
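For orientation only, the sequence of operations recited in the claims above can be sketched in ordinary Python for a one-dimensional case. This is an illustrative reading of the claims, not the claimed method or the disclosed hardware: the function names, the stride parameter, the pooling window size, and the example data are assumptions introduced for the sketch.

```python
# Illustrative sketch of the claimed flow for a 1-D case (assumed helpers and data).
import numpy as np

def make_feature_list(feature_map):
    """Claims 1-2: keep only nonzero features as (index, value) pairs."""
    return [(int(i), float(v)) for i, v in enumerate(feature_map) if v != 0]

def make_weight_list(weight_map):
    """Claims 9-11: list nonzero weights; mirror each initial index about the
    kernel center (the weight bias index) and subtract the bias index."""
    bias = len(weight_map) // 2                      # weight bias index (kernel center)
    out = []
    for i, w in enumerate(weight_map):
        if w != 0:
            mirrored = 2 * bias - i                  # mirror about the center
            out.append((mirrored - bias, float(w)))  # subtract the weight bias index
    return out

def sparse_conv(feature_list, weight_list, length, stride=1):
    """Claims 6-7, 18-20: output index = (input index + weight index) / stride,
    kept only when the division leaves no remainder; the output value is the
    sum of the corresponding multiplication values."""
    out = np.zeros((length + stride - 1) // stride)
    for f_idx, f_val in feature_list:
        for w_idx, w_val in weight_list:
            addr = f_idx + w_idx
            if addr < 0 or addr >= length or addr % stride != 0:
                continue
            out[addr // stride] += f_val * w_val
    return out

def max_pool(feature_list, window):
    """Claims 16-17: the output index is the quotient of the input index and
    the sub-sample size; the output value is the maximum of the input values
    sharing that quotient (an average could be used instead)."""
    pooled = {}
    for f_idx, f_val in feature_list:
        o_idx = f_idx // window
        pooled[o_idx] = max(pooled.get(o_idx, float("-inf")), f_val)
    return pooled

feature_map = np.array([0.0, 2.0, 0.0, 0.0, 1.5, 0.0])
kernel = np.array([0.5, 0.0, -1.0])
features = make_feature_list(feature_map)
weights = make_weight_list(kernel)
print(sparse_conv(features, weights, length=len(feature_map)))
print(max_pool(features, window=2))
```

Because only nonzero features and weights are listed, the two nested loops visit far fewer index pairs than a dense sliding-window computation over the same map would.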
CN201810167217.XA 2017-03-03 2018-02-28 The method for operating neural network device Pending CN108537325A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020170027778A KR102499396B1 (en) 2017-03-03 2017-03-03 Neural network device and operating method of neural network device
KR10-2017-0027778 2017-03-03

Publications (1)

Publication Number Publication Date
CN108537325A true CN108537325A (en) 2018-09-14

Family

ID=63355193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810167217.XA Pending CN108537325A (en) 2017-03-03 2018-02-28 The method for operating neural network device

Country Status (4)

Country Link
US (2) US11295195B2 (en)
KR (1) KR102499396B1 (en)
CN (1) CN108537325A (en)
TW (1) TWI765979B (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10878310B2 (en) * 2016-11-29 2020-12-29 Mellanox Technologies, Ltd. Accelerated convolution in convolutional neural networks
US10474458B2 (en) 2017-04-28 2019-11-12 Intel Corporation Instructions and logic to perform floating-point and integer operations for machine learning
WO2018211129A1 (en) * 2017-05-19 2018-11-22 Movidius Ltd. Methods, systems and apparatus to improve convolution efficiency
US10489542B2 (en) * 2018-04-24 2019-11-26 Nvidia Corp. Machine learning based post route path delay estimator from synthesis netlist
TWI680409B (en) * 2017-07-08 2019-12-21 英屬開曼群島商意騰科技股份有限公司 Method for matrix by vector multiplication for use in artificial neural network
JP2019036899A (en) * 2017-08-21 2019-03-07 株式会社東芝 Information processing unit, information processing method and program
US10366322B2 (en) * 2017-10-06 2019-07-30 DeepCube LTD. System and method for compact and efficient sparse neural networks
DE102018203709A1 (en) * 2018-03-12 2019-09-12 Robert Bosch Gmbh Method and device for memory-efficient operation of a neural network
US10572568B2 (en) * 2018-03-28 2020-02-25 Intel Corporation Accelerator for sparse-dense matrix multiplication
US11782839B2 (en) * 2018-08-21 2023-10-10 Neuchips Corporation Feature map caching method of convolutional neural network and system thereof
US11467973B1 (en) * 2018-09-28 2022-10-11 Amazon Technologies, Inc. Fine-grained access memory controller
AU2019350918B2 (en) * 2018-09-30 2021-10-07 Boe Technology Group Co., Ltd. Apparatus and method for image processing, and system for training neural network
US11610111B2 (en) * 2018-10-03 2023-03-21 Northeastern University Real-time cognitive wireless networking through deep learning in transmission and reception communication paths
CN110770763A (en) * 2018-10-08 2020-02-07 深圳市大疆创新科技有限公司 Data storage device, method, processor and removable equipment
KR102137151B1 (en) * 2018-12-27 2020-07-24 엘지전자 주식회사 Apparatus for noise canceling and method for the same
US11488016B2 (en) * 2019-01-23 2022-11-01 Google Llc Look-up table based neural networks
KR20200091623A (en) * 2019-01-23 2020-07-31 삼성전자주식회사 Method and device for performing convolution operation on neural network based on Winograd transform
KR20200094534A (en) 2019-01-30 2020-08-07 삼성전자주식회사 Neural network apparatus and method for processing multi-bits operation thereof
US11934342B2 (en) 2019-03-15 2024-03-19 Intel Corporation Assistance for hardware prefetch in cache access
US20220179787A1 (en) 2019-03-15 2022-06-09 Intel Corporation Systems and methods for improving cache efficiency and utilization
JP7408671B2 (en) * 2019-03-15 2024-01-05 インテル コーポレイション Architecture for block sparse operations on systolic arrays
CN110163370B (en) * 2019-05-24 2021-09-17 上海肇观电子科技有限公司 Deep neural network compression method, chip, electronic device and medium
TWI745697B (en) * 2019-05-24 2021-11-11 創鑫智慧股份有限公司 Computing system and compressing method thereof for neural network parameters
US20210064987A1 (en) * 2019-09-03 2021-03-04 Nvidia Corporation Processor and system to convert tensor operations in machine learning
US11663452B2 (en) * 2019-09-25 2023-05-30 Intel Corporation Processor array for processing sparse binary neural networks
KR20210084123A (en) * 2019-12-27 2021-07-07 삼성전자주식회사 Electronic apparatus and controlling method thereof
US11113601B1 (en) * 2020-06-30 2021-09-07 Moffett Technologies Co., Limited Method and system for balanced-weight sparse convolution processing
KR20220034520A (en) * 2020-09-11 2022-03-18 삼성전자주식회사 Processing apparatus, computing apparatus, and operating method of processing apparatus
GB2599098B (en) * 2020-09-22 2024-04-10 Imagination Tech Ltd Hardware implementation of windowed operations in three or more dimensions
US20220108328A1 (en) * 2020-10-06 2022-04-07 Mastercard International Incorporated Systems and methods for linking indices associated with environmental impact determinations for transactions
CN115481713A (en) * 2021-06-15 2022-12-16 瑞昱半导体股份有限公司 Method for improving convolution neural network to calculate
WO2023105616A1 (en) * 2021-12-07 2023-06-15 日本電信電話株式会社 Deep learning inference system

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3246764B2 (en) 1992-05-11 2002-01-15 株式会社東芝 Neurochip
JPH08305846A (en) 1995-03-07 1996-11-22 Matsushita Electric Ind Co Ltd Neuro filter, image area dividing method, and filter device
US6516309B1 (en) 1998-07-17 2003-02-04 Advanced Research & Technology Institute Method and apparatus for evolving a neural network
KR20030057562A (en) 2000-11-30 2003-07-04 양 밍 폭 Neural Cortex
US7634137B2 (en) 2005-10-14 2009-12-15 Microsoft Corporation Unfolded convolution for fast feature extraction
JP5184824B2 (en) * 2007-06-15 2013-04-17 キヤノン株式会社 Arithmetic processing apparatus and method
US10366325B2 (en) 2011-12-07 2019-07-30 Paul Burchard Sparse neural control
US9147154B2 (en) * 2013-03-13 2015-09-29 Google Inc. Classifying resources using a deep network
US9053558B2 (en) 2013-07-26 2015-06-09 Rui Shen Method and system for fusing multiple images
US9730643B2 (en) 2013-10-17 2017-08-15 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks
CN104809426B (en) 2014-01-27 2019-04-05 日本电气株式会社 Training method, target identification method and the device of convolutional neural networks
US10102474B2 (en) 2014-03-28 2018-10-16 International Business Machines Corporation Event-based neural network with hierarchical addressing for routing event packets between core circuits of the neural network
US20150286925A1 (en) 2014-04-08 2015-10-08 Qualcomm Incorporated Modulating plasticity by global scalar values in a spiking neural network
CN105488515B (en) 2014-09-17 2019-06-25 富士通株式会社 The image processing method and image processing apparatus that a kind of pair of image is classified
US9836641B2 (en) 2014-12-17 2017-12-05 Google Inc. Generating numeric embeddings of images
US10515304B2 (en) 2015-04-28 2019-12-24 Qualcomm Incorporated Filter specificity as training criterion for neural networks
US10013652B2 (en) 2015-04-29 2018-07-03 Nuance Communications, Inc. Fast deep neural network feature transformation via optimized memory bandwidth utilization
US11423311B2 (en) * 2015-06-04 2022-08-23 Samsung Electronics Co., Ltd. Automatic tuning of artificial neural networks
US10970617B2 (en) * 2015-08-21 2021-04-06 Institute Of Automation Chinese Academy Of Sciences Deep convolutional neural network acceleration and compression method based on parameter quantification
US10366337B2 (en) * 2016-02-24 2019-07-30 Bank Of America Corporation Computerized system for evaluating the likelihood of technology change incidents
US11907843B2 (en) * 2016-06-30 2024-02-20 Intel Corporation Importance-aware model pruning and re-training for efficient convolutional neural networks
KR20180034853A (en) * 2016-09-28 2018-04-05 에스케이하이닉스 주식회사 Apparatus and method test operating of convolutional neural network
US10510146B2 (en) * 2016-10-06 2019-12-17 Qualcomm Incorporated Neural network for image processing
WO2018073975A1 (en) * 2016-10-21 2018-04-26 Nec Corporation Improved sparse convolution neural network
KR20180073118A (en) * 2016-12-22 2018-07-02 삼성전자주식회사 Convolutional neural network processing method and apparatus

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812698A (en) * 1995-05-12 1998-09-22 Synaptics, Inc. Handwriting recognition system and method
US6674855B1 (en) * 1999-10-06 2004-01-06 Comverse Ltd. High performance multifrequency signal detection
US20080162385A1 (en) * 2006-12-28 2008-07-03 Yahoo! Inc. System and method for learning a weighted index to categorize objects
US8463591B1 (en) * 2009-07-31 2013-06-11 Google Inc. Efficient polynomial mapping of data for use with linear support vector machines
US20160358069A1 (en) * 2015-06-03 2016-12-08 Samsung Electronics Co., Ltd. Neural network suppression

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HYUNSUN PARK et al.: "Zero and data reuse-aware fast convolution for deep neural networks on GPU", IEEE, 31 December 2016 (2016-12-31) *
HU PING: "Instance retrieval model based on artificial neural network and nearest-neighbor algorithm" (基于人工神经网络和最近邻算法的实例检索模型), 组合机床与自动化加工技术 (Modular Machine Tool & Automatic Manufacturing Technique), no. 12, 20 November 2008 (2008-11-20) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726633A (en) * 2018-11-23 2019-05-07 成都品果科技有限公司 A kind of face critical point detection method based on look-up table activation primitive
CN112364032A (en) * 2021-01-12 2021-02-12 浙江正元智慧科技股份有限公司 Data center data query method based on Internet technology
CN112364032B (en) * 2021-01-12 2021-08-24 浙江正元智慧科技股份有限公司 Data center data query method based on Internet technology

Also Published As

Publication number Publication date
US11295195B2 (en) 2022-04-05
US20220261615A1 (en) 2022-08-18
KR20180101055A (en) 2018-09-12
TW201833823A (en) 2018-09-16
TWI765979B (en) 2022-06-01
US20180253635A1 (en) 2018-09-06
KR102499396B1 (en) 2023-02-13

Similar Documents

Publication Publication Date Title
CN108537325A (en) The method for operating neural network device
US20200234124A1 (en) Winograd transform convolution operations for neural networks
US20230418610A1 (en) Deep vision processor
CN109117947A (en) Profile testing method and Related product
KR20200066953A (en) Semiconductor memory device employing processing in memory (PIM) and operating method for the same
US20230124618A1 (en) Image processing device including neural network processor and operating method thereof
JP2019102084A (en) Method and apparatus for processing convolution operation in neural network
KR20190036317A (en) Neural network system, and Operating method of neural network system
US11562046B2 (en) Neural network processor using dyadic weight matrix and operation method thereof
CN107004253A (en) The application programming interface framework based on figure with equivalence class for enhanced image procossing concurrency
CN111028158A (en) Electronic system, non-transitory computer-readable recording medium, and computing device
KR20210154502A (en) Neural network apparatus performing floating point operation and operating method of the same
CN113033790A (en) Neural network device and method of operating the same
CN109496319A (en) Artificial intelligence process device hardware optimization method, system, storage medium, terminal
US20200057932A1 (en) System and method for generating time-spectral diagrams in an integrated circuit solution
JP2021515339A (en) Machine perception and high density algorithm integrated circuits
US20200159495A1 (en) Processing apparatus and method of processing add operation therein
US20220188612A1 (en) Npu device performing convolution operation based on the number of channels and operating method thereof
US11823029B2 (en) Method and apparatus with neural network processing
CN109993290B (en) Integrated circuit chip device and related product
KR20200062014A (en) Apparatus for accelerating neural network using weight with dyadic matrix form and operation method thereof
KR20200094534A (en) Neural network apparatus and method for processing multi-bits operation thereof
CN111125627A (en) Method for pooling multi-dimensional matrices and related products
KR20200056898A (en) Processing apparatus and method for processing add operation thereof
CN111340182A (en) Input feature approximation low-complexity CNN training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination