WO2017185387A1 - Apparatus and method for performing the forward operation of a fully connected layer neural network - Google Patents
- Publication number: WO2017185387A1
- Application: PCT/CN2016/080968
- Authority: WIPO (PCT)
Classifications
- G06N3/02 — Neural networks
- G06N3/04 — Architecture, e.g. interconnection topology
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/063 — Physical realisation of neural networks using electronic means
- G06N3/08 — Learning methods
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06N5/01 — Dynamic search techniques; heuristics; dynamic trees; branch-and-bound
- G06F9/3001 — Arithmetic instructions
- G06F9/30036 — Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
- G06F9/3885 — Concurrent instruction execution using a plurality of independent parallel functional units
- G06F12/0875 — Cache with dedicated cache, e.g. instruction or stack
- G06F13/28 — Handling requests for access to input/output bus using burst mode transfer, e.g. direct memory access (DMA), cycle steal
- G06F2212/452 — Caching of instruction code
Definitions
- the present invention generally relates to artificial neural networks, and in particular to an apparatus and method for performing a full connection layer artificial neural network forward operation.
- Multi-layer artificial neural networks are widely used in the fields of pattern recognition, image processing, function approximation, and optimization calculation. Owing to their high recognition accuracy and good parallelism, multi-layer artificial neural networks have in recent years received increasing attention from both academia and industry. Artificial neural networks involve a variety of algorithms; among them, the fully connected layer is an important algorithm and is widely used in various artificial neural network models.
- One known method of supporting the forward operation of a fully connected layer of a multi-layer artificial neural network is to use a general purpose processor.
- the method supports the above algorithm by executing general purpose instructions using a general purpose register file and general purpose functional units.
- One of the disadvantages of this approach is that the performance of a single general purpose processor is low and cannot meet the performance requirements of conventional multi-layer artificial neural network operations.
- communication between general-purpose processors becomes a performance bottleneck.
- the general-purpose processor needs to decode the multi-layer artificial neural network operation into a long sequence of arithmetic and memory-access instructions, and the processor's front-end decoding introduces a large power consumption overhead.
- Another known method of supporting multi-layer artificial neural network forward operations is to use a graphics processing unit (GPU).
- the method supports the above algorithm by executing generic SIMD instructions using a general purpose register file and generic stream processing units. Since the GPU is a device dedicated to performing graphics and image operations as well as scientific calculations, it has no special support for multi-layer artificial neural network operations, so a large amount of front-end decoding work is still required to perform multi-layer artificial neural network operations, bringing substantial additional overhead.
- the GPU has only a small on-chip cache, and the model data (weight) of the multi-layer artificial neural network needs to be repeatedly transferred from off-chip, and the off-chip bandwidth becomes the main performance bottleneck.
- An aspect of the present invention provides an apparatus for performing an artificial neural network full connection layer forward operation, including an instruction storage unit, a controller unit, a data access unit, an interconnection module, a main operation module, And multiple slave arithmetic modules, where:
- the instruction storage unit reads the instruction through the data access unit and stores the read instruction
- the controller unit reads an instruction from the instruction storage unit, and translates the instruction into a control signal for controlling behavior of other modules, the other module including a data access unit, a main operation module, and the plurality of slave operation modules;
- the data access unit performs data or instruction read and write operations between the external address space and the device;
- the interconnect module is used to connect the main operation module and the slave operation module
- the main operation module is used to implement a function activation operation in the artificial neural network full connection layer algorithm
- the slave operation module is used to implement multiplication and addition of input neurons and weight parameters in the artificial neural network full connection layer algorithm
- An interconnection module is used for data transmission between the main operation module and the slave operation module.
- the main operation module transmits the input neuron vector to each slave operation module through the interconnection module.
- the interconnect module progressively combines the output neuron values of the respective slave modules into intermediate result vectors, and sends them back to the main operation module for subsequent calculation.
- Another aspect of the present invention provides a method of performing a single layer artificial neural network full connection layer forward operation using the above apparatus.
- Another aspect of the present invention provides a method of performing a multi-layer artificial neural network full connection layer forward operation using the above apparatus.
- the device can be applied to the following scenarios (including but not limited to): data processing; robots, computers, printers, scanners, phones, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, cloud servers, camcorders, projectors, watches, earphones, mobile storage, wearable devices and other electronic products; aircraft, ships, vehicles and other means of transportation; televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, range hoods and other household appliances; and nuclear magnetic resonance instruments, B-mode ultrasound scanners, electrocardiographs and other medical equipment.
- FIG. 1 shows an example block diagram of the overall structure of an apparatus for performing an artificial neural network full connection layer forward operation, in accordance with an embodiment of the present invention.
- FIG. 2 is a schematic diagram showing the implementation of an artificial neural network full connection layer forward according to an embodiment of the present invention.
- FIG. 3 illustrates an example block diagram of a main operational module structure in an apparatus for performing an artificial neural network full connectivity layer forward operation, in accordance with an embodiment of the present invention.
- FIG. 4 is a block diagram showing an example of a slave operation module structure in an apparatus for performing an artificial neural network full connection layer forward operation in accordance with an embodiment of the present invention.
- FIG. 5 illustrates an example block diagram of a neural network full connectivity layer forward operation process in accordance with an embodiment of the present invention.
- FIG. 6 illustrates an implementation of a single layer artificial neural network full connectivity layer forward operation, in accordance with one embodiment.
- FIG. 7 is a flow chart showing a single layer artificial neural network full connection layer operation in accordance with an embodiment of the present invention.
- the multi-layer artificial neural network full connection layer forward operation comprises two or more layers, each containing multiple neurons.
- the input neuron vector first undergoes a dot product with the weight vector; a bias is then added to the result and an activation function applied to obtain the output neuron.
- the bias addition and activation are optional operations, and the activation function can be any of sigmoid, tanh, relu, and softmax.
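The forward computation just described can be sketched in a short NumPy model. This is an illustrative software sketch only, not the patent's hardware; the function and variable names are assumptions for exposition.

```python
import numpy as np

def fc_forward(x, W, b=None, act=None):
    """Forward pass of one fully connected layer: dot products of the
    input neuron vector with each weight row, optional bias addition,
    optional activation (sigmoid, tanh, relu, or softmax)."""
    y = W @ x                      # dot product with each weight row
    if b is not None:              # bias addition is optional
        y = y + b
    if act == "sigmoid":
        y = 1.0 / (1.0 + np.exp(-y))
    elif act == "tanh":
        y = np.tanh(y)
    elif act == "relu":
        y = np.maximum(y, 0.0)
    elif act == "softmax":
        e = np.exp(y - y.max())    # shifted for numerical stability
        y = e / e.sum()
    return y

x = np.array([1.0, 2.0])
W = np.array([[1.0, 0.0], [0.0, -1.0]])
b = np.array([0.5, 0.5])
print(fc_forward(x, W, b, act="relu"))   # [1.5 0. ]
```

Omitting `b` and `act` reproduces the case where the optional bias and activation operations are skipped.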
- FIG. 1 illustrates the overall structure of an apparatus for performing an artificial neural network full connection layer forward operation according to an embodiment of the present invention.
- the apparatus includes an instruction storage unit 1, a controller unit 2, a data access unit 3, an interconnection module 4, a main operation module 5, and a plurality of slave operation modules 6.
- the instruction storage unit 1, the controller unit 2, the data access unit 3, the interconnection module 4, the main operation module 5, and the slave operation modules 6 can all be implemented by hardware circuits (including but not limited to FPGAs, CGRAs, application-specific integrated circuits (ASICs), analog circuits, memristors, etc.).
- the instruction storage unit 1 reads in an instruction through the data access unit 3 and stores the read instruction.
- the instruction memory unit 1 can be implemented by various memory devices (SRAM, DRAM, memristor, 3D-DRAM or nonvolatile memory, etc.).
- the controller unit 2 reads an instruction from the instruction storage unit 1, and translates the instruction into a control signal for controlling the behavior of other modules, for example, the data access unit 3, the main operation module 5, and the slave operation module 6, and the like.
- the data access unit 3 is capable of accessing an external address space, writing data or instructions directly to respective memory cells within the device, or writing data from various memory cells within the device to an external address space.
- the interconnect module 4 is used to connect the main operation module and the slave operation module, and can be implemented into different interconnection topologies (such as a tree structure, a ring structure, a grid structure, a hierarchical interconnection, a bus structure, etc.).
- FIG. 2 schematically shows an embodiment of an interconnection module 4: an H-tree structure.
- the H-tree module 4 constitutes a data path between the main arithmetic module 5 and the plurality of slave arithmetic modules 6, and has a structure of an H-tree type.
- the H-tree is a binary tree path composed of multiple nodes. Each node sends the upstream data to the downstream two nodes in the same way, and the data returned by the two downstream nodes are combined and returned to the upstream node.
- the neuron data in the main operation module 5 is sent to the respective slave operation modules 6 through the H-tree module 4; after the calculation process of the slave operation modules 6 is completed, the neuron values output by each slave operation module are progressively assembled in the H-tree module into a complete vector of neurons, which serves as the intermediate result vector. For example, assuming that there are N slave operation modules in the device, the intermediate result vector is divided into segments of N elements each, and the i-th slave operation module calculates the i-th element of each segment.
- the N elements are assembled into a vector of length N through the H-tree module and returned to the main arithmetic module. So if the network has only N output neurons, each slave module only needs to output the value of a single neuron. If the network has m*N output neurons, each slave module needs to output m neuron values.
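The interleaved assembly described here, where module i contributes the i-th element of each of m segments, can be modeled as follows. This is a software sketch under those stated assumptions; the function name and example data are hypothetical.

```python
import numpy as np

def htree_gather(per_module_outputs):
    """Assemble the intermediate result vector from N slave modules.
    Module i holds the i-th element of each of the m segments, so the
    full vector interleaves the per-module outputs segment by segment."""
    stacked = np.stack(per_module_outputs)   # shape (N, m)
    return stacked.T.reshape(-1)             # seg0 then seg1, each of N elements

# hypothetical example: N = 3 slave modules, m = 2 segments (m*N = 6 outputs)
outs = [np.array([0.0, 3.0]), np.array([1.0, 4.0]), np.array([2.0, 5.0])]
print(htree_gather(outs))   # [0. 1. 2. 3. 4. 5.]
```

With m = 1 (only N output neurons), each module contributes a single value, matching the text above.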
- FIG. 3 illustrates the structure of the main operation module in an apparatus for performing an artificial neural network full connection layer forward operation according to an embodiment of the present invention.
- the main operation module 5 includes a first operation unit 51, a first data dependency determination unit 52, and a first storage unit 53.
- the first operation unit 51 includes a vector addition unit 511 and an activation unit 512.
- the first operation unit 51 receives the control signal from the controller unit and carries out the various operation functions of the main operation module 5. The vector addition unit 511 implements the bias-addition operation in the forward calculation of the artificial neural network full connection layer: it adds the bias vector element-wise to the intermediate result vector transmitted from the slave operation modules 6 through the interconnect module 4, and outputs the resulting vector sum.
- the activation unit 512 implements the activation function operation of the artificial neural network full connection layer. Its input is the intermediate result transmitted from the slave operation modules 6 through the interconnect module 4, or the output of the vector addition unit 511, and its output is the neuron vector after the activation function has been applied.
- the offset vector can be read from the external address space or stored locally.
- the first data dependency determining unit 52 is the port through which the first operation unit 51 reads and writes the first storage unit 53, and it ensures read/write consistency of data in the first storage unit 53. At the same time, the first data dependency determining unit 52 is also responsible for transmitting the data read from the first storage unit 53 to the slave operation modules through the interconnect module 4, while the output data of the slave operation modules 6 is sent directly to the first operation unit 51 through the interconnect module 4.
- the control signals output by the controller unit 2 are sent to the first operation unit 51 and the first data dependency determination unit 52 to control their behavior.
- the storage unit 53 is configured to buffer the input data and the output data used by the main operation module 5 in the calculation process.
- each slave arithmetic module 6 includes a second arithmetic unit 61, a data dependency determining unit 62, a second storage unit 63, and a third storage unit 64.
- the second operation unit 61 receives the control signal issued by the controller unit 2 and performs a dot product operation; it includes a vector multiplication unit 611 and an accumulation unit 612.
- the vector multiplication unit 611 is used to implement the element-wise multiplication of the neuron vector and the weight vector
- the accumulation operation unit 612 is used to implement an operation of adding each item of the vector together.
- the second data dependency determining unit 62 is responsible for the read and write operations on the second storage unit 63 during the calculation process. Before performing a read or write, the second data dependency determining unit 62 first ensures that there is no read/write consistency conflict between the data used by different instructions. For example, all control signals sent to the data dependency unit 62 are stored in an instruction queue internal to the unit; if the range of data read by a read instruction conflicts with the range of data written by a write instruction earlier in the queue, the read instruction must wait until the write instruction it depends on has been executed.
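A minimal sketch of the kind of range-overlap test such a dependency unit performs. The half-open address ranges and function name are illustrative assumptions, not the patent's actual circuit.

```python
def has_conflict(read_range, queued_writes):
    """Return True if the read range overlaps any write range still
    pending earlier in the instruction queue; the read must then wait
    until the conflicting write instruction has been executed."""
    r_start, r_end = read_range
    for w_start, w_end in queued_writes:
        if r_start < w_end and w_start < r_end:   # half-open interval overlap
            return True
    return False

print(has_conflict((0, 8), [(16, 32)]))   # False: disjoint ranges, read proceeds
print(has_conflict((0, 8), [(4, 12)]))    # True: addresses 4..7 overlap, read waits
```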
- the second storage unit 63 buffers the input neuron vector data and the output neuron value data of the slave arithmetic module 6.
- the third storage unit 64 buffers the weight data required by the slave computing module 6 in the calculation process.
- each slave arithmetic module 6 may store only the weights between all input neurons and partial output neurons.
- the output neurons are segmented according to the number N of the operation modules, and the weights corresponding to the nth output neurons of each segment are stored in the nth slave operation module.
- Each slave operation module 6 calculates only the dot products of the input column vector in with its corresponding rows of the weight matrix w; each result is a one-dimensional component of the intermediate result vector, and these components are successively spliced together in the interconnect module 4 to form the intermediate result vector. The calculation thus decomposes into a parallel partial-computation stage followed by a splicing stage.
- Each of the slave arithmetic modules 6 calculates an output neuron value, and all of the output neuron values are assembled in the interconnect module 4 to obtain an intermediate result vector. Each slave arithmetic module 6 only needs to calculate the output neuron value corresponding to the module in the intermediate result vector y.
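The row-wise partition of the weight matrix across N slave modules can be sketched as follows. The interleaved row assignment (`W[i::num_modules]`) mirrors the segmentation above, where the n-th output neuron of each segment belongs to the n-th slave module; the function name is a hypothetical label for this sketch.

```python
import numpy as np

def partitioned_dot(x, W, num_modules):
    """Each slave module holds the weight rows for its own output
    neurons and computes only those dot products; the interconnect
    then splices the one-dimensional components into the result vector."""
    parts = []
    for i in range(num_modules):
        rows = W[i::num_modules]          # rows assigned to module i
        parts.append(rows @ x)            # module i's partial dot products
    # splicing stage: module i contributed element i of each segment
    return np.stack(parts).T.reshape(-1)

x = np.array([1.0, 1.0])
W = np.arange(8.0).reshape(4, 2)          # 4 output neurons, 2 inputs
print(partitioned_dot(x, W, 2))           # equals W @ x
```

The result is identical to computing `W @ x` in one place; only the work distribution changes.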
- an instruction set for performing an artificial neural network forward operation on the aforementioned apparatus includes a CONFIG instruction, a COMPUTE instruction, an IO instruction, a NOP instruction, a JUMP instruction, and a MOVE instruction, among which:
- the CONFIG command configures various constants required for current layer calculation before each layer of artificial neural network calculation begins;
- the COMPUTE instruction completes the arithmetic logic calculation of each layer of artificial neural network
- the IO instruction realizes reading input data required for calculation from the external address space and storing the data back to the external space after the calculation is completed;
- the NOP instruction is responsible for clearing the control signals in all control signal buffer queues of the current device, ensuring that all instructions before the NOP instruction have finished executing.
- the NOP instruction itself does not contain any operations;
- the JUMP instruction is responsible for controlling the jump of the next instruction address to be read from the instruction storage unit, and is used to implement the jump of the control flow;
- the MOVE instruction is responsible for carrying data of an address in the internal address space of the device to another address in the internal address space of the device.
- the process is independent of the operation unit and does not occupy the resources of the operation unit during execution.
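As a rough software analogy, decoding this instruction set can be viewed as mapping each opcode to the modules that receive its control signals. The target-module names below are illustrative assumptions based on the descriptions above, not part of the patent's instruction set definition.

```python
def decode(instr):
    """Map an instruction's opcode to the module(s) that receive
    control signals, mirroring the six-instruction set described above."""
    targets = {
        "CONFIG":  ["main_module", "slave_modules"],   # layer constants
        "COMPUTE": ["main_module", "slave_modules"],   # arithmetic logic work
        "IO":      ["data_access_unit"],               # external-space transfers
        "NOP":     [],                                  # queue flush, no signal
        "JUMP":    ["controller"],                      # control-flow jump
        "MOVE":    ["data_access_unit"],                # internal data move
    }
    op = instr.split()[0]
    return targets[op]

print(decode("IO load 0x1000"))   # ['data_access_unit']
print(decode("NOP"))              # []
```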
- FIG. 5 illustrates an example block diagram of an artificial neural network full connectivity layer forward operation process in accordance with an embodiment of the present invention.
- the input neuron vector is dotted with the weight vector of each slave operation module 6 to obtain the corresponding output neuron value; all of these output neuron values together form an intermediate result vector. The intermediate result vector is then subjected to an activation operation, optionally preceded by the addition of a bias vector, to obtain the final output neuron vector of this layer of the neural network.
- the weight vector of each slave arithmetic module 6 is the row vector corresponding to the slave arithmetic module 6 in the weight matrix.
- the interconnect module sends the input neuron vector [in0, ..., inN] to all slave operation modules, where it is temporarily stored in the neuron cache unit. For the i-th slave operation module, the dot product of its corresponding weight vector [w_i0, ..., w_iN] with the input neuron vector is calculated.
- the results output from the slave operation modules are integrated into the complete intermediate result vector through the interconnect module and returned to the main operation module, where the activation operation, optionally preceded by the bias addition, is performed to obtain the final output neuron vector [out0, out1, out2, ..., outN].
- FIG. 6 illustrates an implementation of a single layer artificial neural network full connectivity layer forward operation, in accordance with one embodiment.
- the flowchart depicts a process for implementing a single layer neural network full connectivity layer forward operation illustrated in FIG. 1 using the apparatus and instruction set of the present invention.
- Step S1.1 the initial instruction is stored in the instruction storage unit 1;
- Step S1.2 reading an instruction from the instruction storage unit 1;
- Step S1.3 decoding the above instruction
- Step S1.4 performing corresponding operations according to the decoded control signal
- step S1.5 the operation result is written back to the corresponding storage.
- step S1.1 an initialization IO instruction may be stored for carrying subsequent instructions.
- the readable instructions include, but are not limited to, a CONFIG instruction, a COMPUTE instruction, an IO instruction, a NOP instruction, a JUMP instruction, and a MOVE instruction.
- step S1.3 the control signal for the corresponding module is obtained according to the operation type of the instruction (CONFIG, COMPUTE, IO, NOP, JUMP, MOVE, etc.).
- For a CONFIG instruction, decoding yields the configuration information for the remaining modules.
- For a COMPUTE instruction, decoding yields the control signals of the master and slave operation modules.
- For an IO instruction, decoding yields the control signal of the data access module.
- For the NOP instruction, no actual control signal is generated; it only clears the control signals in all control-signal storage queues of the current device, ensuring that all instructions before the NOP have been executed.
- For the JUMP instruction, a control signal for jumping the instruction stream is obtained.
- For the MOVE instruction, a control signal for moving data inside the device is obtained.
- step S1.4 modules 2 to 6 described above perform the corresponding operations in accordance with the control signals.
- the interconnection module sends the input neuron vector [in0, ..., inN] to all the slave operation modules, where it is temporarily stored in the second storage unit 63.
- the i-th slave operation module calculates the dot product of its corresponding weight vector [w_i0, ..., w_iN] with the input neuron vector.
- the outputs of the slave operation modules are integrated into the complete output vector through the interconnect module and returned to the main operation module.
- the activation operation is performed in the main operation module, or the bias addition and activation operations are performed, to obtain the final output neuron vector [out0, out1, out2, ..., outN].
- step S1.5 each module writes the result of the operation back to the corresponding storage unit. Taking the operation of the neural network full connection layer forward as an example, the output neuron vector obtained by the main operation module is written back to the first storage unit 53.
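Steps S1.1 to S1.5 amount to a conventional fetch, decode, execute, write-back cycle. The sketch below models it with a hypothetical toy instruction and a dictionary standing in for the storage units; none of these names come from the patent.

```python
def run(program, storage):
    """Five-step cycle S1.1-S1.5: instructions are assumed pre-loaded
    into storage (S1.1); each is then fetched, decoded, executed, and
    its result written back."""
    pc = 0
    while pc < len(program):
        instr = program[pc]              # S1.2: read an instruction
        op, *args = instr                # S1.3: decode it
        if op == "ADD":                  # S1.4: perform the operation
            dst, a, b = args
            storage[dst] = storage[a] + storage[b]   # S1.5: write back
        pc += 1
    return storage

mem = run([("ADD", "y", "x1", "x2")], {"x1": 2, "x2": 3})
print(mem["y"])   # 5
```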
- Figure 7 illustrates another more detailed implementation of a single layer artificial neural network full connection layer forward operation.
- step S2.1 an IO instruction is pre-stored at the instruction storage unit 1.
- step S2.2 the operation starts: the controller unit 2 reads the IO instruction from the instruction storage unit 1, and according to the decoded control signal, the data access unit 3 reads all the corresponding artificial neural network full connection layer operation instructions from the external address space and stores them in the instruction storage unit 1.
- step S2.3 the controller unit 2 then reads the next IO instruction from the instruction storage unit 1, and according to the decoded control signal, the data access unit 3 reads all the data required by the main operation module 5 (for example, the input neuron vector, interpolation table, constant table, bias, etc.) from the external address space into the first storage unit 53 of the main operation module 5.
- step S2.4 the controller unit 2 then reads the next IO instruction from the instruction storage unit 1, and according to the decoded control signal, the data access unit 3 reads the weight matrix data required by the slave operation modules 6 from the external address space.
- step S2.5 the controller unit 2 then reads the next CONFIG command from the instruction storage unit, and configures various constants required for the calculation of the layer neural network according to the decoded control signal.
- the operation units 51 and 61 configure the values of their internal registers based on parameters in the control signal, for example, data required by the activation function.
- Step S2.5 is an optional step, and in some cases, step S2.5 can be skipped.
- In step S2.6, the controller unit 2 then reads the next COMPUTE instruction from the instruction storage unit. According to the decoded control signal, the main operation module 5 first sends the input neuron vector to each slave operation module 6 through the interconnection module 4, where it is saved in the second storage unit 63 of that slave operation module 6.
- In step S2.7, according to the control signal decoded from the COMPUTE instruction, the second operation unit 61 of each slave operation module 6 reads its weight vector (the row vector of the weight matrix corresponding to that slave operation module 6) from the third storage unit 64, reads the input neuron vector from the second storage unit 63, completes the dot product of the weight vector and the input neuron vector, and returns the intermediate result through the interconnection module.
- In step S2.8, in the interconnection module 4, the intermediate results returned by the slave operation modules 6 are assembled stage by stage into a complete intermediate result vector.
- In step S2.9, the main operation module 5 obtains the vector returned by the interconnection module 4, reads the bias vector from the first storage unit 53 according to the control signal decoded from the COMPUTE instruction, adds it to the returned vector in the vector adding unit 512, and the activation unit 511 then applies the activation function to the sum and writes the final output neuron vector back to the first storage unit 53.
- In step S2.10, the controller unit 2 then reads the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit 3 stores the output neuron vector in the first storage unit 53 to the designated address in the external address space, and the operation ends.
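Numerically, steps S2.6 through S2.9 compute out = active(W·in + b), where each slave operation module produces one dot product (one row of W) and the main operation module adds the bias and applies the activation. A minimal sketch of this computation follows; the function and variable names are chosen for illustration and are not part of the disclosed apparatus:

```python
import math

def fc_forward(weights, bias, in_vec,
               active=lambda x: 1.0 / (1.0 + math.exp(-x))):  # sigmoid by default
    """Fully connected layer forward pass organized as in steps S2.6-S2.9:
    each 'slave' computes one row dot product, the results are assembled
    into an intermediate result vector, then bias and activation are applied."""
    # S2.7: each slave module computes the dot product of its weight row
    # with the shared input neuron vector.
    partials = [sum(w * x for w, x in zip(row, in_vec)) for row in weights]
    # S2.8: the interconnection module assembles the intermediate result vector
    # (the list 'partials' plays that role here).
    # S2.9: the main module adds the bias vector and applies the activation.
    return [active(p + b) for p, b in zip(partials, bias)]
```

For instance, with an identity activation, a 2x2 identity weight matrix, bias [1, 1], and input [2, 3], the output neuron vector is [3, 4].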
- For a multi-layer artificial neural network fully connected layer, the implementation process is similar to that of the single-layer fully connected layer. After the previous layer has finished executing, the operation instruction of the next layer takes the output neuron address of the previous layer, stored in the main operation module, as the input neuron address of this layer. The weight address and bias address in the instruction are likewise changed to the addresses corresponding to this layer.
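The multi-layer chaining described above, in which each layer's output neurons become the next layer's input neurons, can be sketched as a simple loop. This is an illustrative software analogue only, with hypothetical names, not the disclosed address-rewriting mechanism:

```python
def multilayer_forward(layers, in_vec):
    """Run the single-layer fully connected forward operation for each
    (weights, bias, active) triple in turn; the output neuron vector of
    one layer is reused as the input neuron vector of the next, mirroring
    the address reuse described above."""
    vec = in_vec
    for weights, bias, active in layers:
        # one single-layer fully connected forward pass: active(W.vec + b)
        vec = [active(sum(w * x for w, x in zip(row, vec)) + b)
               for row, b in zip(weights, bias)]
    return vec
```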
- The processes or methods depicted in the preceding figures may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic), firmware, software (e.g., software embodied on a non-transitory computer readable medium), or a combination thereof.
Claims (12)
- An apparatus for performing a forward operation of an artificial neural network fully connected layer, comprising an instruction storage unit, a controller unit, a data access unit, an interconnection module, a main operation module, and a plurality of slave operation modules, wherein: the instruction storage unit reads in instructions through the data access unit and stores the read instructions; the controller unit reads instructions from the instruction storage unit and decodes them into control signals that control the behavior of other modules, the other modules including the data access unit, the main operation module, and the plurality of slave operation modules; the data access unit performs data or instruction read/write operations between an external address space and the apparatus; the interconnection module is configured to connect the main operation module and the slave operation modules; the main operation module is configured to implement the function activation operation in the artificial neural network fully connected layer algorithm; the slave operation modules are configured to implement the multiplication and addition operations of input neurons and weight parameters in the artificial neural network fully connected layer algorithm; the interconnection module is used for data transmission between the main operation module and the slave operation modules; before the forward operation of the neural network fully connected layer starts, the main operation module transmits the input neuron vector to each slave operation module through the interconnection module, and after the computation process of the slave operation modules is completed, the interconnection module assembles, stage by stage, the output neuron values of the slave operation modules into an intermediate result vector and transmits it back to the main operation module for subsequent computation.
- The apparatus according to claim 1, wherein the plurality of slave operation modules use the same input neuron vector and their respective weight vectors to compute their respective output neuron values in parallel, and the weight vector of each slave operation module is the row vector in the weight matrix corresponding to that slave operation module.
- The apparatus according to claim 1, wherein the activation function active used by the main operation module is any one of the nonlinear functions sigmoid, tanh, relu, and softmax, or a linear function.
- The apparatus according to claim 1, wherein the main operation module adds a bias to the intermediate result vector and then performs an activation operation.
- The apparatus according to claim 1, wherein the interconnection module constitutes a data path for continuous or discretized data between the main operation module and the plurality of slave operation modules, and the interconnection module has any one of the following structures: a tree structure, a ring structure, a mesh structure, a hierarchical interconnection, or a bus structure.
- The apparatus according to claim 1, wherein the main operation module comprises a first storage unit, a first operation unit, and a first data dependency relationship determination unit, wherein: the first storage unit (a neuron cache unit) is configured to cache the input data and output data used by the main operation module during the computation process; the first operation unit performs the various operation functions of the main operation module; the first data dependency relationship determination unit is the port through which the first operation unit reads and writes the first storage unit, ensures that there are no consistency conflicts in reading and writing data of the first storage unit, and is responsible for reading the input neuron vector from the first storage unit and sending it to the slave operation modules through the interconnection module; and the intermediate result vector from the interconnection module is sent to the first operation unit.
- The apparatus according to claim 1, wherein each slave operation module comprises a second operation unit, a second data dependency relationship determination unit, a second storage unit, and a third storage unit, wherein: the second operation unit receives the control signal issued by the controller unit and performs arithmetic and logic operations; the second data dependency relationship determination unit is responsible for the read and write operations on the second storage unit and the third storage unit during the computation process, ensuring that there are no consistency conflicts in reading and writing the second storage unit and the third storage unit; the second storage unit caches the data of the input neuron vector and the output neuron value computed by that slave operation module; and the third storage unit caches the weight vector needed by that slave operation module during the computation process.
- The apparatus according to claim 6 or 7, wherein the first and second data dependency relationship determination units ensure that there are no consistency conflicts in reading and writing in the following manner: determining whether a dependency relationship exists between the data of a control signal that has not yet been executed and a control signal that is currently being executed; if no dependency exists, the control signal is allowed to be issued immediately; otherwise, the control signal is allowed to be issued only after all control signals on which it depends have been completely executed.
- A method for performing a forward operation of a single-layer artificial neural network fully connected layer using the apparatus according to any one of claims 1-8, comprising: step S1.1, storing an initial instruction in the instruction storage unit; step S1.2, reading an instruction from the instruction storage unit; step S1.3, decoding the read instruction; step S1.4, performing the corresponding operation according to the control signal obtained by decoding; and step S1.5, writing the operation result back to the corresponding storage unit.
- A method for performing a forward operation of a single-layer artificial neural network fully connected layer using the apparatus according to any one of claims 1-8, comprising: in step S2.1, pre-storing an IO instruction in the instruction storage unit; in step S2.2, starting the operation, the controller unit reads the IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit reads all corresponding artificial neural network fully connected layer operation instructions from the external address space and stores them in the instruction storage unit; in step S2.3, the controller unit then reads the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit reads all data needed by the main operation module from the external address space into the first storage unit of the main operation module; in step S2.4, the controller unit then reads the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit reads the weight matrix data needed by the slave operation modules from the external address space; in step S2.6, the controller unit then reads the next COMPUTE instruction from the instruction storage unit, and according to the decoded control signal, the main operation module first sends the input neuron vector to each slave operation module through the interconnection module, where it is saved in the second storage unit of the slave operation module; in step S2.7, according to the control signal decoded from the COMPUTE instruction, the second operation unit of each slave operation module reads the weight vector from the third storage unit, reads the input neuron vector from the second storage unit, completes the dot product of the weight vector and the input neuron vector, and returns the intermediate result through the interconnection module; in step S2.8, in the interconnection module, the intermediate results returned by the slave operation modules are assembled stage by stage into a complete intermediate result vector; in step S2.9, the main operation module obtains the vector returned by the interconnection module, reads the bias vector from the first storage unit according to the control signal decoded from the COMPUTE instruction, adds it to the vector returned by the interconnection module in the vector adding unit, after which the activation unit applies the activation function to the sum and writes the final output neuron vector back to the first storage unit; in step S2.10, the controller unit then reads the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit stores the output neuron vector in the storage unit to a designated address in the external address space, and the operation ends.
- The method according to claim 10, further comprising, between step S2.4 and step S2.6: step S2.5, in which the controller unit then reads the next CONFIG instruction from the instruction storage unit and, according to the decoded control signal, configures the various constants needed for the computation of this layer of the neural network.
- A method for performing a forward operation of a multi-layer artificial neural network fully connected layer, comprising: for each artificial neural network fully connected layer, performing the method according to claim 10, wherein, after the execution of the previous artificial neural network fully connected layer is completed, the operation instruction of the next layer takes the output neuron address of the previous layer stored in the main operation module as the input neuron address of this layer, and changes the weight address and/or bias address in the instruction to the addresses corresponding to this layer.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020187033949A KR102486030B1 (ko) | 2016-04-27 | 2016-05-04 | 완전연결층 신경망 정방향 연산 실행용 장치와 방법 |
EP16899898.7A EP3451236A4 (en) | 2016-04-27 | 2016-05-04 | METHOD AND DEVICE FOR CARRYING OUT A FORWARDING OPERATION OF A FULLY CONNECTED LAYERED NEURONAL NETWORK |
US16/174,185 US11373084B2 (en) | 2016-04-27 | 2018-10-29 | Apparatus and methods for forward propagation in fully connected layers of convolutional neural networks |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610270004.0A CN107315571B (zh) | 2016-04-27 | 2016-04-27 | 一种用于执行全连接层神经网络正向运算的装置和方法 |
CN201610270004.0 | 2016-04-27 |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/174,185 Continuation-In-Part US11373084B2 (en) | 2016-04-27 | 2018-10-29 | Apparatus and methods for forward propagation in fully connected layers of convolutional neural networks |
US16/174,185 Continuation US11373084B2 (en) | 2016-04-27 | 2018-10-29 | Apparatus and methods for forward propagation in fully connected layers of convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017185387A1 true WO2017185387A1 (zh) | 2017-11-02 |
Family
ID=60160564
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/080968 WO2017185387A1 (zh) | 2016-04-27 | 2016-05-04 | 一种用于执行全连接层神经网络正向运算的装置和方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11373084B2 (zh) |
EP (1) | EP3451236A4 (zh) |
KR (1) | KR102486030B1 (zh) |
CN (3) | CN111860811B (zh) |
WO (1) | WO2017185387A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11373084B2 (en) | 2016-04-27 | 2022-06-28 | Cambricon Technologies Corporation Limited | Apparatus and methods for forward propagation in fully connected layers of convolutional neural networks |
US11423284B2 (en) * | 2018-09-07 | 2022-08-23 | Black Sesame Technologies, Inc | Subgraph tile fusion in a convolutional neural network |
US11977928B2 (en) | 2018-12-12 | 2024-05-07 | Samsung Electronics Co., Ltd. | Apparatus and method for performing a recognition operation in a neural network |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909872B (zh) * | 2017-12-14 | 2023-08-25 | 中科寒武纪科技股份有限公司 | 集成电路芯片装置及相关产品 |
CN107957975B (zh) * | 2017-12-15 | 2021-01-05 | 安徽寒武纪信息科技有限公司 | 一种计算方法及相关产品 |
CN109976809B (zh) * | 2017-12-28 | 2020-08-25 | 中科寒武纪科技股份有限公司 | 调度方法及相关装置 |
CN111338776B (zh) * | 2017-12-28 | 2023-11-28 | 中科寒武纪科技股份有限公司 | 调度方法及相关装置 |
CN109993289B (zh) * | 2017-12-30 | 2021-09-21 | 中科寒武纪科技股份有限公司 | 集成电路芯片装置及相关产品 |
CN110097181B (zh) * | 2018-01-30 | 2023-07-11 | 上海寒武纪信息科技有限公司 | 用于执行人工神经网络正向运算的装置和方法 |
CN110147249B (zh) * | 2018-02-12 | 2021-02-09 | 上海寒武纪信息科技有限公司 | 一种网络模型的计算方法及装置 |
CN110163354B (zh) * | 2018-02-13 | 2020-10-09 | 上海寒武纪信息科技有限公司 | 一种计算装置及方法 |
CN110264229A (zh) * | 2018-03-12 | 2019-09-20 | 优估(上海)信息科技有限公司 | 基于全连接神经网络的二手车定价方法,装置,及系统 |
CN110728364A (zh) * | 2018-07-17 | 2020-01-24 | 上海寒武纪信息科技有限公司 | 一种运算装置和运算方法 |
US11138350B2 (en) | 2018-08-09 | 2021-10-05 | Zoox, Inc. | Procedural world generation using tertiary data |
CN111079925B (zh) * | 2018-10-19 | 2021-04-09 | 中科寒武纪科技股份有限公司 | 运算方法、装置及相关产品 |
CN111078286B (zh) * | 2018-10-19 | 2023-09-01 | 上海寒武纪信息科技有限公司 | 数据通信方法、计算系统和存储介质 |
CN109711539B (zh) * | 2018-12-17 | 2020-05-29 | 中科寒武纪科技股份有限公司 | 运算方法、装置及相关产品 |
CN110020720B (zh) * | 2019-04-01 | 2021-05-11 | 中科寒武纪科技股份有限公司 | 算子拼接方法及装置 |
CN110032450B (zh) * | 2019-04-17 | 2021-04-20 | 中山大学 | 一种基于固态盘扩展内存的大规模深度学习方法及系统 |
CN111831328A (zh) * | 2019-04-18 | 2020-10-27 | 华为技术有限公司 | 数据处理的方法及装置 |
US11589781B2 (en) | 2019-05-31 | 2023-02-28 | Georgetown University | Assessing diseases by analyzing gait measurements |
CN110309911B (zh) * | 2019-07-05 | 2021-01-05 | 安徽寒武纪信息科技有限公司 | 神经网络模型验证方法、装置、计算机设备和存储介质 |
CN112070220B (zh) * | 2020-08-06 | 2023-01-17 | 北京大学 | 一种基于非线性器件的原位自激活神经网络电路及神经网络运算方法 |
CN113791996B (zh) * | 2021-09-10 | 2024-02-06 | 中科寒武纪科技股份有限公司 | 集成电路装置、电子设备、板卡和计算方法 |
KR20240085458A (ko) * | 2022-12-08 | 2024-06-17 | 재단법인대구경북과학기술원 | Ssd 오프로딩을 이용한 인공지능 추론 및 학습 시스템 및 방법 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103680496A (zh) * | 2013-12-19 | 2014-03-26 | 百度在线网络技术(北京)有限公司 | 基于深层神经网络的声学模型训练方法、主机和系统 |
CN104376389A (zh) * | 2014-12-10 | 2015-02-25 | 国电南京自动化股份有限公司 | 基于负载均衡的主从式微电网功率负荷预测系统及其方法 |
CN105184366A (zh) * | 2015-09-15 | 2015-12-23 | 中国科学院计算技术研究所 | 一种时分复用的通用神经网络处理器 |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5065339A (en) * | 1990-05-22 | 1991-11-12 | International Business Machines Corporation | Orthogonal row-column neural processor |
JPH06195322A (ja) * | 1992-10-29 | 1994-07-15 | Hitachi Ltd | 汎用型ニューロコンピュータとして用いられる情報処理装置 |
JP2000322400A (ja) * | 1999-05-10 | 2000-11-24 | Fuji Xerox Co Ltd | 情報処理装置 |
CN101299185B (zh) * | 2003-08-18 | 2010-10-06 | 上海海尔集成电路有限公司 | 一种基于cisc结构的微处理器结构 |
FR2884008A1 (fr) * | 2005-03-31 | 2006-10-06 | France Telecom | Systeme et procede de localisation de points d'interet dans une image d'objet mettant en oeuvre un reseau de neurones |
US7747070B2 (en) * | 2005-08-31 | 2010-06-29 | Microsoft Corporation | Training convolutional neural networks on graphics processing units |
JP5171118B2 (ja) * | 2007-06-13 | 2013-03-27 | キヤノン株式会社 | 演算処理装置及びその制御方法 |
US20100312736A1 (en) * | 2009-06-05 | 2010-12-09 | The Regents Of The University Of California | Critical Branching Neural Computation Apparatus and Methods |
KR20130111956A (ko) * | 2010-05-19 | 2013-10-11 | 더 리전트 오브 더 유니버시티 오브 캘리포니아 | 신경 처리 장치 |
US9015092B2 (en) * | 2012-06-04 | 2015-04-21 | Brain Corporation | Dynamically reconfigurable stochastic learning apparatus and methods |
US8918351B2 (en) * | 2012-07-30 | 2014-12-23 | International Business Machines Corporation | Providing transposable access to a synapse array using column aggregation |
US9147153B2 (en) * | 2012-11-06 | 2015-09-29 | Rockwell Automation Technologies, Inc. | Empirical modeling with globally enforced general constraints |
US9190053B2 (en) * | 2013-03-25 | 2015-11-17 | The Governing Council Of The Univeristy Of Toronto | System and method for applying a convolutional neural network to speech recognition |
CN104077842B (zh) * | 2014-07-02 | 2017-02-15 | 浙江大学 | 基于图像识别的自选餐厅自助付费装置及其使用方法 |
US10417525B2 (en) * | 2014-09-22 | 2019-09-17 | Samsung Electronics Co., Ltd. | Object recognition with reduced neural network weight precision |
US9411726B2 (en) * | 2014-09-30 | 2016-08-09 | Samsung Electronics Co., Ltd. | Low power computation architecture |
CN105488565A (zh) * | 2015-11-17 | 2016-04-13 | 中国科学院计算技术研究所 | 加速深度神经网络算法的加速芯片的运算装置及方法 |
CN109993285B (zh) * | 2016-01-20 | 2020-02-07 | 中科寒武纪科技股份有限公司 | 用于执行人工神经网络正向运算的装置和方法 |
CN107506828B (zh) * | 2016-01-20 | 2020-11-03 | 中科寒武纪科技股份有限公司 | 用于稀疏连接的人工神经网络计算装置和方法 |
CN109358900B (zh) * | 2016-04-15 | 2020-07-03 | 中科寒武纪科技股份有限公司 | 支持离散数据表示的人工神经网络正向运算装置和方法 |
CN111860811B (zh) | 2016-04-27 | 2024-01-16 | 中科寒武纪科技股份有限公司 | 一种用于执行人工神经网络全连接层正向运算的装置和方法 |
-
2016
- 2016-04-27 CN CN202010614867.1A patent/CN111860811B/zh active Active
- 2016-04-27 CN CN201610270004.0A patent/CN107315571B/zh active Active
- 2016-04-27 CN CN201811221557.2A patent/CN109375951B/zh active Active
- 2016-05-04 EP EP16899898.7A patent/EP3451236A4/en not_active Ceased
- 2016-05-04 WO PCT/CN2016/080968 patent/WO2017185387A1/zh active Application Filing
- 2016-05-04 KR KR1020187033949A patent/KR102486030B1/ko active IP Right Grant
-
2018
- 2018-10-29 US US16/174,185 patent/US11373084B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103680496A (zh) * | 2013-12-19 | 2014-03-26 | 百度在线网络技术(北京)有限公司 | 基于深层神经网络的声学模型训练方法、主机和系统 |
CN104376389A (zh) * | 2014-12-10 | 2015-02-25 | 国电南京自动化股份有限公司 | 基于负载均衡的主从式微电网功率负荷预测系统及其方法 |
CN105184366A (zh) * | 2015-09-15 | 2015-12-23 | 中国科学院计算技术研究所 | 一种时分复用的通用神经网络处理器 |
Non-Patent Citations (2)
Title |
---|
See also references of EP3451236A4 * |
ZHANG, XIAN: "An Algorithm for Training Back-propagation Neural Networks Based on Data Parallelism", CHINA MASTER'S THESES FULL-TEXT DATABASE INFORMATION TECHNOLOGY, vol. 2010, no. 05, 10 May 2015 (2015-05-10), XP009513124, ISSN: 1674-0246 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11373084B2 (en) | 2016-04-27 | 2022-06-28 | Cambricon Technologies Corporation Limited | Apparatus and methods for forward propagation in fully connected layers of convolutional neural networks |
US11423284B2 (en) * | 2018-09-07 | 2022-08-23 | Black Sesame Technologies, Inc | Subgraph tile fusion in a convolutional neural network |
US11977928B2 (en) | 2018-12-12 | 2024-05-07 | Samsung Electronics Co., Ltd. | Apparatus and method for performing a recognition operation in a neural network |
Also Published As
Publication number | Publication date |
---|---|
CN109375951B (zh) | 2020-10-09 |
CN107315571B (zh) | 2020-07-31 |
CN107315571A (zh) | 2017-11-03 |
EP3451236A1 (en) | 2019-03-06 |
EP3451236A4 (en) | 2019-12-25 |
CN109375951A (zh) | 2019-02-22 |
US20190065934A1 (en) | 2019-02-28 |
KR20190003611A (ko) | 2019-01-09 |
KR102486030B1 (ko) | 2023-01-06 |
CN111860811A (zh) | 2020-10-30 |
US11373084B2 (en) | 2022-06-28 |
CN111860811B (zh) | 2024-01-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017185387A1 (zh) | 一种用于执行全连接层神经网络正向运算的装置和方法 | |
WO2017185386A1 (zh) | 一种用于执行卷积神经网络正向运算的装置和方法 | |
KR102470264B1 (ko) | 완전연결층 신경망 역방향 트레이닝 실행용 장치와 방법 | |
CN111860812B (zh) | 一种用于执行卷积神经网络训练的装置和方法 | |
CN109358900B (zh) | 支持离散数据表示的人工神经网络正向运算装置和方法 | |
CN106991476B (zh) | 用于执行人工神经网络正向运算的装置和方法 | |
CN109284825B (zh) | 用于执行lstm运算的装置和方法 | |
WO2017185347A1 (zh) | 用于执行循环神经网络和lstm运算的装置和方法 | |
CN107886166B (zh) | 一种执行人工神经网络运算的装置和方法 | |
WO2017185336A1 (zh) | 用于执行pooling运算的装置和方法 | |
WO2017185248A1 (zh) | 用于执行人工神经网络自学习运算的装置和方法 | |
WO2018058452A1 (zh) | 一种执行人工神经网络运算的装置和方法 | |
WO2017177446A1 (zh) | 支持离散数据表示的人工神经网络反向训练装置和方法 | |
WO2017185335A1 (zh) | 一种用于执行batch normalization运算的装置和方法 | |
CN111860772B (zh) | 一种用于执行人工神经网络pooling运算的装置和方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 20187033949 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2016899898 Country of ref document: EP |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16899898 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016899898 Country of ref document: EP Effective date: 20181127 |