WO2017185391A1 - Apparatus and method for performing convolutional neural network training - Google Patents
Apparatus and method for performing convolutional neural network training
- Publication number
- WO2017185391A1 (PCT/CN2016/081088)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- module
- unit
- instruction
- storage unit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- The present invention relates to an apparatus and method for performing backward (reverse) training of a convolutional neural network, which efficiently and flexibly performs the backward-training operation according to a dedicated convolutional neural network backward-training instruction. This addresses the fact that more and more algorithms in the computing field involve large numbers of convolutional neural network backward-training operations.
- A convolutional neural network is an efficient recognition algorithm that has been widely used in pattern recognition, image processing, and other fields in recent years. It features a simple structure, few training parameters, and robustness to translation, rotation, and scaling. Because the feature-detection layers of a CNN/DNN learn from the training data, explicit feature extraction is avoided when the network is used; features are learned implicitly from the training data. Moreover, because neurons on the same feature map share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over fully connected networks.
- FIG. 1 shows the flow of the convolutional neural network training algorithm provided by the present invention; the flow includes two stages:
- In the first phase, the forward-propagation phase, information is transformed from the input layer to the output layer. This is also the process the network performs during normal operation after training: the input is multiplied by each layer's weight matrix in turn to obtain the final output.
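- For orientation, this forward phase can be sketched in a few lines of NumPy. This is a minimal illustrative model, not the patented hardware; the function name, the choice of `tanh` as activation, and the caching of layer inputs are assumptions made for the sketch.

```python
import numpy as np

def forward_pass(x, weights, activation=np.tanh):
    """Forward-propagation phase: the input is multiplied by each layer's
    weight matrix in turn and activated; layer inputs are cached because
    the backward-training phase described below needs them."""
    cache = []
    for W in weights:
        cache.append(x)        # saved for the backward phase
        x = activation(W @ x)  # layer output = f(W x)
    return x, cache
```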
- One known scheme for performing the convolutional neural network backward-training operation is to use a general-purpose processor, which executes general-purpose instructions through a general-purpose register file and general-purpose functional units to perform the backward-training operation.
- One disadvantage of this approach is that a single general-purpose processor is designed mostly for scalar computation, so its performance on the convolutional neural network backward-training operation is low. When multiple general-purpose processors execute in parallel, the communication between processors may in turn become a performance bottleneck.
- Another known scheme uses a graphics processing unit (GPU) for vector processing, in which the convolutional neural network backward-training operation is performed by executing general SIMD instructions using a general-purpose register file and general-purpose stream processors.
- In this scheme, the GPU on-chip cache is too small, so off-chip data transfers must be performed continuously when running large-scale convolutional neural network backward-training operations, and off-chip bandwidth becomes the main performance bottleneck.
- An aspect of the present invention provides an apparatus for performing backward training of a convolutional neural network, comprising an instruction storage unit, a controller unit, a data access unit, an interconnection module, a main operation module, and a plurality of slave operation modules, wherein:
- the instruction storage unit is configured to store instructions;
- the controller unit is configured to read instructions from the instruction storage unit and decode each instruction into control signals that control the behavior of the interconnection module, the main operation module, and the plurality of slave operation modules;
- the data access unit performs data or instruction read/write operations between the external address space and the apparatus;
- the main operation module transmits the input data of the current layer to all slave operation modules through the interconnection module;
- each slave operation module computes the dot product of its own convolution kernel and the input data as an intermediate-result partial sum, the convolution kernel corresponding to that partial sum;
- the interconnection module splices the intermediate-result partial sums of the slave operation modules, stage by stage, into the intermediate result of the current layer;
- the main operation module uses the intermediate result of the current layer to complete the subsequent computation in each layer's calculation process.
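- The master/slave dataflow described above can be modeled in software as follows. This is an illustrative sketch of the claimed behavior, not the hardware implementation; treating each kernel and the input as flat vectors is an assumption of the sketch.

```python
import numpy as np

def layer_intermediate_result(input_data, slave_kernels):
    """The master broadcasts the layer input to all slave modules; each
    slave computes the dot product with its own convolution kernel as a
    partial sum; the interconnect splices the partial sums into the
    layer's intermediate result."""
    partial_sums = [kernel @ input_data          # one dot product per slave
                    for kernel in slave_kernels]  # input broadcast to all
    return np.stack(partial_sums)                 # splicing in the interconnect
```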
- Another aspect of the invention provides a method of performing a convolutional neural network instruction.
- The convolutional neural network backward-training apparatus and its supporting instructions provided by the invention temporarily store the input data, output gradients, and convolution kernels participating in the computation in scratchpad memory. Under the same instructions, the convolutional neural network backward-training operation unit can thus support data of different widths more flexibly and effectively, and can resolve correlation problems in data storage, thereby improving the execution performance of computational tasks that involve large numbers of convolutional neural network operations. The instruction format adopted by the invention is compact, which makes the instruction set easy to use and the supported vector lengths flexible.
- The invention can be applied to (but is not limited to) the following scenarios: data processing; electronic products such as robots, computers, printers, scanners, telephones, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, cloud servers, cameras, camcorders, projectors, watches, earphones, mobile storage, and wearable devices; means of transportation such as aircraft, ships, and vehicles; household appliances such as televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; and medical equipment including nuclear magnetic resonance instruments, B-mode ultrasound scanners, and electrocardiographs.
- FIG. 1 is a flow chart of the convolutional neural network backward-training algorithm.
- FIG. 2 is a schematic diagram of the instruction format supported by the convolutional neural network backward-training apparatus provided by the present invention.
- FIG. 3 shows an example block diagram of the overall structure of an apparatus for performing an artificial neural network convolutional-layer training operation in accordance with an embodiment of the present invention.
- FIG. 4 schematically shows the structure of the interconnection module in an apparatus for performing an artificial neural network convolutional-layer training operation according to an embodiment of the present invention.
- FIG. 5 illustrates an example block diagram of the main operation module structure in an apparatus for performing an artificial neural network convolutional-layer training operation in accordance with an embodiment of the present invention.
- FIG. 6 shows an example block diagram of the slave operation module structure in an apparatus for performing an artificial neural network convolutional-layer training operation in accordance with an embodiment of the present invention.
- FIG. 7 shows an example block diagram of the neural network convolutional-layer training operation process in accordance with an embodiment of the present invention.
- The invention provides a convolutional neural network backward-training apparatus and a set of supporting instructions, comprising a storage unit, a register unit, and a convolutional neural network backward-training operation unit. The storage unit stores the data, the input/output data gradients, and the convolution kernels; the register unit stores the addresses at which the data, input/output data gradients, and convolution kernels are stored.
- First, based on a convolution window, input data for the current layer is selected from the output data of the previous layer. Then, based on this selected input data and the output-data gradient from the layer following the current layer, the gradient of the convolution kernel is computed and the kernel is updated. Next, the gradient of the input data is computed from the convolution kernel, the gradient of the output data, and the derivative of the activation function, and is stored in memory for the computation of the next layer to be processed; the gradient of the input data serves as the output-data gradient of that layer.
- The invention temporarily stores the input data and convolution kernels involved in the computation in cache (scratchpad) memory, so that the backward-training process of the convolutional neural network can support data of different widths more flexibly and effectively, improving execution performance for computational tasks that involve large numbers of convolutional neural network backward-training operations.
- FIG. 1 is a schematic diagram of the convolutional neural network backward-training algorithm; as shown, it involves the convolutional neural network input data, the derivative of the activation function, the gradient of the output data, and the convolution kernels.
- First, the input data corresponding to each layer's operation module is selected, via the convolution window, from the output data of the layer preceding the current layer; then a vector-multiply-vector operation on this input data and the data gradient from the layer following the current layer yields the gradient of the convolution kernel, each component of which is a scalar.
- The main operation module then computes the square mean (root mean square) c of the convolution-kernel gradients of all slave operation modules of the current layer; if c is greater than the threshold t, the gradients are scaled accordingly (by t/c), and the value of the convolution kernel is then updated according to the new, scaled kernel gradient.
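- A sketch of this update rule, assuming the "square mean" is the root-mean-square norm of the gradient, that the scale factor is t/c, and that an explicit learning rate `lr` (which the text leaves implicit) scales the update:

```python
import numpy as np

def clip_and_update_kernel(kernel, kernel_grad, t, lr):
    """If the RMS norm c of the kernel gradient exceeds the threshold t,
    rescale the gradient by t/c, then update the kernel with the new
    (possibly scaled) gradient."""
    c = np.sqrt(np.mean(kernel_grad ** 2))
    if c > t:
        kernel_grad = kernel_grad * (t / c)  # scale the gradient back
    return kernel - lr * kernel_grad         # update with the new gradient
```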
- The main operation module obtains the data gradient to be passed to the previous layer as dx = h(x)∑wᵢdsᵢ, where ds is the output-data gradient of the current layer correlated with the current layer's input data x, w is the convolution-kernel data corresponding to the input data x and the output-data gradient ds, and h(x) is the value of the derivative of the activation function corresponding to the input data.
- That is, the convolution kernel is multiply-accumulated with the data gradient from the following layer, and the result is multiplied by the value of the activation-function derivative corresponding to the current layer's input data, yielding the data gradient to output to the previous layer.
- Finally, this gradient is written out to memory for the convolutional neural network backward-training operation of the previous layer.
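- The same formula as a sketch, with `w`, `ds`, and the precomputed derivative values `h_prime_x` passed in as NumPy arrays; the flat per-kernel indexing is an assumption of the sketch.

```python
import numpy as np

def input_data_gradient(w, ds, h_prime_x):
    """dx = h'(x) * sum_i(w_i * ds_i): multiply-accumulate the convolution
    kernels w_i with the output-data gradients ds_i from the following
    layer, then scale by the activation-function derivative at x."""
    acc = sum(w_i * ds_i for w_i, ds_i in zip(w, ds))  # multiply-accumulate
    return h_prime_x * acc                             # scale by h'(x)
```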
- The convolution window selects, based on the kernel size k_x and k_y, input data of the same size as the convolution kernel, starting from the beginning of the input, and then translates according to the translation strides S_x and S_y of the convolution window, first horizontally and then vertically, until the entire input image has been traversed.
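- The traversal order can be sketched as a generator of window coordinates; the half-open coordinate convention is an assumption of the sketch.

```python
def convolution_windows(W, H, kx, ky, Sx, Sy):
    """Slide a kx-by-ky window over a W-by-H input, starting at the
    origin, translating horizontally by Sx and then vertically by Sy,
    until the whole input image is traversed."""
    for y in range(0, H - ky + 1, Sy):       # vertical translation
        for x in range(0, W - kx + 1, Sx):   # horizontal translation
            yield x, y, x + kx, y + ky       # window corner coordinates
```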
- The convolutional neural network backward-training operation instruction comprises one opcode and seven operand fields. The opcode indicates the function of the instruction; the backward-training operation unit carries out the backward-training operation by identifying the opcode. The operand fields indicate the data information of the instruction, where the data information may be immediate values or register numbers, and includes the start address and data length of the input data, the start address and data length of the convolution kernel, the start address and data length of the output gradient, and the type of the activation function.
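- The field layout can be sketched as a record; the field names and Python integer types are illustrative assumptions, since the text fixes only the number and meaning of the fields, not an encoding.

```python
from dataclasses import dataclass

@dataclass
class ConvBackwardInstruction:
    """One opcode plus seven operand fields; each operand field may hold
    an immediate value or a register number."""
    opcode: int       # selects the backward-training operation
    in_addr: int      # start address of the input data
    in_len: int       # data length of the input data
    kernel_addr: int  # start address of the convolution kernel
    kernel_len: int   # data length of the convolution kernel
    grad_addr: int    # start address of the output gradient
    grad_len: int     # data length of the output gradient
    activation: int   # type of the activation function
```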
- the instruction set contains convolutional neural network COMPUTE instructions with different functions as well as CONFIG instructions, IO instructions, NOP instructions, JUMP instructions, and MOVE instructions.
- the COMPUTE instruction includes:
- Convolutional neural network sigmoid instruction: according to the instruction, the device extracts input data of the specified size and the convolution kernel from designated addresses in the scratchpad memory, performs the convolution operation in the convolution operation unit, and then applies a sigmoid activation to the output.
- Convolutional neural network TanH instruction: according to the instruction, the device extracts input data of the specified size and the convolution kernel from designated addresses in the scratchpad memory, performs the convolution operation in the convolution operation unit, and then applies a TanH activation to the output.
- Convolutional neural network ReLU instruction: according to the instruction, the device extracts input data of the specified size and the convolution kernel from designated addresses in the scratchpad memory, performs the convolution operation in the convolution operation unit, and then applies a ReLU activation to the output.
- Convolutional neural network group instruction: according to the instruction, the device extracts input data of the specified size and the convolution kernel from designated addresses in the scratchpad memory, divides them into groups, performs the convolution operation in the convolution operation unit, and then activates the output.
- The IO instruction reads the input data required for computation from the external address space and stores data back to the external space after the computation completes.
- The NOP instruction clears the control signals currently loaded into all internal control-signal buffer queues, ensuring that all instructions preceding the NOP have completed; the NOP instruction itself contains no operation.
- The JUMP instruction changes the address of the next instruction the controller will read from the instruction storage unit, implementing a jump in control flow.
- The MOVE instruction moves data at one address in the device's internal address space to another address in the internal address space; this process is independent of the operation unit and occupies none of its resources during execution.
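- The instruction set, summarized as an enumeration (a sketch: the member names are illustrative and no concrete opcodes are given in the text):

```python
from enum import Enum, auto

class Opcode(Enum):
    COMPUTE_SIGMOID = auto()  # convolution, then sigmoid activation
    COMPUTE_TANH = auto()     # convolution, then TanH activation
    COMPUTE_RELU = auto()     # convolution, then ReLU activation
    COMPUTE_GROUP = auto()    # grouped convolution, then activation
    CONFIG = auto()           # configure per-layer constants
    IO = auto()               # external address space <-> scratchpad
    NOP = auto()              # drain internal control-signal queues
    JUMP = auto()             # redirect the controller's next fetch
    MOVE = auto()             # internal copy; does not occupy the ALU
```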
- FIG. 3 is a schematic structural diagram of a convolutional neural network reverse training computing device according to an embodiment of the present invention.
- the apparatus includes an instruction storage unit 1, a controller unit 2, a data access unit 3, an interconnection module 4, a main operation module 5, and a plurality of slave operation modules 6.
- The instruction storage unit 1, the controller unit 2, the data access unit 3, the interconnection module 4, the main operation module 5, and the slave operation modules 6 may all be implemented as hardware circuits (including but not limited to FPGAs, CGRAs, application-specific integrated circuits (ASICs), analog circuits, and memristors).
- the instruction storage unit 1 reads in an instruction through the data access unit 3 and caches the read instruction.
- the controller unit 2 reads instructions from the instruction storage unit 1, and translates the instructions into control signals that control the behavior of other modules, such as the data access unit 3, the main arithmetic module 5, and the slave arithmetic module 6.
- The data access unit 3 can access the external address space, directly reading and writing data, data gradients, and convolution-kernel values to each cache unit inside the device according to the data sizes required in the computation, completing the loading and storing of data.
- the interconnection module 4 constitutes a data path between the main operation module and the plurality of slave operation modules, and the interconnection module is any one of the following structures: a tree structure, a ring structure, a grid structure, a hierarchical interconnection, and a bus structure.
- FIG. 4 schematically shows a structure of the interconnection module 4.
- The interconnection module 4 forms the data path between the main operation module 5 and the plurality of slave operation modules 6, and has an H-tree structure.
- The H-tree is a binary tree path composed of multiple nodes: each node sends upstream data identically to its two downstream nodes, and the data returned by the two downstream nodes is merged and returned to the upstream node.
- At the start of each layer's computation, the neuron data in the main operation module 5 is sent to each slave operation module 6 through the interconnection module 4. After the computation in the slave operation modules 6 completes, the convolution-kernel update is performed first, and then the neuron values output by the slave operation modules are assembled, stage by stage in the interconnection module, into a complete neuron vector that serves as the intermediate result vector.
- Suppose the device has N slave operation modules: the intermediate result vector is divided into segments of N elements each, and the i-th slave operation module computes the i-th element of every segment.
- The N elements are assembled into a vector of length N through the interconnection module and returned to the main operation module. So if the network has only N output neurons, each slave operation unit only needs to output the value of a single neuron; if the network has m*N output neurons, each slave operation unit needs to output m neuron values.
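- A sketch of the H-tree return path, assuming the number of slave outputs is a power of two:

```python
def htree_gather(values):
    """At each level of the binary tree, the data returned by two
    downstream nodes is combined and passed upstream, so N slave outputs
    merge into a single vector after log2(N) levels."""
    level = [[v] for v in values]         # leaves: one value per slave
    while len(level) > 1:
        level = [level[i] + level[i + 1]  # pairwise merge upstream
                 for i in range(0, len(level), 2)]
    return level[0]                       # full vector at the root (master)
```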
- FIG. 5 shows an example block diagram of the structure of a main operation module 5 in an apparatus for performing an artificial neural network convolutional layer reverse training operation according to an embodiment of the present invention.
- the main operation module 5 includes a first operation unit 51, a first data dependency determination unit 52, and a first storage unit 53.
- the first operation unit 51 includes a vector addition unit 511 and an activation unit 512.
- The first operation unit 51 receives control signals from the controller unit and completes the various operation functions of the main operation module 5. The vector addition unit 511 implements the bias-add operation of the artificial neural network convolutional-layer backward-training computation: its inputs are a bias vector read from the external address space and the intermediate result vector transmitted from the slave operation modules 6 through the interconnection module 4, and its output is their vector sum. The activation unit 512 implements multiplication by the derivative of the convolutional-layer activation function: its input is the intermediate result transmitted from the slave operation modules 6 through the interconnection module 4, or the output of the vector addition unit 511, and its output is that value multiplied by the derivative of the activation function.
- The first data dependency determination unit 52 is the port through which the first operation unit 51 reads and writes the first storage unit 53, and it ensures read/write consistency of the data in the neuron cache unit. The first data dependency determination unit 52 is also responsible for sending read data to the slave operation modules through the interconnection module 4, while output data from the slave operation modules 6 is sent directly to the first operation unit 51 through the interconnection module 4. Instructions output by the controller unit 2 are sent to the first operation unit 51 and the first data dependency determination unit 52 to control their behavior.
- the first storage unit 53 is configured to buffer input and output data, input and output data gradients, and convolution kernel values used by the main operation module 5 in the calculation process.
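- The main module's two-stage post-processing can be sketched as follows; this is an illustrative model of units 511 and 512, with all values assumed to be NumPy vectors.

```python
import numpy as np

def main_module_postprocess(intermediate, bias, h_prime):
    """Vector addition unit 511 adds the bias vector to the intermediate
    result returned through the interconnect; activation unit 512 then
    multiplies element-wise by the activation-function derivative h'."""
    biased = intermediate + bias  # vector addition unit 511
    return biased * h_prime       # activation-derivative unit 512
```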
- each slave arithmetic module 6 includes a second arithmetic unit 61, a data dependency determining unit 62, a second storage unit 63, and a third storage unit 64.
- The second operation unit 61 first updates the convolution kernel based on the input data and the gradient of the output data; it then receives control signals from the controller unit 2 and performs the dot-product operation.
- the second arithmetic unit 61 includes a vector multiplication unit 611, and an accumulation unit 612.
- The vector multiplication unit 611 implements element-wise multiplication of the neuron vector and the weight vector, and the accumulation unit 612 implements the operation of summing every element of the resulting vector.
- The second data dependency determination unit 62 is responsible for the read and write operations on the second storage unit 63 during computation. Before performing a read or write, it first ensures that there is no read/write consistency conflict among the data used by pending instructions. For example, all control signals sent to the data dependency unit 62 are stored in an instruction queue inside the unit; if the read range of a read instruction in that queue conflicts with the write range of a write instruction earlier in the queue, the read instruction must wait until the write instruction it depends on has executed.
- the second storage unit 63 buffers the input neuron vector data and the output neuron value data of the slave arithmetic module 6.
- the third storage unit 64 buffers the convolution kernel data required by the slave arithmetic module 6 in the calculation process. For each slave arithmetic module 6, only a partial convolution kernel corresponding to a portion of the output neurons is stored. The output neurons are segmented according to the number N of operation units, and the convolution kernel corresponding to the nth output neuron of each segment is stored in the nth slave operation unit.
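- This interleaved partitioning can be sketched directly; `assign_kernels` below is a hypothetical helper, not part of the patent.

```python
def assign_kernels(kernels, n_units):
    """Output neurons are split into segments of n_units; the kernel for
    the n-th neuron of every segment is stored in the n-th slave unit."""
    return {n: kernels[n::n_units] for n in range(n_units)}

# e.g. with 8 output-neuron kernels and 4 slave units, unit 0 holds
# kernels 0 and 4, unit 1 holds kernels 1 and 5, and so on.
```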
- The slave operation modules 6 realize the parallelism of the dot-product operation in the artificial neural network convolutional-layer computation. Based on the convolution kernel, the computation of the whole layer is divided into multiple parallel, independent tasks, with the output data and input data treated as column vectors. Each slave operation module 6 computes the dot product of the input data with the convolution kernel corresponding to its portion of the output data; each resulting partial sum is a pre-activation piece of the final result, and these partial sums are combined stage by stage in the interconnection module 4 to obtain the final result.
- Each slave operation module 6 computes only the output-neuron values corresponding to that module within the intermediate result vector y; the interconnection module 4 then combines the neuron values output by all slave operation modules 6 to obtain the complete intermediate result vector y.
- FIG. 7 is a flowchart of executing a convolutional neural network instruction by a convolutional neural network reverse training operation apparatus according to an embodiment of the present invention. As shown in FIG. 7, the process of executing a convolutional neural network instruction includes:
- In step S1, an IO instruction is pre-stored at the first address of instruction storage unit 1.
- In step S2, the operation starts: the controller unit 2 reads this IO instruction from the first address of instruction storage unit 1 and, according to the decoded control signal, the data access unit 3 reads all corresponding artificial neural network convolutional-layer operation instructions from the external address space and caches them in instruction storage unit 1.
- In step S3, the controller unit 2 reads the next IO instruction from the instruction storage unit; according to the decoded control signal, the data access unit 3 reads all data required by the main operation module 5 (for example, the input data, interpolation tables, constant tables, biases, and the like) from the external address space into the first storage unit 53 of the main operation module 5.
- In step S4, the controller unit 2 reads the next IO instruction from the instruction storage unit; according to the decoded control signal, the data access unit 3 reads the convolution-kernel data required by the slave operation modules 6 from the external address space.
- In step S5, the controller unit 2 reads the next CONFIG instruction from the instruction storage unit; according to the decoded control signal, the device configures the various constants required for this layer's neural network computation. For example, the operation units 51 and 61 configure the values of their internal registers according to parameters in the control signal, including, for instance, the data required by the activation function.
- In step S6, the controller unit 2 reads the next COMPUTE instruction from the instruction storage unit; according to the decoded control signal, the main operation module 5 first sends the input data of the current layer within the convolution window and the data-gradient vector of the following layer to each slave operation module 6 through the interconnection module 4, to be saved in the second storage unit 63 of each slave operation module 6; the convolution window is then moved according to the instruction.
- In step S7, according to the control signal decoded from the COMPUTE instruction, the second operation unit 61 of each slave operation module 6 reads the convolution-kernel vector from the third storage unit 64 and the input data from the second storage unit 63, completes the dot-product operation of the kernel vector and the input data, updates the third storage unit 64, computes the partial sum of the output-data gradient, and returns the intermediate result through the interconnection module.
- In step S8, in the interconnection module 4, the intermediate-result partial sums returned by the slave operation modules 6 are assembled stage by stage into the complete intermediate result vector.
- In step S9, the main operation module 5 obtains the value returned by the interconnection module 4; according to the control signal decoded from the COMPUTE instruction, it reads the bias vector from the first storage unit 53, adds it to the vector returned by the interconnection module 4 in the vector addition unit 511, multiplies the sum by the derivative of the activation function in the activation unit 512, and writes the final output back to the first storage unit 53.
- In step S10, the controller unit 2 reads the next IO instruction from the instruction storage unit; according to the decoded control signal, the data access unit 3 stores the output data gradient in the first storage unit 53 to the designated address in the external address space, and the operation ends.
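- The S1-S10 sequence can be summarized as a driver loop over a hypothetical `device` object; every method name below is an assumption chosen to mirror the steps above.

```python
def run_backward_layer(device):
    """One convolutional layer of backward training, step by step."""
    device.io_load_instructions()        # S1-S2: fetch the layer's instructions
    device.io_load_master_data()         # S3: inputs, tables, biases -> unit 53
    device.io_load_slave_kernels()       # S4: kernels -> slave storage units
    device.config_layer_constants()      # S5: CONFIG writes per-layer registers
    device.compute_broadcast_window()    # S6: window data + gradients to slaves
    device.compute_slave_dot_products()  # S7: dot products, kernel update
    device.interconnect_splice()         # S8: assemble the intermediate result
    device.master_bias_and_activate()    # S9: add bias, multiply by derivative
    device.io_store_output_gradient()    # S10: output gradient -> external memory
```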
Abstract
Description
Claims (10)
- 1. An apparatus for performing backward training of a convolutional neural network, comprising an instruction storage unit, a controller unit, a data access unit, an interconnection module, a main operation module, and a plurality of slave operation modules, wherein: the instruction storage unit is configured to store instructions; the controller unit is configured to read instructions from the instruction storage unit and decode each instruction into control signals controlling the behavior of the interconnection module, the main operation module, and the plurality of slave operation modules; the data access unit performs data or instruction read/write operations between the external address space and the apparatus; at the stage at which the backward training of each neural network layer begins computation, the main operation module transmits the input data of the current layer to all slave operation modules through the interconnection module; each slave operation module computes the dot product of its own convolution kernel and the input data as an intermediate-result partial sum, the convolution kernel corresponding to that partial sum; after the computation of the slave operation modules completes, the interconnection module splices the intermediate-result partial sums of the slave operation modules, stage by stage, into the intermediate result of the current layer; and in each layer's computation, the main operation module uses the intermediate result of the current layer to complete the subsequent computation.
- 2. The apparatus of claim 1, wherein the main operation module comprises a first operation unit and a first storage unit, wherein: the first operation unit comprises a vector addition unit and an activation unit, and receives control signals from the controller unit to complete the various operation functions of the main operation module; the vector addition unit is configured to implement the bias-add operation of the convolutional neural network backward-training computation, its inputs being a bias vector read from the external address space and the intermediate result transmitted back from the slave operation modules through the interconnection module, and its output being the biased result of adding the bias vector to the intermediate result; the activation unit is configured to implement multiplication by the derivative of the convolutional neural network activation function, its input being the intermediate result transmitted back from the slave operation modules through the interconnection module or the biased result output by the vector addition unit, and its output being the output data obtained by multiplying that intermediate or biased result by the derivative of the activation function; and the first storage unit stores the input/output data, the input/output data gradients, and the convolution-kernel values used by the main operation module in the computation.
- 3. The apparatus of claim 1, wherein each slave operation module comprises a second operation unit, a second storage unit, and a third storage unit; the second operation unit comprises a vector multiplication unit and an accumulation unit, wherein the vector multiplication unit implements element-wise multiplication of the input data and the convolution kernel, and the accumulation unit implements the operation of accumulating every term of the element-wise product; the second storage unit stores the input data of the slave operation module and the intermediate-result partial sums it outputs; and the third storage unit stores the convolution-kernel data that the slave operation module needs in the computation.
- 4. The apparatus of claim 1, wherein the main operation module obtains, from the convolution kernel and the input data of the current layer, the data gradient to be output to the previous layer as dx = h(x)∑wᵢdsᵢ, where ds is the output-data gradient of the current layer correlated with the input data x of the current layer, w is the convolution-kernel data corresponding to the input data x and the output-data gradient ds, and h(x) is the value of the derivative of the activation function corresponding to the input data; that is, the convolution kernel is multiply-accumulated with the data gradient from the following layer and the result is multiplied by the value of the activation-function derivative corresponding to the current layer's input data, yielding the data gradient output to the previous layer; finally, the gradient is output to the external address space for the convolution backward operation of the previous layer.
- 5. The apparatus of claim 1, wherein the interconnection module forms the data path between the main operation module and the plurality of slave operation modules, and the interconnection module has any one of the following structures: a tree structure, a ring structure, a grid structure, a hierarchical interconnect, or a bus structure.
- 6. The apparatus of claim 4, wherein the convolution window, based on the kernel size kx and ky, traverses an input data set of size W*H using the translation strides Sx and Sy of the convolution window, first translating horizontally and then translating vertically over the entire data set.
- 7. The apparatus of claim 1, wherein, according to the convolution kernel, the computation task of the whole convolutional layer is divided into multiple parallel, independent tasks, the output data and input data being column vectors; each slave operation module computes the dot product of the input data with the convolution kernel corresponding to its portion of the output data, and these partial sums are merged pairwise, stage by stage, in the interconnection module to obtain the final result.
- 8. The apparatus of claim 1, wherein the convolutional neural network backward-training operation instruction comprises one opcode and seven operand fields, wherein the opcode indicates the function of the instruction and the operand fields indicate the data information of the instruction, the data information being immediate values or register numbers and including the start address and data length of the input data, the start address and data length of the convolution kernel, the start address and data length of the output gradient, and the type of the activation function.
- 9. A method of executing a convolutional neural network instruction, comprising: in step S1, pre-storing an IO instruction at the first address of the instruction storage unit; in step S2, at the start of the operation, the controller unit reads that IO instruction from the first address of the instruction storage unit, and according to the decoded control signal the data access unit reads all corresponding convolutional neural network operation instructions from the external address space and caches them in the instruction storage unit; in step S3, the controller unit reads the next IO instruction from the instruction storage unit, and according to the decoded control signal the data access unit reads all data needed by the main operation module from the external address space into the first storage unit of the main operation module; in step S4, the controller unit reads the next IO instruction from the instruction storage unit, and according to the decoded control signal the data access unit reads the convolution-kernel data needed by the slave operation modules from the external address space; in step S5, the controller unit reads the next CONFIG instruction from the instruction storage unit, and according to the decoded control signal the various constants needed by this layer's neural network computation are configured; in step S6, the controller unit reads the next COMPUTE instruction from the instruction storage unit, and according to the decoded control signal the main operation module first sends the input data of the current layer within the convolution window and the data-gradient vector of the following layer to each slave operation module through the interconnection module, saving them to the second storage unit of each slave operation module, after which the convolution window is moved according to the instruction; in step S7, according to the control signal decoded from the COMPUTE instruction, the second operation unit of each slave operation module reads the convolution-kernel vector from the third storage unit and the input data from the second storage unit, completes the dot-product operation of the convolution-kernel vector and the input data, updates the third storage unit, computes the intermediate-result partial sum of the output-data gradient, and returns the intermediate result through the interconnection module; in step S8, in the interconnection module, the intermediate-result partial sums returned by the slave operation modules are spliced stage by stage into the complete intermediate result; in step S9, the main operation module obtains the intermediate result returned by the interconnection module, and according to the control signal decoded from the COMPUTE instruction reads the bias vector from the first storage unit, adds it to the intermediate result returned by the interconnection module in the vector addition unit, multiplies the sum by the derivative of the activation function in the activation unit to obtain the output data gradient, and writes the output data gradient back to the first storage unit of the main operation module; and in step S10, the controller unit reads the next IO instruction from the instruction storage unit, and according to the decoded control signal the data access unit stores the output data gradient in the first storage unit to the designated address in the external address space, whereupon the operation ends.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21179374.0A EP3944157A1 (en) | 2016-04-29 | 2016-05-05 | Device and method for performing training of convolutional neural network |
KR1020187033948A KR102544275B1 (ko) | 2016-04-29 | 2016-05-05 | Apparatus and method for performing convolutional neural network training |
EP16899902.7A EP3451241A4 (en) | 2016-04-29 | 2016-05-05 | DEVICE AND METHOD FOR LEARNING A CONVOLUTIVE NEURONAL NETWORK |
US16/174,165 US10643129B2 (en) | 2016-04-29 | 2018-10-29 | Apparatus and methods for training in convolutional neural networks |
US16/709,968 US20200111007A1 (en) | 2016-04-29 | 2019-12-11 | Apparatus and methods for training in convolutional neural networks |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610283838.5A CN107341547B (zh) | 2016-04-29 | 2016-04-29 | Apparatus and method for performing convolutional neural network training |
CN201610283838.5 | 2016-04-29 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/174,165 Continuation-In-Part US10643129B2 (en) | 2016-04-29 | 2018-10-29 | Apparatus and methods for training in convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017185391A1 true WO2017185391A1 (zh) | 2017-11-02 |
Family
ID=60160560
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/081088 WO2017185391A1 (zh) | 2016-04-29 | 2016-05-05 | Apparatus and method for performing convolutional neural network training |
Country Status (5)
Country | Link |
---|---|
US (2) | US10643129B2 (zh) |
EP (2) | EP3944157A1 (zh) |
KR (1) | KR102544275B1 (zh) |
CN (3) | CN111860812B (zh) |
WO (1) | WO2017185391A1 (zh) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344764A (zh) * | 2018-09-28 | 2019-02-15 | 大连民族大学 | System and apparatus for measuring the difference between consecutive video frames and their convolutional feature maps |
CN109389588A (zh) * | 2018-09-28 | 2019-02-26 | 大连民族大学 | Method for measuring the difference between consecutive video frames and their convolutional feature maps |
CN110807170A (zh) * | 2019-10-21 | 2020-02-18 | 中国人民解放军国防科技大学 | Vectorized implementation method for Same convolution in multi-sample multi-channel convolutional neural networks |
US10643129B2 (en) | 2016-04-29 | 2020-05-05 | Cambricon Technologies Corporation Limited | Apparatus and methods for training in convolutional neural networks |
US20200257977A1 (en) * | 2019-02-12 | 2020-08-13 | Irida Labs S.A. | System and a method to achieve time-aware approximated inference |
US11423284B2 (en) * | 2018-09-07 | 2022-08-23 | Black Sesame Technologies, Inc | Subgraph tile fusion in a convolutional neural network |
TWI793225B (zh) * | 2017-12-14 | 2023-02-21 | 大陸商中科寒武紀科技股份有限公司 | Neural network training method and related products |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017185347A1 (zh) * | 2016-04-29 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Apparatus and method for performing recurrent neural network and LSTM operations |
US20180189641A1 (en) | 2017-01-04 | 2018-07-05 | Stmicroelectronics S.R.L. | Hardware accelerator engine |
CN110826712B (zh) * | 2017-12-14 | 2024-01-09 | 中科寒武纪科技股份有限公司 | Neural network processor board and related products |
CN109978148B (zh) * | 2017-12-28 | 2020-06-23 | 中科寒武纪科技股份有限公司 | Integrated circuit chip apparatus and related products |
CN111582464B (zh) * | 2017-12-29 | 2023-09-29 | 中科寒武纪科技股份有限公司 | Neural network processing method, computer system, and storage medium |
CN108388446A (zh) * | 2018-02-05 | 2018-08-10 | 上海寒武纪信息科技有限公司 | Operation module and method |
CN110147249B (zh) * | 2018-02-12 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Computing method and apparatus for a network model |
CN108364061B (zh) * | 2018-02-13 | 2020-05-05 | 北京旷视科技有限公司 | Operation apparatus, operation execution device, and operation execution method |
CN110163357B (zh) * | 2018-02-13 | 2021-06-25 | 上海寒武纪信息科技有限公司 | Computing apparatus and method |
CN108829719B (zh) * | 2018-05-07 | 2022-03-01 | 中国科学院合肥物质科学研究院 | Non-factoid question-answering answer selection method and system |
CN110728364A (zh) * | 2018-07-17 | 2020-01-24 | 上海寒武纪信息科技有限公司 | Operation apparatus and operation method |
US11579921B2 (en) * | 2018-08-29 | 2023-02-14 | Alibaba Group Holding Limited | Method and system for performing parallel computations to generate multiple output feature maps |
CN109343978B (zh) * | 2018-09-27 | 2020-10-20 | 苏州浪潮智能科技有限公司 | Data exchange method and apparatus for a distributed deep-learning framework |
CN111338694B (zh) * | 2018-12-19 | 2022-05-31 | 上海寒武纪信息科技有限公司 | Operation method, apparatus, computer device, and storage medium |
CN111045729A (zh) * | 2018-10-12 | 2020-04-21 | 上海寒武纪信息科技有限公司 | Operation method, apparatus, and related products |
CN111047028A (zh) * | 2018-10-12 | 2020-04-21 | 上海寒武纪信息科技有限公司 | Operation method, apparatus, and related products |
CN110059797B (zh) * | 2018-10-10 | 2020-03-10 | 中科寒武纪科技股份有限公司 | Computing apparatus and related products |
CN111047025B (zh) * | 2018-10-15 | 2024-04-09 | 华为技术有限公司 | Convolution computation method and apparatus |
CN111078286B (zh) * | 2018-10-19 | 2023-09-01 | 上海寒武纪信息科技有限公司 | Data communication method, computing system, and storage medium |
US10387772B1 (en) * | 2018-10-22 | 2019-08-20 | Gyrfalcon Technology Inc. | Ensemble learning based image classification systems |
US11526759B2 (en) | 2018-11-05 | 2022-12-13 | International Business Machines Corporation | Large model support in deep learning |
US11520561B1 (en) * | 2018-11-28 | 2022-12-06 | Amazon Technologies, Inc. | Neural network accelerator with compact instruct set |
CN109670578A (zh) * | 2018-12-14 | 2019-04-23 | 北京中科寒武纪科技有限公司 | Data processing method, apparatus, and computer device for the first convolutional layer of a neural network |
WO2020160608A1 (en) * | 2019-02-07 | 2020-08-13 | Ocean Logic Pty Ltd | Highly parallel convolutional neural network |
CN110058943B (zh) * | 2019-04-12 | 2021-09-21 | 三星(中国)半导体有限公司 | Memory optimization method and device for electronic equipment |
CN110059818B (zh) * | 2019-04-28 | 2021-01-08 | 山东师范大学 | Neural convolution array circuit core with configurable convolution-kernel parameters, processor, and circuit |
WO2021012215A1 (zh) * | 2019-07-24 | 2021-01-28 | 华为技术有限公司 | Neural network partitioning method, prediction method, and related apparatus |
CN111161705B (zh) * | 2019-12-19 | 2022-11-18 | 寒武纪(西安)集成电路有限公司 | Voice conversion method and apparatus |
CN113222101A (zh) * | 2020-02-05 | 2021-08-06 | 北京百度网讯科技有限公司 | Deep learning processing apparatus, method, device, and storage medium |
US11593609B2 (en) | 2020-02-18 | 2023-02-28 | Stmicroelectronics S.R.L. | Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks |
IN202011021305A (zh) * | 2020-05-20 | 2020-06-26 | Hcl Technologies Ltd | |
US11531873B2 (en) | 2020-06-23 | 2022-12-20 | Stmicroelectronics S.R.L. | Convolution acceleration with embedded vector decompression |
CN112418419B (zh) * | 2020-11-20 | 2022-10-11 | 复旦大学 | Priority-scheduled data output circuit structure for neural network processing |
CN112633498B (zh) * | 2020-12-22 | 2023-04-07 | 天津大学 | Dataflow-based weight-gradient optimization method for convolutional neural networks |
CN112799599B (zh) * | 2021-02-08 | 2022-07-15 | 清华大学 | Data storage method, computing core, chip, and electronic device |
CN115456149B (zh) * | 2022-10-08 | 2023-07-25 | 鹏城实验室 | Spiking neural network accelerator learning method, apparatus, terminal, and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150596A (zh) * | 2013-02-22 | 2013-06-12 | 百度在线网络技术(北京)有限公司 | Training system for a back-propagation neural network (DNN) |
CN104103033A (zh) * | 2014-08-05 | 2014-10-15 | 四川九成信息技术有限公司 | Real-time image processing method |
CN104537393A (zh) * | 2015-01-04 | 2015-04-22 | 大连理工大学 | Traffic sign recognition method based on a multi-resolution convolutional neural network |
US20150294219A1 (en) * | 2014-04-11 | 2015-10-15 | Google Inc. | Parallelizing the training of convolutional neural networks |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5204938A (en) | 1989-05-30 | 1993-04-20 | Loral Aerospace Corp. | Method of implementing a neural network on a digital computer |
CA2069811C (en) * | 1991-10-03 | 1998-08-11 | John Stewart Denker | Projective training system for neural networks |
US7747070B2 (en) * | 2005-08-31 | 2010-06-29 | Microsoft Corporation | Training convolutional neural networks on graphics processing units |
JP5184824B2 (ja) * | 2007-06-15 | 2013-04-17 | キヤノン株式会社 | Arithmetic processing apparatus and method |
US8942438B2 (en) * | 2010-07-19 | 2015-01-27 | The University Of Maryland, College Park | Method and apparatus for authenticating swipe biometric scanners |
US8965819B2 (en) * | 2010-08-16 | 2015-02-24 | Oracle International Corporation | System and method for effective caching using neural networks |
US10078620B2 (en) * | 2011-05-27 | 2018-09-18 | New York University | Runtime reconfigurable dataflow processor with multi-port memory access module |
CN102346719B (zh) * | 2011-09-20 | 2016-08-10 | 北京国科环宇空间技术有限公司 | High-speed computing method and system for spacecraft |
US10071687B2 (en) * | 2011-11-28 | 2018-09-11 | Magna Electronics Inc. | Vision system for vehicle |
CN102707931A (zh) * | 2012-05-09 | 2012-10-03 | 刘大可 | Digital signal processor based on parallel data channels |
US9153230B2 (en) * | 2012-10-23 | 2015-10-06 | Google Inc. | Mobile speech recognition hardware accelerator |
CN103105773A (zh) * | 2012-12-27 | 2013-05-15 | 电子科技大学 | Acoustic parametric array control method based on neural-network inverse identification and adaptive PID |
US9679258B2 (en) | 2013-10-08 | 2017-06-13 | Google Inc. | Methods and apparatus for reinforcement learning |
US9665823B2 (en) * | 2013-12-06 | 2017-05-30 | International Business Machines Corporation | Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition |
CN104809426B (zh) * | 2014-01-27 | 2019-04-05 | 日本电气株式会社 | Convolutional neural network training method, object recognition method, and apparatus |
US10373047B2 (en) * | 2014-02-28 | 2019-08-06 | Educational Testing Service | Deep convolutional neural networks for automated scoring of constructed responses |
US9346167B2 (en) * | 2014-04-29 | 2016-05-24 | Brain Corporation | Trainable convolutional network apparatus and methods for operating a robotic vehicle |
US20150324690A1 (en) * | 2014-05-08 | 2015-11-12 | Microsoft Corporation | Deep Learning Training System |
CN103971163B (zh) * | 2014-05-09 | 2017-02-15 | 哈尔滨工程大学 | Adaptive learning-rate wavelet neural network control method based on normalized least-mean-square adaptive filtering |
CN104102919B (zh) * | 2014-07-14 | 2017-05-24 | 同济大学 | Image classification method that effectively prevents convolutional neural network overfitting |
US20160026912A1 (en) * | 2014-07-22 | 2016-01-28 | Intel Corporation | Weight-shifting mechanism for convolutional neural networks |
CN104281858B (zh) * | 2014-09-15 | 2018-07-10 | 中安消技术有限公司 | Three-dimensional convolutional neural network training method, video anomalous-event detection method, and apparatus |
US10417525B2 (en) * | 2014-09-22 | 2019-09-17 | Samsung Electronics Co., Ltd. | Object recognition with reduced neural network weight precision |
US10387773B2 (en) * | 2014-10-27 | 2019-08-20 | Ebay Inc. | Hierarchical deep convolutional neural network for image classification |
CN104463324A (zh) * | 2014-11-21 | 2015-03-25 | 长沙马沙电子科技有限公司 | Parallel processing method for convolutional neural networks based on large-scale high-performance clusters |
CN104933722B (zh) * | 2015-06-29 | 2017-07-11 | 电子科技大学 | Image edge detection method based on a spiking convolutional neural network model |
CN205139973U (zh) * | 2015-10-26 | 2016-04-06 | 中国人民解放军军械工程学院 | BP neural network built on FPGA devices |
CN107578099B (zh) * | 2016-01-20 | 2021-06-11 | 中科寒武纪科技股份有限公司 | Computing apparatus and method |
CN111860812B (zh) | 2016-04-29 | 2024-03-01 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing convolutional neural network training |
- 2016
  - 2016-04-29 CN CN202010615117.6A patent/CN111860812B/zh active Active
  - 2016-04-29 CN CN201610283838.5A patent/CN107341547B/zh active Active
  - 2016-04-29 CN CN202010164111.1A patent/CN111310904B/zh active Active
  - 2016-05-05 KR KR1020187033948A patent/KR102544275B1/ko active IP Right Grant
  - 2016-05-05 EP EP21179374.0A patent/EP3944157A1/en active Pending
  - 2016-05-05 WO PCT/CN2016/081088 patent/WO2017185391A1/zh active Application Filing
  - 2016-05-05 EP EP16899902.7A patent/EP3451241A4/en not_active Ceased
- 2018
  - 2018-10-29 US US16/174,165 patent/US10643129B2/en active Active
- 2019
  - 2019-12-11 US US16/709,968 patent/US20200111007A1/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150596A (zh) * | 2013-02-22 | 2013-06-12 | 百度在线网络技术(北京)有限公司 | Training system for a back-propagation neural network (DNN) |
US20150294219A1 (en) * | 2014-04-11 | 2015-10-15 | Google Inc. | Parallelizing the training of convolutional neural networks |
CN104103033A (zh) * | 2014-08-05 | 2014-10-15 | 四川九成信息技术有限公司 | Real-time image processing method |
CN104537393A (zh) * | 2015-01-04 | 2015-04-22 | 大连理工大学 | Traffic sign recognition method based on a multi-resolution convolutional neural network |
Non-Patent Citations (2)
Title |
---|
See also references of EP3451241A4 * |
YANG, XIN: "Traffic Sign Recognition Research and Application Based on Convolutional Neural Network", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE, 15 July 2015 (2015-07-15), CHINA MASTER'S THESES FULL-TEXT DATABASE, XP9512912, ISSN: 1674-0246 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10643129B2 (en) | 2016-04-29 | 2020-05-05 | Cambricon Technologies Corporation Limited | Apparatus and methods for training in convolutional neural networks |
TWI793225B (zh) * | 2017-12-14 | 2023-02-21 | 大陸商中科寒武紀科技股份有限公司 | Neural network training method and related products |
US11423284B2 (en) * | 2018-09-07 | 2022-08-23 | Black Sesame Technologies, Inc | Subgraph tile fusion in a convolutional neural network |
CN109344764A (zh) * | 2018-09-28 | 2019-02-15 | 大连民族大学 | System and apparatus for measuring the difference between consecutive video frames and their convolutional feature maps |
CN109389588A (zh) * | 2018-09-28 | 2019-02-26 | 大连民族大学 | Method for measuring the difference between consecutive video frames and their convolutional feature maps |
US20200257977A1 (en) * | 2019-02-12 | 2020-08-13 | Irida Labs S.A. | System and a method to achieve time-aware approximated inference |
US11526753B2 (en) * | 2019-02-12 | 2022-12-13 | Irida Labs S.A. | System and a method to achieve time-aware approximated inference |
CN110807170A (zh) * | 2019-10-21 | 2020-02-18 | 中国人民解放军国防科技大学 | Vectorized implementation method for Same convolution in multi-sample multi-channel convolutional neural networks |
Also Published As
Publication number | Publication date |
---|---|
US20190065959A1 (en) | 2019-02-28 |
EP3451241A4 (en) | 2019-12-25 |
CN111860812A (zh) | 2020-10-30 |
CN111310904A (zh) | 2020-06-19 |
CN107341547A (zh) | 2017-11-10 |
EP3451241A1 (en) | 2019-03-06 |
EP3944157A1 (en) | 2022-01-26 |
KR102544275B1 (ko) | 2023-06-16 |
CN107341547B (zh) | 2021-04-20 |
KR20190004306A (ko) | 2019-01-11 |
CN111310904B (zh) | 2024-03-08 |
CN111860812B (zh) | 2024-03-01 |
US10643129B2 (en) | 2020-05-05 |
US20200111007A1 (en) | 2020-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017185391A1 (zh) | Apparatus and method for performing convolutional neural network training | |
US11922132B2 (en) | Information processing method and terminal device | |
WO2017185386A1 (zh) | Apparatus and method for performing convolutional neural network forward operations | |
KR102470264B1 (ko) | Apparatus and method for performing backward training of a fully connected layer neural network |
CN111860811B (zh) | Apparatus and method for performing forward operations of fully connected layers of an artificial neural network | |
CN109358900B (zh) | Artificial neural network forward operation apparatus and method supporting discrete data representation | |
CN107316078B (zh) | Apparatus and method for performing artificial neural network self-learning operations | |
CN107341542B (zh) | Apparatus and method for performing recurrent neural network and LSTM operations | |
CN111353589B (zh) | Apparatus and method for performing artificial neural network forward operations | |
WO2017185347A1 (zh) | Apparatus and method for performing recurrent neural network and LSTM operations | |
WO2017185336A1 (zh) | Apparatus and method for performing pooling operations | |
WO2017185248A1 (zh) | Apparatus and method for performing artificial neural network self-learning operations | |
WO2017185335A1 (zh) | Apparatus and method for performing batch normalization operations | |
WO2017177446A1 (zh) | Artificial neural network backward-training apparatus and method supporting discrete data representation | |
CN111860814B (zh) | Apparatus and method for performing batch normalization operations | |
CN111860772B (zh) | Apparatus and method for performing artificial neural network pooling operations |
Legal Events
Code | Title | Description |
---|---|---|
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 20187033948; Country of ref document: KR; Kind code of ref document: A |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16899902; Country of ref document: EP; Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 2016899902; Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 2016899902; Country of ref document: EP; Effective date: 20181129 |