WO2017124644A1 - Artificial neural network compression coding apparatus and method - Google Patents

Artificial neural network compression coding apparatus and method

Info

Publication number
WO2017124644A1
WO2017124644A1 (PCT/CN2016/078448)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
artificial neural
unit
input
instruction
Prior art date
Application number
PCT/CN2016/078448
Other languages
English (en)
French (fr)
Inventor
陈天石
刘少礼
郭崎
陈云霁
Original Assignee
北京中科寒武纪科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京中科寒武纪科技有限公司 filed Critical 北京中科寒武纪科技有限公司
Publication of WO2017124644A1 publication Critical patent/WO2017124644A1/zh
Priority to US16/041,160 priority Critical patent/US10402725B2/en
Priority to US16/508,139 priority patent/US10726336B2/en

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
            • G06F 7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
              • G06F 7/48: ... using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
                • G06F 7/483: Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
                  • G06F 7/487: Multiplying; Dividing
                    • G06F 7/4876: Multiplying
          • G06F 9/00: Arrangements for program control, e.g. control units
            • G06F 9/06: ... using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
              • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
                • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
                  • G06F 9/3885: ... using a plurality of independent parallel functional units
                    • G06F 9/3887: ... controlled by a single instruction for multiple data lanes [SIMD]
          • G06F 2207/00: Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
            • G06F 2207/38: Indexing scheme relating to groups G06F7/38 - G06F7/575
              • G06F 2207/48: Indexing scheme relating to groups G06F7/48 - G06F7/575
                • G06F 2207/4802: Special implementations
                  • G06F 2207/4818: Threshold devices
                    • G06F 2207/4824: Neural networks
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
                • G06N 3/063: ... using electronic means
              • G06N 3/08: Learning methods
                • G06N 3/084: Backpropagation, e.g. using gradient descent

Definitions

  • The present invention relates to the field of artificial neural network processing technology, and more particularly to an artificial neural network compression coding apparatus and method; in particular, it relates to execution units, or devices comprising such execution units, that carry out artificial neural network algorithms, and to execution units, devices, and methods for multi-layer artificial neural network operations, back-propagation training algorithms, and their compression coding.
  • Multi-layer artificial neural networks are widely used in fields such as pattern recognition, image processing, function approximation, and optimization. In recent years especially, owing to continued research into back-propagation training and pre-training algorithms, multi-layer artificial neural networks have attracted increasingly broad attention from academia and industry for their high recognition accuracy and good parallelizability.
  • General-purpose processors are commonly used in the prior art to process multi-layer artificial neural network operations, training algorithms, and their compression coding, supporting these algorithms by executing general-purpose instructions through general register files and general functional units.
  • One disadvantage of using a general-purpose processor is that the performance of a single general-purpose processor is low and cannot meet the performance requirements of typical multi-layer artificial neural network operations.
  • When multiple general-purpose processors execute in parallel, the communication between them becomes a performance bottleneck.
  • Moreover, a general-purpose processor must decode multi-layer artificial neural network operations into a long sequence of arithmetic and memory-access instructions, and this front-end decoding incurs a large power overhead.
  • Another known approach uses a graphics processing unit (GPU), which supports these algorithms by executing generic SIMD instructions with a general register file and generic stream processing units. Because the GPU is a device dedicated to graphics, image, and scientific computation, with no dedicated support for multi-layer artificial neural network operations, a large amount of front-end decoding work is still required to perform them, bringing considerable extra overhead.
  • In addition, the GPU has only a small on-chip cache, so the model data (weights) of a multi-layer artificial neural network must be repeatedly transferred from off-chip; off-chip bandwidth becomes the main performance bottleneck while also incurring huge power consumption.
  • Accordingly, an object of the present invention is to provide an artificial neural network compression coding apparatus and method.
  • To this end, the present invention provides an artificial neural network compression coding apparatus, comprising:
  • a memory interface unit for inputting data and instructions;
  • an instruction cache for caching instructions;
  • a controller unit for reading instructions from the instruction cache and decoding them into instructions for the operation unit; and
  • an operation unit configured to perform the corresponding operations on data from the memory interface unit according to the instructions of the controller unit. The operation unit mainly performs a three-stage operation: the first stage multiplies the input neurons by the weight data; the second stage performs an adder-tree operation, summing the weighted output neurons from the first stage level by level through an adder tree, or adding a bias to the output neurons to obtain biased output neurons; the third stage applies an activation function to obtain the final output neurons.
  • In the operation unit, the input weight data are represented as follows: in the forward operation, the weight data W1 is represented by k bits of data, comprising a 1-bit sign bit, m control bits, and n data bits (k = 1 + m + n, with k, m, and n natural numbers); in the backward operation, the weight data W2 is represented as a floating-point number.
  • W1 is converted from W2: the weight data W2 of the backward operation has len significant bits, and n in the weight data W1 of the forward operation denotes the number of significant bits in W1, with n ≤ len; the control bits m specify the starting position of W1's significant bits within the original weight data.
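To make the bit-level layout concrete, the following Python sketch packs a floating-point weight into a (1 + m + n)-bit word and unpacks it again. It is only a minimal illustration: the helper names, the parameter values m = 3 and n = 4, and the exponent-style interpretation of the control bits are assumptions, since the patent does not fix an exact encoding.

```python
import math

M_BITS = 3  # assumed number of control bits (start of the kept significant bits)
N_BITS = 4  # assumed number of data bits (significant bits kept), so k = 1 + 3 + 4
BIAS = 1 << (M_BITS - 1)  # assumed bias so the position fits in the control bits

def compress_weight(w2: float) -> int:
    """Pack a float weight W2 into a k-bit integer W1: sign | control | data."""
    sign = 1 if w2 < 0 else 0
    frac, exp = math.frexp(abs(w2))  # w2 = frac * 2**exp, with frac in [0.5, 1)
    ctrl = max(0, min((1 << M_BITS) - 1, exp + BIAS))        # clamp into m bits
    data = int(frac * (1 << N_BITS)) & ((1 << N_BITS) - 1)   # top n significant bits
    return (sign << (M_BITS + N_BITS)) | (ctrl << N_BITS) | data

def decompress_weight(w1: int) -> float:
    """Recover an approximate float from the packed W1 word."""
    sign = (w1 >> (M_BITS + N_BITS)) & 1
    ctrl = (w1 >> N_BITS) & ((1 << M_BITS) - 1)
    data = w1 & ((1 << N_BITS) - 1)
    value = math.ldexp(data / (1 << N_BITS), ctrl - BIAS)
    return -value if sign else value

w = 0.3141
packed = compress_weight(w)
print(f"{w} -> {packed:08b} -> {decompress_weight(packed):.4f}")  # ~0.3125
```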
  • The present invention also proposes an artificial neural network compression coding method performed by the above artificial neural network computing device for compression coding, comprising the following steps:
  • Step 1: The controller unit reads a multi-layer artificial neural network operation SIMD instruction from the instruction cache;
  • Step 2: The controller unit decodes the instruction into micro-instructions for each functional component, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs;
  • Step 3: The operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;
  • Step 4: The operation unit, according to the opcode of stage two, decides whether to perform the adder-tree operation; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;
  • Step 5: The operation unit, according to the opcode of stage three, decides whether to perform the activation function; if so, it sends the result to the output neuron cache.
  • The present invention also provides a method of artificial neural network back-propagation training using the artificial neural network computing device for compression coding described above, comprising the following steps:
  • Step 1: The controller unit reads a multi-layer artificial neural network training algorithm SIMD instruction from the instruction cache;
  • Step 2: The controller unit decodes the instruction into micro-instructions for each operation unit, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs;
  • Step 3: The operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;
  • Step 4: The operation unit, according to the opcode of stage two, decides whether to perform an adder-tree operation, a vector addition, or a weight-update operation.
  • The present invention also provides an artificial neural network calculation method using the artificial neural network computing device for compression coding described above, comprising the following steps:
  • Step 1: Convert the stored floating-point weights W1 into a representation W2 with fewer bits;
  • Step 2: Perform the forward operation using W2;
  • Step 3: Perform the backward operation to update W1 according to the error obtained from the forward operation;
  • Step 4: Iterate the above three steps until the required model is obtained, as in the sketch below.
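The four steps above amount to quantization-aware training: the forward pass sees the compact weights while updates accumulate in full precision. The toy NumPy loop below follows that scheme for a single linear layer; `compress` here is a simple uniform-rounding stand-in for the patent's 1 + m + n format, and the data, learning rate, and iteration budget are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress(w, n_bits=8):
    """Stand-in for the W1 -> W2 conversion: keep roughly n_bits of precision."""
    scale = 2.0 ** n_bits
    return np.round(w * scale) / scale

# Toy task: recover w_true for a single linear layer y = x @ w.
x = rng.normal(size=(64, 4))
w_true = np.array([0.5, -1.2, 0.3, 2.0])
y = x @ w_true

W1 = np.zeros(4)                        # stored full-precision weights
for step in range(200):
    W2 = compress(W1)                   # Step 1: few-bit representation
    y_hat = x @ W2                      # Step 2: forward pass uses W2
    grad = x.T @ (y_hat - y) / len(x)   # error from the forward result
    W1 -= 0.1 * grad                    # Step 3: floating-point update of W1
print(np.round(W1, 2))                  # Step 4: after iterating, W1 ~ w_true
```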
  • The present invention thus provides an apparatus and method for performing multi-layer artificial neural network operations, back-propagation training algorithms, and their compression coding, improving the performance of multi-layer artificial neural network operations and their back-propagation training algorithms while reducing the number of bits used to represent the weight data, thereby achieving compression coding.
  • The invention not only effectively reduces the model size of an artificial neural network and increases its data processing speed, but also effectively reduces power consumption and improves resource utilization; compared with the prior art, it offers significant performance and power improvements and can greatly compress the weight data.
  • FIG. 1 is a block diagram showing an example of a neural network forward operation in accordance with an embodiment of the present invention
  • FIG. 2 is a block diagram showing an example of a neural network reverse training algorithm in accordance with an embodiment of the present invention
  • FIG. 3 is a block diagram showing an example of a general structure in accordance with an embodiment of the present invention.
  • FIG. 4 is a block diagram showing an example of an accelerator chip structure in accordance with an embodiment of the present invention.
  • FIG. 5 is a flowchart of operations of a multi-layer artificial neural network according to an embodiment of the invention.
  • FIG. 6 is a flow chart of a multi-layer artificial neural network back propagation training algorithm according to an embodiment of the invention.
  • FIG. 7 is a flow chart of a multi-layer artificial neural network compression coding algorithm in accordance with an embodiment of the present invention.
  • The invention relates to multi-layer artificial neural network operations and their compression coding, involving two or more layers of multiple neurons.
  • For each layer, the input neuron vector is first dot-multiplied with the weight vector, and the result passes through an activation function to yield the output neurons.
  • The activation function can be a sigmoid, tanh, relu, or softmax function.
  • The training algorithm of a multi-layer artificial neural network includes steps of forward propagation, back propagation, and weight update.
  • During the forward operation, the neural network weight W1 is represented by k bits of data (a 1-bit sign bit, m control bits, and n data bits, k = 1 + m + n), while the weight W2 in the backward operation is represented as a floating-point number.
  • The weight representation used in the forward operation is converted from the floating-point weight representation used in the backward operation (with len significant bits); the data bits store n significant bits of the original weight data, with n less than or equal to len.
  • The control bits specify the starting position of the data bits within the original significant bits; because the total number of data bits used by the forward weight representation is smaller than the total number of data bits of the weight representation used in backward training, compression coding is achieved.
  • The instruction cache unit is used to cache instructions;
  • the input neuron cache unit is used to cache input neuron data;
  • the weight cache unit is used to cache weights;
  • the control unit is used to read dedicated instructions from the instruction cache and decode them into micro-instructions;
  • the operation unit is used to receive the micro-instructions issued by the controller unit and perform arithmetic-logic operations;
  • the output neuron cache unit is used to cache output neuron data;
  • the memory interface unit serves as the data read/write channel;
  • the direct memory access channel unit is used to read and write data between memory and each cache.
  • The caches used in the above instruction cache unit, input neuron cache unit, weight cache unit, and output neuron cache unit may be RAM.
  • The operation unit mainly performs a three-stage operation.
  • The first stage multiplies the input neurons by the weight data, expressed as follows: an input neuron (in) is transformed by an operation (f) into an output neuron (out), out = f(in); or an input neuron (in) is multiplied by a weight (w) to obtain a weighted output neuron (out), out = w * in.
  • The second stage performs the adder-tree operation, summing the weighted output neurons from the first stage level by level through an adder tree, or adding a bias to the output neurons to obtain biased output neurons, expressed as follows: an output neuron (in) is added to a bias (b) to obtain a biased output neuron (out), out = in + b.
  • The third stage applies the activation function to obtain the final output neurons, expressed as follows: an output neuron (in) passes through an activation function (active) to obtain the activated output neuron (out), out = active(in).
  • Likewise, a weighted output neuron (in) passes through an activation function (active) to obtain the activated output neuron (out).
  • And a biased output neuron (in) passes through an activation function (active) to obtain the activated output neuron (out).
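The level-by-level summation performed by the adder tree in the second stage can be mimicked in a few lines of Python; this is a behavioural sketch of the reduction order only, not of the hardware itself:

```python
def adder_tree(values):
    """Reduce a vector the way a hardware adder tree does: pairwise, level by level."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:                 # an odd element passes through one level
            level.append(0.0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

# Summing five weighted output neurons takes ceil(log2(5)) = 3 adder levels.
print(adder_tree([1.0, 2.0, 3.0, 4.0, 5.0]))  # 15.0
```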
  • The artificial neural network compression coding apparatus includes devices capable of performing convolutional artificial neural network operations, pooling artificial neural network operations, LRN artificial neural network operations, and fully connected artificial neural network operations.
  • Figure 3 shows a schematic block diagram of the overall structure in accordance with an embodiment of the present invention.
  • The I/O interface (1) handles I/O data, which must be sent to the accelerator chip via the CPU (3) and is then written to memory by the accelerator chip (4); the dedicated programs required by the accelerator chip (4) are likewise transferred to it by the CPU (3).
  • The memory (2) temporarily stores the multi-layer artificial neural network model and neuron data, particularly when the whole model cannot fit in the accelerator's on-chip cache.
  • The central processing unit CPU (3) performs basic control such as data transfer and starting and stopping the accelerator chip (4), serving as the interface between the accelerator chip (4) and external control.
  • The accelerator serves as a coprocessor of the CPU or GPU to perform multi-layer artificial neural network operations and back-propagation training algorithms.
  • In a system structure with multiple interconnected accelerators, the accelerators can be interconnected via a PCIE bus to support larger-scale multi-layer artificial neural network operations; they can share the same host CPU or each have its own host CPU, and they can share memory or each accelerator can have its own memory.
  • The interconnection can follow any interconnect topology.
  • FIG. 4 shows a schematic block diagram of an accelerator chip structure in accordance with an embodiment of the present invention.
  • The input neuron cache (2) stores the input neurons of a layer's operation.
  • The output neuron cache (5) stores the output neurons of a layer's operation.
  • The weight cache (8) stores model (weight) data.
  • The direct memory access channel (7) reads and writes data between the memory (6) and each RAM.
  • The instruction cache (1) stores dedicated instructions.
  • The controller unit (3) reads dedicated instructions from the instruction cache and decodes them into micro-instructions for each operation unit.
  • The operation unit (4) performs the specific computations.
  • The operation unit is divided mainly into three stages; the first stage performs multiplication, multiplying the input neurons by the weight data.
  • The second stage performs the adder-tree operation; the first and second stages together complete a vector inner-product operation.
  • The third stage performs the activation function, which may be a sigmoid function, a tanh function, or the like.
  • The third stage produces the output neurons, which are written back to the output neuron cache.
  • FIG. 5 is a flowchart of a multi-layer artificial neural network operation according to an embodiment of the present invention; FIG. 1 shows an example of the operations performed.
  • Step 1: The controller reads a multi-layer artificial neural network operation SIMD instruction from the instruction cache, such as a multi-layer perceptron (MLP) instruction, a convolution instruction, or a pooling (POOLing) instruction that executes a dedicated neural network algorithm, or a general vector/matrix instruction used to carry out neural network operations, such as a matrix-multiply, vector-add, or vector activation function instruction.
  • Step 2: The controller decodes the instruction into micro-instructions for each functional component, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs.
  • Step 3: The operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication, sending the result to the next stage (if no multiplication is performed, the operation unit forwards the input directly to the next stage).
  • Step 4: The operation unit obtains the output of stage one and, according to the opcode of stage two, decides whether to perform the adder-tree operation, sending the result to the next stage (if no adder-tree operation is performed, the input is forwarded directly to the next stage).
  • Step 5: The operation unit obtains the output of stage two and, according to the opcode of stage three, decides whether to perform the activation function, sending the result to the output neuron cache.
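Steps 3 through 5 describe a pipeline in which each stage either computes or simply forwards its input, gated by a per-stage opcode. The sketch below mirrors that control flow in Python; the opcode names and their string encoding are hypothetical, chosen purely to illustrate the pass-through behaviour.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_pipeline(x, w, b, opcodes):
    """Each stage checks its opcode: compute and forward, or pass input through."""
    if opcodes[0] == "mul":          # stage 1: vector multiply or pass-through
        x = w * x
    if opcodes[1] == "addtree":      # stage 2: adder-tree reduction plus bias
        x = np.sum(x) + b
    if opcodes[2] == "act":          # stage 3: activation, result to output cache
        x = sigmoid(x)
    return x

out = run_pipeline(np.array([0.5, -1.0, 2.0]),
                   np.array([0.1, 0.4, -0.2]), 0.05,
                   opcodes=("mul", "addtree", "act"))
print(out)  # one activated output neuron: sigmoid(-0.75 + 0.05) ~ 0.332
```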
  • FIG. 6 is a flowchart of a multi-layer artificial neural network back-propagation training algorithm according to an embodiment of the present invention; FIG. 2 shows an example of the operations performed.
  • Step 1: The controller reads a multi-layer artificial neural network training algorithm SIMD instruction from the instruction cache.
  • Step 2: The controller decodes the instruction into micro-instructions for each operation unit, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs.
  • Step 3: The operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication, sending the result to the next stage (if no multiplication is performed, the operation unit forwards the input directly to the next stage).
  • Step 4: The operation unit obtains the output of stage one and, according to the opcode of stage two, decides whether to perform an adder-tree, vector-addition, or weight-update operation.
  • FIG. 7 is a flowchart of a multi-layer artificial neural network weight compression algorithm according to an embodiment of the present invention.
  • Step 1: Convert the stored floating-point weights W1 into a representation W2 with fewer bits.
  • Step 2: Perform the forward operation using W2.
  • Step 3: Perform the backward operation to update W1 according to the error obtained from the forward operation.
  • These three steps are iterated until the required model is obtained.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Neurology (AREA)
  • Nonlinear Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Image Analysis (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

An artificial neural network compression coding apparatus and method. The artificial neural network compression coding apparatus comprises a memory interface unit (6), an instruction cache (1), a controller unit (3), and an operation unit (4), the operation unit (4) being used to perform the corresponding operations on data from the memory interface unit (6) according to the instructions of the controller unit (3). The operation unit (4) mainly performs a three-stage operation: the first stage multiplies the input neurons by the weight data; the second stage performs an adder-tree operation, summing the weighted output neurons from the first stage level by level through an adder tree, or adding a bias to the output neurons to obtain biased output neurons; the third stage applies an activation function to obtain the final output neurons. The apparatus and method not only effectively reduce the model size of an artificial neural network and increase its data processing speed, but also effectively reduce power consumption and improve resource utilization.

Description

Artificial neural network compression coding apparatus and method

Technical Field

The present invention relates to the field of artificial neural network processing technology, and more particularly to an artificial neural network compression coding apparatus and method; in particular, it relates to execution units, or devices comprising such execution units, that carry out artificial neural network algorithms, and to execution units, devices, and methods for multi-layer artificial neural network operations, back-propagation training algorithms, and their compression coding.
Background

Multi-layer artificial neural networks are widely used in fields such as pattern recognition, image processing, function approximation, and optimization. In recent years especially, owing to continued research into back-propagation training and pre-training algorithms, multi-layer artificial neural networks have attracted increasingly broad attention from academia and industry for their high recognition accuracy and good parallelizability.

As the computation and memory-access demands of artificial neural networks grow rapidly, the prior art commonly uses general-purpose processors to handle multi-layer artificial neural network operations, training algorithms, and their compression coding, supporting these algorithms by executing general-purpose instructions through general register files and general functional units. One drawback of this approach is that the performance of a single general-purpose processor is low and cannot meet the performance requirements of typical multi-layer artificial neural network operations; when multiple general-purpose processors execute in parallel, the communication between them becomes a performance bottleneck. Moreover, a general-purpose processor must decode multi-layer artificial neural network operations into a long sequence of arithmetic and memory-access instructions, and this front-end decoding incurs a large power overhead. Another known way to support multi-layer artificial neural network operations, training algorithms, and their compression coding is to use a graphics processing unit (GPU), which executes general-purpose SIMD instructions through a general register file and general-purpose stream processing units. Because the GPU is a device dedicated to graphics, image, and scientific computation, with no dedicated support for multi-layer artificial neural network operations, a large amount of front-end decoding work is still needed to perform them, which brings considerable extra overhead. In addition, the GPU has only a small on-chip cache, so the model data (weights) of a multi-layer artificial neural network must be repeatedly transferred from off-chip; off-chip bandwidth becomes the main performance bottleneck while also incurring a huge power cost.
Summary of the Invention

In view of the above shortcomings of the prior art, an object of the present invention is to provide an artificial neural network compression coding apparatus and method.

To achieve this object, as one aspect of the invention, the present invention provides an artificial neural network compression coding apparatus, comprising:

a memory interface unit for inputting data and instructions;

an instruction cache for caching instructions;

a controller unit for reading instructions from the instruction cache and decoding them into instructions for the operation unit; and

an operation unit for performing the corresponding operations on data from the memory interface unit according to the instructions of the controller unit; the operation unit mainly performs a three-stage operation: the first stage multiplies the input neurons by the weight data; the second stage performs an adder-tree operation, summing the weighted output neurons from the first stage level by level through an adder tree, or adding a bias to the output neurons to obtain biased output neurons; the third stage applies an activation function to obtain the final output neurons.
In the operation unit, the input weight data are represented as follows:

in the forward operation, the weight data W1 is represented by k bits of data, comprising a 1-bit sign bit, m control bits, and n data bits, i.e. k = 1 + m + n, where k, m, and n are natural numbers;

in the backward operation, the weight data W2 is represented as a floating-point number.

W1 is converted from W2: the weight data W2 of the backward operation has len significant bits, and n in the weight data W1 of the forward operation denotes the number of significant bits in W1, with n ≤ len; the control bits m specify the starting position of W1's significant bits within the original weight data.
As another aspect of the invention, the present invention further provides an artificial neural network compression coding method performed by the above artificial neural network computing device for compression coding, comprising the following steps:

Step 1: the controller unit reads a multi-layer artificial neural network operation SIMD instruction from the instruction cache;

Step 2: the controller unit decodes the instruction into micro-instructions for each functional component, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs;

Step 3: the operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;

Step 4: the operation unit, according to the opcode of stage two, decides whether to perform the adder-tree operation; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;

Step 5: the operation unit, according to the opcode of stage three, decides whether to perform the activation function; if so, it sends the result to the output neuron cache.
As a further aspect of the invention, the present invention also provides a method of artificial neural network back-propagation training performed by the above artificial neural network computing device for compression coding, comprising the following steps:

Step 1: the controller unit reads a multi-layer artificial neural network training algorithm SIMD instruction from the instruction cache;

Step 2: the controller unit decodes the instruction into micro-instructions for each operation unit, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs;

Step 3: the operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;

Step 4: the operation unit, according to the opcode of stage two, decides whether to perform an adder-tree operation, a vector addition, or a weight-update operation.
As yet another aspect of the invention, the present invention also provides an artificial neural network calculation method performed by the above artificial neural network computing device for compression coding, comprising the following steps:

Step 1: convert the stored floating-point weights W1 into a representation W2 with fewer bits;

Step 2: perform the forward operation using W2;

Step 3: perform the backward operation to update W1 according to the error obtained from the forward operation;

Step 4: iterate the above three steps until the required model is obtained.

Based on the above technical solutions, the present invention provides an apparatus and method for performing multi-layer artificial neural network operations, back-propagation training algorithms, and their compression coding, which improves the performance of multi-layer artificial neural network operations and back-propagation training while reducing the number of bits used to represent the weight data, thereby achieving compression coding. The invention not only effectively reduces the model size of an artificial neural network and increases its data processing speed, but also effectively reduces power consumption and improves resource utilization; compared with the prior art, it offers significant performance and power improvements and can greatly compress the weight data.
Brief Description of the Drawings

FIG. 1 is an example block diagram of a neural network forward operation according to an embodiment of the present invention;

FIG. 2 is an example block diagram of a neural network backward training algorithm according to an embodiment of the present invention;

FIG. 3 is an example block diagram of the overall structure according to an embodiment of the present invention;

FIG. 4 is an example block diagram of the accelerator chip structure according to an embodiment of the present invention;

FIG. 5 is a flowchart of a multi-layer artificial neural network operation according to an embodiment of the present invention;

FIG. 6 is a flowchart of a multi-layer artificial neural network back-propagation training algorithm according to an embodiment of the present invention;

FIG. 7 is a flowchart of a multi-layer artificial neural network compression coding algorithm according to an embodiment of the present invention.
Detailed Description

Embodiments and aspects of the present invention are described with reference to the details discussed below, and the accompanying drawings illustrate the embodiments. The following description and drawings are illustrative of the invention and are not to be construed as limiting it. Numerous specific details are described to provide a thorough understanding of the embodiments. In certain instances, however, well-known or commonplace details are not described, in order to keep the discussion of the embodiments concise.

Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in this specification do not necessarily all refer to the same embodiment.
The multi-layer artificial neural network operations and compression coding to which the invention relates involve two or more layers of multiple neurons. For each layer, the input neuron vector is first dot-multiplied with the weight vector, and the result passes through an activation function to yield the output neurons. The activation function may be a sigmoid, tanh, relu, or softmax function, among others. In addition, the training algorithm of a multi-layer artificial neural network comprises steps of forward propagation, back propagation, and weight update.

During the forward operation, the neural network weight W1 is represented by k bits of data, comprising a 1-bit sign bit, m control bits, and n data bits (k = 1 + m + n).

The weight W2 in the backward operation is represented as a floating-point number.

The weight representation used in the forward operation is converted from the floating-point weight representation used in the backward operation (with len significant bits); the data bits store n significant bits of the original weight data, with n less than or equal to len. The control bits specify the starting position of the data bits within the original significant bits. Because the total number of data bits used by the forward weight representation is smaller than that of the weight representation used in backward training, compression coding is achieved.
The artificial neural network compression coding apparatus of the present invention for performing artificial neural network operations, back-propagation training algorithms, and their compression coding comprises an instruction cache unit, an input neuron cache unit, a weight cache unit, a control unit, an operation unit, an output neuron cache unit, a memory interface unit, and a direct memory access channel unit, wherein:

the instruction cache unit caches instructions;

the input neuron cache unit caches input neuron data;

the weight cache unit caches weights;

the control unit reads dedicated instructions from the instruction cache and decodes them into micro-instructions;

the operation unit receives the micro-instructions issued by the controller unit and performs arithmetic-logic operations;

the output neuron cache unit caches output neuron data;

the memory interface unit serves as the data read/write channel;

the direct memory access channel unit reads and writes data between the memory and each cache.

The caches used in the above instruction cache unit, input neuron cache unit, weight cache unit, and output neuron cache unit may be RAM.
The operation unit mainly performs a three-stage operation. The first stage multiplies the input neurons by the weight data, expressed as follows:

an input neuron (in) is transformed by an operation (f) into an output neuron (out): out = f(in);

an input neuron (in) is multiplied by a weight (w) to obtain a weighted output neuron (out): out = w * in.

The second stage performs the adder-tree operation, summing the weighted output neurons from the first stage level by level through an adder tree, or adding a bias to the output neurons to obtain biased output neurons, expressed as follows:

an output neuron (in) is added to a bias (b) to obtain a biased output neuron (out): out = in + b.

The third stage applies the activation function to obtain the final output neurons, expressed as follows:

an output neuron (in) passes through the activation function (active) to yield the activated output neuron (out): out = active(in), where the activation function active may be a sigmoid, tanh, relu, or softmax function, among others;

a weighted output neuron (in) passes through the activation function (active) to yield the activated output neuron (out): out = active(in);

a biased output neuron (in) passes through the activation function (active) to yield the activated output neuron (out): out = active(in).
When performing the artificial neural network backward training operation, the computations are expressed as follows:

the input gradient (g) passes through a weighting operation (f) to yield the output gradient (g'): g' = f(g);

the input gradient (g) passes through an update operation (update) to yield the weight change (deltaW): deltaW = update(g);

the input weight (w) is added to the weight change (deltaW) to yield the output weight (w'): w' = w + deltaW;

the input gradient (g) passes through an update operation (update) to yield the bias change (deltaB): deltaB = update(g);

the input bias (b) is added to the bias change (deltaB) to yield the output bias (b'): b' = b + deltaB.

When performing the artificial neural network backward training operation, all data are represented as floating-point numbers.
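A minimal NumPy rendering of these backward-training equations follows. The patent leaves the weighting operation f and the update rule abstract; here f is taken to be multiplication by the transposed weights and update to be a plain learning-rate-scaled gradient step, so the snippet should be read as a sketch of the dataflow rather than as the mandated arithmetic.

```python
import numpy as np

LR = 0.01  # assumed learning rate inside the abstract 'update' operation

def backward_step(g, w, b, x):
    """One floating-point backward-training step, following the listed equations."""
    g_out = g @ w.T                   # g' = f(g): gradient weighted back one layer
    delta_w = -LR * np.outer(x, g)    # deltaW = update(g), here scaled by the input
    w_new = w + delta_w               # w' = w + deltaW
    delta_b = -LR * g                 # deltaB = update(g)
    b_new = b + delta_b               # b' = b + deltaB
    return g_out, w_new, b_new

g = np.array([0.2, -0.1])             # incoming gradient of a 2-neuron layer
w = np.ones((3, 2)); b = np.zeros(2)  # weights and bias of the layer
x = np.array([1.0, 0.5, -0.5])        # the layer's forward input
g_prev, w_new, b_new = backward_step(g, w, b, x)
print(g_prev, w_new, b_new, sep="\n")
```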
The artificial neural network compression coding apparatus includes devices capable of performing convolutional artificial neural network operations, pooling artificial neural network operations, LRN artificial neural network operations, and fully connected artificial neural network operations.

The technical solutions of the present invention are further explained below with reference to the drawings and specific embodiments.
FIG. 3 shows a schematic block diagram of the overall structure according to an embodiment of the present invention.

The I/O interface (1) handles I/O data, which must be sent to the accelerator chip via the CPU (3) and is then written to memory by the accelerator chip (4); the dedicated programs required by the accelerator chip (4) are likewise transferred to it by the CPU (3).

The memory (2) temporarily stores the multi-layer artificial neural network model and neuron data, particularly when the whole model cannot fit in the accelerator's on-chip cache.

The central processing unit CPU (3) performs basic control such as data transfer and starting and stopping the accelerator chip (4), serving as the interface between the accelerator chip (4) and external control.

The accelerator chip (4) executes the multi-layer artificial neural network operation and back-propagation training algorithm module: it receives data and programs from the CPU (3), executes the multi-layer artificial neural network operation and its back-propagation training algorithm, and transfers the execution results back to the CPU (3).

General system architecture: the accelerator serves as a coprocessor of a CPU or GPU to execute multi-layer artificial neural network operations and back-propagation training algorithms.

System structure with multiple interconnected accelerators: multiple accelerators can be interconnected via a PCIE bus to support larger-scale multi-layer artificial neural network operations; they can share the same host CPU or each have its own host CPU, and they can share memory or each accelerator can have its own memory. Moreover, the interconnection can follow any interconnect topology.
FIG. 4 shows a schematic block diagram of the accelerator chip structure according to an embodiment of the present invention.

The input neuron cache (2) stores the input neurons of a layer's operation.

The output neuron cache (5) stores the output neurons of a layer's operation.

The weight cache (8) stores model (weight) data.

The direct memory access channel (7) reads and writes data between the memory (6) and each RAM.

The instruction cache (1) stores dedicated instructions.

The controller unit (3) reads dedicated instructions from the instruction cache and decodes them into micro-instructions for each operation unit.

The operation unit (4) performs the specific computations. The operation unit is divided mainly into three stages: the first stage performs multiplication, multiplying the input neurons by the weight data; the second stage performs the adder-tree operation, the first two stages together completing a vector inner product; the third stage performs the activation function, which may be a sigmoid function, a tanh function, or the like. The third stage produces the output neurons, which are written back to the output neuron cache.
FIG. 5 is a flowchart of a multi-layer artificial neural network operation according to an embodiment of the present invention; FIG. 1 shows an example of the operations performed.

Step 1: the controller reads a multi-layer artificial neural network operation SIMD instruction from the instruction cache, for example a multi-layer perceptron (MLP) instruction, a convolution instruction, a pooling (POOLing) instruction, or another instruction executing a dedicated neural network algorithm, or a general vector/matrix instruction used to carry out neural network operations, such as a matrix-multiply, vector-add, or vector activation function instruction.

Step 2: the controller decodes the instruction into micro-instructions for each functional component, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs.

Step 3: the operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication, sending the result to the next stage (if no multiplication is performed, the input is forwarded directly to the next stage).

Step 4: the operation unit obtains the output of stage one and, according to the opcode of stage two, decides whether to perform the adder-tree operation, sending the result to the next stage (if no adder-tree operation is performed, the input is forwarded directly to the next stage).

Step 5: the operation unit obtains the output of stage two and, according to the opcode of stage three, decides whether to perform the activation function, sending the result to the output neuron cache.
FIG. 6 is a flowchart of a multi-layer artificial neural network back-propagation training algorithm according to an embodiment of the present invention; FIG. 2 shows an example of the operations performed.

Step 1: the controller reads a multi-layer artificial neural network training algorithm SIMD instruction from the instruction cache.

Step 2: the controller decodes the instruction into micro-instructions for each operation unit, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs.

Step 3: the operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication, sending the result to the next stage (if no multiplication is performed, the input is forwarded directly to the next stage).

Step 4: the operation unit obtains the output of stage one and, according to the opcode of stage two, decides whether to perform an adder-tree, vector-addition, or weight-update operation.
FIG. 7 is a flowchart of a multi-layer artificial neural network weight compression algorithm according to an embodiment of the present invention.

Step 1: convert the stored floating-point weights W1 into a representation W2 with fewer bits.

Step 2: perform the forward operation using W2.

Step 3: perform the backward operation to update W1 according to the error obtained from the forward operation.

The above three steps are iterated until the required model is obtained.
The processes or methods depicted in the preceding figures may be performed by processing logic comprising hardware (e.g., circuitry, dedicated logic, etc.), firmware, software (e.g., software embodied on a non-transitory computer-readable medium), or a combination thereof. Although the processes or methods are described above in terms of certain sequential operations, it should be understood that some of the described operations can be performed in a different order; moreover, some operations may be performed in parallel rather than sequentially.

The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the invention and are not intended to limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

  1. An artificial neural network compression coding apparatus, characterized in that it comprises:
    a memory interface unit for inputting data and instructions;
    an instruction cache for caching instructions;
    a controller unit for reading instructions from the instruction cache and decoding them into instructions for the operation unit; and
    an operation unit for performing the corresponding operations on data from the memory interface unit according to the instructions of the controller unit; the operation unit mainly performs a three-stage operation: the first stage multiplies the input neurons by the weight data; the second stage performs an adder-tree operation, summing the weighted output neurons from the first stage level by level through an adder tree, or adding a bias to the output neurons to obtain biased output neurons; the third stage applies an activation function to obtain the final output neurons.
  2. The artificial neural network computing device for compression coding according to claim 1, wherein the input weight data in the operation unit are represented as follows:
    in the forward operation, the weight data W1 is represented by k bits of data, comprising a 1-bit sign bit, m control bits, and n data bits, i.e. k = 1 + m + n, where k, m, and n are natural numbers;
    in the backward operation, the weight data W2 is represented as a floating-point number.
  3. The artificial neural network computing device for compression coding according to claim 2, wherein W1 is converted from W2: the weight data W2 of the backward operation has len significant bits, and n in the weight data W1 of the forward operation denotes the number of significant bits in W1, with n ≤ len; the control bits m specify the starting position of W1's significant bits within the original weight data.
  4. The artificial neural network computing device for compression coding according to claim 1, wherein the artificial neural network computing device further comprises a DMA for reading and writing data or instructions between the memory interface unit and the instruction cache and/or the operation unit.
  5. The artificial neural network computing device for compression coding according to claim 4, wherein the artificial neural network computing device further comprises:
    an input neuron cache for caching the input neuron data fed to the operation unit; and
    a weight cache for caching weight data,
    the input neuron cache and the weight cache both being fed through the DMA.
  6. The artificial neural network computing device for compression coding according to claim 4, wherein the artificial neural network computing device further comprises:
    an output neuron cache for caching the output neurons produced by the operation unit.
  7. The artificial neural network computing device for compression coding according to claim 1, wherein the activation function executed by the operation unit in the third stage is a sigmoid, tanh, relu, or softmax function.
  8. An artificial neural network compression coding method performed by the artificial neural network computing device for compression coding according to any one of claims 1 to 7, characterized in that it comprises the following steps:
    Step 1: the controller unit reads a multi-layer artificial neural network operation SIMD instruction from the instruction cache;
    Step 2: the controller unit decodes the instruction into micro-instructions for each functional component, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs;
    Step 3: the operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;
    Step 4: the operation unit, according to the opcode of stage two, decides whether to perform the adder-tree operation; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;
    Step 5: the operation unit, according to the opcode of stage three, decides whether to perform the activation function; if so, it sends the result to the output neuron cache.
  9. A method of artificial neural network back-propagation training performed by the artificial neural network computing device for compression coding according to any one of claims 1 to 7, characterized in that it comprises the following steps:
    Step 1: the controller unit reads a multi-layer artificial neural network training algorithm SIMD instruction from the instruction cache;
    Step 2: the controller unit decodes the instruction into micro-instructions for each operation unit, specifying whether the input neuron cache reads or writes and how many times, whether the output neuron cache reads or writes and how many times, whether the weight cache reads or writes and how many times, and which operation each stage of the operation unit performs;
    Step 3: the operation unit obtains input vectors from the input neuron cache and the weight cache and, according to the opcode of stage one, decides whether to perform the vector multiplication; if so, it sends the result to the next stage, otherwise it forwards the input directly to the next stage;
    Step 4: the operation unit, according to the opcode of stage two, decides whether to perform an adder-tree operation, a vector addition, or a weight-update operation.
  10. An artificial neural network calculation method performed by the artificial neural network computing device for compression coding according to any one of claims 1 to 7, characterized in that it comprises the following steps:
    Step 1: convert the stored floating-point weights W1 into a representation W2 with fewer bits;
    Step 2: perform the forward operation using W2;
    Step 3: perform the backward operation to update W1 according to the error obtained from the forward operation;
    the above three steps being iterated until the required model is obtained.
PCT/CN2016/078448 2016-01-20 2016-04-05 Artificial neural network compression coding apparatus and method WO2017124644A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/041,160 US10402725B2 (en) 2016-01-20 2018-07-20 Apparatus and method for compression coding for artificial neural network
US16/508,139 US10726336B2 (en) 2016-01-20 2019-07-10 Apparatus and method for compression coding for artificial neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610039026.6 2016-01-20
CN201610039026.6A CN106991477B (zh) 2016-01-20 2016-01-20 Artificial neural network compression coding apparatus and method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/041,160 Continuation-In-Part US10402725B2 (en) 2016-01-20 2018-07-20 Apparatus and method for compression coding for artificial neural network

Publications (1)

Publication Number Publication Date
WO2017124644A1 true WO2017124644A1 (zh) 2017-07-27

Family

ID=59361365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/078448 WO2017124644A1 (zh) 2016-04-05 Artificial neural network compression coding apparatus and method

Country Status (3)

Country Link
US (2) US10402725B2 (zh)
CN (2) CN106991477B (zh)
WO (1) WO2017124644A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046703A (zh) * 2019-03-07 2019-07-23 中国科学院计算技术研究所 一种用于神经网络的片上存储处理系统
WO2019141902A1 (en) * 2018-01-17 2019-07-25 Nokia Technologies Oy An apparatus, a method and a computer program for running a neural network
WO2020058800A1 (en) * 2018-09-19 2020-03-26 International Business Machines Corporation Encoder-decoder memory-augmented neural network architectures
CN113783933A (zh) * 2021-08-10 2021-12-10 中山大学 基于编码缓存的双层网络通信方法、装置及介质
EP3940606A1 (en) * 2016-01-20 2022-01-19 Cambricon Technologies Corporation Limited Device and method for executing reversal training of artificial neural network
TWI767098B (zh) * 2017-12-14 2022-06-11 大陸商中科寒武紀科技股份有限公司 神經網絡正向運算方法及相關產品
US11687759B2 (en) 2018-05-01 2023-06-27 Semiconductor Components Industries, Llc Neural network accelerator
TWI812117B (zh) * 2021-08-27 2023-08-11 台灣積體電路製造股份有限公司 用於記憶體內計算(cim)的記憶體元件及方法

Families Citing this family (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106990937B (zh) * 2016-01-20 2020-10-20 中科寒武纪科技股份有限公司 一种浮点数处理装置和处理方法
CN108271026B (zh) * 2016-12-30 2020-03-31 上海寒武纪信息科技有限公司 压缩/解压缩的装置和系统、芯片、电子装置、方法
US11037330B2 (en) * 2017-04-08 2021-06-15 Intel Corporation Low rank matrix compression
US10817296B2 (en) * 2017-04-21 2020-10-27 Intel Corporation Message based general register file assembly
US11580361B2 (en) * 2017-04-24 2023-02-14 Intel Corporation Neural network training mechanism
EP3637325A4 (en) 2017-05-23 2020-05-27 Shanghai Cambricon Information Technology Co., Ltd TREATMENT METHOD AND ACCELERATION DEVICE
CN109389209B (zh) * 2017-08-09 2022-03-15 上海寒武纪信息科技有限公司 处理装置及处理方法
WO2019007406A1 (zh) 2017-07-05 2019-01-10 上海寒武纪信息科技有限公司 一种数据处理装置和方法
CN109583577B (zh) 2017-09-29 2021-04-23 上海寒武纪信息科技有限公司 运算装置及方法
US20200110635A1 (en) 2017-07-05 2020-04-09 Shanghai Cambricon Information Technology Co., Ltd. Data processing apparatus and method
CN107578014B (zh) 2017-09-06 2020-11-03 上海寒武纪信息科技有限公司 信息处理装置及方法
KR102601604B1 (ko) * 2017-08-04 2023-11-13 삼성전자주식회사 뉴럴 네트워크의 파라미터들을 양자화하는 방법 및 장치
CN108205704B (zh) * 2017-09-27 2021-10-29 深圳市商汤科技有限公司 一种神经网络芯片
CN107748914A (zh) * 2017-10-19 2018-03-02 珠海格力电器股份有限公司 人工神经网络运算电路
CN109697507B (zh) * 2017-10-24 2020-12-25 安徽寒武纪信息科技有限公司 处理方法及装置
CN107992486A (zh) * 2017-10-30 2018-05-04 上海寒武纪信息科技有限公司 一种信息处理方法及相关产品
CN109754062B (zh) * 2017-11-07 2024-05-14 上海寒武纪信息科技有限公司 卷积扩展指令的执行方法以及相关产品
CN109977446B (zh) * 2017-12-28 2020-07-07 中科寒武纪科技股份有限公司 集成电路芯片装置及相关产品
CN109993276B (zh) * 2017-12-29 2021-10-26 中科寒武纪科技股份有限公司 用于执行人工神经网络反向训练的装置和方法
CN108197705A (zh) * 2017-12-29 2018-06-22 国民技术股份有限公司 卷积神经网络硬件加速装置及卷积计算方法及存储介质
CN113807510B (zh) * 2017-12-30 2024-05-10 中科寒武纪科技股份有限公司 集成电路芯片装置及相关产品
EP3624019A4 (en) 2017-12-30 2021-03-24 Cambricon Technologies Corporation Limited CHIP DEVICE WITH INTEGRATED CIRCUIT AND ASSOCIATED PRODUCT
CN109993290B (zh) 2017-12-30 2021-08-06 中科寒武纪科技股份有限公司 集成电路芯片装置及相关产品
CN109993292B (zh) 2017-12-30 2020-08-04 中科寒武纪科技股份有限公司 集成电路芯片装置及相关产品
WO2019165939A1 (zh) * 2018-02-27 2019-09-06 上海寒武纪信息科技有限公司 一种计算装置及相关产品
CN110196735A (zh) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 一种计算装置及相关产品
CN110196734A (zh) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 一种计算装置及相关产品
CN110363291B (zh) * 2018-03-26 2022-02-08 上海寒武纪信息科技有限公司 神经网络的运算方法、装置、计算机设备和存储介质
CN108764454B (zh) * 2018-04-28 2022-02-25 中国科学院计算技术研究所 基于小波变换压缩和/或解压缩的神经网络处理方法
CN108647660A (zh) * 2018-05-16 2018-10-12 中国科学院计算技术研究所 一种使用神经网络芯片处理图像的方法
CN108566537A (zh) * 2018-05-16 2018-09-21 中国科学院计算技术研究所 用于对视频帧进行神经网络运算的图像处理装置
CN108921012B (zh) * 2018-05-16 2022-05-03 中国科学院计算技术研究所 一种利用人工智能芯片处理图像视频帧的方法
CN110147872B (zh) * 2018-05-18 2020-07-17 中科寒武纪科技股份有限公司 编码存储装置及方法、处理器及训练方法
EP3796189A4 (en) 2018-05-18 2022-03-02 Cambricon Technologies Corporation Limited VIDEO RECOVERY METHOD, AND METHOD AND APPARATUS FOR GENERATING A VIDEO RECOVERY MAPPING RELATION
US20210098001A1 (en) 2018-09-13 2021-04-01 Shanghai Cambricon Information Technology Co., Ltd. Information processing method and terminal device
CN110956257A (zh) * 2018-09-26 2020-04-03 龙芯中科技术有限公司 神经网络加速器
CN111062469B (zh) * 2018-10-17 2024-03-05 上海寒武纪信息科技有限公司 计算装置及相关产品
WO2020062284A1 (zh) * 2018-09-30 2020-04-02 深圳市大疆创新科技有限公司 基于卷积神经网络的图像处理方法和设备,以及无人机
CN111045726B (zh) * 2018-10-12 2022-04-15 上海寒武纪信息科技有限公司 支持编码、解码的深度学习处理装置及方法
CN111222632B (zh) * 2018-11-27 2023-06-30 中科寒武纪科技股份有限公司 计算装置、计算方法及相关产品
CN111079908B (zh) * 2018-10-18 2024-02-13 上海寒武纪信息科技有限公司 片上网络数据处理方法、存储介质、计算机设备和装置
JP7060720B2 (ja) 2018-10-18 2022-04-26 シャンハイ カンブリコン インフォメーション テクノロジー カンパニー リミテッド ネットワークオンチップによるデータ処理方法及び装置
CN109657788A (zh) * 2018-12-18 2019-04-19 北京中科寒武纪科技有限公司 数据处理方法、装置及相关产品
CN111353598A (zh) * 2018-12-20 2020-06-30 中科寒武纪科技股份有限公司 一种神经网络压缩方法、电子设备及计算机可读介质
CN111367567B (zh) * 2018-12-25 2023-03-07 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368986B (zh) * 2018-12-25 2023-03-10 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368967B (zh) * 2018-12-25 2023-04-07 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368985B (zh) * 2018-12-25 2023-11-28 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111368990B (zh) * 2018-12-25 2023-03-07 上海寒武纪信息科技有限公司 一种神经网络计算装置和方法
CN111382835A (zh) * 2018-12-27 2020-07-07 中科寒武纪科技股份有限公司 一种神经网络压缩方法、电子设备及计算机可读介质
CN110225346A (zh) * 2018-12-28 2019-09-10 杭州海康威视数字技术股份有限公司 一种编解码方法及其设备
CN111488976B (zh) * 2019-01-28 2023-06-30 中科寒武纪科技股份有限公司 神经网络计算装置、神经网络计算方法及相关产品
CN109871941B (zh) * 2019-02-18 2020-02-21 中科寒武纪科技股份有限公司 数据处理方法、装置及相关产品
CN111738429B (zh) * 2019-03-25 2023-10-13 中科寒武纪科技股份有限公司 一种计算装置及相关产品
CN109978143B (zh) * 2019-03-29 2023-07-18 南京大学 一种基于simd架构的堆栈式自编码器及编码方法
CN111915003B (zh) * 2019-05-09 2024-03-22 深圳大普微电子科技有限公司 一种神经网络硬件加速器
KR20200129957A (ko) 2019-05-10 2020-11-18 삼성전자주식회사 피처맵 데이터에 대한 압축을 수행하는 뉴럴 네트워크 프로세서 및 이를 포함하는 컴퓨팅 시스템
CN111930681B (zh) * 2019-05-13 2023-10-10 中科寒武纪科技股份有限公司 一种计算装置及相关产品
EP3998554A4 (en) * 2019-06-12 2023-11-15 Shanghai Cambricon Information Technology Co., Ltd METHOD FOR DETERMINING QUANTIZATION PARAMETERS IN A NEURONAL NETWORK AND ASSOCIATED PRODUCTS
CN110598858A (zh) * 2019-08-02 2019-12-20 北京航空航天大学 基于非易失性存内计算实现二值神经网络的芯片和方法
US11526761B2 (en) * 2019-08-24 2022-12-13 Microsoft Technology Licensing, Llc Neural network training with decreased memory consumption and processor utilization
CN112712172B (zh) * 2019-10-25 2023-12-26 安徽寒武纪信息科技有限公司 用于神经网络运算的计算装置、方法、集成电路和设备
CN111008699B (zh) * 2019-12-05 2022-06-07 首都师范大学 一种基于自动驾驶的神经网络数据存储方法及系统
CN111045687B (zh) * 2019-12-06 2022-04-22 浪潮(北京)电子信息产业有限公司 一种人工智能应用的部署方法及相关装置
US11144208B2 (en) * 2019-12-23 2021-10-12 Advanced Micro Devices, Inc. Data compression system using base values and methods thereof
CN111079930B (zh) * 2019-12-23 2023-12-19 深圳市商汤科技有限公司 数据集质量参数的确定方法、装置及电子设备
CN111240743B (zh) * 2020-01-03 2022-06-03 格兰菲智能科技有限公司 人工智能集成电路
CN111199283A (zh) * 2020-02-24 2020-05-26 张早 一种基于卷积循环神经网络的气温预测系统及方法
US10938411B1 (en) * 2020-03-25 2021-03-02 Arm Limited Compression and/or decompression of activation data
US11947828B2 (en) * 2020-08-27 2024-04-02 Taiwan Semiconductor Manufacturing Company, Ltd. Memory device
CN112865804B (zh) * 2021-01-12 2023-10-10 东南大学 三值神经网络稀疏性权重的压缩计算单元
US11734075B2 (en) * 2021-11-24 2023-08-22 International Business Machines Corporation Reducing data format conversion of an accelerator
CN114819122B (zh) * 2022-03-28 2022-12-06 中国科学院自动化研究所 基于脉冲神经网络的数据处理方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5204938A (en) * 1989-05-30 1993-04-20 Loral Aerospace Corp. Method of implementing a neural network on a digital computer
CN104200224A (zh) * 2014-08-28 2014-12-10 西北工业大学 基于深度卷积神经网络的无价值图像去除方法
CN105095833A (zh) * 2014-05-08 2015-11-25 中国科学院声学研究所 用于人脸识别的网络构建方法、识别方法及系统
CN105184366A (zh) * 2015-09-15 2015-12-23 中国科学院计算技术研究所 一种时分复用的通用神经网络处理器

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1331092C (zh) * 2004-05-17 2007-08-08 中国科学院半导体研究所 模式识别专用神经网络计算机系统
JP2009516246A (ja) * 2005-11-15 2009-04-16 ベルナデット ガーナー ニューラルネットワークのトレーニング方法
CN101527010B (zh) * 2008-03-06 2011-12-07 上海理工大学 人工神经网络算法的硬件实现方法及其系统
US7864083B2 (en) * 2008-05-21 2011-01-04 Ocarina Networks, Inc. Efficient data compression and decompression of numeric sequences
CN102184453A (zh) * 2011-05-16 2011-09-14 上海电气集团股份有限公司 基于模糊神经网络和支持向量机的风电功率组合预测方法
CN103019656B (zh) * 2012-12-04 2016-04-27 中国科学院半导体研究所 可动态重构的多级并行单指令多数据阵列处理系统
CN103177288A (zh) * 2013-03-05 2013-06-26 辽宁省电力有限公司鞍山供电公司 基于遗传算法优化神经网络的变压器故障诊断方法
US10055434B2 (en) * 2013-10-16 2018-08-21 University Of Tennessee Research Foundation Method and apparatus for providing random selection and long-term potentiation and depression in an artificial network
US10373050B2 (en) * 2015-05-08 2019-08-06 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
CN104915195B (zh) * 2015-05-20 2017-11-28 清华大学 一种基于现场可编程门阵列实现神经网络计算的方法
CN104915560A (zh) * 2015-06-11 2015-09-16 万达信息股份有限公司 一种基于广义神经网络聚类的疾病病种诊疗方案预测方法
US20190073582A1 (en) * 2015-09-23 2019-03-07 Yi Yang Apparatus and method for local quantization for convolutional neural networks (cnns)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5204938A (en) * 1989-05-30 1993-04-20 Loral Aerospace Corp. Method of implementing a neural network on a digital computer
CN105095833A (zh) * 2014-05-08 2015-11-25 中国科学院声学研究所 用于人脸识别的网络构建方法、识别方法及系统
CN104200224A (zh) * 2014-08-28 2014-12-10 西北工业大学 基于深度卷积神经网络的无价值图像去除方法
CN105184366A (zh) * 2015-09-15 2015-12-23 中国科学院计算技术研究所 一种时分复用的通用神经网络处理器

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3940606A1 (en) * 2016-01-20 2022-01-19 Cambricon Technologies Corporation Limited Device and method for executing reversal training of artificial neural network
TWI767098B (zh) * 2017-12-14 2022-06-11 大陸商中科寒武紀科技股份有限公司 神經網絡正向運算方法及相關產品
WO2019141902A1 (en) * 2018-01-17 2019-07-25 Nokia Technologies Oy An apparatus, a method and a computer program for running a neural network
US11687759B2 (en) 2018-05-01 2023-06-27 Semiconductor Components Industries, Llc Neural network accelerator
WO2020058800A1 (en) * 2018-09-19 2020-03-26 International Business Machines Corporation Encoder-decoder memory-augmented neural network architectures
GB2593055A (en) * 2018-09-19 2021-09-15 Int Buisness Machines Corporation Encoder-decoder memory-augmented neural network architectures
GB2593055B (en) * 2018-09-19 2022-11-02 Ibm Encoder-decoder memory-augmented neural network architectures
CN110046703A (zh) * 2019-03-07 2019-07-23 中国科学院计算技术研究所 一种用于神经网络的片上存储处理系统
CN113783933A (zh) * 2021-08-10 2021-12-10 中山大学 基于编码缓存的双层网络通信方法、装置及介质
CN113783933B (zh) * 2021-08-10 2022-05-24 中山大学 基于编码缓存的双层网络通信方法、装置及介质
TWI812117B (zh) * 2021-08-27 2023-08-11 台灣積體電路製造股份有限公司 用於記憶體內計算(cim)的記憶體元件及方法

Also Published As

Publication number Publication date
US10402725B2 (en) 2019-09-03
CN106991477B (zh) 2020-08-14
US20190332945A1 (en) 2019-10-31
CN108427990B (zh) 2020-05-22
CN108427990A (zh) 2018-08-21
CN106991477A (zh) 2017-07-28
US20180330239A1 (en) 2018-11-15
US10726336B2 (en) 2020-07-28

Similar Documents

Publication Publication Date Title
WO2017124644A1 (zh) Artificial neural network compression coding apparatus and method
CN107729989B (zh) 一种用于执行人工神经网络正向运算的装置及方法
CN109522254B (zh) 运算装置及方法
US11568258B2 (en) Operation method
WO2019127838A1 (zh) 卷积神经网络实现方法及装置、终端、存储介质
CN107545303B (zh) 用于稀疏人工神经网络的计算装置和运算方法
WO2018192500A1 (zh) 处理装置和处理方法
TWI598831B (zh) 權重位移處理器、方法以及系統
WO2017185391A1 (zh) 一种用于执行卷积神经网络训练的装置和方法
WO2017124642A1 (zh) 用于执行人工神经网络正向运算的装置和方法
US20210089316A1 (en) Deep learning implementations using systolic arrays and fused operations
US20200097799A1 (en) Heterogeneous multiplier
WO2017124641A1 (zh) 用于执行人工神经网络反向训练的装置和方法
US10275247B2 (en) Apparatuses and methods to accelerate vector multiplication of vector elements having matching indices
WO2017124647A1 (zh) 一种矩阵计算装置
CN109062608B (zh) 用于独立数据上递归计算的向量化的读和写掩码更新指令
WO2021040921A1 (en) Systems and methods for providing vector-wise sparsity in a neural network
WO2018120016A1 (zh) 用于执行lstm神经网络运算的装置和运算方法
WO2017177442A1 (zh) 支持离散数据表示的人工神经网络正向运算装置和方法
WO2017185336A1 (zh) 用于执行pooling运算的装置和方法
JP7256811B2 (ja) アドバンストインタコネクト技術を利用してaiトレーニングを加速するための方法及びシステム
WO2018113790A1 (zh) 一种人工神经网络运算的装置及方法
CN111105023A (zh) 数据流重构方法及可重构数据流处理器
EP4292018A1 (en) Techniques for accelerating neural networks
WO2017177446A1 (zh) 支持离散数据表示的人工神经网络反向训练装置和方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16885908

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16885908

Country of ref document: EP

Kind code of ref document: A1