WO2018112892A1 - Device and method for supporting fast artificial neural network operation - Google Patents


Info

Publication number: WO2018112892A1 (application PCT/CN2016/111737)
Authority: WIPO (PCT)
Prior art keywords: input, neurons, neuron, output, unit
Application number: PCT/CN2016/111737
Other languages: French (fr), Chinese (zh)
Inventors: 刘少礼, 郝一帆, 陈云霁, 郭崎, 陈天石
Original Assignee: 北京中科寒武纪科技有限公司, 上海寒武纪信息科技有限公司
Application filed by 北京中科寒武纪科技有限公司 and 上海寒武纪信息科技有限公司
Priority to PCT/CN2016/111737
Publication of WO2018112892A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology

Definitions

  • The present invention relates to the field of data processing technologies, and more particularly to an apparatus and method for fast artificial neural network operations.
  • Artificial Neural Networks (ANNs), often simply called Neural Networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of biological neural networks and perform distributed, parallel information processing. Such a network relies on the complexity of the system and adjusts the interconnections among a large number of internal nodes in order to process information.
  • The core operation used by neural networks is vector multiplication, and sign functions and their various approximations are widely used.
  • Neural networks are widely used in a variety of application scenarios: computer vision, speech recognition, natural language processing, and so on.
  • In recent years, the scale of neural networks has kept growing.
  • In 1998, LeCun's neural network for handwritten character recognition had fewer than 1M weights; in 2012, Krizhevsky's network for the ImageNet competition had 60M weights.
  • A neural network is a computation-intensive and memory-intensive application: the more weights, the larger the amount of computation and memory traffic.
  • With the rapid growth in computation and memory traffic, the prior art generally uses a general-purpose processor to compute the artificial neural network.
  • For a general-purpose processor, input neurons, output neurons, and weights are stored in three arrays, together with an index array that stores the connection relationship between each output and each input connection.
  • The main operation during computation is the multiplication of neurons by weights. Because weights and neurons are not in one-to-one correspondence, every operation has to look up the weight corresponding to a neuron through the index array. Since the computing power and memory bandwidth of general-purpose processors are weak, the needs of neural networks cannot be met.
  • Another known method of supporting artificial neural network operations and their training algorithms is to use a graphics processing unit (GPU), which supports the above algorithms by executing general-purpose SIMD instructions with a general-purpose register file and general-purpose stream processing units.
  • However, the GPU is a device designed for graphics operations and scientific computing, without dedicated support for artificial neural network operations, so a large amount of front-end decoding work is still required to perform artificial neural network operations, which brings significant additional overhead.
  • In addition, the GPU has only a small on-chip buffer, so the model data (weights) of a multi-layer artificial neural network have to be transferred repeatedly from off-chip; the off-chip bandwidth becomes the main performance bottleneck and brings huge power consumption overhead.
  • In view of the above, the present invention proposes an apparatus and method for fast artificial neural network operation.
  • An apparatus for supporting fast artificial neural network operations comprises:
  • a mapping unit that receives input neurons, weights, and the connection relationship between the input neurons and the output neurons, optimizes the connection relationship, and outputs the mapped input neurons and weights, where the correspondence between a mapped input neuron and its weight is an input neuron-weight pair.
  • A method for supporting fast artificial neural network operations comprises:
  • the mapping unit retrieving the input neurons, weights, and connection relationship from the storage unit and outputting the mapped input neurons and weights;
  • the computing device retrieving the mapped input neurons and weights and performing operations to obtain the output neurons.
  • FIG. 1 is a schematic structural diagram of a mapping unit according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an artificial neural network according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the first connection manner of the first output neuron after the artificial neural network in FIG. 2 is sparsified;
  • FIG. 4 is a schematic diagram of the second connection manner of the first output neuron after the artificial neural network in FIG. 2 is sparsified;
  • FIG. 5 is a schematic structural diagram of an apparatus for supporting a fast artificial neural network operation according to an embodiment of the present invention
  • FIG. 6 is a flow chart of an operation method of the apparatus for supporting fast artificial neural network operation in FIG. 5;
  • Figure 7 is a flow chart showing the operation steps of the arithmetic unit of Figure 6;
  • FIG. 8 is a schematic structural diagram of an apparatus for supporting a fast artificial neural network operation according to another embodiment of the present invention.
  • FIG. 9 is a flow chart of an operation method of the apparatus for supporting fast artificial neural network operation in FIG. 8;
  • FIG. 10 is a schematic structural diagram of a system supporting fast artificial neural network operation according to still another embodiment of the present invention.
  • Embodiments of the present invention provide an apparatus for supporting fast artificial neural network operations, which includes a mapping unit.
  • The mapping unit optimizes the connection relationship and outputs the mapped input neurons and weights.
  • The correspondence between a mapped input neuron and its weight is an input neuron-weight pair, which reduces the amount of computation of the artificial neural network operation and thereby achieves fast artificial neural network operation.
  • The input neuron-weight pair is not a real data storage structure; it merely represents the correspondence between an input neuron and a weight.
  • For example, the input neurons are stored in a vector A and the weights are stored in a vector B, where vectors A and B have the same length; the components at the same position of A and B, taken together, are regarded as one input neuron-weight pair.
  • When participating in an operation, the input neurons and weights can be placed separately in different caches and used by the arithmetic unit.
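  • As a minimal illustration of this pairing (not part of the patent text; the variable names and numeric values are assumptions for illustration only), the sketch below keeps the mapped input neurons and weights in two aligned vectors; the elements at the same index of the two vectors form one input neuron-weight pair consumed by the arithmetic unit.

```python
# Minimal sketch: an "input neuron-weight pair" is just the pair of components
# sitting at the same index of two aligned vectors (names are illustrative).
mapped_neurons = [0.7, 1.2]     # vector A: mapped input neurons, e.g. I3, I4
mapped_weights = [0.5, -0.3]    # vector B: mapped weights, e.g. W31, W41

# The arithmetic unit can read the two vectors from separate caches and
# consume them index by index as neuron-weight pairs.
pairs = list(zip(mapped_neurons, mapped_weights))
partial_sum = sum(n * w for n, w in pairs)  # inner product over the pairs
print(pairs, partial_sum)
```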
  • As shown in FIG. 1, the input data includes input neurons, weights, and the connection relationship. The input data is fed into the mapping unit 1, and the mapping unit 1 outputs the mapped input neurons and weights, whose correspondence is the input neuron-weight pair.
  • The mapping unit includes a sparsification mapping unit 11 and/or a fast mapping unit 12. The sparsification mapping unit 11 performs a sparsification operation that removes connections whose weight is 0 or smaller than a preset threshold; the fast mapping unit 12 performs a fast mapping operation that removes connections whose input neuron is 0 or smaller than a preset threshold. The two thresholds mentioned here need not be equal.
  • The sparsification mapping unit 11 includes a sparsification judgment unit and a sparsification execution unit. The judgment unit decides whether to perform the sparsification operation. If it decides to perform it, the execution unit optimizes the connection relationship according to whether the weight between an input neuron and an output neuron is 0 or smaller than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If the judgment unit decides not to perform it, all weights are assumed by default to be non-zero or larger than the preset threshold, the connection relationship is left untouched, and the input data is converted directly into input neuron-weight pairs.
  • The connection relationship in the sparsification mapping unit 11 can be expressed in either of the following two ways:
  • First way: a 1 indicates that the weight between an input neuron and an output neuron is non-zero or larger than the preset threshold, so the connection between them is retained; a 0 indicates that the weight is zero or smaller than the preset threshold, so the connection is removed. For each output neuron, its connections to all input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron, and the connection relationships of all output neurons are concatenated into one vector.
  • Second way: connections are retained or removed according to whether the weight is zero or smaller than the preset threshold, and the connection relationship of an output neuron is written as the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previous connected input neuron, the distance of its third connected input neuron from the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
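  • The following sketch (illustrative only; the function names, the use of absolute values, and the threshold value are assumptions, not part of the patent) builds both representations of the connection relationship for one output neuron from its row of weights.

```python
def connection_bits(weights, threshold=0.0):
    """First way: one bit per input neuron; 1 keeps the connection because the
    weight is non-zero (above the threshold), 0 removes it."""
    return [1 if abs(w) > threshold else 0 for w in weights]

def connection_distances(weights, threshold=0.0):
    """Second way: position of the first kept connection counted from the first
    input neuron, then the gap between each kept connection and the previous one."""
    kept = [i for i, w in enumerate(weights) if abs(w) > threshold]
    if not kept:
        return []
    return [kept[0]] + [b - a for a, b in zip(kept, kept[1:])]

# Weights from one output neuron to the inputs I1..I4 (illustrative values, W21 = 0).
row = [0.5, 0.0, 0.8, -0.2]
print(connection_bits(row))        # [1, 0, 1, 1]  ->  "1011"
print(connection_distances(row))   # [0, 2, 1]
```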
  • The fast mapping unit 12 includes a fast mapping judgment unit and a fast mapping execution unit. The judgment unit decides whether the neural network performs the operation of screening the input neurons. If it decides to perform it, the execution unit optimizes the connection relationship according to whether the value of each input neuron is 0 or smaller than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If not, all input neurons are assumed by default to be non-zero or larger than the preset threshold, the connection relationship is left untouched, and the input data is converted directly into input neuron-weight pairs.
  • The connection relationship in the fast mapping unit 12 can likewise be expressed in either of the following two ways:
  • First way: a 1 indicates that the input neuron is non-zero or larger than the preset threshold, so the connection between that input neuron and the output neuron is retained; a 0 indicates that the input neuron is zero or smaller than the preset threshold, so the connection is removed. For each output neuron, its connections to all input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron, and the connection relationships of all output neurons are concatenated into one vector.
  • Second way: connections are retained or removed according to whether the input neuron is zero or smaller than the preset threshold, and the connection relationship of an output neuron is written as the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previous connected input neuron, the distance of its third connected input neuron from the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
  • Specifically, suppose a neural network has L layers and let K = 1, 2, ..., L-1. For the Kth layer and the (K+1)th layer, the Kth layer is called the input layer and the (K+1)th layer is called the output layer. That is, except for the topmost layer, every layer can serve as an input layer, with the next layer as its corresponding output layer; the number of neurons in each layer is known in advance.
  • Assuming both the sparsification mapping unit and the fast mapping unit perform their operations, the computation between each pair of input and output layers proceeds as follows.
  • Let the input layer consist of N input neurons I1, I2, ..., IN, and let the output layer consist of M output neurons O1, O2, ..., OM, with i = 1, 2, ..., N and j = 1, 2, ..., M.
  • The first connection method:
  • For each output neuron Oj, its corresponding connection relationship is obtained. Since the input layer has N nodes, the connection relationship has N bits, each with value 1 or 0: a value of 1 at the ith bit indicates that there is a connection between Ii and Oj, and 0 indicates that there is no connection. Initially, all N bits are set to 1. If the value of input neuron Ii is zero or smaller than the preset threshold, or if the weight between Ii and Oj is zero or smaller than the preset threshold, the ith bit of the connection relationship is set to 0, i.e. Ii and Oj are considered unconnected. Then the connection relationships of all output neurons are concatenated into one vector, and the components from position N×(j-1)+1 to position N×j of that vector form the connection relationship of output neuron Oj.
  • In this method, the number of input-layer neurons equals the number of bits stored for the connection relationship of each output neuron, so even the simplest one-dimensional array of 0/1 values makes the connection relationship of every output neuron clear.
  • The second connection method:
  • For each output neuron Oj, its corresponding connection relationship is obtained. If the value of input neuron Ii is zero or smaller than the preset threshold, or if the weight between Ii and Oj is zero or smaller than the preset threshold, then Ii and Oj are considered unconnected; otherwise they are connected. Let the input neurons connected to Oj be Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. Then the connection relationship has n entries: the value of the first entry equals i_1 - 1, and for n ≥ k > 1 the value of the kth entry equals i_k - i_(k-1).
  • In this method, the connection relationship can be represented by a high-dimensional dynamic array, a linked list, or the like.
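  • A compact sketch of the two methods as just described, combining the zero-input and zero-weight criteria for one output neuron, is given below (illustrative only; function names, thresholds, and numeric values are assumptions).

```python
def relation_method1(inputs, weights, in_thr=0.0, w_thr=0.0):
    """First connection method for one output neuron Oj: N bits, where bit i
    is cleared when input Ii or weight Wij is zero / below its threshold."""
    return [0 if (abs(x) <= in_thr or abs(w) <= w_thr) else 1
            for x, w in zip(inputs, weights)]

def relation_method2(inputs, weights, in_thr=0.0, w_thr=0.0):
    """Second connection method for one output neuron Oj: index of the first
    kept connection (i_1 - 1), then the gaps i_k - i_(k-1) between kept ones."""
    kept = [i + 1 for i, (x, w) in enumerate(zip(inputs, weights))
            if abs(x) > in_thr and abs(w) > w_thr]           # 1-based i_1..i_n
    if not kept:
        return []
    return [kept[0] - 1] + [b - a for a, b in zip(kept, kept[1:])]

# Example with N = 4 inputs (values are illustrative): I1 = 0, W2j = 0.
I = [0.0, 0.4, 0.9, 0.3]
W_row = [0.5, 0.0, 0.8, -0.2]
print(relation_method1(I, W_row))   # [0, 0, 1, 1]
print(relation_method2(I, W_row))   # [2, 1]
```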
  • After the processed connection relationship is obtained, the mapping unit outputs the mapped input neurons and weights according to that connection relationship; the correspondence between the mapped input neurons and weights is the input neuron-weight pair, and the mapped input neurons and weights can be used directly during the operation.
  • In short, in the above mapping unit, the sparsification mapping unit 11 and the fast mapping unit 12 optimize the connection relationship of the input data and output the mapped input neurons and weights. The connection relationship can adopt either of two representations: in one, a single bit between each input and output neuron indicates whether there is a connection; in the other, the distances between connections indicate the position of each connection.
  • Take the artificial neural network shown in FIG. 2 as an example, and for simplicity let the criterion be whether a value equals 0. The network has 4 input neurons I1, I2, I3, I4 and 2 output neurons O1, O2, with connection weights W11, W21, W31, W41, W12, W22, W32, W42. Let I1 be 0 and I2, I3, I4 be non-zero; let W21, W12, W42 be 0 and the remaining weights be non-zero.
  • The sparsification mapping unit and the fast mapping unit may process the data at the same time, or one after the other in either order. Below, only the case where the sparsification mapping unit processes the data first is described.
  • Expressed with the first connection method:
  • In the sparsification mapping unit 11, if the sparsification operation is not performed, the connection relationships of O1 and O2 both default to 1111, and the concatenated order is 11111111. If the sparsification operation is performed, as shown in FIG. 3, the connection relationship of output neuron O1 is 1011, where each bit indicates whether there is a connection to the corresponding input (1 means connected, 0 means not connected), and the connection relationship of output neuron O2 is 0110.
  • During the operation, input neurons and weights whose connection bit is 0 are not computed.
  • When storing the connection relationship, it can be stored in the order of the output neurons: all inputs of each output neuron are laid out in turn and concatenated into one vector, giving 10110110 for the example above.
  • In the fast mapping unit 12, if the operation of screening the input-neuron values is not performed, the connection relationships and the concatenated order of O1 and O2 are unchanged. If it is performed, then for the sparsified network shown in FIG. 3 the connection relationship of output neuron O1 becomes 0011 (the first bit changes from 1 to 0 because the first input neuron I1 is 0, so the connection from I1 is removed) and the connection relationship of output neuron O2 is 0110, giving a final layout of 00110110. For the network without the sparsification operation, the connection relationship of output neuron O1 is 0111 and that of O2 is 0111, giving a final layout of 01110111.
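  • The short sketch below (illustrative only; variable names are assumptions) reproduces the bit strings above for the FIG. 2 example: sparsification alone gives 10110110, and applying the fast mapping on top of it gives 00110110.

```python
# FIG. 2 example: inputs I1..I4 and the weight rows of O1 and O2 (1 = non-zero).
I = [0, 1, 1, 1]                      # I1 = 0, the rest non-zero
W = [[1, 0, 1, 1],                    # O1 row: W21 = 0
     [0, 1, 1, 0]]                    # O2 row: W12 = 0, W42 = 0

def bits(row, inputs=None):
    """First connection method: 1 keeps a connection, 0 removes it.
    If `inputs` is given, fast mapping also removes connections whose input is 0."""
    keep = lambda i, w: w != 0 and (inputs is None or inputs[i] != 0)
    return ''.join('1' if keep(i, w) else '0' for i, w in enumerate(row))

print(''.join(bits(row) for row in W))          # sparsification only: 10110110
print(''.join(bits(row, I) for row in W))       # plus fast mapping:   00110110
```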
  • Expressed with the second connection method:
  • In the sparsification mapping unit 11, if the sparsification operation is not performed, the connection relationships of O1 and O2 both default to 0, 1, 1, 1. If the sparsification operation is performed, as shown in FIG. 4, output neuron O1 is connected to input neurons I1, I3, I4, so its connection relationship is 0, 2, 1: the 0 indicates that the first connection is at distance 0 from the first input neuron, i.e. it is the first input neuron; the 2 indicates that the second connected input neuron is at distance 2 from the previous connected input neuron, i.e. it is the third input neuron; and the 1 indicates that the third connected input neuron is at distance 1 from the previous one, i.e. it is the fourth input neuron. Likewise, the connection relationship of O2 is 1, 1.
  • In the fast mapping unit 12, if the operation of screening the input-neuron values is not performed, the connection relationships of O1 and O2 are unchanged. If it is performed, then for the sparsified network shown in FIG. 4, the connection from I1 is removed because the first input neuron I1 is 0, so the connection relationship of output neuron O1 becomes 2, 1 and that of output neuron O2 remains 1, 1. For the network without the sparsification operation, the connection relationships of O1 and O2 are both 1, 1, 1.
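  • A corresponding sketch for the second connection method (again illustrative; names and values are assumptions) reproduces the distance lists above.

```python
# Same FIG. 2 example: inputs I1..I4 and the weight rows of O1 and O2 (1 = non-zero).
I = [0, 1, 1, 1]
W = [[1, 0, 1, 1], [0, 1, 1, 0]]

def distances(row, inputs=None):
    """Second connection method: position of the first kept connection,
    then the gap between each kept connection and the previous one."""
    kept = [i for i, w in enumerate(row)
            if w != 0 and (inputs is None or inputs[i] != 0)]
    if not kept:
        return []
    return [kept[0]] + [b - a for a, b in zip(kept, kept[1:])]

print([distances(row) for row in W])      # sparsification only: [[0, 2, 1], [1, 1]]
print([distances(row, I) for row in W])   # plus fast mapping:   [[2, 1], [1, 1]]
```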
  • The connection relationships used by the sparsification mapping unit 11 and the fast mapping unit 12 in the present invention include, but are not limited to, the above representations.
  • The sparsification mapping unit 11 and the fast mapping unit 12 output the mapped neurons and weights according to the connection relationship obtained above; the correspondence between the mapped neurons and weights is the input neuron-weight pair, and these pairs can be used directly in the operation. Take the mapping of output neuron O1 of the artificial neural network shown in FIG. 2 as a concrete example:
  • The input neurons are I1, I2, I3, I4 and the input weights are W11, W21, W31, W41, where I1 and W21 are 0 and the rest are non-zero.
  • After sparsification, the connection relationship is 1011 (or 0, 2, 1); after fast mapping, the connection relationship is 0011 (or 2, 1).
  • The two mapping units then output the data with the zero-valued input neurons, and the connection weights issued from them, removed.
  • The mapped input neurons are I3, I4 and the mapped weights are W31, W41; the input neuron-weight pairs are I3-W31 and I4-W41. In vector form, the obtained input neuron vector is (I3, I4) and the obtained weight vector is (W31, W41).
  • The above describes the case where the sparsification mapping unit performs sparsification first and the fast mapping unit then performs the fast mapping, finally yielding the mapped input neurons and weights; the two mapping units may also operate on the data at the same time, without restriction on their order.
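  • The sketch below (illustrative, with assumed names and symbolic values) shows the final compaction step for O1: the processed connection relationship 0011 selects the surviving input neuron-weight pairs, yielding the vectors (I3, I4) and (W31, W41).

```python
# O1 in the FIG. 2 example (symbolic values for readability).
inputs  = ['I1', 'I2', 'I3', 'I4']
weights = ['W11', 'W21', 'W31', 'W41']
relation = [0, 0, 1, 1]          # processed connection relationship of O1

# Keep only the positions whose connection bit is 1.
mapped_inputs  = [x for x, b in zip(inputs, relation) if b]
mapped_weights = [w for w, b in zip(weights, relation) if b]

print(mapped_inputs, mapped_weights)              # ['I3', 'I4'] ['W31', 'W41']
print(list(zip(mapped_inputs, mapped_weights)))   # [('I3', 'W31'), ('I4', 'W41')]
```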
  • The device supporting fast artificial neural network operation in this embodiment of the present invention comprises, in addition to the mapping unit 1: a storage unit 2, a DMA (direct memory access) unit 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an arithmetic unit 8, and an output neuron cache 9.
  • The storage unit 2 is configured to store data and instructions; it receives and stores externally input data and instructions, including the input neurons, weights, and connection relationship.
  • The mapping unit 1 retrieves the input neurons, weights, and connection relationship from the storage unit 2; the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping, so that the mapping unit 1 obtains the mapped input neurons and weights, which are stored back into the storage unit 2.
  • The DMA 3 fetches the instructions and the mapped input neurons and weights from the storage unit 2 and distributes them to the instruction cache 4, the input neuron cache 6, and the weight cache 7, respectively.
  • The control unit 5 reads the dedicated instructions from the instruction cache 4, decodes them into arithmetic-unit instructions, and feeds them to the arithmetic unit 8.
  • The arithmetic unit 8 is configured to execute the specific operations: according to the operation instructions, it retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them.
  • The arithmetic unit 8 includes a multiplication unit for multiplying the mapped neurons and the weight data in a first stage; an addition-tree unit that, in a second stage, adds the products of the first stage step by step through the addition tree to complete the vector inner-product operation; and a nonlinear transform unit that performs a nonlinear transformation on the result of the second stage to obtain the output neuron, where the nonlinear transformation is an activation-function operation and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
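  • A minimal functional sketch of these three stages (multiplication, addition-tree reduction, activation) is shown below; it illustrates the described data flow only, not the hardware implementation, and the sigmoid is just one of the activation functions the text lists.

```python
import math

def operate(mapped_neurons, mapped_weights,
            activation=lambda s: 1 / (1 + math.exp(-s))):
    """Three stages of the arithmetic unit: multiply, addition-tree sum, activation."""
    # Stage 1: multiply each input neuron-weight pair.
    products = [n * w for n, w in zip(mapped_neurons, mapped_weights)]
    # Stage 2: addition tree - pairwise reduction of the products (vector inner product).
    while len(products) > 1:
        products = [products[i] + products[i + 1] if i + 1 < len(products) else products[i]
                    for i in range(0, len(products), 2)]
    s = products[0] if products else 0.0
    # Stage 3: nonlinear transform (activation function), here sigmoid as an example.
    return activation(s)

print(operate([0.7, 1.2], [0.5, -0.3]))   # output neuron for the pairs I3-W31, I4-W41
```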
  • The output neuron cache 9 stores the output neurons obtained by the arithmetic unit; they are then stored into the storage unit 2 via the DMA 3, from where they can be retrieved externally.
  • This embodiment also provides a method for supporting fast artificial neural network operation. As shown in FIG. 6, the method includes the following steps:
  • S101: Read an artificial neural network SIMD instruction, which starts the fast artificial neural network operation.
  • S102: The mapping unit retrieves all input neurons, weights, and connection relationships from the storage unit, processes them, obtains the mapped input neurons and weights, and stores them back into the storage unit.
  • Specifically, the sparsification mapping unit performs sparsification processing on the input neurons, weights, and connection relationships, and the fast mapping unit performs fast-mapping processing on them.
  • Both mapping units can use either of the two connection methods to process the connection relationship, and output the mapped neurons and weights according to the processed connection relationship.
  • The connection methods and the way the mapped neurons and weights are output according to the processed connection relationship have been described in detail above and are not repeated here.
  • S103: The input neuron cache 6 and the weight cache 7 read part of the mapped neurons and weights through the DMA 3.
  • S104: The arithmetic unit retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.
  • S1041: Perform the multiplication operation, multiplying the mapped neurons by the weight data;
  • S1042: Perform the addition-tree operation, adding the results of the first stage step by step through the addition tree to complete the vector inner-product operation;
  • S1043: Perform a nonlinear transformation on the result of the second stage to obtain the output neuron; the nonlinear transformation is an activation-function operation, and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
  • S105: The arithmetic unit stores the obtained output neurons into the output neuron cache 9, and they are stored into the storage unit 2 via the DMA 3.
  • S106: Determine whether all mapped neurons and weights have been computed. If the result is N, return to step S103; if the result is Y, perform step S107.
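  • Putting the steps together, a simplified software analogue of this per-layer flow is sketched below (an illustration under assumed names and values, not the device's actual instruction set): for each output neuron, the mapped neuron-weight pairs are gathered, the inner product is accumulated, and an activation is applied before the result is stored.

```python
import math

def run_layer(inputs, weight_rows, relation_rows):
    """Schematic of the loop of FIG. 6: for each output neuron, load its mapped
    neuron-weight pairs, compute the inner product, apply the activation, store."""
    outputs = []
    for weights, relation in zip(weight_rows, relation_rows):   # one pass per output neuron
        pairs = [(x, w) for x, w, keep in zip(inputs, weights, relation) if keep]
        s = sum(x * w for x, w in pairs)                         # multiply + addition tree
        outputs.append(1 / (1 + math.exp(-s)))                   # sigmoid as example activation
    return outputs                                               # stored as output neurons

# FIG. 2 example with illustrative numeric values and processed relations 0011 / 0110.
print(run_layer([0, 0.4, 0.7, 1.2],
                [[0.5, 0.0, 0.8, -0.2], [0.0, 0.3, 0.6, 0.0]],
                [[0, 0, 1, 1], [0, 1, 1, 0]]))
```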
  • Another embodiment of the present invention provides an apparatus for supporting fast artificial neural network operations, including a mapping unit 1, a storage unit 2, a DMA (direct memory access) unit 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an arithmetic unit 8, and an output neuron cache 9.
  • The storage unit 2 is configured to store data and instructions; it receives and stores externally input data and instructions, including the input neurons, weights, and connection relationship.
  • The DMA 3 fetches the instructions from the storage unit 2 and distributes them to the instruction cache 4, and fetches the input neurons, weights, and connection relationship from the storage unit 2 and passes them to the mapping unit 1 for direct mapping.
  • In the mapping unit 1, the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping; the mapping unit 1 thereby obtains the mapped input neurons and weights and transmits them directly to the input neuron cache 6 and the weight cache 7, respectively.
  • The control unit 5 reads the dedicated instructions from the instruction cache 4, decodes them into arithmetic-unit instructions, and feeds them to the arithmetic unit 8.
  • The arithmetic unit 8 is configured to execute the specific operations: according to the operation instructions, it retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them.
  • The arithmetic unit 8 includes a multiplication unit for multiplying the mapped neurons and the weight data in a first stage; an addition-tree unit that, in a second stage, adds the products of the first stage step by step through the addition tree to complete the vector inner-product operation; and a nonlinear transform unit that performs a nonlinear transformation on the result of the second stage to obtain the output neuron, where the nonlinear transformation is an activation-function operation and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
  • The output neuron cache 9 stores the output neurons obtained by the arithmetic unit; they are then stored into the storage unit 2 via the DMA 3, from where they can be retrieved externally.
  • This embodiment also provides a method for supporting fast artificial neural network operations, as shown in FIG. 9, including the following steps:
  • S201: Read an artificial neural network SIMD instruction, which starts the fast artificial neural network operation.
  • S202: The mapping unit retrieves part of the input neurons, weights, and connection relationships from the storage unit through the DMA 3, processes them, and delivers the mapped input neurons and weights directly into the input neuron cache 6 and the weight cache 7.
  • Specifically, the sparsification mapping unit performs sparsification processing on the input neurons, weights, and connection relationships, and the fast mapping unit performs fast-mapping processing on them.
  • Both mapping units can use either of the two connection methods to process the connection relationship, and output the mapped neurons and weights according to the processed connection relationship; this has been described in detail above and is not repeated here.
  • S203: The arithmetic unit retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.
  • S204: The arithmetic unit stores the obtained output neurons into the output neuron cache 9, and they are stored into the storage unit 2 via the DMA 3.
  • S205: Determine whether all input neurons and weights have been mapped and computed. If the result is N, return to step S202; if the result is Y, the next step is performed.
  • In this embodiment, the sparsification mapping unit and the fast mapping unit of the mapping unit perform the mapping during the computation, and the mapped data is fed directly to the arithmetic unit, whereas in the previous embodiment the data mapped by the sparsification mapping unit and the fast mapping unit is stored in the storage unit before the arithmetic unit uses it in the computation; the operation speed of this embodiment is therefore faster.
  • Yet another embodiment of the present invention provides a system for fast artificial neural network operation, as shown in FIG. 10, which includes an I/O interface 20, a storage device 30, a central processing unit (CPU) 40, and a device 10 supporting fast artificial neural network operation.
  • The I/O interface 20 is used for I/O data, which is sent by the CPU 40 to the device 10 supporting fast artificial neural network operation and then written to the storage device 30 by that device; the dedicated instructions required by the device 10 are likewise transmitted by the CPU 40 to the device 10.
  • The storage device 30 is used to temporarily store the artificial neural network model and the neuron data, in particular when the whole model cannot be held in the cache of the device 10 supporting fast artificial neural network operation.
  • The central processing unit (CPU) 40 is used for data transfer and basic control, such as starting and stopping the device 10 supporting fast artificial neural network operation, and serves as the interface between that device and external control.
  • The device 10 supporting fast artificial neural network operation accepts data and programs from the CPU 40 and executes the fast artificial neural network operation algorithm, and its execution results are transmitted back to the CPU 40.
  • In this application structure, the device 10 supporting fast artificial neural network operation acts as a coprocessor of the CPU 40 or the GPU to execute the fast artificial neural network operation algorithm.
  • A plurality of devices supporting fast artificial neural network operation may be interconnected to form a system: they can be interconnected through a PCIe bus to support larger-scale fast artificial neural network operations, may share the same host CPU or each have its own host CPU, and may share memory or each accelerator may have its own memory.
  • The interconnection method can be any interconnection topology.
  • When a device or method using the techniques of the present invention performs a neural network operation, if part of the input neurons and weights of a given network have values equal to or near zero, there is an improvement in computation speed over devices or methods that do not use the techniques described herein. Moreover, the larger the proportion of input neurons equal to or near zero among all input neurons of the network, and the larger the proportion of weights equal to or near zero among all weights of the network, the greater the increase in operation speed.
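  • As a rough illustration of this scaling (not from the patent; function names, sizes, and the random model are assumptions), the sketch below counts how many neuron-weight multiplications survive the mapping for a layer with given fractions of zero inputs and zero weights; the retained fraction, and hence the remaining work, shrinks as either fraction of zeros grows.

```python
import random

def retained_fraction(n_inputs, n_outputs, p_zero_input, p_zero_weight, seed=0):
    """Fraction of neuron-weight multiplications left after removing connections
    whose input neuron or weight is zero (zeros drawn independently at random)."""
    rng = random.Random(seed)
    inputs = [0.0 if rng.random() < p_zero_input else 1.0 for _ in range(n_inputs)]
    kept = total = 0
    for _ in range(n_outputs):
        for x in inputs:
            w = 0.0 if rng.random() < p_zero_weight else 1.0
            total += 1
            kept += (x != 0.0 and w != 0.0)
    return kept / total

for pi, pw in [(0.0, 0.0), (0.3, 0.3), (0.5, 0.5), (0.7, 0.7)]:
    print(pi, pw, round(retained_fraction(1024, 256, pi, pw), 3))
```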

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a device and method for supporting fast artificial neural network operation. The device comprises a mapping unit for receiving an input neuron, a weight, and the connection relationship between the input neuron and an output neuron, optimizing the connection relationship, and outputting the mapped input neuron and the mapped weight, wherein the correspondence between the mapped input neuron and the mapped weight is an input neuron-weight pair. By means of the device and method, the connection relationship between the input neurons and the weights is optimized by a sparsification mapping unit and/or a fast mapping unit, thereby reducing the amount of calculation, solving the problems of insufficient CPU and GPU operation performance and high front-end decoding overhead, and effectively improving support for multi-layer artificial neural network operation algorithms.

Description

Device and method for supporting fast artificial neural network operation

Technical field

The present invention relates to the field of data processing technologies, and more particularly to an apparatus and method for fast artificial neural network operations.

Background
Artificial Neural Networks (ANNs), often simply called Neural Networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of biological neural networks and perform distributed, parallel information processing. Such a network relies on the complexity of the system and adjusts the interconnections among a large number of internal nodes in order to process information. The core operation used by neural networks is vector multiplication, and sign functions and their various approximations are widely used.

Neural networks are widely used in a variety of application scenarios: computer vision, speech recognition, natural language processing, and so on. In recent years, the scale of neural networks has kept growing. In 1998, LeCun's neural network for handwritten character recognition had fewer than 1M weights; in 2012, Krizhevsky's network for the ImageNet competition had 60M weights.
A neural network is a computation-intensive and memory-intensive application: the more weights, the larger the amount of computation and memory traffic. With the rapid growth in the computation and memory traffic of neural networks, the prior art generally uses a general-purpose processor to compute the artificial neural network. For a general-purpose processor, input neurons, output neurons, and weights are stored in three arrays, together with an index array that stores the connection relationship between each output and each input connection. The main operation during computation is the multiplication of neurons by weights. Because weights and neurons are not in one-to-one correspondence, every operation has to look up the weight corresponding to a neuron through the index array. Since the computing power and memory bandwidth of general-purpose processors are weak, the needs of neural networks cannot be met. When multiple general-purpose processors run in parallel, the communication among them becomes a performance bottleneck. When computing a pruned neural network, every multiplication has to look up the position of the corresponding weight in the index array again, which adds extra computation and memory-access overhead. Computing a neural network therefore takes a long time and consumes a lot of power. A general-purpose processor also has to decode the multi-layer artificial neural network operation into a long sequence of arithmetic and memory-access instructions, and the front-end decoding of the processor brings considerable power overhead.

Another known method of supporting artificial neural network operations and their training algorithms is to use a graphics processing unit (GPU), which supports the above algorithms by executing general-purpose SIMD instructions with a general-purpose register file and general-purpose stream processing units. However, the GPU is a device designed for graphics operations and scientific computing, without dedicated support for artificial neural network operations, so a large amount of front-end decoding work is still required to perform artificial neural network operations, which brings significant additional overhead. In addition, the GPU has only a small on-chip buffer, so the model data (weights) of a multi-layer artificial neural network have to be transferred repeatedly from off-chip; the off-chip bandwidth becomes the main performance bottleneck and brings huge power consumption overhead.
Summary of the invention

In view of the problems of the existing solutions, and in order to overcome the shortcomings of the above prior art, the present invention proposes an apparatus and method for fast artificial neural network operation.

According to one aspect of the present invention, an apparatus for supporting fast artificial neural network operations is provided, comprising:

a mapping unit that receives input neurons, weights, and the connection relationship between the input neurons and the output neurons, optimizes the connection relationship, and outputs the mapped input neurons and weights, where the correspondence between a mapped input neuron and its weight is an input neuron-weight pair.

According to another aspect of the present invention, a method for supporting fast artificial neural network operations is provided, comprising:

the mapping unit retrieving the input neurons, weights, and connection relationship from the storage unit and outputting the mapped input neurons and weights;

the computing device retrieving the mapped input neurons and weights and performing operations to obtain the output neurons.
It can be seen from the above technical solutions that the present invention has the following beneficial effects:

(1) By means of the sparsification mapping unit and/or the fast mapping unit, the connection relationship between the input neurons and the weights is optimized, which reduces the amount of computation, solves the problems of insufficient CPU and GPU operation performance and high front-end decoding overhead, and effectively improves the support for multi-layer artificial neural network operation algorithms;

(2) By using dedicated on-chip caches for the multi-layer artificial neural network operation algorithm, the reusability of the input neurons and weight data is fully exploited, repeated reads of these data from memory are avoided, the memory access bandwidth is reduced, and the problem of memory bandwidth becoming a performance bottleneck of the multi-layer artificial neural network operation and its training algorithm is avoided.
Brief description of the drawings

FIG. 1 is a schematic structural diagram of a mapping unit according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an artificial neural network according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the first connection manner of the first output neuron after the artificial neural network in FIG. 2 is sparsified;

FIG. 4 is a schematic diagram of the second connection manner of the first output neuron after the artificial neural network in FIG. 2 is sparsified;

FIG. 5 is a schematic structural diagram of an apparatus supporting fast artificial neural network operation according to an embodiment of the present invention;

FIG. 6 is a flow chart of the operation method of the apparatus supporting fast artificial neural network operation in FIG. 5;

FIG. 7 is a flow chart of the operation steps of the arithmetic unit in FIG. 6;

FIG. 8 is a schematic structural diagram of an apparatus supporting fast artificial neural network operation according to another embodiment of the present invention;

FIG. 9 is a flow chart of the operation method of the apparatus supporting fast artificial neural network operation in FIG. 8;

FIG. 10 is a schematic structural diagram of a system supporting fast artificial neural network operation according to yet another embodiment of the present invention.
Detailed description

Some embodiments of the present invention will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, of the embodiments are shown. Indeed, the various embodiments of the present invention can be implemented in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present invention satisfies applicable legal requirements.

In this specification, the various embodiments described below for explaining the principles of the present invention are illustrative only and should not be construed in any way as limiting the scope of the invention. The following description with reference to the accompanying drawings is intended to assist in a comprehensive understanding of the exemplary embodiments of the present invention defined by the claims and their equivalents. The description includes numerous specific details to assist understanding, but these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness, and the same reference numerals are used throughout the drawings for similar functions and operations.

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to specific embodiments and the accompanying drawings.
Embodiments of the present invention provide an apparatus for supporting fast artificial neural network operations, which includes a mapping unit. The mapping unit optimizes the connection relationship and outputs the mapped input neurons and weights; the correspondence between a mapped input neuron and its weight is an input neuron-weight pair. This reduces the amount of computation of the artificial neural network operation and achieves fast artificial neural network operation.

The input neuron-weight pair is not a real data storage structure; it merely represents the correspondence between an input neuron and a weight. For example, the input neurons are stored in a vector A and the weights are stored in a vector B, where vectors A and B have the same length; the components at the same position of A and B, taken together, are regarded as one input neuron-weight pair. When participating in an operation, the input neurons and weights can be placed separately in different caches and used by the arithmetic unit.

As shown in FIG. 1, the input data includes input neurons, weights, and the connection relationship. The input data is fed into the mapping unit 1, and the mapping unit 1 outputs the mapped input neurons and weights, whose correspondence is the input neuron-weight pair.
The mapping unit includes a sparsification mapping unit 11 and/or a fast mapping unit 12. The sparsification mapping unit 11 performs a sparsification operation that removes connections whose weight is 0 or smaller than a preset threshold; the fast mapping unit 12 performs a fast mapping operation that removes connections whose input neuron is 0 or smaller than a preset threshold. The two thresholds mentioned here need not be equal.

The sparsification mapping unit 11 includes a sparsification judgment unit and a sparsification execution unit. The judgment unit decides whether to perform the sparsification operation. If it decides to perform it, the execution unit optimizes the connection relationship according to whether the weight between an input neuron and an output neuron is 0 or smaller than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If the judgment unit decides not to perform it, all weights are assumed by default to be non-zero or larger than the preset threshold, the connection relationship is left untouched, and the input data is converted directly into input neuron-weight pairs.

The connection relationship in the sparsification mapping unit 11 can be expressed in either of the following two ways:

First way: a 1 indicates that the weight between an input neuron and an output neuron is non-zero or larger than the preset threshold, so the connection between them is retained; a 0 indicates that the weight is zero or smaller than the preset threshold, so the connection is removed. For each output neuron, its connections to all input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron, and the connection relationships of all output neurons are concatenated into one vector.

Second way: connections are retained or removed according to whether the weight is zero or smaller than the preset threshold, and the connection relationship of an output neuron is written as the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previous connected input neuron, the distance of its third connected input neuron from the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
The fast mapping unit 12 includes a fast mapping judgment unit and a fast mapping execution unit. The judgment unit decides whether the neural network performs the operation of screening the input neurons. If it decides to perform it, the execution unit optimizes the connection relationship according to whether the value of each input neuron is 0 or smaller than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If not, all input neurons are assumed by default to be non-zero or larger than the preset threshold, the connection relationship is left untouched, and the input data is converted directly into input neuron-weight pairs.

The connection relationship in the fast mapping unit 12 can likewise be expressed in either of the following two ways:

First way: a 1 indicates that the input neuron is non-zero or larger than the preset threshold, so the connection between that input neuron and the output neuron is retained; a 0 indicates that the input neuron is zero or smaller than the preset threshold, so the connection is removed. For each output neuron, its connections to all input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron, and the connection relationships of all output neurons are concatenated into one vector.

Second way: connections are retained or removed according to whether the input neuron is zero or smaller than the preset threshold, and the connection relationship of an output neuron is written as the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previous connected input neuron, the distance of its third connected input neuron from the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
Specifically, suppose a neural network has L layers and let K = 1, 2, ..., L-1. For the Kth layer and the (K+1)th layer, the Kth layer is called the input layer and the (K+1)th layer is called the output layer. That is, except for the topmost layer, every layer can serve as an input layer, with the next layer as its corresponding output layer; the number of neurons in each layer is known in advance.

Assuming both the sparsification mapping unit and the fast mapping unit perform their operations, the computation between each pair of input and output layers proceeds as follows.

Let the input layer consist of N input neurons I1, I2, ..., IN, and let the output layer consist of M output neurons O1, O2, ..., OM, with i = 1, 2, ..., N and j = 1, 2, ..., M.

The first connection method:

First, for each output neuron Oj, its corresponding connection relationship is obtained. Since the input layer has N nodes, the connection relationship has N bits, each with value 1 or 0: a value of 1 at the ith bit indicates that there is a connection between Ii and Oj, and 0 indicates that there is no connection. Initially, all N bits are set to 1. If the value of input neuron Ii is zero or smaller than the preset threshold, or if the weight between Ii and Oj is zero or smaller than the preset threshold, the ith bit of the connection relationship is set to 0, i.e. Ii and Oj are considered unconnected. Then the connection relationships of all output neurons are concatenated into one vector, and the components from position N×(j-1)+1 to position N×j of that vector form the connection relationship of output neuron Oj.

In this method, the number of input-layer neurons equals the number of bits stored for the connection relationship of each output neuron, so even the simplest one-dimensional array of 0/1 values makes the connection relationship of every output neuron clear.
The second connection method:

For each output neuron Oj, its corresponding connection relationship is obtained. If the value of input neuron Ii is zero or smaller than the preset threshold, or if the weight between Ii and Oj is zero or smaller than the preset threshold, then Ii and Oj are considered unconnected; otherwise they are connected. Let the input neurons connected to Oj be Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. Then the connection relationship has n entries: the value of the first entry equals i_1 - 1, and for n ≥ k > 1 the value of the kth entry equals i_k - i_(k-1).

In this method, the connection relationship can be represented by a high-dimensional dynamic array, a linked list, or the like.
After the processed connection relationship is obtained, the mapping unit outputs the mapped input neurons and weights according to that connection relationship; the correspondence between the mapped input neurons and weights is the input neuron-weight pair, and the mapped input neurons and weights can be used directly during the operation.

In short, in the above mapping unit, the sparsification mapping unit 11 and the fast mapping unit 12 optimize the connection relationship of the input data and output the mapped input neurons and weights. The connection relationship can adopt either of two representations: in one, a single bit between each input and output neuron indicates whether there is a connection; in the other, the distances between connections indicate the position of each connection.

To make the functions of the two mapping units clearer, the data processing in each of the two units is described below.
Take the artificial neural network shown in Figure 2 as an example, and use only the criterion of whether a value is 0. The network has 4 input neurons I1, I2, I3, I4 and 2 output neurons O1, O2, with the connection weights denoted W11, W21, W31, W41, W12, W22, W32, W42. Let I1 be 0 and I2, I3, I4 be non-zero; let W21, W12, W42 be 0 and the remaining weights be non-zero.

The sparsification mapping unit and the fast mapping unit may process the data simultaneously or in sequence, and in the sequential case their order is interchangeable; the description below assumes the sparsification mapping unit processes the data first.

With the first connection method, the representation is as follows:

In the sparsification mapping unit 11, if the sparsification operation is not performed, the connection relations of O1 and O2 both default to 1111 and the combined arrangement is 11111111. If the sparsification operation is performed, as shown in Figure 3, the connection relation of output neuron O1 is 1011, where each bit indicates whether there is a connection with the corresponding input (1 for connected, 0 for unconnected), and the connection relation of output neuron O2 is 0110. During the operation, the input neurons and weights whose connection bit is 0 take no part in the computation. When storing the connection relations, they can be stored in the order of the output neurons: all bits for each output neuron are laid out in turn and concatenated into one vector, giving the arrangement 10110110 for this example.

In the fast mapping unit 12, if the operation of examining the input neuron values is not performed, the connection relations of O1 and O2 and their arrangement remain unchanged. If it is performed, then for the network of Figure 3 after the sparsification operation, the connection relation of output neuron O1 becomes 0011: the first bit changes from 1 to 0 because the first input neuron I1 has the value 0, so the connection issued from I1 is removed. The connection relation of output neuron O2 is 0110, and the final arrangement is 00110110. For the network without the sparsification operation, the connection relation of O1 is 0111, that of O2 is 0111, and the final arrangement is 01110111.
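The worked example above can be reproduced with a short two-stage masking sketch (illustrative only; the list-based representation is an assumption, not the hardware layout):

```python
# Two-stage masking for the Figure 2/3 example (names and list layout are assumed).
inputs  = [0, 1, 1, 1]                   # I1 = 0, I2..I4 non-zero
weights = [[1, 0, 1, 1],                 # W11, W21, W31, W41 (W21 = 0)
           [0, 1, 1, 0]]                 # W12, W22, W32, W42 (W12 = W42 = 0)

# Sparsification mapping unit: drop zero weights.
sparse_bits = [[1 if w != 0 else 0 for w in row] for row in weights]
print(sparse_bits)                       # [[1, 0, 1, 1], [0, 1, 1, 0]] -> "10110110"

# Fast mapping unit: additionally drop connections from zero-valued inputs.
fast_bits = [[b if inputs[i] != 0 else 0 for i, b in enumerate(row)]
             for row in sparse_bits]
print(fast_bits)                         # [[0, 0, 1, 1], [0, 1, 1, 0]] -> "00110110"
```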
With the second connection method, the representation is as follows:

In the sparsification mapping unit 11, if the sparsification operation is not performed, the connection relations of O1 and O2 default to 0, 1, 1, 1. If the sparsification operation is performed, as shown in Figure 4, output neuron O1 is connected to input neurons I1, I3, I4, so its connection relation is 0, 2, 1: the 0 means the first connection is at distance 0 from the first input neuron, i.e. it is the first input neuron; the 2 means the second connected input neuron is at distance 2 from the previous one, i.e. it is the third input neuron; and the 1 means the third connected input neuron is at distance 1 from the previous one, i.e. it is the fourth input neuron. Similarly, the connection relation of O2 is 1, 1.

In the fast mapping unit 12, if the operation of examining the input neuron values is not performed, the connection relations of O1 and O2 remain unchanged. If it is performed, then for the network of Figure 4 after the sparsification operation, the connection issued from I1 is removed because the first input neuron I1 has the value 0, so the connection relation of output neuron O1 becomes 2, 1 and that of output neuron O2 is 1, 1. For the network without the sparsification operation, the connection relations of both O1 and O2 are 1, 1, 1.
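The distance encodings in this example can be checked with a small helper (hypothetical names; a sketch rather than the claimed implementation):

```python
def distances(connected_indices):
    """Convert 0-based indices of connected inputs into the distance encoding."""
    out, prev = [], None
    for i in connected_indices:
        out.append(i if prev is None else i - prev)
        prev = i
    return out

# Figure 4 example: after sparsification O1 keeps I1, I3, I4 and O2 keeps I2, I3;
# the fast mapping then also drops I1 because its value is 0.
print(distances([0, 2, 3]))   # O1, sparsification only        -> [0, 2, 1]
print(distances([2, 3]))      # O1, sparsification + fast map  -> [2, 1]
print(distances([1, 2]))      # O2 in both cases               -> [1, 1]
```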
The sparsification mapping unit 11 and the fast mapping unit 12 of the present invention are not limited to the connection relations described above.

Based on the connection relation obtained above, the sparsification mapping unit 11 and the fast mapping unit 12 output the mapped neurons and weights; the correspondence between a mapped neuron and its weight is an input neuron-weight pair, which can be used directly in the operation. Take the mapping of output neuron O1 in the artificial neural network of Figure 2 as a concrete example:

The input neurons are I1, I2, I3, I4 and the input weights are W11, W21, W31, W41, where I1 and W21 are 0 and the rest are non-zero.

First, in the sparsification mapping unit 11, the connection relation is 1011, or 0, 2, 1. Then, in the fast mapping unit 12, the connection relation is 0011, or 2, 1. Based on the connection relation, the two mapping units output the result of removing the zero-valued input neuron and the connection weights issued from it: the mapped input neurons are I3, I4 and the mapped weights are W31, W41, so the input neuron-weight pairs are I3-W31 and I4-W41. For example, if the mapped input neurons and weights are stored as vectors, the resulting input neuron vector is (I3, I4) and the resulting weight vector is (W31, W41).
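A compact sketch of this pairing step, with hypothetical names, might look like this:

```python
def map_pairs(inputs, weight_row, input_names, weight_names):
    """Drop zero inputs and zero weights, returning input neuron-weight pairs."""
    return [(ni, nw)
            for x, w, ni, nw in zip(inputs, weight_row, input_names, weight_names)
            if x != 0 and w != 0]

input_names  = ["I1", "I2", "I3", "I4"]
weight_names = ["W11", "W21", "W31", "W41"]
print(map_pairs([0, 1, 1, 1], [1, 0, 1, 1], input_names, weight_names))
# [('I3', 'W31'), ('I4', 'W41')]
```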
Although the example above first performs the sparsification operation with the sparsification mapping unit and then performs the fast mapping with the fast mapping unit to obtain the mapped input neurons and weights, in practical applications the two mapping units preferably operate on the data simultaneously, in no particular order.

Besides the mapping unit 1, the device supporting fast artificial neural network operation in this embodiment of the present invention further comprises a storage unit 2, a DMA (direct memory access) unit 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9, as shown in Figure 5.

The storage unit 2 stores data and instructions: it receives and stores externally supplied data and instructions, the data including input neurons, weights, and connection relations.

The mapping unit 1 retrieves the input neurons, weights, and connection relations from the storage unit 2; the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping. Through this mapping of the data, the mapping unit 1 obtains the mapped input neurons and weights and stores them back in the storage unit 2.

The DMA 3 fetches the instructions and the mapped input neurons and weights from the storage unit 2 and distributes them to the instruction cache 4, the input neuron cache 6, and the weight cache 7, respectively.

The control unit 5 reads dedicated instructions from the instruction cache 4, decodes them into operation unit instructions, and feeds them to the operation unit 8.

The operation unit 8 performs the specific computation: according to the operation instructions, it fetches the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them. The operation unit 8 comprises a multiplication unit, which multiplies the mapped neurons by the weight data; an adder tree unit, which sums the products of the first stage stage by stage through an adder tree to complete the vector inner product; and a nonlinear transform unit, which applies a nonlinear transform to the result of the second stage to obtain the output neurons. The nonlinear transform is an activation function, which may be a sigmoid, tanh, ReLU, or softmax function, among others.
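As a purely illustrative model of this three-stage data flow (not the hardware implementation; the activation choice and function names are assumptions), the stages could be sketched in Python as:

```python
import math

def operate(mapped_inputs, mapped_weights, activation=math.tanh):
    """Stage 1: multiplication unit; stage 2: adder-tree reduction; stage 3: activation."""
    products = [x * w for x, w in zip(mapped_inputs, mapped_weights)]
    while len(products) > 1:                         # adder tree, level by level
        products = [products[i] + products[i + 1] if i + 1 < len(products) else products[i]
                    for i in range(0, len(products), 2)]
    inner = products[0] if products else 0.0         # vector inner product
    return activation(inner)                         # nonlinear transform unit

print(operate([0.3, 0.8], [0.2, 0.4]))               # e.g. the pairs I3-W31, I4-W41
```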
The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored into the storage unit 2 via the DMA 3, from which the outside world can retrieve them.
This embodiment also provides a method supporting fast artificial neural network operation which, as shown in Figure 6, comprises the following steps:

S101: read an artificial neural network SIMD instruction to start the fast artificial neural network operation.

S102: the mapping unit fetches all input neurons, weights, and connection relations from the storage unit, processes them to obtain the mapped input neurons and weights, and stores the result in the storage unit.

Specifically, the sparsification mapping unit performs sparsification processing on the input neurons, weights, and connection relations, and the fast mapping unit performs fast mapping processing on them. Both mapping units can process the connection relation with either of the two connection methods and output the mapped neurons and weights from the input neurons and input weights according to the processed connection relation; both connection methods and the output of mapped neurons and weights according to the processed connection relation have been described in detail above and are not repeated here.

S103: the input neuron cache 6 and the weight cache 7 read a portion of the mapped neurons and weights through the DMA 3.

S104: the operation unit fetches the mapped input neurons and weights from the neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.

The operation comprises the following steps, as shown in Figure 7:

S1041: perform the multiplication, multiplying the fetched mapped neurons by the weight data;

S1042: perform the adder tree operation, summing the results of the first stage stage by stage through an adder tree to complete the vector inner product;

S1043: apply a nonlinear transform to the result of the second stage to obtain the output neurons, the nonlinear transform being an activation function, which may be a sigmoid, tanh, ReLU, or softmax function, among others.

S105: the operation unit stores the obtained output neurons in the output neuron cache 9, from which they are stored into the storage unit 2 via the DMA 3.

S106: determine whether all mapped neurons and weights have been processed; if not (N), return to step S103; if so (Y), proceed to step S107.

S107: end the operation.
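The overall control flow of steps S101-S107 can be summarized in the following Python sketch; the helper names and the tanh activation are illustrative assumptions, not part of the disclosed method:

```python
import math

def run_network(inputs, weights, activation=math.tanh):
    """Sketch of S101-S107: map once (S102), then consume the mapped pairs per output
    neuron (S103/S104) until every output has been produced (S105-S107)."""
    outputs = []
    for weight_row in weights:                        # one output neuron per weight row
        # S102: mapping - drop zero-valued inputs and zero weights
        pairs = [(x, w) for x, w in zip(inputs, weight_row) if x != 0 and w != 0]
        # S103/S104: in hardware the caches stream these pairs to the operation unit
        acc = sum(x * w for x, w in pairs)            # multiplication + adder tree
        outputs.append(activation(acc))               # nonlinear transform
    return outputs                                    # S105: written back via the DMA

print(run_network([0, 1, 1, 1], [[1, 0, 1, 1], [0, 1, 1, 0]]))
```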
Another embodiment of the present invention provides a device supporting fast artificial neural network operation, comprising a mapping unit 1, a storage unit 2, a DMA (direct memory access) unit 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9, as shown in Figure 8.

The storage unit 2 stores data and instructions: it receives and stores externally supplied data and instructions, the data including input neurons, weights, and connection relations.

The DMA 3 fetches the instructions from the storage unit 2 and distributes them to the instruction cache 4, and fetches the input neurons, weights, and connection relations from the storage unit 2 and passes them directly to the mapping unit 1 for mapping.

In the mapping unit 1, the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping; through this mapping of the data, the mapping unit 1 obtains the mapped input neurons and weights and transfers them to the neuron cache 6 and the weight cache 7, respectively.

The control unit 5 reads dedicated instructions from the instruction cache 4, decodes them into operation unit instructions, and feeds them to the operation unit 8.

The operation unit 8 performs the specific computation: according to the operation instructions, it fetches the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them. The operation unit 8 comprises a multiplication unit, which multiplies the mapped neurons by the weight data; an adder tree unit, which sums the products of the first stage stage by stage through an adder tree to complete the vector inner product; and a nonlinear transform unit, which applies a nonlinear transform to the result of the second stage to obtain the output neurons. The nonlinear transform is an activation function, which may be a sigmoid, tanh, ReLU, or softmax function, among others.

The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored into the storage unit 2 via the DMA 3, from which the outside world can retrieve them.
This embodiment also provides a method supporting fast artificial neural network operation which, as shown in Figure 9, comprises the following steps:

S201: read an artificial neural network SIMD instruction to start the fast artificial neural network operation.

S202: the mapping unit fetches part of the input neurons, weights, and connection relations from the storage unit through the DMA 3, processes them, and stores the resulting mapped input neurons and weights directly into the neuron cache 6 and the weight cache 7, respectively.

Specifically, the sparsification mapping unit performs sparsification processing on the input neurons, weights, and connection relations, and the fast mapping unit performs fast mapping processing on them. Both mapping units can process the connection relation with either of the two connection methods and output the mapped neurons and weights from the input neurons and input weights according to the processed connection relation; both connection methods and the output of mapped neurons and weights according to the processed connection relation have been described in detail above and are not repeated here.

S203: the operation unit fetches the mapped input neurons and weights from the neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.

The specific operation steps are the same as those of step S104 in the previous embodiment and are not repeated here.

S204: the operation unit stores the obtained output neurons in the output neuron cache 9, from which they are stored into the storage unit 2 via the DMA 3.

S205: determine whether all input neurons and weights have been mapped and processed; if not (N), return to step S202; if so (Y), proceed to step S206.

S206: end the operation.

Compared with the previous embodiment, in this embodiment the sparsification mapping unit and the fast mapping unit of the mapping unit perform the mapping during the computation and feed the mapped data directly to the operation unit, whereas in the previous embodiment the data mapped by the sparsification mapping unit and the fast mapping unit were stored in the storage unit before the operation unit computed; this embodiment therefore runs faster.
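The difference between the two embodiments can be caricatured in software as pre-computing the mapping versus streaming it; the generator-based sketch below is only an analogy under assumed names, not the hardware design:

```python
import math

def premap_then_compute(inputs, weights, activation=math.tanh):
    """Earlier embodiment: map everything first and store it, then compute."""
    mapped = [[(x, w) for x, w in zip(inputs, row) if x != 0 and w != 0]
              for row in weights]                     # whole mapped result produced up front
    return [activation(sum(x * w for x, w in pairs)) for pairs in mapped]

def map_while_computing(inputs, weights, activation=math.tanh):
    """This embodiment: mapped pairs are produced and consumed on the fly."""
    def pairs(row):
        for x, w in zip(inputs, row):
            if x != 0 and w != 0:
                yield x, w                            # streamed straight to the computation
    return [activation(sum(x * w for x, w in pairs(row))) for row in weights]
```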
Yet another embodiment of the present invention provides a system for fast artificial neural network operation which, as shown in Figure 10, comprises an I/O interface 20, a storage device 30, a central processing unit (CPU) 40, and a device 10 supporting fast artificial neural network operation.

The I/O interface 20 is used for I/O data, which is sent by the CPU 40 to the device 10 supporting fast artificial neural network operation and then written by that device into the storage device 30; the dedicated instructions required by the device 10 are likewise transmitted to it by the CPU 40.

The storage device 30 temporarily stores the artificial neural network model and the neuron data, in particular when the entire model cannot be held in the cache of the device 10 supporting fast artificial neural network operation.

The central processing unit (CPU) 40 performs data transfer and basic control such as starting and stopping the device 10 supporting fast artificial neural network operation, and serves as the interface between that device and external control.

The device 10 supporting fast artificial neural network operation accepts data and programs from the CPU 40 and executes the fast artificial neural network operation algorithm; its execution results are transmitted back to the CPU 40.

In this embodiment, the device 10 supporting fast artificial neural network operation is used as a coprocessor of the CPU 40 or of a GPU to execute the fast artificial neural network operation algorithm.

In a further embodiment of the present invention, multiple devices supporting fast artificial neural network operation are interconnected to form a system: they can be interconnected through a PCIE bus to support still larger-scale fast artificial neural network operations, can share a single host CPU or each have its own host CPU, and can share memory or give each accelerator its own memory. Moreover, any interconnection topology may be used.
Whether the technical solution described in the present invention is being used can be checked as follows:

If a device or method employing the technique of the present invention is used, then for a given neural network in which some of the input neurons and weights are 0 or close to 0, the operation speed is higher than that of a device or method that does not employ the technique. Moreover, the larger the proportion of input neurons that are 0 or close to 0 among all input neurons in the network, the greater the speedup; and the larger the proportion of weights that are 0 or close to 0 among all weights in the network, the greater the speedup.
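As a rough illustration of why the speedup grows with the fraction of zeros, the following sketch (hypothetical helper, illustrative thresholds) counts the share of multiply-accumulate operations that the mapping removes for one fully connected layer:

```python
def skipped_fraction(inputs, weights, in_thresh=0.0, w_thresh=0.0):
    """Fraction of multiply-accumulate operations removed by the mapping
    for one fully connected layer (rough, illustrative estimate)."""
    total = skipped = 0
    for row in weights:
        for x, w in zip(inputs, row):
            total += 1
            if abs(x) <= in_thresh or abs(w) <= w_thresh:
                skipped += 1
    return skipped / total if total else 0.0

print(skipped_fraction([0, 1, 1, 1], [[1, 0, 1, 1], [0, 1, 1, 0]]))   # 0.5
```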
The processes or methods depicted in the preceding figures may be performed by processing logic comprising hardware (e.g. circuitry, dedicated logic, etc.), firmware, software (e.g. software carried on a non-transitory computer-readable medium), or a combination thereof. Although the processes or methods are described above in a certain order, it should be understood that some of the described operations can be performed in a different order, and some operations may be performed in parallel rather than sequentially.

It should be noted that implementations not shown or described in the drawings or in the body of the specification are forms known to those of ordinary skill in the art and are not described in detail. Furthermore, the above definitions of the elements and methods are not limited to the specific structures, shapes, or manners mentioned in the embodiments, which may be simply modified or replaced by those of ordinary skill in the art.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that they are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, and so on made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (16)

  1. A device supporting fast artificial neural network operation, comprising:
    a mapping unit (1) that receives input neurons, weights, and the connection relation between the input neurons and the output neurons, optimizes the connection relation, and outputs the mapped input neurons and weights, the correspondence between a mapped input neuron and its weight being an input neuron-weight pair.
  2. The device according to claim 1, wherein the mapping unit (1) comprises:
    a sparsification mapping unit (11) for removing connections in the neural network whose weight is 0 or below a first threshold; and/or
    a fast mapping unit (12) for removing connections whose input neuron is 0 or below a second threshold;
    the sparsification mapping unit (11) and the fast mapping unit (12) each optimizing the connection relation.
  3. The device according to claim 2, wherein the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM, and the optimization of the connection relation by the sparsification mapping unit (11) comprises:
    obtaining, for the j-th output neuron Oj, its corresponding connection relation, which has N bits corresponding to the N nodes of the input layer; initially all N bits are set to 1 and all N input neurons I1, I2, ..., IN are connected to the output neuron Oj; if the weight between the i-th input neuron Ii and the output neuron Oj is 0 or below the first threshold, the i-th bit of the connection relation is set to 0 and there is no connection between Ii and Oj; and the connection relations of all output neurons O1, O2, ..., OM are concatenated into one vector, whose components from position N×(j-1)+1 to position N×j are the processed connection relation of output neuron Oj.
  4. The device according to claim 2, wherein the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM, and the optimization of the connection relation by the sparsification mapping unit (11) comprises:
    obtaining, for the j-th output neuron Oj, its corresponding connection relation, wherein if the weight between the i-th input neuron Ii and the output neuron Oj is 0 or below the first threshold there is no connection between Ii and Oj, and otherwise there is a connection; the n input neurons connected to Oj are Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N; the processed connection relation of output neuron Oj has n entries, the first entry equals i_1 - 1, and the k-th entry equals i_k - i_(k-1), where n ≥ k > 1.
  5. The device according to claim 2, wherein the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM, and the optimization of the connection relation by the fast mapping unit (12) comprises:
    obtaining, for the j-th output neuron Oj, its corresponding connection relation, which has N bits corresponding to the N nodes of the input layer; initially all N bits are set to 1 and all N input neurons I1, I2, ..., IN are connected to the output neuron Oj; if the i-th input neuron Ii is 0 or below the second threshold, the i-th bit of the connection relation is set to 0 and there is no connection between Ii and Oj; and the connection relations of all output neurons O1, O2, ..., OM are concatenated into one vector, whose components from position N×(j-1)+1 to position N×j are the processed connection relation of output neuron Oj.
  6. The device according to claim 2, wherein the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM, and the optimization of the connection relation by the fast mapping unit (12) comprises:
    obtaining, for the j-th output neuron Oj, its corresponding connection relation, wherein if the i-th input neuron Ii is 0 or below the second threshold there is no connection between Ii and Oj, and otherwise there is a connection; the n input neurons connected to Oj are Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N; the processed connection relation of output neuron Oj has n entries, the first entry equals i_1 - 1, and the k-th entry equals i_k - i_(k-1), where n ≥ k > 1.
  7. The device according to any one of claims 1 to 6, further comprising:
    a storage unit (2) for storing externally supplied data and instructions, the data including input neurons, weights, and connection relations, the mapping unit (1) retrieving the input neurons, weights, and connection relations and outputting the mapped input neurons and weights; and
    an operation unit (8) for retrieving the mapped input neurons and weights and operating on them to obtain output neurons.
  8. The device according to claim 7, wherein the operation unit (8) comprises:
    a multiplication unit for multiplying the mapped neurons by the weights;
    an adder tree unit for summing the results obtained by the multiplication unit stage by stage through an adder tree to complete the vector inner product; and
    a nonlinear transform unit for applying a nonlinear transform to the result obtained by the adder tree unit to obtain the output neurons.
  9. The device according to claim 8, wherein the nonlinear transform is an activation function operation, the activation function being selected from a sigmoid function, a tanh function, a ReLU function, or a softmax function.
  10. The device according to claim 7, further comprising:
    an instruction cache unit (4) for caching the instructions;
    an input neuron cache (6) for caching the mapped input neurons;
    a weight cache (7) for caching the mapped weights;
    a control unit (5) for reading the instructions in the instruction cache unit (4) and controlling the operation unit (8) to retrieve the mapped input neurons in the input neuron cache (6) and the mapped weights in the weight cache (7) and operate on them; and
    an output neuron cache (9) for caching the output neurons obtained by the operation unit (8).
  11. The device according to claim 10, wherein the mapped input neurons and weights output by the mapping unit (1) are stored on the storage unit (2), the device further comprising:
    a DMA (3) for retrieving the instructions and the mapped input neurons and weights from the storage unit (2) and storing them in the instruction cache unit (4), the input neuron cache (6), and the weight cache (7), respectively, and for storing the output neurons in the output neuron cache (9) onto the storage unit (2) for transmission to the outside world.
  12. The device according to claim 10, further comprising:
    a DMA (3) for retrieving the instructions on the storage unit (2) and storing them in the instruction cache unit (4), and for retrieving the data on the storage unit (2) and passing it to the mapping unit (1); the mapped input neurons and weights output by the mapping unit (1) are stored in the input neuron cache (6) and the weight cache (7), respectively, and the output neurons in the output neuron cache (9) are stored onto the storage unit (2) for transmission to the outside world.
  13. A method supporting fast artificial neural network operation, using the device according to any one of claims 7 to 12, the method comprising:
    the mapping unit (1) retrieving the input neurons, weights, and connection relations from the storage unit (2) and outputting the mapped input neurons and weights; and
    the operation unit (8) retrieving the mapped input neurons and weights and operating on them to obtain output neurons.
  14. The method according to claim 13, wherein the operation comprises:
    a multiplication, multiplying the mapped neurons by the weights;
    an adder tree operation, summing the results of the multiplication stage by stage through an adder tree to complete the vector inner product; and
    a nonlinear transform, applied to the result of the adder tree operation to obtain the output neurons.
  15. The method according to claim 13, further comprising:
    the mapping unit (1) retrieving all of the input neurons, weights, and connection relations from the storage unit (2), outputting the mapped input neurons and weights, and storing them in the storage unit (2);
    the input neuron cache (6) and the weight cache (7) reading a portion of the mapped input neurons and weights through the DMA (3), to be retrieved by the operation unit (8);
    the output neuron cache (9) caching the output neurons obtained by the operation unit (8) and storing them in the storage unit (2) through the DMA (3); and
    determining whether all of the input neurons and weights have been processed; if so, the operation ends; otherwise, returning to the step in which the input neuron cache (6) and the weight cache (7) read a portion of the mapped input neurons and weights through the DMA (3).
  16. The method according to claim 13, further comprising:
    the mapping unit (1) retrieving part of the input neurons, weights, and connection relations from the storage unit (2) through the DMA (3) and outputting the mapped input neurons and weights;
    the input neuron cache (6) and the weight cache (7) caching the mapped input neurons and weights, to be retrieved by the operation unit (8);
    the output neuron cache (9) caching the output neurons obtained by the operation unit (8) and storing them in the storage unit (2) through the DMA (3); and
    determining whether all of the input neurons and weights have been mapped and processed; if so, the operation ends; otherwise, returning to the step in which the mapping unit (1) retrieves part of the input neurons, weights, and connection relations from the storage unit (2) through the DMA (3).
PCT/CN2016/111737 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operation WO2018112892A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/111737 WO2018112892A1 (en) 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operation

Publications (1)

Publication Number Publication Date
WO2018112892A1 true WO2018112892A1 (en) 2018-06-28

Family

ID=62624217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/111737 WO2018112892A1 (en) 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operation

Country Status (1)

Country Link
WO (1) WO2018112892A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529281A (en) * 2003-10-21 2004-09-15 上海交通大学 Neural network modelling method
CN105701540A (en) * 2016-01-11 2016-06-22 清华大学 Self-generated neural network construction method

Non-Patent Citations (2)

Title
JUNFEI ET AL.: "Dynamic Optimization Structure Design for Neural Networks: Review and Perspective", CONTROL THEORY & APPLICATIONS, vol. 27, no. 3, 31 March 2010 (2010-03-31), ISSN: 1000-8152 *
SUN, HUANLONG ET AL.: "A New Pruning Algorithm for Feedforward Neural Network", JOURNAL OF GUANGXI TEACHERS, vol. 30, no. 4, 31 December 2013 (2013-12-31), ISSN: 1002-8743 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN111222632A (en) * 2018-11-27 2020-06-02 中科寒武纪科技股份有限公司 Computing device, computing method and related product
CN111291884A (en) * 2018-12-10 2020-06-16 中科寒武纪科技股份有限公司 Neural network pruning method and device, electronic equipment and computer readable medium
CN109740739A (en) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural computing device, neural computing method and Related product
CN109740739B (en) * 2018-12-29 2020-04-24 中科寒武纪科技股份有限公司 Neural network computing device, neural network computing method and related products
CN111523653A (en) * 2019-02-03 2020-08-11 上海寒武纪信息科技有限公司 Arithmetic device and method
CN111523653B (en) * 2019-02-03 2024-03-29 上海寒武纪信息科技有限公司 Computing device and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16924610; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16924610; Country of ref document: EP; Kind code of ref document: A1)