WO2018113790A1 - Device and method for artificial neural network operation - Google Patents

Device and method for artificial neural network operation

Info

Publication number
WO2018113790A1
WO2018113790A1 (PCT/CN2017/118124)
Authority
WO
WIPO (PCT)
Prior art keywords
input
neuron
unit
neurons
output
Prior art date
Application number
PCT/CN2017/118124
Other languages
English (en)
French (fr)
Chinese (zh)
Inventor
刘少礼
郝一帆
陈云霁
郭崎
陈天石
Original Assignee
北京中科寒武纪科技有限公司
上海寒武纪信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京中科寒武纪科技有限公司, 上海寒武纪信息科技有限公司
Priority to EP17883465.1A (published as EP3561732A4)
Publication of WO2018113790A1
Priority to US16/444,443 (published as US11775832B2)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 - Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38 - Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/48 - Indexing scheme relating to groups G06F7/48 - G06F7/575
    • G06F2207/4802 - Special implementations
    • G06F2207/4818 - Threshold devices
    • G06F2207/4824 - Neural networks

Definitions

  • the present invention relates to the field of data processing technologies, and more particularly to an apparatus and method for artificial neural network operation.
  • Artificial neural networks, often simply called neural networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Such a network relies on the complexity of the system and processes information by adjusting the interconnection relationships among a large number of internal nodes.
  • the basic algorithm used by neural networks is vector multiplication, and sign functions and their various approximations are widely used.
  • Neural networks are widely used in a variety of application scenarios: computer vision, speech recognition, natural language processing, and so on.
  • In recent years, the scale of neural networks has kept growing.
  • LeCun's neural network for handwritten character recognition had fewer than 1M weights, while the network Krizhevsky used in the 2012 ImageNet competition had 60M weights.
  • Neural networks are applications with high computation and high memory-access demands.
  • the prior art generally uses a general-purpose processor to calculate the artificial neural network.
  • input neurons, output neurons, and weights are stored in three arrays, along with an index array that stores the connection data of each output-input connection.
  • the main operation is the multiplication of neurons by weights. Since weights and neurons are not in one-to-one correspondence, each operation must look up the weight corresponding to a neuron through the index array. Because general-purpose processors are weak in computing power and memory access, the needs of neural networks cannot be met.
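  • For context, here is an illustrative reconstruction (mine, not the patent's) of that prior-art scheme in Python, where the index array must be consulted for every multiply; all values are made up:

```python
# Illustrative sketch of the prior-art CPU scheme: neurons and retained
# weights in separate arrays, plus an index array recording which input
# neuron each retained weight belongs to.
inputs = [0.5, 0.0, 0.9, 0.3]     # input neurons
weights = [0.7, 0.2, 0.4]         # retained weights for one output neuron
index = [0, 2, 3]                 # input position of each retained weight

output = 0.0
for w, i in zip(weights, index):  # every multiply needs an index lookup
    output += w * inputs[i]
print(output)                     # 0.35 + 0.18 + 0.12 = 0.65
```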
  • Another known method of supporting artificial neural network operations and their training algorithms is to use a graphics processing unit (GPU), which supports the above algorithms by executing general SIMD instructions using a general-purpose register file and a general-purpose stream processing unit.
  • However, since the GPU is a device dedicated to graphics operations and scientific computing, it has no special support for artificial neural network operations; a large amount of front-end decoding work is still required to perform them, which brings significant additional overhead.
  • In addition, the GPU has only a small on-chip cache.
  • the model data (weights) of a multi-layer artificial neural network must be repeatedly transferred from off-chip, so off-chip bandwidth becomes the main performance bottleneck and brings huge power consumption overhead.
  • the present invention proposes an apparatus and method for artificial neural network operation.
  • an apparatus for artificial neural network operation comprises a mapping unit that receives input neurons and weights, generates connection relation data between the input neurons and the output neurons, and outputs the mapped input neurons and weights, where the correspondence between the mapped input neurons and weights is the input neuron-weight pair. The mapping unit comprises: a first mapping unit, configured to remove weights whose absolute value is less than or equal to a first threshold, or whose value is 0 or less than the first threshold; and/or a second mapping unit, configured to remove input neurons whose absolute value is less than or equal to a second threshold, or whose value is 0 or less than the second threshold.
  • Further, the first mapping unit includes: a first mapping determining unit, configured to determine whether the absolute value of each input weight is less than or equal to the first threshold, or whether its value is 0 or less than the first threshold; and a first mapping execution unit that generates the connection relation data based on the determination result, removes the weights whose absolute value is less than or equal to the first threshold or whose value is 0 or less than the first threshold, and outputs the input neuron-weight pairs. And/or the second mapping unit includes: a second mapping determining unit, configured to determine whether the absolute value of each input neuron is less than or equal to the second threshold, or whether its value is 0 or less than the second threshold; and a second mapping execution unit that generates the connection relation data based on the determination result, removes the input neurons whose absolute value is less than or equal to the second threshold or whose value is 0 or less than the second threshold, and outputs the input neuron-weight pairs.
  • Further, suppose the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM.
  • the first mapping execution unit of the first mapping unit generates the connection relation data as follows: for the j-th output neuron Oj, the corresponding connection relation data has N bits, one for each of the N input-layer nodes. Initially all N bits are set to 1, meaning all N input neurons I1, I2, ..., IN are connected to Oj.
  • If the absolute value of the weight between the i-th input neuron Ii and Oj is less than or equal to the first threshold, or the value of that weight is 0 or less than the first threshold, the i-th bit of the connection relation data is set to 0, meaning there is no connection between Ii and Oj. The connection relation data of all output neurons O1, O2, ..., OM are combined into one vector, whose (N×(j-1)+1)-th through (N×j)-th components are the connection relation data corresponding to Oj.
  • Further, suppose the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM.
  • the first mapping execution unit of the first mapping unit generates the connection relation data as follows: for the j-th output neuron Oj, if the absolute value of the weight between the i-th input neuron Ii and Oj is less than or equal to the first threshold, or the value of that weight is 0 or less than the first threshold, there is no connection between Ii and Oj; otherwise there is a connection.
  • Let the n input neurons connected with Oj be Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. The connection relation data corresponding to Oj then has n entries: the first entry equals i_1 - 1, and the k-th entry equals i_k - i_(k-1) for n ≥ k > 1.
  • Further, suppose the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM.
  • the second mapping execution unit of the second mapping unit generates the connection relation data as follows: for the j-th output neuron Oj, the corresponding connection relation data has N bits, one for each of the N input-layer nodes. Initially all N bits are set to 1, meaning all N input neurons I1, I2, ..., IN are connected to Oj.
  • If the absolute value of the i-th input neuron Ii is less than or equal to the second threshold, or the value of Ii is 0 or less than the second threshold, the i-th bit of the connection relation data is set to 0, meaning there is no connection between Ii and Oj. The connection relation data of all output neurons O1, O2, ..., OM are combined into one vector, whose (N×(j-1)+1)-th through (N×j)-th components are the connection relation data corresponding to Oj.
  • Further, suppose the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM.
  • the second mapping execution unit of the second mapping unit generates the connection relation data as follows: for the j-th output neuron Oj, if the absolute value of the i-th input neuron Ii is less than or equal to the second threshold, or the value of Ii is 0 or less than the second threshold, there is no connection between Ii and Oj; otherwise there is a connection.
  • Let the n input neurons connected with Oj be Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. The connection relation data corresponding to Oj then has n entries: the first entry equals i_1 - 1, and the k-th entry equals i_k - i_(k-1) for n ≥ k > 1.
  • Further, the apparatus for artificial neural network operation further includes: a storage unit configured to store externally input data and instructions, the data including input neurons and weights, with the mapping unit retrieving the input neurons and weights and outputting the mapped input neurons and weights; and an operation unit configured to retrieve the mapped input neurons and weights and perform operations to obtain the output neurons.
  • the operation unit comprises: a multiplication unit; at least one adder; and/or a nonlinear transform unit.
  • Further, the apparatus for artificial neural network operation further includes: an instruction cache unit for caching the instructions; an input neuron cache for caching the mapped input neurons; a weight cache for caching the mapped weights; a control unit configured to read the instructions in the instruction cache unit, decode them, and control the operation unit to retrieve the mapped input neurons in the input neuron cache and the mapped weights in the weight cache and perform the operation; and an output neuron cache for caching the output neurons obtained by the operation unit.
  • Further, the mapped input neurons and weights output by the mapping unit are stored on the storage unit, and the device further includes a DMA configured to retrieve the instructions and the mapped input neurons and weights on the storage unit and store them respectively to the instruction cache unit, the input neuron cache, and the weight cache, and to store the output neurons in the output neuron cache back on the storage unit for transmission to the outside.
  • Alternatively, the apparatus for artificial neural network operation further includes a DMA for retrieving the instructions stored on the storage unit to the instruction cache unit, and for retrieving the data on the storage unit to the mapping unit; the mapping unit then outputs the mapped input neurons and weights directly to the input neuron cache and the weight cache.
  • a method of artificial neural network operation using the above apparatus comprises: the mapping unit retrieves the input neurons and weights in the storage unit and outputs the mapped input neurons and weights; and the operation unit retrieves the mapped input neurons and weights and performs operations to obtain the output neurons.
  • the operations include: multiplication operations; addition operations; and/or nonlinear transformations.
  • the method further includes: the mapping unit retrieves all the input neurons and weights in the storage unit, outputs the mapped input neurons and weights, and stores them back to the storage unit; the input neuron cache and the weight cache read a portion of the mapped input neurons and weights through DMA, which are then retrieved by the operation unit; the output neuron cache caches the output neurons obtained by the operation unit and stores them to the storage unit through DMA; it is determined whether all the input neurons and weights have been operated on: if so, the operation ends; otherwise, the flow returns to the step in which the input neuron cache and the weight cache read a portion of the mapped input neurons and weights through DMA.
  • the method further includes: the mapping unit retrieves a portion of the input neurons and weights in the storage unit through DMA and outputs the mapped input neurons and weights; the input neuron cache and the weight cache cache the mapped input neurons and weights, which are then retrieved by the operation unit; the output neuron cache caches the output neurons obtained by the operation unit and stores them to the storage unit through DMA; it is determined whether all the input neurons and weights have been mapped and operated on: if so, the operation ends; otherwise, the flow returns to the step in which the mapping unit retrieves a portion of the input neurons and weights in the storage unit through DMA.
  • a chip comprising the above apparatus for artificial neural network operation.
  • an electronic device wherein the electronic device includes the chip.
  • the first mapping unit and/or the second mapping unit generate the connection relation data of the input neurons and weights, thereby reducing the amount of computation, solving the problems of insufficient CPU and GPU performance and large front-end decoding overhead, and effectively improving support for multi-layer artificial neural network algorithms;
  • FIG. 1 is a schematic structural diagram of a mapping unit according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an artificial neural network according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a first connection manner of the first output neuron after the first mapping of the artificial neural network in FIG. 2;
  • FIG. 4 is a schematic diagram of a second connection manner of the first output neuron after the first mapping of the artificial neural network in FIG. 2;
  • FIG. 5 is a schematic structural diagram of an apparatus for artificial neural network operation according to an embodiment of the present invention.
  • FIG. 6 is a flow chart of an operation method of the apparatus for artificial neural network operation in FIG. 5;
  • FIG. 7 is a flow chart showing the operation steps of the operation unit in FIG. 6;
  • FIG. 8 is a schematic structural diagram of an apparatus for artificial neural network operation according to another embodiment of the present invention.
  • FIG. 9 is a flow chart showing an operation method of the apparatus for artificial neural network operation in FIG. 8;
  • FIG. 10 is a schematic structural diagram of a system for artificial neural network operation according to still another embodiment of the present invention.
  • An embodiment of the present invention provides an apparatus for artificial neural network operation, which includes a mapping unit that generates connection relation data and outputs the mapped input neurons and weights; the correspondence between the mapped input neurons and weights is the input neuron-weight pair.
  • This reduces the computational load of the artificial neural network operation and realizes fast artificial neural network computation.
  • the input neuron-weight pair is not a true data storage structure, but merely represents the correspondence between the input neurons and the weights.
  • For example, the input neurons are stored in a vector A and the weights in a vector B, the vectors A and B have the same length, and the components at the same position of A and B are jointly regarded as one input neuron-weight pair.
  • At the time of operation, the input neurons and weights can be placed separately in different caches and used by the operation unit.
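  • As a minimal illustration of this convention (my own sketch, not part of the patent; all values are made up), with Python lists standing in for the two caches:

```python
# Vectors A (mapped input neurons) and B (mapped weights) have the same
# length; the components at the same position form one input neuron-weight
# pair. The pair is a correspondence, not a stored structure.
A = [0.7, 1.2, 0.4]   # mapped input neurons (hypothetical values)
B = [0.3, -0.5, 0.9]  # mapped weights, aligned with A

assert len(A) == len(B)
pairs = list(zip(A, B))                    # conceptual pairing only
partial = [a * w for a, w in pairs]        # per-pair multiplications
print(pairs, sum(partial))                 # inner product of A and B
```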
  • FIG. 1 is a schematic structural diagram of the mapping unit according to an embodiment of the present invention. The input data include input neurons and weights; the input data are fed into the mapping unit 1, and the mapping unit 1 outputs the mapped input neurons and weights, whose correspondence is the input neuron-weight pair.
  • the mapping unit comprises a first mapping unit 11 and/or a second mapping unit 12. The first mapping unit 11 is configured to perform the first mapping operation: it can be used to remove weights whose value is 0 or less than a first threshold ε′, where optionally 0 < ε′ ≤ 0.2, for example ε′ = 0.1, 0.08, 0.05, 0.02, or 0.01; it can also be used to remove weights whose absolute value is less than or equal to a first threshold ε, where optionally 0 ≤ ε ≤ 0.2, for example ε = 0, 0.1, 0.08, 0.05, 0.02, or 0.01.
  • the second mapping unit 12 is configured to perform the second mapping operation: correspondingly, it can be used to remove input neurons whose value is 0 or less than a second threshold δ′, where optionally 0 < δ′ ≤ 0.2, for example 0.1, 0.08, 0.05, 0.02, or 0.01; it can also be used to remove input neurons whose absolute value is less than or equal to a second threshold δ, where optionally 0 ≤ δ ≤ 0.2, for example 0, 0.1, 0.08, 0.05, 0.02, or 0.01.
  • the thresholds mentioned here may or may not be equal.
  • the following description assumes that the first mapping operation removes the weights whose absolute value is less than or equal to the first threshold ε, and that the second mapping operation removes the input neurons whose absolute value is less than or equal to the second threshold δ.
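  • For illustration only, a sketch of the two removal criteria in Python (the symbols ε and δ follow the description above; the concrete threshold values are assumptions, not the patent's):

```python
import numpy as np

EPS = 0.05    # first threshold ε (applied to weights), assumed value
DELTA = 0.05  # second threshold δ (applied to input neurons), assumed value

def first_mapping_keep(weights: np.ndarray) -> np.ndarray:
    """First mapping: keep a weight only if |w| > EPS."""
    return np.abs(weights) > EPS

def second_mapping_keep(neurons: np.ndarray) -> np.ndarray:
    """Second mapping: keep an input neuron only if |x| > DELTA."""
    return np.abs(neurons) > DELTA
```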
  • the first mapping unit 11 includes a first mapping determining unit 111 and a first mapping executing unit 112.
  • the first mapping determining unit 111 determines whether the absolute value of each input weight is less than or equal to the first threshold ε. Based on the result of the first mapping determining unit, the first mapping execution unit 112 generates the connection relation data and converts the input data into input neuron-weight pairs according to the generated connection relation data.
  • the connection relation data generated by the first mapping execution unit 112 of the first mapping unit 11 may be expressed in the following two manners:
  • First manner: one bit is used per connection, where 1 indicates that the absolute value of the weight between the input neuron and the output neuron is greater than the first threshold ε and the connection is retained, and 0 indicates that the absolute value is less than or equal to ε and the connection is removed; the bits of each output neuron with respect to all input neurons form a string of 0s and 1s representing that output neuron's connection relation data, and the connection relation data of all output neurons are combined into one vector.
  • Second manner: a connection is retained or removed according to whether the absolute value of the weight is greater than the first threshold ε (retained if greater, removed otherwise); the connection relation data of an output neuron record the distance from the position of its first connection to the first input neuron, the distance from its second connected input neuron to the previous connected input neuron, the distance from its third connected input neuron to the previous connected input neuron, and so on, until all connected inputs of that output neuron are exhausted.
  • the second mapping unit 12 includes a second mapping determining unit 121 and a second mapping execution unit 122. The second mapping determining unit 121 determines whether the absolute value of each input neuron is less than or equal to the second threshold δ. Based on the result of the second mapping determining unit, the second mapping execution unit 122 generates the connection relation data and converts the input data into input neuron-weight pairs according to the generated connection relation data.
  • the connection relation data generated by the second mapping execution unit 122 of the second mapping unit 12 may likewise be expressed in two manners:
  • First manner: a connection is retained or removed according to whether the absolute value of the input neuron is greater than the second threshold δ (retained if greater, removed otherwise).
  • Second manner: the connection relation data of an output neuron record the distance from the position of its first connection to the first input neuron, the distance from its second connected input neuron to the previous connected input neuron, the distance from its third connected input neuron to the previous connected input neuron, and so on, until all connected inputs of that output neuron are exhausted.
  • In an artificial neural network, for two adjacent layers, the K-th layer is called the input layer and the (K+1)-th layer is called the output layer. That is, except for the topmost layer, each layer can serve as an input layer, the next layer is the corresponding output layer, and the number of neurons in each layer is known in advance.
  • Let the input layer be composed of N input neurons I1, I2, ..., IN, and the output layer be composed of M output neurons O1, O2, ..., OM.
  • The first connection manner: each output neuron Oj obtains its corresponding connection relation data. Since the input layer has N nodes, the connection relation data has N bits, each of value 1 or 0; a value of 1 at the i-th bit indicates that there is a connection between Ii and Oj, and 0 indicates no connection. Initially, all N bits are set to 1. If the absolute value of the input neuron Ii is less than or equal to the second threshold δ, or the absolute value of the weight between Ii and Oj is less than or equal to the first threshold ε, the i-th bit of the connection relation data is set to 0, meaning there is no connection between Ii and Oj. The connection relation data of all output neurons are then combined into one vector, whose (N×(j-1)+1)-th through (N×j)-th components are the connection relation data of Oj.
  • In this manner, the number of input-layer neurons equals the number of stored bits in each output neuron's connection relation data, so even the simplest one-dimensional array taking only 0/1 values clearly expresses the connection relation data of each output neuron.
  • The second connection manner: for each output neuron Oj, its corresponding connection relation data is obtained. If the absolute value of the input neuron Ii is less than or equal to the second threshold δ, or the absolute value of the weight between Ii and Oj is less than or equal to the first threshold ε, Ii and Oj are considered not connected; otherwise they are connected. Let the input neurons connected with Oj be Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. Then the connection relation data has n entries: the first entry equals i_1 - 1, and for n ≥ k > 1 the k-th entry equals i_k - i_(k-1).
  • In this manner, the connection relation data can be represented by a high-dimensional dynamic array, a linked list, or the like.
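  • To make the two representations concrete, here is an illustrative Python sketch (my own, not from the patent) that derives both encodings for one output neuron, applying the weight and neuron criteria together as in the combined case:

```python
import numpy as np

def bitmap_connection(neurons, weights, eps=0.05, delta=0.05):
    """First manner: one bit per input; 1 = connected, 0 = removed.
    A connection survives only if both |weight| > eps and |neuron| > delta
    (threshold values are assumptions)."""
    return ((np.abs(weights) > eps) & (np.abs(neurons) > delta)).astype(int)

def distance_connection(bits):
    """Second manner: first entry is the distance of the first connection
    from the first input neuron; each later entry is the gap to the
    previous connected input. Assumes at least one connection."""
    idx = np.flatnonzero(bits) + 1                 # 1-based positions i_1..i_n
    return [int(idx[0] - 1)] + [int(d) for d in np.diff(idx)]

# FIG. 2 data for O1: I1 = 0 and W21 = 0 (nonzero values are made up).
neurons = np.array([0.0, 1.0, 1.0, 1.0])
weights = np.array([0.9, 0.0, 0.8, 0.7])
bits = bitmap_connection(neurons, weights)
print(bits, distance_connection(bits))             # [0 0 1 1] and [2, 1]
```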
  • After the connection relation data are generated, the mapping execution unit of the mapping unit outputs the mapped input neurons and weights according to the connection relation data; the correspondence between the mapped input neurons and weights is the input neuron-weight pair, and the mapped input neurons and weights can be used directly during the operation.
  • The specific process by which the first mapping execution unit 112 of the first mapping unit 11 and the second mapping execution unit 122 of the second mapping unit 12 generate the connection relation data from the input data and output the mapped input neurons and weights is illustrated below.
  • the connection relation data can take two representations: one uses one bit per input-output neuron pair to indicate whether there is a connection; the other uses the distance between connections to record the location of each connection.
  • As shown in FIG. 2, the artificial neural network has 4 input neurons I1, I2, I3, I4 and 2 output neurons O1, O2. The weights of the connections are denoted W11, W21, W31, W41, W12, W22, W32, W42. Let I1 have the value 0 while I2, I3, I4 are nonzero; let W21, W12, W42 be 0 while the remaining weights are nonzero.
  • the first mapping unit and the second mapping unit may process the data at the same time, or may process the data one after the other in either order. The following description assumes that the first mapping unit processes the data first.
  • the first connection manner is expressed as follows:
  • Initially, the connection relation data of O1 and O2 both default to 1111, placed together as 11111111. After the first mapping operation of judging the weights is performed, as shown in FIG. 3, the connection relation data of output neuron O1 is 1011, where each bit indicates whether there is a connection with the corresponding input (1 means connected, 0 means not connected), and the connection relation data of output neuron O2 is 0110.
  • the input neurons and weights whose connection relation bit is 0 are not operated on.
  • When the connection relation data are stored, they can be stored in the order of the output neurons: the bits for all inputs of each output neuron are placed in turn and combined into one vector, giving 10110110 for the above example.
  • If the second mapping operation of judging the input neuron values is also performed, then since the first input neuron I1 has the value 0, the connections issued from I1 are removed: the connection relation data of output neuron O1 becomes 0011 (the first bit changes from 1 to 0), the connection relation data of output neuron O2 remains 0110, and the final placement is 00110110.
  • If only the second mapping operation is performed, the connection relation data of output neuron O1 is 0111 and that of output neuron O2 is 0111, finally placed as 01110111.
  • the second connection manner is expressed as follows:
  • Initially, the connection relation data of O1 and O2 both default to 0, 1, 1, 1. After the first mapping operation of judging the weights is performed, as shown in FIG. 4, output neuron O1 is connected to the input neurons I1, I3, and I4, and its connection relation data is 0, 2, 1: the 0 indicates that the first connection is at distance 0 from the first input neuron, i.e., it is the first input neuron; the 2 indicates that the second connected input neuron is at distance 2 from the previous connected input neuron, i.e., it is the third input neuron; and the 1 indicates that the third connected input neuron is at distance 1 from the previous one, i.e., it is the fourth input neuron. Similarly, the connection relation data of O2 is 1, 1.
  • If the second mapping operation of judging the input neuron values is also performed, then since I1 has the value 0 the connection from I1 is removed, so the connection relation data of O1 becomes 2, 1 while that of O2 is unchanged.
  • If only the second mapping operation is performed, the connection relation data of O1 and O2 are both 1, 1, 1.
  • the first mapping unit 11 and the second mapping unit 12 output the mapped neurons and weights according to the connection relationship data obtained above, and the correspondence between the mapped neurons and the weights is an input neuron-weight pair.
  • the input neuron-weight pairs can be used directly in the operation. Take the mapping process for output neuron O1 of the artificial neural network shown in FIG. 2 as an example:
  • the input neurons are: I1, I2, I3, I4, and the input weights are: W11, W21, W31, W41, where I1, W21 take 0, and the rest are non-zero.
  • After the first mapping operation, the connection relation data is 1011 in the first manner, or 0, 2, 1 in the second manner.
  • After both the first and the second mapping operations, the connection relation data is 0011 in the first manner, or 2, 1 in the second manner.
  • According to the connection relation data, the mapping execution units of the two mapping units remove the input neurons with the value 0 together with the connections issued from them, as well as the zero-valued weights. The mapped input neurons are I3 and I4, the mapped weights are W31 and W41, and the input neuron-weight pairs are I3-W31 and I4-W41.
  • the obtained input neuron vector is (I3, I4), and the obtained weight vector is (W31, W41).
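  • The O1 example above can be checked end to end with a short sketch (the nonzero values are invented; only the zero pattern follows FIG. 2):

```python
import numpy as np

I = np.array([0.0, 0.6, 0.5, 0.8])   # I1..I4, with I1 = 0
W = np.array([0.9, 0.0, 0.7, 0.4])   # W11, W21, W31, W41, with W21 = 0

keep = (I != 0) & (W != 0)           # first and second mapping combined
print(keep.astype(int))              # [0 0 1 1], i.e. connection data 0011

mapped_neurons = I[keep]             # (I3, I4)
mapped_weights = W[keep]             # (W31, W41)
o1 = float(mapped_neurons @ mapped_weights)  # 2 multiplies instead of 4
```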
  • the two mapping units preferably operate on the data at the same time; otherwise they may operate in either order.
  • As shown in FIG. 5, the apparatus for artificial neural network operation in this embodiment of the present invention includes, in addition to the mapping unit 1, a storage unit 2, a DMA (direct memory access) 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9.
  • the storage unit 2 is configured to store data and instructions: it receives and stores externally input data and instructions, the data including input neurons and weights.
  • the mapping unit 1 retrieves the input neurons and weights in the storage unit 2, performs the first mapping operation through the first mapping unit 11 and/or the second mapping operation through the second mapping unit 12, and thereby obtains the mapped input neurons and weights, which are stored back in the storage unit 2.
  • the DMA 3 retrieves the instructions in the storage unit 2 together with the mapped input neurons and weights, and allocates them respectively to the instruction cache 4, the input neuron cache 6, and the weight cache 7.
  • the control unit 5 reads the dedicated instruction from the instruction buffer 4 and decodes it into an arithmetic unit instruction and inputs it to the arithmetic unit 8.
  • the operation unit 8 is configured to execute specific operations: according to the operation instruction, it retrieves the mapped input neurons in the input neuron cache 6 and the mapped weights in the weight cache 7 and operates on them.
  • the operation performed by the arithmetic unit 8 includes a neural network calculation.
  • the operation unit includes, but is not limited to: a first part, a multiplier; a second part, one or more adders (more specifically, the adders of the second part form an adder tree); a third part, an activation function unit; and/or a fourth part, a vector processing unit. More specifically, the vector processing unit can handle vector operations and/or pooling operations.
  • the first part multiplies the input data 1 (in1) by the input data 2 (in2) to obtain the multiplied output (out), i.e., out = in1 × in2; the second part adds its input data step by step through the adder tree to obtain the output.
  • the third part converts the input data (in) into activated output data (out) through the activation function (active), i.e., out = active(in).
  • the fourth part performs the pooling operation, out = pool(in), where pool denotes the pooling operation.
  • pooling operations include, but are not limited to: average pooling, max pooling, and median pooling; the input data in are the data in a pooling kernel associated with the output out.
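  • As a rough functional sketch of these parts (mine, not the patent's circuit; the tanh choice is just one of the listed activation options), the first three stages reduce to:

```python
import numpy as np

def operation_unit(neurons: np.ndarray, weights: np.ndarray,
                   activation=np.tanh) -> float:
    """Multiply (part 1), accumulate via adder tree (part 2), then apply
    the activation function (part 3). Pooling (part 4) is omitted here."""
    products = neurons * weights        # part 1: element-wise multiply
    acc = products.sum()                # part 2: adder tree / inner product
    return float(activation(acc))      # part 3: nonlinear transform

out = operation_unit(np.array([0.5, 0.8]), np.array([0.7, 0.4]))
```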
  • the output neuron cache 9 is used to store the output neurons obtained by the operation unit; these are then stored in the storage unit 2 through the DMA 3, and the outside can retrieve the output neurons stored in the storage unit 2.
  • This embodiment also provides a method for artificial neural network operation, as shown in FIG. 6, comprising the following steps:
  • S101: read an artificial neural network SIMD instruction to start the artificial neural network operation.
  • S102: the mapping unit retrieves all the input neurons and weights in the storage unit, processes them to obtain the mapped input neurons and weights, and stores these in the storage unit.
  • the first mapping unit performs a first mapping process on the input neurons and the weights
  • the second mapping unit performs a second mapping process on the input neurons and the weights.
  • Both mapping units can use the two connection manners to generate the connection relation data and, according to the connection relation data, output the mapped neurons and weights from the input neurons and input weights.
  • the two connection manners and the output of the mapped neurons and weights according to the connection relation data have been described in detail above and are not repeated here.
  • S103: the input neuron cache 6 and the weight cache 7 read a portion of the mapped neurons and weights through the DMA 3.
  • S104: the operation unit retrieves the mapped input neurons and weights in the input neuron cache 6 and the weight cache 7, performs the operation, and obtains the output neurons.
  • In one embodiment, as shown in FIG. 7, the specific operation includes the following steps:
  • S1041: perform the multiplication operation, multiplying the mapped neuron data by the weight data;
  • S1042: perform the adder tree operation, adding the results of the first stage step by step through the adder tree to complete the vector inner product operation;
  • S1043: perform a nonlinear transformation on the result of the second stage to obtain the output neurons; the nonlinear transformation is an activation function operation, where the activation function may be a sigmoid, tanh, ReLU, or softmax function.
  • It should be noted that the adder tree operation does not have to operate on the results of the first stage: it can operate directly on the data fed into the adder tree unit. Likewise, the nonlinear transformation does not have to be based on the result of the second stage and can directly transform the data fed into the nonlinear transform unit.
  • S105: the operation unit stores the obtained output neurons into the output neuron cache 9, and they are stored in the storage unit 2 through the DMA 3.
  • S106: it is determined whether all the mapped neurons and weights have been operated on. If not, the flow returns to step S103; if so, step S107 is executed and the operation ends.
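  • A compact, runnable sketch of this S102-S106 loop (illustrative only; the thresholds, batch size, activation, and all values are assumptions):

```python
import numpy as np

def run_layer(I, W, delta=0.0, eps=0.0, batch=2, act=np.tanh):
    """Map once (S102), then stream the mapped pairs through the operation
    stages in portions (S103-S105) until all pairs are consumed (S106)."""
    outputs = []
    for w_row in W:                                      # one output neuron each
        keep = (np.abs(I) > delta) & (np.abs(w_row) > eps)  # S102: mapping
        n, w = I[keep], w_row[keep]                      # mapped pairs
        acc = 0.0
        for s in range(0, n.size, batch):                # S103: read a portion
            acc += float(n[s:s+batch] @ w[s:s+batch])    # S104: multiply + add
        outputs.append(act(acc))                         # S105: store output
    return np.array(outputs)                             # S106: all pairs done

print(run_layer(np.array([0.0, 0.6, 0.5, 0.8]),
                np.array([[0.9, 0.0, 0.7, 0.4]])))
```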
  • Another embodiment of the present invention provides an apparatus for artificial neural network operation, including a mapping unit 1, a storage unit 2, a DMA (direct memory access) 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9, as shown in FIG. 8.
  • the storage unit 2 is configured to store data and instructions: it receives and stores externally input data and instructions, the data including input neurons and weights.
  • the DMA 3 retrieves the instructions in the storage unit 2 and allocates them to the instruction cache 4, and retrieves the input neurons and weights in the storage unit 2 and sends them directly to the mapping unit 1 for mapping.
  • the mapping unit 1 performs the first mapping operation through the first mapping unit 11 and/or the second mapping operation through the second mapping unit 12, thereby obtains the mapped input neurons and weights, and transmits them respectively to the input neuron cache 6 and the weight cache 7.
  • the control unit 5 reads the dedicated instruction from the instruction buffer 4 and decodes it into an arithmetic unit instruction and inputs it to the arithmetic unit 8.
  • the operation unit 8 is configured to execute specific operations: according to the operation instruction, it retrieves the mapped input neurons in the input neuron cache 6 and the mapped weights in the weight cache 7 and operates on them.
  • the operation unit 8 has been described in detail before, and will not be described again.
  • the output neuron cache 9 is used to store the output neurons obtained by the operation unit; these are then stored in the storage unit 2 through the DMA 3, and the outside can retrieve the output neurons stored in the storage unit 2.
  • This embodiment also provides a method for artificial neural network operation, as shown in FIG. 9, comprising the following steps:
  • S201: read an artificial neural network SIMD instruction to start the artificial neural network operation.
  • S202: the mapping unit retrieves part of the input neurons and weights in the storage unit through the DMA 3 and processes them; the mapped input neurons and weights are stored directly into the input neuron cache 6 and the weight cache 7.
  • the first mapping unit performs a first mapping process on the input neurons and the weights
  • the second mapping unit performs a second mapping process on the input neurons and the weights.
  • Both mapping units can use the two connection manners to generate the connection relation data and, according to the connection relation data, output the mapped neurons and weights from the input neurons and input weights.
  • the two connection manners and the output of the mapped neurons and weights according to the connection relation data have been described in detail above and are not repeated here.
  • S203: the operation unit retrieves the mapped input neurons and weights in the input neuron cache 6 and the weight cache 7, performs the operation, and obtains the output neurons.
  • S204: the operation unit stores the obtained output neurons into the output neuron cache 9, and they are stored in the storage unit 2 through the DMA 3.
  • S205: it is determined whether all the input neurons and weights have been mapped and operated on. If not, the flow returns to step S202; if so, step S206 is executed and the operation ends.
  • the difference from the previous embodiment is that here the first mapping unit and the second mapping unit of the mapping unit perform the mapping during the computation, with the mapped data fed directly to the operation unit, whereas in the previous embodiment the data mapped by the first mapping unit and the second mapping unit are stored in the storage unit in advance of the computation by the operation unit.
  • This embodiment therefore computes faster.
  • Another embodiment of the present invention provides a system for artificial neural network operation. As shown in FIG. 10, it includes an I/O interface 20, a storage device 30, a central processing unit (CPU) 40, and an apparatus 10 for artificial neural network operation.
  • the I/O interface 20 is used for I/O data, which are sent by the CPU 40 to the apparatus 10 for artificial neural network operation and then written by the apparatus 10 to the storage device 30; the dedicated instructions required by the apparatus 10 are likewise transmitted by the CPU 40 to the apparatus 10.
  • the storage device 30 is used to temporarily store the artificial neural network model and neuron data, particularly when the whole model cannot fit in the on-chip cache of the apparatus 10 for artificial neural network operation.
  • the central processing unit (CPU) 40 is used for basic control, such as starting and stopping data transfer and the artificial neural network computation, and serves as the interface between the apparatus 10 for artificial neural network operation and external control.
  • the apparatus 10 for artificial neural network operation is configured to accept data and programs from the CPU 40 and execute the artificial neural network operation algorithm; the execution results are transmitted back to the CPU 40.
  • the apparatus 10 supporting the artificial neural network operation is used as a coprocessor of the CPU 40 or the GPU to execute an artificial neural network operation algorithm.
  • a plurality of apparatuses for artificial neural network operation can also be interconnected to form a system: they can be interconnected through a PCIe bus to support larger-scale artificial neural network operations; they can share the same host CPU or each have its own host CPU; and they can share memory, or each accelerator can have its own memory.
  • the interconnection method can be any interconnection topology.
  • such an artificial neural network computing system can be used as an SoC (system on chip) for mobile phones, robots, drones, video surveillance equipment, and the like, effectively reducing the core area of the control part, increasing the processing speed, and lowering the overall power consumption.
  • In one embodiment, a universal interconnect interface of the combined processing device is coupled to certain components of the device, such as a camera, a monitor, a mouse, a keyboard, a network card, or a Wi-Fi interface.
  • the present disclosure also discloses a chip that includes the above apparatus for artificial neural network operation or the above system for artificial neural network operation.
  • the present disclosure discloses a chip package structure that includes the chip described above.
  • the present disclosure discloses a board that includes the chip package structure described above.
  • the present disclosure discloses an electronic device that includes the above-described board.
  • the electronic device may include a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
  • the vehicle includes an airplane, a ship, and/or a car;
  • the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, a range hood;
  • the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.
  • When a device or method employing the techniques of the present invention performs neural network operations, then for a given neural network, if the absolute values of some input neurons and weights in the network are equal to 0 or close to 0, the operation speed improves over devices or methods that do not employ these techniques. Moreover, the larger the proportion of input neurons whose absolute value is 0 or close to 0 among all input neurons in the network, and the larger the proportion of weights whose absolute value is 0 or close to 0 among all weights in the network, the greater the increase in operation speed.
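  • As a back-of-the-envelope illustration of this claim (my arithmetic, not the patent's): if the mapping removes a fraction p of the input neuron-weight pairs, the multiply-accumulate count of a fully connected layer falls from N×M to roughly (1-p)×N×M:

```python
def mac_count(n_inputs: int, n_outputs: int, removed_fraction: float) -> int:
    """Approximate MACs for a fully connected layer after the mapping
    removes a fraction of the input neuron-weight pairs (toy model)."""
    return round(n_inputs * n_outputs * (1.0 - removed_fraction))

dense = mac_count(1024, 512, 0.0)   # 524288 MACs without mapping
sparse = mac_count(1024, 512, 0.9)  # ~52429 MACs when 90% of pairs are removed
print(dense / sparse)               # about 10x fewer operations
```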

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Neurology (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Feedback Control In General (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Devices For Executing Special Programs (AREA)
  • Complex Calculations (AREA)
PCT/CN2017/118124 2016-12-23 2017-12-22 Device and method for artificial neural network operation WO2018113790A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17883465.1A 2016-12-23 2017-12-22 Operating device and method for an artificial neural network (EP3561732A4)
US16/444,443 US11775832B2 (en) 2016-12-23 2019-06-18 Device and method for artificial neural network operation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611214028 2016-12-23
CN201611214028.0 2016-12-23

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/444,443 Continuation-In-Part US11775832B2 (en) 2016-12-23 2019-06-18 Device and method for artificial neural network operation

Publications (1)

Publication Number Publication Date
WO2018113790A1 true WO2018113790A1 (zh) 2018-06-28

Family

ID=62624548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/118124 WO2018113790A1 (zh) 2016-12-23 2017-12-22 Device and method for artificial neural network operation

Country Status (4)

Country Link
US (1) US11775832B2 (de)
EP (1) EP3561732A4 (de)
CN (4) CN108320018B (de)
WO (1) WO2018113790A1 (de)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102637733B1 (ko) * 2018-10-31 2024-02-19 삼성전자주식회사 뉴럴 네트워크 프로세서 및 그것의 컨볼루션 연산 방법
CN109740739B (zh) * 2018-12-29 2020-04-24 中科寒武纪科技股份有限公司 神经网络计算装置、神经网络计算方法及相关产品
US11099788B2 (en) * 2019-10-21 2021-08-24 Advanced Micro Devices, Inc. Near-memory data reduction
KR20210084123A (ko) * 2019-12-27 2021-07-07 삼성전자주식회사 전자 장치 및 그 제어 방법
US20220138579A1 (en) * 2020-11-02 2022-05-05 International Business Machines Corporation Weight repetition on rpu crossbar arrays

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529281A (zh) * 2003-10-21 2004-09-15 上海交通大学 神经网络建模方法
CN105701540A (zh) * 2016-01-11 2016-06-22 清华大学 一种自生成神经网络构建方法

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0349820B1 (de) * 1988-07-05 1995-04-19 Siemens Aktiengesellschaft Netzwerk -Baustein und Architektur für die programmierbare Emulation künstlicher neuronaler Netze mit digitaler Arbeitsweise
US5274745A (en) * 1989-07-28 1993-12-28 Kabushiki Kaisha Toshiba Method of processing information in artificial neural networks
JP5115965B2 (ja) * 2007-10-01 2013-01-09 独立行政法人理化学研究所 ニューロン装置、神経回路網装置、非負整数符号化装置、整数クラスタ装置、フィードバック制御装置、ならびに、プログラム
CN101546389A (zh) * 2008-03-26 2009-09-30 中国科学院半导体研究所 一种主方向神经网络系统
JP4766101B2 (ja) * 2008-11-10 2011-09-07 ソニー株式会社 触行動認識装置及び触行動認識方法、情報処理装置、並びにコンピューター・プログラム
DE102008058016A1 (de) * 2008-11-19 2010-11-04 Optiminig Gmbh System und Verfahren zur rechnerbasierten Analyse großer Datenmengen
CN102111626A (zh) * 2009-12-23 2011-06-29 新奥特(北京)视频技术有限公司 一种rgb到cmyk色彩空间的转换方法和装置
CN101834827B (zh) * 2010-03-29 2012-07-18 大唐联诚信息系统技术有限公司 一种多输入多输出系统中的信号检测方法和装置
US8700552B2 (en) * 2011-11-28 2014-04-15 Microsoft Corporation Exploiting sparseness in training deep neural networks
CN102665049B (zh) * 2012-03-29 2014-09-17 中国科学院半导体研究所 基于可编程视觉芯片的视觉图像处理系统
US9730643B2 (en) * 2013-10-17 2017-08-15 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks
US20160026912A1 (en) * 2014-07-22 2016-01-28 Intel Corporation Weight-shifting mechanism for convolutional neural networks
CN105550749A (zh) * 2015-12-09 2016-05-04 四川长虹电器股份有限公司 一种新型网络拓扑结构的卷积神经网络的构造方法
CN106127301B (zh) * 2016-01-16 2019-01-11 上海大学 一种随机神经网络硬件实现装置
CN107609642B (zh) * 2016-01-20 2021-08-31 中科寒武纪科技股份有限公司 计算装置和方法
WO2017155544A1 (en) * 2016-03-11 2017-09-14 Hewlett Packard Enterprise Development Lp Hardware accelerators for calculating node values of neural networks
CN105844330B (zh) * 2016-03-22 2019-06-28 华为技术有限公司 神经网络处理器的数据处理方法及神经网络处理器
CN106022468B (zh) * 2016-05-17 2018-06-01 成都启英泰伦科技有限公司 人工神经网络处理器集成电路及该集成电路的设计方法
US20180046903A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Deep processing unit (dpu) for implementing an artificial neural network (ann)
WO2018058509A1 (en) * 2016-09-30 2018-04-05 Intel Corporation Dynamic neural network surgery

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529281A (zh) * 2003-10-21 2004-09-15 上海交通大学 神经网络建模方法
CN105701540A (zh) * 2016-01-11 2016-06-22 清华大学 一种自生成神经网络构建方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIAO, JUNFEI ET AL.: "Dynamic optimization structure design for neural networks: review and perspective", CONTROL THEORY & APPLICATIONS, vol. 27, no. 3, 1 March 2010 (2010-03-01), pages 350 - 357, XP055610827, ISSN: 1000-8152, DOI: 10.7641/j.issn.1000-8152 *
SUN, HUANLONG ET AL.: "Research on New pruning algorithm for Feedforward Neural Network Structure", JOURNAL OF GUANGXI TEACHERS EDUCATION UNIVERSITY (NATURAL SCIENCE EDITION), vol. 30, no. 4, 31 December 2013 (2013-12-31), pages 55 - 60, XP009515365, ISSN: 1002-8743 *

Also Published As

Publication number Publication date
US11775832B2 (en) 2023-10-03
CN108334944B (zh) 2020-04-17
CN108320018A (zh) 2018-07-24
US20190311266A1 (en) 2019-10-10
CN111160547B (zh) 2024-04-09
EP3561732A4 (de) 2020-04-01
CN108334944A (zh) 2018-07-27
CN111160547A (zh) 2020-05-15
CN111126590B (zh) 2023-09-29
CN108320018B (zh) 2020-03-06
EP3561732A1 (de) 2019-10-30
CN111126590A (zh) 2020-05-08

Similar Documents

Publication Publication Date Title
WO2018113790A1 Device and method for artificial neural network operation
CN107609642B Computing device and method
US11698786B2 Processing apparatus and processing method
US11307865B2 Data processing apparatus and method
US20200097806A1 Processing method and accelerating device
EP3451157B1 Device and method for performing a forward operation of a convolutional neural network
WO2018108126A1 Neural network convolution operation device and method
CN110298443B Neural network operation device and method
CN108427990B Neural network computing system and method
KR102470264B1 Apparatus and method for executing reverse training of a fully connected layer neural network
EP3944157A1 Device and method for performing training of a convolutional neural network
CN111260025B Device and operation method for performing LSTM neural network operations
WO2017185387A1 Device and method for performing the forward operation of a fully connected layer neural network
WO2017185389A1 Device and method for performing matrix multiplication
WO2021208612A1 Data processing method and device
EP3564863B1 Device for executing LSTM neural network operation, and operation method
EP3451238A1 Device and method for executing a pooling operation
WO2018058427A1 Neural network operation device and method
WO2018112892A1 Device and method for supporting fast artificial neural network operation
CN109711540B Computing device and board card
CN107305486B Neural network maxout layer computing device
WO2017177446A1 Device and method for the reverse training of artificial neural networks supporting discrete data representation
WO2017185248A1 Device and method for performing artificial neural network self-learning operation
CN113741977B Data operation method, data operation device, and data processor
CN111860772B Device and method for performing artificial neural network pooling operation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17883465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017883465

Country of ref document: EP

Effective date: 20190723