WO2018113790A1 - Apparatus and method for artificial neural network operation - Google Patents

Apparatus and method for artificial neural network operation

Info

Publication number
WO2018113790A1
Authority
WO
WIPO (PCT)
Prior art keywords
input
neuron
unit
neurons
output
Application number
PCT/CN2017/118124
Other languages
English (en)
French (fr)
Inventor
刘少礼
郝一帆
陈云霁
郭崎
陈天石
Original Assignee
北京中科寒武纪科技有限公司
上海寒武纪信息科技有限公司
Application filed by 北京中科寒武纪科技有限公司 and 上海寒武纪信息科技有限公司
Priority to EP17883465.1A (published as EP 3561732 A4)
Publication of WO2018113790A1
Priority to US16/444,443 (published as US 11775832 B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38 Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/48 Indexing scheme relating to groups G06F7/48 - G06F7/575
    • G06F2207/4802 Special implementations
    • G06F2207/4818 Threshold devices
    • G06F2207/4824 Neural networks

Description

  • The present invention relates to the field of data processing technologies, and more particularly to an apparatus and method for artificial neural network operation.
  • Artificial Neural Networks (ANNs), referred to simply as neural networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Such a network depends on the complexity of the system and achieves the purpose of processing information by adjusting the interconnection relationships among a large number of internal nodes.
  • The algorithm used by neural networks is vector multiplication, and sign functions and their various approximations are widely adopted.
  • Neural networks are widely used in a variety of application scenarios, such as computer vision, speech recognition, and natural language processing.
  • In recent years, the scale of neural networks has kept growing.
  • In 1998, LeCun's neural network for handwritten character recognition had fewer than 1M weights; in 2012, Krizhevsky's entry in the ImageNet competition had 60M weights.
  • A neural network is an application with high computation and high memory-access demands; the more weights there are, the larger both the computation and the memory traffic become.
  • As the computation and memory traffic of neural networks increase sharply, the prior art generally uses a general-purpose processor to compute artificial neural networks.
  • For a general-purpose processor, input neurons, output neurons, and weights are stored in three arrays, along with an index array that stores the connection relation data for each output-input connection.
  • During computation, the main operation is multiplying neurons by weights. Since weights and neurons are not in one-to-one correspondence, each operation must look up the weight corresponding to a neuron through the index array. Because the computing power and memory-access capability of general-purpose processors are both weak, the needs of neural networks cannot be met. Moreover, when multiple general-purpose processors execute in parallel, the communication among them becomes a performance bottleneck; when computing a pruned neural network, every multiplication has to look up the weight's position in the index array again, adding extra computation and memory-access overhead, so computing a neural network is time-consuming and power-hungry. A general-purpose processor also has to decode a multi-layer artificial neural network operation into a long sequence of arithmetic and memory-access instructions, and the processor's front-end decoding brings a large power overhead.
  • Another known method of supporting artificial neural network operations and their training algorithms is to use a graphics processing unit (GPU), which supports the above algorithms by executing general SIMD instructions using a general-purpose register file and a general-purpose stream processing unit.
  • However, since the GPU is a device dedicated to graphics operations and scientific computing, with no dedicated support for artificial neural network operations, a large amount of front-end decoding work is still required to perform artificial neural network operations, which brings a lot of extra overhead.
  • In addition, the GPU has only a small on-chip cache, so the model data (weights) of a multi-layer artificial neural network have to be carried repeatedly from off-chip; the off-chip bandwidth becomes the main performance bottleneck and brings a huge power overhead.
  • In view of the problems of the existing solutions and to overcome the shortcomings of the above prior art, the present invention proposes an apparatus and method for artificial neural network operation.
  • According to one aspect, an apparatus for artificial neural network operation is provided, comprising a mapping unit that receives input neurons and weights, generates connection relation data between input neurons and output neurons, and outputs the mapped input neurons and weights, the correspondence between the mapped input neurons and weights being the input neuron-weight pair. The mapping unit comprises: a first mapping unit for removing weights whose absolute value is less than or equal to a first threshold, or whose value is 0 or less than the first threshold; and/or a second mapping unit for removing input neurons whose absolute value is less than or equal to a second threshold, or whose value is 0 or less than the second threshold.
  • In some embodiments, the first mapping unit includes: a first mapping determination unit for determining whether the absolute value of each input weight is less than or equal to the first threshold, or whether the value of each input weight is 0 or less than the first threshold; and a first mapping execution unit that generates the connection relation data based on the determination result of the first mapping determination unit, removes the weights whose absolute value is less than or equal to the first threshold or whose value is 0 or less than the first threshold, and outputs the input neuron-weight pairs. And/or the second mapping unit includes: a second mapping determination unit for determining whether the absolute value of each input neuron is less than or equal to the second threshold, or whether the value of each input neuron is 0 or less than the second threshold; and a second mapping execution unit that generates the connection relation data based on the determination result of the second mapping determination unit, removes the input neurons whose absolute value is less than or equal to the second threshold or whose value is 0 or less than the second threshold, and outputs the input neuron-weight pairs.
  • In some embodiments, the input layer of the neural network has N input neurons I_1, I_2, ..., I_N and the output layer has M output neurons O_1, O_2, ..., O_M. The first mapping execution unit of the first mapping unit generates the connection relation data as follows: for the j-th output neuron O_j, its corresponding connection relation data is obtained; corresponding to the N nodes of the input layer, the connection relation data has N bits. Initially, all N bits are set to 1, meaning all N input neurons I_1, I_2, ..., I_N are connected to the output neuron O_j. If the absolute value of the weight between the i-th input neuron I_i and the output neuron O_j is less than or equal to the first threshold, or the value of that weight is 0 or less than the first threshold, the i-th bit of the connection relation data is set to 0, meaning there is no connection between I_i and O_j. The connection relation data of all output neurons O_1, O_2, ..., O_M are then concatenated into one vector, whose (N×(j-1)+1)-th through (N×j)-th components are the connection relation data corresponding to the output neuron O_j.
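  • For example, with N = 4 input neurons and M = 2 output neurons, components 1 through 4 of the concatenated vector hold the connection relation data of O_1 and components 5 through 8 hold that of O_2; this matches the arrangement 10110110 obtained for the network of FIG. 2 in the worked example below.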
  • In some embodiments, the input layer of the neural network has N input neurons I_1, I_2, ..., I_N and the output layer has M output neurons O_1, O_2, ..., O_M. The first mapping execution unit of the first mapping unit generates the connection relation data as follows: for the j-th output neuron O_j, its corresponding connection relation data is obtained; if the absolute value of the weight between the i-th input neuron I_i and the output neuron O_j is less than or equal to the first threshold, or the value of that weight is 0 or less than the first threshold, there is no connection between I_i and O_j; otherwise there is a connection. Let the n input neurons connected with O_j be I_{i_1}, I_{i_2}, ..., I_{i_n}, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. The connection relation data corresponding to the output neuron O_j then has n entries: the first entry equals i_1 - 1, and the k-th entry equals i_k - i_(k-1), where n ≥ k > 1.
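  • For example, with N = 4 and connections retained from I_1, I_3, and I_4 to O_j (so n = 3, i_1 = 1, i_2 = 3, i_3 = 4), the connection relation data is 1-1 = 0, 3-1 = 2, 4-3 = 1, i.e., 0, 2, 1; this matches the FIG. 4 worked example below.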
  • In some embodiments, the input layer of the neural network has N input neurons I_1, I_2, ..., I_N and the output layer has M output neurons O_1, O_2, ..., O_M. The second mapping execution unit of the second mapping unit generates the connection relation data as follows: for the j-th output neuron O_j, its corresponding connection relation data is obtained; corresponding to the N nodes of the input layer, the connection relation data has N bits. Initially, all N bits are set to 1, meaning all N input neurons I_1, I_2, ..., I_N are connected to the output neuron O_j. If the absolute value of the i-th input neuron I_i is less than or equal to the second threshold, or the value of I_i is 0 or less than the second threshold, the i-th bit of the connection relation data is set to 0, meaning there is no connection between I_i and O_j. The connection relation data of all output neurons O_1, O_2, ..., O_M are concatenated into one vector, whose (N×(j-1)+1)-th through (N×j)-th components are the connection relation data corresponding to the output neuron O_j.
  • In some embodiments, the input layer of the neural network has N input neurons I_1, I_2, ..., I_N and the output layer has M output neurons O_1, O_2, ..., O_M. The second mapping execution unit of the second mapping unit generates the connection relation data as follows: for the j-th output neuron O_j, its corresponding connection relation data is obtained; if the absolute value of the i-th input neuron I_i is less than or equal to the second threshold, or the value of I_i is 0 or less than the second threshold, there is no connection between I_i and O_j; otherwise there is a connection. Let the n input neurons connected with O_j be I_{i_1}, I_{i_2}, ..., I_{i_n}, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. The connection relation data corresponding to the output neuron O_j then has n entries: the first entry equals i_1 - 1, and the k-th entry equals i_k - i_(k-1), where n ≥ k > 1.
  • In some embodiments, the apparatus for artificial neural network operation further includes: a storage unit for storing externally input data and instructions, the data including input neurons and weights, where the mapping unit retrieves the input neurons and weights and outputs the mapped input neurons and weights; and an operation unit for retrieving the mapped input neurons and weights and performing the operation to obtain the output neurons.
  • In some embodiments, the operation unit includes: a multiplication unit; at least one adder; and/or a non-linear transform unit.
  • In some embodiments, the apparatus for artificial neural network operation further includes: an instruction cache unit for caching the instructions; an input neuron cache for caching the mapped input neurons; a weight cache for caching the mapped weights; a control unit for reading the instructions in the instruction cache unit and controlling the operation unit to retrieve the mapped input neurons in the input neuron cache and the mapped weights in the weight cache and perform the operation; and an output neuron cache for caching the output neurons obtained by the operation unit.
  • In some embodiments, the mapped input neurons and weights output by the mapping unit are stored in the storage unit, and the apparatus further includes: a DMA for retrieving the instructions and the mapped input neurons and weights from the storage unit, storing them respectively into the instruction cache unit, the input neuron cache, and the weight cache, and storing the output neurons in the output neuron cache back into the storage unit for transmission to the outside.
  • In some embodiments, the apparatus for artificial neural network operation further includes: a DMA for retrieving the instructions from the storage unit into the instruction cache unit and retrieving the data from the storage unit into the mapping unit, where the mapped input neurons and weights output by the mapping unit are stored respectively into the input neuron cache and the weight cache, and the output neurons in the output neuron cache are stored back into the storage unit for transmission to the outside.
  • According to another aspect, a method of artificial neural network operation using the above apparatus is provided, the method comprising: the mapping unit retrieves the input neurons and weights in the storage unit and outputs the mapped input neurons and weights; the operation unit retrieves the mapped input neurons and weights and performs the operation to obtain the output neurons.
  • In some embodiments, the operations include: multiplication; addition; and/or a non-linear transform.
  • In some embodiments, the method further includes: the mapping unit retrieves all of the input neurons and weights in the storage unit, outputs the mapped input neurons and weights, and stores them into the storage unit; the input neuron cache and the weight cache read part of the mapped input neurons and weights through DMA, and these are retrieved by the operation unit; the output neuron cache caches the output neurons obtained by the operation unit and stores them into the storage unit through DMA; it is determined whether all the input neurons and weights have been operated on: if so, the operation ends; otherwise, the flow returns to the step in which the input neuron cache and the weight cache read part of the mapped input neurons and weights through DMA.
  • In some embodiments, the method further includes: the mapping unit retrieves part of the input neurons and weights in the storage unit through DMA and outputs the mapped input neurons and weights; the input neuron cache and the weight cache cache the mapped input neurons and weights, and these are retrieved by the operation unit; the output neuron cache caches the output neurons obtained by the operation unit and stores them into the storage unit through DMA; it is determined whether all the input neurons and weights have been mapped and operated on: if so, the operation ends; otherwise, the flow returns to the step in which the mapping unit retrieves part of the input neurons and weights in the storage unit through DMA.
  • According to a further aspect, a chip is provided that includes the above apparatus for artificial neural network operation.
  • According to yet another aspect, an electronic device is provided that includes the chip.
  • It can be seen from the above technical solutions that the present invention has the following beneficial effects: (1) through the first mapping unit and/or the second mapping unit, connection relation data between the input neurons and the weights are generated, which reduces the amount of computation, solves the problems of insufficient CPU and GPU computing performance and high front-end decoding overhead, and effectively improves support for multi-layer artificial neural network operation algorithms; (2) by adopting a dedicated on-chip cache for the multi-layer artificial neural network operation algorithm, the reusability of the input neurons and the weight data is fully exploited, repeated reading of these data from memory is avoided, the memory access bandwidth is reduced, and memory bandwidth is prevented from becoming a performance bottleneck of the multi-layer artificial neural network operation and its training algorithm.
  • FIG. 1 is a schematic structural diagram of a mapping unit according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an artificial neural network according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a first connection manner of the first output neuron after the first mapping of the artificial neural network in FIG. 2;
  • FIG. 4 is a schematic diagram of a second connection manner of the first output neuron after the first mapping of the artificial neural network in FIG. 2;
  • FIG. 5 is a schematic structural diagram of an apparatus for artificial neural network operation according to an embodiment of the present invention.
  • FIG. 6 is a flow chart of an operation method of the apparatus for artificial neural network operation in FIG. 5;
  • FIG. 7 is a flow chart of the operation steps of the operation unit in FIG. 6;
  • FIG. 8 is a schematic structural diagram of an apparatus for artificial neural network operation according to another embodiment of the present invention.
  • FIG. 9 is a flow chart showing an operation method of the apparatus for artificial neural network operation in FIG. 8;
  • FIG. 10 is a schematic structural diagram of a system for artificial neural network operation according to still another embodiment of the present invention.
  • An embodiment of the present invention provides an apparatus for artificial neural network operation, which includes a mapping unit that generates connection relation data and outputs the mapped input neurons and weights, the correspondence between the mapped input neurons and weights being the input neuron-weight pair. This reduces the computational load of the artificial neural network operation and realizes fast artificial neural network computation.
  • The input neuron-weight pair is not an actual data storage structure; it merely represents the correspondence between the input neurons and the weights. For example, if the input neurons are stored in a vector A and the weights in a vector B, with A and B of the same length, then the components of A and B at the same position, taken together, are considered one input neuron-weight pair.
  • During the operation, the input neurons and weights can be placed separately in different caches and used by the operation unit.
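
As a minimal illustration of this convention (a sketch with hypothetical values, not data from the patent), the pairing is purely positional:

```python
# Mapped input neurons and mapped weights kept as two parallel vectors;
# the components at the same position of A and B form one
# input neuron-weight pair.
A = [0.5, 1.2, 0.7]      # mapped input neurons
B = [0.3, 0.9, 0.4]      # mapped weights
pairs = list(zip(A, B))  # [(0.5, 0.3), (1.2, 0.9), (0.7, 0.4)]
```
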
  • FIG. 1 is a schematic structural diagram of the mapping unit in an embodiment of the present invention. As shown in FIG. 1, the input data include input neurons and weights; the input data are fed into the mapping unit 1, which outputs the mapped input neurons and weights, and the correspondence between the mapped input neurons and weights is the input neuron-weight pair.
  • The mapping unit comprises a first mapping unit 11 and/or a second mapping unit 12. The first mapping unit 11 performs the first mapping operation; it can be used to remove weights whose value is 0 or less than a first threshold α', where optionally 0 < α' < 0.2, for example 0.1, 0.08, 0.05, 0.02, or 0.01; it can also be used to remove weights whose absolute value is less than or equal to a first threshold α, where optionally 0 ≤ α < 0.2, for example 0, 0.1, 0.08, 0.05, 0.02, or 0.01. The second mapping unit 12 performs the second mapping operation; correspondingly, it can be used to remove input neurons whose value is 0 or less than a second threshold β', where optionally 0 < β' < 0.2, for example 0.1, 0.08, 0.05, 0.02, or 0.01; it can also be used to remove input neurons whose absolute value is less than or equal to a second threshold β, where optionally 0 ≤ β < 0.2, for example 0, 0.1, 0.08, 0.05, 0.02, or 0.01. The two thresholds α and β may or may not be equal. The following description explains only the case in which the first mapping operation removes weights whose absolute value is less than or equal to the first threshold α, and the second mapping operation removes input neurons whose absolute value is less than or equal to the second threshold β.
  • The first mapping unit 11 includes a first mapping determination unit 111 and a first mapping execution unit 112. The first mapping determination unit 111 determines whether the absolute value of each input weight is less than or equal to the first threshold α; based on this result, the first mapping execution unit 112 generates the connection relation data and converts the input data into input neuron-weight pairs according to the generated connection relation data.
  • The connection relation data generated by the first mapping execution unit 112 of the first mapping unit 11 may be expressed in either of the following two ways:
  • First way: a 1 indicates that the absolute value of the weight between the input neuron and the output neuron is greater than the first threshold α, so the connection between them is retained; a 0 indicates that the absolute value of the weight is less than or equal to the first threshold α, so the connection between them is removed. The connections of each output neuron with all input neurons form a string of 0s and 1s representing the connection relation data of that output neuron, and the connection relation data of all output neurons are concatenated into one vector.
  • Second way: a connection is retained or removed according to whether the absolute value of the weight is greater than the first threshold α: if greater, it is retained, otherwise removed. The connection relation data of an output neuron is represented by the distance from the position of its first connection to the first input neuron, the distance from its second connected input neuron to the previous connected input neuron, the distance from its third connected input neuron to the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
  • The second mapping unit 12 includes a second mapping determination unit 121 and a second mapping execution unit 122. The second mapping determination unit 121 determines whether the absolute value of each input neuron is less than or equal to the second threshold β; based on this result, the second mapping execution unit 122 generates the connection relation data and converts the input data into input neuron-weight pairs according to the generated connection relation data.
  • The connection relation data generated by the second mapping execution unit 122 of the second mapping unit 12 may likewise be expressed in either of the following two ways:
  • First way: a 1 indicates that the absolute value of the input neuron is greater than the second threshold β, so the connection between that input neuron and the output neuron is retained; a 0 indicates that the absolute value of the input neuron is less than or equal to the second threshold β, so the connection is removed. Each output neuron together with all of its input neurons forms a string of 0s and 1s representing the connection relation data of that output neuron, and the connection relation data of all output neurons are concatenated into one vector.
  • Second way: a connection is retained or removed according to whether the absolute value of the input neuron is greater than the second threshold β: if greater, it is retained, otherwise removed. The connection relation data of an output neuron is represented by the distance from the position of its first connection to the first input neuron, the distance from its second connected input neuron to the previous connected input neuron, the distance from its third connected input neuron to the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
  • Specifically, suppose the neural network has L layers and let K = 1, 2, ..., L-1. For the K-th and (K+1)-th layers, the K-th layer is called the input layer and the (K+1)-th layer is called the output layer; that is, except for the topmost layer, each layer can serve as an input layer, and the next layer is the corresponding output layer. The number of neurons in each layer is known in advance. Assuming that both the first mapping unit and the second mapping unit perform their respective operations, the computation between each pair of input and output layers proceeds as follows.
  • Let the input layer be composed of N input neurons I_1, I_2, ..., I_N and the output layer of M output neurons O_1, O_2, ..., O_M, where i = 1, 2, ..., N and j = 1, 2, ..., M.
  • The first connection method:
  • First, for each output neuron O_j, its corresponding connection relation data is obtained. Since the input layer has N nodes, the connection relation data has N bits; each bit has the value 1 or 0, where a 1 in the i-th bit indicates a connection between I_i and O_j, and a 0 indicates no connection. Initially, all N bits are set to 1. If the absolute value of the input neuron I_i is less than or equal to the second threshold β, or if the absolute value of the weight between I_i and O_j is less than or equal to the first threshold α, the i-th bit of the connection relation data is set to 0, i.e., I_i and O_j are considered unconnected. Then the connection relation data of all output neurons are concatenated into one vector, whose (N×(j-1)+1)-th through (N×j)-th components are the connection relation data corresponding to the output neuron O_j.
  • In this method, the number of input-layer neurons equals the number of storage bits of the connection relation data of each output neuron, so even the simplest one-dimensional array holding only 0/1 values makes the connection relation data of every output neuron explicit.
  • The second connection method: for each output neuron O_j, its corresponding connection relation data is obtained. If the absolute value of the input neuron I_i is less than or equal to the second threshold β, or if the absolute value of the weight between I_i and O_j is less than or equal to the first threshold α, then I_i and O_j are considered unconnected; otherwise they are connected. Let the input neurons connected with O_j be I_{i_1}, I_{i_2}, ..., I_{i_n}, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. The connection relation data then has n entries: the first entry equals i_1 - 1, and for n ≥ k > 1, the k-th entry equals i_k - i_(k-1).
  • In this method, the connection relation data can be represented by a high-dimensional dynamic array, a linked list, or the like.
  • After the connection relation data is generated, the mapping execution unit of the mapping unit outputs the mapped input neurons and weights according to the connection relation data; the correspondence between the mapped input neurons and weights is the input neuron-weight pair, and the mapped input neurons and weights can be used directly during the operation.
  • In summary, in the above mapping unit, the first mapping execution unit 112 of the first mapping unit 11 and the second mapping execution unit 122 of the second mapping unit 12 generate connection relation data based on the input data and output the mapped input neurons and weights. The connection relation data can take two forms of representation: one uses one bit between each input neuron and each output neuron to indicate whether a connection exists; the other uses the distances between connections to represent the position of each connection. To make the functions of the two mapping units clearer, the data operations in each of the two units are given below.
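
The two representations and the mapping step translate into code directly. The following Python sketch is illustrative only: the function names, the numpy-array layout (weights[i, j] holding the weight between I_(i+1) and O_(j+1)), and the default thresholds are assumptions, not part of the patent; it removes a connection when |I_i| <= β or |W_ij| <= α, matching the convention chosen above.

```python
import numpy as np

def connection_bits(inputs, weights, alpha=0.0, beta=0.0):
    """First way: one bit per (input, output) pair; a bit is 0 when
    |I_i| <= beta or |W_ij| <= alpha (the connection is removed).
    Returns the per-output bit strings concatenated into one vector."""
    N, M = weights.shape
    bits = np.ones((M, N), dtype=int)
    for j in range(M):
        for i in range(N):
            if abs(inputs[i]) <= beta or abs(weights[i, j]) <= alpha:
                bits[j, i] = 0
    return bits.reshape(-1)           # O_1 bits first, then O_2, ...

def connection_distances(inputs, weights, alpha=0.0, beta=0.0):
    """Second way: for each output neuron, the distance of the first
    retained connection from I_1 (i.e. i_1 - 1), then the distance of
    each further retained connection from the previous one."""
    N, M = weights.shape
    data = []
    for j in range(M):
        kept = [i for i in range(N)
                if abs(inputs[i]) > beta and abs(weights[i, j]) > alpha]
        if not kept:                  # no connection retained
            data.append([])
            continue
        data.append([kept[0]] + [b - a for a, b in zip(kept, kept[1:])])
    return data

def mapped_pairs(inputs, weights, alpha=0.0, beta=0.0):
    """Mapping-unit output per output neuron: the mapped input neurons
    and mapped weights as two parallel vectors (input neuron-weight pairs)."""
    N, M = weights.shape
    pairs = []
    for j in range(M):
        kept = [i for i in range(N)
                if abs(inputs[i]) > beta and abs(weights[i, j]) > alpha]
        pairs.append(([inputs[i] for i in kept],
                      [weights[i, j] for i in kept]))
    return pairs
```
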
  • Take the artificial neural network shown in FIG. 2 as an example, using whether the value is 0 as the criterion for illustration. The network has 4 input neurons I1, I2, I3, I4 and 2 output neurons O1, O2, and the weights of the connections are denoted W11, W21, W31, W41, W12, W22, W32, W42. Let the value of I1 be 0 and I2, I3, I4 be non-zero; let W21, W12, W42 be 0 and the remaining weights be non-zero.
  • The first mapping unit and the second mapping unit may process the data at the same time or one after the other, and their order is interchangeable. The following description assumes that the first mapping unit processes the data first.
  • With the first connection method, the representation is as follows:
  • In the first mapping unit 11, if the weight-judging operation is not performed, the connection relation data of O1 and O2 default to 1111, with the concatenated arrangement 11111111. If the weight-judging operation is performed, as shown in FIG. 3, the connection relation data of output neuron O1 is 1011, where each bit indicates whether there is a connection with the corresponding input (1 means connected, 0 means not connected), and the connection relation data of output neuron O2 is 0110. During the operation, input neurons and weights whose connection relation bit is 0 are not operated on.
  • When storing the connection relation data, it can be stored in the order of the output neurons: all inputs of each output neuron are laid out in turn and concatenated into one vector; for the example above, the resulting arrangement is 10110110.
  • In the second mapping unit 12, if the input-neuron-judging operation is not performed, the connection relation data of O1 and O2 and their arrangement remain unchanged. If the input-neuron-judging operation is performed on the network of FIG. 3 (i.e., after the first mapping operation), the connection relation data of output neuron O1 becomes 0011: the first bit changes from 1 to 0 because the first input neuron I1 has the value 0, so the connection issued from I1 is removed; the connection relation data of output neuron O2 is 0110, and the final arrangement is 00110110. For a network on which the first mapping operation has not been performed, the connection relation data of output neuron O1 is 0111 and that of output neuron O2 is 0111, with the final arrangement 01110111.
  • With the second connection method, the representation is as follows:
  • In the first mapping unit 11, if the weight-judging operation is not performed, the connection relation data of O1 and O2 default to 0, 1, 1, 1. If the weight-judging operation is performed, as shown in FIG. 4, output neuron O1 is connected with input neurons I1, I3, and I4, so the connection relation data is 0, 2, 1: the 0 indicates that the first connection is at distance 0 from the first input neuron, i.e., it is the first input neuron; the 2 indicates that the second connected input neuron is at distance 2 from the previous connected input neuron, i.e., it is the third input neuron; and the 1 indicates that the third connected input neuron is at distance 1 from the previous connected input neuron, i.e., it is the fourth input neuron. Similarly, the connection relation data of O2 is 1, 1.
  • In the second mapping unit 12, if the input-neuron-judging operation is not performed, the connection relation data of O1 and O2 remain unchanged. If the input-neuron-judging operation is performed on the network of FIG. 4 (i.e., after the first mapping operation), then because the first input neuron I1 has the value 0, the connection issued from I1 is removed, so the connection relation data of output neuron O1 becomes 2, 1 and that of O2 is 1, 1. For a network on which the first mapping operation has not been performed, the connection relation data of both O1 and O2 are 1, 1, 1.
  • The first mapping unit 11 and the second mapping unit 12 output the mapped neurons and weights according to the connection relation data obtained above; the correspondence between the mapped neurons and weights is the input neuron-weight pair, which can be used directly during the operation. Take the specific mapping process for output neuron O1 in the artificial neural network of FIG. 2 as an example:
  • The input neurons are I1, I2, I3, I4 and the input weights are W11, W21, W31, W41, where I1 and W21 are 0 and the rest are non-zero.
  • First, in the first mapping unit 11, the connection relation data is 1011 (or 0, 2, 1). Then, in the second mapping unit 12, the connection relation data becomes 0011 (or 2, 1).
  • The mapping execution units of the two mapping units, according to the connection relation data, output the result of removing the input neurons whose value is 0 together with the connection weights issued from them: the mapped input neurons are I3 and I4, the mapped weights are W31 and W41, and the input neuron-weight pairs are I3-W31 and I4-W41. For example, if the mapped input neurons and weights are stored as vectors, the resulting input neuron vector is (I3, I4) and the resulting weight vector is (W31, W41).
  • Although the above example performs the first mapping operation first and then the second mapping operation, in practical applications the two mapping units preferably operate on the data simultaneously, in no particular order.
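
Running the sketch given earlier on this example (with hypothetical non-zero values standing in for I2-I4 and the non-zero weights) reproduces the numbers in the text:

```python
inputs = np.array([0.0, 0.4, 0.5, 0.6])    # I1 = 0
weights = np.array([[0.2, 0.0],            # W11, W12 (W12 = 0)
                    [0.0, 0.3],            # W21 = 0, W22
                    [0.7, 0.6],            # W31, W32
                    [0.8, 0.0]])           # W41, W42 (W42 = 0)

print(connection_bits(inputs, weights))       # [0 0 1 1 0 1 1 0] -> 00110110
print(connection_distances(inputs, weights))  # [[2, 1], [1, 1]]
print(mapped_pairs(inputs, weights)[0])       # ([0.5, 0.6], [0.7, 0.8]),
                                              # i.e. I3-W31, I4-W41
```
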
  • In addition to the mapping unit 1, the apparatus for artificial neural network operation in this embodiment of the present invention includes: a storage unit 2, a DMA (direct memory access) 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9, as shown in FIG. 5.
  • The storage unit 2 stores data and instructions; it receives and stores externally input data and instructions, the data including input neurons and weights.
  • The mapping unit 1 retrieves the input neurons and weights from the storage unit 2; the first mapping unit 11 performs the first mapping operation and the second mapping unit 12 performs the second mapping operation; by mapping the data, the mapping unit 1 obtains the mapped input neurons and weights and stores them into the storage unit 2.
  • The DMA 3 retrieves the instructions and the mapped input neurons and weights from the storage unit 2 and distributes them to the instruction cache 4, the input neuron cache 6, and the weight cache 7, respectively.
  • The control unit 5 reads the dedicated instructions from the instruction cache 4, decodes them into operation-unit instructions, and inputs them to the operation unit 8.
  • The operation unit 8 executes the specific operations: according to the operation instructions, it retrieves the mapped input neurons from the input neuron cache 6 and the mapped weights from the weight cache 7 and performs the operations.
  • The operations performed by the operation unit 8 include neural network computation.
  • In one embodiment, the computing unit includes, but is not limited to: a first part, a multiplier; a second part, one or more adders (more specifically, the adders of the second part form an adder tree); a third part, an activation function unit; and/or a fourth part, a vector processing unit. More specifically, the vector processing unit can handle vector operations and/or pooling operations.
  • The first part multiplies input data 1 (in1) by input data 2 (in2) to obtain the multiplied output (out): out = in1 * in2.
  • The second part adds the input data in1 through adders to obtain the output data (out). More specifically, when the second part is an adder tree, it adds in1 level by level through the adder tree to obtain out, where in1 is a vector of length N with N > 1: out = in1[1] + in1[2] + ... + in1[N]; and/or it accumulates in1 through the adder tree and then adds in2: out = in1[1] + in1[2] + ... + in1[N] + in2; or it adds in1 and in2: out = in1 + in2.
  • The third part applies an activation function (active) to the input data (in) to obtain the activated output data (out): out = active(in), where the activation function active may be sigmoid, tanh, relu, softmax, etc. Besides activation, the third part can implement other non-linear functions, obtaining the output data (out) from the input data (in) through an operation (f): out = f(in).
  • The vector processing unit obtains the pooled output data (out) from the input data (in) through a pooling operation: out = pool(in), where pool is the pooling operation, which includes but is not limited to average pooling, max pooling, and median pooling; the input data in are the data in a pooling kernel associated with the output out.
  • One or more of the above parts can be freely selected and combined in different orders, so as to realize operations with various functions.
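
The formulas above map directly onto code. The following is a minimal numpy sketch of the four parts of the operation unit; the function names and the dictionary-based pooling dispatch are illustrative assumptions, not the patent's instruction set:

```python
import numpy as np

def multiply(in1, in2):
    # First part: out = in1 * in2 (elementwise over the mapped pairs)
    return np.multiply(in1, in2)

def adder_tree(in1, in2=None):
    # Second part: out = in1[1] + in1[2] + ... + in1[N] (+ in2, if given)
    out = np.sum(in1)
    return out + in2 if in2 is not None else out

def activate(x, active=np.tanh):
    # Third part: out = active(in); active may be sigmoid, tanh, relu, ...
    return active(x)

def pool(x, mode="avg"):
    # Fourth part: out = pool(in) over the data of one pooling kernel
    return {"avg": np.mean, "max": np.max, "median": np.median}[mode](x)
```
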
  • The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored into the storage unit 2 through the DMA 3, and the outside can retrieve the output neurons stored in the storage unit 2.
  • This embodiment also provides a method for artificial neural network operation, as shown in FIG. 6, comprising the following steps:
  • S101: Read an artificial neural network SIMD instruction, which starts the artificial neural network operation.
  • S102: The mapping unit retrieves all the input neurons and weights from the storage unit, processes them to obtain the mapped input neurons and weights, and stores these into the storage unit.
  • Specifically, the first mapping unit performs the first mapping process on the input neurons and weights, and the second mapping unit performs the second mapping process on them. Both mapping units can use either of the two connection methods to generate the connection relation data and output the mapped neurons and weights from the input neurons and input weights according to the connection relation data; the two connection methods and the output of the mapped neurons and weights according to the connection relation data have been described in detail above and are not repeated here.
  • S103: The input neuron cache 6 and the weight cache 7 read part of the mapped neurons and weights through the DMA 3.
  • S104: The operation unit retrieves the mapped input neurons and weights from the neuron cache 6 and the weight cache 7, performs the operation, and obtains the output neurons.
  • In one embodiment, as shown in FIG. 7, the specific operation includes the following steps:
  • S1041: Perform the multiplication operation, multiplying the retrieved mapped neuron and weight data;
  • S1042: Perform the adder tree operation, adding the results of the first stage level by level through the adder tree to complete the vector inner product operation;
  • S1043: Perform a non-linear transform on the result of the second stage to obtain the output neurons, where the non-linear transform is an activation function operation; the activation function may be a sigmoid, tanh, ReLU, or softmax function, etc.
  • In other embodiments, the adder tree operation is not necessarily based on the result of the first stage but may operate directly on the data fed into the adder tree unit, and the non-linear transform is not necessarily based on the result of the second stage but may directly transform the data fed into the non-linear transform unit.
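
Putting stages S1041-S1043 together, a self-contained sketch of computing one output neuron from its mapped input neuron-weight pairs (a hypothetical helper, not the patent's instruction set) might look like this:

```python
import numpy as np

def output_neuron(mapped_inputs, mapped_weights, active=np.tanh):
    # S1041: multiplication stage - multiply mapped neurons by weights
    products = np.multiply(mapped_inputs, mapped_weights)
    # S1042: adder tree stage - sum the products level by level,
    # completing the vector inner product
    inner = np.sum(products)
    # S1043: non-linear transform (activation function) -> output neuron
    return active(inner)

# For O1 in the FIG. 2 example, only the pairs I3-W31 and I4-W41 remain:
o1 = output_neuron(np.array([0.5, 0.6]), np.array([0.7, 0.8]))
```
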
  • S105: The operation unit stores the obtained output neurons into the output neuron cache 9, and they are stored into the storage unit 2 through the DMA 3.
  • S106: It is determined whether all the mapped neurons and weights have been operated on. If the result is N, the flow returns to step S103; if the result is Y, step S107 is performed.
  • S107: End the operation.
  • Another embodiment of the present invention provides an apparatus for artificial neural network operation, including a mapping unit 1, a storage unit 2, a DMA (direct memory access) 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9, as shown in FIG. 8.
  • The storage unit 2 stores data and instructions; it receives and stores externally input data and instructions, the data including input neurons and weights.
  • The DMA 3 retrieves the instructions from the storage unit 2 and distributes them to the instruction cache 4, and retrieves the input neurons and weights from the storage unit 2 and passes them to the mapping unit 1 to be mapped directly.
  • In the mapping unit 1, the first mapping unit 11 performs the first mapping operation and the second mapping unit 12 performs the second mapping operation; by mapping the data, the mapping unit 1 obtains the mapped input neurons and weights and transmits them respectively to the neuron cache 6 and the weight cache 7.
  • The control unit 5 reads the dedicated instructions from the instruction cache 4, decodes them into operation-unit instructions, and inputs them to the operation unit 8.
  • The operation unit 8 executes the specific operations: according to the operation instructions, it retrieves the mapped input neurons from the input neuron cache 6 and the mapped weights from the weight cache 7 and performs the operations. The operation unit 8 has been described in detail above and is not described again here.
  • The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored into the storage unit 2 through the DMA 3, and the outside can retrieve the output neurons stored in the storage unit 2.
  • This embodiment also provides a method for artificial neural network operation, as shown in FIG. 9, comprising the following steps:
  • S201: Read an artificial neural network SIMD instruction, which starts the artificial neural network operation.
  • S202: The mapping unit retrieves part of the input neurons and weights from the storage unit through the DMA 3 and processes them; the mapped input neurons and weights are stored directly into the neuron cache 6 and the weight cache 7.
  • Specifically, the first mapping unit performs the first mapping process on the input neurons and weights, and the second mapping unit performs the second mapping process on them. Both mapping units can use either of the two connection methods to generate the connection relation data and output the mapped neurons and weights according to the connection relation data; the two connection methods and the output of the mapped neurons and weights according to the connection relation data have been described in detail above and are not repeated here.
  • S203: The operation unit retrieves the mapped input neurons and weights from the neuron cache 6 and the weight cache 7, performs the operation, and obtains the output neurons. The specific operation steps are the same as those of step S104 in the previous embodiment and are not repeated here.
  • S204: The operation unit stores the obtained output neurons into the output neuron cache 9, and they are stored into the storage unit 2 through the DMA 3.
  • S205: It is determined whether all the input neurons and weights have been mapped and operated on. If the result is N, the flow returns to step S202; if the result is Y, step S206 is performed.
  • S206: End the operation.
  • Compared with the previous embodiment, in this embodiment the first and second mapping units of the mapping unit perform the mapping during the computation and feed the mapped data directly to the operation unit, whereas in the previous embodiment the data mapped in advance by the first and second mapping units, before computation by the operation unit, were stored in the storage unit; in this embodiment, the operation speed is faster.
  • Yet another embodiment of the present invention provides a system for artificial neural network operation, as shown in FIG. 10, which includes an I/O interface 20, a storage device 30, a central processing unit (CPU) 40, and an apparatus 10 for artificial neural network operation.
  • The I/O interface 20 is used for I/O data: the data are sent by the CPU 40 to the apparatus 10 for artificial neural network operation and then written by the apparatus 10 into the storage device 30; the dedicated instructions required by the apparatus 10 are also transmitted by the CPU 40 to the apparatus 10.
  • The storage device 30 temporarily stores the artificial neural network models and neuron data, in particular when not all of the models can fit in the cache of the apparatus 10 for artificial neural network operation.
  • The central processing unit (CPU) 40 performs basic control such as data transfer and the starting and stopping of the apparatus 10 for artificial neural network operation, and serves as the interface between the apparatus 10 and external control.
  • The apparatus 10 for artificial neural network operation accepts data and programs from the CPU 40 and executes the artificial neural network operation algorithm; the execution results of the apparatus 10 are transmitted back to the CPU 40.
  • In this embodiment, the apparatus 10 supporting artificial neural network operation is used as a coprocessor of the CPU 40 or of a GPU to execute the artificial neural network operation algorithm.
  • In one possible implementation, multiple apparatuses for artificial neural network operation are interconnected to form a system: multiple such apparatuses can be interconnected through a PCIE bus to support larger-scale artificial neural network operations; they can share the same host CPU or each have its own host CPU, and can share memory or each accelerator can have its own memory. Moreover, the interconnection method can be any interconnection topology.
  • In addition, the artificial neural network operation system can serve as an SOC (system on chip) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the core area of the control part, increasing the processing speed, and reducing the overall power consumption.
  • In one possible implementation, the universal interconnect interface of the combined processing device is coupled to certain components of the device, such as a camera, a monitor, a mouse, a keyboard, a network card, or a WiFi interface.
  • the present disclosure discloses a chip that includes an apparatus for artificial neural network computing or a system of artificial neural network operations.
  • the present disclosure discloses a chip package structure that includes the chip described above.
  • the present disclosure discloses a board that includes the chip package structure described above.
  • In some embodiments, the present disclosure discloses an electronic device that includes the above-described board.
  • The electronic device may include a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a cloud server, a camera, a video camera, a projector, a watch, earphones, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
  • The vehicle includes an airplane, a ship, and/or a car;
  • the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and/or a range hood;
  • the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.
  • When performing neural network operations, a device or method employing the techniques of the present invention offers, for a given neural network in which some of the input neurons and weights have absolute values equal to 0 or near 0, an improvement in computation speed over devices or methods that do not employ these techniques. Moreover, the larger the proportion of input neurons whose absolute value is 0 or near 0 among all input neurons in the network, the greater the speedup; and the larger the proportion of weights whose absolute value is 0 or near 0 among all weights in the network, the greater the speedup.

Abstract

An apparatus and method for artificial neural network operation. The apparatus for artificial neural network operation includes a mapping unit that receives input neurons and weights, generates connection relation data between input neurons and output neurons, and outputs the mapped input neurons and weights, where the correspondence between the mapped input neurons and weights is the input neuron-weight pair. The mapping unit includes: a first mapping unit for removing weights whose absolute value is less than or equal to a first threshold, or whose value is 0 or less than the first threshold; and/or a second mapping unit for removing input neurons whose absolute value is less than or equal to a second threshold, or whose value is 0 or less than the second threshold.

Description

一种人工神经网络运算的装置及方法 技术领域
本发明涉及数据处理技术领域,更具体地涉及一种人工神经网络运算的装置和方法。
背景技术
人工神经网络(Artificial Neural Networks,ANNs)简称为神经网络(NNs),它是一种模仿动物神经网络行为特征,进行分布式并行信息处理的算法数学模型。这种网络依靠系统的复杂程度,通过调整内部大量节点之间的相互连接关系数据,从而达到处理信息的目的。神经网络用到的算法就是向量乘法,并且广泛采用符号函数及其各种逼近。
神经网络被广泛应用于各种应用场景:计算视觉、语音识别和自然语言处理等。在近几年的时间里,神经网络的规模一直在增长。在1998年,Lecun用于手写字符识别的神经网络的规模小于1M个权值;在2012年,krizhevsky用于参加ImageNet竞赛的规模是60M个权值。
神经网络是一个高计算量和高访存的应用,权值越多,计算量和访存量都会增大。随着神经网络计算量和访存量的急剧增大,现有技术中通常采用通用处理器计算人工神经网络。对于通用处理器,输入神经元、输出神经元和权重分别存储在三个数组中,同时还有一个索引数组,索引数组存储了每个输出和输入连接的连接关系数据。在计算时,主要的运算是神经元与权值相乘。由于权值和神经元不是一一对应的关系,所以每一次运算都要通过索引数组找到神经元对应的权值。由于通用处理器计算能力和访存能力都很弱,满足不了神经网络的需求。而多个通用处理器并行执行时,通用处理器之间相互通讯又成为了性能瓶颈。在计算剪枝之后的神经网络时,每次乘法运算都要去索引数组里重新查找权值对应的位置,增加了额外的计算量和访存开销。因此计算神经网络耗时长,功耗高。通用处理器需要把多层人工神经网络运算译码成一长列运算及访存指令序列,处理器前端译码带来了较大的功耗开销。
另一种支持人工神经网络运算及其训练算法的已知方法是使用图形处理器(GPU),该方法通过使用通用寄存器堆和通用流处理单元执行通 用SIMD指令来支持上述算法。但由于GPU是专门用来执行图形图像运算以及科学计算的设备,没有对人工神经网络运算的专门支持,仍然需要大量的前端译码工作才能执行人工神经网络运算,带来了大量的额外开销。另外GPU只有较小的片上缓存,多层人工神经网络的模型数据(权值)需要反复从片外搬运,片外带宽成为了主要性能瓶颈,同时带来了巨大的功耗开销。
公开内容
鉴于现有方案存在的问题,为了克服上述现有技术方案的不足,本发明提出了一种人工神经网络运算的装置和方法。
根据本发明的一方面,提供一种人工神经网络运算的装置,包括:映射单元,接收输入神经元和权值,产生输入神经元和输出神经元的连接关系数据,输出映射后的输入神经元和权值,所述映射后的输入神经元和权值的对应关系为输入神经元-权值对,所述映射单元包括:第一映射单元,用于去除绝对值小于或等于第一阈值或者值为0或小于第一阈值的权值;和/或第二映射单元,用于去除绝对值小于或等于第二阈值或者值为0或小于第二阈值的输入神经元。
在一些实施例中,第一映射单元包括:第一映射判断单元,用于判断每一输入的权值的绝对值是否小于或等于第一阈值或者每一输入的权值的值是否为0或小于第一阈值;以及第一映射执行单元,基于所述第一映射判断单元的判断结果产生所述连接关系数据,去除绝对值小于或等于第一阈值或者值为0或小于第一阈值的权值,输出所述输入神经元-权值对;和/或第二映射单元包括:第二映射判断单元,用于判断每一输入的输入神经元的绝对值是否小于或等于第二阈值或者每一输入的输入神经元的值是否为0或小于第二阈值;以及第二映射执行单元,基于所述第二映射判断单元的判断结果产生所述连接关系数据,去除绝对值小于或等于第二阈值或者值为0或小于第二阈值的输入神经元,输出所述输入神经元-权值对。
在一些实施例中,神经网络的输入层具有N个输入神经元I 1,I 2,...,I N,输出层具有M个输出神经元O 1,O 2,...,O M,所述第一映射单元的第一映射执行单元产生所述连接关系数据包括:对第j个输出神经元O j得到其对应的连接关系数据,对应于输入层的N个节点,所述接关系数据有N位,初始 时,所述N位的值都置为1,N个输入神经元I 1,I 2,...,I N与输出神经元O j之间均有连接,第i个输入神经元I i与输出神经元O j之间的权值的绝对值小于或等于第一阈值或者第i个输入神经元I i与输出神经元O j之间的权值的值为0或小于第一阈值,将该连接关系数据中第i位的值置为0,I i与O j之间无连接,将所有的输出神经元O 1,O 2,...,O M的连接关系数据拼合为一个向量,该向量的第N×(j-1)+1个分量到第N×j个分量为输出神经元O j对应的连接关系数据。
在一些实施例中,神经网络的输入层具有N个输入神经元I 1,I 2,...,I N,输出层具有M个输出神经元O 1,O 2,...,O M,所述第一映射单元的第一映射执行单元产生所述连接关系数据包括:对第j个输出神经元O j得到其对应的连接关系数据,若第i个输入神经元I i与输出神经元O j之间的权值的绝对值小于或等于第一阈值或者第i个输入神经元I i与输出神经元O j之间的权值的值为0或小于第一阈值,则I i与O j之间无连接,否则有连接,与O j有连接的n个输入神经元为I i_1,I i_2,...,I i_n,其中1≤i_1<i_2<...<i_n≤N,输出神经元O j对应的连接关系数据有n位,第1位值等于i_1-1,连接关系数据第k位的值等于i_k-i_(k-1),其中,n≥k>1。
在一些实施例中,神经网络的输入层具有N个输入神经元I 1,I 2,...,I N,输出层具有M个输出神经元O 1,O 2,...,O M,所述第二映射单元的第二映射执行单元产生所述连接关系数据包括:对第j个输出神经元O j得到其对应的连接关系数据,对应于输入层的N个节点,所述连接关系数据有N位,初始时,所述N位的值都置为1,N个输入神经元I 1,I 2,...,I N与输出神经元O j之间均有连接,若第i个输入神经元I i的绝对值小于或等于第二阈值或者第i个输入神经元I i的值为0或小于第二阈值,将该连接关系数据中第i位的值置为0,I i与O j之间无连接,将所有的输出神经元O 1,O 2,...,O M的连接关系数据拼合为一个向量,该向量的第N×(j-1)+1个分量到第N×j个分量为输出神经元O j对应的连接关系数据。
在一些实施例中,神经网络的输入层具有N个输入神经元I 1,I 2,...,I N,输出层具有M个输出神经元O 1,O 2,...,O M,所述第二映射单元的第二映射执行单元产生所述连接关系数据包括:对第j个输出神经元O j得到其对应的连接关系数据,若第i个输入神经元I i的绝对值小于或等于第二阈值或者第 i个输入神经元I i的值为0或小于第二阈值,则I i与O j之间无连接,否则有连接,与O j有连接的n个输入神经元为I i_1,I i_2,...,I i_n,其中1≤i_1<i_2<...<i_n≤N,输出神经元O j对应的连接关系数据有n位,第1位值等于i_1-1,连接关系数据第k位的值等于i_k-i_(k-1),其中,n≥k>1。
在一些实施例中,人工神经网络运算的装置还包括:存储单元,用于存储外界输入的数据及指令,所述数据包括输入神经元和权值,所述映射单元调取所述输入神经元和权值并输出映射后的输入神经元和权值;运算单元,用于调取所述映射后的输入神经元和权值并进行运算获得输出神经元。
在一些实施例中,所述运算装置包括:乘法运算单元;至少一个加法器;和/或非线性变换单元。
在一些实施例中,人工神经网络运算的装置还包括:指令缓存单元,用于缓存所述指令;输入神经元缓存,用于缓存所述映射后的输入神经元;权值缓存,用于缓存所述映射后的权值;控制单元,用于读取所述指令缓存单元中的指令,并控制所述运算单元调取所述输入神经元缓存中的所述映射后的输入神经元和所述权值缓存中所述映射后的权值并进行运算;以及输出神经元缓存,用于缓存所述运算单元获得的所述输出神经元。
在一些实施例中,所述映射单元输出的映射后的输入神经元和权值存储在所述存储单元上,所述装置还包括:DMA,用于调取存储单元上的指令及映射后的输入神经元和权值分别存储至所述指令缓存单元、输入神经元缓存、权值缓存,并将所述输出神经元缓存中的所述输出神经元存储至存储单元上用于传输至外界。
在一些实施例中,人工神经网络运算的装置还包括:DMA,用于调取存储单元上的指令存储至所述指令缓存单元,并调取存储单元上的数据至映射单元,所述映射单元输出的映射后的输入神经元和权值分别存储至输入神经元缓存、权值缓存,并将所述输出神经元缓存中的所述输出神经元存储至存储单元上用于传输至外界。
根据本发明的另一方面,提供一种人工神经网络运算的方法,包括人工神经网络运算的装置,所述方法包括:映射单元调取所述存储单元中的所述输入神经元和权值并输出映射后的输入神经元和权值;运算装置调取 所述映射后的输入神经元和权值并进行运算获得输出神经元。
在一些实施例中,所述运算包括:乘法运算;加法运算;和/或非线性变换。
在一些实施例中,所述方法还包括:所述映射单元调取所述存储单元中的全部的所述输入神经元和权值并输出映射后的输入神经元和权值,并存储至所述存储单元;输入神经元缓存、权值缓存通过DMA读取部分所述映射后的输入神经元和权值,并被运算单元调取;输出神经元缓存缓存所述运算单元获得的所述输出神经元,并通过DMA存储至所述存储单元;判断所述输入神经元和权值是否均经过运算,若是,运算结束,否则,返回输入神经元缓存、权值缓存通过DMA读取部分所述映射后的输入神经元和权值的步骤。
在一些实施例中,所述方法还包括:所述映射单元通过DMA调取所述存储单元中的部分的所述输入神经元和权值并输出映射后的输入神经元和权值;输入神经元缓存、权值缓存缓存所述映射后的输入神经元和权值,并被运算单元调取;输出神经元缓存缓存所述运算单元获得的所述输出神经元,并通过DMA存储至所述存储单元;判断所述输入神经元和权值是否均经过映射及运算,若是,运算结束,否则,返回映射单元通过DMA调取所述存储单元中的部分的所述输入神经元和权值的步骤。
根据本发明的再一方面,提供一种芯片,所述芯片包括所述的人工神经网络运算的装置。
根据本发明的又一方面,提供一种电子装置,其中,所述电子装置包括所述的芯片。
从上述技术方案可以看出,本发明具有以下有益效果:
(1)通过第一映射单元和/或第二映射单元,产生输入神经元和权值的连接关系数据,减少了计算量,解决了CPU和GPU运算性能不足,前端译码开销大的问题,有效提高了对多层人工神经网络运算算法的支持;
(2)通过采用针对多层人工神经网络运算算法的专用片上缓存,充分挖掘了输入神经元和权值数据的重用性,避免了反复向内存读取这些数据,降低了内存访问带宽,避免了内存带宽成为多层人工神经网络运算及其训练算法性能瓶颈的问题。
附图说明
图1为本发明实施例中映射单元的结构示意图;
图2为本发明实施例中一个人工神经网络的结构示意图;
图3为图2中的人工神经网络经第一映射后的第一个输出神经元的第一连接方式示意图;
图4为图2中的人工神经网络经第一映射后的第一个输出神经元的第二连接方式示意图;
图5为本发明一实施例的人工神经网络运算的装置的结构示意图;
图6为图5中人工神经网络运算的装置的运算方法的流程图;
图7为图6中运算单元运算步骤的流程图;
图8为本发明另一实施例的人工神经网络运算的装置的结构示意图;
图9为图8中人工神经网络运算的装置的运算方法的流程图;
图10为本发明又一实施例的人工神经网络运算的系统的结构示意图。
具体实施方式
本发明某些实施例于后方将参照所附附图做更全面性地描述,其中一些但并非全部的实施例将被示出。实际上,本发明的各种实施例可以许多不同形式实现,而不应被解释为限于此数所阐述的实施例;相对地,提供这些实施例使得本发明满足适用的法律要求。
在本说明书中,下述用于描述本发明原理的各种实施例只是说明,不应该以任何方式解释为限制发明的范围。参照附图的下述描述用于帮助全面理解由权利要求及其等同物限定的本发明的示例性实施例。下述描述包括多种具体细节来帮助理解,但这些细节应认为仅仅是示例性的。因此,本领域普通技术人员应认识到,在不悖离本发明的范围和精神的情况下,可以对本文中描述的实施例进行多种改变和修改。此外,为了清楚和简洁起见,省略了公知功能和结构的描述。此外,贯穿附图,相同附图标记用于相似功能和操作。
为使本发明的目的、技术方案和优点更加清楚明白,以下结合具体实施例,并参照附图,对本发明进一步详细说明。
本发明实施例提供了一种人工神经网络运算的装置,其包括映射单元,产生连接关系数据并输出映射后的输入神经元和权值,映射后的输入神经 元和权值的对应关系为输入神经元-权值对,降低了人工神经网络运算的运算量,实现了快速运算的人工神经网络运算。
所述输入神经元-权值对不是一种真正的数据存储结构,仅仅是表示输入神经元和权值的对应关系。例如,输入神经元存储于向量A中,权值存储于向量B中,向量A和B的长度相同,向量A和B的同一位置的分量组合在一起被认为是一个输入神经元-权值对。在参与运算时,输入神经元和权值可以分开放置于不同缓存中,被运算单元使用。
图1为本发明实施例中映射单元的结构示意图,如图1所示,输入数据包括输入神经元和权值,输入数据输入映射单元1,由映射单元1输出映射后的输入神经元和权值,映射后的输入神经元和权值的对应关系为输入神经元-权值对。
映射单元包括第一映射单元11和/或第二映射单元12,第一映射单元11用于执行第一映射操作,可以用于去除值为0或小于第一阈值α’的权值,可选地,第一阈值α’满足0<α′<0.2,例如为0.1、0.08、0.05、0.02、0.01还可以用于去除绝对值小于或等于第一阈值α的权值,可选地,第一阈值α满足0≤α<0.2,例如为0、0.1、0.08、0.05、0.02、0.01。第二映射单元12用于执行第二映射操作,相应地,可以用于去除值为0或小于第二阈值β’的输入神经元,可选地,第二阈值β’满足0<β′<0.2,例如为0.1、0.08、0.05、0.02、0.01,还可以用于去除绝对值小于或等于第二阈值的输入神经元,可选地,第二阈值β满足0≤β<0.2,例如为0、0.1、0.08、0.05、0.02、0.01。这里提到的两个阈值α与β可以相等,也可以不相等。以下描述仅以第一映射操作去除绝对值小于或等于第一阈值α的权值,第二映射操作去除绝对值小于或等于第二阈值β的输入神经元来进行解释说明。
第一映射单元11包括第一映射判断单元111及第一映射执行单元112,第一映射判断单元111判断每一输入的权值的绝对值是否小于或等于第一阈值α。基于第一映射判断单元的结果,第一映射执行单元112产生连接关系数据,根据产生的连接关系数据将输入数据转换成输入神经元-权值对。
所述第一映射单元11的第一映射执行单元112产生的连接关系数据可以采用以下两种方式表示:
第一种方式:
采用1表示输入神经元与输出神经元之间的权值的绝对值大于第一阈值α,保留该输入神经元与输出神经元间的连接,0表示权值的绝对值小于或等于第一阈值α,去除该输入神经元与输出神经元间的连接,每个输出神经元与所有输入神经元的连接组成一个0和1的字符串来表示该输出神经元的连接关系数据,并将所有输出神经元的连接关系数据拼合成一个向量。
第二种方式:
根据权值的绝对值是否大于第一阈值α对连接进行保留/去除,若大于,则保留,否则去除。将一输出第一个连接所在的位置距离第一个输入神经元的距离、所述输出第二个输入神经元距离上一个输入神经元的距离,所述输出第三个输入神经元距离上一个输入神经元的距离,……,依次类推,直到穷举所述输出的所有输入,来表示所述输出神经元的连接关系数据。
第二映射单元12包括第二映射判断单元121和第二映射执行单元122,第二映射判断单元121判断每一输入的输入神经元的绝对值是否小于或等于第二阈值β。基于第二映射判断单元的结果,则第二映射执行单元122产生连接关系数据,根据产生的连接关系数据将输入数据转换成输入神经元-权值对。
其中,所述第二映射单元12中的第一映射执行单元122产生的连接关系数据亦可以采用以下两种方式表示:
第一种方式:
采用1表示输入神经元的绝对值大于第二阈值β,保留该输入神经元与输出神经元间的连接,0表示输入神经元的绝对值小于或等于第二阈值β,去除该输入神经元与输出神经元间的连接,每个输出神经元与其所有输入神经元组成一个0和1的字符串来表示该输出神经元的连接关系数据,并将所有输出神经元的连接关系数据拼合成一个向量。
第二种方式:
根据输入神经元的绝对值是否大于第二阈值β对连接进行保留/去除,若大于,则保留,否则去除。将一输出神经元第一个连接所在的位置距离第一个输入神经元的距离、所述输出神经元第二个输入神经元距离上一个 输入神经元的距离,所述输出神经元第三个输入神经元距离上一个输入神经元的距离,……,依次类推,直到穷举所述输出的所有输入,来表示所述输出神经元的连接关系数据。
具体地,设一个神经网络有L层,K=1,2,...,L-1,对于第K层和第K+1层来说,我们将第K层称为输入层,第K+1层称为输出层。即除最顶层外,每一层都可以作为输入层,其下一层为对应的输出层,每层神经元的个数是预知的。
假设第一映射单元和第二映射单元都执行相应的操作,每一对输入层与输出层之间的运算过程如下:
设输入层由N个输入神经元I 1,I 2,...,I N组成,输出层由M个输出神经元O 1,O 2,...,O M组成。
i=1,2,...,N,j=1,2,...,M
第一种连接方式:
首先,对每个输出神经元O j得到其对应的连接关系数据。由于输入层有N个节点,所以该连接关系数据有N位,每一位的值为1或0,第i位值为1表示I i与O j之间有连接,0表示I i与O j之间无连接。初始时,这N位的值都置为1。如果输入神经元I i的绝对值小于或等于第二阈值β,或者如果I i与O j之间的权值的绝对值小于或等于第一阈值α,则将该连接关系数据中第i位的值置为0,即认为I i与O j之间无连接。然后,将所有的输出神经元的连接关系数据拼合为一个向量,该向量的第N×(j-1)+1个分量到第N×j个分量值就是输出神经元O j对应的连接关系数据。
该方法中,输入层神经元的个数等于每一个输出神经元对应的连接关系数据的存储位数。所以就算只用最简单的只取0,1值的一维数组,也能清晰地知道每个输出神经元对应的连接关系数据。
第二种连接方式:
对每个输出神经元O j得到其对应的连接关系数据。如果输入神经元I i的绝对值小于或等于第二阈值β,或者如果I i与O j之间的权值的绝对值小于或等于第一阈值α,则认为I i与O j之间无连接,反之则有连接。若与O j有连接的输入神经元为I i_1,I i_2,...,I i_n,其中1≤i_1<i_2<...<i_n≤N。则连接关系数据有n位;第1位值等于i_1-1;n≥k>1,连接关系数据第k位的值 等于i_k-i_(k-1)。
该方法中,连接关系数据可以用高维动态数组,可以用链表等等表示。
产生连接关系数据后,映射单元的映射执行单元根据连接关系数据输出映射后的输入神经元和权值,映射后的输入神经元和权值的对应关系为输入神经元-权值对,映射后的输入神经元和权值可以在运算时被直接使用。
总之,上述映射单元中,第一映射单元11的第一映射执行单元112和第二映射单元12的第二映射执行单元122基于输入数据产生连接关系数据,并输出映射后的输入神经元和权值,对应的连接关系数据均可采用两种表示形式:一种是每个输入与输出神经元之间都用一位表示是否有连接,另一种是用连接之间的距离来表示每个连接的位置。
为了使得这两个映射单元的功能更加明确,以下分别给出这两个单元中的数据操作过程。
以图2所示的一个人工神经网络为例,仅以判断标准是取值是否为0进行说明,该人工神经网络有4个输入神经元:I1,I2,I3,I4;有2个输出神经元:O1,O2;把连接的权值分别表示为:W11,W21,W31,W41,W12,W22,W32,W42。设I1的值为0,I2,I3,I4非0;设W21,W12,W42为0,其余权值非0。
第一映射单元和第二映射单元可同时对数据进行处理,也可以依次对数据进行处理并且二者的顺序可以互换,下面仅以第一映射单元先对数据进行处理进行说明。
采用第一种连接方式表示如下:
在第一映射单元11中,如果不执行判断权值的操作:O1,O2的连接关系数据默认为:1111,摆放顺序为11111111;如果执行判断权值的操作,如图3所示,输出神经元O1的连接关系数据为:1011,每一位表示是否与输入有连接,1表示有连接,0表示无连接,输出神经元O2的连接关系数据为0110。在运算时,连接关系数据为0所对应的输入神经元与权值不会进行运算。在存储连接关系数据时,可以按照输出神经元的顺序对连接关系数据进行存储。将每个输出神经元的所有输入依次摆放完,拼合成一向量,上面的例子摆放的顺序为10110110。
在第二映射单元12中,如果不执行判断输入神经元值的操作,O1, O2的连接关系数据及摆放顺序不变;如果执行判断输入神经元值的操作,对于执行了第一映射操作后的图3所示的神经网络,输出神经元O1的连接关系数据为:0011,第一位由1换成0,是因为第一个输入神经元I1值为0,去除了从I1发出的连接,输出神经元O2的连接关系数据为:0110,最终摆放为:00110110。对于未执行第一映射操作的神经网络,输出神经元O1的连接关系数据为:0111,输出神经元O2的连接关系数据为:0111,最终摆放为:01110111。
采用第二种连接方式表示如下:
在第一映射单元11中,如果不执行判断权值的操作,O1,O2的连接关系数据默认为:0,1,1,1;如果执行判断权值的操作,如图4所示,输出神经元O1与输入神经元I1,I3,I4相连接,那么连接关系数据为0,2,1。0表示第一个连接所在的位置距离第一个输入神经元的距离为0,即第一个输入神经元,2表示第二个输入神经元距离上一个输入神经元的距离为2,即表示第三个输入神经元,1表示第三个输入神经元距离上一个输入神经元的距离为1,即表示第四个输入神经元。同理,O2的连接关系数据为1,1。
在第二映射单元12中,如果不执行判断输入神经元值的操作:O1,O2的连接关系数据不变;如果执行判断输入神经元值的操作,对于执行了第一映射操作后的图4所示的神经网络,因为第一个输入神经元I1值为0,去除了从I1发出的连接,故输出神经元O1的连接关系数据为:2,1,输出神经元O2的连接关系数据为:1,1。对于未执行第一映射操作的神经网络,O1,O2的连接关系数据都是:1,1,1。
第一映射单元11和第二映射单元12会根据上面得到的连接关系数据,输出映射后的神经元和权值,映射后的神经元和权值的对应关系为输入神经元-权值对,输入神经元-权值对可以在运算时被直接使用,以图2所示的一个人工神经网络中输出神经元O1映射的具体过程为例:
输入神经元为:I1,I2,I3,I4,输入权值为:W11,W21,W31,W41,其中I1,W21取0,其余非0。
首先,第一映射单元11中,连接关系数据为:1011,或0,2,1。然后,第二映射单元12中,连接关系数据为:0011,或2,1。两个映射单元的 映射执行单元根据连接关系数据,输出是去除掉值为0的输入神经元以及从它发出的连接权值,则映射后的输入神经元为I3,I4,映射后的权值为W31,W41,输入神经元-权值对是:I3-W31,I4-W41。例如,用向量的方式对映射后的输入神经元和权值进行存储,则得到的输入神经元向量是(I3,I4),得到的权值向量是(W31,W41)。
尽管上述举例为先利用第一映射单元执行第一映射操作,再利用第二映射单元执行第二映射,最终得到映射后的输入神经元和权值。但是在实际应用中,两个映射单元优选是同时对数据进行操作的,不分先后顺序。
本发明实施例中的人工神经网络运算的装置,除了映射单元1,还包括:存储单元2、DMA(直接内存存取)3、指令缓存4、控制单元5、输入神经元缓存6、权值缓存7、运算单元8以及输出神经元缓存9。如图5所示,
The storage unit 2 stores data and instructions; it receives and stores data and instructions input from outside, the data including input neurons and weights.
The mapping unit 1 retrieves the input neurons and weights from the storage unit 2; the first mapping unit 11 performs the first mapping operation and the second mapping unit 12 performs the second mapping. By mapping the data, the mapping unit 1 obtains the mapped input neurons and weights and stores them in the storage unit 2.
The DMA 3 fetches the instructions and the mapped input neurons and weights from the storage unit 2 and distributes them to the instruction cache 4, the input neuron cache 6, and the weight cache 7, respectively.
The control unit 5 reads dedicated instructions from the instruction cache 4, decodes them into operation unit instructions, and feeds them to the operation unit 8.
The operation unit 8 performs the specific computation: according to the operation instructions, it retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them.
The operations performed by the operation unit 8 include neural network computation.
In one embodiment, the computation unit includes, but is not limited to: a first part, a multiplier; a second part, one or more adders (more specifically, the adders of the second part form an adder tree); a third part, an activation function unit; and/or a fourth part, a vector processing unit. More specifically, the vector processing unit can handle vector operations and/or pooling operations. The first part multiplies input data 1 (in1) by input data 2 (in2) to obtain the output (out), the process being out = in1 * in2. The second part adds the input data in1 through adders to obtain the output data (out); more specifically, when the second part is an adder tree, it adds in1 level by level through the tree, in1 being a vector of length N with N greater than 1, the process being out = in1[1] + in1[2] + ... + in1[N]; and/or it accumulates the input data (in1) through the adder tree and then adds the input data (in2), the process being out = in1[1] + in1[2] + ... + in1[N] + in2; or it adds the input data (in1) and the input data (in2), the process being out = in1 + in2. The third part applies an activation function (active) to the input data (in) to obtain the activated output data (out), the process being out = active(in); the activation function active may be sigmoid, tanh, relu, softmax, and so on. Besides activation, the third part can implement other nonlinear functions, applying an operation (f) to the input data (in) to obtain the output data (out), the process being out = f(in). The vector processing unit applies a pooling operation to the input data (in) to obtain the output data (out) after pooling, the process being out = pool(in), where pool denotes the pooling operation, which includes but is not limited to average pooling, max pooling, and median pooling, the input data in being the data in a pooling kernel associated with the output out.
The operations performed by the operation unit include: a first part, which multiplies the input data 1 and the input data 2 to obtain the product; and/or a second part, which performs addition (more specifically, an adder tree operation adding the input data 1 level by level through the adder tree), or adds the input data 1 and the input data 2 to obtain the output data; and/or a third part, which performs the activation function operation, applying the activation function (active) to the input data to obtain the output data; and/or a fourth part, which performs the pooling operation, out = pool(in), where pool denotes the pooling operation, including but not limited to average pooling, max pooling, and median pooling, the input data in being the data in a pooling kernel associated with the output out. One or more of these parts can be freely selected and combined in different orders to realize operations with various functions.
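A software analogue of the four parts may help fix ideas. This is a sketch rather than the hardware: the function names, the pairwise adder tree, and the one-dimensional pooling window are assumptions, and softmax is omitted because it operates on a whole vector rather than a scalar.

```python
import math

def multiply(in1, in2):                      # part 1: out = in1 * in2
    return in1 * in2

def adder_tree(values):                      # part 2: level-by-level addition
    if not values:
        return 0
    while len(values) > 1:
        values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

def activate(x, fn="sigmoid"):               # part 3: out = active(in)
    if fn == "sigmoid":
        return 1.0 / (1.0 + math.exp(-x))
    if fn == "tanh":
        return math.tanh(x)
    if fn == "relu":
        return max(0.0, x)
    raise ValueError("unsupported activation: " + fn)

def pool(window, mode="max"):                # part 4: out = pool(in)
    if mode == "max":
        return max(window)
    if mode == "avg":
        return sum(window) / len(window)
    if mode == "median":
        return sorted(window)[len(window) // 2]
    raise ValueError("unsupported pooling: " + mode)
```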
The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored to the storage unit 2 via the DMA 3, and the outside can retrieve the output neurons stored in the storage unit 2.
This embodiment also provides a method for artificial neural network operation, as shown in Figure 6, including the following steps:
S101: Read an artificial neural network SIMD instruction to start the artificial neural network operation.
S102: The mapping unit retrieves all the input neurons and weights from the storage unit, processes them to obtain the mapped input neurons and weights, and stores these to the storage unit.
Specifically, the first mapping unit performs the first mapping processing on the input neurons and weights, and the second mapping unit performs the second mapping processing. Both mapping units can use either of the two connection representations to generate the connection relation data and output the mapped neurons and weights from the input neurons and input weights according to that data; the two connection representations and the output of mapped neurons and weights according to the connection relation data have been described in detail above and are not repeated here.
S103: The input neuron cache 6 and the weight cache 7 read part of the mapped neurons and weights through the DMA 3.
S104: The operation unit retrieves the mapped input neurons and weights from the neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.
In one embodiment, as shown in Figure 7, the computation comprises the following steps:
S1041: Perform multiplication, multiplying the retrieved mapped neuron and weight data;
S1042: Perform the adder tree operation, adding the results of the first stage level by level through the adder tree to complete the vector inner product;
S1043: Apply a nonlinear transformation to the result of the second stage to obtain the output neurons, the nonlinear transformation being an activation function operation; the activation function may be the sigmoid, tanh, ReLU, or softmax function, among others.
In other embodiments, the adder tree operation is not necessarily based on the result of the first stage and may operate directly on the data fed to the adder tree unit; likewise, the nonlinear transformation is not necessarily based on the result of the second stage and may directly transform the data fed to the nonlinear transformation unit.
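As a sketch only, steps S1041 through S1043 compose as follows for one output neuron, reusing the hypothetical multiply, adder_tree, and activate helpers from the operation-unit sketch above:

```python
def compute_output_neuron(mapped_inputs, mapped_weights, fn="sigmoid"):
    products = [multiply(x, w)              # S1041: multiplication
                for x, w in zip(mapped_inputs, mapped_weights)]
    inner = adder_tree(products)            # S1042: adder tree, i.e. inner product
    return activate(inner, fn)              # S1043: nonlinear transformation
```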
S105: The operation unit stores the obtained output neurons to the output neuron cache 9, and they are stored to the storage unit 2 via the DMA 3.
S106: Determine whether all mapped neurons and weights have been processed. If not (N), return to step S103; if so (Y), proceed to step S107.
S107: End the operation.
Another embodiment of the present invention provides a device for artificial neural network operation, which includes a mapping unit 1, a storage unit 2, a DMA (direct memory access) 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9, as shown in Figure 8.
The storage unit 2 stores data and instructions; it receives and stores data and instructions input from outside, the data including input neurons and weights.
The DMA 3 fetches the instructions from the storage unit 2 and distributes them to the instruction cache 4, and fetches the input neurons and weights from the storage unit 2 and passes them directly to the mapping unit 1 for mapping.
In the mapping unit 1, the first mapping unit 11 performs the first mapping operation and the second mapping unit 12 performs the second mapping. By mapping the data, the mapping unit 1 obtains the mapped input neurons and weights and transmits them to the neuron cache 6 and the weight cache 7, respectively.
The control unit 5 reads dedicated instructions from the instruction cache 4, decodes them into operation unit instructions, and feeds them to the operation unit 8.
The operation unit 8 performs the specific computation: according to the operation instructions, it retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them. The operation unit 8 has been described in detail above and is not repeated here.
The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored to the storage unit 2 via the DMA 3, and the outside can retrieve the output neurons stored in the storage unit 2.
This embodiment also provides a method for artificial neural network operation, as shown in Figure 9, including the following steps:
S201: Read an artificial neural network SIMD instruction to start the artificial neural network operation.
S202: The mapping unit retrieves part of the input neurons and weights from the storage unit through the DMA 3, processes them, and stores the resulting mapped input neurons and weights directly into the neuron cache 6 and the weight cache 7, respectively.
Specifically, the first mapping unit performs the first mapping processing on the input neurons and weights, and the second mapping unit performs the second mapping processing. Both mapping units can use either of the two connection representations to generate the connection relation data and output the mapped neurons and weights from the input neurons and input weights according to that data, as described in detail above and not repeated here.
S203: The operation unit retrieves the mapped input neurons and weights from the neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.
The specific computation steps are the same as those of step S104 in the previous embodiment and are not repeated here.
S204: The operation unit stores the obtained output neurons to the output neuron cache 9, and they are stored to the storage unit 2 via the DMA 3.
S205: Determine whether all input neurons and weights have been mapped and processed. If not (N), return to step S202; if so (Y), proceed to step S206.
S206: End the operation.
Compared with the previous embodiment, in this embodiment the first and second mapping units of the mapping unit perform the mapping during the computation, feeding the mapped data directly to the operation unit, whereas in the previous embodiment the data mapped in advance by the first and second mapping units is stored in the storage unit before the operation unit computes. This embodiment therefore computes faster.
A further embodiment of the present invention provides a system for artificial neural network operation, as shown in Figure 10, which includes an I/O interface 20, a storage device 30, a central processing unit (CPU) 40, and a device 10 for artificial neural network operation.
The I/O interface 20 is used for I/O data, which is sent by the CPU 40 to the device 10 for artificial neural network operation and then written by the device 10 into the storage device 30; the dedicated instructions required by the device 10 are likewise transmitted to it by the CPU 40.
The storage device 30 temporarily stores the artificial neural network model and the neuron data, in particular when the whole model cannot fit in the cache on the device 10 for artificial neural network operation.
The central processing unit (CPU) 40 performs basic control such as data transfer and starting and stopping the device 10 for artificial neural network operation, and serves as the interface between the device 10 and external control.
The device 10 for artificial neural network operation receives data and programs from the CPU 40 and executes the artificial neural network operation algorithm; its execution results are transmitted back to the CPU 40.
In this embodiment, the device 10 supporting artificial neural network operation serves as a coprocessor of the CPU 40 or of a GPU to execute the artificial neural network operation algorithm.
In yet another embodiment of the present invention, multiple devices for artificial neural network operation are interconnected to form a system: the devices can be interconnected through a PCIE bus to support larger-scale artificial neural network operation; they may share the same host CPU or each have their own host CPU; they may share memory, or each accelerator may have its own memory. Moreover, their interconnection may follow any interconnect topology.
The system for artificial neural network operation can serve as the SoC (system on chip) of devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the core area of the control portion, increasing the processing speed, and lowering the overall power consumption. In this case, the universal interconnect interface of the combined processing device connects to certain components of the equipment, such as a camera, display, mouse, keyboard, network card, or Wi-Fi interface.
In one embodiment, the present disclosure provides a chip that includes the above device or system for artificial neural network operation.
In one embodiment, the present disclosure provides a chip package structure that includes the above chip.
In one embodiment, the present disclosure provides a board card that includes the above chip package structure.
In one embodiment, the present disclosure provides an electronic device that includes the above board card.
The electronic device may include a data processing device, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, dash cam, navigator, sensor, webcam, cloud server, camera, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicles include aircraft, ships, and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; the medical devices include nuclear magnetic resonance scanners, B-mode ultrasound scanners, and/or electrocardiographs.
Whether the technical solution described in the present invention is being used can be verified as follows:
If a device or method adopting the technology of the present invention is employed, then for a given neural network in which some input neurons and weights have absolute values equal to 0 or near 0, the operation speed is higher than that of a device or method not adopting this technology. Moreover, the larger the proportion of input neurons with absolute values equal to 0 or near 0 among all input neurons in the network, the greater the speedup; and the larger the proportion of weights with absolute values equal to 0 or near 0 among all weights in the network, the greater the speedup.
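As a back-of-the-envelope illustration, an assumption of this description rather than a formula from the embodiments: if the near-zero input neurons and near-zero weights are independently distributed, the fraction of input-weight products removed by the mapping for one layer can be estimated as follows.

```python
def skipped_mac_fraction(p_inputs, p_weights):
    """Estimated fraction of multiply-accumulates removed, assuming a fraction
    p_inputs of input neurons and p_weights of weights are (near) zero and the
    two kinds of sparsity are independently distributed."""
    return 1.0 - (1.0 - p_inputs) * (1.0 - p_weights)

print(skipped_mac_fraction(0.5, 0.3))   # 0.65: about 65% of the products vanish
```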
The processes or methods depicted in the preceding figures may be performed by processing logic comprising hardware (e.g., circuits, dedicated logic, etc.), firmware, software (e.g., software carried on a non-transitory computer-readable medium), or a combination of the two. Although the processes or methods are described above in terms of certain sequential operations, it should be understood that some of the described operations can be performed in a different order. Moreover, some operations may be performed in parallel rather than sequentially.
It should be noted that implementations not shown or described in the drawings or in the body of the specification are forms known to those of ordinary skill in the art and are not described in detail. Furthermore, the above definitions of the elements and methods are not limited to the specific structures, shapes, or modes mentioned in the embodiments, which those of ordinary skill in the art may simply modify or replace.
The specific embodiments described above further elaborate the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (17)

  1. A device for artificial neural network operation, comprising:
    a mapping unit (1) that receives input neurons and weights, generates connection relation data between input neurons and output neurons, and outputs mapped input neurons and weights, the correspondence between the mapped input neurons and weights being input neuron-weight pairs, the mapping unit (1) comprising:
    a first mapping unit (11) for removing weights whose absolute value is less than or equal to a first threshold, or whose value is 0 or less than the first threshold; and/or
    a second mapping unit (12) for removing input neurons whose absolute value is less than or equal to a second threshold, or whose value is 0 or less than the second threshold.
  2. The device according to claim 1, wherein the first mapping unit (11) comprises:
    a first mapping judgment unit (111) for judging whether the absolute value of each input weight is less than or equal to the first threshold, or whether the value of each input weight is 0 or less than the first threshold; and
    a first mapping execution unit (112) that generates the connection relation data based on the judgment result of the first mapping judgment unit (111), removes the weights whose absolute value is less than or equal to the first threshold or whose value is 0 or less than the first threshold, and outputs the input neuron-weight pairs; and/or
    the second mapping unit (12) comprises:
    a second mapping judgment unit (121) for judging whether the absolute value of each input neuron is less than or equal to the second threshold, or whether the value of each input neuron is 0 or less than the second threshold; and
    a second mapping execution unit (122) that generates the connection relation data based on the judgment result of the second mapping judgment unit (121), removes the input neurons whose absolute value is less than or equal to the second threshold or whose value is 0 or less than the second threshold, and outputs the input neuron-weight pairs.
  3. The device according to claim 2, wherein the input layer of the neural network has N input neurons I_1, I_2, ..., I_N and the output layer has M output neurons O_1, O_2, ..., O_M, and the generation of the connection relation data by the first mapping execution unit (112) of the first mapping unit (11) comprises:
    obtaining, for the j-th output neuron O_j, its corresponding connection relation data, which, corresponding to the N nodes of the input layer, has N bits; initially, all N bits are set to 1, and all N input neurons I_1, I_2, ..., I_N are connected to the output neuron O_j; if the absolute value of the weight between the i-th input neuron I_i and the output neuron O_j is less than or equal to the first threshold, or the value of that weight is 0 or less than the first threshold, setting the i-th bit of the connection relation data to 0, so that I_i and O_j are not connected; and concatenating the connection relation data of all output neurons O_1, O_2, ..., O_M into one vector, the (N×(j-1)+1)-th through (N×j)-th components of which are the connection relation data corresponding to the output neuron O_j.
  4. The device according to claim 2, wherein the input layer of the neural network has N input neurons I_1, I_2, ..., I_N and the output layer has M output neurons O_1, O_2, ..., O_M, and the generation of the connection relation data by the first mapping execution unit (112) of the first mapping unit (11) comprises:
    obtaining, for the j-th output neuron O_j, its corresponding connection relation data, where, if the absolute value of the weight between the i-th input neuron I_i and the output neuron O_j is less than or equal to the first threshold, or the value of that weight is 0 or less than the first threshold, I_i and O_j are not connected, and otherwise they are connected; the n input neurons connected to O_j being I_{i_1}, I_{i_2}, ..., I_{i_n}, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N, the connection relation data corresponding to the output neuron O_j has n entries, the value of the first entry equals i_1 - 1, and the value of the k-th entry of the connection relation data equals i_k - i_(k-1), where n ≥ k > 1.
  5. The device according to claim 2, wherein the input layer of the neural network has N input neurons I_1, I_2, ..., I_N and the output layer has M output neurons O_1, O_2, ..., O_M, and the generation of the connection relation data by the second mapping execution unit (122) of the second mapping unit (12) comprises:
    obtaining, for the j-th output neuron O_j, its corresponding connection relation data, which, corresponding to the N nodes of the input layer, has N bits; initially, all N bits are set to 1, and all N input neurons I_1, I_2, ..., I_N are connected to the output neuron O_j; if the absolute value of the i-th input neuron I_i is less than or equal to the second threshold, or the value of I_i is 0 or less than the second threshold, setting the i-th bit of the connection relation data to 0, so that I_i and O_j are not connected; and concatenating the connection relation data of all output neurons O_1, O_2, ..., O_M into one vector, the (N×(j-1)+1)-th through (N×j)-th components of which are the connection relation data corresponding to the output neuron O_j.
  6. The device according to claim 2, wherein the input layer of the neural network has N input neurons I_1, I_2, ..., I_N and the output layer has M output neurons O_1, O_2, ..., O_M, and the generation of the connection relation data by the second mapping execution unit (122) of the second mapping unit (12) comprises:
    obtaining, for the j-th output neuron O_j, its corresponding connection relation data, where, if the absolute value of the i-th input neuron I_i is less than or equal to the second threshold, or the value of I_i is 0 or less than the second threshold, I_i and O_j are not connected, and otherwise they are connected; the n input neurons connected to O_j being I_{i_1}, I_{i_2}, ..., I_{i_n}, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N, the connection relation data corresponding to the output neuron O_j has n entries, the value of the first entry equals i_1 - 1, and the value of the k-th entry of the connection relation data equals i_k - i_(k-1), where n ≥ k > 1.
  7. The device according to any one of claims 1 to 6, further comprising:
    a storage unit (2) for storing data and instructions input from outside, the data comprising input neurons and weights, the mapping unit (1) retrieving the input neurons and weights and outputting the mapped input neurons and weights; and
    an operation unit (8) for retrieving the mapped input neurons and weights and operating on them to obtain output neurons.
  8. The device according to claim 7, wherein the operation unit (8) comprises:
    a multiplication unit;
    at least one adder; and/or
    a nonlinear transformation unit.
  9. The device according to claim 7, further comprising:
    an instruction cache unit (4) for caching the instructions;
    an input neuron cache (6) for caching the mapped input neurons;
    a weight cache (7) for caching the mapped weights;
    a control unit (5) for reading the instructions in the instruction cache unit (4) and controlling the operation unit (8) to retrieve the mapped input neurons in the input neuron cache (6) and the mapped weights in the weight cache (7) and operate on them; and
    an output neuron cache (9) for caching the output neurons obtained by the operation unit (8).
  10. The device according to claim 9, wherein the mapped input neurons and weights output by the mapping unit (1) are stored on the storage unit (2), and the device further comprises:
    a DMA (3) for fetching the instructions and the mapped input neurons and weights from the storage unit (2) and storing them to the instruction cache unit (4), the input neuron cache (6), and the weight cache (7), respectively, and for storing the output neurons in the output neuron cache (9) to the storage unit (2) for transmission to the outside.
  11. The device according to claim 9, wherein the device further comprises:
    a DMA (3) for fetching the instructions from the storage unit (2) and storing them to the instruction cache unit (4), and for fetching the data from the storage unit (2) to the mapping unit (1), the mapped input neurons and weights output by the mapping unit (1) being stored to the input neuron cache (6) and the weight cache (7), respectively, and for storing the output neurons in the output neuron cache (9) to the storage unit (2) for transmission to the outside.
  12. A method for artificial neural network operation using the device according to any one of claims 7 to 11, the method comprising:
    the mapping unit (1) retrieving the input neurons and weights from the storage unit (2) and outputting the mapped input neurons and weights; and
    the operation unit (8) retrieving the mapped input neurons and weights and operating on them to obtain output neurons.
  13. The method according to claim 12, wherein the operations comprise:
    multiplication;
    addition; and/or
    nonlinear transformation.
  14. The method according to claim 12, further comprising:
    the mapping unit (1) retrieving all of the input neurons and weights from the storage unit (2), outputting the mapped input neurons and weights, and storing them to the storage unit (2);
    the input neuron cache (6) and the weight cache (7) reading part of the mapped input neurons and weights through the DMA (3), to be retrieved by the operation unit (8);
    the output neuron cache (9) caching the output neurons obtained by the operation unit (8) and storing them to the storage unit (2) through the DMA (3); and
    determining whether all the input neurons and weights have been operated on: if so, ending the operation; otherwise, returning to the step of the input neuron cache (6) and the weight cache (7) reading part of the mapped input neurons and weights through the DMA (3).
  15. The method according to claim 12, further comprising:
    the mapping unit (1) retrieving part of the input neurons and weights from the storage unit (2) through the DMA (3) and outputting the mapped input neurons and weights;
    the input neuron cache (6) and the weight cache (7) caching the mapped input neurons and weights, to be retrieved by the operation unit (8);
    the output neuron cache (9) caching the output neurons obtained by the operation unit (8) and storing them to the storage unit (2) through the DMA (3); and
    determining whether all the input neurons and weights have been mapped and operated on: if so, ending the operation; otherwise, returning to the step of the mapping unit (1) retrieving part of the input neurons and weights from the storage unit (2) through the DMA (3).
  16. A chip, wherein the chip comprises the device for artificial neural network operation according to any one of claims 1 to 11.
  17. An electronic device, wherein the electronic device comprises the chip according to claim 16.
PCT/CN2017/118124 2016-12-23 2017-12-22 Device and method for artificial neural network operation WO2018113790A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17883465.1A EP3561732A4 (en) 2016-12-23 2017-12-22 OPERATING DEVICE AND METHOD FOR AN ARTIFICIAL NEURONAL NETWORK
US16/444,443 US11775832B2 (en) 2016-12-23 2019-06-18 Device and method for artificial neural network operation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611214028 2016-12-23
CN201611214028.0 2016-12-23

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/444,443 Continuation-In-Part US11775832B2 (en) 2016-12-23 2019-06-18 Device and method for artificial neural network operation

Publications (1)

Publication Number Publication Date
WO2018113790A1 true WO2018113790A1 (zh) 2018-06-28

Family

ID=62624548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/118124 WO2018113790A1 (zh) 2016-12-23 2017-12-22 Device and method for artificial neural network operation

Country Status (4)

Country Link
US (1) US11775832B2 (zh)
EP (1) EP3561732A4 (zh)
CN (4) CN108320018B (zh)
WO (1) WO2018113790A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102637733B1 * 2018-10-31 2024-02-19 삼성전자주식회사 Neural network processor and convolution operation method thereof
CN109740739B * 2018-12-29 2020-04-24 中科寒武纪科技股份有限公司 Neural network computing device, neural network computing method, and related products
US11099788B2 (en) * 2019-10-21 2021-08-24 Advanced Micro Devices, Inc. Near-memory data reduction
KR20210084123A * 2019-12-27 2021-07-07 삼성전자주식회사 Electronic device and control method therefor
US20220138579A1 (en) * 2020-11-02 2022-05-05 International Business Machines Corporation Weight repetition on rpu crossbar arrays


Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE58909182D1 * 1988-07-05 1995-05-24 Siemens Ag Network module and architecture for the programmable emulation of digitally operating artificial neural networks
US5274745A (en) * 1989-07-28 1993-12-28 Kabushiki Kaisha Toshiba Method of processing information in artificial neural networks
JP5115965B2 * 2007-10-01 2013-01-09 独立行政法人理化学研究所 Neuron device, neural network device, non-negative integer encoding device, integer cluster device, feedback control device, and program
CN101546389A * 2008-03-26 2009-09-30 中国科学院半导体研究所 Principal-direction neural network system
JP4766101B2 * 2008-11-10 2011-09-07 ソニー株式会社 Tactile behavior recognition device and method, information processing device, and computer program
DE102008058016A1 * 2008-11-19 2010-11-04 Optiminig Gmbh System and method for computer-based analysis of large volumes of data
CN102111626A * 2009-12-23 2011-06-29 新奥特(北京)视频技术有限公司 Method and device for converting the RGB color space to the CMYK color space
CN101834827B * 2010-03-29 2012-07-18 大唐联诚信息系统技术有限公司 Signal detection method and device in a multiple-input multiple-output system
US8700552B2 (en) * 2011-11-28 2014-04-15 Microsoft Corporation Exploiting sparseness in training deep neural networks
CN102665049B * 2012-03-29 2014-09-17 中国科学院半导体研究所 Visual image processing system based on a programmable vision chip
US9730643B2 (en) * 2013-10-17 2017-08-15 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks
US20160026912A1 (en) * 2014-07-22 2016-01-28 Intel Corporation Weight-shifting mechanism for convolutional neural networks
CN105550749A * 2015-12-09 2016-05-04 四川长虹电器股份有限公司 Construction method for a convolutional neural network with a novel network topology
CN106127301B * 2016-01-16 2019-01-11 上海大学 Hardware implementation device for a random neural network
CN105512723B * 2016-01-20 2018-02-16 南京艾溪信息科技有限公司 Artificial neural network computing device and method for sparse connections
WO2017155544A1 (en) * 2016-03-11 2017-09-14 Hewlett Packard Enterprise Development Lp Hardware accelerators for calculating node values of neural networks
CN105844330B * 2016-03-22 2019-06-28 华为技术有限公司 Data processing method of a neural network processor, and neural network processor
CN106022468B * 2016-05-17 2018-06-01 成都启英泰伦科技有限公司 Artificial neural network processor integrated circuit and design method therefor
US20180046903A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Deep processing unit (dpu) for implementing an artificial neural network (ann)
WO2018058509A1 (en) * 2016-09-30 2018-04-05 Intel Corporation Dynamic neural network surgery

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529281A * 2003-10-21 2004-09-15 上海交通大学 Neural network modeling method
CN105701540A * 2016-01-11 2016-06-22 清华大学 Method for constructing a self-generating neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIAO, JUNFEI ET AL.: "Dynamic optimization structure design for neural networks: review and perspective", CONTROL THEORY & APPLICATIONS, vol. 27, no. 3, 1 March 2010 (2010-03-01), pages 350 - 357, XP055610827, ISSN: 1000-8152, DOI: 10.7641/j.issn.1000-8152 *
SUN, HUANLONG ET AL.: "Research on New pruning algorithm for Feedforward Neural Network Structure", JOURNAL OF GUANGXI TEACHERS EDUCATION UNIVERSITY (NATURAL SCIENCE EDITION), vol. 30, no. 4, 31 December 2013 (2013-12-31), pages 55 - 60, XP009515365, ISSN: 1002-8743 *

Also Published As

Publication number Publication date
US20190311266A1 (en) 2019-10-10
CN111160547B (zh) 2024-04-09
EP3561732A4 (en) 2020-04-01
CN111126590A (zh) 2020-05-08
CN111160547A (zh) 2020-05-15
EP3561732A1 (en) 2019-10-30
CN108334944A (zh) 2018-07-27
CN108320018A (zh) 2018-07-24
US11775832B2 (en) 2023-10-03
CN108334944B (zh) 2020-04-17
CN108320018B (zh) 2020-03-06
CN111126590B (zh) 2023-09-29

Similar Documents

Publication Publication Date Title
WO2018113790A1 (zh) Device and method for artificial neural network operation
CN107609642B (zh) Computing device and method
US11698786B2 (en) Processing apparatus and processing method
US11307865B2 (en) Data processing apparatus and method
US20200097806A1 (en) Processing method and accelerating device
CN109284823B (zh) Operation device and related products
EP3451157B1 (en) Device and method for performing forward operation of convolutional neural network
WO2018108126A1 (zh) Neural network convolution operation device and method
CN110298443B (zh) Neural network operation device and method
CN108427990B (zh) Neural network computing system and method
KR102470264B1 (ko) Apparatus and method for executing reverse training of a fully-connected-layer neural network
WO2017185391A1 (zh) Device and method for performing training of a convolutional neural network
CN111260025B (zh) Device and operation method for performing LSTM neural network operations
WO2017185389A1 (zh) Device and method for performing matrix multiplication
WO2017185387A1 (zh) Device and method for performing the forward operation of a fully-connected-layer neural network
WO2021208612A1 (zh) Data processing method and device
EP3564863B1 (en) Apparatus for executing lstm neural network operation, and operational method
EP3451238A1 (en) Apparatus and method for executing pooling operation
CN108171328B (zh) Neural network processor and convolution operation method executed thereby
WO2018058427A1 (zh) Neural network operation device and method
WO2018112892A1 (zh) Device and method supporting fast artificial neural network operations
CN107305486B (zh) Neural network maxout layer computing device
WO2017177446A1 (zh) Artificial neural network reverse training device and method supporting discrete data representation
WO2017185248A1 (zh) Apparatus and method for performing an artificial neural network self-learning operation
CN113741977B (zh) Data operation method, data operation device, and data processor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17883465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017883465

Country of ref document: EP

Effective date: 20190723