WO2018112892A1 - Device and method for supporting fast artificial neural network operation - Google Patents


Info

Publication number: WO2018112892A1 (application PCT/CN2016/111737)
Authority: WIPO (PCT)
Prior art keywords: input, neurons, neuron, output, unit
Application number: PCT/CN2016/111737
Other languages: French (fr), Chinese (zh)
Inventors: 刘少礼, 郝一帆, 陈云霁, 郭崎, 陈天石
Original Assignee: 北京中科寒武纪科技有限公司, 上海寒武纪信息科技有限公司
Application filed by 北京中科寒武纪科技有限公司 and 上海寒武纪信息科技有限公司
Priority to PCT/CN2016/111737
Publication of WO2018112892A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology

Definitions

  • The present invention relates to the field of data processing technologies, and more particularly to an apparatus and method for fast artificial neural network operations.
  • Artificial Neural Networks (ANNs), often simply called Neural Networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of biological neural networks and perform distributed, parallel information processing. Such a network relies on the complexity of the system and adjusts the interconnections among a large number of internal nodes in order to process information.
  • The core operation used by neural networks is vector multiplication, and sign functions and their various approximations are widely used.
  • Neural networks are widely used in a variety of application scenarios: computer vision, speech recognition, natural language processing, and so on.
  • In recent years, the scale of neural networks has kept growing.
  • In 1998, LeCun's neural network for handwritten character recognition had fewer than 1M weights; in 2012, Krizhevsky's network for the ImageNet competition had 60M weights.
  • A neural network is a computation-intensive and memory-intensive application: the more weights, the larger the amount of computation and memory traffic.
  • With the rapid growth in computation and memory traffic, the prior art generally uses a general-purpose processor to compute the artificial neural network.
  • For a general-purpose processor, input neurons, output neurons, and weights are stored in three arrays, together with an index array that stores the connection relationship between each output and each input connection.
  • The main operation during computation is the multiplication of neurons by weights. Because weights and neurons are not in one-to-one correspondence, every operation has to look up the weight corresponding to a neuron through the index array. Since the computing power and memory bandwidth of general-purpose processors are weak, the needs of neural networks cannot be met.
  • Another known method of supporting artificial neural network operations and their training algorithms is to use a graphics processing unit (GPU), which supports the above algorithms by executing general-purpose SIMD instructions with a general-purpose register file and general-purpose stream processing units.
  • However, the GPU is a device designed for graphics operations and scientific computing, without dedicated support for artificial neural network operations, so a large amount of front-end decoding work is still required to perform artificial neural network operations, which brings significant additional overhead.
  • In addition, the GPU has only a small on-chip buffer, so the model data (weights) of a multi-layer artificial neural network have to be transferred repeatedly from off-chip; the off-chip bandwidth becomes the main performance bottleneck and brings huge power consumption overhead.
  • In view of the above, the present invention proposes an apparatus and method for fast artificial neural network operation.
  • An apparatus for supporting fast artificial neural network operations comprises:
  • a mapping unit that receives input neurons, weights, and the connection relationship between the input neurons and the output neurons, optimizes the connection relationship, and outputs the mapped input neurons and weights, where the correspondence between a mapped input neuron and its weight is an input neuron-weight pair.
  • A method for supporting fast artificial neural network operations comprises:
  • the mapping unit retrieving the input neurons, weights, and connection relationship from the storage unit and outputting the mapped input neurons and weights;
  • the computing device retrieving the mapped input neurons and weights and performing operations to obtain the output neurons.
  • FIG. 1 is a schematic structural diagram of a mapping unit according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an artificial neural network according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of the first connection manner of the first output neuron after the artificial neural network in FIG. 2 is sparsified;
  • FIG. 4 is a schematic diagram of the second connection manner of the first output neuron after the artificial neural network in FIG. 2 is sparsified;
  • FIG. 5 is a schematic structural diagram of an apparatus for supporting a fast artificial neural network operation according to an embodiment of the present invention
  • FIG. 6 is a flow chart of an operation method of the apparatus for supporting fast artificial neural network operation in FIG. 5;
  • Figure 7 is a flow chart showing the operation steps of the arithmetic unit of Figure 6;
  • FIG. 8 is a schematic structural diagram of an apparatus for supporting a fast artificial neural network operation according to another embodiment of the present invention.
  • FIG. 9 is a flow chart of an operation method of the apparatus for supporting fast artificial neural network operation in FIG. 8;
  • FIG. 10 is a schematic structural diagram of a system supporting fast artificial neural network operation according to still another embodiment of the present invention.
  • Embodiments of the present invention provide an apparatus for supporting fast artificial neural network operations, which includes a mapping unit.
  • The mapping unit optimizes the connection relationship and outputs the mapped input neurons and weights.
  • The correspondence between a mapped input neuron and its weight is an input neuron-weight pair, which reduces the amount of computation of the artificial neural network operation and thereby achieves fast artificial neural network operation.
  • The input neuron-weight pair is not a real data storage structure; it merely represents the correspondence between an input neuron and a weight.
  • For example, the input neurons are stored in a vector A and the weights are stored in a vector B, where vectors A and B have the same length; the components at the same position of A and B, taken together, are regarded as one input neuron-weight pair.
  • When participating in an operation, the input neurons and weights can be placed separately in different caches and used by the arithmetic unit.
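  • As a minimal illustration of this pairing (not part of the patent text; the variable names and numeric values are assumptions for illustration only), the sketch below keeps the mapped input neurons and weights in two aligned vectors; the elements at the same index of the two vectors form one input neuron-weight pair consumed by the arithmetic unit.

```python
# Minimal sketch: an "input neuron-weight pair" is just the pair of components
# sitting at the same index of two aligned vectors (names are illustrative).
mapped_neurons = [0.7, 1.2]     # vector A: mapped input neurons, e.g. I3, I4
mapped_weights = [0.5, -0.3]    # vector B: mapped weights, e.g. W31, W41

# The arithmetic unit can read the two vectors from separate caches and
# consume them index by index as neuron-weight pairs.
pairs = list(zip(mapped_neurons, mapped_weights))
partial_sum = sum(n * w for n, w in pairs)  # inner product over the pairs
print(pairs, partial_sum)
```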
  • As shown in FIG. 1, the input data includes input neurons, weights, and the connection relationship. The input data is fed into the mapping unit 1, and the mapping unit 1 outputs the mapped input neurons and weights, whose correspondence is the input neuron-weight pair.
  • The mapping unit includes a sparsification mapping unit 11 and/or a fast mapping unit 12. The sparsification mapping unit 11 performs a sparsification operation that removes connections whose weight is 0 or smaller than a preset threshold; the fast mapping unit 12 performs a fast mapping operation that removes connections whose input neuron is 0 or smaller than a preset threshold. The two thresholds mentioned here need not be equal.
  • The sparsification mapping unit 11 includes a sparsification judgment unit and a sparsification execution unit. The judgment unit decides whether to perform the sparsification operation. If it decides to perform it, the execution unit optimizes the connection relationship according to whether the weight between an input neuron and an output neuron is 0 or smaller than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If the judgment unit decides not to perform it, all weights are assumed by default to be non-zero or larger than the preset threshold, the connection relationship is left untouched, and the input data is converted directly into input neuron-weight pairs.
  • The connection relationship in the sparsification mapping unit 11 can be expressed in either of the following two ways:
  • First way: a 1 indicates that the weight between an input neuron and an output neuron is non-zero or larger than the preset threshold, so the connection between them is retained; a 0 indicates that the weight is zero or smaller than the preset threshold, so the connection is removed. For each output neuron, its connections to all input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron, and the connection relationships of all output neurons are concatenated into one vector.
  • Second way: connections are retained or removed according to whether the weight is zero or smaller than the preset threshold, and the connection relationship of an output neuron is written as the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previous connected input neuron, the distance of its third connected input neuron from the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
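  • The following sketch (illustrative only; the function names, the use of absolute values, and the threshold value are assumptions, not part of the patent) builds both representations of the connection relationship for one output neuron from its row of weights.

```python
def connection_bits(weights, threshold=0.0):
    """First way: one bit per input neuron; 1 keeps the connection because the
    weight is non-zero (above the threshold), 0 removes it."""
    return [1 if abs(w) > threshold else 0 for w in weights]

def connection_distances(weights, threshold=0.0):
    """Second way: position of the first kept connection counted from the first
    input neuron, then the gap between each kept connection and the previous one."""
    kept = [i for i, w in enumerate(weights) if abs(w) > threshold]
    if not kept:
        return []
    return [kept[0]] + [b - a for a, b in zip(kept, kept[1:])]

# Weights from one output neuron to the inputs I1..I4 (illustrative values, W21 = 0).
row = [0.5, 0.0, 0.8, -0.2]
print(connection_bits(row))        # [1, 0, 1, 1]  ->  "1011"
print(connection_distances(row))   # [0, 2, 1]
```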
  • The fast mapping unit 12 includes a fast mapping judgment unit and a fast mapping execution unit. The judgment unit decides whether the neural network performs the operation of screening the input neurons. If it decides to perform it, the execution unit optimizes the connection relationship according to whether the value of each input neuron is 0 or smaller than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If not, all input neurons are assumed by default to be non-zero or larger than the preset threshold, the connection relationship is left untouched, and the input data is converted directly into input neuron-weight pairs.
  • The connection relationship in the fast mapping unit 12 can likewise be expressed in either of the following two ways:
  • First way: a 1 indicates that the input neuron is non-zero or larger than the preset threshold, so the connection between that input neuron and the output neuron is retained; a 0 indicates that the input neuron is zero or smaller than the preset threshold, so the connection is removed. For each output neuron, its connections to all input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron, and the connection relationships of all output neurons are concatenated into one vector.
  • Second way: connections are retained or removed according to whether the input neuron is zero or smaller than the preset threshold, and the connection relationship of an output neuron is written as the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previous connected input neuron, the distance of its third connected input neuron from the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
  • Specifically, suppose a neural network has L layers and let K = 1, 2, ..., L-1. For the Kth layer and the (K+1)th layer, the Kth layer is called the input layer and the (K+1)th layer is called the output layer. That is, except for the topmost layer, every layer can serve as an input layer, with the next layer as its corresponding output layer; the number of neurons in each layer is known in advance.
  • Assuming both the sparsification mapping unit and the fast mapping unit perform their operations, the computation between each pair of input and output layers proceeds as follows.
  • Let the input layer consist of N input neurons I1, I2, ..., IN, and let the output layer consist of M output neurons O1, O2, ..., OM, with i = 1, 2, ..., N and j = 1, 2, ..., M.
  • The first connection method:
  • For each output neuron Oj, its corresponding connection relationship is obtained. Since the input layer has N nodes, the connection relationship has N bits, each with value 1 or 0: a value of 1 at the ith bit indicates that there is a connection between Ii and Oj, and 0 indicates that there is no connection. Initially, all N bits are set to 1. If the value of input neuron Ii is zero or smaller than the preset threshold, or if the weight between Ii and Oj is zero or smaller than the preset threshold, the ith bit of the connection relationship is set to 0, i.e. Ii and Oj are considered unconnected. Then the connection relationships of all output neurons are concatenated into one vector, and the components from position N×(j-1)+1 to position N×j of that vector form the connection relationship of output neuron Oj.
  • In this method, the number of input-layer neurons equals the number of bits stored for the connection relationship of each output neuron, so even the simplest one-dimensional array of 0/1 values makes the connection relationship of every output neuron clear.
  • The second connection method:
  • For each output neuron Oj, its corresponding connection relationship is obtained. If the value of input neuron Ii is zero or smaller than the preset threshold, or if the weight between Ii and Oj is zero or smaller than the preset threshold, then Ii and Oj are considered unconnected; otherwise they are connected. Let the input neurons connected to Oj be Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. Then the connection relationship has n entries: the value of the first entry equals i_1 - 1, and for n ≥ k > 1 the value of the kth entry equals i_k - i_(k-1).
  • In this method, the connection relationship can be represented by a high-dimensional dynamic array, a linked list, or the like.
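  • A compact sketch of the two methods as just described, combining the zero-input and zero-weight criteria for one output neuron, is given below (illustrative only; function names, thresholds, and numeric values are assumptions).

```python
def relation_method1(inputs, weights, in_thr=0.0, w_thr=0.0):
    """First connection method for one output neuron Oj: N bits, where bit i
    is cleared when input Ii or weight Wij is zero / below its threshold."""
    return [0 if (abs(x) <= in_thr or abs(w) <= w_thr) else 1
            for x, w in zip(inputs, weights)]

def relation_method2(inputs, weights, in_thr=0.0, w_thr=0.0):
    """Second connection method for one output neuron Oj: index of the first
    kept connection (i_1 - 1), then the gaps i_k - i_(k-1) between kept ones."""
    kept = [i + 1 for i, (x, w) in enumerate(zip(inputs, weights))
            if abs(x) > in_thr and abs(w) > w_thr]           # 1-based i_1..i_n
    if not kept:
        return []
    return [kept[0] - 1] + [b - a for a, b in zip(kept, kept[1:])]

# Example with N = 4 inputs (values are illustrative): I1 = 0, W2j = 0.
I = [0.0, 0.4, 0.9, 0.3]
W_row = [0.5, 0.0, 0.8, -0.2]
print(relation_method1(I, W_row))   # [0, 0, 1, 1]
print(relation_method2(I, W_row))   # [2, 1]
```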
  • After the processed connection relationship is obtained, the mapping unit outputs the mapped input neurons and weights according to that connection relationship; the correspondence between the mapped input neurons and weights is the input neuron-weight pair, and the mapped input neurons and weights can be used directly during the operation.
  • In short, in the above mapping unit, the sparsification mapping unit 11 and the fast mapping unit 12 optimize the connection relationship of the input data and output the mapped input neurons and weights. The connection relationship can adopt either of two representations: in one, a single bit between each input and output neuron indicates whether there is a connection; in the other, the distances between connections indicate the position of each connection.
  • Take the artificial neural network shown in FIG. 2 as an example, and for simplicity let the criterion be whether a value equals 0. The network has 4 input neurons I1, I2, I3, I4 and 2 output neurons O1, O2, with connection weights W11, W21, W31, W41, W12, W22, W32, W42. Let I1 be 0 and I2, I3, I4 be non-zero; let W21, W12, W42 be 0 and the remaining weights be non-zero.
  • The sparsification mapping unit and the fast mapping unit may process the data at the same time, or one after the other in either order. Below, only the case where the sparsification mapping unit processes the data first is described.
  • Expressed with the first connection method:
  • In the sparsification mapping unit 11, if the sparsification operation is not performed, the connection relationships of O1 and O2 both default to 1111, and the concatenated order is 11111111. If the sparsification operation is performed, as shown in FIG. 3, the connection relationship of output neuron O1 is 1011, where each bit indicates whether there is a connection to the corresponding input (1 means connected, 0 means not connected), and the connection relationship of output neuron O2 is 0110.
  • During the operation, input neurons and weights whose connection bit is 0 are not computed.
  • When storing the connection relationship, it can be stored in the order of the output neurons: all inputs of each output neuron are laid out in turn and concatenated into one vector, giving 10110110 for the example above.
  • In the fast mapping unit 12, if the operation of screening the input-neuron values is not performed, the connection relationships and the concatenated order of O1 and O2 are unchanged. If it is performed, then for the sparsified network shown in FIG. 3 the connection relationship of output neuron O1 becomes 0011 (the first bit changes from 1 to 0 because the first input neuron I1 is 0, so the connection from I1 is removed) and the connection relationship of output neuron O2 is 0110, giving a final layout of 00110110. For the network without the sparsification operation, the connection relationship of output neuron O1 is 0111 and that of O2 is 0111, giving a final layout of 01110111.
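  • The short sketch below (illustrative only; variable names are assumptions) reproduces the bit strings above for the FIG. 2 example: sparsification alone gives 10110110, and applying the fast mapping on top of it gives 00110110.

```python
# FIG. 2 example: inputs I1..I4 and the weight rows of O1 and O2 (1 = non-zero).
I = [0, 1, 1, 1]                      # I1 = 0, the rest non-zero
W = [[1, 0, 1, 1],                    # O1 row: W21 = 0
     [0, 1, 1, 0]]                    # O2 row: W12 = 0, W42 = 0

def bits(row, inputs=None):
    """First connection method: 1 keeps a connection, 0 removes it.
    If `inputs` is given, fast mapping also removes connections whose input is 0."""
    keep = lambda i, w: w != 0 and (inputs is None or inputs[i] != 0)
    return ''.join('1' if keep(i, w) else '0' for i, w in enumerate(row))

print(''.join(bits(row) for row in W))          # sparsification only: 10110110
print(''.join(bits(row, I) for row in W))       # plus fast mapping:   00110110
```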
  • Expressed with the second connection method:
  • In the sparsification mapping unit 11, if the sparsification operation is not performed, the connection relationships of O1 and O2 both default to 0, 1, 1, 1. If the sparsification operation is performed, as shown in FIG. 4, output neuron O1 is connected to input neurons I1, I3, I4, so its connection relationship is 0, 2, 1: the 0 indicates that the first connection is at distance 0 from the first input neuron, i.e. it is the first input neuron; the 2 indicates that the second connected input neuron is at distance 2 from the previous connected input neuron, i.e. it is the third input neuron; and the 1 indicates that the third connected input neuron is at distance 1 from the previous one, i.e. it is the fourth input neuron. Likewise, the connection relationship of O2 is 1, 1.
  • In the fast mapping unit 12, if the operation of screening the input-neuron values is not performed, the connection relationships of O1 and O2 are unchanged. If it is performed, then for the sparsified network shown in FIG. 4, the connection from I1 is removed because the first input neuron I1 is 0, so the connection relationship of output neuron O1 becomes 2, 1 and that of output neuron O2 remains 1, 1. For the network without the sparsification operation, the connection relationships of O1 and O2 are both 1, 1, 1.
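  • A corresponding sketch for the second connection method (again illustrative; names and values are assumptions) reproduces the distance lists above.

```python
# Same FIG. 2 example: inputs I1..I4 and the weight rows of O1 and O2 (1 = non-zero).
I = [0, 1, 1, 1]
W = [[1, 0, 1, 1], [0, 1, 1, 0]]

def distances(row, inputs=None):
    """Second connection method: position of the first kept connection,
    then the gap between each kept connection and the previous one."""
    kept = [i for i, w in enumerate(row)
            if w != 0 and (inputs is None or inputs[i] != 0)]
    if not kept:
        return []
    return [kept[0]] + [b - a for a, b in zip(kept, kept[1:])]

print([distances(row) for row in W])      # sparsification only: [[0, 2, 1], [1, 1]]
print([distances(row, I) for row in W])   # plus fast mapping:   [[2, 1], [1, 1]]
```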
  • The connection relationships used by the sparsification mapping unit 11 and the fast mapping unit 12 in the present invention include, but are not limited to, the above representations.
  • The sparsification mapping unit 11 and the fast mapping unit 12 output the mapped neurons and weights according to the connection relationship obtained above; the correspondence between the mapped neurons and weights is the input neuron-weight pair, and these pairs can be used directly in the operation. Take the mapping of output neuron O1 of the artificial neural network shown in FIG. 2 as a concrete example:
  • The input neurons are I1, I2, I3, I4 and the input weights are W11, W21, W31, W41, where I1 and W21 are 0 and the rest are non-zero.
  • After sparsification, the connection relationship is 1011 (or 0, 2, 1); after fast mapping, the connection relationship is 0011 (or 2, 1).
  • The two mapping units then output the data with the zero-valued input neurons, and the connection weights issued from them, removed.
  • The mapped input neurons are I3, I4 and the mapped weights are W31, W41; the input neuron-weight pairs are I3-W31 and I4-W41. In vector form, the obtained input neuron vector is (I3, I4) and the obtained weight vector is (W31, W41).
  • The above describes the case where the sparsification mapping unit performs sparsification first and the fast mapping unit then performs the fast mapping, finally yielding the mapped input neurons and weights; the two mapping units may also operate on the data at the same time, without restriction on their order.
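  • The sketch below (illustrative, with assumed names and symbolic values) shows the final compaction step for O1: the processed connection relationship 0011 selects the surviving input neuron-weight pairs, yielding the vectors (I3, I4) and (W31, W41).

```python
# O1 in the FIG. 2 example (symbolic values for readability).
inputs  = ['I1', 'I2', 'I3', 'I4']
weights = ['W11', 'W21', 'W31', 'W41']
relation = [0, 0, 1, 1]          # processed connection relationship of O1

# Keep only the positions whose connection bit is 1.
mapped_inputs  = [x for x, b in zip(inputs, relation) if b]
mapped_weights = [w for w, b in zip(weights, relation) if b]

print(mapped_inputs, mapped_weights)              # ['I3', 'I4'] ['W31', 'W41']
print(list(zip(mapped_inputs, mapped_weights)))   # [('I3', 'W31'), ('I4', 'W41')]
```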
  • The device supporting fast artificial neural network operation in this embodiment of the present invention comprises, in addition to the mapping unit 1: a storage unit 2, a DMA (direct memory access) unit 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an arithmetic unit 8, and an output neuron cache 9.
  • The storage unit 2 is configured to store data and instructions; it receives and stores externally input data and instructions, including the input neurons, weights, and connection relationship.
  • The mapping unit 1 retrieves the input neurons, weights, and connection relationship from the storage unit 2; the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping, so that the mapping unit 1 obtains the mapped input neurons and weights, which are stored back into the storage unit 2.
  • The DMA 3 fetches the instructions and the mapped input neurons and weights from the storage unit 2 and distributes them to the instruction cache 4, the input neuron cache 6, and the weight cache 7, respectively.
  • The control unit 5 reads the dedicated instructions from the instruction cache 4, decodes them into arithmetic-unit instructions, and feeds them to the arithmetic unit 8.
  • The arithmetic unit 8 is configured to execute the specific operations: according to the operation instructions, it retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them.
  • The arithmetic unit 8 includes a multiplication unit for multiplying the mapped neurons and the weight data in a first stage; an addition-tree unit that, in a second stage, adds the products of the first stage step by step through the addition tree to complete the vector inner-product operation; and a nonlinear transform unit that performs a nonlinear transformation on the result of the second stage to obtain the output neuron, where the nonlinear transformation is an activation-function operation and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
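  • A minimal functional sketch of these three stages (multiplication, addition-tree reduction, activation) is shown below; it illustrates the described data flow only, not the hardware implementation, and the sigmoid is just one of the activation functions the text lists.

```python
import math

def operate(mapped_neurons, mapped_weights,
            activation=lambda s: 1 / (1 + math.exp(-s))):
    """Three stages of the arithmetic unit: multiply, addition-tree sum, activation."""
    # Stage 1: multiply each input neuron-weight pair.
    products = [n * w for n, w in zip(mapped_neurons, mapped_weights)]
    # Stage 2: addition tree - pairwise reduction of the products (vector inner product).
    while len(products) > 1:
        products = [products[i] + products[i + 1] if i + 1 < len(products) else products[i]
                    for i in range(0, len(products), 2)]
    s = products[0] if products else 0.0
    # Stage 3: nonlinear transform (activation function), here sigmoid as an example.
    return activation(s)

print(operate([0.7, 1.2], [0.5, -0.3]))   # output neuron for the pairs I3-W31, I4-W41
```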
  • The output neuron cache 9 stores the output neurons obtained by the arithmetic unit; they are then stored into the storage unit 2 via the DMA 3, from where they can be retrieved externally.
  • This embodiment also provides a method for supporting fast artificial neural network operation. As shown in FIG. 6, the method includes the following steps:
  • S101: Read an artificial neural network SIMD instruction, which starts the fast artificial neural network operation.
  • S102: The mapping unit retrieves all input neurons, weights, and connection relationships from the storage unit, processes them, obtains the mapped input neurons and weights, and stores them back into the storage unit.
  • Specifically, the sparsification mapping unit performs sparsification processing on the input neurons, weights, and connection relationships, and the fast mapping unit performs fast-mapping processing on them.
  • Both mapping units can use either of the two connection methods to process the connection relationship, and output the mapped neurons and weights according to the processed connection relationship.
  • The connection methods and the way the mapped neurons and weights are output according to the processed connection relationship have been described in detail above and are not repeated here.
  • S103: The input neuron cache 6 and the weight cache 7 read part of the mapped neurons and weights through the DMA 3.
  • S104: The arithmetic unit retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.
  • S1041: Perform the multiplication operation, multiplying the mapped neurons by the weight data;
  • S1042: Perform the addition-tree operation, adding the results of the first stage step by step through the addition tree to complete the vector inner-product operation;
  • S1043: Perform a nonlinear transformation on the result of the second stage to obtain the output neuron; the nonlinear transformation is an activation-function operation, and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
  • S105: The arithmetic unit stores the obtained output neurons into the output neuron cache 9, and they are stored into the storage unit 2 via the DMA 3.
  • S106: Determine whether all mapped neurons and weights have been computed. If the result is N, return to step S103; if the result is Y, perform step S107.
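  • Putting the steps together, a simplified software analogue of this per-layer flow is sketched below (an illustration under assumed names and values, not the device's actual instruction set): for each output neuron, the mapped neuron-weight pairs are gathered, the inner product is accumulated, and an activation is applied before the result is stored.

```python
import math

def run_layer(inputs, weight_rows, relation_rows):
    """Schematic of the loop of FIG. 6: for each output neuron, load its mapped
    neuron-weight pairs, compute the inner product, apply the activation, store."""
    outputs = []
    for weights, relation in zip(weight_rows, relation_rows):   # one pass per output neuron
        pairs = [(x, w) for x, w, keep in zip(inputs, weights, relation) if keep]
        s = sum(x * w for x, w in pairs)                         # multiply + addition tree
        outputs.append(1 / (1 + math.exp(-s)))                   # sigmoid as example activation
    return outputs                                               # stored as output neurons

# FIG. 2 example with illustrative numeric values and processed relations 0011 / 0110.
print(run_layer([0, 0.4, 0.7, 1.2],
                [[0.5, 0.0, 0.8, -0.2], [0.0, 0.3, 0.6, 0.0]],
                [[0, 0, 1, 1], [0, 1, 1, 0]]))
```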
  • Another embodiment of the present invention provides an apparatus for supporting fast artificial neural network operations, including a mapping unit 1, a storage unit 2, a DMA (direct memory access) unit 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an arithmetic unit 8, and an output neuron cache 9.
  • The storage unit 2 is configured to store data and instructions; it receives and stores externally input data and instructions, including the input neurons, weights, and connection relationship.
  • The DMA 3 fetches the instructions from the storage unit 2 and distributes them to the instruction cache 4, and fetches the input neurons, weights, and connection relationship from the storage unit 2 and passes them to the mapping unit 1 for direct mapping.
  • In the mapping unit 1, the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping; the mapping unit 1 thereby obtains the mapped input neurons and weights and transmits them directly to the input neuron cache 6 and the weight cache 7, respectively.
  • The control unit 5 reads the dedicated instructions from the instruction cache 4, decodes them into arithmetic-unit instructions, and feeds them to the arithmetic unit 8.
  • The arithmetic unit 8 is configured to execute the specific operations: according to the operation instructions, it retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them.
  • The arithmetic unit 8 includes a multiplication unit for multiplying the mapped neurons and the weight data in a first stage; an addition-tree unit that, in a second stage, adds the products of the first stage step by step through the addition tree to complete the vector inner-product operation; and a nonlinear transform unit that performs a nonlinear transformation on the result of the second stage to obtain the output neuron, where the nonlinear transformation is an activation-function operation and the activation function may be a sigmoid, tanh, ReLU, or softmax function.
  • The output neuron cache 9 stores the output neurons obtained by the arithmetic unit; they are then stored into the storage unit 2 via the DMA 3, from where they can be retrieved externally.
  • This embodiment also provides a method for supporting fast artificial neural network operations, as shown in FIG. 9, including the following steps:
  • S201: Read an artificial neural network SIMD instruction, which starts the fast artificial neural network operation.
  • S202: The mapping unit retrieves part of the input neurons, weights, and connection relationships from the storage unit through the DMA 3, processes them, and delivers the mapped input neurons and weights directly into the input neuron cache 6 and the weight cache 7.
  • Specifically, the sparsification mapping unit performs sparsification processing on the input neurons, weights, and connection relationships, and the fast mapping unit performs fast-mapping processing on them.
  • Both mapping units can use either of the two connection methods to process the connection relationship, and output the mapped neurons and weights according to the processed connection relationship; this has been described in detail above and is not repeated here.
  • S203: The arithmetic unit retrieves the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.
  • S204: The arithmetic unit stores the obtained output neurons into the output neuron cache 9, and they are stored into the storage unit 2 via the DMA 3.
  • S205: Determine whether all input neurons and weights have been mapped and computed. If the result is N, return to step S202; if the result is Y, the next step is performed.
  • In this embodiment, the sparsification mapping unit and the fast mapping unit of the mapping unit perform the mapping during the computation, and the mapped data is fed directly to the arithmetic unit, whereas in the previous embodiment the data mapped by the sparsification mapping unit and the fast mapping unit is stored in the storage unit before the arithmetic unit uses it in the computation; the operation speed of this embodiment is therefore faster.
  • Yet another embodiment of the present invention provides a system for fast artificial neural network operation, as shown in FIG. 10, which includes an I/O interface 20, a storage device 30, a central processing unit (CPU) 40, and a device 10 supporting fast artificial neural network operation.
  • The I/O interface 20 is used for I/O data, which is sent by the CPU 40 to the device 10 supporting fast artificial neural network operation and then written to the storage device 30 by that device; the dedicated instructions required by the device 10 are likewise transmitted by the CPU 40 to the device 10.
  • The storage device 30 is used to temporarily store the artificial neural network model and the neuron data, in particular when the whole model cannot be held in the cache of the device 10 supporting fast artificial neural network operation.
  • The central processing unit (CPU) 40 is used for data transfer and basic control, such as starting and stopping the device 10 supporting fast artificial neural network operation, and serves as the interface between that device and external control.
  • The device 10 supporting fast artificial neural network operation accepts data and programs from the CPU 40 and executes the fast artificial neural network operation algorithm, and its execution results are transmitted back to the CPU 40.
  • In this application structure, the device 10 supporting fast artificial neural network operation acts as a coprocessor of the CPU 40 or the GPU to execute the fast artificial neural network operation algorithm.
  • A plurality of devices supporting fast artificial neural network operation may be interconnected to form a system: they can be interconnected through a PCIe bus to support larger-scale fast artificial neural network operations, may share the same host CPU or each have its own host CPU, and may share memory or each accelerator may have its own memory.
  • The interconnection method can be any interconnection topology.
  • When a device or method using the techniques of the present invention performs a neural network operation, if part of the input neurons and weights of a given network have values equal to or near zero, there is an improvement in computation speed over devices or methods that do not use the techniques described herein. Moreover, the larger the proportion of input neurons equal to or near zero among all input neurons of the network, and the larger the proportion of weights equal to or near zero among all weights of the network, the greater the increase in operation speed.
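  • As a rough illustration of this scaling (not from the patent; function names, sizes, and the random model are assumptions), the sketch below counts how many neuron-weight multiplications survive the mapping for a layer with given fractions of zero inputs and zero weights; the retained fraction, and hence the remaining work, shrinks as either fraction of zeros grows.

```python
import random

def retained_fraction(n_inputs, n_outputs, p_zero_input, p_zero_weight, seed=0):
    """Fraction of neuron-weight multiplications left after removing connections
    whose input neuron or weight is zero (zeros drawn independently at random)."""
    rng = random.Random(seed)
    inputs = [0.0 if rng.random() < p_zero_input else 1.0 for _ in range(n_inputs)]
    kept = total = 0
    for _ in range(n_outputs):
        for x in inputs:
            w = 0.0 if rng.random() < p_zero_weight else 1.0
            total += 1
            kept += (x != 0.0 and w != 0.0)
    return kept / total

for pi, pw in [(0.0, 0.0), (0.3, 0.3), (0.5, 0.5), (0.7, 0.7)]:
    print(pi, pw, round(retained_fraction(1024, 256, pi, pw), 3))
```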

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a device and method for supporting fast artificial neural network operation. The device comprises a mapping unit for receiving an input neuron, a weight, and the connection relationship between the input neuron and an output neuron, optimizing the connection relationship, and outputting the mapped input neuron and the mapped weight, wherein the correspondence between the mapped input neuron and the mapped weight is an input neuron-weight pair. By means of the device and method, the connection relationship between the input neurons and the weights is optimized by a sparsification mapping unit and/or a fast mapping unit, thereby reducing the amount of calculation, solving the problems of insufficient CPU and GPU operation performance and high front-end decoding overhead, and effectively improving support for multi-layer artificial neural network operation algorithms.

Description

Device and method for supporting fast artificial neural network operation

Technical field

The present invention relates to the field of data processing technologies, and more particularly to an apparatus and method for fast artificial neural network operations.

Background
Artificial Neural Networks (ANNs), often simply called Neural Networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of biological neural networks and perform distributed, parallel information processing. Such a network relies on the complexity of the system and adjusts the interconnections among a large number of internal nodes in order to process information. The core operation used by neural networks is vector multiplication, and sign functions and their various approximations are widely used.

Neural networks are widely used in a variety of application scenarios: computer vision, speech recognition, natural language processing, and so on. In recent years, the scale of neural networks has kept growing. In 1998, LeCun's neural network for handwritten character recognition had fewer than 1M weights; in 2012, Krizhevsky's network for the ImageNet competition had 60M weights.
A neural network is a computation-intensive and memory-intensive application: the more weights, the larger the amount of computation and memory traffic. With the rapid growth in the computation and memory traffic of neural networks, the prior art generally uses a general-purpose processor to compute the artificial neural network. For a general-purpose processor, input neurons, output neurons, and weights are stored in three arrays, together with an index array that stores the connection relationship between each output and each input connection. The main operation during computation is the multiplication of neurons by weights. Because weights and neurons are not in one-to-one correspondence, every operation has to look up the weight corresponding to a neuron through the index array. Since the computing power and memory bandwidth of general-purpose processors are weak, the needs of neural networks cannot be met. When multiple general-purpose processors run in parallel, the communication among them becomes a performance bottleneck. When computing a pruned neural network, every multiplication has to look up the position of the corresponding weight in the index array again, which adds extra computation and memory-access overhead. Computing a neural network therefore takes a long time and consumes a lot of power. A general-purpose processor also has to decode the multi-layer artificial neural network operation into a long sequence of arithmetic and memory-access instructions, and the front-end decoding of the processor brings considerable power overhead.

Another known method of supporting artificial neural network operations and their training algorithms is to use a graphics processing unit (GPU), which supports the above algorithms by executing general-purpose SIMD instructions with a general-purpose register file and general-purpose stream processing units. However, the GPU is a device designed for graphics operations and scientific computing, without dedicated support for artificial neural network operations, so a large amount of front-end decoding work is still required to perform artificial neural network operations, which brings significant additional overhead. In addition, the GPU has only a small on-chip buffer, so the model data (weights) of a multi-layer artificial neural network have to be transferred repeatedly from off-chip; the off-chip bandwidth becomes the main performance bottleneck and brings huge power consumption overhead.
Summary of the invention

In view of the problems of the existing solutions, and in order to overcome the shortcomings of the above prior art, the present invention proposes an apparatus and method for fast artificial neural network operation.

According to one aspect of the present invention, an apparatus for supporting fast artificial neural network operations is provided, comprising:

a mapping unit that receives input neurons, weights, and the connection relationship between the input neurons and the output neurons, optimizes the connection relationship, and outputs the mapped input neurons and weights, where the correspondence between a mapped input neuron and its weight is an input neuron-weight pair.

According to another aspect of the present invention, a method for supporting fast artificial neural network operations is provided, comprising:

the mapping unit retrieving the input neurons, weights, and connection relationship from the storage unit and outputting the mapped input neurons and weights;

the computing device retrieving the mapped input neurons and weights and performing operations to obtain the output neurons.
It can be seen from the above technical solutions that the present invention has the following beneficial effects:

(1) By means of the sparsification mapping unit and/or the fast mapping unit, the connection relationship between the input neurons and the weights is optimized, which reduces the amount of computation, solves the problems of insufficient CPU and GPU operation performance and high front-end decoding overhead, and effectively improves the support for multi-layer artificial neural network operation algorithms;

(2) By using dedicated on-chip caches for the multi-layer artificial neural network operation algorithm, the reusability of the input neurons and weight data is fully exploited, repeated reads of these data from memory are avoided, the memory access bandwidth is reduced, and the problem of memory bandwidth becoming a performance bottleneck of the multi-layer artificial neural network operation and its training algorithm is avoided.
Brief description of the drawings

FIG. 1 is a schematic structural diagram of a mapping unit according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an artificial neural network according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the first connection manner of the first output neuron after the artificial neural network in FIG. 2 is sparsified;

FIG. 4 is a schematic diagram of the second connection manner of the first output neuron after the artificial neural network in FIG. 2 is sparsified;

FIG. 5 is a schematic structural diagram of an apparatus supporting fast artificial neural network operation according to an embodiment of the present invention;

FIG. 6 is a flow chart of the operation method of the apparatus supporting fast artificial neural network operation in FIG. 5;

FIG. 7 is a flow chart of the operation steps of the arithmetic unit in FIG. 6;

FIG. 8 is a schematic structural diagram of an apparatus supporting fast artificial neural network operation according to another embodiment of the present invention;

FIG. 9 is a flow chart of the operation method of the apparatus supporting fast artificial neural network operation in FIG. 8;

FIG. 10 is a schematic structural diagram of a system supporting fast artificial neural network operation according to yet another embodiment of the present invention.
Detailed description

Some embodiments of the present invention will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, of the embodiments are shown. Indeed, the various embodiments of the present invention can be implemented in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present invention satisfies applicable legal requirements.

In this specification, the various embodiments described below for explaining the principles of the present invention are illustrative only and should not be construed in any way as limiting the scope of the invention. The following description with reference to the accompanying drawings is intended to assist in a comprehensive understanding of the exemplary embodiments of the present invention defined by the claims and their equivalents. The description includes numerous specific details to assist understanding, but these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness, and the same reference numerals are used throughout the drawings for similar functions and operations.

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to specific embodiments and the accompanying drawings.
Embodiments of the present invention provide an apparatus for supporting fast artificial neural network operations, which includes a mapping unit. The mapping unit optimizes the connection relationship and outputs the mapped input neurons and weights; the correspondence between a mapped input neuron and its weight is an input neuron-weight pair. This reduces the amount of computation of the artificial neural network operation and achieves fast artificial neural network operation.

The input neuron-weight pair is not a real data storage structure; it merely represents the correspondence between an input neuron and a weight. For example, the input neurons are stored in a vector A and the weights are stored in a vector B, where vectors A and B have the same length; the components at the same position of A and B, taken together, are regarded as one input neuron-weight pair. When participating in an operation, the input neurons and weights can be placed separately in different caches and used by the arithmetic unit.

As shown in FIG. 1, the input data includes input neurons, weights, and the connection relationship. The input data is fed into the mapping unit 1, and the mapping unit 1 outputs the mapped input neurons and weights, whose correspondence is the input neuron-weight pair.
The mapping unit includes a sparsification mapping unit 11 and/or a fast mapping unit 12. The sparsification mapping unit 11 performs a sparsification operation that removes connections whose weight is 0 or smaller than a preset threshold; the fast mapping unit 12 performs a fast mapping operation that removes connections whose input neuron is 0 or smaller than a preset threshold. The two thresholds mentioned here need not be equal.

The sparsification mapping unit 11 includes a sparsification judgment unit and a sparsification execution unit. The judgment unit decides whether to perform the sparsification operation. If it decides to perform it, the execution unit optimizes the connection relationship according to whether the weight between an input neuron and an output neuron is 0 or smaller than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If the judgment unit decides not to perform it, all weights are assumed by default to be non-zero or larger than the preset threshold, the connection relationship is left untouched, and the input data is converted directly into input neuron-weight pairs.

The connection relationship in the sparsification mapping unit 11 can be expressed in either of the following two ways:

First way: a 1 indicates that the weight between an input neuron and an output neuron is non-zero or larger than the preset threshold, so the connection between them is retained; a 0 indicates that the weight is zero or smaller than the preset threshold, so the connection is removed. For each output neuron, its connections to all input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron, and the connection relationships of all output neurons are concatenated into one vector.

Second way: connections are retained or removed according to whether the weight is zero or smaller than the preset threshold, and the connection relationship of an output neuron is written as the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previous connected input neuron, the distance of its third connected input neuron from the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
The fast mapping unit 12 includes a fast mapping judgment unit and a fast mapping execution unit. The judgment unit decides whether the neural network performs the operation of screening the input neurons. If it decides to perform it, the execution unit optimizes the connection relationship according to whether the value of each input neuron is 0 or smaller than the preset threshold, and converts the input data into input neuron-weight pairs according to the processed connection relationship. If not, all input neurons are assumed by default to be non-zero or larger than the preset threshold, the connection relationship is left untouched, and the input data is converted directly into input neuron-weight pairs.

The connection relationship in the fast mapping unit 12 can likewise be expressed in either of the following two ways:

First way: a 1 indicates that the input neuron is non-zero or larger than the preset threshold, so the connection between that input neuron and the output neuron is retained; a 0 indicates that the input neuron is zero or smaller than the preset threshold, so the connection is removed. For each output neuron, its connections to all input neurons form a string of 0s and 1s that represents the connection relationship of that output neuron, and the connection relationships of all output neurons are concatenated into one vector.

Second way: connections are retained or removed according to whether the input neuron is zero or smaller than the preset threshold, and the connection relationship of an output neuron is written as the distance of its first connection from the first input neuron, the distance of its second connected input neuron from the previous connected input neuron, the distance of its third connected input neuron from the previous connected input neuron, and so on, until all inputs of that output neuron are exhausted.
Specifically, suppose a neural network has L layers and let K = 1, 2, ..., L-1. For the Kth layer and the (K+1)th layer, the Kth layer is called the input layer and the (K+1)th layer is called the output layer. That is, except for the topmost layer, every layer can serve as an input layer, with the next layer as its corresponding output layer; the number of neurons in each layer is known in advance.

Assuming both the sparsification mapping unit and the fast mapping unit perform their operations, the computation between each pair of input and output layers proceeds as follows.

Let the input layer consist of N input neurons I1, I2, ..., IN, and let the output layer consist of M output neurons O1, O2, ..., OM, with i = 1, 2, ..., N and j = 1, 2, ..., M.

The first connection method:

First, for each output neuron Oj, its corresponding connection relationship is obtained. Since the input layer has N nodes, the connection relationship has N bits, each with value 1 or 0: a value of 1 at the ith bit indicates that there is a connection between Ii and Oj, and 0 indicates that there is no connection. Initially, all N bits are set to 1. If the value of input neuron Ii is zero or smaller than the preset threshold, or if the weight between Ii and Oj is zero or smaller than the preset threshold, the ith bit of the connection relationship is set to 0, i.e. Ii and Oj are considered unconnected. Then the connection relationships of all output neurons are concatenated into one vector, and the components from position N×(j-1)+1 to position N×j of that vector form the connection relationship of output neuron Oj.

In this method, the number of input-layer neurons equals the number of bits stored for the connection relationship of each output neuron, so even the simplest one-dimensional array of 0/1 values makes the connection relationship of every output neuron clear.
The second connection method:

For each output neuron Oj, its corresponding connection relationship is obtained. If the value of input neuron Ii is zero or smaller than the preset threshold, or if the weight between Ii and Oj is zero or smaller than the preset threshold, then Ii and Oj are considered unconnected; otherwise they are connected. Let the input neurons connected to Oj be Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N. Then the connection relationship has n entries: the value of the first entry equals i_1 - 1, and for n ≥ k > 1 the value of the kth entry equals i_k - i_(k-1).

In this method, the connection relationship can be represented by a high-dimensional dynamic array, a linked list, or the like.
After the processed connection relationship is obtained, the mapping unit outputs the mapped input neurons and weights according to that connection relationship; the correspondence between the mapped input neurons and weights is the input neuron-weight pair, and the mapped input neurons and weights can be used directly during the operation.

In short, in the above mapping unit, the sparsification mapping unit 11 and the fast mapping unit 12 optimize the connection relationship of the input data and output the mapped input neurons and weights. The connection relationship can adopt either of two representations: in one, a single bit between each input and output neuron indicates whether there is a connection; in the other, the distances between connections indicate the position of each connection.

To make the functions of the two mapping units clearer, the data processing in each of the two units is described below.
Take the artificial neural network shown in Figure 2 as an example, and use only the criterion of whether a value is 0. The network has 4 input neurons I1, I2, I3, I4 and 2 output neurons O1, O2, with the connection weights denoted W11, W21, W31, W41, W12, W22, W32, W42. Let I1 be 0 and I2, I3, I4 be non-zero; let W21, W12, W42 be 0 and the remaining weights be non-zero.

The sparsification mapping unit and the fast mapping unit may process the data simultaneously or in sequence, and in the sequential case their order is interchangeable; the description below assumes the sparsification mapping unit processes the data first.

With the first connection method, the representation is as follows:

In the sparsification mapping unit 11, if the sparsification operation is not performed, the connection relations of O1 and O2 both default to 1111 and the combined arrangement is 11111111. If the sparsification operation is performed, as shown in Figure 3, the connection relation of output neuron O1 is 1011, where each bit indicates whether there is a connection with the corresponding input (1 for connected, 0 for unconnected), and the connection relation of output neuron O2 is 0110. During the operation, the input neurons and weights whose connection bit is 0 take no part in the computation. When storing the connection relations, they can be stored in the order of the output neurons: all bits for each output neuron are laid out in turn and concatenated into one vector, giving the arrangement 10110110 for this example.

In the fast mapping unit 12, if the operation of examining the input neuron values is not performed, the connection relations of O1 and O2 and their arrangement remain unchanged. If it is performed, then for the network of Figure 3 after the sparsification operation, the connection relation of output neuron O1 becomes 0011: the first bit changes from 1 to 0 because the first input neuron I1 has the value 0, so the connection issued from I1 is removed. The connection relation of output neuron O2 is 0110, and the final arrangement is 00110110. For the network without the sparsification operation, the connection relation of O1 is 0111, that of O2 is 0111, and the final arrangement is 01110111.
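The worked example above can be reproduced with a short two-stage masking sketch (illustrative only; the list-based representation is an assumption, not the hardware layout):

```python
# Two-stage masking for the Figure 2/3 example (names and list layout are assumed).
inputs  = [0, 1, 1, 1]                   # I1 = 0, I2..I4 non-zero
weights = [[1, 0, 1, 1],                 # W11, W21, W31, W41 (W21 = 0)
           [0, 1, 1, 0]]                 # W12, W22, W32, W42 (W12 = W42 = 0)

# Sparsification mapping unit: drop zero weights.
sparse_bits = [[1 if w != 0 else 0 for w in row] for row in weights]
print(sparse_bits)                       # [[1, 0, 1, 1], [0, 1, 1, 0]] -> "10110110"

# Fast mapping unit: additionally drop connections from zero-valued inputs.
fast_bits = [[b if inputs[i] != 0 else 0 for i, b in enumerate(row)]
             for row in sparse_bits]
print(fast_bits)                         # [[0, 0, 1, 1], [0, 1, 1, 0]] -> "00110110"
```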
With the second connection method, the representation is as follows:

In the sparsification mapping unit 11, if the sparsification operation is not performed, the connection relations of O1 and O2 default to 0, 1, 1, 1. If the sparsification operation is performed, as shown in Figure 4, output neuron O1 is connected to input neurons I1, I3, I4, so its connection relation is 0, 2, 1: the 0 means the first connection is at distance 0 from the first input neuron, i.e. it is the first input neuron; the 2 means the second connected input neuron is at distance 2 from the previous one, i.e. it is the third input neuron; and the 1 means the third connected input neuron is at distance 1 from the previous one, i.e. it is the fourth input neuron. Similarly, the connection relation of O2 is 1, 1.

In the fast mapping unit 12, if the operation of examining the input neuron values is not performed, the connection relations of O1 and O2 remain unchanged. If it is performed, then for the network of Figure 4 after the sparsification operation, the connection issued from I1 is removed because the first input neuron I1 has the value 0, so the connection relation of output neuron O1 becomes 2, 1 and that of output neuron O2 is 1, 1. For the network without the sparsification operation, the connection relations of both O1 and O2 are 1, 1, 1.
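The distance encodings in this example can be checked with a small helper (hypothetical names; a sketch rather than the claimed implementation):

```python
def distances(connected_indices):
    """Convert 0-based indices of connected inputs into the distance encoding."""
    out, prev = [], None
    for i in connected_indices:
        out.append(i if prev is None else i - prev)
        prev = i
    return out

# Figure 4 example: after sparsification O1 keeps I1, I3, I4 and O2 keeps I2, I3;
# the fast mapping then also drops I1 because its value is 0.
print(distances([0, 2, 3]))   # O1, sparsification only        -> [0, 2, 1]
print(distances([2, 3]))      # O1, sparsification + fast map  -> [2, 1]
print(distances([1, 2]))      # O2 in both cases               -> [1, 1]
```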
The sparsification mapping unit 11 and the fast mapping unit 12 of the present invention are not limited to the connection relations described above.

Based on the connection relation obtained above, the sparsification mapping unit 11 and the fast mapping unit 12 output the mapped neurons and weights; the correspondence between a mapped neuron and its weight is an input neuron-weight pair, which can be used directly in the operation. Take the mapping of output neuron O1 in the artificial neural network of Figure 2 as a concrete example:

The input neurons are I1, I2, I3, I4 and the input weights are W11, W21, W31, W41, where I1 and W21 are 0 and the rest are non-zero.

First, in the sparsification mapping unit 11, the connection relation is 1011, or 0, 2, 1. Then, in the fast mapping unit 12, the connection relation is 0011, or 2, 1. Based on the connection relation, the two mapping units output the result of removing the zero-valued input neuron and the connection weights issued from it: the mapped input neurons are I3, I4 and the mapped weights are W31, W41, so the input neuron-weight pairs are I3-W31 and I4-W41. For example, if the mapped input neurons and weights are stored as vectors, the resulting input neuron vector is (I3, I4) and the resulting weight vector is (W31, W41).
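A compact sketch of this pairing step, with hypothetical names, might look like this:

```python
def map_pairs(inputs, weight_row, input_names, weight_names):
    """Drop zero inputs and zero weights, returning input neuron-weight pairs."""
    return [(ni, nw)
            for x, w, ni, nw in zip(inputs, weight_row, input_names, weight_names)
            if x != 0 and w != 0]

input_names  = ["I1", "I2", "I3", "I4"]
weight_names = ["W11", "W21", "W31", "W41"]
print(map_pairs([0, 1, 1, 1], [1, 0, 1, 1], input_names, weight_names))
# [('I3', 'W31'), ('I4', 'W41')]
```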
Although the example above first performs the sparsification operation with the sparsification mapping unit and then performs the fast mapping with the fast mapping unit to obtain the mapped input neurons and weights, in practical applications the two mapping units preferably operate on the data simultaneously, in no particular order.

Besides the mapping unit 1, the device supporting fast artificial neural network operation in this embodiment of the present invention further comprises a storage unit 2, a DMA (direct memory access) unit 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9, as shown in Figure 5.

The storage unit 2 stores data and instructions: it receives and stores externally supplied data and instructions, the data including input neurons, weights, and connection relations.

The mapping unit 1 retrieves the input neurons, weights, and connection relations from the storage unit 2; the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping. Through this mapping of the data, the mapping unit 1 obtains the mapped input neurons and weights and stores them back in the storage unit 2.

The DMA 3 fetches the instructions and the mapped input neurons and weights from the storage unit 2 and distributes them to the instruction cache 4, the input neuron cache 6, and the weight cache 7, respectively.

The control unit 5 reads dedicated instructions from the instruction cache 4, decodes them into operation unit instructions, and feeds them to the operation unit 8.

The operation unit 8 performs the specific computation: according to the operation instructions, it fetches the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them. The operation unit 8 comprises a multiplication unit, which multiplies the mapped neurons by the weight data; an adder tree unit, which sums the products of the first stage stage by stage through an adder tree to complete the vector inner product; and a nonlinear transform unit, which applies a nonlinear transform to the result of the second stage to obtain the output neurons. The nonlinear transform is an activation function, which may be a sigmoid, tanh, ReLU, or softmax function, among others.
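As a purely illustrative model of this three-stage data flow (not the hardware implementation; the activation choice and function names are assumptions), the stages could be sketched in Python as:

```python
import math

def operate(mapped_inputs, mapped_weights, activation=math.tanh):
    """Stage 1: multiplication unit; stage 2: adder-tree reduction; stage 3: activation."""
    products = [x * w for x, w in zip(mapped_inputs, mapped_weights)]
    while len(products) > 1:                         # adder tree, level by level
        products = [products[i] + products[i + 1] if i + 1 < len(products) else products[i]
                    for i in range(0, len(products), 2)]
    inner = products[0] if products else 0.0         # vector inner product
    return activation(inner)                         # nonlinear transform unit

print(operate([0.3, 0.8], [0.2, 0.4]))               # e.g. the pairs I3-W31, I4-W41
```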
The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored into the storage unit 2 via the DMA 3, from which the outside world can retrieve them.
This embodiment also provides a method supporting fast artificial neural network operation which, as shown in Figure 6, comprises the following steps:

S101: read an artificial neural network SIMD instruction to start the fast artificial neural network operation.

S102: the mapping unit fetches all input neurons, weights, and connection relations from the storage unit, processes them to obtain the mapped input neurons and weights, and stores the result in the storage unit.

Specifically, the sparsification mapping unit performs sparsification processing on the input neurons, weights, and connection relations, and the fast mapping unit performs fast mapping processing on them. Both mapping units can process the connection relation with either of the two connection methods and output the mapped neurons and weights from the input neurons and input weights according to the processed connection relation; both connection methods and the output of mapped neurons and weights according to the processed connection relation have been described in detail above and are not repeated here.

S103: the input neuron cache 6 and the weight cache 7 read a portion of the mapped neurons and weights through the DMA 3.

S104: the operation unit fetches the mapped input neurons and weights from the neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.

The operation comprises the following steps, as shown in Figure 7:

S1041: perform the multiplication, multiplying the fetched mapped neurons by the weight data;

S1042: perform the adder tree operation, summing the results of the first stage stage by stage through an adder tree to complete the vector inner product;

S1043: apply a nonlinear transform to the result of the second stage to obtain the output neurons, the nonlinear transform being an activation function, which may be a sigmoid, tanh, ReLU, or softmax function, among others.

S105: the operation unit stores the obtained output neurons in the output neuron cache 9, from which they are stored into the storage unit 2 via the DMA 3.

S106: determine whether all mapped neurons and weights have been processed; if not (N), return to step S103; if so (Y), proceed to step S107.

S107: end the operation.
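The overall control flow of steps S101-S107 can be summarized in the following Python sketch; the helper names and the tanh activation are illustrative assumptions, not part of the disclosed method:

```python
import math

def run_network(inputs, weights, activation=math.tanh):
    """Sketch of S101-S107: map once (S102), then consume the mapped pairs per output
    neuron (S103/S104) until every output has been produced (S105-S107)."""
    outputs = []
    for weight_row in weights:                        # one output neuron per weight row
        # S102: mapping - drop zero-valued inputs and zero weights
        pairs = [(x, w) for x, w in zip(inputs, weight_row) if x != 0 and w != 0]
        # S103/S104: in hardware the caches stream these pairs to the operation unit
        acc = sum(x * w for x, w in pairs)            # multiplication + adder tree
        outputs.append(activation(acc))               # nonlinear transform
    return outputs                                    # S105: written back via the DMA

print(run_network([0, 1, 1, 1], [[1, 0, 1, 1], [0, 1, 1, 0]]))
```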
Another embodiment of the present invention provides a device supporting fast artificial neural network operation, comprising a mapping unit 1, a storage unit 2, a DMA (direct memory access) unit 3, an instruction cache 4, a control unit 5, an input neuron cache 6, a weight cache 7, an operation unit 8, and an output neuron cache 9, as shown in Figure 8.

The storage unit 2 stores data and instructions: it receives and stores externally supplied data and instructions, the data including input neurons, weights, and connection relations.

The DMA 3 fetches the instructions from the storage unit 2 and distributes them to the instruction cache 4, and fetches the input neurons, weights, and connection relations from the storage unit 2 and passes them directly to the mapping unit 1 for mapping.

In the mapping unit 1, the sparsification mapping unit 11 performs the sparsification operation and the fast mapping unit 12 performs the fast mapping; through this mapping of the data, the mapping unit 1 obtains the mapped input neurons and weights and transfers them to the neuron cache 6 and the weight cache 7, respectively.

The control unit 5 reads dedicated instructions from the instruction cache 4, decodes them into operation unit instructions, and feeds them to the operation unit 8.

The operation unit 8 performs the specific computation: according to the operation instructions, it fetches the mapped input neurons and weights from the input neuron cache 6 and the weight cache 7 and operates on them. The operation unit 8 comprises a multiplication unit, which multiplies the mapped neurons by the weight data; an adder tree unit, which sums the products of the first stage stage by stage through an adder tree to complete the vector inner product; and a nonlinear transform unit, which applies a nonlinear transform to the result of the second stage to obtain the output neurons. The nonlinear transform is an activation function, which may be a sigmoid, tanh, ReLU, or softmax function, among others.

The output neuron cache 9 stores the output neurons obtained by the operation unit; they are then stored into the storage unit 2 via the DMA 3, from which the outside world can retrieve them.
This embodiment also provides a method supporting fast artificial neural network operation which, as shown in Figure 9, comprises the following steps:

S201: read an artificial neural network SIMD instruction to start the fast artificial neural network operation.

S202: the mapping unit fetches part of the input neurons, weights, and connection relations from the storage unit through the DMA 3, processes them, and stores the resulting mapped input neurons and weights directly into the neuron cache 6 and the weight cache 7, respectively.

Specifically, the sparsification mapping unit performs sparsification processing on the input neurons, weights, and connection relations, and the fast mapping unit performs fast mapping processing on them. Both mapping units can process the connection relation with either of the two connection methods and output the mapped neurons and weights from the input neurons and input weights according to the processed connection relation; both connection methods and the output of mapped neurons and weights according to the processed connection relation have been described in detail above and are not repeated here.

S203: the operation unit fetches the mapped input neurons and weights from the neuron cache 6 and the weight cache 7 and operates on them to obtain the output neurons.

The specific operation steps are the same as those of step S104 in the previous embodiment and are not repeated here.

S204: the operation unit stores the obtained output neurons in the output neuron cache 9, from which they are stored into the storage unit 2 via the DMA 3.

S205: determine whether all input neurons and weights have been mapped and processed; if not (N), return to step S202; if so (Y), proceed to step S206.

S206: end the operation.

Compared with the previous embodiment, in this embodiment the sparsification mapping unit and the fast mapping unit of the mapping unit perform the mapping during the computation and feed the mapped data directly to the operation unit, whereas in the previous embodiment the data mapped by the sparsification mapping unit and the fast mapping unit were stored in the storage unit before the operation unit computed; this embodiment therefore runs faster.
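The difference between the two embodiments can be caricatured in software as pre-computing the mapping versus streaming it; the generator-based sketch below is only an analogy under assumed names, not the hardware design:

```python
import math

def premap_then_compute(inputs, weights, activation=math.tanh):
    """Earlier embodiment: map everything first and store it, then compute."""
    mapped = [[(x, w) for x, w in zip(inputs, row) if x != 0 and w != 0]
              for row in weights]                     # whole mapped result produced up front
    return [activation(sum(x * w for x, w in pairs)) for pairs in mapped]

def map_while_computing(inputs, weights, activation=math.tanh):
    """This embodiment: mapped pairs are produced and consumed on the fly."""
    def pairs(row):
        for x, w in zip(inputs, row):
            if x != 0 and w != 0:
                yield x, w                            # streamed straight to the computation
    return [activation(sum(x * w for x, w in pairs(row))) for row in weights]
```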
Yet another embodiment of the present invention provides a system for fast artificial neural network operation which, as shown in Figure 10, comprises an I/O interface 20, a storage device 30, a central processing unit (CPU) 40, and a device 10 supporting fast artificial neural network operation.

The I/O interface 20 is used for I/O data, which is sent by the CPU 40 to the device 10 supporting fast artificial neural network operation and then written by that device into the storage device 30; the dedicated instructions required by the device 10 are likewise transmitted to it by the CPU 40.

The storage device 30 temporarily stores the artificial neural network model and the neuron data, in particular when the entire model cannot be held in the cache of the device 10 supporting fast artificial neural network operation.

The central processing unit (CPU) 40 performs data transfer and basic control such as starting and stopping the device 10 supporting fast artificial neural network operation, and serves as the interface between that device and external control.

The device 10 supporting fast artificial neural network operation accepts data and programs from the CPU 40 and executes the fast artificial neural network operation algorithm; its execution results are transmitted back to the CPU 40.

In this embodiment, the device 10 supporting fast artificial neural network operation is used as a coprocessor of the CPU 40 or of a GPU to execute the fast artificial neural network operation algorithm.

In a further embodiment of the present invention, multiple devices supporting fast artificial neural network operation are interconnected to form a system: they can be interconnected through a PCIE bus to support still larger-scale fast artificial neural network operations, can share a single host CPU or each have its own host CPU, and can share memory or give each accelerator its own memory. Moreover, any interconnection topology may be used.
Whether the technical solution described in the present invention is being used can be checked as follows:

If a device or method employing the technique of the present invention is used, then for a given neural network in which some of the input neurons and weights are 0 or close to 0, the operation speed is higher than that of a device or method that does not employ the technique. Moreover, the larger the proportion of input neurons that are 0 or close to 0 among all input neurons in the network, the greater the speedup; and the larger the proportion of weights that are 0 or close to 0 among all weights in the network, the greater the speedup.
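As a rough illustration of why the speedup grows with the fraction of zeros, the following sketch (hypothetical helper, illustrative thresholds) counts the share of multiply-accumulate operations that the mapping removes for one fully connected layer:

```python
def skipped_fraction(inputs, weights, in_thresh=0.0, w_thresh=0.0):
    """Fraction of multiply-accumulate operations removed by the mapping
    for one fully connected layer (rough, illustrative estimate)."""
    total = skipped = 0
    for row in weights:
        for x, w in zip(inputs, row):
            total += 1
            if abs(x) <= in_thresh or abs(w) <= w_thresh:
                skipped += 1
    return skipped / total if total else 0.0

print(skipped_fraction([0, 1, 1, 1], [[1, 0, 1, 1], [0, 1, 1, 0]]))   # 0.5
```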
The processes or methods depicted in the preceding figures may be performed by processing logic comprising hardware (e.g. circuitry, dedicated logic, etc.), firmware, software (e.g. software carried on a non-transitory computer-readable medium), or a combination thereof. Although the processes or methods are described above in a certain order, it should be understood that some of the described operations can be performed in a different order, and some operations may be performed in parallel rather than sequentially.

It should be noted that implementations not shown or described in the drawings or in the body of the specification are forms known to those of ordinary skill in the art and are not described in detail. Furthermore, the above definitions of the elements and methods are not limited to the specific structures, shapes, or manners mentioned in the embodiments, which may be simply modified or replaced by those of ordinary skill in the art.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that they are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, and so on made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (16)

  1. A device supporting fast artificial neural network operation, comprising:
    a mapping unit (1) that receives input neurons, weights, and the connection relation between the input neurons and the output neurons, optimizes the connection relation, and outputs the mapped input neurons and weights, the correspondence between a mapped input neuron and its weight being an input neuron-weight pair.
  2. The device according to claim 1, wherein the mapping unit (1) comprises:
    a sparsification mapping unit (11) for removing connections in the neural network whose weight is 0 or below a first threshold; and/or
    a fast mapping unit (12) for removing connections whose input neuron is 0 or below a second threshold;
    the sparsification mapping unit (11) and the fast mapping unit (12) each optimizing the connection relation.
  3. The device according to claim 2, wherein the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM, and the optimization of the connection relation by the sparsification mapping unit (11) comprises:
    obtaining, for the j-th output neuron Oj, its corresponding connection relation, which has N bits corresponding to the N nodes of the input layer; initially all N bits are set to 1 and all N input neurons I1, I2, ..., IN are connected to the output neuron Oj; if the weight between the i-th input neuron Ii and the output neuron Oj is 0 or below the first threshold, the i-th bit of the connection relation is set to 0 and there is no connection between Ii and Oj; and the connection relations of all output neurons O1, O2, ..., OM are concatenated into one vector, whose components from position N×(j-1)+1 to position N×j are the processed connection relation of output neuron Oj.
  4. The device according to claim 2, wherein the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM, and the optimization of the connection relation by the sparsification mapping unit (11) comprises:
    obtaining, for the j-th output neuron Oj, its corresponding connection relation, wherein if the weight between the i-th input neuron Ii and the output neuron Oj is 0 or below the first threshold there is no connection between Ii and Oj, and otherwise there is a connection; the n input neurons connected to Oj are Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N; the processed connection relation of output neuron Oj has n entries, the first entry equals i_1 - 1, and the k-th entry equals i_k - i_(k-1), where n ≥ k > 1.
  5. The device according to claim 2, wherein the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM, and the optimization of the connection relation by the fast mapping unit (12) comprises:
    obtaining, for the j-th output neuron Oj, its corresponding connection relation, which has N bits corresponding to the N nodes of the input layer; initially all N bits are set to 1 and all N input neurons I1, I2, ..., IN are connected to the output neuron Oj; if the i-th input neuron Ii is 0 or below the second threshold, the i-th bit of the connection relation is set to 0 and there is no connection between Ii and Oj; and the connection relations of all output neurons O1, O2, ..., OM are concatenated into one vector, whose components from position N×(j-1)+1 to position N×j are the processed connection relation of output neuron Oj.
  6. The device according to claim 2, wherein the input layer of the neural network has N input neurons I1, I2, ..., IN and the output layer has M output neurons O1, O2, ..., OM, and the optimization of the connection relation by the fast mapping unit (12) comprises:
    obtaining, for the j-th output neuron Oj, its corresponding connection relation, wherein if the i-th input neuron Ii is 0 or below the second threshold there is no connection between Ii and Oj, and otherwise there is a connection; the n input neurons connected to Oj are Ii_1, Ii_2, ..., Ii_n, where 1 ≤ i_1 < i_2 < ... < i_n ≤ N; the processed connection relation of output neuron Oj has n entries, the first entry equals i_1 - 1, and the k-th entry equals i_k - i_(k-1), where n ≥ k > 1.
  7. The device according to any one of claims 1 to 6, further comprising:
    a storage unit (2) for storing externally supplied data and instructions, the data including input neurons, weights, and connection relations, the mapping unit (1) retrieving the input neurons, weights, and connection relations and outputting the mapped input neurons and weights; and
    an operation unit (8) for retrieving the mapped input neurons and weights and operating on them to obtain output neurons.
  8. The device according to claim 7, wherein the operation unit (8) comprises:
    a multiplication unit for multiplying the mapped neurons by the weights;
    an adder tree unit for summing the results obtained by the multiplication unit stage by stage through an adder tree to complete the vector inner product; and
    a nonlinear transform unit for applying a nonlinear transform to the result obtained by the adder tree unit to obtain the output neurons.
  9. The device according to claim 8, wherein the nonlinear transform is an activation function operation, the activation function being selected from a sigmoid function, a tanh function, a ReLU function, or a softmax function.
  10. The device according to claim 7, further comprising:
    an instruction cache unit (4) for caching the instructions;
    an input neuron cache (6) for caching the mapped input neurons;
    a weight cache (7) for caching the mapped weights;
    a control unit (5) for reading the instructions in the instruction cache unit (4) and controlling the operation unit (8) to retrieve the mapped input neurons in the input neuron cache (6) and the mapped weights in the weight cache (7) and operate on them; and
    an output neuron cache (9) for caching the output neurons obtained by the operation unit (8).
  11. The device according to claim 10, wherein the mapped input neurons and weights output by the mapping unit (1) are stored on the storage unit (2), the device further comprising:
    a DMA (3) for retrieving the instructions and the mapped input neurons and weights from the storage unit (2) and storing them in the instruction cache unit (4), the input neuron cache (6), and the weight cache (7), respectively, and for storing the output neurons in the output neuron cache (9) onto the storage unit (2) for transmission to the outside world.
  12. The device according to claim 10, further comprising:
    a DMA (3) for retrieving the instructions on the storage unit (2) and storing them in the instruction cache unit (4), and for retrieving the data on the storage unit (2) and passing it to the mapping unit (1); the mapped input neurons and weights output by the mapping unit (1) are stored in the input neuron cache (6) and the weight cache (7), respectively, and the output neurons in the output neuron cache (9) are stored onto the storage unit (2) for transmission to the outside world.
  13. A method supporting fast artificial neural network operation, using the device according to any one of claims 7 to 12, the method comprising:
    the mapping unit (1) retrieving the input neurons, weights, and connection relations from the storage unit (2) and outputting the mapped input neurons and weights; and
    the operation unit (8) retrieving the mapped input neurons and weights and operating on them to obtain output neurons.
  14. The method according to claim 13, wherein the operation comprises:
    a multiplication, multiplying the mapped neurons by the weights;
    an adder tree operation, summing the results of the multiplication stage by stage through an adder tree to complete the vector inner product; and
    a nonlinear transform, applied to the result of the adder tree operation to obtain the output neurons.
  15. The method according to claim 13, further comprising:
    the mapping unit (1) retrieving all of the input neurons, weights, and connection relations from the storage unit (2), outputting the mapped input neurons and weights, and storing them in the storage unit (2);
    the input neuron cache (6) and the weight cache (7) reading a portion of the mapped input neurons and weights through the DMA (3), to be retrieved by the operation unit (8);
    the output neuron cache (9) caching the output neurons obtained by the operation unit (8) and storing them in the storage unit (2) through the DMA (3); and
    determining whether all of the input neurons and weights have been processed; if so, the operation ends; otherwise, returning to the step in which the input neuron cache (6) and the weight cache (7) read a portion of the mapped input neurons and weights through the DMA (3).
  16. The method according to claim 13, further comprising:
    the mapping unit (1) retrieving part of the input neurons, weights, and connection relations from the storage unit (2) through the DMA (3) and outputting the mapped input neurons and weights;
    the input neuron cache (6) and the weight cache (7) caching the mapped input neurons and weights, to be retrieved by the operation unit (8);
    the output neuron cache (9) caching the output neurons obtained by the operation unit (8) and storing them in the storage unit (2) through the DMA (3); and
    determining whether all of the input neurons and weights have been mapped and processed; if so, the operation ends; otherwise, returning to the step in which the mapping unit (1) retrieves part of the input neurons, weights, and connection relations from the storage unit (2) through the DMA (3).
PCT/CN2016/111737 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operation WO2018112892A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/111737 WO2018112892A1 (en) 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operation

Publications (1)

Publication Number Publication Date
WO2018112892A1 true WO2018112892A1 (en) 2018-06-28

Family

ID=62624217

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/111737 WO2018112892A1 (en) 2016-12-23 2016-12-23 Device and method for supporting fast artificial neural network operation

Country Status (1)

Country Link
WO (1) WO2018112892A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1529281A (en) * 2003-10-21 2004-09-15 上海交通大学 Neural network modelling method
CN105701540A (en) * 2016-01-11 2016-06-22 清华大学 Self-generated neural network construction method

Non-Patent Citations (2)

Title
JUNFEI ET AL.: "Dynamic Optimization Structure Design for Neural Networks: Review and Perspective", CONTROL THEORY & APPLICATIONS, vol. 27, no. 3, 31 March 2010 (2010-03-31), ISSN: 1000-8152 *
SUN, HUANLONG ET AL.: "A New Pruning Algorithm for Feedforward Neural Network", JOURNAL OF GUANGXI TEACHERS, vol. 30, no. 4, 31 December 2013 (2013-12-31), ISSN: 1002-8743 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN111222632A (en) * 2018-11-27 2020-06-02 中科寒武纪科技股份有限公司 Computing device, computing method and related product
CN111291884A (en) * 2018-12-10 2020-06-16 中科寒武纪科技股份有限公司 Neural network pruning method and device, electronic equipment and computer readable medium
CN109740739A (en) * 2018-12-29 2019-05-10 北京中科寒武纪科技有限公司 Neural computing device, neural computing method and Related product
CN109740739B (en) * 2018-12-29 2020-04-24 中科寒武纪科技股份有限公司 Neural network computing device, neural network computing method and related products
CN111523653A (en) * 2019-02-03 2020-08-11 上海寒武纪信息科技有限公司 Arithmetic device and method
CN111523653B (en) * 2019-02-03 2024-03-29 上海寒武纪信息科技有限公司 Computing device and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16924610; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16924610; Country of ref document: EP; Kind code of ref document: A1)