US20180260711A1 - Calculating device and method for a sparsely connected artificial neural network - Google Patents
- Publication number
- US20180260711A1 (application US 15/975,083)
- Authority
- US
- United States
- Prior art keywords: data, input, output, values, array
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3802—Instruction prefetching
- G06F9/3887—Concurrent instruction execution using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
- G06K9/6256
- G06N3/045—Combinations of networks
- G06N3/0454
- G06N3/048—Activation functions
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06F2207/4824—Neural networks (indexing scheme relating to groups G06F7/48-G06F7/575)
- G06F7/5443—Sum of products
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
Definitions
- Artificial Neural Networks (ANNs) are also referred to simply as Neural Networks (NNs).
- The algorithms used by NNs may include vector multiplication (also referred to as "multiplication") and convolution, which widely adopt sign functions and various approximations thereof.
- NNs consist of multiple interconnected nodes. As shown in FIG. 3 , each block represents a node and each arrow represents a connection between two nodes.
- NNs are widely applied to a variety of applications, such as computer vision, voice recognition and natural language processing.
- The scale of NNs has been growing. For example, in 1998, LeCun's neural network for handwritten character recognition included fewer than 1M weight values, while in 2012, Krizhevsky's network for the ImageNet competition included 60M weight values.
- NN applications require large amounts of calculation and high memory-access bandwidth.
- To reduce these calculation and memory-access demands, a sparsely connected neural network may be implemented.
- a general-purpose processor is conventionally adopted to calculate a sparse artificial neural network.
- In such implementations, the input neurons, output neurons, and weight values are stored in three separate arrays, along with an index array that stores the connection relation between each pair of input and output neurons connected by a weight value.
- a major operation is a multiplication of input data and a weight value.
- Each calculation needs to search, through the index array, for the weight value corresponding to the input data. Since a general-purpose processor is weak in both calculation and memory access, the demands of NNs may not be satisfied. Moreover, when multiple general-purpose processors work concurrently, inter-processor communication becomes a performance bottleneck.
- In addition, each multiplication operation needs to re-search the index array for the position corresponding to the weight value, which adds calculation and memory-access overhead, as illustrated by the sketch below.
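- The conventional scheme can be sketched as follows in Python (the arrays, values, and function names here are hypothetical, chosen only to illustrate the per-multiplication index search):

```python
# Hypothetical arrays illustrating the conventional sparse scheme;
# the values and names are invented for this sketch.
weights = [0.5, 0.6, 0.8]          # non-zero weight values only
index = [(0, 0), (2, 0), (3, 0)]   # (input neuron, output neuron) per weight

def output_neuron(inputs, out):
    """Compute one output neuron. Every product first requires a scan
    of the index array - the extra calculation and memory access that
    the text identifies as overhead."""
    total = 0.0
    for w, (i, o) in zip(weights, index):
        if o == out:               # index-array lookup per multiplication
            total += w * inputs[i]
    return total

print(output_neuron([0.5, 0.6, 0.7, 1.2], 0))  # 0.25 + 0.42 + 0.96 = 1.63
```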
- As a result, NN calculations are time-consuming and power-consuming.
- General-purpose processors also need to decode an operation of a multiple-layer artificial neural network into a long sequence of operations and memory-access instructions, and this front-end decoding incurs significant overhead.
- A graphics processing unit (GPU) executing single-instruction-multiple-data (SIMD) instructions may also be used for NN calculation. However, since a GPU contains only relatively small on-chip caches, model data (e.g., weight values) of a multiple-layer artificial neural network have to be repeatedly retrieved from outside the chip.
- Off-chip bandwidth thus becomes a main performance bottleneck while producing huge power consumption.
- An example apparatus may include a data modifier configured to receive one or more groups of input data.
- the one or more groups of input data may be stored as input elements in an input array and each of the input elements may be identified by an input array index.
- the data modifier may be further configured to receive a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on the one or more groups of input data.
- the one or more groups of output data may be stored as output elements in an output array and each of the output elements may be identified by an output array index.
- the data modifier may be configured to receive connection data that include one or more connection values.
- Each of the connection values may correspond to one of the input array indexes and one of the output array indexes and may indicate whether one of the weight values in the predetermined weight value array is designated for calculating a group of the output data to be stored as the output element identified by the corresponding output array index based on a group of the input data stored as the input element identified by the corresponding input array index, and whether the weight value meets a predetermined condition.
- the data modifier may be further configured to modify the weight values and the input data based on the connection data.
- the example apparatus may include a computing unit configured to receive the modified weight values and the modified input data from the data modifier and calculate the one or more groups of output data based on the modified weight values and the modified input data.
- An example method for modifying data in a multi-layer neural network (MNN) acceleration processor may include receiving one or more groups of input data.
- the one or more groups of input data may be stored as input elements in an input array and each of the input elements may be identified by an input array index.
- the example method may include receiving a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on the one or more groups of input data.
- the one or more groups of output data may be stored as output elements in an output array and each of the output elements may be identified by an output array index.
- the example method may include receiving connection data that include one or more connection values.
- Each of the connection values may correspond to one of the input array indexes and one of the output array indexes and indicate whether one of the weight values in the predetermined weight value array is designated for calculating a group of the output data to be stored as the output element identified by the corresponding output array index based on a group of the input data stored as the input element identified by the corresponding input array index, and whether the weight value meets a predetermined condition.
- the example method may include modifying the weight values and the input data based on the connection data and calculating the one or more groups of output data based on the modified weight values and the modified input data.
- The one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims.
- the following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
- FIG. 1 is a block diagram illustrating an example computing process at an MNN acceleration processor for neural networks
- FIG. 2 is a block diagram illustrating an example computer system in which data modification for neural networks may be implemented
- FIG. 3 is a diagram illustrating a comparison between a regular MNN and a sparse MNN in which data modification for neural networks may be implemented;
- FIG. 4A and FIG. 4B are diagrams illustrating one or more connection values in a sparse MNN in which data modification for neural networks may be implemented;
- FIG. 5 is a diagram illustrating a convolution process with which data modification for neural networks may be implemented
- FIG. 6 is a diagram illustrating a convolution process with modified weight values with which data modification for neural networks may be implemented
- FIG. 7 is a block diagram illustrating an example MNN acceleration processor in which data modification for neural networks may be implemented
- FIG. 8 is a block diagram illustrating another example MNN acceleration processor in which data modification for neural networks may be implemented
- FIG. 9 is a block diagram illustrating an example data modifier by which data modification for neural networks may be implemented.
- FIG. 10 is a flow chart of aspects of an example method for modifying data for neural networks
- FIG. 11 is a block diagram illustrating another example MNN acceleration processor in which data modification for neural networks may be implemented
- FIG. 12 is a block diagram illustrating another example data modifier by which data modification for neural networks may be implemented.
- FIG. 13 is a flow chart of aspects of another example method for modifying data for neural networks
- FIG. 14 is a block diagram illustrating another example MNN acceleration processor in which data modification for neural networks may be implemented
- FIG. 15 is a block diagram illustrating another example data modifier by which data modification for neural networks may be implemented.
- FIG. 16 is a flow chart of aspects of another example method for modifying data for neural networks.
- a typical conceptual model of a multi-layer neural network may include multiple layers of neurons.
- Each neuron is an information-processing unit that is fundamental to the operation of a neural network.
- a typical model of a neuron may include three basic elements, e.g., a set of synapses, an adder, and an activation function.
- a simplified model of a neuron may include one or more input nodes for receiving the input signals or data and an output node for transmitting the output signals or data to an input node of another neuron at the next level.
- a layer of neurons may at least include a layer of multiple input nodes and another layer of output nodes.
- FIG. 1 is a block diagram illustrating an example computing process 100 at an MNN acceleration processor for neural networks.
- the example computing process 100 may be performed by a layer of input nodes 102 , a layer of output nodes 104 , a layer of input nodes 106 , and a layer of output nodes 108 .
- a triangular-shaped operator (depicted as a triangle in FIG. 1 ) may indicate a matrix multiplication or a convolution operation.
- the layers of input nodes and output nodes may not be the first layer and the last layer of the entire neural network in the process. Rather, the layers of input and output nodes may refer to the nodes included in any two consecutive layers of neurons of a neural network.
- the computing process from the layer of input nodes 102 to the layer of output nodes 108 may be referred to as a forward propagation process; the computing process from the layer of output nodes 108 to the layer of input nodes 102 may be referred to as a backward propagation process.
- the forward propagation process may start from one or more input nodes that receive input data 102 A.
- the received input data 102 A may be multiplied or convolved by one or more weight values 102 C.
- the results of the multiplication or convolution may be transmitted to one or more output nodes at the layer of output nodes 104 as output data 104 A.
- the output data 104 A with or without further operations, may be transmitted to one or more input nodes at the next layer (e.g., the layer of input nodes 106 ) as input data 106 A.
- the input data 106 A may be multiplied or convolved by one or more weight values 106 C.
- the results of the multiplication or convolution may be similarly transmitted to one or more output nodes at the layer of output nodes 108 as output data 108 A.
- the backward propagation process may start from one or more output nodes at the last layer of nodes of the forward propagation process (e.g., the layer of output nodes 108 ).
- output gradients 108 B generated at the layer of output nodes 108 may be multiplied or convolved by the input data 106 A to generate weight gradients 106 D at the layer of input nodes 106 .
- the output gradients 108 B may be further multiplied or convolved by the weight values 106 C to generate input data gradients 106 B.
- the input data gradients 106 B with or without other operations between layers, may be transmitted to one or more nodes at the layer of output nodes 104 as output gradients 104 B.
- the output gradients 104 B may then be multiplied or convolved by the input data 102 A to generate weight gradients 102 D. Additionally, the output gradients 104 B may be multiplied by the weight values 102 C to generate input data gradients 102 B.
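- As a simplified illustration of the forward and backward propagation described above, the following Python sketch computes one layer's outputs and gradients. The use of numpy, the dense matrix shapes, and the omission of the activation function and its derivative are all simplifying assumptions of this sketch.

```python
import numpy as np

def forward(input_data, weights, bias):
    """Forward pass for one layer: input data is multiplied by the
    weight values (cf. 102A and 102C) to produce output data."""
    return weights @ input_data + bias

def backward(output_grad, input_data, weights):
    """Backward pass for one layer: output gradients times the input
    data give the weight gradients (cf. 102D and 106D); output
    gradients times the weight values give the input data gradients
    (cf. 102B and 106B)."""
    weight_grad = np.outer(output_grad, input_data)
    input_grad = weights.T @ output_grad
    return weight_grad, input_grad

x = np.array([0.5, 1.2])
W = np.array([[0.3, 0.7], [0.1, 0.9]])
y = forward(x, W, bias=0.0)
weight_grad, input_grad = backward(np.ones_like(y), x, W)
```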
- FIG. 2 is a block diagram illustrating an example computer system 200 in which data modification for neural networks may be implemented.
- the example computer system 200 may include at least an I/O interface 202 , a central processing unit (CPU) 204 , a multi-layer neural network acceleration processor 206 , and a memory 208 .
- the I/O interface 202 may be configured to exchange data or information with peripheral devices, e.g., input devices, storage devices, etc.
- Data received from the I/O interface 202 may be further processed at the CPU 204 .
- Data that require processing at an MNN may be transmitted to the MNN acceleration processor 206.
- the forward propagation process and the backward propagation process described above in accordance with FIG. 1 may be performed at the MNN acceleration processor 206 .
- weight values 102 C and 106 C may be retrieved from the memory 208 and stored on the MNN acceleration processor 206 during the processes.
- the index array that indicates the correspondence between the input data and the weight values is conventionally stored on the memory 208 .
- retrieving the index array from the memory 208 may cause significant system delays or bandwidth consumption.
- the MNN acceleration processor 206 may be described in further detail below.
- FIG. 3 is a diagram illustrating a comparison between a regular MNN 300 A and a sparse MNN 300 B in which data modification for neural networks may be implemented.
- the regular MNN 300 A may include a layer of input nodes 302 and a layer of output nodes 304 .
- Each block shown in the regular MNN 300 A indicates an input node or an output node.
- the arrows between the input nodes (e.g., i 1 , i 2 , i 3 . . . i N ) and the output nodes (e.g., o 1 , o 2 , o 3 . . . o N ) indicate those non-zero weight values for calculating the output data.
- w 11 may be the weight value for calculating the output data at output node o 1 based on the input data received at input node i 1 .
- more than one of the weight values may be zero, in which case input data received at more than one input node are not considered for calculating some output data.
- the arrows between corresponding input nodes and output nodes will be deleted and the MNN may be referred to as a sparse MNN, e.g., sparse MNN 300 B.
- no arrow is between i 2 and o 1 , i 1 and o 2 , and i 4 and o 2 , which indicates that the weight values, w 21 , w 12 , and w 42 are zero.
- FIG. 4A and FIG. 4B are diagrams illustrating one or more connection values in a sparse MNN in which data modification for neural networks may be implemented.
- an index array that indicates the correspondence between the weight values and the input data is conventionally stored in the memory 208 .
- connection data that indicate the correspondence between the output data and the input data may be generated and transmitted to MNN acceleration processor 206 .
- one or more groups of input data may be received at the input nodes i 1 , i 2 , i 3 , and i 4 .
- input data may be received and stored in a form of input array that includes elements identified by array indexes i 1 , i 2 , i 3 , and i 4 .
- one or more groups of output data may be generated at output nodes o 1 and o 2 . That is, the output data may be stored and transmitted in a form of output array that include elements identified by array indexes o 1 and o 2 .
- some input nodes are not connected to the output nodes.
- Connection data including one or more connection values may be generated based on the weight values corresponding to an output node and an input node. That is, if a weight value meets a predetermined condition, a connection value for the corresponding output node and input node may be set to one. Otherwise, if a weight value corresponding to the output node and input node is zero, or the weight value does not meet the predetermined condition, the connection value for the corresponding output node and input node may be set to zero.
- the predetermined condition may include that the weight value is a non-zero number, that an absolute value of the weight value is less than or equal to a first threshold value, and/or that the absolute value of the weight value is less than or equal to a second threshold value but greater than or equal to a third threshold value.
- the first, second, and third threshold values may be received from the peripheral devices via the I/O interface 202 .
- the weight values for calculating output data at output node o 1 may include w 11 , w 21 , w 31 , and w 41 , which respectively correspond to the input data received at input nodes i 1 , i 2 , i 3 , and i 4 .
- the weight values (w 11 , w 21 , w 31 , and w 41 ) may be 0.5, 0, 0.6, and 0.8 and the predetermined condition may be that a weight value is greater than zero but less than 0.99.
- weight values w 11 , w 31 , and w 41 meet the predetermined condition but w 21 does not.
- connection values for i 1 and o 1 , i 3 and o 1 , i 4 and o 1 may be set to 1 and the connection value for i 2 and o 1 may be set to zero.
- the connection values for i 1 and o 2 and i 4 and o 2 may be set to zero and the connection values for i 2 and o 2 and i 3 and o 2 may be set to one.
- the connection values for o 1 may be determined and stored as (1, 0, 1, 1) and the connection values for o 2 may be determined to be (0, 1, 1, 0).
- the connection values may be stored in a form of a linked list or a multi-dimensional dynamic array.
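- A minimal sketch of how the bitmap form of the connection values could be generated (the helper names and the lambda are assumptions of this example; the condition and weight values match the ones in the text):

```python
def connection_values(weights, condition):
    """Return 1 for each weight that meets the predetermined
    condition and 0 otherwise."""
    return [1 if condition(w) else 0 for w in weights]

# Weights for output node o1 from the text, with the condition
# "greater than zero but less than 0.99".
meets = lambda w: 0 < w < 0.99
print(connection_values([0.5, 0, 0.6, 0.8], meets))  # [1, 0, 1, 1]
```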
- connection values may be generated based on a distance between the input nodes.
- a connection value may be determined by the distances between different input nodes that correspond to those weight values that meet the predetermined condition. With respect to the above example weight values, w 11 , w 31 , and w 41 meet the predetermined condition.
- the connection value for input node i 1 may be set to a value equal to the distance between the first input node and the current input node. Thus, since the distance between input node i 1 and the first node (also i 1 here) is zero, the connection value for i 1 may be set to zero.
- the connection value for i 3 may be set to 2 (the distance from i 1 to i 3 ), and the connection value for i 4 may be set to 1. It is notable that the illustration and the term "distance" are provided for purposes of brevity. Since the input data and the output data may be stored in a form of data array, the term "distance" may refer to the difference between array indexes.
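- Under the scheme just described, the distance-based form can be derived from the bitmap form as follows (a sketch; the function name is hypothetical):

```python
def distance_form(bitmap):
    """First value: distance from the first input node to the first
    connected node; each later value: distance (array-index
    difference) from the previous connected node."""
    connected = [i for i, bit in enumerate(bitmap) if bit == 1]
    gaps = [b - a for a, b in zip(connected, connected[1:])]
    return [connected[0]] + gaps

print(distance_form([1, 0, 1, 1]))  # [0, 2, 1]: i1 -> 0, i3 -> 2, i4 -> 1
```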
- Since the connection values sufficiently represent the connections between the input nodes and the output nodes, the MNN acceleration processor 206 is not required to retrieve the index array from the memory 208 during the forward propagation process and the backward propagation process described in FIG. 1 .
- FIG. 5 is a diagram illustrating a convolution process with which data modification for neural networks may be implemented.
- In the example of FIG. 5 , one or more groups of input data stored in the form of an input matrix are convolved with a weight matrix (the matrices themselves are depicted in FIG. 5 ). Each element of the output matrix is calculated by convolving a portion of the input matrix with the weight matrix; for example, the output data at the output node o 1 may be calculated by convolving the top left portion of the input matrix with the weight matrix. The result of the convolution process may be stored in an output matrix.
- FIG. 6 is a diagram illustrating a convolution process with a sparse weight matrix with which data modification for neural networks may be implemented.
- the top part of FIG. 6 shows a convolution process between an input matrix and a weight matrix.
- the lower part of FIG. 6 shows a convolution process between the input matrix and a sparse weight matrix.
- weight values w2 and w3 are deleted.
- Accordingly, the connection values corresponding to weight values w1, w2, w3, and w4 may be set to (1, 0, 0, 1), or to (0, 2) in the distance-based form, for the calculation of output data at output nodes o 1 and o 4 .
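- The effect of pruning on the convolution can be sketched with small made-up matrices (the actual matrices appear only in FIGS. 5 and 6, so every value below is an assumption); deleting w2 and w3 is equivalent to convolving with a kernel whose corresponding entries are zero:

```python
import numpy as np

def conv2d_valid(inp, kernel):
    """Minimal 2-D convolution (cross-correlation, valid padding,
    stride 1): each output element is the elementwise product of an
    input patch and the weight matrix, summed."""
    kh, kw = kernel.shape
    oh, ow = inp.shape[0] - kh + 1, inp.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(inp[r:r + kh, c:c + kw] * kernel)
    return out

inp = np.arange(9.0).reshape(3, 3)          # hypothetical input matrix
dense = np.array([[1.0, 2.0], [3.0, 4.0]])  # w1 w2 / w3 w4
mask = np.array([[1.0, 0.0], [0.0, 1.0]])   # connection values (1, 0, 0, 1)
print(conv2d_valid(inp, dense))
print(conv2d_valid(inp, dense * mask))      # w2 and w3 pruned
```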
- FIG. 7 is a block diagram illustrating an example MNN acceleration processor 206 in which data modification for neural networks may be implemented.
- MNN acceleration processor 206 may at least include a data modifier 702 configured to receive one or more groups of input data and a predetermined weight value array that includes one or more weight values.
- the one or more groups of input data may be stored in a form of data array (“input array” hereinafter); that is, each group of the input data may be stored as an element of the input array (“input element” hereinafter).
- Each input element may be identified by an array index (“input array index” hereinafter; e.g., i 1 , i 2 , i 3 , and i 4 ).
- Each of the weight values may be designated for calculating a group of output data at an output node (e.g., o 1 ) based on a respective group of input data (e.g., a group of input data received at the input node i 1 ).
- the calculated output data may be similarly stored in a form of data array (“output array” hereinafter); that is, each group of the output data may be stored as an element of the output array (“output element” hereinafter).
- Each output element may be identified by an array index (“output array index” hereinafter; e.g., o 1 and o 2 ).
- the data modifier 702 may be configured to further receive connection data that include the one or more aforementioned connection values.
- Each of the connection values may correspond to an input array index (e.g., i 2 ) and an output array index (e.g., o 1 ).
- the data modifier 702 may be configured to modify the input data and the weight values based on the connection values.
- the data modifier 702 may be configured to operate in one work mode in which it deletes one or more weight values or one or more groups of the input data ("pruning mode"). Additionally, the data modifier 702 may be configured to operate in another work mode in which it adds one or more zero values to the predetermined weight value array or the input data ("compensation mode"). The selection between the pruning mode and the compensation mode may be predetermined as a system parameter or made according to other algorithms prior to the receiving of the input data.
- For example, the data modifier 702 may receive an input array including groups of input data (0.5, 0.6, 0.7, 1.2, 4, 0.1), an array of connection values (1, 0, 0, 1, 1, 1), and a predetermined weight value array including weight values (0.5, 0.8, 0.9, 0.4).
- Conventionally, the processor retrieves the index array from the memory 208 to determine which four elements of the input array should be multiplied or convolved by the four elements in the weight value array. The retrieving of the index array, as previously discussed, causes additional bandwidth consumption.
- the data modifier 702 may be configured to operate in the pruning mode. That is, since the second and the third connection values are zeroes, the data modifier 702 may be configured to delete the corresponding groups of the input data, i.e., the second and the third groups of the input data (0.6 and 0.7).
- the modified input data may be stored as an array including elements (0.5, 1.2, 4, 0.1).
- the modified input data may then be transmitted to a direct memory access (DMA) module 704 .
- the modified input data may be transmitted to and stored at the memory 208 for future processing.
- In some examples, the data modifier 702 may receive groups of input data in an input array (0.5, 1.2, 4, 0.1), a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4), and the same array of connection values. Since the second and the third connection values are zeroes, the data modifier 702 may be configured to delete the corresponding weight values, i.e., the second and the third weight values, from the predetermined weight value array.
- the modified weight value array may be stored as an array including elements (0.5, 0.8, 0.9, 0.4). Similarly, the modified weight value array may be transmitted to the DMA module 704 or to the memory 208 .
- the data modifier 702 may be configured to operate in the compensation mode.
- the data modifier 702 may receive an input array including elements (0.5, 1.2, 4, 0.1), a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4), and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 702 may be configured to add two elements of zero value to the input array to be the second and the third elements of the input array generating a modified input array including elements (0.5, 0, 0, 1.2, 4, 0.1). For the same reason stated above, a processor that performs multiplication or convolution operations on the modified input array and the predetermined weight value array is not required to retrieve the index array from the memory 208 and, thus, bandwidth consumption may be reduced.
- the data modifier 702 may receive an input array including elements (0.5, 0.6, 0.7, 1.2, 4, 0.1), a predetermined weight value array including elements (0.5, 0.8, 0.9, 0.4), and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 702 may be configured to add two elements of zero value to be the second and the third elements of the predetermined weight value array generating a modified weight value array including elements (0.5, 0, 0, 0.8, 0.9, 0.4).
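- Functionally, the two work modes of the data modifier 702 amount to the following (a Python sketch with hypothetical function names, reusing the example values from the text):

```python
def prune(values, conn):
    """Pruning mode: delete elements whose connection value is 0."""
    return [v for v, c in zip(values, conn) if c == 1]

def compensate(values, conn):
    """Compensation mode: insert a zero at each position whose
    connection value is 0, so the two operand arrays line up."""
    it = iter(values)
    return [next(it) if c == 1 else 0.0 for c in conn]

conn = [1, 0, 0, 1, 1, 1]
print(prune([0.5, 0.6, 0.7, 1.2, 4, 0.1], conn))  # [0.5, 1.2, 4, 0.1]
print(prune([0.5, 0, 0, 0.8, 0.9, 0.4], conn))    # [0.5, 0.8, 0.9, 0.4]
print(compensate([0.5, 1.2, 4, 0.1], conn))       # [0.5, 0.0, 0.0, 1.2, 4, 0.1]
print(compensate([0.5, 0.8, 0.9, 0.4], conn))     # [0.5, 0.0, 0.0, 0.8, 0.9, 0.4]
```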
- the modified input data and/or the modified weight values may be transmitted to and temporarily stored in an input data cache 712 and/or a weight cache 714 .
- the input data cache 712 and weight cache 714 may refer to one or more high-speed storage devices incorporated within the MNN acceleration processor 206 and configured to store the input data and the weight values respectively.
- the modified input data and/or the modified weight values may be further transmitted to a computing unit 710 for further processing.
- MNN acceleration processor 206 may further include an instruction cache 706 and a controller unit 708 .
- the instruction cache 706 may refer to one or more storage devices configured to store instructions received from the CPU 204 .
- the controller unit 708 may be configured to read the instructions from the instruction cache 706 and decode the instructions.
- the computing unit 710 may be configured to calculate one or more groups of output data based on the modified weight values and the modified input data.
- the calculation of the output data may include the forward propagation process and the backward propagation process described in accordance with FIG. 1 .
- the computing unit 710 may further include one or more multipliers configured to multiply the modified input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
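- A functional sketch of the computing unit 710 follows; the specific activation function is an assumption of this example, as the text does not name one.

```python
import numpy as np

def computing_unit(inputs, weights, bias, activation=np.tanh):
    """Multipliers produce the weighted input data, adders reduce them
    to a total weighted value and add a bias, and an activation
    processor produces one group of output data."""
    weighted = np.multiply(inputs, weights)  # multipliers
    total = np.sum(weighted)                 # adders
    biased = total + bias                    # bias addition
    return activation(biased)                # activation processor

# Example using the modified arrays from the pruning-mode discussion.
print(computing_unit([0.5, 1.2, 4, 0.1], [0.5, 0.8, 0.9, 0.4], bias=0.1))
```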
- the generated output data may be temporarily stored in an output data cache 716 and may be further transmitted to the memory 208 via the DMA module 704 .
- FIG. 8 is a block diagram illustrating another example MNN acceleration processor 206 in which data modification for neural networks may be implemented.
- components in the example MNN acceleration processor 206 may be the same or similar to the corresponding components shown in FIG. 7 or may be configured to perform the same or similar operations to those shown in FIG. 7 except that a data modifier 802 may be implemented between a DMA module 804 , an input data cache 812 , and a weight cache 814 .
- the data modifier 802 may be configured to modify the input data and the weight values based on the connection values.
- the modified input data and the modified weight values may be transmitted to an input data cache 812 and a weight cache 814 and may be further transmitted to a computing unit 810 for further processing.
- FIG. 9 is a block diagram illustrating an example data modifier 702 / 802 by which data modification for neural networks may be implemented.
- the data modifier 702 / 802 may include an input data modifier 902 and a weight modifier 904 .
- the input data modifier 902 may be configured to modify the input data.
- the input data modifier 902 may be configured to delete groups of input data that correspond to the connection values that are zeroes.
- the input data modifier 902 may be configured to add one or more zeroes to be the elements corresponding to the connection values that are zeroes.
- the weight modifier 904 may be configured to modify the weight values based on the operation mode. When operating in the pruning mode, the weight modifier 904 may be configured to delete weight values that correspond to the connection values that are zeroes. When operating in the compensation mode, the weight modifier 904 may be configured to add one or more zeroes as the elements corresponding to the connection values that are zeroes.
- the input data modifier 902 and the weight modifier 904 may be implemented by one or more multiplexers and at least one storage device configured to store information indicating the current operation mode.
- the input data modifier 902 may include an input data filter 906 and an input data multiplexer 908 .
- the input data filter 906 may be configured to output an input element if a connection value corresponding to the input element is 1. Further, when the connection value is 0, the input data filter 906 may be configured to ignore the corresponding input element and move to process the next input element.
- the input data multiplexer 908 may be configured to output data from the input data filter 906 when in the pruning mode and to directly output the input data when in the compensation mode. As such, those input elements corresponding to the connection values of zero may be deleted when the input data modifier 902 is configured to work in the pruning mode.
- the weight modifier 904 may include a first level weight multiplexer 910 and a second level weight multiplexer 912 .
- the first level weight multiplexer 910 may be configured to output a zero value if a corresponding connection value is 0 and to output a weight value corresponding to the connection value if the connection value is 1.
- the second level weight multiplexer 912 may be configured to output data received from the first level weight multiplexer 910 when in the compensation mode. Further, the second level weight multiplexer 912 may be configured to directly output a corresponding weight value when in the pruning mode. As such, additional elements of zero values may be added to the weight value array when the weight modifier 904 is configured to work in the compensation mode.
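- The two-level multiplexer behavior of FIG. 9 can be modeled functionally as follows (a Python sketch; the mode flags and function names are assumptions of this example). In this model the second-level weight multiplexer passes weight values straight through in pruning mode, as described above.

```python
PRUNING, COMPENSATION = "pruning", "compensation"

def input_data_path(inputs, conn, mode):
    """Input data filter plus multiplexer: the filter keeps only the
    elements whose connection value is 1; the multiplexer selects the
    filtered stream in pruning mode and the raw input otherwise."""
    filtered = [x for x, c in zip(inputs, conn) if c == 1]
    return filtered if mode == PRUNING else list(inputs)

def weight_path(weights, conn, mode):
    """First- and second-level weight multiplexers: the first level
    substitutes a zero wherever the connection value is 0; the second
    level selects that stream in compensation mode and passes the
    weight values straight through in pruning mode."""
    it = iter(weights)
    first_level = [next(it) if c == 1 else 0.0 for c in conn]
    return first_level if mode == COMPENSATION else list(weights)

conn = [1, 0, 0, 1, 1, 1]
print(input_data_path([0.5, 0.6, 0.7, 1.2, 4, 0.1], conn, PRUNING))
print(weight_path([0.5, 0.8, 0.9, 0.4], conn, COMPENSATION))
```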
- FIG. 10 is a flow chart of aspects of an example method 1000 for modifying data for neural networks.
- the example method 1000 may be performed by one or more components of the MNN acceleration processor 206 as described in FIGS. 7 and 8 and the components of the data modifier 702 / 802 as described in FIG. 9 .
- method 1000 may include the data modifier 702 / 802 receiving one or more groups of input data, wherein the one or more groups of input data are stored as input elements in an input array and each of the input elements is identified by an input array index.
- method 1000 may include the data modifier 702 / 802 receiving a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on the one or more groups of input data, wherein the one or more groups of output data are to be stored as output elements in an output array and each of the output elements is identified by an output array index.
- method 1000 may include the data modifier 702 / 802 receiving connection data that include one or more connection values, wherein each of the connection values corresponds to one of the input array indexes and one of the output array indexes and indicates whether one of the weight values in the predetermined weight value array is designated for calculating a group of the output data to be stored as the output element identified by the corresponding output array index based on a group of the input data stored as the input element identified by the corresponding input array index, and whether the weight value meets a predetermined condition.
- method 1000 may include the data modifier 702 / 802 modifying the weight values and the input data based on the connection data.
- the modifying may further include sub-processes or sub-operations including deleting at least one weight value that corresponds to the connection values that are zero, adding one or more zero values to the predetermined weight value array based on the connection values, deleting at least one group of the input data that is stored as the input elements identified by the input array indexes corresponding to the connection values that are zero, or adding one or more zero values to the input elements identified by the input array indexes corresponding to the connection values that are zero.
- For example, the data modifier 702 may receive an input array including groups of input data (0.5, 0.6, 0.7, 1.2, 4, 0.1), an array of connection values including elements (1, 0, 0, 1, 1, 1), and a predetermined weight value array including weight values (0.5, 0.8, 0.9, 0.4).
- the data modifier 702 may be configured to operate in the pruning mode. That is, since the second and the third connection values are zeroes, the data modifier 702 may be configured to delete the corresponding groups of the input data, i.e., the second and the third groups of the input data (0.6 and 0.7).
- the modified input data may be stored as an array including elements (0.5, 1.2, 4, 0.1).
- In some examples, the data modifier 702 may receive groups of input data in an input array (0.5, 1.2, 4, 0.1), a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4), and the same array of connection values. Since the second and the third connection values are zeroes, the data modifier 702 may be configured to delete the corresponding weight values, i.e., the second and the third weight values, from the predetermined weight value array.
- the modified weight value array may be stored as an array including elements (0.5, 0.8, 0.9, 0.4).
- method 1000 may include the computing unit 710 / 810 calculating the one or more groups of output data based on the modified weight values and the modified input data. That is, the computing unit 710 may be configured to calculate one or more groups of output data based on the modified weight values and the modified input data. In some aspects, the computing unit 710 may further include one or more multipliers configured to multiply the modified input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
- FIG. 11 is a block diagram illustrating another example MNN acceleration processor 206 in which data modification for neural networks may be implemented.
- components in the example MNN acceleration processor 206 may be the same or similar to the corresponding components shown in FIG. 7 or may be configured to perform the same or similar operations to those shown in FIG. 7 except that a data modifier 1102 may be implemented between a DMA module 1104 and an input data cache 1112 .
- the DMA module 1104 may be configured to transmit and receive data from and to the memory 208 , an instruction cache 1106 , the data modifier 1102 , a weight cache 1114 , and an output data cache 1116 .
- the instruction cache 1106 , the input data cache 1112 , the weight cache 1114 , and the output data cache 1116 may respectively refer to one or more high-speed storage devices incorporated within the MNN acceleration processor 206 and configured to respectively store instructions from the DMA module 1104 , the modified input data from the data modifier 1102 , weight values from the DMA module 1104 , and the calculated output data from a computing unit 1110 .
- the data modifier 1102 may be configured to receive one or more groups of input data for generating one or more groups of output data.
- the one or more groups of input data may be stored as input elements in an input array and each of the input elements is identified by an input array index.
- the data modifier 1102 may be further configured to receive connection data that include one or more connection values.
- the data modifier 1102 is not configured to receive the weight values as the weight values are directly transmitted from the DMA module 1104 to the weight cache 1114 .
- the data modifier 1102 may be configured to modify the received groups of input data based on the connection data.
- the data modifier 1102 may be configured to receive an input array including groups of input data as elements (0.5, 0.6, 0.7, 1.2, 4, 0.1) and an array of connection values (1, 0, 0, 1, 1, 1).
- the data modifier 1102 may be configured to delete the corresponding groups of the input data, i.e., the second and the third groups of the input data (0.6 and 0.7).
- the modified input data may be stored as an array including elements (0.5, 1.2, 4, 0.1).
- the data modifier 1102 may operate in the compensation mode.
- the data modifier 1102 may receive an input array including elements (0.5, 1.2, 4, 0.1) and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1102 may be configured to add two elements of zero value to the input array to be the second and the third elements of the input array generating a modified input array including elements (0.5, 0, 0, 1.2, 4, 0.1).
- the modified input data may be transmitted to and temporarily stored at the input data cache 1112 .
- the modified input data may be further transmitted, together with the weight values from the weight cache 1114 and the decoded instructions from the controller unit 1108 , to the computing unit 1110 .
- the computing unit 1110 may be configured to calculate one or more groups of output data based on the weight values and the modified input data.
- the calculation of the output data may include the forward propagation process and the backward propagation process described in accordance with FIG. 1 .
- the computing unit 1110 may include one or more multipliers configured to multiply the modified input data by the weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
- the generated output data may be temporarily stored in the output data cache 1116 and may be further transmitted to the memory 208 via the DMA module 1104 .
- FIG. 12 is a block diagram illustrating another example data modifier 1102 by which data modification for neural networks may be implemented.
- Since the data modifier 1102 may be configured to modify only the input data, the data modifier 1102 may include only an input data modifier 1202 .
- the dashed-line block indicates an optional weight modifier 904 .
- the input data modifier 1202 may be configured to modify the input data depending on the operation mode.
- the input data modifier 1202 may be configured to delete groups of input data that correspond to the connection values that are zeroes.
- the input data modifier 1202 may be configured to add one or more zeroes to be the elements corresponding to the connection values that are zeroes.
- the input data modifier 1202 may be implemented by one or more multiplexers and at least one storage device configured to store information indicating the current operation mode.
- the input data modifier 1202 may include an input data filter 1206 and an input data multiplexer 1208 .
- the input data filter 1206 may be configured to output an input element if a connection value corresponding to the input element is 1. Further, when the connection value is 0, the input data filter 1206 may be configured to ignore the corresponding input element and move to process the next input element.
- the input data multiplexer 1208 may be configured to output data from the input data filter 1206 when in the pruning mode and to directly output the input data when in the compensation mode. As such, those input elements corresponding to the connection values of zero may be deleted when the input data modifier 1202 is configured to work in the pruning mode.
- FIG. 13 is a flow chart of aspects of another example method 1300 for modifying data for neural networks.
- the example method 1300 may be performed by one or more components of the MNN acceleration processor 206 as described in FIG. 11 and the component of the data modifier 1102 as described in FIG. 12 .
- method 1300 may include the data modifier 1102 receiving one or more groups of input data for generating one or more groups of output data.
- the one or more groups of input data may be stored as input elements in an input array and each of the input elements is identified by an input array index.
- Method 1300 may further include the data modifier 1102 receiving connection data that include one or more connection values.
- method 1300 may include the data modifier 1102 modifying the received groups of input data based on the connection data.
- the modifying may further include sub-processes or sub-operations including deleting at least one group of the input data that is stored as the input elements identified by the input array indexes corresponding to the connection values that are zero, when the data modifier 1102 operates in the pruning mode.
- the modifying may include adding one or more zero values to the input elements identified by the input array indexes corresponding to the connection values that are zero when the data modifier 1102 operates in the compensation mode.
- the data modifier 1102 may receive an input array including groups of input data as elements (0.5, 0.6, 0.7, 1.2, 4, 0.1) and an array of connection values (1, 0, 0, 1, 1, 1).
- the data modifier 1102 may be configured to delete the corresponding groups of the input data, i.e., the second and the third groups of the input data (0.6 and 0.7).
- the modified input data may be stored as an array including elements (0.5, 1.2, 4, 0.1).
- the data modifier 1102 may operate in the compensation mode.
- the data modifier 1102 may receive an input array including elements (0.5, 1.2, 4, 0.1) and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1102 may be configured to add two elements of zero value to the input array to be the second and the third elements of the input array generating a modified input array including elements (0.5, 0, 0, 1.2, 4, 0.1).
- method 1300 may include the computing unit 1110 calculating the one or more groups of output data based on the weight values and the modified input data.
- the computing unit 1110 may include one or more multipliers configured to multiply the modified input data by the weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
- FIG. 14 is a block diagram illustrating another example MNN acceleration processor 206 in which data modification for neural networks may be implemented.
- components in the example MNN acceleration processor 206 may be the same or similar to the corresponding components shown in FIG. 7 or may be configured to perform the same or similar operations to those shown in FIG. 7 except that a data modifier 1402 may be implemented between a DMA module 1404 and a weight cache 1414 .
- the DMA module 1404 may be configured to transmit and receive data from and to the memory 208 , an instruction cache 1406 , the data modifier 1402 , an input data cache 1412 , and an output data cache 1416 .
- the instruction cache 1406 , the input data cache 1412 , the weight cache 1414 , and the output data cache 1416 may respectively refer to one or more high-speed storage devices incorporated within the MNN acceleration processor 206 and configured to respectively store instructions from the DMA module 1404 , the input data from the DMA module 1404 , the modified weight values from the data modifier 1402 , and the calculated output data from a computing unit 1410 .
- the data modifier 1402 may be configured to receive a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on one or more groups of input data.
- the one or more groups of input data may be stored as input elements in an input array and each of the input elements is identified by an input array index.
- the one or more groups of output data are to be stored as output elements in an output array and each of the output elements is identified by an output array index.
- the data modifier 1402 may be further configured to receive connection data that include one or more connection values.
- the data modifier 1402 is not configured to receive the input data as the input data may be directly transmitted from the DMA module 1404 to the input data cache 1412 .
- the data modifier 1402 may be configured to modify the weight values based on the connection data. For example, the data modifier 1402 may receive a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4) and an array of connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1402 may be configured to delete the corresponding weight values, i.e., the second and the third weight values, from the predetermined weight value array.
- the modified weight value array may be stored as an array including elements (0.5, 0.8, 0.9, 0.4).
- the data modifier 1402 may receive a predetermined weight value array including elements (0.5, 0.8, 0.9, 0.4) and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1402 may be configured to add two elements of zero value to be the second and the third elements of the predetermined weight value array generating a modified weight value array including elements (0.5, 0, 0, 0.8, 0.9, 0.4).
- the modified weight values may be transmitted to and temporarily stored at the weight cache 1414 .
- the modified weight values may be further transmitted, together with the input data from the input data cache 1412 and the decoded instructions from the controller unit 1408 , to the computing unit 1410 .
- the computing unit 1410 may be further configured to calculate one or more groups of output data based on the modified weight values and the input data.
- the calculation of the output data may include the forward propagation process and the backward propagation process described in accordance with FIG. 1 .
- the computing unit 1410 may include one or more multipliers configured to multiply the input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
- the generated output data may be temporarily stored in the output data cache 1416 and may be further transmitted to the memory 208 via the DMA module 1404 .
- FIG. 15 is a block diagram illustrating another example data modifier by which data modification for neural networks may be implemented.
- as the data modifier 1402 may be configured to only modify the weight values, the data modifier 1402 may only include a weight modifier 1504.
- the dash-lined block indicates an optional input data modifier 902 .
- the weight modifier 1504 may be configured to modify the weight values depending on the operation mode. When operating in the pruning mode, the weight modifier 1504 may be configured to delete weight values that correspond to the connection values that are zeroes. When operating in the compensation mode, the weight modifier 1504 may be configured to add one or more zeroes as the elements corresponding to the connection values that are zeroes.
- the weight modifier 1504 may be implemented by one or more multiplexers and at least one storage device configured to store information indicating the current operation mode.
- the weight modifier 1504 may include a first level weight multiplexer 1506 and a second level weight multiplexer 1508 .
- the first level weight multiplexer 1506 may be configured to output a zero value if a corresponding connection value is 0 and to output a weight value corresponding to the connection value if the connection value is 1.
- the second level weight multiplexer 1508 may be configured to output data received from the first level weight multiplexer 1506 when in the compensation mode. Further, the second level weight multiplexer 1508 may be configured to directly output a corresponding weight value when in the pruning mode. As such, additional elements of zero values may be added to the weight value array when the weight modifier 1504 is configured to work in the compensation mode.
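- As a rough software analogy of this two-level multiplexer arrangement (the hardware is combinational logic; the Python rendering and its names are illustrative assumptions):

```python
def weight_modifier_1504(compact_weights, connections, mode):
    """Mimic the two-level weight multiplexer of FIG. 15.

    First level: select a zero when a connection value is 0, otherwise the
    next stored weight. Second level: forward that selection in the
    compensation mode, or pass the stored weights through in the pruning mode.
    """
    if mode == "pruning":                  # second-level mux: direct path
        return list(compact_weights)
    stream = iter(compact_weights)         # compensation mode
    return [next(stream) if c == 1 else 0.0 for c in connections]

# weight_modifier_1504([0.5, 0.8, 0.9, 0.4], [1, 0, 0, 1, 1, 1], "compensation")
# -> [0.5, 0.0, 0.0, 0.8, 0.9, 0.4]
```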
- FIG. 16 is a flow chart of aspects of another example method for modifying data for neural networks.
- the example method 1600 may be performed by one or more components of the MNN acceleration processor 206 as described in FIG. 14 and the components of the data modifier 1402 as described in FIG. 15.
- method 1600 may include the data modifier 1402 receiving a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on one or more groups of input data.
- Method 1600 may further include the data modifier 1402 receiving connection data that include one or more connection values.
- method 1600 may include the data modifier 1402 modifying the weight values based on the connection data.
- the modifying may further include sub-processes or sub-operations, including deleting one or more weight values that correspond to the connection values that are zero when the data modifier 1402 operates in the pruning mode.
- the modifying may include adding one or more zero values to the predetermined weight value array based on the connection values when the data modifier 1402 operates in the compensation mode.
- the data modifier 1402 may receive a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4) and an array of connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1402 may be configured to delete the corresponding weight values, i.e., the second and the third weight values, from the predetermined weight value array.
- the modified weight value array may be stored as an array including elements (0.5, 0.8, 0.9, 0.4).
- the data modifier 1402 may receive a predetermined weight value array including elements (0.5, 0.8, 0.9, 0.4) and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1402 may be configured to add two zero-valued elements as the second and the third elements of the predetermined weight value array, generating a modified weight value array including elements (0.5, 0, 0, 0.8, 0.9, 0.4).
- method 1600 may include the computing unit 1410 calculating the one or more groups of output data based on the modified weight values and the input data.
- the computing unit 1410 may include one or more multipliers configured to multiply the input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
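- Putting the steps of method 1600 together, a compact end-to-end sketch might look as follows (all names are illustrative assumptions, and the sigmoid stands in for an unspecified activation function):

```python
import math

def method_1600(weights, connections, inputs, bias=0.0, mode="pruning"):
    # Receive the predetermined weight value array and the connection data,
    # then modify the weight values based on the connection data.
    if mode == "pruning":
        weights = [w for w, c in zip(weights, connections) if c == 1]
    else:  # compensation mode: insert zeros at the pruned positions
        it = iter(weights)
        weights = [next(it) if c == 1 else 0.0 for c in connections]
    # Calculate the output data: multipliers, adders, bias, activation.
    biased = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-biased))

y = method_1600([0.5, 0, 0, 0.8, 0.9, 0.4], [1, 0, 0, 1, 1, 1],
                inputs=[0.5, 1.2, 4, 0.1])
```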
- the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise, or clear from the context, the phrase “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, the phrase “X employs A or B” is satisfied by any of the following instances: X employs A; X employs B; or X employs both A and B.
- the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from the context to be directed to a singular form.
Description
- Artificial Neural Networks (ANNs), or Neural Networks (NNs) for short, are algorithmic mathematical models that imitate the behavioral characteristics of animal neural networks and perform distributed concurrent information processing. Depending on the complexity of a system, such networks adjust the interconnections among a great number of internal nodes, thereby achieving the purpose of information processing. The algorithms used by NNs may include vector multiplication (also referred to as "multiplication") and convolution, which widely adopt sign functions and various approximations thereof.
- Like the neural networks in animal brains, NNs consist of multiple interconnected nodes. As shown in FIG. 3, each block represents a node and each arrow represents a connection between two nodes.
- The calculation formula of a neuron can be briefly described as $y = f\left(\sum_{i=0}^{n} w_i x_i\right)$, wherein $x_i$ represents the input data received at the input nodes connected to the output node, $w_i$ represents the corresponding weight values between the input nodes and the output node, and $f$ is a nonlinear function, usually known as an activation function, including commonly used functions such as the sigmoid function $f(x) = 1/(1 + e^{-x})$ and the hyperbolic tangent function $f(x) = \tanh(x)$.
- NNs are widely applied to a variety of applications, such as computer vision, voice recognition, and natural language processing. In recent years, the scale of NNs has been growing. For example, in 1998, LeCun's neural network for handwritten character recognition included fewer than 1M weight values, while in 2012, Krizhevsky's network for the ImageNet competition included 60M weight values.
- NNs are applications that require large amounts of calculation and great bandwidth for memory access. The more weight values there are, the greater the amounts of calculation and memory access required. In order to decrease the amount of calculation and the number of weight values, thereby reducing memory access, a sparsely connected neural network may be implemented.
- Even as the amount of calculation and the amount of memory access of NNs dramatically increase, a general-purpose processor is conventionally adopted to calculate a sparse artificial neural network. With regard to the general-purpose processor, the input neurons, output neurons, and weight values are respectively stored in three arrays, while an index array stores the connection relation between each output neuron and each input neuron connected by weight values. At the time of calculating, a major operation is the multiplication of input data by a weight value, and each calculation needs to search for the weight value corresponding to the input data through the index array. Since the general-purpose processor is weak in both calculation and memory access, the demands of NNs may not be satisfied. Moreover, when multiple general-purpose processors work concurrently, inter-processor communication becomes a performance bottleneck. In some other aspects, when calculating a neural network after pruning, each multiplication operation needs to re-search the positions corresponding to the weight values in the index array, which adds calculation and memory access overhead. Thus, NN calculation is time-consuming and power-consuming. General-purpose processors need to decode an operation of a multiple-layer artificial neural network into a long sequence of operations and memory access instructions, and this front-end decoding brings about a large overhead.
- Another known method to support the operations and training algorithms of a sparsely connected artificial neural network is to use a graphics processing unit (GPU). In such a method, a general-purpose register file and a general-purpose stream processing unit are used to execute universal single-instruction-multiple-data (SIMD) instructions to support the aforementioned algorithms. Since a GPU is a device specially designed for executing graph and image operations as well as scientific calculation, it fails to provide specific support for sparse artificial neural network operations. As such, GPUs also need a great amount of front-end decoding to execute sparse artificial neural network operations, thus leading to additional overhead. In addition, since a GPU contains only relatively small on-chip caches, the model data (e.g., weight values) of a multiple-layer artificial neural network have to be repeatedly retrieved from outside the chip. Thus, off-chip bandwidth becomes the main performance bottleneck while producing huge power consumption.
- The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
- The present disclosure presents examples of techniques for modifying data in an MNN acceleration processor for neural networks. An example apparatus may include a data modifier configured to receive one or more groups of input data. The one or more groups of input data may be stored as input elements in an input array and each of the input elements may be identified by an input array index. The data modifier may be further configured to receive a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on the one or more groups of input data. The one or more groups of output data may be stored as output elements in an output array and each of the output elements may be identified by an output array index. Further still, the data modifier may be configured to receive connection data that include one or more connection values. Each of the connection values may correspond to one of the input array indexes and one of the output array indexes and may indicate whether one of the weight values in the predetermined weight value array is designated for calculating a group of the output data to be stored as the output element identified by the corresponding output array index based on a group of the input data stored as the input element identified by the corresponding input array index, and whether the weight value meets a predetermined condition. The data modifier may be further configured to modify the weight values and the input data based on the connection data. In addition, the example apparatus may include a computing unit configured to receive the modified weight values and the modified input data from the data modifier and calculate the one or more groups of output data based on the modified weight values and the modified input data.
- An example method for modifying data in an MNN acceleration processor for neural networks may include receiving one or more groups of input data. The one or more groups of input data may be stored as input elements in an input array and each of the input elements may be identified by an input array index. Further, the example method may include receiving a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on the one or more groups of input data. The one or more groups of output data may be stored as output elements in an output array and each of the output elements may be identified by an output array index. Further still, the example method may include receiving connection data that include one or more connection values. Each of the connection values may correspond to one of the input array indexes and one of the output array indexes and indicate whether one of the weight values in the predetermined weight value array is designated for calculating a group of the output data to be stored as the output element identified by the corresponding output array index based on a group of the input data stored as the input element identified by the corresponding input array index, and whether the weight value meets a predetermined condition. In addition, the example method may include modifying the weight values and the input data based on the connection data and calculating the one or more groups of output data based on the modified weight values and the modified input data.
- To the accomplishment of the foregoing and related ends, the one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
- The disclosed aspects will hereinafter be described in conjunction with the appended drawings, provided to illustrate and not to limit the disclosed aspects, wherein like designations denote like elements, and in which:
- FIG. 1 is a block diagram illustrating an example computing process at an MNN acceleration processor for neural networks;
- FIG. 2 is a block diagram illustrating an example computer system in which data modification for neural networks may be implemented;
- FIG. 3 is a diagram illustrating a comparison between a regular MNN and a sparse MNN in which data modification for neural networks may be implemented;
- FIG. 4A and FIG. 4B are diagrams illustrating one or more connection values in a sparse MNN in which data modification for neural networks may be implemented;
- FIG. 5 is a diagram illustrating a convolution process with which data modification for neural networks may be implemented;
- FIG. 6 is a diagram illustrating a convolution process with modified weight values with which data modification for neural networks may be implemented;
- FIG. 7 is a block diagram illustrating an example MNN acceleration processor in which data modification for neural networks may be implemented;
- FIG. 8 is a block diagram illustrating another example MNN acceleration processor in which data modification for neural networks may be implemented;
- FIG. 9 is a block diagram illustrating an example data modifier by which data modification for neural networks may be implemented;
- FIG. 10 is a flow chart of aspects of an example method for modifying data for neural networks;
- FIG. 11 is a block diagram illustrating another example MNN acceleration processor in which data modification for neural networks may be implemented;
- FIG. 12 is a block diagram illustrating another example data modifier by which data modification for neural networks may be implemented;
- FIG. 13 is a flow chart of aspects of another example method for modifying data for neural networks;
- FIG. 14 is a block diagram illustrating another example MNN acceleration processor in which data modification for neural networks may be implemented;
- FIG. 15 is a block diagram illustrating another example data modifier by which data modification for neural networks may be implemented; and
- FIG. 16 is a flow chart of aspects of another example method for modifying data for neural networks.
- Various aspects are now described with reference to the drawings. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more aspects. It may be evident, however, that such aspect(s) may be practiced without these specific details.
- A typical conceptual model of a multi-layer neural network (MNN) may include multiple layers of neurons. Each neuron is an information-processing unit that is fundamental to the operation of a neural network. In more detail, a typical model of a neuron may include three basic elements, e.g., a set of synapses, an adder, and an activation function. In the form of a mathematical formula, the output signals of a neuron may be represented as $y_k = \varphi\left(\sum_{j=1}^{m} w_{kj} x_j + b_k\right)$, in which $y_k$ represents the output signals of the neuron, $\varphi(\cdot)$ represents the activation function, $w_{kj}$ represents one or more weight values, $x_j$ represents the input signals of the neuron, and $b_k$ represents a bias value. In other words, a simplified model of a neuron may include one or more input nodes for receiving the input signals or data and an output node for transmitting the output signals or data to an input node of another neuron at the next level. Thus, a layer of neurons may at least include a layer of multiple input nodes and another layer of output nodes.
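- As a quick numeric instance of this formula (values chosen arbitrarily for illustration, with $\varphi$ taken to be the sigmoid function): $y_k = \varphi(0.5 \cdot 0.5 + 0.8 \cdot 1.2 + 0.9 \cdot 4.0 + 0.4 \cdot 0.1 + 0.1) = \varphi(4.95) = 1/(1 + e^{-4.95}) \approx 0.993$.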
- FIG. 1 is a block diagram illustrating an example computing process 100 at an MNN acceleration processor for neural networks. As depicted, the example computing process 100 may be performed by a layer of input nodes 102, a layer of output nodes 104, a layer of input nodes 106, and a layer of output nodes 108. A triangular-shaped operator (Δ as shown in FIG. 1) may indicate a matrix multiplication or a convolution operation. It is notable that the layers of input nodes and output nodes may not be the first layer and the last layer of the entire neural network in the process. Rather, the layers of input and output nodes may refer to the nodes included in any two consecutive layers of neurons of a neural network. As described below in greater detail, the computing process from the layer of input nodes 102 to the layer of output nodes 108 may be referred to as a forward propagation process; the computing process from the layer of output nodes 108 to the layer of input nodes 102 may be referred to as a backward propagation process.
- The forward propagation process may start from one or more input nodes that receive input data 102A. The received input data 102A may be multiplied or convolved by one or more weight values 102C. The results of the multiplication or convolution may be transmitted to one or more output nodes at the layer of output nodes 104 as output data 104A. The output data 104A, with or without further operations, may be transmitted to one or more input nodes at the next layer (e.g., the layer of input nodes 106) as input data 106A. Similarly, the input data 106A may be multiplied or convolved by one or more weight values 106C. The results of the multiplication or convolution may be similarly transmitted to one or more output nodes at the layer of output nodes 108 as output data 108A.
- The backward propagation process may start from one or more output nodes at the last layer of nodes of the forward propagation process (e.g., the layer of output nodes 108). For example, output gradients 108B generated at the layer of output nodes 108 may be multiplied or convolved by the input data 106A to generate weight gradients 106D at the layer of input nodes 106. The output gradients 108B may be further multiplied or convolved by the weight values 106C to generate input data gradients. The input data gradients 106B, with or without other operations between layers, may be transmitted to one or more nodes at the layer of output nodes 104 as output gradients 104B. The output gradients 104B may then be multiplied or convolved by the input data 102A to generate weight gradients 102D. Additionally, the output gradients 104B may be multiplied by the weight values 102C to generate input data gradients 102B.
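- The two passes can be summarized with a small matrix sketch (a plain matrix-multiplication view of the operations above; convolution follows the same pattern, and all variable names are illustrative):

```python
import numpy as np

x = np.array([0.5, 1.2, 4.0, 0.1])   # input data (e.g., 102A)
W = np.random.rand(2, 4)             # weight values (e.g., 102C)

y = W @ x                            # forward propagation: inputs multiplied by weights

dy = np.array([0.1, -0.2])           # output gradients (e.g., 104B)
dW = np.outer(dy, x)                 # weight gradients: output gradients by input data
dx = W.T @ dy                        # input data gradients: output gradients by weights
```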
- FIG. 2 is a block diagram illustrating an example computer system 200 in which data modification for neural networks may be implemented. The example computer system 200 may include at least an I/O interface 202, a central processing unit (CPU) 204, a multi-layer neural network acceleration processor 206, and a memory 208. The I/O interface 202 may be configured to exchange data or information with peripheral devices, e.g., input devices, storage devices, etc. Data received from the I/O interface 202 may be further processed at the CPU 204. Data that require processing at an MNN may be transmitted to the MNN acceleration processor 206. For example, the forward propagation process and the backward propagation process described above in accordance with FIG. 1 may be performed at the MNN acceleration processor 206. Other data for the forward propagation process and the backward propagation process, e.g., weight values 102C and 106C, may be retrieved from the memory 208 and stored on the MNN acceleration processor 206 during the processes. However, as discussed above, the index array that indicates the correspondence between the input data and the weight values is conventionally stored on the memory 208. At each multiplication or convolution that involves the weight values, retrieving the index array from the memory 208 may cause significant system delays or bandwidth consumption. The MNN acceleration processor 206 may be described in further detail below.
- FIG. 3 is a diagram illustrating a comparison between a regular MNN 300A and a sparse MNN 300B in which data modification for neural networks may be implemented. As depicted, the regular MNN 300A may include a layer of input nodes 302 and a layer of output nodes 304. Each block shown in the regular MNN 300A indicates an input node or an output node. The arrows between the input nodes (e.g., i1, i2, i3 . . . iN) and the output nodes (e.g., o1, o2, o3 . . . oN) indicate those non-zero weight values for calculating the output data. For example, w11 may be the weight value for calculating the output data at output node o1 based on the input data received at input node i1. However, in some applications of neural networks, more than one of the weight values may be zero, in which case the input data received at more than one input node are not considered for calculating some output data. In these cases, the arrows between the corresponding input nodes and output nodes are deleted and the MNN may be referred to as a sparse MNN, e.g., sparse MNN 300B. As shown in sparse MNN 300B, no arrow is between i2 and o1, i1 and o2, or i4 and o2, which indicates that the weight values w21, w12, and w42 are zero.
- FIG. 4A and FIG. 4B are diagrams illustrating one or more connection values in a sparse MNN in which data modification for neural networks may be implemented. As discussed above, an index array that indicates the correspondence between the weight values and the input data is conventionally stored in the memory 208. With respect to sparse MNNs, connection data that indicate the correspondence between the output data and the input data may be generated and transmitted to the MNN acceleration processor 206.
- As depicted in FIGS. 4A and 4B, one or more groups of input data may be received at the input nodes i1, i2, i3, and i4. In other words, input data may be received and stored in the form of an input array that includes elements identified by array indexes i1, i2, i3, and i4. Similarly, one or more groups of output data may be generated at output nodes o1 and o2. That is, the output data may be stored and transmitted in the form of an output array that includes elements identified by array indexes o1 and o2. As an example of a sparse MNN, some input nodes are not connected to the output nodes.
- Connection data including one or more connection values may be generated based on the weight values corresponding to an output node and an input node. That is, if a weight value meets a predetermined condition, the connection value for the corresponding output node and input node may be set to one. Otherwise, if the weight value corresponding to the output node and input node is zero, or the weight value does not meet the predetermined condition, the connection value for the corresponding output node and input node may be set to zero. In some examples, the predetermined condition may include that the weight value is a non-zero number, that an absolute value of the weight value is less than or equal to a first threshold value, and/or that the absolute value of the weight value is less than or equal to a second threshold value but greater than or equal to a third threshold value. The first, second, and third threshold values may be received from the peripheral devices via the I/O interface 202.
- In other examples (e.g., illustrated in
FIG. 4B ), connection values may be generated based on a distance between the input nodes. A connection value may be determined by the distances between different input nodes that correspond to those weight values that meet the predetermined condition. With respect to the above example weight values, w11, w31, and w41 meet the predetermined condition. The connection value for input node i1 may be set to a value equal to the distance between the first input node and the current input node. Thus, since the distance between input node i1 and the first node (also i1 here) is zero, the connection value for i1 may be set to zero. With respect to input node i3, since the distance between input node i3 and the first input node (i1) is 2, the connection value for i3 may be set to 2. It is notable that the illustration and the term “distance” are provided for purpose of brevity. Since the input data and the output data may be stored in a form of data array, the term “distance” may refer to the difference between array indexes. - Thus, as the connection values sufficiently represent the connections between the input nodes and the output nodes, the
MNN acceleration processor 206 is not required to retrieve the index array from thememory 208 during the forward propagation process and the backward propagation process described inFIG. 1 . -
- FIG. 5 is a diagram illustrating a convolution process with which data modification for neural networks may be implemented. In this example, an example convolution process between one or more groups of input data in the form of an input matrix and weight values in the form of a weight matrix (both matrices are shown in FIG. 5) is described. As shown, each element of the output matrix is calculated by convolving a portion of the input matrix with the weight matrix. For example, the output data at the output node o1 may be calculated by convolving the top left portion of the input matrix with the weight matrix. The result of the convolution process may be stored in an output matrix, also shown in FIG. 5.
- FIG. 6 is a diagram illustrating a convolution process with a sparse weight matrix with which data modification for neural networks may be implemented. As depicted, the top part of FIG. 6 shows a convolution process between an input matrix and a weight matrix. The lower part of FIG. 6 shows a convolution process between the input matrix and a sparse weight matrix in which the weight values w2 and w3 are deleted. Thus, rather than four convolution operations, only two convolution operations are required to generate the output matrix. Specifically, the connection values for the weight values w1, w2, w3, and w4 may be set to (1, 0, 0, 1), or (0, 2) in the distance form, for the calculation of output data at output nodes o1 and o4.
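- In software terms, such a sparse convolution might be sketched as follows (the shapes and values are assumptions for illustration; the connection bitmap lets the kernel skip the multiplications for the pruned weights):

```python
import numpy as np

inp = np.arange(9.0).reshape(3, 3)       # assumed 3x3 input matrix
w = np.array([[0.5, 0.0], [0.0, 0.4]])   # 2x2 weight matrix with w2, w3 pruned
conn = np.array([[1, 0], [0, 1]])        # connection values (1, 0, 0, 1)

out = np.zeros((2, 2))
for r in range(2):
    for c in range(2):
        patch = inp[r:r + 2, c:c + 2]
        # multiply only where the connection value is 1, skipping pruned weights
        out[r, c] = np.sum(patch[conn == 1] * w[conn == 1])
```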
- FIG. 7 is a block diagram illustrating an example MNN acceleration processor 206 in which data modification for neural networks may be implemented. As depicted, the MNN acceleration processor 206 may at least include a data modifier 702 configured to receive one or more groups of input data and a predetermined weight value array that includes one or more weight values. As described above, the one or more groups of input data may be stored in the form of a data array ("input array" hereinafter); that is, each group of the input data may be stored as an element of the input array ("input element" hereinafter). Each input element may be identified by an array index ("input array index" hereinafter; e.g., i1, i2, i3, and i4). Each of the weight values may be designated for calculating a group of output data at an output node (e.g., o1) based on a respective group of input data (e.g., a group of input data received at the input node i1). The calculated output data may be similarly stored in the form of a data array ("output array" hereinafter); that is, each group of the output data may be stored as an element of the output array ("output element" hereinafter). Each output element may be identified by an array index ("output array index" hereinafter; e.g., o1 and o2).
- The data modifier 702 may be configured to further receive connection data that include the one or more aforementioned connection values. Each of the connection values may correspond to an input array index (e.g., i2) and an output array index (e.g., o1).
- Further, the data modifier 702 may be configured to modify the input data and the weight values based on the connection values. In some aspects, the data modifier 702 may be configured to operate in a work mode to delete one or more weight values or one or more groups of the input data ("pruning mode"). Additionally, the data modifier 702 may be configured to operate in another work mode to add one or more zero values to the predetermined weight value array or the input data ("compensation mode"). The selection between the pruning mode and the compensation mode may be predetermined as a system parameter or according to other algorithms prior to the receiving of the input data.
- In a specific example, the data modifier 702 may receive an input array including groups of input data (0.5, 0.6, 0.7, 1.2, 4, 0.1), an array of connection values (1, 0, 0, 1, 1, 1), and a predetermined weight value array including weight values (0.5, 0.8, 0.9, 0.4). Conventionally, when a processor performs multiplication or convolution operations on the six-element input array and the four-element weight array, the processor retrieves the index array from the memory 208 to determine which four elements of the input array should be multiplied or convolved by the four elements in the weight array. The retrieving of the index array, as previously discussed, likely causes bandwidth consumption.
- In this example, the data modifier 702 may be configured to operate in the pruning mode. That is, since the second and the third connection values are zeroes, the data modifier 702 may be configured to delete the corresponding groups of the input data, i.e., the second and the third groups of the input data (0.6 and 0.7). The modified input data may be stored as an array including elements (0.5, 1.2, 4, 0.1). The modified input data may then be transmitted to a direct memory access (DMA) module 704. Alternatively, the modified input data may be transmitted to and stored at the memory 208 for future processing.
- In another specific example where the data modifier 702 operates in the pruning mode, the data modifier 702 may receive groups of input data in an input array (0.5, 1.2, 4, 0.1), a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4), and the same array of connection values. Since the second and the third connection values are zeroes, the data modifier 702 may be configured to delete the corresponding weight values, i.e., the second and the third weight values, from the predetermined weight value array. The modified weight value array may be stored as an array including elements (0.5, 0.8, 0.9, 0.4). Similarly, the modified weight value array may be transmitted to the DMA module 704 or to the memory 208.
- In some other examples, the data modifier 702 may be configured to operate in the compensation mode. For example, the data modifier 702 may receive an input array including elements (0.5, 1.2, 4, 0.1), a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4), and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 702 may be configured to add two zero-valued elements as the second and the third elements of the input array, generating a modified input array including elements (0.5, 0, 0, 1.2, 4, 0.1). For the same reason stated above, a processor that performs multiplication or convolution operations on the modified input array and the predetermined weight value array is not required to retrieve the index array from the memory 208 and, thus, bandwidth consumption may be reduced.
- In another example where the data modifier 702 operates in the compensation mode, the data modifier 702 may receive an input array including elements (0.5, 0.6, 0.7, 1.2, 4, 0.1), a predetermined weight value array including elements (0.5, 0.8, 0.9, 0.4), and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 702 may be configured to add two zero-valued elements as the second and the third elements of the predetermined weight value array, generating a modified weight value array including elements (0.5, 0, 0, 0.8, 0.9, 0.4).
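- In code form, the four examples above reduce to the same two operations applied to either operand (a sketch with hypothetical names, not the circuit itself):

```python
def modify(array, connections, mode):
    """Data modifier 702 sketch: align a data array with the connection data."""
    if mode == "pruning":             # drop entries whose connection value is 0
        return [v for v, c in zip(array, connections) if c == 1]
    it = iter(array)                  # compensation: re-insert zeros
    return [next(it) if c == 1 else 0.0 for c in connections]

conn = [1, 0, 0, 1, 1, 1]
modify([0.5, 0.6, 0.7, 1.2, 4, 0.1], conn, "pruning")      # -> [0.5, 1.2, 4, 0.1]
modify([0.5, 0.8, 0.9, 0.4], conn, "compensation")         # -> [0.5, 0.0, 0.0, 0.8, 0.9, 0.4]
```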
- The modified input data and/or the modified weight values may be transmitted to and temporarily stored in an input data cache 712 and/or a weight cache 714. The input data cache 712 and the weight cache 714 may refer to one or more high-speed storage devices incorporated within the MNN acceleration processor 206 and configured to store the input data and the weight values respectively. The modified input data and/or the modified weight values may be further transmitted to a computing unit 710 for further processing.
- The MNN acceleration processor 206 may further include an instruction cache 706 and a controller unit 708. The instruction cache 706 may refer to one or more storage devices configured to store instructions received from the CPU 204. The controller unit 708 may be configured to read the instructions from the instruction cache 706 and decode the instructions.
FIG. 1 . - The computing unit 710 may further include one or more multipliers configured to multiply the modified input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
- The generated output data may be temporarily stored in an output data cache 716 and may be further transmitted to the
memory 208 via the DMA module 704. -
- FIG. 8 is a block diagram illustrating another example MNN acceleration processor 206 in which data modification for neural networks may be implemented. As depicted, components in the example MNN acceleration processor 206 may be the same or similar to the corresponding components shown in FIG. 7 or may be configured to perform the same or similar operations to those shown in FIG. 7 except that a data modifier 802 may be implemented between a DMA module 804, an input data cache 812, and a weight cache 814.
data modifier 802, similar to the data modifier 702, may be configured to modify the input data and the weight values based on the connection values. The modified input data and the modified weight values may be transmitted to aninput data cache 812 and aweight cache 814 and may be further transmitted to acomputing unit 810 for further processing. -
- FIG. 9 is a block diagram illustrating an example data modifier 702/802 by which data modification for neural networks may be implemented. As depicted, the data modifier 702/802 may include an input data modifier 902 and a weight modifier 904.
input data modifier 902 may be configured to modify the input data. When operates in the pruning mode, theinput data modifier 902 may be configured to delete groups of input data that correspond to the connection values that are zeroes. When operates in the compensation mode, theinput data modifier 902 may be configured to add one or more zeroes to be the elements corresponding to the connection values that are zeroes. - Similarly, the
weight modifier 904 may be configured to modify the weight values based on different operation mode. When operates in the pruning mode, theweight modifier 904 may be configured to delete weight values that correspond to the connection values that are zeroes. When operates in the compensation mode, theweight modifier 904 may be configured to add one or more zeroes to be the elements corresponding to the connection values that are zeroes. - In some aspects, the
input data modifier 902 and theweight modifier 904 may be implemented by one or more multiplexers and at least one storage device configured to store information indicating the current operation mode. - In a non-limiting example illustrated in
FIG. 9 , theinput data modifier 902 may include aninput data filter 906 and an input data multiplexer 908. The input data filter 906 may be configured to output an input element if a connection value corresponding to the input element is 1. Further, when the connection value is 0, the input data filter 906 may be configured to ignore the corresponding input element and move to process the next input element. The input data multiplexer 908 may be configured to output data from the input data filter 906 when in the pruning mode and to directly output the input data when in the compensation mode. As such, those input elements corresponding to the connection values of zero may be deleted when theinput data modifier 902 is configured to work in the pruning mode. - Further to the above non-limiting example, the
weight modifier 904 may include a first level weight multiplexer 910 and a second level weight multiplexer 912. The first level weight multiplexer 910 may be configured to output a zero value if a corresponding connection value is 0 and to output a weight value corresponding to the connection value if the connection value is 1. The second level weight multiplexer 912 may be configured to output data received from the first level weight multiplexer 910 when in the compensation mode. Further, the second level weight multiplexer 912 may be configured to directly output a corresponding weight value when in the pruning mode. As such, additional elements of zero values may be added to the weight value array when theweight modifier 904 is configured to work in the compensation mode. -
- FIG. 10 is a flow chart of aspects of an example method 1000 for modifying data for neural networks. The example method 1000 may be performed by one or more components of the MNN acceleration processor 206 as described in FIGS. 7 and 8 and the components of the data modifier 702/802 as described in FIG. 9.
- At block 1002, method 1000 may include the data modifier 702/802 receiving one or more groups of input data, wherein the one or more groups of input data are stored as input elements in an input array and each of the input elements is identified by an input array index.
method 1000 may include the data modifier 702/802 receiving a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on the one or more groups of input data, wherein the one or more groups of output data are to be stored as output elements in an output array and each of the output elements is identified by an output array index. - Further still,
method 1000 may include the data modifier 702/802 receiving connection data that include one or more connection values, wherein each of the connection values corresponds to one of the input array indexes and one of the output array indexes and indicates whether one of the weight values in the predetermined weight value array is designated for calculating a group of the output data to be stored as the output element identified by the corresponding output array index based on a group of the input data stored as the input element identified by the corresponding input array index, and whether the weight value meets a predetermined condition. - At
- At block 1004, method 1000 may include the data modifier 702/802 modifying the weight values and the input data based on the connection data. In some aspects, the modifying may further include sub-processes or sub-operations, including deleting one or more weight values that correspond to the connection values that are zero, adding one or more zero values to the predetermined weight value array based on the connection values, deleting one or more groups of the input data that are stored as the input elements identified by the input array indexes corresponding to the connection values that are zero, or adding one or more zero values to the input elements identified by the input array indexes corresponding to the connection values that are zero.
- In another specific example where the data modifier 702 operates in the pruning mode, the data modifier 702 may receive groups of input data in an input array (0.5, 1.2, 4, 0.1), a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4), and the same array of connection values. Since the second and the third connection values are zeroes, the data modifier 702 may be configured to delete the corresponding weight values from the predetermined weight value array. That is, the second and the third weight values in the predetermined weight value array. The modified weight value array may be stored as an array including elements (0.5, 0.8, 0.9, 0.4).
- At
- At block 1006, method 1000 may include the computing unit 710/810 calculating the one or more groups of output data based on the modified weight values and the modified input data. That is, the computing unit 710 may be configured to calculate one or more groups of output data based on the modified weight values and the modified input data. In some aspects, the computing unit 710 may further include one or more multipliers configured to multiply the modified input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
- FIG. 11 is a block diagram illustrating another example MNN acceleration processor 206 in which data modification for neural networks may be implemented. As depicted, components in the example MNN acceleration processor 206 may be the same or similar to the corresponding components shown in FIG. 7 or may be configured to perform the same or similar operations to those shown in FIG. 7 except that a data modifier 1102 may be implemented between a DMA module 1104 and an input data cache 1112. For example, the DMA module 1104 may be configured to transmit and receive data from and to the memory 208, an instruction cache 1106, the data modifier 1102, a weight cache 1114, and an output data cache 1116. The instruction cache 1106, the input data cache 1112, the weight cache 1114, and the output data cache 1116 may respectively refer to one or more high-speed storage devices incorporated within the MNN acceleration processor 206 and configured to respectively store instructions from the DMA module 1104, the modified input data from the data modifier 1102, weight values from the DMA module 1104, and the calculated output data from a computing unit 1110.
data modifier 1102 may be configured to receive one or more groups of input data for generating one or more groups of output data. The one or more groups of input data may be stored as input elements in an input array and each of the input elements is identified by an input array index. Thedata modifier 1102 may be further configured to receive connection data that include one or more connection values. In this example, unlike the data modifier 702/802, thedata modifier 1102 is not configured to receive the weight values as the weight values are directly transmitted from theDMA module 1104 to theweight cache 1114. - Upon receiving the input data and the connection data, the
data modifier 1102 may be configured to modify the received groups of input data based on the connection data. For example, thedata modifier 1102 may be configured to receive an input array including groups of input data as elements (0.5, 0.6, 0.7, 1.2, 4, 0.1) and an array of connection values (1, 0, 0, 1, 1, 1). When thedata modifier 1102 operates in the pruning mode, thedata modifier 1102 may be configured to delete the corresponding groups of the input data, i.e., the second and the third groups of the input data (0.6 and 0.7). The modified input data may be stored as an array including elements (0.5, 1.2, 4, 0.1). - In some other aspects, the
data modifier 1102 may operate in the compensation mode. For example, thedata modifier 1102 may receive an input array including elements (0.5, 1.2, 4, 0.1) and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, thedata modifier 1102 may be configured to add two elements of zero value to the input array to be the second and the third elements of the input array generating a modified input array including elements (0.5, 0, 0, 1.2, 4, 0.1). - In this example, the modified input data may be transmitted to and temporarily stored at the
input data cache 1112. The modified input data may be further transmitted, together with the weight values from theweight cache 1114 and the decoded instructions from thecontroller unit 1108, to thecomputing unit 1110. Thecomputing unit 1110 may be configured to calculate one or more groups of output data based on the weight values and the modified input data. In some aspects, the calculation of the output data may include the forward propagation process and the backward propagation process described in accordance withFIG. 1 . - Similar to the computing unit 710, the
computing unit 1110 may include one or more multipliers configured to multiply the modified input data by the weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data. The generated output data may be temporarily stored in theoutput data cache 1116 and may be further transmitted to thememory 208 via theDMA module 1104. -
- FIG. 12 is a block diagram illustrating another example data modifier 1102 by which data modification for neural networks may be implemented. As the data modifier 1102 may be configured to only modify the input data, the data modifier 1102 may only include an input data modifier 1202. The dash-lined block indicates an optional weight modifier 904.
input data modifier 902, theinput data modifier 1202 may be configured to modify the input data depending on the operation mode. When operates in the pruning mode, theinput data modifier 1202 may be configured to delete groups of input data that correspond to the connection values that are zeroes. When operates in the compensation mode, theinput data modifier 1202 may be configured to add one or more zeroes to be the elements corresponding to the connection values that are zeroes. - In some aspects, the
input data modifier 1202 may be implemented by one or more multiplexers and at least one storage device configured to store information indicating the current operation mode. - In a non-limiting example illustrated in
FIG. 12 , theinput data modifier 1202 may include aninput data filter 1206 and aninput data multiplexer 1208. The input data filter 1206 may be configured to output an input element if a connection value corresponding to the input element is 1. Further, when the connection value is 0, the input data filter 1206 may be configured to ignore the corresponding input element and move to process the next input element. Theinput data multiplexer 1208 may be configured to output data from the input data filter 1206 when in the pruning mode and to directly output the input data when in the compensation mode. As such, those input elements corresponding to the connection values of zero may be deleted when theinput data modifier 1202 is configured to work in the pruning mode. -
- FIG. 13 is a flow chart of aspects of another example method 1300 for modifying data for neural networks. The example method 1300 may be performed by one or more components of the MNN acceleration processor 206 as described in FIG. 11 and the components of the data modifier 1102 as described in FIG. 12.
- At block 1302, method 1300 may include the data modifier 1102 receiving one or more groups of input data for generating one or more groups of output data. As previously described, the one or more groups of input data may be stored as input elements in an input array and each of the input elements is identified by an input array index. Method 1300 may further include the data modifier 1102 receiving connection data that include one or more connection values.
- At block 1304, method 1300 may include the data modifier 1102 modifying the received groups of input data based on the connection data. In some aspects, the modifying may further include sub-processes or sub-operations, including deleting one or more groups of the input data that are stored as the input elements identified by the input array indexes corresponding to the connection values that are zero when the data modifier 1102 operates in the pruning mode. In some other aspects, the modifying may include adding one or more zero values to the input elements identified by the input array indexes corresponding to the connection values that are zero when the data modifier 1102 operates in the compensation mode.
data modifier 1102 may receive an input array including groups of input data as elements (0.5, 0.6, 0.7, 1.2, 4, 0.1) and an array of connection values (1, 0, 0, 1, 1, 1). When thedata modifier 1102 operates in the pruning mode, thedata modifier 1102 may be configured to delete the corresponding groups of the input data, i.e., the second and the third groups of the input data (0.6 and 0.7). The modified input data may be stored as an array including elements (0.5, 1.2, 4, 0.1). - In another example, the
data modifier 1102 may operate in the compensation mode. For example, thedata modifier 1102 may receive an input array including elements (0.5, 1.2, 4, 0.1) and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, thedata modifier 1102 may be configured to add two elements of zero value to the input array to be the second and the third elements of the input array generating a modified input array including elements (0.5, 0, 0, 1.2, 4, 0.1). - At the
- At block 1306, method 1300 may include the computing unit 1110 calculating the one or more groups of output data based on the weight values and the modified input data. In some aspects, the computing unit 1110 may include one or more multipliers configured to multiply the modified input data by the weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and to add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data.
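- A rough functional model of that multiply-accumulate-bias-activate chain is sketched below. The sigmoid activation is an assumption made for concreteness; the disclosure leaves the activation function unspecified, and the hardware performs the multiplications and additions in parallel rather than in a Python loop.

```python
import math

def compute_output_group(modified_input, weights, bias):
    """Illustrative model of the computing unit 1110."""
    weighted = [x * w for x, w in zip(modified_input, weights)]  # multipliers
    total = sum(weighted)                                        # adders
    biased = total + bias                                        # bias addition
    return 1.0 / (1.0 + math.exp(-biased))                       # assumed sigmoid activation
```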
- FIG. 14 is a block diagram illustrating another example MNN acceleration processor 206 in which data modification for neural networks may be implemented. As depicted, components in the example MNN acceleration processor 206 may be the same as or similar to the corresponding components shown in FIG. 7, or may be configured to perform the same or similar operations to those shown in FIG. 7, except that a data modifier 1402 may be implemented between a DMA module 1404 and a weight cache 1414. For example, the DMA module 1404 may be configured to transmit data to and receive data from the memory 208, an instruction cache 1406, the data modifier 1402, an input data cache 1412, and an output data cache 1416. The instruction cache 1406, the input data cache 1412, the weight cache 1414, and the output data cache 1416 may respectively refer to one or more high-speed storage devices incorporated within the MNN acceleration processor 206 and configured to respectively store instructions from the DMA module 1404, the input data from the DMA module 1404, the modified weight values from the data modifier 1402, and the calculated output data from a computing unit 1410. - In this example, the
data modifier 1402 may be configured to receive a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on one or more groups of input data. The one or more groups of input data may be stored as input elements in an input array, and each of the input elements is identified by an input array index. The one or more groups of output data are to be stored as output elements in an output array, and each of the output elements is identified by an output array index. The data modifier 1402 may be further configured to receive connection data that include one or more connection values. In this example, unlike the data modifier 702/802, the data modifier 1402 is not configured to receive the input data, as the input data may be transmitted directly from the DMA module 1404 to the input data cache 1412. - Upon receiving the weight values and the connection data, the
data modifier 1402 may be configured to modify the weight values based on the connection data. For example, the data modifier 1402 may receive a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4) and an array of connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1402 may be configured to delete the corresponding weight values, that is, the second and the third weight values, from the predetermined weight value array. The modified weight value array may be stored as an array including elements (0.5, 0.8, 0.9, 0.4). - In another example, the
data modifier 1402 may receive a predetermined weight value array including elements (0.5, 0.8, 0.9, 0.4) and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1402 may be configured to add two elements of zero value as the second and the third elements of the predetermined weight value array, generating a modified weight value array including elements (0.5, 0, 0, 0.8, 0.9, 0.4).
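- The same two modes applied to the weight array can be sketched in the illustrative style used earlier (an assumed Python model, not the disclosed circuit), reproducing both worked examples:

```python
def modify_weights(weights, connection_values, mode):
    """Illustrative model of the weight modification in the data modifier 1402."""
    if mode == "pruning":
        # Drop the weights stored at zero-valued connection positions.
        return [w for w, c in zip(weights, connection_values) if c == 1]
    if mode == "compensation":
        # Re-insert a zero weight wherever the connection value is 0.
        it = iter(weights)
        return [next(it) if c == 1 else 0 for c in connection_values]
    raise ValueError("mode must be 'pruning' or 'compensation'")

assert modify_weights([0.5, 0, 0, 0.8, 0.9, 0.4],
                      [1, 0, 0, 1, 1, 1], "pruning") == [0.5, 0.8, 0.9, 0.4]
assert modify_weights([0.5, 0.8, 0.9, 0.4],
                      [1, 0, 0, 1, 1, 1], "compensation") == [0.5, 0, 0, 0.8, 0.9, 0.4]
```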
- The modified weight values may be transmitted to and temporarily stored at the weight cache 1414. The modified weight values may be further transmitted, together with the input data from the input data cache 1412 and the decoded instructions from the controller unit 1408, to the computing unit 1410. The computing unit 1410 may be further configured to calculate one or more groups of output data based on the modified weight values and the input data. In some aspects, the calculation of the output data may include the forward propagation process and the backward propagation process described in accordance with FIG. 1. - Similar to the computing unit 710, the
computing unit 1410 may include one or more multipliers configured to multiply the input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and to add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data. The generated output data may be temporarily stored in the output data cache 1416 and may be further transmitted to the memory 208 via the DMA module 1404.
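- Taken together, the FIG. 14 dataflow can be modeled end to end in a few lines. This is a sketch under stated assumptions: the compensation mode is used so that the expanded weight array aligns with the unmodified input data, the activation is assumed to be a sigmoid, and the bias in the usage example is chosen arbitrarily.

```python
import math

def run_pipeline(input_data, sparse_weights, connection_values, bias):
    """Toy end-to-end model of the FIG. 14 dataflow."""
    # Data modifier 1402 (compensation mode): expand the sparse
    # weights with zeros so they align with the unmodified input.
    it = iter(sparse_weights)
    dense_w = [next(it) if c == 1 else 0 for c in connection_values]
    # Computing unit 1410: multiply, accumulate, bias, activate.
    biased = sum(x * w for x, w in zip(input_data, dense_w)) + bias
    return 1.0 / (1.0 + math.exp(-biased))

# Usage with the arrays from the examples above (bias is arbitrary):
out = run_pipeline([0.5, 0.6, 0.7, 1.2, 4, 0.1],
                   [0.5, 0.8, 0.9, 0.4],
                   [1, 0, 0, 1, 1, 1],
                   bias=0.1)
```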
- FIG. 15 is a block diagram illustrating another example data modifier by which data modification for neural networks may be implemented. As the data modifier 1402 may be configured to modify only the weight values, the data modifier 1402 may include only a weight modifier 1504. The dash-lined block indicates an optional input data modifier 902. - Similar to the
weight modifier 904, the weight modifier 1504 may be configured to modify the weight values depending on the operation mode. When operating in the pruning mode, the weight modifier 1504 may be configured to delete the weight values that correspond to connection values of zero. When operating in the compensation mode, the weight modifier 1504 may be configured to insert one or more zeroes as the elements corresponding to connection values of zero. - In some aspects, the
weight modifier 1504 may be implemented by one or more multiplexers and at least one storage device configured to store information indicating the current operation mode. - In a non-limiting example illustrated in
FIG. 15, the weight modifier 1504 may include a first level weight multiplexer 1506 and a second level weight multiplexer 1508. The first level weight multiplexer 1506 may be configured to output a zero value if a corresponding connection value is 0 and to output the weight value corresponding to the connection value if the connection value is 1. The second level weight multiplexer 1508 may be configured to output the data received from the first level weight multiplexer 1506 when in the compensation mode. Further, the second level weight multiplexer 1508 may be configured to directly output a corresponding weight value when in the pruning mode. As such, additional elements of zero value may be added to the weight value array when the weight modifier 1504 is configured to work in the compensation mode.
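- That two-level selection can be modeled as follows (an illustrative sketch only; in hardware each level is a multiplexer, and the function name is an assumption):

```python
def weight_modifier_1504(weights, connection_values, mode):
    """Illustrative model of the two-level multiplexer pair in FIG. 15."""
    if mode == "pruning":
        # Second level mux selects the stored weight values directly.
        return list(weights)
    # Compensation mode: the first level mux emits a zero where the
    # connection value is 0 and the next stored weight where it is 1;
    # the second level mux forwards the first level output.
    it = iter(weights)
    return [next(it) if c == 1 else 0 for c in connection_values]
```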
- FIG. 16 is a flow chart of aspects of another example method 1600 for modifying data for neural networks. The example method 1600 may be performed by one or more components of the MNN acceleration processor 206 as described in FIG. 14 and the components of the data modifier 1402 as described in FIG. 15. - At
block 1602, method 1600 may include the data modifier 1402 receiving a predetermined weight value array that includes one or more weight values for calculating one or more groups of output data based on one or more groups of input data. Method 1600 may further include the data modifier 1402 receiving connection data that include one or more connection values. - At
block 1604, method 1600 may include the data modifier 1402 modifying the weight values based on the connection data. In some aspects, the modifying may include deleting at least one weight value that corresponds to a connection value of zero when the data modifier 1402 operates in the pruning mode. In some other aspects, the modifying may include adding one or more zero values to the predetermined weight value array based on the connection values when the data modifier 1402 operates in the compensation mode. - In a specific example, the
data modifier 1402 may receive a predetermined weight value array including weight values (0.5, 0, 0, 0.8, 0.9, 0.4) and an array of connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1402 may be configured to delete the corresponding weight values, that is, the second and the third weight values, from the predetermined weight value array. The modified weight value array may be stored as an array including elements (0.5, 0.8, 0.9, 0.4). - In another example, the
data modifier 1402 may receive a predetermined weight value array including elements (0.5, 0.8, 0.9, 0.4) and the same connection data including connection values (1, 0, 0, 1, 1, 1). Since the second and the third connection values are zeroes, the data modifier 1402 may be configured to add two elements of zero value as the second and the third elements of the predetermined weight value array, generating a modified weight value array including elements (0.5, 0, 0, 0.8, 0.9, 0.4). - At
block 1606, method 1600 may include the computing unit 1410 calculating the one or more groups of output data based on the modified weight values and the input data. In some aspects, the computing unit 1410 may include one or more multipliers configured to multiply the input data by the modified weight values to generate one or more weighted input data, one or more adders configured to add the one or more weighted input data to generate a total weighted value and to add a bias value to the total weighted value to generate a biased value, and an activation processor configured to perform an activation function on the biased value to generate a group of output data. - It is understood that the specific order or hierarchy of steps in the processes disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged. Further, some steps may be combined or omitted. The accompanying method claims present elements of the various steps in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
- The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. All structural and functional equivalents to the elements of the various aspects described herein that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed as a means plus function unless the element is expressly recited using the phrase “means for.”
- Moreover, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise, or clear from the context, the phrase “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, the phrase “X employs A or B” is satisfied by any of the following instances: X employs A; X employs B; or X employs both A and B. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from the context to be directed to a singular form.
Claims (20)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610039162.5A CN105512723B (en) | 2016-01-20 | 2016-01-20 | A kind of artificial neural networks apparatus and method for partially connected |
CN201610039162.5 | 2016-01-20 | ||
PCT/CN2016/078545 WO2017124646A1 (en) | 2016-01-20 | 2016-04-06 | Artificial neural network calculating device and method for sparse connection |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/078545 Division WO2017124646A1 (en) | 2016-01-20 | 2016-04-06 | Artificial neural network calculating device and method for sparse connection |
Publications (1)
Publication Number | Publication Date |
---|---|
US20180260711A1 true US20180260711A1 (en) | 2018-09-13 |
Family
ID=55720686
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/975,075 Pending US20180260710A1 (en) | 2016-01-20 | 2018-05-09 | Calculating device and method for a sparsely connected artificial neural network |
US15/975,065 Abandoned US20180260709A1 (en) | 2016-01-20 | 2018-05-09 | Calculating device and method for a sparsely connected artificial neural network |
US15/975,083 Abandoned US20180260711A1 (en) | 2016-01-20 | 2018-05-09 | Calculating device and method for a sparsely connected artificial neural network |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/975,075 Pending US20180260710A1 (en) | 2016-01-20 | 2018-05-09 | Calculating device and method for a sparsely connected artificial neural network |
US15/975,065 Abandoned US20180260709A1 (en) | 2016-01-20 | 2018-05-09 | Calculating device and method for a sparsely connected artificial neural network |
Country Status (5)
Country | Link |
---|---|
US (3) | US20180260710A1 (en) |
EP (1) | EP3407266B1 (en) |
KR (3) | KR102163561B1 (en) |
CN (6) | CN105512723B (en) |
WO (1) | WO2017124646A1 (en) |
Families Citing this family (126)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109358900B (en) * | 2016-04-15 | 2020-07-03 | 中科寒武纪科技股份有限公司 | Artificial neural network forward operation device and method supporting discrete data representation |
CN108416436B (en) * | 2016-04-18 | 2021-06-01 | 中国科学院计算技术研究所 | Method and system for neural network partitioning using multi-core processing module |
CN111860811B (en) | 2016-04-27 | 2024-01-16 | 中科寒武纪科技股份有限公司 | Device and method for executing full-connection layer forward operation of artificial neural network |
WO2017185256A1 (en) * | 2016-04-27 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Rmsprop gradient descent algorithm execution apparatus and method |
CN109284825B (en) * | 2016-04-29 | 2020-04-14 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing LSTM operations |
CN107704267B (en) * | 2016-04-29 | 2020-05-08 | 中科寒武纪科技股份有限公司 | Convolution neural network operation instruction and method thereof |
CN111860814B (en) * | 2016-04-29 | 2024-01-16 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing batch normalization operations |
CN107341541B (en) * | 2016-04-29 | 2021-01-29 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing full connectivity layer neural network training |
WO2017185347A1 (en) * | 2016-04-29 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Apparatus and method for executing recurrent neural network and lstm computations |
CN111310904B (en) * | 2016-04-29 | 2024-03-08 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing convolutional neural network training |
US20170330069A1 (en) * | 2016-05-11 | 2017-11-16 | Kneron Inc. | Multi-layer artificial neural network and controlling method thereof |
JP6913388B2 (en) * | 2016-05-19 | 2021-08-04 | 国立大学法人東京工業大学 | Neural network circuit and neural network integrated circuit |
CN105893159B (en) * | 2016-06-21 | 2018-06-19 | 北京百度网讯科技有限公司 | Data processing method and device |
CN107678781B (en) * | 2016-08-01 | 2021-02-26 | 北京百度网讯科技有限公司 | Processor and method for executing instructions on processor |
CN111310893B (en) | 2016-08-05 | 2023-11-21 | 中科寒武纪科技股份有限公司 | Device and method for executing neural network operation |
CN107808664B (en) * | 2016-08-30 | 2021-07-30 | 富士通株式会社 | Sparse neural network-based voice recognition method, voice recognition device and electronic equipment |
CN106485317A (en) * | 2016-09-26 | 2017-03-08 | 上海新储集成电路有限公司 | A kind of neutral net accelerator and the implementation method of neural network model |
WO2018058427A1 (en) * | 2016-09-29 | 2018-04-05 | 北京中科寒武纪科技有限公司 | Neural network computation apparatus and method |
CN110298443B (en) * | 2016-09-29 | 2021-09-17 | 中科寒武纪科技股份有限公司 | Neural network operation device and method |
CN106529670B (en) * | 2016-10-27 | 2019-01-25 | 中国科学院计算技术研究所 | It is a kind of based on weight compression neural network processor, design method, chip |
CN108022281B (en) * | 2016-10-31 | 2021-02-23 | 龙芯中科技术股份有限公司 | Method and device for rendering three-dimensional graph |
CN108073550A (en) * | 2016-11-14 | 2018-05-25 | 耐能股份有限公司 | Buffer unit and convolution algorithm apparatus and method |
CN111860826B (en) * | 2016-11-17 | 2024-08-13 | 北京图森智途科技有限公司 | Neural network pruning method and device |
WO2018103736A1 (en) | 2016-12-09 | 2018-06-14 | Beijing Horizon Information Technology Co., Ltd. | Systems and methods for data management |
WO2018108126A1 (en) * | 2016-12-14 | 2018-06-21 | 上海寒武纪信息科技有限公司 | Neural network convolution operation device and method |
CN108205706B (en) * | 2016-12-19 | 2021-04-23 | 上海寒武纪信息科技有限公司 | Artificial neural network reverse training device and method |
WO2018112699A1 (en) * | 2016-12-19 | 2018-06-28 | 上海寒武纪信息科技有限公司 | Artificial neural network reverse training device and method |
CN108205700B (en) * | 2016-12-20 | 2021-07-30 | 上海寒武纪信息科技有限公司 | Neural network operation device and method |
EP3340129B1 (en) * | 2016-12-21 | 2019-01-30 | Axis AB | Artificial neural network class-based pruning |
WO2018113790A1 (en) * | 2016-12-23 | 2018-06-28 | 北京中科寒武纪科技有限公司 | Operation apparatus and method for artificial neural network |
CN108154228B (en) | 2016-12-28 | 2022-04-12 | 上海寒武纪信息科技有限公司 | Artificial neural network computing device and method |
CN113537480B (en) * | 2016-12-30 | 2024-04-02 | 上海寒武纪信息科技有限公司 | Apparatus and method for performing LSTM neural network operation |
WO2018120016A1 (en) * | 2016-12-30 | 2018-07-05 | 上海寒武纪信息科技有限公司 | Apparatus for executing lstm neural network operation, and operational method |
TWI630544B (en) * | 2017-02-10 | 2018-07-21 | 耐能股份有限公司 | Operation device and method for convolutional neural network |
CN107633297B (en) * | 2017-03-10 | 2021-04-06 | 南京风兴科技有限公司 | Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm |
CN106951962B (en) * | 2017-03-22 | 2020-09-01 | 南京地平线机器人技术有限公司 | Complex arithmetic unit, method and electronic device for neural network |
CN110462637B (en) * | 2017-03-24 | 2022-07-19 | 华为技术有限公司 | Neural network data processing device and method |
CN108734280A (en) * | 2017-04-21 | 2018-11-02 | 上海寒武纪信息科技有限公司 | A kind of arithmetic unit and method |
US11551067B2 (en) | 2017-04-06 | 2023-01-10 | Shanghai Cambricon Information Technology Co., Ltd | Neural network processor and neural network computation method |
EP3633526A1 (en) * | 2017-04-06 | 2020-04-08 | Shanghai Cambricon Information Technology Co., Ltd | Computation device and method |
US10346944B2 (en) * | 2017-04-09 | 2019-07-09 | Intel Corporation | Machine learning sparse computation mechanism |
WO2018192500A1 (en) | 2017-04-19 | 2018-10-25 | 上海寒武纪信息科技有限公司 | Processing apparatus and processing method |
CN108734288B (en) * | 2017-04-21 | 2021-01-29 | 上海寒武纪信息科技有限公司 | Operation method and device |
CN108734279B (en) * | 2017-04-20 | 2021-04-23 | 上海寒武纪信息科技有限公司 | Arithmetic device and method |
CN109284823B (en) * | 2017-04-20 | 2020-08-04 | 上海寒武纪信息科技有限公司 | Arithmetic device and related product |
CN117933327A (en) | 2017-04-21 | 2024-04-26 | 上海寒武纪信息科技有限公司 | Processing device, processing method, chip and electronic device |
CN109478251B (en) * | 2017-05-23 | 2021-01-05 | 安徽寒武纪信息科技有限公司 | Processing method and acceleration device |
CN108960420B (en) * | 2017-05-23 | 2021-06-08 | 上海寒武纪信息科技有限公司 | Processing method and acceleration device |
CN109146069B (en) * | 2017-06-16 | 2020-10-13 | 上海寒武纪信息科技有限公司 | Arithmetic device, arithmetic method, and chip |
CN109389210B (en) * | 2017-08-09 | 2021-06-18 | 上海寒武纪信息科技有限公司 | Processing method and processing apparatus |
WO2018214913A1 (en) | 2017-05-23 | 2018-11-29 | 上海寒武纪信息科技有限公司 | Processing method and accelerating device |
CN110175673B (en) * | 2017-05-23 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Processing method and acceleration device |
CN109117455A (en) | 2017-06-26 | 2019-01-01 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN109102073A (en) | 2017-06-21 | 2018-12-28 | 上海寒武纪信息科技有限公司 | A kind of sparse training method |
CN109102074B (en) * | 2017-06-21 | 2021-06-01 | 上海寒武纪信息科技有限公司 | Training device |
EP3637327B1 (en) | 2017-06-13 | 2023-09-13 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
CN109426553A (en) | 2017-08-21 | 2019-03-05 | 上海寒武纪信息科技有限公司 | Task cutting device and method, Task Processing Unit and method, multi-core processor |
CN109214616B (en) * | 2017-06-29 | 2023-04-07 | 上海寒武纪信息科技有限公司 | Information processing device, system and method |
CN110413551B (en) | 2018-04-28 | 2021-12-10 | 上海寒武纪信息科技有限公司 | Information processing apparatus, method and device |
WO2019001418A1 (en) * | 2017-06-26 | 2019-01-03 | 上海寒武纪信息科技有限公司 | Data sharing system and data sharing method therefor |
CN109583577B (en) * | 2017-09-29 | 2021-04-23 | 上海寒武纪信息科技有限公司 | Arithmetic device and method |
CN107729990B (en) * | 2017-07-20 | 2021-06-08 | 上海寒武纪信息科技有限公司 | Apparatus and method for performing forward operations in support of discrete data representations |
EP3651031A1 (en) * | 2017-08-31 | 2020-05-13 | Cambricon Technologies Corporation Limited | Chip device and related products |
CN109615061B (en) * | 2017-08-31 | 2022-08-26 | 中科寒武纪科技股份有限公司 | Convolution operation method and device |
CN107590535A (en) * | 2017-09-08 | 2018-01-16 | 西安电子科技大学 | Programmable neural network processor |
US10366322B2 (en) * | 2017-10-06 | 2019-07-30 | DeepCube LTD. | System and method for compact and efficient sparse neural networks |
CN107766643B (en) * | 2017-10-16 | 2021-08-03 | 华为技术有限公司 | Data processing method and related device |
EP3660628B1 (en) * | 2017-10-20 | 2023-12-06 | Shanghai Cambricon Information Technology Co., Ltd | Dynamic voltage frequency scaling device and method |
CN109697507B (en) * | 2017-10-24 | 2020-12-25 | 安徽寒武纪信息科技有限公司 | Processing method and device |
CN109697135B (en) * | 2017-10-20 | 2021-03-26 | 上海寒武纪信息科技有限公司 | Storage device and method, data processing device and method, and electronic device |
CN108958801B (en) * | 2017-10-30 | 2021-06-25 | 上海寒武纪信息科技有限公司 | Neural network processor and method for executing vector maximum value instruction by using same |
GB2560600B (en) * | 2017-11-06 | 2020-03-04 | Imagination Tech Ltd | Nueral Network Hardware |
CN110059811B (en) | 2017-11-06 | 2024-08-02 | 畅想科技有限公司 | Weight buffer |
CN110097180B (en) * | 2018-01-29 | 2020-02-21 | 上海寒武纪信息科技有限公司 | Computer device, data processing method, and storage medium |
CN111738431B (en) * | 2017-12-11 | 2024-03-05 | 中科寒武纪科技股份有限公司 | Neural network computing device and method |
CN109993290B (en) | 2017-12-30 | 2021-08-06 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
WO2019129302A1 (en) | 2017-12-30 | 2019-07-04 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
CN109993289B (en) * | 2017-12-30 | 2021-09-21 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
CN109993292B (en) * | 2017-12-30 | 2020-08-04 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
CN109993291B (en) * | 2017-12-30 | 2020-07-07 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
CN108021537B (en) * | 2018-01-05 | 2022-09-16 | 南京大学 | Softmax function calculation method based on hardware platform |
EP3743813A4 (en) * | 2018-01-24 | 2021-11-10 | Qomplx, Inc. | Platform for hierarchy cooperative computing |
CN110163361B (en) * | 2018-02-13 | 2021-06-25 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110197263B (en) * | 2018-02-27 | 2020-10-09 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN111767996B (en) * | 2018-02-27 | 2024-03-05 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related products |
CN108171328B (en) * | 2018-03-02 | 2020-12-29 | 中国科学院计算技术研究所 | Neural network processor and convolution operation method executed by same |
CN110414663B (en) * | 2018-04-28 | 2022-03-25 | 深圳云天励飞技术有限公司 | Convolution implementation method of neural network and related product |
CN108764468A (en) * | 2018-05-03 | 2018-11-06 | 中国科学院计算技术研究所 | Artificial neural network processor for intelligent recognition |
CN108647777A (en) * | 2018-05-08 | 2018-10-12 | 济南浪潮高新科技投资发展有限公司 | A kind of data mapped system and method for realizing that parallel-convolution calculates |
US11423312B2 (en) * | 2018-05-14 | 2022-08-23 | Samsung Electronics Co., Ltd | Method and apparatus for universal pruning and compression of deep convolutional neural networks under joint sparsity constraints |
CN108647155B (en) * | 2018-05-14 | 2020-08-21 | 瑞芯微电子股份有限公司 | Deep learning-based multi-level cache sharing method and device |
CN108566537A (en) * | 2018-05-16 | 2018-09-21 | 中国科学院计算技术研究所 | Image processing apparatus for carrying out neural network computing to video frame |
CN108647660A (en) * | 2018-05-16 | 2018-10-12 | 中国科学院计算技术研究所 | A method of handling image using neural network chip |
CN108921012B (en) * | 2018-05-16 | 2022-05-03 | 中国科学院计算技术研究所 | Method for processing image video frame by using artificial intelligence chip |
CN108764465B (en) * | 2018-05-18 | 2021-09-24 | 中国科学院计算技术研究所 | Processing device for neural network operation |
CN108764470B (en) * | 2018-05-18 | 2021-08-31 | 中国科学院计算技术研究所 | Processing method for artificial neural network operation |
CN108647781B (en) * | 2018-05-18 | 2021-08-27 | 中国科学院计算技术研究所 | Artificial intelligence chip processing apparatus |
CN108846142A (en) * | 2018-07-12 | 2018-11-20 | 南方电网调峰调频发电有限公司 | A kind of Text Clustering Method, device, equipment and readable storage medium storing program for executing |
CN110738316B (en) * | 2018-07-20 | 2024-05-14 | 北京三星通信技术研究有限公司 | Operation method and device based on neural network and electronic equipment |
CN110865950B (en) * | 2018-08-28 | 2021-01-12 | 中科寒武纪科技股份有限公司 | Data preprocessing method and device, computer equipment and storage medium |
US11966583B2 (en) | 2018-08-28 | 2024-04-23 | Cambricon Technologies Corporation Limited | Data pre-processing method and device, and related computer device and storage medium |
CN110874550A (en) * | 2018-08-31 | 2020-03-10 | 华为技术有限公司 | Data processing method, device, equipment and system |
US12094456B2 (en) | 2018-09-13 | 2024-09-17 | Shanghai Cambricon Information Technology Co., Ltd. | Information processing method and system |
CN111078624B (en) * | 2018-10-18 | 2022-03-25 | 上海寒武纪信息科技有限公司 | Network-on-chip processing system and network-on-chip data processing method |
CN111078623B (en) * | 2018-10-18 | 2022-03-29 | 上海寒武纪信息科技有限公司 | Network-on-chip processing system and network-on-chip data processing method |
CN111078625B (en) * | 2018-10-18 | 2022-03-29 | 上海寒武纪信息科技有限公司 | Network-on-chip processing system and network-on-chip data processing method |
CN111198670B (en) | 2018-11-20 | 2021-01-29 | 华为技术有限公司 | Method, circuit and SOC for executing matrix multiplication operation |
CN111290698B (en) * | 2018-12-07 | 2022-05-03 | 上海寒武纪信息科技有限公司 | Data access method, data processing method, data access circuit and arithmetic device |
CN109657788A (en) * | 2018-12-18 | 2019-04-19 | 北京中科寒武纪科技有限公司 | Data processing method, device and Related product |
KR102663561B1 (en) * | 2018-12-28 | 2024-05-08 | 엘지전자 주식회사 | Refrigerator |
KR102579883B1 (en) | 2018-12-28 | 2023-09-18 | 엘지전자 주식회사 | Refrigerator |
CN111382864B (en) * | 2018-12-29 | 2024-08-20 | 中科寒武纪科技股份有限公司 | Neural network training method and device |
CN109740739B (en) * | 2018-12-29 | 2020-04-24 | 中科寒武纪科技股份有限公司 | Neural network computing device, neural network computing method and related products |
KR102360452B1 (en) * | 2019-06-18 | 2022-02-11 | 주식회사 퓨리오사에이아이 | Method and apparatus for processing convolutional operation of neural network processor |
US11630992B2 (en) | 2019-07-05 | 2023-04-18 | Electronics And Telecommunications Research Institute | Neural network generation method for neuromorphic computing and apparatus for the same |
CN110929627B (en) * | 2019-11-18 | 2021-12-28 | 北京大学 | Image recognition method of efficient GPU training model based on wide-model sparse data set |
CN111312270B (en) | 2020-02-10 | 2022-11-22 | 腾讯科技(深圳)有限公司 | Voice enhancement method and device, electronic equipment and computer readable storage medium |
US11556384B2 (en) | 2020-03-13 | 2023-01-17 | Cisco Technology, Inc. | Dynamic allocation and re-allocation of learning model computing resources |
CN111292322B (en) * | 2020-03-19 | 2024-03-01 | 中国科学院深圳先进技术研究院 | Medical image processing method, device, equipment and storage medium |
CN111966405B (en) * | 2020-07-03 | 2022-07-26 | 北京航空航天大学杭州创新研究院 | Polar code high-speed parallel decoding method based on GPU |
CN112288085B (en) * | 2020-10-23 | 2024-04-09 | 中国科学院计算技术研究所 | Image detection method and system based on convolutional neural network |
CN112994840B (en) * | 2021-02-03 | 2021-11-02 | 白盒子(上海)微电子科技有限公司 | Decoder based on neural network |
CN114897800A (en) * | 2022-04-22 | 2022-08-12 | 南京航空航天大学 | Ultrahigh-speed X-ray image identification method and device based on SiC neural network chip |
CN116863490B (en) * | 2023-09-04 | 2023-12-12 | 之江实验室 | Digital identification method and hardware accelerator for FeFET memory array |
CN117689025B (en) * | 2023-12-07 | 2024-06-14 | 上海交通大学 | Quick large model reasoning service method and system suitable for consumer display card |
CN118333099B (en) * | 2024-06-12 | 2024-09-06 | 上海岩芯数智人工智能科技有限公司 | Multi-mode shared neural network model construction method and device |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR930009066B1 (en) * | 1990-08-18 | 1993-09-22 | 정호선 | Multilayer neural network and method of its circuit design |
US5517596A (en) * | 1991-05-17 | 1996-05-14 | International Business Machines Corporation | Learning machine synapse processor system apparatus |
US5293456A (en) * | 1991-06-28 | 1994-03-08 | E. I. Du Pont De Nemours And Company | Object recognition system employing a sparse comparison neural network |
US5926566A (en) * | 1996-11-15 | 1999-07-20 | Synaptics, Inc. | Incremental ideographic character input method |
AU2001283397A1 (en) * | 2000-08-16 | 2002-02-25 | Research Foundation Of State University Of New York | Neural network device for evolving appropriate connections |
US20020143720A1 (en) * | 2001-04-03 | 2002-10-03 | Anderson Robert Lee | Data structure for improved software implementation of a neural network |
JP4513865B2 (en) * | 2008-01-25 | 2010-07-28 | セイコーエプソン株式会社 | Parallel computing device and parallel computing method |
CN101527010B (en) * | 2008-03-06 | 2011-12-07 | 上海理工大学 | Hardware realization method and system for artificial neural network algorithm |
US9015093B1 (en) * | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
CN102201188A (en) * | 2011-05-25 | 2011-09-28 | 华侨大学 | Building television advertisement system oriented intelligent control device and method |
WO2013111200A1 (en) * | 2012-01-23 | 2013-08-01 | パナソニック株式会社 | Neural network circuit learning method |
US8903746B2 (en) * | 2012-03-22 | 2014-12-02 | Audrey Kudritskiy | System and method for viewing, modifying, storing, and running artificial neural network components |
KR20150016089A (en) * | 2013-08-02 | 2015-02-11 | 안병익 | Neural network computing apparatus and system, and method thereof |
US9679258B2 (en) * | 2013-10-08 | 2017-06-13 | Google Inc. | Methods and apparatus for reinforcement learning |
JP6236296B2 (en) * | 2013-11-14 | 2017-11-22 | 株式会社デンソーアイティーラボラトリ | Learning device, learning program, and learning method |
US10339447B2 (en) * | 2014-01-23 | 2019-07-02 | Qualcomm Incorporated | Configuring sparse neuronal networks |
US20150269479A1 (en) * | 2014-03-24 | 2015-09-24 | Qualcomm Incorporated | Conversion of neuron types to hardware |
CN104077595B (en) * | 2014-06-15 | 2017-06-20 | 北京工业大学 | Deep learning network image recognition methods based on Bayesian regularization |
CN104200224A (en) * | 2014-08-28 | 2014-12-10 | 西北工业大学 | Valueless image removing method based on deep convolutional neural networks |
US10169073B2 (en) * | 2015-12-20 | 2019-01-01 | Intel Corporation | Hardware accelerators and methods for stateful compression and decompression operations |
US20190138922A1 (en) * | 2016-04-15 | 2019-05-09 | Cambricon Technologies Corporation Limited | Apparatus and methods for forward propagation in neural networks supporting discrete data |
US10949736B2 (en) * | 2016-11-03 | 2021-03-16 | Intel Corporation | Flexible neural network accelerator and methods therefor |
CN107688850B (en) * | 2017-08-08 | 2021-04-13 | 赛灵思公司 | Deep neural network compression method |
US10719932B2 (en) * | 2018-03-01 | 2020-07-21 | Carl Zeiss Meditec, Inc. | Identifying suspicious areas in ophthalmic data |
2016
- 2016-01-20 CN CN201610039162.5A patent/CN105512723B/en active Active
- 2016-01-20 CN CN201710794715.2A patent/CN107563497B/en active Active
- 2016-01-20 CN CN201710794580.XA patent/CN107545303B/en active Active
- 2016-01-20 CN CN201710794711.4A patent/CN107609642B/en active Active
- 2016-01-20 CN CN201710794713.3A patent/CN107578099B/en active Active
- 2016-01-20 CN CN201710794712.9A patent/CN107506828B/en active Active
- 2016-04-06 KR KR1020187018866A patent/KR102163561B1/en active IP Right Grant
- 2016-04-06 EP EP16885910.6A patent/EP3407266B1/en active Active
- 2016-04-06 KR KR1020187018864A patent/KR102166775B1/en active IP Right Grant
- 2016-04-06 WO PCT/CN2016/078545 patent/WO2017124646A1/en unknown
- 2016-04-06 KR KR1020187015437A patent/KR102142889B1/en active IP Right Grant
2018
- 2018-05-09 US US15/975,075 patent/US20180260710A1/en active Pending
- 2018-05-09 US US15/975,065 patent/US20180260709A1/en not_active Abandoned
- 2018-05-09 US US15/975,083 patent/US20180260711A1/en not_active Abandoned
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11238337B2 (en) * | 2016-08-22 | 2022-02-01 | Applied Brain Research Inc. | Methods and systems for implementing dynamic neural networks |
US11010315B2 (en) * | 2017-04-17 | 2021-05-18 | Microsoft Technology Licensing, Llc | Flexible hardware for high throughput vector dequantization with dynamic vector length and codebook size |
US11645529B2 (en) * | 2018-05-01 | 2023-05-09 | Hewlett Packard Enterprise Development Lp | Sparsifying neural network models |
US20210041934A1 (en) * | 2018-09-27 | 2021-02-11 | Intel Corporation | Power savings for neural network architecture with zero activations during inference |
KR20200049366A (en) * | 2018-10-31 | 2020-05-08 | 삼성전자주식회사 | Neural network processor and convolution operation method thereof |
TWI834729B (en) * | 2018-10-31 | 2024-03-11 | 南韓商三星電子股份有限公司 | Neural network processor and convolution operation method thereof |
US11244028B2 (en) * | 2018-10-31 | 2022-02-08 | Samsung Electronics Co., Ltd. | Neural network processor and convolution operation method thereof |
KR102637733B1 (en) | 2018-10-31 | 2024-02-19 | 삼성전자주식회사 | Neural network processor and convolution operation method thereof |
US11165952B2 (en) * | 2018-11-29 | 2021-11-02 | Canon Kabushiki Kaisha | Information processing apparatus, image capturing apparatus, method for controlling information processing apparatus, and non-transitory storage medium |
US11537323B2 (en) | 2020-01-07 | 2022-12-27 | SK Hynix Inc. | Processing-in-memory (PIM) device |
US11422803B2 (en) * | 2020-01-07 | 2022-08-23 | SK Hynix Inc. | Processing-in-memory (PIM) device |
US11842193B2 (en) | 2020-01-07 | 2023-12-12 | SK Hynix Inc. | Processing-in-memory (PIM) device |
WO2021158085A1 (en) * | 2020-02-05 | 2021-08-12 | Samsung Electronics Co., Ltd. | Neural network update method, classification method and electronic device |
US11294677B2 (en) | 2020-02-20 | 2022-04-05 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof |
Also Published As
Publication number | Publication date |
---|---|
EP3407266B1 (en) | 2023-08-09 |
CN105512723B (en) | 2018-02-16 |
CN105512723A (en) | 2016-04-20 |
CN107506828B (en) | 2020-11-03 |
CN107609642B (en) | 2021-08-31 |
EP3407266A4 (en) | 2019-09-18 |
WO2017124646A1 (en) | 2017-07-27 |
KR102166775B1 (en) | 2020-10-16 |
CN107506828A (en) | 2017-12-22 |
CN107578099A (en) | 2018-01-12 |
CN107545303A (en) | 2018-01-05 |
KR20180093969A (en) | 2018-08-22 |
KR102163561B1 (en) | 2020-10-08 |
US20180260710A1 (en) | 2018-09-13 |
CN107563497A (en) | 2018-01-09 |
CN107563497B (en) | 2021-03-19 |
CN107609642A (en) | 2018-01-19 |
KR102142889B1 (en) | 2020-08-10 |
EP3407266A1 (en) | 2018-11-28 |
KR20180101334A (en) | 2018-09-12 |
KR20180093970A (en) | 2018-08-22 |
US20180260709A1 (en) | 2018-09-13 |
CN107545303B (en) | 2021-09-07 |
CN107578099B (en) | 2021-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180260711A1 (en) | Calculating device and method for a sparsely connected artificial neural network | |
US11308398B2 (en) | Computation method | |
US11775832B2 (en) | Device and method for artificial neural network operation | |
US10402725B2 (en) | Apparatus and method for compression coding for artificial neural network | |
US11734006B2 (en) | Deep vision processor | |
Li et al. | A high performance FPGA-based accelerator for large-scale convolutional neural networks | |
US11593658B2 (en) | Processing method and device | |
US10691996B2 (en) | Hardware accelerator for compressed LSTM | |
CN108701250B (en) | Data fixed-point method and device | |
KR102175044B1 (en) | Apparatus and method for running artificial neural network reverse training | |
US8131659B2 (en) | Field-programmable gate array based accelerator system | |
CN108090565A (en) | Accelerated method is trained in a kind of convolutional neural networks parallelization | |
US11429855B2 (en) | Acceleration of neural networks using depth-first processing | |
CN110321997B (en) | High-parallelism computing platform, system and computing implementation method | |
US20190138922A1 (en) | Apparatus and methods for forward propagation in neural networks supporting discrete data | |
US20210286860A1 (en) | Method and device for matrix multiplication optimization using vector registers | |
CN115023685A (en) | Accelerator for dense and sparse matrix computations | |
Mao et al. | Energy-efficient machine learning accelerator for binary neural networks | |
CN111582444A (en) | Matrix data processing device, electronic equipment and storage medium | |
CN113761934B (en) | Word vector representation method based on self-attention mechanism and self-attention model | |
Gonçalves et al. | Exploring data size to run convolutional neural networks in low density fpgas | |
WO2021013117A1 (en) | Systems and methods for providing block-wise sparsity in a neural network | |
US20190073584A1 (en) | Apparatus and methods for forward propagation in neural networks supporting discrete data | |
US20190080241A1 (en) | Apparatus and methods for backward propagation in neural networks supporting discrete data | |
TWI842584B (en) | Computer implemented method and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: CAMBRICON TECHNOLOGIES CORPORATION LIMITED, CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZHANG, SHIJIN;GUO, QI;CHEN, YUNJI;AND OTHERS;SIGNING DATES FROM 20180420 TO 20180424;REEL/FRAME:045764/0541 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |