EP3449424A1 - Device and method for distributing convolution data of a convolutional neural network - Google Patents

Device and method for distributing convolution data of a convolutional neural network

Info

Publication number
EP3449424A1
Authority
EP
European Patent Office
Prior art keywords
input
convolution
permutation
network
bus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP17720462.5A
Other languages
English (en)
French (fr)
Inventor
Olivier Bichler
Antoine Dupret
Vincent LORRAIN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Commissariat à l'Energie Atomique et aux Energies Alternatives CEA
Original Assignee
Commissariat à l'Energie Atomique CEA
Commissariat à l'Energie Atomique et aux Energies Alternatives CEA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Commissariat à l'Energie Atomique CEA and Commissariat à l'Energie Atomique et aux Energies Alternatives CEA
Publication of EP3449424A1
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present invention generally relates to convolutional neural networks, and in particular to a device and a method for distributing the coefficients of at least one convolution kernel to computing units in a computer based on a convolutional neural network architecture.
  • Artificial neural networks are computational models that mimic the functioning of biological neural networks. Artificial neural networks consist mainly of neurons interconnected by synapses that can be implemented by digital memories or by resistive components whose conductance varies as a function of the voltage applied to their terminals.
  • Convolutional neural networks correspond to a particular model of artificial neural network.
  • convolutional neural networks were initially described in K. Fukushima's article "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position", Biological Cybernetics, 36(4):193-202, 1980, ISSN 0340-1200, doi:10.1007/BF00344251.
  • convolutional neural networks (also designated by the terms "deep (convolutional) neural networks" or "ConvNets") are feedforward neural networks, without feedback, inspired by biological visual systems.
  • CNN: convolutional neural network.
  • the hardware implementations of existing convolutional neural networks are based on the use of a computer to calculate the convolution layers of the convolutional neural network, the computer comprising at least one processing unit (such as a processor) and digital memories storing the data.
  • the processing units must be able to access the data set in parallel.
  • this access to the data poses routing and concurrency problems, due to the parallel reading.
  • permutation networks are data distribution and parallel communication structures.
  • among them are the permutation networks called MINs ("Multistage Interconnection Networks"), such as the "Butterfly Network", the "Omega Network", the "Baseline Network" and the "Cube Network".
  • Such networks are used to connect N inputs to N outputs with several permutation stages.
  • their complexity in number of permutators is (N/2)·log2(N), which makes them very compact.
  • the "Butterfly", "Baseline" and "Cube" networks have a number of inputs/outputs that is a power of two, while the Omega network has a number of inputs/outputs that is a multiple of two. These networks are used in different architectures such as parallel computing architectures, switches, and so on. MIN networks have been implemented in several parallel computers. In existing solutions, a MIN-type network can be instantiated with two types of permutators, either two-state or four-state. Two-state permutators are controlled with one bit: in a so-called "uncrossed" state of the permutator, the inputs are not permuted; in a so-called "crossed" state, the inputs are permuted. Four-state permutators also have the two states "uncrossed" and "crossed", but include two additional states called the "grouping" state and the "ungrouping" state.
  • the first nodes are 4-to-4 permutation structures,
  • the higher-depth nodes are 8-to-8 networks, and so on.
  • MIN networks are compact and powerful. However, the number of inputs is not free, which implies reduced flexibility and/or the use of a larger memory than necessary. Furthermore, this type of network can perform only a limited number of circular shifts using two-state permutators. MIN networks therefore cannot be used in computers based on a convolutional neural network architecture to allow parallel access to the data.
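  • As an illustration of the compactness figure above, a minimal sketch (Python, not part of the patent) of the permutator count (N/2)·log2(N) of a MIN such as a butterfly network, assuming N is a power of two:

```python
import math

def min_permutator_count(n: int) -> int:
    """Two-state permutators in a multistage interconnection network
    (butterfly, omega, baseline, cube): (N/2) * log2(N)."""
    assert n > 0 and n & (n - 1) == 0, "MIN inputs must be a power of two"
    return (n // 2) * int(math.log2(n))

print(min_permutator_count(8))   # 12 permutators for an 8x8 butterfly
print(min_permutator_count(16))  # 32
```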
  • the invention improves the situation by providing a device for distributing the convolution coefficients of at least one convolution kernel of a convolutional neural network, carried by an input bus, to a set of processing units in a computer.
  • the device comprises at least one permutation network controlled by at least one control unit, the permutation network comprising a set of permutators arranged to perform circular shifts of at least part of the input bus.
  • each control unit is configured to dynamically drive at least one of the permutators of the permutation networks in response to an input event applied to the convolution kernel, and as a function of at least one parameter representing the maximum size of the convolution kernels.
  • the invention further provides a neuromorphic calculator comprising a memory for storing the convolution kernel coefficients and a set of processing units for calculating the response of a neural network to an input event, the calculator comprising a routing device according to one of the preceding features for distributing the coefficients to the processing units.
  • embodiments of the invention thus allow a set of processing units of a computer based on a convolutional neural network to access in parallel the data stored in memory.
  • FIG. 1 represents an example of a convolutional network
  • FIG. 2 is a diagram showing a convolution layer consisting of several output maps
  • FIG. 3 illustrates the principle of operation of a convolution layer in a convolutional neural network
  • FIG. 4 shows an example of a computer based on a convolutional neural network, in which the data routing device can be implemented, according to some embodiments
  • FIG. 5 illustrates the distribution of the weight coefficients as a function of an input event
  • FIG. 6 shows the structure of a computing module based on a convolutional neural network in which the data routing device can be implemented, according to some embodiments
  • FIG. 7 shows the inputs / outputs of the data routing device, according to some embodiments.
  • FIG. 8 shows the structure of the convolution data routing device, according to some embodiments.
  • FIG. 9 illustrates the dynamically configurable routing device structure from a shift tree topology, according to some embodiments
  • FIG. 10 shows an example of implementation of the convolution data routing device for a maximum filter of 5x5x1, according to some embodiments
  • FIG. 11 is a diagram showing an example of two-state permutators that can be used in permutation networks, according to an exemplary embodiment
  • FIG. 12 is a diagram illustrating two-state permutators distributed in layers on the bus lines, according to an exemplary embodiment
  • FIG. 13 is a diagram illustrating permutator layers of different degrees, according to another exemplary embodiment.
  • FIG. 14 shows an example of a permutation network for a 5-line bus with permutator layers of degree 2^n;
  • FIG. 15 shows an example of a shift tree for a maximum filter of 5x5x1;
  • FIG. 16 is a diagram of the routing device for a multi-convolution, according to an exemplary embodiment
  • Fig. 17 is a flowchart showing the configuration method according to some embodiments.
  • FIG. 18 illustrates the configuration method for the possible shifts of 5 lines on a 5-line bus, on a permutation network with layers of degree 2^n
  • FIG. 19 represents an example of a permutator with two digital states, according to some embodiments.
  • FIG. 20 represents an example of a permutation network for a bus of 11 lines with layers of degree 2^n.
  • An artificial neural network (also called a "formal" neural network, or simply a "neural network" hereinafter) consists of one or more layers of neurons interconnected with one another. Each layer consists of a set of neurons, which are connected to one or more previous layers. Each neuron in a layer can be connected to one or more neurons of one or more previous layers. The last layer of the network is called the "output layer". Neurons are connected to each other by synapses, or synaptic weights (also called "weight coefficients", "weights" or "convolution coefficients"), which weight the efficiency of the connection between neurons. The synaptic weights are adjustable parameters of the neural network and store the information contained in the neural network. The synaptic weights can be positive or negative.
  • convolutional ("deep convolutional", "ConvNets") neural networks are furthermore composed of layers of special types, which may comprise convolution layers, pooling layers and fully connected layers.
  • a convolutional neural network consists of one or more convolution layers, which may include pooling layers.
  • the convolution layers can be followed by a multilayer perceptron classifier.
  • the output of a convolution layer may be connected to the input of the next layer. It can also loop back to the input, or be output to other types of layers that are not convolution layers.
  • each neuron is connected to a sub-matrix of the input matrix.
  • Sub-matrices have the same size. They are staggered from each other in a regular way and can overlap.
  • the input matrix may be of any size. However, the input matrix is generally two-dimensional when the data to be processed are visual data, the two dimensions then corresponding to the spatial dimensions X and Y of the image (the matrix being three-dimensional for a color image).
  • FIG. 1 represents an example of a simplified convolutional network, with an input layer "env" corresponding to the input matrix, two convolution layers "conv1" and "conv2", and two fully connected layers "fc1" and "fc2".
  • in this example, the size of the convolution kernels is 5x5 pixels and they are offset by 2 pixels (an offset or "stride" of 2):
  • conv2 has 12 different convolution kernels and thus 12 output maps, and each output map takes as input the set of 6 output maps of the previous layer.
  • the neurons are connected to their input sub-matrix by synapses whose weight is adjustable.
  • the matrix K of the synaptic weights applied to the input sub-matrices of the neurons is the same for all the neurons of the same output map ("feature map").
  • Such a matrix K is also called a "convolution kernel”.
  • the convolution kernel is thus shared by the set of neurons of the same output map O, and is therefore applied to the whole of the input matrix, which reduces the memory required to store the coefficients and optimizes performance.
  • the coefficients of a convolution kernel K may correspond to conventional signal processing filters (Gaussian, Gabor, Laplacian, etc.), or be determined by supervised or unsupervised learning, for example using the gradient backpropagation algorithm used in multilayer perceptron neural networks.
  • the coefficients of the convolution kernels can be positive or negative, and are generally normalized between -1 and 1, as are the input and output values of the neurons.
  • Neural networks can be transposed into pulse ("spike") coding.
  • the signals propagated at the input and at the output of the network layers are no longer numerical values, but electrical pulses (comparable to Dirac pulses).
  • the information that was coded in the signal value (normalized between -1 and 1) is then coded temporally, with the order of arrival of the pulses ("rank order coding") or with the frequency of the pulses.
  • in rank order coding, the instant of arrival of the pulse is inversely proportional to the absolute value of the signal to be coded.
  • the sign of the pulse determines the sign of the value of the signal to be encoded.
  • in frequency coding, the pulse frequency, between f_min and f_max, is proportional to the absolute value of the signal to be encoded.
  • the coding can also be pseudo-frequency-based, for example Poissonian: in this case f_max and f_min represent average frequencies only.
  • the initial phase of the pulses can be random.
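  • For illustration, a minimal sketch of the frequency coding just described (the linear mapping and the function name are assumptions, not taken from the text):

```python
def encode_frequency(value: float, f_min: float, f_max: float):
    """Frequency coding: the pulse frequency, between f_min and f_max,
    is proportional to the absolute value of the normalized signal;
    the sign of the pulses carries the sign of the value."""
    v = max(-1.0, min(1.0, value))      # signals are normalized in [-1, 1]
    sign = 1 if v >= 0 else -1
    return sign, f_min + abs(v) * (f_max - f_min)

print(encode_frequency(-0.5, f_min=10.0, f_max=100.0))  # (-1, 55.0)
```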
  • the pulses may also come directly from a sensor, such as a retina or an artificial cochlea, mimicking the operation of their biological equivalent.
  • a neuron is defined by a parameterized nonlinear algebraic function, with bounded values, having real variables called "inputs", according to the neuron model used.
  • a neuron is further characterized by an activation function g (), a threshold and synaptic weights.
  • the neuron model is defined by a nonlinear algebraic function. This function can take as argument the value of the integration (representing the internal value of a neuron) but also, depending on the model, the time or the output of an internal counter.
  • the expression "integration" designates the time integral of the weighted pulse trains ("spikes") received at the input of the neuron (i.e. the time integral of a weighted pulse train, for example a Dirac comb). This integration can be reset when the neuron fires (i.e. produces an output pulse).
  • the output of the neuron corresponds to the value of the activation function g of the neuron applied to this sum: g(h).
  • g can take the form of a sigmoid function, typically the hyperbolic tangent function.
  • the calculation of the weighted sum h is done by accumulating the coefficient of the convolution kernel at each arrival of a pulse on the corresponding input.
  • the activation function of the neuron g can in this case be replaced by a threshold.
  • when the threshold is reached, the output neuron emits a pulse with the sign of h and resets the weighted sum h to the value 0.
  • the neuron then enters a so-called "refractory" period during which it can no longer emit any pulse, for a fixed duration.
  • the pulses can therefore be positive or negative, depending on the sign of h when the threshold is exceeded.
  • a negative input pulse reverses the sign of the corresponding kernel coefficient for accumulation.
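  • The accumulation, threshold, reset and refractory behaviour described in the preceding items can be summarized by the following minimal sketch (class and attribute names are illustrative assumptions, not the patent's implementation):

```python
class PulseNeuron:
    """Pulse ("spike") neuron: each input pulse accumulates the matching
    kernel coefficient into the integration h; when |h| reaches the
    threshold, the neuron emits a pulse with the sign of h, resets h to 0
    and enters a refractory period."""

    def __init__(self, threshold: float, refractory_steps: int):
        self.h = 0.0
        self.threshold = threshold
        self.refractory_steps = refractory_steps
        self.refractory_left = 0

    def on_input_pulse(self, kernel_coefficient: float, pulse_sign: int) -> int:
        """Returns +1 or -1 if an output pulse is emitted, 0 otherwise.
        A negative input pulse inverts the sign of the coefficient."""
        if self.refractory_left > 0:        # refractory: no emission
            self.refractory_left -= 1
            return 0
        self.h += pulse_sign * kernel_coefficient
        if abs(self.h) >= self.threshold:
            out = 1 if self.h > 0 else -1   # pulse carries the sign of h
            self.h = 0.0                    # integration reset on firing
            self.refractory_left = self.refractory_steps
            return out
        return 0
```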
  • a convolution layer may contain one or more convolution kernels, which each have an input matrix (which may be the same for the different kernels) but different convolution coefficients corresponding to different filters.
  • a convolution or "pooling" layer may consist of one or more output matrices 14 (also called "output maps" or "output feature maps"), each output map being connectable to one or more input matrices 11 (also called "input maps").
  • each convolution kernel 12 in a given convolution layer produces a different output map 14, so that the output neurons are different for each kernel.
  • convolutional networks may also include local or global "pooling" layers that combine the outputs of groups of neurons of one or more output maps. The combination of the outputs may for example consist in taking the maximum or the average value of the outputs of the neuron group, for the corresponding output, on the output map of the "pooling" layer.
  • the "pooling" layers make it possible to reduce the size of the output maps from one layer to the next in the network, while improving its performance by making it more tolerant to small deformations or translations in the input data.
  • Convolutional networks may also include fully connected layers of the perceptron type.
  • an output matrix 14, denoted O, comprises coefficients O_{i,j} and has a size denoted (O_h, O_w).
  • this matrix corresponds to a matrix of neurons, and the coefficients O_{i,j} correspond to the output values of these neurons, calculated from the inputs and the synaptic weights.
  • An input matrix or map 11 may correspond to an output map of a previous layer, or to an input matrix of the network that receives the stimuli, or a portion of the stimuli, to be processed.
  • a neural network may comprise one or more input matrices 11. These may be, for example, RGB, HSV or YUV components, or any other conventional component of an image, with one matrix per component.
  • An input matrix, denoted I, comprises coefficients I_{i,j} and has a size denoted (I_h, I_w).
  • An output map O is connected to an input matrix I by a convolution operation, via a convolution kernel 12 denoted K (the convolution kernel is also called a filter, or convolution matrix), of size (n, m) and comprising coefficients K_{k,l}.
  • Each neuron of the output map 14 is connected to a part of the input matrix 11, this part being called the "input sub-matrix" or "receptive field of the neuron", and being of the same size as the matrix K.
  • the convolution matrix K comprising the synaptic weights is common to all the neurons of the same output map O (the weights of the matrix K are then called "shared weights").
  • Each output coefficient O_{i,j} of the output matrix O then satisfies the following formula:
  • O_{i,j} = g( Σ_{k=0}^{n-1} Σ_{l=0}^{m-1} I_{i·s_h+k, j·s_w+l} · K_{k,l} )
  • where g() denotes the neuron activation function, and s_h and s_w denote the offset parameters ("stride") in the vertical and the horizontal dimension respectively.
  • Such a "stride" offset corresponds to the offset between each application of the convolution kernel on the input matrix. For example, if the offset is greater than or equal to the size of the kernel, then there is no overlap between successive applications of the kernel.
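  • A direct reading of this formula as a reference sketch (NumPy; variable names follow the text, this is not the patent's hardware implementation):

```python
import numpy as np

def conv_layer_output(I, K, s_h, s_w, g=np.tanh):
    """O[i, j] = g( sum_{k,l} I[i*s_h + k, j*s_w + l] * K[k, l] )
    with I the input matrix, K the (n, m) kernel, s_h/s_w the strides
    and g the activation function."""
    n, m = K.shape
    O_h = (I.shape[0] - n) // s_h + 1
    O_w = (I.shape[1] - m) // s_w + 1
    O = np.empty((O_h, O_w))
    for i in range(O_h):
        for j in range(O_w):
            O[i, j] = g(np.sum(I[i*s_h:i*s_h+n, j*s_w:j*s_w+m] * K))
    return O
```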
  • An output map O is connected to an input matrix I by a "pooling" operation, which downsamples the input matrix and provides a subsampled matrix.
  • the subsampling can be of two types: a type of subsampling known as "MAX pooling" (maximum grouping), according to the equation O_{i,j} = max_{k,l} I_{i·s_h+k, j·s_w+l}, and a type known as "AVERAGE pooling" (average grouping), which takes the average instead of the maximum.
  • the synaptic weights associated with the connections are, in the case of a "pooling" layer, generally unitary and therefore do not appear in the formulas above.
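  • Both pooling types reduce to the following minimal sketch under the same conventions (the AVERAGE variant simply replaces the maximum by the mean):

```python
import numpy as np

def pool(I, size, stride, mode="max"):
    """MAX or AVERAGE pooling: each output takes the maximum (or the
    average) of a size x size window of the input, the windows being
    offset by `stride`; the weights are unitary, so no coefficient
    appears in the computation."""
    O_h = (I.shape[0] - size) // stride + 1
    O_w = (I.shape[1] - size) // stride + 1
    reduce = np.max if mode == "max" else np.mean
    O = np.empty((O_h, O_w))
    for i in range(O_h):
        for j in range(O_w):
            O[i, j] = reduce(I[i*stride:i*stride+size, j*stride:j*stride+size])
    return O
```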
  • a fully connected layer comprises a set of neurons, each neuron being connected to all inputs of the layer.
  • each neuron j has its own synaptic weights W_{i,j} with the corresponding inputs and performs a weighted sum of the input coefficients with its weights; this sum is then passed to the neuron activation function to obtain the output of the neuron.
  • the activation function of the neurons g() is usually a sigmoid function, such as the tanh() function.
  • the activation function can be for example the identity function.
  • FIG. 4 diagrammatically represents an example of a computer 100 based on a convolutional neural network architecture, in which a device for distributing the data of at least one convolution kernel can be implemented, according to some embodiments.
  • the computer 100, also called a "neuromorphic calculator", is configured to calculate at least one convolution layer of a convolutional neural network.
  • the computer may comprise at least one convolution module 10 (also called a "convolution block" or "convolution calculation module") configured to calculate each convolution layer, each convolution layer consisting of one or more output maps of the convolutional neural network.
  • the computation of a convolution layer consists in calculating the internal value (also called the "integration value") of the neurons that have received an input event (such as a pulse), the so-called "triggered" or "activated" neurons.
  • the calculation of a convolutional layer thus consists in determining the response of the neurons of the convolutional layer to an input event (such as a "pulse").
  • the convolution modules 10 may be interconnected by an interconnection system 101, for example by using a network-on-chip (NoC) system, programmable interconnections (e.g. of FPGA type), fixed-route interconnection systems, etc.
  • the interconnection system 101 makes it possible to redirect the events between the modules and/or the inputs/outputs, and ensures the connection between the different layers of the neural network.
  • each convolution module 10 performs a convolution operation on one convolution layer, or on a part of a convolution layer, of the neural network.
  • the convolution modules 10 may be used to calculate different convolutional layers. Alternatively, each convolution module 10 may be used to perform multiple filters in a given convolutional layer. In some embodiments, when a convolution module 10 is not sufficient to compute a given convolutional layer, a plurality of convolutional modules may be used to compute the convolutional layer.
  • the computer 100 may be a multi-core distributed memory computer, each core being interconnectable by an interconnection system, each calculation module forming a computing core which can be used to calculate one or more convolution operations.
  • the different convolutional layers of the convolutional neural network are distributed on the various calculation modules 10.
  • Figure 5 shows the distribution of weight coefficients as a function of an input event for a convolutional and pulse neural network.
  • Each processing unit may be configured to calculate the value of independent neurons triggered by input events, based on the weight coefficient associated with the computing unit for each input event.
  • the input event is represented by a pulse arriving on the network.
  • a pulse neural network can receive a pulse train over time, each pulse can trigger one or more neurons.
  • the input event can then be defined by an input address on the convolution layer. The following description will be made with reference to a pulsed convolutional neural network to facilitate the understanding of the invention.
  • the "input address" of a pulse represents the address of the pulse emitted by the previous layer (input pulses) and received by the considered layer.
  • the pulses can be transmitted via a serial bus.
  • the pulses therefore carry with them at least their emission address.
  • the input address, in the chosen reference frame, may include the coordinates of the input event on the input map.
  • when a pulse is emitted by an input neuron (pre-synaptic neuron),
  • the pulse crosses the synapses, which induces a pulse on a finite number of post-synaptic neurons, which are then solicited.
  • each calculation module 10 can be configured to pool certain functions for groups of independent neurons of the same layer.
  • FIG. 6 illustrates the structure of a convolution module 10, according to some embodiments.
  • the convolution module 10 comprises one or more processing units 20 (also called “elementary processing units” or “computing units”), such as elementary processors, configured to calculate the response of the neurons of a convolutional layer to an input event (such as an "impulse").
  • the convolution module 10 is configured to distribute the coefficients of the convolution kernel 12 to at least some of the elementary processing units 20 in parallel.
  • Each convolution module 10 comprises a synaptic weight memory 21, in which the coefficients of the convolution kernels are stored, and a data distribution device 200 (also called "routing device" or "data routing device") according to some embodiments of the invention.
  • the weight coefficient memory 21 is configured to store the convolution weights and is accessible in parallel reading by the different processing units 20.
  • the neurons share the same weight coefficients (the values of the convolution filter).
  • the memory of the weight coefficients 21 is distributed in parallel to the processing units as a function of the input events. Thus, the memory of the weight coefficients 21 is not redundant. Such a memory distribution makes it possible to optimize the memory space.
  • each processing unit 20 may be associated with a weight coefficient of the convolution kernel, which is distributed by the data distribution device 200 after a circular shift of the data of the input bus 201, which carries the coefficients from the memory 21.
  • the input event is a pulse arriving on a convolution layer.
  • This pulse is emitted by the preceding convolution layer (input pulse) and is received by the convolutional layer considered.
  • the pulse can be defined by an input address, i.e. the address of the transmitted pulse.
  • the pulses can be transmitted by a serial bus and then propagate with them at least their address.
  • the processing units 20 perform the computation of the response of a convolutional layer to the input pulse from the weight coefficients distributed in parallel by the data distribution device 200.
  • the elementary processing units 20 then generate the output events based on an input event, for each triggered neuron.
  • the processing units 20 may depend on a chosen neuron model (also called "computational neuron model").
  • the neuron model can be defined during system programming. In some embodiments, the neuron model is identical for all neurons in a layer.
  • a pulse-type neuron model performs a temporal integration of the information.
  • the response of a neuron to stimulation is the emission time of the next impulse.
  • a pulse is emitted when the excitation of the neuron exceeds a threshold.
  • a virtual clock can be used to delay neuron pulse emissions which can then be considered as dated events.
  • examples of pulse neuron models include:
  • HH: the Hodgkin & Huxley model
  • LIF: the Leaky Integrate & Fire model
  • QIF: the Quadratic Integrate & Fire model
  • gIF: the conductance-based Integrate & Fire model
  • SRM: the Spike Response Model
  • Each processing unit 20 (also called “sub-calculator of the neuromorphic model”) may represent one or more neurons.
  • Each elementary data processing unit 20 is configured to generate an output value representing the output value of each neuron it contains, resulting from the integration of the data received by the neuron.
  • This temporal sum (representing a temporal integration) is determined by using the activation function (also called transition function or transfer function) which depends on the neuron model.
  • Each convolution module 10 includes an output management unit 22 for outputting the values calculated by the elementary processing units.
  • This unit 22 can be configured to serialize the outputs of the processing units 20 and/or to implement a grouping ("pooling") function.
  • the output management unit 22 may comprise a competition system for managing the competition between the output events according to predefined competition rules, and a serialization unit configured to serialize the output events.
  • the output management unit 22 may input a parallel data bus and output this data, one by one, over a serial bus, in a predefined order of priority, in some embodiments.
  • FIG. 7 shows the inputs and outputs of the routing device 200, according to some embodiments, in which parallel values of the filter must be read as a function of the coordinates of the input pulse of the system.
  • the routing device 200 is configured to control the distribution of the weight coefficients to the processing units 20 for the calculation of a neuromorphic convolution layer and to allow the parallel reading of the convolution filter by the processing units 20.
  • the processing units 20 must be able to read in parallel the values of the filter as a function of the coordinates of the input pulse of the computer 100, although the processing units do not all read the same data at the same time.
  • the routing device 200 allows dynamic routing of these data to the processing units 20, which allows a parallel reading of the data of the memory 21 (filter coefficients). As shown in FIG. 7, the routing device 200 receives the data in the form of an input bus 201 from the memory 21, which stores the filter coefficients of the neural network. The routing device 200 further receives the address of the input event to be processed 202 (hereinafter referred to as the "input address"). The routing device 200 is configured to reorder the coefficients of the filters (data representing convolution kernels) as a function of the input address, and delivers the reordered filter coefficients to the processing units 20, so that the coefficients can be used in parallel by the processing units 20.
  • the input address 202 of the convolution layer represents the address of the application point of the convolution kernel 12.
  • the input address corresponds to the address of the pulse emitted by the previous layer (input pulse) and received by the layer in question.
  • the pulses can be transmitted via a serial bus. The pulses thus propagate with them at least their transmission address.
  • the input address may comprise two components X and Y according to the representation format chosen to encode the pulses.
  • the pulses transiting between the layers of the neural network may be encoded according to the AER ("Address-Event Representation") protocol or format.
  • each pulse is digital and consists of a bit stream encoding the destination address (X, Y) of the pulse along two perpendicular axes X and Y, the reference frame (X, Y) corresponding to the reference frame of the input matrix, as well as the sign of the pulse.
  • the coded address (X, Y) represents the location of the input neuron to be activated.
  • its address (X, Y) gives the location to activate, with X corresponding to j and Y to i.
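  • A sketch of such an AER event as a data structure (the field layout is an illustrative assumption; the text only fixes its content):

```python
from dataclasses import dataclass

@dataclass
class AerEvent:
    """AER ("Address-Event Representation") pulse: a destination
    address (x, y) in the reference frame of the input matrix,
    plus the sign of the pulse."""
    x: int      # column j of the input neuron to activate
    y: int      # row i of the input neuron to activate
    sign: int   # +1 or -1

def target_location(event: AerEvent):
    """The coded address (X, Y) gives the location to activate,
    with X -> j and Y -> i."""
    return event.y, event.x   # (i, j)
```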
  • the routing device 200, placed between the memory 21 of the filter coefficients and the processing units 20, thus changes configuration at each input event.
  • the routing device 200 may be implemented for a maximum number of dimensions and a maximum convolution kernel size (the convolution kernel being represented by a matrix).
  • the routing device 200 may be parameterized for convolution kernel sizes smaller than or equal to the maximum size, or for several different convolution kernels.
  • the implementation of the routing device 200 can therefore be of any size.
  • the routing device 200 may be based on a regular structure while scaling in terms of maximum matrix size and number of dimensions.
  • Figure 8 illustrates the structure of the routing device 200 according to one embodiment.
  • the routing device 200 comprises at least one permutation network 30 controlled by at least one parameterizable control unit 32 (the control units are hereinafter referred to as "decoder").
  • Each permutation network 30 is configured to perform circular shifts of the input data bus 201, or of a portion of the bus 201, by shifting the values of the coefficients carried by the wires of the bus 201.
  • Each wire (or "line") of the input bus 201 carries a digital or analog coefficient of the convolution kernel.
  • Each decoder 32 (also called a "control unit") is configured to drive the permutators according to the input address applied to the convolution kernel and according to at least one parameter 34 representing the size of the convolution kernels to be processed.
  • the routing device 200 may rely on a tree representation (hereinafter referred to as the "shift tree") to route the data of a convolution filter to the processing units 20.
  • the tree representation is constructed by the routing device based on the maximum size of the convolutional filter to be processed.
  • the tree can be further constructed from other parameters that may include:
  • the size of the input bus 201 may be defined as being equal to the number of elements in the maximum convolutional filter.
  • the division of the input bus can be defined as being equal to the number of columns (respectively of lines) of the convolution filter in its matrix representation 12, or more generally from this number of columns (respectively of lines).
  • the depth of the tree can be defined as being equal to the number of dimensions of the convolution filter, or more generally from this number of dimensions; the depth of the tree thus defines the number of levels of the tree. The remainder of the description is made with reference to a division of the input bus according to a column-wise cut of the convolution matrix, by way of non-limiting example, each sub-bus comprising the components (convolution coefficients) of one column of the convolution matrix.
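  • These construction parameters can be summarized by a small sketch (function and key names are illustrative):

```python
def tree_parameters(kernel_shape):
    """Shift-tree parameters for a maximum filter of the given shape,
    e.g. (5, 5) for a 5x5 filter: the input bus size is the number of
    elements of the filter, the bus is cut column-wise, and the tree
    depth is the number of dimensions."""
    bus_size = 1
    for dim in kernel_shape:
        bus_size *= dim
    return {"bus_size": bus_size,
            "sub_bus_count": kernel_shape[0],   # column-wise cut
            "tree_depth": len(kernel_shape)}

print(tree_parameters((5, 5)))
# {'bus_size': 25, 'sub_bus_count': 5, 'tree_depth': 2}
```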
  • Each node of the tree represents a permutation network 30 configured to perform the circular shifts of a data vector representing the input convolution coefficients of a portion of the input bus 201, or circular shifts of subsets of this vector (corresponding to the sub-bus inputs 2010).
  • each level of the tree corresponds to a given dimension of the coordinate system in which the convolution matrix is represented.
  • the permutation networks arranged on a given level of the tree may advantageously be configured to perform circular shifts of the data received at the input according to the dimension of the chosen reference frame associated with that level of the tree (the reference frame being used to represent the elements of the neural network).
  • Figure 9 illustrates the equivalence between a maximum convolutional filter and the tree topology, according to some embodiments.
  • the matrices of the neural network and in particular the convolution matrices 12, are represented in a three-dimensional coordinate system (X, Y, Z).
  • the input bus 201 can be divided into p sub-buses {2010-1, ..., 2010-p}, each sub-bus comprising the components of a column of the convolution matrix.
  • each convolution coefficient of the convolution matrix may be associated with an address or position defined by coordinates in the coordinate system used.
  • in this case, the shift tree comprises 3 levels, each level being associated with a dimension of the reference frame.
  • the permutation networks 30 on a given level of the tree are thus configured to perform circular shifts of the received data according to the dimension associated with that level of the tree.
  • the level-1 permutation networks 30-1 perform circular shifts of the sub-buses 2010-i (components of each column i of the convolution matrix) along the X axis;
  • the level-2 permutation networks 30-2 perform circular shifts of the outputs of the level-1 permutation networks along the Y axis;
  • the level-3 permutation networks 30-3 perform circular shifts of the outputs of the level-2 permutation networks along the Z axis.
  • the input bus 201 therefore comprises 25 elements (5x5).
  • the number of columns of the convolution matrix being equal to 5, five sub-buses 2010 of 5 elements (denoted {2010-1, ..., 2010-5}) can be created. Since the filter has two dimensions (a 2D matrix defined along two axes X and Y), the depth of the tree is 2 (the tree has two levels).
  • Each of the sub-buses 2010 can be permuted independently of the other sub-buses by a separate permutation network (shifts along the X dimension) in the first level of the tree (5 permutation networks at level 1 of the tree).
  • the outputs of the 5 sub-buses 2010 can be joined into an intermediate bus 204 comprising 5 elements (5 wires).
  • the data of the intermediate bus 204 can be permuted by a permutation network arranged on the second level of the tree, which receives the data of the intermediate bus 204 (shift along the Y dimension).
  • the tree thus created therefore has 5 nodes on the first level and one node on the second level.
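  • Functionally, the two-level tree of this example behaves like the following sketch (a behavioural model of the shifts, not of the wiring; names are illustrative):

```python
def rotate(seq, k):
    """Circular shift right by k; the values on the right wrap around."""
    k %= len(seq)
    return list(seq) if k == 0 else seq[-k:] + seq[:-k]

def shift_tree_5x5(bus, dx, dy):
    """25-wire input bus cut into 5 sub-buses (one per kernel column);
    level 1 shifts each sub-bus by dx (dimension X), level 2 shifts the
    columns among themselves by dy (dimension Y)."""
    assert len(bus) == 25
    columns = [bus[5*i:5*i+5] for i in range(5)]    # 5 sub-buses of 5 wires
    columns = [rotate(col, dx) for col in columns]  # 5 level-1 networks
    columns = rotate(columns, dy)                   # 1 level-2 network
    return [wire for col in columns for wire in col]

print(shift_tree_5x5(list(range(25)), dx=1, dy=2))
```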
  • Each permutation network 30 may consist of a set of circular shift operators, hereinafter referred to as "permutators".
  • the permutators 300 can have two states and be distributed in superimposed strata (hereinafter also called "layers") on the lines of the bus.
  • a permutation layer may perform a circular shift: each layer shifts the data of a selected number of bus lines to the right, and the values on the right are not overwritten but wrap around to the left.
  • the set of permutation layers can perform any useful circular shift.
  • FIG. 11 shows an example of a two-state permutator 300 that may be used in each permutation network 30.
  • a two-state permutator 300 may be controlled with one bit.
  • in the so-called "uncrossed" state, the inputs are not permuted;
  • in the so-called "crossed" state, the inputs are permuted.
  • FIG. 12 represents an exemplary permutation network 30, shown according to two representations (left part and right part), comprising three two-state permutators 300 distributed in layers on the bus lines carrying the values 0, 1, 2, 3.
  • the three permutators 300 perform a circular shift modifying the order of the initial data ⁇ 0, 1, 2, 3 ⁇ .
  • the permutation network 30 thus delivers the data ⁇ 3, 0, 1, 2 ⁇ reordered by circular shift.
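  • The FIG. 12 shift can be reproduced with a small sketch of cascaded two-state permutators (the activation pairs are those of this example):

```python
def cascade(bus, swaps):
    """Applies a cascade of activated two-state permutators, each one
    permuting the two bus lines given by its (a, b) index pair."""
    out = list(bus)
    for a, b in swaps:
        out[a], out[b] = out[b], out[a]
    return out

# Three degree-1 permutators turn {0, 1, 2, 3} into {3, 0, 1, 2},
# i.e. one circular shift to the right:
print(cascade([0, 1, 2, 3], [(2, 3), (1, 2), (0, 1)]))  # [3, 0, 1, 2]
```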
  • the permutation network 30 may be configured to perform circular shifts of different degrees on the N wires of the input sub-bus 2010 with which it is associated. Indeed, each layer of permutators can act on more or less distant lines; the distance between two permuted lines in a given stratum defines the degree of that stratum.
  • FIG. 13 shows examples of permutation networks 30-1, 30-2, 30-3 and 30-4 comprising permutators configured in layers according to different degrees.
  • the permutator strata of the permutation network 30-1 are of degree 1;
  • the permutator strata of the permutation network 30-2 are of degree 2;
  • the permutator strata of the permutation network 30-3 are of degree 3;
  • the permutator strata of the permutation network 30-4 are of degree 4.
  • the circular shifts to be applied can be defined by shift numbers: a shift number defines the circular shift to apply.
  • for example, a shift number "1" triggers the application of one circular shift to the right;
  • a shift number "2" triggers the application of two circular shifts to the right;
  • a bus of N lines can be shifted at most N-1 times.
  • the circular shifts performed by the successive layers of permutators 300 add up.
  • the degrees of the permutator layers used in each permutation network 30 of the routing device 200 can correspond to the decomposition, as a sum, of the circular shifts of the bus 201 and of all the sub-buses 2010.
  • in this way, the permutation network 30 can perform all the circular shifts of the input bus and/or of its sub-buses.
  • the permutator layers 300 in each permutation network 30 may preferably be arranged in descending order of degrees.
  • the degrees of the permutator layers 300 in each permutation network 30 can be set to powers of two, 2^n, with n ∈ [0, ⌈log2(N)⌉[.
  • FIG. 14 represents such an example of a permutation network 30 comprising 3 layers of 2^n-degree permutators for a 5-line bus.
  • the permutation network comprises a first layer of degree 4 (comprising 1 permutator 300), a second layer of degree 2 (comprising 3 permutators 300) and a third layer of degree 1 (comprising 4 permutators 300).
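  • A sketch of this dimensioning rule, together with the decomposition of a shift into a sum of layer degrees (helper names are illustrative):

```python
import math

def layer_degrees(n_lines: int):
    """Degrees 2^n, n in [0, ceil(log2(N))[, in descending order:
    a 5-line bus gets layers of degree 4, 2 and 1 as in FIG. 14."""
    return [2**n for n in reversed(range(math.ceil(math.log2(n_lines))))]

def decompose_shift(shift: int, degrees):
    """A circular shift is obtained by summing layer degrees: a shift
    of 3 on a 5-line bus uses the degree-2 and degree-1 layers."""
    used = []
    for d in degrees:
        if shift >= d:
            used.append(d)
            shift -= d
    return used

print(layer_degrees(5))                        # [4, 2, 1]
print(decompose_shift(3, layer_degrees(5)))    # [2, 1]
```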
  • FIG. 15 represents an example of a routing device 200 whose elements are arranged according to a tree representation, for a convolution matrix of maximum size.
  • the permutation networks 30 located at the same depth in the tree share the same decoder 32.
  • with each level of the tree is thus associated at least one decoder 32 configured to control the permutation networks 30 arranged on that level.
  • the total number of decoders 32 is at least equal to the depth of the tree. For example, if the tree has a depth of 2, 2 independently controlled decoders (1 for each depth) are used. Such an arrangement of the decoders makes it possible to reduce the control logic.
  • Each decoder 32 associated with a given level of the tree is configured to dynamically activate the permutators 300 of the permutation networks 30 arranged on that level of the tree, so as to modify the number of shifts applied depending on the event and on an input configuration, defined in particular by the address of the input event.
  • the decoder 32-1 is configured to dynamically control the activation of the permutators 300 of the first-level permutation networks 30-1 as a function of the input event;
  • the decoder 32-2 is configured to dynamically control the activation of the permutators 300 of the second-level permutation networks 30-2.
  • the decoders 32 may inhibit some permutators 300 of the permutation network 30 to create independent shift subspaces. From these subspaces, subtrees and subsets of the input vector of the permutation network can be created. A permutator 300 may be inhibited for all layers of all stages. In one embodiment, the permutators 300 of a given layer are activated at the same time.
  • the input configuration may be used by the routing device 200 to determine the number of lines of the input bus 201 to which a circular shift is to be applied.
  • the input configuration may be common to the different decoders 32.
  • the input configuration may be used after the construction of the tree to set the size of the processed convolution.
  • the routing device 200 may process one or more convolutions at the same time, as a function of the number of shifts to be applied relative to the size of the input bus, as shown in FIG. 16.
  • FIG. 16 shows the shift tree representation used to dynamically configure the permutation networks 30 according to an input event, for each convolution kernel C0, C1, C2, C3.
  • Each convolution uses 3 permutation networks Px (30) on the first level of the tree, to perform circular sub-shifts along the dimension X, and one permutation network Py (30) on the second level of the tree, to perform shifts along the dimension Y.
  • On each level and for each convolution kernel, at least one decoder 32 (not shown in FIG. 16) can drive the permutation networks by using a control word for each permutation network, the control word comprising as many bits as there are permutators 300 in the permutation network 30. The value of each bit in the control word controls the activation state of the associated permutator 300.
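  • A control word can be sketched as a simple bit packing, one bit per permutator (the packing order, LSB = first permutator, is an assumption):

```python
def control_word(active_flags):
    """Packs one activation bit per permutator 300 of a permutation
    network 30; bit = 1 activates the associated permutator."""
    word = 0
    for i, active in enumerate(active_flags):
        if active:
            word |= 1 << i
    return word

# A 29-permutator network (11-line bus, layers of degree 8/4/2/1)
# takes a 29-bit control word; here only permutators 0 and 28 are on:
print(bin(control_word([True] + [False]*27 + [True])))
```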
  • the circular shifts thus performed on the various input sub-buses 2010 make it possible to reorder the coefficients carried by the wires of these sub-buses.
  • The routing device 200 then distributes the coefficients thus reordered to the processing units 20. For example, the coefficients {0,2,1} of the convolution C0 on the line L0 of the convolution matrix (according to a line-wise cut of the convolution matrix) and the coefficients {3,5,4} of the convolution C1 on the line L2 of the convolution matrix are distributed after reordering ({0,1,2} for C0, L0 and {3,4,5} for C1, L2) to the processing units 20; each processing unit receives only one value.
  • the different filters can be cut (or divided) into lines. These lines can then be concatenated on a sub-bus.
  • the following sub-bus contains the following lines. Once all the lines of these filters are placed, the next filter is processed. In a preferred embodiment, a line cannot be cut to place it on 2 sub-buses. For example, considering 11 sub-buses of 11 elements (i.e. 121 elements), on a given sub-bus it is possible to place 3 lines of 3 different 3x3 filters.
  • the 3 filters then use 3 sub-buses, i.e. 33 elements (wires). It is therefore possible to constitute 9 3x3 filters.
  • the routing device 200 can perform the permutations of lines as follows:
  • the routing device 200 applies the permutation only once on the input bus 201 (for example, a permutation of 4 input elements of a 5-element bus 201 is performed);
  • or the permutation can be applied as many times as possible. For example, a permutation of 2 input elements of a 5-element bus 201 can be applied twice on the bus, to process two 2x2 convolutions at the same time. According to one characteristic, the non-useful values may not be permuted and keep their initial position.
  • the decoders 32 may implement a permutator activation method for determining the permutators 300 to be activated in the permutation networks, for a given input configuration and a given circular permutation (the permutation being definable by a shift number), taking into account the topology of the permutation networks.
  • the activation method also called “configuration method” can be applied for all possible offsets and all possible input configurations.
  • Fig. 17 is a flowchart showing the method of activating the permutators of a permutation network of a given level of the tree.
  • the permutation network 30 consists of permutator layers 300, each layer comprising one or more permutators 300, each having a degree.
  • in step 500, a set of permutator layers of the permutation network is selected, based on the sum of the degrees associated with the permutators 300 of that set of layers.
  • the selected set includes at least one layer. In one embodiment, for the shift "0", no permutator is activated.
  • the set of layers is selected such that the sum of the degrees associated with its permutators 300 is equal to the shift to be applied (defined by a shift number).
  • in step 502, some of the permutators in the selected set of layers are activated.
  • the permutators that are activated are those that move the data of the bus 201 or of the input sub-bus 2010 from the left towards target locations on the right. The activation of the permutators results in a circular shift of the input data of the permutation network 30.
  • in step 504, each datum thus permuted can be tested. The values can be tested from left to right.
  • in step 505, it is determined whether a condition relating to the datum tested after permutation (initially the left-most datum) is satisfied.
  • this condition can consist in determining whether the datum brought to the considered position by the permutations applied in step 502 (initially the left-most position) is positioned on the desired line of the output bus of the permutation network. If the condition is not satisfied, another permutator 300 of the subset selected in step 500, not activated in step 502, can be activated in step 506 to replace it, without affecting the permutations already made.
  • steps 504 and 505 are repeated iteratively to check whether the following data (step 507) are properly placed. If the number of permuted lines of the input bus 201 is less than half the number of lines of the bus 201, the found configurations can be repeated as many times as possible on the input bus.
  • FIG. 18 illustrates the successive permutator activation steps of the method, for all the shifts of 5 lines on a 5-line input bus, on a permutation network with layers of degree 2^n. The red permutators are activated in step 500 while the green permutators are activated in step 504.
  • these configurations can be formalized in the form of a logic circuit or a look-up table controlled by a control bus and by a configuration bus.
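  • The look-up table mentioned above can be built off-line by brute force, as in the following sketch (the in-layer cascade order is an assumption of this model, so the words found may differ from the FIG. 18 configurations):

```python
import math
from itertools import product

def build_permutators(n):
    """Permutators of a 2^n-layer network on an n-line bus, listed in an
    assumed signal-flow order: degrees descending, and within one layer
    the permutators (i, i+d) cascaded from high i to low i."""
    perms = []
    for d in [2**k for k in reversed(range(math.ceil(math.log2(n))))]:
        perms += [(i, i + d) for i in range(n - d - 1, -1, -1)]
    return perms

def apply_word(bus, perms, bits):
    out = list(bus)
    for (a, b), on in zip(perms, bits):
        if on:
            out[a], out[b] = out[b], out[a]
    return out

def correspondence_table(n):
    """For each realizable circular shift of an n-line bus, records one
    activation word (tuple of bits, one per permutator)."""
    perms, bus, table = build_permutators(n), list(range(n)), {}
    for bits in product((0, 1), repeat=len(perms)):
        shifted = apply_word(bus, perms, bits)
        for k in range(n):
            if k not in table and shifted == bus[n-k:] + bus[:n-k]:
                table[k] = bits
    return table

# 5-line bus: layers of degree 4, 2, 1 -> 8 permutators, 256 candidate words.
for k, bits in sorted(correspondence_table(5).items()):
    print("shift", k, "->", bits)
```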
  • the routing input configuration for a given input event provides the size of the processed filter, the control, and/or the number of circular shifts to be applied.
  • FIG. 19 shows an example of a two-state digital permutator implementation that can be used in the permutation networks 30 of the routing device 200.
  • the two-state permutator 300 of FIG. 19 is realized by using two multiplexers 301 and 302 taking the same inputs and sharing the same control bit 303, the inputs of the multiplexers being reversed (in the example, the multiplexer 301 receives the inputs A and B while the multiplexer 302 receives the inputs B and A).
  • the size of the input buses of the multiplexers 301/302 can be arbitrary: the permutator 300 can thus interchange buses of any size. In the example of FIG. 19, when the control bit 303 is at 0, the data A is routed to C and the data B is routed to D. Conversely, when the control bit is at 1, the data A is routed to D and the data B is routed to C.
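  • In behavioural form, the FIG. 19 permutator reduces to two multiplexer expressions (a sketch; it works equally for scalar signals or for whole buses):

```python
def two_state_permutator(a, b, control_bit: int):
    """Two multiplexers sharing one control bit, with reversed inputs:
    bit 0 routes A->C and B->D; bit 1 routes A->D and B->C."""
    c = a if control_bit == 0 else b   # multiplexer 301
    d = b if control_bit == 0 else a   # multiplexer 302
    return c, d

print(two_state_permutator("A", "B", 0))  # ('A', 'B')  uncrossed
print(two_state_permutator("A", "B", 1))  # ('B', 'A')  crossed
```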
  • FIG. 20 shows an example of a permutation network 30 for an 11-line input bus 201, with layers of degree 2^n, that can use such a two-state permutator 300.
  • the permutation network 30 then contains layers of degree 8, 4, 2 and 1, i.e. 29 permutators 300.
  • Such a permutation network structure 30 makes it possible to perform all the circular permutations of an 11-line bus and of all its sub-buses.
  • the list of circular shifts that can be performed by such a permutation network comprises:
  • Such a permutation network 30 can be controlled by a control word (29 bits in this example) supplied by the decoder 32 associated with the permutation network 30.
  • Each bit of the control word is associated with one of the permutators 300 of the permutation network 30.
  • the value of each bit of the control word controls the activation state of the associated permutator 300 (for example, the value 1 commands the activation of the associated permutator while the value 0 leaves the permutator in the inactive state). Since the permutators 300 are arranged in layers of degree deg = 2^n, the number of permutators in a layer of degree 2^n is N - 2^n (i.e. N - deg), where N represents the number of elements in the bus.
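  • Checking this count on the 11-line network of FIG. 20 (a small sketch):

```python
import math

def permutator_count(n_lines: int) -> int:
    """Sum of N - 2^n over the 2^n layers: for N = 11 (layers of degree
    8, 4, 2 and 1) this gives 3 + 7 + 9 + 10 = 29 permutators."""
    return sum(n_lines - 2**n for n in range(math.ceil(math.log2(n_lines))))

print(permutator_count(11))  # 29
```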
  • the input bus comprises 121 lines (11x11), divided into 11 packets of 11 lines forming the 11 input sub-buses 2010.
  • the filter having two dimensions (2D matrices defined along two axes X and Y), the depth of the tree is 2 (the tree has two levels).
  • the tree contains 12 nodes each representing a permutation network.
  • the shift tree can be configured to process all matrix sizes from 11x11 down to 1x1.
  • the embodiments of the invention thus make it possible to find the target shift from the address of the input event, in order to reorder the data of a convolution filter to be transferred to the processing units 20 in a computer 100 based on a convolutional neural network architecture.
  • the routing device 200 allows dynamic activation of the permutators of the permutation networks by using decoders 32 that can be parameterized as a function of the input event, of the maximum size of the convolution kernel and/or of the shift to be applied.
  • the invention makes it possible to reorder the weight coefficients carried by the input vector (input bus) according to the input events (for example pulses) and to the network parameters, by applying controlled circular permutations.
  • the invention provides a solution for dynamically routing data of convolution kernels in neuromorphic computers and in particular in massively parallel neuromorphic computers.
  • the invention notably allows a parallel and parameterizable routing of the data for neuromorphic convolution calculations, and a distribution and scheduling of the weight coefficients in parallel for a convolution. It offers flexible routing, including the creation of independent sub-routings.
  • the routing device 200 dynamically adapts to the size and to the number of dimensions of the convolutions. This results in a reduced complexity in N^2, compared to existing solutions (complexity in N^3).
  • the routing device 200 can dynamically change the configuration of the permutators 300 of the permutation networks at each input event on a convolution kernel.
  • the routing device 200 can be implemented not only for the maximum number of dimensions and the maximum convolution filter size, but also for all convolution matrix sizes 12 smaller than or equal to the maximum size, or even for several convolution matrices at the same time.
  • the routing device 200 thus allows scaling in terms of maximum matrix size and number of dimensions of the convolution core.
  • the routing device 200 requires a smaller implementation area than in existing multiplexer-based implementations.
  • the shift tree makes it possible to represent the interconnection of the permutation networks 30, and to process the convolution filters as a function of the maximum kernel to be processed and of the number of dimensions of the convolution filter (which defines the depth of the tree).
  • the topology of the permutation networks 30, and of the permutator layers within the permutation networks, allows circular permutations of the input bus and/or of the input sub-buses by dynamically controlling the activation of the permutators 300.
  • the dynamic control of the permutators 300, and more generally the dynamic configuration of the permutation networks 30 as a function of the input event, are such that the permutation networks can perform all the circular shifts.
  • the routing device 200 can be applied to any input bus of any size in a completely parallel manner, using only permutators 300 that can be digital or analog.
  • the routing device 200 may be implemented for digital or analog signals. It also offers great flexibility of implementation (size of the input bus, depth of the tree, etc.). The circular shifts can be applied to the input bus 201 but also to its sub-buses.
  • while the invention has a particular advantage in applications to a pulse-based neuromorphic convolution computer, for transferring the values of a convolution filter to processing units 20 representing neuromorphic computing units, the invention is not limited to such an application. In a variant, the invention applies to any device in which the coefficients of at least one convolution kernel must be distributed in parallel to processing units.
  • the method of activating the permutators 300 can be implemented in various ways by hardware, software, or a combination of hardware and software, particularly in the form of program code that can be distributed as a program product in a variety of forms.
  • the program code may be distributed using computer readable media, which may include computer readable storage media and communication media.
  • the methods described in the present description can in particular be implemented in the form of computer program instructions executable by one or more processors in a computing device. These computer program instructions may also be stored in a computer-readable medium.
  • the invention is not limited to the embodiments described above by way of non-limiting example; it encompasses all the variants that a person skilled in the art can envisage.
  • the invention is not limited to a spiking (pulse-type) convolutional neural network.
  • the invention is not limited to a number of dimensions equal to 2 or 3, as illustrated in the examples above.
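
The following Python sketch illustrates, purely behaviorally, how layered permutators can realize every circular offset of an input bus, as stated for the permutation networks 30 above. It is a model under stated assumptions, not the patent's hardware: the function names (`circular_offset`, `subbus_offsets`), the power-of-two bus width, and the barrel-shifter layering (one permutator layer per bit of the offset) are illustrative choices, not taken from the specification.

```python
# Behavioral sketch only: models permutator layers that, when activated,
# rotate the bus by a power-of-two step, so that any circular offset is
# the composition of the activated layers (log2(N) layers for an N-wide bus).

def circular_offset(bus, offset):
    """Rotate `bus` left by `offset` positions via log2(N) permutator layers."""
    n = len(bus)
    assert n > 0 and (n & (n - 1)) == 0, "sketch assumes a power-of-two bus width"
    out = list(bus)
    step = 1
    while step < n:
        if offset & step:  # dynamically "activate" this permutator layer
            out = [out[(i + step) % n] for i in range(n)]
        step <<= 1
    return out

def subbus_offsets(bus, width, offsets):
    """Apply an independent circular offset to each `width`-wide sub-bus."""
    assert len(bus) % width == 0 and len(offsets) == len(bus) // width
    return [value
            for k, off in enumerate(offsets)
            for value in circular_offset(bus[k * width:(k + 1) * width], off)]

if __name__ == "__main__":
    print(circular_offset(list(range(8)), 3))         # [3, 4, 5, 6, 7, 0, 1, 2]
    print(subbus_offsets(list(range(8)), 4, [1, 2]))  # [1, 2, 3, 0, 6, 7, 4, 5]
```

In a hardware rendering of this sketch, each `offset & step` test would correspond to a control signal driving one layer of permutators 300: all N bus elements move in the same step, which is what makes the routing fully parallel rather than sequential, and the sub-bus variant mirrors the creation of independent sub-routes mentioned above.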

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
EP17720462.5A 2016-04-27 2017-04-27 Vorrichtung und verfahren zur verteilung von faltungsdaten eines neuronalen faltungsnetzwerks Pending EP3449424A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1653744A FR3050846B1 (fr) 2016-04-27 2016-04-27 Dispositif et procede de distribution de donnees de convolution d'un reseau de neurones convolutionnel
PCT/EP2017/060017 WO2017186830A1 (fr) 2016-04-27 2017-04-27 Dispositif et procede de distribution de donnees de convolution d'un reseau de neurones convolutionnel

Publications (1)

Publication Number Publication Date
EP3449424A1 2019-03-06

Family

ID=57113426

Family Applications (1)

Application Number Title Priority Date Filing Date
EP17720462.5A Pending EP3449424A1 (de) 2016-04-27 2017-04-27 Vorrichtung und verfahren zur verteilung von faltungsdaten eines neuronalen faltungsnetzwerks

Country Status (4)

Country Link
US (1) US11423296B2 (de)
EP (1) EP3449424A1 (de)
FR (1) FR3050846B1 (de)
WO (1) WO2017186830A1 (de)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10360470B2 (en) * 2016-10-10 2019-07-23 Gyrfalcon Technology Inc. Implementation of MobileNet in a CNN based digital integrated circuit
US10366328B2 (en) * 2017-09-19 2019-07-30 Gyrfalcon Technology Inc. Approximating fully-connected layers with multiple arrays of 3x3 convolutional filter kernels in a CNN based integrated circuit
US11164071B2 (en) * 2017-04-18 2021-11-02 Samsung Electronics Co., Ltd. Method and apparatus for reducing computational complexity of convolutional neural networks
US11265540B2 (en) * 2018-02-23 2022-03-01 Sk Telecom Co., Ltd. Apparatus and method for applying artificial neural network to image encoding or decoding
JP2019200675A (ja) * 2018-05-17 2019-11-21 東芝メモリ株式会社 演算デバイス及びデータの処理方法
US12099912B2 (en) 2018-06-22 2024-09-24 Samsung Electronics Co., Ltd. Neural processor
FR3085517B1 (fr) 2018-08-31 2020-11-13 Commissariat Energie Atomique Architecture de calculateur d'une couche de convolution dans un reseau de neurones convolutionnel
US11615505B2 (en) 2018-09-30 2023-03-28 Boe Technology Group Co., Ltd. Apparatus and method for image processing, and system for training neural network
CN109493300B (zh) * 2018-11-15 2022-05-20 湖南鲲鹏智汇无人机技术有限公司 基于fpga卷积神经网络的航拍图像实时去雾方法及无人机
US11526753B2 (en) * 2019-02-12 2022-12-13 Irida Labs S.A. System and a method to achieve time-aware approximated inference
US11671111B2 (en) 2019-04-17 2023-06-06 Samsung Electronics Co., Ltd. Hardware channel-parallel data compression/decompression
US11211944B2 (en) 2019-04-17 2021-12-28 Samsung Electronics Co., Ltd. Mixed-precision compression with random access
US11880760B2 (en) 2019-05-01 2024-01-23 Samsung Electronics Co., Ltd. Mixed-precision NPU tile with depth-wise convolution
CN112215329B (zh) * 2019-07-09 2023-09-29 杭州海康威视数字技术股份有限公司 基于神经网络的卷积计算方法及装置
CN118468107A (zh) 2019-07-25 2024-08-09 智力芯片有限责任公司 数字尖峰卷积神经网络系统和执行卷积的计算机实施方法
CN112308202A (zh) * 2019-08-02 2021-02-02 华为技术有限公司 一种确定卷积神经网络的决策因素的方法及电子设备
CN110728303B (zh) * 2019-09-12 2022-03-11 东南大学 基于卷积神经网络数据复杂度的动态自适应计算阵列
US11625453B1 (en) 2019-12-12 2023-04-11 Amazon Technologies, Inc. Using shared data bus to support systolic array tiling
US12112141B2 (en) 2019-12-12 2024-10-08 Samsung Electronics Co., Ltd. Accelerating 2D convolutional layer mapping on a dot product architecture
KR20210105053A (ko) * 2020-02-18 2021-08-26 에스케이하이닉스 주식회사 연산 회로 및 그것을 포함하는 딥 러닝 시스템
CN111783997B (zh) * 2020-06-29 2024-04-23 杭州海康威视数字技术股份有限公司 一种数据处理方法、装置及设备
CN112101284A (zh) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 图像识别方法、图像识别模型的训练方法、装置及系统
CN113570031B (zh) * 2021-06-08 2024-02-02 中国科学院深圳先进技术研究院 卷积运算的处理方法、电子设备及计算机可读存储介质
CN114265625A (zh) * 2021-12-31 2022-04-01 上海阵量智能科技有限公司 数据循环移位装置、方法、芯片、计算机设备及存储介质
KR20240092304A (ko) * 2022-12-14 2024-06-24 리벨리온 주식회사 뉴럴 프로세서

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7237055B1 (en) * 2003-10-22 2007-06-26 Stretch, Inc. System, apparatus and method for data path routing configurable to perform dynamic bit permutations
WO2012015450A1 (en) * 2010-07-30 2012-02-02 Hewlett-Packard Development Company, L.P. Systems and methods for modeling binary synapses
FR3015068B1 (fr) * 2013-12-18 2016-01-01 Commissariat Energie Atomique Module de traitement du signal, notamment pour reseau de neurones et circuit neuronal
FR3025344B1 (fr) * 2014-08-28 2017-11-24 Commissariat Energie Atomique Reseau de neurones convolutionnels

Also Published As

Publication number Publication date
WO2017186830A1 (fr) 2017-11-02
US11423296B2 (en) 2022-08-23
US20190156201A1 (en) 2019-05-23
FR3050846A1 (fr) 2017-11-03
FR3050846B1 (fr) 2019-05-03

Similar Documents

Publication Publication Date Title
WO2017186830A1 (fr) Dispositif et procede de distribution de donnees de convolution d'un reseau de neurones convolutionnel
EP3449423B1 (de) Vorrichtung und verfahren zur berechnung der faltung in einem neuronalen faltungsnetzwerk
US10803347B2 (en) Image transformation with a hybrid autoencoder and generative adversarial network machine learning architecture
KR102545128B1 (ko) 뉴럴 네트워크를 수반한 클라이언트 장치 및 그것을 포함하는 시스템
US11055608B2 (en) Convolutional neural network
WO2020177651A1 (zh) 图像分割方法和图像处理装置
US11989645B2 (en) Event-based extraction of features in a convolutional spiking neural network
CN112308200B (zh) 神经网络的搜索方法及装置
WO2019211226A1 (en) Neural hardware accelerator for parallel and distributed tensor computations
EP3659072B1 (de) Rechner für spiking neural network mit maximaler aggregation
US9342780B2 (en) Systems and methods for modeling binary synapses
Alaeddine et al. Deep residual network in network
FR3085517A1 (fr) Architecture de calculateur d'une couche de convolution dans un reseau de neurones convolutionnel
CN117063182A (zh) 一种数据处理方法和装置
Andreou et al. Neuromorphic Chiplet Architecture for Wide Area Motion Imagery Processing
EP3955170A1 (de) Systolische rechnerarchitektur für die implementierung von künstlichen neuronalen netzen, die verschiedene arten von faltungen verarbeiten
CN118196424A (zh) 特征提取单元、特征提取方法及相关设备
CN116868205A (zh) 经层式分析的神经网络剪枝方法和系统
CN113892115A (zh) 用于二值化卷积神经网络的处理器,逻辑芯片及其方法
Szu et al. Natural Intelligence can do Compressive Sampling adaptively

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20181018

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20210707

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS