EP4278304A1 - Optimization of operations in an artificial neural network - Google Patents
- Publication number
- EP4278304A1 (application number EP21706384.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- input value
- bits
- weight
- mathematical operation
- neuron
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- the present disclosure relates generally to data processing and, more particularly, to systems and methods for optimizing operations in artificial neural network computations.
- ANNs: Artificial Neural Networks
- the human brain contains 10-20 billion neurons connected through synapses. Electrical and chemical messages are passed from neuron to neuron based on input information and their resistance to passing information.
- a neuron can be represented by a node performing a simple operation of addition coupled with a saturation function.
- a synapse can be represented by a connection between two nodes. Each of the connections can be associated with an operation of multiplication by a constant.
- the ANNs are particularly useful for solving problems that cannot be easily solved by classical computer programs.
- While forms of the ANNs may vary, they all have the same basic elements similar to the human brain.
- a typical ANN can be organized into layers, each of the layers may include many neurons sharing similar functionality.
- the inputs of a layer may come from a previous layer, multiple previous layers, any other layers or even the layer itself.
- Major architectures of ANNs include Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network, but other architectures of ANN can be developed for specific applications. While some operations have a natural sequence, for example a layer depending on previous layers, most of the operations can be carried out in parallel within the same layer. The ANNs can then be computed in parallel on many different computing elements similar to neurons of the brain.
- a single ANN may have hundreds of layers. Each of the layers can involve millions of connections. Thus, a single ANN may potentially require billions of simple operations like multiplications and additions. Because of the large number of operations and their parallel nature, ANNs can result in a very heavy load for processing units (e.g., CPU), even ones running at high rates.
- GPUs: graphics processing units
- GPUs can be used to process large ANNs because GPUs have a much higher throughput capacity of operations in comparison to CPUs. Because this approach solves, at least partially, the throughput limitation problem, GPUs appear to be more efficient in the computations of ANNs than the CPUs.
- GPUs are not well suited to the computations of ANNs because the GPUs have been specifically designed to compute graphical images.
- the GPUs may provide a certain level of parallelism in computations.
- GPUs constrain the computations to long pipelines, which implies latency and a lack of reactivity.
- very large GPUs can be used, which may involve excessive power consumption, a typical issue of GPUs. Since the GPUs may require more power for the computations of ANNs, the deployment of GPUs can be difficult.
- CPUs provide a very generic engine that can execute very few sequences of instructions with a minimum effort in terms of programming, but lack the computing power for ANNs.
- GPUs are slightly more parallel and require a larger programming effort than CPUs, which can be hidden behind libraries with some performance costs, but are not very well suited to ANNs.
- FPGAs: Field Programmable Gate Arrays
- the FPGAs can be configured to perform computations in parallel. Therefore, FPGAs can be well suited to compute ANNs.
- One of the challenges of FPGAs is the programming, which requires a much larger effort than programming CPUs and GPUs. Adaptation of FPGAs to perform ANN computations can be more challenging than for CPUs and GPUs.
- Most attempts at programming FPGAs to compute ANNs have focused on a specific ANN or a subset of ANNs, have required modifying the ANN structure to fit into a specific limited accelerator, or have provided basic functionality without solving the problem of computing ANNs on FPGAs globally.
- the computation scale is typically not considered in existing FPGA solutions, much of the research being limited to a single or a few computation engines, which could be replicated.
- the existing FPGA solutions do not solve the problem of massive data movement required at large scale for the actual ANN involved in real industrial applications.
- the inputs to be computed with an ANN are typically provided by an artificial intelligence (AI) framework.
- AI: artificial intelligence
- Those programs are used by the AI community to develop new ANNs or global solutions based on ANNs.
- FPGAs are also lacking integration in those software environments.
- a system for optimizing operations in ANN computations may include a processing unit.
- the processing unit can be configured to select a first input value from a set of input values to a neuron.
- the processing unit can be configured to select, based on a criterion, a second input value from the set of input values to the neuron.
- the processing unit can be configured to acquire a first weight from a set of weights corresponding to the first input value.
- the processing unit can be configured to acquire a second weight from a set of weights corresponding to the second input value.
- the processing unit can be configured to perform, in parallel, a first mathematical operation on the first input value and the first weight to obtain a first result and a second mathematical operation based on a set of the bits of the second input value and the second weight to obtain a second result.
- the first mathematical operation can require a first number of bits.
- the second mathematical operation can require a second number of bits.
- the second number of bits can be smaller than the first number of bits.
- the processing unit can be configured to compute an output of the neuron based on the first result and the second result.
- the first mathematical operation includes a multiplication product.
- the second mathematical operation includes a bitwise shift of the second weight.
- the processing unit can be configured to provide, without modifying, the second weight to an accumulating unit.
- the accumulating unit can be configured to add the second weight to a sum.
- the sum can be used to compute the output of the neuron.
- the accumulating unit may include an enable for configuring the accumulating unit to add the second weight to the sum.
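The two cheap alternatives to a multiplication described above can be sketched in Python. This is an illustrative model only, not the circuit of the disclosure: the function names are hypothetical, and the use of a shift for input values that are powers of two is an assumption made for the example.

```python
def accumulate_with_enable(total: int, weight: int, enable: bool) -> int:
    """Add the unmodified weight to the running sum only when enabled.

    Models an input value of 1 (enable set) or 0 (enable cleared):
    no multiplier is needed, the weight is passed through as-is.
    """
    return total + weight if enable else total


def shift_product(weight: int, shift: int) -> int:
    """Return weight * 2**shift using a left shift instead of a multiplier.

    Assumption for this sketch: the input value is a power of two,
    so the product reduces to shifting the weight.
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    return weight << shift
```

For instance, an input value of 4 with weight -3 would be handled as `shift_product(-3, 2)`, and an input value of 1 as `accumulate_with_enable(total, weight, True)`.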
- the first input value and the second input value include the same number of bits in the set of input values.
- the processing unit can be configured to perform operations on a part of bits of the second input value. A number of bits in the part of bits can be less than a number of bits in the second input value.
- the selection of the second input value includes comparing the second input value to at least one reference value. Instead of comparing the value to at least one reference value, the selection of the second input value may include comparing a subset of the bits of the second value to 0 or 1.
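The bit-based selection can be illustrated as follows. This is a minimal sketch that assumes input values are stored as two's-complement integers of a fixed width; the function name and parameters are hypothetical. A value fits in a short signed form exactly when its upper bits are all 0 or all 1, so no comparison against reference values is needed.

```python
def fits_in_low_bits(value: int, total_bits: int, low_bits: int) -> bool:
    """Return True if a two's-complement value fits in `low_bits` bits.

    Only the upper (total_bits - low_bits + 1) bits are inspected:
    they must be all 0 (small non-negative value) or all 1 (small
    negative value). This is equivalent to the range check
    -2**(low_bits - 1) <= value <= 2**(low_bits - 1) - 1.
    """
    if not 0 < low_bits <= total_bits:
        raise ValueError("low_bits must be in 1..total_bits")
    raw = value & ((1 << total_bits) - 1)       # two's-complement bit pattern
    upper_width = total_bits - low_bits + 1     # bits that must all agree
    upper = raw >> (low_bits - 1)
    return upper == 0 or upper == (1 << upper_width) - 1
```

With `total_bits=8` and `low_bits=2` this accepts exactly the values in [-2;1]; with `low_bits=3` it accepts [-4;3]. An unsigned interval such as [0;1] would instead require all upper bits to be 0.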
- the processing unit can be configured to provide the first input value or the second input value to at least one further processing unit in parallel to performing the first mathematical operation and the second mathematical operation.
- the processing unit can be integrated into an electronic circuit configured to perform computations of the ANN.
- the electronic circuit can include a first circuitry to perform the first operation and a second circuitry to perform the second operation, where a number of transistors in the second circuitry is less than a number of the transistors in the first circuitry.
- a method for optimizing operations in ANN computations is provided.
- the method can be performed by at least one processing unit.
- the method may include selecting a first input value from a set of input values to a neuron.
- the method may include selecting, based on a criterion, a second input value from the set of input values to the neuron.
- the method may also include acquiring a first weight from a set of weights corresponding to the first input value.
- the method may also include acquiring a second weight from a set of weights corresponding to the second input value.
- the method may also include performing, in parallel, a first mathematical operation on the first input value and the first weight to obtain a first result and a second mathematical operation based on a set of the bits of the second input value and the second weight to obtain a second result.
- the first mathematical operation can require a first number of bits.
- the second mathematical operation can require a second number of bits.
- the second number of bits can be less than the first number of bits.
- the method may include computing an output of the neuron based on the first result and the second result.
- FIG. 1 is a block diagram showing an example system wherein a method for optimizing operations in ANN computations can be implemented, according to some example embodiments.
- FIG. 2 shows an ANN, neuron, and transfer function, according to an example embodiment.
- FIG. 3 is a flow chart showing training and inference of ANN, according to some example embodiments.
- FIG. 4 is a block diagram showing a processing unit for optimizing operations in ANN computations, according to some example embodiments.
- FIG. 5 is a block diagram showing an accumulating unit for optimizing operations in ANN computations, according to an example embodiment.
- FIG. 6 is a schematic 600 showing a timeline for calculating a neuron by using standard multiplications and time for calculating the neuron using a set of operations, according to some example embodiments.
- FIG. 7 is a flow chart showing steps of a method for optimizing operations in ANN computations, according to some example embodiments.
- FIG. 8 shows a computing system that can be used to implement embodiments of the disclosed technology.
- Embodiments of this disclosure are concerned with methods and systems for optimizing operations in ANN computations.
- Embodiments of the present disclosure may monitor the number of meaningful bits of input values to neurons of an ANN and of the weights of those input values and select, based on the meaningful bits, the type of mathematical operations needed to obtain products of the input values and the weights.
- Embodiments of the present disclosure may also allow performing, in parallel, at least two operations: one obtaining a first product of a first input value and a first weight, and a logic operation equivalent to a second product of a second input value and a second weight, where the first product and the logic operation equivalent to the second product are determined by different mathematical operations.
- The number of bits required for obtaining the second product can be less than the number of bits required for obtaining the first product.
- the size and number of elements of an electrical circuit designed for obtaining the second product result can be less than the size and number of elements of an electrical circuit designed for obtaining the first product result.
- module shall be construed to mean a hardware device, software, or a combination of both.
- a hardware-based module can use one or more microprocessors, FPGAs, application-specific integrated circuits (ASICs), programmable logic devices, transistor-based circuits, CPUs, or various combinations thereof.
- Software-based modules can constitute computer programs, computer program procedures, computer program functions, and the like.
- a module of a system can be implemented by a computer or server, or by multiple computers or servers interconnected into a network.
- module may also refer to a subpart of a computer system, a hardware device, an integrated circuit, or a computer program.
- Technical effects of certain embodiments of the present disclosure can include configuring or designing integrated circuits, FPGAs, or computer systems to perform ANN computations without execution of redundant and unnecessary mathematical operations, or by dynamically reducing the complexity of some mathematical operations, thereby accelerating the ANN computations or using fewer transistors in electronic circuits to obtain the same result. Further technical effects of some embodiments of the present disclosure can facilitate configuration or design of integrated circuits, FPGAs, or computer systems to dynamically qualify data on which mathematical operations are to be performed in the ANN computations. Yet further technical effects of embodiments of the present disclosure include configuration or design of integrated circuits, FPGAs, or computer systems to dynamically align results of neuron computations performed in parallel by multiple processing units.
- FIG. 1 is a block diagram showing an example system 100, where a method for optimizing operations in ANN computations can be implemented, according to some example embodiments.
- the system 100 can be part of a computing system, such as a personal computer, a server, a cloud-based computing resource, and the like.
- the system 100 may include one or more FPGA boards 105 and a chipset 135 including at least one CPU.
- the chipset 135 can be communicatively connected to the FPGA boards 105 via a communication interface.
- the communication interface may include a Peripheral Component Interconnect Express (PCIE) standard 130.
- the communication interface may also include an Ethernet connection 131.
- PCIE: Peripheral Component Interconnect Express
- the FPGA board 105 may include an FPGA 115, a volatile memory 110, and a non-volatile memory 120.
- the volatile memory 110 may include a double data rate synchronous dynamic random-access memory (DDR SDRAM), High Bandwidth Memory (HBM), or any other type of memory.
- the volatile memory 110 may include the host memory.
- the non-volatile memory 120 may include Electrically Erasable Programmable Read-Only Memory (EEPROM), a solid-state drive (SSD), a flash memory, and so forth.
- the FPGA 115 can include blocks.
- the blocks may include a set of elementary nodes (also referred to as gates) performing basic hardware operations, such as Boolean operations.
- the blocks may further include registers retaining bit information, one or more memory storages of different sizes, and one or more digital signal processors (DSPs) to perform arithmetic computations, for example, additions and multiplications.
- DSPs: digital signal processors
- Programming of FPGA 115 may include configuring each of the blocks to have an expected behavior and connecting the blocks by routing information between the blocks. Programming of FPGA 115 can be carried out using a result from a compiler receiving as input a schematic description, a gate-level description, hardware languages like Verilog, System Verilog, or Very High Speed Integrated Circuit Hardware Description Language (VHDL), or any combination thereof.
- VHDL: Very High Speed Integrated Circuit Hardware Description Language
- the non-volatile memory 120 may be configured to store instructions in the form of a bit file 125 to be executed by the FPGA 115.
- the FPGA 115 can be configured by the instructions to perform one or more floating point operations including multiplication and addition to calculate the sum of products that can be used in neural network computations.
- the volatile memory 110 can be configured to store weights W[i] for neurons of one or more ANNs, input values V[i] to be processed for the ANNs, and results of ANN computations including any intermediate results of computations of layers of the ANNs.
- FIG. 2 shows ANN 210, neuron 220, and transfer function 230, according to some example embodiments.
- the ANN 210 may include one or more input layers 240, one or more hidden layers 250, and one or more output layers 260.
- Each of the input layers, hidden layers, and output layers may include one or more (artificial) neurons 220. The number of neurons can be different for different layers.
- Each of neurons 220 may represent a calculation of a mathematical function $O = F\left(\sum_{i} V[i] \times W[i]\right)$, wherein:
- V[i] are neuron input values;
- W[i] are weights assigned to the input values at the neuron;
- F(X) is a transfer function.
- the transfer function 230 F(X) is selected to be zero for X < 0 and have a limit of zero as X approaches zero.
- the transfer function F(X) can be in the form of a sigmoid.
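As an illustration of the neuron calculation, the following sketch applies a transfer function to the weighted sum of the input values. It assumes the neuron output is F applied to the sum of the products V[i] x W[i], as the definitions of V[i], W[i], and F(X) suggest; the function names are hypothetical and the sigmoid is only one possible choice of F.

```python
import math
from typing import Callable, Sequence


def sigmoid(x: float) -> float:
    """Logistic sigmoid, one possible transfer function F(X)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(
    values: Sequence[float],
    weights: Sequence[float],
    transfer: Callable[[float], float] = sigmoid,
) -> float:
    """Compute F(sum of V[i] * W[i]) for one neuron."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    weighted_sum = sum(v * w for v, w in zip(values, weights))
    return transfer(weighted_sum)
```

The output of one neuron can then be passed as an input value to further neurons.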
- the result of calculation of a neuron propagates as an input value of further neurons in the ANN.
- the further neurons can belong to either the next layer, a previous layer or the same layer.
- ANN 210 illustrated in FIG. 2 can be referred to as a feedforward neural network.
- embodiments of the present disclosure can also be used in computations of convolutional neural networks, recurrent neural networks, long short-term memory networks, and other types of ANNs.
- FIG. 3 is a flow chart showing training 310 and inference 325 of an ANN, according to some example embodiments.
- the training 310 (also known as learning) is a process of teaching ANN 305 to output a proper result based on a given set of training data 315.
- the process of training may include determining weights 320 of neurons of the ANN 305 based on training data 315.
- the training data 315 may include samples. Each of the samples may be represented as a pair of input values and an expected output.
- the training data 315 may include hundreds to millions of samples. While the training 310 is required to be performed only once, it may require a significant number of computations and considerable time.
- the ANNs can be configured to solve different tasks including, for example, image recognition, speech recognition, handwriting recognition, machine translation, social network filtering, video games, medical diagnosis, and so forth.
- the inference 325 is a process of computation of an ANN.
- the inference 325 uses the trained ANN weights 320 and new data 330 including new sets of input values. For each new set of input values, the computation of the ANN provides a new output which answers the problem that the ANN is supposed to solve.
- an ANN can be trained to recognize various animals in images.
- the ANN can be trained on millions of images of animals. Submitting a new image to the ANN would provide the information for animals in the new image (this process being known as image tagging). While the inference for each image takes fewer computations than training, the number of inferences can be large because new images can be received from billions of sources.
- the inference 325 includes multiple computations of sums of products $\sum_{i} V[i] \times W[i]$, wherein the V[i] are new input values and the W[i] are weights associated with neurons of the ANN.
- the weights W[i] may remain unchanged while input values V[i] are dynamic and depend on input data to the ANN.
- the inference 325 may include multiplication by zero that can be avoided by inspecting input values V[i] and weights W[i].
- Multiplications V[i] x W[i] can be skipped if a predetermined criterion is satisfied with respect to input value V[i] and weight W[i]. For example, multiplication V[i] x W[i] can be skipped if the input value V[i] or weight W[i] is substantially zero.
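The zero-skipping criterion can be sketched as below. This is an illustrative software model, not the hardware of the disclosure; the function name and the `eps` threshold used to express "substantially zero" are assumptions.

```python
from typing import Sequence, Tuple


def sum_of_products_skip_zero(
    values: Sequence[float],
    weights: Sequence[float],
    eps: float = 0.0,
) -> Tuple[float, int]:
    """Return (sum of products, number of multiplications performed).

    A product is skipped when either operand has magnitude <= eps,
    since it would contribute (substantially) nothing to the sum.
    """
    total = 0.0
    performed = 0
    for v, w in zip(values, weights):
        if abs(v) <= eps or abs(w) <= eps:
            continue  # criterion satisfied: no multiplication needed
        total += v * w
        performed += 1
    return total, performed
```

For inputs `[0, 2, 3, 0]` and weights `[5, 0, 4, 1]`, only one of the four multiplications is carried out.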
- the multiplications V[i] x W[i] are performed.
- the same accumulating unit is used for performing any of the multiplications V[i] x W[i] without considering values of the input values V[i] or weights W[i].
- the values of the input values V[i] or weights W[i] can be inspected to determine the number of bit operations required to perform the multiplication V[i] x W[i].
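One way to inspect a value is to count how many bits it actually needs. The sketch below assumes signed two's-complement values and uses a hypothetical helper name; the disclosure does not prescribe this particular encoding.

```python
def significant_bits(value: int) -> int:
    """Minimum number of two's-complement bits needed to hold `value`.

    The count includes the sign bit, so 0 and -1 need 1 bit,
    1 and -2 need 2 bits, 3 and -4 need 3 bits, and so on.
    """
    if value >= 0:
        return value.bit_length() + 1
    # For negative values, ~value is non-negative and has the same
    # magnitude bits once the sign is set aside.
    return (~value).bit_length() + 1
```

A value that needs only a few bits is a candidate for a narrower and therefore cheaper operation than a full-width multiplication.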
- some embodiments of the present disclosure may allow performing, in parallel, at least two mathematical operations on at least two pairs (V[i], W[i]) and (V[i_2], W[i_2]), wherein obtaining a value of the product V[i_2] x W[i_2] requires fewer bitwise operations (and, hence, circuitry of a smaller size) than obtaining a value of the product V[i] x W[i].
- This approach allows dynamically reducing the complexity of mathematical operations based on the input values V[i] and other values, for example, static values including weights.
- embodiments of the present disclosure may allow dynamic selection of operations to be performed on an optimal computing logic.
- FIG. 4 is a block diagram showing a processing unit 400 for accelerating ANN computation, according to some example embodiments.
- the processing unit 400 may include a controller 415, a selector 420, and an accumulating unit 425.
- the controller 415 may receive a set {V[j_0], V[j_1], ..., V[j_{X-1}]} of X input values 405 to a neuron.
- the controller 415 may optionally receive further input values 406 which are different from the input values 405.
- the further input values 406 can be related to the neuron, the layer, the ANN, the weights, the operation to be carried or any other kind of values.
- the controller 415 may provide, based on the input values 405 and the further values 406, an indication to the selector 420 as to which X input values to select from the set of input values 405.
- the controller 415 may provide, to the selector 420, a primary identifier of a primary input value V[i] and a secondary identifier of a secondary input value V[i_2].
- the controller 415 may also provide the primary identifier and the secondary identifier to the accumulating unit 425. Both the primary identifier and the secondary identifier may include an offset, an index, or bit enables of the selected input values in the set {V[j_0], V[j_1], ..., V[j_{X-1}]}.
- the controller 415 may also provide an enable 430 to the accumulating unit 425.
- the controller 415 can be configured to select the secondary input value V[i_2] based on the criterion -K ≤ V[i_2] ≤ L.
- interval [-K; L] can be one of the following: [0;1], [-2;1], [-4;3], and so forth, allowing mathematical operations equivalent to standard multiplication of V[i_2] and corresponding weights to be performed, such that the mathematical operations require fewer bits than the standard multiplication.
- the controller 415 may include a comparator for comparing V[i_2] to -K and L.
- the primary input value V[i] can be used to calculate products of V[i] and corresponding weights using standard multiplication operation.
- the controller 415 can be configured to avoid selecting the secondary input value V[i_2] as the primary input value for the standard multiplication operation, because V[i_2] is already used in an operation simpler than the standard multiplication. This allows substantially doubling the speed of computing a given set of multiplications to determine the output of a neuron in an ANN.
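The selection performed by the controller can be modeled as a simple scheduler. The sketch below is an assumption-laden illustration: the function name is hypothetical, each cycle is assumed to issue at most one standard multiplication and one simpler operation, and the rule that a small value may use the multiplier once no large values remain is a design choice of this example rather than something stated in the disclosure.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def schedule_pairs(
    values: Sequence[int], k: int, l: int
) -> List[Tuple[int, Optional[int]]]:
    """Pair input indices into (primary, secondary) slots, one pair per cycle.

    Values inside [-k; l] qualify for the simpler operation (secondary);
    all other values need the standard multiplication (primary).
    Every index is issued exactly once.
    """
    small = deque(i for i, v in enumerate(values) if -k <= v <= l)
    large = deque(i for i, v in enumerate(values) if not -k <= v <= l)
    cycles: List[Tuple[int, Optional[int]]] = []
    while small or large:
        # The multiplier takes a large value while any remain,
        # otherwise it falls back to a small one (assumption).
        primary = large.popleft() if large else small.popleft()
        secondary = small.popleft() if small else None
        cycles.append((primary, secondary))
    return cycles
```

With values `[7, 1, -2, 9, 0, 5]` and the interval [-2;1], six products are covered in three cycles instead of six.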
- the selector 420 may receive the set of input values {V[i_0], V[i_1], ..., V[i_{x-1}]} and the primary identifier and the secondary identifier from the controller 415. The selector 420 may select, based on the primary identifier, a primary input value V[i] and provide the selected primary input value V[i] to the accumulating unit 425. The selector 420 may select, based on the secondary identifier, a secondary input value V[i_2], and provide the selected secondary input value V[i_2] to the accumulating unit 425.
- the accumulating unit 425 can select, based on the primary identifier, weight W[i] corresponding to the primary input value V[i].
- the accumulating unit 425 can select, based on the secondary identifier, weight W[i_2] corresponding to the secondary input value V[i_2].
- the accumulating unit 425 can perform a first operation on the primary input value V[i] and corresponding weight W[i] and a second operation on the secondary input value V[i_2] and corresponding weight W[i_2].
- the accumulating unit 425 can further accumulate the results of the first operation and the second operation. Performing the second operation can require fewer bits and a simpler logic than performing the first operation.
- the first operation may include a standard multiplication operation.
- the controller 415 and the selector 420 may be carried out as a single unit configured to perform functionalities of both controller 415 and selector 420.
- the same controller 415 can be shared between multiple processing units similar to the processing unit 400 because input values {V[i_0], V[i_1], ..., V[i_{x-1}]} can be used multiple times with different sets of weights.
- the processing unit 400 may include different accumulating units (similar to the accumulating unit 425) for the first operation and the second operation.
- the selection of weights and the accumulation of results of first operation and the second operation can be carried out by different processing units.
- the accumulating unit 425 may be configured to perform either only the first operation or multiple second operations.
- the accumulating unit 425 may execute one of the following: 1) a single first operation on a single input value and a single weight; or 2) multiple second operations on multiple input values and multiple weights based on the selection by selector 420.
- the selector 420 can be configured to select multiple secondary input values matching the criterion -K ≤ V[i_2] ≤ L.
- FIG. 5 is a block diagram showing the accumulating unit 425, according to some example embodiments.
- the accumulating unit 425 can be configured to compute sums, multiplications, accumulations, or other operations.
- the accumulating unit 425 may include multiplication unit 505, function unit 510, and summation unit 515.
- the accumulating unit 425 may include other operational units necessary for its operation.
- the accumulating unit 425 may receive, from the controller 415, the primary identifier of the primary input value V[i] and the secondary identifier of the secondary input value V[i_2].
- the accumulating unit 425 may receive, from the selector 420, the primary input value V[i] and the secondary input value V[i_2].
- the accumulating unit 425 may be configured to select, based on the primary identifier, a weight W[i] corresponding to the primary input value V[i].
- the accumulating unit 425 may be configured to select, based on the secondary identifier, weight W[i_2] corresponding to the secondary input value V[i_2].
- the multiplication unit 505 may determine the product V[i] x W[i]. The multiplication unit 505 performs m bits by n bits multiplication, where n is the number of bits used for the primary input value V[i] and m is the number of bits used for weight W[i].
- Simultaneously with the multiplication unit 505, the function unit 510 may perform an operation on the secondary input value V[i_2] and corresponding weight W[i_2]. The function unit 510 can be designed to perform different operations based on the number of significant bits n_2 of the secondary input value V[i_2]:
- the controller 415 may provide an enable 430 to configure the function unit 510 to provide the weight W[i_2] to the accumulating unit without performing any operations.
- when n₂ = 2, the function unit 510 can use simple combinational logic to perform the m-bit by n₂-bit multiplication.
- the function unit 510 can be designed to perform an m-bit by n₂-bit multiplication, which requires fewer gates and transistors than the m-bit by n-bit multiplication performed by the multiplication unit 505 because n₂ < n.
- the summation unit 515 can be configured to accumulate results of parallel computations of the multiplication unit 505 and function unit 510 to a sum.
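The data path of FIG. 5 can be approximated in software. The following is a minimal sketch, not the patented circuit: it assumes non-negative integer input values, and the function names, the `max_bits` limit, and the shift-and-add strategy for the reduced path are illustrative assumptions rather than details taken from the description.

```python
def multiplication_unit(v: int, w: int) -> int:
    """Full-width path: models the m-bit by n-bit multiplier (unit 505)."""
    return v * w


def function_unit(v2: int, w2: int, max_bits: int = 2) -> int:
    """Reduced path: models unit 510 for a secondary value with few significant bits.

    Assumption: v2 is a non-negative integer; max_bits is a hypothetical design limit.
    """
    n2 = v2.bit_length()  # number of significant bits of the secondary value
    if v2 < 0 or n2 > max_bits:
        raise ValueError("secondary value does not fit the reduced path")
    if v2 == 1:
        return w2  # pass the weight through unchanged (the 'enable' case)
    # Shift-and-add over the n2 significant bits only.
    result = 0
    for bit in range(n2):
        if (v2 >> bit) & 1:
            result += w2 << bit
    return result


def accumulate(total: int, v: int, w: int, v2: int, w2: int) -> int:
    """Models summation unit 515: adds both results to the running sum.

    In hardware the two products are computed in parallel; here they are
    simply evaluated one after the other.
    """
    return total + multiplication_unit(v, w) + function_unit(v2, w2)
```

In this model a secondary value of 1 costs no arithmetic at all, and a 2-bit secondary value costs at most two shifted additions, which mirrors why the reduced path can be built with far less logic than the full multiplier.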
- FIG. 6 is a schematic 600 showing time T₁ of calculating a neuron using standard multiplication and time T₂ of calculating the neuron using a set of different operations, according to some example embodiments.
- the neuron can be calculated based on a set of input values {V[0], V[1], ..., V[x - 1]} and a set of weights {W[0], W[1], ..., W[x - 1]}.
- the time T₂ for computing an output of the neuron using a set of different operations on the input values {V[0], V[1], ..., V[x - 1]} and the corresponding weights {W[0], W[1], ..., W[x - 1]} is shorter than the time T₁ for computing the output of the neuron using only standard multiplications of size N×M on the same input values and weights.
- the multiplication V[i₂] × W[i₂] is not executed by the multiplication unit 505 (shown in FIG. 5) during a separate period but is performed by the function unit 510 during time period [tᵢ₋₁; tᵢ], in parallel with the multiplication V[i] × W[i].
- the multiplication unit 505 can use the freed period to perform other multiplications of the same neuron. Accordingly, the summation unit 515 can obtain the sum earlier and with less logic than when using only standard N×M-bit multiplications.
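The time saving of FIG. 6 can be illustrated with an idealized cycle count. This sketch rests on assumptions that are not stated in the description: every full multiplication takes exactly one cycle, the function unit finishes one secondary operation within that same cycle, a secondary value may also be sent to the full multiplier when no primary value is left, and a value counts as secondary when it is non-negative and has at most `max_bits` significant bits. All names are hypothetical.

```python
import math
from typing import Sequence


def is_secondary(value: int, max_bits: int = 2) -> bool:
    """Hypothetical criterion: non-negative and at most max_bits significant bits."""
    return value >= 0 and value.bit_length() <= max_bits


def cycles_standard(values: Sequence[int]) -> int:
    """T1 model: one full multiplication per input value, one per cycle."""
    return len(values)


def cycles_parallel(values: Sequence[int]) -> int:
    """T2 model: each cycle the multiplier handles one value while the
    function unit handles at most one secondary value alongside it."""
    primaries = sum(1 for v in values if not is_secondary(v))
    # Every primary needs its own multiplier cycle, and no cycle can retire
    # more than two values, hence the two lower bounds below.
    return max(primaries, math.ceil(len(values) / 2))
```

With three large values and three small ones, for instance `[200, 1, 57, 2, 3, 130]`, the model gives 6 cycles for the standard approach and 3 cycles when each small value is paired with a large one.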
- FIG. 7 is a flow chart illustrating a method 700 for optimizing operations in ANN computations, in accordance with some example embodiments.
- the operations may be combined, performed in parallel, or performed in a different order.
- the method 700 may also include additional or fewer operations than those illustrated.
- the method 700 may be performed by the processing unit 400 described above with reference to FIG. 4 and FIG. 5.
- the method 700 may commence with selecting a first input value from a set of input values to a neuron.
- the method 700 may select, based on a criterion, a second input value from the set of input values to the neuron. Selecting the second input value may include comparing the second input value to at least one reference value. The first input value and the second input value may include the same number of bits in the set of input values.
- the method 700 may acquire a first weight from a set of weights corresponding to the first input value.
- the method 700 may acquire a second weight from a set of weights corresponding to the second input value.
- the method 700 may perform, in parallel, a first mathematical operation on the first input value and the first weight to obtain a first result and a second mathematical operation on the second input value and the second weight to obtain a second result.
- the first mathematical operation can require a first number of bits.
- the second mathematical operation can require a second number of bits, the second number of bits being less than the first number of bits.
- the first mathematical operation may include a multiplication product.
- the second mathematical operation may include a bitwise shift of the second weight.
- the second mathematical operation can be performed based on a part of bits of the second input value.
- the part of bits can include a number of bits smaller than a number of bits in the second input value.
- the method may include providing, without modifying, the second weight to an accumulating unit, the accumulating unit being configured to add the second weight to a sum being used to compute the output of the neuron (equation (2)).
- the accumulating unit includes an enable for configuring the accumulating unit to add the second weight to the sum.
- the method 700 may include computing an output of the neuron based on the first result and the second result.
- the method 700 may include providing the first input value or the second input value to at least one further processing unit in parallel to performing the first mathematical operation and the second mathematical operation.
- the processing unit can be integrated into an electronic circuit configured to perform computations of the ANN.
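The steps of method 700 can be sketched end to end in software. This is an illustration under stated assumptions, not the claimed implementation: the default criterion `0 <= v <= 3`, the identity activation, the pairing order, and the names `reduced_multiply` and `neuron_output` are all hypothetical, and the two operations that hardware would run in parallel are evaluated sequentially here.

```python
from typing import Callable, Sequence


def reduced_multiply(v2: int, w2: int) -> int:
    """Second operation: shift-and-add using only the significant bits of v2.

    Assumption: v2 is non-negative, which the default criterion guarantees.
    """
    result = 0
    for bit in range(v2.bit_length()):
        if (v2 >> bit) & 1:
            result += w2 << bit  # bitwise shift of the second weight
    return result


def neuron_output(
    values: Sequence[int],
    weights: Sequence[int],
    criterion: Callable[[int], bool] = lambda v: 0 <= v <= 3,
    activation: Callable[[int], int] = lambda s: s,
) -> int:
    """Compute a neuron output by pairing a first and a second input value per step."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # Indices of values that satisfy the criterion go to the reduced path.
    secondary = [i for i, v in enumerate(values) if criterion(v)]
    primary = [i for i, v in enumerate(values) if not criterion(v)]
    total = 0
    while primary or secondary:
        # First operation: full multiplication of a first input value and its weight.
        i = primary.pop() if primary else secondary.pop()
        first_result = values[i] * weights[i]
        # Second operation: reduced multiplication, if a second value is available.
        second_result = 0
        if secondary:
            i2 = secondary.pop()
            second_result = reduced_multiply(values[i2], weights[i2])
        total += first_result + second_result
    return activation(total)
```

The result equals the ordinary weighted sum of the inputs; only the way each product is obtained differs, which is the point of the method.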
- FIG. 8 illustrates an example computing system 800 that may be used to implement embodiments described herein.
- the example computing system 800 of FIG. 8 may include one or more processors 810 and memory 820.
- Memory 820 may store, in part, instructions and data for execution by the one or more processors 810.
- Memory 820 can store the executable code when the exemplary computing system 800 is in operation.
- the processor 810 may include internal accelerators like a graphical processing unit, a Field Programmable Gate Array, or similar accelerators that may be suitable for use with embodiments described herein.
- the memory 820 may include internal accelerators like a graphical processing unit, a Field Programmable Gate Array, or similar accelerators that may be suitable for use with embodiments described herein.
- the example computing system 800 of FIG. 8 may further include a mass storage 830, portable storage 840, one or more output devices 850, one or more input devices 860, a network interface 870, and one or more peripheral devices 880.
- The components shown in FIG. 8 are depicted as being connected via a single bus 890.
- the components may be connected through one or more data transport means.
- the one or more processors 810 and memory 820 may be connected via a local microprocessor bus, and the mass storage 830, one or more peripheral devices 880, portable storage 840, and network interface 870 may be connected via one or more input/output buses.
- Mass storage 830, which may be implemented with a magnetic disk drive, an optical disk drive, or a solid-state drive (SSD), is a non-volatile storage device for storing data and instructions for use by the one or more processors 810. Mass storage 830 can store the system software for implementing embodiments described herein for purposes of loading that software into memory 820.
- the mass storage 830 may also include internal accelerators like a graphical processing unit, a Field Programmable Gate Array, or similar accelerators that may be suitable for use with embodiments described herein.
- Portable storage 840 may operate in conjunction with a portable non-volatile storage medium, such as a compact disk (CD) or digital video disc (DVD), to input and output data and code to and from the computing system 800 of FIG. 8.
- the system software for implementing embodiments described herein may be stored on such a portable medium and input to the computing system 800 via the portable storage 840.
- One or more input devices 860 provide a portion of a user interface.
- the one or more input devices 860 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, a stylus, or cursor direction keys.
- the computing system 800 as shown in FIG. 8 includes one or more output devices 850. Suitable one or more output devices 850 include speakers, printers, network interfaces, and monitors.
- Network interface 870 can be utilized to communicate with external devices, external computing devices, servers, and networked systems via one or more communications networks such as one or more wired, wireless, or optical networks including, for example, the Internet, intranet, LAN, WAN, cellular phone networks (e.g., Global System for Mobile communications network, packet switching communications network, circuit switching communications network), Bluetooth radio, and an IEEE 802.11-based radio frequency network, among others.
- Network interface 870 may be a network interface card, such as an Ethernet card, optical transceiver, radio frequency transceiver, or any other type of device that can send and receive information.
- Other examples of such network interfaces may include Bluetooth®, 3G, 4G, and WiFi® radios in mobile computing devices, as well as USB.
- One or more peripheral devices 880 may include any type of computer support device to add additional functionality to the computing system.
- the one or more peripheral devices 880 may include a modem or a router.
- the example computing system 800 of FIG. 8 may also include one or more accelerator devices 885.
- the accelerator devices 885 may include PCIe-form-factor boards, storage-form-factor boards, or any electronic board equipped with a specific electronic component such as a Graphical Processing Unit, a Neural Processing Unit, a Multi-CPU component, a Field Programmable Gate Array component, or similar electronic or photonic accelerator components that may be suitable for use with embodiments described herein.
- the components contained in the exemplary computing system 800 of FIG. 8 are those typically found in computing systems that may be suitable for use with embodiments described herein and are intended to represent a broad category of such computer components that are well known in the art.
- the exemplary computing system 800 of FIG. 8 can be a personal computer, handheld computing device, telephone, mobile computing device, workstation, server, minicomputer, mainframe computer, or any other computing device.
- the computer can also include different bus configurations, networked platforms, multi-processor platforms, and so forth.
- Various operating systems (OS) can be used including UNIX, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems.
- Some of the above-described functions may be composed of instructions that are stored on storage media (e.g., computer-readable medium).
- the instructions may be retrieved and executed by the processor.
- Some examples of storage media are memory devices, tapes, disks, and the like.
- the instructions are operational when executed by the processor to direct the processor to operate in accord with the example embodiments. Those skilled in the art are familiar with instructions, processor(s), and storage media.
- Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk.
- Volatile media include dynamic memory, such as RAM.
- Transmission media include coaxial cables, copper wire, and fiber optics, among others, including the wires that include one embodiment of a bus.
- Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency and infrared data communications.
- Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, SSD, a CD-read-only memory (ROM) disk, DVD, any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a PROM, an EPROM, an EEPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.
- Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a CPU for execution.
- a bus carries the data to system RAM, from which a CPU retrieves and executes the instructions.
- the instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU.
- the instructions or data may not be used by the CPU but may instead be read or written by the other devices without the CPU directing the access.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Neurology (AREA)
- Complex Calculations (AREA)
Abstract
Disclosed are systems and methods for optimizing operations in artificial neural network computations. An example method may include selecting a first input value from a set of input values to a neuron; selecting, based on a criterion, a second input value from the set of input values; acquiring a first weight from a set of weights; acquiring a second weight from a set of weights; performing, in parallel, a first mathematical operation on the first input value and the first weight to obtain a first result and a second mathematical operation based on the second input value and the second weight to obtain a second result, the second mathematical operation requiring a second number of bits that is less than the first number of bits required by the first mathematical operation; and computing an output of the neuron based on the first result and the second result.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2021/050225 WO2022153078A1 (fr) | 2021-01-13 | 2021-01-13 | Optimisation d'opérations dans un réseau de neurones artificiels |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4278304A1 true EP4278304A1 (fr) | 2023-11-22 |
Family
ID=74666752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP21706384.1A Pending EP4278304A1 (fr) | 2021-01-13 | 2021-01-13 | Optimisation d'opérations dans un réseau de neurones artificiels |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4278304A1 (fr) |
WO (1) | WO2022153078A1 (fr) |
-
2021
- 2021-01-13 WO PCT/IB2021/050225 patent/WO2022153078A1/fr unknown
- 2021-01-13 EP EP21706384.1A patent/EP4278304A1/fr active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2022153078A1 (fr) | 2022-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200226458A1 (en) | Optimizing artificial neural network computations based on automatic determination of a batch size | |
US11625583B2 (en) | Quality monitoring and hidden quantization in artificial neural network computations | |
US10990525B2 (en) | Caching data in artificial neural network computations | |
US20200311511A1 (en) | Accelerating neuron computations in artificial neural networks by skipping bits | |
EP3924891A1 (fr) | Surveillance de qualité et quantification cachée dans des calculs de réseau neuronal artificiel | |
US11494624B2 (en) | Accelerating neuron computations in artificial neural networks with dual sparsity | |
US11068784B2 (en) | Generic quantization of artificial neural networks | |
US10769527B2 (en) | Accelerating artificial neural network computations by skipping input values | |
US11568255B2 (en) | Fine tuning of trained artificial neural network | |
US20220222519A1 (en) | Optimizing operations in artificial neural network | |
WO2020194032A1 (fr) | Accélération de calculs neuronaux dans des réseaux neuronaux artificiels par saut de bits | |
EP3895024A1 (fr) | Mise en mémoire cache de données dans des calculs de réseau neuronal artificiel | |
US11126912B2 (en) | Realigning streams of neuron outputs in artificial neural network computations | |
EP4278304A1 (fr) | Optimisation d'opérations dans un réseau de neurones artificiels | |
US11645510B2 (en) | Accelerating neuron computations in artificial neural networks by selecting input data | |
US11748623B2 (en) | Modifying structure of artificial neural networks by collocating parameters | |
EP3895073A1 (fr) | Réalignement de flux de sorties de neurone dans des calculs de réseau de neurones artificiels | |
EP3895071A1 (fr) | Accélération de calculs de réseau neuronal artificiel par saut de valeurs d'entrée | |
EP3973464A1 (fr) | Accélération de calculs neuronaux dans des réseaux neuronaux artificiels avec double dispersion | |
US20210117800A1 (en) | Multiple locally stored artificial neural network computations | |
EP3908981A1 (fr) | Optimisation de calculs de réseau neuronal artificiel sur la base d'une détermination automatique d'une taille de lot | |
WO2022053851A1 (fr) | Réglage de précision d'un réseau neuronal artificiel entraîné | |
EP3953867A1 (fr) | Accélération de calculs neuronaux dans des réseaux neuronaux artificiels par sélection des données d'entrée | |
EP4136584A1 (fr) | Structure modificatrice de réseaux neuronaux artificiels en colocalisant des paramètres | |
EP4049187A1 (fr) | Multiples calculs stockés localement de réseau neuronal artificiel |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
| | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
| | 17P | Request for examination filed | Effective date: 20230805 |
| | AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| | DAV | Request for validation of the european patent (deleted) | |
| | DAX | Request for extension of the european patent (deleted) | |