EP3915057A1 - Generic quantization of artificial neural networks - Google Patents
- Publication number
- EP3915057A1 (application number EP20704358.9A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- interval
- data type
- ann
- neurons
- inputs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks:
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Definitions
- the present disclosure relates generally to data processing and, more particularly, to systems and methods for generic quantization of artificial neural networks.
- ANNs: Artificial Neural Networks
- the human brain contains 10-20 billion neurons connected through synapses. Electrical and chemical messages are passed from neuron to neuron based on input information and their resistance to passing information.
- a neuron can be represented by a node performing a simple operation of addition coupled with a saturation function.
- a synapse can be represented by a connection between two nodes. Each of the connections can be associated with an operation of a multiplication by a constant.
- the ANNs are particularly useful for solving problems that cannot be easily solved by classical computer programs.
- while forms of ANNs may vary, they all share the same basic elements, similar to the human brain.
- a typical ANN can be organized into layers, and each of the layers may include many neurons sharing similar functionality.
- the inputs of a layer may come from a previous layer, multiple previous layers, any other layers or even the layer itself.
- Major architectures of ANNs include Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) networks, and other ANN architectures can be developed for specific applications. While some operations have a natural sequence (for example, a layer depending on previous layers), most of the operations can be carried out in parallel within the same layer. The ANNs can then be computed in parallel on many different computing elements, similar to neurons of the brain.
- a single ANN may include hundreds of layers. Each layer may involve millions of connections. Thus, a single ANN may potentially require billions of simple operations like multiplications and additions.
- ANNs can place a very heavy load on processing units (e.g., CPUs), even ones running at high rates.
- processing units: e.g., CPUs
- GPUs: graphics processing units
- GPUs can be used to process large ANNs because GPUs have a much higher operation throughput than CPUs. Because this approach at least partially solves the throughput limitation problem, GPUs appear to be more efficient than CPUs for ANN computations.
- GPUs are not well suited to the computations of ANNs because the GPUs have been specifically designed to compute graphical images.
- the GPUs may provide a certain level of parallelism in computations.
- the GPUs constrain the computations to long pipelines, which results in latency and a lack of reactivity.
- very large GPUs can be used, which may involve excessive power consumption, a typical issue with GPUs. Since GPUs may require more power for the computations of ANNs, the deployment of GPUs can be difficult.
- CPUs provide a very generic engine that can execute very few sequences of instructions with a minimum effort in terms of programming, but lack the computing power required for ANNs.
- GPUs are slightly more parallel and require more programming effort than CPUs (an effort that can be hidden behind libraries at some performance cost), but are not well suited to ANNs.
- FPGAs: Field Programmable Gate Arrays
- the FPGAs can be configured to perform computations in parallel. Therefore, FPGAs can be well suited to compute ANNs.
- Programming of FPGAs is challenging, requiring a much larger effort than programming CPUs and GPUs. Thus, adapting FPGAs to perform ANN computations can be more challenging than adapting CPUs and GPUs.
- the inputs computed with an ANN are typically provided by an artificial intelligence (AI) framework.
- AI: artificial intelligence
- AI frameworks are used by the AI community to develop new ANNs or global solutions based on ANNs.
- FPGAs typically lack
- Embodiments of the present disclosure may facilitate determination of quantization intervals for ANN data when the computations are performed on numbers of non-floating-point types.
- the description can be of a first data type.
- the one or more processors may determine a first interval of the first data type to be mapped to a second interval of a second data type.
- the processors may (b) determine, based on the set of sum results, a measure of saturations of the set of sum results. The processors may then (c) adjust, based on the measure of saturations, at least one of the first interval and the second interval.
- the processors can repeat operations (a), (b), and (c) until the measure of saturations satisfies one or more criteria.
- the at least one of the first interval and the second interval can be adjusted to cause the measure of saturations to fall within a pre-determined range.
- the first data type may include a floating-point data type and the second data type may include a fixed-point data type.
- the measure of saturations can be determined based on the count of saturated sum results in the set of sum results.
- the measure of saturations can be a function of sum results in the set of sum results.
- the plurality of p neurons of the ANN can include all neurons of the ANN.
- the plurality of p neurons of the ANN can include a subset of neurons of the ANN, wherein a count of neurons in the subset is less than a count of all neurons in the ANN.
- products W[i] × V[i] can be computed using corresponding numbers of the second interval.
- the sum results can be represented by the second data type.
- the determination of the measure of saturations can include comparing at least one of the sum results to a function of boundaries of the second interval.
- products W[i] × V[i] can be computed using corresponding numbers of the second interval.
- the sum results can be represented by a third data type, wherein the third data type may be different from the second data type.
- the determination of the measure of saturations can include comparing at least one of the sum results to one or more thresholds of the third data type.
- the second data type can be a K-bit fixed-point data type and the third data type can be an L-bit fixed-point data type, wherein K and L are different.
- the determination of the measure of saturations can further include determining that the at least one of the sum results is within the boundaries of the second interval and comparing the at least one of the sum results to one or more further thresholds of the third data type.
- the processors can include one or more electronic components accelerating the computations of products and sums.
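A minimal sketch of the two comparisons above. It assumes the second data type is a signed K-bit integer, the third data type is a signed L-bit integer used as the accumulator, and a sum is compared directly against each type's representable range (a real pipeline would usually rescale the sum before narrowing it back to K bits). The function names are hypothetical, not taken from the disclosure.

```python
def int_range(bits: int) -> tuple[int, int]:
    """Return (min, max) of a signed two's-complement integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def sum_of_products(weights: list[int], inputs: list[int]) -> int:
    """Exact sum of W[i] * V[i]; Python ints never overflow, so this is the ideal sum."""
    return sum(w * v for w, v in zip(weights, inputs))


def classify_sum(total: int, k_bits: int, l_bits: int) -> str:
    """Classify a sum against the L-bit accumulator range, then the K-bit range.

    Assumptions: "accumulator" means the sum does not fit the L-bit third data type;
    "narrowing" means it fits L bits but not the K-bit second data type.
    """
    l_min, l_max = int_range(l_bits)
    if not l_min <= total <= l_max:
        return "accumulator"
    k_min, k_max = int_range(k_bits)
    if not k_min <= total <= k_max:
        return "narrowing"
    return "ok"
```

For K = 8 and L = 16, the sum 100 × 127 + 120 × 127 = 27940 fits the 16-bit accumulator but not the 8-bit range, so it is flagged only at the narrowing step.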
- the description can be of a first data type.
- the method can determine, by the one or more processors, a first interval of the first data type to be mapped to a second interval of a second data type.
- the computations of sums can be performed using at least one number of the second data type within the second interval, wherein the at least one number is a result of mapping of at least one number of the first interval to a number of the second interval.
- the method can determine, by the one or more processors, a measure of saturations in the set of sum results.
- the method can also include adjusting, by the one or more processors and based on the measure of saturations, at least one of the first interval and the second interval.
- FIG. 1 is a block diagram showing an example system for quantization of data in ANN computations, according to some example embodiments.
- FIG. 2 shows an ANN, neuron, and transfer function, according to an example embodiment.
- FIG. 3A is a flow chart showing training and inference of an ANN performed with the same data type, according to some example embodiments.
- FIG. 3B is a flow chart showing training and inference of an ANN using different data types, according to some example embodiments.
- FIG. 4A is a schematic diagram showing an example quantization of input data in ANN, according to some example embodiments.
- FIG. 4B is a flow chart showing steps of a method for quantization of ANN, according to an example embodiment.
- FIG. 5 is a flow chart showing steps of a method for quantization of ANN, according to some other example embodiments.
- FIG. 6 is a schematic diagram showing an example quantization of input data in ANN, according to some example embodiments.
- FIG. 7 is a schematic diagram showing an example quantization of input data in ANN using multiple quantization intervals, according to some example embodiments.
- FIG. 8 is a flow chart showing steps of a method for quantization of ANN, according to some example embodiments.
- FIG. 9 shows a computing system that can be used to implement embodiments of the disclosed technology.
- FIG. 10 is a flow chart showing steps of a method for quantization of ANN, according to some example embodiments.
- FIG. 11 is a flow chart showing steps of a method for determining saturations of a sum of products, according to an example embodiment.
- Embodiments of this disclosure are directed to methods and systems for quantization of ANNs without use of computations on floating point data.
- Embodiments of the present disclosure may facilitate selection of quantization intervals for inputs, weights, and other parameters of neurons in ANNs. Some embodiments of the present disclosure may allow adjustment of the quantization interval individually for each layer of an ANN, filter of the ANN, or activation map of the ANN.
- the quantization interval can be adjusted separately for one or more ranges of the input data.
- the quantization interval can be adjusted to decrease the number of saturations in neurons in integer-based computations of the ANN.
- the term "module" shall be construed to include a hardware device, software, or a combination of both.
- a hardware-based module can use one or more microprocessors, FPGAs, application-specific integrated circuits (ASICs), programmable logic devices, transistor-based circuits, or various combinations thereof.
- Software-based modules can constitute computer programs, computer program procedures, computer program functions, and the like.
- a module of a system can be implemented by a computer or server, or by multiple computers or servers interconnected into a network.
- a module may refer to a subpart of a computer system, a hardware device, an integrated circuit, or a computer program.
- Technical effects of certain embodiments of the present disclosure can include increasing accuracy of fixed-point ANN computations. Further technical effects of certain embodiments of the present disclosure can allow decreasing saturations of neurons in fixed-point ANN computations.
- FIG. 1 is a block diagram showing an example system 100 for quantization of ANNs, according to some example embodiments.
- the system 100 can be part of a computing system, such as a personal computer, a server, a cloud-based computing resource, and the like.
- the system 100 may include one or more processor(s) 110 and a memory 120.
- the memory 120 may include computer-readable instructions for execution by the processor(s) 110.
- the processor(s) 110 may include a programmable processor, such as a microcontroller, central processing unit (CPU), and so forth.
- the processor(s) 110 may include an application-specific integrated circuit(s) or programmable logic array(s), such as an FPGA(s), designed to implement the functions performed by the system 100.
- the system 100 may be installed on a remote server or may be provided as a cloud service residing in a cloud storage.
- the processor(s) 110 may be configured to receive a structure and parameters of an ANN and input datasets for the ANN.
- the input datasets may include inputs to the neurons.
- the parameters may include weights for the inputs to the neurons.
- the parameters of the ANN and the input datasets can be presented in a first data type.
- the processor(s) 110 may be further configured to select a first interval of the first data type to be mapped to a second interval of a second data type.
- the processor(s) 110 can be further configured to perform, based on the input data, computations of one or more neurons of the ANN, wherein the computations are performed using at least one number within the second interval of the second data type.
- the number within the second interval can be a result of mapping of an input from the input datasets or a parameter (for example, a weight for the input) from the parameters of the ANN to the second interval.
- the processor(s) 110 can be further configured to determine a measure of saturations in the neurons of the ANN.
- the measure of saturations can be defined as a function of the sums of products of weights and inputs for one or more neurons of the ANN.
- the saturations can be measured on the sums of products of weights and inputs before the transfer function is applied to those sums, for each of the neurons taken into account in the measure of saturations.
- the one or more neurons may represent the whole ANN or a part of the ANN, for example, a layer, a group of layers, a subset of neurons within the same layer, or a subset of neurons that belong to at least two different layers.
- the saturations of the sums of products of inputs and weights can be measured in the whole ANN or a part of the ANN to estimate the quality of the mapping of inputs of input datasets and other parameters of the ANN from the first interval of the first data type to the second interval of the second data type.
- the measure of saturations can be a count of saturated sums of products of weights and inputs of all neurons in the ANN or in the part of the ANN. In another embodiment, the measure of saturations can be a count of sums of products that are not saturated in the ANN or in the part of the ANN. In yet another embodiment, the measure of saturations can be determined as a ratio of the number of saturated sums of products to the total number of neurons in the ANN or in the part of the ANN. In certain embodiments, the measure of saturations can be determined as a ratio of the number of sums of products that are not saturated to the total number of neurons in the ANN or in the part of the ANN.
- the measure of saturations can be determined by a mathematical function based on the sum of products in the ANN or in the part of the ANN.
- the mathematical function can be calculated only based on the sums of products of weights and inputs that are close to a saturated value.
- the measure of saturations can be a mathematical function that represents the degree of saturation of the sums of products of weights and inputs in the ANN or the part of the ANN.
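The count and ratio variants of the measure of saturations, plus one possible function restricted to sums close to a saturated value, can be sketched as follows. The saturation bounds `low`/`high`, the `margin` band, and the linear near-saturation score are illustrative assumptions; the disclosure does not specify the function.

```python
def saturation_measures(sums: list[float], low: float, high: float) -> dict[str, float]:
    """Count- and ratio-based measures; a sum is saturated outside [low, high] (assumption)."""
    total = len(sums)  # one sum of products per neuron taken into account
    saturated = sum(1 for s in sums if s < low or s > high)
    unsaturated = total - saturated
    return {
        "saturated_count": saturated,
        "unsaturated_count": unsaturated,
        "saturated_ratio": saturated / total if total else 0.0,
        "unsaturated_ratio": unsaturated / total if total else 0.0,
    }


def near_saturation_score(sums: list[float], low: float, high: float, margin: float) -> float:
    """Hypothetical degree-of-saturation function.

    Only sums within `margin` of a bound (or beyond it) contribute; each adds its
    distance past the inner edge of that band, expressed in units of `margin`.
    """
    score = 0.0
    for s in sums:
        excess = max(s - (high - margin), (low + margin) - s, 0.0)
        score += excess / margin
    return score
```

With int8 bounds (-128, 127) and a margin of 10, a sum of 50 contributes nothing, 126 contributes 0.9, and 300 contributes 18.3, so the score grows smoothly as sums approach and then pass a bound.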
- the processor(s) 110 can be further configured to adjust, based on the measure of saturations, the first interval of the first data type and/or the second interval of the second data type.
- computation of a neuron of the ANN using numbers of the second data type may require fewer operations of the processor(s) 110 than the computation of the same neuron of the ANN using numbers of the first data type.
- the input datasets presented using the second data type may require less memory to be stored than the same input datasets presented using the first data type.
- while some embodiments of the present disclosure deal with real numbers as the first data type and integers as the second data type, similar methods can be used for mapping and quantization of data using another first data type and another second data type.
- the first data type may include floating point real numbers and the second data type may include fixed-point real numbers.
- the first data type can include double precision floating point numbers and the second data type may include single precision floating-point numbers.
- the first data type may include 32-bit floating point numbers and the second data type may include 8-bit integers.
- the first data type may include 8-bit integers and the second data type may include 4-bit integers.
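To make the memory argument concrete, the sketch below compares dense storage for one million values in each of the data types listed above. It assumes no per-tensor metadata (such as scale factors) and, for 4-bit integers, which have no `struct` format code, two values packed per byte.

```python
import struct


def dense_bytes(count: int, fmt: str) -> int:
    """Bytes for `count` values of a struct format code, using standard sizes ('<')."""
    return count * struct.calcsize("<" + fmt)


def packed_int4_bytes(count: int) -> int:
    """4-bit integers have no struct code; assume two values packed per byte."""
    return (count + 1) // 2


n = 1_000_000  # illustrative number of weights or inputs
footprint = {
    "float64": dense_bytes(n, "d"),   # double precision floating point
    "float32": dense_bytes(n, "f"),   # single precision floating point
    "int8": dense_bytes(n, "b"),      # signed 8-bit integers
    "int4": packed_int4_bytes(n),     # signed 4-bit integers, packed two per byte
}
for name, size in footprint.items():
    print(f"{name}: {size / 1e6:.1f} MB")
```

Each step down the list halves or quarters the storage, which is the reduction the data-type pairs above trade against accuracy.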
- FIG. 2 shows ANN 210, neuron 220, and transfer function 230, according to some example embodiments.
- the ANN 210 may include one or more input layers 240, one or more hidden layers 250, and one or more output layers 260.
- Each of the input layers, hidden layers, and output layers may include one or more (artificial) neurons 220. The number of neurons can be different for different layers.
- Each of the neurons 220 may be represented by a calculation of a mathematical function F(Σ V[i] × W[i]), where the sum runs over all inputs i of the neuron (equation (1)), wherein:
- V[i] are inputs to the neuron 220
- W[i] are weights assigned to inputs to the neuron 220
- F(X) is a transfer function (also referred to as an activation function).
- the transfer function 230 F(X) is selected to be zero for X < 0 and have a limit of zero as X approaches zero.
- the transfer function F(X) can be in the form of a sigmoid.
- the result of the calculation of the neuron propagates as an input to further neurons in the ANN.
- the further neurons can belong to either the next layer, previous layer or the same layer.
- the ANN 210 illustrated in FIG. 2 can be referred to as a feedforward neural network.
- embodiments of the present disclosure can also be used in computations of convolutional neural networks, recurrent neural networks, long short-term memory networks, and other types of ANNs.
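Equation (1) for a single neuron can be sketched as below. ReLU is used here as one transfer function that is zero for X < 0; the sigmoid mentioned above is included as an alternative. The function names are illustrative, not from the disclosure.

```python
import math


def relu(x: float) -> float:
    """Transfer function that is zero for x < 0 and the identity otherwise."""
    return x if x > 0 else 0.0


def sigmoid(x: float) -> float:
    """Sigmoid transfer function, the alternative form mentioned in the text."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, transfer=relu) -> float:
    """Equation (1): F applied to the sum of V[i] * W[i] over the neuron's inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(v * w for v, w in zip(inputs, weights))
    return transfer(weighted_sum)
```

The weighted sum computed inside `neuron_output` is the quantity on which saturations are measured, before `transfer` is applied.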
- FIG. 3A is a flow chart showing a workflow 300A for training 310 and inference 325 of an ANN, according to some example embodiments.
- the training 310 (also known as learning) is a process of teaching ANN 305 to output a proper result based on a given set of training data 315.
- the process of training may include determining weights 320 of neurons of the ANN 305 based on training data 315.
- the training data 315 may include samples. Each sample may be represented as a pair of input values and expected output.
- the training data 315 may include hundreds to millions of samples. While training 310 is required to be performed only once, it may require a significant amount of computations and may take a considerable time.
- the ANNs can be configured to solve different tasks including, for example, image recognition, speech recognition, handwriting recognition, machine
- the inference 325 is a process of computation of an ANN.
- the inference 325 uses the trained ANN weights 320 and new data 330 including new sets of inputs. For each new set of inputs, the computation of the ANN provides a new output that answers the problem the ANN is supposed to solve.
- an ANN can be trained to recognize various animals in images.
- the ANN can be trained using millions of images of animals. Submitting a new image to the ANN would provide the information concerning animals in the new image (this process being known as image tagging). While the inference for each image takes fewer computations than training, the number of inferences can be large because new images can be received from billions of sources.
- the inference 325 includes multiple computations of the following sum of products (also referred to as a sum of weighted inputs to a neuron), equation (2): Σ V[i] × W[i], where the sum runs over all inputs i of the neuron.
- V[i] are inputs to a neuron and W[i] are weights of the inputs to the neuron of the ANN.
- both training 310 and inference 325 in FIG. 3A are performed using computations based on the same type of data, for example, real numbers in floating-point format. Performing inference for a large number of input datasets of new data 330 using floating-point calculations can be time-consuming and may require significant computing resources for computations of an ANN.
- the inference of an ANN can be performed using integer-based or fixed-point calculations in order to reduce the computation time and computing resources required to perform ANN computations.
- real (floating-point) numbers of input data (for example, inputs to neurons) and weights associated with the ANN can be quantized.
- quantization can be referred to as a process of reducing the number of bits that represent a real number.
- the quantization may include converting 32-bit floating-point numbers into 8-bit integers. The quantization may significantly reduce bandwidth of ANN
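The sketch below illustrates an integer-based evaluation of equation (2): inputs and weights are quantized to 8-bit integers, the products are accumulated in integer arithmetic, and the result is rescaled to real units for comparison with the floating-point sum. The symmetric scales (input range [-2, 2], weight range [-1, 1]) and the round-to-nearest mapping are assumptions chosen for illustration.

```python
def quantize_symmetric(x: float, scale: float) -> int:
    """Round x / scale to the nearest integer and clip it to the int8 range."""
    return max(-128, min(127, round(x / scale)))


def int_sum_of_products(q_inputs: list[int], q_weights: list[int]) -> int:
    """Integer-only sum of products; Python ints stand in for a wide accumulator."""
    return sum(v * w for v, w in zip(q_inputs, q_weights))


inputs = [0.5, -1.25, 2.0]
weights = [0.8, 0.1, -0.3]
s_in = 2.0 / 127   # assumed input range [-2, 2] mapped onto [-127, 127]
s_w = 1.0 / 127    # assumed weight range [-1, 1] mapped onto [-127, 127]

q_in = [quantize_symmetric(v, s_in) for v in inputs]    # [32, -79, 127]
q_w = [quantize_symmetric(w, s_w) for w in weights]     # [102, 13, -38]
approx = int_sum_of_products(q_in, q_w) * s_in * s_w    # rescale to real units
exact = sum(v * w for v, w in zip(inputs, weights))     # floating-point reference
```

Here the integer path gives about -0.321 against an exact -0.325; that gap is the quantization error that the choice of interval tries to keep small.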
- FIG. 3B is a flow chart showing a workflow 300B of training 310 and inference 345 of an ANN using different data types for training and inference, according to some example embodiments.
- the training 310 can be performed using training data 315.
- the training data 315 can be of a first data type, for example real numbers in the floating point format.
- the process of training may include determining weights 320 of neurons of the ANN 305.
- the weights 320 can be also of the first data type.
- the weights 320 and other parameters of ANN can be quantized in quantization 335.
- the weights 320 can be mapped to a set including a pre-determined number of numbers of a second data type.
- the second data type may include integers.
- the inference 345 can then be performed using the quantized numbers for the weights 320. Prior to the inference 345, each input dataset in new data 330 can also be quantized, that is, mapped to numbers of the second data type, in quantization 340 using the same quantization workflow as in the quantization 335. Since the weights 320 and the inputs of new data 330 are quantized and converted to the second data type, the inference 345 can be performed using hardware configured to perform computations using only the second data type. The computations using the second data type may require less time and memory than the same computations using the first data type.
- the result of the inference 345 performed using second data type can be less accurate than the result of inference 325 performed using the first data type used in the training of ANN.
- the quantization differs from a simple data mapping because the quantization of a number of the first data type may result in a different number of the second data type.
- FIG. 4A shows a simplified schematic of example quantization of input data in an ANN, according to some example embodiments.
- the real-number data associated with the ANN (for example, input values for a layer)
- while the range is shown below zero, in general the range of input data can be an interval containing only positive numbers, or both positive and negative numbers, and may include zero.
- the intervals (t_i, t_{i+1}] can be equal in length.
- the length of (t_i, t_{i+1}] is referred to as a quantization interval or a quantization step. All input data within the same interval (t_i, t_{i+1}] can be mapped to an integer i.
- the input data in the interval [-B; -A] are real numbers represented as 32-bit floating-point data.
- the input data in [-B; -A] are represented by 8-bit integers between -128 and +127.
- a range to be quantized is selected to be symmetrical with respect to 0.
- the range [-B; +B] can be selected to include the input data from the range [-B; -A].
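One way to implement the FIG. 4A mapping is sketched below. It assumes a symmetric range [-B; +B], equal-length intervals whose boundaries are t_i = (i - 0.5) × step so that interval i is centered on i × step, and saturation of values outside the range. The boundary placement is an assumption; the text only requires equal-length intervals (t_i, t_{i+1}] mapped to i.

```python
import math


def interval_index(x: float, b: float, bits: int = 8) -> int:
    """Map x to the integer i whose interval (t_i, t_{i+1}] contains it.

    Assumptions: symmetric range [-b, +b], step = b / (2**(bits-1) - 1),
    boundaries t_i = (i - 0.5) * step, and saturation outside the range.
    """
    q_max = 2 ** (bits - 1) - 1        # 127 for 8 bits
    q_min = -(2 ** (bits - 1))         # -128 for 8 bits
    step = b / q_max
    # Smallest integer i with x <= t_{i+1} = (i + 0.5) * step.
    i = math.ceil(x / step - 0.5)
    return max(q_min, min(q_max, i))
```

With this placement, +B maps to 127, -B maps to -127, and -128 is reached only through saturation; a different boundary choice could use the full 8-bit range.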
- FIG. 4B is a block diagram showing a method 400 for quantization of ANNs, according to some embodiments.
- the method 400 may correspond to some current approaches used for quantization of ANNs.
- the method 400 can be implemented using the system 100 described above with reference to FIG. 1.
- the method 400 may commence, in block 410, with computing a layer of ANN.
- the computations of neurons of the layer are performed using real numbers for input data 405.
- the method 400 may determine, based on the result of computation of the layer, a quantization interval.
- the weights of the neurons can be converted to integer numbers once after the ANN is trained using real numbers.
- the quantization interval can be determined per layer, because input data for the different layers can be of different range.
- the quantization interval for a layer can be determined based on the maximum and minimum numbers for the input values 405 for the layer.
- the range of values to be quantized can be selected to be symmetrical with respect to zero as shown in FIG. 4A.
- the method 400 may truncate real numbers (weights and input data) to integer numbers (or fixed-point numbers).
- an inference of the ANN can be performed for other sets of input data.
- the inference of the ANN can be performed using integer numbers for weights of neurons.
- the input data for each layer of the ANN can be converted to integer numbers based on the
- the accuracy of the result of the inference of the ANN performed using integer numbers and integer-based operation depends on whether a quantization interval is selected accurately.
- the accuracy of results of ANN computations using integer numbers may also depend on a method for mapping the real numbers to integer numbers.
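Method 400 can be sketched as follows, assuming a symmetric 8-bit mapping of [-B; +B] onto [-127; 127], with B taken from the minimum and maximum values observed for the layer, and truncation toward zero as the real-to-integer step. The calibration values and helper names are hypothetical.

```python
def layer_range(values: list[float]) -> float:
    """Symmetric bound B covering the observed min and max of a layer's values."""
    return max(abs(min(values)), abs(max(values)))


def truncate_to_int8(x: float, b: float) -> int:
    """Map [-B, +B] onto [-127, 127] and truncate toward zero (int() truncates)."""
    if b == 0:
        return 0
    q = int(x / b * 127)
    return max(-128, min(127, q))  # clip values that fall outside [-B, +B]


# Hypothetical real-valued inputs observed for one layer during calibration.
calibration_inputs = [-3.2, -0.4, 0.0, 1.7, 2.5]
B = layer_range(calibration_inputs)
quantized = [truncate_to_int8(v, B) for v in calibration_inputs]
```

Because truncation always rounds toward zero, it systematically shrinks magnitudes (for example, 1.7 maps to 67 rather than 67.47), which illustrates why accuracy depends on the mapping method as well as on the interval.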
- FIG. 5 is a flow chart, showing a method 500 for quantization of the ANN, according to some embodiments of the present disclosure.
- the method 500 may be performed by the system 100 described above with reference to FIG. 1.
- the method 500 can be used to determine a quantization interval and a workflow for mapping of real numbers from the quantization interval to integer numbers individually for each layer of the ANN.
- the method 500 may commence, in block 510, with estimating an initial quantization interval.
- the quantization interval can be estimated based on the average of the quantization intervals determined for previous layers.
- the quantization interval can also be determined by method 400 as described above with reference to FIG. 4B.
- the method 500 may compute a layer of the ANN using input data 505.
- the input data 505 can be quantized based on the quantization interval.
- the computations of the layer can be performed using integer numbers representing the input data 505.
- the computation of the layer may include computation of sums of products of weights and inputs to neurons of the layer (as shown in equation (2)) and computation of outputs of the neurons by applying, to the sums of products, a transfer function F(x) (as shown in equation (1)).
- the method 500 may determine a number of saturations in neurons of the layer.
- a neuron is said to be saturated if output values of the neuron are close to an asymptotic end of the transfer function F(x).
- the neuron is said to be saturated if an output of the neuron is close to the boundary of the integer range, for example -128 or 127 if 8-bit integers are used.
- the method 500 may determine a number of saturations in sums of products of weights and inputs (equation (2)) prior to applying, to the sums of products, the transfer function F(x). Specifics of determining whether a sum of products of weights and inputs to a neuron is saturated are described below with reference to FIG. 11.
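A minimal sketch of the output-based saturation test described above, assuming an odd transfer function such as tanh whose asymptotes are -1 and +1. The function names and the tolerance `eps` are hypothetical choices for illustration and are not taken from the source.

```python
import math
from typing import Callable, Iterable


def output_saturated(x: float,
                     transfer: Callable[[float], float] = math.tanh,
                     asymptote: float = 1.0,
                     eps: float = 0.01) -> bool:
    """True if F(x) lies within eps of -asymptote or +asymptote.

    Assumes an odd transfer function (e.g. tanh); eps is a hypothetical tolerance.
    """
    return asymptote - abs(transfer(x)) < eps


def count_saturated_outputs(pre_activations: Iterable[float], **kwargs) -> int:
    """Number of neurons whose outputs sit near an asymptote of F(x)."""
    return sum(1 for x in pre_activations if output_saturated(x, **kwargs))
```

For tanh with `eps = 0.01`, any pre-activation with an absolute value above roughly 2.65 is flagged as saturated.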
- the method 500 may compare the number of saturations to a first pre-determined level. If the number of saturations exceeds the first pre-determined level, the method 500 proceeds to block 550 with an adjustment of the quantization interval.
- the method 500 may compare the number of saturations to a second pre-determined level. If the number of saturations does not exceed the second pre-determined level, the method 500 proceeds to block 550 with an adjustment of the quantization interval.
- the quantization interval can be adjusted to bring the number of neuron saturations between the second pre-determined level and the first pre-determined level.
- the method 500 may determine, in block 530, a proportion of the saturations (a ratio of saturated neurons to the number of all neurons in the layer). The proportion of the saturations can be further used in blocks 540 and 545 for comparison with the first pre-determined level and the second pre-determined level, respectively.
- the method 500 may further proceed with computing the layer using integer numbers for weights of inputs to neurons and input data 505, wherein the integer numbers are determined based on the adjusted quantization interval. Steps 550, 520, 530, 540, and 545 can be repeated until the number of saturations is between the second pre-determined level and the first pre-determined level, or until the number of iterations of these steps exceeds a pre-determined maximum number.
- method 500 proceeds, in block 560, with computations of integer numbers for weights of neurons of the layer.
- the method 500 may further determine a quantization interval for the next layer of the ANN.
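The adjustment loop of blocks 520 to 550 could look like the following sketch. It assumes 8-bit signed outputs, a multiplicative widen/narrow rule, and a layer computation hidden behind a callable. `calibrate_interval`, `factor`, and the toy layer are hypothetical: the source does not specify how the interval is adjusted at each step.

```python
from typing import Callable, List, Sequence

INT8_MIN, INT8_MAX = -128, 127  # 8-bit integers, as in the example above


def saturation_ratio(outputs: Sequence[int]) -> float:
    """Block 530: proportion of neurons whose integer output hits an int8 boundary."""
    saturated = sum(1 for y in outputs if y <= INT8_MIN or y >= INT8_MAX)
    return saturated / len(outputs)


def calibrate_interval(compute_layer: Callable[[float], List[int]],
                       interval: float,
                       first_level: float,
                       second_level: float,
                       factor: float = 1.25,
                       max_iters: int = 50) -> float:
    """Adjust the quantization interval until second_level < ratio <= first_level.

    compute_layer(interval) is assumed to run the layer in integer arithmetic
    and return its int8 outputs for the given quantization interval.
    """
    for _ in range(max_iters):
        ratio = saturation_ratio(compute_layer(interval))  # blocks 520-530
        if ratio > first_level:        # block 540: too many saturations, widen
            interval *= factor
        elif ratio <= second_level:    # block 545: too few saturations, narrow
            interval /= factor
        else:
            break                      # inside the target band: go on to block 560
    return interval


def make_toy_layer(sums: Sequence[float]) -> Callable[[float], List[int]]:
    """Hypothetical stand-in for a layer: quantizes fixed sums to int8 codes."""
    def compute(interval: float) -> List[int]:
        step = interval / INT8_MAX
        return [max(INT8_MIN, min(INT8_MAX, round(s / step))) for s in sums]
    return compute
```

For example, `calibrate_interval(make_toy_layer([i / 10 for i in range(-50, 51)]), 1.0, first_level=0.1, second_level=0.01)` widens the interval from 1.0 until only a few percent of the toy outputs sit on the int8 boundaries. The `max_iters` guard plays the role of the pre-determined maximum number of iterations, since a narrow target band could otherwise make the loop oscillate.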
- although the steps of method 500 are described with reference to layers, a similar method can be applied to determine a single quantization interval for the entire ANN (one quantization interval used for all layers of the ANN) or for multiple ANNs. In these cases, the number of saturations is determined for the neurons in all layers of the ANN or of the multiple ANNs.
- the method 500 does not require knowledge of the original interval of input data in the first data type (for example real numbers) for any of the layers of the ANN. Therefore, the method 500 does not require computations involving data in the first data type.
- the method 500 can be performed on hardware configured to perform only reduced precision calculations.
- the reduced precision calculations may include calculations using only the second data type, for example fixed-point calculations and/or integer calculations.
- the method 500 can be used to determine a quantization interval individually for each filter and/or each activation map in a convolutional neural network if the number of saturations is determined per filter or per activation map.
- FIG. 6 is a schematic diagram showing a workflow 600 for adjustment of a quantization interval, according to some example embodiments of the present disclosure.
- inputs (to neurons) v[i] are located in the interval [-B; -A]. If the range of quantization is selected as [-B; B] and the range [-B; B] is divided into equal intervals, the inputs will be represented by only part of the available integers, which may lead to a loss of precision in ANN computations. Therefore, prior to quantization, the inputs v[i] can optionally be scaled by a factor S and then shifted by a shift D, to position the range of the inputs v[i] symmetrically with respect to zero, in a range [-C; C].
- the range [-C; C] can be further divided into L equal intervals whose length is the quantization step Q (so that Q = 2C/L), wherein L is the number of integers used to represent the inputs v[i].
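As an illustration of FIG. 6, the sketch below scales inputs by S, shifts them by D so that their range is centered on zero, and splits [-C; C] into L equal steps of length Q = 2C/L. Deriving D and C from the observed minimum and maximum, and the signed code layout, are assumptions made for this example; `center_and_quantize` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def center_and_quantize(values: Sequence[float],
                        scale: float = 1.0,
                        num_levels: int = 256) -> Tuple[List[int], float, float]:
    """Scale by S, shift by D to center the range on zero, then quantize with step Q.

    Returns the signed integer codes, the shift D, and the step Q.
    """
    scaled = [scale * v for v in values]          # optional scaling by S
    lo, hi = min(scaled), max(scaled)
    half_width = (hi - lo) / 2                    # C, half-width of [-C; C]
    if half_width == 0:
        raise ValueError("inputs span an empty range")
    shift = -(lo + hi) / 2                        # D, moves the midpoint to zero
    step = 2 * half_width / num_levels            # Q = 2C / L
    offset = num_levels // 2
    codes = []
    for x in scaled:
        k = math.floor((x + shift + half_width) / step)   # interval index, 0..L
        codes.append(min(k, num_levels - 1) - offset)     # signed code in [-L/2, L/2 - 1]
    return codes, shift, step
```

With [-B; -A] = [-4; -2], `center_and_quantize([-4.0, -3.0, -2.0])` spreads the inputs over the full code range -128..127. Quantizing the symmetric range [-4; 4] with the same 256 levels would instead use only the codes from -128 to -64 for these inputs.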
- FIG. 7 is a schematic diagram showing a workflow 700 for adjustment of a quantization interval for ANNs, according to some example embodiments of the present disclosure.
- inputs to neurons are located within two subintervals [-B; -A] and [C; D].
- the subintervals [-B; -A] and [C; D] may include different numbers of inputs.
- the subinterval [-B; -A] may include M inputs v[i] and the subinterval [C; D] may include N inputs v[j], wherein M > N.
- the values -A, -B, C and D can be positive or negative.
- the inputs in ranges [-B; -A] and [C; D] can be shifted by different shifts D1 and D2.
- ranges [-B; -A] and [C; D] can form a new range [-F; F].
- a number of integers representing inputs v[i] and a number of integers representing inputs v[j] can be selected to be proportional to the number M of inputs v[i] and the number N of inputs v[j], respectively.
- a quantization step for the subinterval [-B; -A] can be different from a quantization step for the subinterval [C; D].
- the inputs v[j] from the subinterval [C; D] can be mapped to a first subset of integers L1 and the inputs v[i] from the subinterval [-B; -A] can be mapped to a second subset of integers L2.
- although FIGS. 4, 6, and 7 describe mapping inputs to neurons to integers, a similar approach can be used for mapping, to integers, weights of inputs to the neurons and other parameters of
- an adjustment of a quantization interval may include determining one or more ranges of inputs to neurons and/or weights to be quantized, and the numbers of inputs to neurons and/or weights within those ranges. Quantization steps and quantization levels for the ranges can then be determined based on the numbers of the inputs to neurons and/or weights within the ranges.
- the numbers of the quantization levels for the ranges can be selected to be proportional to the numbers of the inputs to neurons and/or weights within the ranges.
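A sketch of the multi-range scheme of FIG. 7 and the paragraph above: each range gets its own quantization step, and the number of integer levels assigned to a range is proportional to how many inputs fall into it. Flooring each proportional share, guaranteeing at least one level per range, giving the remainder to the most populated range, and placing codes in ascending range order are all assumptions; the source does not specify a rounding rule or code ordering.

```python
from typing import Callable, List, Sequence, Tuple


def build_piecewise_quantizer(ranges: Sequence[Tuple[float, float]],
                              counts: Sequence[int],
                              total_levels: int = 256
                              ) -> Tuple[Callable[[float], int], List[int]]:
    """Map values from disjoint ranges to contiguous blocks of signed integers.

    ranges must be sorted and non-overlapping; counts[k] is the number of
    inputs observed in ranges[k].
    """
    total = sum(counts)
    levels = [max(1, total_levels * n // total) for n in counts]
    levels[counts.index(max(counts))] += total_levels - sum(levels)  # remainder

    tables = []
    first_code = -(total_levels // 2)
    for (lo, hi), n in zip(ranges, levels):
        tables.append((lo, hi, (hi - lo) / n, first_code, n))  # per-range step
        first_code += n

    def quantize(v: float) -> int:
        for lo, hi, step, first, n in tables:
            if lo <= v <= hi:
                return first + min(int((v - lo) // step), n - 1)
        raise ValueError(f"{v} lies outside every quantization range")

    return quantize, levels
```

With M = 3 inputs in [-4; -2] and N = 1 input in [1; 2], `build_piecewise_quantizer([(-4.0, -2.0), (1.0, 2.0)], [3, 1])` assigns 192 of the 256 codes to the more populated range, so that range gets a finer quantization step.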
- FIG. 8 is a flow chart illustrating a method 800 for quantization of ANNs, in accordance with some example embodiments.
- the operations can be combined, performed in parallel, or performed in a different order.
- the method 800 may also include additional or fewer operations than those illustrated.
- the method 800 may be performed by the system 100 described above with reference to FIG. 1.
- the method 800 may commence with receiving, by one or more processors, a description of an ANN and input data associated with the ANN, wherein the description of the ANN is represented according to a first data type.
- the description of the ANN may include parameters of the ANN, for example weights.
- the method 800 may determine, by the one or more processors, a first interval of the first data type to be mapped to a second interval of a second data type.
- the first data type may include a floating-point data type and the second data type may include a fixed-point data type.
- the method 800 may perform, by the one or more processors and based on the input data and the description of the ANN, computations of one or more neurons of the ANN.
- the computations are performed for at least one value within the second interval, wherein the at least one value is a result of mapping at least one value of the first interval to a value of the second interval.
- the method 800 may determine, by the one or more processors, a measure of saturations in the one or more neurons of the ANN.
- the measure of saturations can be based on a count of saturations in all neurons of the ANN or based on a count of saturations in neurons belonging to a subset of layers of the ANN.
- the method 800 may proceed with adjusting, by the one or more processors and based on the measure of saturations, at least one of the first interval or the second interval.
- FIG. 9 illustrates an example computing system 900 that may be used to implement embodiments described herein.
- the processor 910 may include internal accelerators like a graphical processing unit, a Field Programmable Gate Array, or similar accelerators that may be suitable for use with embodiments described herein.
- the memory 920 may include internal accelerators like a graphical processing unit, a Field Programmable Gate Array, or similar accelerators that may be suitable for use with embodiments described herein.
- the example computing system 900 of FIG. 9 may further include a mass storage 930, portable storage 940, one or more output devices 950, one or more input devices 960, a network interface 970, and one or more peripheral devices 980.
- the components shown in FIG. 9 are depicted as being connected via a single bus 990.
- the components may be connected through one or more data transport means.
- the one or more processors 910 and memory 920 may be connected via a local microprocessor bus, and the mass storage 930, one or more peripheral devices 980, portable storage 940, and network interface 970 may be connected via one or more input/output buses.
- Mass storage 930, which may be implemented with a magnetic disk drive, an optical disk drive, or a solid-state drive (SSD), is a non-volatile storage device for storing data and instructions for use by the one or more processors 910. Mass storage 930 can store the system software for implementing embodiments described herein for purposes of loading that software into memory 920.
- the mass storage 930 may also include internal accelerators like a graphical processing unit, a Field Programmable Gate Array, or similar accelerators that may be suitable for use with embodiments described herein.
- Portable storage 940 may operate in conjunction with a portable non-volatile storage medium, such as a compact disk (CD) or digital video disc (DVD), to input and output data and code to and from the computing system 900 of FIG. 9.
- the system software for implementing embodiments described herein may be stored on such a portable medium and input to the computing system 900 via the portable storage 940.
- One or more input devices 960 provide a portion of a user interface.
- the one or more input devices 960 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, a stylus, or cursor direction keys.
- the computing system 900 as shown in FIG. 9 includes one or more output devices 950. Suitable output devices 950 include speakers, printers, network interfaces, and monitors.
- Network interface 970 can be utilized to communicate with external devices, external computing devices, servers, and networked systems via one or more
- Network interface 970 may be a network interface card, such as an Ethernet card, optical transceiver, radio frequency transceiver, or any other type of device that can send and receive information.
- Other examples of such network interfaces may include Bluetooth®, 3G, 4G, and WiFi® radios in mobile computing devices as well as a USB.
- One or more peripheral devices 980 may include any type of computer support device to add additional functionality to the computing system.
- the one or more peripheral devices 980 may include a modem or a router.
- the example computing system 900 of FIG. 9 may also include one or more accelerator devices 985.
- the accelerator devices 985 may include PCIe-form-factor boards or storage-form-factor boards, or any electronic board equipped with a specific electronic component like a Graphical Processing Unit, a Neural Processing Unit, a Multi-CPU component, a Field Programmable Gate Array component, or similar electronic or photonic accelerator components, that may be suitable for use with embodiments described herein.
- the components contained in the exemplary computing system 900 of FIG. 9 are those typically found in computing systems that may be suitable for use with embodiments described herein and are intended to represent a broad category of such computer components that are well known in the art.
- the exemplary computing system 900 of FIG. 9 can be a personal computer, handheld computing device, telephone, mobile computing device, workstation, server, minicomputer, mainframe computer, or any other computing device.
- the computer can also include different bus configurations, networked platforms, multi-processor platforms, and so forth.
- Various operating systems (OS) can be used including UNIX, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems.
- Some of the above-described functions may be composed of instructions that are stored on storage media (e.g., computer-readable medium).
- the instructions may be retrieved and executed by the processor.
- examples of storage media are memory devices, tapes, disks, and the like.
- the instructions are operational when executed by the processor to direct the processor to operate in accord with the example
- Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk.
- Volatile media include dynamic memory, such as RAM.
- Transmission media include coaxial cables, copper wire, and fiber optics, among others, including the wires that include one embodiment of a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency and infrared data communications.
- Computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, SSD, a CD-ROM disk, DVD, any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a PROM, an EPROM, an EEPROM, a FLASH EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.
- Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a CPU for execution.
- a bus carries the data to system RAM, from which a CPU retrieves and executes the instructions.
- the instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU.
- the instructions or data may not be used by the CPU at all, but instead be read or written by other devices without the CPU directing the access.
- the quantization scheme can be adjusted based on the measure of saturations of sums of products (equation (2)) rather than the measure of saturations of outputs of a neuron (equation (1)).
- a measure of saturations can be obtained before the transfer function of equation (1) is calculated, by determining whether, after mapping to the second data type, the value of the neuron (which at this point is just a sum of products) is close to the maximum or the minimum of the second interval of the second data type.
- FIG. 10 is a flow chart showing steps of a method 1000 for quantization of ANNs, according to some example embodiments.
- the operations can be combined, performed in parallel, or performed in a different order.
- the method 1000 may also include additional or fewer operations than those illustrated.
- the method 1000 can be performed by the system 100 described above with reference to FIG. 1.
- the inputs to the neurons and description of the ANN can be of a first data type.
- the plurality of p neurons of the ANN can include one of: all neurons of the ANN, neurons of the same layer of the ANN, and neurons of at least two different layers of the ANN.
- the method 1000 can determine, by the one or more processors, a first interval of the first data type to be mapped to a second interval of a second data type.
- the first data type may include a floating-point data type and the second data type may include a fixed-point data type.
- the computations of sums are performed using at least one number of the second data type within the second interval, wherein the at least one number is a result of mapping of at least one number of the first interval to a number of the second interval.
- the result of mapping can be a result of mapping of at least one of the inputs
- the method 1000 may determine, by the one or more processors, a measure of saturations of the set of sum results.
- the measure of saturations can be a function of sum results in the set of sum results, wherein, in turn, the sum results depend on the result of mapping of at least one number of the first interval to a number of the second interval.
- the measure of saturations can be a count of saturated sum results in the set of sum results. In another embodiment, the measure of saturations can be a difference between the number of elements in the set of sum results and the count of saturated sum results.
- the measure of saturations can be determined as a ratio of the count of saturated sum results to the number of elements in the set of sum results.
- the measure of saturations can be determined by a mathematical function based on the set of sum results. The mathematical function can be calculated based on the sum results that are close to a saturated number (the minimum number or the maximum number in the second interval). In some embodiments, the measure of saturations can be a
- products W × V can be computed using the corresponding numbers W' and V' of the second interval of the second data type, wherein W' is a result of mapping W to the second interval and V' is a result of mapping V to the second interval.
- the sum results can also be represented by the second data type, wherein the determination of the measure of saturations can include comparing at least one of the sum results to boundaries of the second interval.
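The following sketch computes sums of products entirely in the second data type and derives the count, complement, and ratio measures listed above. Treating int8 as the second data type and accumulating with saturating (clamping) addition are assumptions for illustration; `saturating_dot` and `saturation_measures` are hypothetical names.

```python
from typing import Dict, Sequence

INT8_MIN, INT8_MAX = -128, 127  # boundaries of the second interval (assumed int8)


def saturating_dot(weights: Sequence[int], inputs: Sequence[int]) -> int:
    """Sum of products W' x V' kept in int8 range by clamping after each step."""
    acc = 0
    for w, v in zip(weights, inputs):
        acc = max(INT8_MIN, min(INT8_MAX, acc + w * v))
    return acc


def saturation_measures(sum_results: Sequence[int]) -> Dict[str, float]:
    """Count, complement, and ratio of sum results that reached a boundary."""
    n = len(sum_results)
    count = sum(1 for s in sum_results if s <= INT8_MIN or s >= INT8_MAX)
    return {"count": count, "unsaturated": n - count, "ratio": count / n}
```

Because every partial sum is clamped, a later term of opposite sign is subtracted from the clamped value rather than from the true sum. That is one reason to accumulate in a wider third data type, as the next paragraphs describe.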
- products W' × V' can be computed using
- the sum results can be represented by a third data type that is different from the second data type, wherein the determination of the measure of saturations can include comparing at least one of the sum results to one or more thresholds of the third data type.
- the second data type can be a K-bit fixed-point data type and the third data type can be an L-bit fixed-point data type, wherein L is an integer greater than K.
- the products W' × V' are mapped from the second data type to the third data type.
- the mapping of a number of the second data type onto a number of the third data type can be based on division, with different rounding strategies, of the boundaries of the third data type by a constant number.
- the mapping of the second data type onto the third data type can be based on bitwise shifting.
- the mapping of a number of the second data type onto a number of the third data type can be based on a combination of divisions, additions, subtractions, multiplications, and bitwise shifting of a number of the second data type.
- the mapping of a number of the second data type onto a number of the third data type can include computing a mathematical function based on numbers from the second or third data types.
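The mappings listed above rely on division with a chosen rounding strategy or on bitwise shifting. The sketch below applies these operations in the direction used later by method 1100, reducing a wide sum result back toward the K-bit type. It assumes a 32-bit accumulator as the L-bit third data type, clamps the accumulator instead of letting it wrap, and uses round-half-away-from-zero for division; the function names are hypothetical.

```python
from typing import Sequence


def accumulate_wide(weights: Sequence[int], inputs: Sequence[int], bits: int = 32) -> int:
    """Sum of products W' x V' held in an L-bit signed accumulator (clamped, by assumption)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, sum(w * v for w, v in zip(weights, inputs))))


def map_by_shift(x: int, shift: int) -> int:
    """Bitwise mapping: arithmetic right shift, i.e. floor(x / 2**shift)."""
    return x >> shift


def map_by_division(x: int, divisor: int) -> int:
    """Division by a positive constant with round-half-away-from-zero."""
    q, r = divmod(abs(x), divisor)
    if 2 * r >= divisor:
        q += 1
    return q if x >= 0 else -q
```

Note that Python's `>>` on a negative integer rounds toward negative infinity, so `map_by_shift` and `map_by_division` can differ by one for negative inputs. Which rounding a given accelerator uses is a hardware choice the source leaves open.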
- the determination of the measure of saturations can further include determining that the at least one of the sum results is within boundaries of the second interval and comparing the at least one of the sum results to one or more further thresholds of the second data type. Because the sum results are not modified by the transfer function, this may result in a more accurate determination of the measure of saturations of the neural network, or of part of the neural network, than a determination of the measure of saturations based on outputs of neurons.
- the method 1000 can proceed with adjusting, by the one or more processors and based on the measure of saturations, at least one of the first interval and the second interval.
- the one or more processors can repeat operations in blocks 1006, 1008, and 1010 until the measure of saturations satisfies one or more criteria.
- the at least one of the first interval and the second interval can be adjusted to cause the measure of saturations to fall within a pre-determined range.
- the one or more processors can include at least one electronic component accelerating the computations of products and sums.
- FIG. 11 is a flow chart showing steps of a method 1100 for determining saturations of a sum of products, according to an example embodiment.
- the method 1100 can be performed by the system 100 described above with reference to FIG. 1.
- the method 1100 may provide some details of operations in blocks 1006 and 1008 of the method 1000.
- the method 1100 may commence, in block 1105, with performing
- the method 1100 may accumulate the products W' × V' into a sum result.
- the sum result is represented by an intermediate data type.
- the intermediate data type can be referred to as a third data type, wherein the third data type is different from the second data type and typically corresponds to a bigger integer type.
- the sum result can be reduced to the second data type prior to applying the transfer function F(x).
- the measure of saturations can be determined based on the third data type.
- the method 1100 can compare the sum result with first boundaries represented by the third data type. If the sum result exceeds the first boundaries, method 1100 can proceed to block 1130 with a determination that the sum result is saturated.
- the method 1100 can compare the result of mapping the sum result onto the second data type with the minimum and maximum of the second data type. If the mapped value falls outside the range between the minimum and the maximum of the second data type, the method 1100 proceeds to block 1130 with a determination that the sum result is saturated. Otherwise, the method 1100 proceeds to block 1125.
- the method 1100 can compare the result of mapping the sum result from the third data type onto the second data type with second boundaries represented by the second data type. If the mapped value exceeds the second boundaries, the method 1100 proceeds to block 1130 with a determination that the sum result is saturated. Otherwise, the method 1100 proceeds to block 1135, where it is determined that the sum result is not saturated.
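A minimal sketch of the decision flow of FIG. 11, assuming a wide integer accumulator as the third data type, int8 as the second data type, and an arithmetic right shift as the mapping between them. `sum_is_saturated` is a hypothetical name, and the threshold values used in the usage example are illustrative only; the source leaves the first and second boundaries unspecified.

```python
from typing import Tuple


def sum_is_saturated(acc: int,
                     first_bounds: Tuple[int, int],
                     shift: int,
                     second_bounds: Tuple[int, int],
                     k_bits: int = 8) -> bool:
    """Classify one sum result, following the checks described for method 1100."""
    lo3, hi3 = first_bounds
    if acc < lo3 or acc > hi3:             # first boundaries (third data type)
        return True                        # block 1130: saturated
    mapped = acc >> shift                  # map third -> second data type (assumed shift)
    kmin, kmax = -(1 << (k_bits - 1)), (1 << (k_bits - 1)) - 1
    if mapped < kmin or mapped > kmax:     # outside the second data type
        return True                        # block 1130: saturated
    lo2, hi2 = second_bounds
    if mapped < lo2 or mapped > hi2:       # second boundaries (second data type)
        return True                        # block 1130: saturated
    return False                           # block 1135: not saturated
```

For example, with first boundaries of plus or minus 2**20, a shift of 8, and second boundaries of (-120, 120), a sum result of 40000 maps to 156. That value lies outside the int8 range, so it counts as saturated even though it passed the first check.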
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/IB2019/050648 WO2020152504A1 (fr) | 2019-01-26 | 2019-01-26 | Quantification générique de réseaux neuronaux artificiels |
US16/258,552 US11068784B2 (en) | 2019-01-26 | 2019-01-26 | Generic quantization of artificial neural networks |
PCT/IB2020/050423 WO2020152571A1 (fr) | 2019-01-26 | 2020-01-20 | Quantification générique de réseaux neuronaux artificiels |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3915057A1 true EP3915057A1 (fr) | 2021-12-01 |
Family
ID=69526291
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20704358.9A Pending EP3915057A1 (fr) | 2019-01-26 | 2020-01-20 | Quantification générique de réseaux neuronaux artificiels |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3915057A1 (fr) |
WO (1) | WO2020152571A1 (fr) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10373050B2 (en) * | 2015-05-08 | 2019-08-06 | Qualcomm Incorporated | Fixed point neural network based on floating point neural network quantization |
-
2020
- 2020-01-20 WO PCT/IB2020/050423 patent/WO2020152571A1/fr active Application Filing
- 2020-01-20 EP EP20704358.9A patent/EP3915057A1/fr active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2020152571A1 (fr) | 2020-07-30 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11625583B2 (en) | Quality monitoring and hidden quantization in artificial neural network computations | |
US20200226458A1 (en) | Optimizing artificial neural network computations based on automatic determination of a batch size | |
US20200242445A1 (en) | Generic quantization of artificial neural networks | |
US20200311511A1 (en) | Accelerating neuron computations in artificial neural networks by skipping bits | |
EP3924891A1 (fr) | Surveillance de qualité et quantification cachée dans des calculs de réseau neuronal artificiel | |
US10990525B2 (en) | Caching data in artificial neural network computations | |
US11068784B2 (en) | Generic quantization of artificial neural networks | |
US11494624B2 (en) | Accelerating neuron computations in artificial neural networks with dual sparsity | |
US11568255B2 (en) | Fine tuning of trained artificial neural network | |
US10769527B2 (en) | Accelerating artificial neural network computations by skipping input values | |
EP3915057A1 (fr) | Quantification générique de réseaux neuronaux artificiels | |
EP3948685A1 (fr) | Accélération de calculs neuronaux dans des réseaux neuronaux artificiels par saut de bits | |
EP3915055A1 (fr) | Quantification générique de réseaux neuronaux artificiels | |
US11989653B2 (en) | Pseudo-rounding in artificial neural networks | |
US11645510B2 (en) | Accelerating neuron computations in artificial neural networks by selecting input data | |
US11126912B2 (en) | Realigning streams of neuron outputs in artificial neural network computations | |
WO2020121030A1 (fr) | Mise en mémoire cache de données dans des calculs de réseau neuronal artificiel | |
US20220222519A1 (en) | Optimizing operations in artificial neural network | |
WO2021234437A1 (fr) | Pseudo-arrondissement dans des réseaux neuronaux artificiels | |
EP3953867A1 (fr) | Accélération de calculs neuronaux dans des réseaux neuronaux artificiels par sélection des données d'entrée | |
WO2022053851A1 (fr) | Réglage de précision d'un réseau neuronal artificiel entraîné | |
EP3895073A1 (fr) | Réalignement de flux de sorties de neurone dans des calculs de réseau de neurones artificiels | |
WO2022153078A1 (fr) | Optimisation d'opérations dans un réseau de neurones artificiels | |
WO2020121023A1 (fr) | Accélération de calculs de réseau neuronal artificiel par saut de valeurs d'entrée | |
WO2020144493A1 (fr) | Optimisation de calculs de réseau neuronal artificiel sur la base d'une détermination automatique d'une taille de lot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20210826 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20240322 |