WO2021052677A1 - Fast quantized training of trainable modules - Google Patents
Fast quantized training of trainable modules
- Publication number
- WO2021052677A1 (PCT/EP2020/072158)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- values
- parameters
- training
- discrete
- output variables
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- the present invention relates to the training of trainable modules, especially for use in control devices for vehicles or in other embedded systems.
- ANN artificial neural network
- An artificial neural network is a processing chain that usually contains a plurality of layers of neurons. Each neuron combines a plurality of input variables with weights to form an activation. The activations formed in a layer, or a result derived from them by further processing, are in each case fed to the next layer, until the ANN has been passed through completely and one or more output variables arise.
- the ANN therefore maps values of the input variables to values of the output variables in accordance with the internal processing chain.
- WO 2018/158043 A1 discloses a method for coding numerical values in an ANN, in which the most significant bit is reserved specifically for coding the value zero. In this way, it is particularly quick to check whether the value is zero.
- the trainable module maps one or more input variables to one or more output variables using an internal processing chain.
- the internal processing chain is characterized by a set of parameters.
- a trainable module is viewed in particular as a module that embodies a function parameterized with adaptable parameters with ideally great power for generalization.
- the parameters can in particular be adapted in such a way that when learning input variables are input into the module, the values of associated learning output variables are reproduced as well as possible.
- the internal processing chain can in particular comprise, for example, an artificial neural network, ANN, or it can also be an ANN.
- the parameters can then include weights with which neurons each combine a plurality of input variables to form an activation.
- At least one learning data set is provided which comprises learning values of the input variables and associated learning values of the output variables.
- a large number of learning data sets are made available, which record many variants of situations presented at the input that the trainable module is supposed to deal with.
- a list of discrete values is provided from which the parameters characterizing the internal processing chain are to be selected during training. These discrete values are chosen such that they can be stored as fixed-point numbers with a predetermined number N of bits without loss of quality.
- a discretization of the model parameters generally leads to a reduced memory requirement of the ANN. If these discrete values can also be expressed losslessly as fixed-point numbers, an efficient implementation on fixed-point hardware can be realized. Such fixed-point hardware is significantly cheaper, more energy-efficient and more space-saving than hardware for floating-point calculations.
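- As a hedged illustration of what "storable as a fixed-point number with N bits without loss of quality" means, the following sketch enumerates the value grid of a signed N-bit fixed-point format; the function name and the split into total and fractional bits are illustrative assumptions, not taken from the patent text.

```python
def fixed_point_grid(n_bits: int, frac_bits: int) -> list:
    """All values storable losslessly with n_bits total and frac_bits
    fractional bits: integer mantissas scaled by the step 2**-frac_bits."""
    delta = 2.0 ** -frac_bits
    lo = -(2 ** (n_bits - 1))        # smallest signed mantissa
    hi = 2 ** (n_bits - 1) - 1       # largest signed mantissa
    return [m * delta for m in range(lo, hi + 1)]

# With N = 2 bits and one fractional bit, only four values exist:
print(fixed_point_grid(2, 1))  # [-1.0, -0.5, 0.0, 0.5]
```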
- the learning values of the input variables contained in the learning data set are mapped to assessment values of the output variables by the trainable module.
- a predefined cost function is now evaluated, which characterizes both a deviation of the assessment values of the output variables from the learning values of the output variables contained in the learning data set and a deviation of at least one parameter of the internal processing chain from at least one discrete value in the list.
- At least one parameter of the internal processing chain is adjusted with the aim of improving the value of the cost function.
- the range of values of the parameters and / or a gradient of the cost function is limited using the discrete values.
- the cost function can be a sum, for example.
- the first summand can characterize the deviation of the assessment values of the output variables from the learning values of the output variables contained in the learning data set.
- the second summand can include at least one penalty variable which characterizes the deviation of the at least one parameter of the internal processing chain from the at least one discrete value in the list.
- the sum can optionally also be weighted.
- the weighting can be carried out individually for each layer. For example, a weighting can be used that is inversely proportional to the number of parameters of precisely this layer.
- the adaptation of the at least one parameter of the internal processing chain can in particular be aimed at optimizing, for example minimizing, the value of the cost function. This optimization then represents a simultaneous improvement both in terms of the optimal reproduction of the knowledge contained in the learning data sets and in terms of compliance with the desired discretization of the parameters.
- a gradient descent method or any other desired optimization method can be used which, based on the value of the cost function, suggests changes to one or more parameters that are likely to improve the value of the cost function in the further course of the training. “Probably” in this context means that not every training step inevitably leads to an improvement in the value of the cost function. In the course of the training there can also be “missteps” that instead worsen the value of the cost function. The optimization process learns from these "missteps” so that the initial deterioration is ultimately turned into an improvement.
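- A minimal sketch of one such training step, assuming a quadratic discretization penalty and an externally supplied task-loss gradient (both assumptions; the text above does not fix the concrete form of the cost function or of the optimizer):

```python
import numpy as np

def nearest_discrete(w: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """For each parameter, the closest discrete value from the list."""
    return grid[np.abs(w[..., None] - grid).argmin(axis=-1)]

def training_step(w: np.ndarray, grid: np.ndarray, grad_task: np.ndarray,
                  lam: float, lr: float) -> np.ndarray:
    """One gradient-descent step on the combined cost L + lam * L_R.

    grad_task is the gradient of the task component L with respect to w,
    assumed to come from autodiff. L_R is modeled here as the squared
    deviation of each parameter from its nearest discrete value, one
    plausible choice and not necessarily the patent's formula.
    """
    grad_reg = 2.0 * (w - nearest_discrete(w, grid))  # d/dw of (w - w_q)**2
    return w - lr * (grad_task + lam * grad_reg)
```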
- the number N of bits fixes the maximum number of possible discrete values at 2^N.
- values of N between 2 and 7, preferably between 2 and 5 and very particularly preferably between 2 and 3, have proven to be particularly advantageous.
- the last-mentioned narrowest range in particular is counter-intuitive, because prima facie the impression arises that this discretization also propagates into the output variables supplied by the trainable module as a whole, and that these are thus significantly coarsened. In practical applications, however, this is not the case: owing to the large number of parameters available, the trainable module is still quite able to differentiate the knowledge presented in the form of the learning data sets.
- the number N of bits can be used as an adjusting screw to adapt the training of one and the same basic architecture for a trainable module to different applications. If the basic architecture remains the same, this means that the various applications can be implemented on hardware with a high proportion of identical parts, which in turn simplifies production and makes it cheaper.
- the list of discrete values can come from any source. It can be determined, for example, on the basis of prior knowledge about the application in which the trainable module is to be operated. However, there are also possibilities of automatically defining the list of discrete values in whole or in part if such prior knowledge is incomplete or not available.
- the list of discrete values is determined on the basis of the values of the parameters of the internal processing chain obtained during pre-training of the trainable module.
- This pre-training can in particular be carried out using floating point numbers for the parameters, that is to say without quantizing or otherwise restricting the values that the parameters can assume.
- the learning values of the input variables contained in at least one learning data set are mapped to assessment values of the output variables by the trainable module.
- a predetermined pre-training cost function is evaluated which characterizes a deviation of the assessment values of the output variables from the learning values of the output variables contained in the learning data set.
- At least one parameter of the internal processing chain is adjusted with the aim of improving, in particular optimizing, the value of the pre-training cost function.
- the pre-training cost function can in particular, for example, characterize the deviation of the assessment values of the output variables from the learning values of the output variables contained in the learning data set in the same way as the previously described predetermined cost function does.
- if this predetermined cost function is a sum, the summand that characterizes the deviation of the assessment values of the output variables from the learning values of the output variables contained in the learning data set can, for example, be taken over unchanged as the pre-training cost function.
- a range of values can be determined in which the parameters are located.
- the discrete values of the list can then be determined as a function of this range. They can, for example, be distributed equidistantly over this range and/or be determined from accumulation points of the parameter values in this range.
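- A hedged sketch of this determination, assuming the simple equidistant variant (accumulation points could instead be located, e.g., with a histogram; the function name is an illustrative assumption). In practice the resulting grid would additionally be snapped to exactly representable fixed-point values; that step is omitted here.

```python
import numpy as np

def grid_from_pretraining(pretrained: np.ndarray, n_bits: int) -> np.ndarray:
    """Spread the 2**n_bits discrete values equidistantly over the value
    range actually occupied by the pre-trained parameters."""
    lo, hi = float(pretrained.min()), float(pretrained.max())
    return np.linspace(lo, hi, num=2 ** n_bits)
```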
- the number N of bits and/or the list of discrete values can also be adapted during the actual training with the aim of improving, in particular optimizing, the value of the cost function.
- This adaptation can thus be integrated into the normal training of the trainable module. If the cost function depends both on the error that the trainable module makes in processing the learning values of the input variables and on the discretization error of the parameters, then both types of error can be the reason why the value of the cost function is poor. It is then logical for the trainable module to be able to learn to correct the discretization error itself, in the same way as it corrects excessive deviations from the learning values of the output variables.
- the discrete values in the list are distributed uniformly, i.e. symmetrically, around zero. This means that the values are distributed around 0 with a constant spacing (step size) D, where the step size D is an arbitrary power of two.
- Discrete values of this type are characterized by the fact that arithmetic operations with them can be implemented particularly easily on fixed-point hardware. Scaling by the step size D can be implemented as a simple bitwise shift operation.
- the discrete values in the list can be integers.
- numerically adjacent discrete values can each differ by a step size D, which is two raised to the power of a non-negative integer.
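- A small sketch of why a power-of-two step size D = 2^-k makes scaling cheap on fixed-point hardware; the shift-based helpers below are illustrative assumptions, not the patent's implementation.

```python
def mul_by_step(mantissa: int, k: int) -> int:
    """Multiply the represented value by D = 2**-k: an arithmetic right
    shift (Python's >> floors, matching common fixed-point hardware)."""
    return mantissa >> k

def div_by_step(mantissa: int, k: int) -> int:
    """Divide by D = 2**-k, i.e. multiply by 2**k: a left shift."""
    return mantissa << k

# 12 * 2**-2 = 3; the shifts replace explicit multiplications.
assert mul_by_step(12, 2) == 3
assert div_by_step(3, 2) == 12
```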
- the parameters are typically successively adapted in many update steps in order to ultimately arrive at a set of parameters for which the value of the cost function is optimal.
- part of the information acquired during training can otherwise be lost, in a manner similar to an overdriven image sensor that can no longer distinguish between very large brightness values and instead only outputs its maximum saturation value.
- This tendency is counteracted by setting parameters outside the permitted interval to the corresponding interval limits. This can take place particularly advantageously with each adjustment of the parameters as part of the training.
- Significantly fewer training epochs are then required overall to train the ANN in such a way that the trained ANN achieves a predetermined accuracy in a test with validation data.
- a parameter can have a negative value in one training epoch and change to a positive value in a later training epoch. If, for example, the permitted discrete values for a parameter are -1, 0 and +1, and the parameter was driven to -10 in the first training epoch while the tendency in a later training epoch is towards +2, then the strong negative deflection of -10 in connection with the discretization leads to the trend reversal being completely "dampened". This is avoided by truncating the negative deflection at -1.
- Limiting the parameters to the permitted interval during training has the further effect that less computing time is spent on calculating intermediate results that are not reflected in the parameters determined as the end result of the training. If the permitted discrete values for a parameter are, for example, -1, 0 and +1, then there is no point in optimizing this parameter in several steps to -3, then to -3.141 and finally to -3.14159. Limiting the parameters at an early stage restricts the search space from the outset to what is ultimately required. The effect is to some extent comparable to the fact that passwords can be cracked much faster if an originally very large search space (say, 15 characters from the entire available character set) is restricted, with prior knowledge of the user's bad habits, to "six lowercase letters", for example.
- the solution space is given by the list of discrete values that the parameters can assume after training. It would therefore be inefficient to stray far from this solution space during training.
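- A minimal sketch of the value-range limitation described above, assuming the parameters are held in a NumPy array; applied after every update step, it truncates values at the interval limits given by the lowest and highest discrete value.

```python
import numpy as np

def clamp_to_interval(w: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Truncate parameters at the limits of the permitted interval."""
    return np.clip(w, grid.min(), grid.max())
```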
- a gradient of the cost function, expressed in the parameters as variables, is determined in the direction of better values of the cost function.
- the parameters are changed by a product of the gradient and a step size (gradient descent or gradient ascent method).
- the cost function contains a weighted sum of a first contribution, which characterizes the deviation of the assessment values of the output variables from the learning values of the output variables, and a second contribution, which characterizes a deviation of at least one parameter of the internal processing chain from at least one discrete value in the list.
- as the training progresses, the weighting of the first contribution is decreased and the weighting of the second contribution is increased.
- the speed or rate at which this change in weighting is carried out can increase exponentially as the training progresses.
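- One plausible schedule for such a reweighting (the concrete per-epoch rates are illustrative assumptions, not values from the patent):

```python
def contribution_weights(epoch: int, decay: float = 0.97,
                         growth: float = 1.03) -> tuple:
    """Per-epoch exponential reweighting of the two cost contributions:
    the task weight shrinks while the discretization weight grows."""
    return decay ** epoch, growth ** epoch
```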
- model capacity depends on the number of possible combinations of values of the parameters, so it is a measure of the dimensionality of a space spanned by the parameters.
- too high a model capacity can give the training a tendency to “overfitting”, which is essentially a “memorization” of the situations represented by the training data. This “memorized” knowledge is difficult to generalize to new situations that did not arise in training.
- a restriction of the model capacity during training can therefore avoid overfitting to the training data, especially in an advanced phase in which the training is, so to speak, being perfected, and can therefore be very useful.
- This restriction only finds its practical limit at the point at which the ANN can no longer cope with the complexity of the problem posed due to the insufficient model capacity.
- the end result of the training can become a sensible combination of convergence in the sense of the learning task on the one hand and discretization on the other.
- the emphasis is placed on the learning task of the ANN in order to achieve the first important learning successes as early as possible.
- the comparison with human learning processes shows that this improves the accuracy of the final result: If you start studying with a large model capacity and achieve the important learning successes in the fundamentals of your subject, you will be able to fall back on this again and again later and get a good degree at the end. But if you are already struggling with insufficient model capacity in this phase, you will always lag behind.
- the initial learning success is “preserved”, so to speak, and the training turns towards the goal of discretization.
- unlike a strictly two-stage training, in which the first stage focuses exclusively on the intended application and the second stage exclusively on discretization, the respective other goal is never completely lost sight of. In this way it is avoided in particular that the training with regard to the discretization comes too much at the expense of the previously acquired learning success with regard to the intended application.
- each of the parameters is set to that discrete value from the list to which it is closest.
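- A hedged sketch of this final snapping step (mirroring the nearest-value helper above; names are illustrative):

```python
import numpy as np

def snap_to_grid(w: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Set every parameter to the discrete value from the list it is
    closest to."""
    return grid[np.abs(w[..., None] - grid).argmin(axis=-1)]

grid = np.array([-1.0, 0.0, 1.0])
print(snap_to_grid(np.array([-0.7, 0.1, 2.3]), grid))  # [-1.  0.  1.]
```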
- the trainable module can learn to deal with the restrictions that the discretization entails in the best possible way and still deliver maximum accuracy.
- the trainable module is validated after the parameters have been set to the discrete values. This is used to check whether the trainable module has really "understood" the knowledge contained in the learning data sets and can apply it to new situations, or whether it has merely "learned this knowledge by heart" in a way that is tailored precisely to the known situations.
- a large number of validation data records are provided for validation, each of which includes validation values for the input variables and associated validation values for the output variables.
- the set of validation data records is not congruent with the set of learning data records.
- the set of validation data sets can advantageously be disjoint from the set of learning data sets.
- the respective validation values of the input variables are mapped to test values of the output variables by the trainable module. It is checked whether the deviations of the test values from the validation values of the output variables meet a specified criterion.
- a criterion can consist, for example, in a mean of the deviations over all validation data records, for example an absolute or quadratic mean, being below a predetermined threshold value. Alternatively or in combination with this, it can be checked, for example, whether the maximum deviation over the set of validation data records is below a predetermined threshold value. If the respective criterion is met, the trainable module can be found to be suitable for real use.
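- A minimal sketch of such a check, assuming a quadratic mean for the first criterion; the threshold values are application-specific assumptions.

```python
import numpy as np

def passes_validation(test_values: np.ndarray, validation_values: np.ndarray,
                      mean_threshold: float, max_threshold: float) -> bool:
    """Mean (here quadratic) and maximum deviation over all validation
    records must stay below their predetermined thresholds."""
    dev = np.abs(test_values - validation_values)
    return bool(np.mean(dev ** 2) < mean_threshold and
                dev.max() < max_threshold)
```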
- the cost function used for the training can, for example, have a form composed of a component L, which only characterizes the deviation of the assessment values of the output variables from the learning values of the output variables contained in the learning data set, and a component L_R, which only characterizes the deviation of at least one parameter of the internal processing chain from at least one discrete value in the list.
- the parameters to be discretized can include, for example, all weights w^(i).
- the L_R portion of the cost function can then, for example, take a form in which each weight is compared against its closest discrete value; the index q in the subtrahend denotes the closest discrete value of the corresponding weight. A plausible reconstruction of this form is sketched below.
- the parameter λ decides how strongly the part L_R is weighted relative to the part L. In particular, as described above, it can be used as an adjusting screw between the learning task and discretization. As explained above, λ can be varied during training and, in particular, for example, increase exponentially with increasing training progress.
- D^(i) is the respective step size of the quantization for layer i.
- the first case in the case distinction can practically be neglected, since the weights w^(i) are usually initialized with random floating-point values at the beginning of the training of an ANN. The probability that such a value falls exactly on a quantization level approaches zero.
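- Since the formulas themselves did not survive in this text, the following is only a plausible reconstruction from the surrounding prose: a total cost L + λ·L_R, with L_R built from the deviations of the weights from their nearest quantization levels, normalized per layer by the step size D^(i). The quadratic penalty and the omission of the case distinction mentioned above are assumptions.

```latex
% Hedged reconstruction, not the patent's verbatim formula:
\[
  L_{\mathrm{total}} = L + \lambda \, L_R ,
  \qquad
  L_R = \sum_{i} \sum_{w^{(i)}}
        \left( \frac{w^{(i)} - w^{(i)}_q}{D^{(i)}} \right)^{2}
\]
% w_q^{(i)}: discrete value from the list closest to w^{(i)};
% D^{(i)}: quantization step size of layer i.
```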
- the invention also relates to a method for producing a trainable module which maps one or more input variables onto one or more output variables by means of an internal processing chain.
- the internal processing chain is characterized by a set of parameters.
- discrete values are determined for the parameters of the internal processing chain of the trainable module, and the trainable module is then optionally validated.
- the internal processing chain of the trainable module is implemented in an arithmetic unit which is designed for the representation and/or processing of the parameters characterizing the internal processing chain in fixed-point arithmetic.
- the implementation can in particular include providing appropriate hardware.
- the parameters are set to the previously determined discrete values in the arithmetic unit.
- this method can use fast floating point hardware for the actual development and possible validation of the trainable module.
- the trainable module can then be implemented on fixed-point hardware without changing its behavior.
- the fixed-point arithmetic results in a clear runtime advantage compared to an implementation on hardware of comparable cost and/or comparable energy consumption.
- the invention accordingly also relates to a further method.
- a trainable module is first trained with the method described above for training.
- the trainable module is then operated by supplying it with one or more input variables.
- a vehicle, a robot, a quality control system and / or a system for monitoring an area on the basis of sensor data is controlled.
- the invention therefore also relates to a trainable module for mapping one or more input variables to one or more output variables by means of an internal processing chain, which is characterized by a set of parameters and is implemented in an arithmetic unit.
- the arithmetic unit is designed to display and / or process the parameters in fixed-point arithmetic.
- trainable modules can be installed in control units for vehicles and other embedded systems in particular. With such devices in particular, there is a high cost pressure with regard to the hardware to be used, despite the requirement that the trainable module should function reliably.
- the invention therefore also relates to a control device for a vehicle and / or an embedded system with the trainable module described above and / or with another trainable module trained and / or produced using one of the methods described above.
- the trainable module can in particular be designed as a classifier and / or regressor for physical measurement data recorded with at least one sensor.
- the sensor can be, for example, an imaging sensor, a radar sensor, a lidar sensor or an ultrasonic sensor.
- the methods described can be implemented in whole or in part by computer. For example, they can be part of a computer-implemented development environment for trainable modules.
- the implementation of the internal processing chain of the trainable module as part of its production can also be computer-implemented, for example by means of automated production.
- the invention therefore also relates to a computer program with machine-readable instructions which, when they are executed on one or more computers, cause the computer or computers to carry out one of the described methods.
- the invention also relates to a machine-readable data carrier and / or a download product with the computer program.
- the invention also relates to a computer or other computing unit with the described computer program and / or with the described machine-readable data carrier and / or download product.
- the computer or the arithmetic unit can also be specifically designed in some other way to carry out one of the described methods.
- Such a specific design can be embodied, for example, in one or more field programmable gate arrays (FPGA) and / or in one or more application-specific integrated circuits (ASIC).
- FPGA field programmable gate arrays
- ASIC application-specific integrated circuits
- FIG. 1 exemplary embodiment of the method 100 for training a trainable module 1
- FIG. 2 exemplary embodiment of the method 200 for producing a trainable module 1
- FIG. 4 exemplary qualitative effect of the method 100 on the parameters 12a of the internal processing chain 12 of the trainable module 1;
- FIG. 5 shows an example of the development of the parameters 12a in the course of the method 100
- FIG. 6 Another example of the development of the parameters 12a with a limitation of the value range of the parameters 12a for each update step;
- FIG. 7 shows an example of the course of the convergence of the parameters 12a in different layers a-d of an artificial neural network;
- FIG. 8 Another example of the convergence of the parameters 12a with (diagram (a)) and without (diagram (b)) limitation of the range of values of the parameters 12a in each update step.
- Figure 1 is a flowchart of an embodiment of the method 100 for training a trainable module 1.
- In step 110, at least one learning data record 2 is provided, which includes learning values 11a of the input variables 11 of the trainable module 1 and learning values 13a of the output variables 13 of the trainable module 1.
- In step 120, a list 3 of discrete values 3a-3c is provided, from which the parameters 12a characterizing the internal processing chain 12 of the trainable module 1 are to be selected. These discrete values 3a-3c are selected in such a way that they can be stored as fixed-point numbers with a predetermined number N of bits without any loss of quality.
- the list 3 of the discrete values 3a-3c can be determined, for example, with the aid of a pre-training of the trainable module 1. This preliminary training can also make use of the learning data sets 2 provided in step 110.
- the learning values 11a of the input variables 11 are mapped by the trainable module 1 to assessment values 13b of the output variables.
- a pre-training cost function 4a is evaluated, which characterizes a deviation of the assessment values 13b of the output variables from the learning values 13a of the output variables 13 contained in the learning data record 2.
- at least one parameter 12a of the internal processing chain 12 of the trainable module 1 is adapted as a function of the value of the pre-training cost function 4a determined in this way. From the values of the parameters 12a obtained in this way, the list 3 of the discrete values 3a-3c can finally be determined in accordance with block 124. For example, an interval can be determined in which the parameters 12a lie, and the discrete values 3a-3c can, for example, be distributed equidistantly over this interval.
- The actual training of the trainable module 1 begins in step 130.
- learning values 11a are again mapped by trainable module 1 to assessment values 13b of the output variables.
- the cost function 4 subsequently evaluated in step 140, however, in contrast to the pre-training cost function 4a in the optional pre-training, not only depends on the deviation of the assessment values 13b from the learning values 13a, but also characterizes a deviation of at least one parameter 12a of the internal processing chain 12 from at least one discrete value 3a-3c in the list 3.
- In step 150, at least one parameter 12a of the internal processing chain 12 is adapted as a function of the value of the cost function 4 determined in this way.
- the training can end, for example, when a predefined termination condition is reached.
- the termination condition can include, for example, a threshold value for the cost function and / or a time available for the training and / or the completion of a predetermined number of epochs.
- the training in step 130 can be initialized, for example, with random values for the parameters 12a. If, however, a preliminary training has already been carried out in order to establish the list 3 of the discrete values 3a-3c, then the parameters 12a determined in the course of this preliminary training can be used as starting values. In this way, the effort invested in this preliminary training is optimally used.
- the number N of bits and / or the list 3 of the discrete values 3a-3c can be adapted in step 160 as a function of the value of the cost function 4.
- the parameters 12a can be set to those discrete values 3a-3c from the list 3 which are respectively closest to them.
- the trainable module 1 can then be validated in step 180. When the trainable module 1 is finally implemented on hardware, it will show exactly the same behavior as in the validation 180.
- validation data records 5 are provided in accordance with block 181.
- These validation data sets 5 each include validation values 51 of the input variables 11 and associated validation values 53 of the output variables 13.
- the set of validation data sets 5 is not congruent with the set of learning data sets 2. These two sets are particularly advantageously disjoint.
- the respective validation values 51 of the input variables 11 are mapped by the trainable module 1 to test values 13c of the output variables 13.
- In box 150, several possibilities are shown by way of example of how the adaptation 150 of the parameters 12a can be refined during the training in order to improve the accuracy and at the same time save training time.
- values of the parameters 12a which are lower than the lowest discrete value 3a-3c of the list 3 can be set to this lowest discrete value 3a-3c.
- values of the parameters 12a which are higher than the highest discrete value 3a-3c of the list 3 can be set to this highest value 3a-3c.
- the adaptation 150 of the parameters 12a can include, in accordance with block 153, determining a gradient 4d of the cost function 4, expressed in the parameters 12a as variables, in the direction of better values of the cost function 4 and, in accordance with block 154, changing the parameters 12a by a product of the gradient 4d and a step size.
- components of the gradient 4d which relate to parameters 12a which currently have the lowest discrete value 3a-3c in list 3 can then be limited to non-negative values.
- components of the gradient 4d which relate to parameters 12a which currently have the highest discrete value 3a-3c of the list 3 can be limited to non-positive values.
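- A hedged sketch of this gradient limitation, following the text's sign convention that parameters are changed by adding a multiple of the gradient pointing toward better cost values (array names are illustrative):

```python
import numpy as np

def limit_gradient_at_bounds(grad: np.ndarray, w: np.ndarray,
                             grid: np.ndarray) -> np.ndarray:
    """Components of parameters sitting at the lowest discrete value are
    limited to non-negative values, those at the highest discrete value
    to non-positive values."""
    g = grad.copy()
    at_lo, at_hi = w == grid.min(), w == grid.max()
    g[at_lo] = np.maximum(g[at_lo], 0.0)  # may only move up from the bottom
    g[at_hi] = np.minimum(g[at_hi], 0.0)  # may only move down from the top
    return g
```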
- the cost function 4 can contain a weighted sum of at least two contributions.
- the first contribution 4b characterizes a deviation of the assessment values 13b of the output variables 13 from the learning values 13a of the output variables 13 contained in the learning data record 2.
- the second contribution 4c characterizes a deviation of at least one parameter 12a of the internal processing chain 12 from at least one discrete value 3a-3c in the list 3.
- the weighting of the first contribution 4b can be reduced and the weighting of the second contribution 4c can be increased.
- FIG. 2 is a flowchart of an exemplary embodiment of the method 200 for producing a trainable module 1.
- In step 210, the trainable module is trained with the method 100, and integer values for the parameters 12a of the internal processing chain 12 of the trainable module 1 are determined.
- Discrete values 3a-3c that are numerically adjacent differ in each case by a step size D, which is two raised to the power of a non-negative integer.
- In step 220, such an implementation is made on an arithmetic unit 6 which is designed for the representation and/or processing of the parameters 12a in fixed-point arithmetic.
- In step 230, the parameters 12a are set in the arithmetic unit 6 to the integer values determined in step 210.
- FIG. 3 shows an exemplary embodiment of the trainable module 1.
- the trainable module 1 is implemented in an arithmetic unit 6, which is designed for the display and / or processing of the parameters 12a in fixed-point arithmetic.
- by means of the internal processing chain 12, drawn by way of example as an artificial neural network, ANN, and characterized by the parameters 12a, one or more input variables 11 are mapped onto one or more output variables 13 during operation of the trainable module 1.
- FIG. 4 qualitatively shows the effect of the method 100 on the structure which the spectrum of the parameters 12a shows.
- the frequency p of the values of the parameters 12a is plotted against these values.
- Diagram (a) shows a uniform distribution, as it arises, for example, when the parameters 12a are initialized with random values.
- Diagram (b) shows a normal distribution, as it arises during conventional training without restriction to discrete values 3a-3c from a list 3.
- Diagram (c) shows a multimodal distribution that arises during training with the method 100 for three discrete values 3a-3c drawn in by way of example. The optimal theoretical distribution for these three quantization levels 3a-3c would be the Dirac distribution shown in diagram (d) with three Dirac pulses.
- Figure 5 shows, for an example of real training of a trainable module 1 with an ANN as internal processing chain 12 on the benchmark data set "CIFAR-10", how the frequencies p of values of the parameters 12a of a specific layer in the ANN develop as a function of the epoch number e.
- the parameters 12a are initially approximately normally distributed, as in the schematic diagram (b) of FIG. 4.
- in the course of the training, the distribution approaches the multimodal distribution shown in diagram (c) of FIG. 4.
- FIG. 6 shows how the parameters 12a of the first layer (row (a)), the fourth layer (row (b)) and the seventh layer (row (c)) of a VGG11 ANN develop in the course of the training on the benchmark data set "CIFAR-100" as a function of the epoch number e.
- the frequencies p of values of the parameters 12a of the respective layer for the respective epoch number e are plotted in each of the diagrams in FIG. 6. In the example shown in FIG. 6, only the three discrete values -D, 0 and D are permitted for the parameters 12a.
- the distribution of the parameters 12a, which comes from a pre-training, is initially unimodal with a peak at 0.
- the distribution also has tails for values of the parameters 12a that lie below -D or above D.
- the weight of a first contribution 4b, which relates to the training of the ANN with regard to the classification task on the CIFAR-100 data set, is reduced in the cost function 4.
- the weight of a second contribution 4c, which relates to the regularization by discretizing the parameters 12a, is increased.
- FIG. 7 shows which percentage r of the parameters 12a in different layers a-d of the ANN has "flipped" within a training period reaching back 10 epochs from the given epoch, i.e., has switched from one mode of the multimodal distribution to another.
- the layers a-c are convolutional layers and occupy the first, third and fifth positions in the layer sequence of the ANN.
- Layer d is a fully connected layer and occupies the seventh position in the layer sequence of the ANN. It can be seen that the parameters 12a in the various layers finally "decide" for one of the modes at different speeds. From an epoch number e of around 180 onward, this decision is essentially complete in all layers.
- FIG. 8 shows, on the basis of the real training already illustrated in FIG. 6, which percentage r of the parameters 12a of layers a-k of the ANN has changed from one mode of the multimodal distribution to another in each epoch e.
- Diagram (a) shows the course of r with the limitation of the value range of the parameters 12a to the interval [-D, D] after each update step.
- diagram (b) shows the course of r without this limitation of the value range.
- as is clearly visible in the comparison, the limitation of the value range has the effect that a particularly large number of parameters 12a change mode especially at the beginning of the training, i.e. change from one of the discrete values 3a-3c in list 3 to another. This means that there is greater learning progress at the beginning of the training than without the limitation of the value range. As explained before, this improves the accuracy of the training and at the same time saves a considerable amount of training time.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20753732.5A EP4032030A1 (de) | 2019-09-19 | 2020-08-06 | Schnelles quantisiertes training trainierbarer module |
KR1020227013027A KR20220065025A (ko) | 2019-09-19 | 2020-08-06 | 훈련 가능한 모듈들의 고속 양자화 훈련 |
US17/632,735 US20220277200A1 (en) | 2019-09-19 | 2020-08-06 | Fast quantised training of trainable modules |
CN202080065624.5A CN114402340A (zh) | 2019-09-19 | 2020-08-06 | 对可训练模块的快速量化训练 |
JP2022517997A JP7385022B2 (ja) | 2019-09-19 | 2020-08-06 | トレーニング可能なモジュールの高速な量子化されたトレーニング |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102019214308.3A DE102019214308B4 (de) | 2019-09-19 | 2019-09-19 | Schnelles quantisiertes Training trainierbarer Module |
DE102019214308.3 | 2019-09-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021052677A1 true WO2021052677A1 (de) | 2021-03-25 |
Family
ID=71996001
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2020/072158 WO2021052677A1 (de) | 2019-09-19 | 2020-08-06 | Schnelles quantisiertes training trainierbarer module |
Country Status (7)
Country | Link |
---|---|
US (1) | US20220277200A1 (de) |
EP (1) | EP4032030A1 (de) |
JP (1) | JP7385022B2 (de) |
KR (1) | KR20220065025A (de) |
CN (1) | CN114402340A (de) |
DE (1) | DE102019214308B4 (de) |
WO (1) | WO2021052677A1 (de) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102017206892A1 (de) | 2017-03-01 | 2018-09-06 | Robert Bosch Gmbh | Neuronalnetzsystem |
KR102601604B1 (ko) | 2017-08-04 | 2023-11-13 | 삼성전자주식회사 | 뉴럴 네트워크의 파라미터들을 양자화하는 방법 및 장치 |
US11928601B2 (en) | 2018-02-09 | 2024-03-12 | Google Llc | Neural network compression |
-
2019
- 2019-09-19 DE DE102019214308.3A patent/DE102019214308B4/de active Active
-
2020
- 2020-08-06 EP EP20753732.5A patent/EP4032030A1/de not_active Withdrawn
- 2020-08-06 US US17/632,735 patent/US20220277200A1/en active Pending
- 2020-08-06 WO PCT/EP2020/072158 patent/WO2021052677A1/de unknown
- 2020-08-06 JP JP2022517997A patent/JP7385022B2/ja active Active
- 2020-08-06 CN CN202080065624.5A patent/CN114402340A/zh active Pending
- 2020-08-06 KR KR1020227013027A patent/KR20220065025A/ko active Search and Examination
Non-Patent Citations (2)
Title |
---|
PAULIUS MICIKEVICIUS ET AL: "Mixed Precision Training", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 15 February 2018 (2018-02-15), XP081319104 * |
WESS MATTHIAS ET AL: "Weighted Quantization-Regularization in DNNs for Weight Memory Minimization Toward HW Implementation", IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 37, no. 11, 1 November 2018 (2018-11-01), pages 2929 - 2939, XP011692583, ISSN: 0278-0070, [retrieved on 20181017], DOI: 10.1109/TCAD.2018.2857080 * |
Also Published As
Publication number | Publication date |
---|---|
JP7385022B2 (ja) | 2023-11-21 |
US20220277200A1 (en) | 2022-09-01 |
DE102019214308B4 (de) | 2022-07-28 |
KR20220065025A (ko) | 2022-05-19 |
CN114402340A (zh) | 2022-04-26 |
EP4032030A1 (de) | 2022-07-27 |
DE102019214308A1 (de) | 2021-03-25 |
JP2022548965A (ja) | 2022-11-22 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20753732; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2022517997; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 20227013027; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2020753732; Country of ref document: EP; Effective date: 20220419 |