EP3899808A1 - Method for training a neural network - Google Patents
Method for training a neural network
- Publication number
- EP3899808A1 (application EP19812975.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- neural network
- training
- pairs
- input signal
- gradient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Definitions
- the invention relates to a method for training a neural network, a training system, uses of the neural network thus trained, a computer program and a machine-readable storage medium.
- the method with the features of independent claim 1 has the advantage that it provides a guaranteed reliability of the trained system, which is particularly important for safety-critical applications, e.g. in the classification of images.
- deep learning methods, i.e. (deep) artificial neural networks, can be used for this.
- this can be, for example, a classification of sensor data, in particular image data, that is to say a mapping from sensor data or images to a set of classes.
- the task of training the neural network is to determine weights w ∈ W such that an expected value F of a cost function L is minimized.
- the cost function L denotes a measure of the distance between the image f_w(x_D) of an input variable x_D, determined by means of the function f_w, in the output space V_k and an actual output variable y_D in the output space V_k.
- a "deep neural network" can be understood to mean a neural network with at least two hidden layers.
- ∇F is usually approximated using training data (x_j, y_j), i.e. by ∇_w L(f_w(x_j), y_j), the indices j being selected from a so-called epoch.
- an epoch is a permutation of the indices {1, ..., N} of the available training data points.
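The per-pair gradient approximation over an epoch can be sketched as follows. The linear model f_w(x) = ⟨w, x⟩, the squared-error cost, the learning rate and all numerical values are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Minimal sketch: approximating the expected gradient ∇F by the gradient
# of the cost L on individual training pairs (x_j, y_j), with the indices
# of each epoch given by a permutation of {0, ..., N-1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # training inputs x_j
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                          # desired outputs y_j (noiseless)

def grad_L(w, x, y_d):
    """Gradient of the squared-error cost L(f_w(x), y_d) for f_w(x) = <w, x>."""
    return 2.0 * (x @ w - y_d) * x

w = np.zeros(3)
for _ in range(50):                     # several epochs
    for j in rng.permutation(len(X)):   # one epoch: a permutation of indices
        w -= 0.01 * grad_L(w, X[j], y[j])
```

With noiseless linear data, the per-pair updates drive w toward w_true, illustrating why the permutation-based approximation of ∇F works in practice.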
- So-called data augmentation can be used to expand the training data set.
- a(x_j) can be a set of typical variations of the input signal x_j (including the input signal x_j itself) which leave a classification of the input signal x_j, i.e. the output signal of the neural network, unchanged.
- augmentation using the set a(x_j) is difficult here, since the effect need not be equally pronounced for every input datum x_j.
- a rotation, for example, can have no effect on circular objects, but can have a very strong impact on general objects. The size of the set a(x_j) therefore depends on the input datum x_j, which can be problematic for adversarial training methods.
- the number N of training data points is usually a quantity that is difficult to set. If N is too large, the running time of the training increases. If one instead keeps an evaluation data set and determines the quality of convergence using this evaluation data set, this can lead to an over-fitting of the weights w with respect to the data points of the evaluation data set, which not only reduces the data efficiency but can also deteriorate the performance of the network when it is applied to data other than the training data. This can lead to a reduction in the so-called generalizability.
- in batch normalization layers, statistical parameters μ and σ over so-called mini-batches can be updated probabilistically in the course of the training process.
- for evaluation, the values of these parameters μ and σ are selected as predefinable values, for example as estimated values from the training by extrapolation of the exponential decay behavior.
- the layer with index i is a batch normalization layer
- the size of the mini-batch is a parameter that influences the training result in general and must therefore be set like a further hyperparameter, for example in the context of a (possibly complex) architecture search.
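How a batch normalization layer computes its statistics μ and σ over a mini-batch can be sketched as follows; the mini-batch contents, shapes and the small stabilizing constant are illustrative assumptions, and maintaining running estimates during training is one common convention rather than the patent's own prescription:

```python
import numpy as np

# Sketch: per-feature statistics mu and sigma over one mini-batch of
# activations, used to normalize that mini-batch.
rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(32, 4))  # mini-batch of 32, 4 features

mu = batch.mean(axis=0)                 # per-feature mean over the mini-batch
sigma = batch.std(axis=0)               # per-feature standard deviation
normalized = (batch - mu) / (sigma + 1e-5)  # small constant avoids division by zero
```

The normalized activations have (approximately) zero mean and unit variance per feature, which illustrates why the mini-batch size directly shapes the quality of the estimates μ and σ.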
- the invention therefore relates to a method for training a neural network, which is in particular set up for the classification of physical measurement variables, the neural network being trained using a training data set X.
- pairs comprising an input signal and an associated desired output signal are drawn (randomly) from the training data set for training, with parameters of the neural network being adapted as a function of an output signal of the neural network when the input signal is supplied and of the desired output signal, this drawing of pairs always taking place from the entire training data set.
- pairs are drawn regardless of which pairs were previously drawn in the course of the training.
- in conventional methods, pairs are drawn from the training data set "without replacement". The "drawing with replacement" proposed here may initially appear disadvantageous, since it cannot be guaranteed that, within a given number of training examples, every data point from the training data set will actually be used.
- this advantage arises without a deterioration in the performance that can be achieved at the end of the training (e.g. in the classification of images).
- an interface to other sub-blocks of a training system with which the neural network can be trained is greatly simplified.
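The "drawing with replacement" described above can be sketched as follows; the toy training set and pair structure are illustrative assumptions:

```python
import random

# Sketch: every pair is drawn from the ENTIRE training data set,
# independently of all previous draws ("drawing with replacement").
training_set = [(f"x_{i}", f"y_{i}") for i in range(10)]

random.seed(0)

def draw_pair(data):
    """Draw one (input signal, desired output signal) pair from the full set."""
    return random.choice(data)   # independent of earlier draws

drawn = [draw_pair(training_set) for _ in range(1000)]
```

Because each draw is independent, individual pairs repeat, but over enough draws every data point is still visited with high probability.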
- the drawn pairs can optionally be augmented even further.
- a set of augmentation functions can be provided to which the input signal can be subjected.
- the corresponding augmentation function can also be selected randomly, preferably regardless of which pairs and/or which augmentation functions were previously drawn in the course of the training.
- the input signal of the drawn pair is augmented with the augmentation function a_i, i.e. the input signal is replaced by its image under the augmentation function.
- the augmentation function a_i is selected, in particular randomly, from the set a of possible augmentation functions, this set being dependent on the input signal.
- the probability can be a predeterminable quantity.
- the probability is advantageously selected proportional to the number of possible augmentation functions. This adequately takes into account that some augmentation functions leave the input signal unchanged, so that the cardinality of the set (i.e. the number of elements of the set) of augmentation functions can differ considerably between input signals.
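Selecting an augmentation uniformly from an input-dependent set a(x) can be sketched as follows; the string inputs, the helper names `identity`, `flip` and `augmentations`, and the palindrome criterion are all illustrative assumptions standing in for image-type augmentations:

```python
import random

# Sketch: the set a(x) of augmentation functions depends on the input x,
# and the augmentation is drawn uniformly from a(x), so the chance of a
# non-trivial augmentation is proportional to |a(x)|.

def identity(x):
    return x

def flip(x):
    return x[::-1]

def augmentations(x):
    """Input-dependent set a(x): a palindrome gains nothing from flipping,
    analogous to a rotation having no effect on a circular object."""
    return [identity] if x == x[::-1] else [identity, flip]

random.seed(1)

def augment(x):
    a = augmentations(x)
    return random.choice(a)(x)   # uniform over a(x)
```

For a "circular" (palindromic) input the set a(x) collapses to the identity alone, mirroring the observation that |a(x)| varies with the input datum.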
- in the case of adversarial training methods, an adversarial input signal can be generated using a suitable augmentation function, which has a sufficiently small distance, smaller than a maximum distance r, from the given input signal.
- the parameters are adjusted as a function of a determined gradient, and to determine the gradient an estimated value m1 of the gradient is refined by taking into account a successively increasing number of pairs drawn from the training data set, until a predefinable termination condition, which is dependent on the estimated value m1 of the gradient, is met.
- in conventional methods, the size of the mini-batches must be optimized as a hyperparameter. Because this optimization can be dispensed with here, the method is more efficient and reliable, since overfitting can be suppressed more effectively and the batch size as a hyperparameter is eliminated.
- the predefinable termination condition can also be dependent on a covariance matrix C of the estimated value m1 of the gradient.
- the predeterminable termination condition can include the condition whether the estimated value m1 and the covariance matrix C fulfill, for a predeterminable confidence value λ, the condition ⟨m1, C⁻¹m1⟩ > λ².
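The termination test for the refined gradient estimate m1 can be sketched as follows; the synthetic per-pair gradients (a fixed true gradient plus Gaussian noise), the noise scale and the confidence value λ are illustrative assumptions:

```python
import numpy as np

# Sketch: refine the gradient estimate m1 by drawing ever more pairs, and
# stop once <m1, C^{-1} m1> > lambda^2, where C is the covariance matrix
# of the estimate m1 (covariance of the sample mean).
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -0.5])   # stand-in for the unknown true gradient
lam = 3.0                           # predefinable confidence value

samples = []
while True:
    # one more pair drawn from the training data set -> one more gradient sample
    samples.append(true_grad + 0.5 * rng.normal(size=2))
    n = len(samples)
    if n < 3:
        continue                                    # need a few samples for C
    m1 = np.mean(samples, axis=0)                   # current gradient estimate
    C = np.cov(np.array(samples).T) / n             # covariance of the mean
    if m1 @ np.linalg.solve(C, m1) > lam**2:        # <m1, C^{-1} m1> > lambda^2
        break
```

The statistic grows roughly linearly in the number of samples when the true gradient is nonzero, so the loop stops as soon as the estimate is confidently distinguishable from zero, replacing a fixed mini-batch size.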
- components of the determined gradient are scaled depending on the layer of the neural network to which these components belong.
- scaling can be understood to mean that the components of the determined gradient are multiplied by a factor dependent on the layer.
- the scaling can take place as a function of a position, that is to say the depth, of this layer within the neural network.
- the depth can, for example, be characterized, in particular given, by the number of layers through which a signal supplied to an input layer of the neural network must propagate before it is present for the first time as an input signal to this layer.
- the scaling also takes place depending on which feature of a feature map the corresponding component of the determined gradient belongs to. In particular, it can be provided that the scaling takes place depending on the size of a receptive field of this feature.
- weights of a feature map are multiplied cumulatively with information about the features of the receptive field, which is why an overfitting can develop for these weights. This is effectively prevented with the proposed method.
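A depth-dependent scaling of gradient components can be sketched as follows; the per-layer gradient values and the geometric scaling rule `factor**depth` are illustrative assumptions, not the patent's specific scaling function:

```python
import numpy as np

# Sketch: components of the determined gradient are multiplied by a factor
# that depends on the depth of the layer they belong to.
grads = {
    "layer_0": np.array([0.4, -0.2]),   # depth 0 (closest to the input)
    "layer_1": np.array([0.1, 0.3]),    # depth 1
    "layer_2": np.array([-0.5]),        # depth 2
}

def scale_by_depth(grads, factor):
    """Multiply each layer's gradient components by factor**depth."""
    return {name: g * factor ** depth
            for depth, (name, g) in enumerate(grads.items())}

scaled = scale_by_depth(grads, factor=0.5)
```

Here deeper layers receive progressively smaller updates; an analogous factor could instead depend on the size of a feature's receptive field.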
- the neural network comprises a scaling layer, the scaling layer mapping an input signal present at the input of the scaling layer to an output signal present at the output of the scaling layer such that the output signal present at the output represents a rescaling of the input signal, parameters which characterize the rescaling being predeterminable.
- the scaling layer maps an input signal present at its input to an output signal present at its output in such a way that this mapping corresponds to a projection onto a ball, the center c and/or radius ρ of this ball being predeterminable.
- it is also possible for these parameters to be adapted in the course of the training, just like other parameters of the neural network.
- N1: first norm
- N2: second norm
- the first norm (N1) and the second norm (N2) are chosen to be the same.
- the first norm (N1) can be an L∞ norm. This norm can also be calculated particularly efficiently, especially if the first norm (N1) and the second norm (N2) are not selected to be the same.
- the first norm (N1) can be an L1 norm. This choice of the first norm favors the sparsity of the output signal of the scaling layer. This is advantageous, for example, for the compression of neural networks, since weights with the value 0 make no contribution to the output value of their layer.
- a neural network with such a layer can therefore be used in a particularly memory-efficient manner, in particular in connection with a compression method.
- the second norm (N2) is an L2 norm.
- the methods can thus be implemented particularly easily.
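The scaling layer as a projection onto a ball can be sketched as follows; both norms are taken as the L2 norm here (one of the embodiments above), and the concrete center, radius and input vectors are illustrative assumptions:

```python
import numpy as np

# Sketch: a scaling layer that projects its input onto the ball with
# center c and radius rho. Inputs inside the ball pass through unchanged;
# inputs outside are rescaled onto the ball's surface.

def project_onto_ball(x, c, rho):
    d = x - c
    norm = np.linalg.norm(d)           # second norm N2: L2
    if norm <= rho:
        return x                       # already inside the ball
    return c + rho * d / norm          # rescale onto the sphere of radius rho

x_out = project_onto_ball(np.array([3.0, 4.0]), c=np.zeros(2), rho=1.0)
x_in = project_onto_ball(np.array([0.3, 0.4]), c=np.zeros(2), rho=1.0)
```

The center c and radius ρ could equally be treated as trainable parameters, as the text above allows.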
- Figure 1 schematically shows a structure of an embodiment of a control system
- Figure 2 schematically shows an embodiment for controlling an at least partially autonomous robot
- Figure 3 schematically shows an embodiment for controlling a production system
- Figure 4 schematically shows an embodiment for controlling a personal assistant
- Figure 5 schematically shows an embodiment for controlling an access system
- Figure 6 schematically shows an embodiment for controlling a monitoring system
- Figure 7 schematically shows an embodiment for controlling a medical imaging system
- Figure 8 shows schematically a training system
- FIG. 9 schematically shows a structure of a neural network
- Figure 10 shows schematically an information transfer within the neural
- FIG. 11 shows a flowchart of an embodiment of the training method
- FIG. 12 shows a flowchart of an embodiment of a method for estimating a gradient
- FIG. 13 shows a flowchart of an alternative embodiment of the method for estimating the gradient
- FIG. 14 shows in a flowchart an embodiment of a method for scaling the estimated gradient
- FIG. 15 shows, in flow diagrams, embodiments for implementing the scaling layer
- FIG. 16 shows a method for operating the trained neural network in a flowchart.
- Figure 1 shows an actuator 10 in its environment 20 in interaction with a control system 40.
- Actuator 10 and environment 20 are collectively referred to as an actuator system.
- a state of the actuator system is detected with a sensor 30, which can also be given by a plurality of sensors.
- the sensor signal S (or, in the case of several sensors, one sensor signal S each) from the sensor 30 is transmitted to the control system 40.
- the control system 40 thus receives a sequence of sensor signals S.
- the control system 40 uses this to determine control signals A which are transmitted to the actuator 10.
- the sensor 30 is any sensor that detects a state of the environment 20 and transmits it as a sensor signal S.
- the sensor can be an imaging sensor, in particular an optical sensor such as an image sensor or a video sensor, or a radar sensor, or an ultrasonic sensor, or a LiDAR sensor. It can also be an acoustic sensor that receives structure-borne noise or voice signals, for example. Likewise, the sensor can be a position sensor (such as GPS) or a kinematic sensor, for example a single-axis or multi-axis accelerometer.
- it can also be a sensor that characterizes an orientation of the actuator 10 in the environment 20, for example a compass.
- the sensor 30 can also include an information system that determines information about a state of the actuator system, such as a weather information system that determines a current or future state of the weather in the environment 20.
- the control system 40 receives the sequence of sensor signals S from the sensor 30 in an optional receiving unit 50, which converts the sequence of sensor signals S into a sequence of input signals x (alternatively, each sensor signal S can also be directly adopted as input signal x).
- the input signal x can be, for example, a section or further processing of the sensor signal S.
- the input signal x can comprise, for example, image data or images, or individual frames of a video recording. In other words, input signal x is determined as a function of sensor signal S.
- the input signal x is fed to a neural network 60.
- the neural network 60 is preferably parameterized by parameters Q, for example comprising weights w, which are stored in a parameter memory P and are provided by the latter.
- the neural network 60 determines output signals y from the input signals x.
- the output signals y encode classification information of the input signal x.
- the output signals y are supplied to an optional forming unit 80, which determines control signals A therefrom, which are fed to the actuator 10 in order to control the actuator 10 accordingly.
- the neural network 60 can for example be set up in the
- the actuator 10 receives the control signals A, is controlled accordingly and carries out a corresponding action.
- the actuator 10 can comprise a control logic (not necessarily structurally integrated), which determines a second control signal from the control signal A, with which the actuator 10 is then controlled.
- in further embodiments, the control system 40 comprises the sensor 30. In still further embodiments, the control system 40 alternatively or additionally also comprises the actuator 10.
- the control system 40 includes one or more processors 45 and at least one machine-readable storage medium 46 on which instructions are stored which, when executed on the processors 45, cause the control system 40 to execute the method of operating the trained neural network.
- a display unit 10a is provided as an alternative or in addition to the actuator 10.
- FIG. 2 shows an exemplary embodiment in which the control system 40 is used to control an at least partially autonomous robot, here an at least partially automated motor vehicle 100.
- the sensor 30 can be one of the sensors mentioned in connection with FIG. 1, preferably one or more video sensors and/or one or more radar sensors and/or one or more ultrasonic sensors and/or one or more LiDAR sensors and/or one or more position sensors, which are preferably arranged in motor vehicle 100.
- the neural network 60 can, for example, detect objects from the input data x.
- the output signal y can be information that characterizes where objects are present in the vicinity of the at least partially autonomous robot.
- the control signal A can then be determined depending on this information and/or in accordance with this information.
- the actuator 10, which is preferably arranged in the motor vehicle 100, can be, for example, a brake, a drive or a steering of the motor vehicle 100
- the control signal A can then be determined in such a way that the actuator or actuators 10 are controlled in such a way that the motor vehicle 100, for example, prevents a collision with the objects identified by the neural network 60, in particular if they are objects of certain determined classes.
- control signal A can be determined as a function of the determined class and/or in accordance with the determined class.
- the at least partially autonomous robot can also be another mobile robot (not shown), for example one that moves by flying, swimming, diving or walking.
- the mobile robot can also be, for example, an at least partially autonomous lawn mower or an at least partially autonomous cleaning robot.
- the control signal A can be determined in such a way that the drive and / or steering of the mobile robot are controlled such that the at least partially autonomous robot prevents, for example, a collision with the objects identified by the neural network 60,
- the at least partially autonomous robot can also be a garden robot (not shown), which uses an imaging sensor 30 and the neural network 60 to determine a type or a state of plants in the environment 20.
- the actuator 10 can then be an applicator of chemicals, for example.
- the control signal A can be determined as a function of the determined type or the determined state of the plants in such a way that an amount of the chemicals corresponding to the determined type or the determined state is applied.
- the at least partially autonomous robot can also be a household appliance (not shown), in particular a washing machine, a stove, an oven, a microwave or a dishwasher.
- a state of an object treated with the household appliance can be detected, for example in the case of the washing machine, a state of laundry that is in the washing machine.
- a type or a state of this object can then be determined and characterized by the output signal y.
- the control signal A can then be determined in such a way that the household appliance is controlled as a function of the determined type or the determined state of the object. For example, in the case of the washing machine, this can be controlled depending on the material from which the laundry contained therein is made. Control signal A can then be selected depending on which material of the laundry has been determined.
- FIG. 3 shows an exemplary embodiment in which the control system 40 is used to control a production machine 11 of a production system 200 in that an actuator 10 controlling this production machine 11 is activated.
- the production machine 11 can, for example, be a machine for punching, sawing, drilling and / or cutting.
- the sensor 30 can be one of the sensors mentioned in connection with FIG. 1, preferably an optical sensor which, for example, records properties of manufactured products 12. It is possible that the actuator 10 controlling the production machine 11 is controlled depending on the determined properties of the manufactured product 12, so that the production machine 11 accordingly executes a subsequent processing step on this manufactured product 12. It is also possible for the sensor 30 to determine the properties of the manufactured product 12 processed by the production machine 11 and, depending on this, to adapt a control of the production machine 11 for a subsequent manufactured product.
- FIG. 4 shows an exemplary embodiment in which the control system 40 is used to control a personal assistant 250.
- the sensor 30 can be one of the sensors mentioned in connection with FIG. 1.
- sensor 30 is preferably an acoustic sensor that receives voice signals from a user 249. Alternatively or additionally, the sensor 30 can also be set up to receive optical signals, for example of gestures of the user 249.
- the control system 40 determines a control signal A of the personal assistant 250, for example by the neural network performing a gesture recognition. This determined control signal A is then transmitted to the personal assistant 250 and is thus controlled accordingly.
- this determined control signal A can in particular be selected such that it corresponds to a presumed desired activation by the user 249. This presumed desired activation can be determined depending on the gesture recognized by the neural network 60.
- the control system 40 can then select the control signal A for transmission to the personal assistant 250 depending on the presumed desired control and/or in accordance with the presumed desired control.
- This corresponding control can include, for example, that the personal assistant 250 retrieves information from a database and reproduces it in a way that the user 249 can receive.
- a household appliance (not shown), in particular a washing machine, a stove, an oven, a microwave or a dishwasher, can also be provided in order to be controlled accordingly.
- FIG. 5 shows an exemplary embodiment in which the control system 40 is used to control an access system 300.
- Access system 300 may include physical access control, such as door 401.
- the sensor 30 can be one of the sensors mentioned in connection with FIG. 1, preferably an optical sensor (for example for recording image or video data), which is set up to record a face.
- this captured image can be interpreted by means of the neural network 60.
- the actuator 10 can be a lock that, depending on the control signal A, releases the access control or not, for example opens the door 401, or not.
- the control signal A can be selected depending on the interpretation of the neural network 60, for example depending on the identified identity of the person.
- a logical access control can also be provided.
- FIG. 6 shows an exemplary embodiment in which the control system 40 is used to control a monitoring system 400.
- This exemplary embodiment differs from the exemplary embodiment shown in FIG. 5 in that, instead of the actuator 10, the display unit 10a is provided, which is controlled by the control system 40.
- the neural network 60 can determine whether an object recorded by the optical sensor is suspicious, and the control signal A can then be selected such that this object is shown in color by the display unit 10a.
- FIG. 7 shows an exemplary embodiment in which the control system 40 is used to control a medical imaging system 500, for example an MRI, X-ray or ultrasound device.
- the sensor 30 can be provided, for example, by an imaging sensor, and the control system 40 controls the display unit 10a.
- the neural network 60 can determine whether an area recorded by the imaging sensor is noticeable, and the control signal A can then be selected such that this area is highlighted in color by the display unit 10a.
- FIG. 8 schematically shows an exemplary embodiment of a training system 140 for training the neural network 60 by means of a training method.
- a training data unit 150 determines suitable input signals x, which are fed to the neural network 60. For example, the training data unit 150 accesses a computer-implemented database in which a set of training data is stored and, for example, randomly selects input signals x from the set of training data.
- the training data unit 150 also determines desired, or "actual", output signals associated with the input signals x, which are supplied to an evaluation unit 180.
- the neural network 60 is set up to determine associated output signals y from the input signals x supplied to it. These output signals y are fed to the evaluation unit 180.
- the evaluation unit 180 can, for example, characterize a performance of the neural network 60 by means of a cost function L dependent on the output signals y and the desired output signals.
- the parameters Q can be optimized depending on the cost function L.
- the training system 140 includes one or more processors 145 and at least one machine-readable storage medium 146 on which instructions are stored which, when executed on the processors 145, cause the training system 140 to execute the training method.
- FIG. 9 shows an example of a possible structure of the neural network 60, which is given as a neural network in the exemplary embodiment.
- the neural network comprises a plurality of layers S1, S2, S3, S4, S5 in order to determine, from the input signal x, which is fed to an input of an input layer S1, the output signal y.
- each of the layers S1, S2, S3, S4, S5 is set up to determine, from a (possibly multidimensional) input signal x, z1, z2, z3, z4 which is present at an input of the respective layer S1, S2, S3, S4, S5, a (possibly multidimensional) output signal z1, z2, z3, z4, y which is present at an output of the respective layer S1, S2, S3, S4, S5.
- such output signals, especially in image processing, are also referred to as feature maps. It is not necessary for the layers S1, S2, S3, S4, S5 to be arranged in such a way that all output signals which are input to further layers pass from a previous layer into an immediately following layer. Instead, skip connections are also possible. It is also possible for the input signal x to enter several of the layers, or for the output signal y of the neural network 60 to be composed of output signals of a plurality of layers.
- the output layer S5 can be given, for example, by an argmax layer (i.e. a layer that selects, from a plurality of inputs with associated input values, a description of the input whose assigned input value is the largest among these input values); one or more of the layers S1, S2, S3 can be given, for example, by convolutional layers.
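The argmax output layer described above can be sketched as follows; the score vector is an illustrative stand-in for the inputs of the layer:

```python
import numpy as np

# Sketch of an argmax output layer: it selects, from a plurality of inputs
# with associated values, the index of the input whose value is largest.

def argmax_layer(scores):
    return int(np.argmax(scores))

cls = argmax_layer(np.array([0.1, 0.7, 0.2]))   # index of the largest value
```

In a classifier, this index serves as the description of the winning class.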
- a layer S4 is advantageously designed as a scaling layer, which is designed such that an input signal present at the input of the scaling layer S4 is mapped onto an output signal present at the output of the scaling layer S4 such that the output signal present at the output is a rescaling of the input signal, parameters which characterize the rescaling being predeterminable. Exemplary embodiments of methods that the scaling layer S4 can carry out are described below in connection with FIG. 15.
- FIG. 10 schematically illustrates the forwarding of information within the neural network 60.
- Three multidimensional signals within the neural network 60, namely the input signal x and two subsequent feature maps z1, z2, are shown schematically here.
- In the exemplary embodiment, the input signal x has a spatial resolution of n_x × n_y pixels, the first feature map z1 a resolution of n¹_x × n¹_y pixels, and the second feature map z2 a resolution of n²_x × n²_y pixels.
- In the exemplary embodiment, the resolution of the second feature map z2 is smaller than the resolution of the input signal x, but this is not necessarily the case.
- A feature, for example a pixel, (i, j)_2 of the second feature map z2 is also shown. Depending on the function with which the second feature map z2 is determined from the first feature map z1, for example by a convolutional layer or a fully connected layer (English: "fully connected layer"), it is possible that a plurality of features of the first feature map z1 enter into the determination of the value of this feature (i, j)_2.
- "Entering" can advantageously be understood to mean that there is a combination of values of the parameters characterizing the function with which the second feature map z2 is determined from the first feature map z1, and of values of the first feature map z1, such that the value of the feature (i, j)_2 depends on the value of the entering feature.
- One or more features of the input signal x in turn enter into the determination of each feature (i, j)_1 of the region B.
- The set of all features of the input signal x that enter into the determination of at least one of the features (i, j)_1 of the region B is referred to as the receptive field rF of the feature (i, j)_2.
- The receptive field rF of the feature (i, j)_2 thus comprises all those features of the input signal x that enter directly or indirectly (in other words: at least indirectly) into the determination of the feature (i, j)_2, i.e. whose values can influence the value of the feature (i, j)_2.
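The size of such a receptive field can be computed with the standard one-dimensional recurrence for stacked layers. The following sketch assumes convolutional layers characterized only by kernel size and stride; the layer configuration is hypothetical and serves purely as an illustration:

```python
def receptive_field_size(layers):
    """Compute the 1-D receptive field size of one output feature after a
    stack of layers, each given as (kernel_size, stride).
    Recurrence: r <- r + (kernel_size - 1) * jump, jump <- jump * stride."""
    r, jump = 1, 1  # one output feature; jump = distance between input samples
    for kernel_size, stride in layers:
        r += (kernel_size - 1) * jump
        jump *= stride
    return r

# Hypothetical stack: two 3-wide convolutions with stride 2, one with stride 1.
rf = receptive_field_size([(3, 2), (3, 2), (3, 1)])
```

For two-dimensional feature maps the same recurrence is applied per axis.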
- FIG. 11 shows the flow of a method for training the neural network 60 according to one embodiment.
- First, a training data set X comprising pairs (x_i, y_i) of input signals x_i and respectively associated desired output signals y_i is provided.
- A first set G and a second set N are optionally initialized; they are needed, for example, if the exemplary embodiment of this part of the method illustrated in FIG. 12 is used in step 1100. If, instead, the exemplary embodiment illustrated in FIG. 13 is used in step 1100, the initialization of the first set G and the second set N can be dispensed with.
- The initialization of the first set G and the second set N can take place as follows: The first set G, which comprises the pairs (x_i, y_i) of the training data set X that have already been drawn in the course of the current epoch of the training process, is initialized as an empty set. The second set N, which comprises those pairs (x_i, y_i) of the training data set X that have not yet been drawn in the course of the current epoch, is initialized by assigning all pairs (x_i, y_i) of the training data set X to it.
- The learning rate η can be reduced in the course of training, for example by multiplying it by a predeterminable learning-rate reduction factor, e.g. 1/10, whenever the number of completed epochs is divisible by a predefinable number of epochs, e.g. 5.
- The parameters Θ are updated by means of the determined and possibly scaled gradient g and the learning rate η, for example according to the gradient descent rule Θ ← Θ − η · g.
- The convergence criterion can be satisfied, for example, precisely when an L2 norm of the change of all parameters Θ between the last two epochs is smaller than a predefinable convergence threshold.
- If the convergence criterion is satisfied, the parameters Θ are accepted as the learned parameters and the process ends.
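The outer training loop described above — update with the (possibly scaled) gradient and the learning rate, periodic learning-rate reduction, and an L2-norm convergence criterion — can be sketched as follows. All names, the gradient-descent update rule, and the toy cost function are illustrative assumptions, not the claimed method:

```python
def train(theta, grad_fn, eta=0.1, decay=0.1, decay_every=5,
          max_epochs=50, tol=1e-6):
    """Sketch of the outer training loop: update parameters with gradient g
    and learning rate eta, multiply eta by a reduction factor every
    decay_every epochs, stop when the L2 norm of the parameter change
    between the last two epochs falls below a convergence threshold."""
    for epoch in range(1, max_epochs + 1):
        g = grad_fn(theta)
        new_theta = [t - eta * gi for t, gi in zip(theta, g)]
        # L2 norm of the change of all parameters between the last two epochs.
        change = sum((a - b) ** 2 for a, b in zip(new_theta, theta)) ** 0.5
        theta = new_theta
        if epoch % decay_every == 0:
            eta *= decay          # learning-rate reduction factor, e.g. 1/10
        if change < tol:          # convergence criterion
            break
    return theta

# Minimise f(t) = t^2 as a hypothetical stand-in for the cost function.
theta = train([2.0], lambda th: [2.0 * th[0]])
```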
- FIG. 12 illustrates in a flowchart an exemplary method for determining the gradient g in step 1100.
- First, a predeterminable number bs of pairs (x_i, y_i) is drawn from the training data set X (without replacement), i.e. selected, and assigned to a batch B. The predeterminable number bs is also referred to as the batch size.
- Batch B is initialized as an empty set.
- If the batch size bs is not larger than the number of pairs (x_i, y_i) present in the second set N, bs pairs (x_i, y_i) are randomly drawn from the second set N (1130), i.e. selected, and added to the batch B.
- If the batch size bs is larger than the number of pairs (x_i, y_i) present in the second set N, all pairs of the second set N, whose number is denoted by s, are drawn (1140), i.e. selected, and added to the batch B, and the remaining bs − s pairs are drawn from the first set G, selected and added to the batch B.
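The drawing of a batch from the sets N and G in steps (1130) and (1140) can be sketched as follows; the function name and the toy data are hypothetical illustrations of the described procedure:

```python
import random

def draw_batch(N, G, bs, rng=random):
    """Draw a batch B of size bs: take pairs from the not-yet-drawn set N;
    if N holds fewer than bs pairs (s of them), take all of N and fill the
    remaining bs - s slots from the already-drawn set G."""
    N, G = list(N), list(G)
    if bs <= len(N):
        B = rng.sample(N, bs)          # step (1130): bs pairs from N
    else:
        s = len(N)
        B = N + rng.sample(G, bs - s)  # step (1140): all of N, rest from G
    return B

pairs = [("x%d" % i, "y%d" % i) for i in range(7)]
# N holds only 2 not-yet-drawn pairs, so 2 more are taken from G.
B = draw_batch(N=pairs[:2], G=pairs[2:], bs=4)
```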
- After step (1130) or (1140), it is then optionally decided (1150) for the parameters Θ whether or not these parameters Θ should be ignored in this training pass. For example, a probability with which the parameters Θ of a layer are ignored can be set separately for each layer (S1, S2, ..., S6). For example, this probability can be 50% for the first layer and can be reduced by 10% with each subsequent layer.
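The per-layer decision of which parameters to ignore can be sketched as follows. The function name and the probability schedule (50% for the first layer, reduced by 10 percentage points per subsequent layer, following the example in the text) are illustrative assumptions:

```python
import random

def choose_ignored(layer_params, p_first=0.5, p_step=0.1, rng=random):
    """For each layer, decide per parameter with a layer-specific
    probability whether it is ignored in this training pass; the
    probability starts at p_first and shrinks by p_step per layer."""
    ignored = []
    p = p_first
    for params in layer_params:
        ignored.append([rng.random() < p for _ in params])
        p = max(0.0, p - p_step)
    return ignored

random.seed(0)  # fixed seed so the sketch is reproducible
mask = choose_ignored([[1.0, 2.0], [3.0], [4.0, 5.0]])
```

During the forward pass, parameters flagged in `mask` would then be temporarily set to zero.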
- Optionally, an augmentation function is selected and applied to the input signal x_i.
- The input signal x_i thus augmented then replaces the original input signal x_i.
- The augmentation function can be given, for example, by a rotation through a predeterminable angle.
- Then, for each pair (x_i, y_i) of the batch B, the corresponding (and possibly augmented) input signal x_i is selected and fed to the neural network 60.
- The parameters Θ of the neural network 60 that are to be ignored are deactivated during the determination of the corresponding output signal, for example by temporarily setting them to zero.
- The corresponding output signal y(x_i) of the neural network 60 is assigned to the corresponding pair (x_i, y_i).
- If it is found that the batch size bs is not greater than the number of pairs (x_i, y_i) present in the second set N, then (1180) all pairs (x_i, y_i) of the batch B are added to the first set G and removed from the second set N. It is then checked (1185) whether the second set N is empty. If the second set N is empty, a new epoch begins (1186). For this purpose, the first set G is reinitialized as an empty set, the second set N is reinitialized by again assigning to it all pairs (x_i, y_i) of the training data set X, and the process branches to step (1200). If the second set N is not empty, the process branches directly to step (1200).
- Otherwise, the first set G is reinitialized (1190) by assigning to it all pairs (x_i, y_i) of the batch B, and the second set N is reinitialized by again assigning to it all pairs (x_i, y_i) of the training data set X and then removing those pairs (x_i, y_i) that are also present in the batch B.
- A new epoch then begins and the process branches to step (1200). This ends this part of the method.
- FIG. 13 illustrates a further exemplary method for determining the gradient g in step 1100 in a flowchart.
- parameters of the method are initialized (1111).
- The mathematical space of the parameters Θ is referred to as W.
- A pair (x_i, y_i) is randomly selected from the training data set X and, if necessary, augmented. This can be done, for example, in such a way that for each input signal x_i of the pairs (x_i, y_i) of the training data set X a number m(a(x_i)) of possible augmentations a(x_i) is determined, and each pair (x_i, y_i) is assigned a position variable p_i. If a random number f ∈ [0; 1] is drawn uniformly distributed, that position variable p_i can be selected that satisfies the corresponding chain of inequalities.
- The associated index i designates the selected pair (x_i, y_i); an augmentation a_i of the input variable can then be drawn at random from the set of possible augmentations a(x_i) and applied to the input variable x_i, i.e. the selected pair (x_i, y_i) is replaced by (a_i(x_i), y_i).
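The described drawing by position variables amounts to selecting a pair with probability proportional to its number of possible augmentations. A sketch under that reading, with hypothetical names and data:

```python
import random

def select_pair(pairs, num_augmentations, rng=random):
    """Select a pair with probability proportional to its number of possible
    augmentations m(a(x_i)), using cumulative position variables p_i and a
    uniformly distributed random number f in [0, 1]."""
    total = sum(num_augmentations)
    cumulative, positions = 0.0, []
    for m in num_augmentations:
        cumulative += m / total
        positions.append(cumulative)   # position variable p_i
    f = rng.random()
    for pair, p in zip(pairs, positions):
        if f < p:                      # first position variable exceeding f
            return pair
    return pairs[-1]                   # guard against rounding at the top end

pairs = [("x1", "y1"), ("x2", "y2")]
# The second pair admits three augmentations, the first only one.
chosen = select_pair(pairs, num_augmentations=[1, 3])
```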
- The input signal x_i is fed to the neural network 60. From the corresponding output signal y(x_i) and the output signal y_i of the pair (x_i, y_i) as the desired output signal y_T, the corresponding cost function L_i is determined.
- If the inequality is satisfied, the current value of the first variable m_1 is adopted as the estimated gradient g and the process branches back to step (1200).
- Otherwise, the process can branch back to step (1121). Alternatively, it can also be checked (1171) whether the iteration counter n has reached a predeterminable maximum iteration value n_max. If this is not the case, the process branches back to step (1121); otherwise, the zero vector 0 ∈ W is adopted as the estimated gradient g (1181), and the process branches back to step (1200). This ends this part of the method.
- FIG. 14 shows an embodiment of the method for scaling the gradient g. The components of the gradient g are denoted by a pair (i, l), where i ∈ {1, ..., k} denotes a layer of the corresponding parameter Θ, and l ∈ {1, ..., dim(V_i)} a numbering of the corresponding parameter Θ within the i-th layer.
- If the neural network 60, as illustrated in FIG. 10, is designed for processing multidimensional input data x with corresponding feature maps z_i in the i-th layer, the numbering l is advantageously given by the position of the feature in the feature map z_i with which the corresponding parameter Θ is associated.
- This scaling factor W_{i,l} can be given by the size of the receptive field rF of the corresponding feature of the feature map of the i-th layer.
- The scaling factor W_{i,l} can alternatively also be given by a ratio of the resolutions, i.e. of the numbers of features, of the i-th layer relative to the input layer.
- If the scaling factor W_{i,l} is given by the size of the receptive field rF, overfitting of the parameters Θ can be avoided particularly effectively. If the scaling factor W_{i,l} is given by the ratio of the resolutions, this is a particularly efficient approximate estimate of the size of the receptive field rF.
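A per-layer gradient scaling of this kind can be sketched as follows. The text does not specify whether the factor multiplies or divides the gradient; the sketch assumes division, so that parameters whose features have large receptive fields receive smaller updates. All names and numbers are hypothetical:

```python
def scale_gradient(grad_per_layer, scale_per_layer):
    """Scale each layer's gradient components by a per-layer factor, e.g.
    the receptive field size of that layer's features, or (as a cheaper
    approximation) the ratio of the layer's resolution to the input
    resolution. Division is an assumed convention."""
    return [
        [g / w for g in layer_grad]
        for layer_grad, w in zip(grad_per_layer, scale_per_layer)
    ]

# Two layers with hypothetical receptive field sizes 4 and 3.
scaled = scale_gradient([[4.0, 8.0], [9.0]], scale_per_layer=[4, 3])
```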
- FIG. 15 illustrates embodiments of the method carried out by the scaling layer S4.
- The scaling layer S4 is set up to achieve a projection of the input signal x present at the input of the scaling layer S4 onto a sphere with radius ρ and center point c.
- This sphere is characterized by a first norm N1(y − c), which measures a distance of the center point c from the output signal y present at the output of the scaling layer S4, and a second norm N2(x − y), which measures a distance of the input signal x present at the input of the scaling layer S4 from the output signal y present at the output of the scaling layer S4.
- FIG. 15a) illustrates a particularly efficient first embodiment for the case that the first norm N1 and the second norm N2 are the same.
- First (2000), the input signal x present at the input of the scaling layer S4, a center parameter c and a radius parameter ρ are provided.
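For the special case that N1 and N2 are both the Euclidean norm, the projection onto the sphere reduces to a radial rescaling around the center point. The following sketch illustrates that special case only and uses hypothetical names:

```python
import math

def project_onto_sphere(x, c, rho):
    """Project x onto the sphere with center c and radius rho, assuming
    N1 = N2 = Euclidean norm: move x radially so that ||y - c|| = rho."""
    d = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
    if d == 0.0:
        return list(c)  # x equals the center: projection is not unique
    return [ci + rho * (xi - ci) / d for xi, ci in zip(x, c)]

y = project_onto_sphere([3.0, 4.0], c=[0.0, 0.0], rho=1.0)
```

Here the input at distance 5 from the center is mapped onto the unit sphere along the same direction.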
- FIGS. 15b) and 15c) illustrate embodiments for particularly advantageously selected combinations of the first norm N1 and the second norm N2.
- FIG. 15b) illustrates a second embodiment for the case that the first norm N1(·) is given by the condition (12) to be fulfilled.
- First (3000), analogously to step (2000), the input signal x present at the input of the scaling layer S4, the center parameter c and the radius parameter ρ are provided. Then (3100), the components y_i of the output signal y present at the output of the scaling layer S4 are determined, distinguishing the case x_i − c_i > ρ.
- FIG. 15c) illustrates a third embodiment for the case that the first norm N1(·) is given by the L1 norm ‖·‖₁.
- This combination of norms means that as many small components as possible of the input signal x present at the input of the scaling layer S4 are set to the value zero.
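The effect of setting as many small components as possible to zero resembles the classic componentwise soft-thresholding operator. The following sketch illustrates only that zeroing effect; it does not reproduce the patent's exact norm combination or condition (12), and all names are hypothetical:

```python
def soft_threshold(x, c, rho):
    """Componentwise soft thresholding relative to the center c: components
    of x within rho of c are set to the center value (i.e. zeroed relative
    to c), larger components are shrunk toward c by rho."""
    out = []
    for xi, ci in zip(x, c):
        d = xi - ci
        if d > rho:
            out.append(ci + d - rho)
        elif d < -rho:
            out.append(ci + d + rho)
        else:
            out.append(ci)  # small component: mapped onto the center value
    return out

y = soft_threshold([2.5, 0.2, -1.5], c=[0.0, 0.0, 0.0], rho=0.5)
```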
- First, analogously to step (2000), the input signal x present at the input of the scaling layer S4, the center parameter c and the radius parameter ρ are provided.
- This method corresponds to a Newton method and is particularly computationally efficient, in particular if many of the components of the input signal x present at the input of the scaling layer S 4 are important.
- FIG. 16 illustrates an embodiment of a method for operating the neural network 60.
- First, the neural network 60 is trained using one of the methods described above.
- Then the control system 40 is operated, as described, with the neural network 60 trained in this way. This ends the method.
- It should be noted that the neural network is not restricted to feedforward neural networks (English: "feedforward neural network"); rather, the invention can be applied in the same way to any type of neural network, in particular recurrent networks and convolutional neural networks.
- the term “computer” encompasses any device for processing predefinable calculation rules. These calculation rules can be in the form of software, or in the form of hardware, or also in a mixed form of software and hardware.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Neurology (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102018222347.5A DE102018222347A1 (de) | 2018-12-19 | 2018-12-19 | Verfahren zum Trainieren eines neuronalen Netzes |
PCT/EP2019/082837 WO2020126378A1 (fr) | 2018-12-19 | 2019-11-28 | Procédé pour entraîner un réseau neuronal |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3899808A1 true EP3899808A1 (fr) | 2021-10-27 |
Family
ID=68733060
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19812975.1A Pending EP3899808A1 (fr) | 2018-12-19 | 2019-11-28 | Procédé pour entraîner un réseau neuronal |
Country Status (8)
Country | Link |
---|---|
US (1) | US20210406684A1 (fr) |
EP (1) | EP3899808A1 (fr) |
JP (1) | JP7137018B2 (fr) |
KR (1) | KR20210099149A (fr) |
CN (1) | CN113243021A (fr) |
DE (1) | DE102018222347A1 (fr) |
TW (1) | TWI845580B (fr) |
WO (1) | WO2020126378A1 (fr) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI793516B (zh) * | 2021-02-04 | 2023-02-21 | 國立中興大學 | 神經網路之自適應調節批量大小的訓練方法 |
TWI771098B (zh) * | 2021-07-08 | 2022-07-11 | 國立陽明交通大學 | 路側單元之雷達系統之狀態之錯誤診斷系統及方法 |
CN114046179B (zh) * | 2021-09-15 | 2023-09-22 | 山东省计算中心(国家超级计算济南中心) | 一种基于co监测数据智能识别和预测井下安全事故的方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5745382A (en) * | 1995-08-31 | 1998-04-28 | Arch Development Corporation | Neural network based system for equipment surveillance |
DE19635758C1 (de) * | 1996-09-03 | 1997-11-20 | Siemens Ag | Verfahren und Vorrichtung zur rechnergestützten Generierung mindestens eines künstlichen Trainingsdatenvektors für ein neuronales Netz |
DE19721067C1 (de) * | 1997-05-20 | 1998-09-17 | Siemens Nixdorf Advanced Techn | Stochastischer Schätzer, insbesondere zur Analyse von Kundenverhalten |
JP2004265190A (ja) | 2003-03-03 | 2004-09-24 | Japan Energy Electronic Materials Inc | 階層型ニューラルネットワークの学習方法、そのプログラム及びそのプログラムを記録した記録媒体 |
TWI655587B (zh) * | 2015-01-22 | 2019-04-01 | 美商前進公司 | 神經網路及神經網路訓練的方法 |
US10410118B2 (en) * | 2015-03-13 | 2019-09-10 | Deep Genomics Incorporated | System and method for training neural networks |
EP3336774B1 (fr) * | 2016-12-13 | 2020-11-25 | Axis AB | Procédé, produit-programme informatique et dispositif de formation d'un réseau neuronal |
CN108015766B (zh) * | 2017-11-22 | 2020-05-22 | 华南理工大学 | 一种非线性约束的原对偶神经网络机器人动作规划方法 |
CN108015765B (zh) * | 2017-11-22 | 2019-06-18 | 华南理工大学 | 一种机器人运动规划的拓展解集对偶神经网络解决方法 |
CN108520155B (zh) * | 2018-04-11 | 2020-04-28 | 大连理工大学 | 基于神经网络的车辆行为模拟方法 |
CN108710950A (zh) * | 2018-05-11 | 2018-10-26 | 上海市第六人民医院 | 一种图像量化分析方法 |
-
2018
- 2018-12-19 DE DE102018222347.5A patent/DE102018222347A1/de active Pending
-
2019
- 2019-11-28 EP EP19812975.1A patent/EP3899808A1/fr active Pending
- 2019-11-28 WO PCT/EP2019/082837 patent/WO2020126378A1/fr unknown
- 2019-11-28 JP JP2021535840A patent/JP7137018B2/ja active Active
- 2019-11-28 KR KR1020217022763A patent/KR20210099149A/ko unknown
- 2019-11-28 CN CN201980084359.2A patent/CN113243021A/zh active Pending
- 2019-11-28 US US17/295,434 patent/US20210406684A1/en active Pending
- 2019-12-18 TW TW108146410A patent/TWI845580B/zh active
Also Published As
Publication number | Publication date |
---|---|
CN113243021A (zh) | 2021-08-10 |
JP7137018B2 (ja) | 2022-09-13 |
TWI845580B (zh) | 2024-06-21 |
US20210406684A1 (en) | 2021-12-30 |
WO2020126378A1 (fr) | 2020-06-25 |
JP2022514886A (ja) | 2022-02-16 |
DE102018222347A1 (de) | 2020-06-25 |
KR20210099149A (ko) | 2021-08-11 |
TW202105261A (zh) | 2021-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020260020A1 (fr) | Procédé et dispositif de contrôle de la robustesse d'un réseau neuronal artificiel | |
DE102018218586A1 (de) | Verfahren, Vorrichtung und Computerprogramm zum Erzeugen robuster automatisch lernender Systeme und Testen trainierter automatisch lernender Systeme | |
EP3899808A1 (fr) | Procédé pour entraîner un réseau neuronal | |
EP3853778B1 (fr) | Procédé et dispositif servant à faire fonctionner un système de commande | |
DE202020101012U1 (de) | Vorrichtung zum Vorhersagen einer geeigneten Konfiguration eines maschinellen Lernsystems für einen Trainingsdatensatz | |
WO2020260016A1 (fr) | Procédé et dispositif d'apprentissage d'un système d'apprentissage automatique | |
EP3857822A1 (fr) | Procédé et dispositif de détermination d'un signal de commande | |
DE102023202402A1 (de) | System und Verfahren zum Verbessern der Robustheit von vortrainierten Systemen in tiefen neuronalen Netzwerken unter Verwendung von Randomisierung und Sample-Abweisung | |
DE102019214625A1 (de) | Verfahren, Vorrichtung und Computerprogramm zum Erstellen eines künstlichen neuronalen Netzes | |
EP3915058A1 (fr) | Procédé destiné à l'entraînement d'un réseau neuronal | |
EP3899809A1 (fr) | Procédé et dispositif de classification de données de capteur et de détermination d'un signal de commande pour commander un actionneur | |
DE102020216188A1 (de) | Vorrichtung und Verfahren zum Trainieren eines Klassifizierers | |
DE102020208828A1 (de) | Verfahren und Vorrichtung zum Erstellen eines maschinellen Lernsystems | |
WO2020173700A1 (fr) | Procédé et dispositif de fonctionnement d'un système de commande | |
DE102020209024A1 (de) | Verfahren zum Generieren eines Überwachungs-Bildes | |
DE202019105304U1 (de) | Vorrichtung zum Erstellen eines künstlichen neuronalen Netzes | |
EP3857455A1 (fr) | Système d'apprentissage automatique ainsi que procédé, programme informatique et dispositif pour créer le système d'apprentissage automatique | |
DE102020208309A1 (de) | Verfahren und Vorrichtung zum Erstellen eines maschinellen Lernsystems | |
DE202020104005U1 (de) | Vorrichtung zum Erstellen eines Systems zum automatisierten Erstellen von maschinellen Lernsystemen | |
DE102020212108A1 (de) | Verfahren und Vorrichtung zum Anlernen eines maschinellen Lernsystems | |
DE202021100225U1 (de) | Verbesserte Vorrichtung zum Anlernen von maschinellen Lernsysteme für Bild-verarbeitung | |
DE102021200439A1 (de) | Verbessertes Anlernen von maschinellen Lernsysteme für Bildverarbeitung | |
DE102020211714A1 (de) | Verfahren und Vorrichtung zum Erstellen eines maschinellen Lernsystems | |
WO2023006597A1 (fr) | Procédé et dispositif de création d'un système d'apprentissage automatique | |
DE102021207937A1 (de) | Verfahren und Vorrichtung zum Erstellen eines maschinellen Lernsystems mit einer Mehrzahl von Ausgängen |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20210719 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20230905 |