EP3931761A1 - Autonomes selbstlernendes system - Google Patents
- Publication number
- EP3931761A1 (Application EP20709525.8A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- neural network
- output vector
- vector
- new state
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000013528 artificial neural network Methods 0.000 claims abstract description 468
- 239000013598 vector Substances 0.000 claims abstract description 192
- 239000003795 chemical substances by application Substances 0.000 claims abstract description 53
- 230000008451 emotion Effects 0.000 claims abstract description 46
- 238000000034 method Methods 0.000 claims abstract description 24
- 238000006243 chemical reaction Methods 0.000 claims abstract description 6
- 238000012549 training Methods 0.000 claims description 33
- 230000006870 function Effects 0.000 description 27
- 230000000306 recurrent effect Effects 0.000 description 7
- 230000009471 action Effects 0.000 description 6
- 230000003993 interaction Effects 0.000 description 5
- 230000008878 coupling Effects 0.000 description 4
- 238000010168 coupling process Methods 0.000 description 4
- 238000005859 coupling reaction Methods 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 241001061260 Emmelichthys struhsakeri Species 0.000 description 3
- 230000006399 behavior Effects 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 238000011161 development Methods 0.000 description 3
- 230000018109 developmental process Effects 0.000 description 3
- 230000002996 emotional effect Effects 0.000 description 3
- 230000002787 reinforcement Effects 0.000 description 3
- 230000002618 waking effect Effects 0.000 description 3
- 230000003466 anti-cipated effect Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 235000003642 hunger Nutrition 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 230000001537 neural effect Effects 0.000 description 2
- 230000002981 neuropathic effect Effects 0.000 description 2
- 230000007171 neuropathology Effects 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000003014 reinforcing effect Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 208000000044 Amnesia Diseases 0.000 description 1
- 208000031091 Amnestic disease Diseases 0.000 description 1
- 208000020706 Autistic disease Diseases 0.000 description 1
- 206010003830 Automatism Diseases 0.000 description 1
- 206010005177 Blindness cortical Diseases 0.000 description 1
- 206010012177 Deja vu Diseases 0.000 description 1
- 208000025967 Dissociative Identity disease Diseases 0.000 description 1
- 201000000251 Locked-in syndrome Diseases 0.000 description 1
- 206010041347 Somnambulism Diseases 0.000 description 1
- 230000006986 amnesia Effects 0.000 description 1
- 210000004727 amygdala Anatomy 0.000 description 1
- 210000003926 auditory cortex Anatomy 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 208000005675 central hearing loss Diseases 0.000 description 1
- 208000009153 cortical blindness Diseases 0.000 description 1
- 201000008107 cortical deafness Diseases 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 238000012905 input function Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 230000003705 neurological process Effects 0.000 description 1
- 230000008447 perception Effects 0.000 description 1
- 230000035807 sensation Effects 0.000 description 1
- 238000002922 simulated annealing Methods 0.000 description 1
- 230000000087 stabilizing effect Effects 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- 210000000857 visual cortex Anatomy 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Definitions
- the invention lies in the field of automatic autonomous systems.
- the invention relates to a method for controlling a technical system with an agent that implements an artificial neural network.
- recurrent neural networks: also called feedback neural networks
- reinforcement learning: also called reinforcing learning
- Recurrent neural networks are a technology that makes it possible to represent general automata as learnable systems. Examples of this are shown in FIG. 1 and in FIG. 2 as simplified block diagrams.
- FIG. 1 shows a recurrent neural network known from the prior art. It has an input x, a state h_t, and an output y.
- the input x and the current state h_t are jointly transferred into a new state h_t+1, i.e. the new state h_t+1 of the neural network is generated from the input x and the current state h_t.
- the output y is then generated from this new state h_t+1.
- Each arrow is a universal function approximator.
- the function approximators can be formed by a fully connected network with a hidden layer. Deeper so-called feed-forward models can also be used. To do this, it is necessary to train the network.
- for training, pairs comprising an input vector x and a reference vector y* must be known.
- supervised training can then be carried out, for which various optimization or training methods can be used, for example the so-called gradient descent method or so-called simulated annealing. Other optimization or training methods can also be used. A minimal sketch of such a cell and training step is given below.
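- a minimal sketch of such a recurrent cell together with one supervised training step (assuming PyTorch; all module names, dimensions and data are illustrative assumptions, not taken from the patent):

```python
import torch
import torch.nn as nn

class RecurrentCell(nn.Module):
    """FIG. 1 as a learnable system: (x, h_t) -> h_{t+1} -> y."""
    def __init__(self, x_dim=8, h_dim=16, y_dim=4):
        super().__init__()
        # each arrow of FIG. 1 as a simple universal function approximator
        self.f = nn.Sequential(nn.Linear(x_dim + h_dim, h_dim), nn.Tanh())
        self.g = nn.Linear(h_dim, y_dim)

    def forward(self, x, h):
        h_next = self.f(torch.cat([x, h], dim=-1))  # new state from input and current state
        return self.g(h_next), h_next               # output generated from the new state

# one supervised (gradient descent) training step on a dummy pair (x, y*)
cell = RecurrentCell()
opt = torch.optim.SGD(cell.parameters(), lr=0.1)
x, y_star = torch.randn(32, 8), torch.randn(32, 4)
y, h = cell(x, torch.zeros(32, 16))
loss = ((y - y_star) ** 2).mean()    # Euclidean-type distance to the reference y*
opt.zero_grad(); loss.backward(); opt.step()
```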
- FIG. 2 shows an alternative to a recurrent neural network known from the prior art, namely a so-called long short-term memory network (LSTM). These long short-term memory networks also have an internal memory c_t. The provision of such an internal memory c_t makes it possible to model long time dependencies.
- LSTM: long short-term memory network
- More complex memory accesses can also be implemented using artificial neural networks.
- one example of this are the so-called memory-augmented neural networks or neural Turing machines.
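- a minimal sketch of one LSTM step with the internal memory c_t (assuming torch.nn.LSTMCell; all dimensions are illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTMCell(input_size=8, hidden_size=16)
readout = nn.Linear(16, 4)

x = torch.randn(32, 8)
h_t, c_t = torch.zeros(32, 16), torch.zeros(32, 16)   # state and internal memory
h_next, c_next = lstm(x, (h_t, c_t))                  # (x, h_t, c_t) -> (h_{t+1}, c_{t+1})
y = readout(h_next)                                   # output generated from the new state
```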
- Reinforcement learning makes it possible to train self-acting systems that try to achieve a maximum future reward. These systems try to solve a given problem in the best possible way.
- the disadvantage of the artificial neural networks known from the prior art is that, regardless of the training method used, an essential prerequisite for training the neural network is that the problem must be precisely formulated and the target, i.e. the reward, must be specified exactly. In this way, games such as chess or Go can be solved, in which the problem can be precisely formulated and the target variable can be precisely specified.
- An essential problem of the methods known from the prior art is that either a reference y* is necessary for training, or the entire world, including the complete rules of the game and axioms, has to be modeled for training.
- General problem solvers based on artificial neural networks, which learn the rules, i.e. the problem definition and the solution, themselves and can thus solve new, unknown problems, are not known in the prior art.
- the object of the present invention is therefore to provide solutions with which a technical system can be controlled without the environment of the technical system having to be modeled.
- a method is provided for controlling a technical system with a first agent, wherein the first agent implements a first artificial neural network, a first input vector of the first neural network and a current state of the first neural network are jointly transferred to a new state of the first neural network, a first output vector of the first neural network is generated from the new state of the first neural network, and wherein
- a second input vector, the first input vector and the current state of the first neural network are jointly transferred to the new state of the first neural network, the second input vector of the first neural network representing an emotion, and
- a second output vector of the first neural network is generated, the second output vector of the first neural network representing an expected emotion of the new state of the first neural network.
- emotions such as pain (comparable to a collision), hunger (comparable to a low charge level of a battery), or joy (comparable to achieving a goal, e.g. solving a certain problem) can also be used to train the first neural network.
- the technical system that can be controlled with the first agent can be, for example, a robot or an autonomously driving vehicle. It is advantageous if the second output vector of the first neural network is compared with a second reference for the purpose of training the first neural network, the comparison comprising the calculation of a distance function, preferably a Euclidean distance, and the second reference representing an ideal state of the second output vector of the first neural network and thus an ideal state of the expected emotion of the new state of the first neural network.
- the second output vector of the first neural network is compared with the second input vector of the first neural network, and / or
- the second output vector of the first neural network is generated from the new state of the first neural network and from the first output vector of the first neural network.
- the first output vector of the first neural network is compared with a first reference for the purpose of training the first neural network, the comparison comprising the calculation of a distance function, preferably a Euclidean distance, the first reference representing an ideal state of the first output vector of the first neural network.
- the first output vector of the first neural network is fed to a second artificial neural network as the first input vector of the second neural network, the second neural network being implemented by a second agent,
- the first input vector of the second neural network and a current state of the second neural network are jointly transferred to a new state of the second neural network,
- a first output vector of the second neural network is generated from the new state of the second neural network, the first output vector of the second neural network representing an expected reaction of the second neural network to the first input vector of the second neural network, and
- the first output vector of the second neural network is compared with the first input vector of the first neural network in order to train the first neural network.
- a second output vector of the second neural network can be generated from the new state of the second neural network, the second output vector of the second neural network representing an expected emotion of the new state of the second neural network, and
- the second output vector of the second neural network is compared with the second input vector of the first neural network in order to train the first neural network.
- the second agent can implement a third artificial neural network, wherein
- the second output vector of the second neural network is fed to the third neural network as the second input vector of the third neural network,
- the first input vector, the second input vector and a current state of the third neural network are jointly transferred to a new state of the third neural network,
- a second output vector of the third neural network is generated from the new state of the third neural network, the second output vector of the third neural network representing an expected emotion of the new state of the third neural network, and
- a first output vector of the third neural network is generated from the new state of the third neural network, which is fed to the second neural network as a further input vector of the second neural network.
- the second output vector of the third neural network is compared with a third reference for the purpose of training the third neural network, the comparison comprising the calculation of a distance function, preferably a Euclidean distance, the third reference representing an ideal state of the second output vector of the third neural network and thus an ideal state of the expected emotion of the new state of the third neural network.
- the first neural network and the third neural network can be coupled to one another, in particular the new state of the first neural network and the current state of the third neural network, in order to train the third neural network based on the first neural network or to train the first neural network based on the third neural network.
- FIG. 1 shows an artificial neural network known from the prior art as a recurrent neural network
- FIG. 2 shows a further artificial neural network known from the prior art as a long-short-term memory network
- FIG. 3 shows a system according to the invention as an extension of the artificial neural network shown in FIG. 1;
- FIG. 4 shows a system according to the invention as an extension of the artificial neural network shown in FIG. 2;
- FIG. 5 shows a system according to the invention as an extension of the artificial neural network shown in FIG. 1;
- FIG. 6 shows an expansion according to the invention of the system shown in FIG. 5;
- FIG. 7 shows an expansion according to the invention of the system shown in FIG. 6;
- FIG. 8 shows an expansion according to the invention of the system shown in FIG. 7; and
- FIG. 9 shows an expansion according to the invention of the system shown in FIG. 8.
- the neural networks described below are each artificial neural networks.
- autonomously self-learning agents can be provided with which a technical system can be controlled.
- the agents and thus also the respective controlled technical systems can not only work autonomously. They can also adapt autonomously to new environments.
- Applications are, for example, robotics, autonomous driving, space travel or medical applications.
- a robot can be used in different environments, the robot being able to learn the new environment autonomously after a change in environment and thus adapt its behavior to the new environment.
- the first extension relates to the introduction of an intrinsic reference of the neural network (hereinafter first neural network NN1), that is, a self-image of the first neural network NN1.
- the intrinsic reference is referred to below as emotion.
- the second extension concerns the learning of a world model as part of the overall system using a further neural network (hereinafter the second neural network NN2).
- the world model is also called the worldview below.
- FIG. 3 shows an expansion according to the invention of the recurrent neural network NN1 shown in FIG. 1 on the basis of an emotion.
- the neural network NN1 (first neural network) is implemented by a first agent S.
- the agent S is also referred to below as self.
- a first input vector x of the first neural network NN1 and a current state h_t of the first neural network NN1 are jointly transferred into a new state h_t+1 of the first neural network NN1.
- a first output vector y of the first neural network NN1 is then generated from the new state h_t+1 of the first neural network NN1.
- for training the first neural network NN1, the first output vector y can then be compared with a first reference y* or with a first reference vector, for example using a distance function, preferably a Euclidean distance function.
- a second input vector e is fed to the first neural network NN1.
- the second input vector e of the first neural network NN1 represents an emotion of the self or of the first neural network NN1 or of the first agent S.
- any number of scalar inputs and emotions can be modeled with the two input vectors x, e.
- the current emotion of the system can therefore contain several variables, such as pain (for example when a robot collides), hunger (for example when a battery is low) or joy (for example a reward when the technical system to be controlled has solved a task).
- a second output vector e' is generated.
- the second output vector e' represents the expected emotion of the next state h_t+1 of the self or of the first neural network NN1 or of the first agent S.
- the second output vector e' is generated according to the invention by transferring the second input vector e, the first input vector x and the current state h_t of the first neural network NN1 together into the new state h_t+1 of the first neural network NN1.
- the first output vector y is generated from the new state h_t+1 generated in this way, that is, taking into account the second input vector e.
- the second output vector e' of the first neural network NN1 is also generated from the new state h_t+1 thus generated.
- the expected emotion or the second output vector e' can then be compared with a second reference e* or with a second reference vector for the purpose of training the first neural network NN1, for example using a distance function, preferably a Euclidean distance function.
- the second reference e* represents an ideal state of the second output vector e' of the first neural network NN1 and thus an ideal state of the expected emotion of the new state h_t+1 of the first neural network NN1.
- any suitable distance functions can be used for the comparison between e' and e* or between y and y*. A minimal sketch of this arrangement is given below.
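- a hedged sketch of the arrangement of FIG. 3 (the emotion layout, dimensions and module names are assumptions for illustration): the emotion e enters as second input vector, the new state h_t+1 is formed jointly from (x, e, h_t), and the second output e' is trained towards the ideal emotion e*:

```python
import torch
import torch.nn as nn

class EmotionCell(nn.Module):
    def __init__(self, x_dim=8, e_dim=3, h_dim=16, y_dim=4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(x_dim + e_dim + h_dim, h_dim), nn.Tanh())
        self.out_y = nn.Linear(h_dim, y_dim)   # first output vector y
        self.out_e = nn.Linear(h_dim, e_dim)   # second output vector e' (expected emotion)

    def forward(self, x, e, h):
        h_next = self.f(torch.cat([x, e, h], dim=-1))  # (x, e, h_t) -> h_{t+1}
        return self.out_y(h_next), self.out_e(h_next), h_next

cell = EmotionCell()
opt = torch.optim.Adam(cell.parameters(), lr=1e-3)
x = torch.randn(1, 8)
e = torch.tensor([[0.7, 0.6, 0.0]])        # current emotion, e.g. (pain, hunger, joy)
e_star = torch.tensor([[0.0, 0.0, 1.0]])   # ideal state: no pain, no hunger, joy present
y, e_pred, h = cell(x, e, torch.zeros(1, 16))
loss = torch.dist(e_pred, e_star)          # Euclidean distance between e' and e*
opt.zero_grad(); loss.backward(); opt.step()
```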
- the ideal state of the expected emotion can be, for example, 0 (for nonexistent) or 1 (for existing), whereby values between 0 and 1 are also possible.
- the system is able to train all learnable parameters that lead to the second output vector e' by means of the dashed arrows.
- methods can also be used that not only optimize the current emotion, but also take into account the anticipated emotion in the future, comparable to so-called reinforcement learning.
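- a sketch of how the anticipated emotion could be accumulated over future steps, assuming a simple discounted sum comparable to the return in reinforcement learning (horizon and discount factor are illustrative assumptions):

```python
import torch

def discounted_emotion_gap(e_preds, e_star, gamma=0.9):
    """Accumulate the distance between predicted and ideal emotion over future steps."""
    loss = torch.zeros(())
    for k, e_pred in enumerate(e_preds):            # e_preds[k] is the predicted e' at t+k
        loss = loss + (gamma ** k) * torch.dist(e_pred, e_star)
    return loss

e_star = torch.zeros(3)
e_preds = [torch.rand(3, requires_grad=True) for _ in range(5)]
discounted_emotion_gap(e_preds, e_star).backward()  # optimizes future emotions, not only the current one
```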
- the dashed arrow to the output vector y cannot be trained with emotions alone, so that the first reference y* or the first reference vector must be used for this training.
- FIG. 4 shows an expansion according to the invention of the long-short-term memory network shown in FIG. 2 on the basis of an emotion. Except for the underlying neural network, the embodiment shown in FIG. 4 corresponds to the embodiment shown in FIG. 3.
- the expansion shown in FIGS. 3 and 4 can, however, also be used for other types of neural networks.
- the second output vector e' (output emotion) is compared not only with the second reference e*, but also with the second input vector e. In this way it can be ensured that the second output vector e' actually matches the second input vector e, i.e. fits the input emotion.
- the second output vector e' (output emotion) is derived not only from the new state h_t+1 of the first neural network NN1, but also taking into account the first output vector y, i.e. the second output vector e' is derived from the new state h_t+1 and from the first output vector y. This makes it possible to train all parameters in the network purely through emotions, as sketched below.
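- a sketch of the two variants just described (all names and sizes are assumptions): e' is additionally derived from the output y, and is compared with the input emotion e as well, so that gradients reach all parameters purely via emotions:

```python
import torch
import torch.nn as nn

h_dim, y_dim, e_dim = 16, 4, 3
out_e = nn.Linear(h_dim + y_dim, e_dim)   # e' derived from (h_{t+1}, y)

h_next = torch.randn(1, h_dim, requires_grad=True)
y = torch.randn(1, y_dim, requires_grad=True)
e, e_star = torch.rand(1, e_dim), torch.zeros(1, e_dim)

e_pred = out_e(torch.cat([h_next, y], dim=-1))
loss = torch.dist(e_pred, e_star) + torch.dist(e_pred, e)  # match ideal AND input emotion
loss.backward()   # gradients also flow through y, so all parameters become trainable
```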
- FIG. 5 shows a system according to the invention as an extension of the artificial neural network shown in FIG. 1;
- a second neural network NN2 is provided in addition to the first neural network NN1.
- the first neural network NN1 is coupled to the second neural network NN2, the first output vector y of the first neural network NN1 being fed to the second neural network NN2 as the first input vector y of the second neural network NN2.
- the second neural network NN2 is hereby implemented by a second agent W.
- the second agent W is also called the world view below, since a world model can be learned as part of the overall system with the second neural network NN2.
- the behavior of the world, for example an environment in which a robot is located, is modeled with the second neural network NN2.
- the second neural network NN2 can be, for example, a recurrent neural network, any other type of neural network also being able to be used.
- the first input vector y of the second neural network NN2 and a current state w_t of the second neural network NN2 are jointly transferred into a new state w_t+1 of the second neural network NN2.
- the first output vector x' of the second neural network NN2 is then generated from the new state w_t+1 of the second neural network NN2.
- the first output vector x' of the second neural network NN2 is compared with the first input vector x of the first neural network NN1 in order to train the first neural network NN1.
- the first neural network NN1 is thus trained as a function of the behavior of the second neural network NN2 or as a function of the first output vector x' of the second neural network NN2.
- the overall system shown in FIG. 5 can be fully trained, so that all learnable parameters can be estimated.
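- a minimal sketch of the coupling of FIG. 5, assuming simple recurrent cells for both agents: the output y of the self S drives the worldview W, whose predicted next input x' is compared with the actual input x, so NN1 can be trained without an external reference:

```python
import torch
import torch.nn as nn

x_dim, y_dim, h_dim, w_dim = 8, 4, 16, 16
nn1 = nn.Sequential(nn.Linear(x_dim + h_dim, h_dim), nn.Tanh())   # self S
nn1_out = nn.Linear(h_dim, y_dim)
nn2 = nn.Sequential(nn.Linear(y_dim + w_dim, w_dim), nn.Tanh())   # worldview W
nn2_out = nn.Linear(w_dim, x_dim)

x, h, w = torch.randn(1, x_dim), torch.zeros(1, h_dim), torch.zeros(1, w_dim)
h = nn1(torch.cat([x, h], -1)); y = nn1_out(h)        # NN1: (x, h_t) -> h_{t+1} -> y
w = nn2(torch.cat([y, w], -1)); x_pred = nn2_out(w)   # NN2: (y, w_t) -> w_{t+1} -> x'
loss = torch.dist(x_pred, x)    # compare x' with x: reference-free training signal
loss.backward()
```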
- FIG. 6 shows an expansion according to the invention of the system shown in FIG. 5, the system shown in FIG. 6 being a combination of the systems shown in FIGS. 3 and 5.
- the actual control system, i.e. the agent S, with which a technical system such as a robot is controlled, can on the one hand be controlled or trained via the emotions (second input vector e of the first neural network NN1 or second output vector e' of the first neural network NN1). This ensures that the first neural network NN1 or the first agent S follows a state that is as desirable as possible.
- on the other hand, the output of the first neural network NN1 (i.e. the first output vector y of the first neural network NN1) is compared via the worldview (i.e. via the second neural network NN2 or via the second agent W) with the input of the first neural network NN1 (i.e. with the first input vector x of the first neural network NN1), since the worldview can produce an expected input (i.e. a first output vector x' of the second neural network NN2); the first neural network NN1 is trained by comparing the first output vector x' of the second neural network NN2 with the first input vector x of the first neural network NN1.
- This enables training to be carried out without reference.
- the system or the first agent S can therefore be trained completely without annotated data and only needs incentives that identify states as desirable or not worth striving for.
- incentives can be coded using sparse annotation, for example extreme events such as a collision, or easily ascertainable parameters such as falling energy levels; a sketch follows.
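- a sketch of how such sparse incentives might be coded as an emotion vector (the event names and the emotion layout are illustrative assumptions):

```python
import torch

def encode_emotion(collision: bool, battery_level: float, goal_reached: bool) -> torch.Tensor:
    """Sparse annotation of states as desirable or undesirable."""
    pain = 1.0 if collision else 0.0       # extreme event, e.g. a collision
    hunger = 1.0 - battery_level           # easily ascertainable parameter
    joy = 1.0 if goal_reached else 0.0     # reward for solving a task
    return torch.tensor([pain, hunger, joy])

e = encode_emotion(collision=False, battery_level=0.2, goal_reached=False)  # -> [0.0, 0.8, 0.0]
```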
- the two aforementioned variants for emotional training can also be used in the system shown in FIG. 6.
- FIG. 7 shows an expansion of the system shown in FIG. 6 according to the invention.
- a second output vector e "of the second neural network NN2 is generated.
- the second output vector e" of the second neural network NN2 is derived from the new state w t + i of the second derived from the neural network NN2.
- the second output vector e ′′ of the second neural network NN2 represents an expected emotion of the new state w t + i of the second neural network NN2.
- the expected emotion could, for example, result from the actions of another participant in the world, i.e. a counterpart. For example, if someone makes another person laugh, a positive reaction can be expected; or if, for example, a robot collides with another robot, an alarm signal from the other robot can be expected.
- these expected emotions or the second output vector e'' of the second neural network NN2 can also be compared with the second input vector e of the first neural network NN1, which also enables the first neural network NN1 to be trained.
- the training of the first neural network NN1 by means of the second output vector e'' of the second neural network NN2 can contribute to stabilizing the overall training of the first neural network NN1 in the sense of so-called multi-task learning. In addition, abstract effects can be modeled via the second agent W or via the second neural network NN2, such as the effects of an output y of the first neural network NN1 on the worldview, the resulting change in the state of the worldview and, as a result, the emotional feedback on the self or on the first neural network NN1.
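- a sketch of the additional emotion head of FIG. 7 (sizes assumed): the worldview W emits a second output e'', which is compared with the input emotion e of NN1 as an additional, stabilizing training signal in the sense of multi-task learning:

```python
import torch
import torch.nn as nn

w_dim, e_dim = 16, 3
world_emotion = nn.Linear(w_dim, e_dim)    # e'' derived from the new world state w_{t+1}

w_next = torch.randn(1, w_dim, requires_grad=True)
e = torch.rand(1, e_dim)                   # second input vector e of NN1
aux_loss = torch.dist(world_emotion(w_next), e)  # multi-task style auxiliary loss
aux_loss.backward()
```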
- FIG. 8 shows an expansion of the system shown in FIG. 7 according to the invention.
- the second agent W implements a third neural network NN3, so that not only the state of the worldview can be coded with the second agent W or with the second neural network NN2, but also a model of the self-image within the worldview can be represented.
- the first output vector x' of the second neural network NN2 is fed to the third neural network NN3 as the first input vector x' of the third neural network NN3.
- a second output vector e'' of the second neural network NN2 is fed to the third neural network NN3 as a second input vector e'' of the third neural network NN3.
- the second output vector e'' of the second neural network NN2 represents, as already explained above, an anticipated emotion of the new state w_t+1 of the second neural network NN2.
- the second output vector e'' of the second neural network NN2 is generated from the new state w_t+1 of the second neural network NN2.
- the first input vector x', the second input vector e'' and the current state h'_t of the third neural network NN3 are jointly transferred into a new state h'_t+1 of the third neural network NN3.
- a first output vector y' of the third neural network NN3 is generated from the new state h'_t+1 of the third neural network NN3 and is fed to the second neural network NN2 as a further input vector of the second neural network NN2.
- the worldview and the self-image of the second agent W are thereby coupled. This makes it possible for the two neural networks NN3 and NN2 to simulate interactions even without the first neural network NN1.
- a second output vector e''' of the third neural network NN3 is generated from the new state h'_t+1 of the third neural network NN3.
- the second output vector e''' of the third neural network NN3 represents an expected emotion of the new state h'_t+1 of the third neural network NN3.
- the second output vector e''' of the third neural network NN3 is compared with a third reference e** for the purpose of training the third neural network NN3.
- the comparison of the second output vector e''' of the third neural network NN3 with the third reference e** may also include calculating a distance function, for example one of the above-mentioned distance functions.
- the third reference e** represents an ideal state of the second output vector e''' of the third neural network NN3 and thus an ideal state of the expected emotion of the new state h'_t+1 of the third neural network NN3.
- the first neural network NN1 and the third neural network NN3 can be coupled to one another, for example by coupling the new state h_t+1 of the first neural network NN1 and the current state h'_t of the third neural network NN3 to one another.
- this coupling is marked in FIG. 8 (and in FIG. 9) by the arrow P. A minimal sketch of this arrangement is given below.
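- a hedged sketch of FIG. 8 (all module names and dimensions are illustrative assumptions): the self-image NN3 receives (x', e'') from NN2, generates a first output y' that is fed back into NN2 and an expected emotion e''' that is trained against the third reference e**:

```python
import torch
import torch.nn as nn

x_dim, e_dim, h_dim, y_dim = 8, 3, 16, 4
nn3 = nn.Sequential(nn.Linear(x_dim + e_dim + h_dim, h_dim), nn.Tanh())
nn3_y = nn.Linear(h_dim, y_dim)   # y': fed back to NN2 as a further input vector
nn3_e = nn.Linear(h_dim, e_dim)   # e''': expected emotion of the new state h'_{t+1}

x_pred, e_world = torch.randn(1, x_dim), torch.rand(1, e_dim)   # x', e'' from NN2
h3 = torch.zeros(1, h_dim)
h3 = nn3(torch.cat([x_pred, e_world, h3], -1))   # (x', e'', h'_t) -> h'_{t+1}
y_prime, e_triple = nn3_y(h3), nn3_e(h3)
e_star_star = torch.zeros(1, e_dim)              # third reference e**
loss = torch.dist(e_triple, e_star_star)
loss.backward()
```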
- the self-image or the third neural network NN3 does not generate any outputs or output vectors that are made available as outputs or output vectors of the second agent W.
- the self-image or the third neural network NN3 can be used to explore changes in the worldview caused by changes in the self-image, based on the first output vector y' of the third neural network NN3 (which is not made available outside the second agent W).
- the coupling P also makes it possible to operate the overall system in two different states, which are referred to here as the waking phase and the dream sleep phase.
- in the waking phase, the first agent S or the first neural network NN1 is coupled to the second agent W or to the third neural network NN3 (arrow P).
- the self-image or the third neural network NN3 learns from every action of the first neural network NN1 how the action changes its own state and the state of the worldview or of the second agent W.
- in the dream sleep phase, the first agent S or the first neural network NN1 is decoupled from the second agent W or from the third neural network NN3 (no arrow P).
- the first output vector y of the first neural network NN1 is not fed to the second neural network NN2.
- the self-image or the third neural network NN3 can act freely within the second agent W.
- since the worldview or the second neural network NN2 can generate both expected inputs (first input vector x' of the third neural network NN3) and expected emotions (second input vector e'' of the third neural network NN3), and the third neural network NN3 can generate the further input (further input vector y' of the second neural network NN2), the worldview or the second neural network NN2 and the self-image or the third neural network NN3 can act completely freely in alternation.
- training is still possible for the first agent S or the first neural network NN1, since the new state h_t+1 of the self or of the first neural network NN1 still generates the second output vector e' of the first neural network NN1, which can be compared with the second (ideal) reference e*.
- Dreaming can therefore be used to generate improved interaction of the self-image or the third neural network NN3 with the expected worldview.
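- a structural sketch of the two operating states (the modules are illustrative stand-ins, not the patent's concrete networks): in the waking phase the output y of NN1 drives the worldview; in the dream sleep phase NN3 supplies this input itself, so that NN2 and NN3 act freely in alternation:

```python
import torch
import torch.nn as nn

y_dim, w_dim, h_dim = 4, 16, 16
nn2 = nn.Sequential(nn.Linear(y_dim + w_dim, w_dim), nn.Tanh())       # worldview W
nn3_step = nn.Sequential(nn.Linear(w_dim + h_dim, h_dim), nn.Tanh())  # self-image
nn3_y = nn.Linear(h_dim, y_dim)

w, h3 = torch.zeros(1, w_dim), torch.zeros(1, h_dim)

def tick(y_from_nn1=None):
    """One step: waking phase if NN1 provides y, dream sleep phase otherwise.
    (The state coupling P between NN1 and NN3 is omitted in this simplified sketch.)"""
    global w, h3
    y = y_from_nn1 if y_from_nn1 is not None else nn3_y(h3)  # dreaming: NN3 acts freely
    w = nn2(torch.cat([y, w], -1))         # worldview produces its new state
    h3 = nn3_step(torch.cat([w, h3], -1))  # self-image updates from the worldview
    return w

tick(torch.randn(1, y_dim))   # waking step (coupled to NN1)
tick()                        # dream step (decoupled, no arrow P)
```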
- FIG. 9 shows an expansion of the system shown in FIG. 8 according to the invention. According to the extension shown in FIG. 9, the overall system shown in FIG. 8 can be supplemented with extended functions.
- extended functions could, for example, be an extended memory (designed as a storage device) that can store and load the state of the second neural network NN2 and / or the state of the third neural network NN3. Further extensions, only listed as examples, can be:
- a language processor which can convert the state of the second neural network NN2 and / or the state of the third neural network NN3 into symbol sequences of words and letters;
- any further modules can be provided that can interact with the state of the second neural network NN2 and the state of the third neural network NN3.
- the second input vector e of the first neural network NN1 can represent, for example, vital parameters (charge level of the accumulator, functionality of the axes, etc.), whereby these parameters can be provided by suitable sensors.
- the second input vector e of the first neural network NN1 can also represent or describe goals, for example the urge to explore one's environment (curiosity) or the processing of tasks (loyalty), for which purpose the extended functions shown in FIG. 9 can be used.
- the extended functions can bring about changes to the state of the second agent W directly in the self-image or in the third neural network NN3. If, for example, a list of tasks has not yet been completed, the state of the second agent W changes in such a way that it causes an emotion e' (represented by the second output vector of the first neural network NN1), which in turn arouses the desire in the first agent S to work through the list. Additional extended functions may be required for this. For example, a task planner can be provided as an extended function which enables the first agent S to process a sequence of actions.
- an extended function for mapping can also be provided, for example using Simultaneous Localization and Mapping (SLAM), in which a map and the position of, for example, a Mars rover are estimated at the same time.
- SLAM: Simultaneous Localization and Mapping
- the relevant information can be provided by suitable sensors, such as ultrasonic sensors or lidar.
- another module can examine the map for gaps and errors. If such gaps or errors are found, the state of the self-image or of the third neural network NN3 can be changed so that a corresponding emotion e' (represented by the second output vector of the first neural network NN1) is generated. As a result, the system or the first agent S tries to leave this state and to correct the errors and/or gaps in the map. This can then also be done using a task planner.
- pre-trained neural networks or even direct algorithms can be used if they are implemented on the basis of differentiable programming. This makes it possible in an advantageous manner to mix neural networks and programming, whereby the development and training of the neural networks are considerably accelerated.
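- a sketch of mixing a neural network with a directly programmed, differentiable algorithm (an illustration of differentiable programming; the function and all names are assumptions): the hand-written part takes part in backpropagation exactly like a learned layer:

```python
import torch
import torch.nn as nn

def battery_drain(y: torch.Tensor) -> torch.Tensor:
    """Directly programmed, differentiable model: larger actions cost more energy."""
    return 0.01 * (y ** 2).sum(dim=-1, keepdim=True)

policy = nn.Linear(8, 4)          # learned part
x = torch.randn(1, 8)
hunger = battery_drain(policy(x)) # gradients flow through the algorithmic part
hunger.sum().backward()           # trains the policy to keep the "hunger" emotion low
```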
- an overall solution is provided for the first time that can be trained in a manner comparable to the human perception process through emotions and interaction with the world. To do this, it is not necessary to provide a fixed view of the world, as is required in the prior art.
- the worldview is learned autonomously. Actions worth striving for are learned purely through emotions by means of weak labeling. According to the method according to the invention, the agent S can therefore act completely autonomously and in a self-learning manner. According to the further development shown in FIG. 8, even a self-image in the world or in the worldview is modeled, with which the worldview can be trained. The system according to FIG. 8 can learn by itself in the waking and sleeping phases without interaction with the real world being necessary.
- switching off the self or the first agent S would put the overall system in a state in which it can only interact with itself. This condition is described as locked-in syndrome in neuropathology.
- the entire consciousness could be switched off completely. This could be achieved by removing the worldview. The overall system could still act, but it would no longer be able to create complex plans, as the worldview is required for this. This corresponds to the so-called automatisms observed in neuropathology. The state of sleepwalking also produces similar phenomena.
- a removal of the block e' (second output vector of the first neural network NN1) is comparable to a restriction of the amygdala of the brain. Here the entire system can no longer process the emotions correctly. Similar limitations can also exist in autistic disorders.
- Restriction of the extended functions shown in FIG. 9 can also be mapped to corresponding neuropathological phenomena. These include, for example, amnesia, cortical deafness or cortical blindness.
- the first agent S is able to adapt to completely new environments, since both the image of the world and the image of oneself can be completely relearned and adapted.
- the system is thus able to learn and adjust to changes in the world as well as to observe and take into account changes in the self. No training data is required to use the system. Its own feedback based on the emotion is sufficient to adjust to complex new situations.
- W: second agent, also called the worldview
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Automation & Control Theory (AREA)
- Feedback Control In General (AREA)
- Manipulator (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102019105281.5A DE102019105281A1 (de) | 2019-03-01 | 2019-03-01 | Autonomes selbstlernendes System |
PCT/EP2020/055427 WO2020178232A1 (de) | 2019-03-01 | 2020-03-02 | Autonomes selbstlernendes system |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3931761A1 true EP3931761A1 (de) | 2022-01-05 |
Family
ID=69770879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20709525.8A Pending EP3931761A1 (de) | 2019-03-01 | 2020-03-02 | Autonomes selbstlernendes system |
Country Status (5)
Country | Link |
---|---|
US (1) | US20210397143A1 (de) |
EP (1) | EP3931761A1 (de) |
CN (1) | CN113678146A (de) |
DE (1) | DE102019105281A1 (de) |
WO (1) | WO2020178232A1 (de) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9754221B1 (en) * | 2017-03-09 | 2017-09-05 | Alphaics Corporation | Processor for implementing reinforcement learning operations |
EP3596662A1 (de) * | 2017-05-19 | 2020-01-22 | Deepmind Technologies Limited | Vorstellungsbasiere neuronale agentennetze |
2019
- 2019-03-01 DE DE102019105281.5A patent/DE102019105281A1/de active Pending
2020
- 2020-03-02 CN CN202080027691.8A patent/CN113678146A/zh active Pending
- 2020-03-02 WO PCT/EP2020/055427 patent/WO2020178232A1/de active Application Filing
- 2020-03-02 EP EP20709525.8A patent/EP3931761A1/de active Pending
2021
- 2021-08-31 US US17/462,632 patent/US20210397143A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN113678146A (zh) | 2021-11-19 |
WO2020178232A1 (de) | 2020-09-10 |
US20210397143A1 (en) | 2021-12-23 |
DE102019105281A1 (de) | 2020-09-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE102010045529B4 (de) | Interaktives Robotersteuerungssystem und Verwendungsverfahren | |
Samsonovich | Toward a unified catalog of implemented cognitive architectures | |
DE4042139C2 (de) | Neuron-Einheit | |
Voss | Essentials of general intelligence: The direct path to artificial general intelligence | |
WO2000063788A2 (de) | Situationsabhängig operierendes semantisches netz n-ter ordnung | |
DE102020209685B4 (de) | Verfahren zum steuern einer robotervorrichtung und robotervorrichtungssteuerung | |
WO2021069129A1 (de) | Vorrichtung und verfahren zum steuern einer robotervorrichtung | |
DE102019131385A1 (de) | Sicherheits- und leistungsstabilität von automatisierung durch unsicherheitsgetriebenes lernen und steuern | |
DE102020212658A1 (de) | Vorrichtung und Verfahren zum Steuern einer Robotervorrichtung | |
WO2020182541A1 (de) | Verfahren zum betreiben eines roboters in einem multiagentensystem, roboter und multiagentensystem | |
DE102021212276A1 (de) | Wissensgetriebenes und selbstüberwachtes system zur fragenbeantwortung | |
DE102020214177A1 (de) | Vorrichtung und Verfahren zum Trainieren einer Steuerungsstrategie mittels bestärkendem Lernen | |
EP0388401B1 (de) | Selbstentwickelndes computersystem | |
DE102020200165B4 (de) | Robotersteuereinrichtung und Verfahren zum Steuern eines Roboters | |
EP3931761A1 (de) | Autonomes selbstlernendes system | |
WO2020178248A1 (de) | Autonomes selbstlernendes system | |
DE102022201116B4 (de) | Verfahren zum Steuern einer Robotervorrichtung | |
DE102021114768A1 (de) | Fahrzeugsteuerung unter Verwendung eines Controllers eines neuronalen Netzes in Kombination mit einem modellbasierten Controller | |
DE102014000086A1 (de) | Arbeitsverfahren für Behandlung von abstrakten Objekten (Gedanke-Substanzen) von einem Computersystem von Künstlicher Intelligenz von einem Cyborg oder einem Android. | |
DE102020210823A1 (de) | Maschinen-Steueranordnung | |
Pozna et al. | A new pattern of knowledge based on experimenting the causality relation | |
DE102022208082B3 (de) | Verfahren zum Steuern eines technischen Systems | |
DE102020105485A1 (de) | Trainieren lokaler künstlicher neuronaler Netzwerke | |
DE102017205048A1 (de) | Vorrichtung und verfahren zur bestimmung eines zustands eines arbeitsablaufs | |
DE102015004666A1 (de) | Ein zeigerorientiertes Objekterfassungsverfahren für eine greifbare, stofflich artige Behandlung von Informationen von einem Computersystem von einer Künstlichen Intelligenz von einem Cyborg oder einem Android, wobei eine aufgenommene Signal-Reaktion des |
Legal Events
Date | Code | Title | Description
---|---|---|---
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE
| 17P | Request for examination filed | Effective date: 20211001
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
| DAV | Request for validation of the european patent (deleted) |
| DAX | Request for extension of the european patent (deleted) |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS
| 17Q | First examination report despatched | Effective date: 20231030