US20220245429A1

US20220245429A1 - Recursive coupling of artificial learning units

Info

Publication number: US20220245429A1
Application number: US17/612,746
Authority: US
Inventors: Heiko Zimmermann; Günter Fuhr; Antonie Fuhr
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV; Universitaet des Saarlandes
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV; Universitaet des Saarlandes
Priority date: 2019-05-21
Filing date: 2020-03-09
Publication date: 2022-08-04
Also published as: CN114026570A; EP3973457A1; US20220292331A1; EP3973456A1; WO2020233851A1; WO2020233850A1; CN114556365A

Abstract

Provided is a method in a system of at least two artificial intelligence units, comprising inputting input values to at least a first artificial intelligence unit and a second artificial intelligence unit; obtaining first output values of the first artificial intelligence unit; forming one or more modulation functions based on the output values of the first artificial intelligence unit; applying the formed one or more modulation functions to one or more parameters of the second artificial intelligence unit, the one or more parameters influencing the processing of input values and the obtaining of output values in the second artificial intelligence unit; and finally obtaining second output values of the second artificial intelligence unit.

Description

The present invention relates to a method for recursively coupling artificial intelligence units.

PRIOR ART

Artificial intelligence now plays an increasing role in countless areas of application. This is initially understood to mean any automation of intelligent behavior and machine learning. However, such systems are usually intended and trained for special tasks. This form of artificial intelligence (AI) is often referred to as “weak AI” and is essentially based on the application of computations and algorithms to simulate intelligent behavior in a fixed domain. Examples include systems that are able to recognize certain patterns, such as safety systems in vehicles, or that can learn and implement certain rules, such as in chess. At the same time, these systems are essentially useless in other domains and must be completely retrained for other applications or even trained using completely different approaches.
For the practical implementation of such artificial intelligence units, neural networks are used, among other things. In principle, these networks replicate the functioning of biological neurons on an abstract level. There are several artificial neurons or nodes that are connected to each other and can receive, process and transmit signals to other nodes. For each node, for example, functions, weightings and threshold values are defined that determine whether and to what extent a signal is passed on to a node.
Usually, the nodes are considered in layers, so that each neural network has at least one output layer. Before that, other layers may be present as so-called hidden layers, so that a multilayer network is formed. The input values or features can also be considered as layers. The connections between the nodes of the different layers are called edges, and these are usually assigned a fixed processing direction. Depending on the network topology, it may be specified which node of a layer is linked to which node of the following layer. In this case, all nodes can be connected, but, for example, due to a learned weighting with the value 0, a signal cannot be processed further via a specific node.
The processing of signals in the neural network can be described by various functions. In the following, this principle is described for a single neuron or node of a neural network. From the several different input values reaching a node, a network input is formed by a propagation function (also input function). Often this propagation function comprises a simple weighted sum, where for each input value an associated weight is given. In principle, however, other propagation functions are also possible. In this case, the weights w_ican be specified as a weight matrix for the network.
An activation function f_akt, which can be dependent on a threshold value, is applied to the network input of a node formed in this way. This function represents the relationship between the network input and the activity level of a neuron. Various activation functions are known, for example, simple binary threshold functions whose output is thus zero below the threshold and identity above the threshold; sigmoid functions; or piecewise linear functions with a given slope. These functions are specified in the design of a neural network. The result of the activation function fact forms the activation state. Optionally, an additional output function f_outor output function may be specified, which is applied to the output of the activation function and determines the final output value of the node. Often, however, the result of the activation function is simply passed directly as the output value here, i.e. the identity is used as the output function. Depending on the nomenclature used, the activation function and the output function can also be combined as the transfer function f_trans.
The output values of each node are then passed on to the next layer of the neural network as input values for the respective nodes of the layer, where the corresponding steps are repeated for processing with the respective functions and weights of the node. Depending on the topology of the network, there may also be backward edges to previous layers or back to the outputting layer, resulting in a recurrent network.
In contrast, the weights w_iwhich are used to weight each of the input values can be changed by the network to adjust the output values and operation of the entire network, which is considered as the “learning” of a neural network.
For this purpose, an error backpropagation in the network is usually used, i.e. a comparison of the output values with expected values and a use of the comparison for the adaptation of the input values with the goal of error minimization. Error feedback can then be used to adjust various parameters of the network accordingly, such as the step size (learning rate) or the weights of the input values at the nodes. Likewise, the input values can also be re-evaluated.
The networks can then be trained in a training mode. The learning strategies used are also decisive for the possible applications of a neural network. In particular, the following variants are distinguished:
In supervised learning, an input pattern or training data set is given and the output of the network is compared with the expected value.
Unsupervised learning leaves the finding of correlations or rules to the system, so that only the patterns to be learned are specified. An intermediate variant is partially supervised learning, in which data sets without predefined classifications can also be used.
In reinforced learning or Q-learning, an agent is created that can receive rewards and punishments for actions, and based on this, tries to maximize rewards received and thus adapt its behavior.
An important application of neural networks is the classification of input data or inputs into certain categories or classes, i.e. the recognition of correlations and assignments. The classes can be trained on the basis of known data and be at least partially predefined, or they can be developed or learned independently by a network.
The basic operation and further specific details of such neural networks are known in the field, for example from R. Schwaiger, J. Steinwender: Neuronale Netze programmieren mit Python, Rheinwerk Computing, Bonn 2019.
A universally applicable AI system that is not trained for only one special task would lead to high-dimensional spaces and thus require exponentially increasing training and test data sets. Real-time responses thus quickly become impossible. Therefore, it is generally attempted to reduce the dimensionality and complexity of such systems. Different approaches are being pursued. For example, the complexity can be reduced by linking data sets, reducing the degrees of freedom and/or by feeding known knowledge into a system. As another approach, correlated data or interdependent data sets can be at least partially separated, for example by methods such as Principal Component Analysis. By applying filtering methods to the features, data that do not stand out or stand out negatively when training a network can be eliminated, for example, by applying statistical tests such as the chi-square test or others. Finally, the selection of the training data itself can be done as an optimization problem in an AI network. This involves combining the training data in such a way that it can train a new network as quickly and as well as possible.
More advanced approaches include so-called “Convolutional Neural Networks”, which apply convolutions in at least one layer of a multilayer fully connected network instead of simple matrix transformations. For this purpose, for example, the so-called “deep-dream” method is known, especially in the field of image recognition, in which the weights are left optimal in a trained network, but instead the input values (e.g., an input image) are modified as a feedback loop depending on the output value. Thus, for example, what the system thinks it can identify is faded in or inserted. The name refers to the fact that dream-like images are created in the process. In this way, internal processes of the neural network and their direction can be traced.
It is obvious that these methods still show great differences to human intelligence. Although the databases, text files, images and audio files can in principle be compared to how facts, language, speech logic, sounds, images and event sequences are also stored and processed in the brain, human intelligence, for example, differs significantly in that it links all this data in the context of feelings and unconscious “soft” categorizations.

DISCLOSURE OF THE INVENTION

According to the invention, a method for the recursive coupling of at least two artificial intelligence units with the features of the independent patent claims is proposed. Advantageous embodiments are the subject of the dependent claims and the following description.
In particular, according to one embodiment, a method is proposed in a system of at least two artificial intelligence units comprising inputting input values to at least a first artificial intelligence unit and a second artificial intelligence unit, whereupon first output values of the first artificial intelligence unit are obtained. Based on the output values of the first artificial intelligence unit, one or more modulation functions are formed, which are then applied to one or more parameters of the second artificial intelligence unit. In this regard, the one or more parameters are parameters that affect the processing of input values and the obtaining of output values in the second artificial intelligence unit in some manner. Furthermore, output values of the second artificial intelligence unit are obtained. These may represent, for example, modulated output values of the second unit. In this way, two artificial intelligence units are coupled together without using direct feedback of input or output values. Instead, one of the units is used to influence the function of the second unit by modulating certain functionally relevant parameters, resulting in a novel coupling that leads to different results or output values compared to conventional learning units. Moreover, by processing input values in two coupled units, a result can be obtained in a shorter time or with a more in-depth analysis than in conventional systems, so that the overall efficiency can be increased. In particular, a rapid classification of the problem at hand and consideration of rapid changes is achieved.
In an exemplary embodiment, at least one of the artificial intelligence units may comprise a neural network having a plurality of nodes, in particular one of the learning units to which the modulation functions are applied. In this case, the one or more parameters may be at least one of: a weighting for a node of the neural network, an activation function of a node, an output function of a node, a propagation function of a node. These are essential components of a neural network that determine how data is processed in the network. Instead of defining new weights or functions for the nodes, the modulation function can be used to superimpose existing self-learned and/or predefined functions of the modulated network, which is dependent on the results of the first artificial intelligence unit. In this context, this application of modulation functions can in particular also take place outside a training phase of the networks and thus achieve an active coupling of two or more networks in the processing of input values.
According to an exemplary embodiment, each of the artificial intelligence units can be assigned a classification memory, wherein each of the artificial intelligence units performs a classification of the input values into one or more classes which are stored in the classification memory, wherein the classes are each structured in one or more dependent levels, and wherein a number of the classes and/or the levels in a first classification memory of the first artificial intelligence unit is smaller than a number of the classes and/or the levels in a second classification memory of the second artificial intelligence unit. By making the classification memories of two coupled artificial intelligence units asymmetric in this way, a parallel or also time-dependent alternating evaluation of the input values with different objectives can take place, e.g. a combination of a fast classification of the input values and a deep, slower analysis of the input values.
Alternatively or in addition to the asymmetric design of the classification memories, the complexity of the first and second artificial intelligence units may also be designed differently, so that, for example, a first artificial intelligence unit has a significantly lower degree of complexity than a second artificial intelligence unit. In this regard, for the case of neural networks, for example, a first neural network may have substantially fewer nodes and/or layers and/or edges than a second neural network.
In a possible embodiment, the application of the at least one modulation function may cause a time-dependent superposition of parameters of the second artificial intelligence unit, wherein the at least one modulation function may comprise one of the following features: a periodic function, a step function, a function with briefly increased amplitudes, a damped oscillation function, a beat function as superposition of several periodic functions, a continuously increasing function, a continuously decreasing function. Combinations or temporal sequences of such functions are also conceivable. In this way, relevant parameters of a learning unit can be superimposed in a time-dependent manner, so that, for example, the output values “jump” into search spaces due to the modulation, which would not be reached without the superimposition.
Optionally, the second artificial intelligence unit may comprise a second neural network having a plurality of nodes, wherein applying the at least one modulation function causes deactivation of at least a portion of the nodes. This type of deactivation may also be considered a “dropout” based on the output values of the first artificial intelligence unit, and may also provide for newly explored search regions in the classifications as well as reduced computational overhead and thus accelerated execution of the method.
In exemplary embodiments, the method may further comprise determining a currently dominant artificial intelligence unit in the system, and forming overall output values of the system from the output values of the currently dominant unit. In this way, the two or more networks in the system may be meaningfully coupled and synchronized.
In this regard, for example, the first artificial intelligence unit may be determined to be the dominant unit at least until one or more output values of the second artificial intelligence unit are available. In this way, it can be ensured that the system is decision-safe at all times, i.e. that a reaction of the system is possible at all times (after a first run of the first artificial intelligence unit), even before a complete classification of the input values by all existing artificial intelligence units of the system has been performed.
In this context, it is also possible to further apply a comparison of the current input values with previous input values by at least one of the artificial intelligence units of the system, wherein, if the comparison results in a deviation that is above a predetermined input threshold, the first artificial intelligence unit is set as the dominating unit. In this way, it can be ensured that substantially changed input values (e.g. detection of a new situation by sensors) are immediately reacted to with a new evaluation of the input values.
Additionally or alternatively, a comparison of current output values of the first artificial intelligence unit with previous output values of the first artificial unit may further be made, wherein if the comparison results in a deviation that is above a predetermined output threshold, the first artificial intelligence unit is determined to be the dominant unit. By evaluating deviations in the output values, for example in the presence of deviating classes as a result in comparison to a previous run, changes in the input values can thus also be indirectly detected which have a certain significance and thus make a new classification meaningful.
In certain embodiments, the system may further comprise a timer storing one or more predetermined time periods associated with one or more of the artificial intelligence units, the timer being arranged to measure, for one of the artificial intelligence units at a time, the passage of the predetermined time period associated with that unit. Such an element forms a possibility to synchronize the different units of a system, for example, and to control when output values of a certain unit are expected or further processed. Thus, a timer can be used to define an adjustable latency period of the overall system within which a decision should be available as an overall output value of the system. This time may be, for example, a few ms, e.g. 30 or 50 ms, and may depend, inter alia, on the existing topology of the artificial intelligence units and the computing units (processors or other data processing means) present.
Thereby, for example, the measuring of the assigned predetermined time period for one of the artificial intelligence units can be started as soon as this artificial intelligence unit is determined as the dominating unit. In this way, it can be ensured that a unit develops a solution within a predetermined time or, optionally, that the data processing is even aborted.
In one possible embodiment, the second artificial intelligence unit may be set as the dominant unit if a first time period in the timer predetermined for the first artificial intelligence unit has elapsed. This ensures that a reaction based on the first artificial unit is already possible before the input values are analyzed by further artificial intelligence units, while subsequently the data is analyzed in more detail by the second unit.
In any embodiments, the input values may include, for example, one or more of the following: measured values detected by one or more sensors, data detected by a user interface, data retrieved from a memory, data received via a communication interface, data output by a computing unit. Thus, it may be, for example, image data captured by a camera, audio data, position data, physical measurements such as velocities, distance measurements, resistance values, and generally any value captured by a suitable sensor. Similarly, data may be entered or selected by a user via a keyboard or screen, and optionally may be associated with other data such as sensor data.
Further advantages and embodiments of the invention will be apparent from the description and the accompanying drawings.
It is understood that the above features, and those to be explained below, may be used not only in the combination indicated in each case, but also in other combinations or alone, without departing from the scope of the present invention.
The invention is schematically illustrated with reference to an example embodiment shown in the drawings, and is described below with reference to the drawings.

FIGURE DESCRIPTION

FIG. 1 shows a combination of two artificial intelligence units coupled together;

FIG. 2 schematically shows various exemplary modulation functions;

FIG. 3 illustrates the application of a dropout method in two coupled neural networks according to one embodiment;

FIG. 4 shows a system as in FIG. 1 with an additional timer;

FIG. 5 schematically represents a system as in FIG. 1 with the associated classification memories; and

FIG. 6 shows an alternative system with three coupled artificial intelligence units.

DETAILED DESCRIPTION OF EMBODIMENTS

FIG. 1 shows an exemplary embodiment with two linked artificial intelligence units 110, 120, which will be described in more detail below. In the following explanations, the artificial intelligence units are exemplarily designed as neural networks.
Thereby, a first artificial intelligence unit is provided, here in the form of a first neural network 110, which can essentially be used to categorize the input signals x_iand to influence a second artificial intelligence unit 120, here a second neural network, with the result of this categorization. Preferably, the results of the first neural network are not used as input values for the second neural network, but are used to influence existing weights, step sizes and functions of the network. In particular, these parameters of the second neural network may be influenced such that they are not completely redefined, but rather the original parameters of the second network 120 are modulated or superimposed based on the output signals of the first neural network 110. This means that the two neural networks otherwise preferably operate independently, e.g. train their basic values themselves, but may be coupled by a superposition. In this regard, the two neural networks may be substantially similar in design to each other, but with, for example, significantly different levels of complexity, such as the number of layers and classifications present. Further, each of the neural networks has its own memory.
In one possible embodiment, the first neural network 110 may be used as a categorizing network which serves to categorize the input values coarsely and quickly, while then, on this basis of the categorization result, the second network is influenced accordingly by modulating parameters of the second network. For this purpose, the first neural network may be a network with comparatively few levels, having a memory with few classes K1, K2, . . . K_n, which are preferably highly abstracted to achieve a coarse categorization. For example, this first neural network could be limited to 10, 50, 100 or 500 classes, these numbers being of course only to be understood as rough examples. In this regard, the training of the first neural network may in particular be performed individually and independently of further coupled neural networks. Additionally or alternatively, however, a training phase in a coupled state with one or more coupled neural networks may also be used.
The first neural network is thus intended to provide a usable output within a short period of time, which can be used to meaningfully influence the second neural network. Weights and functions can be generated from the output values Output1 of the first neural network 110, which can be superimposed on the self-generated weights and functions of the second neural network 120. This means that the second neural network initially functions independently and does not fully adopt the output values of the first network or the parameters obtained therefrom. Also, the second neural network 120 may initially be trained independently in the usual manner and thereby have self-generated weights.
In this context, the second neural network may be designed to be significantly more complex than the first neural network and, in particular, may have more levels and/or memory classes. The degree by which the complexity of the second neural network is increased compared to the first network can be determined differently depending on the application. The input values or input data for the second neural network are thereby preferably the same input values as for the first neural network, so that a more complex analysis can now be carried out with the same data. Alternatively, however, output values of the first neural network can also be used, at least in part, as input values of the second network. In particular, in the case of a significantly different complexity of the second network, for example, a second network could be provided to which both the original input values which also served as input values for the first network are supplied as input values, and additionally the output values of the first network are used as input values of the second network.
FIG. 2 shows exemplary different modulation functions f_mod, with which one or more parameters of the second neural network can be superimposed. In principle, the superposition or modulation can take place in any way. For example, when a modulation function f_{mod_w}is applied to the weights w_i2of the nodes, it may be provided that the weighting matrix of the second network 120 is used as an argument of the modulation function, or one-dimensional (even different) functions may be provided for each one of the weights w_i2. When a modulation function f_{mod_f_}is applied to one of the descriptive functions of the second neural network, i.e., to a transfer function f_trans2, an activation function f_akt2, a propagation function, or an output function f_out2of the network 120, this may be done by combining the two functions, and again a modulation function f_{mod_f}may be applied to either only a portion or all of the relevant descriptive functions (e.g., to all of the activation functions f_akt2of the second neural network 120). Modulations may be applied equally to all nodes of a network, or alternatively may be applied to only a portion of nodes, or may be modulated differently for each node. Similarly, for example, modulations may be applied separately or otherwise staggered for each layer of a network.
In particular, the modulation functions fmod can also be time-dependent functions, so that the weights w_i2or functions of the second neural network are changed in a time-dependent manner. However, static modulation functions for modulating the second neural network are also conceivable. In this case, the modulation is applied to the parameters of the second network 120 which are already originally defined for this second network (such as the propagation functions or the activation functions), or which were obtained independently during the training phase, such as the adapted self-generated weights.
Eight different time-dependent modulation functions are shown as examples in FIG. 2. Example a) shows a simple binary step function, in which the value zero is specified up to a specified time and then a value greater than zero is specified. Here, the second value can in principle be 1, but could also have a different value, so that the original parameters are additionally subjected to a factor. In this way, for example, a weighting is switched on and off in a time-dependent manner or amplified in a time-dependent manner. Example b) shows a reverse situation, in which a step function with a second value less than zero is predetermined. Likewise, as an alternative to the variants of examples a) and b), step functions are conceivable which comprise two different values not equal to 0, so that the level is raised or lowered in a corresponding time-dependent manner.
Example c) shows a periodic modulation function which can also be applied to any parameter of the second network and in this way will periodically amplify or attenuate certain elements in a time-dependent manner. For example, different amplitudes and/or periods for such a function could also be chosen for different nodes and/or different layers, respectively. Any periodic function could be used at this point, such as a sinusoidal function or even non-continuous functions. Depending on the type of concatenation of the functions with the self-generated functions of the second network, only positive or also negative function values can be selected.
Example d) shows a slow continuous transient increase and decrease in level. Example e), on the other hand, describes brief, approximately rectangular high levels with an otherwise low function value, which can optionally be zero. Similarly, example f) shows irregularly distributed and very short peaks or spikes, which thus cause a level increase for a very short period of time. Here, the peaks have different amplitudes and can take on both positive and negative values (relative to the basic value). For the variants from examples e) and f), both regular, periodic and temporally completely irregular (e.g. stochastically determined) distributions of the peaks or amplifications can be present. In this context, short level increases can, for example, lie within the time of a decision cycle of the second neural network, while longer pronounced level changes can extend over several decision cycles.
Example g) in FIG. 2 further shows a damped oscillation, which could also be arbitrarily designed with different dampings and amplitudes. Finally, example h) shows a time sequence of different oscillations around the fundamental value, where in particular the period lengths of the oscillations differ, while the amplitude remains the same. This combination of different oscillations can also be designed as an additive superposition, i.e. beat.
In general, any modulation functions are conceivable and the functions shown in FIG. 2 are only to be understood as examples. In particular, any combination of the example functions shown is possible. It is also understood that the baseline shown in all examples can run at 0 or at another basic value, depending on the desired effect of the modulation function. For a pure concatenation of the modulation function with the respective modulated function, a base value of 0 and corresponding increases in the function value can be used to ensure that the respective node only contributes to the processing in a time-dependent manner and is switched off at other times. With a basic value of 1, on the other hand, it can be achieved that, for example, with the example in FIG. 2a ), a modulation function which is applied to the weights first reproduces the self-generated weights of the modulated network as a basic value and then, from the stepped higher value, has correspondingly increased weights. Accordingly, such a function also acts on the modulation of functions such as the activation function.
As described above, a modulation function may be formed based on the output values of a first artificial intelligence unit, i.e., in the present example, based on the first neural network. The relationship between the output values and the modulation function formed therefrom may be arbitrary. For example, this correlation may be generated at least in part in a joint training phase of the coupled network. In other embodiments, it may be predetermined how the dependency between the modulation functions and the output values of the first network is designed. Optionally, it could also be decided that for certain output values no modulation of the second network takes place at first.
Alternatively or in addition to applying modulation functions to the weights and functions of a second neural network, a coupled dropout method can be applied, which is illustrated in FIG. 3. This is conventionally a neural network training procedure in which only a portion of the neurons present in the hidden layers and the input layer are used in each training cycle and the remainder are not used (“drop out”). To this end, the prior art typically sets a dropout rate based on the feedback errors of the network, which determines how large a fraction of the total network is made up of neurons that are switched off. Similarly, instead of neurons, some of the edges or connections between neurons could be switched off.
Such partial disconnection of neurons and/or edges may now also be used in a second neural network in exemplary embodiments, wherein now the dropout parameters are not used based on the error feedback of the network itself, but as in time-dependent modulation, depending on the output values of a first neural network. In this regard, for example, a dropout rate for the second neural network may be determined based on the output values Output1 of the first neural network 310, which is then applied to the second network. The figure again shows two coupled networks 310, 320 as in FIG. 1, but now the neurons or nodes 326, 328 of the second network 320 are schematically indicated as circles. Here, the connecting edges are not shown, and the arrangement of the neurons shown is not intended to have any compelling relationship to their actual topology. Via the dropout rate, a portion of the available neurons are now deactivated and thus not used. The active neurons 326 of the second network are shown shaded in the figure, while the unfilled neurons are intended to represent the dropout neurons 328.
In a general manner, the coupled dropout described herein can also be understood as a modulation function f_modby using either 0 or 1 as the modulation function for the weight or, for example, the output function of each node. This may be based on the output values of the first network to determine which of the neurons 326, 328 are switched off, or only the rate may be specified and stochastic functions may be used to determine which neuron is switched off. In this regard, the dropout rate may also again be determined based on the output values Output1 of the first network 310. In this regard, a dropout modulation function may optionally also cause a time-dependent shutdown, which would correspond, for example, to a concatenation of a dropout function with a modulation function as shown in FIG. 2. Similarly, a sequence of pattern shutdowns that have been proven in prior training may also be employed, such that, for example, cyclic pattern variations are employed for shutdown in the second neural network 320.
In general, the dropout can ensure that the working speed of a neural network is increased. It also prevents neighboring neurons from becoming too close in behavior. The coupled dropout as described above can be used both in a joint training phase, in which the two networks are coupled, and in an already trained network.
To ensure that the coupled neural networks complement each other in a meaningful way, it can be determined which of the neural networks dominates the overall system at any given time. The network whose output values determine the output of the overall system can be designated as the dominating network or dominance. In the following, it is assumed that only exactly one network in a group of two or more coupled networks is dominant at any time, and thus the output of the dominating network is equal to the output of the overall system. However, other embodiments are also conceivable in principle, so that, for example, rules are specified which describe a processing of the output values of the dominating nets to a final overall output value in the case of more than one dominating net.
In exemplary embodiments, a timer or timing element can be implemented for this purpose, which defines a time specification for one or more of the coupled neural networks. In this context, this time specification is preferably to be understood as a maximum value or temporal upper limit after which an output value of the respective network must be present, so that an output can also be present earlier. At the latest after expiry of the time specified for a particular network, an output value of this network is then evaluated. The timer can thus control and/or change the dominance between the coupled nets on the basis of fixed time specifications.
An exemplary embodiment of this type is shown in FIG. 4. Here, the formation and coupling of the two neural networks 410, 420 may correspond to the example already described in FIG. 1. The timer 440 now ensures that the output of the first neural network 410 is evaluated at the latest after a predetermined time, which is defined by a predetermined time parameter value. The required time may be measured, for example, from the time the input values X_iare fed into the respective network. The choice of the predetermined time parameter for a network may thereby be carried out in particular depending on the complexity of a network, so that usable results can actually be expected in the predetermined time. In an example such as that previously described, in which the first neural network 410 is preferably formed by a network with a few hidden layers and a small number of classifications, a correspondingly short time can thus also be selected for this first network. Similarly, other considerations may be taken into account when choosing the time parameters for a network, such as the hardware available, which has a decisive influence on the computation time of the networks, and/or also the application area considered by the coupled networks. Further, the predetermined timing parameters may be variable and may, for example, be modified or redefined depending on results from at least one of the coupled neural networks. It is understood that such a time specification should comprise at least the time period required as a minimum time for traversing the respective network 410, 420 once. In FIG. 4, as an example, a time span of 30 ms is specified for the first network, so that during a process run, this network dominates in the time from 0 ms to 30 ms from the start of the process. However, a suitable other value for this time span can of course also be selected.
During the time period specified by the time parameter for the first network 410 (here 30 ms), the first neural network will process the input values X_iin the usual manner. After the predetermined time has elapsed, the output Output1 of the first neural network 410 may be used to generate functions that are used to superimpose or modulate the second neural network's own weights and functions. Furthermore, the output values of the first neural network may also be processed independently as an alternative or in addition to being used to influence the second network 420 and used, for example, as a fast output of the overall system.
Once the modulation functions f_{mod_f}, f_{mod_w}have been applied to the second neural network 420, the timer 440 may start a new timing measurement, now applying a second timing parameter predetermined for the second neural network 420.
In this regard, the second neural network 420 can optionally also independently utilize the input values X_ieven before the modulation by the obtained modulation functions f_{mod_f}, f_{mod_w}, so that, for example, the input values can also be given to the second neural network 420 even before the start of the second predetermined time period and can be processed there accordingly. After the first time period has elapsed, the parameter values and functions of the second neural network are then superimposed by applying the corresponding modulation functions f_{mod_f}, f_{mod_w}. In this regard, one or more modulation functions may be formed for different parts of the second neural network 420, for example for the weights, output functions, propagation functions and/or activation functions of the second neural network. In the case of a second neural network 420 that is formed to be significantly more complex than the first neural network 410, for example by having significantly more layers and nodes and/or by having a higher number of memory classes, the second neural network will require a comparatively higher computational effort and thus also more time, so that in this case the second time period may be selected to be correspondingly longer.
In this regard, optionally each of the networks 410, 420 may continue to continuously process and evaluate the input values even while another network is determined to be the dominant network in the overall system based on the current time spans. In particular, in the example shown of two coupled networks, the first network may continuously evaluate the input values even while dominance is with the second network and therefore the output values of the overall system may correspond to the output values of the second network after the second time period has elapsed and a solution has been found by the second network. In this way, a fast categorizing network such as the first network 410 described herein, which evaluates the available input values throughout, can also perform short-term interventions to the extent that the output values found find their way into the overall output. Such embodiments will be described in further detail below.
As a result of such a time control by predetermined time periods in a timer, the overall system can make decisions early and, for example, already be capable of acting without the final evaluation and detailed analysis by the second neural network already having to be completed. As an example, a situation in an autonomous driving system may be considered to be evaluated by such a system with at least two coupled networks. By means of the first unit or the first neural network, an early categorization “danger” can be achieved, which does not yet involve any further assessment of the nature of the danger, but can already lead to an immediate reaction such as a slowing down of the speed of the vehicle and the activation of the braking and sensor systems. At the same time, based on the categorization, namely under the influence of the modulation by the output values of the first network, the second neural network performs a more in-depth analysis of the situation, which can then lead to further reactions or changes of the overall system based on the output values of the second network.
It is also conceivable not to specify a time limit for each of the coupled networks, but only for one of the networks (or, if more than two networks are coupled, also for only a subset of the coupled networks). For example, in the above example, a timer could be applied to the first fast categorizing neural network, while the second network is not given a fixed time constraint, or vice versa. Such an embodiment may also be combined with further methods for determining the currently dominant network, which are described in further detail below.
In all embodiments with an inserted timer, it can be provided that the output values of the neural network which currently has an active timer are used as the output of the overall system. Due to the time required by a network to reach a first solution for given input values, there is a certain latency time within which the previous output values (of the first or second network) are still available as total output values.
If timers are only defined for some of the coupled nets, e.g. a timer is only active for a first net, it can be defined, for example, that the output of the overall system generally always corresponds to the output of the second net and is only replaced by the output of the first net if a timer is active for the first net, i.e. a predefined period of time is actively running and has not yet expired.
In a system with more than two nets, a reasonable synchronization of the nets among each other can also be made possible by aligning the predetermined time spans and changing the timer, especially if several nets with different tasks are to arrive at a result simultaneously, which in turn is to have an influence on one or more other nets. Similarly, by adjusting the predetermined time periods and sequences, synchronization can also be achieved among several separate overall systems, each comprising several coupled networks. In this context, the systems can be synchronized, for example, by a time alignment and then run independently but synchronously according to the respective timer specifications.
In addition or alternatively to changing the respective dominating neural network in the overall system based on a timer, each of the neural networks itself may also make decisions to hand over dominance in a cooperative manner. This may mean, for example, that a first neural network of an overall system processes the input values and arrives at a certain first solution or certain output values.
As with the change of the center of gravity with the help of the timer, it can be specified here that the output values of the total net correspond in each case to the output values of the currently dominating net.
For this purpose, for example, changes in the input values may be evaluated. As long as the input values remain substantially unchanged, the dominance distribution among the coupled networks may also remain substantially unchanged, and/or may be determined based solely on a timer. However, if the input values suddenly change, a predetermined dominance may be established that overrides the other dominance behavior of the coupled networks. For example, for suddenly changing input values, it may be determined that dominance will initially revert to the first neural network in any case. This also restarts an optional timer for this first neural network, and the process is performed as previously described earlier. A significant change in the input values could occur, for example, if sensor values detect a new environment or if a previously evaluated process has been completed and a new process is now to be triggered.
Threshold values can be specified in the form of a significance threshold, which can be used to determine whether a change in the input values should be considered significant and lead to a change in dominance. Individual significance thresholds may also be predetermined for different input values or for each input value, or a general value, for example in the form of a percentage deviation, may be provided as a basis for evaluating a change in the input values. Likewise, instead of fixed significance thresholds, there could be thresholds that can be changed in time or adaptively and depending on the situation, or they could be functions, matrices or patterns, on the basis of which the significance of the change can be evaluated.
Alternatively or additionally, the change in dominance among the coupled networks may be made dependent on the output values found for each network. For example, depending on the embodiment, the first neural network may evaluate the input values and/or their change. In this context, significance thresholds may be predetermined in each case for the classes which are available for the first neural network for classification, so that if the first neural network finds a significant change in the class found for the input data, a transfer of dominance to the first neural network takes place immediately, so that a rapid re-evaluation of the situation and, if necessary, a reaction can take place. In this way, it can also be prevented that despite a significantly changed input situation, which was detected by the first, fast categorizing network, the analysis is continued for an unnecessarily long time without taking the change into account by the second neural network in depth.
In all of the above examples, the output values of the overall system can be further used in any way, for example as direct or indirect control signals for actuators, as data that is stored for future use, or as a signal that is passed on to output units. In all cases, the output values can also initially be further processed by additional functions and evaluations and/or combined with further data and values.
FIG. 5 again shows the simple embodiment example as in FIG. 1 with two unidirectionally coupled networks 510, 520, whereby a classification memory 512, 522 is now schematically shown for each of the networks. The type of classifications K_iused is initially of secondary importance here and will be described in more detail below. In particular, the dimension and structure of the two classification memories of the first 512 and second network 522 may differ significantly, so that two neural networks with different speeds and foci or centers of gravity are formed. Thus, for example, as already briefly described, an interaction of a fast, coarsely categorizing network and a slower, but more detailed analyzing network can be achieved to form a coupled overall system. In the present example, a first neural network 510 is formed with relatively few classifications K1, K2, . . . , K_n, which may also follow, for example, only a flat hierarchy, such that categorization is performed in only one dimension. Preferably, such a first network 510 may also be formed with a comparatively simple topology, i.e. with a not too large number n of neurons and hidden layers. In principle, however, the network topology may be substantially independent of the classifications.
The second neural network 520 may then have a significantly larger and/or more complex classification system. For example, this memory 522 or the underlying classification may also be hierarchically structured in multiple levels 524, as shown in FIG. 5. The total number m of classes K1, K2, . . . , K_mof the second network 520 may be very large, in particular significantly larger than the number n of classes used by the first neural network 510. For example, the number m, n of classes could differ by one or more orders of magnitude. Thus, an asymmetric distribution of the individual networks in the overall system is achieved.
Rapid classification by the first neural network 510 may then be used to quickly classify the input values. Abstract summary classes may preferably be used for this purpose. In one example, the classification of a sensed situation (e.g., based on sensor data such as image and audio data) may then be initially performed by the first neural network 510 as a “large, possibly dangerous animal” without performing any further analysis for this purpose. This means that, for example, no further classification by animal species (wolf, dog) or as a dangerous predator is made in the first network, but instead classification is made only according to the broadest possible general characteristics, such as size, detection of teeth, attack postures, and other characteristics. This data, which essentially corresponds to the output “danger”, can then optionally already be passed on to appropriate external systems for preliminary and rapid response, such as a warning system for a user or to specific actuators of an automated system. Furthermore, the output Output 1 of the first neural network 510 is used to generate the described modulation functions for the second neural network 520.
The same input values X_i, e.g. said sensor values, are also given to the second neural network 520. In this case, the input values can be input immediately, i.e. substantially simultaneously as to the first network, or with a delay, in which case they are input before or only when the modulation functions are applied, i.e. when the result of the first network is available, depending on the embodiment. Preferably, they should not be given to the second neural network later, in particular in the case of time-critical processes, in order to avoid delays. The second neural network then also computes a solution, and the self-generated weights original to this second network and its basis functions (such as the specified activation functions and output functions) can each be superimposed based on the modulation functions formed from the output values of the first network. This allows the iterative work of the second network to omit a large number of possible variants for which there would be no time in the case of a critical situation (e.g., a hazardous situation) quickly detected by the first network. While the slower analysis of the second neural network takes place, possible reactions can already be executed on the basis of the first neural network, as described. This corresponds to a first instinctive reaction in biological systems. The hierarchical and, compared to the first network, significantly larger memory of the second network then allows a precise analysis of the input values, in the example mentioned a detailed classification into the class “dog”, the respective breed, behavioral characteristics that indicate danger or a harmless situation, and others. If necessary, after a result has been achieved by the second neural network, the previous reaction of the overall system can then be overwritten, e.g. by downgrading the first classification “danger” again.
Overall, for such a coupled overall system with asymmetric classification, it may be envisaged, for example, that the classes K_nof the fast-classifying first network 510 mainly perform abstract classifications such as new/known situation, dangerous/undangerous event, interesting/uninteresting feature, decision required/not required and the like, without going into depth. In this regard, this first classification need not necessarily correspond to the final result ultimately found by the second unit 520. However, the two-stage classification by at least one fast and one deep analyzing unit thus allows for sentiment-like or instinctive reactions of an artificial intelligence overall system. For example, if an object is identified by image recognition that could possibly be a snake, the “worst case” may preferably be the result of the first classification, regardless of whether this classification is likely to be correct or not. What is present in the case of human intelligence as evolutionary knowledge and instinctive reaction can be replaced by a fast first classification with pre-programmed knowledge, so that appropriate default reactions (keep distance, initiate movement, activate increased attention) can also be performed by the overall system and its actuators. The additional modulation of the second learning unit on the basis of this first classification can then be understood similarly to an emotion-related superposition, i.e., for example, corresponding to a fear reaction that automatically initiates a different conscious situation analysis than a situation understood as harmless. The superimposition of the parameters of the second neural network, which is performed by the modulation functions, can thereby cause the necessary shift into other classification spaces that are otherwise not reached by default or not reached immediately.
Accordingly, such systems can be used for a variety of application areas, for example in all applications in which critical decision-making situations occur. Examples are driving systems, rescue or warning systems for different types of hazards, surgical systems, and generally complex and nonlinear tasks.
In the embodiments described so far, only two artificial intelligence units have been coupled together. However, this idea is in principle also applicable to more than two units, so that, for example, three or more artificial intelligence units may be coupled in an appropriate manner, whereby it may be determined which of the units may modulate the parameters of a particular other unit or units. FIG. 6 shows an example in which three neural networks 610, 620, 630 (and/or other artificial intelligence units) may be provided, wherein the output values of the first network 610 yield modulation functions for the weights and/or functions of the second network 620, and wherein in turn output values of the second network yield modulation functions for the weights and/or functions of the third network 630. In this way, arbitrarily long chains of artificial intelligence units could be formed, which influence each other in a coupled manner by superposition.
Similar to the earlier example with two neural networks, in one embodiment all coupled networks can receive the same input values and the processing can only be coupled by the modulation of the respective networks. Equally, however, embodiments are conceivable in which, for example, a third neural network is provided subsequent to two neural networks as in FIG. 1, which receives the output values of the first and/or second network as input values. Optionally, the functions and/or weights of this third neural network could also be modulated by modulation functions which are formed, for example, from the output values of the first network. These could be the same or different modulation functions than the modulation functions formed for the second network. Alternatively, for example, the output values of the third network could be used to form additional modulation functions which are then recursively applied to the first and/or second network.
It is understood that various further combinations of correspondingly coupled learning units are possible, in which at least two of the connected units have a coupling by forming modulation functions for the descriptive parameters of the units, in particular for the case of neural networks for the weights and/or functions of a network. As the number of coupled units increases, more complex variations of the modulations and couplings are conceivable.
As already noted at the beginning, the embodiments described here were described as examples with respect to neural networks, but can in principle also be transferred to other forms of machine learning. In this context, all variants are considered in which it is possible to influence at least a second artificial intelligence unit by a first artificial intelligence unit by superposition or modulation on the basis of output values. Modification of the weights and functions of a neural network by superposition using modulation functions from the preceding examples may be replaced by corresponding modulation of any suitable parameter controlling or describing the operation of such a learning unit. In each of the examples, the term “learning unit” may be replaced by the special case of a neural network, and conversely, the described neural networks of the exemplary embodiments may also each be implemented in a generalized manner in the form of an artificial intelligence unit, even if it is not explicitly stated in the respective example.
Besides neural networks, examples include evolutionary algorithms, support vector machines (SVM), decision trees, and special forms such as random forests or genetic algorithms.
Similarly, neural networks and other artificial intelligence units may be combined. In particular, it is possible to replace, for example, the first neural network from the preceding examples, which was illustrated as a fast categorizing unit, with any other artificial intelligence unit. In this context, it is also possible to selectively choose a method that is particularly suitable for a fast, coarse classification of features. However, the output values of such a first learning unit can then be applied in the same way as described for two neural networks to form modulation functions for a second artificial intelligence unit, which in particular can again be a neural network.
It is understood that the examples described above may be combined in any way. For example, in any of the embodiments described, there may also be a timer as described in connection with FIG. 4. Similarly, in all of the examples, the learning units may include classification memories, as described as an example in connection with FIG. 5. All these variants are again applicable to a coupling of also several artificial intelligence units.

Claims

1. A method in a system of at least two artificial intelligence units comprising: inputting input values (X_i) to at least a first artificial intelligence unit and a second artificial intelligence unit;

obtaining first output values (Output1) of the first artificial intelligence unit;

forming one or more modulation functions (f_{mod_f}, f_{mod_w}) based on the output values (Output1) of the first artificial intelligence unit;

applying the formed one or more modulation functions to one or more parameters of the second artificial intelligence unit, wherein the one or more parameters influence the processing of input values and the obtaining of output values in the second artificial intelligence unit;

obtaining second output values (Output2) of the second artificial intelligence unit.

2. The method of claim 1, wherein at least one of the artificial intelligence units comprises a neural network having a plurality of nodes, and wherein the one or more parameters is at least one of: a weighting (w_i) for a node of the neural network, an activation function (f_akt) of a node, an output function (f_out) of a node, a propagation function of a node.

3. A method according to claim 1, wherein a classification memory is associated with each of the artificial intelligence units, wherein each of the artificial intelligence units performs a classification of the input values into one or more classes (K₁, K₂, . . . , K_n, K_m) which are stored in the classification memory, the classes each being structured in one or more dependent levels, and a number of the classes (n) and/or the levels in a first classification memory of the first artificial intelligence unit being less than a number of the classes (m) and/or the levels in a second classification memory of the second artificial intelligence unit.

4. The method according to claim 1, wherein applying the at least one modulation function causes a time-dependent superposition of parameters of the second artificial intelligence unit, and wherein the at least one modulation function (f_{mod_f}, f_{mod_w}) comprises one of the following: a periodic function, a step function, a function with briefly increased amplitudes, a damped oscillation function, a beat function as a superposition of several periodic functions, a continuously increasing function, and a continuously decreasing function.

5. The method of claim 1, wherein the second artificial intelligence unit comprises a second neural network having a plurality of nodes, and wherein applying the at least one modulation function causes deactivation of at least a portion of the nodes.

6. The method of claim 1, further comprising:

determining a currently dominant artificial intelligence entity in the system; and

forming total output values of the system from the output values of the currently dominating unit.

7. The method of claim 6, wherein the first artificial intelligence unit is set as the dominant unit at least until one or more output values (Output2) of the second artificial intelligence unit are available.

8. The method according to claim 6, further comprising a comparison of current input values with previous input values by at least one of the artificial intelligence units of the system, wherein, if the comparison results in a deviation that is above a predetermined input threshold, the first artificial intelligence unit is determined to be the dominant unit.

9. The method of claim 6, further comprising comparing current output values of the first artificial intelligence unit with previous output values of the first artificial unit, wherein, if the comparison results in a deviation that is above a predetermined output threshold, the first artificial intelligence unit is determined to be the dominant unit.

10. The method of claim 6, wherein the system further comprises a timer storing one or more predetermined time periods associated with one or more of the artificial intelligence units, and wherein the timer is arranged to measure, for a respective one of the artificial intelligence units, the elapse of the predetermined time period associated with that unit.

11. The method of claim 10, wherein measuring the assigned predetermined period of time for one of the artificial intelligence units is started when that artificial intelligence unit is determined to be the dominant unit.

12. The method of claim 10, wherein the second artificial intelligence unit is determined to be the dominant unit if a first time period predetermined for the first artificial intelligence unit has elapsed in the timer.

13. The method according to claim 1, wherein the input values (X_i) comprise at least one of the following: measured values detected by one or more sensors, data detected by a user interface, data retrieved from a memory, data received via a communication interface, and data output by a computing unit.