CN114153199A - Method and device for supporting the planning of maneuvers of a vehicle or robot - Google Patents

Method and device for supporting the planning of maneuvers of a vehicle or robot

Info

Publication number
CN114153199A
CN114153199A
Authority
CN
China
Prior art keywords
mapping
robot
approximated
action
look
Prior art date
Legal status
Pending
Application number
CN202110948200.XA
Other languages
Chinese (zh)
Inventor
M. Herbig
Current Assignee
Volkswagen AG
Original Assignee
Volkswagen AG
Priority date
Filing date
Publication date
Application filed by Volkswagen AG filed Critical Volkswagen AG
Publication of CN114153199A


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0217 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with energy consumption, time reduction or distance reduction criteria
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/008 Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mechanical Engineering (AREA)
  • Transportation (AREA)
  • Algebra (AREA)
  • Pure & Applied Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Mathematics (AREA)
  • Feedback Control In General (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Probability & Statistics with Applications (AREA)

Abstract

The invention relates to a method for supporting the maneuver planning of an at least partially autonomous vehicle or robot, wherein a state space is described by means of a Markov decision problem. To support the maneuver planning of the vehicle or robot, an optimal action is determined on the basis of discrete states in the state space by performing at least one optimization method based on the Markov decision problem, wherein a mapping is determined having states in the state space as input values and optimal actions in the state space as output values. The determined mapping is approximated by a function approximation, wherein those elements of the approximated mapping whose output values have an error exceeding a predetermined error threshold relative to the corresponding output values of the determined mapping are stored in a look-up table according to the respectively associated input values. The approximated mapping and the look-up table are provided for use in maneuver planning. The invention also relates to a device, a control device and a vehicle or robot.

Description

Method and device for supporting the planning of maneuvers of a vehicle or robot
Technical Field
The invention relates to a method and a device for supporting the planning of maneuvers of an at least partially autonomous vehicle or robot. The invention further relates to a control device, a vehicle and a robot.
Background
In addition to trajectory planning, i.e. providing the trajectory currently to be traveled, strategic maneuver planning is also required within the scope of maneuver planning in order to implement higher-level strategies in autonomous vehicles. A specific example is a turning situation with multiple lanes and many other traffic participants. It must then be determined when the vehicle has to be in which lane in order to execute the turning process, for example, as comfortably and/or as time-optimally as possible for the passengers, and which other traffic participants must be overtaken for this purpose. In principle, the same problem arises for autonomous mobile robots.
Reinforcement learning methods are known, by means of which the behavior of other traffic participants can be learned and optimal decisions can be made on this basis. In this case, a mapping is learned between states and the optimal actions corresponding to them with respect to a goal expressed by reward values. In other words, a reinforcement learning agent attempts to find actions that maximize the reward value. To find the optimal solution, the reinforcement learning agent must explore the environment thoroughly enough to ensure that the optimal solution is not missed. On the other hand, the agent can exploit situations already experienced at earlier points in time, in which it has found a good solution with a correspondingly high reward value.
Furthermore, Markov decision problems and dynamic programming methods are known.
A problem when describing the state space by means of a Markov decision problem is that the state space grows exponentially with each added dimension ("curse of dimensionality") and the storage requirement increases accordingly.
Disclosure of Invention
The object on which the invention is based is to provide a method and a device for supporting the maneuver planning of an at least partially autonomous vehicle or robot, with which in particular a low storage requirement can be achieved.
According to the invention, this object is achieved by a method for supporting the maneuver planning of an at least partially autonomous vehicle or robot, a device for supporting the maneuver planning of an at least partially autonomous vehicle or robot, and a control device for an at least partially autonomous vehicle or robot, each having the features according to the invention. Advantageous embodiments of the invention emerge from the embodiments described below.
In a first aspect of the invention, a method is provided for supporting the maneuver planning of an at least partially autonomous vehicle or robot, wherein a state space of the environment of the vehicle or robot is described in discrete form by means of a Markov decision problem using an action determination device. To support the maneuver planning of the vehicle or robot, an optimal (discretized) action is determined on the basis of discrete states in the state space by performing at least one optimization method based on the Markov decision problem, wherein a mapping is determined having states in the state space as input values and optimal actions in the state space as output values. The determined mapping is approximated by means of a function approximation using an approximation device, wherein those elements of the approximated mapping whose output values have an error exceeding a predefined error threshold relative to the corresponding output values of the determined mapping are stored in a look-up table according to the respectively associated input values. The approximated mapping and the look-up table are provided for use in maneuver planning.
In a second aspect of the invention, a device for supporting the maneuver planning of an at least partially autonomous vehicle or robot is proposed, comprising an action determination device and an approximation device. The action determination device is set up to describe a state space of the environment of the vehicle or robot in discrete form by means of a Markov decision problem, to determine, in order to support the maneuver planning of the vehicle or robot, an optimal (discretized) action on the basis of discrete states in the state space by performing at least one optimization method based on the Markov decision problem, and to determine a mapping having states in the state space as input values and optimal actions in the state space as output values. The approximation device is set up to approximate the determined mapping by means of a function approximation, wherein those elements of the approximated mapping whose output values have an error exceeding a predefined error threshold relative to the corresponding output values of the determined mapping are stored in a look-up table according to the respectively associated input values. The device is set up to provide the approximated mapping and the look-up table for use in maneuver planning.
In a third aspect of the invention, a method for supporting the maneuver planning of an at least partially autonomous vehicle or robot is provided, wherein an approximated mapping and a look-up table generated according to the method of the first aspect are obtained and/or provided by means of a control device of the vehicle or robot, and an optimal action is provided for the maneuver planning on the basis of identified discrete states of the state space. In this case, it is first checked whether an optimal action is stored in the look-up table for the identified state; if so, the stored optimal action is invoked and provided for the maneuver planning, otherwise the optimal action is estimated by means of the approximated mapping and provided.
In a fourth aspect of the invention, a control device for an at least partially autonomous vehicle or robot is proposed, wherein the control device is set up to obtain and/or provide an approximated mapping and a look-up table generated according to the method of the first aspect and to provide an optimal action for the maneuver planning on the basis of identified discrete states of the state space. To this end, it is first checked whether an optimal action is stored in the look-up table for the identified state; if so, the stored optimal action is invoked and provided for the maneuver planning, otherwise the optimal action is estimated by means of the approximated mapping and provided for the maneuver planning.
These different aspects achieve that the storage requirement does not grow exponentially even with a growing state space. This is achieved by expressing the mapping determined for the maneuver planning both by means of a function approximation and by means of a look-up table, with discrete states in the state space of a Markov decision problem as input values and optimal actions in the state space as output values. One of the basic ideas here is that a large part of the determined mapping can be approximated by a function. Those elements of the approximated mapping, however, for which the error relative to the corresponding element of the (non-approximated) determined mapping exceeds the error threshold (i.e. those associations between discrete states as input values and optimal actions as output values) are stored in a look-up table. A compromise can thus be found between the storage requirement and the accuracy of the provided optimal actions. When the approximated mapping and the look-up table are used for maneuver planning, the look-up table is first checked for whether an optimal action is stored for the currently detected or identified discrete state in the state space. If an optimal action is stored, i.e. there is an entry in the look-up table for the identified discrete state, this entry is invoked and provided for the maneuver planning. If, however, no optimal action is stored in the look-up table for the identified discrete state, the associated optimal action is estimated by means of the approximated mapping.
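For illustration, the following minimal sketch shows this lookup-first query logic; the state encoding, the table contents and the fallback approximator are hypothetical assumptions and not part of the claimed method:

```python
# Minimal sketch of the lookup-first query described above; the state
# encoding, the table contents and the fallback approximator are
# illustrative assumptions.

from typing import Dict, Tuple

State = Tuple[int, ...]

# Look-up table: holds only those discrete states whose approximation
# error exceeded the predefined error threshold.
LOOKUP_TABLE: Dict[State, int] = {
    (0, 10): 1,   # hypothetical outlier entries: state -> optimal action
    (3, 7): 0,
}

def approximated_mapping(state: State) -> int:
    """Stand-in for the function approximation of the determined mapping,
    e.g. a small neural network or decision tree."""
    return 0  # dummy estimate

def optimal_action(state: State) -> int:
    # First check the look-up table for an exactly stored optimal action.
    if state in LOOKUP_TABLE:
        return LOOKUP_TABLE[state]
    # Otherwise fall back to the approximated mapping.
    return approximated_mapping(state)
```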
One of the advantages of the different aspects is that a compromise between storage requirement and accuracy can be found even for large and, in particular, growing state spaces. In particular, all provided optimal actions deviate from the corresponding optimal actions stored in the (non-approximated) determined mapping by no more than the predefined error threshold.
By predefining a suitable error threshold, in particular the size of the look-up table can be influenced. The smaller the predefined error threshold, the more accurately the provided optimal actions match the corresponding optimal actions in the determined mapping. At the same time, however, the storage requirement grows with smaller error thresholds, since the look-up table becomes larger and requires more memory.
In particular, it can be provided that the error threshold is predefined such that a predefined memory space for accommodating the approximated mapping and the look-up table is not exceeded. Such a memory space is limited or defined in particular by the control device in the vehicle or robot in which the approximated mapping and the look-up table are used.
A Markov decision problem (MDP; English: Markov decision process) is a model of sequential decision problems. An agent faces a sequence of decisions, wherein the sequence comprises sequential state transitions between discrete states in a state space. For each state transition, the Markov assumption applies, i.e. the transition probability from state s to state s' depends only on s and not on the past history, i.e. on the predecessors of s; formally, P(s' | s, history) = P(s' | s). The state space maps in particular discrete states in the environment of the vehicle or robot. In principle, the Markov decision problem can also be designed as a factored Markov decision problem (FMDP).
The states in the state space may in particular comprise a plurality of quantities or properties, i.e. the states are in particular multidimensional. A state is then a specific combination of values of these quantities or properties. In particular, the states in the state space are chosen to be discrete. The state space is in particular a higher-level state space, i.e. the states are not represented by raw sensor data but by higher-level features and properties derived from the raw sensor data, for example by means of object and/or pattern recognition. A state may, for example, comprise the position of an obstacle and/or the speed of an obstacle and/or the type or class of an obstacle in the environment. At least when used in a vehicle, the state is derived in particular from sensor data detected by means of at least one sensor.
At least one optimization method is performed on the basis of the Markov decision problem in order to determine the optimal actions of the mapping. To this end, it may be provided in particular that optimal action values for the discretized actions are determined by means of dynamic programming on the basis of the discrete states in the state space, wherein a mapping having states in the state space as input values and action values of the actions in the state space as output values is learned by means of a reinforcement learning method, wherein the reinforcement learning agent is initialized on the basis of the optimal action values determined by means of dynamic programming, and wherein the learned mapping is provided for the maneuver planning. This has the advantage that the reinforcement learning agent does not have to start learning from zero but can start with a solution that is already optimal at least with respect to a number of discrete states in the state space. This is achieved by determining, before applying reinforcement learning, optimal action values of individual actions for discrete states in the state space by means of dynamic programming. The mapping to be trained by the reinforcement learning agent is initialized with the action values determined in this way. The reinforcement learning agent therefore does not have to start from zero, but can build on the action values determined by means of dynamic programming.
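For illustration, the following value-iteration sketch shows how such optimal action values and the resulting state-to-action mapping can be obtained by dynamic programming; the transition model P, the rewards R and all sizes are invented for this sketch:

```python
# Value-iteration sketch on a small, randomly generated Markov decision
# problem; P, R and all sizes are illustrative assumptions.

import numpy as np

n_states, n_actions, gamma = 16, 4, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # R[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)      # action values: Q[s, a]
    V_new = Q.max(axis=1)        # greedy state values
    if np.max(np.abs(V_new - V)) < 1e-8:
        break                    # converged to the optimal values
    V = V_new

# Determined mapping: discrete state -> optimal (discretized) action.
optimal_actions = Q.argmax(axis=1)
```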
In principle, it is also possible to apply only the reinforcement learning method, without initializing it by means of a mapping generated by dynamic programming. The procedure is then analogous to that described above. In principle, other optimization methods can also be used. However, the at least one optimization method used always operates on the basis of a Markov decision problem.
Dynamic programming is a method for solving optimization problems by dividing a complex problem into simpler sub-problems. The solution proceeds recursively. In particular, dynamic programming is an algorithmic paradigm describing a class of optimization methods that use a perfect model of the environment in the form of a Markov decision problem to solve a given problem. Dynamic programming is applied in particular in state spaces with discrete states. In particular, dynamic programming provides, as a result, optimal action values as a measure of the reward of discretized actions on the basis of discrete states in a state space.
Reinforcement learning is a machine learning method in which an agent independently learns a strategy in order to maximize the rewards obtained. The reward may be positive or negative. On the basis of the rewards obtained, the agent approximates a reward function that describes the value a state or an action has. In connection with actions, such values are called action values. The reinforcement learning method considers in particular the interaction of an agent with its environment, which is formulated as a Markov decision problem. Starting from a given state, e.g. derived from detected sensor data of at least one sensor, the agent can reach another state by an action selected from a plurality of actions. Depending on the decision made, i.e. the action performed, the agent receives a reward. The task of the agent is to maximize the expected future return, which is composed of the discounted rewards (i.e. the total reward). At the end of the method, the approximated reward function represents a predefined strategy with which a reward value or an action value can be provided or estimated for each action.
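For illustration, a minimal tabular Q-learning sketch of the action-value update underlying such an agent follows; the environment stub and all hyperparameters are assumptions, and initializing Q from the dynamic-programming result instead of zeros would correspond to the initialization described above:

```python
# Tabular Q-learning sketch; env_step and all hyperparameters are
# illustrative assumptions.

import numpy as np

n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))   # action values Q(s, a)
rng = np.random.default_rng(0)

def env_step(s: int, a: int):
    """Hypothetical environment: returns a next state and a reward."""
    return int(rng.integers(n_states)), float(rng.normal())

s = 0
for _ in range(10_000):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    a = int(Q[s].argmax()) if rng.random() > epsilon else int(rng.integers(n_actions))
    s_next, r = env_step(s, a)
    # Temporal-difference update toward reward plus discounted future value.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```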
It can be provided that at least one optimization method is executed on a computing device optimized for this purpose, for example on a quantum computer.
For example, for a vehicle, the actions may include: driving straight ahead (i.e. staying in the lane without a lane change) with an activated adaptive cruise control (ACC), driving straight ahead without acceleration, driving straight ahead and braking, changing to the left lane or changing to the right lane, etc.
The optimal action for a given state is in particular an action with an optimal action value, i.e. an action for which an optimal action value is determined or has been determined by means of at least one optimization method in a given state.
Rewards or action values for actions in the state space can take into account, inter alia, the following effects: collision avoidance, adherence to the route (i.e. no deviation, or only a slight deviation, from the route specified by the navigation device), time-optimal behavior and/or the comfort or well-being of the vehicle occupants.
In particular, it is provided that a specific mapping is determined, or has already been determined, for a predefined strategy (for example energy efficiency or comfort, etc.) which is expressed via the rewards or action values. This means in particular that the optimal actions stored in a specific mapping are optimal with respect to the predefined strategy.
In particular, it is provided that the mapping determined by means of at least one optimization method, in particular by means of dynamic programming and reinforcement learning methods, has a tabular form.
Alternatively, provision may be made, in particular, for the determined mapping to be provided by means of a neural network, wherein the neural network is trained for initialization in a supervised learning process on the basis of the optimal actions determined, in particular, by means of dynamic programming.
The parts of the device, in particular the action determination device and the approximation device, as well as the control device, may be implemented individually or jointly as a combination of hardware and software, for example as program code executed on a microcontroller or microprocessor.
The vehicle is in particular a motor vehicle. In principle, however, the vehicle may be another land vehicle, a watercraft, an aircraft, a rail vehicle or a spacecraft. The robot can in principle be designed in any way, for example as a transport robot, a production robot or a care robot.
In one embodiment, it is provided that the providing comprises loading the approximated mapping and the look-up table into a memory of a control device of at least one vehicle or at least one robot, so that, when operating the at least one vehicle or the at least one robot, in order to provide an optimal action for an identified discrete state of the state space, it can first be checked by means of the control device whether an optimal action is stored in the look-up table for the identified state; if so, the stored optimal action is invoked and provided for the maneuver planning, otherwise the optimal action can be estimated by means of the approximated mapping and provided for the maneuver planning.
The providing may in particular comprise transmitting the approximated mapping and the look-up table to the at least one control device. The transmission takes place in particular by means of communication interfaces of the device and of the at least one control device provided for this purpose. The at least one control device obtains, in particular receives, the approximated mapping and the look-up table and loads them into a memory, so that they can be provided for the maneuver planning, in particular by invoking and/or providing the optimal actions for identified states.
In one embodiment, it is provided that at least one neural network is trained and provided for the function approximation of the determined mapping. The neural network is trained in a supervised learning process by means of the determined mapping, wherein the mapping is determined in particular by means of dynamic programming and reinforcement learning methods. If the determined mapping is itself constructed by means of a trained neural network, it is provided in particular that the neural network for the function approximation is smaller in scope and complexity than the neural network used to construct the determined mapping, i.e. smaller in structure and in the memory and computing power required for its implementation.
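For illustration, a hedged sketch of this supervised training step follows, with a small scikit-learn multilayer perceptron standing in for the neural network and dummy data standing in for the determined mapping:

```python
# Supervised function approximation of a tabular mapping; the use of
# scikit-learn, the network size and the dummy data are assumptions.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Determined mapping (dummy): every discrete state with its optimal action.
states = np.array([[a, b] for a in range(20) for b in range(20)])
actions = np.where(states.sum(axis=1) < 20, 0, 1)   # invented optimal actions

approximator = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                             random_state=0)
approximator.fit(states, actions)   # supervised training on the mapping
```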
In an alternative embodiment, it is provided that at least one decision tree is used for the function approximation of the determined mapping. The procedure is then substantially similar to that of the previous embodiment.
In principle, other methods for the function approximation of the determined mapping can also be used. The procedure is then substantially similar to that of the foregoing embodiments.
In one embodiment, it is provided that the approximated mapping and the look-up table are provided by means of a backend server. Powerful computers, for example supercomputers, can thus be used for determining the mapping on the basis of a given Markov decision problem by performing the at least one optimization method, in particular dynamic programming and reinforcement learning methods, approximating the mapping, and generating and providing the look-up table. Less computing power is then required when applying the approximated mapping and the look-up table in the control device of the vehicle or robot, so that resources (e.g. computing power, memory, installation space and energy) can be saved there.
In one embodiment of the device, it is provided that the device is designed as a backend server. Such a backend server may, for example, be a powerful supercomputer.
In particular, a method for planning maneuvers of an at least partially autonomous vehicle or robot is also provided, wherein an approximated mapping and a look-up table generated according to the method of the first aspect are used in the maneuver planning.
It can be provided that the method for planning maneuvers further comprises performing the maneuvers, in particular the lateral and longitudinal guidance, by generating and/or providing control signals and/or control data for actuators of the vehicle or robot. The generation and provision of the respective control signals and/or control data serves in particular to carry out the respectively invoked or estimated optimal action. The control device of the vehicle or robot is set up accordingly to carry out these measures.
Furthermore, a vehicle or a robot is also proposed, comprising at least one control device according to one of the described embodiments.
Furthermore, a system is proposed, which comprises at least one device according to one of the described embodiments and at least one control device according to one of the described embodiments.
Further features for the design of the device emerge from the description of the designs of the method. The advantages of the device are the same as those of the embodiments of the method.
Drawings
The invention is explained in more detail below on the basis of preferred exemplary embodiments with reference to the figures, in which:
fig. 1 shows a schematic view of an embodiment of an arrangement for supporting a maneuver planning of an at least partially autonomous vehicle or robot;
fig. 2 shows a schematic diagram for elucidating a method for supporting a maneuver planning of an at least partially autonomous vehicle or robot;
fig. 3 shows a schematic diagram for elucidating a method for supporting a maneuver planning of an at least partially autonomous vehicle or robot.
Detailed Description
Fig. 1 shows a schematic illustration of an embodiment of the device 1 for supporting the maneuver planning of an at least partially autonomous vehicle 50. In particular, the device 1 performs the method described in this disclosure for supporting the maneuver planning of the at least partially autonomous vehicle 50. The example shown relates to a vehicle 50; in principle, however, the device 1 is constructed analogously for a robot.
The device 1 comprises an action determination device 2 and an approximation device 3. These may be implemented individually or jointly as a combination of hardware and software, for example as program code executed on a microcontroller or microprocessor. The device 1 is in particular designed as a backend server 100, which may in particular be a powerful supercomputer.
The action determination device 2 is set up to describe the state space 10 of the environment of the vehicle 50 in discrete form by means of a Markov decision problem. To support the maneuver planning of the vehicle 50, the action determination device 2 performs at least one optimization method based on the Markov decision problem. The at least one optimization method may in particular comprise dynamic programming and/or reinforcement learning methods.
Within the scope of the at least one optimization method, the action determination device 2 determines an optimal action 34 for each state 11 in the state space 10. It is based on the states 11 in the state space 10 and on action values that have been determined for the individual discrete actions in the state space 10, in each case with respect to a predefined strategy (for example energy efficiency or comfort, etc.). From the determined optimal actions 34, the action determination device 2 determines a mapping 30 having the states 11 in the state space 10 as input values and the optimal actions 34 in the state space 10 as output values. The determined mapping 30 is fed to the approximation device 3.
The approximation device 3 is set up to approximate the determined mapping 30 by means of a function approximation. It is provided that those elements of the approximated mapping 31 whose output values have an error exceeding the predefined error threshold 32 relative to the corresponding output values of the determined mapping 30 are stored in the look-up table 33 according to the respectively associated input values. The error is determined in particular by means of a suitable distance measure between the optimal action provided by the determined mapping 30 and the corresponding action provided by the approximated mapping 31. In particular, after the determined mapping 30 has been approximated, the error between the determined mapping 30 and the approximated mapping 31 is computed element by element, wherein all elements of the mappings 30, 31 are compared with one another. For all elements whose respective error exceeds the predefined error threshold, the associated optimal action is stored in the look-up table 33 together with the associated state 11 in the state space 10.
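For illustration, the following self-contained sketch shows the element-wise error check and the construction of the look-up table 33; the determined mapping, the deliberately imperfect stand-in for the approximated mapping 31 and the 0/1 distance measure are assumptions:

```python
# Minimal sketch of the element-wise error check and look-up table
# construction; the mapping, the approximation and the 0/1 distance
# measure are illustrative assumptions.

determined_mapping = {(a, b): ("r" if a + b < 20 else "g")
                      for a in range(20) for b in range(20)}

def approximated_mapping(state):
    """Stand-in for the function approximation of the determined mapping."""
    a, b = state
    return "r" if a + b < 22 else "g"   # deliberately imperfect

ERROR_THRESHOLD = 0.5   # predefined error threshold 32

lookup_table = {}
for state, optimal_action in determined_mapping.items():
    # 0/1 distance measure between the approximated and determined action.
    error = 0.0 if approximated_mapping(state) == optimal_action else 1.0
    if error > ERROR_THRESHOLD:
        lookup_table[state] = optimal_action   # store outlier in the table
```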
The approximated mapping 31 and the look-up table 33 are provided via the communication interface 4 of the device 1 for use in maneuver planning. In this case, it is provided that the approximated mapping 31 and the look-up table 33 are transmitted to at least one vehicle 50 by means of the communication interface 4 and received there by means of the communication interface 52 of the control device 51 of the vehicle 50.
The approximated mapping 31 and the look-up table 33 are loaded into a memory (not shown) of the control device 51 and used there for the maneuver planning. For the maneuver planning, the control device 51 is supplied with the current (discretized) state 11 from the state space 10 of the Markov decision problem. The state 11 is derived and discretized in particular from detected sensor data of at least one sensor (not shown) of the vehicle 50, for example from a camera image detected by means of a camera. For the supplied current state 11, the control device 51 provides an optimal action 34. To this end, the control device 51 first checks whether an optimal action 34 is stored in the look-up table 33 for the identified state 11. If so, the stored optimal action 34 is retrieved from the look-up table 33 and provided for the maneuver planning. Conversely, if no optimal action 34 is stored in the look-up table 33 for the identified state 11, the optimal action 34 is estimated by means of the approximated mapping 31 and provided for the maneuver planning.
Providing the optimal action 34 may in particular comprise feeding the optimal action 34 to a further control device 53, for example a trajectory planner, which plans a trajectory for executing the optimal action 34 and feeds it, for example, to actuators of the vehicle.
In principle, it can also be provided that the device 1 is part of the vehicle 50.
It can be provided that at least one neural network is trained and provided for the functional approximation of the mapping 30. The approximated mapping 31 is then provided by applying the trained neural network. Alternatively, the determined mapping 30 may also be approximated using a decision tree, for example.
Furthermore, it can be provided that the approximated mapping 31 and the look-up table 33 are provided by means of the backend server 100.
Fig. 2 shows a schematic diagram for illustrating the method for supporting the maneuver planning of an at least partially autonomous vehicle or robot. Only a strongly simplified example is shown, which nevertheless illustrates the procedure when approximating the determined mapping 30.
In this simple example, the determined mapping 30 comprises an assignment between states 11 of the state space and optimal actions 34. In the example shown, a state 11 has only two dimensions "A" and "B". The optimal action 34 likewise has only two values, "r" and "g". This is strongly simplified: a real state 11 may have many dimensions, and a real optimal action 34 may likewise take many values.
Within the scope of the method described in this disclosure, the determined mapping 30, i.e. the respective state-dependent optimal action 34, is determined by means of the at least one optimization method, in particular by means of dynamic programming and reinforcement learning methods.
The determined mapping 30 is approximated by a function approximation, which in this strongly simplified example corresponds to classifying the values "r" and "g" according to the dimensions "A" and "B". The classification can be carried out in particular by means of a neural network, which is trained by means of the determined mapping 30 in order to estimate the respective optimal action 34 ("r" or "g") for a given state 11. The result of this training is an approximated mapping 31 that allows the optimal action 34 to be estimated from the state 11 (with dimensions "A" and "B").
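For illustration, the strongly simplified example can be reproduced, for instance, with a decision tree classifier (the alternative approximator mentioned above); the grid of states and the class boundary are invented:

```python
# Toy version of the Fig. 2 example: classify the optimal action ("r" or
# "g") from the state dimensions "A" and "B"; all data are invented.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

A, B = np.meshgrid(np.arange(0, 50, 10), np.arange(0, 50, 10))
states = np.column_stack([A.ravel(), B.ravel()])
actions = np.where(states[:, 0] + states[:, 1] < 50, "r", "g")

clf = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print(clf.predict([[10, 20]]))   # estimated optimal action for A=10, B=20
```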
In addition, outliers (Ausreißer) 35 are determined, i.e. those combinations of states 11 and optimal actions 34 that are not mapped correctly, or only with insufficient accuracy, by the approximated mapping 31. In the simple example with only the two values "r" and "g" for the optimal action, these are the outliers 35 with optimal action "g" that lie within the region for which the approximated mapping 31 estimates the optimal action "r", and the outliers 35 with optimal action "r" that lie within the region for which the approximated mapping 31 estimates the optimal action "g". The difference between the estimated optimal action and the optimal action 34 in the determined mapping 30 is regarded as the error. For each element of the determined mapping 30, this error is determined and compared with the error threshold. In this simple example, any deviating value of the optimal action counts as an error exceeding the threshold. In a real application of the method, the predefined error threshold corresponds to a predefined difference threshold between optimal actions 34, wherein a suitable distance measure is applied in each case (for example a scalar product if the actions can be expressed as vectors, etc.).
For the outliers 35, a look-up table 33 is generated in which the assignments between states 11 and optimal actions 34 are stored. The entries present in the look-up table 33 correspond to the determined mapping 30. The look-up table 33 contains entries only for the outliers 35; for the other states 11 there are no entries.
Alternatively, it can also be provided that an action value is determined for the optimal action estimated for a state 11 by means of the approximated mapping 31. To determine this action value, the optimal action estimated by means of the approximated mapping 31 can be compared, for example, with the actions considered within the scope of dynamic programming when finding the optimal action 34 for the associated discrete state 11. The estimated optimal action is then assigned the action value of that action which is closest to it (in a simple example, both actions comprise, for example, the same vehicle acceleration of 2 m/s²). The action value assigned to the estimated optimal action in this way can then be compared with the action value of the optimal action 34 stored in the determined mapping 30; the action value of the optimal action 34 can likewise be obtained, for example, within the scope of dynamic programming. Using a difference threshold for the action values, predefined as the error threshold, it can then be determined whether the approximated mapping 31 correctly reproduces the combination of state 11 and optimal action 34 of the determined mapping 30. If the difference between the assigned action value of the estimated optimal action and the action value of the optimal action 34 stored in the determined mapping 30 is below the difference threshold, the optimal action 34 for the state 11 is estimated by the approximated mapping 31. If the difference reaches or exceeds the difference threshold, the optimal action 34 for the state 11 is stored in the look-up table 33. This alternative procedure can be applied generally and is not limited to the simple example described.
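For illustration, a self-contained sketch of this alternative check via action values follows; the states, the Q-values and the difference threshold are invented:

```python
# Sketch of the alternative error check via action values; all numbers,
# the Q-values and the difference threshold are illustrative assumptions.

determined_mapping = {(0, 10): 1, (10, 20): 0}   # state -> optimal action 34
action_values = {(0, 10): [0.20, 0.90],          # Q(s, a) per discrete action
                 (10, 20): [0.80, 0.75]}

def approximated_mapping(state):
    """Stand-in for the approximated mapping 31."""
    return 0   # dummy: always estimates action 0

DIFF_THRESHOLD = 0.1   # difference threshold predefined as error threshold

lookup_table = {}
for state, optimal_action in determined_mapping.items():
    q_estimated = action_values[state][approximated_mapping(state)]
    q_optimal = action_values[state][optimal_action]
    if abs(q_optimal - q_estimated) >= DIFF_THRESHOLD:
        lookup_table[state] = optimal_action   # deposit in look-up table 33

# Result: (0, 10) is deposited (|0.90 - 0.20| >= 0.1); (10, 20) is not,
# since the estimated action there is already the optimal one.
```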
In a variant of this alternative, it can be provided that the action value is estimated, for example by a neural network set up and trained accordingly for this purpose, in the same way as the optimal action is estimated by means of the approximated mapping 31. The estimated action value can then be compared, as described above, with the action value of the respectively associated optimal action 34 stored in the determined mapping 30 in order to determine, by means of the difference threshold, whether the optimal action 34 for the associated state 11 should be stored in the look-up table 33.
The approximated mapping 31 and the look-up table 33 are provided for the maneuver planning, in particular loaded into a memory of a control device of at least one vehicle or at least one robot.
Fig. 3 shows a schematic diagram for illustrating the method for supporting the maneuver planning of an at least partially autonomous vehicle or robot as performed in a control device in the vehicle or robot. The example introduced with respect to Fig. 2 is used again here for illustration.
The approximated mapping 31 and the look-up table 33 are obtained and/or provided by means of a control device of the vehicle or robot. For example, it can be provided that the approximated mapping 31 and the look-up table 33 are generated by means of a backend server and transmitted to the control device. They are loaded into the memory of the control device and provided from this memory for the maneuver planning.
For a current state 11 in the state space 10, which is identified and discretized, for example, on the basis of detected sensor data, it is checked whether an optimal action 34 is stored in the look-up table 33 for this state. If so, the stored optimal action 34 is invoked and provided for the maneuver planning (this is the case, for example, for A=0 and B=10 with optimal action "g"). If the check shows that no optimal action 34 is stored in the look-up table 33 (for example for A=10 and B=20), the optimal action 34 is estimated by means of the approximated mapping 31 and provided for the maneuver planning.
The optimal action 34 is then carried out, for example by planning a trajectory by means of a trajectory planner and actuating actuators of the vehicle or robot by means of the control device.
List of reference numerals
1 apparatus
2 action determination device
3 approximation device
4 communication interface
10 state space
11 state
30 determined mapping
31 approximated mapping
32 error threshold
33 lookup table
34 optimal action
35 outlier
50 vehicle
51 control device
52 communication interface
53 further control device
100 backend server.

Claims (10)

1. A method for supporting the maneuver planning of an at least partially autonomous vehicle (50) or robot, wherein a state space (10) of the environment of the vehicle (50) or robot is described in discrete form by means of a Markov decision problem using an action determination device (2),
wherein for supporting a maneuver planning of the vehicle (50) or robot, an optimal action is determined based on discrete states (11) in the state space (10) by performing at least one optimization method based on a Markov decision problem, wherein a map (30) is determined having the states (11) in the state space (10) as input values and the optimal actions (34) in the state space (10) as output values,
wherein the determined mapping (30) is approximated by means of an approximation device (3) using a function approximation, wherein those elements of the approximated mapping (31) whose output values have an error exceeding a predefined error threshold (32) relative to the corresponding output values of the determined mapping (30) are stored in a look-up table (33) according to the respectively associated input values, and wherein the approximated mapping (31) and the look-up table (33) are provided for use in maneuver planning.
2. The method of claim 1, wherein the providing comprises: loading the approximated mapping (31) and the look-up table (33) into a memory of a control device (51) of at least one vehicle (50) or at least one robot, so that, when operating the at least one vehicle (50) or the at least one robot, in order to provide an optimal action (34) for an identified discrete state (11) of the state space (10), it can first be checked by means of the control device (51) whether an optimal action (34) is stored in the look-up table (33) for the identified state (11); if so, the stored optimal action (34) can be invoked and provided for the maneuver planning, otherwise the optimal action (34) can be estimated by means of the approximated mapping (31) and provided for the maneuver planning.
3. The method according to claim 1 or 2, characterized in that at least one neural network is trained and provided for the function approximation of the determined mapping (30).
4. The method according to any one of the preceding claims, characterized in that the approximated mapping (31) and the look-up table (33) are provided by means of a backend server (100).
5. A method for supporting the maneuver planning of an at least partially autonomous vehicle (50) or robot, wherein an approximated mapping (31) and a look-up table (33) generated according to the method of any one of claims 1 to 4 are obtained and/or provided by means of a control device (51) of the vehicle (50) or robot, and an optimal action (34) is provided for the maneuver planning on the basis of identified discrete states (11) of a state space (10), wherein it is first checked whether an optimal action (34) is stored in the look-up table (33) for the identified state (11); if so, the stored optimal action (34) is invoked and provided for the maneuver planning, otherwise the optimal action (34) is estimated by means of the approximated mapping (31) and provided.
6. A method for planning maneuvers of an at least partially autonomous vehicle (50) or robot, wherein an approximated mapping (31) and a look-up table (33) generated according to the method of any one of claims 1 to 4 are used in the maneuver planning.
7. A device (1) for supporting the maneuver planning of an at least partially autonomous vehicle (50) or robot, the device comprising
an action determination device (2) and
an approximation device (3),
wherein the action determination device (2) is set up to describe a state space (10) of the environment of the vehicle (50) or robot in discrete form by means of a Markov decision problem,
to determine, in order to support the maneuver planning of the vehicle (50) or robot, an optimal action on the basis of discrete states (11) in the state space (10) by performing at least one optimization method based on the Markov decision problem, and to determine a mapping (30) having states (11) in the state space (10) as input values and optimal actions (34) in the state space (10) as output values, and
wherein the approximation device (3) is set up to approximate the determined mapping (30) by means of a function approximation, wherein those elements of the approximated mapping (31) whose output values have an error exceeding a predefined error threshold (32) relative to the corresponding output values of the determined mapping (30) are stored in a look-up table (33) according to the respectively associated input values, and
wherein the device (1) is set up to provide the approximated mapping (31) and the look-up table (33) for use in maneuver planning.
8. The device (1) according to claim 7, characterized in that the device (1) is configured as a backend server (100).
9. A control device (51) for an at least partially autonomous vehicle (50) or robot, wherein the control device (51) is set up to obtain and/or provide an approximated mapping (31) and a look-up table (33) generated according to the method of any one of claims 1 to 4 and to provide an optimal action (34) for the maneuver planning on the basis of identified discrete states (11) of a state space (10), and to this end first to check whether an optimal action (34) is stored in the look-up table (33) for the identified state (11); if so, the stored optimal action (34) is invoked and provided for the maneuver planning, otherwise the optimal action (34) is estimated by means of the approximated mapping (31) and provided for the maneuver planning.
10. A vehicle (50) or robot comprising at least one control device (51) according to claim 9.
CN202110948200.XA 2020-08-18 2021-08-18 Method and device for supporting the planning of maneuvers of a vehicle or robot Pending CN114153199A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102020210465.4A DE102020210465A1 (en) 2020-08-18 2020-08-18 Method and device for supporting maneuver planning for an at least partially automated vehicle or a robot
DE102020210465.4 2020-08-18

Publications (1)

Publication Number Publication Date
CN114153199A 2022-03-08

Family

ID=80112577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110948200.XA Pending CN114153199A (en) 2020-08-18 2021-08-18 Method and device for supporting the planning of maneuvers of a vehicle or robot

Country Status (2)

Country Link
CN (1) CN114153199A (en)
DE (1) DE102020210465A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117374975B (en) * 2023-12-06 2024-02-27 国网湖北省电力有限公司电力科学研究院 Real-time cooperative voltage regulation method for power distribution network based on approximate dynamic programming

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102016218121A1 (en) 2016-09-21 2018-03-22 Bayerische Motoren Werke Aktiengesellschaft Control device for planning an at least partially automatic longitudinal and / or transverse guidance
DE102017218143A1 (en) 2017-10-11 2019-04-11 Robert Bosch Gmbh Method and device for driving a vehicle electronic planning module
DE102018213971A1 (en) 2018-08-20 2020-02-20 Bayerische Motoren Werke Aktiengesellschaft Method and device for selecting a driving maneuver
US20200156241A1 (en) 2018-11-21 2020-05-21 Ford Global Technologies, Llc Automation safety and performance robustness through uncertainty driven learning and control

Also Published As

Publication number Publication date
DE102020210465A1 (en) 2022-02-24

Similar Documents

Publication Publication Date Title
US6745089B2 (en) Adaptable state based control system
US20210263526A1 (en) Method and device for supporting maneuver planning for an automated driving vehicle or a robot
US6922681B2 (en) Problem partitioning method and system
JP2023504223A (en) Adaptive control of automated or semi-autonomous vehicles
JP7282271B2 (en) Direct and indirect control of mixed autonomous vehicle platoons
CN111679660A (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
CN114153199A (en) Method and device for supporting the planning of maneuvers of a vehicle or robot
Konstantinidis et al. Parameter sharing reinforcement learning for modeling multi-agent driving behavior in roundabout scenarios
Sonny et al. Q-learning-based unmanned aerial vehicle path planning with dynamic obstacle avoidance
Schuurmans et al. Learning-based risk-averse model predictive control for adaptive cruise control with stochastic driver models
US20210398014A1 (en) Reinforcement learning based control of imitative policies for autonomous driving
US20180253102A1 (en) Method, device, and computer readable storage medium with motor plant instructions for a motor vehicle
Lee et al. Incremental receptive field weighted actor-critic
Xiao et al. Learning feasibility constraints for control barrier functions
Li et al. Reciprocal collision avoidance for general nonlinear agents using reinforcement learning
US11872703B2 (en) Method and device for deterministic sampling-based motion planning
Bansal et al. Control and safety of autonomous vehicles with learning-enabled components
US20220308530A1 (en) System for Performing a Task According to a Reference Trajectory
CN113614747A (en) Method, device and computer program for operating a deep neural network
Rothfuß et al. Decentralized path planning for cooperating autonomous mobile units
CN114347032B (en) Control method and system of composite AGV robot
Oesterheld et al. A model reference adaptive controller based on operator-valued kernel functions
Huang et al. Potential hazard-aware adaptive shared control for human-robot cooperative driving in unstructured environment
Abreu et al. Fuzzy behaviors and behavior arbitration in autonomous vehicles
Petrovska et al. Towards Context Modeling for Dynamic Collaborative Embedded Systems in Open Context.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination