WO2022112298A1

WO2022112298A1 - Method for generating a temporal sequence of augmented data frames

Info

Publication number: WO2022112298A1
Application number: PCT/EP2021/082772
Authority: WO
Inventors: Florian Drews; Claudius Glaeser; Florian Faion; Lars Rosenbaum; Koba Natroshvili
Original assignee: Robert Bosch Gmbh
Priority date: 2020-11-25
Filing date: 2021-11-24
Publication date: 2022-06-02
Also published as: DE102020214766A1

Abstract

The invention relates to a method for generating a temporal sequence of augmented data frames, comprising the following steps: providing a temporal sequence of sensor data frames which are generated by a sensor system; determining at least one augmented data frame by interpolating a scene which is characterized by successive sensor data frames of at least a part of the temporal sequence of sensor data frames; and generating the temporal sequence of augmented data frames by a temporal succession of a plurality of determined augmented data frames, said temporal succession corresponding to the temporal sequence of sensor data frames.

Description



Method for generating a temporal sequence of augmented data frames

State of the art

Machine learning methods, especially deep learning, are becoming increasingly important in the field of automated driving. Deep neural networks are increasingly used in the field of environmental perception with sensors installed in the vehicle. These are typically trained using supervised learning, i.e. on the basis of labeled environment data. However, the creation and in particular the labeling of corresponding data sets is very time-consuming and costly.

In order to increase the diversity of data sets, methods for data augmentation are often used. The recorded sensor data is artificially changed, which increases the scope of the data set. Typical augmentations include, for example, adding noise, cropping or scaling camera images, or rotating and mirroring LIDAR point clouds, radar reflections or individual objects in the vehicle environment.

Disclosure of Invention

These methods for data augmentation are currently applied to individual frames of sensor data (e.g. individual images from a video camera). They are therefore particularly suitable for supporting the training of neural networks that implement a single-frame application, such as the detection of objects in individual frames. According to aspects of the invention, a method for generating a temporal sequence of augmented data frames, a use of the method, a device, a computer program and a machine-readable storage medium according to the features of the independent claims are proposed. Advantageous configurations are the subject of the dependent claims and the following description.

Throughout this description of the invention, the sequence of method steps is presented in such a way that the method is easy to follow.

However, those skilled in the art will recognize that many of the method steps can also be carried out in a different order and lead to the same or a corresponding result. In this sense, the order of the method steps can be changed accordingly. Some features are numbered to improve readability or to clarify attribution, but this does not imply the presence of specific features.

According to one aspect, a method for generating a temporal sequence of augmented data frames is proposed, which has the following steps:

In one step, a temporal sequence of sensor data frames generated by a sensor system is provided. In a further step, at least one augmented data frame is determined by interpolating a scene that is characterized by consecutive sensor data frames of at least part of the temporal sequence of sensor data frames. In a further step, the temporal sequence of augmented data frames is generated as a temporal succession of a plurality of determined augmented data frames, this succession corresponding to the temporal sequence of sensor data frames.

This method can be used to provide training data sets for training perception, prediction or planning tasks that operate on the basis of multiple frames, with the data sets having temporal sequences of augmented data frames and possibly associated labels or other object attributes.

The method can therefore be used for various systems that are based on sequence-based data sets. In particular, it can be used for training neural networks for perception, prediction or planning tasks in the field of driver assistance systems and automated driving. An application in other areas (e.g. robotics) is also conceivable. In addition, data sets generated according to the method can advantageously be used for the verification and validation of algorithms and system components.

An example of an application of such training data sets is the temporal tracking of detected objects in a vehicle environment. The diversity of sequence-based sensor data frames can be artificially increased by generating a temporal sequence of augmented data frames. The temporal sequence of augmented data frames can enable a more comprehensive training of neural networks for sequence-based applications and thus increase their accuracy and robustness.

By selecting only part of the temporal sequence of sensor data frames, the frame rate can be changed, and by changing the frame rate, speeds of the ego vehicle and of all other road users that differ from those in the original temporal sequence of sensor data frames can be simulated. This is because, if the sensor data frames and the augmented data frames are played back at the same frame rate, the augmented data frames generated in this way result in a slower or faster “playback” of the sensor data. In this way, a slower or faster movement of road users, including the recording ego vehicle, can be simulated.

Alternatively or additionally, a temporal sequence of augmented data frames can have additional data frames, so that both an increased and a reduced speed of the road users can be simulated with such sequences of data frames.

For example, an additional temporal sequence of augmented data frames can be generated from the temporal sequence of sensor data frames by taking over only every Nth frame of the sequence of sensor data frames into the sequence of augmented data frames. At a constant frame rate, this simulates an N-fold speed of the road users. This procedure can be repeated for different integers N, so that the sequence-based data set can be artificially enlarged many times over.
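A minimal Python sketch of this subsampling is given below, assuming the frames are held in an ordinary Python sequence. The functions `every_nth` and `augment_by_subsampling` are hypothetical names, and the use of several start offsets per N is an assumption added for illustration; the text above only specifies keeping every Nth frame.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def every_nth(frames: Sequence[T], n: int, offset: int = 0) -> List[T]:
    """Keep only every n-th frame, starting at index `offset`.

    Replayed at the original frame rate, the result shows all road users
    (including the ego vehicle) moving n times as fast.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0 <= offset < n:
        raise ValueError("offset must satisfy 0 <= offset < n")
    return list(frames[offset::n])


def augment_by_subsampling(frames: Sequence[T], factors: Sequence[int]) -> List[List[T]]:
    """Build one augmented sequence per speed-up factor N and start offset (assumption)."""
    sequences = []
    for n in factors:
        for offset in range(n):
            sequences.append(every_nth(frames, n, offset))
    return sequences


frames = list(range(10))  # stand-ins for ten recorded sensor data frames
triple_speed = every_nth(frames, 3)
all_sequences = augment_by_subsampling(frames, [2, 3])
```

With N = 2 and N = 3, ten recorded frames yield five additional sequences, each simulating a two- or three-fold speed when replayed at the original frame rate.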

Alternatively or additionally, the temporal sequence of augmented data frames can be generated by interpolating the sensor data frames. Such an interpolation of frames allows arbitrary speeds of road users to be simulated, both faster and slower than in the original sequence of sensor data frames. Since the original sequence of sensor data frames can be interpolated for any sampling frequency, the method allows an unlimited artificial enlargement of the temporal sequence of sensor data frames.
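The following sketch illustrates the interpolation-based variant under simplifying assumptions: each frame is a NumPy array of fixed shape (for example a vector of object positions), and neighbouring frames are blended linearly. `resample_sequence` is a hypothetical helper; naive linear blending is not suitable for raw camera images, which is why interpolating derived representations is preferred for such data.

```python
import numpy as np


def resample_sequence(frames: np.ndarray, timestamps: np.ndarray,
                      speed: float, dt: float) -> np.ndarray:
    """Resample `frames` (shape (T, ...), one frame per timestamp) so that
    replaying the result with period `dt` shows motion `speed` times as fast
    as in the recording (speed > 1: faster, speed < 1: slower)."""
    if speed <= 0 or dt <= 0:
        raise ValueError("speed and dt must be positive")
    if len(timestamps) < 2:
        raise ValueError("at least two recorded frames are needed")
    t0, t1 = timestamps[0], timestamps[-1]
    step = dt * speed  # recorded time covered per output frame
    n_out = int(np.floor((t1 - t0) / step + 1e-9)) + 1
    query = np.minimum(t0 + np.arange(n_out) * step, t1)
    # index of the recorded frame at or before each query time
    idx = np.searchsorted(timestamps, query, side="right") - 1
    idx = np.clip(idx, 0, len(timestamps) - 2)
    alpha = (query - timestamps[idx]) / (timestamps[idx + 1] - timestamps[idx])
    alpha = alpha.reshape((-1,) + (1,) * (frames.ndim - 1))
    # linear blend of the two neighbouring recorded frames
    return (1.0 - alpha) * frames[idx] + alpha * frames[idx + 1]


# an object moving at 10 m/s along x, recorded at 10 Hz
ts = np.arange(5) * 0.1
positions = np.stack([10.0 * ts, np.zeros_like(ts)], axis=1)
slow = resample_sequence(positions, ts, speed=0.5, dt=0.1)  # half speed, more frames
fast = resample_sequence(positions, ts, speed=2.0, dt=0.1)  # double speed, fewer frames
```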

The variability of the available data frames can be increased by such a generation of a temporal sequence of augmented data frames. As a result, corresponding neural networks for perception, prediction and planning tasks can be trained more efficiently, which is reflected in increased precision and/or robustness.

In addition, temporal sequences of augmented data frames generated in this way enable the simulation of traffic events that are observed only rarely or in insufficient numbers in reality, such as speeding by road users.

Using temporal sequences of augmented data frames generated in this way also reduces the effort involved in recording and storing the corresponding sensor data, which saves time and costs.

Advantageously, in addition to using the temporal sequence of augmented data frames for training neural networks, this can also be used for the verification and validation of algorithms and system components in the area of driver assistance systems and automated driving.

According to one aspect, it is proposed that the temporal sequence of augmented data frames is provided for a system for perceiving the surroundings and/or for object prediction and/or for planning the behavior of at least partially automated mobile platforms.

In particular, the temporal sequence of augmented data frames can be provided for training machine learning systems and/or for training neural networks.

According to one aspect, it is proposed that a system created with the temporal sequence of augmented data frames is used for environmental perception and/or for object prediction and/or for behavior planning of at least partially automated mobile platforms.

According to one aspect, it is proposed that the scene is characterized by sensor data frames with raw sensor data from the sensor system and/or with a representation of the raw sensor data from the sensor system.

Generating a temporal sequence of augmented data frames by interpolating a temporal sequence of sensor data frames to new points in time can prove difficult if the sensor data frames are based on raw sensor data; images from optical camera systems, for example, are difficult to interpolate. It is therefore proposed here to perform the interpolation on a representation of the raw sensor data of the sensor system. This is particularly advantageous when the neural networks for perception, prediction or planning tasks themselves operate on such derived representations. This applies in particular to occupancy maps, which can be used for various applications in the field of driver assistance systems and automated driving.

The relative distances of the raw sensor data and/or the relative distances of the representation are particularly relevant for the method for generating a temporal sequence of augmented data frames.

According to one aspect, it is proposed that the scene is characterized by sensor data frames with an occupancy grid as a representation of the respective raw sensor data of the sensor system.

By means of such occupancy maps as a representation of the respective raw sensor data, the interpolation of a scene can be simplified, depending on the type of raw sensor data.
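As an illustration of such a representation, the sketch below rasterises bird's-eye-view LIDAR points into a binary occupancy grid. The function name, the binary occupancy model and the grid parameters are assumptions for illustration; practical occupancy grids often store occupancy probabilities rather than booleans.

```python
import numpy as np


def occupancy_grid(points: np.ndarray, x_range: tuple, y_range: tuple,
                   cell: float) -> np.ndarray:
    """Binary bird's-eye-view occupancy grid from points of shape (N, 2) or (N, 3).

    grid[i, j] is True if at least one point lies in cell (i, j), where i indexes
    x and j indexes y; points outside the given ranges are ignored."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = np.zeros((nx, ny), dtype=bool)
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid[ix[inside], iy[inside]] = True
    return grid


pts = np.array([[0.5, 0.5], [1.5, 0.2], [10.0, 10.0]])
grid = occupancy_grid(pts, (0.0, 4.0), (0.0, 4.0), cell=1.0)
```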

According to one aspect, it is proposed that a plurality of augmented data frames of the temporal sequence of augmented data frames is determined by interpolating the scene using consecutive sensor data frames in such a way that the scene of the respective augmented data frame is shifted in time relative to the scene of the corresponding sensor data frame, in order to generate a temporal jitter in the temporal sequence of augmented data frames. A temporal jitter can be understood as a temporal fluctuation of the sampling instants, corresponding to a signal whose sampling frequency is noisy. According to this aspect of the method, the generation of a temporal sequence of augmented data frames is thus combined with additional noise in the sampling frequency, i.e. temporal jitter.
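One way to produce such jittered sampling instants is sketched below. The Gaussian noise model, the bound of just under half a frame period and the function name `jittered_times` are assumptions, not specified above; the resulting instants could then serve as the query times at which the scene is interpolated.

```python
import numpy as np


def jittered_times(t_start: float, t_end: float, period: float, sigma: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Nominal sampling instants t_start + k * period, each shifted by
    zero-mean Gaussian noise (standard deviation sigma) to model temporal jitter."""
    n = int(np.floor((t_end - t_start) / period + 1e-9)) + 1
    nominal = t_start + np.arange(n) * period
    # bound each shift below half a period so the instants stay in temporal order
    max_shift = 0.45 * period
    jitter = np.clip(rng.normal(0.0, sigma, size=n), -max_shift, max_shift)
    # keep the instants inside the recorded time span, where interpolation is possible
    return np.clip(nominal + jitter, t_start, t_end)


rng = np.random.default_rng(0)
times = jittered_times(0.0, 1.0, period=0.1, sigma=0.01, rng=rng)
```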

As a result, temporal inaccuracies that occur when sensor data is recorded, such as variable exposure times of camera images, can be mapped and correspondingly simulated with the temporal sequence of augmented data frames. A correspondingly augmented data set can thus contribute to perception, prediction or planning methods becoming more robust in relation to such a temporal jitter.

According to one aspect, it is proposed that the method for generating a temporal sequence of augmented data frames has the following steps:

In one step, at least a first object is identified in the respective scene of consecutive sensor data frames of at least part of the temporal sequence of sensor data frames. In a further step, at least a second object is identified in the respective scene of the consecutive sensor data frames of at least this part of the temporal sequence of sensor data frames. In a further step, at least one augmented data frame of the temporal sequence of augmented data frames is determined, the interpolation for the at least first object in the scene being different from the interpolation for the at least second object in the scene.

In this case, the first object and/or the second object can relate both to a static object and to a dynamic object of the respective scene. For example, this advantageously makes it possible to interpolate augmented data frames in which two dynamic objects of the scene each have different speeds in comparison with the sensor data frames.

This is because if the interpolation of the respective scene is carried out in the same way for the entire data frame, starting from the respective sensor data frame, the same change in speed results for all objects in the scene.

With this aspect of the method, the speeds of the road users or dynamic objects can be changed independently of one another, i.e. the speeds of the dynamic objects can be set object-specifically.
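A sketch of object-specific speed changes is shown below, assuming each dynamic object is available as a labelled track of timestamps and 2D positions. The names `augmented_object_states` and `speed_factors` are hypothetical, and linear interpolation between labelled positions is an assumed motion model.

```python
import numpy as np


def object_positions_at(track_t: np.ndarray, track_xy: np.ndarray,
                        query_t: np.ndarray) -> np.ndarray:
    """Linearly interpolate a 2D track (times (T,), positions (T, 2)) at query_t.
    np.interp holds the first/last position outside the recorded interval."""
    x = np.interp(query_t, track_t, track_xy[:, 0])
    y = np.interp(query_t, track_t, track_xy[:, 1])
    return np.stack([x, y], axis=-1)


def augmented_object_states(tracks: dict, speed_factors: dict,
                            out_times: np.ndarray) -> dict:
    """Sample each object's track with its own speed factor (default 1.0),
    so that dynamic objects can be sped up or slowed down independently."""
    result = {}
    for obj_id, (t, xy) in tracks.items():
        s = speed_factors.get(obj_id, 1.0)
        # an object with factor s covers s seconds of its recorded motion per output second
        result[obj_id] = object_positions_at(t, xy, t[0] + (out_times - t[0]) * s)
    return result


t = np.array([0.0, 1.0, 2.0])
tracks = {
    "car": (t, np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])),
    "bike": (t, np.array([[0.0, 5.0], [2.0, 5.0], [4.0, 5.0]])),
}
# only the car is sped up; the bike keeps its recorded speed
states = augmented_object_states(tracks, {"car": 2.0}, np.array([0.0, 0.5, 1.0]))
```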

According to one aspect it is proposed that the interpolation for determining the at least one augmented data frame for the at least first object and/or for the second object is dependent on a change in a position of an ego vehicle in the scene, the sensor system of which provides the temporal sequence of the sensor data frames.

The ego vehicle can be viewed as a mobile platform that has the sensor system that generates and provides the temporal sequence of sensor data frames. For static objects, taking into account the change in the position of the host vehicle, corresponding to the known movement of the host vehicle in the form of a displacement or translation and a rotation, is sufficient for determining the augmented data frame. For dynamic objects, trajectories of the respective dynamic objects must also be taken into account.
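For static objects, the shift and rotation can be sketched as a planar rigid-body (SE(2)) transformation, assuming the ego poses (x, y, yaw) at both points in time are known in a common world frame, for example from odometry. The helper names are hypothetical and the restriction to 2D is a simplification.

```python
import numpy as np


def rot(yaw: float) -> np.ndarray:
    """2x2 rotation matrix for a heading angle yaw (radians)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s], [s, c]])


def transfer_static_points(points: np.ndarray, pose_from, pose_to) -> np.ndarray:
    """Express static 2D points, given in the ego frame at pose_from, in the
    ego frame at pose_to. Poses are (x, y, yaw) in a common world frame."""
    # ego frame at pose_from -> world frame
    world = points @ rot(pose_from[2]).T + np.asarray(pose_from[:2], dtype=float)
    # world frame -> ego frame at pose_to
    return (world - np.asarray(pose_to[:2], dtype=float)) @ rot(pose_to[2])


# ego drives 1 m straight ahead: a static point 5 m ahead is then 4 m ahead
moved = transfer_static_points(np.array([[5.0, 0.0]]), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```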

According to one aspect it is proposed that at least one object is determined as a dynamic object based on a label for the interpolation which is assigned to the object in the respective scene.

In this case, the respective label can be assigned to a respective frame with, for example, a list of objects for a frame. This means that a number of raw data points that fall into a cuboid (bounding box) can be assigned to the respective object using the object list.
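The assignment of raw data points to labelled objects can be sketched as a point-in-box test. For brevity, this assumes 2D bird's-eye-view boxes (centre, length, width, yaw) instead of full cuboids, ignores any height coordinate, assumes non-negative integer object IDs, and uses hypothetical function names.

```python
import numpy as np


def points_in_box(points: np.ndarray, center, length: float, width: float,
                  yaw: float) -> np.ndarray:
    """Boolean mask of points (N, 2 or more) whose x/y lie inside an oriented 2D box."""
    c, s = np.cos(yaw), np.sin(yaw)
    d = points[:, :2] - np.asarray(center, dtype=float)
    # rotate the offsets into the box frame (x along the box length)
    local_x = d[:, 0] * c + d[:, 1] * s
    local_y = -d[:, 0] * s + d[:, 1] * c
    return (np.abs(local_x) <= length / 2) & (np.abs(local_y) <= width / 2)


def split_static_dynamic(points: np.ndarray, dynamic_boxes: dict):
    """Split points into static and dynamic parts using labelled dynamic boxes.
    Returns static points, dynamic points and the object ID of each dynamic point."""
    ids = np.full(len(points), -1)  # -1 marks points not inside any dynamic box
    for obj_id, box in dynamic_boxes.items():
        ids[points_in_box(points, **box)] = obj_id
    dynamic = ids >= 0
    return points[~dynamic], points[dynamic], ids[dynamic]


pts = np.array([[0.0, 0.0], [10.0, 0.0], [10.5, 0.5], [20.0, 20.0]])
boxes = {7: {"center": (10.0, 0.0), "length": 4.0, "width": 2.0, "yaw": 0.0}}
static_pts, dynamic_pts, dynamic_ids = split_static_dynamic(pts, boxes)
```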

According to one aspect it is proposed that at least one object is determined as a dynamic object for the interpolation, based on an automatic determination of dynamic areas of the scene.

In other words, the division into static and dynamic objects, or into raw sensor data that can be assigned to a static or a dynamic object, can be made via labels, for example labels assigned to a bounding box, in which case the raw sensor data falling within this bounding box are assigned to the respective object.

Data sets of temporal sequences of sensor data frames typically contain, in addition to the sensor data, labels for other objects such as road users, e.g. in the form of bounding boxes, as well as information about the recording vehicle (host vehicle), e.g. in the form of speeds read via the CAN bus. Alternatively or additionally, dynamic areas or dynamic objects can be determined automatically, for example by means of flow algorithms that are applied to the raw sensor data of the respective data frames.

According to one aspect, it is proposed that at least one augmented data frame is generated based on an interpolated dynamic object and/or an interpolated static object.

In other words, new augmented data frames can be created by merging the static and dynamic objects.

Such new frames can be generated or interpolated from different neighboring frames, e.g. the point in time (N+0.3) can be generated from frame N by “forward propagation” by 0.3 or from frame (N+1) by “back propagation” by 0.7. The interpolations can also be performed in both directions and the two results then weighted to obtain more accurate results.
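The forward and backward propagation with subsequent weighting can be sketched for a single object position as follows. A constant-velocity motion model and weights proportional to temporal proximity, (1 - tau) for frame N and tau for frame N+1, are assumptions; the text does not fix the weighting. `propagate_both_ways` is a hypothetical helper.

```python
import numpy as np


def propagate_both_ways(p_n, v_n, p_n1, v_n1, tau: float, dt: float) -> np.ndarray:
    """Estimate an object position at time N + tau (0 <= tau <= 1, frame period dt)
    from frame N (forward) and frame N+1 (backward), then blend both estimates."""
    p_n, v_n, p_n1, v_n1 = (np.asarray(a, dtype=float) for a in (p_n, v_n, p_n1, v_n1))
    forward = p_n + tau * dt * v_n             # propagate frame N forward by tau
    backward = p_n1 - (1.0 - tau) * dt * v_n1  # propagate frame N+1 back by 1 - tau
    # weight each estimate by its temporal proximity to the target time
    return (1.0 - tau) * forward + tau * backward


# object at 10 m/s along x, frames 0.1 s apart: estimated position at N + 0.3
p = propagate_both_ways([0.0, 0.0], [10.0, 0.0], [1.0, 0.0], [10.0, 0.0], tau=0.3, dt=0.1)
```

With this weighting, the blended estimate coincides with frame N at tau = 0 and with frame N+1 at tau = 1, so consecutive interpolated frames connect smoothly to the recorded ones.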

In a first step, a trajectory for at least one object of the scene that is determined to be dynamic is determined using the successive sensor data frames. In a further step, the scene with the at least one dynamically determined object is interpolated based on the trajectory for the at least one dynamically determined object.
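The two steps can be sketched as follows, assuming the labels of one dynamic object provide a timestamp and a 2D pose (x, y, yaw) per sensor data frame. Linear interpolation of position and unwrapped heading is an assumed, deliberately simple trajectory model, and all function names are hypothetical.

```python
import numpy as np


def build_trajectory(labels):
    """labels: iterable of (timestamp, x, y, yaw) for one object over consecutive frames."""
    t, x, y, yaw = np.array(sorted(labels), dtype=float).T
    return t, x, y, np.unwrap(yaw)  # unwrap so the heading does not jump at +/- pi


def pose_at(traj, t_query: float) -> np.ndarray:
    """Interpolated pose (x, y, yaw) of the object at time t_query."""
    t, x, y, yaw = traj
    heading = np.interp(t_query, t, yaw)
    heading = (heading + np.pi) % (2 * np.pi) - np.pi  # wrap back to [-pi, pi)
    return np.array([np.interp(t_query, t, x), np.interp(t_query, t, y), heading])


def rot(a: float) -> np.ndarray:
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])


def move_object_points(points: np.ndarray, pose_from, pose_to) -> np.ndarray:
    """Rigidly move an object's 2D points from its labelled pose to an interpolated pose."""
    pose_from = np.asarray(pose_from, dtype=float)
    pose_to = np.asarray(pose_to, dtype=float)
    local = (points - pose_from[:2]) @ rot(pose_from[2])  # into the object frame
    return local @ rot(pose_to[2]).T + pose_to[:2]        # back into the scene frame


traj = build_trajectory([(0.0, 0.0, 0.0, 0.0), (1.0, 2.0, 4.0, 0.2)])
pose = pose_at(traj, 0.25)  # approximately x = 0.5, y = 1.0, yaw = 0.05
moved = move_object_points(np.array([[1.0, 0.0]]), [0.0, 0.0, 0.0], [2.0, 0.0, np.pi / 2])
```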

All of the methods mentioned above for generating a temporal sequence of augmented data frames can also be combined with single-frame-based augmentation methods, which is advantageous for the applications considered here, such as object tracking or trajectory planning. In addition, they can be applied to sequence data from various sensors (e.g. video, radar, LIDAR) or to representations derived from them.

According to one aspect, it is proposed that the at least one object determined as dynamic is a host vehicle whose sensor system provides the temporal sequence of sensor data frames, and that the method has the following steps: in one step, a drivable trajectory is determined for the host vehicle of the scene.

In a further step, the scene with the host vehicle is interpolated based on the drivable trajectory for the host vehicle.

A drivable trajectory for the ego vehicle can be a trajectory that is feasible in terms of vehicle dynamics and/or follows a desired course, among the trajectories of the other road users, for the interpolation of the scene. In other words, the ego vehicle is assigned different positions, resulting in a different drivable trajectory. This method is particularly simple, since the raw sensor data can be transformed using the drivable trajectory of the host vehicle: for each individual data frame, the host vehicle is shifted to the corresponding position on the drivable trajectory and the raw sensor data is transformed accordingly, in order to convert an augmented data frame at a point in time N into an augmented data frame at a point in time N+1.
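A minimal sketch of this transformation is shown below. It assumes that the original ego poses (x, y, yaw) are known per frame in a common world frame and uses a hypothetical, smoothly ramped lateral offset as the new drivable trajectory; the helper names, the offset profile and keeping the original heading are illustrative assumptions. The sketch only moves the recorded points and does not re-simulate what the sensor would see from the new position.

```python
import numpy as np


def rot(yaw: float) -> np.ndarray:
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s], [s, c]])


def lateral_shift_trajectory(poses: np.ndarray, max_offset: float) -> np.ndarray:
    """Hypothetical drivable variant of the recorded ego trajectory: each pose
    (x, y, yaw) is shifted along its left normal by an offset that ramps smoothly
    from 0 to max_offset. The heading is kept unchanged for simplicity."""
    poses = np.asarray(poses, dtype=float)
    ramp = 0.5 - 0.5 * np.cos(np.linspace(0.0, np.pi, len(poses)))  # 0 -> 1, flat ends
    normal = np.stack([-np.sin(poses[:, 2]), np.cos(poses[:, 2])], axis=1)
    shifted = poses.copy()
    shifted[:, :2] += (max_offset * ramp)[:, None] * normal
    return shifted


def reanchor(points: np.ndarray, pose_orig, pose_new) -> np.ndarray:
    """Express 2D points recorded from ego pose pose_orig in the frame of an
    ego vehicle placed at pose_new on the new trajectory."""
    pose_orig = np.asarray(pose_orig, dtype=float)
    pose_new = np.asarray(pose_new, dtype=float)
    world = points @ rot(pose_orig[2]).T + pose_orig[:2]
    return (world - pose_new[:2]) @ rot(pose_new[2])


recorded = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
new_traj = lateral_shift_trajectory(recorded, max_offset=2.0)
# frame by frame, move the recorded points into the frame of the shifted ego vehicle
frame_points = [np.array([[5.0, 0.0]])] * len(recorded)
augmented = [reanchor(p, old, new) for p, old, new in zip(frame_points, recorded, new_traj)]
```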

In combination with generating a temporal sequence of augmented data frames, the ego vehicle can be assigned a new drivable trajectory, so that augmented data frames are generated which correspond to the ego vehicle driving through the same scene multiple times on different drivable trajectories.

The corresponding method can also be used for simulating different trajectories of other road users, as is described in detail in the exemplary embodiments.

Correspondingly, an object contained in a scene of a sequence of sensor data frames can also be exchanged for any other desired object in order to generate a temporal sequence of augmented data frames with this desired object.

For example, a dynamic object driving ahead in the form of a car can be exchanged for a truck driving ahead or a cyclist. Alternatively or additionally, new objects that follow any trajectory can be integrated into the sequence of augmented data frames. The new objects can be derived from other sequences of sensor data frames, from which the corresponding sensor data are selected. Alternatively or additionally, additional objects can be integrated into the temporal sequence of augmented data frames using simulated sensor data.
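Exchanging an object can be sketched in bird's-eye view as removing the points inside the labelled box of the original object and inserting the points of another object, for example taken from a different recording and expressed in its own object frame, at the same pose. The function `exchange_object`, the box format and the 2D simplification are assumptions; occlusion effects of a larger or smaller replacement object are not handled.

```python
import numpy as np


def exchange_object(points: np.ndarray, box: dict, template_local: np.ndarray) -> np.ndarray:
    """Replace the 2D points inside `box` (keys: center, length, width, yaw) by
    `template_local`, the points of another object given in its own object frame
    (x along the object's length, origin at the box centre)."""
    c, s = np.cos(box["yaw"]), np.sin(box["yaw"])
    center = np.asarray(box["center"], dtype=float)
    d = points - center
    local_x = d[:, 0] * c + d[:, 1] * s
    local_y = -d[:, 0] * s + d[:, 1] * c
    inside = (np.abs(local_x) <= box["length"] / 2) & (np.abs(local_y) <= box["width"] / 2)
    r = np.array([[c, -s], [s, c]])
    placed = template_local @ r.T + center  # place the new object at the old pose
    return np.vstack([points[~inside], placed])


scene = np.array([[0.0, 0.0], [10.0, 0.0], [10.5, 0.5]])
car_box = {"center": (10.0, 0.0), "length": 4.0, "width": 2.0, "yaw": 0.0}
truck_points = np.array([[-3.0, 0.0], [0.0, 0.0], [3.0, 0.0]])  # hypothetical template
new_scene = exchange_object(scene, car_box, truck_points)
```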

According to one aspect, it is proposed that the temporal sequence of sensor data frames be provided by means of a video system and/or a RADAR system and/or a LIDAR system.

A video system can be an optical camera system.

Use of one of the methods described above for the verification and/or validation of algorithms and/or system components for a driver assistance system and/or a system for at least partially automated driving is proposed.

In neural networks, the signal at a connection between artificial neurons can be a real number, and the output of an artificial neuron is calculated by a non-linear function of the sum of its inputs. The connections of the artificial neurons typically have a weight that is adjusted as learning progresses. The weight increases or decreases the strength of the signal at a connection. Artificial neurons can have a threshold such that a signal is only output if the total signal exceeds this threshold.

Typically, a large number of artificial neurons are combined in layers. Different layers may perform different types of transformations on their inputs. Signals travel from the first layer, the input layer, to the last layer, the output layer, possibly after passing through the layers several times.

Basically, neural networks consist of at least three layers of neurons: an input layer, an intermediate layer (hidden layer) and an output layer. This means that all neurons in the network are divided into layers, with a neuron in one layer always being connected to all neurons in the next layer. Except for the input layer, the different layers consist of neurons that are subject to a non-linear activation function and are connected to the neurons of the next layer. A deep neural network can have many such intermediate layers.

Such neural networks must be trained for their specific task. Each neuron of the corresponding network architecture receives, e.g., a random starting weight. The input data is then fed into the network, and each neuron weights its input signals with its weights and passes the result on to the neurons of the next layer. The overall result is then made available at the output layer. The size of the error can be calculated, as well as the contribution each neuron made to that error, and the weight of each neuron can then be changed in the direction that minimizes the error. This is repeated recursively: the network is run again, the error is measured again and the weights are adjusted until the error is below a predetermined limit.
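The training procedure described above can be sketched with NumPy for a network with one hidden layer. The sigmoid activation, the mean-squared error, plain gradient descent, the function name `train` and all parameter values are assumptions chosen for brevity; a real perception network would be trained with a deep-learning framework on far larger data sets.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train(x: np.ndarray, y: np.ndarray, hidden: int = 4, lr: float = 0.5,
          limit: float = 1e-3, max_epochs: int = 20000, seed: int = 0):
    rng = np.random.default_rng(seed)
    # random starting weights for input->hidden and hidden->output connections
    w1 = rng.normal(0.0, 1.0, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0, (hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(max_epochs):
        # forward pass: each layer weights its inputs and applies a non-linear activation
        h = sigmoid(x @ w1 + b1)
        out = sigmoid(h @ w2 + b2)
        err = out - y
        loss = float(np.mean(err ** 2))
        losses.append(loss)
        if loss < limit:  # stop once the error is below the predetermined limit
            break
        # backward pass: contribution of each weight to the error (chain rule)
        d_out = 2.0 * err / err.size * out * (1.0 - out)
        d_h = (d_out @ w2.T) * h * (1.0 - h)
        # move every weight a small step in the direction that reduces the error
        w2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * x.T @ d_h
        b1 -= lr * d_h.sum(axis=0)
    return (w1, b1, w2, b2), losses


# toy task: logical OR of two inputs
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [1.0]])
params, losses = train(x, y, max_epochs=2000)
```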

A data processing device is proposed which is configured to carry out one of the methods described above for generating a temporal sequence of augmented data frames.

A system for capturing the surroundings of a sensor system is proposed that has been trained with a temporal sequence of augmented data frames to capture the surroundings of the sensor system, with the augmented data frames being generated in accordance with one of the methods described above.

As a result, the environment of the mobile platform can be recorded with less economic effort and with high quality of the recording.

A mobile platform is proposed which has the system described above for detecting the surroundings of a sensor system.

A mobile platform can be understood to mean an at least partially automated system that is mobile and/or a driver assistance system. An example can be an at least partially automated vehicle or a vehicle with a driver assistance system. That is, in this context, an at least partially automated system includes a mobile platform in terms of at least partially automated functionality, but a mobile platform also includes vehicles and other mobile machines including driver assistance systems. Other examples of mobile platforms can be driver assistance systems with multiple sensors, mobile multi-sensor robots such as robot vacuum cleaners or lawn mowers.

The method described for detecting an environment of a first sensor system can be used for mobile platforms and/or also for multi-sensor monitoring systems and/or a production machine and/or a personal assistant and/or an access control system.

Each of these systems can be a fully or partially automated system.

A computer program is proposed which has instructions which, when the computer program is executed by a computer, cause the computer to execute one of the methods described above. Using such a computer program, the methods described above can be made available in a simple manner, for example on a mobile platform.

A machine-readable storage medium is proposed, on which the computer program described above is stored. The computer program described above can be transported by means of such a machine-readable storage medium.

A neural network is proposed that is trained with one of the temporal sequences of augmented data frames described above.

Exemplary embodiments

Exemplary embodiments of the invention are illustrated with reference to FIGS. 1 to 9 and explained in more detail below. The figures show:

FIG. 1 shows a sketch of a temporal sequence of sensor data frames and temporal sequences of augmented data frames interpolated therefrom by taking over only every Nth frame of the original sequence into the new sequence;

FIG. 2 shows a sketch of a temporal sequence of sensor data frames and temporal sequences of augmented data frames interpolated therefrom at other sampling times;

FIG. 3 shows a sketch of a temporal sequence of sensor data frames and temporal sequences of augmented data frames interpolated therefrom via representations;

FIG. 4 shows a sketch of a temporal sequence of sensor data frames and temporal sequences of augmented data frames interpolated therefrom at other sampling times in order to generate a temporal jitter;

FIG. 5a shows a sketch of a frame with a scene made up of static and dynamic objects at a first point in time;

FIG. 5b shows a sketch of a frame with a scene made up of static and dynamic objects at a second point in time;

FIG. 6a shows a sketch of a frame with a scene with a host vehicle on a trajectory and static objects;

FIG. 6b shows a sketch of a frame with a scene with a host vehicle following a trajectory and static objects;

FIG. 7a shows a sketch of a frame with a scene with a host vehicle and dynamic objects that follow a trajectory at a first point in time;

FIG. 7b shows a sketch of a frame with a scene with a host vehicle and dynamic objects that follow a trajectory at a second point in time;

FIG. 8a shows a sketch of a frame with a scene with a host vehicle following a trajectory and dynamic objects;

FIG. 8b shows a sketch of a frame with a scene with a host vehicle that follows a drivable trajectory and dynamic objects;

FIG. 9a shows a sketch of a frame with a scene with an ego vehicle that follows an ego trajectory and a dynamic object that follows an object trajectory; and

FIG. 9b shows a sketch of a frame with a scene with an ego vehicle that follows an ego trajectory and a dynamic object that follows a drivable object trajectory.

FIG. 1 a) to d) schematically shows a temporal sequence of sensor data frames 102 and new temporal sequences of augmented data frames 100 interpolated therefrom by taking over only every Nth frame of the original sequence into the new sequence.

FIG. 2 outlines a temporal sequence of sensor data frames 102 and temporal sequences of augmented data frames 210, 220 formed therefrom by interpolation. It is outlined that in this case the number of frames in the temporal sequence of augmented data frames 210 is greater than the number of frames in the temporal sequence of sensor data frames 102.

The temporal sequence of augmented data frames 220 with the frames 220a to 220b is formed from the elements 102a to 102c of the temporal sequence of sensor data frames 102 by interpolation. It is outlined that in this case the number of frames in the temporal sequence of augmented data frames 220 is smaller than the number of frames in the temporal sequence of sensor data frames 102.

In addition, the assigned times for the respective frames of the augmented data frames 210, 220 can differ from the assigned times of the frames of the temporal sequence of sensor data frames 102.

FIG. 3 outlines a temporal sequence of sensor data frames 102 and a temporal sequence of augmented data frames 300 interpolated therefrom. The elements 102a to 102c of the temporal sequence of sensor data frames 102 are first converted into a sequence 302 of representations of the raw sensor data of the sensor system, with the elements 302a to 302c, before the temporal sequence of augmented data frames 300 with the elements 300a and 300b is formed by interpolation from this sequence of representations 302.

Here, too, the number of elements in the augmented data frames 300 differs from the number of frames in the temporal sequence of sensor data frames 102. Such a representation of the raw sensor data of the sensor system can be an occupancy grid, for example.

FIG. 4 outlines a temporal sequence of sensor data frames 102 and a temporal sequence of augmented data frames 400 interpolated therefrom with the elements 400a and 400b. The augmented data frames 400a, 400b of the temporal sequence of augmented data frames 400 are determined by interpolating the scene using successive sensor data frames in such a way that the scene of the respective augmented data frame 400a, 400b is shifted in time relative to the scene of the corresponding sensor data frame 102a to 102c, in order to generate a temporal jitter 410, 420 in the temporal sequence of augmented data frames 400.

FIG. 5a sketches a frame from a temporal sequence of sensor data frames 102 with a scene of static objects 530, 540, 550 and dynamic objects 500a, 510a, 520a at a first point in time, the dynamic object 500a representing a host vehicle on a trajectory 502. The black dots can be understood as raw sensor data of the temporal sequence of sensor data frames 102, for example from a LIDAR system, which were assigned by means of object recognition to label rectangles, corresponding to a bounding box for the respective object, and possibly to an object ID. In addition, attributes such as the speed of the host vehicle are assigned to the host vehicle 500a. The trajectory 502 of the ego vehicle comprises both a part that has already been traveled and a future part.

FIG. 5b corresponds to FIG. 5a and sketches a frame from a temporal sequence of sensor data frames 102 with a scene of static objects 530, 540, 550 and dynamic objects 500a, 510a, 520a (drawn in dashed lines) at a first and a second point in time, the dynamic object 500a, 500b representing a host vehicle on a trajectory 502 at the two points in time. In addition, the relative positions of the dynamic objects 510b, 520b at the second point in time are shown in FIG. 5b.

Such frames from a temporal sequence of sensor data frames 102 represent the starting point for the interpolation of temporal sequences of augmented data frames, as has already been explained above.

FIG. 6a sketches an augmented data frame with a scene with a host vehicle 600a on a trajectory 502 and static objects 530, 540, 550 at a first point in time N, which was determined from consecutive sensor data frames of at least part of the temporal sequence of sensor data frames, as described above.

FIG. 6b outlines an augmented data frame with a scene with a host vehicle 600a on a trajectory 502 and static objects 530, 540, 550 at a second point in time N+1, which was determined from consecutive sensor data frames of at least part of the temporal sequence of sensor data frames, as described above.

The interpolation of the raw sensor data of the sensor system can be carried out separately for static and dynamic objects. The division into static and dynamic raw sensor data can take place via labels, i.e. an assignment of raw sensor data to a bounding box, for example, and/or through an automatic determination of dynamic areas, for example via flow algorithms that are applied to the raw sensor data. A comparison of FIGS. 6a and 6b shows that for static objects the raw sensor data only has to be shifted and rotated according to the determined and known movement of the host vehicle.

FIG. 7a outlines an augmented data frame with a scene with a host vehicle 700a and a first dynamic object 710a on a trajectory 710 at a first point in time N and 710b at a second point in time N+1, and a second dynamic object 720a on a trajectory 720 at a first point in time N and 720b at a second point in time N+1.

FIG. 7b outlines an augmented data frame with a scene corresponding to FIG. 7a. A movement of the host vehicle 700a at a first point in time and 700b at a second point in time is also sketched.

In addition, FIG. 7b outlines that the first dynamic object 710a, 710b at the two points in time and the second dynamic object 720a, 720b at the two points in time can also be interpolated to a position in the scene at an intermediate point in time 710c or 720c.

In other words, static raw sensor data only has to be shifted and rotated according to the known ego-motion of the host vehicle 700a, 700b. For the dynamic objects 710a to 710c, the labels can be used to construct trajectories through consecutive sensor data frames of at least a part of the temporal sequence of sensor data frames, and the raw sensor data can be shifted along these trajectories; the trajectories may need to be compensated for the movement of the ego vehicle. New frames can then be created by merging the static and dynamic parts.
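The combination of ego-motion compensation for static data, trajectory-based shifting for dynamic data and the final merge can be sketched for 2D points. This is a minimal sketch under stated assumptions: ego and object poses are known in a common world frame (for the target time they could come from interpolation), raw points are given in the ego frame, and all names (`Pose2D`, `to_world`, `to_local`, `synthesize_frame`) are hypothetical:

```python
import math
from typing import Dict, List, NamedTuple, Tuple

Point = Tuple[float, float]


class Pose2D(NamedTuple):
    x: float
    y: float
    yaw: float  # heading in radians


def to_world(pose: Pose2D, p: Point) -> Point:
    """Map a point from the local frame of `pose` into the world frame."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    return (pose.x + c * p[0] - s * p[1], pose.y + s * p[0] + c * p[1])


def to_local(pose: Pose2D, p: Point) -> Point:
    """Map a world-frame point into the local frame of `pose`."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    dx, dy = p[0] - pose.x, p[1] - pose.y
    return (c * dx + s * dy, -s * dx + c * dy)


def synthesize_frame(
    static_pts: List[Point],
    dynamic_pts: Dict[int, List[Point]],
    ego_src: Pose2D,
    ego_dst: Pose2D,
    obj_src: Dict[int, Pose2D],
    obj_dst: Dict[int, Pose2D],
) -> List[Point]:
    """Build a new frame, in the ego frame at the target time, from one source frame.

    Static points only follow the ego motion; dynamic points are additionally
    moved rigidly from their object's source pose to its target pose.
    """
    merged: List[Point] = []
    for p in static_pts:
        merged.append(to_local(ego_dst, to_world(ego_src, p)))
    for oid, pts in dynamic_pts.items():
        for p in pts:
            in_object = to_local(obj_src[oid], to_world(ego_src, p))
            merged.append(to_local(ego_dst, to_world(obj_dst[oid], in_object)))
    return merged


frame = synthesize_frame(
    static_pts=[(5.0, 0.0)],
    dynamic_pts={1: [(10.0, 1.0)]},
    ego_src=Pose2D(0.0, 0.0, 0.0),
    ego_dst=Pose2D(1.0, 0.0, 0.0),        # host vehicle moved 1 m forward
    obj_src={1: Pose2D(10.0, 0.0, 0.0)},
    obj_dst={1: Pose2D(12.0, 0.0, 0.0)},  # object moved 2 m forward
)
print(frame)  # [(4.0, 0.0), (11.0, 1.0)]
```

Working in a common world frame keeps the two effects separate: the object motion is applied in world coordinates, and the ego-motion compensation happens only once, when the merged points are expressed in the target ego frame.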

Such augmented data frames can be generated or interpolated from different neighboring sensor data frames. For example, the point in time N+0.3 can be generated from frame N by “forward propagation” by 0.3 or from frame N+1 by “backpropagation” by 0.7. The interpolations can also be performed in both directions, and the two results can then be weighted to obtain more accurate results.
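The bidirectional variant can be illustrated for a single object position. This sketch assumes a constant-velocity motion model with a velocity estimate available in each source frame, and a weighting in which the temporally closer source frame receives the larger weight; the patent only states that both results are weighted, so both the motion model and the weights are illustrative choices, and the function names are hypothetical:

```python
from typing import Tuple

Vec = Tuple[float, float]


def propagate(pos: Vec, vel: Vec, dt: float) -> Vec:
    """Constant-velocity propagation; a negative dt propagates backwards in time."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)


def bidirectional_estimate(
    pos_n: Vec, vel_n: Vec, pos_n1: Vec, vel_n1: Vec, tau: float, frame_dt: float = 1.0
) -> Vec:
    """Estimate an object position at time N + tau, with 0 <= tau <= 1 in frame units."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie between the two source frames")
    fwd = propagate(pos_n, vel_n, tau * frame_dt)             # forward from frame N
    bwd = propagate(pos_n1, vel_n1, -(1.0 - tau) * frame_dt)  # backward from frame N+1
    # Assumed weighting: the temporally closer source frame gets the larger weight.
    w_fwd, w_bwd = 1.0 - tau, tau
    return (w_fwd * fwd[0] + w_bwd * bwd[0], w_fwd * fwd[1] + w_bwd * bwd[1])


# Time N+0.3: forward by 0.3 from frame N, backward by 0.7 from frame N+1.
est = bidirectional_estimate((0.0, 0.0), (10.0, 0.0), (10.5, 0.0), (11.0, 0.0), tau=0.3)
print(est)
```

Here the forward estimate is 3.0 and the backward estimate is 2.8, so the weighted result lies between them at about 2.94.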

FIG. 8a corresponds to FIG. 5a and outlines a frame from a temporal sequence of sensor data frames 102 with a scene of static objects 530, 540, 550 and dynamic objects 800a, 810a, 820a at a first point in time, where the dynamic object 800a represents a host vehicle on a trajectory 802a. The trajectory 802a of the ego vehicle covers both a part that has already been traveled and a future part.

Based on FIG. 8a, FIG. 8b outlines how the method for generating a temporal sequence of augmented data frames can be combined with methods for single-frame augmentation, so that this combination can advantageously be used for object tracking or trajectory planning, for example.

In particular, drivable trajectories of the host vehicle that deviate from the trajectory 802a of the host vehicle 800a derived from the temporal sequence of sensor data frames, such as the trajectory 802b on which the host vehicle 800b is positioned at a different time, can be simulated in a temporal sequence of augmented data frames. To do so, the raw sensor data is rotated or shifted around the ego vehicle in each individual augmented data frame according to the position of the ego vehicle on the new drivable trajectory.
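Simulating a different ego trajectory amounts to re-expressing each frame's raw data in the frame of a new ego pose. The following sketch assumes 2D points, a known recorded ego pose per frame, and, purely as a hypothetical example of a new drivable trajectory, a constant lateral offset from the recorded one. `Pose2D`, `reproject`, `lateral_offset` and `simulate_drive` are illustrative names:

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Pose2D(NamedTuple):
    x: float
    y: float
    yaw: float  # heading in radians


def reproject(points: List[Point], recorded: Pose2D, simulated: Pose2D) -> List[Point]:
    """Re-express points measured from the recorded ego pose in the simulated ego frame."""
    c0, s0 = math.cos(recorded.yaw), math.sin(recorded.yaw)
    c1, s1 = math.cos(simulated.yaw), math.sin(simulated.yaw)
    out: List[Point] = []
    for px, py in points:
        # recorded ego frame -> world
        wx = recorded.x + c0 * px - s0 * py
        wy = recorded.y + s0 * px + c0 * py
        # world -> simulated ego frame (inverse rigid transform)
        dx, dy = wx - simulated.x, wy - simulated.y
        out.append((c1 * dx + s1 * dy, -s1 * dx + c1 * dy))
    return out


def lateral_offset(pose: Pose2D, offset: float) -> Pose2D:
    """Hypothetical alternative trajectory: the recorded pose shifted sideways (left > 0)."""
    return Pose2D(pose.x - offset * math.sin(pose.yaw),
                  pose.y + offset * math.cos(pose.yaw),
                  pose.yaw)


def simulate_drive(frames: List[List[Point]], recorded_poses: List[Pose2D],
                   offset: float) -> List[List[Point]]:
    """One augmented frame per recorded frame, seen from the offset trajectory."""
    return [reproject(pts, pose, lateral_offset(pose, offset))
            for pts, pose in zip(frames, recorded_poses)]


frames = [[(5.0, 0.0)], [(4.0, 0.0)]]  # one static landmark, seen in two frames
poses = [Pose2D(0.0, 0.0, 0.0), Pose2D(1.0, 0.0, 0.0)]
augmented = simulate_drive(frames, poses, offset=1.5)
print(augmented)  # [[(5.0, -1.5)], [(4.0, -1.5)]]
```

Calling `simulate_drive` with several offsets yields several augmented sequences from one recording, which corresponds to driving through the same scene multiple times.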

With this combination of methods, completely new drivable trajectories of the ego vehicle can be simulated with a generated temporal sequence of augmented data frames, i.e. the ego vehicle 800b can drive through the same scene multiple times and in different ways. A temporal sequence of augmented data frames is then generated for each of these drives.

FIG. 9a corresponds to FIG. 5a and outlines a frame from a temporal sequence of sensor data frames 102 with a scene made up of static objects 530, 540, 550 and dynamic objects 900, 910a, 920a at a first point in time, where the dynamic object 900 represents a host vehicle on a trajectory 902. The trajectory 902 of the ego vehicle covers both a part that has already been traveled and a future part. Here, the dynamic object 920a is located on the trajectory 920c, which can be determined from consecutive sensor data frames of at least a part of the temporal sequence of sensor data frames.

Based on FIG. 9a, FIG. 9b outlines how the combination of the method for generating a temporal sequence of augmented data frames with single-frame augmentation can be applied to the trajectory of a dynamic object, following the method for simulating different trajectories of the host vehicle explained with reference to FIGS. 8a and 8b. In other words, in combination with a temporal augmentation, different trajectories of the other road users can be simulated.

According to the method explained in relation to FIG. 8, different drivable trajectories of the dynamic object 920a, 920b, such as the trajectory 920d on which the dynamic object 920b is positioned at a different point in time, can be simulated in a temporal sequence of augmented data frames. To do so, in each individual augmented data frame, the raw sensor data is rotated or shifted around the dynamic object 920a, 920b according to its position on the new drivable trajectory 920d.
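For another road user, only the raw data belonging to that object is moved, from its pose on the recorded trajectory to its pose on the new trajectory, while the rest of the frame stays unchanged. The sketch below assumes 2D points, planar object poses in the same coordinate system as the points, and an already separated set of object points per frame; `Pose2D`, `relocate_object` and `relocate_sequence` are hypothetical names:

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Pose2D(NamedTuple):
    x: float
    y: float
    yaw: float  # heading in radians


def relocate_object(points: List[Point], pose_orig: Pose2D, pose_new: Pose2D) -> List[Point]:
    """Rigidly move an object's points from its recorded pose to a new pose.

    Poses and points are given in the same frame-level coordinate system.
    """
    c0, s0 = math.cos(pose_orig.yaw), math.sin(pose_orig.yaw)
    c1, s1 = math.cos(pose_new.yaw), math.sin(pose_new.yaw)
    moved: List[Point] = []
    for px, py in points:
        # into the object's local frame at the recorded pose
        dx, dy = px - pose_orig.x, py - pose_orig.y
        lx, ly = c0 * dx + s0 * dy, -s0 * dx + c0 * dy
        # back out at the new pose on the alternative trajectory
        moved.append((pose_new.x + c1 * lx - s1 * ly, pose_new.y + s1 * lx + c1 * ly))
    return moved


def relocate_sequence(object_points_per_frame: List[List[Point]],
                      recorded_poses: List[Pose2D],
                      new_poses: List[Pose2D]) -> List[List[Point]]:
    """Apply the per-frame relocation along a whole (hypothetical) alternative trajectory."""
    return [relocate_object(pts, a, b)
            for pts, a, b in zip(object_points_per_frame, recorded_poses, new_poses)]


# Simulated lane change: the object is placed 3.5 m to the left of its recorded pose.
print(relocate_object([(11.0, 0.5)], Pose2D(10.0, 0.0, 0.0), Pose2D(10.0, 3.5, 0.0)))
# [(11.0, 4.0)]
```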

With this combination of methods, completely new drivable trajectories of the dynamic object 920a, 920b can be simulated with a generated temporal sequence of augmented data frames, i.e. the dynamic object 920a, 920b can drive through the same scene multiple times and in different ways. A temporal sequence of augmented data frames is then generated for each of these drives.

Claims

1. Method for generating a temporal sequence of augmented data frames (100, 210, 220, 300, 400), with the steps:

Providing a temporal sequence of sensor data frames (102) which are generated by a sensor system;

Determining at least one augmented data frame by interpolating a scene that is characterized by consecutive sensor data frames of at least a part of the temporal sequence of sensor data frames (102); and

Generating the temporal sequence of augmented data frames (100, 210, 220, 300, 400) by a temporal succession of a plurality of determined augmented data frames corresponding to the temporal sequence of sensor data frames (102).

2. The method according to claim 1, wherein the scene is characterized by sensor data frames (102a, 102b, 102c) with raw sensor data of the sensor system and/or with a representation (300a, 300b) of the raw sensor data of the sensor system.

3. The method according to claim 2, wherein the scene is characterized by sensor data frames (102a, 102b, 102c) with an occupancy grid as a representation of the respective raw sensor data of the sensor system.

4. The method according to any one of the preceding claims, wherein a plurality of augmented data frames of the temporal sequence of augmented data frames (100, 210, 220, 300, 400) is determined by interpolating the scene using consecutive sensor data frames such that the scene of the respective augmented data frame is temporally shifted relative to the scene of the corresponding sensor data frame, in order to generate a temporal jitter (410, 420) in the temporal sequence of augmented data frames (100, 210, 220, 300, 400).

5. The method according to any one of the preceding claims, with the steps: identifying at least one first object (500a, 510a, 520a, 530, 540, 550) in the respective scene from consecutive sensor data frames of at least a part of the temporal sequence of sensor data frames;

identifying at least one second object (500a, 510a, 520a, 530, 540, 550) in the respective scene from consecutive sensor data frames of at least a part of the temporal sequence of sensor data frames;

Determining at least one augmented data frame of the temporal sequence of augmented data frames (100, 210, 220, 300, 400), the interpolation for the at least one first object (500a, 510a, 520a, 530, 540, 550) of the scene being different from the interpolation for the at least one second object (500a, 510a, 520a, 530, 540, 550) of the scene.

6. The method according to claim 5, wherein the interpolation for determining the at least one augmented data frame for the at least one first object (500a, 510a, 520a, 530, 540, 550) and/or for the second object (500a, 510a, 520a, 530, 540, 550) depends on a change in the position of a host vehicle (500a, 600a, 700a, 800a) in the scene whose sensor system provides the temporal sequence of sensor data frames.

7. The method according to claim 5 or 6, wherein at least one object (500a, 510a, 520a, 530, 540, 550) is determined to be a dynamic object (500a, 510a, 520a) for the interpolation on the basis of a label that is assigned to the object (500a, 510a, 520a, 530, 540, 550) in the respective scene.

8. The method according to any one of claims 5 to 7, wherein at least one object (500a, 510a, 520a, 530, 540, 550) is determined to be a dynamic object (500a, 510a, 520a) for the interpolation on the basis of an automatic determination of dynamic regions of the scene.

9. The method according to claim 7 or 8, with the steps:

Determining a trajectory for at least one object (500a, 510a, 520a) of the scene that has been determined to be dynamic, using the consecutive sensor data frames; and

interpolating the scene with the at least one object (500a, 510a, 520a) determined to be dynamic, based on the trajectory for the at least one object (500a, 510a, 520a) determined to be dynamic.

10. The method according to claim 9, wherein the at least one object determined to be dynamic is a host vehicle (500a, 600a, 700a, 800a) whose sensor system provides the temporal sequence of sensor data frames (102); with the steps:

determining a drivable trajectory for the host vehicle of the scene; and

interpolating the scene with the host vehicle (500a, 600a, 700a, 800a) based on the drivable trajectory for the host vehicle.

11. The method according to any one of the preceding claims, wherein the temporal sequence of sensor data frames (102) is provided by means of a video system and/or a RADAR system and/or a LIDAR system.

12. Use of the method according to one of the preceding claims for the verification and/or validation of algorithms and/or system components for a driver assistance system and/or a system for at least partially automated driving.

13. Data device that is set up to carry out a method according to any one of claims 1 to 11.

14. A computer program comprising instructions which, when the computer program is executed by a computer, cause the latter to execute the method according to any one of claims 1 to 11.

15. Machine-readable storage medium on which the computer program according to claim 14 is stored.