EP3847576A1

EP3847576A1 - Method and system for improved object marking in sensor data

Info

Publication number: EP3847576A1
Application number: EP19773742.2A
Authority: EP
Inventors: Jens Eric Markus MEHNERT
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2018-09-04
Filing date: 2019-09-03
Publication date: 2021-07-14
Also published as: CN112639812A; US20210081668A1; WO2020048940A1; DE102018214979A1; US11521375B2

Abstract

The invention relates to a method and to a system (100) for improved object marking in sensor data, which enable an at least partially automated annotation of objects or object classes in a recorded data set. In the method for object marking in sensor data, a scene (170) is detected in a first state by at least one sensor (140, 150). A first object marking (195) is then assigned to at least one object (180) contained in the scene in a first data set (190) containing the scene in the first state. Subsequently, the similar or at least substantially matching scene (170') is detected by the at least one sensor (140, 150) in a second state that differs from the first state, and the first object marking (195) contained in the first data set (190) is at least partially adopted, for the object (180) recognized in the second state of the scene (170'), as a second object marking (195') in a second data set (190').

Description

Title:

Method and system for improved object marking in sensor data

The present invention relates to a method and a system for object marking in sensor data.

State of the art

In the field of machine learning, training data sets are often used, which can contain, for example, image and/or video data, for example in order to learn automatic object recognition in such or similar data. An exemplary use of such automatic object recognition is autonomous driving or flight operation, in which objects in the vehicle environment have to be recognized. In order to ensure reliable object recognition, a large number of training data sets may be required.

Objects identified in a (training) data record are often classified, marked or labeled and form an object-label pair which can be processed by machine for machine learning. For example, in a data record in which a scene of a traffic situation is recorded, the course of a road can be provided as an object with a marking that designates or classifies the course of the road as such. In particular, the generation of such image and video annotations, that is, the object marking in image and video data sets, can be cost-intensive, since it cannot be automated at all or only to a very limited extent. For this reason, such image and video annotations are predominantly carried out by human editors; for example, the annotation of one captured image for semantic segmentation may take more than an hour on average.

Disclosure of the invention

The object of the invention is therefore to provide a possibility for the simplified or more cost-effective provision of object markings or data containing annotations.

This object is achieved by a method and a system for object marking in sensor data in accordance with the independent claims. Advantageous further developments of the invention result from the dependent claims, the description and the accompanying figures.

Such a method for object marking in sensor data can be used in particular to generate one or more training data sets for machine learning. The method has the following steps:

- First, a scene in a first state is detected by at least one sensor. The scene can be, e.g., a vehicle environment, a street scene, a course of a road, a traffic situation or the like and can include static and/or dynamic objects such as traffic areas, buildings, road users or the like. The sensor may be a single optical sensor, such as a camera, a lidar sensor, or a fusion of such or similar sensors.

- At least one object contained in the scene is assigned a first object marking, for example a first annotation, in a first data record containing the scene in the first state. The first data record can contain an image or an image sequence which reproduces the scene in its first state, that is to say, for example, contains an image of the course of a road. The first object marking can, for example, frame, fill, label or otherwise mark the object, preferably optically. Purely by way of example, the course of the road can be traced in a machine-readable manner. In other words, the object and the object marking can form an object-label pair that can be processed, for example, in machine learning. The object marking can belong to a certain object class, such as street, tree, building, traffic sign, pedestrian or the like.

- In addition, the similar or at least substantially matching scene is detected by the at least one sensor in a second state that differs from the first state. In the simplest case, this can mean, e.g., that a road is driven along at least twice and is detected by the sensor each time, in which case, e.g., different times can distinguish the first state from the second state. As described above, one or more objects of the scene, e.g. the course of a road, are already marked in the first state.

- Then the first object marking contained in the first data record is at least partially adopted, for the object recognized in the second state of the scene, as a second object marking in a second data record. Illustratively, in the above-mentioned example the course of the road is thus already traced in the second data record. In principle, this method can of course be repeated with any number of data records and/or states.
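The four steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the class names `ObjectMarking` and `DataRecord`, the field `scene_id` and the function `adopt_markings` are hypothetical and do not come from the application, and the recognition of the scene is reduced here to a comparison of identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMarking:
    object_class: str  # e.g. "street", "tree", "traffic sign"
    polygon: list      # outline of the object in image coordinates


@dataclass
class DataRecord:
    scene_id: str      # hypothetical identifier of the recorded scene
    state: str         # e.g. "day" or "night"
    markings: list = field(default_factory=list)


def adopt_markings(first: DataRecord, second: DataRecord) -> DataRecord:
    """Adopt the markings of the first record as markings of the second record,
    provided both records show the same scene in different states."""
    if first.scene_id != second.scene_id:
        raise ValueError("records do not show the same scene")
    if first.state == second.state:
        raise ValueError("second state must differ from the first state")
    # Copy each marking so that later corrections do not alter the first record.
    adopted = [ObjectMarking(m.object_class, list(m.polygon)) for m in first.markings]
    return DataRecord(second.scene_id, second.state, adopted)


# Steps 1 and 2: scene recorded by day and annotated once.
day = DataRecord("road_42", "day",
                 [ObjectMarking("street", [(0, 9), (4, 3), (6, 3), (9, 9)])])
# Step 3: same scene recorded again at night, not yet annotated.
night = DataRecord("road_42", "night")
# Step 4: the first object marking is adopted as the second object marking.
night_labeled = adopt_markings(day, night)
```

In this sketch the annotation effort arises only for the `day` record; the `night` record receives its marking without a second manual pass.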

With this method it is possible to reduce the cost of providing data which contain object markings or annotations. For the second (third, fourth, etc.) data record, at least not all of the object markings have to be created again from scratch. Rather, this effort only has to be made once, and the second data record can then be derived from it. Illustratively, a location to be captured, for the image content of which an annotation already exists, can be captured again in one or more other states, the annotation effort being incurred only initially. If, in order to train a function by machine learning, the location is to be recorded in the daytime and at nighttime, it would be sufficient in this case, for example, to set the object marking only in the daytime scene and to apply it to the night scene. A large amount of training data can thus be generated on the basis of an existing object-label pair without incurring further annotation costs.

A further development provides that, in order to recognize the scene in the second data record, location information of the scene is assigned to the first data record. The location information can be provided by a suitable sensor, e.g. by GPS or the like. This makes it easier to recognize the scene or to assign a data record to a specific scene.
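A minimal sketch of how stored location information could be used to find the already annotated record for a newly captured scene is given below. The great-circle (haversine) distance, the 10 m threshold, the dictionary layout and the names `haversine_m` and `find_matching_record` are assumptions made for illustration; the application itself does not prescribe a particular matching rule.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS positions given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_matching_record(query_pos, labeled_records, max_distance_m=10.0):
    """Return the annotated record recorded closest to query_pos,
    or None if no record lies within max_distance_m (assumed threshold)."""
    best, best_dist = None, max_distance_m
    for rec in labeled_records:
        dist = haversine_m(*query_pos, *rec["position"])
        if dist <= best_dist:
            best, best_dist = rec, dist
    return best


# Hypothetical annotated first data records with their recording positions.
records = [
    {"id": "day_001", "position": (48.7758, 9.1829)},
    {"id": "day_002", "position": (48.7800, 9.1900)},
]
match = find_matching_record((48.77581, 9.18291), records)
```

Here `match` refers to the record `day_001`, whose recording position lies only a little more than a metre from the query position.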

According to another development, sensor data can also be fused in order to provide the location information. For example, this can be based on a combination of GPS and camera intrinsics, e.g. in the form of calibration data of the camera or the like. Ego-motion data of a vehicle can also be taken into account. This further improves recognition.

Another development provides that, in order to recognize the scene in the second data record, viewing angle and/or position information of the scene is assigned to the first data record. This can also take place in addition to the assignment of location information and can be based, e.g., on ego-motion data of a vehicle, on GPS data, on camera intrinsics or the like. This further improves recognition.
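As a rough illustration of how viewing angle and position information could be compared, the following sketch checks whether two recordings were taken from a similar position and in a similar direction. The pose layout (x and y in metres in a local frame, heading in degrees), the thresholds and the names `heading_difference_deg` and `same_view` are assumptions for this example only.

```python
import math


def heading_difference_deg(a, b):
    """Smallest absolute difference between two headings in degrees (handles wrap-around at 360)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def same_view(first_pose, second_pose, max_offset_m=5.0, max_heading_deg=15.0):
    """True if both poses (x, y, heading) are close in position and viewing direction."""
    (x1, y1, h1), (x2, y2, h2) = first_pose, second_pose
    offset = math.hypot(x2 - x1, y2 - y1)
    return offset <= max_offset_m and heading_difference_deg(h1, h2) <= max_heading_deg


day_pose = (0.0, 0.0, 350.0)       # pose stored with the first data record
night_pose = (1.5, 2.0, 2.0)       # second drive, same direction of travel
oncoming_pose = (1.5, 2.0, 170.0)  # same place, opposite direction of travel
```

With these values, `same_view(day_pose, night_pose)` is true, whereas the recording from the oncoming direction is rejected even though it was taken at almost the same location.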

According to a further development, a depth prediction, e.g. monocular, by means of stereo depth estimation, by estimation of the optical flow and/or on the basis of LIDAR data, can be carried out for the image that already has the first object marking, that is, on the basis of the first data record. A prediction of a semantic segmentation can also be carried out in the unknown image, that is to say in the second data record. A further development provides that the object marking or the label is transformed so that the object marking fits the new image of the second data record more precisely. This transformation is also known as warping.
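The warping of a label can be illustrated with a small, dependency-free sketch. It assumes that the geometric relation between the two images is already available as a 3x3 projective matrix that maps pixels of the new image back into the annotated image; how this matrix is estimated (depth, optical flow, LIDAR) is outside the sketch. The function name `warp_mask` and the parameter `inv_h` are hypothetical.

```python
def warp_mask(mask, inv_h):
    """Warp a label mask into the new image.

    mask:  2D list of class ids belonging to the first (annotated) image.
    inv_h: 3x3 matrix mapping a pixel (x, y, 1) of the new image back to the
           corresponding position in the first image (assumed to be given).
    Nearest-neighbour sampling is used so that class ids are never blended.
    """
    height, width = len(mask), len(mask[0])
    warped = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Project the target pixel back into the annotated image.
            sx = inv_h[0][0] * x + inv_h[0][1] * y + inv_h[0][2]
            sy = inv_h[1][0] * x + inv_h[1][1] * y + inv_h[1][2]
            sw = inv_h[2][0] * x + inv_h[2][1] * y + inv_h[2][2]
            if sw == 0:
                continue  # point at infinity, leave unlabeled
            col, row = round(sx / sw), round(sy / sw)
            if 0 <= row < height and 0 <= col < width:
                warped[y][x] = mask[row][col]
    return warped


mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
# The new image shows the object two pixels further to the right,
# so a target pixel x corresponds to the source pixel x - 2.
shift_right_by_2_inv = [[1, 0, -2], [0, 1, 0], [0, 0, 1]]
warped = warp_mask(mask, shift_right_by_2_inv)
```

In this example the marked 2x2 block moves from columns 1 to 2 to columns 3 to 4, while pixels without a counterpart in the annotated image remain unlabeled (0).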

According to another development, a SLAM method (Simultaneous Localization And Mapping) can be used in order to obtain a better location and position determination.

The effort for object marking or annotation can be reduced particularly significantly if the adoption of the first object marking is at least partially automated by an artificial intelligence module, or AI module for short. This module can have at least one processor and can be set up, e.g. by program instructions, to emulate human-like decision-making structures in order to solve problems independently, here e.g. the automatic object marking or annotation.

For a particularly high performance of the method, it has proven to be advantageous if at least one artificial neural network of the AI module, which can be multi-layered and/or convolutional, determines matching image regions of the scene in the first and second data records.

A further development provides that the artificial neural network can provide a pixel-wise match mask as an output. This can form a good basis for manual, semi-automatic or fully automatic further processing.
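How a pixel-wise match mask could serve as a basis for further processing is sketched below. The mask is simply taken as given here (in the method it would be the output of the neural network); the marker value `UNLABELED` and the function names are assumptions for illustration.

```python
UNLABELED = -1  # assumed marker for pixels that still need annotation


def adopt_with_match_mask(first_labels, match_mask):
    """Adopt a label only where the match mask reports matching image content."""
    second_labels = []
    for label_row, match_row in zip(first_labels, match_mask):
        second_labels.append(
            [label if matched else UNLABELED for label, matched in zip(label_row, match_row)]
        )
    return second_labels


def open_fraction(labels):
    """Fraction of pixels left for manual or semi-automatic post-processing."""
    total = sum(len(row) for row in labels)
    open_pixels = sum(value == UNLABELED for row in labels for value in row)
    return open_pixels / total


first_labels = [[0, 0, 1],
                [0, 1, 1]]
match_mask = [[1, 1, 1],
              [1, 0, 1]]  # one pixel does not match between the two states
second_labels = adopt_with_match_mask(first_labels, match_mask)
```

In this toy example five of six labels are adopted automatically, and only the single non-matching pixel remains for manual, semi-automatic or fully automatic follow-up.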

In order to save even more costs, the AI module can be trained using the first and/or second data record, for which purpose these can be fed to the AI module as a training data record.

According to another development, at least one distinguishing feature of the scene between the first state and the second state can be determined, preferably by means of a SLAM method, and the second object marking can be assigned to the distinguishing feature. This is possible at least if the distinguishing feature, for example the difference class, already has a sufficiently good quality (e.g. a statistical test with high confidence) and the comparison network indicates a match for the remaining image content of the scene. Then, for example, an option can be offered to adopt the object marking, i.e. the annotation, automatically. In other words, a prediction can be carried out with existing training data, for example on the basis of the above-mentioned or another artificial neural network, in order to detect any changes in the scene. Since an image-label pair already exists in the training data for the scene, a high prediction quality can be achieved. A difference between annotation and prediction gives an indication of which objects still have to be annotated.
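The last point, that a difference between annotation and prediction indicates what still has to be annotated, can be illustrated as follows. The sketch assumes that both the adopted annotation and the prediction are available as per-pixel class names; the function name `classes_to_review` and the `min_pixels` threshold are hypothetical.

```python
from collections import Counter


def classes_to_review(adopted_labels, predicted_labels, min_pixels=2):
    """Count the pixels in which the prediction for the second state deviates
    from the adopted annotation, grouped by predicted class. Classes with at
    least min_pixels deviating pixels are reported for (re-)annotation."""
    deviations = Counter()
    for adopted_row, predicted_row in zip(adopted_labels, predicted_labels):
        for adopted, predicted in zip(adopted_row, predicted_row):
            if adopted != predicted:
                deviations[predicted] += 1
    return {cls: count for cls, count in deviations.items() if count >= min_pixels}


adopted = [["street", "street", "tree"],
           ["street", "street", "tree"]]
predicted = [["street", "car", "tree"],
             ["street", "car", "tree"]]  # a car is present only in the second state
review = classes_to_review(adopted, predicted)
```

Here `review` contains only the class `car` with two deviating pixels, i.e. the object that was not part of the scene in the first state.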

A further development provides that the scene in the second state can be captured by an image sequence and that an unfavorable position from which the scene is captured in the second state can be compensated for on the basis of at least one individual image preceding and/or following the individual image to be marked.

For example, the first state and the second state of the scene can differ in terms of weather conditions, light conditions or the like.

For example, the scene can be captured again if the visibility conditions deteriorate due to fog compared to sunny weather, at night or the like.

According to another development, the second state can, for example if it includes darkness, poor visibility or the like, cause one or more objects of the scene to be no longer visible in the second data record. In this case, such invisible areas can be marked or annotated accordingly, or can be excluded automatically on the basis of, e.g., a signal-to-noise ratio.
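One conceivable way of excluding invisible areas on the basis of a signal-to-noise ratio is sketched below. The SNR estimate used here (mean intensity divided by standard deviation of a region), the threshold of 3.0 and the names `region_snr` and `exclude_low_snr` are assumptions chosen for illustration; the application does not specify how the ratio is determined.

```python
import statistics


def region_snr(pixels):
    """Simple SNR estimate of an image region: mean intensity / standard deviation."""
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    if std == 0:
        # Perfectly uniform region: no measurable noise, or no signal at all.
        return float("inf") if mean > 0 else 0.0
    return mean / std


def exclude_low_snr(regions, min_snr=3.0):
    """Return the names of the regions whose SNR is below min_snr (assumed threshold)."""
    return sorted(name for name, pixels in regions.items() if region_snr(pixels) < min_snr)


# Hypothetical intensity samples of two marked regions in the night recording.
regions = {
    "street": [120, 118, 122, 121],  # illuminated by the headlights
    "dark_verge": [2, 0, 5, 1],      # almost black, dominated by noise
}
excluded = exclude_low_snr(regions)
```

In this example only the region `dark_verge` falls below the threshold and would be excluded from the adopted annotation or marked as not visible.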

The invention also relates to a system for object marking in sensor data. The system can in particular be operated in accordance with the method described above and can accordingly be further developed according to one or more of the embodiment variants described above. The system has at least one, preferably optical, sensor for detecting a scene and a data processing device, for example a computer with a processor, a memory and/or the like. The data processing device is set up to assign a first object marking to at least one object contained in the scene in a first data record containing the scene in a first state, and to at least partially adopt the first object marking contained in the first data record, for the object recognized in a second state of the scene, as a second object marking in a second data record.

According to a development, the system can have a second sensor for determining the location and/or position during the detection of the scene, the location and/or position determination being assignable to the detected scene, i.e. in particular to the first data record. The second sensor can comprise, e.g., one or more sensors, such as for GPS positioning, for determining ego-motion or the like.

Further measures improving the invention are described in more detail below together with the description of the preferred exemplary embodiments of the invention with reference to figures.

Brief description of the figures

In the following, advantageous exemplary embodiments of the invention are described in detail with reference to the accompanying figures. The figures show:

Figure 1 a schematic of a system that can be operated with a method on which this invention is based, and

Figure 2 a practical application of the method using the example of the course of a road.

The figures are only schematic and are not to scale. In the figures, the same, equivalent or similar elements are provided with the same reference numerals throughout.

Embodiments of the invention

FIG. 1 shows a diagram of a system 100 which is suitable for the partially automated and / or fully automated marking or annotation of an object or an object class recognized in an image or in an image sequence.

The system 100 comprises a data processing device 110, which can have a processor, a storage device, in particular for program code, etc. In this embodiment, the data processing device 110 has at least one artificial intelligence module 120, or AI module for short, which is set up for pattern recognition in an image or in an image sequence, for example by means of a multi-layered artificial neural network 130. In addition, the system has at least one first sensor 140, which is designed as an optical sensor, for example as a camera, and at least one second sensor 150 for determining the location and/or position. By way of example, the sensors 140, 150 are arranged on or in a motor vehicle 160 and can also be borrowed from another vehicle system. The first sensor 140 can thus be part of a driver assistance system that can also be set up for autonomous driving operation of the motor vehicle 160. The second sensor 150 can be part of a navigation system, an odometry system or the like.

System 100 can be operated using the method described below.

First, the motor vehicle 160 is moved through a scene 170, which here, by way of example, is a traffic situation with an object 180, which can be, e.g., a static object in the form of a street, a traffic sign, etc. This scene 170 is recorded in a first state as an image or image sequence by means of the first sensor 140 and stored in a first data record 190. The first state of the scene 170 corresponds, for example, to a daytime drive of the motor vehicle 160 through the scene, the scene being assumed here to be illuminated by daylight. On the basis of the location and/or position determination by the second sensor 150, location information, i.e. the location where the scene was recorded, and viewing angle and/or position information are also stored in the first data record 190.

The same or at least a similar scene is recorded again in a second state, which differs from the first state, which is why the newly recorded scene in the second state is denoted by 170' in FIG. 1. This corresponds here, for example, to a night drive of the motor vehicle 160 through the scene 170', a correspondingly dark environment being assumed here. Furthermore, it is assumed that the object 180 is still part of the scene 170'. This scene 170' in the second state is stored in a second data record 190'.

Furthermore, the first data record 190 is fed to the data processing device 110, and with its help the object 180 is marked with a first object marking 195, i.e. an annotation, e.g. manually or partially automated, possibly also fully automated by the AI module 120. The first object marking 195 can be, e.g., a highlighting of the course of a street.

The second data record 190' is also fed to the data processing device 110 and processed therein. The AI module 120 is also set up to recognize the object 180 in the second data record 190' and to assign a second object marking 195' to it, which is the same as the first object marking 195 in the first data record 190 if the object 180 is unchanged. To recognize the scene 170' and/or the object 180, the AI module 120 accesses the information on the location and position of the recording of the scene 170, which is stored in the first data record 190. As a result of the processing by the AI module 120, the second data record 190' now also contains the similar or the same scene 170' and the second object marking 195'.

As indicated in FIG. 1, the first and the second data record 190, 190' serve as training data record 200 for the AI module 120 itself or for a further AI module 210, which can also be part of an autonomously driving vehicle, for example.

FIG. 2 shows an exemplary scene 170 on the left-hand side, in which the object 180 is the course of a road, which is already provided here with the first object marking 195. It is assumed that comparatively bad weather prevailed during the recording of scene 170 and that the view is therefore slightly restricted. On the right-hand side of FIG. 2, the scene 170' is recorded again when the weather is clearer. The AI module 120 has recognized the scene 170' and has automatically assigned the second object marking 195' to the object 180, that is to say the course of the road.

Starting from the illustrated embodiment, the system 100 and the method described above can be modified in many ways. For example, it is possible that, on the basis of the first data record 190, a depth prediction, e.g. monocular, by stereo depth estimation, by estimation of the optical flow and/or on the basis of LIDAR data, is carried out for the image that already has the first object marking. A prediction of a semantic segmentation can also be carried out in the unknown image, i.e. the second data record. Furthermore, it is conceivable that the first object marking 195 is transformed so that the object marking fits the new image of the second data record 190' more precisely. This transformation is also known as warping. It is also possible that a SLAM (Simultaneous Localization And Mapping) method is used to obtain a better location and position determination. It is also conceivable for the artificial neural network 130 to provide a pixel-wise match mask as output. This can form a good basis for manual, semi-automatic or fully automatic further processing. In addition, it is possible that, in particular by means of the SLAM method, at least one distinguishing feature of the scene 170, 170' between the first state and the second state is determined and the second object marking 195' is assigned to the distinguishing feature. At least if the distinguishing feature, e.g. the difference class, already has a sufficiently good quality (e.g. a statistical test with high confidence) and the artificial neural network 130 indicates a match for the remaining image content of the scene 170, 170', an option can be offered, e.g., to adopt the object marking 195 automatically.

Claims

1. Method for object marking in sensor data, with the steps:

- detecting a scene (170) in a first state by at least one sensor (140, 150),

- assigning a first object marking (195) to at least one object (180) contained in the scene in a first data record (190) containing the scene in the first state,

characterized by

- detecting the similar or at least substantially matching scene (170') in a second state different from the first state by the at least one sensor (140, 150),

- at least partially adopting the first object marking (195) contained in the first data record (190) for the object (180) recognized in the second state of the scene (170') as a second object marking (195') in a second data record (190').

2. The method according to claim 1, characterized in that for recognizing the scene (170') in the second data record (190'), the first data record (190) is associated with location information of the scene (170).

3. The method according to claim 1 or 2, characterized in that for recognizing the scene (170') in the second data set (190'), the first data set (190) is associated with viewing angle and/or position information of the scene (170).

4. The method according to any one of the preceding claims, characterized in that the adoption of the first object marking (195) is at least partially automated by an artificial intelligence module (120), AI module.

5. The method according to claim 4, characterized in that an artificial neural network (130) of the AI module (120) determines matching image areas of the scene (170, 170') in the first and second data sets (190, 190').

6. The method according to claim 5, characterized in that the artificial neural network (130) provides a pixel-wise match mask as output.

7. The method according to any one of claims 4 to 6, characterized in that the first and/or second data set (190, 190') is supplied to the AI module (120, 210) as training data record (200).

8. The method according to any one of the preceding claims, characterized in that, preferably by a SLAM method, at least one distinguishing feature of the scene (170, 170') between the first state and the second state is determined and the second object marking (195') is assigned to the distinguishing feature.

9. The method according to any one of the preceding claims, characterized in that the scene (170') is detected in the second state by an image sequence and an unfavorable position, from which the scene (170') is recorded in the second state, is compensated.

10. The method according to any one of the preceding claims, characterized in that the first state and the second state of the scene (170, 170') differ by weather conditions, lighting conditions or the like.

11. System (100) for object marking in sensor data, with

- at least one first sensor (140) for detecting a scene (170, 170 ') and

- a data processing device (110),

characterized in that

the data processing device (110) is set up

- to assign a first object marking (195) to at least one object (180) contained in the scene (170) in a first data record (190) containing the scene (170) in a first state, and

- to at least partially adopt the first object marking (195) contained in the first data record (190) for the object (180) recognized in a second state of the scene as the second object marking (195') in a second data record (190').

12. The system according to claim 11, characterized by a second sensor (150) for determining the location and/or position during the detection of the scene (170, 170'), the location and/or position determination being assignable to the detected scene (170, 170').