WO2020164841A1

WO2020164841A1 - Method for providing a training data set quantity, method for training a classifier, method for controlling a vehicle, computer-readable storage medium and vehicle

Info

Publication number: WO2020164841A1
Application number: PCT/EP2020/050913
Authority: WO
Inventors: Christopher Matheisen; Gabor Varga; Christian EFFERTZ; Dirk Wohlfeil
Original assignee: Saint-Gobain Glass France
Priority date: 2019-02-14
Filing date: 2020-01-15
Publication date: 2020-08-20
Also published as: MA55272A; EP3938946A1; CN111837125A

Abstract

Classification systems require a large quantity of training data representing different operating conditions. Creating said training data is complex and expensive. The invention relates to a method for providing a training data set quantity (30), in particular for an artificial neural network (32, 40), comprising the following steps: loading a base training data set (31), which specifies assignments (15, 15') of image data (14, 14') to characterizations (16, 16'); processing the base training data set (31) using at least one optical filter (19) and producing an output training data set (31'), which comprises the processed base training data set (31); providing a training data set quantity comprising the base training data set (31) and the output training data set (31'), the base training data set (31) being associated with properties of an optically transparent reference medium, in particular a reference windshield, and the output training data set (31') being associated with properties of an optically transparent training medium, in particular a training windshield, the optical filter (19) being defined in accordance with the installation position of the optically transparent reference medium (20) relative to an image sensor (23).

Description

Method for providing a set of training data sets, a method for training a classifier, a method for controlling a vehicle, a computer-readable storage medium and a vehicle

The invention relates to a method for providing a set of training data sets, a method for training a classifier, a method for controlling a vehicle, a computer-readable storage medium and a vehicle.

For a number of driver assistance functions it is common in modern vehicles for an image sensor to be arranged behind a windshield, for example as part of a camera. This image sensor records the vehicle environment, in particular in the area in front of and behind the vehicle.

This makes it possible to enable functions such as automatic lane recognition, automatic brake assist or functions of semi-autonomous or fully autonomous driving.

The recorded image data are analyzed by a computing device in the vehicle and control functions are carried out based on this analysis. For example, if the analysis has determined that a stop sign is arranged in front of the vehicle, the vehicle can show the driver a warning that a vehicle stop should be carried out. In principle, however, it is also possible for the vehicle to independently carry out this stop.

Classifiers are usually used to analyze the image data, which determine for each pixel of an image to which object or to which class of objects this pixel belongs. Object recognition can thus be carried out as a pixel-based classification.

Artificial neural networks, in particular “deep convolutional nets”, are often used as classifiers. These neural networks accept either a complete image or a section of an image as input parameters and specify the associated object class for each pixel.

Neural networks can achieve a very high level of accuracy with a high hit rate (precision, recall).

In order to train a classifier, it is necessary to provide annotated training data. These annotated training data contain a large amount of image data, with an identification, or so-called label, being stored for each image and, within it, for each pixel, indicating to which object or object class the pixel belongs. For example, training data can contain an indication that a particular pixel belongs to a stop sign.

Training data are usually generated by first driving one or several vehicles along a large number of different routes and thereby recording a large amount of image data. The image data are then annotated manually, i.e. by a human. This process is partially carried out automatically, but is at least verified and corrected by a human. This is necessary for legal reasons alone.

Annotating the training data is therefore a very complex and expensive process.

A disadvantage of using the machine learning methods described is that a large amount of training data is required for this in order to be able to achieve the accuracy required for practical use. This is problematic because, as described, creating the training data is very complex and expensive.

Methods for improving data quality are known from the prior art. US 2016/0300333 A1 describes a method for filtering contaminants on panes out of data for an artificial neural network in such a way that they have no negative effect on a classification. For example, water droplets can also be computationally removed from training or usage data.

The prior art also describes a number of methods for increasing the amount of training data. US 2017/0236013 A1 describes the generation of synthetic training data for an artificial neural network using a graphics engine. Objects can be placed anywhere in a three-dimensional space using the graphics engine. In this way, situations that occur only very rarely when recording with cameras in real conditions can be created in a targeted manner. An artificial neural network can thus be trained in such a way that it delivers improved results in the trained situations.

Another disadvantage, however, is that the training data are recorded with a specific vehicle. This means that the manufacturing tolerances of the windshield installed there and of the image sensor used there influence the image data.

The influence of the manufacturing tolerances reduces the accuracy of the classifier in the classification during operation. It is therefore the object of the invention to reduce the effort involved in generating training data. It is a particular object of the invention to reduce the influence of manufacturing tolerances on the classification. Another particular object of the invention is to increase the safety during the operation of a vehicle with assistance functions. The object is achieved by a method according to claim 1, a method according to claim 8, a method according to claim 9, a computer-readable storage medium according to claim 10 and a vehicle according to claim 11.

In particular, the object is achieved by a method for providing a set of training data sets, in particular for an artificial neural network, comprising the following steps:

Loading a basic training data set which indicates assignments of image data to identifications;

Processing the basic training data set using at least one optical filter and generating an output training data set comprising the processed basic training data set;

Providing a set of training data sets comprising the basic training data set and the output training data set.

A core of the invention is that the basic training data set is processed using an optical filter. By using the optical filter on the basic training data set, the existing data is doubled. This allows a classifier to be trained better. The resources to be used to create the training data are also significantly reduced.
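The three steps listed above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the application: the names LabeledImage, scale_brightness and provide_training_sets are hypothetical, and a simple brightness scaling stands in for the optical filter.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]   # brightness value per pixel, 0..255
Labels = List[List[str]]  # object class per pixel


@dataclass(frozen=True)
class LabeledImage:
    """One entry of a training data set: image data plus per-pixel labels."""
    pixels: Image
    labels: Labels


# An optical filter maps a labeled image to a processed labeled image.
OpticalFilter = Callable[[LabeledImage], LabeledImage]


def scale_brightness(factor: float) -> OpticalFilter:
    """Hypothetical filter emulating a pane with a different light transmission."""
    def _filter(sample: LabeledImage) -> LabeledImage:
        pixels = [[min(255, round(v * factor)) for v in row] for row in sample.pixels]
        # Brightness scaling does not move pixels, so the labels carry over unchanged.
        return LabeledImage(pixels, sample.labels)
    return _filter


def provide_training_sets(
    base: List[LabeledImage], optical_filter: OpticalFilter
) -> List[List[LabeledImage]]:
    """Return the basic training data set together with the processed output set."""
    output = [optical_filter(sample) for sample in base]
    return [base, output]
```

With one filter, the result contains two training data sets of equal size, which corresponds to the doubling of the existing data described above.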

Image data represent, in particular, a data structure in which individual images are stored in a chronologically ordered manner. Using an optical filter means in particular that image data of a training data set are modified by using the optical filter.

The basic training data set is assigned properties of an optically transparent reference medium, in particular a reference windshield, and the output training data set is assigned properties of an optically transparent training medium, in particular a training windshield.

In the context of this application, optically transparent means in particular that the medium is permeable to visible light, in particular in the range from 400 nm to 800 nm.

The basic training data set is thus assigned to an optically transparent reference medium. For example, the training data can be recorded by an image sensor which is arranged behind a reference windshield. The output training data set is in turn assigned to the properties of an optically transparent training medium, for example a training windshield. This means that, by providing the set of training data sets, two different transparent media are now taken into account. This improves the accuracy of a classification when used with another windshield. This also makes it possible to take manufacturing tolerances in the production of optical media into account.

In one embodiment, the at least one optical filter can indicate an analytical mapping from the basic training data set to the output training data set.

It is particularly advantageous if the optical filter indicates an analytical mapping. Because an analytical mapping is specified, the processing of the basic training data set is comprehensible and predictable. It is thus possible in particular to state which pixel in an image of the basic training data set corresponds to which pixel in an image of the output training data set. Correspondingly, taking the analytical mapping into account, the identification of the pixels of image data of the output training data set can be adapted in accordance with the identification of the pixels of image data of the basic training data set. A method is thus specified in which the identification of the output training data set can be carried out particularly efficiently.

In one embodiment, the image data can be stored as a set of pixels with assigned brightness values, preferably in each case for a multiplicity of color channels, wherein an assignment of image data to identifications can specify an associated object class for each pixel.

It is thus possible for a pixel-based assignment to be specified for a black-and-white image or a color image. An identifier can be an indication of an object class. For example, an identification can indicate that a certain pixel of an image is assigned to the object class “stop sign”. Ultimately, such a designation makes it possible to segment an image, with each pixel storing the object class to which it belongs. In one embodiment it is possible for an identifier to be stored as a data structure in which a coordinate of the pixel is stored as a first property and the assigned object class is stored as a second property.
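The data structure described in the last sentence can be sketched as follows. The class name Identification, the field names and the helper segment are assumptions made for illustration; the application only specifies that a pixel coordinate is stored as a first property and the assigned object class as a second property.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Identification:
    """Label of a single pixel (hypothetical layout)."""
    coordinate: Tuple[int, int]  # first property: (row, column) of the pixel
    object_class: str            # second property: assigned object class


def segment(identifications: List[Identification]) -> Dict[str, List[Tuple[int, int]]]:
    """Group pixel coordinates by object class, i.e. segment the image."""
    segments: Dict[str, List[Tuple[int, int]]] = {}
    for ident in identifications:
        segments.setdefault(ident.object_class, []).append(ident.coordinate)
    return segments
```

Grouping the identifications by object class yields one pixel region per class, which is the segmentation mentioned above.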

In one embodiment, the optical filter can be determined by measuring properties of at least one optically transparent reference medium.

An optical filter can in particular be designed as a Gaussian blurring, as an offset filter or as a color filter.

The determination of the optical filter can be carried out efficiently by measuring properties of at least one optically transparent reference medium. This means that the optical filter simulates the properties of the optically transparent reference medium. Any optically transparent reference medium can then be emulated by changing the parameters of the optical filter. This makes it possible to measure a large number of different media and to define corresponding optical filters. For example, the variance in the production of optically transparent media, such as windshields, can thus be simulated by optical filters. The complete variance across the production tolerances can therefore be simulated from a single basic training data set. For example, at least 30 optically transparent media can be measured so that 30 corresponding optical filters are created. This means that a total of 30 different output training data sets can be created from a single basic training data set. All in all, the creation of training data for training a classifier is thus significantly simplified.
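How one filter per measured medium could be derived is sketched below for the Gaussian blurring mentioned above. This is an illustration only: the assumption that each measured pane is summarized by a single blur width (sigma), the border handling by edge clamping and all function names are hypothetical and not taken from the application.

```python
import math
from typing import List

Image = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1D Gaussian kernel covering about three standard deviations."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image: Image, kernel: List[float]) -> Image:
    """Convolve every row with the kernel, clamping indices at the image border."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(k * row[min(max(x + i - radius, 0), n - 1)] for i, k in enumerate(kernel))
            for x in range(n)
        ])
    return out


def transpose(image: Image) -> Image:
    return [list(column) for column in zip(*image)]


def gaussian_blur(image: Image, sigma: float) -> Image:
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def output_images_for_panes(image: Image, measured_sigmas: List[float]) -> List[Image]:
    """One processed image per measured pane (one sigma per pane is an assumption)."""
    return [gaussian_blur(image, sigma) for sigma in measured_sigmas]
```

Passing 30 measured values produces 30 processed versions of each base image, in line with the example of 30 measured media.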

In one embodiment, the optical filter can be determined by determining a modulation transfer function. Determining a modulation transfer function is a particularly efficient implementation for determining the optical filter.
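The application does not state how the modulation transfer function is obtained. A common definition is the magnitude of the Fourier transform of a measured line spread function, normalized to 1 at zero spatial frequency. The following sketch assumes that such a sampled line spread function with a non-zero sum is available; the function name mtf_from_lsf and the sample values are hypothetical.

```python
import cmath
from typing import List


def mtf_from_lsf(lsf: List[float]) -> List[float]:
    """Modulation transfer function from a sampled line spread function.

    Computes the magnitude of the discrete Fourier transform for the
    non-negative frequencies and normalizes it to 1 at zero frequency.
    Assumes that the samples do not sum to zero.
    """
    n = len(lsf)
    spectrum = [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * x / n) for x, v in enumerate(lsf)))
        for k in range(n // 2 + 1)
    ]
    return [s / spectrum[0] for s in spectrum]


# A perfectly sharp medium (single-sample response) passes all frequencies equally,
# while a broadened response attenuates the higher spatial frequencies.
sharp = mtf_from_lsf([0, 0, 1, 0, 0, 0, 0, 0])
broadened = mtf_from_lsf([0, 1, 2, 1, 0, 0, 0, 0])
```

The resulting curve describes how strongly the medium attenuates each spatial frequency and could serve as the parameters of a blurring filter.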

The optical filter is determined taking into account an installation position of the optically transparent reference medium with respect to an image sensor. In one embodiment it is also possible that the optical filter is determined taking into account geometric properties of the optically transparent reference medium, in particular using a ray tracing-based method.

Geometric properties can indicate a reflectivity, a thickness, a refractive power, a transmission and / or a polarization of an optically transparent medium.

It is particularly advantageous that the installation position of an image sensor with respect to the optically transparent medium is also taken into account. This can be exploited in particular when using ray tracing-based methods. All in all, an optically transparent medium can be simulated very precisely by using ray tracing-based methods. This improves the accuracy of the optical filters.

In one embodiment, the method can comprise the following steps:

Determining a sensor filter taking into account a specific noise of the image sensor;

Applying the sensor filter to the image data of the training data set.

Image sensors generally have a characteristic noise which can turn out differently depending on the image sensor used. The noise can be measured, and optical filters can be designed to reduce the measured noise. It is therefore helpful to restore the noise ratio, which has been artificially changed by an optical filter, with a sensor-specific filter and to adapt it to the real conditions to be expected. Since the image data of the basic training data set and the output training data set are recorded using the same image sensor, the same sensor filter can be used for all training data sets. It is of course also conceivable to use different sensor filters for the different training data sets. In particular, a sensor filter can be determined taking into account the image sensor used to record the image data of the corresponding training data sets.

Overall, the quality of the training data set or of the set of training data sets is thus improved.
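A sensor filter of this kind could, for example, re-add noise to the processed image data. The sketch below assumes that the sensor noise is modeled as zero-mean Gaussian noise with a measured standard deviation; this noise model, the function name apply_sensor_filter and its parameters are assumptions and are not specified in the application.

```python
import random
from typing import List

Image = List[List[int]]  # brightness value per pixel, 0..255


def apply_sensor_filter(image: Image, noise_sigma: float, seed: int = 0) -> Image:
    """Add sensor-specific noise to an image (Gaussian noise model is an assumption).

    noise_sigma is the measured standard deviation of the sensor noise in
    brightness units; the seed makes the result reproducible.
    """
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(value + rng.gauss(0.0, noise_sigma)))) for value in row]
        for row in image
    ]
```

Because all training data sets stem from the same image sensor, the same noise_sigma can be applied to each of them, as described above.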

The object is also achieved in particular by a method for training an artificial neural network, comprising the following steps:

Acquisition of reference image data indicating a plurality of images, in particular using an image sensor;

Assigning identifiers to pixels of the plurality of images to generate a basic training data set;

Providing a set of training data sets as described above;

Training a classifier, in particular an artificial neural network, using the set of training data sets.

It is possible to train an artificial neural network with a training data set provided as described above. Thus, a method is provided which trains a classifier that can be used to classify image data.

The image sensor can be a CMOS or CCD sensor, for example. The acquisition of the reference image data can be carried out, for example, using a test vehicle on which an image sensor is arranged. The assignment of the identification to pixels can be carried out manually.

The object is also achieved in particular by a method for controlling a vehicle, comprising the following steps:

Loading a classifier trained by the method described above;

Acquiring image data indicating surroundings of a vehicle;

Classifying the image data using the classifier;

Generating control instructions for a control device of the vehicle using the classified image data;

Controlling at least one actuator of the vehicle by the control device using the control instructions.

With the method described, it is therefore possible to at least partially control a vehicle. In particular, the method makes it possible to control an actuator. A control instruction can be, for example, an indication of a steering angle, an acceleration indication, a speed indication, a braking indication or a similar indication. Overall, the use of a training data set for training a classifier which is used in the operation of a vehicle makes the vehicle safer to use.

Further embodiments emerge from the subclaims. In the following, the invention is explained in more detail using exemplary embodiments, which show:

FIGS. 1a and 1b: a schematic representation of a vehicle in a top view and a side view;

FIG. 2: a representation of image data;

FIG. 3: a detailed view of an image section;

FIG. 4: a schematic representation of an assignment of pixels to markings;

FIG. 5: an illustration of the use of an optical filter;

FIG. 6: an illustration of the generation of a set of training data sets;

FIG. 7: an illustration of a light beam which penetrates an optically transparent medium and hits an image sensor;

FIG. 8: a flow chart illustrating the generation of a set of training data sets.

FIG. 1A shows a vehicle 1. A camera 3 is arranged in the driver's cab of the vehicle 1. The camera 3 supplies image data to a processing device 4, which is also arranged in the vehicle 1. As can be seen from FIG. 1B, the camera 3 is arranged in the area of a rearview mirror 7 of the vehicle 1. The camera 3 is arranged and aligned in such a way that it can record the area in front of the vehicle 1. The camera 3 has an image sensor which can be designed as a CMOS or CCD sensor, for example. Furthermore, the driver 2 is shown symbolically in FIG. 1A and the steering wheel 5 in FIG. 1B.

Light rays that are recorded by the image sensor of the camera 3 first pass a windshield 6 and then a lens of the camera 3. The effective passage area of the windshield 6 can have an area of 7 cm x 7 cm or preferably 40 cm x 20 cm.

The image data recorded by the image sensor are sent to the processing device 4 via a bus system. The bus system can be, for example, an Ethernet-based communication system. It is also conceivable that a CAN bus or a similar data connection is used. In particular, it is conceivable that a wireless connection is used.

The processing device 4 is designed to generate control instructions based on the image data for the vehicle 1. For example, the processing device 4 can use an artificial neural network or another classifier. The image data serve as input parameters for the classifier. For example, a classifier can be used that recognizes objects in the front area of vehicle 1.

Such an example is shown in FIG. 2, which shows an image section 10 of image data at a specific point in time.

Two objects are arranged in the image section 10: a stop sign 11 and a tree 12. The classifier, which is executed by the processing device 4 of the vehicle 1, is designed to determine the individual objects with pixel accuracy. This means that an object class can be specified for each pixel. It is thereby possible, on the one hand, to segment the image section 10 and, on the other hand, to determine which objects are in front of the vehicle 1.

Based on the detected objects 11, 12, a control instruction can then be derived by the processing device 4. The position of the objects 11, 12 can also be included as a parameter. For example, the processing device 4 can be designed to output a warning to the driver 2 of the vehicle 1 when a stop sign 11 is arranged in front of the vehicle 1. For this purpose, for example, a light in the vehicle interior can light up or a warning message can be projected into the field of view of the driver 2 by means of a head-up display.
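Deriving such a warning from pixel-wise classified image data could look as follows. This is a hedged sketch: the class name "stop_sign", the instruction name "warn_driver_stop" and the minimum pixel count used as a trigger are hypothetical choices that do not appear in the application.

```python
from typing import List, Optional


def derive_control_instruction(
    class_map: List[List[str]], min_pixels: int = 4
) -> Optional[str]:
    """Return a warning instruction if enough pixels belong to a stop sign.

    class_map holds one object class per pixel, as produced by the classifier.
    The min_pixels threshold is an assumed guard against isolated misclassified
    pixels.
    """
    stop_pixels = sum(row.count("stop_sign") for row in class_map)
    return "warn_driver_stop" if stop_pixels >= min_pixels else None
```

A processing device could pass the returned instruction to a display or a head-up display; the position of the detected pixels could be evaluated in the same way.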

Figures 3 and 4 illustrate the assignment of individual pixels to object classes. For this purpose, a partial image detail 13 of the image section 10 is shown in FIG. 3. The partial image detail 13 comprises a multiplicity of pixels 14, 14', each of which has an assigned brightness value. However, it is also possible to assign brightness values of different color channels to individual pixels for color reproduction. The classifier, which is executed by the processing device 4, is designed to assign an object class to the individual pixels 14, 14'. FIG. 4 shows that a white pixel 14 is assigned the object class 16, i.e. "background" (bg). The pixel 14', however, is assigned the object class 16' by means of the assignment 15'. The object class 16' indicates that the pixel 14' is part of a "stop sign" (obj 1). Correspondingly, those pixels of the image section 10 that are part of an object "tree" (obj 3) are assigned to the object class 16''.

As already mentioned in connection with FIG. 1, the camera 3 is arranged in the vehicle 1 behind a windshield 6. This means that the windshield 6 has an influence on the recording of the surroundings of the vehicle 1. For example, the windshield 6 can cause distortion. This is particularly disadvantageous because manufacturing tolerances occur in the manufacture of windshields, so that the representation of the same scene with different windshields 6 leads to different image data. If a classifier is then trained with the data of only one windshield 6, the manufacturing tolerances or different vehicle models are not taken into account. This leads to unsatisfactory results in the classification as described in connection with FIGS. 1 and 2. The effect that a windshield 6 has on the light which is transmitted from an object to the image sensor of the camera 3 can be approximated by means of optical filters.

Such an optical filter 19 is shown as an example in FIG. 5. In the example of FIG. 5, an original image section 17 is shown, which was recorded using a reference windshield. The optical filter 19 defines a mapping of each pixel 14' of the original image section 17 onto a pixel 14'' of a processed image section 18.

The example in FIG. 5 shows that the pixel 14', which in the exemplary embodiment shown is arranged in the third line at the fourth position from the left, is arranged in the processed image section 18 in the fourth line at the third position from the left. An offset is therefore defined for each pixel 14'. A number of other optical filters are of course conceivable. For example, different windshields can differ in their light transmission. As a result, the brightness values of the individual pixels differ in magnitude. This can be emulated with an optical filter. It is also conceivable that individual image areas are shown distorted by a slight curvature in the pane. Such behavior can also be represented by an optical filter 19. In particular, an optical filter 19 can include an analytical mapping, so that it is possible to trace which pixels in the original image correspond to which pixels in the processed image. As a result, an identification or label of corresponding pixels can also be transferred.
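The offset of FIG. 5 and the accompanying transfer of labels can be sketched as follows. The function names and the fill values for positions that receive no source pixel (brightness 0 and class "bg") are assumptions; the application does not specify how such positions are handled.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def apply_offset(grid: List[List[T]], d_row: int, d_col: int, fill: T) -> List[List[T]]:
    """Move every entry by (d_row, d_col); uncovered positions get the fill value."""
    rows, cols = len(grid), len(grid[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + d_row, c + d_col
            if 0 <= nr < rows and 0 <= nc < cols:
                out[nr][nc] = grid[r][c]
    return out


def offset_filter(
    pixels: List[List[int]], labels: List[List[str]], d_row: int, d_col: int
) -> Tuple[List[List[int]], List[List[str]]]:
    """Apply the same analytical mapping to the image data and to the labels."""
    return (
        apply_offset(pixels, d_row, d_col, 0),
        apply_offset(labels, d_row, d_col, "bg"),
    )
```

An offset of one line down and one position to the left moves the pixel from the third line, fourth position to the fourth line, third position, and its label moves with it, so no new manual annotation is needed.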

With the use of the different optical filters 19 it is therefore possible to approximate different windshields. As a result, a large number of different training data can be generated with which a classifier can then be trained. Due to the increased variability of the training data, the classifier is generally more robust against interference. In particular, the classifier can react better to unusual situations.

FIG. 6 once again illustrates the advantage of the present invention. FIG. 6 shows that a training data set 31, which contains image data and corresponding identifications, can be processed with an optical filter so that a set of training data sets 30 is generated which includes the original training data set 31 and the processed training data set 31'. The number of training data was thus doubled, with different windshields now being covered by the training data.

FIG. 7 shows how the properties of a pane 20 can be approximated with the aid of a ray tracing-based method. This makes use of the fact that a light source 21 emits a light beam linearly in the direction of the pane 20. When it hits an outer glass entry plane 24 facing the light source 21, part of the light is reflected, so that a reflected light beam 22 is reflected away from the glass entry plane 24. Another part of the light beam is refracted and passed through the pane 20. When passing through a glass exit plane 25, which is arranged on the side facing the camera, the light beam is refracted again and directed in the direction of the camera 3. Before the light beam can strike an image sensor 23, it is refracted again by an objective 26 of the camera 3.

The parameters of the pane 20 therefore include the thickness B of the pane 20, the reflectivity, the refractive power, the transmission and / or the polarization. These parameters can also represent parameters of an optical filter 19, so that different panes can be emulated by adjusting the parameters of the optical filter 19. An optical filter 19 can be modeled by a large number of standard filters, for example a Gaussian blurring filter or a displacement filter.
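As a simple illustration of how pane parameters and the installation position could translate into a displacement filter, the following sketch applies Snell's law to a flat, plane-parallel pane in air. The flat-pane model, the function name lateral_shift and the numerical values (5 mm thickness, refractive index 1.52, 60 degrees angle of incidence) are assumptions for this example and do not come from the application.

```python
import math


def lateral_shift(thickness_mm: float, refractive_index: float, incidence_deg: float) -> float:
    """Lateral displacement of a ray passing through a plane-parallel pane in air.

    Snell's law gives the refraction angle inside the pane:
        sin(theta_i) = n * sin(theta_t)
    The ray leaves the pane parallel to its original direction, shifted by
        d = t * sin(theta_i - theta_t) / cos(theta_t)
    """
    theta_i = math.radians(incidence_deg)
    theta_t = math.asin(math.sin(theta_i) / refractive_index)
    return thickness_mm * math.sin(theta_i - theta_t) / math.cos(theta_t)


# Assumed example values: a 5 mm pane with n = 1.52 hit at 60 degrees.
shift_mm = lateral_shift(5.0, 1.52, 60.0)
```

The shift grows with the thickness B and with the angle of incidence, which is one way in which the installation position of the pane relative to the image sensor enters the filter parameters. A full ray tracing-based method would additionally account for curvature, reflection and the camera objective.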

FIG. 8 is a flow chart which once again describes the entire method 40. First, image data 41 are recorded and the objects shown in the image data 41 are manually assigned to corresponding object classes in a labeling step 42. Annotated or labeled image data 43 are now processed in a processing step 44 using an optical filter 19. Different optical filters 19 are used in order to simulate a large number of different optically transparent media. For example, a large number of different windshields 6 can be simulated in this step.

The processing step 44 generates a set of training data sets 45, which is provided to a training algorithm for a classifier 47 in a training step 46. The classifier 47 can be, for example, an artificial neural network, such as a convolutional neural network. The trained classifier 47 is transferred to a processing device 4 for a vehicle 1 in a transfer step 48. In a detection step 51, image data 50 are fed to the classifier 47 during the operation of the vehicle 1, so that the classifier 47 classifies the objects stored in the image data 50.

The classified image data 52, i.e. data which contain information about the objects shown in the image data, are analyzed by the processing device 4 in a control step 53, corresponding control instructions for actuators of the vehicle 1 being derived. These control instructions are also implemented in control step 53, so that, for example, a warning is displayed for a user. The detection step 51 and the control step 53 are carried out alternately until the vehicle 1 comes to a stop or is switched off.

List of reference symbols

1 vehicle

2 driver

3 camera

4 processing device

5 steering wheel

6 windshield

7 rearview mirror

10 camera image

11 stop sign

12 tree

13 image detail

14, 14', 14'' pixels

15, 15' assignment

16, 16', 16'' marking / label

17 original image section

18 processed image section

19 optical filter

20 optically transparent medium / glass pane

21 light source

22 reflected light beam

23 image sensor

24 glass entry plane

25 glass exit plane

26 lens

30 set of training data sets

31, 31' training data set

32, 40 convolutional neural network

40 method

41 image data

42 labeling step

43 labeled or annotated image data

44 processing step

45 set of training data sets

46 training step

47 classifier

48 transfer step

50 image data

51 detection step

52 classified image data

53 control step

B width

Claims

1. A method for providing a set of training data sets (30), in particular for an artificial neural network (32, 40), comprising the following steps:

Loading a basic training data set (31) which indicates assignments (15, 15') of image data (14, 14') to identifications (16, 16');

Processing the basic training data set (31) using at least one optical filter (19) and generating an output training data set (31') which comprises the processed basic training data set (31);

Providing a set of training data sets comprising the basic training data set (31) and the output training data set (31'), characterized in that

the basic training data set (31) is assigned properties of an optically transparent reference medium, in particular a reference windshield, and the output training data set (31') is assigned properties of an optically transparent training medium, in particular a training windshield, wherein

the optical filter (19) is determined taking into account an installation position of the optically transparent reference medium (20) with respect to an image sensor (23).

2. The method according to claim 1,

characterized in that

the at least one optical filter (19) indicates an analytical mapping from the basic training data set (31) to the output training data set (31').

3. The method according to any one of the preceding claims,

characterized in that

the image data (14, 14') are stored as a set of pixels (14, 14') with assigned brightness values, preferably in each case for a plurality of color channels, wherein an assignment (15, 15') of image data (14, 14') to identifications (16, 16') specifies an associated object class for each pixel (14, 14').

4. The method according to any one of the preceding claims,

characterized by

Determining the optical filter (19) by measuring properties of at least one optically transparent reference medium.

5. The method according to any one of the preceding claims, in particular according to claim 4,

characterized in that

the optical filter (19) is determined by determining a modulation transfer function.

6. The method according to any one of the preceding claims, in particular according to claim 4,

characterized in that

the optical filter (19) is determined taking into account geometric properties (B) of the optically transparent reference medium (20), in particular using a ray tracing-based method.

7. The method according to any one of the preceding claims,

characterized by

Determining a sensor filter taking into account a specific noise of the image sensor (23);

Applying the sensor filter to the image data of the training data set.

8. A method for training an artificial neural network, comprising the following steps:

Acquisition of reference image data which indicate a plurality of images, in particular using an image sensor (23);

Assigning identifiers to pixels of the plurality of images to generate a basic training data set (31);

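Providing a set of training data sets according to one of the preceding claims;

Training a classifier, in particular an artificial neural network, using the set of training data sets.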

9. A method for controlling a vehicle (1), comprising the following steps:

Loading a classifier trained by the method of claim 8;

Acquisition of image data indicating the surroundings of a vehicle (1);

Classifying the image data using the classifier;

Generating control instructions for a control unit of the vehicle (1) using the classified image data;

Control of at least one actuator of the vehicle (1) by the control device using the control instructions.

10. Computer-readable storage medium containing instructions that cause at least one processor to implement a method according to any one of the preceding claims when the instructions are executed by the at least one processor.

11. Vehicle comprising:

an image acquisition device which is designed to acquire image data;

a storage medium according to claim 10;

a processing device which is designed to carry out a method according to claim 9.