WO2022171590A1 - Method for determining a degradation degree of a captured image, computer program product, computer-readable storage medium as well as assistance system - Google Patents

Method for determining a degradation degree of a captured image, computer program product, computer-readable storage medium as well as assistance system

Info

Publication number
WO2022171590A1
WO2022171590A1 (application PCT/EP2022/052939)
Authority
WO
WIPO (PCT)
Prior art keywords
module
assistance system
pixels
computing device
electronic computing
Prior art date
Application number
PCT/EP2022/052939
Other languages
French (fr)
Inventor
Senthil Kumar Yogamani
Arindam Das
Original Assignee
Connaught Electronics Ltd.
Priority date
Filing date
Publication date
Application filed by Connaught Electronics Ltd.
Publication of WO2022171590A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations

Abstract

The invention relates to a method for determining a degradation degree (3) of an image (5) captured by a camera (4) of an assistance system (2) of a motor vehicle (1) by the assistance system (2), comprising the steps of: - capturing the image (5) by the camera (4); - performing a deep feature extraction of a plurality of pixels (8) of the image (5) by an encoding module (9) of an electronic computing device (6) of the assistance system (2); - clustering the plurality of pixels (8) by a feature point cluster module (10) of the electronic computing device (6); - regressing the clustered pixels (8) by a regression module (11) of the electronic computing device (6); and - determining the degradation degree (3) depending on an evaluation by applying a sigmoid function (20) after the regression by a sigmoid function module (12) of the electronic computing device (6) as an output of the sigmoid function module (12). Further, the invention relates to a computer program product, to a computer-readable storage medium as well as to an assistance system (2).

Description

Method for determining a degradation degree of a captured image, computer program product, computer-readable storage medium as well as assistance system
The invention relates to a method for determining a degradation degree of an image captured by a camera of an assistance system of a motor vehicle by the assistance system. Further, the invention relates to a computer program product, to a computer- readable storage medium as well as to an assistance system.
From motor vehicle construction, cameras are known which can capture an environmental image of an environment of the motor vehicle. In particular, so-called surround view cameras are known, which can be impaired by harsh environmental conditions like snow, rain, mud and the like. In order to capture such an impairment, it is already known that special contamination classes like soil and water drops can be determined and recognized depending on the contamination. Herein, great challenges arise in the annotation of even these few classes. In particular, the transition areas have to be annotated subjectively, and it is difficult to obtain realistic contamination captures with corresponding variety. Lens contaminations based on mud, sand, water drops and frost, as well as further adverse weather conditions like snow, rain or fog, cannot yet be determined; moreover, deteriorations caused by illumination, like low light, blinding, shadow or motion blur, are also to be taken into account in the future. However, it is impractical to create and annotate further datasets for each of these individual classes and for all combinations of degradation.
US 2018/315167 A1 discloses an object recognition method, in which an object such as for example a vehicle can be recognized from an image captured by an on-board camera even if a lens of the on-board camera is contaminated. In order to achieve the aim, in this object recognition method for recognizing an object contained in a captured image, an original image containing the object to be recognized is created from the captured image, a processed image is generated from the created original image by applying predetermined processing to the original image, and a learning process with respect to the restoration of an image of the object to be recognized is performed using the original image and the processed image. EP 3657379 A1 discloses an image processing device with a neural network, which obtains a first image and a respective additional image, wherein a first image capturing device has another field of view than each additional image capturing device. Each image is processed by corresponding instances of a common feature extraction processing network to generate a corresponding first map and at least one additional map. The feature classification includes processing the first map to generate a first classification, which indicates a contamination of the first image. The or each additional map is processed to generate at least one additional classification, which indicates a corresponding contamination of the or each additional image. The first and each additional classification are combined to generate an enhanced classification, which indicates a contamination of the first image. If the enhanced classification does not indicate a contamination of the first image capturing device, further processing can be performed.
It is the object of the present invention to provide a method, a computer program product, a computer-readable storage medium as well as an assistance system, by which an improved determination of a degradation degree of a camera can be performed.
This object is solved by a method, a computer program product, a computer-readable storage medium as well as an assistance system according to the independent claims. Advantageous forms of configuration are specified in the dependent claims.
An aspect of the invention relates to a method for determining a degradation degree of an image captured by a camera of an assistance system of a motor vehicle by the assistance system. Capturing the image by the camera and performing a deep feature extraction of a plurality of pixels of the image by an encoding module of an electronic computing device of the assistance system are effected. The plurality of pixels is clustered by a feature point cluster module of the electronic computing device. The clustered pixels are regressed by a regression module of the electronic computing device. Determining the degradation degree depending on an evaluation by applying a sigmoid function after the regression by a sigmoid function module of the electronic computing device as an output of the sigmoid function module is effected.
Thus, an improved determination of the degradation degree can in particular be performed. In particular, the degradation degree can for example be between 0 and 1, wherein 0 can indicate that a degradation is not present, thus that the lens is clear and clean, respectively, and 1 can signify that the lens is contaminated. Thus, the closer the degradation degree is to 0, the cleaner the lens is. This is in particular independent of a classification of the contamination. The degradation is only determined in general terms, and the image together with the corresponding degradation degree can then in turn be used further, for example to decide whether an evaluation of this image may be used for a driving function.
Thus, the invention in particular solves the problem that a corresponding degradation degree can nevertheless be determined independently of a classification and thus also independently of a plurality of datasets for a contamination. Thus, the assistance system can be trained simply, wherein only a low number of datasets is required for this.
Thus, the degree of degradation is in particular estimated between 0 and 1 by the sigmoid function, wherein 0 means clean and 1 opaque. The two above-mentioned cases are in particular extremes, for which corresponding annotations are known. Due to the presence of the regression module, the quality of the degradation is determined in the continuous range between 0 and 1.
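Expressed in standard notation (the textbook definition of the sigmoid, not a formula reproduced from the patent), the regression output x is squashed into the open interval (0, 1):

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\lim_{x \to -\infty} \sigma(x) = 0 \;(\text{clean}), \qquad
\lim_{x \to +\infty} \sigma(x) = 1 \;(\text{opaque})
```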
According to an advantageous form of configuration, the encoding module is provided as a convolutional neural network. The encoding module can also be referred to as encoder. In particular, a simple convolutional neural network (CNN) is used to extract deep features from the corresponding input images. If sufficient memory and computational resources are available, a neural network with a higher capacity can also be used. Thus, a feature extraction can in particular be reliably performed.
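As a rough illustration, such a minimal encoder could look as follows in PyTorch; the layer count, channel widths and input size are assumptions for the sketch, not values taken from the patent:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Simple CNN that turns an image into a grid of deep feature points."""
    def __init__(self, in_channels: int = 3, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(image)             # (B, C, H/8, W/8)
        # Flatten spatial positions into a sequence of deep feature points.
        return fmap.flatten(2).transpose(1, 2)  # (B, H/8 * W/8, C)

feats = Encoder()(torch.randn(1, 3, 128, 128))  # -> (1, 256, 64)
```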
It has further proven advantageous if the plurality of pixels is clustered by a K-means algorithm of the feature point cluster module. Since annotations are only available for the two extreme cases, namely clean and opaque, the input image will most likely have a varying deterioration in the range between 0 and 1 at inference time. In order to learn this variation, the K-means clustering method is applied with a number of n clusters, wherein n can for example be 5. It is advantageous to first separate the deep feature points into n clusters; later, each of these clusters is passed through a regression module. The K-means algorithm is in particular a method for vector quantization, which is also used for cluster analysis. Therein, a previously known number of k groups is formed from a set of similar objects. It is further advantageous if the clustered pixels are regressed by a regression module formed as a long short term memory module (LSTM). The presented problem is in particular not a classification problem, but a regression problem. The output of the K-means clustering is nothing else than a series of deep feature points separated into clusters, wherefore the LSTM module is particularly advantageous for the regression. The core concept of such an LSTM module is the cell state and its different gates. The cell state acts like a memory of the network and can carry relevant information throughout the processing of the sequence. The gates are different neural networks, which decide which information is allowed onto the cell state. During training, the gates can learn which information is relevant, in order to retain or forget it. Each gate contains sigmoidal activations. This is helpful for updating or forgetting data, since each number multiplied by 0 is 0, whereby values disappear or are "forgotten", and each number multiplied by 1 keeps the same value, whereby it is "retained". In such an LSTM module, the so-called forget gate in particular comes first. This gate decides which information is to be discarded or retained. Information from the previous hidden state and information from the current input are passed through the sigmoid function, which yields values between 0 and 1. The closer to 0, the more is forgotten; the closer to 1, the more is retained.
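A minimal sketch of the clustering step, assuming scikit-learn's KMeans and the exemplary n = 5 from the text; the feature points are assumed to come from an encoder like the one sketched above:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_feature_points(feats: np.ndarray, n_clusters: int = 5):
    """Group deep feature points into n clusters (unsupervised).

    feats: (num_points, feat_dim) array of deep feature points.
    Returns one array per cluster; each cluster would later be passed
    through the regression module.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    return [feats[labels == k] for k in range(n_clusters)]

points = np.random.rand(256, 64).astype(np.float32)
clusters = cluster_feature_points(points)
print([c.shape for c in clusters])
```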
In a further advantageous form of configuration, the clustered pixels are unidirectionally transferred from the feature point cluster module to the regression module. In particular, the connection from the encoder to the LSTM module via the K-means algorithm is unidirectional and is only used for the forward pass, since unsupervised K-means clustering is used.
Further, it has proven advantageous if the regressed pixels are backpropagated to the encoding device. In particular, in order to generate an unconnected graph, the backpropagation is effected via a separate connection from the LSTM module to the encoder. It is also advised against using a supervised K-means and making the connection bidirectional, since the clustering has only few trainable parameters, which are not sufficient for the backpropagation, because the gradient flow passes through a great number of trainable parameters of the LSTM module.
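In an autograd framework, this forward-only clustering with a separate gradient path back to the encoder could be realized as follows; this is one plausible reading of the described behaviour, not code from the patent:

```python
import torch
from sklearn.cluster import KMeans

def forward_with_detached_kmeans(feats: torch.Tensor, n_clusters: int = 5):
    """feats: (num_points, feat_dim), requires_grad=True from the encoder.

    K-means only decides the ordering of the points and contributes no
    gradient. Gradients reach the encoder through the gathered features
    themselves, i.e. via a separate connection around the clustering.
    """
    with torch.no_grad():  # forward-only K-means, outside the graph
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
            feats.detach().cpu().numpy())
    labels = torch.as_tensor(labels, device=feats.device)
    order = torch.argsort(labels)   # group points by cluster index
    return feats[order]             # still differentiable w.r.t. feats

feats = torch.randn(256, 64, requires_grad=True)
out = forward_with_detached_kmeans(feats)
out.sum().backward()                # gradient flows back to the features
print(feats.grad.shape)             # torch.Size([256, 64])
```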
In a further advantageous form of configuration, a sigmoid loss is trained in a first training phase of the assistance system for applying the sigmoid function. Here, the sigmoid is an advantageous function for obtaining values in the range between 0 and 1. The reason for this is that a continuous annotation is not present, but only annotations for 0 and 1. Thus, in the first phase, the sigmoidal loss function is trained first, which estimates the quality of the perception deterioration, thus the degradation, from an input image. However, this flow cannot output the quality of the degradation per pixel; therefore, a functionality is added, which generates an output in the shape of the input image.
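A hedged sketch of such a first training phase, assuming a binary cross-entropy loss on the sigmoid output and a `pipeline` module standing in for the whole encoder-clustering-LSTM-attention-sigmoid chain (both are assumptions, not details from the patent):

```python
import torch
import torch.nn as nn

# Phase 1 (sketch): train the whole pipeline end-to-end on the two annotated
# extremes only, label 0.0 (clean) and 1.0 (opaque).
bce = nn.BCELoss()

def phase1_step(pipeline: nn.Module, images: torch.Tensor,
                labels: torch.Tensor, opt: torch.optim.Optimizer) -> float:
    opt.zero_grad()
    degree = pipeline(images)   # sigmoid output in (0, 1), same shape as labels
    loss = bce(degree, labels)  # "sigmoid loss" on the 0/1 annotations
    loss.backward()
    opt.step()
    return loss.item()
```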
It has further proven advantageous if the regressed pixels are discriminated by a self-attention module of the electronic computing device and the discriminated pixels are transferred to the sigmoid function module. The self-attention module is in particular a so-called attention map. A so-called "self-attention" can in particular be performed by the attention map. This is in particular an advantageous step to discriminate the output of the LSTM.
It is also advantageous if the self-attention module is provided in the form of a global averaging. The global averaging is in particular a so-called global average pooling (GAP).
It is applied to the output signal of the LSTM and multiplied by a one-dimensional vector.
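One plausible implementation of this GAP-based self-attention; the patent names only the pooling and a multiplication by a one-dimensional vector, so the softmax weighting below is an assumption:

```python
import torch

def gap_self_attention(lstm_out: torch.Tensor) -> torch.Tensor:
    """lstm_out: (batch, seq_len, hidden). Global-average-pool over the
    sequence, then reweight each time step against the pooled vector,
    a cheap 'self-attention' that discriminates the LSTM output."""
    pooled = lstm_out.mean(dim=1)                         # GAP -> (B, hidden)
    scores = torch.softmax(
        (lstm_out * pooled.unsqueeze(1)).sum(-1), dim=1)  # (B, seq_len)
    return (lstm_out * scores.unsqueeze(-1)).sum(dim=1)   # (B, hidden)

out = gap_self_attention(torch.randn(2, 256, 64))
print(out.shape)                                          # torch.Size([2, 64])
```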
In a further advantageous form of configuration, the output of the sigmoid function module is transferred to a decoding device of the electronic computing device for decoding the output. Thus, a decoder is in particular added, which adopts the output of the sigmoid function module in the one-dimensional format. In particular, this one-dimensional vector is converted into a two-dimensional representation and then in turn supplied to the decoder for reconstruction. This reconstruction can then in turn be transferred to a superordinate assistance system, which can then in turn decide, based on the reconstruction, whether or not the image of the camera can be used for evaluation. If, for example, the image cannot be used for evaluation, an alarm signal can be generated for a user of the motor vehicle.
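A sketch of that conversion and reconstruction, with an assumed vector length of 256 reshaped to a 16 x 16 map; neither dimension is fixed by the patent:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Reshape the 1-D output vector into a 2-D map and reconstruct an
    image-shaped, per-pixel degradation map from it."""
    def __init__(self, vec_len: int = 256, out_channels: int = 1):
        super().__init__()
        self.side = int(vec_len ** 0.5)  # 256 -> 16 x 16
        self.up = nn.Sequential(
            nn.ConvTranspose2d(1, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, out_channels, 4, stride=2, padding=1),
            nn.Sigmoid(),                # per-pixel degree in (0, 1)
        )

    def forward(self, vec: torch.Tensor) -> torch.Tensor:
        grid = vec.view(-1, 1, self.side, self.side)  # 1-D -> 2-D
        return self.up(grid)                          # (B, 1, 64, 64)

print(Decoder()(torch.randn(2, 256)).shape)  # torch.Size([2, 1, 64, 64])
```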
Further, it can also be provided that the credibility of the image is downgraded for an at least partially automated operation of the motor vehicle if a corresponding value of the degradation is present. Thus, a safe operation of the motor vehicle can be realized.
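As an illustration, such a credibility downgrade could be gated on the estimated degree; both thresholds below are purely hypothetical values, not taken from the patent:

```python
def gate_driving_function(degradation_degree: float,
                          warn_threshold: float = 0.5,
                          block_threshold: float = 0.8) -> str:
    """Downgrade the credibility of a camera image based on its
    degradation degree; the thresholds are hypothetical."""
    if degradation_degree >= block_threshold:
        return "discard image, raise alarm for the driver"
    if degradation_degree >= warn_threshold:
        return "downgrade credibility for automated driving"
    return "use image for evaluation"

print(gate_driving_function(0.73))
```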
In a further advantageous form of configuration, the decoding device is provided in the form of a fully convolutional neural network, in particular a so-called fully convolutional network (FCN). It has turned out that this is very advantageous for further processing the output of the sigmoid function module.
In a further advantageous form of configuration, the decoding device is trained in a second training phase of the assistance system, wherein exclusively the decoding device is trained in the second training phase, which is after the first training phase in time. Thus, the pre-trained model from the first phase is in particular reused. Herein, two problems are solved: an estimation of the degradation degree is effected in the first phase, and a reconstruction of the output is effected in the second phase. Therefore, only the decoder is trained in the second phase; the remaining components of the assistance system, or of the so-called "pipeline", remain frozen. Otherwise, the gradient of the decoder would destroy most of the trained weights, including those of the encoder, within the first epochs. Therefore, the components used in the first training phase serve only as feature extractors in the second phase.
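A sketch of the second phase, assuming PyTorch modules; freezing via `requires_grad_` is one common way to keep the phase-1 components as pure feature extractors:

```python
import torch
import torch.nn as nn

def prepare_phase2(pipeline: nn.Module,
                   decoder: nn.Module) -> torch.optim.Optimizer:
    """Phase 2 (sketch): freeze everything trained in phase 1 so the
    encoder/clustering/LSTM path only acts as a feature extractor,
    then train the decoder alone."""
    for p in pipeline.parameters():
        p.requires_grad_(False)  # phase-1 weights stay fixed
    pipeline.eval()
    return torch.optim.Adam(decoder.parameters(), lr=1e-4)  # decoder only
```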
The presented method is in particular a computer-implemented method. Therein, the method is in particular performed on an electronic computing device, wherein the electronic computing device can in particular comprise circuits, for example integrated circuits, processors and further electronic components to perform the corresponding method steps.
Therefore, a further aspect of the invention relates to a computer program product with program code means, which are stored in a computer-readable storage medium, to perform the method for determining a degradation degree according to the preceding aspect, when the computer program product is executed on a processor of an electronic computing device.
A still further aspect of the invention relates to a computer-readable storage medium with a computer program product, in particular an electronic computing device with a computer program product, according to the preceding aspect.
A still further aspect of the invention relates to an assistance system for determining a degradation degree of an image captured by a camera of a motor vehicle, with at least one camera and with an electronic computing device, wherein the assistance system is formed for performing a method according to the preceding aspect. In particular, the method is performed by the assistance system. A still further aspect of the invention relates to a motor vehicle with an assistance system according to the preceding aspect. In particular, the motor vehicle is formed as an at least partially autonomous, in particular as a fully autonomous, motor vehicle.
Advantageous forms of configuration of the method are to be regarded as advantageous forms of configuration of the computer program product, of the computer-readable storage medium, of the assistance system as well as of the motor vehicle. Thereto, the assistance system as well as the motor vehicle comprise concrete features, which allow performing the method or an advantageous form of configuration thereof.
Further features are apparent from the claims, the figures and the description of figures. The features and feature combinations mentioned above in the description as well as the features and feature combinations mentioned below in the description of figures and/or shown in the figures alone are usable not only in the respectively specified combination, but also in other combinations without departing from the scope of the invention. Thus, implementations are also to be considered as encompassed and disclosed by the invention, which are not explicitly shown in the figures and explained, but arise from and can be generated by separated feature combinations from the explained implementations. Implementations and feature combinations are also to be considered as disclosed, which thus do not comprise all of the features of an originally formulated independent claim. Moreover, implementations and feature combinations are to be considered as disclosed, in particular by the implementations set out above, which extend beyond or deviate from the feature combinations set out in the relations of the claims.
Now, the invention is explained in more detail based on preferred embodiments as well as with reference to the attached drawings.
The figures show:
Fig. 1 a schematic top view of a motor vehicle with an embodiment of an assistance system;

Fig. 2 a schematic block diagram of an embodiment of an assistance system; and

Fig. 3 a schematic view of an embodiment of a regression module of an embodiment of an electronic computing device of an embodiment of the assistance system.
In the figures, identical or functionally identical elements are provided with the same reference characters.
Fig. 1 shows a schematic top view of an embodiment of a motor vehicle 1 with an embodiment of an assistance system 2. The assistance system 2 is formed for determining a degradation degree 3 (Fig. 2) of an image 5 captured by a camera 4 of the motor vehicle 1. Hereto, the assistance system 2 in particular comprises the camera 4 as well as an electronic computing device 6. In particular, an environment 7 of the motor vehicle 1 can be captured by the camera 4. Presently, the motor vehicle 1 is in particular an at least partially autonomous, in particular a fully autonomous, motor vehicle 1. Presently, the assistance system 2 can be formed only for determining the degradation degree 3. In addition, the assistance system 2 can also be formed for at least partially autonomous or fully autonomous operation of the motor vehicle 1. Hereto, the assistance system 2 can for example perform interventions in a steering and acceleration device of the motor vehicle 1. Based on the captured environment 7, the assistance system 2 can then in particular generate corresponding control signals for the steering and braking interventions, respectively.
Fig. 2 shows a schematic block diagram of an embodiment of the assistance system 2, in particular of the electronic computing device 6 of the assistance system 2.
In the method for determining the degradation degree 3, the image 5 is captured by the camera 4. A deep feature extraction of a plurality of pixels 8 of the image 5 is performed by an encoding module 9 of the electronic computing device 6. The encoding module 9 can also be referred to as encoder. The plurality of pixels 8 is clustered by a feature point cluster module 10 of the electronic computing device 6. The clustered pixels 8 are regressed by a regression module 11 of the electronic computing device 6. Then, the degradation degree 3 is determined depending on an evaluation by applying a sigmoid function 20 (Fig. 3) after the regression by a sigmoid function module 12 of the electronic computing device 6, as the output of the sigmoid function module 12. Fig. 2 further shows that a sigmoid loss 13 is trained in a first training phase of the assistance system 2 for applying the sigmoid function 20. Further, the assistance system 2 or the electronic computing device 6 comprises a self-attention module 14, by which the regressed pixels 8 are discriminated and the discriminated pixels 8 are transferred to the sigmoid function module 12. The self-attention module 14 is in particular provided in the form of a global averaging (global average pooling).
Fig. 2 further shows that the encoding module 9 is provided as a convolutional neural network. Further, the plurality of pixels 8 is clustered by a K-means algorithm 15 of the feature point cluster module 10. The clustered pixels 8 are in turn regressed by the regression module 11, which is formed as a long short term memory module 16. Further, the clustered pixels 8 are transferred unidirectionally, which is represented by the arrows 17, from the feature point cluster module 10 to the regression module 11. The pixels 8 are in turn transferred bidirectionally from the regression module 11 to the self-attention module 14, from the self-attention module 14 to the sigmoid function module 12, and bidirectionally from the sigmoid function module 12 to the sigmoid loss 13. This is in particular represented by the arrows 18. Further, the passage of the degradation degree 3 to a decoding device 19 of the assistance system 2 is also effected bidirectionally. Thus, the output of the sigmoid function module 12 is in particular passed to the decoding device 19 for decoding the output. Therein, the decoding device 19 can in particular be provided in the form of a fully convolutional neural network. The decoding device 19 is in particular trained in a second training phase of the assistance system 2, wherein exclusively the decoding device 19 is trained in the second training phase, which is after the first training phase in time.
Fig. 3 shows a schematic block diagram of an embodiment of the regression module 11 according to Fig. 2. Presently, it is in particular a long short term memory module 16 (LSTM). The long short term memory module 16 presently comprises at least three sigmoid functions 20 as well as at least two tanh functions 21. Further, the long short term memory module 16 comprises at least two pointwise multiplications 22 as well as two pointwise additions 23. The results of the regression are in turn backpropagated to the encoding device 9, which is in particular represented by the arrows 24.
In particular, such a long short term memory module 16 is suitable since the output of the K-means algorithm 15 is a series of deep feature points separated into clusters, such that the task is a regression problem. The core concept of the long short term memory module 16 is the cell state and its different gates. The cell state acts like a memory of the network and can carry relevant information throughout the processing of the sequence. The gates are different neural networks, which decide which information is allowed onto the cell state. The gates can learn during the training which information is relevant, in order to retain or forget it. Each gate contains a sigmoidal activation. This is helpful for updating or forgetting data, since each number multiplied by 0 is 0, whereby values disappear or are "forgotten", and each number multiplied by 1 keeps the same value, whereby it is "retained". In the long short term memory module 16, the forget gate comes first. This gate decides which information is to be discarded or retained. Information from the previous hidden state and information from the current input are passed through the sigmoid function 20, which yields values between 0 and 1. The closer to 0, the more is forgotten; the closer to 1, the more is retained.
Next, the input gate follows, in which the previous hidden state and the current input are first passed to a sigmoid function 20, which decides which values are updated by transforming the values such that they lie between 0 and 1. The hidden state and the current input are also passed to the tanh function 21 in order to "squeeze" the values to between -1 and 1 and thereby regulate the network. Afterwards, the outputs of both are multiplied; the sigmoid output decides which information from the tanh output is important to retain.
For calculating the new cell state, the cell state is first pointwise multiplied by the forget vector. Then, the output of the input gate is used and a pointwise addition 23 is performed, which updates the cell state to new values, which the neural network regards as relevant.
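The gate computations described here, together with the output gate described next, follow the standard LSTM equations (textbook form, not quoted from the patent); σ denotes the sigmoid function 20, tanh the function 21, and ⊙ the pointwise multiplication 22:

```latex
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f)          && \text{forget gate}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i)          && \text{input gate}\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c)   && \text{candidate cell state}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o)          && \text{output gate}\\
h_t &= o_t \odot \tanh(c_t)                      && \text{next hidden state}
\end{aligned}
```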
An output gate then decides what is to be the next hidden state. The hidden state contains information about the previous inputs; this state is also used for predictions. It is to be noted that the connection from the encoder to the LSTM module via K-means is unidirectional and is only used for the forward pass, since unsupervised K-means clustering is used. In order to generate an unconnected graph, the backpropagation 24 is effected via a separate connection from the regression module 11 to the encoding device 9. It is also advised against using a supervised K-means and making the connection bidirectional, since the clustering only has few trainable parameters, which are not sufficient for the backpropagation 24, because the gradient flow passes through a great number of trainable parameters of the LSTM. Further, temporal consistency represents an important aspect in algorithms based on image processing, in order to keep the system performance constant. The solution proposed in the figures can be extended by a further encoding device such that both encoders receive successive image frames of the video sequence. This is advantageous for the assistance system 2 to output predictions without flicker effect.
Overall, a semi-supervised algorithm is thus proposed, which estimates the degradation degree 3 on a per-pixel basis from the input image. The assistance system 2 is flexible in that it does not require a hard annotation, which is cumbersome and very cost-intensive. The assistance system 2 can in particular be used for at least partially autonomous driving of the motor vehicle 1.

Claims

Claims
1. A method for determining a degradation degree (3) of an image (5) captured by a camera (4) of an assistance system (2) of a motor vehicle (1) by the assistance system (2), comprising the steps of: - capturing the image (5) by the camera (4);
- performing a deep feature extraction of a plurality of pixels (8) of the image (5) by an encoding module (9) of an electronic computing device (6) of the assistance system (2);
- clustering the plurality of pixels (8) by a feature point cluster module (10) of the electronic computing device (6);
- regressing the clustered pixels (8) by a regression module (11) of the electronic computing device (6); and
- determining the degradation degree (3) depending on an evaluation by applying a sigmoid function (20) after the regression by a sigmoid function module (12) of the electronic computing device (6) as an output of the sigmoid function module (12).
2. The method according to claim 1, characterized in that the encoding module (9) is provided as a convolutional neural network.
3. The method according to claim 1 or 2, characterized in that the plurality of pixels (8) is clustered by a K-means algorithm (15) of the feature point cluster module (10).
4. The method according to any one of the preceding claims, characterized in that the clustered pixels (8) are regressed by a regression module (11) formed as a long short term memory module (16).
5. The method according to any one of the preceding claims, characterized in that the clustered pixels (8) are unidirectionally transferred at least from the feature point cluster module (10) to the regression module (11).
6. The method according to any one of the preceding claims, characterized in that the regressed pixels (8) are backpropagated to the encoding module (9).
7. The method according to any one of the preceding claims, characterized in that a sigmoid loss (13) is trained in a first training phase for the assistance system (2) for applying the sigmoid function (20).
8. The method according to any one of the preceding claims, characterized in that the regressed pixels (8) are discriminated by a self-attention module (14) of the electronic computing device (6) and the discriminated pixels (8) are transferred to the sigmoid function module (12).
9. The method according to claim 8, characterized in that the self-attention module (14) is provided in the form of a global averaging.
10. The method according to any one of the preceding claims, characterized in that the output of the sigmoid function module (12) is transferred to a decoding device (19) of the electronic computing device (6) for decoding the output.
11. The method according to claim 10, characterized in that the decoding device (19) is provided in the form of a fully convolutional neural network.
12. The method according to claim 10 or 11, characterized in that the decoding device (19) is trained in a second training phase for the assistance system (2), wherein exclusively the decoding device (19) is trained in the second training phase, which temporally follows the first training phase.
13. A computer program product with program code means, which, when the program code means are executed on an electronic computing device (6), cause it to perform a method according to any one of claims 1 to 12.
14. A computer-readable storage medium with at least one computer program product according to claim 13.
15. An assistance system (2) for determining a degradation degree (3) of an image (5) captured by a camera (4) of a motor vehicle (1), with at least one camera (4) and with an electronic computing device (6), wherein the assistance system (2) is formed for performing a method according to any one of claims 1 to 12.
PCT/EP2022/052939 2021-02-11 2022-02-08 Method for determining a degradation degree of a captured image, computer program product, computer-readable storage medium as well as assistance system WO2022171590A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102021103200.8 2021-02-11
DE102021103200.8A DE102021103200B3 (en) 2021-02-11 2021-02-11 Method for determining a degree of degradation of a recorded image, computer program product, computer-readable storage medium and assistance system

Publications (1)

Publication Number Publication Date
WO2022171590A1 (en)

Family

ID=80786370

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/052939 WO2022171590A1 (en) 2021-02-11 2022-02-08 Method for determining a degradation degree of a captured image, computer program product, computer-readable storage medium as well as assistance system

Country Status (2)

Country Link
DE (1) DE102021103200B3 (en)
WO (1) WO2022171590A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102022207416B3 (en) 2022-07-20 2023-10-05 Zf Friedrichshafen Ag Computer-implemented method for detecting occlusions of an imaging sensor
DE102022121781A1 (en) 2022-08-29 2024-02-29 Connaught Electronics Ltd. Computer vision based on thermal imaging in a vehicle

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9445057B2 (en) 2013-02-20 2016-09-13 Magna Electronics Inc. Vehicle vision system with dirt detection

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180315167A1 (en) 2015-11-06 2018-11-01 Clarion Co., Ltd. Object Detection Method and Object Detection System
EP3657379A1 (en) 2018-11-26 2020-05-27 Connaught Electronics Ltd. A neural network image processing apparatus for detecting soiling of an image capturing device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MATHILDE CARON ET AL: "Deep Clustering for Unsupervised Learning of Visual Features", ARXIV.ORG, 18 March 2019 (2019-03-18), XP081122377, Retrieved from the Internet <URL:https://arxiv.org/pdf/1807.05520.pdf> *
RUI QIAN ET AL: "Attentive Generative Adversarial Network for Raindrop Removal from a Single Image", ARXIV.ORG, 6 May 2018 (2018-05-06), XP081316577, Retrieved from the Internet <URL:https://arxiv.org/pdf/1711.10098.pdf> *
SUN JIAHAO ET AL: "An Introductory Survey on Attention Mechanisms in Computer Vision Problems", 2020 6TH INTERNATIONAL CONFERENCE ON BIG DATA AND INFORMATION ANALYTICS (BIGDIA), IEEE, 4 December 2020 (2020-12-04), pages 295 - 300, XP033893246 *
URICAR MICHAL ET AL: "SoilingNet: Soiling Detection on Automotive Surround-View Cameras", 2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), IEEE, 27 October 2019 (2019-10-27), pages 67 - 72, XP033668555 *

Also Published As

Publication number Publication date
DE102021103200B3 (en) 2022-06-23

Similar Documents

Publication Publication Date Title
WO2022000426A1 (en) Method and system for segmenting moving target on basis of twin deep neural network
CN108062562B (en) Object re-recognition method and device
WO2022171590A1 (en) Method for determining a degradation degree of a captured image, computer program product, computer-readable storage medium as well as assistance system
CN107977638B (en) Video monitoring alarm method, device, computer equipment and storage medium
CN112507990A (en) Video time-space feature learning and extracting method, device, equipment and storage medium
Akilan et al. sEnDec: an improved image to image CNN for foreground localization
US11574500B2 (en) Real-time facial landmark detection
Hsu et al. Learning to tell brake and turn signals in videos using cnn-lstm structure
US11804026B2 (en) Device and a method for processing data sequences using a convolutional neural network
Rekabdar et al. Dilated convolutional neural network for predicting driver's activity
WO2020000382A1 (en) Motion-based object detection method, object detection apparatus and electronic device
CN113065645A (en) Twin attention network, image processing method and device
Gesnouin et al. TrouSPI-Net: Spatio-temporal attention on parallel atrous convolutions and U-GRUs for skeletal pedestrian crossing prediction
CN115601403A (en) Event camera optical flow estimation method and device based on self-attention mechanism
CN113158905A (en) Pedestrian re-identification method based on attention mechanism
CN110705564A (en) Image recognition method and device
CN113657200A (en) Video behavior action identification method and system based on mask R-CNN
Anees et al. Deep learning framework for density estimation of crowd videos
CN110121055B (en) Method and apparatus for object recognition
CN113011395B (en) Single-stage dynamic pose recognition method and device and terminal equipment
CN117561540A (en) System and method for performing computer vision tasks using a sequence of frames
US11816181B2 (en) Blur classification and blur map estimation
CN114463810A (en) Training method and device for face recognition model
EP4002270A1 (en) Image recognition evaluation program, image recognition evaluation method, evaluation device, and evaluation system
JP2021064343A (en) Behavior recognition device, behavior recognition method, and information generation device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22708055

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22708055

Country of ref document: EP

Kind code of ref document: A1