CN114943265A - Method and system for detecting a line-of-sight obstruction of an imaging sensor, and training method - Google Patents
Method and system for detecting a line-of-sight obstruction of an imaging sensor, and training method
- Publication number
- CN114943265A (application CN202210120381.1A)
- Authority
- CN
- China
- Prior art keywords
- scene
- imaging sensor
- data set
- pixel
- computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N20/00 — Machine learning
- G06N3/02 — Neural networks; G06N3/08 — Learning methods
- G06V10/26 — Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
Abstract
The invention relates to a computer-implemented method and a system for detecting line-of-sight obstructions, in particular occlusions (20), of an imaging sensor, and to a training method. The method comprises the following steps: comparing (S5) a first data set (DS1) with a second data set (DS2) representing a reference topology (18) of the scene (12a), in particular the occurrence of scene elements (14a, 14b, 14c, 14d) and/or the arrangement of these scene elements in the scene (12a); and detecting (S6) a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10) if a deviation (A) of the classified scene elements (14a, 14b, 14c, 14d) of the first data set (DS1) from the classified scene elements (14a, 14b, 14c, 14d) of the second data set (DS2) exceeds a predefined threshold value (SW). The invention also relates to a computer program and a computer-readable data carrier.
Description
Technical Field
The invention relates to a computer-implemented method for detecting a line-of-sight obstruction of an imaging sensor.
The invention also relates to a system for detecting a line of sight obstruction of an imaging sensor.
The invention also relates to a computer-implemented method for providing a trained machine learning algorithm to classify static semantic scene elements of a sequence of single images recorded by an imaging sensor.
Background
In order to ensure the functional capability of camera-based systems, such as monitoring systems or driver assistance systems, methods are being developed which automatically evaluate this capability and, in particular, detect tampering with the camera.
Camera tampering means, for example, that regions of the scene are no longer sufficiently visible due to: occlusion by objects in front of the camera, such as tape, dirt on the camera or an over-painted camera; damage to the lens/objective in the form of scratches; defocusing of the lens/objective, leading to image blur; glare on the camera from light sources; and/or translational and/or rotational deviation from the intended camera mounting position.
Camera and scene tampering are specific to a particular application scenario, but the task can also be framed more generally as anomaly detection, i.e. detecting any systematic deviation from what is expected.
The known methods can basically be divided into two categories.
The first category, image-feature-based methods, comprises methods based directly on image features and content. A reference model is formed from older images, i.e. images of the normal state. The reference model may describe, for example, color values, statistics of gradients, or the appearance of features at salient locations. Camera tampering is detected as a significant deviation from the reference model.
Instead of manually specifying how the reference model is computed and how deviations/camera tampering are recognized from it, a machine learning method, i.e. the second category, can also be used. For this purpose, a data set with tampered and untampered data is recorded or generated manually. A machine learning method is trained to distinguish tampered from untampered data. The output may also be made pixel by pixel in order to determine tampered image regions (e.g. due to occlusion).
The detection of a tampered/abnormal situation should be achieved with high detection accuracy and high robustness. In particular, image-feature-based methods are sensitive to variations in lighting conditions and shadows, as well as to objects and people in front of the camera.
In the field of vehicle interior monitoring, changing lighting conditions and shadows arise while the vehicle is travelling and, even without tampering, cause significant changes in image intensity and features. Furthermore, people and objects are close to the camera and may occupy a large area of the image. A person entering the vehicle causes a significant change in the image content, but this change should not be detected as tampering.
A weakness of the existing methods is the difficulty, or outright inability, to explain why tampering was detected. Machine learning methods directly determine whether tampering exists or which regions have been tampered with. However, this determination cannot easily be interpreted or traced back, for example, to the absence of a particular image feature.
Existing machine learning methods require training data with and without tampering. This represents additional expense and a possible vulnerability if the data set is not representative. If the data set contains, for example, only line-of-sight obstructions in the form of occlusions by specific objects, generalization to arbitrary object types cannot be ensured.
Other functions (monitoring, driver assistance, etc.) are usually implemented on the basis of the camera images and require computation time of their own. Tamper recognition is used to monitor whether these functions remain ensured. Long computation times spent solely on tamper recognition, without reuse of the result, should therefore be avoided.
The invention is therefore based on the object of specifying an improved method and an improved system for detecting line-of-sight obstructions of an imaging sensor.
This object is achieved by a computer-implemented method for detecting a line-of-sight obstruction of an imaging sensor having the features of patent claim 1.
The object is also achieved by a method for providing a trained machine learning algorithm for classifying static semantic scene elements of a sequence of single images recorded by an imaging sensor, having the features of patent claim 9.
The object is also achieved by a system for detecting a line-of-sight obstruction of an imaging sensor having the features of patent claim 13.
The object is also achieved by a computer program having the features of patent claim 14 and by a computer-readable data carrier having the features of patent claim 15.
Disclosure of Invention
The present invention provides a computer-implemented method for detecting line of sight obstructions, particularly occlusions, of an imaging sensor.
The method comprises the following steps: image data of a sequence of individual images recorded by an imaging sensor is provided.
The method further comprises the following steps: image data is received by a machine learning algorithm that classifies scenes contained in respective single images into a plurality of static semantic scene elements and outputs a first data set representing a plurality of determined classes.
The method further comprises the following steps: the first data set is compared with a second data set representing a reference topology of the scene, in particular the occurrence of scene elements and/or the arrangement of these scene elements in the scene.
The method further comprises the following steps: if the deviation of the classified scene elements of the first data set from the classified scene elements of the second data set exceeds a predefined threshold value, a line-of-sight obstruction, in particular an occlusion, of the imaging sensor is detected.
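Expressed in code, these four steps can be sketched as follows. This is a minimal illustration under assumptions, since the patent prescribes no concrete implementation: `segment` stands in for the machine learning algorithm, `reference` for the second data set (DS2), and the mean pixel mismatch with threshold `THRESHOLD_SW` is one possible deviation measure.

```python
# Minimal sketch of the claimed steps; all identifiers are illustrative assumptions.
import numpy as np

THRESHOLD_SW = 0.05  # assumed: tolerated fraction of deviating pixels

def detect_obstruction(frame: np.ndarray, segment, reference: np.ndarray) -> bool:
    """segment: trained algorithm mapping an image to a per-pixel class map (DS1);
    reference: per-pixel class map of the reference topology (DS2)."""
    class_map = segment(frame)                   # receive/classify/output (DS1)
    deviation = np.mean(class_map != reference)  # compare DS1 with DS2
    return bool(deviation > THRESHOLD_SW)        # detect obstruction
```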
The invention also provides a computer-implemented method for providing a trained machine learning algorithm to classify static semantic scene elements of a sequence of single images recorded by an imaging sensor.
The method comprises the following steps: a first training data set of image data of a sequence of single images recorded by an imaging sensor is received.
The method further comprises the following steps: a second training data set of classified image data is received, wherein scenes contained in respective single images are classified into a plurality of static semantic scene elements.
The method further comprises the following steps: the machine learning algorithm is trained by an optimization algorithm that computes extrema of a loss function for classifying static semantic scene elements contained in the image data.
The invention also provides a system for detecting line of sight obstructions, particularly occlusions, of an imaging sensor.
The system comprises: means for providing image data of a sequence of individual images recorded by the imaging sensor.
The system further comprises: means for receiving image data by a machine learning algorithm that classifies scenes contained in respective single images into a plurality of static semantic scene elements and outputs a first data set representing a plurality of determined classes.
The system further comprises: means for comparing the first data set with a second data set representing a reference topology of the scene, in particular the occurrence of scene elements and/or the arrangement of these scene elements in the scene.
The system further comprises: means for detecting a line-of-sight obstruction, in particular an occlusion, of the imaging sensor if the deviation of the classified scene elements of the first data set from the classified scene elements of the second data set exceeds a predefined threshold value.
The invention also provides: a computer program having a program code for performing one of the methods according to the invention when the computer program is executed on a computer; and a computer-readable data carrier with a program code of a computer program for performing one of the methods according to the invention when the computer program is executed on a computer.
The idea of the invention is to combine machine learning methods for semantic segmentation with model-based knowledge in order to detect tampering.
The invention is designed for application in semantically static scene environments. Semantically static means that the important semantic scene elements in the camera image are at the same location and do not change over time. This holds, for example, in vehicle interiors with semantic elements such as seats, footwells, doors, windows, steering wheel, etc.
A semantic segmentation is trained by means of machine learning methods to detect the semantically static elements of a scene. For tamper recognition, the current detection of these semantic elements is compared with a reference model. In the case of significant tampering, the machine learning method can no longer detect the semantic elements, so that deviations from the reference model occur.
Changed illumination conditions and shadows significantly alter the image content and features but have no effect on the semantic meaning. For example, a seat continues to be classified as a seat regardless of whether there are light spots or shadows in the image. Machine learning methods, in particular neural networks, have been shown to achieve high robustness against varying lighting conditions.
With respect to people and objects in front of the camera, the semantic segmentation is trained to continue detecting the static semantic scene elements. For example, the class "seat" continues to be detected even if a person is sitting on it. This is achieved because the neural network incorporates the context of the scene and accordingly still produces the static semantic scene elements. This is a substantial distinction from image features, which generally correspond to a very local description of the image content.
In this respect it should be noted that the scene context is typically also lost in the case of systematic tampering, so that the static semantic scene elements can no longer be detected. In this way, the comparison with the reference model enables the detection of tampering.
For the reasons mentioned, the invention achieves a high robustness, especially with respect to varying illumination conditions and to objects and persons in the camera image.
This improvement concerns in particular image-feature-based methods. Moreover, the described invention offers an interpretability advantage over existing machine learning methods, which provide "black-box" detections without additional reasoning or explanation. The comparison of the semantic segmentation with a reference model, as proposed in the present invention, follows a model-based/probabilistic approach. This allows an explanation of why tampering was recognized and with what probability tampering is present.
The comparison between the reference model and the semantic segmentation also integrates the model knowledge that the semantics in this particular scene do not change over time, whereas an end-to-end (End2End) machine learning method must implicitly map the scene model of all possible scenes.
Further scene knowledge can be integrated, for example that the vehicle interior must contain two seats, two vehicle doors and two vehicle windows, depending on the viewing angle and the vehicle type. The machine learning method is trained to detect static semantic scene elements; training data containing tampering is not mandatory. This reduces the cost of generating the data set and prevents overfitting to the specific tampering present in a data set.
The semantic segmentation provides detections of semantic scene elements that can also be used for other functions. For example, the detection of the seating area serves a function that ascertains whether all occupants are properly seated. By reusing the detections of the machine learning method, the overall running time can be reduced compared with an additional method used only for tamper recognition. Combination with an image-feature-based approach can reduce the computation time further.
Advantageous embodiments and developments emerge from the dependent claims and from the description with reference to the figures.
According to a preferred embodiment, provision is made for: the second data set representing the reference topology of the scene comprises a predefined plurality of classified static semantic scene elements in a single image of the scene contained in the sequence of single images recorded by the imaging sensor, wherein these classified scene elements do not represent scene elements affected by a line-of-sight obstruction, in particular an occlusion, of the imaging sensor.
Thus, the second data set is advantageously a reference scene that can be compared with the first data set.
According to a further preferred embodiment, provision is made for: the machine learning algorithm classifies a plurality of individual images pixel by pixel in order to generate a pixel-wise discrete distribution over the classes; to detect a line-of-sight obstruction, in particular an occlusion, of the imaging sensor, the deviation of the classified scene elements of the first data set from the classified scene elements of the second data set is compared pixel by pixel; and, when a predefined threshold value is exceeded, the function provided using the sequence of individual images recorded by the imaging sensor is deactivated.
Such a function is, for example, the monitoring of the interior of a motor vehicle. The detection of a line-of-sight obstruction, in particular an occlusion or tampering, of the imaging sensor can thus cause the deactivation of the provided function.
According to a further preferred embodiment, provision is made for: the pixel-by-pixel comparison of the classified scene elements of the first data set with the classified scene elements of the second data set produces a difference image, wherein pixel-wise non-identity is represented by a first numerical value and pixel-wise identity by a second numerical value, and wherein a line-of-sight obstruction, in particular an occlusion, of the imaging sensor is detected when more than a predefined number of pixels are non-identical.
Thus, occlusions of the imaging sensor above a certain predefined size can be reliably detected.
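A compact NumPy sketch of this difference-image comparison is given below; the value convention (1 for non-identical pixels, 0 for identical ones) follows the embodiment, while the name `max_mismatch` and the integer threshold are assumptions.

```python
import numpy as np

def difference_image(ds1: np.ndarray, ds2: np.ndarray) -> np.ndarray:
    """Pixel-wise comparison of two class maps: 1 marks non-identical pixels,
    0 marks identical pixels (first and second numerical value)."""
    return (ds1 != ds2).astype(np.uint8)

def occlusion_from_difference(diff: np.ndarray, max_mismatch: int) -> bool:
    # An occlusion is reported once more than `max_mismatch` pixels differ.
    return int(diff.sum()) > max_mismatch
```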
According to a further preferred embodiment, provision is made for: a line-of-sight obstruction, in particular an occlusion, is detected if the pixel-wise probability of identity is low for at least a predefined number of pixels or at least a predefined region of the difference image, the difference image specifying the arrangement of the non-identical pixels in the scene.
Therefore, it is possible to explain not only whether a line-of-sight obstruction, in particular an occlusion, of the imaging sensor is present, but also in which region of the image it is present.
According to a further preferred embodiment, provision is made for: the topology of the difference image is given a graph-based representation, in which the nodes are the center points of the scene elements and the edges encode the arrangement and neighborhood of the scene elements, wherein a line-of-sight obstruction, in particular an occlusion, of the imaging sensor is detected if a predefined number of nodes and/or edges differ from the reference topology.
This alternative embodiment of the method therefore likewise enables an accurate detection of occlusions of the imaging sensor.
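A hedged sketch of this graph-based variant follows; `networkx` is an assumed implementation choice, and the element names and the simple counting criterion are illustrative only.

```python
import networkx as nx

def topology_graph(centers, neighborhoods):
    """centers: {element: (x, y) center point}; neighborhoods: edge list,
    e.g. [("seat_left", "door_left"), ...] encoding adjacency in the scene."""
    g = nx.Graph()
    for label, center in centers.items():
        g.add_node(label, center=center)
    g.add_edges_from(neighborhoods)
    return g

def topology_deviates(current: nx.Graph, reference: nx.Graph, max_diff: int) -> bool:
    # Nodes/edges present in the reference but missing from the current image
    # correspond to scene elements that could no longer be detected.
    missing_nodes = [n for n in reference.nodes if n not in current]
    missing_edges = [e for e in reference.edges if not current.has_edge(*e)]
    return len(missing_nodes) + len(missing_edges) > max_diff
```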
According to a further preferred embodiment, provision is made for: the machine learning algorithm is set up to detect a person and/or an object in the scene, wherein pixels belonging to the person and/or the object are detected as identical to the reference topology in case of detection of the person and/or the object.
It is thus advantageously achieved that the presence of people and/or objects does not lead to a false detection of occlusions.
According to a further preferred embodiment, provision is made for: the provided sequence of individual images recorded by the imaging sensor is generated in real time by the imaging sensor or extracted from stored video data, and an image-feature-based detection method, in particular another machine learning algorithm, is applied as a prefilter to the sequence of individual images, the prefilter being designed to detect a line-of-sight obstruction, in particular an occlusion, of the imaging sensor by determining a deviation of the image data from a reference model, in particular a deviation of the color values, of the gradients and/or of the appearance of image features, in particular SIFT features, at predetermined positions of the individual images.
Thus, the method can advantageously be applied to different sensor data. The use of a prefilter can further improve the detection accuracy of the method: only potentially critical situations then have to be analyzed by the vehicle interior segmentation, which also significantly reduces the computation time.
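As an illustration only, such a prefilter could be a cheap histogram check like the following sketch; OpenCV and the correlation threshold are assumptions, and the patent names color values, gradients and SIFT features merely as examples of usable image features.

```python
import cv2
import numpy as np

def prefilter_suspicious(frame_gray: np.ndarray, ref_hist: np.ndarray,
                         min_correlation: float = 0.5) -> bool:
    """Return True if the frame deviates enough from the reference model that
    the (more expensive) semantic segmentation should be run on it.
    ref_hist: normalized float32 histogram of an untampered reference image."""
    hist = cv2.calcHist([frame_gray], [0], None, [64], [0, 256])
    cv2.normalize(hist, hist)  # L2-normalize in place
    score = cv2.compareHist(ref_hist, hist, cv2.HISTCMP_CORREL)
    return score < min_correlation  # low similarity -> potentially tampered
```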
According to a further preferred embodiment, provision is made for: in training the machine learning algorithm, an extremum of another loss function for detecting people and/or objects in the scene is calculated. Thus, the presence of people and/or objects in the scene may be effectively detected.
According to a further preferred embodiment, provision is made for: upon detection of an occlusion of the imaging sensor, the loss of the optimization algorithm is adapted such that the machine learning algorithm outputs one predefined class or several predefined classes. Occlusions are thus assigned a dedicated class, which can simplify detection.
According to a further preferred embodiment, provision is made for: the machine learning algorithm is trained to perform a pixel-by-pixel classification of a scene contained in a respective single image into a plurality of static semantic scene elements, or to detect the plurality of static semantic scene elements if polygonal polylines or bounding boxes are used. Thus, depending on the field of application, algorithms suitable for detecting occlusion or tampering of the imaging sensor may be used.
The described embodiments and further embodiments can be combined with one another in any desired manner.
Other possible configurations, extensions and implementations of the invention also include combinations of features of the invention not explicitly mentioned above or described below with regard to the exemplary embodiments.
Drawings
The accompanying drawings should be included to provide a further understanding of embodiments of the invention. The drawings illustrate embodiments and, together with the description, serve to explain the principles and designs of the invention.
Further embodiments and several of the mentioned advantages emerge with reference to the figures. The elements shown in the drawings are not necessarily to scale relative to one another.
In the figures:
1-4 illustrate a flow diagram of a computer-implemented method for detecting line-of-sight obstructions, particularly occlusions, of an imaging sensor, in accordance with a preferred embodiment of the present invention;
FIG. 5 illustrates a flow diagram of a computer-implemented method for providing a trained machine learning algorithm for classifying static semantic scene elements of a sequence of single images recorded by an imaging sensor, in accordance with a preferred embodiment of the present invention; and
fig. 6 shows a schematic illustration of a system for detecting line-of-sight obstructions, in particular occlusions, of an imaging sensor according to a preferred embodiment of the invention.
In the drawings, the same reference numerals denote the same or functionally same elements, members, or components, unless otherwise specified.
Detailed Description
Fig. 1 shows a flow diagram of a computer-implemented method for detecting line-of-sight obstructions, in particular occlusions 20, of an imaging sensor 10.
The method comprises the following steps: image data BD of a sequence of individual images 12 recorded by the imaging sensor 10 is provided S1.
The method further comprises the following steps: the image data BD are received S2 by a machine learning algorithm A1, which classifies S3 the scene 12a contained in the respective single image 12 into a plurality of static semantic scene elements 14a, 14b, 14c, 14d and outputs S4 a first data set DS1 representing the plurality of determined classes 16a, 16b, 16c, 16d.
The method further comprises the following steps: the first data set DS1 is compared S5 with a second data set DS2 representing the reference topology 18 of the scene 12a, in particular the occurrence of the scene elements 14a, 14b, 14c, 14d and/or the arrangement of these scene elements in the scene 12a.
The second data set DS2, that is to say the semantic reference model, describes the expected semantic segmentation of the static scene when no tampering is present.
The method further comprises the following steps: a line-of-sight obstruction, in particular an occlusion 20, of the imaging sensor 10 is detected S6 if a deviation A of the classified scene elements 14a, 14b, 14c, 14d of the first data set DS1 from the classified scene elements 14a, 14b, 14c, 14d of the second data set DS2 exceeds a predefined threshold value SW. If the deviation A exceeds the predefined threshold value SW, the first evaluation J, i.e. "yes", results. If the deviation A is below the predefined threshold value SW, the second evaluation N, i.e. "no", results.
Instead of the occlusion 20, the line of sight obstruction may be, for example, camera tampering or deviation from an expected situation.
Camera tampering means, for example, that regions of the scene are no longer sufficiently visible due to: occlusion/blocking by objects in front of the camera, e.g. tape, dirt on the camera or an over-painted camera; damage to the lens/objective in the form of scratches; defocusing of the lens/objective, leading to image blur; camera glare; translational and/or rotational deviation from the intended camera mounting position; or systematic occlusion of areas in the scene, e.g. due to hangings.
The second data set DS2 representing the reference topology 18 of the scene 12a comprises a predefined plurality of classified static semantic scene elements 14a, 14b, 14c, 14d in a single image 12 of the scene 12a contained in the sequence of single images 12 recorded by the imaging sensor 10, wherein these classified scene elements 14a, 14b, 14c, 14d do not represent scene elements occluded by an occlusion 20 of the imaging sensor 10.
The machine learning algorithm A1 classifies the plurality of individual images 12 pixel by pixel in order to generate a pixel-wise discrete distribution over the classes 16a, 16b, 16c, 16d. To detect an occlusion 20 of the imaging sensor 10, the deviation A of the classified scene elements 14a, 14b, 14c, 14d of the first data set DS1 from the classified scene elements 14a, 14b, 14c, 14d of the second data set DS2 is compared pixel by pixel, wherein the function provided using the sequence of individual images 12 recorded by the imaging sensor 10 is deactivated if the predefined threshold value SW is exceeded.
A pixel-by-pixel comparison of the classified scene elements 14a, 14b, 14c, 14d of the first data set DS1 with the classified scene elements 14a, 14b, 14c, 14d of the second data set DS2 yields a difference image 22. Here, pixel-wise non-identity is represented by a first numerical value and pixel-wise identity by a second numerical value. An occlusion 20 of the imaging sensor 10 is detected when more than a predefined number of pixels differ.
Furthermore, an occlusion 20 is detected if the pixel-wise probability of identity is low for at least a predefined number of pixels or at least a predefined region of the difference image 22. The difference image 22 indicates the arrangement of the non-identical pixels in the scene 12a.
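This probabilistic variant can be sketched as follows, under the assumption that the softmax output of the network is read as the pixel-wise probability that the reference class is present; the names and thresholds are placeholders.

```python
import numpy as np

def identity_probability(softmax: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """softmax: (C, H, W) class probabilities; reference: (H, W) class indices.
    Returns the (H, W) probability that each pixel shows its reference class."""
    h, w = reference.shape
    rows = np.arange(h)[:, None]   # broadcast row indices
    cols = np.arange(w)[None, :]   # broadcast column indices
    return softmax[reference, rows, cols]

def region_suspicious(p_identity: np.ndarray, p_min: float, min_pixels: int) -> bool:
    # Occlusion is assumed if enough pixels have a low probability of identity.
    return int((p_identity < p_min).sum()) > min_pixels
```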
Alternatively, the topology of the difference image 22 may be given a graph-based representation, in which the nodes are the center points of the scene elements 14a, 14b, 14c, 14d and the edges encode the arrangement and neighborhood of the scene elements 14a, 14b, 14c, 14d. If a predefined number of nodes and/or edges do not match the reference topology 18, an occlusion 20 of the imaging sensor 10 is detected.
The machine learning algorithm A1 is also set up to detect a person 30 and/or an object 31 in the scene 12a, wherein pixels belonging to the person 30 and/or the object 31 are detected as identical to the reference topology 18 in the event of detection of the person 30 and/or the object 31.
The provided sequence of individual images 12 recorded by the imaging sensor 10 is generated in real time by the imaging sensor 10 or extracted from stored video data. An image-feature-based detection method, in particular another machine learning algorithm, is applied as a prefilter to the sequence of individual images 12 recorded by the imaging sensor 10; the prefilter is designed to detect an occlusion 20 of the imaging sensor 10 by determining a deviation A of the image data BD from a reference model, in particular a deviation A of the color values, of the gradients and/or of the occurrence of image features, in particular SIFT features, at predetermined positions of the individual images.
Tamper recognition may also be performed on another device, for example in the cloud. For this purpose, the images to be evaluated for tampering are uploaded to the cloud, and the tamper recognition is then carried out there.
The pixel-by-pixel classification of static semantic scene elements, that is to say the semantic segmentation, is performed by a machine learning method. A convolutional neural network is preferably used for this purpose. The exact architecture may be, for example, UNet, VGG or ResNet. A possible variant of the method is, for example, the combination with explicit supervised training on tampered areas by defining an additional class and supplementing the training material.
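One possible concrete instantiation, shown purely as an assumed example (the patent names UNet, VGG and ResNet only as candidate architectures), is a ResNet-backed fully convolutional network from torchvision:

```python
import torch
from torchvision.models.segmentation import fcn_resnet50

NUM_CLASSES = 8  # e.g. the eight interior classes listed further below

model = fcn_resnet50(num_classes=NUM_CLASSES)  # one logit map per class
model.eval()

with torch.no_grad():
    frame = torch.randn(1, 3, 480, 640)   # placeholder camera image
    logits = model(frame)["out"]          # shape (1, NUM_CLASSES, 480, 640)
    class_map = logits.argmax(dim=1)      # pixel-wise class decision (DS1)
```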
Instead of ignoring objects 31 and persons 30 in the label definition, these classes may, for example, be learned explicitly. As long as a person 30 can still be detected, it can be assumed that the area has not been tampered with.
FIG. 2 shows a flow diagram of a computer-implemented method for detecting occlusions 20 of an imaging sensor 10.
In the illustration shown in fig. 2, a person 30 is present in the scene 12a, seated on the rear seat of the motor vehicle. This scene is then compared with the reference topology of the second data set DS2. The difference image 22 shows the identified person 30; no occlusion is detected in this embodiment.
FIG. 3 shows a flow diagram of a computer-implemented method for detecting occlusions 20 of an imaging sensor 10.
In the illustration shown in fig. 3, an object 31 is present in the scene 12a, located on the rear seat of the motor vehicle. This scene is then compared with the reference topology of the second data set DS2. The difference image 22 shows the identified object 31; no occlusion is detected in this embodiment.
FIG. 4 shows a flow diagram of a computer-implemented method for detecting occlusion 20 of an imaging sensor 10.
In the illustration shown in fig. 4, an occlusion 20 is present in the scene 12a in the form of tape placed on the imaging sensor, which constitutes tampering with the imaging sensor. This scene is then compared with the reference topology of the second data set DS2. The difference image 22 shows the identified occlusion 20; in this embodiment an occlusion is detected.
FIG. 5 illustrates a flow diagram of a computer-implemented method for providing a trained machine learning algorithm for classifying static semantic scene elements of a sequence of single images recorded by an imaging sensor, in accordance with a preferred embodiment of the present invention.
The method comprises the following steps: a first training data set TD1 of image data BD of a sequence of single images 12 recorded by the imaging sensor 10 is received S1'.
The method further comprises the following steps: a second training data set TD2 of classified image data BD is received S2', wherein the scene 12a contained in the respective single image 12 is classified into a plurality of static semantic scene elements 14a, 14b, 14c, 14d.
The method further comprises the following steps: the machine learning algorithm A1 is trained S3' by an optimization algorithm A3 that computes extrema of a loss function for classifying the static semantic scene elements 14a, 14b, 14c, 14d contained in the image data BD.
During the training S3' of the machine learning algorithm A1, an extremum of a further loss function for detecting persons 30 and/or objects 31 in the scene 12a is calculated.
Upon detection of an occlusion 20 of the imaging sensor 10, the loss of the optimization algorithm is adapted such that the machine learning algorithm A1 outputs one predefined class 16a, 16b, 16c, 16d or several predefined classes 16a, 16b, 16c, 16d.
The machine learning algorithm A1 is trained to perform a pixel-by-pixel classification of the scene 12a contained in the respective single image 12 into a plurality of static semantic scene elements 14a, 14b, 14c, 14d, or to detect the plurality of static semantic scene elements 14a, 14b, 14c, 14d if polygonal polylines or bounding boxes are used.
A data set DS1 with images and semantically segmented ground-truth labels DS2 is created, and the network is trained to classify the semantic scene elements 14a, 14b, 14c, 14d pixel by pixel. In the example of vehicle interior segmentation, the classes {background, seat_outer, seat_middle, backseat_outer, backseat_middle, outdoor, door, footarea} are defined.
The network A1 may be trained to minimize a cross-entropy loss. A particularity of the label definition is that the network A1 is trained to detect the semantic classes 16a, 16b, 16c, 16d of the static scene even if persons 30 or objects 31 are present in the image 12. Even if a person 30 is present in the vehicle, the ground-truth label still defines the classes seat and backrest for the corresponding area.
This improves the robustness of the tamper recognition with respect to objects 31 and persons 30 in the camera image, since these are ignored in the semantic segmentation and, ideally, have no effect. No tampered training data are required.
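A minimal sketch of one training step S3' under this label convention is given below; PyTorch is an assumed choice, and `model`, `optimizer` and the tensors are placeholders.

```python
import torch.nn.functional as F

def training_step(model, optimizer, images, gt_labels):
    """images: (N, 3, H, W) frames; gt_labels: (N, H, W) int64 class indices
    that keep the static classes (seat, backrest, ...) even where people sit."""
    optimizer.zero_grad()
    logits = model(images)["out"]              # (N, C, H, W) segmentation output
    loss = F.cross_entropy(logits, gt_labels)  # pixel-wise cross-entropy
    loss.backward()
    optimizer.step()
    return loss.item()
```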
The loss can also be adapted so that, in the case of unknown/tampered areas, the network outputs either a specific class, for example background, or classes that are as different as possible, in order to amplify the deviation from the reference model. One example is reducing the weight of a class, e.g. background, in the loss, so that erroneous assignments to that class are penalized less.
An additional term can also be included in the loss which encourages labels that are as diverse as possible within a local area. Alternatively, augmentation can be performed with arbitrary textures/images that are labelled with the class background.
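The described down-weighting can be expressed, for example, as a class-weighted cross-entropy; the class index 0 for background and the factor 0.2 are assumptions.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES, BACKGROUND = 8, 0        # assumed class count and background index

weights = torch.ones(NUM_CLASSES)
weights[BACKGROUND] = 0.2             # assumed: penalize background errors less

logits = torch.randn(1, NUM_CLASSES, 480, 640)    # placeholder network output
gt = torch.zeros(1, 480, 640, dtype=torch.long)   # placeholder labels
loss = F.cross_entropy(logits, gt, weight=weights)
```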
Furthermore, another machine learning method may be used instead of a neural network. The feature extraction may, for example, be replaced by Aggregated Channel Features, and the classification by boosting methods.
Fig. 6 shows a schematic illustration of a system for detecting line-of-sight obstructions, in particular occlusions, of an imaging sensor according to a preferred embodiment of the invention.
The system comprises: means 32 for providing image data BD of a sequence of individual images 12 recorded by the imaging sensor 10.
The system further comprises: means 34 for receiving the image data BD by a machine learning algorithm A1 which classifies the scene 12a contained in the respective single image 12 into a plurality of static semantic scene elements 14a, 14b, 14c, 14d and outputs a first data set DS1 representing the plurality of determined classes 16a, 16b, 16c, 16d.
The system further comprises: means 36 for comparing the first data set DS1 with a second data set DS2 representing the reference topology 18 of the scene 12a, in particular the occurrence of scene elements 14a, 14b, 14c, 14d and/or the arrangement of these scene elements in the scene 12a.
The system further comprises: means 38 for detecting an occlusion 20 of the imaging sensor 10 if a deviation A of the classified scene elements 14a, 14b, 14c, 14d of the first data set DS1 from the classified scene elements 14a, 14b, 14c, 14d of the second data set DS2 exceeds a predefined threshold value SW.
Claims (15)
1. A computer-implemented method for detecting a line-of-sight obstruction, in particular an occlusion (20), of an imaging sensor (10), the method having the steps of:
providing (S1) image data (BD) of a sequence of individual images (12) recorded by an imaging sensor (10);
receiving (S2) the image data (BD) by a machine learning algorithm (A1) that classifies (S3) scenes (12a) contained in respective single images (12) into a plurality of static semantic scene elements (14a, 14b, 14c, 14d) and outputs (S4) a first data set (DS1) representing a plurality of determined classes (16a, 16b, 16c, 16d);
comparing (S5) the first data set (DS1) with a second data set (DS2) representing a reference topology (18) of the scene (12a), in particular occurrences of scene elements (14a, 14b, 14c, 14d) and/or arrangements of the scene elements in the scene (12a);
detecting (S6) a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10) if a deviation (A) of the classified scene elements (14a, 14b, 14c, 14d) of the first data set (DS1) from the classified scene elements (14a, 14b, 14c, 14d) of the second data set (DS2) exceeds a predefined threshold value (SW).
2. The computer-implemented method of claim 1, wherein the second data set (DS2) representing the reference topology (18) of the scene (12a) comprises a predefined plurality of classified static semantic scene elements (14a, 14b, 14c, 14d) in a single image (12) of the scene (12a) contained in the sequence of single images (12) recorded by the imaging sensor (10), wherein the classified scene elements (14a, 14b, 14c, 14d) are free of scene elements (14a, 14b, 14c, 14d) representing a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10).
3. The computer-implemented method of claim 1 or 2, wherein the machine learning algorithm (A1) classifies a plurality of individual images (12) pixel by pixel in order to generate a pixel-wise discrete distribution of the classes (16a, 16b, 16c, 16d), wherein, for detecting a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10), a pixel-by-pixel comparison of the deviation (A) of the classified scene elements (14a, 14b, 14c, 14d) of the first data set (DS1) with the classified scene elements (14a, 14b, 14c, 14d) of the second data set (DS2) is performed, and wherein, if the predefined threshold value (SW) is exceeded, the function provided using the sequence of individual images (12) recorded by the imaging sensor (10) is deactivated.
4. The computer-implemented method as claimed in claim 3, wherein a pixel-by-pixel comparison of the classified scene elements (14a, 14b, 14c, 14d) of the first data set (DS1) with the classified scene elements (14a, 14b, 14c, 14d) of the second data set (DS2) yields a difference image (22), wherein pixel-wise non-identity is represented by a first numerical value and pixel-wise identity by a second numerical value, and wherein a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10) is detected when a predefined number of pixels are not identical.
5. The computer-implemented method as claimed in claim 4, wherein a line-of-sight obstruction, in particular an occlusion (20), is detected if the pixel-wise probability of identity of at least a predefined number of pixels or at least a predefined region of the difference image (22) is low, and wherein the difference image (22) describes the arrangement of the non-identical pixels in the scene (12a).
6. The computer-implemented method of claim 4 or 5, wherein the topology of the difference image (22) is a graph-based representation, wherein a node is a center point of a scene element (14a, 14b, 14c, 14d) and an edge is an arrangement and neighborhood of the scene elements (14a, 14b, 14c, 14d), wherein a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10) is detected if a predefined number of nodes and/or edges are not identical to the reference topology (18).
7. The computer-implemented method of one of the preceding claims, wherein the machine learning algorithm (A1) is set up to detect a person (30) and/or an object (31) in the scene (12a), wherein pixels belonging to the person (30) and/or the object (31) are detected as being identical to the reference topology (18) in case a person (30) and/or an object (31) is detected.
8. The computer-implemented method of one of the preceding claims, wherein the provided sequence of individual images (12) recorded by the imaging sensor (10) is generated in real time by the imaging sensor (10) or extracted from stored video data, and wherein an image-feature-based detection method, in particular another machine learning algorithm, is applied as a prefilter to the sequence of individual images (12) recorded by the imaging sensor (10), the prefilter being set up to detect a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10) by determining a deviation (A) of the image data (BD) from a reference model, in particular a deviation (A) of the color values, of the gradients and/or of the occurrence of image features, in particular SIFT features, at predetermined positions of the individual images.
9. A computer-implemented method for providing a trained machine learning algorithm (A1) for classifying static semantic scene elements (14a, 14b, 14c, 14d) of a sequence of single images (12) recorded by an imaging sensor (10), the method having the steps of:
receiving (S1') a first training data set (TD1) of image data (BD) of a sequence of single images (12) recorded by an imaging sensor (10);
receiving (S2') a second training data set (TD2) of classified image data (BD), wherein scenes (12a) contained in respective single images (12) are classified into a plurality of static semantic scene elements (14a, 14b, 14c, 14d); and
training (S3') the machine learning algorithm (A1) by an optimization algorithm (A3) that computes extrema of a loss function for classifying the static semantic scene elements (14a, 14b, 14c, 14d) contained in the image data (BD).
10. The computer-implemented method of claim 9, wherein an extremum of a further loss function for detecting persons (30) and/or objects (31) in the scene (12a) is computed when the machine learning algorithm (A1) is trained (S3').
11. The computer-implemented method of claim 9 or 10, wherein, upon detection of a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10), the loss of the optimization algorithm is adapted such that the machine learning algorithm (A1) outputs a predefined class (16a, 16b, 16c, 16d) or a plurality of predefined classes (16a, 16b, 16c, 16d).
12. The computer-implemented method of any of claims 9 to 11, wherein the machine learning algorithm (A1) is trained to perform a pixel-by-pixel classification of a scene (12a) contained in a respective single image (12) into a plurality of static semantic scene elements (14a, 14b, 14c, 14d), or to detect the plurality of static semantic scene elements (14a, 14b, 14c, 14d) using polygonal polylines or bounding boxes.
13. A system (1) for detecting line-of-sight obstructions, in particular occlusions (20), of an imaging sensor (10), the system comprising:
-means (32) for providing image data (BD) of a sequence of individual images (12) recorded by the imaging sensor (10);
means (34) for receiving the image data (BD) by a machine learning algorithm (A1) that classifies a scene (12a) contained in a respective single image (12) into a plurality of static semantic scene elements (14a, 14b, 14c, 14d) and outputs a first data set (DS1) representing a plurality of determined classes (16a, 16b, 16c, 16d);
means (36) for comparing the first data set (DS1) with a second data set (DS2) representing a reference topology (18) of the scene (12a), in particular occurrences of scene elements (14a, 14b, 14c, 14d) and/or arrangements of the scene elements in the scene (12a); and
means (38) for detecting a line-of-sight obstruction, in particular an occlusion (20), of the imaging sensor (10) if a deviation (A) of classified scene elements (14a, 14b, 14c, 14d) of the first data set (DS1) from classified scene elements (14a, 14b, 14c, 14d) of the second data set (DS2) exceeds a predefined threshold value (SW).
14. A computer program having a program code for performing one of the methods according to any one of claims 1 to 8 and 9 to 12 when the computer program is executed on a computer.
15. A computer-readable data carrier having a program code of a computer program for performing one of the methods according to any one of claims 1 to 8 and 9 to 12 when the computer program is executed on a computer.
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
DE102021201255.8A (DE102021201255A1) | 2021-02-10 | 2021-02-10 | Computer-implemented method and system for detecting a visual impairment of an imaging sensor and training method
DE102021201255.8 | 2021-02-10 | |

Publications (1)

Publication Number | Publication Date
---|---
CN114943265A | 2022-08-26

Family
ID=82493819

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202210120381.1A (CN114943265A, pending) | Method and system for detecting line-of-sight obstruction of an imaging sensor and training method | 2021-02-10 | 2022-02-09

Country Status (2)

Country | Link
---|---
CN | CN114943265A
DE | DE102021201255A1
Families Citing this family (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
EP4432685A1 | 2023-03-14 | 2024-09-18 | Continental Autonomous Mobility Germany GmbH | Method for determining a cleaning information, method for training of a neural network algorithm, control unit, camera sensor system, vehicle, computer program and storage medium

Family Cites Families (1)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
EP3657379A1 | 2018-11-26 | 2020-05-27 | Connaught Electronics Ltd. | A neural network image processing apparatus for detecting soiling of an image capturing device

- 2021-02-10: DE application DE102021201255.8A filed (published as DE102021201255A1, active pending)
- 2022-02-09: CN application CN202210120381.1A filed (published as CN114943265A, pending)
Also Published As

Publication number | Publication date
---|---
DE102021201255A1 | 2022-08-11
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination