WO2022226989A1 - System and method for obstacle-free driving - Google Patents


Info

Publication number
WO2022226989A1
Authority
WO
WIPO (PCT)
Prior art keywords
reflective surface
detected
objects
detected objects
vehicle
Prior art date
Application number
PCT/CN2021/091451
Other languages
French (fr)
Inventor
Diego ORTIZ
Philipp QUENTIN
Gereon HINZ
Wen Hu
Xiaoyu Zhang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/CN2021/091451 priority Critical patent/WO2022226989A1/en
Priority to CN202180097627.1A priority patent/CN117500709A/en
Priority to EP21938463.3A priority patent/EP4274770A4/en
Publication of WO2022226989A1 publication Critical patent/WO2022226989A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G1/00Traffic control systems for road vehicles
    • G08G1/16Anti-collision systems
    • G08G1/167Driving aids for lane monitoring, lane changing, e.g. blind spot detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Definitions

  • the present application relates to the field of automated driving vehicles, and more particularly to an apparatus and method for object detection in a reflective surface. Furthermore, the present application relates to a self-driving vehicle comprising the apparatus for operating the vehicle according to the method for automated driving support.
  • Automated driving vehicles need to perceive the environment so that automated behavioral planning and navigation can be performed. Accurate estimation and identification of an obstacle-free path is often crucial to ensure safe vehicle operation. Environmental conditions can become complex, requiring the use of a traffic mirror. An example of such an environment is an intersection of roads, where the field of view of an approaching vehicle can be partially or completely occluded by the infrastructure of the environment. Human drivers possess the cognitive capacity to perceive traffic mirrors, locate relevant traffic participants in them, and have the semantic understanding that the perceived reflected objects are situated elsewhere in the environment, i.e. outside the traffic mirror. Known automatic detection configurations seem less reliable when detecting traffic participants in traffic mirrors and might lack semantic understanding.
  • Object detection methods based on visual cameras are frequently used to perceive and track objects of interest in the environment. These object detection methods might be less reliable for automated driving purposes when detecting small objects such as traffic participants in traffic mirrors becomes necessary.
  • Current object detection methods commonly use Convolutional Neural Networks (CNN) as backbone structures. These networks often use convolutions and pooling layers to reduce the image resolution, abstract relevant features from corresponding pixels, and based on this, apply a high-quality classification algorithm. Nevertheless, spatial information at each pooling layer is reduced and small object features can get lost during the convolution process of the network. This might hinder reliable detection of small-sized objects in images.
  • an apparatus comprising at least one processor configured to: receive a sequence of input image frames; detect a reflective surface in the sequence of input image frames; perform object detection within the detected reflective surface; and allocate one or more detected objects for specifying an object trajectory.
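The claimed processing chain (receive frames, detect the reflective surface, detect objects within it, allocate detections to trajectories) can be sketched as follows. This is an illustrative skeleton, not the patented implementation: the two detector functions are stand-ins for trained models, and all type and function names (`Detection`, `TrackedObject`, `process_sequence`, the frame dictionary layout) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# All names here are hypothetical; the two detector functions are stand-ins
# for trained models (e.g. a detector configured for traffic mirrors).

@dataclass
class Detection:
    label: str
    box: Tuple[int, int, int, int]  # (x, y, w, h) in image pixels

@dataclass
class TrackedObject:
    label: str
    trajectory: List[Tuple[int, int]] = field(default_factory=list)

def detect_reflective_surface(frame) -> Tuple[int, int, int, int]:
    """Stand-in for the reflective-surface detector; returns the mirror box."""
    return frame["mirror_box"]

def detect_objects_in_surface(frame, box) -> List[Detection]:
    """Stand-in for high-resolution object detection inside the mirror box."""
    return [Detection(o["label"], o["box"]) for o in frame["objects"]]

def process_sequence(frames) -> List[TrackedObject]:
    """Receive frames, detect the mirror, detect objects in it, allocate them."""
    tracks: Dict[str, TrackedObject] = {}
    for frame in frames:
        box = detect_reflective_surface(frame)
        for det in detect_objects_in_surface(frame, box):
            t = tracks.setdefault(det.label, TrackedObject(det.label))
            cx = det.box[0] + det.box[2] // 2
            cy = det.box[1] + det.box[3] // 2
            t.trajectory.append((cx, cy))  # allocate detection to a trajectory
    return list(tracks.values())
```

Associating detections by class label is a simplification chosen only to keep the sketch short; a real allocation step would use spatial association, as discussed for the Allocation method below.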
  • the apparatus can be implemented in any vehicle or other device used for deployment on embedded systems for automated driving support. Furthermore, the apparatus can be used in other computer vision tasks that require detection, tracking and semantic assignment of multiple classes of objects in reflective surfaces.
  • the apparatus may be referred to as an ADAS (advanced driver-assistance system), an electronic system that assists drivers with driving and parking functions.
  • the one or more processors may be embodied as any type of processor for performing the described operations.
  • the processor may be embodied as a single- or multi-core processor(s), microcontroller, or other processing/controlling circuit.
  • the processor is configured to assess the relevance of object detections to support the decision-making process of a host vehicle comprising the apparatus, so that lane incorporation or passing maneuvers are performed reliably.
  • a memory may be coupled to the processor and may be embodied as any type of volatile or non-volatile memory for short-term or long-term data storage, for performing the operations described herein.
  • the memory may comprise hard disk drives, solid-state drives, or other data storage devices.
  • the memory can store image data, training data, other data used for object detection as well as various software and drivers used during operation of the apparatus.
  • the memory of the apparatus can comprise a high-definition (HD) map of the environment.
  • the sequence of input image frames comprises images showing a reflective surface and objects which are to be detected as relevant or not relevant. Such objects comprise vehicles, street topology, and environmental features such as trees, traffic signs, road signs, etc.
  • the apparatus can further comprise an image capturing device configured to capture a video sequence formed by the set of input image frames.
  • the reflective surface is detected in the sequence of input image frames.
  • the reflective surface may be detected over the whole sequence or in a set of images in the sequence. Detecting the reflective surface, such as a traffic mirror for example, provides a first reliable detection of relevant objects. In other words, the detection of the reflective surface at least indicates a more complex environment, and the handling of more complex object detection conditions is initiated. This will become more apparent from the further details provided in the following implementations.
  • Object detection is performed within the reflective surface. By doing so, the detected objects are readily classified as being located elsewhere in the environment than at the location of the reflective surface. Misinterpretations of detected object locations can thus be avoided.
  • the detected objects are allocated.
  • an object is not only detected in the reflective surface, but it is further classified, tracked and if relevant included in a following decision procedure.
  • the detection of objects in a reflective surface, along with allocating the detected objects for specifying the relevant object trajectories, provides reliable detection, tracking and semantic assignment of multiple classes of objects in reflective surfaces. Additional sensors, as used in known approaches, are not required.
  • the at least one processor is further configured to crop the detected reflective surface from one or more input frames, providing a cropped image comprising cropped pixels of a corresponding bounding box.
  • the cropping includes retrieving a sequence of cropped images from the resulting bounding boxes of the reflective surface detections or similar markings.
  • in the following, a single cropped image or the sequence of cropped images is referred to as the cropped image.
  • the area of the reflective surface is cropped. Thus, only the region of interest in the sequence of images is provided for further processing.
  • the resulting cropped images comprise the pixels of the input image of the corresponding bounding box.
  • the bounding box is selected to include the detected reflective surface.
  • the objects are detected within the cropped image.
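Cropping the detected reflective surface to its bounding box can be sketched as below. The function name and the row-major list-of-rows image representation are illustrative assumptions; a real system would operate on camera frames (e.g. array images), but the clamping logic is the same.

```python
def crop_bounding_box(image, box):
    """Return the pixels of `image` inside `box` = (x, y, w, h).
    `image` is a row-major list of pixel rows; the box is clamped to the
    image bounds so a mirror detected at the frame edge still crops safely."""
    x, y, w, h = box
    x0, y0 = max(0, x), max(0, y)
    x1 = min(len(image[0]), x + w)
    y1 = min(len(image), y + h)
    return [row[x0:x1] for row in image[y0:y1]]
```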
  • at least one of the detected objects is the host vehicle or a further vehicle that can communicate with the host vehicle via vehicle-to-vehicle (V2V) communication services.
  • the at least one processor is further configured to perform a high-resolution object detection over the cropped image. This provides for a reduced loss of image resolution.
  • allocating of the one or more detected objects further comprises at least one of: classifying the one or more detected objects, assigning the one or more detected objects to an environment topology, and tracking the assigned one or more detected objects.
  • Classifying the detected objects, e.g. as relevant, less relevant or not relevant, provides for improved or fast object detection by focusing on objects of interest.
  • Some objects are classified as street topology and the like, for example, so that only these objects are assigned to an environment topology.
  • the environment topology may be provided as an internal or external HD map.
  • the assigned objects are tracked.
  • the tracking is performed in the local coordinate system of the reflective surface.
  • tracking is performed in a coordinate system of the host vehicle.
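A minimal sketch of allocating detections to tracks in the mirror's local coordinate system, assuming each detection is reduced to a centroid and associated greedily with the nearest existing track. The patent does not specify an association method; the greedy nearest-neighbour scheme and the `max_dist` gate are illustrative assumptions.

```python
import math

def allocate_detections(tracks, detections, max_dist=30.0):
    """Greedily associate new detections (centroids in the mirror's local
    coordinate system) with existing tracks; unmatched detections start
    new tracks. Each track is a list of centroid positions over time."""
    unmatched = list(detections)
    for track in tracks:
        if not unmatched:
            break
        last = track[-1]
        best = min(unmatched, key=lambda p: math.dist(last, p))
        if math.dist(last, best) <= max_dist:
            track.append(best)        # extend the existing trajectory
            unmatched.remove(best)
    for p in unmatched:               # anything left becomes a new track
        tracks.append([p])
    return tracks
```

The same routine applies unchanged when tracking is performed in the host vehicle coordinate system instead; only the coordinates fed in differ.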
  • the at least one processor is further configured to calibrate the cropped image using distortion correction.
  • Known position and geometric information, e.g. of the host vehicle or other communicating vehicles, can be used to correct the cropped image for distortions. For this, radial and tangential distortion parameters are derived from the comparison of the distorted and the known geometry.
  • the at least one processor is further configured to calibrate extrinsic parameters of the cropped image with provided or detected positions and geometry of the environment topology.
  • Known position and geometric information, e.g. of the host vehicle or other communicating vehicles, is retrieved to estimate the translational and rotational calibration parameters of the cropped image. The translation and rotation parameters between the reflective surface and the host vehicle are approximated through triangulation methods and by comparing the host vehicle's known geometry across different video frames.
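The distortion correction described above can be illustrated with the standard radial distortion model x_d = x_u·(1 + k1·r² + k2·r⁴); the tangential terms mentioned in the text are omitted for brevity. Inverting the model by fixed-point iteration is one common approach; this is a hedged sketch of that inversion, not the patent's calibration procedure, and the coefficients would in practice be estimated from the known vehicle geometry.

```python
def undistort_point(xd, yd, k1, k2, iterations=10):
    """Invert the radial distortion model x_d = x_u * (1 + k1*r^2 + k2*r^4)
    by fixed-point iteration, in normalized image coordinates.
    k1, k2 are radial distortion coefficients (tangential terms omitted)."""
    xu, yu = xd, yd                       # initial guess: the distorted point
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / scale, yd / scale   # refine using the current radius
    return xu, yu
```

For the small distortions typical of camera lenses the iteration converges in a handful of steps; strongly curved traffic mirrors may need a more robust solver.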
  • the apparatus further comprises a communication subsystem configured to use wireless technology and associated protocols, such as vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication, to enable communication between the apparatus and other remote computer networks or devices.
  • In some implementations it is required for the apparatus to incorporate solutions and approaches to correct the cropped image for distortions and to estimate the position and motion direction of the detected objects for a correct semantic understanding of the detections. In addition, in some implementations, it is desirable to provide the apparatus, e.g. the automated driving system, with decision-making support for safe navigation, taking advantage of the information gained by detecting and tracking traffic participants in the reflective surface, for example a traffic mirror.
  • a method of object detection is performed by an apparatus.
  • the method comprises: receiving a sequence of input image frames; detecting a reflective surface in the sequence of input image frames; performing object detection within the detected reflective surface; allocating one or more detected objects for specifying an object trajectory.
  • the method further comprises cropping the detected reflective surface from one or more input frames and providing a cropped image comprising cropped pixels of a corresponding bounding box.
  • the method further comprises performing a high-resolution object detection over the cropped image.
  • allocating of the one or more detected objects further comprises at least one of: classifying the one or more detected objects; assigning the one or more detected objects to an environment topology; and tracking the assigned one or more detected objects.
  • a method for classifying the detected objects, assigning them to the street topology, and using the calibrated mirror image to track other detected objects, such as traffic participants, for example.
  • the method further comprises tracking the trajectory of the assigned one or more detected objects in a coordinate system external to the reflective surface. This may comprise tracking position and velocity of the relevant objects in the host vehicle coordinate system.
  • the objects can be detected correctly and reliably, at least regarding their true location and velocity.
  • the method comprises calculating an approximation of the surface curvature of the convex reflective surface using distortion parameters and a predetermined curvature model for generic reflective surfaces. Based on the estimated curvature or the reflection properties of the reflective surface, the pose of other detected objects, such as cars, can be estimated using the position of the reflection in the reflective surface relative to the camera pose.
  • rays connecting the reflective surface and the road topology can be traced according to the reflection properties of the reflective surface. The selection of suitable rays is determined by the distance of the road topology that should be associated with the reflective surface.
  • the method further comprises predicting a position of the detected object based on the assignment of the object detections to street topology.
  • an enclosed area on the ground that is delimited by the traced rays is predicted as a rough estimation of the position of the detected objects, e.g. traffic participants, based on the assignment of the object detections to the street topology.
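One way an estimated surface curvature can yield a rough object position is the classical convex-mirror equation 1/v + 1/u = 1/f with focal length f = −R/2 and magnification m = −v/u. The patent does not prescribe this formula; the sketch below only illustrates how curvature plus apparent size could produce a distance estimate, assuming the object's true height is known (e.g. communicated via V2V). Real-is-positive sign convention is used.

```python
def object_distance_convex_mirror(radius, apparent_height, true_height):
    """Rough object distance u in front of a convex mirror of curvature
    radius R, from the mirror equation 1/v + 1/u = 1/f with f = -R/2 and
    magnification m = h_image / h_object = -v/u (0 < m < 1 for convex).
    Derivation: v = -m*u; substituting into the mirror equation gives
    (1 - 1/m)/u = 1/f, hence u = f * (1 - 1/m)."""
    f = -radius / 2.0
    m = apparent_height / true_height
    return f * (1.0 - 1.0 / m)
```

For example, a mirror with R = 2 m showing an object at half its true height places the object 1 m in front of the mirror; distortion-corrected pixel sizes would stand in for `apparent_height` in practice.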
  • the method is implemented by an apparatus of a vehicle, further comprising: detecting at least one of the objects as the vehicle; retrieving positions and geometry of the vehicle; estimating radial and tangential distortion coefficients by comparing distorted positions of the vehicle in a coordinate system of the reflective surface with the retrieved positions and geometry of the vehicle.
  • the host vehicle can be used to estimate distortion coefficients.
  • vehicle-to-vehicle (V2V) communication services can be used for retrieving positions and geometry of the vehicle.
  • the vehicle may be a host vehicle, another vehicle or other vehicle communicating with the host vehicle.
  • the method further comprises assessing the relevance of the one or more detected objects and determining a Go state or No-go state for supporting the planning of a lane incorporation or passing maneuver.
  • a method for decision-making support of a host vehicle based on the information obtained from objects detected and tracked in reflective surfaces, e.g. traffic mirrors. The relevance of objects is assessed based on previous information such as street topology assignment, object classes, positions, and motion characteristics to determine the viability of a safe lane incorporation or passing maneuver of the host vehicle.
  • a self-driving vehicle comprising the apparatus for detecting objects within a reflective surface, the vehicle being operated according to the method for detecting objects within a reflective surface for automated driving support.
  • Implementation forms of the invention can thus provide a traffic scheduling concept with a flexible traffic scheduler, which remains open to further developments in the field of algorithms, in particular faster algorithms that can at the same time provide the superior performance of a hardware-based traffic scheduler.
  • the unknown orientation of the image of the reflective surface, e.g. the traffic mirror image, and its distortion due to the surface curvature pose a great challenge for reliable object detection within the reflective surface.
  • the correct estimation of the position and motion direction of the detected objects is crucial for a correct semantic understanding of reflective surface detections, e.g. mirror detections. Above, it has been described how this can be achieved. More details are provided in the detailed description.
  • the method according to the second aspect of the invention can be performed by the apparatus according to the first aspect of the invention. Further features and implementation forms of the method according to the second aspect of the invention correspond to the features and implementation forms of the apparatus according to the first aspect of the invention.
  • an implementation form of the method comprises the feature (s) of the corresponding implementation form of the apparatus.
  • a computer program product including a non-transitory computer-readable storage medium having computer-readable instructions stored thereon.
  • the computer-readable instructions are executable by a computerized device comprising processing hardware to execute the method.
  • FIG. 1 illustrates an exemplary environment where object detection is required;
  • FIG. 2 shows a schematic of a vehicle and a reflective surface as shown in FIG. 1;
  • FIG. 3 is a block diagram showing Reflective Surface Perception and Go/No-go Decision;
  • FIG. 4 is a flowchart of the Go/No-go decision for a vehicle;
  • FIG. 5 is a block diagram for Object Detection;
  • FIG. 5a illustrates the Object Detection of FIG. 5;
  • FIG. 6 is a block diagram of an Image Calibration technique;
  • FIG. 7 is a block diagram of an Allocating technique for detected objects;
  • FIG. 8 is a block diagram showing an interaction between Object Detection, Allocation of detected objects and Go/No-go Decision according to various embodiments of the invention;
  • FIG. 9 is a block diagram of a further example of an Allocating technique for detected objects.
  • a disclosure in connection with a described method may also hold true for a corresponding apparatus or system configured to perform the method and vice versa.
  • a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps) , even if such one or more units are not explicitly described or illustrated in the figures.
  • if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.
  • the present invention is directed to object detection in reflective surfaces, e.g. a traffic mirror, and the respective semantic assessment of the detections for decision-making support in automated driving.
  • the described sensors, processors, memory devices as well as detection and tracking systems may be embodied in hardware or software firmware of a vehicle or a combination thereof.
  • the apparatus and method described below may also be regarded as particularly directed to handling of traffic mirrors for safe navigation in the context of automated driving.
  • FIG. 1 illustrates an exemplary environment, where object detection is required.
  • a host vehicle 102 with automated driving capabilities is located in the environment 100.
  • the automated driving capabilities of the vehicles comprise all levels of automation, e.g. Level 2 (partial driving automation), Level 3 (conditional automation) or Level 5 (full automation), wherein the levels are according to the definitions of the Society of Automotive Engineers (SAE).
  • the environment 100 provides an intersection of roads, where two incoming vehicles 103 and 104 approach the intersection.
  • the field of view of the approaching host vehicle 102 is partially or completely occluded by infrastructure 105 of the environment 100.
  • a reflective surface 101 for example a traffic mirror, is used.
  • the host vehicle 102 is using the reflective surface 101 for perceiving the incoming vehicle 103. Thus, the host vehicle 102 can safely perform a passing or lane incorporation maneuver. As will be explained with reference to the other figures, the host vehicle 102 comprises an apparatus for detecting objects within the reflective surface 101.
  • FIG. 2 shows a schematic of the host vehicle 102 and the reflective surface 101 of FIG. 1
  • An image capturing device 210 is configured to capture a video sequence formed from a set of input image frames, wherein the images include the reflective surface 101.
  • the image capturing device 210 is a calibrated monocular camera, for example.
  • the vehicle 102 comprises an apparatus 220.
  • the apparatus 220 comprises a processor 225.
  • a communication subsystem 240 is provided.
  • the processor 225 is configured to receive the sequence of input image frames and detect the reflective surface 101 in the input image frames.
  • the processor 225 is configured to perform object detection within the reflective surface 101.
  • the detected objects include but are not limited to traffic participants and street topology.
  • the processor 225 is configured to track the detected objects over at least one input image frame. In this way, one or more lists of detected objects and their estimated locations can be provided to a decision-making support process implemented in the processor 225 for trajectory planning purposes of the host vehicle 102.
  • the list(s) of detected objects may be stored in a memory 230 of the apparatus 220. Thus, one or more detected objects are allocated for specifying an object trajectory.
  • the processor 225 is configured to crop the detected reflective surface from one or more input image frames, thus providing a cropped image.
  • the cropped image comprises the cropped pixels of the image of the detected reflective surface 101, e.g. a traffic mirror.
  • the memory 230 of the apparatus 220 stores a high-definition (HD) map 231 of the environment. Additional memory space 232 can be provided for detection, tracking and image processing tasks performed by the apparatus 220.
  • the apparatus 220 provides implementation of the methods shown in FIG. 3, for example.
  • the reflective surface 101 provides a local coordinate system CS.
  • the host vehicle 102 provides a host vehicle coordinate system (VCS). The use of these coordinate systems will become more apparent in the description of the following figures.
  • FIG. 3 is a block diagram of a procedure 300 for handling reflective surfaces 101.
  • the procedure 300 can be divided into two methods: Reflective Surface Perception 305 and Go/No-go Decision 400. Both methods run in parallel, optionally exchanging resulting data or signals after performed steps.
  • the Reflective Surface Perception 305 encompasses three sub-methods to perceive and assign a semantic meaning to reflective surfaces 101 in driving scenarios: Object Detection 500 for detecting objects in reflective surfaces 101, Image Calibration 600 for calibrating and correcting the reflected image for distortion, and Allocation 700 for classifying the detected objects, assigning them to a street topology, and using the calibrated image of the reflective surface 101 to track the detected objects.
  • the steps of the Reflective Surface Perception 305 are not necessarily presented in any particular order, and performing the procedure 300 in an alternative order is possible and contemplated. In particular, the sub-methods 500, 600 and 700 need not be executed in sequence; one or more of the steps of each may be intercalated to the best convenience of the overall efficiency of the perception procedure.
  • the full procedure 300 or part(s) of it may be implemented in one or more of the elements of the apparatus 220.
  • the Go/No-go Decision 400 provides decision-making support for automated vehicles in driving scenarios requiring the use of reflective surfaces.
  • the Go/No-go Decision 400 comprises retrieving information about positions, motion characteristics, moving directions and classes of the detected objects to determine the viability of a safe lane incorporation or passing maneuvers of the host vehicle 102. This information is gained by the Reflective Surface Perception 305, passed to the Go/No-go Decision 400 and runs as a parallel, continuous supervision loop to support the planning of lane incorporation or passing maneuvers, for example.
  • the Go/No-go Decision 400 determines the relevance of incoming traffic participants based on object detections in the reflective surface retrieved from the Object Detection 500 and from the parallel Allocation 700 containing processed information or data regarding the detections. Based on these inputs, the Go/No-go Decision 400 accordingly concludes a “Go” state 408 or “No-Go” state 407 explained in more detail in FIG. 4.
  • FIG. 4 is a flowchart of a Go/No-go decision of a host vehicle.
  • object detection (s) in the reflective surface are retrieved from the Object Detection 500.
  • the detected objects in the reflective surface 101 are assessed for several relevant parameters, like their classes, their locations and their movement direction.
  • the objects are assessed regarding their relevance to the safety of car maneuvers, e.g. merging into a lane or road, a passing maneuver, or similar. This assessment is largely based on the inputs from the parallel Allocation 700 (which is described in more detail below).
  • the relevance of the detected objects within the reflective surface is deduced based on the classes of the detected objects. For example, motorized vehicles and bicycles are relevant traffic participants that need to be considered to avoid possible accidents. However, other detected objects such as trees or even small animals may not be relevant to the decision process. In this sense, the classes of all current detections are assessed for relevance and the residual object detections, whose classes are considered relevant or unknown are passed on to the next step 403. If the classes of all current detected objects are assessed as irrelevant, the method enters the “Go” state 408.
  • the locations of the detected objects are assessed for relevance. Traffic participants assessed as relevant in the previous step 402 need to be found in relevant locations such as on the street within the traffic lanes to be considered as potentially hazardous for the host vehicle. Objects that may be found in irrelevant locations such as parking lots or garages are not relevant for the decision-making process. Thus, the locations of all the current detections are assessed for relevance and the residual object detections, whose locations are considered relevant or unknown are passed on to the next step 404. If the locations of all current detected objects are assessed as irrelevant, the method enters the “Go” state 408.
  • the moving directions of the detected objects are evaluated. Relevant traffic participants found in relevant locations need to be moving towards the host vehicle to pose a real threat of an accident. For example, if a vehicle is detected on the street within the lanes, but it is moving away from the host vehicle, it is safe to assume that the host vehicle can safely perform an incorporation or passing maneuver. Therefore, the moving directions of all the current object detections are evaluated for relevance and the remaining object detections, whose moving directions are considered relevant are passed on to the next step 405.
  • If the moving directions of all current detected objects are assessed as irrelevant, the method enters the “Go” state 408. However, if the moving direction of at least one object detection is unknown, the method enters a “Stand-by” state 406, where the host vehicle is required to wait until the relevant detected objects have either left the scene, are not moving, or are no longer in relevant locations. Once this condition is fulfilled, the “Stand-by” state 406 proceeds to the “Go” state 408.
  • the estimated distances between the remaining relevant detected objects and the host vehicle are retrieved from the Allocation 700 method and evaluated to determine whether the relevant traffic participants found in relevant locations and moving towards the host vehicle are within safe distances for the host vehicle to perform a lane incorporation or passing maneuver.
  • If all of the distances are evaluated as safe, the method enters the “Go” state 408 and the host vehicle performs the maneuver. Else, if at least one of the distances is evaluated as not safe, the method enters the “No-Go” state 407 and immediately continues to the “Stand-by” state 406 to wait for the incoming traffic participants to pass. If at least one of the distances between the host vehicle and the incoming traffic participants cannot be estimated or assessed, the method directly enters the “Stand-by” state 406.
  • the host vehicle 102 can start the maneuver only if the Go/No-go Decision enters the “Go” state 408; otherwise the maneuver will not take place.
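The decision logic of steps 402 to 405 can be condensed into a small state machine. The sketch below is illustrative only: the `Detection` fields, their tri-state encoding (`True`/`False`/`None` for relevant/irrelevant/unknown), and the single-pass condensation are assumptions for clarity, not part of the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class State(Enum):
    GO = "Go"            # state 408: maneuver may start
    NO_GO = "No-Go"      # state 407: maneuver must not start
    STANDBY = "Stand-by" # state 406: wait and re-evaluate

@dataclass
class Detection:
    relevant_class: Optional[bool]      # None encodes "unknown"
    relevant_location: Optional[bool]
    moving_towards_host: Optional[bool]
    distance_safe: Optional[bool]

def go_no_go(detections: List[Detection]) -> State:
    """Hypothetical condensation of steps 402-405 into one evaluation pass."""
    # Steps 402/403: keep only detections whose class and location are
    # relevant or unknown; if none remain, the maneuver is clear.
    residual = [d for d in detections
                if d.relevant_class is not False and d.relevant_location is not False]
    if not residual:
        return State.GO
    # Step 404: an unknown moving direction forces the Stand-by state.
    if any(d.moving_towards_host is None for d in residual):
        return State.STANDBY
    approaching = [d for d in residual if d.moving_towards_host]
    if not approaching:
        return State.GO
    # Step 405: an un-assessable distance forces Stand-by; an unsafe
    # distance yields No-Go (which then proceeds to Stand-by).
    if any(d.distance_safe is None for d in approaching):
        return State.STANDBY
    if any(d.distance_safe is False for d in approaching):
        return State.NO_GO
    return State.GO
```

For example, a detection with a relevant class but an irrelevant location is filtered out at the first step, so an otherwise empty scene still yields the “Go” state.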
  • FIG. 5 is a block diagram for Object Detection 500.
  • a sequence of input images e.g. from a video
  • the input images can be captured from a video or image capturing device.
  • the image sequence comprises objects, of which at least one is a reflective surface.
  • the reflective surface 101 is detected among one or more input frames.
  • an exemplary object detection algorithm such as, but not limited to, the “Single shot multibox detector” by Liu, Wei, et al. (in the following referred to as SSD) can be used. This, or any other object detection algorithm, can be specifically configured and trained for detecting reflective surfaces, in particular traffic mirrors.
  • the pixels of the image frames of the detected reflective surface 101 area are cropped providing a cropped image for further processing.
  • a bounding box is placed over the image, explained in more detail in FIG. 5a.
  • the bounding box discretization space and network configuration of the previously used detection algorithm are adapted into a set of smaller anchor boxes.
  • a high-resolution object detection is performed over the cropped image with a modified discretization space configuration.
  • the detection resolution is increased within the selected cropped image.
  • This modified configuration of the object detection algorithm is, for example, an adapted and/or trained SSD algorithm.
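The two-stage detection of steps 501 to 505 (detect the mirror with a coarse configuration, crop it, re-detect inside the crop with smaller anchors) can be sketched as follows. The `detect(image, anchor_size)` callable is a hypothetical stand-in for an SSD-like detector whose anchor-box size is configurable; the label `"mirror"` and all names are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixel coordinates

def detect_in_mirror(image,
                     detect: Callable[[object, int], List[Tuple[str, Box]]],
                     coarse_anchor: int = 128,
                     fine_anchor: int = 16):
    """Two-stage sketch: stage 1 finds reflective surfaces with large anchors
    (steps 501-502), stage 2 crops each mirror region (step 503) and re-runs
    the detector over the crop with a set of smaller anchors (steps 504-505)."""
    mirrors = [box for label, box in detect(image, coarse_anchor)
               if label == "mirror"]
    results = []
    for (mx, my, mw, mh) in mirrors:
        # Crop the mirror pixels; here `image` is modelled as a list of rows.
        crop = [row[mx:mx + mw] for row in image[my:my + mh]]
        # Detections inside the crop are reported in crop-local coordinates.
        results.append(((mx, my, mw, mh), detect(crop, fine_anchor)))
    return results
```

The second detector call sees only the region of interest, so its smaller anchors effectively increase the detection resolution within the mirror, as described for step 505.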
  • FIG. 5a illustrates the Object Detection 500 by using an exemplary image 551 received from a set of input image frames.
  • the object detection described here for one image, is also applicable to a sequence or set of input image frames.
  • networks scan the image with sets of fixed-sized bounding box grids that are mapped to convolution layers.
  • if the size of the bounding boxes is too large, the algorithm will have difficulty detecting the reflective surface, let alone the objects reflected in it. If the size of the bounding boxes is too small, the detection becomes slow and resource-intensive. Accordingly, it is proposed to choose an appropriate size of the fixed-sized bounding boxes, which is crucial for the performance and accuracy of the network. For example, the size can be chosen to not be larger than the largest reflective surfaces in the area surrounding the car that should be detected for the evaluation.
  • the size of the used bounding box reflects a compromise between scanning resolution and computational efficiency.
  • the image 551 comprises objects 570, of which at least one is a reflective surface 101, in this case a traffic mirror.
  • the image shows the reflective surface 101 attached to a rod 554, with a road sign 555. In the reflective surface 101 area other objects 570 can be seen.
  • the reflective surface 101 is detected in the image (FIG. 5a (a) ) .
  • a bounding box 556 discretization space and network configuration is used.
  • the bounding box 556 slides over the image 551 vertically and horizontally for applying the detection of the reflective surface.
  • the dots are sampling points of images, and each sampling point is used as the center point of the box in the SSD algorithm. The same applies for FIG. 5a (c) .
  • the pixels of the image of the detected reflective surface 101 area, here the traffic mirror, are cropped as can be seen in FIG. 5a (b) .
  • a cropped image 560 is provided for further processing.
  • the cropped image 560 comprises the objects 570 already present in FIG. 5a (a, b) , however in smaller size.
  • a high-resolution object detection is performed over the cropped image 560, as shown in FIG. 5a (c) .
  • An adapted bounding box 561 discretization space and network configuration of the previously used detection algorithm is used.
  • a set of smaller anchor boxes is used, as can be seen in FIG. 5a (c) .
  • a high-resolution object detection is performed over the cropped image 560 with a modified discretization space configuration.
  • the detection resolution is increased within the selected cropped image 560.
  • This modified configuration of the object detection algorithm is, for example, an adapted and/or trained SSD algorithm.
  • objects 570 are detected within the cropped image 560. Some of the detected objects 570 are to be classified as traffic participants, such as the vehicles 552, 553. Other objects 570 are classified as street topology 571.
  • FIG. 6 shows a block diagram of an Image Calibration 600 technique or method.
  • the cropped image or a sequence of cropped images of the resulting bounding boxes of the reflective surface detections is received.
  • the cropped image or images are generated as described in FIG. 5, steps 503 to 505.
  • the host vehicle 102 is detected in the cropped image (s) with the modified configuration of the object detection mentioned at step 504 (FIG. 5) .
  • other communicating vehicles may be detected with V2V (vehicle-to-vehicle) or V2I (vehicle-to-infrastructure) communication services, e.g. via 5G or the like.
  • a known position and geometry, such as height, width, length of the vehicle (in the following referred to as geometric information) of the host vehicle 102 or other communicating vehicles is retrieved.
  • the cropped image is corrected for distortion caused, for example, by the curvature of the reflective surface.
  • if the reflective surface is a convex mirror (a traffic mirror is also referred to as a convex or curved mirror), a fisheye mirror or a diverging mirror, the outward bulge or curve of the mirror expands the visible field, reflecting a wider field of view than a non-curved mirror.
  • the cropped image can be calibrated: the position and rotation of the reflective surface with respect to the position of the host vehicle 102 is estimated and corrected.
  • V2V communications can be used, so that the host vehicle 102 can further retrieve positions and geometry of other communicating vehicles that are also detected within the cropped image.
  • the intention is to use the geometry and position of the host vehicle 102 as the main information source for the distortion correction and image calibration.
  • the V2V information can then be used as additional information for achieving higher accuracy or also for a case that the host vehicle is not visible within the reflective surface.
  • V2V communication therefore increases the ability of the Go/No-Go Decision 400 to enter the “Go” state 408 and thus increases the availability of the automated driving functionality.
  • information about the environment such as the position and orientation of the reflective surface can be retrieved via V2I communication, if available. This enhances the accuracy and reliability of the presented system further.
  • Distortion correction is performed at step 604, so that the cropped image is corrected for distortion.
  • the known geometries of the host vehicle and other communicating vehicles, if any, are used to map the pixels of detected distorted structural features to the known geometry. In this way, by comparing the distorted and the known geometry, radial and tangential distortion coefficients and intrinsic calibration parameters of an image calibration model (such as, but not limited to, a fish-eye calibration model) are estimated. Subsequently the image is adapted accordingly, producing an undistorted image of the reflected surface.
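The estimation of distortion coefficients from known geometry can be illustrated with a deliberately reduced one-parameter radial model (a full implementation would fit radial and tangential coefficients of a fish-eye calibration model). The function names, the single coefficient `k1`, and the fixed-point inversion are illustrative assumptions.

```python
def estimate_k1(known_pts, distorted_pts, cx, cy):
    """Least-squares fit of a single radial coefficient k1 from point pairs:
    known undistorted positions (e.g. corners of the host vehicle's known
    geometry) vs. their observed distorted positions in the mirror image.
    Model: (d - c) = (u - c) * (1 + k1 * r_u^2), solved for k1."""
    num = den = 0.0
    for (xu, yu), (xd, yd) in zip(known_pts, distorted_pts):
        ru2 = (xu - cx) ** 2 + (yu - cy) ** 2
        for u, d, c in ((xu, xd, cx), (yu, yd, cy)):
            a = (u - c) * ru2          # regressor
            b = (d - c) - (u - c)      # observed distortion residual
            num += a * b
            den += a * a
    return num / den if den else 0.0

def undistort_point(xd, yd, k1, cx, cy):
    """Invert x_d = x_u * (1 + k1 * r_u^2) by fixed-point iteration,
    yielding the undistorted pixel position."""
    xu, yu = xd - cx, yd - cy
    for _ in range(20):
        r2 = xu * xu + yu * yu
        xu = (xd - cx) / (1 + k1 * r2)
        yu = (yd - cy) / (1 + k1 * r2)
    return xu + cx, yu + cy
```

Applying `undistort_point` to every pixel of the cropped image produces the undistorted image of the reflected surface described above.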
  • Image calibration 605 is included for estimating translational and rotational calibration parameters (extrinsic calibration parameters) of the reflective surface 101 with respect to the host vehicle 102. This is accomplished by, firstly, calculating a translation vector between the reflective surface and the host vehicle by triangulation methods, and then calculating a rotation matrix of the reflective surface by comparison of the host vehicle’s known geometry across consecutive image frames. The accuracy of this calculation is optionally increased by incorporating the retrieved geometries of V2V-capable vehicles.
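The translation part of step 605 can be sketched with a pinhole similar-triangles range estimate from a reference object of known physical size (e.g. the host vehicle's known width) and its apparent size in the mirror image. This is a simplified stand-in for full triangulation across frames; the function name and parameters are hypothetical.

```python
import math

def mirror_translation(f_px, real_width_m, apparent_width_px, bearing_rad):
    """Sketch of the translation vector between host vehicle and reflective
    surface: range from similar triangles (range = f * W / w), direction from
    the bearing angle of the mirror in the camera frame. Returns a 2-D
    translation (tx, ty) in metres in the camera's ground plane."""
    rng = f_px * real_width_m / apparent_width_px
    return (rng * math.cos(bearing_rad), rng * math.sin(bearing_rad))
```

In the disclosure, comparing such estimates over consecutive frames additionally yields the rotation matrix of the mirror; that step is omitted here for brevity.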
  • FIG. 7 is a block diagram of an Allocating 700 technique or method for the detected objects, assigning them to the street topology and using the calibrated cropped image to track the detected objects.
  • the detected objects are provided to the Go/No-go Decision 400 method along with necessary inputs for relevance estimation of the respective detected objects, as described in FIG. 4
  • the detected objects are classified in predefined categories to determine objects of relevant classes, for example, motorized vehicles or bicycles, etc. also referred to as detected traffic participants in the following. Furthermore, some objects are classified as belonging to the environment, such as a street, trees, walkway or traffic signs, etc. also referred to as detected street topology in the following.
  • the detected street topology in the cropped image (including but not limited to street lanes) is mapped to the street topology of an HD map 231.
  • an HD map can be provided as the internal HD map 231.
  • the detected traffic participants are assigned to the recognized street topology (after mapping step 702) , to provide a rough location assessment of the detected objects.
  • the detected traffic participants are tracked within the cropped image in local coordinates CS of the reflected surface. Thus, relevance of the motion direction of the detected traffic participants is determined and assessed.
  • position and velocity of the relevant detected objects are estimated in the host vehicle coordinate system to verify a safe distance between the detected object and the host vehicle.
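Steps 704 and 705 can be sketched as a minimal track with finite-difference velocity and a time-gap safety check in host-vehicle coordinates. The 4-second time gap, the `Track` structure, and all names are illustrative assumptions, not values from the disclosure.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Track:
    # (t, x, y) samples in the host-vehicle coordinate system,
    # with the host vehicle at the origin.
    positions: List[Tuple[float, float, float]]

def velocity(track: Track) -> Tuple[float, float]:
    """Finite-difference velocity estimate from the last two samples."""
    (t0, x0, y0), (t1, x1, y1) = track.positions[-2:]
    dt = t1 - t0
    return ((x1 - x0) / dt, (y1 - y0) / dt)

def distance_is_safe(track: Track, time_gap_s: float = 4.0) -> bool:
    """Step 705 sketch: the detected object is at a safe distance if, at its
    current closing speed, it stays away for at least `time_gap_s` seconds."""
    _, x, y = track.positions[-1]
    dist = math.hypot(x, y)
    vx, vy = velocity(track)
    closing = -(x * vx + y * vy) / dist  # positive if moving towards host
    if closing <= 0:
        return True   # receding or tangential: relevant to step 404 as well
    return dist / closing >= time_gap_s
```

The same finite-difference step, applied in mirror-local coordinates before calibration, corresponds to the moving-direction assessment of step 704.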
  • FIG. 8 is a block diagram showing an interaction between Object Detection, Allocation of detected objects and Go/No-go Decision according to various embodiments described above.
  • the Reflective Surface Perception 305 as well as the Go/No-go Decision 400 methods are implemented by the host vehicle 102 with automated driving capabilities.
  • FIG. 8 provides in detail a possible composition of the methods described above and the relations between them.
  • the Go/No-go Decision 400 and the Reflective Surface Perception 305 run in parallel. Inputs are provided to the Go/No-go Decision 400 by the Reflective Surface Perception 305.
  • the Go/No-go Decision 400 may consequently interrupt the Reflective Surface Perception 305, if the “Go” state 408 is reached.
  • the Reflective Surface Perception 305 comprises three sub-methods: the Object Detection 500, the Image Calibration 600 and the Allocation 700. Steps of the Image Calibration 600, namely Distortion Correction 604 and Image Calibration 605, are embedded within the Object Detection 500 and the Allocation 700, respectively. These steps may also receive direct input from the V2V communications subsystem 240.
  • the steps occur in a sequential manner, where the Object Detection method 500 receives image input frames and provides the detected objects as output to the Allocation 700 for further processing.
  • Distortion Correction 604 takes place once the reflective surface is detected and cropped. This allows a better object detection performance within the reflective surface in the following steps of the method.
  • Image Calibration 605 is embedded in Allocation 700 and is performed before the detected relevant traffic participants are tracked in a host vehicle coordinate system at step 705. For this, the extrinsic calibration parameters of the reflected surface are needed.
  • the Go/No-go Decision 400 receives the detected objects from step 505 of the Object Detection 500 to start relevance determination.
  • the relevance determination 402 retrieves its inputs, i.e. classes of objects, from Allocation 700, determines the relevance of the detected object and accordingly issues a “Go” state 408 or “No-go” state 407.
  • Step 701 of the Allocation 700 provides classes of the detected object (s) to the step 402 of the Go/No-go Decision 400 for the relevance estimation of the detection type. Accordingly, step 702 provides step 403 with associations of the detected objects and the street topology to determine the relevance of the location of the detected objects. Step 704 provides step 404 with one or more track list (s) of the detections in local coordinates to evaluate the relevance of the moving direction of the detected objects. Finally, step 705 provides step 405 with the track lists of the detected objects in the host vehicle coordinate system to assess if the distance between the host vehicle and the detected objects is safe for the performance of a lane incorporation or passing maneuver. FIG. 8 shows the complete interaction of the methods and the flow of information between them.
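The overall flow of FIG. 8 can be sketched as a loop that wires the sub-methods together as callables. The signatures below are hypothetical; the point is the sequencing (Object Detection 500 feeds Allocation 700, whose outputs feed the Go/No-go Decision 400) and the fact that a “Go” state interrupts the perception loop.

```python
def reflective_surface_go_no_go(frames, detect_mirror, detect_objects,
                                allocate, decide):
    """Top-level sketch of FIG. 8: per frame, detect and crop reflective
    surfaces (steps 501-503), detect objects within them (504-505), allocate
    the detections (700), and evaluate the Go/No-go Decision (400). A "Go"
    state (408) interrupts the Reflective Surface Perception (305)."""
    for frame in frames:
        crops = detect_mirror(frame)
        detections = [d for crop in crops for d in detect_objects(crop)]
        state = decide(allocate(detections))
        if state == "Go":
            return state  # perception is interrupted once "Go" is reached
    return "Stand-by"     # no frame allowed the maneuver
```

A usage sketch with stub callables shows the interruption behavior: as soon as `decide` returns `"Go"`, later frames are never processed.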
  • FIG. 9 is a block diagram of a further example of an Allocating 700 technique or method for detected objects.
  • Allocating 700 further comprises Tracking 900 of the relevant detected objects in the reflective surface with respect to the host vehicle coordinate system.
  • Position and velocity of the relevant detected objects is tracked to verify safe distance between the detected objects and the host vehicle.
  • the curvature of the reflected surface is approximated using distortion parameters estimated during the Image Calibration 600 at step 604.
  • the distortion parameters assume a predetermined curvature model of generic reflective surfaces, e.g. traffic mirrors, so that the curvature, if any, of the reflective surface can be calculated. This can be done, for example, by mapping the distortion parameters with a defined curvature.
  • the curvature may be continually adapted or extended over time to better fit a larger set of generic reflective surfaces.
  • at step 902, after calculating an approximation of the surface curvature and considering the position and rotation of the reflective surface with respect to the position of the host vehicle 102, the poses of other detected objects, such as cars, are estimated based on the estimated curvature or the reflection properties of the reflective surface, using the position of the reflection in the reflective surface relative to the camera pose.
  • rays connecting the reflective surface and the road topology can be traced according to the reflection properties of the reflective surface. The selection of suitable rays is determined by the distance of road topology that should be associated with the reflective surface.
  • the enclosed areas of the ground that are delimited by the mentioned rays represent rough estimations of the positions of the detected traffic participants.
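The ray tracing above can be illustrated in a 2-D side view with a locally planar mirror patch. The disclosure uses the estimated curvature of the surface; assuming a single surface normal per traced ray is the simplest sketch, and all names are illustrative.

```python
def reflect(ray, normal):
    """Reflect a 2-D direction vector off a surface with unit normal `normal`
    (law of reflection: r = d - 2 (d.n) n)."""
    dot = ray[0] * normal[0] + ray[1] * normal[1]
    return (ray[0] - 2 * dot * normal[0], ray[1] - 2 * dot * normal[1])

def trace_to_ground(mirror_point, ray_dir, normal):
    """Trace a camera ray that hits the mirror at `mirror_point` and follow
    the reflected ray until it crosses y = 0 (the ground line in this 2-D
    sketch). The crossing point is the rough position estimate of a detected
    traffic participant; rays that never reach the ground yield None."""
    rx, ry = reflect(ray_dir, normal)
    if ry >= 0:
        return None
    t = -mirror_point[1] / ry
    return (mirror_point[0] + t * rx, 0.0)
```

Tracing two such rays that bound a detection in the mirror image delimits an enclosed ground area between their crossing points, which is the rough position estimate described above.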
  • a computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.


Abstract

An apparatus (220) and a method for object detection in a reflective surface (101) is provided. The disclosure relates to a self-driving vehicle comprising the apparatus (220) for operating the vehicle according to the method for automated driving support. The apparatus (220) comprises at least one processor (225) configured to: receive a sequence of input image frames; detect a reflective surface (101) in the sequence of input image frames; perform object detection within the detected reflective surface (101); and allocate one or more detected objects (570) for specifying an object trajectory. The method comprises: receiving a sequence of input image frames; detecting a reflective surface (101) in the sequence of input image frames; performing object detection within the detected reflective surface (101); and allocating one or more detected objects (570) for specifying an object trajectory. The self-driving vehicle comprises the apparatus (220) for detecting objects (570) within a reflective surface (101) for operating the vehicle according to the method for detecting objects (570) within a reflective surface (101) for automated driving support.

Description

SYSTEM AND METHOD FOR OBSTACLE-FREE DRIVING TECHNICAL FIELD
The present application relates to the field of automated driving vehicles, and more particularly to an apparatus and method for object detection in a reflective surface. Furthermore, the present application relates to a self-driving vehicle comprising the apparatus for operating the vehicle according to the method for automated driving support.
BACKGROUND
Automated driving vehicles need to perceive the environment, so that automated behavioral planning and navigation can be performed. Accurate estimation and identification of an obstacle-free path is often crucial to ensure safe vehicle operation. Environmental conditions can become complex, requiring the use of a traffic mirror. An example of such an environment is an intersection of roads, where the field of view of an approaching vehicle can partially or completely be occluded by the infrastructure of the environment. Human drivers possess the cognitive capacity to perceive traffic mirrors, locate relevant traffic participants in them and have the semantic understanding that the perceived reflected objects are situated elsewhere in the environment, i.e. outside the traffic mirror. Known automatic detection configurations seem less reliable when detecting traffic participants in traffic mirrors and might lack semantic understanding.
Object detection methods based on visual cameras, for example, are frequently used to perceive and track objects of interest in the environment. These object detection methods might be less reliable for automated driving purposes when detecting small  objects such as traffic participants in traffic mirrors becomes necessary. Current object detection methods commonly use Convolutional Neural Networks (CNN) as backbone structures. These networks often use convolutions and pooling layers to reduce the image resolution, abstract relevant features from corresponding pixels, and based on this, apply a high-quality classification algorithm. Nevertheless, spatial information at each pooling layer is reduced and small object features can get lost during the convolution process of the network. This might hinder reliable detection of small-sized objects in images.
Moreover, current driving automation configurations seem to lack semantic understanding of mirror detections especially regarding the position and motion direction of the detected objects. Even if available object detection systems incorporated in the automated vehicle would detect objects in the traffic mirror such as incoming vehicles, these detected vehicles may be interpreted as coming from the front side of the road. The reason for this might be a misinterpretation that reflected objects in the mirror are located elsewhere in the environment.
Therefore, there arises a need to address the aforementioned technical drawbacks in existing systems or technologies for reliable estimation of position and motion direction of objects detected in reflective surfaces.
SUMMARY
It is an object of the invention to provide an apparatus and method for object detection in a reflective surface having reliable object detection and tracking capabilities. Such an apparatus and method are particularly useful for obstacle-free automated driving.
The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, there is provided an apparatus comprising at least one processor configured to: receive a sequence of input image frames; detect a reflective surface in the sequence of input image frames; perform object detection within the detected reflective surface; and allocate one or more detected objects for specifying an object trajectory.
The apparatus can be implemented in any vehicle or other device used for deployment on embedded systems for automated driving support. Furthermore, the apparatus can be used in other computer vision tasks that require detection, tracking and semantic assignment of multiple classes of objects in reflective surfaces.
In some implementations the apparatus may be referred to as an ADAS (advanced driver-assistance system), which is an electronic system that assists drivers in driving and parking functions.
The one or more processors may be embodied as any type of processor for performing the described operations. For instance, the processor may be embodied as a single or multi-core processor (s) , microcontroller, or other processing/controlling circuit. The processor is configured to assess the relevance of object detections to support the decision-making process of a host vehicle comprising the apparatus, so that lane incorporation or passing maneuvers are performed reliably.
A memory may be coupled to the processor and may be embodied as any type of volatile or non-volatile memory for short-term or long-term data storage, for performing the operations described herein. The memory may comprise hard disk drives, solid-state drives, or other data storage devices. The memory can store image data, training data, other data used for object detection as well as various software and drivers used during operation of the apparatus. The memory of the apparatus can comprise a high-definition (HD) map of the environment.
The sequence of input image frames comprises images showing a reflective surface and objects which are to be detected as relevant or not relevant. Such objects comprise vehicles, street topology, and environmental features such as trees, traffic signs, road signs, etc. The apparatus can further comprise an image capturing device configured to capture a video sequence formed by the set of input image frames.
The reflective surface is detected in the sequence of input image frames. The reflective surface may be detected over the whole sequence or in a set of images in the sequence. Detecting the reflective surface, such as a traffic mirror, for example, provides a first reliable detection of relevant objects. In other words, the detection of the reflective surface at least indicates a more complex environment. Also, more complex object detection conditions are initiated. This will become more apparent from further details provided in the following implementations.
Object detection is performed within the reflective surface. By doing so, the detected objects are readily classified as being located elsewhere in the environment than at the location of the reflective surface. Misinterpretations of detected object locations can thus be avoided.
For specifying the object trajectory, e.g. direction, motion translation, the detected  objects are allocated. Thus, an object is not only detected in the reflective surface, but it is further classified, tracked and if relevant included in a following decision procedure.
The detection of objects in a reflective surface along with allocating the detected objects for specifying the relevant object trajectories provides reliable detection, tracking and semantic assignment of multiple classes of objects in reflective surfaces. Additional sensors, such as in known approaches, are not required.
In a possible implementation form of the apparatus according to the first aspect, the at least one processor is further configured to crop the detected reflective surface from one or more input frames, providing a cropped image comprising the cropped pixels of a corresponding bounding box. The cropping includes retrieving a sequence of cropped images of the resulting bounding boxes of the reflective surface detections or a similar marking. In the following, a single cropped image or the sequence of cropped images is referred to as the cropped image. The area of the reflective surface is cropped; thus, only the region of interest in the sequence of images is provided for further processing. The resulting cropped images comprise the pixels of the input image within the corresponding bounding box. The bounding box is selected to include the detected reflective surface. The objects are detected within the cropped image. In some implementations at least one of the detected objects is a host vehicle or further vehicles that can communicate with the host vehicle via vehicle-to-vehicle (V2V) communication services.
In a possible implementation form of the apparatus according to the first aspect, the at least one processor is further configured to perform a high-resolution object detection over the cropped image. This provides for a reduced loss of image resolution.
In a possible implementation form of the apparatus according to the first aspect, wherein allocating of the one or more detected objects further comprises at least one of: classifying the one or more detected objects, assigning the one or more detected objects to an environment topology, and tracking the assigned one or more detected objects. Classifying the detected objects, e.g. as relevant, less relevant or not relevant, provides for improved or fast object detection by focusing on object of interest. Some objects are classified as street topology and the like, for example, so that only these objects are assigned to an environment topology. Such environment topology may be provided as internal or external HD map. The assigned objects are tracked. In some implementations, the tracking is performed in the local coordinate system of the reflective surface. In some implementations tracking is performed in a coordinate system of the host vehicle.
In a possible implementation form of the apparatus according to the first aspect, the at least one processor is further configured to calibrate the cropped image using distortion correction. Known position and geometric information, e.g. of the host vehicle or other communicating vehicles, can be used to correct the cropped image for distortions. For this, radial and tangential distortion parameters from the comparison of the distorted and the known geometry is used.
In a possible implementation form of the apparatus according to the first aspect, the at least one processor is further configured to calibrate extrinsic parameters of the cropped image with provided or detected positions and geometry of the environment topology. Known position and geometric information, e.g. of the host vehicle or other communicating vehicles, is retrieved to estimate the translational and rotational calibration parameters of the cropped image by approximating the translation and rotation parameters between the reflective surface and the host vehicle through triangulation methods and comparison of the host vehicle's known geometry across different video frames.
In a possible implementation form of the apparatus according to the first aspect, the apparatus further comprises a communication subsystem configured to use wireless technology and associated protocols to enable communication between the apparatus and other remote computer networks or devices. These include, but are not limited to, vehicles with automated driving capabilities via vehicle-to-vehicle (V2V) communication or intelligent infrastructure devices via vehicle-to-infrastructure (V2I) communication.
In some implementations it is required for the apparatus to incorporate solutions and approaches to correct the cropped image for distortions and to estimate the position and motion direction of the detected objects for a correct semantic understanding of the detections. Besides, in some implementations, it is desirable to provide the apparatus, e.g. the automated driving system, with decision-making support for safe navigation, taking advantage of the information gained by detecting and tracking of traffic participants in the reflective surface, for example a traffic mirror.
According to a second aspect, there is provided a method of object detection. The method is performed by an apparatus. The method comprises: receiving a sequence of input image frames; detecting a reflective surface in the sequence of input image frames; performing object detection within the detected reflective surface; allocating one or more detected objects for specifying an object trajectory.
In a possible implementation form of the method according to the second aspect, the method further comprises cropping the detected reflective surface from one or more input frames and providing a cropped image comprising cropped pixels of a corresponding bounding box.
In a possible implementation form of the method according to the second aspect, the method further comprises performing a high-resolution object detection over the cropped image.
In a possible implementation form of the method according to the second aspect, wherein allocating of the one or more detected objects further comprises at least one of: classifying the one or more detected objects; assigning the one or more detected objects to an environment topology; and tracking the assigned one or more detected objects. According to another aspect, there is provided a method for classifying the detected objects, assigning them to the street topology and using the calibrated mirror image to track other detected objects, such as traffic participants, for example.
In a further implementation form of the method according to the second aspect, the method further comprises tracking the trajectory of the assigned one or more detected objects in a coordinate system external to the reflective surface. This may comprise tracking the position and velocity of the relevant objects in the host vehicle coordinate system. Thus, the objects can be detected correctly and reliably, at least regarding their true location and velocity. In a further implementation the method comprises calculating an approximation of the surface curvature of the convex surface of the reflective surface using distortion parameters and providing a predetermined curvature model for generic reflective surfaces. Based on the estimated curvature or the reflection properties of the reflective surface, the poses of other detected objects, such as cars, can be estimated, using the position of the reflection in the reflective surface relative to the camera pose. To associate areas within the reflective surface with the road topology, and thereby associate detected objects with the road topology, rays connecting the reflective surface and the road topology can be traced according to the reflection properties of the reflective surface. The selection of suitable rays is determined by the distance of the road topology that should be associated with the reflective surface.
In a further implementation form of the method according to the second aspect, the method further comprises predicting a position of the detected object based on the assignment of the object detections to street topology. In other words, an enclosed area on the ground that is delimited by the traced rays (explained above) is predicted as a rough estimation of the position of the detected objects, e.g. traffic participants, based on the assignment of the object detections to the street topology.
In a further implementation form of the method according to the second aspect, the method is implemented by an apparatus of a vehicle and further comprises: detecting at least one of the objects as the vehicle; retrieving positions and geometry of the vehicle; and estimating radial and tangential distortion coefficients by comparing distorted positions of the vehicle in a coordinate system of the reflective surface with the retrieved positions and geometry of the vehicle. Thus, the host vehicle can be used to estimate distortion coefficients. Vehicle-to-vehicle (V2V) communication services can also be used for retrieving positions and geometry of the vehicle. The vehicle may be the host vehicle, another vehicle, or a vehicle communicating with the host vehicle.
Furthermore, poses of objects observed in the reflective surface can be estimated using multiple captured images for triangulation, while the camera-equipped vehicle or connected vehicles move relative to the reflective surface. In a further implementation form of the method according to the second aspect, the method further comprises assessing the relevance of the one or more detected objects and determining a Go state or No-go state for supporting the planning of a lane incorporation or passing maneuver. In other words, there is provided a method for decision-making support of a host vehicle based on the information obtained from objects detected and tracked in reflective surfaces, e.g. traffic mirrors. The relevance of objects is assessed based on previous information such as street topology assignment, object classes, positions and motion characteristics to determine the viability of a safe lane incorporation or passing maneuver of the host vehicle.
According to a third aspect, there is provided a self-driving vehicle comprising the apparatus for detecting objects within a reflective surface for operating the vehicle according to the method for detecting objects within a reflective surface for automated driving support.
Implementation forms of the invention can thus provide a traffic scheduling concept with a flexible traffic scheduler that is open to further developments in the field of algorithms, in particular faster algorithms, while at the same time providing the superior performance of a hardware-based traffic scheduler.
Furthermore, the unknown orientation of the image of the reflective surface, e.g. a traffic mirror image, and its distortion due to the surface curvature pose a great challenge for reliable object detection within the reflective surface. The correct estimation of the position and motion direction of the detected objects is crucial for a correct semantic understanding of reflective surface detections, e.g. mirror detections. Above, it has been described how this can be achieved. More details are provided in the detailed description.
The method according to the second aspect of the invention can be performed by the apparatus according to the first aspect of the invention. Further features and implementation forms of the method according to the second aspect of the invention correspond to the features and implementation forms of the apparatus according to the first aspect of the invention.
The method according to the second aspect can be extended into implementation forms corresponding to the implementation forms of the apparatus according to the first aspect. Hence, an implementation form of the method comprises the feature (s) of the corresponding implementation form of the apparatus.
The advantages of the methods according to the second aspect are the same as those for the corresponding implementation forms of the apparatus according to the first aspect.
According to a fifth aspect, there is provided a computer program product including a non-transitory computer-readable storage medium having computer-readable instructions stored thereon. The computer-readable instructions are executable by a computerized device comprising processing hardware to execute the method.
BRIEF DESCRIPTION OF DRAWINGS
The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:
FIG. 1 illustrates an exemplary environment, where object detection is required;
FIG. 2 shows a schematic of a vehicle and a reflective surface as shown in Fig. 1;
FIG. 3 is a block diagram showing Reflective Surface Perception and Go/No-go Decision;
FIG. 4 is a flowchart of the Go/No-go decision for a vehicle;
FIG. 5 is a block diagram for Object Detection;
FIG. 5a illustrates the Object Detection of FIG. 5;
FIG. 6 is a block diagram of an Image Calibration technique;
FIG. 7 is a block diagram of an Allocating technique for detected objects;
FIG. 8 is a block diagram showing an interaction between Object Detection, Allocation of detected objects and Go/No-go Decision according to various embodiments of the invention; and
FIG. 9 is a block diagram of a further example of an Allocating technique for detected objects.
In the following, identical reference signs refer to identical or at least functionally equivalent features, if not explicitly specified otherwise.
DETAILED DESCRIPTION OF THE DRAWINGS
In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and which show, by way of illustration, specific aspects of embodiments of the invention or specific aspects in which embodiments of the present  invention may be used. It is understood that embodiments of the invention may be used in other aspects and comprise structural or logical changes not depicted in the figures. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding apparatus or system configured to perform the method and vice versa. For example, if one or a plurality of specific method steps are described, a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps) , even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units) , even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.
The present invention is directed to object detection in reflective surfaces, e.g. a traffic mirror, and the respective semantic assessment of the detections for decision-making support in automated driving. The described sensors, processors, memory devices as well as detection and tracking systems may be embodied in hardware, software or firmware of a vehicle, or a combination thereof. The apparatus and method described below may also be regarded as particularly directed to the handling of traffic mirrors for safe navigation in the context of automated driving.
FIG. 1 illustrates an exemplary environment where object detection is required. A host vehicle 102 with automated driving capabilities is located in the environment 100. The automated driving capabilities of the vehicles comprise all levels of automation, e.g. Level 2 (partial driving automation), Level 3 (conditional automation) or Level 5 (full automation), wherein the levels are according to the definition of the Society of Automotive Engineers. The environment 100 provides an intersection of roads, where two incoming vehicles 103 and 104 approach the intersection. The field of view of the approaching host vehicle 102 is partially or completely occluded by infrastructure 105 of the environment 100. For safe traffic operations a reflective surface 101, for example a traffic mirror, is used.
The host vehicle 102 uses the reflective surface 101 for perceiving the incoming vehicle 103. Thus, the host vehicle 102 can safely perform a passing or lane incorporation maneuver. As will be explained with reference to the other figures, the host vehicle 102 comprises an apparatus for detecting objects within the reflective surface 101.
Accordingly, the use of reflective surfaces 101 also becomes necessary in other environments where the field of view of automated vehicles is in any way occluded. Such environments include, but are not limited to, parking lot entrances and exits, narrow-shaped intersections and pedestrian crossings.
FIG. 2 shows a schematic of the host vehicle 102 and the reflective surface 101 of FIG. 1. An image capturing device 210 is configured to capture a video sequence formed from a set of input image frames, wherein the images include the reflective surface 101. The image capturing device 210 is, for example, a calibrated monocular camera.
The vehicle 102 comprises an apparatus 220. The apparatus 220 comprises a processor 225. Optionally, a communication subsystem 240 is also provided. The processor 225 is configured to receive the sequence of input image frames and detect the reflective surface 101 in the input image frames. Furthermore, the processor 225 is configured to perform object detection within the reflective surface 101. The detected objects include but are not limited to traffic participants and street topology.
The processor 225 is configured to track the detected objects over at least one input image frame. In this way, one or more lists of detected objects and their estimated locations can be provided to a decision-making support process implemented in the processor 225 for trajectory planning purposes of the host vehicle 102. The list (s) of detected objects may be stored in a memory 230 of the apparatus 220. Thus, one or more detected objects are allocated for specifying an object trajectory.
The processor 225 is configured to crop the detected reflective surface from one or more input image frames, thus providing a cropped image. The cropped image comprises the cropped pixels of the image of the detected reflective surface 101, e.g. a traffic mirror.
The memory 230 of the apparatus 220 stores a high-definition (HD) map 231 of the environment. Additional memory space 232 can be provided for detection, tracking and image processing tasks performed by the apparatus 220. The apparatus 220 provides an implementation of the methods shown in FIG. 3, for example.
As shown in FIG. 2 the reflective surface 101 provides a local coordinate system CS. The host vehicle 102 provides a coordinate system Host Vehicle VCS. The use of these coordinate systems will become more apparent in the description of the following figures.
FIG. 3 is a block diagram of a procedure 300 for handling reflective surfaces 101. The procedure 300 can be divided into two methods: Reflective Surface Perception 305 and Go/No-go Decision 400. Both methods run in parallel, optionally exchanging resulting data or signals after performed steps.
The Reflective Surface Perception 305 encompasses three sub-methods to perceive and assign a semantic meaning to reflective surfaces 101 in driving scenarios: Object Detection 500 for detecting objects in reflective surfaces 101, Image Calibration 600 for calibrating and correcting the reflected image for distortion, and Allocation 700 for classifying the detected objects, assigning them to a street topology and using the calibrated image of the reflective surface 101 to track the detected objects.
The steps of the Reflective Surface Perception 305 are not necessarily presented in any particular order, and performance of the procedure 300 in an alternative order is possible and contemplated. In particular, the sub-methods 500, 600 and 700 need not necessarily be executed, or be executed in sequence; one or more of the steps of each may be intercalated to the best convenience of the overall efficiency of the perception procedure. The full procedure 300 or part(s) of it may be implemented in one or more of the elements of the apparatus 220.
The Go/No-go Decision 400 provides decision-making support for automated vehicles in driving scenarios requiring the use of reflective surfaces. The Go/No-go Decision 400 comprises retrieving information about positions, motion characteristics, moving directions and classes of the detected objects to determine the viability of a safe lane incorporation or passing maneuver of the host vehicle 102. This information is gained by the Reflective Surface Perception 305 and passed to the Go/No-go Decision 400, which runs as a parallel, continuous supervision loop to support the planning of lane incorporation or passing maneuvers, for example.
Accordingly, the Go/No-go Decision 400 determines the relevance of incoming traffic participants based on object detections in the reflective surface retrieved from the Object Detection 500 and from the parallel Allocation 700 containing processed information or data regarding the detections. Based on these inputs, the Go/No-go Decision 400 accordingly concludes a “Go” state 408 or “No-Go” state 407 explained in more detail in FIG. 4.
FIG. 4 is a flowchart of a Go/No-go decision of a host vehicle. At step 401, object detection(s) in the reflective surface are retrieved from the Object Detection 500. In the following steps 402 to 405, the detected objects in the reflective surface 101 are assessed for several relevant parameters, such as their classes, their locations and their movement direction. The objects are assessed regarding their relevance to the safety of car maneuvers, e.g. merging into a lane or road, a passing maneuver, or similar. This assessment is largely based on the inputs from the parallel Allocation 700 (which is described in more detail below).
At step 402, the relevance of the detected objects within the reflective surface is deduced based on the classes of the detected objects. For example, motorized vehicles and bicycles are relevant traffic participants that need to be considered to avoid possible accidents. However, other detected objects such as trees or even small animals may not be relevant to the decision process. In this sense, the classes of all current detections are assessed for relevance, and the residual object detections, whose classes are considered relevant or unknown, are passed on to the next step 403. If the classes of all current detected objects are assessed as irrelevant, the method enters the “Go” state 408.
Similarly, at step 403, the locations of the detected objects are assessed for relevance. Traffic participants assessed as relevant in the previous step 402 need to be found in relevant locations, such as on the street within the traffic lanes, to be considered as potentially hazardous for the host vehicle. Objects that are found in irrelevant locations such as parking lots or garages are not relevant for the decision-making process. Thus, the locations of all the current detections are assessed for relevance, and the residual object detections, whose locations are considered relevant or unknown, are passed on to the next step 404. If the locations of all current detected objects are assessed as irrelevant, the method enters the “Go” state 408.
At step 404, the moving directions of the detected objects are evaluated. Relevant traffic participants found in relevant locations need to be moving towards the host vehicle to pose a real threat of an accident. For example, if a vehicle is detected on the street within the lanes but is moving away from the host vehicle, it is safe to assume that the host vehicle can safely perform an incorporation or passing maneuver. Therefore, the moving directions of all the current object detections are evaluated for relevance, and the remaining object detections, whose moving directions are considered relevant, are passed on to the next step 405.
If the moving directions of all current detected objects are assessed as irrelevant, the method enters the “Go” state 408. However, if the moving direction of at least one object detection is unknown, the method enters a “Stand-by” state 406, where the host vehicle is required to wait until the relevant detected objects have either left the scene, are not moving, or are not in relevant locations anymore. If this condition is fulfilled, the “Stand-by” state 406 proceeds to the “Go” state 408. At step 405, the estimated distances between the remaining relevant detected objects and the host vehicle are retrieved from the Allocation 700 method and evaluated to determine if the relevant traffic participants found in relevant locations and moving towards the host vehicle are within safe distances for the host vehicle to perform a lane incorporation or passing maneuver.
If all the distances between all current detected objects and the host vehicle are evaluated as safe, the method enters the “Go” state 408 and the host vehicle performs the maneuver. Otherwise, if at least one of the distances is evaluated as not safe, the method enters the “No-Go” state 407 and immediately continues to the “Stand-by” state 406, to wait for the incoming traffic participants to pass. If at least one of the distances between the host vehicle and the incoming traffic participants cannot be estimated or cannot be assessed, the method directly enters the “Stand-by” state 406.
The host vehicle 102 can start the maneuver only if the Go/No-go Decision enters the “Go” state 408; otherwise the maneuver will not take place.
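The decision cascade of steps 402 to 405 can be sketched as a small filter chain. The sketch below is illustrative only: the class set, topology labels, object fields and distance threshold are assumptions chosen for the example, not values taken from this disclosure.

```python
# Illustrative sketch of the Go/No-go cascade of FIG. 4 (steps 402-405).
# All names and thresholds below are assumptions, not from the disclosure.
from dataclasses import dataclass
from typing import Optional

RELEVANT_CLASSES = {"car", "truck", "bicycle", "unknown"}   # assumed class set
RELEVANT_LOCATIONS = {"lane", "crossing", "unknown"}        # assumed topology labels
SAFE_DISTANCE_M = 30.0                                      # assumed safety threshold

@dataclass
class Detection:
    cls: str                      # object class (step 402)
    location: str                 # street-topology assignment (step 403)
    approaching: Optional[bool]   # moving towards host? None = unknown (step 404)
    distance_m: Optional[float]   # distance in host VCS, None = unknown (step 405)

def go_no_go(detections):
    # Steps 402/403: keep only detections whose class and location are
    # relevant or unknown; if nothing remains, it is safe to go.
    remaining = [d for d in detections
                 if d.cls in RELEVANT_CLASSES and d.location in RELEVANT_LOCATIONS]
    if not remaining:
        return "GO"
    # Step 404: an unknown moving direction forces the Stand-by state.
    if any(d.approaching is None for d in remaining):
        return "STANDBY"
    moving = [d for d in remaining if d.approaching]
    if not moving:
        return "GO"
    # Step 405: every approaching object must be at a safe, known distance.
    if any(d.distance_m is None for d in moving):
        return "STANDBY"
    if all(d.distance_m > SAFE_DISTANCE_M for d in moving):
        return "GO"
    return "NO_GO"   # proceeds to Stand-by in FIG. 4
```

The cascade mirrors the flowchart: each stage may short-circuit to “Go”, and any unresolved information degrades to “Stand-by” rather than risking an unsafe maneuver.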
FIG. 5 is a block diagram for Object Detection 500. At step 501, a sequence of input images, e.g. from a video, is received. The input images can be captured from a video or image capturing device. The image sequence comprises objects, of which at least one is a reflective surface. At step 502, the reflective surface 101 is detected in one or more input frames. For detecting the reflective surface 101, an exemplary object detection algorithm such as, but not limited to, the “Single shot multibox detector” by Liu, Wei, et al. (in the following referred to as SSD) can be used. This, or any other object detection algorithm, can be specifically configured and trained for detecting reflective surfaces, in particular traffic mirrors.
At step 503, the pixels of the image frames of the detected reflective surface 101 area, e.g. a traffic mirror, are cropped, providing a cropped image for further processing. For this, a bounding box is placed over the image, as explained in more detail with reference to FIG. 5a.
At step 504, the bounding box discretization space and network configuration of the previously used detection algorithm is adapted into a set of smaller anchor boxes. Subsequently, at step 505, a high-resolution object detection is performed over the cropped image with the modified discretization space configuration. By adapting the discretization space and network configuration of the detection model into a set of smaller default anchor boxes, the detection resolution is increased within the selected cropped image. This modified configuration of the object detection algorithm (e.g. an adapted and/or trained SSD algorithm) allows performing a reliable object and street topology detection within the cropped image.
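The two-stage detection of FIG. 5 — detect the mirror, crop it, then re-detect with smaller anchor boxes — can be sketched as follows. The SSD network itself is outside the sketch; the pixel box format (x, y, w, h) and the proportional anchor rescaling are illustrative assumptions.

```python
# Hedged sketch of steps 503/504: crop the detected mirror region and
# shrink the anchor-box grid in proportion, so re-detection inside the
# crop runs at a higher effective resolution.
import numpy as np

def crop_mirror(frame, box):
    """Step 503: cut the detected reflective-surface pixels out of the frame.
    box is an assumed (x, y, w, h) pixel rectangle."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

def rescale_anchors(anchors, crop_shape, frame_shape):
    """Step 504: shrink the default anchor boxes in proportion to the crop
    size, so objects that covered only a few pixels of the full frame are
    matched by appropriately small default boxes inside the crop."""
    sy = crop_shape[0] / frame_shape[0]
    sx = crop_shape[1] / frame_shape[1]
    return [(max(1, round(w * sx)), max(1, round(h * sy))) for (w, h) in anchors]

# Hypothetical 800x600 frame with a mirror detected at (100, 50, 200, 150).
frame = np.zeros((600, 800, 3), dtype=np.uint8)
crop = crop_mirror(frame, (100, 50, 200, 150))
small = rescale_anchors([(64, 64), (128, 96)], crop.shape[:2], frame.shape[:2])
```

The rescaled anchors would then replace the network's default boxes for the second detection pass over the crop.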
FIG. 5a illustrates the Object Detection 500 by using an exemplary image 551 received from a set of input image frames. The object detection described here for one image, is also applicable to a sequence or set of input image frames.
Usually, networks scan the image with sets of fixed-sized bounding box grids that are mapped to convolution layers. On the one hand, if the size of the bounding boxes is too big, the algorithm will have difficulty detecting the reflective surface, let alone the objects within it. On the other hand, if the size of the bounding boxes is too small, the detection becomes slow and resource intensive. Accordingly, it is proposed to choose an appropriate size of the fixed-sized bounding boxes, which is crucial for the performance and accuracy of the network. For example, the size can be chosen to not be larger than the largest reflective surfaces in the area surrounding the car that should be detected for the evaluation. The size of the used bounding box reflects a compromise between scanning resolution and computational efficiency.
The image 551 comprises objects 570, of which at least one is a reflective surface 101, in this case a traffic mirror. The image shows the reflective surface 101 attached to a rod 554, with a road sign 555. In the reflective surface 101 area other objects 570 can be seen.
The reflective surface 101 is detected in the image (FIG. 5a (a)). For this, a bounding box 556 discretization space and network configuration is used. The bounding box 556 slides over the image 551 vertically and horizontally for applying the detection of the reflective surface. The dots are sampling points of the image, and each sampling point is used as the center point of a box in the SSD algorithm. The same applies to FIG. 5a (c).
The pixels of the image of the detected reflective surface 101 area, here the traffic mirror, are cropped, as can be seen in FIG. 5a (b). A cropped image 560 is provided for further processing. As can be seen in FIG. 5a (c), the cropped image 560 comprises the objects 570 already present in FIG. 5a (a, b), however at a smaller size.
A high-resolution object detection is performed over the cropped image 560, as shown  in FIG. 5a (c) . An adapted bounding box 561 discretization space and network configuration of the previously used detection algorithm is used. A set of smaller anchor boxes is used, as can be seen in FIG. 5a (c) .
Subsequently, a high-resolution object detection is performed over the cropped image 560 with a modified discretization space configuration. By adapting the discretization space and network configuration of the detection model into a set of smaller default anchor boxes, the detection resolution is increased within the selected cropped image 560. This modified configuration of the object detection algorithm (e.g. an adapted and/or trained SSD algorithm) allows performing a reliable object and street topology detection within the cropped image 560.
As can be seen in FIG. 5a (d), objects 570 are detected within the cropped image 560. Some of the detected objects 570 are to be classified as traffic participants, such as the vehicles 552 and 553. Other objects 570 are classified as street topology 571.
FIG. 6 shows a block diagram of an Image Calibration 600 technique or method. At step 601, the cropped image or a sequence of cropped images of the resulting bounding boxes of the reflective surface detections is received. The cropped image or images are generated as described in FIG. 5, steps 503 to 505. After receiving the cropped image(s) of the detected reflective surface, at step 602 the host vehicle 102 is detected in the cropped image(s) with the modified configuration of the object detection mentioned at step 504 (FIG. 5). Alternatively or additionally, in other embodiments, other communicating vehicles may be detected with V2V (vehicle-to-vehicle) or V2I (vehicle-to-infrastructure) communication services, e.g. via 5G or the like.
At step 603, a known position and geometry, such as height, width and length of the vehicle (in the following referred to as geometric information), of the host vehicle 102 or other communicating vehicles is retrieved. At step 604, the cropped image is corrected for distortion caused, for example, by the curvature of the reflective surface. In case the reflective surface is a convex mirror (a traffic mirror is also referred to as a convex or curved mirror), a fisheye mirror or a diverging mirror, the outward bulge or curve of the mirror expands the visible field, reflecting a wider field of view than a non-curved mirror. The cropped image can be calibrated: the position and rotation of the reflective surface with respect to the position of the host vehicle 102 are estimated and corrected.
To increase the availability of information for image calibration, V2V communications can be used, so that the host vehicle 102 can further retrieve positions and geometry of other communicating vehicles that are also detected within the cropped image. The intention is to use the geometry and position of the host vehicle 102 as the main information source for the distortion correction and image calibration. The V2V information can then be used as additional information for achieving higher accuracy, or for the case that the host vehicle is not visible within the reflective surface.
Referring to FIG. 4, use of V2V communications, therefore, increases the ability of the Go/No-Go Decision 400 to enter the “Go” state 408 and thus increases availability of the automated driving functionality. Moreover, in addition to V2V communications, information about the environment, such as the position and orientation of the reflective surface can be retrieved via V2I communication, if available. This enhances the accuracy and reliability of the presented system further.
Distortion correction is performed at step 604, so that the cropped image is corrected for distortion. The known geometries of the host vehicle and other communicating vehicles, if any, are used to map the pixels of detected distorted structural features to the known geometry. In this way, by comparing distorted and known geometry, radial and tangential distortion coefficients and intrinsic calibration parameters of an image calibration model (such as, but not limited to, a fish-eye calibration model) are estimated. Subsequently, the image is adapted accordingly, producing an undistorted image of the reflected surface.
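The coefficient estimation of step 604 can be illustrated with the common polynomial radial-distortion model x_d = x_u (1 + k1 r^2 + k2 r^4). The sketch below fits (k1, k2) by least squares from matched undistorted/distorted points, using synthetic data in place of real vehicle detections; the model choice and the point correspondences are assumptions, and tangential terms are omitted for brevity.

```python
# Hedged sketch: recover radial distortion coefficients by comparing
# observed (distorted) points with points predicted from known geometry.
import numpy as np

def fit_radial_distortion(undist, dist):
    """Least-squares fit of (k1, k2) from matched points given relative to
    the distortion centre. undist, dist: (N, 2) arrays."""
    undist = np.asarray(undist, float)
    dist = np.asarray(dist, float)
    r2 = np.sum(undist ** 2, axis=1)          # r^2 per point
    # Per coordinate: dist - undist = undist * (k1*r^2 + k2*r^4),
    # which is linear in (k1, k2).
    u = undist.ravel()                        # rows: [x0, y0, x1, y1, ...]
    r2rep = np.repeat(r2, 2)                  # matching r^2 per row
    A = np.column_stack([u * r2rep, u * r2rep ** 2])
    b = (dist - undist).ravel()
    k, *_ = np.linalg.lstsq(A, b, rcond=None)
    return k                                   # (k1, k2)

# Demo on synthetic data: distort points with known coefficients, recover them.
rng = np.random.default_rng(0)
pts = rng.uniform(-0.5, 0.5, size=(20, 2))
r2_true = np.sum(pts ** 2, axis=1, keepdims=True)
distorted = pts * (1 - 0.2 * r2_true + 0.05 * r2_true ** 2)
k1, k2 = fit_radial_distortion(pts, distorted)
```

In the described method, the "known" points would come from the retrieved vehicle geometry rather than synthetic data, and a full fish-eye model with tangential and intrinsic parameters would replace this two-coefficient fit.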
After retrieving undistorted images from different image frames, image calibration is performed at step 605. Image calibration 605 is included for estimating translational and rotational calibration parameters (extrinsic calibration parameters) of the reflective surface 101 with respect to the host vehicle 102. This is accomplished by, firstly, calculating a translation vector between the reflective surface and the host vehicle by triangulation methods, and then calculating a rotation matrix of the reflective surface by comparison of the host vehicle's known geometry during different, consecutive image frames. The accuracy of this calculation can optionally be increased by incorporating retrieved geometries of V2V-capable vehicles.
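The triangulation of the translation vector in step 605 can be sketched in the 2D ground plane: two bearing rays towards the mirror, observed from two host poses along the trajectory, are intersected. Exact bearings and planar geometry are simplifying assumptions of this sketch.

```python
# Hedged sketch of ray-intersection triangulation for the mirror position.
import numpy as np

def triangulate(p1, d1, p2, d2):
    """Intersect two 2D rays p + t*d (directions need not be normalized)."""
    p1, d1 = np.asarray(p1, float), np.asarray(d1, float)
    p2, d2 = np.asarray(p2, float), np.asarray(d2, float)
    # Solve p1 + t1*d1 = p2 + t2*d2 for (t1, t2).
    A = np.column_stack([d1, -d2])
    t1, _ = np.linalg.solve(A, p2 - p1)
    return p1 + t1 * d1

# Host observes the mirror from two poses 4 m apart along its trajectory;
# the bearings are hypothetical example values.
mirror = triangulate([0.0, 0.0], [1.0, 1.0], [4.0, 0.0], [-1.0, 1.0])
```

The resulting point, expressed in the host vehicle coordinate system, would serve as the translation part of the extrinsic calibration; nearly parallel rays (a degenerate `A`) would have to be rejected in practice.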
FIG. 7 is a block diagram of an Allocating 700 technique or method for classifying the detected objects, assigning them to the street topology and using the calibrated cropped image to track them. As mentioned with reference to FIG. 4, the detected objects are provided to the Go/No-go Decision 400 method along with the necessary inputs for relevance estimation of the respective detected objects, as described in FIG. 4.
At step 701, the detected objects are classified into predefined categories to determine objects of relevant classes, for example motorized vehicles or bicycles, also referred to as detected traffic participants in the following. Furthermore, some objects are classified as belonging to the environment, such as a street, trees, a walkway or traffic signs, also referred to as detected street topology in the following.
At step 702, the detected street topology in the cropped image (including but not limited to street lanes) is mapped to the street topology of an HD map 231. Such an HD map can be provided as the internal HD map 231. At step 703, the detected traffic participants are assigned to the recognized street topology (after mapping step 702), to provide a rough location assessment of the detected objects.
At step 704, the detected traffic participants are tracked within the cropped image in the local coordinates CS of the reflective surface. Thus, the relevance of the motion direction of the detected traffic participants is determined and assessed. At step 705, position and velocity of the relevant detected objects are estimated in the host vehicle coordinate system to verify a safe distance between the detected object and the host vehicle.
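Steps 704 and 705 can be sketched as a minimal finite-difference tracker in the host vehicle coordinate system: the latest velocity estimate decides whether the object is approaching, and the range to the host (at the origin of the VCS) feeds the distance check of step 405. A constant frame interval and already calibrated positions are assumptions of this sketch.

```python
# Hedged sketch of position/velocity tracking in the host vehicle
# coordinate system (host at the origin).
import numpy as np

def track(positions, dt):
    """positions: (N, 2) object positions in the host VCS over N frames;
    dt: assumed constant frame interval in seconds."""
    positions = np.asarray(positions, dtype=float)
    velocity = (positions[-1] - positions[-2]) / dt      # latest velocity estimate
    distance = float(np.linalg.norm(positions[-1]))      # range to host (origin)
    # Closing speed: component of the velocity pointing towards the host.
    closing = float(np.dot(velocity, -positions[-1])) / max(distance, 1e-9)
    return distance, velocity, closing > 0.0             # approaching?

# Hypothetical object 10 m to the side, closing 8 m/s along x at 2 Hz frames.
dist, vel, approaching = track([[40.0, 10.0], [36.0, 10.0], [32.0, 10.0]], dt=0.5)
```

A production tracker would smooth the finite differences (e.g. with a Kalman filter) instead of using the raw two-frame estimate.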
FIG. 8 is a block diagram showing an interaction between Object Detection, Allocation of detected objects and Go/No-go Decision according to various embodiments described above. The Reflective Surface Perception 305 as well as the Go/No-go Decision 400 methods are implemented by the host vehicle 102 with automated driving capabilities.
FIG. 8 provides in detail a possible composition of the methods described above and the relations between them. The Go/No-go Decision 400 and the Reflective Surface Perception 305 run in parallel. Inputs are provided to the Go/No-go Decision 400 by the Reflective Surface Perception 305. The Go/No-go Decision 400 may consequently interrupt the Reflective Surface Perception 305, if the “Go” state 408 is reached.
The Reflective Surface Perception 305 comprises three sub-methods: the Object Detection 500, Image Calibration 600 and Allocation 700. Steps of the Image Calibration 600, namely Distortion Correction 604 and Image Calibration 605, are embedded within the Object Detection 500 and the Allocation 700, respectively. These steps may also receive direct input from the V2V communications subsystem 240.
Apart from the mentioned steps of Image Calibration 600, the steps occur in a sequential manner, where the Object Detection method 500 receives image input frames and provides the detected objects as output to the Allocation 700 for further processing.
Regarding Image Calibration 600, Distortion Correction 604 takes place once the reflective surface is detected and cropped. This allows a better object detection performance within the reflective surface in the following steps of the method. Image Calibration 605 is embedded in Allocation 700 and is performed before the detected relevant traffic participants are tracked in a host vehicle coordinate system at step 705. For this, the extrinsic calibration parameters of the reflected surface are needed.
The Go/No-go Decision 400 receives the detected objects from step 505 of the Object Detection 500 to start relevance determination. The relevance determination 402 retrieves its inputs, i.e. classes of objects, from Allocation 700, determines the relevance of the detected object and accordingly issues a “Go” state 408 or “No-go” state 407.
Step 701 of the Allocation 700 provides classes of the detected object (s) to the step 402 of the Go/No-go Decision 400 for the relevance estimation of the detection type. Accordingly, step 702 provides step 403 with associations of the detected objects and the street topology to determine the relevance of the location of the detected objects. Step 704 provides step 404 with one or more track list (s) of the detections in local  coordinates to evaluate the relevance of the moving direction of the detected objects. Finally, step 705 provides step 405 with the track lists of the detected objects in the host vehicle coordinate system to assess if the distance between the host vehicle and the detected objects is safe for the performance of a lane incorporation or passing maneuver. FIG. 8 shows the complete interaction of the methods and the flow of information between them.
FIG. 9 is a block diagram of a further example of an Allocating 700 technique or method for detected objects. Allocating 700 further comprises Tracking 900 of the relevant detected objects in the reflective surface with respect to the host vehicle coordinate system.
The position and velocity of the relevant detected objects are tracked to verify a safe distance between the detected objects and the host vehicle. At step 901, the curvature of the reflective surface is approximated using the distortion parameters estimated during Image Calibration 600 at step 604. The distortion parameters assume a predetermined curvature model of generic reflective surfaces, e.g. traffic mirrors, so that the curvature, if any, of the reflective surface can be calculated. This can be done, for example, by mapping the distortion parameters to a defined curvature. The curvature model may be continually adapted or extended over time to better fit a larger set of generic reflective surfaces.
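One plausible realization of such a mapping is a calibrated lookup table from a radial distortion coefficient to a mirror radius of curvature, interpolated piecewise-linearly. Both the table values and the assumption that the first radial coefficient k1 suffices are invented for illustration; a real table would be calibrated against mirrors of known geometry.

```python
# Hypothetical mapping: first radial distortion coefficient k1 -> radius of
# curvature (metres) of a spherical traffic mirror. Values are invented.
K1_TO_RADIUS = [(-0.40, 0.8), (-0.25, 1.5), (-0.10, 3.0), (-0.02, 10.0)]

def approximate_curvature(k1):
    """Approximate the mirror radius from k1 (step 901 analogue) by
    piecewise-linear interpolation, clamping outside the table range."""
    pts = sorted(K1_TO_RADIUS)
    if k1 <= pts[0][0]:
        return pts[0][1]
    if k1 >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= k1 <= x1:
            t = (k1 - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
```

Extending the table with new calibrated pairs over time corresponds to the adaptation of the curvature model described above.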
At step 902, after an approximation of the surface curvature has been calculated and the position and rotation of the reflective surface with respect to the position of the host vehicle 102 have been taken into account, the poses of other detected objects, such as cars, are estimated based on the estimated curvature or the reflection properties of the reflective surface, using the position of the reflection in the reflective surface relative to the camera pose. To associate areas within the reflective surface with the road topology, and thereby associate detected objects with the road topology, rays connecting the reflective surface and the road topology can be traced according to the reflection properties of the reflective surface. The selection of suitable rays is determined by the distance of the road topology that should be associated with the reflective surface. The enclosed areas of the ground that are delimited by the mentioned rays represent rough estimations of the positions of the detected traffic participants. With these position estimations, and given that the detected objects were assigned to the street topology in the previous step 703 of the Allocation 700, at step 903 a combined estimation of the positions of the detected traffic participants with respect to the host vehicle coordinate system is calculated.
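The ray-tracing idea above can be sketched for the simplest case: a locally planar mirror patch with known pose, where a ray from the camera is reflected by the law of reflection and intersected with the ground plane to obtain a rough position estimate. The geometry and function names are illustrative assumptions; the described method additionally accounts for the estimated curvature.

```python
import numpy as np

def reflect(d, n):
    """Reflect direction d about the (not necessarily unit) normal n,
    following the law of reflection: d' = d - 2 (d . n) n."""
    n = n / np.linalg.norm(n)
    return d - 2.0 * np.dot(d, n) * n

def ground_hit(origin, direction, ground_z=0.0):
    """Intersect a ray with the ground plane z = ground_z (the direction
    is assumed to point toward the ground)."""
    t = (ground_z - origin[2]) / direction[2]
    return origin + t * direction

def estimate_position(camera, mirror_point, mirror_normal):
    """Trace camera -> mirror -> ground to roughly locate a traffic
    participant seen in the mirror (step 902 analogue, planar patch)."""
    incoming = mirror_point - camera
    outgoing = reflect(incoming, mirror_normal)
    return ground_hit(mirror_point, outgoing)
```

Tracing several such rays for the corners of a detection's bounding box would delimit the enclosed ground area mentioned above.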
The invention has been described in conjunction with various embodiments herein. However, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.

Claims (15)

  1. An apparatus (220) comprising at least one processor (225) configured to:
    - receive (501) a sequence of input image frames;
    - detect (502) a reflective surface (101) in the sequence of input image frames;
    - perform (505) object detection within the detected reflective surface; and
    - allocate (700) one or more detected objects (570) for specifying an object trajectory.
  2. The apparatus according to claim 1, wherein the at least one processor is further configured to crop (503) the detected reflective surface from one or more input frames, providing at least one cropped image (560) comprising cropped pixels of a corresponding bounding box.
  3. The apparatus according to claim 2, wherein the at least one processor is further configured to perform a high-resolution object detection over the cropped image.
  4. The apparatus according to any of the preceding claims, wherein allocating of the one or more detected objects (570) further comprises at least one of: classifying the one or more detected objects, assigning the one or more detected objects to an environment topology, and tracking the assigned one or more detected objects.
  5. The apparatus according to any of the claims 2 to 4, wherein the at least one processor is further configured to calibrate (600) the cropped image using distortion correction.
  6. The apparatus according to any of the claims 2 to 5, wherein the at least one processor is further configured to calibrate extrinsic parameters of the cropped image with provided or detected positions and geometry of the environment topology.
  7. The apparatus according to any of the preceding claims, further comprising a communication subsystem (240) configured to use wireless technology and associated protocols to enable communication between the apparatus (220) and other remote computer networks or devices.
  8. A method of object detection implemented by an apparatus, comprising:
    - receiving (501) a sequence of input image frames;
    - detecting (502) a reflective surface in the sequence of input image frames;
    - performing (505) object detection within the detected reflective surface; and
    - allocating (700) one or more detected objects for specifying an object trajectory.
  9. The method according to claim 8, further comprising cropping the detected reflective surface from one or more input frames and providing a cropped image comprising cropped pixels of a corresponding bounding box.
  10. The method according to any of the claims 8 to 9, wherein allocating of the one or more detected objects further comprises at least one of:
    - classifying the one or more detected objects;
    - assigning the one or more detected objects to an environment topology; and
    - tracking the assigned one or more detected objects.
  11. The method according to any of the claims 8 to 10, further comprising tracking the trajectory of the assigned one or more detected objects in a coordinate system external to the reflective surface.
  12. The method according to claim 11, further comprising predicting a position of the detected object based on the assignment of the object detections to street topology.
  13. A method of operating a self-driving vehicle according to one of the claims 8 to 11, wherein the method is implemented by an apparatus of a vehicle, further comprising:
    - detecting at least one of the objects as the vehicle;
    - retrieving positions and geometry of the vehicle;
    - estimating radial and tangential distortion coefficients by comparing distorted positions of the vehicle in a coordinate system of the reflective surface with the retrieved positions and geometry of the vehicle.
  14. The method according to one of the claims 8 to 13, further comprising:
    - assessing relevance of the one or more detected objects;
    - determining a Go state or No-go state for supporting the planning of a lane incorporation or passing maneuver.
  15. Self-driving vehicle comprising the apparatus of one of the claims 1 to 7 for operating the vehicle according to one of the claims 8 to 14 for automated driving support.
PCT/CN2021/091451 2021-04-30 2021-04-30 System and method for obstacle-free driving WO2022226989A1 (en)

Publications (1)

WO2022226989A1, published 2022-11-03



Also Published As

EP4274770A1 (published 2023-11-15); EP4274770A4 (published 2024-01-10); CN117500709A (published 2024-02-02)


Legal Events

121 — Ep: the EPO has been informed by WIPO that EP was designated in this application (ref. document 21938463, country EP, kind code A1)
ENP — Entry into the national phase (ref. document 2021938463, country EP, effective date 2023-08-11)
WWE — WIPO information: entry into national phase (ref. document 202180097627.1, country CN)
NENP — Non-entry into the national phase (country DE)