WO2022258203A1

WO2022258203A1 - Platform for perception function development for automated driving system

Info

Publication number: WO2022258203A1
Application number: PCT/EP2021/065855
Authority: WO
Inventors: Magnus Gyllenhammar; Carl ZANDÉN; Majid KHORSAND VAKILZADEH
Original assignee: Zenseact Ab
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2022-12-15

Abstract

Disclosed herein are methods, apparatuses and systems that allow the passengers of an ADS- quipped vehicle to supply weak (i.e. "inaccurate") annotations to the vehicle-platform by streaming (or otherwise transmitting) perception data (e.g. images or various combinations of other sensor data) to the passenger's mobile devices to elicit annotations. More specifically, the herein disclosed embodiments provide a means for the vehicle's perception function to receive or retrieve high quality data that can effectively be utilized for safety assurance, development/updating of the underlying perception models and/or for validation/verification efforts at a low cost and at a high pace. The platform may be further used for perception matching, in order to obtain a "matching" of perceptive parameters between different versions of the perception system or even of different perception systems altogether.

Description

Title

PLATFORM FOR PERCEPTION FUNCTION DEVELOPMENT FOR AUTOMATED DRIVING SYSTEM

TECHNICAL FIELD

The present disclosure relates to methods and systems for performance evaluation and/or development of perception features of a vehicle.

BACKGROUND

During the last few years, the research and development activities related to autonomous vehicles has exploded in number and many different approaches are being explored. An increasing portion of modern vehicles have advanced driver-assistance systems (ADAS) to increase vehicle safety and more generally road safety. ADAS - which for instance may be represented by adaptive cruise control, ACC, collision avoidance system, forward collision warning, etc. - are electronic systems that may aid a vehicle driver while driving. Today, there is ongoing research and development within a number of technical areas associated to both the ADAS and Autonomous Driving (AD) field. ADAS and AD will herein be referred to under the common term Automated Driving System (ADS) corresponding to all of the different levels of automation as for example defined by the SAE JS016 levels (0 - 5) of driving automation, and in particular for level 4 and 5.

In a not too distant future, ADS solutions are expected to have found their way into a majority of the new cars being put on the market. An ADS may be construed as a complex combination of various components that can be defined as systems where perception, decision making, and operation of the vehicle are performed by electronics and machinery instead of a human driver, and as introduction of automation into road traffic. This includes handling of the vehicle, destination, as well as awareness of surroundings. While the automated system has control over the vehicle, it allows the human operator to leave all or at least some responsibilities to the system. An ADS commonly combines a variety of sensors to perceive the vehicle's surroundings, such as e.g. radar, LIDAR, sonar, camera, navigation system e.g. GPS, odometer and/or inertial measurement units (IMUs), upon which advanced control systems may interpret sensory information to identify appropriate navigation paths, as well as obstacles, free-space areas, and/or relevant signage.

Much of the current efforts for development of ADSs revolves around safely launching a first system to the market. Generally, there are significant costs associated with the development and verification of safety of the ADS, especially related to field tests and the understanding of how the system behaves in traffic. Moreover, there are additional challenges in terms of managing the immense amounts of data generated by ADS equipped vehicles in order to develop and verify various ADS features, not only from a data storage, processing and bandwidth perspective, but also from a data security and data privacy perspective.

As mentioned, the perception systems of an ADS is used to process raw sensor data to output a more refined world-view of the surrounding of the ADS than the raw sensor detections imply. Especially the vision part of such systems is commonly realized by using (deep) neural networks. However, such networks need to be trained using a set of training samples that are used as ground truth. This process of supervised learning requires annotation of raw images to be able to use them as training samples. This process of annotation is both costly and time consuming. By using accurate annotations, it is believed that there is a need for several hundreds of thousands of accurately annotated images. Since accurate annotations is difficult and expensive to obtain it has recently been investigated to use a scheme of "weak annotations", i.e. annotations of less accuracy, to obtain the same results. For example, by using weak annotations the need for accurate annotations may be reduced by around 70% and replaced with the same number of weak annotations as the original number (i.e. 1 million "accurately" annotated images can be replaced with 300 000 accurately annotated images and 1 million "weakly" annotated images).

There is accordingly a need in the art for new solutions for facilitating development and verification of ADSs in order to continuously be able to provide safer and more performant systems. As always, the improvements shall preferably be made without significant impact on the size, power consumption and cost of the on-board system or platform.

SUMMARY It is therefore an object of the present invention to provide solutions for facilitating development, testing, and/or validation of perception features or functions for autonomous and semi-autonomous vehicles in order to continuously be able to provide safer and more performant systems.

It is also an object of the present invention to provide method for enabling weak annotation of perception output for development of perception features for a vehicle, computer-readable storage medium, a system, which alleviate, mitigate or completely eliminate all or at least some of the drawbacks of presently known solutions.

It is also an object of the present invention to provide method performed by an in-vehicle processing device for enabling weak annotation of perception output for development of perception features for a vehicle, a computer-readable storage medium a corresponding apparatus, and a vehicle comprising such an apparatus, which alleviate, mitigate or completely eliminate all or at least some of the drawbacks of presently known solutions.

It is also an object of the present invention to provide a method performed by one or more processors of a user-device for enabling weak annotation of perception output for development of perception features for a vehicle, a computer-readable storage medium, and a corresponding user-device, which alleviate, mitigate or completely eliminate all or at least some of the drawbacks of presently known solutions.

It is also an object of the present invention to provide a method for development of a perception-development module of a vehicle, and a corresponding system, which alleviate, mitigate or completely eliminate all or at least some of the drawbacks of presently known solutions.

These objects are achieved by means of the methods, systems, computer-readable storage media, apparatuses, vehicles and user-devices, as defined in the appended claims. The term exemplary is in the present context to be understood as serving as an instance, example or illustration. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention.

In accordance with a first aspect of the present invention, there is provided a method for enabling weak annotation of perception output for development of perception features for a vehicle. The method comprises obtaining, in the vehicle, a first set of perception data from comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors. The method further comprises forming, in the vehicle, a filtered worldview from the obtained first set of perception data, where the filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data. Furthermore, the method comprises transmitting the filtered worldview from the vehicle to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device. The method further comprises at the user-device displaying via the display apparatus, a graphical user interface comprising a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview. Furthermore, the method comprises, at the user-device, obtaining a user annotation event from the input device of the user-device, the user annotation event being indicative of a user interaction with the displayed graphical representation. After the obtained user annotation event, the method further comprises, at the user-device, forming an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data based on the obtained user annotation event and the filtered worldview, and transmitting the annotated worldview.

According to a second aspect of the present invention, there is provided a system for enabling weak annotation of perception output for development of perception features for a vehicle. The system comprises an in-vehicle apparatus and a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device. The in- vehicle apparatus comprises control circuitry configured to obtain a first set of perception data comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors. The control circuitry is further configured to form a filtered worldview from the obtained first set of perception data, and to transmit the filtered worldview from the vehicle to the user-device. The filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data. The one or more processors of the user-device are configured to display via the display apparatus, a graphical user interface comprising a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview. The one or more processors are further configured to obtain a user annotation event from the input device of the user-device, where the user annotation event being indicative of a user interaction with the displayed graphical representation. After the obtained user annotation event, the one or more processors are configured to form an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data based on the obtained user annotation event and the filtered worldview, and to transmit the annotated worldview. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention.

According to a third aspect of the present invention, there is provided a method performed by an in-vehicle processing device for enabling weak annotation of perception output for development of perception features for a vehicle. The method comprises obtaining, in the vehicle, a first set of perception comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors. The method further comprises forming, in the vehicle, a filtered worldview from the obtained first set of perception data, where the filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data. Moreover, the method comprises transmitting the filtered worldview from the vehicle to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device. Then, the method comprises receiving, in the vehicle, an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data from the user-device. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a processing system, the one or more programs comprising instructions for performing the method according to any one of the embodiments of the third aspect. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention. According to a fifth aspect of the present invention, there is provided an apparatus for enabling weak annotation of perception output for development of perception features for a vehicle. The apparatus comprises control circuitry configured to obtain, in the vehicle, a first set of perception data comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors. The control circuitry is further configured to form, in the vehicle, a filtered worldview from the obtained first set of perception data, where the filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data. Moreover, the control circuitry is configured to transmit the filtered worldview from the vehicle to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device. Then, the control circuitry is configured to receive, in the vehicle, an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data from the user-device. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention.

According to a sixth aspect of the present invention, there is provided a vehicle comprising a set of vehicle-mounted sensors configured to monitor a surrounding environment of the vehicle. The vehicle further comprises a set of vehicle-mounted sensors configured to monitor the surrounding environment of the vehicle, and an apparatus according to any one of the embodiments of the fifth aspect of the invention. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention.

According to an seventh aspect of the present invention, there is provided a method performed by one or more processors of a user-device for enabling weak annotation of perception output for development of perception features for a vehicle. The method comprises receiving, from the vehicle, a filtered worldview generated by processing a perception output of the vehicle. Furthermore, the method comprises displaying via the display apparatus, a graphical user interface comprising a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview. The method further comprises obtaining a user annotation event from the input device of the user-device. The user annotation event is indicative of a user interaction with the displayed graphical representation. Then, after the obtained user annotation event, the method comprises forming an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data based on the obtained user annotation event and the filtered worldview, and transmitting the annotated worldview. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention.

According to an eighth aspect of the present invention, there is provided a computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a processing system, the one or more programs comprising instructions for performing a method according to any one of the embodiments of the seventh aspect of the invention. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention.

According to a ninth aspect of the present invention, there is provided a user-device for enabling weak annotation of perception output for development of perception features for a vehicle. The user device comprises one or more processors, at least one memory, a display apparatus, and at least one input device. The one or more processors are configured to receive, from the vehicle, a filtered worldview generated by processing a perception output of the vehicle, and to display via the display apparatus, a graphical user interface. The graphical user interface comprises a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview. The one or more processors are further configured to obtain a user annotation event from the input device of the user-device. The user annotation event is indicative of a user interaction with the displayed graphical representation. After the obtained user annotation event, the one or more processors are configured to form an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data based on the obtained user annotation event, and to transmit the annotated worldview. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed first aspect of the invention.

Further, according to a tenth aspect of the present invention, there is provided a method for development of a perception-development module of an vehicle. The method comprises obtaining, at the vehicle, a first set of perception data indicative of a surrounding environment of the vehicle during a time period. The method further comprises obtaining, at the vehicle, a second set of perception data indicative of the surrounding environment of the vehicle during the time period. The second set of perception data is different from the first set of perception data. Furthermore, the method comprises transmitting, from the vehicle, the first set of perception data and the second set of perception data to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device.

Moreover, the method comprises, at the user-device, displaying via the display apparatus, a graphical user interface comprising a graphical representation of at least a portion of the surrounding environment of the vehicle based on first set of perception data and the second set of perception data. The method further comprises, at the user-device, displaying via the display apparatus, a graphical user interface comprising a prompter to match the second set of perception data to the first set of perception data in order to identify a match between a perceptive parameter of the second set of perception data and a corresponding perceptive parameter in the first set of perception data. Further, the method comprises, at the user device, obtaining a user interaction event from the input device of the user-device in response to the displayed prompter, the user interaction event being indicative of a user interaction with the displayed graphical representation. After the obtained user interaction event, the method comprises, at the user-device, matching the perceptive parameter of the second set of perception data and the corresponding perceptive parameter in the first set of perception data based on the obtained user interaction event. Moreover, the method comprises, at the user-device, transmitting an output signal indicative of the matched perceptive parameter of the second set of perception and the corresponding perceptive parameter in the first set of perception data to the vehicle. Then, the method comprises updating, at the vehicle, one or more parameters of a perception model of a perception-development module based on the output signal.

According to an eleventh aspect of the present invention, there is provided a system for development of a perception-development module of a vehicle. The method comprises an in- vehicle apparatus and a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device. The in-vehicle apparatus comprises control circuitry configured to obtain a first set of perception data indicative of a surrounding environment of the vehicle during a time period, and to obtain a second set of perception data indicative of the surrounding environment of the vehicle during the time period. The second set of perception data is different from the first set of perception data. The control circuitry is further configured to transmit the first set of perception data and the second set of perception data to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device. The one or more processors of the user-device are configured to display via the display apparatus, a graphical user interface comprising a graphical representation of at least a portion of the surrounding environment of the vehicle based on the first set of perception data and the second set of perception data. The one or more processors of the user-device are further configured to display via the display apparatus, a graphical user interface comprising a prompter to match the second set of perception data to the baseline worldview in order to identify a match between the perceptive parameter of the second set of perception data and a corresponding perceptive parameter in the first set of perception data. The one or more processors are further configured to obtain a user interaction event from the input device of the user-device in response to the displayed prompter. The user interaction event is indicative of a user interaction with the displayed graphical representation. After the obtained user interaction event, the one or more processors are configured to match the perceptive parameter of the second set of perception data and the corresponding perceptive parameter in the first set of perception data based on the obtained user interaction event, and to transmit an output signal indicative of the matched perceptive parameter of the second set of perception and the corresponding perceptive parameter in the first set of perception data to the vehicle. The control circuitry of the in-vehicle apparatus is further configured to update one or more parameters of a perception model of a perception-development module based on the output signal. With this aspect of the invention, similar advantages and preferred features are present as in the previously discussed eleventh aspect of the invention.

Further embodiments of the invention are defined in the dependent claims. It should be emphasized that the term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, integers, steps, or components. It does not preclude the presence or addition of one or more other features, integers, steps, components, or groups thereof. These and other features and advantages of the present invention will in the following be further clarified with reference to the embodiments described hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a schematic communication sequence diagram representation of a method for enabling weak annotation of perception output for development of perception features for a vehicle in accordance with an embodiment of the present invention.

Fig. 2 is a schematic flow chart illustration of a method for enabling weak annotation of perception output for development of perception features for a vehicle in accordance with an embodiment of the present invention.

Fig. 3 is a block diagram representation of a system for enabling weak annotation of perception output for development of perception features for a vehicle in accordance with an embodiment of the present invention.

Fig. 4 is a block diagram representation of a system for enabling weak annotation of perception output for development of perception features for a vehicle in accordance with an embodiment of the present invention.

Fig. 5 is a block diagram representation of a system for enabling weak annotation of perception output for development of perception features for a vehicle in accordance with an embodiment of the present invention.

Fig. 6 is a block diagram representation of a system or development of a perception- development module for a vehicle in accordance with an embodiment of the present invention.

Fig. 7 is schematic top-view illustration of a post-processing method in accordance with an embodiment of the invention in the form of a series of scenes depicting temporal development of vehicle approaching an object.

Fig. 8 is a schematic perspective view of a vehicle and a perceptive parameter matching- process in accordance with an embodiment of the present invention. Fig. 9 is a schematic top view of a vehicle and a perceptive parameter matching-process in accordance with an embodiment of the present invention.

Fig. 10 is a schematic top view of a vehicle and a perceptive parameter matching-process in accordance with an embodiment of the present invention.

Fig. 11 is a schematic side-view of a vehicle comprising an apparatus for enabling weak annotation of perception output for development of perception features for a vehicle in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

In the following detailed description, embodiments of the present invention will be described. However, it is to be understood that features of the different embodiments are exchangeable between the embodiments and may be combined in different ways, unless anything else is specifically indicated. Even though in the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well known constructions or functions are not described in detail, so as not to obscure the present invention.

Those skilled in the art will appreciate that the steps, services and functions explained herein may be implemented using individual hardware circuitry, using software functioning in conjunction with a programmed microprocessor or general purpose computer, using one or more Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGA) and/or using one or more Digital Signal Processors (DSPs). It will also be appreciated that when the present disclosure is described in terms of a method, it may also be embodied in one or more processors and one or more memories coupled to the one or more processors, wherein the one or more memories store one or more programs that perform the steps, services and functions disclosed herein when executed by the one or more processors.

OVERVIEW

As stated in the background section, there are some drawbacks related to conventional solutions for annotating raw images in order to be able to use them as training samples for training neural networks used in perception features for vehicles, and in particular for ADS- equipped vehicles. It is therefore herein proposed a platform for perception system development suitable for vehicles, and in particular autonomous and semi-autonomous vehicles, which allows non-trained persons (or non-professionals) to supply these annotations without the need for prior domain expertise. Further, a scheme of federated learning may be used as a means to train various perception features/functions in each vehicle during operation based on the annotated data without a need for the data to be transmitted to any remote entity.

In short, the herein proposed methods, apparatuses and systems allow the passengers of an ADS-equipped vehicle to supply weak (i.e. "inaccurate") annotations to the platform by streaming (or otherwise transmitting) perception data (e.g. images or various combinations of other sensor data) to the passenger's mobile devices to elicit annotations. More specifically, the herein disclosed embodiments provide a means for the vehicle's perception function to receive or retrieve high quality data that can effectively be utilized for safety assurance, development/updating of the underlying perception models and/or for validation/verification efforts at a lower cost and at a higher pace than other conventional solutions.

Consequently, advantages in terms of reduced costs for generating annotations for perception development and increased "speed" in the development of new perception features are readily achievable. In extension, the development of perception features for autonomous and semi-autonomous vehicles may proceed at a greater pace and at a reduced cost, thereby providing safer and overall more capable Automated Driving Systems. Moreover, the herein proposed solution may, in accordance with some embodiments, further alleviate problems related to data security and data privacy as the need for transmission of specific data to remote locations may be reduced or completely alleviated.

A "perception system" as used herein, may be understood as software and/or hardware configured to acquire (raw) sensor data from one or more on-board sensors such as cameras, LIDARs and RADARs, ultrasonic sensors, and convert this (raw) sensor data into scene understanding including state estimations and/or predictions thereof (i.e. to a "worldview"). Accordingly, a perception system is configured to generate perception output/data that is indicative of one or more perceptive parameters (e.g. object position, object dimension, object classification, lane tracking, road geometry estimation, free-space estimation, etc.) based on one or more perception models (e.g. one or more neural networks) and sensor data serving as input. These "perception models" may, in accordance with some embodiments, then be updated by means of a weakly supervised learning algorithm using the weakly- annotated data. Thus, in reference to a "perception system" as used herein, a perception module (e.g. a perception development module) may be understood as a sub-module of the complete perception system.

Moreover, in the following, reference will be made to a "perception-development module", which may be understood as a "Module Under Test" (MUT), meaning that is a "new" (under development) software and/or hardware perception feature/function. The perception feature may for example be an object detection feature, an object classification feature, an object state estimation feature, a road reference estimation feature, a free-space estimation feature, a road friction estimation feature, an object trajectory estimation feature, a target/object tracking feature, and/or drivable area estimation feature. In other words, the perception- development module may in the present context be understood as software and/or hardware configured to generate a perception output, where the module is currently "under development", and not yet "in production" (e.g. not verified/validated). Thus, in reference to the terms "perception module" and "perception system", a difference is that the "perception- development module" is not yet actively used for generating perception output that the ADS acts upon, assuming that the vehicle is equipped with an ADS. Accordingly, in some embodiments, the vehicle may simply be equipped with one or more sensors (e.g. cameras) whose output is processed and used in accordance with the concepts herein to develop a "new" perception module. In more detail, the obtained first set of perception data may in some embodiments be in the form of raw sensor data (e.g. images captured by one or more vehicle-mounted cameras). Thus, the vehicle does not need to have an ADS in order to utilize the teachings herein.

The term "storing" perception data may refer to "storing in one or more memories", "storing on-board said vehicle", "storing in one or more memories on-board said vehicle", and/or "storing digitally and/or electronically" a set of perception data, and further to "collecting" and/or "obtaining" a set of perception data. The term "set" of perception data, on the other hand, may refer to "range", "amount", "series", "continuous and/or intermittent flow" and/or "collection" of perception data, whereas "perception data" may refer to "continuously and/or intermittently collected perception data". Furthermore, the term "perception" data may refer to "surroundings assessment" data, "spatial perception" data, "processed sensory" data and/or "temporal dependencies" data, whereas perception "data" may refer to perception "information" and/or "estimates". The term "obtained" from a perception module, on the other hand, may refer to "derived" from a perception model and/or "based on output data" from a perception module whereas perception module configured to "generate the set of perception data" may refer to perception module/system adapted and/or configured to "estimate the surroundings of said vehicle", "estimate at least a portion of surroundings of said vehicle", "determine surroundings of said vehicle", "interpret sensory information relevant for the autonomous manoeuvring of said vehicle", and/or "estimate surroundings of said vehicle and make model predictions of future states of the surroundings of said vehicle".

The term "perception model" is in the present context to be understood as a software algorithm configured to receive input in the form of sensor data (raw or having some level of pre processing) and to therefore generate an output comprising a representation of at least a portion of the surrounding environment of the vehicle. The perception model may for example be in the form of a neural network, and the model parameters may accordingly be in the form of network weights. Thus, a number of "perception models" may be used independently for different tasks such as lane segmentation, traffic sign identification, free-space estimation. However, these outputs should preferably be fused and provided as input for various "decision and control" functions, which supply the control signals for manoeuvring the vehicle autonomously.

The phrase storing "during" a time period may refer to storing "for" a time period, whereas time period "ranging" from a first time point to a second time point may refer to time period "extending" and/or "taking place" from a first time point to a second time point. "Time period" may refer to "pre-determinable time period" and/or "predetermined time period". "Time point", on the other hand, may refer to "point in time", whereas "from a first time point to a second time point" may refer to "from a first time point to a subsequent second time point".

Fig. 1 provides a schematic overview of some embodiments of the present invention, in the form of a schematic communication sequence diagram representation of a method for enabling weak annotation of perception output for development of perception features for a vehicle. In some embodiments, the method comprises, at the user device 200, obtaining S101 a user request from an input device 202 of the user-device 200. The user request may be understood as a request from the user to the vehicle 100 to start transmitting data to allow the user to start the annotation of the vehicle's 100 perception output. The user request may then be processed at a stream and processing engine 201 of the user-device 200 so to generate S102 and transmit a data stream request to the vehicle 100 via a communication interface of the user device 200. The term obtaining is herein to be interpreted broadly and encompasses receiving, retrieving, collecting, acquiring, and so forth.

The data stream request is received at a streaming and processing engine 102 of the vehicle 100 via a corresponding communication interface. The streaming and processing engine 102 of the vehicle may be further configured to generate S103 a request for the vehicle's perception stack 101 to retrieve S104 data from one or more on-board sensors of the vehicle or from the vehicle's perception system. In some embodiments, where the vehicle 100 is an ADS-equipped vehicle, the perception stack may be a part of the vehicle's 100 ADS. Once, the required data/information has been retrieved S104, the perception data is processed S105 in order to form a filtered worldview to be transmitted to the user-device 200 to elicit annotations from the user. The filtered worldview may in accordance with some embodiments be a processed version of the vehicle's perception output in a suitable format for display at the user device, including necessary indicators and prompters to be displayed at the user device.

The user device then receives and processes the filtered worldview in order to display S106, via a display apparatus of the user-device 202, a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview. The user is then prompted to interact S107 with the presented S106 scene while the user-device processes S108 the user interactions and the displayed graphical representation in order to form an annotated worldview (i.e. a weakly annotated dataset). This annotated worldview may subsequently be transmitted to the vehicle 100 from the user-device 200, where it may be used to update S109 one or more model parameters of a perception model employed by the vehicle's 100 perception system. However, in some embodiments, the annotated worldview may not be suitable to be consumed directly by the in-vehicle training algorithm, but may instead be indicative of a rare scenario or edge-case. In such cases, the relevant dataset (sensor data, perception data, and annotated worldview) may instead be transmitted to a back-office for manual analysis. Further details and example embodiments of the above- summarized process are provided in the following.

WEAK ANNOTATION PLATFORM

Fig. 2 is a schematic flow chart illustration of a method S200 for enabling weak annotation of perception output for development of perception features for a vehicle in accordance with an embodiment of the present invention. Moreover, the flow chart depicted in Fig. 2 provides an overview of the processes performed in each node of the system, the nodes being the vehicle 100, the user-device 200, and the remote system 300 (i.e. back-office, fleet management system, or an associated "cloud-system").

The user-device 200 may for example be a wireless communication device (may also be referred to as a user equipment, wireless device or terminal), or a fixed terminal arranged in the vehicle (e.g. in the form of a touch-screen arranged on the back-side of a front seat). In very general forms, it should be understood by the skilled in the art that "wireless communication device" is a non-limiting term which means any suitable wireless device, terminal, or node having a graphical user-interface and being capable of receiving in DL and transmitting in UL (e.g. PDA, laptop, mobile, etc.). Alternatively, the user-device 200 may be configured to communicate with the vehicle 100 using a wired local network setup, such as CAN bus, I2C, Ethernet, optical fibers, and so on. However, the user-device 200 is preferably configured to communicate with the vehicle 100 via wireless protocols such as WiFi, LoRa, Zigbee, Bluetooth, or similar mid/short range technologies.

The method S200 comprises obtaining S201, in the vehicle 100, a first set of perception data comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors. As mentioned, the vehicle 100 may be an ADS-equipped vehicle, and the first set of perception data may accordingly be output generated by a perception system configured to generate a worldview indicative of the surrounding environment of the vehicle. However, the vehicle may also be provided with a relatively simple "perception stack" and no ADS-functionality, the first of perception data may accordingly be the output from the perception stack. Furthermore, the method S200 comprises forming S202, in the vehicle, a filtered worldview from the obtained first set of perception data, wherein the filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data. In more detail, the filtered worldview may for example contain only those objects that are determined to be within a threshold distance from the vehicle, only those objects that are associated with a certain confidence level, or only some predefined set of objects depending on specific implementations (e.g. removing objects on opposing lanes or objects detected on the other side of a barrier). Moreover, the filtering may include adapting the format of the perception data to be suitable for processing by the user-device (e.g. views of the surrounding environment from specific perspectives such as top-view, side-view, 3D view, etc.).

The method S200 further comprises transmitting S204 the filtered worldview from the vehicle 100 to a user-device 200 having one or more processors, at least one memory, a display apparatus, and at least one input device. As mentioned, the user-device 200 may be a wireless handheld device, such as e.g. a smartphone, wherefore the display apparatus and the input device may be in the form of a touch-sensitive screen as readily understood by the skilled person in the art. The transmitted S204 filtered worldview may in some embodiments be in the form of static images or a sequence of images (e.g. video).

Further, the method S200 comprises, at the user-device, displaying S205 via the display apparatus, a graphical user interface comprising a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview. Then, a user annotation event is obtained S206, at the user-device 200, from the input device of the user-device 200. The obtained S206 user annotation event is indicative of a user interaction with the displayed graphical representation. Moreover, in some embodiments, the method S200 may further comprise a step of generating, at the user device 200, a prompter indicative of an instruction for a user of the user-device 200 to annotate the displayed graphical representation of the surrounding environment of the vehicle, i.e. to perform a user annotation event.

Still further, the method S200 comprises forming S207, at the user device 200, an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data (generated in the vehicle) based on the obtained user annotation event and the filtered worldview. It should be understood that the "annotated worldview" is indicative of at least one annotated perceptive parameter in the filtered worldview, which in turn is indicative of at least one annotated perceptive parameter in the first set of perception data, since the filtered worldview is a processed form of the first set of perception data. Then, the formed S207 annotated worldview is transmitted S208 from the user-device 200 to the vehicle 100.

In accordance with a more illustrative example, the displayed S205 graphical representation of at least a portion of the surrounding environment of the vehicle may for example be a perspective view of the surrounding environment (as perceived by the vehicle's perception system). This graphical representation (e.g. an image/video frame captured by a vehicle- mounted camera) may for example include a number of external objects/obstacles (e.g. other vehicles, cyclists, pedestrians, buildings, barriers, traffic signs, traffic lights, and so forth). The graphical representation may further include one or more prompters instructing the user to classify one or more of the external objects present in the graphical representation. In particular, the prompter may instruct the user to classify one or more of the external objects that the vehicle's perception system was unable to classify with an adequate certainty level or confidence level. This may for example be the case if an object is partly obstructed, or otherwise compromised thereby making it difficult for the perception system to classify the object to with sufficient certainty level or confidence level. Another situation that may occur is that the object is a "rare" or otherwise unidentified object - i.e. an object that the vehicle's perception system has not (yet) been configured to classify - and the user may be prompted to classify the unidentified object, which forms the annotated worldview. Thereby some embodiments disclosed herein provides for using the annotated worldview to train the vehicle's perception model to classify new objects.

In another example, the graphical representation may include e.g. a birds-eye view or top- view indicative of the road/lane geometry of the surrounding environment of the vehicle. Furthermore, the graphical representation may include lane traces or road edge traces (i.e. the perception system's estimate of the lane geometry or road geometry). In this case, the graphical representation may include one or more prompters instructing the user to verify the lane traces or road edge traces (i.e. confirm them as accurate) or to correct them if they are erroneous. In reference to the latter, the perception system's lane trace may have missed that the lane is curving at some distance in front of the vehicle and therefore provided a "straight" lane trace instead of a "curved". However, the graphical representation may include a view of the surrounding environment without any lane traces or road edge traces, and the user may accordingly be prompted to indicate the lane markers and/or road edge, which then form the annotated worldview. Thereby some embodiments disclosed herein provides for using the annotated worldview to train the vehicle's perception model to generate lane traces or road edge traces.

In yet another example, the graphical representation may include a birds-eye view, top-view or 3D perspective view indicative of the surrounding environment of the vehicle. Furthermore, the graphical representation may include a free-space estimation (as estimated by the vehicle's perception system) in the surrounding environment of the vehicle. Accordingly, the graphical representation may include one or more prompters instructing the user to verify the free-space estimation (i.e. confirm it as accurate) or to correct it if it is erroneous. In reference to the latter, the perception system's free-space estimation may for example erroneously have concluded that a shadow, road marking, or minor elevation of the road surface is an "obstacle" and therefore not "free-space". Alternatively, the graphical representation may include a view of the surrounding environment without any free-space estimation (e.g. an image/video frame captured by a vehicle-mounted camera), and the user may accordingly be prompted to provide a free-space estimation, which then forms the annotated worldview. Thereby some embodiments disclosed herein provides for using the annotated worldview to train the vehicle's perception model to perform free-space estimations. Free-space (or free- space areas) may in the present context be understood as areas in the surrounding environment of the ego-vehicle absent of objects (e.g. other vehicles, pedestrians, barriers, animals, bicycles, static objects, etc.). However, in some embodiments, drivable area estimations/annotations may be analogously performed as for the free-space areas exemplified above, with an additional constraint of having a "drivable" surface present. In other words, in addition that the surface is absent of objects/obstacles, the surface also has to be "drivable". For example, a road portion may be considered drivable if it is absent of objects/obstacles while a sidewalk/footway or a grass surface may not be considered drivable even if it is absent of objects. It should be noted that the above-described examples are merely some out of a multitude of possible realizations for what the perception output and the graphical representation may include, and what the "annotation" may entail, as readily understood by the skilled artisan. Other perception functions or features that may utilize the teachings herein include bounding box annotation (e.g. where the user is prompted to annotate directly on images/frames captured by vehicle-mounted cameras), semantic segmentation or instance segmentation (e.g. where the user is prompted to enclose different objects). Moreover, it should be noted that the filtered worldview that is to be annotated by the user, need not necessarily comprise an overlaid output of the vehicle's perception algorithm. Instead, the filtered worldview may comprise an image or video frame and instructions to the user of the handheld device to annotate the presented image or video frame. For example, in the case of object free area, the user can annotate the object free area in the image plane without any knowledge about the performance of the vehicle's perception model.

Moving on, the annotations may be elicited from the user (of the user-device 200) in real-time (synchronously) or asynchronously. Thus, in some embodiments, the transmitted S204 filtered worldview is streamed in real-time or near real-time to the user-device. In other words, the filtered worldview may be indicative of a current scenario or scene in the surrounding environment of the vehicle.

However, in some embodiments, the method S200 further comprises (at least temporarily) storing S203, in a memory device of the vehicle, the filtered worldview, and then transmitting S204 the stored filtered worldview to the user-device 200. Buffering or storing the filtered world-view may provide the advantage of being able to provide more relevant data for annotation (e.g. scenarios where the vehicle's perception system had a hard time to accurately make specific estimations/predictions, or scenarios deemed to be informative enough to be added to training dataset) which may further include the efficiency of the whole annotation process as described herein. Thus, one may store "more relevant" images for annotation for any suitable period of time as the vehicle's perception system encounters interesting scenarios, and then when suitable (e.g. when a connected user-device is present and has requested data for annotation) the stored filtered worldview (i.e. the "more relevant" images) is transmitted S204 to the user-device 200. Moreover, by storing S203 the filtered worldview, the user of the user-device may be provided with the possibility to select between different perception functions (e.g. object classification or free-space estimation) to be annotated.

In accordance with some embodiments, the method S200 may comprise checking an accuracy or reliability level of one or more perceptive parameters of the obtained S201 perception data, and if the accuracy or confidence level is below a threshold, a filtered worldview is formed S202 based on the perception data associated with the accuracy or reliability level below the threshold. In other words, in some embodiments, the ADS/vehicle 100 transmits data for annotation if the perception system is uncertain of its surroundings and therefore needs (weakly) annotated data to further train the underlying perception model. Another use- case would be that the "low" confidence level of the one or more perception data may be indicative of a "rare" scenario or "edge-case", which would benefit from further analysis. The "rare" scenario use-case is further discussed below (e.g. in ref. to Fig. 5). Another use-case may for example be to select perception data associated with a (predefined) desired condition that add information to the training dataset (e.g. perception data of "night conditions") in order to balance the training dataset.

Moving on, the annotated worldview may, in accordance with some embodiments, be transmitted S208 to a remote entity 300 from the user-device (e.g. through a wireless wide- area network). The annotated worldviews from a plurality of user-devices 200 may subsequently be used to facilitate development of new perception features/functions centrally.

However, in some embodiments, the annotated worldview is transmitted S208 to the vehicle 100 from the user-device 200. Further, the method S200 may further comprise updating S209, in the vehicle 100, one or more parameters of a perception model of a perception- development module based on the annotated worldview. More specifically, the perception model may for example be updated S209 using a weakly supervised learning algorithm as known in the art of machine learning. Moreover, in some embodiments, the updating S209 may be performed on a perception module of the (production) perception system. Thus, it is not necessarily the "test-module" of the perception system that is updated, but it may be a module that is part of the "production platform", i.e. on a perception-module that generates perception output that the ADS acts upon. Thus, in some embodiments, the method S200 further comprises storing, during a time period, sensor data obtained from at least one vehicle-mounted sensor of the set of vehicle-mounted sensors configured to monitor a surrounding environment of the vehicle. Moreover, the filtered worldview is indicative of the surrounding environment during the time period. Accordingly, the step of updating S209 the one or more parameters of the perception model comprises updating, in the vehicle 100, the one or more parameters of the perception model by means of a weakly supervised learning algorithm based on the stored sensor data and the annotated worldview. In more detail, the sensor data obtained from the at least one vehicle- mounted sensor may be time-stamped just like the filtered worldview - and in extension - the annotated worldview. Thus, the method S200 may further comprise a step of synchronizing (in time) the stored sensor data with the annotated worldview - together forming a synchronized dataset - and using the synchronized dataset with the weakly supervised learning algorithm in order to update S209 the one or more parameters of the perception model. However, the synchronization is an optional feature as the user-device may be configured to supply this information directly receiving the associated sensor data together with the filtered worldview and then provide the synchronized dataset (sensor data + annotations in synch) to the vehicle.

The above-described process for weakly supervised learning/training is further elucidated in Fig. 3, which illustrates a system comprising a vehicle 100 and a user-device 200 in accordance with some embodiments. In general, Fig. 3 depicts the flow of information through exposure to an event in the vehicle 100, the transmission to the user-device 100, and further to the updating of the perception model. The vehicle 100 and user-device 200 comprise control circuitry configured to perform the functions of the methods disclosed herein, where the functions may be included in a non-transitory computer-readable storage medium or other computer program product configured for execution by the control circuitry of each node/entity. However, in order to better elucidate the present invention, the control circuitry is represented as various "modules" or "engines" in Fig. 3, each of them linked to one or more specific functions.

Sensor data 110 (e.g. camera images, RADAR output, LIDAR output, etc.) is generated by one or more of the vehicle's 100 on-board sensors and provided as input to a perception block/system 111 of the vehicle's 100 ADS. In parallel to this, the in-vehicle apparatus' control circuitry is configured to store the generated sensor data 110 in an associated memory device 112a. The perception system 111 is configured to use the input sensor data 110 in order to generate a first set of perception data. The first set of perception data and the stored sensor data accordingly comprise information about the surrounding environment of the vehicle 100 for the same time period. Moreover, the control circuitry of the in-vehicle apparatus is configured to form a filtered worldview from the first set of perception data via a data processing engine 114. The filtered worldview may in turn be stored in a memory device 112b before it is transmitted to the user-device 200 for annotation. However, as mentioned before, the perception system 111 may be more or less complex, and in some embodiments, omitted completely, and the sensor data 110 is instead directly processed by the data processing engine 114 in order to generate the filtered worldview.

The user-device 200 has control circuitry, which is here represented by a scene processing engine 212, an augmentation engine 213, and an annotation engine 214. The scene processing engine 212 is configured to display via the display apparatus, here referred to as User Interface (Ul) 211, a graphical user interface comprising a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview. The scene processing engine and the annotation engine are configured to obtain a user annotation event from an input device of the user-device (also represented by the Ul block 211). The user annotation event is indicative of a user interaction with the displayed graphical representation.

Further, after the obtained user annotation event, the annotation engine 214 is configured to form an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data generated in the vehicle based on the obtained user annotation event and the filtered worldview. The annotated worldview is then transmitted from the user- device 200 to the vehicle 100, where it may (optionally) be processed by a synchronization engine 117. The synchronization engine 117 is configured to synchronize the stored sensor data with the annotated worldview - together forming a synchronized dataset.

Accordingly, the synchronized sensor data and annotated worldview (e.g. in the form of synchronized images or image sequences) are fed to a learning engine 115 configured to update one or more parameters of the perception model by means of a weakly supervised learning algorithm based on the (synchronized) stored sensor data and the annotated worldview. However, in some embodiments, the synchronization may be performed directly by the learning engine 115 in association with the consumption of the annotated data.

The updated one or more parameters of the perception model may be transmitted from the vehicle to a remote entity 300 where it is consolidated against updated parameters of the perception model received from a plurality of vehicles. Accordingly, the remote system 300 may form a set of globally updated parameters, and pushes a "global update" to the fleet of vehicles. The learning engine 115 may then use these globally updated parameters to update the perception model of the perception-development module 113.

Moreover, the augmentation engine 213 - i.e. the control circuitry of the user-device 200 - may be configured to manipulate/augment the displayed graphical representation by adding at least one predefined virtual object and/or at least one pre-recorded object to the displayed graphical representation. The augmentation engine 213 - i.e. the control circuitry of the user- device 200 - may further be configured to obtain a user verification event from the input device of the user-device 200, where the user verification event is indicative of a user interaction with the added at least one virtual object and/or the at least one pre-recorded object. Then, the augmentation engine 213 - i.e. the control circuitry of the user-device 200 - may be configured to determine a user score based on the obtained user verification event. The annotated worldview may accordingly be formed in dependency of the determined user score or be indicative of the determined user score. In the latter case, the updating of the perception model may be performed in dependence of the determined user score. For example, updates based on an annotated worldview associated with a lower user score may have less effect (e.g. limited to minor updates of the perception model) as compared to an annotated worldview associated with a higher user score. In some embodiments, an annotated worldview associated with a user score below a threshold are excluded/discarded and therefore not used in the updating of the perception model.

Reverting back to Fig. 2, and in accordance with some embodiments, the updating S209 of the one or more parameters of the perception model is performed by means of an optimization algorithm configured to optimize a cost function. In more detail, in some embodiments the method S200 further comprises storing, during a time period, sensor data obtained from at least one vehicle-mounted sensor of the set of vehicle-mounted sensors configured to monitor a surrounding environment of the vehicle. The filtered worldview is here indicative of at least some perceptive parameter(s) in the surrounding environment during the (same) time period.

Accordingly, the method S200 may comprise storing, during the time period, a second set of perception data generated by the perception-development module, wherein the perception- development module is configured to generate perception data based on a perception model and sensor data obtained from the at least one vehicle-mounted sensor of the set of vehicle- mounted sensors. The second set of perception data is indicative of a perceptive parameter of the surrounding environment of the vehicle during the (same) time period. Accordingly, the step of updating S209 the one or more parameters of the perception model comprises:

• Determining an estimation error of the perceptive parameter of the second set of perception data based on the annotated worldview.

• Determining a cost function based on the determined estimation error, where the cost function is indicative of a performance of the perception-development module.

• Updating the one or more parameters of the perception model of the perception- development module by means of an optimization algorithm configured to optimise the calculated cost function.

In other words, the annotated worldview forms a "ground truth" for the second set of perception data, wherefore one can evaluate the accuracy/confidence of the perception output of the perception-development module on the basis of the annotated worldview, and from that evaluation one may compute an estimation error - and in extension - form a cost function. The formed cost function is then employed in an optimization algorithm configured to optimize - i.e. to minimize or maximize the cost-function depending on how the cost function is defined - as an error function or reward function.

The optimization algorithm may for example be a gradient-based optimizer or derivative-free optimizer configured to optimize the determined cost function. In the present disclosure, the terms cost function, loss function, and error function are used interchangeably. The purpose of the cost function as defined herein, is accordingly to provide a means to update a perception model so to maximize desirable output and to minimize unwanted output from the perception model. Moreover, in some embodiments, the cost function is determined based on the type of perceptive parameter. In other words, one cost function may be formed/defined if the perception model of the perception-development module is an object detection algorithm while another cost function may be formed/defined if it is a lane-tracing algorithm.

The above-described process for training/learning of the perception model based on cost functions is further elucidated in Fig. 4, which illustrates a system comprising a vehicle 100 and a user-device 200 in accordance with some embodiments. In general, Fig. 4 depicts the flow of information through exposure to an event in the vehicle 100, the transmission to the user- device 100, and further to the updating of the perception model. The vehicle 100 and user- device 200 comprise control circuitry configured to perform the functions of the methods disclosed herein, where the functions may be included in a non-transitory computer-readable storage medium or other computer program product configured for execution by the control circuitry. However, in order to better elucidate the present invention, the control circuitry is represented as various "modules" or "engines" in Fig. 4, each of them linked to one or more specific functions.

Sensor data 110 (e.g. camera images, RADAR output, LIDAR output, etc.) is generated by one or more of the vehicle's 100 on-board sensors and provided as input to a perception block/system 111 of the vehicle's 100 ADS. In parallel to this, the in-vehicle apparatus' control circuitry is configured to store the generated sensor data 110 in an associated memory device 112a. The perception system 111 is configured to use the input sensor data 110 in order to generate a first set of perception data. The first set of perception data and the stored sensor data accordingly comprise information about the surrounding environment of the vehicle 100 for the same time period. Moreover, the control circuitry of the in-vehicle apparatus is configured to form a filtered worldview from the first set of perception data via a data processing engine 114. The filtered worldview may in turn be stored in a memory device 112b before it is transmitted to the user-device 200 for annotation.

Moreover, in parallel to this, the control circuitry of the in-vehicle apparatus is configured to store, in a memory device 112c, a second set of perception data generated by the perception- development module 113. The perception-development module is configured to generate perception data based on a perception model and sensor data obtained from the at least one vehicle-mounted sensor of the set of vehicle-mounted sensors. The second set of perception data is indicative of a perceptive parameter of the surrounding environment of the vehicle during the (same) time period as the first set of perception data - and consequently as the stored sensor data 110.

Further, after the obtained user annotation event, the annotation engine 214 is configured to form an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data generated by the perception system 111 of the vehicle based on the obtained user annotation event and the filtered worldview. The annotated worldview is then transmitted from the user-device 200 to the vehicle 100, where it is processed by an evaluation engine 116.

In more detail, the user annotation event may for example be a user-verification/confirmation of one or more perceptive parameters in the filtered worldview (and consequently of the perception system's 11 output). For example, the filtered worldview may comprise an object classification estimation (made by the perception system 111) of a plurality of external objects in the surrounding environment, where each object classification estimation is associated with a confidence level (e.g. 0% - 100%). Thus, some object classifications may be associated with a confidence level of less than 100% (e.g. 95%), wherefore the user annotation event may be indicative of a user-confirmation that the object classifications are accurate and therefore push the confidence level to 100% for those objects. Another use-case may for example be 3D bounding box estimations, where the user-annotation event may be in the form of confirming one or more bounding boxes provided in the filtered worldview or a correction of one or more bounding boxes provided in the filtered worldview.

Moving on, the evaluation engine 116 is further configured to receive/retrieve the stored second set of perception data generated by the perception development module 113, and to determine an estimation error of the perceptive parameter of the second set of perception data based on the annotated worldview. Going along with the above example where the perceptive parameter is an object classification, the evaluation engine is configured to compare the classification of an object in the second set of perception data and the classification of the corresponding object in the annotated worldview. Here, the estimation error may be a binary value (e.g. true/false). Another example would be to estimate an estimated position of an external object (or its bounding box) relative to the ego-vehicle 100 in the second set of perception data and the position of the corresponding external object (or its bounding box) relative to the ego-vehicle in the annotated worldview. Here, the estimation error may be in the form of a distance defining the difference between the position estimations of the corresponding objects.

The evaluation engine 116 is further configured to determine/compute/derive a cost function based on the determined estimation error, where the cost function is indicative of a performance of the perception-development module. The determined cost function is provided as input to a learning engine 115 configured to update one or more parameters of the perception model of the perception-development module by means of an optimization algorithm configured to optimise the calculated cost function.

Moreover, as the skilled person readily understands, the augmentation engine 213 - i.e. the control circuitry of the user-device 200 - may provide the same functionality as described in the foregoing, and will for the sake of brevity and conciseness not be repeated. Reverting back to Fig. 2, and in accordance with some embodiments, the method S200 comprises transmitting S210, from the vehicle, the one or more (locally) updated parameters of the perception model of the perception development module to a remote entity.

Moreover, the method S200 may comprise receiving, at the vehicle, a set (i.e. one or more) of globally updated parameters of the perception-development module from the remote entity. The set of globally updated parameters are based on information obtained from a plurality of vehicles comprising the perception-development module. Thus, the method S200, may further comprise receiving S212, at the remote entity, locally updated model parameters from a plurality of vehicles comprising corresponding perception models. Then, the perception model may be "globally" updated S213 at the remote entity 300 based on the received S212 locally updated model parameters. Once the "global update" is done, the globally updated S213 model parameters are transmitted S214 from the remote entity 300 to the plurality of vehicles 100.

Further, the method S200 may comprise updating S211, at the vehicle, the perception model of the perception-development module based on the received set of globally updated parameters. In other words, the method S200 may include a federated learning scheme, where the locally updated model parameters from each vehicle in a fleet of vehicles are consolidated centrally (at the remote entity) and subsequently pushed out as a "global update" of the perception model of the perception-development module.

Still further, in some embodiments, the method S200 comprises, at the user-device 200, manipulating/augmenting the displayed graphical representation by adding at least one predefined virtual object and/or at least one pre-recorded object to the displayed graphical representation. The method S200 may further comprise, at the user device 200, obtaining a user verification event from the input device of the user-device, where the user verification event is indicative of a user interaction with the added at least one virtual object and/or the at least one pre-recorded object. Then, a user score may be determined based on the obtained user verification event. The annotated worldview may accordingly be formed in dependency of the determined user score or be indicative of the determined user score. In the latter case, the updating S109 of the perception model may be performed in dependence of the determined user score. For example, updates based on an annotated worldview associated with a lower user score may have less effect (e.g. limited to minor updates of the perception model) as compared to an annotated worldview associated with a higher user score.

In other words, the data presented to the user via the user-device, may in some embodiments, be augmented with pre-recorded or synthetic objects to judge the user's annotation capabilities (i.e. to judge the reliability of the annotations supplied). Moreover, the augmentation also provides an advantage of keeping the user of the user-device engaged in scenes where the (real-time) scene outside the vehicle is not "interesting" enough for engaging the user.

Moreover, in some embodiments, the annotated worldview may be used for anomaly detection in order to flag sensor data (e.g. images) which contain information that would be beneficial for more accurate annotation and analysis in back-office (remote system) 300. This may for example be include that the user-interaction event - and in extension the annotated worldview - is indicating that the data contains a rare or interesting information/scene/scenario.

Thus, assuming that the obtained first set of perception data is indicative of a surrounding environment of the vehicle during a time period, the method S200 may further comprise storing, during the time period, the obtained first set of perception data. Further, the method S200 storing, during the time period, sensor data obtained from the set of vehicle-mounted sensors, where the stored sensor data was used by the perception system to generate the first set of perception data.

Further, the method S200 may comprise, at the user-device, obtaining a user interaction event indicative of a rare scenario in the displayed graphical representation. Here, the formed annotated worldview comprises an indication of the rare scenario. Further, the annotated worldview is transmitted from the used device to the vehicle. Once the vehicle has receive the annotated worldview indicative of the rare scenario, the method S200 may further comprise transmitting the stored sensor data, the stored first set of perception data and the annotated worldview from the vehicle to a remote entity. Alternatively, the "rare" scenario may be deduced (in the vehicle) by evaluating the output of the perception system (which is performing poorly on rare non-seen data) in view of the annotated worldview in order to determine a level of "matching".

Accordingly, assuming that the obtained first set of perception data is indicative of a surrounding environment of the vehicle during a time period, the method S200 may further comprise storing, during the time period, the obtained first set of perception data. Further, the method S200 storing, during the time period, sensor data obtained from the set of vehicle- mounted sensors, where the stored sensor data was used by the perception system to generate the first set of perception data.

The method S200 may further comprise evaluating, in the vehicle, the stored first set of perception data with the annotated worldview in order to determine a level of matching between a set of perceptive parameters of the stored perception data and a set of corresponding perceptive parameters of the annotated worldview. Further, if the determined level of matching is below a threshold, the method S200 further comprises transmitting the stored sensor data, the stored first set of perception data and the annotated worldview from the vehicle to a remote entity.

The above-described processes for anomaly or ra re-scenario detection are further elucidated in Fig. 5, which illustrates a system comprising a vehicle 100 and a user-device in accordance with some embodiments. In general, Fig. 5 depicts the flow of information through exposure to an event in the vehicle 100, the transmission to the user-device and to the subsequent evaluation and transmission of the entire scenario related to the anomaly/rare scenario to the "back- office" 300. The vehicle 100 and user-device 200 comprise control circuitry configured to perform the functions of the methods disclosed herein, where the functions may be included in a non-transitory computer-readable storage medium or other computer program product configured for execution by the control circuitry. However, in order to better elucidate the present invention, the control circuitry is represented as various "modules" or "engines" in Fig. 5, each of them linked to one or more specific functions.

Sensor data 110 (e.g. camera images, RADAR output, LIDAR output, etc.) is generated by one or more of the vehicle's 100 on-board sensors and provided as input to a perception block/system 111 of the vehicle's 100 ADS. In parallel to this, the in-vehicle apparatus' control circuitry is configured to store the generated sensor data 110 in an associated memory device 112a. The perception system 111 is configured to use the input sensor data 110 in order to generate a first set of perception data. The stored first set of perception data and the stored sensor data accordingly comprise information about the surrounding environment of the vehicle 100 for the same time period. Moreover, the control circuitry of the in-vehicle apparatus is configured to form a filtered worldview from the first set of perception data via a data processing engine 114. The filtered worldview may in turn be stored in a memory device 112b before it is transmitted to the user-device 200 for annotation.

Further, after the obtained user annotation event, the annotation engine 214 is configured to form an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data generated by the perception system of the vehicle based on the obtained user annotation event and the filtered worldview. The annotated worldview is then transmitted from the user-device 200 to the vehicle 100, where it is processed by an evaluation engine 116.

Here, the evaluation engine 116 may be configured to evaluate the stored first set of perception data with the annotated worldview in order to determine a level of matching between a set of perceptive parameters of the stored perception data and a set of corresponding perceptive parameters of the annotated worldview. In more detail, the evaluation engine 116 to compare the set of perceptive parameters of the stored perception data and a set of corresponding perceptive parameters of the annotated worldview in view of a matching threshold. Then, if the determined level of matching is below a threshold (or a level of "mismatching" is above a threshold), the stored sensor data, the stored first set of perception data and the annotated worldview are transmitted from the vehicle to a remote entity 300. The transmitted data - which may be construed as edge-case data or otherwise important data for developing performant perception features - may then be manually analysed at a "back-office".

Alternatively, the annotated worldview may comprise an indication of a rare scenario provided by a user of the user-device 200. In that case, the evaluation engine 116 may be configured to detect a presence of the indication of a rare scenario in the received annotated worldview - and upon a detection - transmit the stored sensor data, the stored first set of perception data and the annotated worldview from the vehicle to the remote entity 300.

An example use-case for the above is for example when the vehicle's perception system 111 has not been able to classify an object with an adequate level of certainty - i.e. the object classification certainty of one or more objects in the first set of perception data is below a threshold. Moreover, the received annotated worldview may be associated with a user having a low "confidence score" based on an evaluation performed by the augmentation engine 213 of the user-device 200. Then, in such cases, it may not be suitable to rely on the annotated worldview to "correct" this classification-error by means of a learning algorithm. Instead, one may just label the entire scenario as a "rare scenario" and to transmit the relevant dataset (sensor data, perception data, and annotated worldview) to the back-office for manual analysis.

PERCEPTION MATCHING PLATFORM

Further, the above described platform may be further used for perception matching. In more detail, another use-case of the platform is to obtain a "matching" of perceptive parameters between different versions of the perception system or even of different perception systems altogether. In such a process the user of the user-device may for example be prompted to match perceptive parameters of a "baseline" output and the output of the perception- development module at the user-device. The "baseline" output in that use-case may be a "post-processed" version of the output of the vehicle's "production" perception system. The post-processing is further described in reference to Fig. 7 below. However, another use-case may to prompt the user to match perceptive parameters of a RADAR- and LIDAR-based perception output with a vision-/camera-based perception output. This may for example be advantageous to train the camera-based perception models in situations where the RADAR-/ LIDAR-based perception model is inherently more accurate and vice-versa.

An example of a system suitable for perception matching in accordance with the above is provided in Fig. 6, which illustrates a system comprising a vehicle 100 and a user-device in accordance with some embodiments. In general, Fig. 6 depicts the flow of information through exposure to an event in the vehicle 100, the transmission to the user-device, and further to the updating of the perception model. The vehicle 100 and user-device 200 comprise control circuitry configured to perform the functions of the methods disclosed herein, where the functions may be included in a non-transitory computer-readable storage medium or other computer program product configured for execution by the control circuitry. However, in order to better elucidate the present invention, the control circuitry is represented as various "modules" or "engines" in Fig. 6, each of them linked to one or more specific functions.

Sensor data 110 (e.g. camera images, RADAR output, LIDAR output, etc.) is generated by one or more of the vehicle's 100 on-board sensors and provided as input to a perception block/system 111 of the vehicle's 100 ADS and further to a perception-development module 113 of the ADS. In parallel to this, the in-vehicle apparatus' control circuitry is configured to store the generated sensor data 110 in an associated memory device 112a. The (production platform's) perception system 111 and the perception-development module are configured to use the input sensor data 110 in order to generate a first set of perception data and a second set of perception data, respectively. The first set of perception data, the second set of sensor data, and the stored sensor data accordingly comprise information about the surrounding environment of the vehicle 100 for the same time period. Moreover, the control circuitry of the in-vehicle apparatus is optionally configured to form a filtered worldview from the first set of perception data and second set of perception data via one or more data processing engines 114a, 114b. The filtered worldview may in turn be stored in a memory devices 112b, 112c before it is transmitted to the user-device 200 for annotation.

In other words, the control circuitry of the in-vehicle apparatus is configured to obtain a first set of perception data indicative of a surrounding environment of the vehicle during a time period. The control circuitry is further configured to obtain a second set of perception data indicative of the surrounding environment of the vehicle during the time period, where the second set of perception data is different from the first set of perception data. Thus, the first and second sets of perception data may as mentioned be generated by different versions of the perception system, i.e. a "production" version 111 that is currently in use and an "under development" version 113 as illustrated in Fig. 6. However, as mentioned, the first and second sets of perception data may be generated by different perception systems or modules altogether (e.g. one generating a worldview based on RADAR input and one generating a worldview based on LIDAR input).

The control circuitry of the in-vehicle apparatus is further configured to transmit the first set of perception data and the second set of perception data to a user-device 200 having one or more processors (i.e. control circuitry), at least one memory, a display apparatus, and at least one input device.

The user-device 200 has control circuitry, which is here represented by a scene processing engine 212, an augmentation engine 213, and an annotation engine 214. The scene processing engine 212 is configured to display via the display apparatus, here referred to as User Interface (Ul) 211, a graphical user interface comprising a graphical representation of at least a portion of the surrounding environment of the vehicle based on the first set of perception data and the second set of perception data. The scene processing engine and the annotation engine are configured to display via the display apparatus, a graphical user interface comprising a prompter to match the second set of perception data to the baseline worldview in order to identify a match between the perceptive parameter of the second set of perception data and a corresponding perceptive parameter in the first set of perception data. For example, the displayed prompter may be indicative of an object-matching instruction. In more detail, the displayed prompter may be indicative of an instruction to match each vehicle comprised in the second set of perception data with the corresponding vehicles in the first set of perception data. Stated differently, the user of the user-device 200 is prompted to match an object detected based on second set of perception data with one object from the first set of perception data both corresponding to one unique object in the real world.

Further, the control circuitry of the user-device 200 is configured to obtain a user interaction event from the input device 211 of the user-device 200 in response to the displayed prompter. The user interaction event is accordingly indicative of a user interaction with the displayed graphical representation. After the obtained user annotation event, the user-device's 200 control circuitry is configured to match the perceptive parameter of the second set of perception data and the corresponding perceptive parameter in the first set of perception data based on the obtained user interaction event. Then, an output signal is transmitted from the user-device 200 to the vehicle. The output signal is indicative of the matched perceptive parameter of the second set of perception and the corresponding perceptive parameter in the second set of perception data.

The vehicle 100 accordingly receives the transmitted output signal, whereupon the learning engine 115 is configured to update one or more parameters of a perception model of a perception-development module 113 based on the output signal.

Moreover, as the skilled person readily understands, the augmentation engine 213 - i.e. the control circuitry of the user-device 200 - may provide the same functionality as described in the foregoing, and will for the sake of brevity and conciseness not be repeated. Similarly, the generation of a filtered worldview in the vehicle before transmission to the user-device as described in the foregoing may be analogously applied.

In more detail, the learning engine 115 may be configured to evaluate the matched perceptive parameter of the second set of perception and the corresponding perceptive parameter in the first set of perception data. The evaluation may for example comprise determining an estimation error (e) of the matched perceptive parameter of the second set of perception data in reference to the corresponding perceptive parameter in the baseline worldview. The estimation error (e) may be construed as a parameter indicative of how well the perceptive parameters correlate.

Then, if the estimation error (e) exceeds a threshold value (e.g. a zero-value), the learning engine 115 may be configured to update the one or more parameters of the perception model of the perception-development module 113 by means of a weakly supervised learning algorithm.

The threshold for the estimation error may be defined in different ways depending on the type of perceptive parameter that is being evaluated. For example, the perceptive parameter may be an object position estimation or an object occupancy estimation. The threshold may then be in the form of a "maximum lateral and longitudinal offset of closest point" between the bounding box representation of an object in the second set of perception data and the bounding box representation of the corresponding object in the first set of perception data. The term "closest point" may be understood the closest point of the detected object to the ego-vehicle. However, the threshold may also be in the form of a "maximum lateral and longitudinal offset" of any point of the two bounding box representations (e.g. bottom left corner, top right corner, etc.).

Similarly, if the perceptive parameter is a free-space estimation, the threshold may be in the form of a "maximum size" (i.e. number of area units) of a non-overlapping free-space area between the free-space area estimations of the second set of perception data and the baseline worldview. In terms of set theory this may be understood as the symmetric difference between the free-space set defined by the free-space estimation of the perception development module and the free-space set defined by the first set of perception data. Moreover, in some embodiments, there is a plurality of different thresholds associated with the free-space estimation, where the thresholds depend on where the "erroneous" portion is located relative to the ego-vehicle. In more detail, one may employ a "weight matrix" or a "weight map", where the threshold for the estimation error is lower for certain portions of the scene around the vehicle. For example, it is more important to be accurate close in front of the ego-vehicle than far away or far out on the sides of the ego-vehicle. Thus, in the updating process 107 of the perception model there are higher penalties associated with erroneous estimations in certain portions of the scene around the vehicle.

Thus, based on the above examples, it is considered to be sufficiently clear for the skilled reader how to define estimation error thresholds for various perceptive parameters as exemplified herein. Concerning the value of the threshold, this value may be set based on the type of parameter and/or based on the level of maturity of the perception-development module. The estimation error may in some embodiments be set to zero or any value above zero, meaning if there is any discrepancy between the perception-development module's output and the baseline worldview, the weak annotation step is triggered. Further, in some embodiments, the learning engine 115 may be configured to determine a cost function based on the determined 106 estimation error (e), where the cost function is indicative of a performance of the perception-development module. Stated differently, once a match between a perceptive parameter of the second set of perception data and a corresponding perceptive parameter in the first set of perception data has been identified/established, one can calculate a cost function that indicates how well the "perception-development module" performed. Moreover, in some embodiments, the cost function (may also be referred to as a loss function) is determined based on the type of perceptive parameter. In other words, one cost function may be formed/defined if the perception model of the perception-development module is an object detection algorithm while another cost function may be formed/defined if it is a lane-tracing/tracking algorithm.

Still further, once a cost-function has been determined/calculated, one or more parameters of the perception-development module may be updated by means of an optimization algorithm (e.g. back propagation for neural networks) configured to optimize - minimize or maximize depending on how the function is defined - the calculated cost function.

A purpose of the "matching" process is to mitigate the risk of attempting to run an updating process of a perception model (associated with the second set of perception data) in situations where it is uncertain if the perceptive parameter of the second set of perception data is incorrectly compared to the "wrong" perceptive parameter in the baseline worldview. Thus, by using the perception-matching platform as proposed herein, the risk of erroneous updates of the vehicle's perception models may be reduced, which reduces the costs and time spent in association with perception system development for autonomous vehicles.

As mentioned, the "in-production" perception systems' output - i.e. the first set of perception data - may be post-processed, in the vehicle, so to form a baseline worldview, which serves as a ground truth for the subsequent matching and updating processes. In other words, the worldview of the ADS is post-processed to construct a "baseline", towards which the output of the software-/hardware-under-development can be compared. The post-processing may for example be performed by the data processing engine 114a indicated in Fig. 6, or by a separate dedicated module (not shown).

The above-mentioned post-processing will now be further exemplified in reference to Fig. 7, which depicts a series (a) - (d) of schematic top-view illustrations of a vehicle 1 moving a road portion towards an external object 24. Each illustration is associated with a point in time within the time period 21 ranging from a first moment in time T1 to a second moment in time T2.

In the first illustration (a) the vehicle 1 (may also be referred to as ego-vehicle 1) is moving towards an external object, here in the form of a truck 24, that is traveling in the same direction on an adjacent lane on the road portion. However, due to the distance to the truck 24, the vehicle's perception system/module may not be able to determine, with a sufficiently high level of accuracy, the position of the external object, and/or to classify it as a truck. This is indicated by the box 22a enclosing the truck 24, as well as by the distorted detection of the object, which serve to schematically indicate the "uncertainties" of the detection and classification.

At a subsequent moment in time, i.e. illustration (b) of Fig. 7, the vehicle 1 is closer to the external object, and the uncertainties regarding the external object's 24 position and class/type are reduced, as indicated by the reduced size of the box 22b and reduced distortion as compared to the first box 22a.

At yet another subsequent moment in time, i.e. illustration (c) of Fig. 7, the vehicle's 1 perception system/module is able to accurately determine the external object's 2 position and classify it as a truck 2. More specifically, the ego-vehicle 1 is now sufficiently close to the truck 2 to be able to classify it and estimate the truck's position on the road with a higher level of accuracy as compared to when the ego-vehicle 1 was located further away from the truck.

Then, by means of a suitable filtering technique and based on the temporal development of the "scenario", one is able to establish a "baseline worldview" at an intermediate point 23 in time between T1 and T2, as indicated in the bottom illustration in Fig. 7, i.e. in illustration (d) of Fig. 2. In more detail, the filtering may for example be based on the temporal development of the trajectories, positions, etc. in combination with predefined models (e.g. motion models) of the vehicle 1 and external objects 2. This established baseline worldview may subsequently used as a "ground truth" for training and/or validation of various perception output, and in particular fortraining and/or validation of the output obtained from the perception-development module. Thus, in some embodiments, the baseline worldview constitutes a ground truth for the second set of perception data.

Further, Figs. 8 - 10 provide a number of examples of the matching process for different perceptive parameters. More specifically, Fig. 8 is a schematic perspective view of a matching process for matching a second set of perception data to a first set of perception data. In the illustrated example, the two sets of perception data comprise an object detection estimation, where the patterned objects 72a-c represent the second set of perception data and the dashed objects 71a-c represented the first set of perception data. The determined estimation error (as indicated by the double-headed arrows) may for example a difference in the location between the bounding boxes around the detected objects 71a-c indicated in the second set of perception data and the bounding boxes around the corresponding objects 72a-c in the baseline worldview.

For example, if the second set of perception data is indicative of a blue vehicle 72a, a red vehicle 72b, and a green vehicle 72c in front of the ego-vehicle. Then, the matching process ensures that the red vehicle 72a is connected to the corresponding red vehicle 71a in the baseline worldview and the blue vehicle 72b is connected to the corresponding blue vehicle 71b in the baseline worldview, and so forth. Otherwise, the perception model of the perception- development module may be trained with erroneous data (based on erroneous conclusions), which would decrease the performance of the perception-development module.

As none of the objects in the two perception outputs resulted in a perfect match, it may be concluded that the perception model of the perception development module generates an inaccurate representation of the surrounding environment. Thus, in order to update one or more parameters of the perception model, one can utilize a weakly supervised learning algorithm, where the sensor data used for generating the second set of perception data, the baseline worldview, and the output signal indicative of the matched parameters together form a "training dataset" or a "training example". More specifically, the sensor data used for the second set of perception data forms the "input object" and the baseline worldview is used as a "supervision signal" (i.e. "desired output"), while the output signal provides the link between the relevant perceptive parameters.

Figs. 9 and 10 show the corresponding evaluation processes for perception models in the form of lane geometry estimations algorithms (Fig. 9) and free-space detection algorithms (Fig. 10). In more detail, Fig. 9 is a schematic top-view illustration of a vehicle according to an embodiment of the present invention traveling on a road portion. The road portion has two lanes, and curves to the left in front of the vehicle 1. The "true" lane markings are indicated by reference numeral 83, the baseline worldview's lane geometry estimation (i.e. first set of perception data) is indicated by reference numeral 82, while the perception-development module's lane geometry estimation is indicated by reference numeral 81.

Analogously as before, a matching between the lane traces of the perception-development module's output and the baseline worldview is identified and indicated in the output signal from the user-device. The matching may for example be in the form of connecting each lane trace in the second set of perception data with the corresponding lane trace in the baseline worldview.

Here the estimation error 88 is indicated as a difference between the locations/geometries of the lane geometry estimations 81, 82. In more detail, the perception-development module's lane geometry estimation 81 failed to generate a representation of the lane markers in some areas (indicated by reference numeral 85).

Further, Fig. 10 is a schematic top-view illustration of a vehicle according to an embodiment of the present invention traveling on a road portion. The road portion has two lanes, and three external objects 93 in the form of two cars and a tractor located in front of the vehicle. The free- space estimation made by the production platform, i.e. the free-space estimation of the baseline worldview (i.e. first set of perception data) is indicated by reference numeral 91, while the free-space estimation of the perception-development module is indicated by reference numeral 82. The estimation error 95', 95'' is, in the illustrated example, simply derived by a measurement or metric indicative of the non-overlapping parts of the two free-space estimations 91, 92. In terms of set theory this may be understood as the symmetric difference between the free-space set defined by the free-space estimation 92 of the perception development module and the free-space estimation 91 defined by the baseline worldview. Analogously as for the example embodiments discussed above in reference to Figs. 8 and 9, if the mismatching or estimation error 95' 95” is above a threshold, the perception model of the perception development module is updated using the baseline as ground truth while relying on the output signal to connect the relevant perceptive parameters to each other.

As mentioned, free-space areas may in the present context be understood as areas in the surrounding environment of the ego-vehicle absent of objects (e.g. other vehicles, pedestrians, barriers, animals, bicycles, static objects, etc.). Thus, the location of free-space areas may be understood as estimates of areas absent of external objects (static and dynamic objects). One can consider an estimation of "driveable area" in an analogous fashion, where in addition to the estimation of areas absent of objects (as in the case of free space) the estimation also includes the presence of a road surface.

VEHICLE-SIDE EMBODIMENTS

Fig. 11 depicts a schematic side view of a vehicle 1 comprising an apparatus 10 (or in-vehicle processing device 10) for enabling weak annotation of perception output for development of perception features for a vehicle in accordance with some embodiments. The vehicle 1 further comprises a perception system 6 (i.e. the perception system of the production platform) and a localization system 5. The localization system 5 is configured to monitor a geographical position and heading of the vehicle, and may in the form of a Global Navigation Satellite System (GNSS), such as a GPS. However, the localization system may alternatively be realized as a Real Time Kinematics (RTK) GPS in order to improve accuracy.

Further, the apparatus 10 has control circuitry 11 configured to obtain a first set of perception data from a perception system 6 of the vehicle. The perception system is accordingly configured to generate a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors 6a-c. The control circuitry 11 is further configured to form a filtered worldview from the obtained first set of perception data, wherein the filtered worldview comprises a reduced amount of data relative to the worldview generated by the perception system.

Still further, the control circuitry 11 is configured to transmit the filtered worldview from the vehicle to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device. The control circuitry 11 is further configured to receive an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data generated by the perception system of the vehicle from the user-device. Moreover, in some embodiments, the control circuitry 11 is configured to update one or more parameters of a perception model of a perception-development module based on the annotated worldview.

As the skilled person readily understands, the control circuitry 11 of the apparatus 10 may be further configured to perform one or more vehicle-side functions described in the foregoing in reference to Figs. 1 - 5. However, for the sake of brevity and conciseness they will not be repeated in reference to Fig. 11.

Moreover, in some embodiments, the vehicle 1 comprises an apparatus 10 for development of a perception-development module of a vehicle 1. Accordingly, in such embodiments, the control circuitry is configured to obtain a first set of perception data indicative of a surrounding environment of the vehicle during a time period, and to obtain a second set of perception data indicative of the surrounding environment of the vehicle during the time period. Here, the second set of perception data is different from the first set of perception data. The control circuitry may be further configured to transmit the first set of perception data and the second set of perception data to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device.

Further, the control circuitry 11 may be configured to receive, from the user-device, an output signal indicative of the matched perceptive parameter of the second set of perception and the corresponding perceptive parameter in the first set of perception data. Then, based on the received output signal, the control circuitry 11 may be configured to update one or more parameters of a perception model of a perception-development module.

As the skilled person readily understands, the control circuitry 11 of the apparatus 10 may be further configured to perform one or more vehicle-side functions described in the foregoing in reference to Figs. 6 - 10. However, for the sake of brevity and conciseness they will not be repeated in reference to Fig. 11. Further, the vehicle 1 may be connected to external network(s) 20 via for instance a wireless link (e.g. for transmitting and receiving updated parameters). The same or some other wireless link may be used to communicate with other vehicles 2 in the vicinity of the vehicle, with local infrastructure elements, or with local wireless communication devices. Cellular communication technologies may be used for long range communication such as to external networks and if the cellular communication technology used have low latency it may also be used for communication between vehicles, vehicle to vehicle (V2V), and/or vehicle to infrastructure, V2X. Examples of cellular radio technologies are GSM, GPRS, EDGE, LTE, 5G, 5G NR, and so on, also including future cellular solutions. However, in some solutions mid to short range communication technologies are used such as Wireless Local Area (LAN), e.g. IEEE 802.11 based solutions. ETSI is working on cellular standards for vehicle communication and for instance 5G is considered as a suitable solution due to the low latency and efficient handling of high bandwidths and communication channels.

The present invention has been presented above with reference to specific embodiments. However, other embodiments than the above described are possible and within the scope of the invention. Different method steps than those described above, performing the method by hardware or software, may be provided within the scope of the invention. Thus, according to an exemplary embodiment, there is provided a non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a vehicle control system, the one or more programs comprising instructions for performing the method according to any one of the above-discussed embodiments. Alternatively, according to another exemplary embodiment a cloud computing system can be configured to perform any of the methods presented herein. The cloud computing system may comprise distributed cloud computing resources that jointly perform the methods presented herein under control of one or more computer program products.

Generally speaking, a computer-accessible medium may include any tangible or non-transitory storage media or memory media such as electronic, magnetic, or optical media— e.g., disk or CD/DVD-ROM coupled to computer system via bus. The terms "tangible" and "non-transitory," as used herein, are intended to describe a computer-readable storage medium (or "memory") excluding propagating electromagnetic signals, but are not intended to otherwise limit the type of physical computer-readable storage device that is encompassed by the phrase computer- readable medium or memory. For instance, the terms "non-transitory computer-readable medium" or "tangible memory" are intended to encompass types of storage devices that do not necessarily store information permanently, including for example, random access memory (RAM). Program instructions and data stored on a tangible computer-accessible storage medium in non-transitory form may further be transmitted by transmission media or signals such as electrical, electromagnetic, or digital signals, which may be conveyed via a communication medium such as a network and/or a wireless link.

The processor(s) 11 (associated with the apparatus 10) may be or include any number of hardware components for conducting data or signal processing or for executing computer code stored in memory 12. The apparatus 10 has an associated memory 12, and the memory 12 may be one or more devices for storing data and/or computer code for completing or facilitating the various methods described in the present description. The memory may include volatile memory or non-volatile memory. The memory 12 may include database components, object code components, script components, or any other type of information structure for supporting the various activities of the present description. According to an exemplary embodiment, any distributed or local memory device may be utilized with the systems and methods of this description. According to an exemplary embodiment the memory 12 is communicably connected to the processor 11 (e.g., via a circuit or any other wired, wireless, or network connection) and includes computer code for executing one or more processes described herein.

It should be appreciated that the sensor interface 13 may also provide the possibility to acquire sensor data directly or via dedicated sensor control circuitry 6 in the vehicle. The communication/antenna interface 14 may further provide the possibility to send output to a remote location (e.g. remote operator or control centre) by means of the antenna 8. Moreover, some sensors in the vehicle may communicate with the system 10 using a local network setup, such as CAN bus, I2C, Ethernet, optical fibres, and so on. The communication interface 14 may be arranged to communicate with other control functions of the vehicle and may thus be seen as control interface also; however, a separate control interface (not shown) may be provided. Local communication within the vehicle may also be of a wireless type with protocols such as WiFi, LoRa, Zigbee, Bluetooth, or similar mid/short range technologies. The present disclosure has been presented above with reference to specific embodiments. However, other embodiments than the above described are possible and within the scope of the disclosure. Different method steps than those described above, performing the method by hardware or software, may be provided within the scope of the disclosure. Thus, according to an exemplary embodiment, there is provided a non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a control device, the one or more programs comprising instructions for performing the method according to any one of the above-discussed embodiments. Alternatively, according to another exemplary embodiment a cloud computing system can be configured to perform any of the methods presented herein. The cloud computing system may comprise distributed cloud computing resources that jointly perform the methods presented herein under control of one or more computer program products.

It should be noted that the word "comprising" does not exclude the presence of other elements or steps than those listed and the words "a" or "an" preceding an element do not exclude the presence of a plurality of such elements. It should further be noted that any reference signs do not limit the scope of the claims, that the disclosure may be at least in part implemented by means of both hardware and software, and that several "means" or "units" may be represented by the same item of hardware.

Although the figures may show a specific order of method steps, the order of the steps may differ from what is depicted. In addition, two or more steps may be performed concurrently or with partial concurrence. Such variation will depend on the software and hardware systems chosen and on designer choice. All such variations are within the scope of the disclosure. Likewise, software implementations could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various connection steps, processing steps, comparison steps and decision steps. The above mentioned and described embodiments are only given as examples and should not be limiting to the present disclosure. Other solutions, uses, objectives, and functions within the scope of the disclosure as claimed in the below described patent embodiments should be apparent for the person skilled in the art.

Claims

1. A method for enabling weak annotation of perception output for development of perception features for a vehicle, the method comprising: obtaining, in the vehicle, a first set of perception data comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors; forming, in the vehicle, a filtered worldview from the obtained first set of perception data, wherein the filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data; transmitting the filtered worldview from the vehicle to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device; at the user-device: displaying via the display apparatus, a graphical user interface comprising: a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview; obtaining a user annotation event from the input device of the user-device, the user annotation event being indicative of a user interaction with the displayed graphical representation; after the obtained user annotation event: forming an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data based on the obtained user annotation event and the filtered worldview; transmitting the annotated worldview.

2. The method according to claim 1, further comprising: storing, in a memory device, the filtered worldview; wherein the step of transmitting the filtered worldview comprises transmitting the stored filtered worldview to the user-device.

3. The method according to claim 1, wherein the transmitted filtered worldview is streamed in real-time or near real-time to the user-device.

4. The method according to any one of claims 1 - 3, wherein the annotated worldview is transmitted to a remote entity.

5. The method according to any one of claims 1 - 3, wherein the annotated worldview is transmitted to the vehicle, and wherein the method further comprises: updating, in the vehicle, one or more parameters of a perception model of a perception- development module based on the annotated worldview.

6. The method according to claim 5, further comprising: storing, during a time period, sensor data obtained from at least one vehicle-mounted sensor of the set of vehicle-mounted sensors configured to monitor a surrounding environment of the vehicle; wherein the filtered worldview is indicative of the surrounding environment during the time period; wherein the step of updating the one or more parameters of the perception model comprises: updating, in the vehicle, the one or more parameters of the perception model by means of a weakly supervised learning algorithm based on the stored sensor data and the annotated worldview.

7. The method according to claim 5, further comprising: storing, during a time period, a second set of perception data generated by the perception-development module, wherein the perception-development module is configured to generate perception data based on a perception model and sensor data obtained from the at least one vehicle-mounted sensor of the set of vehicle-mounted sensors; wherein the filtered worldview is indicative of the surrounding environment during the time period; wherein the second set of perception data is indicative of a perceptive parameter of the surrounding environment of the vehicle during the time period; and wherein the step of updating the one or more parameters of the perception model comprises: determining an estimation error of the perceptive parameter of the second set of perception data based on the annotated worldview; determining a cost function based on the determined estimation error, the cost function being indicative of a performance of the perception-development module; and updating the one or more parameters of the perception model of the perception- development module by means of an optimization algorithm configured to optimise the calculated cost function.

8. The method according to any one of claims 5 - 7, further comprising: transmitting the one or more updated parameters of the perception model of the perception-development module to a remote entity; receiving a set of globally updated parameters of the perception model of the perception-development module from the remote entity, wherein the set of globally updated parameters are based on information obtained from a plurality of vehicles comprising the perception-development module; updating the perception model of the perception-development module based on the received set of globally updated parameters.

9. The method according to any one of claims 1 - 8, further comprising: at the user-device: manipulating the displayed graphical representation by adding at least one predefined virtual object and/or at least one pre-recorded object to the displayed graphical representation; obtaining a user verification event from the input device of the user-device, the user verification event being indicative of a user interaction with the added at least one virtual object and/or the at least one pre-recorded object; determining a user score based on the obtained user verification event; and wherein the annotated worldview is formed in dependency of the determined user score.

10. The method according to claim 1, wherein the obtained first set of perception data is indicative of a surrounding environment of the vehicle during a time period, the method further comprising: storing, during the time period, the obtained first set of perception data; storing, during the time period, sensor data obtained from the set of vehicle-mounted sensors, wherein the stored sensor data was used to generate the first set of perception data; evaluating, in the vehicle, the stored first set of perception data with the annotated worldview in order to determine a level of matching between a set of perceptive parameters of the stored perception data and a set of corresponding perceptive parameters of the annotated worldview; if the determined level of matching is below a threshold, the method further comprises: transmitting the stored sensor data, the stored first set of perception data and the annotated worldview from the vehicle to a remote entity.

11. The method according to claim 1, wherein the obtained first set of perception data is indicative of a surrounding environment of the vehicle during a time period, the method further comprising: storing, in the vehicle, during the time period, the obtained first set of perception data; storing, in the vehicle, during the time period, sensor data obtained from the set of vehicle-mounted sensors, wherein the stored sensor data was used by to generate the first set of perception data; at the user-device: obtaining a user interaction event indicative of a rare scenario in the displayed graphical representation; wherein the formed annotated worldview comprises an indication of the rare scenario; transmitting the annotated worldview to the vehicle; transmitting the stored sensor data, the stored first set of perception data and the annotated worldview from the vehicle to a remote entity.

12. A system for enabling weak annotation of perception output for development of perception features for a vehicle, the system comprising: an in-vehicle apparatus and a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device, wherein the in-vehicle apparatus comprises control circuitry configured to: obtain a first set of perception data comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors; form a filtered worldview from the obtained first set of perception data, wherein the filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data; transmit the filtered worldview from the vehicle to the user-device; wherein the one or more processors of the user-device are configured to: display via the display apparatus, a graphical user interface comprising: a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview; obtain a user annotation event from the input device of the user-device, the user annotation event being indicative of a user interaction with the displayed graphical representation; after the obtained user annotation event: form an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception databased on the obtained user annotation event and the filtered worldview; transmit the annotated worldview.

13. A method performed by an in-vehicle processing device for enabling weak annotation of perception output for development of perception features for a vehicle, the method comprising: obtaining, in the vehicle, a first set of perception data comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle-mounted sensors; forming, in the vehicle, a filtered worldview from the obtained first set of perception data, wherein the filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data; transmitting the filtered worldview from the vehicle to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device; and receiving, in the vehicle, an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data from the user-device.

14. The method according to claim 13, further comprising: storing, in a memory device, the filtered worldview; wherein the step of transmitting the filtered worldview comprises transmitted the stored filtered worldview to the user-device.

15. The method according to claim 13, wherein the transmitted filtered worldview is streamed in real-time or near real-time to the user-device.

16. The method according to any one of claims 13 - 15, further comprising: updating, in the vehicle, one or more parameters of a perception model of a perception- development module based on the annotated worldview.

17. The method according to claim 16, further comprising: storing, during a time period, sensor data obtained from at least one vehicle-mounted sensor of the set of vehicle-mounted sensors configured to monitor a surrounding environment of the vehicle; wherein the filtered worldview is indicative of the surrounding environment during the time period; wherein the step of updating the one or more parameters of the perception model comprises: updating, in the vehicle, the one or more parameters of the perception model by means of a weakly supervised learning algorithm based on the stored sensor data and the annotated worldview.

18. The method according to claim 16, further comprising: storing, during a time period, a second set of perception data generated by the perception-development module wherein the perception-development module is configured to generate perception data based on a perception model and sensor data obtained from the at least one vehicle-mounted sensor of the set of vehicle-mounted sensors; wherein the filtered worldview is indicative of the surrounding environment during the time period; wherein the second set of perception data is indicative of a perceptive parameter of the surrounding environment of the vehicle during the time period; and wherein the step of updating the one or more parameters of the perception model comprises: determining an estimation error of the perceptive parameter of the second set of perception data based on the annotated worldview; determining a cost function based on the determined estimation error, the cost function being indicative of a performance of the perception-development module; and updating the one or more parameters of the perception model of the perception- development module by means of an optimization algorithm configured to optimize the calculated cost function.

19. The method according to any one of claims 16 - 18, further comprising: transmitting the one or more updated parameters of the perception model of the perception-development module to a remote entity; receiving a set of globally updated parameters of the perception model of the perception-development module from the remote entity, wherein the set of globally updated parameters are based on information obtained from a plurality of vehicles comprising the perception-development module; updating the perception model of the perception-development module based on the received set of globally updated parameters.

20. The method according to claim 13, wherein the obtained first set of perception data is indicative of a surrounding environment of the vehicle during a time period, the method further comprising: storing, during the time period, the obtained first set of perception data; storing, during the time period, sensor data obtained from the set of vehicle-mounted sensors, wherein the stored sensor data was used to generate the first set of perception data; evaluating, the stored first set of perception data with the annotated worldview in order to determine a level of matching between a set of perceptive parameters of the stored perception data and a set of corresponding perceptive parameters of the annotated worldview; if the determined level of matching is below a threshold, the method further comprises: transmitting the stored sensor data, the stored first set of perception data and the annotated worldview from the vehicle to a remote entity.

21. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an in-vehicle processing system, the one or more programs comprising instructions for performing the method according to any one of claims 13 - 20.

22. An apparatus for enabling weak annotation of perception output for development of perception features for a vehicle, the apparatus comprising control circuitry configured to: obtain, in the vehicle, a first set of perception data comprising a worldview indicative of the surrounding environment of the vehicle based on sensor data obtained from a set of vehicle- mounted sensors; form, in the vehicle, a filtered worldview from the obtained first set of perception data, wherein the filtered worldview comprises a reduced amount of data relative to the worldview of the first set of perception data; transmit the filtered worldview from the vehicle to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device; and receive, in the vehicle, an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data from the user-device.

23. A vehicle comprising: a set of vehicle-mounted sensors configured to monitor a surrounding environment of the vehicle; a set of vehicle-mounted sensors configured to monitor the surrounding environment of the vehicle; an apparatus according to claim 22.

24. A method performed by one or more processors of a user-device for enabling weak annotation of perception output for development of perception features for a vehicle, the method comprising: receiving, from the vehicle, a filtered worldview generated by processing a perception output from the vehicle; displaying via the display apparatus, a graphical user interface comprising: a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview; obtaining a user annotation event from the input device of the user-device, the user annotation event being indicative of a user interaction with the displayed graphical representation; after the obtained user annotation event: forming an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data based on the obtained user annotation event and the filtered worldview; transmitting the annotated worldview.

25. The method according to claim 24, further comprising: manipulating the displayed graphical representation by adding at least one predefined virtual object and/or at least one pre-recorded object to the displayed graphical representation; obtaining a user verification event from the input device of the user-device, the user verification event being indicative of a user interaction with the added at least one virtual object and/or at least one pre-recorded object; determining a user score based on the obtained user verification event; and wherein the annotated worldview is formed in dependency of the determined user score.

26. The method according to claim 24 or 25, further comprising: obtaining a user interaction event indicative of a rare scenario in the displayed graphical representation; wherein the formed annotated worldview comprises an indication of the rare scenario.

27. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a processing system, the one or more programs comprising instructions for performing the method according to any one of claims 24 - 26.

28. A user-device for enabling weak annotation of perception output for development of perception features for a vehicle, the user-device comprising: one or more processors, at least one memory, a display apparatus, and at least one input device, wherein the one or more processors are configured to: receive, from the vehicle, a filtered worldview generated by processing a perception output of the vehicle; display via the display apparatus, a graphical user interface comprising: a graphical representation of at least a portion of the surrounding environment of the vehicle based on the filtered worldview; obtain a user annotation event from the input device of the user-device, the user annotation event being indicative of a user interaction with the displayed graphical representation; after the obtained user annotation event: form an annotated worldview indicative of at least one annotated perceptive parameter in the first set of perception data based on the obtained user annotation event; transmit the annotated worldview.

29. The user-device according to claim 28, wherein the one or more processors are further configured to: manipulate the displayed graphical representation by adding at least one predefined virtual object and/or at least one pre-recorded object to the displayed graphical representation; obtain a user verification event from the input device of the user-device, the user verification event being indicative of a user interaction with the added at least one virtual object and/or at least one pre-recorded object; determine a user score based on the obtained user verification event; and wherein the annotated worldview is formed in dependency of the determined user score.

BO. The user-device according to claim 28 or 29, wherein the one or more processors are further configured to: obtain a user interaction event indicative of a rare scenario in the displayed graphical representation; wherein the formed annotated worldview comprises an indication of the rare scenario.

31. The user-device according to any one of claims 28 - 30, wherein the user-device is a wireless communication device.

32. A method for development of a perception-development module of a vehicle, the method comprising: obtaining, at the vehicle, a first set of perception data indicative of a surrounding environment of the vehicle during a time period; obtaining, at the vehicle, a second set of perception data indicative of the surrounding environment of the vehicle during the time period, the second set of perception data being different from the first set of perception data; transmitting, from the vehicle, the first set of perception data and the second set of perception data to a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device; at the user-device: displaying via the display apparatus, a graphical user interface comprising: a graphical representation of at least a portion of the surrounding environment of the vehicle based on first set of perception data and the second set of perception data; a prompter to match the second set of perception data to the first set of perception data in order to identify a match between a perceptive parameter of the second set of perception data and a corresponding perceptive parameter in the first set of perception data; obtaining a user interaction event from the input device of the user-device in response to the displayed prompter, the user interaction event being indicative of a user interaction with the displayed graphical representation; after the obtained user interaction event: matching the perceptive parameter of the second set of perception data and the corresponding perceptive parameter in the first set of perception data based on the obtained user interaction event; transmitting an output signal indicative of the matched perceptive parameter of the second set of perception and the corresponding perceptive parameter in the first set of perception data to the vehicle; updating, at the vehicle, one or more parameters of a perception model of a perception- development module based on the output signal.

33. A system for development of a perception-development module of a vehicle, the system comprising: an in-vehicle apparatus and a user-device having one or more processors, at least one memory, a display apparatus, and at least one input device, wherein the in-vehicle apparatus comprises control circuitry configured to: obtain a first set of perception data indicative of a surrounding environment of the vehicle during a time period; obtain a second set of perception data indicative of the surrounding environment of the vehicle during the time period, the second set of perception data being different from the first set of perception data; transmit the first set of perception data and the second set of perception data to the user-device; wherein the one or more processors of the user-device are configured to: display via the display apparatus, a graphical user interface comprising: a graphical representation of at least a portion of the surrounding environment of the vehicle based on the first set of perception data and the second set of perception data; a prompter to match the second set of perception data to the baseline worldview in order to identify a match between the perceptive parameter of the second set of perception data and a corresponding perceptive parameter in the first set of perception data; obtain a user interaction event from the input device of the user-device in response to the displayed prompter, the user interaction event being indicative of a user interaction with the displayed graphical representation; after the obtained user interaction event: match the perceptive parameter of the second set of perception data and the corresponding perceptive parameter in the first set of perception data based on the obtained user interaction event; transmit an output signal indicative of the matched perceptive parameter of the second set of perception and the corresponding perceptive parameter in the first set of perception data to the vehicle; wherein the control circuitry of the in-vehicle apparatus is further configured to: update one or more parameters of a perception model of a perception- development module based on the output signal.