WO2020073272A1 - Snapshot image to train an event detector - Google Patents

Snapshot image to train an event detector

Info

Publication number
WO2020073272A1
Authority
WO
WIPO (PCT)
Prior art keywords
sensor
sensor data
data
vehicle
training
Prior art date
Application number
PCT/CN2018/109802
Other languages
French (fr)
Inventor
Wanli Jiang
Maximilian DOMLING
Qianshan LI
Original Assignee
Bayerische Motoren Werke Aktiengesellschaft
Priority date
Filing date
Publication date
Application filed by Bayerische Motoren Werke Aktiengesellschaft
Priority to CN201880098545.7A (CN112805716A)
Priority to PCT/CN2018/109802 (WO2020073272A1)
Priority to EP18936598.4A (EP3864568A4)
Publication of WO2020073272A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/86Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • G01S13/865Combination of radar systems with lidar systems
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/86Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • G01S13/867Combination of radar systems with cameras
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/93Radar or analogous systems specially adapted for specific applications for anti-collision purposes
    • G01S13/931Radar or analogous systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/93Lidar systems specially adapted for specific applications for anti-collision purposes
    • G01S17/931Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/28Details of pulse systems
    • G01S7/285Receivers
    • G01S7/295Means for transforming co-ordinates or for evaluating data, e.g. using computers
    • G01S7/2955Means for determining the position of the radar coordinate system for evaluating the position data of the target in another coordinate system
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/40Means for monitoring or calibrating
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/41Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S7/417Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section involving the use of neural networks
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/48Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
    • G01S7/497Means for monitoring or calibrating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/93Radar or analogous systems specially adapted for specific applications for anti-collision purposes
    • G01S13/931Radar or analogous systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • G01S2013/9323Alternative operation using light waves

Definitions

  • the present disclosure relates in general to automated driving vehicles, and more particularly, to training an event detector with snapshot images.
  • An automated driving vehicle also known as a driverless car, self-driving car, robotic car
  • ADV Automated driving vehicles
  • ADV use a variety of techniques to detect their surroundings, such as radar, laser light, GPS, odometry and computer vision.
  • Advanced control systems interpret sensory information to identify appropriate navigation paths, as well as obstacles and relevant signage.
  • an ADV collects sensor data from a variety of on-board sensors, such as camera, Lidar, radar, etc. Based on the sensor data, the ADV can construct a real-time roadmodel around it.
  • Roadmodels may include a variety of information including, but not limited to, lanemarkings, traffic lights, traffic signs, road boundaries, etc.
  • the constructed roadmodel is compared to the pre-installed roadmodels, such as those provided by high definition (HD) map providers, so that the ADV may more accurately determine its location in the HD map.
  • the ADV may also identify objects around it, such as vehicles and pedestrians based on the sensor data.
  • the ADV can make appropriate driving decisions based on the determined roadmodel and identified surrounding objects, such as lane change, acceleration, braking, etc.
  • each type of sensor data has to be separately processed.
  • one or more models for object identification have to be established.
  • it may have drawbacks when being used to train a target model, such as an event detector. For example, if using images directly obtained by a camera to train an event detector, the drawbacks may include: (1) no classification of elements in the image; (2) the images may be in arbitrary perspective; and (3) a huge number of sample images is required to train a target model. Similar drawbacks may exist for other types of sensors.
  • if an event detector is trained based on a predefined description, it may have the following drawbacks: (1) the description has to be defined specifically for an event; and (2) the description may easily ignore some supporting information, e.g., for a lane change intention, the objects in the neighboring lane also have an impact. Also, existing ways of training an event detector do not use machine learning techniques, and thus the training requires a lot of pre-defined criteria and only works with many pre-conditions.
  • the present disclosure aims to provide a method and an apparatus for training an event detector with snapshot images and detecting events with the trained event detector.
  • the method of the present invention may have at least the following advantages over the traditional ways of event detector training: (1) a smaller number of training samples is used due to the unified data representation; (2) the sample data can contain various types of information, which can also include historical information; (3) it is computationally efficient; and (4) it enables the use of existing data of different formats.
  • a method for training an event detector with snapshot images comprises: obtaining at least two frames of sensor data from at least one sensor installed on a vehicle, the at least two frames of sensor data being sequentially collected at different times; obtaining results of events that are occurring while the sensor data are obtained; for each of the at least two frames, creating a snapshot image with the obtained sensor data; associating the obtained results of events with corresponding snapshot images as training data; and training an event detector using the training data.
  • a method on a vehicle for detecting events comprises: obtaining an event detector trained by the method according to the first embodiment; obtaining at least one frame of sensor data from at least one type of sensor installed on a vehicle; for each of the at least one frame, creating a snapshot image with the obtained sensor data; and detecting events with the event detector based on the created snapshot image.
  • a system for training an event detector with snapshot images comprises a sensor data obtaining module configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle; an event result obtaining module configured for obtaining results of events that are occurring while the sensor data are obtained; a snapshot image creating module configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data; an associating module configured for associating the obtained results of events with corresponding snapshot images as training data; and a training module configured for training an event detector using the training data.
  • an apparatus on a vehicle for detecting events comprises: a detector obtaining module configured for obtaining an event detector trained by the method according to the first embodiment; a sensor data obtaining module configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle; a snapshot image creating module configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data; and an event detecting module configured for detecting events with the event detector based on the created snapshot image.
  • a vehicle including at least one sensor and the apparatus of the fourth exemplary embodiment is provided.
  • Fig. 1 illustrates an exemplary diagram generated from a snapshot image of a traffic scenario according to an embodiment of the present invention.
  • Fig. 2 is a flowchart of an exemplary method for creating snapshot images of traffic scenarios according to an embodiment of the present invention.
  • Fig. 3 illustrates an exemplary diagram generated from a snapshot image of a traffic scenario according to another embodiment of the present invention.
  • Fig. 4 is a flowchart of an exemplary method for creating snapshot images of traffic scenarios according to another embodiment of the present invention.
  • Fig. 5 is a flowchart of an exemplary method for creating snapshot images of traffic scenarios according to yet another embodiment of the present invention.
  • Fig. 6 is a flowchart of an exemplary method for training a road model with snapshot images according to an embodiment of the present invention.
  • Fig. 7 is a flowchart of an exemplary method for training an event detector with snapshot images according to an embodiment of the present invention.
  • Fig. 8 is a flowchart of an exemplary method implemented on a vehicle for detecting events according to an embodiment of the present invention.
  • Fig. 9 illustrates an exemplary apparatus for creating snapshot images of traffic scenario according to an embodiment of the invention.
  • Fig. 10 illustrates an exemplary vehicle according to an embodiment of the present invention.
  • Fig. 11 illustrates an exemplary apparatus for creating snapshot images of traffic scenario according to another embodiment of the invention.
  • Fig. 12 illustrates an exemplary vehicle according to another embodiment of the present invention.
  • Fig. 13 illustrates an exemplary apparatus for creating snapshot images of traffic scenario according to yet another embodiment of the invention.
  • Fig. 14 illustrates an exemplary vehicle according to yet another embodiment of the present invention.
  • Fig. 15 illustrates an exemplary system for training a road model with snapshot images according to an embodiment of the present invention.
  • Fig. 16 illustrates an exemplary system for training an event detector with snapshot images according to an embodiment of the present invention.
  • Fig. 17 illustrates an apparatus on a vehicle for detecting events according to an embodiment of the present invention.
  • Fig. 18 illustrates an exemplary vehicle according to an embodiment of the present invention.
  • Fig. 19 illustrates a general hardware environment wherein the present disclosure is applicable in accordance with an exemplary embodiment of the present disclosure.
  • the term "vehicle" used throughout the specification refers to a car, an airplane, a helicopter, a ship, or the like.
  • the invention is described with respect to "car", but the embodiments described herein are not limited to "car" only, but are applicable to other kinds of vehicles.
  • the term "A or B" used throughout the specification refers to "A and B" as well as "A or B", rather than meaning that A and B are exclusive, unless otherwise specified.
  • the present invention provides a method capable of efficiently integrating various types of sensor data on a vehicle in a unified manner to integrally exhibit information of traffic scenarios around a vehicle.
  • the method is to some extent like taking a picture of a scene, so it is hereinafter called a "snapshot", and the data of the snapshots are called "snapshot images".
  • a snapshot may be constructed by sensor data from multiple sensors captured at the same time.
  • sensors such as Lidar, Radar and camera.
  • Each sensor records its own sensor data and provides it to the central processing unit of the vehicle.
  • the formats of sensor data provided by various types or various manufacturers of sensors are typically different. Therefore, the central processing unit needs to have the ability to read and recognize each of the various types of sensor data, and use them separately. It thus consumes a lot of resources and is very inefficient.
  • the present invention integrates sensor data from multiple sensors in the form of a snapshot.
  • the multiple sensors may be of the same type of sensors, but may also be of different types.
  • the reference coordinate system of the present invention may be a two-dimensional plane parallel to the ground.
  • the origin of the reference coordinate system may be the midpoint of the rear axle of the vehicle, for example.
  • the origin may be the position of any one of the sensors, such as the geometric center of the sensor, or the origin of a local coordinate system used by the sensor.
  • the origin can also be any point on a vehicle.
  • the midpoint of the rear axle of the car is selected as the origin in this embodiment.
  • one axis of the reference coordinate system can be parallel to the rear axle of the vehicle, and the other axis can be perpendicular to the rear axle of the vehicle.
  • Figure 1 which illustrates an exemplary diagram generated from a snapshot image of a traffic scenario according to an embodiment of the present invention
  • the x-axis is perpendicular to the rear axle of the vehicle, with the positive half of the x-axis representing positions in front of the direction of travel of the vehicle, and the negative half of the x-axis representing positions behind the direction of travel of the vehicle.
  • the y-axis is parallel to the rear axle of the vehicle.
  • the positive half of the y-axis can represent positions on the left side of the direction of travel of the vehicle, and the negative half-axis of the y-axis can represent positions on the right side of the direction of travel of the vehicle.
  • the size of the reference coordinate system can be pre-determined so as to limit the data amount.
  • the x-axis and the y-axis may be defined to have a size of -50 to +50 meters, or -100 to +100 meters, or the like.
  • the sizes of the x-axis and the y-axis may be determined by the maximum sensing range of the sensors mounted on the vehicle.
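  • As a minimal illustrative sketch of such a bounded reference coordinate system (Python is assumed here, and all names are illustrative rather than taken from the description), the extent can be represented and checked as follows:

```python
from dataclasses import dataclass

@dataclass
class ReferenceFrame:
    """Two-dimensional reference coordinate system parallel to the ground.

    Origin: midpoint of the vehicle's rear axle.
    x-axis: perpendicular to the rear axle (positive ahead of the vehicle).
    y-axis: parallel to the rear axle (positive to the left of the vehicle).
    The +/-50 m extent below is one of the example sizes mentioned above.
    """
    x_min: float = -50.0
    x_max: float = 50.0
    y_min: float = -50.0
    y_max: float = 50.0

    def contains(self, x: float, y: float) -> bool:
        """Return True if a point lies within the pre-determined extent."""
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max
```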
  • the various sensors used in vehicles typically output at least a binary set of location information and value information, such as {(x, y), d}, which represents that the readout of the sensor at position (x, y) is d.
  • the location information is in the local coordinate system of the sensor.
  • the sensor data for each sensor can be transformed from its respective local coordinate system to the reference coordinate system.
  • the position at which a sensor is mounted on the vehicle is known, so the corresponding position in the reference coordinate system can be determined.
  • the relative position between the local coordinate system of a first sensor and the reference coordinate system is (x_c1, y_c1), i.e., the origin of the local coordinate system of the first sensor is located at (x_c1, y_c1) in the reference coordinate system.
  • a given position (x_s1, y_s1) in the local coordinate system of the first sensor can be transformed to (x_s1 - x_c1, y_s1 - y_c1) in the reference coordinate system.
  • the relative position between the local coordinate system of a second sensor and the reference coordinate system is (x_c2, y_c2), i.e., the origin of the local coordinate system of the second sensor is located at (x_c2, y_c2) in the reference coordinate system.
  • a given position (x_s2, y_s2) in the local coordinate system of the second sensor can be transformed to (x_s2 - x_c2, y_s2 - y_c2) in the reference coordinate system.
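  • A minimal sketch of this translation step, mirroring the formula above and assuming the sensor axes are parallel to the reference axes (the function name and the numeric example are illustrative assumptions):

```python
def local_to_reference(x_s: float, y_s: float, x_c: float, y_c: float) -> tuple[float, float]:
    """Translate a sensor-local point (x_s, y_s) into the reference frame.

    (x_c, y_c) is the relative position between the sensor's local coordinate
    system and the reference coordinate system, used exactly as in the formula
    above: the transformed point is (x_s - x_c, y_s - y_c). No rotation is
    applied, since both frames are assumed to have parallel axes.
    """
    return x_s - x_c, y_s - y_c

# Hypothetical example: a reading at (12.0, 3.5) in a sensor's local frame,
# with a relative position of (1.2, 0.0), lands at (10.8, 3.5).
print(local_to_reference(12.0, 3.5, 1.2, 0.0))
```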
  • some sensors may use a three-dimensional local coordinate system, for example, point cloud data of Lidar is three-dimensional.
  • Such three-dimensional coordinate systems can be projected onto the two-dimensional reference coordinate system. More specifically, such three-dimensional coordinate systems are generally represented by x, y and z axes, wherein the plane formed by two of the three axes (assuming the x and y axes) is typically parallel to the ground as well, and thus parallel to the x-y plane in the reference coordinate system of the present invention. Therefore, its x-y coordinates can be similarly transformed into coordinates in the reference coordinate system by translation. The z coordinates do not need to be transformed and can be retained as additional information in the snapshot image data.
  • the snapshot image provided by the present invention, if visually displayed, may look similar to a top view of a scenario.
  • data provided by different sensors may have different data formats.
  • the degrees of processing of the data may vary.
  • some sensors can only provide raw data, while some sensors provide data that has been processed, for example, data with recognitions to some extent.
  • some Lidars can provide further information of scenarios based on the point cloud data, such as segmentations or recognitions of some objects (e.g., street signs, etc. ) .
  • Some cameras may also provide similar recognition, such as identifying lanemarkings in captured image.
  • the data output by sensors always contains pairs of position data and values. In other words, the output of a sensor always tells what information is present at what positions. Therefore, for creating snapshots according to the present invention, it is only necessary to record all of the correspondences between the positions and data in a single snapshot, so that the snapshot of the present invention can be made compatible with all sensors, and meanwhile contain all the original information of each sensor.
  • the same object in the scenario may be sensed by different sensors.
  • Lidar, Radar, and camera may all have sensed the tree, and respectively provide sensor data corresponding to the tree represented in their own local coordinate systems, such as {(x_s1, y_s1), d_s1} by a first sensor, and {(x_s2, y_s2), d_s2} by a second sensor.
  • the positions given by the two sensors will be the same spot in the reference coordinate system, i.e., (x_1, y_1).
  • the sensor data given by the two sensors can both be added to (x_1, y_1), as {(x_1, y_1), d_s1, d_s2}, for example.
  • the data format described herein is merely exemplary, and any suitable data format that reflects the relationships between positions and readout values can be used in recording of snapshot image data according to the present invention.
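  • One possible encoding of such position/value records is sketched below; the grid quantisation and the dictionary layout are assumptions chosen for the example, not formats prescribed by the description:

```python
from collections import defaultdict

def make_snapshot():
    """A snapshot maps a (quantised) reference-frame position to a dict of
    per-sensor readout values, e.g. {(x, y): {"lidar": d1, "radar": d2}}."""
    return defaultdict(dict)

def add_reading(snapshot, x: float, y: float, sensor_id: str, value, cell_size: float = 0.1) -> None:
    """Record a sensor readout at reference-frame position (x, y).

    Positions are quantised to a grid cell so that readings from different
    sensors referring to the same physical spot land in the same entry.
    The 0.1 m cell size is an illustrative assumption.
    """
    key = (round(x / cell_size), round(y / cell_size))
    snapshot[key][sensor_id] = value

# Hypothetical example: a tree sensed by both Lidar and radar at the same spot.
snap = make_snapshot()
add_reading(snap, 10.8, 3.5, "lidar", 0.7)
add_reading(snap, 10.8, 3.5, "radar", 4.2)
```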
  • Fig. 2 is a flowchart of an exemplary method 200 for creating snapshot images of traffic scenarios according to an embodiment of the present invention.
  • the method 200 starts at step 202, where sensor data of at least two sensors installed on a vehicle may be obtained.
  • the sensor data is collected at substantially the same time (or has the same timestamp).
  • positions of each of the sensors may be obtained.
  • the positions of each of the sensors are the relative position of each sensor in the reference coordinate system.
  • the sensor data of each of the at least two sensors may be transformed into a reference coordinate system based on the obtained positions of the sensors.
  • all of the transformed sensor data may be plotted onto an image to form a snapshot image.
  • an optional "fusing" step may be performed on the sensor data. Since a plurality of sensors are used, the sensor data from different sensors can be used to enhance the reliability and confidence of the sensor data. For example, if a Lidar senses a traffic sign and gives a recognized result indicating that the object is a traffic sign, and a camera also captures the picture and recognizes the traffic sign, then the recognition of the traffic sign has an almost 100% confidence. On the other hand, if the sensor data given by the Lidar is not certain what the object is (like a traffic sign with 50% confidence), then with the sensor data from the camera, the confidence will also be increased to almost 100%.
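  • As a rough sketch of how two independent recognitions of the same object might raise the overall confidence, per-sensor confidences could be combined as below; the particular combination rule is an assumption for illustration, since the description only states that the confidence increases:

```python
def fuse_confidence(conf_a: float, conf_b: float) -> float:
    """Combine two independent per-sensor confidences for the same object.

    Treating the detections as independent evidence gives
    1 - (1 - a) * (1 - b): a 50% Lidar guess plus a 95% camera recognition
    yields about 0.975, i.e. almost certain.
    """
    return 1.0 - (1.0 - conf_a) * (1.0 - conf_b)

print(fuse_confidence(0.5, 0.95))   # ~0.975
print(fuse_confidence(0.95, 0.95))  # ~0.9975
```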
  • Another case showing the advantage of using multiple sensors may be that a part of a lanemarking is temporarily blocked by an object, such as a car, so the blocked part may not be sensible by sensor A. With reference to sensor data from sensor B, such as an image captured by a camera showing clearly that the lanemarking is there and merely blocked, the raw data given by sensor A can be processed so as to replace the raw data with data corresponding to the lanemarking, as if there were no object blocking that part of the lanemarking.
  • snapshot data does not have to be drawn as visible images. Instead, as aforementioned, snapshots or snapshot images are only representatives of recording sensor data of a surrounding scenario at a particular moment or moments. Therefore, the “plotting data onto an image” in step 208 does not mean that the data is visually presented as an image, but refers to integrating the transformed sensor data from various sensors into a unified data structure based on the coordinate positions in the reference coordinate system. This data structure is called a “snapshot” , “snapshot image” or “snapshot image data” . Of course, since the position information and the data values associated with the positions are completely retained in the snapshot image data, it can be visually rendered as an image by some specialized software if necessary, for example for human understanding.
  • the snapshot may be constructed by sensor data from one single sensor, but captured at different times.
  • the difference from the previously described embodiment is that the first embodiment records a snapshot of multiple sensors at the same time, while the second embodiment records a snapshot of one single sensor at different times.
  • a reference coordinate system may be established first. Assume that it is still a two-dimensional coordinate system parallel to the ground. As an example, the midpoint of the rear axle of the car is again selected as the origin of the reference coordinate system.
  • the x-axis is perpendicular to the rear axle of the vehicle, with the positive half and the negative half of the x-axis representing positions in front of and behind the direction of travel of the vehicle, respectively.
  • the y-axis is parallel to the rear axle of the vehicle, with the positive half and the negative half of the y-axis representing positions on the left side and the right side of the direction of travel of the vehicle, respectively.
  • Sensor data captured by a sensor at a single time spot may be referred to as a sensor data frame.
  • the n frames may be a series of successive data frames of a sensor.
  • the n frames of data can be sequentially taken using the sampling interval of the sensor itself.
  • the n sensor data frames may be captured at a regular interval.
  • an interval larger than the sampling interval of the sensor itself may be appropriately selected.
  • the sampling frequency of the sensor itself is 100 Hz, but one frame may be selected every 10 frames as a snapshot data frame.
  • the sampling interval may be selected based on the speed of movement of the vehicle, for example, such that the data of the sensor can have a relatively significant difference when the vehicle is not moving too fast.
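  • A small sketch of one way to pick snapshot frames at a coarser interval than the sensor's native rate is given below; the every-10th-frame figure follows the 100 Hz example above, while the speed-based adjustment is an illustrative assumption:

```python
def select_snapshot_frames(frames: list, stride: int = 10) -> list:
    """Keep every `stride`-th frame, e.g. one snapshot frame per 10 sensor
    frames for a 100 Hz sensor (an effective 10 Hz snapshot rate)."""
    return frames[::stride]

def stride_for_speed(speed_mps: float, base_stride: int = 10, reference_speed_mps: float = 15.0) -> int:
    """Illustrative heuristic: at lower speeds successive frames differ less,
    so a larger stride is chosen; at higher speeds a smaller one. The exact
    rule is not specified in the description."""
    if speed_mps <= 0.0:
        return base_stride
    return max(1, round(base_stride * reference_speed_mps / speed_mps))
```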
  • the n frames of sensor data may be transformed to snapshot data.
  • sensor data also typically contains a timestamp that records the time at which the data was captured.
  • a particular time spot may be selected as the reference time, or reference timestamp.
  • the acquisition time of the first frame or the last frame or any one of the frames in the n frames may be taken as the reference time t_0. It is assumed herein that the time of the first frame is taken as the reference time t_0, and the subsequent second to n-th frames can be denoted as times t_1, ..., t_(n-1).
  • the times t_1, ..., t_(n-1) herein are also called timestamps or ages of the frames.
  • each frame of sensor data may be transformed into data in the reference coordinate system.
  • the transformation may include the transformation of the positions between the reference coordinate system and the local coordinate system of the sensor. Similar to the first embodiment, the position of the sensor on the vehicle is known, so the relative positional relationship between the origin of its local coordinate system and the origin of the reference coordinate system is known. Thus, the coordinates can be transformed by translation.
  • the positional movement of the vehicle may be determined by the time interval between t_0 and t_1 and the speed of the car during this time period, or by an odometer or some other sensor data.
  • {(x_1, y_1), d_1, t_1} in the second frame of data can be transformed into {(x_1 - x_c - d_x1, y_1 - y_c - d_y1), d_1, t_1}, wherein t_1 represents the time of capturing the second data frame.
  • subsequent frames can also perform the same transformation.
  • all of the n frames of transformed sensor data may be integrated to form a snapshot based on the transformed positions in the reference coordinate system.
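  • A minimal sketch of compensating for the vehicle's own movement when folding an earlier frame into the current reference coordinate system; constant speed between frames and a pure translation are assumed, and the names are illustrative:

```python
import math

def compensate_ego_motion(x: float, y: float, d_x: float, d_y: float) -> tuple[float, float]:
    """Shift a point from an earlier frame by the vehicle displacement
    (d_x, d_y) accumulated since that frame, matching the
    (x - x_c - d_x, y - y_c - d_y) form above once the sensor-mount offset
    (x_c, y_c) has already been removed."""
    return x - d_x, y - d_y

def displacement_from_speed(speed_mps: float, heading_rad: float, dt_s: float) -> tuple[float, float]:
    """Estimate the displacement (d_x, d_y) from speed and heading over the
    inter-frame interval; an odometer or other sensor data may be used instead."""
    return speed_mps * dt_s * math.cos(heading_rad), speed_mps * dt_s * math.sin(heading_rad)
```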
  • if the snapshot data is visually rendered as an image, it may look like Fig. 3, which illustrates an exemplary diagram generated from a snapshot image of a traffic scenario according to another embodiment of the present invention.
  • the data from time t_0 to time t_n is integrated into one single coordinate system.
  • a stationary object in the scenario still appears to be stationary, but a moving object may appear as a motion trajectory.
  • the building 102 in Fig. 1, since it is stationary, will have its coordinates in the frames of sensor data coincide with each other after being transformed into the reference coordinate system. Thus, it is still shown as stationary in Fig. 3 at the same position as compared to Fig. 1.
  • the moving car 103 in Fig. 1 will appear in Fig. 3 as traveling straight along the lane first and then performing a lane change.
  • Fig. 4 is a flowchart of an exemplary method 400 for creating snapshot images of traffic scenarios according to another embodiment of the present invention.
  • the method 400 starts at step 402, where at least two frames of sensor data of a sensor installed on a vehicle are obtained.
  • the at least two frames of sensor data may be sequentially collected at different times.
  • a position of the sensor is obtained.
  • the position of the sensor is the relative position of the sensor in the reference coordinate system.
  • each frame of the sensor data may be transformed into a current reference coordinate system based on the obtained position of the sensor.
  • the relative movements of the vehicle between frames should also be considered during the transformation.
  • at step 408, the transformed frames of sensor data are plotted onto an image to form a snapshot image.
  • the sensor data captured at different timestamps can also be used to enhance the reliability and confidence of the sensor data. For example, at one timestamp, a sensor may sense an object but not be sure what it is. Several frames later, as the vehicle gets closer to the object, the sensor clearly figures out what it is. Then, the previous data can be processed or fused with the newer data.
  • the snapshot may be constructed by sensor data from multiple sensors at different times.
  • the third embodiment is similar in many aspects to the previously described second embodiment, except that only one sensor is used in the second embodiment, while a plurality of sensors are used in the third embodiment.
  • a snapshot is created with multiple sensors at a single time spot. Similar to the first embodiment, on the basis of the second embodiment recording n frames of sensor data, the data from multiple sensors can undergo a coordinate system transformation and a snapshot can be formed based on the coordinates.
  • the relative position between the local coordinate system of a first sensor (e.g., Lidar) and the reference coordinate system is (x_c1, y_c1), i.e., the origin of the local coordinate system is located at (x_c1, y_c1) in the reference coordinate system.
  • the relative position between the local coordinate system of a second sensor (e.g., radar) and the reference coordinate system is (x_c2, y_c2), i.e., the origin of the local coordinate system is located at (x_c2, y_c2) in the reference coordinate system.
  • the positional movement of the car during t_0 to t_1 is (d_x1, d_y1).
  • {(x_s1, y_s1), d_1, t_1} in the second frame of data of the first sensor can be transformed into {(x_s1 - x_c1 - d_x1, y_s1 - y_c1 - d_y1), d_1, t_1}.
  • the {(x_s2, y_s2), d_2, t_1} in the second frame of data of the second sensor can be transformed into {(x_s2 - x_c2 - d_x1, y_s2 - y_c2 - d_y1), d_2, t_1}.
  • each frame of transformed data of each sensor is integrated into a snapshot under the reference coordinate system.
  • the snapshot formed according to the third embodiment looks like a combination of the snapshot data formats of the first embodiment and the second embodiment, and can be generally represented as {(x, y), d_s1, d_s2, ..., d_sn, t_(n-1)} to represent multiple sensor data values at (x, y) in the reference coordinate system with timestamps.
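  • One possible (assumed) encoding of such a combined entry, keeping the reference-frame position, one value per sensor, and the frame timestamp together:

```python
def combined_record(x: float, y: float, sensor_values: dict, t: float) -> dict:
    """Encode one combined snapshot entry {(x, y), d_s1, ..., d_sn, t}: a
    reference-frame position, one readout per sensor, and the timestamp (age)
    of the frame the readings came from."""
    return {"position": (x, y), "values": dict(sensor_values), "timestamp": t}

# Hypothetical entry: Lidar and radar readings at the same spot from frame t_1.
print(combined_record(10.8, 3.5, {"lidar": 0.7, "radar": 4.2}, t=0.1))
```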
  • if the snapshot data of the third embodiment is visually rendered as an image, the image should appear similar to that of the second embodiment, reflecting the dynamic changes of the scenario.
  • Fig. 5 is a flowchart of an exemplary method 500 for creating snapshot images of traffic scenarios according to an embodiment of the present invention.
  • the method starts at step 502, where at least two frames of sensor data of the road scene from at least two sensors installed on a vehicle are obtained.
  • the at least two frames of sensor data may be sequentially collected at different times.
  • positions of each of the at least two sensors are obtained.
  • each frame of the sensor data is transformed into a current reference coordinate system based on the obtained positions of the at least two sensors. Similar to the second embodiment, the relative movements of the vehicle between frames should also be considered during the transformation.
  • all of the transformed sensor data may be plotted onto an image to form a snapshot image. Also, there may also be an optional fusing step in this embodiment, such as to fuse the sensor data which has overlapped positions in the reference coordinate system.
  • AD automated driving
  • for an automated driving (AD) vehicle, it makes real-time driving decisions based on HD maps and a variety of sensor data.
  • the AD vehicle must first determine its exact position on the road, and then decide how to drive (steering, accelerating, etc.).
  • the AD vehicle identifies objects based on real-time sensor data, such as Lidar, camera, etc. Then, it compares the identified objects to the roadmodel contained in the HD map, thereby determining its position on the road.
  • Fig. 6 is a flowchart of an exemplary method 600 for training a road model with snapshot images according to an embodiment of the present invention.
  • the method starts at step 602, obtaining an existing road model of a road scene.
  • at step 604, at least two frames of sensor data of the road scene are obtained from at least two sensors installed on a vehicle; the at least two frames of sensor data are sequentially collected at different times.
  • at step 606, for each of the at least two frames, a snapshot image is created with the obtained sensor data.
  • a new road model is then trained using the training data. As an example, the training may be based on machine learning techniques.
  • the snapshot images and the known elements from existing road models are paired, or labeled, so as to be used as training data.
  • the desired model can be trained.
  • although the amount of training data used for training a model is still large, the amount of data will be significantly less than when training the model separately with each type of sensor data.
  • the snapshot of the present invention may contain data collected by one or more sensors at multiple times, and thus can reflect dynamic information of objects in the scenarios.
  • This feature is also useful in training an ADV to identify motion states (also known as events) of objects that occur in real time in the scenarios.
  • the car in Figure 3 changes from the lane to the left of the current lane of the vehicle on which the sensors are mounted, into the current lane, which is a commonly-seen lane change on the road, also known as "cut in".
  • Similar events include but are not limited to: lane change; overtaking; steering; braking; collision; and loss of control.
  • Fig. 7 is a flowchart of an exemplary method 700 for training an event detector with snapshot images according to an embodiment of the present invention.
  • the method 700 starts at step 702, where at least two frames of sensor data from at least one sensor installed on a vehicle are obtained.
  • the at least two frames of sensor data may be sequentially collected at different times.
  • results of events that are occurring while the sensor data are obtained may be obtained. These results may come from humans. For example, engineers may watch a video corresponding to the sensor data frames and identify events in the video.
  • a snapshot image may be created with the obtained sensor data, such as via the methods 200, 400 or 500 of creating snapshot images described in Figs. 2, 4 and 5.
  • at step 708, the obtained results of events are associated with corresponding snapshot images as training data.
  • at step 710, an event detector is trained using the training data; the training may be based on machine learning techniques.
  • the snapshot images and the known events are paired or labeled so as to be used as training data.
  • the desired event detector can be trained. Although the amount of training data used for training the event detector is still large, it will be significantly less than when training the event detector separately with each type of sensor data.
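  • A rough sketch of steps 706-710 with a generic off-the-shelf classifier is given below; the rasterisation grid, the feature encoding and the choice of a random forest are all illustrative assumptions, since the description only states that the training may be based on machine learning techniques:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rasterize(snapshot: dict, extent: float = 50.0, cells: int = 100) -> np.ndarray:
    """Render a snapshot (here simplified to a mapping (x, y) -> scalar value)
    onto a fixed 2-D grid covering +/-extent metres, so that every training
    sample has the same shape."""
    grid = np.zeros((cells, cells), dtype=np.float32)
    scale = cells / (2.0 * extent)
    for (x, y), value in snapshot.items():
        col = int((x + extent) * scale)
        row = int((y + extent) * scale)
        if 0 <= row < cells and 0 <= col < cells:
            grid[row, col] = value
    return grid

def train_event_detector(snapshots: list, event_labels: list):
    """Associate each snapshot image with its observed event label and fit a
    classifier on the flattened images (corresponding to steps 708 and 710)."""
    features = np.stack([rasterize(s).ravel() for s in snapshots])
    detector = RandomForestClassifier(n_estimators=100)
    detector.fit(features, event_labels)
    return detector
```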
  • Fig. 8 is a flowchart of an exemplary method 800 on a vehicle for detecting events.
  • the method 800 starts at step 802, where an event detector, such as the event detector trained via the method 700, may be obtained.
  • at least one frame of sensor data from at least one sensor installed on a vehicle may be obtained.
  • a snapshot image may be created with the obtained sensor data.
  • events may be detected with the event detector based on the created snapshot image. More specifically, this step may include inputting the created snapshot image to the event detector, and then the event detector outputting detected events based on the input snapshot image.
  • the results, i.e., the detected events, may be output with probabilities or confidences.
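  • Continuing the sketch above, detection with per-event probabilities could look as follows; predict_proba and classes_ are standard attributes of scikit-learn classifiers such as the random forest used in the training sketch, and the event names shown are hypothetical:

```python
import numpy as np

def detect_events(detector, snapshot_image: np.ndarray) -> dict:
    """Feed one created snapshot image (already rasterised to the detector's
    input grid) to the trained event detector and return each candidate event
    with its probability/confidence."""
    features = snapshot_image.ravel().reshape(1, -1)
    probabilities = detector.predict_proba(features)[0]
    return dict(zip(detector.classes_, probabilities))

# Hypothetical output: {"cut_in": 0.82, "braking": 0.10, "no_event": 0.08}
```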
  • Fig. 9 illustrates an exemplary apparatus 900 for creating snapshot images of traffic scenarios according to an embodiment of the invention.
  • the apparatus 900 may comprise a sensor data obtaining module 902, a sensor position obtaining module 904, a transforming module 906, and a plotting module 908.
  • the sensor data obtaining module 902 may be configured for obtaining sensor data of at least two sensors installed on a vehicle.
  • the sensor position obtaining module 904 may be configured for obtaining positions of each of the sensors.
  • the transforming module 906 may be configured for transforming the sensor data of each of the at least two sensors into a reference coordinate system based on the obtained positions of the sensors.
  • the plotting module 908 may be configured for plotting the transformed sensor data onto an image to form a snapshot image.
  • Fig. 10 illustrates an exemplary vehicle 1000 according to an embodiment of the present invention.
  • the vehicle 1000 may comprise an apparatus for creating snapshot images of traffic scenarios, such as the apparatus 900 in Fig. 9. Like normal vehicles, the vehicle 1000 may further comprise at least two sensors 1002 for collecting sensor data of traffic scenarios.
  • the sensors 1002 may be of different types and include but are not limited to Lidars, radars and cameras.
  • Fig. 11 illustrates an exemplary apparatus 1100 for creating snapshot images of traffic scenarios according to another embodiment of the invention.
  • the apparatus 1100 may comprise a sensor data obtaining module 1102, a sensor position obtaining module 1104, a transforming module 1106, and a plotting module 1108.
  • the sensor data obtaining module 1102 may be configured for obtaining at least two frames of sensor data of a sensor installed on a vehicle.
  • the sensor position obtaining module 1104 may be configured for obtaining a position of the sensor.
  • the transforming module 1106 may be configured for transforming each frame of the sensor data into a current reference coordinate system based on the obtained position of the sensor.
  • the plotting module 1108 may be configured for plotting the transformed sensor data onto an image to form a snapshot image.
  • Fig. 12 illustrates an exemplary vehicle 1200 according to an embodiment of the present invention.
  • the vehicle 1200 may comprise an apparatus for creating snapshot images of traffic scenarios, such as the apparatus 1100 in Fig. 11.
  • the vehicle 1200 may further comprise at least one sensor 1202 for collecting sensor data of traffic scenarios.
  • the at least one sensor 1202 may be of different types and include but are not limited to Lidars, radars and cameras.
  • Fig. 13 illustrates an exemplary apparatus 1300 for creating snapshot images of traffic scenarios according to yet another embodiment of the invention.
  • the apparatus 1300 may comprise a sensor data obtaining module 1302, a sensor position obtaining module 1304, a transforming module 1306, and a plotting module 1308.
  • the sensor data obtaining module 1302 may be configured for obtaining at least two frames of sensor data of the road scene from at least two sensors installed on a vehicle.
  • the sensor position obtaining module 1304 may be configured for obtaining positions of each of the at least two sensors.
  • the transforming module 1306 may be configured for transforming each frame of the sensor data into a current reference coordinate system based on the obtained positions of the at least two sensors.
  • the plotting module 1308 may be configured for plotting the transformed sensor data onto an image to form a snapshot image.
  • Fig. 14 illustrates an exemplary vehicle 1400 according to an embodiment of the present invention.
  • the vehicle 1400 may comprise an apparatus for creating snapshot images of traffic scenarios, such as the apparatus 1300 in Fig. 13.
  • the vehicle 1400 may further comprise at least two sensors 1402 for collecting sensor data of traffic scenarios.
  • the at least two sensors 1402 may be of different types and include but are not limited to Lidars, radars and cameras.
  • Fig. 15 illustrates an exemplary system 1500 for training a road model with snapshot images according to an embodiment of the present invention.
  • the system 1500 may comprise at least two sensors 1502 configured for collecting sensor data of a road scene, and a processing unit 1504.
  • the processing unit 1504 is configured to perform the method of training a road model with snapshot images, such as the method 600 described in Fig. 6.
  • Fig. 16 illustrates an exemplary system 1600 for training an event detector with snapshot images.
  • the system 1600 may comprise a sensor data obtaining module 1602, an event result obtaining module 1604, a snapshot image creating module 1606, an associating module 1608 and a training module 1610.
  • the sensor data obtaining module 1602 may be configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle.
  • the event result obtaining module 1604 may be configured for obtaining results of events that are occurring while the sensor data are obtained.
  • the snapshot image creating module 1606 may be configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data.
  • the associating module 1608 may be configured for associating the obtained results of events with corresponding snapshot images as training data.
  • the training module 1610 may be configured for training an event detector using the training data.
  • Fig. 17 illustrates an apparatus 1700 on a vehicle for detecting events according to an embodiment of the present invention.
  • the apparatus 1700 may comprise a detector obtaining module 1702, a sensor data obtaining module 1704, a snapshot image creating module 1706, and an event detecting module 1708.
  • the detector obtaining module 1702 may be configured for obtaining an event detector trained by a training method, such as the method 700 described with regard to Fig. 7.
  • the sensor data obtaining module 1704 may be configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle.
  • the snapshot image creating module 1706 may be configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data.
  • the event detecting module 1708 may be configured for detecting events with the event detector based on the created snapshot image.
  • Fig. 18 illustrates an exemplary vehicle 1800 according to an embodiment of the present invention.
  • the vehicle 1800 may comprise an apparatus for detecting events, such as the apparatus 1700 in Fig. 17.
  • the vehicle 1800 may further comprise at least one sensor 1802 for collecting sensor data of traffic scenarios.
  • the sensor 1802 may be of different types and include but are not limited to Lidars, radars and cameras.
  • Fig. 19 illustrates a general hardware environment 1900 wherein the present disclosure is applicable in accordance with an exemplary embodiment of the present disclosure.
  • the computing device 1900 may be any machine configured to perform processing and/or calculations, and may be, but is not limited to, a work station, a server, a desktop computer, a laptop computer, a tablet computer, a personal data assistant, a smart phone, an on-vehicle computer or any combination thereof.
  • the aforementioned system may be wholly or at least partially implemented by the computing device 1900 or a similar device or system.
  • the computing device 1900 may comprise elements that are connected with or in communication with a bus 1902, possibly via one or more interfaces.
  • the computing device 1900 may comprise the bus 1902, and one or more processors 1904, one or more input devices 1906 and one or more output devices 1908.
  • the one or more processors 1904 may be any kinds of processors, and may comprise but are not limited to one or more general-purpose processors and/or one or more special-purpose processors (such as special processing chips) .
  • the input devices 1906 may be any kinds of devices that can input information to the computing device, and may comprise but are not limited to a mouse, a keyboard, a touch screen, a microphone and/or a remote control.
  • the output devices 1908 may be any kinds of devices that can present information, and may comprise but are not limited to a display, a speaker, a video/audio output terminal, a vibrator and/or a printer.
  • the computing device 1900 may also comprise or be connected with non-transitory storage devices 1910 which may be any storage devices that are non-transitory and can implement data stores, and may comprise but are not limited to a disk drive, an optical storage device, a solid-state storage, a floppy disk, a flexible disk, hard disk, a magnetic tape or any other magnetic medium, a compact disc or any other optical medium, a ROM (Read Only Memory) , a RAM (Random Access Memory) , a cache memory and/or any other memory chip or cartridge, and/or any other medium from which a computer may read data, instructions and/or code.
  • the non-transitory storage devices 1910 may be detachable from an interface.
  • the non-transitory storage devices 1910 may have data/instructions/code for implementing the methods and steps which are described above.
  • the computing device 1900 may also comprise a communication device 1912.
  • the communication device 1912 may be any kinds of device or system that can enable communication with external apparatuses and/or with a network, and may comprise but are not limited to a modem, a network card, an infrared communication device, a wireless communication device and/or a chipset such as a Bluetooth TM device, 802.11 device, WiFi device, WiMax device, cellular communication facilities and/or the like.
  • when the computing device 1900 is used as an on-vehicle device, it may also be connected to external devices, for example, a GPS receiver, and sensors for sensing different environmental data such as an acceleration sensor, a wheel speed sensor, a gyroscope and so on. In this way, the computing device 1900 may, for example, receive location data and sensor data indicating the travelling situation of the vehicle.
  • other facilities such as an engine system, a wiper, an anti-lock Braking System or the like
  • non-transitory storage device 1910 may have map information and software elements so that the processor 1904 may perform route guidance processing.
  • the output device 1908 may comprise a display for displaying the map, the location mark of the vehicle and also images indicating the travelling situation of the vehicle.
  • the output device 1908 may also comprise a speaker or an interface with an earphone for audio guidance.
  • the bus 1902 may include but is not limited to Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus. Particularly, for an on-vehicle device, the bus 1902 may also include a Controller Area Network (CAN) bus or other architectures designed for application on an automobile.
  • ISA Industry Standard Architecture
  • MCA Micro Channel Architecture
  • EISA Enhanced ISA
  • VESA Video Electronics Standards Association
  • PCI Peripheral Component Interconnect
  • CAN Controller Area Network
  • the computing device 1900 may also comprise a working memory 1914, which may be any kind of working memory that may store instructions and/or data useful for the working of the processor 1904, and may comprise but is not limited to a random access memory and/or a read-only memory device.
  • Software elements may be located in the working memory 1914, including but not limited to an operating system 1916, one or more application programs 1918, drivers and/or other data and codes. Instructions for performing the methods and steps described above may be comprised in the one or more application programs 1918, and the units of the aforementioned apparatuses may be implemented by the processor 1904 reading and executing the instructions of the one or more application programs 1918.
  • the executable codes or source codes of the instructions of the software elements may be stored in a non-transitory computer-readable storage medium, such as the storage device (s) 1910 described above, and may be read into the working memory 1914 possibly with compilation and/or installation.
  • the executable codes or source codes of the instructions of the software elements may also be downloaded from a remote location.
  • the present disclosure may be implemented by software with necessary hardware, or by hardware, firmware and the like. Based on such understanding, the embodiments of the present disclosure may be embodied in part in a software form.
  • the computer software may be stored in a readable storage medium such as a floppy disk, a hard disk, an optical disk or a flash memory of the computer.
  • the computer software comprises a series of instructions to make the computer (e.g., a personal computer, a service station or a network terminal) execute the method or a part thereof according to respective embodiment of the present disclosure.

Abstract

A method and an apparatus for training an event detector with snapshot images are set forth. The method comprises: obtaining at least two frames of sensor data from at least one sensor installed on a vehicle, the at least two frames of sensor data being sequentially collected at different times (702); obtaining results of events that are occurring while the sensor data are obtained (704); for each of the at least two frames, creating a snapshot image with the obtained sensor data (706); associating the obtained results of events with corresponding snapshot images as training data (708); and training an event detector using the training data (710).

Description

Snapshot Image to Train an Event Detector FIELD OF THE INVENTION
The present disclosure relates in general to automated driving vehicles, and more particularly, to training an event detector with snapshot images.
BACKGROUND OF THE INVENTION
An automated driving vehicle (also known as a driverless car, self-driving car, robotic car) is a kind of vehicle that is capable of sensing its environment and navigating without human input. Automated driving vehicles (hereinafter, called as ADV) use a variety of techniques to detect their surroundings, such as radar, laser light, GPS, odometry and computer vision. Advanced control systems interpret sensory information to identify appropriate navigation paths, as well as obstacles and relevant signage.
More specifically, an ADV collects sensor data from a variety of on-board sensors, such as camera, Lidar, radar, etc. Based on the sensor data, the ADV can construct a real-time roadmodel around it. Roadmodels may include a variety of information including, but not limited to, lanemarkings, traffic lights, traffic signs, road boundaries, etc. The constructed roadmodel is compared to pre-installed roadmodels, such as those provided by high definition (HD) map providers, so that the ADV may more accurately determine its location in the HD map. In the meantime, the ADV may also identify objects around it, such as vehicles and pedestrians, based on the sensor data. The ADV can make appropriate driving decisions based on the determined roadmodel and identified surrounding objects, such as lane change, acceleration, braking, etc.
As known in the art, different sensors produce different forms or formats of data. For example, cameras provide images, while Lidars provide point clouds. In processing such sensor data from different sensors, each type of sensor data has to be separately processed. Thus, for each type of sensor, one or more models for object identification have to be established. In addition, any particular type of sensor may have drawbacks when being used to train a target model, such as an event detector. For example, if images directly obtained by a camera are used to train an event detector, the drawbacks may include: (1) no classification of elements in the image; (2) the images may be in an arbitrary perspective; and (3) a huge number of sample images is required to train a target model. Similar drawbacks may exist for other types of sensors. Therefore, an improved solution for recording traffic scenarios with sensor data is desired. In addition, if an event detector is trained based on a predefined description, then it may have the following drawbacks: (1) the description has to be defined specifically for an event; and (2) the description may easily ignore some supporting information, e.g., for a lane change intention, objects in the neighboring lane also have an impact. Also, existing ways of training an event detector do not use machine learning techniques, and thus the training requires many pre-defined criteria and only works under many pre-conditions.
SUMMARY OF THE INVENTION
The present disclosure aims to provide a method and an apparatus for training an event detector with snapshot images and detecting events with the trained event detector. The method of the present invention may have at least the following advantages over traditional ways of event detector training: (1) a smaller number of training samples is used due to the unified data representation; (2) the sample data can contain various types of information, which can also include historical information; (3) it is computationally efficient; and (4) it enables the use of existing data of different formats.
In accordance with a first exemplary embodiment of the present disclosure, a method for training an event detector with snapshot images is provided. The method comprises: obtaining at least two frames of sensor data from at least one sensor installed on a vehicle, the at least two frames of sensor data being sequentially collected at different times; obtaining results of events that are occurring while the sensor data are obtained; for each of the at least two frames, creating a snapshot image with the obtained sensor data; associating the obtained results of events with corresponding snapshot images as training data; and training an event detector using the training data.
In accordance with a second exemplary embodiment of the present disclosure, a method on a vehicle for detecting events is described. The method comprises: obtaining an event detector trained by the method according to the first embodiment; obtaining at least one frame of sensor data from at least one type of sensor installed on a vehicle; for each of the at least one frame, creating a snapshot image with the obtained sensor data; and detecting events with the  event detector based on the created snapshot image.
In accordance with a third exemplary embodiment of the present disclosure, a system for training an event detector with snapshot images is provided. The system comprises a sensor data obtaining module configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle; an event result obtaining module configured for obtaining results of events that are occurring while the sensor data are obtained; a snapshot image creating module configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data; an associating module configured for associating the obtained results of events with corresponding snapshot images as training data; and a training module configured for training an event detector using the training data.
In accordance with a fourth exemplary embodiment of the present disclosure, an apparatus on a vehicle for detecting events is described. The apparatus comprises: a detector obtaining module configured for obtaining an event detector trained by the method according to the first embodiment; a sensor data obtaining module configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle; a snapshot image creating module configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data; and an event detecting module configured for detecting events with the event detector based on the created snapshot image.
In accordance with a fifth exemplary embodiment of the present disclosure, a vehicle including at least one sensor and the apparatus of the fourth exemplary embodiment is provided.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Additional aspects, features, and/or advantages of examples will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other aspects and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments taken in conjunction  with the accompanying drawings which illustrate, by way of example, the principles of the present disclosure. Note that the drawings are not necessarily drawn to scale.
Fig. 1 illustrates an exemplary diagram generated from a snapshot image of a traffic scenario according to an embodiment of the present invention.
Fig. 2 is a flowchart of an exemplary method for creating snapshot images of traffic scenarios according to an embodiment of the present invention.
Fig. 3 illustrates an exemplary diagram generated from a snapshot image of a traffic scenario according to another embodiment of the present invention.
Fig. 4 is a flowchart of an exemplary method for creating snapshot images of traffic scenarios according to another embodiment of the present invention.
Fig. 5 is a flowchart of an exemplary method for creating snapshot images of traffic scenarios according to yet another embodiment of the present invention.
Fig. 6 is a flowchart of an exemplary method for training a road model with snapshot images according to an embodiment of the present invention.
Fig. 7 is a flowchart of an exemplary method for training an event detector with snapshot images according to an embodiment of the present invention.
Fig. 8 is a flowchart of an exemplary method implemented on a vehicle for detecting events according to an embodiment of the present invention.
Fig. 9 illustrates an exemplary apparatus for creating snapshot images of traffic scenario according to an embodiment of the invention.
Fig. 10 illustrates an exemplary vehicle according to an embodiment of the present invention.
Fig. 11 illustrates an exemplary apparatus for creating snapshot images of traffic scenario according to another embodiment of the invention.
Fig. 12 illustrates an exemplary vehicle according to another embodiment of the present invention.
Fig. 13 illustrates an exemplary apparatus for creating snapshot images of traffic scenario according to yet another embodiment of the invention.
Fig. 14 illustrates an exemplary vehicle according to yet another embodiment of the present invention.
Fig. 15 illustrates an exemplary system for training a road model with snapshot images according to an embodiment of the present invention.
Fig. 16 illustrates an exemplary system for training an event detector with snapshot images according to an embodiment of the present invention.
Fig. 17 illustrates an apparatus on a vehicle for detecting events according to an embodiment of the present invention.
Fig. 18 illustrates an exemplary vehicle according to an embodiment of the present invention.
Fig. 19 illustrates a general hardware environment wherein the present disclosure is applicable in accordance with an exemplary embodiment of the present disclosure.
DETAILED DESCRIPTION
In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the described exemplary embodiments. It will be apparent, however, to one skilled in the art that the described embodiments can be practiced without some or all of these specific details. In other exemplary embodiments, well known structures or process steps have not been described in detail in order to avoid unnecessarily obscuring the concept of the present disclosure.
The term “vehicle” used throughout the specification refers to a car, an airplane, a helicopter, a ship, or the like. For simplicity, the invention is described with respect to “car”, but the embodiments described herein are not limited to “car” only, and are applicable to other kinds of vehicles. The term “A or B” used throughout the specification refers to “A and B” and “A or B” rather than meaning that A and B are exclusive, unless otherwise specified.
I.  Snapshot
The present invention provides a method capable of efficiently integrating various types of sensor data on a vehicle in a unified manner to integrally exhibit information of traffic scenarios around a vehicle. The method is to some extent like taking a picture of a scene, so it is hereinafter referred to as a “snapshot”, and the data of the snapshots are called “snapshot images”.
1. Multiple Sensors, One Timestamp
As a first embodiment of the present invention, a snapshot may be constructed by sensor data from multiple sensors captured at the same time.
As mentioned above, vehicles (especially ADV) are equipped with different types of sensors, such as Lidar, Radar and camera. Each sensor records its own sensor data and provides it to the central processing unit of the vehicle. The formats of sensor data provided by various types or various manufacturers of sensors are typically different. Therefore, the central processing unit needs to have the ability to read and recognize each of the various types of sensor data, and use them separately. It thus consumes a lot of resources and is very inefficient.
The present invention integrates sensor data from multiple sensors in the form of a snapshot. The multiple sensors may be of the same type of sensors, but may also be of different types.
To perform a unified integration, a unified reference coordinate system is established. According to one embodiment of the present invention, the reference coordinate system of the present invention may be a two-dimensional plane parallel to the ground. The origin of the reference coordinate system may be the midpoint of the rear axle of the vehicle, for example. Alternatively, the origin may be the position of any one of the sensors, such as the geometric center of the sensor, or the origin of a local coordinate system used by the sensor. Of course, the origin can also be any point on a vehicle. For convenience of explanation, the midpoint of the rear axle of the car is selected as the origin in this embodiment.
Accordingly, one axis of the reference coordinate system can be parallel to the rear axle of the vehicle, and the other axis can be perpendicular to the rear axle of the vehicle. Thus, as shown in Fig. 1, which illustrates an exemplary diagram generated from a snapshot image of a traffic scenario according to an embodiment of the present invention, the x-axis is perpendicular to the rear axle of the vehicle, with the positive half of the x-axis representing positions in front of the direction of travel of the vehicle, and the negative half of the x-axis representing positions behind the direction of travel of the vehicle. The y-axis is parallel to the rear axle of the vehicle. The positive half of the y-axis can represent positions on the left side of the direction of travel of the vehicle, and the negative half of the y-axis can represent positions on the right side of the direction of travel of the vehicle. Optionally, the size of the reference coordinate system can be pre-determined so as to limit the data amount. As an example, the x-axis and the y-axis may be defined to have a size of -50 to +50 meters, or -100 to +100 meters, or the like. In another example, the extent of the x-axis and the y-axis may be determined by the maximum sensing range of the sensors mounted on the vehicle.
The various sensors used in vehicles, regardless of the data format they employ, typically provide at least a binary set of location information and value information, such as {(x, y), d}, which represents that the readout of the sensor at position (x, y) is d. The location information is in the local coordinate system of the sensor. Thus, after the reference coordinate system is determined, the sensor data for each sensor can be transformed from its respective local coordinate system to the reference coordinate system. The position at which a sensor is mounted on the vehicle is known, so the corresponding position in the reference coordinate system can be determined. For example, assume that the relative position between the local coordinate system of a first sensor and the reference coordinate system is (x_c1, y_c1), i.e., the origin of the local coordinate system of the first sensor is located at (x_c1, y_c1) in the reference coordinate system. Then, a given position (x_s1, y_s1) in the local coordinate system can be transformed to (x_s1 - x_c1, y_s1 - y_c1). Similarly, assume that the relative position between the local coordinate system of a second sensor and the reference coordinate system is (x_c2, y_c2), i.e., the origin of the local coordinate system of the second sensor is located at (x_c2, y_c2) in the reference coordinate system. Then, a given position (x_s2, y_s2) in the local coordinate system can be transformed to (x_s2 - x_c2, y_s2 - y_c2).
In addition, some sensors may use a three-dimensional local coordinate system; for example, point cloud data of Lidar is three-dimensional. Such three-dimensional coordinate systems can be projected onto the two-dimensional reference coordinate system. More specifically, such three-dimensional coordinate systems are generally represented by x, y and z axes, wherein the plane formed by two of the three axes (assuming the x and y axes) is typically parallel to the ground as well, and thus parallel to the x-y plane in the reference coordinate system of the present invention. Therefore, its x-y coordinates can be similarly transformed into coordinates in the reference coordinate system by translation. The z coordinates do not need to be transformed and can be retained as additional information in the snapshot image data. Through the three-dimensional to two-dimensional transformation, the snapshot image provided by the present invention, if visually displayed, may look similar to a top view of a scenario.
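By way of a non-limiting illustration, the coordinate transformation and the three-dimensional to two-dimensional projection described above could be sketched in Python as follows. The function names, the use of numpy, and the sign convention (which simply mirrors the subtraction used in the passage above) are assumptions of this sketch rather than requirements of the method.

    import numpy as np

    def sensor_to_reference(points_local, sensor_origin):
        # points_local: (N, 2) array of (x, y) positions in the sensor's local frame.
        # sensor_origin: (x_c, y_c), the local frame origin expressed in the reference frame.
        # The subtraction mirrors the convention used in the description above.
        return np.asarray(points_local, dtype=float) - np.asarray(sensor_origin, dtype=float)

    def project_lidar_to_plane(points_3d):
        # points_3d: (N, 3) array of Lidar points; the x-y plane is assumed parallel to the ground.
        # Returns the 2D projection plus the z values, kept as additional per-point information.
        pts = np.asarray(points_3d, dtype=float)
        return pts[:, :2], pts[:, 2]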
In addition, as aforementioned, data provided by different sensors may have different data formats. Beyond the data format, the degree of processing of the data may also vary. For example, some sensors can only provide raw data, while other sensors provide data that has been processed, for example, data with recognitions to some extent. For instance, some Lidars can provide further information about scenarios based on the point cloud data, such as segmentations or recognitions of some objects (e.g., street signs, etc.). Some cameras may also provide similar recognition, such as identifying lanemarkings in captured images. Regardless of the extent to which the data is processed, the data output by sensors always contains pairs of position data and values. In other words, the output of a sensor always tells what information is present at what positions. Therefore, for creating snapshots according to the present invention, it is only necessary to record all of the correspondences between positions and data in a single snapshot, so that the snapshot of the present invention can be made compatible with all sensors, and meanwhile contain all the original information of each sensor.
It can be contemplated that, since multiple sensors are used to sense the same scenario, the same object in the scenario may be sensed by different sensors. For example, as shown in Fig. 1, there is a building 102 at a particular position, say (x_1, y_1) in the reference coordinate system. Thus, the Lidar, radar and camera may all have sensed the building, and respectively provide corresponding sensor data represented in their own local coordinate systems, such as {(x_s1, y_s1), d_s1} by a first sensor, and {(x_s2, y_s2), d_s2} by a second sensor. Obviously, after transformation to the reference coordinate system, the positions given by the two sensors will be the same spot in the reference coordinate system, i.e., (x_1, y_1). In other words, (x_s1 - x_c1, y_s1 - y_c1) = (x_s2 - x_c2, y_s2 - y_c2) = (x_1, y_1). When creating a snapshot, the sensor data given by the two sensors can both be added to (x_1, y_1), as {(x_1, y_1), d_s1, d_s2}, for example. Those skilled in the art will appreciate that the data format described herein is merely exemplary, and any suitable data format that reflects the relationships between positions and readout values can be used in recording snapshot image data according to the present invention.
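As one possible realization of the data structure just described, the hypothetical sketch below groups already-transformed readings by their reference-frame position so that the same object seen by different sensors ends up under one key. Discretizing positions into small cells is an assumption introduced here only so that nearly identical coordinates reported by different sensors map to the same entry.

    from collections import defaultdict

    def build_snapshot(transformed_readings, cell_size=0.5):
        # transformed_readings: iterable of (sensor_id, x, y, value) tuples, already
        # expressed in the reference coordinate system.
        # Returns a mapping such as {(x_cell, y_cell): {"lidar": d_s1, "camera": d_s2}}.
        snapshot = defaultdict(dict)
        for sensor_id, x, y, value in transformed_readings:
            cell = (round(x / cell_size), round(y / cell_size))
            snapshot[cell][sensor_id] = value
        return dict(snapshot)

    # Example: a building seen by two sensors at (almost) the same reference position.
    snap = build_snapshot([("lidar", 12.3, 4.1, "building"), ("camera", 12.4, 4.0, "building")])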
Fig. 2 is a flowchart of an exemplary method 200 for creating snapshot images of traffic scenarios according to an embodiment of the present invention. The method 200 starts at step 202, where sensor data of at least two sensors installed on a vehicle may be obtained. The sensor data is collected at substantially the same time (or having the same timestamp). Then, at step 204, positions of each of the sensors may be obtained. As aforementioned, the positions of each of the sensors are the relative positions of each sensor in the reference coordinate system. Thereafter, at step 206, the sensor data of each of the at least two sensors may be transformed into a reference coordinate system based on the obtained positions of the sensors. Finally, at step 208, all of the transformed sensor data may be plotted onto an image to form a snapshot image.
Before plotting the sensor data onto the snapshot image, an optional “fusing” step may be performed on the sensor data. Since a plurality of sensors are used, the sensor data from different sensors can be used to enhance the reliability and confidence of the sensor data. For example, if a Lidar senses a traffic sign and gives a recognized result indicating that the object is a traffic sign, and a camera also captures the picture and recognizes the traffic sign, then the recognition of the traffic sign has an almost 100% confidence. On the other hand, if the sensor data given by the Lidar is not quite sure what the object is (like a traffic sign with 50% confidence), then with the sensor data from the camera, the confidence will also be increased to almost 100%. Another case showing the advantage of using multiple sensors may be that a part of a lanemarking is temporarily blocked by an object, such as a car, so the blocked part may not be sensed by sensor A; but with reference to sensor data from sensor B, such as an image captured by a camera showing clearly that the lanemarking is there and just blocked, the raw data given by sensor A can be processed so as to use data corresponding to the lanemarking to replace the raw data, as if there were no object blocking that part of the lanemarking.
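One simple way to realize the optional fusing step, assuming independent sensors and per-sensor confidence scores between 0 and 1, is the combination rule sketched below; it is only one illustrative fusion rule, not the one mandated by the method.

    def fuse_label_confidence(confidences):
        # confidences: per-sensor confidences (0..1) that a position holds a given label,
        # e.g. [0.5, 0.95] for a Lidar and a camera both seeing a traffic sign.
        # Assuming independence, the fused confidence is 1 - prod(1 - c_i).
        remaining_doubt = 1.0
        for c in confidences:
            remaining_doubt *= (1.0 - c)
        return 1.0 - remaining_doubt

    # 0.5 from the Lidar and 0.95 from the camera fuse to 0.975, i.e. almost 100% confidence.
    print(fuse_label_confidence([0.5, 0.95]))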
It should be noted that although terms “snapshot” , “snapshot image” and “plotting” and the like are used in the present disclosure, the recorded snapshot data does not have to be drawn as visible images. Instead, as aforementioned, snapshots or snapshot images are only representatives of recording sensor data of a surrounding scenario at a particular moment or moments. Therefore, the “plotting data onto an image” in step 208 does not mean that the data is visually presented as an image, but refers to integrating the transformed sensor data from various sensors into a unified data structure based on the coordinate positions in the reference coordinate system. This data structure is called a “snapshot” , “snapshot image” or “snapshot image data” .  Of course, since the position information and the data values associated with the positions are completely retained in the snapshot image data, it can be visually rendered as an image by some specialized software if necessary, for example for human understanding.
By transforming various sensor data into unified snapshots, vehicles do not have to separately record and use various types of sensor data, which greatly reduces the burdens of on-board systems. Meanwhile, the unified format of sensor data makes it unnecessary to separately train various models based on different sensors, which significantly reduces the amount of computation in the training progress and notably increases the training efficiency.
2. One Sensor, Multiple Timestamps
As a second embodiment of “snapshot” , the snapshot may be constructed by sensor data from one single sensor, but captured at different times. As could be appreciated, the difference from the previously described embodiment is that the first embodiment records a snapshot of multiple sensors at the same time, while the second embodiment records a snapshot of one single sensor at different times.
Similar to the first embodiment, a reference coordinate system may be established first. Assume that it is still a two-dimensional coordinate system parallel to the ground. As an example, the midpoint of the rear axle of the car is again selected as the origin of the reference coordinate system. In a same manner, the x-axis is perpendicular to the rear axle of the vehicle, with the positive half and the negative half of the x-axis representing positions in front of and behind the direction of travel of the vehicle, respectively. The y-axis is parallel to the rear axle of the vehicle, with the positive half and the negative half of the y-axis can represent positions on the left side and the right side of the direction of travel of the vehicle, respectively.
Sensor data captured by a sensor at a single time spot may be referred to as a sensor data frame. As an example, the number n of sensor data frames included in one snapshot may be preset, wherein n is a positive integer greater than or equal to 2, such as n=10 for example. In one embodiment, the n frames may be a series of successive data frames of a sensor. For example, the n frames of data can be sequentially taken using the sampling interval of the sensor itself. Alternatively, the n sensor data frames may be captured at a regular interval. In another example, an interval larger than the sampling interval of the sensor itself may be appropriately selected. For example, the sampling frequency of the sensor itself may be 100 Hz, but one frame may be selected every 10 frames as a snapshot data frame. The sampling interval may be selected based on the speed of movement of the vehicle, for example, such that the data of the sensor can have a relatively significant difference when the vehicle is not moving too fast.
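A minimal sketch of this frame-selection step is given below, assuming the raw stream is a chronologically ordered list and that one frame is kept every `stride` raw frames (e.g., stride 10 on a 100 Hz sensor yields snapshot frames at 10 Hz); the names and default values are illustrative only.

    def select_snapshot_frames(raw_frames, n=10, stride=10):
        # raw_frames: chronologically ordered frames from one sensor.
        # Keep every `stride`-th frame until n frames have been collected.
        picked = raw_frames[::stride][:n]
        if len(picked) < n:
            raise ValueError("not enough raw frames for one snapshot")
        return picked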
After taking n frames of sensor data, the n frames of sensor data may be transformed into snapshot data. In addition to position information and readout values, sensor data also typically contains a timestamp that records the time at which the data was captured. When creating a snapshot, in addition to establishing a reference coordinate system, a particular time spot may be selected as the reference time, or reference timestamp. For example, the acquisition time of the first frame, the last frame, or any one of the n frames may be taken as the reference time t_0. It is assumed herein that the time of the first frame is taken as the reference time t_0, and the subsequent second to n-th frames can be denoted as times t_1, …, t_{n-1}. The times t_1, …, t_{n-1} herein are also called timestamps or ages of the frames.
Subsequently, each frame of sensor data may be transformed into data in the reference coordinate system. For the first frame of data, the transformation may include the transformation of the positions between the reference coordinate system and the local coordinate system of the sensor. Similar to the first embodiment, the position of the sensor on the vehicle is known, so the relative positional relationship between the origin of its local coordinate system and the origin of the reference coordinate system is known. Thus, the coordinates can be transformed by translation.
Next, for the second frame of data, in addition to the transformation of the position between the reference coordinate system and the local coordinate system, it is also necessary to consider the movement of the vehicle itself during time t_0 to time t_1. As an example, the positional movement of the vehicle may be determined from the time interval between t_0 and t_1 and the speed of the car during this time period, or from an odometer or some other sensor data. Assuming that the relative position between the local coordinate system and the reference coordinate system is (x_c, y_c), i.e., the origin of the local coordinate system is located at (x_c, y_c) in the reference coordinate system, and the positional movement of the car during t_0 to t_1 is (d_x1, d_y1), then {(x_1, y_1), d_1, t_1} in the second frame of data can be transformed into {(x_1 - x_c - d_x1, y_1 - y_c - d_y1), d_1, t_1}, wherein t_1 represents the time of capturing the second data frame. Similarly, the same transformation can be applied to subsequent frames. In the end, all of the n frames of transformed sensor data may be integrated to form a snapshot based on the transformed positions in the reference coordinate system.
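The per-frame transformation including ego-motion compensation could be sketched as follows; the sign convention again simply follows the passage above, and the displacement (d_xk, d_yk) is assumed to have been obtained from speed integration or odometry.

    import numpy as np

    def frame_to_reference(points_local, sensor_origin, ego_displacement):
        # points_local: (N, 2) positions in the sensor's local frame at time t_k.
        # sensor_origin: (x_c, y_c), the sensor frame origin in the reference frame.
        # ego_displacement: (d_xk, d_yk), vehicle movement between t_0 and t_k.
        p = np.asarray(points_local, dtype=float)
        return p - np.asarray(sensor_origin, dtype=float) - np.asarray(ego_displacement, dtype=float)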
It may be contemplated that, assuming that the snapshot data is visually rendered as an image, such as Fig. 3, which illustrates an exemplary diagram generated from a snapshot image of a traffic scenario according to another embodiment of the present invention, the data from time t_0 to time t_{n-1} is integrated into one single coordinate system. In such an image, a stationary object in the scenario still appears to be stationary, but a moving object may appear as a motion trajectory. Taking the building 102 in Fig. 1 as an example, since it is stationary, its coordinates in the frames of sensor data, after being transformed into the reference coordinate system, will coincide with each other. Thus, it is still shown as stationary in Fig. 3 at the same position as compared to Fig. 1. In contrast, the moving car 103 in Fig. 1 will appear in Fig. 3 as traveling straight along the lane first, and then performing a lane change.
By combining the multiple frames of data into one single snapshot, it is possible to clearly reflect the dynamics of the scenario over a period of time, which is suitable for subsequent model trainings and will be described in further detail below.
Fig. 4 is a flowchart of an exemplary method 400 for creating snapshot images of traffic scenarios according to another embodiment of the present invention. The method 400 starts at step 402, where at least two frames of sensor data of a sensor installed on a vehicle are obtained. The at least two frames of sensor data may be sequentially collected at different times. Thereafter, at step 404, a position of the sensor is obtained. As in the first embodiment, the position of the sensor is the relative position of the sensor in the reference coordinate system. At step 406, each frame of the sensor data may be transformed into a current reference coordinate system based on the obtained position of the sensor. As previously mentioned, the relative movements of the vehicle between frames should also be considered during the transformation. After all of the frames of sensor data are transformed, at step 408, the transformed frames of sensor data are plotted onto an image to form a snapshot image.
There may also be an optional fusing step in this embodiment. Although only one sensor is used, the sensor data captured at different timestamps can also be used to enhance the reliability and confidence of the sensor data. For example, at one timestamp, a sensor may sense an object but not be sure what it is. Several frames later, as the vehicle gets closer to the object, the sensor can clearly figure out what it is. Then, the previous data can be processed or fused with the newer data.
3. Multiple Sensors, Multiple Timestamps
As a third embodiment of “snapshot”, the snapshot may be constructed by sensor data from multiple sensors at different times. The third embodiment is similar in many aspects to the previously described second embodiment, except that only one sensor is used in the second embodiment, while a plurality of sensors are used in the third embodiment. In the previously described first embodiment, it is described that a snapshot is created with multiple sensors at a single time spot. Similar to the first embodiment, on the basis of the second embodiment recording n frames of sensor data, the data from multiple sensors can undergo a coordinate system transformation and a snapshot can be formed based on the coordinates.
As an example, assume that the relative position between the local coordinate system of a first sensor (e.g., Lidar) and the reference coordinate system is (x_c1, y_c1), i.e., the origin of that local coordinate system is located at (x_c1, y_c1) in the reference coordinate system, that the relative position between the local coordinate system of a second sensor (e.g., radar) and the reference coordinate system is (x_c2, y_c2), i.e., the origin of that local coordinate system is located at (x_c2, y_c2) in the reference coordinate system, and that the positional movement of the car during t_0 to t_1 is (d_x1, d_y1). Then {(x_s1, y_s1), d_1, t_1} in the second frame of data of the first sensor can be transformed into {(x_s1 - x_c1 - d_x1, y_s1 - y_c1 - d_y1), d_1, t_1}, wherein t_1 represents the time of capturing the second data frame. Similarly, {(x_s2, y_s2), d_2, t_1} in the second frame of data of the second sensor can be transformed into {(x_s2 - x_c2 - d_x1, y_s2 - y_c2 - d_y1), d_2, t_1}. Further, if there is an n-th sensor, e.g., a camera, then {(x_sn, y_sn), d_n, t_1} in the second frame of data of the n-th sensor can be transformed into {(x_sn - x_cn - d_x1, y_sn - y_cn - d_y1), d_n, t_1}. As previously described, each frame of transformed data of each sensor is integrated into a snapshot under the reference coordinate system. In terms of data structure, the snapshot formed according to the third embodiment looks like a combination of the snapshot data formats of the first embodiment and the second embodiment, and can generally be represented as {(x, y), d_s1, d_s2, …, d_sn, t_{n-1}} to represent multiple sensor data values at (x, y) in the reference coordinate system with timestamps. Assuming that the snapshot data of the third embodiment is visually rendered as an image, the image should appear similar to that of the second embodiment, reflecting the dynamic changes of the scenario.
Fig. 5 is a flowchart of an exemplary method 500 for creating snapshot images of traffic scenarios according to an embodiment of the present invention. The method starts at step 502, where at least two frames of sensor data of the road scene from at least two sensors installed on a vehicle are obtained. The at least two frames of sensor data may be sequentially collected at different times. At step 504, positions of each of the at least two sensors are obtained. Thereafter, at step 506, each frame of the sensor data is transformed into a current reference coordinate system based on the obtained positions of the at least two sensors. Similar to the second embodiment, the relative movements of the vehicle between frames should also be considered during the transformation. At step 508, all of the transformed sensor data may be plotted onto an image to form a snapshot image. Also, there may be an optional fusing step in this embodiment, such as fusing the sensor data which have overlapping positions in the reference coordinate system.
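Putting these steps together, a hypothetical end-to-end sketch of such a multi-sensor, multi-timestamp snapshot could look as follows; the container formats, the rounding of positions, and the parameter names are assumptions of this illustration rather than part of the method.

    def create_snapshot(sensor_streams, sensor_origins, ego_displacements, n=10, stride=10):
        # sensor_streams: {sensor_id: [frame_0, frame_1, ...]}, each frame a list of (x, y, value).
        # sensor_origins: {sensor_id: (x_c, y_c)}, the sensor frame origin in the reference frame.
        # ego_displacements: (d_x, d_y) of vehicle movement from t_0 to each kept frame,
        # with ego_displacements[0] == (0, 0).
        snapshot = {}
        for sensor_id, frames in sensor_streams.items():
            kept = frames[::stride][:n]
            ox, oy = sensor_origins[sensor_id]
            for k, frame in enumerate(kept):
                dx, dy = ego_displacements[k]
                for x, y, value in frame:
                    key = (round(x - ox - dx, 1), round(y - oy - dy, 1))
                    snapshot.setdefault(key, {}).setdefault(sensor_id, []).append((value, k))
        return snapshot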
II.  Training a Roadmodel
For an automated driving (AD) vehicle, it makes real-time driving decisions based on HD maps and a variety of sensor data. In general, the AD vehicle must first determine its exact position on the road, and then decide how to drive (steering, accelerating, etc.). More specifically, the AD vehicle identifies objects based on real-time sensor data, such as Lidar, camera, etc. Then, it compares the identified objects to the roadmodel contained in the HD map, thereby determining its position on the road.
A substantial portion of existing road models are actually constructed based on sensor data collected on the road by sensors mounted on map information collection vehicles. It can be understood that, at the beginning, such data depends on human judgment to identify objects. Gradually, through data accumulation, some rules are formed, and the objects can be automatically recognized by computers. The ultimate goal is to have a mature model that allows for identifying various objects and generating a roadmodel simply by entering acquired sensor data. However, existing roadmodel construction requires the use of various sensors that work independently of each other. Therefore, to train a certain model, one must train the model separately for each sensor. This is obviously inefficient and computationally intensive.
This problem can be solved by using the snapshot technique proposed by the present invention. According to the snapshot technique of the present invention, data of various sensors is integrated into a unified data structure. Thus, only one training upon such unified data is required.
Fig. 6 is a flowchart of an exemplary method 600 for training a road model with snapshot images according to an embodiment of the present invention. The method starts at step 602, obtaining an existing road model of a road scene. At step 604, obtaining at least two frames of sensor data of the road scene from at least two sensors installed on a vehicle, the at least two frames of sensor data being sequentially collected at different times. At step 606, for each of the at least two frames, creating a snapshot image with the obtained sensor data. At step 608, associating the existing road model with each of the snapshot images as training data. At step 610, training a new road model using the training data. As an example, the training may be based on machine learning techniques. The snapshot images and the known elements from existing road models are paired, i.e., labeled, so as to be used as training data. With a large amount of training data, the desired model can be trained. Although the amount of training data used for training a model is still large, the amount of data will be significantly less than training the model separately with each type of sensor data.
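As a hedged illustration of the training step of method 600, the sketch below rasterizes each snapshot image onto a fixed grid and fits a standard supervised classifier, with the labels taken from the existing road model; the choice of scikit-learn and of a random forest is an assumption of this sketch, and any machine learning model could be substituted.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def train_road_model(snapshot_grids, road_model_labels):
        # snapshot_grids: list of 2D numpy arrays, one rasterized snapshot image each (equal size).
        # road_model_labels: one element label per snapshot, taken from the existing road model
        # (e.g. "lanemarking", "traffic_sign").
        X = np.stack([grid.ravel() for grid in snapshot_grids])
        y = np.asarray(road_model_labels)
        return RandomForestClassifier(n_estimators=100).fit(X, y)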
III.  Training an Event Detector
As aforementioned, the snapshot of the present invention may contain data collected by one or more sensors at multiple times, and thus can reflect dynamic information of objects in the scenarios. This feature is also useful in training an ADV to identify motion states (also known as events) of objects that occur in real time in the scenarios. For example, the car in Fig. 3 changes from the lane to the left of the current lane of the vehicle on which the sensors are mounted into the current lane, which is a commonly seen lane change on the road, also known as a “cut in”. Similar events include but are not limited to: lane change; overtaking; steering; braking; collision; and lost control.
Fig. 7 is a flowchart of an exemplary method 700 for training an event detector with snapshot images according to an embodiment of the present invention. The method 700 starts at step 702, where at least two frames of sensor data from at least one sensor installed on a vehicle are obtained. The at least two frames of sensor data may be sequentially collected at different times. At step 704, results of events that are occurring while the sensor data are obtained may be obtained. These results may come from humans. For example, engineers may watch a video corresponding to the sensor data frames and identify events in the video. At step 706, for each of the at least two frames, a snapshot image may be created with the obtained sensor data, such as via the methods 200, 400 or 500 of creating snapshot images described in Figs. 2, 4 and 5. At step 708, the obtained results of events are associated with corresponding snapshot images as training data. At step 710, an event detector is trained using the training data. As an example, the training may be based on machine learning techniques. The snapshot images and the known events are paired or labeled so as to be used as training data. With a large amount of training data, the desired event detector can be trained. Although the amount of training data used for training the event detector is still large, the amount of data will be significantly less than training the event detector separately with each type of sensor data.
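For illustration only, the training step of method 700 could be sketched with a small convolutional network over rasterized snapshot images, as below; the use of PyTorch, the network architecture, the assumption of square single-channel grids, and the full-batch training loop are choices of this sketch, not of the method.

    import torch
    from torch import nn

    class SnapshotEventDetector(nn.Module):
        # Small CNN over rasterized snapshot images (1 channel, grid x grid input).
        def __init__(self, num_events, grid=64):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * (grid // 4) ** 2, num_events)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    def train_event_detector(snapshots, event_labels, num_events, epochs=10):
        # snapshots: float tensor (N, 1, H, W) of rasterized snapshot images (H == W).
        # event_labels: long tensor (N,) of event class ids (e.g. 0 = cut in, 1 = overtaking).
        model = SnapshotEventDetector(num_events, grid=snapshots.shape[-1])
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(snapshots), event_labels)
            loss.backward()
            optimizer.step()
        return model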
Fig. 8 is a flowchart of an exemplary method 800 on a vehicle for detecting events. The method 800 starts at step 802, where an event detector, such as the event detector trained via the method 700, may be obtained. At step 804, at least one frame of sensor data from at least one sensor installed on a vehicle may be obtained. At step 806, for each of the at least one frame, a snapshot image may be created with the obtained sensor data. At step 808, events may be detected with the event detector based on the created snapshot image. More specifically, this step may include inputting the created snapshot image to the event detector, and the event detector then outputting detected events based on the input snapshot image. Preferably, the results, i.e., the detected events, may be output with probabilities or confidences.
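The detection step of method 800 could then be sketched as below, reusing a detector such as the one sketched above; the softmax scores serve as the probabilities mentioned in the text, and only events above a threshold are reported. The threshold value and the event names are assumptions of this illustration.

    import torch

    def detect_events(detector, snapshot, event_names, threshold=0.5):
        # snapshot: float tensor (1, 1, H, W) built from the latest frame(s) of sensor data.
        # Returns the detected events together with their probabilities.
        detector.eval()
        with torch.no_grad():
            probs = torch.softmax(detector(snapshot), dim=1)[0]
        return [(event_names[i], float(p)) for i, p in enumerate(probs) if p >= threshold]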
Fig. 9 illustrates an exemplary apparatus 900 for creating snapshot images of traffic scenarios according to an embodiment of the invention. The apparatus 900 may comprise a sensor data obtaining module 902, a sensor position obtaining module 904, a transforming module 906, and a plotting module 908. The sensor data obtaining module 902 may be configured for obtaining sensor data of at least two sensors installed on a vehicle. The sensor position obtaining module 904 may be configured for obtaining positions of each of the sensors. The transforming module 906 may be configured for transforming the sensor data of each of the at least two sensors into a reference coordinate system based on the obtained positions of the sensors. The plotting module 908 may be configured for plotting the transformed sensor data onto an image to form a snapshot image.
Fig. 10 illustrates an exemplary vehicle 1000 according to an embodiment of the present invention. The vehicle 1000 may comprise an apparatus for creating snapshot images of traffic scenarios, such as the apparatus 900 in Fig. 9. Like normal vehicles, the vehicle 1000 may further comprise at least two sensors 1002 for collecting sensor data of traffic scenarios. The sensors 1002 may be of different types and include but are not limited to Lidars, radars and cameras.
Fig. 11 illustrates an exemplary apparatus 1100 for creating snapshot images of traffic scenarios according to an embodiment of the invention. The apparatus 1100 may comprise a sensor data obtaining module 1102, a sensor position obtaining module 1104, a transforming module 1106, and a plotting module 1108. The sensor data obtaining module 1102 may be configured for obtaining at least two frames of sensor data of a sensor installed on a vehicle. The sensor position obtaining module 1104 may be configured for obtaining a position of the sensor. The transforming module 1106 may be configured for transforming each frame of the sensor data into a current reference coordinate system based on the obtained position of the sensor. The plotting module 1108 may be configured for plotting the transformed sensor data onto an image to form a snapshot image.
Fig. 12 illustrates an exemplary vehicle 1200 according to an embodiment of the present invention. The vehicle 1200 may comprise an apparatus for creating snapshot images of traffic scenarios, such as the apparatus 1100 in Fig. 11. Like normal vehicles, the vehicle 1200 may further comprise at least one sensor 1202 for collecting sensor data of traffic scenarios. The at least one sensor 1202 may be of different types and include but are not limited to Lidars, radars and cameras.
Fig. 13 illustrates an exemplary apparatus 1300 for creating snapshot images of traffic scenarios according to an embodiment of the invention. The apparatus 1300 may comprise a sensor data obtaining module 1302, a sensor position obtaining module 1304, a transforming module 1306, and a plotting module 1308. The sensor data obtaining module 1302 may be configured for obtaining at least two frames of sensor data of the road scene from at least two sensors installed on a vehicle. The sensor position obtaining module 1304 may be configured for obtaining positions of each of the at least two sensors. The transforming module 1306 may be configured for transforming each frame of the sensor data into a current reference coordinate system based on the obtained positions of the at least two sensors. The plotting module 1308 may be configured for plotting the transformed sensor data onto an image to form a snapshot image.
Fig. 14 illustrates an exemplary vehicle 1400 according to an embodiment of the present invention. The vehicle 1400 may comprise an apparatus for creating snapshot images of traffic scenarios, such as the apparatus 1300 in Fig. 13. Like normal vehicles, the vehicle 1400 may further comprise at least two sensors 1402 for collecting sensor data of traffic scenarios. The at least two sensors 1402 may be of different types and include but are not limited to Lidars, radars and cameras.
Fig. 15 illustrates an exemplary system 1500 for training a road model with snapshot images according to an embodiment of the present invention. The system 1500 may comprise at least two sensors 1502 configured for collecting sensor data of a road scene, and a processing unit 1504. The processing unit 1504 is configured to perform the method of training a road model with snapshot images, such as the method 600 described in Fig. 6.
Fig. 16 illustrates an exemplary system 1600 for training an event detector with snapshot images. The system 1600 may comprise a sensor data obtaining module 1602, an event result obtaining module 1604, a snapshot image creating module 1606, an associating module 1608 and a training module 1610. The sensor data obtaining module 1602 may be configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle. The event result obtaining module 1604 may be configured for obtaining results of events that are occurring while the sensor data are obtained. The snapshot image creating module 1606 may be configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data. The associating module 1608 may be configured for associating the obtained results of events with corresponding snapshot images as training data. The training module 1610 may be configured for training an event detector using the training data.
Fig. 17 illustrates an apparatus 1700 on a vehicle for detecting events according to an embodiment of the present invention. The apparatus 1700 may comprise a detector obtaining module 1702, a sensor data obtaining module 1704, a snapshot image creating module 1706, and an event detecting module 1708. The detector obtaining module 1702 may be configured for obtaining an event detector trained by the training method, such as the method 700 described with regard to Fig. 7. The sensor data obtaining module 1704 may be configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle. The snapshot image creating module 1706 may be configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data. The event detecting module 1708 may be configured for detecting events with the event detector based on the created snapshot image.
Fig. 18 illustrates an exemplary vehicle 1800 according to an embodiment of the present invention. The vehicle 1800 may comprise an apparatus for detecting events, such as the apparatus 1700 in Fig. 17. Like normal vehicles, the vehicle 1800 may further comprise at least one sensor 1802 for collecting sensor data of traffic scenarios. The sensor 1802 may be of different types and include but are not limited to Lidars, radars and cameras.
Fig. 19 illustrates a general hardware environment 1900 wherein the present disclosure is applicable in accordance with an exemplary embodiment of the present disclosure.
With reference to Fig. 19, a computing device 1900, which is an example of the hardware device that may be applied to the aspects of the present disclosure, will now be described. The computing device 1900 may be any machine configured to perform processing and/or calculations, and may be, but is not limited to, a work station, a server, a desktop computer, a laptop computer, a tablet computer, a personal data assistant, a smart phone, an on-vehicle computer or any combination thereof. The aforementioned system may be wholly or at least partially implemented by the computing device 1900 or a similar device or system.
The computing device 1900 may comprise elements that are connected with or in communication with a bus 1902, possibly via one or more interfaces. For example, the computing device 1900 may comprise the bus 1902, one or more processors 1904, one or more input devices 1906 and one or more output devices 1908. The one or more processors 1904 may be any kinds of processors, and may comprise but are not limited to one or more general-purpose processors and/or one or more special-purpose processors (such as special processing chips). The input devices 1906 may be any kinds of devices that can input information to the computing device, and may comprise but are not limited to a mouse, a keyboard, a touch screen, a microphone and/or a remote control. The output devices 1908 may be any kinds of devices that can present information, and may comprise but are not limited to a display, a speaker, a video/audio output terminal, a vibrator and/or a printer. The computing device 1900 may also comprise or be connected with non-transitory storage devices 1910, which may be any storage devices that are non-transitory and can implement data stores, and may comprise but are not limited to a disk drive, an optical storage device, a solid-state storage, a floppy disk, a flexible disk, a hard disk, a magnetic tape or any other magnetic medium, a compact disc or any other optical medium, a ROM (Read Only Memory), a RAM (Random Access Memory), a cache memory and/or any other memory chip or cartridge, and/or any other medium from which a computer may read data, instructions and/or code. The non-transitory storage devices 1910 may be detachable from an interface. The non-transitory storage devices 1910 may have data/instructions/code for implementing the methods and steps which are described above. The computing device 1900 may also comprise a communication device 1912. The communication device 1912 may be any kinds of device or system that can enable communication with external apparatuses and/or with a network, and may comprise but are not limited to a modem, a network card, an infrared communication device, a wireless communication device and/or a chipset such as a Bluetooth™ device, 802.11 device, WiFi device, WiMax device, cellular communication facilities and/or the like.
When the computing device 1900 is used as an on-vehicle device, it may also be connected to external devices, for example, a GPS receiver, or sensors for sensing different environmental data such as an acceleration sensor, a wheel speed sensor, a gyroscope and so on. In this way, the computing device 1900 may, for example, receive location data and sensor data indicating the travelling situation of the vehicle. When the computing device 1900 is used as an on-vehicle device, it may also be connected to other facilities (such as an engine system, a wiper, an anti-lock braking system or the like) for controlling the traveling and operation of the vehicle.
In addition, the non-transitory storage device 1910 may have map information and software elements so that the processor 1904 may perform route guidance processing. In addition, the output device 1908 may comprise a display for displaying the map, the location mark of the vehicle and also images indicating the travelling situation of the vehicle. The output device 1908 may also comprise a speaker or an interface with an earphone for audio guidance.
The bus 1902 may include but is not limited to Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus. Particularly, for an on-vehicle device, the bus 1902 may also include a Controller Area Network (CAN) bus or other architectures designed for application on an automobile.
The computing device 1900 may also comprise a working memory 1914, which may  be any kind of working memory that may store instructions and/or data useful for the working of the processor 1904, and may comprise but is not limited to a random access memory and/or a read-only memory device.
Software elements may be located in the working memory 1914, including but not limited to an operating system 1916, one or more application programs 1918, drivers and/or other data and codes. Instructions for performing the methods and steps described in the above may be comprised in the one or more application programs 1918, and the units of the aforementioned apparatus 800 may be implemented by the processor 1904 reading and executing the instructions of the one or more application programs 1918. The executable codes or source codes of the instructions of the software elements may be stored in a non-transitory computer-readable storage medium, such as the storage device(s) 1910 described above, and may be read into the working memory 1914 possibly with compilation and/or installation. The executable codes or source codes of the instructions of the software elements may also be downloaded from a remote location.
Those skilled in the art may clearly know from the above embodiments that the present disclosure may be implemented by software with necessary hardware, or by hardware, firmware and the like. Based on such understanding, the embodiments of the present disclosure may be embodied in part in a software form. The computer software may be stored in a readable storage medium such as a floppy disk, a hard disk, an optical disk or a flash memory of the computer. The computer software comprises a series of instructions to make the computer (e.g., a personal computer, a service station or a network terminal) execute the method or a part thereof according to respective embodiment of the present disclosure.
Reference has been made throughout this specification to “one example” or “an example” , meaning that a particular described feature, structure, or characteristic is included in at least one example. Thus, usage of such phrases may refer to more than just one example. Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more examples.
One skilled in the relevant art may recognize, however, that the examples may be practiced without one or more of the specific details, or with other methods, resources, materials, etc. In other instances, well known structures, resources, or operations have not been shown or described in detail merely to avoid obscuring aspects of the examples.
While sample examples and applications have been illustrated and described, it is to be understood that the examples are not limited to the precise configuration and resources described above. Various modifications, changes, and variations apparent to those skilled in the art may be made in the arrangement, operation, and details of the methods and systems disclosed herein without departing from the scope of the claimed examples.

Claims (15)

  1. A computer-implemented method for training an event detector with snapshot images, wherein the method comprises:
    obtaining at least two frames of sensor data from at least one sensor installed on a vehicle, the at least two frames of sensor data being sequentially collected at different times;
    obtaining results of events that are occurring while the sensor data are obtained;
    for each of the at least two frames, creating a snapshot image with the obtained sensor data;
    associating the obtained results of events with corresponding snapshot images as training data; and
    training an event detector using the training data.
  2. The method according to claim 1, wherein creating a snapshot image further comprises:
    obtaining positions of the at least one sensor;
    transforming each frame of the sensor data into a current reference coordinate system based on the obtained positions of the at least one sensor; and
    plotting all of the transformed sensor data onto an image to form a snapshot image.
  3. The method according to any one of the preceding claims, wherein the at least one sensor comprises:
    Lidar;
    Radar; and
    camera.
  4. The method according to any one of the preceding claims, wherein training an event detector using the training data comprises:
    training the event detector based on machine learning using the training data as labels.
  5. The method according to any one of the preceding claims, wherein the events comprise:
    cut in;
    lane change;
    overtaking;
    steering;
    braking;
    collision; and
    lost control.
  6. A computer-implemented method on a vehicle for detecting events, characterized in comprising:
    obtaining an event detector trained by the method according to any one of claims 1 to 5;
    obtaining at least one frame of sensor data from at least one sensor installed on a vehicle;
    for each of the at least one frame, creating a snapshot image with the obtained sensor data; and
    detecting events with the event detector based on the created snapshot image.
  7. The method according to claim 6, wherein detecting events with the event detector based on the created snapshot image further comprises:
    inputting the created snapshot image to the event detector; and
    outputting detected events.
  8. The method according to any one of claims 6-7, wherein outputting detected events further comprises:
    outputting the detected events with probabilities.
  9. The method according to claim 8, wherein outputting the detected events with probabilities further comprises:
    outputting the detected events with probabilities that are above a predetermined threshold.
  10. A system for training an event detector with snapshot images, wherein the system comprises:
    a sensor data obtaining module configured for obtaining at least two frames of sensor data  from at least one sensor installed on a vehicle;
    an event result obtaining module configured for obtaining results of events that are occurring while the sensor data are obtained;
    a snapshot image creating module configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data;
    an associating module configured for associating the obtained results of events with corresponding snapshot images as training data; and
    a training module configured for training an event detector using the training data.
  11. An apparatus on a vehicle for detecting events, characterized in that it comprises:
    a detector obtaining module configured for obtaining an event detector trained by the method according to any one of claims 1 to 5;
    a sensor data obtaining module configured for obtaining at least two frames of sensor data from at least one sensor installed on a vehicle;
    a snapshot image creating module configured for, for each of the at least two frames, creating a snapshot image with the obtained sensor data; and
    an event detecting module configured for detecting events with the event detector based on the created snapshot image.
  12. A vehicle, characterized in that it comprises:
    at least one sensor; and
    the apparatus of claim 11.
  13. The vehicle according to claim 12, wherein the at least one sensor comprises:
    Lidar;
    Radar; and
    camera.
  14. The vehicle according to any one of claims 12-13, wherein the events comprise:
    cut-in;
    lane change;
    overtaking;
    steering;
    braking;
    collision; and
    loss of control.
  15. The vehicle according to any one of claims 12-14, wherein the vehicle further comprises:
    a decision unit configured for making vehicle control decisions in response to the detected events.
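
The following is a minimal, non-normative Python sketch of the workflow recited in claims 1, 2 and 6 to 9: sequential sensor frames are transformed into the current reference coordinate system, plotted onto one snapshot image, associated with observed event results as training data, and a detector is then trained and queried, reporting only detections whose probability exceeds a predetermined threshold. Every name and parameter in the sketch (to_current_frame, make_snapshot, GRID, EXTENT, EVENTS, THRESHOLD, and the toy data) is a hypothetical illustration and not part of the claimed subject matter.

import numpy as np
from sklearn.linear_model import LogisticRegression

GRID = 64          # snapshot image resolution (GRID x GRID pixels)
EXTENT = 40.0      # metres covered in each direction around the current pose

def to_current_frame(points, sensor_pose, current_pose):
    # Transform 2-D points observed from sensor_pose into the coordinate
    # system of current_pose; poses are (x, y, yaw) tuples.
    def se2(x, y, yaw):
        c, s = np.cos(yaw), np.sin(yaw)
        return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])
    T = np.linalg.inv(se2(*current_pose)) @ se2(*sensor_pose)
    homogeneous = np.c_[points, np.ones(len(points))]
    return (homogeneous @ T.T)[:, :2]

def make_snapshot(frames, poses, current_pose):
    # Plot every transformed frame of sensor points onto a single image.
    img = np.zeros((GRID, GRID), dtype=np.float32)
    for pts, pose in zip(frames, poses):
        pts_cur = to_current_frame(pts, pose, current_pose)
        cols = ((pts_cur[:, 0] + EXTENT) / (2 * EXTENT) * (GRID - 1)).astype(int)
        rows = ((pts_cur[:, 1] + EXTENT) / (2 * EXTENT) * (GRID - 1)).astype(int)
        ok = (cols >= 0) & (cols < GRID) & (rows >= 0) & (rows < GRID)
        img[rows[ok], cols[ok]] = 1.0
    return img

# Training (claims 1 to 5): snapshots labelled with the event that occurred.
rng = np.random.default_rng(0)
EVENTS = {0: "no event", 1: "cut-in"}          # hypothetical label set
snapshots, labels = [], []
for label in (0, 1):
    for _ in range(50):
        base = rng.uniform(-EXTENT, EXTENT, size=(30, 2))
        shift = np.array([0.0, 8.0]) if label else np.array([2.0, 0.0])
        frames = [base, base + shift]                  # two sequentially collected frames
        poses = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]     # ego moved 1 m between frames
        snapshots.append(make_snapshot(frames, poses, poses[-1]).ravel())
        labels.append(label)
detector = LogisticRegression(max_iter=1000).fit(np.array(snapshots), labels)

# Detection (claims 6 to 9): report events whose probability exceeds a threshold.
THRESHOLD = 0.7
probe = snapshots[-1].reshape(1, -1)
probabilities = detector.predict_proba(probe)[0]
detected = {EVENTS[i]: float(p) for i, p in enumerate(probabilities) if p > THRESHOLD}
print("detected events:", detected)

The linear classifier is used only to keep the sketch self-contained; the claimed event detector could equally be, for example, a convolutional network operating on the snapshot image.
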
PCT/CN2018/109802 2018-10-11 2018-10-11 Snapshot image to train an event detector WO2020073272A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201880098545.7A CN112805716A (en) 2018-10-11 2018-10-11 Snapshot images for training event detectors
PCT/CN2018/109802 WO2020073272A1 (en) 2018-10-11 2018-10-11 Snapshot image to train an event detector
EP18936598.4A EP3864568A4 (en) 2018-10-11 2018-10-11 Snapshot image to train an event detector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/109802 WO2020073272A1 (en) 2018-10-11 2018-10-11 Snapshot image to train an event detector

Publications (1)

Publication Number Publication Date
WO2020073272A1 true WO2020073272A1 (en) 2020-04-16

Family

ID=70164332

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109802 WO2020073272A1 (en) 2018-10-11 2018-10-11 Snapshot image to train an event detector

Country Status (3)

Country Link
EP (1) EP3864568A4 (en)
CN (1) CN112805716A (en)
WO (1) WO2020073272A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8812226B2 (en) * 2009-01-26 2014-08-19 GM Global Technology Operations LLC Multiobject fusion module for collision preparation system
EP3159853B1 (en) * 2015-10-23 2019-03-27 Harman International Industries, Incorporated Systems and methods for advanced driver assistance analytics
US20180150698A1 (en) * 2017-01-09 2018-05-31 Seematics Systems Ltd System and method for collecting information about repeated behavior
US10445928B2 (en) * 2017-02-11 2019-10-15 Vayavision Ltd. Method and system for generating multidimensional maps of a scene using a plurality of sensors of various types

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030041329A1 (en) * 2001-08-24 2003-02-27 Kevin Bassett Automobile camera system
CN105892471A (en) * 2016-07-01 2016-08-24 北京智行者科技有限公司 Automatic automobile driving method and device
CN106485233A (en) * 2016-10-21 2017-03-08 深圳地平线机器人科技有限公司 Drivable region detection method, device and electronic equipment
CN107463907A (en) * 2017-08-08 2017-12-12 东软集团股份有限公司 Vehicle collision detection method, device, electronic equipment and vehicle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3864568A4 *

Also Published As

Publication number Publication date
EP3864568A1 (en) 2021-08-18
CN112805716A (en) 2021-05-14
EP3864568A4 (en) 2022-05-18

Similar Documents

Publication Publication Date Title
US10417816B2 (en) System and method for digital environment reconstruction
US10540554B2 (en) Real-time detection of traffic situation
Datondji et al. A survey of vision-based traffic monitoring of road intersections
US10849543B2 (en) Focus-based tagging of sensor data
US11527078B2 (en) Using captured video data to identify pose of a vehicle
US11294387B2 (en) Systems and methods for training a vehicle to autonomously drive a route
US20210174113A1 (en) Method for limiting object detection area in a mobile system equipped with a rotation sensor or a position sensor with an image sensor, and apparatus for performing the same
US20230245323A1 (en) Object tracking device, object tracking method, and storage medium
US20220101025A1 (en) Temporary stop detection device, temporary stop detection system, and recording medium
JP7429246B2 (en) Methods and systems for identifying objects
WO2020073272A1 (en) Snapshot image to train an event detector
WO2020073268A1 (en) Snapshot image to train roadmodel
WO2020073270A1 (en) Snapshot image of traffic scenario
WO2020073271A1 (en) Snapshot image of traffic scenario
CN113220805A (en) Map generation device, recording medium, and map generation method
CN112099481A (en) Method and system for constructing road model
CN114127658A (en) 3D range in 6D space using road model 2D manifold
US20230024799A1 (en) Method, system and computer program product for the automated locating of a vehicle
US20230084623A1 (en) Attentional sampling for long range detection in autonomous vehicles
US11461944B2 (en) Region clipping method and recording medium storing region clipping program
US20220309693A1 (en) Adversarial Approach to Usage of Lidar Supervision to Image Depth Estimation
US20240010242A1 (en) Signal processing device and signal processing method
WO2023276025A1 (en) Information integration device, information integration method, and information integration program
CN112101392A (en) Method and system for identifying objects

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18936598
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 2018936598
    Country of ref document: EP
    Effective date: 20210511