CN112889070A

CN112889070A - Snapshot images for training road models

Info

Publication number: CN112889070A
Application number: CN201880098535.3A
Authority: CN
Inventors: M·德姆林; 江万里; 李千山
Original assignee: Bayerische Motoren Werke AG
Current assignee: Bayerische Motoren Werke AG
Priority date: 2018-10-11
Filing date: 2018-10-11
Publication date: 2021-06-01
Also published as: EP3864579A4; EP3864579A1; WO2020073268A1

Abstract

Examples of the present disclosure describe a method and system for training a road model using snapshot images. The method comprises the following steps: acquiring an existing road model of a road scene; acquiring at least two frames of sensor data of the road scene from at least two sensors mounted on the vehicle, the at least two frames of sensor data being collected sequentially at different times; for each of the at least two frames, creating a snapshot image using the acquired sensor data; associating the existing road model with each of the snapshot images as training data; and training a new road model using the training data.

Description

Snapshot images for training road models

Technical Field

The present disclosure relates generally to autonomous vehicles and, more particularly, to training road models using snapshot images.

Background

An autonomous vehicle (also known as an unmanned car, an autonomous car, a robotic car) is a vehicle that is capable of sensing its environment and navigating without human input. Autonomous vehicles (hereinafter referred to as ADVs) use various techniques to detect their surroundings, such as using radar, laser, GPS, odometry, and computer vision. Advanced control systems interpret the sensed information to identify appropriate navigation paths, as well as obstacles and related landmarks.

More specifically, an ADV collects sensor data from various onboard sensors (e.g., cameras, lidar, radar, etc.). Based on this sensor data, the ADV may construct a real-time road model of its surroundings. The road model may include a variety of information including, but not limited to, lane markings, traffic lights, traffic signs, road boundaries, and the like. The constructed road model is compared to pre-installed road models, such as those provided by High Definition (HD) map providers, so that the ADV can more accurately determine its location in the HD map. At the same time, the ADV may also identify objects around it, such as vehicles and pedestrians, based on the sensor data. The ADV may then make appropriate driving decisions (such as lane changes, acceleration, braking, etc.) based on the determined road model and the identified surrounding objects.

As is known in the art, different sensors produce data in different forms or formats. For example, a camera provides an image, while a lidar provides a point cloud. When sensor data from different sensors is processed, each type of sensor data must be processed separately. Thus, for each type of sensor, one or more models for object identification must be established. In addition, any particular type of sensor may have drawbacks when used to train a target model. For example, if the model is trained using images acquired directly by a camera, the disadvantages may include: (1) elements in the image are not classified; (2) the image may be taken from any viewing angle; and (3) a large number of sample images are required to train the target model. Similar disadvantages may exist for other types of sensors. Accordingly, an improved solution for training a road model is desired.

Disclosure of Invention

It is an object of the present disclosure to provide a method and system for training a road model using snapshot images that has at least the following advantages over conventional road model training: (1) a smaller number of training samples is needed due to the unified data representation; (2) the sample data may include various types of information, which may also include historical information; (3) the calculation is efficient; and (4) existing data in different formats can be used.

According to a first exemplary embodiment of the present disclosure, a method for creating a snapshot image of a traffic scene is provided. The method comprises the following steps: acquiring at least two frames of sensor data of a road scene from at least two sensors mounted on a vehicle, the at least two frames of sensor data being collected sequentially at different times; acquiring a position of each of the at least two sensors; transforming each frame of sensor data into a current reference coordinate system based on the acquired positions of the at least two sensors; and rendering all transformed sensor data onto an image to form a snapshot image.

According to a second exemplary embodiment of the present disclosure, a method for training a road model using snapshot images is provided. The method comprises the following steps: acquiring an existing road model of a road scene; acquiring at least two frames of sensor data of the road scene from at least two sensors mounted on the vehicle, the at least two frames of sensor data being collected sequentially at different times; for each of the at least two frames, creating a snapshot image using the acquired sensor data according to the method of the first embodiment; associating the existing road model with each of the snapshot images as training data; and training a new road model using the training data.

According to a third exemplary embodiment of the present disclosure, an apparatus for creating a snapshot image of a traffic scene is provided. The device includes: a sensor data acquisition module configured to acquire at least two frames of sensor data of a road scene from at least two sensors mounted on a vehicle, the at least two frames of sensor data collected sequentially at different times; a sensor position acquisition module configured to acquire positions of the at least two sensors; a transformation module configured to transform each frame of sensor data into a current reference coordinate system based on the acquired locations of the at least two sensors; and a rendering module configured to render all transformed sensor data onto an image to form a snapshot image.

According to a fourth exemplary embodiment of the present disclosure, a vehicle is provided comprising at least two sensors and the apparatus of the third exemplary embodiment.

According to a fifth exemplary embodiment of the present disclosure, a system for training a road model using snapshot images is provided. The system may include: at least two sensors configured to collect sensor data of a road scene; and a processing unit configured to: acquire an existing road model of a road scene; acquire at least two frames of sensor data of the road scene from at least two sensors mounted on the vehicle, the at least two frames of sensor data being collected sequentially at different times; for each of the at least two frames, create a snapshot image using the acquired sensor data according to the method of the first embodiment; associate the existing road model with each of the snapshot images as training data; and train a new road model using the training data.

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Additional aspects, features and/or advantages of the examples will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.

Drawings

The foregoing and other aspects and advantages of the disclosure will become apparent from the following detailed description of exemplary embodiments, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the principles of the disclosure. Note that the drawings are not necessarily drawn to scale.

Fig. 1 illustrates an exemplary diagram generated from a snapshot image of a traffic scene according to an embodiment of the present invention.

FIG. 2 is a flow diagram of an exemplary method for creating a snapshot image of a traffic scene in accordance with an embodiment of the present invention.

Fig. 3 illustrates an exemplary diagram generated from a snapshot image of a traffic scene according to another embodiment of the present invention.

Fig. 4 is a flow diagram of an exemplary method for creating a snapshot image of a traffic scene in accordance with another embodiment of the present invention.

FIG. 5 is a flow diagram of an exemplary method for creating a snapshot image of a traffic scene in accordance with yet another embodiment of the present invention.

FIG. 6 is a flow diagram of an exemplary method for training a road model using snapshot images in accordance with an embodiment of the present invention.

FIG. 7 is a flow diagram of an exemplary method for training an event detector using snapshot images in accordance with an embodiment of the present invention.

FIG. 8 is a flow diagram of an exemplary method implemented on a vehicle for detecting an event in accordance with an embodiment of the invention.

Fig. 9 illustrates an exemplary apparatus for creating a snapshot image of a traffic scene in accordance with an embodiment of the present invention.

FIG. 10 illustrates an exemplary vehicle according to an embodiment of the invention.

Fig. 11 illustrates an exemplary apparatus for creating a snapshot image of a traffic scene according to another embodiment of the present invention.

FIG. 12 illustrates an exemplary vehicle according to another embodiment of the invention.

Fig. 13 illustrates an exemplary apparatus for creating a snapshot image of a traffic scene according to yet another embodiment of the present invention.

FIG. 14 illustrates an exemplary vehicle according to yet another embodiment of the invention.

FIG. 15 illustrates an exemplary system for training a road model using snapshot images in accordance with an embodiment of the present invention.

FIG. 16 illustrates an exemplary system for training an event detector using snapshot images in accordance with an embodiment of the present invention.

FIG. 17 illustrates an apparatus for detecting an event on a vehicle according to an embodiment of the invention.

FIG. 18 illustrates an exemplary vehicle according to an embodiment of the invention.

Fig. 19 illustrates a general hardware environment in which the present disclosure may be applied, according to an exemplary embodiment of the present disclosure.

Detailed Description

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the described exemplary embodiments. It will be apparent, however, to one skilled in the art, that the described embodiments may be practiced without some or all of these specific details. In other exemplary embodiments, well-known structures or processing steps have not been described in detail in order to avoid unnecessarily obscuring the concepts of the present disclosure.

The term "vehicle" as used in this specification refers to automobiles, airplanes, helicopters, ships, and the like. For simplicity, the invention is described in connection with "automobiles," but the embodiments described herein are not limited to "automobiles" and may be applicable to other kinds of vehicles. The term "A or B" as used in the specification means "A and B" as well as "A or B" and does not mean that A and B are exclusive unless otherwise specified.

I. Snapshot

The present invention provides a method that can efficiently integrate various types of sensor data on a vehicle in a unified manner so as to reveal, as a whole, the information of the traffic scene around the vehicle. This method is somewhat analogous to taking a photograph of a scene; its result is therefore referred to hereinafter as a "snapshot," and the data of these snapshots is referred to as "snapshot images."

1. Multiple sensors, one timestamp

As a first embodiment of the present invention, a snapshot may be constructed by capturing sensor data from multiple sensors simultaneously.

As mentioned above, vehicles (especially ADVs) are equipped with different types of sensors, such as lidar, radar, and cameras. Each sensor records its own sensor data and provides it to the central processing unit of the vehicle. The format of the sensor data typically differs between sensor types and manufacturers. The central processing unit therefore needs to be able to read and recognize each of the various types of sensor data and use them individually, which consumes a lot of resources and is inefficient.

The present invention integrates sensor data from multiple sensors in the form of snapshots. The plurality of sensors may be the same type of sensor, but may also be different types of sensors.

To perform uniform integration, a uniform reference coordinate system is established. According to one embodiment of the invention, the reference coordinate system of the invention may be a two-dimensional plane parallel to the ground. For example, the origin of the reference coordinate system may be the midpoint of the rear axle of the vehicle. Alternatively, the origin may be the location of any one of the sensors, such as the geometric center of the sensor, or the origin of the local coordinate system used by the sensor. Of course, the origin may be any point on the vehicle. For ease of illustration, in this embodiment, the midpoint of the rear axle of the vehicle is selected as the origin.

Accordingly, one axis of the reference coordinate system may be parallel to the rear axle of the vehicle, while the other axis may be perpendicular to the rear axle. Thus, as shown in Fig. 1, which illustrates an exemplary diagram generated from a snapshot image of a traffic scene according to an embodiment of the present invention, the x-axis is perpendicular to the rear axle of the vehicle, where the positive half of the x-axis represents positions ahead of the vehicle in its direction of travel and the negative half represents positions behind it. The y-axis is parallel to the rear axle of the vehicle. The positive half of the y-axis may represent positions to the left of the direction of travel, while the negative half may represent positions to the right. Optionally, the extent of the reference coordinate system may be predetermined in order to limit the amount of data. By way of example, the x-axis and y-axis may each be defined to span -50 to +50 meters, or -100 to +100 meters, or the like. In another example, the extents of the x-axis and y-axis may be determined by the maximum sensing range of the sensors mounted on the vehicle.
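The bounded reference frame described above can be sketched as a simple range check. This is only a minimal illustration under stated assumptions: the function name `in_reference_bounds` and the default half-extent of 50 m (matching the -50 to +50 meter example) are hypothetical and do not come from the disclosure.

```python
def in_reference_bounds(x: float, y: float, half_extent_m: float = 50.0) -> bool:
    """Return True if (x, y) lies inside the square reference frame.

    Assumes the origin is the midpoint of the rear axle, +x points forward
    and +y points to the left of the direction of travel.
    """
    return abs(x) <= half_extent_m and abs(y) <= half_extent_m


# Hypothetical positions in meters: the first is inside, the second is too far ahead.
points = [(30.0, -10.0), (80.0, 0.0)]
kept = [p for p in points if in_reference_bounds(*p)]
```

Readings outside the chosen extent would simply be dropped before the snapshot is assembled, which is one way to limit the amount of data.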

Various sensors used in vehicles, regardless of the data format they employ, typically provide at least a two-tuple of position information and value information, such as $\{(x, y), d\}$, which indicates that the value read by the sensor at position $(x, y)$ is $d$. The position information is expressed in the local coordinate system of the sensor. Thus, after the reference coordinate system is determined, the sensor data of each sensor may be transformed from its respective local coordinate system to the reference coordinate system. The mounting position of the sensor on the vehicle is known, and the corresponding position in the reference coordinate system can thus be determined. For example, assume that the relative position between the local coordinate system of the first sensor and the reference coordinate system is $(x_{c1}, y_{c1})$, i.e., the origin of the local coordinate system of the first sensor is located at $(x_{c1}, y_{c1})$ in the reference coordinate system. A given position $(x_{s1}, y_{s1})$ in the local coordinate system can then be transformed into $(x_{s1} - x_{c1}, y_{s1} - y_{c1})$. Similarly, assume that the relative position between the local coordinate system of the second sensor and the reference coordinate system is $(x_{c2}, y_{c2})$, i.e., the origin of the local coordinate system of the second sensor is located at $(x_{c2}, y_{c2})$ in the reference coordinate system. A given position $(x_{s2}, y_{s2})$ in that local coordinate system can then be transformed into $(x_{s2} - x_{c2}, y_{s2} - y_{c2})$.
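The translation described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the `Reading` type, the function name `to_reference`, and the numeric values are hypothetical. The subtraction mirrors the formula as written in the text; whether an offset is subtracted or added in practice depends on how the relative position is defined.

```python
from typing import Any, NamedTuple


class Reading(NamedTuple):
    """One sensor reading {(x, y), d}: value d observed at position (x, y)."""
    x: float
    y: float
    d: Any


def to_reference(reading: Reading, x_c: float, y_c: float) -> Reading:
    """Translate a reading from the sensor's local frame to the reference frame.

    Mirrors the text's formula (x_s - x_c, y_s - y_c). The sign depends on how
    the offset (x_c, y_c) is defined, so check it against your own convention.
    """
    return Reading(reading.x - x_c, reading.y - y_c, reading.d)


# Hypothetical lidar reading and sensor offset, for illustration only.
local = Reading(12.0, 3.0, "lane_marking")
ref = to_reference(local, x_c=2.0, y_c=0.5)
```

The value `d` is carried through unchanged; only the position is re-expressed.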

Additionally, some sensors may use a three-dimensional local coordinate system, e.g., the point cloud data of the lidar is three-dimensional. Such a three-dimensional coordinate system may be projected onto a two-dimensional reference coordinate system. More specifically, such three-dimensional coordinate systems are generally represented by x, y, and z axes, wherein the plane formed by two of the three axes (assuming x and y axes) is also generally parallel to the ground, and thus parallel to the x-y plane in the reference coordinate system of the present invention. Thus, its x-y coordinates can be similarly transformed into coordinates in the reference coordinate system by translation. The z-coordinate does not need to be transformed and may be retained in the snapshot image data as additional information. Through a three-dimensional to two-dimensional transformation, the snapshot image provided by the present invention may appear similar to an overhead view of a scene when visually displayed.
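A minimal sketch of this projection, assuming the sensor's x-y plane is already parallel to the ground so that only a translation is needed. The function name `project_to_reference`, the dictionary layout, and the sample points are hypothetical; the subtraction follows the translation formula used in the text.

```python
from typing import Dict, Tuple


def project_to_reference(
    point_xyz: Tuple[float, float, float], x_c: float, y_c: float
) -> Dict[str, object]:
    """Project a 3D point onto the 2D reference frame, keeping z as extra data."""
    x, y, z = point_xyz
    # Only x and y are translated; z is retained untransformed.
    return {"xy": (x - x_c, y - y_c), "z": z}


# Hypothetical lidar points (meters) and sensor offset.
cloud = [(5.0, 1.0, 0.2), (6.0, -2.0, 1.5)]
projected = [project_to_reference(p, x_c=1.0, y_c=0.0) for p in cloud]
```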

In addition, as previously described, the data provided by the different sensors may have different data formats. Besides the data format, the degree to which the data has been processed may also vary. For example, some sensors can only provide raw data, while others provide processed data, e.g., data in which objects have already been identified to some extent. For example, some lidar systems may provide further information about the scene based on the point cloud data, such as segmentation or identification of some objects (e.g., signposts). Some cameras may also provide similar recognition, such as identifying lane markings in captured images. Regardless of the degree to which the data is processed, the data output by the sensors always contains pairs of position data and values. In other words, the output of a sensor always indicates what information was observed at what location. Thus, to create a snapshot in accordance with the present invention, it is only necessary to record all correspondences between locations and data in a single snapshot, so that the snapshot of the present invention is compatible with all sensors and at the same time contains all the original information of each sensor.

It is contemplated that, since multiple sensors are used to sense the same scene, the same object in the scene may be sensed by different sensors. For example, as shown in Fig. 1, assume that there is a building 102 at a particular location $(x_1, y_1)$ in the reference coordinate system. The lidar, the radar, and the camera may all have sensed the building and each provided corresponding sensor data representing the building in its own local coordinate system, such as $\{(x_{s1}, y_{s1}), d_{s1}\}$ provided by the first sensor and $\{(x_{s2}, y_{s2}), d_{s2}\}$ provided by the second sensor. After transformation to the reference coordinate system, the positions given by the two sensors coincide, i.e., $(x_{s1} - x_{c1}, y_{s1} - y_{c1}) = (x_{s2} - x_{c2}, y_{s2} - y_{c2}) = (x_1, y_1)$. When creating a snapshot, the sensor data given by both sensors may, for example, be attached to $(x_1, y_1)$, such as $\{(x_1, y_1), d_{s1}, d_{s2}\}$. Those skilled in the art will appreciate that the data formats described herein are merely exemplary, and that any suitable data format reflecting the relationship between locations and readout values may be used to record snapshot image data in accordance with the present invention.
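One way to record co-located readings such as $\{(x_1, y_1), d_{s1}, d_{s2}\}$ is a dictionary keyed by position. The sketch below is an illustration under assumptions, not the disclosed data format: the function name `merge_readings`, the sensor names, and in particular the quantization of positions into grid cells of size `cell_m` (used so that nearly identical floating-point positions share a key) are hypothetical additions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Position = Tuple[float, float]


def merge_readings(
    sensor_frames: Dict[str, List[Tuple[Position, Any]]],
    offsets: Dict[str, Position],
    cell_m: float = 0.5,
) -> Dict[Tuple[int, int], Dict[str, Any]]:
    """Merge single-timestamp readings from several sensors into one snapshot."""
    snapshot: Dict[Tuple[int, int], Dict[str, Any]] = defaultdict(dict)
    for sensor, readings in sensor_frames.items():
        x_c, y_c = offsets[sensor]
        for (x_s, y_s), d in readings:
            # Translate as in the text, then quantize to a grid cell (assumption).
            key = (round((x_s - x_c) / cell_m), round((y_s - y_c) / cell_m))
            snapshot[key][sensor] = d
    return dict(snapshot)


# Hypothetical data: both sensors observe the same building.
frames = {
    "lidar": [((21.0, 5.0), "building")],
    "camera": [((22.0, 5.5), "building")],
}
offsets = {"lidar": (1.0, 0.0), "camera": (2.0, 0.5)}
snap = merge_readings(frames, offsets)
```

Both readings land in the same cell, so the snapshot holds one entry with a value per sensor.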

Fig. 2 is a flow diagram of an exemplary method 200 for creating a snapshot image of a traffic scene in accordance with an embodiment of the present invention. The method 200 begins at step 202, where sensor data from at least two sensors mounted on a vehicle may be acquired. The sensor data is collected at substantially the same time (or with the same timestamp). Subsequently, at step 204, the position of each sensor may be acquired. As described above, the position of each sensor is its relative position in the reference coordinate system. Thereafter, at step 206, the sensor data of each of the at least two sensors may be transformed into the reference coordinate system based on the acquired position of that sensor. Finally, at step 208, all transformed sensor data may be drawn onto an image to form a snapshot image.

An optional "fusion" step may be performed on the sensor data before it is drawn onto the snapshot image. Since multiple sensors are used, sensor data from different sensors may be used to enhance the reliability and confidence of the sensor data. For example, if the lidar senses an object and recognizes it as a traffic sign, and the camera also captures a picture in which the traffic sign is recognized, the recognition of the traffic sign has almost 100% confidence. Likewise, if the lidar alone is uncertain about what the object is (e.g., a traffic sign with 50% confidence), the sensor data from the camera can raise the confidence to almost 100%. Another situation that shows the advantage of using multiple sensors is one in which a portion of a lane marking is temporarily occluded by an object, such as a car, so that the occluded portion cannot be sensed by sensor A. With reference to sensor data from sensor B (such as a camera image clearly showing that the lane marking is present and merely occluded), the raw data given by sensor A may be replaced with data corresponding to the lane marking, as if no object were occluding that portion.
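The text does not specify how confidences from different sensors are combined. As one possible sketch, and purely as an assumption, the detections can be treated as independent and combined with a "noisy-OR" rule, $1 - \prod_i (1 - p_i)$. The function name `fuse_confidences` and the sample values are hypothetical.

```python
import math
from typing import Iterable


def fuse_confidences(confidences: Iterable[float]) -> float:
    """Combine per-sensor confidences with a noisy-OR rule.

    Assumes the sensors err independently; this rule is an illustrative
    choice, not one prescribed by the disclosure.
    """
    miss_probability = math.prod(1.0 - p for p in confidences)
    return 1.0 - miss_probability


# Hypothetical case: lidar is unsure (0.5), camera is fairly sure (0.9).
fused = fuse_confidences([0.5, 0.9])
```

Under this rule the fused confidence is never lower than the most confident single sensor, which matches the qualitative behavior described above.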

It should be noted that although the terms "snapshot," "snapshot image," "draw," etc. are used in this disclosure, the recorded snapshot data need not be drawn as a visible image. Rather, as previously described, a snapshot or snapshot image is merely a representation of the sensor data recorded for the surrounding scene at one or more particular times. Thus, "drawing data onto an image" in step 208 does not mean that the data is visually rendered as an image, but rather refers to integrating the transformed sensor data from the various sensors into a unified data structure based on coordinate positions in the reference coordinate system. This data structure is referred to as a "snapshot," "snapshot image," or "snapshot image data." Of course, since the position information and the data values associated with the positions are completely retained in the snapshot image data, it can be visually rendered as an image by dedicated software if necessary, for example for human understanding.

By transforming the various sensor data into a unified snapshot, the vehicle does not have to record and use the various types of sensor data separately, which greatly reduces the burden on the onboard system. At the same time, the unified format of the sensor data enables models to be trained independently of the individual sensors, which greatly reduces the amount of computation in the training process and markedly improves training efficiency.

2. One sensor, multiple timestamps

As a second embodiment of a "snapshot," a snapshot may be constructed from sensor data from one sensor captured at different times. As can be appreciated, the difference from the previously described embodiment is that the first embodiment records snapshots of multiple sensors at the same time, while the second embodiment records snapshots of a single sensor at different times.

Similar to the first embodiment, the reference coordinate system may be established first. Assume that it is still a two-dimensional coordinate system parallel to the ground. As an example, the midpoint of the rear axle of the car is again selected as the origin of the reference coordinate system. In the same way, the x-axis is perpendicular to the rear axle of the vehicle, wherein the positive and negative half-axes of the x-axis represent positions in front of and behind the vehicle, respectively, in its direction of travel. The y-axis is parallel to the rear axle of the vehicle, where the positive and negative half-axes of the y-axis may represent positions to the left and right of the direction of travel, respectively.

Sensor data captured by a sensor at a single point in time may be referred to as a frame of sensor data. As an example, the number n of sensor data frames included in one snapshot may be preset, where n is a positive integer greater than or equal to 2, for example n = 10. In one embodiment, the n frames may be a series of consecutive data frames of the sensor. For example, the sampling interval of the sensor itself may be used to acquire n data frames in sequence. Alternatively, the n frames of sensor data may be captured at regular intervals. In another example, an interval greater than the sampling interval of the sensor itself may be selected as appropriate. For example, if the sampling frequency of the sensor itself is 100 Hz, one frame out of every 10 frames may be selected as a snapshot data frame. The sampling interval may also be selected based on the speed of the vehicle, for example so that the sensor data still differs noticeably between frames when the vehicle is not moving fast.
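The frame selection described above (for example, one frame out of every 10 from a 100 Hz sensor, with n = 10) can be sketched as a strided slice. The function name `select_frames` and its parameters are hypothetical, and frames are represented here simply by their indices.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_frames(frames: Sequence[T], n: int = 10, stride: int = 10) -> List[T]:
    """Pick n frames for one snapshot, taking every `stride`-th frame."""
    if n < 2:
        raise ValueError("a snapshot needs at least two frames")
    selected = list(frames[::stride][:n])
    if len(selected) < n:
        raise ValueError("not enough frames for the requested n and stride")
    return selected


# One second of frames from a hypothetical 100 Hz sensor, identified by index.
frames = list(range(100))
picked = select_frames(frames, n=10, stride=10)
```

A speed-dependent variant would only need to choose `stride` from the current vehicle speed before calling the function.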

After acquiring the n frames of sensor data, the n frames of sensor data may be transformed into snapshot data. In addition to the location information and the readout values, the sensor data typically also contains a timestamp recording the time at which the data was captured. In addition to establishing the reference coordinate system, a particular point in time may be selected as a reference time or reference timestamp when the snapshot is created. For example, the acquisition time of the first frame, the last frame, or any other frame of the n frames may be regarded as the reference time $t_0$. It is assumed herein that the time of the first frame is taken as the reference time $t_0$, and the subsequent 2nd to nth frames are denoted as times $t_1, \ldots, t_{n-1}$. The times $t_1, \ldots, t_{n-1}$ are also referred to herein as the timestamps or ages of the frames.

Each frame of sensor data may then be transformed into data in a reference coordinate system. For the first frame of data, the transformation may include a transformation of a position between a reference coordinate system and a local coordinate system of the sensor. Similarly to the first embodiment, the position of the sensor on the vehicle is known, and therefore the relative positional relationship between the origin of its local coordinate system and the origin of the reference coordinate system is known. Thus, the coordinates may be transformed by translation.

Next, for the second data frame, in addition to the transformation of the position between the reference coordinate system and the local coordinate system, it is necessary to take into account the movement of the vehicle itself from time $t_0$ to time $t_1$. As an example, the displacement of the vehicle may be determined from the time interval between $t_0$ and $t_1$ and the speed of the vehicle during that period, or from odometer or other sensor data. Let the relative position between the local coordinate system and the reference coordinate system be $(x_c, y_c)$, i.e., the origin of the local coordinate system is located at $(x_c, y_c)$ in the reference coordinate system, and let the displacement of the vehicle from $t_0$ to $t_1$ be $(d_{x1}, d_{y1})$. Then $\{(x_1, y_1), d_1, t_1\}$ in the second data frame can be transformed into $\{(x_1 - x_c - d_{x1}, y_1 - y_c - d_{y1}), d_1, t_1\}$, where $t_1$ represents the time at which the second data frame was captured. Subsequent frames may be transformed in the same way. Finally, all n transformed frames of sensor data may be integrated based on the transformed locations in the reference coordinate system to form a snapshot.
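The per-frame transformation with ego-motion compensation can be sketched as follows. This is an illustration under assumptions: the function names, the constant-velocity estimate of the displacement, and the sample values are hypothetical. The signs mirror the formula as written in the text; a real system must match them to its own axis and displacement conventions.

```python
from typing import Any, Tuple

Position = Tuple[float, float]
Reading = Tuple[Position, Any, float]  # ((x, y), d, t)


def ego_shift_from_speed(v_x: float, v_y: float, t0: float, t1: float) -> Position:
    """Estimate the vehicle displacement between t0 and t1 at constant velocity."""
    dt = t1 - t0
    return (v_x * dt, v_y * dt)


def compensate(reading: Reading, sensor_offset: Position, ego_shift: Position) -> Reading:
    """Apply (x - x_c - d_x, y - y_c - d_y) and keep the value and timestamp."""
    (x, y), d, t = reading
    x_c, y_c = sensor_offset
    d_x, d_y = ego_shift
    return ((x - x_c - d_x, y - y_c - d_y), d, t)


# Hypothetical values: 10 m/s along x, second frame captured 0.5 s after t0.
shift = ego_shift_from_speed(10.0, 0.0, t0=0.0, t1=0.5)
compensated = compensate(((30.0, 2.0), "vehicle", 0.5), (1.0, 0.0), shift)
```

When odometry is available, its measured displacement can replace `ego_shift_from_speed` without changing `compensate`.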

Assuming that the snapshot data is visually rendered as an image, as in Fig. 3, which illustrates an exemplary diagram generated from a snapshot image of a traffic scene in accordance with another embodiment of the present invention, it can be seen that the sensor data from time $t_0$ to time $t_{n-1}$ is integrated into a single coordinate system. In such an image, stationary objects in the scene still appear stationary, but moving objects may appear as motion trajectories. Taking the building 102 in Fig. 1 as an example, since it is stationary, its coordinates in each frame of sensor data coincide with each other after being transformed to the reference coordinate system. Thus, it is shown in Fig. 3 at the same fixed position as in Fig. 1. In contrast, the moving vehicle 103 in Fig. 1 appears in Fig. 3 to first travel straight along the lane and then perform a lane change.

By combining multiple frames of data into a single snapshot, the dynamics of the scene over a period of time can be clearly reflected, which is suitable for subsequent model training and will be described in further detail below.

Fig. 4 is a flow diagram of an exemplary method 400 for creating a snapshot image of a traffic scene in accordance with another embodiment of the present invention. The method 400 begins at step 402, where at least two frames of sensor data from a sensor mounted on a vehicle are acquired. The at least two frames of sensor data may be collected sequentially at different times. Thereafter, at step 404, the position of the sensor is acquired. As in the first embodiment, the position of the sensor is the relative position of the sensor in the reference coordinate system. At step 406, each frame of sensor data may be transformed into the current reference coordinate system based on the acquired position of the sensor. As mentioned previously, the relative movement of the vehicle between frames should also be taken into account during the transformation. At step 408, after all sensor data frames have been transformed, the transformed sensor data frames are drawn onto an image to form a snapshot image.

In this embodiment, an optional fusion step may also be present. Although only one sensor is used, sensor data captured at different timestamps may also be used to enhance the reliability and confidence of the sensor data. For example, at one timestamp, a sensor may sense an object but cannot determine what it is. After a few frames, it clearly recognizes what it is as the vehicle gets closer to the object. Subsequently, the previous data may be processed or merged with the newer data.

3. Multiple sensors, multiple timestamps

As a third embodiment of "snapshots," snapshots may be constructed of sensor data from multiple sensors at different times. The third embodiment is similar in many respects to the previously described second embodiment, except that only one sensor is used in the second embodiment, and multiple sensors are used in the third embodiment. In the foregoing first embodiment, it was described that a snapshot is created with a plurality of sensors at a single point in time. Similar to the first embodiment, on the basis of the second embodiment in which n frames of sensor data are recorded, coordinate system transformation may be performed on data from a plurality of sensors, and a snapshot may be formed based on the coordinates.

As an example, assume that the relative position between the local coordinate system of the first sensor (e.g., lidar) and the reference coordinate system is $(x_{c1}, y_{c1})$, i.e., the origin of that local coordinate system is located at $(x_{c1}, y_{c1})$ in the reference coordinate system; that the relative position between the local coordinate system of the second sensor (e.g., radar) and the reference coordinate system is $(x_{c2}, y_{c2})$, i.e., the origin of that local coordinate system is located at $(x_{c2}, y_{c2})$ in the reference coordinate system; and that the displacement of the vehicle from $t_0$ to $t_1$ is $(d_{x1}, d_{y1})$. Then $\{(x_{s1}, y_{s1}), d_1, t_1\}$ in the second data frame of the first sensor can be transformed into $\{(x_{s1} - x_{c1} - d_{x1}, y_{s1} - y_{c1} - d_{y1}), d_1, t_1\}$, where $t_1$ represents the time at which the second data frame was captured. Similarly, $\{(x_{s2}, y_{s2}), d_2, t_1\}$ in the second data frame of the second sensor can be transformed into $\{(x_{s2} - x_{c2} - d_{x1}, y_{s2} - y_{c2} - d_{y1}), d_2, t_1\}$. Further, if an nth sensor (e.g., camera) is present, $\{(x_{sn}, y_{sn}), d_n, t_1\}$ in its second data frame can be transformed into $\{(x_{sn} - x_{cn} - d_{x1}, y_{sn} - y_{cn} - d_{y1}), d_n, t_1\}$. As previously described, each frame of transformed data of each sensor is integrated into the snapshot under the reference coordinate system. In terms of data structure, the snapshot formed according to the third embodiment looks like a combination of the snapshot data formats of the first and second embodiments, and can be generally expressed as $\{(x, y), d_{s1}, d_{s2}, \ldots, d_{sn}, t_{n-1}\}$ to represent a plurality of sensor data values with timestamps at $(x, y)$ in the reference coordinate system. Assuming that the snapshot data of the third embodiment is visually rendered as an image, the image should appear similar to that of the second embodiment, reflecting the dynamic changes of the scene.
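Putting the sensor offsets and the vehicle displacement together, a snapshot over several sensors and timestamps might be assembled as sketched below. This is a hedged illustration, not the disclosed implementation: the function name `build_snapshot`, the list-of-tuples layout, the grid quantization `cell_m`, and the sample data are all assumptions. The signs mirror the formulas in the text, and `ego_shifts` holds the displacement term exactly as it enters those formulas.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Position = Tuple[float, float]
Frame = Tuple[float, Dict[str, List[Tuple[Position, Any]]]]  # (t, per-sensor readings)


def build_snapshot(
    frames: List[Frame],
    offsets: Dict[str, Position],
    ego_shifts: Dict[float, Position],
    cell_m: float = 0.5,
) -> Dict[Tuple[int, int], List[Tuple[str, Any, float]]]:
    """Integrate all sensors and all frames into one position-keyed structure."""
    snapshot: Dict[Tuple[int, int], List[Tuple[str, Any, float]]] = defaultdict(list)
    for t, per_sensor in frames:
        d_x, d_y = ego_shifts[t]
        for sensor, readings in per_sensor.items():
            x_c, y_c = offsets[sensor]
            for (x_s, y_s), d in readings:
                x = x_s - x_c - d_x
                y = y_s - y_c - d_y
                # Quantize so that coinciding positions share one key (assumption).
                key = (round(x / cell_m), round(y / cell_m))
                snapshot[key].append((sensor, d, t))
    return dict(snapshot)


# Hypothetical data: a stationary building seen by two sensors over two frames.
offsets = {"lidar": (1.0, 0.0), "radar": (2.0, 0.0)}
ego_shifts = {0.0: (0.0, 0.0), 0.1: (1.0, 0.0)}
frames = [
    (0.0, {"lidar": [((21.0, 5.0), "building")], "radar": [((22.0, 5.0), "building")]}),
    (0.1, {"lidar": [((22.0, 5.0), "building")]}),
]
snapshot = build_snapshot(frames, offsets, ego_shifts)
```

A stationary object accumulates all of its observations under one key, whereas a moving object would spread its observations over several keys and so trace out a trajectory.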

Fig. 5 is a flow diagram of an exemplary method 500 for creating a snapshot image of a traffic scene in accordance with an embodiment of the present invention. The method starts at step 502, and at step 502, at least two frames of sensor data of a road scene are acquired from at least two sensors mounted on a vehicle. The at least two frames of sensor data may be collected sequentially at different times. At step 504, a location of each of the at least two sensors is acquired. Thereafter, at step 506, each frame of sensor data is transformed into the current reference coordinate system based on the acquired positions of the at least two sensors. Similar to the second embodiment, the relative movement of the vehicle between frames should also be taken into account during the conversion. At step 508, all transformed sensor data may be mapped onto an image to form a snapshot image. Furthermore, there may be an optional fusion step in this embodiment, such as to fuse sensor data having overlapping positions in the reference coordinate system.
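Steps 502 to 508, together with the optional fusion step, may be sketched as follows. This is only a minimal illustration under stated assumptions: the grid resolution of 0.5 per cell, the function name create_snapshot, the input layout, and the fusion rule (keeping the most recent value per sensor in each cell) are all hypothetical choices that the disclosure does not prescribe.

```python
import math
from collections import defaultdict


def create_snapshot(frames, sensor_offsets, vehicle_shifts, cell=0.5):
    """Build a snapshot as {(col, row): {sensor_id: (value, frame_index)}}.

    frames: list of (sensor_id, frame_index, [(x, y, d), ...]) in local coordinates.
    sensor_offsets: sensor_id -> (x_c, y_c), the sensor origin in the reference system.
    vehicle_shifts: frame_index -> (d_x, d_y), the vehicle shift for that frame.
    """
    snapshot = defaultdict(dict)
    for sensor_id, k, points in frames:            # step 502: acquired frames
        x_c, y_c = sensor_offsets[sensor_id]       # step 504: sensor position
        d_x, d_y = vehicle_shifts[k]               # relative movement of the vehicle
        for x, y, d in points:
            x_r, y_r = x - x_c - d_x, y - y_c - d_y                  # step 506
            key = (math.floor(x_r / cell), math.floor(y_r / cell))  # step 508
            previous = snapshot[key].get(sensor_id)
            # Assumed fusion rule: a newer frame overwrites an older value
            # of the same sensor that falls into the same cell.
            if previous is None or k >= previous[1]:
                snapshot[key][sensor_id] = (d, k)
    return dict(snapshot)


frames = [
    ("lidar", 0, [(4.0, 1.0, 0.9)]),
    ("radar", 0, [(4.5, 1.0, 12.0)]),
    ("lidar", 1, [(5.0, 1.0, 0.7)]),
]
offsets = {"lidar": (0.0, 0.0), "radar": (0.5, 0.0)}
shifts = {0: (0.0, 0.0), 1: (1.0, 0.0)}
snap = create_snapshot(frames, offsets, shifts)
print(snap)
```

In this hypothetical input, all three points land in the same cell, so the cell holds one radar value and the newer of the two lidar values.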

II. Training road model

An autonomous driving (AD) vehicle makes real-time driving decisions based on HD maps and various sensor data. Generally, an AD vehicle must first determine its exact location on the road and then decide how to drive (steer, accelerate, etc.). More specifically, the AD vehicle identifies objects based on real-time sensor data from, for example, lidar, cameras, and the like. It then compares the identified objects with the road model contained in the HD map, thereby determining its location on the road.

In fact, a large part of the existing road models is constructed based on sensor data collected on roads by sensors mounted on map information collecting vehicles. It will be appreciated that initially, such data is dependent upon human judgment of the identified object. Through data accumulation, some rules are slowly formed and objects can be automatically identified by the computer. The ultimate goal is to have a sophisticated model that allows identification of various objects and to generate a road model by simply inputting acquired sensor data. However, existing road model constructions require the use of various sensors that work independently of each other. Therefore, in order to train a certain model, the model must be trained separately for each sensor. This is clearly inefficient and computationally expensive.

This problem can be solved by using the snapshot technique proposed by the present invention. According to the snapshot technique of the present invention, the data of the various sensors is integrated into a unified data structure. Therefore, only one training is required for this unified data.

FIG. 6 is a flow diagram of an exemplary method 600 for training a road model using snapshot images in accordance with an embodiment of the present invention. The method begins at step 602 by obtaining an existing road model of a road scene. At step 604, at least two frames of sensor data of the road scene are acquired from at least two sensors mounted on the vehicle, the at least two frames of sensor data being collected sequentially at different times. At step 606, for each of the at least two frames, a snapshot image is created using the acquired sensor data. At step 608, the existing road model is associated with each snapshot image as training data. At step 610, a new road model is trained using the training data. As an example, the training may be based on machine learning techniques. The snapshot images and known elements from the existing road model are paired, that is, labeled, in order to be used as training data. With a large amount of training data, the desired model can be trained. Although the amount of training data used to train the model is still large, the amount of data will be significantly less than if the model were trained with each type of sensor data alone.
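The association of step 608 and the training of step 610 may be illustrated by the following sketch. It is an assumption-laden toy example, not the method of the disclosure: each snapshot cell carries two hypothetical sensor values (a lidar intensity and a camera brightness), the existing road model is reduced to a set of cells known to contain lane markings, and a small logistic regression stands in for whatever machine learning technique is actually used. All names and values are hypothetical.

```python
import math


def make_pairs(snapshot, lane_cells):
    """Step 608 (sketch): label each snapshot cell using the existing road model."""
    return [(features, 1 if cell in lane_cells else 0)
            for cell, features in snapshot.items()]


def train_logistic(pairs, epochs=200, lr=0.5):
    """Step 610 (stand-in): logistic regression fitted by stochastic gradient descent."""
    weights, bias = [0.0] * len(pairs[0][0]), 0.0
    for _ in range(epochs):
        for x, y in pairs:
            z = bias + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y   # gradient of the log loss w.r.t. z
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(model, x):
    """Return the estimated probability that a cell contains a lane marking."""
    weights, bias = model
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical snapshot: cell -> (lidar intensity, camera brightness).
snapshot = {(8, 2): (0.9, 0.8), (8, 3): (0.8, 0.9), (2, 2): (0.1, 0.2), (2, 3): (0.2, 0.1)}
lane_cells = {(8, 2), (8, 3)}   # known elements taken from the existing road model
model = train_logistic(make_pairs(snapshot, lane_cells))
print(predict(model, (0.85, 0.85)), predict(model, (0.15, 0.15)))
```

Because the snapshot already combines the values of all sensors in one structure, a single model is fitted on the combined features rather than one model per sensor.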

III. Training event detector

As previously described, snapshots of the present invention may contain data collected by one or more sensors at multiple times and, thus, may reflect the dynamic information of objects in a scene. This feature is also useful when training ADVs to identify motion states (also referred to as events) of objects that occur in real time in a scene. For example, the car in fig. 3 moves from the lane to the left of the current lane, in which the vehicle with the sensor mounted is located, into the current lane. This is a common lane change on the road, also known as a "cut-in". Similar events include, but are not limited to: lane change; overtaking; turning; braking; collision; and runaway.

FIG. 7 is a flow diagram of an exemplary method 700 for training an event detector using snapshot images in accordance with an embodiment of the present invention. The method 700 begins at step 702, and at step 702, at least two frames of sensor data are acquired from at least one sensor mounted on a vehicle. The at least two frames of sensor data may be collected sequentially at different times. At step 704, the results of the events that were occurring at the time the sensor data was acquired may be acquired. These results may be from humans. For example, an engineer may view a video corresponding to a frame of sensor data and identify an event in the video. At step 706, a snapshot image may be created using the acquired sensor data for each of the at least two frames (such as via the methods 200, 400, or 500 of creating a snapshot image described in fig. 2, 4, and 5). At step 708, the results of the captured event are associated with the corresponding snapshot image as training data. At step 710, an event detector is trained using the training data. As an example, the training may be based on machine learning techniques. The snapshot images and known elements are paired or labeled for use as training data. With a large amount of training data, the required event detectors can be trained. Although the amount of training data used to train the event detector is still large, the amount of data will be significantly less than the amount of data used to train the event detector with each type of sensor data alone.
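Steps 708 and 710 may be illustrated as follows. The sketch assumes that each snapshot has already been reduced to a short feature vector (here, the hypothetical lateral offset of a neighboring car at three successive times) and uses a nearest-centroid model as a simple stand-in for the machine learning technique. The function names, labels, and values are illustrative assumptions only.

```python
from collections import defaultdict


def associate(snapshot_features, event_labels):
    """Step 708 (sketch): pair each snapshot with its human-provided event result."""
    return list(zip(snapshot_features, event_labels))


def train_centroids(training_data):
    """Step 710 (stand-in): compute one mean feature vector per event class."""
    sums, counts = {}, defaultdict(int)
    for features, label in training_data:
        accumulator = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            accumulator[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


# Hypothetical features: lateral offset of a neighboring car at three successive times.
snapshot_features = [
    [3.5, 2.0, 0.5],
    [3.4, 1.8, 0.3],
    [3.5, 3.5, 3.5],
    [3.6, 3.5, 3.4],
]
event_labels = ["cut-in", "cut-in", "lane-keeping", "lane-keeping"]

detector = train_centroids(associate(snapshot_features, event_labels))
print(detector)
```

The resulting dictionary maps each event label to a representative feature vector and can serve as a very simple event detector.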

FIG. 8 is a flow chart of an exemplary method 800 for detecting an event aboard a vehicle. The method 800 begins at step 802, and at step 802, an event detector (such as an event detector trained via the method 700) may be acquired. At step 804, at least one frame of sensor data may be acquired from at least one sensor mounted on the vehicle. At step 806, for each of the at least one frame, a snapshot image may be created using the acquired sensor data. At step 808, an event may be detected with the event detector based on the created snapshot image. More specifically, the step may include inputting the created snapshot image to an event detector, and then the event detector outputs the detected event based on the input snapshot image. Preferably, the result, i.e. the detected event, may be output with probability or confidence.
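Step 808, including the output of a confidence value, may be sketched as follows. The detector is assumed here to be a dictionary of per-event reference vectors, and the confidence is a softmax over negative squared distances. Both are hypothetical choices made for illustration, and the resulting value is a heuristic score rather than a calibrated probability.

```python
import math


def detect_event(detector, features):
    """Return (event, confidence) for the feature vector of one snapshot image."""
    # Score each event by the negative squared distance to its reference vector.
    scores = {
        label: -sum((f - c) ** 2 for f, c in zip(features, centre))
        for label, centre in detector.items()
    }
    best_score = max(scores.values())
    # Softmax, shifted by the best score for numerical stability.
    exps = {label: math.exp(s - best_score) for label, s in scores.items()}
    total = sum(exps.values())
    event = max(exps, key=exps.get)
    return event, exps[event] / total


# Hypothetical detector, e.g. obtained beforehand by some training procedure.
detector = {"cut-in": [3.45, 1.9, 0.4], "lane-keeping": [3.55, 3.5, 3.45]}
event, confidence = detect_event(detector, [3.5, 2.1, 0.6])
print(event, confidence)
```

A downstream component could, for example, act on a detected event only when the confidence exceeds a chosen threshold.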

Fig. 9 illustrates an exemplary apparatus 900 for creating a snapshot image of a traffic scene in accordance with an embodiment of the present invention. Apparatus 900 may include a sensor data acquisition module 902, a sensor location acquisition module 904, a transformation module 906, and a rendering module 908. The sensor data acquisition module 902 may be configured to acquire sensor data of at least two sensors mounted on a vehicle. The sensor location acquisition module 904 may be configured to acquire the location of each sensor. The transformation module 906 may be configured for transforming the sensor data of each of the at least two sensors into a reference coordinate system based on the acquired position of the sensor. The rendering module 908 may be configured to render the transformed sensor data onto an image to form a snapshot image.
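The division of an apparatus into acquisition, transformation, and rendering modules may be sketched in software as follows. The class SnapshotApparatus, its field names, and the dictionary-based image are hypothetical. The sketch covers only the single-time, multi-sensor case and represents the image as a mapping from reference coordinates to per-sensor values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]   # (x, y, d) in a sensor's local coordinate system
Frame = Dict[str, List[Point]]       # sensor id -> points of that sensor


@dataclass
class SnapshotApparatus:
    acquire_data: Callable[[], Frame]                                 # cf. module 902
    acquire_positions: Callable[[], Dict[str, Tuple[float, float]]]   # cf. module 904

    def transform(self, frame, positions):                            # cf. module 906
        """Subtract each sensor's offset to express its points in the reference system."""
        return {
            sensor: [(x - positions[sensor][0], y - positions[sensor][1], d)
                     for x, y, d in points]
            for sensor, points in frame.items()
        }

    def render(self, transformed):                                    # cf. module 908
        """Collect the values of all sensors per reference position."""
        image = {}
        for sensor, points in transformed.items():
            for x, y, d in points:
                image.setdefault((x, y), {})[sensor] = d
        return image

    def run(self):
        frame = self.acquire_data()
        positions = self.acquire_positions()
        return self.render(self.transform(frame, positions))


apparatus = SnapshotApparatus(
    acquire_data=lambda: {"lidar": [(4.0, 1.0, 0.9)], "radar": [(4.5, 1.0, 12.0)]},
    acquire_positions=lambda: {"lidar": (0.0, 0.0), "radar": (0.5, 0.0)},
)
image = apparatus.run()
print(image)
```

Passing the two acquisition modules in as callables keeps the transformation and rendering logic independent of any particular sensor interface.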

Fig. 10 illustrates an exemplary vehicle 1000 in accordance with an embodiment of the invention. Vehicle 1000 may include a device for creating a snapshot image of a traffic scene, such as device 900 in fig. 9. Like a normal vehicle, the vehicle 1000 may further include at least two sensors 1002 for collecting sensor data of a traffic scene. The sensors 1002 may be of different types and include, but are not limited to, lidar, radar, and cameras.

Fig. 11 illustrates an exemplary apparatus 1100 for creating snapshot images of traffic scenes in accordance with an embodiment of the present invention. The apparatus 1100 may include a sensor data acquisition module 1102, a sensor location acquisition module 1104, a transformation module 1106, and a rendering module 1108. The sensor data acquisition module 1102 may be configured to acquire at least two frames of sensor data for a sensor mounted on a vehicle. The sensor location acquisition module 1104 may be configured to acquire the location of the sensor. The transformation module 1106 may be configured to transform each frame of sensor data into a current frame of reference based on the acquired position of the sensor. The rendering module 1108 may be configured to render the transformed sensor data onto an image to form a snapshot image.

Fig. 12 illustrates an exemplary vehicle 1200 according to an embodiment of the invention. Vehicle 1200 may include a device for creating a snapshot image of a traffic scene, such as device 1100 in fig. 11. Like a normal vehicle, the vehicle 1200 may further include at least one sensor 1202 for collecting sensor data of a traffic scene. The at least one sensor 1202 may be of different types and include, but are not limited to, lidar, radar, and cameras.

Fig. 13 illustrates an exemplary apparatus 1300 for creating a snapshot image of a traffic scene according to an embodiment of the present invention. Apparatus 1300 may include a sensor data acquisition module 1302, a sensor location acquisition module 1304, a transformation module 1306, and a rendering module 1308. The sensor data acquisition module 1302 may be configured to acquire at least two frames of sensor data of a road scene from at least two sensors mounted on a vehicle. The sensor location acquisition module 1304 may be configured to acquire a location of each of the at least two sensors. The transformation module 1306 may be configured to transform each frame of sensor data into a current reference coordinate system based on the acquired locations of the at least two sensors. The rendering module 1308 may be configured to render the transformed sensor data onto an image to form a snapshot image.

FIG. 14 illustrates an exemplary vehicle 1400 according to an embodiment of the invention. Vehicle 1400 may include a device for creating a snapshot image of a traffic scene, such as device 1300 in fig. 13. Like a normal vehicle, the vehicle 1400 may further include at least two sensors 1402 for collecting sensor data for a traffic scene. The at least two sensors 1402 may be of different types and include, but are not limited to, lidar, radar, and cameras.

FIG. 15 illustrates an exemplary system 1500 for training a road model using snapshot images in accordance with an embodiment of the present invention. The system 1500 may include at least two sensors 1502 configured to collect sensor data of a road scene and a processing unit 1504. The processing unit 1504 is configured to perform a method of training a road model using snapshot images, such as the method 600 described in fig. 6.

Fig. 16 illustrates an exemplary system 1600 for training an event detector using snapshot images. The system 1600 can include a sensor data acquisition module 1602, an event result acquisition module 1604, a snapshot image creation module 1606, an association module 1608, and a training module 1610. The sensor data acquisition module 1602 may be configured to acquire at least two frames of sensor data from at least one sensor mounted on a vehicle. Event result acquisition module 1604 may be configured to acquire the results of events that are occurring while sensor data is being acquired. The snapshot image creation module 1606 may be configured to, for each of the at least two frames, create a snapshot image using the acquired sensor data. The association module 1608 may be configured to associate the results of the acquired events with corresponding snapshot images as training data. The training module 1610 may be configured to train the event detector using the training data.

Fig. 17 illustrates an apparatus 1700 for detecting an event on a vehicle in accordance with an embodiment of the present invention. The apparatus 1700 may include a detector acquisition module 1702, a sensor data acquisition module 1704, a snapshot image creation module 1706, and an event detection module 1708. The detector acquisition module 1702 may be configured for acquiring an event detector trained by a method such as the method 700 described with respect to fig. 7. The sensor data acquisition module 1704 is configured to acquire at least two frames of sensor data from at least one sensor mounted on the vehicle. The snapshot image creation module 1706 may be configured to, for each of the at least two frames, create a snapshot image using the acquired sensor data. The event detection module 1708 may be configured to detect an event with the event detector based on the created snapshot image.

Fig. 18 illustrates an exemplary vehicle 1800, according to an embodiment of the invention. Vehicle 1800 may include a device for detecting events, such as device 1700 in fig. 17. Like a normal vehicle, the vehicle 1800 may further include at least one sensor 1802 for collecting sensor data of traffic scenarios. The sensors 1802 may be of different types and include, but are not limited to, lidar, radar, and cameras.

Fig. 19 illustrates a general hardware environment 1900 in which the present disclosure may be applied, according to an exemplary embodiment of the present disclosure.

Referring to fig. 19, a computing device 1900 will now be described, computing device 1900 being an example of a hardware device applicable to aspects of the present disclosure. Computing device 1900 may be any machine configured to perform processing and/or computing, and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smart phone, an on-board computer, or any combination thereof. The above-mentioned systems may be implemented in whole or at least in part by computing device 1900 or similar devices or systems.

Computing device 1900 may include components connected with bus 1902, or in communication with bus 1902, possibly via one or more interfaces. For example, the computing device 1900 may include a bus 1902, as well as one or more processors 1904, one or more input devices 1906, and one or more output devices 1908. The one or more processors 1904 may be any type of processor and may include, but are not limited to, one or more general purpose processors and/or one or more special purpose processors (such as a dedicated processing chip). Input device 1906 can be any type of device that can input information into a computing device and can include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote control. Output device 1908 can be any type of device that can present information and can include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. Computing device 1900 may also include, or be connected with, non-transitory storage device 1910, which non-transitory storage device 1910 may be any storage device that is non-transitory and that enables data storage, and may include, but is not limited to, disk drives, optical storage devices, solid state storage, floppy disks, hard disks, tape, or any other magnetic medium, optical disks or any other optical medium, ROM (read only memory), RAM (random access memory), cache memory, and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code. Non-transitory storage device 1910 may be separable from an interface. The non-transitory storage device 1910 may have data/instructions/code to implement the methods and steps described above. Computing device 1900 may also include communication devices 1912. 
The communication device 1912 may be any type of device or system capable of enabling communication with external devices and/or networks and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication facility, and so on.

When computing device 1900 is used as an in-vehicle device, computing device 1900 may also be connected to external devices, such as a GPS receiver, sensors for sensing different environmental data (such as acceleration sensors, wheel speed sensors, gyroscopes), and so forth. In this way, computing device 1900 may, for example, receive location data and sensor data indicative of a driving condition of the vehicle. When computing device 1900 is used as an in-vehicle device, computing device 1900 may also be connected to other facilities for controlling travel and operation of the vehicle (such as an engine system, wipers, anti-lock brake system, etc.).

In addition, the non-transitory storage device 1910 may have map information and software elements so that the processor 1904 may perform route guidance processing. In addition, the output device 1908 may include a display for displaying a map, a position mark of the vehicle, and an image indicating the running condition of the vehicle. The output device 1908 may also include a speaker or an interface with headphones for audio guidance.

The bus 1902 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus. In particular, for an in-vehicle device, the bus 1902 may include a Controller Area Network (CAN) bus or other architecture designed for use in applications on an automobile.

Computing device 1900 may also include a working memory 1914, which working memory 1914 may be any type of working memory that can store instructions and/or data useful to the operation of processor 1904 and may include, but is not limited to, random access memory and/or read only memory devices.

Software elements may be located in working memory 1914, including but not limited to an operating system 1916, one or more application programs 1918, drivers, and/or other data and code. Instructions for performing the above-described methods and steps may be included in one or more applications 1918, and the modules of the above-mentioned apparatuses (such as apparatus 900, 1100, 1300, or 1700) may be implemented by processor 1904 reading and executing the instructions of one or more applications 1918. Executable code or source code for the instructions of the software elements may be stored in a non-transitory computer-readable storage medium (such as storage device 1910 described above) and may be read into working memory 1914, possibly by compilation and/or installation. Executable code or source code for the instructions of the software elements may also be downloaded from a remote location.

From the above embodiments, it is apparent to those skilled in the art that the present disclosure can be implemented by software having necessary hardware, or by hardware, firmware, and the like. Based on such understanding, embodiments of the present disclosure may be implemented partially in software. The computer software may be stored in a readable storage medium such as a floppy disk, hard disk, optical disk, or flash memory of the computer. The computer software includes a series of instructions to cause a computer (e.g., a personal computer, a service station, or a network terminal) to perform a method or a portion thereof according to a respective embodiment of the present disclosure.

Throughout the specification, reference has been made to "one example" or "an example" meaning that a particular described feature, structure or characteristic is included in at least one example. Thus, use of such phrases may refer to more than one example. Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more examples.

One skilled in the relevant art will recognize, however, that the examples can be practiced without one or more of the specific details, or with other methods, resources, materials, and so forth. In other instances, well-known structures, resources, or operations are not shown or described in detail to avoid obscuring aspects of the examples.

While examples and applications have been illustrated and described, it is to be understood that these examples are not limited to the precise configuration and resources described above. Various modifications, changes, and variations apparent to those skilled in the art may be made in the arrangement, operation, and details of the methods and systems disclosed herein without departing from the scope of the claimed examples.

Claims

1. A computer-implemented method for creating snapshot images of a traffic scene, the method comprising:

acquiring at least two frames of sensor data of a road scene from at least two sensors mounted on a vehicle, the at least two frames of sensor data collected sequentially at different times;

acquiring a position of each of the at least two sensors;

transforming each frame of sensor data into a current reference coordinate system based on the acquired positions of the at least two sensors; and

mapping all transformed sensor data onto an image to form a snapshot image.

2. The method of claim 1, wherein the at least two sensors are of different types and are selected from the following sensors:

a laser radar;

a radar; and

a camera.

3. The method according to any one of the preceding claims, wherein the reference coordinate system is a two-dimensional coordinate system parallel to the ground.

4. The method of claim 3, wherein an origin of the reference coordinate system is a center point of a rear axle of the vehicle or a center of mass of any of the at least two sensors.

5. The method according to any one of the preceding claims, wherein the method further comprises:

determining a reference timestamp of the snapshot image.

6. The method of claim 5, wherein the method further comprises:

tagging the age of each sensor data frame relative to the reference timestamp.

7. The method according to any one of the preceding claims, wherein the method further comprises:

fusing sensor data having overlapping positions in the reference coordinate system.

8. A computer-implemented method for training a road model using snapshot images, wherein the method comprises:

acquiring an existing road model of a road scene;

acquiring at least two frames of sensor data of the road scene from at least two sensors mounted on a vehicle, the at least two frames of sensor data collected sequentially at different times;

for each of the at least two frames, creating a snapshot image using the acquired sensor data according to the method of any one of claims 1-7;

associating the existing road model with each of the snapshot images as training data; and

training a new road model using the training data.

9. The method of claim 8, wherein training a new road model using the training data comprises:

training the new road model based on machine learning using the training data as a label.

10. An apparatus for creating a snapshot image of a traffic scene, the apparatus comprising:

a sensor data acquisition module configured to acquire at least two frames of sensor data of a road scene from at least two sensors mounted on a vehicle, the at least two frames of sensor data collected sequentially at different times;

a sensor position acquisition module configured to acquire positions of the at least two sensors;

a transformation module configured to transform each frame of sensor data into a current reference coordinate system based on the acquired positions of the at least two sensors; and

a rendering module configured to render all transformed sensor data onto an image to form a snapshot image.

11. A vehicle, characterized in that the vehicle comprises:

at least two sensors; and

the apparatus of claim 10.

12. The vehicle of claim 11, wherein the at least two sensors are of different types and are selected from the following sensors:

a laser radar;

a radar; and

a camera.

13. The vehicle of any of claims 11-12, wherein the reference coordinate system is a two-dimensional coordinate system parallel to the ground, and an origin of the reference coordinate system is a center point of a rear axle of the vehicle or a center of mass of any of the at least two sensors.

14. A system for training a road model using snapshot images, wherein the system comprises:

at least two sensors configured to collect sensor data of a road scene; and

a processing unit configured to:

acquire an existing road model of a road scene;

for each of the at least two frames, create a snapshot image using the acquired sensor data according to the method of any one of claims 1-7; and

train a new road model using the training data.

15. The system of claim 14, wherein the new road model is trained based on machine learning using the training data as labels.