CN116503819A - Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium - Google Patents
- Publication number: CN116503819A
- Application number: CN202310474419.XA
- Authority: CN (China)
- Prior art keywords: vehicle, road, target detection, monitoring image, feature map
- Legal status: Pending
Classifications
- G06V20/54: Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
- G06N3/045: Neural network architectures; combinations of networks
- G06T3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
- G06T7/38: Registration of image sequences (determination of transform parameters for image alignment)
- G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners
- G06V10/806: Fusion of extracted features at the feature extraction level
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/64: Three-dimensional objects
- G06T2200/32: Indexing scheme involving image mosaicing
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06V2201/07: Target detection
- Y02T10/40: Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention discloses a vehicle-road collaborative end-to-end 3D target detection method, system, equipment, and storage medium. The method comprises the following steps: when a vehicle is detected to enter a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image, and the depth map corresponding to the road-end image; inputting the current vehicle-end monitoring image and the current road-end monitoring image into a depth alignment neural network to obtain an implicit alignment feature map; inputting the current road-end monitoring image and its corresponding depth map into a depth feature extraction neural network to obtain the corresponding road-end view feature map; fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map; and inputting the vehicle-end view feature map into a 3D target detection network to obtain first 3D target detection information. The invention saves vehicle hardware cost, reduces unnecessary depth-information processing at the vehicle end, and effectively improves the stability of the automatic driving system.
Description
Technical Field
The invention relates to the technical field of automatic driving, and in particular to a vehicle-road collaborative end-to-end 3D target detection method, system, equipment, and storage medium.
Background
Existing vehicle-road collaborative three-dimensional target detection methods either use a traditional algorithm to perform position alignment, or require a depth map acquisition device to be installed and calibrated on the vehicle. Alignment with a traditional algorithm cannot produce end-to-end results, and its position prediction error grows over time. Installing and calibrating a dedicated depth map acquisition device on the vehicle increases the vehicle's software and hardware cost and adds depth-information processing for many scenes in which it is unnecessary.
In fact, during automatic driving, the robustness of the automatic driving system can be greatly improved by acquiring 3D information of targets only in certain key driving areas.
Disclosure of Invention
The invention provides a vehicle-road collaborative end-to-end 3D target detection method, system, equipment, and storage medium to solve the problems in the prior art that installing a depth map acquisition device on the vehicle increases the vehicle's software and hardware cost and introduces unnecessary depth-information processing.
To achieve the above purpose, the technical scheme adopted by the invention is as follows:
a vehicle-road cooperative end-to-end 3D target detection method comprises the following steps:
when detecting that a vehicle enters a certain monitoring area, requesting to acquire a current road end monitoring image and a depth map corresponding to the current road end monitoring image from a road end, and acquiring the current vehicle end monitoring image;
inputting the current vehicle monitoring image and the current road side monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road end monitoring image and a depth map corresponding to the current road end monitoring image into a trained depth feature extraction neural network to obtain a corresponding road end visual angle feature map;
after fusing the implicit alignment feature map and the road end visual angle feature map, inputting a trained coordinate conversion network to perform coordinate conversion to obtain a vehicle end visual angle feature map;
and inputting the vehicle-end visual angle characteristic diagram into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end visual angle.
Preferably, whether the vehicle enters a monitoring area is detected as follows: when the distance between the current position of the vehicle and a monitoring camera is smaller than a preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that camera.
Preferably, the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end.
Preferably, the first 3D target detection information includes, from the vehicle-end view, the 3D position of each target, its distance from the vehicle, and its angle.
Preferably, the implicit alignment feature map and the road-end view feature map have the same size; they are concatenated along the channel dimension into one feature map, which is then input into the coordinate conversion network for coordinate conversion.
Preferably, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are all deep neural networks.
Further, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are composed into a 3D target detection model according to the data flow; the 3D target detection model is pre-trained as follows:
inputting the vehicle-end monitoring image, the road-end monitoring image, and the depth map corresponding to the road-end monitoring image used for training into the 3D target detection model to obtain second 3D target detection information;
computing the loss between the second 3D target detection information and the ground-truth 3D target detection information in the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
A vehicle-road collaborative end-to-end 3D target detection system comprises:
a detection module for detecting whether the vehicle enters a monitoring area;
an acquisition module for, when the vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
a depth alignment neural network module for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
a depth feature extraction neural network module for processing the input current road-end monitoring image and its corresponding depth map to obtain the corresponding road-end view feature map;
a fusion module for fusing the implicit alignment feature map and the road-end view feature map;
a coordinate conversion network module for performing coordinate conversion on the fused feature map to obtain a vehicle-end view feature map;
and a 3D target detection network module for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
A computer device comprises a memory and a processor, the memory storing a computer program executable on the processor; when the processor executes the computer program, the steps of the method described above are implemented.
A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the steps of the method described above are implemented.
The beneficial effects of the invention are as follows:
1. Compared with the traditional approach of installing depth-map acquisition equipment on the vehicle, the invention requires no such equipment on the vehicle; it only needs to request the current road-end monitoring image and its corresponding depth map from the road end for processing. This saves vehicle hardware cost, reduces unnecessary depth-information processing at the vehicle end, and effectively improves the stability of the automatic driving system.
2. Traditional alignment algorithms cannot produce end-to-end results, and their position prediction error grows over time. In the invention, the trained depth alignment neural network, depth feature extraction neural network, coordinate conversion network, and 3D target detection network process the corresponding data directly: given the inputs, the 3D detection result from the vehicle-end view is output directly, enabling fast end-to-end output and reducing position prediction error.
Drawings
Fig. 1 is a flow chart of the steps of the vehicle-road collaborative end-to-end 3D target detection method of the present invention.
Fig. 2 is a data flow diagram of the neural networks in the present invention.
Fig. 3 is a schematic block diagram of the vehicle-road collaborative end-to-end 3D target detection system of the present invention.
Detailed Description
Further advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure herein, with reference to the accompanying drawings and the preferred embodiments. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or changed in various ways without departing from the spirit and scope of the present invention. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the invention in a schematic way; the drawings show only the components related to the invention and are not drawn according to the number, shape, and size of components in an actual implementation. In practice, the form, number, and proportion of the components may be changed arbitrarily, and the component layout may be more complicated.
The existing approach of installing depth-map acquisition equipment on the vehicle not only increases the vehicle's hardware cost but also adds depth-information processing for many unnecessary scenes. Moreover, in actual automatic driving, acquiring 3D target information only in certain key driving areas is enough to greatly improve the robustness of the automatic driving system.
In view of this, this embodiment provides a vehicle-road collaborative end-to-end 3D target detection method for 3D target pose detection in the automatic driving field. It uses the monitoring images of road-end cameras and the corresponding depth information on key road sections during automatic driving, effectively improving the stability of the automatic driving system.
As shown in Fig. 1 and Fig. 2, the vehicle-road collaborative end-to-end 3D target detection method comprises the following steps:
when it is detected that the vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
In this embodiment, monitoring cameras are installed at the road end in advance and are connected to the network. Because the road-end monitoring cameras are fixedly mounted and their positions do not change, the position coordinate information of each monitoring camera can be uploaded to the cloud.
When the vehicle is detected to enter a monitoring area, the road-end monitoring camera is requested for the current road-end monitoring image and its corresponding depth map, and the current vehicle-end monitoring image of the vehicle is acquired. The current road-end monitoring image is a two-dimensional image of the road. The depth map is the depth map of that two-dimensional road image: an image, or image channel, containing information about the distance from the surfaces of scene objects to the viewpoint.
Compared with the traditional approach of installing depth-map acquisition equipment on the vehicle, the method of this embodiment requires no such equipment; the vehicle only requests the current road-end monitoring image and its corresponding depth map from the road end for processing. This saves vehicle hardware cost, and because the depth map of the road-end monitoring image is produced at the road end by the monitoring camera, the depth-information processing of unnecessary scenes at the vehicle end is reduced, effectively improving the stability of the automatic driving system.
In this embodiment, the corresponding data are processed directly by the trained depth alignment neural network, depth feature extraction neural network, coordinate conversion network, and 3D target detection network; given the corresponding inputs, the 3D detection result from the vehicle-end view is output directly, so fast end-to-end output is achieved.
In this embodiment, whether the vehicle enters a monitoring area is detected by comparing the position coordinates of the pre-installed road-end monitoring camera with the current position of the vehicle: when the distance between the current vehicle position and the monitoring camera is smaller than the preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that camera.
Specifically, the vehicle position is obtained through GPS or a similar positioning system, and the vehicle obtains the position coordinates of the road-end monitoring camera from the cloud. When the distance between the current vehicle position and the monitoring camera is smaller than the preset first distance threshold, the vehicle has entered the monitoring area, and the intelligent driving system requests the current road-end monitoring image and the corresponding depth map from the monitoring camera. Because the method does not require timestamp information, the latency requirement on this request is not strict; the request can be sent as soon as it is perceived that the vehicle is about to enter the monitored area.
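For illustration only, the sketch below checks this trigger condition: it computes the distance between the vehicle's GPS position and a camera position assumed to have been downloaded from the cloud, and compares it with the first distance threshold. The function names, the 100 m threshold, and the use of the haversine formula are assumptions made for this example, not values specified by the patent; Python is used as the example language.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def entered_monitoring_area(vehicle_pos, camera_pos, threshold_m=100.0):
    """True when the vehicle is closer to the road-end camera than the first distance threshold."""
    return haversine_m(*vehicle_pos, *camera_pos) < threshold_m

# Example: in a real system the camera coordinates would be fetched from the cloud.
if entered_monitoring_area((31.2304, 121.4737), (31.2310, 121.4745)):
    print("Request current road-end monitoring image and depth map from the road end")
```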
This embodiment inputs the current vehicle-end monitoring image and the current road-end monitoring image into the trained depth alignment neural network to obtain the implicit alignment feature map, i.e. the implicit alignment information: the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end. More specifically, after the current road-end monitoring image is acquired, it is input together with the current vehicle-end monitoring image into the depth alignment neural network, which outputs the implicit coordinate alignment relationship between the vehicle end and the road end from the input pictures.
The depth alignment neural network is a deep neural network. After training, it has the ability to align the vehicle-end and road-end views.
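The patent does not describe the internal structure of the depth alignment neural network. The following is a minimal sketch, assuming a small PyTorch convolutional encoder that takes the vehicle-end and road-end images concatenated along the channel dimension and outputs an implicit alignment feature map; the class name, channel counts, and layer choices are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class DepthAlignNet(nn.Module):
    """Sketch: maps a (vehicle-end image, road-end image) pair to an implicit alignment feature map."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, stride=2, padding=1),   # 3 + 3 input channels
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, vehicle_img: torch.Tensor, road_img: torch.Tensor) -> torch.Tensor:
        x = torch.cat([vehicle_img, road_img], dim=1)  # stack the two views on the channel axis
        return self.encoder(x)

# Example usage with dummy 256x256 RGB images (batch size 1).
align_feat = DepthAlignNet()(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
print(align_feat.shape)  # torch.Size([1, 64, 64, 64])
```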
This embodiment inputs the current road-end monitoring image and its corresponding depth map into the trained depth feature extraction neural network to extract the corresponding road-end view feature map, which contains texture, depth, and other information of the road-end monitoring image. Typically, different network modules in the depth feature extraction neural network extract features from the current road-end monitoring image and from its depth map respectively, and a shared module then fuses the two sets of features into the final road-end view feature map.
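As a sketch of this two-branch arrangement (an image branch, a depth branch, and a shared fusion module), assuming a PyTorch implementation with hypothetical class names and channel sizes:

```python
import torch
import torch.nn as nn

class RoadFeatureNet(nn.Module):
    """Sketch: separate image and depth branches followed by a shared fusion module,
    producing a road-end view feature map that carries texture and depth information."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.fusion = nn.Sequential(
            nn.Conv2d(64, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True))

    def forward(self, road_img: torch.Tensor, depth_map: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_branch(road_img)    # texture features from the monitoring image
        dep_feat = self.depth_branch(depth_map)   # geometry features from the depth map
        return self.fusion(torch.cat([img_feat, dep_feat], dim=1))

road_feat = RoadFeatureNet()(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
print(road_feat.shape)  # torch.Size([1, 64, 64, 64])
```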
In this embodiment, the vehicle-end view feature map is input into the trained 3D target detection network to obtain the first 3D target detection information based on the vehicle-end view. The first 3D target detection information includes, from the vehicle-end view, the 3D position of each target, its distance from the vehicle, and its angle.
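The structure of the 3D target detection network is not specified in the patent; only its outputs are. The sketch below assumes a simple dense regression head that predicts an objectness score together with 3D position, distance, and angle for each spatial cell of the vehicle-end view feature map. All names and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Sketch: per-cell regression of objectness, 3D position (x, y, z),
    distance to the ego vehicle, and yaw angle from the vehicle-end view feature map."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        # 1 objectness score + 3 position + 1 distance + 1 angle = 6 output channels
        self.head = nn.Conv2d(in_channels, 6, kernel_size=1)

    def forward(self, vehicle_view_feat: torch.Tensor) -> dict:
        out = self.head(vehicle_view_feat)
        return {
            "objectness": torch.sigmoid(out[:, 0:1]),
            "position_xyz": out[:, 1:4],
            "distance": out[:, 4:5],
            "angle": out[:, 5:6],
        }

preds = DetectionHead()(torch.rand(1, 64, 64, 64))
print({k: v.shape for k, v in preds.items()})
```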
Because this embodiment fuses the depth map corresponding to the current road-end monitoring image, the resulting first 3D target detection information is more accurate than detection from the vehicle's monocular monitoring image alone. And because a deep neural network is used as the coordinate conversion network to learn the coordinate conversion and to convert the fused implicit alignment feature map and road-end view feature map, the vehicle-end view feature map can be output directly.
In this embodiment, the implicit alignment feature map and the road-end view feature map have the same size; they are concatenated along the channel dimension into one feature map, which is then input into the coordinate conversion network for coordinate conversion.
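A minimal PyTorch sketch of this step is shown below; the splice itself is an ordinary channel-dimension concatenation, and the two-layer convolutional network used here is only an assumed stand-in for the learned coordinate conversion network. Concatenation keeps both feature maps intact, so the conversion network can learn how to weight the alignment information against the road-end appearance and depth cues.

```python
import torch
import torch.nn as nn

class CoordConvertNet(nn.Module):
    """Sketch: converts the fused (alignment + road-end view) feature map into a
    vehicle-end view feature map; the real conversion network is learned end to end."""
    def __init__(self, in_channels: int = 128, out_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.net(fused)

align_feat = torch.rand(1, 64, 64, 64)   # implicit alignment feature map
road_feat = torch.rand(1, 64, 64, 64)    # road-end view feature map (same spatial size)
fused = torch.cat([align_feat, road_feat], dim=1)   # splice on the channel dimension
vehicle_view_feat = CoordConvertNet()(fused)
print(vehicle_view_feat.shape)  # torch.Size([1, 64, 64, 64])
```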
In this embodiment, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are all deep neural networks.
In this embodiment, as shown in Fig. 2, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are composed into a 3D target detection model according to the data flow; the 3D target detection model is pre-trained as follows:
inputting the vehicle-end monitoring image, the road-end monitoring image, and the depth map corresponding to the road-end monitoring image used for training into the 3D target detection model to obtain second 3D target detection information;
computing the loss between the second 3D target detection information and the ground-truth 3D target detection information in the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
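For illustration only, the sketch below composes hypothetical versions of the four networks according to the data flow of Fig. 2 and performs one pre-training step: a forward pass on a training triple, a loss against the label data, and a parameter update by gradient back-propagation. The loss function, optimizer, and learning rate are assumptions rather than the patent's prescription, and the sketch assumes the DepthAlignNet, RoadFeatureNet, CoordConvertNet, and DetectionHead classes from the earlier sketches are defined in the same file.

```python
import torch
import torch.nn as nn

class CollaborativeDetector(nn.Module):
    """Sketch: the four sub-networks composed according to the data flow.
    Assumes the DepthAlignNet, RoadFeatureNet, CoordConvertNet, and DetectionHead
    sketches shown above are defined in the same file."""
    def __init__(self):
        super().__init__()
        self.align_net = DepthAlignNet()
        self.road_net = RoadFeatureNet()
        self.convert_net = CoordConvertNet()
        self.head = DetectionHead()

    def forward(self, vehicle_img, road_img, depth_map):
        align_feat = self.align_net(vehicle_img, road_img)
        road_feat = self.road_net(road_img, depth_map)
        fused = torch.cat([align_feat, road_feat], dim=1)
        return self.head(self.convert_net(fused))

model = CollaborativeDetector()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # assumed optimizer and learning rate
criterion = nn.SmoothL1Loss()                               # assumed regression loss

# One training step on a dummy sample; real label maps come from the annotated training set.
vehicle_img, road_img = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
depth_map = torch.rand(1, 1, 256, 256)
labels = {k: torch.rand_like(v) for k, v in model(vehicle_img, road_img, depth_map).items()}

preds = model(vehicle_img, road_img, depth_map)              # second 3D target detection information
loss = sum(criterion(preds[k], labels[k]) for k in preds)    # loss against the label data
optimizer.zero_grad()
loss.backward()                                              # gradient back-propagation
optimizer.step()                                             # parameter update
print(float(loss))
```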
Because the training samples themselves contain such errors, the 3D target detection model can learn the ability to overcome delay errors.
In a specific embodiment, a vehicle-road collaborative end-to-end 3D target detection system is also provided. As shown in Fig. 3, it comprises:
a detection module for detecting whether the vehicle enters a monitoring area;
an acquisition module for, when the vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
a depth alignment neural network module for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
a depth feature extraction neural network module for processing the input current road-end monitoring image and its corresponding depth map to obtain the corresponding road-end view feature map;
a fusion module for fusing the implicit alignment feature map and the road-end view feature map;
a coordinate conversion network module for performing coordinate conversion on the fused feature map to obtain a vehicle-end view feature map;
and a 3D target detection network module for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
In a specific embodiment, a computer device is also provided, comprising a memory and a processor, the memory storing a computer program that can run on the processor. When the processor executes the computer program, the following steps of the vehicle-road collaborative end-to-end 3D target detection method are implemented:
when it is detected that the vehicle enters a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image, and the depth map corresponding to the current road-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
The memory and the processor are connected by a bus, which may comprise any number of interconnected buses and bridges linking the circuits of the one or more processors and the memory together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or a plurality of elements, for example a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor is transmitted over the wireless medium via an antenna, which also receives data and passes it to the processor.
In a specific embodiment, a computer-readable storage medium is also provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps of the vehicle-road collaborative end-to-end 3D target detection method are implemented:
when it is detected that the vehicle enters a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image, and the depth map corresponding to the current road-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program stored in a storage medium, the program including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It is to be understood that the above examples of the present invention are provided by way of illustration only and are not intended to limit the embodiments of the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within the scope of protection of the claims.
The above embodiments are merely preferred embodiments used to fully explain the present invention, and the scope of protection of the invention is not limited thereto. Equivalent substitutions and modifications made by those skilled in the art on the basis of the invention are all within the scope of protection of the invention.
Claims (10)
1. A vehicle-road collaborative end-to-end 3D target detection method, characterized by comprising the following steps:
when it is detected that a vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
2. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that whether the vehicle enters a monitoring area is detected as follows: when it is detected that the distance between the current position of the vehicle and a monitoring camera is smaller than a preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that monitoring camera.
3. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end.
4. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that the first 3D target detection information includes, from the vehicle-end view, the 3D position of each target, its distance from the vehicle, and its angle.
5. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that the implicit alignment feature map and the road-end view feature map have the same size, are concatenated along the channel dimension into one feature map, and are then input into the coordinate conversion network for coordinate conversion.
6. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are all deep neural networks.
7. The vehicle-road collaborative end-to-end 3D target detection method according to claim 6, characterized in that the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are composed into a 3D target detection model according to the data flow, and the 3D target detection model is pre-trained as follows:
inputting the vehicle-end monitoring image, the road-end monitoring image, and the depth map corresponding to the road-end monitoring image used for training into the 3D target detection model to obtain second 3D target detection information;
computing the loss between the second 3D target detection information and the ground-truth 3D target detection information in the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
8. A vehicle-road collaborative end-to-end 3D target detection system, characterized by comprising:
a detection module for detecting whether the vehicle enters a monitoring area;
an acquisition module for, when the vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
a depth alignment neural network module for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
a depth feature extraction neural network module for processing the input current road-end monitoring image and its corresponding depth map to obtain the corresponding road-end view feature map;
a fusion module for fusing the implicit alignment feature map and the road-end view feature map;
a coordinate conversion network module for performing coordinate conversion on the fused feature map to obtain a vehicle-end view feature map;
and a 3D target detection network module for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
9. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310474419.XA CN116503819A (en) | 2023-04-27 | 2023-04-27 | Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116503819A (en) | 2023-07-28 |
Family
ID=87319784
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310474419.XA (CN116503819A, Pending) | Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium | 2023-04-27 | 2023-04-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116503819A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117635663A * | 2023-12-12 | 2024-03-01 | 中北数科(河北)科技有限公司 | Target vehicle video tracking method and electronic equipment |
CN117635663B * | 2023-12-12 | 2024-05-24 | 中北数科(河北)科技有限公司 | Target vehicle video tracking method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |