CN116503819A - Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium - Google Patents

Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium

Info

Publication number
CN116503819A
Authority
CN
China
Prior art keywords
vehicle
road
target detection
monitoring image
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310474419.XA
Other languages
Chinese (zh)
Inventor
黄翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd filed Critical Chongqing Changan Automobile Co Ltd
Priority to CN202310474419.XA priority Critical patent/CN116503819A/en
Publication of CN116503819A publication Critical patent/CN116503819A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/38 Registration of image sequences
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 Indexing scheme for image data processing or generation, in general
    • G06T2200/32 Indexing scheme for image data processing or generation, in general involving image mosaicing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a vehicle-road collaborative end-to-end 3D target detection method, system, device and storage medium. The method comprises the following steps: when the vehicle is detected to have entered a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image and the corresponding depth map; inputting the current vehicle-end monitoring image and the current road-end monitoring image into a depth alignment neural network to obtain an implicit alignment feature map; inputting the current road-end monitoring image and the corresponding depth map into a depth feature extraction neural network to obtain the corresponding road-end view feature map; fusing the implicit alignment feature map with the road-end view feature map, then inputting the fused result into a coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map; and inputting the vehicle-end view feature map into a 3D target detection network to obtain first 3D target detection information. The invention saves vehicle hardware cost, avoids depth-information processing for unnecessary scenes at the vehicle end, and effectively improves the stability of the automatic driving system.

Description

Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium
Technical Field
The invention relates to the technical field of automatic driving, and in particular to a vehicle-road collaborative end-to-end 3D target detection method, system, device and storage medium.
Background
Existing vehicle-road cooperative three-dimensional target detection methods either use a traditional algorithm to perform position alignment, or require a depth map acquisition device to be installed on the vehicle and calibrated. Alignment with a traditional algorithm cannot produce end-to-end result output, and its position prediction error grows over time. Installing and calibrating a depth map acquisition device on the vehicle increases the software and hardware cost of the vehicle and adds depth-information processing for many unnecessary scenes.
In fact, during automatic driving, the robustness of the automatic driving system can be greatly improved simply by acquiring 3D information of targets in a few key driving areas.
Disclosure of Invention
The invention provides a vehicle-road collaborative end-to-end 3D target detection method, system, device and storage medium, which solve the problems in the prior art that installing a depth map acquisition device on the vehicle increases the software and hardware cost of the vehicle and adds depth-information processing for unnecessary scenes.
To achieve the above purpose, the invention adopts the following technical solution:
a vehicle-road cooperative end-to-end 3D target detection method comprises the following steps:
when it is detected that the vehicle has entered a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map, then inputting the fused result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
Preferably, detecting whether the vehicle has entered a monitoring area specifically means: when the distance between the current position of the vehicle and a monitoring camera is detected to be smaller than a preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that monitoring camera.
Preferably, the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end.
Preferably, the first 3D target detection information includes the 3D position of a target from the vehicle-end view, its distance from the vehicle, and its angle.
Preferably, the implicit alignment feature map and the road-end view feature map have the same size; they are spliced along the channel dimension into one feature map, which is then input into the coordinate conversion network for coordinate conversion.
Preferably, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network and the 3D target detection network are all deep neural networks.
Further, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network and the 3D target detection network are composed, according to the data flow, into a 3D target detection model; the 3D target detection model is pre-trained as follows:
inputting a vehicle-end monitoring image, a road-end monitoring image and the depth map corresponding to that road-end monitoring image used for training into the 3D target detection model to obtain second 3D target detection information;
and calculating the loss between the second 3D target detection information and the real 3D target detection information of the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
A vehicle-road cooperative end-to-end 3D target detection system, comprising:
the detection module is used for detecting whether the vehicle enters a certain monitoring area or not;
the acquisition module is used for requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end when the vehicle enters a monitoring area, and for acquiring the current vehicle-end monitoring image;
the depth alignment neural network module is used for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
the depth feature extraction neural network module is used for extracting features from the input current road-end monitoring image and the depth map corresponding to the current road-end monitoring image to obtain the corresponding road-end view feature map;
the fusion module is used for fusing the implicit alignment feature map and the road-end view feature map;
the coordinate conversion network module is used for performing coordinate conversion on the fused implicit alignment feature map and road-end view feature map to obtain a vehicle-end view feature map;
and the 3D target detection network module is used for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor; when executing the computer program, the processor implements the steps of the method described above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described above.
The beneficial effects of the invention are as follows:
1. Compared with the traditional approach of installing depth-map acquisition equipment on the vehicle, the invention requires no such equipment on the vehicle; it only needs to request the current road-end monitoring image and its corresponding depth map from the road end for processing. This saves vehicle hardware cost, reduces depth-information processing for unnecessary scenes at the vehicle end, and effectively improves the stability of the automatic driving system.
2. Traditional algorithms cannot produce end-to-end result output, and their position prediction error grows over time. The invention processes the corresponding data directly through the trained depth alignment neural network, depth feature extraction neural network, coordinate conversion network and 3D target detection network; once the corresponding data are input, the 3D detection result from the vehicle-end view is output directly, so fast end-to-end result output is achieved and the position prediction error is reduced.
Drawings
Fig. 1 is a flow chart of steps of a vehicle-road cooperative end-to-end 3D target detection method of the present invention.
Fig. 2 is a data flow diagram of each neural network in the present invention.
Fig. 3 is a schematic block diagram of a vehicle-road cooperative end-to-end 3D target detection system according to the present invention.
Detailed Description
Further advantages and effects of the invention will become readily apparent to those skilled in the art from the disclosure herein, with reference to the accompanying drawings and the preferred embodiments. The invention may also be practiced or carried out in other, different embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the invention. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the invention schematically. The drawings show only the components related to the invention and are not drawn according to the number, shape and size of the components in actual implementation; in actual implementation, the form, number and proportion of the components may change arbitrarily, and the component layout may be more complicated.
The existing approach of installing depth-map acquisition equipment on the vehicle not only increases the hardware cost of the vehicle but also adds depth-information processing for many unnecessary scenes, while in the actual automatic driving process 3D target information only needs to be acquired in a few key driving areas to greatly improve the robustness of the automatic driving system.
In view of this, this embodiment provides a vehicle-road collaborative end-to-end 3D target detection method for 3D target pose detection in the field of automatic driving. On key road sections during automatic driving, it makes use of the monitoring images of road-end cameras and the corresponding depth information, which effectively improves the stability of the automatic driving system.
As shown in fig. 1 and fig. 2, the vehicle-road cooperative end-to-end 3D target detection method includes the following steps; a minimal code sketch of the overall data flow follows the steps:
when it is detected that the vehicle has entered a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map, then inputting the fused result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
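The four trained networks named in these steps can be composed into a single forward pass. Below is a minimal PyTorch-style sketch of that data flow; the class name, module interfaces and the channel-wise fusion point are illustrative assumptions rather than the patent's exact implementation:

```python
import torch
import torch.nn as nn

class VehicleRoadDetector(nn.Module):
    """Illustrative composition of the four trained networks into one end-to-end
    model, following the data flow of the steps above."""
    def __init__(self, align_net, feat_net, coord_net, det_net):
        super().__init__()
        self.align_net = align_net   # depth alignment neural network
        self.feat_net = feat_net     # depth feature extraction neural network
        self.coord_net = coord_net   # coordinate conversion network
        self.det_net = det_net       # 3D target detection network

    def forward(self, vehicle_img, road_img, road_depth):
        # implicit alignment feature map from the vehicle-end and road-end images
        align_feat = self.align_net(vehicle_img, road_img)
        # road-end view feature map from the road-end image and its depth map
        road_feat = self.feat_net(road_img, road_depth)
        # fuse by channel-wise splicing, then convert to the vehicle-end view
        vehicle_feat = self.coord_net(torch.cat([align_feat, road_feat], dim=1))
        # first 3D target detection information based on the vehicle-end view
        return self.det_net(vehicle_feat)
```

Illustrative sketches of the individual sub-networks are given with the corresponding paragraphs below.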
In this embodiment, monitoring cameras are pre-installed at the road end and are networked. Because a road-end monitoring camera is fixedly mounted and its position does not change, the position coordinates of the monitoring camera can be uploaded to the cloud.
When the vehicle is detected to have entered a monitoring area, the road-end monitoring camera is requested to provide the current road-end monitoring image and the depth map corresponding to that image, and the current vehicle-end monitoring image of the vehicle is acquired. The current road-end monitoring image is a two-dimensional image of the road. The depth map is the depth map of that two-dimensional road image, i.e. an image or image channel containing information about the distance from the surfaces of scene objects to the viewpoint.
Compared with the traditional approach of installing depth-map acquisition equipment on the vehicle, the method of this embodiment needs no such equipment on the vehicle; it only needs to request the current road-end monitoring image and its corresponding depth map from the road end for processing, which saves vehicle hardware cost. Because the depth map of the road-end monitoring image is produced by the road-end camera, the depth-information processing of unnecessary scenes at the vehicle end is reduced, and the stability of the automatic driving system is effectively improved.
This embodiment processes the corresponding data directly through the trained depth alignment neural network, depth feature extraction neural network, coordinate conversion network and 3D target detection network; once the corresponding data are input, the 3D detection result from the vehicle-end view can be output directly, so fast end-to-end result output is achieved.
In this embodiment, whether the vehicle has entered a monitoring area is detected by comparing the current position of the vehicle with the position coordinates of the road-end monitoring camera installed in advance: when the distance between the current position of the vehicle and the monitoring camera is detected to be smaller than the preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that camera.
Specifically, the vehicle position is obtained through GPS (global positioning system) or the like, and the vehicle obtains the position coordinates of the road-end monitoring camera from the cloud. When the distance between the current vehicle position and the monitoring camera is detected to be smaller than the preset first distance threshold, the vehicle has entered the monitoring area; at that moment the intelligent driving system requests the current road-end monitoring image and the corresponding depth map from the monitoring camera. The method needs no timestamp information, so the requirement on request latency is not high, and the request can be sent as soon as the system perceives that the vehicle is about to enter the monitored area.
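As a rough illustration of this trigger, the sketch below compares the vehicle's GPS fix with the camera coordinates obtained from the cloud against the preset first distance threshold. The haversine distance, the function name and the 150 m default are assumptions made for illustration; the patent only specifies a distance comparison against a threshold.

```python
import math

def entered_monitoring_area(vehicle_lat, vehicle_lon,
                            camera_lat, camera_lon,
                            first_distance_threshold_m=150.0):
    """Return True when the vehicle is closer to the road-end monitoring camera
    than the preset first distance threshold (threshold value is illustrative)."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(vehicle_lat), math.radians(camera_lat)
    dphi = math.radians(camera_lat - vehicle_lat)
    dlmb = math.radians(camera_lon - vehicle_lon)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    distance_m = 2 * r * math.asin(math.sqrt(a))
    return distance_m < first_distance_threshold_m
```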
This embodiment inputs the current vehicle-end monitoring image and the current road-end monitoring image into the trained depth alignment neural network to obtain the implicit alignment feature map, i.e. the implicit alignment information: the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end. More specifically, after the current road-end monitoring image is acquired, it is input into the depth alignment neural network together with the current vehicle-end monitoring image; the depth alignment neural network then outputs the implicit coordinate alignment relationship between the vehicle end and the road end according to the input pictures.
The depth alignment neural network is a deep neural network; after training, it has the capability of aligning the vehicle-end view with the road-end view.
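A minimal sketch of such a depth alignment network is given below, assuming the two monitoring images are simply stacked on the channel axis and passed through a small convolutional backbone; the architecture and channel counts are illustrative, not specified by the patent:

```python
import torch
import torch.nn as nn

class DepthAlignmentNet(nn.Module):
    """Illustrative depth alignment network: consumes the vehicle-end and road-end
    monitoring images and emits the implicit alignment feature map."""
    def __init__(self, in_channels=3, feat_channels=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(2 * in_channels, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, vehicle_img, road_img):
        # stack the two views on the channel axis; the network learns the implicit
        # vehicle-end / road-end coordinate alignment from the paired pictures
        return self.backbone(torch.cat([vehicle_img, road_img], dim=1))
```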
This embodiment inputs the current road-end monitoring image and the depth map corresponding to that image into the trained depth feature extraction neural network and extracts the corresponding road-end view feature map. The road-end view feature map contains the texture, depth and other information of the road-end monitoring image. Typically, different network modules in the depth feature extraction neural network first extract features from the current road-end monitoring image and from its corresponding depth map separately, and a shared network module then fuses the two sets of feature information to obtain the final road-end view feature map.
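A rough sketch of this two-branch extraction and fusion is given below; the branch architectures and channel counts are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class RoadFeatureNet(nn.Module):
    """Illustrative depth feature extraction network: separate branches encode the
    road-end monitoring image and its depth map, then a shared module fuses them
    into the road-end view feature map."""
    def __init__(self, feat_channels=64):
        super().__init__()
        self.img_branch = nn.Sequential(      # texture information from the image
            nn.Conv2d(3, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.depth_branch = nn.Sequential(    # depth information from the depth map
            nn.Conv2d(1, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.fuse = nn.Sequential(            # shared fusion module
            nn.Conv2d(2 * feat_channels, feat_channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, road_img, road_depth):
        img_feat = self.img_branch(road_img)
        depth_feat = self.depth_branch(road_depth)
        return self.fuse(torch.cat([img_feat, depth_feat], dim=1))
```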
In this embodiment, the vehicle-end view feature map is input into the trained 3D target detection network to obtain the first 3D target detection information based on the vehicle-end view. The first 3D target detection information includes the 3D position of each target from the vehicle-end view, its distance from the vehicle, and its angle.
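A minimal sketch of such a detection head is shown below; the dense regression layout and the extra objectness channel are assumptions made for illustration only:

```python
import torch.nn as nn

class DetectionHead3D(nn.Module):
    """Illustrative 3D target detection head: for every location of the vehicle-end
    view feature map it regresses an objectness score plus the quantities named in
    the first 3D target detection information (3D position, distance, angle)."""
    def __init__(self, feat_channels=64):
        super().__init__()
        # channel layout of the output: [objectness, x, y, z, distance, angle]
        self.head = nn.Conv2d(feat_channels, 6, kernel_size=1)

    def forward(self, vehicle_feat):
        return self.head(vehicle_feat)
```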
Because this embodiment fuses the depth map corresponding to the current road-end monitoring image, the first 3D target detection information obtained here is more accurate than what can be obtained from the vehicle's monocular monitoring image alone. And because a deep neural network is used as the coordinate conversion network to learn the coordinate conversion information and to perform coordinate conversion on the implicit alignment feature map and the road-end view feature map, the feature map of the vehicle-end view can be output directly.
In this embodiment, the implicit alignment feature map and the road-end view feature map have the same size; they are spliced along the channel dimension into one feature map, which is then input into the coordinate conversion network for coordinate conversion.
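The channel-wise splicing and the coordinate conversion described here could be sketched as follows; the conversion network is assumed, for illustration, to be a small fully convolutional block, and the tensor sizes in the usage lines are arbitrary:

```python
import torch
import torch.nn as nn

class CoordConvertNet(nn.Module):
    """Illustrative coordinate conversion network: maps the spliced feature map
    (implicit alignment + road-end view) to a vehicle-end view feature map."""
    def __init__(self, feat_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * feat_channels, feat_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, fused_feat):
        return self.net(fused_feat)

if __name__ == "__main__":
    align_feat = torch.randn(1, 64, 96, 96)   # implicit alignment feature map
    road_feat = torch.randn(1, 64, 96, 96)    # road-end view feature map (same size)
    fused = torch.cat([align_feat, road_feat], dim=1)   # splice on the channel axis
    vehicle_feat = CoordConvertNet()(fused)   # vehicle-end view feature map
```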
In this embodiment, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network and the 3D target detection network are all deep neural networks.
In this embodiment, as shown in fig. 2, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network and the 3D target detection network are composed, according to the data flow, into a 3D target detection model; the 3D target detection model is pre-trained as follows:
inputting a vehicle-end monitoring image, a road-end monitoring image and the depth map corresponding to that road-end monitoring image used for training into the 3D target detection model, and outputting second 3D target detection information;
and calculating the loss between the second 3D target detection information and the real 3D target detection information of the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
Because the training samples themselves contain delay errors, the 3D target detection model can learn to overcome such delay errors.
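A minimal sketch of one such pre-training step, assuming the composed model from the earlier pipeline sketch, a smooth-L1 loss and illustrative batch field names (none of these choices are specified by the patent), is:

```python
import torch.nn as nn

def train_step(model, batch, optimizer, criterion=nn.SmoothL1Loss()):
    """Illustrative pre-training step for the composed 3D target detection model:
    forward pass on a training triple, loss against the labelled real 3D target
    detection information, gradient back-propagation and parameter update."""
    pred_3d = model(batch["vehicle_img"],   # vehicle-end monitoring image (training)
                    batch["road_img"],      # road-end monitoring image (training)
                    batch["road_depth"])    # depth map of the road-end image
    loss = criterion(pred_3d, batch["gt_3d"])   # real 3D detection info (label data)

    optimizer.zero_grad()
    loss.backward()      # gradient back-propagation
    optimizer.step()     # update the parameters of the 3D target detection model
    return loss.item()
```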
In a specific embodiment, there is further provided a vehicle-road cooperative end-to-end 3D target detection system, as shown in fig. 3, including:
the detection module is used for detecting whether the vehicle enters a certain monitoring area or not;
the acquisition module is used for requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end when the vehicle enters a monitoring area, and for acquiring the current vehicle-end monitoring image;
the depth alignment neural network module is used for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
the depth feature extraction neural network module is used for extracting features from the input current road-end monitoring image and the depth map corresponding to the current road-end monitoring image to obtain the corresponding road-end view feature map;
the fusion module is used for fusing the implicit alignment feature map and the road-end view feature map;
the coordinate conversion network module is used for performing coordinate conversion on the fused implicit alignment feature map and road-end view feature map to obtain a vehicle-end view feature map;
and the 3D target detection network module is used for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
In a specific embodiment, there is also provided a computer device, including a memory and a processor, where the memory stores a computer program that can be run on the processor; when the processor executes the computer program, the following steps of the vehicle-road cooperative end-to-end 3D target detection method are implemented:
when it is detected that the vehicle has entered a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into the trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into the trained depth feature extraction neural network, and extracting the corresponding road-end view feature map;
fusing the obtained implicit alignment feature map with the road-end view feature map, then inputting the fused result into the trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into the trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
The memory and the processor are connected by a bus; the bus may comprise any number of interconnected buses and bridges, connecting the circuits of the one or more processors and the memory together. The bus may also connect various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over the wireless medium via an antenna, which also receives data and forwards it to the processor.
In a specific embodiment, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the following steps of the vehicle-road cooperative end-to-end 3D target detection method:
when it is detected that the vehicle has entered a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into the trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into the trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the obtained implicit alignment feature map with the road-end view feature map, then inputting the fused result into the trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into the trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
Those skilled in the art will appreciate that all or part of the steps of the methods in the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
It should be understood that the above examples of the invention are provided by way of illustration only and do not limit the embodiments of the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection of the appended claims.
The above embodiments are merely preferred embodiments for fully explaining the invention, and the scope of the invention is not limited thereto. Equivalent substitutions and modifications made by those skilled in the art on the basis of the invention are all within the scope of the invention.

Claims (10)

1. A vehicle-road cooperative end-to-end 3D target detection method, characterized by comprising the following steps:
when it is detected that the vehicle has entered a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map, then inputting the fused result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
2. The vehicle-road cooperative end-to-end 3D target detection method according to claim 1, wherein detecting whether the vehicle has entered a monitoring area specifically means: when the distance between the current position of the vehicle and a monitoring camera is detected to be smaller than a preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that monitoring camera.
3. The vehicle-road cooperative end-to-end 3D target detection method according to claim 1, wherein the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end.
4. The vehicle-road cooperative end-to-end 3D target detection method according to claim 1, wherein the first 3D target detection information includes the 3D position of a target from the vehicle-end view, its distance from the vehicle, and its angle.
5. The vehicle-road cooperative end-to-end 3D target detection method according to claim 1, wherein the implicit alignment feature map and the road-end view feature map have the same size, are spliced along the channel dimension into one feature map, and are then input into the coordinate conversion network for coordinate conversion.
6. The vehicle-road cooperative end-to-end 3D target detection method according to claim 1, wherein the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network and the 3D target detection network are all deep neural networks.
7. The vehicle-road cooperative end-to-end 3D target detection method according to claim 6, wherein the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network and the 3D target detection network are composed, according to the data flow, into a 3D target detection model, and the 3D target detection model is pre-trained as follows:
inputting a vehicle-end monitoring image, a road-end monitoring image and the depth map corresponding to that road-end monitoring image used for training into the 3D target detection model for training, to obtain second 3D target detection information;
and calculating the loss between the second 3D target detection information and the real 3D target detection information of the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
8. A vehicle-road cooperative end-to-end 3D target detection system, characterized by comprising:
the detection module is used for detecting whether the vehicle enters a certain monitoring area or not;
the acquisition module is used for requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end when the vehicle enters a monitoring area, and for acquiring the current vehicle-end monitoring image;
the depth alignment neural network module is used for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
the depth feature extraction neural network module is used for extracting features from the input current road-end monitoring image and the depth map corresponding to the current road-end monitoring image to obtain the corresponding road-end view feature map;
the fusion module is used for fusing the implicit alignment feature map and the road-end view feature map;
the coordinate conversion network module is used for performing coordinate conversion on the fused implicit alignment feature map and road-end view feature map to obtain a vehicle-end view feature map;
and the 3D target detection network module is used for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
9. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202310474419.XA 2023-04-27 2023-04-27 Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium Pending CN116503819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310474419.XA CN116503819A (en) 2023-04-27 2023-04-27 Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310474419.XA CN116503819A (en) 2023-04-27 2023-04-27 Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116503819A true CN116503819A (en) 2023-07-28

Family

ID=87319784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310474419.XA Pending CN116503819A (en) 2023-04-27 2023-04-27 Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116503819A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117635663A (en) * 2023-12-12 2024-03-01 中北数科(河北)科技有限公司 Target vehicle video tracking method and electronic equipment
CN117635663B (en) * 2023-12-12 2024-05-24 中北数科(河北)科技有限公司 Target vehicle video tracking method and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination