CN116503819A - Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium - Google Patents
- Publication number: CN116503819A
- Application number: CN202310474419.XA
- Authority: CN (China)
- Prior art keywords: vehicle, road, target detection, monitoring image, feature map
- Legal status: Pending
Classifications
- G06V20/54: Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
- G06N3/045: Neural network architectures; combinations of networks
- G06T3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
- G06T7/38: Registration of image sequences (determination of transform parameters for image alignment)
- G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners
- G06V10/806: Fusion of extracted features at the feature extraction level
- G06V10/82: Image or video recognition or understanding using neural networks
- G06V20/64: Three-dimensional objects
- G06T2200/32: Indexing scheme involving image mosaicing
- G06T2207/20081: Training; Learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06V2201/07: Target detection
- Y02T10/40: Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention discloses a vehicle-road collaborative end-to-end 3D target detection method, system, equipment, and storage medium. The method comprises the following steps: when a vehicle is detected to enter a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image, and the depth map corresponding to the road-end image; inputting the current vehicle-end monitoring image and the current road-end monitoring image into a depth alignment neural network to obtain an implicit alignment feature map; inputting the current road-end monitoring image and its corresponding depth map into a depth feature extraction neural network to obtain the corresponding road-end view feature map; fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map; and inputting the vehicle-end view feature map into a 3D target detection network to obtain first 3D target detection information. The invention saves vehicle hardware cost, reduces unnecessary depth-information processing at the vehicle end, and effectively improves the stability of the automatic driving system.
Description
Technical Field
The invention relates to the technical field of automatic driving, and in particular to a vehicle-road collaborative end-to-end 3D target detection method, system, equipment, and storage medium.
Background
Existing vehicle-road collaborative three-dimensional target detection methods either use a traditional algorithm to perform position alignment, or require a depth map acquisition device to be installed and calibrated on the vehicle. Alignment with a traditional algorithm cannot produce end-to-end results, and its position prediction error grows over time. Installing and calibrating a dedicated depth map acquisition device on the vehicle increases the vehicle's software and hardware cost and adds depth-information processing for many scenes in which it is unnecessary.
In fact, during automatic driving, the robustness of the automatic driving system can be greatly improved by acquiring 3D information of targets only in certain key driving areas.
Disclosure of Invention
The invention provides a vehicle-road collaborative end-to-end 3D target detection method, system, equipment, and storage medium to solve the problems in the prior art that installing a depth map acquisition device on the vehicle increases the vehicle's software and hardware cost and introduces unnecessary depth-information processing.
To achieve the above purpose, the technical scheme adopted by the invention is as follows:
a vehicle-road cooperative end-to-end 3D target detection method comprises the following steps:
when detecting that a vehicle enters a certain monitoring area, requesting to acquire a current road end monitoring image and a depth map corresponding to the current road end monitoring image from a road end, and acquiring the current vehicle end monitoring image;
inputting the current vehicle monitoring image and the current road side monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road end monitoring image and a depth map corresponding to the current road end monitoring image into a trained depth feature extraction neural network to obtain a corresponding road end visual angle feature map;
after fusing the implicit alignment feature map and the road end visual angle feature map, inputting a trained coordinate conversion network to perform coordinate conversion to obtain a vehicle end visual angle feature map;
and inputting the vehicle-end visual angle characteristic diagram into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end visual angle.
Preferably, whether the vehicle enters a monitoring area is detected as follows: when the distance between the current position of the vehicle and a monitoring camera is smaller than a preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that camera.
Preferably, the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end.
Preferably, the first 3D target detection information includes, from the vehicle-end view, the 3D position of each target, its distance from the vehicle, and its angle.
Preferably, the implicit alignment feature map and the road-end view feature map have the same size; they are concatenated along the channel dimension into one feature map, which is then input into the coordinate conversion network for coordinate conversion.
Preferably, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are all deep neural networks.
Further, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are composed into a 3D target detection model according to the data flow; the 3D target detection model is pre-trained as follows:
inputting the vehicle-end monitoring image, the road-end monitoring image, and the depth map corresponding to the road-end monitoring image used for training into the 3D target detection model to obtain second 3D target detection information;
computing the loss between the second 3D target detection information and the ground-truth 3D target detection information in the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
A vehicle-road collaborative end-to-end 3D target detection system comprises:
a detection module for detecting whether the vehicle enters a monitoring area;
an acquisition module for, when the vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
a depth alignment neural network module for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
a depth feature extraction neural network module for processing the input current road-end monitoring image and its corresponding depth map to obtain the corresponding road-end view feature map;
a fusion module for fusing the implicit alignment feature map and the road-end view feature map;
a coordinate conversion network module for performing coordinate conversion on the fused feature map to obtain a vehicle-end view feature map;
and a 3D target detection network module for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
A computer device comprises a memory and a processor, the memory storing a computer program executable on the processor; when the processor executes the computer program, the steps of the method described above are implemented.
A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the steps of the method described above are implemented.
The beneficial effects of the invention are as follows:
1. Compared with the traditional approach of installing depth-map acquisition equipment on the vehicle, the invention requires no such equipment on the vehicle; it only needs to request the current road-end monitoring image and its corresponding depth map from the road end for processing. This saves vehicle hardware cost, reduces unnecessary depth-information processing at the vehicle end, and effectively improves the stability of the automatic driving system.
2. Traditional alignment algorithms cannot produce end-to-end results, and their position prediction error grows over time. In the invention, the trained depth alignment neural network, depth feature extraction neural network, coordinate conversion network, and 3D target detection network process the corresponding data directly: given the inputs, the 3D detection result from the vehicle-end view is output directly, enabling fast end-to-end output and reducing position prediction error.
Drawings
Fig. 1 is a flow chart of the steps of the vehicle-road collaborative end-to-end 3D target detection method of the present invention.
Fig. 2 is a data flow diagram of the neural networks in the present invention.
Fig. 3 is a schematic block diagram of the vehicle-road collaborative end-to-end 3D target detection system of the present invention.
Detailed Description
Further advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure herein, with reference to the accompanying drawings and the preferred embodiments. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or changed in various ways without departing from the spirit and scope of the present invention. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the invention in a schematic way; the drawings show only the components related to the invention and are not drawn according to the number, shape, and size of components in an actual implementation. In practice, the form, number, and proportion of the components may be changed arbitrarily, and the component layout may be more complicated.
The existing approach of installing depth-map acquisition equipment on the vehicle not only increases the vehicle's hardware cost but also adds depth-information processing for many unnecessary scenes. Moreover, in actual automatic driving, acquiring 3D target information only in certain key driving areas is enough to greatly improve the robustness of the automatic driving system.
In view of this, this embodiment provides a vehicle-road collaborative end-to-end 3D target detection method for 3D target pose detection in the automatic driving field. It uses the monitoring images of road-end cameras and the corresponding depth information on key road sections during automatic driving, effectively improving the stability of the automatic driving system.
As shown in Fig. 1 and Fig. 2, the vehicle-road collaborative end-to-end 3D target detection method comprises the following steps:
when it is detected that the vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
In this embodiment, monitoring cameras are installed at the road end in advance and are connected to the network. Because the road-end monitoring cameras are fixedly mounted and their positions do not change, the position coordinate information of each monitoring camera can be uploaded to the cloud.
When the vehicle is detected to enter a monitoring area, the road-end monitoring camera is requested for the current road-end monitoring image and its corresponding depth map, and the current vehicle-end monitoring image of the vehicle is acquired. The current road-end monitoring image is a two-dimensional image of the road. The depth map is the depth map of that two-dimensional road image: an image, or image channel, containing information about the distance from the surfaces of scene objects to the viewpoint.
Compared with the traditional approach of installing depth-map acquisition equipment on the vehicle, the method of this embodiment requires no such equipment; the vehicle only requests the current road-end monitoring image and its corresponding depth map from the road end for processing. This saves vehicle hardware cost, and because the depth map of the road-end monitoring image is produced at the road end by the monitoring camera, the depth-information processing of unnecessary scenes at the vehicle end is reduced, effectively improving the stability of the automatic driving system.
In this embodiment, the corresponding data are processed directly by the trained depth alignment neural network, depth feature extraction neural network, coordinate conversion network, and 3D target detection network; given the corresponding inputs, the 3D detection result from the vehicle-end view is output directly, so fast end-to-end output is achieved.
In this embodiment, whether the vehicle enters a monitoring area is detected by comparing the position coordinates of the pre-installed road-end monitoring camera with the current position of the vehicle: when the distance between the current vehicle position and the monitoring camera is smaller than the preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that camera.
Specifically, the vehicle position is obtained through GPS or a similar positioning system, and the vehicle obtains the position coordinates of the road-end monitoring camera from the cloud. When the distance between the current vehicle position and the monitoring camera is smaller than the preset first distance threshold, the vehicle has entered the monitoring area, and the intelligent driving system requests the current road-end monitoring image and the corresponding depth map from the monitoring camera. Because the method does not require timestamp information, the latency requirement on this request is not strict; the request can be sent as soon as it is perceived that the vehicle is about to enter the monitored area.
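For illustration only, the sketch below checks this trigger condition: it computes the distance between the vehicle's GPS position and a camera position assumed to have been downloaded from the cloud, and compares it with the first distance threshold. The function names, the 100 m threshold, and the use of the haversine formula are assumptions made for this example, not values specified by the patent; Python is used as the example language.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def entered_monitoring_area(vehicle_pos, camera_pos, threshold_m=100.0):
    """True when the vehicle is closer to the road-end camera than the first distance threshold."""
    return haversine_m(*vehicle_pos, *camera_pos) < threshold_m

# Example: in a real system the camera coordinates would be fetched from the cloud.
if entered_monitoring_area((31.2304, 121.4737), (31.2310, 121.4745)):
    print("Request current road-end monitoring image and depth map from the road end")
```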
This embodiment inputs the current vehicle-end monitoring image and the current road-end monitoring image into the trained depth alignment neural network to obtain the implicit alignment feature map, i.e. the implicit alignment information: the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end. More specifically, after the current road-end monitoring image is acquired, it is input together with the current vehicle-end monitoring image into the depth alignment neural network, which outputs the implicit coordinate alignment relationship between the vehicle end and the road end from the input pictures.
The depth alignment neural network is a deep neural network. After training, it has the ability to align the vehicle-end and road-end views.
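The patent does not describe the internal structure of the depth alignment neural network. The following is a minimal sketch, assuming a small PyTorch convolutional encoder that takes the vehicle-end and road-end images concatenated along the channel dimension and outputs an implicit alignment feature map; the class name, channel counts, and layer choices are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class DepthAlignNet(nn.Module):
    """Sketch: maps a (vehicle-end image, road-end image) pair to an implicit alignment feature map."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=3, stride=2, padding=1),   # 3 + 3 input channels
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, vehicle_img: torch.Tensor, road_img: torch.Tensor) -> torch.Tensor:
        x = torch.cat([vehicle_img, road_img], dim=1)  # stack the two views on the channel axis
        return self.encoder(x)

# Example usage with dummy 256x256 RGB images (batch size 1).
align_feat = DepthAlignNet()(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
print(align_feat.shape)  # torch.Size([1, 64, 64, 64])
```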
This embodiment inputs the current road-end monitoring image and its corresponding depth map into the trained depth feature extraction neural network to extract the corresponding road-end view feature map, which contains texture, depth, and other information of the road-end monitoring image. Typically, different network modules in the depth feature extraction neural network extract features from the current road-end monitoring image and from its depth map respectively, and a shared module then fuses the two sets of features into the final road-end view feature map.
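As a sketch of this two-branch arrangement (an image branch, a depth branch, and a shared fusion module), assuming a PyTorch implementation with hypothetical class names and channel sizes:

```python
import torch
import torch.nn as nn

class RoadFeatureNet(nn.Module):
    """Sketch: separate image and depth branches followed by a shared fusion module,
    producing a road-end view feature map that carries texture and depth information."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.fusion = nn.Sequential(
            nn.Conv2d(64, out_channels, 3, stride=2, padding=1), nn.ReLU(inplace=True))

    def forward(self, road_img: torch.Tensor, depth_map: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_branch(road_img)    # texture features from the monitoring image
        dep_feat = self.depth_branch(depth_map)   # geometry features from the depth map
        return self.fusion(torch.cat([img_feat, dep_feat], dim=1))

road_feat = RoadFeatureNet()(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
print(road_feat.shape)  # torch.Size([1, 64, 64, 64])
```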
In this embodiment, the vehicle-end view feature map is input into the trained 3D target detection network to obtain the first 3D target detection information based on the vehicle-end view. The first 3D target detection information includes, from the vehicle-end view, the 3D position of each target, its distance from the vehicle, and its angle.
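The structure of the 3D target detection network is not specified in the patent; only its outputs are. The sketch below assumes a simple dense regression head that predicts an objectness score together with 3D position, distance, and angle for each spatial cell of the vehicle-end view feature map. All names and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Sketch: per-cell regression of objectness, 3D position (x, y, z),
    distance to the ego vehicle, and yaw angle from the vehicle-end view feature map."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        # 1 objectness score + 3 position + 1 distance + 1 angle = 6 output channels
        self.head = nn.Conv2d(in_channels, 6, kernel_size=1)

    def forward(self, vehicle_view_feat: torch.Tensor) -> dict:
        out = self.head(vehicle_view_feat)
        return {
            "objectness": torch.sigmoid(out[:, 0:1]),
            "position_xyz": out[:, 1:4],
            "distance": out[:, 4:5],
            "angle": out[:, 5:6],
        }

preds = DetectionHead()(torch.rand(1, 64, 64, 64))
print({k: v.shape for k, v in preds.items()})
```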
Because this embodiment fuses the depth map corresponding to the current road-end monitoring image, the resulting first 3D target detection information is more accurate than detection from the vehicle's monocular monitoring image alone. And because a deep neural network is used as the coordinate conversion network to learn the coordinate conversion and to convert the fused implicit alignment feature map and road-end view feature map, the vehicle-end view feature map can be output directly.
In this embodiment, the implicit alignment feature map and the road-end view feature map have the same size; they are concatenated along the channel dimension into one feature map, which is then input into the coordinate conversion network for coordinate conversion.
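A minimal PyTorch sketch of this step is shown below; the splice itself is an ordinary channel-dimension concatenation, and the two-layer convolutional network used here is only an assumed stand-in for the learned coordinate conversion network. Concatenation keeps both feature maps intact, so the conversion network can learn how to weight the alignment information against the road-end appearance and depth cues.

```python
import torch
import torch.nn as nn

class CoordConvertNet(nn.Module):
    """Sketch: converts the fused (alignment + road-end view) feature map into a
    vehicle-end view feature map; the real conversion network is learned end to end."""
    def __init__(self, in_channels: int = 128, out_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        return self.net(fused)

align_feat = torch.rand(1, 64, 64, 64)   # implicit alignment feature map
road_feat = torch.rand(1, 64, 64, 64)    # road-end view feature map (same spatial size)
fused = torch.cat([align_feat, road_feat], dim=1)   # splice on the channel dimension
vehicle_view_feat = CoordConvertNet()(fused)
print(vehicle_view_feat.shape)  # torch.Size([1, 64, 64, 64])
```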
In this embodiment, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are all deep neural networks.
In this embodiment, as shown in Fig. 2, the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are composed into a 3D target detection model according to the data flow; the 3D target detection model is pre-trained as follows:
inputting the vehicle-end monitoring image, the road-end monitoring image, and the depth map corresponding to the road-end monitoring image used for training into the 3D target detection model to obtain second 3D target detection information;
computing the loss between the second 3D target detection information and the ground-truth 3D target detection information in the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
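For illustration only, the sketch below composes hypothetical versions of the four networks according to the data flow of Fig. 2 and performs one pre-training step: a forward pass on a training triple, a loss against the label data, and a parameter update by gradient back-propagation. The loss function, optimizer, and learning rate are assumptions rather than the patent's prescription, and the sketch assumes the DepthAlignNet, RoadFeatureNet, CoordConvertNet, and DetectionHead classes from the earlier sketches are defined in the same file.

```python
import torch
import torch.nn as nn

class CollaborativeDetector(nn.Module):
    """Sketch: the four sub-networks composed according to the data flow.
    Assumes the DepthAlignNet, RoadFeatureNet, CoordConvertNet, and DetectionHead
    sketches shown above are defined in the same file."""
    def __init__(self):
        super().__init__()
        self.align_net = DepthAlignNet()
        self.road_net = RoadFeatureNet()
        self.convert_net = CoordConvertNet()
        self.head = DetectionHead()

    def forward(self, vehicle_img, road_img, depth_map):
        align_feat = self.align_net(vehicle_img, road_img)
        road_feat = self.road_net(road_img, depth_map)
        fused = torch.cat([align_feat, road_feat], dim=1)
        return self.head(self.convert_net(fused))

model = CollaborativeDetector()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # assumed optimizer and learning rate
criterion = nn.SmoothL1Loss()                               # assumed regression loss

# One training step on a dummy sample; real label maps come from the annotated training set.
vehicle_img, road_img = torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256)
depth_map = torch.rand(1, 1, 256, 256)
labels = {k: torch.rand_like(v) for k, v in model(vehicle_img, road_img, depth_map).items()}

preds = model(vehicle_img, road_img, depth_map)              # second 3D target detection information
loss = sum(criterion(preds[k], labels[k]) for k in preds)    # loss against the label data
optimizer.zero_grad()
loss.backward()                                              # gradient back-propagation
optimizer.step()                                             # parameter update
print(float(loss))
```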
Because the training samples themselves contain such errors, the 3D target detection model can learn the ability to overcome delay errors.
In a specific embodiment, a vehicle-road collaborative end-to-end 3D target detection system is also provided. As shown in Fig. 3, it comprises:
a detection module for detecting whether the vehicle enters a monitoring area;
an acquisition module for, when the vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
a depth alignment neural network module for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
a depth feature extraction neural network module for processing the input current road-end monitoring image and its corresponding depth map to obtain the corresponding road-end view feature map;
a fusion module for fusing the implicit alignment feature map and the road-end view feature map;
a coordinate conversion network module for performing coordinate conversion on the fused feature map to obtain a vehicle-end view feature map;
and a 3D target detection network module for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
In a specific embodiment, a computer device is also provided, comprising a memory and a processor, the memory storing a computer program that can run on the processor. When the processor executes the computer program, the following steps of the vehicle-road collaborative end-to-end 3D target detection method are implemented:
when it is detected that the vehicle enters a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image, and the depth map corresponding to the current road-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
The memory and the processor are connected by a bus, which may comprise any number of interconnected buses and bridges linking the circuits of the one or more processors and the memory together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or a plurality of elements, for example a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor is transmitted over the wireless medium via an antenna, which also receives data and passes it to the processor.
In a specific embodiment, a computer-readable storage medium is also provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps of the vehicle-road collaborative end-to-end 3D target detection method are implemented:
when it is detected that the vehicle enters a monitoring area, obtaining the current vehicle-end monitoring image, the current road-end monitoring image, and the depth map corresponding to the current road-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program stored in a storage medium, the program including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It is to be understood that the above examples of the present invention are provided by way of illustration only and are not intended to limit the embodiments of the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within the scope of protection of the claims.
The above embodiments are merely preferred embodiments used to fully explain the present invention, and the scope of protection of the invention is not limited thereto. Equivalent substitutions and modifications made by those skilled in the art on the basis of the invention are all within the scope of protection of the invention.
Claims (10)
1. A vehicle-road collaborative end-to-end 3D target detection method, characterized by comprising the following steps:
when it is detected that a vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
inputting the current vehicle-end monitoring image and the current road-end monitoring image into a trained depth alignment neural network to obtain an implicit alignment feature map;
inputting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image into a trained depth feature extraction neural network to obtain the corresponding road-end view feature map;
fusing the implicit alignment feature map with the road-end view feature map and inputting the result into a trained coordinate conversion network for coordinate conversion to obtain a vehicle-end view feature map;
and inputting the vehicle-end view feature map into a trained 3D target detection network to obtain first 3D target detection information based on the vehicle-end view.
2. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that whether the vehicle enters a monitoring area is detected as follows: when it is detected that the distance between the current position of the vehicle and a monitoring camera is smaller than a preset first distance threshold, the vehicle is judged to have entered the monitoring area corresponding to that monitoring camera.
3. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that the implicit alignment feature map encodes the coordinate alignment relationship between the vehicle end and the road end.
4. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that the first 3D target detection information includes, from the vehicle-end view, the 3D position of each target, its distance from the vehicle, and its angle.
5. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that the implicit alignment feature map and the road-end view feature map have the same size, are concatenated along the channel dimension into one feature map, and are then input into the coordinate conversion network for coordinate conversion.
6. The vehicle-road collaborative end-to-end 3D target detection method according to claim 1, characterized in that the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are all deep neural networks.
7. The vehicle-road collaborative end-to-end 3D target detection method according to claim 6, characterized in that the depth alignment neural network, the depth feature extraction neural network, the coordinate conversion network, and the 3D target detection network are composed into a 3D target detection model according to the data flow, and the 3D target detection model is pre-trained as follows:
inputting the vehicle-end monitoring image, the road-end monitoring image, and the depth map corresponding to the road-end monitoring image used for training into the 3D target detection model to obtain second 3D target detection information;
computing the loss between the second 3D target detection information and the ground-truth 3D target detection information in the label data, and updating the parameters of the 3D target detection model through gradient back-propagation.
8. A vehicle-road collaborative end-to-end 3D target detection system, characterized by comprising:
a detection module for detecting whether the vehicle enters a monitoring area;
an acquisition module for, when the vehicle enters a monitoring area, requesting the current road-end monitoring image and the depth map corresponding to the current road-end monitoring image from the road end, and acquiring the current vehicle-end monitoring image;
a depth alignment neural network module for processing the input current vehicle-end monitoring image and current road-end monitoring image to obtain an implicit alignment feature map;
a depth feature extraction neural network module for processing the input current road-end monitoring image and its corresponding depth map to obtain the corresponding road-end view feature map;
a fusion module for fusing the implicit alignment feature map and the road-end view feature map;
a coordinate conversion network module for performing coordinate conversion on the fused feature map to obtain a vehicle-end view feature map;
and a 3D target detection network module for processing the input vehicle-end view feature map to obtain first 3D target detection information based on the vehicle-end view.
9. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310474419.XA CN116503819A (en) | 2023-04-27 | 2023-04-27 | Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116503819A (en) | 2023-07-28 |
Family
ID=87319784
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310474419.XA (CN116503819A, Pending) | Vehicle-road collaborative end-to-end 3D target detection method, system, equipment and storage medium | 2023-04-27 | 2023-04-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116503819A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117635663A * | 2023-12-12 | 2024-03-01 | 中北数科(河北)科技有限公司 | Target vehicle video tracking method and electronic equipment |
CN117635663B * | 2023-12-12 | 2024-05-24 | 中北数科(河北)科技有限公司 | Target vehicle video tracking method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |