WO2020108647A1

WO2020108647A1 - Target detection method, apparatus and system based on linkage between vehicle-mounted camera and vehicle-mounted radar

Info

Publication number: WO2020108647A1
Application number: PCT/CN2019/122171
Authority: WO
Inventors: 邝宏武; 方梓成; 孙杰
Original assignee: 杭州海康威视数字技术股份有限公司
Priority date: 2018-11-30
Filing date: 2019-11-29
Publication date: 2020-06-04
Also published as: CN111257866A; CN111257866B

Abstract

A target detection method, apparatus and system based on the linkage between a vehicle-mounted camera and a vehicle-mounted radar. The method comprises: detecting, from a video image provided by a vehicle-mounted camera, all image targets around a vehicle, the degrees of confidence of all the image targets, and position information of all the image targets in an image coordinate system (201); detecting, from a speed distance image provided by a radar, all radar targets around the vehicle, and position information of all the radar targets in a radar coordinate system (202); acquiring a first perspective matrix according to position information of image targets, the degree of confidence of which exceeds a first preset threshold, and position information of radar targets, the degree of confidence of which exceeds a second preset threshold (203); and detecting, by means of the first perspective matrix, a target category from image targets, the degree of confidence of which does not exceed the first preset threshold, and radar targets, the degree of confidence of which does not exceed the second preset threshold (204). The method can improve the precision of detecting a target.

Description

Target detection method, device and system linked by vehicle camera and vehicle radar

This application requires the priority of the Chinese patent application filed on November 30, 2018 with the application number 201811459743.X and the invention titled "Target Detection Method, Device and System for Vehicle Camera and Vehicle Radar Linkage", all of which are approved by The reference is incorporated in this application.

Technical field

The present application relates to the field of intelligent transportation, and in particular, to a target detection method, device, and system in which a car camera and a car radar are linked.

Background technique

Automated driving is an important application in intelligent transportation systems. Automated driving requires vehicles to detect the vehicles around them and avoid them according to the vehicles around them to avoid traffic accidents. Vehicles currently include cameras and radars, and vehicles can detect vehicles around them based on cameras and radars.

At present, there is a method for detecting vehicles around the car, which can be: detecting the first position information of each image target of the vehicle around the vehicle based on the camera, and detecting the second position of each radar target of the vehicle based on the radar information. Map the first position information of each image target to the road coordinate system by presetting the first perspective matrix to obtain the third position information of each image target, and preset the second position of each radar target by presetting the second perspective matrix The information is mapped to the road coordinate system to obtain the fourth position information of each radar target. At least one target pair is determined according to the third position information of each image target and the fourth position information of each radar target, and each target pair includes an image target and a radar target belonging to the same object. Since the image target and the radar target in the target pair are the targets detected by the camera and the radar, respectively, which may be the target of the vehicle, the target in the target pair may be the vehicle, and the target in the target pair is the detected vehicle.

During the process of implementing the present application, the inventor found that the above-mentioned method has at least the following defects:

The preset first perspective matrix is used to reflect the conversion relationship between the image coordinate system of the camera and the road coordinate system. The conversion relationship may change due to vehicles driving under different road conditions. The preset first perspective matrix cannot accurately reflect the current The conversion relationship between the image coordinate system of the camera and the road coordinate system, so the accuracy of the third position information obtained by mapping the first position information of the image target to the road coordinate system through the preset first perspective matrix will be reduced, thereby reducing the detection target Accuracy.

Summary of the invention

In order to improve the accuracy of detecting targets, embodiments of the present application provide a vehicle detection method and device based on a camera and a radar. The technical solution is as follows:

According to a first aspect of the embodiments of the present application, there is provided a target detection method in which a vehicle-mounted camera and a vehicle-mounted radar are linked. The method includes:

Detecting each image object around the vehicle from the video image provided by the onboard camera, the confidence of each image object, and the position information of each image object in the image coordinate system;

Each radar target around the vehicle is detected from the speed and distance images provided by the radar, the position information of each radar target in the radar coordinate system, and the confidence of each radar target, the confidence is The probability that the target category of the real target corresponding to the image target or the radar target is a specified category;

Obtain a first perspective matrix according to the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold, the first perspective matrix is used to represent the image image The conversion relationship between the coordinate system and the preset road coordinate system;

Through the first perspective matrix, target categories are detected from image targets whose confidence does not exceed the first preset threshold and radar targets whose confidence does not exceed the second preset threshold.

Optionally, the method further includes:

Classify image objects whose confidence exceeds the first preset threshold according to the detected confidence of each of the image objects, obtain the target category of the real target corresponding to the image object, and output the target category, and

According to the detected confidence level of each of the radar targets, classify the radar target whose confidence level exceeds the second preset threshold to obtain the target category of the real target corresponding to the radar target, and output the target category.

Optionally, before the acquiring the first perspective matrix according to the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold, the method further includes :

N associated target pairs are determined from an image target with a confidence exceeding a first preset threshold and a radar target with a confidence exceeding a second preset threshold, any one of the associated target pairs includes a radar target and an image satisfying a preset association condition Target, the N is a positive integer greater than or equal to 1;

The obtaining the first perspective matrix according to the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold includes:

The first perspective matrix is determined according to the position information of the radar target and the image target in the N associated target pairs.

Optionally, the determining of N associated target pairs from the image target with the confidence level exceeding the first preset threshold and the radar target with the confidence level exceeding the second preset threshold includes:

Mapping the first image object from the image coordinate system to the road coordinate system according to the first position information of the first image object and the stored second perspective matrix, to obtain the first image object on the road Corresponding third position information in the coordinate system; wherein the first image object is an image object whose confidence exceeds a first preset threshold, and the first position information is the first image object in the image coordinate system Location information

Mapping the first radar target from the radar coordinate system to the road coordinate system according to the second position information of the first radar target and the stored third perspective matrix, to obtain the first radar target on the road Corresponding fourth position information in the coordinate system; wherein the first radar target is a radar target whose confidence exceeds a second preset threshold, and the second position information is the first radar target in the radar coordinate system Location information

According to the third position information of each of the first image targets and the fourth position information of each of the first radar targets, positionally associate each of the first image targets and each of the first radar targets to obtain the N related target pairs.

Optionally, the performing position association on each of the first image targets and each of the second radar targets to obtain the N associated target pairs includes:

Determine the projected area of the first image target in the road coordinate system according to the third position information of the first image target and the fourth position information of the first radar target, the first radar target is at The projected area in the road coordinate system, and the overlapping projected area of the first image target and the first radar target in the road coordinate system;

Determine each of the first image targets according to the projected area of the first image target in the road coordinate system, the projected area of the first radar target in the road coordinate system, and the overlapping projected area The associated cost with each of the second radar targets;

It is determined from the first image target and the second image target that one of the first radar target and one of the first image target with the smallest associated cost are the associated target pairs, and then the N associated target pairs are obtained.

Optionally, the determining the first perspective matrix according to the position information of the radar target and the image target in the N associated target pairs includes:

For any one of the N associated target pairs, according to the position information of the first radar target in the any associated target pair, modify the first of the any associated target pair Image target location information;

Modify the second perspective matrix according to the corrected position information of each of the first image targets in the N associated target pairs to obtain the first perspective matrix.

Optionally, the first perspective matrix is used to detect a target category from an image target whose confidence does not exceed the first preset threshold and a radar target whose confidence does not exceed the second preset threshold, include:

M feature fusion target pairs are determined from the image target whose confidence level does not exceed the first preset threshold and the radar target whose confidence level does not exceed the second preset threshold, any one of the feature fusion target pairs includes those satisfying the preset association condition One radar target and one image target, where M is a positive integer greater than or equal to 1;

For any of the feature fusion target pairs, the echo energy feature of the radar target and the image feature of the image target in the feature fusion target pair are respectively convoluted and spliced to obtain the feature fusion The fusion feature map corresponding to the target pair;

After performing convolution and full connection calculation on the fusion feature map, input it to a classification network for target classification to obtain a target category corresponding to the fusion feature map.

Optionally, the determining M feature fusion target pairs from the second image target and the second radar target includes:

Mapping the second image object from the image coordinate system to the road coordinate system through the first perspective matrix to obtain the corresponding position information of the second image object in the road coordinate system, and, by A pre-stored third perspective matrix, mapping the second radar target from the radar coordinate system to the road coordinate system, to obtain the corresponding position information of the second radar target in the road coordinate system, the first The second image target is an image target whose confidence level does not exceed the first preset threshold, and the second radar target is a radar target whose confidence level does not exceed the second preset threshold;

According to the corresponding position information of each second image target in the road coordinate system and the corresponding position information of each second radar target in the road coordinate system, for each second image target and each position The second radar target performs position correlation to obtain the M feature fusion target pairs.

Optionally, the detecting the confidence of each image target around the vehicle from the video image provided by the on-board camera includes:

Obtain the classification confidence, tracking frame number confidence and position of any one of the image objects according to the current frame video image provided by the vehicle camera and the multi-frame historical frame video image close to the current frame video image Confidence;

Determine the confidence of the image target according to one or more of the classification confidence, position confidence, and tracking frame number confidence;

The detection of the confidence of each radar target around the vehicle from the speed and distance images collected by the radar includes:

Determine the radar according to the intensity of the echo energy of any of the radar targets in the current frame speed distance image, the distance from the vehicle, and the duration of the radar target in the multi-frame historical frame speed distance image The confidence of the goal.

According to a second aspect of the embodiments of the present application, there is provided a target detection device in which a car camera and a car radar are linked, and the device includes:

The first detection module is used to detect each image target around the vehicle from the video image provided by the onboard camera, the confidence of each image target, and the position information of each image target in the image coordinate system;

The second detection module is used to detect each radar target around the vehicle from the speed and distance images provided by the radar, the position information of each radar target in the radar coordinate system, and the confidence of each radar target Degree, the confidence is used to indicate the probability that the target category of the real target corresponding to the image target or the radar target is a specified category;

An obtaining module, configured to obtain a first perspective matrix based on the position information of the image target with the confidence level exceeding the first preset threshold and the position information of the radar target with the confidence level exceeding the second preset threshold It represents the conversion relationship between the image coordinate system and the preset road coordinate system;

The third detection module is used to detect the target category from the image target whose confidence does not exceed the first preset threshold and the radar target whose confidence does not exceed the second preset threshold through the first perspective matrix .

Optionally, the device further includes:

A classification module is used to classify image objects whose confidence exceeds the first preset threshold according to the detected confidence of each of the image objects, to obtain the target category of the real target corresponding to the image object, and to Target category output, and, based on the detected confidence of each of the radar targets, classify the radar targets with confidence exceeding the second preset threshold to obtain the target category of the real target corresponding to the radar target, and The target category is output.

Optionally, the device further includes:

A determining module, configured to determine N associated target pairs from an image target with a confidence exceeding a first preset threshold and a radar target with a confidence exceeding a second preset threshold, any one of the associated target pairs includes satisfying a preset associated condition Radar target and image target, N is a positive integer greater than or equal to 1;

The acquisition module is configured to determine the first perspective matrix according to the position information of the radar target and the image target in the N associated target pairs.

Optionally, the determination module is used to:

Optionally, the acquisition module is used to:

Optionally, the third detection module is used to:

Optionally, the first detection module is used to:

The second detection module is used to:

According to a third aspect of the embodiments of the present application, there is provided a target detection system, including a radar provided on a vehicle, an on-board camera provided on the vehicle, and a detection device communicating with the radar and the on-board camera ,

The on-board camera is used to photograph the surroundings of the vehicle to obtain a video image of the current frame, and provide the video image of the current frame to the detection device;

The radar is used to generate a current frame speed distance image based on the transmitted radar signal and the received echo signal, and provide the current frame speed distance image to the detection device;

The detection device is configured to detect each image target around the vehicle from the video image provided by the on-board camera, the confidence of each image target, and the position information of each image target in the image coordinate system Detecting each radar target around the vehicle from the speed and distance images provided by the radar, and the position information of each radar target in the radar coordinate system, the confidence level of the radar target, the confidence level is The probability that the target category of the real target corresponding to the image target or the radar target is a specified category; according to the location information and confidence level of the image target whose confidence exceeds the first preset threshold Obtain the first perspective matrix of the position information of the radar target, and the first perspective matrix is used to represent the conversion relationship between the image coordinate system and the preset road coordinate system; through the first perspective matrix, from the confidence level The target category is detected from the image target that does not exceed the first preset threshold and the radar target that does not exceed the second preset threshold.

Optionally, the on-vehicle camera is provided on the front and rear and/or left and right sides of the vehicle, and the radar is provided on the front and rear of the vehicle.

Optionally, the radar is a millimeter wave radar.

According to a fourth aspect of the embodiments of the present application, there is provided a non-volatile computer-readable storage medium for storing a computer program, which is loaded and executed by a processor to implement the first aspect or the first aspect Instructions for any alternative method.

According to a fifth aspect of the embodiments of the present application, the present application provides an electronic device, the electronic device comprising:

At least one processor; and

At least one memory;

The at least one memory stores one or more instructions, and the one or more instructions are configured to be executed by the at least one processor to execute the following instructions:

The technical solutions provided by the embodiments of the present application may include the following beneficial effects:

The position information of the image target around the vehicle in the image coordinate system and the confidence of the image target are detected by the on-board camera, and the position information of the radar target around the vehicle in the radar coordinate system and the confidence of each radar target are detected by radar; Image targets with confidence exceeding the first preset threshold and radar targets with confidence exceeding the second preset threshold are real targets in the specified category, so image targets with confidence exceeding the first preset threshold and confidence exceeding the second preset The first perspective matrix acquired by the radar target with a threshold can accurately reflect the conversion relationship between the image coordinate system of the vehicle's on-board camera and the road coordinate relationship under the current road conditions, so that the confidence level does not exceed the first preset according to the first perspective matrix The accuracy of the target target detected in the image target of the threshold and the radar target whose confidence level does not exceed the second preset threshold is the real target of the specified category is high, thereby improving the accuracy of target detection.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and cannot limit the present application.

BRIEF DESCRIPTION

The drawings herein are incorporated into the specification and constitute a part of the specification, show embodiments consistent with the application, and are used together with the specification to explain the principles of the application.

FIG. 1 is a schematic structural diagram of a target detection system in which an in-vehicle camera and an in-vehicle radar are linked according to an embodiment of the present application;

2 is a flowchart of a method for detecting a target in which an onboard camera and an onboard radar are linked according to an embodiment of the present application;

FIG. 3 is a flowchart of a target detection method in which an onboard camera and an onboard radar are linked according to an embodiment of the present application;

4 is a block diagram of a target fusion of a vehicle-mounted camera and a vehicle-mounted radar provided by an embodiment of the present application;

5 is a flowchart of a method for obtaining confidence of an image target provided by an embodiment of the present application;

6 is a schematic structural diagram of a first convolutional neural network provided by an embodiment of the present application;

7 is a flow chart of a method for obtaining confidence of a radar target provided by an embodiment of the present application;

8 is a flowchart of a method for obtaining a first perspective matrix provided by an embodiment of the present application;

9 is a flowchart of a method for detecting a target of a specified category provided by an embodiment of the present application;

10 is a flowchart of detecting a target of a specified category by a convolutional neural network according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a target detection device in which an in-vehicle camera and an in-vehicle radar are linked according to an embodiment of the present application;

12 is a schematic structural diagram of an apparatus provided by an embodiment of the present application;

13 is a schematic structural diagram of a target detection system provided by an embodiment of the present application;

14 is a schematic diagram of various image targets around a vehicle detected from a video image provided by an embodiment of the present application;

15 is a schematic diagram of detecting various radar targets around a vehicle from a speed and distance image provided by a radar provided by an embodiment of the present application.

Through the above drawings, a clear embodiment of the present application has been shown, which will be described in more detail later. These drawings and text descriptions are not intended to limit the scope of the concept of the present application in any way, but to explain the concept of the present application to those skilled in the art by referring to specific embodiments.

detailed description

Exemplary embodiments will be described in detail here, examples of which are shown in the drawings. When referring to the drawings below, unless otherwise indicated, the same numerals in different drawings represent the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with this application. Rather, they are merely examples of devices and methods consistent with some aspects of the application as detailed in the appended claims.

In the intelligent transportation system, the vehicle can detect other vehicles around it while driving on the road, so that the driver can prompt the driver when driving the vehicle to assist the driver to drive the vehicle, or, when autonomous driving, the vehicle can be based on its The surrounding vehicles plan the driving lane or avoid other vehicles.

Referring to FIG. 1, an embodiment of the present application provides a target detection system in which an on-board camera and an on-board radar are linked, including an on-board camera on a vehicle 1, an on-board radar 2 on a vehicle, and an on-board radar 2 and on-board camera Communication detection equipment 3; vehicle-mounted camera 1 is arranged in front and rear and/or left and right sides of the vehicle, and vehicle-mounted radar 2 is arranged in front and rear of the vehicle;

Vehicle-mounted camera 1, used to capture the video image of the surrounding environment of the vehicle;

The vehicle-mounted radar 2 is used to send radar waves around the vehicle, receive the reflected waves corresponding to the radar waves and obtain the echo energy intensity of the reflected waves to obtain a speed distance image;

The detection device 3 is used to detect the position information and confidence level of each image target around the vehicle in the image coordinate system based on the video image, the confidence level is used to indicate the probability that the target category of the real target corresponding to the image target is a specified category, And detecting the position information and confidence of the radar target around the vehicle in the radar coordinate system based on the speed distance image, the confidence being used to indicate the probability that the target category of the real target corresponding to the radar target is the specified category;

The detection device 3 is further configured to acquire the first perspective matrix based on the position information of the image target whose confidence exceeds the first preset threshold and the position information of the image target whose confidence exceeds the second preset threshold. Represents the conversion relationship between the image image coordinate system and the preset road coordinate system; through the first perspective matrix, from the image target whose confidence does not exceed the first preset threshold and the radar target whose confidence does not exceed the second preset threshold Detection target category.

Optionally, the vehicle includes on-board equipment, and the vehicle detects surrounding vehicles through the on-board equipment, on-board camera, and on-board radar. The vehicle-mounted device can detect each image target around the vehicle through the vehicle-mounted camera, and each radar target around the vehicle through the vehicle-mounted radar. Each image target and each radar target may be real targets in a specified category, and the real targets in a specified category may be vehicles and the like. Optionally, the in-vehicle device may be the detection device 3 described above.

Optionally, the vehicle-mounted radar 2 may be a millimeter wave radar or a laser radar. The in-vehicle device may be the above-mentioned detection device 3.

It should be noted that if a target is simultaneously detected as a real target in the specified category by the on-board camera and detected as a real target in the specified category by the radar, the target is more likely to be a real target in the specified category. Therefore, in order to improve the accuracy of detecting targets around the car, each image target detected by the on-board camera and each radar target detected by radar can be fused to improve the detection accuracy. Wherein, for the process of detecting the image target, the radar target and the fusion, please refer to the description of any of the following embodiments, which will not be described here.

Referring to FIG. 2, an embodiment of the present application provides a target detection method in which a car camera and a car radar are linked. The method includes:

Step 201: Detect each image object around the vehicle, the confidence of each image object, and the position information of each image object in the image coordinate system from the video image provided by the on-board camera.

Step 202: Detect each radar target around the vehicle from the speed and distance images provided by the onboard radar, and the position information of each radar target in the radar coordinate system, the confidence of the radar target, which is used to represent the image target or the radar target The target category of the corresponding real target is the probability of the specified category.

Step 203: Obtain a first perspective matrix according to the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold, and the first perspective matrix is used to represent the image image coordinates The conversion relationship between the system and the preset road coordinate system.

Step 204: Using the first perspective matrix, detect the target category from the image target whose confidence does not exceed the first preset threshold and the radar target whose confidence does not exceed the second preset threshold.

Optionally, after step 204, step 205 is further included, according to the detected confidence of each image target, the image target whose confidence exceeds the first preset threshold is classified to obtain the target category of the real target corresponding to the image target And output the target category, and classify the radar targets whose confidence exceeds the second preset threshold according to the detected confidence of each radar target to obtain the target category of the real target corresponding to the radar target, and Output the target category. By categorizing high-confidence image targets and radar targets, the targets of the desired target category can be classified and output. Through this step, it is possible to realize the calculation of the confidence and target classification of the image target in FIG. 4, the calculation of the confidence and target classification of the radar coordinates, and the decision-level fusion of high-confidence targets.

Among them, referring to FIG. 14, each image object around the vehicle detected from the video image provided by the on-board camera, and the confidence of each image object, and the position information of each image object in the image coordinate system are illustrated. The target in the superimposed frame is an image target with a confidence exceeding the first preset threshold. Refer to FIG. 15 to illustrate the detection of various radar targets around the vehicle from the speed and distance images provided by the radar, and the radar targets in the radar coordinate system The position information in FIG. 3 is the confidence of the radar target, where the target in the superimposed frame is a radar target whose confidence exceeds the second preset threshold.

Optionally, after step 203, step 206 is further included, and N associated target pairs are determined from the image target whose confidence exceeds the first preset threshold and the radar target whose confidence exceeds the second preset threshold, any one of the associations The target pair includes a radar target and an image target satisfying a preset association condition, and N is a positive integer greater than or equal to 1. In this way, the first perspective matrix can be determined according to the position information of the radar target and the image target in the N associated target pairs.

Optionally, step 206 may include:

2061: Based on the first position information of the first image object and the stored second perspective matrix, map the first image object from the image coordinate system to the road coordinate system to obtain the corresponding first image object in the road coordinate system. Three position information; wherein the first image object is an image object with a confidence exceeding a first preset threshold, and the first position information is the position information of the first image object in the image coordinate system;

2062: According to the second position information of the first radar target and the stored third perspective matrix, map the first radar target from the radar coordinate system to the road coordinate system to obtain the corresponding fourth position of the first radar target in the road coordinate system Information; where the first radar target is a radar target whose confidence exceeds a second preset threshold, and the second position information is the position information of the first radar target in the radar coordinate system;

2063: Based on the third position information of each first image target and the fourth position information of each first radar target, position-associate each first image target with each first radar target to obtain N associated target pairs.

Through the above steps 2061 to 2063, N associated target pairs can be obtained. Based on the N associated target pairs, the high-confidence image target detected by the vehicle camera and the high-confidence radar target detected by the radar can be dynamically aligned. The dynamic alignment between the image coordinate system, road coordinate system and radar coordinate system in 4.

Optionally, the above step 2063 may be:

Determine the projected area of the first image target in the road coordinate system according to the third position information of the first image target and the fourth position information of the first radar target, the projected area of the first radar target in the road coordinate system, And the overlapping projected area of the first image target and the first radar target in the road coordinate system;

According to the projected area of the first image target in the road coordinate system, the projected area of the first radar target in the road coordinate system, and the overlapping projected area, determine the associated cost between each first image target and each second radar target ;

From the first image target and the second image target, it is determined that a first radar target and a first image target with the smallest associated cost are associated target pairs, and then N associated target pairs are obtained.

Since the association cost between each first image target and each second radar target is determined, a first radar target and a first image target with the smallest association cost are determined from the first image target and the second image target as the association target pair , Which can improve the accuracy of the associated target pair.

Optionally, the above step 203 may include:

2031: For any associated target pair of the N associated target pairs, modify the position information of the first image target in the associated target pair according to the position information of the first radar target in the associated target pair;

2032: Correct the second perspective matrix according to the corrected position information of each first image target in the N associated target pairs to obtain the first perspective matrix. Because the position information of the radar coordinate system is more accurate than the position information of the image coordinate system, for a related target pair, using the position information of the radar target to modify the position information of the image target can make the position information of the image target and the real target The location information is closer, the second perspective matrix is corrected to the first perspective matrix according to the corrected location information of each image target, and the resulting first perspective matrix can accurately reflect the current image coordinate system and the road coordinate system. Conversion relationship, which is conducive to correct the position information when mapping low-confidence image objects to the road coordinate system.

Optionally, the above step 204 may include:

2041: M feature fusion target pairs are determined from the image target whose confidence level does not exceed the first preset threshold and the radar target whose confidence level does not exceed the second preset threshold, any one of the feature fusion target pairs includes those satisfying the preset association condition For a radar target and an image target, M is a positive integer greater than or equal to 1.

2042: For any feature fusion target pair, the echo energy feature of the radar target in the feature fusion target pair and the image feature of the image target are respectively convoluted and stitched together to obtain the fusion feature corresponding to the feature fusion target pair Figure;

2043: After performing convolution and full connection calculation on the fusion feature map, input it to the classification network for target classification, and obtain the target category corresponding to the fusion feature map.

Through the above steps 2041 to 2043, the feature level fusion of the low-confidence target in FIG. 4 can be achieved. For the specific steps 2042 and 2043, refer to the process shown in FIG. 10.

Optionally, the above step 2041 may be:

Through the first perspective matrix, the second image object is mapped from the image coordinate system to the road coordinate system to obtain the corresponding position information of the second image object in the road coordinate system, and, through the pre-stored third perspective matrix, the The second radar target is mapped from the radar coordinate system to the road coordinate system to obtain the corresponding position information of the second radar target in the road coordinate system. The second image target is an image target whose confidence level does not exceed the first preset threshold. The second radar target The target is a radar target whose confidence level does not exceed the second preset threshold;

According to the corresponding position information of each second image target in the road coordinate system and the corresponding position information of each second radar target in the road coordinate system, position correlation is performed for each second image target and each second radar target to obtain M Feature fusion target pair.

Optionally, the above step 201 may include:

2011: According to the current frame video image provided by the car camera, and the multi-frame historical frame video image close to the current frame video image, obtain the classification confidence, tracking frame number confidence and position confidence of any image target;

2012: Determine the confidence of the image target based on one or more of the classification confidence, position confidence, and tracking frame number confidence.

Optionally, the above step 202 may be:

The confidence of the radar target is determined according to the intensity of the echo energy of any radar target in the current frame speed distance image, the distance from the vehicle, and the duration of the radar target in the multi-frame historical frame speed distance image.

Referring to FIGS. 3 and 4, an embodiment of the present application provides a target detection method in which a car camera and a car radar are linked. This method can be applied to the architecture shown in FIG. 1, and the execution subject of the method can be a car device. The vehicle-mounted device may be the detection device 3 in the system shown in FIG. 1, including:

Step 301: Detect the position information, target area and confidence level of each image target around the vehicle in the image coordinate system from the video image provided by the vehicle's on-board camera. The confidence level is used to represent the target of the real target corresponding to the image target The category is the probability of the specified category.

The on-board camera can capture a frame of video image of the environment around the vehicle. Whenever a frame of video image is captured, for convenience of description, the current frame of video image is called the first video image. The video image is input to the in-vehicle device; the in-vehicle device includes the first convolutional neural network, which can detect the position information of each image target in the first video image in the specified category according to the first video image , Target area and confidence.

Optionally, the image objects whose confidence exceeds the first preset threshold may be classified according to the detected confidence of each image object, to obtain the target category of the real target corresponding to the image object, and output the target category.

The vehicle-mounted camera includes an image coordinate system. For each image target, the position information and target area of the image target are the position information and area of the image target in the image coordinate system.

Optionally, referring to FIG. 5, the vehicle-mounted device may detect the position information, target area, and confidence of each image target through operations from 3011 to 3015, which are:

3011: Detect the first video image currently captured by the vehicle's on-board camera through the first convolutional neural network to obtain the position information and target area of each image target of the specified category.

Referring to FIG. 6, the first convolutional neural network includes a convolutional neural network (Convolutional Neural Network, CNN), a regional candidate network (Region Proposal Network, RPN), and a region-based convolutional neural network (Fast Region-based Convolution Neural Network) , RCNN) and other components, a first category set is set in RCNN in advance, the category of the first category set may include a specified category, for example, the specified category may be a vehicle, the first category set may also include houses, trees, flower stands, Street lights and other categories.

This step may be: first, when the vehicle-mounted device receives the first video image input from the vehicle-mounted camera, the first image is input to the first convolutional neural network, and each of the first video images output by the first convolutional neural network is acquired The location information, area and probability of each object in the image coordinate system belong to each category in the first category set.

During implementation, referring to FIG. 6, the vehicle-mounted device may input the first video image to the CNN, and the CNN convolves the first video image and extracts features to obtain a first feature map corresponding to the first video image, and converts the first feature The graph is input to RPN and RCNN respectively. RPN determines at least one first candidate box area in the first feature map to obtain a second feature map (each first candidate box area is a second feature map), and also inputs the second feature map to the RCNN. RCNN performs regression on the target position and target category of any second feature map according to the first feature map, that is, the target feature in any first candidate frame area, and finally obtains the position of the target object in the first candidate frame area Information and target area, and the probability that each target object belongs to each category in the first category set. The vehicle-mounted device obtains the position information and target area of each target output by the RCNN and the probability that each target belongs to each category in the first category set.

Then for each target, the vehicle-mounted device selects the maximum probability from the probability of each category corresponding to the target, and takes the category corresponding to the maximum probability as the category of the target, so that the category of each target can be obtained, and then from each target Select the target with the specified category as the image target.

Among them, the target area of the image target may be different from the actual area of the image target.

Optionally, the RCNN may also output an image target frame corresponding to each target, that is, each image target may also have a corresponding image target frame, and the image target frame corresponding to the image target includes the image of the image target.

3012: Obtain the classification confidence of each image target according to the video image taken by the vehicle camera before the current one.

In this step, the K-frame video images taken most recently before the current can be obtained and constitute a video image set. For any image target, for convenience of description, it is called a to-be-processed image target, and each video image in the video image set is determined. The confidence of the target of the image to be processed in the video image set is obtained according to the confidence of the target of the image to be processed in each video image in the video image set, and K is a preset value. The classification confidence of the image object to be processed is used to indicate the probability that the image object to be processed is a real vehicle on the image texture.

The video image collection includes the first video image, the second video image, ..., the Kth video image, the first video image is the first video image captured in the video image collection, and the Kth video image is the most Video images taken late.

Optionally, for the operation of determining the confidence of the image target to be processed in each video image in the video image set, the implementation manner may be:

According to the first position information of each image object in the first video image and the first position information of each image object in the Kth video image, calculate each image object and Kth video in the first video image For the distance between each image target in the image, two image targets whose distance is less than a preset distance threshold are determined as the same target in the two video images. Before the current, it has been determined which of the kth video image and the k-1th video image are the same target, k=2, 3, ..., K, so for the image target to be processed in the first video image, you can Determine which video images in the video image set include the image target to be processed and which video images do not include the image target to be processed. For the video image that includes the image target to be processed, the confidence of the image target to be processed in the video image has been obtained at present, and for the video image that does not include the image target to be processed, the The confidence level is set to a preset reliability, for example, the preset reliability may be a value of 0, 1, or the like.

Optionally, the correspondence between the target and the number of images is also stored in the vehicle-mounted terminal. For each record in the correspondence, a target and the number of images including the target are saved in the record.

Correspondingly, when it is determined that the image target in the first video image and the image target in the K-th video image are the same target, the number of images saved in the record including the image target in the correspondence relationship is increased by one. When it is determined that the image target in the first video image is different from any image target in the Kth video image, the image target may be a newly emerging target, set the number of images including the image target to 1, and set the image target Correspond to 1 and stored in the corresponding relationship.

Optionally, for the operation of obtaining the classification confidence of the image target to be processed, the classification confidence of the image target to be processed may be calculated according to the following first formula.

The first formula is:

In the above first formula, C _c is the classification confidence of the image target to be processed, C _c[k] is the confidence level of the image target to be processed in the k-th video image, and γ is the damping coefficient, which is the preset Numerical value, for example, γ=0.9.

3013: Obtain the confidence of the tracking frame number of each image target according to the video image taken by the vehicle camera before the current one.

Obtain the number of images corresponding to the image target to be processed from the correspondence between the target and the number of images, and decrement the number of images by 1 to obtain the number of second video images including the image target to be processed captured by the vehicle camera before the current date, according to For the number of second video images captured before the current including the target of the image to be processed, obtain the confidence of the number of tracking frames of the target of the image to be processed.

The confidence of the tracking frame number of the image target to be processed is used to indicate the stability of the image target to be processed as a real target of a preset category.

Optionally, the confidence of the tracking frame number of the image target to be processed can be obtained according to the second formula below.

The second formula is:

In the above second formula, _CT is the confidence of the number of tracking frames of the image target to be processed, and T is the number of second video images. In the second formula, when the number T of the second video image is less than or equal to 8, the confidence of the number of tracking frames of the image target to be processed C _T =0.2+T/10, when the number T of the second video image is greater than 8, , The confidence of the tracking frame number of the image target to be processed C _T = 1.0.

3014: Acquire the position confidence of each image target according to the first video image.

In this step, for any image object in the first video image, for convenience of description, it is referred to as an image object to be processed, and the distance between the image object to be processed and the vehicle is obtained according to the first position information of the image object to be processed, from Obtain the image height corresponding to the image target to be processed and the blocked ratio of the target frame corresponding to the image target to be processed from the first video image, the target frame including the image corresponding to the image target to be processed, according to the distance, the image height and the blocked Proportionally obtain the position confidence of the image target to be processed.

Optionally, the position confidence of the image target to be processed can be calculated according to the following third formula.

The third formula is: C _p =σ×y/h;

In the above third formula, C _p is the position confidence of the image target to be processed, h is the image height corresponding to the image target to be processed, y is the distance between the image target to be processed and the vehicle, and σ is the image to be processed The occlusion ratio of the target frame corresponding to the target.

The confidence of the position of the image object depends on the ratio of the position information of the image object in the first video image to the image object being blocked: when the position information of the image object is closer to the bottom of the first video image, the position of the image object is confident The higher the degree; the smaller the proportion of the image target blocked by other targets, the higher the confidence of the position of the image target.

3015: Obtain the confidence of each image object according to at least one of the classification confidence, position confidence, and tracking frame number confidence of each image object.

For each image object, according to the classification confidence, position confidence and tracking frame number confidence of the image object, the confidence of the image object in the first video image is calculated according to the following fourth formula.

The fourth formula is: C _s =f(C _c ,C _p ,C _T )=W _c ×C _c +W _p ×C _p +W _T ×C _T ;

In the above fourth formula, C _s is the confidence of the image target, W _c is the weight coefficient of the classification confidence of the image target, W _p is the weight coefficient of the location confidence of the image target, and W _T is the image Weight coefficients for the confidence of the tracking frame number of the target. The three weight coefficients are all preset values.

Among them, the image target detected by the on-board camera with a confidence level exceeding the first preset threshold is higher in the accuracy of the specified category of real targets, and the image target whose confidence level does not exceed the first preset threshold is the actual target of the designated category The accuracy is low. Therefore, in this embodiment, the image target whose confidence exceeds the first preset threshold may be determined as the detected real target of the specified category.

Step 302: Detect the position information of each radar target around the vehicle in the radar coordinate system from the speed distance image provided by the radar of the vehicle and the confidence of each radar target.

The radar has a radar coordinate system, and the detected position information of each radar target is the position information in the radar coordinate system.

Optionally, according to the detected confidence of each radar target, classify the radar target whose confidence exceeds the second preset threshold to obtain the target category of the real target corresponding to the radar target, and output the target category.

Optionally, referring to FIG. 7, this step can be implemented by the following operations from 3021 to 3022, which are:

3021: The vehicle's radar detects the target category around the vehicle as the position information, area, and echo energy intensity of each radar target of the specified category.

The radar can send radar waves to the surroundings of the vehicle. The radar waves can be reflected by objects around the car to form reflected waves. The radar can receive the reflected waves and obtain the return energy intensity of the reflected waves. The position of each reflection point can be calculated according to the echo energy intensity of each reflected wave, and an energy map can be drawn according to the position of each reflection point and the echo energy intensity. The energy map can be a speed distance image. The energy map includes At least one energy block, each energy block includes a plurality of continuous reflection points corresponding to the energy intensity and position of the echo, and each energy block is a target.

Based on each energy block, the target category and area of the target corresponding to each energy block can be determined. The target with the target category as the specified category is determined as the radar target, the extreme point of the echo energy intensity is found from the energy block of the radar target, and the position of the extreme point is determined as the position information of the radar target in the radar coordinate system. The average value of the energy intensity of each echo in the energy block of the radar target can be used as the energy intensity of the echo of the radar target, or the energy intensity of each echo in the energy block of the radar target can be sorted to select the echo in the middle position The energy intensity is taken as the energy intensity of the echo of the radar target.

The area of the radar target detected by the radar is the actual area of the radar target.

3022: Obtain the confidence of each radar target separately according to the position information of each radar target and the intensity of the echo energy.

In this step, the longest distance that the radar can detect and the maximum echo energy intensity of the radar are obtained. For each radar target, the distance between the radar target and the vehicle is calculated according to the position information of the radar target. For the distance, the maximum echo energy intensity, the calculated distance, and the echo energy intensity of the radar target, the confidence of the radar target is calculated according to the following fifth formula.

The fifth formula is: C=a×d/d _max +b×p/p _max ;

Among them, in the fifth formula, C is the confidence of the radar target, a is the weight coefficient of the radar target position, a can be greater than 0 and less than 1, d is the calculated distance, d _{max is} the furthest distance, b is The weight coefficient of the radar target's echo energy intensity, b can be greater than 0 and less than 1, p is the radar target's echo energy intensity, and p _{max is} the maximum echo energy intensity.

Among them, the accuracy of the radar target detected by the radar with a confidence level exceeding the second preset threshold is a specified category of real targets, and the accuracy of the radar target with a confidence level not exceeding the second preset threshold is a designated category of real targets. Lower. Therefore, in this embodiment, the radar target whose confidence exceeds the second preset threshold may be determined as the detected real target of the specified category.

There is no sequence between step 301 and step 302. Step 301 can be executed before step 302, step 301 can be executed before step 301, or step 301 and step 302 can be executed simultaneously.

Step 303: Obtain the first perspective matrix according to the position information of the image target with confidence exceeding the first preset threshold and the position information of the radar target with confidence exceeding the second preset threshold, the first perspective matrix is used to represent the image coordinate system The conversion relationship with the preset road coordinate system. The first preset threshold and the second preset threshold may be the same or different.

Optionally, before performing this step, N associated target pairs are determined from the image targets whose confidence exceeds the first preset threshold and the radar targets whose confidence exceeds the second preset threshold. Any one of the associated target pairs includes Suppose the radar target and the image target of the association condition, N is a positive integer greater than or equal to 1.

Referring to FIG. 8, the process of determining N associated target pairs may include the following operations from 3031 to 3033. The operations from 3031 to 3033 are as follows:

3031: Map the first image object from the image coordinate system to the road coordinate system according to the first position information of the first image object and the stored second perspective matrix, to obtain the corresponding third position of the first image object in the road coordinate system information.

Wherein, the first image object is an image object whose confidence exceeds a first preset threshold, and the first position information of the first image object is position information of the first image object in the image coordinate system.

The in-vehicle terminal locally stores the second perspective matrix obtained last time, and the second perspective matrix is used to reflect the conversion relationship between the image coordinate system of the in-vehicle camera and the preset road coordinate system.

The road coordinate system is different from the image coordinate system of the on-board camera and the radar coordinate system of the radar. The road coordinate system can take the position of a certain point in the vehicle as the coordinate origin and the direction of the vehicle's forward axis as the horizontal axis. , The vertical axis is perpendicular to the direction of vehicle advancement.

Optionally, the position of the center point of the front bumper of the vehicle can be used as the coordinate origin of the road coordinate system.

In this step, the first position information of each first image object whose confidence exceeds the first preset threshold constitutes a first matrix, and a second matrix is obtained according to the following sixth formula. The second matrix includes the confidence exceeding the first The third position information of each first image object in the road coordinate system with a preset threshold.

The sixth formula is: A1*B1=C1; where A1 is the first matrix, B1 is the second perspective matrix, and C1 is the second matrix.

Optionally, the second perspective matrix can also be used to convert the target area of each first image target whose confidence exceeds a preset threshold to obtain the projected area of each first image target in the road coordinate system. The projected area of the image object in the road coordinate system is the actual area.

3032: According to the second position information of the first radar target and the stored third perspective matrix, map the first radar target from the radar coordinate system to the road coordinate system to obtain the corresponding fourth position of the first radar target in the road coordinate system information.

Wherein, the first radar target is a radar target whose confidence exceeds a second preset threshold, and the second position information is the position information of the first radar target in the radar coordinate system.

In this step, the second position information of each first radar target whose confidence exceeds the second preset threshold constitutes a third matrix, and a fourth matrix is obtained according to the following seventh formula. The fourth matrix includes the confidence exceeding the first The fourth position information of each second radar target in the road coordinate system with two preset thresholds.

The seventh formula is: A2*B2=C2; where A2 is the third matrix, B2 is the third perspective matrix, and C2 is the fourth matrix.

Among them, the area of each radar target whose confidence exceeds the second preset threshold is the actual area of the radar target, which is equal to the projected area of each first radar target in the road coordinate system, so there is no need to convert each The area of the first radar target.

3033: According to the third position information and the projection area of each first image target and the fourth position information and the projection area of each first radar target, positionally associate each first image target with each first radar target to obtain N Associated target pair.

Optionally, the N associated target pairs can be determined through the first and second steps as follows:

Step 1: Establish a cost matrix based on the third position information and projected area of the first image target and the fourth position information and projected area of each radar target, each first image target corresponds to a row in the cost matrix, and each A radar target corresponds to a column of the cost matrix, and a row corresponding to the first image target includes cost coefficients between the first image target and each first radar target, and the cost coefficients of the first image target and the first radar target represent The probability that the first image target and the first radar target are the same target.

For example, suppose that there are N first image targets whose reliability exceeds the first preset threshold, and X first radar targets whose confidence exceeds the second preset threshold. The cost matrix thus established includes N rows and X columns. For the i-th first image target and the j-th first radar target, i = 1, 2, ..., N, j = 1, 2, ..., X, according to the i-th first image target at the road coordinates The third position information and projected area in the system

And the fourth position information and projected area of the jth first radar target in the road coordinate system

Calculate the projected overlap area S _ij between the i-th first image target and the j-th first radar target.

Then according to the projected area of the i-th first image target in the road coordinate system

Projected area of the jth first radar target in the road coordinate system

And the projected overlap area S _ij between the i-th first image target and the j-th first radar target, the cost between the i-th first image target and the j-th first radar target is calculated according to the following eighth formula Coefficient d _ij . The cost coefficient d _ij between the i-th first image target and the j-th first radar target is used as the element of the i-th row and j-th column of the cost matrix.

The eighth formula is:

Step 2: Select a maximum cost coefficient from a row of cost coefficients corresponding to the first image target, and form a pair of associated target pairs with the first image target and the first radar target corresponding to the maximum cost coefficient.

Optionally, after obtaining the associated target pair, the first perspective matrix is determined according to the position information of the radar target and the image target in the N associated target pairs.

Optionally, for any one of the N associated target pairs, the position information of the first image target in the associated target pair is corrected according to the position information of the first radar target in the associated target pair; The corrected position information of each of the first image objects in the N associated target pairs corrects the second perspective matrix to obtain a first perspective matrix.

The probability that the first radar target detected by the radar is a real target in a specified category is higher than the probability that the first image target detected by a vehicle camera is a real target in a specified category. Therefore, in this step, for each associated target pair, the first image target and the first radar target included in the associated target pair are the same target, and the third position information of the first image target can be corrected to the first The fourth position information of the radar target, so that the fourth position information of each first image target is obtained. A fifth matrix is constructed from the fourth position information of each first image object, and the first perspective matrix is obtained according to the following ninth formula according to the fifth matrix and the first matrix composed of the first position information of each first image object.

The ninth formula is: A1*B3=C3; where C3 is the fifth matrix and B3 is the first perspective matrix.

Since the first image target with the confidence level exceeding the first preset threshold and the first radar target with the confidence level exceeding the second preset threshold are both detected real targets of the specified category. Therefore, in this step, according to the position information of the first image target whose confidence exceeds the first preset threshold and the position information of the second radar target whose confidence exceeds the second preset threshold, acquiring the first perspective matrix can make the first The perspective matrix reflects the conversion relationship between the image coordinate system of the vehicle-mounted camera and the road coordinate system when the vehicle is driving under the current road conditions.

Optionally, the second perspective matrix saved in the vehicle-mounted device may also be updated to the first perspective matrix.

Step 304: Using the first perspective matrix, detect the real target whose category is the specified category from the image target whose confidence level does not exceed the first preset threshold and the radar target whose confidence level does not exceed the second preset threshold.

Optionally, referring to FIG. 9 and FIG. 10, this step may be implemented by operations from 3041 to 3043, respectively:

3041: Map the position information of each second image object whose confidence level does not exceed the first preset threshold to the road coordinate system through the first perspective matrix to obtain the corresponding position of each second image object in the road coordinate system information.

In this step, the position information of each second image object whose confidence level does not exceed the first preset threshold constitutes a sixth matrix, and the seventh matrix is obtained according to the following tenth formula. The seventh matrix includes the confidence level not exceeding the first The position information of each second image object in the road coordinate system with a preset threshold.

The tenth formula is: A5*B3=C4; where A5 is the sixth matrix, B3 is the first perspective matrix, and C4 is the seventh matrix.

Optionally, the first perspective matrix can also be used to convert the target area of each second image target whose confidence level does not exceed the first preset threshold, to obtain the projected area of each second image target in the road coordinate system. The projected area of a second image object in the road coordinate system is the actual area.

3042: Map each second radar target whose confidence level does not exceed the second preset threshold into the road coordinate system through the third perspective matrix to obtain corresponding position information of each second radar target in the road coordinate system.

In this step, the position information of each second radar target whose confidence level does not exceed the second preset threshold constitutes an eighth matrix, and a ninth matrix is obtained according to the following eleventh formula, which includes the confidence level not exceeding The position information of each second radar target in the road coordinate system of the second preset threshold.

The eleventh formula is: A6*B2=C5; where A6 is the eighth matrix, B2 is the third perspective matrix, and C2 is the ninth matrix.

The area of each second radar target whose confidence level does not exceed the second preset threshold is the actual area of the radar target, which is equal to the projected area of each second radar target in the road coordinate system, so there is no need to pass the first perspective The matrix converts the area of each second radar target.

3043: According to the position information and projection area of each second image target in the road coordinate system whose confidence level does not exceed the first preset threshold and in the road coordinate system each second radar target whose confidence level does not exceed the second preset threshold Position information and projected area, M feature fusion target pairs are determined. The radar target pair includes the image target and the radar target that are the same target.

Optionally, M feature fusion target pairs can be determined through the first and second steps as follows:

Step 1: According to the position information and projection area of each second image target whose confidence level does not exceed the first preset threshold in the road coordinate system and each second radar target whose confidence level does not exceed the second preset threshold on the road The position information and the projected area of the coordinate system establish a second cost matrix, each second image object whose confidence level does not exceed the first preset threshold corresponds to a row in the second cost matrix, and each confidence level where the confidence level does not exceed the second preset threshold A second radar target corresponds to a column of a second cost matrix, and a row corresponding to the second image target includes a second cost coefficient between the image target and each radar target whose confidence level does not exceed a preset threshold, the second The second cost coefficient of the image target and the second radar target represents the probability that the second image target whose confidence level does not exceed the first preset threshold and the second radar target whose confidence level does not exceed the second preset threshold are the same target.

For example, suppose that there are M second image targets whose reliability does not exceed the first preset threshold, and Y second radar targets whose confidence exceeds the second preset threshold. The cost matrix thus established includes M rows and Y columns. For the p-th second image target and the q-th second radar target, p = 1, 2, ..., M, q = 1, 2, ..., Y, according to the p-th second image target at the road coordinates Position information and projected area S _p ^C in the system and the position information and projected area S _q ^{R of} the q-th second radar target in the road coordinate system, calculate the p-th second image target and the q-th second radar target The projected overlap area between S _pq .

Then according to the projection area S _p ^{C of} the p-th second image target in the road coordinate system, the projection area S _q ^{R of} the q-th second radar target in the road coordinate system, and the p-th second image target and the q-th The projected overlap area S _pq between the second radar targets calculates the cost coefficient d _pq between the p-th second image target and the q-th second radar target according to the twelfth formula as follows. The cost coefficient d _pq between the p-th second image target and the q-th second radar target is taken as the element of the p-th row and the q-th column of the cost matrix.

The twelfth formula is: d _pq =S _pq /(S _p ^C +S _q ^R -S _pq ).

Step 2: Select a maximum second cost coefficient from a row of second cost coefficients corresponding to the second image target, and form a feature fusion target pair with the second image target and the second radar target corresponding to the maximum second cost coefficient.

3044: The second convolutional neural network detects from the M feature fusion target pairs that the target category is the true target of the specified category.

The vehicle-mounted device includes a second convolutional neural network, and a second category set is set in the second convolutional neural network in advance, and the categories of the second category set may include designated categories, non-designated categories, and other categories. For example, assume that the specified category is a vehicle or a motor vehicle, and the non-preset category may be a non-motor vehicle.

For each feature fusion target pair, the feature fusion target pair includes a second image target and a second radar target, and the in-vehicle device may use the corresponding image of the second image target in the first video image and the corresponding image of the second radar target The energy block is input to the second convolutional neural network. As shown in FIG. 10, the image features extracted from the image of the second image target and the energy features extracted from the energy block of the second radar target through the second convolutional neural network will be Image features and energy features are stitched together to obtain feature sequences. According to the feature sequence after splicing, through the second convolutional neural network, after performing multi-layer convolution calculation and fully connected layer calculation, the probability of the feature fusion target pair belonging to each category in the second category set is output; the maximum probability is selected Is used as the target category of the feature fusion target pair. When the target category of the feature fusion target pair is the specified category, the second radar target in the feature fusion target pair is used as the real target with the detected category as the specified category. Of course, the second convolutional network can also be composed of multiple sub-networks, each of which completes the following processes: extracting image features from the image of the second image target, and extracting echo energy features from the energy block of the second radar target ; Splicing the image features and energy features to obtain a feature sequence; according to the feature sequence, multi-layer convolution calculation and fully connected layer calculation are performed to output the feature fusion target for each category in the second category set Probability.

When the category of the feature fusion target pair is the specified category, it indicates that the target category of the second image target and the second radar target in the feature fusion target pair are both specified categories, that is to say that the vehicle camera and the radar simultaneously detect the same The target is a real target of the specified category, and because the accuracy of radar detection is higher than that of the on-board camera detection, the second radar target in the feature fusion target pair is used as the real target with the target category detected as the specified category.

Optionally, the second convolutional neural network is obtained by training through a sample set in advance, and the sample set includes a plurality of preset feature fusion target pairs and a target category corresponding to each feature fusion target. During training, the sample set is input to the second convolutional neural network for training.

The beneficial effects of the embodiments of the present application are: the image target and the image target's confidence around the vehicle are detected by the on-board camera, and the radar target and the radar target's confidence around the vehicle are detected by the radar; because the confidence exceeds the first preset The first image target with a threshold and the first radar target with a confidence exceeding the second preset threshold are real targets of the specified category, so the first image target with a confidence exceeding the first preset threshold and the confidence exceeding the second preset The first perspective matrix acquired by the threshold first radar target can reflect the conversion relationship between the image coordinate system of the vehicle's on-board camera and the road coordinate system under the current road conditions, so that the confidence level does not exceed the first preset according to the first perspective matrix The accuracy of the target target detected in the second image target of the threshold and the second radar target whose confidence level does not exceed the second preset threshold is a real target of the specified category is high, thereby improving the accuracy of target detection.

The following is an embodiment of the device of the present application, which can be used to execute the method embodiment of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.

Referring to FIG. 11, an embodiment of the present application provides a target detection device 400 in which a vehicle camera and a vehicle radar are linked. The device 400 includes:

The first detection module 401 is used to detect each image target around the vehicle from the video image provided by the on-board camera, the confidence of each image target, and the position information of each image target in the image coordinate system;

The second detection module 402 is used to detect each radar target around the vehicle from the speed distance image provided by the radar, the position information of each radar target in the radar coordinate system, and the Confidence, the confidence is used to indicate the probability that the target category of the real target corresponding to the image target or the radar target is a specified category;

The obtaining module 403 is configured to obtain the first perspective matrix based on the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold. Representing the conversion relationship between the image coordinate system and the preset road coordinate system;

The third detection module 404 is configured to detect the target from the image target whose confidence does not exceed the first preset threshold and the radar target whose confidence does not exceed the second preset threshold through the first perspective matrix category.

Optionally, the device 400 further includes:

Optionally, the determination module is used to:

According to the third position information of each of the first image targets and the fourth position information of each of the first radar targets, position-associating each of the first image targets with each of the first radar targets To obtain the N associated target pairs.

Optionally, the determination module is used to:

Determine the projected area of the first image target in the road coordinate system according to the third position information of the first image target and the fourth position information of the first radar target, the first A projected area of a radar target in the road coordinate system, and an overlapping projected area of the first image target and the first radar target in the road coordinate system;

Optionally, the obtaining module 403 is used to:

Optionally, the third detection module 404 is used to:

Optionally, the first detection module 401 is used to:

The second detection module 402 is used to:

The beneficial effects of the embodiments of the present application are as follows: the first detection module detects the image target and the confidence of the image target around the vehicle through the on-board camera, and the second detection module detects the radar target around the vehicle and the confidence of each radar target through the radar Degree; since the image target with confidence exceeding the first preset threshold and the radar target with confidence exceeding the second preset threshold are real targets in the specified category, the acquisition module according to the image target and confidence with the confidence exceeding the first preset threshold The radar target whose degree exceeds the second preset threshold is the real perspective of the specified category. The first perspective matrix can reflect the conversion relationship between the image coordinate system of the vehicle's on-board camera and the road coordinate system under the current road conditions. A perspective matrix detects the target class from the image target whose confidence level does not exceed the first preset threshold and the radar target whose confidence level does not exceed the second preset threshold. Accuracy.

Regarding the device in the above embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment related to the method, and will not be elaborated here.

FIG. 12 shows a structural block diagram of a terminal 500 provided by an exemplary embodiment of the present application. The terminal 500 may be an in-vehicle terminal, and the terminal 500 may also be called other names such as user equipment and portable terminal.

Generally, the terminal 500 includes a processor 501 and a memory 502.

The processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 501 may adopt at least one hardware form of DSP (Digital Signal Processing, digital signal processing), FPGA (Field-Programmable Gate Array), PLA (Programmable Logic Array). achieve. The processor 501 may also include a main processor and a coprocessor. The main processor is a processor for processing data in a wake-up state, also known as a CPU (Central Processing Unit, central processor); the coprocessor is A low-power processor for processing data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used to render and draw content that needs to be displayed on the display screen. In some embodiments, the processor 501 may further include an AI (Artificial Intelligence, artificial intelligence) processor, which is used to process computing operations related to machine learning.

The memory 502 may include one or more computer-readable storage media, which may be non-transitory. The memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 is used to store at least one instruction, which is executed by the processor 501 to implement the camera-based camera provided by the method embodiment in the present application And radar vehicle detection methods.

In some embodiments, the terminal 500 may optionally further include: a peripheral device interface 503 and at least one peripheral device. The processor 501, the memory 502 and the peripheral device interface 503 may be connected by a bus or a signal line. Each peripheral device may be connected to the peripheral device interface 503 through a bus, a signal line, or a circuit board. Specifically, the peripheral device includes at least one of a radio frequency circuit 504, a touch display screen 505, a camera 506, an audio circuit 507, a positioning component 508, and a power supply 509.

The peripheral device interface 503 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, the memory 502, and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one of the processor 501, the memory 502, and the peripheral device interface 503 or Both can be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 504 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 504 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on. The radio frequency circuit 504 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: World Wide Web, Metropolitan Area Network, Intranet, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.

The display screen 505 is used to display UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, the display screen 505 also has the ability to collect touch signals on or above the surface of the display screen 505. The touch signal can be input to the processor 501 as a control signal for processing. At this time, the display screen 505 can also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards. In some embodiments, there may be one display screen 505, which is provided with the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, which are respectively provided on different surfaces of the terminal 500 or have a folded design; In still other embodiments, the display screen 505 may be a flexible display screen, which is disposed on the curved surface or the folding surface of the terminal 500. Even, the display screen 505 can also be set as a non-rectangular irregular figure, that is, a special-shaped screen. The display screen 505 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode, organic light emitting diode) and other materials.

The camera component 506 is used to collect images or videos. Optionally, the camera assembly 506 includes a front camera and a rear camera. Usually, the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal. In some embodiments, there are at least two rear cameras, each of which is a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize the fusion of the main camera and the depth-of-field camera to realize the background blur function, the main camera Integrate with wide-angle camera to realize panoramic shooting and VR (Virtual Reality, virtual reality) shooting function or other fusion shooting functions. In some embodiments, the camera assembly 506 may also include a flash. The flash can be a single-color flash or a dual-color flash. Dual color temperature flash refers to the combination of warm light flash and cold light flash, which can be used for light compensation at different color temperatures.

The audio circuit 507 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 501 for processing, or input them to the radio frequency circuit 504 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, which are respectively installed in different parts of the terminal 500. The microphone can also be an array microphone or an omnidirectional acquisition microphone. The speaker is used to convert the electrical signal from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible by humans, but also convert electrical signals into sound waves inaudible to humans for distance measurement and other purposes. In some embodiments, the audio circuit 507 may also include a headphone jack.

The positioning component 508 is used to locate the current geographic location of the terminal 500 to implement navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of Russia.

The power supply 509 is used to supply power to various components in the terminal 500. The power source 509 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery can also be used to support fast charging technology.

In some embodiments, the terminal 500 further includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: an acceleration sensor 511, a gyro sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.

The acceleration sensor 511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 can be used to detect components of gravity acceleration on three coordinate axes. The processor 501 may control the touch display screen 505 to display the user interface in a landscape view or a portrait view according to the gravity acceleration signal collected by the acceleration sensor 511. The acceleration sensor 511 can also be used for game or user movement data collection.

The gyro sensor 512 can detect the body direction and rotation angle of the terminal 500, and the gyro sensor 512 can cooperate with the acceleration sensor 511 to collect a 3D motion of the user on the terminal 500. Based on the data collected by the gyro sensor 512, the processor 501 can realize the following functions: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.

The pressure sensor 513 may be disposed on the side frame of the terminal 500 and/or the lower layer of the touch display screen 505. When the pressure sensor 513 is disposed on the side frame of the terminal 500, it can detect the user's grip signal on the terminal 500, and the processor 501 can perform left-right hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed on the lower layer of the touch display screen 505, the processor 501 controls the operability control on the UI interface according to the user's pressure operation on the touch display screen 505. The operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 514 is used to collect the user's fingerprint, and the processor 501 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the user's identity based on the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 501 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings. The fingerprint sensor 514 may be provided on the front, back, or side of the terminal 500. When a physical button or manufacturer logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or manufacturer logo.

The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 can control the display brightness of the touch display 505 according to the ambient light intensity collected by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 505 is reduced. In another embodiment, the processor 501 can also dynamically adjust the shooting parameters of the camera assembly 506 according to the ambient light intensity collected by the optical sensor 515.

The proximity sensor 516, also called a distance sensor, is usually provided on the front panel of the terminal 500. The proximity sensor 516 is used to collect the distance between the user and the front of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually becomes smaller, the processor 501 controls the touch display 505 to switch from the bright screen state to the breathing screen state; when the proximity sensor 516 detects When the distance from the user to the front of the terminal 500 gradually becomes larger, the processor 501 controls the touch display 505 to switch from the breath-hold state to the bright-screen state.

Those skilled in the art may understand that the structure shown in FIG. 12 does not constitute a limitation on the terminal 500, and may include more or fewer components than those illustrated, or combine certain components, or adopt different component arrangements.

Referring to FIG. 13, an embodiment of the present application provides a target detection system 600, including a radar 601 provided on a vehicle, a vehicle-mounted camera 602 provided on the vehicle, and communicating with the radar 601 and the vehicle-mounted camera 602 Of detection device 603,

The on-board camera 602 is used to photograph the surroundings of the vehicle to obtain a video image of the current frame and provide the video image of the current frame to the detection device 603;

The radar 601 is configured to generate a current frame speed distance image based on the transmitted radar signal and the received echo signal, and provide the current frame speed distance image to the detection device 603;

The detection device 603 is configured to detect each image object around the vehicle, the confidence of each image object, and the position of each image object in the image coordinate system from the video image provided by the vehicle-mounted camera 602 Information; each radar target around the vehicle is detected from the speed distance image provided by the radar 601, the position information of each radar target in the radar coordinate system, and the confidence of each radar target, the The confidence level is used to indicate the probability that the target category of the real target corresponding to the image target or the radar target is the specified category; according to the position information and confidence level of the image target whose confidence level exceeds the first preset threshold value Position information of the radar target with a threshold value to obtain a first perspective matrix, where the first perspective matrix is used to represent a conversion relationship between the image coordinate system and a preset road coordinate system; through the first perspective matrix, A target category is detected from an image target whose confidence level does not exceed the first preset threshold and a radar target whose confidence level does not exceed the second preset threshold.

Optionally, the in-vehicle camera 602 is provided in front and rear and/or both sides of the vehicle, and the radar 601 is provided in front and rear of the vehicle.

Optionally, the radar 601 is a millimeter wave radar.

After considering the description and practice of the application disclosed herein, those skilled in the art will easily think of other embodiments of the application. This application is intended to cover any variations, uses, or adaptations of this application, which follow the general principles of this application and include common general knowledge or customary technical means in the technical field not disclosed in this application . The description and examples are to be considered exemplary only, and the true scope and spirit of this application are pointed out by the following claims.

It should be understood that the present application is not limited to the precise structure that has been described above and shown in the drawings, and various modifications and changes can be made without departing from the scope thereof. The scope of this application is limited only by the appended claims.

Claims

A target detection method in which a car camera and a car radar are linked is characterized in that the method includes:

Detecting each image object around the vehicle from the video image provided by the onboard camera, the confidence of each image object, and the position information of each image object in the image coordinate system;

Each radar target around the vehicle is detected from the speed and distance images provided by the radar, the position information of each radar target in the radar coordinate system, and the confidence of each radar target, the confidence is The probability that the target category of the real target corresponding to the image target or the radar target is a specified category;

Obtain a first perspective matrix according to the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold, the first perspective matrix is used to represent the image image The conversion relationship between the coordinate system and the preset road coordinate system;

Through the first perspective matrix, target categories are detected from image targets whose confidence does not exceed the first preset threshold and radar targets whose confidence does not exceed the second preset threshold.
The method of claim 1, wherein the method further comprises:

Classify image objects whose confidence exceeds the first preset threshold according to the detected confidence of each of the image objects, obtain the target category of the real target corresponding to the image object, and output the target category, and

According to the detected confidence level of each of the radar targets, classify the radar target whose confidence level exceeds the second preset threshold to obtain the target category of the real target corresponding to the radar target, and output the target category.
The method according to claim 1, wherein the first position is acquired based on the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold Before perspective matrix, the method further includes:

N associated target pairs are determined from an image target with a confidence exceeding a first preset threshold and a radar target with a confidence exceeding a second preset threshold, any one of the associated target pairs includes a radar target and an image satisfying a preset association condition Target, the N is a positive integer greater than or equal to 1;

The obtaining the first perspective matrix according to the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold includes:

The first perspective matrix is determined according to the position information of the radar target and the image target in the N associated target pairs.
The method according to claim 3, wherein the determining of N associated target pairs from the image target whose confidence exceeds the first preset threshold and the radar target whose confidence exceeds the second preset threshold includes:

Mapping the first image object from the image coordinate system to the road coordinate system according to the first position information of the first image object and the stored second perspective matrix, to obtain the first image object on the road Corresponding third position information in the coordinate system; wherein the first image object is an image object whose confidence exceeds a first preset threshold, and the first position information is the first image object in the image coordinate system Location information

Mapping the first radar target from the radar coordinate system to the road coordinate system according to the second position information of the first radar target and the stored third perspective matrix, to obtain the first radar target on the road Corresponding fourth position information in the coordinate system; wherein the first radar target is a radar target whose confidence exceeds a second preset threshold, and the second position information is the first radar target in the radar coordinate system Location information

According to the third position information of each of the first image targets and the fourth position information of each of the first radar targets, position-associating each of the first image targets with each of the first radar targets To obtain the N associated target pairs.
The method according to claim 4, wherein the performing position association on each of the first image targets and each of the second radar targets to obtain the N associated target pairs includes:

Determine the projected area of the first image target in the road coordinate system according to the third position information of the first image target and the fourth position information of the first radar target, the first A projected area of a radar target in the road coordinate system, and an overlapping projected area of the first image target and the first radar target in the road coordinate system;

Determine each of the first image targets according to the projected area of the first image target in the road coordinate system, the projected area of the first radar target in the road coordinate system, and the overlapping projected area The associated cost with each of the second radar targets;

It is determined from the first image target and the second image target that the first radar target and the first image target with the smallest associated cost are the associated target pairs, and then the N associated target pairs are obtained.
The method according to claim 4, wherein the determining the first perspective matrix according to the position information of the radar target and the image target in the N associated target pairs includes:

For any one of the N associated target pairs, according to the position information of the first radar target in the any associated target pair, modify the first of the any associated target pair Image target location information;

Modify the second perspective matrix according to the corrected position information of each of the first image targets in the N associated target pairs to obtain the first perspective matrix.
The method according to claim 1, characterized in that, through the first perspective matrix, the image target whose confidence level does not exceed the first preset threshold and the confidence level does not exceed the second preset threshold The target category was detected in the radar target, including:

M feature fusion target pairs are determined from the image target whose confidence level does not exceed the first preset threshold and the radar target whose confidence level does not exceed the second preset threshold, any one of the feature fusion target pairs includes those satisfying the preset association condition One radar target and one image target, where M is a positive integer greater than or equal to 1;

For any of the feature fusion target pairs, the echo energy feature of the radar target and the image feature of the image target in the feature fusion target pair are respectively convoluted and spliced to obtain the feature fusion The fusion feature map corresponding to the target pair;

After performing convolution and full connection calculation on the fusion feature map, input it to a classification network for target classification to obtain a target category corresponding to the fusion feature map.
The method of claim 7, wherein the determining of M feature fusion target pairs from the second image target and the second radar target includes:

Mapping the second image object from the image coordinate system to the road coordinate system through the first perspective matrix to obtain the corresponding position information of the second image object in the road coordinate system, and, by A pre-stored third perspective matrix, mapping the second radar target from the radar coordinate system to the road coordinate system, to obtain the corresponding position information of the second radar target in the road coordinate system, the first The second image target is an image target whose confidence level does not exceed the first preset threshold, and the second radar target is a radar target whose confidence level does not exceed the second preset threshold;

According to the corresponding position information of each second image target in the road coordinate system and the corresponding position information of each second radar target in the road coordinate system, for each second image target and each position The second radar target performs position correlation to obtain the M feature fusion target pairs.
The method according to claim 1, wherein the detecting the confidence of each image target around the vehicle from the video image provided by the on-board camera includes:

Obtain the classification confidence, tracking frame number confidence and position of any one of the image objects according to the current frame video image provided by the vehicle camera and the multi-frame historical frame video image close to the current frame video image Confidence;

Determine the confidence of the image target according to one or more of the classification confidence, position confidence, and tracking frame number confidence;

The detection of the confidence of each radar target around the vehicle from the speed and distance images collected by the radar includes:

Determine the radar according to the intensity of the echo energy of any of the radar targets in the current frame speed distance image, the distance from the vehicle, and the duration of the radar target in the multi-frame historical frame speed distance image The confidence of the goal.
A target detection device linked by a vehicle-mounted camera and a vehicle-mounted radar is characterized in that the device includes:

The first detection module is used to detect each image target around the vehicle from the video image provided by the onboard camera, the confidence of each image target, and the position information of each image target in the image coordinate system;

The second detection module is used to detect each radar target around the vehicle from the speed and distance images provided by the radar, the position information of each radar target in the radar coordinate system, and the confidence of each radar target Degree, the confidence is used to indicate the probability that the target category of the real target corresponding to the image target or the radar target is a specified category;

An obtaining module, configured to obtain a first perspective matrix based on the position information of the image target with the confidence level exceeding the first preset threshold and the position information of the radar target with the confidence level exceeding the second preset threshold It represents the conversion relationship between the image coordinate system and the preset road coordinate system;

The third detection module is used to detect the target category from the image target whose confidence does not exceed the first preset threshold and the radar target whose confidence does not exceed the second preset threshold through the first perspective matrix .
The device of claim 10, wherein the device further comprises:

A classification module is used to classify image objects whose confidence exceeds the first preset threshold according to the detected confidence of each of the image objects, to obtain the target category of the real target corresponding to the image object, and to Target category output, and, based on the detected confidence of each of the radar targets, classify the radar targets with confidence exceeding the second preset threshold to obtain the target category of the real target corresponding to the radar target, and The target category is output.
The device of claim 10, wherein the device further comprises:

A determining module, configured to determine N associated target pairs from an image target with a confidence exceeding a first preset threshold and a radar target with a confidence exceeding a second preset threshold, any one of the associated target pairs includes satisfying a preset associated condition Radar target and image target, N is a positive integer greater than or equal to 1;

The acquisition module is configured to determine the first perspective matrix according to the position information of the radar target and the image target in the N associated target pairs.
The apparatus according to claim 12, wherein the determination module is configured to:

Mapping the first image object from the image coordinate system to the road coordinate system according to the first position information of the first image object and the stored second perspective matrix, to obtain the first image object on the road Corresponding third position information in the coordinate system; wherein the first image object is an image object whose confidence exceeds a first preset threshold, and the first position information is the first image object in the image coordinate system Location information

Mapping the first radar target from the radar coordinate system to the road coordinate system according to the second position information of the first radar target and the stored third perspective matrix, to obtain the first radar target on the road Corresponding fourth position information in the coordinate system; wherein the first radar target is a radar target whose confidence exceeds a second preset threshold, and the second position information is the first radar target in the radar coordinate system Location information

According to the third position information of each of the first image targets and the fourth position information of each of the first radar targets, position-associating each of the first image targets with each of the first radar targets To obtain the N associated target pairs.
The apparatus according to claim 13, wherein the acquisition module is configured to:

For any one of the N associated target pairs, according to the position information of the first radar target in the any associated target pair, modify the first of the any associated target pair Image target location information;

Modify the second perspective matrix according to the corrected position information of each of the first image targets in the N associated target pairs to obtain the first perspective matrix.
The device of claim 10, wherein the third detection module is configured to:

M feature fusion target pairs are determined from the image target whose confidence level does not exceed the first preset threshold and the radar target whose confidence level does not exceed the second preset threshold, any one of the feature fusion target pairs includes those satisfying the preset association condition One radar target and one image target, where M is a positive integer greater than or equal to 1;

For any of the feature fusion target pairs, the echo energy feature of the radar target and the image feature of the image target in the feature fusion target pair are respectively convoluted and spliced to obtain the feature fusion The fusion feature map corresponding to the target pair;

After performing convolution and full connection calculation on the fusion feature map, input it to a classification network for target classification to obtain a target category corresponding to the fusion feature map.
The device according to claim 15, wherein the third detection module is configured to:

Mapping the second image object from the image coordinate system to the road coordinate system through the first perspective matrix to obtain the corresponding position information of the second image object in the road coordinate system, and, by A pre-stored third perspective matrix, mapping the second radar target from the radar coordinate system to the road coordinate system, to obtain the corresponding position information of the second radar target in the road coordinate system, the first The second image target is an image target whose confidence level does not exceed the first preset threshold, and the second radar target is a radar target whose confidence level does not exceed the second preset threshold;

According to the corresponding position information of each second image target in the road coordinate system and the corresponding position information of each second radar target in the road coordinate system, for each second image target and each position The second radar target performs position correlation to obtain the M feature fusion target pairs.
The device of claim 10, wherein the first detection module is configured to:

Obtain the classification confidence, tracking frame number confidence and position of any one of the image objects according to the current frame video image provided by the vehicle camera and the multi-frame historical frame video image close to the current frame video image Confidence;

Determine the confidence of the image target according to one or more of the classification confidence, position confidence, and tracking frame number confidence;

The second detection module is used to:

Determine the radar according to the intensity of the echo energy of any of the radar targets in the current frame speed distance image, the distance from the vehicle, and the duration of the radar target in the multi-frame historical frame speed distance image The confidence of the goal.
A target detection system includes a radar installed on a vehicle, an on-board camera installed on the vehicle, and a detection device that communicates with the radar and the on-board camera.

The on-board camera is used to photograph the surroundings of the vehicle to obtain a video image of the current frame, and provide the video image of the current frame to the detection device;

The radar is used to generate a current frame speed distance image based on the transmitted radar signal and the received echo signal, and provide the current frame speed distance image to the detection device;

The detection device is configured to detect each image target around the vehicle from the video image provided by the on-board camera, the confidence of each image target, and the position information of each image target in the image coordinate system Detecting each radar target around the vehicle from the speed and distance images provided by the radar, and the position information of each radar target in the radar coordinate system, the confidence level of the radar target, the confidence level is The probability that the target category of the real target corresponding to the image target or the radar target is a specified category; according to the location information and confidence level of the image target whose confidence exceeds the first preset threshold Obtain the first perspective matrix of the position information of the radar target, and the first perspective matrix is used to represent the conversion relationship between the image coordinate system and the preset road coordinate system; through the first perspective matrix, from the confidence level The target category is detected from the image target that does not exceed the first preset threshold and the radar target that does not exceed the second preset threshold.
The system according to claim 18, wherein the on-vehicle camera is disposed in front and rear and/or on both sides of the vehicle, and the radar is disposed in front and rear of the vehicle.
The system of claim 18, wherein the radar is a millimeter wave radar.
An electronic device, characterized in that it includes:

At least one processor; and

At least one memory;

The at least one memory stores one or more instructions, and the one or more instructions are configured to be executed by the at least one processor to execute the following instructions:

Detecting each image object around the vehicle from the video image provided by the onboard camera, the confidence of each image object, and the position information of each image object in the image coordinate system;

Each radar target around the vehicle is detected from the speed and distance images provided by the radar, the position information of each radar target in the radar coordinate system, and the confidence of each radar target, the confidence is The probability that the target category of the real target corresponding to the image target or the radar target is a specified category;

Obtain a first perspective matrix according to the position information of the image target whose confidence exceeds the first preset threshold and the position information of the radar target whose confidence exceeds the second preset threshold, the first perspective matrix is used to represent the image image The conversion relationship between the coordinate system and the preset road coordinate system;

Through the first perspective matrix, target categories are detected from image targets whose confidence does not exceed the first preset threshold and radar targets whose confidence does not exceed the second preset threshold.