WO2022036567A1 - Target detection method and device, and vehicle-mounted radar - Google Patents

Target detection method and device, and vehicle-mounted radar

Info

Publication number
WO2022036567A1
WO2022036567A1 PCT/CN2020/109879 CN2020109879W
Authority
WO
WIPO (PCT)
Prior art keywords
map
feature map
frame
historical
attention
Prior art date
Application number
PCT/CN2020/109879
Other languages
French (fr)
Chinese (zh)
Inventor
郝智翔
李延召
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN202080006536.8A priority Critical patent/CN114450720A/en
Priority to PCT/CN2020/109879 priority patent/WO2022036567A1/en
Publication of WO2022036567A1 publication Critical patent/WO2022036567A1/en

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00 - Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88 - Radar or analogous systems specially adapted for specific applications
    • G01S13/89 - Radar or analogous systems specially adapted for specific applications for mapping or imaging
    • G01S13/90 - Radar or analogous systems specially adapted for specific applications for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing

Abstract

A target detection method and device, and a vehicle-mounted radar, capable of improving the accuracy of target detection and the robustness of detection results. The method comprises: acquiring a first feature map of a current frame in multiple consecutive point cloud frames (110); determining a historical attention map according to the detection result of at least one frame preceding the current frame (120); processing the first feature map according to the historical attention map to obtain a second feature map after attention transfer (130); and determining the detection result of the current frame according to the second feature map (140).

Description

Target detection method and device, and vehicle-mounted radar
Copyright Notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to the reproduction by anyone of this patent document or the patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical Field
The present application relates to the field of radar applications, and more particularly, to a target detection method and device, and a vehicle-mounted radar.
Background
The ability to detect and perceive the position information of surrounding vehicles while driving is a prerequisite for realizing autonomous driving technology. Vehicle detection collects environmental information around the vehicle in real time through sensors deployed on the vehicle platform, such as cameras, lidars and millimeter-wave radars, and on that basis obtains the position information of other vehicles in the surrounding environment through detection algorithms. Only based on this information can the autonomous driving system make the control decisions that drive the vehicle to operate autonomously. The accuracy of vehicle detection and the robustness of the detection results directly affect the safety of autonomous driving; how to improve them has therefore become an urgent problem to be solved.
Summary
The present application provides a target detection method and device, and a vehicle-mounted radar, which can improve the accuracy of target detection and the robustness of the detection results.
In a first aspect, a target detection method is provided, including:
acquiring a first feature map of a current frame in a plurality of consecutive point cloud frames;
determining a historical attention map according to a detection result of at least one frame preceding the current frame;
processing the first feature map according to the historical attention map to obtain a second feature map after attention transfer; and
determining a detection result of the current frame according to the second feature map.
In a second aspect, a device for target detection is provided, including a memory and a processor,
the memory being configured to store a program;
the processor being configured to call the program and, when the program is executed, to perform the following operations:
acquiring a first feature map of a current frame in a plurality of consecutive point cloud frames;
determining a historical attention map according to a detection result of at least one frame preceding the current frame;
processing the first feature map according to the historical attention map to obtain a second feature map after attention transfer; and
determining a detection result of the current frame according to the second feature map.
In a third aspect, a vehicle-mounted radar is provided, including the device for target detection according to the second aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method according to the first aspect.
Based on the above technical solutions, when the current frame among multiple point cloud frames is processed, the detection result of at least one preceding frame is used. By efficiently using historical detection information to guide the target detection process of the current frame, the accuracy of target detection and the robustness of the detection results can therefore be significantly improved.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a target detection method provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of processing the feature map of the current frame according to the historical attention map.
FIG. 3 is a flowchart of a possible implementation based on the method shown in FIG. 1.
FIG. 4 shows the detection result of an existing deep neural network to which historical detection information has not been added.
FIG. 5 shows the detection result of the deep neural network of the present application to which historical detection information has been added.
FIG. 6 is a schematic structural diagram of a device for target detection provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. Provided there is no conflict, the embodiments described below and the features in those embodiments may be combined with each other.
It should be understood that the specific examples herein are only intended to help those skilled in the art better understand the embodiments of the present application, rather than to limit the scope of the embodiments of the present application.
It should also be understood that, in the various embodiments of the present application, the size of the sequence numbers of the processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
It should also be understood that the various implementations described in this specification may be implemented individually or in combination, which is not limited by the embodiments of the present application.
Unless otherwise specified, all technical and scientific terms used in the embodiments of the present application have the same meaning as commonly understood by those skilled in the technical field of the present application. The terminology used in the present application is only for the purpose of describing specific embodiments and is not intended to limit the scope of the present application.
Usually, when lidar is used for target detection, only a single point cloud frame is taken as the input of the target detection algorithm. However, lidar can record the environmental point cloud continuously, so within a sequence of consecutive point cloud frames, the historical detection information preceding the current frame is of great significance to the target detection process of the current frame. The present application proposes a target detection solution that makes use of historical detection information to improve the accuracy of the target detection algorithm and the robustness of the target detection results; for example, in an autonomous driving scenario it can significantly improve the accuracy of vehicle detection and the robustness of the detection results, thereby ensuring the safety of autonomous driving.
FIG. 1 is a schematic flowchart of a target detection method according to an embodiment of the present application. The method can be applied to target detection, especially vehicle detection in autonomous driving scenarios. As shown in FIG. 1, the target detection method 100 includes some or all of the following steps.
In step 110, a first feature map of the current frame in a plurality of consecutive point cloud frames is acquired.
When a deep neural network is used for target detection, multiple point cloud frames are collected consecutively, and each point cloud frame is processed. A deep neural network is essentially composed of a series of stacked two-dimensional convolution operations, and is also called a convolutional neural network or a deep convolutional neural network. In the deep neural network, the first feature map of the current frame on which the convolution operation is to be performed includes the spatial information of the current frame input to the deep neural network and the corresponding feature information.
Taking a point cloud bird's-eye view as the input of the deep neural network as an example, each element of the first feature map produced in an intermediate stage of the deep neural network corresponds to the pixels of one region of the two-dimensional bird's-eye view and the corresponding feature information, where the first two dimensions of the first feature map correspond to positions along the length and width directions of the two-dimensional bird's-eye view. The element value of each element of the first feature map indicates how interested the deep neural network is in the region corresponding to that element; for example, if the absolute value of an element is large, the deep neural network is relatively interested in the region corresponding to that element, and the feature information of that region required by the deep neural network contributes a relatively large weight to the subsequent output.
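To make the shape of this first feature map concrete, the following is a minimal PyTorch sketch of step 110, assuming the network input is a bird's-eye-view (BEV) pseudo-image of the current point cloud frame; the channel counts, grid size and layer depth are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a small stack of 2D convolutions turns a BEV
# pseudo-image of the current point cloud frame into the "first feature map".
class TinyBackbone(nn.Module):
    def __init__(self, in_channels=8, feat_channels=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, bev):          # bev: (B, C_in, H, W) BEV pseudo-image
        return self.layers(bev)      # first feature map: (B, C_feat, H, W)

bev = torch.randn(1, 8, 200, 200)    # one BEV frame on an assumed 200x200 grid
first_feature_map = TinyBackbone()(bev)
# Each (h, w) location of the feature map corresponds to one BEV region; the
# magnitude of its values reflects how interested the network is in that region.
```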
In step 120, a historical attention map is determined according to the detection result of at least one frame preceding the current frame.
This embodiment uses an attention transfer mechanism to process the first feature map of the current frame, so an attention map needs to be obtained, and that attention map is a historical attention map determined according to the detection result of at least one frame preceding the current frame. Preferably, the historical attention map may be determined according to the detection result of the frame immediately preceding the current frame.
Different elements of the first feature map of the current frame correspond to different regions of the scene, and the element value of each element indicates how interested the deep neural network is in the region corresponding to that element. In this case, the historical attention map includes a weight corresponding to each element of the first feature map; in other words, the historical attention map includes weights corresponding to the different regions of the first feature map.
Optionally, in one implementation of step 120, the weights in the historical attention map corresponding to the elements of the first feature map of the current frame may be determined according to the target distribution in the detection result of at least one frame preceding the current frame.
For example, in the historical attention map, the weight corresponding to a region that contains a target in the detection result of at least one preceding frame is greater than the weight corresponding to a region that does not contain a target.
In a vehicle detection scenario, once it is known which regions contain targets such as vehicles, the weights corresponding to the regions where vehicles are located are increased in the historical attention map, and the weights corresponding to regions that do not contain vehicles are decreased. The historical attention map is then used to perform the attention transfer on the first feature map, guiding the deep neural network to focus on the regions that may contain vehicles and to pay less attention to other regions. In this way, the probability of detecting targets can be increased while the probability of false detections in the detection results is reduced.
For another example, different types of targets correspond to different weights in the historical attention map, where the different types of targets may be, for example, vehicles, persons and environmental information.
The weights corresponding to different types of targets in the historical attention map may, for example, be determined according to the degree of interest in the different types of targets: the region containing the type of target of interest is assigned a relatively large weight. For instance, if vehicles are of most interest, persons next, and environmental information of least interest, then when the historical attention map is generated, a larger value may be assigned to the weight corresponding to the region where a vehicle is located, an intermediate value to the weight corresponding to the region where a person is located, and a smaller value to the weight corresponding to the region occupied by the environment.
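As a hedged illustration of this hand-set variant, the sketch below builds a historical attention map from the previous frame's detections, with a larger weight for detected regions and class-dependent weights. The weight values, class names and box format (feature-map grid cells) are assumptions made for the example, not values prescribed by the patent.

```python
import numpy as np

CLASS_WEIGHTS = {"vehicle": 1.0, "person": 0.8, "environment": 0.3}
BACKGROUND_WEIGHT = 0.1   # regions that contained no target in the previous frame

def build_attention_map(prev_detections, grid_shape):
    """prev_detections: list of (class_name, (row0, col0, row1, col1)) boxes
    expressed in feature-map grid cells; grid_shape: (H, W) of the feature map."""
    attn = np.full(grid_shape, BACKGROUND_WEIGHT, dtype=np.float32)
    for cls, (r0, c0, r1, c1) in prev_detections:
        weight = CLASS_WEIGHTS.get(cls, BACKGROUND_WEIGHT)
        # Keep the largest weight where boxes of different classes overlap.
        attn[r0:r1, c0:c1] = np.maximum(attn[r0:r1, c0:c1], weight)
    return attn

# Example: one vehicle and one person were detected in frame N-1.
prev = [("vehicle", (10, 12, 14, 18)), ("person", (30, 5, 32, 7))]
attention_map = build_attention_map(prev, grid_shape=(200, 200))
```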
In addition to setting the weights in the historical attention map corresponding to the elements of the first feature map by hand according to the target distribution, the weights corresponding to different types of targets in the historical attention map may also be obtained by deep learning based on the detection result of at least one preceding frame; that is, the weights in the historical attention map corresponding to the elements of the first feature map are learned from the detection result of at least one preceding frame using several layers of stacked convolutions.
Of these two approaches, setting the weights of the historical attention map by hand makes it convenient to adjust the relative sizes of the weights for different situations, while determining the weights of the historical attention map by deep learning allows the training process to automatically optimize the weight-generation process, which helps improve the deep neural network's optimization of the detection results. In practical applications, a suitable approach can be chosen according to the situation to obtain the historical attention map. In addition, the historical attention map may also be obtained in other ways, which is not limited in the present application.
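For the learned variant, a minimal sketch is given below: a few stacked convolutions map a rasterized encoding of the previous frame's detections to a weight map with values in (0, 1). The input encoding (one channel per target type) and the layer sizes are assumptions made for the example, not components specified by the patent.

```python
import torch
import torch.nn as nn

# Sketch of learning the historical attention map from the previous frame's
# detections with a few stacked convolutions.
class AttentionMapGenerator(nn.Module):
    def __init__(self, num_target_types=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_target_types, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),            # one weight in (0, 1) per region
        )

    def forward(self, prev_detection_masks):   # (B, num_types, H, W)
        return self.net(prev_detection_masks)  # (B, 1, H, W) attention map
```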
In step 130, the first feature map is processed according to the historical attention map to obtain a second feature map after the attention transfer.
After the historical attention map is obtained, the first feature map of the current frame is processed according to the historical attention map, so as to obtain the second feature map after the attention has been transferred.
Optionally, in a possible implementation of step 130, a convolution operation may be performed on the first feature map according to the weights in the historical attention map corresponding to the elements of the first feature map, to obtain the second feature map.
It should be understood that the first feature map of the current frame is the feature map before the attention transfer, and the second feature map of the current frame is the feature map after the attention transfer. In the deep neural network of the embodiments of the present application, the target detection result of the current frame is obtained based on the second feature map.
When the first feature map is processed according to the historical attention map, for example, the element value of each element of the first feature map of the current frame may be multiplied by the weight in the historical attention map corresponding to that element, to obtain the element value of each element of the second feature map after the attention transfer.
FIG. 2 is taken as an example for illustration. FIG. 2 shows the first feature map of the current frame, the historical attention map, and the second feature map of the current frame after the attention transfer. The historical attention map may be obtained based on the detection results of the frame or several frames preceding the current frame, where a region that contains a target in those detection results has a relatively large corresponding weight in the historical attention map, and a region that contains no target has a relatively small corresponding weight.
As shown in FIG. 2, taking a 3×3 feature map as an example, the first feature map includes multiple elements corresponding to multiple regions of the scene, where the element value of each element indicates how interested the deep neural network is in the region corresponding to that element. The weights in the attention map corresponding to the elements of the first feature map range between 0 and 1; a larger value indicates that the deep neural network should concentrate its attention on the corresponding region, which is a region determined from the detection result of at least one preceding frame as possibly containing a target. Multiplying the element value of each element of the first feature map by the element value at the corresponding position of the historical attention map gives the element value of each element of the second feature map. The mathematical operation of element-wise multiplication is used here to perform the attention transfer. As can be seen from FIG. 2, after the attention transfer, the attention of the deep neural network moves from row 2, column 3 of the first feature map to row 1, column 1 and row 3, column 3 of the second feature map. It can also be seen that the differences between the element values of the second feature map after the attention transfer become larger, indicating that the deep neural network's attention to the current frame has become more concentrated.
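The element-wise multiplication described above can be written in a few lines; the 3×3 numbers below are invented for illustration and are not the actual values shown in FIG. 2.

```python
import numpy as np

def attention_transfer(first_feature_map, attention_map):
    # Broadcasting also lets an (H, W) attention map scale a (C, H, W) feature map.
    return first_feature_map * attention_map

first = np.array([[0.4, 0.3, 0.1],
                  [0.4, 0.5, 0.9],    # the network initially attends most to row 2, column 3
                  [0.3, 0.2, 0.4]])
attn  = np.array([[0.9, 0.1, 0.1],
                  [0.1, 0.1, 0.1],    # history says row 1, column 1 and row 3, column 3 matter
                  [0.1, 0.1, 0.9]])
second = attention_transfer(first, attn)
# The largest values of `second` now sit at row 1, column 1 and row 3, column 3,
# matching the attention shift described for FIG. 2.
```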
In step 140, the detection result of the current frame is determined according to the second feature map.
In this embodiment, the historical attention map is obtained according to the detection result of at least one preceding frame, and attention transfer is performed on the first feature map of the current frame according to that historical attention map to obtain the second feature map of the current frame, so that the target detection result of the current frame is determined according to the second feature map.
Since an autonomous vehicle operates in a continuously changing environment, other vehicles or objects around the vehicle do not suddenly disappear or appear. Therefore, when the vehicle travels at a normal speed, adjacent point cloud frames produced by a high-rate lidar, for example one sampling at 10 Hz, are highly similar, and the target position information in the previous frame or the previous several frames provides a strong reference for target detection in the following frame. By using historical detection information, the detection performance of target detection can be improved and the occurrence of inter-frame anomalies can be reduced.
The method is applicable to any target detection scenario; in particular, it is suitable for processing the point cloud frames produced by the new type of non-repetitive-scanning lidar, in which case the detection device used to produce the multiple point cloud frames follows different scan trajectories in two adjacent point cloud frames.
Optionally, in one implementation, the method 100 further includes: updating the cached historical attention map according to the detection result of the current frame, for use in processing the feature map of the frame following the current frame.
That is to say, after the historical attention map is used to process the first feature map of the current frame to obtain the second feature map, the detection result of the current frame obtained from the second feature map is also used to update the historical attention map; the updated historical attention map is determined from the detection result of the current frame and is used to perform the attention transfer processing on the feature map of the frame following the current frame.
For example, in the target detection flowchart shown in FIG. 3, the detection result of the N-th frame is affected by the detection results of the preceding frames, so using a historical attention map is an effective way to introduce historical detection information. Here it is assumed that the attention transfer for the feature map of the N-th frame is based on the detection result of the (N-1)-th frame. During target detection on the N-th frame, it is determined whether the region corresponding to each weight in the historical attention map contains a target of interest, such as a vehicle, in the detection result of the (N-1)-th frame; if it does, the weight corresponding to that region in the historical attention map is set to a larger value, and if it does not, the weight corresponding to that region is set to a smaller value. In this way, the historical attention map needed for detecting the N-th frame is obtained from the detection result of the (N-1)-th frame.
As shown in FIG. 3, in step 301, the N-th point cloud frame is acquired; in step 302, the historical attention map is obtained according to the detection result of the (N-1)-th point cloud frame; in step 303, the data of the N-th point cloud frame and the historical attention map obtained from the detection result of the (N-1)-th frame are input into the deep neural network used for target detection; in step 304, in the intermediate stage of the deep neural network's detection of the N-th point cloud frame, the attention transfer is completed according to the historical attention map corresponding to the (N-1)-th frame and the historical detection information is introduced, so that the detection result of the N-th point cloud frame aided by the historical detection information is obtained; in step 305, the historical attention map is updated according to the detection result of the N-th point cloud frame, in preparation for processing the (N+1)-th point cloud frame.
In FIG. 3, the historical attention map used when detecting the N-th frame is generated from the detection result of the (N-1)-th frame; if the (N-1)-th frame does not exist, a historical attention map in which the weights corresponding to all regions are 0.5 may be used.
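Put together, the per-frame loop of FIG. 3 (steps 301 to 305) might look like the sketch below; `point_cloud_stream` and `detector` are assumed stand-ins for the lidar interface and the detection network, and `build_attention_map` could be a helper such as the one sketched earlier. None of these names are APIs defined by the patent.

```python
import numpy as np

def run_detection(point_cloud_stream, detector, build_attention_map, grid_shape=(200, 200)):
    # No (N-1)-th frame yet: start from a uniform map with all weights set to 0.5.
    attention_cache = np.full(grid_shape, 0.5, dtype=np.float32)
    for frame in point_cloud_stream:                     # step 301
        detections = detector(frame, attention_cache)    # steps 303-304: detect with attention transfer
        # Steps 305 / 302 for the next frame: rebuild the cached map from the
        # detections of the frame just processed.
        attention_cache = build_attention_map(detections, grid_shape)
        yield detections
```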
In the embodiments of the present application, the approach of introducing historical detection information through the attention transfer mechanism can be combined with an existing deep neural network used for target detection; the process can include two parts, training and inference. In this way, relatively few modifications are needed to give the deep neural network the ability to use historical detection information.
Because historical detection information is introduced, the deep neural network to which the attention transfer mechanism has been added needs to be retrained. During training, a point cloud frame can be drawn at random from the training data, with the frame preceding it serving as the basis for generating the historical attention map; the historical attention map is obtained by the method described above, the attention transfer for the drawn frame is completed inside the deep neural network, and the output of the deep neural network is obtained. Usually, the attention transfer can be performed in the latter part of the deep neural network: the feature map of the frame is extracted from the deep neural network, processed according to the historical attention map by the method described above to obtain the feature map after the attention transfer with the historical detection information introduced, and then fed back into the deep neural network to obtain the final detection result. Finally, the network loss function (loss) can be optimized, for example with a gradient descent optimizer, to complete the training process.
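A rough sketch of one such training iteration is given below. The module and data names (backbone, head, attn_gen, loss_fn, and a dataset of consecutive frames with rasterized previous-frame detection masks) are assumptions made to keep the example self-contained; they are not components named by the patent.

```python
import random

def train_step(dataset, backbone, head, attn_gen, loss_fn, optimizer):
    n = random.randrange(1, len(dataset))      # randomly drawn frame index (needs a predecessor)
    bev, prev_masks, target = dataset[n]       # frame N, masks rasterized from frame N-1, labels for N
    attn = attn_gen(prev_masks)                # historical attention map from the previous frame
    feat = backbone(bev)                       # first feature map of frame N
    feat = feat * attn                         # attention transfer in the latter part of the network
    pred = head(feat)                          # final detection output for frame N
    loss = loss_fn(pred, target)               # network loss
    optimizer.zero_grad()
    loss.backward()                            # optimize the loss, e.g. with a gradient descent optimizer
    optimizer.step()
    return loss.item()
```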
Once the deep neural network used for target detection has been trained, inference can be performed. At this point the lidar continuously provides consecutive point cloud frames, and the deep neural network uses the historical detection information to give real-time detection results. In implementation, for example, a matrix cache of the attention map can be maintained; whenever the detection of a frame is completed, the detection result of that frame is used to generate a new historical attention map, and the matrix cache is updated with it. The new historical attention map then guides the attention transfer of the deep neural network and introduces the historical detection information when the next frame is detected.
In this way, with relatively simple modifications to the deep neural network, historical detection information can be introduced without a significant increase in the amount of computation, thereby improving the accuracy of target detection and the robustness of the detection results.
FIG. 4 shows the detection result of an existing deep neural network without historical detection information, and FIG. 5 shows the detection result of the deep neural network of the present application with historical detection information added; the circled objects are the detected vehicles. Especially for a non-repetitive-scanning lidar, because the scan trajectories in two adjacent point cloud frames are different, targets are easily missed. The vehicle at the top of FIG. 5 is the vehicle missed in FIG. 4. It can be seen that when the solution of the present application is used for vehicle detection, after historical detection information is introduced while processing the feature map of the point cloud frame, the missed detection of distant vehicles is clearly improved, and the vehicle detection results are improved overall.
FIG. 6 is a schematic structural diagram of a device for target detection provided by an embodiment of the present application. Specifically, the device 600 includes a memory 601, a processor 602, and a data interface 603.
The memory 601 may include a volatile memory; the memory 601 may also include a non-volatile memory; the memory 601 may also include a combination of the above types of memory. The processor 602 may be a central processing unit (CPU). The processor 602 may further include a hardware target detection device. The above hardware target detection device may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination of the two, for example a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or any combination thereof.
In one implementation, the memory 601 is used to store a program, and when the program is executed, the processor 602 can call the program stored in the memory 601 to perform the following steps:
acquiring a first feature map of a current frame in a plurality of consecutive point cloud frames;
determining a historical attention map according to a detection result of at least one frame preceding the current frame;
processing the first feature map according to the historical attention map to obtain a second feature map after attention transfer; and
determining a detection result of the current frame according to the second feature map.
In one implementation, different elements of the feature map correspond to different regions of the scene, where the element value of each element indicates how interested the deep neural network is in the region corresponding to that element, and the attention map includes a weight corresponding to each element. Processing the first feature map according to the weights in the historical attention map corresponding to the elements of the first feature map includes: performing a convolution operation on the first feature map according to the historical attention map.
In one implementation, performing the convolution operation on the first feature map according to the weights in the historical attention map corresponding to the elements of the first feature map includes: multiplying the element value of each element of the first feature map of the current frame by the weight in the historical attention map corresponding to that element, to obtain the element value of each element of the second feature map after the attention transfer.
In one implementation, determining the historical attention map according to the detection result of at least one frame preceding the current frame includes: determining the weights in the historical attention map corresponding to the elements of the first feature map according to the target distribution in the detection result of the at least one preceding frame.
In one implementation, the weight in the historical attention map corresponding to a region that contains a target in the detection result of the at least one preceding frame is greater than the weight in the attention map corresponding to a region that does not contain a target.
In one implementation, different types of targets correspond to different weights in the attention map.
In one implementation, the weights corresponding to different types of targets in the attention map are determined according to the degree of interest in the different types of targets.
In one implementation, the weights corresponding to different types of targets in the attention map are obtained by deep learning based on the detection result of the previous frame.
In one implementation, the different types of targets include vehicles, persons and environmental information.
In one implementation, the detection device used to produce the multiple point cloud frames follows different scan trajectories in two adjacent point cloud frames.
In one implementation, the processor 602 is further configured to perform: updating the cached historical attention map according to the detection result of the current frame, for use in processing the feature map of the frame following the current frame.
In the embodiments of the present application, when the device for target detection processes the current frame among multiple point cloud frames, the detection result of at least one preceding frame is used. Therefore, by efficiently using historical detection information to guide the target detection process of the current frame, the accuracy of target detection and the robustness of the detection results can be significantly improved.
An embodiment of the present application also provides a vehicle-mounted radar, including the device for target detection described with reference to FIG. 6.
An embodiment of the present application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the target detection method described with reference to FIG. 1 of the embodiments of the present application, and can also implement the device for target detection described with reference to FIG. 6, which is not repeated here.
The computer-readable storage medium may be an internal storage unit of the device described in any of the foregoing embodiments, such as a hard disk or memory of the device. The computer-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the device. Further, the computer-readable storage medium may include both an internal storage unit of the device and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the device. The computer-readable storage medium may also be used to temporarily store data that has been or will be output.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be accomplished by a computer program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium. When the program is executed, it may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is only part of the embodiments of the present application and certainly cannot be used to limit the scope of the rights of the present application; equivalent changes made according to the claims of the present application therefore still fall within the scope covered by the present application.

Claims (24)

  1. A target detection method, comprising:
    acquiring a first feature map of a current frame in a plurality of consecutive point cloud frames;
    determining a historical attention map according to a detection result of at least one frame preceding the current frame;
    processing the first feature map according to the historical attention map to obtain a second feature map after attention transfer; and
    determining a detection result of the current frame according to the second feature map.
  2. The method according to claim 1, wherein different elements of the first feature map correspond to different regions of a scene, an element value of each element indicating a degree of interest of a deep neural network in the region corresponding to the element, and the historical attention map comprises a weight corresponding to each element,
    wherein processing the first feature map according to the historical attention map comprises:
    performing a convolution operation on the first feature map according to the weights in the historical attention map corresponding to the elements of the first feature map.
  3. The method according to claim 2, wherein performing the convolution operation on the first feature map according to the weights in the historical attention map corresponding to the elements of the first feature map comprises:
    multiplying the element value of each element of the first feature map of the current frame by the weight in the historical attention map corresponding to the element, to obtain an element value of each element of the second feature map after the attention transfer.
  4. The method according to claim 2 or 3, wherein determining the historical attention map according to the detection result of at least one frame preceding the current frame comprises:
    determining the weights in the historical attention map corresponding to the elements of the first feature map according to a target distribution in the detection result of the at least one preceding frame.
  5. The method according to claim 4, wherein a weight in the historical attention map corresponding to a region that contains a target in the detection result of the at least one preceding frame is greater than a weight in the historical attention map corresponding to a region that does not contain a target.
  6. The method according to claim 5, wherein different types of targets correspond to different weights in the historical attention map.
  7. The method according to claim 6, wherein the weights corresponding to different types of targets in the historical attention map are determined according to a degree of interest in the different types of targets.
  8. The method according to claim 6, wherein the weights corresponding to different types of targets in the historical attention map are obtained by deep learning according to the detection result of the at least one preceding frame.
  9. The method according to any one of claims 6 to 8, wherein the different types of targets comprise vehicles, persons and environmental information.
  10. The method according to any one of claims 1 to 9, wherein a detection device used to produce the multiple point cloud frames follows different scan trajectories in two adjacent point cloud frames.
  11. The method according to any one of claims 1 to 10, further comprising:
    updating the cached historical attention map according to the detection result of the current frame, for use in processing a feature map of a frame following the current frame.
  12. A device for target detection, comprising:
    a memory configured to store a program; and
    a processor configured to call the program, wherein, when the program is executed, the processor is configured to perform the following operations:
    acquiring a first feature map of a current frame in a plurality of consecutive point cloud frames;
    determining a historical attention map according to a detection result of at least one frame preceding the current frame;
    processing the first feature map according to the historical attention map to obtain a second feature map after attention transfer; and
    determining a detection result of the current frame according to the second feature map.
  13. The device according to claim 12, wherein different elements of the feature map correspond to different regions of a scene, an element value of each element indicating a degree of interest of a deep neural network in the region corresponding to the element, and the attention map comprises a weight corresponding to each element,
    wherein processing the first feature map according to the weights in the historical attention map corresponding to the elements of the first feature map comprises:
    performing a convolution operation on the first feature map according to the historical attention map.
  14. The device according to claim 13, wherein performing the convolution operation on the first feature map according to the weights in the historical attention map corresponding to the elements of the first feature map comprises:
    multiplying the element value of each element of the first feature map of the current frame by the weight in the historical attention map corresponding to the element, to obtain an element value of each element of the second feature map after the attention transfer.
  15. The device according to claim 14, wherein determining the historical attention map according to the detection result of at least one frame preceding the current frame comprises:
    determining the weights in the historical attention map corresponding to the elements of the first feature map according to a target distribution in the detection result of the at least one preceding frame.
  16. The device according to claim 15, wherein a weight in the historical attention map corresponding to a region that contains a target in the detection result of the at least one preceding frame is greater than a weight in the attention map corresponding to a region that does not contain a target.
  17. The device according to claim 16, wherein different types of targets correspond to different weights in the attention map.
  18. The device according to claim 17, wherein the weights corresponding to different types of targets in the attention map are determined according to a degree of interest in the different types of targets.
  19. The device according to claim 17, wherein the weights corresponding to different types of targets in the attention map are obtained by deep learning according to the detection result of the previous frame.
  20. The device according to any one of claims 17 to 19, wherein the different types of targets comprise vehicles, persons and environmental information.
  21. The device according to any one of claims 12 to 19, wherein a detection device used to produce the multiple point cloud frames follows different scan trajectories in two adjacent point cloud frames.
  22. The device according to any one of claims 12 to 19, wherein the processor is further configured to perform:
    updating the cached historical attention map according to the detection result of the current frame, for use in processing a feature map of a frame following the current frame.
  23. A vehicle-mounted radar, comprising the device for target detection according to any one of claims 12 to 22.
  24. A computer-readable storage medium storing a computer program, wherein, when executed by a processor, the computer program implements the method according to any one of claims 1 to 11.
PCT/CN2020/109879 2020-08-18 2020-08-18 Target detection method and device, and vehicle-mounted radar WO2022036567A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080006536.8A CN114450720A (en) 2020-08-18 2020-08-18 Target detection method and device and vehicle-mounted radar
PCT/CN2020/109879 WO2022036567A1 (en) 2020-08-18 2020-08-18 Target detection method and device, and vehicle-mounted radar

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/109879 WO2022036567A1 (en) 2020-08-18 2020-08-18 Target detection method and device, and vehicle-mounted radar

Publications (1)

Publication Number Publication Date
WO2022036567A1 (en) 2022-02-24

Family

ID=80322389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/109879 WO2022036567A1 (en) 2020-08-18 2020-08-18 Target detection method and device, and vehicle-mounted radar

Country Status (2)

Country Link
CN (1) CN114450720A (en)
WO (1) WO2022036567A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120328161A1 (en) * 2011-06-22 2012-12-27 Palenychka Roman Method and multi-scale attention system for spatiotemporal change determination and object detection
CN108171141A (en) * 2017-12-25 2018-06-15 淮阴工学院 The video target tracking method of cascade multi-pattern Fusion based on attention model
CN108509949A (en) * 2018-02-05 2018-09-07 杭州电子科技大学 Object detection method based on attention map
CN109740416A (en) * 2018-11-19 2019-05-10 深圳市华尊科技股份有限公司 Method for tracking target and Related product
CN110287826A (en) * 2019-06-11 2019-09-27 北京工业大学 A kind of video object detection method based on attention mechanism
CN111259940A (en) * 2020-01-10 2020-06-09 杭州电子科技大学 Target detection method based on space attention map

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882024A (en) * 2022-07-07 2022-08-09 深圳市信润富联数字科技有限公司 Target object defect detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114450720A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
Gerdzhev et al. Tornado-net: multiview total variation semantic segmentation with diamond inception module
KR102141163B1 (en) Neural network learning method and apparatus for generating synthetic aperture radar image
US10509987B1 (en) Learning method and learning device for object detector based on reconfigurable network for optimizing customers' requirements such as key performance index using target object estimating network and target object merging network, and testing method and testing device using the same
CN109964237A (en) Picture depth prediction neural network
CN113284054A (en) Image enhancement method and image enhancement device
US11657475B2 (en) Machine learned registration and multi-modal regression
US20220262002A1 (en) Feedbackward decoder for parameter efficient semantic image segmentation
EP3686785A1 (en) Learning method and learning device for fluctuation-robust object detector based on cnn using target object estimating network and testing method and testing device using the same
CN111914997A (en) Method for training neural network, image processing method and device
CN110415280B (en) Remote sensing image and building vector registration method and system under multitask CNN model
CN112215332A (en) Searching method of neural network structure, image processing method and device
CN112287824A (en) Binocular vision-based three-dimensional target detection method, device and system
WO2022036567A1 (en) Target detection method and device, and vehicle-mounted radar
CN116486288A (en) Aerial target counting and detecting method based on lightweight density estimation network
CN113610087A (en) Image small target detection method based on prior super-resolution and storage medium
CN112734931A (en) Method and system for assisting point cloud target detection
KR20220089602A (en) Method and apparatus for learning variable CNN based on non-correcting wide-angle image
JP6992099B2 (en) Information processing device, vehicle, vehicle control method, program, information processing server, information processing method
CN112132780A (en) Reinforcing steel bar quantity detection method and system based on deep neural network
CN110880003A (en) Image matching method and device, storage medium and automobile
CN115346184A (en) Lane information detection method, terminal and computer storage medium
CN114998630A (en) Ground-to-air image registration method from coarse to fine
EP3736730A1 (en) Convolutional neural network with reduced complexity
CN113963178A (en) Method, device, equipment and medium for detecting infrared dim and small target under ground-air background
CN112967399A (en) Three-dimensional time sequence image generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20949779

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20949779

Country of ref document: EP

Kind code of ref document: A1