CN112528763A

CN112528763A - Target detection method, electronic device and computer storage medium

Info

Publication number: CN112528763A
Application number: CN202011334487.9A
Authority: CN
Inventors: 金智; 缪其恒
Original assignee: Zhejiang Dahua Automobile Technology Co ltd
Current assignee: Zhejiang Dahua Automobile Technology Co ltd
Priority date: 2020-11-24
Filing date: 2020-11-24
Publication date: 2021-03-19

Abstract

The application discloses a target detection method, an electronic device and a computer storage medium, wherein the target method comprises the following steps: acquiring image data shot by a camera device and preprocessing the image data to acquire radar data detected by a millimeter wave radar and preprocessing the radar data so as to enable the radar data and the image data to be respectively input into a convolutional neural network in the same scale; cascading the radar data and the image data in multiple levels of a convolutional neural network to obtain fusion data, and outputting a target class detection result and a key point detection result in the fusion data; and performing post-processing on the category detection result and the key point detection result to output the category and the motion state of the target. By means of the method, the radar data and the image data can be subjected to cascade fusion at multiple levels, and the characteristics of the two data are fully fused, so that the accuracy and the robustness of target detection are improved.

Description

Target detection method, electronic device and computer storage medium

Technical Field

The present application relates to the field of intelligent recognition technologies, and in particular, to a target detection method, an electronic device, and a computer storage medium.

Background

With the development of the fields of interest of artificial intelligence, automatic driving and the like, intelligent identification becomes an important research direction. The target detection is the core research field of automatic driving, the accuracy of the target detection result has a very great influence on the safety of automatic driving, and the existing target detection method is difficult to effectively utilize the characteristics of data collected by a sensing device, so that the target identification effect is poor. In view of the above, how to improve the accuracy of target detection is an urgent problem to be solved.

Disclosure of Invention

The technical problem mainly solved by the application is to provide a target detection method, electronic equipment and a computer storage medium, which can perform cascade fusion on radar data and image data at multiple levels, and fully fuse the characteristics of the two data so as to improve the accuracy and robustness of target detection.

In order to solve the above technical problem, a first aspect of the present application provides a target detection method, including: acquiring image data shot by a camera device and preprocessing the image data to acquire radar data detected by a millimeter wave radar and preprocessing the radar data so as to enable the radar data and the image data to be respectively input into a convolutional neural network in the same scale; cascading the radar data and the image data in multiple levels of the convolutional neural network to obtain fusion data, and outputting a category detection result and a key point detection result of a target in the fusion data; and performing post-processing on the category detection result and the key point detection result to output the category and the motion state of the target.

The steps of obtaining radar data detected by the millimeter wave radar and preprocessing the radar data include: obtaining target data including the target detected by the millimeter wave radar at a current time point; compensating the target at the current time point by using historical data which is detected by the millimeter wave radar before the current time point and contains the target, and generating radar data at the current time point; obtaining position information of the target in the radar data at the current time point, and projecting the target to a pixel point of the image data at the current time point according to the position information to obtain target enhancement data of the radar data on the image data; and carrying out normalization processing on the target data.

The target data includes a relative distance, a relative speed and a scattering cross section of the target relative to the millimeter wave radar, and after the step of normalizing the target data, the method further includes: and generating three-channel data according to the distance, the relative speed and the scattering cross section corresponding to the target.

The step of obtaining image data shot by the camera device and preprocessing the image data comprises the following steps: adjusting image processing parameters of the camera device; obtaining the adjusted image data shot by the camera device; and carrying out normalization processing on the image data.

Wherein the step of concatenating the radar data and the image data in a plurality of levels of the convolutional neural network to obtain fused data, and outputting a category detection result and a keypoint detection result of a target in the fused data includes: cascading the radar data maps with different scales obtained after the radar data passes through different levels of the convolutional neural network and the image data maps with the same scale obtained after the image data passes through different levels of the neural network for multiple times to obtain a fusion characteristic map; and outputting the category detection result and the key point detection result of the target according to the fusion feature map.

Wherein, after the step of concatenating the radar data and the image data in multiple hierarchies of the convolutional neural network to obtain fused data and outputting a category detection result and a keypoint detection result of a target in the fused data, the method further includes: and acquiring the key point detection result corresponding to the adjacent time point before the current time point, further acquiring the coincidence coefficient of the key point detection results corresponding to the current time point and the adjacent time point, and determining the key points with the coincidence coefficient larger than a first threshold value as the same target so as to acquire a target time sequence matching result.

Wherein the step of post-processing the category detection result and the key point detection result to output the category and the motion state of the target includes: acquiring the accumulated detection times of the target at the current time point according to the target time sequence matching result; acquiring the accumulated identification times of the category of the target at the current time point according to the category detection result; when the ratio of the accumulated identification times to the accumulated detection times of the same target reaches a second threshold, determining the target as a category reaching the second threshold; and obtaining the motion state of the target by using a preset filtering method according to the key point detection result.

Wherein, the step of obtaining the motion state of the target by using a preset filtering method according to the detection result of the key point comprises: obtaining a world coordinate system of the camera device, and obtaining the position of the target in the world coordinate system according to the detection result of the key point; and acquiring the motion state of the target in the world coordinate system by using an interactive multi-model filtering method.

In order to solve the above technical problem, a second aspect of the present application provides an electronic device, which includes a memory and a processor coupled to each other, wherein the memory stores program data, and the processor calls the program data to execute the object detection method of the first aspect.

In order to solve the above technical problem, a third aspect of the present application provides a computer storage medium having stored thereon program data that, when executed by a processor, implements the object detection method of the first aspect.

The beneficial effect of this application is: according to the method, the radar data and the image data are preprocessed respectively, so that the radar data and the image data are input into the convolutional neural network in the same scale, the radar data and the image data are subjected to cascade fusion in multiple levels, after the characteristics of the two kinds of data are fused sufficiently, the category detection result and the key point detection result of the target are output according to the data after the characteristics are fused, the accuracy of the category and the motion state of the target output according to the category detection result and the key point detection result is improved, and the accuracy and the robustness of target detection are improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:

fig. 1 is a schematic flowchart illustrating a target detection method according to an embodiment of the present disclosure;

fig. 2 is a schematic flowchart of another embodiment of a target detection method provided in the present application;

FIG. 3 is a flowchart illustrating an embodiment corresponding to step S202 in FIG. 2;

FIG. 4 is a schematic flowchart of an embodiment corresponding to step S203 in FIG. 2;

FIG. 5 is a diagram of a topology corresponding to a convolutional neural network;

FIG. 6 is a flowchart illustrating an embodiment corresponding to step S205 in FIG. 2;

FIG. 7 is a flowchart illustrating an embodiment corresponding to step S604 in FIG. 6;

FIG. 8 is a schematic structural diagram of an embodiment of an electronic device provided in the present application;

FIG. 9 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.

Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a target detection method according to an embodiment of the present disclosure, the method including:

step S101: and acquiring image data shot by the camera device, preprocessing the image data, acquiring radar data detected by the millimeter wave radar, and preprocessing the radar data so that the radar data and the image data are respectively input into the convolutional neural network in the same scale.

Specifically, a camera device and a millimeter wave radar are provided on the unmanned vehicle for image capturing and obstacle detection, respectively. The millimeter wave radar has the characteristics of strong penetration capability, no influence of weather, small and compact size, high identification precision, long detection distance and relatively low price cost, so that the millimeter wave radar can be suitable for severe natural environment, and can improve the detection range and detection precision of the unmanned automobile when being applied to the field of unmanned driving.

Specifically, image data photographed by a camera is acquired and preprocessed, and linear correction, noise removal, dead pixel removal, and white balance are performed on the original image data to enhance the image quality of effective data on the image data. The method comprises the steps of obtaining radar data detected by a millimeter wave radar, wherein the radar data comprise scanning points of a plurality of targets, projecting the scanning points of the targets in the radar data to target pixel points of image data synchronous in time and space, so that the scanning points in the radar data are compensated by the pixel points around the target pixel points, and further increasing the radar data density to obtain target enhancement data.

Further, the scales of the processed radar data and the processed image data are adjusted to be the same, and if the radar data and the image data are regarded as a group of matrixes, the step of adjusting the scales to be the same is to adjust the matrixes corresponding to the radar data and the image data to be the same in length and width.

Step S102: and cascading the radar data and the image data in multiple hierarchies of the convolutional neural network to obtain fusion data, and outputting a target class detection result and a key point detection result in the fusion data.

Specifically, radar data and image data which are preprocessed and have the same scale are respectively sent to a convolutional neural network, a first radar feature output after the radar data passes through a first level in the convolutional neural network is cascaded with a first image feature output after the image data passes through the first level to obtain a first fusion image, the first fusion image is sent to the next level of the convolutional neural network, a second radar feature output after the radar data passes through a second level in the convolutional neural network is cascaded with a second image output after the first fusion image passes through the second level to obtain a second fusion image, and by analogy, multiple scales of radar features of the radar data are fused with the image data for multiple times to further obtain the fusion data.

Further, the convolutional neural network outputs a target category detection result and a key point detection result on the fusion data according to the fusion data obtained through multiple cascades. The target in the fusion data is a target in the radar data, and the class detection result indicates which class the target belongs to, such as: and the detection results of people, animals, buildings, plants and vehicles and key points give the position information of the target in the fusion data.

Step S103: and performing post-processing on the category detection result and the key point detection result to output the category and the motion state of the target.

Specifically, the method includes the steps of obtaining the number N of times that targets in radar data at multiple time points are recognized within a period of time, and determining the category of the targets as the category corresponding to the maximum recognition number when the M/N value reaches over 90% according to the maximum recognition number M of the category of the corresponding targets in the category detection result.

Further, position information of the target in the key point detection result is obtained, the probability of each motion state is output by considering various current possible motion states of the unmanned automobile, the state with the highest probability is used as the current state of the unmanned automobile, and then the motion state of the target relative to the current unmanned automobile is obtained.

The target detection method provided by this embodiment respectively preprocesses radar data and image data, so that the radar data and the image data are input to a convolutional neural network at the same scale, the radar data and the image data are cascaded and fused at multiple levels, after the features of the two types of data are fully fused, the category detection result and the key point detection result of a target are output according to the data with the fused features, so as to improve the accuracy of outputting the category and the motion state of the target according to the category detection result and the key point detection result, and further improve the accuracy and the robustness of target detection.

Referring to fig. 2, fig. 2 is a schematic flow chart illustrating a target detection method according to another embodiment of the present application, the method including:

step S201: and acquiring image data shot by the camera device and preprocessing the image data.

Specifically, the Image data preprocessing mainly includes exposure parameters, gain parameters, white balance parameters, 3D noise reduction, and digital wide dynamic parameter adjustment, and an Image processing module (ISP) of the Image pickup apparatus is adjusted to obtain Image processing parameters adapted to the current environment.

Optionally, the step of obtaining and pre-processing image data captured by the image capturing device comprises: adjusting image processing parameters of the camera device; obtaining the adjusted image data shot by the camera device; and carrying out normalization processing on the image data.

Specifically, parameters of the image processing module are adjusted to be adaptive to the current environment so as to improve the matching degree of various parameters of the camera device and the current environment, the quality of image data is improved, after the adjustment of the image processing parameters is completed, the image data shot by the camera device with the adjusted image processing parameters is obtained, the image is further cut and/or zoomed, further the image data subjected to preliminary preprocessing is subjected to normalization processing, and the image data subjected to preliminary preprocessing is converted into a standard form through a series of conversion.

Step S202: and radar data detected by the millimeter wave radar is obtained and preprocessed.

Specifically, referring to fig. 3, fig. 3 is a schematic flowchart illustrating an embodiment corresponding to step S202 in fig. 2, where the step S202 includes:

step S301: target data including a target detected by the millimeter wave radar at the current time point is obtained.

Specifically, the millimeter wave radar scans the surrounding environment, identifies and outputs a group of targets corresponding to the current time point, the millimeter wave radar sends out electromagnetic waves with the wavelength of 1-10 mm when scanning the targets, the target object distance is obtained according to the time of the round trip flight from sending to receiving of the electromagnetic waves, the moving speed of the targets relative to the radar is obtained by calculating the frequency change of the returned electromagnetic waves according to the Doppler effect, and the azimuth angles of the targets are calculated according to the phase difference of the electromagnetic waves reflected by the same targets received by the parallel receiving antennas. Therefore, the millimeter wave radar further generates target data of the target when scanning the target, the target data including the relative distance, the relative speed, and the scattering cross section of the target with respect to the millimeter wave radar.

Step S302: and compensating the target at the current time point by using the historical data containing the target detected by the millimeter wave radar before the current time point, and generating radar data at the current time point.

Specifically, a target scanned by the millimeter wave radar and target data corresponding to the target are cached for a predetermined time, which may be scanning cycle time of the millimeter wave radar, historical data including the target before a current time point are acquired, and the historical data and the target data at the current time point are combined to generate radar data at the current time point, so that the target in the radar data at the current time point is more comprehensive.

Step S303: and acquiring the position information of the target in the radar data at the current time point, and projecting the target to the pixel point of the image data at the current time point according to the position information to acquire target enhancement data of the radar data on the image data.

Specifically, combining the kinematics of the radar, the relative motion of the target and the radar, calculating the position of the target in the radar data at the current time point through a uniform acceleration model, projecting the position of the target onto the image data of the current time point, and after one target is projected onto a pixel point on the image data, compensating the target by the pixel point around the pixel point corresponding to the target so as to increase the density of the radar data to obtain target enhancement data, thereby enriching the characteristics of the target at the corresponding position and improving the accuracy of target position identification.

Specifically, the projection of the target in the radar data onto the image data can be represented by the following formula:

X_Cam＝P_CamT_CamRadX_Rad (1)

wherein, P_CamIs the projection matrix of the camera device, determined by the internal and external parameters of the camera device, T_CamRadIs an external parameter matrix, X, of millimeter wave radar to a camera device_RadIs a homogeneous vector of the target.

Step S304: and carrying out normalization processing on the target data.

Specifically, the relative distance, the relative speed and the scattering cross section of the target with respect to the millimeter wave radar are normalized to output data of the same scale, so that the target data is converted into normalized data.

Further, the target data includes a relative distance, a relative speed and a scattering cross section of the target with respect to the millimeter wave radar, and after the step of normalizing the target data, the method further includes: and generating three-channel data according to the distance, the relative speed and the scattering cross section corresponding to the target.

Specifically, data of three channels, namely distance, relative speed and scattering cross section, are generated for the target on the target enhancement data, and all the data are subjected to normalization processing, so that the data are input into the convolutional neural network to be analyzed for different types of data respectively, and the influence of data mixture on data analysis is avoided.

Step S203: and cascading the radar data and the image data in multiple hierarchies of the convolutional neural network to obtain fusion data, and outputting a target class detection result and a key point detection result in the fusion data.

Specifically, referring to fig. 4, fig. 4 is a schematic flowchart illustrating an embodiment corresponding to step S203 in fig. 2, where step S203 includes:

step S401: and performing cascade connection on the radar data maps with different scales obtained by the radar data after passing through different levels of the convolutional neural network and the image data maps with the same scale obtained by the image data after passing through different levels of the neural network for multiple times to obtain a fusion characteristic map.

Specifically, the radar data and the image data are normalized and then have the same scale, radar data maps with different scales output after the radar data passes through a plurality of levels such as convolution, pooling and activation in a convolutional neural network are cascaded with image data maps with the same scale after passing through the convolutional neural network, and then the image data maps fused with the radar data maps are input into the next level. After multiple cascades, a fusion characteristic map is obtained.

In an application manner, please refer to fig. 5, where fig. 5 is a topology structure diagram corresponding to a convolutional neural network, where side lengths in the diagram represent scales of data, the side lengths are equal and represent the scales of the data, and a change in the side lengths represents a scaling of the scales of the data. The method comprises the steps of performing convolution, pooling and activation on radar data to generate radar data maps rf1, rf2, rf3 and rf4 of different scales, cascading the radar data map rf1 with an image data map pf1 of the same scale to generate a first fusion feature map f1, sending the first fusion feature map f1 to the next level, outputting the image data map pf2, cascading a radar data map rf2 of the same scale with the image data map pf2 to generate a second fusion feature map f2, inputting the second fusion feature map f2 into an image data map pf3 of the next level, cascading a radar data map rf3 of the same scale with the image data map pf3 to generate a third fusion feature map f3, inputting the third fusion feature map f3 into an image data map pf4 of the next level, and cascading the radar data map rf4 of the same scale with the image data map pf4 to generate a final fusion feature map 4. In the application mode, the radar data and the image data are subjected to cascade fusion at a plurality of levels, and the characteristics of the two data are fully fused to obtain a fusion characteristic map f 4.

Step S402: and outputting a target category detection result and a key point detection result according to the fusion feature map.

Specifically, the convolutional neural network identifies the target on the fused feature map, analyzes the category of the target and detects the position of the target, and then outputs a category detection result and a key point detection result of the target.

Further, in order to improve the ability of the convolutional neural network to identify the target, the convolutional neural network needs to be trained, a loss function in the training process includes target class loss and key point regression loss, and a calculation formula of the loss function is as follows:

where i denotes the sample number and m denotes the keypoint location.

Step S204: and obtaining a key point detection result corresponding to an adjacent time point before the current time point, further obtaining a coincidence coefficient of the key point detection results corresponding to the current time point and the adjacent time point, and determining the key points with the coincidence coefficient larger than a first threshold value as the same target so as to obtain a target time sequence matching result.

Specifically, based on the detection result of the key point of at least one adjacent time point before the current time point, the coincidence coefficient of the target in the current time point and the target in the adjacent time point is calculated, and the time sequence target is matched in the image corresponding to the current time point. If the coincidence coefficient of the target is larger than the first threshold value, the target is matched as the same target, wherein the calculation formula of the coincidence coefficient is as follows:

wherein, area_iPosition information of an object representing a current time point, area_jPosition information of objects representing adjacent time points.

In a specific application scene, the camera device collects 25 frames of image data every second, the interval of each frame of image data is 40 milliseconds, the coincidence coefficient of the targets in the key point detection result of the current time point and the key point detection result of the previous time point is calculated, if the coincidence coefficient is greater than 90%, the targets are judged to be the same target, the same targets in different time points are screened out, the probability that the targets are repeatedly identified is reduced, and the accuracy of target identification in a period of time is improved.

Step S205: and performing post-processing on the category detection result and the key point detection result to output the category and the motion state of the target.

Specifically, referring to fig. 6, fig. 6 is a flowchart illustrating an embodiment corresponding to step S205 in fig. 2, where step S205 specifically includes:

step S601: and acquiring the accumulated detection times of the target at the current time point according to the target time sequence matching result.

Specifically, according to the target timing matching result, if a historical target which is not matched with the target before the current time point exists in the image at the current time point, a new sequence number is generated for the new target which is not matched with the historical target, and corresponding category information is stored. And if the target in the image at the current time point is matched with the historical target before the current time point, increasing 1 for the life cycle of the target successfully matched with the historical target, and setting 0 for the loss cycle, wherein the life cycle is the accumulated detection times of the target detected. And when the time sequence target in the current time point is not matched in the historical targets, increasing the loss period of the target by 1, and if the loss period exceeds a certain threshold value, deleting the target from the historical targets.

Step S602: and acquiring the accumulated identification times of the category of the target at the current time point according to the category detection result.

Specifically, according to a target time sequence matching result and a category detection result in a period of time, the accumulated detection times N of the same target and the category C identified by the maximum times of the target are obtained_iAccumulated number of identifications M_i。

Step S603: and when the ratio of the accumulated identification times to the accumulated detection times of the same target reaches a second threshold, determining the target as the class reaching the second threshold.

Specifically, when M_iwhen/N is greater than the second threshold, then the target class is set to C_iFor the purpose that the cumulative number of detections reaches N, the category C identified as the largest number of the N times is selected_iQuoting with N, judging the target as class C only if reaching a second threshold value_iAnd further, the influence of discrete values in the category detection result on the category identification result is reduced, the value with the highest possibility in the multiple identification results is taken as the category of the target, and the accuracy of the identification result is improved.

Step S604: and obtaining the motion state of the target by using a preset filtering method according to the key point detection result.

Specifically, referring to fig. 7, fig. 7 is a flowchart illustrating an embodiment corresponding to step S604 in fig. 6, where step S604 includes:

step S701: and acquiring a world coordinate system of the camera device, and acquiring the position of the target in the world coordinate system according to the detection result of the key point.

Specifically, a world coordinate system corresponding to the image capturing device is obtained, and the position information in the detection result of the key point of the target is converted into the world coordinate system to obtain the position of the target relative to the image capturing device.

Step S702: and acquiring the motion state of the target in the world coordinate system by using an interactive multi-model filtering method.

Specifically, an interactive multi-model filtering method is utilized, four possible motion models of uniform speed, uniform acceleration, left turning and right turning are considered, and the motion state of the target in the world coordinate system of the camera device is updated based on adaptive Kalman filtering.

Furthermore, by utilizing an interactive multi-model filtering method, prediction results and corresponding probabilities of the targets under various motion models are provided, so that the targets under different states can output data according with the motion states of the targets, and the method is suitable for road driving environments of different scenes.

Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of an electronic device 80 provided in the present application, where the electronic device 80 includes a memory 801 and a processor 802 coupled to each other, where the memory 801 stores program data (not shown), and the processor 802 invokes the program data to implement the target detection method in any of the embodiments described above, and the description of relevant contents refers to the detailed description of the embodiments of the methods described above, which is not repeated herein.

Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application, the computer storage medium 90 stores program data 900, and the program data 900 is executed by a processor to implement the object detection method in any of the above embodiments, and the description of the related contents refers to the detailed description of the above method embodiments, which is not repeated herein.

It should be noted that, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims

1. A method of object detection, the method comprising:

acquiring image data shot by a camera device and preprocessing the image data to acquire radar data detected by a millimeter wave radar and preprocessing the radar data so as to enable the radar data and the image data to be respectively input into a convolutional neural network in the same scale;

cascading the radar data and the image data in multiple levels of the convolutional neural network to obtain fusion data, and outputting a category detection result and a key point detection result of a target in the fusion data;

and performing post-processing on the category detection result and the key point detection result to output the category and the motion state of the target.

2. The method of claim 1, wherein the step of obtaining and pre-processing radar data for millimeter wave radar detection comprises:

obtaining target data including the target detected by the millimeter wave radar at a current time point;

compensating the target at the current time point by using historical data which is detected by the millimeter wave radar before the current time point and contains the target, and generating radar data at the current time point;

obtaining position information of the target in the radar data at the current time point, and projecting the target to a pixel point of the image data at the current time point according to the position information to obtain target enhancement data of the radar data on the image data;

and carrying out normalization processing on the target data.

3. The method of claim 2, wherein the target data includes a relative distance, a relative velocity, and a scattering cross-section of the target relative to the millimeter wave radar, and wherein the step of normalizing the target data further comprises:

and generating three-channel data according to the distance, the relative speed and the scattering cross section corresponding to the target.

4. The method of claim 1, wherein the step of obtaining and pre-processing image data captured by an imaging device comprises:

adjusting image processing parameters of the camera device;

obtaining the adjusted image data shot by the camera device;

and carrying out normalization processing on the image data.

5. The method of claim 1, wherein the step of concatenating the radar data and the image data in multiple levels of the convolutional neural network to obtain fused data, and outputting a class detection result and a keypoint detection result of a target in the fused data comprises:

cascading the radar data maps with different scales obtained after the radar data passes through different levels of the convolutional neural network and the image data maps with the same scale obtained after the image data passes through different levels of the neural network for multiple times to obtain a fusion characteristic map;

and outputting the category detection result and the key point detection result of the target according to the fusion feature map.

6. The method of claim 1, wherein the step of concatenating the radar data and the image data in multiple levels of the convolutional neural network to obtain fused data, and outputting a class detection result and a keypoint detection result of a target in the fused data further comprises:

and acquiring the key point detection result corresponding to the adjacent time point before the current time point, further acquiring the coincidence coefficient of the key point detection results corresponding to the current time point and the adjacent time point, and determining the key points with the coincidence coefficient larger than a first threshold value as the same target so as to acquire a target time sequence matching result.

7. The method according to claim 6, wherein the step of post-processing the class detection result and the keypoint detection result to output the class and the motion state of the target comprises:

acquiring the accumulated detection times of the target at the current time point according to the target time sequence matching result;

acquiring the accumulated identification times of the category of the target at the current time point according to the category detection result;

when the ratio of the accumulated identification times to the accumulated detection times of the same target reaches a second threshold, determining the target as a category reaching the second threshold;

and obtaining the motion state of the target by using a preset filtering method according to the key point detection result.

8. The method according to claim 7, wherein the step of obtaining the motion state of the target by using a preset filtering method according to the key point detection result comprises:

obtaining a world coordinate system of the camera device, and obtaining the position of the target in the world coordinate system according to the detection result of the key point;

and acquiring the motion state of the target in the world coordinate system by using an interactive multi-model filtering method.

9. An electronic device, comprising: a memory and a processor coupled to each other, wherein the memory stores program data that the processor calls to perform the method of any of claims 1-8.

10. A computer storage medium having program data stored thereon, which program data, when executed by a processor, implements the method according to any one of claims 1-8.