CN112712129A

CN112712129A - Multi-sensor fusion method, device, equipment and storage medium

Info

Publication number: CN112712129A
Application number: CN202110033933.0A
Authority: CN
Inventors: 徐�明; 刘强; 蔡振伟; 李杉杉; 徐丽华; 黄启明
Original assignee: Shenzhen ZNV Technology Co Ltd; Nanjing ZNV Software Co Ltd
Current assignee: Nanjing Liwei Zhilian Technology Co.,Ltd.; Shenzhen ZNV Technology Co Ltd
Priority date: 2021-01-11
Filing date: 2021-01-11
Publication date: 2021-04-27
Anticipated expiration: 2041-01-11
Also published as: CN112712129B

Abstract

The invention discloses a multi-sensor fusion method, a multi-sensor fusion device, a multi-sensor fusion equipment and a multi-sensor fusion storage medium, wherein image data are obtained based on a camera, and primary regression processing is carried out on the image data to obtain primary regression attributes of an object; acquiring initial point cloud data based on a millimeter wave radar, and aligning the initial point cloud data with the image data to obtain aligned point cloud data; generating a fusion feature map based on the aligned point cloud data and the primary regression attributes; performing secondary regression processing on the fusion feature map to obtain target attribute information; and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, and detecting the attribute information of the target object according to the target reasoning model. The invention realizes that the attribute information of the object can be accurately and stably sensed based on the millimeter wave radar and the camera in the three-dimensional target detection task.

Description

Multi-sensor fusion method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of multi-sensor fusion, in particular to a multi-sensor fusion method, a multi-sensor fusion device, multi-sensor fusion equipment and a storage medium.

Background

With the development of industries such as urban security and intelligence, industrial intelligence, power internet of things and the like, autonomous edge devices and industrial autonomous robots are increasingly applied, and a sensing system is required to be capable of comprehensively detecting, identifying and tracking target objects. Currently, in the fields of unmanned driving, robots, and the like, most sensor fusion methods focus on three-dimensional object detection using a laser radar and a camera. Lidar calculates the distance to surrounding objects by the time of flight of a laser pulse. Lidar provides precise near-range depth information, which is often represented in the form of point cloud data, but as the distance increases, the point cloud data becomes sparse, resulting in a reduced ability to perceive distant objects. The camera provides rich two-dimensional information such as colors, textures and the like, but cannot sense the depth of an object image. These complementary features make lidar and camera sensor integration a topic of interest in recent years. Such a combination has been successful in proving high accuracy three-dimensional target detection in many applications, but has its limitations, since both cameras and lidar are sensitive to severe weather conditions (e.g. snow, fog, rain, etc.) and cannot detect the speed of objects without the use of time information. However, in many cases, estimating the speed of an object is a key requirement for avoiding a collision, especially in a scenario such as automatic driving, and therefore, determining the speed of the object by relying on time information to avoid a collision may not be a feasible solution in a situation where time information is critical, and thus attribute information such as the height, width, category, and posture of the object cannot be accurately and stably sensed based on a lidar and camera scheme in a three-dimensional target detection task.

Disclosure of Invention

The invention mainly aims to provide a multi-sensor fusion method, a multi-sensor fusion device, a multi-sensor fusion equipment and a storage medium, and aims to solve the technical problem that the attribute information of an object cannot be accurately and stably sensed based on a laser radar and camera scheme in the current three-dimensional target detection task.

In order to achieve the above object, an embodiment of the present invention provides a multi-sensor fusion method, where the multi-sensor fusion method includes:

acquiring image data based on a camera, and performing primary regression processing on the image data to obtain a primary regression attribute of an object;

acquiring initial point cloud data based on a millimeter wave radar, and aligning the initial point cloud data with the image data to obtain aligned point cloud data;

generating a fusion feature map based on the aligned point cloud data and the primary regression attributes;

performing secondary regression processing on the fusion feature map to obtain target attribute information;

and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, and detecting the attribute information of the target object according to the target reasoning model.

Preferably, the step of generating a fused feature map based on the aligned point cloud data and the primary regression attributes comprises:

generating an intermediate feature map according to the alignment point cloud data and the primary regression attribute;

and connecting the intermediate feature map with the feature map channel of the primary regression attribute to form a fused feature map.

Preferably, the step of generating an intermediate feature map from the aligned point cloud data and the primary regression attributes comprises:

filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data;

expanding struts for the target point cloud data to form strut characteristic information;

correlating the pillar feature information with the primary regression attribute to obtain a target pillar feature;

and creating a feature map channel with complementary depth and speed according to the target strut feature, and obtaining an intermediate feature map.

Preferably, the step of filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data includes:

creating a 3D region-of-interest frustum and a 3D bounding box region-of-interest according to the primary regression attribute;

and filtering the aligned point cloud data according to the 3D region-of-interest frustum and the 3D bounding box region-of-interest to obtain target point cloud data.

Preferably, the step of performing a secondary regression process on the fused feature map to obtain target attribute information includes:

extracting target features from the fused feature map;

and performing regression operation on the target characteristics to obtain target attribute information.

Preferably, the step of performing a primary regression process on the image data to obtain a primary regression attribute of the object includes:

calling the initial convolutional neural network, carrying out target detection on an object in the image data through the initial convolutional neural network, and generating an initial characteristic diagram for the object;

and performing primary regression processing on the initial characteristic graph to obtain a primary regression attribute of the object.

Preferably, before the step of inferring the initial convolutional neural network based on the target attribute information to obtain the target inference model, the method further includes:

and acquiring a preset convolutional neural network and training data, and training the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.

To achieve the above object, the present invention also provides a multi-sensor fusion apparatus, including:

the primary regression processing module is used for acquiring image data based on the camera and performing primary regression processing on the image data to obtain a primary regression attribute of the object;

the alignment processing module is used for acquiring initial point cloud data based on a millimeter wave radar, and aligning the initial point cloud data with the image data to obtain aligned point cloud data;

the data fusion module is used for generating a fusion feature map based on the alignment point cloud data and the primary regression attribute;

the secondary regression processing module is used for carrying out secondary regression processing on the fusion characteristic graph to obtain target attribute information;

and the model reasoning module is used for reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model so as to detect the attribute information of the target object according to the target reasoning model.

Further, to achieve the above object, the present invention also provides a multi-sensor fusion device, which includes a memory, a processor, and a multi-sensor fusion program stored in the memory and executable on the processor, wherein the multi-sensor fusion program, when executed by the processor, implements the steps of the multi-sensor fusion method described above.

Further, to achieve the above object, the present invention also provides a storage medium, on which a multi-sensor fusion program is stored, and the multi-sensor fusion program implements the steps of the multi-sensor fusion method when executed by a processor.

The embodiment of the invention provides a multi-sensor fusion method, a multi-sensor fusion device, a multi-sensor fusion equipment and a multi-sensor fusion storage medium, wherein image data are obtained based on a camera, and primary regression processing is carried out on the image data to obtain a primary regression attribute of an object; acquiring initial point cloud data based on a millimeter wave radar, and aligning the initial point cloud data with the image data to obtain aligned point cloud data; generating a fusion feature map based on the aligned point cloud data and the primary regression attributes; performing secondary regression processing on the fusion feature map to obtain target attribute information; and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, and detecting the attribute information of the target object according to the target reasoning model. According to the invention, because the millimeter wave radar is not sensitive to severe weather, the initial point cloud data can be stably obtained and provided, and the target inference model is obtained by carrying out quadratic regression processing on the fusion characteristic diagram and then inferring the initial convolution neural network, so that the attribute information of the object can be accurately and stably sensed based on the millimeter wave radar and the camera in the three-dimensional target detection task.

Drawings

FIG. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the multi-sensor fusion method of the present invention;

FIG. 2 is a schematic flow chart of a multi-sensor fusion method according to a first embodiment of the present invention;

FIG. 3 is a first schematic diagram illustrating the alignment of initial point cloud data with image data according to the present invention;

FIG. 4 is a second schematic diagram illustrating the alignment of the initial point cloud data with the image data according to the present invention;

FIG. 5 is a third schematic diagram illustrating alignment of initial point cloud data with image data according to the present invention;

FIG. 6 is a schematic flow chart of a multi-sensor fusion method according to a second embodiment of the present invention;

FIG. 7 is a schematic diagram of frustum association performed by the multi-sensor fusion method of the present invention;

FIG. 8 is a schematic overall flow chart of a multi-sensor fusion method according to the present invention;

FIG. 9 is a functional block diagram of a multi-sensor fusion device according to a preferred embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, fig. 1 is a schematic structural diagram of a multi-sensor fusion device in a hardware operating environment according to an embodiment of the present invention.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for facilitating the explanation of the present invention, and have no specific meaning in itself. Thus, "module", "component" or "unit" may be used mixedly.

The multi-sensor fusion device can be a PC, or a mobile terminal device such as a tablet computer and a portable computer.

As shown in fig. 1, the multi-sensor fusion apparatus may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the multi-sensor fusion device configuration shown in fig. 1 does not constitute a limitation of multi-sensor fusion devices and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and a multi-sensor fusion program.

In the device shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke the multi-sensor fusion program stored in the memory 1005 and perform the following operations:

Further, the step of generating a fused feature map based on the aligned point cloud data and the primary regression attributes comprises:

Further, the step of generating an intermediate feature map from the aligned point cloud data and the primary regression attributes comprises:

Further, the step of filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data includes:

Further, the step of performing a secondary regression process on the fusion feature map to obtain target attribute information includes:

extracting target features from the fused feature map;

Further, the step of performing a primary regression process on the image data to obtain a primary regression attribute of the object includes:

Further, before the step of inferring a target inference model for the initial convolutional neural network based on the target attribute information, the processor 1001 may be configured to call the multi-sensor fusion program stored in the memory 1005, and perform the following operations:

For a better understanding of the above technical solutions, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.

Referring to fig. 2, a first embodiment of the invention provides a flow chart diagram of a multi-sensor fusion method. In this embodiment, the multi-sensor fusion method includes the steps of:

step S10, acquiring image data based on a camera, and performing primary regression processing on the image data to obtain a primary regression attribute of an object;

the multi-sensor fusion method in the embodiment is applied to a multi-sensor fusion system, and the multi-sensor fusion system at least comprises a millimeter wave radar and a camera, wherein the millimeter wave radar is a radar which works in a millimeter wave band for detection, and generally, millimeter waves refer to waves in a frequency domain (with a wavelength of 1-10 mm) of 30-300 GHz (gigahertz); the camera is a video input device, which is widely applied to video conferences, telemedicine, real-time monitoring and other aspects, and can also be used for video and sound conversations and communication in a network through the camera, and in addition, the camera can be used for various popular digital images, video and audio processing and the like.

Understandably, the automatic industrial robot is generally provided with different types of sensors, the sensors are complementary to the perception characteristics of the industrial scene, and the accuracy and the robustness of the automatic industrial robot to tasks such as target detection, semantic segmentation and the like are enhanced through a multi-sensor fusion method. The current sensor fusion method utilizes a laser radar and a camera to detect three-dimensional objects. Lidar calculates the distance to surrounding objects by the time of flight of a laser pulse. Lidar provides precise near-range depth information, which is often represented in the form of point cloud data, but as the distance increases, the point cloud data becomes sparse, resulting in a reduced ability to perceive distant objects. The camera provides rich two-dimensional information such as colors, textures and the like, but cannot sense the depth of an object image. If three-dimensional object detection is performed by fusing the laser radar and the camera, since both the camera and the laser radar are sensitive to severe weather conditions (e.g., weather such as snow, fog, and rain), the speed of the object cannot be detected without using time information. However, in many cases, estimating the speed of an object is a key requirement for avoiding a collision, and particularly in a scenario such as automatic driving, it is not feasible to rely on time information for three-dimensional object detection when time information is used as the key information. Therefore, the application provides a multi-sensor fusion method, which performs tasks such as three-dimensional object detection and spatial 3D (3-dimensional) attribute sensing by performing multi-sensor fusion on a millimeter wave radar and a camera, and the millimeter wave radar is very robust to severe weather conditions compared with a laser radar and the camera, and can detect objects (up to 200 meters) in a very long range, and the millimeter wave radar accurately estimates the speed of all detected objects by using the doppler effect without any time information. Further, compared to the laser radar, the point cloud data collected by the millimeter wave radar requires only less processing before being used as a target detection result, and the millimeter wave radar is less expensive compared to the laser radar. Moreover, the invention can solve the problems that the density of point cloud data is increased by aggregating a plurality of millimeter wave radar scans, but data delay is introduced into the system, and although millimeter wave radar point clouds are usually expressed as points in a three-dimensional coordinate system, most millimeter wave radars only report the distance and azimuth angle with an object, so that the vertical measurement of the point clouds is usually not very accurate or even does not exist.

Further, the system calls a camera, adjusts the shooting angle by controlling the camera, shoots an image in front of the lens after adjusting the shooting angle, and reads the shot image data from the memory of the camera. Further, the system performs a primary regression process on the read image data through an initial convolutional neural network obtained by training a preset convolutional neural network, specifically, the read image data is input into the initial convolutional neural network, the image data is subject to object detection through the initial convolutional neural network, a feature map is generated on the detected object, and a primary regression process is performed on the 3D attribute of the object according to the feature map to obtain a primary regression attribute of the object, so that an alignment process is subsequently performed on the initial point cloud data and the image data to obtain aligned point cloud data, wherein the primary regression process includes performing a regression operation on attributes such as hm (heat map), off (offset fsp), WH (Width, Height), dim (dimension), dep (depth), rot (rotation), and the like.

step S11, calling the initial convolutional neural network, carrying out target detection on the object in the image data through the initial convolutional neural network, and generating an initial characteristic diagram for the object;

and step S12, performing primary regression processing on the initial feature map to obtain primary regression attributes of the object.

Further, the system calls an initial convolutional neural network, inputs image data into the initial convolutional neural network, performs object detection on the image data through a centret in the trained initial convolutional neural network and generates a feature map of the detected object, the centret is used for modeling an object in one image data as a point, using the point as a central point of a bounding box of a target, finding the central point by using key point estimation and regressing to all other object attributes such as size, 3D attribute, direction, category and even posture, and then inputs an input image into a full convolutional network generating a thermodynamic diagram according to a backbone network in the preset convolutional neural network, a local peak in the thermodynamic diagram corresponds to the center of the object, and predicts the height, width, category, posture and other attributes of the bounding box of the object by using the image features of the peak, and generates the feature map of the object, the backbone network is a full convolutional network, generally has a (DLA (Deep Layer Aggregation) 43, renet 18 and HourglassNet) network structure, the renet 18 is an 18-Layer network structure with weight, and the HourglassNet is an hourglass network and is used for detecting key points of a human body; and performing primary regression processing on attributes such as the depth, the 3D bounding box and the speed of the object according to the feature map to obtain primary regression attributes of the object, so that alignment processing is performed on the initial point cloud data and the image data to obtain aligned point cloud data, and the accuracy of the processed data is improved to a certain extent.

Step S20, acquiring initial point cloud data based on a millimeter wave radar, and aligning the initial point cloud data with the image data to obtain aligned point cloud data;

further, the system calls the millimeter wave radar, controls the millimeter wave radar to perform object detection in a detectable range, forms point cloud data according to detected object information, and reads the point cloud data from a memory of the millimeter wave radar as initial point cloud data. Further, the system aligns initial point cloud data acquired from the millimeter wave radar with image data read from the camera in time, converts a coordinate system where the initial point cloud data is located into the coordinate system where the image data is located, and transforms a radial velocity of the initial point cloud data acquired by the millimeter wave radar so that the radial velocity of the initial point cloud data is consistent with a velocity in a primary regression attribute after primary regression processing to obtain aligned point cloud data, and generates a fusion feature map based on the aligned point cloud data and the primary regression attribute, specifically referring to fig. 3, fig. 4, and fig. 5, wherein fig. 3, fig. 4, and fig. 5 are respectively a first schematic diagram, a second schematic diagram, and a third schematic diagram of aligning the initial point cloud data with the image data, fig. 3 includes a forward moving trolley, and an object a moving in the same direction with the trolley at a velocity of vA exists in front of the trolley, the object B moving at the actual moving speed vB exists in the right front of the trolley, X, Y, Z are respectively an abscissa, an ordinate and a vertical coordinate, because the radial moving speed Vr of the object B is different from the actual moving speed vB of the object B, the actual moving speed vB of the object B needs to be subjected to speed conversion, the vB is decomposed into a speed parallel to the moving direction of the trolley and a speed perpendicular to the moving direction of the trolley, and the speed parallel to the moving direction of the trolley is taken as a new radial speed, so that understandably, the unit of the actual speed and the unit of the radial speed also need to be uniformly converted, and the actual moving speed of the object A is the same as the radial speed, so that the conversion is not needed; in fig. 4, a point C of the camera center in the three-dimensional coordinate system formed by X, Y, Z three coordinate axes is taken as an origin, and the image plane also has the three-dimensional coordinate system formed by X, Y, Z three coordinate axes and the coordinate origin P of the object, so that it is necessary to align the X, Y, Y coordinate axes in the two coordinate systems respectively so as to extract the intermediate feature of the data in the two coordinate systems; fig. 5 includes image data captured by a camera and point cloud data detected by a millimeter wave radar, and the point cloud data is sparse.

Step S30, generating a fusion feature map based on the alignment point cloud data and the primary regression attribute;

further, the system firstly carries out intermediate feature map production of the millimeter wave radar data according to the feature map specification generated when the primary regression processing is carried out on the image data according to the aligned point cloud data and the primary regression attribute; and after the intermediate feature map is manufactured, connecting the intermediate feature map with the primary regression attribute of the image data through the feature map channel of the primary regression attribute to form a fusion feature map, and performing secondary regression processing on the fusion feature map to obtain target attribute information.

Step S40, carrying out secondary regression processing on the fusion characteristic graph to obtain target attribute information;

further, the system performs secondary regression processing on the generated fusion feature map so as to re-estimate the 3D attribute of the detected object to obtain target attribute information; specifically, the system may perform feature extraction on the fused feature map by using a convolution operation, and then perform regression operation on the extracted features to obtain target attribute information.

step S41, extracting target characteristics from the fusion characteristic diagram;

and step S42, performing regression operation on the target characteristics to obtain target attribute information.

Furthermore, the system can extract 3D attributes such as depth, speed and other target features from the fusion feature map through convolution operation, and performs regression operation on the target features after the target features are extracted, and performs regression operation on the target features through a quadratic regression head comprising three convolution layers, so that 3 × 3 convolution layers and 1 × 1 convolution layers are added to generate required output to obtain target attribute information; compared with the regression head of the primary regression processing, the additional convolution layer in the regression head of the secondary regression is beneficial to learning higher-level features from the feature mapping of the millimeter wave radar and finally decoding the result of the operation of the secondary regression head into a 3D boundary box, and the 3D boundary box decoder uses the estimated depth, speed, rotation and attribute from the secondary regression head and acquires the attribute information of other object objects from the primary regression attribute.

And step S50, reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, and detecting the attribute information of the target object according to the target reasoning model.

Further, after the preset convolutional neural network is trained through training data and an initial convolutional neural network is obtained, in order to enable the detected attribute information to be more accurate when the trained initial convolutional neural network is used for three-dimensional target detection, target attribute information obtained through quadratic regression processing needs to be inferred for the initial convolutional neural network, specifically, the target attribute information obtained through quadratic regression processing is input into the initial convolutional neural network, data inference is carried out on the target attribute information through the initial convolutional neural network, and a target inference model for multi-mode fusion depth convolutional neural network 3D target detection and identification is generated after the data inference is completed, so that the target object can be subjected to attribute information detection according to the target inference model, and the purpose that the target detection based on a millimeter wave radar and a camera in a three-dimensional target detection task is fast, is achieved, The attribute information of the object is accurately and stably sensed.

Further, before the step of inferring the initial convolutional neural network based on the target attribute information to obtain the target inference model, the method further includes:

and A, acquiring a preset convolutional neural network and training data, and training the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.

Further, the system first obtains a preset convolutional neural Network, where the preset convolutional neural Network in this embodiment may be a Full Convolutional Network (FCN), and the Full convolutional Network is a convolutional neural Network obtained by replacing a fully Connected Network with a convolutional Network on the basis of the fully Connected Network; then, in order to enable the preset convolutional neural network to provide accurate attribute information when the three-dimensional target detection is carried out, the system can acquire images through a camera and point cloud data through a millimeter wave radar, and respectively carry out manual labeling on the images and the point cloud data so as to identify and distinguish objects in the images and the point cloud data, further take the manually labeled images and the point cloud data as training data, input the training data into the preset convolutional neural network for training, and adjust parameters in the preset convolutional neural network to obtain an initial convolutional neural network; the system can also acquire the marked image and point cloud data from the outside in a wireless communication mode to serve as training data, the training data is input into a preset convolutional neural network for training, parameters in the preset convolutional neural network are adjusted, an initial convolutional neural network is obtained, and therefore the initial regression attribute obtained through the initial convolutional neural network through initial regression processing is accurate.

The embodiment provides a multi-sensor fusion method, a multi-sensor fusion device, a multi-sensor fusion equipment and a storage medium, wherein image data are acquired based on a camera, and primary regression processing is performed on the image data to obtain a primary regression attribute of an object; acquiring initial point cloud data based on a millimeter wave radar, and aligning the initial point cloud data with the image data to obtain aligned point cloud data; generating a fusion feature map based on the aligned point cloud data and the primary regression attributes; performing secondary regression processing on the fusion feature map to obtain target attribute information; and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, and detecting the attribute information of the target object according to the target reasoning model. According to the invention, the initial point cloud data and the image data are correlated, the fusion characteristic diagram is generated according to the correlated initial point cloud data and the image data, the initial neural network is inferred by the target attribute information obtained after the fusion characteristic diagram is subjected to secondary regression processing, and the target inference model for detecting the attribute information of the target object is obtained, so that the object attribute information can be quickly, accurately and stably sensed based on the millimeter wave radar and the camera in the three-dimensional target detection task.

Further, referring to fig. 6, a second embodiment of the multi-sensor fusion method of the present invention is proposed based on the first embodiment of the multi-sensor fusion method of the present invention, and in the second embodiment, the step of generating a fusion feature map based on the aligned point cloud data and the primary regression attribute includes:

step S31, generating an intermediate feature map according to the alignment point cloud data and the primary regression attribute;

and step S32, connecting the intermediate feature map with the feature map channel of the primary regression attribute to form a fused feature map.

Further, the system firstly acquires a feature map specification generated when primary regression processing is carried out on the image data, and then makes an intermediate feature map of the aligned point cloud data according to the acquired feature map specification by using the aligned point cloud data and the primary regression attribute; further, after the intermediate feature map is manufactured, the system connects the intermediate feature map with the feature map of the primary regression attribute through the feature map channel of the primary regression attribute, and forms a fused feature map after the connection operation is completed, so that secondary regression processing is performed on the fused feature map to obtain target attribute information.

step S311, filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data;

step S312, expanding struts for the target point cloud data to form strut characteristic information;

step S313, the pillar feature information is associated with the primary regression attribute to obtain a target pillar feature;

and step S314, creating a feature map channel with complementary depth and speed according to the target strut features, and obtaining an intermediate feature map.

Further, the system firstly creates a 3D region-of-interest frustum and a 3D bounding box region-of-interest through primary regression attributes obtained through primary regression processing, filters initial point cloud data outside the 3D region-of-interest frustum of the initial point cloud data through the 3D region-of-interest frustum, and then filters the initial point cloud data again through the 3D bounding box region-of-interest to obtain target point cloud data. Further, the millimeter wave radar detection may be outside the ROI frustum of the corresponding object due to inaccurate height information, and in order to solve the problem of inaccurate height information, the pillar feature of the target point cloud data of the millimeter wave radar is introduced, and the point cloud of each millimeter wave radar is expanded to a pillar with a fixed size to form pillar feature information. Further, the system correlates the pillar feature information, which is obtained by performing pillar expansion on target point cloud information obtained by filtering the initial point cloud data, with a primary regression attribute, which is obtained by performing primary regression on image data acquired by the camera by the Centeret, filters the pillar features outside the 3D region of interest, and obtains the target pillar features after the filtering is completed. Further, the system creates a feature map channel with complementary depth and speed by filtering the obtained target strut features, generates an internal thermodynamic feature map channel with the 2D bounding box of the object as the center, and forms an intermediate feature map between the point cloud data and the image data by the internal thermodynamic feature map channel and the feature map formed by the internal thermodynamic feature map channel. The generation method comprises the following steps:

wherein the width and height of the thermodynamic diagram are proportional to the two-dimensional bounding box of the object and are controlled by the parameter a. The thermodynamic diagram value is a normalized object depth d and a component v of the radial speed of the millimeter wave radar on the images x and y_x,v_yI is the feature map channel index, M_iFeature map channel normalization factor, f_iThe characteristic value comprises complementary characteristic depth d and speed v of the millimeter wave radar and the image_xVelocity v_y、

And

x and y coordinates which are the center points of the detected object jLabel, w^jAnd h^jIs the width and height of the j object 2D bounding box. If two objects have overlapping heat map regions, then the object with the smaller depth value is dominant because only the closest object is fully visible in the image.

step S3111, creating a 3D region-of-interest frustum and a 3D bounding box region-of-interest according to the primary regression attribute;

and S2112, filtering the aligned point cloud data according to the 3D region-of-interest frustum and the 3D bounding box region-of-interest to obtain target point cloud data.

Further, the system creates a 3D roi frustum by the primary regression attribute including the 2D bounding box and the depth information obtained by the primary regression process. And reducing the initial point cloud data of the associated millimeter wave radar through the 3D interesting region frustum, and filtering the initial point cloud data outside the interesting region. And then, creating a 3D bounding box region of interest by using the depth and the rotation angle of the object estimated in the primary regression attribute of the primary regression operation, further filtering out initial point cloud data irrelevant to the object, and finally obtaining target point cloud data obtained by filtering the initial point cloud data. It can be understood that if a plurality of millimeter wave radar probes are arranged in the RoI and the nearest point is used as the millimeter wave radar probe corresponding to the target, due to the problem of the accuracy of the depth estimation of the image data, the 3D region of interest can be enlarged or reduced by adjusting the depth estimated by the primary regression, as shown in fig. 7, fig. 7 is a schematic diagram of the multi-sensor fusion method of the present invention for frustum association, and fig. 7 includes the image data taken by the camera, a schematic plan view of the frustum association and a schematic diagram of the point cloud data after the frustum association.

The embodiment generates the fusion characteristic map based on the alignment point cloud data and the primary regression attribute, fuses the depth detection speed of the millimeter wave radar, the accuracy of the depth data and the richness of texture and semantics of image data collected by the camera, overcomes the problem that the sparsity of the millimeter wave radar collected data has no semantic information and the precision of the image data collected by the camera for estimating the 3D attribute, and is beneficial to quickly, accurately and stably sensing the attribute information of an object based on the millimeter wave radar and the camera in a three-dimensional target detection task.

As can be understood, referring to fig. 8, fig. 8 is a schematic diagram of the overall process of the multi-sensor fusion method of the present invention, the overall process of the multi-sensor fusion method of the present invention includes reading an image from a camera and radar point cloud read from a millimeter wave radar, performing a primary regression process on the image through a full convolution backbone network to obtain a primary regression attribute and a 3D bounding box of an object in the image, expanding a strut through point cloud data after aligning the radar point cloud, performing frustum association on strut feature information after expanding the strut and the primary regression attribute, performing a connection operation on a feature map channel of the primary regression attribute and an intermediate feature map obtained through the association, performing a secondary regression operation on a fusion feature map obtained through the connection operation to obtain a secondary regression attribute, and adjusting the original 3D bounding box through the secondary regression attribute, and obtaining a target regression attribute with higher precision, so that after the initial convolutional neural network is inferred through the target regression attribute, the target object can be quickly, accurately and stably detected through a target inference model obtained through inference, wherein the primary regression processing comprises performing regression operations on attributes such as abbreviations HM (heat map), Off (offset), WH (Width, Height), dim (dimension), Dep (depth), Rot (rotation) and the like, the secondary regression processing comprises performing regression operations on attributes such as Dep (depth), Vel (velocity), Rot (rotation) and Att (Attributes), the primary attribute comprises performing convolution with 3x3 convolution and 1x1 convolution on image data through a convolved primary regression head, and the secondary attribute comprises performing convolution with 3x3 convolution and 1x1 convolution on data through three convolved secondary regression heads.

Furthermore, the invention also provides a multi-sensor fusion device.

Referring to fig. 9, fig. 9 is a functional block diagram of a multi-sensor fusion device according to a first embodiment of the present invention.

The multi-sensor fusion device includes:

the primary regression processing module 10 is configured to obtain image data based on a camera, and perform primary regression processing on the image data to obtain a primary regression attribute of an object;

an alignment processing module 20, configured to obtain initial point cloud data based on a millimeter wave radar, and perform alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;

a data fusion module 30, configured to generate a fusion feature map based on the aligned point cloud data and the primary regression attribute;

the secondary regression processing module 40 is used for performing secondary regression processing on the fusion feature map to obtain target attribute information;

and the model reasoning module 50 is used for reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model so as to detect the attribute information of the target object according to the target reasoning model.

Further, the primary regression processing module 10 includes:

the detection unit is used for calling the initial convolutional neural network, carrying out target detection on an object in the image data through the initial convolutional neural network, and generating an initial characteristic diagram for the object;

and the primary regression processing unit is used for performing primary regression processing on the initial characteristic map to obtain a primary regression attribute of the object.

Further, the alignment processing module 20 includes:

the extraction unit is used for extracting target features from the fused feature map;

and the regression operation unit is used for performing regression operation on the target characteristics to obtain target attribute information.

Further, the data fusion module 30 includes:

the generating unit is used for generating an intermediate feature map according to the alignment point cloud data and the primary regression attribute;

and the connecting unit is used for connecting the intermediate feature map and the feature map channel of the primary regression attribute to form a fused feature map.

Further, the data fusion module 30 further includes:

the first filtering unit is used for filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data;

the expansion unit is used for expanding the support column for the target point cloud data to form support column characteristic information;

the association unit is used for associating the pillar feature information with the primary regression attribute to obtain a target pillar feature;

and the channel creating unit is used for creating a characteristic diagram channel with complementary depth and speed according to the target strut characteristics and obtaining an intermediate characteristic diagram.

Further, the data fusion module 30 further includes:

the region creating unit is used for creating a 3D region-of-interest frustum and a 3D bounding box region-of-interest according to the primary regression attribute;

and the second filtering unit is used for filtering the aligned point cloud data according to the 3D region of interest frustum and the 3D bounding box region of interest to obtain target point cloud data.

Further, the data fusion module 30 further includes:

and the training unit is used for acquiring a preset convolutional neural network and training data, and training the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.

Furthermore, the present invention also provides a storage medium, preferably a computer readable storage medium, having stored thereon a multi-sensor fusion program, which when executed by a processor implements the steps of the embodiments of the multi-sensor fusion method described above.

In the embodiments of the multi-sensor fusion apparatus and the computer-readable medium of the present invention, all technical features of the embodiments of the multi-sensor fusion method are included, and the description and explanation contents are basically the same as those of the embodiments of the multi-sensor fusion method, and are not repeated herein.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention or a part contributing to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk), and includes a plurality of instructions for enabling a terminal device (which may be a fixed terminal, such as an internet of things smart device including smart homes, such as a smart air conditioner, a smart lamp, a smart power supply, a smart router, etc., or a mobile terminal, including a smart phone, a wearable networked AR/VR device, a smart sound box, an autonomous driving automobile, etc.) to execute the method according to each embodiment of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. a multi-sensor fusion method, is characterized in that, described multi-sensor fusion method comprises:

Obtaining image data based on the camera, performing initial regression processing on the image data to obtain the initial regression attribute of the object;

Obtain initial point cloud data based on the millimeter wave radar, and perform alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;

generating a fusion feature map based on the aligned point cloud data and the primary regression attribute;

Perform secondary regression processing on the fusion feature map to obtain target attribute information;

Based on the target attribute information, the initial convolutional neural network is inferred to obtain a target inference model, so as to detect the attribute information of the target object according to the target inference model.

2. The multi-sensor fusion method according to claim 1, wherein the step of generating a fusion feature map based on the aligned point cloud data and the initial regression attribute comprises:

Generate an intermediate feature map according to the aligned point cloud data and the primary regression attribute;

The intermediate feature map is connected with the feature map channel of the primary regression attribute to form a fusion feature map.

3. The multi-sensor fusion method according to claim 2, wherein the step of generating an intermediate feature map according to the aligned point cloud data and the initial regression attribute comprises:

Filtering the aligned point cloud data according to the initial regression attribute to obtain target point cloud data;

Extending pillars to the target point cloud data to form pillar feature information;

associating the pillar feature information with the primary regression attribute to obtain the target pillar feature;

Create a depth- and velocity-complementary feature map channel based on the target strut feature, and obtain an intermediate feature map.

4. The multi-sensor fusion method according to claim 3, wherein the step of filtering the aligned point cloud data according to the initial regression attribute to obtain the target point cloud data comprises:

Create a 3D ROI frustum and a 3D bounding box ROI according to the initial regression attribute;

Filter the aligned point cloud data according to the 3D region of interest frustum and the 3D bounding box region of interest to obtain target point cloud data.

5. The multi-sensor fusion method according to claim 1, wherein the step of performing quadratic regression processing on the fusion feature map to obtain target attribute information comprises:

extracting target features from the fusion feature map;

A regression operation is performed on the target feature to obtain target attribute information.

6. The multi-sensor fusion method according to claim 1, wherein the step of performing primary regression processing on the image data to obtain the primary regression attributes of the object comprises:

Calling the initial convolutional neural network, performing target detection on the object in the image data through the initial convolutional neural network, and generating an initial feature map for the object;

The initial regression process is performed on the initial feature map to obtain the initial regression attribute of the object.

7. The multi-sensor fusion method according to claim 1, characterized in that, before the step of obtaining the target inference model by inferring the initial convolutional neural network based on the target attribute information, the method further comprises:

Acquire a preset convolutional neural network and training data, and train the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.

8. A multi-sensor fusion device, wherein the multi-sensor fusion device comprises:

an initial regression processing module, configured to obtain image data based on the camera, perform initial regression processing on the image data, and obtain the initial regression attribute of the object;

an alignment processing module, configured to obtain initial point cloud data based on the millimeter wave radar, and perform alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;

a data fusion module for generating a fusion feature map based on the aligned point cloud data and the primary regression attribute;

A quadratic regression processing module, used for performing quadratic regression processing on the fusion feature map to obtain target attribute information;

The model inference module is used for inferring an initial convolutional neural network based on the target attribute information to obtain a target inference model, so as to detect the attribute information of the target object according to the target inference model.

9. A multi-sensor fusion device, characterized in that the multi-sensor fusion device comprises a memory, a processor, and a multi-sensor fusion program stored on the memory and executable on the processor, the multi-sensor fusion program The fusion program, when executed by the processor, implements the steps of the multi-sensor fusion method according to any one of claims 1-7.

10. A storage medium, wherein a multi-sensor fusion program is stored on the storage medium, and when the multi-sensor fusion program is executed by a processor, the multi-sensor according to any one of claims 1-7 is implemented The steps of the fusion method.