CN112712129B - Multi-sensor fusion method, device, equipment and storage medium - Google Patents

Multi-sensor fusion method, device, equipment and storage medium

Info

Publication number
CN112712129B
Authority
CN
China
Prior art keywords
target
point cloud
cloud data
feature map
attribute
Prior art date
Legal status
Active
Application number
CN202110033933.0A
Other languages
Chinese (zh)
Other versions
CN112712129A (en)
Inventor
徐�明
刘强
蔡振伟
李杉杉
徐丽华
黄启明
Current Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen ZNV Technology Co Ltd, Nanjing ZNV Software Co Ltd filed Critical Shenzhen ZNV Technology Co Ltd
Priority to CN202110033933.0A priority Critical patent/CN112712129B/en
Publication of CN112712129A publication Critical patent/CN112712129A/en
Application granted
Publication of CN112712129B publication Critical patent/CN112712129B/en

Classifications

    • G06F18/251 Pattern recognition: fusion techniques of input or preprocessed data
    • G06F18/253 Pattern recognition: fusion techniques of extracted features
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods
    • G06V10/25 Image preprocessing: determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Radar Systems Or Details Thereof (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-sensor fusion method, a device, equipment and a storage medium, wherein image data are acquired based on a camera, and primary regression processing is carried out on the image data to obtain primary regression attributes of an object; acquiring initial point cloud data based on millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data; generating a fusion feature map based on the aligned point cloud data and the primary regression attribute; performing secondary regression processing on the fusion feature map to obtain target attribute information; and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, so as to detect the attribute information of the target object according to the target reasoning model. The invention realizes that the attribute information of the object can be accurately and stably perceived based on the millimeter wave radar and the camera in the three-dimensional target detection task.

Description

Multi-sensor fusion method, device, equipment and storage medium
Technical Field
The present invention relates to the field of multisensor fusion technologies, and in particular, to a multisensor fusion method, device, apparatus, and storage medium.
Background
With the development of industries such as urban security, smart cities, industrial intelligence and the electric power internet of things, autonomous edge devices and industrial autonomous robots are increasingly deployed, and their perception systems are required to comprehensively detect, identify and track target objects. Currently, in fields such as autonomous driving and robotics, most sensor fusion methods focus on three-dimensional object detection using a lidar and a camera. Lidar calculates the distance to surrounding objects from the time of flight of a laser pulse and provides accurate close-range depth information, usually in the form of point cloud data, but as the distance increases the point cloud becomes sparse, so the ability to perceive distant objects decreases. The camera provides rich two-dimensional information such as color and texture, but cannot sense the depth of an object in the image. These complementary characteristics have made lidar and camera sensors a topic of interest in recent years, and such a combination has been shown to achieve high-precision three-dimensional object detection in many applications. It has its limitations, however: both cameras and lidar are sensitive to severe weather conditions (e.g., snow, fog and rain) and cannot detect the speed of objects without using temporal information. In many cases, estimating the speed of an object is a critical requirement for collision avoidance, especially in scenarios such as automatic driving, so determining object speed by relying on temporal information may not be a viable solution where timeliness is critical. As a result, attribute information such as the height and width, category and pose of an object cannot be perceived accurately and stably based on the lidar-and-camera scheme in a three-dimensional target detection task.
Disclosure of Invention
The invention mainly aims to provide a multi-sensor fusion method, a multi-sensor fusion device, multi-sensor fusion equipment and a multi-sensor fusion storage medium, and aims to solve the technical problem that attribute information of an object cannot be accurately and stably perceived based on a laser radar and camera scheme in a three-dimensional target detection task at present.
In order to achieve the above object, an embodiment of the present invention provides a multi-sensor fusion method, including:
Acquiring image data based on a camera, and performing primary regression processing on the image data to obtain primary regression attributes of an object;
acquiring initial point cloud data based on millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;
generating a fusion feature map based on the aligned point cloud data and the primary regression attribute;
performing secondary regression processing on the fusion feature map to obtain target attribute information;
and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, so as to detect the attribute information of the target object according to the target reasoning model.
Preferably, the step of generating a fusion feature map based on the aligned point cloud data and the primary regression attribute includes:
generating an intermediate feature map according to the alignment point cloud data and the primary regression attribute;
and connecting the intermediate feature map with the feature map channel of the primary regression attribute to form a fusion feature map.
Preferably, the step of generating an intermediate feature map according to the aligned point cloud data and the primary regression attribute includes:
filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data;
performing pillar expansion on the target point cloud data to form pillar feature information;
correlating the pillar feature information with the primary regression attribute to obtain a target pillar feature;
and creating a depth and speed complementary feature map channel according to the target pillar features, and obtaining an intermediate feature map.
Preferably, the step of filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data includes:
Creating a 3D region-of-interest truncated cone and a 3D bounding box region-of-interest according to the primary regression attribute;
and filtering the alignment point cloud data according to the 3D region of interest truncated cone and the 3D bounding box region of interest to obtain target point cloud data.
Preferably, the step of performing a quadratic regression process on the fused feature map to obtain target attribute information includes:
extracting target features from the fusion feature map;
and carrying out regression operation on the target characteristics to obtain target attribute information.
Preferably, the step of performing primary regression processing on the image data to obtain primary regression attributes of the object includes:
invoking the initial convolutional neural network, performing target detection on an object in the image data through the initial convolutional neural network, and generating an initial feature map for the object;
And performing primary regression processing on the initial feature map to obtain primary regression attributes of the object.
Preferably, before the step of reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, the method further includes:
acquiring a preset convolutional neural network and training data, and training the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.
To achieve the above object, the present invention also provides a multi-sensor fusion device including:
the primary regression processing module is used for acquiring image data based on the camera, and performing primary regression processing on the image data to obtain primary regression attributes of the object;
The alignment processing module is used for acquiring initial point cloud data based on the millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;
The data fusion module is used for generating a fusion feature map based on the aligned point cloud data and the primary regression attribute;
the secondary regression processing module is used for carrying out secondary regression processing on the fusion feature map to obtain target attribute information;
and the model reasoning module is used for reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model so as to detect the attribute information of the target object according to the target reasoning model.
Further, in order to achieve the above object, the present invention also provides a multi-sensor fusion device, which includes a memory, a processor, and a multi-sensor fusion program stored on the memory and executable on the processor, wherein the multi-sensor fusion program implements the steps of the multi-sensor fusion method described above when executed by the processor.
Further, in order to achieve the above object, the present invention further provides a storage medium, on which a multi-sensor fusion program is stored, which implements the steps of the multi-sensor fusion method described above when executed by a processor.
The embodiment of the invention provides a multi-sensor fusion method, a multi-sensor fusion device, multi-sensor fusion equipment and a storage medium, wherein image data are acquired based on a camera, and primary regression processing is carried out on the image data to obtain primary regression attributes of an object; acquiring initial point cloud data based on millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data; generating a fusion feature map based on the aligned point cloud data and the primary regression attribute; performing secondary regression processing on the fusion feature map to obtain target attribute information; and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, so as to detect the attribute information of the target object according to the target reasoning model. According to the invention, the millimeter wave radar is insensitive to severe weather, initial point cloud data can be stably acquired and provided, and the target reasoning model is obtained by reasoning the initial convolutional neural network after the secondary regression processing is carried out on the fusion feature map, so that the attribute information of an object can be accurately and stably perceived based on the millimeter wave radar and the camera in a three-dimensional target detection task.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the multi-sensor fusion method of the present invention;
FIG. 2 is a flow chart of a first embodiment of a multi-sensor fusion method according to the present invention;
FIG. 3 is a first schematic diagram of the present invention for aligning initial point cloud data with image data;
FIG. 4 is a second schematic diagram of the present invention for aligning initial point cloud data with image data;
FIG. 5 is a third schematic diagram of the present invention for aligning initial point cloud data with image data;
FIG. 6 is a flow chart of a second embodiment of the multi-sensor fusion method of the present invention;
FIG. 7 is a schematic diagram of the truncated cone correlation performed by the multi-sensor fusion method of the present invention;
FIG. 8 is a schematic overall flow chart of the multi-sensor fusion method of the present invention;
FIG. 9 is a functional block diagram of a multi-sensor fusion device according to a preferred embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides a multi-sensor fusion method, a multi-sensor fusion device, multi-sensor fusion equipment and a storage medium, wherein image data are acquired based on a camera, and primary regression processing is carried out on the image data to obtain primary regression attributes of an object; acquiring initial point cloud data based on millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data; generating a fusion feature map based on the aligned point cloud data and the primary regression attribute; performing secondary regression processing on the fusion feature map to obtain target attribute information; and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, so as to detect the attribute information of the target object according to the target reasoning model. According to the invention, the millimeter wave radar is insensitive to severe weather, initial point cloud data can be stably acquired and provided, and the target reasoning model is obtained by reasoning the initial convolutional neural network after the secondary regression processing is carried out on the fusion feature map, so that the attribute information of an object can be accurately and stably perceived based on the millimeter wave radar and the camera in a three-dimensional target detection task.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a multi-sensor fusion device of a hardware running environment according to an embodiment of the present invention.
In the following description, suffixes such as "module", "component", or "unit" for representing elements are used only for facilitating the description of the present invention, and have no specific meaning per se. Thus, "module," "component," or "unit" may be used in combination.
The multi-sensor fusion device of the embodiment of the invention can be a PC, or can be a movable terminal device such as a tablet computer, a portable computer and the like.
As shown in fig. 1, the multi-sensor fusion device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display, an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Those skilled in the art will appreciate that the multi-sensor fusion device structure shown in fig. 1 is not limiting of the multi-sensor fusion device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and a multi-sensor fusion program may be included in the memory 1005 as one storage medium.
In the device shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server, and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call a multisensor fusion program stored in the memory 1005 and perform the following operations:
Acquiring image data based on a camera, and performing primary regression processing on the image data to obtain primary regression attributes of an object;
acquiring initial point cloud data based on millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;
generating a fusion feature map based on the aligned point cloud data and the primary regression attribute;
performing secondary regression processing on the fusion feature map to obtain target attribute information;
and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, so as to detect the attribute information of the target object according to the target reasoning model.
Further, the step of generating a fusion feature map based on the aligned point cloud data and the primary regression attribute includes:
generating an intermediate feature map according to the alignment point cloud data and the primary regression attribute;
and connecting the intermediate feature map with the feature map channel of the primary regression attribute to form a fusion feature map.
Further, the step of generating an intermediate feature map according to the aligned point cloud data and the primary regression attribute includes:
filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data;
performing pillar expansion on the target point cloud data to form pillar feature information;
correlating the pillar feature information with the primary regression attribute to obtain a target pillar feature;
and creating a depth and speed complementary feature map channel according to the target pillar features, and obtaining an intermediate feature map.
Further, the step of filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data includes:
Creating a 3D region-of-interest truncated cone and a 3D bounding box region-of-interest according to the primary regression attribute;
and filtering the alignment point cloud data according to the 3D region of interest truncated cone and the 3D bounding box region of interest to obtain target point cloud data.
Further, the step of performing a quadratic regression process on the fused feature map to obtain target attribute information includes:
extracting target features from the fusion feature map;
and carrying out regression operation on the target characteristics to obtain target attribute information.
Further, the step of performing primary regression processing on the image data to obtain primary regression attributes of the object includes:
invoking the initial convolutional neural network, performing target detection on an object in the image data through the initial convolutional neural network, and generating an initial feature map for the object;
And performing primary regression processing on the initial feature map to obtain primary regression attributes of the object.
Further, before the step of deriving the target inference model from the initial convolutional neural network inference based on the target attribute information, the processor 1001 may be configured to invoke the multisensor fusion program stored in the memory 1005, and perform the following operations:
acquiring a preset convolutional neural network and training data, and training the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.
In order that the above-described aspects may be better understood, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Referring to fig. 2, a flow chart of a multi-sensor fusion method is provided in a first embodiment of the present invention. In this embodiment, the multi-sensor fusion method includes the steps of:
Step S10, acquiring image data based on a camera, and performing primary regression processing on the image data to obtain primary regression attributes of an object;
The multi-sensor fusion method in the embodiment is applied to a multi-sensor fusion system, and the multi-sensor fusion system at least comprises a millimeter wave radar and a camera, wherein the millimeter wave radar is a radar working in a millimeter wave band for detection, and the millimeter wave is a wave in a frequency domain (wavelength is 1-10 mm) of 30-300 GHz (gigahertz); the camera is video input equipment, is widely applied to video conferences, telemedicine, real-time monitoring and the like, can be used for carrying out image-sound conversation and communication on a network, and can be used for processing various popular digital images, video and audio.
It can be appreciated that industrial automation robots are generally equipped with different types of sensors, and that exploiting the complementary sensing characteristics of these sensors for industrial scenes through a multi-sensor fusion method enhances the accuracy and robustness of tasks such as target detection and semantic segmentation. Current sensor fusion methods detect three-dimensional objects using a lidar and a camera. Lidar calculates the distance to surrounding objects from the time of flight of a laser pulse and provides accurate close-range depth information, usually in the form of point cloud data, but as the distance increases the point cloud becomes sparse, so the ability to perceive distant objects decreases. The camera provides rich two-dimensional information such as color and texture, but cannot sense the depth of an object in the image. Moreover, if three-dimensional object detection is performed by fusing a lidar and a camera, both sensors are sensitive to severe weather conditions (e.g., snow, fog and rain), and the speed of an object cannot be detected without using temporal information. In many cases, estimating the speed of an object is a critical requirement for collision avoidance, especially in scenarios such as automatic driving, so relying on temporal information for three-dimensional object detection is not a viable solution where timeliness is critical. Therefore, the present application provides a multi-sensor fusion method that performs tasks such as three-dimensional object detection and spatial 3D (three-dimensional) attribute perception by fusing a millimeter wave radar and a camera. Compared with the lidar-and-camera combination, the millimeter wave radar is very robust to severe weather conditions, can detect objects at very long range (up to 200 meters), and can accurately estimate the speed of every detected object using the Doppler effect without any temporal information. In addition, compared with lidar point clouds, the point cloud data acquired by the millimeter wave radar requires less processing before it can be used for target detection, and the millimeter wave radar is cheaper than a lidar. The present application can also address the problems that aggregating multiple millimeter wave radar scans increases the density of the point cloud data but introduces data latency into the system, and that, although a millimeter wave radar point cloud is generally represented as points in a three-dimensional coordinate system, most millimeter wave radars report only the distance and azimuth angle to an object, so the vertical measurement of the point cloud is generally not very accurate, or even absent.
Further, the system calls the camera, adjusts the shooting angle by controlling the camera, captures an image in front of the lens once the angle is adjusted, and reads the captured image data from the memory of the camera. The system then performs primary regression processing on the read image data through an initial convolutional neural network obtained by training a preset convolutional neural network. Specifically, the read image data is input into the initial convolutional neural network, object detection is performed on the image data through the initial convolutional neural network, a feature map is generated for each detected object, and primary regression processing is performed on the 3D attributes of the object according to the feature map to obtain the primary regression attributes of the object, so that aligned point cloud data can subsequently be obtained by alignment with the initial point cloud data. The primary regression processing performs regression operations on attributes such as the heat map (HM), offset (off), width and height (WH), dimension (dim), depth and rotation (rot).
Further, the step of performing primary regression processing on the image data to obtain primary regression attributes of the object includes:
S11, calling the initial convolutional neural network, performing target detection on an object in the image data through the initial convolutional neural network, and generating an initial feature map for the object;
and step S12, performing primary regression processing on the initial feature map to obtain primary regression attributes of the object.
Further, the system calls the initial convolutional neural network and inputs the image data into it. Object detection is performed on the image data through the trained CenterNet within the initial convolutional neural network, and a feature map is generated for each detected object. CenterNet models an object in the image as a single point, namely the center point of the target's bounding box, finds this center point using keypoint estimation, and regresses all other object attributes from it, such as size, 3D attributes, orientation, category and even pose. The input image is fed into a fully convolutional network, built on the backbone network of the preset convolutional neural network, which produces a heat map in which local peaks correspond to object centers; the image features at a peak are used to predict attributes of the object such as the height and width of its bounding box, category and pose, and a feature map is generated for the object. The backbone network is a fully convolutional network, typically a DLA (Deep Layer Aggregation), ResNet or HourglassNet structure, where ResNet here refers to an 18-layer weighted residual network and HourglassNet is an hourglass-shaped network for keypoint detection. Primary regression processing is then performed on attributes such as the depth, 3D bounding box and speed of the object according to the feature map to obtain the primary regression attributes of the object, which facilitates the subsequent alignment of the initial point cloud data with the image data to obtain the aligned point cloud data and improves, to a certain extent, the accuracy of the processed data.
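For illustration, a minimal sketch of CenterNet-style primary regression heads is given below; the head names, channel counts and PyTorch structure are assumptions for the example rather than the exact configuration of this embodiment.

```python
# Hypothetical sketch of CenterNet-style primary regression heads, assuming a
# backbone that outputs a feature map with `in_ch` channels. Head names and
# channel counts follow common CenterNet conventions, not the patent verbatim.
import torch
import torch.nn as nn

def make_head(in_ch: int, out_ch: int) -> nn.Sequential:
    """A small conv head: 3x3 conv + ReLU + 1x1 conv producing `out_ch` channels."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, out_ch, kernel_size=1),
    )

class PrimaryRegressionHeads(nn.Module):
    def __init__(self, in_ch: int = 64, num_classes: int = 3):
        super().__init__()
        self.heads = nn.ModuleDict({
            "hm":  make_head(in_ch, num_classes),  # center heat map per class
            "off": make_head(in_ch, 2),            # sub-pixel center offset
            "wh":  make_head(in_ch, 2),            # 2D box width and height
            "dim": make_head(in_ch, 3),            # 3D dimensions (h, w, l)
            "dep": make_head(in_ch, 1),            # object depth
            "rot": make_head(in_ch, 8),            # rotation (binned encoding)
        })

    def forward(self, feat: torch.Tensor) -> dict:
        return {name: head(feat) for name, head in self.heads.items()}
```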
Step S20, acquiring initial point cloud data based on millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;
Further, the system calls the millimeter wave radar, controls it to detect objects within its detectable range, forms point cloud data from the detected object information, and reads the point cloud data from the memory of the millimeter wave radar as initial point cloud data. The system then time-aligns the initial point cloud data acquired from the millimeter wave radar with the image data read from the camera, converts the coordinate system of the initial point cloud data into the coordinate system of the image data, and transforms the radial velocity reported by the millimeter wave radar so that it is consistent with the velocity in the primary regression attribute obtained by the primary regression processing, thereby obtaining the aligned point cloud data, from which, together with the primary regression attribute, the fusion feature map is generated. Specifically, referring to fig. 3 to fig. 5, which are the first, second and third schematic diagrams of aligning the initial point cloud data with the image data: fig. 3 shows an object A moving forward in the same direction as the vehicle and an object B moving ahead of the vehicle at an actual motion speed vB, with X, Y and Z being the three coordinate axes; because the radar measures only the radial component of B's motion, whose direction generally differs from that of the actual motion, the radial velocity needs to be transformed so that it corresponds to the actual motion velocity regressed from the image. In fig. 4, the point C of the camera's photographing center is taken as the origin of a three-dimensional coordinate system formed by the X, Y and Z axes, and the image plane has its own coordinate system containing the coordinate origin P of the object, so the corresponding axes of the two coordinate systems need to be aligned before intermediate features of the data in both coordinate systems can be extracted. Fig. 5 shows the image data captured by the camera together with the point cloud data detected by the millimeter wave radar, where the point cloud data is sparse.
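For illustration, a minimal sketch of such an alignment step is given below, assuming a known radar-to-camera extrinsic matrix and camera intrinsics; the matrix names and the velocity decomposition are assumptions for the example, not values given in this description.

```python
# Hypothetical sketch of aligning radar points with the camera frame and image plane.
import numpy as np

def align_radar_to_image(points_radar: np.ndarray,   # (N, 3) x, y, z in radar frame
                         radial_speed: np.ndarray,    # (N,) Doppler radial velocity
                         T_radar_to_cam: np.ndarray,  # (4, 4) extrinsic transform
                         K: np.ndarray):              # (3, 3) camera intrinsics
    """Project radar points into the image and express them in the camera frame."""
    n = points_radar.shape[0]
    homo = np.hstack([points_radar, np.ones((n, 1))])          # (N, 4)
    pts_cam = (T_radar_to_cam @ homo.T).T[:, :3]                # (N, 3) camera frame

    # Keep only points in front of the camera.
    valid = pts_cam[:, 2] > 0
    pts_cam, radial_speed = pts_cam[valid], radial_speed[valid]

    # Pixel coordinates of each radar point.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    # Decompose the radial speed along the camera x / z axes (a simplification:
    # the true transform depends on the radar geometry and ego motion).
    azimuth = np.arctan2(pts_cam[:, 0], pts_cam[:, 2])
    v_x = radial_speed * np.sin(azimuth)
    v_z = radial_speed * np.cos(azimuth)
    return pts_cam, uv, np.stack([v_x, v_z], axis=1)
```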
Step S30, generating a fusion feature map based on the aligned point cloud data and the primary regression attribute;
further, the system first produces the intermediate feature map of the millimeter wave radar data from the aligned point cloud data and the primary regression attribute, following the feature map specification generated when the primary regression processing was performed on the image data; after the intermediate feature map has been produced, it is connected with the primary regression attribute of the image data along the feature map channels to form the fusion feature map, on which the secondary regression processing is then performed to obtain the target attribute information.
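A minimal sketch of this channel concatenation, with tensor shapes chosen purely for illustration, might look as follows:

```python
# Fuse the radar intermediate feature map with the image feature map by
# channel concatenation (shapes are illustrative assumptions).
import torch

image_feat = torch.randn(1, 64, 112, 200)   # primary regression feature map
radar_feat = torch.randn(1, 3, 112, 200)    # intermediate map: depth, v_x, v_y channels

fused_feat = torch.cat([image_feat, radar_feat], dim=1)  # (1, 67, 112, 200)
```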
Step S40, performing secondary regression processing on the fusion feature map to obtain target attribute information;
Further, the system carries out secondary regression processing on the generated fusion feature map so as to re-estimate the 3D attribute of the detected object and obtain target attribute information; specifically, the system may perform feature extraction on the fused feature map by using convolution operation, and then perform regression operation on the extracted features to obtain the target attribute information.
Further, the step of performing a quadratic regression process on the fused feature map to obtain target attribute information includes:
step S41, extracting target features from the fusion feature map;
and step S42, carrying out regression operation on the target characteristics to obtain target attribute information.
Further, the system can extract target features of 3D attributes such as depth and speed from the fusion feature map through convolution operations, and after extraction, perform regression on the target features through a secondary regression head comprising three convolutional layers, in which 3×3 convolutional layers and a 1×1 convolutional layer are stacked to generate the required output, thereby obtaining the target attribute information. Compared with the regression heads of the primary regression processing, the additional convolutional layers in the secondary regression heads help to learn higher-level features from the millimeter wave radar feature map. Finally, the output of the secondary regression heads is decoded into a 3D bounding box: the 3D bounding box decoder uses the depth, speed, rotation and other attributes estimated by the secondary regression heads and obtains the remaining object attribute information from the primary regression attributes.
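As a rough illustration, a secondary regression head of this kind might be sketched as follows; the channel counts and the exact number of layers are assumptions for the example.

```python
# Hypothetical sketch of a secondary regression head operating on the fused
# feature map: stacked 3x3 convolutions followed by a 1x1 convolution.
import torch.nn as nn

def secondary_head(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, 128, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(128, 128, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(128, 128, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(128, out_ch, kernel_size=1),   # 1x1 conv producing the output
    )

# e.g. refined depth, velocity and rotation heads over a 67-channel fused map
depth_head = secondary_head(67, 1)
vel_head   = secondary_head(67, 2)
rot_head   = secondary_head(67, 8)
```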
And S50, reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, so as to detect the attribute information of the target object according to the target reasoning model.
Further, after training the preset convolutional neural network through training data and obtaining an initial convolutional neural network, in order to enable the attribute information detected when the trained initial convolutional neural network is applied to perform three-dimensional target detection to be more accurate, the initial convolutional neural network is further required to be inferred through target attribute information obtained through secondary regression processing, specifically, the target attribute information obtained through the secondary regression processing is input into the initial convolutional neural network, data inference is performed through the initial convolutional neural network, and after the data inference is completed, a target inference model for multi-modal fusion depth convolutional neural network 3D target detection and recognition is generated, so that the attribute information of a target object can be detected according to the target inference model, and the attribute information of the object can be quickly, accurately and stably perceived in a three-dimensional target detection task based on a millimeter wave radar and a camera.
Further, before the step of reasoning the initial convolutional neural network to obtain the target reasoning model based on the target attribute information, the method further comprises:
and step A, acquiring a preset convolutional neural network and training data, and training the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.
Further, the system first acquires a preset convolutional neural network; in this embodiment the preset convolutional neural network may be an FCN (Fully Convolutional Network), that is, a convolutional neural network obtained by replacing the fully connected layers of a conventional network with convolutional layers. Then, so that the preset convolutional neural network can provide accurate attribute information during three-dimensional target detection, the system can collect images through the camera and point cloud data through the millimeter wave radar, manually annotate the images and the point cloud data so that the objects in them can be identified and distinguished, take the annotated images and point cloud data as training data, input the training data into the preset convolutional neural network for training, and adjust the parameters of the preset convolutional neural network to obtain the initial convolutional neural network. The system can also acquire annotated images and point cloud data from the outside through wireless communication as training data, input them into the preset convolutional neural network for training, and adjust its parameters to obtain the initial convolutional neural network, so that the primary regression attributes obtained by performing primary regression processing through the initial convolutional neural network are accurate.
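For illustration only, a minimal sketch of such a training step is given below; the loss terms, batch keys and optimizer settings are assumptions and not the patent's training procedure.

```python
# Hypothetical sketch of fine-tuning the preset network into the initial network
# on manually annotated camera/radar training data.
import torch
import torch.nn.functional as F

def train(preset_net, train_loader, epochs: int = 10, lr: float = 1e-4):
    """Fine-tune the preset network on annotated camera/radar data."""
    optimizer = torch.optim.Adam(preset_net.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in train_loader:
            out = preset_net(batch["image"], batch["radar_points"])
            # Illustrative losses; focal/L1 terms would typically be used in practice.
            loss = (F.binary_cross_entropy_with_logits(out["hm"], batch["hm"])
                    + F.l1_loss(out["dep"], batch["dep"])
                    + F.l1_loss(out["dim"], batch["dim"]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return preset_net  # the trained "initial convolutional neural network"
```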
The embodiment provides a multi-sensor fusion method, a multi-sensor fusion device, multi-sensor fusion equipment and a storage medium, wherein image data are acquired based on a camera, primary regression processing is carried out on the image data, and primary regression attributes of an object are obtained; acquiring initial point cloud data based on millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data; generating a fusion feature map based on the aligned point cloud data and the primary regression attribute; performing secondary regression processing on the fusion feature map to obtain target attribute information; and reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model, so as to detect the attribute information of the target object according to the target reasoning model. According to the invention, the initial point cloud data and the image data are associated, the fusion feature map is generated according to the associated initial point cloud data and the image data, and the target attribute information obtained after the secondary regression processing of the fusion feature map is used for reasoning the initial neural network, so that the target reasoning model for detecting the attribute information of the target object is obtained, and the attribute information of the object is quickly, accurately and stably perceived based on the millimeter wave radar and the camera in the three-dimensional target detection task.
Further, referring to fig. 6, based on the first embodiment of the multi-sensor fusion method of the present invention, a second embodiment of the multi-sensor fusion method of the present invention is proposed, in the second embodiment, the step of generating a fusion feature map based on the aligned point cloud data and the primary regression attribute includes:
step S31, generating an intermediate feature map according to the aligned point cloud data and the primary regression attribute;
And step S32, connecting the intermediate feature map with the feature map channel of the primary regression attribute to form a fusion feature map.
Further, the system first acquires the feature map specification generated when the primary regression processing is performed on the image data, and then produces, from the aligned point cloud data and the primary regression attribute, the intermediate feature map of the aligned point cloud data according to the acquired feature map specification; after the intermediate feature map has been produced, the system connects it with the feature map of the primary regression attribute along the feature map channels, and once the connection operation is complete the fusion feature map is formed, on which the secondary regression processing is then performed to obtain the target attribute information.
Further, the step of generating an intermediate feature map according to the aligned point cloud data and the primary regression attribute includes:
Step S311, filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data;
Step S312, performing pillar expansion on the target point cloud data to form pillar feature information;
step S313, associating the pillar feature information with the primary regression attribute to obtain a target pillar feature;
And step S314, creating a depth and speed complementary feature map channel according to the target pillar features, and obtaining an intermediate feature map.
Further, the system first creates a 3D region-of-interest truncated cone and a 3D bounding box region of interest from the primary regression attributes obtained by the primary regression processing, filters out the initial point cloud data lying outside the 3D region-of-interest truncated cone, and then filters the remaining point cloud data again through the 3D bounding box region of interest to obtain the target point cloud data. Further, since the height information of millimeter wave radar detections is inaccurate, a detection may fall outside the ROI truncated cone of its corresponding object; to address this, pillar features of the radar target point cloud data are introduced, and each millimeter wave radar point is expanded into a pillar of fixed size to form the pillar feature information. The system then associates the pillar feature information, obtained by pillar expansion of the target point cloud data that resulted from filtering the initial point cloud data, with the primary regression attributes obtained by performing primary regression (via CenterNet) on the image data collected by the camera, filters out pillar features outside the 3D region of interest, and obtains the target pillar features once filtering is complete. Finally, the system creates complementary depth and speed feature map channels from the target pillar features, generating heat-map feature map channels centered on the 2D bounding box of each object; these channels form the intermediate feature map between the point cloud data and the image data. The channels may be generated as follows:
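One plausible formulation, reconstructed here as an assumption from the symbol definitions that follow rather than as the exact expression of the original filing, is:

\[
F^{j}_{x,y,i} \;=\; \frac{1}{M_i}\, f_i\, \mathbb{1}\!\left\{\, \lvert x - c_x^{j} \rvert \le a\, w_j \;\text{and}\; \lvert y - c_y^{j} \rvert \le a\, h_j \,\right\},
\qquad f_i \in \{\, d,\ v_x,\ v_y \,\}
\]

where \(F^{j}_{x,y,i}\) is the value written into feature map channel \(i\) at pixel \((x, y)\) for detected object \(j\).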
Wherein the width and height of each heat map region are proportional to the two-dimensional bounding box of the object and are controlled by the parameter a. The heat map values are the normalized object depth d and the components v_x, v_y of the millimeter wave radar radial velocity along the image x and y directions; i is the feature map channel index, M_i is the normalization factor of channel i, f_i is the feature value, comprising the complementary features depth d, velocity v_x and velocity v_y of the millimeter wave radar and the image, c_x^j and c_y^j are the x and y coordinates of the center point of detected object j, and w_j and h_j are the width and height of the 2D bounding box of object j. If two objects have overlapping heat map regions, the object with the smaller depth value dominates, because only the closest object is fully visible in the image.
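For illustration, a minimal Python sketch of painting these complementary channels is given below; the channel ordering, normalization constants and array shapes are assumptions for the example.

```python
# Hypothetical sketch of generating the complementary depth/velocity feature map
# channels: each detected object paints a box-shaped region (proportional to its
# 2D bounding box, scaled by parameter `a`) with its normalized radar depth and
# velocity components.
import numpy as np

def radar_channels(dets, feat_h, feat_w, a=0.3, norms=(60.0, 20.0, 20.0)):
    """dets: list of dicts with cx, cy, w, h (feature-map coords) and d, vx, vy."""
    fmap = np.zeros((3, feat_h, feat_w), dtype=np.float32)
    depth_of = np.full((feat_h, feat_w), np.inf, dtype=np.float32)

    for det in dets:
        x0 = max(int(det["cx"] - a * det["w"]), 0)
        x1 = min(int(det["cx"] + a * det["w"]) + 1, feat_w)
        y0 = max(int(det["cy"] - a * det["h"]), 0)
        y1 = min(int(det["cy"] + a * det["h"]) + 1, feat_h)
        vals = np.array([det["d"], det["vx"], det["vy"]]) / np.array(norms)

        # Where regions overlap, the closer object (smaller depth) dominates.
        region = depth_of[y0:y1, x0:x1]
        mask = det["d"] < region
        for c in range(3):
            fmap[c, y0:y1, x0:x1][mask] = vals[c]
        region[mask] = det["d"]

    return fmap
```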
Further, the step of filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data includes:
Step S3111, creating a 3D region of interest truncated cone and a 3D bounding box region of interest according to the primary regression attribute;
Step S3112, filtering the aligned point cloud data according to the 3D region-of-interest truncated cone and the 3D bounding box region of interest to obtain target point cloud data.
Further, the system creates a 3D RoI (region of interest) truncated cone from the primary regression attributes, which contain the 2D bounding box and depth information, obtained by the primary regression processing. The 3D region-of-interest truncated cone narrows down the initial point cloud data associated with the millimeter wave radar, filtering out the initial point cloud data outside the region of interest. Then, the object depth and rotation angle estimated in the primary regression attributes of the primary regression operation are used to create a 3D bounding box region of interest, which further filters out the initial point cloud data unrelated to the object, finally yielding the target point cloud data obtained by filtering the initial point cloud data. It can be understood that if there are multiple millimeter wave radar detection points in the RoI, the nearest point is used as the millimeter wave radar detection point corresponding to the target; because of the limited accuracy of depth estimation from image data, the 3D region of interest can be enlarged or reduced by adjusting the depth estimated in the primary regression, as shown in fig. 7, which is a schematic diagram of the truncated cone correlation performed by the multi-sensor fusion method of the present invention and includes the image data captured by the camera, a plan schematic view of the truncated cone correlation and a schematic view of the correlated point cloud data.
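For illustration, a minimal sketch of such frustum-based filtering is given below; the expansion factor and depth window are assumptions for the example.

```python
# Hypothetical sketch of truncated-cone (frustum) filtering: keep only the radar
# points whose image projection falls inside the (slightly expanded) 2D box and
# whose depth lies within a window around the depth regressed from the image.
import numpy as np

def frustum_filter(pts_cam, uv, box_2d, est_depth, expand=1.1, depth_window=5.0):
    """pts_cam: (N, 3) radar points in camera frame; uv: (N, 2) their pixel coords;
    box_2d: (x1, y1, x2, y2); est_depth: depth from the primary regression."""
    cx, cy = (box_2d[0] + box_2d[2]) / 2, (box_2d[1] + box_2d[3]) / 2
    half_w = (box_2d[2] - box_2d[0]) / 2 * expand
    half_h = (box_2d[3] - box_2d[1]) / 2 * expand

    in_box = (np.abs(uv[:, 0] - cx) <= half_w) & (np.abs(uv[:, 1] - cy) <= half_h)
    in_depth = np.abs(pts_cam[:, 2] - est_depth) <= depth_window
    keep = in_box & in_depth

    if not np.any(keep):
        return None  # no radar point associated with this object
    # Use the nearest remaining point as the detection associated with the target.
    idx = np.argmin(pts_cam[keep, 2])
    return pts_cam[keep][idx]
```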
In this way, the embodiment generates the fusion feature map based on the aligned point cloud data and the primary regression attribute, fusing the fast, accurate depth measurements of the millimeter wave radar with the rich texture and semantics of the image data collected by the camera. This alleviates the problems that the point cloud data collected by the millimeter wave radar is sparse and carries no semantic information and that the 3D attributes estimated from the image data collected by the camera are inaccurate, which helps the three-dimensional target detection task perceive the attribute information of an object quickly, accurately and stably based on the millimeter wave radar and the camera.
As can be understood with reference to fig. 8, which is a schematic overall flow diagram of the multi-sensor fusion method of the present invention, the overall flow starts from an image read from the camera and a radar point cloud read from the millimeter wave radar. Primary regression processing is performed on the image through a fully convolutional backbone network to obtain the primary regression attributes and a 3D bounding box of each object in the image; the aligned radar point cloud data is expanded into pillars, the resulting pillar feature information is associated with the primary regression attributes, the intermediate feature map obtained by the association is then connected with the feature map channels of the primary regression attributes, and a secondary regression operation is performed on the fusion feature map obtained by the connection to obtain the secondary regression attributes. The preliminary 3D bounding box is adjusted by the secondary regression attributes to obtain target regression attributes of higher accuracy, so that after the initial convolutional neural network is inferred with the target regression attributes, the target object can be detected quickly, accurately and stably by the resulting target inference model. The primary regression processing performs regression on heads such as the heat map, offset, width and height, dimension, depth and rotation listed above, and the secondary regression attributes are produced by secondary regression heads that process the data with 3x3 conv (convolution) and 1x1 conv layers.
Further, the invention also provides a multi-sensor fusion device.
Referring to fig. 9, fig. 9 is a schematic functional block diagram of a first embodiment of a multi-sensor fusion device according to the present invention.
The multi-sensor fusion device includes:
the primary regression processing module 10 is used for acquiring image data based on a camera, and performing primary regression processing on the image data to obtain primary regression attributes of an object;
The alignment processing module 20 is configured to obtain initial point cloud data based on a millimeter wave radar, and perform alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;
A data fusion module 30, configured to generate a fusion feature map based on the aligned point cloud data and the primary regression attribute;
The secondary regression processing module 40 is configured to perform secondary regression processing on the fused feature map to obtain target attribute information;
The model inference module 50 is configured to infer an initial convolutional neural network based on the target attribute information to obtain a target inference model, so as to detect attribute information of a target object according to the target inference model.
Further, the primary regression processing module 10 includes:
The detection unit is used for calling the initial convolutional neural network, carrying out target detection on an object in the image data through the initial convolutional neural network, and generating an initial feature map for the object;
and the primary regression processing unit is used for carrying out primary regression processing on the initial feature map to obtain the primary regression attribute of the object.
Further, the alignment processing module 20 includes:
an extracting unit, configured to extract a target feature from the fusion feature map;
and the regression operation unit is used for carrying out regression operation on the target characteristics to obtain target attribute information.
Further, the data fusion module 30 includes:
the generating unit is used for generating an intermediate feature map according to the aligned point cloud data and the primary regression attribute;
and the connecting unit is used for connecting the intermediate feature map with the feature map channel of the primary regression attribute to form a fusion feature map.
Further, the data fusion module 30 further includes:
The first filtering unit is used for filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data;
The expansion unit is used for performing pillar expansion on the target point cloud data to form pillar feature information;
The association unit is used for associating the pillar feature information with the primary regression attribute to obtain a target pillar feature;
And the channel creation unit is used for creating a depth and speed complementary feature map channel according to the target pillar features and obtaining an intermediate feature map.
Further, the data fusion module 30 further includes:
The region creation unit is used for creating a 3D region-of-interest truncated cone and a 3D bounding box region-of-interest according to the primary regression attribute;
and the second filtering unit is used for filtering the alignment point cloud data according to the 3D region of interest truncated cone and the 3D bounding box region of interest to obtain target point cloud data.
Further, the data fusion module 30 further includes:
the training unit is used for acquiring a preset convolutional neural network and training data, and training the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.
In addition, the present invention further provides a storage medium, preferably a computer readable storage medium, on which a multisensor fusion program is stored, which when executed by a processor, implements the steps of the embodiments of the multisensor fusion method described above.
In the embodiments of the multi-sensor fusion device and the computer readable medium of the present invention, all technical features of each embodiment of the multi-sensor fusion method are included, and description and explanation contents are basically the same as those of each embodiment of the multi-sensor fusion method, which are not described in detail herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software running on a necessary general-purpose hardware platform, or by hardware alone; in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention may be embodied essentially, or in the part contributing to the prior art, in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and comprising instructions for causing a terminal device to perform the method according to the embodiments of the present invention. The terminal device may be a fixed terminal, such as an Internet-of-Things smart device, including smart-home equipment such as a smart air conditioner, a smart lamp, a smart power supply, or a smart router, or a mobile terminal, such as a smartphone, a wearable device, an AR/VR device, a smart speaker, an autonomous vehicle, or the like.
The foregoing description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation made using the contents of this specification, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (8)

1. A multi-sensor fusion method, the multi-sensor fusion method comprising:
Acquiring image data based on a camera, and performing primary regression processing on the image data to obtain primary regression attributes of an object;
acquiring initial point cloud data based on millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;
generating a fusion feature map based on the aligned point cloud data and the primary regression attribute;
performing secondary regression processing on the fusion feature map to obtain target attribute information;
performing inference on the initial convolutional neural network based on the target attribute information to obtain a target inference model, so that attribute information detection is performed on a target object according to the target inference model;
wherein the step of generating a fusion feature map based on the aligned point cloud data and the primary regression attribute comprises: generating an intermediate feature map according to the aligned point cloud data and the primary regression attribute; and connecting the intermediate feature map with the feature map channel of the primary regression attribute to form a fusion feature map;
and the step of generating an intermediate feature map according to the aligned point cloud data and the primary regression attribute comprises: filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data; performing pillar expansion on the target point cloud data to form pillar feature information; associating the pillar feature information with the primary regression attribute to obtain a target pillar feature; and creating a depth and speed complementary feature map channel according to the target pillar feature to obtain an intermediate feature map.
2. The multi-sensor fusion method of claim 1, wherein the step of filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data comprises:
creating a 3D region-of-interest truncated cone and a 3D bounding box region of interest according to the primary regression attribute;
and filtering the aligned point cloud data according to the 3D region-of-interest truncated cone and the 3D bounding box region of interest to obtain target point cloud data.
3. The multi-sensor fusion method of claim 1, wherein the step of performing a quadratic regression process on the fusion feature map to obtain the target attribute information comprises:
extracting target features from the fusion feature map;
and performing a regression operation on the target features to obtain target attribute information.
4. The multi-sensor fusion method of claim 1, wherein the step of performing a primary regression process on the image data to obtain primary regression attributes of the object comprises:
invoking the initial convolutional neural network, performing target detection on an object in the image data through the initial convolutional neural network, and generating an initial feature map for the object;
and performing primary regression processing on the initial feature map to obtain the primary regression attributes of the object.
5. The multi-sensor fusion method of claim 1, further comprising, before the step of performing inference on the initial convolutional neural network based on the target attribute information to obtain a target inference model:
acquiring a preset convolutional neural network and training data, and training the preset convolutional neural network according to the training data to obtain an initial convolutional neural network.
6. A multi-sensor fusion device, the multi-sensor fusion device comprising:
the primary regression processing module is used for acquiring image data based on the camera, and performing primary regression processing on the image data to obtain primary regression attributes of the object;
The alignment processing module is used for acquiring initial point cloud data based on the millimeter wave radar, and performing alignment processing on the initial point cloud data and the image data to obtain aligned point cloud data;
The data fusion module is used for generating a fusion feature map based on the aligned point cloud data and the primary regression attribute;
the secondary regression processing module is used for carrying out secondary regression processing on the fusion feature map to obtain target attribute information;
the model reasoning module is used for reasoning the initial convolutional neural network based on the target attribute information to obtain a target reasoning model so as to detect the attribute information of a target object according to the target reasoning model;
The data fusion module is further used for generating an intermediate feature map according to the aligned point cloud data and the primary regression attribute; connecting the intermediate feature map with the feature map channel of the primary regression attribute to form a fusion feature map;
The data fusion module is further used for filtering the aligned point cloud data according to the primary regression attribute to obtain target point cloud data; performing pillar expansion on the target point cloud data to form pillar feature information; associating the pillar feature information with the primary regression attribute to obtain a target pillar feature; and creating a depth and speed complementary feature map channel according to the target pillar features to obtain an intermediate feature map.
7. A multi-sensor fusion apparatus comprising a memory, a processor, and a multi-sensor fusion program stored on the memory and executable on the processor, wherein the multi-sensor fusion program, when executed by the processor, implements the steps of the multi-sensor fusion method of any one of claims 1-5.
8. A storage medium, wherein a multi-sensor fusion program is stored on the storage medium, which when executed by a processor, implements the steps of the multi-sensor fusion method according to any one of claims 1-5.
CN202110033933.0A 2021-01-11 2021-01-11 Multi-sensor fusion method, device, equipment and storage medium Active CN112712129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110033933.0A CN112712129B (en) 2021-01-11 2021-01-11 Multi-sensor fusion method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110033933.0A CN112712129B (en) 2021-01-11 2021-01-11 Multi-sensor fusion method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112712129A CN112712129A (en) 2021-04-27
CN112712129B true CN112712129B (en) 2024-04-19

Family

ID=75548781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110033933.0A Active CN112712129B (en) 2021-01-11 2021-01-11 Multi-sensor fusion method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112712129B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113267779A (en) * 2021-05-17 2021-08-17 南京师范大学 Target detection method and system based on radar and image data fusion
CN113449637A (en) * 2021-06-28 2021-09-28 桂林电子科技大学 Method and device for estimating human skeleton posture by millimeter wave radar
CN113486795A (en) * 2021-07-06 2021-10-08 广州小鹏自动驾驶科技有限公司 Visual identification performance test method, device, system and equipment
CN114091601B (en) * 2021-11-18 2023-05-05 业成科技(成都)有限公司 Sensor fusion method for detecting personnel condition
CN114387672A (en) * 2022-01-18 2022-04-22 北京理工大学 Human body behavior classification method based on time-space-frequency three-dimensional radar point cloud

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017196062A1 (en) * 2016-05-11 2017-11-16 Samsung Electronics Co., Ltd. Distance sensor, and calibration method performed by device and system including the distance sensor
CN110045729A (en) * 2019-03-12 2019-07-23 广州小马智行科技有限公司 A kind of Vehicular automatic driving method and device
US10509947B1 (en) * 2017-04-11 2019-12-17 Zoox, Inc. Converting multi-dimensional data for image analysis
CN110929692A (en) * 2019-12-11 2020-03-27 中国科学院长春光学精密机械与物理研究所 Three-dimensional target detection method and device based on multi-sensor information fusion
CN111178138A (en) * 2019-12-04 2020-05-19 国电南瑞科技股份有限公司 Distribution network wire operating point detection method and device based on laser point cloud and binocular vision
CN111222290A (en) * 2020-01-13 2020-06-02 浙江工业大学 Large-scale equipment residual service life prediction method based on multi-parameter feature fusion
CN111291714A (en) * 2020-02-27 2020-06-16 同济大学 Vehicle detection method based on monocular vision and laser radar fusion
CN111414848A (en) * 2020-03-19 2020-07-14 深动科技(北京)有限公司 Full-class 3D obstacle detection method, system and medium
CN111652050A (en) * 2020-04-20 2020-09-11 宁波吉利汽车研究开发有限公司 Method, device, equipment and medium for positioning traffic sign
CN111652914A (en) * 2019-02-15 2020-09-11 初速度(苏州)科技有限公司 Multi-sensor target fusion and tracking method and system
CN112036086A (en) * 2020-08-31 2020-12-04 北京市燃气集团有限责任公司 Dynamic risk early warning system for gas pipeline

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
A new method of multi-sensor information fusion based on SVM; Zhi-Xin Li; 2009 International Conference on Machine Learning and Cybernetics; full text *
Sensor Fusion: Gated Recurrent Fusion to Learn Driving Behavior from Temporal Multimodal Data; Athma Narayanan; arXiv; full text *
Obstacle detection method for intelligent vehicles based on information fusion; 陆峰, 徐友春, 李永乐, 王德宇, 谢德胜; Journal of Computer Applications; 2017-12-20 (Issue S2); full text *
3D object detection for nighttime unmanned vehicles based on multi-view fusion; 王宇岚, 孙韶媛, 刘致驿, 卜德飞; Journal of Applied Optics; 2020-03-15 (Issue 02); full text *
Field road scene recognition in hilly and mountainous areas based on an improved dilated convolutional neural network; 李云伍, 徐俊杰, 刘得雄, 于尧; Transactions of the Chinese Society of Agricultural Engineering; 2019-04-08 (Issue 07); full text *
Vehicle detection in traffic environments based on fusion of laser point cloud and image information; 郑少武, 李巍华, 胡坚耀; Chinese Journal of Scientific Instrument; 2019 (Issue 12); full text *
Obstacle recognition method for intelligent vehicles based on lidar and camera fusion; 张袅娜, 鲍旋旋, 李昊林; Science Technology and Engineering; 2020 (Issue 04); full text *
Research on multi-sensor image fusion algorithms; 韩永峰; China Master's Theses Full-text Database; full text *

Also Published As

Publication number Publication date
CN112712129A (en) 2021-04-27

Similar Documents

Publication Publication Date Title
CN112712129B (en) Multi-sensor fusion method, device, equipment and storage medium
CN110163904B (en) Object labeling method, movement control method, device, equipment and storage medium
US11915502B2 (en) Systems and methods for depth map sampling
EP3229041B1 (en) Object detection using radar and vision defined image detection zone
EP3414641B1 (en) System and method for achieving fast and reliable time-to-contact estimation using vision and range sensor data for autonomous navigation
US20200209880A1 (en) Obstacle detection method and apparatus and robot using the same
JP6450294B2 (en) Object detection apparatus, object detection method, and program
US20210103299A1 (en) Obstacle avoidance method and device and movable platform
US20200027229A1 (en) Annotation cross-labeling for autonomous control systems
CN113378760A (en) Training target detection model and method and device for detecting target
TWI726278B (en) Driving detection method, vehicle and driving processing device
CN113568435A (en) Unmanned aerial vehicle autonomous flight situation perception trend based analysis method and system
CN112683228A (en) Monocular camera ranging method and device
Dinesh Kumar et al. Stereo camera and LIDAR sensor fusion-based collision warning system for autonomous vehicles
Fu et al. Camera-based semantic enhanced vehicle segmentation for planar lidar
CN112733678A (en) Ranging method, ranging device, computer equipment and storage medium
CN116243329A (en) High-precision multi-target non-contact ranging method based on laser radar and camera fusion
CN108416305B (en) Pose estimation method and device for continuous road segmentation object and terminal
Muhovic et al. Depth fingerprinting for obstacle tracking using 3D point cloud
Neto et al. Real-time collision risk estimation based on Pearson's correlation coefficient
Dekkiche et al. Vehicles detection in stereo vision based on disparity map segmentation and objects classification
CN109542231B (en) Method and device for feeding back information, electronic equipment and storage medium
CN115562305A (en) Self-walking equipment movement control method and device and self-walking equipment
CN117409393A (en) Method and system for detecting laser point cloud and visual fusion obstacle of coke oven locomotive
CN117636190A (en) Unmanned aerial vehicle obstacle detection method based on deep learning

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant