CN112434682B - Data fusion method and device based on multiple sensors and storage medium


Info

Publication number
CN112434682B
CN112434682B
Authority
CN
China
Prior art keywords
sensor
feature map
target image
sensors
data
Prior art date
Legal status
Active
Application number
CN202110110114.1A
Other languages
Chinese (zh)
Other versions
CN112434682A
Inventor
陈伟
Current Assignee
Imotion Automotive Technology Suzhou Co Ltd
Original Assignee
Imotion Automotive Technology Suzhou Co Ltd
Priority date
Filing date
Publication date
Application filed by Imotion Automotive Technology Suzhou Co Ltd
Priority to CN202110110114.1A
Publication of CN112434682A
Application granted
Publication of CN112434682B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

The application relates to a data fusion method, device, and storage medium based on multiple sensors, belonging to the field of computer technology. The method comprises the following steps: acquiring sensor data of the same detection range separately collected by n sensors; extracting features from the sensor data corresponding to each sensor to obtain a feature map corresponding to each sensor; determining the channel weight of each image channel within the target image region of each feature map; updating the channel information of each feature map according to the channel weights to obtain updated feature maps; determining, for the target image region in the updated feature map corresponding to each sensor, the region weight of that target image region; and merging the target image regions of the updated feature maps according to the region weights to obtain a fused feature map. This solves the problem of the poor fusion quality of existing multi-sensor data fusion approaches and improves the data fusion effect.

Description

Data fusion method and device based on multiple sensors and storage medium
Technical Field
The application relates to a data fusion method, device, and storage medium based on multiple sensors, and belongs to the field of computer technology.
Background
With the development of autonomous driving technology, vehicles now provide environment-sensing functions such as object detection and object recognition. These functions are realized by sensors mounted on the vehicle. A vehicle typically carries several kinds of sensors, so the data collected by the various sensors must be fused during environment sensing.
At present, data collected by the various sensors is fused as follows: features are extracted from the data collected by each sensor, and the feature-extraction results are then concatenated according to fixed weights; alternatively, the feature-extraction results are added according to fixed weights.
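For concreteness, a minimal sketch of this fixed-weight scheme follows (PyTorch; the weight values, tensor shapes, and names are illustrative only, not taken from any particular prior-art system):

```python
import torch

# Fixed-weight fusion as described above: w1 and w2 are constants chosen
# offline, so they cannot adapt to the current scene.
w1, w2 = 0.6, 0.4                         # illustrative constants
feat_camera = torch.randn(1, 64, 32, 32)  # placeholder camera features
feat_lidar = torch.randn(1, 64, 32, 32)   # placeholder lidar features

fused_concat = torch.cat([w1 * feat_camera, w2 * feat_lidar], dim=1)  # combine
fused_sum = w1 * feat_camera + w2 * feat_lidar                        # or add
```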
However, each type of sensor data has strengths and weaknesses that depend on the scene. For example, when a target is detected at night from image data collected by an image sensor, poor illumination makes the target hard to detect, whereas detection from the laser point cloud collected by a lidar sensor is unaffected by lighting. Conversely, laser point clouds are sparse; in a target-recognition scene the scarce feature information lowers recognition accuracy, whereas image data provides rich texture information that improves it. If each sensor is assigned a fixed weight, that weight may not suit the current scene, and the data fusion effect suffers.
Disclosure of Invention
The application provides a data fusion method, device, and storage medium based on multiple sensors, solving the problem that, when features from multiple sensors are fused with a fixed weight per sensor, the weights cannot adapt to the current scene and the data fusion effect is poor. The application provides the following technical solution:
in a first aspect, a multi-sensor based data fusion method is provided, the method comprising:
acquiring sensor data of the same detection range respectively acquired by n sensors, wherein n is an integer greater than 1;
respectively extracting features from the sensor data corresponding to each sensor to obtain a feature map corresponding to each sensor;
determining a target image region of a target to be detected based on a first feature map corresponding to a first sensor of the n sensors, and mapping the target image region to a second feature map corresponding to a second sensor of the n sensors to obtain a target image region of the second feature map, the second sensor being a sensor of the n sensors other than the first sensor;
for the target image region in the feature map corresponding to each sensor, determining the importance ratio of each image channel in the target image region, wherein the importance ratio is the channel weight of the corresponding image channel, and different image channels represent different types of image information;
updating the channel information of the feature map according to the channel weights to obtain an updated feature map;
for the target image region in the updated feature map corresponding to each sensor, determining the credibility of the target image region in the updated feature map, wherein the credibility is the region weight;
and merging the target image regions in the updated feature maps according to the region weights to obtain a fused feature map.
Optionally, the updating the channel information of the feature map according to the channel weight to obtain an updated feature map includes:
and multiplying each channel weight by the feature map to obtain the updated feature map.
Optionally, the merging the target image regions in each updated feature map according to the region weights to obtain a fused feature map includes:
for each updated feature map, multiplying the region weight by a target image region in the updated feature map to obtain a region-updated feature map;
and merging the region-updated feature maps to obtain the fused feature map.
Optionally, the extracting features from the sensor data corresponding to each sensor to obtain a feature map corresponding to each sensor includes:
inputting the sensor data corresponding to each sensor into a corresponding convolution layer to obtain the feature map corresponding to the sensor, wherein the convolution layers corresponding to different sensors are different.
Optionally, the determining a target image region of the target to be detected based on the first feature map corresponding to the first sensor of the n types of sensors includes:
inputting the first feature map into a preset deconvolution layer;
and carrying out three-dimensional target detection on the deconvolved feature map to obtain the target image region of the target to be detected.
Optionally, the acquiring sensor data of the same detection range respectively acquired by the n types of sensors includes:
obtaining calibration parameters of the n sensors;
acquiring initial sensor data of the n sensors, wherein the initial sensor data is data acquired by the n sensors at the same time;
converting the initial sensor data to a common coordinate system according to the calibration parameters;
and screening the sensor data in the common coordinate system to obtain the sensor data of the same detection range.
Optionally, the first sensor comprises a lidar sensor; the second sensor includes at least one of an image sensor and a millimeter wave radar sensor.
Optionally, after the target image regions in the updated feature maps are merged according to the region weights to obtain the fused feature map, the method further includes:
and carrying out target detection according to the fused feature map to obtain a target detection result.
In a second aspect, a multi-sensor based data fusion apparatus is provided, the apparatus comprising a processor and a memory; the memory has stored therein a program that is loaded and executed by the processor to implement the multi-sensor based data fusion method provided by the first aspect.
In a third aspect, a computer-readable storage medium is provided, in which a program is stored, which when executed by a processor is configured to implement the multi-sensor based data fusion method provided in the first aspect.
The beneficial effects of this application include at least the following: sensor data of the same detection range is acquired from each of the n sensors; features are extracted from the sensor data of each sensor to obtain a feature map per sensor; a target image region of the target to be detected is determined from the first feature map corresponding to a first sensor and mapped into the second feature map corresponding to each second sensor; for the target image region in each feature map, the importance ratio of each image channel is determined and used as that channel's weight; the channel information of each feature map is updated accordingly; the credibility of the target image region in each updated feature map is determined and used as its region weight; and the target image regions of the updated feature maps are merged according to the region weights into a fused feature map. This solves the problem that fixed per-sensor weights cannot adapt to the current scene during multi-sensor feature fusion, which degrades the fusion result. Because an attention mechanism and a reliability mechanism are introduced into the feature fusion of the multi-source heterogeneous sensors, the important image-channel information within the region of interest of each feature map is selected, converted into ratios, and applied as multiplicative weights; the credibility of each sensor's ROI feature map is then predicted after the attention processing and applied as a region weight before the weighted maps are summed into the fused feature map. The channel weights and region weights thus adapt to the current image, that is, to the current scene, so the data fusion effect is improved.
The foregoing is only an overview of the technical solution of the present application. To make the technical solution clearer and to enable implementation according to the description, preferred embodiments of the application are described in detail below with reference to the accompanying drawings.
Description of the Drawings
FIG. 1 is a flow chart of a multi-sensor based data fusion method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a data fusion process provided by one embodiment of the present application;
FIG. 3 is a diagram illustrating channel information update provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a feature fusion process provided by one embodiment of the present application;
FIG. 5 is a block diagram of a multi-sensor based data fusion apparatus provided in one embodiment of the present application;
fig. 6 is a block diagram of a multi-sensor based data fusion apparatus according to yet another embodiment of the present application.
Detailed Description of the Embodiments
Embodiments of the present application are described in detail below in conjunction with the accompanying drawings and examples. The following examples illustrate the application but do not limit its scope.
First, several terms referred to in the present application will be described.
Feature Extraction: the process of finding, among the original features, those that are most effective, that is, invariant within a class, discriminative across classes, and robust to noise.
Neural Network: a mathematical model that simulates the behavior of biological neural networks and performs distributed, parallel information processing.
Feature-Level Fusion: each of the multi-source heterogeneous sensors extracts its own features, and the extracted features are combined by some method into new features that are more effective for classification.
Region of Interest (ROI): a region selected from the whole data block that is the focus of the data analysis.
Region Proposal: a method that takes an image (of any size) as input and, through fully convolutional network computation, outputs a set of rectangular target proposal boxes.
Attention Mechanism: focusing on the important parts of a large amount of information, selecting the key information and ignoring the rest.
Target Detection: finding all objects of interest in an image and determining their locations and classes.
Optionally, the embodiments are described with their execution subject being an electronic device with data-processing capability. The electronic device may be a terminal or a server, and the terminal may be a mobile phone, a computer, a notebook computer, a tablet computer, a vehicle-mounted computer, and the like.
In this application, the electronic device is communicatively connected to n sensors installed on the same vehicle (the target vehicle), where n is an integer greater than 1. The n sensors are of different types; in other words, they are multi-source heterogeneous sensors. The types may differ in data-acquisition principle, in the kind of signal emitted, in sensing range, in sensing distance, and so on; this embodiment does not limit how the sensor types differ. The number of sensors of each type may be one or more; this embodiment does not limit the number of each sensor on the vehicle.
Optionally, the n sensors include, but are not limited to, n of the following: a lidar sensor, a millimeter-wave radar sensor, an image sensor (i.e., a camera), an ultrasonic radar sensor, and so on; this embodiment does not limit the sensor types of the n sensors.
Fig. 1 is a flowchart of a multi-sensor based data fusion method according to an embodiment of the present application. The method at least comprises the following steps:
Step 101, acquiring sensor data of the same detection range respectively collected by the n sensors.
Optionally, the n sensors are installed at different positions on the vehicle, and the data collected by each sensor is usually expressed in a coordinate system established at that sensor's position. The data collected by different sensors may therefore correspond to different coordinate systems. To express the data of all n sensors in the same coordinate system, extrinsic calibration must be performed on the n sensors before this step to obtain the coordinate transformation between the coordinate systems of the sensors, so that data in different coordinate systems can be transformed into the same coordinate system based on that transformation.
The calibration parameters of the n sensors obtained from this extrinsic calibration are used to transform data from the different coordinate systems into the same coordinate system.
In one example, the calibration parameters are used to transform data from the different coordinate systems into a predetermined common coordinate system. The common coordinate system is established with the target vehicle as the coordinate origin. Optionally, the origin is the center of the target vehicle's rear axle, or the center point of the target vehicle; this embodiment does not limit how the origin of the common coordinate system is chosen. Optionally, the z-axis of the common coordinate system is perpendicular to the ground, the y-axis points along the travel direction of the target vehicle, and the x-axis is perpendicular to both the z-axis and the y-axis. In other embodiments, the x-, y-, and z-axes of the common coordinate system may be arranged differently; this embodiment does not limit their arrangement.
Optionally, since the n sensors collect data at different frequencies (for example, acquisition frequencies of 50 Hz, 100 Hz, and 25 Hz), the sensor data needs to be time-synchronized before fusion. Accordingly, the acquisition times of the n sensors must be synchronized before this step. Options for time synchronization include, but are not limited to, synchronization based on the Precision Time Protocol (PTP) or on the Network Time Protocol (NTP); this embodiment does not limit the time-synchronization method.
Based on the above, optionally, the acquiring sensor data of the same detection range respectively acquired by the n types of sensors includes: obtaining calibration parameters of n sensors; acquiring initial sensor data of the n sensors, wherein the initial sensor data is data acquired by the n sensors at the same time; converting the initial sensor data into a common coordinate system according to the calibration parameters; and screening the sensor data in the common coordinate system to obtain the sensor data in the same detection range.
The detection range is a three-dimensional range in the common coordinate system. There may be one or more detection ranges, and their positions may be pre-stored in the electronic device or selected by the user; this embodiment does not limit how the detection range is determined.
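As an illustration of this acquisition step, the following NumPy sketch assumes each sensor's calibration parameters take the form of a 4 x 4 extrinsic matrix and the detection range is an axis-aligned box; all names and values are assumptions of the sketch, not taken from the patent:

```python
import numpy as np

def to_common_frame(points: np.ndarray, extrinsic: np.ndarray) -> np.ndarray:
    """Transform Nx3 sensor points into the common (vehicle) frame using a
    4x4 calibration matrix; this matrix form is an assumption of the sketch."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (extrinsic @ homogeneous.T).T[:, :3]

def filter_detection_range(points, x_lim, y_lim, z_lim):
    """Keep only the points inside the shared three-dimensional detection range."""
    mask = ((points[:, 0] >= x_lim[0]) & (points[:, 0] <= x_lim[1]) &
            (points[:, 1] >= y_lim[0]) & (points[:, 1] <= y_lim[1]) &
            (points[:, 2] >= z_lim[0]) & (points[:, 2] <= z_lim[1]))
    return points[mask]

# Usage: transform time-synchronized points, then crop to the detection range.
lidar_points = np.random.rand(1000, 3) * 100   # placeholder sensor data
calibration = np.eye(4)                        # placeholder extrinsics
in_common = to_common_frame(lidar_points, calibration)
in_range = filter_detection_range(in_common, (-40, 40), (0, 80), (-2, 4))
```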
Step 102, extracting features from the sensor data corresponding to each sensor, respectively, to obtain a feature map corresponding to each sensor.
The sensor data represents an image of the environment around the target vehicle, and a feature map of this environment image is obtained by feature extraction. Optionally, feature-extraction methods include, but are not limited to: Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), or feature extraction based on a neural network.
In one example, taking neural-network-based feature extraction as the illustration, extracting features from the sensor data corresponding to each sensor to obtain the feature map corresponding to each sensor comprises: inputting the sensor data corresponding to each sensor into the corresponding convolution layer to obtain the feature map corresponding to that sensor, where the convolution layers corresponding to different sensors are different.
Referring to the data fusion process shown in fig. 2, the n sensors include a lidar sensor, a millimeter-wave radar sensor, and an image sensor. The lidar data corresponding to the lidar sensor is input to the first convolutional layer 21, the millimeter-wave radar data corresponding to the millimeter-wave radar sensor is input to the second convolutional layer 22, and the image data corresponding to the image sensor is input to the third convolutional layer 23. The first convolutional layer 21, the second convolutional layer 22, and the third convolutional layer 23 are all distinct convolutional layers.
Optionally, the strides and/or kernel sizes of the different convolution layers may be the same or different. In addition, the number of convolution layers corresponding to each sensor may be one or more; this embodiment does not limit the number of convolution layers per sensor.
In addition, the feature map corresponding to each sensor is three-dimensional data comprising the image's width, height, and image-channel data.
In this embodiment, the feature map includes a plurality of image channels, and different image channels represent different types of image information. For example, the feature map may include three channels representing, respectively, texture information, color information, and the sensor's signal-strength information.
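For illustration, such per-sensor convolutional branches could be sketched in PyTorch as below; the channel counts, kernel sizes, and class names are assumptions of the sketch rather than values fixed by the patent:

```python
import torch
import torch.nn as nn

class SensorBranch(nn.Module):
    """Feature extraction for one sensor. Each sensor gets its own,
    non-shared convolution layers, as in step 102."""
    def __init__(self, in_channels: int, out_channels: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_channels, kernel_size=3, padding=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_channels, H, W) -> feature map (batch, out_channels, H, W)
        return self.conv(x)

# One distinct branch per sensor modality (cf. convolutional layers 21-23 in fig. 2).
lidar_branch = SensorBranch(in_channels=1)    # e.g. a rasterized point-cloud map
radar_branch = SensorBranch(in_channels=1)    # e.g. a millimeter-wave radar map
camera_branch = SensorBranch(in_channels=3)   # an RGB image
```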
Step 103, determining a target image region of the target to be detected based on the first feature map corresponding to a first sensor of the n sensors, and mapping the target image region to the second feature map corresponding to a second sensor of the n sensors to obtain the target image region of the second feature map.
Wherein the second sensor is a different sensor of the n types of sensors than the first sensor.
The first sensor may be any one of the n sensors, or a designated type of sensor among the n sensors. The target to be detected can be detected based on the first feature map corresponding to the first sensor.
Taking the sensor types shown in fig. 2 as an example, the first sensor comprises a lidar sensor, and the second sensor includes at least one of an image sensor and a millimeter-wave radar sensor; fig. 2 illustrates the case where the second sensor includes both. Target detection is performed on the first feature map corresponding to the first sensor to obtain the target image region of the target to be detected, and the target image region is then mapped to the corresponding position in each second feature map.
It should be added that the first and second sensor types shown in fig. 2 are only illustrative; in practical implementations they may be other types, and this embodiment does not limit the implementation of the first sensor and the second sensor.
Optionally, the target detection algorithm extracts a three-dimensional target from the first feature map; in other words, the target image region obtained through the target detection algorithm is also three-dimensional data, namely the region's width and height plus its image-channel data.
Three-dimensional target detection algorithms include, but are not limited to: target recognition based on local features, target recognition based on global features, or target recognition based on a neural network; this embodiment does not limit the implementation of the three-dimensional target detection algorithm.
In an example where the first feature map is obtained by convolving the sensor data, and referring to fig. 2, determining the target image region of the target to be detected based on the first feature map corresponding to the first sensor includes: inputting the first feature map into a preset deconvolution layer 24, and performing three-dimensional target detection on the deconvolved feature map through a three-dimensional target detection layer 25 to obtain the target image region of the target to be detected. The target image region is then mapped to the second feature map. In fig. 2, the mapping is illustrated with region-of-interest pooling (ROI pooling), but in practice other region-mapping methods may be used; this embodiment does not limit the region-mapping method.
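A minimal sketch of this step, assuming PyTorch and torchvision's ROI pooling; the detection head producing the boxes is a placeholder, since the patent does not fix a particular three-dimensional detection algorithm, and coordinate scaling between maps of different resolutions is ignored for brevity:

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

deconv = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)  # preset deconvolution layer

def detect_boxes(feature_map: torch.Tensor) -> list:
    """Placeholder for the three-dimensional target detection layer 25:
    returns one (K, 4) tensor of (x1, y1, x2, y2) boxes per batch element."""
    return [torch.tensor([[8.0, 8.0, 24.0, 24.0]]) for _ in range(feature_map.shape[0])]

def map_target_regions(first_map, second_maps, out_size=(7, 7)):
    """Detect on the first sensor's deconvolved map, then project the same
    boxes into every sensor's map via region-of-interest pooling."""
    upsampled = deconv(first_map)
    boxes = detect_boxes(upsampled)
    first_roi = roi_pool(upsampled, boxes, out_size)
    second_rois = [roi_pool(m, boxes, out_size) for m in second_maps]
    return first_roi, second_rois
```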
Step 104, for the target image region in the feature map corresponding to each sensor, determining the importance ratio of each image channel in the target image region, where the importance ratio is the channel weight of the corresponding image channel.
The importance ratio of the image channels is calculated using a pre-trained network model. Training this network model comprises: acquiring the target image regions of a plurality of sample feature maps and labeling the importance ratio of each image channel in those regions; and iteratively training the network model with the sample target image regions and their importance-ratio labels to obtain the trained network model.
In one example, referring to fig. 2, the second feature map of each second sensor is deconvolved, the target image region in the deconvolved feature map is obtained through region mapping, and the target image regions of the first sensor and of each second sensor are then passed through their corresponding importance-ratio networks to obtain the importance ratio, i.e., the channel weight, of each image channel.
Step 105, updating the channel information of the feature map according to the channel weights to obtain an updated feature map.
Optionally, updating the channel information of the feature map according to the channel weights to obtain an updated feature map includes: multiplying each channel weight by the feature map to obtain the updated feature map (this is the attention mechanism). Specifically, each channel weight is multiplied by the corresponding image-channel data in the feature map to obtain the updated feature map.
Referring to the channel-information update shown in fig. 3, for the target image region (w x h x c) of a feature map, w is the width, h is the height, and c is the image-channel data. Each image channel of c in the target image region is multiplied by its corresponding channel weight to obtain the updated feature map.
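One common way to realize the channel-weight prediction plus multiplication just described is a squeeze-and-excitation-style block. The sketch below, including its layer sizes, is an assumption; the patent only requires a trained network that outputs one importance ratio per channel (step 104) followed by a per-channel multiplication (step 105):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Predict an importance ratio per image channel of an ROI and
    rescale the ROI with it (steps 104 and 105)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, roi: torch.Tensor) -> torch.Tensor:
        # roi: (batch, c, h, w); a global average pool summarizes each channel
        weights = self.fc(roi.mean(dim=(2, 3)))     # (batch, c), each in [0, 1]
        return roi * weights[:, :, None, None]      # the updated feature map
```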
Step 106, for the target image region in the updated feature map corresponding to each sensor, determining the credibility of the target image region in the updated feature map, where the credibility is the region weight.
The credibility of the target image region is calculated using a pre-trained network model. Training this network model comprises: acquiring the target image regions of a plurality of sample feature maps and labeling each region's credibility; and iteratively training the network model with the sample target image regions and their credibility labels to obtain the trained network model.
Step 107, merging the target image regions in the updated feature maps according to the region weights to obtain the fused feature map.
Merging the target image regions in the updated feature maps according to the region weights to obtain the fused feature map comprises: for each updated feature map, multiplying the region weight by the target image region in that updated feature map to obtain a region-updated feature map; and merging the region-updated feature maps to obtain the fused feature map.
In one example, referring to fig. 2, after obtaining the region weight of the target image region corresponding to each sensor, the feature maps corresponding to each sensor are fused by the fusion layer 27 according to the region weight, so as to obtain a fused feature map.
Referring to the feature-map fusion process shown in fig. 4, the updated feature map A corresponding to the lidar sensor is multiplied by its credibility (i.e., region weight) l1, the updated feature map B corresponding to the millimeter-wave radar sensor by its credibility l2, and the updated feature map C corresponding to the image sensor by its credibility l3; the products are then summed to obtain the fused feature map.
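Steps 106 and 107 together could be sketched as follows; the one-score credibility head per sensor is an assumed design that matches the description of one region weight per sensor ROI before the weighted sum:

```python
import torch
import torch.nn as nn

class ReliabilityFusion(nn.Module):
    """One credibility (region-weight) head per sensor; the fused map is the
    credibility-weighted sum of the per-sensor ROI feature maps."""
    def __init__(self, channels: int, n_sensors: int):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(channels, 1), nn.Sigmoid())
            for _ in range(n_sensors)
        )

    def forward(self, rois):
        # rois: one (batch, c, h, w) ROI feature map per sensor
        fused = 0
        for head, roi in zip(self.heads, rois):
            weight = head(roi)                      # (batch, 1), e.g. l1, l2, l3
            fused = fused + roi * weight[:, :, None, None]
        return fused
```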
In this embodiment, feature extraction, three-dimensional target detection, region mapping, channel-weight calculation, credibility calculation, and feature-map fusion may all be implemented within the same network model, so data fusion can be realized by training a single network model. Specifically, referring to fig. 2, the data-fusion network model includes one network branch per sensor (i.e., n network branches) and a fusion layer 27 connected to every branch. Each branch comprises a feature-extraction layer (the first convolutional layer 21, second convolutional layer 22, or third convolutional layer 23 in fig. 2), a region-mapping layer connected to it (the deconvolution layer 24 and region-of-interest pooling layer in fig. 2), a channel-weight calculation layer connected to the region-mapping layer, and a credibility calculation layer 26. The branch corresponding to the first sensor further comprises a deconvolution layer connected to its feature-extraction layer and a three-dimensional target detection layer 25 connected to that deconvolution layer; the output of the three-dimensional target detection layer 25 is connected to the region-mapping layer of every branch.
Of course, each function may also be implemented as its own network model; this embodiment does not limit the implementation of the data-fusion network model.
Optionally, after the target image regions in the updated feature maps are merged according to the region weights to obtain the fused feature map, the method further includes: performing target detection on the fused feature map to obtain a target detection result.
In summary, the multi-sensor-based data fusion method provided in this embodiment acquires sensor data of the same detection range from each of the n sensors; extracts features from the sensor data of each sensor to obtain a feature map per sensor; determines a target image region of the target to be detected from the first feature map corresponding to the first sensor and maps it into the second feature map corresponding to each second sensor; determines, for the target image region in each feature map, the importance ratio of each image channel as that channel's weight; updates the channel information of each feature map accordingly; determines the credibility of the target image region in each updated feature map as its region weight; and merges the target image regions of the updated feature maps according to the region weights into a fused feature map. This solves the problem that fixed per-sensor weights cannot adapt to the current scene during multi-sensor feature fusion, which degrades the fusion result. Because an attention mechanism and a reliability mechanism are introduced into the feature fusion of the multi-source heterogeneous sensors, the important image-channel information within each feature map's region of interest is selected, converted into ratios, and applied as multiplicative weights; the credibility of each sensor's ROI feature map is then predicted after the attention processing and applied as a region weight before the weighted maps are summed into the fused feature map. The channel weights and region weights thus adapt to the current image, that is, to the current scene, so the data fusion effect is improved.
Fig. 5 is a block diagram of a multi-sensor based data fusion apparatus according to an embodiment of the present application. The device at least comprises the following modules: a data acquisition module 510, a feature extraction module 520, a region localization module 530, a first computation module 540, a channel update module 550, a second computation module 560, and a feature fusion module 570.
A data obtaining module 510, configured to obtain sensor data of the same detection range respectively acquired by n types of sensors, where n is an integer greater than 1;
the feature extraction module 520 is configured to perform feature extraction on the sensor data corresponding to each sensor, so as to obtain a feature map corresponding to each sensor;
the area positioning module 530 is configured to determine a target image area of the target to be detected based on a first feature map corresponding to a first sensor of the n types of sensors, and map the target image area to a second feature map corresponding to a second sensor of the n types of sensors to obtain a target image area of the second feature map; the second sensor is a different one of the n sensors than the first sensor;
a first calculating module 540, configured to determine, for a target image region in a feature map corresponding to each sensor, an importance ratio of each image channel in the target image region, where the importance ratio is a channel weight of a corresponding image channel; wherein different image channels represent different types of image information;
a channel updating module 550, configured to update the channel information of the feature map according to the channel weight, to obtain an updated feature map;
a second calculating module 560, configured to determine, for a target image region in an updated feature map corresponding to each sensor, a reliability of the target image region in the updated feature map, where the reliability is a region weight;
and the feature fusion module 570 is configured to merge the target image regions in the updated feature maps according to the region weights to obtain a fused feature map.
For relevant details, refer to the method embodiments described above.
It should be noted that the multi-sensor data fusion device provided in the above embodiment is illustrated only with the division of functional modules described above; in practice, the functions may be assigned to different functional modules as needed. That is, the internal structure of the device may be divided into different modules to perform all or part of the functions described above. In addition, the multi-sensor-based data fusion device provided in the above embodiment and the multi-sensor-based data fusion method belong to the same concept; the specific implementation is detailed in the method embodiments and is not repeated here.
Fig. 6 is a block diagram of a multi-sensor based data fusion apparatus according to an embodiment of the present application. The apparatus comprises at least a processor 601 and a memory 602.
Processor 601 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 601 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processor), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 601 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 601 may also include an AI (Artificial Intelligence) processor for computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement the multi-sensor based data fusion method provided by the method embodiments herein.
In some embodiments, the multi-sensor based data fusion device may further include: a peripheral interface and at least one peripheral. The processor 601, memory 602 and peripheral interface may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.
Of course, the multi-sensor based data fusion device may also include fewer or more components, which is not limited by the embodiment.
Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the multi-sensor based data fusion method of the above method embodiment.
Optionally, the present application further provides a computer program product, which includes a computer-readable storage medium in which a program is stored; the program is loaded and executed by a processor to implement the multi-sensor based data fusion method of the above method embodiment.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; nevertheless, any combination of these technical features that contains no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application; their description is specific and detailed, but it should not be construed as limiting the scope of the invention. A person skilled in the art may make several variations and improvements without departing from the concept of the present application, and these fall within the scope of protection of the application. The scope of protection of this patent is therefore subject to the appended claims.

Claims (10)

1. A multi-sensor based data fusion method, the method comprising:
acquiring sensor data of the same detection range respectively acquired by n sensors, wherein n is an integer greater than 1;
respectively extracting features from the sensor data corresponding to each sensor to obtain a feature map corresponding to each sensor; the feature map corresponding to each sensor is three-dimensional data, including the width and height of the image and image channel data;
determining a target image region of a target to be detected based on a first feature map corresponding to a first sensor of the n sensors, and mapping the target image region to a second feature map corresponding to a second sensor of the n sensors to obtain a target image region of the second feature map; the second sensor is a sensor of the n sensors other than the first sensor;
for the target image region in the feature map corresponding to each sensor, determining the importance ratio of each image channel in the target image region, wherein the importance ratio is the channel weight of the corresponding image channel, and different image channels represent different types of image information;
updating the channel information of the feature map according to the channel weights to obtain an updated feature map;
for the target image region in the updated feature map corresponding to each sensor, determining the credibility of the target image region in the updated feature map, wherein the credibility is the region weight;
and merging the target image regions in the updated feature maps according to the region weights to obtain a fused feature map.
2. The method according to claim 1, wherein the updating the channel information of the feature map according to the channel weight to obtain an updated feature map comprises:
and multiplying each channel weight by the feature map to obtain the updated feature map.
3. The method according to claim 1, wherein the merging the target image regions in the updated feature maps according to the region weights to obtain a fused feature map comprises:
for each updated feature map, multiplying the region weight by a target image region in the updated feature map to obtain a region-updated feature map;
and merging the region-updated feature maps to obtain the fused feature map.
4. The method according to claim 1, wherein the extracting features from the sensor data corresponding to each sensor to obtain a feature map corresponding to each sensor comprises:
inputting the sensor data corresponding to each sensor into a corresponding convolution layer to obtain the feature map corresponding to the sensor; wherein the convolution layers corresponding to different sensors are different.
5. The method according to claim 4, wherein the determining the target image region of the target to be detected based on the first feature map corresponding to the first sensor of the n sensors comprises:
inputting the first feature map into a preset deconvolution layer;
and carrying out three-dimensional target detection on the deconvolved feature map to obtain the target image region of the target to be detected.
6. The method according to claim 1, wherein the acquiring sensor data of the same detection range acquired by the n types of sensors respectively comprises:
obtaining calibration parameters of the n sensors;
acquiring initial sensor data of the n sensors, wherein the initial sensor data is data acquired by the n sensors at the same time;
converting the initial sensor data to a common coordinate system according to the calibration parameters;
and screening the sensor data in the common coordinate system to obtain the sensor data in the same detection range.
7. The method of claim 1, wherein the first sensor comprises a lidar sensor; the second sensor includes at least one of an image sensor and a millimeter wave radar sensor.
8. The method according to any one of claims 1 to 7, wherein after the merging the target image regions in the updated feature maps according to the region weights to obtain the fused feature map, the method further comprises:
and carrying out target detection according to the fused feature map to obtain a target detection result.
9. A multi-sensor based data fusion apparatus, the apparatus comprising a processor and a memory; the memory has stored therein a program that is loaded and executed by the processor to implement the multi-sensor based data fusion method of any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the storage medium has stored therein a program which, when being executed by a processor, is adapted to carry out the multi-sensor based data fusion method according to any one of claims 1 to 8.
CN202110110114.1A 2021-01-27 2021-01-27 Data fusion method and device based on multiple sensors and storage medium Active CN112434682B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110110114.1A CN112434682B (en) 2021-01-27 2021-01-27 Data fusion method and device based on multiple sensors and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110110114.1A CN112434682B (en) 2021-01-27 2021-01-27 Data fusion method and device based on multiple sensors and storage medium

Publications (2)

Publication Number Publication Date
CN112434682A CN112434682A (en) 2021-03-02
CN112434682B (en) 2021-04-27

Family

ID=74697347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110110114.1A Active CN112434682B (en) 2021-01-27 2021-01-27 Data fusion method and device based on multiple sensors and storage medium

Country Status (1)

Country Link
CN (1) CN112434682B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837385B (en) * 2021-09-06 2024-02-09 东软睿驰汽车技术(沈阳)有限公司 Data processing method, device, equipment, medium and product
CN113792716B (en) * 2021-11-16 2022-02-25 北京安录国际技术有限公司 Multidimensional data fusion method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102017207960A1 (en) * 2017-05-11 2018-11-15 Volkswagen Aktiengesellschaft METHOD AND DEVICE FOR LOCALLY DETECTED DETECTION FROM A VEHICLE-EXTINGUISHED OBJECT USING A SENSOR BUILT IN A VEHICLE
CN110929692B (en) * 2019-12-11 2022-05-24 中国科学院长春光学精密机械与物理研究所 Three-dimensional target detection method and device based on multi-sensor information fusion
CN111144364B (en) * 2019-12-31 2022-07-26 北京理工大学重庆创新中心 Twin network target tracking method based on channel attention updating mechanism
CN111976718B (en) * 2020-07-13 2022-03-01 浙江华锐捷技术有限公司 Automatic parking control method and system

Also Published As

Publication number Publication date
CN112434682A (en) 2021-03-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 215123 g2-1901 / 1902 / 2002, No. 88, Jinjihu Avenue, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Patentee after: Zhixing Automotive Technology (Suzhou) Co.,Ltd.

Address before: 215123 g2-1901 / 1902 / 2002, No. 88, Jinjihu Avenue, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Patentee before: IMOTION AUTOMOTIVE TECHNOLOGY (SUZHOU) Co.,Ltd.
