CN112233079A - Method and system for fusing images of multiple sensors

Info

Publication number: CN112233079A (application CN202011084849.3A); granted publication CN112233079B
Authority: CN (China)
Prior art keywords: loss, fusion, image, network, mask
Legal status: Granted; currently Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202011084849.3A
Other languages: Chinese (zh)
Other versions: CN112233079B (en)
Inventor: Keke Geng (耿可可)
Current Assignee: Southeast University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Southeast University
Application filed by Southeast University
Priority to CN202011084849.3A
Publication of CN112233079A; application granted; publication of CN112233079B

Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06F 18/251: Pattern recognition; fusion techniques of input or preprocessed data
    • G06N 3/045: Neural networks; architectures; combinations of networks
    • G06V 20/56: Scenes; context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06T 2207/10032: Image acquisition modality; satellite or aerial image; remote sensing
    • G06T 2207/10044: Image acquisition modality; radar image
    • G06T 2207/30168: Subject of image; image quality inspection


Abstract

The invention discloses a method and a system for fusing multi-sensor images. It relates to the technical field of image processing and solves the technical problem that existing image processing techniques provide low robustness and effectiveness in environment perception.

Description

Method and system for fusing images of multiple sensors
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and a system for multi-sensor image fusion.
Background
Environment perception is one of the key technologies for realizing autonomous driving of unmanned vehicles. In multi-sensor information fusion perception, the vehicle perceives its environment by fusing the information of multiple sensors such as cameras and laser radar, which safeguards driving safety and intelligence. Such fusion perception acts as the eyes of the unmanned vehicle and is a necessary condition for realizing unmanned driving.
At present, environment perception for unmanned vehicles mostly relies on perception methods based on deep learning networks, with RGB images and distance information used as network inputs for feature extraction. One of the most common deep learning networks is the convolutional neural network, which has representation learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. One of the most representative convolutional neural networks is ResNet (Residual Neural Network); because the input can be connected directly to the output, the whole network only needs to learn a residual, which simplifies the learning objective and reduces its difficulty.
The key to multi-sensor fusion and environment perception lies in the quality of the image data. Cameras are widely used in environment perception systems because of their lower cost and rich image features, including color, texture, brightness and orientation; however, illumination changes, motion blur and strong noise greatly degrade image quality, which is highly detrimental to the effectiveness and robustness of image-based traffic object classification algorithms. Therefore, how to make reasonable use of different sensor data so as to improve the robustness and effectiveness of environment perception is a problem that urgently needs to be solved.
Disclosure of Invention
The present disclosure provides a method and a system for multi-sensor image fusion, which aim to make reasonable use of different sensor data and thereby improve the robustness and effectiveness of environment perception.
The technical purpose of the present disclosure is achieved by the following technical solutions:
a method of multi-sensor image fusion, comprising:
evaluating the RGB image by using a visible light image quality evaluation network IQAN to obtain an IQA score of the RGB image;
obtaining a weight fusion function of the RGB image by using a weighted evaluation network IQA, so as to obtain weight coefficients of the laser radar image and the RGB image;
respectively extracting the features of the RGB image and the laser radar image by using a ResNet101+ FPN network to respectively obtain RGB image features and laser radar image features;
performing feature fusion on the RGB image features and the laser radar image features through the weight fusion function to obtain first fusion features;
performing feature fusion on the first fusion feature through an FPN network to obtain a second fusion feature;
predicting results from the second fusion feature by using a prediction network to obtain a prediction result;
and training a network model by using the joint loss function and the prediction result to obtain a deep learning network (these steps are sketched in code after this list).
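For orientation, the following minimal Python sketch shows how these steps are assumed to chain together in one training pass. Every identifier (iqan, weight_fusion, backbone_rgb, backbone_lidar, fuse, fpn, heads, joint_loss) is a hypothetical placeholder for the corresponding component described above, not a name taken from the patent.

```python
def forward_and_loss(rgb_image, lidar_image, targets, modules):
    """One training step of the fusion pipeline sketched above.

    `modules` is assumed to hold the sub-networks described in the text; all
    attribute names are hypothetical stand-ins, not identifiers from the patent.
    """
    iq_score = modules.iqan(rgb_image)                    # IQA score of the RGB image
    w_rgb, w_lidar = modules.weight_fusion(iq_score)      # weight coefficients, w_lidar = 1 - w_rgb
    f_rgb = modules.backbone_rgb(rgb_image)               # ResNet101+FPN features of the RGB image
    f_lidar = modules.backbone_lidar(lidar_image)         # ResNet101+FPN features of the laser radar image
    fused = modules.fuse(f_rgb, f_lidar, w_rgb, w_lidar)  # first fusion feature (weighted fusion)
    pyramid = modules.fpn(fused)                          # second fusion feature across scales
    predictions = modules.heads(pyramid)                  # classification, bounding box and mask predictions
    return modules.joint_loss(predictions, targets)       # Loss_total used to train the network
```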
Further, the visible light image quality evaluation network IQAN includes two convolutional layers, an activation layer, a pooling layer, and two fully-connected layers.
Further, the expression of the weight fusion function includes:
[Formula shown as image BDA0002719995530000021 in the original publication]
where w_RGB represents the weight coefficient of the RGB image, δ represents the relative error, ε represents the effect parameter, IQ_RGB represents the IQA score of each RGB image, IQ_T represents the threshold of the IQA score, and the weight coefficient of the laser radar image is w_LIDAR = 1 - w_RGB.
Further, the performing feature extraction on the RGB image and the lidar image by using a ResNet101+ FPN network includes:
sampling the laser radar point cloud within the camera field of view, and performing projection conversion on the samples, where the projection function is as follows:
[Formula shown as image BDA0002719995530000022 in the original publication]
where α and β represent the azimuth and zenith angles at which the lidar point cloud is observed, Δα and Δβ represent the average horizontal and vertical angular resolutions between successive beam emitters, (r, c) represents the two-dimensional map position index of the lidar point cloud on the projected image, and (x, y, z) represents the coordinates of the lidar point cloud in a Cartesian coordinate system; the element at (r, c) is filled with the two-channel data (d, z), where
[Formula shown as image BDA0002719995530000023 in the original publication]
and respectively inputting the converted laser radar point cloud data (r, c) and the RGB image into a network formed by ResNet101+ FPN for feature extraction.
Further, the joint loss function includes:
Loss_total = λ_cls·Loss_cls + λ_bbox·Loss_bbox + λ_mask·Loss_mask
where Loss_total represents the total loss, Loss_cls represents the classification loss, Loss_bbox represents the bounding box regression loss, Loss_mask represents the mask prediction loss, and λ_cls, λ_bbox, λ_mask are the corresponding weight coefficients;
using P_i and P_i* to represent the ground-truth classification and the predicted classification respectively, the classification loss is expressed as:
[Formula shown as image BDA0002719995530000024 in the original publication]
where N_cls represents the number of proposed regions and L_cls is a multi-class cross-entropy function;
using B_i and B_i* to represent the ground-truth bounding box and the predicted bounding box respectively, the bounding box regression loss is expressed as:
[Formula shown as image BDA0002719995530000031 in the original publication]
where N_reg represents the size of the feature map and L_bbox represents the smooth-L1 loss function;
using M_i and M_i* to represent the ground-truth segmentation mask and the predicted segmentation mask respectively, the mask prediction loss is expressed as:
[Formula shown as image BDA0002719995530000032 in the original publication]
where N_mask represents the pixel-level positions of the feature map and L_seg represents a cross-entropy function.
A system for multi-sensor image fusion, comprising:
the evaluation module is used for evaluating the RGB image by using a visible light image quality evaluation network IQAN to obtain an IQA score of the RGB image;
the weight acquisition module is used for obtaining a weight fusion function of the RGB image by using a weighted evaluation network IQA, so as to obtain weight coefficients of the laser radar image and the RGB image;
the feature extraction module is used for respectively extracting features of the RGB image and the laser radar image by using a ResNet101+ FPN network to respectively obtain RGB image features and laser radar image features;
the first fusion module is used for performing feature fusion on the RGB image features and the laser radar image features through the weight fusion function to obtain first fusion features;
the second fusion module is used for carrying out feature fusion on the first fusion feature through the FPN network to obtain a second fusion feature;
the prediction module is used for predicting results from the second fusion feature by using a prediction network to obtain a prediction result;
and the training module is used for training a network model by using the joint loss function and the prediction result to obtain a deep learning network.
The beneficial effects of the present disclosure are as follows: the method and system for multi-sensor image fusion evaluate RGB images with the visible light image quality evaluation network IQAN, fuse the features of the RGB image and the laser radar image through the feature weight expression, and train the network model with the joint loss function to obtain a deep learning network based on image quality evaluation.
Drawings
FIG. 1 is a flow chart of the disclosed method;
FIG. 2 is a schematic view of the disclosed system;
FIG. 3 is a flow chart of data fusion;
FIG. 4 is a graph comparing the results of the examples.
Detailed Description
The technical scheme of the disclosure will be described in detail with reference to the accompanying drawings. In the description of the present disclosure, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated, but merely as distinguishing between different components.
FIG. 1 is a flow chart of the method of the present disclosure. As shown in FIG. 1, step 100: first, the RGB image is evaluated using the visible light image quality evaluation network IQAN (Image Quality Assessment Network).
The overall structure of the IQAN includes two convolutional layers, an activation layer, a pooling layer, and two fully-connected layers. An input RGB image is first resized to 64×64 pixels; the first convolutional layer generates 50 feature maps of size 60×60, the second convolutional layer then generates 50 feature maps of size 56×56, and after the activation layer and the pooling layer the data reaches the two fully-connected layers, whose linear regression gives a one-dimensional output: the IQA (Image Quality Assessment) score.
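A minimal PyTorch sketch of a network with this layer layout is given below. The 5×5 kernels follow from the stated map sizes (64 to 60 to 56); the ReLU activation, 2×2 max pooling and the 512-unit hidden width are assumptions, since the text does not specify them.

```python
import torch
import torch.nn as nn

class IQAN(nn.Module):
    """Sketch of the visible-light image quality assessment network described above."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 50, kernel_size=5)    # 64x64 -> 60x60, 50 feature maps
        self.conv2 = nn.Conv2d(50, 50, kernel_size=5)   # 60x60 -> 56x56, 50 feature maps
        self.act = nn.ReLU()                            # activation layer (assumed ReLU)
        self.pool = nn.MaxPool2d(2)                     # pooling layer (assumed 2x2 max pooling)
        self.fc1 = nn.Linear(50 * 28 * 28, 512)         # first fully-connected layer (width assumed)
        self.fc2 = nn.Linear(512, 1)                    # linear regression to a scalar IQA score

    def forward(self, x):                               # x: (N, 3, 64, 64) batch of resized RGB images
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.pool(self.act(x))                      # 56x56 -> 28x28
        x = torch.flatten(x, 1)
        x = self.act(self.fc1(x))
        return self.fc2(x)                              # (N, 1) IQA scores
```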
101: and obtaining a weight fusion function of the RGB image by using a weighted evaluation network IQA so as to obtain the weight coefficients of the laser radar image and the RGB image. The expression of the weight fusion function includes:
Figure BDA0002719995530000041
wherein, wRGBRepresenting the weight coefficients of an RGB image, delta representing the relative error, epsilon representing the effect parameter, IQRGBRepresenting IQA score, IQ, of each RGB imageTThe threshold value of the IQA score is represented, and the weight coefficient of the laser radar image is wLIDAR=1-wRGB。wRGBAnd wLIDARThe dependency of the example segmentation of traffic objects (cars, etc.) on the RGB image and the lidar image, respectively, is described.
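Since the exact expression is reproduced only as an image, the following sketch shows one plausible form built from the quantities named in the text: the relative error δ between IQ_RGB and IQ_T, the effect parameter ε, and w_LIDAR = 1 - w_RGB. The logistic shape is an assumption, not the patent's formula.

```python
import math

def fusion_weights(iq_rgb, iq_threshold, epsilon=1.0):
    """Hedged sketch of a weight fusion function; the exact patent formula is not reproduced here."""
    delta = (iq_rgb - iq_threshold) / iq_threshold     # relative error of the IQA score
    w_rgb = 1.0 / (1.0 + math.exp(-epsilon * delta))   # assumed saturating (logistic) mapping
    return w_rgb, 1.0 - w_rgb                          # w_lidar = 1 - w_rgb, as stated in the text

# Example: a score below the threshold shifts weight toward the laser radar branch.
print(fusion_weights(iq_rgb=0.4, iq_threshold=0.6, epsilon=5.0))
```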
102: feature extraction is performed on the RGB image and the lidar image respectively by using a ResNet101+ FPN network. ResNet101 is one of ResNet, FPN (feature pyramid networks) is a feature pyramid network, and the specific process of feature extraction includes: firstly, sampling a laser radar point cloud in a camera field, converting the sampling into a pixel-level dense spherical depth image, namely performing projection conversion on the sampling, wherein the projection function is as follows:
Figure BDA0002719995530000051
wherein α and β represent azimuth and zenith angles at which the lidar point cloud is observed, Δ α and Δ β represent average horizontal angular resolution and average vertical angular resolution between successive beam emitters, (r, c) represent two-dimensional map position indices of the lidar point cloud on the projected image, (x, y, z) represent coordinates of the lidar point cloud in a cartesian coordinate system, and the transformation is performed at (r, c) with a two-channel data (d, z) fill element,
Figure BDA0002719995530000052
and respectively inputting the converted laser radar point cloud data (r, c) and the RGB image into a network formed by ResNet101+ FPN for feature extraction, and respectively obtaining RGB image features and laser radar image features, namely obtaining feature maps of camera data and laser radar data.
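The projection formulas themselves appear only as images; the sketch below implements the common spherical (range-image) projection that matches the quantities described (azimuth α, zenith β, resolutions Δα and Δβ, map index (r, c), two-channel fill (d, z)). Sign conventions, offsets and the exact definition of β are assumptions.

```python
import numpy as np

def spherical_projection(points, d_alpha, d_beta, height, width):
    """Hedged sketch of the lidar-to-image projection; not the patent's exact formula."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    d = np.sqrt(x**2 + y**2 + z**2)                    # range of each point (assumed meaning of d)
    alpha = np.arctan2(y, x)                           # azimuth angle
    beta = np.arcsin(z / np.maximum(d, 1e-6))          # elevation/zenith-related angle (assumed)
    c = np.floor(alpha / d_alpha).astype(int) % width  # column index from azimuth
    r = np.clip(np.floor(beta / d_beta).astype(int), 0, height - 1)  # row index

    image = np.zeros((height, width, 2), dtype=np.float32)
    image[r, c, 0] = d                                 # channel 0: range d
    image[r, c, 1] = z                                 # channel 1: height z
    return image
```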
103: performing feature fusion on the RGB image features and the laser radar image features through a weight fusion function to obtain first fusion features, namely multiplying the feature graph of the RGB image and the feature graph of the laser radar image by respective weight coefficients wRGB、wLIDARThen serially connecting to perform feature fusion to obtainTo the first fused feature. Fig. 3 is a flow chart of data fusion, and the RGB image and the lidar image are respectively put into a ResNet101+ FPN network for feature extraction, and then feature fusion is performed on the respective extracted features.
104: and performing feature fusion on the first fusion feature through the FPN network to obtain a second fusion feature. A shallow layer with high-resolution features and a deep layer with rich semantic information are fused by using an FPN network, and a feature pyramid with strong semantic information on all scales is constructed.
105: and predicting the result of the second fusion characteristic by using a prediction network to obtain a prediction result. The output of the FPN network is subjected to bounding box regression, classification and mask prediction using a prediction network.
106: and training a network model by using the joint loss function and the prediction result to obtain a deep learning network. The proposed network model is trained using the following joint loss function: losstotal=λclsLossclsbboxLossbboxmaskLossmask
Therein, LosstotalRepresents total Loss, LossclsRepresents the Loss of classification, LossbboxLoss of expression, LossmaskDenotes the mask prediction loss, λcls、λbbox、λmaskRespectively, corresponding weight coefficients.
Using PiAnd Pi *Representing the ground truth classification and the prediction classification respectively, the classification loss is represented as:
Figure BDA0002719995530000053
wherein N isclsIndicating the number of suggested regions, LclsAs a multi-class cross entropy function.
Using BiAnd Bi *Respectively representing a ground real bounding box and a prediction bounding box, and then the bounding box regression loss is represented as:
Figure BDA0002719995530000054
wherein N isregSize of the element map, LbboxRepresenting the loss function smooth-L1.
Using MiAnd Mi *Respectively representing a ground real segmented mask and a predicted segmented mask, the mask prediction loss is represented as:
Figure BDA0002719995530000061
wherein N ismaskIndicating the position of the element map at pixel level, LsegRepresenting a cross entropy function.
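The individual loss terms are shown only as images in the original; the sketch below uses the standard forms consistent with the descriptions (multi-class cross entropy, smooth-L1, per-pixel cross entropy) and should be read as an assumed reference implementation rather than the patent's exact definition.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_cls, gt_cls, pred_bbox, gt_bbox, pred_mask, gt_mask,
               lambda_cls=1.0, lambda_bbox=1.0, lambda_mask=1.0):
    """Hedged sketch of Loss_total = lambda_cls*Loss_cls + lambda_bbox*Loss_bbox + lambda_mask*Loss_mask.

    The term forms and the normalisation (averaging over proposals, map elements and pixels,
    i.e. N_cls, N_reg, N_mask) are assumptions consistent with the text.
    """
    loss_cls = F.cross_entropy(pred_cls, gt_cls)                        # multi-class cross entropy over proposals
    loss_bbox = F.smooth_l1_loss(pred_bbox, gt_bbox)                    # smooth-L1 bounding box regression loss
    loss_mask = F.binary_cross_entropy_with_logits(pred_mask, gt_mask)  # per-pixel cross entropy for the mask
    return lambda_cls * loss_cls + lambda_bbox * loss_bbox + lambda_mask * loss_mask
```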
Table 1 shows the data used in practicing the method of the present disclosure, as follows:
Environment   Sunny   Rainy   Foggy   Night
Number        4369    2315    3907    4061
TABLE 1
As can be seen from Table 2, fusion perception of camera data and laser radar data weighted by the IQAN-based evaluation performs better than perception using only camera data without the IQAN network. Meanwhile, the dual-modal image perception deep neural network using ResNet-101+FPN as the backbone outperforms MASK-RCNN and Retina-RCNN, which are current state-of-the-art instance segmentation networks.
Method       Backbone network   Modality   FPS    AP     AP50   AP75
MASK-RCNN    ResNet-101-FPN     Single     13.5   35.7   58.0   37.8
Retina-RCN   ResNet-101-FPN     Single     11.2   34.7   55.4   36.9
YOLACT       ResNet-101-FPN     Single     30.0   29.8   48.5   31.2
IQAN         ResNet-18-FPN      Dual       37.3   28.7   46.8   30.0
IQAN         ResNet-50-FPN      Dual       35.5   31.2   50.6   32.8
IQAN         ResNet-101-FPN     Dual       27.0   39.1   59.7   39.8
TABLE 2
Here FPS (frames per second) denotes the number of frames processed per second, and AP denotes the average prediction accuracy (average precision). Single modality means that only RGB images are input to the backbone network, while Dual modality means that the input includes both RGB images and laser radar images. AP50 and AP75 denote the average prediction accuracy at intersection ratios of 50% and 75% respectively, and AP denotes the mean of the average prediction accuracies over the nine intersection ratios 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85% and 90%. The intersection ratio (intersection over union) denotes the overlap ratio between the generated prediction region (candidate box) and the real target region (original labeled box).
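For clarity, the intersection ratio used for AP50/AP75 and the nine-threshold AP can be computed as in the short sketch below (axis-aligned boxes in (x1, y1, x2, y2) form; an illustrative helper, not code from the patent):

```python
def iou(box_a, box_b):
    """Intersection over union ("intersection ratio") between a predicted box and a labeled box."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# AP in Table 2 averages the per-threshold precision over these nine IoU thresholds.
thresholds = [0.50 + 0.05 * i for i in range(9)]   # 0.50, 0.55, ..., 0.90
```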
FIG. 4 shows detections performed on the test data set of Table 1 under different weather conditions, where (a) is a rainy scene, (b) is a foggy scene, (c) is a night scene with street lamps, and (d) is a night scene without street lamps. Combining the labeled results under these conditions with Table 2, it is clear that the environment perception result obtained after IQAN-based feature fusion of the RGB image and the laser radar image is the most accurate.
FIG. 2 is a schematic diagram of the system of the present disclosure. The multi-sensor image fusion system of the present disclosure includes an evaluation module, a weight acquisition module, a feature extraction module, a first fusion module, a second fusion module, a prediction module, and a training module; the specific functions of each module follow the method of the present disclosure and are not described again.
The foregoing is an exemplary embodiment of the present disclosure, and the scope of the present disclosure is defined by the claims and their equivalents.

Claims (10)

1. A method of multi-sensor image fusion, comprising:
evaluating the RGB image by using a visible light image quality evaluation network IQAN to obtain an IQA score of the RGB image;
obtaining a weight fusion function of the RGB image by using a weighted evaluation network IQA, so as to obtain weight coefficients of the laser radar image and the RGB image;
respectively extracting the features of the RGB image and the laser radar image by using a ResNet101+ FPN network to respectively obtain RGB image features and laser radar image features;
performing feature fusion on the RGB image features and the laser radar image features through the weight fusion function to obtain first fusion features;
performing feature fusion on the first fusion feature through an FPN network to obtain a second fusion feature;
predicting results from the second fusion feature by using a prediction network to obtain a prediction result;
and training a network model by using the joint loss function and the prediction result to obtain a deep learning network.
2. The method of multi-sensor image fusion of claim 1, wherein the visible light image quality assessment network IQAN includes two convolutional layers, an activation layer, a pooling layer, and two fully-connected layers.
3. The method of multi-sensor image fusion of claim 1, wherein the expression of the weight fusion function comprises:
[Formula shown as image FDA0002719995520000011 in the original publication]
wherein w_RGB represents the weight coefficient of the RGB image, δ represents the relative error, ε represents the effect parameter, IQ_RGB represents the IQA score of each RGB image, IQ_T represents the threshold of the IQA score, and the weight coefficient of the laser radar image is w_LIDAR = 1 - w_RGB.
4. The method of multi-sensor image fusion of claim 3, wherein said feature extracting the RGB image and the lidar image using a ResNet101+ FPN network comprises:
sampling a laser radar point cloud in a camera field, and performing projection conversion on the sampling, wherein the projection function is as follows:
[Formula shown as image FDA0002719995520000012 in the original publication]
wherein α and β represent azimuth and zenith angles at which the lidar point cloud is viewed, Δ α and Δ β represent average horizontal angular resolution and average vertical angular resolution between successive beam emitters, (r, c) represent a two-dimensional map position index of the lidar point cloud on the projected image, (x, y, z) represent coordinates of the lidar point cloud in a cartesian coordinate system, and the transformation is performed at (r, c) with a two-channel data (d, z) fill element,
[Formula shown as image FDA0002719995520000021 in the original publication]
and respectively inputting the converted laser radar point cloud data (r, c) and the RGB image into a network formed by ResNet101+ FPN for feature extraction.
5. The method of multi-sensor image fusion of claim 4, wherein the joint loss function comprises:
Loss_total = λ_cls·Loss_cls + λ_bbox·Loss_bbox + λ_mask·Loss_mask
wherein Loss_total represents the total loss, Loss_cls represents the classification loss, Loss_bbox represents the bounding box regression loss, Loss_mask represents the mask prediction loss, and λ_cls, λ_bbox, λ_mask are the corresponding weight coefficients;
using P_i and P_i* to represent the ground-truth classification and the predicted classification respectively, the classification loss is expressed as:
[Formula shown as image FDA0002719995520000022 in the original publication]
wherein N_cls represents the number of proposed regions and L_cls is a multi-class cross-entropy function;
using B_i and B_i* to represent the ground-truth bounding box and the predicted bounding box respectively, the bounding box regression loss is expressed as:
[Formula shown as image FDA0002719995520000023 in the original publication]
wherein N_reg represents the size of the feature map and L_bbox represents the smooth-L1 loss function;
using M_i and M_i* to represent the ground-truth segmentation mask and the predicted segmentation mask respectively, the mask prediction loss is expressed as:
[Formula shown as image FDA0002719995520000024 in the original publication]
wherein N_mask represents the pixel-level positions of the feature map and L_seg represents a cross-entropy function.
6. A multi-sensor image fusion system, comprising:
the evaluation module is used for evaluating the RGB image by using a visible light image quality evaluation network IQAN to obtain an IQA score of the RGB image;
the weight acquisition module is used for obtaining a weight fusion function of the RGB image by using a weighted evaluation network IQA, so as to obtain weight coefficients of the laser radar image and the RGB image;
the feature extraction module is used for respectively extracting features of the RGB image and the laser radar image by using a ResNet101+ FPN network to respectively obtain RGB image features and laser radar image features;
the first fusion module is used for performing feature fusion on the RGB image features and the laser radar image features through the weight fusion function to obtain first fusion features;
the second fusion module is used for carrying out feature fusion on the first fusion feature through the FPN network to obtain a second fusion feature;
the prediction module is used for predicting results from the second fusion feature by using a prediction network to obtain a prediction result;
and the training module is used for training a network model by using the joint loss function and the prediction result to obtain a deep learning network.
7. The multi-sensor image fusion system of claim 6, wherein the visible light image quality assessment network IQAN includes two convolutional layers, an activation layer, a pooling layer, and two fully-connected layers.
8. The multi-sensor image fusion system of claim 6, wherein the expression of feature weights comprises:
[Formula shown as image FDA0002719995520000031 in the original publication]
wherein w_RGB represents the weight coefficient of the RGB image, δ represents the relative error, ε represents the effect parameter, IQ_RGB represents the IQA score of each RGB image, IQ_T represents the threshold of the IQA score, and the weight coefficient of the laser radar image is w_LIDAR = 1 - w_RGB.
9. The multi-sensor image fusion system of claim 8, wherein the feature extraction module is configured to: sample the laser radar point cloud in the camera field of view and perform projection conversion on the samples, wherein the projection function is as follows:
[Formula shown as image FDA0002719995520000032 in the original publication]
wherein α and β represent azimuth and zenith angles at which the lidar point cloud is viewed, Δ α and Δ β represent average horizontal angular resolution and average vertical angular resolution between successive beam emitters, (r, c) represent a two-dimensional map position index of the lidar point cloud on the projected image, (x, y, z) represent coordinates of the lidar point cloud in a cartesian coordinate system, and the transformation is performed at (r, c) with a two-channel data (d, z) fill element,
[Formula shown as image FDA0002719995520000033 in the original publication]
and respectively inputting the converted laser radar point cloud data (r, c) and the RGB image into a network formed by ResNet101+ FPN for feature extraction.
10. The multi-sensor image fusion system of claim 9, wherein the joint loss function comprises:
Loss_total = λ_cls·Loss_cls + λ_bbox·Loss_bbox + λ_mask·Loss_mask
wherein Loss_total represents the total loss, Loss_cls represents the classification loss, Loss_bbox represents the bounding box regression loss, Loss_mask represents the mask prediction loss, and λ_cls, λ_bbox, λ_mask are the corresponding weight coefficients;
using P_i and P_i* to represent the ground-truth classification and the predicted classification respectively, the classification loss is expressed as:
[Formula shown as image FDA0002719995520000034 in the original publication]
wherein N_cls represents the number of proposed regions and L_cls is a multi-class cross-entropy function;
using B_i and B_i* to represent the ground-truth bounding box and the predicted bounding box respectively, the bounding box regression loss is expressed as:
[Formula shown as image FDA0002719995520000041 in the original publication]
wherein N_reg represents the size of the feature map and L_bbox represents the smooth-L1 loss function;
using M_i and M_i* to represent the ground-truth segmentation mask and the predicted segmentation mask respectively, the mask prediction loss is expressed as:
[Formula shown as image FDA0002719995520000042 in the original publication]
wherein N_mask represents the pixel-level positions of the feature map and L_seg represents a cross-entropy function.
CN202011084849.3A 2020-10-12 2020-10-12 Method and system for fusing images of multiple sensors Active CN112233079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011084849.3A CN112233079B (en) 2020-10-12 2020-10-12 Method and system for fusing images of multiple sensors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011084849.3A CN112233079B (en) 2020-10-12 2020-10-12 Method and system for fusing images of multiple sensors

Publications (2)

Publication Number Publication Date
CN112233079A (en) 2021-01-15
CN112233079B CN112233079B (en) 2022-02-11

Family

ID=74113304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011084849.3A Active CN112233079B (en) 2020-10-12 2020-10-12 Method and system for fusing images of multiple sensors

Country Status (1)

Country Link
CN (1) CN112233079B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221659A (en) * 2021-04-13 2021-08-06 天津大学 Double-light vehicle detection method and device based on uncertain sensing network
CN117864412A (en) * 2023-08-10 2024-04-12 中国人民解放军海军航空大学 Onboard electronic equipment test signal trigger mechanism based on laser point cloud information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289808A (en) * 2011-07-22 2011-12-21 清华大学 Method and system for evaluating image fusion quality
US20130330001A1 (en) * 2012-06-06 2013-12-12 Apple Inc. Image Fusion Using Intensity Mapping Functions
CN106897986A (en) * 2017-01-23 2017-06-27 浙江大学 A kind of visible images based on multiscale analysis and far infrared image interfusion method
CN108549874A (en) * 2018-04-19 2018-09-18 广州广电运通金融电子股份有限公司 A kind of object detection method, equipment and computer readable storage medium
WO2019196539A1 (en) * 2018-04-11 2019-10-17 杭州海康威视数字技术股份有限公司 Image fusion method and apparatus

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289808A (en) * 2011-07-22 2011-12-21 清华大学 Method and system for evaluating image fusion quality
US20130330001A1 (en) * 2012-06-06 2013-12-12 Apple Inc. Image Fusion Using Intensity Mapping Functions
CN106897986A (en) * 2017-01-23 2017-06-27 浙江大学 A kind of visible images based on multiscale analysis and far infrared image interfusion method
WO2019196539A1 (en) * 2018-04-11 2019-10-17 杭州海康威视数字技术股份有限公司 Image fusion method and apparatus
CN108549874A (en) * 2018-04-19 2018-09-18 广州广电运通金融电子股份有限公司 A kind of object detection method, equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KEKE GENG et al.: "Deep Dual-Modal Traffic Objects Instance Segmentation Method Using Camera and LIDAR Data for Autonomous Driving", Remote Sensing *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221659A (en) * 2021-04-13 2021-08-06 天津大学 Double-light vehicle detection method and device based on uncertain sensing network
CN113221659B (en) * 2021-04-13 2022-12-23 天津大学 Double-light vehicle detection method and device based on uncertain sensing network
CN117864412A (en) * 2023-08-10 2024-04-12 中国人民解放军海军航空大学 Onboard electronic equipment test signal trigger mechanism based on laser point cloud information

Also Published As

Publication number Publication date
CN112233079B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN110188696B (en) Multi-source sensing method and system for unmanned surface equipment
CN110956651B (en) Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN110570429B (en) Lightweight real-time semantic segmentation method based on three-dimensional point cloud
CN115082674B (en) Multi-mode data fusion three-dimensional target detection method based on attention mechanism
CN110738121A (en) front vehicle detection method and detection system
CN109919026B (en) Surface unmanned ship local path planning method
CN114266977B (en) Multi-AUV underwater target identification method based on super-resolution selectable network
CN114639115B (en) Human body key point and laser radar fused 3D pedestrian detection method
CN112233079B (en) Method and system for fusing images of multiple sensors
CN111738071B (en) Inverse perspective transformation method based on motion change of monocular camera
CN116129233A (en) Automatic driving scene panoramic segmentation method based on multi-mode fusion perception
CN112257668A (en) Main and auxiliary road judging method and device, electronic equipment and storage medium
CN114972968A (en) Tray identification and pose estimation method based on multiple neural networks
CN115830265A (en) Automatic driving movement obstacle segmentation method based on laser radar
CN116129234A (en) Attention-based 4D millimeter wave radar and vision fusion method
Zhang et al. Change detection between digital surface models from airborne laser scanning and dense image matching using convolutional neural networks
CN117372697A (en) Point cloud segmentation method and system for single-mode sparse orbit scene
CN115984646B (en) Remote sensing cross-satellite observation oriented distributed target detection method and device and satellite
CN111353481A (en) Road obstacle identification method based on laser point cloud and video image
CN115187959B (en) Method and system for landing flying vehicle in mountainous region based on binocular vision
CN116386003A (en) Three-dimensional target detection method based on knowledge distillation
CN115861948A (en) Lane line detection method, lane line detection device, lane line early warning method, lane line early warning system and medium
Zheng et al. Research on environmental feature recognition algorithm of emergency braking system for autonomous vehicles
CN113762195A (en) Point cloud semantic segmentation and understanding method based on road side RSU
CN113221643B (en) Lane line classification method and system adopting cascade network

Legal Events

Code   Title
PB01   Publication
SE01   Entry into force of request for substantive examination
GR01   Patent grant