CN111652261A - Multi-modal perception fusion system - Google Patents

Multi-modal perception fusion system

Info

Publication number
CN111652261A
CN111652261A (application CN202010120330.XA)
Authority
CN
China
Prior art keywords
fusion system
modal
camera
laser radar
cameras
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010120330.XA
Other languages
Chinese (zh)
Inventor
王鸿鹏
韩霄
邵岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202010120330.XA priority Critical patent/CN111652261A/en
Publication of CN111652261A publication Critical patent/CN111652261A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Traffic Control Systems (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention provides a multi-modal perception fusion system for a full scene, which comprises an upper computer, a laser radar, a multi-view camera, an IMU, infrared depth cameras and a power supply, wherein the multi-view camera comprises two FLIR industrial network port cameras and two USB 3.0 cameras. The system is set up in the following steps: installing the hardware and software, acquiring data, and constructing a model. The multi-modal perception fusion system is compact and lightweight; it can be used for environment modeling on unmanned vehicles, in the medical industry and in unmanned military settings, and it is also suitable for various complex environments, both indoor and outdoor, laying a foundation for planning and navigation.

Description

Multi-modal perception fusion system
Technical Field
The invention belongs to the field of multi-modal perception fusion systems, and particularly relates to a multi-modal perception fusion system for a full scene.
Background
With the rapid development of sensor technology and the Internet, large volumes of data in many different modalities are emerging at an unprecedented rate. For a given subject to be described (an object, a scene, etc.), the coupled data samples collected through different methods or from different perspectives constitute multi-modal data, and each such method or perspective of collection is generally referred to as a modality.
Multi-modal information in the narrow sense generally refers to modalities with different sensing characteristics, while multi-modal fusion in the broad sense also covers multi-feature fusion within a single modality, data fusion across multiple sensors of the same type, and so on. The problem of multi-modal perception and learning is therefore closely related to multi-source fusion and multi-sensor fusion in signal processing, and to multi-view learning or multi-view fusion in machine learning. Multi-modal data yields more comprehensive and accurate information and enhances the reliability and fault tolerance of a system.
In multi-modal perception and learning, different modalities have completely different description forms and complex coupling correspondences, so the problems of perceptual representation and cognitive fusion across modalities must be solved in a unified way. Multi-modal perception and fusion brings two otherwise unrelated data samples in different formats into correspondence through an appropriate transformation or projection, and such fusion of heterogeneous data can often achieve unexpectedly good results.
At present, multi-modal data plays a large role in Internet information search, human-computer interaction, fault diagnosis in industrial environments, robotics, and other fields. Multi-modal learning between vision and language is currently the area of multi-modal fusion with the most concentrated research results, whereas robotics still faces many challenging problems that require further exploration. The present invention therefore develops a multi-modal perception system in which multi-view vision, laser, binocular infrared, depth, IMU, and other modalities are mounted facing different directions. By automatically perceiving, scanning, and modeling both large scenes and small workpieces, the system achieves full-scene perception, is suitable for indoor and outdoor use, and attaches depth and distance information to the RGB image information of the environment. The main difficulties are the heterogeneous multi-source sensors, the extraction of features, and the determination of the correlations between features, so that the fusion becomes more accurate and the environment can be perceived in real time.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a multi-modal perception fusion system for a full scene, so as to realize automatic perception, scanning, and modeling of large scenes and small workpieces. The multi-modal perception fusion system comprises an upper computer, a laser radar, a multi-view camera, an IMU, infrared depth cameras, and a power supply, wherein the multi-view camera comprises two FLIR industrial network port cameras and two USB 3.0 cameras. The system is set up through the following steps:
s1: installing hardware: connecting a laser radar to an upper computer in an Ethernet interface connection mode, connecting two FLIR industrial network port cameras to the upper computer in the Ethernet interface mode, respectively connecting two USB3.0 cameras, an IMU (inertial measurement unit) and an infrared depth camera to a USB3.0 interface of the upper computer, and connecting all the parts after being connected with each other through a data line and a power supply;
s2: installation of software and acquisition of data: opening a Linux Ubuntu System, installing and configuring a driver and software of each module, starting nodes of each mode by using a Robot Operating System, and displaying the acquired data of the point cloud of the laser radar, the RGB image of the multi-view camera, the information of an accelerometer and a gyroscope of the IMU and the depth-of-field image of the infrared depth camera by using RVIZ;
s3: constructing a model: and then, processing the acquired data by using an SLAM theoretical system, wherein the processing flow is divided into two steps, namely a front end and a rear end, the front end is responsible for feature extraction of each module and representation of correlation among features, the rear end is responsible for parameter optimization and three-dimensional reconstruction and positioning, and the rear end is responsible for iterative optimization of external parameters by using a Marquardt algorithm in modeling identification to obtain optimal estimation, so that a fused final model and effect graph are obtained.
Preferably, the operating system adopted by the multi-modal perception fusion system is Linux Ubuntu, the middleware is the Robot Operating System, and the programming languages used are C++ and Python.
Preferably, the laser radar is a LeiShen Intelligent C16-151B.
Preferably, there are two infrared depth cameras, and the infrared depth cameras and the IMU are Intel RealSense D435i units.
Preferably, the distance from the laser radar to the ground is 10 m, and a conical blind zone is formed below the laser radar where its beams do not reach; the working range of the infrared depth camera is 0.2-10 m, which compensates for the blind zone that the laser radar cannot cover.
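For intuition, the size of this blind zone can be estimated from the mounting height and the downward tilt of the lowest laser beam. The sketch below assumes a lowest beam 15 degrees below horizontal (a value typical of 16-beam lidars, assumed here and not stated in the patent) together with the 10 m height mentioned above.

```python
# Sketch: radius of the conical ground blind zone beneath a spinning lidar.
# Assumptions: height above ground h = 10 m, lowest beam tilted 15 deg below
# horizontal; the tilt value is an assumption, not specified in the patent.
import math

h = 10.0                  # height above ground, m
lowest_beam_deg = 15.0    # downward tilt of the lowest beam, deg (assumed)

blind_radius = h / math.tan(math.radians(lowest_beam_deg))
print(f"ground blind-zone radius is about {blind_radius:.1f} m")
# A depth camera with a 0.2-10 m working range aimed at this near-field region
# covers the area that the lidar beams never reach.
```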
Preferably, the laser radar, the multi-view camera, the IMU, and the infrared depth camera are each independent sensors.
Compared with the prior art, the invention has the following beneficial effects: the heterogeneous sensors are combined while remaining independent, so the system can be calibrated quickly; the collected information is matched and fused in three-dimensional space, a patch model is generated from the point cloud, and iterative optimization is applied again, finally yielding a three-dimensional reconstruction model of the required accuracy and the most accurate model and effect diagram. The fusion is thereby made more accurate, the environment can be perceived in real time, and accurate technical data are provided for later recognition and detection. The multi-modal perception fusion system is compact and lightweight; it can be used for environment modeling on unmanned vehicles, in the medical industry and in unmanned military settings, and it is also suitable for various complex environments, both indoor and outdoor, laying a foundation for planning and navigation.
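As an illustration of the matching and fusion in three-dimensional space described above, the sketch below colors lidar points with RGB values from a camera image by projecting each point through a calibrated extrinsic transform T and intrinsic matrix K. The matrices and data are placeholders for illustration, not values from the patent.

```python
# Sketch: fuse a lidar point cloud with an RGB image by projecting each 3D point
# through an extrinsic T (lidar -> camera) and intrinsics K, then sampling the
# image color at the projected pixel. T, K and the data below are placeholders.
import numpy as np

def colorize_cloud(points_lidar, image, T, K):
    """Return (point, color) pairs for lidar points that fall inside the image."""
    h, w, _ = image.shape
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T @ pts_h.T).T[:, :3]                 # into the camera frame
    in_front = pts_cam[:, 2] > 0.1                   # keep points in front of the camera
    proj = (K @ pts_cam[in_front].T).T
    uv = (proj[:, :2] / proj[:, 2:3]).astype(int)
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    colors = image[uv[valid, 1], uv[valid, 0]]       # sample RGB at (row=v, col=u)
    return points_lidar[in_front][valid], colors

# Placeholder calibration and data
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
T = np.eye(4)                                        # identity extrinsic for the sketch
cloud = np.random.uniform(-2.0, 2.0, (1000, 3)) + np.array([0.0, 0.0, 4.0])
image = np.zeros((480, 640, 3), dtype=np.uint8)
pts, cols = colorize_cloud(cloud, image, T, K)
print(f"{len(pts)} of {len(cloud)} points project into the image")
```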
Drawings
FIG. 1 is an appearance diagram of the multi-modal perception fusion system for a full scene.
FIG. 2 is an architecture diagram of the multi-modal system for a full scene.
FIG. 3 is a diagram of the installation steps of the multi-modal perception fusion system for a full scene.
In the figure: 1-laser radar; 2-first FLIR industrial network port camera; 3-first USB 3.0 camera; 4-first infrared depth camera; 5-second infrared depth camera; 6-second FLIR industrial network port camera; 7-second USB 3.0 camera; 8-multi-view camera; 9-IMU.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
The invention is further described below:
Embodiment:
As shown in FIG. 1, the operating system adopted by the multi-modal perception fusion system is Linux Ubuntu, the middleware is the Robot Operating System, and the programming languages used are C++ and Python. The multi-modal perception fusion system comprises an upper computer, a laser radar 1, a first FLIR industrial network port camera 2, a second FLIR industrial network port camera 6, a first USB 3.0 camera 3, a second USB 3.0 camera 7, a multi-view camera 8, an IMU 9, a first infrared depth camera 4, a second infrared depth camera 5, and a power supply;
specifically, as shown in fig. 3, the steps of forming the multi-modal perceptual fusion system are as follows:
s1: installing hardware: connecting a laser radar to an upper computer in an Ethernet interface connection mode, connecting a first FLIR industrial network port camera 2 and a second FLIR industrial network port camera 3 to the upper computer in the Ethernet interface mode, respectively connecting a first USB3.0 camera 3, a second USB3.0 camera 7, an IMU 9, a first infrared depth camera 4 and a second infrared depth camera 5 to a USB3.0 interface of the upper computer, and connecting all the parts after connection with a power supply through data lines;
s2: installation of software and acquisition of data: the Linux Ubuntu System is opened, drivers and software of all modules are installed and configured, a Robot Operating System is used for starting nodes of all modes, the acquired point cloud of the laser radar 1, RGB images of the multi-view camera 8, the first FLIR industrial portal camera 2, the second FLIR industrial portal camera 6, the first USB3.0 camera 3 and the second USB3.0 camera 7, accelerometer and gyroscope information of the IMU 9 and depth maps of the first infrared depth camera 4 and the second infrared depth camera 5 are displayed by using the RVIZ;
s3: constructing a model: and then, processing the acquired data by using a slam theoretical system, wherein the processing flow is divided into two steps, namely a front end and a rear end, the front end is responsible for feature extraction of each module and representation of correlation among features, the rear end is responsible for parameter optimization and three-dimensional reconstruction and positioning, the external parameters are iteratively optimized by using a Marquardt algorithm in modeling identification to obtain optimal estimation, and finally, a final model and an effect graph which are accurately fused are obtained.
Specifically, the laser radar 1 is a LeiShen Intelligent C16-151B.
Specifically, the first infrared depth camera 4, the second infrared depth camera 5, and the IMU 9 are all Intel RealSense D435i units.
Specifically, the distance from the laser radar 1 to the ground is 10 m, and a conical blind zone is formed below the laser radar 1 where its beams do not reach; the working range of the first infrared depth camera 4 and the second infrared depth camera 5 is 0.2-10 m, which compensates for the blind zone that the laser radar 1 cannot cover.
Specifically, the laser radar 1, the first FLIR industrial network port camera 2, the second FLIR industrial network port camera 6, the first USB 3.0 camera 3, the second USB 3.0 camera 7, the multi-view camera 8, the IMU 9, the first infrared depth camera 4, and the second infrared depth camera 5 are each independent sensors.
Referring to FIG. 2, a graphical structural representation of the multi-modal system is shown, in which the vertices represent the sensors (laser radar, cameras, IMU, and so on) and the edges represent the relative pose transformations inferred between the sensors.
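A minimal sketch of such a sensor graph is given below: the vertices are sensor frames, each edge carries a 4x4 homogeneous transform, and composing transforms along a path yields the relative pose between any two sensors. The frame names and transform values are illustrative only.

```python
# Sketch: sensor graph whose edges carry 4x4 homogeneous transforms.
# Frame names and numeric values are illustrative, not taken from the patent.
import numpy as np

def make_T(R=None, t=(0.0, 0.0, 0.0)):
    T = np.eye(4)
    T[:3, :3] = np.eye(3) if R is None else R
    T[:3, 3] = t
    return T

# Edges: (child frame, parent frame) -> pose of child expressed in parent
edges = {
    ("lidar", "base"): make_T(t=(0.00, 0.00, 0.30)),
    ("rgb_cam_1", "base"): make_T(t=(0.10, 0.05, 0.20)),
    ("imu", "base"): make_T(t=(0.00, 0.00, 0.10)),
    ("depth_cam_1", "base"): make_T(t=(0.10, -0.05, 0.20)),
}

def relative_pose(frame_a, frame_b):
    """Pose of frame_a expressed in frame_b, composed through the common 'base' vertex."""
    T_a_base = edges[(frame_a, "base")]
    T_b_base = edges[(frame_b, "base")]
    return np.linalg.inv(T_b_base) @ T_a_base

print(relative_pose("lidar", "rgb_cam_1"))   # lidar pose in the first RGB camera frame
```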
The workflow diagram is shown in FIG. 3.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such a process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A multi-modal perception fusion system for a full scene, characterized by comprising an upper computer, a laser radar, a multi-view camera, an IMU (inertial measurement unit), an infrared depth camera, and a power supply, wherein the multi-view camera comprises two FLIR industrial network port cameras and two USB (universal serial bus) 3.0 cameras, and the multi-modal perception fusion system is set up through the following steps:
s1: installing hardware: connecting a laser radar to an upper computer in an Ethernet interface connection mode, connecting two FLIR industrial network port cameras to the upper computer in the Ethernet interface mode, respectively connecting two USB3.0 cameras, an IMU (inertial measurement unit) and an infrared depth camera to a USB3.0 interface of the upper computer, and connecting all the parts after being connected with each other through a data line and a power supply;
s2: installation of software and acquisition of data: opening a Linux Ubuntu System, installing and configuring a driver and software of each module, starting nodes of each mode by using a Robot Operating System, and displaying the acquired data of the point cloud of the laser radar, the RGB image of the multi-view camera, the information of an accelerometer and a gyroscope of the IMU and the depth-of-field map of the infrared depth camera by using RVIZ;
s3: constructing a model: and then, processing the acquired data by using a slam theoretical system, wherein the processing flow is divided into two steps, namely a front end and a rear end, the front end is responsible for feature extraction of each module and representation of correlation among features, and the rear end is responsible for optimization, three-dimensional reconstruction and positioning of parameters, so that a fused final model and an effect graph are obtained finally.
2. The multi-modal perception fusion system for a full scene according to claim 1, wherein the operating system adopted by the multi-modal perception fusion system is Linux Ubuntu, the middleware is the Robot Operating System, and the programming languages used are C++ and Python.
3. The multi-modal perception fusion system for a full scene according to claim 1, wherein the laser radar is a LeiShen Intelligent C16-151B.
4. The multi-modal perception fusion system for a full scene according to claim 1, wherein there are two infrared depth cameras, and the infrared depth cameras and the IMU are Intel RealSense D435i units.
5. The multi-modal perception fusion system for a full scene according to claim 1, wherein the extrinsic parameters are iteratively optimized with the Marquardt algorithm in modeling and identification to obtain the optimal estimate, thereby obtaining the most accurate model and effect diagram.
6. The multi-modal perception fusion system according to claim 1, wherein the distance from the laser radar to the ground is 10 m, a conical blind zone is formed below the laser radar where its beams do not reach, and the working range of the infrared depth camera is 0.2-10 m, so that the blind zone that the laser radar cannot cover is compensated.
7. The multi-modal perception fusion system for a full scene according to claim 1, wherein the laser radar, the multi-view camera, the IMU, and the infrared depth camera each have independent sensors.
CN202010120330.XA 2020-02-26 2020-02-26 Multi-modal perception fusion system Pending CN111652261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010120330.XA CN111652261A (en) 2020-02-26 2020-02-26 Multi-modal perception fusion system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010120330.XA CN111652261A (en) 2020-02-26 2020-02-26 Multi-modal perception fusion system

Publications (1)

Publication Number Publication Date
CN111652261A true CN111652261A (en) 2020-09-11

Family

ID=72346093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010120330.XA Pending CN111652261A (en) 2020-02-26 2020-02-26 Multi-modal perception fusion system

Country Status (1)

Country Link
CN (1) CN111652261A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327289A (en) * 2021-05-18 2021-08-31 中山方显科技有限公司 Method for simultaneously calibrating internal and external parameters of multi-source heterogeneous sensor

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107036594A (en) * 2017-05-07 2017-08-11 郑州大学 The positioning of intelligent Power Station inspection intelligent body and many granularity environment perception technologies
CN107085422A (en) * 2017-01-04 2017-08-22 北京航空航天大学 A kind of tele-control system of the multi-functional Hexapod Robot based on Xtion equipment
CN107390703A (en) * 2017-09-12 2017-11-24 北京创享高科科技有限公司 A kind of intelligent blind-guidance robot and its blind-guiding method
US20170371329A1 (en) * 2014-12-19 2017-12-28 United Technologies Corporation Multi-modal sensor data fusion for perception systems
CN108700939A (en) * 2016-02-05 2018-10-23 奇跃公司 System and method for augmented reality
CN108846867A (en) * 2018-08-29 2018-11-20 安徽云能天智能科技有限责任公司 A kind of SLAM system based on more mesh panorama inertial navigations
CN109828658A (en) * 2018-12-17 2019-05-31 彭晓东 A kind of man-machine co-melting long-range situation intelligent perception system
CN110174136A (en) * 2019-05-07 2019-08-27 武汉大学 A kind of underground piping intelligent measurement robot and intelligent detecting method
CN110261870A (en) * 2019-04-15 2019-09-20 浙江工业大学 It is a kind of to synchronize positioning for vision-inertia-laser fusion and build drawing method
CN110321000A (en) * 2019-04-25 2019-10-11 南开大学 A kind of dummy emulation system towards intelligence system complex task
US20190339081A1 (en) * 2018-05-03 2019-11-07 Orby, Inc. Unmanned aerial vehicle with enclosed propulsion system for 3-d data gathering and processing
CN110427022A (en) * 2019-07-08 2019-11-08 武汉科技大学 A kind of hidden fire-fighting danger detection robot and detection method based on deep learning

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170371329A1 (en) * 2014-12-19 2017-12-28 United Technologies Corporation Multi-modal sensor data fusion for perception systems
CN108700939A (en) * 2016-02-05 2018-10-23 奇跃公司 System and method for augmented reality
CN107085422A (en) * 2017-01-04 2017-08-22 北京航空航天大学 A kind of tele-control system of the multi-functional Hexapod Robot based on Xtion equipment
CN107036594A (en) * 2017-05-07 2017-08-11 郑州大学 The positioning of intelligent Power Station inspection intelligent body and many granularity environment perception technologies
CN107390703A (en) * 2017-09-12 2017-11-24 北京创享高科科技有限公司 A kind of intelligent blind-guidance robot and its blind-guiding method
US20190339081A1 (en) * 2018-05-03 2019-11-07 Orby, Inc. Unmanned aerial vehicle with enclosed propulsion system for 3-d data gathering and processing
CN108846867A (en) * 2018-08-29 2018-11-20 安徽云能天智能科技有限责任公司 A kind of SLAM system based on more mesh panorama inertial navigations
CN109828658A (en) * 2018-12-17 2019-05-31 彭晓东 A kind of man-machine co-melting long-range situation intelligent perception system
CN110261870A (en) * 2019-04-15 2019-09-20 浙江工业大学 It is a kind of to synchronize positioning for vision-inertia-laser fusion and build drawing method
CN110321000A (en) * 2019-04-25 2019-10-11 南开大学 A kind of dummy emulation system towards intelligence system complex task
CN110174136A (en) * 2019-05-07 2019-08-27 武汉大学 A kind of underground piping intelligent measurement robot and intelligent detecting method
CN110427022A (en) * 2019-07-08 2019-11-08 武汉科技大学 A kind of hidden fire-fighting danger detection robot and detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何守印: "Research on Autonomous Obstacle Avoidance of UAVs Based on Multi-Sensor Fusion", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *
陈梦晓: "Localization and Mapping of Mobile Robots Based on Multi-Sensor Data", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327289A (en) * 2021-05-18 2021-08-31 中山方显科技有限公司 Method for simultaneously calibrating internal and external parameters of multi-source heterogeneous sensor

Similar Documents

Publication Publication Date Title
US20230260151A1 (en) Simultaneous Localization and Mapping Method, Device, System and Storage Medium
JP2021120844A (en) Method, device, electronic device and recording medium utilized for determining position of vehicle
CN109737981B (en) Unmanned vehicle target searching device and method based on multiple sensors
CN111813130A (en) Autonomous navigation obstacle avoidance system of intelligent patrol robot of power transmission and transformation station
EP4068206A1 (en) Object tracking in local and global maps systems and methods
WO2021036587A1 (en) Positioning method and system for electric power patrol scenario
WO2024087962A1 (en) Truck bed orientation recognition system and method, and electronic device and storage medium
US20230219221A1 (en) Error detection method and robot system based on a plurality of pose identifications
CN113947134A (en) Multi-sensor registration fusion system and method under complex terrain
Chellali A distributed multi robot SLAM system for environment learning
CN117152249A (en) Multi-unmanned aerial vehicle collaborative mapping and perception method and system based on semantic consistency
JP2021177144A (en) Information processing apparatus, information processing method, and program
Manivannan et al. Vision based intelligent vehicle steering control using single camera for automated highway system
Egodagamage et al. Distributed monocular SLAM for indoor map building
Kostavelis et al. SPARTAN system: Towards a low-cost and high-performance vision architecture for space exploratory rovers
CN111652261A (en) Multi-modal perception fusion system
Valente et al. Evidential SLAM fusing 2D laser scanner and stereo camera
Scheuermann et al. Mobile augmented reality based annotation system: A cyber-physical human system
Jensen et al. Laser range imaging using mobile robots: From pose estimation to 3D-models
Aravind et al. Real-Time Appearance Based Mapping using Visual Sensor for Unknown Environment
US20230219220A1 (en) Error detection method and robot system based on association identification
CN116443256A (en) Methods, systems, and computer program products for airborne fueling
Parra et al. A novel method to estimate the position of a mobile robot in underfloor environments using RGB-D point clouds
Liu et al. A multi-sensor fusion with automatic vision-LiDAR calibration based on Factor graph joint optimization for SLAM
Pal et al. Evolution of Simultaneous Localization and Mapping Framework for Autonomous Robotics—A Comprehensive Review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200911