CN115514885A - Monocular and binocular fusion-based remote augmented reality follow-up perception system and method

Info

Publication number
CN115514885A
Authority
CN
China
Prior art keywords
bucket
follow
information
monocular
augmented reality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211037134.1A
Other languages
Chinese (zh)
Other versions
CN115514885B (en)
Inventor
丁伟利 (Ding Weili)
李健 (Li Jian)
华长春 (Hua Changchun)
魏饶 (Wei Rao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202211037134.1A
Publication of CN115514885A
Application granted
Publication of CN115514885B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/332Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/366Image reproducers using viewer tracking

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a remote augmented reality follow-up perception system and method based on monocular-binocular fusion, belonging to the field of intelligent engineering machinery. The system comprises a monocular-binocular fusion follow-up intelligent sensing module, a server-side intelligent processing module and a user-side augmented reality module. The monocular-binocular fusion follow-up intelligent sensing module acquires RGB information and depth data of the construction scene and sends them to the server end over a wireless link; the server-side intelligent processing module performs the computations required for bucket attitude estimation, bucket tip positioning, accurate environment perception and augmented reality; the user-side augmented reality module comprises a three-dimensional display device and an operator console, the three-dimensional display device displaying the three-dimensional information fusion image produced by the server-side processing module and the console controlling the engineering machinery on the construction site. Built on remote follow-up intelligent sensing, the invention solves problems such as the operator's lack of telepresence and sense of distance.

Description

Monocular and binocular fusion-based remote augmented reality follow-up perception system and method
Technical Field
The invention relates to the field of intelligent engineering machinery, in particular to a monocular and binocular fusion-based remote augmented reality follow-up sensing system and method.
Background
In both domestic and international markets, the requirements that engineering machinery users place on equipment continue to diversify, and expectations for operating comfort keep rising. In harsh, high-risk and repetitive working environments in particular, the health of operators cannot be guaranteed, skilled operators are in serious short supply, and construction enterprises find it increasingly difficult to recruit workers. In such environments and in accident and disaster response, large numbers of engineering machines such as loaders and excavators are often needed for emergency work such as site clearing and road restoration, and there is growing demand for remote, intelligent teleoperation of engineering machinery that offers a comfortable working environment with comparable working efficiency. However, environment sensing in current teleoperation systems for engineering machinery generally relies on a visible-light camera or a lidar fixed relative to the machine body, which limits the field of view available to intelligent perception.
The invention with application number 201810268923.3 discloses a stereoscopic vision follow-up system for disaster-area search, in which an operator wearing VR glasses and a Bluetooth headset receives live audio and video from the disaster area in real time and controls the synchronous motion of an unmanned aerial vehicle camera with the head posture. The invention with application number 202010882933.3 discloses a humanoid binocular follow-up virtual reality system for robot teleoperation, in which a binocular camera mounted on a two-dimensional platform is driven by a follow-up mechanism to pitch and rotate synchronously with the operator's head movement, thereby changing the viewing angle of the binocular camera. Although these technologies give operators an immersive stereoscopic view, they share the following shortcoming: only binocular images are provided directly to the operator, so the lack of telepresence and sense of distance remains.
Disclosure of Invention
The invention provides a remote augmented reality follow-up perception system based on monocular-binocular fusion. Built on remote follow-up intelligent sensing, the system can supply the operator with key information such as real-time bucket attitude estimation and bucket tip positioning, and solves problems such as the operator's lack of telepresence and sense of distance.
To solve the above technical problems, the invention adopts the following technical solution:
a remote augmented reality follow-up sensing system based on monocular and binocular fusion comprises an edge end monocular and binocular fusion follow-up intelligent sensing module, a server end intelligent processing module and a user end augmented reality module;
the edge end single-binocular fusion follow-up intelligent sensing module is used for acquiring RGB information and depth data of a construction scene based on a single-binocular fusion method and sending the RGB information and the depth data to the server end through wireless transmission;
the server-side intelligent processing module is used for executing calculations required by bucket attitude estimation, bucket tip positioning, accurate environment perception and augmented reality;
the user-side augmented reality module comprises a three-dimensional display device and a control console, wherein the three-dimensional display device is used for displaying the three-dimensional information fusion image processed by the server-side algorithm processing module, and the control console is used for controlling the operation of the engineering machinery on a construction site.
The technical solution of the invention is further improved as follows: the edge-end monocular-binocular fusion follow-up intelligent sensing module comprises a monocular-binocular vision sensor, an edge-end AI processor and a follow-up pan-tilt; the edge-end AI processor fuses the monocular and binocular RGB information, depth data, camera pose information and key-target information, reads the head posture of the operator wearing video glasses at the user end, and uses brushless DC motor control to make the follow-up pan-tilt carrying the monocular-binocular vision sensor quickly synchronize with the operator's head posture, achieving the follow-up effect.
The technical solution of the invention is further improved as follows: the edge-end AI processor is connected to the monocular-binocular vision sensor by a USB data cable and reads the monocular RGB information, binocular grayscale information and camera pose information in real time; a stereo-matching algorithm recovers a depth map from the binocular grayscale information, the 2D points of the depth map are converted into 3D points in the world coordinate system using the intrinsic and extrinsic parameters of the binocular depth camera, and these 3D points are then projected onto the RGB image using the intrinsic and extrinsic parameters of the monocular RGB camera, realizing monocular-binocular information fusion; key targets are detected in real time with an object detection algorithm, and their positions are sent to the server-side intelligent processing module for bucket tip positioning.
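The following is a minimal sketch of the depth-to-RGB projection described above, assuming a pinhole model for both cameras. For brevity the two sets of extrinsic parameters are collapsed into a single relative rotation R and translation t from the depth camera frame to the RGB camera frame, and all variable names are illustrative rather than taken from the patent.

```python
import numpy as np

def fuse_depth_to_rgb(depth, K_depth, K_rgb, R, t, rgb_shape):
    """Project a depth map into the RGB camera so that depth is aligned with RGB pixels.

    depth     : (H, W) depth map in metres recovered from the stereo pair
    K_depth   : (3, 3) intrinsics of the depth (reference) camera
    K_rgb     : (3, 3) intrinsics of the monocular RGB camera
    R, t      : rotation (3, 3) and translation (3,) from the depth frame to the RGB frame
    rgb_shape : (H_rgb, W_rgb) of the RGB image
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    valid = z > 0

    # 2D depth pixel -> 3D point in the depth-camera frame
    pix = np.stack([u.reshape(-1), v.reshape(-1), np.ones(H * W)], axis=0)[:, valid]
    pts_depth = np.linalg.inv(K_depth) @ (pix * z[valid])        # (3, N)

    # depth-camera frame -> RGB-camera frame
    pts_rgb = R @ pts_depth + t.reshape(3, 1)

    # 3D point -> RGB pixel
    proj = K_rgb @ pts_rgb
    uu = np.round(proj[0] / proj[2]).astype(int)
    vv = np.round(proj[1] / proj[2]).astype(int)

    # keep the z value of every point that lands inside the RGB image
    aligned = np.zeros(rgb_shape, dtype=np.float32)
    inside = (proj[2] > 0) & (uu >= 0) & (uu < rgb_shape[1]) & (vv >= 0) & (vv < rgb_shape[0])
    aligned[vv[inside], uu[inside]] = pts_rgb[2, inside]
    return aligned
```

The aligned depth image produced in this way is what allows a pixel selected on the RGB image, such as a detected key target, to be handed to the server with a metric depth attached.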
The technical solution of the invention is further improved as follows: the server-side intelligent processing module tracks the bucket state in real time with a bucket attitude estimation and positioning algorithm that combines efficient sparse-region template matching with a real-time lightweight deep learning network, and its accurate environment perception algorithm uses monocular visual SLAM, so that the environment map information and engineering machinery attitude information needed for safe operation can be provided.
The technical solution of the invention is further improved as follows: while displaying the fused image processed by the server end, the three-dimensional display device accurately captures the operator's head posture and sends it to the server end in real time, where it is read by the edge-end AI processor to drive the follow-up pan-tilt; the console provides the operator with a real control environment and, together with the video glasses, creates the sense of being present on site.
A remote augmented reality follow-up perception method based on monocular and binocular fusion comprises the following steps:
Step 1: place the edge-end monocular-binocular fusion follow-up intelligent sensing module in the cab of the engineering machine, make sure the glass in front of the cab is unobstructed, then power on the edge-end processor so that it waits to establish a communication connection with the server-side intelligent processing module;
Step 2: power on the server end so that it listens for and establishes communication connections with the edge-end monocular-binocular fusion follow-up intelligent sensing module and the user-side augmented reality module;
Step 3: the operator sits at the console, puts on the video glasses and begins remote operation once the construction scene appears on the three-dimensional display device;
Step 4: the edge-end processor reads the operator's head posture data and controls the follow-up pan-tilt to update its attitude in real time, while sending the RGB information and depth data of the construction scene to the server end by wireless transmission;
Step 5: the server-side intelligent processing module performs bucket attitude estimation, bucket tip positioning and environment map construction from the received RGB information and depth data, and finally sends to the user side a fused image containing the bucket attitude information, the bucket tip position and the actual distances between the bucket tip and surrounding objects such as the dump truck;
Step 6: the operator at the user side remotely controls the engineering machine through the console, guided by the fused image and on-site three-dimensional information shown in the video glasses, while the video glasses capture the operator's head posture in real time and send it to the server end to be read by the edge-end processor;
Step 7: repeat steps 4 to 6.
A bucket attitude tracking method based on sparse regions, using the remote augmented reality follow-up perception system based on monocular-binocular fusion, comprises the following steps:
S1: place the bucket under natural illumination with no reflective objects nearby, and take 30 photographs in a full circle around the bucket, keeping the bucket at the center of each image;
S2: open the RealityCapture software and generate a three-dimensional bucket model from the 30 photographs, the model having exactly the same proportions as the real bucket;
S3: place virtual cameras at 2562 different positions around the three-dimensional model and render it; from each rendering extract the sparse contour points of the bucket in that posture, back-project the contour points into the coordinate system of the bucket model and store them together with their normal vectors and the viewing direction vector of that posture, finally generating 2562 template views;
S4: given an initial bucket posture, multiply the direction vectors of all template views by the initial posture and find the template view closest to it; project the contour points of that template view onto the current real image, treat the 18 pixels in front of each contour point along its normal direction as bucket pixels and the 18 pixels behind it as background pixels, and segment the bucket from the background to obtain the real bucket contour;
S5: estimate the real bucket posture from the distances between the model contour points and the real contour points along the normal directions, thereby tracking the bucket.
Owing to the above technical solution, the invention achieves the following technical progress:
1. Thanks to the monocular-binocular follow-up intelligent sensing technology, the operator is freed from the fixed viewing angle of conventional teleoperation under beyond-line-of-sight conditions, and monocular RGB information and binocular depth data of the construction site can be acquired freely from different viewing angles simply by changing the head posture.
2. Thanks to the efficient real-time bucket tracking and bucket tip positioning algorithms, the bucket attitude and the spatial position of the bucket tip can be captured in real time from images, so that, combined with augmented reality, real-time bucket attitude and position information is presented to the operator, overcoming the inability of conventional teleoperation to convey a sense of distance.
3. Thanks to the SLAM + YOLO environment perception algorithm, the local coordinates of the bucket tip can be transformed in real time into the coordinate system of the camera's starting point, providing global environment perception and a basis for efficient operation of the engineering machinery.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
fig. 1 is a schematic diagram of an overall structure of a remote augmented reality follow-up sensing system based on monocular and binocular fusion in an embodiment of the present invention;
fig. 2 is a schematic diagram of a general structure of a remote augmented reality follow-up sensing system based on monocular and binocular fusion in the embodiment of the present invention;
fig. 3 is a structure diagram of a servo pan-tilt of a remote augmented reality servo perception system based on monocular and binocular fusion in the embodiment of the present invention;
FIG. 4 is a schematic diagram of monocular and binocular fusion of the remote augmented reality follow-up sensing system based on monocular and binocular fusion in the embodiment of the present invention;
FIG. 5 is a schematic diagram of bucket tracking of a remote augmented reality follow-up sensing system based on monocular and binocular fusion according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating positioning of a bucket tip of a bucket of a remote augmented reality follow-up sensing system based on monocular and binocular fusion in the embodiment of the present invention;
the device comprises an edge terminal AI processor 1, a monocular and binocular vision sensor 2, a 3, y-axis brushless DC motor, a 4, x-axis brushless DC motor, a 5, RGB camera, a 6, left eye depth camera and a 7, right eye depth camera.
Detailed Description
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the following figures and examples:
as shown in fig. 1 and 2, a monocular and binocular fusion based remote augmented reality follow-up sensing system includes an edge end monocular and binocular fusion follow-up intelligent sensing module, a server end intelligent processing module and a user end augmented reality module;
the edge end single-binocular fusion follow-up intelligent sensing module is placed in a cab of the engineering machinery, is used for acquiring RGB information and depth data of a construction scene and sending the RGB information and the depth data to the server end through a wireless transmission technology;
the server-side intelligent processing module is used for executing calculation required by bucket attitude estimation, bucket tip positioning, accurate environment perception and augmented reality;
the user side augmented reality module comprises video glasses and a console, the video glasses are used for displaying the three-dimensional information fusion image processed by the server side algorithm processing module, and the console is used for controlling engineering machinery operation of a construction site.
As shown in fig. 3, the follow-up pan-tilt integrates the monocular-binocular vision sensor 2, the edge-end AI processor 1, the x-axis brushless DC motor 4 and the y-axis brushless DC motor 3. It was custom-designed in SolidWorks and has two degrees of freedom about the x and y axes; field-oriented control (FOC) gives the DC motors low torque ripple, high efficiency, low noise and fast dynamic response, so the pan-tilt can quickly synchronize with the operator's head posture.
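The following is a minimal sketch of the outer follow-up loop, assuming that the FOC current and velocity loops run inside the motor drivers. The driver interface, the head-pose source, the gain and rate values, and the assignment of yaw to the y axis and pitch to the x axis are all hypothetical; they only illustrate how the operator's head yaw and pitch could be turned into axis commands.

```python
import time

class PanTiltFollower:
    """Outer angle loop: make the two gimbal axes track the operator's head yaw and pitch.
    `driver` is a hypothetical two-axis brushless-motor interface; the inner FOC loops
    are assumed to be handled by the motor drivers themselves."""

    def __init__(self, driver, kp=4.0, max_speed=120.0):
        self.driver = driver          # hypothetical driver exposing get_angle()/set_speed()
        self.kp = kp                  # proportional gain, deg/s of command per deg of error
        self.max_speed = max_speed    # mechanical speed limit, deg/s

    def step(self, head_yaw, head_pitch):
        # error between the operator's head pose and the current gimbal pose
        yaw_err = head_yaw - self.driver.get_angle("y")      # yaw axis assumed on the y-axis motor
        pitch_err = head_pitch - self.driver.get_angle("x")  # pitch axis assumed on the x-axis motor
        # proportional speed commands, clamped to the mechanical limit
        self.driver.set_speed("y", max(-self.max_speed, min(self.max_speed, self.kp * yaw_err)))
        self.driver.set_speed("x", max(-self.max_speed, min(self.max_speed, self.kp * pitch_err)))

    def run(self, head_pose_source, rate_hz=100):
        # head_pose_source.latest() returns the most recent yaw/pitch streamed from the video glasses
        while True:
            yaw, pitch = head_pose_source.latest()
            self.step(yaw, pitch)
            time.sleep(1.0 / rate_hz)
```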
As shown in fig. 4, the monocular-binocular vision sensor 2 carries three cameras: an RGB camera 5, a left depth camera 6 and a right depth camera 7. The RGB camera 5 collects RGB images containing the bucket and sends them by wireless transmission to the server-side intelligent processing module for bucket tracking. The left depth camera 6 and the right depth camera 7 collect grayscale images containing the bucket; these are upsampled to the resolution of the RGB image, depth is recovered with the SGBM binocular stereo matching algorithm using the right image as reference, and the depth image pixels are finally re-projected onto the RGB image through the coordinate transformation between the right depth camera 7 and the RGB camera 5, fusing the monocular and binocular information so that the bucket attitude can be expressed in the RGB coordinate system for subsequent bucket tip positioning.
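The sketch below shows the depth-recovery step with OpenCV's SGBM matcher under the scheme above. For simplicity it computes disparity with the left image as reference (the patent takes the right image as reference, but the principle is identical), and the matcher parameters are illustrative values, not values specified in the patent.

```python
import cv2
import numpy as np

def depth_from_stereo(left_gray, right_gray, rgb_size, focal_px, baseline_m):
    """Recover a depth map from a rectified grayscale stereo pair with SGBM.

    left_gray, right_gray : rectified grayscale images from the two depth cameras
    rgb_size              : (width, height) of the RGB image, used for upsampling
    focal_px              : focal length in pixels at the upsampled resolution
    baseline_m            : stereo baseline in metres
    """
    # upsample the grayscale pair so its resolution matches the RGB image
    left = cv2.resize(left_gray, rgb_size, interpolation=cv2.INTER_LINEAR)
    right = cv2.resize(right_gray, rgb_size, interpolation=cv2.INTER_LINEAR)

    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,        # must be a multiple of 16
        blockSize=5,
        P1=8 * 5 * 5,              # smoothness penalties for small / large disparity changes
        P2=32 * 5 * 5,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    disparity = sgbm.compute(left, right).astype(np.float32) / 16.0   # fixed-point -> pixels

    # triangulate: Z = f * B / d  (valid only where a disparity was found)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```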
As shown in fig. 5, the bucket attitude estimation algorithm deployed on the server-side intelligent processing module generates template views of every attitude from the three-dimensional bucket model, matches them against the real bucket image returned by the follow-up pan-tilt to obtain an accurate estimate of the bucket attitude, and then overlays the rendering of the estimated attitude on the real image to visualize the tracking result.
As shown in fig. 6 (a), the bucket tip positioning algorithm deployed on the server-side intelligent processing module obtains the two-dimensional image coordinates of the bucket tip from an image rendered from the three-dimensional bucket model, and then recovers the three-dimensional coordinates of the tip in the model coordinate system through the intrinsic and extrinsic parameters of the virtual camera. As shown in fig. 6 (b), the tip coordinates in the model coordinate system are transformed into the camera coordinate system through the estimated pose; the position of the dump truck is detected with the YOLOv4 object detection algorithm deployed on the edge-end AI processor 1, the three-dimensional coordinates of the truck center in the camera coordinate system are obtained with the help of the depth data, and the Euclidean distance between the bucket tip and the truck center is then computed, giving the operator the relative distance between the bucket tip and the truck; when this distance falls below a set threshold, an alarm can be raised to avoid a collision. As shown in fig. 6 (c), using the SLAM + YOLO algorithm deployed on the server side, the points outside the box represent detected static feature points and the points inside the box represent the bucket and the dynamic feature points around it; the transformation of the current frame relative to the first frame is computed from the static feature points of each frame, so the tip coordinates at every moment are converted into the camera coordinate system of the initial moment, i.e. the global coordinate system. These global bucket coordinates give a better estimate of the current construction state of the machine.
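A minimal sketch of the distance check in fig. 6 (b) and of the global transformation in fig. 6 (c) follows. The camera intrinsics, the pose variables and the 1.0 m warning threshold are illustrative assumptions, not values specified in the patent.

```python
import numpy as np

def bucket_tip_in_camera(tip_model, R_pose, t_pose):
    """Transform the bucket-tip point from the 3D-model frame to the current camera frame,
    using the bucket pose estimated by template matching."""
    return R_pose @ tip_model + t_pose

def truck_center_in_camera(box, aligned_depth, K_rgb):
    """Lift the centre of the YOLO truck box to 3D using the depth aligned to the RGB image.
    box = (x1, y1, x2, y2) in RGB pixel coordinates."""
    u = int((box[0] + box[2]) / 2)
    v = int((box[1] + box[3]) / 2)
    z = float(aligned_depth[v, u])
    x = (u - K_rgb[0, 2]) * z / K_rgb[0, 0]
    y = (v - K_rgb[1, 2]) * z / K_rgb[1, 1]
    return np.array([x, y, z])

def collision_warning(tip_cam, truck_cam, threshold_m=1.0):
    """Return the Euclidean bucket-tip-to-truck distance and whether it is below the
    warning threshold (the 1.0 m value is only an example, not from the patent)."""
    distance = float(np.linalg.norm(tip_cam - truck_cam))
    return distance, distance < threshold_m

def tip_to_global(tip_cam, T_current_to_first):
    """Express the bucket tip, given in the current camera frame, in the global frame,
    i.e. the camera frame of the first SLAM frame. T_current_to_first is the 4x4 transform
    of the current frame relative to the first frame, estimated from static feature points."""
    return (T_current_to_first @ np.append(tip_cam, 1.0))[:3]
```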
A remote augmented reality follow-up perception method based on monocular and binocular fusion comprises the following steps:
Step 1: place the edge-end monocular-binocular fusion follow-up intelligent sensing module in the cab of the engineering machine, make sure the glass in front of the cab is unobstructed, then power on the edge-end processor so that it waits to establish a communication connection with the server-side intelligent processing module;
Step 2: power on the server end so that it listens for and establishes communication connections with the edge-end monocular-binocular fusion follow-up intelligent sensing module and the user-side augmented reality module;
Step 3: the operator sits at the console, puts on the video glasses and begins remote operation once the construction scene appears on the three-dimensional display device;
Step 4: the edge-end processor reads the operator's head posture data and controls the follow-up pan-tilt to update its attitude in real time, while sending the RGB information and depth data of the construction scene to the server end by wireless transmission (a minimal sketch of this transport follows step 7);
Step 5: the server-side intelligent processing module performs bucket attitude estimation, bucket tip positioning and environment map construction from the received RGB information and depth data, and finally sends to the user side a fused image containing the bucket attitude information, the bucket tip position and the actual distances between the bucket tip and surrounding objects such as the dump truck;
Step 6: the operator at the user side remotely controls the engineering machine through the console, guided by the fused image and on-site three-dimensional information shown in the video glasses, while the video glasses capture the operator's head posture in real time and send it to the server end to be read by the edge-end processor;
Step 7: repeat steps 4 to 6.
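The patent does not specify the transport used between the edge end and the server in the steps above; the following sketch assumes a plain TCP connection carrying JPEG-compressed RGB frames and a raw depth buffer. The address, port and helper names are hypothetical.

```python
import socket
import struct
import cv2
import numpy as np

SERVER_ADDR = ("192.168.1.10", 9000)      # hypothetical server address and port

def send_frame(sock, rgb, depth):
    """Send one JPEG-compressed RGB frame plus its depth map as a length-prefixed packet."""
    ok, jpg = cv2.imencode(".jpg", rgb, [cv2.IMWRITE_JPEG_QUALITY, 90])
    if not ok:
        return
    depth_bytes = depth.astype(np.float16).tobytes()   # half precision to save bandwidth
    sock.sendall(struct.pack("!II", len(jpg), len(depth_bytes)) + jpg.tobytes() + depth_bytes)

def recv_exact(conn, n):
    """Read exactly n bytes from the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed")
        buf += chunk
    return buf

def edge_loop(capture_rgb_and_depth):
    """Edge side (steps 1 and 4): connect to the listening server and stream scene data."""
    with socket.create_connection(SERVER_ADDR) as sock:
        while True:
            rgb, depth = capture_rgb_and_depth()       # from the monocular-binocular sensor
            send_frame(sock, rgb, depth)

def server_loop(handle_frame, depth_shape):
    """Server side (steps 2 and 5): listen, then hand every received frame to the
    intelligent processing module via handle_frame(rgb, depth)."""
    with socket.create_server(("0.0.0.0", SERVER_ADDR[1])) as srv:
        conn, _ = srv.accept()
        with conn:
            while True:
                jpg_len, depth_len = struct.unpack("!II", recv_exact(conn, 8))
                rgb = cv2.imdecode(np.frombuffer(recv_exact(conn, jpg_len), np.uint8),
                                   cv2.IMREAD_COLOR)
                depth = np.frombuffer(recv_exact(conn, depth_len),
                                      np.float16).reshape(depth_shape)
                handle_frame(rgb, depth)
```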
Specifically, during remote operation of the engineering machine the operator sits at the console wearing the video glasses. The image shown in the video glasses is the fused image produced by the server-side intelligent processing module; it contains the bucket attitude information, the distance from the bucket tip to the follow-up pan-tilt, the distance from the bucket tip to the dump truck, and the global environment information of the bucket. With this information the operator can judge the state of the construction site and control the engineering machinery through the console. The follow-up pan-tilt placed in the cab on the construction site rotates freely according to the operator's head posture and collects on-site RGB and depth images in real time; these are sent by wireless transmission to the server-side intelligent processing module for attitude estimation, bucket tip positioning, environment perception and augmented-reality image fusion, and the server finally sends the fused image to the video glasses for display.
This beyond-line-of-sight remote augmented reality follow-up intelligent perception system and method based on monocular-binocular fusion can be used in existing practical teleoperation systems for intelligent engineering machinery: the intelligent follow-up pan-tilt can replace an unmanned aerial vehicle for environment perception, the vision-based bucket tracking method can replace IMU-based tracking, and the video glasses can replace a large two-dimensional display, giving the teleoperator an immersive sense of presence.
The invention also provides a bucket attitude tracking method based on sparse regions, which uses the remote augmented reality follow-up perception system based on monocular-binocular fusion and comprises the following steps (a brief sketch of steps S4 and S5 follows the list):
S1: place the bucket under natural illumination, keeping reflective objects away from it as far as possible, and take 30 photographs with any camera in a full circle around the bucket, keeping the bucket as centered as possible in each image.
S2: open the RealityCapture software and generate a three-dimensional bucket model from the 30 photographs; the model has exactly the same proportions as the real bucket.
S3: place virtual cameras at 2562 different positions around the three-dimensional model and render it; from each rendering extract the sparse contour points of the bucket in that posture, back-project them into the coordinate system of the bucket model and store them together with their normal vectors and the viewing direction vector of that posture. 2562 template views are finally generated.
S4: given the initial bucket posture, multiply the direction vectors of all template views by the initial posture and find the template view closest to it. Project the contour points of that template view onto the current real image, treat the 18 pixels in front of each contour point along its normal direction as bucket pixels and the 18 pixels behind it as background pixels, and segment the bucket from the background to obtain the real bucket contour.
S5: estimate the real bucket posture from the distances between the model contour points and the real contour points along the normal directions, thereby tracking the bucket.
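The sketch below illustrates two pieces of this pipeline: selecting the template view whose stored viewing direction best matches the current pose estimate (step S4), and a deliberately simplified update from the normal-direction distances (step S5) that solves only for a 2D image shift, whereas the method itself refines the full bucket pose. All names are illustrative.

```python
import numpy as np

def select_template_view(view_dirs, pose_R):
    """Pick the prerendered template view whose stored viewing direction is closest to the
    viewing direction implied by the current bucket pose estimate (step S4).

    view_dirs : (N, 3) unit viewing directions stored with the 2562 template views
    pose_R    : (3, 3) rotation of the current bucket pose estimate
    """
    cam_dir = pose_R.T @ np.array([0.0, 0.0, 1.0])   # camera optical axis in the model frame
    return int(np.argmax(view_dirs @ cam_dir))       # highest cosine similarity

def shift_from_normal_distances(normals_px, signed_dist):
    """Simplified version of step S5: the 2D shift that best explains, in a least-squares
    sense, the measured distances between the projected model contour and the real contour
    along each contour normal.

    normals_px  : (M, 2) unit contour normals in the image
    signed_dist : (M,) signed distance to the real contour along each normal, in pixels
    """
    # each correspondence constrains the shift s through  n_i . s = d_i
    shift, *_ = np.linalg.lstsq(normals_px, signed_dist, rcond=None)
    return shift                                     # (dx, dy) in pixels
```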
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A remote augmented reality follow-up perception system based on monocular-binocular fusion, characterized in that: the system comprises an edge-end monocular-binocular fusion follow-up intelligent sensing module, a server-side intelligent processing module and a user-side augmented reality module;
the edge-end monocular-binocular fusion follow-up intelligent sensing module acquires RGB information and depth data of the construction scene based on a monocular-binocular fusion method and sends them to the server end by wireless transmission;
the server-side intelligent processing module performs the computations required for bucket attitude estimation, bucket tip positioning, accurate environment perception and augmented reality;
the user-side augmented reality module comprises a three-dimensional display device and a console, the three-dimensional display device displaying the three-dimensional information fusion image processed by the server-side processing module and the console controlling the engineering machinery on the construction site.
2. The monocular-binocular fusion based remote augmented reality follow-up perception system according to claim 1, characterized in that: the edge-end monocular-binocular fusion follow-up intelligent sensing module comprises a monocular-binocular vision sensor, an edge-end AI processor and a follow-up pan-tilt; the edge-end AI processor fuses the monocular and binocular RGB information, depth data, camera pose information and key-target information, reads the head posture of the operator wearing video glasses at the user end, and uses brushless DC motor control to make the follow-up pan-tilt carrying the monocular-binocular vision sensor quickly synchronize with the operator's head posture, achieving the follow-up effect.
3. The monocular-binocular fusion based remote augmented reality follow-up perception system according to claim 2, characterized in that: the edge-end AI processor is connected to the monocular-binocular vision sensor by a USB data cable and reads the monocular RGB information, binocular grayscale information and camera pose information in real time; a stereo-matching algorithm recovers a depth map from the binocular grayscale information, the 2D points of the depth map are converted into 3D points in the world coordinate system using the intrinsic and extrinsic parameters of the binocular depth camera, and these 3D points are then projected onto the RGB image using the intrinsic and extrinsic parameters of the monocular RGB camera, realizing monocular-binocular information fusion; key targets are detected in real time with an object detection algorithm, and their positions are sent to the server-side intelligent processing module for bucket tip positioning.
4. The monocular-binocular fusion based remote augmented reality follow-up perception system according to claim 1, characterized in that: the server-side intelligent processing module tracks the bucket state in real time with a bucket attitude estimation and positioning algorithm that combines efficient sparse-region template matching with a real-time lightweight deep learning network, and its accurate environment perception algorithm uses monocular visual SLAM, so that the environment map information and engineering machinery attitude information needed for safe operation can be provided.
5. The monocular-binocular fusion based remote augmented reality follow-up perception system according to claim 1, characterized in that: while displaying the fused image processed by the server end, the three-dimensional display device accurately captures the operator's head posture and sends it to the server end in real time, where it is read by the edge-end AI processor to drive the follow-up pan-tilt; the console provides the operator with a real control environment and, together with the video glasses, creates the sense of being present on site.
6. A perception method using the monocular-binocular fusion based remote augmented reality follow-up perception system according to any one of claims 1-5, characterized in that the method comprises the following steps:
Step 1: place the edge-end monocular-binocular fusion follow-up intelligent sensing module in the cab of the engineering machine, make sure the glass in front of the cab is unobstructed, then power on the edge-end processor so that it waits to establish a communication connection with the server-side intelligent processing module;
Step 2: power on the server end so that it listens for and establishes communication connections with the edge-end monocular-binocular fusion follow-up intelligent sensing module and the user-side augmented reality module;
Step 3: the operator sits at the console, puts on the video glasses and begins remote operation once the construction scene appears on the three-dimensional display device;
Step 4: the edge-end processor reads the operator's head posture data and controls the follow-up pan-tilt to update its attitude in real time, while sending the RGB information and depth data of the construction scene to the server end by wireless transmission;
Step 5: the server-side intelligent processing module performs bucket attitude estimation, bucket tip positioning and environment map construction from the received RGB information and depth data, and finally sends to the user side a fused image containing the bucket attitude information, the bucket tip position and the actual distances between the bucket tip and surrounding objects such as the dump truck;
Step 6: the operator at the user side remotely controls the engineering machine through the console, guided by the fused image and on-site three-dimensional information shown in the video glasses, while the video glasses capture the operator's head posture in real time and send it to the server end to be read by the edge-end processor;
Step 7: repeat steps 4 to 6.
7. A bucket attitude tracking method based on sparse regions, using the monocular-binocular fusion based remote augmented reality follow-up perception system according to any one of claims 1-5, characterized in that the method comprises the following steps:
S1: place the bucket under natural illumination with no reflective objects nearby, and take 30 photographs in a full circle around the bucket, keeping the bucket at the center of each image;
S2: open the RealityCapture software and generate a three-dimensional bucket model from the 30 photographs, the model having exactly the same proportions as the real bucket;
S3: place virtual cameras at 2562 different positions around the three-dimensional model and render it; from each rendering extract the sparse contour points of the bucket in that posture, back-project the contour points into the coordinate system of the bucket model and store them together with their normal vectors and the viewing direction vector of that posture, finally generating 2562 template views;
S4: given an initial bucket posture, multiply the direction vectors of all template views by the initial posture and find the template view closest to it; project the contour points of that template view onto the current real image, treat the 18 pixels in front of each contour point along its normal direction as bucket pixels and the 18 pixels behind it as background pixels, and segment the bucket from the background to obtain the real bucket contour;
S5: estimate the real bucket posture from the distances between the model contour points and the real contour points along the normal directions, thereby tracking the bucket.
CN202211037134.1A 2022-08-26 2022-08-26 Remote augmented reality follow-up sensing system and method based on monocular and binocular fusion Active CN115514885B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211037134.1A CN115514885B (en) 2022-08-26 2022-08-26 Remote augmented reality follow-up sensing system and method based on monocular and binocular fusion


Publications (2)

Publication Number Publication Date
CN115514885A true CN115514885A (en) 2022-12-23
CN115514885B CN115514885B (en) 2024-03-01

Family

ID=84501858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211037134.1A Active CN115514885B (en) 2022-08-26 2022-08-26 Remote augmented reality follow-up sensing system and method based on monocular and binocular fusion

Country Status (1)

Country Link
CN (1) CN115514885B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015199470A1 (en) * 2014-06-25 2015-12-30 한국과학기술원 Apparatus and method for estimating hand position utilizing head mounted color depth camera, and bare hand interaction system using same
US20180197345A1 (en) * 2016-09-13 2018-07-12 Youngzone Culture (Shanghai) Co., Ltd. Augmented reality technology-based handheld viewing device and method thereof
CN107034942A (en) * 2017-05-24 2017-08-11 成都跟驰科技有限公司 Virtual reality system for excavator remote control
CN107888895A (en) * 2017-10-17 2018-04-06 三重机有限公司 Excavator tele-control system, method and excavator
CN107882103A (en) * 2017-10-26 2018-04-06 南京工业大学 Three-dimensional attitude display and remote automatic control system of excavator
CN108797669A (en) * 2018-06-20 2018-11-13 清华大学 A kind of autonomous 3D excavations construction robot
US20200005520A1 (en) * 2018-06-27 2020-01-02 Shanghai United Imaging Healthcare Co., Ltd. Method and system for fusing image data
CN109828658A (en) * 2018-12-17 2019-05-31 彭晓东 A kind of man-machine co-melting long-range situation intelligent perception system
WO2021098441A1 (en) * 2019-11-20 2021-05-27 Oppo广东移动通信有限公司 Hand posture estimation method and apparatus, device and computer storage medium
US20210311320A1 (en) * 2020-04-06 2021-10-07 Pike Enterprises, Llc Virtual reality tracking system
CN112116631A (en) * 2020-09-07 2020-12-22 江苏瑞科科技有限公司 Industrial augmented reality combined positioning system
CN112554253A (en) * 2020-11-27 2021-03-26 徐工集团工程机械有限公司 Multifunctional emergency rescue vehicle and control method thereof
CN113723279A (en) * 2021-08-30 2021-11-30 东南大学 Multi-target tracking acceleration method based on time-space optimization in edge computing environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Xilin et al., "Real-world-oriented intelligent perception and interaction", Scientia Sinica Informationis, 31 August 2016 (2016-08-31) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117197769A (en) * 2023-11-03 2023-12-08 江苏智能无人装备产业创新中心有限公司 Loader front image generation system and method based on bucket position observation
CN117197769B (en) * 2023-11-03 2024-01-26 江苏智能无人装备产业创新中心有限公司 Loader front image generation system and method based on bucket position observation

Also Published As

Publication number Publication date
CN115514885B (en) 2024-03-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant