CN114445688B - Target detection method for a distributed multi-camera spherical unmanned system

Target detection method for a distributed multi-camera spherical unmanned system

Info

Publication number
CN114445688B
CN114445688B (application CN202210040564.2A)
Authority
CN
China
Prior art keywords
camera
image
unmanned aerial
target detection
spherical
Prior art date
Legal status
Active
Application number
CN202210040564.2A
Other languages
Chinese (zh)
Other versions
CN114445688A (en)
Inventor
蔡志浩
牛钰
赵江
王英勋
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202210040564.2A
Publication of CN114445688A
Application granted
Publication of CN114445688B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The invention discloses a target detection method for a distributed multi-camera spherical unmanned system. The unmanned system consists of a plurality of quadrotor unmanned aerial vehicles assembled into a spherical unmanned system; each unmanned aerial vehicle carries a monocular camera, and when assembled into the spherical state the system is a rigid body carrying multiple cameras. The method comprises the following steps: designing a distributed camera network topology for the multi-camera unmanned system; performing feature-level image fusion using a multi-camera data fusion algorithm; and establishing a deep-learning-based target detection algorithm, compressing the neural network model, and performing target detection on the fused image with the compressed model to complete the target detection task. The method improves detection speed and detection accuracy under conditions such as occlusion, ensuring that the spherical unmanned system can successfully complete its tasks.

Description

Target detection method for a distributed multi-camera spherical unmanned system
Technical Field
The invention relates to the technical field of multi-camera target detection, in particular to a moving-target detection method for a distributed multi-camera network structure, and more particularly to a target detection method for a spherical multi-camera unmanned system in rolling mode.
Background
In recent years, with the development of science and technology, mobile robots have gradually entered modern social life and begun to play an increasingly important role in fields such as industrial production. With the rapid expansion of the robotics market and progress in artificial intelligence, target detection, recognition, and tracking technologies based on mobile robots are attracting growing attention from researchers. Compared with wheeled and legged robots, spherical robots differ greatly in mechanical structure and motion characteristics, and related research remains limited, so research on moving-target detection methods for spherical unmanned systems has both scientific novelty and practical significance.
At present, detection, recognition, and tracking of various targets (vehicles, pedestrians, faces, gestures, etc.) in images or videos has become a mainstream direction of intelligent computer vision. Video image analysis and processing technology has practical application value and broad development prospects, including but not limited to smart cities, public security management, home security systems, live broadcasting and analysis of sports events, and medical monitoring. Systems using a single camera are constrained by hardware (video resolution, transmission bandwidth, etc.) and especially by the limited field of view, so traditional single-camera video acquisition cannot meet the requirements on data quantity and quality; in contrast, multiple cameras working cooperatively can compensate for the limitations of a single camera, and this has become one of the hot spots of current research.
Disclosure of Invention
The invention aims at a moving-target detection method for a triphibian spherical modular self-assembling unmanned system. The unmanned system can switch among three states: flight mode, rolling mode, and sailing mode. It consists of a plurality of quadrotor unmanned aerial vehicles, each of which can independently or jointly execute tasks such as search, exploration, and communication. Each sub-module unmanned aerial vehicle of the spherical unmanned system carries a monocular camera, and when assembled into the spherical rolling state, the six sub-module unmanned aerial vehicles can be regarded as a unified rigid body carrying six cameras. Through cooperative formation, real-time communication, and distributed control among the module units, omnidirectional rolling of the triphibian spherical modular self-assembling unmanned system on the ground can be realized, meeting task requirements such as front-line reconnaissance, target detection, and tracking. To solve the problem that a single camera has a limited viewing angle and cannot provide accurate cognition of a target when the spherical multi-camera unmanned system detects a moving target, the invention provides a target detection method for a distributed spherical multi-camera unmanned system.
The technical scheme adopted by the invention designs, for the multi-camera system, a distributed camera network topology and a data fusion algorithm among multiple cameras, and uses a compressed neural network model for target detection. Specifically, the technical scheme is as follows:
A target detection method for a distributed multi-camera spherical unmanned system, the unmanned system comprising a plurality of quadrotor unmanned aerial vehicles assembled into a spherical unmanned system; each unmanned aerial vehicle carries a monocular camera, and when assembled into the spherical state the system is a rigid body carrying a plurality of cameras; the method comprises the following steps:
first, designing a distributed camera network topology for the multi-camera unmanned system, so that each camera independently performs feature extraction on the scene images it captures;
second, performing feature-level image fusion on the image data provided by the multiple cameras using a multi-camera data fusion algorithm, fusing them into a complete image;
third, establishing a deep-learning-based target detection algorithm, compressing the neural network model, and performing target detection on the fused image obtained in the second step with the compressed model to complete the target detection task.
Further, in the first step, the distributed camera network topology specifically comprises: configuring a processor for each camera node as the processing unit of that camera node of the spherical unmanned system, using the topic-subscription mechanism in ROS to realize data communication among any number of camera nodes; each camera node operates independently and performs feature extraction on the scene images it captures.
Further, in the second step, the multi-camera data fusion algorithm specifically includes:
(1) Acquiring target image feature points of multi-view images provided by a plurality of cameras;
(2) Transmitting the feature point descriptors to a central processing unit and performing feature matching on the input multi-view images, so that the matched multi-view images are fused into a complete image.
Further, an improved weighted smoothing algorithm is introduced in multi-view image fusion, namely:
the fused image in the overlap region is denoted f(x,y), obtained as the weighted average of the 2 images to be fused, fL and fR, namely:
f(x,y) = α×fL(x,y) + (1-α)×fR(x,y)
α is an adjustable factor with 0 < α < 1; in the image overlap region, α changes gradually from 1 to 0 along the direction from the view-1 image to the view-2 image, so that smooth fusion of the overlap region is realized; to establish a stronger correlation between the 2 images, the fusion is performed using the following formula:
let α = d2/(d1+d2); then f(x,y) = (d2×fL(x,y) + d1×fR(x,y))/(d1+d2), where d1 and d2 denote the distances from a point in the overlap region to the left and right boundaries, respectively, of the overlap region of the 2 different-view images.
Further, in the third step, compressing the neural network model specifically includes:
(1) Making a target data set for training, and performing basic training by using the data set;
(2) Setting a learning rate, sparsifying the network, and enabling a plurality of scaling factors in the network to approach zero;
(3) Performing network pruning, sorting the scaling factors, and pruning channels corresponding to the scaling factors with smaller values;
(4) Performing knowledge distillation, using a teacher network to guide the training of the pruned student network to obtain a compressed, more compact neural network model.
Further, after knowledge distillation, the steps of sparsity training, network pruning, and knowledge distillation are carried out again, so that the model is compressed multiple times.
Further, the network pruning specifically includes:
(1) Introducing a scaling factor for each channel, multiplying it by the output of that channel, and calculating the newly introduced gradient terms associated with all filters;
(2) Jointly training the network weights and the scaling factors, applying sparsity regularization to the scaling factors;
(3) Fine-tuning the channels with small scaling factors, so that during training all the weights at the input or output of the same channel tend to zero simultaneously; all input and output connections related to such a channel are then cut off, realizing channel-level slimming;
(4) Clipping the channels whose scaling factors are close to 0 and removing all their connections and corresponding weights, obtaining the pruned network model.
Further, the knowledge distillation specifically includes:
(1) Training a large model;
(2) Updating the training target from the traditional ground-truth labels to soft targets, and transferring the training knowledge of the large model to the small model to obtain a compressed, more compact neural network model.
Compared with the prior art, the invention has the following beneficial effects:
(1) Compared with the limited viewing angle of a single camera, arranging multiple cameras at suitable positions on the spherical unmanned system makes it possible to obtain information on the same target from different viewing angles, compensating for the frontal information lost when the relative angle of a single camera changes and providing an effective solution to the occlusion problem.
(2) Considering that the computing power of the onboard processor carried by the unmanned aerial vehicle is limited, and that using multiple cameras introduces a huge data volume, the neural network model is compressed; this improves the real-time performance of target detection, so that the unmanned system can smoothly complete its tasks.
(3) The distributed camera network topology designed for the multiple cameras of the spherical unmanned system gives the system distributed computing and communication characteristics as well as high mobility and stability. Processor resources are configured at each node to guarantee local computing power, so that the multi-camera system remains operational even if a node fails.
Drawings
FIG. 1 is a triphibian spherical modular self-assembling unmanned system;
FIG. 2 is an exploded schematic view of a triphibian spherical modular self-assembled unmanned system module;
FIG. 3 is a schematic view of a triphibian spherical modular self-assembled unmanned system camera;
FIG. 4 is a technical roadmap of a distributed multi-camera spherical unmanned system target detection method;
FIG. 5 is a diagram of a network topology of a multi-camera system distributed camera;
FIG. 6 is the neural network model compression flow.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples of the specification.
In the target detection method for the distributed multi-camera spherical unmanned system, as shown in FIGS. 1-3, the unmanned system consists of a plurality of quadrotor unmanned aerial vehicles assembled into a spherical unmanned system; each unmanned aerial vehicle carries a monocular camera, and when assembled into the spherical state the system is a rigid body carrying a plurality of cameras. As shown in FIG. 4, the whole process of target detection by the distributed spherical multi-camera unmanned system mainly comprises: design of the distributed camera network topology of the spherical multi-camera system and image feature extraction, design of the multi-camera data fusion algorithm, establishment of the deep-learning-based target detection algorithm, and compression of the neural network model. The specific technical scheme is as follows:
The first step: design the distributed camera network topology of the spherical unmanned multi-camera system. As shown in FIG. 5, the distributed camera network topology gives the spherical multi-camera unmanned system distributed computing and communication characteristics as well as high mobility and stability. In a centralized structure, an error in the main server can crash the whole system; to avoid this hidden danger, a distributed structure is adopted in which each node is configured with processor resources to guarantee local computing power, so that the system continues working even when any single node fails. To meet the computing requirements of the distributed system, and weighing configuration difficulty, cost, and size of the processing unit, a Raspberry Pi 4B is configured for each camera node as the processing unit of that camera node of the spherical unmanned system, completing the preparation work before multi-camera data fusion, such as image information collection and target image feature extraction.
Drawing on multi-machine communication schemes suited to a distributed structure, the topic-subscription (publish-subscribe) mechanism in ROS is used to realize data communication among any number of camera nodes; each camera node operates independently and performs moving-target detection processing on the scene images it captures. The advantage of the distributed camera network topology is that each camera node is configured with computing resources to guarantee local processing capability, no camera node's operation depends on another's, and each node can exchange data with any other node. A communication sketch is given below.
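By way of illustration only (the patent prescribes no source code), a minimal Python/rospy sketch of such a camera node follows; the node and topic names, the neighbour subscription, and the processing stub are assumptions made for this example:

    #!/usr/bin/env python
    # Minimal sketch of a distributed camera node: publishes its own image
    # stream and subscribes to another node's stream over ROS topics.
    # Topic names and the processing stub are illustrative assumptions.
    import rospy
    import cv2
    from sensor_msgs.msg import Image
    from cv_bridge import CvBridge

    bridge = CvBridge()

    def on_neighbour_image(msg):
        frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        # ... local feature extraction on the received frame would run here ...

    def camera_node(cam_id, n_cams=6):
        rospy.init_node("camera_node_%d" % cam_id)
        pub = rospy.Publisher("/camera_%d/image_raw" % cam_id, Image, queue_size=1)
        # Any node may subscribe to any other; here, the next camera on the sphere.
        rospy.Subscriber("/camera_%d/image_raw" % ((cam_id + 1) % n_cams),
                         Image, on_neighbour_image)
        cap = cv2.VideoCapture(0)
        rate = rospy.Rate(30)
        while not rospy.is_shutdown():
            ok, frame = cap.read()
            if ok:
                pub.publish(bridge.cv2_to_imgmsg(frame, encoding="bgr8"))
            rate.sleep()

    if __name__ == "__main__":
        camera_node(0)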
The feature point extraction comprises the following steps: first, coarse extraction: a point P is selected from the image and a circle of radius 3 pixels is drawn with P as the center; if the gray values of n consecutive pixels on the circle are all greater or all less than the gray value of P, P is considered a feature point. Second, a decision tree is trained by machine learning, the 16 pixels on the circumference around each candidate point are fed into the decision tree, and the optimal FAST feature points are screened out. Third, locally over-dense feature points are removed by a non-maximum suppression algorithm. Fourth, an image pyramid is built to achieve multi-scale invariance of the feature points. Fifth, the orientation of each FAST feature point is determined by the moment method, achieving rotation invariance of the feature points.
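The five steps above match the oriented-FAST keypoint pipeline as implemented, for example, by OpenCV's ORB detector; a minimal sketch under that assumption (the patent does not name a library):

    import cv2

    def extract_features(gray_image, n_features=500):
        # ORB performs the steps described above: FAST corner tests,
        # non-maximum suppression, an image pyramid for scale invariance,
        # and intensity-centroid orientation for rotation invariance.
        orb = cv2.ORB_create(nfeatures=n_features,  # keep strongest keypoints
                             nlevels=8,             # pyramid levels
                             fastThreshold=20)      # FAST gray-difference threshold
        keypoints, descriptors = orb.detectAndCompute(gray_image, None)
        return keypoints, descriptors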
The second step: design the multi-camera data fusion algorithm. When multiple cameras of the spherical multi-camera unmanned system cooperatively detect targets, information on the same target from different viewing angles can be obtained. To better complete the target detection task, the image features of the same target or scene acquired by multiple cameras are fused through a specific feature matching algorithm (such as brute-force matching), which compensates for the frontal information of the target that is lost when the relative angle of a single camera changes. For example, a multispectral image has rich spectral information but low resolution, while a panchromatic image has higher resolution but poor color discrimination; image fusion can make full use of their complementary information, and the fused image conveys scene information better, which is convenient both for human observation and for further machine processing. Designing a multi-camera data fusion algorithm for the spherical unmanned multi-camera system can therefore improve the accuracy and credibility of target detection and the fault tolerance of the system.
First, the target image feature points of the multi-view images provided by the multiple cameras are acquired; then the feature point descriptors are transmitted to a central processing unit and feature matching is performed on the input multi-view images, so that the matched images are fused into a complete image. Considering that in practice color differences arise after images acquired by cameras at different angles are fused, the method introduces an improved weighted smoothing algorithm into the multi-view image fusion process to resolve the color differences produced when images from different viewpoints are fused.
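A minimal sketch of the descriptor matching step, assuming binary ORB descriptors as above, brute-force matching, and a ratio test; all parameter values are illustrative:

    import cv2
    import numpy as np

    def match_descriptors(desc_a, desc_b, ratio=0.75):
        # Brute-force (Hamming) matching with Lowe's ratio test.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        pairs = matcher.knnMatch(desc_a, desc_b, k=2)
        return [m for m, n in (p for p in pairs if len(p) == 2)
                if m.distance < ratio * n.distance]

    def estimate_alignment(kp_a, kp_b, matches):
        # Homography aligning the two views, estimated robustly with RANSAC.
        src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return H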
The weighted smoothing algorithm is specifically as follows: the fused image in the overlap region is denoted f(x,y), obtained as the weighted average of the 2 images to be fused, fL and fR, namely:
f(x,y) = α×fL(x,y) + (1-α)×fR(x,y)
where α is an adjustable factor, typically 0 < α < 1, i.e., in the image overlap region, α changes gradually from 1 to 0 along the direction from the view-1 image to the view-2 image, realizing smooth fusion of the overlap region. To establish a stronger correlation between the 2 images, the fusion is performed using the following formula:
let α = d2/(d1+d2); then f(x,y) = (d2×fL(x,y) + d1×fR(x,y))/(d1+d2), where d1 and d2 denote the distances from a point in the overlap region to the left and right boundaries, respectively, of the overlap region of the 2 different-view images.
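A minimal NumPy sketch of this distance-weighted blend over the overlap region; note that the definition α = d2/(d1+d2) is reconstructed here from the stated boundary behaviour (α falls from 1 to 0 across the overlap) and should be read as an assumption:

    import numpy as np

    def blend_overlap(left, right):
        # left, right: aligned float images of the same overlap region,
        # from the view-1 and view-2 cameras respectively.
        h, w = left.shape[:2]
        d1 = np.arange(w, dtype=np.float64)   # distance to left boundary
        d2 = (w - 1) - d1                     # distance to right boundary
        alpha = d2 / (d1 + d2 + 1e-12)        # 1 at left edge -> 0 at right edge
        alpha = alpha[None, :, None] if left.ndim == 3 else alpha[None, :]
        return alpha * left + (1.0 - alpha) * right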
In particular, the fusion of multiple images is built on pairwise fusion. All six images are not needed when detecting targets in an actual scene: the cameras are first ranked by the probability with which each detects the target; the camera with the highest detection probability is then taken as the center (it is most likely facing the target), and 3-4 images around it are fused for detection, increasing the reliability and accuracy of target detection; a selection sketch follows.
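A minimal sketch of this ranked view selection (spherical adjacency handling is omitted; the per-camera detection probabilities are assumed to come from the detection step described next):

    def select_views(det_probs, k=4):
        # Rank cameras by detection probability; keep the best camera
        # (most likely facing the target) plus the next k-1 views for fusion.
        order = sorted(range(len(det_probs)), key=det_probs.__getitem__, reverse=True)
        return order[0], order[:k]   # centre camera, views to fuse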
The third step: establish the deep-learning-based target detection algorithm and compress the neural network model. Because the capability of the spherical unmanned multi-camera system platform is limited and the computing power of the onboard processor is constrained, a suitable target detection algorithm must be selected to satisfy both real-time detection and detection accuracy. Compared with traditional sliding-window target detection, the deep-learning-based target detection algorithm YOLOv3 adopts a more direct approach: it predicts the position and category of the target directly, without selecting candidate regions, which greatly improves both detection accuracy and detection speed.
The YOLO model covers the input image with an S×S grid; if the center of an object falls within a grid cell, that cell is responsible for detecting the object and predicting the class information and confidence of its bounding boxes. Bounding boxes are screened by a confidence threshold, low-confidence boxes are discarded, and non-maximum suppression is applied to the remaining high-confidence boxes to remove highly redundant ones. The improved YOLOv3 algorithm uses the skip-connection idea of residual networks in its new backbone Darknet-53, detects targets on feature maps at 3 different scales, and replaces the softmax classifier with logistic regression classifiers for class prediction, so that multiple classes can be predicted simultaneously and multi-label objects detected. This improves the prediction accuracy of YOLOv3 while keeping its speed advantage, and in particular strengthens the recognition of small objects.
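A minimal NumPy sketch of the confidence screening and non-maximum suppression described above; the thresholds are illustrative:

    import numpy as np

    def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
        # boxes: (N, 4) [x1, y1, x2, y2]; scores: (N,) confidences.
        # Returns indices (into the confidence-filtered arrays) of kept boxes.
        keep_conf = scores >= conf_thresh          # discard low-confidence boxes
        boxes, scores = boxes[keep_conf], scores[keep_conf]
        order = np.argsort(scores)[::-1]
        kept = []
        while order.size > 0:
            i = order[0]
            kept.append(i)
            xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_o = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                      (boxes[order[1:], 3] - boxes[order[1:], 1]))
            iou = inter / (area_i + area_o - inter + 1e-12)
            order = order[1:][iou < iou_thresh]    # drop highly redundant boxes
        return kept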
However, when a deep-learning algorithm is ported to an onboard platform with limited computing resources and complex operating conditions, the consumption of computing resources must be considered. It is therefore necessary to optimize the target detection algorithm and lighten the neural network model: compressing the model reduces the number of parameters and improves detection speed with only a small drop in accuracy, so as to better fit the hardware constraints of the onboard platform and complete the target detection task.
The neural network model compression flow adopted by the invention is shown in FIG. 6. A deep learning network model contains a large number of redundant parameters: the activation values of most neurons in the convolutional or fully connected layers tend to 0, and after these neurons are removed the model shows the same expressive capacity; this is called over-parameterization. Therefore, while training the neural network on the data set, network pruning and knowledge distillation are adopted to remove redundant network nodes and weight connections, and even redundant convolution kernels, making the network structure more compact.
Network pruning reduces the number of parameters by eliminating redundant, unimportant connections. Network pruning methods fall mainly into two categories: sparsity constraints during training, and pruning after training. Sparsity-constrained pruning adds a sparsity constraint to the optimization function without pre-training the model, making the network structure tend to be sparse; it is mainly realized by introducing L1 and L2 regularization constraints into the network loss function. Pruning after training removes relatively unimportant parts of the network to make it sparse and compact, and is currently the simplest and most effective method; it starts from the existing trained model, gradually eliminating redundant information in the network and avoiding the loss caused by retraining from scratch.
According to pruning granularity, pruning methods mainly comprise kernel pruning, channel pruning, inter-layer pruning, and k×k kernel pruning. To achieve channel-level slimming, all input and output connections associated with a channel must be cut off. This makes directly pruning weights on a pre-trained model ineffective, because pruning requires weights that go to zero, while all the weights at the input or output of a channel will not all approach zero on their own. The invention therefore solves this problem by enforcing sparsity regularization in the training objective function, specifically using the Group Lasso method so that the same channel of all filters goes to 0 at the same time during training. When computing the newly introduced gradient terms associated with all filters, the invention introduces a scaling factor for each channel and multiplies it by the output of that channel; the network weights and the scaling factors are then trained jointly, with sparsity regularization applied to the latter; finally, the channels with small scaling factors are fine-tuned away. The objective function is defined as follows:
L = Σ(x,y) l(f(x,W), y) + λ Σγ∈Γ g(γ)
wherein L is the objective function; (x,y) denotes a training input and its target, W denotes the trainable weights, f(x,W) is the network's prediction, and l(·,·) is the training loss, so the first summation term corresponds to the normal training loss of the convolutional neural network; Γ denotes the set of channel scaling factors, g(γ) is a sparsity penalty on the scaling factors, and λ is the balance factor between the two terms. Here g(γ) = |γ|, i.e., L1 regularization, is chosen, and sub-gradient descent is adopted as the optimization method for the non-smooth L1 penalty term.
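A minimal PyTorch sketch of this joint training, applying the L1 sub-gradient to the channel scaling factors; using the batch-normalization γ weights as the scaling factors is an assumption consistent with the description above:

    import torch
    import torch.nn as nn

    def train_step(model, images, targets, loss_fn, optimizer, lam=1e-4):
        # First summation term: normal training loss of the network.
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()
        # Sparsity penalty lambda * |gamma|: add its sub-gradient
        # lambda * sign(gamma) to the gradients of the scaling factors.
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.grad.add_(lam * torch.sign(m.weight.data))
        optimizer.step()
        return loss.item()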
After sparsity regularization of each channel, many scaling factors in the model become close to 0; the channels whose scaling factors are near 0 can then be clipped and all their connections and corresponding weights removed. For all layers, the invention sets a global threshold according to the values of all scaling factors and prunes channels against it, so the network needs far fewer parameters and computing operations and its run-time memory footprint shrinks.
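A minimal sketch of deriving that global threshold and the per-layer channel masks; the pruning ratio is an illustrative assumption:

    import torch
    import torch.nn as nn

    def channel_masks_by_global_threshold(model, prune_ratio=0.5):
        # Pool all scaling factors, take one global threshold, and mark
        # which channels of each layer survive pruning.
        gammas = torch.cat([m.weight.data.abs().flatten()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        threshold = torch.quantile(gammas, prune_ratio)
        return {name: m.weight.data.abs() > threshold
                for name, m in model.named_modules()
                if isinstance(m, nn.BatchNorm2d)}   # masks to rebuild a slim net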
Knowledge distillation means that during the training of a small model, the knowledge learned by a large model is used as guidance, so that the small model attains detection performance similar to the large model's with far fewer parameters. There are two approaches to knowledge distillation: one trains the large model and the small model at the same time; the other trains a large model first and then distills a small model from it. In fact, a more general-purpose complex model can be trained once, and through knowledge transfer from this complex model, high-performance models can be obtained on small-scale specialized tasks at greatly reduced training cost. The knowledge distillation method adopted by the invention therefore transfers the training knowledge of a large model to a specialized small model: when training the small model, the training target is updated from the traditional ground-truth labels to so-called soft targets, which provide greater information entropy during training and better transfer the trained model's knowledge to the new model. Moreover, because the gradient variance between different training samples is smaller under this target, the training data needed by the small model can be greatly reduced, a higher learning rate can be used, and model iteration is significantly accelerated.
The soft targets are in fact the output probabilities of the softmax layer of the trained complex model, and the "distillation" method introduces a "temperature" parameter T into the softmax:
qi = exp(zi/T) / Σj exp(zj/T)
where zi is the logit of each class, qi is the softmax output given by the student model (small model), and the temperature parameter T is normally set to 1. The simplest form of distillation is to train the small model against the soft targets obtained from the complex model with a higher T, and to set T back to 1 after training.
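A minimal PyTorch sketch of training against such soft targets; the soft/hard weighting and the temperature value are illustrative assumptions:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
        # Soft term: match the teacher's high-T softmax output; the T*T factor
        # keeps its gradient scale comparable to the hard label term.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)  # ground-truth labels
        return alpha * soft + (1.0 - alpha) * hard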
The whole neural network model compression flow is as follows: first, a target data set for training is produced, i.e., a labeled picture set of the targets to be detected, and basic training is performed with this data set; a learning rate is set and sparsity training is applied to the network, making many scaling factors in the network approach zero; then network pruning is performed, the scaling factors are sorted, and the channels corresponding to the smaller scaling factors are pruned; finally, knowledge distillation is performed, using a teacher network to guide the training of the pruned student network, yielding a compressed, more compact neural network model. It should be noted that after knowledge distillation, the steps of sparsity training, network pruning, and knowledge distillation may be performed again, so that the model is compressed multiple times.
The above embodiments merely illustrate the design concept and features of the present invention and are intended to enable those skilled in the art to understand and implement the invention; the scope of the present invention is not limited to these embodiments. All equivalent changes or modifications made according to the principles and design ideas of the present invention therefore fall within the scope of protection of the present invention.

Claims (3)

1. A target detection method for a distributed multi-camera spherical unmanned system, characterized in that the unmanned system consists of a plurality of quadrotor unmanned aerial vehicles assembled into a spherical unmanned system; each unmanned aerial vehicle carries a monocular camera, and when assembled into the spherical state the system is a rigid body carrying a plurality of cameras; the method comprises the following steps:
first, designing a distributed camera network topology for the multi-camera unmanned system, so that each camera independently performs feature extraction on the scene images it captures;
second, performing feature-level image fusion on the image data provided by the multiple cameras using a multi-camera data fusion algorithm, fusing them into a complete image;
third, establishing a deep-learning-based target detection algorithm, compressing a neural network model, and performing target detection on the fused image obtained in the second step with the compressed model to complete the target detection task;
in the first step, the distributed camera network topology specifically comprises: configuring a processor for each camera node as the processing unit of that camera node of the spherical unmanned system, using the topic-subscription mechanism in ROS to realize data communication among any number of camera nodes, each camera node operating independently and performing feature extraction on the scene images it captures;
in the second step, the multi-camera data fusion algorithm specifically comprises:
(1) Acquiring target image feature points of multi-view images provided by a plurality of cameras;
(2) Transmitting the feature point descriptors to a central processing unit, and performing feature matching on the input multi-view images to enable the matched multi-view images to be fused into a complete image;
an improved weighted smoothing algorithm is introduced in multi-view image fusion, namely:
the fused image in the overlap region is denoted f(x,y), obtained as the weighted average of the 2 images to be fused, fL and fR, namely:
f(x,y) = α×fL(x,y) + (1-α)×fR(x,y)
wherein α is an adjustable factor, 0 < α < 1, i.e., in the image overlap region, α changes gradually from 1 to 0 along the direction from the view-1 image to the view-2 image, realizing smooth fusion of the overlap region; to establish a stronger correlation between the 2 images, the fusion is performed using the following formula:
let α = d2/(d1+d2); then f(x,y) = (d2×fL(x,y) + d1×fR(x,y))/(d1+d2), wherein d1 and d2 denote the distances from a point in the overlap region to the left and right boundaries, respectively, of the overlap region of the 2 different-view images;
in the third step, compressing the neural network model specifically comprises:
(1) Making a target data set for training, and performing basic training by using the data set;
(2) Setting a learning rate, sparsifying the network, and enabling a plurality of scaling factors in the network to approach zero;
(3) Performing network pruning, sorting the scaling factors, and pruning channels corresponding to the scaling factors with smaller values;
(4) Performing knowledge distillation, using a teacher network to guide the training of the pruned student network to obtain a compressed, more compact neural network model;
The knowledge distillation is specifically as follows:
(1) Training a large model;
(2) Updating the training target from the traditional ground-truth labels to soft targets, and transferring the training knowledge of the large model to the small model to obtain a compressed, more compact neural network model.
2. The target detection method for a distributed multi-camera spherical unmanned system according to claim 1, wherein after the knowledge distillation, the steps of sparsity training, network pruning, and knowledge distillation are performed again, so that the model is compressed multiple times.
3. The target detection method for a distributed multi-camera spherical unmanned system according to claim 1, wherein the network pruning specifically comprises:
(1) Introducing a scaling factor for each channel, multiplying it by the output of that channel, and calculating the newly introduced gradient terms associated with all filters;
(2) Jointly training the network weights and the scaling factors, applying sparsity regularization to the scaling factors;
(3) Fine-tuning the channels with small scaling factors, so that during training all the weights at the input or output of the same channel tend to zero simultaneously; all input and output connections related to such a channel are then cut off, realizing channel-level slimming;
(4) Clipping the channels whose scaling factors are close to 0 and removing all their connections and corresponding weights, obtaining the pruned network model.
CN202210040564.2A 2022-01-14 2022-01-14 Target detection method for a distributed multi-camera spherical unmanned system Active CN114445688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210040564.2A CN114445688B (en) 2022-01-14 2022-01-14 Target detection method for a distributed multi-camera spherical unmanned system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210040564.2A CN114445688B (en) 2022-01-14 2022-01-14 Target detection method for a distributed multi-camera spherical unmanned system

Publications (2)

Publication Number Publication Date
CN114445688A CN114445688A (en) 2022-05-06
CN114445688B true CN114445688B (en) 2024-06-04

Family

ID=81367521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210040564.2A Active CN114445688B (en) 2022-01-14 2022-01-14 Target detection method for a distributed multi-camera spherical unmanned system

Country Status (1)

Country Link
CN (1) CN114445688B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080529A * 2019-12-23 2020-04-28 Dalian University of Technology Unmanned aerial vehicle aerial image splicing method for enhancing robustness
CN111757822A * 2018-02-26 2020-10-09 FedEx Corporate Services, Inc. System and method for enhanced collision avoidance on logistics floor support equipment using multi-sensor detection fusion
CN112215334A * 2020-09-24 2021-01-12 Beihang University Neural network model compression method for event camera
CN113870379A * 2021-09-15 2021-12-31 Beijing Yihang Yuanzhi Technology Co., Ltd. Map generation method and device, electronic equipment and computer readable storage medium
CN113888408A * 2021-09-26 2022-01-04 Zhejiang Sci-Tech University Multi-camera image acquisition method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN110598731B * 2019-07-31 2021-08-20 Zhejiang University Efficient image classification method based on structured pruning


Also Published As

Publication number Publication date
CN114445688A (en) 2022-05-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant