CN114445688B - Target detection method for a distributed multi-camera spherical unmanned system

Target detection method for a distributed multi-camera spherical unmanned system

Info

Publication number
CN114445688B
CN114445688B (application CN202210040564.2A)
Authority
CN
China
Prior art keywords
camera
image
unmanned aerial
target detection
spherical
Prior art date
Legal status
Active
Application number
CN202210040564.2A
Other languages
Chinese (zh)
Other versions
CN114445688A (en)
Inventor
蔡志浩
牛钰
赵江
王英勋
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202210040564.2A
Publication of CN114445688A
Application granted
Publication of CN114445688B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The invention discloses a target detection method for a distributed multi-camera spherical unmanned system. The unmanned system consists of a plurality of quadrotor unmanned aerial vehicles assembled into a spherical unmanned system; each unmanned aerial vehicle carries a monocular camera, and when assembled into the spherical state the system is a rigid body carrying multiple cameras. The method comprises the following steps: designing a distributed camera network topology for the multi-camera unmanned system; performing feature-level image fusion using a multi-camera data fusion algorithm; and establishing a deep-learning-based target detection algorithm, compressing the neural network model, and performing target detection on the fused image with the compressed model to complete the target detection task. The method improves detection speed and detection accuracy under conditions such as occlusion, ensuring that the spherical unmanned system can successfully complete its tasks.

Description

Target detection method for a distributed multi-camera spherical unmanned system
Technical Field
The invention relates to the technical field of multi-camera target detection, in particular to a moving-target detection method for a distributed multi-camera network structure, and more particularly to a target detection method for a spherical multi-camera unmanned system in rolling mode.
Background
In recent years, with the development of science and technology, mobile robots have gradually entered modern social life and begun to play an increasingly important role in fields such as industrial production. With the rapid expansion of the robotics market and progress in artificial intelligence, target detection, recognition, and tracking technologies based on mobile robots are attracting growing attention from researchers. Compared with wheeled and legged robots, spherical robots differ greatly in mechanical structure and motion characteristics, and related research remains limited, so research on moving-target detection methods for spherical unmanned systems has both scientific novelty and practical significance.
At present, detection, recognition, and tracking of various targets (vehicles, pedestrians, faces, gestures, etc.) in images or videos has become a mainstream direction of intelligent computer vision. Video image analysis and processing technology has practical application value and broad development prospects, including but not limited to smart cities, public security management, home security systems, live broadcasting and analysis of sports events, and medical monitoring. Systems using a single camera are constrained by hardware (video resolution, transmission bandwidth, etc.) and especially by the limited field of view, so traditional single-camera video acquisition cannot meet the requirements on data quantity and quality; in contrast, multiple cameras working cooperatively can compensate for the limitations of a single camera, and this has become one of the hot spots of current research.
Disclosure of Invention
The invention aims at a moving-target detection method for a triphibian spherical modular self-assembling unmanned system. The unmanned system can switch among three states: flight mode, rolling mode, and sailing mode. It consists of a plurality of quadrotor unmanned aerial vehicles, each of which can independently or jointly execute tasks such as search, exploration, and communication. Each sub-module unmanned aerial vehicle of the spherical unmanned system carries a monocular camera, and when assembled into the spherical rolling state, the six sub-module unmanned aerial vehicles can be regarded as a unified rigid body carrying six cameras. Through cooperative formation, real-time communication, and distributed control among the module units, omnidirectional rolling of the triphibian spherical modular self-assembling unmanned system on the ground can be realized, meeting task requirements such as front-line reconnaissance, target detection, and tracking. To solve the problem that a single camera has a limited viewing angle and cannot provide accurate cognition of a target when the spherical multi-camera unmanned system detects a moving target, the invention provides a target detection method for a distributed spherical multi-camera unmanned system.
The technical scheme adopted by the invention designs, for the multi-camera system, a distributed camera network topology and a data fusion algorithm among multiple cameras, and uses a compressed neural network model for target detection. Specifically, the technical scheme is as follows:
A target detection method for a distributed multi-camera spherical unmanned system, the unmanned system comprising a plurality of quadrotor unmanned aerial vehicles assembled into a spherical unmanned system; each unmanned aerial vehicle carries a monocular camera, and when assembled into the spherical state the system is a rigid body carrying a plurality of cameras; the method comprises the following steps:
first, designing a distributed camera network topology for the multi-camera unmanned system, so that each camera independently performs feature extraction on the scene images it captures;
second, performing feature-level image fusion on the image data provided by the multiple cameras using a multi-camera data fusion algorithm, fusing them into a complete image;
third, establishing a deep-learning-based target detection algorithm, compressing the neural network model, and performing target detection on the fused image obtained in the second step with the compressed model to complete the target detection task.
Further, in the first step, the distributed camera network topology specifically comprises: configuring a processor for each camera node as the processing unit of that camera node of the spherical unmanned system, using the topic-subscription mechanism in ROS to realize data communication among any number of camera nodes; each camera node operates independently and performs feature extraction on the scene images it captures.
Further, in the second step, the multi-camera data fusion algorithm specifically includes:
(1) Acquiring target image feature points of multi-view images provided by a plurality of cameras;
(2) Transmitting the feature point descriptors to a central processing unit and performing feature matching on the input multi-view images, so that the matched multi-view images are fused into a complete image.
Further, an improved weighted smoothing algorithm is introduced in multi-view image fusion, namely:
the fused image in the overlap region is denoted f(x,y), obtained as the weighted average of the 2 images to be fused, fL and fR, namely:
f(x,y) = α×fL(x,y) + (1-α)×fR(x,y)
α is an adjustable factor with 0 < α < 1; in the image overlap region, α changes gradually from 1 to 0 along the direction from the view-1 image to the view-2 image, so that smooth fusion of the overlap region is realized; to establish a stronger correlation between the 2 images, the fusion is performed using the following formula:
let α = d2/(d1+d2); then f(x,y) = (d2×fL(x,y) + d1×fR(x,y))/(d1+d2), where d1 and d2 denote the distances from a point in the overlap region to the left and right boundaries, respectively, of the overlap region of the 2 different-view images.
Further, in the third step, compressing the neural network model specifically includes:
(1) Making a target data set for training, and performing basic training by using the data set;
(2) Setting a learning rate, sparsifying the network, and enabling a plurality of scaling factors in the network to approach zero;
(3) Performing network pruning, sorting the scaling factors, and pruning channels corresponding to the scaling factors with smaller values;
(4) Performing knowledge distillation, using a teacher network to guide the training of the pruned student network to obtain a compressed, more compact neural network model.
Further, after knowledge distillation, the steps of sparsity training, network pruning, and knowledge distillation are carried out again, so that the model is compressed multiple times.
Further, the network pruning specifically includes:
(1) Introducing a scaling factor for each channel, multiplying it by the output of that channel, and calculating the newly introduced gradient terms associated with all filters;
(2) Jointly training the network weights and the scaling factors, applying sparsity regularization to the scaling factors;
(3) Fine-tuning the channels with small scaling factors, so that during training all the weights at the input or output of the same channel tend to zero simultaneously; all input and output connections related to such a channel are then cut off, realizing channel-level slimming;
(4) Clipping the channels whose scaling factors are close to 0 and removing all their connections and corresponding weights, obtaining the pruned network model.
Further, the knowledge distillation specifically includes:
(1) Training a large model;
(2) Updating the training target from the traditional ground-truth labels to soft targets, and transferring the training knowledge of the large model to the small model to obtain a compressed, more compact neural network model.
Compared with the prior art, the invention has the following beneficial effects:
(1) Compared with the limited viewing angle of a single camera, arranging multiple cameras at suitable positions on the spherical unmanned system makes it possible to obtain information on the same target from different viewing angles, compensating for the frontal information lost when the relative angle of a single camera changes and providing an effective solution to the occlusion problem.
(2) Considering that the computing power of the onboard processor carried by the unmanned aerial vehicle is limited, and that using multiple cameras introduces a huge data volume, the neural network model is compressed; this improves the real-time performance of target detection, so that the unmanned system can smoothly complete its tasks.
(3) The distributed camera network topology designed for the multiple cameras of the spherical unmanned system gives the system distributed computing and communication characteristics as well as high mobility and stability. Processor resources are configured at each node to guarantee local computing power, so that the multi-camera system remains operational even if a node fails.
Drawings
FIG. 1 is a triphibian spherical modular self-assembling unmanned system;
FIG. 2 is an exploded schematic view of a triphibian spherical modular self-assembled unmanned system module;
FIG. 3 is a schematic view of a triphibian spherical modular self-assembled unmanned system camera;
FIG. 4 is a technical roadmap of a distributed multi-camera spherical unmanned system target detection method;
FIG. 5 is a diagram of a network topology of a multi-camera system distributed camera;
FIG. 6 is the neural network model compression flow.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples of the specification.
In the target detection method for the distributed multi-camera spherical unmanned system, as shown in FIGS. 1-3, the unmanned system consists of a plurality of quadrotor unmanned aerial vehicles assembled into a spherical unmanned system; each unmanned aerial vehicle carries a monocular camera, and when assembled into the spherical state the system is a rigid body carrying a plurality of cameras. As shown in FIG. 4, the whole process of target detection by the distributed spherical multi-camera unmanned system mainly comprises: design of the distributed camera network topology of the spherical multi-camera system and image feature extraction, design of the multi-camera data fusion algorithm, establishment of the deep-learning-based target detection algorithm, and compression of the neural network model. The specific technical scheme is as follows:
The first step: design the distributed camera network topology of the spherical unmanned multi-camera system. As shown in FIG. 5, the distributed camera network topology gives the spherical multi-camera unmanned system distributed computing and communication characteristics as well as high mobility and stability. In a centralized structure, an error in the main server can crash the whole system; to avoid this hidden danger, a distributed structure is adopted in which each node is configured with processor resources to guarantee local computing power, so that the system continues working even when any single node fails. To meet the computing requirements of the distributed system, and weighing configuration difficulty, cost, and size of the processing unit, a Raspberry Pi 4B is configured for each camera node as the processing unit of that camera node of the spherical unmanned system, completing the preparation work before multi-camera data fusion, such as image information collection and target image feature extraction.
Drawing on multi-machine communication schemes suited to a distributed structure, the topic-subscription (publish-subscribe) mechanism in ROS is used to realize data communication among any number of camera nodes; each camera node operates independently and performs moving-target detection processing on the scene images it captures. The advantage of the distributed camera network topology is that each camera node is configured with computing resources to guarantee local processing capability, no camera node's operation depends on another's, and each node can exchange data with any other node. A communication sketch is given below.
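By way of illustration only (the patent prescribes no source code), a minimal Python/rospy sketch of such a camera node follows; the node and topic names, the neighbour subscription, and the processing stub are assumptions made for this example:

    #!/usr/bin/env python
    # Minimal sketch of a distributed camera node: publishes its own image
    # stream and subscribes to another node's stream over ROS topics.
    # Topic names and the processing stub are illustrative assumptions.
    import rospy
    import cv2
    from sensor_msgs.msg import Image
    from cv_bridge import CvBridge

    bridge = CvBridge()

    def on_neighbour_image(msg):
        frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        # ... local feature extraction on the received frame would run here ...

    def camera_node(cam_id, n_cams=6):
        rospy.init_node("camera_node_%d" % cam_id)
        pub = rospy.Publisher("/camera_%d/image_raw" % cam_id, Image, queue_size=1)
        # Any node may subscribe to any other; here, the next camera on the sphere.
        rospy.Subscriber("/camera_%d/image_raw" % ((cam_id + 1) % n_cams),
                         Image, on_neighbour_image)
        cap = cv2.VideoCapture(0)
        rate = rospy.Rate(30)
        while not rospy.is_shutdown():
            ok, frame = cap.read()
            if ok:
                pub.publish(bridge.cv2_to_imgmsg(frame, encoding="bgr8"))
            rate.sleep()

    if __name__ == "__main__":
        camera_node(0)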
The feature point extraction comprises the following steps: first, coarse extraction: a point P is selected from the image and a circle of radius 3 pixels is drawn with P as the center; if the gray values of n consecutive pixels on the circle are all greater or all less than the gray value of P, P is considered a feature point. Second, a decision tree is trained by machine learning, the 16 pixels on the circumference around each candidate point are fed into the decision tree, and the optimal FAST feature points are screened out. Third, locally over-dense feature points are removed by a non-maximum suppression algorithm. Fourth, an image pyramid is built to achieve multi-scale invariance of the feature points. Fifth, the orientation of each FAST feature point is determined by the moment method, achieving rotation invariance of the feature points.
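The five steps above match the oriented-FAST keypoint pipeline as implemented, for example, by OpenCV's ORB detector; a minimal sketch under that assumption (the patent does not name a library):

    import cv2

    def extract_features(gray_image, n_features=500):
        # ORB performs the steps described above: FAST corner tests,
        # non-maximum suppression, an image pyramid for scale invariance,
        # and intensity-centroid orientation for rotation invariance.
        orb = cv2.ORB_create(nfeatures=n_features,  # keep strongest keypoints
                             nlevels=8,             # pyramid levels
                             fastThreshold=20)      # FAST gray-difference threshold
        keypoints, descriptors = orb.detectAndCompute(gray_image, None)
        return keypoints, descriptors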
The second step: design the multi-camera data fusion algorithm. When multiple cameras of the spherical multi-camera unmanned system cooperatively detect targets, information on the same target from different viewing angles can be obtained. To better complete the target detection task, the image features of the same target or scene acquired by multiple cameras are fused through a specific feature matching algorithm (such as brute-force matching), which compensates for the frontal information of the target that is lost when the relative angle of a single camera changes. For example, a multispectral image has rich spectral information but low resolution, while a panchromatic image has higher resolution but poor color discrimination; image fusion can make full use of their complementary information, and the fused image conveys scene information better, which is convenient both for human observation and for further machine processing. Designing a multi-camera data fusion algorithm for the spherical unmanned multi-camera system can therefore improve the accuracy and credibility of target detection and the fault tolerance of the system.
First, the target image feature points of the multi-view images provided by the multiple cameras are acquired; then the feature point descriptors are transmitted to a central processing unit and feature matching is performed on the input multi-view images, so that the matched images are fused into a complete image. Considering that in practice color differences arise after images acquired by cameras at different angles are fused, the method introduces an improved weighted smoothing algorithm into the multi-view image fusion process to resolve the color differences produced when images from different viewpoints are fused.
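A minimal sketch of the descriptor matching step, assuming binary ORB descriptors as above, brute-force matching, and a ratio test; all parameter values are illustrative:

    import cv2
    import numpy as np

    def match_descriptors(desc_a, desc_b, ratio=0.75):
        # Brute-force (Hamming) matching with Lowe's ratio test.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
        pairs = matcher.knnMatch(desc_a, desc_b, k=2)
        return [m for m, n in (p for p in pairs if len(p) == 2)
                if m.distance < ratio * n.distance]

    def estimate_alignment(kp_a, kp_b, matches):
        # Homography aligning the two views, estimated robustly with RANSAC.
        src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return H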
The weighted smoothing algorithm is specifically as follows: the fused image in the overlap region is denoted f(x,y), obtained as the weighted average of the 2 images to be fused, fL and fR, namely:
f(x,y) = α×fL(x,y) + (1-α)×fR(x,y)
where α is an adjustable factor, typically 0 < α < 1, i.e., in the image overlap region, α changes gradually from 1 to 0 along the direction from the view-1 image to the view-2 image, realizing smooth fusion of the overlap region. To establish a stronger correlation between the 2 images, the fusion is performed using the following formula:
let α = d2/(d1+d2); then f(x,y) = (d2×fL(x,y) + d1×fR(x,y))/(d1+d2), where d1 and d2 denote the distances from a point in the overlap region to the left and right boundaries, respectively, of the overlap region of the 2 different-view images.
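A minimal NumPy sketch of this distance-weighted blend over the overlap region; note that the definition α = d2/(d1+d2) is reconstructed here from the stated boundary behaviour (α falls from 1 to 0 across the overlap) and should be read as an assumption:

    import numpy as np

    def blend_overlap(left, right):
        # left, right: aligned float images of the same overlap region,
        # from the view-1 and view-2 cameras respectively.
        h, w = left.shape[:2]
        d1 = np.arange(w, dtype=np.float64)   # distance to left boundary
        d2 = (w - 1) - d1                     # distance to right boundary
        alpha = d2 / (d1 + d2 + 1e-12)        # 1 at left edge -> 0 at right edge
        alpha = alpha[None, :, None] if left.ndim == 3 else alpha[None, :]
        return alpha * left + (1.0 - alpha) * right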
In particular, the fusion of multiple images is built on pairwise fusion. All six images are not needed when detecting targets in an actual scene: the cameras are first ranked by the probability with which each detects the target; the camera with the highest detection probability is then taken as the center (it is most likely facing the target), and 3-4 images around it are fused for detection, increasing the reliability and accuracy of target detection; a selection sketch follows.
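A minimal sketch of this ranked view selection (spherical adjacency handling is omitted; the per-camera detection probabilities are assumed to come from the detection step described next):

    def select_views(det_probs, k=4):
        # Rank cameras by detection probability; keep the best camera
        # (most likely facing the target) plus the next k-1 views for fusion.
        order = sorted(range(len(det_probs)), key=det_probs.__getitem__, reverse=True)
        return order[0], order[:k]   # centre camera, views to fuse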
The third step: establish the deep-learning-based target detection algorithm and compress the neural network model. Because the capability of the spherical unmanned multi-camera system platform is limited and the computing power of the onboard processor is constrained, a suitable target detection algorithm must be selected to satisfy both real-time detection and detection accuracy. Compared with traditional sliding-window target detection, the deep-learning-based target detection algorithm YOLOv3 adopts a more direct approach: it predicts the position and category of the target directly, without selecting candidate regions, which greatly improves both detection accuracy and detection speed.
The YOLO model covers the input image with an S×S grid; if the center of an object falls within a grid cell, that cell is responsible for detecting the object and predicting the class information and confidence of its bounding boxes. Bounding boxes are screened by a confidence threshold, low-confidence boxes are discarded, and non-maximum suppression is applied to the remaining high-confidence boxes to remove highly redundant ones. The improved YOLOv3 algorithm uses the skip-connection idea of residual networks in its new backbone Darknet-53, detects targets on feature maps at 3 different scales, and replaces the softmax classifier with logistic regression classifiers for class prediction, so that multiple classes can be predicted simultaneously and multi-label objects detected. This improves the prediction accuracy of YOLOv3 while keeping its speed advantage, and in particular strengthens the recognition of small objects.
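A minimal NumPy sketch of the confidence screening and non-maximum suppression described above; the thresholds are illustrative:

    import numpy as np

    def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
        # boxes: (N, 4) [x1, y1, x2, y2]; scores: (N,) confidences.
        # Returns indices (into the confidence-filtered arrays) of kept boxes.
        keep_conf = scores >= conf_thresh          # discard low-confidence boxes
        boxes, scores = boxes[keep_conf], scores[keep_conf]
        order = np.argsort(scores)[::-1]
        kept = []
        while order.size > 0:
            i = order[0]
            kept.append(i)
            xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_o = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                      (boxes[order[1:], 3] - boxes[order[1:], 1]))
            iou = inter / (area_i + area_o - inter + 1e-12)
            order = order[1:][iou < iou_thresh]    # drop highly redundant boxes
        return kept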
However, when a deep-learning algorithm is ported to an onboard platform with limited computing resources and complex operating conditions, the consumption of computing resources must be considered. It is therefore necessary to optimize the target detection algorithm and lighten the neural network model: compressing the model reduces the number of parameters and improves detection speed with only a small drop in accuracy, so as to better fit the hardware constraints of the onboard platform and complete the target detection task.
The neural network model compression flow adopted by the invention is shown in FIG. 6. A deep learning network model contains a large number of redundant parameters: the activation values of most neurons in the convolutional or fully connected layers tend to 0, and after these neurons are removed the model shows the same expressive capacity; this is called over-parameterization. Therefore, while training the neural network on the data set, network pruning and knowledge distillation are adopted to remove redundant network nodes and weight connections, and even redundant convolution kernels, making the network structure more compact.
Network pruning reduces the number of parameters by eliminating redundant, unimportant connections. Network pruning methods fall mainly into two categories: sparsity constraints during training, and pruning after training. Sparsity-constrained pruning adds a sparsity constraint to the optimization function without pre-training the model, making the network structure tend to be sparse; it is mainly realized by introducing L1 and L2 regularization constraints into the network loss function. Pruning after training removes relatively unimportant parts of the network to make it sparse and compact, and is currently the simplest and most effective method; it starts from the existing trained model, gradually eliminating redundant information in the network and avoiding the loss caused by retraining from scratch.
According to pruning granularity, pruning methods mainly comprise kernel pruning, channel pruning, inter-layer pruning, and k×k kernel pruning. To achieve channel-level slimming, all input and output connections associated with a channel must be cut off. This makes directly pruning weights on a pre-trained model ineffective, because pruning requires weights that go to zero, while all the weights at the input or output of a channel will not all approach zero on their own. The invention therefore solves this problem by enforcing sparsity regularization in the training objective function, specifically using the Group Lasso method so that the same channel of all filters goes to 0 at the same time during training. When computing the newly introduced gradient terms associated with all filters, the invention introduces a scaling factor for each channel and multiplies it by the output of that channel; the network weights and the scaling factors are then trained jointly, with sparsity regularization applied to the latter; finally, the channels with small scaling factors are fine-tuned away. The objective function is defined as follows:
L = Σ(x,y) l(f(x,W), y) + λ Σγ∈Γ g(γ)
wherein L is the objective function; (x,y) denotes a training input and its target, W denotes the trainable weights, f(x,W) is the network's prediction, and l(·,·) is the training loss, so the first summation term corresponds to the normal training loss of the convolutional neural network; Γ denotes the set of channel scaling factors, g(γ) is a sparsity penalty on the scaling factors, and λ is the balance factor between the two terms. Here g(γ) = |γ|, i.e., L1 regularization, is chosen, and sub-gradient descent is adopted as the optimization method for the non-smooth L1 penalty term.
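A minimal PyTorch sketch of this joint training, applying the L1 sub-gradient to the channel scaling factors; using the batch-normalization γ weights as the scaling factors is an assumption consistent with the description above:

    import torch
    import torch.nn as nn

    def train_step(model, images, targets, loss_fn, optimizer, lam=1e-4):
        # First summation term: normal training loss of the network.
        optimizer.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()
        # Sparsity penalty lambda * |gamma|: add its sub-gradient
        # lambda * sign(gamma) to the gradients of the scaling factors.
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.grad.add_(lam * torch.sign(m.weight.data))
        optimizer.step()
        return loss.item()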
After sparsity regularization of each channel, many scaling factors in the model become close to 0; the channels whose scaling factors are near 0 can then be clipped and all their connections and corresponding weights removed. For all layers, the invention sets a global threshold according to the values of all scaling factors and prunes channels against it, so the network needs far fewer parameters and computing operations and its run-time memory footprint shrinks.
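A minimal sketch of deriving that global threshold and the per-layer channel masks; the pruning ratio is an illustrative assumption:

    import torch
    import torch.nn as nn

    def channel_masks_by_global_threshold(model, prune_ratio=0.5):
        # Pool all scaling factors, take one global threshold, and mark
        # which channels of each layer survive pruning.
        gammas = torch.cat([m.weight.data.abs().flatten()
                            for m in model.modules()
                            if isinstance(m, nn.BatchNorm2d)])
        threshold = torch.quantile(gammas, prune_ratio)
        return {name: m.weight.data.abs() > threshold
                for name, m in model.named_modules()
                if isinstance(m, nn.BatchNorm2d)}   # masks to rebuild a slim net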
Knowledge distillation means that during the training of a small model, the knowledge learned by a large model is used as guidance, so that the small model attains detection performance similar to the large model's with far fewer parameters. There are two approaches to knowledge distillation: one trains the large model and the small model at the same time; the other trains a large model first and then distills a small model from it. In fact, a more general-purpose complex model can be trained once, and through knowledge transfer from this complex model, high-performance models can be obtained on small-scale specialized tasks at greatly reduced training cost. The knowledge distillation method adopted by the invention therefore transfers the training knowledge of a large model to a specialized small model: when training the small model, the training target is updated from the traditional ground-truth labels to so-called soft targets, which provide greater information entropy during training and better transfer the trained model's knowledge to the new model. Moreover, because the gradient variance between different training samples is smaller under this target, the training data needed by the small model can be greatly reduced, a higher learning rate can be used, and model iteration is significantly accelerated.
The soft targets are in fact the output probabilities of the softmax layer of the trained complex model, and the "distillation" method introduces a "temperature" parameter T into the softmax:
qi = exp(zi/T) / Σj exp(zj/T)
where zi is the logit of each class, qi is the softmax output given by the student model (small model), and the temperature parameter T is normally set to 1. The simplest form of distillation is to train the small model against the soft targets obtained from the complex model with a higher T, and to set T back to 1 after training.
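A minimal PyTorch sketch of training against such soft targets; the soft/hard weighting and the temperature value are illustrative assumptions:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
        # Soft term: match the teacher's high-T softmax output; the T*T factor
        # keeps its gradient scale comparable to the hard label term.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)  # ground-truth labels
        return alpha * soft + (1.0 - alpha) * hard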
The whole neural network model compression flow is as follows: first, a target data set for training is produced, i.e., a labeled picture set of the targets to be detected, and basic training is performed with this data set; a learning rate is set and sparsity training is applied to the network, making many scaling factors in the network approach zero; then network pruning is performed, the scaling factors are sorted, and the channels corresponding to the smaller scaling factors are pruned; finally, knowledge distillation is performed, using a teacher network to guide the training of the pruned student network, yielding a compressed, more compact neural network model. It should be noted that after knowledge distillation, the steps of sparsity training, network pruning, and knowledge distillation may be performed again, so that the model is compressed multiple times.
The above embodiments merely illustrate the design concept and features of the present invention and are intended to enable those skilled in the art to understand and implement the invention; the scope of the present invention is not limited to these embodiments. All equivalent changes or modifications made according to the principles and design ideas of the present invention therefore fall within the scope of protection of the present invention.

Claims (3)

1. A target detection method for a distributed multi-camera spherical unmanned system, characterized in that the unmanned system consists of a plurality of quadrotor unmanned aerial vehicles assembled into a spherical unmanned system; each unmanned aerial vehicle carries a monocular camera, and when assembled into the spherical state the system is a rigid body carrying a plurality of cameras; the method comprises the following steps:
first, designing a distributed camera network topology for the multi-camera unmanned system, so that each camera independently performs feature extraction on the scene images it captures;
second, performing feature-level image fusion on the image data provided by the multiple cameras using a multi-camera data fusion algorithm, fusing them into a complete image;
third, establishing a deep-learning-based target detection algorithm, compressing a neural network model, and performing target detection on the fused image obtained in the second step with the compressed model to complete the target detection task;
in the first step, the distributed camera network topology specifically comprises: configuring a processor for each camera node as the processing unit of that camera node of the spherical unmanned system, using the topic-subscription mechanism in ROS to realize data communication among any number of camera nodes, each camera node operating independently and performing feature extraction on the scene images it captures;
in the second step, the multi-camera data fusion algorithm specifically comprises:
(1) Acquiring target image feature points of multi-view images provided by a plurality of cameras;
(2) Transmitting the feature point descriptors to a central processing unit, and performing feature matching on the input multi-view images to enable the matched multi-view images to be fused into a complete image;
an improved weighted smoothing algorithm is introduced in multi-view image fusion, namely:
the fused image in the overlap region is denoted f(x,y), obtained as the weighted average of the 2 images to be fused, fL and fR, namely:
f(x,y) = α×fL(x,y) + (1-α)×fR(x,y)
wherein α is an adjustable factor, 0 < α < 1, i.e., in the image overlap region, α changes gradually from 1 to 0 along the direction from the view-1 image to the view-2 image, realizing smooth fusion of the overlap region; to establish a stronger correlation between the 2 images, the fusion is performed using the following formula:
let α = d2/(d1+d2); then f(x,y) = (d2×fL(x,y) + d1×fR(x,y))/(d1+d2), wherein d1 and d2 denote the distances from a point in the overlap region to the left and right boundaries, respectively, of the overlap region of the 2 different-view images;
in the third step, compressing the neural network model specifically comprises:
(1) Making a target data set for training, and performing basic training by using the data set;
(2) Setting a learning rate, sparsifying the network, and enabling a plurality of scaling factors in the network to approach zero;
(3) Performing network pruning, sorting the scaling factors, and pruning channels corresponding to the scaling factors with smaller values;
(4) Performing knowledge distillation, using a teacher network to guide the training of the pruned student network to obtain a compressed, more compact neural network model;
The knowledge distillation is specifically as follows:
(1) Training a large model;
(2) Updating the training target from the traditional ground-truth labels to soft targets, and transferring the training knowledge of the large model to the small model to obtain a compressed, more compact neural network model.
2. The target detection method for a distributed multi-camera spherical unmanned system according to claim 1, wherein after the knowledge distillation, the steps of sparsity training, network pruning, and knowledge distillation are performed again, so that the model is compressed multiple times.
3. The target detection method for a distributed multi-camera spherical unmanned system according to claim 1, wherein the network pruning specifically comprises:
(1) Introducing a scaling factor for each channel, multiplying it by the output of that channel, and calculating the newly introduced gradient terms associated with all filters;
(2) Jointly training the network weights and the scaling factors, applying sparsity regularization to the scaling factors;
(3) Fine-tuning the channels with small scaling factors, so that during training all the weights at the input or output of the same channel tend to zero simultaneously; all input and output connections related to such a channel are then cut off, realizing channel-level slimming;
(4) Clipping the channels whose scaling factors are close to 0 and removing all their connections and corresponding weights, obtaining the pruned network model.
CN202210040564.2A 2022-01-14 2022-01-14 Target detection method for a distributed multi-camera spherical unmanned system Active CN114445688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210040564.2A CN114445688B (en) 2022-01-14 2022-01-14 Target detection method for a distributed multi-camera spherical unmanned system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210040564.2A CN114445688B (en) 2022-01-14 2022-01-14 Target detection method for a distributed multi-camera spherical unmanned system

Publications (2)

Publication Number Publication Date
CN114445688A CN114445688A (en) 2022-05-06
CN114445688B true CN114445688B (en) 2024-06-04

Family

ID=81367521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210040564.2A Active CN114445688B (en) 2022-01-14 2022-01-14 Target detection method for a distributed multi-camera spherical unmanned system

Country Status (1)

Country Link
CN (1) CN114445688B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080529A * 2019-12-23 2020-04-28 Dalian University of Technology Unmanned aerial vehicle aerial image splicing method for enhancing robustness
CN111757822A * 2018-02-26 2020-10-09 FedEx Corporate Services, Inc. System and method for enhanced collision avoidance on logistics floor support equipment using multi-sensor detection fusion
CN112215334A * 2020-09-24 2021-01-12 Beihang University Neural network model compression method for event camera
CN113870379A * 2021-09-15 2021-12-31 Beijing Yihang Yuanzhi Technology Co., Ltd. Map generation method and device, electronic equipment and computer readable storage medium
CN113888408A * 2021-09-26 2022-01-04 Zhejiang Sci-Tech University Multi-camera image acquisition method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN110598731B * 2019-07-31 2021-08-20 Zhejiang University Efficient image classification method based on structured pruning


Also Published As

Publication number Publication date
CN114445688A (en) 2022-05-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant