CN111204452A - Target detection system based on miniature aircraft - Google Patents


Info

Publication number: CN111204452A
Authority: CN (China)
Prior art keywords: convolution, layer, micro, depth separable, aircraft
Legal status: Granted
Application number: CN202010083951.5A
Other languages: Chinese (zh)
Other versions: CN111204452B (en)
Inventors: 姚德臣, 刘恒畅, 杨建伟, 张骄, 武向鹏, 寇子明, 国树东
Current Assignee: Beijing University of Civil Engineering and Architecture
Original Assignee: Beijing University of Civil Engineering and Architecture
Application filed by Beijing University of Civil Engineering and Architecture
Priority to CN202010083951.5A
Publication of CN111204452A
Application granted
Publication of CN111204452B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B64 AIRCRAFT; AVIATION; COSMONAUTICS
    • B64C AEROPLANES; HELICOPTERS
    • B64C39/00 Aircraft not otherwise provided for
    • B64C39/02 Aircraft not otherwise provided for characterised by special use
    • B64C39/024 Aircraft not otherwise provided for characterised by special use of the remote controlled vehicle type, i.e. RPV
    • B64D EQUIPMENT FOR FITTING IN OR TO AIRCRAFT; FLIGHT SUITS; PARACHUTES; ARRANGEMENTS OR MOUNTING OF POWER PLANTS OR PROPULSION TRANSMISSIONS IN AIRCRAFT
    • B64D47/00 Equipment not otherwise provided for
    • B64D47/08 Arrangements of cameras
    • B64U UNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
    • B64U30/00 Means for producing lift; Empennages; Arrangements thereof
    • B64U30/20 Rotors; Rotor supports
    • B64U50/00 Propulsion; Power supply
    • B64U50/10 Propulsion
    • B64U50/19 Propulsion using electrically powered motors
    • B64U2101/00 UAVs specially adapted for particular uses or applications
    • B64U2101/30 UAVs specially adapted for particular uses or applications for imaging, photography or videography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images

Abstract

The invention provides a target detection system based on a micro aircraft. Video data are collected by a camera mounted on a four-axis flight platform, and object detection on the frames of the video is performed by a Raspberry Pi 4 control unit, fitted with a Coral USB Accelerator, that carries a target detection model comprising a multi-branch depthwise separable convolutional neural network and a Single Shot MultiBox Detector operation module. The invention uses depthwise separable convolution to reduce the size of the model and a multi-branch structure to improve its generalization. With the Coral USB Accelerator attached to the Raspberry Pi 4, objects can be detected rapidly by the constructed MBDSCNN-SSD target detection model.

Description

Target detection system based on miniature aircraft
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, and in particular to a target detection system based on a micro aircraft.
Background
Unmanned aerial vehicle (UAV) systems form a technology-intensive sector of the aerospace manufacturing industry, and a wide range of high and new technologies are integrated into their design and production.
In the military domain, UAVs have become a highlight of the weaponry of many countries. Because UAVs offer low operating cost, no risk of crew casualties and convenient deployment, they can be applied widely to counter-terrorism and riot control, emergency rescue and disaster relief, aerial survey and photography, land and resource management, environmental protection, urban planning and administration, film and television production, power-line patrol, maritime patrol, highway patrol, forestry fire prevention, and agricultural and forestry pest and disease monitoring and control. Civilian use of UAVs has also spread steadily, with more and more industries, departments and organizations replacing traditional working methods with UAVs.
In the development of UAVs, specialization and popularization are not contradictory, and manufacturers keep developing new products and new systems to reduce, as far as possible, the piloting skill demanded of the operator.
One recent trend in consumer-grade drones is toward smaller and lighter designs, the so-called "pocket drone". In this respect, the well-known drone manufacturer DJI has released the Spark.
However, the existing unmanned aerial vehicle technology still has limitations.
Existing UAVs have complex flight control systems and require the operator to observe the aircraft and adjust its flight attitude in time according to the sensed data. This places high demands both on the attitude-sensing equipment of the UAV and on the skill of the pilot. Errors in UAV attitude estimation limit, to a certain extent, the controllability of the UAV and thereby hinder its large-scale application.
In addition, the visual recognition systems currently carried on UAVs rely on complex target detection algorithms that need large amounts of computing resources, so the real-time performance and accuracy of target recognition still need to be improved.
Disclosure of Invention
To address the defects of the prior art, the invention provides a target detection system based on a micro aircraft, adopting the following technical scheme.
A micro-aircraft-based object detection system is provided on a micro aircraft comprising a four-axis flight platform. The platform comprises a platform main body; four outward-extending connecting shafts are arranged symmetrically at the four corners of the main body, a coreless ("hollow cup") motor is mounted at the distal end of each connecting shaft, and the shaft of each motor extends upward and is connected to a propeller. The coreless motors drive the propellers to rotate, enabling the micro aircraft to take off, land, fly and adjust its attitude in the air.
The object detection system comprises:
a camera, mounted on the platform main body, for collecting video data;
a visual recognition unit connected to the camera, comprising a Raspberry Pi 4 control unit and a Coral USB Accelerator attached to it, for receiving the video data collected by the camera and performing object detection on the frames of the video; wherein
a target detection model is provided in the Raspberry Pi 4 control unit, the model comprising a multi-branch depthwise separable convolutional neural network (MBDSCNN) and a Single Shot MultiBox Detector operation module;
the multi-branch depthwise separable convolutional neural network first applies a 3x3 convolution to each frame of the video data collected by the camera, then applies depthwise separable convolution to the output of the 3x3 convolution; the data obtained by the depthwise separable convolution are fed into a connection filter, and the concatenated output then passes in turn through two further depthwise separable convolutions with different parameters before being output to a global average pooling layer;
the Single Shot MultiBox Detector operation module applies a 3x3 convolution to each of: the output of the initial 3x3 convolution, the data output by the connection filter, and the data output to the global average pooling layer after the two further depthwise separable convolutions; the three 3x3 convolution results, together with the pooled data from the global average pooling layer, are fed into a non-maximum suppression layer for detection, the detection result is output, and targets in the video data collected by the camera are identified according to the detection result.
Optionally, in any of the above micro-aircraft-based target detection systems, the Single Shot MultiBox Detector operation module is connected to the MBDSCNN, which serves as its front-end network.
Optionally, the Single Shot MultiBox Detector operation module is provided with three additional 3x3 convolutional layers whose sizes decrease layer by layer, corresponding respectively to the output of the 3x3 convolution in the front-end MBDSCNN, to the data output by the connection filter, and to the data output to the global average pooling layer after the two depthwise separable convolutions.
Optionally, the front-end MBDSCNN comprises one ordinary convolution with a 3x3 kernel, nine depthwise separable convolutions, one global average pooling layer and one fully connected layer; the nine depthwise separable convolutions include a three-branch structure, each branch comprising two layers with the same convolution stride s and the same number of filters c.
Optionally, the target detection model merges the outputs of the three layer-by-layer decreasing additional convolutional layers to replace the fully connected layer of the MBDSCNN, then feeds them into a non-maximum suppression (NMS) layer for detection and outputs the detection result.
Meanwhile, the invention also provides a target detection method based on the micro aircraft, which comprises the following steps:
In the first step, video data collected by a camera mounted on the four-axis flight platform are acquired.
In the second step, a 3x3 convolution is applied to each frame of the collected video data, and depthwise separable convolution is then applied to the output of the 3x3 convolution. The depthwise separable convolution comprises three layers: the convolution stride s and the number of filters c of the first depthwise separable layer are both smaller than those of the second; the second layer is arranged as a three-branch structure, each branch having the same convolution stride s and the same number of filters c.
In the third step, the data obtained by the depthwise separable convolutions are fed into a connection filter, after which the concatenated output passes in turn through a fourth and a fifth depthwise separable convolution layer with different parameters and is output to a global average pooling layer.
In the fourth step, one additional 3x3 convolutional layer is applied to each of: the output of the 3x3 convolution from the second step, the data output by the connection filter in the third step, and the data output to the global average pooling layer after the fifth depthwise separable layer.
In the fifth step, the results of the three additional 3x3 convolutional layers, together with the pooled data from the global average pooling layer, are fed into a non-maximum suppression layer for detection; the detection result is output, and targets in the video data collected by the camera are identified according to it.
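To make the flow of steps one to five concrete, the following is a minimal Keras sketch of the forward pass under assumed dimensions. The input size, strides and filter counts are illustrative placeholders rather than values prescribed by the invention, and the NMS stage of the fifth step is deferred to inference time.

```python
# Minimal structural sketch of steps 1-4, assuming a 224x224 RGB input and
# illustrative stride/filter values. Not the patented parameterization.
import tensorflow as tf
from tensorflow.keras import layers


def ds_conv(x, filters, stride):
    # Depthwise separable convolution: depthwise 3x3 followed by pointwise 1x1.
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 1, activation="relu")(x)


frames = layers.Input(shape=(224, 224, 3))

# Step 2: initial 3x3 convolution, one depthwise separable layer with smaller
# stride/filter count, then a second layer arranged as three equal branches.
conv3x3 = layers.Conv2D(16, 3, strides=1, padding="same", activation="relu")(frames)
x = ds_conv(conv3x3, 32, 1)
branches = [ds_conv(ds_conv(x, 64, 2), 64, 2) for _ in range(3)]  # two layers per branch, same s and c

# Step 3: "connection filter" (channel concatenation), then the fourth and
# fifth depthwise separable layers with different parameters, then pooling.
concat = layers.Concatenate()(branches)
x4 = ds_conv(concat, 128, 1)
x5 = ds_conv(x4, 256, 2)
pooled = layers.GlobalAveragePooling2D()(x5)

# Step 4: one additional 3x3 convolutional layer on each of the three taps,
# with layer-by-layer decreasing size (filter counts assumed here).
extra1 = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(conv3x3)
extra2 = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(concat)
extra3 = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(x5)

# Step 5 (merging the three results with the pooled data and applying
# non-maximum suppression) happens at inference time and is omitted here.
model = tf.keras.Model(frames, [extra1, extra2, extra3, pooled])
model.summary()
```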
Optionally, in the above method, the size of the first additional 3x3 convolutional layer, computed on the output of the 3x3 convolution in the second step, is larger than that of the second additional 3x3 layer, computed on the data output by the connection filter in the third step; and the size of the second additional 3x3 layer is in turn larger than that of the third, computed on the data output to the global average pooling layer after the fifth depthwise separable layer.
Optionally, in the second step, the convolution stride and number of filters of the two-layer, three-branch depthwise separable convolutions are both greater than the stride and number of filters of the 3x3 convolution applied to the camera frames.
Optionally, in the fifth step, feeding the results of the three additional 3x3 convolutional layers and the pooled data from the global average pooling layer into the non-maximum suppression layer specifically comprises: merging the results of the three layer-by-layer decreasing additional convolutional layers to replace the fully connected layer of the MBDSCNN, feeding them into a non-maximum suppression (NMS) layer for detection, and outputting the detection result; here the MBDSCNN is the front-end network of the Single Shot MultiBox Detector to which the NMS layer belongs. A sketch of the NMS operation follows below.
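For the non-maximum suppression stage referenced above, the following is a minimal NumPy sketch of greedy NMS; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions for illustration, not values fixed by the invention.

```python
# Greedy non-maximum suppression: keep the highest-scoring box and discard
# boxes that overlap it beyond an IoU threshold.
import numpy as np


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the best box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop heavily overlapping boxes
    return keep
```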
Advantageous effects
The invention collects video data through a camera mounted on a four-axis flight platform, and then performs object detection on the video frames through a Raspberry Pi 4 control unit, fitted with a Coral USB Accelerator, that carries a target detection model comprising a multi-branch depthwise separable convolutional neural network and a Single Shot MultiBox Detector operation module. The invention uses depthwise separable convolution to reduce the size of the model and a multi-branch structure to improve its generalization. With the Coral USB Accelerator attached to the Raspberry Pi 4, objects can be detected rapidly by the constructed MBDSCNN-SSD target detection model: the detection speed reaches 35 fps, enabling real-time detection of many kinds of objects.
Furthermore, the micro aircraft can be remotely controlled and communicated with through an SMA antenna. Coreless motors provide the flight power. Basic control functions are implemented by an STM32F103 microcontroller, which cooperates with the Raspberry Pi 4 to detect targets in the camera images. To raise the detection speed, a Coral USB Accelerator can be added to the Raspberry Pi 4 to accelerate inference of the deep learning model. To detect targets in the video stream quickly and accurately on a mobile-end device, the target detection model is built specifically from a multi-branch depthwise separable convolutional neural network and a Single Shot MultiBox Detector (MBDSCNN-SSD), improving the efficiency and accuracy of recognition and detection.
In particular, the target detection model of the invention can be formed by attaching a Single Shot MultiBox Detector (SSD) target extraction and detection network behind the MBDSCNN. The SSD detection model combines predictions from several feature maps of different resolutions, mitigating the influence of object size on detection accuracy. Compared with classic target detection models such as Faster R-CNN, SSD improves speed while guaranteeing high-accuracy results. On top of the front-end MBDSCNN, the SSD model adds additional convolutional layers whose sizes decrease layer by layer, so that the model can make predictions at multiple scales. The target detection model built by the invention merges the outputs of these additional convolutional layers to replace the fully connected layer of the MBDSCNN, and appends a non-maximum suppression (NMS) layer to output the detection result. Compared with the prior art, the invention can therefore accurately identify targets at various scales and output accurate detection results.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic view of the overall structure of the micro aircraft of the invention;
FIG. 2 is a flow chart of the micro-aircraft-based target detection method of the invention;
FIG. 3 is a schematic diagram of the MBDSCNN structure employed in the invention;
FIG. 4 is a flow chart of the attitude solution for the micro aircraft;
FIG. 5 is a schematic comparison of the computational cost of a standard convolution and a depthwise separable convolution;
In the figures: 1, platform main body; 2, connecting shaft; 3, propeller.
Detailed Description
To make the purpose and technical solution of the embodiments of the present invention clearer, the technical solution of the embodiments will be described fully below with reference to the accompanying drawings. It is to be understood that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from the described embodiments without inventive effort fall within the scope of protection of the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In one implementation, the object detection system provided by the invention is mounted on a micro aircraft as shown in FIG. 1. The micro aircraft comprises a four-axis flight platform with a platform main body 1. Four outward-extending connecting shafts 2 are arranged symmetrically at the four corners of the main body 1; a coreless motor is mounted at the distal end of each connecting shaft 2, and the shaft of each motor extends upward and carries a propeller 3. The coreless motors drive the propellers 3 to rotate, enabling the micro aircraft to take off, land, fly and adjust its attitude in the air.
The shell of the four-axis flight platform is designed as a single body in Creo. Its streamlined shape greatly reduces air resistance, and the bilaterally symmetric layout improves stability while hovering. Interfaces for various items of equipment are reserved, and the feasibility of the design was verified by simulated assembly and computer simulation.
The invention can therefore use target detection to spot specific targets in scenes that people cannot easily enter and help them scout the situation remotely: for example, gathering intelligence in military applications, or helping rescuers quickly grasp conditions in a disaster area whose situation is unknown. Functionally, the target detection system based on the micro four-axis platform supports remote control, performs real-time target detection on the video stream captured by the camera, and transmits the video and the detection results back to the ground station.
The corresponding performance indices are given in Table 1.
TABLE 1. Performance indices of the target detection system based on the micro four-axis platform
Item          | Design choice   | Remarks
Control mode  | Remote control  | Wireless remote-control distance not more than 500 m
Maximum load  | 200 g           | Self weight 200 g
Power source  | Lithium battery | 7.4 V
Aircraft size | 10x93 mm        |
Driving motor | Coreless motor  | 45000 rpm
The micro aircraft further comprises a flight control system and a wireless image transmission system. The flight control system uses an STM32F103 as the main control chip and an MPU6050 as the inertial navigation module; complementary filtering of the collected gyroscope and accelerometer data yields a stable output, and the real-time attitude of the aircraft is obtained by a quaternion algorithm. Flight stability is maintained with cascaded PID control.
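As an illustration of the cascaded PID structure mentioned above, here is a minimal sketch of an outer angle loop feeding an inner angular-rate loop; all gains, output limits and the 250 Hz loop rate are assumed values, not the tuning of the actual flight controller.

```python
# Cascaded PID sketch: the outer loop turns an angle error into a rate
# set-point, the inner loop turns the rate error into a motor correction.
class PID:
    def __init__(self, kp, ki, kd, out_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))  # clamp output


angle_pid = PID(kp=4.0, ki=0.0, kd=0.0, out_limit=200.0)    # outer loop: angle -> rate set-point
rate_pid = PID(kp=0.7, ki=0.5, kd=0.01, out_limit=500.0)    # inner loop: rate -> motor correction


def attitude_control(target_angle, measured_angle, measured_rate, dt=1 / 250):
    rate_setpoint = angle_pid.update(target_angle - measured_angle, dt)
    return rate_pid.update(rate_setpoint - measured_rate, dt)
```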
The wireless image transmission system of the micro aircraft uses the RTC6705 to transmit analog video in the 5.8 GHz band; the transmission power is switched by push-button among 25 mW, 100 mW and 200 mW, and the effective image transmission distance reaches 500 m.
In a preferred implementation, the micro aircraft may use SI2302 MOSFETs to drive 720-size coreless motors as the power source. An MPU6050 chip serves as the inertial measurement unit (IMU); it integrates a gyroscope and an accelerometer, from which the three-axis attitude angles (roll, pitch, yaw) of the aircraft are obtained accurately by an attitude-solution algorithm. Remote control is implemented through an NRF24L01 wireless module, with an SMA antenna amplifying the wireless signal to extend the effective control distance and improve its reliability. Attitude data and the camera object detection results are transmitted with an HC-05 Bluetooth module.
In some implementations, the aircraft may use a 1S battery as its power supply. Because a fully charged 1S battery reads 4.2 V and the battery voltage varies considerably during operation, a boost-buck arrangement guarantees the working voltage of the main control chip, the wireless module, the Bluetooth module and so on: an ME2108A boost DC-DC chip raises the input supply to 5 V, which is then fed to a MIC5219 regulator to output 3.3 V.
To cooperate with the NRF24L01 wireless module for remote control, the remote controller of the micro aircraft mirrors the aircraft itself: it uses an STM32F103 as the main control chip, is powered by a 1S battery, and an XC6209 regulator provides a stable 3.3 V working voltage for the main control chip and peripherals. The controller commands the aircraft through two joystick potentiometers; the main control chip samples the two potentiometer voltages over several ADC channels to obtain the commanded set-points and sends them to the micro quadrotor via the NRF24L01. An OLED screen on the controller displays the real-time attitude (Euler) angles of the aircraft, received over the wireless module, together with the current joystick set-points. Implementing this requires full-duplex communication, which the NRF24L01 itself does not provide: two NRF modules normally transmit and receive alternately, and achieving bidirectional communication by switching their transceiver states is hard to program, because switching takes time and the two modules must stay synchronized. Therefore the "ack with payload" feature of the NRF24L01+ is used: the acknowledgement packet carries user data, achieving bidirectional communication without switching the transceiving state. In addition, to ensure the safety of the micro aircraft, the flight control program checks the four joystick channels for a signal every 3 ms; if no remote control signal is detected for more than 6 ms, the micro aircraft executes an automatic landing program.
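The failsafe timing described above can be sketched as follows; the hook functions are hypothetical placeholders for the real flight-control routines, and only the 3 ms polling and 6 ms timeout come from the text.

```python
# Sketch of the failsafe logic: poll the four stick channels every 3 ms and
# trigger automatic landing if no remote packet arrives for more than 6 ms.
import time

CHECK_PERIOD_S = 0.003    # the control loop is assumed to call failsafe_poll() at this period
SIGNAL_TIMEOUT_S = 0.006  # auto-land after more than 6 ms without a remote packet

last_packet_time = time.monotonic()


def apply_stick_setpoints(stick_values):
    print("setpoints:", stick_values)     # placeholder for the real control hook


def start_auto_landing():
    print("signal lost, auto-landing")    # placeholder for the landing routine


def on_remote_packet(stick_values):
    # Called whenever an NRF24L01 packet (or ack payload) arrives.
    global last_packet_time
    last_packet_time = time.monotonic()
    apply_stick_setpoints(stick_values)


def failsafe_poll():
    # Run every CHECK_PERIOD_S by the flight-control loop.
    if time.monotonic() - last_packet_time > SIGNAL_TIMEOUT_S:
        start_auto_landing()
```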
During operation, to keep the micro aircraft running stably, its attitude must be solved at the same time so that it can be controlled correctly. In one implementation, the attitude solution fuses the three-axis acceleration and angular-velocity data collected by the accelerometer and gyroscope of the MPU6050 and processes them into the real-time attitude angles of the micro quadrotor. Acquiring correct attitude angle data is the precondition for stable flight.
For the MPU6050, the accelerometer is sensitive to the acceleration of the airframe, so an inclination angle computed from an instantaneous reading carries a large error; the angle obtained by integrating the gyroscope output is unaffected by acceleration, but integration drift and temperature drift make its error grow over time. The two sensors can compensate exactly for each other's weakness. Complementary filtering takes the angle obtained from the gyroscope as the best estimate over the short term and periodically averages the angles sampled from the accelerometer to correct it: over short intervals the gyroscope is the more accurate and is used as the main source, while over long intervals the accelerometer is the more accurate and its weight is increased.
The accelerometer signal needs its high-frequency content filtered out, and the gyroscope signal needs its low-frequency content filtered out. A complementary filter applies different filters (high-pass or low-pass, complementary to each other) according to the characteristics of each sensor and then sums the results to recover a signal over the whole frequency band. For example, the accelerometer measures the inclination angle but responds slowly, so its signal is unusable at high frequency and the high frequencies are suppressed by low-pass filtering; the gyroscope responds quickly and yields the inclination angle after integration, but null drift and similar effects corrupt its low-frequency content, which is suppressed by high-pass filtering. Combining the two merges the advantages of the gyroscope and the accelerometer into a signal that is good at both high and low frequency; complementary filtering then requires choosing the crossover point, i.e. the high-pass and low-pass corner frequencies.
In the present application, the accelerometer branch is therefore low-pass filtered and the gyroscope branch high-pass filtered, and the differently filtered sensor signals are summed with different weights in a complementary manner; the attitude solution of the micro quadrotor is realized by this complementary filtering method. The specific calculation is as follows:
Complementary filter formulas:
gyro angle = gyro angle + angular velocity × dt (gyroscope integration);
angle deviation = accelerometer angle - gyro angle;
fused angle = gyro angle + attenuation coefficient × angle deviation;
deviation integral = deviation integral + angle deviation;
angular velocity = angular velocity + attenuation coefficient × deviation integral.
The flow of the attitude solution program is shown in FIG. 4.
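As a worked illustration of the formulas above, the following sketch performs one complementary-filter update; the attenuation coefficients and the 4 ms time step are assumed values.

```python
# One update step of the complementary filter formulas above. The attenuation
# coefficients kp/ki and the 4 ms time step are illustrative assumptions.
def complementary_update(state, accel_angle, gyro_rate, dt=0.004, kp=0.05, ki=0.001):
    angle, error_integral = state
    angle += gyro_rate * dt              # gyro angle = gyro angle + angular velocity * dt
    deviation = accel_angle - angle      # angle deviation = accelerometer angle - gyro angle
    error_integral += deviation          # deviation integral accumulates the deviation
    corrected_rate = gyro_rate + ki * error_integral  # slow rate correction by the integral term
    angle += kp * deviation              # fused angle = gyro angle + coefficient * deviation
    return (angle, error_integral), corrected_rate


state = (0.0, 0.0)
state, rate = complementary_update(state, accel_angle=1.2, gyro_rate=0.4)
print(state, rate)
```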
On this basis, considering the limited load capacity of the micro aircraft, the onboard target detection system can be deployed on a Raspberry Pi 4 connected to the camera mounted on the platform main body 1 for collecting video data, and deep learning is used to detect targets in the video stream. Since most deep learning models are large and slow at prediction, the invention builds the target detection model from a multi-branch depthwise separable convolutional neural network and a Single Shot MultiBox Detector (MBDSCNN-SSD), using depthwise separable convolution to reduce the model size and a multi-branch structure to improve its generalization. Preferably, with a Coral USB Accelerator attached to the Raspberry Pi 4, the detection speed of the MBDSCNN-SSD reaches 35 fps.
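As an illustration of this deployment, the following is a minimal inference-loop sketch using the tflite_runtime interpreter with the Edge TPU delegate of the Coral USB Accelerator. The model file name and camera index are placeholders, and the model is assumed to have been quantized and compiled for the Edge TPU beforehand.

```python
# Minimal sketch of running a compiled Edge TPU detection model on camera
# frames from the Raspberry Pi. "mbdscnn_ssd_edgetpu.tflite" is a placeholder
# name for a model converted and compiled for the Edge TPU.
import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="mbdscnn_ssd_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],  # Coral USB Accelerator
)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()

cap = cv2.VideoCapture(0)                     # camera on the platform main body
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = inp["shape"][1], inp["shape"][2]
    resized = cv2.resize(frame, (w, h))
    # uint8 input assumes a fully quantized Edge TPU model.
    interpreter.set_tensor(inp["index"], np.expand_dims(resized, 0).astype(np.uint8))
    interpreter.invoke()
    detections = [interpreter.get_tensor(o["index"]) for o in outs]
    # ... draw boxes / send results back to the ground station ...
```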
A specific implementation is as follows:
The visual recognition unit provided by the invention can comprise a Raspberry Pi 4 control unit and a Coral USB Accelerator attached to it, for receiving the video data collected by the camera and performing object detection on the frames of the video; wherein
a target detection model is provided in the Raspberry Pi 4 control unit, the model comprising a multi-branch depthwise separable convolutional neural network and a Single Shot MultiBox Detector operation module;
the multi-branch depthwise separable convolutional neural network first applies a 3x3 convolution to each frame of the video data collected by the camera, then applies depthwise separable convolution to the output of the 3x3 convolution; the data obtained by the depthwise separable convolution are fed into a connection filter, and the concatenated output then passes in turn through two further depthwise separable convolutions with different parameters before being output to a global average pooling layer;
the Single Shot MultiBox Detector operation module, connected to the MBDSCNN as its front-end network, applies a 3x3 convolution to each of: the output of the initial 3x3 convolution, the data output by the connection filter, and the data output to the global average pooling layer after the two further depthwise separable convolutions; the three 3x3 convolution results, together with the pooled data from the global average pooling layer, are fed into a non-maximum suppression layer for detection, the detection result is output, and targets in the video data collected by the camera are identified according to the detection result.
In a more preferred implementation, the Single Shot MultiBox Detector operation module is provided with three additional 3x3 convolutional layers, corresponding respectively to the output of the 3x3 convolution in the front-end MBDSCNN, to the data output by the connection filter, and to the data output to the global average pooling layer after the two depthwise separable convolutions.
In each convolutional layer, s denotes the convolution stride and c the number of filters. The sizes of the three additional 3x3 convolutional layers decrease layer by layer. The remaining parameters of each convolutional layer can be set as shown in FIG. 2: the first additional 3x3 layer, computed on the output of the 3x3 convolution, is larger than the second, computed on the data output by the connection filter in the third step; the second is in turn larger than the third, computed on the data output to the global average pooling layer after the fifth depthwise separable layer. The convolution stride and number of filters of the two-layer, three-branch depthwise separable convolutions are both greater than those of the 3x3 convolution applied to the camera frames.
Specifically, the front-end MBDSCNN can comprise one ordinary convolution with a 3x3 kernel, nine depthwise separable convolutions, one global average pooling layer and one fully connected layer; the nine depthwise separable convolutions include a three-branch structure, each branch comprising two layers with the same convolution stride s and the same number of filters c.
Thus, in the arrangement of FIG. 2, the target detection model merges the outputs of the three layer-by-layer decreasing additional convolutional layers to replace the fully connected layer of the MBDSCNN shown in FIG. 3, then feeds them into a non-maximum suppression (NMS) layer for detection and outputs the detection result, which improves the efficiency of target detection. The reason is as follows:
A conventional CNN model extracts features from the image through its convolutional layers; as the number of convolutional layers increases, higher-order features can be extracted and the network becomes more capable. However, increasing the number of layers also lowers the efficiency of the model and raises the hardware requirements considerably.
Depthwise separable convolution (Depthwise Separable Convolution) replaces the conventional convolution operation with two layers: (1) a depthwise convolution (DW), which applies one convolution kernel to each input channel; and (2) a pointwise convolution (PW), which merges the DW outputs using a 1x1 convolution.
In a standard convolutional layer, assume the input feature map has size N_i x N_i with M channels, the convolution kernel has size N_f x N_f, and there are K convolution kernels. The computational cost of the convolutional layer is then:
N_i x N_i x M x K x N_f x N_f (1)
The standard convolution operation thus performs two steps at once: it filters the features and convolves the merged features to generate new high-level features. The depthwise separable convolution separates these two steps of the standard convolution into DW and PW.
As shown in FIG. 5, the computational cost of the depthwise separable convolution is:
N_i x N_i x M x N_f x N_f + K x M x N_i x N_i (2)
Comparing the computational cost of the standard convolution with that of the depthwise separable convolution gives:
(N_i x N_i x M x N_f x N_f + K x M x N_i x N_i) / (N_i x N_i x M x K x N_f x N_f) = 1/K + 1/N_f^2 (3)
Since the number of output channels K is generally greater than 1, and common kernel sizes N_f include 3x3, 5x5 and 7x7, the value of equation (3) is less than 1; the depthwise separable convolution therefore costs less computation than the standard convolution.
Therefore, on the basis of an MBDSCNN comprising one 3x3 convolution, nine depthwise separable convolutions, one global average pooling layer and one fully connected layer, and in order to improve the expressive power of the model while avoiding vanishing gradients as the depth grows, the invention further arranges a multi-branch structure in the MBDSCNN, using the multi-branch structure to improve the generalization of the model and to raise the accuracy and efficiency of target detection.
The above are merely embodiments of the present invention, which are described in detail and with particularity, and therefore should not be construed as limiting the scope of the invention. It should be noted that, for those skilled in the art, various changes and modifications can be made without departing from the spirit of the present invention, and these changes and modifications are within the scope of the present invention.

Claims (5)

1. An object detection system based on a micro aircraft, characterized in that the micro aircraft comprises a four-axis flight platform with a platform main body (1); four outward-extending connecting shafts (2) are arranged symmetrically at the four corners of the platform main body (1), a coreless motor is mounted at the distal end of each connecting shaft (2), and the shaft of each motor extends upward and is connected to a propeller (3); the coreless motors drive the propellers (3) to rotate, enabling the micro aircraft to take off, land, fly and adjust its attitude in the air;
the object detection system comprises:
a camera, mounted on the platform main body (1), for collecting video data;
a visual recognition unit connected to the camera, comprising a Raspberry Pi 4 control unit and a Coral USB Accelerator attached to it, for receiving the video data collected by the camera and performing object detection on the frames of the video; wherein
a target detection model is provided in the Raspberry Pi 4 control unit, the model comprising a multi-branch depthwise separable convolutional neural network and a Single Shot MultiBox Detector operation module;
the multi-branch depthwise separable convolutional neural network first applies a 3x3 convolution to each frame of the video data collected by the camera, then applies depthwise separable convolution to the output of the 3x3 convolution; the data obtained by the depthwise separable convolution are fed into a connection filter, and the concatenated output then passes in turn through two further depthwise separable convolutions with different parameters before being output to a global average pooling layer;
the Single Shot MultiBox Detector operation module applies a 3x3 convolution to each of: the output of the initial 3x3 convolution, the data output by the connection filter, and the data output to the global average pooling layer after the two further depthwise separable convolutions; the three 3x3 convolution results, together with the pooled data from the global average pooling layer, are fed into a non-maximum suppression layer for detection, the detection result is output, and targets in the video data collected by the camera are identified according to the detection result.
2. The micro-aircraft-based object detection system according to claim 1, characterized in that the Single Shot MultiBox Detector operation module is connected to the MBDSCNN, which serves as its front-end network.
3. The micro-aircraft-based object detection system according to claim 1 or 2, characterized in that the Single Shot MultiBox Detector operation module is provided with three additional 3x3 convolutional layers, corresponding respectively to the output of the 3x3 convolution in the front-end MBDSCNN, to the data output by the connection filter, and to the data output to the global average pooling layer after the two depthwise separable convolutions, wherein the output sizes of the three additional 3x3 convolutional layers decrease layer by layer.
4. The micro-aircraft-based object detection system according to claim 3, characterized in that the front-end MBDSCNN comprises one ordinary convolution with a 3x3 kernel, nine depthwise separable convolutions, one global average pooling layer and one fully connected layer; wherein the nine depthwise separable convolutions include a three-branch structure, each branch comprising two layers having the same convolution stride s and the same number of filters c.
5. The micro-aircraft-based object detection system according to any one of claims 1 to 4, characterized in that the target detection model merges the outputs of the three layer-by-layer decreasing additional convolutional layers to replace the fully connected layer of the MBDSCNN, then feeds them into a non-maximum suppression (NMS) layer for detection, and outputs the detection result.
CN202010083951.5A 2020-02-10 2020-02-10 Target detection system based on miniature aircraft Active CN111204452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010083951.5A CN111204452B (en) 2020-02-10 2020-02-10 Target detection system based on miniature aircraft

Publications (2)

Publication Number Publication Date
CN111204452A 2020-05-29
CN111204452B (en) 2021-07-16

Family

ID=70785666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010083951.5A Active CN111204452B (en) 2020-02-10 2020-02-10 Target detection system based on miniature aircraft

Country Status (1)

Country Link
CN (1) CN111204452B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107817820A (en) * 2017-10-16 2018-03-20 复旦大学 A kind of unmanned plane autonomous flight control method and system based on deep learning
US10255683B1 (en) * 2017-10-31 2019-04-09 Amazon Technologies, Inc. Discontinuity detection in video data
CN109063666A (en) * 2018-08-14 2018-12-21 电子科技大学 The lightweight face identification method and system of convolution are separated based on depth
CN109840502A (en) * 2019-01-31 2019-06-04 深兰科技(上海)有限公司 A kind of method and apparatus carrying out target detection based on SSD model
CN209559255U (en) * 2019-03-20 2019-10-29 浙江树人学院(浙江树人大学) A kind of explorer vehicle for search and rescue
CN110084165A (en) * 2019-04-19 2019-08-02 山东大学 The intelligent recognition and method for early warning of anomalous event under the open scene of power domain based on edge calculations
CN110231013A (en) * 2019-05-08 2019-09-13 哈尔滨理工大学 A kind of Chinese herbaceous peony pedestrian detection based on binocular vision and people's vehicle are apart from acquisition methods
CN110287849A (en) * 2019-06-20 2019-09-27 北京工业大学 A kind of lightweight depth network image object detection method suitable for raspberry pie

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
童星 et al., "基于SSD_MobileNet模型的ROS平台目标检测" (Target detection on the ROS platform based on the SSD_MobileNet model), 《计算机系统应用》 (Computer Systems & Applications) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756037A (en) * 2022-03-18 2022-07-15 广东汇星光电科技有限公司 Unmanned aerial vehicle system based on neural network image recognition and control method
CN114756037B (en) * 2022-03-18 2023-04-07 广东汇星光电科技有限公司 Unmanned aerial vehicle system based on neural network image recognition and control method

Also Published As

Publication number Publication date
CN111204452B (en) 2021-07-16

Similar Documents

Publication Publication Date Title
US10447912B2 (en) Systems, methods, and devices for setting camera parameters
CN110274588B (en) Double-layer nested factor graph multi-source fusion navigation method based on unmanned aerial vehicle cluster information
US20220137643A1 (en) Aircraft control method and aircraft
CN107615211B (en) Method and system for estimating state information of movable object using sensor fusion
CN108897334B (en) Method for controlling attitude of insect-imitating flapping wing aircraft based on fuzzy neural network
CN109683629B (en) Unmanned aerial vehicle electric power overhead line system based on combination navigation and computer vision
CN106054903A (en) Multi-rotor unmanned aerial vehicle self-adaptive landing method and system
JP2020508615A (en) Multi-gimbal assembly
CN105607640B (en) The Pose Control device of quadrotor
CN104769496A (en) Flying camera with string assembly for localization and interaction
CN108759826B (en) Unmanned aerial vehicle motion tracking method based on multi-sensing parameter fusion of mobile phone and unmanned aerial vehicle
Chao et al. Remote sensing and actuation using unmanned vehicles
CN111204452B (en) Target detection system based on miniature aircraft
Sabo et al. Bio-inspired visual navigation for a quadcopter using optic flow
CN207472269U (en) Inertial measurement system, holder drive dynamic control device and aerial photography device
TW201141759A (en) Hand-launched unmanned aerial system
CN111232200B (en) Target detection method based on micro aircraft
CN204489201U (en) Based on the quadrotor flight system of STM32
CN206291910U (en) The acquisition system of the attitude information of carrier
Rawashdeh et al. Microraptor: A low-cost autonomous quadrotor system
CN112198903A (en) Modular multifunctional onboard computer system
CN113206951B (en) Real-time electronic image stabilization method based on flapping wing flight system
WO2020051757A1 (en) Wind speed calculation method and device, unmanned aerial vehicle and unmanned aerial vehicle assembly
Aksenov et al. An application of computer vision systems to solve the problem of unmanned aerial vehicle control
US20210011490A1 (en) Flight control method, device, and machine-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant