CN111666822A - Low-altitude unmanned aerial vehicle target detection method and system based on deep learning - Google Patents

Low-altitude unmanned aerial vehicle target detection method and system based on deep learning

Info

Publication number
CN111666822A
CN111666822A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
convolution
feature
image information
Prior art date
Legal status
Pending
Application number
CN202010401407.0A
Other languages
Chinese (zh)
Inventor
闫梦龙
马益杭
王书峰
陈凯强
Current Assignee
Sapai Intelligent Technology Co ltd
Original Assignee
Sapai Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Sapai Intelligent Technology Co ltd
Priority to CN202010401407.0A
Publication of CN111666822A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes
    • G06V 20/13: Satellite images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a low-altitude unmanned aerial vehicle (UAV) target detection method and system based on deep learning. The method comprises: collecting optical image information in a target area; performing feature extraction on the optical image information based on the EConvBlock convolution module to obtain a feature map; and constructing a plurality of convolution layers at different scales that process the feature map in sequence to determine whether a UAV is present in the optical image information and, if so, its position. The receptive field of a single convolution module is enlarged, so the feature extractor reaches a large receptive field with fewer convolution modules, reducing both the amount of computation and computational redundancy. Exploiting the characteristics of UAV targets, the aspect ratio of the default bounding box is constrained to 1:1, which shrinks the search space and improves detection accuracy. In line with the integrated defense requirements of a UAV defense system, the scale search range is limited and a nonlinear scale schedule is adopted, which increases the detection density for near targets and improves overall defense performance.

Description

Low-altitude unmanned aerial vehicle target detection method and system based on deep learning
Technical Field
The invention relates to the field of image recognition, and in particular to a low-altitude unmanned aerial vehicle target detection method and system based on deep learning.
Background
With the rapid development of unmanned aerial vehicle (UAV) technology, "low, slow, and small" aircraft, typified by consumer-grade UAVs, pose a serious threat to public safety, particularly the security of military sites, large civil facilities, and major public events. When traditional air-defense weapons face low-slow-small targets such as UAVs, early warning relies on radar and infrared sensors, while threat confirmation and engagement depend on manual interaction; such systems are hard pressed to find, identify, and track these targets and offer poor real-time performance. To address this problem, the invention provides an intelligent UAV target detection process and method based on deep learning, which helps raise the degree of full-chain automation of early warning, detection, identification, tracking, and strike in a UAV defense system, minimizes human intervention, and improves recognition accuracy, system real-time performance, and overall defense performance.
Traditional general-purpose target detection algorithms are limited by the weak representational capacity of hand-crafted features; their detection accuracy has reached a bottleneck that is hard to break through, which restricts their application in real environments. Although existing general-purpose detection algorithms based on deep learning achieve better results, two problems remain:
1. they are not specifically constrained or adapted to the characteristics of UAV targets, so the false-alarm rate remains high;
2. they enlarge the network receptive field and feature extraction capacity by stacking many identical convolution modules, which introduces substantial redundant computation and is inefficient.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a low-altitude unmanned aerial vehicle target detection method based on deep learning, which comprises the following steps:
collecting optical image information in a target area;
performing feature extraction on the optical image information based on the EConvBlock convolution module to obtain a feature map;
and constructing a plurality of convolution layers at different scales and processing the feature map in sequence to determine whether an unmanned aerial vehicle is present in the optical image information and its position.
Preferably, performing feature extraction on the optical image information based on the EConvBlock convolution module to obtain a feature map comprises:
feeding the optical image information, as an input feature map, into a plurality of convolution branches having the same number of convolutions but different dilation rates;
concatenating the output feature maps of the plurality of convolution branches;
and then feeding the result sequentially through a batch normalization layer and an activation function layer to obtain an output feature map.
Preferably, constructing a plurality of feature maps with different scales and sequentially processing the feature maps to determine whether the optical image information contains the unmanned aerial vehicle comprises:
constructing a plurality of feature maps with different scales;
inputting the feature maps into different convolution modules in sequence to obtain a plurality of multi-scale feature maps of successively decreasing resolution;
respectively inputting the multi-scale feature maps into a plurality of different convolution layers to obtain corresponding output feature maps representing detection results;
feeding all feature maps representing detection results into a loss layer for loss calculation;
wherein the sizes of the candidate boxes are constrained by reference boxes of different sizes according to the resolution of the feature map corresponding to each convolution layer.
Preferably, the number of output channels of the convolution layer is 6K, where K denotes that each cell in the feature map corresponding to the convolution layer corresponds to K candidate reference boxes.
Preferably, the relationship between the feature maps at different scales and the corresponding candidate reference box scales is:
S_k = f(S_min, S_max, k, m), k ∈ [1, m]
where S_min is the scale of the targets detected by the shallowest feature map, S_max is the scale of the targets detected by the highest-level feature map, m is the number of feature map levels used, k is the index of a given feature map level, and S_k is the scale corresponding to feature level k.
Preferably, the aspect ratio of the targets detected from the feature maps is 1.
Preferably, the scale S_min of the targets detected by the shallowest feature map is set to 0.1, and the scale S_max of the targets detected by the highest-level feature map is set to 0.8.
Preferably, the scale S_k of feature level k relates to the feature map level k as follows:
when k = 1, S_k = 0.1;
when k = 2, S_k = 0.3;
when k = 3, S_k = 0.5;
when k = 4, S_k = 0.6;
when k = 5, S_k = 0.7;
when k = 6, S_k = 0.8.
Based on the same inventive concept, the invention also provides a low-altitude unmanned aerial vehicle target detection system based on deep learning, comprising:
an acquisition module for collecting optical image information in a target area;
a feature extraction module for performing feature extraction on the optical image information based on the EConvBlock convolution module to obtain a feature map;
and an unmanned aerial vehicle detection module for constructing a plurality of convolution layers at different scales and sequentially processing the feature map to determine whether an unmanned aerial vehicle is present in the optical image information and, if so, its position.
Compared with the prior art, the invention has the beneficial effects that:
the invention adopts a technical means of a low-altitude unmanned aerial vehicle target detection method based on deep learning, which comprises the following steps: collecting optical image information in a target area; performing feature extraction on the optical image information based on an EConvBlock convolution mode to obtain a feature map; constructing a plurality of convolution layers with different scales, and sequentially processing the characteristic diagram to determine whether the optical image information has the unmanned aerial vehicle and the position of the unmanned aerial vehicle; the receptive field of a single convolution module is improved, the feature extractor can achieve a large receptive field by using less convolution modules, and the calculation amount and the calculation redundancy are reduced;
the technical scheme of the invention also combines the target characteristics of the unmanned aerial vehicle, restricts the length-width ratio of the default bounding box to be 1:1, reduces the search space and improves the detection accuracy;
the technical scheme of the invention combines a comprehensive defense system of an unmanned aerial vehicle defense system, limits the scale search range and adopts nonlinear scale search, thereby increasing the detection density of the near target and improving the comprehensive defense performance.
Drawings
FIG. 1 is a flow chart of a target detection method provided by the present invention;
FIG. 2 is a flow chart of an unmanned aerial vehicle detection algorithm provided by the present invention;
FIG. 3 is a diagram of a conventional convolution module of the prior art;
FIG. 4 is a schematic diagram of an EConvBlock provided by the present invention;
FIG. 5 is a feature extractor architecture provided by the present invention;
FIG. 6 is a structural diagram of the drone detector provided by the present invention;
FIG. 7 is a block diagram of a target detection system provided by the present invention.
Detailed Description
For a better understanding of the present invention, reference is made to the following description taken in conjunction with the accompanying drawings and examples.
In a UAV defense system, the computing devices deployed in such special application scenarios have limited computing power, which places specific demands on the UAV target detection algorithm: a high computing speed and a low false-alarm rate are required. The invention provides a fast UAV detection algorithm suitable for a low-altitude surveillance counter-UAV system, adapted to the practical requirements of UAV target detection in a defense system:
1. for the characteristics of UAV targets, a set of reference box sizes suited to UAV targets is computed, and the algorithm parameters are constrained to reduce the false-alarm rate;
2. a new convolution module, EConvBlock, replaces the original convolution module to reduce computational redundancy;
3. an efficient feature extractor is built from EConvBlocks, obtaining a large receptive field with only a small number of convolution layers at low computational complexity.
Example 1:
As shown in fig. 1, the present invention provides a low-altitude unmanned aerial vehicle (UAV) target detection method based on deep learning.
The specific algorithm flow is shown in fig. 2: a picture captured by an optical camera is input; the feature extractor produces a feature map that is fed into the UAV detector, which outputs, for each pixel location, the confidence that a target exists relative to the different reference boxes (different aspect ratios) together with the UAV target position information; finally, a non-maximum suppression algorithm produces the final detection result.
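The non-maximum suppression stage at the end of this flow is a standard post-processing step. As a minimal sketch of how the detector outputs would typically be filtered, assuming a PyTorch environment and the stock `torchvision.ops.nms` (the thresholds are illustrative values, not ones fixed by the invention):

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Filter raw detector outputs: drop low-confidence candidates, then
    suppress overlapping boxes with non-maximum suppression.

    boxes:  (N, 4) tensor in (x1, y1, x2, y2) format
    scores: (N,) drone-class confidence for each candidate box
    """
    keep = scores > score_thresh              # confidence filtering
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thresh)      # indices of surviving boxes
    return boxes[idx], scores[idx]
```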
Feature extractor
The feature extractor needs a sufficiently large receptive field to produce good feature extraction results. Convolution is a locally connected operation, and modern convolutional neural networks generally use small-kernel convolutions, so the receptive field of a single convolution is very small. General-purpose target detection algorithms use classical networks such as VGGNets or ResNet as feature extractors; these networks enlarge their receptive field by repeatedly stacking the same convolution module, but this brings a large amount of computational cost and high computational redundancy. The invention proposes an efficient convolution module, EConvBlock, to replace the original convolution module; it reaches the receptive field of a deep network with fewer layers and reduces computational redundancy. EConvBlock reduces the number of convolution layers in the feature extractor, which lowers computational complexity and improves the real-time performance of UAV detection, so the method can be applied in a practical counter-UAV system.
In a typical feature extractor, the convolution module follows the Conv-BN-ReLU pattern and processes the optical picture with a convolution kernel that is usually 3 × 3, as shown in fig. 3. A single module of this kind has a very limited receptive field, so obtaining a large receptive field requires chaining a large number of modules; modern target detectors typically use dozens to hundreds of convolution modules, which brings a large amount of computational cost and redundancy.
Compared with an ordinary convolution module, the EConvBlock proposed by the invention obtains a larger receptive field without additional learnable parameters or computational burden. The module comprises three convolution branches. The convolution of the first branch is identical to that of an ordinary convolution module, but it outputs N/3 feature maps to keep the computational complexity and parameter count unchanged; the second branch is a dilated convolution with dilation rate 5, outputting N/3 feature maps; the third branch is also a dilated convolution, with dilation rate 13. Finally, the results of the three branches are concatenated along the channel dimension.
The module is shown in fig. 4, where r denotes the dilation rate. EConvBlock proceeds as follows:
Step 1: feed the input feature map into the first convolution branch, which uses a 3x3 convolution with dilation rate 1 and outputs N/3 feature maps; denote the output SF1_1.
Step 2: feed the input feature map into the second convolution branch, which uses a 3x3 convolution with dilation rate 5 and outputs N/3 feature maps; denote the output SF1_2.
Step 3: feed the input feature map into the third convolution branch, which uses a 3x3 convolution with dilation rate 13 and outputs N/3 feature maps; denote the output SF1_3.
Step 4: concatenate the output feature maps SF1_1, SF1_2 and SF1_3 of the three convolution branches; the number of output feature maps is N; denote the result SF2.
Step 5: feed the combined feature map SF2 into a batch normalization (BN) layer; the number of output feature maps is unchanged; denote the output SF3.
Step 6: feed the feature map SF3 into a ReLU activation layer; the number of output feature maps N is unchanged; denote the output SF4, the final output of the module.
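Steps 1 to 6 map directly onto a small network module. The following is a minimal PyTorch sketch of an EConvBlock under the branch configuration just described (class and tensor names are illustrative, and the output channel count N is assumed divisible by 3):

```python
import torch
import torch.nn as nn

class EConvBlock(nn.Module):
    """Three parallel 3x3 branches with dilation rates 1, 5 and 13, each
    emitting N/3 channels, concatenated and passed through BN + ReLU."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        n = out_channels // 3
        # padding = dilation keeps the spatial resolution unchanged
        self.branch1 = nn.Conv2d(in_channels, n, 3, padding=1,  dilation=1)
        self.branch2 = nn.Conv2d(in_channels, n, 3, padding=5,  dilation=5)
        self.branch3 = nn.Conv2d(in_channels, n, 3, padding=13, dilation=13)
        self.bn = nn.BatchNorm2d(3 * n)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        sf1_1 = self.branch1(x)                     # Step 1
        sf1_2 = self.branch2(x)                     # Step 2
        sf1_3 = self.branch3(x)                     # Step 3
        sf2 = torch.cat([sf1_1, sf1_2, sf1_3], 1)   # Step 4: channel concat
        sf3 = self.bn(sf2)                          # Step 5: batch norm
        return self.relu(sf3)                       # Step 6: ReLU -> SF4
```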
In the invention, the efficient EConvBlock replaces the ordinary convolution module so that a single convolution module has a larger receptive field; fewer EConvBlocks can therefore achieve the receptive field that would otherwise require cascading a large number of ordinary convolution modules. In the feature extractor of the invention, shown in fig. 5, N denotes the number of output feature maps and Step denotes the stride of the pooling operation. The theoretical maximum receptive field of the feature extractor is 232 while using only 9 convolution modules; if ordinary convolution modules were used instead, the theoretical receptive field would be only 72.
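The receptive-field arithmetic behind this comparison follows the usual recurrence RF_out = RF_in + (k_eff - 1) * jump, where a 3x3 convolution with dilation rate r has an effective kernel size of 2r + 1 (27 for r = 13) and jump is the product of all earlier strides. The small calculator below illustrates the recurrence; the pooling layout in the example is purely hypothetical, since the exact arrangement of fig. 5 is not reproduced here:

```python
def receptive_field(layers):
    """layers: sequence of (effective_kernel, stride) pairs, input to output.
    Returns the theoretical receptive field of the stack."""
    rf, jump = 1, 1
    for k_eff, stride in layers:
        rf += (k_eff - 1) * jump   # growth is scaled by accumulated stride
        jump *= stride
    return rf

# Hypothetical stack: the widest EConvBlock branch (r = 13, effective 27)
# dominates each block; the single 2x2 stride-2 pool is an assumed position.
print(receptive_field([(27, 1)] * 3 + [(2, 2)] + [(27, 1)] * 3))
```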
Unmanned aerial vehicle detector
The invention adapts to the particularity of the UAV target detection scenario in a UAV defense system: constraining the aspect ratio and size of the reference boxes reduces the search space of UAV target detection, which improves detection speed and lowers the false-alarm rate; and the linear relationship between the default box scale and the feature level is replaced with a redesigned functional relationship so that detection focuses on UAV targets whose scale lies in the range [0.5, 0.8]. Scale here refers to the number of pixels along the 'horizontal' and 'vertical' dimensions of the picture, not the physical length and width of the digital picture. The structure of the UAV detector is shown in fig. 6, where OS denotes the ratio of the input image size to the feature map size, N the number of feature maps, S the convolution stride inside a convolution module, c the number of classes (2 in the UAV detection task), K the number of candidate box types, and Loss the loss function.
The steps of the drone detector as shown in fig. 6 are as follows:
Step 1: feed the input feature map S1 into an ordinary convolution module with convolution stride 2 and a 3x3 kernel; the output feature map S2 has the same number of channels as S1 and half its resolution.
Step 2: feed the feature map S2 into another ordinary convolution module with convolution stride 2 and a 3x3 kernel; the output feature map S3 keeps the number of channels and has half the resolution of S2.
Step 3: feed the feature map S3 into another ordinary convolution module with convolution stride 2 and a 3x3 kernel; the output feature map S4 keeps the number of channels and has half the resolution of S3.
Step 4: feed the feature map S1 into the first convolution layer, whose kernel is a 3x3 convolution unit, and output the feature map D1 representing detection results; its resolution is 1/16 of the overall network input and its channel count is 6K (c in the figure is 2), where K means that each cell of the feature map corresponds to K reference boxes.
Step 5: feed the feature map S2 into the second convolution layer, whose kernel is a 3x3 convolution unit, and output the feature map D2 representing detection results; its resolution is 1/32 of the overall network input and its channel count is 6K (c in the figure is 2), where K means that each cell of the feature map corresponds to K reference boxes.
Step 6: feed the feature map S3 into the third convolution layer, whose kernel is a 3x3 convolution unit, and output the feature map D3 representing detection results; its resolution is 1/64 of the overall network input and its channel count is 6K (c in the figure is 2), where K means that each cell of the feature map corresponds to K reference boxes.
Step 7: feed the feature map S4 into the fourth convolution layer, whose kernel is a 3x3 convolution unit, and output the feature map D4 representing detection results; its resolution is 1/128 of the overall network input and its channel count is 6K (c in the figure is 2), where K means that each cell of the feature map corresponds to K reference boxes.
Step 8: feed the feature maps D1, D2, D3 and D4 representing detection results into the loss layer for loss calculation.
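Steps 1 to 8 can likewise be sketched as a small PyTorch module; the channel count, K, and layer names below are illustrative placeholders rather than values fixed by the patent:

```python
import torch.nn as nn

class DroneDetector(nn.Module):
    """Builds the pyramid S1 -> S4 with three stride-2 convolutions and
    attaches four 3x3 prediction layers, each emitting 6K channels
    (K boxes per cell, 2 class scores + 4 box offsets per box)."""

    def __init__(self, n_channels, k_boxes):
        super().__init__()
        def down():  # ordinary conv module: halves resolution, keeps channels
            return nn.Sequential(
                nn.Conv2d(n_channels, n_channels, 3, stride=2, padding=1),
                nn.BatchNorm2d(n_channels), nn.ReLU(inplace=True))
        self.down2, self.down3, self.down4 = down(), down(), down()
        self.heads = nn.ModuleList(
            nn.Conv2d(n_channels, 6 * k_boxes, 3, padding=1)
            for _ in range(4))

    def forward(self, s1):
        s2 = self.down2(s1)   # Step 1: half the resolution of S1
        s3 = self.down3(s2)   # Step 2
        s4 = self.down4(s3)   # Step 3
        # Steps 4-7: one prediction map D1..D4 per pyramid level
        return [head(s) for head, s in zip(self.heads, (s1, s2, s3, s4))]
```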
Most convolutional neural networks gradually reduce the feature map resolution as network depth increases; this not only reduces computation and memory consumption but also improves translation and scale invariance to some extent. To reduce the sensitivity of prediction to scale, a common approach is to build an image pyramid, make independent predictions at each image scale, and then merge the final results. In the invention, the UAV detector instead simulates the image pyramid with multi-level feature maps at different scales, which shares computation and has low computational cost.
The UAV detector appends several convolution feature layers to the features produced by the feature extractor; the resolution of the feature maps becomes smaller and smaller as network depth increases, and these layers perform UAV detection at multiple scales.
After convolution filtering, each newly added feature layer produces a fixed number of prediction items, comprising the confidence for each class and shape offset parameters (width w, height h, and top-left coordinates x and y) relative to a default bounding box.
Each cell of each feature map is assigned several default bounding boxes, and each default bounding box is fixed relative to the location of its cell. For each feature map cell, the drone detector predicts one candidate box per default bounding box; the parameters of the candidate boxes comprise the relative position offset, shape offset, and class confidences of each box. If each feature map cell generates k default bounding boxes, each default bounding box corresponds to one prediction candidate, whose contents comprise 2 class probabilities, indicating whether the candidate content is a UAV target, and 4 offset terms, namely the position offset and the width-height offset relative to the default bounding box. The output therefore contains (2+4)k items per cell, and for an m x n feature map the output dimension is (2+4)kmn.
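The (2+4)k-per-cell layout can be unpacked as in the sketch below; the channel ordering is an assumption made for illustration, since the patent does not specify it:

```python
import torch

def unpack_predictions(pred, k):
    """pred: (B, 6*k, m, n) output of one prediction layer.
    Returns class scores (B, m*n*k, 2) and box offsets (B, m*n*k, 4),
    i.e. (2+4)*k*m*n values per image, assuming each box occupies a
    contiguous group of 6 channels."""
    b, _, m, n = pred.shape
    pred = pred.permute(0, 2, 3, 1).reshape(b, m * n * k, 6)
    return pred[..., :2], pred[..., 2:]
```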
Because the UAV detector uses multi-scale feature maps, whose pixel counts differ along the 'horizontal' and 'vertical' dimensions, the candidate box scales corresponding to feature maps at different scales also differ and can be constrained to a positively correlated functional relationship:
S_k = f(S_min, S_max, k, m), k ∈ [1, m]
where S_min is the scale of the targets detected by the shallowest feature map, S_max is the scale of the targets detected by the top-level feature map, m is the number of feature levels used, i.e. the number of scales, and k is the index of the given feature map level. f is monotonically increasing in k, i.e. shallow feature maps are mainly responsible for detecting small targets and deep feature maps for large targets. The shallowest layer is the one with the highest resolution, in this embodiment the output feature map of the first convolution layer; the top layer is the one with the lowest resolution, in this embodiment the output feature map of the fourth convolution layer.
In a typical model, default bounding boxes with several aspect ratios are placed at the same feature map position to capture objects of different shapes; common aspect ratios include {1, 2, 3, 1/2, 1/3}. In a UAV defense system, the shape of a UAV makes its aspect ratio very close to 1:1, so the aspect ratio of the default bounding box is restricted and only {1} is used.
S_min and S_max are usually set to 0.2 and 0.5, respectively. In a UAV defense system, however, the size of the UAV in the image depends on its distance from the camera: the closer it is, the larger it appears. Since such a system has high safety requirements and must raise an early warning and detect the UAV target from a long distance, the invention sets S_min to 0.1. S_max is set to 0.8 because a UAV defense system is an integrated defense system: an approaching UAV must be driven away or destroyed and is not allowed too close to the protected area. The invention therefore constrains S_min and S_max to 0.1 and 0.8, respectively, for the particularities of the UAV defense system.
The function f is typically a linear incremental model, i.e. the default bounding box sizes are evenly distributed across the feature levels:
S_k = S_min + (S_max - S_min) / (m - 1) * (k - 1), k ∈ [1, m]
In a UAV defense system, however, long-range UAV detection is less important than short-range detection, and detection accuracy inside the precision strike range matters most. The invention therefore adopts a nonlinear design in which the detection density for small-scale (distant) targets is lower than for large-scale (near) targets; the specific scales are listed in Table 1, with a code sketch following the table.
Table 1: Relationship between feature level k and scale S_k
k    S_k
1    0.1
2    0.3
3    0.5
4    0.6
5    0.7
6    0.8
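Both the conventional linear schedule and the nonlinear schedule of Table 1 are easy to state in code. A sketch, assuming m = 6 levels with S_min = 0.1 and S_max = 0.8 as above:

```python
def linear_scale(k, m=6, s_min=0.1, s_max=0.8):
    """Conventional evenly spaced rule:
    S_k = S_min + (S_max - S_min) / (m - 1) * (k - 1)."""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)

# Nonlinear schedule of Table 1: coarse steps at small scales (far targets),
# dense steps at large scales (near targets).
TABLE_1 = {1: 0.1, 2: 0.3, 3: 0.5, 4: 0.6, 5: 0.7, 6: 0.8}

def drone_scale(k):
    return TABLE_1[k]
```

For instance, linear_scale(2) gives 0.24 while the nonlinear schedule assigns 0.3, and four of the six levels fall in [0.5, 0.8], concentrating the detector's scale budget on near, large-appearing targets.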
Example 2:
In order to implement the above method, the invention also provides a low-altitude unmanned aerial vehicle target detection system based on deep learning, which comprises:
an acquisition module for collecting optical image information in a target area;
a feature extraction module for performing feature extraction on the optical image information based on the EConvBlock convolution module to obtain a feature map;
and an unmanned aerial vehicle detection module for constructing a plurality of convolution layers at different scales and sequentially processing the feature map to determine whether an unmanned aerial vehicle is present in the optical image information and, if so, its position.
It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The feature extraction module is specifically configured to:
feed the optical image information, as an input feature map, into a plurality of convolution branches having the same number of convolutions but different dilation rates;
concatenate the output feature maps of the plurality of convolution branches;
and then feed the result sequentially through a batch normalization layer and an activation function layer to obtain an output feature map.
The unmanned aerial vehicle detection module is specifically configured to:
construct a plurality of feature maps with different scales;
input the feature maps into different convolution modules in sequence to obtain a plurality of multi-scale feature maps of successively decreasing resolution;
respectively input the multi-scale feature maps into a plurality of different convolution layers to obtain corresponding output feature maps representing detection results;
feed all feature maps representing detection results into a loss layer for loss calculation;
wherein the sizes of the candidate boxes are constrained by reference boxes of different sizes according to the resolution of the feature map corresponding to each convolution layer.
The number of output channels of the convolution layer is 6K, where K denotes that each cell in the feature map corresponding to the convolution layer corresponds to K candidate reference boxes.
The relationship between the feature maps at different scales and the corresponding candidate reference box scales is:
S_k = f(S_min, S_max, k, m), k ∈ [1, m]
where S_min is the scale of the targets detected by the shallowest feature map, S_max is the scale of the targets detected by the highest-level feature map, m is the number of feature map levels used, k is the index of a given feature map level, and S_k is the scale corresponding to feature level k.
The aspect ratio of the targets detected from the feature maps is 1.
The scale S_min of the targets detected by the shallowest feature map is set to 0.1; the scale S_max of the targets detected by the highest-level feature map is set to 0.8.
The relationship between the scale S_k of feature level k and the feature map level k is shown in Table 1.
as will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention is not limited to the above embodiments; any modifications, equivalent replacements, improvements and the like made within the spirit and principle of the present invention fall within the protection scope of the claims of the present application.

Claims (9)

1. A low-altitude unmanned aerial vehicle target detection method based on deep learning is characterized by comprising the following steps:
collecting optical image information in a target area;
performing feature extraction on the optical image information based on an EConvBlock convolution module to obtain a feature map;
and constructing a plurality of convolution layers at different scales and processing the feature map in sequence to determine whether an unmanned aerial vehicle is present in the optical image information and its position.
2. The detection method according to claim 1, wherein performing feature extraction on the optical image information based on the EConvBlock convolution module to obtain a feature map comprises:
feeding the optical image information, as an input feature map, into a plurality of convolution branches having the same number of convolutions but different dilation rates;
concatenating the output feature maps of the plurality of convolution branches;
and then feeding the result sequentially through a batch normalization layer and an activation function layer to obtain an output feature map.
3. The detection method according to claim 1, wherein constructing a plurality of feature maps with different scales and sequentially processing the feature maps to determine whether the optical image information contains the unmanned aerial vehicle comprises:
constructing a plurality of feature maps with different scales;
inputting the feature maps into different convolution modules in sequence to obtain a plurality of multi-scale feature maps of successively decreasing resolution;
respectively inputting the multi-scale feature maps into a plurality of different convolution layers to obtain corresponding output feature maps representing detection results;
feeding all feature maps representing detection results into a loss layer for loss calculation;
wherein the sizes of the candidate boxes are constrained by reference boxes of different sizes according to the resolution of the feature map corresponding to each convolution layer.
4. The detection method according to claim 3, wherein the number of output channels of the convolution layer is 6K, where K denotes that each cell in the feature map corresponding to the convolution layer corresponds to K candidate reference boxes.
5. The detection method according to claim 4, wherein the relationship between the feature maps at different scales and the corresponding candidate reference box scales is:
S_k = f(S_min, S_max, k, m), k ∈ [1, m]
where S_min is the scale of the targets detected by the shallowest feature map, S_max is the scale of the targets detected by the highest-level feature map, m is the number of feature map levels used, k is the index of a given feature map level, and S_k is the scale corresponding to feature level k.
6. The detection method according to claim 5, wherein the aspect ratio of the targets detected from the feature maps is 1.
7. The detection method according to claim 5, wherein the scale S_min of the targets detected by the shallowest feature map is set to 0.1, and the scale S_max of the targets detected by the highest-level feature map is set to 0.8.
8. The detection method according to claim 5, wherein the scale S_k of feature level k relates to the feature map level k as follows:
when k = 1, S_k = 0.1;
when k = 2, S_k = 0.3;
when k = 3, S_k = 0.5;
when k = 4, S_k = 0.6;
when k = 5, S_k = 0.7;
when k = 6, S_k = 0.8.
9. A low-altitude unmanned aerial vehicle target detection system based on deep learning is characterized by comprising:
the acquisition module is used for acquiring optical image information in a target area;
a feature extraction module for performing feature extraction on the optical image information based on the EConvBlock convolution module to obtain a feature map;
and an unmanned aerial vehicle detection module for constructing a plurality of convolution layers at different scales and sequentially processing the feature map to determine whether an unmanned aerial vehicle is present in the optical image information and its position.
CN202010401407.0A, filed 2020-05-13 with priority date 2020-05-13: Low-altitude unmanned aerial vehicle target detection method and system based on deep learning. Status: Pending. Publication: CN111666822A (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010401407.0A CN111666822A (en) 2020-05-13 2020-05-13 Low-altitude unmanned aerial vehicle target detection method and system based on deep learning


Publications (1)

Publication Number Publication Date
CN111666822A (en) 2020-09-15

Family

ID=72383602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010401407.0A Pending CN111666822A (en) 2020-05-13 2020-05-13 Low-altitude unmanned aerial vehicle target detection method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN111666822A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN110660040A (en) * 2019-07-24 2020-01-07 浙江工业大学 Industrial product irregular defect detection method based on deep learning
CN110717527A (en) * 2019-09-24 2020-01-21 东南大学 Method for determining target detection model by combining void space pyramid structure
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid
CN110826575A (en) * 2019-12-13 2020-02-21 哈尔滨工程大学 Underwater target identification method based on machine learning

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210365724A1 (en) * 2020-05-20 2021-11-25 Electronics And Telecommunications Research Institute Object detection system and an object detection method
US11593587B2 (en) * 2020-05-20 2023-02-28 Electronics And Telecommunications Research Institute Object detection system and an object detection method
CN115861938A (en) * 2023-02-06 2023-03-28 北京中超伟业信息安全技术股份有限公司 Unmanned aerial vehicle counter-braking method and system based on unmanned aerial vehicle identification


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200915)