CN115082801B - Airplane model identification system and method based on remote sensing image - Google Patents

Info

Publication number: CN115082801B
Application number: CN202210888641.XA
Authority: CN (China)
Prior art keywords: layer, target, fine, airplane, bilinear
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other versions: CN115082801A
Original language: Chinese (zh)
Inventors: 杨澜, 杨晓冬, 刘建明, 严华
Current and original assignee: Beijing Daoda Tianji Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Filing: application CN202210888641.XA filed by Beijing Daoda Tianji Technology Co., Ltd.; published as CN115082801A; granted and published as CN115082801B (the priority date is an assumption and is not a legal conclusion)

Classifications

    • G06V 20/10 — Terrestrial scenes (G06V 20/00 Scenes; scene-specific elements)
    • G06N 3/02, G06N 3/08 — Neural networks; learning methods (G06N Computing arrangements based on specific computational models; G06N 3/00 based on biological models)
    • G06V 10/764 — Recognition using machine-learning classification, e.g. of video objects
    • G06V 10/765 — Classification using rules for classification or partitioning the feature space
    • G06V 10/82 — Recognition using neural networks
    • G06V 2201/07 — Target detection (indexing scheme)
    • G06V 2201/08 — Detecting or categorising vehicles (indexing scheme)

Abstract

The invention relates to an airplane model identification system and method based on remote sensing images. The system comprises: a target detection module, which detects whether an airplane target exists in the remote sensing image and, if the airplane target exists, sends the detected airplane target to the model identification module; and a model identification module, which predicts the model of the airplane target based on a fine-grained classification network. The invention constructs a combined framework of a target detection module and a bilinear-pooling fine-grained classification network based on dilated (hole) convolution. The framework is suited to identifying the model of an airplane target in a remote sensing image, and can also be applied to identifying targets of similar size and appearance characteristics in images from other fields.

Description

Airplane model identification system and method based on remote sensing image
Technical Field
The invention relates to the technical field of target identification, in particular to an airplane model identification system and method based on remote sensing images.
Background
The mainstream neural network architectures for deep-learning target detection are divided into one-stage and two-stage network models. A one-stage model directly regresses the class probability and position coordinates of the object target; compared with a two-stage model it is faster, but its precision is lower. A two-stage model completes target detection with a convolutional neural network that extracts CNN features, and is trained in two main parts: the first step trains the RPN (region proposal network), and the second step trains the region-based detection network. Its accuracy is therefore high, but its speed is lower than that of a one-stage model.
Most aircraft identification methods for remote sensing images stop at deciding whether a certain target is an airplane, and lack further classification and identification of the airplane's model. Remote sensing images contain complex backgrounds and interference factors that are hard to distinguish from the aircraft target, and although different airplane models do differ to some degree, their common features are prominent and they look very similar, especially in lower-resolution imagery. As a result, neither the one-stage nor the two-stage network structure can accurately and finely distinguish two different models of airplane with similar spatial structures, and practical precision cannot be reached.
Disclosure of Invention
The invention aims to improve the identification precision of the specific model of the airplane target through a combined framework of a target detection module and a fine-grained classification module, and provides an airplane model identification system and method based on a remote sensing image.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
an aircraft model identification system based on remote sensing images, comprising:
the target detection module is used for detecting whether an airplane target exists in the remote sensing image, and if the airplane target exists, the detected airplane target is sent to the model identification module;
and the model identification module is used for predicting the model of the airplane target based on the fine-grained classification network.
The fine-grained classification network comprises 3 neural network units and 1 bilinear output unit which are connected in sequence, wherein the input end of the 1st neural network unit is connected with the output end of the target detection module, and the output end of the 3rd neural network unit is connected with the input end of the bilinear output unit;
each neural network unit comprises a first residual dilated-convolution block, a second residual dilated-convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation-function layer; the first residual dilated-convolution block, the second residual dilated-convolution block and the first convolution layer are connected in sequence, the output end of the first convolution layer and the output end of the second convolution layer are respectively connected with the input end of the fully connected layer, and the fully connected layer, the BN layer, the third convolution layer and the first activation-function layer are connected in sequence.
The first residual dilated-convolution block and the second residual dilated-convolution block each comprise a dilated convolution layer, a BN layer and a second activation-function layer which are connected in sequence.
The bilinear output unit is used for performing transposition operation on the high-dimensional feature matrix output by the 3 rd neural network unit, and performing outer product on the high-dimensional feature matrix before the transposition operation and the high-dimensional feature matrix after the transposition operation to obtain the fused bilinear feature.
The classification loss function of the fine-grained classification network is as follows:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_j}}

wherein x_i represents an input sample, namely the i-th aircraft target fed into the fine-grained classification network, N represents the total number of input samples, and i ∈ N; y_i represents the label truth value, namely the real model-class label of the i-th aircraft target; j represents the j-th model class, M represents the total number of model classes, and j ∈ M; y_ci is the prediction output of the fine-grained classification network, indicating that the i-th aircraft target belongs to model class c, with c ∈ M; m represents the angular margin between aircraft targets of different model classes input into the fine-grained classification network; θ_{y_i} denotes the cosine angle of y_i; and θ_j denotes the angle between the prediction output y_ci and the input sample x_i.
An airplane model identification method based on remote sensing images comprises the following steps:
the method comprises the following steps that S1, a target detection module detects whether an airplane target exists in a remote sensing image, and if the airplane target exists, the detected airplane target is sent to a model identification module;
and S2, predicting the model of the airplane target by the model recognition module based on the fine-grained classification network.
The model identification module predicts the model of the airplane target based on the fine-grained classification network, and comprises the following steps:
the fine-grained classification network carries out feature extraction on the airplane target, outputs bilinear features of the airplane target, and predicts the model of the airplane target according to the bilinear features;
the fine-grained classification network comprises 3 neural network units and 1 bilinear output unit which are connected in sequence; the input end of the 1 st neural network unit is connected with the output end of the target detection module, and the output end of the 3 rd neural network unit is connected with the input end of the bilinear output unit; the 3 neural network units output a high-dimensional characteristic matrix of the airplane target, and the bilinear output unit outputs bilinear characteristics of the airplane target.
Each neural network unit comprises a first residual cavity rolling block, a second residual cavity rolling block, a first rolling layer, a second rolling layer, a full connection layer, a BN layer, a third rolling layer and a first activation function layer; the first residual cavity rolling block, the second residual cavity rolling block and the first rolling layer are sequentially connected, the output end of the first rolling layer and the output end of the second rolling layer are respectively connected with the input end of the full-connection layer, and the full-connection layer, the BN layer, the third rolling layer and the first activation function layer are sequentially connected.
The step of outputting bilinear characteristics of the aircraft target by the bilinear output unit comprises the following steps: and the bilinear output unit transposes the high-dimensional feature matrix output by the 3 rd neural network unit, and then performs outer product on the high-dimensional feature matrix before the transposing operation and the high-dimensional feature matrix after the transposing operation to obtain the fused bilinear feature.
The classification loss function of the fine-grained classification network is as follows:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_j}}

wherein x_i represents an input sample, namely the i-th aircraft target fed into the fine-grained classification network, N represents the total number of input samples, and i ∈ N; y_i represents the label truth value, namely the real model-class label of the i-th aircraft target; j represents the j-th model class, M represents the total number of model classes, and j ∈ M; y_ci is the prediction output of the fine-grained classification network, indicating that the i-th aircraft target belongs to model class c, with c ∈ M; m represents the angular margin between aircraft targets of different model classes input into the fine-grained classification network; θ_{y_i} denotes the cosine angle of y_i; and θ_j denotes the angle between the prediction output y_ci and the input sample x_i.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention constructs a combined framework of a target detection module and a bilinear-pooling fine-grained classification network based on dilated convolution. The framework is suited to identifying the model of an airplane target in a remote sensing image, and can also be applied to identifying targets of similar size and appearance characteristics in images from other fields.
(2) The invention extracts bilinear features with a fine-grained classification network structure based on dilated convolution. Feature sampling is performed by a neural network unit that combines ordinary convolution with dilated convolution and uses residual connections, which both reduces the parameter count and avoids the loss of key data; at the output end, the extracted high-dimensional feature matrix is transposed and the feature computed after fusion by outer product is output.
(3) For the fine-grained classification loss, where inter-class distances are small, the invention adds a cosine angular margin to enlarge the classification boundary, increase the inter-class spacing and compact the intra-class distance, thereby achieving identification of the specific airplane model rather than simple classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a block diagram of an aircraft model identification system according to the present invention;
fig. 2 is a schematic structural diagram of a fine-grained classification network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a neural network unit according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a residual dilated-convolution block according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Also, in the description of the present invention, the terms "first", "second", and the like are used for distinguishing between descriptions and not necessarily for describing a relative importance or implying any actual relationship or order between such entities or operations.
The embodiment is as follows:
the invention is realized by the following technical scheme, as shown in figure 1, the airplane model identification system based on the remote sensing image comprises a target detection module and a model identification module. The target detection module is used for detecting whether an airplane target exists in the input remote sensing image, and if the airplane target exists in the input remote sensing image, the detected airplane target is sent to the model identification module. The model identification module is used for predicting the model of the airplane target based on the fine-grained classification network.
Aircraft detection in a remote sensing image is a single-class target detection task within deep-learning visual algorithms. The target detection module of this scheme is based on the YOLOv5 algorithm. YOLOv5 belongs to the one-stage network models; it offers excellent performance and high precision, with particularly improved small-target detection precision, so it is very suitable for aircraft target detection in remote sensing images. Referring to fig. 1, the output of the target detection module includes coordinate information [xmin, ymin, xmax, ymax] and category information [class, conf] (class denotes the classification, conf the confidence). Since the target detection module is only used to decide whether an airplane target exists in the remote sensing image, only the coordinate information [xmin, ymin, xmax, ymax] is of interest here, where xmin and ymin denote the coordinates of the lower-left vertex of the box in which the airplane target lies, and xmax and ymax the coordinates of the upper-right vertex.
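As a small illustration of how the coordinate information is consumed downstream, the sketch below slices a detected target out of an image stored as rows of pixels. The function name and the toy image are assumptions for demonstration, not part of the patent.

```python
def crop_target(image, xmin, ymin, xmax, ymax):
    """Cut the axis-aligned box [xmin, ymin, xmax, ymax] out of an
    image stored as a list of pixel rows (row index = y)."""
    return [row[xmin:xmax] for row in image[ymin:ymax]]

# toy 4 x 6 "image" whose pixel value records its (y, x) position
img = [[(y, x) for x in range(6)] for y in range(4)]
chip = crop_target(img, xmin=2, ymin=1, xmax=5, ymax=3)  # 2 rows x 3 cols
```

The same slice works unchanged for per-channel arrays; only the row/column convention (row index = y) must match the detector's output.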
According to the obtained coordinate information, the original remote sensing image is cropped to obtain the region in which the airplane target lies, and the cropped airplane target is sent to the fine-grained classification network.
In remote sensing images, the appearance differences between different airplane models are small; their prominent common features make them look very alike, and this is especially affected by spatial resolution. Ordinary convolution reduces image resolution, and spatial-hierarchy information is lost during downsampling (pooling). An airplane target in a remote sensing image is usually small, so if ordinary convolution were used for sampled learning, the effective feature information extracted after several convolution layers would be heavily compressed. Since the spacing between airplane model classes is also small, a generic image-classification task struggles to finely identify the specific model of the airplane target.
Therefore, the fine-grained classification network used by the model identification module is a bilinear-pooling fine-grained classification network based on dilated convolution. Dilated (hole) convolution expands the receptive field by injecting holes into ordinary convolution; compared with ordinary convolution it has an extra parameter, the dilation rate, which defines the spacing between the values sampled by the convolution kernel.
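The dilation idea can be sketched minimally in one dimension, a simplification assumed here for readability: the kernel taps are spaced `rate` samples apart, so the receptive field grows with the rate while the parameter count stays the same. `dilated_conv1d` is an illustrative helper, not the patent's implementation.

```python
def dilated_conv1d(signal, kernel, rate):
    """Valid-mode 1-D convolution whose kernel taps are spaced
    `rate` samples apart; rate=1 reproduces ordinary convolution."""
    span = rate * (len(kernel) - 1) + 1  # receptive field of one output value
    return [sum(kernel[k] * signal[s + k * rate] for k in range(len(kernel)))
            for s in range(len(signal) - span + 1)]

x = [1, 2, 3, 4, 5, 6]
ordinary = dilated_conv1d(x, [1, 1, 1], rate=1)  # taps at offsets 0, 1, 2
dilated = dilated_conv1d(x, [1, 1, 1], rate=2)   # taps at offsets 0, 2, 4
```

With rate 2 each output already covers 5 input samples with only 3 weights, which is exactly the field-of-view expansion the text describes.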
Referring to fig. 2, the structure of the fine-grained classification network includes 3 neural network units and 1 bilinear output unit, where the 3 neural network units are a first neural network unit, a second neural network unit, and a third neural network unit, respectively, and the structures of the neural network units are the same. The first neural network unit, the second neural network unit and the third neural network unit are sequentially connected, and the output end of the third neural network unit is connected with the input end of the bilinear output unit.
Please refer to fig. 3, which shows the structure of a neural network unit: a first residual dilated-convolution block, a second residual dilated-convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation-function layer, where the two residual dilated-convolution blocks have the same structure. The first residual dilated-convolution block, the second residual dilated-convolution block and the first convolution layer are connected in sequence; the output ends of the first and second convolution layers are each connected to the input end of the fully connected layer; and the fully connected layer, the BN layer, the third convolution layer and the first activation-function layer are connected in sequence.
Referring to fig. 4, the residual dilated-convolution block comprises a dilated convolution layer, a BN layer and a second activation-function layer connected in sequence. The dilated convolution layer performs the downsampling, with the stride set to 2, which preserves texture features better than pooling.
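The effect of a stride-2 (dilated) convolution on spatial size can be traced with the standard convolution size formula. Kernel size 3, padding 1, and one stride-2 stage per neural network unit are assumptions made for this sketch; the patent only fixes the stride at 2.

```python
import math

def conv_out_size(n, kernel=3, stride=2, padding=1, rate=1):
    """Spatial size after a (possibly dilated) convolution layer;
    with stride 2 the layer halves resolution in place of pooling."""
    eff = rate * (kernel - 1) + 1  # effective (dilated) kernel size
    return math.floor((n + 2 * padding - eff) / stride) + 1

# assumed: one stride-2 stage per neural network unit, 112 -> 56 -> 28 -> 14
sizes = [112]
for _ in range(3):
    sizes.append(conv_out_size(sizes[-1]))
```

Note that with padding matched to the dilation (e.g. padding 2 for rate 2), the dilated layer halves the resolution exactly like the ordinary one, which is what lets it replace pooling without changing the feature-map geometry.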
Referring to fig. 2, the bilinear output unit transposes the high-dimensional feature matrix output by the 3rd neural network unit and takes the outer product of the matrix before transposition with the matrix after transposition to obtain the fused bilinear feature: X(I) = A(I) · A(I)ᵀ, where X(I) is the bilinear feature, A(I) is the high-dimensional feature matrix before the transpose operation, and A(I)ᵀ is the high-dimensional feature matrix after the transpose operation.
A higher-order feature representation is then obtained after square-root and two-norm (L2) normalization operations. Bilinear pooling provides a stronger feature representation than a linear model, so the classification accuracy of this higher-order representation is far above that of an ordinary one-stage network model.
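The bilinear output computation above can be sketched as follows: a Gram-style outer product of the feature matrix with its transpose, flattened, then the square-root and L2 normalization steps. The C × HW matrix layout, the signed variant of the square root, and the helper name are assumptions of this sketch.

```python
import math

def bilinear_feature(A):
    """X = A · Aᵀ for a C x HW feature matrix A, flattened, then
    signed square-root and two-norm (L2) normalised."""
    C, HW = len(A), len(A[0])
    X = [[sum(A[i][k] * A[j][k] for k in range(HW)) for j in range(C)]
         for i in range(C)]
    flat = [v for row in X for v in row]
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]  # signed sqrt
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]                             # L2 normalise

feat = bilinear_feature([[1.0, 2.0], [3.0, 4.0]])  # C = 2 -> 4 bilinear values
```

Because X is symmetric, the flattened feature contains each channel pair twice; practical implementations often keep only the upper triangle, a detail left out here.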
Before the fine-grained classification network is used, it needs to be trained; its classification loss function is as follows:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_j}}

wherein x_i represents an input sample, namely the i-th aircraft target fed into the fine-grained classification network, N represents the total number of input samples, and i ∈ N; y_i represents the label truth value, namely the real model-class label of the i-th aircraft target; j represents the j-th model class, M represents the total number of model classes, and j ∈ M; y_ci is the prediction output of the fine-grained classification network, indicating that the i-th aircraft target belongs to model class c, with c ∈ M; m represents the angular margin between aircraft targets of different model classes input into the fine-grained classification network; θ_{y_i} denotes the cosine angle of y_i; and θ_j denotes the angle between the prediction output y_ci and the input sample x_i.
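One common way to realise a loss of this shape — a hedged sketch, not necessarily the patent's exact formula — is cross-entropy over cosine logits with an additive angular margin m applied to the true class, in the style of ArcFace-type losses:

```python
import math

def angular_margin_loss(thetas, true_class, m=0.5):
    """Cross-entropy over cos(theta_j) logits, where the additive
    angular margin m widens the decision boundary of the true class;
    thetas[j] is the angle between the sample feature and class j."""
    logits = [math.cos(t) for t in thetas]
    logits[true_class] = math.cos(thetas[true_class] + m)  # enlarge the boundary
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[true_class] / sum(exps))

# the margin raises the loss for the same angles, forcing larger separation
base = angular_margin_loss([0.3, 1.2, 1.4], true_class=0, m=0.0)
margined = angular_margin_loss([0.3, 1.2, 1.4], true_class=0, m=0.5)
```

Minimising the margined loss pushes same-class features into a tighter angular cone while pushing different classes further apart, which matches the "increase inter-class spacing, compact intra-class distance" effect described in the text.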
Based on the above system, and referring to fig. 1, the scheme also provides an airplane model identification method based on remote sensing images, comprising the following steps:
step S1, a target detection module detects whether an airplane target exists in the remote sensing image, and if the airplane target exists, the detected airplane target is sent to a model identification module.
The size of a remote sensing image is usually far larger than that of a natural-scene image. If it were fed directly to the target detection module, airplane targets would be severely compressed, and for small targets such as airplanes the detection network could hardly learn any feature information. The ultra-wide remote sensing image therefore needs to be cut in advance. To address the problem of a single target being cut in two, the scheme sets a fixed tile size and slides over the remote sensing image with a fixed step, producing N small images with an overlap region between adjacent tiles; the cropped small images are then input to the target detection module.
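The sliding-window cutting above can be sketched as below. The tile size of 1024 and overlap of 128 pixels are assumed values for illustration; the patent only speaks of "a certain size" and "a certain step length".

```python
def tile_origins(width, height, tile=1024, overlap=128):
    """Top-left corners of sliding-window tiles covering a large image,
    with `overlap` pixels shared by adjacent tiles so that a target cut
    at one tile border also appears whole in a neighbouring tile."""
    def starts(length):
        step = tile - overlap
        s = list(range(0, max(length - tile, 0) + 1, step))
        if s[-1] + tile < length:      # final tile flush with the border
            s.append(length - tile)
        return s
    return [(x, y) for y in starts(height) for x in starts(width)]

origins = tile_origins(2500, 1100, tile=1024, overlap=128)
```

Detections from each tile are then mapped back to full-image coordinates by adding the tile's origin, and duplicates in the overlap regions are merged (e.g. by non-maximum suppression, a standard step not spelled out in the patent).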
After this image preprocessing, the N generated small images are input to the target detection module in sequence; the target detection module predicts the coordinate information [xmin, ymin, xmax, ymax] of each airplane target in the small images, and the airplane target is sliced out according to the coordinate information.
And S2, the model identification module predicts the model of the airplane target based on the fine-grained classification network.
The input image size of the fine-grained classification network is set to 112 × 112. To ensure the airplane target is not stretched or deformed when scaled, it undergoes a padding-and-scaling process before being input to the fine-grained classification network. Specifically, a base image filled with pixel value 128 is generated with side length equal to the longer side W/H (width or height) of the airplane-target image; the airplane-target image is then pasted onto the base image to produce a square (W = H) image, so that when it is scaled to 112 × 112 the airplane's shape is scaled proportionally.
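The padding step can be sketched as follows for a single-channel image. Centring the target on the base image is an assumption of this sketch (the patent only says the image is pasted onto the base), and the final resize to 112 × 112 is left to any standard image library.

```python
def letterbox(image, pad_value=128):
    """Paste an H x W image onto a square base filled with `pad_value`
    so that a later resize to 112 x 112 keeps the aspect ratio."""
    h, w = len(image), len(image[0])
    side = max(h, w)                                # longer side W/H
    base = [[pad_value] * side for _ in range(side)]
    top, left = (side - h) // 2, (side - w) // 2    # centre the target (assumed)
    for y in range(h):
        base[top + y][left:left + w] = image[y]
    return base

square = letterbox([[1, 2, 3, 4], [5, 6, 7, 8]])    # 2 x 4 -> 4 x 4
```

Because both sides of the square are now equal, the subsequent scaling to 112 × 112 multiplies width and height by the same factor, so the airplane's proportions are preserved.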
And the fine-grained classification network extracts the characteristics of the airplane target, outputs bilinear characteristics of the airplane target, and predicts the model of the airplane target according to the bilinear characteristics.
The fine-grained classification network comprises 3 neural network units and 1 bilinear output unit which are connected in sequence; the input end of the 1st neural network unit is connected with the output end of the target detection module, and the output end of the 3rd neural network unit is connected with the input end of the bilinear output unit; the 3 neural network units output a high-dimensional feature matrix of the airplane target, and the bilinear output unit outputs the bilinear feature of the airplane target.
Each neural network unit comprises a first residual dilated-convolution block, a second residual dilated-convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation-function layer; the first residual dilated-convolution block, the second residual dilated-convolution block and the first convolution layer are connected in sequence, the output end of the first convolution layer and the output end of the second convolution layer are respectively connected with the input end of the fully connected layer, and the fully connected layer, the BN layer, the third convolution layer and the first activation-function layer are connected in sequence.
The step of outputting bilinear characteristics of the aircraft target by the bilinear output unit comprises the following steps: and the bilinear output unit transposes the high-dimensional feature matrix output by the 3 rd neural network unit, and then performs outer product on the high-dimensional feature matrix before the transposing operation and the high-dimensional feature matrix after the transposing operation to obtain the fused bilinear feature.
The classification loss function of the fine-grained classification network is as follows:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{\cos(\theta_{y_i}+m)}}{e^{\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{M}e^{\cos\theta_j}}

wherein x_i represents an input sample, namely the i-th aircraft target fed into the fine-grained classification network, N represents the total number of input samples, and i ∈ N; y_i represents the label truth value, namely the real model-class label of the i-th aircraft target; j represents the j-th model class, M represents the total number of model classes, and j ∈ M; y_ci is the prediction output of the fine-grained classification network, indicating that the i-th aircraft target belongs to model class c, with c ∈ M; m represents the angular margin between aircraft targets of different model classes input into the fine-grained classification network; θ_{y_i} denotes the cosine angle of y_i; and θ_j denotes the angle between the prediction output y_ci and the input sample x_i.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (3)

1. An aircraft model identification system based on remote sensing image which characterized in that: the method comprises the following steps:
the target detection module is used for detecting whether an airplane target exists in the remote sensing image or not, and if the airplane target exists, the detected airplane target is sent to the model identification module;
the model identification module is used for predicting the model of the airplane target based on the fine-grained classification network;
the fine-grained classification network comprises 3 neural network units and 1 bilinear output unit which are connected in sequence, wherein the input end of the 1 st neural network unit is connected with the output end of the target detection module, and the output end of the 3 rd neural network unit is connected with the input end of the bilinear output unit;
each neural network unit comprises a first residual dilated-convolution block, a second residual dilated-convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation function layer; the first residual dilated-convolution block, the second residual dilated-convolution block and the first convolution layer are connected in sequence, the output end of the first convolution layer and the output end of the second convolution layer are each connected to the input end of the fully connected layer, and the fully connected layer, the BN layer, the third convolution layer and the first activation function layer are connected in sequence;
the bilinear output unit is used for performing transposition operation on the high-dimensional feature matrix output by the 3 rd neural network unit, and performing outer product on the high-dimensional feature matrix before the transposition operation and the high-dimensional feature matrix after the transposition operation to obtain fused bilinear features;
the classification loss function of the fine-grained classification network is as follows:

L = -(1/N) · Σ_{i=1}^{N} log [ e^{cos(θ_{y_i} + m)} / ( e^{cos(θ_{y_i} + m)} + Σ_{j=1, j≠y_i}^{M} e^{cos θ_{j,i}} ) ]

wherein x_i denotes an input sample, i.e., the ith aircraft target fed into the fine-grained classification network; N denotes the total number of input samples, and i ∈ N; y_i denotes the label truth value, i.e., the true model-type label of the ith aircraft target fed into the fine-grained classification network; j denotes the jth model type, M denotes the total number of model types, and j ∈ M; y_ci is the prediction output of the fine-grained classification network, indicating that the ith aircraft target belongs to the cth model class, and c ∈ M; m denotes the angular margin separating aircraft targets of different model classes fed into the fine-grained classification network; θ_{y_i} denotes the angle corresponding to y_i; and θ_{j,i} denotes the angle between the prediction output y_ci and the input sample x_i.
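The bilinear fusion recited in claim 1 (transposing the high-dimensional feature matrix and taking the outer product of the matrix before and after transposition) can be sketched as follows. The signed-square-root and L2 normalisation steps are conventional bilinear-CNN practice added here for illustration only; the claim itself recites only the transpose and outer product.

```python
import numpy as np

def bilinear_pool(feat):
    """Fuse a high-dimensional feature matrix into bilinear features.

    feat: (C, HW) matrix from the 3rd neural network unit, with C channels
          flattened over HW spatial positions.
    """
    b = feat @ feat.T / feat.shape[1]                 # outer product of the matrix with its transpose
    b = np.sign(b) * np.sqrt(np.abs(b))               # signed square root (conventional, assumed)
    return (b / (np.linalg.norm(b) + 1e-12)).ravel()  # L2-normalised bilinear feature vector
```

For a C-channel feature map this yields a C×C (flattened) descriptor that captures pairwise channel interactions, which is what makes bilinear features effective for fine-grained distinctions between aircraft models.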
2. An aircraft model identification system based on remote sensing images according to claim 1, characterized in that: the first residual dilated-convolution block or the second residual dilated-convolution block comprises a dilated convolution layer, a BN layer and a second activation function layer which are connected in sequence.
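A minimal sketch of the block recited in claim 2: a dilated convolution followed by a BN layer and an activation function, wrapped in a residual skip connection. The single-channel 1-D form, the per-vector normalisation standing in for BN, and ReLU as the activation are this example's assumptions, not features recited in the claim.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded single-channel 1-D dilated convolution."""
    k = w.size
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    # each output tap reads inputs spaced `dilation` apart, enlarging the receptive field
    return np.array([np.dot(w, xp[i:i + dilation * k:dilation])
                     for i in range(x.size)])

def residual_dilated_block(x, w, dilation=2):
    """Dilated convolution -> BN stand-in -> activation, inside a skip connection."""
    y = dilated_conv1d(x, w, dilation)
    y = (y - y.mean()) / (y.std() + 1e-5)  # normalisation standing in for the BN layer
    y = np.maximum(y, 0.0)                 # activation function layer (ReLU assumed)
    return x + y                           # residual skip connection
```

The dilation enlarges the receptive field without extra parameters, which suits remote sensing imagery where aircraft parts span many pixels; the residual path keeps gradients flowing through the stacked blocks.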
3. An airplane model identification method based on remote sensing images, characterized in that it comprises the following steps:
S1, a target detection module detects whether an airplane target exists in the remote sensing image; if an airplane target exists, the detected airplane target is sent to a model identification module;
s2, predicting the model of the airplane target by a model identification module based on a fine-grained classification network;
the model identification module predicts the model of the airplane target based on the fine-grained classification network, and comprises the following steps:
extracting the characteristics of the airplane target by a fine-grained classification network, outputting bilinear characteristics of the airplane target, and predicting the model of the airplane target according to the bilinear characteristics;
the fine-grained classification network comprises 3 neural network units and 1 bilinear output unit which are connected in sequence; the input end of the 1 st neural network unit is connected with the output end of the target detection module, and the output end of the 3 rd neural network unit is connected with the input end of the bilinear output unit; the 3 neural network units output a high-dimensional characteristic matrix of the airplane target, and the bilinear output unit outputs bilinear characteristics of the airplane target;
each neural network unit comprises a first residual dilated-convolution block, a second residual dilated-convolution block, a first convolution layer, a second convolution layer, a fully connected layer, a BN layer, a third convolution layer and a first activation function layer; the first residual dilated-convolution block, the second residual dilated-convolution block and the first convolution layer are connected in sequence, the output end of the first convolution layer and the output end of the second convolution layer are each connected to the input end of the fully connected layer, and the fully connected layer, the BN layer, the third convolution layer and the first activation function layer are connected in sequence;
the step in which the bilinear output unit outputs the bilinear features of the aircraft target comprises: the bilinear output unit performs a transposition operation on the high-dimensional feature matrix output by the 3rd neural network unit, and then takes the outer product of the high-dimensional feature matrix before the transposition operation and the high-dimensional feature matrix after the transposition operation to obtain the fused bilinear features;
the classification loss function of the fine-grained classification network is as follows:

L = -(1/N) · Σ_{i=1}^{N} log [ e^{cos(θ_{y_i} + m)} / ( e^{cos(θ_{y_i} + m)} + Σ_{j=1, j≠y_i}^{M} e^{cos θ_{j,i}} ) ]

wherein x_i denotes an input sample, i.e., the ith aircraft target fed into the fine-grained classification network; N denotes the total number of input samples, and i ∈ N; y_i denotes the label truth value, i.e., the true model-type label of the ith aircraft target fed into the fine-grained classification network; j denotes the jth model type, M denotes the total number of model types, and j ∈ M; y_ci is the prediction output of the fine-grained classification network, indicating that the ith aircraft target belongs to the cth model class, and c ∈ M; m denotes the angular margin separating aircraft targets of different model classes fed into the fine-grained classification network; θ_{y_i} denotes the angle corresponding to y_i; and θ_{j,i} denotes the angle between the prediction output y_ci and the input sample x_i.
CN202210888641.XA 2022-07-27 2022-07-27 Airplane model identification system and method based on remote sensing image Active CN115082801B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210888641.XA CN115082801B (en) 2022-07-27 2022-07-27 Airplane model identification system and method based on remote sensing image


Publications (2)

Publication Number Publication Date
CN115082801A CN115082801A (en) 2022-09-20
CN115082801B true CN115082801B (en) 2022-10-25

Family

ID=83242402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210888641.XA Active CN115082801B (en) 2022-07-27 2022-07-27 Airplane model identification system and method based on remote sensing image

Country Status (1)

Country Link
CN (1) CN115082801B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086792A (en) * 2018-06-26 2018-12-25 上海理工大学 Based on the fine granularity image classification method for detecting and identifying the network architecture
WO2021129691A1 (en) * 2019-12-23 2021-07-01 长沙智能驾驶研究院有限公司 Target detection method and corresponding device
CN113989662A (en) * 2021-10-18 2022-01-28 中国电子科技集团公司第五十二研究所 Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN114140683A (en) * 2020-08-12 2022-03-04 天津大学 Aerial image target detection method, equipment and medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FGATR-Net: Automatic Network Architecture Design for Fine-Grained Aircraft Type Recognition in Remote Sensing Images; Wei Liang et al.; Remote Sensing; 2020-12-31; full text *
Research on fine-grained classification of aircraft targets based on remote sensing images; Zeng Yanqing; China Master's Theses Full-text Database, Engineering Science and Technology II; 2022-02-15; full text *


Similar Documents

Publication Publication Date Title
CN109919031B (en) Human behavior recognition method based on deep neural network
CN108875708A (en) Behavior analysis method, device, equipment, system and storage medium based on video
CN112381097A (en) Scene semantic segmentation method based on deep learning
CN113673510B (en) Target detection method combining feature point and anchor frame joint prediction and regression
CN111738054B (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN111340034B (en) Text detection and identification method and system for natural scene
CN106408030A (en) SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN111652171B (en) Construction method of facial expression recognition model based on double branch network
CN112766229B (en) Human face point cloud image intelligent identification system and method based on attention mechanism
CN112087443B (en) Sensing data anomaly detection method under physical attack of industrial sensing network information
CN113723328B (en) Graph document panel analysis and understanding method
Yang et al. HCNN-PSI: A hybrid CNN with partial semantic information for space target recognition
CN111401293A (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
CN113283409A (en) Airplane detection method in aerial image based on EfficientDet and Transformer
CN111401149A (en) Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
Wu et al. Single shot multibox detector for vehicles and pedestrians detection and classification
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN115082801B (en) Airplane model identification system and method based on remote sensing image
CN111626197B (en) Recognition method based on human behavior recognition network model
CN110348395B (en) Skeleton behavior identification method based on space-time relationship
CN116403286A (en) Social grouping method for large-scene video
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model
CN114913337A (en) Camouflage target frame detection method based on ternary cascade perception
CN113870241A (en) Tablet defect identification method and device based on capsule neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant