CN110334765A - Remote sensing image classification method based on attention-mechanism multi-scale deep learning - Google Patents

Remote sensing image classification method based on attention-mechanism multi-scale deep learning

Info

Publication number
CN110334765A
CN110334765A CN201910603799.6A CN201910603799A
Authority
CN
China
Prior art keywords
neural networks
convolutional neural
remote sensing
training
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910603799.6A
Other languages
Chinese (zh)
Other versions
CN110334765B (en)
Inventor
唐旭
马秋硕
马晶晶
焦李成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Electronic Science and Technology
Original Assignee
Xian University of Electronic Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Electronic Science and Technology filed Critical Xian University of Electronic Science and Technology
Priority to CN201910603799.6A priority Critical patent/CN110334765B/en
Publication of CN110334765A publication Critical patent/CN110334765A/en
Application granted granted Critical
Publication of CN110334765B publication Critical patent/CN110334765B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a remote sensing image classification method based on attention-mechanism multi-scale deep learning, which mainly solves the problem of low classification accuracy in the prior art. The scheme is: establish a remote sensing image library and the corresponding classes of the library, and randomly select 80% of the remote sensing image samples from each class after normalization to build a training image library; construct a convolutional neural network comprising a convolution module, an attention module, SCDA modules and fully connected layers; input the training samples of the training image library into the convolutional neural network to obtain their classification results, and determine the loss function of the network; iteratively update the loss function by gradient descent until the loss value stabilizes, obtaining a trained convolutional neural network; normalize the remote sensing image to be classified and input it into the trained convolutional neural network to obtain the classification result. The invention has high classification accuracy and strong robustness, and can be applied to the analysis and management of remote sensing image data.

Description

Remote sensing image classification method based on attention-mechanism multi-scale deep learning
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a remote sensing image classification method that can be applied to the analysis and management of remote sensing image data.
Background art
As the resolution of satellite and aerial remote sensing images continues to improve, more useful data and information can be obtained from remote sensing images. Different applications also place different requirements on remote sensing image processing, so in order to effectively analyze and manage these remote sensing image data, semantic labels need to be attached to images according to their content. Scene classification is an important way of solving this kind of problem: it distinguishes, among many images, those with similar scene features and classifies them correctly. Compared with natural images, remote sensing images have characteristics of their own: owing to the limited spatial resolution of remote sensing images and the phenomena of the same object appearing with different spectra and different objects appearing with the same spectrum, classification results often suffer from misclassification, which is caused by the complexity of the remote sensing images themselves. How to classify remote sensing images more accurately has therefore become a challenge.
Classification based on convolutional neural networks means that the images to be used for training are input into the convolutional neural network in batches, and through repeated training on large amounts of data the objective loss function is reduced, thereby achieving classification. Many mature and well-known convolutional neural networks have been proposed; for example, in 2012 Alex Krizhevsky proposed the deep convolutional network model "AlexNet".
Although existing convolutional neural networks can accomplish the task of image scene classification, two deficiencies remain when they learn image semantic information: first, the class-relevant information is localized inaccurately because of the complexity of remote sensing images; second, convolutional neural networks often fall into local salient regions during training, as shown in Fig. 1. These two deficiencies lead to poor robustness and frequent misclassification in actual scene classification.
Summary of the invention
Aiming at the above problems of the prior art, the present invention proposes a remote sensing image classification method based on attention-mechanism multi-scale deep learning, which enlarges the attention region of the convolutional network so as to reduce the probability that remote sensing image classification targets fall into local regions and to improve the classification accuracy of remote sensing images.
The technical idea of the invention is: obtain the convolutional features of an image with a convolutional neural network; following the attention mechanism principle, use the attention mechanism to obtain the useful information conducive to classification; extract multi-scale convolutional-layer features from the useful information; and realize image classification through a fully connected network.
According to the above idea, the implementation steps of the invention include the following:
(1) Establish a remote sensing image library {I_1, I_2, …, I_n, …, I_N} with corresponding classes {Y_1, Y_2, …, Y_n, …, Y_N}, and normalize the established library, where n is the index of the n-th sample in the library, n ∈ [0, N], and N is the number of images in the remote sensing image library;
(2) Randomly select 80% of the samples from each class of normalized images to construct a training image library {T_1, T_2, …, T_j, …, T_M}, where M < N, T_j is the j-th image in the training image library, j ∈ [0, M], and M is the total number of training samples;
(3) Construct a convolutional neural network comprising a convolution module, an attention module, SCDA modules and fully connected layers;
(4) Determine the loss function of the convolutional neural network:
(4a) Input the training image library {T_1, T_2, …, T_j, …, T_M} into the convolutional layers with pre-trained weights, and output the last-layer convolutional feature F;
(4b) Input the last-layer feature F into the attention module of the convolutional neural network to output the convolutional feature A, then input the convolutional feature A into multiple SCDA modules of the convolutional neural network with different mean thresholds, and output T groups of mask convolution features {M_1, M_2, …, M_T}, where T is the number of SCDA modules;
(4c) Pass the T groups of mask convolution features through global average pooling and input them into the fully connected layers of the convolutional neural network, output the classification results of the training data, and obtain the loss function of the convolutional neural network:
loss_op = λ_r·loss_1 + λ_s·loss_2 + η·‖W‖_2
where loss_1 is the cross entropy between the output classification results and the actual results, loss_2 is the sum of the absolute values of the cross entropies between the classification results output by the T groups of mask convolution features after the fully connected layers and the actual results, ‖W‖_2 is the L2 norm of the weight vector of the convolutional neural network, and λ_r, λ_s and η are the hyperparameters of loss_1, loss_2 and ‖W‖_2, respectively;
(5) Set the number of iterations to P and iteratively train the convolutional neural network by gradient-descent optimization until the loss function loss_op no longer declines or the number of training rounds reaches the number of iterations, obtaining a trained convolutional neural network;
(6) The user normalizes the remote sensing image I' to be classified and inputs it into the trained convolutional neural network to obtain the classification result, completing image classification.
Compared with the prior art, the present invention has the following advantages:
1. Based on the attention mechanism principle, the invention can quickly find the more salient features in remote sensing images and concentrate the features used for classification on a region with clear semantic information, enhancing the accuracy of remote sensing image scene classification;
2. The invention uses SCDA modules, which enlarge the receptive field of the convolutional neural network, reduce the probability that remote sensing image classification targets fall into local regions, and enhance the accuracy and robustness of remote sensing image classification;
3. The invention designs a loss function that further clarifies the classification task and improves the accuracy of remote sensing image classification.
Brief description of the drawings
Fig. 1 is the implementation flowchart of the invention;
Fig. 2 is the structure diagram of the convolutional neural network constructed in the invention;
Fig. 3 shows samples of the remote sensing images used in the simulation of the invention.
Specific embodiment
The embodiments and effects of the present invention are described in further detail below with reference to the drawings.
Referring to Fig. 1, the implementation steps of the invention are as follows:
Step 1: Establish the remote sensing image library and obtain the training and test samples.
1a) Download the UC Merced images from their official website and establish a remote sensing image library {I_1, I_2, …, I_n, …, I_N} with corresponding classes {Y_1, Y_2, …, Y_n, …, Y_N}, where I_n is the n-th image in the library, Y_n is the class corresponding to the n-th image, n is the index of the n-th sample, n ∈ [0, N];
1b) Normalize the established remote sensing image library according to the following formula:
I'_n = I_n / (V_max − V_min)
where V_max is the maximum pixel value over all images in the remote sensing image library, V_min is the minimum pixel value over all images in the library, {I'_1, I'_2, …, I'_n, …, I'_N} is the normalized remote sensing image library, I'_n is the n-th sample of the normalized image set, n ∈ [0, N], and N is the number of images in the normalized library;
1c) Randomly select 80% of the remote sensing images from each class of the normalized library as the training sample set {T_1, T_2, …, T_j, …, T_M}, and use the remaining 20% of the remote sensing images as the test sample set {t_1, t_2, …, t_d, …, t_m}, where T_j is the j-th sample in the training set, j ∈ [0, M], t_d is the d-th sample in the test set, d ∈ [0, m], M is the total number of training samples, m is the total number of test samples, m < N, M < N.
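For illustration only, a minimal Python sketch of steps 1b) and 1c) follows; the array layout, the `images`/`labels` variables and the per-class split routine are assumptions of this sketch, and the normalization divides by V_max − V_min as described in step 5a). It is not an authoritative implementation of the patented method.

```python
import numpy as np

def normalize_library(images):
    """Step 1b): divide every pixel by (V_max - V_min) computed over the whole library."""
    v_max = images.max()
    v_min = images.min()
    return images.astype(np.float32) / (v_max - v_min)

def split_per_class(images, labels, train_ratio=0.8, seed=0):
    """Step 1c): randomly keep 80% of each class for training and 20% for testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        cut = int(train_ratio * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return (images[train_idx], labels[train_idx]), (images[test_idx], labels[test_idx])
```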
Step 2: Construct the convolutional neural network.
Referring to Fig. 2, this step is accomplished as follows:
2a) Set up the convolution module, which consists of the five sequentially connected convolutional layers {conv1, conv2, conv3, conv4, conv5} of a pre-trained AlexNet network;
2b) Set up the attention module, which consists of a global average pooling layer, a first fully connected layer, a ReLU activation layer, a second fully connected layer and a Sigmoid function, with the structure shown in Fig. 3;
The global average pooling layer: its input is a convolutional feature of size W × H × C; it averages each of the C feature maps of size W × H and outputs a 1 × 1 × C convolutional feature;
The first fully connected layer: its kernel size is set to 1 × 1 × C'/16, where C' is the feature dimension input to the first fully connected layer;
The second fully connected layer: its kernel size is set to 1 × 1 × C'', where C'' is the feature dimension input to the second fully connected layer;
The ReLU and Sigmoid activation functions are respectively:
ReLU(x) = max(0, x),  Sigmoid(x') = 1 / (1 + e^(−x'))
where x is the input of the ReLU activation function and x' is the input of the Sigmoid activation function;
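A minimal TensorFlow sketch of an attention module with this structure is shown below for illustration; the class name, the reduction factor of 16 and the final channel-reweighting multiplication are assumptions drawn from the description above, not the definitive implementation.

```python
import tensorflow as tf

class AttentionModule(tf.keras.layers.Layer):
    """Channel attention assumed from step 2b): global average pooling -> FC (C/16) ->
    ReLU -> FC (C) -> Sigmoid, then reweight the C channels of the input feature."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gap = tf.keras.layers.GlobalAveragePooling2D()          # W x H x C -> C
        self.fc1 = tf.keras.layers.Dense(channels // reduction, activation="relu")
        self.fc2 = tf.keras.layers.Dense(channels, activation="sigmoid")

    def call(self, feature_f):
        # feature_f: (batch, H, W, C) last-layer convolutional feature F
        weights = self.fc2(self.fc1(self.gap(feature_f)))            # (batch, C), values in [0, 1]
        weights = weights[:, tf.newaxis, tf.newaxis, :]              # (batch, 1, 1, C)
        return feature_f * weights                                   # attention feature A
```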
2c) Set up the SCDA module, which outputs the convolution mask feature.
Referring to Fig. 2, the working principle of the SCDA module is as follows:
The three-dimensional convolutional feature output by the second fully connected layer of the attention module is input to the SCDA module and summed along the third dimension to obtain a two-dimensional convolutional feature, and the mean of this two-dimensional feature is computed;
A convolution mask is generated according to the mean: the value of each point of the two-dimensional convolutional feature is compared with the mean; if the value of the point is greater than the mean it is encoded as 1, and if the value of the point is less than the mean it is encoded as 0, yielding the convolution mask;
The convolution mask feature is then extracted: the convolution mask is multiplied by the configured mean threshold E, 1 is added to the values of the multiplied mask, and the result is multiplied by the three-dimensional convolutional feature input to the SCDA module to obtain the mask feature; the mask feature is averaged over its first two dimensions to obtain the convolution mask feature, which is output;
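The following Python sketch follows this working principle literally; the function name, tensor layout and the way the per-module mean threshold E enters the computation are read directly from the wording above and should be treated as one possible interpretation rather than the definitive implementation.

```python
import tensorflow as tf

def scda_module(feature_a, threshold_e):
    """SCDA mask feature following step 2c):
    1) sum the attention feature over its channel (third) dimension,
    2) binarize against the spatial mean of that sum,
    3) scale the mask by the module's mean threshold E and add 1,
    4) reweight the 3-D feature and average over the spatial dimensions."""
    # feature_a: (batch, H, W, C) output of the attention module
    channel_sum = tf.reduce_sum(feature_a, axis=-1)                        # (batch, H, W)
    spatial_mean = tf.reduce_mean(channel_sum, axis=[1, 2], keepdims=True)
    mask = tf.cast(channel_sum > spatial_mean, feature_a.dtype)           # 1 above the mean, else 0
    weight = mask * threshold_e + 1.0
    masked = feature_a * weight[..., tf.newaxis]                           # reweighted 3-D feature
    return tf.reduce_mean(masked, axis=[1, 2])                             # (batch, C) mask convolution feature

# The simulation below uses three SCDA branches with mean thresholds 1.0, 0.8 and 0.6, e.g.:
# mask_features = [scda_module(A, e) for e in (1.0, 0.8, 0.6)]
```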
2d) Set up the fully connected layers, which consist of three layers whose kernel sizes are 512 × 1024, 1024 × 1024 and 1024 × 21, connected in sequence;
2e) Connect the above convolution module, attention module, SCDA modules and fully connected layers in sequence to obtain the convolutional neural network.
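A hedged assembly sketch in TensorFlow is given below; the backbone is only a stand-in for the five pre-trained AlexNet convolutional layers (Keras ships no AlexNet weights), and how the T branch features are combined before the fully connected head is an assumption of this sketch.

```python
import tensorflow as tf

class MsaNet(tf.keras.Model):
    """Assembly sketch of step 2e): conv backbone -> attention module -> SCDA branches -> FC head."""
    def __init__(self, num_classes=21, scda_thresholds=(1.0, 0.8, 0.6)):
        super().__init__()
        self.backbone = tf.keras.Sequential([        # stand-in for AlexNet conv1..conv5
            tf.keras.layers.Conv2D(96, 11, strides=4, activation="relu"),
            tf.keras.layers.MaxPool2D(3, strides=2),
            tf.keras.layers.Conv2D(256, 5, padding="same", activation="relu"),
            tf.keras.layers.MaxPool2D(3, strides=2),
            tf.keras.layers.Conv2D(384, 3, padding="same", activation="relu"),
            tf.keras.layers.Conv2D(384, 3, padding="same", activation="relu"),
            tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu"),
        ])
        self.attention = AttentionModule(channels=256)     # sketch from step 2b)
        self.thresholds = scda_thresholds
        self.fc1 = tf.keras.layers.Dense(1024, activation="relu")
        self.fc2 = tf.keras.layers.Dense(1024, activation="relu")
        self.out = tf.keras.layers.Dense(num_classes, activation="softmax")

    def call(self, x):
        f = self.backbone(x)
        a = self.attention(f)
        branches = [scda_module(a, e) for e in self.thresholds]   # T mask convolution features
        z = tf.concat(branches, axis=-1)                          # assumed fusion by concatenation
        return self.out(self.fc2(self.fc1(z)))
```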
Step 3: Determine the loss function of the convolutional neural network.
3a) Input the training sample set {T_1, T_2, …, T_j, …, T_M} into the convolution module of the convolutional neural network, and output the last-layer convolutional feature F;
3b) Input the last-layer feature F into the attention module of the convolutional neural network to output the convolutional feature A, then input the convolutional feature A into multiple SCDA modules of the convolutional neural network with different mean thresholds, and output T groups of mask convolution features {M_1, M_2, …, M_T}, where T is the number of SCDA modules;
3c) Input the T groups of mask convolution features into the fully connected layers of the convolutional neural network, output the classification results of the training data, and obtain the loss function loss_op of the convolutional neural network:
loss_op = λ_r·loss_1 + λ_s·loss_2 + η·‖W‖_2
where:
‖W‖_2 is the L2 norm of the weight vector of the convolutional neural network, and λ_r, λ_s and η are the hyperparameters of loss_1, loss_2 and ‖W‖_2, respectively;
loss_1 is the cross entropy between the output classification results and the actual results, where y_j is the predicted class probability of T_j in the training image library and o_j is the actual class of T_j;
loss_2 is the sum of the absolute values of the cross entropies between the classification results output by the T groups of mask convolution features after the fully connected layers and the actual results, where T is the number of SCDA modules, loss_m is loss_1 of T_j in the training image library under the m-th convolution mask feature, and loss_n is loss_1 of T_j under the n-th convolution mask feature.
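The published formulas for loss_1 and loss_2 are not reproduced in this text, so the sketch below only illustrates one plausible reading: loss_1 as the cross entropy of the final prediction, and loss_2 as the sum of absolute differences between the per-mask cross entropies (an assumption suggested by the loss_m / loss_n definitions above), combined through λ_r, λ_s and η with an L2 weight penalty.

```python
import itertools
import tensorflow as tf

def total_loss(y_true, y_pred, per_mask_preds, weights,
               lambda_r=0.7, lambda_s=0.3, eta=1e-4):
    """Hedged sketch of loss_op = lambda_r*loss_1 + lambda_s*loss_2 + eta*||W||_2.
    Treating loss_2 as pairwise |loss_m - loss_n| terms is an assumption, not the patent's formula."""
    ce = tf.keras.losses.sparse_categorical_crossentropy
    loss_1 = tf.reduce_mean(ce(y_true, y_pred))
    per_mask = [tf.reduce_mean(ce(y_true, p)) for p in per_mask_preds]   # loss_m for each mask branch
    loss_2 = tf.add_n([tf.abs(a - b) for a, b in itertools.combinations(per_mask, 2)])
    l2 = tf.add_n([tf.nn.l2_loss(w) for w in weights])                   # penalty on the weight vectors
    return lambda_r * loss_1 + lambda_s * loss_2 + eta * l2
```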
Step 4: Iteratively train the convolutional neural network.
Existing methods for iteratively training a convolutional neural network include gradient-descent optimization, the Nesterov accelerated gradient method and the Adagrad method; the present invention uses, but is not limited to, the gradient descent algorithm, implemented as follows:
4a) Set the number of iterations to P, set the initial training learning rate to L and the decay rate to β, and divide the training image library {T_1, T_2, …, T_j, …, T_M} into G batches input to the convolutional neural network constructed in Step 2, the number of images input each time being
Q = M / G
where M is the total number of samples in the training image library;
4b) Set the learning rate l corresponding to each batch of input images as:
l = L·β^G
4c) Update the parameters of the convolutional neural network G times according to the following formula to obtain the updated weight vector W_new:
W_new = W − l·∂loss_op/∂W
where W is the weight vector of the convolutional neural network parameters;
Substitute the updated weight vector W_new into the loss function of 3c) to obtain the loss function loss_op with the updated weight vector;
4d) Input the next batch of training images into the convolutional neural network and update the loss function loss_op with the updated weight vector, so that the value of the loss function loss_op keeps declining;
4e) Repeat 4d); if the loss function loss_op no longer declines and the current number of training rounds is less than the set number of iterations P, stop training the network and obtain the trained convolutional neural network; otherwise, when the number of training rounds reaches the set number of iterations P, stop training the network and obtain the trained convolutional neural network.
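A short Python sketch of this training loop follows; the batch size Q = M/G, the learning rate l = L·β^G and the early-stopping test mirror the wording of step 4, while the plain cross-entropy stand-in for loss_op and all variable names are assumptions of this sketch.

```python
import tensorflow as tf

def train(model, train_x, train_y, P=100, L=1e-5, beta=0.9, G=6):
    """Sketch of step 4: G batches per round, learning rate l = L * beta**G,
    plain gradient descent, stop early when the loss stops declining."""
    M = train_x.shape[0]
    Q = M // G                                        # images per batch, Q = M / G
    optimizer = tf.keras.optimizers.SGD(learning_rate=L * beta ** G)
    previous = float("inf")
    for epoch in range(P):
        epoch_loss = 0.0
        for g in range(G):
            xb = train_x[g * Q:(g + 1) * Q]
            yb = train_y[g * Q:(g + 1) * Q]
            with tf.GradientTape() as tape:
                pred = model(xb, training=True)
                # the full loss_op of step 3c) would be used here; cross entropy keeps the sketch short
                loss_op = tf.reduce_mean(
                    tf.keras.losses.sparse_categorical_crossentropy(yb, pred))
            grads = tape.gradient(loss_op, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            epoch_loss += float(loss_op)
        if epoch_loss >= previous:                    # loss no longer declines
            break
        previous = epoch_loss
    return model
```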
Step 5: Classify the remote sensing scene image input by the user.
5a) The user normalizes the remote sensing image to be classified: first obtain the maximum pixel value V'_max and the minimum pixel value V'_min of the remote sensing image to be classified, then divide the values of all pixels of the image by the difference between V'_max and V'_min to obtain the normalized remote sensing image to be classified;
5b) Input the normalized remote sensing image into the trained convolutional network model to obtain the classification result.
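For completeness, a short sketch of this classification step is given below; the function and variable names are illustrative assumptions of this sketch.

```python
import numpy as np

def classify(model, image):
    """Step 5: normalize one remote sensing image by (V'_max - V'_min) and classify it."""
    v_max, v_min = image.max(), image.min()
    normalized = image.astype(np.float32) / (v_max - v_min)
    probs = model(normalized[np.newaxis, ...])        # add a batch dimension
    return int(np.argmax(probs))                      # index of the predicted scene class
```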
The effects of the invention can be further illustrated by the following simulation:
1. simulated conditions
This example was run on an HP Z840 Workstation with a Xeon(R) CPU E5-2630, a GeForce TITAN XP, 64 GB RAM and the Ubuntu system, on the TensorFlow platform, where the simulations of the present invention and of existing remote sensing image scene classification methods were completed.
The simulation parameters are set as follows: the number of iteration rounds P is 100, the learning rate is 0.00001, λ_r = 0.7, λ_s = 0.3, η = 0.0001, the number of input batches G is 6, the decay rate β is 0.9, and 3 SCDA modules are used in total with the three unit mean thresholds p_1 = 1.0, p_2 = 0.8 and p_3 = 0.6; the training data are randomly rotated to augment them to four times the original number. In each training iteration, the class discriminator and the classification difference optimizer are trained jointly.
2. emulation content
Download the UC Merced remote sensing image set, shown in Fig. 3, and normalize it: first obtain the maximum pixel value V''_max and the minimum pixel value V''_min of the UC Merced image set, then divide the values of all pixels of the UC Merced image set by the difference between V''_max and V''_min to obtain the normalized UC Merced image set;
Randomly select 80% of the remote sensing images from the normalized UC Merced image set as the training sample set D_T, and use the remaining 20% of the remote sensing images as the test sample set D_t.
Under the above simulation conditions, the training sample set D_T is used to train the present invention and three existing representative image classification models, the test sample set D_t is used for testing, and the classification accuracies are compared; the results are shown in Table 1.
The images in the training and test sample sets belong to 21 classes: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks and tennis court.
Table 1. Performance evaluation of the present invention and existing remote sensing image classification models

Method                  Test sample accuracy
The present invention   0.9849
MSCP                    0.9782
SHHTFM                  0.9789
DCA                     0.9690
In Table 1, MSCP is the existing remote sensing image classification method based on multilayer stacked covariance pooling, SHHTFM is the existing remote sensing image classification method based on homogeneous-heterogeneous sparsity, and DCA is the existing remote sensing image classification method based on deep feature fusion.
As can be seen from Table 1, when the training sample set D_T accounts for 80% of the UC Merced image set, the convolutional neural network trained with the present invention classifies the 20% test sample set D_t with an accuracy higher than that of existing representative remote sensing image classification models.
In conclusion, the classification performance of the present invention on remote sensing images is clearly better than that of the other remote sensing image classification models.

Claims (6)

1. A remote sensing image classification method based on attention-mechanism multi-scale deep learning, characterized by comprising the following:
(1) establishing a remote sensing image library {I_1, I_2, …, I_n, …, I_N} with corresponding classes {Y_1, Y_2, …, Y_n, …, Y_N}, and normalizing the established remote sensing image library, wherein I_n is the n-th image in the library, Y_n is the class corresponding to the n-th image, n is the index of the n-th sample in the library, n ∈ [0, N], and N is the number of images in the remote sensing image library;
(2) randomly selecting 80% of the remote sensing image samples from each class of normalized remote sensing images to construct a training image library {T_1, T_2, …, T_j, …, T_M}, and using the remaining 20% of the remote sensing images as a test sample set {t_1, t_2, …, t_d, …, t_m}, wherein T_j is the j-th sample in the training image library, j ∈ [0, M], t_d is the d-th sample in the test sample set, d ∈ [0, m], M is the total number of training samples, m is the total number of test samples, m < N, M < N;
(3) constructing a convolutional neural network comprising a convolution module, an attention module, SCDA modules and fully connected layers;
(4) determining the loss function of the convolutional neural network:
(4a) inputting the training image library {T_1, T_2, …, T_j, …, T_M} into the convolution module of the convolutional neural network, and outputting the last-layer convolutional feature F;
(4b) inputting the last-layer feature F into the attention module of the convolutional neural network to output the convolutional feature A, then inputting the convolutional feature A into multiple SCDA modules of the convolutional neural network with different mean thresholds, and outputting T groups of mask convolution features {M_1, M_2, …, M_T}, wherein T is the number of SCDA modules;
(4c) passing the T groups of mask convolution features through global average pooling and inputting them into the fully connected layers of the convolutional neural network, outputting the classification results of the training data, and obtaining the loss function of the convolutional neural network:
loss_op = λ_r·loss_1 + λ_s·loss_2 + η·‖W‖_2
wherein loss_1 is the cross entropy between the output classification results and the actual results, loss_2 is the sum of the absolute values of the cross entropies between the classification results output by the T groups of mask convolution features after the fully connected layers and the actual results, ‖W‖_2 is the L2 norm of the weight vector of the convolutional neural network, and λ_r, λ_s and η are the hyperparameters of loss_1, loss_2 and ‖W‖_2, respectively;
(5) setting the number of iterations to P and iteratively training the convolutional neural network by gradient-descent optimization until the loss function loss_op no longer declines or the number of training rounds reaches the number of iterations, obtaining a trained convolutional neural network;
(6) the user normalizing the remote sensing image I' to be classified and inputting it into the trained convolutional neural network to obtain the classification result, completing image classification.
2. The method according to claim 1, wherein the remote sensing image library is normalized in (1) according to the following formula:
I'_n = I_n / (V_max − V_min)
wherein V_max is the maximum pixel value over all images in the remote sensing image library, V_min is the minimum pixel value over all images in the remote sensing image library, {I'_1, I'_2, …, I'_n, …, I'_N} is the normalized remote sensing image library, I'_n is the n-th sample of the normalized remote sensing images, and n ∈ [0, N].
3. The method according to claim 1, wherein the convolution module, attention module, SCDA module and fully connected layers composing the convolutional neural network in (3) are set as follows:
the convolution module consists of the five sequentially connected convolutional layers {conv1, conv2, conv3, conv4, conv5} of a pre-trained AlexNet network;
the attention mechanism module consists of a global average pooling layer, a first fully connected layer, a ReLU activation function, a second fully connected layer and a Sigmoid function;
the SCDA module consists of a convolutional-channel summation layer and a mask layer connected in sequence.
4. The method according to claim 1, wherein the cross entropy loss_1 between the output classification results and the actual results in (4c) is computed from y_j and o_j, wherein y_j is the predicted class probability of T_j in the training image library, and o_j is the actual class of T_j in the training image library.
5. The method according to claim 1, wherein the sum in (4c) of the absolute values of the cross entropies between the classification results output by the T groups of mask convolution features after the fully connected layers and the actual results is computed over the T groups, wherein T represents the number of SCDA modules, loss_m represents loss_1 of T_j in the training image library under the m-th convolution mask feature, and loss_n represents loss_1 of T_j in the training image library under the n-th convolution mask feature.
6. The method according to claim 1, wherein the iterative training of the convolutional neural network by gradient-descent optimization in (5) is accomplished as follows:
(5a) setting the initial training learning rate to L and the decay rate to β, and dividing the training image library {T_1, T_2, …, T_j, …, T_M} into G batches input to the constructed convolutional neural network, the number of images input each time being
Q = M / G
wherein M is the total number of samples in the training image library;
(5b) setting the learning rate l corresponding to each batch of input images as:
l = L·β^G
(5c) updating the parameters of the convolutional neural network G times according to the following formula to obtain the updated weight vector W_new:
W_new = W − l·∂loss_op/∂W
wherein W is the weight vector of the convolutional neural network parameters;
(5d) inputting the next batch of training images into the convolutional neural network and updating the loss function loss_op with the updated weight vector, so that the value of the loss function loss_op keeps declining;
(5e) repeating (5d); if the loss function loss_op no longer declines and the current number of training rounds is less than the set number of iterations P, stopping the training of the network and obtaining the trained convolutional neural network; otherwise, when the number of training rounds reaches the set number of iterations P, stopping the training of the network and obtaining the trained convolutional neural network.
CN201910603799.6A 2019-07-05 2019-07-05 Remote sensing image classification method based on attention mechanism multi-scale deep learning Active CN110334765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910603799.6A CN110334765B (en) 2019-07-05 2019-07-05 Remote sensing image classification method based on attention mechanism multi-scale deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910603799.6A CN110334765B (en) 2019-07-05 2019-07-05 Remote sensing image classification method based on attention mechanism multi-scale deep learning

Publications (2)

Publication Number Publication Date
CN110334765A true CN110334765A (en) 2019-10-15
CN110334765B CN110334765B (en) 2023-03-24

Family

ID=68144267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910603799.6A Active CN110334765B (en) 2019-07-05 2019-07-05 Remote sensing image classification method based on attention mechanism multi-scale deep learning

Country Status (1)

Country Link
CN (1) CN110334765B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866494A (en) * 2019-11-14 2020-03-06 三亚中科遥感研究所 Optical remote sensing image-based town group extraction method and system
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model
CN111178304A (en) * 2019-12-31 2020-05-19 江苏省测绘研究所 High-resolution remote sensing image pixel level interpretation method based on full convolution neural network
CN111275192A (en) * 2020-02-28 2020-06-12 交叉信息核心技术研究院(西安)有限公司 Auxiliary training method for simultaneously improving accuracy and robustness of neural network
CN111339862A (en) * 2020-02-17 2020-06-26 中国地质大学(武汉) Remote sensing scene classification method and device based on channel attention mechanism
CN111723674A (en) * 2020-05-26 2020-09-29 河海大学 Remote sensing image scene classification method based on Markov chain Monte Carlo and variation deduction and semi-Bayesian deep learning
CN111738124A (en) * 2020-06-15 2020-10-02 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention
CN111797941A (en) * 2020-07-20 2020-10-20 中国科学院长春光学精密机械与物理研究所 Image classification method and system carrying spectral information and spatial information
CN111861880A (en) * 2020-06-05 2020-10-30 昆明理工大学 Image super-fusion method based on regional information enhancement and block self-attention
CN112101190A (en) * 2020-09-11 2020-12-18 西安电子科技大学 Remote sensing image classification method, storage medium and computing device
CN112580557A (en) * 2020-12-25 2021-03-30 深圳市优必选科技股份有限公司 Behavior recognition method and device, terminal equipment and readable storage medium
CN112926380A (en) * 2021-01-08 2021-06-08 浙江大学 Novel underwater laser target intelligent recognition system
CN113177523A (en) * 2021-05-27 2021-07-27 青岛杰瑞工控技术有限公司 Fish behavior image identification method based on improved AlexNet
CN113191285A (en) * 2021-05-08 2021-07-30 山东大学 River and lake remote sensing image segmentation method and system based on convolutional neural network and Transformer
CN113435531A (en) * 2021-07-07 2021-09-24 中国人民解放军国防科技大学 Zero sample image classification method and system, electronic equipment and storage medium
CN113449712A (en) * 2021-09-01 2021-09-28 武汉方芯科技有限公司 Goat face identification method based on improved Alexnet network
CN113505651A (en) * 2021-06-15 2021-10-15 杭州电子科技大学 Mosquito identification method based on convolutional neural network
CN114092832A (en) * 2022-01-20 2022-02-25 武汉大学 High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN114286113A (en) * 2021-12-24 2022-04-05 国网陕西省电力有限公司西咸新区供电公司 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909924A (en) * 2017-02-18 2017-06-30 北京工业大学 A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
WO2018214195A1 (en) * 2017-05-25 2018-11-29 中国矿业大学 Remote sensing imaging bridge detection method based on convolutional neural network
CN108830296A (en) * 2018-05-18 2018-11-16 河海大学 A kind of improved high score Remote Image Classification based on deep learning

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866494B (en) * 2019-11-14 2022-09-06 三亚中科遥感研究所 Urban group extraction method and system based on optical remote sensing image
CN110866494A (en) * 2019-11-14 2020-03-06 三亚中科遥感研究所 Optical remote sensing image-based town group extraction method and system
CN111046962A (en) * 2019-12-16 2020-04-21 中国人民解放军战略支援部队信息工程大学 Sparse attention-based feature visualization method and system for convolutional neural network model
CN111178304A (en) * 2019-12-31 2020-05-19 江苏省测绘研究所 High-resolution remote sensing image pixel level interpretation method based on full convolution neural network
CN111178304B (en) * 2019-12-31 2021-11-05 江苏省测绘研究所 High-resolution remote sensing image pixel level interpretation method based on full convolution neural network
CN111339862A (en) * 2020-02-17 2020-06-26 中国地质大学(武汉) Remote sensing scene classification method and device based on channel attention mechanism
CN111275192A (en) * 2020-02-28 2020-06-12 交叉信息核心技术研究院(西安)有限公司 Auxiliary training method for simultaneously improving accuracy and robustness of neural network
CN111275192B (en) * 2020-02-28 2023-05-02 交叉信息核心技术研究院(西安)有限公司 Auxiliary training method for improving accuracy and robustness of neural network simultaneously
CN111723674A (en) * 2020-05-26 2020-09-29 河海大学 Remote sensing image scene classification method based on Markov chain Monte Carlo and variation deduction and semi-Bayesian deep learning
CN111723674B (en) * 2020-05-26 2022-08-05 河海大学 Remote sensing image scene classification method based on Markov chain Monte Carlo and variation deduction and semi-Bayesian deep learning
CN111861880B (en) * 2020-06-05 2022-08-30 昆明理工大学 Image super-fusion method based on regional information enhancement and block self-attention
CN111861880A (en) * 2020-06-05 2020-10-30 昆明理工大学 Image super-fusion method based on regional information enhancement and block self-attention
CN111738124B (en) * 2020-06-15 2023-08-22 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention
CN111738124A (en) * 2020-06-15 2020-10-02 西安电子科技大学 Remote sensing image cloud detection method based on Gabor transformation and attention
CN111797941A (en) * 2020-07-20 2020-10-20 中国科学院长春光学精密机械与物理研究所 Image classification method and system carrying spectral information and spatial information
CN112101190B (en) * 2020-09-11 2023-11-03 西安电子科技大学 Remote sensing image classification method, storage medium and computing device
CN112101190A (en) * 2020-09-11 2020-12-18 西安电子科技大学 Remote sensing image classification method, storage medium and computing device
CN112580557A (en) * 2020-12-25 2021-03-30 深圳市优必选科技股份有限公司 Behavior recognition method and device, terminal equipment and readable storage medium
CN112926380A (en) * 2021-01-08 2021-06-08 浙江大学 Novel underwater laser target intelligent recognition system
CN112926380B (en) * 2021-01-08 2022-06-24 浙江大学 Novel underwater laser target intelligent recognition system
CN113191285A (en) * 2021-05-08 2021-07-30 山东大学 River and lake remote sensing image segmentation method and system based on convolutional neural network and Transformer
CN113177523A (en) * 2021-05-27 2021-07-27 青岛杰瑞工控技术有限公司 Fish behavior image identification method based on improved AlexNet
CN113505651A (en) * 2021-06-15 2021-10-15 杭州电子科技大学 Mosquito identification method based on convolutional neural network
CN113435531A (en) * 2021-07-07 2021-09-24 中国人民解放军国防科技大学 Zero sample image classification method and system, electronic equipment and storage medium
CN113435531B (en) * 2021-07-07 2022-06-21 中国人民解放军国防科技大学 Zero sample image classification method and system, electronic equipment and storage medium
CN113449712A (en) * 2021-09-01 2021-09-28 武汉方芯科技有限公司 Goat face identification method based on improved Alexnet network
CN114286113A (en) * 2021-12-24 2022-04-05 国网陕西省电力有限公司西咸新区供电公司 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder
CN114286113B (en) * 2021-12-24 2023-05-30 国网陕西省电力有限公司西咸新区供电公司 Image compression recovery method and system based on multi-head heterogeneous convolution self-encoder
CN114092832B (en) * 2022-01-20 2022-04-15 武汉大学 High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN114092832A (en) * 2022-01-20 2022-02-25 武汉大学 High-resolution remote sensing image classification method based on parallel hybrid convolutional network

Also Published As

Publication number Publication date
CN110334765B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN110334765A (en) Remote Image Classification based on the multiple dimensioned deep learning of attention mechanism
CN110516596B (en) Octave convolution-based spatial spectrum attention hyperspectral image classification method
CN110516085B (en) Image text mutual retrieval method based on bidirectional attention
CN107784320B (en) Method for identifying radar one-dimensional range profile target based on convolution support vector machine
CN110097103A (en) Based on the semi-supervision image classification method for generating confrontation network
CN105678284B (en) A kind of fixed bit human body behavior analysis method
CN110136154A (en) Remote sensing images semantic segmentation method based on full convolutional network and Morphological scale-space
CN110852227A (en) Hyperspectral image deep learning classification method, device, equipment and storage medium
CN112052754B (en) Polarization SAR image ground object classification method based on self-supervision characterization learning
CN108549893A (en) A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN108596101A (en) A kind of remote sensing images multi-target detection method based on convolutional neural networks
CN107229904A (en) A kind of object detection and recognition method based on deep learning
CN108961245A (en) Picture quality classification method based on binary channels depth parallel-convolution network
CN107563428A (en) Classification of Polarimetric SAR Image method based on generation confrontation network
CN107832797B (en) Multispectral image classification method based on depth fusion residual error network
CN107944483B (en) Multispectral image classification method based on dual-channel DCGAN and feature fusion
CN108846426A (en) Polarization SAR classification method based on the twin network of the two-way LSTM of depth
CN110399856A (en) Feature extraction network training method, image processing method, device and its equipment
CN106022273A (en) Handwritten form identification system of BP neural network based on dynamic sample selection strategy
CN109726748B (en) GL-CNN remote sensing image scene classification method based on frequency band feature fusion
CN107886123A (en) A kind of synthetic aperture radar target identification method based on auxiliary judgement renewal learning
CN106600595A (en) Human body characteristic dimension automatic measuring method based on artificial intelligence algorithm
CN106203625A (en) A kind of deep-neural-network training method based on multiple pre-training
CN108460391A (en) Based on the unsupervised feature extracting method of high spectrum image for generating confrontation network
CN110163213A (en) Remote sensing image segmentation method based on disparity map and multiple dimensioned depth network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant