CN103823845A

CN103823845A - Method for automatically annotating remote sensing images on basis of deep learning

Info

Publication number: CN103823845A
Application number: CN201410039584.3A
Authority: CN
Inventors: 陈华钧; 黄梅龙; 江琳; 陶金火; 杨建华; 郑国轴; 吴朝晖
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2014-01-28
Filing date: 2014-01-28
Publication date: 2014-05-28
Anticipated expiration: 2034-01-28
Also published as: CN103823845B

Abstract

The invention discloses a method for automatically annotating remote sensing images on the basis of deep learning. The method for automatically annotating the remote sensing images includes extracting visual feature vectors of the to-be-annotated remote sensing images; inputting the visual feature vectors into a DBM (deep Boltzmann machine) model to automatically annotate the to-be-annotated remote sensing images. The DBM model implemented in the method sequentially comprises a visible layer, a first hidden layer, a second hidden layer and a tag layer from bottom to top, and is acquired by means of training. The method for automatically annotating the remote sensing images has the advantages that the deep Boltzmann machine model implemented in the method comprises the two hidden layers (namely, the first hidden layer and the second hidden layer respectively), accordingly, the problem of 'semantic gaps' in image semantic annotation procedures can be effectively solved by the two hidden layers, and the integral annotation accuracy can be improved.

Description

A kind of remote sensing image automatic marking method based on degree of depth study

Technical field

The present invention relates to intelligent classification and the retrieval technique of remote sensing image, particularly a kind of remote sensing image automatic marking method based on degree of depth study.

Background technology

Remote sensing image is one of significant data of spatial information, is widely used in geology and flood monitoring, agricultural and forest inventory investigation, soil utilization and city planning and military field.Along with the development of Chinese Space science and earth observation technology, remote sensing image data present the trend of exponential increase data year, effective management of magnanimity remote sensing image data is become and is become more and more important.

Remote sensing image mark is one of important content of remote sensing images analysis and understanding, it is the bottom visual signature by extracting remote sensing image, learn contacting between these bottom visual signatures and high-level semantic by some machine learning model, thereby realize some semantic label automatic markings to remote sensing image, for example, by remote sensing image automatic marking process, can automatically mark out residential block, farmland, shopping centre, desert, forest etc. in magnanimity remote sensing image.The automatic marking of remote sensing image is the understanding to remote sensing image semanteme, is also the important technical basis that magnanimity remote sensing image is carried out to classification cataloguing and retrieval.

Can regard the automatic classification work of the broad sense to remote sensing image as to the automatic marking work of remote sensing image, before remote sensing image is carried out to automatic marking work, need to first determine and need the corresponding class label of remote sensing image (being text label) of mark which has, then different remote sensing images to be got up from different class label corresponding relationships.

Mark work to traditional images mainly contains 3 class methods: method, the method based on machine learning and the method based on relevant feedback based on object body.Mostly the mark work of traditional images is to come by the lower-level vision feature of image the vision content of analysis and understanding image, but this method exists a problem mostly: " semantic gap "." semantic gap " refers to the high-level semantic that can not only infer image by the bottom visual signature of image, between the bottom visual signature of image and the high-level semantic of image, there is no the suitable abstract bridge that does, thereby the effect of mark is unsatisfactory.

In order to overcome the problem of " semantic gap ", people have slowly been developed certain methods the bottom visual signature of image have been mapped to the high-level semantic of image, wherein typical method comprises probability latent semantic analysis (Probabilistic Latent Semantic Analysis, pLSA) model, potential Di Li Cray distributes (Latent Dirichlet Allocation, LDA) model and author's theme (Author Topic Model, ATM) model etc., but these method majorities are just considered the color and vein characteristic of image, and do not consider the spectral characteristic of remote sensing image.Spectral characteristic is the key character of remote sensing image, also be the key property that remote sensing image is different from general image data, for different characters of ground object, to the spectrum of different wave length, absorption characteristic and reflection characteristic have very big difference, and therefore the spectral characteristic of remote sensing image has very strong identification capability to different characters of ground object.

Summary of the invention

For deficiency of the prior art, the invention provides a kind of " semantic gap " problem that can overcome linguistic indexing of pictures, realize the remote sensing image automatic marking method based on degree of depth study of the semantic tagger of degree of precision.

Based on a remote sensing image automatic marking method for degree of depth study, comprising:

(1) the low-level image feature vector that extracts remote sensing image to be marked builds the visual feature vector that obtains corresponding remote sensing image;

(2) degree of depth Boltzmann machine model input of described visual feature vector being trained carries out automatic marking;

The degree of depth Boltzmann machine model training in described step (2) obtains by following steps:

(S1) create the label dictionary that comprises several text labels;

(S2) select the mark of respective classes to have text label remote sensing image as model training data set according to label dictionary;

(S3) the low-level image feature vector that extracts each remote sensing image builds the visual feature vector that obtains corresponding remote sensing image, and determines the Text eigenvector of each remote sensing image according to label dictionary and text label;

(S4) build degree of depth Boltzmann machine model, described degree of depth Boltzmann machine model comprises visible layer, the first hidden layer, the second hidden layer and label layer from bottom to up successively; Any two nodes in each layer are without connection, the two-way connection of any two nodes between adjacent layer;

(S5) utilize the visual feature vector of all remote sensing images of model training data centralization and Text eigenvector to described degree of depth Boltzmann machine model training, the degree of depth Boltzmann machine model that obtains training.

In remote sensing image automatic marking method based on degree of depth study of the present invention, first extract the low-level image feature of remote sensing image to be marked and build the visual feature vector that obtains remote sensing image according to all low-level image features, then the directly input using visual feature vector as the visible layer of degree of depth Boltzmann machine model, now degree of depth Boltzmann machine model (DBM model, Deep Boltzmann Machine model) the output of label layer as Text eigenvector, then adopt the text label corresponding with Text eigenvector to carry out automatic marking to remote sensing image to be marked.

In DBM model, high-level semantic (label layer) is by abstract the obtaining of low-level image feature (input of visible layer), because low-level image feature can not be transitioned into high-level semantic well, therefore can produce " semantic gap ".The number of plies of hidden layer is more in theory, semantic gap is less, but the data volume of considering remotely-sensed data is large, if it is too much that the number of plies of hidden layer is set, can then train speed very slow, therefore as preferred, degree of depth Boltzmann machine model used in the present invention comprises two hidden layers (being respectively the first hidden layer and the second hidden layer), by being set, two hidden layers improve the abstracting power of the centre of DBM model, just in time can fill up " semantic gap " problem in linguistic indexing of pictures process, improve overall mark accuracy rate.

The text label number comprising in the label dictionary creating in described step (S1) and kind need to set according to application.If need the remote sensing image of mark to be only divided into land and waters, the size of so whole label dictionary is 2, in label dictionary, has two text labels, is respectively land and waters.In actual applications, the size of label dictionary can, than 2 much bigger, be determined according to practical application, is mostly in situation, has " residential block ", " river ", " highway ", " forest ", " desert " etc. text label in label dictionary.

A text label represents a classification, and the text label of the model training data centralization remote sensing image in described step (S2) is generally contained all text labels in label dictionary, can also have in theory the text label not having in label dictionary.

Text eigenvector in described step (S3) is a 0-1 vector (being that in vector, all elements can only be 0 or 1), and described Text eigenvector is determined the Text eigenvector of each remote sensing image according to following steps:

(S31) full null vector of initialization, makes the corresponding text label of every one dimension;

(S32) according to the text label of remote sensing image, be 1 by the element assignment of corresponding dimension, obtain the Text eigenvector of this remote sensing image.

The degree of depth Boltzmann machine model that described step (S4) builds, any two nodes in each layer are without connection, the two-way connection of any two nodes between adjacent layer.

The interconnection network that the second described hidden layer and label layer form is unidirectional BP(Back Propagation) neural network.

Adopt BP neural network to be because can be used for feedback regulation, this is the characteristic of BP neural network, BP neural network is a feed-forward regulation network, it can contrast according to the result of current training and actual result, then carry out suitable feed-forward regulation and come the parameter of correction model according to error, using BP neural network is exactly the learning model that has supervision of having realized a band feedforward in fact.

The training process of the degree of depth Boltzmann machine model in described step (S5) is as follows:

(S51) using visual feature vector as visible layer, using Text eigenvector as label layer;

(S52) using visible layer and the first hidden layer as limited Boltzmann machine, input using visual feature vector as visible layer, uses sdpecific dispersion algorithm is trained to the connection weight that obtains between visible layer and the first hidden layer and the end-state of the first hidden layer to this Boltzmann machine;

(S53) using the first hidden layer and the second hidden layer as limited Boltzmann machine, input using the end-state of the first hidden layer as the first hidden layer, uses sdpecific dispersion algorithm is trained to the connection weight that obtains between the first hidden layer and the second hidden layer and the end-state of the 2nd hidden layer to this Boltzmann machine;

(S54) input using the end-state of the second hidden layer as the second hidden layer, input using Text eigenvector as label layer, adopt the training of BP neural net method to obtain the connection weight of the second hidden layer and label layer, complete the training of degree of depth Boltzmann machine model.

Training process depends on the structure of the degree of depth Boltzmann machine model of step (S4) structure, input take visual feature vector as visible layer when training, the visual feature vector of each remote sensing image of model training data centralization is thought once to train, by a large amount of training, in training process, constantly adjust the coefficient of connection of adjacent two layers, finally obtain the Best link weight of each group, and then the degree of depth Boltzmann machine model that obtains training.

The node number of described visible layer is identical with the dimension of visual feature vector.

In identification and training process, all inputs using visual feature vector as visible layer, therefore each node of visible layer must be mutually corresponding with the element of every one dimension in visual feature vector, and the node number of visible layer is identical with the dimension of visual feature vector.

The node number of described label layer is identical with the dimension of Text eigenvector.

The dimension of described Text eigenvector is identical with the number of label dictionary Chinese version label.

Obtain Text eigenvector according to the output of label layer, then use the text label corresponding with Text eigenvector to mark remote sensing image to be marked.And the input using the text label of each remote sensing image as label layer in training process, for guaranteeing to carry out smoothly, the node number of label layer is identical with the dimension of Text eigenvector.

Further determine that according to Text eigenvector method can know, a text label in Text eigenvector in each element corresponding label dictionary, the dimension of explanatory text proper vector is identical with the text label number in label dictionary.

Node number in described the first hidden layer and the second hidden layer is set based on experience value, is generally 500～1500, in actual application, can adjust according to experiment effect.

Described low-level image feature vector comprise averaged spectrum Reflectivity vector, color layout's description vectors, color structure description vectors, scalable color description vector, homogeneity texture description vector, edge histogram description vectors, GIST proper vector and based on SIFT(Scale-invariant feature transform) the visual word bag vector of feature.

A remote sensing image has a series of low-level image feature vectors, a kind of low-level image feature vector of many employings in existing remote sensing image identification or mask method.The present invention is the identification that increases degree of depth Boltzmann machine model, improve the accuracy rate of mark, low-level image feature vector of the present invention has comprised the texture feature vectors such as color layout's description vectors, color structure description vectors, scalable color description vector, homogeneity texture description vector, edge histogram description vectors, GIST proper vector and the visual word bag vector based on SIFT feature simultaneously, has also introduced spectral characteristic (averaged spectrum Reflectivity vector).Extraction obtains low-level image feature vector and directly as required the splicing of each low-level image feature vector is obtained to visual feature vector according to book afterwards, the dimension of visual feature vector be all low-level image feature vectors dimension and.

As preferably, described averaged spectrum Reflectivity vector comprises the spectrum average reflectance of four kinds of wavelength, and four kinds of described wavelength are respectively 0.44～0.51 micron, 0.53～0.62 micron, 0.63～0.70 micron and 0.74～0.80 micron.These four kinds of wavelength have stronger separating capacity.

Visual word bag vector based on SIFT feature obtains by following steps:

(a) calculate the SIFT proper vector of all remote sensing images of described model training data centralization;

(b) all SIFT proper vectors are carried out to cluster and obtain 500～1000 cluster centres (SIFT proper vector);

(c), using each cluster centre as vision word, add up the occurrence number of each vision word in the SIFT proper vector of every remote sensing image and form the visual word bag vector based on SIFT feature.

A remote sensing image generally has multiple SIFT proper vectors, and the number of the SIFT proper vector that remote sensing image is corresponding depends on the size of remote sensing image and the content comprising, and remote sensing image is larger, and the content comprising is more, and corresponding SIFT proper vector is more.A common remote sensing image comprises 500～2000 SIFT proper vectors.Vision word, for whole model training data set, obtains by cluster.When cluster take the SIFT proper vector of all remote sensing images of whole model training data centralization as object, cluster obtains cluster centre, so-called cluster centre in fact can think to asking of same class SIFT proper vector flat the vector that all arrives, therefore vision word is also a vector, and same class SIFT proper vector adopts identical vision word to represent.The visual word bag vector dimension of every remote sensing image is identical with vision word number, and the each element in visual word bag vector represents respectively the number of times that in this remote sensing image, different vision words occur.

The number of cluster centre can be set as required, and SIFT proper vector numbers general and all remote sensing images of model training data centralization are suitable.As preferably, in the present invention, the number of cluster centre is 500～1000.

According to the feature of degree of depth Boltzmann machine model, the detailed process of described step (2) is as follows:

The visible layer of the degree of depth Boltzmann machine model that the input of described visual feature vector is trained, according to the output of the label layer of degree of depth Boltzmann machine model, determine the Text eigenvector of remote sensing images to be marked, then mark remote sensing images to be marked with text label corresponding to text proper vector.

Because degree of depth Boltzmann machine model is from visible layer, successively calculate the value (all nodal values are Text eigenvector) of each node of top layer (label layer) according to the connection weight between adjacent layer, the Text eigenvector that in some situation, label layer calculates may occur that non-vanishing is not 1 element yet, therefore, the Text eigenvector that need to calculate label layer is normalized, then according to normalized result, the element of military order value maximum is 1, all the other are all zero, the final Text eigenvector that just obtains figure to be marked.

In remote sensing image automatic marking method based on degree of depth study of the present invention, use the DBM model with two hidden layers, visible layer is the low-level image feature vector of corresponding remote sensing image, the superiors are text labels of remote sensing image, two hidden layers in the middle of DBM just in time can be filled up " semantic gap " problem in linguistic indexing of pictures process like this, and overall mark accuracy rate is improved.And in the time that being described, remote sensing image low-level image feature except using color and vein characteristic, introduced spectral characteristic (average reflectance of the spectrum of different wave length), the identification to different earth objects can greatly increase like this remote sensing image mark time simultaneously.

Embodiment

Below in conjunction with specific embodiment, the present invention is described in further detail.

In this enforcement, low-level image feature vector comprises averaged spectrum Reflectivity vector, color layout's description vectors, color structure description vectors, scalable color description vector, homogeneity texture description vector, edge histogram description vectors, GIST proper vector and the visual word bag vector based on SIFT feature.

Averaged spectrum Reflectivity vector can directly be obtained from remote sensing image data, and remote sensing image data is different from normal image, has just gathered spectral information when satellite is taken remote sensing image.

In the present embodiment, the proper vector of averaged spectrum reflectivity has 4 dimensions, comprise the spectrum average reflectance of four kinds of wavelength, four kinds of wavelength are respectively 0.44～0.51 micron, 0.53～0.62 micron, 0.63～0.70 micron and 0.74～0.80 micron, the color layout's description vectors obtaining is 192 dimensions, color structure description vectors is 256 dimensions, scalable color description vector is 256 dimensions, scalable color description vector is 43 dimensions, edge histogram description vectors is 150 dimensions, GIST proper vector is 960 dimensions, visual word bag vector based on SIFT feature is 1000 dimensions, all low-level image feature vector splicings obtain the visual feature vector of remote sensing image to be marked, this feature visual feature vector is 2861 dimensions.

Visual word bag vector based on SIFT feature is extracted and is obtained by following steps:

(b) all SIFT proper vectors are carried out to cluster and obtain 1000 cluster centres;

(c) using each cluster centre as vision word, add up the occurrence number of each vision word in the SIFT proper vector of every remote sensing image and form the visual word bag vector based on SIFT feature of corresponding remote sensing image, the dimension of visual word bag vector equals 1000(and equals the number of cluster centre), in visual word bag vector, each element is respectively the number of times that in all SIFT proper vectors of corresponding remote sensing image, different vision words occur.

(2) degree of depth Boltzmann machine model input of the visual feature vector of remote sensing image to be marked being trained carries out automatic marking.

The degree of depth Boltzmann machine model training using in step in the present embodiment (2) obtains by following steps:

(S1) create the label dictionary that comprises several text labels;

The text label number comprising in the label dictionary creating in step (S1) and kind need to set according to application.If need the remote sensing image of mark to be only divided into land and waters, the size of so whole label dictionary is 2, in label dictionary, has two text labels, is respectively land and waters.In actual applications, the size of label dictionary can, than 2 much bigger, be determined according to practical application, is mostly in situation, has " residential block ", " river ", " highway ", " forest ", " desert " etc. text label in label dictionary.In the present embodiment, in label dictionary, contain 21 text labels.

(S2) select the remote sensing image that marks text label of respective classes as model training data set according to label dictionary, the text label of this model training data centralization remote sensing image comprises all text labels in label dictionary, and only includes all text labels in label dictionary.

In this step, the acquisition methods of the visual feature vector of each remote sensing image is identical with step (1).

Text eigenvector is a 0-1 vector (being that in vector, all elements can only be 0 or 1), and described Text eigenvector is determined the Text eigenvector of each remote sensing image according to following steps:

(S4) build degree of depth Boltzmann machine model, described degree of depth Boltzmann machine model comprises visible layer, the first hidden layer, the second hidden layer and label layer from bottom to up successively, any two nodes in each layer are without connection, the two-way connection of any two nodes between adjacent layer, and the interconnection network of the second hidden layer and label layer formation is unidirectional BP neural network.

The node number of visible layer is identical with the dimension of visual feature vector, is 2861 dimensions.

The number of the node number of label layer, the dimension of Text eigenvector and label dictionary Chinese version label is all identical, is 21 in this enforcement.

Node number in the first hidden layer and the second hidden layer is 1024.

(S5) utilize the visual feature vector of all remote sensing images of model training data set and Text eigenvector to described degree of depth Boltzmann machine model training, the degree of depth Boltzmann machine model that obtains training, concrete training process is as follows:

(S54) using the second hidden layer and label layer as limited Boltzmann machine, input using the end-state of the second hidden layer as the second hidden layer, input using Text eigenvector as label layer, adopt the training of BP neural net method to obtain the connection weight of the second hidden layer and label layer, complete the training of degree of depth Boltzmann machine model.

The above; be only the specific embodiment of the present invention, but protection scope of the present invention is not limited to this, any be familiar with those skilled in the art the present invention disclose technical scope in; the variation that can expect easily or replacement, within all should being encompassed in protection scope of the present invention.

Claims

1. the remote sensing image automatic marking method based on degree of depth study, is characterized in that, comprising:

(S1) create the label dictionary that comprises several text labels;

(S4) build degree of depth Boltzmann machine model, described degree of depth Boltzmann machine model comprises visible layer, the first hidden layer, the second hidden layer and label layer from bottom to up successively, any two nodes in each layer are without connection, the two-way connection of any two nodes between adjacent layer;

2. the remote sensing image automatic marking method based on degree of depth study as claimed in claim 1, is characterized in that, the interconnection network that the second described hidden layer and label layer form is unidirectional BP neural network.

3. the remote sensing image automatic marking method based on degree of depth study as claimed in claim 2, is characterized in that, the training process of the degree of depth Boltzmann machine model in described step (S5) is as follows:

4. the remote sensing image automatic marking method based on degree of depth study as claimed in claim 3, is characterized in that, the node number of described visible layer is identical with the dimension of visual feature vector.

5. the remote sensing image automatic marking method based on degree of depth study as claimed in claim 4, is characterized in that, the node number of described label layer is identical with the dimension of Text eigenvector.

6. the remote sensing image automatic marking method based on degree of depth study as claimed in claim 5, is characterized in that, the dimension of described Text eigenvector is identical with the number of label dictionary Chinese version label.

7. the remote sensing image automatic marking method based on degree of depth study as described in arbitrary claim in claim 1～6, it is characterized in that, described low-level image feature vector comprises averaged spectrum Reflectivity vector, color layout's description vectors, color structure description vectors, scalable color description vector, homogeneity texture description vector, edge histogram description vectors, GIST proper vector and the visual word bag vector based on SIFT feature.

8. the remote sensing image automatic marking method based on degree of depth study as claimed in claim 7, it is characterized in that, described averaged spectrum Reflectivity vector comprises the spectrum average reflectance of four kinds of wavelength, and four kinds of described wavelength are respectively 0.44～0.51 micron, 0.53～0.62 micron, 0.63～0.70 micron and 0.74～0.80 micron.

9. the remote sensing image automatic marking method based on degree of depth study as claimed in claim 8, is characterized in that, the visual word bag vector based on SIFT feature obtains by following steps:

(b) all SIFT proper vectors are carried out to cluster and obtain 500～1000 cluster centres;