WO2020232905A1

WO2020232905A1 - Superobject information-based remote sensing image target extraction method, device, electronic apparatus, and medium

Info

Publication number: WO2020232905A1
Application number: PCT/CN2019/103702
Authority: WO
Inventors: 王俊; 高鹏; 谢国彤
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-05-20
Filing date: 2019-08-30
Publication date: 2020-11-26
Also published as: CN110287962A; CN110287962B

Abstract

The present application relates to the technical field of image processing, and discloses a superobject information-based remote sensing image target extraction method. The method comprises: acquiring a remote sensing image; performing segmentation on the remote sensing image, and obtaining multiple segmented basic units of the remote sensing image; extracting image features of the segmented basic units, forming a first feature vector, combining superobject feature information of a target to be extracted from the segmented basic units, and forming a second feature vector; fusing the first feature vector and the second feature vector to form a fused feature vector; inputting the fused feature vector into a trained neural network model; and outputting, by means of the neural network model, a target type corresponding to the segmented basic units. The present application further discloses a device, an electronic apparatus, and a storage medium. The present application incorporates superobject feature information of a target to be extracted, and sufficiently utilizes of an image semantic feature and dimension information of the target, thereby enhancing effectiveness and accuracy of target extraction from remote sensing images.

Description

Remote sensing image target extraction method, device, electronic equipment and medium based on super-object information

This application requires the priority of the patent application whose application number is 201910418494.8, the filing date is May 20, 2019, and the invention-creation title is "Method, device and medium for extracting remote sensing image targets based on super-object information".

Technical field

This application relates to the field of image processing technology, and in particular to a remote sensing image target extraction method, device, electronic device, and medium based on super-object information.

Background technique

The goal of remote sensing is to extract information from images and acquire knowledge. Remote sensing image target recognition is generally performed on artificial features, not only based on their spectral characteristics, but also to a large extent based on the target shape, spatial semantic relations, etc. The data source is high space Resolution aerial imagery and satellite imagery. Artificial features are an important element in the spatial geographic information database. Artificial features mainly include buildings, bridges, roads and large-scale engineering structures (such as airports). With the continuous improvement of the resolution of remote sensing images, the information in the image is more complex, and the texture and shape information of the ground features become more diversified. For buildings, the size and shape of the building are different. The extraction of object targets only analyzes the target features at a single scale, and only focuses on the relevant features of the current target itself. It requires more manual feature design, selection, and trial and error, which leads to excessive reliance on feature design, reduced extraction automation, and accuracy It is difficult to break through bottlenecks. In addition, the current extraction methods for building targets are one-sided in carrying out target extraction at a single scale, and insufficient use of background context knowledge on multiple levels of remote sensing images, ignoring contextual visual cues that are more important for target identification, and prioritizing visual cognition. The effective use of experimental knowledge and image context information is unreasonable, resulting in low extraction accuracy and automation.

Summary of the invention

The present application provides a remote sensing image target extraction method, device, electronic device, and medium based on super-object information to solve the problem of low extraction accuracy and automation caused by only single-scale target feature analysis in the prior art.

In order to achieve the above objective, one aspect of this application is to provide a remote sensing image target extraction method based on super-object information, including: acquiring a remote sensing image; segmenting the remote sensing image to obtain multiple segmentation basic units of the remote sensing image Extract the image features of the basic segmentation unit to form a first feature vector, and combine the super-object feature information of the target to be extracted in the basic segmentation unit to form a second feature vector; combine the first feature vector with the first feature vector The two feature vectors are fused to form a fusion feature vector; the fusion feature vector is input to a trained neural network model; the neural network model is used to output a target category corresponding to the segmentation basic unit.

In order to achieve the above object, another aspect of the present application is to provide an electronic device, the electronic device includes: a processor; a memory, the memory includes a remote sensing image target extraction program, the remote sensing image target extraction program is processed When the device is executed, the steps of the remote sensing image target extraction method described above are realized.

In order to achieve the above objective, another aspect of the present application is to provide a computer non-volatile readable storage medium, the computer non-volatile readable storage medium includes a remote sensing image target extraction program, the remote sensing image target extraction When the program is executed by the processor, the steps of the remote sensing image target extraction method described above are realized.

In order to achieve the above objective, the fourth aspect of the present application is to provide a remote sensing image target extraction device based on super-object information, including: an acquisition module to acquire remote sensing images; a segmentation module to segment the remote sensing images to obtain the Multiple segmentation basic units of remote sensing images; a feature extraction module that extracts the image features of the segmentation basic unit to form a first feature vector, and combines the super-object feature information of the target to be extracted in the segmentation basic unit to form a second feature vector Feature fusion module, fusion of the first feature vector and the second feature vector to form a fusion feature vector; input module, input the fusion feature vector into the trained neural network model; output module, through the neural network The model outputs the target category corresponding to the segmentation basic unit.

Compared with the prior art, this application has the following advantages and beneficial effects:

When extracting remote sensing images, this application incorporates the super-object feature information of the target to be extracted, and combines the deep learning of the neural network model to process and map the features layer by layer, so as to realize the semantic features and image semantics of the target to be extracted. Full use of scale information; abstract specific and unexplicit knowledge about the target to be extracted from the pixel representation at multiple levels, overcome the problem of discretization of target information due to remote sensing observation and imaging, and improve remote sensing image target extraction Effectiveness and accuracy.

Description of the drawings

FIG. 1 is a schematic flowchart of a method for extracting a remote sensing image target based on super-object information according to this application;

Figure 2 is a schematic diagram of feature extraction of remote sensing images in this application;

FIG. 3 is a schematic structural diagram of an embodiment of the neural network model described in this application;

Figure 4 is a schematic diagram of the structure of the first neural network unit in this application;

Figure 5 is a schematic diagram of the structure of the second neural network unit in this application;

Figure 6a is a schematic diagram of an original remote sensing image I in this application;

Figure 6b is a schematic diagram of the building extraction result of the remote sensing image I in this application;

Figure 7a is a schematic diagram of another original remote sensing image II in this application;

Figure 7b is a schematic diagram of the building extraction result of the remote sensing image II in this application;

Fig. 8 is a schematic diagram of a remote sensing image target extraction device based on super-object information in this application.

Detailed ways

The embodiments of the present application will be described below with reference to the drawings. A person of ordinary skill in the art may realize that the described embodiments can be modified in various different ways or combinations thereof without departing from the spirit and scope of the present application. Therefore, the drawings and description are illustrative in nature, and are only used to explain the application, rather than to limit the protection scope of the claims. In addition, in this specification, the drawings are not drawn to scale, and the same reference numerals denote the same parts.

Figure 1 is a schematic flow chart of the remote sensing image target extraction method based on super-object information in this application. As shown in Figure 1, the remote sensing image target extraction method includes:

Step S1, acquiring remote sensing images;

Step S2, segment the remote sensing image to obtain multiple segmentation basic units of the remote sensing image;

Step S3, extract the image features of the basic segmentation unit to form a first feature vector, and combine the super-object feature information of the target to be extracted in the basic segmentation unit to form a second feature vector;

Step S4, fusing the first feature vector and the second feature vector to form a fusion feature vector;

Step S5, input the fusion feature vector into the trained neural network model;

In step S6, the target category corresponding to the segmentation basic unit is output through the neural network model, and the target extraction is realized through the classification method, and the target to be extracted is distinguished from other categories.

The remote sensing image target extraction method of this application integrates the super-object feature information of the target to be extracted during target extraction, and combines the deep learning of the neural network model to realize the full utilization of the image semantic features and scale information of the extracted target. Abstracting specific, unexplicitly, pixel-based knowledge about the target at this level, overcoming the problem of discretization of target information caused by remote sensing observation and imaging, and improving the effectiveness and accuracy of target extraction from remote sensing images.

This application can be used to extract building targets in remote sensing images, as well as other types of features in remote sensing images, such as bridges and roads.

In this application, the region growing method is used to segment remote sensing images at multiple scales and levels, and the basic segmentation unit obtained according to the segmentation result can be referred to as "primitive" for short. According to the definition of image segmentation: "divide the image in the scene into sub-areas that do not overlap each other". Primitives are the segmentation of remote sensing images to make homogeneous pixels form primitive objects of different sizes. Each primitive object has attributes such as spectrum, shape, texture, and spatial topological relationship, and has geological semantics. Different primitive object categories can be distinguished by attribute characteristics, such as buildings and other types of objects. In this application, the target category corresponding to the segmentation basic unit can be output through the neural network model. For example, when the target to be extracted is a building, the neural network model in this application can output buildings or other feature categories (including roads). , Water or forest, etc.) to extract buildings in remote sensing images.

Figure 2 is a schematic diagram of remote sensing image feature extraction in this application. As shown in Figure 2, preferably, the step of extracting the image features of the basic segmentation unit to form the first feature vector includes: segmenting the segmentation basic unit using a region growing method to obtain Multiple first sub-images;

Arranging the plurality of first sub-images from the bottom to the top according to the order of the multispectral bands;

Extract image features from multiple first sub-images in a bottom-up order to form a first feature vector (also called target feature vector), where the image features are extracted from the original spectrum-space joint information, and the extracted image Features include spectrum, texture, and shape and other spectrum-space multiple structural features, and different features have different spectrum and spatial information.

The existing target extraction of remote sensing images usually only pays attention to the relevant features of the current target itself, and the extraction task is carried out on this basis. The information source used in this bottom-up extraction mode is limited to the target itself and ignores the target's location. For example, vehicles usually appear on the road or in the parking lot. At this time, the road or parking lot is the super object of the vehicle, or called the parent object. The feature information of the road or parking lot in the remote sensing image is the super object. Object characteristic information. From a cognitive perspective, the features of the background of the target to be extracted (that is, the super-object of the target to be extracted after image segmentation and merging) are closely related to the intrinsic properties of the target to be extracted, and a specific target is usually associated with a specific Super-objects (for example, vehicles usually appear on roads, and when the target to be extracted is a vehicle, the associated specific super-object is a road). Super-object information can be used as the source of information for target extraction and detection (also called context information). The scenes where these spectral features are confused are even more helpful for pattern discrimination than the existing features of the target itself. In this application, in the extracted spectral-space multiple structural features of the multi-scale homogeneous segmentation unit with geo-semantics, the super-object feature information is fused to realize the full utilization of the image semantic features and scale information.

In this application, in the process of forming the second feature vector (also known as the super-object feature vector), the feature fusion method of vector stacking (VS) is adopted (the features of each second sub-image are vertically stacked, and each The basic unit of segmentation has its corresponding multi-level segmentation super-object feature information), associate the target to be extracted at the over-segmentation level with its super-object, and top-down the merged two or even multi-level super-objects in the same position. The object features are superimposed and assigned to the low-level sub-images, and the classification and extraction are carried out on the lowest-level sub-images, and the classification of the basic unit of segmentation (buildings or other features) is output.

As shown in Figure 2, preferably, in combination with the super-object feature information of the target to be extracted in the segmentation basic unit, the step of extracting the second feature vector includes:

Multi-level segmentation and merging of the basic unit of segmentation by setting different region growth and merging thresholds to obtain multiple levels of second sub-images. According to the set region growth merging thresholds, the target to be extracted at each over-segmentation level Respectively associate with the corresponding super-objects, and form the super-object feature information of the target to be extracted at the same position in multiple levels of sub-images after the association;

Arranging the second sub-images of multiple levels from top to bottom in the order of the region growth and merging threshold value from large to small;

Respectively determine the super-object feature information corresponding to the target to be extracted in the second sub-image of each level;

Feature fusion of the super-object feature information at the same position in the second sub-images of multiple levels from top to bottom, and fusion to the second sub-image at the bottom;

The second feature vector is extracted according to the second sub-image at the bottom.

In this application, as shown in Figure 2, when the first feature vector and the second feature vector are formed, after multi-scale and multi-level segmentation is performed on the remote sensing image, at each scale level, the sub-image of the current scale is used as the target Perform feature extraction, and perform vector superposition and fusion of the target features of the first sub-image and its corresponding second sub-image (combined with super-object information), and then input the fused feature vector into the neural network model.

Preferably, the step of separately determining the super-object feature information corresponding to the target to be extracted in the second sub-image of each level includes:

Divide the second sub-images of each level into blocks, and divide them into multiple third sub-images;

By formula

Obtain the similarity between multiple first sub-images and multiple third sub-images, where s _1,3 represents the similarity between multiple first sub-images and one third sub-image, (x ₁ ,x ₂ ,... ., x _d ) are the first feature vectors of multiple first sub-images, (y ₁ , y ₂ ,..., y _d ) are the feature vectors of a third sub-image;

The similarities between the multiple first sub-images and the multiple third sub-images constitute a similarity matrix;

Iteratively update the attractiveness and attribution among multiple third sub-images through the similarity matrix to obtain the third sub-image that meets the conditions that the autocorrelation attribution and autocorrelation attractiveness are both greater than 0;

Obtain the sum of the attribution and attraction of each third sub-image that meets the conditions and other third sub-images;

The other third sub-images corresponding to the maximum value of the sum of each eligible third sub-image are taken as the cluster centers of each eligible third sub-image, so as to obtain the third sub-image of each eligible third sub-image Clustering center, cluster the third sub-images that belong to the same cluster center into one category, and use the feature information of each category after clustering as the super-object corresponding to the target to be extracted in the second sub-image of the level Characteristic information.

The auto-encoder model includes a single-hidden-layer auto-encoder model and a multi-hidden-layer auto-encoder model. The autoencoder (AutoEncoder, AE) generally refers to an encoder structure with one hidden layer (ie, single Hidden-layer autoencoder), the single-hidden-layer autoencoder is a neural network that reproduces the input signal as much as possible, including an input layer for inputting the original feature vector, a hidden layer for feature conversion, and a The output layer that matches the input layer and is used for information reconstruction. The output vector of the AE has the same dimension as the input vector, and it often learns a data representation through a hidden layer or effectively encodes the original data according to a certain form of the input vector. The main goal of the autoencoder is to make the input value and the output value equal, so first use the connection weight between the input layer and the hidden layer (the weight of the coding layer) to encode the input, after the activation function, use the hidden layer and the output The connection weight between the layers (the weight of the decoding layer) is decoded, and the weights of the encoding layer and the decoding layer are usually taken as transposed matrices. Through the process of encoding and decoding, the input value and output value remain unchanged .

It is worth noting that this autoencoder is a non-linear feature extraction method that does not use class labels. As far as the method itself is concerned, the purpose of this feature extraction is to retain and obtain a better information representation, not to perform classification Task, although sometimes these two goals are related.

In addition to the aforementioned single hidden layer auto-encoder model structure, there are several other variant structures of auto-encoders. When the number of hidden layers is greater than 1, the autoencoder is regarded as a deep structure, and it is called Stacked Denoising Auto-encoders (SDA). Introduce random noise in the visual layer of the neural network (that is, the input layer), and then perform encoding and decoding to restore the data or features of the input layer, and the denoise autoencoder (DAE) is obtained. Modeling a stacked restricted Boltzmann machine (RBM) to form a deep confidence network (DBN) method can realize a stacked autoencoder (Stacked AutoEncoder).

The stacked autoencoder model is composed of multiple autoencoders stacked in series. The purpose of the stacked multi-layer autoencoder is to extract the high-order features of the input data layer by layer. In this process, the dimensionality of the input data is reduced layer by layer, and a complex input data is transformed into a series of simple high-order features. Then input these high-level features into a classifier or clusterer for classification or clustering.

Preferably, the neural network model is a stacked noise reduction autoencoder model, including an input layer, multiple hidden layers and an output layer. In this application, by inputting the fusion feature vector into the neural network model, the bottom-up mapping process from the original input to the hidden feature space and the top-down implicit feature mapping process from the output result to the original input are realized Combined.

Preferably, the training step of the neural network model includes:

Select training samples, which are selected from multiple basic segmentation units obtained after segmenting the remote sensing image, and each selected basic segmentation unit is used as a training sample;

Obtain the fusion feature vector of the training sample;

Input the fusion feature vector of the training sample into the neural network model, perform pre-training on the neural network model, and obtain the initial parameters of the neural network model (where the parameters include the connection weight and bias between each connection layer);

According to the initial parameters, the neural network model is back-tuned and trained.

Among them, three issues should be considered when selecting training samples. First, the size of the reference object relative to the target to be extracted in the basic unit of segmentation. If the reference object is too large, the "mixed target" will be selected. Therefore, the reference object must Choose an appropriate size (for example, when extracting a door lock in a remote sensing image, if you select a door handle or a door as the reference object, you can extract the door lock. If you select the wall where the door is installed as the reference object, the reference object is too large. When the door is locked, mixed targets including walls, doors and windows on the wall will be extracted). Second, the selection of scale factors. Generally speaking, the larger the training area (choose higher-scale segmentation basic units), the higher the classification accuracy, but the time cost and economic cost must also be considered. Therefore, this application is selected The target extraction is performed in the segmentation basic unit of the smaller scale level, and the merged super-object feature information is added at the same time. Third, the selection of features. The image features used for remote sensing image classification and extraction are mainly divided into three categories: shape features, texture features and spectral features. In this application, the following image features are selected: multi-band spectral gray value and variance; Area, shape index, aspect ratio, rectangularity, roundness, density; the contrast, correlation and entropy of the gray-level co-occurrence matrix based on the near-infrared band; and the normalized vegetation index NDVI and normalized water index NDWI. In an embodiment of the present application, 100 samples are visually interpreted in the entire image at random, training samples are selected from the interpreted 100 samples, and the types of samples are: buildings, roads, forests, water bodies, etc. Feature categories, select one as the extraction target, and the others as non-extraction targets. For example, if you select a building as the extraction target, the neural network model will output the categories as buildings and non-buildings. If you select a road as the extraction target, pass The neural network model output categories are road and non-road.

The result of pre-training is used as the initial weight of the neural network model, and then the parameters are fine-tuned through the BP backpropagation algorithm. In pre-training, SDA can be seen as many layers of AE autoencoders connected, using Layer-wise layer-wise greedy algorithm for unsupervised network learning, in fine-tuning SDA can be seen as a regular multi-layer perceptron for supervised learning.

For a single hidden layer autoencoder, one of many variants of the BP backpropagation algorithm is usually used for training (for example, the stochastic gradient descent method). However, if it is still applied to a multi-hidden layered stacked noise reduction autoencoder network, the back-propagation training method will cause some problems: after the first few layers, the error will become extremely small, and the training will also Then it becomes invalid. This application uses each layer as a simple automatic decoder for pre-training and then stacking, which greatly improves the training efficiency and training effect.

Preferably, pre-training the neural network model, and the step of obtaining the initial parameters of the neural network model includes:

Divide the neural network model into multiple auto-encoder units;

Pre-training each auto-encoder unit separately;

Obtain the parameters of each auto-encoder unit through the pre-training results;

Randomly initialize the parameters between the output layer of the neural network model and the upper connection layer;

The pre-training results and the parameters obtained by random initialization are used as the initial parameters of the neural network model.

Preferably, dividing the neural network model into a plurality of autoencoder units includes: each hidden layer in the neural network model and the upper layer of the hidden layer constitute an autoencoder unit; the number of divided autoencoder units and the neural network The number of hidden layers in the network model is equal. Each divided autoencoder unit includes two connected layers. The first autoencoder unit divided includes the input layer in the neural network model and an adjacent hidden layer. The other auto-encoder units of all include two hidden layers in the neural network model, and the hidden layer of the first auto-encoder unit serves as the input layer of the second auto-encoder unit, and the hidden layer of the second auto-encoder unit The layer is used as the input layer of the third auto-encoder unit, and so on, the neural network model is divided into multiple auto-encoder units.

Pre-training each autoencoder unit separately includes:

Add a connection layer to each autoencoder unit as the relative output layer of the autoencoder unit to construct multiple neural network units. Each neural network unit includes a relative input layer, a relative hidden layer, and a relative output layer. The pre-training of the auto-encoder unit is realized by pre-training the neural network unit. When a stacked auto-encoder is formed, the relative output layer of each neural network unit is removed for stacking;

Pre-train the first neural network unit;

The relative hidden layer of the first neural network unit obtained by pre-training is used as the relative input layer of the next neural network unit, and a connection layer is added as the relative output layer of the next neural network unit, for the next neural network unit Perform pre-training to complete the pre-training of each neural network unit in turn, that is, complete the pre-training of each auto-encoder unit in turn, and obtain the parameters of each auto-encoder (including the two connection layers in the auto-encoder). Connection weights and biases).

This application uses the semi-supervised neural network model constructed with the denoising autoencoder on the basis of adding the context information of the primitive super-object, and uses the layer-by-layer initialization pre-training to train the multilayer network structure in turn to realize end-to-end unsupervised Feature learning and expression avoids the manual feature analysis and selection steps that require a lot of research in existing machine learning methods.

Take the constructed noise reduction autoencoder model including two hidden layers as an example to further illustrate the training process of the neural network model.

Figure 3 is a schematic structural diagram of an embodiment of the neural network model of this application. As shown in Figure 3, the neural network model includes an input layer, two hidden layers and an output layer, which are divided into two autoencoder units (respectively The first DA unit and the second DA unit), the first autoencoder unit includes the input layer and one hidden layer of the neural network model, and the second autoencoder unit includes two hidden layers of the neural network model. During pre-training, the two auto-encoder units constitute two neural network units respectively, and the two neural network units are pre-trained in sequence to complete the pre-training of the parameters in the two auto-encoder units in sequence.

Figure 4 is a schematic diagram of the structure of the first neural network unit in this application. As shown in Figure 4, a connection layer is added to the first autoencoder unit as the relative output layer of the first DA unit to form the first A neural network unit, train the first neural network unit to obtain the parameters W ₁ and b _{1 of the} first DA unit.

Preferably, the step of pre-training the first autoencoder unit includes:

Input the fusion feature vector of the training sample into the relative input layer of the first neural network unit;

Initially assign the parameters of the first neural network unit, including the connection weight and bias between the relative input layer and the relative hidden layer, and between the relative hidden layer and the relative output layer;

Obtain the output of the relative hidden layer and relative output layer in the first neural network unit through the following equations (1) and (2):

h(y)=σ(W ₁ y+b ₁ )(1)

Among them, W ₁ is the weight value between the relative input layer and the relative hidden layer in the first neural network unit, b ₁ is the bias between the relative input layer and the relative hidden layer in the first neural network unit,

Is the weight value between the relative hidden layer and the relative output layer in the first neural network unit, b ₁₁ is the bias between the relative hidden layer and the relative output layer in the first neural network unit,

Is the output of the relative output layer in the first neural network unit, h(y) is the output of the relative hidden layer in the first neural network unit, y is the input feature vector polluted by noise, and σ(·) is the excitation function , Choose the sigmoid function.

Training the neural network unit based on the minimum loss function, the loss function is shown in the following equation (3):

(W ₁ , b ₁ , b ₁₁ )←argmin(J(W ₁ , b ₁ , b ₁₁ ))

Among them, J is the loss function, X is the original input feature vector not polluted by noise, i is the index of the neuron in the relative output layer of the first neural network unit, and n is the relative output layer of the first neural network unit The number of neurons,

Is the output relative to the output layer in the first neural network unit, and X _i is the original input feature relative to the i-th neuron in the output layer that is not contaminated by noise;

Update the weight value and bias of the neural network unit according to the following equations (4) ~ (9) until the loss function is the smallest,

b′ ₁₁ = b ₁₁ +Δb _m (5)

b′ ₁ = b ₁ +Δb _n (6)

Where J is the loss function, i is the index of the neuron in the relative output layer, j is the index of the neuron in the relative hidden layer,

To update the weight value between the relative output layer and the relative hidden layer in the first neural network unit before updating,

Is the weight value between the relative output layer and the relative hidden layer in the first neural network unit after the update, ΔW _i,j is the _ith neuron relative to the output layer and the relative hidden layer in the first neural network unit The weight error between j neurons, b ₁₁ is the bias between the relative output layer and the relative hidden layer in the first neural network unit before the update, b′ ₁₁ is the relative output in the first neural network unit after the update The offset between the layer and the relative hidden layer, Δb _m is the offset error between the relative output layer and the relative hidden layer in the first neural network unit, b ₁ is the relative hidden layer in the first neural network unit before the update The offset between the relative input layer and the relative input layer, b′ ₁ is the offset between the relative hidden layer and the relative input layer in the first neural network unit after the update, and Δb _n is the relative hidden layer and the relative input layer in the first neural network unit. Relative to the bias error between the input layers, ε is the learning rate,

Is the output of the relative output layer in the first neural network, and h(y) is the output of the relatively hidden layer in the first neural network unit.

After pre-training the first neural network unit, remove the relative output layer of the first neural network unit and its corresponding weight value

And the bias b ₁₁ , only the weight value W ₁ and the bias b ₁ between the relative input layer and the relative hidden layer are retained as the parameters of the first auto-encoder unit.

Figure 5 is a schematic diagram of the structure of the second neural network unit in this application. As shown in Figure 5, a connection layer is added to the second autoencoder unit as the relative output layer of the second DA unit, and the first The relative hidden layer of a neural network unit is used as the relative input layer of the second neural network unit to form the second neural network unit. The second neural network unit is trained to obtain the parameters W ₂ and the first DA unit b ₂ . When pre-training the second autoencoder unit, obtain the output of the relative hidden layer and relative output layer of the second neural network unit through the following equations (10) and (11):

h(h(y))=σ(W ₂ h(y)+b ₂ ) (10)

Among them, W ₂ is the weight value between the relative input layer and the relative hidden layer in the second neural network unit, b ₂ is the bias between the relative input layer and the relative hidden layer in the second neural network unit,

Is the weight value between the relative hidden layer and the relative output layer in the second neural network unit, b ₂₂ is the bias between the relative hidden layer and the relative output layer in the second neural network unit,

Is the output of the relative output layer in the second neural network unit, h(h(y)) is the output of the relatively hidden layer in the second neural network unit, h(y) is the relative input layer in the second neural network unit The input of σ(·) is the excitation function, and the sigmoid function is selected.

Training the neural network unit based on the minimum loss function, the loss function is shown in the following equation (12):

(W ₂ ,b ₂ ,b ₂₂ )←argmin(J(W ₂ ,b ₂ ,b ₂₂ ))

Where J is the loss function, i is the index of the neurons in the relative output layer of the second neural network unit, n is the number of neurons in the relative output layer of the second neural network unit,

Is the output of the i-th neuron in the relative output layer in the second neural network unit, h(X _i ) is the original input feature of the i-th neuron in the second neural network unit relative to the output layer that is not contaminated by noise.

Update the weight value and bias of the neural network unit according to the following formulas (13)～(18) until the loss function is the smallest,

b′ ₂₂ = b ₂₂ +Δb _m′ (14)

b′ ₂ = b ₂ +Δb _n′ (15)

To update the weight value between the relative output layer and the relative hidden layer in the second neural network unit before,

Is the weight value between the relative output layer and the relative hidden layer in the updated second neural network unit, ΔW _i,j is the _ith neuron relative to the output layer and the relative hidden layer in the second neural network unit The weight error between j neurons, b ₂₂ is the bias between the relative output layer and the relative hidden layer in the second neural network unit before the update, b′ ₂₂ is the relative output in the second neural network unit after the update The offset between the layer and the relative hidden layer, Δb _m′ is the offset error between the relative output layer and the relative hidden layer in the second neural network unit, b ₂ is the relative hidden layer in the second neural network unit before the update The bias between the layer and the relative input layer, b′ ₂ is the bias between the relative hidden layer and the relative input layer in the second neural network unit after the update, Δb _n′ is the relative hidden layer in the second neural network unit The bias error between the layer and the relative input layer, ε is the learning rate,

Is the output of the relative output layer in the second neural network, h(h(y)) is the output of the relatively hidden layer in the second neural network unit, h(y) is the relative input layer in the second neural network unit enter.

After the second neural network unit is pre-trained, remove the relative output layer and corresponding weight corresponding to the second neural network unit

And the bias b ₂₂ , only keep the weight W ₂ and the bias b ₂ between the relative input layer and the relative hidden layer in the second neural network unit, as the parameters of the second autoencoder unit, and form a stack In the case of an automatic encoder, stack it on the first automatic encoder unit.

By analogy, the pre-training of multiple neural network units is completed, and the parameters of each auto-encoder unit are obtained.

When a stacked autoencoder model is formed by multiple autoencoder units, an output layer is added above the hidden layer of the last autoencoder unit, and the weight value W ₃ and bias b ₃ of the output layer are randomly initialized , Decoding and restoring, get neural network model and model parameters.

After the pre-training of multiple autoencoder units is completed, the last thing to be done is the overall reverse tuning training. The loss function for tuning training can also use the above-mentioned loss function, and use the gradient descent method from top to bottom (for a neural network model that includes two hidden layers, there are only two layers of backward error propagation during pre-training, but the reverse Back-propagation to the error during tuning training is three layers) to update the weight and bias value.

In an optional embodiment, the step of selecting training samples for the neural network model includes:

Establish a tag library, which stores different tags and tag sequences corresponding to different targets;

Build a picture library, store the pictures that have been determined to contain the target and the corresponding tag sequence, the tag sequence is formed according to the tag sequence in the tag library, the position corresponding to the target in the picture is 1, and the position corresponding to the non-existent target is 0 Tag sequence

Screen the first set number of pictures with known tag sequences from the picture library to construct a feature set;

According to the tag sequence of the feature set, determine the identification tag collection of all the tags that need to be identified in the training set, where the order of the tags in the identification tag collection is consistent with the order of the tags in the tag library;

Select the second set number of positive samples and the third set number of negative samples for each label in the total set of identification labels from the image library to form the training set and the validation set. Among them, the positive sample of a label is the target that contains the corresponding target of the label. A picture, a negative sample of a label is a picture that does not contain the target corresponding to the label, the training set is a positive sample and a negative sample, and the validation set is a label sequence of positive and negative samples.

Preferably, the step of performing reverse tuning training on the neural network model according to the initial parameters includes:

The multiple positive samples of the training set of the first label in the label library are sequentially input into the pre-trained neural network model, where the fusion feature of each positive sample is input to the input layer, and the prediction vector of the output label of the output layer is obtained. The average value of the loss function of multiple positive samples is used as the loss value of the label;

Reversely update the parameters of the neural network according to the loss value of the predicted label sequence of the first label in the identification label set and the label sequence of the corresponding verification set;

Repeat the above two steps until the training of the last tag in the tag library is completed;

Repeat the above three steps to input the negative samples in the tag library in the order of tags into the neural network structure in turn, and update the parameters of the neural network.

Suppose the population size is P, randomly generate the initial population of P individuals, G=(G ₁ , G ₂ ,..., G _p ) ^T , select the random real numbers in the set symmetry interval to form a real number vector of length S, in the population Individual G _O = (g ₁ , g ₂ ,..., g _S ), O = 1, 2,..., P, S = n*l+l*m+l+m, n is the number of nodes in the input layer, l the number of nodes in the hidden layer, m is the number of nodes of the output layer, g _s is the subject of the G _O S gene;

Take each individual gene as the initial assignment of the hidden layer and output layer connection parameters of the neural network model, the initial input layer and hidden layer connection parameters, the initial hidden layer threshold parameter and the initial output layer threshold parameter, The samples belonging to each label are substituted into the hidden layer of the neural network and the output model of the output layer for training, and the output of each node of the output layer corresponding to each sample is obtained, thereby obtaining the fitness of each individual, where:

among them,

A plurality of tags average loss of samples during the initial parameters of the neural network is assigned an individual G _O,

G is the initial population of individual fitness of _O G;

Using a roulette operator, a selection strategy based on the fitness ratio is used to select individuals in the initial population, and the selected individuals G _{u are obtained} ;

The single-point crossover operator is used to perform crossover update on selected individuals. The maximum value of each gene after the update is used as the upper bound of the gene, and the minimum value of each gene after the update is taken as the lower bound of the gene;

The mutation operation is performed on the selected individuals that have undergone cross-update, and the mutated individuals are obtained, which are substituted into the individual evaluation subunit to evolve the initial population, where:

Among them, g _j is the jth gene of the individual G _u selected, g _jmax and g _jmin are the upper and lower bounds of the gene g _j , and r _q is the pseudo-random number generated for the qth time when the individual G _u is selected, Iter _now is the current evolution algebra, iter _max is the maximum evolution algebra set, g _j 'is the jth gene of the individual G _u selected after evolution.

Judge that the change of individual fitness value after evolution is less than the set target value;

If it is less than the set target value, output the optimal population individual as the final initial value of the connection parameters of the hidden layer and the output layer, the connection parameters of the input layer and the hidden layer, the hidden layer threshold parameter and the output layer threshold parameter ；

If it is not less than the set target value, the initial population after evolution is assigned to the parameters of the neural network model, and the above steps are repeated until the individual fitness value change after evolution is less than the set target value.

In this application, a number of high-resolution remote sensing images are combined to conduct comparative experiments and analysis to verify the effectiveness and accuracy of the remote sensing image target extraction method of this application, as shown in Figs. 6a-7b.

In order to verify the effectiveness and accuracy of this method, linear discriminant analysis (LDA), linear regression model (LR), statistical learning model (Support Vector Machine, SVM), integrated learning are used in the same environment Compare the model (Random Forest, RF), Extreme Learning Machine (ELM), Multi-Layer Perceptron (MLP) and the Deep Neural Networks (DNN) of this application, Table 1 and Table 2 are comparisons of the extraction accuracy evaluation of the above-mentioned methods and the original remote sensing image I and the original remote sensing image II of the application. The results show that the deep neural network method of the application has the highest cross-validation accuracy.

Table 1 Accuracy evaluation results of remote sensing image I experiment

Table 2 Accuracy evaluation results of remote sensing image II experiment

Figure 8 is a schematic diagram of a remote sensing image target extraction device based on super-object information in this application. As shown in Figure 8, the remote sensing image target extraction device includes an acquisition module 1, a segmentation module 2, a feature extraction module 3, a feature fusion module 4, and input Module 5 and output module 6, where:

Acquisition module 1, to acquire remote sensing images;

Segmentation module 2, to segment the remote sensing image to obtain multiple segmentation basic units of the remote sensing image;

The feature extraction module 3 extracts the image features of the basic segmentation unit to form a first feature vector, and combines the super-object feature information of the target to be extracted in the basic segmentation unit to form a second feature vector;

The feature fusion module 4 fuses the first feature vector and the second feature vector to form a fusion feature vector;

Input module 5, input the fusion feature vector into the trained neural network model;

The output module 6 outputs the target category corresponding to the segmentation basic unit through the neural network model, realizes the target extraction through the classification method, and distinguishes the target to be extracted from other categories.

Preferably, the feature extraction module 3 includes: a first segmentation unit, which uses a region growing method to segment the basic segmentation unit to obtain a plurality of first sub-images; Arrange bottom-up; the first feature vector forming unit extracts image features from multiple first sub-images in a bottom-up order to form a first feature vector, where the image features are derived from the original spectrum-space joint information In the extraction, the extracted image features include spectrum, texture and shape and other spectrum-space multiple structural features. Different features have different spectrum and spatial information.

Preferably, the feature extraction module 3 further includes: a second segmentation unit, which performs multi-level segmentation and merging of the segmentation basic unit by setting different region growth and merging thresholds to obtain multiple levels of second sub-images, and grow according to the set region Combining the difference of thresholds, the target to be extracted at each over-segmentation level is respectively associated with the corresponding super-object, and after the association, the super-object feature information of the target to be extracted at the same position in multiple levels of sub-images is formed; the second arrangement Unit, which arranges the second sub-images of multiple levels from top to bottom in the order of the region growth and merge threshold from large to small; the super-object determination unit respectively determines that the second sub-images of each level correspond to the target to be extracted The feature information of the super-object; the feature fusion unit, which combines the feature information of the super-object at the same position in the second sub-image of multiple levels from top to bottom, and fuses it to the second sub-image at the bottom; the second feature vector The extraction unit extracts the second feature vector according to the second sub-image at the bottom layer.

In this application, when the first feature vector and the second feature vector are formed, after multi-scale and multi-level segmentation is performed on the remote sensing image, at each scale level, the sub-image of the current scale is used as the target for feature extraction, and The target features of the first sub-image and its corresponding second sub-image (combined with the super-object information) are subjected to vector superposition and fusion, and then the fused feature vector is input to the neural network model.

Preferably, the neural network model is a stacked noise reduction autoencoder model, including an input layer, multiple hidden layers and an output layer. In this application, by inputting the fusion feature vector into the neural network model, the bottom-up mapping process from the original input to the hidden feature space and the top-down hidden feature mapping process from the output result to the original input are realized Combined.

Preferably, it also includes a training module to train a neural network model. Training modules include:

Select the unit and select the training sample. The training sample is selected from the multiple segmentation basic units obtained after the remote sensing image is segmented. Each selected segmentation basic unit is used as a training sample. Among them, three key considerations are given to the selection of training samples for the unit. Aspects of the problem, such as the selection of training samples above, will not be repeated here;

The fusion feature vector obtaining unit obtains the fusion feature vector of the training sample;

The pre-training unit inputs the fusion feature vector of the training sample into the neural network model, pre-trains the neural network model, and obtains the initial parameters of the neural network model (where the parameters include the connection weights and biases between each connection layer);

The reverse tuning training unit performs reverse tuning training on the neural network model according to the initial parameters.

Preferably, the pre-training unit includes: dividing sub-units to divide the neural network model into multiple auto-encoder units; pre-training sub-units to pre-train each auto-encoder unit; DA unit parameter acquisition unit, through pre-training The training result obtains the parameters of each autoencoder unit; the initialization unit, which randomly initializes the parameters between the output layer of the neural network model and the upper layer of the connection layer; the initial parameter acquisition unit, the pre-training result and the parameters obtained by random initialization As the initial parameters of the neural network model.

Preferably, the divided subunits divide the neural network model in the following manner: each hidden layer in the neural network model and the upper layer of the hidden layer constitute an autoencoder unit; the number of divided autoencoder units and the neural network model The number of hidden layers in the middle is equal. Each divided autoencoder unit includes two connected layers. The first autoencoder unit divided includes the input layer in the neural network model and an adjacent hidden layer. The autoencoder unit includes two hidden layers in the neural network model, and the hidden layer of the first autoencoder unit is used as the input layer of the second autoencoder unit, and the hidden layer of the second autoencoder unit is used as The input layer of the third autoencoder unit, and so on, divide the neural network model into multiple autoencoder units.

The pre-training sub-unit separately pre-trains each autoencoder unit in the following way:

Pre-train the first neural network unit;

The remote sensing image target extraction method of this application is applied to electronic devices, which can be terminal devices such as televisions, smart phones, tablet computers, and computers.

The electronic device includes: a processor; a memory for storing a remote sensing image target extraction program; the processor executes the remote sensing image target extraction program to implement the following steps of the remote sensing image target extraction method:

Obtain remote sensing images;

Segment the remote sensing image to obtain multiple segmentation basic units of the remote sensing image;

Extract the image features of the basic segmentation unit to form a first feature vector, and combine the super-object feature information of the target to be extracted in the basic segmentation unit to form a second feature vector;

Fuse the first feature vector and the second feature vector to form a fusion feature vector;

Input the fusion feature vector into the trained neural network model;

The neural network model outputs the target category corresponding to the segmentation basic unit, and realizes the target extraction through the classification method to distinguish the target to be extracted from other categories.

Electronic equipment also includes network interfaces and communication buses. Among them, the network interface may include a standard wired interface and a wireless interface, and the communication bus is used to realize the connection and communication between various components.

The memory includes at least one type of readable storage medium, which can be a non-volatile storage medium such as a flash memory, a hard disk, an optical disk, or a plug-in hard disk, etc., and is not limited to this, and can be stored in a non-transitory manner Any device that provides instructions or software and any associated data files to the processor so that the processor can execute the instructions or software program. In this application, the software program stored in the memory includes a remote sensing image target extraction program, and the remote sensing image target extraction program can be provided to the processor so that the processor can execute the remote sensing image target extraction program to implement the steps of the remote sensing image target extraction method .

The processor can be a central processing unit, a microprocessor, or other data processing chips, etc., and can run a program stored in the memory, for example, the remote sensing image target extraction program in this application.

The electronic device may also include a display, which may also be called a display screen or a display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, and the like. The display is used to display the information processed in the electronic device and to display the visual work interface.

The electronic device may also include a user interface, and the user interface may include an input unit (such as a keyboard), a voice output device (such as a stereo, earphone), and the like.

In other embodiments, the remote sensing image target extraction program can also be divided into one or more modules, and one or more modules are stored in the memory and executed by the processor to complete the application. The module referred to in this application refers to a series of computer program instruction segments that can complete specific functions. The multiple modules of the remote sensing image target extraction program are roughly the same as the specific implementation of the remote sensing image target extraction device described above, and will not be repeated here.

In an embodiment of the present application, a computer non-volatile readable storage medium may be any tangible medium that contains or stores a program or instruction, the program can be executed, and the stored program instructs related hardware to realize the corresponding function. For example, the computer non-volatile readable storage medium may be a computer disk, hard disk, random access memory, read-only memory, etc. The present application is not limited to this, and can be any device that stores instructions or software and any related data files or data structures in a non-transitory manner and can be provided to the processor to enable the processor to execute the programs or instructions therein. The computer non-volatile readable storage medium includes a remote sensing image target extraction program. When the remote sensing image target extraction program is executed by the processor, the following remote sensing image target extraction method is realized: acquiring remote sensing images; segmenting remote sensing images to obtain remote sensing images Multiple segmentation basic units; extract the image features of the segmentation basic unit to form a first feature vector, combine the super-object feature information of the target to be extracted in the segmentation basic unit to form a second feature vector; merge the first feature vector and the second feature vector to form Fusion feature vector; input the fusion feature vector into the trained neural network model; output the target category corresponding to the segmentation basic unit through the neural network model, realize the target extraction through the classification method, and distinguish the target to be extracted from other categories.

The specific implementation of the computer non-volatile readable storage medium of the present application is substantially the same as the specific implementation of the above-mentioned remote sensing image target extraction method and device, and electronic equipment, and will not be repeated here.

It should be noted that in this article, the terms "including", "including" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, device, article or method including a series of elements not only includes those elements, It also includes other elements that are not explicitly listed, or elements inherent to the process, device, article, or method. If there are no more restrictions, the element defined by the sentence "including a..." does not exclude the existence of other identical elements in the process, device, article or method that includes the element.

The serial numbers of the foregoing embodiments of the present application are only for description, and do not represent the advantages and disadvantages of the embodiments. Through the description of the above embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by means of software plus the necessary general hardware platform. Of course, it can also be implemented by hardware, but in many cases the former is better.的实施方式。 Based on this understanding, the technical solution of this application essentially or the part that contributes to the existing technology can be embodied in the form of a software product, and the computer software product is stored in a storage medium (such as ROM/RAM) as described above. , Magnetic disk, optical disk), including several instructions to make a terminal device (which can be a mobile phone, a computer, a server, or a network device, etc.) execute the method described in each embodiment of the present application.

Claims

A remote sensing image target extraction method based on super-object information, applied to electronic equipment, characterized in that it includes:

Obtain remote sensing images;

Segmenting the remote sensing image to obtain multiple segmentation basic units of the remote sensing image;

Extracting image features of the basic segmentation unit to form a first feature vector, and combining the super-object feature information of the target to be extracted in the basic segmentation unit to form a second feature vector;

Fusing the first feature vector and the second feature vector to form a fusion feature vector;

Input the fusion feature vector into a trained neural network model;

The target category corresponding to the segmentation basic unit is output through the neural network model.
The remote sensing image target extraction method based on super-object information according to claim 1, wherein the step of extracting the image features of the basic segmentation unit to form a first feature vector comprises:

Segmenting the basic segmentation unit by using a region growing method to obtain a plurality of first sub-images;

Arranging the plurality of first sub-images from the bottom to the top according to the order of the multispectral bands;

The image features are respectively extracted from the plurality of first sub-images in a bottom-up order to form a first feature vector, wherein the image features are extracted from the original spectrum-space joint information.
The remote sensing image target extraction method based on super-object information according to claim 2, wherein the step of extracting the second feature vector in combination with the super-object feature information of the target to be extracted in the basic segmentation unit comprises:

Perform multi-level segmentation and merging of the basic segmentation unit by setting different region growth and merging thresholds to obtain multiple levels of second sub-images. According to the set region growth and merging thresholds, each over-segmentation level is pending The extraction target is respectively associated with the corresponding super object;

Arranging the second sub-images of multiple levels from top to bottom in the order of the region growth and merging threshold value from large to small;

Respectively determine the super-object feature information corresponding to the target to be extracted in the second sub-image of each level;

Feature fusion of the super-object feature information at the same position in the second sub-images of multiple levels from top to bottom, and fusion to the second sub-image at the bottom;

The second feature vector is formed by extracting the second sub-image at the bottom layer.
The remote sensing image target extraction method based on super-object information according to claim 1, wherein the neural network model is a stacked noise reduction autoencoder model, including an input layer, multiple hidden layers, and an output layer.
The remote sensing image target extraction method based on super-object information according to claim 4, wherein the training step of the neural network model comprises:

Selecting training samples, the training samples being selected from a plurality of segmentation basic units obtained after segmenting the remote sensing image;

Acquiring a fusion feature vector of the training sample;

Input the fusion feature vector of the training sample into the neural network model, perform pre-training on the neural network model, and obtain initial parameters of the neural network model;

Perform reverse tuning training on the neural network model according to the initial parameters.
The remote sensing image target extraction method based on super-object information according to claim 5, wherein the step of pre-training the neural network model and obtaining the initial parameters of the neural network model comprises:

Dividing the neural network model into multiple auto-encoder units;

Pre-training each auto-encoder unit separately;

Obtain the parameters of each auto-encoder unit through the pre-training results;

Random initialization of the parameters between the output layer of the neural network model and the upper connection layer;

The pre-training result and the parameters obtained by random initialization are used as the initial parameters of the neural network model.
The remote sensing image target extraction method based on super-object information according to claim 6, wherein the step of dividing the neural network model into a plurality of auto-encoder units comprises:

Each hidden layer in the neural network model and an upper layer of the hidden layer constitute an auto-encoder unit;

Pre-training each autoencoder unit separately includes:

Add a connection layer to each auto-encoder unit as the relative output layer of the auto-encoder unit to construct multiple neural network units, each neural network unit includes a relative input layer, a relative hidden layer, and a relative output layer ；

Pre-train the first neural network unit;

The relative hidden layer of the pre-trained first neural network unit is used as the relative input layer of the next neural network unit, and the pre-training of each autoencoder unit is completed in turn.
The remote sensing image target extraction method based on super-object information according to claim 7, wherein the neural network model includes an input layer, two hidden layers and an output layer, divided into two autoencoder units, The first autoencoder unit includes the input layer of the neural network model and a hidden layer, the second autoencoder unit includes two hidden layers of the neural network model, and a connection layer is added to the first autoencoder unit As the relative output layer of the first autoencoder unit, it constitutes the first neural network unit. A connection layer is added to the second autoencoder unit as the relative output layer of the second autoencoder unit. The relative hidden layer of a neural network unit serves as the relative input layer of the second neural network unit, forming the second neural network unit.
The remote sensing image target extraction method based on super-object information according to claim 8, wherein the step of pre-training the first neural network unit comprises:

Input the fusion feature vector of the training sample into the relative input layer of the first neural network unit;

Initially assigning the parameters of the first neural network unit, including the connection weight value and bias between the relative input layer and the relative hidden layer, and between the relative hidden layer and the relative output layer;

Obtain the output of the relative hidden layer and relative output layer in the first neural network unit through the following formulas (1) and (2):

h(y)=σ(W 1 y+b 1 ) (1)

Among them, W 1 is the weight value between the relative input layer and the relative hidden layer in the first neural network unit, b 1 is the bias between the relative input layer and the relative hidden layer in the first neural network unit,
Is the weight value between the relative hidden layer and the relative output layer in the first neural network unit, b 11 is the bias between the relative hidden layer and the relative output layer in the first neural network unit,
Is the output of the relative output layer in the first neural network unit, h(y) is the output of the relative hidden layer in the first neural network unit, y is the input feature vector polluted by noise, and σ(·) is the excitation function ；

The neural network unit is trained based on the minimum loss function, and the loss function is shown in the following equation (3):

Among them, J is the loss function, X is the original input feature vector not polluted by noise, i is the index of the neuron in the relative output layer of the first neural network unit, and n is the relative output layer of the first neural network unit The number of neurons,
Is the output relative to the output layer in the first neural network unit, and X i is the original input feature relative to the i-th neuron in the output layer that is not contaminated by noise;

Update the weight value and bias of the neural network unit according to the following equations (4) to (9) until the loss function is the smallest,

b′ 11 = b 11 +Δb m (5)

b′ 1 = b 1 +Δb n (6)

Where J is the loss function, i is the index of the neuron in the relative output layer, j is the index of the neuron in the relative hidden layer,
To update the weight value between the relative output layer and the relative hidden layer in the first neural network unit before updating,
Is the weight value between the relative output layer and the relative hidden layer in the first neural network unit after the update, ΔW i,j is the ith neuron relative to the output layer and the relative hidden layer in the first neural network unit The weight error between j neurons, b 11 is the bias between the relative output layer and the relative hidden layer in the first neural network unit before the update, b′ 11 is the relative output in the first neural network unit after the update The offset between the layer and the relative hidden layer, Δb m is the offset error between the relative output layer and the relative hidden layer in the first neural network unit, b 1 is the relative hidden layer in the first neural network unit before the update The offset between the relative input layer and the relative input layer, b′ 1 is the offset between the relative hidden layer and the relative input layer in the first neural network unit after the update, and Δb n is the relative hidden layer and the relative input layer in the first neural network unit. Relative to the bias error between the input layers, ε is the learning rate,
Is the output of the relative output layer in the first neural network, and h(y) is the output of the relatively hidden layer in the first neural network unit.
The remote sensing image target extraction method based on super-object information according to claim 9, wherein the step of pre-training the second neural network unit comprises:

Obtain the output of the relative hidden layer and relative output layer of the second neural network unit through the following equations (10) and (11):

h(h(y))=σ(W 2 h(y)+b 2 ) (10)

Among them, W 2 is the weight value between the relative input layer and the relative hidden layer in the second neural network unit, b 2 is the bias between the relative input layer and the relative hidden layer in the second neural network unit,
Is the weight value between the relative hidden layer and the relative output layer in the second neural network unit, b 22 is the bias between the relative hidden layer and the relative output layer in the second neural network unit,
Is the output of the relative output layer in the second neural network unit, h(h(y)) is the output of the relatively hidden layer in the second neural network unit, h(y) is the relative input layer in the second neural network unit The input of σ(·) is the excitation function, and the sigmoid function is selected;

The neural network unit is trained based on the minimum loss function, and the loss function is shown in the following equation (12):

Where J is the loss function, i is the index of the neurons in the relative output layer of the second neural network unit, n is the number of neurons in the relative output layer of the second neural network unit,
Is the output of the i-th neuron in the relative output layer in the second neural network unit, h(X i ) is the original input feature of the i-th neuron in the second neural network unit relative to the output layer that is not contaminated by noise;

Update the weight value and bias of the neural network unit according to the following equations (13) to (18) until the loss function is the smallest,

b′ 22 = b 22 +Δb m′ (14)

b′ 2 = b 2 +Δb n′ (15)

Where J is the loss function, i is the index of the neuron in the relative output layer, j is the index of the neuron in the relative hidden layer,
To update the weight value between the relative output layer and the relative hidden layer in the second neural network unit before,
Is the weight value between the relative output layer and the relative hidden layer in the updated second neural network unit, ΔW i,j is the ith neuron relative to the output layer and the relative hidden layer in the second neural network unit The weight error between j neurons, b 22 is the bias between the relative output layer and the relative hidden layer in the second neural network unit before the update, b′ 22 is the relative output in the second neural network unit after the update The offset between the layer and the relative hidden layer, Δb m′ is the offset error between the relative output layer and the relative hidden layer in the second neural network unit, b 2 is the relative hidden layer in the second neural network unit before the update The bias between the layer and the relative input layer, b′ 2 is the bias between the relative hidden layer and the relative input layer in the second neural network unit after the update, Δb n′ is the relative hidden layer in the second neural network unit The bias error between the layer and the relative input layer, ε is the learning rate,
Is the output of the relative output layer in the second neural network, h(h(y)) is the output of the relatively hidden layer in the second neural network unit, h(y) is the relative input layer in the second neural network unit enter.
The remote sensing image target extraction method based on super-object information according to claim 10, wherein the training step of the output layer of the neural network model comprises:

After pre-training the first neural network unit, remove the relative output layer of the first neural network unit and its corresponding weight value
And bias b 11 , only the weight value W 1 and bias b 1 between the relative input layer and the relative hidden layer are retained as the parameters of the first auto-encoder unit;

After the second neural network unit is pre-trained, remove the relative output layer and corresponding weight corresponding to the second neural network unit
And the bias b 22 , only keep the weight W 2 and the bias b 2 between the relative input layer and the relative hidden layer in the second neural network unit, as the parameters of the second autoencoder unit, and form a stack When the automatic encoder is used, stack it on the first automatic encoder unit;

An output layer is added above the hidden layer of the second autoencoder unit, and the weight value W 3 and the bias b 3 of the output layer are randomly initialized, decoded and restored, and the neural network model and model parameters are obtained.
The remote sensing image target extraction method based on super-object information according to claim 5, wherein the step of selecting training samples comprises:

Establishing a tag library, which stores different tags and tag sequences corresponding to different targets;

Build a picture library, store pictures that have been determined to contain the target and the corresponding tag sequence, the tag sequence is based on the tag sequence in the tag library, the position corresponding to the target in the picture is 1, and the position corresponding to the non-existent target is 0 The formed tag sequence;

Screen the first set number of pictures with known tag sequences from the picture library to construct a feature set;

According to the tag sequence of the feature set, determine the identification tag collection of all the tags that need to be identified in the training set, where the order of the tags in the identification tag collection is consistent with the order of the tags in the tag library;

The second set number of positive samples and the third set number of negative samples of each label in the total set of identification tags are selected from the picture library to form the training set and the validation set, where a positive sample of a label is corresponding to the label A picture of a target, a negative sample of a label is a picture that does not contain the target corresponding to the label, the training set is the positive sample and the negative sample, and the verification set is the label sequence of the positive sample and the negative sample.
The remote sensing image target extraction method based on super-object information according to claim 12, wherein the step of performing reverse tuning training on the neural network model according to the initial parameters comprises:

The multiple positive samples of the training set of the first label in the label library are sequentially input into the pre-trained neural network model, where the fusion feature of each positive sample is input to the input layer, and the prediction vector of the output label of the output layer is obtained. The average value of the loss function of multiple positive samples is used as the loss value of the label;

Reversely update the parameters of the neural network according to the loss value of the predicted label sequence of the first label in the identification label set and the label sequence of the corresponding verification set;

Repeat the above two steps until the training of the last tag in the tag library is completed;

Repeat the above three steps to input the negative samples in the tag library in the order of tags into the neural network structure in turn, and update the parameters of the neural network.
The remote sensing image target extraction method based on super-object information according to claim 12, wherein the step of pre-training the neural network model to obtain the initial parameters of the neural network model comprises:

Suppose the population size is P, randomly generate the initial population of P individuals, G=(G 1 , G 2 ,..., G p ) T , select the random real numbers in the set symmetry interval to form a real number vector of length S, in the population Individual G O = (g 1 , g 2 ,..., g S ), O = 1, 2,..., P, S = n*l+l*m+l+m, n is the number of nodes in the input layer, l the number of nodes in the hidden layer, m is the number of nodes of the output layer, g s is the subject of the G O S gene;

Take each individual gene as the initial assignment of the hidden layer and output layer connection parameters of the neural network model, the initial input layer and hidden layer connection parameters, the initial hidden layer threshold parameter and the initial output layer threshold parameter, The samples belonging to each label are substituted into the hidden layer of the neural network and the output model of the output layer for training, and the output of each node of the output layer corresponding to each sample is obtained, thereby obtaining the fitness of each individual, where:

among them,
A plurality of tags average loss of samples during the initial parameters of the neural network is assigned an individual G O,
G is the initial population of individual fitness of O G;

Using a roulette operator, a selection strategy based on the fitness ratio is used to select individuals in the initial population, and the selected individuals G u are obtained ;

A single-point crossover operator is used to perform cross update on selected individuals. The maximum value of each gene after update is used as the upper bound of the gene, and the minimum value of each gene after update is used as the gene The lower bound

The mutation operation is performed on the selected individuals that have undergone cross-update, and the mutated individuals are obtained, which are substituted into the individual evaluation subunit to evolve the initial population, where:

Among them, g j is the jth gene of the individual G u selected, g jmax and g jmin are the upper and lower bounds of the gene g j , and r q is the pseudo-random number generated for the qth time when the individual G u is selected, iter now is the current evolutionary algebra, iter max is the set maximum evolutionary algebra, g j 'is the jth gene of the individual G u selected after evolution;

Judge that the change of individual fitness value after evolution is less than the set target value;

If it is less than the set target value, output the optimal population individual as the final initial value of the connection parameters of the hidden layer and the output layer, the connection parameters of the input layer and the hidden layer, the hidden layer threshold parameter and the output layer threshold parameter ；

If it is not less than the set target value, the initial population after evolution is assigned to the parameters of the neural network model, and the above steps are repeated until the individual fitness value change after evolution is less than the set target value.
The remote sensing image target extraction method based on super-object information according to claim 3, wherein the step of separately determining the super-object feature information corresponding to the target to be extracted in the second sub-image of each level comprises :

Divide the second sub-images of each level into blocks, and divide them into multiple third sub-images;

By formula

Obtain the similarity between multiple first sub-images and multiple third sub-images, where s 1,3 represents the similarity between multiple first sub-images and one third sub-image, (x 1 ,x 2 ,... ., x d ) are the first feature vectors of multiple first sub-images, (y 1 , y 2 ,..., y d ) are the feature vectors of a third sub-image;

The similarities between the multiple first sub-images and the multiple third sub-images constitute a similarity matrix;

Iteratively update the attractiveness and attribution among multiple third sub-images through the similarity matrix to obtain the third sub-image that meets the conditions that the autocorrelation attribution and autocorrelation attractiveness are both greater than 0;

Obtain the sum of the attribution and attraction of each third sub-image that meets the conditions and other third sub-images;

Use the other third sub-images corresponding to the maximum value of the sum of each eligible third sub-image as the cluster center of each eligible third sub-image, thereby obtaining each eligible third sub-image For the cluster centers of sub-images, the third sub-images that belong to the same cluster center are clustered into one category, and the feature information of each category after clustering is used as the second sub-image of the level and all The super-object feature information corresponding to the target to be extracted.
A remote sensing image target extraction device based on super-object information, characterized in that it comprises:

Acquisition module to acquire remote sensing images;

A segmentation module for segmenting the remote sensing image to obtain multiple segmentation basic units of the remote sensing image;

The feature extraction module extracts the image features of the basic segmentation unit to form a first feature vector, and combines the super-object feature information of the target to be extracted in the basic segmentation unit to form a second feature vector;

A feature fusion module, fusing the first feature vector and the second feature vector to form a fusion feature vector;

An input module, which inputs the fusion feature vector into a trained neural network model;

The output module outputs the target category corresponding to the segmentation basic unit through the neural network model.
The remote sensing image target extraction device based on super-object information according to claim 16, wherein the feature extraction module comprises:

The first segmentation unit uses a region growing method to segment the basic segmentation unit to obtain a plurality of first sub-images;

The first arrangement unit arranges the plurality of first sub-images from the bottom to the top according to the order of the multispectral bands;

The first feature vector forming unit extracts image features from the multiple first sub-images in a bottom-up order to form a first feature vector, wherein the image features are extracted from the original spectrum-space joint information;

The second segmentation unit performs multi-level segmentation and merging of the basic segmentation unit by setting different region growth and merging thresholds to obtain multiple levels of second sub-images. According to the set region growth merging thresholds, each The target to be extracted at the over-segmentation level is respectively associated with the corresponding super-object, and after the association, the super-object feature information of the target to be extracted at the same position in the sub-images of multiple levels is formed;

The second arrangement unit arranges the second sub-images of multiple levels in a descending order of the region growth and merging threshold from top to bottom;

The super-object determining unit separately determines the super-object feature information corresponding to the target to be extracted in the second sub-image of each level;

Feature fusion unit, to perform feature fusion of the super-object feature information at the same position in the second sub-images of multiple levels from top to bottom, and fuse it to the second sub-image at the bottom;

The second feature vector extraction unit extracts the second feature vector according to the second sub-image at the bottom layer.
The remote sensing image target extraction device based on super-object information according to claim 16, characterized in that it comprises a training module, and training the neural network model comprises:

A selecting unit, selecting a training sample, the training sample is selected from a plurality of segmentation basic units obtained after the remote sensing image is segmented, and each selected segmentation basic unit is used as a training sample;

A fusion feature vector acquiring unit to acquire the fusion feature vector of the training sample;

A pre-training unit, inputting the fusion feature vector of the training sample into the neural network model, pre-training the neural network model, and obtaining initial parameters of the neural network model;

The reverse tuning training unit performs reverse tuning training on the neural network model according to the initial parameters.
An electronic device, characterized in that it comprises:

processor;

A memory, the memory includes a remote sensing image target extraction program, when the remote sensing image target extraction program is executed by the processor, the steps of the remote sensing image target extraction method according to any one of claims 1 to 15 are realized.
A computer non-volatile readable storage medium, wherein the computer non-volatile readable storage medium includes a remote sensing image target extraction program, and when the remote sensing image target extraction program is executed by a processor, The steps of the remote sensing image target extraction method according to any one of claims 1 to 15.