CN116052018A - Remote sensing image interpretation method based on lifelong learning - Google Patents

Remote sensing image interpretation method based on lifelong learning

Info

Publication number
CN116052018A
CN116052018A (application CN202310331512.5A)
Authority
CN
China
Prior art keywords
interpretation
layer
remote sensing
model
scene
Prior art date
Legal status
Granted
Application number
CN202310331512.5A
Other languages
Chinese (zh)
Other versions
CN116052018B (en)
Inventor
张广益
陈宇
鲁锦涛
吴皓
张玥珺
李洁
邹圣兵
Current Assignee
Beijing Shuhui Spatiotemporal Information Technology Co ltd
Original Assignee
Beijing Shuhui Spatiotemporal Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Shuhui Spatiotemporal Information Technology Co ltd filed Critical Beijing Shuhui Spatiotemporal Information Technology Co ltd
Priority to CN202310331512.5A priority Critical patent/CN116052018B/en
Publication of CN116052018A publication Critical patent/CN116052018A/en
Application granted granted Critical
Publication of CN116052018B publication Critical patent/CN116052018B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses a remote sensing image interpretation method based on lifelong learning, which relates to the field of remote sensing image processing and comprises the following steps: S1, constructing a combined model; S2, acquiring training samples and pre-training the combined model to obtain a first scene classification result; S3, obtaining a remote sensing image to be interpreted and uniformly cropping it; S4, inputting the cropped remote sensing images to be interpreted into the combined model in sequence to obtain a second scene classification result and interpretation information; S5, calculating a scene difference value; S6, calculating an interpretation loss value; S7, setting a selection strategy based on the scene difference value and the interpretation loss value, and retraining or expanding the dynamically expandable interpretation sub-model according to the selection strategy to obtain a final combined model; S8, interpreting new remote sensing images with the final combined model. The invention realizes lifelong learning oriented to remote sensing interpretation based on a dynamically expandable network, and avoids the catastrophic forgetting problem common in lifelong learning.

Description

Remote sensing image interpretation method based on lifelong learning
Technical Field
The invention relates to the field of remote sensing image processing, and in particular to a remote sensing image interpretation method based on lifelong learning.
Background
In the 21st century, high-resolution remote sensing images are collected from multiple angles by satellites, unmanned aerial vehicles, digital cameras, imaging spectrometers, space planes, and other devices, and applied in many different fields. How to process large volumes of remote sensing image data rapidly and effectively is an urgent problem in the remote sensing field. Manual processing of remote sensing images, while highly accurate, is inefficient and costly, and therefore undesirable. Traditional remote sensing image methods extract object features from information such as geometric shape and spatial position, and can also combine color, shadow, and texture information with LiDAR or SAR to extract effective features from three-dimensional data. A single feature-extraction method has inherent shortcomings, such as insufficient classification performance and frequent classification errors, and cannot maintain a good balance between discriminability and robustness. Increasingly mature machine learning techniques, especially deep learning, can be applied in many areas: a network is trained so that the resulting model can make accurate predictions on unknown samples. Remote sensing technology provides large amounts of reliable data, laying a foundation for the development of deep learning models. Deep learning can be applied to classification, semantic segmentation, detection, and other remote sensing image tasks, and plays a role in promoting the further development of remote sensing technology.
Deep learning methods currently applied to remote sensing image interpretation face a common problem: to achieve high interpretation accuracy on different interpretation tasks, a brand-new deep learning model must be built and trained from scratch for each task. This leads to enormous engineering effort and low model-training efficiency, extremely low effective utilization and reuse of existing remote sensing image data and already-built models, and limits large-scale engineering deployment. To solve this problem and drive the automation of remote sensing image interpretation, researchers have attempted to reuse existing models and already-learned knowledge in new remote sensing interpretation tasks using online learning and continual learning methods. Among existing continual learning methods, the simplest is to fine-tune the original network on the new training data provided by a new task. However, this simple retraining approach degrades the original network's interpretation of both new and old tasks. If the correlation between the new and old tasks is low, for example when the two tasks classify two different kinds of ground features such as wheat and buildings, the features the network learned from the old task may contribute nothing to the new task. Another difficulty is catastrophic forgetting, where the original network forgets what it learned before after learning new knowledge. This is caused by two factors: (1) the structure of a deep network is difficult to adjust once training begins, and the network structure directly determines the capacity of the learning model. A neural network with a fixed structure has limited capacity, and under limited capacity it must erase old knowledge in order to learn a new task; (2) the neurons of a deep network's hidden layers are global, so small changes to individual neurons simultaneously affect the output of the entire network. In addition, all parameters of a feed-forward network are connected to every dimension of the input, and new data is highly likely to change all parameters in the network. For a network with a fixed structure, the parameters are the only carriers of knowledge; if the changed parameters include ones highly relevant to historical knowledge, the net effect is that new knowledge overwrites old knowledge.
For the remote sensing field, ensuring that a model's ability on old interpretation tasks does not degrade while it achieves good results on new interpretation tasks, and overcoming catastrophic forgetting, are the key problems to be solved in the development of remote sensing lifelong learning technology.
Disclosure of Invention
The invention provides a remote sensing image interpretation method based on lifelong learning, which realizes lifelong learning suitable for remote sensing image interpretation through a combined model of a remote sensing image scene classification model and a dynamically expandable remote sensing image interpretation model. Known and unknown tasks are identified through remote sensing image scene classification; for new unknown tasks, model capacity is expanded and the unknown tasks are learned through expansion and retraining of the interpretation network, so that knowledge is continuously updated. Learned knowledge is fully applied to new remote sensing interpretation tasks, so that catastrophic forgetting is effectively avoided without reducing interpretation accuracy, and the utilization of existing models and data is improved.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
A remote sensing image interpretation method based on lifelong learning comprises the following steps:
s1, constructing a combined model, wherein the combined model comprises a dynamic extensible interpretation sub-model and a scene classification sub-model, and the scene classification sub-model comprises a scene classifier and a memory;
s2, obtaining training samples in a sample library, pre-training the combination model by the aid of the cut training samples, and taking the obtained pre-training results as first scene classification results and storing the first scene classification results in a memory;
s3, obtaining a plurality of remote sensing images to be interpreted, and uniformly cutting the remote sensing images to be interpreted, wherein each remote sensing image to be interpreted comprises marked ground object samples and unlabeled target interpretation samples, and the marked ground object samples comprise real labels;
s4, sequentially inputting the cut remote sensing images to be interpreted into a combined model to obtain a second scene classification result and interpretation information, wherein the interpretation information comprises interpretation information of marked ground object samples and interpretation information of unlabeled target interpretation samples;
s5, calculating a second scene classification result and a first scene classification result to obtain a scene difference value;
s6, calculating the interpretation information of the marked ground object sample and the real label of the marked ground object sample to obtain an interpretation loss value;
s7, setting a selection strategy based on the scene difference value and the interpretation loss value, and retraining and expanding the dynamic expandable interpretation sub-model according to the selection strategy to obtain a final combined model;
s8, interpreting the new remote sensing image through the final combined model.
In one embodiment of the present invention, in step S7, the selection strategy is:
First case: when the scene difference value is smaller than a first preset threshold and the interpretation loss value is smaller than a second preset threshold, the current structure of the dynamically expandable interpretation sub-model is maintained, and the final combined model is obtained;
Second case: when the scene difference value is smaller than the first preset threshold and the interpretation loss value is larger than the second preset threshold, the dynamically expandable interpretation sub-model is retrained, the combined model is updated, and the process returns to step S4;
Third case: when the scene difference value is larger than the first preset threshold and the interpretation loss value is smaller than the second preset threshold, the dynamically expandable interpretation sub-model is retrained, the combined model is updated, and the process returns to step S4;
Fourth case: when the scene difference value is larger than the first preset threshold and the interpretation loss value is larger than the second preset threshold, the dynamically expandable interpretation sub-model is expanded, the combined model is updated, and the process returns to step S4.
In one embodiment of the present invention, the dynamically expandable interpretation sub-model includes a convolutional neural network for performing interpretation tasks and an expander for expanding the convolutional neural network.
In one embodiment of the invention, expanding the dynamically expandable interpretation sub-model includes adding neurons to the convolutional neural network and training the added neurons;
retraining the dynamically extensible interpretation sub-model includes selectively adjusting portions of the network parameters.
In one embodiment of the present invention, expanding the dynamically expandable interpretation sub-model comprises:
adding a preset number of neurons to each layer of neural network;
removing newly added ineffective neurons by using group sparse regularization;
training the final augmented neurons:
$$\min_{W_l^{\mathcal{N}}} \; \mathcal{L}\big(W_l^{\mathcal{N}};\, W_l^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_l^{\mathcal{N}}\big\|_1 \;+\; \gamma \sum_{g} \big\|W_{l,g}^{\mathcal{N}}\big\|_2$$
wherein $l$ denotes the $l$-th layer of the neural network, $\mathcal{D}_t$ is the interpretation data, $W_l^{\mathcal{N}}$ are the weights of the newly added neurons at layer $l$, $\mathcal{L}$ is the loss function, $\mu$ and $\gamma$ are regularization-term parameters, $t$ is the current task, $t-1$ is the previous task, and $g$ is a group defined by the input weights of each neuron.
In one embodiment of the present invention, retraining a dynamic extensible interpretation sub-model includes:
when a new task t is received, a sparse linear classifier is installed in the last layer of the dynamically expandable interpretation sub-model:
$$\min_{W_{N,t}^{t}} \; \mathcal{L}\big(W_{N,t}^{t};\, W_{1:N-1}^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_{N,t}^{t}\big\|_1$$
where $l$ denotes the $l$-th layer of the convolutional neural network, $W_l$ are the network parameters of layer $l$, $\mu$ is the regularization strength, $N$ is the total number of layers of the network, and $W_{1:N-1}^{t-1}$ denotes the network parameters other than $W_{N,t}^{t}$;
identifying a sub-network $S$ related to the current new task $t$ according to the established sparse connections, and retraining the sub-network $S$:
$$\min_{W_{S}^{t}} \; \mathcal{L}\big(W_{S}^{t};\, W_{S^c}^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_{S}^{t}\big\|_2$$
In an embodiment of the present invention, in step S5, the scene difference value is obtained from the second scene classification result and the first scene classification result by distance calculation.
In an embodiment of the present invention, the distance calculation process is as follows:
$$c = \left[\, \frac{1}{N} \sum_{j=1}^{N} p(y=i \mid x=j) \,\right]_{i=1,\dots,M}$$
where $c$ denotes the first scene classification result, $p(y=i \mid x=j)$ is the predicted probability that the input cropped training sample $j$ belongs to category $i$, $M$ is the total number of scene categories, and $N$ is the number of cropped training samples;
the remote sensing image to be interpreted is uniformly cropped into $r$ blocks, and the second scene classification result of block $t$ is
$$c_t = \big[\, p(y=1 \mid x=t),\; \dots,\; p(y=M \mid x=t) \,\big], \quad t = 1, \dots, r;$$
$D = [d_1, d_2, \dots, d_r]$ collects the nearest distance between each second scene classification result and the first scene classification results, where
$$d_t = \min_{c \in \mathcal{M}} \big\| c_t - c \big\|_2$$
and $\mathcal{M}$ is the set of first scene classification results stored in the memory;
$D$ is sorted in descending order, the first $K$ values are selected, and the median of these $K$ values is taken as the scene difference value.
In one embodiment of the present invention, the structure of the convolutional neural network in the initial dynamically expandable interpretation sub-model comprises:
First layer: convolutional layer 1; input: a cropped image of 229×229×3; number of convolution kernels: 96; convolution kernel size: 13×13×3; stride: 4;
Second layer: pooling layer 1; pooling size: 3×3; stride: 2;
Third layer: convolutional layer 2; input: the output of the second layer; number of convolution kernels: 256; convolution kernel size: 5×5; stride: 1;
Fourth layer: pooling layer 2; pooling size: 3×3; stride: 2;
Fifth layer: convolutional layer 3; input: the output of the fourth layer; number of convolution kernels: 384; convolution kernel size: 3×3;
Sixth layer: convolutional layer 4; input: the output of the fifth layer; number of convolution kernels: 384; convolution kernel size: 3×3;
Seventh layer: convolutional layer 5; input: the output of the sixth layer; number of convolution kernels: 256; convolution kernel size: 3×3;
Eighth layer: pooling layer 3; pooling size: 3×3; stride: 2;
Ninth to eleventh layers: fully connected layers with 384, 192, and 100 neurons, respectively.
In one embodiment of the present invention, the scene classifier is a residual network ResNet-50.
The beneficial effects of the invention are as follows: a lifelong learning method suitable for remote sensing image interpretation is realized by combining a scene classification sub-model and a dynamically expandable interpretation sub-model. Known and unknown tasks are identified through remote sensing image scene classification; for new unknown tasks, model capacity is expanded and the unknown tasks are learned through expansion and retraining of the interpretation network, so that knowledge is continuously updated. Learned knowledge is fully applied to new remote sensing interpretation tasks, catastrophic forgetting is effectively avoided without reducing interpretation accuracy, and the utilization of existing models and remote sensing image data is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a remote sensing image interpretation method based on life learning;
FIG. 2 is a schematic diagram illustrating interpretation of a plurality of remote sensing images to be interpreted by the combined model;
FIG. 3 is a schematic diagram of retraining a dynamically extensible interpretation sub-model;
FIG. 4 is a schematic diagram of an extension of a dynamically extensible interpretation sub-model.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the invention, fall within the scope of protection of the invention.
Referring to fig. 1, fig. 1 is a flowchart of an embodiment of a remote sensing image interpretation method based on life learning according to the present invention, which includes the following steps:
s1, constructing a combined model, wherein the combined model comprises a dynamic extensible interpretation sub-model and a scene classification sub-model, and the scene classification sub-model comprises a scene classifier and a memory;
s2, obtaining training samples in a sample library, pre-training the combination model by the aid of the cut training samples, and taking the obtained pre-training results as first scene classification results and storing the first scene classification results in a memory;
s3, obtaining a plurality of remote sensing images to be interpreted, and uniformly cutting the remote sensing images to be interpreted, wherein each remote sensing image to be interpreted comprises marked ground object samples and unlabeled target interpretation samples, and the marked ground object samples comprise real labels;
s4, sequentially inputting the cut remote sensing images to be interpreted into a combined model to obtain a second scene classification result and interpretation information, wherein the interpretation information comprises interpretation information of marked ground object samples and interpretation information of unlabeled target interpretation samples;
s5, calculating a second scene classification result and a first scene classification result to obtain a scene difference value;
s6, calculating the interpretation information of the marked ground object sample and the real label of the marked ground object sample to obtain an interpretation loss value;
s7, setting a selection strategy based on the scene difference value and the interpretation loss value, and retraining and expanding the dynamic expandable interpretation sub-model according to the selection strategy to obtain a final combined model;
s8, interpreting the new remote sensing image through the final combined model.
The technical idea of the invention is as follows:
1) The structure of the convolutional neural network in the dynamically expandable interpretation sub-model directly determines the capacity of the learning model. A convolutional neural network with a fixed structure has limited capacity, and under limited capacity it must erase old knowledge in order to learn a new task. The invention realizes dynamic change of the convolutional neural network structure by constructing a dynamically expandable interpretation sub-model for lifelong learning, so that the capacity of the network can be expanded on demand and old knowledge can be retained while new knowledge is learned.
2) To avoid catastrophic forgetting, the invention employs selective retraining rather than conventional retraining when a new task is received. Selective retraining selects and retrains only the part of the network structure that is directly related to the current remote sensing image to be interpreted, thereby avoiding influence on neural nodes unrelated to that image;
3) In lifelong learning, the correlation between a new remote sensing image to be interpreted and the training samples is uncertain. When the correlation is low, retraining is needed to learn new knowledge; but when a highly correlated task is encountered, the existing model can handle it directly, making retraining unnecessary. To realize automatic lifelong learning, the invention obtains a second scene classification result of the remote sensing image to be interpreted through the scene classifier before expanding or retraining the convolutional neural network, and determines whether the scene is new by comparing the second scene classification result with the first scene classification results stored in the memory. The model is retrained only when facing a new scene, and not when facing a known scene, so that remote sensing interpretation scenes are effectively exploited to avoid unnecessary training and structural adjustment.
The training samples and the remote sensing images to be interpreted used in this embodiment are remote sensing images with a spatial resolution of 4 meters acquired by the Gaofen-2 (GF-2) satellite. The images are uniformly cropped, and the image size after cropping is 229×229×3.
In particular, the dynamically expandable interpretation sub-model of the present invention includes a convolutional neural network for performing interpretation tasks and an expander for expanding the convolutional neural network.
The schematic diagram of the whole combined model of the embodiment is shown in fig. 2, and the remote sensing interpretation lifelong learning function of the invention is realized by combining a scene classification sub-model based on ResNet-50 and a dynamic extensible interpretation sub-model based on AlexNet.
Specifically, this embodiment uses an improved AlexNet as the convolutional neural network in the dynamically expandable interpretation sub-model. The AlexNet is divided into an upper part and a lower part that run on two GPUs to improve computational efficiency. The improved AlexNet is an 11-layer deep neural network comprising five convolutional layers, three pooling layers, and three fully connected layers, not counting activation layers.
Specifically, in the initial combined model, the modified AlexNet network structure is as follows:
First layer: convolutional layer 1; input: a cropped image of 229×229×3; number of convolution kernels: 96; convolution kernel size: 13×13×3; stride: 4;
Second layer: pooling layer 1; pooling size: 3×3; stride: 2;
Third layer: convolutional layer 2; input: the output of the second layer; number of convolution kernels: 256; convolution kernel size: 5×5; stride: 1;
Fourth layer: pooling layer 2; pooling size: 3×3; stride: 2;
Fifth layer: convolutional layer 3; input: the output of the fourth layer; number of convolution kernels: 384; convolution kernel size: 3×3;
Sixth layer: convolutional layer 4; input: the output of the fifth layer; number of convolution kernels: 384; convolution kernel size: 3×3;
Seventh layer: convolutional layer 5; input: the output of the sixth layer; number of convolution kernels: 256; convolution kernel size: 3×3;
Eighth layer: pooling layer 3; pooling size: 3×3; stride: 2;
Ninth to eleventh layers: fully connected layers with 384, 192, and 100 neurons, respectively.
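For concreteness, the following is a minimal PyTorch sketch of the modified 11-layer AlexNet described above. The ReLU activations and the padding of convolutional layers 2-5 are assumptions (the embodiment specifies only kernel counts, kernel sizes, and strides); under those assumptions a 229×229×3 crop flows to the 256×6×6 feature map implied by the fully connected layers.

```python
# A minimal sketch of the modified 11-layer AlexNet described above (PyTorch).
# The padding on conv2-conv5 and the ReLU activations are assumptions: the
# embodiment fixes only kernel counts, kernel sizes, and strides.
import torch
import torch.nn as nn

class ModifiedAlexNet(nn.Module):
    def __init__(self, num_classes: int = 100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=13, stride=4),              # layer 1: conv1 -> 96x55x55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # layer 2: pool1 -> 96x27x27
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),  # layer 3: conv2 -> 256x27x27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # layer 4: pool2 -> 256x13x13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),           # layer 5: conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),           # layer 6: conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),           # layer 7: conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # layer 8: pool3 -> 256x6x6
        )
        self.classifier = nn.Sequential(                             # layers 9-11: fully connected
            nn.Linear(256 * 6 * 6, 384),
            nn.ReLU(inplace=True),
            nn.Linear(384, 192),
            nn.ReLU(inplace=True),
            nn.Linear(192, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

# Sanity check: a 229x229x3 crop yields a 100-way output.
# out = ModifiedAlexNet()(torch.randn(1, 3, 229, 229))  # shape (1, 100)
```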
After the remote sensing image to be interpreted is interpreted with the improved AlexNet, the obtained interpretation information comprises interpretation information of the marked ground object samples and interpretation information of the unlabeled target interpretation samples. The interpretation loss value is obtained from the interpretation information of the marked ground object samples and their real labels. In this embodiment, the interpretation information of a marked ground object sample is its interpretation label, and the interpretation loss value is derived from the similarity between the interpretation label and the real label; the label similarity may be computed as cosine similarity.
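As an illustration of this computation, the sketch below assumes the labels are represented as vectors (for example, one-hot vectors or flattened per-pixel class maps) and takes the loss as one minus the cosine similarity, so that identical labels give zero loss; the embodiment fixes only that the label similarity may be cosine similarity.

```python
# A sketch of the interpretation-loss computation, assuming vector-valued
# labels; "loss = 1 - cosine similarity" is an assumption consistent with
# a larger loss indicating a worse interpretation.
import torch
import torch.nn.functional as F

def interpretation_loss(pred_label: torch.Tensor, true_label: torch.Tensor) -> torch.Tensor:
    """Interpretation loss between predicted and real labels of marked samples."""
    cos = F.cosine_similarity(pred_label.flatten(1), true_label.flatten(1), dim=1)
    return (1.0 - cos).mean()
```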
The invention introduces the scene classification sub-model to judge whether the scene of the remote sensing image to be interpreted is related to already-learned task scenes. If it is highly related, the dynamically expandable interpretation sub-model is considered competent for this type of task, and the current task can be interpreted directly without retraining the convolutional neural network in the model; otherwise, the convolutional neural network needs to be retrained or expanded.
In this embodiment, a residual network, ResNet-50, is used to construct the scene classifier in the scene classification sub-model. In a residual network, skip connections are established between lower and higher layers to ensure information flow from lower to higher layers and to avoid the difficulty of training deep networks caused by vanishing gradients; a ResNet block containing such skip connections forms the basic logical unit of the residual network, and a deep network is built by stacking multiple ResNet blocks. ResNet-50 extracts features layer by layer to obtain the corresponding feature vectors, which are input to a SoftMax classifier for deep feature classification, yielding a scene category probability distribution. The scene probability distributions of the training samples are stored in the memory.
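A minimal sketch of this classifier, assuming the torchvision ResNet-50 backbone; the number of scene categories M and the weight initialization are left open by the embodiment.

```python
# A sketch of the ResNet-50 scene classifier; the torchvision backbone and
# the untrained initialization (weights=None) are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50

def build_scene_classifier(num_scene_classes: int) -> nn.Module:
    backbone = resnet50(weights=None)  # stacked ResNet blocks with skip connections
    backbone.fc = nn.Linear(backbone.fc.in_features, num_scene_classes)
    return backbone

def scene_probabilities(model: nn.Module, crops: torch.Tensor) -> torch.Tensor:
    """Return the scene-category probability distribution for a batch of crops."""
    with torch.no_grad():
        return torch.softmax(model(crops), dim=1)  # one M-way distribution per crop
```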
The scene difference value is obtained by calculating the distance between the second scene classification result of the remote sensing image to be interpreted and the first scene classification result in the memory:
$$c = \left[\, \frac{1}{N} \sum_{j=1}^{N} p(y=i \mid x=j) \,\right]_{i=1,\dots,M}$$
where $c$ denotes the first scene classification result, $p(y=i \mid x=j)$ is the predicted probability that the input cropped training sample $j$ belongs to category $i$, $M$ is the total number of scene categories, and $N$ is the number of cropped training samples;
the remote sensing image to be interpreted is uniformly cropped into $r$ blocks, and the second scene classification result of block $t$ is
$$c_t = \big[\, p(y=1 \mid x=t),\; \dots,\; p(y=M \mid x=t) \,\big], \quad t = 1, \dots, r;$$
$D = [d_1, d_2, \dots, d_r]$ collects the nearest distance between each second scene classification result and the first scene classification results, where
$$d_t = \min_{c \in \mathcal{M}} \big\| c_t - c \big\|_2$$
and $\mathcal{M}$ is the set of first scene classification results stored in the memory;
$D$ is sorted in descending order, the first $K$ values are selected, and the median of these $K$ values is taken as the scene difference value.
In this embodiment, K may be set to the first 30% of the values in D.
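The scene-difference computation can be sketched as follows, assuming the memory holds n first scene classification results as an (n, M) array and the r crops yield an (r, M) array of second results; the nearest-distance form of d_t is a reconstruction from the description above.

```python
# A sketch of the scene-difference value, assuming probability-vector inputs;
# the euclidean nearest-distance form of d_t is reconstructed from the text.
import numpy as np

def scene_difference(memory: np.ndarray, second: np.ndarray, k_ratio: float = 0.3) -> float:
    # d_t: distance from each crop's distribution to its nearest stored result
    d = np.linalg.norm(second[:, None, :] - memory[None, :, :], axis=2).min(axis=1)
    d_sorted = np.sort(d)[::-1]                 # descending order
    k = max(1, int(np.ceil(k_ratio * len(d_sorted))))
    return float(np.median(d_sorted[:k]))       # median of the K largest distances
```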
A selection strategy is set based on the scene difference value and the interpretation loss value, and the dynamically expandable interpretation sub-model is retrained or expanded according to the selection strategy to obtain the final combined model.
Specifically, the selection strategy is:
First case: when the scene difference value is smaller than a first preset threshold and the interpretation loss value is smaller than a second preset threshold, the current structure of the dynamically expandable interpretation sub-model is maintained, and the final combined model is obtained;
Second case: when the scene difference value is smaller than the first preset threshold and the interpretation loss value is larger than the second preset threshold, the dynamically expandable interpretation sub-model is retrained, the combined model is updated, and the process returns to step S4;
Third case: when the scene difference value is larger than the first preset threshold and the interpretation loss value is smaller than the second preset threshold, the dynamically expandable interpretation sub-model is retrained, the combined model is updated, and the process returns to step S4;
Fourth case: when the scene difference value is larger than the first preset threshold and the interpretation loss value is larger than the second preset threshold, the dynamically expandable interpretation sub-model is expanded, the combined model is updated, and the process returns to step S4.
The first preset threshold and the second preset threshold can be set according to actual conditions.
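A sketch of the four-way strategy as a simple policy function follows; tau1 and tau2 are hypothetical names for the first and second preset thresholds, and the boundary convention (>= versus >) is an assumption.

```python
# A sketch of the selection strategy; tau1/tau2 are hypothetical threshold
# names standing in for the first and second preset thresholds.
def select_action(scene_diff: float, interp_loss: float,
                  tau1: float, tau2: float) -> str:
    if scene_diff < tau1 and interp_loss < tau2:
        return "keep"      # case 1: keep the current model structure
    if scene_diff >= tau1 and interp_loss >= tau2:
        return "expand"    # case 4: expand the interpretation sub-model
    return "retrain"       # cases 2 and 3: selectively retrain
```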
Specifically, expanding the dynamically expandable interpretation sub-model comprises adding neurons to the convolutional neural network and training the added neurons; retraining the dynamically expandable interpretation sub-model comprises selectively adjusting part of the network parameters.
Referring to FIG. 3, the process of retraining AlexNet in this embodiment is as follows:
(1) For the initial training task, the convolutional neural network is trained with $\ell_1$ regularization to increase the sparsity of the network, so that each neuron is connected to only a portion of the other neurons:
$$\min_{W^{t=1}} \; \mathcal{L}\big(W^{t=1};\, \mathcal{D}_1\big) \;+\; \mu \sum_{l=1}^{N} \big\|W_l^{t=1}\big\|_1$$
where $l$ denotes the $l$-th layer of the neural network, $W_l$ are the network parameters of layer $l$, $\mu$ is the regularization strength, and $N$ is the total number of layers of the network;
(2) By maintaining $\ell_1$ sparsity throughout lifelong learning and focusing on the sub-network related to the new task, the computational load of the network can be greatly reduced. When the model receives a new task $t$, a sparse linear classifier is installed in the last layer of the model:
$$\min_{W_{N,t}^{t}} \; \mathcal{L}\big(W_{N,t}^{t};\, W_{1:N-1}^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_{N,t}^{t}\big\|_1$$
where $W_{1:N-1}^{t-1}$ denotes the network parameters other than $W_{N,t}^{t}$. Solving this optimization problem yields the connections between the output units and the hidden units of layer $N-1$. Once the sparse connections of this layer are established, all units and weights affected during training can be identified without affecting the rest of the network structure;
(3) A sub-network $S$ related to the current new task $t$ is identified from the established sparse connections and retrained:
$$\min_{W_{S}^{t}} \; \mathcal{L}\big(W_{S}^{t};\, W_{S^c}^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_{S}^{t}\big\|_2$$
Partial retraining of the network is achieved through $\ell_2$ regularization. This selective retraining of part of the network reduces computation and avoids negative transfer. In the selective-retraining diagram of FIG. 3, solid nodes are the selectively trained network nodes, $t-1$ is the previous task, and $t$ is the current task.
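Steps (1)-(3) can be condensed into the following sketch, assuming a PyTorch model whose output layer is model.classifier[-1] (as in the AlexNet sketch earlier); identifying the sub-network via the established sparse connections is simplified here to a weight-magnitude threshold.

```python
# A condensed sketch of selective retraining; the magnitude-threshold
# sub-network identification is a simplification of tracing nonzero
# connections layer by layer.
import torch

def l1_penalty(params, mu: float) -> torch.Tensor:
    return mu * sum(p.abs().sum() for p in params)

def fit_sparse_output_layer(model, loader, mu=1e-4, lr=1e-3, epochs=1):
    """Step 2: train only the last layer with l1 sparsity; lower layers stay fixed."""
    last = model.classifier[-1]
    for p in model.parameters():
        p.requires_grad_(False)
    for p in last.parameters():
        p.requires_grad_(True)
    opt = torch.optim.SGD(last.parameters(), lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            loss = ce(model(x), y) + l1_penalty(last.parameters(), mu)
            opt.zero_grad()
            loss.backward()
            opt.step()

def subnetwork_mask(model, eps: float = 1e-3):
    """Step 3 (simplified): mark parameters reachable through non-negligible connections."""
    return {name: (p.abs() > eps) for name, p in model.named_parameters()}
```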
Referring to FIG. 4, the method for expanding AlexNet is as follows: k neurons are added to each layer of the network, and newly added ineffective neurons are then removed using group sparse regularization. In the network-expansion diagram of FIG. 4, solid nodes are the neurons that are finally added and trained, crossed-out nodes are removed invalid neurons, $t-1$ is the previous task, and $t$ is the current task:
$$\min_{W_l^{\mathcal{N}}} \; \mathcal{L}\big(W_l^{\mathcal{N}};\, W_l^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_l^{\mathcal{N}}\big\|_1 \;+\; \gamma \sum_{g} \big\|W_{l,g}^{\mathcal{N}}\big\|_2$$
where $l$ denotes the $l$-th layer of the neural network, $\mathcal{D}_t$ is the interpretation data, $W_l^{\mathcal{N}}$ are the weights of the newly added neurons, $\mathcal{L}$ is the loss function, $\mu$ and $\gamma$ are regularization-term parameters, and $g$ is a group defined by the input weights of each neuron.
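A sketch of this expansion for a single fully connected layer: k neurons are appended, a group-sparsity penalty is defined over each new neuron's input weights, and neurons whose weight group collapses toward zero are pruned. Wiring the expanded layer back into a full model is omitted for brevity.

```python
# A sketch of layer expansion with group-sparse pruning; the pruning
# threshold eps is a hypothetical value.
import torch
import torch.nn as nn

def expand_linear(layer: nn.Linear, k: int) -> nn.Linear:
    """Append k output neurons to a Linear layer, keeping the old weights."""
    new = nn.Linear(layer.in_features, layer.out_features + k)
    with torch.no_grad():
        new.weight[: layer.out_features] = layer.weight
        new.bias[: layer.out_features] = layer.bias
    return new

def group_sparsity(new_weight: torch.Tensor, gamma: float) -> torch.Tensor:
    """gamma * sum_g ||w_g||_2, one group per new neuron's input weights (one row)."""
    return gamma * new_weight.norm(dim=1).sum()

def keep_mask(new_weight: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Boolean mask of newly added neurons to keep (group norm above eps)."""
    return new_weight.norm(dim=1) > eps
```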
Finally, to overcome semantic drift and catastrophic forgetting, $\ell_2$ regularization
$$\min_{W^{t}} \; \mathcal{L}\big(W^{t};\, \mathcal{D}_t\big) \;+\; \lambda \big\|W^{t} - W^{t-1}\big\|_2^2$$
keeps $W^{t}$ close to $W^{t-1}$. When $\lambda$ is small, the network learns the new task as much as possible; when $\lambda$ is large, it preserves previously learned knowledge as much as possible. The $\ell_2$ distance of each neuron between tasks $t$ and $t-1$ is computed; if the distance exceeds a threshold, the semantics of that neuron are considered to have changed significantly during training, and the corresponding neuron is duplicated and split.
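The duplicate-and-split test can be sketched per layer as follows; sigma is a hypothetical name for the drift threshold.

```python
# A sketch of the drift test behind duplicate-and-split; row i of each weight
# matrix is treated as neuron i's incoming weights.
import torch

def neurons_to_split(w_prev: torch.Tensor, w_curr: torch.Tensor, sigma: float) -> torch.Tensor:
    """Indices of neurons whose semantics changed significantly between tasks."""
    drift = (w_curr - w_prev).norm(dim=1)   # per-neuron l2 drift between t-1 and t
    return torch.nonzero(drift > sigma).flatten()
```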
With lifelong learning for remote sensing image interpretation, when facing a new interpretation task the model can effectively learn knowledge of new ground-feature types and of known types from different sources in the new task, without degrading interpretation of old tasks, and retains learned knowledge to the greatest extent.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. A remote sensing image interpretation method based on lifelong learning, characterized by comprising the following steps:
s1, constructing a combined model, wherein the combined model comprises a dynamic extensible interpretation sub-model and a scene classification sub-model, and the scene classification sub-model comprises a scene classifier and a memory;
s2, obtaining training samples in a sample library, pre-training the combination model by the aid of the cut training samples, and taking the obtained pre-training results as first scene classification results and storing the first scene classification results in a memory;
s3, obtaining a plurality of remote sensing images to be interpreted, and uniformly cutting the remote sensing images to be interpreted, wherein each remote sensing image to be interpreted comprises marked ground object samples and unlabeled target interpretation samples, and the marked ground object samples comprise real labels;
s4, sequentially inputting the cut remote sensing images to be interpreted into a combined model to obtain a second scene classification result and interpretation information, wherein the interpretation information comprises interpretation information of marked ground object samples and interpretation information of unlabeled target interpretation samples;
s5, calculating a second scene classification result and a first scene classification result to obtain a scene difference value;
s6, calculating the interpretation information of the marked ground object sample and the real label of the marked ground object sample to obtain an interpretation loss value;
s7, setting a selection strategy based on the scene difference value and the interpretation loss value, and retraining and expanding the dynamic expandable interpretation sub-model according to the selection strategy to obtain a final combined model;
s8, interpreting the new remote sensing image through the final combined model.
2. The method for interpreting a remote sensing image according to claim 1, wherein in step S7, the selection strategy is:
First case: when the scene difference value is smaller than a first preset threshold and the interpretation loss value is smaller than a second preset threshold, the current structure of the dynamically expandable interpretation sub-model is maintained, and the final combined model is obtained;
Second case: when the scene difference value is smaller than the first preset threshold and the interpretation loss value is larger than the second preset threshold, the dynamically expandable interpretation sub-model is retrained, the combined model is updated, and the process returns to step S4;
Third case: when the scene difference value is larger than the first preset threshold and the interpretation loss value is smaller than the second preset threshold, the dynamically expandable interpretation sub-model is retrained, the combined model is updated, and the process returns to step S4;
Fourth case: when the scene difference value is larger than the first preset threshold and the interpretation loss value is larger than the second preset threshold, the dynamically expandable interpretation sub-model is expanded, the combined model is updated, and the process returns to step S4.
3. The lifelong-learning-based remote sensing image interpretation method of claim 2, wherein the dynamically expandable interpretation sub-model includes a convolutional neural network and an expander, wherein the convolutional neural network is used to perform the interpretation task, and the expander is used to expand the convolutional neural network.
4. The method for interpreting a remote sensing image based on lifelong learning as claimed in claim 3, wherein:
extending the dynamic extensible interpretation sub-model includes adding neurons of a convolutional neural network and training the added neurons;
retraining the dynamically extensible interpretation sub-model includes selectively adjusting portions of the network parameters.
5. The lifelong-learning-based remote sensing image interpretation method of claim 4, wherein expanding the dynamically expandable interpretation sub-model comprises:
adding a preset number of neurons to each layer of neural network;
removing newly added ineffective neurons by using group sparse regularization;
training the final augmented neurons:
$$\min_{W_l^{\mathcal{N}}} \; \mathcal{L}\big(W_l^{\mathcal{N}};\, W_l^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_l^{\mathcal{N}}\big\|_1 \;+\; \gamma \sum_{g} \big\|W_{l,g}^{\mathcal{N}}\big\|_2$$
wherein $l$ denotes the $l$-th layer of the neural network, $\mathcal{D}_t$ is the interpretation data, $W_l^{\mathcal{N}}$ are the weights of the newly added neurons, $\mathcal{L}$ is the loss function, $\mu$ and $\gamma$ are regularization-term parameters, $t$ is the current task, $t-1$ is the previous task, $g$ is a group defined by the input weights of each neuron, and $N$ is the total number of layers of the network.
6. The lifelong-learning-based remote sensing image interpretation method of claim 4, wherein retraining the dynamically expandable interpretation sub-model comprises:
when a new task t is received, a sparse linear classifier is installed into the last layer of the dynamically extensible interpretation sub-model:
$$\min_{W_{N,t}^{t}} \; \mathcal{L}\big(W_{N,t}^{t};\, W_{1:N-1}^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_{N,t}^{t}\big\|_1$$
where $l$ denotes the $l$-th layer of the convolutional neural network, $W_l$ are the network parameters of layer $l$, $\mu$ is the regularization strength, $N$ is the total number of layers of the network, and $W_{1:N-1}^{t-1}$ denotes the network parameters other than $W_{N,t}^{t}$;
identifying a sub-network $S$ related to the current new task $t$ according to the established sparse connections, and retraining the sub-network $S$:
$$\min_{W_{S}^{t}} \; \mathcal{L}\big(W_{S}^{t};\, W_{S^c}^{t-1},\, \mathcal{D}_t\big) \;+\; \mu \big\|W_{S}^{t}\big\|_2$$
7. The method for interpreting a remote sensing image according to claim 1, wherein in step S5, the scene difference value is obtained from the second scene classification result and the first scene classification result by distance calculation.
8. The method for interpreting a remote sensing image based on lifelong learning as recited in claim 7, wherein the distance calculation process is as follows:
$$c = \left[\, \frac{1}{N} \sum_{j=1}^{N} p(y=i \mid x=j) \,\right]_{i=1,\dots,M}$$
where $c$ denotes the first scene classification result, $p(y=i \mid x=j)$ is the predicted probability that the input cropped training sample $j$ belongs to category $i$, $M$ is the total number of scene categories, and $N$ is the number of cropped training samples;
the remote sensing image to be interpreted is uniformly cropped into $r$ blocks, and the second scene classification result of block $t$ is
$$c_t = \big[\, p(y=1 \mid x=t),\; \dots,\; p(y=M \mid x=t) \,\big], \quad t = 1, \dots, r;$$
$D = [d_1, d_2, \dots, d_r]$ collects the nearest distance between each second scene classification result and the first scene classification results, where
$$d_t = \min_{c \in \mathcal{M}} \big\| c_t - c \big\|_2$$
and $\mathcal{M}$ is the set of first scene classification results stored in the memory;
$D$ is sorted in descending order, the first $K$ values are selected, and the median of these $K$ values is taken as the scene difference value.
9. The method of claim 3, wherein the structure of the convolutional neural network in the initial dynamically expandable interpretation sub-model comprises:
First layer: convolutional layer 1; input: a cropped image of 229×229×3; number of convolution kernels: 96; convolution kernel size: 13×13×3; stride: 4;
Second layer: pooling layer 1; pooling size: 3×3; stride: 2;
Third layer: convolutional layer 2; input: the output of the second layer; number of convolution kernels: 256; convolution kernel size: 5×5; stride: 1;
Fourth layer: pooling layer 2; pooling size: 3×3; stride: 2;
Fifth layer: convolutional layer 3; input: the output of the fourth layer; number of convolution kernels: 384; convolution kernel size: 3×3;
Sixth layer: convolutional layer 4; input: the output of the fifth layer; number of convolution kernels: 384; convolution kernel size: 3×3;
Seventh layer: convolutional layer 5; input: the output of the sixth layer; number of convolution kernels: 256; convolution kernel size: 3×3;
Eighth layer: pooling layer 3; pooling size: 3×3; stride: 2;
Ninth to eleventh layers: fully connected layers with 384, 192, and 100 neurons, respectively.
10. The method for interpreting a remote sensing image based on lifelong learning as claimed in claim 1, wherein the scene classifier is a residual network ResNet-50.
CN202310331512.5A 2023-03-31 2023-03-31 Remote sensing image interpretation method based on lifelong learning Active CN116052018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310331512.5A CN116052018B (en) 2023-03-31 2023-03-31 Remote sensing image interpretation method based on lifelong learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310331512.5A CN116052018B (en) 2023-03-31 2023-03-31 Remote sensing image interpretation method based on lifelong learning

Publications (2)

Publication Number Publication Date
CN116052018A (en) 2023-05-02
CN116052018B (en) 2023-10-27

Family

ID=86133627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310331512.5A Active CN116052018B (en) 2023-03-31 2023-03-31 Remote sensing image interpretation method based on life learning

Country Status (1)

Country Link
CN (1) CN116052018B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2007135845A (en) * 2007-09-28 2009-04-10 Федеральное государственное унитарное предприятие "Научно-производственное предприятие-Всероссийский научно-исследовательский инст METHOD FOR GEOLOGICAL DECODING OF REMOTE EARTH SURFACES
CN108229516A (en) * 2016-12-30 2018-06-29 北京市商汤科技开发有限公司 For interpreting convolutional neural networks training method, device and the equipment of remote sensing images
CN110703244A (en) * 2019-09-05 2020-01-17 中国科学院遥感与数字地球研究所 Method and device for identifying urban water body based on remote sensing data
CN112347930A (en) * 2020-11-06 2021-02-09 天津市勘察设计院集团有限公司 High-resolution image scene classification method based on self-learning semi-supervised deep neural network
CN113392748A (en) * 2021-06-07 2021-09-14 中国煤炭地质总局勘查研究总院 Remote sensing image farmland information extraction method based on convolutional neural network
CN113449640A (en) * 2021-06-29 2021-09-28 中国地质大学(武汉) Remote sensing image building semantic segmentation edge optimization method based on multitask CNN + GCN
LU501790B1 (en) * 2022-04-04 2022-10-04 Univ Inner Mongolia Agri A multi-source, multi-temporal and large-scale automatic remote sensing interpretation model based on surface ecological features

Also Published As

Publication number Publication date
CN116052018B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN107169956B (en) Color woven fabric defect detection method based on convolutional neural network
Othman et al. Domain adaptation network for cross-scene classification
CN108304795B (en) Human skeleton behavior identification method and device based on deep reinforcement learning
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN108021947B (en) A kind of layering extreme learning machine target identification method of view-based access control model
CN105787501B (en) Power transmission line corridor region automatically selects the vegetation classification method of feature
WO2018208939A1 (en) Systems and methods to enable continual, memory-bounded learning in artificial intelligence and deep learning continuously operating applications across networked compute edges
CN114092832B (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN112347970B (en) Remote sensing image ground object identification method based on graph convolution neural network
CN111382686B (en) Lane line detection method based on semi-supervised generation confrontation network
CN109033107A (en) Image search method and device, computer equipment and storage medium
CN112837315B (en) Deep learning-based transmission line insulator defect detection method
CN115017418B (en) Remote sensing image recommendation system and method based on reinforcement learning
CN112232151B (en) Iterative polymerization neural network high-resolution remote sensing scene classification method embedded with attention mechanism
CN111160389A (en) Lithology identification method based on fusion of VGG
CN113298129A (en) Polarized SAR image classification method based on superpixel and graph convolution network
CN111640087A (en) Image change detection method based on SAR (synthetic aperture radar) deep full convolution neural network
CN113792631B (en) Aircraft detection and tracking method based on multi-scale self-adaption and side-domain attention
CN112801204B (en) Hyperspectral classification method with lifelong learning ability based on automatic neural network
CN114066899A (en) Image segmentation model training method, image segmentation device, image segmentation equipment and image segmentation medium
CN114187506A (en) Remote sensing image scene classification method of viewpoint-aware dynamic routing capsule network
CN110837787B (en) Multispectral remote sensing image detection method and system for three-party generated countermeasure network
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN117154256A (en) Electrochemical repair method for lithium battery
CN116052018B (en) Remote sensing image interpretation method based on lifelong learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant