CN115861847A - Intelligent auxiliary marking method for visible light remote sensing image target - Google Patents

Intelligent auxiliary marking method for visible light remote sensing image target

Info

Publication number
CN115861847A
CN115861847A
Authority
CN
China
Prior art keywords
target
embedding
typical
sample
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310159007.7A
Other languages
Chinese (zh)
Other versions
CN115861847B (en)
Inventor
李冠群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Genyu Muxing Beijing Space Technology Co ltd
Original Assignee
Genyu Muxing Beijing Space Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genyu Muxing Beijing Space Technology Co ltd filed Critical Genyu Muxing Beijing Space Technology Co ltd
Priority to CN202310159007.7A priority Critical patent/CN115861847B/en
Publication of CN115861847A publication Critical patent/CN115861847A/en
Application granted granted Critical
Publication of CN115861847B publication Critical patent/CN115861847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and discloses an intelligent auxiliary labeling method for visible light remote sensing image targets, comprising the following steps: manually acquiring a first annotated sample or an attribute description for each type of target; constructing a typical-target automatic labeling model based on the first annotated sample or attribute description; and labeling the visible light remote sensing image through the typical-target automatic labeling model. The method reduces the manpower, material and financial resources required in the sample data annotation process, improves the efficiency of sample annotation, and thereby promotes progress in research on intelligent interpretation of visible light remote sensing images.

Description

Intelligent auxiliary marking method for visible light remote sensing image target
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an intelligent auxiliary labeling method for a target of a visible light remote sensing image.
Background
With the wide application of deep learning to the intelligent interpretation of remote sensing images, annotated data samples have an important influence on the interpretation result. Large-scale, standardized, high-quality annotated data samples are a basic precondition for research on intelligent remote sensing image interpretation algorithms. Traditional data annotation methods, however, depend heavily on manual participation, offer a low degree of automation and intelligence, and incur high costs.
The type labeling of typical targets places high demands on domain expertise, so the labeling work is conventionally carried out by experts. Manual expert labeling suffers from low speed, low efficiency, poor cost-effectiveness and a lack of reusability. Moreover, in practice samples of some target types are scarce, so a high-precision automatic labeling method cannot be obtained through large-scale sample training.
When a sample data set is established, maintained and expanded, new annotation requirements may arise. This requires the automatic model to be extensible: given some samples of a new target, it should learn to label the new target automatically without losing the labeling ability already learned for old targets. In the traditional approach, after a new annotation requirement arrives, the new and old annotated samples are mixed into a new data set with which the labeling model is retrained or fine-tuned, so that the model learns the labeling ability for new and old targets simultaneously. Because the old annotated data must be revisited and the model retrained, function expansion in the traditional approach is costly. Incremental learning can therefore be used to expand the labeling range of the labeling model while accessing as little of the old annotated data as possible. The most common problem in incremental learning is an imbalance in sample size, since only a small amount of old annotated data is accessible. Training with unbalanced sample numbers has the following negative effects on the model:
1) The classifier weight magnitudes become unbalanced: among the classifier weight parameters, the magnitudes for the new classes become significantly higher than those for the old classes;
2) The classifier weights and sample features of the old classes become biased: while learning information for the new classes, some information (knowledge) about the old classes, including classifier weights and sample features, is forgotten.
After samples are annotated, traditional manual annotation systems check the quality of the annotation information by manual visual inspection, which consumes a large amount of manpower.
Disclosure of Invention
The invention aims to overcome one or more of the above technical problems and provides an intelligent auxiliary labeling method for visible light remote sensing image targets.
In order to achieve the purpose, the intelligent auxiliary marking method for the visible light remote sensing image target provided by the invention comprises the following steps:
acquiring a first sample label or attribute description of each type of target manually;
constructing a typical target automatic labeling model based on the first sample labeling or attribute description;
and marking the visible light remote sensing image through a typical target automatic marking model.
According to one aspect of the invention, the method further comprises:
and expanding the target automatic labeling model according to incremental learning.
According to one aspect of the invention, the method further comprises establishing an association relationship between different targets, formulated as:

$$P(L_i \mid L_j) = \frac{C(L_i, L_j)}{N(L_j)}$$

wherein $L_i$ and $L_j$ are annotation targets, $C(L_i, L_j)$ is the number of images in which the two co-occur, and $N(L_j)$ is the number of annotated images containing $L_j$; $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are unequal, exhibiting asymmetry.
According to one aspect of the invention, the construction of the typical-target automatic labeling model comprises a small-sample target-category automatic labeling model based on embedding learning and a zero-sample typical-target automatic labeling model based on bimodal embedding learning.
According to one aspect of the invention, the embedding-learning-based small-sample target-category automatic labeling model comprises an embedding module, a relation module and a recognition module. The embedding module maps a high-dimensional image down to a low-dimensional embedding space to obtain an embedding vector; the relation module measures the degree of similarity between two typical targets; the recognition module compares the typical target to be annotated against the annotated typical target of each model according to the nearest-neighbor principle to obtain the model of the typical target to be annotated. The embedding-learning-based small-sample target-category automatic labeling model is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), f(x_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $x$ the typical target to be annotated, $x_m$ the annotated typical target of model $m$, $\arg\max$ the value of the argument at which the function attains its maximum, and $\mathcal{M}$ the set of models of annotated typical targets.
According to one aspect of the invention, the embedding module is constituted by an embedding-representation sub-network comprising four convolution blocks, each containing 64 convolution kernels, a BN layer and a ReLU nonlinear layer, the first two convolution blocks additionally containing a max-pooling layer;
when a typical target of a given model has several annotated samples, the samples are input into the embedding module simultaneously to obtain their respective embedding vectors, which are finally averaged to obtain the embedding vector for that model's target; the target sample to be annotated passes through the embedding module to obtain the embedding vector of the target to be annotated;
the metrics for measuring the distance between the embedding vectors of the target to be annotated and an annotated target include the cosine distance function and the Euclidean distance function;
the relation module is constituted by a sub-network comprising two convolution blocks and two fully connected layers, each convolution block containing 64 convolution kernels, a BN layer, a ReLU nonlinear layer and a max-pooling layer; the input of the sub-network is the vector formed by concatenating the embedding vectors of two targets, and the output is a similarity score for the two targets in the range 0-1, a larger value indicating a closer relationship; according to the nearest-neighbor principle, the model of the annotated typical target closest in relation to the typical target to be annotated is the model of the target to be annotated.
According to one aspect of the invention, the zero-sample typical-target automatic labeling model based on bimodal embedding learning comprises an embedding module, a relation module, a recognition module and a target attribute description embedding representation module; the input of the target attribute description embedding representation module is the attribute semantic description of a typical target, and its output is the embedding vector of the typical target corresponding to the attribute description;
the zero-sample typical-target automatic labeling model based on bimodal embedding learning is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), h(a_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $h(\cdot)$ the attribute-modality embedding function, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $\arg\max$ the value of the argument at which the function attains its maximum, $x$ the typical target to be annotated, $a_m$ the attribute description of the annotated typical target of model $m$, and $\mathcal{M}$ the set of models of annotated typical targets.
According to one aspect of the invention, expanding the target automatic labeling model through incremental learning comprises a classification method based on cosine similarity and an anti-forgetting constraint based on distillation learning;
the cosine-similarity-based classification method starts from the ordinary softmax classifier, formulated as

$$p_i(x) = \frac{\exp\big(\theta_i^{\top} f(x) + b_i\big)}{\sum_{j} \exp\big(\theta_j^{\top} f(x) + b_j\big)}$$

wherein $f(\cdot)$ denotes the feature extractor, $x$ a sample of the target automatic labeling model, $\exp$ the exponential function with the natural constant $e$ as base, $\theta_i$ and $b_i$ the weight vector and bias of class $i$ in the last layer of the convolutional neural network, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
cosine normalization is used at the last layer, formulated as

$$p_i(x) = \frac{\exp\big(\eta \langle \bar{\theta}_i, \bar{f}(x) \rangle\big)}{\sum_{j} \exp\big(\eta \langle \bar{\theta}_j, \bar{f}(x) \rangle\big)}$$

wherein $\eta$ is a learnable scalar controlling the peak of the softmax distribution, $\bar{v} = v / \lVert v \rVert_2$ denotes L2 normalization, $f(\cdot)$ is the feature extractor, $\exp$ the exponential function with the natural constant $e$ as base, $\bar{\theta}_i$ the normalized weight vector of class $i$, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
the value range of the cosine similarity $\langle \bar{\theta}_i, \bar{f}(x) \rangle$ is limited to $[-1, 1]$;
the distillation-learning-based anti-forgetting constraint is formulated as

$$L_{dis}(x) = 1 - \big\langle \bar{f}^{*}(x), \bar{f}(x) \big\rangle$$

wherein $\bar{f}^{*}(x)$ denotes the normalized features extracted by the typical-target automatic labeling model before incremental learning, $\bar{f}(x)$ denotes the normalized features extracted by the model after incremental learning, and $L_{dis}$ is the constraint loss, which encourages the direction of the features extracted by the current network to resemble that of the original model.
In order to achieve the above object, the present invention provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, wherein the computer program, when executed by the processor, implements the above intelligent auxiliary labeling method for a target in a visible light remote sensing image.
In order to achieve the above object, the present invention provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for intelligently assisting in labeling a target in a visible light remote sensing image is implemented.
The invention has the beneficial effects that: the manpower, material resources and financial resources required in the sample data labeling process are reduced, the sample labeling efficiency is improved, and the progress of intelligent interpretation research of the visible light remote sensing image is further promoted.
Drawings
FIG. 1 is a schematic flow chart of an intelligent auxiliary labeling method for visible light remote sensing image targets according to the present invention;
FIG. 2 schematically illustrates the embedding-learning-based small-sample typical-target category automatic labeling model according to the present invention;
FIG. 3 schematically illustrates the embedding-learning-based zero-sample typical-target category automatic labeling model according to the present invention.
Detailed Description
The present invention will now be discussed with reference to exemplary embodiments, it being understood that the embodiments discussed are only for the purpose of enabling a person of ordinary skill in the art to better understand and thus implement the contents of the present invention, and do not imply any limitation on the scope of the present invention.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on", and the terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment".
FIG. 1 is a schematic flow chart of the intelligent auxiliary labeling method for visible light remote sensing image targets according to the present invention; FIG. 2 schematically illustrates the embedding-learning-based small-sample typical-target category automatic labeling model according to the present invention; FIG. 3 schematically illustrates the embedding-learning-based zero-sample typical-target category automatic labeling model according to the present invention. As shown in FIGS. 1 to 3, the intelligent auxiliary labeling method for visible light remote sensing image targets of the invention comprises:
acquiring a first sample label or attribute description of each type of target manually;
constructing a typical target automatic labeling model based on the first sample labeling or attribute description;
and marking the visible light remote sensing image through a typical target automatic marking model.
In this embodiment, a semi-automatic human-computer interaction annotation framework is studied first: a small number of samples are annotated manually, or experts provide a first annotated sample for each model or attribute descriptions for the different targets; an automatic labeling model is then obtained through small-sample and zero-sample learning algorithms; finally, an incremental model-updating technique is studied so that the labeling range of the automatic labeling model can be widened through incremental learning.
According to an embodiment of the invention, the method further comprises: and expanding the target automatic labeling model according to incremental learning.
According to an embodiment of the present invention, the method further comprises establishing an association relationship between different targets, formulated as:

$$P(L_i \mid L_j) = \frac{C(L_i, L_j)}{N(L_j)}$$

wherein $L_i$ and $L_j$ are annotation targets, $C(L_i, L_j)$ is the number of images in which the two co-occur, and $N(L_j)$ is the number of annotated images containing $L_j$; $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are unequal, exhibiting asymmetry.
In this embodiment, in the remote sensing image big-data space, the spatial distributions of objects across different regions and of different objects within the same region exhibit certain association relationships. Because of these associations, objects that appear simultaneously in the same image in a large-scale remote sensing image annotation data set have strong semantic correlation. Label words with a high co-occurrence frequency usually represent two strongly associated concepts or things and are therefore very likely to be annotated in the same image, such as "airplane" and "airport", or "road" and "car". Semantic correlation information between annotated objects can thus be derived from this co-occurrence characteristic and used to correct and improve the semantic annotation of remote sensing image big data. However, simply counting word frequencies from the co-occurrence relationship cannot properly account for the different characteristics of different annotated objects. The invention therefore calculates the association and co-occurrence relationship as follows:
$$P(L_i \mid L_j) = \frac{C(L_i, L_j)}{N(L_j)}$$

wherein $L_i$ and $L_j$ are annotation targets, $C(L_i, L_j)$ is the number of images in which the two co-occur, and $N(L_j)$ is the number of annotated images containing $L_j$; $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are unequal, exhibiting asymmetry.
Based on the above definition of the association and co-occurrence relationship, $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are not equal, exhibiting asymmetry. Consider two annotated targets $L_i$ and $L_j$ whose frequencies of occurrence differ greatly. If a certain dependence exists between them, for example $L_i$ depends on the presence of $L_j$, then it is easy to infer $L_j$ from $L_i$, but difficult to infer from the presence of $L_j$ whether $L_i$ is present. For example, for the targets "ship" and "water", the presence of "water" is easily inferred from "ship", but given "water" it is difficult to tell whether a "ship" is present, because "water" is associated with many more things.
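By way of illustration, the association relationship defined above can be computed directly from a binary image-label matrix. The following Python sketch is illustrative only; NumPy, the function name and the toy data are assumptions, not part of the original disclosure:

```python
import numpy as np

def cooccurrence_association(labels: np.ndarray) -> np.ndarray:
    """Compute the asymmetric association matrix P[i, j] = P(L_i | L_j).

    labels: (num_images, num_labels) binary matrix; labels[k, i] == 1
            iff label i is annotated in image k.
    """
    # C[i, j]: number of images in which labels i and j co-occur.
    C = labels.T @ labels
    # N[j]: number of annotated images containing label j.
    N = labels.sum(axis=0)
    # P(L_i | L_j) = C(L_i, L_j) / N(L_j); guard against labels never seen.
    return C / np.maximum(N, 1)[None, :]

# Toy example: 4 images, 3 labels ("ship", "water", "airplane").
labels = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [0, 1, 1],
                   [0, 0, 1]])
P = cooccurrence_association(labels)
print(P[0, 1])  # P(ship | water) = 1/3
print(P[1, 0])  # P(water | ship) = 1/1
```

The two printed values differ, reflecting the asymmetry discussed above.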
According to one embodiment of the invention, the construction of the typical-target automatic labeling model comprises a small-sample target-category automatic labeling model based on embedding learning and a zero-sample typical-target automatic labeling model based on bimodal embedding learning.
In this embodiment, during actual annotation some target types may have very few samples, so a high-precision automatic labeling method cannot be obtained through large-scale sample training. Based on this analysis, the invention designs a typical-target automatic labeling method based on zero-sample and small-sample learning: experts annotate a first sample for each model or describe the attributes of the different targets, and from this small number of annotated samples or attribute descriptions an efficient, intelligent and automatic typical-target category labeling method is realized through target classification techniques based on small-sample and zero-sample learning.
According to one embodiment of the invention, the embedding-learning-based small-sample target-category automatic labeling model comprises an embedding module, a relation module and a recognition module. The embedding module maps a high-dimensional image down to a low-dimensional embedding space to obtain an embedding vector; the relation module measures the degree of similarity between two typical targets; the recognition module compares the typical target to be annotated against the annotated typical target of each model according to the nearest-neighbor principle to obtain the model of the typical target to be annotated. The embedding-learning-based small-sample target-category automatic labeling model is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), f(x_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $x$ the typical target to be annotated, $x_m$ the annotated typical target of model $m$, $\arg\max$ the value of the argument at which the function attains its maximum, and $\mathcal{M}$ the set of models of annotated typical targets.
In this embodiment, the purpose of embedding learning is to learn an embedding function that maps a high-dimensional input to a low-dimensional space. The representation in the low-dimensional space should have the following property: samples that are similar in the high-dimensional space should have embedding points that lie close together in the low-dimensional space, while dissimilar samples should have embedding points that lie far apart. The class of an unknown sample can then be obtained by computing and comparing the distances between its embedding vector and the embedding vectors of known samples in the low-dimensional embedding space. As long as a suitable embedding function can be learned, the distance between the embedding vector of the typical target to be annotated and the embedding vectors of the few annotated samples of each typical-target model can be measured; by the nearest-neighbor principle, the model of the target corresponding to the annotated sample closest to the typical target to be annotated is the model of the typical target to be annotated. The invention therefore designs the embedding-learning-based small-sample typical-target category automatic labeling model shown in FIG. 2.
The model is divided into an embedding module, a relation module and a recognition module. The embedding module reduces the dimensionality of the high-dimensional image, mapping it to the low-dimensional embedding space to form an embedding vector. The relation module measures the similarity between two typical targets, and the recognition module compares the typical target to be annotated against the annotated typical target of each model according to the nearest-neighbor principle to obtain the model of the typical target to be annotated.
According to one embodiment of the invention, the embedding module is constituted by an embedding-representation sub-network comprising four convolution blocks, each containing 64 convolution kernels, a BN layer and a ReLU nonlinear layer, the first two convolution blocks additionally containing a max-pooling layer;
when a typical target of a given model has several annotated samples, the samples are input into the embedding module simultaneously to obtain their respective embedding vectors, which are finally averaged to obtain the embedding vector for that model's target; the target sample to be annotated passes through the embedding module to obtain the embedding vector of the target to be annotated;
the metrics for measuring the distance between the embedding vectors of the target to be annotated and an annotated target include the cosine distance function and the Euclidean distance function;
the relation module is constituted by a sub-network comprising two convolution blocks and two fully connected layers, each convolution block containing 64 convolution kernels, a BN layer, a ReLU nonlinear layer and a max-pooling layer; the input of the sub-network is the vector formed by concatenating the embedding vectors of two targets, and the output is a similarity score for the two targets in the range 0-1, a larger value indicating a closer relationship; according to the nearest-neighbor principle, the model of the annotated typical target closest in relation to the typical target to be annotated is the model of the target to be annotated.
In this embodiment, the embedding module is constituted by an embedding-representation sub-network comprising four convolution blocks, each containing 64 convolution kernels, a BN layer and a ReLU nonlinear layer; the first two convolution blocks additionally contain a max-pooling layer. The input of the embedding module is a high-dimensional image, and the output is an embedding vector. If a typical target of a given model has several annotated samples, the samples can be input into the embedding module simultaneously to obtain their respective embedding vectors, which are then averaged to obtain the embedding vector of that model's target. Likewise, the target sample to be annotated is fed into the embedding module to obtain the embedding vector of the target to be annotated. Commonly used metrics for the distance between the embedding vectors of the target to be annotated and an annotated target are the cosine distance function and the Euclidean distance function.
The function of the relation module is to measure the similarity between the embedding vectors of two targets. The module is constituted by a sub-network comprising two convolution blocks and two fully connected layers, each convolution block containing 64 convolution kernels, a BN layer, a ReLU nonlinear layer and a max-pooling layer. The input of the sub-network is the vector formed by concatenating the embedding vectors of the two targets, and the output is a similarity score for the two targets in the range 0-1, a larger value indicating a closer relationship. According to the nearest-neighbor principle, the model of the annotated typical target closest in relation to the typical target to be annotated is the model of the target to be annotated.
According to one embodiment of the invention, the zero-sample typical-target automatic labeling model based on bimodal embedding learning comprises an embedding module, a relation module, a recognition module and a target attribute description embedding representation module; the input of the target attribute description embedding representation module is the attribute semantic description of a typical target, and its output is the embedding vector of the typical target corresponding to the attribute description;
the zero-sample typical-target automatic labeling model based on bimodal embedding learning is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), h(a_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $h(\cdot)$ the attribute-modality embedding function, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $\arg\max$ the value of the argument at which the function attains its maximum, $x$ the typical target to be annotated, $a_m$ the attribute description of the annotated typical target of model $m$, and $\mathcal{M}$ the set of models of annotated typical targets.
In this embodiment, the following situation may occur in reality: a new type of target appears, but when the automatic labeling model was trained earlier, no image sample of that type of typical target was available in the database; or images of certain types of targets cannot be captured because of military sensitivity or similar reasons. Images of such typical targets may be acquired later, and the model is then required to achieve automatic labeling of such targets quickly. Therefore, on the basis of the embedding-learning-based small-sample typical-target category automatic labeling model, the invention designs the embedding-learning-based zero-sample typical-target category automatic labeling model shown in FIG. 3.
Compared with the embedding-learning-based small-sample typical-target category automatic labeling model, the zero-sample model adds a target attribute description embedding representation module. The input of this module is the attribute semantic description of a typical target, such as the fuselage length and wing angle of the target, and the output is the embedding vector of the typical target corresponding to the attribute description. An expert therefore only needs to describe the attributes of a target of a given model to form a typical-target attribute description vector, and typical-target category recognition with zero annotated samples can be realized.
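Continuing the sketch above, a zero-sample variant replaces the image-modality support embedding f(x_m) with an attribute-modality embedding h(a_m). The attribute encoder below is a minimal stand-in for the target attribute description embedding representation module; the attribute dimension, layer widths and output shape are assumptions, and the output shape must match that of the embedding module for the chosen input image size:

```python
class AttributeEmbedding(nn.Module):
    """Stand-in for the target attribute description embedding representation
    module: maps an attribute semantic description vector (e.g. fuselage
    length, wing angle, ...) into the embedding space of the image branch."""
    def __init__(self, attr_dim=16, out_shape=(64, 21, 21)):
        # attr_dim and out_shape are assumptions; out_shape must equal the
        # output shape of EmbeddingModule for the chosen input image size.
        super().__init__()
        self.out_shape = out_shape
        self.net = nn.Sequential(
            nn.Linear(attr_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, out_shape[0] * out_shape[1] * out_shape[2]))

    def forward(self, a):
        return self.net(a).view(-1, *self.out_shape)

def label_target_zero_shot(query, attr_descriptions, embed, attr_embed, relate):
    """attr_descriptions: dict mapping model id -> attribute vector (attr_dim,)."""
    q = embed(query.unsqueeze(0))  # image-modality embedding f(x)
    scores = {m: relate(q, attr_embed(a.unsqueeze(0))).item()  # g(C(f(x), h(a_m)))
              for m, a in attr_descriptions.items()}
    return max(scores, key=scores.get)
```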
According to one embodiment of the invention, the target automatic labeling model expanded through incremental learning comprises a classification method based on cosine similarity and an anti-forgetting constraint based on distillation learning;
the cosine-similarity-based classification method starts from the ordinary softmax classifier, formulated as

$$p_i(x) = \frac{\exp\big(\theta_i^{\top} f(x) + b_i\big)}{\sum_{j} \exp\big(\theta_j^{\top} f(x) + b_j\big)}$$

wherein $f(\cdot)$ denotes the feature extractor, $x$ a sample of the target automatic labeling model, $\exp$ the exponential function with the natural constant $e$ as base, $\theta_i$ and $b_i$ the weight vector and bias of class $i$ in the last layer of the convolutional neural network, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
cosine normalization is used at the last layer, formulated as

$$p_i(x) = \frac{\exp\big(\eta \langle \bar{\theta}_i, \bar{f}(x) \rangle\big)}{\sum_{j} \exp\big(\eta \langle \bar{\theta}_j, \bar{f}(x) \rangle\big)}$$

wherein $\eta$ is a learnable scalar controlling the peak of the softmax distribution, $\bar{v} = v / \lVert v \rVert_2$ denotes L2 normalization, $f(\cdot)$ is the feature extractor, $\exp$ the exponential function with the natural constant $e$ as base, $\bar{\theta}_i$ the normalized weight vector of class $i$, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
the value range of the cosine similarity $\langle \bar{\theta}_i, \bar{f}(x) \rangle$ is limited to $[-1, 1]$;
the distillation-learning-based anti-forgetting constraint is formulated as

$$L_{dis}(x) = 1 - \big\langle \bar{f}^{*}(x), \bar{f}(x) \big\rangle$$

wherein $\bar{f}^{*}(x)$ denotes the normalized features extracted by the typical-target automatic labeling model before incremental learning, $\bar{f}(x)$ denotes the normalized features extracted by the model after incremental learning, and $L_{dis}$ is the constraint loss, which encourages the direction of the features extracted by the current network to resemble that of the original model.
In this embodiment, when the sample data set is created, maintained and expanded, new annotation requirements may arise. This requires the automatic model to be extensible, so that given some samples of a new target it can learn to label the new target automatically without losing the labeling ability already learned for old targets. In the traditional approach, after a new annotation requirement arrives, the new and old annotated samples are mixed into a new data set with which the labeling model is retrained or fine-tuned, so that the model learns the labeling ability for new and old targets simultaneously. Because the old annotated data must be revisited and the model retrained, function expansion in the traditional approach is costly. The invention therefore proposes using incremental learning to expand the labeling range of the labeling model while accessing as little of the old annotated data as possible.
The most common problem in incremental learning is an imbalance in sample size, since only a small amount of old annotated data is accessible. Training with an unbalanced number of samples has a negative impact on the model. It causes an imbalance in the magnitude of the classifier weights: among the classifier weight parameters, the magnitudes for the new classes become significantly higher than those for the old classes. At the same time, the classifier weights and sample features of the old classes become biased: while learning information for the new classes, some information (knowledge) about the old classes, including classifier weights and sample features, is forgotten. The invention proposes a classification method based on cosine similarity and an anti-forgetting constraint based on distillation learning to overcome these two problems respectively. Introducing cosine similarity into incremental learning effectively eliminates the bias caused by the significant difference in magnitude between new- and old-class features.
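A minimal sketch of such a cosine-normalized last layer follows, continuing in PyTorch; the initial value of the learnable scalar is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Last layer with cosine normalization: logit_i = eta * <theta_i_bar, f_bar(x)>."""
    def __init__(self, feat_dim, num_classes, eta_init=10.0):  # eta_init assumed
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        # Learnable scalar controlling the peak of the softmax distribution.
        self.eta = nn.Parameter(torch.tensor(eta_init))

    def forward(self, features):
        f_bar = F.normalize(features, dim=1)      # normalized features
        w_bar = F.normalize(self.weight, dim=1)   # normalized class weights
        cos_sim = f_bar @ w_bar.t()               # cosine similarities in [-1, 1]
        return self.eta * cos_sim                 # pass to softmax / cross-entropy
```

Because both the features and the class weights are L2-normalized, the logits of new and old classes share the same scale, which removes the magnitude bias described above.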
An anti-forgetting constraint is introduced through a new anti-forgetting loss function, which provides a stronger constraint for retaining previous knowledge than the original loss function. To reduce forgetting of previously learned content while the model adapts to new data, the normalized features of the new and old models should be as similar as possible, for which a new distillation loss is proposed. The idea behind this loss is that the distribution of the features of different classes learned by the old model reflects, to some extent, the relationships between classes, so maintaining these relationships is also meaningful for preventing forgetting. Since the feature distribution is mainly characterized by the angles of the feature vectors in the feature space, the distillation-learning-based anti-forgetting constraint is proposed.
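The anti-forgetting constraint likewise admits a direct sketch; it penalizes the angle between the normalized features produced by the model frozen before incremental learning and those of the current model:

```python
def less_forget_loss(old_features, new_features):
    """Anti-forgetting constraint L_dis(x) = 1 - <f_bar_old(x), f_bar_new(x)>.

    old_features: features from the model frozen before incremental learning.
    new_features: features from the current model; both of shape (batch, feat_dim).
    """
    old_bar = F.normalize(old_features, dim=1)
    new_bar = F.normalize(new_features, dim=1)
    # Per-sample cosine between the old and new feature directions.
    cos = (old_bar * new_bar).sum(dim=1)
    return (1.0 - cos).mean()  # small when current features keep the old direction
```

During incremental training this loss would be added, with a suitable weight, to the ordinary classification loss; the weighting scheme is an implementation choice not specified here.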
To achieve the above object, the present invention further provides an electronic device, including: the processor, the memory and the computer program stored on the memory and capable of running on the processor are used for realizing the intelligent auxiliary labeling method for the visible light remote sensing image target when the computer program is executed by the processor.
In order to achieve the above object, the present invention further provides a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for intelligently assisting in labeling a target in a visible light remote sensing image is implemented.
Based on the above, and aiming at the problem that traditional registration methods obtain unstable shallow features, the method extracts deep features by integrating point-line-plane geometric structural features into a deep learning framework to obtain deep semantic information of the remote sensing image, improving the robustness of the registration algorithm; an image pyramid is also constructed to handle scale changes, and a key-point selection method based on an attention mechanism addresses the problem that directly extracted features contain a large amount of invalid information, improving the accuracy of the feature extraction method and thus the accuracy of visible light remote sensing image registration.
Those of ordinary skill in the art will appreciate that the modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process of the apparatus and the device described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
The technical solution of the present invention, in substance or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
It should be understood that the order of execution of the steps in the summary of the invention and the embodiments of the present invention does not absolutely imply any order of execution, and the order of execution of the steps should be determined by their functions and inherent logic, and should not be construed as limiting the process of the embodiments of the present invention.

Claims (10)

1. The intelligent auxiliary marking method for the visible light remote sensing image target is characterized by comprising the following steps:
acquiring a first sample label or attribute description of each type of target manually;
constructing a typical target automatic labeling model based on the first sample labeling or attribute description;
and marking the visible light remote sensing image through a typical target automatic marking model.
2. The intelligent auxiliary labeling method for the visible light remote sensing image target according to claim 1, further comprising:
and automatically labeling the extended target according to the incremental learning.
3. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 1, characterized in that the method further comprises establishing an association relationship between different targets, formulated as:

$$P(L_i \mid L_j) = \frac{C(L_i, L_j)}{N(L_j)}$$

wherein $L_i$ and $L_j$ are annotation targets, $C(L_i, L_j)$ is the number of images in which the two co-occur, and $N(L_j)$ is the number of annotated images containing $L_j$; $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are unequal, exhibiting asymmetry.
4. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 1, characterized in that
the construction of the typical-target automatic labeling model comprises a small-sample target-category automatic labeling model based on embedding learning and a zero-sample typical-target automatic labeling model based on bimodal embedding learning.
5. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 2, characterized in that
the embedding-learning-based small-sample target-category automatic labeling model comprises an embedding module, a relation module and a recognition module; the embedding module maps a high-dimensional image down to a low-dimensional embedding space to obtain an embedding vector; the relation module measures the degree of similarity between two typical targets; the recognition module compares the typical target to be annotated against the annotated typical target of each model according to the nearest-neighbor principle to obtain the model of the typical target to be annotated; the embedding-learning-based small-sample target-category automatic labeling model is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), f(x_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $x$ the typical target to be annotated, $x_m$ the annotated typical target of model $m$, $\arg\max$ the value of the argument at which the function attains its maximum, and $\mathcal{M}$ the set of models of annotated typical targets.
6. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 3, characterized in that
the embedding module is constituted by an embedding-representation sub-network comprising four convolution blocks, each containing 64 convolution kernels, a BN layer and a ReLU nonlinear layer, the first two convolution blocks additionally containing a max-pooling layer;
when a typical target of a given model has several annotated samples, the samples are input into the embedding module simultaneously to obtain their respective embedding vectors, which are finally averaged to obtain the embedding vector for that model's target; the target sample to be annotated passes through the embedding module to obtain the embedding vector of the target to be annotated;
the metrics for measuring the distance between the embedding vectors of the target to be annotated and an annotated target include the cosine distance function and the Euclidean distance function;
the relation module is constituted by a sub-network comprising two convolution blocks and two fully connected layers, each convolution block containing 64 convolution kernels, a BN layer, a ReLU nonlinear layer and a max-pooling layer; the input of the sub-network is the vector formed by concatenating the embedding vectors of two targets, and the output is a similarity score for the two targets in the range 0-1, a larger value indicating a closer relationship; according to the nearest-neighbor principle, the model of the annotated typical target closest in relation to the typical target to be annotated is the model of the target to be annotated.
7. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 4, characterized in that
the zero-sample typical-target automatic labeling model based on bimodal embedding learning comprises an embedding module, a relation module, a recognition module and a target attribute description embedding representation module; the input of the target attribute description embedding representation module is the attribute semantic description of a typical target, and its output is the embedding vector of the typical target corresponding to the attribute description;
the zero-sample typical-target automatic labeling model based on bimodal embedding learning is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), h(a_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $h(\cdot)$ the attribute-modality embedding function, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $\arg\max$ the value of the argument at which the function attains its maximum, $x$ the typical target to be annotated, $a_m$ the attribute description of the annotated typical target of model $m$, and $\mathcal{M}$ the set of models of annotated typical targets.
8. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 2, characterized in that
the target automatic labeling model expanded according to incremental learning comprises a classification method based on cosine similarity and an anti-forgetting constraint based on distillation learning;
the cosine-similarity-based classification method starts from the ordinary softmax classifier, formulated as

$$p_i(x) = \frac{\exp\big(\theta_i^{\top} f(x) + b_i\big)}{\sum_{j} \exp\big(\theta_j^{\top} f(x) + b_j\big)}$$

wherein $f(\cdot)$ denotes the feature extractor, $x$ a sample of the target automatic labeling model, $\exp$ the exponential function with the natural constant $e$ as base, $\theta_i$ and $b_i$ the weight vector and bias of class $i$ in the last layer of the convolutional neural network, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
cosine normalization is used at the last layer, formulated as

$$p_i(x) = \frac{\exp\big(\eta \langle \bar{\theta}_i, \bar{f}(x) \rangle\big)}{\sum_{j} \exp\big(\eta \langle \bar{\theta}_j, \bar{f}(x) \rangle\big)}$$

wherein $\eta$ is a learnable scalar controlling the peak of the softmax distribution, $\bar{v} = v / \lVert v \rVert_2$ denotes L2 normalization, $f(\cdot)$ is the feature extractor, $\exp$ the exponential function with the natural constant $e$ as base, $\bar{\theta}_i$ the normalized weight vector of class $i$, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
the value range of the cosine similarity $\langle \bar{\theta}_i, \bar{f}(x) \rangle$ is limited to $[-1, 1]$;
the distillation-learning-based anti-forgetting constraint is formulated as

$$L_{dis}(x) = 1 - \big\langle \bar{f}^{*}(x), \bar{f}(x) \big\rangle$$

wherein $\bar{f}^{*}(x)$ denotes the normalized features extracted by the typical-target automatic labeling model before incremental learning, $\bar{f}(x)$ denotes the normalized features extracted by the model after incremental learning, and $L_{dis}$ is the constraint loss, which encourages the direction of the features extracted by the current network to resemble that of the original model.
9. An electronic device, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the intelligent assisted annotation method for objects in remote sensing images of visible light according to any one of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the intelligent assisted annotation method for objects in visible light remote sensing images according to any one of claims 1 to 8.
CN202310159007.7A 2023-02-24 2023-02-24 Intelligent auxiliary labeling method for visible light remote sensing image target Active CN115861847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310159007.7A CN115861847B (en) 2023-02-24 2023-02-24 Intelligent auxiliary labeling method for visible light remote sensing image target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310159007.7A CN115861847B (en) 2023-02-24 2023-02-24 Intelligent auxiliary labeling method for visible light remote sensing image target

Publications (2)

Publication Number Publication Date
CN115861847A true CN115861847A (en) 2023-03-28
CN115861847B CN115861847B (en) 2023-05-05

Family

ID=85658821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310159007.7A Active CN115861847B (en) 2023-02-24 2023-02-24 Intelligent auxiliary labeling method for visible light remote sensing image target

Country Status (1)

Country Link
CN (1) CN115861847B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670060A (en) * 2018-12-10 2019-04-23 北京航天泰坦科技股份有限公司 A kind of remote sensing image semi-automation mask method based on deep learning
CN111191732A (en) * 2020-01-03 2020-05-22 天津大学 Target detection method based on full-automatic learning
CN111461162A (en) * 2020-01-03 2020-07-28 华中科技大学 Zero-sample target detection model and establishing method thereof
CN112836762A (en) * 2021-02-26 2021-05-25 平安科技(深圳)有限公司 Model distillation method, device, equipment and storage medium
CN113222068A (en) * 2021-06-03 2021-08-06 西安电子科技大学 Remote sensing image multi-label classification method based on adjacency matrix guidance label embedding
CN114003725A (en) * 2021-12-30 2022-02-01 深圳佑驾创新科技有限公司 Information annotation model construction method and information annotation generation method
US20220222921A1 (en) * 2021-06-03 2022-07-14 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method for generating image classification model, roadside device and cloud control platform
CN115690541A (en) * 2022-11-01 2023-02-03 四川大学 Deep learning training method for improving recognition accuracy of small sample and small target


Also Published As

Publication number Publication date
CN115861847B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN106778804B (en) Zero sample image classification method based on class attribute transfer learning
CN109993197A (en) A kind of zero sample multi-tag classification method based on the end-to-end example differentiation of depth
CN113705570B (en) Deep learning-based few-sample target detection method
CN112487822A (en) Cross-modal retrieval method based on deep learning
Niu et al. A novel image retrieval method based on multi-features fusion
CN111158641B (en) Automatic recognition method for transaction function points based on semantic analysis and text mining
WO2019230666A1 (en) Feature amount extraction device, method, and program
Wan et al. Affine invariant description and large-margin dimensionality reduction for target detection in optical remote sensing images
CN110705384B (en) Vehicle re-identification method based on cross-domain migration enhanced representation
Song et al. Digital Image Semantic Segmentation Algorithms: A Survey.
Wei et al. Food image classification and image retrieval based on visual features and machine learning
CN115329120A (en) Weak label Hash image retrieval framework with knowledge graph embedded attention mechanism
CN111898528B (en) Data processing method, device, computer readable medium and electronic equipment
CN111339258B (en) University computer basic exercise recommendation method based on knowledge graph
CN112465016A (en) Partial multi-mark learning method based on optimal distance between two adjacent marks
Amrouche et al. Real-time detection of vehicle license plates numbers
CN115861847B (en) Intelligent auxiliary labeling method for visible light remote sensing image target
Yu et al. A method of small object detection based on improved deep learning
CN116109834A (en) Small sample image classification method based on local orthogonal feature attention fusion
US11328179B2 (en) Information processing apparatus and information processing method
Konlambigue et al. Performance evaluation of state-of-the-art filtering criteria applied to sift features
Ouni et al. A hybrid approach for improved image similarity using semantic segmentation
CN110609961A (en) Collaborative filtering recommendation method based on word embedding
Raja et al. Classification and retrieval of natural scenes
Zhou et al. A novel part-based model for fine-grained vehicle recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant