CN115861847A - Intelligent auxiliary marking method for visible light remote sensing image target - Google Patents

Intelligent auxiliary marking method for visible light remote sensing image target

Info

Publication number
CN115861847A
CN115861847A
Authority
CN
China
Prior art keywords
target
embedding
typical
sample
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310159007.7A
Other languages
Chinese (zh)
Other versions
CN115861847B (en)
Inventor
李冠群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Genyu Muxing Beijing Space Technology Co ltd
Original Assignee
Genyu Muxing Beijing Space Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genyu Muxing Beijing Space Technology Co ltd filed Critical Genyu Muxing Beijing Space Technology Co ltd
Priority to CN202310159007.7A priority Critical patent/CN115861847B/en
Publication of CN115861847A publication Critical patent/CN115861847A/en
Application granted granted Critical
Publication of CN115861847B publication Critical patent/CN115861847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and discloses an intelligent auxiliary labeling method for visible light remote sensing image targets, comprising the following steps: manually acquiring a first annotated sample or an attribute description for each type of target; constructing a typical-target automatic labeling model based on the first annotated sample or attribute description; and labeling the visible light remote sensing image through the typical-target automatic labeling model. The method reduces the manpower, material and financial resources required in the sample data annotation process, improves the efficiency of sample annotation, and thereby promotes progress in research on intelligent interpretation of visible light remote sensing images.

Description

Intelligent auxiliary marking method for visible light remote sensing image target
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an intelligent auxiliary labeling method for a target of a visible light remote sensing image.
Background
With the wide application of deep learning to the intelligent interpretation of remote sensing images, annotated data samples have an important influence on the interpretation result. Large-scale, standardized, high-quality annotated data samples are a basic precondition for research on intelligent remote sensing image interpretation algorithms. Traditional data annotation methods, however, depend heavily on manual participation, offer a low degree of automation and intelligence, and incur high costs.
The type labeling of typical targets places high demands on domain expertise, so the labeling work is conventionally carried out by experts. Manual expert labeling suffers from low speed, low efficiency, poor cost-effectiveness and a lack of reusability. Moreover, in practice samples of some target types are scarce, so a high-precision automatic labeling method cannot be obtained through large-scale sample training.
When a sample data set is established, maintained and expanded, new annotation requirements may arise. This requires the automatic model to be extensible: given some samples of a new target, it should learn to label the new target automatically without losing the labeling ability already learned for old targets. In the traditional approach, after a new annotation requirement arrives, the new and old annotated samples are mixed into a new data set with which the labeling model is retrained or fine-tuned, so that the model learns the labeling ability for new and old targets simultaneously. Because the old annotated data must be revisited and the model retrained, function expansion in the traditional approach is costly. Incremental learning can therefore be used to expand the labeling range of the labeling model while accessing as little of the old annotated data as possible. The most common problem in incremental learning is an imbalance in sample size, since only a small amount of old annotated data is accessible. Training with unbalanced sample numbers has the following negative effects on the model:
1) The classifier weight magnitudes become unbalanced: among the classifier weight parameters, the magnitudes for the new classes become significantly higher than those for the old classes;
2) The classifier weights and sample features of the old classes become biased: while learning information for the new classes, some information (knowledge) about the old classes, including classifier weights and sample features, is forgotten.
After samples are annotated, traditional manual annotation systems check the quality of the annotation information by manual visual inspection, which consumes a large amount of manpower.
Disclosure of Invention
The invention aims to overcome one or more of the above technical problems and provides an intelligent auxiliary labeling method for visible light remote sensing image targets.
In order to achieve the purpose, the intelligent auxiliary marking method for the visible light remote sensing image target provided by the invention comprises the following steps:
acquiring a first sample label or attribute description of each type of target manually;
constructing a typical target automatic labeling model based on the first sample labeling or attribute description;
and marking the visible light remote sensing image through a typical target automatic marking model.
According to one aspect of the invention, the method further comprises:
and expanding the target automatic labeling model according to incremental learning.
According to one aspect of the invention, the method further comprises establishing an association relationship between different targets, formulated as:

$$P(L_i \mid L_j) = \frac{C(L_i, L_j)}{N(L_j)}$$

wherein $L_i$ and $L_j$ are annotation targets, $C(L_i, L_j)$ is the number of images in which the two co-occur, and $N(L_j)$ is the number of annotated images containing $L_j$; $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are unequal, exhibiting asymmetry.
According to one aspect of the invention, the construction of the typical-target automatic labeling model comprises a small-sample target-category automatic labeling model based on embedding learning and a zero-sample typical-target automatic labeling model based on bimodal embedding learning.
According to one aspect of the invention, the embedding-learning-based small-sample target-category automatic labeling model comprises an embedding module, a relation module and a recognition module. The embedding module maps a high-dimensional image down to a low-dimensional embedding space to obtain an embedding vector; the relation module measures the degree of similarity between two typical targets; the recognition module compares the typical target to be annotated against the annotated typical target of each model according to the nearest-neighbor principle to obtain the model of the typical target to be annotated. The embedding-learning-based small-sample target-category automatic labeling model is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), f(x_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $x$ the typical target to be annotated, $x_m$ the annotated typical target of model $m$, $\arg\max$ the value of the argument at which the function attains its maximum, and $\mathcal{M}$ the set of models of annotated typical targets.
According to one aspect of the invention, the embedding module is constituted by an embedding-representation sub-network comprising four convolution blocks, each containing 64 convolution kernels, a BN layer and a ReLU nonlinear layer, the first two convolution blocks additionally containing a max-pooling layer;
when a typical target of a given model has several annotated samples, the samples are input into the embedding module simultaneously to obtain their respective embedding vectors, which are finally averaged to obtain the embedding vector for that model's target; the target sample to be annotated passes through the embedding module to obtain the embedding vector of the target to be annotated;
the metrics for measuring the distance between the embedding vectors of the target to be annotated and an annotated target include the cosine distance function and the Euclidean distance function;
the relation module is constituted by a sub-network comprising two convolution blocks and two fully connected layers, each convolution block containing 64 convolution kernels, a BN layer, a ReLU nonlinear layer and a max-pooling layer; the input of the sub-network is the vector formed by concatenating the embedding vectors of two targets, and the output is a similarity score for the two targets in the range 0-1, a larger value indicating a closer relationship; according to the nearest-neighbor principle, the model of the annotated typical target closest in relation to the typical target to be annotated is the model of the target to be annotated.
According to one aspect of the invention, the zero-sample typical-target automatic labeling model based on bimodal embedding learning comprises an embedding module, a relation module, a recognition module and a target attribute description embedding representation module; the input of the target attribute description embedding representation module is the attribute semantic description of a typical target, and its output is the embedding vector of the typical target corresponding to the attribute description;
the zero-sample typical-target automatic labeling model based on bimodal embedding learning is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), h(a_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $h(\cdot)$ the attribute-modality embedding function, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $\arg\max$ the value of the argument at which the function attains its maximum, $x$ the typical target to be annotated, $a_m$ the attribute description of the annotated typical target of model $m$, and $\mathcal{M}$ the set of models of annotated typical targets.
According to one aspect of the invention, expanding the target automatic labeling model through incremental learning comprises a classification method based on cosine similarity and an anti-forgetting constraint based on distillation learning;
the cosine-similarity-based classification method starts from the ordinary softmax classifier, formulated as

$$p_i(x) = \frac{\exp\big(\theta_i^{\top} f(x) + b_i\big)}{\sum_{j} \exp\big(\theta_j^{\top} f(x) + b_j\big)}$$

wherein $f(\cdot)$ denotes the feature extractor, $x$ a sample of the target automatic labeling model, $\exp$ the exponential function with the natural constant $e$ as base, $\theta_i$ and $b_i$ the weight vector and bias of class $i$ in the last layer of the convolutional neural network, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
cosine normalization is used at the last layer, formulated as

$$p_i(x) = \frac{\exp\big(\eta \langle \bar{\theta}_i, \bar{f}(x) \rangle\big)}{\sum_{j} \exp\big(\eta \langle \bar{\theta}_j, \bar{f}(x) \rangle\big)}$$

wherein $\eta$ is a learnable scalar controlling the peak of the softmax distribution, $\bar{v} = v / \lVert v \rVert_2$ denotes L2 normalization, $f(\cdot)$ is the feature extractor, $\exp$ the exponential function with the natural constant $e$ as base, $\bar{\theta}_i$ the normalized weight vector of class $i$, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
the value range of the cosine similarity $\langle \bar{\theta}_i, \bar{f}(x) \rangle$ is limited to $[-1, 1]$;
the distillation-learning-based anti-forgetting constraint is formulated as

$$L_{dis}(x) = 1 - \big\langle \bar{f}^{*}(x), \bar{f}(x) \big\rangle$$

wherein $\bar{f}^{*}(x)$ denotes the normalized features extracted by the typical-target automatic labeling model before incremental learning, $\bar{f}(x)$ denotes the normalized features extracted by the model after incremental learning, and $L_{dis}$ is the constraint loss, which encourages the direction of the features extracted by the current network to resemble that of the original model.
In order to achieve the above object, the present invention provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, wherein the computer program, when executed by the processor, implements the above intelligent auxiliary labeling method for a target in a visible light remote sensing image.
In order to achieve the above object, the present invention provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for intelligently assisting in labeling a target in a visible light remote sensing image is implemented.
The invention has the beneficial effects that: the manpower, material resources and financial resources required in the sample data labeling process are reduced, the sample labeling efficiency is improved, and the progress of intelligent interpretation research of the visible light remote sensing image is further promoted.
Drawings
FIG. 1 is a schematic flow chart of an intelligent auxiliary labeling method for visible light remote sensing image targets according to the present invention;
FIG. 2 schematically illustrates the embedding-learning-based small-sample typical-target category automatic labeling model according to the present invention;
FIG. 3 schematically illustrates the embedding-learning-based zero-sample typical-target category automatic labeling model according to the present invention.
Detailed Description
The present invention will now be discussed with reference to exemplary embodiments, it being understood that the embodiments discussed are only for the purpose of enabling a person of ordinary skill in the art to better understand and thus implement the contents of the present invention, and do not imply any limitation on the scope of the present invention.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on", and the terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment".
FIG. 1 is a schematic flow chart of the intelligent auxiliary labeling method for visible light remote sensing image targets according to the present invention; FIG. 2 schematically illustrates the embedding-learning-based small-sample typical-target category automatic labeling model according to the present invention; FIG. 3 schematically illustrates the embedding-learning-based zero-sample typical-target category automatic labeling model according to the present invention. As shown in FIGS. 1 to 3, the intelligent auxiliary labeling method for visible light remote sensing image targets of the invention comprises:
acquiring a first sample label or attribute description of each type of target manually;
constructing a typical target automatic labeling model based on the first sample labeling or attribute description;
and marking the visible light remote sensing image through a typical target automatic marking model.
In this embodiment, a semi-automatic human-computer interaction annotation framework is studied first: a small number of samples are annotated manually, or experts provide a first annotated sample for each model or attribute descriptions for the different targets; an automatic labeling model is then obtained through small-sample and zero-sample learning algorithms; finally, an incremental model-updating technique is studied so that the labeling range of the automatic labeling model can be widened through incremental learning.
According to an embodiment of the invention, the method further comprises: and expanding the target automatic labeling model according to incremental learning.
According to an embodiment of the present invention, the method further comprises establishing an association relationship between different targets, formulated as:

$$P(L_i \mid L_j) = \frac{C(L_i, L_j)}{N(L_j)}$$

wherein $L_i$ and $L_j$ are annotation targets, $C(L_i, L_j)$ is the number of images in which the two co-occur, and $N(L_j)$ is the number of annotated images containing $L_j$; $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are unequal, exhibiting asymmetry.
In this embodiment, in the remote sensing image big-data space, the spatial distributions of objects across different regions and of different objects within the same region exhibit certain association relationships. Because of these associations, objects that appear simultaneously in the same image in a large-scale remote sensing image annotation data set have strong semantic correlation. Label words with a high co-occurrence frequency usually represent two strongly associated concepts or things and are therefore very likely to be annotated in the same image, such as "airplane" and "airport", or "road" and "car". Semantic correlation information between annotated objects can thus be derived from this co-occurrence characteristic and used to correct and improve the semantic annotation of remote sensing image big data. However, simply counting word frequencies from the co-occurrence relationship cannot properly account for the different characteristics of different annotated objects. The invention therefore calculates the association and co-occurrence relationship as follows:
$$P(L_i \mid L_j) = \frac{C(L_i, L_j)}{N(L_j)}$$

wherein $L_i$ and $L_j$ are annotation targets, $C(L_i, L_j)$ is the number of images in which the two co-occur, and $N(L_j)$ is the number of annotated images containing $L_j$; $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are unequal, exhibiting asymmetry.
Based on the above definition of the association and co-occurrence relationship, $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are not equal, exhibiting asymmetry. Consider two annotated targets $L_i$ and $L_j$ whose frequencies of occurrence differ greatly. If a certain dependence exists between them, for example $L_i$ depends on the presence of $L_j$, then it is easy to infer $L_j$ from $L_i$, but difficult to infer from the presence of $L_j$ whether $L_i$ is present. For example, for the targets "ship" and "water", the presence of "water" is easily inferred from "ship", but given "water" it is difficult to tell whether a "ship" is present, because "water" is associated with many more things.
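By way of illustration, the association relationship defined above can be computed directly from a binary image-label matrix. The following Python sketch is illustrative only; NumPy, the function name and the toy data are assumptions, not part of the original disclosure:

```python
import numpy as np

def cooccurrence_association(labels: np.ndarray) -> np.ndarray:
    """Compute the asymmetric association matrix P[i, j] = P(L_i | L_j).

    labels: (num_images, num_labels) binary matrix; labels[k, i] == 1
            iff label i is annotated in image k.
    """
    # C[i, j]: number of images in which labels i and j co-occur.
    C = labels.T @ labels
    # N[j]: number of annotated images containing label j.
    N = labels.sum(axis=0)
    # P(L_i | L_j) = C(L_i, L_j) / N(L_j); guard against labels never seen.
    return C / np.maximum(N, 1)[None, :]

# Toy example: 4 images, 3 labels ("ship", "water", "airplane").
labels = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [0, 1, 1],
                   [0, 0, 1]])
P = cooccurrence_association(labels)
print(P[0, 1])  # P(ship | water) = 1/3
print(P[1, 0])  # P(water | ship) = 1/1
```

The two printed values differ, reflecting the asymmetry discussed above.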
According to one embodiment of the invention, the construction of the typical-target automatic labeling model comprises a small-sample target-category automatic labeling model based on embedding learning and a zero-sample typical-target automatic labeling model based on bimodal embedding learning.
In this embodiment, during actual annotation some target types may have very few samples, so a high-precision automatic labeling method cannot be obtained through large-scale sample training. Based on this analysis, the invention designs a typical-target automatic labeling method based on zero-sample and small-sample learning: experts annotate a first sample for each model or describe the attributes of the different targets, and from this small number of annotated samples or attribute descriptions an efficient, intelligent and automatic typical-target category labeling method is realized through target classification techniques based on small-sample and zero-sample learning.
According to one embodiment of the invention, the embedding-learning-based small-sample target-category automatic labeling model comprises an embedding module, a relation module and a recognition module. The embedding module maps a high-dimensional image down to a low-dimensional embedding space to obtain an embedding vector; the relation module measures the degree of similarity between two typical targets; the recognition module compares the typical target to be annotated against the annotated typical target of each model according to the nearest-neighbor principle to obtain the model of the typical target to be annotated. The embedding-learning-based small-sample target-category automatic labeling model is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), f(x_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $x$ the typical target to be annotated, $x_m$ the annotated typical target of model $m$, $\arg\max$ the value of the argument at which the function attains its maximum, and $\mathcal{M}$ the set of models of annotated typical targets.
In this embodiment, the purpose of embedding learning is to learn an embedding function that maps a high-dimensional input to a low-dimensional space. The representation in the low-dimensional space should have the following property: samples that are similar in the high-dimensional space should have embedding points that lie close together in the low-dimensional space, while dissimilar samples should have embedding points that lie far apart. The class of an unknown sample can then be obtained by computing and comparing the distances between its embedding vector and the embedding vectors of known samples in the low-dimensional embedding space. As long as a suitable embedding function can be learned, the distance between the embedding vector of the typical target to be annotated and the embedding vectors of the few annotated samples of each typical-target model can be measured; by the nearest-neighbor principle, the model of the target corresponding to the annotated sample closest to the typical target to be annotated is the model of the typical target to be annotated. The invention therefore designs the embedding-learning-based small-sample typical-target category automatic labeling model shown in FIG. 2.
The model is divided into an embedding module, a relation module and a recognition module. The embedding module reduces the dimensionality of the high-dimensional image, mapping it to the low-dimensional embedding space to form an embedding vector. The relation module measures the similarity between two typical targets, and the recognition module compares the typical target to be annotated against the annotated typical target of each model according to the nearest-neighbor principle to obtain the model of the typical target to be annotated.
According to one embodiment of the invention, the embedding module is constituted by an embedding-representation sub-network comprising four convolution blocks, each containing 64 convolution kernels, a BN layer and a ReLU nonlinear layer, the first two convolution blocks additionally containing a max-pooling layer;
when a typical target of a given model has several annotated samples, the samples are input into the embedding module simultaneously to obtain their respective embedding vectors, which are finally averaged to obtain the embedding vector for that model's target; the target sample to be annotated passes through the embedding module to obtain the embedding vector of the target to be annotated;
the metrics for measuring the distance between the embedding vectors of the target to be annotated and an annotated target include the cosine distance function and the Euclidean distance function;
the relation module is constituted by a sub-network comprising two convolution blocks and two fully connected layers, each convolution block containing 64 convolution kernels, a BN layer, a ReLU nonlinear layer and a max-pooling layer; the input of the sub-network is the vector formed by concatenating the embedding vectors of two targets, and the output is a similarity score for the two targets in the range 0-1, a larger value indicating a closer relationship; according to the nearest-neighbor principle, the model of the annotated typical target closest in relation to the typical target to be annotated is the model of the target to be annotated.
In this embodiment, the embedding module is constituted by an embedding-representation sub-network comprising four convolution blocks, each containing 64 convolution kernels, a BN layer and a ReLU nonlinear layer; the first two convolution blocks additionally contain a max-pooling layer. The input of the embedding module is a high-dimensional image, and the output is an embedding vector. If a typical target of a given model has several annotated samples, the samples can be input into the embedding module simultaneously to obtain their respective embedding vectors, which are then averaged to obtain the embedding vector of that model's target. Likewise, the target sample to be annotated is fed into the embedding module to obtain the embedding vector of the target to be annotated. Commonly used metrics for the distance between the embedding vectors of the target to be annotated and an annotated target are the cosine distance function and the Euclidean distance function.
The function of the relation module is to measure the similarity between the embedding vectors of two targets. The module is constituted by a sub-network comprising two convolution blocks and two fully connected layers, each convolution block containing 64 convolution kernels, a BN layer, a ReLU nonlinear layer and a max-pooling layer. The input of the sub-network is the vector formed by concatenating the embedding vectors of the two targets, and the output is a similarity score for the two targets in the range 0-1, a larger value indicating a closer relationship. According to the nearest-neighbor principle, the model of the annotated typical target closest in relation to the typical target to be annotated is the model of the target to be annotated.
According to one embodiment of the invention, the zero-sample typical-target automatic labeling model based on bimodal embedding learning comprises an embedding module, a relation module, a recognition module and a target attribute description embedding representation module; the input of the target attribute description embedding representation module is the attribute semantic description of a typical target, and its output is the embedding vector of the typical target corresponding to the attribute description;
the zero-sample typical-target automatic labeling model based on bimodal embedding learning is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), h(a_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $h(\cdot)$ the attribute-modality embedding function, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $\arg\max$ the value of the argument at which the function attains its maximum, $x$ the typical target to be annotated, $a_m$ the attribute description of the annotated typical target of model $m$, and $\mathcal{M}$ the set of models of annotated typical targets.
In this embodiment, the following situation may occur in reality: a new type of target appears, but when the automatic labeling model was trained earlier, no image sample of that type of typical target was available in the database; or images of certain types of targets cannot be captured because of military sensitivity or similar reasons. Images of such typical targets may be acquired later, and the model is then required to achieve automatic labeling of such targets quickly. Therefore, on the basis of the embedding-learning-based small-sample typical-target category automatic labeling model, the invention designs the embedding-learning-based zero-sample typical-target category automatic labeling model shown in FIG. 3.
Compared with the embedding-learning-based small-sample typical-target category automatic labeling model, the zero-sample model adds a target attribute description embedding representation module. The input of this module is the attribute semantic description of a typical target, such as the fuselage length and wing angle of the target, and the output is the embedding vector of the typical target corresponding to the attribute description. An expert therefore only needs to describe the attributes of a target of a given model to form a typical-target attribute description vector, and typical-target category recognition with zero annotated samples can be realized.
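Continuing the sketch above, a zero-sample variant replaces the image-modality support embedding f(x_m) with an attribute-modality embedding h(a_m). The attribute encoder below is a minimal stand-in for the target attribute description embedding representation module; the attribute dimension, layer widths and output shape are assumptions, and the output shape must match that of the embedding module for the chosen input image size:

```python
class AttributeEmbedding(nn.Module):
    """Stand-in for the target attribute description embedding representation
    module: maps an attribute semantic description vector (e.g. fuselage
    length, wing angle, ...) into the embedding space of the image branch."""
    def __init__(self, attr_dim=16, out_shape=(64, 21, 21)):
        # attr_dim and out_shape are assumptions; out_shape must equal the
        # output shape of EmbeddingModule for the chosen input image size.
        super().__init__()
        self.out_shape = out_shape
        self.net = nn.Sequential(
            nn.Linear(attr_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, out_shape[0] * out_shape[1] * out_shape[2]))

    def forward(self, a):
        return self.net(a).view(-1, *self.out_shape)

def label_target_zero_shot(query, attr_descriptions, embed, attr_embed, relate):
    """attr_descriptions: dict mapping model id -> attribute vector (attr_dim,)."""
    q = embed(query.unsqueeze(0))  # image-modality embedding f(x)
    scores = {m: relate(q, attr_embed(a.unsqueeze(0))).item()  # g(C(f(x), h(a_m)))
              for m, a in attr_descriptions.items()}
    return max(scores, key=scores.get)
```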
According to one embodiment of the invention, the target automatic labeling model expanded through incremental learning comprises a classification method based on cosine similarity and an anti-forgetting constraint based on distillation learning;
the cosine-similarity-based classification method starts from the ordinary softmax classifier, formulated as

$$p_i(x) = \frac{\exp\big(\theta_i^{\top} f(x) + b_i\big)}{\sum_{j} \exp\big(\theta_j^{\top} f(x) + b_j\big)}$$

wherein $f(\cdot)$ denotes the feature extractor, $x$ a sample of the target automatic labeling model, $\exp$ the exponential function with the natural constant $e$ as base, $\theta_i$ and $b_i$ the weight vector and bias of class $i$ in the last layer of the convolutional neural network, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
cosine normalization is used at the last layer, formulated as

$$p_i(x) = \frac{\exp\big(\eta \langle \bar{\theta}_i, \bar{f}(x) \rangle\big)}{\sum_{j} \exp\big(\eta \langle \bar{\theta}_j, \bar{f}(x) \rangle\big)}$$

wherein $\eta$ is a learnable scalar controlling the peak of the softmax distribution, $\bar{v} = v / \lVert v \rVert_2$ denotes L2 normalization, $f(\cdot)$ is the feature extractor, $\exp$ the exponential function with the natural constant $e$ as base, $\bar{\theta}_i$ the normalized weight vector of class $i$, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
the value range of the cosine similarity $\langle \bar{\theta}_i, \bar{f}(x) \rangle$ is limited to $[-1, 1]$;
the distillation-learning-based anti-forgetting constraint is formulated as

$$L_{dis}(x) = 1 - \big\langle \bar{f}^{*}(x), \bar{f}(x) \big\rangle$$

wherein $\bar{f}^{*}(x)$ denotes the normalized features extracted by the typical-target automatic labeling model before incremental learning, $\bar{f}(x)$ denotes the normalized features extracted by the model after incremental learning, and $L_{dis}$ is the constraint loss, which encourages the direction of the features extracted by the current network to resemble that of the original model.
In this embodiment, when the sample data set is created, maintained and expanded, new annotation requirements may arise. This requires the automatic model to be extensible, so that given some samples of a new target it can learn to label the new target automatically without losing the labeling ability already learned for old targets. In the traditional approach, after a new annotation requirement arrives, the new and old annotated samples are mixed into a new data set with which the labeling model is retrained or fine-tuned, so that the model learns the labeling ability for new and old targets simultaneously. Because the old annotated data must be revisited and the model retrained, function expansion in the traditional approach is costly. The invention therefore proposes using incremental learning to expand the labeling range of the labeling model while accessing as little of the old annotated data as possible.
The most common problem in incremental learning is an imbalance in sample size, since only a small amount of old annotated data is accessible. Training with an unbalanced number of samples has a negative impact on the model. It causes an imbalance in the magnitude of the classifier weights: among the classifier weight parameters, the magnitudes for the new classes become significantly higher than those for the old classes. At the same time, the classifier weights and sample features of the old classes become biased: while learning information for the new classes, some information (knowledge) about the old classes, including classifier weights and sample features, is forgotten. The invention proposes a classification method based on cosine similarity and an anti-forgetting constraint based on distillation learning to overcome these two problems respectively. Introducing cosine similarity into incremental learning effectively eliminates the bias caused by the significant difference in magnitude between new- and old-class features.
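A minimal sketch of such a cosine-normalized last layer follows, continuing in PyTorch; the initial value of the learnable scalar is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Last layer with cosine normalization: logit_i = eta * <theta_i_bar, f_bar(x)>."""
    def __init__(self, feat_dim, num_classes, eta_init=10.0):  # eta_init assumed
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        # Learnable scalar controlling the peak of the softmax distribution.
        self.eta = nn.Parameter(torch.tensor(eta_init))

    def forward(self, features):
        f_bar = F.normalize(features, dim=1)      # normalized features
        w_bar = F.normalize(self.weight, dim=1)   # normalized class weights
        cos_sim = f_bar @ w_bar.t()               # cosine similarities in [-1, 1]
        return self.eta * cos_sim                 # pass to softmax / cross-entropy
```

Because both the features and the class weights are L2-normalized, the logits of new and old classes share the same scale, which removes the magnitude bias described above.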
An anti-forgetting constraint is introduced through a new anti-forgetting loss function, which provides a stronger constraint for retaining previous knowledge than the original loss function. To reduce forgetting of previously learned content while the model adapts to new data, the normalized features of the new and old models should be as similar as possible, for which a new distillation loss is proposed. The idea behind this loss is that the distribution of the features of different classes learned by the old model reflects, to some extent, the relationships between classes, so maintaining these relationships is also meaningful for preventing forgetting. Since the feature distribution is mainly characterized by the angles of the feature vectors in the feature space, the distillation-learning-based anti-forgetting constraint is proposed.
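The anti-forgetting constraint likewise admits a direct sketch; it penalizes the angle between the normalized features produced by the model frozen before incremental learning and those of the current model:

```python
def less_forget_loss(old_features, new_features):
    """Anti-forgetting constraint L_dis(x) = 1 - <f_bar_old(x), f_bar_new(x)>.

    old_features: features from the model frozen before incremental learning.
    new_features: features from the current model; both of shape (batch, feat_dim).
    """
    old_bar = F.normalize(old_features, dim=1)
    new_bar = F.normalize(new_features, dim=1)
    # Per-sample cosine between the old and new feature directions.
    cos = (old_bar * new_bar).sum(dim=1)
    return (1.0 - cos).mean()  # small when current features keep the old direction
```

During incremental training this loss would be added, with a suitable weight, to the ordinary classification loss; the weighting scheme is an implementation choice not specified here.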
To achieve the above object, the present invention further provides an electronic device, including: the processor, the memory and the computer program stored on the memory and capable of running on the processor are used for realizing the intelligent auxiliary labeling method for the visible light remote sensing image target when the computer program is executed by the processor.
In order to achieve the above object, the present invention further provides a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for intelligently assisting in labeling a target in a visible light remote sensing image is implemented.
Based on the above, and aiming at the problem that traditional registration methods obtain unstable shallow features, the method extracts deep features by integrating point-line-plane geometric structural features into a deep learning framework to obtain deep semantic information of the remote sensing image, improving the robustness of the registration algorithm; an image pyramid is also constructed to handle scale changes, and a key-point selection method based on an attention mechanism addresses the problem that directly extracted features contain a large amount of invalid information, improving the accuracy of the feature extraction method and thus the accuracy of visible light remote sensing image registration.
Those of ordinary skill in the art will appreciate that the modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process of the apparatus and the device described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
The technical solution of the present invention, in substance or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
It should be understood that the order of execution of the steps in the summary of the invention and the embodiments of the present invention does not absolutely imply any order of execution, and the order of execution of the steps should be determined by their functions and inherent logic, and should not be construed as limiting the process of the embodiments of the present invention.

Claims (10)

1. The intelligent auxiliary marking method for the visible light remote sensing image target is characterized by comprising the following steps:
acquiring a first sample label or attribute description of each type of target manually;
constructing a typical target automatic labeling model based on the first sample labeling or attribute description;
and marking the visible light remote sensing image through a typical target automatic marking model.
2. The intelligent auxiliary labeling method for the visible light remote sensing image target according to claim 1, further comprising:
and automatically labeling the extended target according to the incremental learning.
3. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 1, characterized in that the method further comprises establishing an association relationship between different targets, formulated as:

$$P(L_i \mid L_j) = \frac{C(L_i, L_j)}{N(L_j)}$$

wherein $L_i$ and $L_j$ are annotation targets, $C(L_i, L_j)$ is the number of images in which the two co-occur, and $N(L_j)$ is the number of annotated images containing $L_j$; $P(L_i \mid L_j)$ and $P(L_j \mid L_i)$ are unequal, exhibiting asymmetry.
4. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 1, characterized in that
the construction of the typical-target automatic labeling model comprises a small-sample target-category automatic labeling model based on embedding learning and a zero-sample typical-target automatic labeling model based on bimodal embedding learning.
5. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 2, characterized in that
the embedding-learning-based small-sample target-category automatic labeling model comprises an embedding module, a relation module and a recognition module; the embedding module maps a high-dimensional image down to a low-dimensional embedding space to obtain an embedding vector; the relation module measures the degree of similarity between two typical targets; the recognition module compares the typical target to be annotated against the annotated typical target of each model according to the nearest-neighbor principle to obtain the model of the typical target to be annotated; the embedding-learning-based small-sample target-category automatic labeling model is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), f(x_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $x$ the typical target to be annotated, $x_m$ the annotated typical target of model $m$, $\arg\max$ the value of the argument at which the function attains its maximum, and $\mathcal{M}$ the set of models of annotated typical targets.
6. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 3, characterized in that
the embedding module is constituted by an embedding-representation sub-network comprising four convolution blocks, each containing 64 convolution kernels, a BN layer and a ReLU nonlinear layer, the first two convolution blocks additionally containing a max-pooling layer;
when a typical target of a given model has several annotated samples, the samples are input into the embedding module simultaneously to obtain their respective embedding vectors, which are finally averaged to obtain the embedding vector for that model's target; the target sample to be annotated passes through the embedding module to obtain the embedding vector of the target to be annotated;
the metrics for measuring the distance between the embedding vectors of the target to be annotated and an annotated target include the cosine distance function and the Euclidean distance function;
the relation module is constituted by a sub-network comprising two convolution blocks and two fully connected layers, each convolution block containing 64 convolution kernels, a BN layer, a ReLU nonlinear layer and a max-pooling layer; the input of the sub-network is the vector formed by concatenating the embedding vectors of two targets, and the output is a similarity score for the two targets in the range 0-1, a larger value indicating a closer relationship; according to the nearest-neighbor principle, the model of the annotated typical target closest in relation to the typical target to be annotated is the model of the target to be annotated.
7. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 4, characterized in that
the zero-sample typical-target automatic labeling model based on bimodal embedding learning comprises an embedding module, a relation module, a recognition module and a target attribute description embedding representation module; the input of the target attribute description embedding representation module is the attribute semantic description of a typical target, and its output is the embedding vector of the typical target corresponding to the attribute description;
the zero-sample typical-target automatic labeling model based on bimodal embedding learning is formulated as

$$m^{*} = \arg\max_{m \in \mathcal{M}} g\big(\mathcal{C}(f(x), h(a_m))\big)$$

wherein $m$ denotes a typical-target model number, $f(\cdot)$ the image-modality embedding function, $g(\cdot)$ the relation-module sub-network, $h(\cdot)$ the attribute-modality embedding function, $\mathcal{C}(\cdot,\cdot)$ the vector channel aggregation operation, $\arg\max$ the value of the argument at which the function attains its maximum, $x$ the typical target to be annotated, $a_m$ the attribute description of the annotated typical target of model $m$, and $\mathcal{M}$ the set of models of annotated typical targets.
8. The intelligent auxiliary labeling method for visible light remote sensing image targets as claimed in claim 2, characterized in that
the target automatic labeling model expanded according to incremental learning comprises a classification method based on cosine similarity and an anti-forgetting constraint based on distillation learning;
the cosine-similarity-based classification method starts from the ordinary softmax classifier, formulated as

$$p_i(x) = \frac{\exp\big(\theta_i^{\top} f(x) + b_i\big)}{\sum_{j} \exp\big(\theta_j^{\top} f(x) + b_j\big)}$$

wherein $f(\cdot)$ denotes the feature extractor, $x$ a sample of the target automatic labeling model, $\exp$ the exponential function with the natural constant $e$ as base, $\theta_i$ and $b_i$ the weight vector and bias of class $i$ in the last layer of the convolutional neural network, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
cosine normalization is used at the last layer, formulated as

$$p_i(x) = \frac{\exp\big(\eta \langle \bar{\theta}_i, \bar{f}(x) \rangle\big)}{\sum_{j} \exp\big(\eta \langle \bar{\theta}_j, \bar{f}(x) \rangle\big)}$$

wherein $\eta$ is a learnable scalar controlling the peak of the softmax distribution, $\bar{v} = v / \lVert v \rVert_2$ denotes L2 normalization, $f(\cdot)$ is the feature extractor, $\exp$ the exponential function with the natural constant $e$ as base, $\bar{\theta}_i$ the normalized weight vector of class $i$, and $p_i(x)$ the predicted probability of the sample in the convolutional neural network;
the value range of the cosine similarity $\langle \bar{\theta}_i, \bar{f}(x) \rangle$ is limited to $[-1, 1]$;
the distillation-learning-based anti-forgetting constraint is formulated as

$$L_{dis}(x) = 1 - \big\langle \bar{f}^{*}(x), \bar{f}(x) \big\rangle$$

wherein $\bar{f}^{*}(x)$ denotes the normalized features extracted by the typical-target automatic labeling model before incremental learning, $\bar{f}(x)$ denotes the normalized features extracted by the model after incremental learning, and $L_{dis}$ is the constraint loss, which encourages the direction of the features extracted by the current network to resemble that of the original model.
9. An electronic device, comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the intelligent assisted annotation method for objects in remote sensing images of visible light according to any one of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the intelligent assisted annotation method for objects in visible light remote sensing images according to any one of claims 1 to 8.
CN202310159007.7A 2023-02-24 2023-02-24 Intelligent auxiliary labeling method for visible light remote sensing image target Active CN115861847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310159007.7A CN115861847B (en) 2023-02-24 2023-02-24 Intelligent auxiliary labeling method for visible light remote sensing image target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310159007.7A CN115861847B (en) 2023-02-24 2023-02-24 Intelligent auxiliary labeling method for visible light remote sensing image target

Publications (2)

Publication Number Publication Date
CN115861847A true CN115861847A (en) 2023-03-28
CN115861847B CN115861847B (en) 2023-05-05

Family

ID=85658821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310159007.7A Active CN115861847B (en) 2023-02-24 2023-02-24 Intelligent auxiliary labeling method for visible light remote sensing image target

Country Status (1)

Country Link
CN (1) CN115861847B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670060A (en) * 2018-12-10 2019-04-23 北京航天泰坦科技股份有限公司 A kind of remote sensing image semi-automation mask method based on deep learning
CN111191732A (en) * 2020-01-03 2020-05-22 天津大学 Target detection method based on full-automatic learning
CN111461162A (en) * 2020-01-03 2020-07-28 华中科技大学 Zero-sample target detection model and establishing method thereof
CN112836762A (en) * 2021-02-26 2021-05-25 平安科技(深圳)有限公司 Model distillation method, device, equipment and storage medium
CN113222068A (en) * 2021-06-03 2021-08-06 西安电子科技大学 Remote sensing image multi-label classification method based on adjacency matrix guidance label embedding
CN114003725A (en) * 2021-12-30 2022-02-01 深圳佑驾创新科技有限公司 Information annotation model construction method and information annotation generation method
US20220222921A1 (en) * 2021-06-03 2022-07-14 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method for generating image classification model, roadside device and cloud control platform
CN115690541A (en) * 2022-11-01 2023-02-03 四川大学 Deep learning training method for improving recognition accuracy of small sample and small target


Also Published As

Publication number Publication date
CN115861847B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN106778804B (en) Zero sample image classification method based on class attribute transfer learning
CN109993197A (en) A kind of zero sample multi-tag classification method based on the end-to-end example differentiation of depth
CN113705570B (en) Deep learning-based few-sample target detection method
CN112487822A (en) Cross-modal retrieval method based on deep learning
Niu et al. A novel image retrieval method based on multi-features fusion
CN111158641B (en) Automatic recognition method for transaction function points based on semantic analysis and text mining
WO2019230666A1 (en) Feature amount extraction device, method, and program
Wan et al. Affine invariant description and large-margin dimensionality reduction for target detection in optical remote sensing images
CN110705384B (en) Vehicle re-identification method based on cross-domain migration enhanced representation
Song et al. Digital Image Semantic Segmentation Algorithms: A Survey.
Wei et al. Food image classification and image retrieval based on visual features and machine learning
CN115329120A (en) Weak label Hash image retrieval framework with knowledge graph embedded attention mechanism
CN111898528B (en) Data processing method, device, computer readable medium and electronic equipment
CN111339258B (en) University computer basic exercise recommendation method based on knowledge graph
CN112465016A (en) Partial multi-mark learning method based on optimal distance between two adjacent marks
Amrouche et al. Real-time detection of vehicle license plates numbers
CN115861847B (en) Intelligent auxiliary labeling method for visible light remote sensing image target
Yu et al. A method of small object detection based on improved deep learning
CN116109834A (en) Small sample image classification method based on local orthogonal feature attention fusion
US11328179B2 (en) Information processing apparatus and information processing method
Konlambigue et al. Performance evaluation of state-of-the-art filtering criteria applied to sift features
Ouni et al. A hybrid approach for improved image similarity using semantic segmentation
CN110609961A (en) Collaborative filtering recommendation method based on word embedding
Raja et al. Classification and retrieval of natural scenes
Zhou et al. A novel part-based model for fine-grained vehicle recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant