CN114912516A

CN114912516A - Cross-domain target detection method and system for coordinating feature consistency and specificity

Info

Publication number: CN114912516A
Application number: CN202210440038.5A
Authority: CN
Inventors: 王晓伟; 蒋沛文; 王惠; 秦晓辉; 边有钢; 胡满江; 秦洪懋; 徐彪; 谢国涛; 秦兆博; 丁荣军
Original assignee: Wuxi Institute Of Intelligent Control Hunan University
Current assignee: Wuxi Institute Of Intelligent Control Hunan University
Priority date: 2022-04-25
Filing date: 2022-04-25
Publication date: 2022-08-16
Anticipated expiration: 2042-04-25
Also published as: CN114912516B

Abstract

The invention discloses a cross-domain target detection method and system for coordinating feature consistency and specificity. The method comprises the following steps: step 1, constructing a source domain data set and a target domain data set, and constructing a reference cross-domain target detection model; step 2, continuously updating the memory elements in memory units through a feature specificity memory read-write module to guide the reference cross-domain target detection model to learn feature specificity, and, through a feature consistency weighted alignment module, using the source domain and target domain memory elements to confuse memory elements of the same class across the two domains, weighting the loss function of each class-level domain discriminator according to the occurrence proportion of each target class to be detected, and thereby guiding the features to learn cross-domain consistency on the basis of semantic specificity, so as to obtain a cross-domain target detection model; and step 3, training the model with the loss function of the cross-domain target detection model coordinating feature consistency and specificity as the optimization objective, and applying the trained model to the target domain.

Description

Cross-domain target detection method and system for coordinating feature consistency and specificity

Technical Field

The invention relates to the technical field of deep-learning-based domain-adaptive target detection, and in particular to a cross-domain target detection method and system for coordinating feature consistency and specificity.

Background

Current deep-learning-based target detection models generally suffer from domain shift caused by differences between the data distributions of the training set (the source domain) and the test set (the target domain). This problem limits the generalization and applicability of target detection models to a certain extent and poses great challenges for practical scenarios such as road traffic video surveillance and intelligent vehicle target detection. Unsupervised domain-adaptive target detection attempts to transfer knowledge from the source domain to the target domain, and can improve the detection performance of a target detection model across domains while avoiding additional annotation of target domain data and retraining of the model. How to use an appropriate domain adaptation strategy to transfer knowledge from the source domain to the target domain, and thereby improve the cross-domain robustness of target detection models, is becoming a focus of research in fields such as computer vision and transfer learning.

Existing unsupervised domain-adaptive target detection methods usually work at the pixel level, image level and instance level: domain discriminators of different levels are added to the target detector, and a gradient reversal layer inverts the sign of the gradient flowing from the domain discriminators back to the target detector, which establishes an adversarial game between the target detector and the domain discriminators. During training, the target detector keeps generating source domain and target domain features that are as similar as possible in order to fool the domain discriminators, while the domain discriminators try as hard as possible to tell whether the features generated by the target detector come from the source domain or the target domain. Practice shows that this mutual balancing between the target detector and the domain discriminators can markedly enhance the consistency between source domain and target domain features and greatly improve the cross-domain detection capability of the target detection model.

In domain adaptation research, feature consistency means that the extracted source domain features and target domain features reach the same state as each other, while feature specificity means that the extracted features of different classes reach states that differ from one another. The two are equally important in the feature alignment process and are often in conflict with each other.

Most existing unsupervised domain-adaptive target detection methods focus on learning feature consistency but ignore the potential loss of feature specificity caused by over-learning consistency. As a result, detection-related tasks downstream of the model (such as classification and localization) are likely to be adversely affected, leading to feature misalignment, which hinders the cross-domain detection performance of the model to a certain extent.

In addition, the numbers of targets of different classes in the source domain data may be highly imbalanced. During feature consistency learning, a class with more targets effectively carries a larger weight, and a class with fewer targets effectively carries a smaller weight. Even if an existing unsupervised domain-adaptive target detection method is able to capture specific features, it can hardly avoid insufficient alignment of those specific features when facing rich and complex practical application scenarios, which aggravates the bias of the cross-domain target detection model.

Disclosure of Invention

The object of the invention is to provide a cross-domain target detection method and system for coordinating feature consistency and specificity, which coordinate feature specificity and consistency in cross-domain target detection.

To achieve the above object, the present invention provides a cross-domain target detection method coordinating feature consistency and specificity, which comprises:

step 1, constructing a source domain data set and a target domain data set according to actual application requirements, and constructing a reference cross-domain target detection model capable of preliminarily extracting consistent features;

step 2, arranging a feature specificity memory read-write module and a feature consistency weighted alignment module on the reference cross-domain target detection model; continuously updating, through the feature specificity memory read-write module, the memory elements in the memory units that represent the feature information of different classes, so as to guide the reference cross-domain target detection model to learn feature specificity; using, through the feature consistency weighted alignment module, the source domain memory elements and the target domain memory elements to confuse memory elements of the same class from the source domain and the target domain, and weighting the loss function of each class-level domain discriminator according to the occurrence proportion of each target class to be detected, thereby further guiding the features to learn cross-domain consistency on the basis of semantic specificity; and obtaining a cross-domain target detection model that coordinates feature specificity and consistency in cross-domain target detection;

and step 3, training the model with the loss function of the cross-domain target detection model coordinating feature consistency and specificity as the optimization objective, and applying the trained model to the target domain.

Further, arranging the feature specificity memory read-write module on the reference cross-domain target detection model in step 2 specifically comprises:

step 2.1.1, acquiring a source domain query vector and a target domain query vector;

step 2.1.2, retrieving memory elements of a source domain and memory elements of a target domain;

step 2.1.3, updating the source domain memory unit and the target domain memory unit:

writing the read-out source domain memory element into the memory element v_{s,k} at the corresponding category position of the source domain memory unit V_s, where v_{s,k} denotes the memory element of the kth category of the source domain;

writing the read-out target domain memory element into the memory element v_{t,k} at the corresponding category position of the target domain memory unit V_t, where v_{t,k} denotes the memory element of the kth category of the target domain.

Further, the read-out source domain and target domain memory elements are obtained in the following two cases:

Case 1:

a1. If the kth memory element in the source domain memory unit is retrieved by source domain query vectors whose category label index is k, the read-out source domain memory element is given by formula (2), namely the weighted sum of the mean of the source domain query vectors belonging to class k and the memory element of the kth category of the source domain, where γ₁ denotes the weight coefficient applied to the mean of the source domain query vectors belonging to class k, γ₂ denotes the weight coefficient applied to the memory element of the kth category of the source domain, 0<γ₁<1, 0<γ₂<1, and γ₁+γ₂=1;

a2. If the kth memory element in the target domain memory unit is retrieved by target domain query vectors whose category label index is k, the read-out target domain memory element is given by formula (4), namely the weighted sum of the mean of the target domain query vectors belonging to class k and the memory element of the kth category of the target domain, where γ₃ denotes the weight coefficient applied to the mean of the target domain query vectors belonging to class k, γ₄ denotes the weight coefficient applied to the memory element of the kth category of the target domain, 0<γ₃<1, 0<γ₄<1, and γ₃+γ₄=1;

Case 2:

b1. If the kth memory element in the source domain memory unit is not retrieved, the memory element of the kth category of the source domain is directly assigned to the read-out source domain memory element;

b2. If the kth memory element in the target domain memory unit is not retrieved, the memory element of the kth category of the target domain is directly assigned to the read-out target domain memory element.

Further, arranging the feature consistency weighted alignment module on the reference cross-domain target detection model in step 2 specifically comprises:

step 2.2.1, constructing class-level domain discriminators;

step 2.2.2, constructing source domain and target domain vector counters that accumulate the number of occurrences of the query vectors of each class during one round of training of the reference cross-domain target detection model, so as to obtain the occurrence proportion of each target class to be detected for weighting the loss function of each class-level domain discriminator;

and step 2.2.3, calculating, according to the occurrence proportions of the source domain and target domain query vectors of each class in one training round, the weight of the source domain part and the weight of the target domain part of the loss function of each class-level domain discriminator.

Further, the weight of the source domain part and the weight of the target domain part of the loss function in step 2.2.3 are calculated by formula (12) and formula (13), respectively, where α_k denotes the weight of the source domain part of the loss function of the kth class-level domain discriminator, β_k denotes the weight of the target domain part of the loss function of the kth class-level domain discriminator, 0<α_k<1, and 0<β_k<1.

Further, the weighted alignment loss function of each class-level domain discriminator in step 2.2.1 is given by formula (14), which involves the probability, predicted by the kth class-level domain discriminator, that the read-out source domain memory element belongs to the source domain, and the probability, predicted by the same discriminator, that the read-out target domain memory element belongs to the target domain; E denotes the expectation.
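As an illustration only, assuming formula (14) has the usual weighted binary cross-entropy form built from the quantities named above, the loss of the kth class-level domain discriminator can be sketched as follows. The function names and the eps clipping are hypothetical, α_k and β_k are taken as inputs from formulas (12) and (13), and the probabilities are passed in directly (the target part uses the predicted probability of belonging to the target domain).

```python
import math
from typing import Sequence


def class_alignment_loss(p_src_is_src: Sequence[float],
                         p_tgt_is_tgt: Sequence[float],
                         alpha_k: float, beta_k: float,
                         eps: float = 1e-12) -> float:
    """Assumed form: alpha_k * E[-log p_src_is_src] + beta_k * E[-log p_tgt_is_tgt].

    p_src_is_src: predicted probabilities that read-out source memory elements
                  of class k belong to the source domain.
    p_tgt_is_tgt: predicted probabilities that read-out target memory elements
                  of class k belong to the target domain.
    """
    if not p_src_is_src or not p_tgt_is_tgt:
        raise ValueError("both probability lists must be non-empty")
    # Sample means stand in for the expectation E; eps avoids log(0).
    src_term = sum(-math.log(max(p, eps)) for p in p_src_is_src) / len(p_src_is_src)
    tgt_term = sum(-math.log(max(p, eps)) for p in p_tgt_is_tgt) / len(p_tgt_is_tgt)
    return alpha_k * src_term + beta_k * tgt_term


def total_alignment_loss(per_class_losses: Sequence[float]) -> float:
    # Hypothetical aggregation: sum over the K class-level discriminators.
    return sum(per_class_losses)


# A maximally confused discriminator (p = 0.5) with alpha_k = beta_k = 1:
print(class_alignment_loss([0.5], [0.5], 1.0, 1.0))  # about 1.386, i.e. 2 * ln 2
```

In such a setup, the gradient of this loss would reach the memory elements through a gradient reversal layer, so the detector is pushed to make same-class memory elements of the two domains indistinguishable, while α_k and β_k rescale each class's contribution.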

Further, step 2.2.2 specifically includes:

before each training round begins, the number N_{s,k} of source domain query vectors belonging to class k in the source domain vector counter and the number N_{t,k} of target domain query vectors belonging to class k in the target domain vector counter are both set to 0;

in each training iteration, the values at the corresponding class positions of the source domain and target domain vector counters are updated according to formula (10) and formula (11), using the total number n_{s,k} of source domain query vectors whose class label index is k and the total number n_{t,k} of target domain query vectors whose class label index is k:

N_{s,k} ← N_{s,k} + n_{s,k} (10)

N_{t,k} ← N_{t,k} + n_{t,k} (11)

when a training round ends, the values stored in the source domain and target domain vector counters are the total numbers of source domain and target domain query vectors of each class in that training round, respectively.
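Formulas (10) and (11) amount to a per-class counter that is reset at the start of each training round and incremented in every training iteration. A minimal sketch follows; the class name VectorCounter and its methods are hypothetical, labels are assumed to be 1-based class label indices, and the proportions it returns are only the occurrence proportions from which α_k and β_k are derived (the exact mapping of formulas (12) and (13) is not modeled here).

```python
from typing import Iterable, List


class VectorCounter:
    """Per-class count of query vectors over one training round (hypothetical helper)."""

    def __init__(self, num_classes: int) -> None:
        self.counts: List[int] = [0] * num_classes

    def reset(self) -> None:
        # Before each training round: N_k <- 0 for every class k.
        self.counts = [0] * len(self.counts)

    def update(self, labels: Iterable[int]) -> None:
        # One training iteration: incrementing once per query vector with
        # label k is equivalent to N_k <- N_k + n_k.
        for k in labels:
            self.counts[k - 1] += 1

    def proportions(self) -> List[float]:
        # Occurrence proportion of each class in the current round.
        total = sum(self.counts)
        if total == 0:
            return [0.0] * len(self.counts)
        return [c / total for c in self.counts]


source_counter = VectorCounter(num_classes=3)
source_counter.reset()
source_counter.update([1, 1, 2])  # iteration 1: n_1 = 2, n_2 = 1, n_3 = 0
source_counter.update([1, 3])     # iteration 2: n_1 = 1, n_2 = 0, n_3 = 1
print(source_counter.counts)      # [3, 1, 1]
```

A second instance would serve as the target domain vector counter, fed with the pseudo category labels of the target domain query vectors.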

The invention further provides a cross-domain target detection system for coordinating feature consistency and specificity, comprising a reference cross-domain target detection model, a feature specificity memory read-write module and a feature consistency weighted alignment module, wherein:

the reference cross-domain target detection model is provided with a basic target detector, domain discriminators of different levels and a gradient reversal layer, wherein the gradient reversal layer realizes adversarial training between the basic target detector and the domain discriminators of each level, so that the reference cross-domain target detection model is capable of preliminarily extracting consistent features;

the feature specificity memory read-write module is provided with a source domain memory unit and a target domain memory unit, which support the basic read and write operations and store the feature information of different classes; in each training iteration, the feature specificity memory read-write module reads out source domain memory elements and target domain memory elements from the memory units and updates the source domain and target domain memory units themselves, thereby guiding the reference cross-domain target detection model to learn the semantic specificity of features and providing the input for the subsequent feature consistency weighted alignment module;

the feature consistency weighted alignment module is provided with source domain and target domain vector counters, a plurality of class-level domain discriminators and a gradient reversal layer, wherein each class-level domain discriminator takes the source domain memory element and the target domain memory element of the corresponding class as input and confuses the memory elements of the same class from the source domain and the target domain; the module also weights the loss function of each class-level domain discriminator according to the occurrence proportion of each target class to be detected, thereby further guiding the features to learn cross-domain consistency on the basis of semantic specificity.

Further, the feature specificity memory read-write module specifically comprises:

a query vector acquisition unit for acquiring a source domain query vector and a target domain query vector;

a memory element retrieval unit for retrieving a source domain memory element and a target domain memory element;

a memory element updating unit for updating the source domain memory unit and the target domain memory unit:

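writing the read-out source domain memory element into the memory element v_{s,k} at the corresponding category position of the source domain memory unit V_s, where v_{s,k} denotes the memory element of the kth category of the source domain;

writing the read-out target domain memory element into the memory element v_{t,k} at the corresponding category position of the target domain memory unit V_t, where v_{t,k} denotes the memory element of the kth category of the target domain.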

Further, the feature consistency weighted alignment module specifically comprises:

a discriminator construction unit for constructing class-level domain discriminators;

a vector counter construction unit for constructing source domain and target domain vector counters, which accumulate the number of occurrences of the query vectors of each class in one round of training of the reference cross-domain target detection model;

and a discrimination loss weighting unit for calculating, according to the occurrence proportions of the source domain and target domain query vectors of each class in one training round, the weights of the source domain part and the target domain part of the loss function of each class-level domain discriminator, so as to weight the loss function of each class-level domain discriminator.

According to the invention, the feature specificity memory read-write module continuously updates the memory elements in the memory units that represent the feature information of different classes, guiding the reference cross-domain target detection model to learn feature specificity; the source domain and target domain memory elements are then used to guide the alignment of same-class features of the two domains with equal attention to every class, which further enhances the cross-domain consistency of features on the basis of specificity, so that feature specificity and consistency are coordinated in cross-domain target detection.

Drawings

FIG. 1 is an architecture diagram of a cross-domain target detection method that coordinates feature consistency and specificity provided by an embodiment of the invention.

FIG. 2 is a flowchart of a cross-domain target detection method that coordinates feature consistency and specificity provided by an embodiment of the invention.

FIG. 3 is a flowchart of the method for arranging the feature specificity memory read-write module on the reference cross-domain target detection model in step 2 of FIG. 2.

FIG. 4 is a schematic diagram of the learning process of the feature specificity memory read-write module according to an embodiment of the present invention.

FIG. 5 is a flowchart of the method for arranging the feature consistency weighted alignment module on the reference cross-domain target detection model in step 2 of FIG. 2.

FIG. 6 is a schematic diagram of the learning process of the feature consistency weighted alignment module according to an embodiment of the present invention.

FIG. 7 is an architecture diagram of a cross-domain object detection system that coordinates feature consistency and specificity provided by embodiments of the present invention.

Detailed Description

The technical solutions provided by the present invention will be described in detail below, and it should be understood that the following detailed description is only illustrative of the present invention and is not intended to limit the scope of the present invention.

Key terms are defined as follows:

① Feature consistency: the source domain features and target domain features extracted by the trained model reach nearly the same state; it measures the transferability of the model across different domains.

② Feature specificity: the features of different classes extracted by the trained model reach states that differ from one another; it measures the ability of the model to discriminate between classes.

As shown in FIG. 1 and FIG. 2, the cross-domain target detection method for coordinating feature consistency and specificity provided by the embodiment of the present invention comprises:

Step 1, constructing a source domain data set and a target domain data set according to actual application requirements, and constructing a reference cross-domain target detection model capable of preliminarily extracting consistent features. "Preliminarily" means that the model can already extract consistent features but its performance is still deficient: its ability to extract consistent features may need improvement, and it may not coordinate feature consistency and specificity well. The subsequent steps of the invention are therefore needed to further improve cross-domain target detection performance.

Step 2, arranging a feature specificity memory read-write module and a feature consistency weighted alignment module on the reference cross-domain target detection model to obtain a cross-domain target detection model coordinating feature consistency and specificity, which preserves feature specificity at the semantic level as well as feature consistency at the cross-domain level.

Step 3, training the model with the loss function of the cross-domain target detection model coordinating feature consistency and specificity as the optimization objective, and applying the trained model to the target domain.

Step 1 specifically comprises:

Step 1.1, constructing the source domain data set and the target domain data set.

According to actual application requirements, a publicly available labeled data set is selected as the source domain, and a data set collected in the actual scene is used as the target domain. Here, labels generally refer to bounding box labels and category labels. The target domain data has no bounding box or category labels, and the labels of the source domain data need category filtering, category merging and similar processing to ensure that the source domain and the target domain share the same categories to be detected.

Step 1.2, building the reference cross-domain target detection model.

According to actual application requirements, a deep-learning-based target detection model is selected as the basic target detector so that the subsequently built model can complete the basic task of target detection; the loss function of the basic target detector is recorded as the detection loss.

Pixel-level, image-level and instance-level domain discriminators are arranged on the basic target detector through gradient reversal layers (GRL) to build the reference cross-domain target detection model, so that the subsequently built model is capable of preliminarily extracting consistent features; the loss function introduced by the domain discriminators is recorded as the domain discrimination loss.
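The effect of the gradient reversal layer can be seen in a scalar toy example. This is an illustrative sketch rather than the model's training code: the detector output is assumed to be a single feature value f, the domain discriminator is a hypothetical logistic unit D(f) = sigmoid(w·f) trained to output 1 for source features, and lam is a hypothetical scaling factor for the reversed gradient.

```python
import math
from typing import List


def grl_forward(features: List[float]) -> List[float]:
    # Forward pass: identity, so the discriminator sees the detector features unchanged.
    return list(features)


def grl_backward(upstream_grad: List[float], lam: float = 1.0) -> List[float]:
    # Backward pass: invert the sign (scaled by lam) of the gradient
    # flowing from the domain discriminator back to the detector.
    return [-lam * g for g in upstream_grad]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def source_disc_loss(f: float, w: float) -> float:
    # Toy discriminator loss for a source feature (label 1): -log D(f).
    return -math.log(sigmoid(w * f))


def source_disc_grad(f: float, w: float) -> float:
    # d/df of -log(sigmoid(w * f)) = -(1 - sigmoid(w * f)) * w
    return -(1.0 - sigmoid(w * f)) * w


f, w, lr = 0.5, 2.0, 0.1
loss_before = source_disc_loss(f, w)
grad = source_disc_grad(grl_forward([f])[0], w)
# The detector takes a gradient-descent step on the *reversed* gradient.
f_new = f - lr * grl_backward([grad])[0]
loss_after = source_disc_loss(f_new, w)
print(loss_before < loss_after)  # True: the detector moved to fool the discriminator
```

Because the reversed gradient is applied, the detector step increases the discriminator's loss, which is the adversarial game described in the Background; the discriminator itself would be updated with the ordinary, unreversed gradient.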

To illustrate the subsequent model-building process, Faster R-CNN, which is commonly adopted in domain-adaptive target detection research, is selected as the basic target detector, and a domain discriminator is arranged through the gradient reversal layer GRL on the feature map output by the feature extraction network E. As those skilled in the art will understand, the basic target detector used to build the cross-domain target detection model coordinating feature consistency and specificity only needs a network structure similar to that of Faster R-CNN and is not limited to Faster R-CNN. In addition, although the reference cross-domain target detection model in FIG. 1 has only one level of domain discriminator, any model capable of preliminarily extracting domain-consistent features that is obtained by arranging pixel-level, image-level and instance-level domain discriminators on the basic target detector falls within the scope of the "reference cross-domain target detection model" of the present invention.

Step 2 specifically comprises the following steps:

Step 2.1, arranging the feature specificity memory read-write module on the reference cross-domain target detection model, as shown in FIG. 3.

Although the reference cross-domain target detection model built in step 1.2 has a certain ability to generate consistent features, the consistency learning process may not be effectively controlled and may act excessively, so that the generated features have poor specificity. This adversely affects the subsequent detection-related tasks (such as classification and localization) and causes negative transfer between source domain and target domain features.

To overcome these problems, memory units supporting the basic read and write operations are used to store the feature information of different classes and to guide the reference cross-domain target detection model to learn feature specificity at the semantic level. In each training iteration, query vectors of different classes retrieve from the memory unit according to their class identities, and the memory elements representing the feature information of those classes are read out. The read-out memory elements are written back into the memory unit for the next retrieval and reading, and they also provide useful class signals for the subsequent feature consistency learning.

Step 2.1 specifically comprises, as shown in FIG. 3:

Step 2.1.1, acquiring the source domain query vectors and the target domain query vectors.

The query vector can retrieve memory elements related to the class feature information from the memory unit, and is the basis for learning feature specificity.

According to the ground truth bounding box labels of the source domain image, n_s source domain feature matrices are extracted from the feature map output by the feature extraction network E, where s denotes the source domain. These feature matrices undergo region-of-interest pooling (RoI Pooling), the fixed-size feature matrices obtained after pooling are flattened, and the source domain query vectors are finally obtained through two fully connected (FC) layers.

Each source domain query vector carries its own ground truth category label, and the dimension of each source domain query vector is d.

According to the pseudo bounding box labels obtained by the reference cross-domain target detection model predicting on the target domain image, n_t target domain feature matrices are extracted from the feature map output by the feature extraction network E, where t denotes the target domain. These feature matrices undergo region-of-interest pooling, the fixed-size feature matrices obtained after pooling are flattened, and the target domain query vectors are finally obtained through two fully connected layers.

Each target domain query vector carries its own pseudo category label, and the dimension of each target domain query vector is d.
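The query vector pipeline (RoI pooling, flattening, two fully connected layers) can be illustrated on a toy single-channel feature map. This sketch rests on several assumptions: the box is given in integer feature-map coordinates, max pooling is used as in the original RoI Pooling, and a ReLU follows each fully connected layer as in common Faster R-CNN box heads; the weights are arbitrary and the helper names are hypothetical. A practical implementation would operate on multi-channel tensors in a deep learning framework.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Linear = Tuple[Matrix, List[float]]  # (weight rows, bias)


def roi_max_pool(fmap: Matrix, box: Sequence[int], out_size: int) -> Matrix:
    """Max-pool box = (x1, y1, x2, y2) of a single-channel feature map into an
    out_size x out_size grid; x2 and y2 are exclusive cell coordinates."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    if h < 1 or w < 1:
        raise ValueError("box must cover at least one cell")
    pooled: Matrix = []
    for i in range(out_size):
        ys = y1 + (i * h) // out_size                         # floor of bin start
        ye = max(ys + 1, y1 + -(-(i + 1) * h // out_size))    # ceil of bin end
        row = []
        for j in range(out_size):
            xs = x1 + (j * w) // out_size
            xe = max(xs + 1, x1 + -(-(j + 1) * w // out_size))
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


def linear(x: List[float], layer: Linear) -> List[float]:
    weight, bias = layer
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def query_vector(fmap: Matrix, box: Sequence[int], out_size: int,
                 fc1: Linear, fc2: Linear) -> List[float]:
    pooled = roi_max_pool(fmap, box, out_size)
    flat = [v for row in pooled for v in row]           # flatten fixed-size matrix
    return relu(linear(relu(linear(flat, fc1)), fc2))   # two FC layers -> query


fmap = [[float(4 * y + x) for x in range(4)] for y in range(4)]
fc1 = ([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [0.0, 0.0])
fc2 = ([[1.0, 1.0]], [-10.0])
print(query_vector(fmap, (0, 0, 4, 4), 2, fc1, fc2))  # [10.0]
```

Here the query dimension d is 1 only to keep the arithmetic traceable; in the model, d is the output width of the second fully connected layer, and the same pipeline is applied to source boxes (ground truth) and target boxes (pseudo labels).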

Step 2.1.2, retrieving the source domain memory elements and the target domain memory elements.

The memory elements in the memory units store the feature information of different classes and are updated along with the training of the reference cross-domain target detection model, which enables the reference cross-domain target detection model to learn feature specificity.

Before training iterations begin, the memory elements in the source domain and target domain memory units are initialized with random numbers. The number of memory elements in each memory unit equals the total number of target classes to be detected, and the dimension of each memory element equals the dimension d of the query vectors.

The source domain memory unit is denoted V_s = {v_{s,1}, v_{s,2}, …, v_{s,(K-1)}, v_{s,K}}, where v_{s,k} ∈ R^d denotes the memory element of the kth category of the source domain.

The target domain memory unit is denoted V_t = {v_{t,1}, v_{t,2}, …, v_{t,(K-1)}, v_{t,K}}, where v_{t,k} ∈ R^d denotes the memory element of the kth category of the target domain.

Here k denotes the class label index of a target class to be detected, k ∈ {1, 2, …, K}, and K denotes the total number of target classes to be detected.

Because each memory element in the source domain and target domain memory units represents the feature information of one target class to be detected, in a given training iteration the source domain query vectors retrieve the memory elements of the same category from the source domain memory unit by means of their ground truth category labels, and the target domain query vectors retrieve the memory elements of the same category from the target domain memory unit by means of their pseudo category labels.
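A minimal sketch of the memory unit initialization and the label-based retrieval described above follows. It assumes memory elements are stored as Python lists with class k at position k-1, and that the unspecified "random numbers" are standard normal samples; the function names init_memory_unit and group_queries_by_label are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def init_memory_unit(num_classes: int, dim: int, seed: int = 0) -> List[Vector]:
    # K memory elements, each of the query-vector dimension d, filled with random
    # numbers (standard normal samples are an assumption, not from the source).
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]


def group_queries_by_label(queries: Sequence[Vector],
                           labels: Sequence[int]) -> Dict[int, List[Vector]]:
    # Each query vector retrieves the memory element whose category matches its
    # (ground truth or pseudo) label; classes absent from the result are not
    # retrieved in this iteration.
    groups: Dict[int, List[Vector]] = {}
    for q, k in zip(queries, labels):
        groups.setdefault(k, []).append(list(q))
    return groups


V_s = init_memory_unit(num_classes=3, dim=2)
groups = group_queries_by_label([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [2, 1, 2])
print(sorted(groups))  # [1, 2]; class 3 is not retrieved in this iteration
```

The same two functions apply unchanged to the target domain memory unit V_t and the target domain query vectors with their pseudo category labels.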

Case 1: if the kth memory element in the source domain memory unit is retrieved by source domain query vectors whose category label index is k, the mean of the source domain query vectors belonging to class k is first computed along the query-vector count dimension, as shown in equation (1); this mean represents the class feature information of the current source domain query vectors, and its dimension is d.

In equation (1), n_{s,k} denotes the total number of source domain query vectors whose category label index is k; summing n_{s,k} over all K classes gives n_s.

So that the class feature information represented by the memory element read from the source domain memory unit reflects not only the current source domain query vectors but also the class feature information previously stored in the memory unit, the mean of the source domain query vectors (representing the current class feature information) and the source domain memory element (representing the previous class feature information) are weighted and summed to obtain the read-out source domain memory element, as shown in equation (2).

In equation (2), γ₁ denotes the weight coefficient applied to the mean of the source domain query vectors belonging to class k, γ₂ denotes the weight coefficient applied to the memory element of the kth category of the source domain, 0<γ₁<1, 0<γ₂<1, and γ₁+γ₂=1.

Similarly, if the k-th memory element in the target domain memory unit is retrieved by target domain query vectors with category label index k, the mean of the target domain query vectors belonging to category k is computed, as shown in equation (3), where this mean has dimension d.

In the formula, $n_{t,k}$ denotes the total number of target domain query vectors with category label index k; in case 1, $n_{t,k} \geq 1$.

The mean of the target domain query vectors belonging to class k and the k-th class memory element of the target domain are weighted and summed to obtain the read target domain memory element, as shown in equation (4).

In the formula, $\gamma_3$ is the weight coefficient applied to the mean of the target domain query vectors belonging to class k, and $\gamma_4$ is the weight coefficient applied to the k-th class memory element of the target domain, with $0<\gamma_3<1$, $0<\gamma_4<1$ and $\gamma_3+\gamma_4=1$.
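The equation images for (1) to (4) are not reproduced in this text. A reconstruction from the definitions above follows as a sketch; the symbols $q^{i}_{s,k}$ and $q^{i}_{t,k}$ (the i-th query vector of category k in each domain), $\bar{q}_{s,k}$ and $\bar{q}_{t,k}$ (their means) and $\hat{v}_{s,k}$ and $\hat{v}_{t,k}$ (the read memory elements) are our own notation, not necessarily the patent's.

$$\bar{q}_{s,k} = \frac{1}{n_{s,k}} \sum_{i=1}^{n_{s,k}} q^{i}_{s,k}, \qquad \bar{q}_{s,k} \in \mathbb{R}^{d} \tag{1}$$

$$\hat{v}_{s,k} = \gamma_1 \bar{q}_{s,k} + \gamma_2 v_{s,k} \tag{2}$$

$$\bar{q}_{t,k} = \frac{1}{n_{t,k}} \sum_{i=1}^{n_{t,k}} q^{i}_{t,k}, \qquad \bar{q}_{t,k} \in \mathbb{R}^{d} \tag{3}$$

$$\hat{v}_{t,k} = \gamma_3 \bar{q}_{t,k} + \gamma_4 v_{t,k} \tag{4}$$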

Because the source and target domain query vectors in a given training iteration do not necessarily cover all target categories to be detected, i.e., $n_{s,k}=0$ or $n_{t,k}=0$ may hold for some k, some memory elements in the source and target domain memory units may not be read out. Since the subsequent feature consistency learning takes source and target domain memory elements representing the same category as input, this situation is handled as in case 2 below.

In case 2, if the k-th memory element in the source domain memory unit is not retrieved, the k-th category memory element of the source domain is directly assigned to the read source domain memory element, as shown in formula (5). In other words, the category feature information represented by the memory element previously stored in the memory unit is reused as the category feature information of the memory element read from the source domain memory unit.

Similarly, if the k-th memory element in the target domain memory unit is not retrieved, the k-th class memory element of the target domain is directly assigned to the read target domain memory element, as shown in equation (6).

Step 2.1.3, updating the source domain memory unit and the target domain memory unit.

During training of the reference cross-domain target detection model, the read memory elements are used to update the source and target domain memory units. On the one hand, this lets the memory elements continuously absorb new feature information of the same category on top of the category feature information they already hold, so that the memory units retain what they have learned. On the other hand, it prepares the memory units for retrieval by the query vectors in the next training iteration, so that the category feature information represented by the memory elements read out then is more reasonable and reliable.

For the source domain, the source domain memory element read out by formula (2) or formula (5) is written to the memory element $v_{s,k}$ at the corresponding category position of the source domain memory unit $V_s$, as shown in equation (7).

For the target domain, the target domain memory element read out by formula (4) or formula (6) is written to the memory element $v_{t,k}$ at the corresponding category position of the target domain memory unit $V_t$, as shown in equation (8).
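Equations (5) to (8) are likewise not reproduced here. Writing $\hat{v}_{s,k}$ and $\hat{v}_{t,k}$ for the read source and target domain memory elements (our notation, not necessarily the patent's), case 2 and the write step described above amount to:

$$\hat{v}_{s,k} = v_{s,k} \tag{5}$$

$$\hat{v}_{t,k} = v_{t,k} \tag{6}$$

$$v_{s,k} \leftarrow \hat{v}_{s,k} \tag{7}$$

$$v_{t,k} \leftarrow \hat{v}_{t,k} \tag{8}$$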

In each training iteration, for both the source domain and the target domain, every memory element can be retrieved from the memory unit, directly (case 1) or indirectly (case 2), according to the query vectors, and can therefore represent the feature information of its category. Fig. 4 illustrates this under the assumption that the source and target domains contain only two categories of targets to be detected: as training of the reference cross-domain target detection model progresses, the query vectors find the memory elements of the same category in the corresponding domain and continuously update their category feature information, so the memory elements represent their categories more and more accurately and features of different categories become increasingly well separated at the semantic level. Features whose specificity grows in this way reduce the risk of semantic confusion between categories.
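The read step (cases 1 and 2) and write step (equations (7) and (8)) for one domain can be sketched in plain Python. This is not the patent's implementation: it assumes memory elements and query vectors are plain lists of d floats, uses one shared weight `gamma` (standing for $\gamma_1$ or $\gamma_3$) for every category, and the function names `class_mean` and `read_and_write` are hypothetical. The same function would be called once for the source domain and once for the target domain.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def class_mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of d-dimensional query vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def read_and_write(memory: List[Vector], queries: Sequence[Vector],
                   labels: Sequence[int], gamma: float = 0.5) -> List[Vector]:
    """Read the memory elements of one domain, then write them back.

    memory[k] stands for v_{s,k} (or v_{t,k}); gamma stands for gamma_1
    (or gamma_3), and 1 - gamma for gamma_2 (or gamma_4).
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    # Group this iteration's query vectors by category label index k.
    by_class: Dict[int, List[Vector]] = {}
    for q, k in zip(queries, labels):
        by_class.setdefault(k, []).append(q)

    read: List[Vector] = []
    for k, v_k in enumerate(memory):
        if k in by_class:
            # Case 1: blend the class mean with the stored element.
            mean_k = class_mean(by_class[k])
            read.append([gamma * m + (1.0 - gamma) * v
                         for m, v in zip(mean_k, v_k)])
        else:
            # Case 2: no query vector of class k; reuse the stored element.
            read.append(list(v_k))

    # Write step: overwrite each category position with its read element.
    for k, read_k in enumerate(read):
        memory[k] = read_k
    return read


memory = [[0.0, 0.0], [1.0, 1.0]]
read = read_and_write(memory, queries=[[2.0, 4.0], [4.0, 2.0]], labels=[0, 0])
print(read)  # [[1.5, 1.5], [1.0, 1.0]]
```

In the usage example only category 0 has query vectors, so its element moves halfway toward the class mean while category 1 falls under case 2 and keeps its stored value.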

Step 2.2, setting a feature consistency weighted alignment module on the reference cross-domain target detection model, as shown in fig. 5.

After the feature specificity memory read-write module has been set on the reference cross-domain target detection model in step 2.1, features with preliminary specificity at the semantic level can be obtained, but the cross-domain consistency of features of different categories still leaves room for improvement.

To coordinate the specificity and consistency of features effectively, the source domain and target domain memory elements read from the source and target domain memory units in each training iteration can be used to guide the alignment of same-category features across the two domains.

In addition, the loss function of each category-level domain discriminator is weighted according to the occurrence proportion of the corresponding target category, ensuring that features of different categories receive equal attention during alignment and thereby enhancing their cross-domain consistency on the basis of semantic specificity.

The step 2.2 specifically comprises the following steps:

Step 2.2.1, constructing a category-level domain discriminator.

The basic function of a domain discriminator is to distinguish whether features come from the source domain or the target domain, whereas the goal of consistency feature learning is to make same-category features of the source and target domains indistinguishable. Therefore, for each category of target to be detected, a category-level domain discriminator is constructed and connected through a gradient inversion layer to align the features of that category.
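The role of the gradient inversion layer (usually called a gradient reversal layer in the domain adaptation literature) can be illustrated without a deep-learning framework. The sketch below is a toy under stated assumptions: a single logistic unit stands in for a category-level domain discriminator, gradients are written out by hand instead of by autograd, and the names `GradientInversionLayer`, `source_prob` and `source_loss_grad` are hypothetical. It shows that the layer is an identity in the forward pass and flips the sign of the gradient in the backward pass, so a descent step on the feature side lowers the discriminator's confidence that a source memory element comes from the source domain.

```python
import math
from typing import List


class GradientInversionLayer:
    """Identity in the forward pass; scales gradients by -scale in the backward pass."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale

    def forward(self, x: List[float]) -> List[float]:
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        return [-self.scale * g for g in grad_output]


def source_prob(w: List[float], b: float, x: List[float]) -> float:
    """Toy logistic discriminator: probability that x comes from the source domain."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def source_loss_grad(w: List[float], b: float, x: List[float]) -> List[float]:
    """Gradient of -log D(x) with respect to the input x, i.e. -(1 - D(x)) * w."""
    p = source_prob(w, b, x)
    return [-(1.0 - p) * wi for wi in w]


w, b = [1.0, 0.0], 0.0
x = [1.0, 0.0]  # a source-domain memory element
grl = GradientInversionLayer(scale=1.0)
feature = grl.forward(x)
# The discriminator descends its own loss; the layer hands the negated gradient
# to the feature side, so a descent step there increases the discriminator's
# loss and makes the source element look less like "source".
grad_to_feature = grl.backward(source_loss_grad(w, b, feature))
lr = 0.5
x_new = [xi - lr * gi for xi, gi in zip(x, grad_to_feature)]
print(source_prob(w, b, x), source_prob(w, b, x_new))
```

In a framework such as PyTorch the same behaviour is normally obtained with a custom autograd function whose backward pass negates the incoming gradient.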

Specifically, each category-level domain discriminator consists of a series of fully connected layers. Its inputs are the read source domain memory element and the read target domain memory element of the corresponding category, and its output is the predicted probability that a read memory element comes from the source domain. The gradient inversion layer enables adversarial training between the category-level domain discriminators and the basic target detector: once the reference cross-domain target detection model has been trained to a certain degree, the basic target detector and the category-level discriminators reach a dynamic equilibrium, at which point features with semantic specificity also maintain cross-domain consistency. The loss function of each category-level domain discriminator is shown in equation (9).

In the formula, the k-th category-level domain discriminator predicts the probability that the read source domain memory element belongs to the source domain and the probability that the read target domain memory element belongs to the target domain, and $\mathbb{E}$ denotes the expectation.
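Equation (9) itself is not reproduced in this text. Assuming it takes the standard binary cross-entropy form used for adversarial domain discriminators, and writing $D_k$ for the k-th category-level domain discriminator and $\hat{v}_{s,k}$, $\hat{v}_{t,k}$ for the read source and target domain memory elements (our notation), it would read:

$$\mathcal{L}_{k} = -\mathbb{E}\left[\log D_k(\hat{v}_{s,k})\right] - \mathbb{E}\left[\log\left(1 - D_k(\hat{v}_{t,k})\right)\right] \tag{9}$$

Here $1 - D_k(\hat{v}_{t,k})$ is the predicted probability that the read target domain memory element belongs to the target domain.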

Step 2.2.2, constructing source domain and target domain vector counters.

The main function of the vector counters is to accumulate, over one training round of the reference cross-domain target detection model, the number of occurrences of query vectors of each category; this yields the occurrence proportion of each target category to be detected, which is used to weight the loss function of each category-level domain discriminator. The capacity of each vector counter equals the total number of target categories to be detected.

Before each training round begins, all values of the vector counters are set to 0, i.e., $N_{s,k}=0$ and $N_{t,k}=0$, where $N_{s,k}$ is the number of source domain query vectors belonging to category k in the source domain vector counter, and $N_{t,k}$ is the number of target domain query vectors belonging to category k in the target domain vector counter.

In each training iteration, the total number $n_{s,k}$ of source domain query vectors with category label index k and the total number $n_{t,k}$ of target domain query vectors with category label index k are used to update the values at the corresponding category positions of the source and target domain vector counters, as shown in equations (10) and (11).

$$N_{s,k} \leftarrow N_{s,k} + n_{s,k} \tag{10}$$

$$N_{t,k} \leftarrow N_{t,k} + n_{t,k} \tag{11}$$

When a training round is finished, the numerical values stored in the source domain and target domain vector counters are respectively the total number of the source domain and target domain query vectors of each type in the training round, and after the proportion of the source domain and target domain query vectors of each type in one round is calculated, all the numerical values of the source domain and target domain vector counters are set to be 0 again and used for carrying out statistics on the number of the source domain and target domain query vectors of each type in the next training round.
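A vector counter with the reset, accumulate and proportion steps described above can be sketched as a small Python class. `VectorCounter` is a hypothetical name, and the sketch assumes each query vector carries an integer category label index, so adding one per label in an iteration is the same as adding $n_{s,k}$ (or $n_{t,k}$) as in equations (10) and (11). One instance would be kept per domain.

```python
from typing import List, Sequence


class VectorCounter:
    """Per-category count of query vectors seen during one training round."""

    def __init__(self, num_classes: int) -> None:
        # Capacity equals the number of target categories to be detected.
        self.counts: List[int] = [0] * num_classes

    def reset(self) -> None:
        """Set every N_k to 0; called before each training round."""
        self.counts = [0] * len(self.counts)

    def update(self, labels: Sequence[int]) -> None:
        """Add this iteration's n_k to N_k for every category k (eq. (10)/(11))."""
        for k in labels:
            self.counts[k] += 1

    def proportions(self) -> List[float]:
        """Share of each category among all query vectors counted this round."""
        total = sum(self.counts)
        if total == 0:
            return [0.0] * len(self.counts)
        return [c / total for c in self.counts]


source_counter = VectorCounter(num_classes=3)
source_counter.update([0, 0, 2])  # iteration 1: n_0 = 2, n_2 = 1
source_counter.update([2, 2])     # iteration 2: n_2 = 2
print(source_counter.counts, source_counter.proportions())  # [2, 0, 3] [0.4, 0.0, 0.6]
source_counter.reset()            # ready for the next round
```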

Step 2.2.3, a weighted alignment loss function is calculated.

Because the numbers of targets of different classes in the source domain data may be highly imbalanced, and the number of targets of the same class may differ greatly between the source and target domain data, classes with many targets effectively carry a large weight in the alignment process, while classes with few targets effectively carry a small weight.

To adjust the degree of feature alignment for target classes of different sizes, the weights applied to the source domain part and the target domain part of each category-level discriminator's loss function are calculated from the occurrence proportions of the source and target domain query vectors of each category within a training round, as shown in formulas (12) and (13).

In the formula, $\alpha_k$ is the weight applied to the source domain part of the loss function of the k-th category-level domain discriminator and $\beta_k$ is the weight applied to its target domain part, with $0<\alpha_k<1$ and $0<\beta_k<1$. If the source and target domain query vectors of a category make up a larger proportion of a round, the feature alignment strength for that category can be reduced appropriately during feature consistency learning; if they make up a smaller proportion, the alignment strength should be increased as much as possible.

It should be noted that formulas (12) and (13) are only one way of obtaining the weights of the source domain and target domain parts of the category-level domain discriminator's loss function; any method whose weights are inversely proportional to the occurrence proportion of each category's query vectors in a training round falls within the scope of the present invention.
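Formulas (12) and (13) are not reproduced in this text, so the following is not them. It is one hypothetical scheme that satisfies the stated condition, computed separately from the source counter (giving $\alpha_k$) and the target counter (giving $\beta_k$): normalised inverse frequencies with additive smoothing `eps` (our choice), so that a category absent from the round does not cause division by zero. The weights are therefore only approximately inversely proportional to each category's share, exactly so as `eps` becomes negligible relative to the counts. With at least two categories every weight lies strictly between 0 and 1, matching the constraints on $\alpha_k$ and $\beta_k$. `inverse_proportion_weights` is a hypothetical name.

```python
from typing import List, Sequence


def inverse_proportion_weights(counts: Sequence[int], eps: float = 1.0) -> List[float]:
    """Normalised inverse-frequency weights, one per category.

    w_k = (1 / (N_k + eps)) / sum_j (1 / (N_j + eps)).
    eps > 0 keeps categories that never appeared in the round finite.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    inverse = [1.0 / (c + eps) for c in counts]
    total = sum(inverse)
    return [v / total for v in inverse]


alpha = inverse_proportion_weights([1, 3])   # source counts N_{s,k}
beta = inverse_proportion_weights([10, 2])   # target counts N_{t,k}
print(alpha, beta)
```

The rarer category in each domain receives the larger weight, so its alignment term is emphasised, as the text requires.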

Rewriting equation (9) with the weights obtained from equations (12) and (13) gives the weighted alignment loss function of each category-level discriminator, as shown in equation (14).
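Equation (14) is not reproduced here. Assuming equation (9) is the standard binary cross-entropy adversarial loss, and writing $D_k$, $\hat{v}_{s,k}$ and $\hat{v}_{t,k}$ (our notation) for the k-th category-level domain discriminator and the read source and target domain memory elements, the weighted form would scale its two parts by $\alpha_k$ and $\beta_k$:

$$\mathcal{L}^{w}_{k} = -\alpha_k\,\mathbb{E}\left[\log D_k(\hat{v}_{s,k})\right] - \beta_k\,\mathbb{E}\left[\log\left(1 - D_k(\hat{v}_{t,k})\right)\right] \tag{14}$$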

Since a category-level domain discriminator is constructed for each target category to be detected to learn feature consistency, the total weighted alignment loss function is the sum of the weighted alignment loss functions of all category-level domain discriminators, as shown in equation (15).
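The per-category weighted loss and its sum over categories can be sketched as a single function. This sketch assumes the binary cross-entropy form for the discriminator loss (the patent's equation is not reproduced here) and that each category-level discriminator sees one read source and one read target memory element per iteration, so each expectation reduces to a single term. `weighted_alignment_loss` is a hypothetical name; in practice the probabilities would come from the discriminators, which minimise this loss while the gradient inversion layer makes the feature side effectively maximise it.

```python
import math
from typing import Sequence


def weighted_alignment_loss(p_source: Sequence[float], p_target: Sequence[float],
                            alpha: Sequence[float], beta: Sequence[float],
                            eps: float = 1e-7) -> float:
    """Total weighted alignment loss summed over categories k.

    p_source[k]: D_k's probability that the read source memory element is from the source domain.
    p_target[k]: D_k's probability that the read target memory element is from the source domain.
    """
    total = 0.0
    for ps, pt, a, b in zip(p_source, p_target, alpha, beta):
        ps = min(max(ps, eps), 1.0 - eps)  # clamp to keep the logarithms finite
        pt = min(max(pt, eps), 1.0 - eps)
        total += -a * math.log(ps) - b * math.log(1.0 - pt)
    return total


# Two categories, discriminators that cannot tell the domains apart (p = 0.5).
loss = weighted_alignment_loss([0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5])
print(loss)  # approximately 2 * ln 2, about 1.386
```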

In each training iteration, the same-category memory elements of the source and target domains pass through the gradient inversion layer into the corresponding category-level domain discriminator, which is responsible for confusing same-category features of the two domains. Fig. 6 illustrates this under the assumption that the source and target domains contain only two categories of targets to be detected and that all query vectors are matched to their corresponding memory elements: as training of the reference cross-domain target detection model progresses, the difference between the source domain and target domain memory elements decreases steadily.

The step 3 specifically comprises:

Step 3.1, obtaining the loss function of the cross-domain target detection model coordinating feature consistency and specificity, and training that model.

The loss function of the cross-domain target detection model coordinating feature consistency and specificity comprises the loss function of the reference cross-domain target detection model and the total weighted alignment loss function, where the reference model's loss function consists of the basic target detector loss function and the domain discriminator loss function. The overall loss function is shown in equation (16).

In the formula, $\lambda_1$ and $\lambda_2$ are balancing coefficients, usually determined by model tuning; typical values are 0.01, 0.1 and 1.
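Equation (16) is not reproduced in this text. One arrangement consistent with the description is shown below; which term each coefficient multiplies is our assumption, and the symbols are ours:

$$\mathcal{L} = \mathcal{L}_{det} + \lambda_1 \mathcal{L}_{dis} + \lambda_2 \mathcal{L}_{wa} \tag{16}$$

where $\mathcal{L}_{det}$ is the basic target detector loss, $\mathcal{L}_{dis}$ the domain discriminator loss of the reference model and $\mathcal{L}_{wa}$ the total weighted alignment loss of equation (15).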

Formula (16) is taken as the optimization objective of the cross-domain target detection model coordinating feature consistency and specificity, and the model is trained with a suitable optimization algorithm (e.g., SGD or Adam). To avoid instability of the model in the early stage of training, training is divided into two stages: the first stage trains only the reference cross-domain target detection model, and the second stage trains the complete model, consisting of the reference cross-domain target detection model, the feature specificity memory read-write module and the feature consistency weighted alignment module.
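The two-stage schedule can be expressed as a stage-dependent objective. This is a sketch under assumptions: $\lambda_1$ is taken to scale the domain discriminator loss and $\lambda_2$ the total weighted alignment loss (the coefficient assignment is our assumption), the default of 0.1 is one of the typical values quoted above, and `training_loss` is a hypothetical name. The loss components would be produced by the detector, the domain discriminators and the alignment module; the text does not specify when to switch from stage 1 to stage 2.

```python
def training_loss(det_loss: float, dis_loss: float, align_loss: float,
                  stage: int, lambda1: float = 0.1, lambda2: float = 0.1) -> float:
    """Objective used at each of the two training stages."""
    if stage == 1:
        # Stage 1: reference cross-domain model only (detector + domain discriminators).
        return det_loss + lambda1 * dis_loss
    if stage == 2:
        # Stage 2: full model, adding the total weighted alignment loss.
        return det_loss + lambda1 * dis_loss + lambda2 * align_loss
    raise ValueError("stage must be 1 or 2")


print(training_loss(1.0, 2.0, 3.0, stage=1, lambda1=0.5, lambda2=0.25))  # 2.0
print(training_loss(1.0, 2.0, 3.0, stage=2, lambda1=0.5, lambda2=0.25))  # 2.75
```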

Step 3.2, performing cross-domain target detection with the trained cross-domain target detection model coordinating feature consistency and specificity.

The weight file saved during training that corresponds to the best model performance is loaded into the basic target detector (excluding the domain discriminators, the feature specificity memory read-write module and the feature consistency weighted alignment module), and the detector is then used to detect targets in the unlabeled target domain. Because feature consistency and specificity are coordinated during training, the resulting model has higher cross-domain robustness.

As shown in fig. 7, the cross-domain target detection system for coordinating feature consistency and specificity provided in the embodiment of the present invention includes a reference cross-domain target detection model, a feature specificity memory read-write module, and a feature consistency weighting alignment module, where:

the reference cross-domain target detection model consists of a basic target detector, domain discriminators at different levels (e.g., pixel level, image level and instance level) and a gradient inversion layer. The gradient inversion layer enables adversarial training between the basic target detector and the domain discriminators at each level, giving the reference model a preliminary ability to extract consistent features. The reference cross-domain target detection model is also the basis on which the feature specificity memory read-write module and the feature consistency weighted alignment module are subsequently set.

The core of the feature specificity memory read-write module is the source domain memory unit and the target domain memory unit, which support basic read and write operations and store feature information of different categories. In each training iteration, the module reads the source domain and target domain memory elements from the memory units and updates the source and target domain memory units. The module guides the reference cross-domain target detection model to learn the semantic specificity of features and provides input for the subsequent feature consistency weighted alignment module.

The feature consistency weighted alignment module consists of a source domain vector counter, a target domain vector counter, several category-level domain discriminators and a gradient inversion layer. Each category-level domain discriminator takes the source and target domain memory elements of the corresponding category as input and confuses the same-category memory elements of the two domains. In addition, the loss function of each category-level domain discriminator is weighted according to the occurrence proportion of the corresponding target category, further guiding the features to learn cross-domain consistency on the basis of semantic specificity.

In one embodiment, the feature-specific memory read/write module specifically includes:

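writing the read-out source domain memory element to the memory element $v_{s,k}$ at the corresponding category position of the source domain memory unit $V_s$, where $v_{s,k}$ represents the memory element of the k-th category of the source domain; and

writing the read-out target domain memory element to the memory element $v_{t,k}$ at the corresponding category position of the target domain memory unit $V_t$, where $v_{t,k}$ represents the memory element of the k-th category of the target domain.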

The feature specificity memory read-write module provided by the embodiment of the invention continuously updates the class feature information corresponding to the memory element through the read and write operations of the memory unit, so that the memory element can more and more accurately represent the features of the corresponding class, the retention of the specificity of the features of different classes is realized on a semantic level, the problem of potential loss of specificity caused by over-learning feature consistency is avoided, the risk of wrong alignment of the features of different classes is reduced, and the performance of the reference cross-domain target detection model is improved.

In one embodiment, the feature consistency weighted alignment module specifically includes:

a discrimination loss weighting unit, used to calculate, from the occurrence proportions of the source and target domain query vectors of each category in one training round, the weights applied to the source domain part and the target domain part of each category-level discriminator's loss function, thereby weighting that loss function. In this embodiment, during feature consistency learning, the source domain and target domain parts of each category-level domain discriminator's loss function are balanced by this weighting strategy according to the number of targets of each category to be detected.

The feature consistency weighted alignment module provided by the embodiment of the invention aligns the source domain memory elements and the target domain memory elements through the category-level domain discriminator, indirectly guides the alignment of the same category features of the source domain and the target domain, can also ensure that different category features have the same attention in the alignment process, can enhance the cross-domain consistency of different category features on the basis of semantic specificity, overcomes the problem of insufficient alignment of specific features, and reduces the deviation of a reference cross-domain target detection model.

The invention reasonably coordinates the specificity and consistency of features during training of the reference cross-domain target detection model, unlike prior technical schemes that focus solely on learning feature consistency and ignore feature specificity.

Finally, it should be pointed out that: the above examples are only for illustrating the technical solutions of the present invention, and are not limited thereto. Those of ordinary skill in the art will understand that: modifications can be made to the technical solutions described in the foregoing embodiments, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A cross-domain target detection method for coordinating feature consistency and specificity is characterized by comprising the following steps:

2. The method for cross-domain target detection with feature consistency and specificity harmonized according to claim 1, wherein the step 2 of setting a feature-specific memory read-write module on a reference cross-domain target detection model specifically comprises:

writing the read-out source domain memory element to the memory element $v_{s,k}$ at the corresponding category position of the source domain memory unit $V_s$, where $v_{s,k}$ represents the memory element of the k-th category of the source domain; and

writing the read-out target domain memory element to the memory element $v_{t,k}$ at the corresponding category position of the target domain memory unit $V_t$, where $v_{t,k}$ represents the memory element of the k-th category of the target domain.

3. The method for cross-domain target detection coordinating feature consistency and specificity according to claim 2, wherein the read-out source domain memory element is obtained in the following two cases:

Case 1:

a1. if the k-th memory element in the source domain memory unit is retrieved by source domain query vectors with category label index k, the read-out source domain memory element is given by formula (2):

where the mean term represents the mean of the source domain query vectors belonging to class k, $\gamma_1$ is the weight coefficient applied to the mean of the source domain query vectors belonging to class k, $\gamma_2$ is the weight coefficient applied to the k-th class memory element of the source domain, $0<\gamma_1<1$, $0<\gamma_2<1$, and $\gamma_1+\gamma_2=1$;

is given by formula (4):

where the mean term represents the mean of the target domain query vectors belonging to class k, $\gamma_3$ is the weight coefficient applied to the mean of the target domain query vectors belonging to class k, $\gamma_4$ is the weight coefficient applied to the k-th class memory element of the target domain, $0<\gamma_3<1$, $0<\gamma_4<1$, and $\gamma_3+\gamma_4=1$;

Case 2:

4. The method for cross-domain target detection based on feature consistency and specificity coordination according to any one of claims 1 to 3, wherein the step 2 of setting a feature consistency weighted alignment module on a reference cross-domain target detection model specifically comprises:

step 2.2.1, constructing a category level domain discriminator;

step 2.2.2, constructing a source domain and target domain vector counter for accumulating the occurrence times of each category of query vectors in one round of the reference cross-domain target detection model training, acquiring the occurrence proportion of the target category to be detected and weighting the loss function of each category level domain discriminator;

5. The method for cross-domain target detection with harmonized feature consistency and specificity according to claim 4, wherein the calculation formula of "weight of source domain part and weight of target domain part of loss function" in step 2.2.3 are respectively formula (12) and formula (13):

where $\alpha_k$ is the weight applied to the source domain part of the loss function of the k-th category-level domain discriminator, $\beta_k$ is the weight applied to the target domain part of that loss function, $0<\alpha_k<1$ and $0<\beta_k<1$.

6. The method for cross-domain target detection coordinating feature consistency and specificity according to claim 5, wherein the weighted alignment loss function of each category-level discriminator in step 2.2.1 is shown in equation (14):

where the k-th category-level domain discriminator predicts the probability that the read source domain memory element belongs to the source domain and the probability that the read target domain memory element belongs to the target domain, and $\mathbb{E}$ denotes the expectation.

7. The method for cross-domain target detection with harmonized feature consistency and specificity as defined in claim 4, wherein step 2.2.2 specifically comprises:

before each training round begins, the number $N_{s,k}$ of source domain query vectors belonging to category k in the source domain vector counter and the number $N_{t,k}$ of target domain query vectors belonging to category k in the target domain vector counter are both set to 0;

in each training iteration, the total number $n_{s,k}$ of source domain query vectors with category label index k and the total number $n_{t,k}$ of target domain query vectors with category label index k are used to update the values at the corresponding category positions of the source and target domain vector counters according to equations (10) and (11), respectively:

$$N_{s,k} \leftarrow N_{s,k} + n_{s,k} \tag{10}$$

$$N_{t,k} \leftarrow N_{t,k} + n_{t,k} \tag{11}$$

8. A cross-domain target detection system coordinating feature consistency and specificity is characterized by comprising a reference cross-domain target detection model, a feature specificity memory read-write module and a feature consistency weighting alignment module, wherein,

the feature specificity memory read-write module is provided with a source domain memory unit and a target domain memory unit, which support basic read and write operations and store feature information of different categories; in each training iteration, the feature specificity memory read-write module reads out the source domain and target domain memory elements from the memory units, updates the source and target domain memory units, guides the reference cross-domain target detection model to learn the semantic specificity of features, and provides input for the subsequent feature consistency weighted alignment module;

9. The system of claim 1, wherein the feature specificity memory read-write module comprises:

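writing the read-out source domain memory element to the memory element $v_{s,k}$ at the corresponding category position of the source domain memory unit $V_s$, where $v_{s,k}$ represents the memory element of the k-th category of the source domain; and

writing the read-out target domain memory element to the memory element $v_{t,k}$ at the corresponding category position of the target domain memory unit $V_t$, where $v_{t,k}$ represents the memory element of the k-th category of the target domain.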

10. The system for cross-domain object detection that coordinates feature consistency and specificity of claim 8 or 9, wherein the feature consistency weighted alignment module specifically comprises: