CN116895002A

CN116895002A - Source-free domain adaptive target detection method and system based on multi-graph contrastive learning

Info

Publication number: CN116895002A
Application number: CN202311013132.3A
Authority: CN
Inventors: 宋然; 张�林; 张伟; 刘世奎; 张生刚
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2023-08-11
Filing date: 2023-08-11
Publication date: 2023-10-17
Anticipated expiration: 2043-08-11
Also published as: CN116895002B

Abstract

The invention belongs to the technical field of target detection and provides a source-free domain adaptive target detection method and system based on multi-graph contrastive learning. The technical scheme is as follows: a source domain model is trained on the labeled source domain dataset to obtain a trained source domain model; then, on the basis of the trained source domain model, target detection is performed using the unlabeled target domain dataset and a trained multi-graph contrastive learning model to obtain a target detection result. The model jointly models different layers by aligning layer-specific node embeddings. Specifically, it captures node-level information with a graph neural network, and captures cluster-level information by pulling together, in the embedding space, nodes on multiple graphs that belong to the same semantic cluster. Knowledge of the source-trained model is thereby effectively transferred into the target domain, and the method can be effectively applied to different scenes.

Description

Source-free domain adaptive target detection method and system based on multi-graph contrastive learning

Technical Field

The invention belongs to the technical field of target detection, and particularly relates to a source-free domain adaptive target detection method and system based on multi-graph contrastive learning.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Conventional unsupervised domain adaptation (UDA) methods aim to minimize the domain discrepancy by aligning the feature distributions of the detector model between the source domain and the target domain. For feature alignment, UDA methods require simultaneous access to labeled source data and unlabeled target data. However, in practical application scenarios, access to source data is often limited by privacy/security concerns, data transmission costs, data specificity, and so on. For example, consider a detection model trained on large-scale source data that performs poorly on new devices whose data come from a different visual domain. In this case, it is more efficient to transmit the source-trained detector model (about 500-1000 MB) to these new devices and adapt it there than to transmit the source data (about 10-100 GB). Furthermore, transmitting only the source-trained model may also alleviate many privacy/security and data-specificity issues.

Thus, adapting a source-trained model to a target domain without accessing source data is critical to the practical deployment of detection models. For this purpose, the source-free domain adaptation (SFDA) setting for target detectors is studied. SFDA is a more challenging setting than conventional domain adaptation: not only are labels for the target data unavailable, but the source data is also inaccessible during adaptation. Source-free domain adaptation methods for target detection therefore have wider application scenarios.

Disclosure of Invention

In order to solve at least one technical problem in the background art, the invention provides a source-free domain adaptive target detection method and system based on multi-graph contrastive learning, which model and exploit the instance relationships among the proposals generated by the region proposal network (RPN). Specifically, each node corresponds to one proposal, and edges represent the similarity relationships between proposals. Using the learned similarity relationships, information about which proposals form positive/negative samples can be extracted and used to guide the training of the network. To this end, a contrastive representation learning method based on multiple graphs is proposed to enhance the representation capability for target domain data.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A first aspect of the present invention provides a source-free domain adaptive target detection method based on multi-graph contrastive learning, comprising:

acquiring a labeled source domain dataset and an unlabeled target domain dataset;

training a source domain model based on the labeled source domain dataset to obtain a trained source domain model;

performing target detection, on the basis of the trained source domain model, using the unlabeled target domain dataset and a trained multi-graph contrastive learning model to obtain a target detection result;

the training process of the multi-graph contrastive learning model comprises the following steps: representing the image as a graph structure, wherein each node corresponds to one proposal and the edges represent the similarity relationships between proposals; the student network and the teacher network share a graph neural network, and the relationships between the image proposals and the teacher network are captured by the graph neural network learning the interactions between nodes, to obtain the graph structure representation; based on the graph structure representation, contrastive learning is adopted with the teacher network as the supervision network to guide the training of the student network, so that source domain structure information is fused into the target domain.

A second aspect of the present invention provides a source-free domain adaptive target detection system based on multi-graph contrastive learning, comprising:

a data acquisition module for acquiring a labeled source domain dataset and an unlabeled target domain dataset;

a source domain model training module for training the source domain model based on the labeled source domain dataset to obtain a trained source domain model;

a target detection module for performing target detection, on the basis of the trained source domain model, using the unlabeled target domain dataset and the trained multi-graph contrastive learning model to obtain a target detection result;

A third aspect of the present invention provides a computer-readable storage medium.

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the source-free domain adaptive target detection method based on multi-graph contrastive learning as described in the first aspect.

A fourth aspect of the invention provides a computer device.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing, when the program is executed, the steps of the source-free domain adaptive target detection method based on multi-graph contrastive learning as described in the first aspect.

Compared with the prior art, the invention has the beneficial effects that:

1. The present invention models and exploits the instance relationships among the proposals generated by the region proposal network. Each node corresponds to one proposal and the edges represent the similarity relationships between proposals; the learned similarity relationships are used to extract information about which proposals form positive/negative samples, and this information guides the training of the network. The multi-graph contrastive representation learning method thereby enhances the representation capability for target domain data.

2. The method has good adaptability and generalization capability on different data sets, enhances the robustness of the model, enhances the adaptability of the algorithm in practical application, and can be effectively applied to different scenes.

Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

FIG. 1 is a schematic flow diagram of a source-free domain adaptive target detection method based on multi-graph contrastive learning according to an embodiment of the present invention;

FIG. 2 is a diagram of multi-graph contrastive learning provided by an embodiment of the present invention.

Detailed Description

The invention will be further described with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Example 1

As shown in FIG. 1, this embodiment provides a source-free domain adaptive target detection method based on multi-graph contrastive learning, which includes the following steps:

Step 1: acquiring a labeled source domain dataset and an unlabeled target domain dataset;

Domain adaptation uses both the labeled source domain dataset and the unlabeled target domain dataset for adaptation.

In this embodiment, the labeled source domain dataset is represented as $D_s = \{(x_s^n, y_s^n)\}_{n=1}^{N_s}$, where $x_s^n$ represents the nth source image and $y_s^n$ represents the corresponding ground-truth label;

the unlabeled target domain dataset is represented as $D_t = \{x_t^n\}_{n=1}^{N_t}$, where $x_t^n$ represents the nth target image, which has no ground-truth label annotation.

Step 2: training a source domain model based on the marked source domain data set to obtain a trained source domain model;

In contrast, the source-free domain adaptation (SFDA) setting considers a more realistic situation: during adaptation, only the trained source domain model θ and the unlabeled target dataset $D_t$ are available, and the source dataset cannot be accessed.

The self-training adaptation strategy updates the model on unlabeled target data using pseudo labels generated by the source-trained model. The pseudo labels are filtered by a confidence threshold, and only the reliable pseudo labels are used to supervise detector training.
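The confidence filtering described above can be sketched as follows. This is a minimal illustration only: the `Detection` record, the `filter_pseudo_labels` helper and the threshold value 0.8 are hypothetical and are not specified in the source.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One prediction of the source-trained detector (hypothetical record)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: int                              # predicted class index
    score: float                            # detector confidence in [0, 1]


def filter_pseudo_labels(detections: List[Detection],
                         threshold: float = 0.8) -> List[Detection]:
    """Keep only the detections confident enough to act as pseudo labels."""
    return [d for d in detections if d.score >= threshold]


# Predictions of the source-trained model on one unlabeled target image.
predictions = [
    Detection((10.0, 20.0, 110.0, 220.0), label=1, score=0.95),
    Detection((30.0, 40.0, 90.0, 100.0), label=3, score=0.42),
    Detection((200.0, 50.0, 320.0, 180.0), label=1, score=0.81),
]
pseudo_labels = filter_pseudo_labels(predictions, threshold=0.8)
print(len(pseudo_labels))  # number of detections kept as pseudo labels
```

A higher threshold gives fewer but cleaner pseudo labels; the appropriate value depends on the detector and the dataset.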

In general, the pseudo-label supervision loss takes the pseudo label as the supervision target.

Construction of knowledge distillation model

Knowledge distillation is a technique that delivers knowledge of a complex model to a simple model to improve the performance of the simple model.

First, source domain pre-training of the teacher model is carried out. Network pre-training on source domain data refers to pre-training a deep neural network with a large amount of source domain data before the target domain data arrives, so as to obtain a teacher network; its main purpose is to improve the generalization capability of the model and reduce overfitting. Second, the student model is initialized from the teacher model network, and the student model then learns knowledge from the teacher model.
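Initializing the student from the teacher can be illustrated with a toy sketch. `TinyDetector` is a hypothetical stand-in for a real detector, and the use of a deep copy is an assumption about how the initialization might be done; the point is only that the student starts from the teacher's weights but is updated independently.

```python
import copy


class TinyDetector:
    """Stand-in for a detector; `weights` plays the role of its parameters."""

    def __init__(self, weights):
        self.weights = dict(weights)


def init_student_from_teacher(teacher):
    """Return an independent copy of the teacher to be used as the student."""
    return copy.deepcopy(teacher)


# Teacher obtained from source domain pre-training (toy values).
teacher = TinyDetector({"backbone": [0.1, -0.3], "head": [0.7]})
student = init_student_from_teacher(teacher)

# A later update to the student does not modify the teacher.
student.weights["head"][0] = 0.9
print(teacher.weights["head"][0], student.weights["head"][0])
```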

Step 3: performing target detection, on the basis of the trained source domain model, using the unlabeled target domain dataset and the trained multi-graph contrastive learning model to obtain a target detection result;

Based on the source domain model θ, the knowledge of the source-trained model is effectively transferred into the target domain by the multi-graph contrastive learning model.

The model jointly models the structural relationships between nodes by aligning layer-specific node embeddings. Specifically, it captures node-level information with a graph neural network, and captures cluster-level information by pulling together, in the embedding space, nodes on multiple graphs that belong to the same semantic cluster.

The method specifically comprises the following steps:

Step 301: Construction of multiple graphs

To accommodate complex scenes and the relationships among multi-class detected objects and instances, in source-free target detection the image is first represented as a graph structure, with objects as nodes and the relationships between them (e.g., adjacency, inclusion) as edges.

By learning the interactions between nodes, the graph neural network captures these structural relationships, which improves the accuracy and robustness of target detection.

Specifically, a multi-graph structural relation network is proposed, in which a GCN is used to learn the relationships between the RPN proposals of the detection image and the teacher model, yielding a graph structure denoted G(V, E), where V is the set of nodes of the graph network and E is the set of edges between the nodes. The nodes in V correspond to the RoI features extracted from the RPN proposals, and each edge $e_{ij} \in E$ encodes the relationship between the i-th and j-th proposals.

The student and teacher networks share a graph neural network that models the relationships between object proposals as nodes. The structural information of the source domain is then fused into the target domain by learning the relation matrix E, which captures the relationships between RPN proposals.

Specifically, a batch of images is subjected to two separate intensity augmentations to obtain first augmented data and second augmented data. The first augmented data is input to the teacher model to obtain its RoI features, and the second augmented data is input to the student model to obtain the corresponding RoI features. The RoI features extracted by the teacher and by the student serve as the nodes of the multiple graphs.

Based on these features, the edges E of the teacher graph and of the student graph are each constructed with learnable functions from the pairwise score

$S_{ij} = f(v_i) \cdot g(v_j)^T$,

where $v_i$ and $v_j$ are node features and f and g are learnable functions.
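As a minimal sketch of the edge construction, the score S_ij = f(v_i) · g(v_j)^T can be computed for all node pairs at once. Two things here are assumptions that the source does not state: f and g are taken to be plain linear maps (`w_f`, `w_g`), and the scores are normalized row-wise with a softmax to give edge weights. The names `matmul` and `build_edges` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def build_edges(v: Matrix, w_f: Matrix, w_g: Matrix) -> Matrix:
    """Relation matrix from node features v (N x d); f, g assumed linear."""
    f_v = matmul(v, w_f)                      # f(v_i) for every node, N x h
    g_v = matmul(v, w_g)                      # g(v_j) for every node, N x h
    g_v_t = [list(col) for col in zip(*g_v)]  # transpose, h x N
    scores = matmul(f_v, g_v_t)               # S_ij = f(v_i) . g(v_j)^T
    edges = []
    for row in scores:
        m = max(row)                          # subtract max for stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        edges.append([e / total for e in exps])  # assumed row-wise softmax
    return edges


# Three toy RoI features of dimension 2; identity maps stand in for f and g.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
E = build_edges(nodes, identity, identity)
for row in E:
    print([round(x, 3) for x in row])
```

With the softmax normalization each row of E sums to one, so every node distributes a unit of attention over all proposals, with more weight on the proposals it is most similar to.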

Through message passing in the graph neural network,

$Z^i = \mathrm{ReLU}(E^i H^i W), \quad i \in \{s, t\}$,

the output features $Z^s$ and $Z^t$ of the two graph structures are obtained.
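The message-passing step Z = ReLU(E H W) can be worked through on a tiny example. The sketch below uses plain Python lists; the matrices are toy values chosen for illustration, and the helper names `matmul` and `gcn_layer` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(e: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One message-passing step: Z = ReLU(E H W)."""
    aggregated = matmul(e, h)          # each node mixes features along edges
    projected = matmul(aggregated, w)  # shared weight matrix W
    return [[max(0.0, x) for x in row] for row in projected]


E = [[0.5, 0.5], [0.0, 1.0]]   # edge weights between two nodes
H = [[1.0, -2.0], [3.0, 4.0]]  # node features, one row per node
W = [[1.0, 0.0], [0.0, -1.0]]  # layer weights
Z = gcn_layer(E, H, W)
print(Z)  # [[2.0, 0.0], [3.0, 0.0]]
```

Here E H = [[2, 1], [3, 4]], multiplying by W gives [[2, -1], [3, -4]], and the ReLU clips the negative entries to zero.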

Based on this construction, the classification weight centroids of the teacher are taken as the semantic clusters of the nodes, $C = \{c_k\}_{k=1}^{K}$, where $c_k \in \mathbb{R}^d$ is the center of cluster k, and K and d are the number of clusters and the dimension of the embedding space, respectively. Semantic information is captured at the cluster level, and semantic errors are reduced, by pulling the nodes within the same cluster toward their assigned cluster center.

Through contrastive learning, the probability that a node embedding belongs to cluster k is defined by formula (1), which is written in terms of the transpose of the cluster center $c_k$, the feature of the i-th sample, and the feature of the j-th sample.
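A cluster-assignment probability of this kind can be illustrated with a softmax over dot products between a node embedding and the cluster centers. The exact form of formula (1) is not recoverable from this text, so the temperature-scaled softmax below, the temperature `tau` and the function name `cluster_probabilities` are all assumptions made for illustration.

```python
import math
from typing import List


def cluster_probabilities(z_i: List[float],
                          centers: List[List[float]],
                          tau: float = 0.1) -> List[float]:
    """Assumed form: p(k | z_i) = softmax_k(c_k^T z_i / tau)."""
    logits = [sum(c * z for c, z in zip(c_k, z_i)) / tau for c_k in centers]
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two cluster centers (K = 2) in a 2-dimensional embedding space (d = 2).
centers = [[1.0, 0.0], [0.0, 1.0]]
z = [0.9, 0.1]                            # embedding close to the first center
probs = cluster_probabilities(z, centers)
print([round(p, 4) for p in probs])
```

A smaller `tau` makes the assignment sharper, so that each node is pulled more strongly toward its most similar cluster center.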

Step 302: construction of multiple graph contrast learning

Based on the knowledge distillation multiple graphs, respectively obtaining the output of a learning network, and based on the distribution of the ith sample of the comparison learning teacher networkAnd student network ith sampleDistribution of the root->The calculation mode is obtained by a formula (1).

In order to further improve the relationship, the embodiment introduces a knowledge distillation optimization function, takes a teacher network as a supervision network, guides training of student models, enables the networks to mutually guide learning, adopts differences between class distributions output by the networks, designs graph structure relationship of intermediate characteristics of the networks, and improves association between the networks.

The multi-graph contrastive learning loss function is defined on this basis.

To further improve the association between the source domain and the target domain, a graph contrastive distillation learning function is used to compare the source model and the target model and to perform knowledge distillation, so that the knowledge of the source model is learned and the association between the two domains is improved, as shown in FIG. 2.
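One common way to measure the difference between the class distributions output by a teacher and a student is a KL divergence averaged over nodes. The loss formulas of this embodiment are not reproduced in the text, so the sketch below is an assumed form and not the patent's definition; the names `kl_divergence` and `graph_distillation_loss` are hypothetical.

```python
import math
from typing import List


def kl_divergence(p_teacher: List[float], p_student: List[float],
                  eps: float = 1e-12) -> float:
    """KL(p_teacher || p_student) for two discrete distributions."""
    return sum(pt * math.log((pt + eps) / (ps + eps))
               for pt, ps in zip(p_teacher, p_student))


def graph_distillation_loss(teacher_dists: List[List[float]],
                            student_dists: List[List[float]]) -> float:
    """Mean KL divergence over nodes; the teacher acts as the supervisor."""
    per_node = [kl_divergence(pt, ps)
                for pt, ps in zip(teacher_dists, student_dists)]
    return sum(per_node) / len(per_node)


# Cluster distributions of two nodes under the teacher and the student.
teacher_dists = [[0.9, 0.1], [0.2, 0.8]]
student_dists = [[0.6, 0.4], [0.3, 0.7]]
loss = graph_distillation_loss(teacher_dists, student_dists)
print(round(loss, 4))
```

The loss is zero when the student reproduces the teacher's distributions exactly and grows as the two diverge, which is the behavior a teacher-supervised distillation term needs.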

In summary, the training objective function is:

$L_{all} = L_{SL} + L_1 + L_2$

The method of the invention not only achieves good results on the benchmark dataset but has also been verified on other datasets, such as the PASCAL VOC, Clipart and Watercolor datasets, on which its mean average precision (mAP) is improved by 3.5%, 3.2% and 3.6%, respectively. The method shows good adaptability and generalization across different datasets and can be effectively applied to different scenes, which improves the practicability and application value of the model.

Furthermore, the method can be applied to challenging scenes, such as target detection under different weather conditions. Experimental results show that the method effectively reduces the false alarm rate and the missed detection rate of target detection under severe weather such as rain and fog, which improves the robustness and reliability of the model. The method has strong adaptability and generalization capability and can be used for target detection in various challenging scenes, such as defect detection and anomaly detection, so that practical application requirements can be better met.

Example 2

This embodiment provides a source-free domain adaptive target detection system based on multi-graph contrastive learning, which comprises the following components:

Example 3

This embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the source-free domain adaptive target detection method based on multi-graph contrastive learning as described above.

Example 4

This embodiment provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing, when the program is executed, the steps of the source-free domain adaptive target detection method based on multi-graph contrastive learning as described above.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program stored on a computer-readable storage medium, which, when executed, may carry out the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above description covers only the preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A source-free domain adaptive target detection method based on multi-graph contrastive learning, comprising:

acquiring a labeled source domain dataset and an unlabeled target domain dataset;

2. The source-free domain adaptive target detection method based on multi-graph contrastive learning according to claim 1, wherein the model is updated on unlabeled target data using pseudo labels generated by the source-trained model, the pseudo labels being obtained by filtering with a confidence threshold.

3. The source-free domain adaptive target detection method based on multi-graph contrastive learning according to claim 1, wherein training the source domain model based on the labeled source domain dataset includes performing source domain pre-training of the teacher model to obtain a teacher network, and then initializing the student network from the teacher network.

4. The source-free domain adaptive target detection method based on multi-graph contrastive learning according to claim 1, wherein the graph structure is denoted G(V, E), where V is the set of nodes of the graph network and E is the set of edges between the nodes; the nodes in V correspond to the RoI features extracted from RPN proposals, and each edge $e_{ij} \in E$ encodes the relationship between the i-th and j-th proposals.

5. The source-free domain adaptive target detection method based on multi-graph contrastive learning according to claim 1, wherein the student and teacher networks share a graph neural network, and the relationships between the image proposals and the teacher network are captured by the graph neural network learning the interactions between nodes to obtain a graph structure representation, which specifically comprises the following steps:

performing intensity augmentation on a batch of images twice to obtain first augmented data and second augmented data;

inputting the first augmented data into the teacher model to obtain first RoI features;

inputting the second augmented data into the student model to obtain second RoI features;

taking the extracted first RoI features and second RoI features as the nodes of the multiple graphs;

constructing the edges of the teacher graph and of the student graph, respectively, with learnable functions;

based on the nodes and edges of the multiple graphs, obtaining the output features of the corresponding graph structures through message passing in the graph neural network.

6. The source-free domain adaptive target detection method based on multi-graph contrastive learning according to claim 1, wherein, after the node information of the multiple graphs is obtained, the classification weight centroids of the teacher network are taken as the semantic clusters of the nodes, $C = \{c_k\}_{k=1}^{K}$, where $c_k \in \mathbb{R}^d$ is the center of cluster k and K and d are the number of clusters and the dimension of the embedding space, respectively; semantic information is captured at the cluster level, and semantic errors are reduced, by pulling the nodes within the same cluster toward their assigned cluster center.

7. The source-free domain adaptive target detection method based on multi-graph contrastive learning according to claim 1, wherein a knowledge distillation optimization function is introduced in which the teacher network serves as the supervision network that guides the training of the student network, so that the networks guide each other's learning; the difference between the class distributions output by the networks is used, together with the graph structural relationships of the networks' intermediate features, to improve the association between the networks.

8. A source-free domain adaptive target detection system based on multi-graph contrastive learning, comprising:

9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the source-free domain adaptive target detection method based on multi-graph contrastive learning according to any one of claims 1-7.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the source-free domain adaptive target detection method based on multi-graph contrastive learning according to any one of claims 1-7.