CN115795993A - Layered knowledge fusion method and device for bidirectional discriminant feature alignment - Google Patents

Layered knowledge fusion method and device for bidirectional discriminant feature alignment

Info

Publication number
CN115795993A
Authority
CN
China
Prior art keywords
teacher
common
class
model
features
Prior art date
Legal status
Pending
Application number
CN202211119323.3A
Other languages
Chinese (zh)
Inventor
徐仁军
梁朔颖
Current Assignee
ZJU Hangzhou Global Scientific and Technological Innovation Center
Original Assignee
ZJU Hangzhou Global Scientific and Technological Innovation Center
Priority date
Filing date
Publication date
Application filed by ZJU Hangzhou Global Scientific and Technological Innovation Center filed Critical ZJU Hangzhou Global Scientific and Technological Innovation Center
Priority to CN202211119323.3A priority Critical patent/CN115795993A/en
Publication of CN115795993A publication Critical patent/CN115795993A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a layered knowledge fusion method for bidirectional discriminant feature alignment, which comprises the following steps: inputting samples into the teacher models to obtain a set of teacher soft prediction results, and inputting the unlabeled image data into an initial student model to obtain student model prediction results; extracting the last-layer features and inputting them into a common feature extractor to obtain common features; applying a discriminative centroid clustering strategy that pushes different class centers away from each other while drawing each teacher's common features toward the center of its own class; measuring the ambiguity of the teacher soft predictions by entropy impurity to construct reliable source-domain and target-domain features, taking the Kronecker product of the pseudo labels with the common features of the source domain and of the target domain respectively to obtain a discriminative mapping, and aligning the mapped features by the maximum mean discrepancy method; and constructing a total loss function and training the initial student model with it to obtain a comprehensive student model capable of accurate classification.

Description

Layered knowledge fusion method and device for bidirectional discriminant feature alignment
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a layered knowledge fusion method and device for bidirectional discriminant feature alignment.
Background
In recent years, deep neural networks (DNNs) have achieved remarkable success in many artificial intelligence tasks, such as computer vision and natural language processing. However, the success of widely used DNNs relies on expensive computation and storage and on extensive manual annotation. To spare others the effort of retraining, many researchers publish their trained models on the web, which motivates reusing them in a plug-and-play fashion.
As a model-reuse strategy, knowledge fusion (KA) algorithms achieve compelling performance in various applications. They study how to effectively exploit multiple pre-trained teacher networks to train a single compact student model that handles all of the teachers' tasks without labeled data. The students in these traditional KA methods are typically trained on unlabeled data to mimic the teachers' outputs (classification score learning) and/or intermediate layers (feature learning).
However, publicly available pre-trained models typically have different architectures, so a more realistic scenario is heterogeneous knowledge fusion (HKA). In this case, the student cannot learn directly from the teachers' intermediate-layer features as usual; methods such as data-free KA and SKA can learn only from classification scores.
Chinese patent publication No. CN111160409A discloses a heterogeneous neural network knowledge fusion method based on common feature learning, which includes: obtaining a plurality of pre-trained neural network models, called teacher models; guiding the training of the student model with the features and prediction results output by the teacher models through common feature learning and soft-target distillation; in the common feature learning process, projecting the features of the multiple heterogeneous networks into a common feature space so that the student model integrates the knowledge of the multiple teacher models, while the soft-target distillation keeps the prediction results of the student model consistent with those of the teacher models, thereby obtaining a stronger student model with the task-processing capability of all the teacher models. That patent is applicable to knowledge fusion of neural network models, and in particular to knowledge fusion of heterogeneous image-classification task models.
However, blindly aligning student features with teacher features is crude: a student trained without discriminative feature alignment is likely to align with, or be disturbed by, irrelevant class features, which degrades classification performance. In that case it is difficult for the student to learn the true data distribution from the teachers, so the performance of heterogeneous knowledge fusion is generally low and its generalization is poor.
Disclosure of Invention
The invention provides a layered knowledge fusion method for bidirectional discriminant feature alignment, which obtains, with little training, a student model capable of accurately judging the category of unlabeled image data.
A layered knowledge fusion method for bidirectional discriminant feature alignment comprises the following steps:
(1) Obtaining a label-free image data set and a teacher model, constructing an initial student model, inputting the label-free image data as a sample into the teacher model to obtain a teacher soft prediction result set, inputting the spliced teacher soft prediction results into an activation function to obtain a pseudo label, and inputting the label-free image data into the initial student model to obtain a student model prediction result;
(2) Extracting the last-layer features of the teacher models and of the initial student model respectively, and inputting them into a common feature extractor to obtain a teacher common feature set and the student common features; determining class centers by an incremental learning strategy based on the class identifiers corresponding to the pseudo labels, and applying a discriminative centroid clustering strategy that penalizes the distances between different class centers in the teacher models so that different class centers move apart while each teacher's common features move toward the center of their own class, thereby obtaining a clustered common feature set;
(3) Splicing the teacher soft prediction results and inputting the spliced result into an activation function to obtain a pseudo label; inputting each teacher soft prediction result into an entropy impurity formula to measure its ambiguity, comparing the normalized ambiguity with a constraint margin, and screening the confident teacher models that satisfy the requirement; mixing the clustered common feature sets corresponding to the screened confident teacher models with the student common features to obtain a mixed-domain common feature set; randomly selecting a part of the common feature sets from the mixed-domain common feature set as source-domain features and using the remaining common feature sets as target-domain features; binding the common features with their corresponding pseudo labels by a Kronecker product so that the source-domain and target-domain common features are mapped discriminatively, the same class of features in the source domain and the target domain being mapped to the same subspace; and finally aligning the mapped common features of the source-domain features and the target-domain features by the maximum mean discrepancy method;
(4) Constructing a total loss function and training the initial student model with it to obtain a final student model, wherein the total loss function comprises a discriminative centroid clustering strategy loss function, a reliable combined loss function, a reconstruction loss function and a classification score loss function;
the reconstruction loss function is built from the last-layer features and the reconstructed features of the teacher models, the reconstructed features being obtained from the teacher common features by a multi-layer convolutional neural network; the discriminative centroid clustering strategy loss function is built from the class centers and the common features of each teacher; the reliable combined loss function is built by a maximum mean discrepancy loss on the results of the Kronecker products of the source-domain features and the target-domain features with their corresponding pseudo labels; and the classification score loss function is built by a cross-entropy loss from the teacher soft prediction result set and the student model prediction results;
(5) And when the image data is applied, inputting the image data without the label to the final student model to obtain the category of the image data without the label.
Inputting the last-layer features into the common feature extractor to obtain the teacher common feature set and the student common features respectively comprises the following steps:
the last-layer features are respectively input into independently parameterized adaptation layers to align the feature dimensions, obtaining a plurality of adaptation-layer features, and the adaptation-layer features are converted into a homogeneous common space through a shared extractor to obtain the teacher common feature set and the student common features.
Based on the class identifiers corresponding to the pseudo labels, the class centers are determined by an incremental learning strategy. For the n-th of the N teacher models, the class center μ_k^{t_n} of the k-th class identifier over a batch of τ samples is updated as a momentum-accumulated mean of the teacher common features whose pseudo label is k:

μ_k^{t_n} ← m·μ_k^{t_n} + (1 − m)·mean{ h_i^{t_n} : y_i = k, i = 1, …, τ }

where τ is the index of the number of samples in the batch, t_n is the n-th teacher model, k is the index of the class identifier, and m is the momentum accumulation hyperparameter.
The pseudo label y obtained by inputting the aggregated result into the activation function is:

y = argmax(softmax(c)),  with  c = Σ_{n=1}^{N} c^{t_n}

where N is the number of teacher models and c^{t_n} is the n-th teacher's soft prediction result.
The ambiguity E_i^{t_n} of the n-th teacher model's soft prediction for the i-th unlabeled image data class is measured by entropy impurity:

E_i^{t_n} = − Σ_{k=1}^{K} c_{i,k}^{t_n} · log c_{i,k}^{t_n}

where K is the number of class identifiers and c_{i,k}^{t_n} is the n-th teacher model's soft prediction that the i-th unlabeled image belongs to class k.
Comparing the normalized ambiguity with the marginal constraint gives the L teacher models that satisfy the comparison requirement:

{ t̂_n : norm(E_i^{t_n}) ≤ η },  L being the number of such teachers,

where norm(·) is the normalization operator, η is the marginal constraint value, and t̂_n is the n-th teacher model satisfying the comparison requirement.
The total loss function L_total is:

L_total = λ_C·L_CS + λ_J·L_rJGFA + λ_DR·(L_DCCS + L_REC)

with

L_CS = (1/B) Σ_{i=1}^{B} Σ_{n=1}^{N} CE(c_i^s, c_i^{t_n})

L_rJGFA = (1/T) Σ_{r=1}^{T} D( {x̄_{r,p}^i : p = 1,…,P}, {x̄_{r,q}^j : q = 1,…,Q} )

L_DCCS = Σ_{n=1}^{N} Σ_{i=1}^{B} ||h_i^{t_n} − μ_{y_i}^{t_n}||² + α·Σ_{n=1}^{N} Σ_{k_1 ≠ k_2} max(0, v − ||μ_{k_1}^{t_n} − μ_{k_2}^{t_n}||)

L_REC = (1/(B·N)) Σ_{n=1}^{N} Σ_{i=1}^{B} ||R_i^{t_n} − F_i^{t_n}||²

where R_i^{t_n} and F_i^{t_n} are the reconstructed features and the last-layer features of the i-th sample of the n-th teacher model, B is the batch size (the number of unlabeled image data), N is the number of teachers, λ_C is the weight of the classification score loss, λ_J is the weight of the joint group alignment loss, λ_DR is the weight of the reconstruction loss and class-center loss in the total loss, L_CS is the classification score loss function, L_rJGFA is the reliable combined loss function, L_DCCS is the discriminative centroid clustering strategy loss function, L_REC is the reconstruction loss function, CE is the cross-entropy loss function, c_i^s is the student model prediction for the i-th unlabeled image, c_i^{t_n} is the n-th teacher model's soft prediction for the i-th unlabeled image, T is the number of permutations of the mixed common-feature domain groups, P is the number of common feature sets in the source domain, Q is the number of common feature sets in the target domain, D is the discrepancy (maximum mean discrepancy) function, x̄_{r,p}^i is the Kronecker product of the pseudo label of the i-th unlabeled image with the p-th common features of the r-th source-domain group, x̄_{r,q}^j is the Kronecker product of the pseudo label of the j-th unlabeled image with the q-th common features of the r-th target-domain group, the first term of L_DCCS draws each teacher's common features toward the center of its own class, the second term penalizes the distance between different class centers so that they move apart, α is a balance parameter, μ_{y_i}^{t_n} is the class center of the y_i-th class in the n-th teacher model, k_1 and k_2 are both class-identifier indexes with k_1 ≠ k_2, and v is the constraint margin controlling the distance between different class centers in a teacher model.
The class center μ_{y_i}^{t_n} of the y_i-th class in the n-th teacher model is maintained incrementally over the batch as

h̄_{y_i}^{t_n} = mean{ h_τ^{t_n} : y_τ = y_i },  μ_{y_i}^{t_n} ← m·μ_{y_i}^{t_n} + (1 − m)·h̄_{y_i}^{t_n}

where τ is the index of the number of samples in the batch and y_i is the pseudo label of the i-th unlabeled image data.
A layered knowledge fusion apparatus for bidirectional discriminant feature alignment, comprising a computer memory, a computer processor, and a computer program stored in and executable on said computer memory, wherein the computer memory stores the final student model;
the computer processor, when executing the computer program, performs the steps of: and inputting the image data without the label to the final student model to obtain the category of the image data without the label.
Compared with the prior art, the invention has the beneficial effects that:
1. The knowledge fusion method for bidirectional discriminant feature alignment first constructs a dual discriminative feature learning process in heterogeneous knowledge fusion (HKA), which not only ensures the discriminability of the teacher features before alignment but also drives the student to learn from each teacher comprehensively and discriminatively.
2. Hierarchical feature alignment, including class-level, group-level, and global-level feature alignment, maps features orthogonally to their semantic subspaces, which hinders the transfer of cumbersome interfering knowledge to students, improving the accuracy of the model.
3. The hierarchical feature alignment method of the present invention anchors the differences to each category. Joint Group Feature Alignment (JGFA) decouples relationships and differences in complex multi-teacher knowledge fusion layer by fully mining relationships in local groups composed of multiple fields under each class, so that the relationships and differences are more easily captured and modeled, students can be promoted to fully fuse complementary knowledge of teachers in different fields, and generalization capability of student models is improved.
4. The discriminative centroid clustering strategy (DCCS) module alleviates the loss of discriminative information in the common feature space and, complementing Joint Group Feature Alignment (JGFA), transfers only the most discriminative knowledge to the student, relieving the storage pressure of the small model and making KA easier to deploy on small edge devices.
Drawings
FIG. 1 is a schematic structural diagram of a two-way discriminative knowledge fusion system according to an embodiment;
FIG. 2 is a schematic diagram of a network structure of a common spatial feature extractor according to an embodiment;
FIG. 3 is a general block diagram of a bi-directional discriminative feature alignment based hierarchical knowledge fusion system according to an embodiment;
FIG. 4 is a schematic flow chart of a bi-directional discriminative feature alignment method according to an embodiment;
fig. 5 is a schematic diagram of the working principle of the two-way discriminant feature alignment hierarchical knowledge fusion provided by the specific embodiment.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a bi-directionally discriminative feature aligned hierarchical knowledge fusion (DDFA) system comprises:
the common space feature extractor module is used for eliminating dimension difference among features output by the heterogeneous network and converting the features of all teacher and student networks into a homogeneous common space, and the network structure is based on an adaptive layer and a shared extraction layer of each network;
and a center of mass clustering strategy module (DCCS) is judged, so that the converted features can be mapped back to the original space to ensure the accuracy, and meanwhile, the distinguishability of the features in the common space can be ensured. The module carries out statistical analysis on teacher features in the common feature space and uses an incremental learning strategy to simulate a class center, and gathers similar features to the class center to enable the similar features to be as close as possible to share the same distribution as that in the original space of the teacher; meanwhile, the introduction of the constraint edge distance is to punish the distances of different classes and make the classes far away from each other, so that the effect of controlling the distances of teacher samples of different classes is realized, and the feature fusion is promoted to keep intra-class aggregation and inter-class separation;
a joint semantic mixed group feature alignment (JGFA) module is used for a Joint Group Feature Alignment (JGFA) motivation to firstly increase the difficulty of simulating irrelevant class features and simultaneously enable students to more easily simulate the features of teachers of the same class, so that score learning inconsistent with classification is avoided. Second, treating all teachers and students as a mixed domain, any local blocks of which are traversed and aligned so that intra-domain relationships can be leveraged to promote consensus across all domains. Once all the partial parts composed of multiple domains can be aligned into a whole, students can fuse the characteristics of all teachers, rather than their compromise representations;
the self-adaptive teacher selection learning module effectively measures the ambiguity predicted by the teacher by using the information entropy index so as to screen the predicted teacher with smaller prediction ambiguity for learning. Thereby improving the quality of knowledge fusion.
As shown in fig. 3, a layered knowledge fusion method for bidirectional discriminant feature alignment specifically includes the following steps:
s1: given a label-free dataset χ under a small-batch image recognition task, a class identifier k, and a plurality of perfectly trained teacher models { t) engaged in different classification tasks 1 ,…,t N And an untrained student model s, wherein N is the number of teacher models.
S2: using the unlabelled image data as a sample, inputting the sample into a teacher model to obtain a teacher soft prediction result set expressed as
Figure BDA0003844236830000081
Wherein n is the index of the teacher model, and the unlabelled image data is input into the initial student model to obtain the student model prediction result c s (ii) a The teacher soft prediction result is integrated into a distillation target, a classification score loss function is constructed through cross entropy loss based on the teacher soft prediction result assembly and the student model prediction results, the output of the student model is driven to be consistent with the teacher model through the classification score loss function, as shown in figure 5 (a), namely the learning process of classification scores, and the classification score loss function
Figure BDA0003844236830000082
Comprises the following steps:
Figure BDA0003844236830000083
wherein B is the number of the image data without label,
Figure BDA0003844236830000084
in order to be a function of the cross-over loss,
Figure BDA0003844236830000085
to predict the result for the student model for the ith unlabeled image data,
Figure BDA0003844236830000086
the soft prediction result is the nth teacher model soft prediction result aiming at the ith unlabeled image data.
S3: as shown in fig. 2, the last-layer features of the teacher model and the initial student model are extracted as
Figure BDA0003844236830000091
Last layer of features in student model F S The features are then input into a small and learnable subnetwork consisting of a small convolutional network (1 x 1 stride) of three residuals, whose parameters are shared between teacher and student, hence the name shared extractor, which converts the adaptation layer features into a Common Feature Space (CFS), producing Common Feature Spaces (CFS), respectively
Figure BDA0003844236830000092
And h s Wherein
Figure BDA0003844236830000093
f = C × H × W and H, W, C represent height, width and number of channels of common features, respectively. Reducing the loss of the teacher model in the process of converting the last layer of characteristics into common characteristics to be within the loss threshold range by reconstructing the loss function
Figure BDA0003844236830000094
Comprises the following steps:
Figure BDA0003844236830000095
wherein the content of the first and second substances,
Figure BDA0003844236830000096
the reconstructed features and the last-layer features of the ith sample of the nth teacher model are respectively, B is the batch number, N is the number of teachers, and alpha is a balance parameter.
S4: additionally introducing a module for judging a centroid clustering strategy (DCCS) in a Common Feature Space (CFS), wherein a loss function is
Figure BDA0003844236830000097
The method comprises the following steps of performing discriminant correction on a Common Feature Space (CFS):
determining class centers by adopting an incremental learning strategy based on class identifiers corresponding to the pseudo labels, wherein the class centers of kth class identifiers with the batch sample number of tau of the nth teacher model
Figure BDA0003844236830000098
Comprises the following steps:
Figure BDA0003844236830000099
wherein tau is an index of the number of samples in the batch, tau max ∈(500,800),t n For the nth teacher model, k is the index of the class identifier, m is the momentum accumulation hyperparameter, and m is more than or equal to 0 and less than or equal to 1.
Using a Discriminative Centroid Clustering Strategy (DCCS) loss function
Figure BDA0003844236830000101
Punishing distances of different classes of centers in the teacher model to enable the different classes of centers to be far away from each other, and simultaneously approaching the common characteristics of all teachers to the class of centers to obtain a clustering common characteristic set, wherein a mass center clustering strategy loss function is judged
Figure BDA0003844236830000102
Comprises the following steps:
Figure BDA0003844236830000103
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003844236830000104
so that the common characteristics of each teacher are close to the same class center,
Figure BDA0003844236830000105
the heterogeneous centers are punished by distance, and are far away from each other, as shown in fig. 5 (b), alpha is a balance parameter, N is the number of teacher models,
Figure BDA0003844236830000106
for the y-th teacher model in the n-th teacher model i Class center of a class, k 1 And k 2 Are indexes of class designators, but k 1 And k 2 And v is the distance between the constraint edge and the center of different classes in the control teacher model.
The y in the nth teacher model i Class center of a class
Figure BDA0003844236830000107
Comprises the following steps:
Figure BDA0003844236830000108
Figure BDA0003844236830000109
where τ is the index of the number of samples in the batch, y i Is a pseudo label of the ith unlabeled graph data.
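A minimal sketch of the DCCS module under the stated assumptions (momentum-averaged class centers, a squared pull term toward the own-class center, and a hinge push term with margin v between different centers); the exact update and penalty forms of the patent are not reproduced verbatim.

```python
import torch

def update_class_centers(centers, common_feats, pseudo_labels, momentum=0.9):
    """Momentum (incremental) update of one teacher's class centers.

    centers: (K, F) running class centers; common_feats: (B, F) flattened common features
    of the current batch; pseudo_labels: (B,) long class indices.
    """
    with torch.no_grad():
        for k in pseudo_labels.unique():
            batch_mean = common_feats[pseudo_labels == k].mean(dim=0)
            centers[k] = momentum * centers[k] + (1.0 - momentum) * batch_mean
    return centers

def dccs_loss(centers, common_feats, pseudo_labels, alpha=1.0, margin=1.0):
    """Discriminative centroid clustering: pull features to their own class center,
    push different class centers at least `margin` apart (hinge penalty)."""
    pull = ((common_feats - centers[pseudo_labels]) ** 2).sum(dim=1).mean()
    dists = torch.cdist(centers, centers)                        # (K, K) pairwise center distances
    off_diag = ~torch.eye(len(centers), dtype=torch.bool, device=centers.device)
    push = torch.clamp(margin - dists[off_diag], min=0).mean()   # penalize centers closer than margin
    return pull + alpha * push
```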
S5: joint Group Feature Alignment (JGFA) first increases the difficulty of simulating irrelevant class features, and at the same time makes it easier for student models to simulate the features of a class-like teacher model, thereby avoiding score learning inconsistent with classification. Joint Group Feature Alignment (JGFA) introduces simple and efficient cross covariance operators
Figure BDA0003844236830000111
And realizing joint semantic feature alignment. By manipulating multiplicative interactions of multiple random variables, a like common feature is mapped to the same feature subspace novel, thereby promoting the migratability of like features in edge alignment; introducing a mixed group feature alignment module, traversing and aligning any local block of all domains, thereby fully utilizing the intra-domain relation to maximize domain consensus to fully promote knowledge fusion; the method comprises the following specific steps:
summing the soft prediction result sets of the teachers, and inputting the summed result into an activation function to obtain a pseudo label y:
y=argmax(softmax(c))
Figure BDA0003844236830000112
wherein N is the number of the teacher models,
Figure BDA0003844236830000113
and the soft prediction result is the nth teacher soft prediction result.
Simply assigning the same weight to all teacher models and treating their strengths and weaknesses indiscriminately can even degrade the performance of multi-view learning in KA. A reliable JGFA (rJGFA) is therefore proposed that learns preferentially from the features corresponding to the more "confident" teacher predictions. Entropy impurity is used to measure the ambiguity of each teacher's output, as follows.

The ambiguity of each teacher's prediction is measured by entropy impurity, the normalized ambiguity is compared with the marginal constraint, and the teacher models satisfying the comparison requirement are screened. Specifically, the ambiguity of each teacher's prediction is defined as

E_i^{t_n} = − Σ_{k=1}^{K} c_{i,k}^{t_n} · log c_{i,k}^{t_n}

where the smaller the value, the higher the confidence of the prediction, K is the number of class identifiers, and c_{i,k}^{t_n} is the n-th teacher model's soft prediction that the i-th unlabeled image belongs to class k; if the i-th image belongs to class k, c_{i,k}^{t_n} is close to 1, and otherwise it is close to 0.

Based on this ambiguity, learning can be performed selectively from the teachers. First, the L teacher models that are confident on sample i, i.e. the teacher models satisfying the comparison requirement, are screened:

{ t̂_n : norm(E_i^{t_n}) ≤ η }

where norm(·) is the normalization operator, η is the marginal constraint value, and t̂_n is the n-th confident teacher model satisfying the comparison requirement.
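A minimal sketch of this entropy-impurity-based teacher selection; normalizing the entropy by log K is an assumption, since the text only states that the ambiguity is normalized before being compared with the margin η.

```python
import torch

def entropy_impurity(teacher_probs, eps=1e-12):
    """Entropy of each teacher soft prediction: lower entropy means higher confidence.

    teacher_probs: (N, B, K) soft predictions of N teachers for B images over K classes.
    Returns (N, B) ambiguity scores.
    """
    return -(teacher_probs * (teacher_probs + eps).log()).sum(dim=-1)

def select_confident_teachers(teacher_probs, eta=0.5):
    """Keep, per image, the teachers whose normalized ambiguity stays within the margin eta."""
    n_teachers, _, n_classes = teacher_probs.shape
    ambiguity = entropy_impurity(teacher_probs)                           # (N, B)
    normalized = ambiguity / torch.log(torch.tensor(float(n_classes)))    # assumed normalization
    return normalized <= eta                                              # (N, B) boolean mask
```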
constructing a mixed domain group level feature alignment to further explore the relationships in each local block composed of multiple domains under each class, as shown in fig. 5 (c), the present application does not learn to each teacher model in a separate manner, but rather treats all teachers and students as a mixed domain, traverses and aligns any local blocks thereof, so that the intra-domain relationships can be fully utilized to promote consensus of all domains, once all local parts composed of multiple domains can be aligned as a whole, students can fuse the features of all teachers, rather than their trade-off representation, which is easier to capture and more detailed than modeling the global differences of all domains directly, and the specific steps are as follows:
establishing r group domain by clustering common characteristic set corresponding to self-confident teacher model and student common characteristic
Figure BDA0003844236830000126
And the characteristics thereof
Figure BDA0003844236830000127
The total number of L +1 domains is L +1, and P domains are randomly selected from the L +1 domains to be connected as the source domain characteristics
Figure BDA0003844236830000128
Figure BDA00038442368300001211
Splicing features in Q = L-P +1 remaining domains as target domains
Figure BDA0003844236830000129
Number of permutation and combination
Figure BDA00038442368300001210
JGFA introduces a simple and efficient cross covariance operator
Figure BDA0003844236830000131
Wherein
Figure BDA0003844236830000132
Representing a Kronecker product that can handle multiplicative interactions of multiple random variables. Using its single hot semantic information y to map the common features h, novel to map the same class of features to the same subspace, thereby promoting the migratability of the same class of features in edge alignment;
performing Kronecker product on the pseudo label and the common feature of the source domain and the common feature of the target domain respectively to map the same class of features in the source domain and the target domain to the same subspace, as shown in fig. 5 (c), to obtain a source domain cross covariance and a target domain cross covariance, where the Kronecker product is SJFO, and aligning the source domain feature and the target domain feature by a maximum mean difference method (MMD), that is, by reliable joint combination loss function training, based on the result of performing Kronecker product on the source domain feature and the target domain feature and the corresponding pseudo label respectively.
Wherein the loss functions are reliably combined
Figure BDA0003844236830000133
Comprises the following steps:
Figure BDA0003844236830000134
wherein T is the number of permutation and combination of a plurality of groups of mixed common characteristic domains, P is the number of domains in the source domain, Q is the number of domains in the target domain,
Figure BDA0003844236830000135
in order to be a logical function of the data,
Figure BDA0003844236830000136
kronecker product is performed for the pseudo label of the ith unlabeled image data and the p common features of the r group source domain,
Figure BDA0003844236830000137
kronecker product is performed for q common features of the pseudo label and the r-th target domain for the j-th unlabeled image data. JGFA forces students to learn the features of all teachers under each category. Features to be learned by, for example, students also include features extracted from the regions a and B, which are features that the expert teacher (who focuses only on the region C to extract features) cannot capture, as shown in fig. 5 (d).
S7: finally, all losses above are integrated: total loss
Figure BDA0003844236830000141
Is defined as:
Figure BDA0003844236830000142
wherein, λ C To classify the score loss weight, λ J Loss of weight for joint group alignment, λ DR Weights in total loss for reconstruction loss and class center loss.
And training the initial student model through the total loss function to obtain a final student model, and inputting the unlabeled image data into the final student model to obtain the type of the unlabeled image data when the unlabeled image data is applied.
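A minimal sketch of how the four losses could be combined into the training objective; the weight values shown are placeholders, not values prescribed by the patent.

```python
def total_loss(loss_cs, loss_rjgfa, loss_dccs, loss_rec,
               lambda_c=1.0, lambda_j=1.0, lambda_dr=1.0):
    """Weighted combination of the classification score, reliable joint group alignment,
    DCCS and reconstruction losses used to train the student and common-space modules."""
    return lambda_c * loss_cs + lambda_j * loss_rjgfa + lambda_dr * (loss_dccs + loss_rec)
```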
In summary, the method provided by this embodiment makes the common features more discriminative for the student to imitate and also drives the student to learn from the teachers in a discriminative way, so that the student not only fuses the knowledge of all teachers in each category more easily but also naturally establishes semantic consistency between feature learning and classification score learning. The accuracy of knowledge fusion is therefore effectively improved, and the complementary information of the teachers is fully fused to improve generalization.

Claims (9)

1. A layered knowledge fusion method for bidirectional discriminant feature alignment is characterized by comprising the following steps:
(1) Obtaining a label-free image data set and a teacher model, constructing an initial student model, inputting the label-free image data as a sample into the teacher model to obtain a teacher soft prediction result, inputting the spliced teacher soft prediction result into an activation function to obtain a pseudo label, and inputting the label-free image data into the initial student model to obtain a student model prediction result;
(2) Respectively extracting the last layer of characteristics in the teacher model and the initial student model, and inputting the last layer of characteristics into a common characteristic extractor to respectively obtain a teacher common characteristic set and student common characteristics; determining class centers by adopting an incremental learning strategy based on class identifiers corresponding to the pseudo labels, punishing distances of different class centers in the teacher model by judging a centroid clustering strategy to enable the different class centers to be far away from each other, and enabling common features of all teachers to approach the class centers to obtain a clustered common feature set;
(3) Inputting each teacher soft prediction result into an entropy impurity formula to measure the ambiguity of the teacher soft prediction result, comparing the result after ambiguity normalization with a constraint boundary, screening a confidence teacher model meeting requirements, mixing a clustering common feature set corresponding to the screened confidence teacher model with student common features to obtain a mixed domain common feature set, randomly screening a part of common feature sets from the mixed domain common feature set as source domain features, then using the rest common feature sets as target domain features, binding the common features and corresponding pseudo labels by using a Kronecker product to enable the common features of the source domain and the target domain to be differentially mapped, achieving the purpose of mapping the same class of features in the source domain and the target domain to the same subspace, and finally aligning the common features after mapping in the source domain features and the target domain features by a maximum average difference method;
(4) Constructing a total loss function, and training an initial student model through the total loss function to obtain a final student model, wherein the total loss function comprises a judging centroid clustering strategy loss function, a reliable combined loss function, a reconstruction loss function and a classification score loss function;
the method comprises the steps of constructing a reconstruction loss function based on the last layer of characteristics and reconstruction characteristics in a teacher model, obtaining the reconstruction characteristics of the teacher model by adopting a multilayer convolutional neural network based on the common characteristics of the teacher model, and constructing a judgment centroid clustering strategy loss function based on a plurality of class centers and the common characteristics of each teacher; constructing a reliable combined loss function by adopting maximum average difference loss based on the result of Kronecker product of the source domain characteristic and the target domain characteristic and the corresponding pseudo label respectively; constructing a classification score loss function through cross entropy loss based on the teacher soft prediction result set and the student model prediction results;
(5) And when the image data is applied, inputting the image data without the label into the final student model to obtain the category of the image data without the label.
2. The method for fusing layered knowledge of bidirectional discriminative feature alignment as claimed in claim 1, wherein the step of inputting the last layer of features into the common feature extractor to obtain the teacher common feature set and the student common features respectively comprises:
and firstly, respectively inputting the last layer of characteristics into an individually parameterized adaptation layer to align the characteristic dimensions to obtain a plurality of adaptation layer characteristics, and converting the plurality of adaptation layer characteristics into a homogeneous common space through a shared extractor to obtain a teacher common characteristic set and student common characteristics.
3. The bidirectional discriminative feature-aligned hierarchical knowledge fusion method of claim 1, wherein, based on the class identifiers corresponding to the pseudo labels, the class center μ_k^{t_n} of the k-th class identifier of the n-th teacher model over a batch of τ samples is determined by the incremental learning strategy as

μ_k^{t_n} ← m·μ_k^{t_n} + (1 − m)·mean{ h_i^{t_n} : y_i = k, i = 1, …, τ }

where τ is the index of the number of samples in the batch, t_n is the n-th teacher model, k is the index of the class identifier, and m is the momentum accumulation hyperparameter.
4. The method of claim 1, wherein the pseudo label y obtained by inputting the summed result into the activation function is:

y = argmax(softmax(c)),  with  c = Σ_{n=1}^{N} c^{t_n}

where N is the number of teacher models and c^{t_n} is the n-th teacher's soft prediction result.
5. The bidirectional discriminative feature-aligned hierarchical knowledge fusion method of claim 1, wherein the ambiguity E_i^{t_n} of the n-th teacher model's soft prediction for the i-th unlabeled image data class is measured by entropy impurity as

E_i^{t_n} = − Σ_{k=1}^{K} c_{i,k}^{t_n} · log c_{i,k}^{t_n}

where K is the number of class identifiers and c_{i,k}^{t_n} is the n-th teacher model's soft prediction of the i-th unlabeled image data class.
6. The bidirectional discriminative feature-aligned hierarchical knowledge fusion method according to claim 1, wherein comparing the normalized ambiguity with the marginal constraint gives the L teacher models satisfying the comparison requirement:

{ t̂_n : norm(E_i^{t_n}) ≤ η },  L being the number of such teachers,

where norm(·) is the normalization operator, η is the marginal constraint value, and t̂_n is the n-th teacher model satisfying the comparison requirement.
7. The reliable bidirectional discriminative feature-aligned hierarchical knowledge fusion method according to claim 1, characterized in that the total loss function L_total is:

L_total = λ_C·L_CS + λ_J·L_rJGFA + λ_DR·(L_DCCS + L_REC)

with

L_CS = (1/B) Σ_{i=1}^{B} Σ_{n=1}^{N} CE(c_i^s, c_i^{t_n})

L_rJGFA = (1/T) Σ_{r=1}^{T} D( {x̄_{r,p}^i : p = 1,…,P}, {x̄_{r,q}^j : q = 1,…,Q} )

L_DCCS = Σ_{n=1}^{N} Σ_{i=1}^{B} ||h_i^{t_n} − μ_{y_i}^{t_n}||² + α·Σ_{n=1}^{N} Σ_{k_1 ≠ k_2} max(0, v − ||μ_{k_1}^{t_n} − μ_{k_2}^{t_n}||)

L_REC = (1/(B·N)) Σ_{n=1}^{N} Σ_{i=1}^{B} ||R_i^{t_n} − F_i^{t_n}||²

where R_i^{t_n} and F_i^{t_n} are the reconstructed features and the last-layer features of the i-th sample of the n-th teacher model, B is the batch size (the number of unlabeled image data), N is the number of teachers, λ_C is the classification score loss weight, λ_J is the joint group alignment loss weight, λ_DR is the weight of the reconstruction loss and class-center loss in the total loss, L_CS is the classification score loss function, L_rJGFA is the reliable combined loss function, L_DCCS is the discriminative centroid clustering strategy loss function, L_REC is the reconstruction loss function, CE is the cross-entropy loss function, c_i^s is the student model prediction for the i-th unlabeled image, c_i^{t_n} is the n-th teacher model's soft prediction for the i-th unlabeled image, T is the number of permutations of the mixed common-feature domain groups, P is the number of common feature sets in the source domain, Q is the number of common feature sets in the target domain, D is the discrepancy (maximum mean discrepancy) function, x̄_{r,p}^i is the Kronecker product of the pseudo label of the i-th unlabeled image with the p-th common features of the r-th source-domain group, x̄_{r,q}^j is the Kronecker product of the pseudo label of the j-th unlabeled image with the q-th common features of the r-th target-domain group, the first term of L_DCCS draws each teacher's common features toward the center of its own class, the second term penalizes the distance between different class centers so that they move apart, α is a balance parameter, μ_{y_i}^{t_n} is the class center of the y_i-th class in the n-th teacher model, k_1 and k_2 are class-identifier indexes with k_1 ≠ k_2, and v is the constraint margin controlling the distance between different class centers in a teacher model.
8. The bidirectional discriminative feature-aligned hierarchical knowledge fusion method of claim 7, wherein the class center μ_{y_i}^{t_n} of the y_i-th class in the n-th teacher model is maintained as

h̄_{y_i}^{t_n} = mean{ h_τ^{t_n} : y_τ = y_i },  μ_{y_i}^{t_n} ← m·μ_{y_i}^{t_n} + (1 − m)·h̄_{y_i}^{t_n}

where τ is the index of the number of samples in the batch and y_i is the pseudo label of the i-th unlabeled image data.
9. A bi-directional discriminative feature aligned hierarchical knowledge fusion device comprising a computer memory, a computer processor and a computer program stored in and executable on said computer memory, wherein the final student model of any one of claims 1 to 8 is employed in said computer memory;
the computer processor, when executing the computer program, performs the steps of: and inputting the image data without the label to the final student model to obtain the category of the image data without the label.
CN202211119323.3A 2022-09-14 2022-09-14 Layered knowledge fusion method and device for bidirectional discriminant feature alignment Pending CN115795993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211119323.3A CN115795993A (en) 2022-09-14 2022-09-14 Layered knowledge fusion method and device for bidirectional discriminant feature alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211119323.3A CN115795993A (en) 2022-09-14 2022-09-14 Layered knowledge fusion method and device for bidirectional discriminant feature alignment

Publications (1)

Publication Number Publication Date
CN115795993A true CN115795993A (en) 2023-03-14

Family

ID=85431945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211119323.3A Pending CN115795993A (en) 2022-09-14 2022-09-14 Layered knowledge fusion method and device for bidirectional discriminant feature alignment

Country Status (1)

Country Link
CN (1) CN115795993A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726884A (en) * 2024-02-09 2024-03-19 腾讯科技(深圳)有限公司 Training method of object class identification model, object class identification method and device
CN117726884B (en) * 2024-02-09 2024-05-03 腾讯科技(深圳)有限公司 Training method of object class identification model, object class identification method and device

Similar Documents

Publication Publication Date Title
CN112949786B (en) Data classification identification method, device, equipment and readable storage medium
CN114398961B (en) Visual question-answering method based on multi-mode depth feature fusion and model thereof
CN112446591B (en) Zero sample evaluation method for student comprehensive ability evaluation
CN110717431A (en) Fine-grained visual question and answer method combined with multi-view attention mechanism
CN111160409A (en) Heterogeneous neural network knowledge reorganization method based on common feature learning
Sanders et al. Training deep networks to construct a psychological feature space for a natural-object category domain
CN109583562A (en) SGCNN: the convolutional neural networks based on figure of structure
CN110826638A (en) Zero sample image classification model based on repeated attention network and method thereof
CN106202044A (en) A kind of entity relation extraction method based on deep neural network
CN109376610B (en) Pedestrian unsafe behavior detection method based on image concept network in video monitoring
CN108765383A (en) Video presentation method based on depth migration study
CN111539452B (en) Image recognition method and device for multi-task attribute, electronic equipment and storage medium
CN111931505A (en) Cross-language entity alignment method based on subgraph embedding
CN106777402A (en) A kind of image retrieval text method based on sparse neural network
KR20200010672A (en) Smart merchandise searching method and system using deep learning
CN115795993A (en) Layered knowledge fusion method and device for bidirectional discriminant feature alignment
CN116912708A (en) Remote sensing image building extraction method based on deep learning
CN113220915B (en) Remote sensing image retrieval method and device based on residual attention
Hu et al. Saliency-based YOLO for single target detection
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
Shi [Retracted] Application of Artificial Neural Network in College‐Level Music Teaching Quality Evaluation
CN105809200A (en) Biologically-inspired image meaning information autonomous extraction method and device
CN115438152B (en) Simple answer scoring method and system based on multi-neural network and knowledge graph
CN115601745A (en) Multi-view three-dimensional object identification method facing application end
CN115719514A (en) Gesture recognition-oriented field self-adaptive method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination