CN115795993A - Layered knowledge fusion method and device for bidirectional discriminant feature alignment - Google Patents

Layered knowledge fusion method and device for bidirectional discriminant feature alignment

Info

Publication number
CN115795993A
Authority
CN
China
Prior art keywords
teacher
common
class
model
features
Prior art date
Legal status
Pending
Application number
CN202211119323.3A
Other languages
Chinese (zh)
Inventor
徐仁军
梁朔颖
Current Assignee
ZJU Hangzhou Global Scientific and Technological Innovation Center
Original Assignee
ZJU Hangzhou Global Scientific and Technological Innovation Center
Priority date
Filing date
Publication date
Application filed by ZJU Hangzhou Global Scientific and Technological Innovation Center filed Critical ZJU Hangzhou Global Scientific and Technological Innovation Center
Priority to CN202211119323.3A priority Critical patent/CN115795993A/en
Publication of CN115795993A publication Critical patent/CN115795993A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a layered knowledge fusion method for bidirectional discriminant feature alignment, which comprises the following steps: inputting samples into the teacher models to obtain a set of teacher soft prediction results, and inputting the unlabeled image data into an initial student model to obtain student model prediction results; extracting the last-layer features and inputting them into a common feature extractor to obtain common features; applying a discriminative centroid clustering strategy that pushes different class centers away from each other while drawing each teacher's common features toward the center of its own class; measuring the ambiguity of the teacher soft predictions by entropy impurity to construct reliable source-domain and target-domain features, taking the Kronecker product of the pseudo labels with the common features of the source domain and of the target domain respectively to obtain a discriminative mapping, and aligning the mapped features by the maximum mean discrepancy method; and constructing a total loss function and training the initial student model with it to obtain a comprehensive student model capable of accurate classification.

Description

Layered knowledge fusion method and device for bidirectional discriminant feature alignment
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a layered knowledge fusion method and device for bidirectional discriminant feature alignment.
Background
In recent years, deep neural networks (DNNs) have achieved remarkable success in many artificial intelligence tasks, such as computer vision and natural language processing. However, the success of widely used DNNs relies on expensive computation and storage and on extensive manual annotation. To spare others the effort of retraining, many researchers publish their trained models on the web, which motivates reusing them in a plug-and-play fashion.
As a model-reuse strategy, knowledge fusion (KA) algorithms achieve compelling performance in various applications. They study how to effectively exploit multiple pre-trained teacher networks to train a single compact student model that handles all of the teachers' tasks without labeled data. The students in these traditional KA methods are typically trained on unlabeled data to mimic the teachers' outputs (classification score learning) and/or intermediate layers (feature learning).
However, publicly available pre-trained models typically have different architectures, so a more realistic scenario is heterogeneous knowledge fusion (HKA). In this case, the student cannot learn directly from the teachers' intermediate-layer features as usual; methods such as data-free KA and SKA can learn only from classification scores.
Chinese patent publication No. CN111160409A discloses a heterogeneous neural network knowledge fusion method based on common feature learning, which includes: obtaining a plurality of pre-trained neural network models, called teacher models; guiding the training of the student model with the features and prediction results output by the teacher models through common feature learning and soft-target distillation; in the common feature learning process, projecting the features of the multiple heterogeneous networks into a common feature space so that the student model integrates the knowledge of the multiple teacher models, while the soft-target distillation keeps the prediction results of the student model consistent with those of the teacher models, thereby obtaining a stronger student model with the task-processing capability of all the teacher models. That patent is applicable to knowledge fusion of neural network models, and in particular to knowledge fusion of heterogeneous image-classification task models.
However, blindly aligning student features with teacher features is crude: a student trained without discriminative feature alignment is likely to align with, or be disturbed by, irrelevant class features, which degrades classification performance. In that case it is difficult for the student to learn the true data distribution from the teachers, so the performance of heterogeneous knowledge fusion is generally low and its generalization is poor.
Disclosure of Invention
The invention provides a layered knowledge fusion method for bidirectional discriminant feature alignment, which obtains, with little training, a student model capable of accurately judging the category of unlabeled image data.
A layered knowledge fusion method for bidirectional discriminant feature alignment comprises the following steps:
(1) Obtaining a label-free image data set and a teacher model, constructing an initial student model, inputting the label-free image data as a sample into the teacher model to obtain a teacher soft prediction result set, inputting the spliced teacher soft prediction results into an activation function to obtain a pseudo label, and inputting the label-free image data into the initial student model to obtain a student model prediction result;
(2) Extracting the last-layer features of the teacher models and of the initial student model respectively, and inputting them into a common feature extractor to obtain a teacher common feature set and the student common features; determining class centers by an incremental learning strategy based on the class identifiers corresponding to the pseudo labels, and applying a discriminative centroid clustering strategy that penalizes the distances between different class centers in the teacher models so that different class centers move apart while each teacher's common features move toward the center of their own class, thereby obtaining a clustered common feature set;
(3) Splicing the teacher soft prediction results and inputting the spliced result into an activation function to obtain a pseudo label; inputting each teacher soft prediction result into an entropy impurity formula to measure its ambiguity, comparing the normalized ambiguity with a constraint margin, and screening the confident teacher models that satisfy the requirement; mixing the clustered common feature sets corresponding to the screened confident teacher models with the student common features to obtain a mixed-domain common feature set; randomly selecting a part of the common feature sets from the mixed-domain common feature set as source-domain features and using the remaining common feature sets as target-domain features; binding the common features with their corresponding pseudo labels by a Kronecker product so that the source-domain and target-domain common features are mapped discriminatively, the same class of features in the source domain and the target domain being mapped to the same subspace; and finally aligning the mapped common features of the source-domain features and the target-domain features by the maximum mean discrepancy method;
(4) Constructing a total loss function and training the initial student model with it to obtain a final student model, wherein the total loss function comprises a discriminative centroid clustering strategy loss function, a reliable combined loss function, a reconstruction loss function and a classification score loss function;
the reconstruction loss function is built from the last-layer features and the reconstructed features of the teacher models, the reconstructed features being obtained from the teacher common features by a multi-layer convolutional neural network; the discriminative centroid clustering strategy loss function is built from the class centers and the common features of each teacher; the reliable combined loss function is built by a maximum mean discrepancy loss on the results of the Kronecker products of the source-domain features and the target-domain features with their corresponding pseudo labels; and the classification score loss function is built by a cross-entropy loss from the teacher soft prediction result set and the student model prediction results;
(5) And when the image data is applied, inputting the image data without the label to the final student model to obtain the category of the image data without the label.
Inputting the last-layer features into the common feature extractor to obtain the teacher common feature set and the student common features respectively comprises the following steps:
the last-layer features are respectively input into independently parameterized adaptation layers to align the feature dimensions, obtaining a plurality of adaptation-layer features, and the adaptation-layer features are converted into a homogeneous common space through a shared extractor to obtain the teacher common feature set and the student common features.
Based on the class identifiers corresponding to the pseudo labels, the class centers are determined by an incremental learning strategy. For the n-th of the N teacher models, the class center μ_k^{t_n} of the k-th class identifier over a batch of τ samples is updated as a momentum-accumulated mean of the teacher common features whose pseudo label is k:

μ_k^{t_n} ← m·μ_k^{t_n} + (1 − m)·mean{ h_i^{t_n} : y_i = k, i = 1, …, τ }

where τ is the index of the number of samples in the batch, t_n is the n-th teacher model, k is the index of the class identifier, and m is the momentum accumulation hyperparameter.
The pseudo label y obtained by inputting the aggregated result into the activation function is:

y = argmax(softmax(c)),  with  c = Σ_{n=1}^{N} c^{t_n}

where N is the number of teacher models and c^{t_n} is the n-th teacher's soft prediction result.
The ambiguity E_i^{t_n} of the n-th teacher model's soft prediction for the i-th unlabeled image data class is measured by entropy impurity:

E_i^{t_n} = − Σ_{k=1}^{K} c_{i,k}^{t_n} · log c_{i,k}^{t_n}

where K is the number of class identifiers and c_{i,k}^{t_n} is the n-th teacher model's soft prediction that the i-th unlabeled image belongs to class k.
Comparing the normalized ambiguity with the marginal constraint gives the L teacher models that satisfy the comparison requirement:

{ t̂_n : norm(E_i^{t_n}) ≤ η },  L being the number of such teachers,

where norm(·) is the normalization operator, η is the marginal constraint value, and t̂_n is the n-th teacher model satisfying the comparison requirement.
The total loss function L_total is:

L_total = λ_C·L_CS + λ_J·L_rJGFA + λ_DR·(L_DCCS + L_REC)

with

L_CS = (1/B) Σ_{i=1}^{B} Σ_{n=1}^{N} CE(c_i^s, c_i^{t_n})

L_rJGFA = (1/T) Σ_{r=1}^{T} D( {x̄_{r,p}^i : p = 1,…,P}, {x̄_{r,q}^j : q = 1,…,Q} )

L_DCCS = Σ_{n=1}^{N} Σ_{i=1}^{B} ||h_i^{t_n} − μ_{y_i}^{t_n}||² + α·Σ_{n=1}^{N} Σ_{k_1 ≠ k_2} max(0, v − ||μ_{k_1}^{t_n} − μ_{k_2}^{t_n}||)

L_REC = (1/(B·N)) Σ_{n=1}^{N} Σ_{i=1}^{B} ||R_i^{t_n} − F_i^{t_n}||²

where R_i^{t_n} and F_i^{t_n} are the reconstructed features and the last-layer features of the i-th sample of the n-th teacher model, B is the batch size (the number of unlabeled image data), N is the number of teachers, λ_C is the weight of the classification score loss, λ_J is the weight of the joint group alignment loss, λ_DR is the weight of the reconstruction loss and class-center loss in the total loss, L_CS is the classification score loss function, L_rJGFA is the reliable combined loss function, L_DCCS is the discriminative centroid clustering strategy loss function, L_REC is the reconstruction loss function, CE is the cross-entropy loss function, c_i^s is the student model prediction for the i-th unlabeled image, c_i^{t_n} is the n-th teacher model's soft prediction for the i-th unlabeled image, T is the number of permutations of the mixed common-feature domain groups, P is the number of common feature sets in the source domain, Q is the number of common feature sets in the target domain, D is the discrepancy (maximum mean discrepancy) function, x̄_{r,p}^i is the Kronecker product of the pseudo label of the i-th unlabeled image with the p-th common features of the r-th source-domain group, x̄_{r,q}^j is the Kronecker product of the pseudo label of the j-th unlabeled image with the q-th common features of the r-th target-domain group, the first term of L_DCCS draws each teacher's common features toward the center of its own class, the second term penalizes the distance between different class centers so that they move apart, α is a balance parameter, μ_{y_i}^{t_n} is the class center of the y_i-th class in the n-th teacher model, k_1 and k_2 are both class-identifier indexes with k_1 ≠ k_2, and v is the constraint margin controlling the distance between different class centers in a teacher model.
The class center μ_{y_i}^{t_n} of the y_i-th class in the n-th teacher model is maintained incrementally over the batch as

h̄_{y_i}^{t_n} = mean{ h_τ^{t_n} : y_τ = y_i },  μ_{y_i}^{t_n} ← m·μ_{y_i}^{t_n} + (1 − m)·h̄_{y_i}^{t_n}

where τ is the index of the number of samples in the batch and y_i is the pseudo label of the i-th unlabeled image data.
A layered knowledge fusion apparatus for bidirectional discriminant feature alignment, comprising a computer memory, a computer processor, and a computer program stored in and executable on said computer memory, wherein the computer memory stores the final student model;
the computer processor, when executing the computer program, performs the steps of: and inputting the image data without the label to the final student model to obtain the category of the image data without the label.
Compared with the prior art, the invention has the beneficial effects that:
1. The knowledge fusion method for bidirectional discriminant feature alignment first constructs a dual discriminative feature learning process in heterogeneous knowledge fusion (HKA), which not only ensures the discriminability of the teacher features before alignment but also drives the student to learn from each teacher comprehensively and discriminatively.
2. Hierarchical feature alignment, including class-level, group-level, and global-level feature alignment, maps features orthogonally to their semantic subspaces, which hinders the transfer of cumbersome interfering knowledge to students, improving the accuracy of the model.
3. The hierarchical feature alignment method of the present invention anchors the differences to each category. Joint Group Feature Alignment (JGFA) decouples relationships and differences in complex multi-teacher knowledge fusion layer by fully mining relationships in local groups composed of multiple fields under each class, so that the relationships and differences are more easily captured and modeled, students can be promoted to fully fuse complementary knowledge of teachers in different fields, and generalization capability of student models is improved.
4. The discriminative centroid clustering strategy (DCCS) module alleviates the loss of discriminative information in the common feature space and, complementing Joint Group Feature Alignment (JGFA), transfers only the most discriminative knowledge to the student, relieving the storage pressure of the small model and making KA easier to deploy on small edge devices.
Drawings
FIG. 1 is a schematic structural diagram of a two-way discriminative knowledge fusion system according to an embodiment;
FIG. 2 is a schematic diagram of a network structure of a common spatial feature extractor according to an embodiment;
FIG. 3 is a general block diagram of a bi-directional discriminative feature alignment based hierarchical knowledge fusion system according to an embodiment;
FIG. 4 is a schematic flow chart of a bi-directional discriminative feature alignment method according to an embodiment;
fig. 5 is a schematic diagram of the working principle of the two-way discriminant feature alignment hierarchical knowledge fusion provided by the specific embodiment.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a bi-directionally discriminative feature aligned hierarchical knowledge fusion (DDFA) system comprises:
the common space feature extractor module is used for eliminating dimension difference among features output by the heterogeneous network and converting the features of all teacher and student networks into a homogeneous common space, and the network structure is based on an adaptive layer and a shared extraction layer of each network;
and a center of mass clustering strategy module (DCCS) is judged, so that the converted features can be mapped back to the original space to ensure the accuracy, and meanwhile, the distinguishability of the features in the common space can be ensured. The module carries out statistical analysis on teacher features in the common feature space and uses an incremental learning strategy to simulate a class center, and gathers similar features to the class center to enable the similar features to be as close as possible to share the same distribution as that in the original space of the teacher; meanwhile, the introduction of the constraint edge distance is to punish the distances of different classes and make the classes far away from each other, so that the effect of controlling the distances of teacher samples of different classes is realized, and the feature fusion is promoted to keep intra-class aggregation and inter-class separation;
a joint semantic mixed group feature alignment (JGFA) module is used for a Joint Group Feature Alignment (JGFA) motivation to firstly increase the difficulty of simulating irrelevant class features and simultaneously enable students to more easily simulate the features of teachers of the same class, so that score learning inconsistent with classification is avoided. Second, treating all teachers and students as a mixed domain, any local blocks of which are traversed and aligned so that intra-domain relationships can be leveraged to promote consensus across all domains. Once all the partial parts composed of multiple domains can be aligned into a whole, students can fuse the characteristics of all teachers, rather than their compromise representations;
the self-adaptive teacher selection learning module effectively measures the ambiguity predicted by the teacher by using the information entropy index so as to screen the predicted teacher with smaller prediction ambiguity for learning. Thereby improving the quality of knowledge fusion.
As shown in fig. 3, a layered knowledge fusion method for bidirectional discriminant feature alignment specifically includes the following steps:
s1: given a label-free dataset χ under a small-batch image recognition task, a class identifier k, and a plurality of perfectly trained teacher models { t) engaged in different classification tasks 1 ,…,t N And an untrained student model s, wherein N is the number of teacher models.
S2: using the unlabelled image data as a sample, inputting the sample into a teacher model to obtain a teacher soft prediction result set expressed as
Figure BDA0003844236830000081
Wherein n is the index of the teacher model, and the unlabelled image data is input into the initial student model to obtain the student model prediction result c s (ii) a The teacher soft prediction result is integrated into a distillation target, a classification score loss function is constructed through cross entropy loss based on the teacher soft prediction result assembly and the student model prediction results, the output of the student model is driven to be consistent with the teacher model through the classification score loss function, as shown in figure 5 (a), namely the learning process of classification scores, and the classification score loss function
Figure BDA0003844236830000082
Comprises the following steps:
Figure BDA0003844236830000083
wherein B is the number of the image data without label,
Figure BDA0003844236830000084
in order to be a function of the cross-over loss,
Figure BDA0003844236830000085
to predict the result for the student model for the ith unlabeled image data,
Figure BDA0003844236830000086
the soft prediction result is the nth teacher model soft prediction result aiming at the ith unlabeled image data.
S3: as shown in fig. 2, the last-layer features of the teacher model and the initial student model are extracted as
Figure BDA0003844236830000091
Last layer of features in student model F S The features are then input into a small and learnable subnetwork consisting of a small convolutional network (1 x 1 stride) of three residuals, whose parameters are shared between teacher and student, hence the name shared extractor, which converts the adaptation layer features into a Common Feature Space (CFS), producing Common Feature Spaces (CFS), respectively
Figure BDA0003844236830000092
And h s Wherein
Figure BDA0003844236830000093
f = C × H × W and H, W, C represent height, width and number of channels of common features, respectively. Reducing the loss of the teacher model in the process of converting the last layer of characteristics into common characteristics to be within the loss threshold range by reconstructing the loss function
Figure BDA0003844236830000094
Comprises the following steps:
Figure BDA0003844236830000095
wherein the content of the first and second substances,
Figure BDA0003844236830000096
the reconstructed features and the last-layer features of the ith sample of the nth teacher model are respectively, B is the batch number, N is the number of teachers, and alpha is a balance parameter.
S4: additionally introducing a module for judging a centroid clustering strategy (DCCS) in a Common Feature Space (CFS), wherein a loss function is
Figure BDA0003844236830000097
The method comprises the following steps of performing discriminant correction on a Common Feature Space (CFS):
determining class centers by adopting an incremental learning strategy based on class identifiers corresponding to the pseudo labels, wherein the class centers of kth class identifiers with the batch sample number of tau of the nth teacher model
Figure BDA0003844236830000098
Comprises the following steps:
Figure BDA0003844236830000099
wherein tau is an index of the number of samples in the batch, tau max ∈(500,800),t n For the nth teacher model, k is the index of the class identifier, m is the momentum accumulation hyperparameter, and m is more than or equal to 0 and less than or equal to 1.
Using a Discriminative Centroid Clustering Strategy (DCCS) loss function
Figure BDA0003844236830000101
Punishing distances of different classes of centers in the teacher model to enable the different classes of centers to be far away from each other, and simultaneously approaching the common characteristics of all teachers to the class of centers to obtain a clustering common characteristic set, wherein a mass center clustering strategy loss function is judged
Figure BDA0003844236830000102
Comprises the following steps:
Figure BDA0003844236830000103
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003844236830000104
so that the common characteristics of each teacher are close to the same class center,
Figure BDA0003844236830000105
the heterogeneous centers are punished by distance, and are far away from each other, as shown in fig. 5 (b), alpha is a balance parameter, N is the number of teacher models,
Figure BDA0003844236830000106
for the y-th teacher model in the n-th teacher model i Class center of a class, k 1 And k 2 Are indexes of class designators, but k 1 And k 2 And v is the distance between the constraint edge and the center of different classes in the control teacher model.
The y in the nth teacher model i Class center of a class
Figure BDA0003844236830000107
Comprises the following steps:
Figure BDA0003844236830000108
Figure BDA0003844236830000109
where τ is the index of the number of samples in the batch, y i Is a pseudo label of the ith unlabeled graph data.
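A minimal sketch of the DCCS module under the stated assumptions (momentum-averaged class centers, a squared pull term toward the own-class center, and a hinge push term with margin v between different centers); the exact update and penalty forms of the patent are not reproduced verbatim.

```python
import torch

def update_class_centers(centers, common_feats, pseudo_labels, momentum=0.9):
    """Momentum (incremental) update of one teacher's class centers.

    centers: (K, F) running class centers; common_feats: (B, F) flattened common features
    of the current batch; pseudo_labels: (B,) long class indices.
    """
    with torch.no_grad():
        for k in pseudo_labels.unique():
            batch_mean = common_feats[pseudo_labels == k].mean(dim=0)
            centers[k] = momentum * centers[k] + (1.0 - momentum) * batch_mean
    return centers

def dccs_loss(centers, common_feats, pseudo_labels, alpha=1.0, margin=1.0):
    """Discriminative centroid clustering: pull features to their own class center,
    push different class centers at least `margin` apart (hinge penalty)."""
    pull = ((common_feats - centers[pseudo_labels]) ** 2).sum(dim=1).mean()
    dists = torch.cdist(centers, centers)                        # (K, K) pairwise center distances
    off_diag = ~torch.eye(len(centers), dtype=torch.bool, device=centers.device)
    push = torch.clamp(margin - dists[off_diag], min=0).mean()   # penalize centers closer than margin
    return pull + alpha * push
```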
S5: joint Group Feature Alignment (JGFA) first increases the difficulty of simulating irrelevant class features, and at the same time makes it easier for student models to simulate the features of a class-like teacher model, thereby avoiding score learning inconsistent with classification. Joint Group Feature Alignment (JGFA) introduces simple and efficient cross covariance operators
Figure BDA0003844236830000111
And realizing joint semantic feature alignment. By manipulating multiplicative interactions of multiple random variables, a like common feature is mapped to the same feature subspace novel, thereby promoting the migratability of like features in edge alignment; introducing a mixed group feature alignment module, traversing and aligning any local block of all domains, thereby fully utilizing the intra-domain relation to maximize domain consensus to fully promote knowledge fusion; the method comprises the following specific steps:
summing the soft prediction result sets of the teachers, and inputting the summed result into an activation function to obtain a pseudo label y:
y=argmax(softmax(c))
Figure BDA0003844236830000112
wherein N is the number of the teacher models,
Figure BDA0003844236830000113
and the soft prediction result is the nth teacher soft prediction result.
Simply assigning the same weight to all teacher models and treating their strengths and weaknesses indiscriminately can even degrade the performance of multi-view learning in KA. A reliable JGFA (rJGFA) is therefore proposed that learns preferentially from the features corresponding to the more "confident" teacher predictions. Entropy impurity is used to measure the ambiguity of each teacher's output, as follows.

The ambiguity of each teacher's prediction is measured by entropy impurity, the normalized ambiguity is compared with the marginal constraint, and the teacher models satisfying the comparison requirement are screened. Specifically, the ambiguity of each teacher's prediction is defined as

E_i^{t_n} = − Σ_{k=1}^{K} c_{i,k}^{t_n} · log c_{i,k}^{t_n}

where the smaller the value, the higher the confidence of the prediction, K is the number of class identifiers, and c_{i,k}^{t_n} is the n-th teacher model's soft prediction that the i-th unlabeled image belongs to class k; if the i-th image belongs to class k, c_{i,k}^{t_n} is close to 1, and otherwise it is close to 0.

Based on this ambiguity, learning can be performed selectively from the teachers. First, the L teacher models that are confident on sample i, i.e. the teacher models satisfying the comparison requirement, are screened:

{ t̂_n : norm(E_i^{t_n}) ≤ η }

where norm(·) is the normalization operator, η is the marginal constraint value, and t̂_n is the n-th confident teacher model satisfying the comparison requirement.
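A minimal sketch of this entropy-impurity-based teacher selection; normalizing the entropy by log K is an assumption, since the text only states that the ambiguity is normalized before being compared with the margin η.

```python
import torch

def entropy_impurity(teacher_probs, eps=1e-12):
    """Entropy of each teacher soft prediction: lower entropy means higher confidence.

    teacher_probs: (N, B, K) soft predictions of N teachers for B images over K classes.
    Returns (N, B) ambiguity scores.
    """
    return -(teacher_probs * (teacher_probs + eps).log()).sum(dim=-1)

def select_confident_teachers(teacher_probs, eta=0.5):
    """Keep, per image, the teachers whose normalized ambiguity stays within the margin eta."""
    n_teachers, _, n_classes = teacher_probs.shape
    ambiguity = entropy_impurity(teacher_probs)                           # (N, B)
    normalized = ambiguity / torch.log(torch.tensor(float(n_classes)))    # assumed normalization
    return normalized <= eta                                              # (N, B) boolean mask
```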
constructing a mixed domain group level feature alignment to further explore the relationships in each local block composed of multiple domains under each class, as shown in fig. 5 (c), the present application does not learn to each teacher model in a separate manner, but rather treats all teachers and students as a mixed domain, traverses and aligns any local blocks thereof, so that the intra-domain relationships can be fully utilized to promote consensus of all domains, once all local parts composed of multiple domains can be aligned as a whole, students can fuse the features of all teachers, rather than their trade-off representation, which is easier to capture and more detailed than modeling the global differences of all domains directly, and the specific steps are as follows:
establishing r group domain by clustering common characteristic set corresponding to self-confident teacher model and student common characteristic
Figure BDA0003844236830000126
And the characteristics thereof
Figure BDA0003844236830000127
The total number of L +1 domains is L +1, and P domains are randomly selected from the L +1 domains to be connected as the source domain characteristics
Figure BDA0003844236830000128
Figure BDA00038442368300001211
Splicing features in Q = L-P +1 remaining domains as target domains
Figure BDA0003844236830000129
Number of permutation and combination
Figure BDA00038442368300001210
JGFA introduces a simple and efficient cross covariance operator
Figure BDA0003844236830000131
Wherein
Figure BDA0003844236830000132
Representing a Kronecker product that can handle multiplicative interactions of multiple random variables. Using its single hot semantic information y to map the common features h, novel to map the same class of features to the same subspace, thereby promoting the migratability of the same class of features in edge alignment;
performing Kronecker product on the pseudo label and the common feature of the source domain and the common feature of the target domain respectively to map the same class of features in the source domain and the target domain to the same subspace, as shown in fig. 5 (c), to obtain a source domain cross covariance and a target domain cross covariance, where the Kronecker product is SJFO, and aligning the source domain feature and the target domain feature by a maximum mean difference method (MMD), that is, by reliable joint combination loss function training, based on the result of performing Kronecker product on the source domain feature and the target domain feature and the corresponding pseudo label respectively.
Wherein the loss functions are reliably combined
Figure BDA0003844236830000133
Comprises the following steps:
Figure BDA0003844236830000134
wherein T is the number of permutation and combination of a plurality of groups of mixed common characteristic domains, P is the number of domains in the source domain, Q is the number of domains in the target domain,
Figure BDA0003844236830000135
in order to be a logical function of the data,
Figure BDA0003844236830000136
kronecker product is performed for the pseudo label of the ith unlabeled image data and the p common features of the r group source domain,
Figure BDA0003844236830000137
kronecker product is performed for q common features of the pseudo label and the r-th target domain for the j-th unlabeled image data. JGFA forces students to learn the features of all teachers under each category. Features to be learned by, for example, students also include features extracted from the regions a and B, which are features that the expert teacher (who focuses only on the region C to extract features) cannot capture, as shown in fig. 5 (d).
S7: finally, all losses above are integrated: total loss
Figure BDA0003844236830000141
Is defined as:
Figure BDA0003844236830000142
wherein, λ C To classify the score loss weight, λ J Loss of weight for joint group alignment, λ DR Weights in total loss for reconstruction loss and class center loss.
And training the initial student model through the total loss function to obtain a final student model, and inputting the unlabeled image data into the final student model to obtain the type of the unlabeled image data when the unlabeled image data is applied.
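A minimal sketch of how the four losses could be combined into the training objective; the weight values shown are placeholders, not values prescribed by the patent.

```python
def total_loss(loss_cs, loss_rjgfa, loss_dccs, loss_rec,
               lambda_c=1.0, lambda_j=1.0, lambda_dr=1.0):
    """Weighted combination of the classification score, reliable joint group alignment,
    DCCS and reconstruction losses used to train the student and common-space modules."""
    return lambda_c * loss_cs + lambda_j * loss_rjgfa + lambda_dr * (loss_dccs + loss_rec)
```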
In summary, the method provided by this embodiment makes the common features more discriminative for the student to imitate and also drives the student to learn from the teachers in a discriminative way, so that the student not only fuses the knowledge of all teachers in each category more easily but also naturally establishes semantic consistency between feature learning and classification score learning. The accuracy of knowledge fusion is therefore effectively improved, and the complementary information of the teachers is fully fused to improve generalization.

Claims (9)

1. A layered knowledge fusion method for bidirectional discriminant feature alignment is characterized by comprising the following steps:
(1) Obtaining a label-free image data set and a teacher model, constructing an initial student model, inputting the label-free image data as a sample into the teacher model to obtain a teacher soft prediction result, inputting the spliced teacher soft prediction result into an activation function to obtain a pseudo label, and inputting the label-free image data into the initial student model to obtain a student model prediction result;
(2) Respectively extracting the last layer of characteristics in the teacher model and the initial student model, and inputting the last layer of characteristics into a common characteristic extractor to respectively obtain a teacher common characteristic set and student common characteristics; determining class centers by adopting an incremental learning strategy based on class identifiers corresponding to the pseudo labels, punishing distances of different class centers in the teacher model by judging a centroid clustering strategy to enable the different class centers to be far away from each other, and enabling common features of all teachers to approach the class centers to obtain a clustered common feature set;
(3) Inputting each teacher soft prediction result into an entropy impurity formula to measure the ambiguity of the teacher soft prediction result, comparing the result after ambiguity normalization with a constraint boundary, screening a confidence teacher model meeting requirements, mixing a clustering common feature set corresponding to the screened confidence teacher model with student common features to obtain a mixed domain common feature set, randomly screening a part of common feature sets from the mixed domain common feature set as source domain features, then using the rest common feature sets as target domain features, binding the common features and corresponding pseudo labels by using a Kronecker product to enable the common features of the source domain and the target domain to be differentially mapped, achieving the purpose of mapping the same class of features in the source domain and the target domain to the same subspace, and finally aligning the common features after mapping in the source domain features and the target domain features by a maximum average difference method;
(4) Constructing a total loss function, and training an initial student model through the total loss function to obtain a final student model, wherein the total loss function comprises a judging centroid clustering strategy loss function, a reliable combined loss function, a reconstruction loss function and a classification score loss function;
the method comprises the steps of constructing a reconstruction loss function based on the last layer of characteristics and reconstruction characteristics in a teacher model, obtaining the reconstruction characteristics of the teacher model by adopting a multilayer convolutional neural network based on the common characteristics of the teacher model, and constructing a judgment centroid clustering strategy loss function based on a plurality of class centers and the common characteristics of each teacher; constructing a reliable combined loss function by adopting maximum average difference loss based on the result of Kronecker product of the source domain characteristic and the target domain characteristic and the corresponding pseudo label respectively; constructing a classification score loss function through cross entropy loss based on the teacher soft prediction result set and the student model prediction results;
(5) And when the image data is applied, inputting the image data without the label into the final student model to obtain the category of the image data without the label.
2. The method for fusing layered knowledge of bidirectional discriminative feature alignment as claimed in claim 1, wherein the step of inputting the last layer of features into the common feature extractor to obtain the teacher common feature set and the student common features respectively comprises:
and firstly, respectively inputting the last layer of characteristics into an individually parameterized adaptation layer to align the characteristic dimensions to obtain a plurality of adaptation layer characteristics, and converting the plurality of adaptation layer characteristics into a homogeneous common space through a shared extractor to obtain a teacher common characteristic set and student common characteristics.
3. The bidirectional discriminative feature-aligned hierarchical knowledge fusion method of claim 1, wherein, based on the class identifiers corresponding to the pseudo labels, the class center μ_k^{t_n} of the k-th class identifier of the n-th teacher model over a batch of τ samples is determined by the incremental learning strategy as

μ_k^{t_n} ← m·μ_k^{t_n} + (1 − m)·mean{ h_i^{t_n} : y_i = k, i = 1, …, τ }

where τ is the index of the number of samples in the batch, t_n is the n-th teacher model, k is the index of the class identifier, and m is the momentum accumulation hyperparameter.
4. The method of claim 1, wherein the pseudo label y obtained by inputting the summed result into the activation function is:

y = argmax(softmax(c)),  with  c = Σ_{n=1}^{N} c^{t_n}

where N is the number of teacher models and c^{t_n} is the n-th teacher's soft prediction result.
5. The bidirectional discriminative feature-aligned hierarchical knowledge fusion method of claim 1, wherein the ambiguity E_i^{t_n} of the n-th teacher model's soft prediction for the i-th unlabeled image data class is measured by entropy impurity as

E_i^{t_n} = − Σ_{k=1}^{K} c_{i,k}^{t_n} · log c_{i,k}^{t_n}

where K is the number of class identifiers and c_{i,k}^{t_n} is the n-th teacher model's soft prediction of the i-th unlabeled image data class.
6. The bidirectional discriminative feature-aligned hierarchical knowledge fusion method according to claim 1, wherein comparing the normalized ambiguity with the marginal constraint gives the L teacher models satisfying the comparison requirement:

{ t̂_n : norm(E_i^{t_n}) ≤ η },  L being the number of such teachers,

where norm(·) is the normalization operator, η is the marginal constraint value, and t̂_n is the n-th teacher model satisfying the comparison requirement.
7. The reliable bidirectional discriminative feature-aligned hierarchical knowledge fusion method according to claim 1, characterized in that the total loss function L_total is:

L_total = λ_C·L_CS + λ_J·L_rJGFA + λ_DR·(L_DCCS + L_REC)

with

L_CS = (1/B) Σ_{i=1}^{B} Σ_{n=1}^{N} CE(c_i^s, c_i^{t_n})

L_rJGFA = (1/T) Σ_{r=1}^{T} D( {x̄_{r,p}^i : p = 1,…,P}, {x̄_{r,q}^j : q = 1,…,Q} )

L_DCCS = Σ_{n=1}^{N} Σ_{i=1}^{B} ||h_i^{t_n} − μ_{y_i}^{t_n}||² + α·Σ_{n=1}^{N} Σ_{k_1 ≠ k_2} max(0, v − ||μ_{k_1}^{t_n} − μ_{k_2}^{t_n}||)

L_REC = (1/(B·N)) Σ_{n=1}^{N} Σ_{i=1}^{B} ||R_i^{t_n} − F_i^{t_n}||²

where R_i^{t_n} and F_i^{t_n} are the reconstructed features and the last-layer features of the i-th sample of the n-th teacher model, B is the batch size (the number of unlabeled image data), N is the number of teachers, λ_C is the classification score loss weight, λ_J is the joint group alignment loss weight, λ_DR is the weight of the reconstruction loss and class-center loss in the total loss, L_CS is the classification score loss function, L_rJGFA is the reliable combined loss function, L_DCCS is the discriminative centroid clustering strategy loss function, L_REC is the reconstruction loss function, CE is the cross-entropy loss function, c_i^s is the student model prediction for the i-th unlabeled image, c_i^{t_n} is the n-th teacher model's soft prediction for the i-th unlabeled image, T is the number of permutations of the mixed common-feature domain groups, P is the number of common feature sets in the source domain, Q is the number of common feature sets in the target domain, D is the discrepancy (maximum mean discrepancy) function, x̄_{r,p}^i is the Kronecker product of the pseudo label of the i-th unlabeled image with the p-th common features of the r-th source-domain group, x̄_{r,q}^j is the Kronecker product of the pseudo label of the j-th unlabeled image with the q-th common features of the r-th target-domain group, the first term of L_DCCS draws each teacher's common features toward the center of its own class, the second term penalizes the distance between different class centers so that they move apart, α is a balance parameter, μ_{y_i}^{t_n} is the class center of the y_i-th class in the n-th teacher model, k_1 and k_2 are class-identifier indexes with k_1 ≠ k_2, and v is the constraint margin controlling the distance between different class centers in a teacher model.
8. The bidirectional discriminative feature-aligned hierarchical knowledge fusion method of claim 7, wherein the class center μ_{y_i}^{t_n} of the y_i-th class in the n-th teacher model is maintained as

h̄_{y_i}^{t_n} = mean{ h_τ^{t_n} : y_τ = y_i },  μ_{y_i}^{t_n} ← m·μ_{y_i}^{t_n} + (1 − m)·h̄_{y_i}^{t_n}

where τ is the index of the number of samples in the batch and y_i is the pseudo label of the i-th unlabeled image data.
9. A bi-directional discriminative feature aligned hierarchical knowledge fusion device comprising a computer memory, a computer processor and a computer program stored in and executable on said computer memory, wherein the final student model of any one of claims 1 to 8 is employed in said computer memory;
the computer processor, when executing the computer program, performs the steps of: and inputting the image data without the label to the final student model to obtain the category of the image data without the label.
CN202211119323.3A 2022-09-14 2022-09-14 Layered knowledge fusion method and device for bidirectional discriminant feature alignment Pending CN115795993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211119323.3A CN115795993A (en) 2022-09-14 2022-09-14 Layered knowledge fusion method and device for bidirectional discriminant feature alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211119323.3A CN115795993A (en) 2022-09-14 2022-09-14 Layered knowledge fusion method and device for bidirectional discriminant feature alignment

Publications (1)

Publication Number Publication Date
CN115795993A true CN115795993A (en) 2023-03-14

Family

ID=85431945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211119323.3A Pending CN115795993A (en) 2022-09-14 2022-09-14 Layered knowledge fusion method and device for bidirectional discriminant feature alignment

Country Status (1)

Country Link
CN (1) CN115795993A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726884A (en) * 2024-02-09 2024-03-19 腾讯科技(深圳)有限公司 Training method of object class identification model, object class identification method and device
CN117726884B (en) * 2024-02-09 2024-05-03 腾讯科技(深圳)有限公司 Training method of object class identification model, object class identification method and device

Similar Documents

Publication Publication Date Title
CN112949786B (en) Data classification identification method, device, equipment and readable storage medium
CN114398961B (en) Visual question-answering method based on multi-mode depth feature fusion and model thereof
CN112446591B (en) Zero sample evaluation method for student comprehensive ability evaluation
CN110717431A (en) Fine-grained visual question and answer method combined with multi-view attention mechanism
CN111160409A (en) Heterogeneous neural network knowledge reorganization method based on common feature learning
Sanders et al. Training deep networks to construct a psychological feature space for a natural-object category domain
CN109583562A (en) SGCNN: the convolutional neural networks based on figure of structure
CN110826638A (en) Zero sample image classification model based on repeated attention network and method thereof
CN106202044A (en) A kind of entity relation extraction method based on deep neural network
CN109376610B (en) Pedestrian unsafe behavior detection method based on image concept network in video monitoring
CN108765383A (en) Video presentation method based on depth migration study
CN111539452B (en) Image recognition method and device for multi-task attribute, electronic equipment and storage medium
CN111931505A (en) Cross-language entity alignment method based on subgraph embedding
CN106777402A (en) A kind of image retrieval text method based on sparse neural network
KR20200010672A (en) Smart merchandise searching method and system using deep learning
CN115795993A (en) Layered knowledge fusion method and device for bidirectional discriminant feature alignment
CN116912708A (en) Remote sensing image building extraction method based on deep learning
CN113220915B (en) Remote sensing image retrieval method and device based on residual attention
Hu et al. Saliency-based YOLO for single target detection
CN114579794A (en) Multi-scale fusion landmark image retrieval method and system based on feature consistency suggestion
Shi [Retracted] Application of Artificial Neural Network in College‐Level Music Teaching Quality Evaluation
CN105809200A (en) Biologically-inspired image meaning information autonomous extraction method and device
CN115438152B (en) Simple answer scoring method and system based on multi-neural network and knowledge graph
CN115601745A (en) Multi-view three-dimensional object identification method facing application end
CN115719514A (en) Gesture recognition-oriented field self-adaptive method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination