CN111160409A - Heterogeneous neural network knowledge reorganization method based on common feature learning - Google Patents

Heterogeneous neural network knowledge reorganization method based on common feature learning

Info

Publication number
CN111160409A
Authority
CN
China
Prior art keywords
model
teacher
student
models
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911265852.2A
Other languages
Chinese (zh)
Inventor
宋明黎 (Mingli Song)
罗思惠 (Sihui Luo)
方共凡 (Gongfan Fang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201911265852.2A
Publication of CN111160409A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The heterogeneous neural network knowledge reorganization method based on common feature learning comprises the following steps: acquiring a plurality of pre-trained neural network models, called teacher models; and using the features and prediction results output by the teacher models to guide the training of a student model through common feature learning and soft target distillation. In the common feature learning process, the features of the multiple heterogeneous networks are projected into a common feature space so that the student model integrates the knowledge of the multiple teacher models, and the soft target distillation makes the prediction results of the student model consistent with those of the teacher models, yielding a stronger student model that possesses the task-handling capability of all the teacher models. Because the student model only needs to imitate the prediction results of the teacher models, it can be trained without any manual labeling. The method is suitable for knowledge reorganization of neural network models, in particular for knowledge reorganization of heterogeneous image classification task models.

Description

Heterogeneous neural network knowledge reorganization method based on common feature learning
Technical Field
The invention relates to the field of machine learning, and in particular to a heterogeneous neural network knowledge reorganization method based on common feature learning.
Background
In recent years, Deep Neural Networks (DNNs) have enjoyed dramatic success in a multitude of artificial intelligence tasks such as computer vision and natural language processing. However, despite the extraordinary results, the training of DNN models relies heavily on large-scale manually labeled datasets and takes a long time. To ease the reproduction effort, more and more researchers publish trained models on the internet for users to download and use immediately. Reusing these released models to obtain customized models with multi-task capability, without any manual data labeling, is therefore of great significance. However, because of the rapid development of deep learning and the consequent emergence of a large number of network variants, such publicly available trained models often have varying network structures, each oriented to a particular task or dataset, which poses challenges to the fused reorganization of these models.
In the present invention, the inventors address a deep model fusion and reuse task, with the goal of training a lightweight, multi-task-capable student model from several heterogeneous, task-specific teacher models. The method can use a plurality of pre-trained teacher models to train a student model that is competent for all of the teacher models' tasks, without manually labeled information. The traditional knowledge distillation method targets only a single teacher model and aims at model compression, i.e., using a small network model to imitate and learn the prediction results of a trained large network model, as described in Geoffrey Hinton, Oriol Vinyals, and Jeff Dean, "Distilling the knowledge in a neural network", arXiv preprint arXiv:1503.02531, 2015. Because the heterogeneous teacher models differ in structure and task, their output features cannot be imitated directly. Therefore, the present invention resorts to another method: the output features of the teacher models are projected into a shared learnable feature space, and the student model is then forced to imitate the transformed features of the teacher models. By imitating the teacher networks' outputs in both features and prediction results, a powerful student model is obtained by training; it fuses the comprehensive knowledge from the heterogeneous teacher models without access to manual labels and can solve the tasks of all the teacher models.
Disclosure of Invention
The invention provides a heterogeneous neural network knowledge reorganization method based on common feature learning. First, the task addressed by the method is defined: given several pre-trained teacher networks, the goal of the invention is to learn a student model that fuses the knowledge of all the teacher models and is competent for their tasks, without annotated data. The teacher models may be the same or different in architecture; no particular restriction is imposed.
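For orientation, a minimal PyTorch sketch of this setup might look as follows; the choice of torchvision backbones, the class count, and all variable names are assumptions for illustration only, not part of the patent.

```python
from torchvision import models

# Several pre-trained, possibly heterogeneous teacher networks (architectures may differ).
# `pretrained=True` is the older torchvision API; newer releases use the `weights=` argument.
teachers = [models.resnet18(pretrained=True), models.resnet34(pretrained=True)]
for t in teachers:
    t.eval()                        # the teachers stay fixed during training
    for p in t.parameters():
        p.requires_grad_(False)

# A student model chosen according to the customization requirements and randomly initialized;
# 221 output classes is an assumed example (the sum of the teachers' class counts).
student = models.resnet18(num_classes=221)
```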
A heterogeneous neural network knowledge reorganization method based on common feature learning comprises the following steps:
Step 1, selecting a suitable student model structure according to the customization requirements and randomly initializing it; inputting the same unlabeled image data into the teacher models and the student model to obtain their raw output features F_Ti and F_S respectively; and converting and aligning the two with adaptation layers to obtain features f_Ti and f_S of consistent size.
Step 2, introducing a small learnable subnetwork whose parameters are shared between the teachers and the student, i.e. the shared feature extractor applied to each teacher model and to the student model has identical parameters, and is therefore called the shared extractor; the aligned teacher and student features are converted by the shared extractor into compatible features in a common feature space, i.e. the shared extractor converts f_Ti and f_S into the common-space features \tilde{f}_{T_i} and \tilde{f}_S.
Step 3, measuring the distribution difference between the transformed features obtained in step 2 with the Maximum Mean Discrepancy (MMD) method, fusing the features of the teacher models, and adapting the domains of the transformed features of the teacher models and the student model. Specifically: let \tilde{F}_T = \{\tilde{f}_T^i\}_{i=1}^{C_t} denote the set of all of a teacher's transformed features, where C_t is the total number of teacher features; similarly, let \tilde{F}_S = \{\tilde{f}_S^j\}_{j=1}^{C_s} denote the set of all the student's transformed features, where C_s is the total number of student features. The approximate calculation formula of the MMD distance between \tilde{F}_T and \tilde{F}_S is:

\mathrm{MMD}^2(\tilde{F}_T, \tilde{F}_S) \approx \Big\| \frac{1}{C_t}\sum_{i=1}^{C_t}\phi(\tilde{f}_T^i) - \frac{1}{C_s}\sum_{j=1}^{C_s}\phi(\tilde{f}_S^j) \Big\|^2    (1)

where \phi(\cdot) is an implicit mapping function. By expanding this equation with a kernel function K(\cdot,\cdot), the MMD loss is defined as follows:

L_{MMD}(\tilde{F}_T, \tilde{F}_S) = \frac{1}{C_t^2}\sum_{i=1}^{C_t}\sum_{i'=1}^{C_t} K(\tilde{f}_T^i, \tilde{f}_T^{i'}) + \frac{1}{C_s^2}\sum_{j=1}^{C_s}\sum_{j'=1}^{C_s} K(\tilde{f}_S^j, \tilde{f}_S^{j'}) - \frac{2}{C_t C_s}\sum_{i=1}^{C_t}\sum_{j=1}^{C_s} K(\tilde{f}_T^i, \tilde{f}_S^j)    (2)

The kernel function projects the sample vectors into a higher-dimensional feature space; note that the normalized features \tilde{f}_T^i and \tilde{f}_S^j are used here. The MMD losses between the student model and the N teachers are then combined to define the total loss L_M of common feature space learning as:

L_M = \sum_{i=1}^{N} L_{MMD}(\tilde{F}_{T_i}, \tilde{F}_S)    (3)

Step 4, inputting the transformed features into a trainable auto-encoder to reconstruct the original output features of the teacher models; letting F'_Ti denote the reconstruction of the teacher's original features F_Ti, measuring the difference between the reconstructed features and the original features, and defining the reconstruction loss L_R as:

L_R = \sum_{i=1}^{N} \| F'_{T_i} - F_{T_i} \|^2    (4)

By minimizing L_R, the features converted into the common space can be mapped back to the original features, which ensures that as little information as possible is lost during feature conversion and makes the learning of the common feature space more robust.
Step 5, making the student model imitate the teacher models' prediction results on the input unlabeled samples, and taking the difference between the student's and the teachers' prediction results on the same task as the final loss function, namely the target distillation loss. Specifically, on an image classification task, the score vectors of teacher models whose target classes do not overlap are directly concatenated, i.e. the concatenated score vectors serve as the learning target of the student model; for teachers with overlapping classes the same strategy is used: during training, overlapping classes are treated as multiple different classes, but during testing they are treated as the same class. Let w_i denote the parameters that map the i-th teacher model's output features to its score vector, and w_S the corresponding parameters of the student; the loss function L_C that drives the response scores of the student network towards the teachers' predictions is:

L_C = \| w_S \cdot F_S - [\, w_1 \cdot F_{T_1}, \ldots, w_N \cdot F_{T_N} \,] \|^2    (5)

Step 6, combining the losses defined in steps 3, 4 and 5 through the hyper-parameter weight α to form the overall loss function of the network and computing its value:

L = L_C + (1 - \alpha)(L_M + L_R), \quad \alpha \in [0, 1]    (6)

Step 7, computing the gradients of the network, updating the parameters of the whole network model along the gradient direction that minimizes the overall loss to obtain the network with updated parameters, returning to step 1, and iterating the whole training process until the loss function converges; the student model obtained at that point is the target model.
Preferably, the structure of the teacher model in step 1 includes, but is not limited to, a residual network and a VGG network, and the structure of the student model depends on actual needs.
Preferably, the adaptation layer in step 1 consists of, but is not limited to, several layers of 1×1 convolution; the adaptation layer parameters of each teacher model and of the student model are different and are obtained through learning. The number of adaptation layer channels can be set to an empirical value of 256 or according to actual requirements.
Preferably, the shared feature extractor in step 2 is a small convolutional network composed of three residual modules with stride 1; in addition, the number of channels of \tilde{f}_{T_i} and \tilde{f}_S is set to 128, an empirical value that can be adjusted as appropriate in actual operation.
Preferably, in the soft target distillation module of step 5, the target distillation loss L_C is defined as the difference between the response scores of the student network and the prediction scores of the teacher models, and may be measured using methods including, but not limited to, mean squared error (MSE).
The heterogeneous neural network knowledge reorganization method based on common feature learning comprises the following steps: acquiring a plurality of pre-trained neural network models, called teacher models; and using the features and prediction results output by the teacher models to guide the training of a student model through common feature learning and soft target distillation. In the common feature learning process, the features of the multiple heterogeneous networks are projected into a common feature space so that the student model integrates the knowledge of the multiple teacher models, and the soft target distillation makes the prediction results of the student model consistent with those of the teacher models, yielding a lightweight student model that possesses the task-handling capability of all the teacher models with stronger task-processing power. Because the student model only needs to imitate the prediction results of the teacher models, it can be trained without any manual labeling. The method is suitable for knowledge reorganization of neural network models, in particular for knowledge reorganization of heterogeneous image classification task models.
The invention has the advantages that: by reusing published models, customized models with multi-task processing capability can be trained without manual labeling, which makes full use of available resources and saves a large amount of labor cost.
Drawings
FIG. 1 is a general block diagram of the process of the present invention.
FIG. 2 is a schematic diagram of a specific structure of a common feature learning module in the method of the present invention.
Detailed Description
The experimental method of the present invention is described in detail below with reference to the accompanying drawings and examples, so that it can be fully understood and implemented how the technical means are applied to solve the technical problems and achieve the technical effects. It should be noted that, as long as there is no conflict, the embodiments of the present invention and the features in the embodiments may be combined with each other in any order, and the resulting technical solutions all fall within the protection scope of the present invention.
The specific framework of the heterogeneous neural network knowledge reorganization method based on common feature learning provided by the invention is shown in Figure 1. Suppose there are N teacher networks, the i-th of which is denoted T_i. The method comprises the following steps:
and step 1, aligning the output characteristics of the teacher model and the student model under the same input.
And selecting a proper student model structure according to the customization requirements and carrying out random initialization. Inputting the same unlabelled image data to the teacher model and the student model, respectivelyObtaining the original output characteristics F of the twoTiAnd Fs. Since the teacher model and the student model are different in structure, F isTiAnd FsMay also be inconsistent, step 1 uses adaptation layer to perform conversion to obtain f with consistent sizeTiAnd fS. The composition of the adaptation layer includes but is not limited to several 1 × 1 convolutions, and the adaptation layer parameters of each teacher model and each student model are different and are obtained through learning. In the implementation of the present invention, the number of adaptation layer channels is set to 256.
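As an illustration, a minimal PyTorch sketch of such adaptation layers might look as follows; the class name, the extra 1×1 layer, and the input channel widths are assumptions of this example, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class AdaptationLayer(nn.Module):
    """Projects a raw feature map to a fixed number of channels with 1x1 convolutions."""
    def __init__(self, in_channels: int, out_channels: int = 256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),  # "several layers" of 1x1 convolution
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

# One adaptation layer per teacher and one for the student; their parameters are not shared.
# The input widths (512 for ResNet-18/ResNet-34 features, 256 for the student) are assumed.
adapt_teachers = nn.ModuleList([AdaptationLayer(512), AdaptationLayer(512)])
adapt_student = AdaptationLayer(256)
```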
Step 2, transforming into common features.
A small learnable subnetwork is introduced whose parameters are shared between the teachers and the student (i.e. the shared feature extractor of each teacher model and of the student model has identical parameters), hence called the shared extractor; through it, the aligned teacher and student features are transformed into consistent features in a common feature space. The subnetwork serving as the shared feature extractor is a small convolutional network consisting of three residual modules with stride 1. It transforms f_Ti and f_S into the common-space features \tilde{f}_{T_i} and \tilde{f}_S.
in the course of the particular practice of the present invention,
Figure BDA0002312817280000063
and
Figure BDA0002312817280000064
the number of channels is set to 128, which is empirically set and can be adjusted during actual operation as appropriate.
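A sketch of such a shared extractor is given below; the exact layers inside the residual modules and the class names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A stride-1 residual module; the internal layer choices are illustrative."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, stride=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, stride=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.body(x))

class SharedExtractor(nn.Module):
    """Maps aligned 256-channel features into a 128-channel common feature space."""
    def __init__(self, in_channels: int = 256, common_channels: int = 128):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, common_channels, kernel_size=1)
        self.blocks = nn.Sequential(*[ResidualBlock(common_channels) for _ in range(3)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.blocks(self.reduce(x))

# A single instance is applied to every aligned teacher feature f_Ti and to the student
# feature f_S, so the mapping into the common space shares its parameters across all of them.
shared_extractor = SharedExtractor()
```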
Step 3, computing the common feature learning loss.
The distribution difference between the transformed features obtained in step 2 is measured with the Maximum Mean Discrepancy (MMD) method, the features of the teacher models are fused, and the domains of the transformed features of the teacher models and the student model are adapted. The MMD method can be regarded as a distance measure between probability distributions and is commonly used as a domain-matching measure in domain adaptation tasks, aligning the domains of the student model and the teacher models by matching their feature distributions; it is described in detail in "Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, et al., A kernel two-sample test, Journal of Machine Learning Research, 13(Mar):723-773, 2012". In this step, the MMD method is used to measure the distribution difference between the transformed features of the student model and of the teacher models and to use it as a loss function; by minimizing this loss, the similarity between the student's and the teachers' transformed features is increased, thereby transferring the knowledge of the teacher models to the student model.
In the implementation, taking the feature similarity measurement between the student and one teacher model as an example, let \tilde{F}_T = \{\tilde{f}_T^i\}_{i=1}^{C_t} denote the set of all the teacher's transformed features, where C_t is the total number of teacher features; similarly, let \tilde{F}_S = \{\tilde{f}_S^j\}_{j=1}^{C_s} denote the set of all the student's transformed features, where C_s is the total number of student features. The approximate calculation formula of the MMD distance between \tilde{F}_T and \tilde{F}_S is:

\mathrm{MMD}^2(\tilde{F}_T, \tilde{F}_S) \approx \Big\| \frac{1}{C_t}\sum_{i=1}^{C_t}\phi(\tilde{f}_T^i) - \frac{1}{C_s}\sum_{j=1}^{C_s}\phi(\tilde{f}_S^j) \Big\|^2    (1)
where \phi(\cdot) is an implicit mapping function. By expanding this equation with a kernel function K(\cdot,\cdot), the MMD loss is defined as follows:

L_{MMD}(\tilde{F}_T, \tilde{F}_S) = \frac{1}{C_t^2}\sum_{i=1}^{C_t}\sum_{i'=1}^{C_t} K(\tilde{f}_T^i, \tilde{f}_T^{i'}) + \frac{1}{C_s^2}\sum_{j=1}^{C_s}\sum_{j'=1}^{C_s} K(\tilde{f}_S^j, \tilde{f}_S^{j'}) - \frac{2}{C_t C_s}\sum_{i=1}^{C_t}\sum_{j=1}^{C_s} K(\tilde{f}_T^i, \tilde{f}_S^j)    (2)

The kernel function projects the sample vectors into a higher-dimensional feature space; note that the normalized features \tilde{f}_T^i and \tilde{f}_S^j are used here. The MMD losses between the student model and the N teachers are then combined to define the total loss L_M of common feature space learning as:

L_M = \sum_{i=1}^{N} L_{MMD}(\tilde{F}_{T_i}, \tilde{F}_S)    (3)
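The following sketch shows how the MMD loss of equations (1)-(3) might be computed in PyTorch. The Gaussian (RBF) kernel is an assumed choice (the patent only requires some kernel K(·,·)), and treating each normalized common-space channel as one sample is likewise an assumption of this example.

```python
import torch
import torch.nn.functional as F

def rbf_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Gaussian kernel matrix between the rows of x (n, d) and the rows of y (m, d)."""
    dist2 = torch.cdist(x, y, p=2).pow(2)
    return torch.exp(-dist2 / (2.0 * sigma ** 2))

def mmd_loss(feat_t: torch.Tensor, feat_s: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Kernel MMD (equation (2)) between teacher samples (Ct, d) and student samples (Cs, d)."""
    ft = F.normalize(feat_t, dim=1)   # normalized features, as noted in the text
    fs = F.normalize(feat_s, dim=1)
    return (rbf_kernel(ft, ft, sigma).mean()
            + rbf_kernel(fs, fs, sigma).mean()
            - 2.0 * rbf_kernel(ft, fs, sigma).mean())

def total_mmd_loss(common_teacher_feats, common_student_feat):
    """L_M of equation (3): sum of MMD losses between the student and each of the N teachers."""
    return sum(mmd_loss(t, common_student_feat) for t in common_teacher_feats)
```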
and 4, calculating the characteristic reconstruction loss.
The transferred features are input into a trainable self-encoder to reconstruct the original output features of the teacher model. F'TiRepresenting original characteristics F of teacher modelTiMeasure the difference between the reconstructed features and the original features and define the reconstruction loss LRIs defined as:
Figure BDA00023128172800000711
by measuring LRThe features converted into the public space can be reversely mapped into the original features, so that the loss of information as little as possible in the feature conversion process is ensured, and the learning of the public feature space is more robust.
Steps 1 to 4 together form the common feature space learning module, whose specific implementation is shown in Figure 2. In general, the module transforms the features of the teacher models and of the student model to be trained into a common feature space through the adaptation layers and the shared extractor, whose parameters are learnable. During feature learning, two loss terms are applied: the common feature learning loss L_M and the reconstruction loss L_R. The former encourages the student's features to approach the teacher models' transformed features in the common space, while the latter ensures minimal error between the transformed features and the original features.
Step 5, making the student model imitate the teacher models' prediction results on the input unlabeled samples and computing the target distillation loss.
The teacher models' prediction results on the input unlabeled samples are used to guide the training of the student model, so that the student model outputs prediction results that are the same as or similar to those of the teacher models. Specifically, on the image classification task, the score vectors of teacher models whose target classes do not overlap are directly concatenated, i.e. the concatenated score vectors serve as the learning target of the student model. For teachers with overlapping classes the same strategy is used: in training, overlapping classes are treated as multiple different classes, but during testing they are treated as the same class. Let w_i denote the parameters that map the i-th teacher model's output features to its score vector, and w_S the corresponding parameters of the student; the loss function L_C that drives the response scores of the student network towards the teachers' predictions is:

L_C = \| w_S \cdot F_S - [\, w_1 \cdot F_{T_1}, \ldots, w_N \cdot F_{T_N} \,] \|^2    (5)
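A short sketch of this soft-target distillation loss is given below, using mean squared error, which is one of the measures the patent allows; the function and variable names are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_scores: torch.Tensor, teacher_scores: list) -> torch.Tensor:
    """L_C of equation (5): the student's scores are driven towards the concatenation
    of the teachers' score vectors, measured here with MSE."""
    target = torch.cat(teacher_scores, dim=1)   # concatenated teacher predictions
    return F.mse_loss(student_scores, target.detach())
```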
step 6, calculating the total loss
Combining the loss functions shown in formulas (3), (4) and (5) gives the total loss function for end-to-end training of the student network:

L = L_C + (1 - \alpha)(L_M + L_R), \quad \alpha \in [0, 1]    (6)

α is a hyper-parameter that balances the loss terms in equation (6). The overall loss function is computed by forward propagation through the entire neural network model.
Step 7, back-propagating and updating the network parameters.
The gradients of the trainable network shown in Figure 1 are computed, the parameters of the whole network model are updated along the gradient direction that minimizes the overall loss, and the procedure returns to step 1 with the updated network; the whole training process is iterated until final convergence, and the resulting student model is the target model.
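Steps 6 and 7 can be condensed into a training loop such as the following sketch. It assumes the modules sketched above (adapt_teachers, adapt_student, shared_extractor, a list decoders of FeatureDecoder modules), plus teachers (frozen, exposing .features/.classifier), student, and unlabeled_loader; all of these names and the optimizer choice are assumptions for illustration, not the patent's implementation.

```python
import torch

alpha = 0.5                                   # hyper-parameter in [0, 1]; the value is assumed
trainable_params = (list(student.parameters())
                    + list(adapt_teachers.parameters())
                    + list(adapt_student.parameters())
                    + list(shared_extractor.parameters())
                    + list(decoders.parameters()))
optimizer = torch.optim.Adam(trainable_params, lr=1e-3)

def as_samples(x: torch.Tensor) -> torch.Tensor:
    # (B, C, H, W) -> (C, B*H*W): each common-space channel becomes one MMD sample
    return x.permute(1, 0, 2, 3).reshape(x.size(1), -1)

for images in unlabeled_loader:               # unlabeled images only; no manual labels needed
    with torch.no_grad():                     # the pre-trained teachers stay frozen
        t_feats = [t.features(images) for t in teachers]
        t_scores = [t.classifier(f) for t, f in zip(teachers, t_feats)]

    s_feat = student.features(images)
    s_scores = student.classifier(s_feat)

    common_t = [shared_extractor(a(f)) for a, f in zip(adapt_teachers, t_feats)]
    common_s = shared_extractor(adapt_student(s_feat))

    loss_m = total_mmd_loss([as_samples(c) for c in common_t], as_samples(common_s))
    loss_r = reconstruction_loss(decoders, common_t, t_feats)
    loss_c = distillation_loss(s_scores, t_scores)

    loss = loss_c + (1.0 - alpha) * (loss_m + loss_r)   # overall loss of equation (6)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```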
TABLE 1
Table 1 shows the experimental results of a specific example: two teacher models are given, trained respectively on two subsets of the Stanford Dogs dataset or the Caltech-101 dataset, with network structures of an 18-layer residual network (ResNet-18) and a 34-layer residual network (ResNet-34). Knowledge reorganization is performed on the two teacher models with the proposed method to obtain a student model, and the classification accuracy is compared with that of other methods on the Stanford Dogs and Caltech-101 datasets. From Table 1 it can be seen that the student model trained without manual labels by the method of the invention outperforms both teachers on their respective tasks, and even outperforms models obtained by model ensembling, classical knowledge distillation, or training with real data labels.
TABLE 2
Model | LFW dataset | AgeDB-30 dataset | CFP-FP dataset
T1 | 97.43% | 84.72% | 86.20%
T2 | 97.80% | 85.87% | 87.27%
Knowledge distillation method | 95.15% | 84.97% | 86.87%
Method of the invention | 98.10% | 86.93% | 87.73%
Table 2 shows the experimental results of another example, comparing the method of the invention with the conventional knowledge distillation method. Each teacher model in the table is trained on a subset of 3000 classes of the CASIA dataset.
TABLE 3
Model | Stanford Dogs | CUB dataset | FGVC-Aircraft | Stanford Cars
Single teacher model | 87.1% | 75.6% | 73.2% | 82.9%
Fusion of 2 teacher models | 84.3% | 78.9% | - | -
Fusion of 3 teacher models | 83.1% | 77.7% | 79.0% | -
Fusion of 4 teacher models | 82.5% | 77.5% | 78.3% | 84.2%
Table 3 shows how the number of teachers affects the performance of the student model. The teacher models are obtained by training on four different sub-classification task datasets respectively.
TABLE 4
Table 4 shows a comparison of the classification accuracy of the present method and the knowledge distillation method on a Stanford dataset, using teacher models and student models of various different structures.
The embodiments described in this specification are merely illustrative of implementations of the inventive concept, and the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments; it also covers equivalents thereof that may occur to those skilled in the art upon consideration of the inventive concept.

Claims (5)

1. A heterogeneous neural network knowledge reorganization method based on common feature learning, comprising the following steps:
step 1, selecting a suitable student model structure according to the customization requirements and randomly initializing it; inputting the same unlabeled image data into the teacher models and the student model to obtain their raw output features F_Ti and F_S respectively; and converting and aligning the two with adaptation layers to obtain features f_Ti and f_S of consistent size;
step 2, introducing a small learnable subnetwork whose parameters are shared between the teachers and the student, i.e. the shared feature extractor applied to each teacher model and to the student model has identical parameters, and is therefore called the shared extractor; converting the aligned teacher and student features through the shared extractor into compatible features in a common feature space, i.e. the shared extractor converts f_Ti and f_S into the common-space features \tilde{f}_{T_i} and \tilde{f}_S;
step 3, measuring the distribution difference between the transformed features obtained in step 2 with the Maximum Mean Discrepancy (MMD) method, fusing the features of the teacher models, and adapting the domains of the transformed features of the teacher models and the student model; the method specifically comprises: letting \tilde{F}_T = \{\tilde{f}_T^i\}_{i=1}^{C_t} denote the set of all of a teacher's transformed features, where C_t is the total number of teacher features; similarly, letting \tilde{F}_S = \{\tilde{f}_S^j\}_{j=1}^{C_s} denote the set of all the student's transformed features, where C_s is the total number of student features; the approximate calculation formula of the MMD distance between \tilde{F}_T and \tilde{F}_S is:
\mathrm{MMD}^2(\tilde{F}_T, \tilde{F}_S) \approx \Big\| \frac{1}{C_t}\sum_{i=1}^{C_t}\phi(\tilde{f}_T^i) - \frac{1}{C_s}\sum_{j=1}^{C_s}\phi(\tilde{f}_S^j) \Big\|^2    (1)
where \phi(\cdot) is an implicit mapping function; by expanding this equation with a kernel function K(\cdot,\cdot), the MMD loss is defined as follows:
L_{MMD}(\tilde{F}_T, \tilde{F}_S) = \frac{1}{C_t^2}\sum_{i=1}^{C_t}\sum_{i'=1}^{C_t} K(\tilde{f}_T^i, \tilde{f}_T^{i'}) + \frac{1}{C_s^2}\sum_{j=1}^{C_s}\sum_{j'=1}^{C_s} K(\tilde{f}_S^j, \tilde{f}_S^{j'}) - \frac{2}{C_t C_s}\sum_{i=1}^{C_t}\sum_{j=1}^{C_s} K(\tilde{f}_T^i, \tilde{f}_S^j)    (2)
the kernel function projects the sample vectors into a higher-dimensional feature space, and the normalized features \tilde{f}_T^i and \tilde{f}_S^j are used here; the MMD losses between the student model and the N teachers are then combined to define the total loss L_M of common feature space learning as:
L_M = \sum_{i=1}^{N} L_{MMD}(\tilde{F}_{T_i}, \tilde{F}_S)    (3)
step 4, inputting the transformed features into a trainable auto-encoder to reconstruct the original output features of the teacher models; letting F'_Ti denote the reconstruction of the teacher's original features F_Ti, measuring the difference between the reconstructed features and the original features, and defining the reconstruction loss L_R as:
L_R = \sum_{i=1}^{N} \| F'_{T_i} - F_{T_i} \|^2    (4)
by minimizing L_R, the features converted into the common space can be mapped back to the original features, which ensures that as little information as possible is lost during feature conversion and makes the learning of the common feature space more robust;
step 5, making the student model imitate the teacher models' prediction results on the input unlabeled samples, and taking the difference between the student's and the teachers' prediction results on the same task as the final loss function, namely the target distillation loss; specifically, on an image classification task, directly concatenating the score vectors of teacher models whose target classes do not overlap, i.e. the concatenated score vectors serve as the learning target of the student model; for teachers with overlapping classes the same strategy is used: during training, overlapping classes are treated as multiple different classes, but during testing they are treated as the same class; letting w_i denote the parameters that map the i-th teacher model's output features to its score vector, and w_S the corresponding parameters of the student, the loss function L_C that drives the response scores of the student network towards the teachers' predictions is:
L_C = \| w_S \cdot F_S - [\, w_1 \cdot F_{T_1}, \ldots, w_N \cdot F_{T_N} \,] \|^2    (5)
step 6, combining the losses defined in steps 3, 4 and 5 through the hyper-parameter weight α to form the overall loss function of the network and computing its value:
L = L_C + (1 - \alpha)(L_M + L_R), \quad \alpha \in [0, 1]    (6)
step 7, computing the gradients of the network, updating the parameters of the whole network model along the gradient direction that minimizes the overall loss to obtain the network with updated parameters, returning to step 1, and iterating the whole training process until the loss function converges, the student model obtained at that point being the target model.
2. The method of claim 1 for heterogeneous neural network knowledge reorganization based on common feature learning, wherein: the structure of the teacher model in step 1 includes, but is not limited to, a residual network and a VGG network, and the structure of the student model is determined according to actual requirements.
3. The method of claim 1 for heterogeneous neural network knowledge reorganization based on common feature learning, wherein: the adaptation layer in step 1 comprises, but is not limited to, several layers of 1×1 convolution, and the adaptation layer parameters of each teacher model and of the student model are different and are obtained through learning; the number of adaptation layer channels can be set to an empirical value of 256 or according to actual requirements.
4. The method of claim 1 for heterogeneous neural network knowledge reorganization based on common feature learning, wherein: the shared feature extractor in step 2 is a small convolutional network consisting of three residual modules with stride 1; in addition, the number of channels of \tilde{f}_{T_i} and \tilde{f}_S is set to 128, an empirical value that can be adjusted as appropriate in actual operation.
5. The method of claim 1 for heterogeneous neural network knowledge reorganization based on common feature learning, wherein: in the soft target distillation module of step 5, the target distillation loss L_C is defined as the difference between the response scores of the student network and the prediction scores of the teacher models, and may be measured using methods including, but not limited to, mean squared error (MSE).
CN201911265852.2A 2019-12-11 2019-12-11 Heterogeneous neural network knowledge reorganization method based on common feature learning Withdrawn CN111160409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911265852.2A CN111160409A (en) 2019-12-11 2019-12-11 Heterogeneous neural network knowledge reorganization method based on common feature learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911265852.2A CN111160409A (en) 2019-12-11 2019-12-11 Heterogeneous neural network knowledge reorganization method based on common feature learning

Publications (1)

Publication Number Publication Date
CN111160409A true CN111160409A (en) 2020-05-15

Family

ID=70556975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911265852.2A Withdrawn CN111160409A (en) 2019-12-11 2019-12-11 Heterogeneous neural network knowledge reorganization method based on common feature learning

Country Status (1)

Country Link
CN (1) CN111160409A (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097178A (en) * 2019-05-15 2019-08-06 电科瑞达(成都)科技有限公司 It is a kind of paid attention to based on entropy neural network model compression and accelerated method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SIHUI LUO ET AL.: "Knowledge Amalgamation from Heterogeneous Networks", arXiv preprint arXiv:1906.10546 *

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695698A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Method, device, electronic equipment and readable storage medium for model distillation
CN111695698B (en) * 2020-06-12 2023-09-12 北京百度网讯科技有限公司 Method, apparatus, electronic device, and readable storage medium for model distillation
WO2022001805A1 (en) * 2020-06-30 2022-01-06 华为技术有限公司 Neural network distillation method and device
CN111754985B (en) * 2020-07-06 2023-05-02 上海依图信息技术有限公司 Training of voice recognition model and voice recognition method and device
CN111754985A (en) * 2020-07-06 2020-10-09 上海依图信息技术有限公司 Method and device for training voice recognition model and voice recognition
CN111783899A (en) * 2020-07-10 2020-10-16 安徽启新明智科技有限公司 Method for identifying novel contraband through autonomous learning
CN111783899B (en) * 2020-07-10 2023-08-15 安徽启新明智科技有限公司 Method for autonomously learning and identifying novel contraband
CN112163238A (en) * 2020-09-09 2021-01-01 中国科学院信息工程研究所 Network model training method for multi-party participation data unshared
CN112163238B (en) * 2020-09-09 2022-08-16 中国科学院信息工程研究所 Network model training method for multi-party participation data unshared
CN112164054A (en) * 2020-09-30 2021-01-01 交叉信息核心技术研究院(西安)有限公司 Knowledge distillation-based image target detection method and detector and training method thereof
CN112508169A (en) * 2020-11-13 2021-03-16 华为技术有限公司 Knowledge distillation method and system
CN112329725A (en) * 2020-11-27 2021-02-05 腾讯科技(深圳)有限公司 Method, device and equipment for identifying elements of road scene and storage medium
CN112329725B (en) * 2020-11-27 2022-03-25 腾讯科技(深圳)有限公司 Method, device and equipment for identifying elements of road scene and storage medium
CN112418343B (en) * 2020-12-08 2024-01-05 中山大学 Multi-teacher self-adaptive combined student model training method
CN112418343A (en) * 2020-12-08 2021-02-26 中山大学 Multi-teacher self-adaptive joint knowledge distillation
CN112560631A (en) * 2020-12-09 2021-03-26 昆明理工大学 Knowledge distillation-based pedestrian re-identification method
WO2022120996A1 (en) * 2020-12-10 2022-06-16 中国科学院深圳先进技术研究院 Visual position recognition method and apparatus, and computer device and readable storage medium
CN112529162B (en) * 2020-12-15 2024-02-27 北京百度网讯科技有限公司 Neural network model updating method, device, equipment and storage medium
CN112529162A (en) * 2020-12-15 2021-03-19 北京百度网讯科技有限公司 Neural network model updating method, device, equipment and storage medium
CN112801209A (en) * 2021-02-26 2021-05-14 同济大学 Image classification method based on dual-length teacher model knowledge fusion and storage medium
CN113222123A (en) * 2021-06-15 2021-08-06 深圳市商汤科技有限公司 Model training method, device, equipment and computer storage medium
CN113469977A (en) * 2021-07-06 2021-10-01 浙江霖研精密科技有限公司 Flaw detection device and method based on distillation learning mechanism and storage medium
CN113469977B (en) * 2021-07-06 2024-01-12 浙江霖研精密科技有限公司 Flaw detection device, method and storage medium based on distillation learning mechanism
CN113792871A (en) * 2021-08-04 2021-12-14 北京旷视科技有限公司 Neural network training method, target identification method, device and electronic equipment
CN113592007A (en) * 2021-08-05 2021-11-02 哈尔滨理工大学 Knowledge distillation-based bad picture identification system and method, computer and storage medium
CN113360777A (en) * 2021-08-06 2021-09-07 北京达佳互联信息技术有限公司 Content recommendation model training method, content recommendation method and related equipment
CN113360777B (en) * 2021-08-06 2021-12-07 北京达佳互联信息技术有限公司 Content recommendation model training method, content recommendation method and related equipment
CN113822373A (en) * 2021-10-27 2021-12-21 南京大学 Image classification model training method based on integration and knowledge distillation
CN113822373B (en) * 2021-10-27 2023-09-15 南京大学 Image classification model training method based on integration and knowledge distillation
CN114743243A (en) * 2022-04-06 2022-07-12 平安科技(深圳)有限公司 Human face recognition method, device, equipment and storage medium based on artificial intelligence
CN114743243B (en) * 2022-04-06 2024-05-31 平安科技(深圳)有限公司 Human face recognition method, device, equipment and storage medium based on artificial intelligence
CN114970862B (en) * 2022-04-28 2024-05-28 北京航空航天大学 PDL1 expression level prediction method based on multi-instance knowledge distillation model
CN114970862A (en) * 2022-04-28 2022-08-30 北京航空航天大学 PDL1 expression level prediction method based on multi-instance knowledge distillation model
CN115204394A (en) * 2022-07-05 2022-10-18 上海人工智能创新中心 Knowledge distillation method for target detection
WO2024032386A1 (en) * 2022-08-08 2024-02-15 Huawei Technologies Co., Ltd. Systems and methods for artificial-intelligence model training using unsupervised domain adaptation with multi-source meta-distillation
WO2024066111A1 (en) * 2022-09-28 2024-04-04 北京大学 Image processing model training method and apparatus, image processing method and apparatus, and device and medium
CN116028891A (en) * 2023-02-16 2023-04-28 之江实验室 Industrial anomaly detection model training method and device based on multi-model fusion
CN116091895B (en) * 2023-04-04 2023-07-11 之江实验室 Model training method and device oriented to multitask knowledge fusion
CN116091895A (en) * 2023-04-04 2023-05-09 之江实验室 Model training method and device oriented to multitask knowledge fusion
CN116662814B (en) * 2023-07-28 2023-10-31 腾讯科技(深圳)有限公司 Object intention prediction method, device, computer equipment and storage medium
CN116662814A (en) * 2023-07-28 2023-08-29 腾讯科技(深圳)有限公司 Object intention prediction method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111160409A (en) Heterogeneous neural network knowledge reorganization method based on common feature learning
CN111897941A (en) Dialog generation method, network training method, device, storage medium and equipment
CN111026842A (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN112116092B (en) Interpretable knowledge level tracking method, system and storage medium
CN114386694A (en) Drug molecule property prediction method, device and equipment based on comparative learning
Yang et al. Visual curiosity: Learning to ask questions to learn visual recognition
CN107077487A (en) Personal photo is tagged using depth network
CN114186084B (en) Online multi-mode Hash retrieval method, system, storage medium and equipment
CN115186097A (en) Knowledge graph and reinforcement learning based interactive recommendation method
El Gourari et al. The implementation of deep reinforcement learning in e-learning and distance learning: Remote practical work
CN113821527A (en) Hash code generation method and device, computer equipment and storage medium
CN116136870A (en) Intelligent social conversation method and conversation system based on enhanced entity representation
Zhu et al. Learning to transfer learn: Reinforcement learning-based selection for adaptive transfer learning
Tang et al. A practical exploration of constructive english learning platform informatization based on rbf algorithm
CN116738371B (en) User learning portrait construction method and system based on artificial intelligence
CN114281955A (en) Dialogue processing method, device, equipment and storage medium
Kamil et al. Literature Review of Generative models for Image-to-Image translation problems
CN114880527B (en) Multi-modal knowledge graph representation method based on multi-prediction task
CN115795993A (en) Layered knowledge fusion method and device for bidirectional discriminant feature alignment
CN112907004B (en) Learning planning method, device and computer storage medium
CN113535911B (en) Reward model processing method, electronic device, medium and computer program product
Xie et al. Skillearn: Machine learning inspired by humans' learning skills
CN113742591A (en) Learning partner recommendation method and device, electronic equipment and storage medium
CN113887471A (en) Video time sequence positioning method based on feature decoupling and cross comparison
CN115619363A (en) Interviewing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200515)