CN111160409A - Heterogeneous neural network knowledge reorganization method based on common feature learning - Google Patents
Heterogeneous neural network knowledge reorganization method based on common feature learning
- Publication number
- CN111160409A CN111160409A CN201911265852.2A CN201911265852A CN111160409A CN 111160409 A CN111160409 A CN 111160409A CN 201911265852 A CN201911265852 A CN 201911265852A CN 111160409 A CN111160409 A CN 111160409A
- Authority
- CN
- China
- Prior art keywords
- model
- teacher
- student
- models
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The heterogeneous neural network knowledge reorganization method based on common feature learning comprises the following steps: acquiring a plurality of pre-trained neural network models, called teacher models; using the features and predictions output by the teacher models to guide the training of a student model through common feature learning and soft target distillation. In the common feature learning process, the features of the multiple heterogeneous networks are projected into a common feature space and the student model integrates the knowledge of the multiple teacher models, while the soft target distillation method makes the student model's predictions consistent with the teacher models' predictions, yielding a stronger student model with the task-handling capability of all the teacher models. Because only the teacher models' predictions need to be imitated, the student model can be trained without any manual labeling. The method is suitable for knowledge reorganization of neural network models, in particular of heterogeneous image classification task models.
Description
Technical Field
The invention relates to the field of machine learning, and in particular to a heterogeneous neural network knowledge reorganization method based on common feature learning.
Background
In recent years, deep neural networks (DNNs) have enjoyed dramatic success in a multitude of artificial intelligence tasks such as computer vision and natural language processing. However, despite these extraordinary results, the training of DNN models relies heavily on large-scale manually labeled datasets and takes a long time. To ease the reproduction effort, more and more researchers are publishing trained models on the internet for users to download and use instantly. Reusing the released models to obtain customized models with multi-task capability, without manual data labeling, is therefore of great significance. However, owing to the rapid development of deep learning and the consequent emergence of a large number of network variants, such publicly available trained models often have varying network structures, each oriented to a particular task or dataset, which poses challenges to the fused reorganization of these models.
In the present invention, the inventors address a deep model fusion and reuse task, with the goal of training a lightweight, multi-task-capable student model using multiple task-oriented heterogeneous teacher models. The method can use several pre-trained teacher models to train a student model competent for all the teacher models' tasks, without manually labeled information. The traditional knowledge distillation method targets only a single teacher model, and its goal is model compression, i.e., using a small network model to imitate and learn the predictions of a trained large network model, as described in "Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015". That single-teacher setting does not extend directly to multiple heterogeneous teachers, whose intermediate features differ in dimension and semantics. Therefore, the present invention resorts to another approach: the output features of the teacher models are projected into a shared learnable feature space, and the student model is then forced to imitate the transformed teacher features; by imitating the teacher networks' outputs in both features and predictions, a powerful student model is obtained by training, which fuses the comprehensive knowledge from the heterogeneous teacher models without access to manual labels and can solve the tasks of all the teacher models.
Disclosure of Invention
The invention provides a heterogeneous neural network knowledge reorganization method based on common feature learning. First, the task addressed by the method is defined: given several pre-trained teacher networks, the goal of the invention is to learn a student model that fuses the knowledge of all the teacher models and is competent for their tasks, without annotated data. The teacher models may be the same or different in architecture; no particular limitation is imposed.
A heterogeneous neural network knowledge reorganization method based on common feature learning comprises the following steps:
step 1, selecting a suitable student model structure according to the customization requirements and initializing it randomly; inputting the same unlabeled image data into the teacher models and the student model to obtain their original output features F_{Ti} and F_S, respectively; and converting and aligning the two with adaptation layers to obtain f_{Ti} and f_S of consistent size;
step 2, introducing a small learnable sub-network, called the shared extractor, whose parameters are shared between the teachers and the student, i.e., every teacher model and the student model use the same shared feature extractor parameters; the shared extractor converts the aligned teacher and student features into compatible features in a common feature space, mapping f_{Ti} and f_S to common-space features \tilde{f}_{Ti} and \tilde{f}_S;
step 3, measuring the distribution differences among the transformed features obtained in step 2 with the Maximum Mean Discrepancy (MMD) method, fusing the teacher features, and adapting the domains of the teacher and student transformed features. Specifically: let f^t = {f^t_i}_{i=1}^{C_t} denote the set of all features of a teacher, where C_t is the total number of teacher features; similarly, let f^s = {f^s_j}_{j=1}^{C_s} denote the set of all student features, where C_s is the total number of student features. The MMD distance between f^t and f^s is approximated as:

MMD^2(f^t, f^s) = \| (1/C_t) \sum_{i=1}^{C_t} \phi(f^t_i) - (1/C_s) \sum_{j=1}^{C_s} \phi(f^s_j) \|^2    (1)

where \phi(\cdot) is an implicit mapping function. Expanding this expression with a kernel function k(\cdot, \cdot), the MMD loss is defined as:

L_{MMD}(f^t, f^s) = (1/C_t^2) \sum_{i=1}^{C_t} \sum_{i'=1}^{C_t} k(f^t_i, f^t_{i'}) + (1/C_s^2) \sum_{j=1}^{C_s} \sum_{j'=1}^{C_s} k(f^s_j, f^s_{j'}) - (2/(C_t C_s)) \sum_{i=1}^{C_t} \sum_{j=1}^{C_s} k(f^t_i, f^s_j)    (2)

The kernel function projects the sample vectors into a higher-dimensional feature space; note that the normalized features \tilde{f}_{Ti} and \tilde{f}_S are used here. The MMD losses between the student model and the N teachers are then combined to define the total loss L_M of common feature space learning:

L_M = \sum_{i=1}^{N} L_{MMD}(\tilde{f}_{Ti}, \tilde{f}_S)    (3)
step 4, inputting the transformed features into a trainable auto-encoder to reconstruct the original output features of the teacher models. Let F'_{Ti} denote the reconstruction of the teacher's original features F_{Ti}; the difference between the reconstructed and original features is measured, and the reconstruction loss L_R is defined as:

L_R = \sum_{i=1}^{N} \| F'_{Ti} - F_{Ti} \|^2    (4)

Minimizing L_R ensures that the features mapped into the common space can be mapped back to the original features, so that as little information as possible is lost during the feature transformation, making the learning of the common feature space more robust;
step 5, making the student model imitate the teacher models' predictions on the input unlabeled samples, and taking the difference between the student's and teachers' predictions on the same task as an additional loss function, namely the target distillation loss. Specifically, for image classification, when the teachers' target classes do not overlap, their score vectors are directly concatenated, and the concatenated score vector serves as the student model's learning target. The same strategy is applied to overlapping teachers: during training, overlapping classes are treated as multiple distinct classes, but during testing they are treated as the same class. Let w_i denote the parameters that map the i-th teacher model's output features to its score vector, and w_S the corresponding parameters of the student; the loss function L_C driving the student network's response scores towards the teachers' predicted targets is:

L_C = \| w_S \cdot F_S - [w_1 \cdot F_{T1}, ..., w_N \cdot F_{TN}] \|^2    (5)

step 6, combining the losses defined in steps 3, 4 and 5 through a hyper-parameter weight α to form the overall loss function of the network, and computing its value:

L = L_C + (1 - α)(L_M + L_R),  α ∈ [0, 1]    (6)
and step 7, computing the gradients of the network, updating the parameters of the whole network model in the direction that minimizes the overall loss to obtain the updated network, returning to step 1, and iterating the whole training process until the loss function converges; the resulting student model is the target model.
Preferably, the structure of the teacher models in step 1 includes, but is not limited to, residual networks (ResNet) and VGG networks, and the structure of the student model depends on actual needs.
Preferably, the adaptation layer in step 1 consists of, but is not limited to, several layers of 1 × 1 convolution; the adaptation layer parameters of each teacher model and of the student model are different and are obtained through learning. The number of adaptation layer channels can be set to an empirical value of 256, or according to actual requirements.
Preferably, the shared feature extractor in step 2 is a small convolutional network composed of three residual modules with stride 1; in addition, the number of channels of \tilde{f}_{Ti} and \tilde{f}_S is set to 128, an empirical value that can be adjusted as appropriate in practice.
Preferably, in the soft target distillation module of step 5, the target distillation loss L_C is defined as the difference between the student network's response scores and the teacher models' predicted scores, and may be measured using methods including, but not limited to, the mean squared error (MSE).
The heterogeneous neural network knowledge reorganization method based on common feature learning comprises the following steps: acquiring a plurality of pre-trained neural network models, called teacher models; using the features and predictions output by the teacher models to guide the training of a student model through common feature learning and soft target distillation. In the common feature learning process, the features of the multiple heterogeneous networks are projected into a common feature space and the student model integrates the knowledge of the multiple teacher models, while the soft target distillation method makes the student model's predictions consistent with the teacher models' predictions, yielding a lightweight student model with the task-handling capability of all the teacher models and stronger performance. Because only the teacher models' predictions need to be imitated, the student model can be trained without any manual labeling. The method is suitable for knowledge reorganization of neural network models, in particular of heterogeneous image classification task models.
The invention has the following advantages: by reusing published models, a customized model with multi-task processing capability can be trained without manual labeling, making full use of existing resources and saving a large amount of labeling cost.
Drawings
FIG. 1 is a general block diagram of the process of the present invention.
FIG. 2 is a schematic diagram of a specific structure of a common feature learning module in the method of the present invention.
Detailed Description
The method of the present invention will be described in detail below with reference to the accompanying drawings and embodiments, so that how the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented. It should be noted that, as long as there is no conflict, the embodiments of the present invention and the features in the embodiments may be combined with each other, and the resulting technical solutions all fall within the protection scope of the present invention.
The overall framework of the heterogeneous neural network knowledge reorganization method based on common feature learning provided by the invention is shown in Figure 1. Assume there are N teacher networks, the i-th of which is denoted T_i. The method comprises the following steps:
Step 1: aligning the output features of the teacher models and the student model under the same input.
A suitable student model structure is selected according to the customization requirements and initialized randomly. The same unlabeled image data is input into the teacher models and the student model to obtain their original output features F_{Ti} and F_S, respectively. Since the teacher and student models differ in structure, F_{Ti} and F_S may also be inconsistent in size, so step 1 uses adaptation layers to convert them into f_{Ti} and f_S of consistent size. The adaptation layer consists of, but is not limited to, several 1 × 1 convolutions, and the adaptation layer parameters of each teacher model and of the student model are different and obtained through learning. In the implementation of the present invention, the number of adaptation layer channels is set to 256.
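As an illustration of step 1, a 1 × 1 convolution is simply a per-position linear projection over the channel dimension. The following NumPy sketch shows how feature maps from heterogeneous backbones can be aligned to a common channel count; the shapes and random weights are illustrative stand-ins for learned parameters, not the patented implementation itself:

```python
import numpy as np

def adapt_1x1(features, weight, bias):
    """Apply a 1x1 convolution, i.e. a per-pixel linear map over channels.

    features: (C_in, H, W) feature map from a teacher or student backbone.
    weight:   (C_out, C_in) projection matrix, learned separately per model.
    bias:     (C_out,) bias vector.
    Returns a (C_out, H, W) map, so heterogeneous backbones can all be
    aligned to the same channel count (256 in the embodiment).
    """
    c_in, h, w = features.shape
    flat = features.reshape(c_in, h * w)      # (C_in, H*W)
    out = weight @ flat + bias[:, None]       # (C_out, H*W)
    return out.reshape(-1, h, w)

# A 512-channel teacher and a 128-channel student are both projected
# to a common 256-channel size (shapes are illustrative).
rng = np.random.default_rng(0)
f_teacher = rng.standard_normal((512, 7, 7))
f_student = rng.standard_normal((128, 7, 7))
w_t, b_t = rng.standard_normal((256, 512)) * 0.01, np.zeros(256)
w_s, b_s = rng.standard_normal((256, 128)) * 0.01, np.zeros(256)
```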
Step 2: transforming into the common feature space.
A small learnable sub-network is introduced whose parameters are shared between the teachers and the student (i.e., the parameters of the shared feature extractor are identical for every teacher and student model), hence the name shared extractor; through it, the aligned teacher and student features are transformed into consistent features in a common feature space. The shared feature extractor is a small convolutional network consisting of three residual modules with stride 1. It maps f_{Ti} and f_S into the common-space features \tilde{f}_{Ti} and \tilde{f}_S. In the specific practice of the present invention, the number of channels of \tilde{f}_{Ti} and \tilde{f}_S is set to 128, an empirical value that can be adjusted as appropriate in practice.
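The shared extractor can be sketched as follows. For brevity the residual modules are modeled here as 1 × 1 channel-mixing maps with a ReLU and an identity shortcut, and the reduction to 128 common channels is omitted — both are simplifying assumptions. The essential property shown is that the same parameters are applied to every teacher's and the student's aligned features:

```python
import numpy as np

def residual_block(x, w1, w2):
    """One stride-1 residual module: two channel-mixing (1x1) maps with a
    ReLU in between, plus an identity shortcut; channel count is preserved."""
    c, h, w = x.shape
    flat = x.reshape(c, -1)                    # (C, H*W)
    out = w2 @ np.maximum(w1 @ flat, 0.0)      # linear -> ReLU -> linear
    return (flat + out).reshape(c, h, w)

def shared_extractor(x, params):
    """Three residual modules; the SAME `params` are applied to every
    teacher's and the student's aligned features, so all of them are
    mapped into one common feature space."""
    for w1, w2 in params:
        x = residual_block(x, w1, w2)
    return x

rng = np.random.default_rng(1)
params = [(rng.standard_normal((256, 256)) * 0.01,
           rng.standard_normal((256, 256)) * 0.01) for _ in range(3)]
f_aligned = rng.standard_normal((256, 7, 7))   # output of an adaptation layer
common = shared_extractor(f_aligned, params)
```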
Step 3: computing the common feature learning loss.
The Maximum Mean Discrepancy (MMD) method is used to measure the distribution differences among the transformed features obtained in step 2, fuse the teacher features, and adapt the domains of the teacher and student transformed features. MMD can be regarded as a distance measure between probability distributions and is commonly used as a domain-matching measure in domain adaptation tasks, aligning the domains of the student and teacher models; it is described in detail in "Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773, 2012". In this step, MMD is used to measure the distribution difference between the student's and teachers' transformed features and serves as a loss function; minimizing this loss increases the similarity between the student's and teachers' transformed features, thereby transferring the teachers' knowledge to the student model.
Taking the feature similarity measure between the student and one teacher model as an example, let f^t = {f^t_i}_{i=1}^{C_t} denote the set of all teacher features, where C_t is the total number of teacher features. Similarly, let f^s = {f^s_j}_{j=1}^{C_s} denote the set of all student features, where C_s is the total number of student features. The MMD distance between f^t and f^s is approximated as:

MMD^2(f^t, f^s) = \| (1/C_t) \sum_{i=1}^{C_t} \phi(f^t_i) - (1/C_s) \sum_{j=1}^{C_s} \phi(f^s_j) \|^2    (1)

where \phi(\cdot) is an implicit mapping function. Expanding this expression with a kernel function k(\cdot, \cdot), the MMD loss is defined as:

L_{MMD}(f^t, f^s) = (1/C_t^2) \sum_{i=1}^{C_t} \sum_{i'=1}^{C_t} k(f^t_i, f^t_{i'}) + (1/C_s^2) \sum_{j=1}^{C_s} \sum_{j'=1}^{C_s} k(f^s_j, f^s_{j'}) - (2/(C_t C_s)) \sum_{i=1}^{C_t} \sum_{j=1}^{C_s} k(f^t_i, f^s_j)    (2)

The kernel function projects the sample vectors into a higher-dimensional feature space; note that the normalized features \tilde{f}_{Ti} and \tilde{f}_S are used here. The MMD losses between the student model and the N teachers are then combined to define the total loss L_M of common feature space learning:

L_M = \sum_{i=1}^{N} L_{MMD}(\tilde{f}_{Ti}, \tilde{f}_S)    (3)
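The MMD computation of formulas (1)–(3) can be sketched in NumPy as follows; the Gaussian RBF kernel is an assumed choice for k(·, ·), which the text does not fix, and each row of the input arrays is treated as one feature vector:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def mmd_loss(f_t, f_s, gamma=1.0):
    """Kernel expansion of the squared MMD between teacher features
    f_t (C_t, D) and student features f_s (C_s, D), as in formula (2)."""
    ct, cs = len(f_t), len(f_s)
    return (rbf_kernel(f_t, f_t, gamma).sum() / ct**2
            + rbf_kernel(f_s, f_s, gamma).sum() / cs**2
            - 2.0 * rbf_kernel(f_t, f_s, gamma).sum() / (ct * cs))

def common_feature_loss(teacher_feats, f_s, gamma=1.0):
    """L_M of formula (3): sum of the MMD losses between the student and
    each of the N teachers, all measured in the common feature space."""
    return sum(mmd_loss(f_t, f_s, gamma) for f_t in teacher_feats)
```

Minimizing this loss pulls the student's common-space feature distribution towards each teacher's, which is exactly the knowledge-transfer effect described above.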
and 4, calculating the characteristic reconstruction loss.
The transferred features are input into a trainable self-encoder to reconstruct the original output features of the teacher model. F'TiRepresenting original characteristics F of teacher modelTiMeasure the difference between the reconstructed features and the original features and define the reconstruction loss LRIs defined as:
by measuring LRThe features converted into the public space can be reversely mapped into the original features, so that the loss of information as little as possible in the feature conversion process is ensured, and the learning of the public feature space is more robust.
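The reconstruction loss of formula (4) can be sketched as follows, with a single linear map per teacher standing in for the trainable auto-encoder (an assumption made for brevity):

```python
import numpy as np

def reconstruction_loss(common_feats, original_feats, decoders):
    """L_R of formula (4): decode each teacher's common-space feature back
    towards its original feature F_Ti and accumulate the squared error.
    decoders[i] is a (D_orig, D_common) linear decoder standing in for
    the patent's trainable auto-encoder."""
    total = 0.0
    for f_common, f_orig, dec in zip(common_feats, original_feats, decoders):
        f_rec = dec @ f_common                 # F'_Ti, the reconstruction
        total += float(((f_rec - f_orig) ** 2).sum())
    return total

# With a perfect (identity) decoder the reconstruction loss vanishes.
f = np.arange(6.0).reshape(3, 2)
assert reconstruction_loss([f], [f], [np.eye(3)]) == 0.0
```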
Steps 1 to 4 together form the common feature space learning module, whose specific implementation is shown in Figure 2. In general, the module transforms the features of the teacher models and of the student model to be trained into a common feature space through the adaptation layers and the shared extractor, whose parameters are learnable. During feature learning, two loss terms are applied: the common feature loss L_M and the reconstruction loss L_R. The former encourages the student features to approach the teacher models' transformed features in the common space, while the latter keeps the error between the transformed and original features minimal.
Step 5: making the student model imitate the teacher models' predictions on the input unlabeled samples, and computing the target distillation loss.
The teacher models' predictions on the input unlabeled samples are used to guide the training of the student model, so that the student model outputs predictions identical or similar to the teachers'. Specifically, for the image classification task, when the teachers' target classes do not overlap, their score vectors are directly concatenated, and the concatenated score vector serves as the student model's learning target. In fact, the same strategy is used for overlapping teachers: in training, overlapping classes are treated as multiple distinct classes, but during testing they are treated as the same class. Let w_i denote the parameters that map the i-th teacher model's output features to its score vector, and w_S the corresponding parameters of the student; the loss function L_C driving the student network's response scores towards the teachers' predicted targets is:

L_C = \| w_S \cdot F_S - [w_1 \cdot F_{T1}, ..., w_N \cdot F_{TN}] \|^2    (5)
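The target distillation loss can be sketched directly from formula (5): the teachers' score vectors are concatenated into one target for the student, and a squared-error measure is applied:

```python
import numpy as np

def target_distillation_loss(student_scores, teacher_scores):
    """L_C of formula (5): the concatenation [w_1*F_T1, ..., w_N*F_TN] of
    the teachers' score vectors is the student's learning target; the loss
    is the squared error between the student's scores and this target."""
    target = np.concatenate(teacher_scores)
    assert student_scores.shape == target.shape, (
        "the student head must cover the classes of all teachers")
    return float(((student_scores - target) ** 2).sum())

# Two teachers with 3 and 2 classes -> a 5-way student target.
t1 = np.array([2.0, 0.5, -1.0])
t2 = np.array([0.1, 1.2])
student = np.concatenate([t1, t2])             # a perfect student
assert target_distillation_loss(student, [t1, t2]) == 0.0
```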
Step 6: computing the total loss.
The loss functions shown in formulas (3), (4) and (5) are combined to obtain the end-to-end total training loss function of the student network:

L = L_C + (1 - α)(L_M + L_R),  α ∈ [0, 1]    (6)

Here α is a hyper-parameter that balances the three loss terms in formula (6). The overall loss function is computed by forward propagation through the entire neural network model.
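The combination of formula (6) is then a one-liner (the loss values in the checks below are placeholders):

```python
def total_loss(l_c, l_m, l_r, alpha=0.5):
    """Formula (6): L = L_C + (1 - alpha) * (L_M + L_R), where the
    hyper-parameter alpha in [0, 1] trades the distillation term off
    against the common-feature and reconstruction terms."""
    assert 0.0 <= alpha <= 1.0
    return l_c + (1.0 - alpha) * (l_m + l_r)

assert total_loss(1.0, 2.0, 3.0, alpha=1.0) == 1.0   # alpha = 1 keeps only L_C
assert total_loss(1.0, 2.0, 3.0, alpha=0.0) == 6.0   # alpha = 0 adds all terms
```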
Step 7: back-propagating and updating the network parameters.
The gradients of the trainable network shown in Figure 1 are computed, the parameters of the whole network model are updated in the direction that minimizes the overall loss, and the process returns to step 1 with the updated network; the whole training process is iterated until final convergence, and the resulting student model is the target model.
TABLE 1
Table 1 shows the experimental results of a specific example: two teacher models are given, trained respectively on two subsets of the Stanford Dogs dataset or of the Caltech-101 dataset, with network structures of an 18-layer residual network (ResNet-18) and a 34-layer residual network (ResNet-34), respectively. Knowledge reorganization is performed on the two teacher models with the present method to obtain the student model, which is compared with other methods on the Stanford Dogs and Caltech-101 datasets in terms of classification accuracy. Table 1 shows that the student model, trained without manual labels by the method of the present invention, outperforms both teachers on their respective tasks, and even outperforms models obtained by model ensembling, classical knowledge distillation, and even training with real data labels.
TABLE 2
| Model | LFW dataset | AgeDB-30 dataset | CFP-FP dataset |
|---|---|---|---|
| T1 | 97.43% | 84.72% | 86.20% |
| T2 | 97.80% | 85.87% | 87.27% |
| Knowledge distillation method | 95.15% | 84.97% | 86.87% |
| Method of the invention | 98.10% | 86.93% | 87.73% |
Table 2 shows the experimental results of another example, comparing the method of the present invention with the conventional knowledge distillation method. Each teacher model in the table is trained on a subset of 3000 classes of the CASIA dataset.
TABLE 3
| Model | Stanford Dogs | CUB dataset | FGVC-Aircraft | Stanford Cars |
|---|---|---|---|---|
| Single teacher model | 87.1% | 75.6% | 73.2% | 82.9% |
| 2-teacher fusion | 84.3% | 78.9% | - | - |
| 3-teacher fusion | 83.1% | 77.7% | 79.0% | - |
| 4-teacher fusion | 82.5% | 77.5% | 78.3% | 84.2% |
Table 3 shows how the number of teachers affects the performance of the student model. The teacher models are obtained by training respectively on four different classification task datasets.
TABLE 4
Table 4 compares the classification accuracy of the present method and the knowledge distillation method on a Stanford dataset, using teacher and student models of various different structures.
The embodiments described in this specification are merely illustrative of the inventive concept; the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments, but also covers the equivalents that those skilled in the art can conceive on the basis of the inventive concept.
Claims (5)
1. A heterogeneous neural network knowledge reorganization method based on common feature learning, comprising the following steps:
step 1, selecting a suitable student model structure according to the customization requirements and initializing it randomly; inputting the same unlabeled image data into the teacher models and the student model to obtain their original output features F_{Ti} and F_S, respectively; and converting and aligning the two with adaptation layers to obtain f_{Ti} and f_S of consistent size;
step 2, introducing a small learnable sub-network, called the shared extractor, whose parameters are shared between the teachers and the student, i.e., every teacher model and the student model use the same shared feature extractor parameters; the shared extractor converts the aligned teacher and student features into compatible features in a common feature space, mapping f_{Ti} and f_S to common-space features \tilde{f}_{Ti} and \tilde{f}_S;
step 3, measuring the distribution differences among the transformed features obtained in step 2 with the Maximum Mean Discrepancy (MMD) method, fusing the teacher features, and adapting the domains of the teacher and student transformed features; specifically: letting f^t = {f^t_i}_{i=1}^{C_t} denote the set of all features of a teacher, where C_t is the total number of teacher features, and similarly letting f^s = {f^s_j}_{j=1}^{C_s} denote the set of all student features, where C_s is the total number of student features, the MMD distance between f^t and f^s is approximated as:

MMD^2(f^t, f^s) = \| (1/C_t) \sum_{i=1}^{C_t} \phi(f^t_i) - (1/C_s) \sum_{j=1}^{C_s} \phi(f^s_j) \|^2    (1)

where \phi(\cdot) is an implicit mapping function; expanding this expression with a kernel function k(\cdot, \cdot), the MMD loss is defined as:

L_{MMD}(f^t, f^s) = (1/C_t^2) \sum_{i=1}^{C_t} \sum_{i'=1}^{C_t} k(f^t_i, f^t_{i'}) + (1/C_s^2) \sum_{j=1}^{C_s} \sum_{j'=1}^{C_s} k(f^s_j, f^s_{j'}) - (2/(C_t C_s)) \sum_{i=1}^{C_t} \sum_{j=1}^{C_s} k(f^t_i, f^s_j)    (2)

the kernel function projects the sample vectors into a higher-dimensional feature space, and the normalized features \tilde{f}_{Ti} and \tilde{f}_S are used here; the MMD losses between the student model and the N teachers are then combined to define the total loss L_M of common feature space learning:

L_M = \sum_{i=1}^{N} L_{MMD}(\tilde{f}_{Ti}, \tilde{f}_S)    (3)
step 4, inputting the transformed features into a trainable auto-encoder to reconstruct the original output features of the teacher models; letting F'_{Ti} denote the reconstruction of the teacher's original features F_{Ti}, and measuring the difference between the reconstructed and original features, the reconstruction loss L_R is defined as:

L_R = \sum_{i=1}^{N} \| F'_{Ti} - F_{Ti} \|^2    (4)

minimizing L_R ensures that the features mapped into the common space can be mapped back to the original features, so that as little information as possible is lost during the feature transformation, making the learning of the common feature space more robust;
step 5, making the student model imitate the teacher models' predictions on the input unlabeled samples, and taking the difference between the student's and teachers' predictions on the same task as an additional loss function, namely the target distillation loss; specifically, for image classification, when the teachers' target classes do not overlap, their score vectors are directly concatenated, and the concatenated score vector serves as the student model's learning target; the same strategy is applied to overlapping teachers: during training, overlapping classes are treated as multiple distinct classes, but during testing they are treated as the same class; letting w_i denote the parameters that map the i-th teacher model's output features to its score vector, and w_S the corresponding parameters of the student, the loss function L_C driving the student network's response scores towards the teachers' predicted targets is:

L_C = \| w_S \cdot F_S - [w_1 \cdot F_{T1}, ..., w_N \cdot F_{TN}] \|^2    (5)

step 6, combining the losses defined in steps 3, 4 and 5 through the hyper-parameter weight α to form the overall loss function of the network, and computing its value:

L = L_C + (1 - α)(L_M + L_R),  α ∈ [0, 1]    (6)
and step 7, computing the gradients of the network, updating the parameters of the whole network model in the direction that minimizes the overall loss to obtain the updated network, returning to step 1, and iterating the whole training process until the loss function converges; the resulting student model is the target model.
2. The heterogeneous neural network knowledge reorganization method based on common feature learning according to claim 1, wherein: the structure of the teacher models in step 1 includes, but is not limited to, residual networks and VGG networks, and the structure of the student model is determined according to actual requirements.
3. The heterogeneous neural network knowledge reorganization method based on common feature learning according to claim 1, wherein: the adaptation layer in step 1 consists of, but is not limited to, several layers of 1 × 1 convolution, and the adaptation layer parameters of each teacher model and of the student model are different and obtained through learning; the number of adaptation layer channels can be set to an empirical value of 256, or according to actual requirements.
4. The heterogeneous neural network knowledge reorganization method based on common feature learning according to claim 1, wherein: the shared feature extractor in step 2 is a small convolutional network consisting of three residual modules with stride 1; in addition, the number of channels of \tilde{f}_{Ti} and \tilde{f}_S is set to 128, an empirical value that can be adjusted as appropriate in practice.
5. The heterogeneous neural network knowledge reorganization method based on common feature learning according to claim 1, wherein: in the soft target distillation module described in step 5, the target distillation loss L_C is defined as the difference between the student network's response scores and the teacher models' predicted scores, and may be measured using methods including, but not limited to, the mean squared error (MSE).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911265852.2A CN111160409A (en) | 2019-12-11 | 2019-12-11 | Heterogeneous neural network knowledge reorganization method based on common feature learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911265852.2A CN111160409A (en) | 2019-12-11 | 2019-12-11 | Heterogeneous neural network knowledge reorganization method based on common feature learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111160409A true CN111160409A (en) | 2020-05-15 |
Family
ID=70556975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911265852.2A Withdrawn CN111160409A (en) | 2019-12-11 | 2019-12-11 | Heterogeneous neural network knowledge reorganization method based on common feature learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111160409A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097178A (en) * | 2019-05-15 | 2019-08-06 | 电科瑞达(成都)科技有限公司 | Entropy-attention-based neural network model compression and acceleration method |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097178A (en) * | 2019-05-15 | 2019-08-06 | 电科瑞达(成都)科技有限公司 | Entropy-attention-based neural network model compression and acceleration method |
Non-Patent Citations (1)
Title |
---|
SIHUI LUO ET AL.: "Knowledge Amalgamation from Heterogeneous Networks", arXiv preprint arXiv:1906.10546 * |
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111695698A (en) * | 2020-06-12 | 2020-09-22 | 北京百度网讯科技有限公司 | Method, device, electronic equipment and readable storage medium for model distillation |
CN111695698B (en) * | 2020-06-12 | 2023-09-12 | 北京百度网讯科技有限公司 | Method, apparatus, electronic device, and readable storage medium for model distillation |
WO2022001805A1 (en) * | 2020-06-30 | 2022-01-06 | 华为技术有限公司 | Neural network distillation method and device |
CN111754985B (en) * | 2020-07-06 | 2023-05-02 | 上海依图信息技术有限公司 | Training of voice recognition model and voice recognition method and device |
CN111754985A (en) * | 2020-07-06 | 2020-10-09 | 上海依图信息技术有限公司 | Method and device for training voice recognition model and voice recognition |
CN111783899A (en) * | 2020-07-10 | 2020-10-16 | 安徽启新明智科技有限公司 | Method for identifying novel contraband through autonomous learning |
CN111783899B (en) * | 2020-07-10 | 2023-08-15 | 安徽启新明智科技有限公司 | Method for autonomously learning and identifying novel contraband |
CN112163238A (en) * | 2020-09-09 | 2021-01-01 | 中国科学院信息工程研究所 | Network model training method for multi-party participation data unshared |
CN112163238B (en) * | 2020-09-09 | 2022-08-16 | 中国科学院信息工程研究所 | Network model training method for multi-party participation data unshared |
CN112164054A (en) * | 2020-09-30 | 2021-01-01 | 交叉信息核心技术研究院(西安)有限公司 | Knowledge distillation-based image target detection method and detector and training method thereof |
CN112508169A (en) * | 2020-11-13 | 2021-03-16 | 华为技术有限公司 | Knowledge distillation method and system |
CN112329725A (en) * | 2020-11-27 | 2021-02-05 | 腾讯科技(深圳)有限公司 | Method, device and equipment for identifying elements of road scene and storage medium |
CN112329725B (en) * | 2020-11-27 | 2022-03-25 | 腾讯科技(深圳)有限公司 | Method, device and equipment for identifying elements of road scene and storage medium |
CN112418343B (en) * | 2020-12-08 | 2024-01-05 | 中山大学 | Multi-teacher self-adaptive combined student model training method |
CN112418343A (en) * | 2020-12-08 | 2021-02-26 | 中山大学 | Multi-teacher self-adaptive joint knowledge distillation |
CN112560631A (en) * | 2020-12-09 | 2021-03-26 | 昆明理工大学 | Knowledge distillation-based pedestrian re-identification method |
WO2022120996A1 (en) * | 2020-12-10 | 2022-06-16 | 中国科学院深圳先进技术研究院 | Visual position recognition method and apparatus, and computer device and readable storage medium |
CN112529162B (en) * | 2020-12-15 | 2024-02-27 | 北京百度网讯科技有限公司 | Neural network model updating method, device, equipment and storage medium |
CN112529162A (en) * | 2020-12-15 | 2021-03-19 | 北京百度网讯科技有限公司 | Neural network model updating method, device, equipment and storage medium |
CN112801209A (en) * | 2021-02-26 | 2021-05-14 | 同济大学 | Image classification method based on dual-length teacher model knowledge fusion and storage medium |
CN113222123A (en) * | 2021-06-15 | 2021-08-06 | 深圳市商汤科技有限公司 | Model training method, device, equipment and computer storage medium |
CN113469977A (en) * | 2021-07-06 | 2021-10-01 | 浙江霖研精密科技有限公司 | Flaw detection device and method based on distillation learning mechanism and storage medium |
CN113469977B (en) * | 2021-07-06 | 2024-01-12 | 浙江霖研精密科技有限公司 | Flaw detection device, method and storage medium based on distillation learning mechanism |
CN113792871A (en) * | 2021-08-04 | 2021-12-14 | 北京旷视科技有限公司 | Neural network training method, target identification method, device and electronic equipment |
CN113592007A (en) * | 2021-08-05 | 2021-11-02 | 哈尔滨理工大学 | Knowledge distillation-based bad picture identification system and method, computer and storage medium |
CN113360777A (en) * | 2021-08-06 | 2021-09-07 | 北京达佳互联信息技术有限公司 | Content recommendation model training method, content recommendation method and related equipment |
CN113360777B (en) * | 2021-08-06 | 2021-12-07 | 北京达佳互联信息技术有限公司 | Content recommendation model training method, content recommendation method and related equipment |
CN113822373A (en) * | 2021-10-27 | 2021-12-21 | 南京大学 | Image classification model training method based on integration and knowledge distillation |
CN113822373B (en) * | 2021-10-27 | 2023-09-15 | 南京大学 | Image classification model training method based on integration and knowledge distillation |
CN114743243A (en) * | 2022-04-06 | 2022-07-12 | 平安科技(深圳)有限公司 | Human face recognition method, device, equipment and storage medium based on artificial intelligence |
CN114743243B (en) * | 2022-04-06 | 2024-05-31 | 平安科技(深圳)有限公司 | Human face recognition method, device, equipment and storage medium based on artificial intelligence |
CN114970862B (en) * | 2022-04-28 | 2024-05-28 | 北京航空航天大学 | PDL1 expression level prediction method based on multi-instance knowledge distillation model |
CN114970862A (en) * | 2022-04-28 | 2022-08-30 | 北京航空航天大学 | PDL1 expression level prediction method based on multi-instance knowledge distillation model |
CN115204394A (en) * | 2022-07-05 | 2022-10-18 | 上海人工智能创新中心 | Knowledge distillation method for target detection |
WO2024032386A1 (en) * | 2022-08-08 | 2024-02-15 | Huawei Technologies Co., Ltd. | Systems and methods for artificial-intelligence model training using unsupervised domain adaptation with multi-source meta-distillation |
WO2024066111A1 (en) * | 2022-09-28 | 2024-04-04 | 北京大学 | Image processing model training method and apparatus, image processing method and apparatus, and device and medium |
CN116028891A (en) * | 2023-02-16 | 2023-04-28 | 之江实验室 | Industrial anomaly detection model training method and device based on multi-model fusion |
CN116091895B (en) * | 2023-04-04 | 2023-07-11 | 之江实验室 | Model training method and device oriented to multitask knowledge fusion |
CN116091895A (en) * | 2023-04-04 | 2023-05-09 | 之江实验室 | Model training method and device oriented to multitask knowledge fusion |
CN116662814B (en) * | 2023-07-28 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Object intention prediction method, device, computer equipment and storage medium |
CN116662814A (en) * | 2023-07-28 | 2023-08-29 | 腾讯科技(深圳)有限公司 | Object intention prediction method, device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111160409A (en) | Heterogeneous neural network knowledge reorganization method based on common feature learning | |
CN111897941A (en) | Dialog generation method, network training method, device, storage medium and equipment | |
CN111026842A (en) | Natural language processing method, natural language processing device and intelligent question-answering system | |
CN112116092B (en) | Interpretable knowledge level tracking method, system and storage medium | |
CN114386694A (en) | Drug molecule property prediction method, device and equipment based on comparative learning | |
Yang et al. | Visual curiosity: Learning to ask questions to learn visual recognition | |
CN107077487A (en) | Personal photo is tagged using depth network | |
CN114186084B (en) | Online multi-mode Hash retrieval method, system, storage medium and equipment | |
CN115186097A (en) | Knowledge graph and reinforcement learning based interactive recommendation method | |
El Gourari et al. | The implementation of deep reinforcement learning in e-learning and distance learning: Remote practical work | |
CN113821527A (en) | Hash code generation method and device, computer equipment and storage medium | |
CN116136870A (en) | Intelligent social conversation method and conversation system based on enhanced entity representation | |
Zhu et al. | Learning to transfer learn: Reinforcement learning-based selection for adaptive transfer learning | |
Tang et al. | A practical exploration of constructive english learning platform informatization based on rbf algorithm | |
CN116738371B (en) | User learning portrait construction method and system based on artificial intelligence | |
CN114281955A (en) | Dialogue processing method, device, equipment and storage medium | |
Kamil et al. | Literature Review of Generative models for Image-to-Image translation problems | |
CN114880527B (en) | Multi-modal knowledge graph representation method based on multi-prediction task | |
CN115795993A (en) | Layered knowledge fusion method and device for bidirectional discriminant feature alignment | |
CN112907004B (en) | Learning planning method, device and computer storage medium | |
CN113535911B (en) | Reward model processing method, electronic device, medium and computer program product | |
Xie et al. | Skillearn: Machine learning inspired by humans' learning skills | |
CN113742591A (en) | Learning partner recommendation method and device, electronic equipment and storage medium | |
CN113887471A (en) | Video time sequence positioning method based on feature decoupling and cross comparison | |
CN115619363A (en) | Interviewing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20200515 |