CN115471700A - Knowledge transmission-based image classification model training method and classification method - Google Patents


Info

Publication number
CN115471700A
Authority
CN
China
Prior art keywords
classification model
image classification
training
data set
new
Prior art date
Legal status
Pending
Application number
CN202211126235.6A
Other languages
Chinese (zh)
Inventor
谷洋
郭帅
文世杰
马媛
陈益强
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202211126235.6A priority Critical patent/CN115471700A/en
Publication of CN115471700A publication Critical patent/CN115471700A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a knowledge transmission-based image classification model training method for incrementally training a pre-trained image classification model that comprises a feature extraction network and a classifier. The method performs incremental training on the pre-trained image classification model with a new image data set as follows: S1, enhancing the current new image data set; and S2, initializing the image classification model with the parameters of the image classification model obtained in the previous training, and training it to convergence with the enhanced current new image data set, where the model parameters are updated with a cross-entropy loss, a distillation loss, and a knowledge transmission loss. The invention realizes migration of the model feature space and alleviates the catastrophic forgetting problem during incremental training.

Description

Knowledge transmission-based image classification model training method and classification method
Technical Field
The invention relates to the field of computer vision, in particular to image classification within computer vision, and more particularly to a knowledge transmission-based image classification model training method and a classification method.
Background
In the prior art, deep learning is widely applied to image processing, and deep convolutional neural networks have achieved remarkable success in tasks such as retinal vessel segmentation, retinal disease classification, and fundus image abnormality detection. However, these successes depend to a large extent on the amount of annotated data: without sufficient annotated data to train the network, its performance on such tasks may be poor. In some fields, such as medical imaging, collecting and labeling data takes a great deal of time, labor, and money, so it is difficult to construct an effective data set in a short time. Traditional deep learning is a batch learning method that requires all data to be available when the model is trained; without a large amount of labeled data the model may perform poorly, and if new labeled data arrives later the whole model must be retrained, which increases training cost and may waste resources.
Incremental learning offers a new way to address these problems: updating the model with continuously arriving new data can effectively alleviate this dilemma. Incremental learning updates an already trained model with newly arrived data to improve its generalization ability while preserving existing knowledge as much as possible, and the old data is unavailable during the incremental process. With incremental learning, one is no longer limited by the difficulty of constructing a data set in a short time, and the whole model does not need to be retrained. However, continuously arriving new data may come from different acquisition devices or users, which changes its feature space and data distribution and leads to a data-difference problem. Consequently, directly fine-tuning the trained model on newly added data may rapidly degrade classification accuracy on the old data, i.e., cause catastrophic forgetting.
To address these problems, the Chinese patent application CN113066025A provides an image defogging method based on incremental learning and feature and attention transfer, but it does not solve well the problem of catastrophic forgetting during the incremental process; the Chinese patent application CN106022368A proposes an incremental trajectory anomaly detection method based on incremental kernel principal component analysis, which uses kernel principal component analysis to realize incremental anomaly detection; the Chinese patent application CN112990280A proposes a method that applies class increments to the image classification problem; and the US patent application US2020302230A1 proposes a method that applies class-incremental learning to the field of object detection.
Although many approaches to mitigating catastrophic forgetting have been proposed in the prior art, data-incremental learning still faces the challenge of catastrophic forgetting. The widely adopted scheme of updating model parameters with a distillation loss plus a cross-entropy loss to alleviate forgetting during incremental learning has the following shortcomings: 1. it does not explore which of the features extracted by the deep learning model during the incremental process are key features; 2. it does not make full use of the semantic relationship between new and old data in the incremental process. As a result, the incrementally learned model still suffers from catastrophic forgetting.
Disclosure of Invention
Therefore, an object of the present invention is to overcome the above-mentioned drawbacks of the prior art, and to provide an image classification model training method based on knowledge transmission and an image classification method.
The purpose of the invention is realized by the following technical scheme:
according to a first aspect of the present invention, a knowledge transmission-based image classification model training method is provided, which is used for performing incremental training on a pre-trained image classification model, wherein the pre-trained image classification model includes a feature extraction network and a classifier, and the method includes performing incremental training on the pre-trained image classification model by using a new image data set according to the following manner: s1, enhancing a current new image data set; and S2, initializing the image classification model by using the parameters of the image classification model after the last training, and training the image classification model to be convergent by using the enhanced current new image data set, wherein the model parameters are updated by adopting cross entropy loss, distillation loss and knowledge transmission loss in the training process.
Preferably, the step S1 includes: performing self-supervision enhancement processing on the current new image data set, or performing category enhancement processing on the current new image data set, or performing self-supervision enhancement processing followed by category enhancement processing.
Preferably, the self-supervision enhancement processing rotates the samples in the current new image data set by 90°, 180°, and 270°; the category enhancement processing randomly samples two samples of different classes from the current new image data set and performs class expansion as follows to generate a new sample that is added to the current new image data set:

$$\hat{x}_{\alpha\beta} = \mu x_{\alpha} + (1-\mu)\, x_{\beta}$$

where $\hat{x}_{\alpha\beta}$ denotes the new sample generated from samples $x_{\alpha}$ and $x_{\beta}$, $x_{\alpha}$ denotes a sample of class α in the data set, $x_{\beta}$ denotes a sample of class β in the data set, and μ denotes the interpolation coefficient.
Preferably, the step S2 includes: s21, transmitting the feature extraction network parameters of the image classification model after the last training to a feature extraction network of the image classification model by adopting an optimal knowledge transmission method to obtain an initial image classification model of the current training; and S22, training the initial image classification model of the current training by adopting the current new image data set until convergence, and updating the initial image classification model parameters of the current training by adopting cross entropy loss, distillation loss and knowledge transmission loss.
In some embodiments of the present invention, in step S22 the total loss function is calculated according to the following formula:

$$L_{total}(x, y) = L_{CE}(x, y) + \lambda L_{KD}(x) + \gamma L_{KT}(x)$$

where $L_{total}(\cdot)$ denotes the total loss function, $L_{CE}(\cdot)$ the cross-entropy loss function, $L_{KD}(\cdot)$ the knowledge distillation loss function, $L_{KT}(\cdot)$ the knowledge transmission loss function, λ and γ are hyper-parameters, x denotes a sample, and y denotes the label of sample x.
In some embodiments of the invention, the knowledge distillation loss is calculated using the following formula:

$$L_{KD}(x) = -\sum_{k=1}^{Y_{b-1}} S_k\!\left(\varphi_{old}\!\left(\phi_{old}(x)\right)\right)\,\log S_k\!\left(\varphi_{new}\!\left(\phi_{new}(x)\right)\right)$$

where $Y_{b-1}$ denotes the number of samples of the previous training's image data set after enhancement processing, x denotes a sample, $S_k(\cdot)$ denotes the softmax function, $\varphi_{old}(\cdot)$ denotes the classifier of the image classification model after the previous training, $\phi_{old}(\cdot)$ denotes the feature extraction network of the image classification model after the previous training, $\varphi_{new}(\cdot)$ denotes the classifier of the currently trained image classification model, and $\phi_{new}(\cdot)$ denotes the feature extraction network of the currently trained image classification model.
In some embodiments of the invention, the knowledge transmission loss is calculated using the following formula:

$$L_{KT}(x) = -\sum_{k=1}^{Y_{b}} S_k\!\left(\varphi_{old}\!\left(\phi_{old}(x)\right)\right)\,\log S_k\!\left(\varphi'_{new}\!\left(\phi'_{new}(x)\right)\right)$$

where $Y_{b}$ denotes the number of samples of the current training's new image data set after enhancement processing, x denotes a sample, $S_k(\cdot)$ denotes the softmax function, $\varphi_{old}(\cdot)$ denotes the classifier of the image classification model after the previous training, $\phi_{old}(\cdot)$ denotes the feature extraction network of the image classification model after the previous training, $\varphi'_{new}(\cdot)$ denotes the classifier of the initial image classification model of the current training, and $\phi'_{new}(\cdot)$ denotes the feature extraction network of the initial image classification model of the current training.
According to a second aspect of the present invention, there is provided an image classification method, comprising: t1, acquiring an image to be processed; and T2, processing the image to be processed by the image classification model trained by the method of the first aspect of the invention to obtain a classification result.
Compared with the prior art, the invention has the advantages that:
(1) By adopting the self-supervised data enhancement and category enhancement methods, the model learns more generalizable and transferable representations during incremental training;
(2) Optimal knowledge transmission is used to capture the semantic relationship between the new and old data sets, which realizes migration of the model feature space and alleviates the catastrophic forgetting problem during incremental training.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart illustrating an image classification model training method according to an embodiment of the present invention;
FIG. 2 is a graph illustrating the results of a comparative experiment with 2 rounds of incremental training according to an embodiment of the present invention;
FIG. 3 is a graph illustrating the results of a comparative experiment with 5 rounds of incremental training according to an embodiment of the present invention;
FIG. 4 is a graph illustrating the results of a comparative experiment with 10 rounds of incremental training according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the confusion matrix obtained after incremental training with the DER method;
FIG. 6 is a schematic diagram of the confusion matrix obtained after incremental training with the MUC method;
FIG. 7 is a schematic diagram of the confusion matrix obtained after incremental training according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As mentioned in the background, while the prior art can partially overcome the catastrophic forgetting problem, it still has two shortcomings: on the one hand, it does not explore which of the features extracted by the deep learning model during the incremental process are key features; on the other hand, it does not make full use of the semantic relationship between new and old data in the incremental process. To address these shortcomings, the invention provides a knowledge transmission-based image classification model training method in which an image classification model pre-trained on a basic annotated data set is incrementally trained with new image data. During training, a self-supervision enhancement method and a category enhancement method are applied to the new image data set so that the model learns more generalizable and transferable features during incremental learning; optimal knowledge transmission is then used to obtain the semantic relationship between the new and old data sets, which realizes migration of the model feature space and alleviates catastrophic forgetting during incremental learning. In summary, as shown in FIG. 1, the knowledge transmission-based image classification model training method of the invention performs incremental training on a pre-trained image classification model with a new image data set as follows: the current new image data set is enhanced (including self-supervision enhancement and category enhancement) to obtain enhanced data representations; the image classification model is initialized with the parameters of the image classification model obtained in the previous round of training (preferably via optimal knowledge transmission); and the image classification model is trained to convergence with the enhanced current new image data set, where the model parameters are updated with a cross-entropy loss, a distillation loss, and a knowledge transmission loss.
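For intuition, the following is a minimal sketch of how one round of incremental training could be organized under this scheme; it is not the patent's verbatim implementation, and the helper names augment_dataset, transfer_init, and total_loss are placeholders for the operations described above.

```python
import copy
import torch

def incremental_training(model, subsets, augment_dataset, transfer_init,
                         total_loss, epochs=10, lr=1e-3):
    """Sketch of the incremental procedure: one pass per new data subset."""
    for subset in subsets:                       # each subset = one incremental round
        aug = augment_dataset(subset)            # S1: self-supervision + class enhancement
        old_model = copy.deepcopy(model).eval()  # frozen copy of the previously trained model
        model = transfer_init(old_model, aug)    # S21: init via optimal knowledge transmission
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):                  # S22: train to convergence on the new subset
            for x, y in aug:
                loss = total_loss(model, old_model, x, y)  # CE + lambda*KD + gamma*KT
                opt.zero_grad()
                loss.backward()
                opt.step()
    return model
```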
For a better understanding of the present invention, the invention is described below in conjunction with specific embodiments from five aspects: new image data set partitioning, data enhancement, key knowledge transmission, the training process, and the verification process.
1. New image dataset partitioning
Unlike the traditional deep learning method, the incremental learning method does not need to train the model with all data at once; it updates the model with continuously arriving new data. To train the model better, the new image data set is divided into several sub data sets for incremental training, with one sub data set used per round of incremental training. Partitioning the new image data set also controls how the classes change during incremental training, so that the cost between sample centers of different classes can be computed later.
According to one embodiment of the invention, the new image data set is divided as follows: the new image data set is divided into 10 sub data sets of equal size, one sub data set is used per round of incremental training, all sub data sets contain the same image classes, and the class proportions in each sub data set are the same. According to one example of the invention, suppose the new image data set contains image samples of four classes A, B, C, D, with 4000 samples of class A, 3000 of class B, 2000 of class C, and 1000 of class D; the 10000 images of the four classes are divided equally into 10 subsets, each subset contains 1000 images, and the ratio of A:B:C:D in each subset is 4:3:2:1.
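A minimal sketch of this stratified partitioning, assuming samples and labels are held in Python lists (the variable names are illustrative, not from the patent):

```python
import random
from collections import defaultdict

def split_stratified(samples, labels, num_subsets=10, seed=0):
    """Split a data set into equally sized subsets that preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    subsets = [[] for _ in range(num_subsets)]
    for y, xs in by_class.items():
        rng.shuffle(xs)
        per_subset = len(xs) // num_subsets      # e.g. class A: 4000 // 10 = 400 per subset
        for i in range(num_subsets):
            chunk = xs[i * per_subset:(i + 1) * per_subset]
            subsets[i].extend((x, y) for x in chunk)
    return subsets
```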
2. Data enhancement
In incremental learning, as the number of incremental training rounds increases, the model forgets old knowledge, which degrades its performance. From the perspective of spectral analysis, spectral components with large eigenvalues are not easily forgotten during the incremental process, so enhancement of highly representative data can be studied through spectral analysis to expand these spectral components and thereby obtain more diverse and transferable representation information during the incremental process. Specifically, the sensitivity of different directions in the model's deep feature space is quantified by computing the similarity of the feature space before and after the new data set is learned. First, the old data set $D_{old}$ is used to train a pre-trained image classification model (an image classification model generally comprises a feature extraction network and a classifier, which is well known in the field of image classification and is not repeated here), yielding an old feature extraction network $f_{\theta,old}$; the new data set $D_{new}$ is then used to update the old feature extraction network $f_{\theta,old}$, yielding a new feature extraction network $f_{\theta,new}$. It should be noted that the feature extraction network may adopt a context network, a ResNet network, or another neural network. The old feature extraction network $f_{\theta,old}$ and the new feature extraction network $f_{\theta,new}$ are then both applied to the old data set $D_{old}$ to obtain the depth features mapped by $f_{\theta,old}$ and the depth features mapped by $f_{\theta,new}$, and each set of depth features is decomposed into different directions as follows:
$$\frac{1}{n}\sum_{i=1}^{n} f_{\theta}(x_i)\, f_{\theta}(x_i)^{T} = \sum_{j=1}^{d} \lambda_j\, u_j\, u_j^{T}$$

where n denotes the number of samples in the data set, $x_i$ denotes the i-th sample, $f_{\theta}(x_i)$ denotes the depth feature of the i-th sample, $f_{\theta}(\cdot)^{T}$ denotes the transpose of $f_{\theta}(\cdot)$, d denotes the dimension of the feature space, $\lambda_j$ denotes the eigenvalue corresponding to the j-th dimension of the feature space, $u_j$ denotes the eigenvector corresponding to the j-th dimension of the feature space, and $u_j^{T}$ denotes the transpose of $u_j$. Two sets of eigenvectors are obtained by decomposition, one representing the original representation information $\{u_{old,1}, \ldots, u_{old,d}\}$ and the other representing the new representation information $\{u_{new,1}, \ldots, u_{new,d}\}$. Finally, the forgetting and transferability of each direction in the model's deep feature space are studied from the two sets of eigenvectors obtained by decomposition, and the angular relation ψ is used to explore the distance between the spaces corresponding to the old feature extraction network $f_{\theta,old}$ and the new feature extraction network $f_{\theta,new}$ during the incremental process:
$$\psi\!\left(u_{old,j},\, u_{new,j}\right) = \arccos\!\left(\left|\, u_{old,j}^{T}\, u_{new,j} \,\right|\right)$$

where $u_{old,j}$ denotes the eigenvector with the j-th largest eigenvalue in the feature space of the old feature extraction network $f_{\theta,old}$, $u_{new,j}$ denotes the eigenvector with the j-th largest eigenvalue in the feature space of the new feature extraction network $f_{\theta,new}$, and $\|u_{old,j}\| = 1$ and $\|u_{new,j}\| = 1$. It should be noted that the spatial distance between the feature extraction networks can be understood as the distance between two two-dimensional planes in a three-dimensional coordinate system.
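A small numpy sketch of this spectral analysis, assuming the depth features of the old data set under the old and new feature extraction networks are available as arrays of shape (n, d); the angle formula follows the reconstruction above:

```python
import numpy as np

def eigen_directions(features):
    """Eigendecomposition of (1/n) * sum_i f(x_i) f(x_i)^T for features of shape (n, d)."""
    n = features.shape[0]
    cov = features.T @ features / n
    eigvals, eigvecs = np.linalg.eigh(cov)           # returned in ascending order
    order = np.argsort(eigvals)[::-1]                # sort by decreasing eigenvalue
    return eigvals[order], eigvecs[:, order]

def principal_angles(feats_old, feats_new):
    """Angle between the j-th eigenvectors of the old and new feature spaces."""
    _, u_old = eigen_directions(feats_old)
    _, u_new = eigen_directions(feats_new)
    cosines = np.abs(np.sum(u_old * u_new, axis=0))  # |u_old_j^T u_new_j| per column j
    return np.arccos(np.clip(cosines, 0.0, 1.0))     # small angle => direction is stable
```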
During incremental learning, retention of old knowledge is reflected at the representation level: the shape of the feature distribution, i.e., its covariance, should not change too much. If the direction of an eigenvector changes only slightly after the feature extraction network is updated, this is reflected in a small change of angle. The size of the angle change after incremental training therefore reveals which features are key features and which features are more transferable and less likely to be forgotten: when the angle change is small, the feature spaces before and after deviate little, and the eigenvector is less likely to be forgotten and is transferable; when the angle change is large, the feature spaces before and after deviate greatly, and the eigenvector is more easily forgotten. This study shows that, after the feature extraction network is updated, the larger the eigenvalue, the smaller the angle change of the corresponding eigenvector; that is, eigenvectors with larger eigenvalues are not easily forgotten after incremental training, are transferable, and act as key vectors. It follows that learning more transferable features during incremental learning can alleviate the catastrophic forgetting problem.
As the foregoing spectral analysis shows, eigenvectors with larger eigenvalues are the key vectors. To learn more key vectors and thus alleviate catastrophic forgetting during incremental learning, the invention enhances the data so that the model sees more eigenvectors with large eigenvalues during incremental learning and can therefore learn more transferable features. According to an embodiment of the invention, a self-supervision enhancement method automatically enhances the data to improve feature diversity, and a category enhancement method enhances the data to improve class diversity.
According to an embodiment of the present invention, the self-supervision enhancement processing is performed by rotation: the samples in the data set are rotated by preset angles, which according to one embodiment may be 90°, 180°, and 270°. With this rotation processing, if the original data set contains K classes of samples, the data set after self-supervision enhancement contains 4K classes, and each rotated sample is assigned a new label according to its rotation angle. Compared with the 4-way self-supervision task widely used at present, this self-supervision enhancement method relaxes certain invariance constraints when the original task and the self-supervision task are learned simultaneously, which helps the model learn richer features during incremental learning.
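A minimal sketch of this rotation-based enhancement, assuming images are numpy arrays of shape (H, W, C); encoding the new label as original_label * 4 + rotation index is an illustrative choice, not specified by the patent:

```python
import numpy as np

def self_supervised_augment(images, labels):
    """Expand K classes to 4K by rotating each image 0/90/180/270 degrees."""
    aug_images, aug_labels = [], []
    for img, y in zip(images, labels):
        for r in range(4):                      # r = number of 90-degree rotations
            aug_images.append(np.rot90(img, k=r, axes=(0, 1)))
            aug_labels.append(y * 4 + r)        # new label identifies class AND rotation
    return aug_images, aug_labels
```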
According to an embodiment of the present invention, category enhancement processing is performed by combining samples: two samples of different classes are randomly drawn from the data set and class expansion is performed as follows to generate a new sample that is added to the data set:
$$\hat{x}_{\alpha\beta} = \mu x_{\alpha} + (1-\mu)\, x_{\beta}$$

where $\hat{x}_{\alpha\beta}$ denotes the new sample generated from samples $x_{\alpha}$ and $x_{\beta}$, $x_{\alpha}$ denotes a sample of class α in the data set, $x_{\beta}$ denotes a sample of class β in the data set, and μ denotes the interpolation coefficient. According to one embodiment of the invention, μ is sampled from the interval [0.3, 0.7]. With this class expansion, if the original data set contains K classes of samples, the data set after category enhancement contains M + K classes, where M = K(K-1)/2. Because the data set after category enhancement contains more classes, more classes are seen during incremental learning, which helps learn transferable and diversified features.
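A sketch of this class expansion, again assuming numpy image arrays; the way a new class index is assigned to each (α, β) pair, and the number of generated samples, are illustrative assumptions:

```python
import random
import numpy as np

def class_augment(images, labels, num_classes, mu_range=(0.3, 0.7), n_new=1000, seed=0):
    """Generate interpolated samples between randomly chosen pairs of different classes."""
    rng = random.Random(seed)
    by_class = {c: [im for im, y in zip(images, labels) if y == c] for c in range(num_classes)}
    new_images, new_labels = [], []
    pair_to_class = {}                              # maps an unordered class pair to a new class id
    for _ in range(n_new):
        a, b = rng.sample(range(num_classes), 2)    # two distinct classes
        mu = rng.uniform(*mu_range)
        x_a, x_b = rng.choice(by_class[a]), rng.choice(by_class[b])
        new_images.append(mu * x_a + (1.0 - mu) * x_b)
        key = tuple(sorted((a, b)))
        if key not in pair_to_class:
            pair_to_class[key] = num_classes + len(pair_to_class)
        new_labels.append(pair_to_class[key])
    return new_images, new_labels
```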
3. Key knowledge transmission
Through the above embodiments the data set is enhanced so that the model can learn more generalizable and transferable features during incremental learning; these features are then transmitted during incremental learning to alleviate catastrophic forgetting. According to an embodiment of the invention, during incremental training, key knowledge transmission is used to transmit the feature extraction network parameters of the image classification model obtained in the previous round of training to the feature extraction network of the image classification model, yielding the initial image classification model of the current round. Through this knowledge transmission the feature extraction network is better initialized, so that the initialized feature extraction network retains previously learned knowledge and catastrophic forgetting is avoided. For a better understanding of the invention, the basic principle of key knowledge transmission is explained below.
During the incremental process there exists a mapping relationship, i.e., a semantic relationship, between the new data set and the old data set. Since the old and new models trained on the old and new data sets are related, the similarity between the old and new data sets can help realize incremental training. As the number of incremental training rounds grows, the number of old data sets grows, the probability that the new data set is correlated with the old data sets increases, and migration of key features is promoted. Driven by the semantic relationship, the invention links the old and new data sets through model reuse. Assuming that the semantic relationship between the new and old data sets has been extracted, the feature extraction network can be migrated from the original data to the target data by semantic mapping, i.e., the semantic mapping takes the original feature extraction network as input and generates a feature extraction network that matches the target data.
Semantic mapping can capture the correlation between the new and old data sets and can convert the original feature extraction network $\phi_{o}(\cdot)$ into a target feature extraction network $g(\cdot)$; the converted prediction can thus be weighted by the semantic mapping. The semantic mapping encodes sample-level semantic associations between the old data of dimension α and the new data of dimension β, and the more associations there are between the old and new data, the larger the corresponding semantic mapping value.
Define $u_1 \in \Delta^{\alpha}$, $u_2 \in \Delta^{\beta}$, where

$$\Delta^{d} = \left\{ x \in \mathbb{R}_{+}^{d} \;\middle|\; \sum_{i=1}^{d} x_i = 1 \right\}$$

is the d-dimensional simplex and $\mathbb{R}_{+}^{d}$ denotes the positive real vectors of dimension d. $u_1$ denotes the normalized marginal probability describing the importance of each sample in the old data set of dimension α, and $u_2$ denotes the normalized marginal probability describing the importance of each sample in the new data set of dimension β; $u_1$ and $u_2$ are set to uniform distributions, with no informative prior. A cost matrix $C \in \mathbb{R}^{\alpha \times \beta}$ is introduced to describe sample changes between the new and old data sets and to guide migration; its elements give the cost to be paid to link the old data set to the new data set. Viewing the semantic mapping as a coupling of the two distributions, the samples between the tasks can be tied together with the lowest transportation cost by minimizing:

$$T^{*} = \arg\min_{T}\; \langle T, C\rangle \quad \text{s.t.}\quad T\mathbf{1} = u_1,\; T^{T}\mathbf{1} = u_2 \tag{1}$$

where $T\mathbf{1} = u_1$ and $T^{T}\mathbf{1} = u_2$ are the marginal constraints and $T \in \mathbb{R}^{\alpha \times \beta}$ denotes the semantic mapping, which describes how to align the old data set with the new data set. In this case the probability mass of a sample migrates to similar samples at a small cost, and a correctly aligned mapping between the old and new data sets is output. By applying the semantic mapping T, the invention can convert the feature extraction network trained on the previous task into the feature extraction network of the current task. It should be noted that the invention uses the Sinkhorn algorithm to solve this optimal transmission problem.
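A compact numpy sketch of the Sinkhorn iteration used to solve this entropic optimal transport problem; the regularization strength and iteration count are illustrative choices:

```python
import numpy as np

def sinkhorn(u1, u2, C, reg=0.1, n_iters=500):
    """Entropic OT: returns a coupling T with T @ 1 ~= u1 and T.T @ 1 ~= u2."""
    K = np.exp(-C / reg)                  # Gibbs kernel derived from the cost matrix
    a = np.ones_like(u1)
    for _ in range(n_iters):
        b = u2 / (K.T @ a)                # alternating marginal scaling
        a = u1 / (K @ b)
    return a[:, None] * K * b[None, :]    # T = diag(a) K diag(b)

# Example: uniform marginals over alpha old and beta new items.
# alpha, beta = C.shape
# T = sinkhorn(np.full(alpha, 1 / alpha), np.full(beta, 1 / beta), C)
```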
As described above, the cost matrix C characterizes the relationship between the old and new data sets. To solve for the cost matrix C, the sample center of each class in the data set must first be computed; the sample centers are then used to calculate the cost between different classes in the data set and to construct the cost matrix C. The sample center is computed as follows:
$$v_n = \frac{1}{n_b}\sum_{i=1}^{n_b} \mathbb{1}\!\left[y_i = n\right]\, \phi(x_i) \tag{2}$$

where $v_n$ denotes the sample center of class n, $n_b$ denotes the number of samples in the data set, $y_i$ denotes the label of the i-th sample, $x_i$ denotes the i-th sample, $\mathbb{1}[\cdot]$ denotes the indicator function, and $\phi(\cdot)$ denotes the feature extraction network.
If samples of two different classes in the same data set are correlated, their corresponding sample centers will also be close to each other, and the Euclidean distance between the two sample centers is used to measure the cost between them:
$$C_{n,m} = \left\| v_n - v_m \right\|_2 \tag{3}$$

where $C_{n,m}$ denotes the cost between samples of class n and class m, $v_n$ denotes the sample center of class n, and $v_m$ denotes the sample center of class m. The larger the distance, the greater the difference between samples of the two classes in the data set, and the greater the difference between samples of different classes, the harder it is to reuse certain coefficients of the previously well-trained model.
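A sketch of computing class centers and a cost matrix; the patent computes centers per class with the feature extraction network, and populating the α×β matrix with costs between old-class and new-class centers is how this sketch reads it. Features are assumed to be numpy arrays of shape (n, d):

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Mean feature vector per class (equation (2), with the mean taken within each class)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])

def cost_matrix(centers_old, centers_new):
    """Pairwise Euclidean distances between old-class and new-class centers (equation (3))."""
    diff = centers_old[:, None, :] - centers_new[None, :, :]
    return np.linalg.norm(diff, axis=-1)
```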
In the solving process, the sample centers of the different classes are obtained by solving formula (2), the cost matrix C is then obtained from formula (3), and the semantic mapping T based on optimal knowledge transmission is finally obtained from formula (1). Since key knowledge transmission technology is known to those skilled in the art, it is not described in detail here; in the following embodiments key knowledge transmission is represented by the semantic mapping T.
In incremental training, when facing a new task, the model needs to adapt quickly to the feature space of the new data, i.e., to improve its classification ability on the new data set while reducing catastrophic forgetting of the old data set. To let the model adapt quickly to the feature space of the new data during incremental learning, the invention uses optimal knowledge transmission to solve the feature-space adaptation problem through the semantic mapping between the new and old data sets. Under the guidance of the semantic mapping relation $\phi'_{new} = T(\phi_{old})$, a new feature extraction network $\phi'_{new}$ is constructed from the old feature extraction network $\phi_{old}$; that is, the optimal transmission method initializes the parameters of the new feature extraction network $\phi'_{new}$ from those of the old feature extraction network $\phi_{old}$. The feature extraction network $\phi'_{new}$ obtained in this way makes good use of the old feature extraction network and preserves the semantic relationship between samples. At the same time, alignment between the old and new samples is maintained, because the relationships between samples in the old and new data sets are captured by the semantic mapping. Therefore, even without training, the new feature extraction network obtained after optimal knowledge transmission can classify the new data set accurately while retaining its ability to classify the old data set.
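The patent does not spell out how T is applied to the network parameters. Purely as an illustration of one possible reading, the sketch below recombines the per-class classifier weight rows of the old model through a column-normalized coupling T while copying the feature-extractor weights unchanged; the helper name, the attribute layout, and this weighting scheme are assumptions, not the patent's prescribed construction.

```python
import copy
import numpy as np

def transfer_init(old_model, T):
    """Illustrative initialization phi'_new = T(phi_old) under the stated assumptions.

    old_model.classifier_weights: array of shape (num_old_classes, d)
    T: coupling of shape (num_old_classes, num_new_classes)
    """
    new_model = copy.deepcopy(old_model)            # feature extractor parameters carried over
    T_norm = T / T.sum(axis=0, keepdims=True)       # column-normalize: weights over old classes
    new_model.classifier_weights = T_norm.T @ old_model.classifier_weights
    return new_model
```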
4. Training process
According to an embodiment of the present invention, the pre-trained image classification model is incrementally trained multiple times with the multiple sub data sets obtained by dividing the new image data set. To describe the relevant technical features more clearly, one complete round of incremental training is taken as an example; each round of incremental training comprises steps S1 and S2, which are described in detail below.
In step S1, the sub data set of the current round is enhanced: it is first subjected to self-supervision enhancement processing and then to category enhancement processing. Taking the class-a and class-b image samples in the current sub data set as an example, the class-a and class-b image samples are first rotated by 90°, 180°, and 270°, so the self-supervision enhancement expands the original 2 classes to 8 classes; the self-supervision-enhanced image samples are then subjected to category enhancement processing according to the method described in the foregoing embodiment, which expands the 8 classes to 36 classes. It should be noted that the sub data set may also be enhanced using only the self-supervision enhancement mode or only the category enhancement mode.
In step S2, the image classification model is initialized with the parameters of the image classification model obtained in the previous round of training and trained to convergence with the enhanced sub data set of the current round, where the model parameters are updated with the cross-entropy loss, the distillation loss, and the knowledge transmission loss.
According to an embodiment of the invention, said step S2 comprises steps S21, S22.
In step S21, the optimal knowledge transmission method is used to transmit the feature extraction network parameters of the image classification model obtained in the previous round of training to the feature extraction network of the image classification model, yielding the initial image classification model of the current round. That is, the semantic mapping T between the previous round's sub data set and the current round's sub data set is computed with the optimal knowledge transmission method; under the guidance of T, a new feature extraction network $\phi'_{new}$ is constructed from the feature extraction network $\phi_{old}$ of the previously trained image classification model, $\phi'_{new} = T(\phi_{old})$; and the parameters of $\phi'_{new}$ are used to initialize the feature extraction network $\phi_{new}$ of the image classification model of the current round, giving the initial image classification model of the current round. Using the semantic mapping T to guide the construction of $\phi'_{new}$ from $\phi_{old}$ means taking T as a constraint, feeding the parameters of $\phi_{old}$ as input, and outputting the parameters of $\phi'_{new}$ through the semantic mapping T.
In step S22, the initial image classification model of the current round is trained to convergence with the current sub data set, and its parameters are updated with the cross-entropy loss, the distillation loss, and the knowledge transmission loss.
According to an embodiment of the present invention, in step S22 the parameters of the currently trained initial image classification model are updated with the following total loss:

$$L_{total}(x, y) = L_{CE}(x, y) + \lambda L_{KD}(x) + \gamma L_{KT}(x)$$

where $L_{total}(\cdot)$ denotes the total loss function, $L_{CE}(\cdot)$ the cross-entropy loss function, $L_{KD}(\cdot)$ the knowledge distillation loss function, $L_{KT}(\cdot)$ the knowledge transmission loss function, λ and γ are hyper-parameters, x denotes a sample, and y denotes the label of sample x. According to one example of the invention, the hyper-parameters λ and γ are both set to 10.
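A PyTorch-style sketch of this combined objective for one batch. The soft-target cross-entropy mirrors the $L_{KD}$/$L_{KT}$ formulas below; pairing the KD term with the previous round's model and the KT term with the frozen knowledge-transferred initialization, as well as the model attributes (features, classifier), are assumptions of this sketch rather than the patent's exact prescription.

```python
import torch
import torch.nn.functional as F

def soft_ce(teacher_logits, student_logits):
    """-sum_k softmax(teacher)_k * log softmax(student)_k, averaged over the batch."""
    p = F.softmax(teacher_logits, dim=1)
    return -(p * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()

def total_loss(model, old_model, init_model, x, y, lam=10.0, gamma=10.0):
    """L_total = L_CE + lambda * L_KD + gamma * L_KT for one batch (x, y)."""
    logits = model.classifier(model.features(x))
    with torch.no_grad():                     # both reference models are frozen
        logits_old = old_model.classifier(old_model.features(x))
        logits_init = init_model.classifier(init_model.features(x))
    l_ce = F.cross_entropy(logits, y)         # supervised loss on the enhanced new data
    l_kd = soft_ce(logits_old, logits)        # distillation from the previous round's model
    l_kt = soft_ce(logits_init, logits)       # transmission term anchored to the transferred init
    return l_ce + lam * l_kd + gamma * l_kt
```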
According to one embodiment of the invention, the knowledge distillation loss is calculated as follows:
$$L_{KD}(x) = -\sum_{k=1}^{Y_{b-1}} S_k\!\left(\varphi_{old}\!\left(\phi_{old}(x)\right)\right)\,\log S_k\!\left(\varphi_{new}\!\left(\phi_{new}(x)\right)\right)$$

where $Y_{b-1}$ denotes the number of samples of the previous round's sub data set after enhancement, x denotes a sample, $S_k(\cdot)$ denotes the softmax function, $\varphi_{old}(\cdot)$ denotes the classifier of the image classification model after the previous round of training, $\phi_{old}(\cdot)$ denotes the feature extraction network of the image classification model after the previous round of training, $\varphi_{new}(\cdot)$ denotes the classifier of the currently trained image classification model, and $\phi_{new}(\cdot)$ denotes the feature extraction network of the currently trained image classification model. It should be noted that, in the current round of incremental training, the image classification model obtained in the previous round is not trained again on the current round's sub data set.
According to one embodiment of the invention, the knowledge transmission loss is calculated as follows:
$$L_{KT}(x) = -\sum_{k=1}^{Y_{b}} S_k\!\left(\varphi_{old}\!\left(\phi_{old}(x)\right)\right)\,\log S_k\!\left(\varphi'_{new}\!\left(\phi'_{new}(x)\right)\right)$$

where $Y_{b}$ denotes the number of samples of the current round's sub data set after enhancement, x denotes a sample, $S_k(\cdot)$ denotes the softmax function, $\varphi_{old}(\cdot)$ denotes the classifier of the image classification model after the previous round of training, $\phi_{old}(\cdot)$ denotes the feature extraction network of the image classification model after the previous round of training, $\varphi'_{new}(\cdot)$ denotes the classifier of the initial image classification model of the current round, and $\phi'_{new}(\cdot)$ denotes the feature extraction network of the initial image classification model of the current round.
It should be noted that the cross-entropy loss is a common loss function in the deep learning field and is therefore not described in detail here.
5. Verification process
To verify the effect of the invention, comparative experiments were conducted on 6 different fundus disease image data sets to compare the invention with the prior art. The first data set comprises 12238 high-quality clinical fundus disease image samples and their corresponding annotation labels, covering four classes: age-related macular degeneration (AMD), diabetic retinopathy (DR), glaucoma, and normal retina. The second data set comprises 13,812 fundus disease image samples of uneven quality and their corresponding annotation labels, covering two classes: age-related macular degeneration (AMD) and diabetic retinopathy (DR). The third data set comprises 1748 retinal color image samples, all of which are diabetic retinopathy (DR) image samples. The fourth data set comprises 15 fundus image samples each for normal patients, glaucoma patients, and diabetic (DR) patients. The fifth data set comprises 650 glaucoma retinal image samples. The sixth data set comprises 40 digital retinal image samples and their corresponding annotation labels.
Incremental learning experiments with different data proportions were simulated on the first data set. Specifically, the first data set was divided into a training set and a test set at a ratio of 4:1. The training data was then divided into 2, 5, and 10 equal parts, and 2, 5, and 10 rounds of incremental training were performed, respectively, according to the embodiments of the invention; after each round of incremental training, the image classification model obtained so far was evaluated on the test set to obtain its accuracy. The accuracy results after 2, 5, and 10 rounds of incremental training are shown in FIGS. 2-4. As the figures show, the accuracy of the image classification models obtained with 2, 5, and 10 rounds of incremental training according to the embodiments of the invention is superior to the accuracy of the models obtained by incremental training according to the prior art; oracle denotes the test accuracy obtained by training with all data. The compared prior art includes End-to-End Incremental Learning (EEIL), Learning a Unified Classifier Incrementally via Rebalancing (LUCIR), regularization-based incremental learning (Learning without Forgetting (LwF), Memory Aware Synapses (MAS)), Learning without Memorizing (LwM), and incremental learning based on multiple classifiers (MUC).
Further, to simulate the practical problems of differing data quality and incomplete data categories that arise when the data set used in the current round of incremental training differs from that of the previous round in acquisition device or acquisition personnel, part of the data in the first data set was first used to complete incremental training, data from the other five data sets was then used in subsequent rounds of incremental training, and finally the confusion matrix of each round was collected to compare the invention with the prior art and verify its effectiveness in reducing misclassification; the verification results are shown in FIGS. 5-7. The confusion matrix is an analysis table that summarizes the predictions of a classification model in machine learning: the records in a data set are summarized in matrix form according to their true classes and the classes predicted by the classification model, the rows of the matrix represent the true values, the columns represent the predicted values, and the darker the color on the diagonal from the top left to the bottom right, the better the model. As can be seen from the figures, when there are problems of differing data quality and incomplete data categories, the accuracy of the image classification model incrementally trained according to the invention is clearly better than that of the model trained with the multi-classifier-based incremental learning method (MUC), and is close to that of the model trained with the Dynamically Expandable Representation (DER) method for class-incremental learning.
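A short sketch of this per-round evaluation using scikit-learn, assuming the true and predicted labels for the test set are available as integer arrays:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

def evaluate_round(y_true, y_pred):
    """Accuracy and confusion matrix for one round of incremental training."""
    acc = accuracy_score(y_true, y_pred)
    cm = confusion_matrix(y_true, y_pred)   # rows: true classes, columns: predicted classes
    return acc, cm

# Example: a strong diagonal in cm indicates little misclassification.
# acc, cm = evaluate_round(np.array([0, 1, 2, 2]), np.array([0, 1, 2, 1]))
```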
In summary, based on spectral analysis the invention studies which features are key features during the incremental process, and then uses the self-supervised data enhancement and category enhancement methods to make the incremental model learn more generalizable and transferable representations; further, to alleviate catastrophic forgetting during the incremental process, optimal knowledge transmission is used to obtain the semantic relationship between the new and old data sets, which realizes migration of the model feature space and alleviates the catastrophic forgetting problem during incremental training.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A knowledge transmission-based image classification model training method is used for carrying out incremental training on a pre-trained image classification model, wherein the pre-trained image classification model comprises a feature extraction network and a classifier, and is characterized in that the method comprises the following steps of carrying out incremental training on the pre-trained image classification model by adopting a new image data set:
s1, enhancing a current new image data set;
and S2, initializing the image classification model by using the parameters of the image classification model after the last training, and training the image classification model to be convergent by using the enhanced current new image data set, wherein the model parameters are updated by adopting cross entropy loss, distillation loss and knowledge transmission loss in the training process.
2. The method according to claim 1, wherein the step S1 comprises:
and carrying out self-supervision enhancement processing on the current new image data set, or carrying out category enhancement processing on the current new image data set, or carrying out self-supervision enhancement processing on the current new image data set and then carrying out category enhancement processing.
3. The method of claim 2, wherein:
the self-supervision enhancement processing is to turn over the samples in the current new image data set according to rotation angles of 90 degrees, 180 degrees and 270 degrees;
the class enhancement processing means that two samples of different classes are randomly sampled from the current new image data set and class expansion is performed according to the following mode to generate a new sample to be added into the current new image data set:
$$\hat{x}_{\alpha\beta} = \mu x_{\alpha} + (1-\mu)\, x_{\beta}$$

where $\hat{x}_{\alpha\beta}$ denotes the new sample generated from samples $x_{\alpha}$ and $x_{\beta}$, $x_{\alpha}$ denotes a sample of class α in the data set, $x_{\beta}$ denotes a sample of class β in the data set, and μ denotes the interpolation coefficient.
4. The method according to claim 1, wherein the step S2 comprises:
s21, transmitting the feature extraction network parameters of the image classification model after the last training to a feature extraction network of the image classification model by adopting an optimal knowledge transmission method to obtain an initial image classification model of the current training;
and S22, training the initial image classification model of the current training by adopting the current new image data set until convergence, and updating the initial image classification model parameters of the current training by adopting cross entropy loss, distillation loss and knowledge transmission loss.
5. The method according to claim 4, characterized in that in step S22, the total loss function is calculated according to the following formula:
$$L_{total}(x, y) = L_{CE}(x, y) + \lambda L_{KD}(x) + \gamma L_{KT}(x)$$

where $L_{total}(\cdot)$ denotes the total loss function, $L_{CE}(\cdot)$ the cross-entropy loss function, $L_{KD}(\cdot)$ the knowledge distillation loss function, $L_{KT}(\cdot)$ the knowledge transmission loss function, λ and γ are hyper-parameters, x denotes a sample, and y denotes the label of sample x.
6. The method of claim 5, wherein the knowledge distillation loss is calculated using the following formula:
$$L_{KD}(x) = -\sum_{k=1}^{Y_{b-1}} S_k\!\left(\varphi_{old}\!\left(\phi_{old}(x)\right)\right)\,\log S_k\!\left(\varphi_{new}\!\left(\phi_{new}(x)\right)\right)$$

where $Y_{b-1}$ denotes the number of samples of the previous training's image data set after enhancement processing, x denotes a sample, $S_k(\cdot)$ denotes the softmax function, $\varphi_{old}(\cdot)$ denotes the classifier of the image classification model after the previous training, $\phi_{old}(\cdot)$ denotes the feature extraction network of the image classification model after the previous training, $\varphi_{new}(\cdot)$ denotes the classifier of the currently trained image classification model, and $\phi_{new}(\cdot)$ denotes the feature extraction network of the currently trained image classification model.
7. The method of claim 5, wherein the knowledge transmission loss is calculated using the following formula:
$$L_{KT}(x) = -\sum_{k=1}^{Y_{b}} S_k\!\left(\varphi_{old}\!\left(\phi_{old}(x)\right)\right)\,\log S_k\!\left(\varphi'_{new}\!\left(\phi'_{new}(x)\right)\right)$$

where $Y_{b}$ denotes the number of samples of the current training's new image data set after enhancement processing, x denotes a sample, $S_k(\cdot)$ denotes the softmax function, $\varphi_{old}(\cdot)$ denotes the classifier of the image classification model after the previous training, $\phi_{old}(\cdot)$ denotes the feature extraction network of the image classification model after the previous training, $\varphi'_{new}(\cdot)$ denotes the classifier of the initial image classification model of the current training, and $\phi'_{new}(\cdot)$ denotes the feature extraction network of the initial image classification model of the current training.
8. A method of image classification, the method comprising:
t1, acquiring an image to be processed;
and T2, processing the image to be processed by adopting the image classification model trained by the method according to any one of claims 1-7 to obtain a classification result.
9. A computer-readable storage medium, having stored thereon a computer program executable by a processor for performing the steps of the method of any one of claims 1 to 7 or 8.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to carry out the steps of the method according to any one of claims 1-7, 8.
CN202211126235.6A 2022-09-16 2022-09-16 Knowledge transmission-based image classification model training method and classification method Pending CN115471700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211126235.6A CN115471700A (en) 2022-09-16 2022-09-16 Knowledge transmission-based image classification model training method and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211126235.6A CN115471700A (en) 2022-09-16 2022-09-16 Knowledge transmission-based image classification model training method and classification method

Publications (1)

Publication Number Publication Date
CN115471700A true CN115471700A (en) 2022-12-13

Family

ID=84371349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211126235.6A Pending CN115471700A (en) 2022-09-16 2022-09-16 Knowledge transmission-based image classification model training method and classification method

Country Status (1)

Country Link
CN (1) CN115471700A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908441A (en) * 2023-01-06 2023-04-04 北京阿丘科技有限公司 Image segmentation method, device, equipment and storage medium
CN115908441B (en) * 2023-01-06 2023-10-10 北京阿丘科技有限公司 Image segmentation method, device, equipment and storage medium
CN115797732A (en) * 2023-02-15 2023-03-14 杭州实在智能科技有限公司 Image retrieval model training method and system used in open category scene
CN116311103A (en) * 2023-05-10 2023-06-23 江西云眼视界科技股份有限公司 Incremental learning-based pavement ponding detection method, device, medium and equipment
CN117523409A (en) * 2023-11-10 2024-02-06 中国科学院空天信息创新研究院 Distributed collaborative incremental updating method and device based on model structure decoupling
CN117523409B (en) * 2023-11-10 2024-06-07 中国科学院空天信息创新研究院 Distributed collaborative incremental updating method and device based on model structure decoupling

Similar Documents

Publication Publication Date Title
Wen et al. Incomplete multiview spectral clustering with adaptive graph learning
Kukačka et al. Regularization for deep learning: A taxonomy
CN115471700A (en) Knowledge transmission-based image classification model training method and classification method
CN107122809B (en) Neural network feature learning method based on image self-coding
Vapnik et al. A new learning paradigm: Learning using privileged information
Li et al. Semi-supervised domain adaptation by covariance matching
US20180341862A1 (en) Integrating a memory layer in a neural network for one-shot learning
Yu et al. Deep learning with kernel regularization for visual recognition
CN112446423B (en) Fast hybrid high-order attention domain confrontation network method based on transfer learning
Yu et al. Multi-target unsupervised domain adaptation without exactly shared categories
CN111127364B (en) Image data enhancement strategy selection method and face recognition image data enhancement method
CN113239131B (en) Low-sample knowledge graph completion method based on meta-learning
US8775345B2 (en) Recovering the structure of sparse markov networks from high-dimensional data
CN110674323A (en) Unsupervised cross-modal Hash retrieval method and system based on virtual label regression
CN110210625A (en) Modeling method, device, computer equipment and storage medium based on transfer learning
CN114491039B (en) Primitive learning few-sample text classification method based on gradient improvement
CN112232395B (en) Semi-supervised image classification method for generating countermeasure network based on joint training
CN116188941A (en) Manifold regularized width learning method and system based on relaxation annotation
CN109977961B (en) Binary feature learning method and system based on layered attention mechanism
Zhang et al. The role of knowledge creation-oriented convolutional neural network in learning interaction
CN115565001A (en) Active learning method based on maximum average difference antagonism
CN112784927B (en) Semi-automatic image labeling method based on online learning
CN108734116A (en) A kind of face identification method learning depth autoencoder network based on speed change
CN115169436A (en) Data dimension reduction method based on fuzzy local discriminant analysis
Zou et al. Nonnegative and adaptive multi-view clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination