CN114549906A - Improved image classification algorithm for step-by-step training of Top-k loss function - Google Patents
Improved image classification algorithm for step-by-step training of Top-k loss function
- Publication number
- CN114549906A (application CN202210185010.1A)
- Authority
- CN
- China
- Prior art keywords
- loss function
- improved
- module
- training
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image classification algorithm based on step-by-step training with an improved Top-k loss function. The system comprises an image data preprocessing module, a deep learning feature extraction module and a system prediction output module. The image data preprocessing module preprocesses the input image data; the deep learning feature extraction module performs step-by-step training with the improved Top-k loss function using deep learning, and the classifier module is a deep learning network module; the system output module processes the output of the classifier and outputs the decision result. By using the improved Top-k loss function with step-by-step training, the image classification system breaks through the accuracy limit of the deep neural network without modifying the network structure, and the improved algorithm effectively raises the classification accuracy.
Description
Technical Field
The invention belongs to the field of image classification, and particularly relates to an image classification system based on deep learning and step-by-step training with an improved Top-k loss function.
Background
Image Classification technology has developed rapidly as image classification plays an increasingly important role in many areas of daily life. The core research problem is: given a set of images, each labelled with a single class, predict the classes of a new set of test images and measure the accuracy of those predictions. Traditional image classification algorithms extract manually designed features, which are difficult to design and have strong limitations, so they cannot handle complex tasks. For algorithms such as the K-nearest-neighbour algorithm (KNN) and the support vector machine (SVM), the design difficulty is often high, the combination of feature extraction and classifier algorithms is complex, and high classification accuracy is difficult to achieve.
In recent years, deep learning has been increasingly applied to image classification systems. Deep learning methods have achieved many breakthrough results in practical applications and have gradually become an important tool of artificial intelligence. The convolutional neural network is one of the deep learning algorithms. Compared with traditional hand-crafted features, deep features do not require a complex and time-consuming feature extraction algorithm to be designed by hand; only an effective neural network model needs to be designed, and the classification accuracy is high. However, a deep neural network often cannot break through its accuracy limit without modifying the network structure, so an improved algorithm is needed to raise the accuracy effectively.
Disclosure of Invention
The invention aims to provide a step-by-step training algorithm with an improved Top-k multi-loss function for image classification. Compared with the original loss function, the method achieves higher classification accuracy; and because it uses a deep learning method, it omits the steps of manually extracting features and manually selecting a classifier, overcoming the difficulty of feature extraction and classification in traditional methods.
The solution provided by the invention adopts a step-by-step training algorithm with an improved Top-k multi-loss function for image classification to achieve higher classification accuracy. The system comprises: an image data preprocessing module; a deep learning feature extraction module; and a system prediction output module.
The image data preprocessing module represents the image data as a three-dimensional tensor. Its functions include discarding irrelevant input data to reduce negative effects, and it generally comprises changing the brightness of the original image, changing the contrast of the original image, centre-cropping the image, changing the colour function, and denoising. Preprocessing converts the image data into an input form accepted by the image target classification module; in training mode the image target data categories are labelled and the data set required by the machine learning method is selected.
The working method of image data preprocessing in training mode is as follows: for a three-channel image, the brightness of the original image is changed, the contrast of the original image is changed, the image is centre-cropped, the colour function is changed, denoising is performed, and preprocessing such as uniform upsampling, centre cropping and rotation is applied to expand the data volume. During training, the parameters of the convolutional neural network are updated to obtain a CNN capable of accurately classifying the specified data set.
The working method of image target data preprocessing in test mode is as follows: for the three-channel image, the brightness of the original image is changed, the contrast of the original image is changed, the colour function is changed, denoising is performed, and uniform upsampling is applied, but no data expansion is performed. The upsampling is to a fixed resolution; in the following embodiments the upsampled image resolution is 224 × 224 pixels. The final classification result is then obtained. A sketch of both preprocessing modes is given below.
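A minimal sketch of the two preprocessing modes, assuming a torchvision pipeline; the jitter ranges, rotation angle and normalization statistics are assumptions, since the operations are named above but their parameters are not.

```python
import torchvision.transforms as T

# Training mode: augmentation expands the data volume.
train_transform = T.Compose([
    T.Resize(256),                                    # unified upsampling
    T.ColorJitter(brightness=0.2, contrast=0.2,       # change brightness / contrast
                  saturation=0.2),                    # change colour function
    T.RandomRotation(15),                             # rotation to expand the data
    T.CenterCrop(224),                                # centre trimming to 224 x 224
    T.ToTensor(),                                     # three-dimensional tensor (C, H, W)
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

# Test mode: uniform upsampling only, no data expansion.
test_transform = T.Compose([
    T.Resize((224, 224)),                             # fixed resolution 224 x 224
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])
```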
The image target classification module comprises a deep learning feature extraction module and a loss function step-by-step training module; a supervised learning algorithm is used during network training. The CNN feature extraction module and the classifier module use the improved Top-k loss function. Training a model with a lower top-2 error is simpler than improving top-1 classification: as long as the first two predicted categories contain the true label, the fault tolerance of the model can be effectively improved. The base classifiers produced by this joint training strategy are used for the proposed ensemble learning, and several CNN models are trained separately with the training strategy. In the first step, the CNN is trained with a cross-entropy loss function; the network is initialised from an ImageNet pre-trained model and fine-tuned on the data set. The cross-entropy loss function was created to optimise the top-1 loss, so training the model with it yields a network with high top-1 accuracy. In the second step, a top-k loss function established for the first k correct labels is used for fine-tuning, with the model trained in the first step as the initial weights of the top-k training, so that an optimised network with unchanged top-1 accuracy and higher top-2 accuracy is obtained and the recognition capability of the model is improved. The network is trained by the two loss functions together to form a unified network. One possible formulation of the Top-k loss is sketched below.
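The improved loss itself is not spelled out in this text; purely as an illustration, a minimal PyTorch sketch of one common differentiable top-k surrogate (a top-k hinge loss in the spirit of the smooth top-k losses of Berrada et al., ICLR 2018, cited below, not necessarily the exact formulation of the invention) follows.

```python
import torch

def topk_hinge_loss(logits: torch.Tensor, target: torch.Tensor,
                    k: int = 2, margin: float = 1.0) -> torch.Tensor:
    """Penalise samples whose true-class score does not exceed the k-th
    largest non-target score by at least `margin` (illustrative surrogate)."""
    target_score = logits.gather(1, target.unsqueeze(1)).squeeze(1)   # s_y
    masked = logits.clone()
    masked.scatter_(1, target.unsqueeze(1), float("-inf"))            # remove s_y
    kth_nontarget = masked.topk(k, dim=1).values[:, -1]               # k-th largest of the rest
    return torch.clamp(margin + kth_nontarget - target_score, min=0).mean()
```

In the second training step such a function would be used in place of the cross-entropy criterion, e.g. `topk_hinge_loss(model(x), y, k=2)`.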
The system output module processes the output of the classifier and outputs the decision result.
Compared with the traditional cross-entropy loss function, the improved image classification algorithm with step-by-step training of the Top-k loss function disclosed by the invention achieves higher top-1 and top-2 classification accuracy while reducing the complexity of manual work. Without changing the model architecture, the top-1 and top-2 accuracy is improved relative to the cross-entropy loss function. Compared with the traditional training method, the improved algorithm has better robustness and performs better on the CIFAR-10 data set.
Drawings
FIG. 1 is a flow chart of an image classification algorithm using deep learning;
FIG. 2 is a flow chart of image classification using two loss functions for step-wise training;
FIG. 3 is a graph comparing the accuracy of the conventional training method and of the proposed step-by-step training with the Top-k loss function;
Detailed Description
The present invention is described in further detail below with reference to the attached drawings.
First, a data set is selected. The data set we select is CIFAR-10, a benchmark data set widely used for image classification. The data set contains 60000 images divided into 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck), with 50000 images for training and 10000 images for testing, all of size 32 × 32 pixels. The data set details are shown in the following table:
TABLE 1 CIFAR-10 data set information Table
Data set | Training set | Validation set | Test set | Categories |
---|---|---|---|---|
CIFAR-10 | 50000 | 0 | 10000 | 10 |
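A minimal sketch of loading the CIFAR-10 split of Table 1 with torchvision; the `ToTensor` transform is a placeholder for the train/test preprocessing pipelines sketched earlier, and the batch size and worker count are assumptions.

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

transform = T.ToTensor()   # placeholder; substitute the train/test pipelines above
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                         download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=4)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False, num_workers=4)

print(len(train_set), len(test_set), train_set.classes)   # 50000, 10000, the 10 class names
```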
Selection of the deep learning models: the models selected for testing are as follows (an instantiation sketch follows the list):
Inception-v3. Inception-v3 has strong image feature extraction and classification performance and is a widely used image recognition model. It consists of symmetric and asymmetric building blocks, including convolutional, average pooling, max pooling, padding and fully connected layers. Batch normalization is used extensively throughout the model and applied to the activation inputs.
DPN92. The DPN uses a High-Order RNN structure (HORNN) to link DenseNet and ResNet, showing that DenseNet can extract new features from previous layers, while ResNet essentially reuses the features extracted by previous layers. By combining the advantages of the two structures, the DPN network effectively improves classification efficiency.
ResNet. ResNet uses identity mappings to pass the output of an earlier layer directly to a later layer, so that increasing the depth of the network does not increase the error: a deeper network does not raise the training-set error, and the vanishing gradient problem is alleviated.
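For illustration, the three backbones could be instantiated roughly as follows; the torchvision/timm calls and weight-enum names are assumptions about the toolchain, not part of the original disclosure.

```python
import torch.nn as nn
import torchvision.models as models

num_classes = 10  # CIFAR-10

# ResNet-18 initialised from ImageNet weights, classifier replaced for 10 classes.
resnet18 = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
resnet18.fc = nn.Linear(resnet18.fc.in_features, num_classes)

# Inception-v3 (torchvision's version expects 299x299 inputs and has an
# auxiliary classifier head, whose output layer is replaced as well).
inception = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
inception.fc = nn.Linear(inception.fc.in_features, num_classes)
inception.AuxLogits.fc = nn.Linear(inception.AuxLogits.fc.in_features, num_classes)

# DPN-92 is not shipped with torchvision; the timm package provides one implementation:
# import timm
# dpn92 = timm.create_model("dpn92", pretrained=True, num_classes=num_classes)
```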
The models were trained in the experimental environment shown in the table below.
Table 2 experimental environment table
The model training parameters are shown in the following table. The batch size is adjusted according to the number of model parameters to reduce the GPU memory occupied and the model training time; a configuration sketch follows the parameter table.
TABLE 3 Model initialization parameter table
Parameter | Value |
---|---|
Learning rate | 0.1 |
Learning rate decay | 0.1 |
Momentum | 0.9 |
Batch size | 32/24 |
Training epochs | 200 |
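A configuration sketch matching Table 3, assuming a PyTorch workflow; the choice of SGD and the decay milestones are assumptions, since the table lists values but not the optimiser or the decay schedule.

```python
import torch

# model: any of the backbones instantiated in the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 150],  # assumed epochs
                                                 gamma=0.1)              # learning-rate decay
num_epochs = 200
batch_size = 32   # reduced to 24 for the larger models to fit GPU memory
```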
In the training process, the image classification algorithm with step-by-step training of the improved Top-k loss function is used, as shown in FIG. 2. Training a model with a lower top-2 error is simpler than improving top-1 classification: as long as the first two predicted categories contain the true label, the fault tolerance of the model can be effectively improved. The base classifiers produced by this joint training strategy are used for the proposed ensemble learning, and several CNN models are trained separately with the strategy. In the first step, the CNN is trained with a cross-entropy loss function; the network is initialised from an ImageNet pre-trained model and fine-tuned on the data set. Since the cross-entropy loss function was created to optimise the top-1 loss, training the model with it yields a network with high top-1 accuracy. In the second step, the top-k loss function established for the first k correct labels is used for fine-tuning, with the model from the first step as the initial weights, so that an optimised network with unchanged top-1 accuracy and higher top-2 accuracy is obtained and the recognition capability of the model is improved. However, because of label ambiguity and feature uncertainty there is no perfect recognition method; besides improving the efficiency of feature extraction, the predicted label may be hidden between top-1 and top-k, and the improved top-2 classification accuracy makes it possible to extract and use the top-2 label effectively. The network is trained by the two loss functions together to form a unified network; the two training steps are sketched below.
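A minimal sketch of the two training steps, reusing the loaders, backbone and the illustrative `topk_hinge_loss` surrogate from the earlier sketches; the stage-2 learning rate and the checkpoint file name are assumptions.

```python
import torch
import torch.nn as nn

def run_epochs(model, loader, criterion, optimizer, epochs):
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

# Step 1: fine-tune the ImageNet-initialised model with cross-entropy (top-1 oriented).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
run_epochs(model, train_loader, nn.CrossEntropyLoss(), optimizer, epochs=200)
torch.save(model.state_dict(), "stage1.pth")

# Step 2: reload the step-1 weights and fine-tune with the Top-k loss (k=2),
# aiming to raise top-2 accuracy while preserving top-1 accuracy.
model.load_state_dict(torch.load("stage1.pth"))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)   # assumed lower lr
run_epochs(model, train_loader,
           lambda out, y: topk_hinge_loss(out, y, k=2), optimizer, epochs=200)
```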
After training, the models were tested; the results for the three deep learning models ResNet, DPN92 and Inception-v3 with the traditional cross-entropy loss function are shown in the following table:
TABLE 3 Classification results on the CIFAR-10 data set using the cross-entropy loss function
Method | Accuracy (%) |
---|---|
ResNet18 | 96.50 |
DPN92 | 97.56 |
Inception-v3 | 97.19 |
The same three models, ResNet, DPN92 and Inception-v3, were also tested after step-by-step training with the proposed Top-k loss function; the comparison is shown in the following table:
Table 4 Comparison of classification results using different loss functions with the Inception-v3, DPN92 and ResNet18 models on the CIFAR-10 data set.
The improved image classification algorithm with step-by-step training of the Top-k loss function attempts to reduce the top-2 generalisation error of the model while guaranteeing top-1 precision; by constraining the top-k loss value, the importance of several of the model's output values and of the top-k loss is emphasised, further improving classification performance. The two loss functions were compared on the CIFAR-10 data set. A base classifier using the top-k loss function is superior to one using the cross-entropy loss function in both top-1 and top-2 precision, showing that the classification performance of the model improves after the top-k loss function is applied. Specifically, after applying the top-k loss function, the top-1 precision of the models improves by about 0.03%–0.2% and the top-2 precision by about 0.2%–0.6% on the CIFAR-10 data set; FIG. 3 shows the stability of the top-k loss function on the CIFAR-10 data set. A sketch of evaluating the top-1 and top-2 accuracies is given below.
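A sketch of computing top-1 and top-2 accuracy on the test loader; the metric is standard, but the function itself is an illustration rather than the original code.

```python
import torch

@torch.no_grad()
def topk_accuracy(model, loader, ks=(1, 2)):
    """Percentage of test images whose true label is among the k highest-scoring
    predictions, for each k in `ks`."""
    model.eval()
    correct = {k: 0 for k in ks}
    total = 0
    for images, labels in loader:
        preds = model(images).topk(max(ks), dim=1).indices           # (batch, max_k)
        for k in ks:
            correct[k] += (preds[:, :k] == labels.unsqueeze(1)).any(dim=1).sum().item()
        total += labels.size(0)
    return {k: 100.0 * correct[k] / total for k in ks}

# e.g. topk_accuracy(model, test_loader) returns the top-1 and top-2 accuracies
```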
It will be understood by those skilled in the art that the foregoing is only an exemplary embodiment of the present invention and is not intended to limit the invention to the particular forms disclosed; various modifications, substitutions and improvements within the spirit and scope of the invention are possible and fall within the scope of the appended claims.
Claims (4)
1. An improved image classification algorithm with step-by-step training of a Top-k loss function, characterized by comprising the following modules:
(1) image data preprocessing module
(2) Deep learning feature extraction module
(3) A system prediction output module.
2. The image classification algorithm with step-by-step training of the improved Top-k loss function according to claim 1, characterized in that the image data preprocessing module in module (1) preprocesses the input image data; the deep learning feature extraction module in module (2) performs step-by-step training with the improved Top-k loss function using deep learning, and the classifier module is the fully connected layer of the deep learning network; and the system prediction output module in module (3) processes the output of the classifier and outputs the decision result.
3. The image classification algorithm with step-by-step training of the improved Top-k loss function according to claim 1, characterized in that the step-by-step training with the improved Top-k loss function can replace the cross-entropy loss function to construct a better network with improved top-1 and top-2 accuracy. The traditional cross-entropy loss function is expressed mathematically as:
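The equation itself is not reproduced in this text; in its standard form (an assumption about the notation, not the original formula), the cross-entropy loss over C classes is

```latex
\mathcal{L}_{\mathrm{CE}}(s, y) \;=\; -\sum_{i=1}^{C} y_i \log p_i,
\qquad p_i \;=\; \frac{e^{s_i}}{\sum_{j=1}^{C} e^{s_j}},
```

where s denotes the network scores (logits), p the softmax probabilities and y the one-hot true label.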
The top-k loss depends on whether y is part of the top-k predictions, which is equivalent to comparing the top k predictions with the true label. The Top-k loss function is expressed mathematically as:
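Again the formula is not reproduced here; the standard top-k (0/1) error on which such losses are built (an assumption about the exact form used in the claim) is

```latex
\ell_{k}(s, y) \;=\; \mathbb{1}\!\left[\, y \notin \operatorname{top\text{-}k}(s) \,\right]
\;=\; \mathbb{1}\!\left[\, s_{[k]} > s_y \,\right],
```

where s_{[k]} denotes the k-th largest score; the improved loss of the claim is a differentiable surrogate of this quantity, one possible surrogate being sketched in the description above.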
4. The image classification algorithm with step-by-step training of the improved Top-k loss function according to claim 1, characterized in that the feature extraction capability of the model is improved, a better classification effect is obtained, and the robustness of the model is improved, all without modifying the network structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210185010.1A CN114549906A (en) | 2022-02-28 | 2022-02-28 | Improved image classification algorithm for step-by-step training of Top-k loss function |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210185010.1A CN114549906A (en) | 2022-02-28 | 2022-02-28 | Improved image classification algorithm for step-by-step training of Top-k loss function |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114549906A true CN114549906A (en) | 2022-05-27 |
Family
ID=81679445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210185010.1A Pending CN114549906A (en) | 2022-02-28 | 2022-02-28 | Improved image classification algorithm for step-by-step training of Top-k loss function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114549906A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034281A (en) * | 2018-07-18 | 2018-12-18 | 中国科学院半导体研究所 | The Chinese handwritten body based on convolutional neural networks is accelerated to know method for distinguishing |
CN109635947A (en) * | 2018-12-14 | 2019-04-16 | 安徽省泰岳祥升软件有限公司 | Machine reading based on answer sampling understands model training method and device |
CN110245592A (en) * | 2019-06-03 | 2019-09-17 | 上海眼控科技股份有限公司 | A method of for promoting pedestrian's weight discrimination of monitoring scene |
CN112580507A (en) * | 2020-12-18 | 2021-03-30 | 合肥高维数据技术有限公司 | Deep learning text character detection method based on image moment correction |
CN113962329A (en) * | 2021-11-15 | 2022-01-21 | 长沙理工大学 | Novel image recognition algorithm based on deep ensemble learning |
Non-Patent Citations (1)
Title |
---|
LEONARD BERRADA ET AL.: "Smooth Loss Functions for Deep Top-k Classification", ICLR 2018, 31 December 2018 (2018-12-31), pages 1-25 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109685115B (en) | Fine-grained conceptual model with bilinear feature fusion and learning method | |
CN103605972B (en) | Non-restricted environment face verification method based on block depth neural network | |
CN107967484B (en) | Image classification method based on multi-resolution | |
CN110321967B (en) | Image classification improvement method based on convolutional neural network | |
CN109002755B (en) | Age estimation model construction method and estimation method based on face image | |
CN106022363B (en) | A kind of Chinese text recognition methods suitable under natural scene | |
CN110188827B (en) | Scene recognition method based on convolutional neural network and recursive automatic encoder model | |
CN109241995B (en) | Image identification method based on improved ArcFace loss function | |
CN103955702A (en) | SAR image terrain classification method based on depth RBF network | |
CN110059769B (en) | Semantic segmentation method and system based on pixel rearrangement reconstruction and used for street view understanding | |
CN105184298A (en) | Image classification method through fast and locality-constrained low-rank coding process | |
CN113128478B (en) | Model training method, pedestrian analysis method, device, equipment and storage medium | |
CN113283590B (en) | Defending method for back door attack | |
KR102645698B1 (en) | Method and apparatus for face recognition robust to alignment shape of the face | |
CN112101364B (en) | Semantic segmentation method based on parameter importance increment learning | |
CN112232395B (en) | Semi-supervised image classification method for generating countermeasure network based on joint training | |
CN106611156B (en) | Pedestrian identification method and system based on self-adaptive depth space characteristics | |
CN111401156A (en) | Image identification method based on Gabor convolution neural network | |
CN115995040A (en) | SAR image small sample target recognition method based on multi-scale network | |
Yu et al. | Exemplar-based recursive instance segmentation with application to plant image analysis | |
CN114626476A (en) | Bird fine-grained image recognition method and device based on Transformer and component feature fusion | |
CN110188646B (en) | Human ear identification method based on fusion of gradient direction histogram and local binary pattern | |
CN111310820A (en) | Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration | |
CN113962329A (en) | Novel image recognition algorithm based on deep ensemble learning | |
CN113297964A (en) | Video target recognition model and method based on deep migration learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||