CN113962329A

CN113962329A - Novel image recognition algorithm based on deep ensemble learning

Info

Publication number: CN113962329A
Application number: CN202111351249.3A
Authority: CN
Inventors: 邓泽林; 胡钰聪
Original assignee: Changsha University of Science and Technology
Current assignee: Changsha University of Science and Technology
Priority date: 2021-11-15
Filing date: 2021-11-15
Publication date: 2022-01-21
Anticipated expiration: 2041-11-15
Also published as: CN113962329B

Abstract

The invention discloses a new image recognition algorithm based on deep ensemble learning, comprising an image data preprocessing module, a deep algorithm module, a deep learning algorithm integration module and a prediction output module. The image data preprocessing module converts an input image into a model input matrix and augments the data. The deep algorithm module selects the best-performing models from a plurality of deep models; the selected models serve as base learners for ensemble learning and are trained independently, which increases the diversity among the classifiers and enhances their independence. The deep learning algorithm integration module integrates the models according to a fusion strategy and makes maximal use of the model outputs by combining multiple output results from each of the models. The prediction output module outputs the decision of the integration module. By making full use of the outputs of the base learners, the ensemble-learning-based image recognition network alleviates the class ambiguity problem, effectively improves the fault tolerance of the model and achieves higher recognition accuracy.

Description

Novel image recognition algorithm based on deep ensemble learning

Technical Field

The invention relates to the field of image recognition, in particular to a novel image recognition method based on deep ensemble learning.

Background

Ensemble learning can solve practical application problems efficiently and has therefore attracted much attention in the field of machine learning. It was initially aimed at improving the accuracy of automated decision-making systems and has since been applied successfully to a wide variety of machine learning problems. It works primarily by combining multiple learners, and the combined model can solve many problems that a single model cannot.

Dasarathy et al. first proposed the ensemble learning concept, which enhances the performance of a recognition system by deploying a composite classifier system consisting of two or more component classifiers of different types. The deployment domain of each component classifier is determined by an optimal partitioning of the problem space, and the criterion for this partitioning is determined in each case by the characteristics of the component classifiers. The concept, the related methods and the benefits that may be expected from such a composite classifier system are illustrated by partitioning the feature space to optimize the deployment of a composite system with linear and nearest-neighbor classifiers as its components.

Hansen et al. proposed an ensemble model based on neural networks and optimized the network parameters and architecture with cross-validation. By combining an ensemble of similar networks, the model reduces the residual generalization error and generalizes better.

Kittler et al. developed a general theoretical framework for combining classifiers that use different pattern representations and showed that many existing schemes can be regarded as special cases of compound classification, in which all pattern representations are used jointly to make a decision. Experimental comparison of various classifier combination schemes shows that the combination rule developed under the most restrictive assumptions, the sum rule, outperforms the other schemes. A sensitivity analysis of the various schemes to estimation errors shows that this finding can be justified theoretically.

Cheng et al. studied several widely used ensemble methods for image recognition tasks, including unweighted averaging, majority voting, the Bayes optimal classifier and the (discrete) super learner, with deep neural networks as candidate algorithms. Several experiments were designed in which the candidate algorithms were the same network structure at different model checkpoints within a single training run, networks of the same structure trained multiple times stochastically, and networks of different structures. The over-confidence phenomenon of neural networks and its influence on the ensemble methods were also studied. In all experiments, the super learner achieved the best performance among the ensemble methods.

Hafiz et al. proposed a simple, sequential and efficient ensemble learning method that uses multiple deep networks to address the shortcomings of traditional machine learning methods on complex data. The deep network used in the ensemble is ResNet-50. The model draws inspiration from binary decision (classification) trees.

In summary, although different ensemble methods have achieved significant recognition rates on image classification tasks, some problems remain. Existing ensemble classifiers change only the model structure or the training procedure; they do not make full use of the outputs of the base classifiers, and the feature information wasted by ignoring the remaining output results degrades the final classification. The invention therefore provides a new classifier based on ensemble learning and applies it to an image classification task, extracting high-level image features and improving accuracy through the diversity of a plurality of models.

Disclosure of Invention

Compared with traditional methods, the image recognition system based on ensemble learning achieves high recognition accuracy on images and extracts features more efficiently, overcoming the low feature extraction efficiency and the classification difficulty of traditional methods. To address the problem that traditional ensemble classifiers change only the model structure or the training procedure and do not make full use of the outputs of the base classifiers, a novel ensemble classifier for image classification is provided. It combines several output predictions from each of several models, so that the features extracted by the base models are used as fully as possible, which increases model utilization and robustness and improves the classification result.

The solution proposed by the invention is to perform image recognition with a network based on deep ensemble learning, which comprises an image data preprocessing module, a deep algorithm module, a deep learning algorithm integration module and a prediction output module.

The image data preprocessing module operates in two modes, training and testing. Its functions are to preprocess the image data and convert it into an input form accepted by the image recognition models. In training mode, the image categories, i.e. the ground-truth labels, must also be annotated so that the data set required by the machine learning method can be constructed.

In training mode, image data preprocessing works as follows: the image data is converted into matrix data, and each sample image is subjected to preprocessing such as uniform upsampling, center cropping and rotation to expand the amount of data. Once training is complete, the parameters of the convolutional neural networks have been updated and a network that can accurately recognize the specified targets is obtained. Upsampling brings all images to a fixed resolution; in the embodiment below, the resolution after upsampling is 224 × 224, except for Inception-v3, which uses 299 × 299.

In test mode, image data preprocessing works as follows: the image data is converted into matrix data, and each sample image is uniformly upsampled to a fixed resolution without data augmentation. In the embodiment below, the resolution after upsampling is 224 × 224, except for Inception-v3, which uses 299 × 299.
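The two preprocessing modes can be illustrated with a minimal pure-Python sketch. The interpolation method, the crop ratio and the rotation angle are not specified in the text; nearest-neighbour upsampling, a 7/8 center crop and a single 90° rotation are assumptions made here for illustration, and all function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values (illustrative type)


def upsample_nearest(img: Image, size: int) -> Image:
    """Resize a square image to size x size by nearest-neighbour sampling (assumed method)."""
    n = len(img)
    return [[img[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]


def center_crop(img: Image, size: int) -> Image:
    """Keep the central size x size window of the image."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def preprocess(img: Image, target: int, train: bool) -> List[Image]:
    """Return the upsampled image, plus augmented copies in training mode."""
    base = upsample_nearest(img, target)
    if not train:
        return [base]  # test mode: upsampling only, no data augmentation
    # Training mode: add a center-cropped copy (resized back) and a rotated copy.
    cropped = upsample_nearest(center_crop(base, target * 7 // 8), target)
    return [base, cropped, rotate90(base)]
```

Calling `preprocess(img, 224, train=True)` on a 28 × 28 image yields three 224 × 224 images, while `train=False` yields only the upsampled one; passing `target=299` gives the input size stated for Inception-v3.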

The deep algorithm module covers the selection of the base classifiers. Following the principle that the base classifiers should provide as much complementary information as possible, a plurality of neural networks are tested and those with the best classification performance are selected as base classifiers. The selected classifiers use different classification mechanisms, so that they can provide complementary information to one another. In addition, each selected classifier is a complex model that is known to achieve high classification performance. Training these classifiers with the same training parameters helps increase diversity and enhance independence between the classifiers.

The deep learning algorithm integration module contains the proposed integration algorithm. During model training, the models are trained independently, which enhances their independence. A large number of experiments show that the correct label is often among the top-k predicted values; treating the top-k outputs as results of the model's feature extraction and using a plurality of models alleviates the class ambiguity problem. Each classifier outputs two results, its top-1 and top-2 predictions. The top-1 predictions of the models are placed in a top-1 prediction pool and the top-2 predictions in a top-2 prediction pool. When the top-1 prediction pool does not yield a majority class, the top-1 and top-2 prediction pools are merged and the class with the most labels is output as the class of the sample. If there is still no class with the most labels, each model is treated as an independent piece of "evidence", the confidence distributions of the evidence are fused, and the label corresponding to the maximum predicted value of the model with the highest confidence is taken as the final classification result.

The system output module processes the output of the classifier and outputs the prediction result.

The beneficial effects of the invention are as follows:

Addressing the image recognition problem, the invention provides a novel multi-model, multi-output network based on ensemble learning that uses features more efficiently and improves image recognition. It solves the problem that existing deep ensemble learning methods do not make full use of the outputs of the base learners: combining the top-2 results of a plurality of models alleviates the ambiguity problem and effectively improves the fault tolerance of the model. Using the outputs of multiple models together with the confidence distribution yields more useful features and improves the recognition capability of the model.

Drawings

FIG. 1 is a diagram of the base classifier EfficientNet model used in the present invention.

FIG. 2 is a diagram of a base classifier VGGNet model used in the present invention.

FIG. 3 is a diagram of the base classifier Inception-v3 network model used in the present invention.

FIG. 4 is a diagram of multiple model independent training based on ensemble learning in an embodiment of the present invention.

FIG. 5 is a diagram of a model for fusing multiple output results of multiple models according to an embodiment of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the attached drawings.

Examples

First, we selected a data set: the DermaMNIST skin lesion data set, which contains 7 classes and consists of 7,007 training images, 1,003 validation images and 2,005 test images, each of size 28 × 28. Details of the data set are shown in the following table:

TABLE 1 Overview of the DermaMNIST data set

Dataset	Train Size	Val Size	Test Size	Classes
DermaMNIST	7007	1003	2005	7

The next step is the selection of the base classifiers. Following the principle that the classifiers should provide as much complementary information as possible, a plurality of deep neural networks are tested and those with the best classification performance are selected as base classifiers.

The neural network classifiers we selected for testing are:

(1) VGGNet. VGGNet is a deep convolutional neural network developed jointly by the Visual Geometry Group of the University of Oxford and researchers at Google DeepMind. It explores the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking small 3×3 convolution kernels and 2×2 max pooling layers, VGGNet took second place in the classification task and first place in the localization task of ILSVRC 2014, with a top-5 error rate of 7.5%.

(2) EfficientNet. EfficientNet is one of the most advanced neural network architectures at present. It provides a compound scaling capability for complex networks: the network depth, the network width and the resolution of the input image are adjusted jointly to obtain the network parameters best suited to specific requirements, so that the network is advantageous in both size and recognition accuracy. Internally, EfficientNet is built from a plurality of MBConv convolution blocks, which use a structure similar to residual connections and thereby mitigate the vanishing gradient problem.

(3) ResNet. ResNet uses identity mappings to pass the output of an earlier layer directly to later layers, so that increasing the depth of the network does not increase the error on the training set, and the vanishing gradient problem is mitigated.

(4) Inception-v3. Inception-v3 has strong image feature extraction and classification performance and is a widely used image recognition model. It consists of symmetric and asymmetric building blocks, including convolution, average pooling, max pooling, dropout and fully connected layers. Batch normalization is used extensively throughout the model and is applied to the activation inputs. Two 3×3 convolution kernels replace one 5×5 kernel and three 3×3 kernels replace one 7×7 kernel, which reduces the number of parameters and accelerates computation; n×n convolution kernels are further factorized into 1×n and n×1 kernels. The feature map size is reduced while the number of channels is increased.

(5) DPN. The DPN (dual path network) model uses a higher-order RNN (HORNN) structure to relate DenseNet and ResNet, which shows that DenseNet can extract new features from earlier layers, while ResNet essentially reuses features already extracted by earlier layers. By combining the advantages of the two structures, the DPN network effectively improves classification efficiency.
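The parameter savings from the kernel factorizations described for Inception-v3 in item (4) can be checked with simple arithmetic. The sketch below counts convolution weights only (no bias) and assumes, purely for illustration, that every layer has the same number of input and output channels; the channel count of 64 is a hypothetical value.

```python
def conv_weights(kh: int, kw: int, channels: int) -> int:
    """Weights in one conv layer with `channels` input and output channels (no bias)."""
    return kh * kw * channels * channels


C = 64  # hypothetical channel count, not taken from the text

# One 5x5 kernel versus two stacked 3x3 kernels (same receptive field).
five = conv_weights(5, 5, C)
two_threes = 2 * conv_weights(3, 3, C)

# One 7x7 kernel versus three stacked 3x3 kernels.
seven = conv_weights(7, 7, C)
three_threes = 3 * conv_weights(3, 3, C)

# One n x n kernel versus a 1 x n kernel followed by an n x 1 kernel, here n = 3.
full = conv_weights(3, 3, C)
asymmetric = conv_weights(1, 3, C) + conv_weights(3, 1, C)

print(f"5x5 -> two 3x3:   {two_threes / five:.2f} of the weights")
print(f"7x7 -> three 3x3: {three_threes / seven:.2f} of the weights")
print(f"3x3 -> 1x3 + 3x1: {asymmetric / full:.2f} of the weights")
```

Under these assumptions the ratios are 18/25, 27/49 and 2/3 respectively, independent of the channel count.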

The models were trained in the experimental environment shown in the following table.

TABLE 2 Experimental environment

Name	Configuration
Operating system	Ubuntu 18.04
GPU	NVIDIA GeForce RTX 2080 Ti
CPU	Intel Xeon Processor (Skylake, IBRS), 2
RAM	16 GB
GPU-related libraries	CUDA 10.2, cuDNN 7.6
Deep learning framework	PyTorch

The model training parameters are shown in the following table. The batch size and the number of epochs are adjusted according to the number of model parameters to reduce GPU memory usage and model training time.

TABLE 3 Model initialization parameters

The models are trained and tested to obtain the test results of the five base classifiers, namely VGGNet, EfficientNet, ResNet, DPN92 and Inception-v3, which are shown in the following table:

TABLE 4 Accuracy of the different base classifiers on the DermaMNIST data set

Method	Accuracy (%)
VGGNet	76.41
EfficientNet	76.86
ResNet	73.22
DPN92	73.72
Inception-v3	76.86
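As a rough illustration of how the accuracies in Table 4 can drive the choice of base networks, the sketch below ranks the five candidates and keeps the best three. It is a simplification: the text also requires the base classifiers to have complementary mechanisms, which is not modeled here, and the function name is hypothetical.

```python
from typing import Dict, List

# Test accuracies (%) from Table 4.
accuracy: Dict[str, float] = {
    "VGGNet": 76.41,
    "EfficientNet": 76.86,
    "ResNet": 73.22,
    "DPN92": 73.72,
    "Inception-v3": 76.86,
}


def select_base_classifiers(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k model names with the highest accuracy (ranking by accuracy only)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


selected = select_base_classifiers(accuracy)
print(selected)
```

With the values above, the three selected names are EfficientNet, Inception-v3 and VGGNet.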

Finally, according to the experimental results, EfficientNet, VGGNet and Inception-v3 are selected as the base networks for integration. Their network structures are shown in FIG. 1, FIG. 2 and FIG. 3, respectively, and in the following table:

TABLE 5 Network architectures of the selected base classifiers

In the deep learning algorithm integration module, the final module of the model, an independent training mode is adopted: each base network is trained separately to obtain its base classifier. Each neural network model outputs two prediction results. Suppose the first deep neural network produces h₁(x) for the input image x, an n-dimensional column vector where n is the number of classes, the second deep neural network produces h₂(x) and the third produces h₃(x). The labels with the largest predicted value from the three networks form the first classification pool, and the labels with the second-largest predicted value are placed in the second classification pool. The independent training of the models is shown in FIG. 4. The best strategy for achieving the highest prediction performance with convolutional neural networks is to train each base classifier independently. Each independently trained model acts as an "expert" for classification: each base learner gives its own classification prediction without interference from other learners, which ensures the independence of the learners and avoids redundant information caused by such interference. Each learner is an independent CNN architecture with good prediction performance. Here we regard the base classifiers as shallow networks whose outputs are then integrated into a deeper network, which generalizes the feature extraction capability.
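The construction of the two classification pools can be sketched as follows. The probability vectors are made-up values for three models and seven classes, and the function names are hypothetical; only the pool-building rule itself comes from the description above.

```python
from typing import List, Sequence, Tuple


def top2_labels(probs: Sequence[float]) -> Tuple[int, int]:
    """Return the class indices with the largest and second-largest predicted value."""
    order = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
    return order[0], order[1]


def build_pools(outputs: Sequence[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Collect the top-1 labels of all models in pool 1 and the top-2 labels in pool 2."""
    pairs = [top2_labels(h) for h in outputs]
    return [p[0] for p in pairs], [p[1] for p in pairs]


# Illustrative outputs h1(x), h2(x), h3(x) for n = 7 classes (made-up values).
h1 = [0.05, 0.10, 0.50, 0.20, 0.05, 0.05, 0.05]
h2 = [0.02, 0.60, 0.25, 0.03, 0.04, 0.03, 0.03]
h3 = [0.10, 0.05, 0.15, 0.55, 0.05, 0.05, 0.05]

pool1, pool2 = build_pools([h1, h2, h3])
print(pool1, pool2)  # [2, 1, 3] [3, 2, 2]
```

In this example the three top-1 labels all differ, which is exactly the situation in which the second pool becomes useful.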

After the data passes through the independently trained base classifiers, the predicted probability of each class is output by a softmax function. The class with the largest probability is taken as the class of the target, i.e. Top-1, and the class with the second-largest probability is taken as the second predicted class, i.e. Top-2. In practical application, the image data is input into the trained models, and the system determines and outputs the Top-1 and Top-2 classes of the target as the input to ensemble learning; the combination of the multi-model outputs is shown in FIG. 5. Here the predicted labels are used instead of the softmax values, which prevents the extreme values produced by the softmax function of one model from adversely affecting the other base classifiers. The first classification pool is evaluated by majority voting, expressed by the mathematical formula:

where T denotes the number of classifiers and N the number of classes. That is, if the votes of the T classifiers for class j exceed half of the total number of votes, the prediction is class j; otherwise the prediction is rejected.
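One common way to write a majority-voting rule of this kind is given below. The notation is an assumption rather than the patent's own: h_i^j(x) ∈ {0, 1} indicates whether classifier i votes for class j, c_j is the label of class j, and H(x) is the ensemble output.

```latex
H(x) =
\begin{cases}
c_j, & \text{if } \displaystyle\sum_{i=1}^{T} h_i^{j}(x) > \frac{1}{2} \sum_{k=1}^{N} \sum_{i=1}^{T} h_i^{k}(x), \\[2ex]
\text{reject}, & \text{otherwise.}
\end{cases}
```

For T = 3 classifiers that each cast one vote, the right-hand side equals 1.5, so a class is accepted when at least two of the three classifiers agree on it.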

If the first classification pool rejects the prediction, i.e. no majority result is produced, the first and second classification pools are merged and majority voting is applied again to obtain the prediction. If the prediction is still rejected, i.e. the merged pools still produce no majority result, we treat each model as an independent piece of "evidence" and fuse the confidence distributions of the evidence, expressed by the mathematical formula:

The above equation gives the probability that the prediction is correct in each of the n models, where p is the probability that the prediction of a single model is correct. The higher the confidence, the more certain the model is of its output, and the label corresponding to the maximum predicted value of the model with the highest confidence is taken as the final classification result.
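The three-stage decision described above can be sketched in a few lines of Python. Two points are interpretations rather than statements from the text: the merged pool is decided by a unique most frequent label (with three models no label can exceed half of the six merged votes), and a model's confidence is taken to be its largest softmax probability. All function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def strict_majority(votes: Sequence[int]) -> Optional[int]:
    """Return the label with more than half of the votes, or None (reject)."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def unique_plurality(votes: Sequence[int]) -> Optional[int]:
    """Return the single most frequent label, or None if the top count is tied."""
    ranked = Counter(votes).most_common(2)
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    return None


def fuse(outputs: Sequence[Sequence[float]]) -> int:
    """Fuse per-model probability vectors into one predicted class index."""
    order = [sorted(range(len(h)), key=lambda j: h[j], reverse=True) for h in outputs]
    pool1 = [o[0] for o in order]  # top-1 label of every model
    pool2 = [o[1] for o in order]  # top-2 label of every model

    winner = strict_majority(pool1)  # stage 1: vote on the top-1 pool
    if winner is not None:
        return winner
    winner = unique_plurality(pool1 + pool2)  # stage 2: vote on the merged pools
    if winner is not None:
        return winner
    # Stage 3 (assumed confidence measure): trust the model whose largest
    # softmax probability is highest and return that model's top-1 label.
    best = max(range(len(outputs)), key=lambda i: max(outputs[i]))
    return pool1[best]
```

For example, with top-1 labels (2, 1, 3) and top-2 labels (3, 2, 2), stage 1 rejects, and stage 2 returns class 2 because it appears three times in the merged pool.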

The method of the invention is compared on the public DermaMNIST data set with traditional methods and with other convolutional neural network methods. Table 6 compares the recognition accuracy obtained on the DermaMNIST image data set by different algorithms and by the ensemble-learning-based method proposed by the invention. ResNet18, DenseNet, Inception-v3, EfficientNet, DPN92, MobileNetV2 and VGGNet11 are individual convolutional neural network architectures, and Ruta is an ensemble learning method proposed by others. Compared with these methods, the ensemble learning method proposed by the inventors obtains higher recognition accuracy, as shown in the following table:

TABLE 6 Performance comparison of different methods on the DermaMNIST data set

Method	Accuracy (%)
AutoKeras	74.9
Yang	76.8
ResNet18	73.22
Auto-sklearn	71.9
VGGNet11	76.41
Inception-v3	76.86
EfficientNet	76.86
MobileNetV2	75.81
DPN92	73.72
Ruta	77.40
Ours	77.55

The above embodiment describes in detail a specific implementation of the new image recognition system based on ensemble learning. It serves only to help in understanding the proposed method and the core idea of the invention; specific implementations may differ in accordance with the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. A new ensemble learning based multiple output convolutional neural network for image recognition, comprising the steps of:

(1) selecting a data set for carrying out an image classification experiment from a plurality of image classification data sets;

(2) forward propagation: inputting the image into the classifier, and obtaining a plurality of model prediction results through a plurality of convolutional neural networks, including the residual network ResNet, VGGNet, EfficientNet, DPN92 and other deep learning networks;

(3) back propagation: first, the parameters of the classifier are fixed and no gradient descent is performed on them, while back propagation and parameter updating are performed on the new classifier; then gradient descent is allowed on the classifier, and back propagation and parameter updating are performed;

(4) integrating the models according to the proposed integration strategy, comparing the resulting model with other recent single models and other ensemble models, testing the trained model using accuracy as the metric, and measuring the classification results.

2. The ensemble learning-based multi-output convolutional neural network for image classification as claimed in claim 1,

in step (2), for the plurality of new classifiers based on deep neural networks, each neural network model outputs two prediction results; suppose the first deep neural network produces h₁(x) for the input image, an n-dimensional column vector where n is the number of classes, the second deep neural network produces h₂(x) and the third produces h₃(x); the labels with the largest predicted value from the three networks form a first classification pool, the labels with the second-largest predicted value are placed in a second classification pool, and the first classification pool is evaluated by majority voting, expressed by the mathematical formula:

3. The ensemble learning-based multi-output convolutional neural network for image classification as claimed in claim 1,

the new classifier in step (2) is a classifier based on ensemble learning, ensemble learning being a machine learning method that trains a series of base learners and integrates their learning results according to a certain rule so as to obtain a better learning effect than a single learner. By using a plurality of classifiers and exploiting their diversity and independence, higher-level image features are extracted, and the performance of the recognition system is enhanced by deploying a composite classifier system consisting of two or more component classifiers of different types.

4. The ensemble learning-based multi-output convolutional neural network for image classification as claimed in claim 1,

in step (3), the new classifier uses a network with a plurality of residual modules in its convolutions, so that features are extracted without loss of features, preventing the feature degradation caused by vanishing gradients when the classifier network is too deep.