CN110070116B - Segmented selection integration image classification method based on deep tree training strategy - Google Patents
- Publication number
- CN110070116B (application CN201910274548.8A)
- Authority
- CN
- China
- Prior art keywords
- classifier
- branch
- classifiers
- layer
- deep
- Prior art date
- Legal status
- Active
Classifications
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
The invention discloses a segmented selection integration image classification method based on a deep tree training strategy. Under this strategy, a branch classifier is added at each intermediate layer of a deep convolutional neural network; by integrating the branch classifiers and applying gradient compensation, the method trains multiple branch classifiers that differ from one another and can classify on intermediate-layer features, producing a candidate base classifier set. A segmented selection integration procedure based on accuracy and difference screening then filters the candidate base classifiers into a selected base classifier set, and the selected base classifier set is weighted and integrated into the final classification model. By combining gradient compensation, feature fusion, deep learning and ensemble learning for image classification, the invention improves the training efficiency of the deep neural network, raises the classification performance of the classifier, reduces the training cost of ensemble learning, and enhances the generalization and accuracy of the final classifier to the greatest extent.
Description
Technical Field
The invention relates to the field of computer vision and supervised learning, in particular to a segmented selection integration image classification method based on a deep tree training strategy.
Background
Image classification is a major research direction in computer vision and plays an important role in intelligent data processing centered on images. Because image classification has high theoretical and practical value, many researchers at home and abroad have proposed image classification models based on machine learning. Traditional machine learning methods proceed in three steps: feature extraction, feature selection and classification; deep learning is one family of machine learning methods, and the widely applied deep convolutional neural network is currently the best-performing deep learning model.
In 2012, A. Krizhevsky et al. proposed the eight-layer convolutional neural network AlexNet in "ImageNet Classification with Deep Convolutional Neural Networks"; it won that year's ImageNet image classification competition, surpassing the runner-up by roughly ten percentage points, and brought convolutional neural networks back into the spotlight. Various improved convolutional neural network architectures then appeared in rapid succession, representative examples being VGG, GoogLeNet and ResNet.
However, deep learning faces two main problems: (1) as the number of network layers grows, the gradient tends to vanish, making the model hard to train and hard to converge to a good local optimum; (2) an image classification model built on a deep neural network generally bases its final decision only on the deepest features — the single most abstract group of features extracted from the image — but a single group of features is not robust on complex tasks and is therefore not necessarily the best basis for classification; in other words, the traditional deep neural network lacks stability and robustness for image classification tasks. For example, C. Szegedy et al. proposed the 22-layer convolutional neural network GoogLeNet in "Going Deeper with Convolutions" (2014); it employs two branch classifiers, each with 1 convolutional layer, 1 pooling layer and 2 fully connected layers, but the gradient compensation effect is weakened because the compensating gradient must pass through many layers. Moreover, its final classification still relies only on the deepest features, so its stability and robustness on complex image classification problems remain unsatisfactory.
Methods combining deep learning with ensemble learning, which are flexible and can improve stability and robustness, have therefore been developed; however, two main problems remain in such methods: (1) training a single deep neural network is already expensive, so training many of them as ensemble members with traditional deep training methods carries a very high time cost; (2) the core idea of ensemble learning is to train and integrate multiple ensemble members that are both highly accurate and mutually different, thereby improving the generalization of the ensemble model; yet classification accuracy and classification difference are not independent of each other, so maximizing the generalization of the ensemble model while accounting for both remains a key point and difficulty in the field of ensemble learning.
Disclosure of Invention
Aiming at the defects and shortcomings of the prior art, the invention provides a segmented selection integration image classification method based on a deep tree training strategy; through technical means such as gradient compensation, feature fusion, and segmented selection integration based on accuracy and difference screening, it alleviates the vanishing gradient problem, improves the efficiency of deep network training, and enhances the generalization and accuracy of the final classifier to the greatest extent.
In order to achieve the purpose, the invention adopts the technical scheme that:
the segmented selection integration image classification method based on the deep tree training strategy comprises the following steps:
s1: a deep tree training strategy is adopted: a branch classifier is added at each intermediate layer of a deep convolutional neural network, each branch classifier contains only one fully connected layer, and the branch classifiers are not discarded during training or testing; the branch classifiers apply gradient compensation to the shallow-layer weights; training the deep convolutional neural network containing the branch classifiers yields multiple branch classifiers that differ from one another and can classify on intermediate-layer features, i.e. a candidate base classifier set containing multiple branch classifiers;
s2: obtain the final integration members and generate the final classification model using a segmented selection integration method based on accuracy and difference screening.
Further, the construction process of the branch classifier in step S1 is as follows:
s1.11: acquire the feature map from the intermediate convolutional layer; the specific implementation is as follows:
the input picture training data set with class labels is S = {(X_i, Y_i) | i = 1, 2, …, N}, where each picture X_i ∈ R^n (each picture is n-dimensional) and Y_i ∈ {1, …, K} is the true class label of picture X_i (K is the total number of categories);
without considering the bias term, the convolution and activation calculation of convolutional layer m (m ∈ {1, …, M}, M being the total number of convolutional layers) can be expressed as in formula (A1), and the pooling calculation as in formula (A2):

Q^m = f(w^m * Z^(m-1)), with Z^0 ≡ X_i (A1)

Z^m = g(Q^m) (A2)

where w^m is the weight of convolutional layer m to be learned, * is the convolution operation, and Z^(m-1) is the feature map produced by layer m-1; the function f(μ) = max(0, μ) is the ReLU activation, which accelerates network training and yields a better-performing model, and g(Q^m) denotes the max pooling function;
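As an illustration (not part of the patent), one convolutional stage following formulas (A1)–(A2) can be sketched in plain Python; the toy input, kernel values and pooling size below are assumed:

```python
def relu(x):
    # f(mu) = max(0, mu), the ReLU activation of formula (A1)
    return max(0.0, x)

def conv2d(z, w):
    # Valid-mode 2D convolution of feature map z with kernel w (bias omitted),
    # i.e. the w^m * Z^(m-1) term in formula (A1); z and w are lists of rows.
    kh, kw = len(w), len(w[0])
    oh, ow = len(z) - kh + 1, len(z[0]) - kw + 1
    return [[sum(z[i + a][j + b] * w[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]

def max_pool(q, size=2):
    # g(Q^m): non-overlapping max pooling, formula (A2)
    oh, ow = len(q) // size, len(q[0]) // size
    return [[max(q[i * size + a][j * size + b] for a in range(size) for b in range(size))
             for j in range(ow)] for i in range(oh)]

# One stage Z^1 = g(f(w^1 * Z^0)) on an assumed 4x4 input and 2x2 kernel
z0 = [[1.0, 2.0, 0.0, 1.0],
      [0.0, 1.0, 3.0, 1.0],
      [2.0, 0.0, 1.0, 0.0],
      [1.0, 1.0, 0.0, 2.0]]
w1 = [[1.0, -1.0],
      [0.0, 0.5]]
q1 = [[relu(v) for v in row] for row in conv2d(z0, w1)]
z1 = max_pool(q1)   # z1 == [[3.5]]
```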
s1.12: flatten the feature map; the specific implementation is as follows:
the convolutional layers extract features from the input data layer by layer; the feature map extracted by the last convolutional layer is a two-dimensional feature matrix, and the flattening operation converts this matrix into a one-dimensional feature vector, which is then fed into a classifier for the final decision calculation; the neurons between the flattened one-dimensional feature vector and the classifier are fully connected;
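The flattening operation described above can be sketched as a simple row-major conversion (an illustration only; the toy feature matrix is assumed):

```python
def flatten(feature_map):
    # Row-major flattening of a two-dimensional feature matrix into the
    # one-dimensional feature vector fed to the fully connected classifier.
    return [v for row in feature_map for v in row]

fm = [[0.0, 3.5],
      [2.5, 2.0]]
vec = flatten(fm)   # [0.0, 3.5, 2.5, 2.0]
```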
s1.13: combine the flattened feature vector with the classifier in fully connected fashion; the specific implementation is as follows:

s1.13.1: in the fully connected layer, let V^(m) and θ^(m) denote the feature vector and weight matrix input to branch classifier m, and b^(m) its bias term; the final-layer output vector O^(m) of branch classifier m is calculated as in formula (A3):

O^(m) = θ^(m) V^(m) + b^(m) (A3)

the output vector is fed into the classifier to compute a K-dimensional probability vector; the class k with the largest probability value is the prediction class label computed by branch classifier m for input instance X_i, and the corresponding probability value is called the score;
s1.13.2: to conveniently represent the objective function of branch classifier m in deep tree training, the convolutional layer weights (w^1, …, w^m) are combined with the fully connected weight matrix θ^(m) of branch classifier m, as in formula (A4):

W^(m) = (w^1, …, w^m, θ^(m)) (A4)
s1.13.3: use an objective function based on cross-entropy loss, which evaluates the learning effect of the deep learning model by measuring, through the cross-entropy, the distance between the prediction scores and the true class labels; the smaller the cross-entropy on the training data set, the better the model fits the training data set. The cross-entropy loss of branch classifier m on a training data set containing n instances is shown in formula (A5):

J^(m)(W^(m)) = −(1/n) Σ_{i=1}^{n} Σ_{k=1}^{K} 1{Y_i = k} log h^(m)(y = k | X_i) (A5)

where log(x) = ln(x);
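The cross-entropy loss can be illustrated with a minimal sketch (the probability vectors below are assumed toy values, not from the patent):

```python
import math

def cross_entropy(scores, labels):
    # J = -(1/n) * sum_i log h(y = Y_i | X_i): the average negative natural
    # log of the probability assigned to each instance's true class,
    # as in the cross-entropy loss of formula (A5).
    n = len(scores)
    return -sum(math.log(scores[i][labels[i]]) for i in range(n)) / n

# Toy probabilities for n = 2 instances and K = 3 classes (assumed values)
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
loss = cross_entropy(probs, [0, 1])   # -(ln 0.7 + ln 0.8) / 2
```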
s1.13.4: each branch classifier bases its final classification decision on different data characteristics, and its contribution to gradient compensation during training may also differ; therefore the final objective function of a convolutional neural network using the deep tree training strategy is a weighted integration of the branch classifiers' objective functions, as in formula (A7):

J(W, θ_full) = Σ_{m=1}^{M} β^(m) J^(m)(W^(m)) (A7)

where β^(m) is the weight assigned to branch classifier m. This yields the objective function of the deep convolutional neural network trained with the deep tree training strategy; finding a set of weights W and θ_full that minimizes J(W, θ_full) means the classification performance of the model is best.
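The weighted integration of per-branch objectives in formula (A7) amounts to a simple weighted sum; a sketch (the loss and weight values are assumed):

```python
def full_objective(branch_losses, betas):
    # J(W, theta_full) = sum_m beta^(m) * J^(m)(W^(m)) — the weighted
    # integration of the per-branch objective functions in formula (A7).
    return sum(beta * loss for beta, loss in zip(betas, branch_losses))

# Toy per-branch cross-entropy losses and branch weights (assumed values)
branch_losses = [0.9, 0.6, 0.3]
betas = [0.2, 0.3, 0.5]
j_full = full_objective(branch_losses, betas)   # 0.2*0.9 + 0.3*0.6 + 0.5*0.3
```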
Further, the process of the gradient compensation in step S1 is as follows:
s1.21: calculating the error between the prediction class mark and the real class mark of each branch classifier;
s1.22: the gradients of the branch classifiers are injected into the sharing part layer by layer in a back propagation mode, and the sharing part sums the gradients obtained from each branch classifier and is used for updating the weights. The gradient compensation is mainly expressed in a back propagation stage, and for a convolutional neural network using a deep tree training strategy, the gradient compensation is realized as follows:
the weight w^m at layer m can obtain gradients from the branch classifiers at layers m through M, so the gradient ∇w^m is calculated as in formula (A8):

∇w^m = Σ_{i=0}^{M-m} β^(m+i) g^(m+i) (A8)

where g^(m+i) and β^(m+i) denote the gradient obtained from branch classifier m + i and its assigned weight, respectively. By formula (A8), a weighted average of the gradients from the different branch classifiers is computed at each gradient compensation node, and this is the gradient used when updating the model parameters. Updating the model parameters with the average of the gradients injected by the branch classifiers therefore improves the stability of parameter updates, so the final objective function quickly reaches a minimum.
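The weighted combination of branch gradients in formula (A8) can be sketched as follows (the gradient vectors and branch weights are assumed toy values):

```python
def compensated_gradient(branch_grads, betas):
    # Formula (A8): the gradient reaching a shared layer is the
    # beta-weighted combination of the gradients injected by the branch
    # classifiers at and after that layer; the shared part sums these
    # contributions during back-propagation.
    dim = len(branch_grads[0])
    return [sum(beta * g[k] for beta, g in zip(betas, branch_grads))
            for k in range(dim)]

# Gradients for one shared weight vector from three branch classifiers
grads = [[0.2, -0.1], [0.4, 0.0], [0.1, 0.3]]   # assumed toy values
betas = [0.5, 0.3, 0.2]                          # assumed branch weights
g = compensated_gradient(grads, betas)           # approx [0.24, 0.01]
```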
Further, the specific implementation process of step S2 is as follows:
s2.1: screen the candidate base classifiers by classification accuracy: first calculate the verification accuracy of each candidate base classifier, then sort the candidates in descending order of verification accuracy, and keep the top R best-performing base classifiers;
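The accuracy-based first-stage screening reduces to a sort-and-truncate; a minimal sketch (classifier names and accuracies are assumed):

```python
def screen_by_accuracy(classifiers, accuracies, R):
    # Step S2.1: rank candidate base classifiers by verification accuracy
    # in descending order and keep the top R.
    ranked = sorted(zip(classifiers, accuracies), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:R]]

names = ["b1", "b2", "b3", "b4"]
accs = [0.71, 0.83, 0.65, 0.80]          # assumed verification accuracies
top = screen_by_accuracy(names, accs, R=2)   # ["b2", "b4"]
```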
s2.2: using a double-fault difference metric function, screen the top R base classifiers determined in step S2.1 and select, as the selected base classifier set, the subset with the largest internal difference, which makes the generalization ability of the integrated classifier best; the specific implementation is as follows:
in base classifier screening based on the double-fault difference metric function, the difference Div(d_i, d_j) between classifiers d_i and d_j can be expressed through the joint distribution of the two classifiers, as shown in formula (A9); here N^11, N^10, N^01 and N^00 denote the numbers of instances in the joint distribution of d_i and d_j — for example, N^11 is the number of instances that both classifiers classify correctly;
the top R base classifiers selected in step S2.1 are screened iteratively, and the classifier with the largest difference is added to the selected base classifier set one at a time; the difference between a classifier d_i and a classifier set D* is an average difference, calculated as in formula (A10):

Div(d_i, D*) = (1 / |D*|) Σ_{d_j ∈ D*} Div(d_i, d_j) (A10)

where |D*| denotes the number of classifiers contained in the set D*;
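As an illustration of the difference measurement, the following sketch uses a double-fault-style measure Div = 1 − N^00/N (an assumption — the patent's exact formula (A9) is not reproduced here; fewer shared errors means larger difference) together with the average difference of formula (A10):

```python
def double_fault_div(pred_i, pred_j, truth):
    # Assumed double-fault-style difference: 1 - N00/N, where N00 counts
    # instances that both classifiers misclassify.
    n00 = sum(1 for pi, pj, t in zip(pred_i, pred_j, truth) if pi != t and pj != t)
    return 1.0 - n00 / len(truth)

def avg_div(pred_i, selected_preds, truth):
    # Formula (A10): average difference between classifier d_i and set D*
    return sum(double_fault_div(pred_i, p, truth)
               for p in selected_preds) / len(selected_preds)

truth = [0, 1, 1, 0]
pred_a = [0, 1, 0, 0]   # wrong on instance 2
pred_b = [0, 0, 0, 0]   # wrong on instances 1 and 2
d = double_fault_div(pred_a, pred_b, truth)   # one shared error -> 1 - 1/4
```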
s2.3: determine the weight of each classifier in the selected base classifier set obtained in step S2.2 according to its classification error on the verification data set, taking into account the differences between the base classifiers and their contribution to the target task; the selected base classifiers d_t (t = 1, …, T) are then combined into the final classifier model as in formula (A11):

EN(x) = Σ_{t=1}^{T} α_t d_t(x) (A11)

where d_t(x) denotes the classification result (i.e. the probability vector) of the t-th classifier in the selected base classifier set D* for input instance x, and α_t is its corresponding weight. The objective function of the weighted integration is then as in formula (A12):

loss(EN) = Σ_{i=1}^{T} Σ_{j=1}^{T} α_i α_j CQ_ij (A12)

where CQ_ij = ∫ (d_i(x) − f(x))(d_j(x) − f(x)) p(x) dx, f(x) is the target output and p(x) is the distribution of sample x.

For the weighted integration, the goal is to find a set of weights such that loss(EN) takes a minimum. By the Lagrange multiplier method, the weight α_t can be expressed as in formula (A13):

α_t = ( Σ_{j=1}^{T} (CQ^{-1})_tj ) / ( Σ_{k=1}^{T} Σ_{j=1}^{T} (CQ^{-1})_kj ) (A13)
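A minimal sketch of the Lagrange-multiplier weight solution of formula (A13) — normalized row sums of the inverse of the error-correlation matrix CQ — for an assumed 2×2 case (the CQ values are illustrative, not from the patent):

```python
def ensemble_weights(cq):
    # Formula (A13): alpha_t is proportional to the t-th row sum of the
    # inverse of CQ, normalized so that the weights sum to 1.
    # The 2x2 matrix inverse is computed by hand for this sketch.
    (a, b), (c, d) = cq
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    row_sums = [sum(row) for row in inv]
    total = sum(row_sums)
    return [s / total for s in row_sums]

cq = [[0.2, 0.05],
      [0.05, 0.4]]            # assumed error-correlation matrix CQ
alphas = ensemble_weights(cq)  # approx [0.7, 0.3]
```

The less correlated and lower-error classifier (first row) receives the larger weight, as the objective loss(EN) suggests it should.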
Further, in the construction of the branch classifiers, the classifier used is a softmax classifier, an SVM classifier, or another classifier capable of multi-class classification.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the gradient compensation method supplements gradients for the deep neural network during training, alleviating the vanishing gradient problem and improving the training efficiency of the deep neural network to the greatest extent;
2. multiple branches are drawn from the intermediate hidden layers of the deep neural network to construct branch classifiers that use intermediate-layer features as their classification basis, and ensemble learning then integrates these branch classifiers; this fuses intermediate-layer and deepest-layer features, makes the greatest use of the intermediate-layer features, and improves the classification performance of the final classifier;
3. because each branch classifier has only one fully connected layer, a single training run of the deep neural network containing multiple branch classifiers yields multiple classifiers with high classification accuracy and a certain degree of difference; using these branch classifiers as integration members reduces the training cost of ensemble learning with deep neural networks as members;
4. the integration members are first screened by classification accuracy, the survivors are screened again by classification difference, and weights determined by each selected base classifier's classification error on the verification data set are used for weighted integration, enhancing the generalization and accuracy of the final classifier to the greatest extent.
Drawings
FIG. 1 is a flow chart of an algorithm for segmented selection integration based on a deep tree training strategy;
FIG. 2 is a diagram of an example of CIFAR-10 image data.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
As shown in fig. 1, the present embodiment provides a segmentation type selection integrated image classification method based on a deep tree training strategy, which includes the following steps:
s1: a deep tree training strategy is adopted: a branch classifier is added at each intermediate layer of a deep convolutional neural network, each branch classifier contains only one fully connected layer, and the branch classifiers are not discarded during training or testing; the branch classifiers apply gradient compensation to the shallow-layer weights; training the deep convolutional neural network containing the branch classifiers yields multiple branch classifiers that differ from one another and can classify on intermediate-layer features, i.e. a candidate base classifier set containing multiple branch classifiers;
s2: obtain the final integration members and generate the final classification model using a segmented selection integration method based on accuracy and difference screening.
Further, the construction process of the branch classifier in step S1 is as follows:
s1.11: acquire the feature map from the intermediate convolutional layer; the specific implementation is as follows:
the input picture training data set with class labels is S = {(X_i, Y_i) | i = 1, 2, …, N}, where each picture X_i ∈ R^n (each picture is n-dimensional) and Y_i ∈ {1, …, K} is the true class label of picture X_i (K is the total number of categories).
Without considering the bias term, the convolution and activation calculation of convolutional layer m (m ∈ {1, …, M}, M being the total number of convolutional layers) can be expressed as in formula (B1), and the pooling calculation as in formula (B2):

Q^m = f(w^m * Z^(m-1)), with Z^0 ≡ X_i (B1)

Z^m = g(Q^m) (B2)

where w^m is the weight of convolutional layer m to be learned, * is the convolution operation, and Z^(m-1) is the feature map produced by layer m-1; the function f(μ) = max(0, μ) is the ReLU activation, which accelerates network training and yields a better-performing model, and g(Q^m) denotes the max pooling function.
S1.12: flattening the characteristic diagram, wherein the specific implementation method comprises the following steps:
the convolutional layers extract features from the input data layer by layer; the feature map extracted by the last convolutional layer is a two-dimensional feature matrix, and the flattening operation converts this matrix into a one-dimensional feature vector, which is then fed into a classifier for the final decision calculation; the neurons between the flattened one-dimensional feature vector and the classifier are fully connected;
s1.13: combine the flattened feature vector with the classifier in fully connected fashion; the specific implementation is as follows:

s1.13.1: in the fully connected layer, let V^(m) and θ^(m) denote the feature vector and weight matrix input to branch classifier m, and b^(m) its bias term; the final-layer output vector O^(m) of branch classifier m is calculated as in formula (B3):

O^(m) = θ^(m) V^(m) + b^(m) (B3)

the output vector is fed into a softmax classifier to compute a K-dimensional probability vector; the class k with the largest probability value is the prediction class label computed by branch classifier m for input instance X_i, and the corresponding probability value is called the score.

Here, the softmax classifier is represented by formula (B4):

h^(m)(y = k | X_i) = exp(O_k^(m)) / Σ_{j=1}^{K} exp(O_j^(m)) (B4)

where exp(x) = e^x, Σ is the summation symbol, and h^(m)(y = k | X_i) is the probability score of sample X_i being classified as class k.
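The softmax calculation of formula (B4) can be sketched as follows (the toy output vector is assumed; subtracting the maximum is a standard numerical-stability step that does not change the result):

```python
import math

def softmax(o):
    # Formula (B4): h(y = k | X_i) = exp(O_k) / sum_j exp(O_j).
    # The max is subtracted before exponentiating for numerical stability.
    m = max(o)
    e = [math.exp(v - m) for v in o]
    s = sum(e)
    return [v / s for v in e]

scores = softmax([2.0, 1.0, 0.1])   # toy K = 3 output vector O^(m)
pred = scores.index(max(scores))    # predicted class label: 0
```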
S1.13.2: to conveniently represent the objective function of branch classifier m in deep tree training, the convolutional layer weights (w^1, …, w^m) are combined with the fully connected weight matrix θ^(m) of branch classifier m, as in formula (B5):

W^(m) = (w^1, …, w^m, θ^(m)) (B5)
s1.13.3: use an objective function based on cross-entropy loss, which evaluates the learning effect of the deep learning model by measuring, through the cross-entropy, the distance between the prediction scores and the true class labels; the smaller the cross-entropy on the training data set, the better the model fits the training data set. The cross-entropy loss of branch classifier m on a training data set containing n instances is shown in formula (B6):

J^(m)(W^(m)) = −(1/n) Σ_{i=1}^{n} Σ_{k=1}^{K} 1{Y_i = k} log h^(m)(y = k | X_i) (B6)

where log(x) = ln(x);
s1.13.4: each branch classifier bases its final classification decision on different data characteristics, and its contribution to gradient compensation during training may also differ; therefore the final objective function of a convolutional neural network using the deep tree training strategy is a weighted integration of the branch classifiers' objective functions, as in formula (B8):

J(W, θ_full) = Σ_{m=1}^{M} β^(m) J^(m)(W^(m)) (B8)

This yields the final objective function of the deep convolutional neural network trained with the deep tree training strategy; finding a set of weights W and θ_full that minimizes J(W, θ_full) means the classification performance of the model is best.
Further, the process of the gradient compensation in step S1 is as follows:
s1.21: the error between the predicted and true class labels of each branch classifier is calculated.
S1.22: the gradients of the branch classifiers are injected into the sharing part layer by layer in a back propagation mode, and the sharing part sums the gradients obtained from each branch classifier and is used for updating the weights.
Gradient compensation mainly takes effect in the back-propagation stage (i.e. the weight updating stage); the effectiveness of the deep tree training strategy can be illustrated by comparing the gradient in an ordinary convolutional neural network with that in a convolutional neural network using the strategy. For a convolutional neural network that does not use the deep tree training strategy, the gradient obtainable by the weight w^m at layer m is given by formula (B9), where ∇w^m denotes the derivative of the objective function J(W, θ_full) with respect to w^m:

∇w^m = ∂J(W, θ_full) / ∂w^m (B9)

In iterative update round i, the update of the weight w^m is shown in formula (B10), where λ is the learning rate, i.e. the step size of each weight update:

w^m_(i+1) = w^m_(i) − λ ∇w^m (B10)

Obviously, when the chain-rule factors in ∇w^m are less than 1, the back-propagated gradient shrinks layer by layer and the parameter w^m is difficult to update.
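The shrinking effect can be illustrated numerically (the per-layer factor 0.5 and depth 10 are assumed values chosen to show the decay):

```python
def backprop_gradient(factors, top_grad=1.0):
    # The gradient reaching a shallow layer is the top-layer gradient times
    # the product of the per-layer chain-rule factors (cf. formula (B9));
    # with factors below 1 it decays geometrically — the vanishing gradient.
    g = top_grad
    for f in factors:
        g *= f
    return g

# Ten layers, each contributing an assumed factor of 0.5
shallow = backprop_gradient([0.5] * 10)   # 0.5**10, under 0.001
```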
For a convolutional neural network using the deep tree training strategy, the weight w^m at layer m can obtain gradients from the branch classifiers at layers m through M, so the gradient ∇w^m is calculated as in formula (B11):

∇w^m = Σ_{i=0}^{M-m} β^(m+i) g^(m+i) (B11)

where g^(m+i) and β^(m+i) denote the gradient obtained from branch classifier m + i and its assigned weight, respectively. Obviously, for the same convolutional neural network, the gradient obtained with the deep tree training strategy is no smaller than that obtained without it; that is, the deep tree training strategy injects more gradient into the model and alleviates the vanishing gradient problem.

By formula (B11), a weighted average of the gradients from the different branch classifiers is computed at each gradient compensation node, and this is the gradient used when updating the model parameters. For the classification task on a specific data set, the model that can fit the data set is not unique, so training a deep neural network is also a search through the solution space. However, because of factors such as data noise, the gradient computed by a deep neural network during training is subject to chance: sometimes the obtained gradient helps model convergence, and sometimes it harms it. In the invention, every branch classifier has the same objective, namely fitting the data set so as to minimize the classification error. Therefore, updating the model parameters with the average of the gradients injected by the branch classifiers improves the stability of parameter updates, so the final objective function quickly reaches a minimum.
Further, the specific implementation process of step S2 is as follows:
S2.1: The candidate base classifiers are screened based on classification accuracy: first, the verification accuracy of each candidate base classifier is calculated; then the candidate base classifiers are arranged in descending order of verification accuracy, and the first R base classifiers with better performance in the sequence are selected.
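The accuracy-based first-stage screening of step S2.1 amounts to a sort-and-truncate; a minimal sketch (classifier names and accuracy values are illustrative only):

```python
def screen_by_accuracy(classifiers, val_accuracies, R):
    """Arrange candidate base classifiers in descending order of
    verification accuracy and keep the first R of them."""
    ranked = sorted(zip(classifiers, val_accuracies),
                    key=lambda pair: pair[1], reverse=True)
    return [clf for clf, _ in ranked[:R]]

# Illustrative candidates with hypothetical validation accuracies.
top2 = screen_by_accuracy(["d1", "d2", "d3", "d4"],
                          [0.71, 0.90, 0.83, 0.65], R=2)
```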
S2.2: A double-error difference measurement function is adopted to perform set-based screening on the first R base classifiers determined in step S2.1, and the base classifier set which has the maximum internal difference and makes the generalization capability of the integrated classifier optimal is screened out as the selected base classifier set. The specific implementation method is as follows:
In the base classifier screening based on the double-error difference metric function, the difference Div(d_i, d_j) between classifiers d_i and d_j can be expressed, according to the joint distribution between the two classifiers in Table 1, as shown in equation (B12), wherein N^11, N^10, N^01 and N^00 represent the numbers of instances in the joint distribution between the two classifiers d_i and d_j; for example, N^11 represents the number of instances that both d_i and d_j classify correctly.
TABLE 1 Joint distribution between a pair of classifiers

| | d_j correct | d_j wrong |
|---|---|---|
| d_i correct | N^11 | N^10 |
| d_i wrong | N^01 | N^00 |
The first R base classifiers selected in step S2.1 are screened iteratively one by one, and the classifier with the maximum difference is added to the selected base classifier set in each round. The difference between a classifier d_i and a classifier set D* is the average difference, calculated as shown in formula (B13):

Div(d_i, D*) = (1 / |D*|) · Σ_{d_j ∈ D*} Div(d_i, d_j)    (B13)

wherein |D*| denotes the number of classifiers contained in the classifier set D*.
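The joint counts of Table 1 can be computed directly from validation predictions. Since the exact form of equation (B12) is not reproduced in this text, the sketch below uses one common double-fault-style difference, 1 − N^00 / (N^11 + N^10 + N^01 + N^00), purely as an assumed stand-in (not necessarily the patented formula); the average difference of formula (B13) then follows:

```python
def joint_counts(pred_i, pred_j, labels):
    """Count the joint distribution N11, N10, N01, N00 of Table 1."""
    n11 = n10 = n01 = n00 = 0
    for pi, pj, y in zip(pred_i, pred_j, labels):
        ci, cj = pi == y, pj == y
        if ci and cj:
            n11 += 1
        elif ci:
            n10 += 1
        elif cj:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def pairwise_div(pred_i, pred_j, labels):
    """Assumed double-error difference: fraction of instances on which the
    two classifiers do NOT fail together; larger means more different."""
    n11, n10, n01, n00 = joint_counts(pred_i, pred_j, labels)
    return 1.0 - n00 / (n11 + n10 + n01 + n00)

def avg_div(pred_i, selected_preds, labels):
    """Average difference between d_i and a set D*, as in formula (B13)."""
    return sum(pairwise_div(pred_i, p, labels)
               for p in selected_preds) / len(selected_preds)
```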
Specifically, the algorithm flow of the difference-based screening is as follows:

Input:
- D = {d_1, ..., d_M}: the set of base classifiers after the first-stage screening (M classifiers in total, d_i being the i-th candidate classifier);
- T: the number of classifiers to be selected.

Output:
- D* = {d*_1, ..., d*_T}: the selected classifier set (T classifiers in total, d*_i being the i-th selected classifier).

Initialization:
- calculate the classification error of each candidate classifier d_i on the validation set S_val;
- d := the classifier d_i that obtains the minimum classification error on S_val;
- D* := {d};
- D := D / D* (set difference, i.e. removing from the set D all elements of the set D*).

The loop comprises the following steps:
① for each d_j ∈ D, calculate Div(d_j, D*) according to equation (B13); this inner loop ends when the differences of all classifiers d_j in the set D have been calculated;
② R_t := the classifiers d_j in the set D arranged in descending order of the difference Div(d_j, D*);
③ d := the classifier d_j ranked first in R_t;
④ D* := D* ∪ {d}, i.e. the classifier d selected in the current round of the loop is added to the set D*;
⑤ D := D / {d}, i.e. the classifier d selected in the current round is removed from the set D;
⑥ the loop is executed T − 1 times in total.
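The loop above can be sketched as a greedy procedure; the `div_fn` callback stands in for the Div(d_j, D*) computation of formula (B13), and the classifier names, errors and toy difference values in the example run are hypothetical:

```python
def greedy_select(classifiers, val_error, div_fn, T):
    """Diversity-based screening: start from the classifier with minimum
    validation error, then T-1 times add the remaining classifier with
    the maximum average difference to the selected set D*."""
    pool = list(classifiers)
    d = min(pool, key=lambda c: val_error[c])  # min error on S_val
    selected = [d]                             # D* := {d}
    pool.remove(d)                             # D := D / D*
    for _ in range(T - 1):                     # executed T-1 times
        d = max(pool, key=lambda c: div_fn(c, selected))  # steps 1-3
        selected.append(d)                     # step 4: D* := D* U {d}
        pool.remove(d)                         # step 5: D := D / {d}
    return selected

# Illustrative run with hypothetical errors and a toy difference function.
errors = {"d1": 0.10, "d2": 0.20, "d3": 0.15}
toy_div = lambda c, sel: {"d2": 0.9, "d3": 0.4}[c]
chosen = greedy_select(["d1", "d2", "d3"], errors, toy_div, T=2)
```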
S2.3: The weight of the selected base classifier set obtained by the screening in step S2.2 is determined according to its classification error on the verification data set. Considering the difference between the base classifiers and the contribution degree of each base classifier to the target task, weighted integration is applied to the selected base classifiers d*_t (where t = 1, ..., T), and the final classifier model is represented as shown in formula (B14):

EN(x) = Σ_{t=1}^{T} α_t · d_t(x)    (B14)

wherein d_t(x) represents the classification result (i.e. the probability vector) of the t-th classifier in the selected base classifier set D* for the input instance x, and α_t is its corresponding weight, with α_t ≥ 0 and Σ_{t=1}^{T} α_t = 1. Therefore, the objective function of the weighted integration is shown in (B15):

loss(EN) = Σ_{i=1}^{T} Σ_{j=1}^{T} α_i · α_j · CQ_ij    (B15)
wherein CQ_ij = ∫ (d_i(x) − f(x)) (d_j(x) − f(x)) p(x) dx, p(x) is the distribution of the sample x, and f(x) denotes the target output for the sample x.
For the weighted integration, the goal is to find a set of weights such that loss(EN) takes its minimum value. According to the Lagrange multiplier method, the weight α_t can be calculated as shown in equation (B16):

α_t = Σ_{j=1}^{T} (CQ^{-1})_{tj} / Σ_{k=1}^{T} Σ_{j=1}^{T} (CQ^{-1})_{kj}    (B16)
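Minimizing loss(EN) subject to the constraint Σ_t α_t = 1 by the Lagrange multiplier method yields the closed form α = CQ⁻¹·1 / (1ᵀ·CQ⁻¹·1). The sketch below illustrates it under two stated assumptions: CQ is invertible, and the closed form alone does not enforce α_t ≥ 0:

```python
import numpy as np

def ensemble_weights(CQ):
    """Weights minimizing sum_ij a_i a_j CQ_ij subject to sum_t a_t = 1,
    via the Lagrange multiplier method: alpha = CQ^-1 1 / (1^T CQ^-1 1)."""
    inv = np.linalg.inv(np.asarray(CQ, dtype=float))
    ones = np.ones(inv.shape[0])
    raw = inv @ ones
    return raw / (ones @ raw)

# Toy correlation matrix: uncorrelated classifiers with equal error
# receive equal weights.
alpha = ensemble_weights(np.eye(3))
```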
The following experiments were conducted in accordance with the method of the invention to illustrate its experimental effects.

The segmented selection integration image classification method based on the deep tree training strategy disclosed by the invention was verified on a publicly available benchmark image classification data set.
Experimental conditions: the experiment used the CIFAR-10 image data set, which contains 50000 images in the training data set and 10000 images in the test data set, covering 10 categories; an example of the image data is shown in FIG. 2.

Experiment preparation: before the start of the experiment, 10% of the training data set was randomly sampled as the validation data set. The training data set is used for model training, the validation data set is used for selecting the manually adjusted hyper-parameters, and the test data set is used for the final model evaluation.
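The 10% hold-out described above can be sketched as follows (the fixed seed and list container are illustrative choices, not specified by the source):

```python
import random

def split_validation(train_data, ratio=0.1, seed=42):
    """Randomly sample `ratio` of the training data set as the validation
    data set; return (remaining training set, validation set)."""
    rng = random.Random(seed)
    data = list(train_data)
    rng.shuffle(data)
    n_val = int(len(data) * ratio)
    return data[n_val:], data[:n_val]

# With CIFAR-10-sized training data (50000 instances), a 10% hold-out
# gives 5000 validation instances and leaves 45000 for training.
train_part, val_part = split_validation(range(50000))
```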
The method comprises the following specific implementation steps:
In the experiment, the training data set and the validation data set are first used to train a final integrated classification model according to the method of the invention, and the test data set is then used to evaluate its classification performance; the evaluation metrics mainly include classification accuracy, sensitivity, specificity, F1 score and CPU computation time.
The method provided by the invention is a framework technique, i.e. it can be flexibly combined with various basic convolutional neural networks. In this verification experiment, two specific models are adopted: a model based on the AlexNet convolutional neural network (referred to as AlexNet-based) and a model based on the Network in Network convolutional neural network (referred to as NIN-based); the specific model structures are shown in Tables 2 and 3. Taking the AlexNet-based model as an example, 8 branch classifiers are led out from its hidden layers, the whole model is trained, the 8 branch classifiers are then screened by the segmented selection integration technique, 4 branch classifiers remain after screening, and finally the 4 branch classifiers are weighted-integrated to obtain the final classifier model.
TABLE 2 AlexNet-based model realization by deep tree training method based on AlexNet
TABLE 3 implementation of NIN-based model using deep tree training method on the basis of Network in Network
The experimental results are as follows:
As can be seen from the test results in Table 4, the classification performance of the method of the invention on the CIFAR-10 data set is improved. The AlexNet-based model, which takes AlexNet as the basic model and adopts the segmented selection integration based on the deep tree training strategy, improves the classification accuracy by 2.8% compared with the original AlexNet model; the NIN-based model, which takes Network in Network as the basic model and adopts the same strategy, improves the accuracy by 1.64% compared with the original Network in Network. In addition, the improved models are greatly improved over the original models on the evaluation indexes of sensitivity (SEN), specificity (SPE) and F1 score. The CPU time (i.e. the time cost per epoch) is affected by the number of layers of the model, so the improved models need slightly more time (mainly because the calculation on the branches is time-consuming), but they achieve a better balance of performance and time cost, and are also improved in each evaluation index compared with other classical image classification models (LeNet, VGG). The method has great reference value for practical application scenarios involving image classification tasks, such as medical image diagnosis, biometric recognition and automatic driving.
TABLE 4 comparison of Classification Performance of the methods of the present invention on test sets
The above description is only exemplary of the invention, and any modification, equivalent replacement, and improvement made within the spirit and principle of the invention should fall within the protection scope of the invention.
Claims (6)
1. The segmented selection integrated image classification method based on the deep tree training strategy is characterized by comprising the following steps of:
s1, adopting a deep tree training strategy, adding a branch classifier at each middle layer of the deep convolutional neural network, wherein each branch classifier only comprises a full connection layer, and the branch classifier is not discarded in the training or testing process; the process of gradient compensation through the branch classifiers is divided into two parts, firstly, the error between the prediction class label and the real class label of each branch classifier is calculated, then the gradient of the branch classifier is injected into a sharing part layer by layer in a back propagation mode, and the sharing part sums the gradients obtained from each branch classifier and is used for updating the weight; training a deep convolutional neural network containing branch classifiers to obtain a plurality of branch classifiers which have difference and can classify on the intermediate layer characteristics, and thus obtaining a candidate base classifier set with a plurality of branch classifiers;
S2, a segmented selection integration method based on accuracy and difference screening is adopted to select the final integration members and generate a final classification model: first, a base classifier set with better performance is screened from the candidate base classifiers based on the classification accuracy; then, from this base classifier set, the base classifier set which has the maximum internal difference and makes the generalization capability of the integrated classifier optimal is screened out through a double-error difference measurement function as the selected base classifier set; finally, the selected base classifiers are optimally integrated into a final classifier through a weighted integration strategy.
2. The method according to claim 1, wherein the construction process of the branch classifier in step S1 is as follows:
s1.11, acquiring a characteristic diagram from the middle convolutional layer, wherein the specific implementation method comprises the following steps:
the input picture training data set with class labels is S = {(X_i, Y_i) | i = 1, 2, ..., N}, wherein the picture data X_i ∈ R^n, i.e. each picture datum is n-dimensional, and the true class label Y_i ∈ {1, ..., K} corresponds to the picture X_i, where K is the total number of categories;
without considering the bias term, the convolution and activation calculation process of the convolutional layer m can be expressed as shown in formula (1), where m ∈ {1, ..., M}, M is the total number of convolutional layers, and the pooling calculation process is shown in formula (2):

Q_m = f(w_m * Z_{m−1}), (Z_0 ≡ X_i)    (1)

Z_m = g(Q_m)    (2)

wherein w_m is the weight of the convolutional layer m to be learned, * is the convolution operation, Z_{m−1} represents the feature map generated by the layer m−1, the function f(u) = max(0, u) is the ReLU activation function, which can accelerate network training and obtain a model with better performance, and g(·) represents the max pooling function;
s1.12, flattening the characteristic diagram, wherein the specific implementation method comprises the following steps:
the convolutional layers extract features from the input data layer by layer, and the feature map extracted by the last convolutional layer is a two-dimensional feature matrix; the flattening operation converts this two-dimensional feature matrix into a one-dimensional feature vector, which is then input into the classifier for the final decision calculation; the neurons between the one-dimensional feature vector obtained from the flattening operation and the classifier are fully connected;
and S1.13, combining the flattened feature vectors with a classifier in a full-connection mode.
3. The method for classifying integrated images based on segmented selection of a deep tree training strategy according to claim 2, wherein the step S1.13 is implemented by:
S1.13.1: in the fully connected layer part, suppose that V^(m) and W_fc^(m) respectively represent the feature vector input to the branch classifier m and its weight matrix, and b^(m) represents its bias term; the final-layer output vector O^(m) of the branch classifier m is calculated as shown in equation (3):

O^(m) = W_fc^(m) · V^(m) + b^(m)    (3)
the output vector is input into the classifier to calculate a K-dimensional probability vector, wherein the class k corresponding to the maximum probability value is the prediction class label calculated by the branch classifier m for the input instance X_i, and the corresponding probability value is called the score;
S1.13.2: for convenience in representing the objective function of the branch classifier m in deep tree training, the convolutional layer weights (w_1, ..., w_m) and the weight matrix W_fc^(m) of the fully connected network of the branch classifier m are combined, as shown in formula (4):

W^(m) = (w_1, ..., w_m, W_fc^(m))    (4)
S1.13.3: an objective function based on the cross-entropy loss is used to evaluate the learning effect of the deep learning model by measuring the distance between the prediction score and the real class label; the smaller the cross entropy on the training data set, the better the model fits the training data set; the cross-entropy loss of the branch classifier m on a training data set containing n instances is shown in equation (5):

J^(m)(W^(m)) = −(1/n) · Σ_{i=1}^{n} Σ_{k=1}^{K} 1{Y_i = k} · log h^(m)(y = k | X_i)    (5)

wherein log(x) = ln(x), h^(m)(y = k | X_i) is the probability score of the sample X_i being classified into the class k, m ∈ {1, ..., M}, and M is the total number of convolutional layers;
S1.13.4: for each branch classifier, the data features on which the final classification decision is based are different, and the contribution degrees to the gradient compensation in the training process can also be different; therefore, the final objective function of the convolutional neural network under the deep tree training strategy is the weighted integration of the objective functions of the branch classifiers, as shown in formula (7):

J(W, θ_full) = Σ_{m=1}^{M} β^(m) · J^(m)(W^(m))    (7)

wherein β^(m) denotes the weight of the branch classifier m, W = (W^(1), ..., W^(M)), m ∈ {1, ..., M}, and M is the total number of convolutional layers;
the final objective function of the deep convolutional neural network trained with the deep tree training strategy is thus obtained; according to this objective function relation, finding a set of weights W and θ_full that makes J(W, θ_full) smallest means that the classification performance of the model is best.
4. The method for classifying selective integration images based on the deep tree training strategy according to claim 1, wherein the gradient compensation process in step S1 is as follows:
s1.21: calculating the error between the prediction class mark and the real class mark of each branch classifier;
s1.22, injecting the gradients of the branch classifiers into a sharing part layer by layer in a back propagation mode, and summing the gradients obtained from each branch classifier by the sharing part and using the gradients for updating the weight; the gradient compensation is mainly expressed in a back propagation stage, and for a convolutional neural network using a deep tree training strategy, the gradient compensation is realized as follows:
the weight w_m at layer m can obtain its gradient ∇w_m from the M − m branch classifiers after layer m, so ∇w_m is calculated as shown in equation (8):

∇w_m = Σ_{i=1}^{M−m} β^(m+i) · ∂J^(m+i)(W^(m+i)) / ∂w_m    (8)

wherein ∂J^(m+i)(W^(m+i)) / ∂w_m and β^(m+i) respectively represent the gradient obtained from the branch classifier m+i and its assigned weight, m ∈ {1, ..., M}, M is the total number of convolutional layers, and J(W, θ_full) is the final objective function of the convolutional neural network using the deep tree training strategy, where W = (W^(1), ..., W^(M)); W^(m) denotes the combination of the convolutional layer weights (w_1, ..., w_m) and the weight matrix W_fc^(m) of the fully connected network of the branch classifier m; J^(m+i) represents the objective function of the branch classifier m+i, and W^(m+i) denotes the combination of the convolutional layer weights (w_1, ..., w_{m+i}) and the weight matrix W_fc^(m+i) of the fully connected network of the branch classifier m+i; h^(m+i) represents the probability score of the branch classifier m+i, O^(m+i) is the last-layer output vector of the branch classifier m+i, V^(m+i) represents the feature vector input to the branch classifier m+i, Z_{m+i} represents the feature map generated by the layer m+i, Q_{m+i} represents the feature map of the convolutional layer m+i after convolution and activation, and Q_m represents the feature map obtained after convolution and activation of the convolutional layer m; as can be seen from equation (8), a weighted average of the gradients from the different branch classifiers is calculated at each gradient compensation node, and this is also the gradient used during the parameter updating of the model; therefore, updating the model parameters with the average gradient obtained from the gradients injected by the branch classifiers can improve the stability of parameter updating, so that the final objective function can quickly reach a minimum value.
5. The method for classifying selective integration images based on the deep tree training strategy according to claim 1, wherein the step S2 is implemented by:
s2.1, screening the candidate base classifiers based on the classification accuracy, firstly calculating the verification accuracy of each candidate base classifier, then performing descending order arrangement on the candidate base classifiers according to the verification accuracy, and screening the front R base classifiers with better performance in the sequence;
S2.2, a double-error difference measurement function is adopted to perform set-based screening on the first R base classifiers determined in step S2.1, and the base classifier set which has the maximum internal difference and makes the generalization capability of the integrated classifier optimal is screened out as the selected base classifier set, wherein the specific implementation method comprises the following steps:
in the base classifier screening based on the double-error difference metric function, the difference between classifiers d_i and d_j can be expressed according to the joint distribution between the two classifiers as shown in formula (9), wherein N^11, N^10, N^01 and N^00 represent the numbers of instances in the joint distribution between the two classifiers d_i and d_j; for example, N^11 represents the number of instances that both classifiers classify correctly;
the first R base classifiers selected in step S2.1 are screened iteratively one by one, and the classifier with the maximum difference is added to the selected base classifier set in each round, wherein the difference between a classifier d_i and a classifier set D* is the average difference, calculated as shown in formula (10):

Div(d_i, D*) = (1 / |D*|) · Σ_{d_j ∈ D*} Div(d_i, d_j)    (10)

wherein |D*| denotes the number of classifiers contained in the classifier set D*;
S2.3, the weight of the selected base classifier set obtained by the screening in step S2.2 is determined according to its classification error on the verification data set; considering the difference between the base classifiers and the contribution degree of each base classifier to the target task, weighted integration is applied to the selected base classifiers d*_t, where t = 1, ..., T, and the final classifier model is represented as shown in formula (11):

EN(x) = Σ_{t=1}^{T} α_t · d_t(x)    (11)

wherein d_t(x) represents the classification result, i.e. the probability vector, of the t-th classifier in the selected base classifier set D* for the input instance x, and α_t is its corresponding weight, where α_t ≥ 0 and Σ_{t=1}^{T} α_t = 1; therefore, the objective function of the weighted integration is as shown in equation (12):

loss(EN) = Σ_{i=1}^{T} Σ_{j=1}^{T} α_i · α_j · CQ_ij    (12)
wherein CQ_ij = ∫ (d_i(x) − f(x)) (d_j(x) − f(x)) p(x) dx, p(x) is the distribution of the sample x, and f(x) denotes the target output for the sample x;
for the weighted integration, the goal is to find a set of weights such that loss(EN) takes its minimum value; according to the Lagrange multiplier method, the weight α_t can be calculated as shown in equation (13):

α_t = Σ_{j=1}^{T} (CQ^{-1})_{tj} / Σ_{k=1}^{T} Σ_{j=1}^{T} (CQ^{-1})_{kj}    (13)
6. The segmented selection integration image classification method based on the deep tree training strategy according to claim 2, wherein in the construction process of the branch classifier, the classifier used is a softmax classifier, an SVM classifier, or another classifier capable of performing multi-classification tasks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910274548.8A CN110070116B (en) | 2019-04-08 | 2019-04-08 | Segmented selection integration image classification method based on deep tree training strategy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910274548.8A CN110070116B (en) | 2019-04-08 | 2019-04-08 | Segmented selection integration image classification method based on deep tree training strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110070116A CN110070116A (en) | 2019-07-30 |
CN110070116B true CN110070116B (en) | 2022-09-20 |
Family
ID=67367164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910274548.8A Active CN110070116B (en) | 2019-04-08 | 2019-04-08 | Segmented selection integration image classification method based on deep tree training strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110070116B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956202B (en) * | 2019-11-13 | 2023-08-01 | 重庆大学 | Image training method, system, medium and intelligent device based on distributed learning |
CN111367798B (en) * | 2020-02-28 | 2021-05-28 | 南京大学 | Optimization prediction method for continuous integration and deployment results |
CN111523638A (en) * | 2020-03-10 | 2020-08-11 | 中移(杭州)信息技术有限公司 | Method, device, terminal and storage medium for measuring generalization capability of deep neural network |
CN111461246A (en) * | 2020-04-09 | 2020-07-28 | 北京爱笔科技有限公司 | Image classification method and device |
CN111553888B (en) * | 2020-04-15 | 2021-04-27 | 成都飞机工业(集团)有限责任公司 | Titanium alloy forging microstructure image identification method based on machine learning |
CN112364926A (en) * | 2020-11-17 | 2021-02-12 | 苏州大学 | Gastroscope picture classification method and device based on ResNet-50 time compression and storage medium |
CN114463576B (en) * | 2021-12-24 | 2024-04-09 | 中国科学技术大学 | Network training method based on re-weighting strategy |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107679552A (en) * | 2017-09-11 | 2018-02-09 | 北京飞搜科技有限公司 | A kind of scene classification method and system based on multiple-limb training |
CN107784364A (en) * | 2016-08-25 | 2018-03-09 | 微软技术许可有限责任公司 | The asynchronous training of machine learning model |
CN108154183A (en) * | 2017-12-25 | 2018-06-12 | 深圳市唯特视科技有限公司 | A kind of objective classification method based on part and depth characteristic set |
CN108388917A (en) * | 2018-02-26 | 2018-08-10 | 东北大学 | A kind of hyperspectral image classification method based on improvement deep learning model |
CN108445752A (en) * | 2018-03-02 | 2018-08-24 | 北京工业大学 | A kind of random weight Artificial neural network ensemble modeling method of adaptively selected depth characteristic |
CN108596329A (en) * | 2018-05-11 | 2018-09-28 | 北方民族大学 | Threedimensional model sorting technique based on end-to-end Deep integrating learning network |
CN108830296A (en) * | 2018-05-18 | 2018-11-16 | 河海大学 | A kind of improved high score Remote Image Classification based on deep learning |
CN109242865A (en) * | 2018-09-26 | 2019-01-18 | 上海联影智能医疗科技有限公司 | Medical image auto-partition system, method, apparatus and storage medium based on multichannel chromatogram |
CN109344905A (en) * | 2018-10-22 | 2019-02-15 | 王子蕴 | A kind of transmission facility automatic fault recognition methods based on integrated study |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7477758B2 (en) * | 1992-05-05 | 2009-01-13 | Automotive Technologies International, Inc. | System and method for detecting objects in vehicular compartments |
JP5950441B2 (en) * | 2012-02-01 | 2016-07-13 | 日本電産エレシス株式会社 | Image recognition apparatus, image recognition method, and image recognition program |
2019-04-08: application CN201910274548.8A filed in China; granted as patent CN110070116B (status: Active).
Non-Patent Citations (4)
Title |
---|
Cost-sensitive learning of hierarchical tree classifiers for large-scale image classification and novel category detection; Fan Jianping et al.; Pattern Recognition; May 2015; Vol. 48, No. 5; 1673-1687 *
Diversity-based classifier ensembles: effectiveness analysis and optimized ensembles; Yang Chun et al.; Acta Automatica Sinica; 2013-12-19; Vol. 40, No. 4; 660-674 *
Selective secondary ensemble of neural networks based on the stochastic gradient method; Shi Yan et al.; Computer Engineering; 2004-08-20; No. 16; 133-135, 159 *
Research and application of the deep tree training strategy and selective ensemble algorithms; Hu Yuanyuan; China Master's Theses Full-text Database, Information Science and Technology; 2021-08-15; I138-246 *
Also Published As
Publication number | Publication date |
---|---|
CN110070116A (en) | 2019-07-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||