CN111401443A

CN111401443A - Width learning system based on multi-feature extraction

Info

Publication number: CN111401443A
Application number: CN202010181905.9A
Authority: CN
Inventors: 刘然; 刘亚琼; 刘宴齐; 田逢春; 钱君辉; 郑杨婷; 赵洋; 陈希; 崔珊珊; 王斐斐; 陈丹
Original assignee: Chongqing University; China Academy of Chinese Medical Sciences CACMS
Current assignee: Chongqing University; China Academy of Chinese Medical Sciences CACMS
Priority date: 2020-03-16
Filing date: 2020-03-16
Publication date: 2020-07-10
Anticipated expiration: 2040-03-16
Also published as: CN111401443B

Abstract

The invention relates to a width learning system based on multi-feature extraction, which comprises four sub-width learning systems, wherein each sub-width learning system comprises a feature node, an enhancement node and a sub-node; each sub-width learning system extracts an image feature from the image data set, and each sub-width learning system combines the image features extracted from the image data set to obtain a respective feature node, and then enhances the respective feature node through an enhancement mapping function to form a corresponding enhancement node; after each sub-width learning system forms an enhanced node, the feature nodes of the sub-width learning system are merged with the corresponding enhanced nodes and then connected to the sub-nodes of the sub-width learning system, and then the output of the sub-nodes of each sub-width learning system is normalized and then connected to the final output layer. The method has the advantages of short model training time and high classification accuracy on the problem of complex data set classification.

Description

Width learning system based on multi-feature extraction

Technical Field

The invention relates to the technical field of image classification, in particular to a width learning system based on multi-feature extraction.

Background

Image classification is a hot problem in image processing, and aims to automatically classify a large number of images. The technology is widely used in applications such as pedestrian detection, video analysis, and image quality assessment.

In recent years, image classification methods based on deep learning have received wide attention and study. Typical deep learning models are Deep Belief Networks (DBN), Deep Boltzmann Machines (DBM), and Convolutional Neural Networks (CNN). CNNs are widely used for image processing, especially image classification, because they can learn higher-level semantic features. A CNN consists of convolution layers, pooling layers and fully connected layers, and weight sharing effectively reduces its number of parameters. Better-performing image classification models, such as AlexNet, GoogLeNet, ResNet, and GPipe, were subsequently derived from the CNN. Deep convolutional neural networks such as ResNet and GPipe perform well on data sets such as MNIST, SVHN, CIFAR-10, CIFAR-100 and ImageNet. However, because these networks have many hidden layers, the weights and biases to be trained number in the millions or more, and because training relies on gradient descent and back propagation, model training is slow and time-consuming.

To solve this problem, Chen et al. proposed the Broad Learning System (BLS), called the width learning system in this document, and proved that the model has the universal approximation property. BLS can be effectively applied to classification and regression tasks. It is based on the Random Vector Functional-Link Neural Network (RVFLNN) and has a flat network architecture with only one hidden layer. The weights and biases in the network are randomly assigned and are not updated during training; the network uses ridge regression to find the optimal output weights. The network can therefore be trained quickly to classify images.

To improve the classification performance of BLS, Liu et al. introduced a K-means feature representation into the original BLS and proposed the K-means-BLS model, which extracts K-means features and feeds them into BLS in place of the original image, improving the classification performance of BLS on CIFAR-10. In view of the local invariance of image data, Jin et al. proposed the GBLS model, which introduces manifold learning into the objective function of the model to constrain the output weights and further improve the classification capability of the model.

From the above discussion, we can see that deep learning networks can classify complex data sets accurately but suffer from long training times and repeated parameter tuning. BLS and its improved variants train quickly, but because of their shallow structure they do not learn the features of image data sufficiently, so their classification performance on complex data sets is not very good.

Disclosure of Invention

In view of the above, in order to solve the existing problems described above, the present invention provides a width learning system based on multi-feature extraction, so as to solve the technical problem that the existing image classification method does not combine the advantages of short model training time and high classification accuracy in the complex data set classification problem.

The invention relates to a width learning system based on multi-feature extraction, which comprises four sub-width learning systems, wherein each sub-width learning system comprises a feature node, an enhancement node and a sub-node;

each sub-width learning system extracts an image feature from an image data set, the image features extracted by the sub-width learning systems are different from each other, the first sub-width learning system extracts the HOG feature of the image data set, the second sub-width learning system extracts the color feature of the image data set, the third sub-width learning system extracts the K-means feature of the image data set, and the fourth sub-width learning system extracts the convolution feature of the image data set; combining the image features extracted from the image data set by each sub-width learning system to obtain respective feature nodes, and enhancing the respective feature nodes by an enhanced mapping function to form corresponding enhanced nodes; after each sub-width learning system forms an enhanced node, combining the characteristic node with the corresponding enhanced node, and then connecting the feature node and the corresponding enhanced node to the sub-nodes of the sub-width learning system;

the width learning system based on multi-feature extraction further comprises a normalization layer for normalizing the output of the child nodes of each child width learning system and a final output layer connected with each normalization layer.

Further, the step of extracting the HOG features of the image data set by the first sub-width learning system of the multi-feature extraction-based width learning system includes:

1) normalizing the input image, and converting the image into a gray scale image;

2) dividing the image into a plurality of small regions called cells, using an overlapping division in which the regions may overlap one another;

3) calculating the gradient magnitude and gradient direction of the pixels in each cell to obtain the gradient direction histogram of that cell;

4) grouping the cells into larger regions called blocks, calculating the cumulative gradient direction histogram of each block, and then normalizing all cells within the block;

5) merging the gradient direction histograms of all cells to obtain an HOG characteristic; the extracted HOG features are feature nodes of a first sub-width learning system;
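The HOG steps above can be sketched as follows. This is only an illustrative NumPy sketch, not the patented implementation: the function name `hog_features`, the 8 × 8 cell size, the 2 × 2 block size, the 9 unsigned orientation bins and the L2 block normalization are all assumptions, and for brevity the sketch uses non-overlapping cells with overlapping blocks rather than the overlapping cell division mentioned in step 2).

```python
import numpy as np


def hog_features(image, cell=8, block=2, bins=9, eps=1e-6):
    """Illustrative HOG descriptor for one H x W (x 3) image (all defaults are assumptions)."""
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 3:
        img = img.mean(axis=2)                    # step 1: convert to grayscale
    img = img / (img.max() + eps)                 # step 1: simple intensity normalization

    gy, gx = np.gradient(img)                     # step 3: per-pixel gradients
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned gradient direction
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    # steps 2-3: one magnitude-weighted orientation histogram per cell
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for i in range(n_cy):
        for j in range(n_cx):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(
                bin_idx[rows, cols].ravel(),
                weights=mag[rows, cols].ravel(),
                minlength=bins,
            )

    # step 4: overlapping blocks of cells, each L2-normalized
    feats = []
    for i in range(n_cy - block + 1):
        for j in range(n_cx - block + 1):
            v = hist[i:i + block, j:j + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))

    # step 5: merge all block histograms into one HOG vector
    return np.concatenate(feats)
```

Under these assumed settings a 32 × 32 image gives 4 × 4 cells and 3 × 3 blocks of 36 values each, so the feature vector has 324 dimensions; stacking the vectors of N samples row by row would give the feature-node matrix of the first sub-width learning system.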

the second sub-width learning system extracting color features of the image data set comprises:

1) converting the image from an RGB space to an HSV space, and extracting features in the HSV space;

2) calculating a histogram of the image by respectively using 6 bins, 4 bins and 4 bins according to the value ranges of three channels of the HSV space, and obtaining a 96-dimensional color histogram vector as a result;

3) respectively calculating a first-order color moment, a second-order color moment and a third-order color moment of the pixel on the three channels to finally form a 9-dimensional color moment vector;

4) merging the color histogram vector with the color moment vector, thereby forming a 105-dimensional color feature vector; the extracted color feature vector is a feature node of the second sub-width learning system;
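The color feature steps above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the image is assumed to be already converted to HSV with all three channels scaled to [0, 1], the 96-dimensional histogram is assumed to be a joint 6 × 4 × 4 histogram (6 · 4 · 4 = 96), the third-order moment is taken as the cube root of the third central moment, and the name `color_features` is hypothetical.

```python
import numpy as np


def color_features(hsv):
    """Illustrative 105-dim color feature for one H x W x 3 HSV image scaled to [0, 1]."""
    pixels = np.asarray(hsv, dtype=np.float64).reshape(-1, 3)

    # step 2: joint histogram with 6, 4 and 4 bins on H, S and V -> 96 values
    hist, _ = np.histogramdd(pixels, bins=(6, 4, 4), range=[(0.0, 1.0)] * 3)
    hist = hist.ravel() / pixels.shape[0]          # normalize counts to frequencies

    # step 3: first-, second- and third-order color moments per channel -> 9 values
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    skew = np.cbrt((centered ** 3).mean(axis=0))

    # step 4: merge histogram and moments -> 96 + 9 = 105 values
    return np.concatenate([hist, mean, std, skew])
```

The dimension check 6 · 4 · 4 + 3 · 3 = 105 matches the 105-dimensional color feature vector stated in step 4).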

the third sub-width learning system extracts the K-means feature of the image dataset comprising:

1) sampling an image block set from a training set, then carrying out standardization and ZCA whitening on image blocks in the image block set, and finally carrying out K-means clustering on the image block set to obtain a clustering dictionary D;

2) for a three-channel color image, performing sliding sampling with a window at a stride of 1 and an interval of 0, wherein the window size is the same as the image block size used when solving for the clustering dictionary D; sampling yields a plurality of image blocks, each denoted by $x$; performing feature mapping on each image block using the clustering dictionary D, wherein the mapping function $f: \mathbb{R}^d \rightarrow \mathbb{R}^k$ maps an image block to a feature vector, $\mathbb{R}$ is the set of real numbers, and $d$ is the dimension of the image block vector; the mapping method is a hard coding method, whose mapping function $f(x; D)$ is:

$$ f_j(x; D) = \begin{cases} 1, & j = \arg\min_{l} d_l \\ 0, & \text{otherwise} \end{cases} \qquad d_j = \lVert x - \mu^{(j)} \rVert_2, \quad j = 1, \ldots, k $$

wherein $\mu^{(j)}$ is the j-th clustering center, $k$ is the number of clustering centers, and $d_j$ is the distance between the image block $x$ and the j-th clustering center; after feature mapping, each image block is converted into a k-dimensional vector; all image blocks are divided into four parts and max pooling is performed on each part, and the pooled results are combined and standardized, the final result being a K-means feature with a feature dimension of 4k; the extracted K-means features are the feature nodes of the third sub-width learning system;
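The hard coding and four-part max pooling of step 2) can be sketched as follows. This is an illustrative NumPy sketch only: the function name `kmeans_features` is hypothetical, the clustering dictionary `centers` is assumed to have been learned already in step 1), the per-block standardization and ZCA whitening and the final standardization of the pooled vector are omitted, and splitting the grid of image blocks into four spatial quadrants is an assumption about how "four parts" is meant.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def kmeans_features(image, centers, window):
    """Illustrative 4k-dim K-means feature for one H x W x C image.

    centers: (k, window * window * C) clustering dictionary D (assumed given).
    """
    img = np.asarray(image, dtype=np.float64)

    # sliding sampling with stride 1 and no padding
    patches = sliding_window_view(img, (window, window, img.shape[2]))
    rows, cols = patches.shape[:2]
    x = patches.reshape(rows * cols, -1)           # one row per image block

    # squared Euclidean distance of every block to every cluster center
    d2 = (x ** 2).sum(axis=1, keepdims=True) - 2.0 * x @ centers.T + (centers ** 2).sum(axis=1)

    # hard coding: 1 for the nearest center, 0 elsewhere
    codes = np.zeros((rows * cols, centers.shape[0]))
    codes[np.arange(rows * cols), d2.argmin(axis=1)] = 1.0

    # divide the blocks into four spatial parts and max-pool each part
    grid = codes.reshape(rows, cols, -1)
    r, c = rows // 2, cols // 2
    quadrants = [grid[:r, :c], grid[:r, c:], grid[r:, :c], grid[r:, c:]]
    return np.concatenate([q.max(axis=(0, 1)) for q in quadrants])
```

With k cluster centers each quadrant yields a k-dimensional vector, so the concatenated result has 4k dimensions, as stated above.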

the step of the fourth sub-width learning system extracting convolution features of the image data set comprises:

1) performing convolution operation on the image, and then performing pooling operation, wherein the convolution operation and the pooling operation are alternately performed for 4 times;

2) after 4 times of convolution and pooling, flattening the obtained result into a vector;

3) using a PCA method to reduce the dimension of the vector, wherein the final result after the dimension reduction is the convolution characteristic; the extracted convolution features are feature nodes of the fourth sub-width learning system.

Further, the enhancement mapping function is a non-linear mapping function.

Further, after the HOG features are extracted, the first sub-width learning system processes them as follows:

the feature nodes corresponding to the HOG features are:

$$ Z_H = [h_1, h_2, \ldots, h_N]^T \in \mathbb{R}^{N \times M} \tag{1} $$

wherein $N$ is the number of samples, $h_1, h_2, \ldots, h_N$ are the HOG features of the respective samples, and $M$ is the HOG feature dimension of a single sample, that is, the HOG feature of each sample is an M-dimensional vector; the output of the corresponding enhancement nodes is:

$$ H_H = \varphi_H(Z_H W_{EH} + \beta_H) \tag{2} $$

wherein $W_{EH}$ is the mapping weight, $\beta_H$ is the bias, and $\varphi_H$ is a non-linear activation function; the weight $W_{EH}$ and the bias $\beta_H$ are randomly generated; the corresponding sub-node output $U_H$ has the form:

$$ U_H = [Z_H, H_H] W_H = A_H W_H \tag{3} $$

wherein $A_H = [Z_H, H_H]$; the objective function of the sub-width learning system corresponding to the HOG features is:

$$ \arg\min_{W_H} \lVert A_H W_H - Y \rVert_2^2 + \lambda_H \lVert W_H \rVert_2^2 \tag{4} $$

wherein $Y$ is the label set and $\lambda_H$ is the ridge regression parameter; differentiating equation (4) and setting the derivative to zero gives:

$$ W_H^* = (A_H^T A_H + \lambda_H I)^{-1} A_H^T Y $$

wherein $I$ is an identity matrix;

the weights and sub-node outputs of the other three sub-width learning systems are solved in the same way as for the first sub-width learning system: to solve the weight $W_S$ and sub-node output $U_S$ of the second sub-width learning system, the subscript H corresponding to the HOG feature in the formulas is simply replaced by the subscript S corresponding to the color feature; to solve the weight $W_K$ and sub-node output $U_K$ of the third sub-width learning system, the subscript H is replaced by the subscript K corresponding to the K-means feature; and to solve the weight $W_F$ and sub-node output $U_F$ of the fourth sub-width learning system, the subscript H is replaced by the subscript F corresponding to the convolution feature;
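The computation inside one sub-width learning system (random enhancement mapping followed by a ridge regression solve) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function name `fit_sub_bls`, the `tanh` activation and the uniform initialization in [-1, 1] are illustrative choices that the text does not specify, and the scaling parameter used when generating enhancement nodes is not modelled.

```python
import numpy as np


def fit_sub_bls(Z, Y, n_enhance, lam, rng):
    """Illustrative sub-BLS: feature nodes Z (N x M), labels Y (N x C)."""
    # randomly generated mapping weight and bias, never updated afterwards
    W_E = rng.uniform(-1.0, 1.0, size=(Z.shape[1], n_enhance))
    beta = rng.uniform(-1.0, 1.0, size=(1, n_enhance))

    H = np.tanh(Z @ W_E + beta)        # enhancement nodes (tanh is an assumed activation)
    A = np.hstack([Z, H])              # A = [Z, H]

    # ridge regression: solve (A^T A + lam I) W = A^T Y
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    U = A @ W                          # sub-node output U = A W
    return W_E, beta, W, U
```

The same function would serve all four sub-systems, called once each with the HOG, color, K-means and convolution feature matrices in place of `Z`. Using `np.linalg.solve` rather than an explicit matrix inverse is a numerical convenience and gives the same ridge regression solution.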

after obtaining the sub-node outputs $U_H, U_S, U_K, U_F$ of the sub-width learning systems, the width learning system based on multi-feature extraction processes them as follows:

$U_H, U_S, U_K, U_F$ are each normalized to obtain $U'_H, U'_S, U'_K, U'_F$ respectively, and $Z$ is set as:

$$ Z = [U'_H, U'_S, U'_K, U'_F] \tag{7} $$

the overall output $Y$ of the width learning system based on multi-feature extraction is:

$$ Y = [Z] W = A W \tag{8} $$

wherein $A = [Z]$ and $W$ is the overall weight connecting the feature nodes and the enhancement nodes to the output; $W$ is obtained by minimizing the objective function:

$$ \arg\min_{W} \lVert A W - Y \rVert_2^2 + \lambda \lVert W \rVert_2^2 \tag{9} $$

wherein $\lambda$ is a ridge regression parameter; solving the above with the ridge regression method gives:

$$ W^* = (A^T A + \lambda I)^{-1} A^T Y \tag{10} $$

wherein $I$ is an identity matrix;

the final output of the multi-feature extraction based width learning system is then:

$$ Y = A W^* $$
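The fusion stage (normalizing the four sub-node outputs, concatenating them and solving for the overall weight) can be sketched as follows. This is an illustrative NumPy sketch only: the function name `fuse_outputs` is hypothetical, and per-column z-score normalization is an assumption, since the text does not say which normalization the normalization layer applies.

```python
import numpy as np


def fuse_outputs(outputs, Y, lam, eps=1e-12):
    """Illustrative fusion of sub-node outputs [U_H, U_S, U_K, U_F] (each N x C)."""
    # normalization layer: per-column z-score (an assumed choice)
    normed = [(U - U.mean(axis=0)) / (U.std(axis=0) + eps) for U in outputs]

    A = np.hstack(normed)              # Z = [U'_H, U'_S, U'_K, U'_F], A = [Z]

    # ridge regression for the overall weight: (A^T A + lam I) W = A^T Y
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return W, A @ W                    # overall weight and final output A W
```

For classification, the predicted class of each sample would typically be taken as the column index of the largest entry in the corresponding row of the final output, for example with `np.argmax(..., axis=1)`.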

the invention has the beneficial effects that:

The width learning system based on multi-feature extraction of the invention replaces the random mapping of the original width learning system (BLS for short) with multi-feature extraction: it extracts the K-means features, HOG features, color features and convolution features of an image, which markedly improves the feature learning capability of BLS. Because the four features carry different meanings and emphasize different aspects of the image, four independent sub-BLS are constructed, and each feature is given its own enhancement mapping. All the sub-BLS together form one large width learning system based on multi-feature extraction, MFBLS for short, which jointly considers the outputs of the sub-BLS and uses a normalization layer to improve the generalization capability of the model. Experiments on complex data sets such as SVHN, CIFAR-10 and CIFAR-100 show that: (1) the classification performance of the proposed MFBLS model on complex data sets is superior to that of existing width learning models; and (2) compared with the convolutional DBN deep model, MFBLS achieves higher classification accuracy at a lower training time cost.

Drawings

Fig. 1 is a schematic structural diagram of the multi-feature extraction-based width learning system MFBLS, in which each dashed box represents a sub-BLS.

Fig. 2 is an example SVHN dataset.

FIG. 3 is a CIFAR-10 dataset example.

FIG. 4 is a CIFAR-100 dataset example.

FIG. 5 shows the parameter sensitivity study of MFBLS on the SVHN, CIFAR-10 and CIFAR-100 datasets; for example, sub-graph (a) shows the results on the three data sets for different values of λ_H.

Detailed Description

The invention is further described below with reference to the figures and examples.

The width learning system based on multi-feature extraction in this embodiment includes four sub-width learning systems, and each sub-width learning system includes a feature node, an enhanced node, and a sub-node.

Each sub-width learning system extracts an image feature from an image data set, the image features extracted by the sub-width learning systems are different from each other, the first sub-width learning system extracts the HOG feature of the image data set, the second sub-width learning system extracts the color feature of the image data set, the third sub-width learning system extracts the K-means feature of the image data set, and the fourth sub-width learning system extracts the convolution feature of the image data set; combining the image features extracted from the image data set by each sub-width learning system to obtain respective feature nodes, and enhancing the respective feature nodes by an enhanced mapping function to form corresponding enhanced nodes; after the enhancement nodes are formed, the characteristic nodes of the sub-width learning systems are merged with the corresponding enhancement nodes and then connected to the sub-nodes.

In the multi-feature extraction-based width learning system of the present embodiment, the steps by which the first to fourth sub-width learning systems extract the HOG features, color features, K-means features and convolution features of the image data set are the same as those set out above. In the K-means mapping function f: R^d → R^k, R is the set of real numbers, d is the dimension of the image block vector, and k is the number of clustering centers; the mapping method is the hard coding method.

The enhancement mapping function described in this embodiment is a non-linear mapping function.

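In the width learning system based on multi-feature extraction in this embodiment, after the HOG features are extracted, the first sub-width learning system processes them as follows:

the feature nodes corresponding to the HOG features are:

$$ Z_H = [h_1, h_2, \ldots, h_N]^T \in \mathbb{R}^{N \times M} \tag{1} $$

the output of the corresponding enhancement nodes is:

$$ H_H = \varphi_H(Z_H W_{EH} + \beta_H) \tag{2} $$

the corresponding sub-node output is:

$$ U_H = [Z_H, H_H] W_H = A_H W_H \tag{3} $$

the weight is solved by ridge regression with parameter $\lambda_H$:

$$ W_H^* = (A_H^T A_H + \lambda_H I)^{-1} A_H^T Y $$

wherein $I$ is an identity matrix;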

Since the four sub-BLS have the same structure apart from their input features, their weights W and outputs U are solved in the same way. To solve the weight and sub-node output of the second, third or fourth sub-width learning system, the subscript H corresponding to the HOG feature in the above formulas is simply replaced by the subscript S (color feature), K (K-means feature) or F (convolution feature) respectively. To keep the description concise and avoid extensive repetition, the specific formulas for U_S, U_K and U_F are not listed here.

$$ Z = [U'_H, U'_S, U'_K, U'_F] \tag{7} $$

$$ Y = [Z] W = A W \tag{8} $$

$$ W^* = (A^T A + \lambda I)^{-1} A^T Y \tag{10} $$

wherein $I$ is an identity matrix;

The following describes image classification experiments performed with MFBLS on the SVHN, CIFAR-10 and CIFAR-100 data sets in this example, with comparison against other advanced methods.

Data set and settings

SVHN and MNIST are similar; both are digit recognition data sets. However, MNIST images are in a binary format, so the background and the digits are easily separated, whereas SVHN images are in RGB format with more complicated backgrounds. Image classification on SVHN is therefore more challenging. The SVHN dataset consists of a training set, an extra set, and a test set. The training set has 73257 samples, the extra set has 531131 samples, and the test set has 26032 samples. The data set has 10 classes. Some examples from the SVHN dataset are shown in Fig. 2. In the experiment, 36743 samples were randomly selected from the extra set and combined with the training set to form a new training set of 110000 samples (73257 + 36743). In addition, 1000 samples were chosen from the remaining extra set as the validation set. The data set information used for the experiment is shown in Table 1.

The CIFAR-10 dataset consists of 60000 RGB images of size 32 × 32; the training set has 50000 images and the test set has 10000 images. The dataset has 10 classes, each of which contains 5000 training images and 1000 test images. The classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

CIFAR-100 is similar to CIFAR-10: its images are also 32 × 32 RGB images, but its classification is more fine-grained. The dataset has 100 classes, each with 500 training samples and 100 test samples. The 100 classes are grouped into 20 superclasses; for example, the superclass of flowers contains five classes: orchid, poppy, rose, sunflower and tulip.

TABLE 1 MFBLS experimental data set information

The hyper-parameters in the experiment need to be set. They fall into two types: the hyper-parameters of feature extraction and the hyper-parameters of the BLS model.

The hyper-parameters of feature extraction comprise: (1) the image block size and the number of clustering centers k used when extracting the K-means feature. For SVHN, the image block size is 8 × 8 and k is 500; for CIFAR-10 and CIFAR-100, the image block size is 6 × 6 and k is 1024 and 1300 respectively. The K-means feature of SVHN is extracted on a grayscale version of the input data, while those of CIFAR-10 and CIFAR-100 are extracted on 3 channels. (2) The parameter γ used to calculate the number of convolution kernels and the PCA hyper-parameter p used when extracting the convolution feature. For SVHN, CIFAR-10 and CIFAR-100, γ is 0.2, 0.18 and 0.18 respectively, and p is 0.99 on all data sets.

The hyper-parameters of the BLS model are: (1) the ridge regression parameters λ_H, λ_S, λ_K, λ_F and λ; (2) the scaling parameters S_H, S_S, S_K, S_F used when generating the enhancement nodes; (3) the numbers of enhancement nodes E_H, E_S, E_K, E_F. The settings of these parameters on the respective data sets are shown in Table 2.

TABLE 2 MFBLS experimental parameter settings

Results of the experiment

This example compares the proposed method with other advanced methods on three datasets: SVHN, CIFAR-10 and CIFAR-100. The comparison methods are BLS, K-means-BLS, CNNBLS, EFBLS and the convolutional DBN; the convolutional DBN is a deep model. Table 3 shows the best test accuracy achieved on the three datasets by MFBLS and the other methods. The following conclusions can be drawn from Table 3:

1) K-means-BLS gave better results than BLS and EFBLS, indicating that classification using K-means features can significantly improve model performance.

2) CNNBLS gave better results than BLS and EFBLS, indicating that using convolution and pooling operations to extract features helps improve the discriminative power of the model.

3) On the CIFAR-100 data set, the classification accuracy of MFBLS is 12.71% higher than that of K-means-BLS, which shows that using features other than the K-means feature, namely the convolution, HOG and color features, can markedly improve the classification performance of the model.

4) The proposed MFBLS achieves the highest classification accuracy on SVHN, CIFAR-10 and CIFAR-100. First, the classification performance of MFBLS is superior to that of the other width learning models for image classification. Second, on CIFAR-10 it obtains a classification accuracy of 81.03%, which is 2.14% higher than that of the convolutional DBN, showing that MFBLS also outperforms the convolutional DBN model. Finally, across the 6 comparison experiments, MFBLS outperforms the other comparison methods on all three data sets, which proves the effectiveness of the method.

Additionally, MFBLS exceeds the results of the baseline convolutional DBN model without requiring pre-training.

TABLE 3 Accuracy (%) of MFBLS and other current methods on the SVHN, CIFAR-10 and CIFAR-100 test sets

Sensitivity of parameters

By performing a parameter sensitivity analysis, it can be shown that the proposed MFBLS framework can achieve optimal results over a wide range of parameter values, which also verifies the robustness of MFBLS.

The value ranges of the hyper-parameters in the experiment are as follows:

1) The ridge regression parameters λ_H, λ_S, λ_K, λ_F and λ are selected from the set {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10}.

2) The scaling parameters S_H, S_S, S_K, S_F are selected from the set {0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95}.

3) The numbers of enhancement nodes E_H, E_S, E_K, E_F are selected from the set {500, 1000, 4000, 7000, 8000, 9000, 10000}.

4) The number k of centroids used when extracting the K-means features is selected from the set {500, 700, 900, 1100, 1300, 1500, 1700}.

5) The hyper-parameter p of the PCA is selected from the set {0.91, 0.93, 0.95, 0.97, 0.99}.

6) The parameter γ used to calculate the number of convolution kernels is selected from the set {0.05, 0.1, 0.15, 0.2, 0.25}.
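The search ranges listed above can be written down as a small configuration, sketched here in Python. The dictionary keys (`lambda_H`, `S_H`, `E_H` and so on) and the helper `one_at_a_time` are hypothetical names, and varying one hyper-parameter at a time while holding the others at fixed defaults is an assumption suggested by the per-parameter sub-graphs of the sensitivity study, not a procedure stated in the text.

```python
LAMBDA_SET = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10]
SCALE_SET = [0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
ENHANCE_SET = [500, 1000, 4000, 7000, 8000, 9000, 10000]

# one entry per hyper-parameter studied in the sensitivity analysis
SEARCH_SPACE = {}
for name in ("lambda_H", "lambda_S", "lambda_K", "lambda_F", "lambda"):
    SEARCH_SPACE[name] = LAMBDA_SET
for name in ("S_H", "S_S", "S_K", "S_F"):
    SEARCH_SPACE[name] = SCALE_SET
for name in ("E_H", "E_S", "E_K", "E_F"):
    SEARCH_SPACE[name] = ENHANCE_SET
SEARCH_SPACE["k"] = [500, 700, 900, 1100, 1300, 1500, 1700]
SEARCH_SPACE["p"] = [0.91, 0.93, 0.95, 0.97, 0.99]
SEARCH_SPACE["gamma"] = [0.05, 0.1, 0.15, 0.2, 0.25]


def one_at_a_time(defaults, space):
    """Yield (parameter name, full config) pairs, varying one parameter per config."""
    for name, values in space.items():
        for value in values:
            config = dict(defaults)    # copy so the defaults stay untouched
            config[name] = value
            yield name, config
```

The configuration contains 16 hyper-parameters in total: five ridge regression parameters, four scaling parameters, four enhancement node counts, and k, p and γ.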

The results obtained on the SVHN, CIFAR-10 and CIFAR-100 datasets using different hyper-parameter settings are shown in FIG. 5. FIG. 5 consists of 16 sub-graphs, each describing the effect of one of the above 16 hyper-parameters on the results on the respective data sets. The x-axis of every sub-graph represents the parameter value and the y-axis represents the test accuracy. For example, FIG. 5(a) shows the test accuracy on the three data sets for different values of λ_H. By analyzing the results of these experiments, the following conclusions can be drawn:

1) As can be seen in FIG. 5, when the value of a hyper-parameter changes, the blue line (SVHN) is the flattest, the orange line (CIFAR-10) is second, and the green line (CIFAR-100) fluctuates the most. This indicates that, overall, the data sets rank SVHN, CIFAR-10, CIFAR-100 in order of increasing sensitivity to the parameters, and also that MFBLS is more robust on simpler data sets.

2) From sub-graph (a) to sub-graph (d), changes in λ_K have the greatest influence on the result, λ_H has a small effect, and λ_S and λ_F have little effect. As the value of λ_K increases, the classification performance on the respective data sets gradually decreases.

3) From sub-graph (e) to sub-graph (h), the accuracy on the three data sets hardly changes as the values of the scaling parameters S_H, S_S, S_K, S_F increase.

4) As can be seen from sub-graph (i) to sub-graph (l), the values of the enhancement node numbers E_H, E_S and E_K have a slight influence on the result, while E_F has little effect on the result.

5) As can be seen from sub-graphs (m) and (n), when the number k of clustering centroids and the ridge regression parameter λ are changed, the blue line (SVHN) and the orange line (CIFAR-10) are relatively flat, while the green line (CIFAR-100) fluctuates strongly. This indicates that the parameters k and λ have a large effect on the results for CIFAR-100, but not for SVHN and CIFAR-10.

6) As can be seen from sub-graphs (o) and (p), changes in the PCA hyper-parameter p and in the parameter γ used to calculate the number of convolution kernels have little effect on the result.

7) FIG. 5 and the above conclusions indicate that the accuracy on the three data sets stays close to a stable value under most hyper-parameter changes, which indicates that the MFBLS model in this embodiment is very robust.

Time complexity

The experiment compared the run time of MFBLS and almost all the other models on the three datasets, as shown in Table 4. The run environment is an Intel Xeon E5-2678 CPU and one NVIDIA TITAN Xp GPU. As seen in Table 4:

1) The run time of MFBLS is longer than that of BLS, CNNBLS and EFBLS. The likely reason is that MFBLS spends time on feature extraction; for example, on CIFAR-10, feature extraction in MFBLS takes about 900 s.

2) The run times of MFBLS and K-means-BLS are approximately the same on CIFAR-10 and CIFAR-100, while MFBLS takes longer on the SVHN dataset.

3) The run time of the convolutional DBN on CIFAR-10 is 36 h (on an NVIDIA GTX 280), far longer than that of MFBLS. MFBLS takes less time and achieves higher accuracy than the convolutional DBN, so MFBLS can save a large amount of run time while maintaining classification performance.

4) Compared with the BLS-related models, MFBLS has a longer run time, but it remains within an acceptable range, and the classification performance of MFBLS is higher.

Table 4 Run time of MFBLS and other recent methods on SVHN, CIFAR-10 and CIFAR-100.

Finally, although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims

1. A width learning system based on multi-feature extraction is characterized in that: the system comprises four sub-width learning systems, wherein each sub-width learning system comprises a feature node, an enhancement node and a sub-node;

2. The multi-feature extraction based width learning system according to claim 1, wherein:

the step of the first sub-width learning system extracting the HOG features of the image data set comprises:

4) merging the color histogram vector with the color moment vector, thereby forming a 105-dimensional color feature vector; the extracted color feature vectors are the feature nodes of the second sub-width learning system.

d_j＝||x-μ^(j)||₂

3. The multi-feature extraction based width learning system according to claim 1, wherein: the enhanced mapping function is a non-linear mapping function.

4. The multi-feature extraction based width learning system of claim 1, 2 or 3, wherein:

the processing algorithm of the first sub-width learning system after the extracted HOG features is as follows:

the characteristic nodes corresponding to the HOG are as follows:

$$ Z_H = [h_1, h_2, \ldots, h_N]^T \in \mathbb{R}^{N \times M} \tag{1} $$

$$ H_H = \varphi_H(Z_H W_{EH} + \beta_H) \tag{2} $$

$$ U_H = [Z_H, H_H] W_H = A_H W_H \tag{3} $$

wherein I is an identity matrix;

the weights and sub-node outputs of the other three sub-width learning systems are solved in the same way as for the first sub-width learning system: to solve the weight and sub-node output U_S of the second sub-width learning system, the subscript H corresponding to the HOG feature in the formulas is simply replaced by the subscript S corresponding to the color feature; to solve the weight and sub-node output U_K of the third sub-width learning system, the subscript H is replaced by the subscript K corresponding to the K-means feature; and to solve the weight and sub-node output U_F of the fourth sub-width learning system, the subscript H is replaced by the subscript F corresponding to the convolution feature;

$$ Z = [U'_H, U'_S, U'_K, U'_F] \tag{7} $$

$$ Y = [Z] W = A W \tag{8} $$

$$ W^* = (A^T A + \lambda I)^{-1} A^T Y \tag{10} $$

wherein I is an identity matrix;