CN111401443A - Width learning system based on multi-feature extraction - Google Patents
- Publication number
- CN111401443A
- Application number
- CN202010181905.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a width learning system based on multi-feature extraction, which comprises four sub-width learning systems, wherein each sub-width learning system comprises a feature node, an enhancement node and a sub-node; each sub-width learning system extracts an image feature from the image data set, and each sub-width learning system combines the image features extracted from the image data set to obtain a respective feature node, and then enhances the respective feature node through an enhancement mapping function to form a corresponding enhancement node; after each sub-width learning system forms an enhanced node, the feature nodes of the sub-width learning system are merged with the corresponding enhanced nodes and then connected to the sub-nodes of the sub-width learning system, and then the output of the sub-nodes of each sub-width learning system is normalized and then connected to the final output layer. The method has the advantages of short model training time and high classification accuracy on the problem of complex data set classification.
Description
Technical Field
The invention relates to the technical field of image classification, in particular to a width learning system based on multi-feature extraction.
Background
Image classification is a hot problem in image processing, and aims to automatically classify a large number of images. The technology is widely used in applications such as pedestrian detection, video analysis, and image quality assessment.
In recent years, image classification methods based on deep learning have attracted wide attention and study. Typical deep learning models are Deep Belief Networks (DBN), Deep Boltzmann Machines (DBM), and Convolutional Neural Networks (CNN). CNNs are widely used for image processing, especially image classification, because of their ability to learn higher-level semantic features. A CNN consists of convolution layers, pooling layers and fully connected layers, and weight sharing effectively reduces the number of parameters. Better-performing image classification models, such as AlexNet, GoogLeNet, ResNet, and GPipe, were subsequently derived from the CNN. Deep convolutional neural networks such as ResNet and GPipe perform well on data sets such as MNIST, SVHN, CIFAR-10, CIFAR-100 and ImageNet. However, because such networks have many hidden layers, the weights and biases to be trained number in the millions or more, and because deep learning models are trained with gradient descent and back propagation, model training is slow and time-consuming.
To solve this problem, Chen et al. proposed the width learning system (Broad Learning System, BLS) and proved that the model has the universal approximation property. BLS can be effectively applied to classification and regression tasks. BLS is based on the Random Vector Functional-Link Neural Network (RVFLNN) and has a flat network architecture with only one hidden layer. The weights and biases in the network are randomly assigned and are not updated during training, and the network uses ridge regression to find the optimal output weights. Therefore, the network can quickly classify images.
To improve the classification performance of BLS, Liu et al. introduced a K-means feature representation method into the original BLS and proposed the K-means-BLS model, which extracts K-means features and feeds them into BLS in place of the original image input to improve the classification performance of BLS on CIFAR-10. In view of the local invariance of image data, Jin et al. proposed the GBLS model, which introduces manifold learning into the objective function of the model to constrain the output weights and further improve the classification capability of the model.
From the above discussion, it can be seen that deep learning networks can classify complex data sets accurately but suffer from long training time and repeated parameter tuning. BLS and its various improved models, because of their shallow structure, do not sufficiently learn the features of image data, so although their training time is short, their classification performance on complex data sets is not very good.
Disclosure of Invention
In view of the above, in order to solve the existing problems described above, the present invention provides a width learning system based on multi-feature extraction, so as to solve the technical problem that the existing image classification method does not combine the advantages of short model training time and high classification accuracy in the complex data set classification problem.
The invention relates to a width learning system based on multi-feature extraction, which comprises four sub-width learning systems, wherein each sub-width learning system comprises a feature node, an enhancement node and a sub-node;
each sub-width learning system extracts an image feature from an image data set, the image features extracted by the sub-width learning systems are different from each other, the first sub-width learning system extracts the HOG feature of the image data set, the second sub-width learning system extracts the color feature of the image data set, the third sub-width learning system extracts the K-means feature of the image data set, and the fourth sub-width learning system extracts the convolution feature of the image data set; combining the image features extracted from the image data set by each sub-width learning system to obtain respective feature nodes, and enhancing the respective feature nodes by an enhanced mapping function to form corresponding enhanced nodes; after each sub-width learning system forms an enhanced node, combining the characteristic node with the corresponding enhanced node, and then connecting the feature node and the corresponding enhanced node to the sub-nodes of the sub-width learning system;
the width learning system based on multi-feature extraction further comprises a normalization layer for normalizing the output of the child nodes of each child width learning system and a final output layer connected with each normalization layer.
Further, the step of extracting the HOG features of the image data set by the first sub-width learning system of the multi-feature extraction-based width learning system includes:
1) normalizing the input image, and converting the image into a gray scale image;
2) dividing the image into a plurality of small areas, wherein the small areas are called cells, and the dividing method adopts an overlapping dividing method that the divided areas can be overlapped with each other;
3) calculating gradient values and gradient directions of pixel points in each cell to obtain a gradient direction histogram of the region;
4) in a larger area, naming the larger area as blocks, calculating a cumulative gradient direction histogram, and then normalizing all cells in the blocks;
5) merging the gradient direction histograms of all cells to obtain an HOG characteristic; the extracted HOG features are feature nodes of a first sub-width learning system;
the second sub-width learning system extracting color features of the image data set comprises:
1) converting the image from an RGB space to an HSV space, and extracting features in the HSV space;
2) calculating a histogram of the image by respectively using 6 bins, 4 bins and 4 bins according to the value ranges of three channels of the HSV space, and obtaining a 96-dimensional color histogram vector as a result;
3) respectively calculating a first-order color moment, a second-order color moment and a third-order color moment of the pixel on the three channels to finally form a 9-dimensional color moment vector;
4) merging the color histogram vector with the color moment vector, thereby forming a 105-dimensional color feature vector; the extracted color feature vector is a feature node of the second sub-width learning system;
the third sub-width learning system extracts the K-means feature of the image dataset comprising:
1) sampling an image block set from a training set, then carrying out standardization and ZCA whitening on image blocks in the image block set, and finally carrying out K-means clustering on the image block set to obtain a clustering dictionary D;
2) for a three-channel color image, performing sliding sampling with a window at a step of 1 and an interval of 0, wherein the size of the window is consistent with the size of the image blocks used when solving the clustering dictionary D; after sampling, a plurality of image blocks are obtained, each represented by x; performing feature mapping on each image block by using the clustering dictionary D, wherein the mapping function is $f: \mathbb{R}^d \to \mathbb{R}^k$, $\mathbb{R}$ is the set of real numbers, an image block is mapped into a feature vector, and d is the dimension of the image block vector; the mapping method is a hard coding method, and the mapping function $f(x; D)$ of the method is as follows:
$$f_j(x; D) = \begin{cases} 1, & j = \arg\min_{l \in \{1, \ldots, k\}} d_l \\ 0, & \text{otherwise} \end{cases}, \qquad d_j = \lVert x - \mu^{(j)} \rVert_2, \quad j = 1, \ldots, k$$
wherein $\mu^{(j)}$ is the jth clustering center, and k is the number of clustering centers; $d_j$ represents the distance between the image block x and the jth clustering center; after feature mapping, each image block is converted into a k-dimensional vector; all image blocks are divided into four parts, maximum pooling is performed on each part, and the pooled results are combined and standardized, the final result being a K-means feature with a feature dimension of 4k; the extracted K-means features are the feature nodes of the third sub-width learning system;
the step of the fourth sub-width learning system extracting convolution features of the image data set comprises:
1) performing convolution operation on the image, and then performing pooling operation, wherein the convolution operation and the pooling operation are alternately performed for 4 times;
2) after 4 times of convolution and pooling, flattening the obtained result into a vector;
3) using a PCA method to reduce the dimension of the vector, wherein the final result after the dimension reduction is the convolution characteristic; the extracted convolution features are feature nodes of the fourth sub-width learning system.
Further, the enhancement mapping function is a non-linear mapping function.
Further, the processing algorithm of the first sub-width learning system after the HOG features are extracted is as follows:
the feature nodes corresponding to the HOG features are:
$$Z_H = [h_1, h_2, \ldots, h_N]^T \in \mathbb{R}^{N \times M} \quad (1)$$
wherein N is the number of samples, and $h_1, h_2, \ldots, h_N$ are the HOG features of the respective samples; M is the HOG feature dimension of a single sample, that is, the HOG feature of each sample is an M-dimensional vector; the output of the corresponding enhancement nodes is:
$$H_H = \phi_H(Z_H W_{EH} + \beta_H) \quad (2)$$
wherein $W_{EH}$ is the mapping weight, $\beta_H$ is the bias, and $\phi_H$ is a non-linear activation function; the weight $W_{EH}$ and the bias $\beta_H$ are randomly generated; the corresponding child node output $U_H$ has the form:
$$U_H = [Z_H, H_H] W_H = A_H W_H \quad (3)$$
wherein $A_H = [Z_H, H_H]$; the objective function of the sub-width learning system corresponding to the HOG features is:
$$\arg\min_{W_H} \; \lVert A_H W_H - Y \rVert_2^2 + \lambda_H \lVert W_H \rVert_2^2 \quad (4)$$
wherein Y is the label set and $\lambda_H$ is the ridge regression parameter; taking the derivative of equation (4) with respect to $W_H$ and setting it to zero gives:
$$W_H = (A_H^T A_H + \lambda_H I)^{-1} A_H^T Y$$
wherein I is an identity matrix;
the weights and child node outputs of the other three sub-width learning systems are solved in the same way as for the first sub-width learning system: to solve the weight $W_S$ and the child node output $U_S$ of the second sub-width learning system, it is only necessary to replace the subscript H corresponding to the HOG features in the formulas with the subscript S corresponding to the color features; to solve the weight $W_K$ and the child node output $U_K$ of the third sub-width learning system, it is only necessary to replace the subscript H with the subscript K corresponding to the K-means features; to solve the weight $W_F$ and the child node output $U_F$ of the fourth sub-width learning system, it is only necessary to replace the subscript H with the subscript F corresponding to the convolution features;
after the width learning system based on multi-feature extraction obtains the child node outputs $U_H, U_S, U_K, U_F$ of the sub-width learning systems, the subsequent processing algorithm is as follows:
$U_H, U_S, U_K, U_F$ are each normalized to obtain $U'_H, U'_S, U'_K, U'_F$ respectively, and Z is set as:
$$Z = [U'_H, U'_S, U'_K, U'_F] \quad (7)$$
the overall output Y of the width learning system based on multi-feature extraction is:
$$Y = [Z] W = A W \quad (8)$$
wherein $A = [Z]$ and W is the overall weight connecting the feature nodes and the enhancement nodes to the output; W is obtained by minimizing the objective function:
$$\arg\min_{W} \; \lVert A W - Y \rVert_2^2 + \lambda \lVert W \rVert_2^2 \quad (9)$$
wherein $\lambda$ is the ridge regression parameter; solving the above equation by the ridge regression method gives:
$$W^* = (A^T A + \lambda I)^{-1} A^T Y \quad (10)$$
wherein I is an identity matrix;
the final output of the width learning system based on multi-feature extraction is then:
$$\hat{Y} = A W^*$$
the invention has the beneficial effects that:
The width learning system based on multi-feature extraction of the invention adopts a multi-feature extraction method in place of the random mapping method of the original width learning system (BLS for short) and extracts the K-means features, HOG features, color features and convolution features of an image, which can significantly improve the feature learning capability of BLS. Considering that the four features carry different meanings and emphasize different aspects of the image, four independent sub-BLS are constructed, and each feature is enhancement-mapped separately. All sub-BLS together form one large width learning system based on multi-feature extraction, MFBLS for short, which considers the outputs of all sub-BLS jointly and uses a normalization layer to improve the generalization capability of the model. Experiments on complex data sets such as SVHN, CIFAR-10 and CIFAR-100 show that the classification performance of the proposed MFBLS model on complex data sets is superior to that of existing width learning models, and that the model combines this accuracy with a short training time.
Drawings
Fig. 1 is a schematic structural diagram of the multi-feature extraction-based width learning system MFBLS, in which a dashed box represents a sub-BLS.
Fig. 2 shows examples from the SVHN dataset.
Fig. 3 shows examples from the CIFAR-10 dataset.
Fig. 4 shows examples from the CIFAR-100 dataset.
Fig. 5 shows the parameter sensitivity study of MFBLS on the SVHN, CIFAR-10 and CIFAR-100 datasets, wherein Fig. 5(a) shows the results on the three datasets for different values of $\lambda_H$.
Detailed Description
The invention is further described below with reference to the figures and examples.
The width learning system based on multi-feature extraction in this embodiment includes four sub-width learning systems, and each sub-width learning system includes a feature node, an enhanced node, and a sub-node.
Each sub-width learning system extracts an image feature from an image data set, the image features extracted by the sub-width learning systems are different from each other, the first sub-width learning system extracts the HOG feature of the image data set, the second sub-width learning system extracts the color feature of the image data set, the third sub-width learning system extracts the K-means feature of the image data set, and the fourth sub-width learning system extracts the convolution feature of the image data set; combining the image features extracted from the image data set by each sub-width learning system to obtain respective feature nodes, and enhancing the respective feature nodes by an enhanced mapping function to form corresponding enhanced nodes; after the enhancement nodes are formed, the characteristic nodes of the sub-width learning systems are merged with the corresponding enhancement nodes and then connected to the sub-nodes.
The width learning system based on multi-feature extraction further comprises a normalization layer for normalizing the output of the child nodes of each child width learning system and a final output layer connected with each normalization layer.
In the multi-feature extraction-based width learning system of the present embodiment, the step of extracting the HOG features of the image data set by the first sub-width learning system includes:
1) normalizing the input image, and converting the image into a gray scale image;
2) dividing the image into a plurality of small areas, wherein the small areas are called cells, and the dividing method adopts an overlapping dividing method that the divided areas can be overlapped with each other;
3) calculating gradient values and gradient directions of pixel points in each cell to obtain a gradient direction histogram of the region;
4) in a larger area, naming the larger area as blocks, calculating a cumulative gradient direction histogram, and then normalizing all cells in the blocks;
5) merging the gradient direction histograms of all cells to obtain an HOG characteristic; the extracted HOG features are feature nodes of a first sub-width learning system;
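The five HOG steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the patent's implementation: the cell size of 8 pixels, the 9 unsigned orientation bins, the 2 × 2-cell blocks and the L2 block normalization are assumptions borrowed from the common HOG formulation, and overlap is realized here at the block level.

```python
import numpy as np

def hog_features(gray, cell=8, bins=9, block=2):
    """Simplified HOG for a 2-D grayscale array (assumed parameters, see lead-in)."""
    gray = gray.astype(np.float64)
    # Step 3: per-pixel gradient magnitude and unsigned direction in [0, 180).
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    # Steps 2-3: one magnitude-weighted orientation histogram per cell.
    n_cy, n_cx = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(
                bin_idx[rows, cols].ravel(),
                weights=mag[rows, cols].ravel(),
                minlength=bins,
            )

    # Step 4: overlapping blocks of block x block cells, each L2-normalized.
    feats = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + 1e-6))

    # Step 5: concatenate all normalized histograms into one feature vector.
    return np.concatenate(feats)

# Step 1 for an RGB image in [0, 1]: luminance-weighted grayscale conversion.
rgb = np.random.default_rng(0).random((32, 32, 3))
gray = rgb @ np.array([0.299, 0.587, 0.114])
h = hog_features(gray)  # one row of Z_H
```

With these assumed settings a 32 × 32 image yields 3 × 3 blocks of 36 values each, so M = 324.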
the second sub-width learning system extracting color features of the image data set comprises:
1) converting the image from an RGB space to an HSV space, and extracting features in the HSV space;
2) calculating a histogram of the image by respectively using 6 bins, 4 bins and 4 bins according to the value ranges of three channels of the HSV space, and obtaining a 96-dimensional color histogram vector as a result;
3) respectively calculating a first-order color moment, a second-order color moment and a third-order color moment of the pixel on the three channels to finally form a 9-dimensional color moment vector;
4) merging the color histogram vector with the color moment vector, thereby forming a 105-dimensional color feature vector; the extracted color feature vector is a feature node of the second sub-width learning system;
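A sketch of the 105-dimensional color feature described above, assuming NumPy only. The text does not say which channel receives which bin count, nor how the color moments are defined, so two assumptions are made here: the bins are assigned as H = 6, S = 4, V = 4 in a joint histogram (6 × 4 × 4 = 96), and the three moments are the per-channel mean, standard deviation and cube root of the third central moment.

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Vectorized RGB -> HSV; input and output are floats in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    c = v - rgb.min(axis=-1)                      # chroma
    s = np.where(v > 0, c / np.where(v > 0, v, 1.0), 0.0)
    safe_c = np.where(c > 0, c, 1.0)              # avoid division by zero
    h = np.where(
        v == r, ((g - b) / safe_c) % 6.0,
        np.where(v == g, (b - r) / safe_c + 2.0, (r - g) / safe_c + 4.0),
    )
    h = np.where(c > 0, h / 6.0, 0.0)             # gray pixels get hue 0
    return np.stack([h, s, v], axis=-1)

def color_features(rgb):
    """96-bin joint HSV histogram plus 9 color moments -> 105-dim vector."""
    hsv = rgb_to_hsv(rgb.astype(np.float64)).reshape(-1, 3)
    # Step 2: joint histogram with 6 x 4 x 4 bins, normalized to sum to 1.
    hist, _ = np.histogramdd(hsv, bins=(6, 4, 4), range=((0, 1), (0, 1), (0, 1)))
    hist = hist.ravel() / hsv.shape[0]
    # Step 3: first-, second- and third-order moments per channel.
    mean = hsv.mean(axis=0)
    std = hsv.std(axis=0)
    skew = np.cbrt(((hsv - mean) ** 3).mean(axis=0))
    # Step 4: 96 + 9 = 105 dimensions.
    return np.concatenate([hist, mean, std, skew])

img = np.random.default_rng(0).random((32, 32, 3))
c_feat = color_features(img)  # one row of the second sub-system's feature nodes
```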
the third sub-width learning system extracts the K-means feature of the image dataset comprising:
1) sampling an image block set from a training set, then carrying out standardization and ZCA whitening on image blocks in the image block set, and finally carrying out K-means clustering on the image block set to obtain a clustering dictionary D;
2) for a three-channel color image, performing sliding sampling with a window at a step of 1 and an interval of 0, wherein the size of the window is consistent with the size of the image blocks used when solving the clustering dictionary D; after sampling, a plurality of image blocks are obtained, each represented by x; performing feature mapping on each image block by using the clustering dictionary D, wherein the mapping function is $f: \mathbb{R}^d \to \mathbb{R}^k$, $\mathbb{R}$ is the set of real numbers, an image block is mapped into a feature vector, d is the dimension of the image block vector, and k is the number of clustering centers; the mapping method is a hard coding method, and the mapping function $f(x; D)$ of the method is as follows:
$$f_j(x; D) = \begin{cases} 1, & j = \arg\min_{l \in \{1, \ldots, k\}} d_l \\ 0, & \text{otherwise} \end{cases}, \qquad d_j = \lVert x - \mu^{(j)} \rVert_2, \quad j = 1, \ldots, k$$
wherein $\mu^{(j)}$ is the jth clustering center, and k is the number of clustering centers; $d_j$ represents the distance between the image block x and the jth clustering center; after feature mapping, each image block is converted into a k-dimensional vector; all image blocks are divided into four parts, maximum pooling is performed on each part, and the pooled results are combined and standardized, the final result being a K-means feature with a feature dimension of 4k; the extracted K-means features are the feature nodes of the third sub-width learning system;
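The hard-coding and pooling step can be sketched as below. The sketch assumes that a clustering dictionary D of shape (k, d) has already been learned. A random matrix stands in for it here. ZCA whitening and the final standardization across the training set are omitted for brevity, and "four parts" is interpreted as the four spatial quadrants of the block grid. The function name and the flattening order of the image blocks are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def kmeans_hard_features(image, D, patch=6):
    """image: (H, W, C) array; D: (k, patch*patch*C) cluster centers."""
    k = D.shape[0]
    # Stride-1 sliding window; result has shape (H-p+1, W-p+1, C, p, p).
    win = sliding_window_view(image, (patch, patch), axis=(0, 1))
    rows, cols = win.shape[:2]
    x = win.reshape(rows * cols, -1).astype(np.float64)
    # Per-block standardization (whitening is left out of this sketch).
    x = (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + 1e-8)
    # Squared distances d_j^2 between every block and every center.
    d2 = (x ** 2).sum(axis=1)[:, None] - 2.0 * x @ D.T + (D ** 2).sum(axis=1)[None, :]
    # Hard coding: 1 for the nearest center, 0 elsewhere.
    codes = np.zeros((x.shape[0], k))
    codes[np.arange(x.shape[0]), d2.argmin(axis=1)] = 1.0
    codes = codes.reshape(rows, cols, k)
    # Max pooling over the four quadrants, then concatenation -> 4k values.
    r2, c2 = rows // 2, cols // 2
    quads = [codes[:r2, :c2], codes[:r2, c2:], codes[r2:, :c2], codes[r2:, c2:]]
    return np.concatenate([q.max(axis=(0, 1)) for q in quads])

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
D = rng.standard_normal((16, 6 * 6 * 3))   # placeholder dictionary, k = 16
feat = kmeans_hard_features(image, D)       # length 4k = 64
```

Because each block activates exactly one center, every pooled entry is 0 or 1 and simply records whether a given center was the nearest one anywhere in that quadrant.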
the step of the fourth sub-width learning system extracting convolution features of the image data set comprises:
1) performing convolution operation on the image, and then performing pooling operation, wherein the convolution operation and the pooling operation are alternately performed for 4 times;
2) after 4 times of convolution and pooling, flattening the obtained result into a vector;
3) using a PCA method to reduce the dimension of the vector, wherein the final result after the dimension reduction is the convolution characteristic; the extracted convolution features are feature nodes of the fourth sub-width learning system.
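A compact sketch of the convolution feature pipeline follows. The text does not state how the convolution kernels are obtained or how γ determines their number, so several assumptions are made: the kernels are random 3 × 3 filters, the kernel counts per stage are hypothetical, pooling is 2 × 2 max pooling, and the PCA hyper-parameter p is interpreted as the fraction of variance to retain.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_pool(x, kernels):
    """x: (H, W, C_in); kernels: (C_out, 3, 3, C_in). Zero-padded conv + 2x2 max pool."""
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    win = sliding_window_view(xp, (3, 3), axis=(0, 1))    # (H, W, C_in, 3, 3)
    y = np.einsum("hwcij,oijc->hwo", win, kernels)         # (H, W, C_out)
    H, W, O = y.shape
    return y.reshape(H // 2, 2, W // 2, 2, O).max(axis=(1, 3))

def conv_features(images, n_kernels=(4, 8, 8, 8), seed=0):
    """Steps 1-2: four conv/pool rounds, then flatten each result to a vector."""
    rng = np.random.default_rng(seed)
    banks, c_in = [], images.shape[-1]
    for c_out in n_kernels:                                # hypothetical kernel counts
        banks.append(rng.standard_normal((c_out, 3, 3, c_in)) / np.sqrt(9 * c_in))
        c_in = c_out
    feats = []
    for img in images:
        x = img.astype(np.float64)
        for kernels in banks:
            x = conv_pool(x, kernels)
        feats.append(x.ravel())
    return np.asarray(feats)

def pca_reduce(X, p=0.99):
    """Step 3: keep the fewest principal components explaining a fraction p of variance."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    m = int(np.searchsorted(ratio, p) + 1)
    return Xc @ Vt[:m].T

images = np.random.default_rng(1).random((20, 32, 32, 3))
F = conv_features(images)      # 32x32 -> 16 -> 8 -> 4 -> 2, so 2*2*8 = 32 values
Z_F = pca_reduce(F, p=0.99)    # feature nodes of the fourth sub-system
```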
The enhancement mapping function described in this embodiment is a non-linear mapping function.
In the width learning system based on multi-feature extraction in this embodiment, the processing algorithm of the first sub-width learning system after the HOG features are extracted is as follows:
the feature nodes corresponding to the HOG features are:
$$Z_H = [h_1, h_2, \ldots, h_N]^T \in \mathbb{R}^{N \times M} \quad (1)$$
wherein N is the number of samples, and $h_1, h_2, \ldots, h_N$ are the HOG features of the respective samples; M is the HOG feature dimension of a single sample, that is, the HOG feature of each sample is an M-dimensional vector; the output of the corresponding enhancement nodes is:
$$H_H = \phi_H(Z_H W_{EH} + \beta_H) \quad (2)$$
wherein $W_{EH}$ is the mapping weight, $\beta_H$ is the bias, and $\phi_H$ is a non-linear activation function; the weight $W_{EH}$ and the bias $\beta_H$ are randomly generated; the corresponding child node output $U_H$ has the form:
$$U_H = [Z_H, H_H] W_H = A_H W_H \quad (3)$$
wherein $A_H = [Z_H, H_H]$; the objective function of the sub-width learning system corresponding to the HOG features is:
$$\arg\min_{W_H} \; \lVert A_H W_H - Y \rVert_2^2 + \lambda_H \lVert W_H \rVert_2^2 \quad (4)$$
wherein Y is the label set and $\lambda_H$ is the ridge regression parameter; taking the derivative of equation (4) with respect to $W_H$ and setting it to zero gives:
$$W_H = (A_H^T A_H + \lambda_H I)^{-1} A_H^T Y$$
wherein I is an identity matrix;
Since the four sub-BLS have the same structure and differ only in their features, the weight W and the output U of the four sub-BLS are solved in the same way. To solve the weight $W_S$ and the child node output $U_S$ of the second sub-width learning system, it is only necessary to replace the subscript H corresponding to the HOG features in the formulas with the subscript S corresponding to the color features; to solve the weight $W_K$ and the child node output $U_K$ of the third sub-width learning system, it is only necessary to replace the subscript H with the subscript K corresponding to the K-means features; to solve the weight $W_F$ and the child node output $U_F$ of the fourth sub-width learning system, it is only necessary to replace the subscript H with the subscript F corresponding to the convolution features. To keep the text concise and avoid large-scale repetition, the specific formulas for solving $W_S$, $W_K$, $W_F$ and $U_S$, $U_K$, $U_F$ are not listed here.
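The training of a single sub-BLS, which is identical for the four feature types, can be sketched as follows. This is a minimal illustration under stated assumptions: tanh is used as the non-linear activation, the random weights are drawn uniformly from [-1, 1], and the scaling parameter S is not modeled because the text does not specify how it enters the mapping. The function name is hypothetical.

```python
import numpy as np

def train_sub_bls(Z, Y, n_enh=100, lam=0.01, seed=0):
    """Z: (N, M) feature nodes; Y: (N, C) one-hot labels.

    Returns the output weights W, the child-node output U and the matrix A.
    """
    rng = np.random.default_rng(seed)
    # Enhancement nodes: random weights and biases, never updated in training.
    W_E = rng.uniform(-1.0, 1.0, size=(Z.shape[1], n_enh))
    beta = rng.uniform(-1.0, 1.0, size=n_enh)
    H = np.tanh(Z @ W_E + beta)
    # A = [Z, H]: feature nodes and enhancement nodes side by side.
    A = np.hstack([Z, H])
    # Ridge regression closed form: W = (A^T A + lam I)^{-1} A^T Y.
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    # Child-node output U = A W.
    return W, A @ W, A

rng = np.random.default_rng(0)
Z_H = rng.standard_normal((40, 5))          # stand-in for HOG feature nodes
Y = np.eye(3)[rng.integers(0, 3, size=40)]  # one-hot labels, 3 classes
W_H, U_H, A_H = train_sub_bls(Z_H, Y, n_enh=10, lam=0.1)
```

Using `np.linalg.solve` instead of forming the matrix inverse explicitly gives the same solution with better numerical behavior; no gradient descent is involved, which is where the short training time comes from.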
After the width learning system based on multi-feature extraction obtains the child node outputs $U_H, U_S, U_K, U_F$ of the sub-width learning systems, the subsequent processing algorithm is as follows:
$U_H, U_S, U_K, U_F$ are each normalized to obtain $U'_H, U'_S, U'_K, U'_F$ respectively, and Z is set as:
$$Z = [U'_H, U'_S, U'_K, U'_F] \quad (7)$$
the overall output Y of the width learning system based on multi-feature extraction is:
$$Y = [Z] W = A W \quad (8)$$
wherein $A = [Z]$ and W is the overall weight connecting the feature nodes and the enhancement nodes to the output; W is obtained by minimizing the objective function:
$$\arg\min_{W} \; \lVert A W - Y \rVert_2^2 + \lambda \lVert W \rVert_2^2 \quad (9)$$
wherein $\lambda$ is the ridge regression parameter; solving the above equation by the ridge regression method gives:
$$W^* = (A^T A + \lambda I)^{-1} A^T Y \quad (10)$$
wherein I is an identity matrix;
the final output of the width learning system based on multi-feature extraction is then:
$$\hat{Y} = A W^*$$
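The fusion of the four child-node outputs can be sketched as below. The text does not specify the kind of normalization used by the normalization layer, so column-wise z-scoring is assumed here purely for illustration. The function name is hypothetical.

```python
import numpy as np

def fuse_outputs(U_list, Y, lam=0.01):
    """U_list: child-node outputs, each (N, C); Y: (N, C) one-hot labels."""
    # Normalization layer (assumed: zero mean, unit variance per column).
    normed = [(U - U.mean(axis=0)) / (U.std(axis=0) + 1e-8) for U in U_list]
    # Z = [U'_H, U'_S, U'_K, U'_F], and A = Z.
    A = np.hstack(normed)
    # Ridge regression for the overall weight, equation (10).
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    # Final output of the whole system; the predicted class is the row-wise argmax.
    return A @ W, W

rng = np.random.default_rng(0)
Y = np.eye(3)[rng.integers(0, 3, size=50)]
U_list = [Y + 0.3 * rng.standard_normal(Y.shape) for _ in range(4)]  # stand-ins
Y_hat, W = fuse_outputs(U_list, Y, lam=0.1)
pred = Y_hat.argmax(axis=1)
```

At test time the normalization statistics would have to be taken from the training data; that bookkeeping is left out of the sketch.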
The following describes image classification experiments performed with the MFBLS of this example on the SVHN, CIFAR-10 and CIFAR-100 data sets, with comparison against other advanced methods.
Data set and settings
SVHN is similar to MNIST; both are digit recognition data sets. However, MNIST images are in a binary format, and the background and the digits are easily separated, whereas SVHN images are in RGB format and the image background is more complicated. Image classification on SVHN is therefore more challenging. The SVHN dataset consists of a training set, an additional set, and a test set. The training set has 73257 samples, the additional set has 531131 samples, and the test set has 26032 samples. The data set has 10 classes. Some examples from the SVHN dataset are shown in Fig. 2. In the experiment, 36743 samples were randomly selected from the additional set and combined with the training set to form a new training set of 110000 samples (73257 + 36743). In addition, 1000 samples were chosen from the remainder of the additional set as the validation set. The data set information used for the experiment is shown in Table 1.
The CIFAR-10 dataset consists of 60000 RGB images of size 32 × 32. The training set has 50000 images and the test set has 10000 images. The dataset has 10 classes, each of which contains 5000 training images and 1000 test images. The classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.
CIFAR-100 is similar to CIFAR-10 in that its images are also 32 × 32 RGB images, except that the classification of CIFAR-100 is more fine-grained. The dataset has 100 classes, each with 500 training samples and 100 test samples. The 100 classes are grouped into 20 super classes; for example, the super class of flowers contains the five classes orchid, poppy, rose, sunflower and tulip.
TABLE 1 MFBLS experimental data set information
The hyper-parameters in the experiment need to be set. They fall into two types: the hyper-parameters of feature extraction and the hyper-parameters of the BLS model.
The hyper-parameters of feature extraction comprise: (1) the image block size and the number of cluster centers k used when extracting the K-means feature. For SVHN, the image block size is 8 × 8 and k is 500; for CIFAR-10 and CIFAR-100, the image block size is 6 × 6 and k is 1024 and 1300 respectively. The K-means feature of SVHN is extracted on a gray-scale map of the input data, whereas for CIFAR-10 and CIFAR-100 it is extracted on 3 channels. (2) The parameter γ for calculating the number of convolution kernels and the PCA hyper-parameter p used when extracting the convolution feature. For SVHN, CIFAR-10 and CIFAR-100, γ is 0.2, 0.18 and 0.18 respectively, and p is 0.99 on all data sets.
The hyper-parameters of the BLS model are: (1) the ridge regression parameters $\lambda_H$, $\lambda_S$, $\lambda_K$, $\lambda_F$ and $\lambda$; (2) the scaling parameters $S_H$, $S_S$, $S_K$, $S_F$ used when generating the enhancement nodes; (3) the numbers of enhancement nodes $E_H$, $E_S$, $E_K$, $E_F$. The settings of these parameters on the respective data sets are shown in Table 2.
TABLE 2 MFBLS experimental parameter settings
Results of the experiment
This example compares the proposed method with other advanced methods on three datasets: SVHN, CIFAR-10 and CIFAR-100. The comparison methods are BLS, K-means-BLS, CNNBLS, EFBLS and the convolutional DBN; the convolutional DBN is a deep model. Table 3 shows the best test accuracy achieved on the three datasets by MFBLS and the other methods. The following conclusions can be drawn from Table 3:
1) K-means-BLS gives better results than BLS and EFBLS, indicating that classification using K-means features can significantly improve model performance.
2) CNNBLS gives better results than BLS and EFBLS, indicating that using convolution and pooling operations to extract features helps improve the discriminative power of the model.
3) On the CIFAR-100 data set, the classification accuracy of MFBLS is 12.71% higher than that of K-means-BLS, which shows that using other features in addition to the K-means feature, namely the convolution feature, the HOG feature and the color feature, can clearly improve the classification performance of the model.
4) The proposed MFBLS achieves the highest classification accuracy on SVHN, CIFAR-10 and CIFAR-100. First, the classification performance of MFBLS is superior to that of the other width learning models for image classification. Second, on CIFAR-10 it obtains a classification accuracy of 81.03%, which is 2.14% higher than that of the convolutional DBN, showing that the performance of MFBLS is also superior to that of the convolutional DBN model. Finally, in the 6 comparison experiments, the performance of MFBLS on the three data sets is superior to that of the other comparison methods, which proves the effectiveness of the method.
In addition, MFBLS exceeds the results of the baseline convolutional DBN model without requiring pre-training.
TABLE 3 Accuracy (%) of MFBLS and other current methods on the SVHN, CIFAR-10 and CIFAR-100 test sets
Sensitivity of parameters
A parameter sensitivity analysis shows that the proposed MFBLS framework can achieve optimal results over a wide range of parameter values, which also verifies the robustness of MFBLS.
The value ranges of the hyper-parameters in the experiment are as follows:
1) The ridge regression parameters $\lambda_H$, $\lambda_S$, $\lambda_K$, $\lambda_F$ and $\lambda$ are selected from the set {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10}.
2) The scaling parameters $S_H$, $S_S$, $S_K$, $S_F$ are selected from the set {0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95}.
3) The numbers of enhanced nodes $E_H$, $E_S$, $E_K$, $E_F$ are selected from the set {500, 1000, 4000, 7000, 8000, 9000, 10000}.
4) When the K-means features are extracted, the number k of centroids is selected from the set {500, 700, 900, 1100, 1300, 1500, 1700}.
5) The hyper-parameter p of the PCA is selected from the set {0.91, 0.93, 0.95, 0.97, 0.99}.
6) The parameter $\gamma$ used to calculate the number of convolution kernels is selected from the set {0.05, 0.1, 0.15, 0.2, 0.25}.
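The six groups above add up to 16 individual hyper-parameters (three per sub-system for four sub-systems, plus $\lambda$, k, p and $\gamma$). The following is a minimal sketch of how such a sweep could be enumerated, assuming that one hyper-parameter is varied at a time while the others stay fixed. The names `build_grids` and `one_at_a_time` are hypothetical, and the default values are placeholders, not the settings of Table 2.

```python
# Candidate values copied from the ranges listed above.
RIDGE = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10]
SCALE = [0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
NODES = [500, 1000, 4000, 7000, 8000, 9000, 10000]


def build_grids():
    """Map each hyper-parameter name to its list of candidate values."""
    grids = {}
    for sub in "HSKF":  # HOG, color (S), K-means, convolution (F) sub-systems
        grids[f"lambda_{sub}"] = RIDGE
        grids[f"S_{sub}"] = SCALE
        grids[f"E_{sub}"] = NODES
    grids["lambda"] = RIDGE  # ridge parameter of the final output layer
    grids["k"] = [500, 700, 900, 1100, 1300, 1500, 1700]
    grids["p"] = [0.91, 0.93, 0.95, 0.97, 0.99]
    grids["gamma"] = [0.05, 0.1, 0.15, 0.2, 0.25]
    return grids


def one_at_a_time(grids, defaults):
    """Yield (varied name, full configuration), changing one value per run."""
    for name, values in grids.items():
        for value in values:
            yield name, {**defaults, name: value}


grids = build_grids()
# Placeholder defaults (assumption): the first candidate of every grid.
defaults = {name: values[0] for name, values in grids.items()}
runs = list(one_at_a_time(grids, defaults))
print(len(grids), "hyper-parameters,", len(runs), "configurations")
```

Each yielded configuration would then be passed to whatever training and evaluation routine is in use, and the test accuracy recorded against the varied value.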
The results obtained on the SVHN, CIFAR-10 and CIFAR-100 datasets using different hyper-parameter settings are shown in FIG. 5. FIG. 5 consists of 16 sub-graphs, each of which describes the effect of one of the 16 hyper-parameters above on the results for the respective datasets. The x-axis of every sub-graph represents the parameter value and the y-axis represents the test accuracy. For example, FIG. 5(a) shows the test accuracy on the three datasets for different values of $\lambda_H$. By analyzing the results of these experiments, the following conclusions can be drawn:
1) As can be seen in FIG. 5, as the hyper-parameter values change, the blue line (SVHN) is the flattest, the orange line (CIFAR-10) is second, and the green line (CIFAR-100) fluctuates the most. This indicates that, overall, the datasets in order of increasing sensitivity to the parameters are SVHN, CIFAR-10 and CIFAR-100, and also that MFBLS is more robust on simpler datasets.
2) From sub-graph (a) to sub-graph (d), the change of $\lambda_K$ has the greatest influence on the result, $\lambda_H$ has a small effect, and $\lambda_S$ and $\lambda_F$ have little effect. As the value of $\lambda_K$ increases, the classification performance on the respective datasets gradually decreases.
3) From sub-graph (e) to sub-graph (h), the accuracy on the three datasets hardly changes as the values of the scaling parameters $S_H$, $S_S$, $S_K$, $S_F$ increase.
4) As can be seen from sub-graph (i) to sub-graph (l), the values of the enhanced-node numbers $E_H$, $E_S$ and $E_K$ have a slight influence on the result, while $E_F$ has little effect.
5) As can be seen from sub-graphs (m) and (n), when the number k of clustering centroids and the ridge regression parameter $\lambda$ are changed, the blue line (SVHN) and the orange line (CIFAR-10) are relatively flat, while the green line (CIFAR-100) fluctuates strongly. This indicates that the parameters k and $\lambda$ have a large effect on the results for CIFAR-100, but not for SVHN and CIFAR-10.
6) As can be seen from sub-graphs (o) and (p), changing the hyper-parameter p of the PCA and the parameter $\gamma$ used to calculate the number of convolution kernels has little effect on the result.
7) FIG. 5 and the above conclusions indicate that, for most hyper-parameter changes, the accuracy on the three datasets stays close to a fixed value, which indicates that the MFBLS model in this embodiment is very robust.
Time complexity
The experiment compared the running time of MFBLS with that of almost all the other models on the three datasets, as shown in Table 4. The running environment is an Intel Xeon E5-2678 CPU and one NVIDIA TITAN Xp GPU. As seen in Table 4:
1) The running time of MFBLS is longer than that of BLS, CNNBLS and EFBLS. The reason may be that MFBLS spends time on feature extraction; for example, on CIFAR-10 the feature extraction of MFBLS takes about 900 s.
2) MFBLS and K-means-BLS have approximately the same running time on CIFAR-10 and CIFAR-100, while MFBLS takes longer on the SVHN dataset.
3) The running time of the convolutional DBN on CIFAR-10 is 36 h (using an NVIDIA GTX 280), far longer than that of MFBLS. MFBLS takes less time and achieves higher accuracy than the convolutional DBN, so MFBLS can greatly reduce the running time while maintaining classification performance.
4) Compared with the BLS-related models, MFBLS has a longer running time, but it is within an acceptable range, and the classification performance of MFBLS is higher.
TABLE 4 Running time of MFBLS and other recent methods on SVHN, CIFAR-10 and CIFAR-100
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.
Claims (4)
1. A width learning system based on multi-feature extraction is characterized in that: the system comprises four sub-width learning systems, wherein each sub-width learning system comprises a feature node, an enhancement node and a sub-node;
each sub-width learning system extracts an image feature from an image data set, the image features extracted by the sub-width learning systems are different from each other, the first sub-width learning system extracts the HOG feature of the image data set, the second sub-width learning system extracts the color feature of the image data set, the third sub-width learning system extracts the K-means feature of the image data set, and the fourth sub-width learning system extracts the convolution feature of the image data set; combining the image features extracted from the image data set by each sub-width learning system to obtain respective feature nodes, and enhancing the respective feature nodes by an enhanced mapping function to form corresponding enhanced nodes; after each sub-width learning system forms an enhanced node, combining the characteristic node with the corresponding enhanced node, and then connecting the feature node and the corresponding enhanced node to the sub-nodes of the sub-width learning system;
the width learning system based on multi-feature extraction further comprises normalization layers for normalizing the output of the sub-nodes of each sub-width learning system, and a final output layer connected with each normalization layer.
2. The multi-feature extraction based width learning system according to claim 1, wherein:
the step of the first sub-width learning system extracting the HOG features of the image data set comprises:
1) normalizing the input image, and converting the image into a gray scale image;
2) dividing the image into a plurality of small areas called cells, using an overlapping division method in which the divided areas may overlap each other;
3) calculating gradient values and gradient directions of pixel points in each cell to obtain a gradient direction histogram of the region;
4) grouping cells into larger areas called blocks, calculating the cumulative gradient direction histogram within each block, and then normalizing all cells in the block;
5) merging the gradient direction histograms of all cells to obtain an HOG characteristic; the extracted HOG features are feature nodes of a first sub-width learning system;
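The five HOG steps above can be illustrated with a simplified NumPy sketch. It follows the common layout of non-overlapping cells grouped into overlapping blocks, which may differ from the overlapping division described in step 2). The function name `hog_sketch`, the cell size of 8 pixels, the 9 orientation bins and the 2x2-cell blocks are all assumptions made for illustration, not values taken from the patent.

```python
import numpy as np


def hog_sketch(image, cell=8, bins=9, block=2, eps=1e-6):
    """Simplified HOG descriptor of an H x W (x 3) image as a 1-D vector."""
    img = np.asarray(image, dtype=np.float64)
    gray = img.mean(axis=2) if img.ndim == 3 else img   # step 1: gray scale
    gray = gray / (gray.max() + eps)                     # step 1: normalise
    gy, gx = np.gradient(gray)                           # step 3: gradients
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0         # unsigned direction
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    n_cy, n_cx = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for i in range(n_cy):                                # steps 2-3: cells
        for j in range(n_cx):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(bin_idx[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=bins)

    feats = []
    for i in range(n_cy - block + 1):                    # step 4: blocks
        for j in range(n_cx - block + 1):
            v = hist[i:i + block, j:j + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))  # L2 norm
    return np.concatenate(feats)                         # step 5: merge


image = np.random.default_rng(0).random((32, 32, 3))
feature = hog_sketch(image)
print(feature.shape)
```

With these assumed settings a 32x32 image gives 4x4 cells and 3x3 blocks of 36 values each, so the descriptor length M depends directly on the chosen cell and block sizes.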
the second sub-width learning system extracting color features of the image data set comprises:
1) converting the image from an RGB space to an HSV space, and extracting features in the HSV space;
2) calculating a histogram of the image by respectively using 6 bins, 4 bins and 4 bins according to the value ranges of three channels of the HSV space, and obtaining a 96-dimensional color histogram vector as a result;
3) respectively calculating a first-order color moment, a second-order color moment and a third-order color moment of the pixel on the three channels to finally form a 9-dimensional color moment vector;
4) merging the color histogram vector with the color moment vector, thereby forming a 105-dimensional color feature vector; the extracted color feature vectors are the feature nodes of the second sub-width learning system.
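The dimensions stated above are consistent with a joint HSV histogram (6 x 4 x 4 = 96 bins) followed by three moments on each of the three channels (9 values), giving 105 in total. A minimal sketch under that reading is shown below. The function name `color_features`, the joint-histogram interpretation, the histogram normalisation and the use of the cube root of the third central moment as the third-order color moment are assumptions.

```python
import colorsys

import numpy as np


def color_features(rgb):
    """rgb: H x W x 3 array with values in [0, 1]; returns a 105-dim vector."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Step 1: convert every pixel from RGB to HSV (all channels in [0, 1]).
    h, s, v = np.vectorize(colorsys.rgb_to_hsv)(rgb[..., 0], rgb[..., 1], rgb[..., 2])
    hsv = np.stack([h.ravel(), s.ravel(), v.ravel()], axis=1)

    # Step 2: joint histogram with 6 x 4 x 4 = 96 bins, normalised to sum to 1.
    hist, _ = np.histogramdd(hsv, bins=(6, 4, 4), range=((0, 1), (0, 1), (0, 1)))
    hist = hist.ravel() / hsv.shape[0]

    # Step 3: first-, second- and third-order color moments per channel.
    mean = hsv.mean(axis=0)
    centered = hsv - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    third = np.cbrt((centered ** 3).mean(axis=0))

    # Step 4: 96 + 9 = 105 dimensions.
    return np.concatenate([hist, mean, std, third])


features = color_features(np.random.default_rng(0).random((32, 32, 3)))
print(features.shape)
```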
the step of the third sub-width learning system extracting the K-means features of the image data set comprises:
1) sampling an image block set from a training set, then carrying out standardization and ZCA whitening on image blocks in the image block set, and finally carrying out K-means clustering on the image block set to obtain a clustering dictionary D;
2) for a three-channel color image, performing sliding sampling with a step of 1 and an interval of 0 using a window whose size is consistent with the size of the image blocks used when solving the clustering dictionary D; after sampling, a plurality of image blocks are obtained, each denoted by x; performing feature mapping on each image block using the clustering dictionary D, wherein the mapping function $f: \mathbb{R}^d \to \mathbb{R}^k$ maps an image block to a feature vector, $\mathbb{R}$ is the set of real numbers, d is the dimension of the image block vector, and k is the number of clustering centers; the mapping method is a hard coding method, and the mapping function f(x; D) of the method is as follows:
$d_j = \lVert x - \mu^{(j)} \rVert_2$
wherein $\mu^{(j)}$ is the j-th clustering center, and k is the number of clustering centers; $d_j$ represents the distance between the image block x and the j-th clustering center; after feature mapping, each image block is converted into a k-dimensional vector; all image blocks are divided into four parts and max-pooled, and the pooled results are combined and standardized, the final result being a K-means feature with a feature dimension of 4k; the extracted K-means features are the feature nodes of the third sub-width learning system;
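The mapping and pooling described above can be sketched as follows. Hard coding is taken here in its usual sense: each image block receives a one-hot code marking its nearest clustering center, which is an assumption because the mapping formula itself is not reproduced in the text. Splitting the blocks into four spatial quadrants is likewise an assumption, and the function names `hard_code` and `quadrant_max_pool` are hypothetical. Whitening and the final standardization are omitted.

```python
import numpy as np


def hard_code(patches, centroids):
    """One-hot code per patch. patches: n x d, centroids: k x d, returns n x k."""
    # Squared Euclidean distance between every patch and every centroid.
    d2 = ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros_like(d2)
    codes[np.arange(len(patches)), d2.argmin(axis=1)] = 1.0
    return codes


def quadrant_max_pool(codes, grid_h, grid_w):
    """Max-pool row-major patch codes over four quadrants, giving a 4k vector."""
    k = codes.shape[1]
    grid = codes.reshape(grid_h, grid_w, k)
    hh, hw = grid_h // 2, grid_w // 2
    quads = [grid[:hh, :hw], grid[:hh, hw:], grid[hh:, :hw], grid[hh:, hw:]]
    return np.concatenate([q.reshape(-1, k).max(axis=0) for q in quads])


rng = np.random.default_rng(0)
patches = rng.standard_normal((36, 5))    # a 6 x 6 grid of 5-dim patches
centroids = rng.standard_normal((3, 5))   # k = 3 clustering centers
codes = hard_code(patches, centroids)
pooled = quadrant_max_pool(codes, 6, 6)
print(codes.shape, pooled.shape)
```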
the step of the fourth sub-width learning system extracting convolution features of the image data set comprises:
1) performing convolution operation on the image, and then performing pooling operation, wherein the convolution operation and the pooling operation are alternately performed for 4 times;
2) after 4 times of convolution and pooling, flattening the obtained result into a vector;
3) using a PCA method to reduce the dimension of the vector, wherein the final result after the dimension reduction is the convolution characteristic; the extracted convolution features are feature nodes of the fourth sub-width learning system.
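Only the PCA step 3) is illustrated here, because the convolution kernels and pooling sizes are not specified in this passage. The sketch assumes that the PCA hyper-parameter p denotes the fraction of variance to retain, which is a guess based on its value range of 0.91 to 0.99. The function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X, p=0.95):
    """Project the rows of X onto the fewest principal components whose
    cumulative explained variance reaches the fraction p."""
    Xc = X - X.mean(axis=0)                        # centre each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)     # cumulative variance ratio
    n = min(int(np.searchsorted(ratio, p)) + 1, len(s))
    return Xc @ Vt[:n].T, Vt[:n]


X = np.random.default_rng(0).standard_normal((50, 10))  # flattened conv outputs
reduced, components = pca_reduce(X, p=0.95)
print(reduced.shape)
```

In practice the components would be fitted on the training set and reused to project the test set.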
3. The multi-feature extraction based width learning system according to claim 1, wherein: the enhanced mapping function is a non-linear mapping function.
4. The multi-feature extraction based width learning system of claim 1, 2 or 3, wherein:
the processing algorithm of the first sub-width learning system after extracting the HOG features is as follows:
the feature nodes corresponding to the HOG are as follows:
$Z_H = [h_1, h_2, \ldots, h_N]^T \in \mathbb{R}^{N \times M}$ (1)
wherein N is the number of samples, $h_1, h_2, \ldots, h_N$ are the HOG features of the respective samples, and M is the HOG feature dimension of a single sample, that is, the HOG feature of each sample is an M-dimensional vector; the output of the corresponding enhanced nodes is:
$H_H = \phi_H(Z_H W_{EH} + \beta_H)$ (2)
wherein $W_{EH}$ is the mapping weight, $\beta_H$ is the offset, and $\phi_H$ is a non-linear activation function; the weight $W_{EH}$ and the offset $\beta_H$ are randomly generated; the corresponding sub-node output $U_H$ has the form:
$U_H = [Z_H, H_H] W_H = A_H W_H$ (3)
wherein $A_H = [Z_H, H_H]$; the objective function of the sub-width learning system corresponding to the HOG is as follows:
wherein Y is the label set and $\lambda_H$ is the ridge regression parameter; differentiating equation (4) gives:
wherein I is an identity matrix;
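A minimal sketch of one sub-width learning system, following equations (1) to (3), is given below. The closed-form ridge solution $W_H = (A_H^T A_H + \lambda_H I)^{-1} A_H^T Y$ used here is assumed by analogy with equation (10), since equations (4) and (5) are not reproduced in the text. The tanh activation, the standard normal random weights and the function name `train_sub_bls` are also assumptions, and the scaling parameter is omitted.

```python
import numpy as np


def train_sub_bls(Z, Y, n_enhance=100, lam=0.01, seed=0):
    """Z: N x M feature nodes, Y: N x C one-hot labels.
    Returns the sub-node output U, the node matrix A and the weight W."""
    rng = np.random.default_rng(seed)
    W_E = rng.standard_normal((Z.shape[1], n_enhance))   # random mapping weight
    beta = rng.standard_normal(n_enhance)                # random offset
    H = np.tanh(Z @ W_E + beta)                          # eq. (2), phi = tanh
    A = np.hstack([Z, H])                                # A = [Z, H]
    # Ridge regression: solve (A^T A + lam I) W = A^T Y.
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return A @ W, A, W                                   # eq. (3): U = A W


rng = np.random.default_rng(1)
Z = rng.standard_normal((20, 5))                 # 20 samples, 5-dim features
Y = np.eye(3)[rng.integers(0, 3, size=20)]       # one-hot labels, 3 classes
U, A, W = train_sub_bls(Z, Y, n_enhance=10, lam=0.1)
print(U.shape)
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly, which is usually the more stable choice numerically.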
the method for solving the weights and sub-node outputs of the other three sub-width learning systems is the same as that of the first sub-width learning system: when solving for the weight $W_S$ and the sub-node output $U_S$ of the second sub-width learning system, it is only necessary to replace the subscript H corresponding to the HOG feature in the formulas with the subscript S corresponding to the color feature; when solving for the weight $W_K$ and the sub-node output $U_K$ of the third sub-width learning system, it is only necessary to replace the subscript H with the subscript K corresponding to the K-means feature; and when solving for the weight $W_F$ and the sub-node output $U_F$ of the fourth sub-width learning system, it is only necessary to replace the subscript H with the subscript F corresponding to the convolution feature;
after the sub-node outputs $U_H, U_S, U_K, U_F$ of the sub-width learning systems are obtained, the processing algorithm of the width learning system based on multi-feature extraction is as follows:
$U_H, U_S, U_K, U_F$ are each normalized to obtain $U'_H, U'_S, U'_K, U'_F$ respectively, and Z is set as:
$Z = [U'_H, U'_S, U'_K, U'_F]$ (7)
the overall output Y of the width learning system based on multi-feature extraction is as follows:
$Y = [Z] W = A W$ (8)
where $A = [Z]$ and $W$ is the overall weight connecting the feature nodes and the enhancement nodes to the output; $W$ is obtained by minimizing the objective function:
wherein $\lambda$ is a ridge regression parameter; solving the above equation using the ridge regression method gives:
$W^* = (A^T A + \lambda I)^{-1} A^T Y$ (10)
wherein I is an identity matrix;
the final output of the multi-feature extraction based width learning system is then:
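As an illustration of the fusion stage in claim 4, the following sketch normalizes the four sub-node outputs, concatenates them as in equation (7), and solves equation (10) for the overall weight. The column-wise z-score normalization, the form $\hat{Y} = A W^*$ for the final output and the function name `fuse_outputs` are assumptions, since the text does not state the type of normalization or reproduce the final formula.

```python
import numpy as np


def fuse_outputs(outputs, Y, lam=0.01, eps=1e-8):
    """outputs: list of N x C sub-node outputs (U_H, U_S, U_K, U_F).
    Returns the fused prediction and the overall weight W*."""
    # Normalise each sub-node output column-wise (z-score, an assumption).
    normed = [(U - U.mean(axis=0)) / (U.std(axis=0) + eps) for U in outputs]
    A = np.hstack(normed)                                # eq. (7): A = [Z]
    # Eq. (10): W* = (A^T A + lam I)^{-1} A^T Y, solved as a linear system.
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return A @ W, W


rng = np.random.default_rng(0)
Y = np.eye(3)[rng.integers(0, 3, size=30)]               # one-hot labels
outputs = [rng.standard_normal((30, 3)) for _ in range(4)]
Y_hat, W_star = fuse_outputs(outputs, Y, lam=0.01)
predicted = Y_hat.argmax(axis=1)                         # predicted class index
print(Y_hat.shape, W_star.shape)
```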
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010181905.9A CN111401443B (en) | 2020-03-16 | 2020-03-16 | Width learning system based on multi-feature extraction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010181905.9A CN111401443B (en) | 2020-03-16 | 2020-03-16 | Width learning system based on multi-feature extraction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111401443A (en) | 2020-07-10 |
CN111401443B (en) | 2023-04-18 |
Family
ID=71432751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010181905.9A Active CN111401443B (en) | 2020-03-16 | 2020-03-16 | Width learning system based on multi-feature extraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111401443B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||