CN111401443A - Width learning system based on multi-feature extraction - Google Patents


Publication number
CN111401443A
Authority
CN
China
Prior art keywords
sub
learning system
width learning
feature
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010181905.9A
Other languages
Chinese (zh)
Other versions
CN111401443B (en)
Inventor
刘然
刘亚琼
刘宴齐
田逢春
钱君辉
郑杨婷
赵洋
陈希
崔珊珊
王斐斐
陈丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
China Academy of Chinese Medical Sciences CACMS
Original Assignee
Chongqing University
China Academy of Chinese Medical Sciences CACMS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University and China Academy of Chinese Medical Sciences (CACMS)
Priority to CN202010181905.9A
Publication of CN111401443A
Application granted
Publication of CN111401443B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a width learning system based on multi-feature extraction, which comprises four sub-width learning systems, each comprising feature nodes, enhancement nodes and a sub-node. Each sub-width learning system extracts one kind of image feature from the image data set and takes the extracted features as its feature nodes, then enhances those feature nodes through an enhancement mapping function to form the corresponding enhancement nodes. The feature nodes of each sub-width learning system are then merged with its enhancement nodes and connected to its sub-node; the outputs of the sub-nodes of all sub-width learning systems are normalized and connected to the final output layer. On complex data set classification problems, the system combines short model training time with high classification accuracy.

Description

Width learning system based on multi-feature extraction
Technical Field
The invention relates to the technical field of image classification, in particular to a width learning system based on multi-feature extraction.
Background
Image classification is a hot problem in image processing, and aims to automatically classify a large number of images. The technology is widely used in applications such as pedestrian detection, video analysis, and image quality assessment.
In recent years, image classification methods based on deep learning have received wide attention and study. Typical deep learning models are Deep Belief Networks (DBN), Deep Boltzmann Machines (DBM), and Convolutional Neural Networks (CNN). CNNs are widely used for image processing, especially image classification, because they can learn higher-level semantic features. A CNN consists of convolution layers, pooling layers and fully connected layers, and weight sharing effectively reduces the number of parameters. Better-performing image classification models, such as AlexNet, GoogLeNet, ResNet, and GPipe, were subsequently derived from CNN. Deep convolutional neural networks such as ResNet and GPipe perform well on data sets such as MNIST, SVHN, CIFAR-10, CIFAR-100 and ImageNet. However, because such networks have many hidden layers, the weights, biases and other parameters to be trained number in the millions or more, and because deep learning models are trained with gradient descent and back propagation, model training is slow and takes a long time.
To solve this problem, Chen et al. proposed the Broad Learning System (BLS), called a width learning system in this document, and proved that the model has the universal approximation property. BLS can be effectively applied to classification and regression tasks. BLS is based on the Random Vector Functional-Link Neural Network (RVFLNN) and has a flat network architecture with only one hidden layer. The weights and biases in the network are randomly assigned and are not updated during training; the network uses ridge regression to find the optimal weights. Therefore, the network can classify images quickly.
To improve the classification performance of BLS, Liu et al. introduced a K-means feature representation into the original BLS and proposed the K-means-BLS model, which extracts K-means features and feeds them into BLS in place of the original image input to improve the classification performance of BLS on CIFAR-10. In view of the local invariance of image data, Jin et al. proposed the GBLS model, which introduces manifold learning into the objective function of the model to constrain the output weights and further improves the classification capability of the model.
From the above discussion, we can see that deep learning networks can classify complex data sets accurately but suffer from long training times and repeated parameter tuning, whereas BLS and its various improved models, owing to their shallow structure, do not sufficiently learn the features of image data, so that although their training time is short, their classification performance on complex data sets is not very good.
Disclosure of Invention
In view of the above, in order to solve the existing problems described above, the present invention provides a width learning system based on multi-feature extraction, so as to solve the technical problem that the existing image classification method does not combine the advantages of short model training time and high classification accuracy in the complex data set classification problem.
The invention relates to a width learning system based on multi-feature extraction, which comprises four sub-width learning systems, wherein each sub-width learning system comprises a feature node, an enhancement node and a sub-node;
each sub-width learning system extracts an image feature from an image data set, the image features extracted by the sub-width learning systems are different from each other, the first sub-width learning system extracts the HOG feature of the image data set, the second sub-width learning system extracts the color feature of the image data set, the third sub-width learning system extracts the K-means feature of the image data set, and the fourth sub-width learning system extracts the convolution feature of the image data set; combining the image features extracted from the image data set by each sub-width learning system to obtain respective feature nodes, and enhancing the respective feature nodes by an enhanced mapping function to form corresponding enhanced nodes; after each sub-width learning system forms an enhanced node, combining the characteristic node with the corresponding enhanced node, and then connecting the feature node and the corresponding enhanced node to the sub-nodes of the sub-width learning system;
the width learning system based on multi-feature extraction further comprises a normalization layer for normalizing the output of the child nodes of each child width learning system and a final output layer connected with each normalization layer.
Further, the step of extracting the HOG features of the image data set by the first sub-width learning system of the multi-feature extraction-based width learning system includes:
1) normalizing the input image, and converting the image into a gray scale image;
2) dividing the image into a plurality of small areas, wherein the small areas are called cells, and the dividing method adopts an overlapping dividing method that the divided areas can be overlapped with each other;
3) calculating gradient values and gradient directions of pixel points in each cell to obtain a gradient direction histogram of the region;
4) in a larger area, naming the larger area as blocks, calculating a cumulative gradient direction histogram, and then normalizing all cells in the blocks;
5) merging the gradient direction histograms of all cells to obtain an HOG characteristic; the extracted HOG features are feature nodes of a first sub-width learning system;
the second sub-width learning system extracting color features of the image data set comprises:
1) converting the image from an RGB space to an HSV space, and extracting features in the HSV space;
2) calculating a histogram of the image by respectively using 6 bins, 4 bins and 4 bins according to the value ranges of three channels of the HSV space, and obtaining a 96-dimensional color histogram vector as a result;
3) respectively calculating a first-order color moment, a second-order color moment and a third-order color moment of the pixel on the three channels to finally form a 9-dimensional color moment vector;
4) merging the color histogram vector with the color moment vector, thereby forming a 105-dimensional color feature vector; the extracted color feature vector is a feature node of the second sub-width learning system;
the third sub-width learning system extracts the K-means feature of the image dataset comprising:
1) sampling an image block set from a training set, then carrying out standardization and ZCA whitening on image blocks in the image block set, and finally carrying out K-means clustering on the image block set to obtain a clustering dictionary D;
2) for a three-channel color image, performing sliding sampling with a window using a step of 1 and an interval of 0, wherein the size of the window is consistent with the size of the image blocks used when solving the clustering dictionary D; after sampling, a plurality of image blocks are obtained, each denoted by x; performing feature mapping on each image block by using the clustering dictionary D, wherein the mapping function $f: \mathbb{R}^d \to \mathbb{R}^k$ maps an image block to a feature vector, $\mathbb{R}$ is the set of real numbers, and d is the dimension of the image block vector; the mapping method is a hard coding method, and the mapping function f(x; D) of the method is as follows:
Figure BDA0002412853600000041
wherein $\mu^{(j)}$ is the jth clustering center, and k is the number of clustering centers; $d_j$ represents the distance between the image block x and the jth clustering center; after feature mapping, each image block is converted into a k-dimensional vector; all image blocks are divided into four parts and max pooling is performed on each part, the pooled results are combined and standardized, and the final result is the K-means feature with a feature dimension of 4k; the extracted K-means features are the feature nodes of the third sub-width learning system;
the step of the fourth sub-width learning system extracting convolution features of the image data set comprises:
1) performing convolution operation on the image, and then performing pooling operation, wherein the convolution operation and the pooling operation are alternately performed for 4 times;
2) after 4 times of convolution and pooling, flattening the obtained result into a vector;
3) using a PCA method to reduce the dimension of the vector, wherein the final result after the dimension reduction is the convolution characteristic; the extracted convolution features are feature nodes of the fourth sub-width learning system.
Further, the enhancement mapping function is a non-linear mapping function.
Further, after the HOG features are extracted, the processing algorithm of the first sub-width learning system is as follows:
the feature nodes corresponding to the HOG are:
$$Z_H = [h_1, h_2, \ldots, h_N]^T \in \mathbb{R}^{N \times M} \tag{1}$$
wherein N is the number of samples; $h_1, h_2, \ldots, h_N$ are the HOG features of the individual samples; M is the HOG feature dimension of a single sample, that is, the HOG feature of each sample is an M-dimensional vector; the output of the corresponding enhancement node is:
$$H_H = \phi_H(Z_H W_{EH} + \beta_H) \tag{2}$$
wherein $W_{EH}$ is the mapping weight, $\beta_H$ is the bias, and $\phi_H$ is a non-linear activation function; the weight $W_{EH}$ and the bias $\beta_H$ are randomly generated; the corresponding child node output $U_H$ has the form:
$$U_H = [Z_H, H_H] W_H = A_H W_H \tag{3}$$
wherein $A_H = [Z_H, H_H]$; the objective function of the sub-width learning system corresponding to the HOG is:
Figure BDA0002412853600000051
wherein Y is the label set and $\lambda_H$ is a ridge regression parameter; taking the derivative of equation (4) gives:
Figure BDA0002412853600000052
wherein I is an identity matrix;
Figure BDA0002412853600000053
the weights and child node outputs of the other three sub-width learning systems are solved in the same way as for the first sub-width learning system: to solve the weight $W_S$ and child node output $U_S$ of the second sub-width learning system, it is only necessary to replace the subscript H corresponding to the HOG feature in the formulas with the subscript S corresponding to the color feature; to solve the weight $W_K$ and child node output $U_K$ of the third sub-width learning system, it is only necessary to replace the subscript H with the subscript K corresponding to the K-means feature; and to solve the weight $W_F$ and child node output $U_F$ of the fourth sub-width learning system, it is only necessary to replace the subscript H with the subscript F corresponding to the convolution feature;
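The closed-form weight obtained by differentiating equation (4) can be sketched as a standard ridge regression derivation. The sketch below assumes that the objective of equation (4) is the squared Frobenius error plus a $\lambda_H$-weighted squared norm of $W_H$; that form is an assumption inferred from the ridge regression solution given for the overall system in equation (10), since the original formula is available here only as an image.

```latex
\begin{aligned}
J(W_H) &= \lVert A_H W_H - Y \rVert_F^2 + \lambda_H \lVert W_H \rVert_F^2 \\
\frac{\partial J}{\partial W_H} &= 2 A_H^T (A_H W_H - Y) + 2 \lambda_H W_H = 0 \\
(A_H^T A_H + \lambda_H I)\, W_H &= A_H^T Y \\
W_H^{*} &= (A_H^T A_H + \lambda_H I)^{-1} A_H^T Y
\end{aligned}
```

Under this assumption the matrix $A_H^T A_H + \lambda_H I$ is positive definite for any $\lambda_H > 0$, so the inverse always exists.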
after the child node outputs $U_H, U_S, U_K, U_F$ of the sub-width learning systems are obtained, the processing algorithm of the width learning system based on multi-feature extraction is as follows:
$U_H, U_S, U_K, U_F$ are each normalized to obtain $U'_H, U'_S, U'_K, U'_F$ respectively, and Z is set as:
$$Z = [U'_H, U'_S, U'_K, U'_F] \tag{7}$$
the overall output Y of the width learning system based on multi-feature extraction is as follows:
$$Y = [Z] W = A W \tag{8}$$
where A = Z, and W is the overall weight connecting the feature nodes and the enhancement nodes to the output; W is obtained by minimizing the objective function:
Figure BDA0002412853600000061
wherein λ is a ridge regression parameter; solving the above equation using the ridge regression method gives:
$$W^{*} = (A^T A + \lambda I)^{-1} A^T Y \tag{10}$$
wherein I is an identity matrix;
the final output of the multi-feature extraction based width learning system is then:
Figure BDA0002412853600000062
the invention has the beneficial effects that:
The width learning system based on multi-feature extraction of the invention replaces the random mapping method of the original width learning system (BLS for short) with a multi-feature extraction method: it extracts the K-means features, HOG features, color features and convolution features of an image, which can remarkably improve the feature learning capability of BLS. Considering that the four features represent different meanings and focus on different aspects of the image, four independent sub-BLS are constructed, and enhancement mapping is performed on each feature separately. All sub-BLS together form one large width learning system based on multi-feature extraction, MFBLS for short, which comprehensively considers the output of each sub-BLS and uses a normalization layer to improve the generalization capability of the model. Experiments on complex data sets such as SVHN, CIFAR-10 and CIFAR-100 show that (1) the classification performance of the proposed MFBLS model on complex data sets is superior to that of existing width learning models; and (2) the MFBLS model keeps the short training time of width learning models.
Drawings
Fig. 1 is a schematic structural diagram of the multi-feature extraction-based width learning system MFBLS, in which each dashed box represents a sub-BLS.
Fig. 2 is an example of the SVHN dataset.
Fig. 3 is an example of the CIFAR-10 dataset.
Fig. 4 is an example of the CIFAR-100 dataset.
Fig. 5 is a parameter sensitivity study of MFBLS on the SVHN, CIFAR-10 and CIFAR-100 datasets, showing the results on the three data sets when the parameter with subscript H takes different values.
Detailed Description
The invention is further described below with reference to the figures and examples.
The width learning system based on multi-feature extraction in this embodiment includes four sub-width learning systems, and each sub-width learning system includes a feature node, an enhanced node, and a sub-node.
Each sub-width learning system extracts an image feature from an image data set, the image features extracted by the sub-width learning systems are different from each other, the first sub-width learning system extracts the HOG feature of the image data set, the second sub-width learning system extracts the color feature of the image data set, the third sub-width learning system extracts the K-means feature of the image data set, and the fourth sub-width learning system extracts the convolution feature of the image data set; combining the image features extracted from the image data set by each sub-width learning system to obtain respective feature nodes, and enhancing the respective feature nodes by an enhanced mapping function to form corresponding enhanced nodes; after the enhancement nodes are formed, the characteristic nodes of the sub-width learning systems are merged with the corresponding enhancement nodes and then connected to the sub-nodes.
The width learning system based on multi-feature extraction further comprises a normalization layer for normalizing the output of the child nodes of each child width learning system and a final output layer connected with each normalization layer.
In the multi-feature extraction-based width learning system of the present embodiment, the step of extracting the HOG features of the image data set by the first sub-width learning system includes:
1) normalizing the input image, and converting the image into a gray scale image;
2) dividing the image into a plurality of small areas, wherein the small areas are called cells, and the dividing method adopts an overlapping dividing method that the divided areas can be overlapped with each other;
3) calculating gradient values and gradient directions of pixel points in each cell to obtain a gradient direction histogram of the region;
4) in a larger area, naming the larger area as blocks, calculating a cumulative gradient direction histogram, and then normalizing all cells in the blocks;
5) merging the gradient direction histograms of all cells to obtain an HOG characteristic; the extracted HOG features are feature nodes of a first sub-width learning system;
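The five HOG steps above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation of the embodiment: the cell size (8), the number of direction bins (9), the block size (2 × 2 cells), the square-root normalization and the L2 block norm are all assumed values, and the sketch uses the common variant with non-overlapping cells and overlapping blocks rather than overlapping cells. The function name `hog_features` is hypothetical.

```python
import numpy as np

def hog_features(image, cell=8, bins=9, block=2, eps=1e-6):
    """Return a HOG vector for an RGB image (H, W, 3) with values in [0, 1]."""
    # Step 1: convert to gray scale and normalize (square-root compression, assumed).
    gray = np.sqrt(image.mean(axis=2))
    # Step 3 (per pixel): gradient magnitude and unsigned direction in [0, 180).
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    # Steps 2-3: one magnitude-weighted direction histogram per cell.
    n_cy, n_cx = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(),
                                       minlength=bins)
    # Step 4: L2-normalize the cells inside each overlapping block.
    feats = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps))
    # Step 5: merge all histograms into one feature vector.
    return np.concatenate(feats)

rng = np.random.default_rng(0)
h = hog_features(rng.random((32, 32, 3)))
print(h.shape)  # (324,): 3 x 3 blocks, each with 2 x 2 cells of 9 bins
```

Stacking the vectors of all N samples row by row would give the N × M matrix of HOG feature nodes.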
the second sub-width learning system extracting color features of the image data set comprises:
1) converting the image from an RGB space to an HSV space, and extracting features in the HSV space;
2) calculating a histogram of the image by respectively using 6 bins, 4 bins and 4 bins according to the value ranges of three channels of the HSV space, and obtaining a 96-dimensional color histogram vector as a result;
3) respectively calculating a first-order color moment, a second-order color moment and a third-order color moment of the pixel on the three channels to finally form a 9-dimensional color moment vector;
4) merging the color histogram vector with the color moment vector, thereby forming a 105-dimensional color feature vector; the extracted color feature vector is a feature node of the second sub-width learning system;
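The dimension count in the four color steps (6 × 4 × 4 = 96 histogram bins plus 3 × 3 = 9 color moments, 105 in total) can be checked with a short sketch. The code below assumes a joint HSV histogram normalized by the pixel count, and takes the second and third moments as the standard deviation and the cube root of the third central moment; those definitions are common choices and are assumptions here, as is the function name `color_features`.

```python
import colorsys
import numpy as np

def color_features(image):
    """Return a 105-dim color vector for an RGB image (H, W, 3) in [0, 1]."""
    # Step 1: RGB -> HSV, each channel in [0, 1] (colorsys convention).
    to_hsv = np.vectorize(colorsys.rgb_to_hsv)
    h, s, v = to_hsv(image[..., 0], image[..., 1], image[..., 2])
    hsv = np.stack([h, s, v], axis=-1).reshape(-1, 3)
    # Step 2: joint histogram with 6 x 4 x 4 = 96 bins over H, S, V.
    hist, _ = np.histogramdd(hsv, bins=(6, 4, 4),
                             range=((0, 1), (0, 1), (0, 1)))
    hist = hist.ravel() / hsv.shape[0]
    # Step 3: first-, second- and third-order moments per channel (9 values).
    mean = hsv.mean(axis=0)
    centered = hsv - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    skew = np.cbrt((centered ** 3).mean(axis=0))
    # Step 4: 96 + 9 = 105 dimensions.
    return np.concatenate([hist, mean, std, skew])

rng = np.random.default_rng(0)
f = color_features(rng.random((32, 32, 3)))
print(f.shape)  # (105,)
```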
the third sub-width learning system extracts the K-means feature of the image dataset comprising:
1) sampling an image block set from a training set, then carrying out standardization and ZCA whitening on image blocks in the image block set, and finally carrying out K-means clustering on the image block set to obtain a clustering dictionary D;
2) for a three-channel color image, performing sliding sampling with a window using a step of 1 and an interval of 0, wherein the size of the window is consistent with the size of the image blocks used when solving the clustering dictionary D; after sampling, a plurality of image blocks are obtained, each denoted by x; performing feature mapping on each image block by using the clustering dictionary D, wherein the mapping function $f: \mathbb{R}^d \to \mathbb{R}^k$ maps an image block to a feature vector, $\mathbb{R}$ is the set of real numbers, d is the dimension of the image block vector, and k is the number of clustering centers; the mapping method is a hard coding method, and the mapping function f(x; D) of the method is as follows:
Figure BDA0002412853600000091
wherein $\mu^{(j)}$ is the jth clustering center, and k is the number of clustering centers; $d_j$ represents the distance between the image block x and the jth clustering center; after feature mapping, each image block is converted into a k-dimensional vector; all image blocks are divided into four parts and max pooling is performed on each part, the pooled results are combined and standardized, and the final result is the K-means feature with a feature dimension of 4k; the extracted K-means features are the feature nodes of the third sub-width learning system;
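Step 2) can be sketched as follows, assuming that hard coding means a one-hot assignment of each image block to its nearest clustering center (the exact mapping formula is available here only as an image, so this reading is an assumption). The dictionary below is random for illustration only; learning D with standardization, ZCA whitening and K-means, and the final standardization across samples, are omitted. The function name `kmeans_hard_features` and the demo sizes are hypothetical.

```python
import numpy as np

def kmeans_hard_features(image, centers, patch=6):
    """Map an image (H, W, C) to a 4k-dim vector using k clustering centers."""
    k = centers.shape[0]
    # Slide a patch x patch window with a step of 1 and no padding.
    windows = np.lib.stride_tricks.sliding_window_view(
        image, (patch, patch), axis=(0, 1))
    rows, cols = windows.shape[:2]
    x = windows.reshape(rows * cols, -1)  # one flattened image block per row
    # Hard coding (assumed): 1 for the nearest center, 0 for all others.
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros((x.shape[0], k))
    codes[np.arange(x.shape[0]), d.argmin(axis=1)] = 1.0
    codes = codes.reshape(rows, cols, k)
    # Divide the grid of blocks into four parts and max-pool each part.
    r, c = rows // 2, cols // 2
    parts = [codes[:r, :c], codes[:r, c:], codes[r:, :c], codes[r:, c:]]
    return np.concatenate([p.max(axis=(0, 1)) for p in parts])

rng = np.random.default_rng(0)
centers = rng.random((16, 6 * 6 * 3))  # k = 16 random centers, demo only
f = kmeans_hard_features(rng.random((32, 32, 3)), centers)
print(f.shape)  # (64,), i.e. 4k with k = 16
```

The centers must be flattened in the same element order as the image blocks; with a dictionary learned from blocks produced by the same windowing code this holds automatically.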
the step of the fourth sub-width learning system extracting convolution features of the image data set comprises:
1) performing convolution operation on the image, and then performing pooling operation, wherein the convolution operation and the pooling operation are alternately performed for 4 times;
2) after 4 times of convolution and pooling, flattening the obtained result into a vector;
3) using a PCA method to reduce the dimension of the vector, wherein the final result after the dimension reduction is the convolution characteristic; the extracted convolution features are feature nodes of the fourth sub-width learning system.
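The three convolution-feature steps can be sketched as four convolution and pooling stages, flattening, and PCA. Everything specific below is an assumption made for illustration: 3 × 3 kernels with random values, zero padding, 2 × 2 max pooling, four output channels per stage, and a PCA that keeps the fewest components reaching a variance ratio p. The function names are hypothetical, and the rule that derives the number of kernels from the parameter γ is not modelled.

```python
import numpy as np

def conv_pool(x, kernels):
    """3x3 zero-padded convolution followed by 2x2 max pooling. x: (H, W, C)."""
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    win = np.lib.stride_tricks.sliding_window_view(
        padded, (3, 3), axis=(0, 1))           # (H, W, C, 3, 3)
    # Cross-correlation, as is conventional for CNN "convolution" layers.
    y = np.einsum("hwcij,ocij->hwo", win, kernels)
    h, w, o = y.shape
    return y.reshape(h // 2, 2, w // 2, 2, o).max(axis=(1, 3))

def conv_features(images, kernel_sets):
    """Steps 1-2: run every conv+pool stage on each image, then flatten."""
    out = []
    for img in images:
        for kernels in kernel_sets:
            img = conv_pool(img, kernels)
        out.append(img.ravel())
    return np.stack(out)

def pca_reduce(x, p=0.99):
    """Step 3: keep the fewest principal components whose variance ratio >= p."""
    xc = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    n = int(np.searchsorted(ratio, p) + 1)
    return xc @ vt[:n].T

rng = np.random.default_rng(0)
images = rng.random((20, 32, 32, 3))
channels = [3, 4, 4, 4, 4]                      # 4 stages: 32 -> 16 -> 8 -> 4 -> 2
kernel_sets = [rng.standard_normal((channels[i + 1], channels[i], 3, 3))
               for i in range(4)]
flat = conv_features(images, kernel_sets)
reduced = pca_reduce(flat)
print(flat.shape)  # (20, 16): 2 x 2 spatial positions times 4 channels
```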
The enhancement mapping function described in this embodiment is a non-linear mapping function.
In the width learning system based on multi-feature extraction in this embodiment, after the HOG features are extracted, the processing algorithm of the first sub-width learning system is as follows:
the feature nodes corresponding to the HOG are:
$$Z_H = [h_1, h_2, \ldots, h_N]^T \in \mathbb{R}^{N \times M} \tag{1}$$
wherein N is the number of samples; $h_1, h_2, \ldots, h_N$ are the HOG features of the individual samples; M is the HOG feature dimension of a single sample, that is, the HOG feature of each sample is an M-dimensional vector; the output of the corresponding enhancement node is:
$$H_H = \phi_H(Z_H W_{EH} + \beta_H) \tag{2}$$
wherein $W_{EH}$ is the mapping weight, $\beta_H$ is the bias, and $\phi_H$ is a non-linear activation function; the weight $W_{EH}$ and the bias $\beta_H$ are randomly generated; the corresponding child node output $U_H$ has the form:
$$U_H = [Z_H, H_H] W_H = A_H W_H \tag{3}$$
wherein $A_H = [Z_H, H_H]$; the objective function of the sub-width learning system corresponding to the HOG is:
Figure BDA0002412853600000101
wherein Y is the label set and $\lambda_H$ is a ridge regression parameter; taking the derivative of equation (4) gives:
Figure BDA0002412853600000106
wherein I is an identity matrix;
Figure BDA0002412853600000102
Since the four sub-BLS have the same structure and differ only in their features, the weight W and the output U of all four sub-BLS are solved in the same way: to solve the weight $W_S$ and child node output $U_S$ of the second sub-width learning system, it is only necessary to replace the subscript H corresponding to the HOG feature in the formulas with the subscript S corresponding to the color feature; to solve the weight $W_K$ and child node output $U_K$ of the third sub-width learning system, it is only necessary to replace the subscript H with the subscript K corresponding to the K-means feature; and to solve the weight $W_F$ and child node output $U_F$ of the fourth sub-width learning system, it is only necessary to replace the subscript H with the subscript F corresponding to the convolution feature. For brevity and to avoid extensive repetition, the specific formulas for $W_S$, $W_K$, $W_F$ and $U_S$, $U_K$, $U_F$ are not listed here.
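Because all four sub-BLS are solved the same way, one function can stand in for any of them. The sketch below follows equations (1) to (3) and assumes a ridge regression closed form for the weight; the tanh activation, the uniform range of the random weights and the default values are assumptions, the scaling parameter used when generating enhancement nodes is not modelled, and the function name `train_sub_bls` is hypothetical.

```python
import numpy as np

def train_sub_bls(z, y, n_enhance=200, lam=1e-3, seed=0):
    """Fit one sub-BLS. z: (N, M) feature nodes; y: (N, C) one-hot labels."""
    rng = np.random.default_rng(seed)
    # Randomly generated mapping weight and bias; they are never updated.
    w_e = rng.uniform(-1.0, 1.0, size=(z.shape[1], n_enhance))
    beta = rng.uniform(-1.0, 1.0, size=n_enhance)
    # Equation (2): enhancement nodes H = phi(Z W_E + beta), phi = tanh (assumed).
    h = np.tanh(z @ w_e + beta)
    # Equation (3): A = [Z, H].
    a = np.hstack([z, h])
    # Assumed ridge solution W = (A^T A + lam I)^-1 A^T Y, via a linear solve.
    w = np.linalg.solve(a.T @ a + lam * np.eye(a.shape[1]), a.T @ y)
    # Child node output U = A W.
    return w_e, beta, w, a @ w

rng = np.random.default_rng(1)
z = rng.standard_normal((100, 20))              # stand-in for HOG feature nodes
y = np.eye(3)[rng.integers(0, 3, size=100)]     # one-hot labels, 3 classes
w_e, beta, w, u = train_sub_bls(z, y)
print(u.shape, w.shape)  # (100, 3) (220, 3)
```

Calling the same function with color, K-means or convolution feature nodes in place of `z` corresponds to replacing the subscript H with S, K or F.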
After the child node outputs $U_H, U_S, U_K, U_F$ of the sub-width learning systems are obtained, the processing algorithm of the width learning system based on multi-feature extraction is as follows:
$U_H, U_S, U_K, U_F$ are each normalized to obtain $U'_H, U'_S, U'_K, U'_F$ respectively, and Z is set as:
$$Z = [U'_H, U'_S, U'_K, U'_F] \tag{7}$$
the overall output Y of the width learning system based on multi-feature extraction is as follows:
$$Y = [Z] W = A W \tag{8}$$
where A = Z, and W is the overall weight connecting the feature nodes and the enhancement nodes to the output; W is obtained by minimizing the objective function:
Figure BDA0002412853600000111
wherein λ is a ridge regression parameter; solving the above equation using the ridge regression method gives:
$$W^{*} = (A^T A + \lambda I)^{-1} A^T Y \tag{10}$$
wherein I is an identity matrix;
the final output of the multi-feature extraction based width learning system is then:
Figure BDA0002412853600000112
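The fusion stage of equations (7) to (10) can be sketched as follows. The type of normalization is not specified in the text, so per-column standardization is assumed; the synthetic child node outputs and the function names `normalize` and `fuse_outputs` are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def normalize(u, eps=1e-12):
    """Standardize each output column to zero mean and unit variance (assumed)."""
    return (u - u.mean(axis=0)) / (u.std(axis=0) + eps)

def fuse_outputs(child_outputs, y, lam=1e-3):
    """child_outputs: list of (N, C) arrays U_H, U_S, U_K, U_F; y: (N, C) labels."""
    # Equation (7): Z = [U'_H, U'_S, U'_K, U'_F], and A = Z.
    a = np.hstack([normalize(u) for u in child_outputs])
    # Equation (10): W* = (A^T A + lam I)^-1 A^T Y, via a linear solve.
    w = np.linalg.solve(a.T @ a + lam * np.eye(a.shape[1]), a.T @ y)
    # Final output A W*; the predicted class is the arg-max column.
    scores = a @ w
    return w, scores.argmax(axis=1)

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=100)
y = np.eye(3)[labels]
# Four noisy stand-ins for the child node outputs of the four sub-BLS.
outputs = [y + 0.3 * rng.standard_normal(y.shape) for _ in range(4)]
w, pred = fuse_outputs(outputs, y)
print(w.shape)  # (12, 3): 4 sub-BLS times 3 classes, mapped to 3 classes
```

For unseen data, the column means and standard deviations computed on the training outputs would normally be reused rather than recomputed.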
The following is an image classification experiment performed with the MFBLS of this example on the SVHN, CIFAR-10 and CIFAR-100 data sets, with a comparison against other advanced methods.
Data set and settings
SVHN is similar to MNIST; both are digit recognition data sets. However, MNIST images are in a binary format, and the background and the digits are easily separated, whereas SVHN images are in RGB format and have more complicated backgrounds. Image classification on SVHN is therefore more challenging. The SVHN dataset consists of a training set, an extra set, and a test set. The training set has 73257 samples, the extra set has 531131 samples, and the test set has 26032 samples. The data set has 10 classes. Some examples from the SVHN dataset are shown in Fig. 2. In the experiment, 36743 samples were randomly selected from the extra set and combined with the training set to form a new training set of 110000 samples (73257 + 36743). In addition, 1000 samples were chosen from the remainder of the extra set as the validation set. The data set information used for the experiment is shown in Table 1.
The CIFAR-10 dataset consists of 60000 RGB images of size 32 × 32; the training set has 50000 images and the test set has 10000 images. The dataset has 10 classes, each of which contains 5000 training images and 1000 test images. The classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.
CIFAR-100 is similar to CIFAR-10 in that each CIFAR-100 image is also a 32 × 32 RGB image, except that the classification of CIFAR-100 is finer. The dataset has 100 classes, each with 500 training samples and 100 test samples. The 100 classes are grouped into 20 superclasses; for example, the flowers superclass contains five classes: orchid, poppy, rose, sunflower, and tulip.
TABLE 1 MFBLS experimental data set information
Figure BDA0002412853600000121
The hyper-parameters of the experiment must be set. They fall into two types: the hyper-parameters of feature extraction and the hyper-parameters of the BLS model.
The hyper-parameters of feature extraction comprise: (1) the image block size and the number of clustering centers k used when extracting the K-means feature. For SVHN the image block size is 8 × 8 and k is 500; for CIFAR-10 and CIFAR-100 the image block size is 6 × 6 and k is 1024 and 1300 respectively. The K-means feature of SVHN is extracted on a gray scale map of the input data, while those of CIFAR-10 and CIFAR-100 are extracted on 3 channels. (2) The parameter γ used to calculate the number of convolution kernels and the PCA hyper-parameter p used when extracting the convolution feature. The values of γ for SVHN, CIFAR-10 and CIFAR-100 are 0.2, 0.18 and 0.18 respectively, and the value of p is 0.99 on all data sets.
The hyper-parameters of the BLS model are: (1) the ridge regression parameters $\lambda_H$, $\lambda_S$, $\lambda_K$, $\lambda_F$ and $\lambda$; (2) the scaling parameters $S_H$, $S_S$, $S_K$, $S_F$ used when generating the enhancement nodes; (3) the numbers of enhancement nodes $E_H$, $E_S$, $E_K$, $E_F$. The settings of these parameters on the respective data sets are shown in Table 2.
TABLE 2 MFBLS experimental parameter settings
Figure BDA0002412853600000131
Results of the experiment
This example compares the proposed method with other advanced methods on three datasets: SVHN, CIFAR-10 and CIFAR-100. The comparison methods are BLS, K-means-BLS, CNNBLS, EFBLS and the convolutional DBN; the convolutional DBN is a deep model. Table 3 shows the best test accuracy achieved on the three datasets by MFBLS and the other methods. The following conclusions can be drawn from Table 3:
1) K-means-BLS gave better results than BLS and EFBLS, indicating that classification using K-means features can significantly improve model performance.
2) CNNBLS gave better results than BLS and EFBLS, indicating that using convolution and pooling operations to extract features helps improve the discriminative power of the model.
3) On the CIFAR-100 data set, the classification accuracy of MFBLS is 12.71% higher than that of K-means-BLS, which shows that using features other than the K-means feature, namely the convolution feature, the HOG feature and the color feature, can markedly improve the classification performance of the model.
4) The proposed MFBLS achieves the highest classification accuracy on SVHN, CIFAR-10 and CIFAR-100. First, the classification performance of MFBLS is superior to that of the other width learning models for image classification. Second, on CIFAR-10 it reaches a classification accuracy of 81.03%, which is 2.14% higher than that of the convolutional DBN, showing that MFBLS also outperforms the convolutional DBN model. Finally, across the 6 comparison experiments, MFBLS outperforms the other comparison methods on all three data sets, which demonstrates the effectiveness of the method.
Additionally, MFBLS exceeds the results of the baseline convolutional DBN model without requiring pre-training.
TABLE 3 Accuracy (%) of MFBLS and other current methods on the SVHN, CIFAR-10 and CIFAR-100 test sets
Figure BDA0002412853600000132
Figure BDA0002412853600000141
Sensitivity of parameters
A parameter sensitivity analysis shows that the proposed MFBLS framework achieves its best results over a wide range of parameter values, which also verifies the robustness of MFBLS.
The value ranges of the hyper-parameters in the experiment are as follows:
1) The ridge regression parameters λ_H, λ_S, λ_K, λ_F and λ are selected from the set {0.001,0.005,0.01,0.05,0.1,0.5,1,5,10}.
2) The scaling parameters S_H, S_S, S_K, S_F are selected from the set {0.6,0.65,0.7,0.75,0.8,0.85,0.9,0.95}.
3) The numbers of enhancement nodes E_H, E_S, E_K, E_F are selected from the set {500,1000,4000,7000,8000,9000,10000}.
4) The number of centroids k used when the K-means features are extracted is selected from the set {500,700,900,1100,1300,1500,1700}.
5) The hyperparameter p of the PCA is selected from the set {0.91,0.93,0.95,0.97,0.99 }.
6) The parameter γ used to calculate the number of convolution kernels is selected from the set {0.05,0.1,0.15,0.2,0.25 }.
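The candidate sets listed above can be turned into a simple sweep. The sketch below assumes a one-at-a-time protocol (one hyper-parameter varies while the others stay at a baseline), which matches the per-parameter sub-graphs described for FIG. 5 but is not spelled out in the text; the dictionary keys and the function `one_at_a_time` are illustrative names, and any baseline values passed in are the caller's choice.

```python
# Candidate values copied from the list above; the keys are illustrative names.
GRID = {
    "ridge_lambda": [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
    "scaling_s": [0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95],
    "num_enhancement_nodes": [500, 1000, 4000, 7000, 8000, 9000, 10000],
    "kmeans_k": [500, 700, 900, 1100, 1300, 1500, 1700],
    "pca_p": [0.91, 0.93, 0.95, 0.97, 0.99],
    "gamma": [0.05, 0.1, 0.15, 0.2, 0.25],
}


def one_at_a_time(baseline: dict, grid: dict):
    """Yield (name, value, config): vary one hyper-parameter, keep the rest at baseline."""
    for name, values in grid.items():
        for value in values:
            config = dict(baseline)  # copy so the baseline is never mutated
            config[name] = value
            yield name, value, config
```

Each yielded configuration would be trained and evaluated once, and the test accuracy plotted against the varied value.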
The results obtained on the SVHN, CIFAR-10 and CIFAR-100 datasets using different hyper-parameter settings are shown in FIG. 5. FIG. 5 consists of 16 sub-graphs, each of which describes the effect of one of the above 16 hyper-parameters on the results for each data set. The x-axis of every sub-graph represents the parameter value and the y-axis represents the test accuracy. For example, FIG. 5(a) shows the test accuracy on the three data sets for different values of λ_H. By analyzing the results of these experiments, the following conclusions can be drawn:
1) As can be seen in FIG. 5, when a hyper-parameter value changes, the blue line (SVHN) is the flattest, the orange line (CIFAR-10) is second, and the green line (CIFAR-100) is the most tortuous. This indicates that, overall, the sensitivity of the data sets to the parameters increases in the order SVHN, CIFAR-10, CIFAR-100, and that MFBLS is more robust on simpler data sets.
2) From subgraph (a) to subgraph (d), changes in λ_K have the greatest influence on the result, λ_H has a small effect on the result, and λ_S and λ_F have little effect on the result. As the value of λ_K increases, the classification performance on each data set gradually decreases.
3) From subgraph (e) to subgraph (h), the accuracy on the three data sets hardly changes as the values of the scaling parameters S_H, S_S, S_K, S_F increase.
4) As can be seen from subgraph (i) to subgraph (l), the values of the enhancement-node counts E_H, E_S and E_K have a slight influence on the result, while E_F has little effect on the result.
5) As can be seen from subgraphs (m) and (n), when the number k of clustering centroids and the ridge regression parameter λ change, the blue line (SVHN) and the orange line (CIFAR-10) remain relatively flat, while the green line (CIFAR-100) is very jagged. This indicates that the parameters k and λ have a large effect on the results for CIFAR-100, but not for SVHN and CIFAR-10.
6) As can be seen from subgraphs (o) and (p), changes in the PCA hyper-parameter p and in the parameter γ for calculating the number of convolution kernels have little effect on the result.
7) FIG. 5 and the above conclusions indicate that, for most hyper-parameter variations, the accuracy on the three data sets stays close to a fixed value, which indicates that the MFBLS model in this embodiment is very robust.
Time complexity
The experiment compared the running time of MFBLS and almost all of the other models on the three datasets, as shown in Table 4. The running environment is an Intel Xeon E5-2678 CPU and one NVIDIA TITAN Xp. As seen in Table 4:
1) The running time of MFBLS is longer than that of BLS, CNNBLS and EFBLS. The reason may be that MFBLS spends time on feature extraction; for example, on CIFAR-10, feature extraction for MFBLS takes about 900 s.
2) MFBLS and K-means-BLS have approximately the same running time on CIFAR-10 and CIFAR-100, while MFBLS takes longer on the SVHN dataset.
3) The running time of the convolutional DBN on CIFAR-10 is 36 h (on an NVIDIA GTX 280), far longer than that of MFBLS. MFBLS runs in less time and is more accurate than the convolutional DBN, so MFBLS can greatly reduce running time while maintaining classification performance.
4) Compared with the BLS-related models, MFBLS has a longer running time, but it is within an acceptable range, and the classification performance of MFBLS is higher.
TABLE 4 Running time of MFBLS and other recent methods on SVHN, CIFAR-10 and CIFAR-100
Figure BDA0002412853600000161
Finally, although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (4)

1. A width learning system based on multi-feature extraction, characterized in that: the system comprises four sub-width learning systems, wherein each sub-width learning system comprises a feature node, an enhanced node and a child node;
each sub-width learning system extracts one image feature from an image data set, and the image features extracted by the sub-width learning systems differ from one another: the first sub-width learning system extracts the HOG feature of the image data set, the second sub-width learning system extracts the color feature of the image data set, the third sub-width learning system extracts the K-means feature of the image data set, and the fourth sub-width learning system extracts the convolution feature of the image data set; the image features extracted from the image data set by each sub-width learning system are combined to obtain its feature nodes, and the feature nodes are enhanced by an enhanced mapping function to form the corresponding enhanced nodes; after each sub-width learning system forms its enhanced nodes, the feature nodes are combined with the corresponding enhanced nodes and then connected to the child node of that sub-width learning system;
the width learning system based on multi-feature extraction further comprises a normalization layer for normalizing the output of the child node of each sub-width learning system and a final output layer connected with each normalization layer.
2. The multi-feature extraction based width learning system according to claim 1, wherein:
the step of the first sub-width learning system extracting the HOG features of the image data set comprises:
1) normalizing the input image, and converting the image into a gray scale image;
2) dividing the image into a plurality of small areas, called cells, using an overlapping division in which the divided areas may overlap one another;
3) calculating gradient values and gradient directions of pixel points in each cell to obtain a gradient direction histogram of the region;
4) calculating a cumulative gradient direction histogram over larger areas, called blocks, and then normalizing all cells within each block;
5) merging the gradient direction histograms of all cells to obtain the HOG feature; the extracted HOG features are the feature nodes of the first sub-width learning system;
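The five HOG steps can be sketched in NumPy as follows. This is a simplified illustration, not the patented implementation: the cell size, stride, number of orientation bins, block size, the channel-mean grayscale conversion and the L2 block normalization are all assumed values and choices that the text does not specify.

```python
import numpy as np


def hog_features(image, cell=8, stride=4, bins=9, block=2, eps=1e-6):
    """Simplified HOG; cell, stride, bins and block are illustrative values."""
    img = np.asarray(image, dtype=np.float64)
    # Step 1: convert to grayscale (plain channel mean, an assumption) and normalize.
    gray = img.mean(axis=2) if img.ndim == 3 else img
    gray = gray / (gray.max() + eps)
    # Step 3 (per pixel): gradient magnitude and unsigned orientation in [0, pi).
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    # Step 2: cells taken with stride < cell overlap one another.
    rows = range(0, gray.shape[0] - cell + 1, stride)
    cols = range(0, gray.shape[1] - cell + 1, stride)
    hist = np.zeros((len(rows), len(cols), bins))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            b = bin_idx[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            # Magnitude-weighted orientation histogram of one cell.
            hist[i, j] = np.bincount(b, weights=m, minlength=bins)
    # Step 4: L2-normalize the histograms inside each block of block x block cells.
    out = []
    for i in range(hist.shape[0] - block + 1):
        for j in range(hist.shape[1] - block + 1):
            v = hist[i:i + block, j:j + block].ravel()
            out.append(v / np.sqrt(np.sum(v ** 2) + eps))
    # Step 5: concatenate everything into one HOG vector.
    return np.concatenate(out)
```

With these assumed settings a 32 × 32 image gives 7 × 7 cells and 6 × 6 blocks, so the resulting vector has 6 × 6 × 4 × 9 = 1296 entries.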
the second sub-width learning system extracting color features of the image data set comprises:
1) converting the image from an RGB space to an HSV space, and extracting features in the HSV space;
2) calculating a histogram of the image by respectively using 6 bins, 4 bins and 4 bins according to the value ranges of three channels of the HSV space, and obtaining a 96-dimensional color histogram vector as a result;
3) respectively calculating a first-order color moment, a second-order color moment and a third-order color moment of the pixel on the three channels to finally form a 9-dimensional color moment vector;
4) merging the color histogram vector with the color moment vector, thereby forming a 105-dimensional color feature vector; the extracted color feature vectors are the feature nodes of the second sub-width learning system;
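The dimension count of the color feature (6 × 4 × 4 = 96 histogram bins plus 3 × 3 = 9 color moments, 105 in total) can be checked with a short sketch. It assumes a joint three-dimensional HSV histogram, HSV channels scaled to [0, 1], and the usual definitions of the color moments (mean, standard deviation, cube root of the third central moment); the function name `color_features` is illustrative.

```python
import colorsys

import numpy as np


def color_features(rgb):
    """rgb: H x W x 3 array with values in [0, 1]. Returns a 105-dimensional vector."""
    pixels = np.asarray(rgb, dtype=np.float64).reshape(-1, 3)
    # Step 1: RGB -> HSV pixel by pixel (all three HSV channels lie in [0, 1]).
    hsv = np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels])
    # Step 2: joint histogram with 6 x 4 x 4 = 96 bins.
    hist, _ = np.histogramdd(hsv, bins=(6, 4, 4), range=[(0, 1)] * 3)
    # Step 3: first-, second- and third-order color moments per channel (9 values).
    mean = hsv.mean(axis=0)
    centered = hsv - mean
    std = np.sqrt((centered ** 2).mean(axis=0))
    skew = np.cbrt((centered ** 3).mean(axis=0))
    # Step 4: 96 + 9 = 105 dimensions.
    return np.concatenate([hist.ravel(), mean, std, skew])
```

The per-pixel conversion is slow but adequate for 32 × 32 images; a vectorized conversion would be preferable for larger inputs.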
the step of the third sub-width learning system extracting the K-means feature of the image data set comprises:
1) sampling an image block set from the training set, then standardizing and ZCA-whitening the image blocks in the set, and finally performing K-means clustering on the image block set to obtain a clustering dictionary D;
2) for a three-channel color image, performing sliding sampling with a window at a step of 1 and an interval of 0, the window size being the same as the image block size used when solving the clustering dictionary D; sampling yields a plurality of image blocks, each denoted x; performing feature mapping on each image block using the clustering dictionary D, where the mapping function f: R^d → R^k (R being the set of real numbers) maps an image block to a feature vector, d is the dimension of the image block vector, and k is the number of clustering centers; the mapping method is the hard coding method, whose mapping function f(x; D) is as follows:
Figure FDA0002412853590000021
d_j = ||x − μ^(j)||_2
Figure FDA0002412853590000022
wherein μ^(j) is the j-th clustering center and k is the number of clustering centers; d_j represents the distance between the image block x and the j-th clustering center; after feature mapping, each image block is converted into a k-dimensional vector; all image blocks are divided into four parts and max-pooled, and the pooled results are combined and standardized; the final result is the K-means feature with a feature dimension of 4k; the extracted K-means features are the feature nodes of the third sub-width learning system;
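The mapping and pooling step can be sketched as below. Because the mapping formulas survive only as image placeholders, the sketch assumes the common form of hard coding, in which the entry of the nearest clustering center is 1 and all others are 0. It also assumes that the image blocks of one image arrive in row-major order on a regular grid and that the four parts are the four quadrants of that grid. The function name and arguments are illustrative, and the final standardization is omitted.

```python
import numpy as np


def kmeans_hard_features(patches, grid_shape, centroids):
    """
    patches:    (rows * cols, d) preprocessed image blocks of one image, row-major
    grid_shape: (rows, cols) number of block positions along each axis
    centroids:  (k, d) clustering dictionary D
    Returns a 4k-dimensional vector (before the final standardization).
    """
    rows, cols = grid_shape
    # d_j = ||x - mu^(j)||_2 for every block / centroid pair.
    dist = np.linalg.norm(patches[:, None, :] - centroids[None, :, :], axis=2)
    # Hard coding (assumed form): 1 for the nearest centroid, 0 elsewhere.
    codes = np.zeros_like(dist)
    codes[np.arange(len(patches)), dist.argmin(axis=1)] = 1.0
    codes = codes.reshape(rows, cols, -1)
    # Split the block grid into four quadrants and max-pool each one.
    r, c = rows // 2, cols // 2
    quadrants = [codes[:r, :c], codes[:r, c:], codes[r:, :c], codes[r:, c:]]
    return np.concatenate([q.max(axis=(0, 1)) for q in quadrants])
```

The broadcasted distance computation keeps the sketch short but needs memory proportional to (number of blocks) × k × d, so a chunked or matrix-product formulation would be needed at the dictionary sizes quoted in the description.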
the step of the fourth sub-width learning system extracting convolution features of the image data set comprises:
1) performing convolution operation on the image, and then performing pooling operation, wherein the convolution operation and the pooling operation are alternately performed for 4 times;
2) after 4 times of convolution and pooling, flattening the obtained result into a vector;
3) using a PCA method to reduce the dimension of the vector, wherein the final result after the dimension reduction is the convolution characteristic; the extracted convolution features are feature nodes of the fourth sub-width learning system.
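The PCA step can be sketched as follows, assuming that the hyper-parameter p denotes the fraction of variance to retain (the claim does not define p, so this interpretation is an assumption). The function name `pca_reduce` is illustrative.

```python
import numpy as np


def pca_reduce(X, p=0.99):
    """Project X (n_samples x n_features) onto the fewest principal components
    whose cumulative explained-variance ratio reaches p (assumed meaning of p)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are the principal directions; s holds the singular values.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index whose cumulative ratio is >= p, converted to a component count.
    n_comp = min(int(np.searchsorted(ratio, p)) + 1, len(s))
    components = vt[:n_comp]
    return Xc @ components.T, mean, components
```

The returned `mean` and `components` would be reused to project test samples, so that training and test features share the same basis.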
3. The multi-feature extraction based width learning system according to claim 1, wherein: the enhanced mapping function is a non-linear mapping function.
4. The multi-feature extraction based width learning system of claim 1, 2 or 3, wherein:
the processing algorithm of the first sub-width learning system after the HOG features are extracted is as follows:
the feature nodes corresponding to the HOG are:
Z_H = [h_1, h_2, ..., h_N]^T ∈ R^{N×M}    (1)
wherein N is the number of samples, and h_1, h_2, ..., h_N are the HOG features of the respective samples; M is the HOG feature dimension of a single sample, that is, the HOG feature of each sample is an M-dimensional vector; the output of the corresponding enhanced node is:
H_H = φ_H(Z_H W_{EH} + β_H)    (2)
wherein W_{EH} is the mapping weight, β_H is the bias, and φ_H is a non-linear activation function; the weight W_{EH} and the bias β_H are randomly generated; the corresponding child node output U_H has the form:
U_H = [Z_H, H_H] W_H = A_H W_H    (3)
wherein A_H = [Z_H, H_H]; the objective function of the sub-width learning system corresponding to the HOG is:
Figure FDA0002412853590000031
wherein Y is the label set and λ_H is the ridge regression parameter; taking the derivative of equation (4) gives:
Figure FDA0002412853590000032
wherein I is the identity matrix;
Figure FDA0002412853590000033
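Equations (4) and (5) survive only as image placeholders. A plausible reconstruction, assuming the standard ridge-regression objective and chosen to agree with equation (3) and with the closed form given later in equation (10), is sketched below; it should not be read as the exact wording of the granted claim.

```latex
\begin{aligned}
J(W_H) &= \lVert A_H W_H - Y \rVert_2^2 + \lambda_H \lVert W_H \rVert_2^2, \\
\frac{\partial J}{\partial W_H} &= 2 A_H^{T}\,(A_H W_H - Y) + 2 \lambda_H W_H = 0, \\
W_H^{*} &= \left(A_H^{T} A_H + \lambda_H I\right)^{-1} A_H^{T}\, Y, \\
U_H &= A_H W_H^{*}.
\end{aligned}
```

Setting the gradient to zero gives the normal equations (A_H^T A_H + λ_H I) W_H = A_H^T Y, which have a unique solution for any λ_H > 0 because the matrix on the left is positive definite.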
the weights and child node outputs of the other three sub-width learning systems are solved in the same way as for the first sub-width learning system: to solve the weight (Figure FDA0002412853590000041) and the child node output U_S of the second sub-width learning system, only the subscript H corresponding to the HOG feature in the formulas needs to be replaced with the subscript S corresponding to the color feature; to solve the weight (Figure FDA0002412853590000042) and the child node output U_K of the third sub-width learning system, only the subscript H corresponding to the HOG feature in the formulas needs to be replaced with the subscript K corresponding to the K-means feature; to solve the weight (Figure FDA0002412853590000043) and the child node output U_F of the fourth sub-width learning system, only the subscript H corresponding to the HOG feature in the formulas needs to be replaced with the subscript F corresponding to the convolution feature;
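Since the four sub-width learning systems differ only in their input features, one routine can serve all of them. The sketch below follows equations (1) to (3) and assumes tanh as the non-linear activation, standard normal random weights, and the usual ridge-regression closed form; the scaling parameter S is omitted, and the function name `train_sub_bls` is illustrative.

```python
import numpy as np


def train_sub_bls(Z, Y, n_enh, lam, rng):
    """Z: (N, M) feature nodes; Y: (N, C) one-hot labels; lam: ridge parameter."""
    W_E = rng.standard_normal((Z.shape[1], n_enh))  # random mapping weight
    beta = rng.standard_normal(n_enh)               # random bias
    H = np.tanh(Z @ W_E + beta)                     # enhanced nodes, eq. (2)
    A = np.hstack([Z, H])                           # A = [Z, H]
    # Ridge solution: solve (A^T A + lam I) W = A^T Y rather than forming the inverse.
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return A @ W, (W_E, beta, W)                    # child node output U, eq. (3)
```

Calling this routine once per feature type (HOG, color, K-means, convolution) would yield U_H, U_S, U_K and U_F.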
after the width learning system based on multi-feature extraction obtains the child node outputs U_H, U_S, U_K, U_F of the respective sub-width learning systems, the processing algorithm is as follows:
U_H, U_S, U_K, U_F are each normalized to obtain U'_H, U'_S, U'_K, U'_F respectively, and Z is set as:
Z = [U'_H, U'_S, U'_K, U'_F]    (7)
the overall output Y of the width learning system based on multi-feature extraction is:
Y = [Z]W = AW    (8)
wherein A = [Z], and W is the overall weight connecting the feature nodes and the enhanced nodes to the output; W is obtained by minimizing the objective function:
Figure FDA0002412853590000044
wherein λ is the ridge regression parameter; solving the above equation by the ridge regression method gives:
W^* = (A^T A + λI)^{-1} A^T Y    (10)
wherein I is the identity matrix;
the final output of the width learning system based on multi-feature extraction is then:
Figure FDA0002412853590000045
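The fusion stage of equations (7) to (10) can be sketched as follows. The normalization is assumed here to be a per-column z-score, because the claim does not state which normalization the normalization layer applies; the function name `fuse_outputs` is illustrative.

```python
import numpy as np


def fuse_outputs(U_list, Y, lam):
    """U_list: child node outputs U_H, U_S, U_K, U_F, each (N, C); Y: (N, C) labels."""
    normed = []
    for U in U_list:
        # Normalization layer (assumed here: per-column z-score).
        normed.append((U - U.mean(axis=0)) / (U.std(axis=0) + 1e-12))
    A = np.hstack(normed)  # Z = [U'_H, U'_S, U'_K, U'_F], eq. (7)
    # W* = (A^T A + lam I)^{-1} A^T Y, eq. (10), computed with a linear solve.
    W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)
    return A @ W, W        # overall output in the sense of eq. (8), and the weight
```

For classification, the predicted class of each sample would be the index of the largest entry in the corresponding row of the returned output.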
CN202010181905.9A 2020-03-16 2020-03-16 Width learning system based on multi-feature extraction Active CN111401443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010181905.9A CN111401443B (en) 2020-03-16 2020-03-16 Width learning system based on multi-feature extraction

Publications (2)

Publication Number Publication Date
CN111401443A true CN111401443A (en) 2020-07-10
CN111401443B CN111401443B (en) 2023-04-18

Family

ID=71432751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010181905.9A Active CN111401443B (en) 2020-03-16 2020-03-16 Width learning system based on multi-feature extraction

Country Status (1)

Country Link
CN (1) CN111401443B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant