CN111598187A - Progressive integrated classification method based on kernel width learning system - Google Patents
Progressive integrated classification method based on kernel width learning system
- Publication number
- CN111598187A CN111598187A CN202010579123.0A CN202010579123A CN111598187A CN 111598187 A CN111598187 A CN 111598187A CN 202010579123 A CN202010579123 A CN 202010579123A CN 111598187 A CN111598187 A CN 111598187A
- Authority
- CN
- China
- Prior art keywords
- training
- learning system
- layer
- width learning
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention discloses a progressive ensemble classification method based on a kernel width learning system, comprising the following steps: 1) input training samples and test samples; 2) train a kernel width learning system as a base classifier on the original training data; 3) compute the prediction residual from the training result of the current base classifier and use it as the label for training the next base classifier; 4) when the rate of decrease of the training loss reaches a threshold, stop training and add no further base classifiers; 5) classify the test samples to obtain the final prediction result. The method builds on the width learning system, so no tedious back-propagation is required; kernel mapping improves the nonlinear fitting capacity of the classifier; and fusing multiple base classifiers by ensemble means yields a marked improvement on noisy bioinformatics data sets, raising the accuracy of biological gene classification.
Description
Technical Field
The invention relates to the technical field of biological gene data analysis, in particular to a progressive integrated classification method based on a kernel width learning system.
Background
With the popularization and rapid development of the mobile internet, intelligent medical treatment has emerged. In intelligent medicine, including the research and analysis of biological genes, machine learning can mine potential patterns and features from biological information. Data mining of biological information is difficult: biological gene data contain many genes with non-expressed traits as well as noise caused by imperfect acquisition technology, and some samples do not in fact match the genetic-disease association, which degrades prediction and classification. A more robust, noise-resistant machine learning model is therefore needed.
Although deep learning algorithms can extract deep features, they depend heavily on data set size: they often overfit when the data are not large enough, and even with sufficient data they require long training times. Finding a more efficient, noise-resistant model is a hot issue in the field of biological information.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a progressive ensemble classification method based on a kernel width learning system. It effectively avoids over-complicated feature engineering, solves the output weights by pseudo-inverse calculation instead of back-propagation to improve training efficiency, and combines with ensemble learning so that the classification performance of the kernel width learning system improves while local optima and overfitting are avoided; the method is well resistant to the noise present in biological information.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a progressive integration classification method based on a kernel width learning system comprises the following steps:
1) inputting a training sample and a test sample, and preprocessing data in the samples;
2) training a kernel width learning system (KBLS) as a first base classifier;
3) the residual is calculated from the results of the first base classifier as follows:
3.1) defining an integrator and a loss function thereof, and calculating the gradient of the loss function of the integrator;
3.2) solving the sub-gradient of the integrator loss function;
3.3) combining the results obtained in the step 3.1) and the step 3.2), and calculating to obtain an integral residual error;
3.4) taking the obtained residual error as a label for training the next base classifier;
4) calculating the loss function value, and when its rate of decrease relative to the previous iteration falls below a threshold, determining convergence and stopping the iteration;
5) and classifying the test samples by using an integrated classifier to obtain a final prediction result.
Further, in step 1), data cleaning is performed on the biological gene data: data with missing values are filled with the median or removed, attributes containing a large number of missing values are filtered out, data types are unified, and the data labels are converted into one-hot codes to facilitate subsequent computation.
Furthermore, the kernel width learning system is a two-layer neural network: the first layer consists of a feature-node layer and an enhancement-node layer, and the second layer is the output layer; the feature nodes are fully connected to the enhancement nodes, then the two are concatenated and fully connected to the output layer.
Further, in step 2), the training samples {(x_i, y_i)}, i = 1, ..., N, are used to train the KBLS as the first base classifier, where N is the number of samples. The specific process is as follows:
2.1) generating n sets of random weights W_i and bias terms β_i, and generating feature nodes by passing the features of the training samples through the n mappings:
Z_i = [X W_i + β_i], i = 1, 2, ..., n,
where Z_i represents the i-th group of feature nodes and X represents the input training samples. All feature-node groups are connected into one layer, forming the feature layer:
Z = [Z_1, Z_2, ..., Z_n];
2.2) calculating an inner-product matrix among the training samples through a radial basis function (RBF) kernel, mapping the feature layer into the enhancement-node layer:
Ω_{k,m} = exp(-||x_k - x_m||² / (2σ²)),
where x_k and x_m represent the feature vectors of the k-th and m-th samples, and the RBF parameter σ determines the shape of the Gaussian distribution. The calculated N × N kernel distances (where N equals the number of training samples) form the kernel matrix Ω, which also serves as the enhancement-node layer;
2.3) calculating the connection weights from the feature layer to the enhancement layer by means of the pseudo-inverse:
W_fe = (Z^T Z + C·I)^{-1} Z^T Ω,
where C represents the regular-term coefficient;
2.4) concatenating the feature layer and the enhancement layer into one hidden layer:
A=[Z,Ω],
2.5) the output-layer weights are likewise obtained by the pseudo-inverse method:
W = (A^T A + C·I)^{-1} A^T Y,
where Y represents the label matrix of the training samples.
Further, in step 3.1), the integrator is defined as:
F_B(x) = Σ_{b=1}^{B} f_b(x),
where B represents the number of base classifiers in the integrator, x represents the current training sample, F_B(x) represents the combined output of the first B base classifiers, and f_b(x) represents the output of the b-th base classifier. The loss function is:
L(y, F_B(x)) = -Σ_{k=1}^{K} y_k log p_k(x),
where y represents the label of x, K represents the number of classes, y_k represents the value (0 or 1) of the k-th dimension of y, and p_k(x) represents the predicted probability of class k, computed from F_B(x) by the softmax function. The gradient of the loss function of the (B-1)-th integrator is:
g_{B-1} = -y_t + y_t · p_{B-1,t},
where t denotes the true class of the sample, y_t = 1 is the label value of the t-th dimension (the true class), and p_{B-1,t} represents the predicted probability of the (B-1)-th ensemble classifier for the true class;
further, in step 3.2), the sub-gradient of the integrator loss function is calculated, and the calculation formula is:
hB-1=pB-1,t-pB-1,t 2。
further, in step 3.3), calculating the integrator training residual of the current iteration round by using the obtained gradient g and the obtained sub-gradient h:
further, in step 3.4), the obtained residual is used as a target for training the next base classifierSign, then the new training sample becomesWherein r isi,bRepresenting the predictor residual of the b-th base classifier on the i-th sample.
Further, in step 4), the difference between the loss function values of the current iteration and the previous one is calculated:
ΔL = L(y, F_{B-1}(x)) - L(y, F_B(x)),
and when the rate of decrease ΔL / L(y, F_{B-1}(x)) falls below the threshold ε, the iteration is deemed converged and training stops.
Further, in step 5), the test samples are tested with the ensemble classifier: the confidences (the per-class probability values in the prediction results) of the initial base classifier and of each base classifier trained on a round of residuals are summed, yielding the probability of each class for every test sample.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention effectively solves the problem of long training processes in the prior art, reduces the cost of manually designed features, resists the noise present in biological information, and improves the classification performance of biological gene data analysis.
2. The hidden-layer nodes of the kernel width learning system comprise linear combination features of the biological information and kernel-distance features between samples. The linear combination features amount to random sampling and weighting of the original features, giving each single classifier high randomness, so the members of the integrator are diverse and high-variance; the kernel-distance features use sample similarity as the measure, which is more principled than the random mapping of the original width learning system, reduces redundancy, and improves the generalization ability of the kernel width learning system. Several different kernel width learning systems are integrated progressively: each new base classifier is trained in the direction of the gradient and sub-gradient of the integrator's loss function, so the ensemble has lower bias than any single member, and the size of the integrator is determined automatically by the convergence speed; meanwhile, letting several base classifiers decide jointly reduces the overall variance, avoids local optima, and improves the generalization ability of the integrator. Owing to these characteristics, the invention achieves a good classification effect on noisy biological gene data sets.
Drawings
FIG. 1 is a flowchart illustrating a progressive ensemble classification method based on a kernel width learning system according to this embodiment.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1, the progressive ensemble classification method based on the kernel width learning system provided in this embodiment includes the following steps:
1) inputting training samples and test samples, where each sample contains desensitized gene features and a corresponding label (one of several traits), and preprocessing the biological gene data in the samples as follows:
1.1) performing data cleaning on the biological gene data, including filling missing values with the median or deleting them, unifying data types, and normalizing the data;
1.2) converting the class label of each sample into a one-hot code to facilitate subsequent computation. One-hot coding, also called one-bit-effective coding, uses an N-bit status register to encode N states; each state has its own independent register bit, and only one bit is active at any time. For example, with the label set {trait 1, trait 2, trait 3}, a sample of trait 1 is encoded as {1, 0, 0};
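As a concrete illustration of step 1.2), a minimal one-hot encoder can be sketched as follows; the label values and class count are illustrative, not taken from the patent:

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-bit-effective coding: one register bit per class, exactly one active."""
    labels = np.asarray(labels)
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# With the label set {trait 1, trait 2, trait 3}, a sample of trait 1 (index 0)
# is encoded as {1, 0, 0}, as in the example above.
Y = one_hot([0, 2, 1], num_classes=3)
```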
2) training a kernel width learning system (KBLS) as the first base classifier. The KBLS is a two-layer neural network: the first layer consists of the feature nodes and the enhancement nodes, where the feature nodes are fully connected to the enhancement nodes and the two are then concatenated and fully connected to the output layer, which forms the second layer. The training samples are {(x_i, y_i)}, i = 1, ..., N, where x_i represents the i-th sample, y_i the corresponding label, and N the number of training samples. The specific training process is as follows:
2.1) generating n sets of random weights W_i and bias terms β_i by means of random numbers, and generating feature nodes by passing the features of the training samples through the n mappings:
Z_i = [X W_i + β_i], i = 1, 2, ..., n,
where Z_i represents the i-th group of feature nodes and X represents the input training samples. All feature-node groups are connected into one layer:
Z = [Z_1, Z_2, ..., Z_n];
2.2) calculating an inner-product matrix among the samples through a radial basis function (RBF) kernel, mapping the feature layer into the enhancement nodes:
Ω_{k,m} = exp(-||x_k - x_m||² / (2σ²)),
where x_k and x_m represent the feature vectors of the k-th and m-th samples, and the RBF parameter σ determines the shape of the Gaussian distribution; the calculated N × N kernel distances (where N equals the number of training samples) form the kernel matrix Ω;
2.3) calculating the connection weights from the feature layer to the enhancement layer by means of the pseudo-inverse:
W_fe = (Z^T Z + C·I)^{-1} Z^T Ω,
where C represents the regular-term coefficient;
2.4) concatenating the feature layer and the enhancement layer into one hidden layer:
A=[Z,Ω],
2.5) the output-layer weights are likewise obtained by the pseudo-inverse method:
W = (A^T A + C·I)^{-1} A^T Y,
where Y represents the label matrix of the training samples;
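Steps 2.1) to 2.5) can be sketched with NumPy as follows. The Gaussian RBF kernel, the ridge-style pseudo-inverse (A^T A + C·I)^{-1} A^T Y, and the hyperparameters (number of groups n, node width, σ, C) are illustrative assumptions; the patent does not fix their values.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, sigma=1.0):
    # Omega[k, m] = exp(-||a_k - b_m||^2 / (2 * sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def ridge_pinv(A, Y, C=1e-2):
    # Regularized pseudo-inverse solution: (A^T A + C I)^{-1} A^T Y
    return np.linalg.solve(A.T @ A + C * np.eye(A.shape[1]), A.T @ Y)

def train_kbls(X, Y, n_groups=3, width=8, sigma=1.0, C=1e-2):
    # 2.1) n groups of random feature nodes Z_i = [X W_i + beta_i]
    W = [rng.standard_normal((X.shape[1], width)) for _ in range(n_groups)]
    beta = [rng.standard_normal(width) for _ in range(n_groups)]
    Z = np.hstack([X @ Wi + bi for Wi, bi in zip(W, beta)])
    # 2.2) enhancement layer: N x N kernel matrix between training samples
    Omega = rbf_kernel(X, X, sigma)
    # 2.3) feature-to-enhancement weights via the pseudo-inverse
    #      (computed for completeness; Omega itself is used below)
    W_fe = ridge_pinv(Z, Omega, C)
    # 2.4) hidden layer A = [Z, Omega]; 2.5) output weights by pseudo-inverse
    A = np.hstack([Z, Omega])
    W_out = ridge_pinv(A, Y, C)

    def predict(X_new):
        Z_new = np.hstack([X_new @ Wi + bi for Wi, bi in zip(W, beta)])
        Omega_new = rbf_kernel(X_new, X, sigma)  # kernel distances to training set
        return np.hstack([Z_new, Omega_new]) @ W_out

    return predict
```

The returned scores play the role of f_b(x); the probabilities p_k(x) used later are obtained from them via softmax.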
3) the residual is calculated from the results of the first base classifier as follows:
3.1) defining the integrator as:
F_B(x) = Σ_{b=1}^{B} f_b(x),
where B represents the number of base classifiers in the integrator, x represents the current training sample, F_B(x) represents the combined output of the first B base classifiers, and f_b(x) represents the output of the b-th base classifier. The loss function is:
L(y, F_B(x)) = -Σ_{k=1}^{K} y_k log p_k(x),
where y represents the label of x, K represents the number of classes, y_k represents the value (0 or 1) of the k-th dimension of y, and p_k(x) represents the predicted probability of class k, computed from F_B(x) by the softmax function. The gradient of the loss function of the (B-1)-th integrator is:
g_{B-1} = -y_t + y_t · p_{B-1,t},
where t denotes the true class of the sample, y_t = 1 is the label value of the t-th dimension (the true class), and p_{B-1,t} represents the predicted probability of the (B-1)-th ensemble classifier for the true class;
3.2) calculating the sub-gradient of the integrator loss function:
h_{B-1} = p_{B-1,t} - p_{B-1,t}²;
3.3) combining the results of steps 3.1) and 3.2) to obtain the ensemble residual, calculated for the current iteration from the obtained gradient g and sub-gradient h:
r_{B-1} = -g_{B-1} / h_{B-1};
3.4) taking the obtained residual as the label for training the next base classifier; the new training set becomes {(x_i, r_{i,b})}, i = 1, ..., N, where r_{i,b} represents the prediction residual of the b-th base classifier on the i-th sample;
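Under the softmax cross-entropy loss above, the gradient g, the sub-gradient h, and their combination into a residual can be sketched as follows; treating r as the ratio -g/h is an assumption made here, since the patent's residual formula is not reproduced in this text:

```python
import numpy as np

def softmax(F):
    e = np.exp(F - F.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def boosting_residual(F, Y):
    """Steps 3.1)-3.3) for each sample's true class t:
    g = -y_t + y_t * p_t   (gradient of the loss, with y_t = 1)
    h = p_t - p_t**2       (sub-gradient)
    r = -g / h             (assumed combination of g and h)
    """
    P = softmax(F)                      # p_k(x) from F_B(x) via softmax
    t = Y.argmax(axis=1)                # true class index of each sample
    p_t = P[np.arange(len(Y)), t]
    g = -1.0 + p_t
    h = p_t - p_t ** 2
    return -g / h
```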
4) calculating the loss function value; when its rate of decrease relative to the previous iteration falls below a threshold, the training is deemed converged and the iteration stops;
calculating the difference between the loss function values of the current iteration and the previous one:
ΔL = L(y, F_{B-1}(x)) - L(y, F_B(x)),
and when the rate of decrease ΔL / L(y, F_{B-1}(x)) falls below the threshold ε, the training stops as having converged;
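The stopping rule of step 4) amounts to a relative-decrease test; the concrete threshold value eps below is an illustrative assumption:

```python
def converged(loss_prev, loss_curr, eps=1e-3):
    """Stop adding base classifiers when the relative decrease of the loss,
    (L_prev - L_curr) / L_prev, falls below the threshold eps."""
    return (loss_prev - loss_curr) / loss_prev < eps
```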
5) and classifying the test samples by using an integrated classifier to obtain a final prediction result.
The test samples are tested with the ensemble classifier: the confidences (the per-class probability values in the prediction results) of all base classifiers are summed to obtain the probability that each test sample belongs to each class (i.e., trait).
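Step 5) then reduces to summing the base classifiers' confidences; here each classifier is assumed to be a callable returning an (N_samples × K) array of per-class probabilities:

```python
import numpy as np

def ensemble_predict(classifiers, X_test):
    """Sum the confidences of the initial base classifier and every
    residual-trained base classifier, then pick the most probable trait."""
    total = sum(clf(X_test) for clf in classifiers)  # summed class probabilities
    return total.argmax(axis=1)                      # predicted class index
```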
The above-described embodiment is merely a preferred embodiment of the present invention, and the scope of the invention is not limited thereto; changes made according to the shape and principle of the present invention shall all be covered within the protection scope of the invention.
Claims (10)
1. A progressive integrated classification method based on a kernel width learning system is characterized by comprising the following steps:
1) inputting a training sample and a test sample, and preprocessing data in the samples;
2) training a kernel width learning system (KBLS) as a first base classifier, the KBLS being obtained by replacing the enhancement-node layer in the width learning system (BLS) with a kernel-node layer;
3) the residual is found from the results of the first base classifier as follows:
3.1) defining an integrator and a loss function thereof, and calculating the gradient of the loss function of the integrator;
3.2) solving the sub-gradient of the integrator loss function;
3.3) combining the results obtained in the step 3.1) and the step 3.2) to obtain an integral residual error;
3.4) taking the obtained residual error as a label for training the next base classifier;
4) calculating the loss function value, and when its rate of decrease relative to the previous iteration falls below a threshold, determining convergence and stopping the iteration;
5) and classifying the test samples by using an integrated classifier combined by the base classifier to obtain a final prediction result.
2. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 1), data cleaning is performed on the biological gene data: data with missing values are filled with the median or deleted, attributes containing many missing values are filtered out, data types are unified, and the data labels carried by the data set are converted into one-hot codes.
3. The progressive ensemble classification method based on kernel width learning system as claimed in claim 1, wherein: the kernel width learning system is a neural network with two layers, wherein the first layer is a characteristic node layer and an enhanced node layer, the second layer is an output layer, and the characteristic nodes and the enhanced nodes are fully connected and then are spliced together and are fully connected to the output layer.
4. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 2), the training samples {(x_i, y_i)}, i = 1, ..., N, are used to train the kernel width learning system, where x_i represents the i-th sample, y_i the corresponding label, and N the number of training samples; the specific process of training the KBLS as the first base classifier is as follows:
2.1) generating n sets of random weights W_i and bias terms β_i by means of random numbers, and generating feature nodes by passing the features of the training samples through the n mappings:
Z_i = [X W_i + β_i], i = 1, 2, ..., n,
where Z_i represents the i-th group of feature nodes and X represents the input training samples; all feature-node groups are connected into one layer, forming the feature layer:
Z = [Z_1, Z_2, ..., Z_n];
2.2) calculating an inner-product matrix among the training samples through a radial basis function (RBF) kernel, mapping the feature layer into the enhancement-node layer:
Ω_{k,m} = exp(-||x_k - x_m||² / (2σ²)),
where x_k and x_m represent the feature vectors of the k-th and m-th samples, and the RBF parameter σ determines the shape of the Gaussian distribution; the calculated N × N kernel distances form the kernel matrix Ω, which also serves as the enhancement-node layer;
2.3) obtaining the connection weights from the feature layer to the enhancement layer by means of the pseudo-inverse:
W_fe = (Z^T Z + C·I)^{-1} Z^T Ω,
where C represents the regular-term coefficient;
2.4) concatenating the feature layer and the enhancement layer into one hidden layer:
A=[Z,Ω],
2.5) the output-layer weights are likewise obtained by the pseudo-inverse method:
W = (A^T A + C·I)^{-1} A^T Y,
where Y represents the label matrix of the training samples.
5. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 3.1), the integrator is defined as:
F_B(x) = Σ_{b=1}^{B} f_b(x),
where B represents the number of base classifiers in the integrator, x represents the current training sample, F_B(x) represents the combined output of the first B base classifiers, and f_b(x) represents the output of the b-th base classifier; the loss function is:
L(y, F_B(x)) = -Σ_{k=1}^{K} y_k log p_k(x),
where y represents the label of x, K represents the number of classes, y_k represents the value of the k-th dimension (0 or 1), and p_k(x) represents the predicted probability of class k, computed from F_B(x) by the softmax function; the gradient of the loss function of the (B-1)-th integrator is:
g_{B-1} = -y_t + y_t · p_{B-1,t},
where t denotes the true class of the sample, y_t = 1 is the label value of the t-th dimension (the true class), and p_{B-1,t} represents the predicted probability of the (B-1)-th ensemble classifier for the true class.
6. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 3.2), the sub-gradient formula of the integrator loss function is:
h_{B-1} = p_{B-1,t} - p_{B-1,t}².
8. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 3.4), the obtained residual is used as the label for training the next base classifier, so the new training set becomes {(x_i, r_{i,b})}, i = 1, ..., N, where r_{i,b} represents the prediction residual of the b-th base classifier on the i-th sample.
9. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 4), the difference between the loss function values of the current iteration and the previous one is calculated:
ΔL = L(y, F_{B-1}(x)) - L(y, F_B(x)),
and when the rate of decrease ΔL / L(y, F_{B-1}(x)) falls below a threshold ε, the iteration is deemed converged and training stops.
10. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 5), the test samples are tested with the ensemble classifier, and the confidences (the per-class probability values in the prediction results) of the initial base classifier and of each base classifier trained on a round of residuals are summed, yielding the probability of each class for every test sample.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019107939992 | 2019-08-27 | ||
CN201910793999 | 2019-08-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111598187A true CN111598187A (en) | 2020-08-28 |
Family
ID=72191921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010579123.0A Pending CN111598187A (en) | 2019-08-27 | 2020-06-23 | Progressive integrated classification method based on kernel width learning system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111598187A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112381279A (en) * | 2020-11-05 | 2021-02-19 | 上海电机学院 | Wind power prediction method based on VMD and BLS combined model |
CN112381279B (en) * | 2020-11-05 | 2022-06-03 | 上海电机学院 | Wind power prediction method based on VMD and BLS combined model |
CN112465152A (en) * | 2020-12-03 | 2021-03-09 | 中国科学院大学宁波华美医院 | Online migration learning method suitable for emotional brain-computer interface |
CN112465152B (en) * | 2020-12-03 | 2022-11-29 | 中国科学院大学宁波华美医院 | Online migration learning method suitable for emotional brain-computer interface |
CN113505827A (en) * | 2021-07-08 | 2021-10-15 | 西藏大学 | Machine learning classification method |
CN113505827B (en) * | 2021-07-08 | 2024-01-12 | 西藏大学 | Machine learning classification method |
CN116401143A (en) * | 2022-12-19 | 2023-07-07 | 广东能哥知识科技有限公司 | Software testing method and system based on unbalanced data set |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||