CN111598187A - Progressive integrated classification method based on kernel width learning system - Google Patents
Progressive integrated classification method based on kernel width learning system
- Publication number
- CN111598187A CN111598187A CN202010579123.0A CN202010579123A CN111598187A CN 111598187 A CN111598187 A CN 111598187A CN 202010579123 A CN202010579123 A CN 202010579123A CN 111598187 A CN111598187 A CN 111598187A
- Authority
- CN
- China
- Prior art keywords
- training
- learning system
- layer
- width learning
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention discloses a progressive ensemble classification method based on a kernel width learning system, comprising the following steps: 1) input training samples and test samples; 2) train a kernel width learning system as a base classifier on the original training data; 3) compute the prediction residual from the training result of the current base classifier and use it as the label for training the next base classifier; 4) when the rate of decrease of the training loss reaches a threshold, stop training and add no further base classifiers; 5) classify the test samples to obtain the final prediction result. The method builds on the width learning system, so no tedious back-propagation is required; kernel mapping improves the nonlinear fitting capacity of the classifier; and fusing multiple base classifiers by ensemble means yields a marked improvement on noisy bioinformatics data sets, raising the accuracy of biological gene classification.
Description
Technical Field
The invention relates to the technical field of biological gene data analysis, in particular to a progressive integrated classification method based on a kernel width learning system.
Background
With the popularization and rapid development of the mobile internet, intelligent medical treatment has emerged. In intelligent medicine, including the research and analysis of biological genes, machine learning can mine potential patterns and features from biological information. Data mining of biological information is difficult: biological gene data contain many genes with non-expressed traits as well as noise caused by imperfect acquisition technology, and some samples do not in fact match the genetic-disease association, which degrades prediction and classification. A more robust, noise-resistant machine learning model is therefore needed.
Although deep learning algorithms can extract deep features, they depend heavily on data set size: they often overfit when the data are not large enough, and even with sufficient data they require long training times. Finding a more efficient, noise-resistant model is a hot issue in the field of biological information.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a progressive ensemble classification method based on a kernel width learning system. It effectively avoids over-complicated feature engineering, solves the output weights by pseudo-inverse calculation instead of back-propagation to improve training efficiency, and combines with ensemble learning so that the classification performance of the kernel width learning system improves while local optima and overfitting are avoided; the method is well resistant to the noise present in biological information.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a progressive integration classification method based on a kernel width learning system comprises the following steps:
1) inputting a training sample and a test sample, and preprocessing data in the samples;
2) training a kernel width learning system (KBLS) as a first base classifier;
3) the residual is calculated from the results of the first base classifier as follows:
3.1) defining an integrator and a loss function thereof, and calculating the gradient of the loss function of the integrator;
3.2) solving the sub-gradient of the integrator loss function;
3.3) combining the results obtained in the step 3.1) and the step 3.2), and calculating to obtain an integral residual error;
3.4) taking the obtained residual error as a label for training the next base classifier;
4) calculating the loss function value, and when its rate of decrease relative to the previous iteration falls below a threshold, determining convergence and stopping the iteration;
5) and classifying the test samples by using an integrated classifier to obtain a final prediction result.
Further, in step 1), data cleaning is performed on the biological gene data: data with missing values are filled with the median or removed, attributes containing a large number of missing values are filtered out, data types are unified, and the data labels are converted into one-hot codes to facilitate subsequent computation.
Furthermore, the kernel width learning system is a two-layer neural network: the first layer consists of a feature-node layer and an enhancement-node layer, and the second layer is the output layer; the feature nodes are fully connected to the enhancement nodes, then the two are concatenated and fully connected to the output layer.
Further, in step 2), the training samples {(x_i, y_i)}, i = 1, ..., N, are used to train the KBLS as the first base classifier, where N is the number of samples. The specific process is as follows:
2.1) generating n sets of random weights W_i and bias terms β_i, and generating feature nodes by passing the features of the training samples through the n mappings:
Z_i = [X W_i + β_i], i = 1, 2, ..., n,
where Z_i represents the i-th group of feature nodes and X represents the input training samples. All feature-node groups are connected into one layer, forming the feature layer:
Z = [Z_1, Z_2, ..., Z_n];
2.2) calculating an inner-product matrix among the training samples through a radial basis function (RBF) kernel, mapping the feature layer into the enhancement-node layer:
Ω_{k,m} = exp(-||x_k - x_m||² / (2σ²)),
where x_k and x_m represent the feature vectors of the k-th and m-th samples, and the RBF parameter σ determines the shape of the Gaussian distribution. The calculated N × N kernel distances (where N equals the number of training samples) form the kernel matrix Ω, which also serves as the enhancement-node layer;
2.3) calculating the connection weights from the feature layer to the enhancement layer by means of the pseudo-inverse:
W_fe = (Z^T Z + C·I)^{-1} Z^T Ω,
where C represents the regular-term coefficient;
2.4) concatenating the feature layer and the enhancement layer into one hidden layer:
A=[Z,Ω],
2.5) the output-layer weights are likewise obtained by the pseudo-inverse method:
W = (A^T A + C·I)^{-1} A^T Y,
where Y represents the label matrix of the training samples.
Further, in step 3.1), the integrator is defined as:
F_B(x) = Σ_{b=1}^{B} f_b(x),
where B represents the number of base classifiers in the integrator, x represents the current training sample, F_B(x) represents the combined output of the first B base classifiers, and f_b(x) represents the output of the b-th base classifier. The loss function is:
L(y, F_B(x)) = -Σ_{k=1}^{K} y_k log p_k(x),
where y represents the label of x, K represents the number of classes, y_k represents the value (0 or 1) of the k-th dimension of y, and p_k(x) represents the predicted probability of class k, computed from F_B(x) by the softmax function. The gradient of the loss function of the (B-1)-th integrator is:
g_{B-1} = -y_t + y_t · p_{B-1,t},
where t denotes the true class of the sample, y_t = 1 is the label value of the t-th dimension (the true class), and p_{B-1,t} represents the predicted probability of the (B-1)-th ensemble classifier for the true class;
further, in step 3.2), the sub-gradient of the integrator loss function is calculated, and the calculation formula is:
hB-1=pB-1,t-pB-1,t 2。
further, in step 3.3), calculating the integrator training residual of the current iteration round by using the obtained gradient g and the obtained sub-gradient h:
further, in step 3.4), the obtained residual is used as a target for training the next base classifierSign, then the new training sample becomesWherein r isi,bRepresenting the predictor residual of the b-th base classifier on the i-th sample.
Further, in step 4), the difference between the loss function values of the current iteration and the previous one is calculated:
ΔL = L(y, F_{B-1}(x)) - L(y, F_B(x)),
and when the rate of decrease ΔL / L(y, F_{B-1}(x)) falls below the threshold ε, the iteration is deemed converged and training stops.
Further, in step 5), the test samples are tested with the ensemble classifier: the confidences (the per-class probability values in the prediction results) of the initial base classifier and of each base classifier trained on a round of residuals are summed, yielding the probability of each class for every test sample.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention effectively solves the problem of long training processes in the prior art, reduces the cost of manually designed features, resists the noise present in biological information, and improves the classification performance of biological gene data analysis.
2. The hidden-layer nodes of the kernel width learning system comprise linear combination features of the biological information and kernel-distance features between samples. The linear combination features amount to random sampling and weighting of the original features, giving each single classifier high randomness, so the members of the integrator are diverse and high-variance; the kernel-distance features use sample similarity as the measure, which is more principled than the random mapping of the original width learning system, reduces redundancy, and improves the generalization ability of the kernel width learning system. Several different kernel width learning systems are integrated progressively: each new base classifier is trained in the direction of the gradient and sub-gradient of the integrator's loss function, so the ensemble has lower bias than any single member, and the size of the integrator is determined automatically by the convergence speed; meanwhile, letting several base classifiers decide jointly reduces the overall variance, avoids local optima, and improves the generalization ability of the integrator. Owing to these characteristics, the invention achieves a good classification effect on noisy biological gene data sets.
Drawings
FIG. 1 is a flowchart illustrating a progressive ensemble classification method based on a kernel width learning system according to this embodiment.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1, the progressive ensemble classification method based on the kernel width learning system provided in this embodiment includes the following steps:
1) inputting training samples and test samples, where each sample contains desensitized gene features and a corresponding label (one of several traits), and preprocessing the biological gene data in the samples as follows:
1.1) performing data cleaning on the biological gene data, including filling missing values with the median or deleting them, unifying data types, and normalizing the data;
1.2) converting the class label of each sample into a one-hot code to facilitate subsequent computation. One-hot coding, also called one-bit-effective coding, uses an N-bit status register to encode N states; each state has its own independent register bit, and only one bit is active at any time. For example, with the label set {trait 1, trait 2, trait 3}, a sample of trait 1 is encoded as {1, 0, 0};
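As a concrete illustration of step 1.2), a minimal one-hot encoder can be sketched as follows; the label values and class count are illustrative, not taken from the patent:

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-bit-effective coding: one register bit per class, exactly one active."""
    labels = np.asarray(labels)
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# With the label set {trait 1, trait 2, trait 3}, a sample of trait 1 (index 0)
# is encoded as {1, 0, 0}, as in the example above.
Y = one_hot([0, 2, 1], num_classes=3)
```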
2) training a kernel width learning system (KBLS) as the first base classifier. The KBLS is a two-layer neural network: the first layer consists of the feature nodes and the enhancement nodes, where the feature nodes are fully connected to the enhancement nodes and the two are then concatenated and fully connected to the output layer, which forms the second layer. The training samples are {(x_i, y_i)}, i = 1, ..., N, where x_i represents the i-th sample, y_i the corresponding label, and N the number of training samples. The specific training process is as follows:
2.1) generating n sets of random weights W_i and bias terms β_i by means of random numbers, and generating feature nodes by passing the features of the training samples through the n mappings:
Z_i = [X W_i + β_i], i = 1, 2, ..., n,
where Z_i represents the i-th group of feature nodes and X represents the input training samples. All feature-node groups are connected into one layer:
Z = [Z_1, Z_2, ..., Z_n];
2.2) calculating an inner-product matrix among the samples through a radial basis function (RBF) kernel, mapping the feature layer into the enhancement nodes:
Ω_{k,m} = exp(-||x_k - x_m||² / (2σ²)),
where x_k and x_m represent the feature vectors of the k-th and m-th samples, and the RBF parameter σ determines the shape of the Gaussian distribution; the calculated N × N kernel distances (where N equals the number of training samples) form the kernel matrix Ω;
2.3) calculating the connection weights from the feature layer to the enhancement layer by means of the pseudo-inverse:
W_fe = (Z^T Z + C·I)^{-1} Z^T Ω,
where C represents the regular-term coefficient;
2.4) concatenating the feature layer and the enhancement layer into one hidden layer:
A=[Z,Ω],
2.5) the output-layer weights are likewise obtained by the pseudo-inverse method:
W = (A^T A + C·I)^{-1} A^T Y,
where Y represents the label matrix of the training samples;
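Steps 2.1) to 2.5) can be sketched with NumPy as follows. The Gaussian RBF kernel, the ridge-style pseudo-inverse (A^T A + C·I)^{-1} A^T Y, and the hyperparameters (number of groups n, node width, σ, C) are illustrative assumptions; the patent does not fix their values.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, sigma=1.0):
    # Omega[k, m] = exp(-||a_k - b_m||^2 / (2 * sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def ridge_pinv(A, Y, C=1e-2):
    # Regularized pseudo-inverse solution: (A^T A + C I)^{-1} A^T Y
    return np.linalg.solve(A.T @ A + C * np.eye(A.shape[1]), A.T @ Y)

def train_kbls(X, Y, n_groups=3, width=8, sigma=1.0, C=1e-2):
    # 2.1) n groups of random feature nodes Z_i = [X W_i + beta_i]
    W = [rng.standard_normal((X.shape[1], width)) for _ in range(n_groups)]
    beta = [rng.standard_normal(width) for _ in range(n_groups)]
    Z = np.hstack([X @ Wi + bi for Wi, bi in zip(W, beta)])
    # 2.2) enhancement layer: N x N kernel matrix between training samples
    Omega = rbf_kernel(X, X, sigma)
    # 2.3) feature-to-enhancement weights via the pseudo-inverse
    #      (computed for completeness; Omega itself is used below)
    W_fe = ridge_pinv(Z, Omega, C)
    # 2.4) hidden layer A = [Z, Omega]; 2.5) output weights by pseudo-inverse
    A = np.hstack([Z, Omega])
    W_out = ridge_pinv(A, Y, C)

    def predict(X_new):
        Z_new = np.hstack([X_new @ Wi + bi for Wi, bi in zip(W, beta)])
        Omega_new = rbf_kernel(X_new, X, sigma)  # kernel distances to training set
        return np.hstack([Z_new, Omega_new]) @ W_out

    return predict
```

The returned scores play the role of f_b(x); the probabilities p_k(x) used later are obtained from them via softmax.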
3) the residual is calculated from the results of the first base classifier as follows:
3.1) defining the integrator as:
F_B(x) = Σ_{b=1}^{B} f_b(x),
where B represents the number of base classifiers in the integrator, x represents the current training sample, F_B(x) represents the combined output of the first B base classifiers, and f_b(x) represents the output of the b-th base classifier. The loss function is:
L(y, F_B(x)) = -Σ_{k=1}^{K} y_k log p_k(x),
where y represents the label of x, K represents the number of classes, y_k represents the value (0 or 1) of the k-th dimension of y, and p_k(x) represents the predicted probability of class k, computed from F_B(x) by the softmax function. The gradient of the loss function of the (B-1)-th integrator is:
g_{B-1} = -y_t + y_t · p_{B-1,t},
where t denotes the true class of the sample, y_t = 1 is the label value of the t-th dimension (the true class), and p_{B-1,t} represents the predicted probability of the (B-1)-th ensemble classifier for the true class;
3.2) calculating the sub-gradient of the integrator loss function:
h_{B-1} = p_{B-1,t} - p_{B-1,t}²;
3.3) combining the results of steps 3.1) and 3.2) to obtain the ensemble residual, calculated for the current iteration from the obtained gradient g and sub-gradient h:
r_{B-1} = -g_{B-1} / h_{B-1};
3.4) taking the obtained residual as the label for training the next base classifier; the new training set becomes {(x_i, r_{i,b})}, i = 1, ..., N, where r_{i,b} represents the prediction residual of the b-th base classifier on the i-th sample;
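Under the softmax cross-entropy loss above, the gradient g, the sub-gradient h, and their combination into a residual can be sketched as follows; treating r as the ratio -g/h is an assumption made here, since the patent's residual formula is not reproduced in this text:

```python
import numpy as np

def softmax(F):
    e = np.exp(F - F.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def boosting_residual(F, Y):
    """Steps 3.1)-3.3) for each sample's true class t:
    g = -y_t + y_t * p_t   (gradient of the loss, with y_t = 1)
    h = p_t - p_t**2       (sub-gradient)
    r = -g / h             (assumed combination of g and h)
    """
    P = softmax(F)                      # p_k(x) from F_B(x) via softmax
    t = Y.argmax(axis=1)                # true class index of each sample
    p_t = P[np.arange(len(Y)), t]
    g = -1.0 + p_t
    h = p_t - p_t ** 2
    return -g / h
```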
4) calculating the loss function value; when its rate of decrease relative to the previous iteration falls below a threshold, the training is deemed converged and the iteration stops;
calculating the difference between the loss function values of the current iteration and the previous one:
ΔL = L(y, F_{B-1}(x)) - L(y, F_B(x)),
and when the rate of decrease ΔL / L(y, F_{B-1}(x)) falls below the threshold ε, the training stops as having converged;
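The stopping rule of step 4) amounts to a relative-decrease test; the concrete threshold value eps below is an illustrative assumption:

```python
def converged(loss_prev, loss_curr, eps=1e-3):
    """Stop adding base classifiers when the relative decrease of the loss,
    (L_prev - L_curr) / L_prev, falls below the threshold eps."""
    return (loss_prev - loss_curr) / loss_prev < eps
```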
5) and classifying the test samples by using an integrated classifier to obtain a final prediction result.
The test samples are tested with the ensemble classifier: the confidences (the per-class probability values in the prediction results) of all base classifiers are summed to obtain the probability that each test sample belongs to each class (i.e., trait).
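Step 5) then reduces to summing the base classifiers' confidences; here each classifier is assumed to be a callable returning an (N_samples × K) array of per-class probabilities:

```python
import numpy as np

def ensemble_predict(classifiers, X_test):
    """Sum the confidences of the initial base classifier and every
    residual-trained base classifier, then pick the most probable trait."""
    total = sum(clf(X_test) for clf in classifiers)  # summed class probabilities
    return total.argmax(axis=1)                      # predicted class index
```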
The above-described embodiment is merely a preferred embodiment of the present invention, and the scope of the invention is not limited thereto; changes made according to the shape and principle of the present invention shall all be covered within the protection scope of the invention.
Claims (10)
1. A progressive integrated classification method based on a kernel width learning system is characterized by comprising the following steps:
1) inputting a training sample and a test sample, and preprocessing data in the samples;
2) training a kernel width learning system (KBLS) as a first base classifier, the KBLS being obtained by replacing the enhancement-node layer in the width learning system (BLS) with a kernel-node layer;
3) the residual is found from the results of the first base classifier as follows:
3.1) defining an integrator and a loss function thereof, and calculating the gradient of the loss function of the integrator;
3.2) solving the sub-gradient of the integrator loss function;
3.3) combining the results obtained in the step 3.1) and the step 3.2) to obtain an integral residual error;
3.4) taking the obtained residual error as a label for training the next base classifier;
4) calculating the loss function value, and when its rate of decrease relative to the previous iteration falls below a threshold, determining convergence and stopping the iteration;
5) and classifying the test samples by using an integrated classifier combined by the base classifier to obtain a final prediction result.
2. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 1), data cleaning is performed on the biological gene data: data with missing values are filled with the median or deleted, attributes containing many missing values are filtered out, data types are unified, and the data labels carried by the data set are converted into one-hot codes.
3. The progressive ensemble classification method based on kernel width learning system as claimed in claim 1, wherein: the kernel width learning system is a neural network with two layers, wherein the first layer is a characteristic node layer and an enhanced node layer, the second layer is an output layer, and the characteristic nodes and the enhanced nodes are fully connected and then are spliced together and are fully connected to the output layer.
4. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 2), the training samples {(x_i, y_i)}, i = 1, ..., N, are used to train the kernel width learning system, where x_i represents the i-th sample, y_i the corresponding label, and N the number of training samples; the specific process of training the KBLS as the first base classifier is as follows:
2.1) generating n sets of random weights W_i and bias terms β_i by means of random numbers, and generating feature nodes by passing the features of the training samples through the n mappings:
Z_i = [X W_i + β_i], i = 1, 2, ..., n,
where Z_i represents the i-th group of feature nodes and X represents the input training samples; all feature-node groups are connected into one layer, forming the feature layer:
Z = [Z_1, Z_2, ..., Z_n];
2.2) calculating an inner-product matrix among the training samples through a radial basis function (RBF) kernel, mapping the feature layer into the enhancement-node layer:
Ω_{k,m} = exp(-||x_k - x_m||² / (2σ²)),
where x_k and x_m represent the feature vectors of the k-th and m-th samples, and the RBF parameter σ determines the shape of the Gaussian distribution; the calculated N × N kernel distances form the kernel matrix Ω, which also serves as the enhancement-node layer;
2.3) obtaining the connection weights from the feature layer to the enhancement layer by means of the pseudo-inverse:
W_fe = (Z^T Z + C·I)^{-1} Z^T Ω,
where C represents the regular-term coefficient;
2.4) concatenating the feature layer and the enhancement layer into one hidden layer:
A=[Z,Ω],
2.5) the output-layer weights are likewise obtained by the pseudo-inverse method:
W = (A^T A + C·I)^{-1} A^T Y,
where Y represents the label matrix of the training samples.
5. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 3.1), the integrator is defined as:
F_B(x) = Σ_{b=1}^{B} f_b(x),
where B represents the number of base classifiers in the integrator, x represents the current training sample, F_B(x) represents the combined output of the first B base classifiers, and f_b(x) represents the output of the b-th base classifier; the loss function is:
L(y, F_B(x)) = -Σ_{k=1}^{K} y_k log p_k(x),
where y represents the label of x, K represents the number of classes, y_k represents the value of the k-th dimension (0 or 1), and p_k(x) represents the predicted probability of class k, computed from F_B(x) by the softmax function; the gradient of the loss function of the (B-1)-th integrator is:
g_{B-1} = -y_t + y_t · p_{B-1,t},
where t denotes the true class of the sample, y_t = 1 is the label value of the t-th dimension (the true class), and p_{B-1,t} represents the predicted probability of the (B-1)-th ensemble classifier for the true class.
6. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 3.2), the sub-gradient formula of the integrator loss function is:
h_{B-1} = p_{B-1,t} - p_{B-1,t}².
8. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 3.4), the obtained residual is used as the label for training the next base classifier, so the new training set becomes {(x_i, r_{i,b})}, i = 1, ..., N, where r_{i,b} represents the prediction residual of the b-th base classifier on the i-th sample.
9. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 4), the difference between the loss function values of the current iteration and the previous one is calculated:
ΔL = L(y, F_{B-1}(x)) - L(y, F_B(x)),
and when the rate of decrease ΔL / L(y, F_{B-1}(x)) falls below a threshold ε, the iteration is deemed converged and training stops.
10. The progressive ensemble classification method based on a kernel width learning system as claimed in claim 1, wherein: in step 5), the test samples are tested with the ensemble classifier, and the confidences (the per-class probability values in the prediction results) of the initial base classifier and of each base classifier trained on a round of residuals are summed, yielding the probability of each class for every test sample.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019107939992 | 2019-08-27 | ||
CN201910793999 | 2019-08-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111598187A true CN111598187A (en) | 2020-08-28 |
Family
ID=72191921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010579123.0A Pending CN111598187A (en) | 2019-08-27 | 2020-06-23 | Progressive integrated classification method based on kernel width learning system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111598187A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112381279A (en) * | 2020-11-05 | 2021-02-19 | 上海电机学院 | Wind power prediction method based on VMD and BLS combined model |
CN112381279B (en) * | 2020-11-05 | 2022-06-03 | 上海电机学院 | Wind power prediction method based on VMD and BLS combined model |
CN112465152A (en) * | 2020-12-03 | 2021-03-09 | 中国科学院大学宁波华美医院 | Online migration learning method suitable for emotional brain-computer interface |
CN112465152B (en) * | 2020-12-03 | 2022-11-29 | 中国科学院大学宁波华美医院 | Online migration learning method suitable for emotional brain-computer interface |
CN113505827A (en) * | 2021-07-08 | 2021-10-15 | 西藏大学 | Machine learning classification method |
CN113505827B (en) * | 2021-07-08 | 2024-01-12 | 西藏大学 | Machine learning classification method |
CN116401143A (en) * | 2022-12-19 | 2023-07-07 | 广东能哥知识科技有限公司 | Software testing method and system based on unbalanced data set |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||