CN109492682A - Multi-branch random forest data classification method - Google Patents

Multi-branch random forest data classification method

Info

Publication number
CN109492682A
CN109492682A
Authority
CN
China
Prior art keywords
sample
cluster
center
sample point
random forest
Prior art date
Legal status
Pending
Application number
CN201811273813.2A
Other languages
Chinese (zh)
Inventor
江泽涛
马伟康
胡硕
Current Assignee
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2019-03-19
Application filed by Guilin University of Electronic Technology
Priority to CN201811273813.2A
Publication of CN109492682A
Legal status: Pending

Classifications

    • G06F18/23213 — Pattern recognition; non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G06F18/2148 — Generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
    • G06F18/24323 — Classification techniques; tree-organised classifiers


Abstract

The invention discloses a multi-branch random forest data classification method, relating to the technical field of random forest data classification. The technical problem solved is to provide a classification method that improves the performance and accuracy of data classification. The method comprises the following steps: (1) provide an unclassified data set and use the PCA algorithm to reduce its dimensionality and remove noise; (2) complete the clustering of the data using the K-means algorithm; (3) construct the multi-branch random forest; (4) use the multi-branch random forest model to complete the classification of the data. The technical solution of the present invention improves the performance and accuracy of data classification.

Description

Multi-branch random forest data classification method
Technical field
The present invention relates to the technical field of random forest data classification, and more particularly to a multi-branch random forest data classification method.
Background technique
With the development of artificial intelligence, fields such as image research and information security all require its participation. Clustering and classification algorithms have important applications in artificial intelligence, with K-means and random forest being representative clustering and classification algorithms, respectively. The random forest is one of the better-performing classification algorithms; it is an ensemble learning algorithm based on decision trees. However, when the prior-art random forest data classification method performs classification, the sample set is overly redundant and disordered and its data purity is low, which has a certain impact on classification performance.
Summary of the invention
In view of the deficiencies of the prior art, the technical problem solved by the invention is to provide a classification method that improves the performance and accuracy of data classification.
To solve the above technical problem, the technical solution adopted by the present invention is a multi-branch random forest data classification method comprising the following steps:
(1) Provide an unclassified data set and use the PCA algorithm to reduce its dimensionality and remove noise, with the following sub-steps (an illustrative code sketch follows this list):
(1) Express the sample set as an N × M matrix X.
(2) Zero-mean each row, i.e. compute the average value $R_i$ of every row of the matrix and subtract it from that row, $N_i - R_i$; compute the covariance matrix $C = \frac{1}{M} X X^{\mathsf{T}}$; and compute the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ of the covariance matrix C together with the normalised eigenvectors $x_1, x_2, \dots, x_m$.
(3) Arrange the eigenvectors as the rows of a matrix, ordered from top to bottom by decreasing eigenvalue, and take the first k rows to form the matrix P.
(4) Multiply the matrix P by the matrix X to obtain the reduced-dimension data, removing the redundant part of the data.
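For illustration only (not part of the patent text), the following is a minimal NumPy sketch of the PCA sub-steps above; the function name pca_denoise is an assumption, and the covariance $C = \frac{1}{M} X X^{\mathsf{T}}$ follows the row-feature convention of sub-step (1):

```python
import numpy as np

def pca_denoise(X, k):
    """PCA reduction of an N x M sample matrix X (rows = features,
    columns = samples) onto its top-k principal components."""
    # Sub-step (2): zero-mean each row of X.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Covariance matrix C = (1/M) X X^T over the M samples.
    C = Xc @ Xc.T / X.shape[1]
    # Eigenvalues and eigenvectors of the symmetric matrix C.
    eigvals, eigvecs = np.linalg.eigh(C)
    # Sub-step (3): order eigenvectors by decreasing eigenvalue and
    # keep the first k as the rows of the projection matrix P.
    order = np.argsort(eigvals)[::-1][:k]
    P = eigvecs[:, order].T                 # shape (k, N)
    # Sub-step (4): project to obtain the k x M reduced data.
    return P @ Xc
```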
(2) Complete the clustering of the data set with the K-means algorithm and output the clusters $C = \{C_1, C_2, \dots, C_k\}$, with the following sub-steps (an illustrative code sketch follows this list):
(1) Compute the density value $p_{ij}$ of each sample point from the pairwise distances $d_{ijk} = \lVert x_{ij} - x_{kj} \rVert$, where $p_{ij}$ is the density of the i-th sample point in class j, $n_j$ is the total number of sample points in class j, and $d_{ijk}$ is the distance between the sample points $x_{ij}$ and $x_{kj}$ in the vector space. Take the sample point with the largest density value $p_{ij}$ as the first cluster centre.
(2) Distance is also considered in the selection of the remaining cluster centres: for a given sample $y_n$, normalise the distance from the sample point to it, giving the normalised distance $D_{ijt}$.
(3) Form $w_{ij}$, the sum of the density value of the sample point and its normalised distances to the already-selected cluster centres: $w_{ij} = p_{ij} + \sum_t D_{ijt}$, where $p_{ij}$ denotes the density of the i-th sample point in class j and $D_{ijt}$ denotes the normalised distance from the sample point $x_{ij}$ to the centre $y_t$ of the t-th selected class. The number of clusters K is determined by the elbow method.
(4) Sort the $w_{ij}$ from large to small and select the first k−1 sample points, together with the point of maximum $p_{ij}$ value, as the initial cluster centres $C_1, C_2, \dots, C_k$.
(5) Take $c_1, c_2, \dots, c_k$ as the initial cluster centres, re-denoted $\mu_1, \mu_2, \dots, \mu_k$, and set the maximum number of iterations R.
(6) Compute the distance of each sample to every cluster centre, $\mathrm{dist}(x_i, \mu_j) = \lVert x_i - \mu_j \rVert_2$, where i = 1, 2, …, N and j = 1, 2, …, k.
(7) Assign $x_i$ the cluster label of the nearest cluster centre: $\lambda_i = \arg\min_{j \in \{1, 2, \dots, k\}} \mathrm{dist}(x_i, \mu_j)$.
(8) Place the sample $x_i$ into the corresponding cluster: $C_{\lambda_i} = C_{\lambda_i} \cup \{x_i\}$.
(9) After all samples have been clustered, compute the new mean centre of each cluster, $\mu_i' = \frac{1}{|C_i|} \sum_{x \in C_i} x$. If $\mu_i'$ and $\mu_i$ are unequal, update the cluster centre to $\mu_i'$; if they are equal, keep $\mu_i$ unchanged. Then recompute the cluster to which each sample belongs.
(10) Repeat sub-step (9) until no cluster centre changes or the maximum number of iterations is reached.
(11) Output the cluster division $C = \{C_1, C_2, \dots, C_k\}$.
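A minimal sketch of this clustering procedure follows, for illustration only. The exact density formula is garbled in the source text, so as a stated assumption the sketch takes each point's density as the inverse of its total distance to all other points; function and variable names are illustrative:

```python
import numpy as np

def density_init_kmeans(X, K, max_iter=100):
    """K-means with density- and distance-based initial centres (step (2)).

    ASSUMPTION: the source's density formula is lost, so p_i here is an
    inverse distance-sum stand-in; remaining centres maximise
    w_i = p_i + sum of normalised distances to already-chosen centres.
    """
    # Pairwise distances d_ik = ||x_i - x_k|| in the vector space.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    p = 1.0 / (d.sum(axis=1) + 1e-12)        # assumed density value
    centres = [X[np.argmax(p)]]              # first centre: maximum density
    while len(centres) < K:
        # Normalised distance of every point to each chosen centre (D_it).
        dc = np.array([np.linalg.norm(X - c, axis=1) for c in centres])
        dc /= dc.max(axis=1, keepdims=True) + 1e-12
        w = p + dc.sum(axis=0)               # w_i = p_i + sum_t D_it
        centres.append(X[np.argmax(w)])
    mu = np.array(centres)
    for _ in range(max_iter):                # maximum number of iterations R
        # Sub-steps (6)-(8): assign each sample to the nearest centre.
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2), axis=1)
        # Sub-step (9): new mean centre of every cluster.
        new_mu = np.array([X[labels == j].mean(axis=0)
                           if np.any(labels == j) else mu[j]
                           for j in range(K)])
        if np.allclose(new_mu, mu):          # sub-step (10): centres unchanged
            break
        mu = new_mu
    return labels, mu
```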
(3) Construct the multi-branch random forest, with the following sub-steps:
(1) Complete the construction with a training set of known labels: provide the training set and pre-process it with the K-means algorithm, following exactly sub-steps (1)–(11) of step (2) above, to obtain the clusters $C = \{C_1, C_2, \dots, C_k\}$.
(2) Use the bootstrap sampling method to complete the sampling of each cluster $C_i$ and construct the multi-branch random forest, as follows (an illustrative code sketch follows this list):
1) Using bootstrap sampling, draw with replacement from the cluster $C_i$ to obtain T training sets $D_i$, each containing m training samples.
2) Suppose each sample has M features. For the splitting of a base decision tree, randomly select m features (m < M) and, for each feature A and each of its values a, compute the Gini index Gini(D, A).
The Gini index Gini(D, A): for a given sample set D, if the set of samples belonging to class $c_k$ is $C_k$, then
$$\mathrm{Gini}(D) = 1 - \sum_{k} \left( \frac{|C_k|}{|D|} \right)^{2} .$$
Under the condition of feature A, the Gini index of the set D is obtained as follows: according to whether the given feature A takes a certain possible value a, the sample set D is divided into two subsets $D_1 = \{(x, y) \in D \mid A(x) = a\}$ and $D_2 = D \setminus D_1$; then
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2) .$$
3) Choose the optimal feature and optimal cut point: among all features A and all cut points a, the A and a with the smallest Gini index are the optimal feature and optimal cut point, and they serve as the tree node. Split the data set $D_i$ into two child nodes according to the optimal feature and optimal cut point.
4) Recursively apply processes 2) and 3) to the child nodes until the Gini index of the data set falls below a predetermined value, which completes the construction of a base decision tree.
5) The base decision trees together form the multi-branch random forest.
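For illustration only, a minimal sketch of the bootstrap sampling and Gini-index split selection described above; the recursive tree growth and forest assembly are omitted, and names such as gini_index and the ≤-threshold split convention are assumptions:

```python
import numpy as np

def bootstrap_sample(X, y, m_samples, rng):
    """Sub-step 1): draw a training set D_i of m_samples with replacement."""
    idx = rng.integers(0, len(X), size=m_samples)
    return X[idx], y[idx]

def gini(y):
    """Gini(D) = 1 - sum_k (|C_k| / |D|)^2 for a label vector y."""
    _, counts = np.unique(y, return_counts=True)
    frac = counts / len(y)
    return 1.0 - np.sum(frac ** 2)

def gini_index(X, y, feature, a):
    """Gini(D, A): split D on feature A at value a, then weight the
    two subset Ginis by their relative sizes."""
    left = X[:, feature] <= a
    n_l, n_r = left.sum(), len(y) - left.sum()
    if n_l == 0 or n_r == 0:
        return np.inf                        # degenerate split, skip it
    return (n_l * gini(y[left]) + n_r * gini(y[~left])) / len(y)

def best_split(X, y, m, rng):
    """Sub-steps 2)-3): among m randomly chosen features and all their
    values, return the (feature, cut point) minimising Gini(D, A)."""
    feats = rng.choice(X.shape[1], size=m, replace=False)
    best_g, best_f, best_a = np.inf, None, None
    for f in feats:
        for a in np.unique(X[:, f]):
            g = gini_index(X, y, f, a)
            if g < best_g:
                best_g, best_f, best_a = g, f, a
    return best_f, best_a, best_g
```

A base decision tree would then be grown by splitting each $D_i$ at best_split recursively until the Gini index falls below the predetermined value, as in sub-step 4).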
(4) Use the constructed multi-branch random forest model to complete the classification of the data, with the following sub-steps (an illustrative code sketch follows this list):
(1) Input the clusters $C = \{C_1, C_2, \dots, C_k\}$ output by the clustering of step (2) into the multi-branch random forest in turn.
(2) Denote the output of base decision tree $h_i$ on category label $c_j$ as $h_i^j(x)$.
(3) Determine the class of each sample by relative majority voting: $H(x) = c_{\arg\max_{j} \sum_{i=1}^{T} h_i^j(x)}$. Repeat sub-steps (2) and (3) until all clusters have been classified.
(4) Output the classification result.
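A minimal sketch of the relative-majority vote of sub-step (3), for illustration only; it assumes each base decision tree exposes a predict method (an assumption about the tree interface, not something the patent specifies):

```python
import numpy as np

def forest_vote(trees, X, classes):
    """H(x) = c_{argmax_j sum_i h_i^j(x)}: every tree casts one vote per
    sample, and the class with the most votes wins."""
    classes = np.asarray(classes)
    votes = np.zeros((len(X), len(classes)), dtype=int)
    for tree in trees:
        pred = tree.predict(X)              # this tree's label for each sample
        for j, c in enumerate(classes):
            votes[:, j] += (pred == c)      # accumulate h_i^j(x)
    return classes[np.argmax(votes, axis=1)]
```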
The technical solution of the present invention improves the performance and accuracy of data classification.
Detailed description of the invention
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the flow diagram of the construction of the multi-branch random forest.
Specific embodiment
A specific embodiment of the invention is further described below with reference to the accompanying drawings, but this is not a limitation of the invention.
Fig. 1 shows a multi-branch random forest data classification method comprising the following steps:
(1) Provide an unclassified data set and use the PCA algorithm to reduce its dimensionality and remove noise, following sub-steps (1)–(4) of step (1) in the Summary above.
(2) Complete the clustering of the data set with the K-means algorithm, following sub-steps (1)–(11) of step (2) above, and output the clusters $C = \{C_1, C_2, \dots, C_k\}$.
(3) Construct the multi-branch random forest; the detailed process is as shown in Fig. 2 and follows sub-steps (1) and (2) of step (3) above: pre-process the labelled training set with the K-means algorithm, then bootstrap-sample each cluster $C_i$ and grow the Gini-split base decision trees that form the forest.
(4) Use the constructed multi-branch random forest model to complete the classification of the data, following sub-steps (1)–(4) of step (4) above: input the clusters into the forest in turn and determine each sample's class by relative majority voting, then output the classification result.
The technical solution of the present invention improves the performance and accuracy of data classification.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. For those skilled in the art, various changes, modifications, substitutions and variations made to these embodiments without departing from the principles and spirit of the present invention still fall within the protection scope of the present invention.

Claims (8)

1. A multi-branch random forest data classification method, characterised by comprising the following steps:
(1) providing an unclassified data set and using the PCA algorithm to reduce its dimensionality and remove noise;
(2) completing the clustering of the data using the K-means algorithm;
(3) constructing the multi-branch random forest;
(4) using the multi-branch random forest model to complete the classification of the data.
2. The multi-branch random forest data classification method of claim 1, characterised in that step (1) comprises the following sub-steps:
(1) expressing the sample set as an N × M matrix X;
(2) zero-meaning each row, i.e. computing the average value $R_i$ of every row of the matrix and subtracting it from that row, $N_i - R_i$; computing the covariance matrix $C = \frac{1}{M} X X^{\mathsf{T}}$; and computing the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ of the covariance matrix C together with the normalised eigenvectors $x_1, x_2, \dots, x_m$;
(3) arranging the eigenvectors as the rows of a matrix, from top to bottom by decreasing eigenvalue, and taking the first k rows to form the matrix P;
(4) multiplying the matrix P by the matrix X to obtain the reduced-dimension data, removing the redundant part of the data.
3. The multi-branch random forest data classification method of claim 1, characterised in that step (2) comprises the following sub-steps:
(1) computing the density value $p_{ij}$ of each sample point from the pairwise distances $d_{ijk} = \lVert x_{ij} - x_{kj} \rVert$, where $p_{ij}$ is the density of the i-th sample point in class j, $n_j$ is the total number of sample points in class j, and $d_{ijk}$ is the distance between the sample points $x_{ij}$ and $x_{kj}$ in the vector space; and taking the sample point with the largest density value $p_{ij}$ as the first cluster centre;
(2) also considering distance in the selection of the remaining cluster centres: for a given sample $y_n$, normalising the distance from the sample point to it, giving the normalised distance $D_{ijt}$;
(3) forming $w_{ij}$, the sum of the density value of the sample point and its normalised distances to the already-selected cluster centres, $w_{ij} = p_{ij} + \sum_t D_{ijt}$, where $p_{ij}$ denotes the density of the i-th sample point in class j and $D_{ijt}$ denotes the normalised distance from the sample point $x_{ij}$ to the centre $y_t$ of the t-th selected class; the number of clusters K being determined by the elbow method;
(4) sorting the $w_{ij}$ from large to small and selecting the first k−1 sample points, together with the point of maximum $p_{ij}$ value, as the initial cluster centres $C_1, C_2, \dots, C_k$;
(5) taking $c_1, c_2, \dots, c_k$ as the initial cluster centres, re-denoted $\mu_1, \mu_2, \dots, \mu_k$, and setting the maximum number of iterations R;
(6) computing the distance of each sample to every cluster centre, $\mathrm{dist}(x_i, \mu_j) = \lVert x_i - \mu_j \rVert_2$, where i = 1, 2, …, N and j = 1, 2, …, k;
(7) assigning $x_i$ the cluster label of the nearest cluster centre: $\lambda_i = \arg\min_{j \in \{1, 2, \dots, k\}} \mathrm{dist}(x_i, \mu_j)$;
(8) placing the sample $x_i$ into the corresponding cluster: $C_{\lambda_i} = C_{\lambda_i} \cup \{x_i\}$;
(9) after all samples have been clustered, computing the new mean centre of each cluster, $\mu_i' = \frac{1}{|C_i|} \sum_{x \in C_i} x$; if $\mu_i'$ and $\mu_i$ are unequal, updating the cluster centre to $\mu_i'$, and if they are equal, keeping $\mu_i$ unchanged; then recomputing the cluster to which each sample belongs;
(10) repeating sub-step (9) until no cluster centre changes or the maximum number of iterations is reached;
(11) outputting the cluster division $C = \{C_1, C_2, \dots, C_k\}$.
4. The multi-branch random forest data classification method of claim 1, characterised in that step (3) comprises the following sub-steps:
(1) completing the construction with a training set of known labels: providing the training set and pre-processing it with the K-means algorithm to obtain the clusters $C = \{C_1, C_2, \dots, C_k\}$;
(2) using the bootstrap sampling method to complete the sampling of each cluster $C_i$ and construct the multi-branch random forest.
5. The multi-branch random forest data classification method of claim 4, characterised in that sub-step (1) of step (3) proceeds as follows:
1) computing the density value $p_{ij}$ of each sample point from the pairwise distances $d_{ijk} = \lVert x_{ij} - x_{kj} \rVert$, where $p_{ij}$ is the density of the i-th sample point in class j, $n_j$ is the total number of sample points in class j, and $d_{ijk}$ is the distance between the sample points $x_{ij}$ and $x_{kj}$ in the vector space; and taking the sample point with the largest density value $p_{ij}$ as the first cluster centre;
2) also considering distance in the selection of the remaining cluster centres: for a given sample $y_n$, normalising the distance from the sample point to it, giving the normalised distance $D_{ijt}$;
3) forming $w_{ij}$, the sum of the density value of the sample point and its normalised distances to the already-selected cluster centres, $w_{ij} = p_{ij} + \sum_t D_{ijt}$, where $p_{ij}$ denotes the density of the i-th sample point in class j and $D_{ijt}$ denotes the normalised distance from the sample point $x_{ij}$ to the centre $y_t$ of the t-th selected class; the number of clusters K being determined by the elbow method;
4) sorting the $w_{ij}$ from large to small and selecting the first k−1 sample points, together with the point of maximum $p_{ij}$ value, as the initial cluster centres $C_1, C_2, \dots, C_k$;
5) taking $c_1, c_2, \dots, c_k$ as the initial cluster centres, re-denoted $\mu_1, \mu_2, \dots, \mu_k$, and setting the maximum number of iterations R;
6) computing the distance of each sample to every cluster centre, $\mathrm{dist}(x_i, \mu_j) = \lVert x_i - \mu_j \rVert_2$, where i = 1, 2, …, N and j = 1, 2, …, k;
7) assigning $x_i$ the cluster label of the nearest cluster centre: $\lambda_i = \arg\min_{j \in \{1, 2, \dots, k\}} \mathrm{dist}(x_i, \mu_j)$;
8) placing the sample $x_i$ into the corresponding cluster: $C_{\lambda_i} = C_{\lambda_i} \cup \{x_i\}$;
9) after all samples have been clustered, computing the new mean centre of each cluster, $\mu_i' = \frac{1}{|C_i|} \sum_{x \in C_i} x$; if $\mu_i'$ and $\mu_i$ are unequal, updating the cluster centre to $\mu_i'$, and if they are equal, keeping $\mu_i$ unchanged; then recomputing the cluster to which each sample belongs;
10) repeating process 9) until no cluster centre changes or the maximum number of iterations is reached;
11) outputting the cluster division $C = \{C_1, C_2, \dots, C_k\}$.
6. The multi-branch random forest data classification method of claim 4, characterised in that sub-step (2) of step (3) proceeds as follows:
1) using bootstrap sampling, drawing with replacement from the cluster $C_i$ to obtain T training sets $D_i$, each containing m training samples;
2) supposing each sample has M features, randomly selecting m features (m < M) for the splitting of a base decision tree and, for each feature A and each of its values a, computing the Gini index Gini(D, A);
3) choosing the optimal feature and optimal cut point: among all features A and all cut points a, the A and a with the smallest Gini index are the optimal feature and optimal cut point, and they serve as the tree node; splitting the data set $D_i$ into two child nodes according to the optimal feature and optimal cut point;
4) recursively applying processes 2) and 3) to the child nodes until the Gini index of the data set falls below a predetermined value, which completes the construction of a base decision tree;
5) forming the multi-branch random forest from the base decision trees.
7. The multi-branch random forest data classification method of claim 6, characterised in that the Gini index Gini(D, A) of sub-step (2) of step (3) is defined as follows: for a given sample set D, if the set of samples belonging to class $c_k$ is $C_k$, then
$$\mathrm{Gini}(D) = 1 - \sum_{k} \left( \frac{|C_k|}{|D|} \right)^{2} ;$$
and under the condition of feature A, according to whether the given feature A takes a certain possible value a, the sample set D is divided into two subsets $D_1 = \{(x, y) \in D \mid A(x) = a\}$ and $D_2 = D \setminus D_1$; then
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2) .$$
8. The multi-branch random forest data classification method of claim 1, characterised in that step (4) comprises the following sub-steps:
(1) inputting the clusters $C = \{C_1, C_2, \dots, C_k\}$ output by the clustering of step (2) into the multi-branch random forest in turn;
(2) denoting the output of base decision tree $h_i$ on category label $c_j$ as $h_i^j(x)$;
(3) determining the class of each sample by relative majority voting, $H(x) = c_{\arg\max_{j} \sum_{i=1}^{T} h_i^j(x)}$, and repeating sub-steps (2) and (3) until all clusters have been classified;
(4) outputting the classification result.
CN201811273813.2A (filed 2018-10-30, priority 2018-10-30) Multi-branch random forest data classification method — Pending — published as CN109492682A

Priority Applications (1)

Application Number: CN201811273813.2A · Priority Date: 2018-10-30 · Filing Date: 2018-10-30 · Title: Multi-branch random forest data classification method

Publications (1)

Publication Number: CN109492682A · Publication Date: 2019-03-19

Family ID: 65691759

Family Applications (1): CN201811273813.2A — Multi-branch random forest data classification method (pending)

Country Status (1): CN — CN109492682A (en)



Legal Events

  • PB01 — Publication
  • SE01 — Entry into force of request for substantive examination
  • WD01 — Invention patent application deemed withdrawn after publication (application publication date: 2019-03-19)