CN109325511A - Method for improving feature selection - Google Patents

Method for improving feature selection

Info

Publication number
CN109325511A
Authority
CN
China
Prior art keywords
feature
correlation
document
value
rdc
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810859899.0A
Other languages
Chinese (zh)
Other versions
CN109325511B (en)
Inventor
Wang Haitao (汪海涛)
Tang Kang (唐康)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201810859899.0A
Publication of CN109325511A
Application granted
Publication of CN109325511B
Status: Active (granted)

Classifications

    • G06F18/211 Pattern recognition - Selection of the most significant subset of features
    • G06F18/213 Pattern recognition - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F40/289 Handling natural language data - Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a method for improving feature selection, belonging to the technical field of feature selection in high-dimensional feature spaces. The method first uses the RDC (relative discrimination criterion) measure to compute the relevance of each feature, then uses the Pearson correlation coefficient to compute the correlation between features, and finally selects the best features one by one by computing an M value defined by the invention. The method not only selects the most relevant features in the feature space but also uses the correlation measure to account for the redundancy among them; it can filter redundant and irrelevant features out of the feature space, select an optimal feature subset, and reduce the dimensionality of the feature space, thereby improving the performance of text classification.

Description

Method for improving feature selection
Technical field
The present invention relates to a method for improving feature selection, and belongs to the technical field of feature selection in high-dimensional feature spaces.
Background art
The internet is the creator of the big data era, and its rapid development has made data volumes grow explosively. Such enormous volumes of data bring both a rare opportunity and a great challenge. Much valuable information is drowned in a flood of useless content, making it difficult for people to obtain the information they need; how to mine the information people need from massive data has therefore become a key research direction. Text classification has become an important research topic and is widely studied and applied in machine learning, information retrieval, and spam filtering. Applying text classification in these fields has many advantages. For the classification management of digital libraries, it greatly shortens the time needed to classify and organize documents compared with manual methods. In information retrieval, text classification divides text into relevant and irrelevant categories and filters out useless search results, significantly improving retrieval accuracy and speed. Current text classification techniques and theory are relatively mature and have achieved good results. With the development of the mobile internet, however, text data shows many new characteristics. Social networks based on Weibo, WeChat, communities, and forums are popular, and short-text data is steadily increasing. In addition, new changes such as the growing number of text categories, imbalanced category distributions, and difficult category labeling pose enormous challenges to text classification. Considerable room for improvement remains, and further study is needed to improve its effectiveness. During text classification, a document is usually modeled as a vector space in which each word is treated as a feature. In the vector model of a document, the value of a feature can be its term frequency or its term frequency-inverse document frequency (tf-idf).
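As a concrete illustration of this vector model, the following is a minimal sketch using scikit-learn (the library choice and the toy corpus are ours, not the patent's), building both the raw term-frequency matrix and its tf-idf counterpart:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ["cat fish", "cat mouse fish", "dog mouse"]  # toy corpus

    cv = CountVectorizer()
    tf = cv.fit_transform(docs)                    # raw term frequencies
    tfidf = TfidfVectorizer().fit_transform(docs)  # tf-idf weighting

    print(cv.get_feature_names_out())  # the feature words (one per column)
    print(tf.toarray())                # one row per document
    print(tfidf.toarray().round(2))

Each row of either matrix is the vector model of one document, with one column per feature word.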
One of the most important problems in text classification is handling the high dimensionality of the feature space. Especially in text categorization tasks involving large vocabularies, high dimensionality leads to increased computational cost and reduced classification performance. Feature selection and feature extraction are the two main methods for reducing the dimensionality of the text feature space. Feature selection has received much attention in recent years; it aims to use some strategy to select an optimal subset of the original feature set from the data, so as to aid the learning of subsequent target tasks. The goals of feature selection cover three aspects: (1) improving the predictive performance of the target model; (2) reducing the training and prediction time of the target model and improving efficiency; and (3) revealing the implications in the data and the process by which the data were generated. In short, feature selection makes the data more concise and effective while helping us understand the data better. As a primary step of data processing, feature selection can reduce the scale of big data and lower the difficulty of learning the target model; for high-dimensional data it reduces the dimensionality to overcome the "curse of dimensionality" and prevent model overfitting. Especially when learning from high-dimensional data, the difficulty and cost of analysis and learning grow exponentially with the data dimension: more complex models must be learned to improve representational power, which in turn requires exponentially more data to support their training. If the data volume is too small, the model overfits and generalizes poorly. Performing feature selection on the data is therefore very necessary, but finding an optimal feature set as a representation of the data within the enormous subset space of the original feature set is very difficult. Feature extraction refers to the process of generating a small group of new features by merging or transforming the original form, whereas feature selection reduces the spatial dimension by selecting the most significant features. Feature selection methods can be divided into four classes: filter, wrapper, embedded, and hybrid methods. Filter methods perform statistical analysis on the feature space to select a discriminative subset of features. A feature selection method should identify and remove as many irrelevant and redundant features as possible. Most feature selection methods can effectively remove irrelevant features but cannot handle redundant ones.
Summary of the invention
The technical problem to be solved by the present invention is to provide an improved feature selection algorithm that overcomes the above deficiencies of the prior art. The algorithm can filter redundant and irrelevant features out of the feature space and select an optimal feature subset, thereby achieving dimensionality reduction and further improving the effectiveness of text classification.
The technical solution adopted by the present invention is as follows: a method for improving feature selection comprises the following steps:
Step1: input the number of features k that the final feature space will contain; create a new empty set S; let F be the set of all features of document set D;
Step2: traverse each feature f_s in F and compute its relevance value RDC(f_s) using the following equation group:

RDC(w_i) = AUC(w_i, tc_m),

where w_i is the feature word; df_pos(w_i) and df_neg(w_i) are the numbers of positive-class and negative-class documents containing the word w_i; tc_j(w_i) is the count of word w_i in document j; AUC(w_i, tc_j) is the area under the ROC curve of feature word w_i at term count tc_j; tc_{j-1} and tc_{j+1} are the counts of the feature word in documents j-1 and j+1; and tc_m is its count in the last document m;
Step3: sort the features by the RDC values computed in Step2;
Step4: select the feature f_max with the largest RDC value;
Step5: add f_max to set S;
Step6: remove f_max from set F;
Step7: traverse set F and initialize sum(f_i) = 0 for each feature;
Step8: traverse set F; for each feature f_i, compute its correlation Correlation(f_i, f_s) with each feature f_s in S and accumulate sum(f_i) = sum(f_i) + Correlation(f_i, f_s);
Step9: for each feature f_i in set F, compute its value M(f_i) using the following formula:

M(f_i) = RDC(f_i) - sum(f_i),
where RDC(f_i) is the relevance of feature f_i, and Correlation(f_i, f_j) denotes the correlation between two features f_i and f_j defined by their similarity, computed with the Pearson correlation coefficient:

Correlation(f_i, f_j) = Σ_d (f_{i,d} - mean(f_i)) (f_{j,d} - mean(f_j)) / sqrt( Σ_d (f_{i,d} - mean(f_i))² · Σ_d (f_{j,d} - mean(f_j))² ),

where f_{i,d} and f_{j,d} are the term frequencies of feature words i and j in the d-th document, and mean(f_i) and mean(f_j) are the average term frequencies of f_i and f_j over the document set; Correlation(f_i, f_j) = 1 indicates maximum positive correlation, Correlation(f_i, f_j) = -1 indicates maximum negative correlation, and its value lies between -1 and 1;
Step10: select the feature f_max with the largest M value;
Step11: add f_max to set S;
Step12: remove f_max from set F;
Step13: repeat Step8-Step12 until the number of features in set S equals k;
Step14: set S is the final selected feature set.
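For concreteness, the following is a minimal Python sketch of Step1-Step14; it is our illustration, not part of the patent text. It assumes the RDC relevance scores are supplied as a precomputed array (the patent's RDC equation group is reproduced only in part above), and it accumulates each feature's correlation only against the most recently selected feature, which charges every pair in S exactly once and is equivalent to re-traversing all of S on every round:

    import numpy as np

    def pearson(x, y):
        # Pearson correlation of two term-frequency columns (Step9's formula).
        xc, yc = x - x.mean(), y - y.mean()
        denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
        return 0.0 if denom == 0 else float((xc * yc).sum() / denom)

    def select_features(tf, rdc, k):
        # tf:  (n_docs, n_features) term-frequency matrix
        # rdc: precomputed RDC relevance score of each feature (Step2)
        # k:   number of features to keep (Step1)
        # Returns the indices of the k selected features, in selection order.
        F = set(range(len(rdc)))
        best = max(sorted(F), key=lambda i: rdc[i])   # Step3-Step4: largest RDC
        S = [best]                                    # Step5
        F.remove(best)                                # Step6
        sums = {i: 0.0 for i in F}                    # Step7
        while F and len(S) < k:
            for i in F:                               # Step8: accumulate redundancy
                sums[i] += pearson(tf[:, i], tf[:, S[-1]])
            # Step9-Step10: M(f_i) = RDC(f_i) - sum(f_i); take the largest,
            # breaking ties by the lower feature index (unspecified in the patent).
            best = max(sorted(F), key=lambda i: rdc[i] - sums[i])
            S.append(best)                            # Step11
            F.remove(best)                            # Step12
        return S                                      # Step13-Step14

A call such as select_features(tf, rdc, k=4) on a term-frequency matrix like the one in Table 2 below returns four column indices in the order they were selected.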
The beneficial effects of the present invention are:
1. The precision of the invention is higher than that of the conventional RDC method;
2. The invention removes redundant and irrelevant features from the feature space, achieving further feature-space dimensionality reduction.
Detailed description of the invention
Fig. 1 is the flow chart of the method of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1: as shown in Fig. 1, a method for improving feature selection comprises the following steps.
The present invention first uses the RDC (relative discrimination criterion) measure to compute the relevance of each feature, then uses the Pearson correlation coefficient to compute the correlation between features, and finally selects the best features one by one by computing the M value defined by the invention.
The details are as follows:
Step1: input the number of features k that the final feature space will contain (the value of k is set according to the actual situation and is not specifically limited here); create a new empty set S; let F be the set of all features of document set D;
Step2: traverse each feature f_s in F and compute its relevance value RDC(f_s) using the following equation group:

RDC(w_i) = AUC(w_i, tc_m),

where w_i is the feature word; df_pos(w_i) and df_neg(w_i) are the numbers of positive-class and negative-class documents containing the word w_i; tc_j(w_i) is the count of word w_i in document j; AUC(w_i, tc_j) is the area under the ROC curve of feature word w_i at term count tc_j; tc_{j-1} and tc_{j+1} are the counts of the feature word in documents j-1 and j+1; and tc_m is its count in the last document m;
Step3: sort the features by the RDC values computed in Step2;
Step4: select the feature f_max with the largest RDC value;
Step5: add f_max to set S;
Step6: remove f_max from set F;
Step7: traverse set F and initialize sum(f_i) = 0 for each feature;
Step8: traverse set F; for each feature f_i, compute its correlation Correlation(f_i, f_s) with each feature f_s in S and accumulate sum(f_i) = sum(f_i) + Correlation(f_i, f_s);
Step9: for each feature f_i in set F, compute its value M(f_i) using the following formula:

M(f_i) = RDC(f_i) - sum(f_i),
where RDC(f_i) is the relevance of feature f_i, and Correlation(f_i, f_j) denotes the correlation between two features f_i and f_j defined by their similarity, computed with the Pearson correlation coefficient:

Correlation(f_i, f_j) = Σ_d (f_{i,d} - mean(f_i)) (f_{j,d} - mean(f_j)) / sqrt( Σ_d (f_{i,d} - mean(f_i))² · Σ_d (f_{j,d} - mean(f_j))² ),

where f_{i,d} and f_{j,d} are the term frequencies of feature words i and j in the d-th document, and mean(f_i) and mean(f_j) are the average term frequencies of f_i and f_j over the document set; Correlation(f_i, f_j) = 1 indicates maximum positive correlation, Correlation(f_i, f_j) = -1 indicates maximum negative correlation, and its value lies between -1 and 1;
Step10: select the feature f_max with the largest M value;
Step11: add f_max to set S;
Step12: remove f_max from set F;
Step13: repeat Step8-Step12 until the number of features in set S equals k;
Step14: set S is the final selected feature set.
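Before the worked example, a quick sanity check (ours, not the patent's) that the Pearson formula of Step9 agrees with numpy's built-in corrcoef on a pair of small term-frequency vectors:

    import numpy as np

    fi = np.array([1, 1, 0, 1, 2, 0], dtype=float)  # arbitrary term counts
    fj = np.array([1, 1, 1, 2, 2, 1], dtype=float)

    num = ((fi - fi.mean()) * (fj - fj.mean())).sum()
    den = np.sqrt(((fi - fi.mean()) ** 2).sum() * ((fj - fj.mean()) ** 2).sum())
    manual = num / den

    assert np.isclose(manual, np.corrcoef(fi, fj)[0, 1])
    print(round(manual, 3))  # a value between -1 and 1, as Step9 requires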
The present invention is described in detail below with a specific example:
Table 1. A simple data set (with only two classes)
Document    Class     Content
D1          Positive  cat, fish
D2          Positive  cat, mouse, fish
D3          Positive  mouse, fish
D4          Positive  mouse, cat, fish, mouse, fish
D5          Positive  fish, cat, fish, cat
D6          Positive  fish, mouse
D7          Negative  dog, mouse
D8          Negative  dog, dog
D9          Negative  fish, fish, mouse
D10         Negative  mouse
D11         Negative  cat, fish
D12         Negative  dog, fish
Table 1 gives a simple synthetic data set. It consists of 12 documents and contains 4 words: 'cat', 'dog', 'mouse', and 'fish'. Each document in this data set belongs to either the positive or the negative class.
Table 2. Term frequencies of the data set
Document    f1 (cat)  f2 (fish)  f3 (mouse)  f4 (dog)  f5 (fish)
D1          1         1          0           0         1
D2          1         1          1           0         1
D3          0         1          1           0         1
D4          1         2          2           0         2
D5          2         2          0           0         2
D6          0         1          1           0         1
D7          0         0          1           1         0
D8          0         0          0           2         0
D9          0         2          1           0         2
D10         0         0          1           0         0
D11         1         1          0           0         0
D12         0         1          0           1         1
Table 2 shows the matrix form (i.e., the vector model) of the data set. First, the term frequency of each word in each document is computed. To highlight the effectiveness of the invention, one of the features, f2, is duplicated and added to the data set as a new feature f5, so that features f2 and f5 are perfectly correlated. The purpose of feature selection is to choose highly relevant features that have minimal correlation with one another. f2 and f5 contain the same information, so one of them is superfluous: one of the two gets a higher M value, while the redundant one gets a relatively low M value. The calculations below show that the RDC values of f2 and f5 are the same, while their M values differ.
RDC(f1) = (2+5)/2 + (5+0)/2 = 6
RDC(f2) = RDC(f5) = (1+0.5)/2 + (0.5+0)/2 = 1
RDC(f3) = (0+5)/2 + (5+0)/2 = 5
RDC(f4) = (20+5)/2 + (5+0)/2 = 15
Repeating these calculations with the proposed M formula gives the results below, where the M values of f2 and f5 are no longer identical. According to the formula M(f_i) = RDC(f_i) - sum(f_i), the final M value of a feature is determined by its relevance (the first term on the right of the equation) and its redundancy (the second term). Since the two features f2 and f5 are almost identical, the correlation between them is close to 1; consequently, if f_j is selected before f_i, then f_i is charged the redundancy between them, and its M value ends up lower than f_j's.
Table 3. RDC values and the M values proposed by the invention
Method    f1 (cat)  f2 (fish)  f3 (mouse)  f4 (dog)  f5 (fish)
RDC       6         1          5           15        1
M         5.902     -0.193     4.63        15        -0.84
Table 3 compares the RDC and M values of these features. It can be seen that f2 and f5 have the same RDC value, while the two features have different M values. In this example, f2 and f5 are identical features, but the M value of f5 is lower than that of f2 even though their RDC values are the same. Using the M value, whether f2 and f5 are selected or discarded depends on the threshold k (the predefined size of the final feature subset): if k = 3, neither f2 nor f5 is selected; if k = 4, f2 is selected and f5 is rejected; and if k = 5, both f2 and f5 are selected.
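To see this threshold behaviour end to end, the greedy loop can be rerun with the RDC values of Table 3. The correlation matrix below is an illustrative assumption, since only Correlation(f2, f5) = 1 is forced by f5 duplicating f2; the intermediate M values therefore do not reproduce Table 3 exactly, but the selections for k = 3, 4 and 5 match the discussion above:

    import numpy as np

    # RDC values from Table 3 for f1..f5; all cross-correlations are assumed
    # to be 0 for illustration, except Correlation(f2, f5) = 1.0.
    rdc = np.array([6.0, 1.0, 5.0, 15.0, 1.0])
    corr = np.eye(5)
    corr[1, 4] = corr[4, 1] = 1.0

    def select(rdc, corr, k):
        F = set(range(len(rdc)))
        best = max(sorted(F), key=lambda i: rdc[i])
        S = [best]; F.remove(best)
        sums = {i: 0.0 for i in F}
        while F and len(S) < k:
            for i in F:                   # Step8: accumulate redundancy
                sums[i] += corr[i, S[-1]]
            best = max(sorted(F), key=lambda i: rdc[i] - sums[i])
            S.append(best); F.remove(best)
        return S

    for k in (3, 4, 5):
        print(k, [f"f{i + 1}" for i in select(rdc, corr, k)])
    # k=3 -> ['f4', 'f1', 'f3']: neither fish feature is kept.
    # k=4 additionally keeps f2 (the tie with f5 falls to the lower index).
    # k=5 finally admits the redundant f5 as the last remaining feature.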
The overall flow of this example is as follows.
For document D4, with k = 4:
Step 1: initialize the empty set S; set F contains all the features, i.e., {cat, fish, mouse, dog, fish};
Step 2: for each feature in set F, compute its RDC relevance value using the formula: f(cat) = 6, f(fish) = 1, f(mouse) = 5, f(dog) = 15, f(fish) = 1;
Step 3: sort by RDC value: {f(dog) = 15, f(cat) = 6, f(mouse) = 5, f(fish) = 1, f(fish) = 1};
Step 4: select the feature with the largest RDC value, f(dog);
Step 5: add f(dog) to set S;
Step 6: remove f(dog) from set F;
Step 7: traverse set F and set sum(f_i) = 0 for each feature;
Step 8: for each feature remaining in set F, compute its correlation with the feature f(dog) already in S and accumulate it, i.e.:
sum(f(cat)) = sum(f(cat)) + Correlation(f(cat), f(dog)),
sum(f(mouse)) = sum(f(mouse)) + Correlation(f(mouse), f(dog)),
sum(f(fish)) = sum(f(fish)) + Correlation(f(fish), f(dog)), and likewise for the duplicated fish feature f5;
Step 9: compute the M value of each feature in set F: M(cat) = 5.902, M(fish) = -0.193, M(mouse) = 4.63, and M = -0.84 for the duplicated fish feature f5;
Step 10: select the feature with the largest M value, i.e., M(cat) = 5.902, and put f(cat) into set S;
Step 11: remove f(cat) from set F;
Step 12: repeat Steps 8-11 until the number of features in set S equals 4.
The final selected feature space is {dog, mouse, cat, fish}.
The present invention not only selects the most relevant features in the feature space but also uses the correlation measure to account for the redundancy among them; it can filter redundant and irrelevant features out of the feature space, select an optimal feature subset, and reduce the dimensionality of the feature space, thereby improving the performance of text classification.
The embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the present invention is not limited to the above embodiments; various changes may also be made within the knowledge of a person skilled in the art without departing from the concept of the invention.

Claims (1)

1. A method for improving feature selection, characterized by comprising the following steps:
Step1: input the number of features k that the final feature space will contain; create a new empty set S; let F be the set of all features of document set D;
Step2: traverse each feature f_s in F and compute its relevance value RDC(f_s) using the following equation group:

RDC(w_i) = AUC(w_i, tc_m),

where w_i is the feature word; df_pos(w_i) and df_neg(w_i) are the numbers of positive-class and negative-class documents containing the word w_i; tc_j(w_i) is the count of word w_i in document j; AUC(w_i, tc_j) is the area under the ROC curve of feature word w_i at term count tc_j; tc_{j-1} and tc_{j+1} are the counts of the feature word in documents j-1 and j+1; and tc_m is its count in the last document m;
Step3: sort the features by the RDC values computed in Step2;
Step4: select the feature f_max with the largest RDC value;
Step5: add f_max to set S;
Step6: remove f_max from set F;
Step7: traverse set F and initialize sum(f_i) = 0 for each feature;
Step8: traverse set F; for each feature f_i, compute its correlation Correlation(f_i, f_s) with each feature f_s in S and accumulate sum(f_i) = sum(f_i) + Correlation(f_i, f_s);
Step9: for each feature f_i in set F, compute its value M(f_i) using the following formula:

M(f_i) = RDC(f_i) - sum(f_i),
where RDC(f_i) is the relevance of feature f_i, and Correlation(f_i, f_j) denotes the correlation between two features f_i and f_j defined by their similarity, computed with the Pearson correlation coefficient:

Correlation(f_i, f_j) = Σ_d (f_{i,d} - mean(f_i)) (f_{j,d} - mean(f_j)) / sqrt( Σ_d (f_{i,d} - mean(f_i))² · Σ_d (f_{j,d} - mean(f_j))² ),

where f_{i,d} and f_{j,d} are the term frequencies of feature words i and j in the d-th document, and mean(f_i) and mean(f_j) are the average term frequencies of f_i and f_j over the document set; Correlation(f_i, f_j) = 1 indicates maximum positive correlation, Correlation(f_i, f_j) = -1 indicates maximum negative correlation, and its value lies between -1 and 1;
Step10: select the feature f_max with the largest M value;
Step11: add f_max to set S;
Step12: remove f_max from set F;
Step13: repeat Step8-Step12 until the number of features in set S equals k;
Step14: set S is the final selected feature set.
CN201810859899.0A 2018-08-01 2018-08-01 Method for improving feature selection Active CN109325511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810859899.0A CN109325511B (en) 2018-08-01 2018-08-01 Method for improving feature selection


Publications (2)

Publication Number Publication Date
CN109325511A true CN109325511A (en) 2019-02-12
CN109325511B CN109325511B (en) 2020-07-31

Family

ID=65264054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810859899.0A Active CN109325511B (en) 2018-08-01 2018-08-01 Method for improving feature selection

Country Status (1)

Country Link
CN (1) CN109325511B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188143A * 2019-04-04 2019-08-30 Shanghai Power Equipment Research Institute Co., Ltd. Method for diagnosing induced-draft-fan vibration faults in power plants
CN110426612A * 2019-08-17 2019-11-08 Fuzhou University Two-stage method for selecting time-domain dielectric response characteristic quantities of transformer oil-paper insulation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005081158A3 (en) * 2004-02-23 2006-03-02 Novartis Ag Use of feature point pharmacophores (fepops)
CN102200981A (en) * 2010-03-25 2011-09-28 三星电子(中国)研发中心 Feature selection method and feature selection device for hierarchical text classification
CN103177121A * 2013-04-12 2013-06-26 Tianjin University Locality preserving projection method incorporating the Pearson correlation coefficient
CN105512311A * 2015-12-14 2016-04-20 Beijing University of Technology Adaptive feature selection method based on the chi-square statistic

Also Published As

Publication number Publication date
CN109325511B (en) 2020-07-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
    Inventor after: Wang Haitao
    Inventor before: Wang Haitao
    Inventor before: Tang Kang
GR01 Patent grant