CN109325511A - Method for improving feature selection - Google Patents
Method for improving feature selection
- Publication number
- CN109325511A (application CN201810859899.0A; granted as CN109325511B)
- Authority
- CN
- China
- Prior art keywords
- feature
- correlation
- document
- value
- rdc
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS; G06—COMPUTING, CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition; G06F18/20—Analysing; G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation; G06F18/211—Selection of the most significant subset of features
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F40/00—Handling natural language data; G06F40/20—Natural language analysis; G06F40/279—Recognition of textual entities; G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a method for improving feature selection, belonging to the field of high-dimensional feature selection. The method first computes the relevance of each feature using the RDC (Relative Discrimination Criterion) measure, then computes the correlation between features using the Pearson correlation coefficient, and finally selects optimal features one at a time by computing an M value defined by the present invention. The method not only selects the most relevant features in the feature space but also accounts for the redundancy between them via the correlation measure; it can filter redundant and irrelevant features from the feature space, select an optimal feature subset, and reduce the dimensionality of the feature space, thereby improving text classification performance.
Description
Technical field
The present invention relates to a method for improving feature selection, belonging to the field of high-dimensional feature selection.
Background technique
The Internet created the big data era: its rapid development has produced explosive growth in data volume. Such enormous volumes of data bring rare opportunity but also great challenge. Much valuable information is buried in large amounts of clutter, making it difficult for people to obtain the information they need, so how to mine the information people need from massive data has become a key research direction. Text classification has become an important research topic and is widely studied and applied in machine learning, information retrieval, and spam filtering. Applying text classification in these fields has many advantages. For the classification management of digital libraries, it greatly shortens the time needed to organize and classify documents compared with manual methods. In information retrieval, text classification divides text into relevant and irrelevant categories and filters out useless search results, significantly improving retrieval accuracy and speed.
Current text classification techniques and theory are relatively mature and have achieved good results. With the development of the mobile Internet, however, text data has taken on many new characteristics. For example, with the popularity of social networks such as Weibo, WeChat, online communities, and discussion forums, short-text data is steadily increasing. In addition, new changes such as the growing number of text categories, uneven category distribution, and the difficulty of labeling categories also pose great challenges to text classification. There is still considerable room for improvement in text classification, and further study is needed to improve its performance.
In text classification, a document is usually modeled as a vector space in which each word is treated as a feature. In the vector model of a document, the value of a feature can be its term frequency or its term frequency-inverse document frequency (tf-idf). One of the most important problems in text classification is handling the high dimensionality of the feature space. High dimensionality, especially in text classification tasks involving a large vocabulary, leads to increased computational cost and reduced classification performance. Feature selection and feature extraction are the two main methods for reducing the dimensionality of the text feature space.
Feature selection has received much attention in recent years. It aims to select an optimal subset of the original feature set from the data according to some strategy, so as to support subsequent learning tasks. The goals of feature selection are threefold: (1) improve the predictive performance of the target model; (2) reduce the training and prediction time of the target model and improve efficiency; (3) reveal the meaning in the data and the process that generated it. In short, feature selection makes data more concise and effective while helping us understand it better. As a first step of data processing, feature selection can reduce the scale of big data and ease the difficulty of learning the target model; for high-dimensional data, it reduces dimensionality to overcome the "curse of dimensionality" and prevent model overfitting. Especially when learning from high-dimensional data, the difficulty and cost of analysis and learning grow exponentially with the data dimension: more complex models must be learned to improve representational power, and exponentially more data is needed to support learning such models. If the data volume is too small, the model overfits and generalizes poorly. Feature selection is therefore essential, but finding an optimal feature set within the enormous subset space of the original feature set as a representation of the data is very difficult.
Feature extraction generates a small set of new features by merging or transforming the original ones, whereas feature selection reduces the dimensionality of the space by selecting the most significant features. Feature selection methods can be divided into four classes: filter, wrapper, embedded, and hybrid. Filter methods perform statistical analysis on the feature space to select a discriminative subset of features. A feature selection method should be able to identify and remove as many irrelevant and redundant features as possible. Most feature selection methods can effectively remove irrelevant features, but cannot handle redundant features.
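As background for the vector model mentioned above, a tf-idf weighting can be computed as in the following sketch. This is a standard smoothed variant for illustration only; the patent itself does not fix a specific tf-idf formula, and the toy documents here are hypothetical:

```python
import math
from collections import Counter

def tfidf(docs):
    """tf-idf weights for a list of tokenized documents.

    Uses the common weighting tf * log(N / df); the patent does not
    prescribe a particular variant, so this is only an illustration.
    """
    n = len(docs)
    # document frequency: number of documents containing each word
    df = Counter(w for d in docs for w in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)  # raw term frequency within this document
        weights.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return weights

docs = [["cat", "fish"], ["dog", "fish"], ["mouse"]]
w = tfidf(docs)
```

A word appearing in every document gets weight 0, while rarer words are weighted up, which is exactly why tf-idf is preferred over raw frequency in the vector model.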
Summary of the invention
The technical problem to be solved by the present invention is to provide an improved feature selection algorithm that overcomes the above deficiencies in the prior art. The algorithm can filter redundant and irrelevant features from the feature space and select an optimal feature subset, achieving dimensionality reduction and further improving the effectiveness of text classification.
The technical solution adopted by the present invention is as follows. A method for improving feature selection comprises the following steps:
Step 1: Input k, the number of features the final feature space should contain; create a new empty set S; let F be the set of all features of document collection D.
Step 2: For each feature f_s in F, compute its relevance value RDC(f_s), i.e., compute the RDC value using the following equation:
RDC(W_i) = AUC(w_i, tc_m),
where W_i is the feature word; df_pos(w_i) and df_neg(w_i) are the numbers of documents that do and do not contain word w_i, respectively; tc_j(w_i) is the count of word w_i in document j; AUC(w_i, tc_j) is the area under the ROC curve of feature word w_i at term count tc_j; tc_{j-1} and tc_{j+1} are the counts of the feature word in documents j-1 and j+1; and tc_m is its count in the last document m.
Step 3: Sort the features by the RDC values computed in Step 2.
Step 4: Select the feature f_max with the largest RDC value.
Step 5: Add f_max to set S.
Step 6: Remove f_max from set F.
Step 7: Traverse set F and set sum(f_i) = 0 for each feature.
Step 8: Traverse set F; for each feature f_i, compute its correlation Correlation(f_i, f_s) with each feature f_s in S and accumulate sum(f_i) = sum(f_i) + Correlation(f_i, f_s).
Step 9: For each feature f_i in set F, compute its M value using the following formula:
M(f_i) = RDC(f_i) - sum(f_i),
where RDC(f_i) is the relevance of feature f_i and Correlation(f_i, f_j) is the correlation between the two features f_i and f_j defined by their similarity, computed with the Pearson correlation coefficient:
Correlation(f_i, f_j) = Σ_d (f_{i,d} - f̄_i)(f_{j,d} - f̄_j) / √[ Σ_d (f_{i,d} - f̄_i)² · Σ_d (f_{j,d} - f̄_j)² ],
where f_{i,d} and f_{j,d} are the term frequencies of feature words i and j in the d-th document, and f̄_i and f̄_j are the average term frequencies of f_i and f_j over the document collection. Correlation(f_i, f_j) lies between -1 and 1: a value of 1 indicates maximum positive correlation, and -1 indicates maximum negative correlation.
Step 10: Select the feature f_max with the largest M value.
Step 11: Add f_max to set S.
Step 12: Remove f_max from set F.
Step 13: Repeat Steps 8-12 until the number of features in set S equals k.
Step 14: Set S is the final selected feature set.
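The 14 steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the RDC scores are assumed to be precomputed (the RDC equation group is only partially reproduced above), and `pearson` is the standard Pearson correlation of Step 9:

```python
import math

def pearson(x, y):
    """Pearson correlation between two term-frequency vectors (Step 9)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def select_features(rdc, tf, k):
    """Greedy selection of k features by M(f) = RDC(f) - sum(f).

    rdc: feature -> precomputed RDC relevance score (Steps 2-3)
    tf:  feature -> per-document term-frequency vector
    """
    F = set(rdc)                         # Step 1: all candidate features
    S = []                               # Step 1: selected set, initially empty
    sums = {f: 0.0 for f in F}           # Step 7: redundancy accumulators
    best = max(F, key=lambda f: rdc[f])  # Step 4: highest-RDC feature
    S.append(best)                       # Step 5
    F.remove(best)                       # Step 6
    while F and len(S) < k:              # Step 13: until |S| = k
        last = S[-1]
        for f in F:                      # Step 8: accumulate correlation with S
            sums[f] += pearson(tf[f], tf[last])
        best = max(F, key=lambda f: rdc[f] - sums[f])  # Steps 9-10: max M
        S.append(best)                   # Step 11
        F.remove(best)                   # Step 12
    return S                             # Step 14
```

On the toy data of the embodiment below, this sketch first selects the highest-RDC feature and then fills S by the M criterion, penalizing features correlated with those already chosen.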
The beneficial effects of the present invention are:
1. The precision of the present invention is higher than that of the conventional RDC method;
2. The present invention removes redundant and irrelevant features from the feature space, achieving further dimensionality reduction of the feature space.
Detailed description of the invention
Fig. 1 is a flowchart of the method of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1: As shown in Fig. 1, a method for improving feature selection includes the following steps.
The method first uses the RDC (Relative Discrimination Criterion) measure to compute the relevance of each feature, then uses the Pearson correlation coefficient to compute the correlation between features, and finally selects optimal features one at a time by computing the M value defined by the present invention.
The specific steps are as follows:
Step 1: Input k, the number of features the final feature space should contain (the value of k is set according to the actual situation and is not specifically limited here); create a new empty set S; let F be the set of all features of document collection D.
Step 2: For each feature f_s in F, compute its relevance value RDC(f_s), i.e., compute the RDC value using the following equation:
RDC(W_i) = AUC(w_i, tc_m),
where W_i is the feature word; df_pos(w_i) and df_neg(w_i) are the numbers of documents that do and do not contain word w_i, respectively; tc_j(w_i) is the count of word w_i in document j; AUC(w_i, tc_j) is the area under the ROC curve of feature word w_i at term count tc_j; tc_{j-1} and tc_{j+1} are the counts of the feature word in documents j-1 and j+1; and tc_m is its count in the last document m.
Step 3: Sort the features by the RDC values computed in Step 2.
Step 4: Select the feature f_max with the largest RDC value.
Step 5: Add f_max to set S.
Step 6: Remove f_max from set F.
Step 7: Traverse set F and set sum(f_i) = 0 for each feature.
Step 8: Traverse set F; for each feature f_i, compute its correlation Correlation(f_i, f_s) with each feature f_s in S and accumulate sum(f_i) = sum(f_i) + Correlation(f_i, f_s).
Step 9: For each feature f_i in set F, compute its M value using the following formula:
M(f_i) = RDC(f_i) - sum(f_i),
where RDC(f_i) is the relevance of feature f_i and Correlation(f_i, f_j) is the correlation between the two features f_i and f_j defined by their similarity, computed with the Pearson correlation coefficient:
Correlation(f_i, f_j) = Σ_d (f_{i,d} - f̄_i)(f_{j,d} - f̄_j) / √[ Σ_d (f_{i,d} - f̄_i)² · Σ_d (f_{j,d} - f̄_j)² ],
where f_{i,d} and f_{j,d} are the term frequencies of feature words i and j in the d-th document, and f̄_i and f̄_j are the average term frequencies of f_i and f_j over the document collection. Correlation(f_i, f_j) lies between -1 and 1: a value of 1 indicates maximum positive correlation, and -1 indicates maximum negative correlation.
Step 10: Select the feature f_max with the largest M value.
Step 11: Add f_max to set S.
Step 12: Remove f_max from set F.
Step 13: Repeat Steps 8-12 until the number of features in set S equals k.
Step 14: Set S is the final selected feature set.
The present invention is described in detail below with a specific example.
Table 1: A simple data set (only two class categories)

Document | Class | Content
---|---|---
Document D1 | Positive | cat, fish
Document D2 | Positive | cat, mouse, fish
Document D3 | Positive | mouse, fish
Document D4 | Positive | mouse, cat, fish, mouse, fish
Document D5 | Positive | fish, cat, fish, cat
Document D6 | Positive | fish, mouse
Document D7 | Negative | dog, mouse
Document D8 | Negative | dog, dog
Document D9 | Negative | fish, fish, mouse
Document D10 | Negative | mouse
Document D11 | Negative | cat, fish
Document D12 | Negative | dog, fish
Table 1 shows a simple synthetic data set. The data set consists of 12 documents and contains 4 words: 'cat', 'dog', 'mouse', and 'fish'. Each document in the data set belongs to either the positive or the negative category.
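For illustration, the term-frequency matrix of Table 2 can be derived from the documents of Table 1 with a few lines of Python (a sketch; the duplicated feature f5 is appended at the end, as described in the text below):

```python
from collections import Counter

# Documents of Table 1 (D1-D6 positive, D7-D12 negative)
docs = [
    ["cat", "fish"],                            # D1
    ["cat", "mouse", "fish"],                   # D2
    ["mouse", "fish"],                          # D3
    ["mouse", "cat", "fish", "mouse", "fish"],  # D4
    ["fish", "cat", "fish", "cat"],             # D5
    ["fish", "mouse"],                          # D6
    ["dog", "mouse"],                           # D7
    ["dog", "dog"],                             # D8
    ["fish", "fish", "mouse"],                  # D9
    ["mouse"],                                  # D10
    ["cat", "fish"],                            # D11
    ["dog", "fish"],                            # D12
]
vocab = ["cat", "fish", "mouse", "dog"]
# One term-frequency column per word, reproducing the rows of Table 2
tf = {w: [Counter(d)[w] for d in docs] for w in vocab}
tf["fish5"] = list(tf["fish"])  # duplicate f2 as the redundant feature f5
```

Each list in `tf` is one column of Table 2, e.g. `tf["cat"]` is the f1 column.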
Table 2: Term frequencies of the data set

Document | f1(cat) | f2(fish) | f3(mouse) | f4(dog) | f5(fish)
---|---|---|---|---|---
Document 1 | 1 | 1 | 0 | 0 | 1
Document 2 | 1 | 1 | 1 | 0 | 1
Document 3 | 0 | 1 | 1 | 0 | 1
Document 4 | 1 | 2 | 2 | 0 | 2
Document 5 | 2 | 2 | 0 | 0 | 2
Document 6 | 0 | 1 | 1 | 0 | 1
Document 7 | 0 | 0 | 1 | 1 | 0
Document 8 | 0 | 0 | 0 | 2 | 0
Document 9 | 0 | 2 | 1 | 0 | 2
Document 10 | 0 | 0 | 1 | 0 | 0
Document 11 | 1 | 1 | 0 | 0 | 0
Document 12 | 0 | 1 | 0 | 1 | 1
Table 2 shows the matrix form (i.e., the vector model) of the data set. First, the term frequency of each word in each document is calculated. To highlight the effectiveness of the invention, feature f2 is duplicated and added to the data set as a new feature f5; features f2 and f5 are therefore perfectly correlated. The purpose of feature selection is to select highly relevant features that have minimal correlation with one another. f2 and f5 contain identical information, so one of them is redundant: one of them receives a higher M value, while the redundant one receives a relatively low M value. As computed below, the RDC values of f2 and f5 are the same, while their M values differ.
RDC(f1) = (2+5)/2 + (5+0)/2 = 6
RDC(f2) = RDC(f5) = (1+0.5)/2 + (0.5+0)/2 = 1
RDC(f3) = (0+5)/2 + (5+0)/2 = 5
RDC(f4) = (20+5)/2 + (5+0)/2 = 15
Repeating these calculations with the proposed M formula gives different results for f2 and f5. According to the formula M(f_i) = RDC(f_i) - sum(f_i), the final M value of a feature depends on its relevance (the first term on the right of the equation) and its redundancy (the second term). Since the two features f2 and f5 are nearly identical, the correlation between them is close to 1; if f_j is selected before f_i, then f_i is assigned the redundancy penalty and its M value becomes lower than that of f_j.
Table 3: RDC values and the proposed M values

Method | f1(cat) | f2(fish) | f3(mouse) | f4(dog) | f5(fish)
---|---|---|---|---|---
RDC | 6 | 1 | 5 | 15 | 1
M | 5.902 | -0.193 | 4.63 | 15 | -0.84
Table 3 compares the RDC and M values of these features. It can be seen that f2 and f5 have the same RDC value, yet different M values: although the two features are identical, the M value of f5 is lower than that of f2. Using the M value, whether f2 and f5 are selected or discarded depends on the threshold k (the predefined size of the final feature subset): if k = 3, neither f2 nor f5 is selected; if k = 4, f2 is selected and f5 is rejected; if k = 5, both f2 and f5 are selected.
The overall flow of the example (for document D4, k = 4):
Step 1: Initialize set S as empty; set F contains all feature values of D4, i.e., {mouse, cat, fish, mouse, fish};
Step 2: For each value in set F, compute its RDC relevance using the formula: f(cat) = 6, f2(fish) = 1, f(mouse) = 5, f(dog) = 15, f5(fish) = 1;
Step 3: Sort by RDC: {f(dog) = 15, f(cat) = 6, f(mouse) = 5, f2(fish) = 1, f5(fish) = 1};
Step 4: Select the feature with the largest RDC value, f_max = f(dog);
Step 5: Add f(dog) to set S;
Step 6: Remove f(dog) from set F;
Step 7: Traverse set F, setting sum(f_i) = 0 for each feature;
Step 8: For each feature f_i in set F, compute Correlation(f_i, f(dog)) with the feature f(dog) in S and accumulate it:
sum(f(cat)) = sum(f(cat)) + Correlation(f(cat), f(dog)),
sum(f2(fish)) = sum(f2(fish)) + Correlation(f2(fish), f(dog)),
sum(f(mouse)) = sum(f(mouse)) + Correlation(f(mouse), f(dog)),
sum(f5(fish)) = sum(f5(fish)) + Correlation(f5(fish), f(dog));
Step 9: Compute the M value of each value in set F: M(cat) = 5.902, M(f2(fish)) = -0.193, M(mouse) = 4.63, M(f5(fish)) = -0.84;
Step 10: Select the feature with the largest M value, i.e., M(cat) = 5.902, and put f(cat) into set S;
Step 11: Remove f(cat) from set F;
Step 12: Repeat Steps 8-11 until the number of features in set S equals 4.
The final selected feature space is {dog, mouse, cat, fish}.
The present invention not only selects the most relevant features in the feature space but also accounts for the redundancy between them via the correlation measure. It can filter redundant and irrelevant features from the feature space, select an optimal feature subset, and reduce the dimensionality of the feature space, thereby improving text classification performance.
The embodiments of the present invention have been explained in detail above with reference to the drawings, but the present invention is not limited to the above embodiments; various changes can also be made within the knowledge of a person skilled in the art without departing from the concept of the present invention.
Claims (1)
1. A method for improving feature selection, characterized by comprising the following steps:
Step 1: Input k, the number of features the final feature space should contain; create a new empty set S; let F be the set of all features of document collection D.
Step 2: For each feature f_s in F, compute its relevance value RDC(f_s), i.e., compute the RDC value using the following equation:
RDC(W_i) = AUC(w_i, tc_m),
where W_i is the feature word; df_pos(w_i) and df_neg(w_i) are the numbers of documents that do and do not contain word w_i, respectively; tc_j(w_i) is the count of word w_i in document j; AUC(w_i, tc_j) is the area under the ROC curve of feature word w_i at term count tc_j; tc_{j-1} and tc_{j+1} are the counts of the feature word in documents j-1 and j+1; and tc_m is its count in the last document m.
Step 3: Sort the features by the RDC values computed in Step 2.
Step 4: Select the feature f_max with the largest RDC value.
Step 5: Add f_max to set S.
Step 6: Remove f_max from set F.
Step 7: Traverse set F and set sum(f_i) = 0 for each feature.
Step 8: Traverse set F; for each feature f_i, compute its correlation Correlation(f_i, f_s) with each feature f_s in S and accumulate sum(f_i) = sum(f_i) + Correlation(f_i, f_s).
Step 9: For each feature f_i in set F, compute its M value using the following formula:
M(f_i) = RDC(f_i) - sum(f_i),
where RDC(f_i) is the relevance of feature f_i and Correlation(f_i, f_j) is the correlation between the two features f_i and f_j defined by their similarity, computed with the Pearson correlation coefficient:
Correlation(f_i, f_j) = Σ_d (f_{i,d} - f̄_i)(f_{j,d} - f̄_j) / √[ Σ_d (f_{i,d} - f̄_i)² · Σ_d (f_{j,d} - f̄_j)² ],
where f_{i,d} and f_{j,d} are the term frequencies of feature words i and j in the d-th document, and f̄_i and f̄_j are the average term frequencies of f_i and f_j over the document collection. Correlation(f_i, f_j) lies between -1 and 1: a value of 1 indicates maximum positive correlation, and -1 indicates maximum negative correlation.
Step 10: Select the feature f_max with the largest M value.
Step 11: Add f_max to set S.
Step 12: Remove f_max from set F.
Step 13: Repeat Steps 8-12 until the number of features in set S equals k.
Step 14: Set S is the final selected feature set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810859899.0A CN109325511B (en) | 2018-08-01 | 2018-08-01 | Method for improving feature selection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109325511A true CN109325511A (en) | 2019-02-12 |
CN109325511B CN109325511B (en) | 2020-07-31 |
Family
ID=65264054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810859899.0A Active CN109325511B (en) | 2018-08-01 | 2018-08-01 | Method for improving feature selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109325511B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188143A (en) * | 2019-04-04 | 2019-08-30 | 上海发电设备成套设计研究院有限责任公司 | A kind of power plant Vibration Trouble of Induced Draft Fan diagnostic method |
CN110426612A (en) * | 2019-08-17 | 2019-11-08 | 福州大学 | A kind of two-stage type transformer oil paper insulation time domain dielectric response characteristic quantity preferred method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005081158A3 (en) * | 2004-02-23 | 2006-03-02 | Novartis Ag | Use of feature point pharmacophores (fepops) |
CN102200981A (en) * | 2010-03-25 | 2011-09-28 | 三星电子(中国)研发中心 | Feature selection method and feature selection device for hierarchical text classification |
CN103177121A (en) * | 2013-04-12 | 2013-06-26 | 天津大学 | Locality preserving projection method for adding pearson relevant coefficient |
CN105512311A (en) * | 2015-12-14 | 2016-04-20 | 北京工业大学 | Chi square statistic based self-adaption feature selection method |
- 2018-08-01: application CN201810859899.0A filed; granted as CN109325511B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN109325511B (en) | 2020-07-31 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB03 | Change of inventor or designer information | Inventor after: Wang Haitao. Inventors before: Wang Haitao; Tang Kang |
| GR01 | Patent grant | |