CN103106275B - Text classification feature selection method based on feature distribution information - Google Patents
- Publication number
- CN103106275B (application CN201310050583.4A)
- Authority
- CN
- China
- Prior art keywords
- feature
- class
- document
- feature words
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a text classification feature selection method based on feature distribution information, addressing the poor accuracy of existing text classification feature selection methods. The technical scheme first pre-processes each document in the document set; then represents the whole document collection as a vector space model; constructs a feature dictionary; counts, for each class C_i, the number of documents DF(t, C_i) that contain feature word t; calculates the normalized tf*idf value of each feature word for each class C_i, then calculates the feature word's intra-class distribution DIntra and average inter-class distribution DInterAvg within each class C_i; calculates the weight w_i(t) of each feature word t_k in the text feature space for category C_i; and sorts all feature words in descending order of their weight over the whole document set, preferentially retaining the top-ranked feature words during feature selection. Building on a feature distribution framework, the method applies this framework in the feature selection process, improving text classification efficiency and accuracy.
Description
Technical field
The present invention relates to text classification feature selection methods, and in particular to a text classification feature selection method based on feature distribution information.
Background technology
With the development of communication technology and networks, a large number of electronic documents such as news, e-mail, and microblogs are generated on the Internet every day. Automatic text classification, as an efficient method for the classified management of large volumes of documents, is widely used in many fields.
With the explosive growth of information, the main problem facing automatic text categorization is how to handle the high-dimensional text vector feature spaces produced by large amounts of text data. An overly high-dimensional feature space has two adverse effects on text classification methods: (1) many mature methods cannot be optimized in a high-dimensional space and therefore cannot be applied to text classification; (2) because the classifier is trained on a training set, an excessively high-dimensional text vector space inevitably leads to overfitting [1]. Moreover, most dimensions of the text vector space are irrelevant to classification, and some even introduce noise that degrades classification precision [2]. Text feature screening selects, according to some feature selection algorithm, a subset of more representative text features from the original feature space to form a new, lower-dimensional feature space, achieving dimensionality reduction. It is an effective way to solve the problem of excessively high-dimensional text vector feature spaces in text classification, and its purpose is to improve both the working efficiency of text classification and the execution efficiency of the algorithm. Many experiments show that, in most cases, aggressively reducing the feature space yields a large performance gain at only a small loss in classification accuracy [3].
Existing text classification feature selection algorithms mainly include document frequency (DF), information gain (IG), information gain ratio (GR), the chi-square test (CHI), mutual information (MI), and the Gini index [3,4]. Several of the better-performing techniques for text classification are briefly introduced below:
Document frequency (DF): for a given feature t, the document frequency is the number of documents in the collection that contain t. Its basic assumption is that rare features are unhelpful for class prediction, or at least do not affect overall performance. Its advantages: it is simple to implement and computationally cheap, so feature selection is fast, and it works well in practice. Its disadvantage: a feature that is rare overall may not be rare within a particular class and may still carry important class information; simply discarding it can hurt classification, so features should not be rejected in large numbers by DF alone.
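A DF screen of this kind can be sketched in a few lines of Python. This is a minimal illustration; the function names and the `min_df` threshold are choices of this sketch, not part of the method described here:

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it.
    Each document is a whitespace-separated string."""
    df = Counter()
    for doc in docs:
        for term in set(doc.split()):  # presence only, not term frequency
            df[term] += 1
    return dict(df)

def df_select(docs, min_df=2):
    """Keep only terms whose document frequency reaches min_df."""
    df = document_frequency(docs)
    return sorted(t for t, n in df.items() if n >= min_df)
```

With the PE documents of Table 2 as input, `df_select` would retain words like "basketbal" that occur in several documents and drop one-off words like "yao".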
Information gain (IG): information gain is an entropy-based evaluation method. For a given feature t, it measures how much information the system contains with and without the feature; the difference is the amount of information the feature brings to the system, i.e. the gain [5]. Information gain considers both the presence and the absence of a feature. On imbalanced data sets, experiments show that for rare categories the contribution of a feature's non-occurrence to judging the text category is often far smaller than the interference that considering the non-occurrence introduces.
Information gain ratio (GR): information gain has been shown in many results to be biased. Because attributes with many distinct values are learned too thoroughly from the training set, information gain tends to prefer such attributes; the information gain ratio remedies this shortcoming of information gain [6].
Chi-square test (CHI): the chi-square test is a common method in mathematical statistics for testing the independence of two variables. Its most basic idea is to judge the correctness of a theory by the deviation between observed and theoretical values [7,8].
Experiments in text classification show that, when used for feature selection, the chi-square test is among the best-performing methods. However, it only counts whether feature t appears in a text and ignores how many times t appears, which gives low-frequency words an inflated score; this is the well-known "low-frequency word defect" of the chi-square test.
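The 2x2 chi-square statistic behind CHI can be computed from document counts alone, which makes the limitation just discussed visible: the statistic never sees how often t occurs inside a document. A minimal sketch (the a/b/c/d naming is an assumption of this illustration):

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class, from a 2x2 contingency table:
    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    Note: only presence is counted -- a term occurring 100 times in one
    document scores the same as a term occurring once, which is the source
    of the 'low-frequency word defect'."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom
```

For example, a term present in all five in-class documents of a ten-document collection and nowhere else scores the maximum for that split, while a term spread evenly over both classes scores zero.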
The present invention, building on the feature distribution framework [9], improves the inter-class distribution calculation method and applies the framework in the feature selection process.
List of references:
[1] Jieming Yang, Yuanning Liu, Xiaodong Zhu et al., A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization, Information Processing & Management, Volume 48, Issue 4, 2012, pp. 741-754.
[2] Wenqian Shang, Houkuan Huang, Haibin Zhu et al., A novel feature selection algorithm for text classification, Expert Systems with Applications, Volume 33, Issue 1, 2007, pp. 1-5.
[3] Monica Rogati and Yiming Yang, High-performing feature selection for text classification. In Proceedings of the eleventh international conference on Information and knowledge management (CIKM '02), ACM, New York, NY, USA, 2002, pp. 659-661.
[4] Yang, Y., Pedersen, J.O., A Comparative Study on Feature Selection in Text Classification. In Proceedings of the 14th international conference on machine learning, Nashville, USA, 1997, pp. 412-420.
[5] Forman, G., An Extensive Empirical Study of Feature Selection Metrics for Text Classification. Journal of Machine Learning Research, 3, 2003, pp. 1289-1305.
[6] Tatsunori Mori, Miwa Kikuchi and Kazufumi Yoshida, Term Weighting Method based on Information Gain Ratio for Summarizing Documents retrieved by IR systems. Journal of Natural Language Processing, 9(4), 2001, pp. 3-32.
[7] Zheng, Z., Srihari, R., Optimally Combining Positive and Negative Features for Text Classification. ICML 2003 Workshop on Learning from Imbalanced Data Sets, 2003.
[8] Luigi Galavotti, Fabrizio Sebastiani et al., Feature Selection and Negative Evidence in Automated Text Classification. In Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries (ECDL '00), 2000.
[9] V. Lertnattee, T. Theeramunkong, Improving centroid-based text classification using term-distribution-based weighting and feature selection. In Proceedings of INTECH-01, 2nd International Conference on Intelligent Technologies, Bangkok, Thailand, 2001, pp. 349-355.
Summary of the invention
To overcome the poor accuracy of existing text classification feature selection methods, the invention provides a text classification feature selection method based on feature distribution information. Building on a feature distribution framework, the method improves the inter-class distribution calculation and applies the framework in the feature selection process. It makes full use of the tf*idf information of text features together with their intra-class and inter-class distribution information, reflecting the importance of feature items in the text more objectively, so that feature items representative of the text can be selected, achieving the goal of feature screening and improving text classification efficiency and accuracy. The method reaches high classification accuracy while selecting fewer feature items and converges quickly; its improvement to the inter-class distribution also makes it applicable to skewed data sets.
The technical solution adopted by the present invention to solve the technical problem is a text classification feature selection method based on feature distribution information, characterized by comprising the following steps:
1. Perform word segmentation, stop-word removal, and stemming on each document in the document set.
2. Represent the whole document collection as a vector space model.
3. Extract all feature words from the document collection and construct the feature dictionary.
4. For each feature word t in the text feature space, count the frequency TF(t, d_j) with which it appears in each document d_j and the frequency TF(t, C_i) with which it appears in each class C_i; at the same time, count for each class C_i the number of documents DF(t, C_i) that contain feature word t.
5. Using the information obtained in step 4, for each feature word t_k, first calculate the normalized tf*idf value for each class C_i, then calculate the feature word's intra-class distribution DIntra and average inter-class distribution DInterAvg for each class C_i.
6. Using the information obtained in steps 4 and 5, calculate the weight w_i(t) of each feature word t_k in the text feature space for category C_i with the following formula:
w_i(t) = tf*idf * DInterAvg * (1 - DIntra)
Summing the weights of feature word t_k over all categories gives the weight of the feature word in the whole document set, i.e. the TDFS value of t_k.
7. Sort all feature words in descending order of their weight over the whole document set; during feature selection, preferentially retain the top-ranked feature words.
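The seven steps above can be sketched end to end. The patent's exact formulas for the normalized tf*idf, DIntra, and DInterAvg appear only as images in the source, so the concrete forms used below (idf over the whole collection, DIntra from the fraction of class documents containing the word, DInterAvg from the spread of per-document averages) are illustrative assumptions, not the patented definitions; only the combination w_i(t) = tf*idf * DInterAvg * (1 - DIntra) and the final summation are taken from the text:

```python
import math
from collections import Counter

def tdfs_rank(docs, labels, smooth=0.1):
    """Rank feature words by an assumed TDFS-style score.
    docs: whitespace-separated document strings; labels: class label per doc."""
    classes = sorted(set(labels))
    by_class = {c: [d.split() for d, l in zip(docs, labels) if l == c]
                for c in classes}
    vocab = sorted({t for d in docs for t in d.split()})
    df_all = Counter(t for d in docs for t in set(d.split()))
    n_docs = len(docs)
    scores = {}
    for t in vocab:
        # average per-document frequency of t in each class
        # (the quantity the average inter-class distribution is built on)
        avg = [sum(d.count(t) for d in by_class[c]) / len(by_class[c])
               for c in classes]
        hi = max(avg)
        # assumption: 0 when evenly spread over classes, near 1 when concentrated
        dinteravg = (1.0 - min(avg) / hi) if hi > 0 else 0.0
        total = 0.0
        for c in classes:
            cdocs = by_class[c]
            tf = sum(d.count(t) for d in cdocs)
            if tf == 0:
                continue
            tfidf = (smooth + tf) * math.log(n_docs / df_all[t])  # assumed form
            # assumption: DIntra low when t occurs in most documents of the class
            dintra = 1.0 - sum(1 for d in cdocs if t in d) / len(cdocs)
            total += tfidf * dinteravg * (1.0 - dintra)  # w_i(t) from the patent
        scores[t] = total  # TDFS value: sum of per-class weights
    return sorted(scores, key=scores.get, reverse=True)
```

On the Table 2 data this ranks class-specific words such as "basketbal" and "music" above words like "my" that are spread evenly across classes.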
The beneficial effects of the invention are as follows: building on a feature distribution framework, the method improves the inter-class distribution calculation and applies the framework in the feature selection process. It makes full use of the tf*idf information of text features together with their intra-class and inter-class distribution information, reflecting the importance of feature items in the text more objectively, so that feature items representative of the text can be selected, achieving the goal of feature screening and improving text classification efficiency and accuracy. The method reaches high classification accuracy while selecting fewer feature items and converges quickly; its improvement to the inter-class distribution also makes it applicable to skewed data sets.
The present invention is described in detail below with reference to the drawings and embodiments.
Accompanying drawing explanation
Fig. 1 is the flowchart of the text classification feature selection method based on feature distribution information of the present invention.
Embodiment
The concrete steps of the inventive method are as follows:
1. Concepts related to the present invention.
Tf*idf (term frequency-inverse document frequency): a statistical method used to assess the importance of a word to one document in a document collection or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency in the corpus.
Intra-class distribution: the distribution of a feature word over the documents of one class. If the word is evenly distributed across the documents of the class, its intra-class distribution in that class is low; conversely, if it is concentrated in a few documents and absent from the rest, its intra-class distribution in that class is high.
Inter-class distribution: the distribution of a feature word over the classes of the whole document set. If the word is evenly distributed across the documents of all classes, its inter-class distribution over the whole document set is low; conversely, if it is concentrated in one or a few classes and absent from the others, its inter-class distribution over the whole document set is high.
Average inter-class distribution: this concept is proposed by the present invention as an improvement of the inter-class distribution. The inter-class distribution uses the total word frequency of a feature word in each class to measure its distribution across classes, so when the numbers of documents per class differ greatly, i.e. the data set is skewed, feature words of classes with few documents are drowned out by classes with many documents. The improved average inter-class distribution instead uses the average per-document frequency of the feature word in each class, so it is unaffected by data skew and can accurately reflect the inter-class distribution of a feature word.
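The difference between the two measures can be seen with a small numeric sketch in the spirit of Tables 4 and 5. The class sizes below (100 documents in class A, 2 in class B) are hypothetical, since the original tables are images:

```python
def per_class_totals(tf_lists):
    """Total word frequency of a feature word in each class -- the quantity
    the plain inter-class distribution is built on."""
    return [sum(x) for x in tf_lists]

def per_class_averages(tf_lists):
    """Average occurrences of the feature word per document in each class --
    the quantity the average inter-class distribution is built on."""
    return [sum(x) / len(x) for x in tf_lists]
```

For a word t occurring twice in a 100-document class A and once in each of the 2 documents of class B, the totals are identical (2 and 2), so the plain inter-class distribution sees no difference between the classes, while the per-document averages (0.02 vs. 1.0) clearly mark t as a class-B word.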
2. Properties related to the present invention.
Property 1. The more times a feature word appears in the documents of a class, the better it characterizes the class of a document, and the larger its weight.
Table 1. Sample of the original document set, with formatting and other noise removed
Numbering | Original document | Classification |
---|---|---|
1 | Yao has great talent in basketball games. | PE |
2 | We are playing a game about basketball in the playground. | PE |
3 | We are enjoying the music at the concert. | MUSIC |
4 | Music is an art and everybody may enjoy it. | MUSIC |
5 | Playing basketball is my favorite sport. | PE |
6 | Listening to the music is my hobby. | MUSIC |
For example, in the document set shown in Table 1, the feature word basketball appears 3 times in the PE class documents, so its weight is relatively large, while talent appears only once in that class, so its weight is smaller.
Property 2. The lower the intra-class distribution of a feature word, the better it characterizes the class of a document, and the larger its weight.
Table 2. The document collection after text pre-processing
Numbering | Training document | Classification |
---|---|---|
1 | yao ha great talent basketbal game | PE |
2 | we plai game about basketbal playground | PE |
3 | we enjoi music concert | MUSIC |
4 | music art everybodi mai enjoi | MUSIC |
5 | plai basketbal my favorit sport | PE |
6 | listen music my hobbi | MUSIC |
Table 3. All feature words of the sample document set sorted by TDFS value in descending order
Feature word | TDFS value | Feature word | TDFS value |
---|---|---|---|
music | 0.554 | hobbi | 0.158 |
basketbal | 0.489 | listen | 0.158 |
enjoi | 0.489 | great | 0.140 |
game | 0.394 | ha | 0.140 |
plai | 0.394 | talent | 0.140 |
concert | 0.158 | yao | 0.140 |
art | 0.158 | playground | 0.140 |
everybodi | 0.158 | favorit | 0.140 |
mai | 0.158 | sport | 0.140 |
For example, in the document set shown in Table 1, the feature word basketball is evenly distributed over the PE class documents (each document contains one occurrence), so its intra-class distribution is low; the word is widely and evenly present in the PE class and characterizes the class well, so its weight is large. The feature word talent appears in only one of the PE documents and not in the other two, so its intra-class distribution is high; it is less representative of the PE class, and its computed weight is accordingly lower.
Property 3. The higher the inter-class distribution of a feature word, the better it characterizes the class of a document, and the larger its weight.
For example, in the document set shown in Table 1, the feature word basketball appears only in PE class documents and never in MUSIC class documents; its inter-class distribution is very high, it characterizes the class well, and its weight is large. The feature word my appears once each in the PE and MUSIC classes and is evenly distributed across classes, so its inter-class distribution is very low; it cannot represent a class well, and its weight is accordingly low.
Property 4. The higher the average inter-class distribution of a feature word, the better it characterizes the class of a document, and the larger its weight.
Table 4. Example 1 for the average inter-class distribution property [table given as an image in the original document]
Table 5. Example 2 for the average inter-class distribution property [table given as an image in the original document]
Example: Tables 4 and 5 give two examples. Suppose the distribution of a feature word t in the document set is as shown in Table 4. Because the total word frequency of t is the same (2) in both class A and class B, the inter-class distribution of the feature word is 0; yet t is clearly representative of class B. Only because the numbers of documents in classes A and B differ so greatly is the class-B feature word t drowned out by class A, which has more documents. In the example of Table 5, the computed inter-class distribution of feature word t is likewise 0 (the total word frequency of t is 1000 in both classes A and B), but the importance of t for distinguishing classes A and B is obvious. Thus, when the data set is skewed and the inter-class distribution is used to weigh feature words, representative feature words of classes with fewer documents cannot be brought out. If the average inter-class distribution is used instead, the distribution of t across classes A and B is very uneven and the average inter-class distribution is high, so t receives a reasonable weight; the adverse effect of data skew is eliminated while the original function of the inter-class distribution is retained.
For a given document set D, the detailed process by which the present invention screens the attributes of the document set is as follows:
1. Parse all documents in the document set, discard useless structural markup, and extract the main information such as title and content from each document.
Documents may contain structural markup (see the table below) that appears identically in every document; these marks, together with information irrelevant to text classification such as timestamps, are filtered out first.
2. Pre-process the text content and extract feature items (terms) to form the text feature space.
After the parsing of step 1, the content of each document is obtained, as in Table 1. Each document in the document set is pre-processed by tokenizing, stop-word removal (stopwords removal), and stemming, yielding a set of words. Each word in the set is called a text feature item (term), and all feature items together constitute the text feature space (term space). The result of pre-processing the documents of Table 1 is shown in Table 2.
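A rough stand-in for this pre-processing stage is sketched below. The stop-word list is a tiny illustrative one (the patent's actual list clearly differs, since Table 2 retains "we" and "my"), and `crude_stem` is a toy suffix stripper; the outputs in Table 2, such as "plai" and "basketbal", suggest a Porter-style stemmer was actually used:

```python
import re

# tiny illustrative stop-word list, an assumption of this sketch
STOPWORDS = {"the", "a", "an", "is", "are", "in", "at", "on", "to", "and", "it"}

def tokenize(text):
    """Lowercase and split on anything that is not a letter."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suf in ("ing", "ed", "s"):
        if token.endswith(suf) and len(token) > len(suf) + 2:
            return token[: -len(suf)]
    return token

def preprocess(text):
    """Tokenize, remove stop words, and stem one document."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]
```

Running `preprocess` on document 2 of Table 1 yields a token list analogous to (but not identical with) the Table 2 row.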
3. Extract all feature words from the document collection and construct the feature dictionary.
After all documents have been processed by step 2, all feature words appearing in the document set are collected to form the feature dictionary, which serves as the basis for feature selection.
4. For each feature word t in the text feature space, count the frequency TF(t, d_j) with which it appears in each document d_j and the frequency TF(t, C_i) with which it appears in each class C_i; at the same time, count for each class C_i the number of documents DF(t, C_i) that contain feature word t.
5. Using the statistics obtained in step 4, calculate for each feature word the normalized tf*idf value, the intra-class distribution, and the average inter-class distribution.
(1) tf*idf: the computing formula is as follows [formula given as an image in the original document]:
In the formula, n denotes the number of feature words occurring in class C_i. L is a constant determined experimentally, usually 0.1 or 0.01. Normalizing the tf*idf value avoids the calculation bias introduced by overly long documents.
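Since the normalization formula itself is an image in the source, here is one plausible reading: smooth the class term frequency with the constant L, weight by an idf over the whole collection, and cosine-normalize over the n feature words of the class. All of these concrete choices are assumptions of this sketch, not the patented formula:

```python
import math

def normalized_tfidf(tf_class, df, n_docs, smooth=0.1):
    """One plausible normalized per-class tf*idf.
    tf_class: {term: frequency of the term in the class}
    df: {term: document frequency over the whole collection}
    n_docs: total number of documents; smooth: the constant L (0.1 or 0.01)."""
    raw = {t: (smooth + tf) * math.log(n_docs / df[t])
           for t, tf in tf_class.items()}
    norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0  # cosine norm
    return {t: v / norm for t, v in raw.items()}
```

After normalization the squared values of a class sum to 1, so a class made of long documents cannot dominate purely through raw counts.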
(2) Intra-class distribution (DIntra): the computing formula is as follows [formula given as an image in the original document]:
(3) Average inter-class distribution (DInterAvg): the computing formula is as follows [formula given as an image in the original document]:
6. Using the results of step 5, calculate the weight of feature item t in each class with the following formula:
w_i(t) = tf*idf * DInterAvg * (1 - DIntra)
7. Sum the weights of feature item t over all classes to obtain the weight of the feature item in the whole document set, i.e. its TDFS value. The computing formula is as follows [formula given as an image in the original document]:
8. Sort the TDFS values of all feature items in the document set in descending order; the higher a feature item ranks, the higher its value in the document set and the larger its role in document classification.
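The final ranking-and-retention step is straightforward to sketch; `keep_ratio` below is an illustrative parameter for "preferentially retain the top-ranked feature words", not a value specified in the text:

```python
def select_top_features(tdfs_scores, keep_ratio=0.1):
    """Sort feature items by TDFS value in descending order and keep
    the top fraction given by keep_ratio (at least one item)."""
    ranked = sorted(tdfs_scores, key=tdfs_scores.get, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]
```

Applied to the TDFS values of Table 3, a small `keep_ratio` would retain "music" and "basketbal" while discarding low-scoring words such as "my".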
Claims (1)
1. A text classification feature selection method based on feature distribution information, characterized by comprising the following steps:
(1) perform word segmentation, stop-word removal, and stemming on each document in the document set;
(2) represent the whole document collection as a vector space model;
(3) extract all feature words from the document collection and construct the feature dictionary;
(4) for each feature word t in the text feature space, count the frequency TF(t, d_j) with which it appears in each document d_j and the frequency TF(t, C_i) with which it appears in each class C_i; at the same time, count for each class C_i the number of documents DF(t, C_i) that contain feature word t;
(5) using the information obtained in step (4), for each feature word t, first calculate the normalized tf*idf value for each class C_i, then calculate the feature word's intra-class distribution DIntra and average inter-class distribution DInterAvg for each class C_i;
(6) using the information obtained in steps (4) and (5), calculate the weight w_i(t) of each feature word t in the text feature space for category C_i with the following formula:
w_i(t) = tf*idf * DInterAvg * (1 - DIntra)
The computing formula of the average inter-class distribution is as follows [formula given as an image in the original document]:
Summing the weights of feature word t over all categories gives the weight of the feature word in the whole document set, i.e. the TDFS value of feature word t [formula given as an image in the original document];
(7) sort all feature words in descending order of their weight over the whole document set; during feature selection, preferentially retain the top-ranked feature words.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310050583.4A CN103106275B (en) | 2013-02-08 | 2013-02-08 | Text classification feature selection method based on feature distribution information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103106275A CN103106275A (en) | 2013-05-15 |
CN103106275B true CN103106275B (en) | 2016-02-10 |
Family
ID=48314130
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310050583.4A Expired - Fee Related CN103106275B (en) | 2013-02-08 | 2013-02-08 | Text classification feature selection method based on feature distribution information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103106275B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915327B (en) * | 2014-03-14 | 2019-01-29 | 腾讯科技(深圳)有限公司 | A kind of processing method and processing device of text information |
CN104462556B (en) * | 2014-12-25 | 2018-02-23 | 北京奇虎科技有限公司 | Question and answer page relevant issues recommend method and apparatus |
CN105045812B (en) * | 2015-06-18 | 2019-01-29 | 上海高欣计算机系统有限公司 | The classification method and system of text subject |
CN106940703B (en) * | 2016-01-04 | 2020-09-11 | 腾讯科技(北京)有限公司 | Pushed information rough selection sorting method and device |
CN106054857B (en) * | 2016-05-27 | 2019-12-24 | 大连楼兰科技股份有限公司 | Maintenance decision tree/word vector-based fault remote diagnosis platform |
CN106055439B (en) * | 2016-05-27 | 2019-09-27 | 大连楼兰科技股份有限公司 | Based on maintenance decision tree/term vector Remote Fault Diagnosis system and method |
CN106227768B (en) * | 2016-07-15 | 2019-09-03 | 国家计算机网络与信息安全管理中心 | A kind of short text opining mining method based on complementary corpus |
CN106997345A (en) * | 2017-03-31 | 2017-08-01 | 成都数联铭品科技有限公司 | The keyword abstraction method of word-based vector sum word statistical information |
CN106997344A (en) * | 2017-03-31 | 2017-08-01 | 成都数联铭品科技有限公司 | Keyword abstraction system |
CN107329999B (en) * | 2017-06-09 | 2020-10-20 | 江西科技学院 | Document classification method and device |
CN107844553B (en) * | 2017-10-31 | 2021-07-27 | 浪潮通用软件有限公司 | Text classification method and device |
CN108153872A (en) * | 2017-12-25 | 2018-06-12 | 佛山市车品匠汽车用品有限公司 | A kind of method and apparatus of the Internet web page information filtering |
CN108491429A (en) * | 2018-02-09 | 2018-09-04 | 湖北工业大学 | A kind of feature selection approach based on document frequency and word frequency statistics between class in class |
CN108776654A (en) * | 2018-05-30 | 2018-11-09 | 昆明理工大学 | One kind being based on improved simhash transcription comparison methods |
CN110210559B (en) * | 2019-05-31 | 2021-10-08 | 北京小米移动软件有限公司 | Object screening method and device and storage medium |
CN110442678B (en) * | 2019-07-24 | 2022-03-29 | 中智关爱通(上海)科技股份有限公司 | Text word weight calculation method and system, storage medium and terminal |
CN111881668B (en) * | 2020-08-06 | 2023-06-30 | 成都信息工程大学 | TF-IDF computing device based on chi-square statistics and TF-CRF improvement |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101290626A (en) * | 2008-06-12 | 2008-10-22 | 昆明理工大学 | Text categorization feature selection and weight computation method based on field knowledge |
CN101587493A (en) * | 2009-06-29 | 2009-11-25 | 中国科学技术大学 | Text classification method |
CN102622373A (en) * | 2011-01-31 | 2012-08-01 | 中国科学院声学研究所 | Statistic text classification system and statistic text classification method based on term frequency-inverse document frequency (TF*IDF) algorithm |
Non-Patent Citations (3)
Title |
---|
An improved feature weight algorithm; Zhang Yu et al.; Computer Engineering; 2011-03-31; Vol. 37, No. 5; main text, p. 211, left column, paragraph 4 to second-to-last paragraph *
An improved feature selection method in text classification; Liu Haifeng et al.; Information Science; 2007-10-31; Vol. 25, No. 10; full text *
Research on improved feature weighting algorithms in automatic text classification; Xu Fengya et al.; Computer Engineering and Applications; 2005-01-31; main text, p. 181, left column, paragraph 1 to p. 183, left column, fourth-to-last paragraph *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 2016-02-10. Termination date: 2020-02-08. |