CN104866606A - A MapReduce-parallelized big data text classification method - Google Patents
A MapReduce-parallelized big data text classification method
- Publication number
- CN104866606A (application CN201510297189.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- test data
- classification
- data
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510297189.XA CN104866606B (zh) | 2015-06-02 | 2015-06-02 | A MapReduce-parallelized big data text classification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510297189.XA CN104866606B (zh) | 2015-06-02 | 2015-06-02 | A MapReduce-parallelized big data text classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104866606A (zh) | 2015-08-26 |
CN104866606B (zh) | 2019-02-01 |
Family
ID=53912432
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510297189.XA Active CN104866606B (zh) | A MapReduce-parallelized big data text classification method | 2015-06-02 | 2015-06-02 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104866606B (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
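CN105302730A (zh) * | 2015-12-09 | 2016-02-03 | Inspur Group Co., Ltd. | Method for testing a computing model, test server, and service platform |
CN106484873A (zh) * | 2016-10-13 | 2017-03-08 | Chengdu Dongfang Shengxing Electronics Co., Ltd. (成都东方盛行电子有限责任公司) | A big data classification processing method |
CN106897443A (zh) * | 2017-03-01 | 2017-06-27 | Shenzhen Boxin Nuoda Economic and Trade Consulting Co., Ltd. (深圳市博信诺达经贸咨询有限公司) | Big data partitioning method and system |
CN107590196A (zh) * | 2017-08-15 | 2018-01-16 | China Agricultural University | Method and system for screening and evaluating earthquake emergency information in social networks |
CN112000807A (zh) * | 2020-09-07 | 2020-11-27 | Liaoning Guonuo Technology Co., Ltd. (辽宁国诺科技有限公司) | A method for precise classification of suggestions and proposals |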
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6233575B1 (en) * | 1997-06-24 | 2001-05-15 | International Business Machines Corporation | Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values |
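CN103810293A (zh) * | 2014-02-28 | 2014-05-21 | Guangzhou Yunhong Information Technology Co., Ltd. (广州云宏信息科技有限公司) | Hadoop-based text classification method and device |
CN104536830A (zh) * | 2015-01-09 | 2015-04-22 | Harbin Engineering University | A MapReduce-based KNN text classification method |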
Non-Patent Citations (1)
Title |
---|
Yu Xiaoshan: "Parallel text clustering based on MapReduce", China Master's Theses Full-text Database, Information Science and Technology * |
Also Published As
Publication number | Publication date |
---|---|
CN104866606B (zh) | 2019-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kadhim et al. | | Text document preprocessing and dimension reduction techniques for text document clustering |
CN103279478B (zh) | | A document feature extraction method based on distributed mutual information |
CN106599054B (zh) | | Method and system for question classification and pushing |
CN110851598B (zh) | | Text classification method, device, terminal equipment, and storage medium |
CN104598532A (zh) | | An information processing method and device |
CN104866606A (zh) | | A MapReduce-parallelized big data text classification method |
CN105701084A (zh) | | A feature extraction method for text classification based on mutual information |
Gadde et al. | | SMS spam detection using machine learning and deep learning techniques |
CN108304382B (zh) | | Quality analysis method and system based on text data mining of manufacturing processes |
CN107066555A (zh) | | Online topic detection method for specialized domains |
CN104536830A (zh) | | A MapReduce-based KNN text classification method |
CN104881458A (zh) | | Method and device for labeling web page topics |
CN102629272A (zh) | | A clustering-based method for optimizing the question bank of an examination system |
Rakholia et al. | | Classification of Gujarati Documents using Naïve Bayes Classifier |
Deniz et al. | | Effects of various preprocessing techniques to Turkish text categorization using n-gram features |
Kandhro et al. | | Classification of Sindhi headline news documents based on TF-IDF text analysis scheme |
CN115953123A (zh) | | Method, device, equipment, and storage medium for generating robotic automation processes |
Hussain et al. | | Design and analysis of news category predictor |
CN106294689B (zh) | | Method and device for dimensionality reduction based on text class feature selection |
Swami et al. | | Resume classifier and summarizer |
CN109871889B (zh) | | Method for assessing public psychology during emergencies |
Hardaya et al. | | Application of text mining for classification of community complaints and proposals |
Mehedi et al. | | Automatic bangla article content categorization using a hybrid deep learning model |
Kadhim et al. | | Feature extraction for co-occurrence-based cosine similarity score of text documents |
CN106202116A (zh) | | A text classification method and system based on rough sets and KNN |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
EXSB | Decision made by SIPO to initiate substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2021-07-16. Address after: 321000 Dingtai building, No. 1489, Danxi Road, Wucheng District, Jinhua City, Zhejiang Province. Patentee after: ZHEJIANG SHIDA JIHAI NEW TECHNOLOGY Co.,Ltd. Address before: 321004 No. 688 Yingbin Road, Zhejiang, Jinhua. Patentee before: Zhejiang Normal University |
TR01 | Transfer of patent right | Effective date of registration: 2022-02-10. Address after: 321000 room 602, unit 2, building 5, 239 danguang West Road, Wucheng District, Jinhua City, Zhejiang Province. Patentee after: Zhu Xinzhong. Address before: 321000 Dingtai building, No. 1489, Danxi Road, Wucheng District, Jinhua City, Zhejiang Province. Patentee before: ZHEJIANG SHIDA JIHAI NEW TECHNOLOGY CO.,LTD. |
TR01 | Transfer of patent right | Effective date of registration: 2023-07-12. Address after: Room 703, Building 3, Shengde International Business Center, Liangzhu Street, Yuhang District, Hangzhou, Zhejiang 311118. Patentee after: Hangzhou Yalong Intelligent Technology Co.,Ltd. Address before: 321000 room 602, unit 2, building 5, 239 danguang West Road, Wucheng District, Jinhua City, Zhejiang Province. Patentee before: Zhu Xinzhong |