CN103279478B - A document feature extraction method based on distributed mutual information - Google Patents
A document feature extraction method based on distributed mutual information
- Publication number
- CN103279478B (application CN201310138475.2A)
- Authority
- CN
- China
- Prior art keywords
- document
- participle
- task
- word
- feature words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310138475.2A (CN103279478B, zh) | 2013-04-19 | 2013-04-19 | A document feature extraction method based on distributed mutual information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310138475.2A (CN103279478B, zh) | 2013-04-19 | 2013-04-19 | A document feature extraction method based on distributed mutual information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103279478A (zh) | 2013-09-04 |
CN103279478B (zh) | 2016-08-10 |
Family
ID=49061998
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310138475.2A (Active; CN103279478B, zh) | A document feature extraction method based on distributed mutual information | 2013-04-19 | 2013-04-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103279478B (zh) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140372457A1 (en) * | 2013-06-17 | 2014-12-18 | Tencent Technology Shenzhen Company Limited | Method and device for processing data |
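CN103530345A (zh) * | 2013-10-08 | 2014-01-22 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for short-text feature expansion and fitting feature library construction |
CN103955489B (zh) * | 2014-04-15 | 2017-09-22 | South China University of Technology | Distributed KNN classification algorithm and system for massive short texts based on information-entropy feature weight quantization |
CN104050242B (zh) * | 2014-05-27 | 2018-03-27 | Harbin University of Science and Technology | Feature selection and classification method based on the maximal information coefficient, and device thereof |
CN105488022A (zh) * | 2014-09-24 | 2016-04-13 | China Telecom Corporation Limited | A text feature extraction system and method |
CN104408034B (zh) * | 2014-11-28 | 2017-03-22 | Wuhan Shuwei Technology Co., Ltd. | A Chinese word segmentation method for big text data |
CN104462544A (zh) * | 2014-12-24 | 2015-03-25 | Dalian Haitian Xingye Technology Co., Ltd. | A passenger-demand-oriented video update method for subway/high-speed rail on-board servers |
CN104573027B (zh) * | 2015-01-13 | 2018-07-24 | Tsinghua University | A system and method for mining feature words from a document set |
CN105117466A (zh) * | 2015-08-27 | 2015-12-02 | China Telecom Corporation Limited, Hubei Haobai Information Service Branch | An Internet information screening system and method |
CN105701084A (zh) * | 2015-12-28 | 2016-06-22 | Guangdong Shunde Sun Yat-sen University-Carnegie Mellon University International Joint Research Institute | A feature extraction method for text classification based on mutual information |
CN106202498A (zh) * | 2016-07-20 | 2016-12-07 | Huaiyin Institute of Technology | A method for quantifying network behavior habits based on classified corpus, keyword frequency, and record association |
CN108108346B (zh) * | 2016-11-25 | 2021-12-24 | Guangdong Eshore Technology Co., Ltd. | Method and device for extracting topic feature words from documents |
CN107766323B (zh) * | 2017-09-06 | 2021-08-31 | Huaiyin Institute of Technology | A text feature extraction method based on mutual information and association rules |
CN110069630B (zh) * | 2019-03-20 | 2023-07-21 | Chongqing Xinke Design Co., Ltd. | An improved mutual information feature selection method |
CN110096705B (zh) * | 2019-04-29 | 2023-09-08 | Yangzhou University | An unsupervised automatic simplification algorithm for English sentences |
CN112948589B (zh) * | 2021-05-13 | 2021-07-30 | Tencent Technology (Shenzhen) Co., Ltd. | Text classification method, device, and computer-readable storage medium |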
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
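CN101763431A (zh) * | 2010-01-06 | 2010-06-30 | University of Electronic Science and Technology of China | PL clustering processing method based on massive online public opinion information |
CN102147813A (zh) * | 2011-04-07 | 2011-08-10 | Jiangsu Electric Power Company | An automatic document classification method based on the k-nearest-neighbor algorithm in an electric power cloud environment |
US8234285B1 (en) * | 2009-07-10 | 2012-07-31 | Google Inc. | Context-dependent similarity measurements |
CN102638456A (zh) * | 2012-03-19 | 2012-08-15 | Hangzhou Hikvision System Technology Co., Ltd. | Cloud-computing-based intelligent analysis method and system for massive real-time video streams |
CN102662952A (zh) * | 2012-03-02 | 2012-09-12 | Chengdu Kangsai UESTC Information Technology Co., Ltd. | A hierarchy-based parallel data mining method for Chinese text |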
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110106807A1 (en) * | 2009-10-30 | 2011-05-05 | Janya, Inc | Systems and methods for information integration through context-based entity disambiguation |
Non-Patent Citations (3)
Title |
---|
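Research and development of Web text feature extraction methods; Pang Jing'an; Information Studies: Theory & Application (情报理论与实践); 2006-05-30; Vol. 29, No. 3; pp. 338-340, 367 * |
Research on a MapReduce-based distributed text data filtering model; Li Hu et al.; Netinfo Security (信息网络安全); 2011-09-10; No. 9; pp. 91-93, 119 * |
Research and implementation of a sensitive data identification method based on text content; Li Weiwei et al.; Computer Engineering and Design (计算机工程与设计); 2013-04-16; Vol. 34, No. 4; pp. 1202-1206 * |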
Also Published As
Publication number | Publication date |
---|---|
CN103279478A (zh) | 2013-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
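CN103279478B (zh) | | A document feature extraction method based on distributed mutual information |
CN106599054B (zh) | | A method and system for classifying and pushing questions |
CN104112026B (zh) | | A method and system for classifying SMS text |
CN103593418B (zh) | | A distributed topic discovery method and system for big data |
CN111581949B (zh) | | Method, device, storage medium, and terminal for disambiguating scholar names |
Bates et al. | | Counting clusters in Twitter posts |
CN106095737A (zh) | | Document similarity calculation method and network-wide retrieval and tracking method for similar documents |
CN103617157A (zh) | | Semantics-based text similarity calculation method |
CN113312461A (zh) | | Intelligent question answering method, device, equipment, and medium based on natural language processing |
Liang et al. | | Express supervision system based on NodeJS and MongoDB |
CN103955489A (zh) | | Distributed KNN classification algorithm and system for massive short texts based on information-entropy feature weight quantization |
Hossny et al. | | Feature selection methods for event detection in Twitter: a text mining approach |
CN102629272A (zh) | | A clustering-based method for optimizing the question bank of an examination system |
Wu et al. | | Extracting topics based on Word2Vec and improved Jaccard similarity coefficient |
CN104536830A (zh) | | A MapReduce-based KNN text classification method |
CN106372122A (zh) | | A document classification method and system based on Wiki semantic matching |
CN104866606A (zh) | | A MapReduce-parallelized big data text classification method |
Campbell et al. | | Content+ context networks for user classification in Twitter |
CN106776724B (zh) | | A question classification method and system |
CN109325096B (zh) | | A knowledge resource search system based on knowledge resource classification |
Tian | | A mathematical indexing method based on the hierarchical features of operators in formulae |
Fu et al. | | Research on knowledge map construction in intelligentized content website |
Yong-Sheng et al. | | The method for discovering technology competitor groups based on graph clustering |
Chen et al. | | Text classification using SVM with exponential kernel |
Jiang et al. | | The analysis of China's integrity situation based on big data |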
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C41 | Transfer of patent application or patent right or utility model | |
TA01 | Transfer of patent application right | Effective date of registration: 2016-04-06. Address after: 100031 Xicheng District West Chang'an Avenue, No. 86, Beijing. Applicants after: State Grid Corporation of China; China Electric Power Research Institute; State Grid Smart Grid Institute; Information & Telecommunication Branch of State Grid Jiangsu Electric Power Company; Jiangsu Electric Power Company. Address before: 100031 Xicheng District West Chang'an Avenue, No. 86, Beijing. Applicants before: State Grid Corporation of China; China Electric Power Research Institute; Information & Telecommunication Branch of State Grid Jiangsu Electric Power Company; Jiangsu Electric Power Company. |
CB02 | Change of applicant information | Address after: 100031 Xicheng District West Chang'an Avenue, No. 86, Beijing. Applicants after: State Grid Corporation of China; China Electric Power Research Institute; Global Energy Interconnection Research Institute; Information & Telecommunication Branch of State Grid Jiangsu Electric Power Company; Jiangsu Electric Power Company. Address before: 100031 Xicheng District West Chang'an Avenue, No. 86, Beijing. Applicants before: State Grid Corporation of China; China Electric Power Research Institute; State Grid Smart Grid Institute; Information & Telecommunication Branch of State Grid Jiangsu Electric Power Company; Jiangsu Electric Power Company. |
COR | Change of bibliographic data | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |