CN105260437B - Feature selection method for text classification and its application in biomedical text classification - Google Patents
Feature selection method for text classification and its application in biomedical text classification
- Publication number
- CN105260437B (application CN201510642985.2A)
- Authority
- CN
- China
- Prior art keywords
- feature
- classification
- text
- context
- local context
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Comparison | C | F | LLFilter |
---|---|---|---|
LLFilter vs. GINI | 73.26 (507) | 72.84 (-0.6%) | 75.08 (+2.5%) |
LLFilter vs. DF | 73.27 (555) | 72.70 (-0.8%) | 75.08 (+2.5%) |
LLFilter vs. CDM | 73.40 (609) | 74.04 (+0.9%) | 75.08 (+1.4%) |
LLFilter vs. Acc2 | 73.23 (583) | 73.06 (-0.2%) | 75.08 (+2.5%) |
LLFilter vs. TF-IDF | 72.99 (502) | 73.53 (+0.7%) | 75.08 (+2.9%) |
LLFilter vs. GINIntf | 73.67 (567) | 74.22 (+0.7%) | 75.08 (+1.9%) |
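The page does not state how the parenthesised percentages were computed. Working back from the numbers, they appear to be relative changes measured against column C, i.e. (value - C) / C × 100. The Python sketch below transcribes the values from the table and recomputes the percentages under that assumption; the helper names `relative_change` and `check_rows` are hypothetical, and the C-as-reference reading is an inference, not something the source confirms.

```python
"""Recompute the relative changes in the LLFilter comparison table.

Assumption (inferred from the numbers, not stated in the source): each
parenthesised percentage is (value - reference) / reference * 100, with
column C as the reference.
"""

# (compared method, C, F, F % as reported, LLFilter, LLFilter % as reported)
ROWS = [
    ("GINI",    73.26, 72.84, -0.6, 75.08, 2.5),
    ("DF",      73.27, 72.70, -0.8, 75.08, 2.5),
    ("CDM",     73.40, 74.04,  0.9, 75.08, 1.4),
    ("Acc2",    73.23, 73.06, -0.2, 75.08, 2.5),
    ("TF-IDF",  72.99, 73.53,  0.7, 75.08, 2.9),
    ("GINIntf", 73.67, 74.22,  0.7, 75.08, 1.9),
]


def relative_change(value: float, reference: float) -> float:
    """Percentage change of `value` relative to `reference`, rounded to 0.1."""
    return round((value - reference) / reference * 100, 1)


def check_rows(rows):
    """Return (method, column) pairs whose reported % differs from a C-based change."""
    mismatches = []
    for name, c, f, f_pct, ll, ll_pct in rows:
        if relative_change(f, c) != f_pct:
            mismatches.append((name, "F"))
        if relative_change(ll, c) != ll_pct:
            mismatches.append((name, "LLFilter"))
    return mismatches


if __name__ == "__main__":
    for name, column in check_rows(ROWS):
        print(f"{name}: {column} percentage does not match a C-based change")
```

Under this reading every entry matches except the LLFilter entry in the CDM row: +1.4% corresponds to the change relative to F (74.04), whereas a change relative to C (73.40) would be +2.3%. Whether that reflects a typo or a different reference column cannot be determined from this page.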
Metric | GI | DF | CDM | Acc2 | TF-IDF | GINIntf | LLFilter |
---|---|---|---|---|---|---|---|
Dscore | 5054 | 5067 | 5067 | 5054 | 5106 | 5133 | 5319 |
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510642985.2A CN105260437B (zh) | 2015-09-30 | 2015-09-30 | Feature selection method for text classification and its application in biomedical text classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510642985.2A CN105260437B (zh) | 2015-09-30 | 2015-09-30 | Feature selection method for text classification and its application in biomedical text classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105260437A (zh) | 2016-01-20 |
CN105260437B (zh) | 2018-11-23 |
Family
ID=55100128
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510642985.2A (Active) CN105260437B (zh) | Feature selection method for text classification and its application in biomedical text classification | 2015-09-30 | 2015-09-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105260437B (zh) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
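CN106021508A (zh) * | 2016-05-23 | 2016-10-12 | 武汉大学 | Social-media-based method for mining emergency information on sudden events |
CN106326458A (zh) * | 2016-06-02 | 2017-01-11 | 广西智度信息科技有限公司 | Urban management case classification method based on text classification |
CN106250367B (zh) * | 2016-07-27 | 2019-04-09 | 昆明理工大学 | Method for building a Vietnamese dependency treebank based on an improved Nivre algorithm |
CN106708959A (zh) * | 2016-11-30 | 2017-05-24 | 重庆大学 | Combination drug identification and ranking method based on a medical literature database |
CN108205524B (zh) * | 2016-12-20 | 2022-01-07 | 北京京东尚科信息技术有限公司 | Text data processing method and apparatus |
CN107016073B (zh) * | 2017-03-24 | 2019-06-28 | 北京科技大学 | Feature selection method for text classification |
CN107092679B (zh) * | 2017-04-21 | 2020-01-03 | 北京邮电大学 | Method for obtaining feature word vectors, text classification method and apparatus |
CN107357837B (zh) * | 2017-06-22 | 2019-10-08 | 华南师范大学 | E-commerce review sentiment classification method based on order-preserving submatrices and frequent sequence mining |
CN108009152A (zh) * | 2017-12-04 | 2018-05-08 | 陕西识代运筹信息科技股份有限公司 | Data processing method and apparatus for Spark-Streaming-based text similarity analysis |
CN109117956B (zh) * | 2018-07-05 | 2021-08-24 | 浙江大学 | Method for determining an optimal feature subset |
CN109767814A (zh) * | 2019-01-17 | 2019-05-17 | 中国科学院新疆理化技术研究所 | GloVe-model-based method for representing global feature vectors of amino acids |
CN111382273B (zh) * | 2020-03-09 | 2023-04-14 | 广州智赢万世市场管理有限公司 | Text classification method with attraction-factor-based feature selection |
CN111475617B (zh) * | 2020-03-30 | 2023-04-18 | 招商局金融科技有限公司 | Event subject extraction method, apparatus and storage medium |
CN113470779B (zh) * | 2021-09-03 | 2021-11-26 | 壹药网科技(上海)股份有限公司 | Drug category identification method and system |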
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
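CN101122909A (zh) * | 2006-08-10 | 2008-02-13 | 株式会社日立制作所 | Text information retrieval apparatus and text information retrieval method |
CN102023967A (zh) * | 2010-11-11 | 2011-04-20 | 清华大学 | Text sentiment classification method for the stock domain |
CN102257492A (zh) * | 2008-12-19 | 2011-11-23 | 伊斯曼柯达公司 | System and method for generating context-enhanced communication works |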
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7543232B2 (en) * | 2004-10-19 | 2009-06-02 | International Business Machines Corporation | Intelligent web based help system |
2015
- 2015-09-30: CN201510642985.2A filed in CN, published as CN105260437B (zh); status Active
Non-Patent Citations (3)
Title |
---|
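"Research on feature selection algorithms in Chinese text classification"; 胡佳妮 et al.; Study on Optical Communications; 20051231 (No. 3); full text * |
"Feature selection algorithm for text classification based on association analysis"; 张彪 et al.; Computer Engineering; 20101130; Vol. 36 (No. 22); full text * |
"Research on semantic relation extraction between entities based on feature vectors"; 毛小丽; China Master's Theses Full-text Database (Information Science and Technology); 20120715 (No. 07); full text * |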
Also Published As
Publication number | Publication date |
---|---|
CN105260437A (zh) | 2016-01-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
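CN105260437B (zh) | | Feature selection method for text classification and its application in biomedical text classification |
CN107633007B (zh) | | System and method for labeling product review data based on hierarchical AP clustering |
CN104699730B (zh) | | Method and system for identifying relationships between candidate answers |
CN103870973B (zh) | | Information push and search method and apparatus based on keyword extraction from electronic information |
Harfoushi et al. | | Sentiment analysis algorithms through azure machine learning: Analysis and comparison |
CN111708888B (zh) | | Artificial-intelligence-based classification method, apparatus, terminal and storage medium |
Qi et al. | | Recognizing driving styles based on topic models |
CN112905739B (zh) | | Fake review detection model training method, detection method and electronic device |
CN109492105B (zh) | | Text sentiment classification method based on multi-feature ensemble learning |
CN111680225B (zh) | | Machine-learning-based method and system for analyzing WeChat financial messages |
CN112559684A (zh) | | Keyword extraction and information retrieval method |
CN108804595B (zh) | | word2vec-based short text representation method |
CN107895303B (zh) | | Personalized recommendation method based on the ocean model |
CN108038099B (zh) | | Low-frequency keyword identification method based on word clustering |
CN109960727A (zh) | | Method and system for automatically detecting personal privacy information in unstructured text |
CN114387061A (zh) | | Product push method, apparatus, electronic device and readable storage medium |
CN112115712B (zh) | | Topic-based group sentiment analysis method |
CN102662987B (zh) | | Baidu Baike-based semantic classification method for web text |
CN112735584A (zh) | | Method and apparatus for generating auxiliary decisions for malignant tumor diagnosis and treatment |
CN116976321A (zh) | | Text processing method, apparatus, computer device, storage medium and program product |
Gurung et al. | | A study on Topic Identification using K means clustering algorithm: Big vs. Small Documents |
Jayakody et al. | | Sentiment analysis on product reviews on twitter using Machine Learning Approaches |
Tohabar et al. | | Bengali fake news detection using machine learning and effectiveness of sentiment as a feature |
Rakhsha et al. | | Detecting adverse drug reactions from social media based on multichannel convolutional neural networks modified by support vector machine |
CN112489689B (zh) | | Cross-database speech emotion recognition method and apparatus based on multi-scale discrepancy adversarial learning |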
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | Effective date of registration: 20210727; Address after: No. 86, Yushan West Road, Jiangpu street, Pukou District, Nanjing, Jiangsu 210012; Patentee after: NANJING AUDIT University; Address before: No. 86, Yushan West Road, Pukou District, Nanjing City, Jiangsu Province; Patentee before: Chen Yifei |
| | TR01 | Transfer of patent right | |
| | TR01 | Transfer of patent right | Effective date of registration: 20211220; Address after: 210000 No. 10, Fenghuang street, Jiangpu street, Pukou District, Nanjing, Jiangsu - rh0001; Patentee after: Nanjing Rui Hui Data Technology Co.,Ltd.; Address before: No. 86, Yushan West Road, Jiangpu street, Pukou District, Nanjing, Jiangsu 210012; Patentee before: NANJING AUDIT University |
| | TR01 | Transfer of patent right | |
| | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Feature selection method for text classification and its application in biomedical text classification; Effective date of registration: 20221011; Granted publication date: 20181123; Pledgee: Nanjing Bank Co.,Ltd. Nanjing Financial City Branch; Pledgor: Nanjing Rui Hui Data Technology Co.,Ltd.; Registration number: Y2022980017741 |
| | PE01 | Entry into force of the registration of the contract for pledge of patent right | |