CN113657106A - Feature selection method based on normalized term-frequency weights - Google Patents
Feature selection method based on normalized term-frequency weights
- Publication number
- CN113657106A (application CN202110758265.8A)
- Authority
- CN
- China
- Prior art keywords
- feature
- word
- class
- documents
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000010187 selection method Methods 0.000 title claims abstract description 14
- 238000000034 method Methods 0.000 claims abstract description 16
- 238000012549 training Methods 0.000 claims description 32
- 238000012360 testing method Methods 0.000 claims description 18
- 238000012545 processing Methods 0.000 claims description 13
- 238000010606 normalization Methods 0.000 claims description 10
- 238000012706 support-vector machine Methods 0.000 claims description 8
- 230000000694 effects Effects 0.000 claims description 5
- 238000011156 evaluation Methods 0.000 claims description 4
- 238000007781 pre-processing Methods 0.000 claims description 4
- 238000013145 classification model Methods 0.000 claims description 3
- 238000013138 pruning Methods 0.000 claims description 3
- 238000012795 verification Methods 0.000 claims description 2
- 230000000717 retained effect Effects 0.000 claims 1
- 238000004422 calculation algorithm Methods 0.000 abstract description 20
- 230000006870 function Effects 0.000 abstract description 9
- 238000011434 tangent normalization method Methods 0.000 abstract description 2
- 239000000284 extract Substances 0.000 abstract 1
- 238000001914 filtration Methods 0.000 description 3
- 238000000546 chi-square test Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000007635 classification algorithm Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 238000002790 cross-validation Methods 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 239000002699 waste material Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110758265.8A CN113657106B (zh) | 2021-07-05 | 2021-07-05 | Feature selection method based on normalized term-frequency weights |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110758265.8A CN113657106B (zh) | 2021-07-05 | 2021-07-05 | Feature selection method based on normalized term-frequency weights |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113657106A (zh) | 2021-11-16 |
CN113657106B (zh) | 2024-06-21 |
Family
ID=78477929
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110758265.8A Active CN113657106B (zh) | Feature selection method based on normalized term-frequency weights | 2021-07-05 | 2021-07-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113657106B (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115034197A (zh) * | 2022-06-30 | 2022-09-09 | Lenovo (Beijing) Co., Ltd. | Data processing method, device, and electronic equipment |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060136410A1 (en) * | 2004-12-17 | 2006-06-22 | Xerox Corporation | Method and apparatus for explaining categorization decisions |
US20070294223A1 (en) * | 2006-06-16 | 2007-12-20 | Technion Research And Development Foundation Ltd. | Text Categorization Using External Knowledge |
EP2570970A1 (en) * | 2011-09-16 | 2013-03-20 | Technische Universität Berlin | Method and system for the automatic analysis of an image of a biological sample |
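KR101363335B1 (ko) * | 2012-09-19 | 2014-02-19 | Soongsil University Industry-Academic Cooperation Foundation | Apparatus and method for generating a document classification model |
CN105224695A (zh) * | 2015-11-12 | 2016-01-06 | Central South University | Text feature quantization method and device based on information entropy, and text classification method and device |
WO2018028065A1 (zh) * | 2016-08-11 | 2018-02-15 | ZTE Corporation | Short message classification method and device, and computer storage medium |
CN108108462A (zh) * | 2017-12-29 | 2018-06-01 | Henan University of Science and Technology | Text sentiment analysis method based on feature classification |
CN109142317A (zh) * | 2018-08-29 | 2019-01-04 | Xiamen University | Raman spectrum substance identification method based on a random forest model |
CN111382273A (zh) * | 2020-03-09 | 2020-07-07 | Xi'an University of Technology | Text classification method with feature selection based on attraction factors |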
- 2021-07-05: CN application CN202110758265.8A, patent CN113657106B (zh), status Active
Also Published As
Publication number | Publication date |
---|---|
CN113657106B (zh) | 2024-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
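CN107193959B (zh) | Enterprise entity classification method for plain text | |
CN107391772B (zh) | Text classification method based on naive Bayes | |
CN108898479B (zh) | Method and device for constructing a credit evaluation model | |
CN111914090B (zh) | Method and device for enterprise industry classification identification and characteristic pollutant identification | |
CN110866819A (zh) | Automatic credit scorecard generation method based on meta-learning | |
CN110188047B (zh) | Duplicate defect report detection method based on a dual-channel convolutional neural network | |
CN105975518B (zh) | Text classification system and method using expected cross-entropy feature selection based on information entropy | |
CN109271517B (zh) | IG TF-IDF text feature vector generation and text classification method | |
CN105373606A (zh) | Imbalanced data sampling method under an improved C4.5 decision tree algorithm | |
CN104834940A (zh) | Disease classification method for medical imaging examinations based on support vector machine | |
CN111177010B (zh) | Software defect severity identification method | |
CN103995876A (zh) | Text classification method based on chi-square statistics and the SMO algorithm | |
CN108647729B (zh) | User profile acquisition method | |
CN108766464B (zh) | Automatic digital audio tampering detection method based on power grid frequency fluctuation supervectors | |
CN111144106A (zh) | Two-stage text feature selection method for imbalanced data sets | |
CN111539451A (zh) | Sample data optimization method, device, equipment, and storage medium | |
CN107016416B (zh) | Data classification and prediction method based on fusion of neighborhood rough sets and PCA | |
CN111338950A (zh) | Software defect feature selection method based on spectral clustering | |
CN115271442A (zh) | Modeling method and system for evaluating enterprise growth based on natural language | |
CN106951728B (zh) | Tumor key gene identification method based on particle swarm optimization and scoring criteria | |
CN113657106B (zh) | Feature selection method based on normalized term-frequency weights | |
CN105894032A (zh) | Method for extracting effective features according to sample properties | |
CN117454873A (zh) | Sarcasm detection method and system based on a knowledge-enhanced neural network model | |
CN113792141B (zh) | Feature selection method based on covariance measurement factor | |
CN113704464B (zh) | Method and system for constructing a corpus of current-affairs commentary essay material based on online news |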
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20240409; Address after: 518000 1002, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province; Applicant after: Shenzhen Wanzhida Technology Co.,Ltd.; Country or region after: China; Address before: 710048 Shaanxi province Xi'an Beilin District Jinhua Road No. 5; Applicant before: XI'AN University OF TECHNOLOGY; Country or region before: China |
TA01 | Transfer of patent application right | Effective date of registration: 20240524; Address after: Room 304, 3rd Floor, Building 21, Zone 2, Tiantong Zhongyuan, Dongxiaokou Town, Changping District, Beijing, 100000; Applicant after: It's Also A Pleasure For Youpeng (Beijing) Technology Co.,Ltd.; Country or region after: China; Address before: 518000 1002, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province; Applicant before: Shenzhen Wanzhida Technology Co.,Ltd.; Country or region before: China |
GR01 | Patent grant | |