CN111382273B - Text classification method based on attraction-factor feature selection - Google Patents
- Publication number
- CN111382273B (application number CN202010158078.1A)
- Authority
- CN
- China
- Prior art keywords
- texts
- attraction
- average
- category
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010158078.1A CN111382273B (zh) | 2020-03-09 | 2020-03-09 | Text classification method based on attraction-factor feature selection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010158078.1A CN111382273B (zh) | 2020-03-09 | 2020-03-09 | Text classification method based on attraction-factor feature selection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111382273A (zh) | 2020-07-07 |
CN111382273B (zh) | 2023-04-14 |
Family
ID=71217271
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010158078.1A Active CN111382273B (zh) | Text classification method based on attraction-factor feature selection | 2020-03-09 | 2020-03-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111382273B (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113657106B (zh) * | 2021-07-05 | 2024-06-21 | 不亦乐乎有朋(北京)科技有限公司 | Feature selection method based on normalized term frequency weights |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
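CN103810264A (zh) * | 2014-01-27 | 2014-05-21 | Xi'an University of Technology | Web page text classification method based on feature selection |
CN105260437A (zh) * | 2015-09-30 | 2016-01-20 | 陈一飞 | Feature selection method for text classification and its application in biomedical text classification |
CN106383853A (zh) * | 2016-08-30 | 2017-02-08 | 刘勇 | Method and system for post-structuring of electronic medical records and assisted diagnosis |
CN107273387A (zh) * | 2016-04-08 | 2017-10-20 | 上海市玻森数据科技有限公司 | Ensemble for classification of high-dimensional and imbalanced data |
WO2018218706A1 (zh) * | 2017-05-27 | 2018-12-06 | China University of Mining and Technology | Method and system for neural-network-based news event extraction |
CN109376235A (zh) * | 2018-07-24 | 2019-02-22 | Xi'an University of Technology | Feature selection method based on document-level term frequency reordering |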
Non-Patent Citations (2)
Title |
---|
"Feature selection algorithm for hierarchical text classification using Kullback-Leibler divergence"; Yao Lifang et al.; IEEE International Conference on Cloud Computing and Big Data Analysis; 20170619; full text * |
"Text filtering method for Uyghur forums based on term selection and a Rocchio classifier"; 如先姑力·阿布都热西提 et al.; Wanfang Data Knowledge Service Platform; 20190612; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN111382273A (zh) | 2020-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Georgakopoulos et al. | Convolutional neural networks for toxic comment classification | |
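CN107944273B (zh) | Malicious PDF document detection method based on the TF-IDF and SVDD algorithms | |
CN111709439B (zh) | Feature selection method based on a term frequency deviation rate factor | |
CN109376235B (zh) | Feature selection method based on document-level term frequency reordering | |
CN112579783B (zh) | Short text clustering method based on the Laplacian graph | |
WO2020063071A1 (zh) | Sentence vector calculation method based on the chi-square test, text classification method, and system | |
CN104881399B (zh) | Event recognition method and system based on probabilistic soft logic (PSL) | |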
Sarwar et al. | An effective and scalable framework for authorship attribution query processing | |
Zhang et al. | Compact representation of high-dimensional feature vectors for large-scale image recognition and retrieval | |
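CN113626604B (zh) | Web page text classification system based on the maximum margin criterion | |
CN111382273B (zh) | Text classification method based on attraction-factor feature selection | |
CN106844596A (zh) | Chinese text classification method based on an improved SVM | |
CN110348497B (zh) | Text representation method based on WT-GloVe word vector construction | |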
Shoryu et al. | A deep neural network approach using convolutional network and long short term memory for text sentiment classification | |
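CN105760471B (zh) | Two-class text classification method based on a combined convex linear perceptron | |
CN117271701A (zh) | Method and system for extracting relations among system operation anomaly events based on TGGAT and CNN | |
CN114996446B (zh) | Text classification method, apparatus, and storage medium | |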
Wang et al. | Learning based neural similarity metrics for multimedia data mining | |
Zhang et al. | Research on classification of scientific and technological documents based on Naive Bayes | |
Selot | Comparative Performance of Random Forest and Support Vector Machine on Sentiment Analysis of Reviews of Indian Tourism | |
Desai et al. | Analysis of Health Care Data Using Natural Language Processing | |
Alamin et al. | Improving Performance Sentiment Movie Review Classification Using Hybrid Feature TFIDF, N-Gram, Information Gain and Support Vector Machine. | |
CN116304058B (zh) | Method, apparatus, electronic device, and storage medium for identifying negative enterprise information | |
Vijayarani et al. | Efficient machine learning classifiers for automatic information classification | |
CN113486176B (zh) | News classification method based on secondary feature amplification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20230313. Address after: Room 501, No. 18, Haizhou Road, Haizhu District, Guangzhou City, Guangdong Province, 510000 (Location: Self made 01) (Office only). Applicant after: Guangzhou Zhiying Wanshi Market Management Co.,Ltd. Address before: 710000 No. B49, Xinda Zhongchuang space, 26th Street, block C, No. 2 Trading Plaza, South China City, international port district, Xi'an, Shaanxi Province. Applicant before: Xi'an Huaqi Zhongxin Technology Development Co.,Ltd. |
TA01 | Transfer of patent application right | Effective date of registration: 20230313. Address after: 710000 No. B49, Xinda Zhongchuang space, 26th Street, block C, No. 2 Trading Plaza, South China City, international port district, Xi'an, Shaanxi Province. Applicant after: Xi'an Huaqi Zhongxin Technology Development Co.,Ltd. Address before: 710048 Shaanxi province Xi'an Beilin District Jinhua Road No. 5. Applicant before: Xi'an University of Technology |
GR01 | Patent grant | ||