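CN113032573B - Large-scale text classification method and system combining topic semantics with the TF*IDF algorithm - Google Patents
Large-scale text classification method and system combining topic semantics with the TF*IDF algorithm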
- Publication number: CN113032573B (application CN202110481459.8A)
- Authority: CN (China)
- Prior art keywords: text, topic, classification, feature, word
- Prior art date
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110481459.8A CN113032573B (zh) | 2021-04-30 | 2021-04-30 | Large-scale text classification method and system combining topic semantics with the TF*IDF algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110481459.8A CN113032573B (zh) | 2021-04-30 | 2021-04-30 | Large-scale text classification method and system combining topic semantics with the TF*IDF algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113032573A (zh) | 2021-06-25 |
CN113032573B (zh) | 2024-01-23 |
Family
ID=76454814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110481459.8A Active CN113032573B (zh) | Large-scale text classification method and system combining topic semantics with the TF*IDF algorithm | 2021-04-30 | 2021-04-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113032573B (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
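CN113360658A (zh) * | 2021-07-14 | 2021-09-07 | Fujian Yirong Information Technology Co., Ltd. | Automatic text classification method for audit business |
CN116701812B (zh) * | 2023-08-03 | 2023-11-28 | Chinese Academy of Surveying and Mapping | Block-unit-based topic classification method for geographic information web page text |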
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
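CN102622373A (zh) * | 2011-01-31 | 2012-08-01 | Institute of Acoustics, Chinese Academy of Sciences | Statistical text classification system and method based on the TF*IDF algorithm |
CN103914445A (zh) * | 2014-03-05 | 2014-07-09 | Academy of Armored Forces Engineering of the Chinese People's Liberation Army | Data semantic processing method |
CN108090231A (zh) * | 2018-01-12 | 2018-05-29 | Beijing Institute of Technology | Topic model optimization method based on information entropy |
CN109408641A (zh) * | 2018-11-22 | 2019-03-01 | Shandong Technology and Business University | Text classification method and system based on a supervised topic model |
WO2019200806A1 (zh) * | 2018-04-20 | 2019-10-24 | Ping An Technology (Shenzhen) Co., Ltd. | Device and method for generating a text classification model, and computer-readable storage medium |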
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
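CN102622373A (zh) * | 2011-01-31 | 2012-08-01 | Institute of Acoustics, Chinese Academy of Sciences | Statistical text classification system and method based on the TF*IDF algorithm |
CN103914445A (zh) * | 2014-03-05 | 2014-07-09 | Academy of Armored Forces Engineering of the Chinese People's Liberation Army | Data semantic processing method |
CN108090231A (zh) * | 2018-01-12 | 2018-05-29 | Beijing Institute of Technology | Topic model optimization method based on information entropy |
WO2019200806A1 (zh) * | 2018-04-20 | 2019-10-24 | Ping An Technology (Shenzhen) Co., Ltd. | Device and method for generating a text classification model, and computer-readable storage medium |
CN109408641A (zh) * | 2018-11-22 | 2019-03-01 | Shandong Technology and Business University | Text classification method and system based on a supervised topic model |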
Non-Patent Citations (1)
Title |
---|
Application of the LDA model in patent text classification; Liao Liefa, Le Fugang, Zhu Yalan; Modern Information (现代情报) (No. 03); 1-5 * |
Also Published As
Publication number | Publication date |
---|---|
CN113032573A (zh) | 2021-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
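CN110717047B (zh) | | Web service classification method based on graph convolutional neural network |
CN107609121B (zh) | | News text classification method based on LDA and word2vec algorithms |
Wang et al. | | Local probabilistic models for link prediction |
Gao et al. | | Application of improved distributed naive Bayesian algorithms in text classification |
CN110543564B (zh) | | Domain label acquisition method based on topic model |
Shadgara et al. | | Ontology alignment using machine learning techniques |
WO2021068683A1 (zh) | | Regular expression generation method and apparatus, server, and computer-readable storage medium |
CN101763431A (zh) | | PL clustering processing method based on massive network public opinion information |
CN108647322B (zh) | | Method for identifying the similarity of large volumes of Web text information based on a word network |
Brucker et al. | | Multi-label classification and extracting predicted class hierarchies |
CN107180075A (zh) | | Automatic label generation method integrating text classification with hierarchical clustering analysis |
CN110750635A (zh) | | Legal article recommendation method based on a joint deep learning model |
CN111061939B (zh) | | Deep-learning-based keyword matching and recommendation method for scientific research and academic news |
CN113032573B (zh) | | Large-scale text classification method and system combining topic semantics with the TF*IDF algorithm |
Bhutada et al. | | Semantic latent dirichlet allocation for automatic topic extraction |
Hu et al. | | EGC: A novel event-oriented graph clustering framework for social media text |
CN118114658A (zh) | | Data retrieval intent recognition method for complex power grid dispatching and control services |
Chow et al. | | A new document representation using term frequency and vectorized graph connectionists with application to document retrieval |
CN119558307B (zh) | | Method for creating structured documents based on a deep learning model |
He et al. | | Modeling document networks with tree-averaged copula regularization |
CN110348497B (zh) | | Text representation method constructed on WT-GloVe word vectors |
Perumal | | Document Clustering Using Graph Based Fuzzy Association Rule Generation. |
CN114298020A (zh) | | Keyword vectorization method based on topic semantic information and application thereof |
CN109871429B (zh) | | Short text retrieval method fusing Wikipedia categories and explicit semantic features |
CN115563284B (zh) | | Semantics-based deep multi-instance weakly supervised text classification method |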
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | TA01 | Transfer of patent application right | Effective date of registration: 2023-08-15. Address after: Rooms B201, B202, B203, B205, B206, B207, B208, B209, B210, 2nd Floor, Building B-2, Zhongguancun Dongsheng Science and Technology Park, No. 66 Xixiaokou Road, Haidian District, Beijing, 100084 (Dongsheng District). Applicant after: TONGFANG KNOWLEDGE NETWORK DIGITAL PUBLISHING TECHNOLOGY CO.,LTD. Applicant after: CHINA ACADEMIC JOURNALS ELECTRONIC PUBLISHING HOUSE CO.,LTD. Address before: 100084 Qinghua garden, Haidian District, Beijing. Applicant before: CHINA ACADEMIC JOURNALS ELECTRONIC PUBLISHING HOUSE CO.,LTD. |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | Address after: Room B201, B202, B203, B205, B206, B207, B208, B209, B210, 2nd Floor, Building B-2, Zhongguancun Dongsheng Science and Technology Park, No. 66 Xixiaokou Road, Haidian District, Beijing (Dongsheng area). Patentee after: Tongfangzhiwang Digital Technology Co.,Ltd. Country or region after: China. Patentee after: CHINA ACADEMIC JOURNALS ELECTRONIC PUBLISHING HOUSE CO.,LTD. Address before: Room B201, B202, B203, B205, B206, B207, B208, B209, B210, 2nd Floor, Building B-2, Zhongguancun Dongsheng Science and Technology Park, No. 66 Xixiaokou Road, Haidian District, Beijing (Dongsheng area). Patentee before: TONGFANG KNOWLEDGE NETWORK DIGITAL PUBLISHING TECHNOLOGY CO.,LTD. Country or region before: China. Patentee before: CHINA ACADEMIC JOURNALS ELECTRONIC PUBLISHING HOUSE CO.,LTD. |