CN109033212A - A text classification method based on similarity matching - Google Patents
A text classification method based on similarity matching
- Publication number
- CN109033212A (application number CN201810704164.0A)
- Authority
- CN
- China
- Prior art keywords
- text
- server
- similarity
- webpage
- candidate sentences
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810704164.0A CN109033212B (zh) | 2018-07-01 | 2018-07-01 | A text classification method based on similarity matching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810704164.0A CN109033212B (zh) | 2018-07-01 | 2018-07-01 | A text classification method based on similarity matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109033212A (zh) | 2018-12-18 |
CN109033212B (zh) | 2021-09-07 |
Family
ID=65521108
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810704164.0A (Active) CN109033212B (zh) | 2018-07-01 | 2018-07-01 | A text classification method based on similarity matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033212B (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
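CN110750493A (zh) * | 2019-09-03 | 2020-02-04 | 平安科技(深圳)有限公司 | Legal text archiving method, apparatus, readable storage medium, and terminal device |
CN110941719A (zh) * | 2019-12-02 | 2020-03-31 | 中国银行股份有限公司 | Data classification method, test method, apparatus, and storage medium |
CN111177372A (zh) * | 2019-12-06 | 2020-05-19 | 绍兴市上虞区理工高等研究院 | A method, apparatus, device, and medium for classifying scientific and technological achievements |
CN111414765A (zh) * | 2020-03-20 | 2020-07-14 | 北京百度网讯科技有限公司 | Sentence consistency determination method, apparatus, electronic device, and readable storage medium |
WO2021092871A1 (zh) * | 2019-11-13 | 2021-05-20 | 北京数字联盟网络科技有限公司 | A TextRank-based application preference text classification method |
CN115037739A (zh) * | 2022-06-13 | 2022-09-09 | 深圳乐播科技有限公司 | File transmission method, apparatus, electronic device, and storage medium |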
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120123768A1 (en) * | 2000-09-30 | 2012-05-17 | Weiquan Liu | Method and apparatus for determining text passage similarity |
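CN104750833A (zh) * | 2015-04-03 | 2015-07-01 | 浪潮集团有限公司 | A text classification method and apparatus |
CN105095223A (zh) * | 2014-04-25 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Text classification method and server |
CN106503184A (zh) * | 2016-10-24 | 2017-03-15 | 海信集团有限公司 | Method and apparatus for determining the service category to which a target text belongs |
CN107436875A (zh) * | 2016-05-25 | 2017-12-05 | 华为技术有限公司 | Text classification method and apparatus |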
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120123768A1 (en) * | 2000-09-30 | 2012-05-17 | Weiquan Liu | Method and apparatus for determining text passage similarity |
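CN105095223A (zh) * | 2014-04-25 | 2015-11-25 | 阿里巴巴集团控股有限公司 | Text classification method and server |
CN104750833A (zh) * | 2015-04-03 | 2015-07-01 | 浪潮集团有限公司 | A text classification method and apparatus |
CN107436875A (zh) * | 2016-05-25 | 2017-12-05 | 华为技术有限公司 | Text classification method and apparatus |
CN106503184A (zh) * | 2016-10-24 | 2017-03-15 | 海信集团有限公司 | Method and apparatus for determining the service category to which a target text belongs |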
Non-Patent Citations (2)
Title |
---|
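乔少杰 et al.: "A comprehensive web page scoring method based on centrality and PageRank", Journal of Southwest Jiaotong University * |
杨茂: "Research on text comparison algorithms based on sentence similarity", China Master's Theses Full-text Database, Information Science and Technology * |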
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
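CN110750493A (zh) * | 2019-09-03 | 2020-02-04 | 平安科技(深圳)有限公司 | Legal text archiving method, apparatus, readable storage medium, and terminal device |
CN110750493B (zh) * | 2019-09-03 | 2022-08-09 | 平安科技(深圳)有限公司 | Legal text archiving method, apparatus, readable storage medium, and terminal device |
WO2021092871A1 (zh) * | 2019-11-13 | 2021-05-20 | 北京数字联盟网络科技有限公司 | A TextRank-based application preference text classification method |
CN110941719A (zh) * | 2019-12-02 | 2020-03-31 | 中国银行股份有限公司 | Data classification method, test method, apparatus, and storage medium |
CN110941719B (zh) * | 2019-12-02 | 2023-12-19 | 中国银行股份有限公司 | Data classification method, test method, apparatus, and storage medium |
CN111177372A (zh) * | 2019-12-06 | 2020-05-19 | 绍兴市上虞区理工高等研究院 | A method, apparatus, device, and medium for classifying scientific and technological achievements |
CN111414765A (zh) * | 2020-03-20 | 2020-07-14 | 北京百度网讯科技有限公司 | Sentence consistency determination method, apparatus, electronic device, and readable storage medium |
CN115037739A (zh) * | 2022-06-13 | 2022-09-09 | 深圳乐播科技有限公司 | File transmission method, apparatus, electronic device, and storage medium |
CN115037739B (zh) * | 2022-06-13 | 2024-02-23 | 深圳乐播科技有限公司 | File transmission method, apparatus, electronic device, and storage medium |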
Also Published As
Publication number | Publication date |
---|---|
CN109033212B (zh) | 2021-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
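CN109033212A (zh) | | A text classification method based on similarity matching |
CN105824959B (zh) | | Public opinion monitoring method and system |
CN108920633B (zh) | | A method for detecting paper similarity |
CN112347244B (zh) | | Method for detecting pornography- and gambling-related websites based on hybrid feature analysis |
CN109582704B (zh) | | Method for matching job postings with job-seeker résumés |
CN107463658B (zh) | | Text classification method and apparatus |
CN108009135B (zh) | | Method and apparatus for generating a document summary |
US20170091318A1 (en) | | Apparatus and method for extracting keywords from a single document |
CN111104526A (zh) | | A financial label extraction method and system based on keyword semantics |
CN104866558B (zh) | | A social network account mapping model training method, mapping method, and system |
CN108038099B (zh) | | Low-frequency keyword identification method based on word clustering |
US20150286706A1 (en) | | Forensic system, forensic method, and forensic program |
CN110287314A (zh) | | Long-text credibility assessment method and system based on unsupervised clustering |
CN105354184B (zh) | | A method for automatic document classification using an optimized vector space model |
CN110928986A (zh) | | Method, apparatus, device, and storage medium for ranking and recommending legal evidence |
CN108897861A (zh) | | An information search method |
CN110910175A (zh) | | A method for generating profiles of tourism ticket products |
CN113486664A (zh) | | Text data visual analysis method, apparatus, device, and storage medium |
CN115577095A (zh) | | A graph-theory-based method for recommending electric power standard information |
CN114706949A (zh) | | Information retrieval method, apparatus, device, and computer-readable medium |
CN104462065B (zh) | | Method and apparatus for analyzing event sentiment type |
CN104462439B (zh) | | Event identification method and apparatus |
CN109033093A (zh) | | A text translation method based on similarity matching |
CN108959263B (zh) | | A method and apparatus for training a term weight calculation model |
CN109002508B (zh) | | A web-crawler-based method for crawling text information |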
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20210811; Address after: 200000 No. 7, Lane 999, Huanke Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai; Applicant after: Shanghai new sunfaith intellectual property services Limited by Share Ltd.; Address before: Room 403, No.35, Sanxiang, xiashou new village, Xicheng District, Dongguan City, Guangdong Province 523073; Applicant before: DONGGUAN HUARUI ELECTRONIC TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20221227; Address after: 523000 Room 1702, Building 13, No.1 Xuefu Road, Songshanhu Park, Dongguan, Guangdong; Patentee after: Guangdong Huazhong Yuechuang Intellectual Property Operation Management Co.,Ltd.; Address before: 200000 No. 7, Lane 999, Huanke Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai; Patentee before: Shanghai new sunfaith intellectual property services Limited by Share Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20230112; Address after: 523000 Room 102, Building 1, No. 90, Dapantian Road, Dalingshan Town, Dongguan City, Guangdong Province; Patentee after: Dongguan Maike Microoptoelectronics Technology Co.,Ltd.; Address before: 523000 Room 1702, Building 13, No.1 Xuefu Road, Songshanhu Park, Dongguan, Guangdong; Patentee before: Guangdong Huazhong Yuechuang Intellectual Property Operation Management Co.,Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20230613; Address after: No.28 Shenpujing Road, Zhujing Town, Jinshan District, Shanghai, 201500 (Jinshan Capital Group North Economic Park); Patentee after: Shanghai Nuozhu Intellectual Property Services Co.,Ltd.; Address before: 523000 Room 102, Building 1, No. 90, Dapantian Road, Dalingshan Town, Dongguan City, Guangdong Province; Patentee before: Dongguan Maike Microoptoelectronics Technology Co.,Ltd. |