CN110929002B - Method, apparatus, terminal, and computer-readable storage medium for deduplicating similar articles - Google Patents
Method, apparatus, terminal, and computer-readable storage medium for deduplicating similar articles
- Publication number
- CN110929002B (application CN201811022629.0A)
- Authority
- CN
- China
- Prior art keywords
- articles
- processed
- similar
- mode
- article
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811022629.0A CN110929002B (zh) | 2018-09-03 | 2018-09-03 | Method, apparatus, terminal, and computer-readable storage medium for deduplicating similar articles |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811022629.0A CN110929002B (zh) | 2018-09-03 | 2018-09-03 | Method, apparatus, terminal, and computer-readable storage medium for deduplicating similar articles |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110929002A (zh) | 2020-03-27 |
CN110929002B (zh) | 2022-10-11 |
Family
ID=69854951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811022629.0A Active CN110929002B (zh) | Method, apparatus, terminal, and computer-readable storage medium for deduplicating similar articles | 2018-09-03 | 2018-09-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110929002B (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114328884B (zh) * | 2021-12-03 | 2024-07-09 | 腾讯科技(深圳)有限公司 | Image-text deduplication method and apparatus |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8645335B2 (en) * | 2010-12-16 | 2014-02-04 | Microsoft Corporation | Partial recall of deduplicated files |
CN103207905B (zh) * | 2013-03-28 | 2015-12-23 | 大连理工大学 | Method for computing text similarity based on a target text |
CN103543959B (zh) * | 2013-10-08 | 2016-12-07 | 深圳国泰安教育技术股份有限公司 | Method and apparatus for caching massive data |
CN106933878B (zh) * | 2015-12-30 | 2021-02-05 | 腾讯科技(北京)有限公司 | Information processing method and apparatus |
CN107632984A (zh) * | 2016-07-18 | 2018-01-26 | 阿里巴巴集团控股有限公司 | Method, apparatus, and system for presenting a clustered data table |
CN106326388A (zh) * | 2016-08-17 | 2017-01-11 | 乐视控股(北京)有限公司 | Information processing method and apparatus |
CN106570066B (zh) * | 2016-10-11 | 2020-07-17 | 北京网诺星云科技有限公司 | File monitoring method and system |
CN111262953B (zh) * | 2016-12-26 | 2022-09-02 | 北京五八信息技术有限公司 | Method and apparatus for pushing information in real time |
CN106844143A (zh) * | 2016-12-27 | 2017-06-13 | 微梦创科网络科技(中国)有限公司 | Log deduplication processing method and apparatus |
CN107315799A (zh) * | 2017-06-19 | 2017-11-03 | 重庆誉存大数据科技有限公司 | Method and system for screening duplicate Internet information |
CN107992470A (zh) * | 2017-11-08 | 2018-05-04 | 中国科学院计算机网络信息中心 | Similarity-based text duplicate-checking method and system |
- 2018-09-03: CN application CN201811022629.0A, patent CN110929002B (zh), status Active
Also Published As
Publication number | Publication date |
---|---|
CN110929002A (zh) | 2020-03-27 |
Similar Documents
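Publication | Title |
---|---|
US10423648B2 (en) | Method, system, and computer readable medium for interest tag recommendation |
CN107180093B (zh) | Information search method and apparatus, and time-sensitive query term identification method and apparatus |
US20190311009A1 (en) | Method and system for providing context based query suggestions |
US20150278345A1 (en) | Method, apparatus, and server for acquiring recommended topic |
JP7451747B2 (ja) | Method, apparatus, device, and computer-readable storage medium for retrieving content |
CN111008321A (zh) | Logistic-regression-based recommendation method, apparatus, computing device, and readable storage medium |
CN110334356A (zh) | Method for determining article quality, article screening method, and corresponding apparatuses |
CN105302807B (zh) | Method and apparatus for obtaining information categories |
CN111191178A (zh) | Information push method, apparatus, server, and storage medium |
CN111198961A (zh) | Product search method, apparatus, and server |
CN111859133A (zh) | Recommendation method, and method and apparatus for publishing an online prediction model |
CN107357794B (zh) | Method and apparatus for optimizing the data storage structure of a key-value database |
EP3706014A1 (en) | Methods, apparatuses, devices, and storage media for content retrieval |
CN110827101A (zh) | Method and apparatus for store recommendation |
CN110929002B (zh) | Method, apparatus, terminal, and computer-readable storage medium for deduplicating similar articles |
JP2007528531A (ja) | Search service system and method for providing input rankings of keywords by category |
US9547701B2 (en) | Method of discovering and exploring feature knowledge |
WO2008050108A1 (en) | Fast database matching |
CN107169065B (zh) | Method and apparatus for removing specific content |
CN108170664B (zh) | Keyword expansion method and apparatus based on focus keywords |
JP2020525949A (ja) | Media search method and apparatus |
CN111143582B (zh) | Multimedia resource recommendation method and apparatus with dual-index real-time updating of associated words |
CN113032436B (zh) | Search method and apparatus based on article content and title |
CN111723201A (zh) | Method and apparatus for clustering text data |
CN114610960A (zh) | Real-time recommendation method based on item2vec and vector clustering |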
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2020-04-17; Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province; Applicant after: Alibaba (China) Co.,Ltd.; Address before: 510000 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping square B radio tower 12 layer self unit 01; Applicant before: GUANGZHOU SHENMA MOBILE INFORMATION TECHNOLOGY Co.,Ltd. |
CB02 | Change of applicant information | Address after: Room 554, 5 / F, building 3, 969 Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province; Applicant after: Alibaba (China) Co.,Ltd.; Address before: 310052 room 508, 5th floor, building 4, No. 699 Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province; Applicant before: Alibaba (China) Co.,Ltd. |
TA01 | Transfer of patent application right | Effective date of registration: 2022-09-15; Address after: 510665 Room 302, Room 301, No. 38, Gaopu Road, Tianhe District, Guangzhou, Guangdong; Applicant after: UC MOBILE (CHINA) Co.,Ltd.; Address before: Room 554, 5 / F, building 3, 969 Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province; Applicant before: Alibaba (China) Co.,Ltd. |
GR01 | Patent grant | |