CN1403959A - Content filter based on comparison of text content feature similarity and subject relevance - Google Patents
Content filter based on comparison of text content feature similarity and subject relevance
- Publication number
- CN1403959A
- Authority
- CN
- China
- Prior art keywords
- text
- similarity
- content
- comparison
- filter based
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (44)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011314206A CN1168031C (zh) | 2001-09-07 | 2001-09-07 | Content filter based on comparison of text content feature similarity and subject relevance |
PCT/CN2002/000346 WO2003038667A1 (fr) | 2001-09-07 | 2002-05-23 | Content filter operating by comparison between the similarity of content characters and the correlation of subject matter |
US10/488,731 US7617090B2 (en) | 2001-09-07 | 2002-05-23 | Contents filter based on the comparison between similarity of content character and correlation of subject matter |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011314206A CN1168031C (zh) | 2001-09-07 | 2001-09-07 | Content filter based on comparison of text content feature similarity and subject relevance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1403959A (zh) | 2003-03-19 |
CN1168031C (zh) | 2004-09-22 |
Family ID: 4670569
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011314206A Expired - Fee Related CN1168031C (zh) | 2001-09-07 | 2001-09-07 | Content filter based on comparison of text content feature similarity and subject relevance |
Country Status (3)
Country | Link |
---|---|
US (1) | US7617090B2 (zh) |
CN (1) | CN1168031C (zh) |
WO (1) | WO2003038667A1 (zh) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
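CN101876968A (zh) * | 2010-05-06 | 2010-11-03 | Fudan University | Method for identifying objectionable content in web text and mobile phone SMS messages |
CN102411611A (zh) * | 2011-10-15 | 2012-04-11 | Xi'an Jiaotong University | Event identification and tracking method for instant interactive text |
CN104615714A (zh) * | 2015-02-05 | 2015-05-13 | Beijing Zhongsou Network Technology Co., Ltd. | Blog post deduplication method based on text similarity and microblog channel features |
CN105320641A (zh) * | 2014-07-30 | 2016-02-10 | Tencent Technology (Shenzhen) Co., Ltd. | Text verification method and user terminal |
CN105355214A (zh) * | 2011-08-19 | 2016-02-24 | Dolby Laboratories Licensing Corporation | Method and device for measuring similarity |
CN106611042A (zh) * | 2016-09-29 | 2017-05-03 | Sichuan Yonglian Information Technology Co., Ltd. | A new method for extracting text feature vocabulary |
US9755616B2 (en) | 2014-06-30 | 2017-09-05 | Huawei Technologies Co., Ltd. | Method and apparatus for data filtering, and method and apparatus for constructing data filter |
WO2018041036A1 (zh) * | 2016-08-29 | 2018-03-08 | ZTE Corporation | Keyword search method, device and terminal |
CN108427954A (zh) * | 2018-03-19 | 2018-08-21 | Shanghai Yimo Graphic Design and Production Co., Ltd. | Signage information collection and recognition system |
CN112560457A (zh) * | 2020-12-04 | 2021-03-26 | Shanghai Fengzhi Technology Co., Ltd. | Unsupervised text denoising method, system, electronic device and storage medium |
US11562145B2 (en) * | 2018-02-01 | 2023-01-24 | Tencent Technology (Shenzhen) Company Limited | Text classification method, computer device, and storage medium |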
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8769127B2 (en) * | 2006-02-10 | 2014-07-01 | Northrop Grumman Systems Corporation | Cross-domain solution (CDS) collaborate-access-browse (CAB) and assured file transfer (AFT) |
AU2007324329B2 (en) * | 2006-11-20 | 2012-01-12 | Squiz Pty Ltd | Annotation index system and method |
US8281361B1 (en) * | 2009-03-26 | 2012-10-02 | Symantec Corporation | Methods and systems for enforcing parental-control policies on user-generated content |
WO2011137386A1 (en) * | 2010-04-30 | 2011-11-03 | Orbis Technologies, Inc. | Systems and methods for semantic search, content correlation and visualization |
US9633003B2 (en) * | 2012-12-18 | 2017-04-25 | International Business Machines Corporation | System support for evaluation consistency |
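CN104008098B (zh) * | 2013-02-21 | 2018-09-18 | Tencent Technology (Shenzhen) Co., Ltd. | Text filtering method and device based on polysemous keywords |
CN103235773B (zh) * | 2013-04-26 | 2019-02-12 | Baidu Online Network Technology (Beijing) Co., Ltd. | Keyword-based tag extraction method and device for text |
CN105630766B (zh) * | 2015-12-22 | 2018-11-06 | Beijing Qihoo Technology Co., Ltd. | Method and device for calculating correlation among multiple news items |
CN105654113B (zh) * | 2015-12-23 | 2020-02-21 | Beijing Qihoo Technology Co., Ltd. | Method and device for generating article fingerprint features |
CN107133202A (zh) * | 2017-06-01 | 2017-09-05 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Artificial-intelligence-based text verification method and device |
CN109062905B (zh) * | 2018-09-04 | 2022-06-24 | Wuhan Douyu Network Technology Co., Ltd. | Method, device, equipment and medium for evaluating the value of bullet-screen comment text |
KR102194281B1 (ko) * | 2019-01-14 | 2020-12-22 | Park Jun-hee | System and method for producing music broadcast content combining audio sources and video by era |
CN110717028B (zh) * | 2019-10-18 | 2022-02-15 | Alipay (Hangzhou) Information Technology Co., Ltd. | Method and system for removing interfering question pairs |
CN111159115A (zh) * | 2019-12-27 | 2020-05-15 | Sangfor Technologies Inc. | Similar file detection method, device, equipment and storage medium |
CN113553825B (zh) * | 2021-07-23 | 2023-03-21 | Anhui Shangxin Zhengtong Information Technology Co., Ltd. | Method and system for analyzing contextual relationships among electronic official documents |
CN114928472B (zh) * | 2022-04-20 | 2023-07-18 | Harbin Institute of Technology (Weihai) | Gray-list filtering method for objectionable sites based on full-volume circulating primary domain names |
CN117234632A (zh) * | 2022-06-07 | 2023-12-15 | Inventec (Pudong) Technology Corporation | Value-added content providing method and computer system thereof |
CN118278817B (zh) * | 2024-04-17 | 2024-09-06 | Guangdong Eco-Engineering Polytechnic | Study-tour effect evaluation method and system based on word mover's distance |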
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2950222B2 (ja) | 1996-01-12 | 1999-09-20 | NEC Corporation | Information retrieval system |
JP2800769B2 (ja) * | 1996-03-29 | 1998-09-21 | NEC Corporation | Information filtering system |
US6092091A (en) * | 1996-09-13 | 2000-07-18 | Kabushiki Kaisha Toshiba | Device and method for filtering information, device and method for monitoring updated document information and information storage medium used in same devices |
US5996011A (en) * | 1997-03-25 | 1999-11-30 | Unified Research Laboratories, Inc. | System and method for filtering data received by a computer system |
US6266664B1 (en) * | 1997-10-01 | 2001-07-24 | Rulespace, Inc. | Method for scanning, analyzing and rating digital information content |
US6233618B1 (en) * | 1998-03-31 | 2001-05-15 | Content Advisor, Inc. | Access control of networked data |
US6493744B1 (en) * | 1999-08-16 | 2002-12-10 | International Business Machines Corporation | Automatic rating and filtering of data files for objectionable content |
US6633855B1 (en) * | 2000-01-06 | 2003-10-14 | International Business Machines Corporation | Method, system, and program for filtering content using neural networks |
US20030009495A1 (en) * | 2001-06-29 | 2003-01-09 | Akli Adjaoute | Systems and methods for filtering electronic content |
2001
- 2001-09-07 CN CNB011314206A patent/CN1168031C/zh not_active Expired - Fee Related
2002
- 2002-05-23 US US10/488,731 patent/US7617090B2/en active Active
- 2002-05-23 WO PCT/CN2002/000346 patent/WO2003038667A1/zh not_active Application Discontinuation
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
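CN101876968A (zh) * | 2010-05-06 | 2010-11-03 | Fudan University | Method for identifying objectionable content in web text and mobile phone SMS messages |
CN105355214A (zh) * | 2011-08-19 | 2016-02-24 | Dolby Laboratories Licensing Corporation | Method and device for measuring similarity |
CN102411611A (zh) * | 2011-10-15 | 2012-04-11 | Xi'an Jiaotong University | Event identification and tracking method for instant interactive text |
CN102411611B (zh) * | 2011-10-15 | 2013-01-02 | Xi'an Jiaotong University | Event identification and tracking method for instant interactive text |
US9755616B2 (en) | 2014-06-30 | 2017-09-05 | Huawei Technologies Co., Ltd. | Method and apparatus for data filtering, and method and apparatus for constructing data filter |
CN105320641B (zh) * | 2014-07-30 | 2020-04-03 | Tencent Technology (Shenzhen) Co., Ltd. | Text verification method and user terminal |
CN105320641A (zh) * | 2014-07-30 | 2016-02-10 | Tencent Technology (Shenzhen) Co., Ltd. | Text verification method and user terminal |
CN104615714A (zh) * | 2015-02-05 | 2015-05-13 | Beijing Zhongsou Network Technology Co., Ltd. | Blog post deduplication method based on text similarity and microblog channel features |
WO2018041036A1 (zh) * | 2016-08-29 | 2018-03-08 | ZTE Corporation | Keyword search method, device and terminal |
CN106611042A (zh) * | 2016-09-29 | 2017-05-03 | Sichuan Yonglian Information Technology Co., Ltd. | A new method for extracting text feature vocabulary |
US11562145B2 (en) * | 2018-02-01 | 2023-01-24 | Tencent Technology (Shenzhen) Company Limited | Text classification method, computer device, and storage medium |
CN108427954A (zh) * | 2018-03-19 | 2018-08-21 | Shanghai Yimo Graphic Design and Production Co., Ltd. | Signage information collection and recognition system |
CN108427954B (zh) * | 2018-03-19 | 2021-08-27 | Shanghai Yimo Graphic Design and Production Co., Ltd. | Signage information collection and recognition system |
CN112560457A (zh) * | 2020-12-04 | 2021-03-26 | Shanghai Fengzhi Technology Co., Ltd. | Unsupervised text denoising method, system, electronic device and storage medium |
CN112560457B (zh) * | 2020-12-04 | 2024-03-12 | Shanghai Miaozhen Network Technology Co., Ltd. | Unsupervised text denoising method, system, electronic device and storage medium |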
Also Published As
Publication number | Publication date |
---|---|
US20040243537A1 (en) | 2004-12-02 |
CN1168031C (zh) | 2004-09-22 |
WO2003038667A1 (fr) | 2003-05-08 |
US7617090B2 (en) | 2009-11-10 |
Similar Documents
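Publication | Title |
---|---|
CN1168031C (zh) | Content filter based on comparison of text content feature similarity and subject relevance |
CN1215457C (zh) | Sentence recognition device and method |
CN1228762C (zh) | Method, component, device and server for speech recognition |
CN1174332C (zh) | Method and device for converting expressions |
CN1101032C (zh) | Related-word extraction device and method |
CN1158627C (zh) | Method and device for character recognition |
CN101042868A (zh) | Clustering system, method and program, and attribute estimation system using the clustering system |
CN1794266A (zh) | Identity recognition and authentication method based on biometric feature fusion |
CN1495644A (zh) | Evaluating document specificity |
CN1281191A (zh) | Information retrieval method and information retrieval device |
CN1894688A (zh) | Translation-pair determination device, method and program |
CN1452157A (zh) | Speech recognition device and method, and recording medium storing a speech recognition program |
CN1924858A (zh) | Method and device for acquiring new words, and an input method system |
CN1542735A (zh) | System and method for recognizing tonal languages |
CN1818927A (zh) | Fingerprint recognition method and system |
CN1752992A (zh) | Character recognition device, character recognition method and character recognition program |
CN1474379A (zh) | Speech recognition/response system, speech recognition/response program and recording medium therefor |
CN1251130C (zh) | Recognition method for multi-font, multi-size printed Tibetan characters |
CN1282151C (zh) | Speech recognition device and speech recognition method |
CN1776724A (zh) | Network-based automatic evaluation method for engineering drawings |
CN1200387C (zh) | Statistical handwriting identification and verification method based on single characters |
CN1991819A (zh) | Language morphological analyzer |
CN1932819A (zh) | Clustering method, search method and system for Internet audio files |
CN1277398A (zh) | Document retrieval method and device |
CN1647069A (zh) | Dialogue control system and dialogue control method |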
Legal Events
Code | Title | Description |
---|---|---|
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20170428. Address after: 100055 Beijing, Guang'an, No. 305 Xicheng District street, No. two, building 10, floor 9, floor 1112. Patentee after: Lezhi Xinchuang (Beijing) Consulting Service Co., Ltd. Address before: 100085 Beijing, Haidian District information industry base on the road No. 6. Patentee before: Lenovo (Beijing) Co., Ltd. |
TR01 | Transfer of patent right | |
TR01 | Transfer of patent right | Effective date of registration: 20191115. Address after: West Street in the official Zhejiang city of Ningbo province Zhenhai District 315000 Village No. 777. Patentee after: Ningbo Lezhi Yongchuang Technology Service Co., Ltd. Address before: 100055 Beijing, Guang'an, No. 305 Xicheng District street, No. two, building 10, floor 9, floor 1112. Patentee before: Lezhi Xinchuang (Beijing) Consulting Service Co., Ltd. |
TR01 | Transfer of patent right | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20040922. Termination date: 20200907 |
CF01 | Termination of patent right due to non-payment of annual fee | |