CN1168031C - Content filter based on comparison of text content feature similarity and topic relevance - Google Patents
- Publication number: CN1168031C
- Application number: CNB011314206A (CN01131420A)
- Authority: CN (China)
- Prior art keywords: text, content, similarity, filter, speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (36)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011314206A CN1168031C (zh) | 2001-09-07 | 2001-09-07 | Content filter based on comparison of text content feature similarity and topic relevance |
US10/488,731 US7617090B2 (en) | 2001-09-07 | 2002-05-23 | Contents filter based on the comparison between similarity of content character and correlation of subject matter |
PCT/CN2002/000346 WO2003038667A1 (fr) | 2001-09-07 | 2002-05-23 | Content filter operating by comparison between the similarity of content characters and the correlation of the subject matter |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011314206A CN1168031C (zh) | 2001-09-07 | 2001-09-07 | Content filter based on comparison of text content feature similarity and topic relevance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1403959A (zh) | 2003-03-19 |
CN1168031C (zh) | 2004-09-22 |
Family
ID=4670569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB011314206A CN1168031C (zh), Expired - Fee Related | Content filter based on comparison of text content feature similarity and topic relevance | 2001-09-07 | 2001-09-07 |
Country Status (3)
Country | Link |
---|---|
US (1) | US7617090B2 (zh) |
CN (1) | CN1168031C (zh) |
WO (1) | WO2003038667A1 (zh) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8769127B2 (en) * | 2006-02-10 | 2014-07-01 | Northrop Grumman Systems Corporation | Cross-domain solution (CDS) collaborate-access-browse (CAB) and assured file transfer (AFT) |
US8095538B2 (en) * | 2006-11-20 | 2012-01-10 | Funnelback Pty Ltd | Annotation index system and method |
US8281361B1 (en) * | 2009-03-26 | 2012-10-02 | Symantec Corporation | Methods and systems for enforcing parental-control policies on user-generated content |
US8725771B2 (en) * | 2010-04-30 | 2014-05-13 | Orbis Technologies, Inc. | Systems and methods for semantic search, content correlation and visualization |
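CN101876968A (zh) * | 2010-05-06 | 2010-11-03 | 复旦大学 | Method for identifying objectionable content in web text and mobile phone short messages |
CN105355214A (zh) * | 2011-08-19 | 2016-02-24 | 杜比实验室特许公司 | Method and device for measuring similarity |
CN102411611B (zh) * | 2011-10-15 | 2013-01-02 | 西安交通大学 | Event identification and tracking method for instant interactive text |
US9633003B2 (en) * | 2012-12-18 | 2017-04-25 | International Business Machines Corporation | System support for evaluation consistency |
CN104008098B (zh) * | 2013-02-21 | 2018-09-18 | 腾讯科技(深圳)有限公司 | Text filtering method and device based on polysemous keywords |
CN103235773B (zh) * | 2013-04-26 | 2019-02-12 | 百度在线网络技术(北京)有限公司 | Keyword-based text tag extraction method and device |
CN105224569B (zh) | 2014-06-30 | 2018-09-07 | 华为技术有限公司 | Method and device for data filtering and for constructing a data filter |
CN105320641B (zh) * | 2014-07-30 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Text verification method and user terminal |
CN104615714B (zh) * | 2015-02-05 | 2019-05-24 | 北京中搜云商网络技术有限公司 | Blog post deduplication method based on text similarity and microblog channel features |
CN105630766B (zh) * | 2015-12-22 | 2018-11-06 | 北京奇虎科技有限公司 | Method and device for computing relevance among multiple news items |
CN105654113B (zh) * | 2015-12-23 | 2020-02-21 | 北京奇虎科技有限公司 | Method and device for generating article fingerprint features |
CN107798004B (zh) * | 2016-08-29 | 2022-09-30 | 中兴通讯股份有限公司 | Keyword search method, device, and terminal |
CN106611042A (zh) * | 2016-09-29 | 2017-05-03 | 四川用联信息技术有限公司 | A new method for extracting text feature vocabulary |
CN107133202A (zh) * | 2017-06-01 | 2017-09-05 | 北京百度网讯科技有限公司 | Artificial-intelligence-based text verification method and device |
CN108334605B (zh) * | 2018-02-01 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Text classification method, device, computer equipment, and storage medium |
CN108427954B (zh) * | 2018-03-19 | 2021-08-27 | 上海壹墨图文设计制作有限公司 | Sign information acquisition and recognition system |
CN109062905B (zh) * | 2018-09-04 | 2022-06-24 | 武汉斗鱼网络科技有限公司 | Method, device, equipment, and medium for evaluating the value of bullet-screen (danmaku) text |
KR102194281B1 (ko) * | 2019-01-14 | 2020-12-22 | 박준희 | System and method for producing music broadcast content combining sound sources and video by era |
CN110717028B (zh) * | 2019-10-18 | 2022-02-15 | 支付宝(杭州)信息技术有限公司 | Method and system for eliminating interfering question pairs |
CN111159115A (zh) * | 2019-12-27 | 2020-05-15 | 深信服科技股份有限公司 | Similar file detection method, device, equipment, and storage medium |
CN112560457B (zh) * | 2020-12-04 | 2024-03-12 | 上海秒针网络科技有限公司 | Unsupervised text denoising method, system, electronic device, and storage medium |
CN113553825B (zh) * | 2021-07-23 | 2023-03-21 | 安徽商信政通信息技术股份有限公司 | Method and system for analyzing contextual relationships among electronic official documents |
CN114928472B (zh) * | 2022-04-20 | 2023-07-18 | 哈尔滨工业大学(威海) | Greylist filtering method for objectionable sites based on full-traffic primary domain names |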
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2950222B2 (ja) | 1996-01-12 | 1999-09-20 | 日本電気株式会社 | Information retrieval system |
JP2800769B2 (ja) * | 1996-03-29 | 1998-09-21 | 日本電気株式会社 | Information filtering system |
US6092091A (en) | 1996-09-13 | 2000-07-18 | Kabushiki Kaisha Toshiba | Device and method for filtering information, device and method for monitoring updated document information and information storage medium used in same devices |
US5996011A (en) * | 1997-03-25 | 1999-11-30 | Unified Research Laboratories, Inc. | System and method for filtering data received by a computer system |
US6266664B1 (en) * | 1997-10-01 | 2001-07-24 | Rulespace, Inc. | Method for scanning, analyzing and rating digital information content |
US6233618B1 (en) * | 1998-03-31 | 2001-05-15 | Content Advisor, Inc. | Access control of networked data |
US6493744B1 (en) * | 1999-08-16 | 2002-12-10 | International Business Machines Corporation | Automatic rating and filtering of data files for objectionable content |
US6633855B1 (en) * | 2000-01-06 | 2003-10-14 | International Business Machines Corporation | Method, system, and program for filtering content using neural networks |
US20030009495A1 (en) * | 2001-06-29 | 2003-01-09 | Akli Adjaoute | Systems and methods for filtering electronic content |
2001
- 2001-09-07: CN application CNB011314206A, patent CN1168031C (zh), not active (Expired - Fee Related)
2002
- 2002-05-23: US application US10/488,731, patent US7617090B2 (en), active
- 2002-05-23: WO application PCT/CN2002/000346, publication WO2003038667A1 (zh), not active (Application Discontinuation)
Also Published As
Publication number | Publication date |
---|---|
WO2003038667A1 (fr) | 2003-05-08 |
US7617090B2 (en) | 2009-11-10 |
CN1403959A (zh) | 2003-03-19 |
US20040243537A1 (en) | 2004-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
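CN1168031C (zh) | | Content filter based on comparison of text content feature similarity and topic relevance |
CN1101032C (zh) | | Related-word extraction device and method |
CN1109994C (zh) | | Document processing device and recording medium |
CN1145901C (zh) | | Method for constructing intelligent decision support based on information mining |
CN1266624C (zh) | | Learning support system |
CN101079026A (zh) | | Method and system for computing text similarity and word-sense similarity, and application system |
CN1855103A (zh) | | Device and method for specific-element and string vector generation and similarity computation |
CN1215457C (zh) | | Sentence recognition device and method |
CN101042868A (zh) | | Clustering system, method, and program, and attribute estimation system using the clustering system |
CN1316083A (zh) | | Automatic language assessment using speech recognition models |
CN1975858A (zh) | | Conversation control device |
CN1728143A (zh) | | Phrase-based generation of document descriptions |
CN1728141A (zh) | | Phrase-based searching in an information retrieval system |
CN1728142A (zh) | | Phrase identification in an information retrieval system |
CN1310825A (zh) | | Method and device for classifying text and for building a text classifier |
CN1670729A (zh) | | Improved query optimizer using implied predicates |
CN1728140A (zh) | | Phrase-based indexing in an information retrieval system |
CN1536483A (zh) | | Method and system for extracting and processing network information |
CN1701324A (zh) | | System, method, and software for classifying documents |
CN1281191A (zh) | | Information retrieval method and information retrieval device |
CN1406355A (zh) | | Information communication system for performing information communication and information provision |
CN1091906C (zh) | | Pattern recognition method and system, and pattern data processing system |
CN1578955A (zh) | | Sampling method for association-rule data mining |
CN1991819A (zh) | | Language morphological analyzer |
CN1641633A (zh) | | Method for process-term extraction, pattern analysis, and reuse based on mature process documents |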
Legal Events
Code | Title | Description |
---|---|---|
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2017-04-28. Address after: 100055 Beijing, Guang'an, No. 305 Xicheng District street, No. two, building 10, floor 9, floor 1112. Patentee after: New venture (Beijing) Consulting Service Co., Ltd. Address before: 100085 Beijing, Haidian District information industry base on the road No. 6. Patentee before: Lenovo (Beijing) Co., Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 2019-11-15. Address after: West Street in the official Zhejiang city of Ningbo province Zhenhai District 315000 Village No. 777. Patentee after: Ningbo Lezhi Yongchuang Technology Service Co., Ltd. Address before: 100055 Beijing, Guang'an, No. 305 Xicheng District street, No. two, building 10, floor 9, floor 1112. Patentee before: Lezhi Xinchuang (Beijing) Consulting Service Co., Ltd. |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2004-09-22. Termination date: 2020-09-07. |