CN1403959A - Content filter based on comparison of text content feature similarity and topic relevance - Google Patents
- Publication number: CN1403959A (application CN01131420A)
- Authority: CN (China)
- Prior art keywords: content, text, similarity, correlation degree, theme
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Concepts (machine-extracted)
ID | Concept | Type | Score | Found in | Count |
---|---|---|---|---|---|
238000012549 | training | Methods | 0.000 | claims, abstract, description | 123 |
238000001914 | filtration | Methods | 0.000 | claims, abstract, description | 79 |
238000000034 | method | Methods | 0.000 | claims, description | 42 |
238000000605 | extraction | Methods | 0.000 | claims, description | 24 |
230000000694 | effects | Effects | 0.000 | claims, description | 23 |
238000011156 | evaluation | Methods | 0.000 | claims, description | 19 |
239000000284 | extract | Substances | 0.000 | claims, description | 13 |
238000012937 | correction | Methods | 0.000 | claims, description | 7 |
230000009467 | reduction | Effects | 0.000 | claims, description | 3 |
238000000926 | separation method | Methods | 0.000 | claims, description | 3 |
239000000729 | antidote | Substances | 0.000 | claims, description | 2 |
238000004458 | analytical method | Methods | 0.000 | abstract, description | 6 |
230000008569 | process | Effects | 0.000 | description | 14 |
238000010586 | diagram | Methods | 0.000 | description | 7 |
230000015572 | biosynthetic process | Effects | 0.000 | description | 4 |
238000005516 | engineering process | Methods | 0.000 | description | 4 |
230000008878 | coupling | Effects | 0.000 | description | 3 |
238000010168 | coupling process | Methods | 0.000 | description | 3 |
238000005859 | coupling reaction | Methods | 0.000 | description | 3 |
230000006870 | function | Effects | 0.000 | description | 3 |
239000000463 | material | Substances | 0.000 | description | 3 |
238000012545 | processing | Methods | 0.000 | description | 3 |
238000004422 | calculation algorithm | Methods | 0.000 | description | 2 |
238000004364 | calculation method | Methods | 0.000 | description | 2 |
238000009434 | installation | Methods | 0.000 | description | 2 |
238000012360 | testing method | Methods | 0.000 | description | 2 |
238000013459 | approach | Methods | 0.000 | description | 1 |
230000008901 | benefit | Effects | 0.000 | description | 1 |
230000008859 | change | Effects | 0.000 | description | 1 |
238000004891 | communication | Methods | 0.000 | description | 1 |
230000002596 | correlated effect | Effects | 0.000 | description | 1 |
230000000875 | corresponding effect | Effects | 0.000 | description | 1 |
230000007812 | deficiency | Effects | 0.000 | description | 1 |
230000002950 | deficient | Effects | 0.000 | description | 1 |
238000002474 | experimental method | Methods | 0.000 | description | 1 |
230000009931 | harmful effect | Effects | 0.000 | description | 1 |
230000010365 | information processing | Effects | 0.000 | description | 1 |
238000012821 | model calculation | Methods | 0.000 | description | 1 |
238000010561 | standard procedure | Methods | 0.000 | description | 1 |
238000010200 | validation analysis | Methods | 0.000 | description | 1 |
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (44)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011314206A CN1168031C (zh) | 2001-09-07 | 2001-09-07 | Content filter based on comparison of text content feature similarity and topic relevance |
PCT/CN2002/000346 WO2003038667A1 (fr) | 2001-09-07 | 2002-05-23 | Content filter operating by comparison between the similarity of content characters and the correlation of subject matter |
US10/488,731 US7617090B2 (en) | 2001-09-07 | 2002-05-23 | Contents filter based on the comparison between similarity of content character and correlation of subject matter |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011314206A CN1168031C (zh) | 2001-09-07 | 2001-09-07 | Content filter based on comparison of text content feature similarity and topic relevance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1403959A (zh) | 2003-03-19 |
CN1168031C (zh) | 2004-09-22 |
Family ID: 4670569
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011314206A (Expired - Fee Related) CN1168031C (zh) | 2001-09-07 | 2001-09-07 | Content filter based on comparison of text content feature similarity and topic relevance |
Country Status (3)
Country | Link |
---|---|
US (1) | US7617090B2 (zh) |
CN (1) | CN1168031C (zh) |
WO (1) | WO2003038667A1 (zh) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101876968A (zh) * | 2010-05-06 | 2010-11-03 | 复旦大学 | Method for identifying objectionable content in web text and mobile phone text messages |
CN102411611A (zh) * | 2011-10-15 | 2012-04-11 | 西安交通大学 | Event recognition and tracking method for instant interactive text |
CN104615714A (zh) * | 2015-02-05 | 2015-05-13 | 北京中搜网络技术股份有限公司 | Blog post de-duplication method based on text similarity and microblog channel features |
CN105320641A (zh) * | 2014-07-30 | 2016-02-10 | 腾讯科技(深圳)有限公司 | Text verification method and user terminal |
CN105355214A (zh) * | 2011-08-19 | 2016-02-24 | 杜比实验室特许公司 | Method and device for measuring similarity |
CN106611042A (zh) * | 2016-09-29 | 2017-05-03 | 四川用联信息技术有限公司 | A new method for extracting text feature words |
US9755616B2 (en) | 2014-06-30 | 2017-09-05 | Huawei Technologies Co., Ltd. | Method and apparatus for data filtering, and method and apparatus for constructing data filter |
WO2018041036A1 (zh) * | 2016-08-29 | 2018-03-08 | 中兴通讯股份有限公司 | Keyword search method, device and terminal |
CN108427954A (zh) * | 2018-03-19 | 2018-08-21 | 上海壹墨图文设计制作有限公司 | Signboard information acquisition and recognition system |
CN112560457A (zh) * | 2020-12-04 | 2021-03-26 | 上海风秩科技有限公司 | Unsupervised text denoising method, system, electronic device and storage medium |
US11562145B2 (en) * | 2018-02-01 | 2023-01-24 | Tencent Technology (Shenzhen) Company Limited | Text classification method, computer device, and storage medium |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8769127B2 (en) * | 2006-02-10 | 2014-07-01 | Northrop Grumman Systems Corporation | Cross-domain solution (CDS) collaborate-access-browse (CAB) and assured file transfer (AFT) |
AU2007324329B2 (en) * | 2006-11-20 | 2012-01-12 | Squiz Pty Ltd | Annotation index system and method |
US8281361B1 (en) * | 2009-03-26 | 2012-10-02 | Symantec Corporation | Methods and systems for enforcing parental-control policies on user-generated content |
US20110271232A1 (en) * | 2010-04-30 | 2011-11-03 | Orbis Technologies, Inc. | Systems and methods for semantic search, content correlation and visualization |
US9633003B2 (en) * | 2012-12-18 | 2017-04-25 | International Business Machines Corporation | System support for evaluation consistency |
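CN104008098B (zh) * | 2013-02-21 | 2018-09-18 | 腾讯科技(深圳)有限公司 | Text filtering method and device based on polysemous keywords |
CN103235773B (zh) * | 2013-04-26 | 2019-02-12 | 百度在线网络技术(北京)有限公司 | Keyword-based method and device for extracting text labels |
CN105630766B (zh) * | 2015-12-22 | 2018-11-06 | 北京奇虎科技有限公司 | Method and device for calculating correlation among multiple news items |
CN105654113B (zh) * | 2015-12-23 | 2020-02-21 | 北京奇虎科技有限公司 | Method and device for generating article fingerprint features |
CN107133202A (zh) * | 2017-06-01 | 2017-09-05 | 北京百度网讯科技有限公司 | Artificial-intelligence-based text verification method and device |
CN109062905B (zh) * | 2018-09-04 | 2022-06-24 | 武汉斗鱼网络科技有限公司 | Method, apparatus, device and medium for evaluating the value of bullet-comment (danmaku) text |
KR102194281B1 (ko) * | 2019-01-14 | 2020-12-22 | 박준희 | System and method for producing music broadcast content combining audio sources and video by era |
CN110717028B (zh) * | 2019-10-18 | 2022-02-15 | 支付宝(杭州)信息技术有限公司 | Method and system for eliminating interfering question pairs |
CN111159115A (zh) * | 2019-12-27 | 2020-05-15 | 深信服科技股份有限公司 | Similar file detection method, apparatus, device and storage medium |
CN113553825B (zh) * | 2021-07-23 | 2023-03-21 | 安徽商信政通信息技术股份有限公司 | Method and system for analyzing contextual relationships among electronic official documents |
CN114928472B (zh) * | 2022-04-20 | 2023-07-18 | 哈尔滨工业大学(威海) | Graylist filtering method for objectionable websites based on full-traffic primary domain names |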
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2950222B2 (ja) | 1996-01-12 | 1999-09-20 | 日本電気株式会社 | Information retrieval system |
JP2800769B2 (ja) * | 1996-03-29 | 1998-09-21 | 日本電気株式会社 | Information filtering system |
US6092091A (en) * | 1996-09-13 | 2000-07-18 | Kabushiki Kaisha Toshiba | Device and method for filtering information, device and method for monitoring updated document information and information storage medium used in same devices |
US5996011A (en) * | 1997-03-25 | 1999-11-30 | Unified Research Laboratories, Inc. | System and method for filtering data received by a computer system |
US6266664B1 (en) * | 1997-10-01 | 2001-07-24 | Rulespace, Inc. | Method for scanning, analyzing and rating digital information content |
US6233618B1 (en) * | 1998-03-31 | 2001-05-15 | Content Advisor, Inc. | Access control of networked data |
US6493744B1 (en) * | 1999-08-16 | 2002-12-10 | International Business Machines Corporation | Automatic rating and filtering of data files for objectionable content |
US6633855B1 (en) * | 2000-01-06 | 2003-10-14 | International Business Machines Corporation | Method, system, and program for filtering content using neural networks |
US20030009495A1 (en) * | 2001-06-29 | 2003-01-09 | Akli Adjaoute | Systems and methods for filtering electronic content |
2001
- 2001-09-07 CN CNB011314206A (CN1168031C, zh): not active, Expired - Fee Related
2002
- 2002-05-23 US US10/488,731 (US7617090B2, en): active
- 2002-05-23 WO PCT/CN2002/000346 (WO2003038667A1, zh): not active, Application Discontinuation
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101876968A (zh) * | 2010-05-06 | 2010-11-03 | 复旦大学 | Method for identifying objectionable content in web text and mobile phone text messages |
CN105355214A (zh) * | 2011-08-19 | 2016-02-24 | 杜比实验室特许公司 | Method and device for measuring similarity |
CN102411611A (zh) * | 2011-10-15 | 2012-04-11 | 西安交通大学 | Event recognition and tracking method for instant interactive text |
CN102411611B (zh) * | 2011-10-15 | 2013-01-02 | 西安交通大学 | Event recognition and tracking method for instant interactive text |
US9755616B2 (en) | 2014-06-30 | 2017-09-05 | Huawei Technologies Co., Ltd. | Method and apparatus for data filtering, and method and apparatus for constructing data filter |
CN105320641B (zh) * | 2014-07-30 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Text verification method and user terminal |
CN105320641A (zh) * | 2014-07-30 | 2016-02-10 | 腾讯科技(深圳)有限公司 | Text verification method and user terminal |
CN104615714A (zh) * | 2015-02-05 | 2015-05-13 | 北京中搜网络技术股份有限公司 | Blog post de-duplication method based on text similarity and microblog channel features |
WO2018041036A1 (zh) * | 2016-08-29 | 2018-03-08 | 中兴通讯股份有限公司 | Keyword search method, device and terminal |
CN106611042A (zh) * | 2016-09-29 | 2017-05-03 | 四川用联信息技术有限公司 | A new method for extracting text feature words |
US11562145B2 (en) * | 2018-02-01 | 2023-01-24 | Tencent Technology (Shenzhen) Company Limited | Text classification method, computer device, and storage medium |
CN108427954A (zh) * | 2018-03-19 | 2018-08-21 | 上海壹墨图文设计制作有限公司 | Signboard information acquisition and recognition system |
CN108427954B (zh) * | 2018-03-19 | 2021-08-27 | 上海壹墨图文设计制作有限公司 | Signboard information acquisition and recognition system |
CN112560457A (zh) * | 2020-12-04 | 2021-03-26 | 上海风秩科技有限公司 | Unsupervised text denoising method, system, electronic device and storage medium |
CN112560457B (zh) * | 2020-12-04 | 2024-03-12 | 上海秒针网络科技有限公司 | Unsupervised text denoising method, system, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2003038667A1 (fr) | 2003-05-08 |
US20040243537A1 (en) | 2004-12-02 |
CN1168031C (zh) | 2004-09-22 |
US7617090B2 (en) | 2009-11-10 |
Similar Documents
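Publication | Title |
---|---|
CN1168031C (zh) | Content filter based on comparison of text content feature similarity and topic relevance |
CN1145901C (zh) | Method for constructing intelligent decision support based on information mining |
CN1701324A (zh) | Systems, methods and software for classifying documents |
CN1536483A (zh) | Method and system for extracting and processing network information |
CN1794266A (zh) | Identity recognition and authentication method based on biometric feature fusion |
CN101042868A (zh) | Clustering system, method and program, and attribute estimation system using the clustering system |
CN1215457C (zh) | Sentence recognition device and method |
CN1691007A (zh) | Method, system, or memory storing a computer program, for document processing |
CN100336056C (zh) | Method for process terminology extraction, pattern analysis and reuse based on mature process documents |
CN1053247C (zh) | Water distribution flow prediction device |
CN1871597A (zh) | System and method for processing text using a suite of disambiguation techniques |
CN1924858A (zh) | Method and device for acquiring new words, and an input method system |
CN1368693A (zh) | Method and apparatus for globalizing software |
CN1489089A (zh) | Document retrieval system and question answering system |
CN1577328A (zh) | Vision-based document segmentation |
CN1495644A (zh) | Evaluating the specificity of documents |
CN1297561A (zh) | Speech synthesis system and speech synthesis method |
CN1677388A (zh) | Statistical language model for logical forms |
CN1932756A (zh) | Method and system for dynamically generating voice-navigable menus for synthesized data |
CN1910573A (zh) | System for identifying and classifying named entities |
CN1474379A (zh) | Speech recognition/response system, speech recognition/response program, and recording medium therefor |
CN1156779C (zh) | Method and apparatus for document retrieval |
CN1282151C (zh) | Speech recognition device and speech recognition method |
CN1991819A (zh) | Language morphology analyzer |
CN1570958A (zh) | Method for recognizing printed Tibetan characters in multiple fonts and font sizes |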
Legal Events
Code | Title | Description |
---|---|---|
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20170428. Address after: 100055 Beijing, Guang'an, No. 305 Xicheng District street, No. two, building 10, floor 9, floor 1112. Patentee after: Lezhi Xinchuang (Beijing) Consulting Service Co., Ltd. Address before: 100085 Beijing, Haidian District information industry base on the road No. 6. Patentee before: Lenovo (Beijing) Co., Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20191115. Address after: West Street in the official Zhejiang city of Ningbo province Zhenhai District 315000 Village No. 777. Patentee after: Ningbo Lezhi Yongchuang Technology Service Co., Ltd. Address before: 100055 Beijing, Guang'an, No. 305 Xicheng District street, No. two, building 10, floor 9, floor 1112. Patentee before: Lezhi Xinchuang (Beijing) Consulting Service Co., Ltd. |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20040922. Termination date: 20200907. |