CN100565503C - Dynamic content clustering - Google Patents
- Publication number
- CN100565503C (application number CNB2004101020460A)
- Authority
- CN
- China
- Prior art keywords
- information
- neighborhood
- cluster
- document
- token
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99936—Pattern matching access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99944—Object-oriented database structure
- Y10S707/99945—Object-oriented database structure processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Description
Claims (41)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US10/735,999 US7333985B2 (en) | 2003-12-15 | 2003-12-15 | Dynamic content clustering |
| US10/735,999 | 2003-12-15 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1629844A (zh) | 2005-06-22 |
| CN100565503C (zh) | 2009-12-02 |
Family ID: 34523110
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CNB2004101020460A (CN100565503C, Expired - Fee Related) | Dynamic content clustering | 2003-12-15 | 2004-12-15 |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US7333985B2 (zh) |
| EP (1) | EP1544752A3 (zh) |
| JP (1) | JP4627656B2 (zh) |
| CN (1) | CN100565503C (zh) |
| BR (1) | BRPI0405741A (zh) |
| CA (1) | CA2490451C (zh) |
Families Citing this family (31)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050138049A1 (en) * | 2003-12-22 | 2005-06-23 | Greg Linden | Method for personalized news |
| US7523109B2 (en) * | 2003-12-24 | 2009-04-21 | Microsoft Corporation | Dynamic grouping of content including captive data |
| US9760629B1 (en) | 2004-12-29 | 2017-09-12 | Google Inc. | Systems and methods for implementing a news round table |
| CN100458788C (zh) * | 2006-09-25 | 2009-02-04 | 北京搜狗科技发展有限公司 | Clustering method, search method and system for Internet audio files |
| CN101000627B (zh) * | 2007-01-15 | 2010-05-19 | 北京搜狗科技发展有限公司 | Method and device for publishing related information |
| US20080243799A1 (en) * | 2007-03-30 | 2008-10-02 | Innography, Inc. | System and method of generating a set of search results |
| US20090063470A1 (en) * | 2007-08-28 | 2009-03-05 | Nogacom Ltd. | Document management using business objects |
| US8495074B2 (en) * | 2008-12-30 | 2013-07-23 | Apple Inc. | Effects application based on object clustering |
| US8533318B2 (en) * | 2009-10-06 | 2013-09-10 | International Business Machines Corporation | Processing and presenting multi-dimensioned transaction tracking data |
| WO2012032535A1 (en) * | 2010-09-08 | 2012-03-15 | Anuroop Iyengar | An intelligent portable e-book/ e- reader |
| US8798366B1 (en) | 2010-12-28 | 2014-08-05 | Amazon Technologies, Inc. | Electronic book pagination |
| US9069767B1 (en) | 2010-12-28 | 2015-06-30 | Amazon Technologies, Inc. | Aligning content items to identify differences |
| US9846688B1 (en) | 2010-12-28 | 2017-12-19 | Amazon Technologies, Inc. | Book version mapping |
| CN102063485A (zh) * | 2010-12-29 | 2011-05-18 | 深圳市永达电子股份有限公司 | Method for online analysis and clustering of short-text information in network streams |
| US9026591B2 (en) | 2011-02-28 | 2015-05-05 | Avaya Inc. | System and method for advanced communication thread analysis |
| CN102654881B (zh) * | 2011-03-03 | 2014-10-22 | 富士通株式会社 | Apparatus and method for name disambiguation clustering |
| US9881009B1 (en) * | 2011-03-15 | 2018-01-30 | Amazon Technologies, Inc. | Identifying book title sets |
| US9367526B1 (en) * | 2011-07-26 | 2016-06-14 | Nuance Communications, Inc. | Word classing for language modeling |
| US9026519B2 (en) | 2011-08-09 | 2015-05-05 | Microsoft Technology Licensing, Llc | Clustering web pages on a search engine results page |
| US20130157234A1 (en) * | 2011-12-14 | 2013-06-20 | Microsoft Corporation | Storyline visualization |
| CN103246676A (zh) * | 2012-02-10 | 2013-08-14 | 富士通株式会社 | Method and device for clustering messages |
| CN103399884A (zh) * | 2013-07-14 | 2013-11-20 | 王国栋 | Random news system and automatic refresh method therefor |
| US9176969B2 (en) | 2013-08-29 | 2015-11-03 | Hewlett-Packard Development Company, L.P. | Integrating and extracting topics from content of heterogeneous sources |
| US9971594B2 (en) * | 2016-08-16 | 2018-05-15 | Sonatype, Inc. | Method and system for authoritative name analysis of true origin of a file |
| US10353928B2 (en) | 2016-11-30 | 2019-07-16 | International Business Machines Corporation | Real-time clustering using multiple representatives from a cluster |
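| CN110019800B (zh) * | 2017-11-30 | 2023-06-20 | 腾讯科技(深圳)有限公司 | Distributed content processing method and apparatus, computer device, and storage medium |
| CN109325524A (zh) * | 2018-08-31 | 2019-02-12 | 中国科学院自动化研究所 | Event tracking and change-stage division method, system, and related device |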
| US11928430B2 (en) * | 2019-09-12 | 2024-03-12 | Oracle International Corporation | Detecting unrelated utterances in a chatbot system |
| US11295398B2 (en) | 2019-10-02 | 2022-04-05 | Snapwise Inc. | Methods and systems to generate information about news source items describing news events or topics of interest |
| US11341203B2 (en) | 2019-10-02 | 2022-05-24 | Snapwise Inc. | Methods and systems to generate information about news source items describing news events or topics of interest |
| US11250200B2 (en) | 2020-03-16 | 2022-02-15 | Shopify Inc. | Systems and methods for generating digital layouts with feature-based formatting |
Family Cites Families (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6029195A (en) * | 1994-11-29 | 2000-02-22 | Herz; Frederick S. M. | System for customized electronic identification of desirable objects |
| JP3113814B2 (ja) * | 1996-04-17 | 2000-12-04 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Information retrieval method and information retrieval apparatus |
| US6539115B2 (en) * | 1997-02-12 | 2003-03-25 | Fujitsu Limited | Pattern recognition device for performing classification using a candidate table and method thereof |
| US6012053A (en) | 1997-06-23 | 2000-01-04 | Lycos, Inc. | Computer system with user-controlled relevance ranking of search results |
| US5864690A (en) * | 1997-07-30 | 1999-01-26 | Integrated Device Technology, Inc. | Apparatus and method for register specific fill-in of register generic micro instructions within an instruction queue |
| JP2000181936A (ja) * | 1998-12-17 | 2000-06-30 | Nippon Telegr & Teleph Corp <Ntt> | Document feature extraction device and document classification device |
| US6678681B1 (en) | 1999-03-10 | 2004-01-13 | Google Inc. | Information extraction from a database |
| US20030050927A1 (en) * | 2001-09-07 | 2003-03-13 | Araha, Inc. | System and method for location, understanding and assimilation of digital documents through abstract indicia |
| US6615209B1 (en) | 2000-02-22 | 2003-09-02 | Google, Inc. | Detecting query-specific duplicate documents |
| JP2001283184A (ja) * | 2000-03-29 | 2001-10-12 | Matsushita Electric Ind Co Ltd | Clustering device |
| US7136854B2 (en) | 2000-07-06 | 2006-11-14 | Google, Inc. | Methods and apparatus for providing search results in response to an ambiguous search query |
| US6529903B2 (en) | 2000-07-06 | 2003-03-04 | Google, Inc. | Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query |
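| KR20040041082A (ko) * | 2000-07-24 | 2004-05-13 | 비브콤 인코포레이티드 | System and method for multimedia bookmarks and virtual editing of video |
| JP3701197B2 (ja) * | 2000-12-28 | 2005-09-28 | 松下電器産業株式会社 | Method and apparatus for creating a reference for calculating the degree of membership in a classification |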
| US6658423B1 (en) | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
| US6526440B1 (en) | 2001-01-30 | 2003-02-25 | Google, Inc. | Ranking search results by reranking the results based on local inter-connectivity |
| US8001118B2 (en) | 2001-03-02 | 2011-08-16 | Google Inc. | Methods and apparatus for employing usage statistics in document retrieval |
| US20050022114A1 (en) * | 2001-08-13 | 2005-01-27 | Xerox Corporation | Meta-document management system with personality identifiers |
| US7133862B2 (en) * | 2001-08-13 | 2006-11-07 | Xerox Corporation | System with user directed enrichment and import/export control |
| EP1485825A4 (en) * | 2002-02-04 | 2008-03-19 | Cataphora Inc | DETAILED EXPLORATION TECHNIQUE OF SOCIOLOGICAL DATA AND CORRESPONDING APPARATUS |
| JP2003256466A (ja) * | 2002-03-04 | 2003-09-12 | Denso Corp | Adaptive information retrieval system |
| US7523109B2 (en) | 2003-12-24 | 2009-04-21 | Microsoft Corporation | Dynamic grouping of content including captive data |
2003
- 2003-12-15: US application US10/735,999 (US7333985B2), not active, Expired - Fee Related
2004
- 2004-11-24: EP application EP04027921A (EP1544752A3), not active, Ceased
- 2004-12-14: CA application CA2490451A (CA2490451C), not active, Expired - Fee Related
- 2004-12-15: CN application CNB2004101020460A (CN100565503C), not active, Expired - Fee Related
- 2004-12-15: BR application BR0405741-4A (BRPI0405741A), not active, IP Right Cessation
- 2004-12-15: JP application JP2004363715A (JP4627656B2), not active, Expired - Fee Related
Non-Patent Citations (1)
| Title |
|---|
| G. Salton, A. Wong. Generation and Search of Clustered Files. ACM Transactions on Database Systems, Vol. 3, No. 4, 1978 * |
Also Published As
| Publication number | Publication date |
|---|---|
| US20050131932A1 (en) | 2005-06-16 |
| JP4627656B2 (ja) | 2011-02-09 |
| EP1544752A2 (en) | 2005-06-22 |
| JP2005182808A (ja) | 2005-07-07 |
| CA2490451A1 (en) | 2005-06-15 |
| CA2490451C (en) | 2014-01-28 |
| EP1544752A3 (en) | 2007-10-31 |
| US7333985B2 (en) | 2008-02-19 |
| BRPI0405741A (pt) | 2005-08-02 |
| CN1629844A (zh) | 2005-06-22 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN100565503C (zh) | Dynamic content clustering | |
| US7523109B2 (en) | Dynamic grouping of content including captive data | |
| US7966337B2 (en) | System and method for prioritizing websites during a webcrawling process | |
| KR100898454B1 (ko) | Integrated search service system and method | |
| US8001118B2 (en) | Methods and apparatus for employing usage statistics in document retrieval | |
| Koshman et al. | Web searching on the Vivisimo search engine | |
| US7840538B2 (en) | Discovering query intent from search queries and concept networks | |
| US6502091B1 (en) | Apparatus and method for discovering context groups and document categories by mining usage logs | |
| Beitzel et al. | Temporal analysis of a very large topically categorized web query log | |
| US6286018B1 (en) | Method and apparatus for finding a set of documents relevant to a focus set using citation analysis and spreading activation techniques | |
| US6182091B1 (en) | Method and apparatus for finding related documents in a collection of linked documents using a bibliographic coupling link analysis | |
| US11809432B2 (en) | Knowledge gathering system based on user's affinity | |
| Pu et al. | Subject categorization of query terms for exploring Web users' search interests | |
| US7634716B1 (en) | Techniques for finding related hyperlinked documents using link-based analysis | |
| US9245013B2 (en) | Message recommendation using word isolation and clustering | |
| US7302425B1 (en) | Distributed pre-cached query results and refresh method | |
| US20040163034A1 (en) | Systems and methods for labeling clusters of documents | |
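| JP2008097641A (ja) | Method and apparatus for searching data in a database | |
| Rasolofo et al. | Result merging strategies for a current news metasearcher | |
| JP2013168186A (ja) | Review processing method and system | |
| JP2001519952A (ja) | Data summarization apparatus | |
| JP2001076010A (ja) | Database partitioning method, program storage device storing a program, and recording medium | |
| JPH11213000A (ja) | Interactive information retrieval method and apparatus, and storage medium storing an interactive information retrieval program | |
| JPH11282874A (ja) | Information filtering method and apparatus | |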
| US8849865B1 (en) | Querying a data store of impressions | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | ASS | Succession or assignment of patent right | Owner name: MICROSOFT TECHNOLOGY LICENSING LLC. Former owner: MICROSOFT CORP. Effective date: 2015-04-23 |
| | C41 | Transfer of patent application or patent right or utility model | |
| | TR01 | Transfer of patent right | Effective date of registration: 2015-04-23. Patentee after: Microsoft Technology Licensing, LLC (Washington State). Patentee before: Microsoft Corp. (Washington State) |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2009-12-02. Termination date: 2019-12-15 |
| | CF01 | Termination of patent right due to non-payment of annual fee | |

