JP4425641B2 - Retrieval of structured documents - Google Patents
- Publication number
- JP4425641B2 (application JP2004001489A)
- Authority
- JP
- Japan
- Prior art keywords
- document
- elements
- term
- search
- computer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G06F16/338: Presentation of query results
- A61J3/07: Devices or methods specially adapted for bringing pharmaceutical products into particular physical or administering forms into the form of capsules or similar small containers for oral use
- G06F16/3341: Query execution using boolean model
- G06F16/951: Indexing; Web crawling techniques
- A61J2200/42: Heating means
- A61J2200/72: Device provided with specific sensor or indicating means for temperature
- Y10S707/956: Hierarchical (organization of data)
- Y10S707/99933: Query processing, i.e. searching
- Y10S707/99935: Query augmenting and refining, e.g. inexact access
- Y10S707/99936: Pattern matching access
- Y10S707/99937: Sorting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Description
Throughout the drawings, like features and like components are denoted by the same reference numbers.
$$\mathrm{weight}(t_i, E_j) = \ln\bigl(1 + \mathrm{tf}(t_i, E_j)\bigr) \times I(t_i, E_j)$$
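where I(t_i, E_j) is the entropy measure of term t_i in element E_j, defined as follows.
In Calculation 3, the term
1. For each paragraph element 306, calculate the term weights according to Calculation 1.
2. For any element E_j one level higher, proceeding bottom-up (i.e., a section element 304 or the document element 302), calculate the term weights using Calculation 2. If weight(t_i, E_j) ≥ mean(E_j) + std_dev(E_j), term t_i is selected as an index term of element E_j, and all sub-elements of E_j remove t_i from their own index term lists. Here, mean(E_j) is the arithmetic mean of the weights of all terms in element E_j, and std_dev(E_j) is the standard deviation of those weights.
3. Repeat step 2 until the root element, i.e., the document element 302, is reached.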
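The bottom-up index-term selection in steps 1 to 3 can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The `Element` class and all function names are hypothetical. The formula for Calculation 1 does not survive in this text, so it is assumed here to be ln(1 + tf(t_i, P_j)) × ln(N / n_i), built from the variables that claim 10 lists. The definition of the entropy measure I(t_i, E_j) is also missing, so a normalized Shannon entropy of the term's counts across the element's children stands in for it. Every paragraph term is assumed to start on its paragraph's index term list, and std_dev is taken as the population standard deviation. Only Calculation 2 and the mean + std_dev selection rule come directly from the steps above.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Element:
    """Hypothetical tree node: a document, section, or paragraph element."""
    name: str
    children: list[Element] = field(default_factory=list)
    tokens: list[str] = field(default_factory=list)  # only paragraphs hold text here
    weights: dict[str, float] = field(default_factory=dict)
    index_terms: set[str] = field(default_factory=set)

    def term_freqs(self) -> Counter:
        """tf(t, E): occurrences of each term anywhere inside this element."""
        total = Counter(self.tokens)
        for child in self.children:
            total.update(child.term_freqs())
        return total

    def descendants(self):
        """Yield every element below this one (children, grandchildren, ...)."""
        for child in self.children:
            yield child
            yield from child.descendants()


def entropy_measure(term: str, child_tfs: list[Counter]) -> float:
    """ASSUMED stand-in for I(t, E): normalized Shannon entropy of the term's
    counts across the element's children (1.0 = spread evenly, 0.0 = in one child)."""
    counts = [tf[term] for tf in child_tfs]
    total = sum(counts)
    if len(counts) <= 1 or total == 0:
        return 1.0  # assumption: a single child gives no distribution to measure
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(len(counts))


def select_index_terms(element: Element, n_docs: int, doc_freq: dict[str, int]) -> None:
    """Post-order traversal, so children finish before their parent (steps 1-3)."""
    for child in element.children:
        select_index_terms(child, n_docs, doc_freq)
    tf = element.term_freqs()
    if not element.children:
        # Step 1. ASSUMED form of Calculation 1: ln(1 + tf) * ln(N / n_i).
        element.weights = {t: math.log(1 + f) * math.log(n_docs / doc_freq[t])
                           for t, f in tf.items()}
        element.index_terms = set(tf)  # assumption: all paragraph terms start indexed
        return
    # Step 2. Calculation 2: weight(t, E) = ln(1 + tf(t, E)) * I(t, E).
    child_tfs = [child.term_freqs() for child in element.children]
    element.weights = {t: math.log(1 + f) * entropy_measure(t, child_tfs)
                       for t, f in tf.items()}
    values = list(element.weights.values())
    if not values:
        return
    threshold = mean(values) + pstdev(values)  # population std dev (assumed)
    for term, w in element.weights.items():
        if w >= threshold:
            element.index_terms.add(term)
            for sub in element.descendants():
                sub.index_terms.discard(term)


# Toy tree and corpus statistics (hypothetical): N = 10 documents, n_i per term.
p1 = Element("p1", tokens=["xml", "tree"])
p2 = Element("p2", tokens=["xml", "rank"])
p3 = Element("p3", tokens=["graph", "xml"])
sec1 = Element("sec1", children=[p1, p2])
sec2 = Element("sec2", children=[p3])
doc = Element("doc", children=[sec1, sec2])
select_index_terms(doc, n_docs=10, doc_freq={"xml": 5, "tree": 2, "rank": 3, "graph": 1})
for e in [doc, sec1, sec2, p1, p2, p3]:
    print(e.name, sorted(e.index_terms))
```

In this toy tree, "xml" occurs in both sections, so it ends up as an index term of the document element and is removed from every element below it, while "graph" stays with sec2 and "tree" and "rank" stay with their paragraphs. When all weights in an element are equal (for example, a section with a single paragraph), the standard deviation is zero and every term meets the threshold.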
102 Client
104 Server
106 Database
108 Network
Claims (18)
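- A computer-implemented method for searching a plurality of structured documents stored in a database for documents relevant to a search term provided by a user, the method being performed by a processing device executing computer-executable instructions stored in a storage device, the method comprising:
querying a database containing a plurality of structured documents for the search term;
locating structured documents that do not contain the search term;
evaluating elements of matched structured documents, which are the structured documents that contain the search term, by:
ranking the individual elements based on how well each individual element matches the search term, where N represents the number of documents in the corpus and n_i represents the number of documents containing query term t_i; and
presenting to the user a ranking of the individual elements that the user can access.
- The method of claim 1, wherein the presenting step includes displaying a hierarchical structure of the matched documents to the user.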
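- The method of claim 2, wherein displaying the hierarchical structure includes providing a hierarchical tree that represents the structure of the structured documents.
- The method of claim 1, further comprising scaling the individually ranked elements based on a granularity of the documents, which, as specified by the query, can be a paragraph, a section, or the entire document.
- The method of claim 1, wherein the ranking of the individual elements is indicated using a number of graphical indicators.
- The method of claim 5, wherein the graphical indicators include a number of asterisks.
- The method of claim 1, further comprising weighting individual elements using an entropy measure that measures the distribution of terms in an element.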
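- The method of claim 1, wherein locating the structured documents includes identifying all elements that contain at least one query term, and
ranking the individual elements includes returning the elements corresponding to the ranked paths in order from the top.
- The method of claim 8, further comprising displaying the elements having the closest match.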
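- The method of claim 8, further comprising weighting the elements by:
computing term weights using, for each paragraph element, a calculation in which weight(t_i, P_j) represents the weight of term t_i in paragraph P_j, tf(t_i, P_j) is the term frequency of t_i in that paragraph, N represents the number of documents in the corpus, and n_i represents the number of documents containing term t_i,
and, for any section element E_j at a higher level, proceeding bottom-up, the calculation weight(t_i, E_j) = ln(1 + tf(t_i, E_j)) × I(t_i, E_j), where I(t_i, E_j) is the entropy measure of term t_i in element E_j; if weight(t_i, E_j) ≥ mean(E_j) + std_dev(E_j), term t_i is selected as an index term of element E_j and all sub-elements of E_j remove t_i from their own index term lists, where mean(E_j) represents the arithmetic mean of the weights of all terms in element E_j and std_dev(E_j) represents the standard deviation of those weights; and
repeating the computing of the term weights with the calculation weight(t_i, E_j) = ln(1 + tf(t_i, E_j)) × I(t_i, E_j) until the root element (i.e., the document element) is reached.
- A computer-readable recording medium having computer-executable instructions that, when executed by a processing device, can perform a computer-implemented method for searching a plurality of structured documents stored in a database for documents relevant to a search term provided by a user, the method comprising:
querying a database containing a plurality of structured documents for the search term;
removing structured documents that do not contain the search term;
evaluating matched structured documents, which are the structured documents that contain the search term, by:
ranking the individual elements based on how well each individual element matches the search term, where N represents the number of documents in the corpus and n_i represents the number of documents containing query term t_i; and
presenting to the user a ranking of the individual elements that the user can access.
- The computer-readable recording medium of claim 11, wherein, in the method performed, the presenting step includes displaying a hierarchical structure of the matched documents to the user.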
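- The computer-readable recording medium of claim 12, wherein, in the method performed, displaying the hierarchical structure includes providing a hierarchical tree that represents the structure of the structured documents.
- The computer-readable recording medium of claim 11, wherein, in the method performed, the ranking of the individual elements is indicated by a numerical value.
- The computer-readable recording medium of claim 11, wherein, in the method performed, the elements of the structured documents are scaled according to a granularity of the structured documents, which, as specified by the query, can be a paragraph, a section, or the entire document.
- The computer-readable recording medium of claim 11, wherein, in the method performed, the ranking is indicated using a number of asterisks.
- The computer-readable recording medium of claim 11, wherein the method performed includes weighting individual elements using an entropy measure that measures the distribution of terms in an element.
- The computer-readable recording medium of claim 11, wherein, in the method performed, removing the structured documents includes identifying all elements that contain at least one query term, and
ranking the individual elements includes returning the elements corresponding to the ranked paths in order from the top.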
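Claims 5, 6, and 8 describe identifying the elements that contain at least one query term, ranking them, and showing each rank with asterisks. The sketch below illustrates that flow in Python under loudly stated assumptions. The `Node` class, the function names, and the `level` parameter (0 = document, 1 = section, 2 = paragraph, one possible reading of the granularity in claims 4 and 15) are hypothetical. The element score, ln(1 + tf) × ln(N / n_i) summed over matched query terms, is an assumed stand-in, since the ranking formula referenced in claim 1 is missing from this text. The 1-to-5 asterisk scale relative to the best hit is an arbitrary display choice.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical structured-document element: document, section, or paragraph."""
    label: str
    children: list[Node] = field(default_factory=list)
    tokens: list[str] = field(default_factory=list)

    def tf(self) -> Counter:
        """Term frequencies over this element and everything beneath it."""
        total = Counter(self.tokens)
        for child in self.children:
            total.update(child.tf())
        return total


def walk(node: Node, path: tuple[str, ...] = ()):
    """Yield (depth, path, node) for every element, parents before children."""
    path = path + (node.label,)
    yield len(path) - 1, path, node
    for child in node.children:
        yield from walk(child, path)


def score(node: Node, query: list[str], n_docs: int, doc_freq: dict[str, int]) -> float:
    """ASSUMED scoring rule: sum of ln(1 + tf) * ln(N / n_i) over matched query terms."""
    tf = node.tf()
    return sum(math.log(1 + tf[t]) * math.log(n_docs / doc_freq[t])
               for t in query if tf[t] > 0)


def rank_elements(root: Node, query: list[str], n_docs: int,
                  doc_freq: dict[str, int], level: int, max_stars: int = 5):
    """Rank the elements at one granularity level (0 = document, 1 = section,
    2 = paragraph); elements with a zero score (e.g. no query term) are left out."""
    hits = [("/".join(path), score(node, query, n_docs, doc_freq))
            for depth, path, node in walk(root) if depth == level]
    hits = sorted((h for h in hits if h[1] > 0), key=lambda h: h[1], reverse=True)
    if not hits:
        return []
    best = hits[0][1]
    # Asterisks are scaled to the best hit: 1..max_stars (assumed display rule).
    return [("*" * max(1, round(max_stars * s / best)), path) for path, s in hits]


doc = Node("doc", children=[
    Node("sec1", children=[Node("p1", tokens=["xml", "tree"]),
                           Node("p2", tokens=["xml", "xml", "rank"])]),
    Node("sec2", children=[Node("p3", tokens=["graph"])]),
])
ranked = rank_elements(doc, ["xml", "rank"], n_docs=10,
                       doc_freq={"xml": 5, "rank": 2}, level=2)
for stars, path in ranked:
    print(f"{stars:<5} {path}")
```

Scoring only the elements at one level avoids the trivial result that the whole document always outscores its own sections and paragraphs, since its term counts include theirs. The slash-separated path printed next to each rank is one simple way to show where the element sits in the document hierarchy.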
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/337,138 US7111000B2 (en) | 2003-01-06 | 2003-01-06 | Retrieval of structured documents |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2004213675A (ja) | 2004-07-29 |
JP4425641B2 (ja) | 2010-03-03 |
Family ID: 32507431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2004001489A, Expired - Fee Related, JP4425641B2 (ja) | Retrieval of structured documents | 2003-01-06 | 2004-01-06 |
Country Status (5)
Country | Link |
---|---|
US (4) | US7111000B2 (ja) |
EP (1) | EP1435581B1 (ja) |
JP (1) | JP4425641B2 (ja) |
KR (1) | KR101120760B1 (ja) |
CN (1) | CN100568229C (ja) |
Families Citing this family (86)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7111000B2 (en) * | 2003-01-06 | 2006-09-19 | Microsoft Corporation | Retrieval of structured documents |
JP4049317B2 (ja) * | 2003-05-14 | 2008-02-20 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Search support device and program |
PL363397A1 (en) * | 2003-11-12 | 2005-05-16 | Advanced Digital Broadcast Ltd. | System for data search and definition in tree formats and method for data search and definition in tree formats |
US20050198059A1 (en) * | 2004-03-04 | 2005-09-08 | Peilin Chou | Database and database management system |
JP2005309727A (ja) * | 2004-04-21 | 2005-11-04 | Hitachi Ltd | File system |
US7487145B1 (en) * | 2004-06-22 | 2009-02-03 | Google Inc. | Method and system for autocompletion using ranked results |
US7836044B2 (en) | 2004-06-22 | 2010-11-16 | Google Inc. | Anticipated query generation and processing in a search engine |
JP4309818B2 (ja) * | 2004-07-15 | 2009-08-05 | 株式会社東芝 | Structured document management device, retrieval device, storage method, retrieval method, and program |
US20060031760A1 (en) * | 2004-08-05 | 2006-02-09 | Microsoft Corporation | Adaptive document layout server/client system and process |
US20060047656A1 (en) * | 2004-09-01 | 2006-03-02 | Dehlinger Peter J | Code, system, and method for retrieving text material from a library of documents |
US20060085401A1 (en) * | 2004-10-20 | 2006-04-20 | Microsoft Corporation | Analyzing operational and other data from search system or the like |
US7499940B1 (en) | 2004-11-11 | 2009-03-03 | Google Inc. | Method and system for URL autocompletion using ranked results |
US20060106769A1 (en) | 2004-11-12 | 2006-05-18 | Gibbs Kevin A | Method and system for autocompletion for languages having ideographs and phonetic characters |
US8090736B1 (en) * | 2004-12-30 | 2012-01-03 | Google Inc. | Enhancing search results using conceptual document relationships |
US9189481B2 (en) * | 2005-05-06 | 2015-11-17 | John M. Nelson | Database and index organization for enhanced document retrieval |
US20060259475A1 (en) * | 2005-05-10 | 2006-11-16 | Dehlinger Peter J | Database system and method for retrieving records from a record library |
CN1318974C (zh) * | 2005-08-05 | 2007-05-30 | 北京九州汇宝软件有限公司 | Method for compressing and querying database backup data |
US8156097B2 (en) * | 2005-11-14 | 2012-04-10 | Microsoft Corporation | Two stage search |
US8010523B2 (en) | 2005-12-30 | 2011-08-30 | Google Inc. | Dynamic search box for web browser |
US7809711B2 (en) * | 2006-06-02 | 2010-10-05 | International Business Machines Corporation | System and method for semantic analysis of intelligent device discovery |
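CN102929901B (zh) * | 2006-06-26 | 2016-12-14 | 尼尔森(美国)有限公司 | Method and apparatus for improving data warehouse performance |
CN100573520C (zh) * | 2006-08-29 | 2009-12-23 | 国际商业机器公司 | Method and apparatus for preprocessing a plurality of documents for retrieval |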
US8401841B2 (en) * | 2006-08-31 | 2013-03-19 | Orcatec Llc | Retrieval of documents using language models |
KR20140104048A (ko) * | 2006-10-18 | 2014-08-27 | 구글 인코포레이티드 | Comprehensive online ranking system and method suitable for syndication |
US7836085B2 (en) * | 2007-02-05 | 2010-11-16 | Google Inc. | Searching structured geographical data |
US7831587B2 (en) * | 2007-05-10 | 2010-11-09 | Xerox Corporation | Event hierarchies and memory organization for structured data retrieval |
US7822752B2 (en) * | 2007-05-18 | 2010-10-26 | Microsoft Corporation | Efficient retrieval algorithm by query term discrimination |
US7853603B2 (en) * | 2007-05-23 | 2010-12-14 | Microsoft Corporation | User-defined relevance ranking for search |
US9256594B2 (en) | 2007-06-06 | 2016-02-09 | Michael S. Neustel | Patent analyzing system |
US8160306B1 (en) * | 2007-06-06 | 2012-04-17 | Neustel Michael S | Patent analyzing system |
US20090119281A1 (en) * | 2007-11-03 | 2009-05-07 | Andrew Chien-Chung Wang | Granular knowledge based search engine |
US8069179B2 (en) * | 2008-04-24 | 2011-11-29 | Microsoft Corporation | Preference judgements for relevance |
US8161036B2 (en) * | 2008-06-27 | 2012-04-17 | Microsoft Corporation | Index optimization for ranking using a linear model |
US8171031B2 (en) * | 2008-06-27 | 2012-05-01 | Microsoft Corporation | Index optimization for ranking using a linear model |
US8312032B2 (en) | 2008-07-10 | 2012-11-13 | Google Inc. | Dictionary suggestions for partial user entries |
US20100125566A1 (en) * | 2008-11-18 | 2010-05-20 | Patentcafe.Com, Inc. | System and method for conducting a patent search |
US10303722B2 (en) | 2009-05-05 | 2019-05-28 | Oracle America, Inc. | System and method for content selection for web page indexing |
US20100287152A1 (en) | 2009-05-05 | 2010-11-11 | Paul A. Lipari | System, method and computer readable medium for web crawling |
KR101122394B1 (ko) * | 2009-05-08 | 2012-03-23 | 엔에이치엔(주) | Method and apparatus for providing search results using entropy scores |
CN102483752A (zh) | 2009-06-03 | 2012-05-30 | 谷歌公司 | Autocompletion for partially entered queries |
EP2665002A3 (en) | 2009-06-19 | 2014-04-02 | Blekko, Inc. | A method of counting unique items in a database system |
WO2011000165A1 (en) * | 2009-07-03 | 2011-01-06 | Hewlett-Packard Development Company,L.P. | Apparatus and method for text extraction |
US9507827B1 (en) * | 2010-03-25 | 2016-11-29 | Excalibur Ip, Llc | Encoding and accessing position data |
US8370330B2 (en) * | 2010-05-28 | 2013-02-05 | Apple Inc. | Predicting content and context performance based on performance history of users |
US20120084291A1 (en) * | 2010-09-30 | 2012-04-05 | Microsoft Corporation | Applying search queries to content sets |
US9424351B2 (en) | 2010-11-22 | 2016-08-23 | Microsoft Technology Licensing, Llc | Hybrid-distribution model for search engine indexes |
US8478704B2 (en) * | 2010-11-22 | 2013-07-02 | Microsoft Corporation | Decomposable ranking for efficient precomputing that selects preliminary ranking features comprising static ranking features and dynamic atom-isolated components |
US9195745B2 (en) | 2010-11-22 | 2015-11-24 | Microsoft Technology Licensing, Llc | Dynamic query master agent for query execution |
US9342582B2 (en) | 2010-11-22 | 2016-05-17 | Microsoft Technology Licensing, Llc | Selection of atoms for search engine retrieval |
US8713024B2 (en) | 2010-11-22 | 2014-04-29 | Microsoft Corporation | Efficient forward ranking in a search engine |
US9529908B2 (en) | 2010-11-22 | 2016-12-27 | Microsoft Technology Licensing, Llc | Tiering of posting lists in search engine index |
US8620907B2 (en) | 2010-11-22 | 2013-12-31 | Microsoft Corporation | Matching funnel for large document index |
US9098570B2 (en) * | 2011-03-31 | 2015-08-04 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for paragraph-based document searching |
US20120271844A1 (en) * | 2011-04-20 | 2012-10-25 | Microsoft Corporation | Providng relevant information for a term in a user message |
KR101454677B1 (ko) * | 2011-10-31 | 2014-10-27 | 네이버 주식회사 | Method and apparatus for providing search results using entropy scores |
US8965904B2 (en) * | 2011-11-15 | 2015-02-24 | Long Van Dinh | Apparatus and method for information access, search, rank and retrieval |
US20130297657A1 (en) * | 2012-05-01 | 2013-11-07 | Gajanan Chinchwadkar | Apparatus and Method for Forming and Using a Tree Structured Database with Top-Down Trees and Bottom-Up Indices |
JP6590481B2 (ja) * | 2012-12-07 | 2019-10-16 | キヤノン電子株式会社 | Virus intrusion route identification device, virus intrusion route identification method, and program |
US9916284B2 (en) * | 2013-12-10 | 2018-03-13 | International Business Machines Corporation | Analyzing document content and generating an appendix |
JP6461992B2 (ja) | 2014-11-05 | 2019-01-30 | キヤノン電子株式会社 | Identification device, control method therefor, and program |
US9875288B2 (en) | 2014-12-01 | 2018-01-23 | Sap Se | Recursive filter algorithms on hierarchical data models described for the use by the attribute value derivation |
US10776376B1 (en) * | 2014-12-05 | 2020-09-15 | Veritas Technologies Llc | Systems and methods for displaying search results |
CN104572620B (zh) * | 2014-12-31 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Method and apparatus for displaying chapter content |
US10242071B2 (en) | 2015-06-23 | 2019-03-26 | Microsoft Technology Licensing, Llc | Preliminary ranker for scoring matching documents |
US11281639B2 (en) * | 2015-06-23 | 2022-03-22 | Microsoft Technology Licensing, Llc | Match fix-up to remove matching documents |
US10229143B2 (en) | 2015-06-23 | 2019-03-12 | Microsoft Technology Licensing, Llc | Storage and retrieval of data from a bit vector search index |
US10467215B2 (en) | 2015-06-23 | 2019-11-05 | Microsoft Technology Licensing, Llc | Matching documents using a bit vector search index |
US10733164B2 (en) | 2015-06-23 | 2020-08-04 | Microsoft Technology Licensing, Llc | Updating a bit vector search index |
US10565198B2 (en) | 2015-06-23 | 2020-02-18 | Microsoft Technology Licensing, Llc | Bit vector search index using shards |
US11392568B2 (en) | 2015-06-23 | 2022-07-19 | Microsoft Technology Licensing, Llc | Reducing matching documents for a search query |
CN106815266B (zh) * | 2015-12-01 | 2020-06-16 | 北京国双科技有限公司 | Method and apparatus for retrieving court judgment documents |
WO2017108550A1 (en) * | 2015-12-24 | 2017-06-29 | Koninklijke Philips N.V. | Device for and method of determining a length of a relevant history |
US20180165265A1 (en) * | 2016-12-08 | 2018-06-14 | International Business Machines Corporation | Indicating property inheritance in object hierarchies |
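KR102594625B1 (ko) * | 2017-03-19 | 2023-10-25 | 오펙-에슈콜롯 리서치 앤드 디벨롭먼트 엘티디 | System and method for generating a filter for k-mismatch search |
CN108959573B (zh) * | 2018-07-05 | 2022-07-15 | 京东方科技集团股份有限公司 | Desktop-cloud-based data migration method and apparatus, electronic device, and storage medium |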
WO2020075062A1 (en) * | 2018-10-08 | 2020-04-16 | Arctic Alliance Europe Oy | Method and system to perform text-based search among plurality of documents |
US11061913B2 (en) * | 2018-11-30 | 2021-07-13 | International Business Machines Corporation | Automated document filtration and priority scoring for document searching and access |
US11074262B2 (en) * | 2018-11-30 | 2021-07-27 | International Business Machines Corporation | Automated document filtration and prioritization for document searching and access |
US11068490B2 (en) * | 2019-01-04 | 2021-07-20 | International Business Machines Corporation | Automated document filtration with machine learning of annotations for document searching and access |
US10977292B2 (en) | 2019-01-15 | 2021-04-13 | International Business Machines Corporation | Processing documents in content repositories to generate personalized treatment guidelines |
US11721441B2 (en) | 2019-01-15 | 2023-08-08 | Merative Us L.P. | Determining drug effectiveness ranking for a patient using machine learning |
US11537581B2 (en) * | 2019-03-22 | 2022-12-27 | Hewlett Packard Enterprise Development Lp | Co-parent keys for document information trees |
CN110990017B (zh) * | 2019-09-11 | 2022-09-09 | 无锡江南计算技术研究所 | Feature storage and matching method based on a trusted tree |
US11531818B2 (en) * | 2019-11-15 | 2022-12-20 | 42 Maru Inc. | Device and method for machine reading comprehension question and answer |
US20210349888A1 (en) * | 2020-05-11 | 2021-11-11 | Dropbox, Inc. | Personalized Spelling Correction |
CN112307356A (zh) * | 2020-10-30 | 2021-02-02 | 北京百度网讯科技有限公司 | Information search method and apparatus, electronic device, and storage medium |
Family Cites Families (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5020019A (en) * | 1989-05-29 | 1991-05-28 | Ricoh Company, Ltd. | Document retrieval system |
JPH03122770A (ja) * | 1989-10-05 | 1991-05-24 | Ricoh Co Ltd | Keyword-associative document retrieval method |
US5404514A (en) * | 1989-12-26 | 1995-04-04 | Kageneck; Karl-Erbo G. | Method of indexing and retrieval of electronically-stored documents |
US5321833A (en) * | 1990-08-29 | 1994-06-14 | Gte Laboratories Incorporated | Adaptive ranking system for information retrieval |
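JP2943447B2 (ja) * | 1991-01-30 | 1999-08-30 | 三菱電機株式会社 | Text information extraction device, text similarity matching device, text retrieval system, text information extraction method, text similarity matching method, and question analysis device |
JPH05101107A (ja) * | 1991-10-07 | 1993-04-23 | Hitachi Ltd | Device and method for narrowing-down data retrieval using precision |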
GB9220404D0 (en) * | 1992-08-20 | 1992-11-11 | Nat Security Agency | Method of identifying,retrieving and sorting documents |
JP2770715B2 (ja) * | 1993-08-25 | 1998-07-02 | 富士ゼロックス株式会社 | Structured document retrieval device |
EP0645757B1 (en) * | 1993-09-23 | 2000-04-05 | Xerox Corporation | Semantic co-occurrence filtering for speech recognition and signal transcription applications |
US5692176A (en) * | 1993-11-22 | 1997-11-25 | Reed Elsevier Inc. | Associative text search and retrieval system |
US5574840A (en) | 1994-08-29 | 1996-11-12 | Microsoft Corporation | Method and system for selecting text utilizing a plurality of text using switchable minimum granularity of selection |
JP2896634B2 (ja) | 1995-03-02 | 1999-05-31 | 富士ゼロックス株式会社 | Full-text registered-word retrieval device and full-text registered-word retrieval method |
US5826260A (en) * | 1995-12-11 | 1998-10-20 | International Business Machines Corporation | Information retrieval system and method for displaying and ordering information based on query element contribution |
US5752242A (en) * | 1996-04-18 | 1998-05-12 | Electronic Data Systems Corporation | System and method for automated retrieval of information |
JP3598742B2 (ja) * | 1996-11-25 | 2004-12-08 | 富士ゼロックス株式会社 | Document retrieval device and document retrieval method |
US6098065A (en) * | 1997-02-13 | 2000-08-01 | Nortel Networks Corporation | Associative search engine |
US5873081A (en) * | 1997-06-27 | 1999-02-16 | Microsoft Corporation | Document filtering via directed acyclic graphs |
US5933822A (en) * | 1997-07-22 | 1999-08-03 | Microsoft Corporation | Apparatus and methods for an information retrieval system that employs natural language processing of search results to improve overall precision |
US6014639A (en) | 1997-11-05 | 2000-01-11 | International Business Machines Corporation | Electronic catalog system for exploring a multitude of hierarchies, using attribute relevance and forwarding-checking |
US5999664A (en) * | 1997-11-14 | 1999-12-07 | Xerox Corporation | System for searching a corpus of document images by user specified document layout components |
US6801916B2 (en) * | 1998-04-01 | 2004-10-05 | Cyberpulse, L.L.C. | Method and system for generation of medical reports from data in a hierarchically-organized database |
US6389425B1 (en) | 1998-07-09 | 2002-05-14 | International Business Machines Corporation | Embedded storage mechanism for structured data types |
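JP2000029902A (ja) | 1998-07-15 | 2000-01-28 | Nec Corp | Structured document classification device and recording medium storing a program for implementing the device on a computer, and structured document retrieval system and recording medium storing a program for implementing the system on a computer |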
US6446061B1 (en) * | 1998-07-31 | 2002-09-03 | International Business Machines Corporation | Taxonomy generation for document collections |
JP2000090098A (ja) | 1998-09-09 | 2000-03-31 | Hitachi Ltd | Database query method, apparatus for implementing the method, and medium storing a program for the processing |
US6363378B1 (en) * | 1998-10-13 | 2002-03-26 | Oracle Corporation | Ranking of query feedback terms in an information retrieval system |
JP2001160066A (ja) | 1998-12-25 | 2001-06-12 | Matsushita Electric Ind Co Ltd | Data processing device, data processing method, recording medium, and program for causing a computer to execute the data processing method |
AU5587400A (en) * | 1999-05-07 | 2000-11-21 | Carlos Cardona | System and method for database retrieval, indexing and statistical analysis |
US7225182B2 (en) * | 1999-05-28 | 2007-05-29 | Overture Services, Inc. | Recommending search terms using collaborative filtering and web spidering |
US6380947B1 (en) * | 1999-07-22 | 2002-04-30 | At&T Corp. | Method and apparatus for displaying and tree scrolling a hierarchical data structure |
US20020052692A1 (en) | 1999-09-15 | 2002-05-02 | Eoin D. Fahy | Computer systems and methods for hierarchical cluster analysis of large sets of biological data including highly dense gene array data |
US7287214B1 (en) * | 1999-12-10 | 2007-10-23 | Books24X7.Com, Inc. | System and method for providing a searchable library of electronic documents to a user |
US6397211B1 (en) * | 2000-01-03 | 2002-05-28 | International Business Machines Corporation | System and method for identifying useless documents |
US7333983B2 (en) * | 2000-02-03 | 2008-02-19 | Hitachi, Ltd. | Method of and an apparatus for retrieving and delivering documents and a recording media on which a program for retrieving and delivering documents are stored |
EP1122651B1 (en) * | 2000-02-03 | 2010-05-19 | Hitachi, Ltd. | Method and apparatus for retrieving and delivering documents, and recording media storing a program therefor |
WO2002008948A2 (en) * | 2000-07-24 | 2002-01-31 | Vivcom, Inc. | System and method for indexing, searching, identifying, and editing portions of electronic multimedia files |
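KR100426382B1 (ko) * | 2000-08-23 | 2004-04-08 | 학교법인 김포대학 | Rank adjustment method based on document clustering using entropy information and a Bayesian SOM |
KR100434902B1 (ko) * | 2000-08-28 | 2004-06-07 | 주식회사 에이전트엑스퍼트 | Knowledge-based customized information providing system and service method thereof |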
US6804662B1 (en) * | 2000-10-27 | 2004-10-12 | Plumtree Software, Inc. | Method and apparatus for query and analysis |
US6693651B2 (en) * | 2001-02-07 | 2004-02-17 | International Business Machines Corporation | Customer self service iconic interface for resource search results display and selection |
US7225234B2 (en) * | 2001-03-02 | 2007-05-29 | Sedna Patent Services, Llc | Method and system for selective advertisement display of a subset of search results |
US20020123989A1 (en) * | 2001-03-05 | 2002-09-05 | Arik Kopelman | Real time filter and a method for calculating the relevancy value of a document |
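KR100498574B1 (ko) * | 2001-03-08 | 2005-07-01 | 주식회사 다이퀘스트 | Natural-language question-answering retrieval system using a paragraph-level real-time answer index |
JP3842577B2 (ja) | 2001-03-30 | 2006-11-08 | 株式会社東芝 | Structured document retrieval method, structured document retrieval device, and program |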
US20020198962A1 (en) * | 2001-06-21 | 2002-12-26 | Horn Frederic A. | Method, system, and computer program product for distributing a stored URL and web document set |
US20050108200A1 (en) * | 2001-07-04 | 2005-05-19 | Frank Meik | Category based, extensible and interactive system for document retrieval |
US7403938B2 (en) * | 2001-09-24 | 2008-07-22 | Iac Search & Media, Inc. | Natural language query processing |
US20030115191A1 (en) * | 2001-12-17 | 2003-06-19 | Max Copperman | Efficient and cost-effective content provider for customer relationship management (CRM) or other applications |
US7080059B1 (en) * | 2002-05-13 | 2006-07-18 | Quasm Corporation | Search and presentation engine |
CA2485546A1 (en) * | 2002-05-14 | 2003-11-27 | Verity, Inc. | Apparatus and method for region sensitive dynamically configurable document relevance ranking |
US7231395B2 (en) * | 2002-05-24 | 2007-06-12 | Overture Services, Inc. | Method and apparatus for categorizing and presenting documents of a distributed database |
US7139778B2 (en) * | 2002-06-28 | 2006-11-21 | Microsoft Corporation | Linear programming approach to assigning benefit to database physical design structures |
US20040037734A1 (en) * | 2002-08-23 | 2004-02-26 | Toomey Patrick J. | Method for removal of mold from a structure |
US7111000B2 (en) * | 2003-01-06 | 2006-09-19 | Microsoft Corporation | Retrieval of structured documents |
US20070260627A1 (en) * | 2006-05-03 | 2007-11-08 | Lucent Technologies Inc. | Method and apparatus for selective content modification within a content complex |
2003
- 2003-01-06 US US10/337,138: US7111000B2 (Active)
- 2003-12-15 EP EP03028647.0A: EP1435581B1 (Expired - Lifetime)
2004
- 2004-01-06 CN CNB2004100016133A: CN100568229C (Expired - Lifetime)
- 2004-01-06 JP JP2004001489A: JP4425641B2 (Expired - Fee Related)
- 2004-01-06 KR KR1020040000739A: KR101120760B1 (IP Right Cessation)
2006
- 2006-03-23 US US11/277,344: US7428538B2 (Expired - Lifetime)
- 2006-03-23 US US11/277,345: US20060161532A1 (Abandoned)
2008
- 2008-09-16 US US12/211,793: US8046370B2 (Expired - Lifetime)
Also Published As
Publication number | Publication date |
---|---|
US20060161532A1 (en) | 2006-07-20 |
EP1435581A3 (en) | 2005-09-28 |
US20090012956A1 (en) | 2009-01-08 |
US8046370B2 (en) | 2011-10-25 |
KR20040063822A (ko) | 2004-07-14 |
KR101120760B1 (ko) | 2012-06-12 |
US7111000B2 (en) | 2006-09-19 |
CN1517914A (zh) | 2004-08-04 |
EP1435581A2 (en) | 2004-07-07 |
EP1435581B1 (en) | 2013-04-10 |
CN100568229C (zh) | 2009-12-09 |
US20040133557A1 (en) | 2004-07-08 |
US7428538B2 (en) | 2008-09-23 |
JP2004213675A (ja) | 2004-07-29 |
US20060155690A1 (en) | 2006-07-13 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP4425641B2 (ja) | Retrieval of structured documents | |
US7685112B2 (en) | Method and apparatus for retrieving and indexing hidden pages | |
US8868539B2 (en) | Search equalizer | |
JP5638031B2 (ja) | Rating method, search result classification method, rating system, and search result classification system | |
US7769771B2 (en) | Searching a document using relevance feedback | |
US8099423B2 (en) | Hierarchical metadata generator for retrieval systems | |
US7783644B1 (en) | Query-independent entity importance in books | |
US7752243B2 (en) | Method and apparatus for construction and use of concept knowledge base | |
US7788261B2 (en) | Interactive web information retrieval using graphical word indicators | |
US20130268526A1 (en) | Discovery engine | |
US8930822B2 (en) | Method for human-centric information access and presentation | |
US20050081146A1 (en) | Relation chart-creating program, relation chart-creating method, and relation chart-creating apparatus | |
JP2007188352A (ja) | Page reranking device and page reranking program | |
US20120158716A1 (en) | Image object retrieval based on aggregation of visual annotations | |
Wolfram | The symbiotic relationship between information retrieval and informetrics | |
US8775443B2 (en) | Ranking of business objects for search engines | |
Hristidis et al. | Ranked queries over sources with boolean query interfaces without ranking support | |
Hristidis et al. | Relevance-based retrieval on hidden-web text databases without ranking support | |
Ntoulas et al. | Downloading hidden web content | |
Veningston et al. | Semantic association ranking schemes for information retrieval applications using term association graph representation | |
EP1807781A1 (en) | Data processing system and method | |
Lin et al. | Personalized optimal search in local query expansion | |
Wang | Evaluation of web search engines |
Legal Events
Code | Title | Description |
---|---|---|
A621 | Written request for application examination | JAPANESE INTERMEDIATE CODE: A621; effective date: 20070109 |
A131 | Notification of reasons for refusal | JAPANESE INTERMEDIATE CODE: A131; effective date: 20090807 |
A521 | Request for written amendment filed | JAPANESE INTERMEDIATE CODE: A523; effective date: 20091109 |
TRDD | Decision of grant or rejection written | |
A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01; effective date: 20091204 |
A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01 |
A61 | First payment of annual fees (during grant procedure) | JAPANESE INTERMEDIATE CODE: A61; effective date: 20091209 |
R150 | Certificate of patent or registration of utility model | JAPANESE INTERMEDIATE CODE: R150 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 20121218; year of fee payment: 3 |
FPAY | Renewal fee payment (event date is renewal date of database) | Payment until: 20131218; year of fee payment: 4 |
R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
S111 | Request for change of ownership or part of ownership | JAPANESE INTERMEDIATE CODE: R313113 |
R350 | Written notification of registration of transfer | JAPANESE INTERMEDIATE CODE: R350 |
R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
LAPS | Cancellation because of no payment of annual fees | |