JP4639077B2 - Systems and methods for indexing each level of the inner structure of a string over a language having a vocabulary and a grammar - Google Patents
- Publication number
- JP4639077B2 (application number JP2004360775A)
- Authority
- JP
- Japan
- Prior art keywords
- bit
- index
- string
- bit index
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99932—Access augmentation or optimizing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99942—Manipulating data structure, e.g. compression, compaction, compilation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Claims (2)
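- A method for indexing at least one string over a language having a vocabulary and a grammar, wherein:
a partitioning means, for each string, parses the string and thereby divides it into a plurality of constituent parts that include at least one word of the vocabulary and the relations between the words under the grammar;
a first bit index calculating means obtains, for each constituent part, a bit index in which each bit is uniquely associated with a respective word of the vocabulary appearing in the string or a respective relation under the grammar, each bit corresponding to a word or relation contained in that constituent part being set to a first value and every other bit being set to a second value; and
a second bit index calculating means obtains, for each of at least one set composed of a plurality of constituent parts, the bit index of that set by taking the bitwise logical OR of the bit indexes of the constituent parts contained in the set.
- A string indexing system for indexing at least one string over a language having a vocabulary and a grammar, comprising:
a partitioning means that receives at least one string as input and parses the string, thereby dividing it into at least one constituent part that includes at least one word of the vocabulary and the relations between the words under the grammar;
an index assigning means that assigns a unique index number to each word and each relation contained in each constituent part;
an index converting means that obtains a bit index for each word and each relation by converting each index number into a bit index in which each bit is uniquely associated with a respective index number, the bit corresponding to that index number being set to a first value and the other bits being set to a second value; and
a merging means that receives two or more bit indexes as input and outputs a bit index representing the result of the bitwise logical OR of those bit indexes, the merging means obtaining the bit index of each constituent part as the bitwise logical OR of the bit indexes corresponding to the words and relations contained in that constituent part, and obtaining the bit index of a set composed of a plurality of constituent parts as the bitwise logical OR of the bit indexes of the constituent parts contained in the set.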
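The steps recited in the claims can be illustrated with a minimal Python sketch. It is not the patentee's implementation. It assumes that the parse has already been done and that each constituent part is available as a tuple of words and relation labels. The sample sentence, the relation names `DET` and `SUBJ`, and the function names are hypothetical. The first value is taken to be 1 and the second value 0, and an arbitrary-precision Python integer stands in for the bit index.

```python
from functools import reduce
from operator import or_


def assign_index_numbers(constituents):
    """Assign a unique index number (0, 1, 2, ...) to each distinct word and relation."""
    numbers = {}
    for part in constituents:
        for item in part:
            numbers.setdefault(item, len(numbers))
    return numbers


def to_bit_index(index_number):
    """Convert an index number into a bit index with only that bit set to 1."""
    return 1 << index_number


def merge(bit_indexes):
    """Return the bitwise logical OR of the given bit indexes."""
    return reduce(or_, bit_indexes, 0)


# Hypothetical parser output for "the cat sleeps": each constituent part
# holds words plus the grammatical relation between them (labels are made up).
constituents = [
    ("the", "cat", "DET"),
    ("cat", "sleeps", "SUBJ"),
]

numbers = assign_index_numbers(constituents)

# Bit index of each constituent part: OR of the bit indexes of its words and relations.
part_bits = [
    merge(to_bit_index(numbers[item]) for item in part) for part in constituents
]

# Bit index of the set of constituent parts: OR of the part-level bit indexes.
string_bits = merge(part_bits)

for part, bits in zip(constituents, part_bits):
    print(part, format(bits, "05b"))
print("set", format(string_bits, "05b"))
```

Because each level is built by OR-ing the level below it, every bit set in a constituent part's bit index is also set in the bit index of any set that contains that part.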
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/739,191 US7194450B2 (en) | 2003-12-19 | 2003-12-19 | Systems and methods for indexing each level of the inner structure of a string over a language having a vocabulary and a grammar |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2005182795A (ja) | 2005-07-07 |
JP4639077B2 (ja) | 2011-02-23 |
Family
ID=34677538
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004360775A Expired - Fee Related JP4639077B2 (ja) | 2003-12-19 | 2004-12-14 | Systems and methods for indexing each level of the inner structure of a string over a language having a vocabulary and a grammar |
Country Status (2)
Country | Link |
---|---|
US (1) | US7194450B2 (ja) |
JP (1) | JP4639077B2 (ja) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7324648B1 (en) | 2003-07-08 | 2008-01-29 | Copyright Clearance Center, Inc. | Method and apparatus for secure key delivery for decrypting bulk digital content files at an unsecure site |
US8006307B1 (en) * | 2003-07-09 | 2011-08-23 | Imophaze Research Co., L.L.C. | Method and apparatus for distributing secure digital content that can be indexed by third party search engines |
US8886617B2 (en) | 2004-02-20 | 2014-11-11 | Informatica Corporation | Query-based searching using a virtual table |
US7243110B2 (en) * | 2004-02-20 | 2007-07-10 | Sand Technology Inc. | Searchable archive |
US7467155B2 (en) * | 2005-07-12 | 2008-12-16 | Sand Technology Systems International, Inc. | Method and apparatus for representation of unstructured data |
US7797303B2 (en) | 2006-02-15 | 2010-09-14 | Xerox Corporation | Natural language processing for developing queries |
US20070219773A1 (en) * | 2006-03-17 | 2007-09-20 | Xerox Corporation | Syntactic rule development graphical user interface |
US7949514B2 (en) * | 2007-04-20 | 2011-05-24 | Xerox Corporation | Method for building parallel corpora |
US7788084B2 (en) | 2006-09-19 | 2010-08-31 | Xerox Corporation | Labeling of work of art titles in text for natural language processing |
US7774198B2 (en) * | 2006-10-06 | 2010-08-10 | Xerox Corporation | Navigation system for text |
US7890318B2 (en) * | 2007-05-23 | 2011-02-15 | Xerox Corporation | Informing troubleshooting sessions with device data |
US7844633B2 (en) * | 2007-09-13 | 2010-11-30 | International Business Machines Corporation | System and method for storage, management and automatic indexing of structured documents |
US9547640B2 (en) * | 2013-10-16 | 2017-01-17 | International Business Machines Corporation | Ontology-driven annotation confidence levels for natural language processing |
CN104750701A (zh) * | 2013-12-27 | 2015-07-01 | 中兴通讯股份有限公司 | Search processing method, device, and terminal |
US10733164B2 (en) | 2015-06-23 | 2020-08-04 | Microsoft Technology Licensing, Llc | Updating a bit vector search index |
US10242071B2 (en) | 2015-06-23 | 2019-03-26 | Microsoft Technology Licensing, Llc | Preliminary ranker for scoring matching documents |
US11392568B2 (en) | 2015-06-23 | 2022-07-19 | Microsoft Technology Licensing, Llc | Reducing matching documents for a search query |
US10467215B2 (en) * | 2015-06-23 | 2019-11-05 | Microsoft Technology Licensing, Llc | Matching documents using a bit vector search index |
US10565198B2 (en) | 2015-06-23 | 2020-02-18 | Microsoft Technology Licensing, Llc | Bit vector search index using shards |
US11281639B2 (en) | 2015-06-23 | 2022-03-22 | Microsoft Technology Licensing, Llc | Match fix-up to remove matching documents |
US10229143B2 (en) | 2015-06-23 | 2019-03-12 | Microsoft Technology Licensing, Llc | Storage and retrieval of data from a bit vector search index |
US11823798B2 (en) | 2016-09-28 | 2023-11-21 | Merative Us L.P. | Container-based knowledge graphs for determining entity relations in non-narrative text |
US11016973B2 (en) | 2016-11-29 | 2021-05-25 | Sap Se | Query plan execution engine |
US10885032B2 (en) | 2016-11-29 | 2021-01-05 | Sap Se | Query execution pipelining with shared states for query operators |
US10372707B2 (en) | 2016-11-29 | 2019-08-06 | Sap Se | Query execution pipelining with pump operators |
US10521426B2 (en) | 2016-11-29 | 2019-12-31 | Sap Se | Query plan generation for split table query operations |
US10558661B2 (en) | 2016-11-29 | 2020-02-11 | Sap Se | Query plan generation based on table adapter |
US10733184B2 (en) | 2016-11-29 | 2020-08-04 | Sap Se | Query planning and execution with source and sink operators |
US10776353B2 (en) | 2017-01-26 | 2020-09-15 | Sap Se | Application programming interface for database access |
US10671625B2 (en) * | 2017-01-26 | 2020-06-02 | Sap Se | Processing a query primitive call on a value identifier set |
US10860579B2 (en) | 2017-01-30 | 2020-12-08 | Sap Se | Query planning and execution with reusable memory stack |
GB201710925D0 (en) * | 2017-07-07 | 2017-08-23 | Nalanda Tech Ltd | A Searching method and apparatus |
US10949219B2 (en) | 2018-06-15 | 2021-03-16 | Sap Se | Containerized runtime environments |
US10866831B2 (en) | 2018-06-15 | 2020-12-15 | Sap Se | Distributed execution of data processing pipelines |
US10733034B2 (en) | 2018-06-15 | 2020-08-04 | Sap Se | Trace messaging for distributed execution of data processing pipelines |
US11275485B2 (en) | 2018-06-15 | 2022-03-15 | Sap Se | Data processing pipeline engine |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
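JPH08314966A (ja) * | 1995-05-19 | 1996-11-29 | Toshiba Corp | Index creation method for a document retrieval device, and document retrieval device |
JPH08329112A (ja) * | 1995-06-06 | 1996-12-13 | Fujitsu Ltd | Free text search system |
JP2000207395A (ja) * | 1999-01-19 | 2000-07-28 | Matsushita Electric Ind Co Ltd | Japanese language analysis device, Japanese language analysis method, and recording medium storing a Japanese language analysis program |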
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5469354A (en) * | 1989-06-14 | 1995-11-21 | Hitachi, Ltd. | Document data processing method and apparatus for document retrieval |
US5625554A (en) * | 1992-07-20 | 1997-04-29 | Xerox Corporation | Finite-state transduction of related word forms for text indexing and retrieval |
US5701459A (en) * | 1993-01-13 | 1997-12-23 | Novell, Inc. | Method and apparatus for rapid full text index creation |
US5379366A (en) * | 1993-01-29 | 1995-01-03 | Noyes; Dallas B. | Method for representation of knowledge in a computer as a network database system |
US5963940A (en) * | 1995-08-16 | 1999-10-05 | Syracuse University | Natural language information retrieval system and method |
US6026388A (en) * | 1995-08-16 | 2000-02-15 | Textwise, Llc | User interface and other enhancements for natural language information retrieval system and method |
US5742816A (en) * | 1995-09-15 | 1998-04-21 | Infonautics Corporation | Method and apparatus for identifying textual documents and multi-mediafiles corresponding to a search topic |
US5893094A (en) * | 1997-07-25 | 1999-04-06 | Claritech Corporation | Method and apparatus using run length encoding to evaluate a database |
US6513032B1 (en) * | 1998-10-29 | 2003-01-28 | Alta Vista Company | Search and navigation system and method using category intersection pre-computation |
US7962326B2 (en) * | 2000-04-20 | 2011-06-14 | Invention Machine Corporation | Semantic answering system and method |
CA2340531C (en) * | 2001-03-12 | 2006-10-10 | Ibm Canada Limited-Ibm Canada Limitee | Document retrieval system and search method using word set and character look-up tables |
- 2003-12-19: US application US10/739,191, patent US7194450B2 (en), Expired - Fee Related
- 2004-12-14: JP application JP2004360775A, patent JP4639077B2 (ja), Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
US7194450B2 (en) | 2007-03-20 |
US20050138000A1 (en) | 2005-06-23 |
JP2005182795A (ja) | 2005-07-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP4639077B2 (ja) | Systems and methods for indexing each level of the inner structure of a string over a language having a vocabulary and a grammar | |
JP3272288B2 (ja) | Machine translation device and machine translation method | |
US5099426A (en) | Method for use of morphological information to cross reference keywords used for information retrieval | |
JP3601653B2 (ja) | Information retrieval device and method | |
US8280721B2 (en) | Efficiently representing word sense probabilities | |
US5285386A (en) | Machine translation apparatus having means for translating polysemous words using dominated codes | |
EP2643770A2 (en) | Text segmentation with multiple granularity levels | |
WO2001029699A1 (en) | Method and system to analyze, transfer and generate language expressions using compiled instructions to manipulate linguistic structures | |
US6915300B1 (en) | Method and system for searching indexed string containing a search string | |
JP3992348B2 (ja) | Morphological analysis method and device, and Japanese morphological analysis method and device | |
JPH1069497A (ja) | Database access device and method therefor | |
EP0524694B1 (en) | A method of inflecting words and a data processing unit for performing such method | |
JP3430431B2 (ja) | Database search device and database search method | |
WO2009136426A1 (ja) | Search query providing device | |
CN112052651A (zh) | Poetry generation method, device, electronic equipment, and storage medium | |
JP3873305B2 (ja) | Kana-kanji conversion device and kana-kanji conversion method | |
JP3500698B2 (ja) | Keyword extraction device and keyword extraction method | |
JPH0261768A (ja) | Electronic dictionary device and electronic dictionary search method | |
JP3628565B2 (ja) | Dictionary search method and device, and recording medium storing a dictionary search program | |
CN117291155A (zh) | Data generation method, model training method, text error correction method, and related devices | |
JPH0612451A (ja) | Example sentence search system | |
JP2000259627A (ja) | Natural language sentence relation determination device, natural language sentence relation determination method, and search device, search method, and recording medium using the same | |
Takeda et al. | CRITAC-A Japanese Text Proofreading System | |
JPH0157829B2 (ja) | ||
JPH03229367A (ja) | Text-based search system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20071213 |
| A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131; Effective date: 20100706 |
| A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20101004 |
| TRDD | Decision of grant or rejection written | |
| A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20101102 |
| A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
| A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20101129 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20131203; Year of fee payment: 3 |
| R150 | Certificate of patent or registration of utility model | Free format text: JAPANESE INTERMEDIATE CODE: R150 |
| R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
| R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
| R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
| LAPS | Cancellation because of no payment of annual fees | |