JP2006099427A

JP2006099427A - Full-text retrieval system and method

Info

Publication number: JP2006099427A
Application number: JP2004284673A
Authority: JP
Inventors: Katsuhiko Takachio; 勝彦高知尾; Kouichi Sasaki; 光一笹氣; Yoji Kato; 陽二加藤
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2004-09-29
Filing date: 2004-09-29
Publication date: 2006-04-13
Anticipated expiration: 2024-09-29
Also published as: CN100412864C; CN1755691A; JP4037859B2

Abstract

PROBLEM TO BE SOLVED: To speed up full-text retrieval while securing a certain degree of retrieval precision.

SOLUTION: A retrieval result number approximation level deciding part 183 decides the approximation level between the number of hits of the primary retrieval using an N-gram index, obtained by an N-gram retrieval engine 13, and the number of hits of the morpheme retrieval using a morpheme index, obtained by a morpheme retrieval engine 16. When the deciding part 183 decides that the number of hits of the primary retrieval approximates the number of hits of the morpheme retrieval, a full-text retrieval execution control mechanism 18 controls the N-gram retrieval engine 13 so that the secondary retrieval using the N-gram index is omitted. In this case, the full-text retrieval execution control mechanism 18 adopts the result of the primary retrieval or the result of the morpheme retrieval as the retrieval result.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a full-text search system and method suitable for quickly searching a large amount of digitized document information for documents that match specified search conditions, using full-text search technology.

Various search systems that search a large amount of digitized document information for documents matching specified search conditions have been developed. Two typical document search methods applied in this type of search system are known: a search method using an N-gram index, used for full-text search, and a search method using morphemes, used for natural language search (concept search). These search methods are outlined below.

[Search method using an N-gram index]
All characters appearing in a document are registered in the index as consecutive character strings (grams) of a predetermined length N. At search time, the search character string (search word) is likewise divided into character strings (grams) of length N, and the search is performed by obtaining the occurrence information of the same character strings from the index.

In a search using the N-gram index, a primary search is performed first, in which candidate documents are selected based only on whether character strings matching the length-N character strings divided from the search word are present (that is, whether there is a hit). A secondary search is then performed, which examines the adjacency relationships of the matched character strings to narrow the candidates down to documents that actually contain the search word. In this way, a search using the N-gram index achieves full-text search without omission through two stages, a primary search and a secondary search.
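The two-stage search described above can be illustrated with a minimal sketch. It assumes a small in-memory index that maps each gram to the positions where it occurs in each document; the function names (`build_ngram_index`, `primary_search`, `secondary_search`) and the data layout are hypothetical and are not taken from the publication.

```python
def build_ngram_index(docs, n=2):
    """Map each length-n gram to {doc_id: [start positions]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            gram = text[pos:pos + n]
            index.setdefault(gram, {}).setdefault(doc_id, []).append(pos)
    return index


def split_grams(query, n=2):
    """Return (offset, gram) pairs for every length-n window of the query."""
    return [(off, query[off:off + n]) for off in range(len(query) - n + 1)]


def primary_search(index, query, n=2):
    """Candidates that contain every gram of the query, ignoring adjacency."""
    grams = split_grams(query, n)
    if not grams:
        return set()
    result = None
    for _, gram in grams:
        docs_with_gram = set(index.get(gram, {}))
        result = docs_with_gram if result is None else result & docs_with_gram
    return result


def secondary_search(index, query, candidates, n=2):
    """Keep only candidates in which the grams occur at consecutive offsets."""
    grams = split_grams(query, n)
    if not grams:
        return set()
    hits = set()
    for doc_id in candidates:
        for start in index[grams[0][1]][doc_id]:
            # Every gram must appear exactly `off` characters after `start`.
            if all(start + off in index[gram][doc_id] for off, gram in grams):
                hits.add(doc_id)
                break
    return hits
```

With `docs = {1: "abcd", 2: "abxbc"}` and the query `"abc"`, both documents contain the grams `ab` and `bc`, so both pass the primary search; only document 1 has them adjacent, so document 2 is the kind of noise that the secondary search removes.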

It is known that the search accuracy of the primary search can be raised by increasing the value of N. However, increasing N makes the index extremely large, so the search may take a long time. Conversely, decreasing N increases search noise and lowers search accuracy. Because the secondary search targets every hit document, its efficiency worsens as the number of hits grows, regardless of how much noise is actually present.

[Search method using a morpheme index]
A document is analyzed, the morphemes (words) to be indexed are extracted from it in units of the smallest meaningful linguistic unit (morpheme), and document information is assigned to each extracted morpheme and registered in the index. At search time, the search character string is likewise divided into morphemes, and the search is performed by obtaining the document information that matches the same morphemes.

In a search using the morpheme index, morphemes do not overlap one another, so the index is small and high-speed search is possible. However, search omissions occur when the morphological analysis results of the target document and of the search condition do not match.
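The search omission mentioned above can be shown with a small sketch. It assumes a stand-in analyzer that simply splits on whitespace; a real system would use a morphological analyzer, and the names `analyze`, `build_morpheme_index`, and `morpheme_search` are hypothetical. Combining multiple morphemes with a plain AND is also an assumption made for brevity.

```python
def analyze(text):
    """Stand-in for a morphological analyzer (assumption: whitespace split)."""
    return text.lower().split()


def build_morpheme_index(docs):
    """Map each extracted morpheme to the set of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for morpheme in analyze(text):
            index.setdefault(morpheme, set()).add(doc_id)
    return index


def morpheme_search(index, query):
    """Documents that contain every morpheme of the analyzed query."""
    morphemes = analyze(query)
    if not morphemes:
        return set()
    result = set(index.get(morphemes[0], set()))
    for morpheme in morphemes[1:]:
        result &= index.get(morpheme, set())
    return result
```

For `docs = {1: "fulltext search engine", 2: "search index"}`, the query `"search"` finds both documents, but the query `"text"` finds nothing even though the string occurs inside `"fulltext"`: the document was segmented differently from the query, which is the omission case the passage describes.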

Thus, a search using the N-gram index has no search omissions and its primary search is fast, but the secondary search for removing noise is slow. A search using the morpheme index, on the other hand, is fast but suffers from search omissions. In other words, the N-gram index search method used for full-text search and the morpheme index search method used for natural language search (concept search) each have advantages and disadvantages.

To realize document search with few omissions by exploiting the strengths of both full-text search and natural language search (concept search), a document search method has been proposed that executes both kinds of search and merges the two search results (see, for example, Patent Document 1). The document search technique described in Patent Document 1 (hereinafter referred to as the first prior art) applies one of two techniques. In the first, the results of a full-text search are narrowed down by a natural language search, so that documents close to the query sentence are retrieved from among the search targets that contain the specified text. In the second, the results of a natural language search are narrowed down by a full-text search, so that documents containing the specified text are retrieved from among the results of a rough natural language search.

Thus, the first prior art treats full-text search and natural language search as independent searches and merges the two results. Specifically, the results of one of the two searches are narrowed down by the other. In the first prior art, therefore, both the full-text search and the natural language search are always executed. Full-text search, however, is slower than natural language search. If the N-gram index search method is applied to the full-text search, the full-text search requires the total N-gram search execution time (primary search time + secondary search time). That is, the first prior art has no mechanism for overcoming the drawback of full-text search and speeding up the full-text search itself. The first prior art is therefore problematic when the number of search hits is large.

A document search method has also been proposed in which the search expression is evaluated in advance; if the search expression is of the "keyword type (full-text search)", full-text search using the N-gram index is used, and if it is of the "natural language type (concept search)", search using the morpheme index is used (see, for example, Patent Document 2).

In the document search technique described in Patent Document 2 (hereinafter referred to as the second prior art), if the search expression is determined to be of the "keyword type", the search process requires the total N-gram search execution time (primary search and secondary search). As with the first prior art, this does not speed up the full-text search itself.
Patent Document 1: JP 2001-92831 A (paragraphs 0005 to 0010, FIG. 5)
Patent Document 2: JP 2003-308335 A (paragraph 0011)

In the first prior art described above, a full-text search is always executed. In the second prior art as well, a full-text search using the N-gram index is executed when the search expression is of the "keyword type (full-text search)". This full-text search using the N-gram index takes a great deal of time, yet neither the first nor the second prior art has a mechanism for speeding up the full-text search itself.

The present invention has been made in view of the above circumstances. Its object is to provide a full-text search system and method that perform a primary search using an N-gram index and a search using a morpheme index and, when the two search results are approximate, omit the secondary search using the N-gram index, thereby executing full-text search at high speed while securing a certain degree of search accuracy.

According to one aspect of the present invention, there is provided a full-text search system comprising: first search means for executing a search using an N-gram index as a primary search followed by a secondary search on the primary search result; morpheme analysis means for performing morphological analysis on a search condition sentence; and second search means for executing a morpheme search using a morpheme index based on the result of the morphological analysis by the morpheme analysis means. The system further comprises: approximation degree determination means for determining the degree of approximation between the number of hits of the primary search using the N-gram index and the number of hits of the morpheme search using the morpheme index; and full-text search execution control means for, when the approximation degree determination means determines that the two numbers of hits are approximate, controlling the first search means so that the secondary search using the N-gram index is omitted, and adopting the result of the primary search or the result of the morpheme search as the search result.

In the above configuration, the number of hits obtained by the search using the morpheme index is used as an evaluation value. When that number approximates the number of hits of the primary search using the N-gram index, the secondary search using the N-gram index is omitted, so that high-speed search can be realized while a certain degree of search accuracy is secured.

Here, it is preferable to add morpheme analysis result determination means for determining, based on the morphological analysis result from the morpheme analysis means, whether the search condition sentence could be divided into words that can be used for morpheme search, and to have the full-text search execution control means control the second search means so that the morpheme search using the morpheme index is executed only when the search condition sentence is determined to have been so divided.

When the search condition sentence can be divided into words that can be used for morpheme search, the way words are divided in the morphological analysis result is expected, in many cases, to be the same as the way the morphemes contained in the morpheme index were divided. Therefore, when a morpheme search using the morpheme index is performed with this morphological analysis result, the accuracy (reliability) of the morpheme index search result (number of hits) used as the evaluation value, and hence the accuracy (reliability) of the determination by the approximation degree determination means, can be guaranteed to some extent.

It is also preferable that which of the primary search result and the morpheme search result is adopted as the search result be determined based on the number of words in the analysis result of the morpheme analysis means (the number of words into which the search condition sentence is divided) and a reference number of words.

In such a configuration, when the number of words resulting from the morphological analysis is small (for example, one word), the morpheme search is expected to have almost no search omissions, so the morpheme search result can be regarded as more accurate than the primary search result. Therefore, by deciding which of the two results to adopt based on the number of words in the morphological analysis result and the reference number of words, the more accurate search result can be adopted.
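The selection rule described above might be sketched as follows. The text states only that a small word count (for example, one word) favors the morpheme search result; treating "small" as "less than or equal to the reference number of words" is an assumption, as is the function name `choose_adopted_result`.

```python
def choose_adopted_result(word_count, reference_word_count):
    """Pick which already-computed result to adopt when the secondary search is skipped.

    Assumption: a query counts as "few words" when
    word_count <= reference_word_count; the exact comparison is not
    specified in the text.
    """
    if word_count <= reference_word_count:
        return "morpheme"   # few words: morpheme search is unlikely to miss documents
    return "primary"        # many words: fall back to the N-gram primary search result
```

With a reference of one word, a single-word query adopts the morpheme search result, while a three-word query adopts the primary search result.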

It is also preferable that the number of hits of the primary search be compared with a reference number of hits, and that the secondary search using the N-gram index be executed when the number of hits of the primary search is smaller than the reference number. When the primary search yields few hits, performing the secondary search does not affect search performance. Therefore, performing the secondary search in that case realizes an even more accurate search without degrading search performance.

If a user interface is added that allows the user to specify the reference number of words, tuning according to the search condition sentence becomes possible. Likewise, if a user interface is added that allows the user to specify the reference number of hits, tuning according to the user's environment becomes possible. If a user interface is added that allows the user to specify the degree of approximation used as the criterion by the approximation degree determination means, tuning according to the characteristics of the search condition sentence or of the document group to be searched becomes possible.

It is also preferable to add a user interface that allows the user to specify either a standard search, in which the search always proceeds through the secondary search using the N-gram index, or a high-speed search, in which the secondary search using the N-gram index may be omitted based on the determination result of the approximation degree determination means. This makes it possible to reflect the user's intention as to whether search speed or search accuracy takes priority.

It is also preferable that the full-text search execution control means control the first search means and the second search means so that the primary search using the N-gram index and the morpheme search using the morpheme index are executed in parallel. This enables an even faster search.
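One way to run the two searches in parallel is sketched below with Python's standard `concurrent.futures` module. The callables `primary_search` and `morpheme_search` are placeholders for whatever the two engines expose, and using threads (rather than processes or separate servers) is an assumption that suits lookups dominated by I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(primary_search, morpheme_search, query):
    """Start both searches at once and return (primary_result, morpheme_result)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary_future = pool.submit(primary_search, query)
        morpheme_future = pool.submit(morpheme_search, query)
        # result() blocks until the corresponding search has finished.
        return primary_future.result(), morpheme_future.result()
```

Both hit counts are then available together for the approximation determination, instead of the morpheme search starting only after the primary search has returned.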

According to the present invention, a primary search using an N-gram index and a search using a morpheme index are performed, and when the two search results are approximate, the secondary search using the N-gram index is omitted. Full-text search can thus be executed at high speed while a certain degree of search accuracy is secured.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a full-text search system according to an embodiment of the present invention. This full-text search system performs full-text search using an N-gram index and natural language search (search using a morpheme index) in response to user requests. In the full-text search system of FIG. 1, however, part of the full-text search (the secondary search using the N-gram index) is omitted when certain conditions are satisfied.

The full-text search system of FIG. 1 comprises a user interface 11, a search execution/response server 12, an N-gram search engine 13, an N-gram index database 14, a morpheme analysis mechanism 15, a morpheme search engine 16, a morpheme index database 17, and a full-text search execution control mechanism 18.

The user interface 11 has an interface function that accepts search requests from the user and presents search results to the user. In this embodiment the user interface 11 forms part of the full-text search system, but this is not a limitation. For example, the user interface 11 may be provided in a client terminal connected to the full-text search system of FIG. 1 via a communication line (for example, a network).

The search execution/response server 12 passes the search condition indicated by the search request accepted by the user interface 11 to the N-gram search engine 13 or the morpheme analysis mechanism 15, for a search using the N-gram index or for the morphological analysis required by a search using morphemes. The search execution/response server 12 also causes the user interface 11 to present to the user the search result from the N-gram search engine 13 or the morpheme analysis mechanism 15.

The N-gram search engine 13 performs full-text search using the N-gram index stored in the N-gram index database 14. The N-gram search engine 13 includes a primary search execution unit 131 and a secondary search execution unit 132. The primary search execution unit 131 performs the primary search using the N-gram index, and the secondary search execution unit 132 performs the secondary search using the N-gram index. In the N-gram index stored in the N-gram index database 14, all characters appearing in the documents that can be searched are registered as consecutive character strings (grams) of a predetermined length N, with position information registered for each character string.

The morpheme analysis mechanism 15 performs morphological analysis on the search condition. The morpheme search engine 16 performs a morpheme search using the morpheme index stored in the morpheme index database 17, in accordance with the morphological analysis result from the morpheme analysis mechanism 15. In the morpheme index stored in the morpheme index database 17, document information including position information is registered for each morpheme extracted from the documents that can be searched.

The full-text search execution control mechanism 18 controls the N-gram search engine 13 and the morpheme search engine 16 in accordance with the contents of a setting information file 19 so that full-text search using the N-gram index is executed at high speed. The setting information file 19 holds information, such as conditions, that the full-text search execution control mechanism 18 needs in order to control the execution of full-text search.

The full-text search execution control mechanism 18 includes a morpheme analysis result determination unit 181, a primary search result number determination unit 182, and a search result number approximation degree determination unit 183. The morpheme analysis result determination unit 181 determines, based on the morphological analysis result for the search condition from the morpheme analysis mechanism 15, whether a search using the morpheme index (morpheme search) or the secondary search using the N-gram index should be executed. The primary search result number determination unit 182 determines, based on the result of the primary search using the N-gram index, whether the secondary search using the N-gram index should be executed. The search result number approximation degree determination unit 183 determines, based on the result of the primary search using the N-gram index and the result of the morpheme search, whether the secondary search using the N-gram index should be executed.

Next, the procedure of the full-text search process in the high-speed search mode (high-speed search process) executed in the full-text search system of FIG. 1 will be described with reference to the flowchart of FIG. 2. In this embodiment, a standard search mode is provided in addition to the high-speed search mode, and the user can select between them as described later.

Suppose now that a user who wants a full-text search performs an input operation on a client terminal, and the terminal sends a search request specifying full-text search to the full-text search system of FIG. 1. The user interface 11 accepts this search request and extracts the search condition it indicates. The user interface 11 sends the extracted search condition to the search execution/response server 12 and also notifies the server of the search type (for example, full-text search) indicated by the search request. When full-text search is specified, the search execution/response server 12 sends the search condition received from the user interface 11 to the N-gram search engine 13 in order to execute the full-text search.

The primary search execution unit 131 in the N-gram search engine 13 executes the well-known primary search using the N-gram index stored in the N-gram index database 14, in accordance with the search condition sent from the search execution/response server 12 (step S1). The primary search execution unit 131 holds the primary search result inside the N-gram search engine 13 and sends the number of hits N1, together with the corresponding search condition, to the full-text search execution control mechanism 18.

The primary search result number determination unit 182 in the full-text search execution control mechanism 18 compares the number of hits N1 sent from the primary search execution unit 131 with the hit count threshold K set in the setting information file 19 and determines which is larger (step S2). If N1 is less than or equal to K, the full-text search execution control mechanism 18 requests the N-gram search engine 13 to perform the secondary search. K can be changed (tuned) by a user operation, as described later.

If, on the other hand, N1 is greater than K, the primary search result number determination unit 182 holds N1 inside the full-text search execution control mechanism 18 and then sends the search condition to the morpheme analysis mechanism 15. On receiving the search condition from the primary search result number determination unit 182, the morpheme analysis mechanism 15 performs morphological analysis on it (step S3) and sends the result to the full-text search execution control mechanism 18.

The morpheme analysis result determination unit 181 in the full-text search execution control mechanism 18 evaluates the morphological analysis result from the morpheme analysis mechanism 15 (step S4). That is, the morpheme analysis result determination unit 181 determines whether the search condition could be divided into words that can be searched with the morpheme index (morpheme search), in other words, words that are meaningful by themselves (for example, independent words such as nouns, verbs, and adjectives). If the search condition could not be divided into such words, the full-text search execution control mechanism 18 requests the N-gram search engine 13 to perform the secondary search.

If, on the other hand, the search condition could be divided into words that can be used for morpheme search, the morpheme analysis result determination unit 181 sends the morphological analysis result from the morpheme analysis mechanism 15 to the morpheme search engine 16. On receiving the morphological analysis result, the morpheme search engine 16 performs a morpheme search using that result and the morpheme index database 17 (step S5). The morpheme search engine 16 then holds the morpheme search result internally and sends the number of hits N2 to the full-text search execution control mechanism 18.

The search result number approximation degree determination unit 183 in the full-text search execution control mechanism 18 determines whether the number of hits N1 of the primary search by the N-gram search engine 13 (its primary search execution unit 131), which is held inside the full-text search execution control mechanism 18, and the number of hits N2 of the morpheme search sent from the morpheme search engine 16 are approximate (N1 ≈ N2) (step S6). Here, the search result number approximation degree determination unit 183 determines whether the degree of approximation (%) between N1 and N2 is within the approximation setting value P (%) set in the setting information file 19. In this embodiment, the degree of approximation between N1 and N2 is given by |N1 − N2| × 100% / N1 or |N1 − N2| × 100% / N2, that is, the absolute value of the difference between N1 and N2 expressed as a percentage of N1 or N2. The smaller the value, the closer N1 and N2 are. The approximation setting value P can be tuned by a user operation, as described later.
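The degree-of-approximation test in step S6 can be sketched as below. The formula follows the text; the function names, the `base` parameter for choosing N1 or N2 as the denominator, and the handling of a zero denominator are assumptions added for illustration.

```python
def approximation_percent(n1, n2, base="n1"):
    """Return |N1 - N2| * 100 / N1 (or / N2 when base == "n2"), in percent."""
    denominator = n1 if base == "n1" else n2
    if denominator == 0:
        # Assumption: identical zero counts are a perfect match, otherwise no match.
        return 0.0 if n1 == n2 else float("inf")
    return abs(n1 - n2) * 100.0 / denominator


def is_approximate(n1, n2, p, base="n1"):
    """True when the degree of approximation is within the setting value P (%)."""
    return approximation_percent(n1, n2, base) <= p
```

For example, with N1 = 1000, N2 = 950, and P = 5, the degree is 5% relative to N1, so the counts are treated as approximate; relative to N2 the degree is slightly above 5%, so the choice of denominator matters near the threshold.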

When the degree of approximation between N1 and N2 exceeds P, the search result number approximation degree determination unit 183 determines that N1 and N2 are not approximate. In this case, the full-text search execution control mechanism 18 requests a secondary search from the N-gram search engine 13.

In contrast, when the degree of approximation between N1 and N2 is within P, the search result number approximation degree determination unit 183 determines that N1 and N2 are approximate. In this case, the full-text search execution control mechanism 18 decides, according to the information set in the setting information file 19, whether N-gram search or morpheme search takes priority. When N-gram search takes priority, the mechanism 18 requests the N-gram search engine 13 to return the internally held primary search result to the search execution/response server 12. When morpheme search takes priority, the mechanism 18 requests the morpheme search engine 16 to return the internally held morpheme search result to the search execution/response server 12. In other words, the search result number approximation degree determination unit 183 causes the result of the primary search or of the morpheme search to be returned to the search execution/response server 12 (step S7).

When the search execution/response server 12 receives the result of the primary search or the morpheme search from the full-text search execution control mechanism 18 or the morpheme search engine 16, it notifies the user of that result via the user interface 11 (and the search application). Information indicating which determination led to the search being executed is attached to the search result.

Thus, in this embodiment, when (a1) the hit count N1 of the N-gram primary search exceeds K, (a2) the search condition can be divided into morpheme-searchable words, and (a3) the hit count N1 of the N-gram primary search and the hit count N2 of the morpheme search are approximate (N1 ≈ N2), the N-gram secondary search is omitted and the result of the primary search or the morpheme search is adopted as the search result for the search request. However, as long as condition (a3) among conditions (a1), (a2), and (a3) is satisfied, a certain degree of search accuracy can be secured even if the N-gram secondary search is omitted. The N-gram secondary search may therefore be omitted whenever at least condition (a3) is satisfied. In either case, omitting the N-gram secondary search speeds up the full-text search while limiting the loss of search accuracy. If condition (a1) is not satisfied, however, performing the N-gram secondary search has little effect on performance, so there is little benefit in omitting it. And if condition (a2) is not satisfied, the determination of whether condition (a3) holds becomes less reliable.

The secondary search execution unit 132 in the N-gram search engine 13 executes the secondary search on the primary search result of the N-gram index only when it receives a secondary search request from the full-text search execution control mechanism 18 (step S8). As described above, in this embodiment the mechanism 18 requests a secondary search from the N-gram search engine 13 in one of three cases: (b1) the hit count N1 of the N-gram primary search is K or less; (b2) the search condition could not be divided into morpheme-searchable words; or (b3) the hit count N1 of the N-gram primary search and the hit count N2 of the morpheme search are not approximate. In case (b1), performing the N-gram secondary search to secure sufficiently high search accuracy has little effect on search performance. In case (b2) or (b3), there is no guarantee that the morpheme search or the N-gram primary search alone can secure a certain degree of search accuracy. In this embodiment, therefore, the N-gram secondary search is performed to secure sufficiently high search accuracy, even though search performance may be affected.
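The control flow of the high-speed search described so far (conditions (a1) to (a3) and (b1) to (b3)) can be summarized as a rough Python sketch. The four search and analysis callables, the `Outcome` type, the `prefer_ngram` flag, and the convention that an empty word list means "could not be divided into morpheme-searchable words" are all hypothetical stand-ins for the engines 13 and 16, the mechanism 15, and the setting information file 19. The sketch is not the patented implementation itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    results: List[str]
    source: str  # "ngram_secondary", "ngram_primary" or "morpheme"


def fast_search(
    query: str,
    primary_search: Callable[[str], List[str]],
    secondary_search: Callable[[str, List[str]], List[str]],
    analyze: Callable[[str], List[str]],
    morpheme_search: Callable[[List[str]], List[str]],
    k: int,
    p: float,
    prefer_ngram: bool = True,
) -> Outcome:
    if k < 0:
        raise ValueError("hit count threshold K must be non-negative")

    primary = primary_search(query)  # N-gram primary search
    n1 = len(primary)

    def run_secondary() -> Outcome:
        # Step S8: refine the primary result with the secondary search.
        return Outcome(secondary_search(query, primary), "ngram_secondary")

    if n1 <= k:  # case (b1): few hits, secondary search is cheap
        return run_secondary()

    words = analyze(query)  # morphological analysis of the search condition
    if not words:  # case (b2): not divisible into searchable words (step S4)
        return run_secondary()

    morpheme = morpheme_search(words)  # step S5
    n2 = len(morpheme)

    degree = abs(n1 - n2) * 100.0 / n1  # step S6; n1 > k >= 0, so n1 > 0
    if degree > p:  # case (b3): hit counts are not approximate
        return run_secondary()

    # Step S7: secondary search omitted; return one of the held results.
    if prefer_ngram:
        return Outcome(primary, "ngram_primary")
    return Outcome(morpheme, "morpheme")
```

Passing the engines in as callables keeps the sketch independent of any particular index implementation. Only the ordering of the checks and the three fallback paths to the secondary search are taken from the description.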

In this embodiment, the user interface 11 has a first search interface that lets the user select the accuracy of the full-text search, and a second search interface that lets the user specify the tuning parameters used when the high-speed search described above is executed. The user interface 11 presents the user with a search interface screen that implements these first and second search interfaces. FIG. 3 shows an example of the search interface screen when full-text search is designated. This search interface screen is one of the search execution screens. In addition to a search condition field 31 for specifying a search condition such as a keyword and a search button 32 for instructing execution of the search, it includes a search accuracy selection area 33 and a tuning parameter area 34.

The search accuracy selection area 33 contains a "high-speed" selection button 331 for selecting the high-speed search mode and a "standard" selection button 332 for selecting the standard search mode.

・Standard
When the "standard" selection button 332 is selected and a search is instructed, full-text search processing in the standard search mode (standard search processing) is executed. Here, the search using the N-gram index (primary and secondary search) is performed. In this case the search result is complete, but the search is slower.

・High-speed
When the "high-speed" selection button 331 is selected and a search is instructed, full-text search processing in the high-speed search mode (high-speed search processing) is executed. Here, the search follows the flowchart of FIG. 2 described above. When the primary search result from the N-gram index and the search result from the morpheme index are approximate, a high-speed search is possible while a certain degree of search accuracy is secured.

The tuning parameter area 34 contains a hit count field 341 for specifying the hit count threshold (reference hit count) K, an approximate ratio field 342 for specifying the approximation setting value (approximate ratio) P, and an adoption condition field 343 for specifying the adoption condition.

・Hit count
In the primary search using the N-gram index, when the number of search hits exceeds the value specified in the hit count field 341 (hit count threshold K), one of the several conditions under which the secondary search using the N-gram index can be omitted is treated as satisfied. When the number of search hits is less than the value specified in the hit count field 341, the secondary search is performed on the primary search result of the N-gram index. That is, when the primary search of the N-gram index yields few hits, performing the secondary search has little effect on search performance, so the secondary search is performed to obtain a complete, highly accurate search result.

・Approximate ratio
When the degree of approximation between the hit count of the primary search using the N-gram index and the hit count of the search using the morpheme index is less than the specified approximate ratio (approximation setting value P), one of the several conditions under which the secondary search using the N-gram index can be omitted is treated as satisfied. When the degree of approximation exceeds the specified approximate ratio, the secondary search is performed on the primary search result of the N-gram index. That is, when the result of the primary search using the N-gram index and the result of the search using the morpheme index are far apart, the search accuracy is judged to be poor and the secondary search using the N-gram index is performed.

・Adoption condition
When the hit count of the primary search using the N-gram index and the hit count of the search using the morpheme index are approximate, either search result is reasonable to adopt. However, when the number of words obtained by morphologically analyzing the search condition (keyword) is at or below the minimum word count, the search using the morpheme index is the more accurate one. Therefore, a reference value for the number of words into which the search condition is divided by morphological analysis (the reference word count) can be specified in the adoption condition field 343 as the condition for deciding whether to adopt the primary search result from the N-gram index or the search result from the morpheme index, that is, whether N-gram search or morpheme search takes priority. In addition, when the number of words obtained by morphologically analyzing the search condition exceeds the reference word count, the morphological analysis result is less accurate, and there is no guarantee that the search using the morpheme index secures a certain degree of accuracy. It is therefore advisable to add a check of whether the number of words obtained by morphological analysis is at or below the reference word count, for example at step S4 above, and to perform the secondary search using the N-gram index when the reference word count is exceeded. Conversely, if the system is configured so that the secondary search using the N-gram index is omitted when the number of words is at or below the reference word count and conditions (a1), (a2), and (a3) are satisfied, this is effective in terms of search time, particularly when a long search condition is specified.
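The three tuning parameters and the adoption condition can be sketched as follows. The description does not spell out the exact priority rule, so the reading used here (adopt the morpheme result when the word count is at or below the reference word count, otherwise the N-gram primary result) is an assumption. The class and field names are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningParams:
    hit_threshold_k: int     # hit count field 341 (threshold K)
    approx_ratio_p: float    # approximate ratio field 342 (setting value P, %)
    reference_words: int     # adoption condition field 343 (reference word count)


def adopted_source(word_count: int, params: TuningParams) -> str:
    """Pick which held result to return once the secondary search is omitted.

    Assumed rule: few words after morphological analysis means the morpheme
    index search is the more accurate one, so it takes priority.
    """
    if word_count <= params.reference_words:
        return "morpheme"
    return "ngram_primary"
```

A stricter variant, also suggested in the description, would not return the N-gram primary result at all when the word count exceeds the reference value and would run the secondary search instead.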

FIG. 4 shows an example of the search result screen reported to the user by the search execution/response server 12 via the user interface 11. This screen is one of the search execution screens. In addition to the search condition field 31, search button 32, search accuracy selection area 33, and tuning parameter area 34 that it shares with the search interface screen shown in FIG. 3, it includes a search accuracy area 41 and a search result area 42.

As described above, when the high-speed search process is executed, the search result reported to the user through the search result area 42 of the search result screen is determined, following the flowchart of FIG. 2, from the results of both the N-gram index search and the morpheme index search and from the tuning parameter information. The search result reported to the user, that is, the result displayed in the search result area 42, is one of the following three:
(a) the result of the search using the N-gram index (primary and secondary search);
(b) the result of the search using the morpheme index;
(c) the result of the search using the N-gram index (primary search only).
When the standard search process is executed, on the other hand, result (a) is always adopted as the search result displayed in the search result area 42.

When the high-speed search process is executed, the search accuracy area 41 indicates which of the results (a), (b), and (c) was adopted, for example with a term that expresses the corresponding "search accuracy" in abstract form. Here, the terms "appropriate", "slightly coarse", and "rough" are used for (a), (b), and (c), respectively.
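The mapping from the adopted result to the label shown in the search accuracy area 41 amounts to a small lookup table. In the sketch below, the dictionary keys and the function name are hypothetical. Only the three labels and their association with results (a), (b), and (c) come from the description.

```python
# Label displayed in the search accuracy area 41 for each adopted result.
PRECISION_LABELS = {
    "ngram_primary_and_secondary": "appropriate",  # result (a)
    "morpheme": "slightly coarse",                 # result (b)
    "ngram_primary_only": "rough",                 # result (c)
}


def precision_label(adopted: str) -> str:
    """Return the abstract accuracy term for the adopted search result."""
    return PRECISION_LABELS[adopted]
```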

[First Modification]
Next, a first modification of the above embodiment will be described with reference to the flowchart of FIG. 5. In FIG. 5, parts equivalent to those in the flowchart of FIG. 2, which shows the procedure of the high-speed search process, are given the same reference numerals.

The first modification is characterized in that the processing of steps S1 and S2 (the primary search using the N-gram index) and the processing of steps S3 to S5 (the search using the morpheme index) are executed in the reverse order from the flowchart of FIG. 2. Here, when the search condition cannot be divided into morpheme-searchable words (step S4), the primary search using the N-gram index, which corresponds to step S1, is executed (step S11).

[Second Modification]
Next, a second modification of the above embodiment will be described with reference to the flowchart of FIG. 6. In FIG. 6, parts equivalent to those in the flowchart of FIG. 2 are given the same reference numerals.

The second modification is characterized in that the processing of steps S1 and S2 (the primary search using the N-gram index) and the processing of steps S3 to S5 (the search using the morpheme index) are executed in parallel. Because the two searches run in parallel, an even faster search becomes possible.
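One way to picture the parallel execution of the second modification is with a thread pool from Python's standard library. This is a sketch under assumptions: the two callables stand in for the N-gram primary search and for the morphological analysis plus morpheme search, and the description does not say which concurrency mechanism the system actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_in_parallel(
    query: str,
    primary_search: Callable[[str], List[str]],
    morpheme_pipeline: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Run both searches concurrently and return (primary, morpheme) results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary_future = pool.submit(primary_search, query)
        morpheme_future = pool.submit(morpheme_pipeline, query)
        # result() blocks until each search finishes and re-raises its errors.
        return primary_future.result(), morpheme_future.result()
```

Once both results are available, the hit counts N1 and N2 can be compared exactly as in the sequential flow.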

The present invention is not limited to the above embodiment as it stands. At the implementation stage, it can be embodied with the constituent elements modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from the full set of constituent elements shown in the embodiment.

FIG. 1 is a block diagram showing the configuration of a full-text search system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the procedure of full-text search processing in the high-speed search mode (high-speed search processing) in the embodiment.
FIG. 3 is a diagram showing an example of the search interface screen.
FIG. 4 is a diagram showing an example of the search result screen.
FIG. 5 is a flowchart showing the procedure of the high-speed search processing in the first modification of the embodiment.
FIG. 6 is a flowchart showing the procedure of the high-speed search processing in the second modification of the embodiment.

Explanation of symbols

11: user interface, 12: search execution/response server, 13: N-gram search engine (first search means), 14: N-gram index database, 15: morpheme analysis mechanism, 16: morpheme search engine (second search means), 17: morpheme index database, 18: full-text search execution control mechanism, 19: setting information file, 33: search accuracy selection area, 34: tuning parameter area, 131: primary search execution unit, 132: secondary search execution unit, 181: morpheme analysis result determination unit, 182: primary search result number determination unit, 183: search result number approximation degree determination unit, 331: "high-speed" selection button, 332: "standard" selection button, 341: hit count field, 342: approximate ratio field, 343: adoption condition field.

Claims

A full-text search system comprising first search means for executing a search based on an N-gram index as a primary search and a secondary search on the primary search result, morpheme analysis means for morphologically analyzing a search condition sentence, and second search means for executing a morpheme search using a morpheme index based on the morphological analysis result of the morpheme analysis means, the system further comprising:
approximation degree determination means for determining the degree of approximation between the number of hits of the primary search using the N-gram index and the number of hits of the morpheme search using the morpheme index; and
full-text search execution control means for, when the approximation degree determination means determines that the number of hits of the primary search using the N-gram index approximates the number of hits of the morpheme search using the morpheme index, controlling the first search means so that the secondary search using the N-gram index is omitted, and adopting the result of the primary search or the result of the morpheme search as the search result.

The full-text search system according to claim 1, further comprising morpheme analysis result determination means for determining, based on the morphological analysis result of the morpheme analysis means, whether the search condition sentence can be divided into morpheme-searchable words,
wherein the full-text search execution control means controls the second search means so that the morpheme search using the morpheme index is executed only when the morpheme analysis result determination means determines that the search condition sentence has been divided into morpheme-searchable words.

The full-text search system according to claim 1, wherein the full-text search execution control means determines which of the result of the primary search and the result of the morpheme search is adopted as the search result, based on the number of words into which the search condition sentence is divided, as indicated by the analysis result of the morpheme analysis means, and on a reference number of words.

The full-text search system according to claim 3, further comprising a user interface for enabling a user to designate the reference number of words.

The full-text search system according to claim 1, further comprising primary search result number determination means for determining whether the number of hits of the primary search is large or small by comparing the number of hits of the primary search using the N-gram index with a reference hit count,
wherein the full-text search execution control means, when the primary search result number determination means determines that the number of hits of the primary search is small, controls the first search means so that the secondary search using the N-gram index is executed, and adopts the result of the secondary search as the search result.

The full-text search system according to claim 5, further comprising a user interface for enabling a user to designate the reference hit count.

The full-text search system according to claim 1, further comprising a user interface for enabling a user to designate the degree of approximation that serves as the criterion for determination by the approximation degree determination means.

The full-text search system according to claim 1, further comprising a user interface for enabling a user to designate either a standard search, in which the secondary search using the N-gram index is performed in every case, or a high-speed search, in which the secondary search using the N-gram index may be omitted based on the determination result of the approximation degree determination means.

The full-text search execution control means controls the first search means and the second search means so that a primary search using the N-gram index and a morpheme search using the morpheme index are executed in parallel. The full-text search system according to claim 1.

A full-text search method applied to a system comprising first search means for executing a search based on an N-gram index as a primary search and a secondary search on the primary search result, morpheme analysis means for morphologically analyzing a search condition sentence, and second search means for executing a morpheme search using a morpheme index based on the morphological analysis result of the morpheme analysis means, the method comprising:
determining the degree of approximation between the number of hits of the primary search using the N-gram index and the number of hits of the morpheme search using the morpheme index;
controlling the first search means so that the secondary search using the N-gram index is omitted when it is determined that the number of hits of the primary search using the N-gram index approximates the number of hits of the morpheme search using the morpheme index; and
adopting the result of the primary search or the result of the morpheme search as the search result when the secondary search using the N-gram index is omitted.