JP6668855B2

JP6668855B2 - Search device, search method and program

Info

Publication number: JP6668855B2
Application number: JP2016052786A
Authority: JP
Inventor: Katsuhiko Sato (佐藤 勝彦)
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2016-03-16
Filing date: 2016-03-16
Publication date: 2020-03-18
Anticipated expiration: 2036-03-16
Also published as: JP2017167837A

Description

The present invention relates to a search device, a search method, and a program.

As the volume of electronic documents grows, technology for efficiently retrieving a desired document from a large-scale document database has become increasingly important.

For example, Patent Documents 1 and 2 disclose search techniques that use N-grams. Specifically, Patent Document 1 discloses a text search device that classifies the text of the dictionary data to be searched into a headword category and a body category, performs a prefix-match search on the text belonging to the headword category, and performs a partial-match search on the text belonging to both the headword category and the body category. Patent Document 2 discloses a fuzzy search technique that can find a desired document even if the search string is entered slightly incorrectly.

Patent Document 1: JP 2013-161371 A
Patent Document 2: JP 2014-146301 A

In N-gram-based searches of electronic documents such as those described above, it is desirable to improve search efficiency and shorten search time.

The present invention has been made to solve the above problem, and an object thereof is to provide a search device, a search method, and a program capable of shortening the search time in N-gram-based searches of electronic documents.

To achieve the above object, a search device according to the present invention comprises:
storage means for storing a first index that associates a plurality of N-grams contained in the text belonging to a first category of an electronic document to be searched with the appearance positions of those N-grams in the text belonging to the first category, and a second index that associates a plurality of N-grams contained in the text belonging to a second category of the electronic document with the appearance positions of those N-grams in the text belonging to the second category;
acquisition means for acquiring a search keyword;
first search means for searching the text belonging to the first category for the search keyword by reading, from the first index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions;
second search means for searching the electronic document for the search keyword by reading, from the first index and the second index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions; and
output means for outputting a search result obtained by at least one of the first search means and the second search means.

According to the present invention, the search time can be shortened in N-gram-based searches of electronic documents.

FIG. 1 is a diagram showing the appearance of a search device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of the search device.
FIG. 3 is a block diagram showing the functional configuration of the search device.
FIG. 4 is a diagram showing dictionary data and relocation data.
FIG. 5 is a first diagram showing the configuration of a search index.
FIG. 6 is a second diagram showing the configuration of the search index.
FIG. 7 is a diagram showing an example of an incremental search.
FIG. 8 is a diagram showing an example of a full-text search result display screen.
FIG. 9 is a diagram showing an example of a fuzzy search result display screen.
FIG. 10 is a flowchart showing the flow of search index generation processing.
FIG. 11 is a flowchart showing the flow of incremental search processing.
FIG. 12 is a first flowchart showing the flow of full-text search processing.
FIG. 13 is a second flowchart showing the flow of full-text search processing.
FIG. 14 is a third flowchart showing the flow of full-text search processing.
FIG. 15 is a fourth flowchart showing the flow of full-text search processing.
FIG. 16 is a fifth flowchart showing the flow of full-text search processing.
FIG. 17 is a flowchart showing the flow of fuzzy search processing.

Embodiments of the present invention are described below with reference to the drawings. In the drawings, identical or corresponding parts are denoted by the same reference numerals.

FIG. 1 shows the appearance of a search device 100 according to an embodiment of the present invention. The search device 100 is, for example, an electronic dictionary. As shown in FIG. 1, the search device 100 includes a keyboard 100i that accepts input of a search keyword in response to user operations, and an LCD (Liquid Crystal Display) 100h that displays the results of searching a dictionary based on the search keyword.

As shown in FIG. 2, the search device 100 internally includes a CPU (Central Processing Unit) 100a, a ROM (Read Only Memory) 100b, a RAM (Random Access Memory) 100c, a hard disk 100d, a media controller 100e, a video card 100g, and a speaker 100j. These components are connected to the LCD 100h and the keyboard 100i via a bus.

The CPU 100a is, for example, a microprocessor, and is a central processing unit that executes various processes and computations. The CPU 100a may also be referred to as a central processing device, a central arithmetic device, or a processor. The CPU 100a is connected to each unit of the search device 100 via a system bus, which is a transmission path for transferring instructions and data, and performs overall control of the search device 100. By executing programs stored in the ROM 100b or the hard disk 100d, the CPU 100a performs various processes, including the search processes described below. The RAM 100c serves as a work area, for example temporarily storing the data to be processed while the CPU 100a executes a program.

The hard disk 100d is a non-volatile storage unit that stores tables holding various data, as well as dictionary data such as an English-Japanese dictionary, a Japanese-language dictionary, or an encyclopedia. The search device 100 may include flash memory instead of the hard disk 100d.

The media controller 100e reads various data and programs from recording media, including flash memory, CDs (Compact Discs), DVDs (Digital Versatile Discs), and Blu-ray Discs (registered trademark).

The video card 100g draws (that is, renders) an image based on the digital signal output from the CPU 100a and outputs an image signal representing the rendered image. The LCD 100h is a display device that displays an image according to the image signal output from the video card 100g. The search device 100 may include a PDP (Plasma Display Panel) or an EL (Electroluminescence) display instead of the LCD 100h. The speaker 100j outputs sound based on a signal output from the CPU 100a.

As shown in FIG. 3, the search device 100 functionally includes a generation unit 120, an acquisition unit 130, a search unit 140, an evaluation unit 150, and an output unit 160. The CPU 100a functions as each of these units by loading a program stored in the ROM 100b into the RAM 100c and executing it. The search device 100 also functionally includes a data storage unit 110 and a display unit 170. The data storage unit 110 is built in a storage area of the RAM 100c or the hard disk 100d. The display unit 170 is a display device such as the LCD 100h.

The data storage unit 110 stores dictionary data, which is an example of an electronic document to be searched. As shown in FIG. 4, the dictionary data includes text representing headwords (hereinafter "headword text"), text explaining the headwords (hereinafter "explanation text"), and text representing usage examples of the headwords, such as example sentences, set phrases, and compound words (hereinafter collectively "idioms") (hereinafter "example text").

Hereinafter, the explanation text and the example text are collectively referred to as body text. In other words, the text in the dictionary data is classified into text belonging to a headword category (headword text) and text belonging to a body category (body text). The headword category is called the first category, and the body category is called the second category. Within the dictionary data, the portion occupied by headword text is called the heading part, the portion occupied by explanation text is called the explanation part, and the portion occupied by example text is called the example part.

The dictionary data consists of a sequence of structural units, each made up of a heading part, an explanation part, and an example part. If the dictionary data is an English-Japanese dictionary, for example, the units are arranged in alphabetical order of their headword text. Within each unit, the explanation part describing the headword is placed immediately after the heading part, and the example part showing usage examples of the headword is placed immediately after that explanation part.

In the dictionary data, multiple explanation parts are arranged in an order determined by the editor of the electronic dictionary. Specifically, an explanation part describing a more general meaning of the headword is stored before an explanation part describing a more specialized meaning. Alternatively, an explanation part describing a more frequently used meaning may be stored after an explanation part describing a less frequently used meaning. As for the example parts, in some cases the example part containing the examples for a given explanation part follows each corresponding explanation part, and in other cases several example parts are grouped together after a single explanation part.

The generation unit 120 generates a search index that associates a plurality of N-grams contained in the dictionary data, which is the electronic document to be searched, with the appearance positions of those N-grams in the dictionary data. An N-gram is a string of N consecutive characters (N is a natural number). For example, an N-gram with N = 1 is called a monogram, one with N = 2 a bigram, and one with N = 3 a trigram. The generation unit 120 is implemented by the CPU 100a cooperating with the RAM 100c, the hard disk 100d, or the like, and functions as generation means. By generating the search index, the generation unit 120 also functions as index acquisition means for acquiring the search index.
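As a minimal sketch (Python, with function names of our own choosing), the N-grams of a string can be enumerated by sliding a window of N characters one position at a time; the index described here uses only N = 1 and N = 2:

```python
def ngrams(text: str, n: int) -> list[tuple[str, int]]:
    """Return (n-gram, start position) pairs, sliding one character at a time."""
    return [(text[i:i + n], i) for i in range(len(text) - n + 1)]


def monograms_and_bigrams(text: str) -> list[tuple[str, int]]:
    """Monograms (N = 1) followed by bigrams (N = 2), the two sizes the index uses."""
    return ngrams(text, 1) + ngrams(text, 2)


print(ngrams("while", 2))  # [('wh', 0), ('hi', 1), ('il', 2), ('le', 3)]
```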

To generate the search index, the generation unit 120 includes the functions of a relocation data generation unit 121 and a search index generation unit 122.

The relocation data generation unit 121 reads the dictionary data stored in the data storage unit 110 and generates relocation data by rearranging the text in the dictionary data. Specifically, as shown on the left side of FIG. 4, the dictionary data interleaves explanation parts and example parts for each headword. The relocation data generation unit 121 reorders the interleaved explanation parts and example parts so that each kind is grouped together, producing the relocation data shown on the right side of FIG. 4.
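The relocation step can be sketched as follows. The `Entry` type, the category keys (borrowed from the `hdl`/`bdy`/`exp` suffixes of the index file names), and the newline separator appended after every text are all assumptions made for illustration; the source does not say how adjacent texts are delimited. The hypothetical `relocate` function returns one stream per category plus the start position of each entry's text within that stream:

```python
from dataclasses import dataclass

SEP = "\n"  # assumed separator so that no n-gram straddles two texts
CATEGORIES = ("hdl", "bdy", "exp")  # heading, explanation, example


@dataclass
class Entry:
    headword: str
    # Explanation ("bdy") and example ("exp") texts in their original interleaved order.
    parts: list[tuple[str, str]]


def relocate(entries: list[Entry]) -> tuple[dict[str, str], dict[str, list[int]]]:
    """Group all texts of each category into one stream and record where each
    entry's text starts in that stream (cf. the start positions in number.idx)."""
    chunks: dict[str, list[str]] = {c: [] for c in CATEGORIES}
    starts: dict[str, list[int]] = {c: [] for c in CATEGORIES}
    lengths = {c: 0 for c in CATEGORIES}
    for entry in entries:
        texts = {"hdl": entry.headword + SEP}
        for cat in ("bdy", "exp"):
            # Collect this entry's parts of one kind, preserving their relative order.
            texts[cat] = "".join(t + SEP for kind, t in entry.parts if kind == cat)
        for cat in CATEGORIES:
            starts[cat].append(lengths[cat])
            chunks[cat].append(texts[cat])
            lengths[cat] += len(texts[cat])
    return {c: "".join(chunks[c]) for c in CATEGORIES}, starts
```

Because each stream holds only one category, a position in it counts characters of that category alone, which matches how the index positions are defined further on.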

The search index generation unit 122 generates a search index from the relocation data generated by the relocation data generation unit 121. Specifically, the search index generation unit 122 extracts the N-grams contained in the relocation data in order and generates, as the search index, a table that associates each extracted N-gram with its appearance positions. The search index generation unit 122 uses monograms and bigrams as the N-grams.

FIGS. 5 and 6 show the structure of the search index. As shown in FIG. 5, the search index includes a heading part index, an explanation part index, and an example part index.

The heading part index associates a plurality of N-grams (specifically, monograms and bigrams) contained in the headword text of the dictionary data with the appearance positions of those N-grams in the headword text. The heading part index includes (1) a file (pattern_hdl.idx) holding the character-string patterns of the N-grams contained in the heading parts, and (2) a file (position_hdl.idx) holding their appearance positions and appearance frequencies.

The explanation part index associates a plurality of N-grams contained in the explanation text of the dictionary data with the appearance positions of those N-grams in the explanation text. The explanation part index includes (3) a file (pattern_bdy.idx) holding the character-string patterns of the N-grams contained in the explanation parts, and (4) a file (position_bdy.idx) holding their appearance positions and appearance frequencies.

The example part index associates a plurality of N-grams contained in the example text of the dictionary data with the appearance positions of those N-grams in the example text. The example part index includes (5) a file (pattern_exp.idx) holding the character-string patterns of the N-grams contained in the example parts, and (6) a file (position_exp.idx) holding their appearance positions and appearance frequencies.

The search index generation unit 122 extracts monograms and bigrams separately from the headword text, the explanation text, and the example text, shifting one character at a time from the beginning, and derives the appearance positions and appearance frequencies of the extracted monograms and bigrams. The search index generation unit 122 then generates these three indexes by associating the extracted monograms and bigrams with their appearance positions and appearance frequencies.
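A minimal sketch of the per-category index, assuming the relocated text of each category is available as a single string: each monogram and bigram maps to the ascending list of positions where it occurs, so its appearance frequency is simply the length of that list. The function names are hypothetical, and splitting the result into separate pattern and position files, as pattern_*.idx and position_*.idx do, is omitted:

```python
from collections import defaultdict


def build_index(stream: str, max_n: int = 2) -> dict[str, list[int]]:
    """Map every monogram and bigram of `stream` to its ascending start positions.
    The appearance frequency of an n-gram is len(index[gram])."""
    index: dict[str, list[int]] = defaultdict(list)
    for n in range(1, max_n + 1):
        for i in range(len(stream) - n + 1):
            index[stream[i:i + n]].append(i)
    return dict(index)


def build_indexes(streams: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """One independent index per category; positions count only that category's text."""
    return {cat: build_index(text) for cat, text in streams.items()}


heading_index = build_index("salad")
print(heading_index["a"], len(heading_index["a"]))  # [1, 3] 2
```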

The heading part index is referred to as the first index, and the explanation part index and the example part index together are referred to as the second index.

As shown in FIG. 6, the search index also includes (7) a file relating to heading numbers (number.idx), (8) a file relating to the start positions of the heading parts (headline.idx), and (9) a file relating to the start positions of the example parts (example.idx). For each headword, the heading number file (number.idx) contains the start position of the heading part, the number of headwords, the start position of the explanation part, the start position of the example part, the number of examples, the dictionary number, and the heading number. The dictionary number identifies the dictionary data. The heading number identifies each of the heading parts.

The appearance positions of the N-gram patterns in the heading part index, and the start positions of the heading parts in the heading number file, are derived by counting characters of the headword text only (heading part 1 to heading part M in FIG. 4), starting from the first character of the heading parts in the relocation data. Likewise, the appearance positions of the N-gram patterns in the explanation part index, and the start positions of the explanation parts in the heading number file, are derived by counting characters of the explanation text only (explanation part 1 to explanation part M in FIG. 4), starting from the first character of the explanation parts in the relocation data. Likewise, the appearance positions of the N-gram patterns in the example part index, and the start positions of the example parts in the heading number file, are derived by counting characters of the example text only (example part 1 to example part M in FIG. 4), starting from the first character of the example parts in the relocation data.
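Because positions are counted within each category separately, a hit found in, say, the explanation part index still has to be mapped back to the headword it belongs to. One plausible way to do this, assuming the per-category start positions from number.idx are available as an ascending list (the actual file layout is not given here), is a binary search; `heading_number_for` is a hypothetical name:

```python
from bisect import bisect_right


def heading_number_for(starts: list[int], pos: int) -> int:
    """Return the 0-based heading number whose text, in one category stream,
    contains character position `pos`.

    `starts` holds the ascending start position of each entry's text in that
    stream (heading, explanation, or example). An entry with empty text shares
    its start with the next entry; bisect_right skips past such entries."""
    if not starts or pos < starts[0]:
        raise ValueError("position lies before the first entry")
    return bisect_right(starts, pos) - 1


explanation_starts = [0, 6, 15]
print(heading_number_for(explanation_starts, 7))  # 1
```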

Returning to the functional configuration of the search device 100 shown in FIG. 3, the data storage unit 110 stores the search index generated by the generation unit 120. The data storage unit 110 functions as storage means.

The acquisition unit 130 acquires a search keyword. The user can enter a desired search keyword into the search device 100 by operating the keyboard 100i. In addition, when a plurality of electronic documents are registered in the search device 100 in advance, the user can operate the keyboard 100i to designate, from among them, the electronic document to be searched. The acquisition unit 130 acquires the search keyword and the document designation entered in this way. The acquisition unit 130 is implemented by the CPU 100a cooperating with the keyboard 100i and the like, and functions as acquisition means (search keyword acquisition means).

The search unit 140 searches the dictionary data, which is the electronic document to be searched, for the search keyword acquired by the acquisition unit 130. Specifically, the search unit 140 reads from the search index the appearance positions, in the dictionary data, of a plurality of N-grams contained in the search keyword, and searches the dictionary data for the search keyword by evaluating the continuity of the read appearance positions. The search unit 140 is implemented by the CPU 100a cooperating with the RAM 100c and the like.

More specifically, the search unit 140 includes the functions of an incremental search unit 141, a full-text search unit 142, and a fuzzy search unit 143, and can therefore execute these three types of search.

The incremental search unit 141 performs an incremental search on the heading parts. An incremental search is a prefix search executed each time the user enters one more character of the search keyword. Each time the acquisition unit 130 acquires one character of the search keyword, the incremental search unit 141 searches using, as the search keyword, the string formed by appending the newly acquired character to the character or characters already acquired by the acquisition unit 130.

An example of an incremental search is described with reference to FIG. 7. As the user enters the search keyword one character at a time via the keyboard 100i, the acquisition unit 130 acquires it one character at a time. When the user enters the first character of the search keyword ("w" in the example of FIG. 7), the incremental search unit 141 refers to the heading part index and searches the target dictionary data for headwords beginning with "w". As a result, a list of headwords beginning with "w" is generated, as shown on the left side of FIG. 7.

When the next character ("h" in the example of FIG. 7) is entered, the incremental search unit 141 appends the newly acquired "h" to the already acquired "w" and, referring to the heading part index, searches for headwords beginning with the string "wh". As a result, a list of headwords beginning with "wh" is generated, as shown in the center of FIG. 7. When the next character ("i" in the example of FIG. 7) is entered, the incremental search unit 141 appends the newly acquired "i" to the already acquired "wh" and, again referring to the heading part index, searches for headwords beginning with "whi". As a result, a list of headwords beginning with "whi" is generated, as shown on the right side of FIG. 7.

In this way, each time the user enters another character of the search keyword, the incremental search unit 141 refers to the heading part index and performs a prefix search that finds, in the headword text belonging to the headword category, strings beginning with the search keyword.
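The prefix search that is re-run on every keystroke can be sketched with the heading part index alone. This is an illustrative reconstruction, not the patented procedure: it assumes a monogram/bigram index over headwords separated by newlines, treats a one-character keyword as a monogram lookup, and accepts a match only if it begins exactly at a headword's start position. All names are hypothetical:

```python
from collections import defaultdict


def build_index(stream: str, max_n: int = 2) -> dict[str, list[int]]:
    """Monogram/bigram -> ascending start positions within `stream`."""
    index: dict[str, list[int]] = defaultdict(list)
    for n in range(1, max_n + 1):
        for i in range(len(stream) - n + 1):
            index[stream[i:i + n]].append(i)
    return dict(index)


def prefix_search(keyword: str, heading_index: dict[str, list[int]],
                  heading_starts: list[int]) -> list[int]:
    """Return the numbers of headwords that begin with `keyword`."""
    if not keyword:
        return []
    if len(keyword) == 1:
        hits = set(heading_index.get(keyword, []))
    else:
        grams = [keyword[i:i + 2] for i in range(len(keyword) - 1)]
        hits = set(heading_index.get(grams[0], []))
        for i, gram in enumerate(grams[1:], start=1):
            # Bigram i must appear exactly i characters after the candidate start.
            hits &= {p - i for p in heading_index.get(gram, [])}
    # A prefix match must begin exactly where some headword begins.
    number_at = {start: k for k, start in enumerate(heading_starts)}
    return sorted(number_at[p] for p in hits if p in number_at)


headwords = ["wait", "what", "while", "whisk", "white"]
stream = "".join(h + "\n" for h in headwords)  # "\n" separator is an assumption
starts = [sum(len(h) + 1 for h in headwords[:k]) for k in range(len(headwords))]
index = build_index(stream)
for typed in ("w", "wh", "whi", "whil"):  # one call per keystroke
    print(typed, [headwords[k] for k in prefix_search(typed, index, starts)])
```

Each keystroke narrows the candidate list in the same way as the three panels of FIG. 7; only the heading part index is ever read.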

The full-text search unit 142 performs a full-text search over the entire dictionary data. A full-text search is an exact-match search executed after the search keyword has been finalized, for example when the user presses the Return key. Once the search keyword is finalized, the full-text search unit 142 refers to the heading part index, the explanation part index, and the example part index, and searches the entire dictionary data for strings containing the search keyword.

For example, when the string "while" is acquired as the search keyword, the full-text search unit 142 cuts out two characters at a time from the beginning to the end of "while", extracting the four bigrams "wh", "hi", "il", and "le". The full-text search unit 142 then reads the appearance positions of each extracted bigram from the heading part index, the explanation part index, and the example part index stored in the data storage unit 110, and evaluates their continuity. Specifically, it determines whether the four bigrams "wh", "hi", "il", and "le" appear in this order at positions each shifted by one character. If the appearance positions of the bigrams are found to be continuous, the full-text search unit 142 determines that a string matching the search keyword exists at those positions.
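The continuity check described for "while" can be sketched as an intersection of shifted position sets: a start position p survives only if the i-th keyword bigram occurs at p + i. The helper names and the toy category streams below are hypothetical; running the same check against each category index yields one hit list per category:

```python
from collections import defaultdict


def build_index(stream: str, max_n: int = 2) -> dict[str, list[int]]:
    """Monogram/bigram -> ascending start positions within `stream`."""
    index: dict[str, list[int]] = defaultdict(list)
    for n in range(1, max_n + 1):
        for i in range(len(stream) - n + 1):
            index[stream[i:i + n]].append(i)
    return dict(index)


def find_keyword(keyword: str, index: dict[str, list[int]]) -> list[int]:
    """Start positions where `keyword` occurs, found by bigram-position continuity."""
    if len(keyword) < 2:
        return sorted(index.get(keyword, []))
    grams = [keyword[i:i + 2] for i in range(len(keyword) - 1)]
    candidates = set(index.get(grams[0], []))
    for i, gram in enumerate(grams[1:], start=1):
        # Keep p only if bigram i appears exactly i characters after p.
        candidates &= {p - i for p in index.get(gram, [])}
        if not candidates:
            break
    return sorted(candidates)


def full_text_search(keyword: str,
                     indexes: dict[str, dict[str, list[int]]]) -> dict[str, list[int]]:
    """Run the continuity check against every category index."""
    return {category: find_keyword(keyword, idx) for category, idx in indexes.items()}


streams = {
    "hdl": "while\nwhite\n",
    "bdy": "during the time that\n",
    "exp": "once in a while\nwhile away\n",
}
indexes = {c: build_index(s) for c, s in streams.items()}
print(full_text_search("while", indexes))
```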

FIG. 8 shows an example of a full-text search for the search keyword "while". As shown in FIG. 8, the full-text search unit 142 performs an exact-match search for the string "while" in each of the heading parts, the explanation parts, and the example parts. As a result, text containing the search keyword ("while" in the example of FIG. 8) is found in each of them.

The fuzzy search unit 143 performs a fuzzy search on the heading parts. A fuzzy search is a partial-match search that can find strings that partially, but not completely, match the search keyword. The fuzzy search unit 143 executes a fuzzy search when the user operates the keyboard 100i to enter a search keyword and selects the fuzzy search mode. Because the details of the fuzzy search are disclosed in Patent Document 2 above, only a brief description is given here.

The fuzzy search unit 143 reads from the heading part index the appearance positions, in the text belonging to the headword category, of a plurality of N-grams contained in the search keyword, and searches the headword text for the search keyword based on how many of the read appearance positions consecutively match the positions of those N-grams in the search keyword (the number of consecutively matching N-grams). The number of consecutively matching N-grams is the number of N-grams extracted from the search keyword whose appearance positions in the headword text consecutively match the positions of the corresponding N-grams in the search keyword. The fuzzy search unit 143 calculates this number and rates a search result more highly (that is, as more likely to be what the user wants) the larger the number is.
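The consecutive-match count can be illustrated with a simplified sketch. It is not the method of Patent Document 2, whose details are not reproduced here. It groups keyword-bigram hits by headword and by alignment (hit position minus the bigram's offset in the keyword), then scores each headword by its longest run of consecutively matching bigrams. The function name and sample headwords are hypothetical:

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(stream: str, max_n: int = 2) -> dict[str, list[int]]:
    """Monogram/bigram -> ascending start positions within `stream`."""
    index: dict[str, list[int]] = defaultdict(list)
    for n in range(1, max_n + 1):
        for i in range(len(stream) - n + 1):
            index[stream[i:i + n]].append(i)
    return dict(index)


def fuzzy_search(keyword: str, heading_index: dict[str, list[int]],
                 heading_starts: list[int], top: int = 5) -> list[tuple[int, int]]:
    """Rank headwords by their longest run of consecutively matching keyword bigrams."""
    grams = [keyword[i:i + 2] for i in range(len(keyword) - 1)]
    # (headword number, alignment) -> indices of keyword bigrams found at that alignment
    matched: dict[tuple[int, int], set[int]] = defaultdict(set)
    for i, gram in enumerate(grams):
        for p in heading_index.get(gram, []):
            number = bisect_right(heading_starts, p) - 1
            matched[(number, p - i)].add(i)
    best: dict[int, int] = defaultdict(int)
    for (number, _alignment), found in matched.items():
        run = longest = 0
        for i in range(len(grams)):
            run = run + 1 if i in found else 0
            longest = max(longest, run)
        best[number] = max(best[number], longest)
    # Highest score first; ties broken by heading number.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


headwords = ["salad", "salary", "sale", "sled"]
stream = "".join(h + "\n" for h in headwords)
starts = [sum(len(h) + 1 for h in headwords[:k]) for k in range(len(headwords))]
for number, score in fuzzy_search("salaed", build_index(stream), starts):
    print(headwords[number], score)
```

For "salaed", both "salad" and "salary" score 3 (the bigrams "sa", "al", and "la" line up), so a practical implementation would need additional criteria to break such ties.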

FIG. 9 shows an example of a fuzzy search for the search keyword "saled". As shown in FIG. 9, the fuzzy search unit 143 performs a fuzzy search to find, among the headwords, character strings close to "saled". As a result, a headword such as "salad" is found.

The incremental search unit 141 and the fuzzy search unit 143 both search only the text belonging to the headword category (first category) of the dictionary data. In other words, each of them functions as a first search means that searches for the search keyword in text belonging to the first category. The full-text search unit 142, by contrast, searches the entire dictionary data, that is, both the headword category (first category) and the body category (second category). In other words, the full-text search unit 142 functions as a second search means that searches for the search keyword in text belonging to the first category and text belonging to the second category.

The evaluation unit 150 calculates a ranking evaluation value for each search result obtained by the search unit 140. The ranking evaluation value indicates the priority with which a search result is output. The evaluation unit 150 is implemented by the CPU 100a operating in cooperation with the RAM 100c and other components, and functions as evaluation means.

The evaluation unit 150 applies several ranking rules:

- Position in the part: a result ranks higher the closer the keyword's appearance position is to the start of the heading, description, or example part.
- Position among headwords: when one heading part contains several headwords, a result ranks higher the closer the matched headword is to the start. The same applies to the description part and the example part.
- Fuzzy search: a result ranks higher the more characters it shares with the search keyword.
- AND search within a full-text search: a result ranks higher the closer together the two search keywords appear.

The output unit 160 outputs the search results obtained by the search unit 140. Specifically, when the incremental search unit 141, the full-text search unit 142, or the fuzzy search unit 143 executes a search, the output unit 160 sorts the appearance positions of the search keyword found by that search in descending order of the ranking evaluation values calculated by the evaluation unit 150. The output unit 160 then displays information indicating the sorted appearance positions on the display unit 170 as the search result. Alternatively, the output unit 160 may output the search result as speech from the speaker 100j. The output unit 160 is implemented by the CPU 100a operating in cooperation with the video card 100g, the LCD 100h, the speaker 100j, and the like, and functions as output means.
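The ranking criteria above, together with the descending sort performed by the output unit 160, can be sketched in Python. The text does not say how the criteria are weighted or combined. The `Hit` fields and the numeric weights below are therefore hypothetical and chosen only to illustrate the direction of each rule.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hit:
    label: str
    offset_in_part: int             # characters from the start of the part
    headword_rank: int = 0          # 0 = first headword in the heading part
    matched_chars: int = 0          # fuzzy search only
    and_gap: Optional[int] = None   # AND search only: distance between keywords

def ranking_value(hit: Hit) -> float:
    """Hypothetical ranking evaluation value; larger means higher priority."""
    value = -float(hit.offset_in_part)     # closer to the start ranks higher
    value -= 10.0 * hit.headword_rank      # earlier headword ranks higher
    value += 100.0 * hit.matched_chars     # more matching characters rank higher
    if hit.and_gap is not None:
        value -= float(hit.and_gap)        # keywords closer together rank higher
    return value

def sort_for_output(hits):
    """Sort hits in descending order of ranking evaluation value."""
    return sorted(hits, key=ranking_value, reverse=True)
```

Because the rules are combined into a single scalar, one `sorted` call is enough to produce the display order. A real implementation might instead compare the criteria lexicographically.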

More specifically, when an incremental search is performed, the output unit 160 displays a list of headwords on the display unit 170, for example as shown in FIG. 7, each time the user enters another character of the search keyword. When a full-text search is performed, the output unit 160 displays the search results on the display unit 170 divided into the heading part, the description part, and the example part, for example as shown in FIG. 8. When a fuzzy search is performed, the output unit 160 displays the fuzzy search results on the display unit 170, for example as shown in FIG. 9.

The flow of the processing executed by the search device 100 configured as described above is explained below with reference to the flowcharts of FIGS. 10 to 17.

First, the flow of the search index generation process executed by the generation unit 120 is described with reference to the flowchart of FIG. 10. The search index generation process consists of two stages: generating the relocation data and generating the search index itself. The process shown in FIG. 10 starts when, for example, the user operates the keyboard 100i to enter an instruction to generate a search index.

When the search index generation process starts, the relocation data generation unit 121 reads the dictionary data stored in the data storage unit 110 (step S1). The relocation data generation unit 121 then extracts the headword text, description text, and example text from that dictionary data (step S2).

The relocation data generation unit 121 adds a delimiter at the beginning and at the end of each extracted headword text (step S3). A delimiter is a character that marks the boundary of a headword. The delimiter is a character that does not appear in the headword text, such as a space or a special character.

After adding the delimiters to the headword text, the relocation data generation unit 121 generates relocation data in which the description text and example text are grouped together for each headword, as shown on the right side of FIG. 4. It then stores this data in the data storage unit 110 (step S4).
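Steps S3 and S4 can be sketched in Python, with some assumptions. Each part is taken to be one concatenated string, and "・" stands in for the delimiter. The per-headword start offsets are recorded so that a position can later be mapped back to its headword. The exact layout of FIG. 4 is not given in the text, so `Entry` and `relocate` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

DELIM = "\u30fb"  # "・", assumed never to occur inside a headword

class Entry(NamedTuple):
    headword: str
    description: str
    example: str

def relocate(entries: List[Entry]) -> Tuple[str, str, str, List[Tuple[int, int, int]]]:
    """Build the heading, description and example parts as single strings and
    record where each headword's text starts in each part."""
    heading, description, example = [], [], []
    starts = []
    h_len = d_len = e_len = 0
    for e in entries:
        starts.append((h_len, d_len, e_len))
        wrapped = DELIM + e.headword + DELIM   # step S3: mark headword boundaries
        heading.append(wrapped)
        description.append(e.description)
        example.append(e.example)
        h_len += len(wrapped)
        d_len += len(e.description)
        e_len += len(e.example)
    return "".join(heading), "".join(description), "".join(example), starts
```

For the entries "while" and "whilst", the heading part becomes "・while・・whilst・". The second headword starts at offset 7.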

Next, the search index generation unit 122 generates a search index from the relocation data produced by the relocation data generation unit 121 (step S5). Specifically, it processes the heading part, the description part, and the example part of the relocation data in turn. For each part, it extracts monograms and bigrams while shifting one character at a time from the beginning, and derives the appearance positions and appearance frequency of each one. It then associates each extracted monogram and bigram with its appearance positions and appearance frequency, which produces a search index such as that shown in FIGS. 5 and 6. The index is stored in the data storage unit 110. This completes the search index generation process of FIG. 10.
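Step S5 can be sketched as an in-memory inverted index for one part. The sketch assumes a dictionary mapping each gram to its frequency and its list of positions. The file layout of pattern_hdl.idx and position_hdl.idx in FIG. 5 is not reproduced here, and `build_ngram_index` is a hypothetical name.

```python
from collections import defaultdict

def build_ngram_index(part):
    """Return {gram: (frequency, [positions])} for every monogram and bigram,
    scanning the part one character at a time from the beginning."""
    positions = defaultdict(list)
    for i in range(len(part)):
        positions[part[i]].append(i)               # monogram at i
        if i + 1 < len(part):
            positions[part[i:i + 2]].append(i)     # bigram starting at i
    return {gram: (len(pos), pos) for gram, pos in positions.items()}
```

Storing the frequency next to the position list is what later lets the AND search pick its rarest search string cheaply, without walking the position lists.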

Next, the flow of the search processing executed by the search unit 140 is described with reference to the flowcharts of FIGS. 11 to 17. As described above, the search unit 140 executes three kinds of search processing: an incremental search process, a full-text search process, and a fuzzy search process.

First, the flow of the incremental search process is described with reference to FIG. 11. The user operates the keyboard 100i to specify a target dictionary and then types the search keyword. Each time the user enters another character, the incremental search unit 141 starts the incremental search process shown in FIG. 11. The following description uses the example of the character string "whi" entered as the search keyword.

When the incremental search process starts, the incremental search unit 141 generates a search character string from the input character string entered as the search keyword (step S11). Specifically, it adds a delimiter to the beginning of the input character string. For example, if the delimiter is written as "・", the incremental search unit 141 adds "・" to the beginning of the search keyword "whi" to generate the search character string "・whi".

After generating the search character string, the incremental search unit 141 extracts bigrams from it (step S12). Specifically, it cuts out two characters at a time, shifting one character at a time from the beginning to the end of "・whi". This yields the three bigrams "・w", "wh", and "hi". The incremental search unit 141 then sets the search character string as the reference character string (step S13).
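Steps S11 and S12 amount to a prefix delimiter plus a sliding two-character window. A minimal Python sketch follows; "・" stands in for whatever delimiter the device uses, and both function names are hypothetical.

```python
DELIM = "\u30fb"  # "・", standing in for the delimiter added in step S3

def incremental_search_string(typed):
    """Step S11: prefix the delimiter so matches must start at a headword boundary."""
    return DELIM + typed

def bigrams(s):
    """Step S12: slide a two-character window from the start to the end of s."""
    return [s[i:i + 2] for i in range(len(s) - 1)]
```

The leading delimiter is what turns a substring search into a prefix search. "・wh" can only match at the start of a headword, so "whi" finds "while" and "whilst" but not "awhile".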

After extracting the bigrams, the incremental search unit 141 refers to the heading-part index ((1) pattern_hdl.idx and (2) position_hdl.idx shown in FIG. 5). It reads the appearance positions of each bigram extracted from the reference character string and evaluates their continuity (step S14). Specifically, it determines whether the three extracted bigrams "・w", "wh", and "hi" appear in the headword text in this order, each shifted one character from the previous one.

If the evaluation finds no continuity (step S15; NO), the incremental search unit 141 returns to step S14 and evaluates the continuity of the other appearance positions read from the heading-part index in the same way.

If continuity is found (step S15; YES), the incremental search unit 141 takes the appearance position of the first bigram as the appearance position of the reference character string (step S16). It then consults two search-index files shown in FIG. 6: (7) number.idx, which relates to headword numbers, and (8) headline.idx, which holds the start positions of the heading parts. From these files and the appearance position of the reference character string, it derives a dictionary number and a headword number as the search result (step S17). The evaluation unit 150 then calculates a ranking evaluation value for the search result (step S18).

Next, the incremental search unit 141 determines whether any appearance position remains unevaluated (step S19). If one does (step S19; YES), the process returns to step S14. If none remains (step S19; NO), the output unit 160 sorts the search results by their ranking evaluation values (step S20). It then outputs them to the display unit 170 in units of headwords (step S21), for example by displaying the screen shown in FIG. 7. This completes the incremental search process.
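The continuity check of steps S14 to S16 and the position-to-headword lookup of step S17 can be sketched as follows. The sketch makes two assumptions. The index is a dictionary from bigram to occurrence positions in the heading part. The headword lookup is a binary search over the ascending start offsets of the headword blocks, a stand-in for number.idx and headline.idx, whose formats the text does not give. Both functions are hypothetical.

```python
from bisect import bisect_right

def continuous_starts(grams, index):
    """Steps S14-S16: positions p of grams[0] such that grams[k] occurs at p + k
    for every k. index maps a bigram to its occurrence positions."""
    if not grams:
        return []
    later = [set(index.get(g, ())) for g in grams[1:]]
    return [p for p in index.get(grams[0], ())
            if all(p + k in occ for k, occ in enumerate(later, start=1))]

def headword_number(position, headword_starts):
    """Step S17 (sketch): 0-based number of the headword block that contains
    position, given the ascending start offsets of the blocks."""
    return bisect_right(headword_starts, position) - 1
```

With the heading part "・while・・whilst・", the bigrams of "・whi" line up at positions 0 and 7. These map to headwords 0 ("while") and 1 ("whilst").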

Second, the flow of the full-text search process is described with reference to the flowcharts of FIGS. 12 to 16. When the user operates the keyboard 100i to specify a target dictionary and confirms the search keyword, the full-text search unit 142 starts the full-text search process shown in FIGS. 12 to 16.

When the full-text search process starts, the full-text search unit 142 generates a search character string from the input character string entered as the search keyword (step S31). Specifically, it adds delimiters to both ends of the input character string. For example, when "while" is entered as the search keyword, the full-text search unit 142 adds "・" to its beginning and end to generate the search character string "・while・".

After generating the search character string, the full-text search unit 142 determines whether there is exactly one search character string (step S32). In other words, it determines whether the current search is an AND search using multiple search keywords. If there is one search character string (step S32; YES), the process proceeds to step S33. If there are several (step S32; NO), it proceeds to step S35.

When there is one search character string (step S32; YES), the full-text search unit 142 extracts monograms and bigrams from it (step S33). The extraction depends on the length of the search character string:

- Three characters (・ + a one-character input string + ・): the unit extracts one monogram, corresponding to the input character, and the two bigrams that include the delimiters at either end.
- Four or more characters (・ + an input string of two or more characters + ・): the unit extracts bigrams in order from the beginning to the end. For example, "・while・" yields the six bigrams "・w", "wh", "hi", "il", "le", and "e・".
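Steps S31 and S33 can be sketched in Python as follows. "・" stands in for the delimiter, and the function names are hypothetical.

```python
DELIM = "\u30fb"  # "・", standing in for the delimiter

def full_text_search_string(keyword):
    """Step S31: wrap the keyword in delimiters at both ends."""
    return DELIM + keyword + DELIM

def extract_grams(search):
    """Step S33: return (monograms, bigrams) for a delimited search string."""
    if len(search) == 3:                      # one-character keyword
        return [search[1]], [search[0:2], search[1:3]]
    return [], [search[i:i + 2] for i in range(len(search) - 1)]
```

A one-character keyword has no bigram free of delimiters. That is why its position must come from the monogram index instead.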

Next, the full-text search unit 142 sets the search character string as the reference character string (step S34). It then evaluates the continuity of the extracted bigrams one category at a time, in the order heading part, description part, example part.

Turning to FIG. 13, the full-text search unit 142 determines whether the reference character string is three characters long (step S51). If it is (step S51; YES), the unit refers to the heading-part index ((1) pattern_hdl.idx and (2) position_hdl.idx shown in FIG. 5). It reads the appearance positions of the single monogram extracted from the reference character string and identifies each one as an appearance position of the reference character string (step S52).

After identifying the appearance position of the reference character string, the full-text search unit 142 derives a search state (step S53). The search state is information indicating whether the search keyword is the headword itself. Specifically, the full-text search unit 142 reads, from the heading-part index, the appearance positions of the two bigrams that contain the delimiters and evaluates their continuity. If they are continuous, it derives a search state indicating that the search keyword is the headword itself.

If step S51 instead determines that the reference character string is four or more characters long (step S51; NO), the full-text search unit 142 refers to the heading-part index. It reads the appearance positions of those bigrams of the reference character string that contain no delimiter, and evaluates their continuity (step S54). If the evaluation finds no continuity (step S55; NO), the full-text search unit 142 returns to step S54 and evaluates the continuity of the other appearance positions read from the heading-part index in the same way.

If continuity is found (step S55; YES), the full-text search unit 142 takes the appearance position of the second bigram as the appearance position of the reference character string (step S56). The second bigram is the first one without a delimiter, so its position is that of the keyword's first character. The unit then derives the search state (step S57). It reads, from the heading-part index, the appearance positions of the two bigrams that contain the delimiters and evaluates their continuity.
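Steps S54 to S57 can be sketched in Python for a delimited search string of four or more characters, as follows:

1. Only the delimiter-free bigrams are checked for continuity.
2. The second bigram's position is reported.
3. The two delimiter bigrams then decide whether the hit is the whole headword.

The sketch assumes an in-memory map from bigram to position set. The function names are hypothetical.

```python
def bigram_sets(text):
    """Map each bigram of text to the set of positions where it starts."""
    occ = {}
    for i in range(len(text) - 1):
        occ.setdefault(text[i:i + 2], set()).add(i)
    return occ

def locate(search, occ):
    """Steps S54-S57 (sketch) for a delimited search string of 4+ characters.
    Returns (position, is_headword) pairs; position is that of the second
    bigram, i.e. of the keyword's first character."""
    if len(search) < 4:
        raise ValueError("expects a delimited keyword of two or more characters")
    grams = [search[i:i + 2] for i in range(len(search) - 1)]
    inner = grams[1:-1]                 # bigrams that contain no delimiter
    lead, tail = grams[0], grams[-1]    # bigrams that contain a delimiter
    hits = []
    for p in sorted(occ.get(inner[0], ())):
        # Continuity: the k-th delimiter-free bigram must start at p + k.
        if all(p + k in occ.get(g, ()) for k, g in enumerate(inner)):
            # Search state: delimiters immediately before and after the keyword.
            is_headword = (p - 1 in occ.get(lead, ())
                           and p + len(inner) in occ.get(tail, ()))
            hits.append((p, is_headword))
    return hits
```

Leaving the delimiter bigrams out of the continuity test lets the same search find "while" inside "whilst" or inside running description text. The search state then separates those hits from an exact headword match.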

After deriving the search state in step S53 or step S57, the full-text search unit 142 consults two search-index files: (7) number.idx, which relates to headword numbers, and (8) headline.idx, which holds the start positions of the heading parts. From these files and the appearance position of the reference character string, it derives a dictionary number, a headword number, and an example number as the search result (step S58). The evaluation unit 150 then calculates a ranking evaluation value for the search result (step S59).

Next, the full-text search unit 142 determines whether any appearance position remains unevaluated (step S60). If one does (step S60; YES), the process returns to step S51. If none remains (step S60; NO), the unit determines whether any category remains unevaluated (step S61). If one does (step S61; YES), it returns to step S51 and repeats the same processing for the remaining categories, in the order description part, example part.

When no category remains unevaluated (step S61; NO), the output unit 160 sorts the search results by their ranking evaluation values (step S62). It then outputs them to the display unit 170 (step S63). As shown in FIG. 8, the results from the heading part, description part, and example part are displayed in units of headwords, description parts, and example parts, respectively. This completes the full-text search process for a single search character string.

Returning to FIG. 12, if step S32 determines that there are several search character strings (step S32; NO), the full-text search unit 142 extracts monograms and bigrams from each of them (step S35). In other words, it performs the same extraction as in step S33 on each search character string.

Next, the full-text search unit 142 refers to the heading-part, description-part, and example-part indexes and obtains the appearance frequency of each extracted monogram and bigram (step S36). It then identifies the lowest appearance frequency among the monograms and bigrams that make up each search character string (step S37).

After identifying the lowest appearance frequencies, the full-text search unit 142 selects a reference character string and verification character strings from among the search character strings (step S38). Specifically, it chooses as the reference character string the one that contains the monogram or bigram with the lowest appearance frequency. It sets at least one of the remaining search character strings as a verification character string. The reason is efficiency. In an AND search, starting from the least frequent search character string produces the fewest candidate positions to check against the verification character strings. This reduces the processing load and increases search speed. Alternatively, when the search takes input order into account, the full-text search unit 142 may choose the first-entered search character string as the reference character string and the later ones as verification character strings.
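Steps S36 to S38 can be sketched as picking the search string whose rarest gram is least frequent. The sketch assumes that `freq` is a plain dictionary of gram frequencies summed over the three part indexes. A gram missing from the index counts as frequency 0, so a string that cannot match at all is chosen as reference and fails fast. The function names are hypothetical.

```python
def grams_of(search):
    """Monograms and bigrams of a delimited search string (the step S33 rule)."""
    if len(search) == 3:
        return [search[1], search[0:2], search[1:3]]
    return [search[i:i + 2] for i in range(len(search) - 1)]

def choose_reference(search_strings, freq):
    """Steps S36-S38: the string whose rarest gram is least frequent becomes
    the reference; all the others become verification strings."""
    def rarest(i):
        return min(freq.get(g, 0) for g in grams_of(search_strings[i]))
    ref = min(range(len(search_strings)), key=rarest)
    verify = [s for i, s in enumerate(search_strings) if i != ref]
    return search_strings[ref], verify
```

Ties go to the earlier-entered string, because `min` returns the first minimum. This happens to match the input-order alternative mentioned above.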

The full-text search unit 142 then evaluates the continuity of the extracted bigrams one category at a time, in the order heading part, description part, example part.

Turning to FIG. 14, the full-text search unit 142 determines whether the reference character string is three characters long (step S71). It then identifies the appearance position of the reference character string and derives its search state (steps S71 to S77). Steps S71 to S77 are the same as steps S51 to S57 described for a single search character string, so their description is omitted.

After identifying the appearance position of the reference character string and deriving its search state, the full-text search unit 142 consults the search-index files (7) number.idx, which relates to headword numbers, and (8) headline.idx, which holds the start positions of the heading parts. From these, it derives a dictionary number, a headword number, and an example number for the appearance position of the reference character string as a search result candidate (step S78).

Having finished evaluating the reference character string, the full-text search unit 142 proceeds to FIG. 15 and evaluates the verification character strings. In FIG. 15, it first determines whether the verification character string is three characters long (step S81).

If the verification character string is three characters long (step S81; YES), the full-text search unit 142 reads, from the heading-part index, the appearance positions of the single extracted monogram. It identifies each one as an appearance position of the verification character string (step S82). It then derives the search state by reading the appearance positions of the two delimiter-containing bigrams from the heading-part index and evaluating their continuity (step S83).

After deriving the search state, the full-text search unit 142 evaluates whether the verification character string lies within the range specified by the dictionary number, headword number, and example number of the search result candidate derived in step S78 (step S84). This range is the span within which the reference character string and the verification character string together count as an AND-search hit. For example, the AND search can be set to hit when both strings occur in the same headword unit of the same dictionary data, including its description part and example part. Alternatively, it can be set to hit only when both occur in the same example part.

If the appearance position of the verification character string is below the minimum position of the specified range (step S85; NO), the full-text search unit 142 returns to step S82, that is, it moves on to the next appearance position. If the appearance position is at or above the minimum position (step S85; YES), the unit evaluates whether it is at or below the maximum position of the specified range (step S86). If it is above the maximum position (step S86; NO), the unit proceeds to step S103 in FIG. 16. If it is at or below the maximum position (step S86; YES), the unit proceeds to step S94.
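The minimum/maximum test of steps S85 and S86 (and S89 and S90) can be sketched as finding the first verification position that falls inside the candidate's range. The text describes a linear scan that skips positions below the minimum. The sketch swaps in a binary search over a sorted position list, which gives the same result. It also assumes that positions and range bounds share one coordinate space. `verify_in_range` is a hypothetical name.

```python
from bisect import bisect_left

def verify_in_range(verify_positions, lo, hi):
    """Return the first position in the sorted list that lies in [lo, hi],
    or None if the verification string does not occur in that range."""
    i = bisect_left(verify_positions, lo)        # skip positions below lo (S85: NO)
    if i < len(verify_positions) and verify_positions[i] <= hi:   # S86: YES
        return verify_positions[i]
    return None                                  # above hi or none left (S86: NO)
```

Because the position lists are sorted, once a position exceeds the maximum, no later position can fall inside the range. This is why the flow can leave the loop at that point instead of scanning on.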

If step S81 determines that the verification character string is four or more characters long (step S81; NO), the full-text search unit 142 reads, from the heading-part index, the appearance positions of the second bigram extracted from the verification character string. It identifies each one as an appearance position of the verification character string (step S87).

After identifying the appearance position of the verification character string, the full-text search unit 142 evaluates whether the verification character string lies within the range specified by the dictionary number, headword number, and example number of the search result candidate derived in step S78 (step S88).

If the appearance position of the verification character string is below the minimum position of the specified range (step S89; NO), the full-text search unit 142 returns to step S87. If it is at or above the minimum position (step S89; YES), the unit evaluates whether it is at or below the maximum position of the specified range (step S90). If it is above the maximum position (step S90; NO), the unit proceeds to step S103 in FIG. 16.

If the appearance position is at or below the maximum position (step S90; YES), the full-text search unit 142 reads, from the heading-part index, the appearance positions of those bigrams extracted from the verification character string that contain no delimiter, and evaluates their continuity (step S91). If the positions are found not to be continuous (step S92; NO), the full-text search unit 142 returns to step S87 and repeats the same processing for the next appearance position of the second bigram.

If the positions are found to be continuous (step S92; YES), the full-text search unit 142 reads, from the heading-part index, the appearance positions of the bigrams that contain a delimiter and evaluates their continuity, thereby deriving a verification state (step S93). The full-text search unit 142 then proceeds to step S94.
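Steps S87 to S93 can be sketched in Python as follows. The sketch rests on several assumptions that the text does not state. First, each verification character string carries a "・" delimiter at both ends, so a string of four or more characters has at least one delimiter-free bigram, and its second bigram is the first such one. Second, positions are character offsets into a single text. Third, the "verification state" is a pair of flags recording whether each delimiter bigram also matched. All names are hypothetical.

```python
from collections import defaultdict

DELIM = "・"  # assumed word delimiter at both ends of each string


def bigram_index(text: str) -> dict[str, set[int]]:
    """Map every bigram of `text` to the set of its start positions."""
    index: dict[str, set[int]] = defaultdict(set)
    for p in range(len(text) - 1):
        index[text[p:p + 2]].add(p)
    return index


def verify_at(index: dict[str, set[int]], vstring: str, anchor: int):
    """Check one candidate occurrence of a delimited verification string.

    `vstring` should have 4 or more characters; `anchor` is an appearance
    position of its 2nd bigram (S87). Returns None if the delimiter-free
    bigrams are not contiguous (S92: NO); otherwise returns a hypothetical
    verification state (left_edge_matched, right_edge_matched) (S93).
    """
    bigrams = [vstring[i:i + 2] for i in range(len(vstring) - 1)]
    start = anchor - 1  # where the 1st bigram would have to start

    def at(i: int) -> bool:
        # Does bigram i occur exactly i positions after `start`?
        return start + i in index.get(bigrams[i], set())

    # S91/S92: every bigram without a delimiter must line up
    if not all(at(i) for i, b in enumerate(bigrams) if DELIM not in b):
        return None
    # S93: the delimiter bigrams show whether both word edges also match
    return at(0), at(len(bigrams) - 1)


text = "".join(DELIM + w + DELIM for w in ["green", "salad", "salads"])
idx = bigram_index(text)
for anchor in sorted(idx["sa"]):  # S87: positions of the 2nd bigram
    print(anchor, verify_at(idx, "・salad・", anchor))
# 8 (True, True)
# 15 (True, False)
```

In this example, the occurrence inside "salads" passes the delimiter-free check but not the right-edge check. This is the kind of distinction a verification state could record.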

In step S94, the full-text search unit 142 determines whether all verification character strings have been evaluated. If not (step S94; NO), the process returns to step S81, and the same processing is executed for the remaining verification character strings.

If all verification character strings have been evaluated (step S94; YES), the full-text search unit 142 proceeds to FIG. 16 and takes the search result candidates obtained by the above processing as the results of the AND search (step S101). In other words, the full-text search unit 142 takes the appearance position of the one reference character string and the appearance position of at least one verification character string, both lying within the specified range, as the positions hit by the AND search. The evaluation unit 150 then calculates a ranking evaluation value for the AND search results (step S102).
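The AND-search decision of step S101 can be sketched as a range test over sorted position lists. The sketch assumes that inclusive integer ranges stand in for the spans specified by dictionary, heading, and example numbers. It also interprets "at least one verification character string" as requiring every verification string to occur at least once in the range. The names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def has_hit(positions: list[int], lo: int, hi: int) -> bool:
    """True if the ascending list `positions` has an entry in [lo, hi]."""
    return bisect_right(positions, hi) > bisect_left(positions, lo)


def and_hits(ranges, ref_positions, ver_positions_list):
    """Ranges containing the reference string and, by assumption,
    at least one occurrence of every verification string."""
    return [
        (lo, hi)
        for lo, hi in ranges
        if has_hit(ref_positions, lo, hi)
        and all(has_hit(v, lo, hi) for v in ver_positions_list)
    ]


# Two candidate ranges; the second lacks the second verification string
print(and_hits([(0, 9), (10, 19)], [2, 12], [[5, 14], [3]]))  # [(0, 9)]
```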

Next, the full-text search unit 142 determines whether any appearance position remains unevaluated (step S103). If so (step S103; YES), the process returns to step S71 shown in FIG. 14. If no unevaluated appearance position remains (step S103; NO), the full-text search unit 142 determines whether any category remains unevaluated (step S104). If so (step S104; YES), the full-text search unit 142 returns to step S71 and executes the same processing for the unevaluated categories, in the order of the explanation part and then the example part.

Finally, when no unevaluated category remains (step S104; NO), the output unit 160 sorts the search results within each category based on the ranking evaluation values (step S105) and outputs them to the display unit 170 (step S106). As shown in FIG. 8, the output unit 160 displays the search results from the heading part, the explanation part, and the example part on the display unit 170 in units of headwords, explanation parts, and example parts, respectively. This completes the full-text search processing for the case of multiple search character strings.
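Step S105 sorts results within each category. The minimal sketch below assumes that each result is a (category, item, score) tuple and that a higher ranking evaluation value ranks first. The category labels are hypothetical and are emitted in the order heading, explanation, example.

```python
from collections import defaultdict

# Hypothetical category labels, in the order the categories are processed
CATEGORY_ORDER = ("headword", "explanation", "example")


def sort_by_category(results):
    """Group (category, item, score) tuples by category, then sort each
    group by ranking evaluation value, highest first (S105)."""
    groups = defaultdict(list)
    for category, item, score in results:
        groups[category].append((score, item))
    return {
        cat: [item for _, item in sorted(groups[cat], key=lambda r: r[0], reverse=True)]
        for cat in CATEGORY_ORDER
        if cat in groups
    }


print(sort_by_category([
    ("example", "ex1", 0.2), ("headword", "salad", 0.9),
    ("example", "ex2", 0.7), ("explanation", "exp1", 0.5),
]))
# {'headword': ['salad'], 'explanation': ['exp1'], 'example': ['ex2', 'ex1']}
```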

Third, the flow of the fuzzy search processing will be described with reference to the flowchart shown in FIG. 17. When the user operates the keyboard 100i to enter a search keyword and selects the fuzzy search mode, the fuzzy search unit 143 starts the fuzzy search processing shown in FIG. 17.

When the fuzzy search processing starts, the fuzzy search unit 143 generates a search character string from the character string entered as the search keyword (step S121). Specifically, the fuzzy search unit 143 generates the search character string by adding a delimiter to both ends of the input character string. For example, when the character string “salaed” is entered as the search keyword, the fuzzy search unit 143 adds “・” to its beginning and end to generate the search character string “・salaed・”.

After generating the search character string, the fuzzy search unit 143 extracts monograms and bigrams from it (step S122). Specifically, if the search character string has M characters, the fuzzy search unit 143 extracts from it two bigrams, each containing one of the delimiters at the two ends, and (M − 2) monograms that contain no delimiter. For example, from the search character string “・salaed・” (here M = 8), the two bigrams “・s” and “d・” at the ends and the six monograms “s”, “a”, “l”, “a”, “e”, and “d” in the middle are extracted.
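Steps S121 and S122 amount to a few string operations. The following is a minimal Python sketch with hypothetical function names.

```python
DELIM = "・"  # delimiter added to both ends of the input string


def make_search_string(keyword: str) -> str:
    """S121: add a delimiter to both ends of the input string."""
    return DELIM + keyword + DELIM


def extract_grams(search: str) -> tuple[list[str], list[str]]:
    """S122: for an M-character delimited string, return the 2 edge bigrams
    and the M - 2 monograms that contain no delimiter."""
    bigrams = [search[:2], search[-2:]]
    monograms = list(search[1:-1])
    return bigrams, monograms


s = make_search_string("salaed")
print(s)                 # ・salaed・
print(extract_grams(s))  # (['・s', 'd・'], ['s', 'a', 'l', 'a', 'e', 'd'])
```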

Next, the fuzzy search unit 143 sets the search character string as the reference character string (step S123). Having set the reference character string, the fuzzy search unit 143 moves on to evaluating the continuity of the extracted monograms and bigrams for each headword contained in the dictionary data.

First, the fuzzy search unit 143 selects the first of the headwords contained in the dictionary data and derives the appearance range of that headword (step S124). Specifically, the fuzzy search unit 143 refers to the search-index file for heading numbers and the file for the start positions of heading parts ((7) number.idx and (8) headline.idx shown in FIG. 6). From these, it identifies the range of the dictionary data in which the headword appears.
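Deriving a headword's appearance range (step S124) reduces to looking up consecutive start offsets. The sketch below assumes that the start positions of all heading parts are available as an ascending list of character offsets. This list stands in for whatever number.idx and headline.idx actually contain, which this section does not describe. The function names are hypothetical.

```python
from bisect import bisect_right


def headword_range(starts: list[int], total_len: int, k: int) -> tuple[int, int]:
    """Half-open range [start, end) of the k-th headword text, given the
    ascending start offsets of all headwords and the total text length."""
    end = starts[k + 1] if k + 1 < len(starts) else total_len
    return starts[k], end


def headword_of(starts: list[int], pos: int) -> int:
    """Index of the headword whose range contains text position `pos`."""
    return bisect_right(starts, pos) - 1


print(headword_range([0, 7, 15], 22, 1))  # (7, 15)
print(headword_of([0, 7, 15], 10))       # 1
```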

After deriving the appearance range of the headword, the fuzzy search unit 143 refers to the heading-part index ((1) pattern_hdl.idx and (2) position_hdl.idx shown in FIG. 5) and reads the appearance positions of each monogram and bigram extracted from the reference character string (step S125).

Having read the appearance positions, the fuzzy search unit 143 calculates the number of consecutively matching N-grams (step S126). The calculation is based on the appearance positions of the monograms and bigrams that lie within the appearance range of the headword text derived in step S124. Specifically, when the search character string is “・salaed・”, for example, the fuzzy search unit 143 examines the eight monograms and bigrams that make up this string: “・s”, “s”, “a”, “l”, “a”, “e”, “d”, and “d・”. It determines whether they appear in the headword text in this order, each offset by one character from the previous one. The fuzzy search unit 143 then counts the monograms and bigrams whose appearance positions are consecutive and takes that count as the number of consecutively matching N-grams.
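One way to realize the consecutive-match count of step S126 is sketched below. The patent does not spell out its position convention. This sketch therefore assumes that each gram is tagged with its start offset in the delimited search string, which gives “d” and “d・” the same offset. It tries every alignment of the search string against the headword text and records the length of each run of grams that match in order. For brevity, the index is built directly from one headword's text. All names are hypothetical.

```python
from collections import defaultdict

DELIM = "・"


def gram_index(text: str) -> dict[str, set[int]]:
    """Monogram and bigram index: gram -> start positions in `text`."""
    index: dict[str, set[int]] = defaultdict(set)
    for n in (1, 2):
        for p in range(len(text) - n + 1):
            index[text[p:p + n]].add(p)
    return index


def query_grams(keyword: str) -> list[tuple[str, int]]:
    """Grams of the delimited keyword, each with its start offset in it."""
    s = DELIM + keyword + DELIM
    inner = [(s[i], i) for i in range(1, len(s) - 1)]
    return [(s[:2], 0)] + inner + [(s[-2:], len(s) - 2)]


def consecutive_runs(keyword: str, headword_text: str) -> list[int]:
    """Lengths of the runs of grams that match, in order, at each alignment."""
    index = gram_index(headword_text)
    grams = query_grams(keyword)
    max_off = max(off for _, off in grams)
    runs: list[int] = []
    for align in range(-max_off, len(headword_text)):
        run = 0
        for gram, off in grams:
            if align + off in index.get(gram, set()):
                run += 1          # gram found where this alignment expects it
            else:
                if run:
                    runs.append(run)
                run = 0
        if run:
            runs.append(run)
    return runs


print(consecutive_runs("salaed", DELIM + "salad" + DELIM))  # [1, 2, 5, 1]
```

For the misspelled keyword "salaed" against the headword "salad", the sketch finds a run of 5 ("・s" through the second "a") before the typo and a run of 2 ("d", "d・") after it, plus two isolated single-letter matches.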

After calculating the numbers of consecutively matching N-grams, the fuzzy search unit 143 generates a frequency distribution of those numbers (step S127). It then calculates a fuzzy search evaluation value from the generated frequency distribution (step S128). The fuzzy search evaluation value can be calculated by the method disclosed in Patent Document 2 cited above. Specifically, referring to the generated frequency distribution, the fuzzy search unit 143 assigns a larger fuzzy search evaluation value the larger the number of consecutively matching N-grams. It likewise assigns a larger value the more frequently consecutively matching N-grams appear.
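Steps S127 and S128 can be sketched as a frequency count followed by a score. The actual formula is that of Patent Document 2 and is not reproduced here. The squared-length weighting below is a hypothetical stand-in, chosen only because it grows with both run length and run frequency, as the text requires.

```python
from collections import Counter


def run_distribution(runs: list[int]) -> Counter:
    """S127: frequency distribution {run length: number of runs}."""
    return Counter(runs)


def fuzzy_score(dist: Counter) -> int:
    """S128, hypothetical stand-in: weight each run by its squared length,
    so the score rises with longer runs and with more frequent runs."""
    return sum(length * length * count for length, count in dist.items())


dist = run_distribution([1, 2, 5, 1])  # example run lengths
print(sorted(dist.items()))  # [(1, 2), (2, 1), (5, 1)]
print(fuzzy_score(dist))     # 31
```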

Next, the evaluation unit 150 calculates a ranking evaluation value (step S129). The fuzzy search unit 143 then determines whether any headword remains unevaluated (step S130). If so (step S130; YES), the fuzzy search unit 143 returns to step S124 and executes the same processing for the unevaluated headword.

Finally, when no unevaluated headword remains (step S130; NO), the output unit 160 sorts the search results based on the fuzzy search evaluation values and the ranking evaluation values (step S131). The output unit 160 then outputs the search results to the display unit 170, for example as shown in FIG. 9 (step S132). This completes the fuzzy search processing.

As described above, the search device 100 according to the present embodiment generates a search index that associates the N-grams contained in the dictionary data, the electronic document to be searched, with their appearance positions. It generates this index separately for the text belonging to the headword category and the text belonging to the body category. For the incremental search and the fuzzy search, which target only the headword text in the dictionary data, the search device 100 searches for the search keyword by referring only to the heading-part index. For the full-text search, which targets the entire dictionary data, it searches for the search keyword by referring to the entire search index.
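The split described above can be sketched with one N-gram index per category. The first search consults only the headword index, while the second consults every index. This is a minimal illustration that assumes two hypothetical category names, plain Python dictionaries instead of the index files, and exact substring matching by bigram continuity.

```python
from collections import defaultdict


def gram_index(text: str) -> dict[str, set[int]]:
    """Monogram + bigram index for one category's text."""
    index: dict[str, set[int]] = defaultdict(set)
    for n in (1, 2):
        for p in range(len(text) - n + 1):
            index[text[p:p + n]].add(p)
    return index


def find_positions(keyword: str, index: dict[str, set[int]]) -> list[int]:
    """Start positions of `keyword`, found by bigram-position continuity."""
    if len(keyword) == 1:
        return sorted(index.get(keyword, set()))
    grams = [keyword[i:i + 2] for i in range(len(keyword) - 1)]
    return [p for p in sorted(index.get(grams[0], set()))
            if all(p + i in index.get(g, set()) for i, g in enumerate(grams))]


def first_search(keyword, indexes):
    """Headword-only search: touches the small headword index alone."""
    return {"headword": find_positions(keyword, indexes["headword"])}


def second_search(keyword, indexes):
    """Full-text search: consults the index of every category."""
    return {cat: find_positions(keyword, idx) for cat, idx in indexes.items()}


texts = {"headword": "・salad・・soup・", "body": "a green salad with soup"}
indexes = {cat: gram_index(t) for cat, t in texts.items()}
print(first_search("salad", indexes))  # {'headword': [1]}
print(second_search("soup", indexes))  # {'headword': [8], 'body': [19]}
```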

In other words, the search device 100 generates indexes by aggregating the appearance positions of the N-grams category by category. For the incremental search and the fuzzy search, it uses the smaller index generated for the headword category rather than an index covering the entire dictionary data. This improves the efficiency of the incremental and fuzzy searches and shortens search time, so that a desired electronic document can be found efficiently.

In particular, the incremental search is executed each time a character of the search keyword is entered. Shortening its search time therefore greatly reduces the processing time of the search process as a whole, including the full-text search. Likewise, because the fuzzy search takes a relatively long time to process, shortening its search time has a large effect.
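The per-keystroke behaviour of the incremental search can be sketched as a prefix search that is re-run on the headword index after each character is appended. The sketch assumes that headwords are stored with a "・" delimiter at both ends. Under that assumption, a prefix match is an occurrence of "・" plus the typed prefix starting exactly at a headword's start offset. The names are hypothetical.

```python
from collections import defaultdict

DELIM = "・"


def build_headword_index(words):
    """Bigram index of the delimited headword text, plus start offset -> word."""
    text, starts = "", {}
    for w in words:
        starts[len(text)] = w
        text += DELIM + w + DELIM
    index = defaultdict(set)
    for p in range(len(text) - 1):
        index[text[p:p + 2]].add(p)
    return index, starts


def prefix_search(prefix, index, starts):
    """Headwords starting with a non-empty `prefix`: the bigrams of
    DELIM + prefix must occur contiguously from a headword's start offset."""
    s = DELIM + prefix
    grams = [s[i:i + 2] for i in range(len(s) - 1)]
    return [starts[p] for p in sorted(index.get(grams[0], set()))
            if p in starts
            and all(p + i in index.get(g, set()) for i, g in enumerate(grams))]


index, starts = build_headword_index(["salad", "salt", "soup"])
typed = ""
for ch in "sal":  # one search per keystroke
    typed += ch
    print(typed, prefix_search(typed, index, starts))
# s ['salad', 'salt', 'soup']
# sa ['salad', 'salt']
# sal ['salad', 'salt']
```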

(Modifications)
An embodiment of the present invention has been described above. However, that embodiment is only an example, and the scope of the present invention is not limited to it. The present invention can be applied in various other forms, and all such embodiments fall within the scope of the present invention.

For example, in the embodiment described above, the search device 100 includes the generation unit 120 that generates the search index. However, in the present invention, the search device 100 need not have a function for generating the search index. The search device 100 can still execute the search processing described above if a search index generated in advance by an external device is acquired and stored in the data storage unit 110. For example, search indexes corresponding to each of the plurality of dictionary data installed in the search device 100 may be stored in the data storage unit 110 at the time of factory shipment.

In the embodiment described above, dictionary data has been used as an example of the electronic document to be searched. However, the electronic document to be searched is not limited to a dictionary and may be any document that contains text classified into multiple categories. For example, it may be a patent specification containing text classified into categories such as “Title of Invention” and “Claims”. Alternatively, it may be a manual with two categories. One is a category into which text representing the names of the functions of a product is classified (function name category). The other is a category into which text representing how to operate those functions is classified (operation method category).

In the embodiment described above, the search device 100 includes the incremental search unit 141 and the fuzzy search unit 143 as the first search means and the full-text search unit 142 as the second search means. However, in the present invention, the search device 100 may include only one of the incremental search and the fuzzy search as the first search means. Alternatively, the search device 100 may include other types of search functions, not limited to these, as the first or second search means. The specific methods of the incremental search (prefix match search), the full-text search (exact match search), and the fuzzy search (partial match search), and in particular their ranking methods, can be set as desired. Furthermore, although monograms and bigrams are used as the N-grams in the embodiment described above, other N-grams such as trigrams may be used.
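Replacing monograms and bigrams with trigrams only changes the gram length used to build and query the index. Below is a minimal sketch with a hypothetical function name. In general, a larger n gives each key a shorter position list (fewer candidates to verify) at the cost of more distinct keys.

```python
from collections import defaultdict


def ngram_index(text: str, n: int) -> dict[str, list[int]]:
    """Map each n-gram of `text` to its ascending start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for p in range(len(text) - n + 1):
        index[text[p:p + n]].append(p)
    return index


text = "banana bandana"
print(ngram_index(text, 2)["an"])   # [1, 3, 8, 11]
print(ngram_index(text, 3)["ana"])  # [1, 3, 11]
```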

In the embodiment described above, the CPU 100a of the search device 100 executes a program stored in the ROM 100b. In doing so, it functions as each of the generation unit 120, the acquisition unit 130, the search unit 140, the evaluation unit 150, and the output unit 160. However, in the present invention, the search device 100 may include dedicated hardware instead of the CPU 100a, such as an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or various control circuits. The dedicated hardware may then function as each of the generation unit 120, the acquisition unit 130, the search unit 140, the evaluation unit 150, and the output unit 160. In this case, the functions of the units may each be implemented by separate hardware, or they may be implemented collectively by a single piece of hardware. Some of the functions of the units may also be implemented by dedicated hardware and the others by software or firmware.

The present invention can be provided as a search device preconfigured to implement the functions according to the present invention. It can also be provided as a program that, when applied, causes an existing personal computer, information terminal device, or the like to function as the search device according to the present invention. That is, a text search program that implements each functional configuration of the search device 100 described in the above embodiment can be applied so that the CPU or the like controlling such a device can execute it. That device then functions as the search device 100 according to the present invention. The search method according to the present invention can be carried out using the search device 100.

Such a program may be applied by any method. For example, the program can be stored on and applied from a computer-readable recording medium such as a CD-ROM (Compact Disc Read-Only Memory), a DVD (Digital Versatile Disc), or an MO (Magneto-Optical disc). Alternatively, it can be stored in storage on a network such as the Internet and applied by being downloaded. It is also possible to store some or all of the data required for the above processing, such as the dictionary data, relocation data, and search index, on an external server. The device then acquires that data using a communication function and executes the above processing.

Preferred embodiments of the present invention have been described above. However, the present invention is not limited to these specific embodiments, and it includes the inventions described in the claims and their equivalents. The inventions described in the claims as originally filed in this application are appended below.

(Appendix 1)
A search device comprising:
storage means for storing a first index, which associates a plurality of N-grams contained in text belonging to a first category of an electronic document to be searched with the appearance positions of those N-grams in the text belonging to the first category, and a second index, which associates a plurality of N-grams contained in text belonging to a second category of the electronic document with the appearance positions of those N-grams in the text belonging to the second category;
acquisition means for acquiring a search keyword;
first search means for searching the text belonging to the first category for the search keyword by reading, from the first index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions;
second search means for searching the electronic document for the search keyword by reading, from the first index and the second index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions; and
output means for outputting a search result obtained by at least one of the first search means and the second search means.

(Appendix 2)
The search device according to Appendix 1, wherein:
the acquisition means acquires the search keyword one character at a time; and
each time the acquisition means acquires a character of the search keyword, the first search means performs a search using, as the search keyword, a character string obtained by appending the newly acquired character to the at least one character already acquired by the acquisition means.

(Appendix 3)
The search device according to Appendix 1 or 2, wherein
the first search means reads, from the first index, the appearance positions of a plurality of N-grams contained in the search keyword, and searches the text belonging to the first category for the search keyword based on the number of read appearance positions that consecutively match the positions of those N-grams in the search keyword.

(Appendix 4)
The search device according to any one of Appendices 1 to 3, wherein:
the first search means refers to the first index to search the text belonging to the first category for a character string beginning with the search keyword; and
the second search means refers to the first index and the second index to search the electronic document for a character string containing the search keyword.

(Appendix 5)
The search device according to any one of Appendices 1 to 4, further comprising
generation means for generating the first index by associating the plurality of N-grams contained in the text belonging to the first category with the appearance positions of those N-grams in the text belonging to the first category, and for generating the second index by associating the plurality of N-grams contained in the text belonging to the second category with the appearance positions of those N-grams in the text belonging to the second category,
wherein the storage means stores the first index and the second index generated by the generation means.

(Appendix 6)
The search device according to any one of Appendices 1 to 5, wherein:
the electronic document is dictionary data;
the text belonging to the first category is text representing headwords in the dictionary data; and
the text belonging to the second category is text representing explanations or usage examples of the headwords.

(Appendix 7)
A search method comprising:
an index acquisition step of acquiring a first index, which associates a plurality of N-grams contained in text belonging to a first category of an electronic document to be searched with the appearance positions of those N-grams in the text belonging to the first category, and a second index, which associates a plurality of N-grams contained in text belonging to a second category of the electronic document with the appearance positions of those N-grams in the text belonging to the second category;
a search keyword acquisition step of acquiring a search keyword;
a first search step of searching the text belonging to the first category for the search keyword by reading, from the first index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions;
a second search step of searching the electronic document for the search keyword by reading, from the first index and the second index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions; and
an output step of outputting a search result obtained by at least one of the first search step and the second search step.

(Appendix 8)
A program for causing a computer to function as:
index acquisition means for acquiring a first index, which associates a plurality of N-grams contained in text belonging to a first category of an electronic document to be searched with the appearance positions of those N-grams in the text belonging to the first category, and a second index, which associates a plurality of N-grams contained in text belonging to a second category of the electronic document with the appearance positions of those N-grams in the text belonging to the second category;
search keyword acquisition means for acquiring a search keyword;
first search means for searching the text belonging to the first category for the search keyword by reading, from the first index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions;
second search means for searching the electronic document for the search keyword by reading, from the first index and the second index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions; and
output means for outputting a search result obtained by at least one of the first search means and the second search means.

100: search device, 100a: CPU, 100b: ROM, 100c: RAM, 100d: hard disk, 100e: media controller, 100g: video card, 100h: LCD, 100i: keyboard, 100j: speaker, 110: data storage unit, 120: generation unit, 121: relocation data generation unit, 122: search index generation unit, 130: acquisition unit, 140: search unit, 141: incremental search unit, 142: full-text search unit, 143: fuzzy search unit, 150: evaluation unit, 160: output unit, 170: display unit

Claims

A search device comprising:
storage means for storing a first index, which associates a plurality of N-grams contained in text belonging to a first category of an electronic document to be searched with the appearance positions of those N-grams in the text belonging to the first category, and a second index, which associates a plurality of N-grams contained in text belonging to a second category of the electronic document with the appearance positions of those N-grams in the text belonging to the second category;
acquisition means for acquiring a search keyword;
first search means for searching the text belonging to the first category for the search keyword by reading, from the first index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions;
second search means for searching the electronic document for the search keyword by reading, from the first index and the second index, the appearance positions of a plurality of N-grams contained in the search keyword and evaluating the continuity of the read appearance positions; and
output means for outputting a search result obtained by at least one of the first search means and the second search means.

The search device according to claim 1, wherein:
the acquisition means acquires the search keyword one character at a time; and
each time the acquisition means acquires a character of the search keyword, the first search means performs a search using, as the search keyword, the character string formed by appending the newly acquired character to the at least one character already acquired by the acquisition means.
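The incremental search of claim 2 can be illustrated with a minimal Python sketch. It assumes bigrams (N = 2) and, as in claim 4, that the first search is a forward (prefix) match on headwords. To make one-character keywords searchable through a bigram index, it pads every headword with a hypothetical start marker "^"; this padding trick, and the names build_prefix_index, prefix_search and incremental_search, are illustrative assumptions and do not come from the patent.

```python
from collections import defaultdict

N = 2            # assumption: bigram index (the claims do not fix N)
SENTINEL = "^"   # hypothetical start-of-headword marker, not from the patent


def build_prefix_index(headwords, n=N):
    """Map each n-gram of SENTINEL + headword to {(headword_id, offset)}."""
    index = defaultdict(set)
    for hid, word in enumerate(headwords):
        padded = SENTINEL + word
        for i in range(len(padded) - n + 1):
            index[padded[i:i + n]].add((hid, i))
    return index


def prefix_search(index, keyword, n=N):
    """Return sorted ids of headwords that start with keyword."""
    padded = SENTINEL + keyword
    hits = None
    for k in range(len(padded) - n + 1):
        # A headword survives only if this n-gram sits at the same offset k
        # in its padded text, so all n-grams line up from position 0.
        ids = {hid for hid, off in index.get(padded[k:k + n], ()) if off == k}
        hits = ids if hits is None else hits & ids
        if not hits:
            return []
    return sorted(hits) if hits else []  # hits is None for an empty keyword


def incremental_search(index, keystrokes):
    """Re-run the prefix search each time one more character is acquired."""
    typed = ""
    results = []
    for ch in keystrokes:
        typed += ch  # append the new character to those already acquired
        results.append((typed, prefix_search(index, typed)))
    return results
```

For example, typing "app" one key at a time against the headwords "apple", "apply", "apricot" and "banana" narrows the candidates from ids [0, 1, 2] (after "a" and "ap") to [0, 1] (after "app").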

The search device according to claim 1, wherein the first search means reads, from the first index, the appearance positions of the plurality of N-grams included in the search keyword, and searches the text belonging to the first category for the search keyword on the basis of the number of those N-grams whose read appearance positions consecutively match their positions in the search keyword.
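One possible reading of the count-based search in claim 3 is sketched below in Python. It assumes bigrams and interprets "the number of N-grams whose positions consecutively match" as the longest unbroken run of keyword bigrams found at the offsets the keyword predicts within one headword. This interpretation, and the names build_index, longest_run and score_by_consecutive_matches, are assumptions rather than the patent's stated method.

```python
from collections import defaultdict

N = 2  # assumption: bigrams


def build_index(texts, n=N):
    """Map each n-gram to a list of (text_id, offset) appearance positions."""
    index = defaultdict(list)
    for tid, text in enumerate(texts):
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].append((tid, i))
    return index


def longest_run(ks):
    """Length of the longest run of consecutive integers in the set ks."""
    best = 0
    for k in ks:
        if k - 1 not in ks:  # k starts a run
            length = 1
            while k + length in ks:
                length += 1
            best = max(best, length)
    return best


def score_by_consecutive_matches(index, keyword, n=N):
    """Rank texts by how many keyword n-grams line up in unbroken succession."""
    grams = [keyword[k:k + n] for k in range(len(keyword) - n + 1)]
    # (text_id, aligned start) -> indices k of keyword n-grams found at start + k
    aligned = defaultdict(set)
    for k, gram in enumerate(grams):
        for tid, off in index.get(gram, ()):
            aligned[(tid, off - k)].add(k)
    best = defaultdict(int)
    for (tid, _start), ks in aligned.items():
        best[tid] = max(best[tid], longest_run(ks))
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

With the headwords "search", "research" and "seam", the keyword "searc" scores 4 for the first two and 2 for "seam", while the mistyped "sxarc" still scores 2 for "search" and "research" because "ar" and "rc" remain aligned. Any cut-off applied to these scores would be an additional design choice.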

The search device according to claim 1, wherein:
the first search means refers to the first index to search the text belonging to the first category for a character string that begins with the search keyword; and
the second search means refers to the first index and the second index to search the electronic document for a character string that contains the search keyword.
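The two search scopes of claims 1 and 4 can be sketched in Python as follows, assuming bigrams (N = 2) and integer text ids. Continuity is evaluated by requiring the k-th keyword bigram to occur exactly k characters after the candidate start. The names build_index, contiguous_hits, first_search and second_search, and the grouping of second-search results by category, are hypothetical and not taken from the patent.

```python
from collections import defaultdict

N = 2  # assumption: bigrams; the claims leave N open


def build_index(texts, n=N):
    """Map each n-gram to the set of (text_id, offset) positions where it appears."""
    index = defaultdict(set)
    for tid, text in enumerate(texts):
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].add((tid, i))
    return index


def contiguous_hits(index, keyword, n=N):
    """Return (text_id, start) pairs where the k-th keyword n-gram occurs at start + k."""
    grams = [keyword[k:k + n] for k in range(len(keyword) - n + 1)]
    if not grams:
        return set()  # keyword shorter than n: not handled in this sketch
    hits = set(index.get(grams[0], ()))
    for k in range(1, len(grams)):
        positions = index.get(grams[k], set())
        # keep a candidate only if the next n-gram continues it without a gap
        hits = {(tid, s) for tid, s in hits if (tid, s + k) in positions}
    return hits


def first_search(head_index, keyword):
    """Forward match over first-category text: keyword must start at offset 0."""
    return sorted({tid for tid, s in contiguous_hits(head_index, keyword) if s == 0})


def second_search(head_index, body_index, keyword):
    """Partial match over both categories; grouping results by category is an assumption."""
    return {
        "head": sorted({tid for tid, _ in contiguous_hits(head_index, keyword)}),
        "body": sorted({tid for tid, _ in contiguous_hits(body_index, keyword)}),
    }
```

For example, with the headwords "apple", "pineapple" and "grape", first_search for "apple" returns only the id of "apple", whereas second_search also finds the keyword inside "pineapple" and in any explanation text that contains it. In this sketch, keywords shorter than N characters yield no hits.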

The search device according to any one of claims 1 to 4, further comprising generation means for generating the first index by associating the plurality of N-grams included in the text belonging to the first category with the appearance positions of those N-grams in the text belonging to the first category, and for generating the second index by associating the plurality of N-grams included in the text belonging to the second category with the appearance positions of those N-grams in the text belonging to the second category,
wherein the storage means stores the first index and the second index generated by the generation means.

The search device according to any one of claims 1 to 5, wherein:
the electronic document is dictionary data;
the text belonging to the first category is text representing a headword in the dictionary data; and
the text belonging to the second category is text representing an explanation or an example of the headword.

A search method comprising the following steps, performed by a computer:
an index acquisition step of acquiring a first index, in which a plurality of N-grams included in text belonging to a first category of an electronic document to be searched are associated with the appearance positions of those N-grams in the text belonging to the first category, and a second index, in which a plurality of N-grams included in text belonging to a second category of the electronic document are associated with the appearance positions of those N-grams in the text belonging to the second category;
a search keyword acquisition step of acquiring a search keyword;
a first search step of searching the text belonging to the first category for the search keyword by reading, from the first index, the appearance positions of the plurality of N-grams included in the search keyword and evaluating the continuity of the read appearance positions;
a second search step of searching the electronic document for the search keyword by reading, from the first index and the second index, the appearance positions of the plurality of N-grams included in the search keyword and evaluating the continuity of the read appearance positions; and
an output step of outputting a search result obtained by at least one of the first search step and the second search step.

A program causing a computer to function as:
index acquisition means for acquiring a first index, in which a plurality of N-grams included in text belonging to a first category of an electronic document to be searched are associated with the appearance positions of those N-grams in the text belonging to the first category, and a second index, in which a plurality of N-grams included in text belonging to a second category of the electronic document are associated with the appearance positions of those N-grams in the text belonging to the second category;
search keyword acquisition means for acquiring a search keyword;
first search means for searching the text belonging to the first category for the search keyword by reading, from the first index, the appearance positions of the plurality of N-grams included in the search keyword and evaluating the continuity of the read appearance positions;
second search means for searching the electronic document for the search keyword by reading, from the first index and the second index, the appearance positions of the plurality of N-grams included in the search keyword and evaluating the continuity of the read appearance positions; and
output means for outputting a search result obtained by at least one of the first search means and the second search means.