JPH09198395A

JPH09198395A - Document retrieval device

Info

Publication number: JPH09198395A
Application number: JP8004857A
Authority: JP
Inventors: Hiroshi Yamaguchi; 浩山口
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1996-01-16
Filing date: 1996-01-16
Publication date: 1997-07-31

Abstract

PROBLEM TO BE SOLVED: To enable efficient keyword-based retrieval from the next search onward, even when an input keyword is not registered in the index. SOLUTION: When a retrieval expression is input from an input part 1, a keyword extraction part 2 morphologically analyzes it by referring to a morphological analysis dictionary 3, extracts keywords, and passes them to a retrieval part 4. The retrieval part 4 searches the index in an index storage part 5 using each keyword as a retrieval key. When no index retrieval result is obtained, a full-text retrieval part 8 searches the full text of the documents in a document storage part 7 with a character-string search for the retrieval expression. When the full-text search returns a result, an additional registration part 9 registers the retrieval expression as a single word in the morphological analysis dictionary 3 and in the index storage part 5.

Description

Detailed Description of the Invention

[0001]

Technical Field of the Invention
The present invention relates to a document retrieval device in which an index is created in advance for each document to be registered and, at search time, a document is identified by searching that index for the input keyword.

[0002]

Description of the Related Art
Two typical methods for searching documents are known. The first is a full-text search method, which searches the full text of each document with a character-string search. The second is an index search method, in which an index is created in advance for each registered document and, at search time, a document is identified by searching that index for the input keyword.

[0003] Full-text search by character-string search requires a considerably longer search time than the index method, in which keywords are assigned in advance. Matching a logical search condition slows it down further, for example when a character-string search for the logical product (AND) or logical sum (OR) of strings A and B is specified. Moreover, full-text search incurs this long search time every time, even when the same search expression is given again.
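The recurring cost described in [0003] shows up clearly in a minimal Python sketch of character-string full-text search with AND/OR conditions. The function name `full_text_search` and the toy documents are illustrative assumptions, not part of the patent. The point is that every query rescans every document, whether or not the same query was run before.

```python
from typing import Dict, List, Set

def full_text_search(documents: Dict[int, str], terms: List[str], mode: str = "and") -> Set[int]:
    """Return IDs of documents containing all (mode="and") or any (mode="or") of the terms."""
    hits: Set[int] = set()
    for doc_id, text in documents.items():        # every query scans every document
        found = [term in text for term in terms]  # one substring test per term
        if (mode == "and" and all(found)) or (mode == "or" and any(found)):
            hits.add(doc_id)
    return hits

docs = {1: "alpha beta", 2: "beta gamma", 3: "gamma delta"}
print(sorted(full_text_search(docs, ["beta", "gamma"], "and")))  # [2]
print(sorted(full_text_search(docs, ["alpha", "delta"], "or")))  # [1, 3]
```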

[0004] In the index search method, by contrast, a keyword extraction means extracts specific keywords in advance from each document to be registered and registers them in the index. At search time, the same keyword extraction means extracts search keys from the input search expression, and documents are identified by consulting the index entries for those keys. Only the documents indicated by the index need to be examined, so search is fast and flexible.
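The index search method of [0004] corresponds to what is usually called an inverted index. The following is a minimal sketch. It assumes keywords have already been extracted from each document at registration time; the extraction step itself is not modeled. `build_index` and `index_search` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def build_index(documents: Dict[int, Iterable[str]]) -> Dict[str, Set[int]]:
    """Map each registered keyword to the set of IDs of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, keywords in documents.items():
        for keyword in keywords:
            index[keyword].add(doc_id)
    return dict(index)

def index_search(index: Dict[str, Set[int]], key: str) -> Set[int]:
    """Look up one search key; only the listed documents need to be examined."""
    return set(index.get(key, set()))

index = build_index({1: ["noise", "ordinance"], 2: ["noise"], 3: ["budget"]})
print(sorted(index_search(index, "noise")))  # [1, 2]
print(index_search(index, "DP300"))          # set()
```

The second lookup shows the limitation discussed in [0005]: a key that was never registered returns nothing, whatever the document texts contain.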

[0005] The index method avoids a full-text scan only under two conditions. Keywords must be successfully extracted from the search expression, and those keywords must be registered in the index.

[0006] In practice, however, keyword extraction can fail, for example when the search expression is a compound word. To address this, JP-A-3-116375 proposed an information retrieval device that works in two stages. At index registration, compound words are divided into simple words. At search time, a predetermined search condition is applied between the words of a compound-word keyword. Together these raise the rate at which input keywords hit the index.

[0007] For example, suppose the keyword 「騒音防止条例」 ("noise prevention ordinance") has been assigned. With this conventional device, searches become possible by each of the keywords 「騒音」 ("noise"), 「防止」 ("prevention"), and 「条例」 ("ordinance"). Depending on how the search condition is given, a search by the keyword 「騒音条例」 ("noise ordinance") also becomes possible.
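The following rough sketch makes the prior-art idea of [0006] and [0007] concrete. Compounds are split into simple words at indexing time, and a query compound is split the same way and its parts are ANDed. Three details are assumptions for illustration: the greedy longest-match splitter, the vocabulary, and the use of AND as the "predetermined search condition". JP-A-3-116375 may specify these differently.

```python
from typing import Dict, List, Set

# Assumed simple-word vocabulary: 騒音 (noise), 防止 (prevention), 条例 (ordinance).
SIMPLE_WORDS: Set[str] = {"騒音", "防止", "条例"}

def split_compound(compound: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match split; characters not covered by vocab are skipped."""
    parts: List[str] = []
    longest = max(map(len, vocab))
    i = 0
    while i < len(compound):
        for length in range(min(longest, len(compound) - i), 0, -1):
            piece = compound[i:i + length]
            if piece in vocab:
                parts.append(piece)
                i += length
                break
        else:
            i += 1  # no simple word starts here
    return parts

def build_split_index(docs: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Index every simple word of every compound keyword (registration time)."""
    index: Dict[str, Set[int]] = {}
    for doc_id, compounds in docs.items():
        for compound in compounds:
            for word in split_compound(compound, SIMPLE_WORDS):
                index.setdefault(word, set()).add(doc_id)
    return index

def search_compound(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Split the query the same way and AND the posting sets of its parts."""
    parts = split_compound(query, SIMPLE_WORDS)
    if not parts:
        return set()
    return set.intersection(*(index.get(word, set()) for word in parts))

index = build_split_index({1: ["騒音防止条例"], 2: ["騒音"]})
print(sorted(search_compound(index, "騒音条例")))  # [1]
```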

[0008] This conventional device registers every simple word obtained by division as a keyword in the index. That helps when a compound word is entered and the input keyword would otherwise not be extracted properly. It does nothing, however, about search failures caused by the keyword not being registered in the index at all.

[0009] One conceivable countermeasure is to register the keywords to be extracted in a user dictionary or specialized dictionary in advance. This method, however, shares the limitation above: a search cannot be performed if the desired keyword is missing from the user dictionary or specialized dictionary. It has a further drawback. The required keywords are added to the dictionary each time a new document is registered, so a keyword registered later is not reflected in documents already in the database. Despite the effort of registering that keyword, it cannot be used to search those documents.

[0010]

Problem to Be Solved by the Invention
As described above, the conventional document retrieval device adopts the index search method. This reduces the waste inherent in full-text search, which takes a long time on every query even when the same search expression is repeated. The index method has its own limitation, however. If a keyword is not registered in the index, a document cannot be found, even when it contains the same keyword as the search expression.

[0011] The present invention has been made in view of these circumstances. Its object is to provide a document retrieval device that can search efficiently with an input keyword from the next search onward, even when that keyword is not registered in the index.

[0012]

Means for Solving the Problem
The present invention is a document retrieval device comprising:
document storage means for storing documents;
index storage means for storing an index of the documents stored in the document storage means; and
index search means for searching the index storage means based on a search key and identifying the documents that correspond to that search key.
The device performs document retrieval based on the index search result that the index search means obtains for the search key.

The device is characterized by further comprising:
text search means for searching the documents stored in the document storage means with the search key and identifying the documents that contain it; and
additional registration means, which acts when the index search means identifies no document corresponding to the search key.

In that case, the additional registration means causes the text search means to search the documents stored in the document storage means. If that search identifies documents, the additional registration means additionally registers the search key in the index storage means, in association with the document information of the identified documents.
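The claimed combination of index search, fallback text search, and additional registration can be sketched in a few lines of Python. This is a minimal illustration under assumed data shapes:

- The index storage means is modeled as a dict from keyword to a set of document IDs.
- The document storage means is modeled as a dict from document ID to text.

The function name `retrieve` is hypothetical. Keyword extraction is omitted, so the search key is used as given.

```python
from typing import Dict, Set

def retrieve(key: str, index: Dict[str, Set[int]], documents: Dict[int, str]) -> Set[int]:
    """Index search first; on a miss, text search and register the key if it hits."""
    # Index search means: look the key up in the index storage means.
    hits = index.get(key)
    if hits:
        return set(hits)
    # Text search means: scan every stored document for the key as a substring.
    found = {doc_id for doc_id, text in documents.items() if key in text}
    if found:
        # Additional registration means: associate the key with the found IDs
        # so that the next search for this key is answered from the index.
        index[key] = set(found)
    return found

documents = {1: "annual report", 2: "DP300 release notes", 4: "DP300 manual"}
index: Dict[str, Set[int]] = {"report": {1}}
print(sorted(retrieve("DP300", index, documents)))  # [2, 4]  (via text search)
print(sorted(index["DP300"]))                       # [2, 4]  (now in the index)
```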

[0013] Preferably, the invention further comprises a morphological analysis dictionary and keyword extraction means. The morphological analysis dictionary stores morphological analysis information. The keyword extraction means extracts search keywords by morphologically analyzing the search expression input at search time, using that information, and supplies the keywords to the index search means as search keys. When the additional registration means registers a search key in the index storage means, it also registers the search key and its part-of-speech information in the morphological analysis dictionary.

[0014]

Embodiments of the Invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram showing the schematic configuration of a document retrieval device according to the first embodiment of the present invention. The device comprises:
- an input unit 1
- a keyword extraction unit 2
- a morphological analysis dictionary 3
- a search unit 4
- an index storage unit 5
- a display unit 6
- a document storage unit 7
- a full-text search unit 8
- an additional registration unit 9

[0015] The morphological analysis dictionary 3 stores morphological analysis information that associates each headword with part-of-speech information, as shown for example in FIG. 3. This information is used to morphologically analyze the search expression input at search time. The index storage unit 5 stores an index, as shown for example in FIG. 4. The index associates each keyword extracted when a document is registered in the document storage unit 7 with the document information (document IDs) of the documents containing that keyword. The document storage unit 7 stores the documents that contain the keywords in the index.

[0016] The document retrieval device of the present invention can search the documents in the document storage unit 7 in two ways. The first searches based on the index stored in the index storage unit 5. The second searches the full text of the documents in the document storage unit 7 by character-string search, independently of the index. The search unit 4 performs the first and the full-text search unit 8 the second.

[0017] Next, document retrieval processing in this device is described with reference to the flowchart of FIG. 2. To start a search, the user chooses a search expression that reflects the content of the documents sought and enters it through the input unit 1 (step 101). The input unit 1 forwards the search expression to the keyword extraction unit 2. The keyword extraction unit 2 performs morphological analysis on the expression by referring to the morphological analysis dictionary 3, and extracts the words that make it up, such as nouns, as keywords. The keyword extraction unit 2 then passes these words to the search unit 4.
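For illustration only, the keyword extraction of step 101 can be approximated by greedy longest-match segmentation against a headword-to-part-of-speech dictionary, keeping only nouns. Real Japanese morphological analyzers typically choose among candidate segmentations by cost rather than greedily, and the patent does not specify the algorithm. The dictionary entries below are assumed from the analysis result described in [0023]; they are not copied from FIG. 3.

```python
from typing import Dict, List, Tuple

# Assumed dictionary excerpt: 自然 natural, 言語 language, 処理 processing,
# 技術 technology, の (particle "of"), 応用 application.
DICTIONARY: Dict[str, str] = {
    "自然": "noun", "言語": "noun", "処理": "noun",
    "技術": "noun", "の": "particle", "応用": "noun",
}

def analyze(expr: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Greedy longest-match segmentation; characters with no headword are dropped."""
    tokens: List[Tuple[str, str]] = []
    longest = max(map(len, dictionary), default=0)
    i = 0
    while i < len(expr):
        for length in range(min(longest, len(expr) - i), 0, -1):
            word = expr[i:i + length]
            if word in dictionary:
                tokens.append((word, dictionary[word]))
                i += length
                break
        else:
            i += 1  # no headword starts here; skip one character
    return tokens

def extract_keywords(expr: str, dictionary: Dict[str, str]) -> List[str]:
    """Keep only the nouns, as the keyword extraction unit 2 does."""
    return [word for word, pos in analyze(expr, dictionary) if pos == "noun"]

print(extract_keywords("自然言語処理技術の応用", DICTIONARY))
# ['自然', '言語', '処理', '技術', '応用']
print(extract_keywords("DP300", DICTIONARY))  # []
```

The empty result for "DP300" corresponds to the situation in [0026], where the dictionary has no entry for the product name.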

[0018] The search unit 4 uses the words received from the keyword extraction unit 2 as search keys. It performs an index search over the information registered in the index storage unit 5 (step 102) and determines whether any search key for the index search has been extracted (step 103).

[0019] If a search key for the index search has been extracted (step 103 YES), the search unit 4 reads the index entries for all the extracted search keys. It performs a logical AND over the document information in all of those entries (step 104). It then displays the index search result, given by the result of the AND operation, on the display unit 6 (step 105), and the search processing ends.

[0020] If, after the index search starts, no search key for the index search is extracted (step 103 NO), the search unit 4 forwards the entire character string of the search expression to the full-text search unit 8. This is the string it received from the keyword extraction unit 2 in step 101. The full-text search unit 8 treats that string as one word and runs a full-text search for it over the documents registered in the document storage unit 7 (step 110). It then determines whether a full-text search result has been obtained (step 111). If none is obtained, that is, the number of hits is 0 (step 111 NO), the search processing ends immediately.

[0021] If the full-text search yields one or more hits (step 111 YES), the full-text search unit 8 notifies the additional registration unit 9 of the result. Based on this result, the additional registration unit 9 registers the search expression's character string, the one used in the full-text search, in two places (step 112):
- In the morphological analysis dictionary 3, it is registered as one word with the part of speech "noun".
- In the index storage unit 5, the string goes into the keyword field and the IDs of the documents it found go into the document information field.

The full-text search unit 8 then displays the full-text search result on the display unit 6 (step 105), and the search processing ends.
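Putting steps 101 to 112 together, the FIG. 2 flow might look roughly like the following sketch. It is not the patented implementation. The class name `DocumentRetriever`, the greedy longest-match analyzer, and the plain substring scan are all assumptions. An empty index result (no keyword extracted, or an empty AND) is treated as the NO branch of step 103, following the abstract's wording "when the index retrieval result cannot be obtained".

```python
from typing import Dict, List, Set

class DocumentRetriever:
    """Sketch of the FIG. 2 flow; all names here are hypothetical."""

    def __init__(self, documents: Dict[int, str], dictionary: Dict[str, str],
                 index: Dict[str, Set[int]]) -> None:
        self.documents = documents    # document storage unit 7
        self.dictionary = dictionary  # morphological analysis dictionary 3
        self.index = index            # index storage unit 5
        self.full_text_calls = 0      # number of slow character-string scans

    def extract_keywords(self, expr: str) -> List[str]:
        """Keyword extraction unit 2: greedy longest match, nouns only."""
        keywords: List[str] = []
        longest = max(map(len, self.dictionary), default=0)
        i = 0
        while i < len(expr):
            for n in range(min(longest, len(expr) - i), 0, -1):
                word = expr[i:i + n]
                if word in self.dictionary:
                    if self.dictionary[word] == "noun":
                        keywords.append(word)
                    i += n
                    break
            else:
                i += 1  # no headword starts here; skip one character
        return keywords

    def search(self, expr: str) -> Set[int]:
        keys = self.extract_keywords(expr)               # step 101
        if keys:                                         # steps 102-104
            hits = set.intersection(*(self.index.get(k, set()) for k in keys))
            if hits:
                return hits                              # step 105
        self.full_text_calls += 1                        # step 110
        hits = {d for d, text in self.documents.items() if expr in text}
        if hits:                                         # step 111 YES
            self.dictionary[expr] = "noun"               # step 112: dictionary 3
            self.index[expr] = set(hits)                 # step 112: index unit 5
        return hits                                      # step 105 (or end)

docs = {1: "自然言語処理技術の応用", 2: "新製品DP300の紹介", 4: "DP300仕様書"}
retriever = DocumentRetriever(docs, {"自然": "noun", "の": "particle"}, {"自然": {1}})
print(sorted(retriever.search("DP300")), retriever.full_text_calls)  # [2, 4] 1
print(sorted(retriever.search("DP300")), retriever.full_text_calls)  # [2, 4] 1
```

Running the same query twice shows the intended effect. The second call is answered from the index, and the full-text counter does not increase.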

[0022] Next, the document search processing is described more concretely for two cases:
- the character string 「自然言語処理技術の応用」 ("application of natural language processing technology") entered as the search expression;
- the product name "DP300" entered as the search expression.

In both cases, the registered contents of the morphological analysis dictionary 3 are as shown in FIG. 3, and those of the index storage unit 5 are as shown in FIG. 4.

[0023] First, suppose the search expression 「自然言語処理技術の応用」 is entered through the input unit 1. The keyword extraction unit 2 morphologically analyzes it with reference to the morphological analysis dictionary 3 and obtains the analysis result 自然 / 言語 / 処理 / 技術 / の / 応用 ("natural / language / processing / technology / of / application"). It extracts the nouns 「自然」, 「言語」, 「処理」, 「技術」, and 「応用」 as keywords and forwards them to the search unit 4.

[0024] The search unit 4 then performs an AND search of the index in the index storage unit 5 with the keywords received from the keyword extraction unit 2. From the registered contents of the index storage unit 5 (see FIG. 4), it first obtains the document information of the documents containing each keyword: "1, 3, 5", "1, 7, 8", "1, 5, 9", "1, 6, 8", and "1, 5, 6". It then ANDs these sets, giving the AND search result of document information "1".
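The AND step of [0024] is a set intersection over the posting lists. The sketch below uses the document information given in that paragraph. Pairing each list with a keyword assumes the same order as the keywords listed in [0023], which is an assumption about FIG. 4.

```python
# Posting lists taken from paragraph [0024]; the keyword pairing assumes the
# same order as the keywords listed in [0023].
postings = {
    "自然": {1, 3, 5},  # natural
    "言語": {1, 7, 8},  # language
    "処理": {1, 5, 9},  # processing
    "技術": {1, 6, 8},  # technology
    "応用": {1, 5, 6},  # application
}
# Step 104: logical AND over the document information of every keyword.
result = set.intersection(*postings.values())
print(sorted(result))  # [1]
```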

[0025] The display unit 6 then displays the index search result, for example as "Search result: document information 1". Other display methods are possible. For example, the document corresponding to document information "1", the AND search result, can be read from the document storage unit 7 and its content displayed.

[0026] Now suppose the search expression "DP300" is entered through the input unit 1. The keyword extraction unit 2 performs morphological analysis with reference to the morphological analysis dictionary 3. The keyword "DP300" is not registered in the dictionary (see FIG. 3), so the keyword extraction unit 2 returns no keyword extraction result to the search unit 4 for this input.

[0027] The search unit 4 therefore cannot perform an index search for the input keyword at all, and the keyword search result is 0. On obtaining this result, the search unit 4 takes the search expression "DP300" from the keyword extraction unit 2. It forwards the expression unchanged, as a single word, to the full-text search unit 8.

[0028] The full-text search unit 8 uses the search expression "DP300" from the search unit 4 as the search string. It performs a full-text character-string search over all documents registered in the document storage unit 7.

[0029] Suppose that documents 2 and 4 registered in the document storage unit 7 contain the string "DP300", so the full-text search yields document information "2, 4". FIG. 5 shows the content of document 2; the string "DP300" appears in its second line.

[0030] The full-text search result is displayed on the display unit 6, for example as "Search result: document information 2, 4". Here too, instead of the document IDs being displayed directly, the documents with those IDs may be read from the document storage unit 7 and their contents displayed.

[0031] The full-text search above yields document information "2, 4". The additional registration unit 9 recognizes that the result is not 0. It then registers, in the morphological analysis dictionary 3 and the index storage unit 5, the information needed to support later searches with the search expression "DP300".

[0032] Specifically, the additional registration unit 9 makes two entries:
- In the morphological analysis dictionary 3, it writes "DP300", the character string used for the full-text search, in the headword field and "noun" in the corresponding part-of-speech field.
- In the index storage unit 5, it writes the string "DP300" in the keyword field and "2, 4" in the corresponding document information field.

[0033] FIG. 6 shows an example of the registered contents of the morphological analysis dictionary 3 after this additional registration. FIG. 7 likewise shows an example of the registered contents of the index storage unit 5 after the same registration.

[0034] Once the morphological analysis dictionary 3 and the index storage unit 5 hold the contents shown in these figures, later entries of the search expression "DP300" are handled differently. Morphological analysis using the dictionary reliably extracts the keyword "DP300". An AND search of the index on that keyword then yields the document information "2, 4". The search is therefore quick and does not depend on full-text search.

[0035] That is, suppose the search expression "DP300" is entered through the input unit 1 after the additional registration (step 101). The keyword extraction unit 2 starts morphological analysis with reference to the morphological analysis dictionary 3. Based on the registered contents shown in FIG. 6, this analysis extracts "DP300" as a noun keyword.

[0036] The keyword "DP300" is passed to the search unit 4 for index search (step 102). Based on the registered contents of the index storage unit 5 shown in FIG. 7, the search unit 4 finds the keyword "DP300" (step 103 YES). It then proceeds to the AND search of the document information for that keyword (step 104).

[0037] From the registered contents of the index storage unit 5 (see FIG. 7), the AND search yields document information "2, 4" for the keyword "DP300". The display unit 6 shows "2, 4" as the index AND search result for that keyword (step 105). If the user then gives a specific retrieval instruction, the search unit 4 retrieves document 2, document 4, or both from the document storage unit 7. It displays their contents on the display unit 6.

[0038] In summary, the present invention handles a search expression that is not registered in the morphological analysis dictionary 3 as follows. Because the expression is unregistered, no suitable keyword is extracted and no index search result is obtained. A full-text character-string search is then performed for the expression. If that search yields results, the expression is treated as one word and additionally registered in the morphological analysis dictionary 3 and the index storage unit 5.

[0039] With an ordinary index method, if the search key is not registered in the index, no result is returned, even if the desired keyword appears in a document. The present invention adds the full-text search function and the additional registration function described above. Once a full-text search with an unregistered search key succeeds, subsequent searches with that key can be performed efficiently by index search.

[0040] Another way to address this problem is to extract keywords with a prepared user dictionary or specialized dictionary. The problem is that no result is returned when the search key is missing from the index, even though a document contains the desired keyword. A prepared dictionary does not remove it: if the desired keyword is not in that dictionary, the outcome is the same. Flexible search remains impossible unless the search key needed at search time has been registered.

[0041] In this respect, the present invention searches all documents previously registered in the document storage unit 7 when determining the word to be additionally registered. The newly registered word is therefore reliably reflected as a keyword for documents that were already registered.

[0042] In the first embodiment, no suitable keyword is extracted from the search expression and a full-text search is run instead. If that search also yields no result (step 111 NO), the search operation ends immediately. In this case as well, however, the search expression may be registered, in the morphological analysis dictionary 3 only.

[0043] With this processing, the next time this search expression is entered, the keyword extraction unit 2 extracts it as a keyword by morphological analysis with reference to the morphological analysis dictionary 3. The subsequent index search on that keyword, however, yields no result. The expression is registered in the morphological analysis dictionary 3 but not in the index storage unit 5. That combination means a full-text search has already been performed for this keyword and returned nothing. In this case, the search unit 4 can recognize this from the search result and skip the time-consuming full-text search from step 110 onward, improving processing efficiency.
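The shortcut of [0042] and [0043] uses "in the dictionary but not in the index" as a record of an earlier failed full-text search. The following is a minimal sketch under two assumptions:

- The whole search expression is used as a single key; morphological analysis is omitted.
- Expressions reach the dictionary only through this path.

Two consequences follow. A word that was in the dictionary from the start but never indexed would also be skipped. Documents added after the failed search would not be found. The function name is hypothetical.

```python
from typing import Dict, Set

def search_with_skip(expr: str, dictionary: Dict[str, str],
                     index: Dict[str, Set[int]], documents: Dict[int, str]) -> Set[int]:
    """Index lookup, then full-text search, skipping expressions known to have failed."""
    if expr in index:
        return set(index[expr])
    if expr in dictionary:
        # Registered in the dictionary but absent from the index: an earlier
        # full-text search for this expression returned nothing, so skip it.
        return set()
    hits = {d for d, text in documents.items() if expr in text}  # step 110
    dictionary[expr] = "noun"    # [0042]: register in the dictionary either way
    if hits:
        index[expr] = set(hits)  # index registration only when something was found
    return hits

dictionary: Dict[str, str] = {}
index: Dict[str, Set[int]] = {}
documents = {1: "DP300 manual"}
print(search_with_skip("XR9", dictionary, index, documents))            # set()
print("XR9" in dictionary, "XR9" in index)                              # True False
print(sorted(search_with_skip("DP300", dictionary, index, documents)))  # [1]
```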

[0044] In the first embodiment, a full-text character-string search is performed for the search expression when the index search yields no result for it. This search need not cover the full text, however. Searching only part of each document can be expected to give a similar effect. The same applies to the other embodiments described below.

[0045] Next, another embodiment of the present invention is described. In the first embodiment, the search expression is additionally registered after the full-text search that follows a failed keyword extraction, and the morphological analysis dictionary 3 and the index storage unit 5 are the registration destinations. In a second embodiment, a user dictionary provided separately from the morphological analysis dictionary 3 is used for this registration. FIG. 8 is a block diagram showing the basic configuration of a document retrieval device according to the second embodiment, which adds a user dictionary 10 to the device of FIG. 1.

[0046] FIG. 9 is a flowchart showing the search operation of the document search device according to the second embodiment. In this device, the full-text search unit 8 performs a full-text search using the entire character string of the input search expression as one word (step 110). When the full-text search returns one or more hits (step 111: YES), the additional registration unit 9 registers the character string of the search expression, which is the keyword used in the full-text search, in the user dictionary 10 as one word with the part of speech "noun". It also registers the character string of the search expression in the keyword field of the index storage unit 5 and the IDs of the documents found with the search expression in the document information field (step 120).

[0047] Because this device has a dedicated storage unit (the user dictionary 10) as the destination for additional registrations, unneeded information need not be added directly to the morphological analysis dictionary 3. When the user wants to create other data anew from the original dictionary data, turning the user dictionary 10 off allows registration to be performed in the initial environment.
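The split between the base dictionary and the user dictionary 10, together with the registration of step 120, can be sketched as a two-layer lookup with an on/off switch. The class `LayeredDictionary`, its `user_enabled` flag, the precedence of user entries over base entries, and the helper `register_hits` are all illustrative assumptions.

```python
from typing import Dict, Optional, Set


class LayeredDictionary:
    """Base morphological analysis dictionary plus a switchable user dictionary.

    Entries map a surface string to a part-of-speech tag. Additional
    registrations go only to the user dictionary; the base is never modified.
    """

    def __init__(self, base: Dict[str, str]) -> None:
        self.base = dict(base)
        self.user: Dict[str, str] = {}
        self.user_enabled = True

    def register(self, word: str, pos: str = "noun") -> None:
        self.user[word] = pos

    def lookup(self, word: str) -> Optional[str]:
        # When enabled, user entries are consulted before the base dictionary.
        if self.user_enabled and word in self.user:
            return self.user[word]
        return self.base.get(word)


def register_hits(term: str, hits: Set[int],
                  dictionary: LayeredDictionary,
                  index: Dict[str, Set[int]]) -> None:
    """Step 120: runs only when the full-text search found one or more documents."""
    if hits:
        dictionary.register(term, "noun")
        index.setdefault(term, set()).update(hits)
```

Setting `user_enabled = False` makes lookups see only the original dictionary data, while the registered words stay available for when the user dictionary is switched back on.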

[0048] A third embodiment of the present invention provides a function for handling a search expression given as a logical AND expression. For example, when the search expression is given as "A & B", one term may be registered in the index while the other is not. Even in this case, the device according to the third embodiment has the full-text search unit 8 perform a full-text search for the term that is not registered in the index and, according to the search result, has the additional registration unit 9 additionally register that term in the morphological analysis dictionary 3 (or the user dictionary 10) and the index storage unit 5. Making the configuration applicable to logical operations in this way broadens the range of searches supported and enables more efficient searching.
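A sketch of the third embodiment's handling of an AND expression follows. Splitting the expression on `&`, the function name `and_search`, registering a term only when it has hits, and combining the per-term results by set intersection are assumptions; the text above specifies only that the unindexed term is searched by full text and registered according to the result. As in a simplified model, the dictionary is a set of words and the index a dict from keyword to document IDs.

```python
from typing import Dict, Set


def and_search(expression: str,
               dictionary: Set[str],
               index: Dict[str, Set[int]],
               documents: Dict[int, str]) -> Set[int]:
    """Evaluate an AND expression such as "A & B".

    Indexed terms come from the index; a term missing from the index is
    found by a character-string scan and, if it hits, additionally
    registered in the dictionary and the index. The result is the
    intersection of all per-term hit sets.
    """
    terms = [t.strip() for t in expression.split("&") if t.strip()]
    result: Set[int] = set()
    for position, term in enumerate(terms):
        if term in index:
            hits = set(index[term])
        else:
            hits = {i for i, text in documents.items() if term in text}
            if hits:
                dictionary.add(term)
                index[term] = set(hits)
        result = hits if position == 0 else result & hits
    return result
```

After the first evaluation, both terms of the expression are in the index, so repeating the same AND search needs no scan at all.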

[0049] In the first embodiment, a word newly registered after a full-text search is given the part of speech "noun". As a fourth embodiment of the present invention, the device may instead let the user choose from among the parts of speech that can be used in morphological analysis with the morphological analysis dictionary 3. In this case, a default setting (for example, noun) may be used when the appropriate part of speech is not known. With the device according to the fourth embodiment, a user who wants to distinguish parts of speech to some extent can choose from the prepared parts of speech, which improves the accuracy of morphological analysis involving the newly registered word and allows keywords to be extracted more accurately.
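The fourth embodiment's choice of part of speech, with a default of noun, might look like the following sketch. The tag set, the names `choose_pos` and `register_word`, and rejecting unsupported tags with `ValueError` are illustrative assumptions.

```python
from typing import Dict, Optional

# Hypothetical tag set; the real set is whatever the morphological
# analysis dictionary 3 supports.
PARTS_OF_SPEECH = {"noun", "verb", "adjective", "adverb"}
DEFAULT_POS = "noun"


def choose_pos(requested: Optional[str] = None) -> str:
    """Return the user's choice, or the default when none is given."""
    if requested is None:
        # The user does not know the appropriate part of speech.
        return DEFAULT_POS
    if requested not in PARTS_OF_SPEECH:
        raise ValueError(f"unsupported part of speech: {requested!r}")
    return requested


def register_word(dictionary: Dict[str, str], word: str,
                  requested: Optional[str] = None) -> str:
    """Register `word` with the chosen (or default) part of speech."""
    pos = choose_pos(requested)
    dictionary[word] = pos
    return pos
```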

[0050]

[Effects of the Invention]

As described above, according to the present invention, when an index search based on a search key yields no result, the search continues by switching to a document search by character-string search that treats the search key as one word, and when this document search yields a result, the search key is additionally registered in the index as one word together with the corresponding document information. Consequently, when the same search key is given from the next time on, the index entry already registered for that search key is found, and efficient searching by the index search method can be achieved.

[0051] In particular, consider a device that has a morphological analysis dictionary, extracts a search keyword by morphologically analyzing the search expression input at search time with that dictionary, and performs the index search using the search keyword as the search key. In such a device, whenever the search key is additionally registered in the index, the search key and its part-of-speech information are also additionally registered in the morphological analysis dictionary. This ensures that the keyword is reliably extracted from the search expression and that the index search based on that keyword succeeds, contributing to more efficient searching.
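To illustrate why registering the search key in the dictionary matters, the sketch below stands in for morphological analysis with greedy longest-match segmentation against a dictionary that maps surface strings to parts of speech, keeping nouns as keywords. Real morphological analyzers are considerably more elaborate, so this is only a simplified stand-in; `extract_keywords` is a hypothetical name.

```python
from typing import Dict, List


def extract_keywords(expression: str, dictionary: Dict[str, str]) -> List[str]:
    """Greedy longest-match segmentation; returns the segments tagged as nouns.

    Characters not covered by any dictionary entry are skipped one at a time.
    """
    longest = max((len(word) for word in dictionary), default=0)
    keywords: List[str] = []
    i = 0
    while i < len(expression):
        for length in range(min(longest, len(expression) - i), 0, -1):
            piece = expression[i:i + length]
            if piece in dictionary:
                if dictionary[piece] == "noun":
                    keywords.append(piece)
                i += length
                break
        else:
            i += 1  # no entry starts here: skip one character
    return keywords
```

With a dictionary containing only 文書 (document), 検索 (search), and the particle の, the expression 文書の全文検索 ("full-text search of documents") yields the keywords 文書 and 検索. After 全文検索 ("full-text search") is registered as a noun, the same expression yields 文書 and 全文検索, so the registered compound reaches the index search as a single keyword.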

[Brief description of drawings]

FIG. 1 is a diagram showing the schematic configuration of a document search device according to an embodiment of the present invention.

FIG. 2 is a flowchart outlining the search operation of the device shown in FIG. 1.

FIG. 3 is a diagram showing an example of the registered contents of the morphological analysis dictionary in the device shown in FIG. 1.

FIG. 4 is a diagram showing an example of the registered contents of the index in the device shown in FIG. 1.

FIG. 5 is a diagram showing an example of a full-text search result in the device shown in FIG. 1.

FIG. 6 is a diagram showing an example of the registered contents of the morphological analysis dictionary after it has been updated on the basis of a full-text search result in the device shown in FIG. 1.

FIG. 7 is a diagram showing an example of the registered contents of the index after it has been updated on the basis of a full-text search result in the device shown in FIG. 1.

FIG. 8 is a diagram showing the schematic configuration of a document search device according to another embodiment of the present invention.

FIG. 9 is a flowchart outlining the search operation of the device shown in FIG. 8.

[Explanation of symbols]

1 ... Input unit, 2 ... Keyword extraction unit, 3 ... Morphological analysis dictionary, 4 ... Search unit, 5 ... Index storage unit, 6 ... Display unit, 7 ... Document storage unit, 8 ... Full-text search unit, 9 ... Additional registration unit, 10 ... User dictionary

Claims

1. A document search device comprising document storage means for storing documents, index storage means for storing an index of the documents stored in the document storage means, and index search means for searching the index storage means on the basis of a search key to identify the document corresponding to the search key, the device performing a document search on the basis of the index search result for the search key, and further comprising: text search means for searching the documents stored in the document storage means using the search key and identifying a document that contains the search key; and additional registration means which, when no document corresponding to the search key is identified by the index search means and a document is identified by the text search means searching the documents stored in the document storage means, additionally registers the search key in the index storage means in association with the document information of the document identified by that search key.

2. The document search device according to claim 1, further comprising: a morphological analysis dictionary storing morphological analysis information; and keyword extraction means for extracting a search keyword by morphologically analyzing, using the morphological analysis information, a search expression input at search time, and supplying the search keyword to the index search means as the search key; wherein, when additionally registering the search key in the index storage means, the additional registration means also additionally registers the search key and part-of-speech information for the search key in the morphological analysis dictionary.