JP2000222432A

JP2000222432A - Document retrieval device, document retrieval method and recording medium recording document retrieval program

Info

Publication number: JP2000222432A
Application number: JP11026518A
Authority: JP
Inventors: Nobuyuki Omori; 信行大森; Daijiro Mori; 大二郎森; Masakatsu Okubo; 雅且大久保; Kazuo Tanaka; 一男田中
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-02-03
Filing date: 1999-02-03
Publication date: 2000-08-11

Abstract

PROBLEM TO BE SOLVED: To provide a document retrieval device and a document retrieval method that shorten the time required for retrieval without reducing retrieval accuracy, and a recording medium on which a document retrieval program is recorded. SOLUTION: In this document retrieval method, retrieval words are extracted from a retrieval request sentence only when extraction is necessary. First, it is judged whether the retrieval request sentence contains multiple retrieval words. If the sentence is judged not to consist of multiple retrieval words, whether extraction is necessary is judged by looking up the retrieval request sentence in a word index 105. If extraction is judged unnecessary, documents containing the retrieval request sentence are retrieved from the plurality of documents without extracting retrieval words from the sentence.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to document search technology, and more particularly to a document search device and method for retrieving, from a large number of documents, the documents that match conditions specified by a user, and to a recording medium on which a document search program is recorded.

[0002]

[Prior Art] Among document search services provided over the Internet and similar networks, the dominant approach is free-text input, in which the user may enter arbitrary text as the search request sentence. With free-text input, users of a document search service need not learn complicated logical expressions and can specify search conditions very easily, which improves convenience.

[0003] The core technology behind free-text input is morphological analysis, the process of dividing a natural-language sentence into words and assigning a part of speech to each. In free-text input, this technique is used to extract the search words that serve as search conditions from the search request sentence the user has freely entered. Morphological analysis decomposes text made up of a sequence of words into its individual words, and it requires far more computation time than the other processing steps involved in a search. Morphological analysis is described in more detail below.

[0004] Morphological analysis uses two dictionaries: a word dictionary and a connection dictionary. The word dictionary holds, for each word, its character string, part of speech, and word cost. The connection dictionary holds, for pairs of parts of speech, whether they may be adjacent and the associated connection cost. The word cost is a numerical value expressing how readily the word occurs; the smaller the value, the more readily the word occurs. The connection cost is a numerical value expressing how readily the parts of speech of two consecutive words connect. Using these dictionaries, processing based on an algorithm called the minimum-cost method is performed, as follows.

[0005]
(1) First, for the character strings starting at each position in the sentence, every string that matches the character string of an entry in the word dictionary is cut out as a word.
(2) Next, the connection dictionary is consulted and only the sequences of words that can be connected are retained.
These steps yield multiple candidate analyses in the form of a lattice whose nodes are words. For each candidate analysis, the total of the word costs and connection costs is calculated, and the path with the minimum cost is selected as the optimal solution.
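The minimum-cost method described in [0004] and [0005] can be sketched as a dynamic program over the word lattice. The Python sketch below is illustrative only: the dictionary entries, part-of-speech labels, cost values, the sentence boundary markers "BOS" and "EOS", and the names `WORD_DICT`, `CONNECTION`, and `segment` are all assumptions made for this example and do not come from the source.

```python
from math import inf

# Hypothetical toy word dictionary: surface -> list of (part of speech, word cost).
WORD_DICT = {
    "NTT": [("noun", 2)],
    "の": [("particle", 1)],
    "ISDN": [("noun", 3)],
    "サービス": [("noun", 2)],
    "サー": [("noun", 9)],
    "ビス": [("noun", 7)],
}

# Hypothetical connection dictionary: (left POS, right POS) -> connection cost.
# A pair that is absent is treated as not connectable.
CONNECTION = {
    ("BOS", "noun"): 1,
    ("noun", "particle"): 1,
    ("particle", "noun"): 1,
    ("noun", "noun"): 4,
    ("noun", "EOS"): 1,
}


def segment(sentence):
    """Return (words, total_cost) for the cheapest lattice path, or None."""
    n = len(sentence)
    # best[p][tag] = (cost, backpointer) of the cheapest path that ends at
    # character position p with a word whose part of speech is `tag`.
    best = [dict() for _ in range(n + 1)]
    best[0]["BOS"] = (0, None)
    for start in range(n):
        if not best[start]:
            continue  # no connectable path reaches this position
        for end in range(start + 1, n + 1):
            surface = sentence[start:end]
            # Step (1): cut out substrings that match a dictionary entry.
            for tag, word_cost in WORD_DICT.get(surface, []):
                for prev_tag, (prev_cost, _) in best[start].items():
                    # Step (2): keep only connectable word sequences.
                    conn = CONNECTION.get((prev_tag, tag))
                    if conn is None:
                        continue
                    cost = prev_cost + conn + word_cost
                    if cost < best[end].get(tag, (inf, None))[0]:
                        best[end][tag] = (cost, (start, prev_tag, surface))
    # Close the path with the end-of-sentence marker.
    final = None
    for tag, (cost, _) in best[n].items():
        conn = CONNECTION.get((tag, "EOS"))
        if conn is not None and (final is None or cost + conn < final[0]):
            final = (cost + conn, tag)
    if final is None:
        return None
    total, tag = final
    # Follow the backpointers from the end of the sentence to the start.
    words, pos = [], n
    while pos > 0:
        _, (start, prev_tag, surface) = best[pos][tag]
        words.append(surface)
        pos, tag = start, prev_tag
    return words[::-1], total
```

With these invented costs, `segment("NTTのISDNサービス")` prefers the single word "サービス" over the split "サー" + "ビス" because the split accumulates a higher total cost.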

[0006] When a search request sentence of n characters is morphologically analyzed, the number of character strings that can be formed by taking at most n consecutive characters from the n characters depends on the number of consecutive characters taken, as follows.
(1) Strings of 1 consecutive character: n
(2) Strings of 2 consecutive characters: n - 1
...
(i) Strings of i consecutive characters: n - i + 1
...
(n) Strings of n consecutive characters: 1
These are the numbers of combinations possible in each case.

[0007] Summing these, a search request sentence of n characters yields

[0008]

(Equation 1)
$$\sum_{i=1}^{n} (n - i + 1) = \frac{n(n+1)}{2}$$

[0009] possible words. In other words, to morphologically analyze a search request sentence of n characters, the word dictionary must be consulted this many times to determine whether each character string is a word and, if it is, to obtain its cost, that is, how readily the word occurs. Consequently, in comparing the search request sentence against the word dictionary, morphological analysis requires an amount of computation proportional to the square of the length n of the search request sentence.
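The count of candidate strings can be checked by direct enumeration. Adding up n - i + 1 for i = 1 to n gives n(n + 1)/2, which grows with the square of n. The short Python sketch below enumerates every contiguous substring; the function names are hypothetical and chosen only for this illustration.

```python
def candidate_substrings(sentence):
    """List every contiguous substring, grouped by length 1..n."""
    n = len(sentence)
    return [
        sentence[start:start + length]
        for length in range(1, n + 1)          # i consecutive characters
        for start in range(n - length + 1)     # n - i + 1 start positions
    ]


def count_candidates(n):
    """Closed form for the number of dictionary lookups: n(n + 1) / 2."""
    return n * (n + 1) // 2
```

For a 4-character request such as "サービス", both the enumeration and the closed form give 10 candidate strings.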

[0010] Because morphological analysis selects a single optimal solution from among multiple candidate analyses on the basis of cost calculations, it requires much more computation time than the search processing itself. Speeding up search therefore requires shortening the time spent on morphological analysis, and methods for fast morphological analysis have been studied; examples include the Japanese morphological analysis system Breakfast from Fujitsu Laboratories, JUMAN from Kyoto University, and ChaSen from the Nara Institute of Science and Technology. However, no matter how much the processing is streamlined, morphological analysis inherently requires an amount of computation proportional to the square of the length n of the search character string.

[0011] Furthermore, document search generally works by morphologically analyzing the documents to be searched in advance, registering the words in those documents in a word index, and then looking up the input word in the word index. This method is used because the word index holds the words found in the documents to be searched, so a character string being present in the word index is a necessary and sufficient condition for that string to be a word contained in the documents to be searched.
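A word index of this kind can be pictured as a mapping from each word to the documents that contain it. The sketch below is a minimal illustration, assuming the documents have already been split into words by morphological analysis; the sample documents, their IDs, and the names `build_word_index` and `lookup` are invented for this example. Storing a per-document count is also an assumption, made because the description later refers to word frequency information.

```python
from collections import defaultdict


def build_word_index(tokenized_docs):
    """Map each word to {document ID: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, words in tokenized_docs.items():
        for word in words:
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return dict(index)


def lookup(index, text):
    """Return the postings for `text`, or None if it is not a registered word."""
    return index.get(text)


# Hypothetical pre-tokenized documents.
DOCS = {
    3: ["NTT", "の", "サービス"],
    25: ["サービス", "サービス", "ISDN"],
}
INDEX = build_word_index(DOCS)
```

Here `lookup(INDEX, "サービス")` finds documents 3 and 25, while a multi-word string such as "NTTのサービス" is not a key of the index and the lookup returns None.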

[0012] When the search request sentence is a sentence made up of multiple search words, as in a free-text search system, the sentence itself cannot be looked up in the word index. The input search request sentence is therefore morphologically analyzed, and each extracted search word is looked up in the word index. As a result, morphological analysis has been executed every time a search request sentence is input.

[0013]

[Problem to be Solved by the Invention] In the conventional technology described above, however, morphological analysis is executed every time a search request sentence is input even though it requires much more computation time than the other processing in a document search, so the time spent on morphological analysis lengthens the overall search time.

[0014] The present invention has been made in view of the above. Its object is to provide a document search device, a document search method, and a recording medium on which a document search program is recorded that reduce the time required for a search without lowering search accuracy, by determining whether morphological analysis is necessary and performing it only when it is, rather than performing it for every search request.

[0015]

[Means for Solving the Problem] To achieve the above object, the present invention is configured as follows. The present invention is a document search device that uses a word index to search a plurality of documents for documents containing search words extracted, as necessary, from a search request sentence, the device comprising: search word extraction determination means for determining whether search words need to be extracted from the search request sentence; and means for searching the plurality of documents for documents containing the search request sentence, without extracting search words from it, when the determination finds that extraction is unnecessary. Because the invention first determines whether search word extraction is needed, it is no longer necessary to morphologically analyze every search request sentence to extract search words, as in the prior art. The computation time spent on morphological analysis is reduced, so documents can be retrieved in a short time.

[0016] In the above configuration, the search word extraction determination means may comprise: first determination means for determining, from the length of the search request sentence, whether the search request sentence is a sentence made up of multiple search words; and second determination means for determining, when the first determination finds that the search request sentence is not made up of multiple search words, whether search word extraction is necessary by looking up the search request sentence in the word index.

[0017] According to the present invention, once the search request sentence is found not to be made up of multiple search words, it is looked up in the word index as is. If that lookup succeeds, the documents can be retrieved immediately, which shortens the search time. If the lookup does not succeed, morphological analysis is performed as usual, so search accuracy is not reduced.

[0018] Alternatively, in the above configuration, the search word extraction determination means may comprise: first determination means for determining, from the types of characters making up the search request sentence, whether the search request sentence is a sentence made up of multiple search words; and second determination means for determining, when the first determination finds that the search request sentence is not made up of multiple search words, whether search word extraction is necessary by looking up the search request sentence in the word index.

[0019] According to the present invention, once the search request sentence is found not to be made up of multiple search words, it is looked up in the word index as is. If that lookup succeeds, the documents can be retrieved immediately, which shortens the search time. If the lookup does not succeed, morphological analysis is performed as usual, so search accuracy is not reduced.

[0020] The above object may also be achieved as follows. The present invention is a document search method that uses a word index to search a plurality of documents for documents containing search words extracted, as necessary, from a search request sentence, the method comprising: a search word extraction determination step of determining whether search words need to be extracted from the search request sentence; and a step of searching the plurality of documents for documents containing the search request sentence, without extracting search words from it, when the determination finds that extraction is unnecessary.

[0021] The search word extraction determination step may comprise: a first determination step of determining, from the length of the search request sentence, whether the search request sentence is a sentence made up of multiple search words; and a second determination step of determining, when the first determination finds that the search request sentence is not made up of multiple search words, whether search word extraction is necessary by looking up the search request sentence in the word index. Alternatively, the search word extraction determination step may comprise: a first determination step of determining, from the types of characters making up the search request sentence, whether the search request sentence is a sentence made up of multiple search words; and a second determination step of determining, when the first determination finds that the search request sentence is not made up of multiple search words, whether search word extraction is necessary by looking up the search request sentence in the word index.

[0022] The present invention may also take the form of a recording medium on which is recorded a document search program that causes a computer to execute document search processing that uses a word index to search a plurality of documents for documents containing search words extracted, as necessary, from a search request sentence, the program causing the computer to execute: a search word extraction determination procedure of determining whether search words need to be extracted from the search request sentence; and a procedure of searching the plurality of documents for documents containing the search request sentence, without extracting search words from it, when the determination finds that extraction is unnecessary.

[0023] In the above configuration, the search word extraction determination procedure may comprise: a first determination procedure of determining, from the length of the search request sentence, whether the search request sentence is a sentence made up of multiple search words; and a second determination procedure of determining, when the first determination finds that the search request sentence is not made up of multiple search words, whether search word extraction is necessary by looking up the search request sentence in the word index. Alternatively, the search word extraction determination procedure may comprise: a first determination procedure of determining, from the types of characters making up the search request sentence, whether the search request sentence is a sentence made up of multiple search words; and a second determination procedure of determining, when the first determination finds that the search request sentence is not made up of multiple search words, whether search word extraction is necessary by looking up the search request sentence in the word index.

[0024] In document search, when a search request sentence made up of multiple words is input, the search words that make up the sentence must be extracted from it, and this extraction requires morphological analysis, which takes considerable computation time. According to the present invention as described above, the need for search word extraction is determined first, and morphological analysis is executed only when extraction is judged necessary. Reducing the amount of morphological analysis, which demands considerable computation time in document search, makes it possible to increase the speed of information retrieval.

[0025] The present invention does require processing to determine whether search word extraction is needed, but the time this takes is sufficiently small compared with the time required for morphological analysis that the overall search processing time is not lengthened. In addition, the accuracy of the search results is exactly the same as when morphological analysis is performed.

[0026]

[Embodiments of the Invention] FIG. 1 is a block diagram showing the configuration of a document search device according to an embodiment of the present invention. As shown in the figure, the document search device comprises a search request sentence input unit 101, a search word extraction execution determination unit 102, a search word extraction unit 103, a word index query unit 104, a word index 105, and a document output unit 106. The search request sentence input unit 101 receives the search request sentence entered by the user and sends it to the search word extraction execution determination unit 102. A search request sentence may specify a single search word, such as "NTT", or text made up of multiple search words, such as "NTTのISDNサービスについて" ("About NTT's ISDN service"). The search word extraction execution determination unit 102 determines whether the input search request sentence needs to be morphologically analyzed, and the search word extraction unit 103 extracts search words by morphologically analyzing the search request sentence when analysis is judged necessary. The word index query unit 104 looks up the search request sentence or the search words in the word index held in the word index 105, and the word index 105 serves those lookups. The document output unit 106 outputs the documents for which the lookup by the word index query unit 104 succeeded, in an appropriate order based on word frequency information.

[0027] For a search request sentence entered through the search request sentence input unit 101, the search word extraction execution determination unit 102 first determines whether to perform morphological analysis. If the determination finds that morphological analysis is necessary, the search word extraction unit 103 performs it, and the word index query unit 104 then looks up the results in the word index 105. If the determination finds that morphological analysis is unnecessary, it is not performed. The determination uses the following three criteria.

[0028] Criterion 1 looks up the search request sentence in the word index without performing morphological analysis. Criterion 2 judges the search request sentence to be a sentence made up of multiple search words when its number of characters is at or above a fixed threshold. Criterion 3 judges the search request sentence to be a sentence made up of multiple search words when it contains multiple types of characters; specifically, it checks whether the sentence contains Japanese full stops (。), Japanese commas (、), periods, commas, quotation marks, double quotation marks, whitespace characters, or other symbols. How each criterion is used in the determination is explained in the description of operation that follows.
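Criteria 2 and 3 depend only on the form of the request, so each can be written as a single pass over its characters. The Python sketch below is one possible reading, not the source's implementation: the threshold value of 10 is invented (the source says only that the length is at or above a fixed number), the use of Unicode general categories to detect punctuation and symbols is an assumption, and combining the two criteria with "or" is also an assumption. All function names are hypothetical.

```python
import unicodedata

# Hypothetical threshold; the source does not state a value.
MAX_SINGLE_WORD_LENGTH = 10


def criterion2(request, limit=MAX_SINGLE_WORD_LENGTH):
    """Criterion 2: the request is at least `limit` characters long."""
    return len(request) >= limit


def criterion3(request):
    """Criterion 3: the request contains whitespace, punctuation, or symbols.

    Unicode general categories starting with "P" (punctuation) or "S"
    (symbol) cover 。 、 . , quotation marks and similar characters.
    """
    return any(
        ch.isspace() or unicodedata.category(ch)[0] in ("P", "S")
        for ch in request
    )


def looks_multiword(request):
    """True if either form-based criterion suggests multiple search words."""
    return criterion2(request) or criterion3(request)
```

Under these assumptions, "NTT" and "サービス" pass both checks as probable single words, while "NTTのISDNサービスについて" (16 characters) and "NTT ISDN" (contains a space) are flagged for morphological analysis.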

[0029] The operation of the document search device is described in more detail with reference to the flowchart in FIG. 2. First, in step 201, a search request sentence is input through the search request sentence input unit 101. The input search request sentence is passed to the search word extraction execution determination unit 102, which in step 202 applies criterion 2 and criterion 3. If the sentence falls under criterion 2 or 3, that is, if the search request sentence is judged to be a sentence made up of multiple search words, then in step 205 the search word extraction unit 103 extracts the search words contained in the search request sentence by morphological analysis, and in step 206 the word index query unit 104 looks them up in the word index 105.

[0030] If step 202 finds that neither criterion applies, that is, if the search request sentence is judged to consist of a single search word, then in step 203 the search request sentence is looked up in the word index. If the search request sentence is a single search word and the lookup succeeds, the document IDs and word frequency information are retrieved from the word index in step 204. If the lookup in step 203 does not succeed, it is impossible to tell whether the failure occurred because (1) the search word is not contained in the word index or (2) the search request sentence is made up of multiple words. In this case, therefore, morphological analysis is performed in step 205 to extract the search words, just as when the criteria apply, and the word index is queried in step 206. If the lookup succeeds, then in step 207 the word index query unit 104 extracts the document IDs and word frequency information from the word index 105 and passes them to the document output unit 106. If the lookup fails, no document containing the search word is registered in the word index 105, so a signal indicating that no document matches the search request sentence is passed to the document output unit 106. The result obtained in step 204, where the document IDs and word frequency information are extracted from the word index 105, is likewise passed to the document output unit 106.
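The flow of steps 202 to 207 can be summarized in a short function. The sketch below is a simplified illustration under several assumptions: the word index is a plain dictionary, the morphological analyzer is replaced by a whitespace split, the form check is a stand-in for criteria 2 and 3 with an invented threshold, and the way postings for several search words are combined is left open because the source does not specify it. Every name in the block is hypothetical.

```python
# Hypothetical word index: word -> {document ID: word frequency}.
WORD_INDEX = {
    "サービス": {3: 1, 25: 2},
    "NTT": {3: 1},
}


def looks_multiword(request):
    """Stand-in for criteria 2 and 3 (the threshold of 10 is invented)."""
    return len(request) >= 10 or " " in request


def extract_words(request):
    """Stand-in for morphological analysis: split on whitespace."""
    return request.split()


def search(request, word_index=WORD_INDEX):
    """Return (postings per matched word or None, whether analysis ran)."""
    if not looks_multiword(request):              # step 202: criteria 2 and 3
        postings = word_index.get(request)        # step 203: criterion 1
        if postings is not None:
            return {request: postings}, False     # step 204: direct hit
    # Step 205: reached when the form check fires or the direct lookup fails.
    words = extract_words(request)
    # Step 206: look up each extracted search word.
    hits = {w: word_index[w] for w in words if w in word_index}
    # Step 207, or None as the "no matching document" signal.
    return (hits or None), True
```

A single-word request such as "サービス" returns its postings without running the analyzer, while "ISDN" fails the direct lookup, falls through to analysis, and finally yields None because the word is not in this toy index.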

[0031] When the document output unit 106 receives document IDs and word frequency information from the word index query unit 104, it performs a calculation based on the word frequency information and outputs the documents in an appropriate order. When it receives the signal indicating that no document matches the search request sentence, it displays a message stating that no document contains the search word. The effects of each of the criteria described above are explained next.

[0032] Criterion 1 uses the word index to determine whether search words need to be extracted. If looking up the search request sentence in the word index succeeds, it is established at that point that the search request sentence consists of a single search word, and the documents containing that search word are obtained from the word index at the same time.

[0033] Criteria 2 and 3 determine from the form of the search request sentence whether search words need to be extracted. A search request sentence that meets these criteria is, in the great majority of cases, made up of multiple search words. The probability of a wrong judgment is very small, so search word extraction is very rarely performed on a search request that consists of a single search word. Even when the judgment is wrong, the processing is the same as in a conventional search, namely morphological analysis followed by a word index lookup, so search accuracy is not affected at all.

[0034] Therefore, by applying criteria 1, 2, and 3 as shown in the flowchart described above, when the search request sentence consists of only one word, the documents can be retrieved with a single lookup of the search request sentence in the word index. The computation time for one word index lookup is about the same as that for one comparison between a character string and the word dictionary in morphological analysis, and is proportional to the length n of the search request sentence. Deciding under criteria 2 and 3 whether morphological analysis is needed also takes time proportional to n. By contrast, the number of comparisons between character strings and the word dictionary in morphological analysis is proportional to the square of n. Consequently, skipping morphological analysis shortens the search time in inverse proportion to the length n of the search request sentence, since a cost linear in n replaces one quadratic in n, even when the time spent deciding whether morphological analysis is needed is taken into account.
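The quadratic growth mentioned above can be illustrated with a small count. This is only a sketch under an assumption: it supposes a naive analyzer that compares every substring of the query against the word dictionary, which the description does not specify. The function names are hypothetical, and the two query strings are the examples used later in the description.

```python
def direct_lookup_count(query):
    """One exact-match lookup of the whole query, whatever its length."""
    return 1


def exhaustive_match_count(query):
    """Dictionary comparisons if every substring is tried (assumed naive analyzer)."""
    n = len(query)
    # Every (start, end) pair with start < end names one substring: n * (n + 1) / 2 in total.
    return sum(1 for start in range(n) for end in range(start + 1, n + 1))


for q in ("サービス", "NTTのISDNサービスについて"):
    print(len(q), direct_lookup_count(q), exhaustive_match_count(q))
# prints "4 1 10" and then "16 1 136"
```

Quadrupling the query length from 4 to 16 characters raises the comparison count from 10 to 136, while the direct lookup stays at one.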

[0035] As described above, when it is determined that search words do not need to be extracted from the search request sentence, that is, when looking up the search request sentence in the word index succeeds, the search can be carried out without performing morphological analysis on the search request sentence. By this means, search accuracy is the same as when morphological analysis is performed, and only the search time is reduced.

[0036] Next, the operation of the document search device is described more concretely. FIG. 3 shows a specific example of the word index 105. As shown in the figure, the word index is a table that pairs each word with the IDs of the documents in which it appears. At search time, the search word is looked up among the words in the word index, and the list of IDs of documents containing the word is obtained from the matching row. For example, when the search word "service" (サービス) is given as input against the word index of FIG. 3, the lookup matches the row whose word column is "service", and documents with IDs such as 3 and 25 are obtained as the search result. Only words are registered in the word index. Therefore, when the search request sentence consists of multiple search words, as in "about NTT's ISDN service" (NTTのISDNサービスについて), looking up the search request sentence in the word index fails. In that case, the individual search words are extracted and each one is looked up in the word index.
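The word index described above can be sketched as a mapping from words to document ID lists. This is a minimal illustration, not the patent's data structure: only the word サービス ("service") and the IDs 3 and 25 come from the text (which says "3, 25, etc.", so the real list is longer). The other rows and the names `WORD_INDEX` and `lookup` are invented for the example.

```python
# Toy word index modeled on the description of FIG. 3.
# Only "サービス" -> 3, 25 is taken from the text; the other rows are hypothetical.
WORD_INDEX = {
    "サービス": [3, 25],
    "NTT": [3, 8],
    "ISDN": [3, 12],
}


def lookup(index, query):
    """Return the document IDs for an exact match, or None when the lookup fails."""
    return index.get(query)


hit = lookup(WORD_INDEX, "サービス")                   # single word: lookup succeeds
miss = lookup(WORD_INDEX, "NTTのISDNサービスについて")  # whole sentence: lookup fails
print(hit, miss)
```

Because only single words are keys, the multi-word sentence has no matching row, which is the failure case the paragraph describes.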

[0037] Next, looking up the search request sentence in the word index is described with reference to the flowchart of FIG. 2 and the example of FIG. 3, treating separately the case where the search request sentence consists of a single search word and the case where it consists of multiple search words.

(1) When the search request sentence consists of a single search word

When the search request sentence is "service" (サービス), the determination under criteria 2 and 3 in step 202 of FIG. 2 finds that it is a single word. Then, in step 203, the lookup succeeds if the word "service" is contained in the word index. In the word index example of FIG. 3, the word "service" is present, so the lookup succeeds and the IDs of the documents containing "service" are obtained. If no document in the search target collection contains the word "service", so that the word is absent from the word index, morphological analysis is performed in step 205 and "service" is extracted as the search word. The lookup against the word index in step 206 then fails, which shows that the search word is not contained in the word index.

[0038] (2) When the search request sentence consists of multiple search words

When the search request sentence is "about NTT's ISDN service" (NTTのISDNサービスについて), the determination under criteria 2 and 3 in step 202 finds that the sentence consists of multiple search words, so the search words are extracted by morphological analysis in step 205. If the independent words (content words) in the search request sentence are taken as the search words, "NTT", "ISDN", and "service" are extracted.
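The two cases can be traced with a rough sketch of the flow through steps 202, 203, 205, and 206. Several parts are assumptions rather than the patent's method. The claims name sentence length and character type as the basis for the first determination, but the threshold `MAX_SINGLE_WORD_LEN` and the hiragana rule are hypothetical. `extract_words` is a greedy longest-match stand-in for a real morphological analyzer. The separate `dictionary` argument and all index rows other than サービス are invented.

```python
MAX_SINGLE_WORD_LEN = 10  # hypothetical threshold, not given in the text


def looks_multiword(query):
    """Assumed form of criteria 2 and 3: a long query, or one containing hiragana."""
    too_long = len(query) > MAX_SINGLE_WORD_LEN
    has_hiragana = any("\u3041" <= ch <= "\u309f" for ch in query)
    return too_long or has_hiragana


def extract_words(query, dictionary):
    """Stand-in for morphological analysis: greedy longest match against a dictionary."""
    words, i = [], 0
    while i < len(query):
        for j in range(len(query), i, -1):
            if query[i:j] in dictionary:
                words.append(query[i:j])
                i = j
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return words


def search(query, index, dictionary):
    """Return {search word: document IDs}, skipping extraction when possible."""
    if not looks_multiword(query):          # step 202
        doc_ids = index.get(query)          # step 203: look up the whole sentence
        if doc_ids is not None:
            return {query: doc_ids}
    words = extract_words(query, dictionary)        # step 205
    return {w: index.get(w, []) for w in words}     # step 206


index = {"サービス": [3, 25], "NTT": [3, 8], "ISDN": [3, 12]}
dictionary = {"サービス", "NTT", "ISDN"}
single = search("サービス", index, dictionary)
multi = search("NTTのISDNサービスについて", index, dictionary)
absent = search("サービス", {}, dictionary)
```

In this sketch `single` is answered by the direct lookup alone, `multi` goes through extraction and yields one entry per extracted word, and `absent` falls through to extraction and comes back with an empty ID list, mirroring the failed lookup in step 206.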

[0039] In step 206, each search word is looked up in the word index to obtain the documents containing that search word. When multiple search words are extracted in this way, a search condition can also be specified, such as documents that contain all of the words, contain some of the words, contain only a particular word, or do not contain a word. The user can specify this condition at input time.
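One way to picture these search conditions is as set operations over the per-word document ID lists. The text does not say how the conditions are evaluated, so the sketch below is an assumption. The function names, the mode strings, and all IDs other than 3 and 25 are hypothetical.

```python
def combine(postings, mode):
    """Combine per-word document ID lists: 'all' words (AND) or 'any' word (OR)."""
    sets = [set(ids) for ids in postings.values()]
    if not sets:
        return set()
    if mode == "all":
        return set.intersection(*sets)
    if mode == "any":
        return set.union(*sets)
    raise ValueError(f"unknown mode: {mode}")


def exclude(doc_ids, unwanted_ids):
    """Drop documents that contain an unwanted word (NOT)."""
    return set(doc_ids) - set(unwanted_ids)


postings = {"NTT": [3, 8], "ISDN": [3, 12], "サービス": [3, 25]}
every_word = combine(postings, "all")                  # documents containing all three words
some_word = combine(postings, "any")                   # documents containing at least one word
without_ntt = exclude(some_word, postings["NTT"])      # remove documents that contain "NTT"
```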

[0040] The configuration of the document search device of the present invention is not limited to that of the embodiment shown in FIG. 1. Each component can be built as software (a program) implementing the procedures described above, and by having a computer system execute that program, a general-purpose computer system can serve as the document search device of the present invention.

[0041] FIG. 4 is a block diagram showing an example hardware configuration of such a computer system. The computer system comprises a CPU 300 that executes processing, a memory 301 that stores programs and data, an external storage device 302 that holds programs and data used by the memory 301 or the CPU 300, a display 303 that displays data, a keyboard 304 for entering data or commands, and a communication processing device 305 for communicating with other computer systems and the like over a network. The program is installed in the memory 301 or the external storage device 302 and executed by the CPU 300.

[0042] The recording medium of the present invention corresponds to the memory 301 or the external storage device 302. Electronic memory, hard disks, and portable recording media such as floppy disks, CD-ROMs, and magnetic tape can also be used as the recording medium of the present invention. Loading the program recorded on the document search program recording medium of the present invention into a general-purpose computer system such as that shown in FIG. 4 makes it possible to carry out the document search method of the present invention on that computer.

[0043] The present invention is not limited to the embodiment described above; various modifications and applications are possible within the scope of the claims.

【００４４】[0044]

[Effects of the Invention] As described above, the present invention provides the following effects. The invention first decides whether to perform the computationally expensive morphological analysis, and skips it when the search request sentence is a single search word, so the search time can be shortened.

[0045] Specifically, an information search service on the Internet accepts everyday phrases such as "NTT's ISDN service" (NTTのISDNサービス) as the search request sentence, but the search request sentences actually entered in such services very often consist of a single word. Conventional techniques perform morphological analysis even on such single words, which makes searches slow. The present invention does not perform morphological analysis on such single words, so the search time is shortened.

[0046] Accordingly, the present invention can provide users with a comfortable document search environment with short waiting times.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the document search device according to the embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the document search device according to the embodiment of the present invention.

FIG. 3 is a diagram showing an example of the word index, and examples of looking up a search request sentence or a search word in the word index, according to the embodiment of the present invention.

FIG. 4 is a block diagram showing the configuration of the computer system according to the embodiment of the present invention.

[Explanation of symbols]

101 search request input unit
102 search word extraction execution determination unit
103 search word extraction unit
104 word index lookup unit
105 word index
106 document output unit
300 CPU
301 memory
302 external storage device
303 display
304 keyboard
305 communication processing device

Continuation of front page

(72) Inventor: Masakatsu Okubo, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(72) Inventor: Kazuo Tanaka, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
F-term (reference): 5B075 ND03 NK02 NK54 PP25 QP10 UU06

Claims

[Claims]

1. A document search apparatus that uses a word index to retrieve, from a plurality of documents, documents containing search words extracted as needed from a search request sentence, the apparatus comprising: search word extraction determination means for determining whether search words need to be extracted from the search request sentence; and means for retrieving, from the plurality of documents, documents containing the search request sentence without extracting search words from the search request sentence when the result of the determination is that search words need not be extracted.

2. The document search apparatus according to claim 1, wherein the search word extraction determination means comprises: first determination means for determining, based on the length of the search request sentence, whether the search request sentence is a sentence composed of a plurality of search words; and second determination means for determining, when the result of that determination is that the search request sentence is not a sentence composed of a plurality of search words, whether search words need to be extracted by looking up the search request sentence in the word index.

3. The document search apparatus according to claim 1, wherein the search word extraction determination means comprises: first determination means for determining, based on the character types that make up the search request sentence, whether the search request sentence is a sentence composed of a plurality of search words; and second determination means for determining, when the result of that determination is that the search request sentence is not a sentence composed of a plurality of search words, whether search words need to be extracted by looking up the search request sentence in the word index.

4. A document search method that uses a word index to retrieve, from a plurality of documents, documents containing search words extracted as needed from a search request sentence, the method comprising: a search word extraction determination step of determining whether search words need to be extracted from the search request sentence; and a step of retrieving, from the plurality of documents, documents containing the search request sentence without extracting search words from the search request sentence when the result of the determination is that search words need not be extracted.

5. The document search method according to claim 4, wherein the search word extraction determination step comprises: a first determination step of determining, based on the length of the search request sentence, whether the search request sentence is a sentence composed of a plurality of search words; and a second determination step of determining, when the result of that determination is that the search request sentence is not a sentence composed of a plurality of search words, whether search words need to be extracted by looking up the search request sentence in the word index.

6. The document search method according to claim 4, wherein the search word extraction determination step comprises: a first determination step of determining, based on the character types that make up the search request sentence, whether the search request sentence is a sentence composed of a plurality of search words; and a second determination step of determining, when the result of that determination is that the search request sentence is not a sentence composed of a plurality of search words, whether search words need to be extracted by looking up the search request sentence in the word index.

7. A recording medium on which is recorded a document search program for causing a computer to execute document search processing that uses a word index to retrieve, from a plurality of documents, documents containing search words extracted as needed from a search request sentence, the program comprising: a search word extraction determination procedure of determining whether search words need to be extracted from the search request sentence; and a procedure of retrieving, from the plurality of documents, documents containing the search request sentence without extracting search words from the search request sentence when the result of the determination is that search words need not be extracted.

8. The recording medium on which the document search program is recorded according to claim 7, wherein the search word extraction determination procedure comprises: a first determination procedure of determining, based on the length of the search request sentence, whether the search request sentence is a sentence composed of a plurality of search words; and a second determination procedure of determining, when the result of that determination is that the search request sentence is not a sentence composed of a plurality of search words, whether search words need to be extracted by looking up the search request sentence in the word index.

9. The recording medium on which the document search program is recorded according to claim 7, wherein the search word extraction determination procedure comprises: a first determination procedure of determining, based on the character types that make up the search request sentence, whether the search request sentence is a sentence composed of a plurality of search words; and a second determination procedure of determining, when the result of that determination is that the search request sentence is not a sentence composed of a plurality of search words, whether search words need to be extracted by looking up the search request sentence in the word index.