JP2002324077A

JP2002324077A - Apparatus and method for document retrieval

Info

Publication number: JP2002324077A
Application number: JP2001126541A
Authority: JP
Inventors: Akito Nagai; 明人永井; Yasuhiro Takayama; 泰博高山; Katsushi Suzuki; 克志鈴木
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2001-04-24
Filing date: 2001-04-24
Publication date: 2002-11-08

Abstract

PROBLEM TO BE SOLVED: With a conventional document retrieval apparatus, it is difficult to select a search word that narrows a search efficiently. SOLUTION: A document retrieval apparatus comprises a document retrieval part 2 that searches documents; a document feature extracting part 4 that outputs a document vector group 5; a topic classifying part 6 that creates topics by classifying the group 5; a narrowing effect estimation part 10 that calculates a narrowing effect indicator 11; a presentation search word candidate generating part 8 that selects search words with a high indicator 11 and outputs them as presentation search word candidates 12; a classification result presenting part 14 that presents the candidates 12 and the indicator 11 topic by topic; and a document feature setting part 20 that changes the group 5.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a document search apparatus and a document search method that enable efficient document search by presenting the narrowing-down effect for each topic, which is a unit larger than a document, and thereby making it easier to select an additional search word.

[0002]

2. Description of the Related Art When HTML documents that can be browsed over the Internet, or electronic documents recorded and managed in a large-scale text database, are searched, the amount of document information returned as the search result becomes extremely large, and finding the document the user wants takes considerable time and effort. There is therefore a growing demand for technology that helps the user find the desired document, for example by classifying the document information obtained as a search result according to its content, or by presenting the user with candidate search words to add for a narrowing search, so that the user can search for the target document efficiently.

[0003] Techniques that allow a user to search for a target document efficiently are disclosed in Japanese Patent Application Laid-Open No. 9-231238, "Text search result display method and apparatus" (hereinafter, Document 1), and in Japanese Patent Application Laid-Open No. 11-213000, "Interactive information search method and apparatus, and storage medium storing an interactive information search program" (hereinafter, Document 2).

[0004] The technique disclosed in Document 1 classifies search results by subject using fuzzy clustering and presents the resulting categories to the user together with a set of search words. The technique disclosed in Document 2 classifies search results by clustering, presents the resulting categories to the user together with a set of search words, and further divides a category specified by the user into subcategories, thereby enabling interactive search.

[0005] A technique for supporting narrowing search is disclosed in Hayakawa et al., "Development of an interface for narrowing down search results in WWW search services" (Information Processing Society of Japan, Human Interface Research Group (HI76-5), pp. 25, 1998; hereinafter, Document 3).

[0006] To make the relationship between a search word and the set of retrieved documents easy to grasp, the technique disclosed in Document 3 provides an interface that visualizes, and presents to the user, how far a search word can narrow down the number of search results. The narrowing result of a search word is visualized with a document-word matrix: a matrix with words as rows and documents as columns is shown to the user directly as a table, and the weight of each word in each document is expressed by the brightness of the corresponding cell, so that the narrowing-down effects of additional search words can be viewed at a glance.

[0007] In connection with the prior art described in Document 3, there is also the technique disclosed in Japanese Patent Application Laid-Open No. 11-85764, "Method and apparatus for statistically estimating the number of search results, and storage medium storing a program for statistically estimating the number of search results" (hereinafter, Document 4). Document 4 discloses a statistical method for estimating the number of search results, so that one can know how the number of results changes when a search word is added to narrow the search results of Document 3 further.

[0008] FIG. 10 is a block diagram showing a conventional document search apparatus; it shows the configuration of an estimation apparatus that is one example of the embodiments disclosed in Document 4. In FIG. 10, 101 is an estimation apparatus intended for document retrieval by full-text search, 102 is a search word for retrieving documents, 103 is a text database in which documents are recorded and managed, 104 is the total number of search results output from the text database 103, 105 is a document sample set consisting of 50 documents extracted at random from the search results and output from the text database 103, 106 is a document-word matrix generation unit that receives the document sample set 105 and counts the occurrences of each word in each document, and 107 is the document-word matrix generated by the document-word matrix generation unit 106. The document sample set 105 is not limited to 50 documents and can be set as appropriate to the situation.

[0009] Also in FIG. 10, 108 is a search word candidate presentation unit that calculates word importance for the document-word matrix 107, 109 is the search word candidates output from the search word candidate presentation unit 108, 110 is a search word selection unit that outputs the search word candidates 109 to a monitor or the like, 111 is a selection signal input when the user selects one of the search word candidates 109, 112 is an additional search word output from the search word selection unit 110 on the basis of the selection signal 111, 113 is a count estimation unit that calculates the appearance rate of the additional search word 112 from the document-word matrix 107 and the additional search word 112, and 114 is the estimated count output from the count estimation unit 113, obtained by multiplying the total number of search results 104 by the appearance rate. FIG. 11 is an explanatory diagram showing an example of the document-word matrix 107 in the conventional document search apparatus.

[0010] The operation is as follows. When the search word 102 is input to the text database 103, the documents recorded and managed in the text database 103 are searched on the basis of the search word 102. The text database 103 extracts 50 documents at random from the search results, outputs them to the document-word matrix generation unit 106 as the document sample set 105, and outputs the total number of search results 104 to the count estimation unit 113. From the document sample set 105, the document-word matrix generation unit 106 counts how many times each word occurs in each document, thereby generating the document-word matrix 107 shown in FIG. 11, and outputs it to the search word candidate presentation unit 108 and the count estimation unit 113. In the document-word matrix 107, the rows correspond to document identifiers and the columns to the search word list; the value of each cell is the number of times the search word of that column occurs in the document of that row.
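
The construction of a document-word matrix of this kind can be sketched in Python as follows. This is only a minimal illustration: the function name, the whitespace tokenization, and the three-document sample are assumptions and do not come from Document 4, which samples 50 documents at random.

```python
from collections import Counter


def build_document_word_matrix(documents, vocabulary):
    """Return one row per document id; each cell counts occurrences of a word."""
    matrix = {}
    for doc_id, text in documents.items():
        # Assumption: words are separated by whitespace (real Japanese text
        # would need a morphological analyzer instead).
        counts = Counter(text.lower().split())
        matrix[doc_id] = [counts[word] for word in vocabulary]
    return matrix


# Hypothetical document sample set.
sample = {
    "doc1": "search engine ranks search results",
    "doc2": "database stores documents",
    "doc3": "search database index",
}
vocabulary = ["search", "database", "index"]
matrix = build_document_word_matrix(sample, vocabulary)
for doc_id, row in matrix.items():
    print(doc_id, row)
```

Rows are keyed by document identifier and columns follow the order of `vocabulary`, matching the row and column roles described for the document-word matrix 107.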

[0011] The search word candidate presentation unit 108 removes words such as particles and pronouns from the document-word matrix 107 and calculates a word importance, which indicates how important a given word is for a given document. The word importance is an index that is high for words that occur intensively in particular documents and, conversely, low for words that occur in many documents. On the basis of the calculated word importance, the search word candidate presentation unit 108 outputs words with high importance to the search word selection unit 110 as the search word candidates 109.
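
The text does not give a formula for the word importance. One common weighting with the stated behavior (high for words concentrated in few documents, low for words spread over many) is TF-IDF, and the sketch below uses it purely as an assumed stand-in. The function name, the stop-word set, and the sample matrix are likewise hypothetical.

```python
import math


def word_importance(matrix, vocabulary, stop_words=frozenset()):
    """Score each word by its highest TF-IDF value over all documents.

    Assumption: TF-IDF stands in for the unspecified word importance.
    """
    n_docs = len(matrix)
    importance = {}
    for col, word in enumerate(vocabulary):
        if word in stop_words:
            continue  # analogous to removing particles and pronouns
        column = [row[col] for row in matrix.values()]
        doc_freq = sum(1 for count in column if count > 0)
        if doc_freq == 0:
            continue  # word never occurs in the sample
        idf = math.log(n_docs / doc_freq)  # 0 when the word is in every document
        importance[word] = max(column) * idf
    return importance


# Hypothetical document-word matrix (rows: documents, columns: vocabulary).
vocabulary = ["search", "database", "index", "the"]
matrix = {
    "doc1": [2, 0, 0, 1],
    "doc2": [0, 1, 0, 1],
    "doc3": [1, 1, 1, 1],
}
importance = word_importance(matrix, vocabulary, stop_words={"the"})
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

Words at the head of `ranking` would be the ones offered as search word candidates.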

[0012] The search word selection unit 110 outputs the search word candidates 109 to a monitor or the like, has the user select any search word from the candidates 109, and receives the choice as the selection signal 111. On the basis of the selection signal 111 input by the user, the search word selection unit 110 outputs the selected word to the count estimation unit 113 as the additional search word 112.

[0013] The count estimation unit 113 receives the total number of search results 104, the document-word matrix 107, and the additional search word 112. The count estimation unit 113 counts the rows of the document-word matrix 107 whose entry in the column of the additional search word 112 is not "0" and divides this count by the total number of rows to obtain the appearance rate of the additional search word 112. It then outputs the estimated count 114, obtained by multiplying the appearance rate of the additional search word 112 by the total number of search results 104.
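
The estimate described here is the fraction of sampled documents that contain the additional search word, scaled by the total number of hits. A minimal Python sketch follows; the function name and the sample figures are illustrative assumptions.

```python
def estimate_result_count(matrix, vocabulary, additional_word, total_results):
    """Estimate how many results remain after adding `additional_word`."""
    col = vocabulary.index(additional_word)
    rows = list(matrix.values())
    # Count the rows whose entry in the word's column is not 0.
    hits = sum(1 for row in rows if row[col] != 0)
    appearance_rate = hits / len(rows)
    return appearance_rate * total_results


# Hypothetical sample of 4 documents drawn from 2000 search results.
vocabulary = ["search", "database"]
matrix = {
    "doc1": [2, 0],
    "doc2": [0, 1],
    "doc3": [1, 1],
    "doc4": [0, 0],
}
estimated = estimate_result_count(matrix, vocabulary, "database", 2000)
print(estimated)  # 2 of 4 sampled documents contain the word: 0.5 * 2000
```

Because the rate comes from a random sample, the result is a statistical estimate rather than an exact count.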

[0014]

PROBLEMS TO BE SOLVED BY THE INVENTION Because conventional document search apparatuses are configured as described above, they have the following problems. Regarding the narrowing-down effect of search words, the prior art disclosed in Documents 1 and 2 classifies the documents in the search result and presents the user with the search words of each resulting category. However, no information on the narrowing-down effect of the presented search words is provided, so when the user chooses among them for a narrowing search it is difficult to select a search word that narrows the search efficiently.

[0015] Regarding the visualization of search results, the prior art disclosed in Document 3 presents the narrowing-down effect of search words to the user document by document. When the search result grows to several thousand documents, however, it becomes extremely difficult to display the narrowing-down effect over the entire search result with a document-word matrix, and the display can no longer be taken in at a glance.

[0016] Also regarding the visualization of search results, the prior art disclosed in Document 4 outputs a document sample set from the search results and, from the appearance rate of the additional search word across the documents of the sample set, estimates the narrowed-down count for the entire search result. The estimated counts are therefore presented to the user one additional search word at a time, so it is extremely difficult to display the narrowing-down effect over the entire search result, and the display cannot be taken in at a glance.

[0017] Regarding narrowing the search down to the target documents, the prior art disclosed in Document 2 divides the category specified for narrowing into detailed subcategories. However, the target documents do not all lie within the specified category; documents the user wants also exist in other categories. The documents obtained by the narrowing search are limited to those in the specified category, target documents are missed, and it is difficult to carry out the narrowing search again.

[0018] Also regarding narrowing the search down to the target documents, the prior art disclosed in Documents 3 and 4 performs an AND search in which search words are successively added to the initial search result. The scope of each narrowing search is therefore limited to the documents retrieved with the immediately preceding search word, target documents scattered over the whole document space can no longer be retrieved, and it is difficult to carry out the narrowing search again.

[0019] SUMMARY OF THE INVENTION The present invention has been made to solve the problems described above. One object is to obtain a document search apparatus and a document search method that attach an index representing the narrowing-down effect to each search word presented for a narrowing search, making additional search words easier to select and the user's narrowing search more efficient.

[0020] Another object of the invention is to obtain a document search apparatus and a document search method that classify the documents in the search result, treat each classified document set as a topic, and present the narrowing-down effect of search words to the user for each topic, a unit larger than a document, so that the narrowing-down effect over all documents in the search result can be taken in at a glance.

[0021] A further object of the invention is to obtain a document search apparatus and a document search method that reclassify the initial search result using the topics and search words the user specifies as feedback information, thereby gathering target documents dispersed across topics into a single topic and making the narrowing of search results more efficient.

[0022]

MEANS FOR SOLVING THE PROBLEMS A document search apparatus according to the present invention comprises: document search means for searching a document set for documents that meet a search condition and outputting them; document feature extraction means for calculating statistical weights of words on the basis of the appearance frequency of the words in the documents and outputting a document vector set obtained from the weights; topic classification means for creating topics by classifying the document vector set according to the similarity between document vectors, and outputting the document information belonging to each topic and the importance-weighted search words of the topic; narrowing effect estimation means for calculating and outputting, with reference to the document information and the importance-weighted search words, a narrowing effect index for the search words belonging to a topic; presented search word candidate generation means for attaching the narrowing effect index to the search words, selecting search words with a high narrowing effect index as presented search word candidates, and outputting the presented search word candidates together with the document information corresponding to them; classification result presentation means for presenting the presented search word candidates in association with the topics, together with the narrowing effect index, and prompting input of instruction information, selection information, or both; and document feature setting means for changing the document vectors included in the document vector set on the basis of the selection information and outputting the result.
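
Classifying document vectors into topics by similarity can be sketched as below. This passage does not name a similarity measure or a clustering algorithm, so cosine similarity and a greedy single-pass grouping are used here purely as assumptions; the function names, the threshold, and the vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify_into_topics(doc_vectors, threshold=0.5):
    """Greedy grouping: join the first topic whose seed document is similar enough."""
    topics = []  # each topic is a list of document ids; element 0 is the seed
    for doc_id, vector in doc_vectors.items():
        for topic in topics:
            if cosine(vector, doc_vectors[topic[0]]) >= threshold:
                topic.append(doc_id)
                break
        else:
            topics.append([doc_id])  # no similar topic found: start a new one
    return topics


# Hypothetical document vectors (one weight per word).
doc_vectors = {
    "d1": [1.0, 0.0, 0.2],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.1],
    "d4": [0.1, 0.8, 0.0],
}
topics = classify_into_topics(doc_vectors)
print(topics)
```

Each resulting list plays the role of one topic, that is, a set of documents larger than a single document for which a narrowing effect index could then be computed.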

[0023] In the document search apparatus according to the present invention, the narrowing effect estimation means calculates and outputs the narrowing effect index for one or more topics.

[0024] In the document search apparatus according to the present invention, the classification result presentation means presents the topics and the presented search word candidates belonging to them in the form of a matrix, and presents the narrowing effect index in each element of the matrix.
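
A matrix-style presentation of topics against candidate search words might look like the following text rendering. The layout, the function name, the index values, and the use of "-" for a word that does not belong to a topic are all assumptions made for illustration.

```python
def render_topic_word_matrix(topics, words, index):
    """Render a topic-by-word table whose cells hold the narrowing effect index."""
    header = ["topic"] + list(words)
    rows = [header]
    for topic in topics:
        cells = []
        for word in words:
            value = index.get((topic, word))
            # Assumption: "-" marks a candidate that does not belong to the topic.
            cells.append("-" if value is None else f"{value:.2f}")
        rows.append([topic] + cells)
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths))
        for row in rows
    )


# Hypothetical narrowing effect indices keyed by (topic, search word).
index = {("T1", "engine"): 0.8, ("T1", "index"): 0.1, ("T2", "index"): 0.65}
table = render_topic_word_matrix(["T1", "T2"], ["engine", "index"], index)
print(table)
```

With one row per topic, the display grows with the number of topics rather than with the number of retrieved documents.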

[0025] The document search apparatus according to the present invention further comprises topic classification item acquisition means for extracting, from the documents output by the document search means, words that characterize those documents as topic classification items, learning a weight vector for each topic classification item with reference to text related to that item, and outputting the topic classification items and the weight vectors to the topic classification means; the topic classification means creates the topics by classifying the document vector set on the basis of the topic classification items and the weight vectors.

[0026] In the document search apparatus according to the present invention, the topic classification item acquisition means extracts the topic classification items from the documents output by the document search means on the basis of the appearance frequency of the words in those documents.

[0027] In the document search apparatus according to the present invention, the topic classification item acquisition means extracts the topic classification items from the documents output by the document search means with reference to the tags described in those documents.

[0028] The document search apparatus according to the present invention further comprises designated document feature extraction means for calculating a document vector from a document designated via the classification result presentation means and outputting it to the document feature setting means; the document feature setting means changes the document vector set on the basis of the document vector output from the designated document feature extraction means and the document vector set output from the document feature extraction means.

[0029] The document search apparatus according to the present invention further comprises first recording means for defining words related to predetermined words and recording them as related words, and related word setting means for extracting the related words corresponding to a designated search word from the first recording means and outputting them to the document feature setting means; the document feature setting means changes the document vector set on the basis of the related words and the selection information input from the classification result presentation means.

[0030] The document search apparatus according to the present invention further comprises second recording means for recording knowledge for creating search request sentences, and search request creation means for creating, with reference to the second recording means, a search request sentence corresponding to the search word output by the presented search word candidate generation means and outputting it to the document search means; the presented search word candidate generation means selects the search word to output to the search request creation means on the basis of the narrowing effect index.

[0031] In the document search apparatus according to the present invention, the presented search word candidate generation means selects a plurality of search words and outputs them to the search request creation means, and the search request creation means creates the search request sentence from a logical operation on the plurality of search words.
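
Creating a search request sentence from a logical operation on several search words can be illustrated as follows. The query syntax (quoted terms joined by AND or OR) and the function name are assumptions; this passage does not define the format that the document search means accepts.

```python
def build_search_request(words, operator="AND"):
    """Join the selected search words into one Boolean query string."""
    if operator not in {"AND", "OR"}:
        raise ValueError(f"unsupported operator: {operator}")
    if not words:
        raise ValueError("at least one search word is required")
    # Assumption: each term is quoted so multi-word terms stay together.
    return f" {operator} ".join(f'"{word}"' for word in words)


# Hypothetical search words chosen for their high narrowing effect index.
request = build_search_request(["patent", "retrieval"])
print(request)  # "patent" AND "retrieval"
```

Because the request is issued against the whole document set, it is not limited to the documents returned by the previous search.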

[0032] A document search method according to the present invention comprises: a document search step of searching a document set for documents that meet a search condition and outputting them; a document feature extraction step of calculating statistical weights of words on the basis of the appearance frequency of the words in the documents and outputting a document vector set obtained from the weights; a topic classification step of creating topics by classifying the document vector set according to the similarity between document vectors, and outputting the document information belonging to each topic and the importance-weighted search words of the topic; a narrowing effect estimation step of calculating and outputting, with reference to the document information and the importance-weighted search words, a narrowing effect index for the search words belonging to a topic; a presented search word candidate generation step of attaching the narrowing effect index to the search words, selecting search words with a high narrowing effect index as presented search word candidates, and outputting the presented search word candidates together with the document information corresponding to them; a classification result presentation step of presenting the presented search word candidates in association with the topics, together with the narrowing effect index, and prompting input of instruction information, selection information, or both; and a document feature setting step of changing the document vectors included in the document vector set on the basis of the selection information and outputting the result.

[0033] In the document search method according to the present invention, the narrowing effect estimation step calculates and outputs the narrowing effect index for one or more topics.

[0034] In the document search method according to the present invention, the classification result presentation step presents the topics and the presented search word candidates belonging to them in the form of a matrix, and presents the narrowing effect index in each element of the matrix.

[0035] The document search method according to the present invention further comprises a topic classification item acquisition step of extracting, from the documents output by the document search step, words that characterize those documents as topic classification items, learning a weight vector for each topic classification item with reference to text related to that item, and outputting the topic classification items and the weight vectors; the topic classification step creates the topics by classifying the document vector set on the basis of the topic classification items and the weight vectors.

[0036] In the document search method according to the present invention, the topic classification item acquisition step extracts the topic classification items from the documents output by the document search step on the basis of the appearance frequency of the words in those documents.

[0037] In the document search method according to the present invention, the topic classification item acquisition step extracts the topic classification items from the documents output by the document search step with reference to the tags described in those documents.

[0038] The document search method according to the present invention further comprises a designated document feature extraction step of calculating and outputting a document vector from a document designated via the classification result presentation step; the document feature setting step changes the document vector set on the basis of the document vector output from the designated document feature extraction step and the document vector set output from the document feature extraction step.

[0039] The document search method according to the present invention further comprises a first recording step of defining words related to predetermined words and recording them as related words, and a related word setting step of extracting and outputting the related words corresponding to a designated search word; the document feature setting step changes the document vector set on the basis of the related words and the selection information input from the classification result presentation step.

[0040] The document search method according to the present invention further comprises a second recording step of recording knowledge for creating search request sentences, and a search request creation step of creating and outputting a search request sentence corresponding to the search word output by the presented search word candidate generation step; the presented search word candidate generation step selects the search word to output on the basis of the narrowing effect index.

[0041] In the document search method according to the present invention, the presented search word candidate generation step selects and outputs a plurality of search words, and the search request creation step creates a search request sentence from a logical operation on the plurality of search words.

[0042] DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the present invention is described below.

Embodiment 1. FIG. 1 is a block diagram showing the configuration of a document search apparatus according to Embodiment 1 of the present invention. In FIG. 1, reference numeral 1 denotes the document set to be searched, which consists of digitized text such as HTML documents that can be browsed over the Internet, e-mail that can be sent and received over the Internet, and digitized documents recorded and managed in a large-scale text database stored on a recording device or recording medium. Reference numeral 2 denotes a document search unit (document search means) that searches the document set 1 for documents according to predetermined conditions; 3 denotes the search result document set obtained when the document search unit 2 searches the document set 1 according to the predetermined conditions; and 4 denotes a document feature extraction unit (document feature extraction means) that creates document vectors corresponding to the search result document set 3 output by the document search unit 2. A document vector expresses the weight of each word in a document in vector form.

[0043] Also in FIG. 1, reference numeral 5 denotes the document vector set output on the basis of the document vectors created by the document feature extraction unit 4; 6 denotes a topic classification unit (topic classification means) that creates topics by classifying the document vector set 5 into a plurality of sets according to the similarity between document vectors calculated from the document vector set 5 output by the document feature extraction unit 4; 7 denotes the set of search words with importance that is output together with the document information of each topic classified by the topic classification unit 6; and 8 denotes a presented search word candidate generation unit (presented search word candidate generation means) that selects presented search word candidates according to a predetermined criterion from the set 7 of search words with importance output by the topic classification unit 6. The predetermined criterion for selecting presented search word candidates is, for example, to select a fixed number of top-ranked words in order of importance.

[0044] Further, in FIG. 1, reference numeral 9 denotes the set of search words with importance for each topic, output from the presented search word candidate generation unit 8 together with the document information of each topic; 10 denotes a narrowing effect estimation unit (narrowing effect estimation means) that estimates, from the set 9 of search words with importance, the narrowing effect of the search words belonging to the topic designated by the user; 11 denotes the narrowing effect index calculated by the narrowing effect estimation unit 10, which serves as the index for estimating the narrowing effect; 12 denotes the presented search word candidates that the presented search word candidate generation unit 8 selects on the basis of the narrowing effect index 11 and outputs together with the document information belonging to the corresponding topics and the narrowing effect index 11; 13 denotes the user who operates the document search apparatus; 14 denotes a classification result presentation unit (classification result presentation means) that presents the presented search word candidates 12 to the user 13, associated with each topic and together with the narrowing effect index 11; 15 denotes the classification result visualized by the classification result presentation unit 14 for presentation to the user 13; and 16 denotes the instruction information that the user 13 sends to the classification result presentation unit 14.

[0045] Further, in FIG. 1, reference numeral 17 denotes the search condition that the user 13 inputs to the document search unit 2; 18 denotes the topic designated by the user 13, which the user 13 inputs to the narrowing effect estimation unit 10; 19 denotes the selection information of the user 13 that is output from the classification result presentation unit 14 when the user 13 instructs reclassification of the search results; 20 denotes a document feature setting unit (document feature setting means) that changes the weights of document vectors and search words on the basis of the selection information 19 of the user 13; and 21 denotes the changed document vectors that the document feature setting unit 20 outputs to the topic classification unit 6.

[0046] FIG. 2 is a flowchart illustrating the operation of the document search apparatus according to Embodiment 1 of the present invention. FIG. 3 is an explanatory diagram showing an example of document vectors in Embodiment 1 of the present invention. FIG. 4 is an explanatory diagram showing an example of the search word-topic correspondence table in Embodiment 1 of the present invention.

[0047] Next, the operation is described. First, in step ST1, the user 13 inputs the search condition 17 to the document search unit 2. The search condition 17 is, for example, a search word or a logical expression combining a plurality of search words. Next, in step ST2, the document search unit 2 searches the documents of the document set 1 on the basis of the input search condition 17 and outputs the search result document set 3 that satisfies the search condition 17 (document search step). The document set 1 to be searched is digitized text such as HTML documents that can be browsed over the Internet, e-mail that can be sent and received over the Internet, and digitized documents recorded and managed in a large-scale text database stored on a recording device or recording medium. The document search unit 2 need only be able to search with the search condition 17; for example, a full-text search engine of the kind commonly used on the Internet may be used. The document search unit 2 also outputs, together with the search result document set 3, various information about the retrieved documents as document information, such as a document ID number identifying each document, the location of the document file, and the document title.

[0048] Next, in step ST3, the document feature extraction unit 4 obtains a document vector for each document in the search result document set 3. As shown in FIG. 3, a document vector is obtained by calculating, from the frequency of occurrence of each word in each document, a statistical weight Wij of each word KW1, KW2, ..., KWi, ..., KWn for each document D1, D2, ..., Dj, ..., Dm, and expressing the weights of the words of each document in vector form. Various methods exist for calculating this statistical weight, such as TF-IDF and the chi-square statistic, and one may be selected as appropriate for the purpose. The document feature extraction unit 4 outputs the document vector set 5 obtained by calculating the statistical weights (document feature extraction step).
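The weight calculation of step ST3 can be illustrated with a minimal Python sketch. It assumes one common TF-IDF variant (raw term frequency multiplied by log(m / document frequency)) and whitespace tokenization; the patent does not fix either choice, and Japanese text would need a morphological analyzer instead. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors), where vectors[j][i] is the weight Wij of
    word KWi in document Dj.

    Assumption: weight = term frequency * log(m / document frequency).
    This is one of several TF-IDF variants, chosen only for illustration.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for words in tokenized for w in words})
    m = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for words in tokenized for w in set(words))
    vectors = []
    for words in tokenized:
        tf = Counter(words)
        vectors.append([tf[w] * math.log(m / df[w]) for w in vocab])
    return vocab, vectors
```

Under this variant a word that occurs in every document receives weight 0, so only words that distinguish documents contribute to later similarity calculations.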

[0049] Next, in step ST4, the topic classification unit 6 calculates the similarity between document vectors and creates topics by classifying the document vector set 5 into a plurality of sets according to the similarity (topic classification step). Methods of classifying documents fall broadly into two types: document classification, which assigns documents to classification categories given top-down, and clustering, which groups similar documents together bottom-up.

[0050] In document classification, the destination categories are set in advance, and the statistical weights for each classification category are learned beforehand as a classification category vector from sample documents belonging to that category. The similarity between each document vector in the input document vector set 5 and each classification category vector is then calculated, using, for example, the inner product of the vectors. Based on the similarity thus obtained, each document is classified into the classification category with the highest similarity. Clustering, on the other hand, calculates the similarity between all document vectors in the input document vector set 5, merges document vectors with high similarity into a single cluster, and classifies the documents by repeating the similarity calculation and merging on the resulting clusters.
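The two approaches can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: the centroid-based merge rule, the stopping threshold, and the names `classify` and `cluster` are all assumptions made here, since the text only says that similar vectors are merged repeatedly.

```python
def dot(u, v):
    """Inner product, used as the similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def classify(doc_vector, category_vectors):
    """Top-down: return the category whose vector is most similar."""
    return max(category_vectors,
               key=lambda name: dot(doc_vector, category_vectors[name]))


def cluster(doc_vectors, threshold):
    """Bottom-up: merge the two most similar clusters until no pair of
    cluster centroids has similarity >= threshold (assumed stop rule).
    Returns a list of clusters, each a list of document indices."""
    clusters = [[i] for i in range(len(doc_vectors))]

    def centroid(members):
        n = len(members)
        dim = len(doc_vectors[0])
        return [sum(doc_vectors[i][k] for i in members) / n
                for k in range(dim)]

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = dot(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or s > best[0]:
                    best = (s, a, b)
        if best[0] < threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

With the raw inner product, long documents tend to score higher than short ones; normalizing the vectors first (cosine similarity) is a common alternative that the patent neither requires nor excludes.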

[0051] The topic classification unit 6 need only have a function for classifying document vectors; it is not limited to the document classification and clustering described above, and other classification methods (for example, principal component analysis) may be adopted. The topic classification unit 6 treats each classified set as a topic and takes the words with high importance calculated for each topic as its search words. For example, for the words in all documents belonging to a topic, words separately designated as unnecessary are removed, the frequency of occurrence of each remaining word is counted and used as its importance, and a fixed number of words with the highest importance are taken as the search words. The topic classification unit 6 then outputs the document information belonging to each topic and the set 7 of search words with importance for each topic.
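The frequency-based importance described above can be sketched in a few lines of Python. The function name `topic_search_words`, the whitespace tokenization, and the representation of the unnecessary-word list as a set are assumptions for illustration.

```python
from collections import Counter


def topic_search_words(topic_docs, stop_words, top_n):
    """Return the top_n (word, importance) pairs for one topic, where
    importance is the occurrence count over all documents in the topic
    after removing words designated as unnecessary."""
    counts = Counter(
        w
        for doc in topic_docs
        for w in doc.lower().split()
        if w not in stop_words
    )
    return counts.most_common(top_n)
```

For example, `topic_search_words(["the patent search", "patent search engine", "patent"], {"the"}, 2)` returns `[("patent", 3), ("search", 2)]`.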

[0052] Next, in step ST5, if this is the first search by the user 13, the process proceeds to step ST6; otherwise it proceeds to step ST7. In step ST6, the presented search word candidate generation unit 8 selects the presented search word candidates 12 from the set 7 of search words with importance for each topic according to a predetermined criterion, and the process proceeds to step ST8. The predetermined criterion for selecting the presented search word candidates 12 is, for example, to select a fixed number of top-ranked words in order of importance. In step ST7, on the other hand, the narrowing effect index 11 calculated by the narrowing effect estimation unit 10 is attached to each search word candidate, and the search word candidates with a high narrowing effect index 11 are selected as the presented search word candidates 12. The presented search word candidate generation unit 8 outputs the presented search word candidates 12 selected in this way to the classification result presentation unit 14 together with the document information belonging to the corresponding topics (presented search word candidate generation step).

[0053] Next, in step ST8, the classification result presentation unit 14 presents the presented search word candidates 12 to the user 13 as the classification result 15, visualized in association with each topic and together with the narrowing effect index 11 (classification result presentation step). For the visualization, a search word-topic correspondence table such as that shown in FIG. 4 is used, for example. The search word-topic correspondence table expresses the presented search word candidates 12 and the corresponding topics in matrix form, and each element of the matrix presents the narrowing effect index 11 to the user 13 as visualized information. In addition, using the document information belonging to each topic, the classification result presentation unit 14 allows the user 13 to designate any of the topics T1, T2, ..., T6 and then refer to a list of the documents belonging to the designated topic and to the various document information.

[0054] Next, in step ST9, the user 13 refers to the search word-topic correspondence table presented by the classification result presentation unit 14. If the user 13 designates a topic, the process proceeds to step ST10; if not, it proceeds to step ST12. In step ST10, the user 13 inputs the designated topic 18 to the narrowing effect estimation unit 10. Next, in step ST11, the narrowing effect estimation unit 10 obtains the document information belonging to each topic and the set 9 of search words with importance from the presented search word candidate generation unit 8, and calculates the narrowing effect index 11 of the search words belonging to the topic 18 designated by the user 13 (narrowing effect estimation step). As one way of calculating the narrowing effect index 11, the statistical weight with respect to each topic is recalculated for the search words belonging to the designated topic 18. Alternatively, the number of documents containing a search word belonging to the designated topic 18 is used as the narrowing effect index 11. The narrowing effect index 11 calculated in this way is output to the presented search word candidate generation unit 8 and then from that unit to the classification result presentation unit 14, which visualizes it as a search word-topic correspondence table such as that shown in FIG. 4 and presents to the user 13 the narrowing effect index 11 for each pairing of a presented search word candidate 12 belonging to the designated topic 18 with each topic. When the processing of step ST11 ends, the process returns to step ST7.
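The second calculation method in step ST11, which uses a document count as the narrowing effect index, can be sketched as a search word-topic table. This is a hedged illustration: the function name `narrowing_table` and the representation of each document as a set of words are assumptions, and the patent does not specify whether the count is taken per topic or over the whole result; the per-topic reading is used here because it matches the matrix layout of the table.

```python
def narrowing_table(candidate_words, topics):
    """Build table[word][topic] = number of documents in that topic
    that contain the word.

    topics maps a topic name to a list of documents, each document
    given as a set of words (an assumed representation).
    """
    return {
        word: {
            name: sum(1 for doc in docs if word in doc)
            for name, docs in topics.items()
        }
        for word in candidate_words
    }
```

A row whose counts are concentrated in one topic suggests that adding the word to the query would narrow the result toward that topic, while a row spread evenly over all topics suggests little narrowing.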

[0055] If a plurality of topics can be designated as the topic 18 input to the narrowing effect estimation unit 10, the range of topic selection widens, so the user 13 can specify the content of the desired documents more precisely. For example, a fixed number of search words may be selected in descending order of importance from the search words belonging to the designated topics, and the narrowing effect index 11 may be presented to the user 13 for each pairing of a selected presented search word candidate 12 with each topic.

[0056] Calculating the narrowing effect index 11 in the narrowing effect estimation unit 10 per topic rather than per search word makes it possible to grasp the tendency of the narrowing effect index 11 over a broader range, across the entire search result. For example, a weight vector for a topic may be created from the word occurrence frequencies in the documents belonging to the designated topic, the similarity to each topic may be obtained as the inner product of the vectors, and the similarity between the designated topic and each topic may be presented to the user 13 in matrix form.
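A minimal Python sketch of the per-topic calculation follows. It assumes raw word counts as the topic weights and whitespace tokenization; the names `topic_vector` and `topic_similarities` are hypothetical.

```python
from collections import Counter


def topic_vector(docs, vocab):
    """Weight vector of a topic: word occurrence counts over its documents."""
    counts = Counter(w for doc in docs for w in doc.split())
    return [counts[w] for w in vocab]


def topic_similarities(designated, topics, vocab):
    """Inner product between the designated topic's vector and the
    vector of every topic (including the designated one itself)."""
    base = topic_vector(topics[designated], vocab)
    return {
        name: sum(a * b for a, b in zip(base, topic_vector(docs, vocab)))
        for name, docs in topics.items()
    }
```

Calling `topic_similarities` once per designated topic yields the rows of the topic-by-topic similarity matrix that the paragraph describes.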

[0057] In step ST12, on the other hand, if the user 13, referring to the search word-topic correspondence table presented by the classification result presentation unit 14, instructs reclassification without inputting a topic 18 to the narrowing effect estimation unit 10, the process proceeds to step ST13; if reclassification is not instructed, the process ends.

[0058] In step ST13, the user 13 inputs to the classification result presentation unit 14 the instruction information 16 designating topics or search words whose content is close to the documents being sought, and gives a reclassification instruction. The classification result presentation unit 14 outputs information such as the document ID numbers of the documents belonging to the designated topics and the designated search words to the document feature setting unit 20 as the selection information 19 of the user 13. Next, in step ST14, the document feature setting unit 20 changes, on the basis of the selection information 19 of the user 13, the document vectors corresponding to the input document ID numbers and the weights of the designated search words, and outputs the changed document vectors 21 to the topic classification unit 6 (document feature setting step). The weights of the documents and search words belonging to the designated topics are changed, for example, by adding a preset constant to the weights. When the processing of step ST14 ends, the process returns to step ST4.
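One possible reading of the constant-addition rule in step ST14 is sketched below in Python. The text does not say which vector components receive the constant, so this sketch assumes that only nonzero components are raised: every nonzero weight of a selected document, and the weight of a selected search word wherever it occurs. The function name `boost_weights` and the parameter `delta` are hypothetical.

```python
def boost_weights(doc_vectors, vocab, selected_doc_ids, selected_words,
                  delta=1.0):
    """Return new document vectors with a preset constant added.

    doc_vectors maps a document ID to a list of weights aligned with
    vocab. Zero weights are left at zero (an assumption), so a word
    that is absent from a document stays absent.
    """
    word_idx = {vocab.index(w) for w in selected_words if w in vocab}
    boosted = {}
    for doc_id, vec in doc_vectors.items():
        new_vec = list(vec)
        for i, w in enumerate(vec):
            if w == 0:
                continue
            if doc_id in selected_doc_ids:
                new_vec[i] += delta  # document belongs to a chosen topic
            if i in word_idx:
                new_vec[i] += delta  # word was chosen by the user
        boosted[doc_id] = new_vec
    return boosted
```

Because the boosted vectors are fed back into the topic classification of step ST4, documents that share the selected words move closer together under the inner-product similarity.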

[0059] As described above, Embodiment 1 includes the narrowing effect estimation unit 10, which calculates the narrowing effect index 11 of the search words belonging to the topic 18 designated by the user 13, and the classification result presentation unit 14, which visualizes the narrowing effect index 11 and presents it to the user 13, and it presents to the user 13 the narrowing effect index 11 for each pairing of a presented search word candidate 12 belonging to the designated topic 18 with each topic. The user 13 can therefore select additional search words easily, which makes it possible to choose additional search words with a high narrowing effect accurately and improves the efficiency of narrowing searches.

[0060] Embodiment 1 also includes the topic classification unit 6, which creates topics by classifying the document vector set 5 into a plurality of sets according to the similarity between document vectors calculated from the document vector set 5, and the classification result presentation unit 14, which visualizes the narrowing effect index 11 of the search words for each topic as a search word-topic correspondence table and presents it to the user 13. Because the narrowing effect is presented per topic, a unit larger than a document, the narrowing effect over all documents in the search result can be surveyed at a glance, and the content of the entire search result can be grasped efficiently.

[0061] Embodiment 1 further includes the document feature setting unit 20, which changes the document vectors on the basis of the instruction information 16 that the user 13 gives as feedback, and reclassifies the initial search results. Target documents that were scattered across topics are thereby gathered into a single topic, which improves the efficiency of narrowing the search down to the target documents.

[0062] Embodiment 2. FIG. 5 is a block diagram showing the configuration of a document search apparatus according to Embodiment 2 of the present invention. In FIG. 5, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Reference numeral 31 denotes a topic classification item acquisition unit (topic classification item acquisition means) that extracts, from the documents in the search result document set 3, words characterizing the documents as topic classification items and learns a weight vector for each topic classification item by referring to the text related to the extracted item. Reference numeral 32 denotes the topic classification item information, comprising the topic classification items and weight vectors, that the topic classification item acquisition unit 31 outputs to the topic classification unit 6.

[0063] Next, the operation is described. In Embodiment 2, the operations of the document set 1, the document search unit 2, the document feature extraction unit 4, the presented search word candidate generation unit 8, the narrowing effect estimation unit 10, the user 13, the classification result presentation unit 14, the document feature setting unit 20, and so on, and the effects they provide, are the same as in Embodiment 1, and their description is omitted.

[0064] The topic classification item acquisition unit 31 extracts, from the documents in the search result document set 3, words characterizing the documents as topic classification items, learns a weight vector for each topic classification item by referring to the text related to the extracted item, and outputs the topic classification items together with the weight vectors to the topic classification unit 6 as the topic classification item information 32 (topic classification item acquisition step).

[0065] Topic classification items are, for example, words with a high rate of occurrence in a document, the title of an HTML document obtained by referring to the HTML tags described in it, and the domain name contained in its URL. The text related to an extracted topic classification item is obtained, for example, by analyzing the text around the position where the topic classification item occurs, detecting specific HTML tags such as those for paragraph breaks, section headings, bulleted lists, and link destinations, and copying the text associated with the detected HTML tags.
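Two of the topic classification items named above, the HTML title and the domain name in the URL, can be extracted with Python's standard library, as in the following sketch. The class `TitleParser` and the function `topic_items` are hypothetical names, and the sketch covers only these two items, not the surrounding-text analysis.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def topic_items(html, url):
    """Return the document title and the host name of its URL as
    candidate topic classification items."""
    parser = TitleParser()
    parser.feed(html)
    return {"title": parser.title.strip(),
            "domain": urlparse(url).hostname}
```

The same parser pattern could be extended to headings, list items, or anchors by tracking additional tag names, which corresponds to the tag-based text extraction described in the paragraph.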

[0066] The topic classification item acquisition unit 31 also uses the text related to the extracted topic classification items to learn a weight vector for each topic classification item, and outputs the topic classification items together with the weight vectors to the topic classification unit 6. The topic classification unit 6 calculates the similarity between the topic classification item information 32 input from the topic classification item acquisition unit 31 and each document vector of the document vector set 5 input from the document feature extraction unit 4, for example as the inner product of the vectors, and classifies each document into the topic classification item with the highest similarity.

[0067] As described above, Embodiment 2 provides the same effects as Embodiment 1. In addition, it includes the topic classification item acquisition unit 31, which extracts topic classification items from the documents in the search result document set 3 and outputs them together with the weight vectors for those items, and it classifies documents on the basis of the similarity between the topic classification item information 32 and each document vector. The topic classification items used by the topic classification unit 6 can therefore be acquired automatically and classification categories need not be set in advance, so the work of setting document items for the search results becomes unnecessary and the efficiency of the narrowing search by the user 13 improves.

[0068] Embodiment 3. FIG. 6 is a block diagram showing the configuration of a document search apparatus according to Embodiment 3 of the present invention. In FIG. 6, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Reference numeral 41 denotes a designated document feature extraction unit (designated document feature extraction means) that calculates a document vector from a document designated by the user 13; 42 denotes the document vector that the designated document feature extraction unit 41 outputs to the document feature setting unit 20; and 43 denotes the designated document that the user 13 outputs to the designated document feature extraction unit 41.

[0069] Next, the operation is described. In Embodiment 3, the operations of the document set 1, the document search unit 2, the document feature extraction unit 4, the topic classification unit 6, the presented search word candidate generation unit 8, the narrowing effect estimation unit 10, the classification result presentation unit 14, and so on, and the effects they provide, are the same as in Embodiment 1, and their description is omitted.

[0070] The user 13 refers to the classification result presentation unit 14, selects a document whose content is close to the desired documents, and indicates the designated document 43 to the designated document feature extraction unit 41. The designated document feature extraction unit 41 calculates statistical weights from the number of occurrences of the words contained in the designated document 43 to obtain the document vector 42, and outputs it to the document feature setting unit 20. The document feature setting unit 20 calculates the similarity between the document vector 42 of the designated document 43 and the document vector set 5 input from the document feature extraction unit 4, and changes the weights of the document vectors in the document vector set 5 (designated document feature extraction step). For example, a fixed number of document vectors are selected from the document vector set 5 in descending order of similarity, and the similarity is added to the weights of each selected document vector. The topic classification unit 6 then classifies the changed document vector set 5.
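The feedback rule in this paragraph can be sketched in Python as follows. The inner product is assumed as the similarity measure (the paragraph does not name one), and the similarity is added only to the nonzero components of each selected vector, which is one interpretation of "added to the weights". The function name `apply_document_feedback` is hypothetical.

```python
def apply_document_feedback(doc_vectors, designated_vector, top_k):
    """Raise the weights of the top_k document vectors most similar to
    the vector of the user-designated document.

    Returns a new list of vectors; the input list is not modified.
    """
    sims = [sum(a * b for a, b in zip(vec, designated_vector))
            for vec in doc_vectors]
    ranked = sorted(range(len(doc_vectors)),
                    key=lambda j: sims[j], reverse=True)[:top_k]
    updated = [list(vec) for vec in doc_vectors]
    for j in ranked:
        # Add the similarity to every nonzero weight of document j.
        updated[j] = [w + sims[j] if w != 0 else w for w in updated[j]]
    return updated
```

Documents that resemble the designated document end up with larger weights, so the subsequent classification by the topic classification unit 6 tends to group them into the same topic.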

【００７１】以上のように、この実施の形態３によれ
ば、実施の形態１と同様の効果を奏すると共に、ユーザ
１３から指定された文書から文書ベクトルを算出する指
定文書特徴抽出部４１を備え、指定文書特徴抽出部４１
が出力する文書ベクトル４２と文書特徴抽出部４が出力
した文書ベクトル集合５との類似度を計算し、文書ベク
トルの重みを変更するようにしたので、ユーザ１３が所
望する文書に近い内容の文書４３を直接指定することに
よって、文書特徴設定部２０の文書ベクトルの変更が可
能になると共に、ユーザ１３の絞り込み検索の効率がよ
くなるという効果が得られる。As described above, Embodiment 3 provides the same effects as Embodiment 1. In addition, it includes the designated document feature extraction unit 41, which calculates a document vector from the document designated by the user 13, and it calculates the similarity between the document vector 42 output by the designated document feature extraction unit 41 and the document vector set 5 output by the document feature extraction unit 4 in order to change the weights of the document vectors. The user 13 can therefore change the document vectors in the document feature setting unit 20 by directly designating a document 43 whose content is close to the desired document, and the efficiency of the user 13's narrowing search improves.

【００７２】実施の形態４．図７は、この発明の実施の
形態４による文書検索装置の構成を示すブロック図であ
る。図７において、図１と同一符号は同一または相当部
分を示すのでその説明を省略する。５１は単語と当該単
語に関連する関連語が定義されている関連語辞書（第１
の記録手段）、５２は検索語が入力されると検索語に関
連する関連語を関連語辞書５１から選択し関連語を出力
する関連語設定部（関連語設定手段）、５３は文書特徴
設定部２０が出力し関連語設定部５２に入力する検索
語、５４は関連語設定部５２が出力し文書特徴設定部２
０に入力する関連語である。Embodiment 4. FIG. 7 is a block diagram showing the configuration of a document search device according to Embodiment 4 of the present invention. In FIG. 7, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Reference numeral 51 denotes a related word dictionary (first recording means) in which words and the related words associated with them are defined; 52 denotes a related word setting unit (related word setting means) which, when a search word is input, selects the related words associated with that search word from the related word dictionary 51 and outputs them; 53 denotes the search word output by the document feature setting unit 20 and input to the related word setting unit 52; and 54 denotes the related words output by the related word setting unit 52 and input to the document feature setting unit 20.

【００７３】図８は、この発明の実施の形態４における
単語と関連語とを定義した一例を示す説明図である。図
８において、関連語は、異表記と類似語とを関連語とし
ており、それぞれの行に記述される単語と対応してい
る。FIG. 8 is an explanatory diagram showing an example of definitions of words and related words in Embodiment 4 of the present invention. In FIG. 8, the related words consist of variant notations and similar words, each corresponding to the word listed in the same row.

【００７４】次に動作について説明する。実施の形態４
において、文書集合１、文書検索部２、文書特徴抽出部
４、話題分類部６、提示検索語候補生成部８、絞り込み
効果推定部１０、ユーザ１３、分類結果提示部１４等の
動作、及びこれらが奏する効果については、実施の形態
１と同様であるのでその説明を省略する。Next, the operation will be described. In Embodiment 4, the operations of the document set 1, the document search unit 2, the document feature extraction unit 4, the topic classification unit 6, the presented search word candidate generation unit 8, the narrowing effect estimation unit 10, the user 13, the classification result presentation unit 14, and the like, and the effects they produce, are the same as in Embodiment 1, so their description is omitted.

【００７５】関連語辞書５１には、図８に示されたよう
に、所定の単語と当該単語に関連する単語とを予め記録
しておく（第１の記録ステップ）。関連語設定部５２
は、関連語辞書５１を参照し、入力した検索語５３に対
応する異表記と類似語とを抽出して関連語５４とし、文
書特徴設定部２０に関連語５４を出力する（関連語設定
ステップ）。文書特徴設定部２０は、関連語設定部５２
から入力した関連語５４を、分類結果提示部１４から入
力したユーザ１３の選択情報１９に追加し、文書特徴抽
出部４から入力した文書ベクトル集合５を変更する。As shown in FIG. 8, predetermined words and the words related to them are recorded in advance in the related word dictionary 51 (first recording step). The related word setting unit 52 refers to the related word dictionary 51, extracts the variant notations and similar words corresponding to the input search word 53 as the related words 54, and outputs the related words 54 to the document feature setting unit 20 (related word setting step). The document feature setting unit 20 adds the related words 54 input from the related word setting unit 52 to the selection information 19 of the user 13 input from the classification result presentation unit 14, and changes the document vector set 5 input from the document feature extraction unit 4.
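The related-word lookup in this paragraph might be sketched as below. The dictionary entries are invented for illustration and do not reproduce FIG. 8; the names `RELATED_WORD_DICTIONARY`, `related_words`, and `expand_selection` are likewise hypothetical, and the selection information 19 is assumed to be a simple list of search words.

```python
# Hypothetical stand-in for the related word dictionary 51: each word maps to
# its variant notations and similar words (entries are illustrative only).
RELATED_WORD_DICTIONARY = {
    "computer": {"variants": ["computor"], "similar": ["PC"]},
    "printer": {"variants": [], "similar": ["plotter"]},
}

def related_words(search_word, dictionary):
    """Return the variant notations and similar words for one search word."""
    entry = dictionary.get(search_word)
    if entry is None:
        return []
    return entry["variants"] + entry["similar"]

def expand_selection(selection, dictionary):
    """Add the related words of every selected search word to the selection,
    preserving order and skipping duplicates."""
    expanded = list(selection)
    for word in selection:
        for related in related_words(word, dictionary):
            if related not in expanded:
                expanded.append(related)
    return expanded

expanded = expand_selection(["computer", "network"], RELATED_WORD_DICTIONARY)
```

Words without a dictionary entry ("network" here) pass through unchanged, so the expansion only ever widens the selection.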

【００７６】以上のように、この実施の形態４によれ
ば、実施の形態１と同様の効果を奏すると共に、関連語
辞書５１を参照して関連語５４を出力する関連語設定部
５２を備え、関連語設定部５２から入力する関連語５４
を、分類結果提示部１４から入力したユーザ１３の選択
情報１９に追加し、文書ベクトル集合５を変更するよう
にしたので、検索語５３の異表記や類似語などを含む文
書が検索できるようになるから、検索漏れが抑制される
と共に、ユーザ１３が所望する文書を発見するための検
索の効率がよくなるという効果が得られる。As described above, Embodiment 4 provides the same effects as Embodiment 1. In addition, it includes the related word setting unit 52, which outputs the related words 54 by referring to the related word dictionary 51, and the related words 54 input from the related word setting unit 52 are added to the selection information 19 of the user 13 input from the classification result presentation unit 14 in order to change the document vector set 5. Documents containing variant notations or similar words of the search word 53 can therefore be retrieved, so search omissions are suppressed and the search for the documents desired by the user 13 becomes more efficient.

【００７７】実施の形態５．図９は、この発明の実施の
形態５による文書検索装置の構成を示すブロック図であ
る。図９において、図１と同一符号は同一または相当部
分を示すのでその説明を省略する。６１は磁気記録装置
などで構成され検索要求の作成知識を記録する検索要求
作成知識（第２の記録手段）、６２は検索要求検索語が
入力されると検索要求検索語に基づく最適な検索要求文
を検索要求作成知識６１から選択し検索要求文を出力す
る検索要求作成部（検索要求作成手段）、６３は提示検
索語候補生成部８が出力し検索要求作成部６２に入力す
る検索要求検索語、６４は検索要求作成部６２が出力し
文書検索部２に入力する検索要求文である。Embodiment 5. FIG. 9 is a block diagram showing the configuration of a document search device according to Embodiment 5 of the present invention. In FIG. 9, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Reference numeral 61 denotes search request creation knowledge (second recording means), which is held on a magnetic recording device or the like and records the knowledge for creating search requests; 62 denotes a search request creation unit (search request creation means) which, when search request search words are input, selects the most suitable search request sentence for those words from the search request creation knowledge 61 and outputs it; 63 denotes the search request search words output by the presented search word candidate generation unit 8 and input to the search request creation unit 62; and 64 denotes the search request sentence output by the search request creation unit 62 and input to the document search unit 2.

【００７８】次に動作について説明する。実施の形態５
において、文書集合１、文書特徴抽出部４、話題分類部
６、絞り込み効果推定部１０、ユーザ１３、分類結果提
示部１４、文書特徴設定部２０等の動作、及びこれらが
奏する効果については、実施の形態１と同様であるので
その説明を省略する。Next, the operation will be described. In Embodiment 5, the operations of the document set 1, the document feature extraction unit 4, the topic classification unit 6, the narrowing effect estimation unit 10, the user 13, the classification result presentation unit 14, the document feature setting unit 20, and the like, and the effects they produce, are the same as in Embodiment 1, so their description is omitted.

【００７９】提示検索語候補生成部８は、絞り込み効果
推定部１０から入力された絞り込み効果指標１１を参照
し、絞り込み効果指標１１の高い検索語を検索要求作成
部６２に検索要求検索語６３として出力する。検索要求
検索語６３は複数であってもよい。検索要求作成知識６
１には、文書検索部２に対して検索処理の実行を指示す
る検索要求文６４を作成するための知識が予め定義され
ている（第２の記録ステップ）。例えば、検索要求文６
４は、検索命令と検索条件との２種類から構成される。
検索命令としては、例えば、＜検索実行＞，＜実行状態
取得＞，＜データベース指定＞などのコマンドの種類を
定義する。また、検索条件としては、検索語，検索語間
の論理演算子，検索結果として得る情報の指定などのパ
ラメータの記述形式を定義する。The presented search word candidate generation unit 8 refers to the narrowing effect index 11 input from the narrowing effect estimation unit 10 and outputs search words with a high narrowing effect index 11 to the search request creation unit 62 as the search request search words 63. There may be more than one search request search word 63. The search request creation knowledge 61 defines in advance the knowledge needed to create the search request sentence 64, which instructs the document search unit 2 to execute search processing (second recording step). For example, the search request sentence 64 consists of two kinds of elements, a search command and search conditions. For the search command, command types such as <search execution>, <execution status acquisition>, and <database designation> are defined. For the search conditions, the description format of parameters such as the search words, the logical operators between search words, and the specification of the information to be obtained as search results is defined.

【００８０】検索要求作成部６２は、検索要求作成知識
６１を参照して、検索要求文６４の定義に従って検索要
求検索語６３を検索条件に設定する（検索要求作成ステ
ップ）。検索要求文６４の検索命令は、例えば＜検索実
行＞とし、検索要求文６４を作成して文書検索部２に出
力する。また、検索要求文６４の検索条件を設定する際
には、複数の検索要求検索語６３に対して例えばＡＮＤ
演算子で記述し、検索要求検索語６３に付与された絞り
込み効果指標１１が予め設定した閾値以上であれば、絞
り込み効果が高いとみなして、より広範囲の文書を検索
するように、ＯＲ演算子で記述することもできる。The search request creation unit 62 refers to the search request creation knowledge 61 and sets the search request search words 63 in the search conditions in accordance with the definition of the search request sentence 64 (search request creation step). The search command of the search request sentence 64 is set to, for example, <search execution>, and the search request sentence 64 is created and output to the document search unit 2. When setting the search conditions of the search request sentence 64, multiple search request search words 63 are joined with, for example, the AND operator. Alternatively, if the narrowing effect index 11 assigned to the search request search words 63 is at or above a preset threshold, the narrowing effect is regarded as high and the words can be joined with the OR operator so that a wider range of documents is searched.
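The construction of the search request sentence can be illustrated with a minimal sketch. Several points are assumptions rather than details from the text: the request is modeled as a dictionary with a command and a condition string, the threshold value is arbitrary, and OR is chosen only when every search word meets the threshold (the text does not say how the rule applies across multiple words). The function name `build_search_request` is hypothetical.

```python
def build_search_request(words_with_index, threshold=0.5):
    """Build a search request from (search_word, narrowing_effect_index) pairs.

    Assumption: words are joined with OR only if every word's narrowing
    effect index is at or above the threshold; otherwise AND is used.
    """
    if not words_with_index:
        raise ValueError("at least one search word is required")
    use_or = all(index >= threshold for _, index in words_with_index)
    operator = " OR " if use_or else " AND "
    return {
        # Corresponds to the <search execution> command type in the text.
        "command": "search_execution",
        "condition": operator.join(word for word, _ in words_with_index),
    }

wide = build_search_request([("patent", 0.8), ("retrieval", 0.9)])
narrow = build_search_request([("patent", 0.8), ("retrieval", 0.2)])
```

Here `wide` joins its words with OR because both indices clear the threshold, while `narrow` falls back to AND, mirroring the trade-off between breadth and precision described above.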

【００８１】以上のように、この実施の形態５によれ
ば、実施の形態１と同様の効果を奏すると共に、検索要
求作成知識６１を参照して検索要求文６４を作成する検
索要求作成部６２を備え、検索要求作成部６２が出力す
る検索要求文６４に基づいて再び検索するようにしたの
で、初回の検索結果に含まれなかったユーザ１３が所望
する文書を文書集合１から自動的に検索できるようにな
ると共に、ユーザ１３の絞り込み検索の効率がよくなる
という効果が得られる。As described above, Embodiment 5 provides the same effects as Embodiment 1. In addition, it includes the search request creation unit 62, which creates the search request sentence 64 by referring to the search request creation knowledge 61, and the search is executed again based on the search request sentence 64 output by the search request creation unit 62. Documents desired by the user 13 that were not included in the initial search results can therefore be retrieved automatically from the document set 1, and the efficiency of the user 13's narrowing search improves.

【００８２】[0082]

【発明の効果】以上のように、この発明によれば、文書
集合から検索条件に適合する文書を検索し出力する文書
検索手段と、文書に記述されている単語の出現頻度に基
づいて、単語の統計的な重みを算出し、重みから得られ
る文書ベクトル集合を出力する文書特徴抽出手段と、文
書ベクトル集合を、文書ベクトル間の類似度に従って分
類することによって話題を作成し、話題に属する文書情
報と話題の重要度付き検索語とを出力する話題分類手段
と、文書情報と重要度付き検索語とを参照して話題に属
する検索語の絞り込み効果指標を算出し出力する絞り込
み効果推定手段と、絞り込み効果指標を検索語に付与
し、絞り込み効果指標が高い検索語を選択して提示検索
語候補とし、当該提示検索語候補と当該提示検索語候補
に対応する文書情報とを出力する提示検索語候補生成手
段と、提示検索語候補と話題とを対応付けて、絞り込み
効果指標と共に提示し、指示情報または選択情報のどち
らか一方もしくは両方を入力するように促す分類結果提
示手段と、選択情報に基づいて、文書ベクトル集合に含
まれる文書ベクトルを変更して出力する文書特徴設定手
段とを備えるように構成したので、絞り込み検索のため
に提示する提示検索語候補に対して絞り込み効果指標を
付与したから、追加検索語の選択を容易にすることがで
き、絞り込み検索を効率よく実行できるという効果を奏
する。[Effects of the Invention] As described above, the present invention comprises: document search means for searching a document set for documents that match a search condition and outputting them; document feature extraction means for calculating statistical weights of words based on the frequency of occurrence of the words in the documents and outputting a document vector set obtained from the weights; topic classification means for creating topics by classifying the document vector set according to the similarity between document vectors, and outputting the document information belonging to each topic and the importance-weighted search words of the topic; narrowing effect estimation means for calculating and outputting a narrowing effect index for the search words belonging to a topic by referring to the document information and the importance-weighted search words; presented search word candidate generation means for assigning the narrowing effect index to the search words, selecting search words with a high narrowing effect index as presented search word candidates, and outputting the presented search word candidates together with the corresponding document information; classification result presentation means for presenting the presented search word candidates in association with the topics together with the narrowing effect index, and prompting input of instruction information, selection information, or both; and document feature setting means for changing the document vectors included in the document vector set based on the selection information and outputting them. Because a narrowing effect index is assigned to the presented search word candidates offered for a narrowing search, additional search words are easy to select and the narrowing search can be executed efficiently.

【００８３】この発明によれば、絞り込み効果推定手段
が、少なくとも一つまたは複数の話題に対して絞り込み
効果指標を算出し出力するように構成したので、文書単
位よりも大きな単位である話題単位で絞り込み検索を実
行できるから、より広い範囲の検索結果に対する絞り込
み効果指標の傾向が把握できると共に、ユーザが所望す
る文書の内容を的確に指定できるという効果を奏する。According to the present invention, the narrowing effect estimation means calculates and outputs the narrowing effect index for one or more topics, so the narrowing search can be executed in units of topics, which are larger than individual documents. As a result, the tendency of the narrowing effect index can be grasped over a wider range of search results, and the user can accurately specify the content of the desired documents.

【００８４】この発明によれば、分類結果提示手段が、
話題と当該話題に属する提示検索語候補とを行列の形式
で提示し、行列の各要素に絞り込み効果指標を提示する
ように構成したので、検索結果の文書全体に対する絞り
込み効果の一覧性が高まり、検索結果全体の内容を効率
的に把握できるという効果を奏する。According to the present invention, the classification result presentation means presents the topics and the presented search word candidates belonging to them in the form of a matrix, with the narrowing effect index shown in each element of the matrix. This makes the narrowing effect over all the documents in the search results easier to survey at a glance, so the content of the search results as a whole can be grasped efficiently.

【００８５】この発明によれば、文書検索手段が出力す
る文書から当該文書を特徴付ける単語を話題分類項目と
して抽出し、話題分類項目に関連するテキストを参照し
て、話題分類項目に対する重みベクトルを学習し、話題
分類項目と重みベクトルとを話題分類手段に出力する話
題分類項目取得手段を備え、話題分類手段は、話題分類
項目と重みベクトルとに基づいて文書ベクトル集合を分
類することによって話題を作成するように構成したの
で、話題分類手段で用いる話題分類項目が自動的に取得
でき、予め分類カテゴリを設定しておく必要がなくなる
から、検索結果の文書項目の設定作業が不要になると共
に、絞り込み検索の効率がよくなるという効果を奏す
る。According to the present invention, topic classification item acquisition means is provided which extracts, from the documents output by the document search means, words that characterize those documents as topic classification items, learns a weight vector for each topic classification item by referring to text related to that item, and outputs the topic classification items and weight vectors to the topic classification means; the topic classification means creates topics by classifying the document vector set based on the topic classification items and weight vectors. The topic classification items used by the topic classification means are therefore acquired automatically and classification categories need not be set in advance, so the work of setting document items for the search results becomes unnecessary and the efficiency of the narrowing search improves.

【００８６】この発明によれば、話題分類項目取得手段
が、文書検索手段から出力される文書から当該文書に記
述されている単語の出現頻度に基づいて話題分類項目を
抽出するように構成したので、話題分類項目を自動的に
効率よく抽出することができるという効果を奏する。According to the present invention, the topic classification item obtaining means is configured to extract the topic classification items from the document output from the document search means based on the frequency of appearance of the word described in the document. This has the effect of automatically and efficiently extracting topic classification items.

【００８７】この発明によれば、話題分類項目取得手段
が、文書検索手段から出力される文書から当該文書に記
述されているタグを参照して話題分類項目を抽出するよ
うに構成したので、タグが記述されている文書から話題
分類項目を自動的に効率よく抽出することができるとい
う効果を奏する。According to the present invention, the topic classification item acquisition means extracts topic classification items from the documents output by the document search means by referring to the tags written in those documents, so topic classification items can be extracted automatically and efficiently from documents that contain tags.

【００８８】この発明によれば、分類結果提示手段を介
して指定された文書から文書ベクトルを算出し、文書特
徴設定手段に出力する指定文書特徴抽出手段を備え、文
書特徴設定手段が、指定文書特徴抽出手段から出力され
る文書ベクトルと、文書特徴抽出手段から出力される文
書ベクトル集合とに基づいて文書ベクトル集合を変更す
るように構成したので、ユーザが所望する文書に近い内
容の文書を直接指定することによって、文書特徴設定手
段の文書ベクトルの変更が可能になると共に、絞り込み
検索の効率がよくなるという効果を奏する。According to the present invention, designated document feature extraction means is provided which calculates a document vector from a document designated via the classification result presentation means and outputs it to the document feature setting means, and the document feature setting means changes the document vector set based on the document vector output by the designated document feature extraction means and the document vector set output by the document feature extraction means. The user can therefore change the document vectors in the document feature setting means by directly designating a document whose content is close to the desired document, and the efficiency of the narrowing search improves.

【００８９】この発明によれば、所定の単語と関連する
単語を定義し関連語として記録する第１の記録手段と、
指定された検索語に対応する関連語を第１の記録手段か
ら抽出して文書特徴設定手段に出力する関連語設定手段
とを備え、文書特徴設定手段が、関連語と分類結果提示
手段から入力した選択情報とに基づいて、文書ベクトル
集合を変更するように構成したので、検索語の異表記や
類似語などを含む文書が検索できるようになるから、検
索漏れが抑制されると共に、ユーザが所望する文書を発
見するための検索の効率がよくなるという効果を奏す
る。According to the present invention, first recording means is provided which defines words related to predetermined words and records them as related words, together with related word setting means which extracts the related words corresponding to a specified search word from the first recording means and outputs them to the document feature setting means; the document feature setting means changes the document vector set based on the related words and the selection information input from the classification result presentation means. Documents containing variant notations or similar words of the search word can therefore be retrieved, so search omissions are suppressed and the search for the documents the user wants becomes more efficient.

【００９０】この発明によれば、検索要求文の作成知識
を記録する第２の記録手段と、当該第２の記録手段を参
照して、提示検索語候補生成手段が出力した検索語に対
応する検索要求文を作成し、文書検索手段に出力する検
索要求作成手段とを備え、提示検索語候補生成手段が、
絞り込み効果指標に基づいて検索要求作成手段に出力す
る検索語を選択するように構成したので、初回の検索結
果に含まれなかったユーザが所望する文書を文書集合か
ら自動的に検索できるようになると共に、ユーザの絞り
込み検索の効率がよくなるという効果を奏する。According to the present invention, second recording means is provided which records the knowledge for creating search request sentences, together with search request creation means which refers to the second recording means, creates a search request sentence corresponding to the search words output by the presented search word candidate generation means, and outputs it to the document search means; the presented search word candidate generation means selects the search words to output to the search request creation means based on the narrowing effect index. Documents desired by the user that were not included in the initial search results can therefore be retrieved automatically from the document set, and the efficiency of the user's narrowing search improves.

【００９１】この発明によれば、提示検索語候補生成手
段が、複数の検索語を選択して検索要求作成手段に出力
し、検索要求作成手段が、複数の検索語に対する論理演
算から検索要求文を作成するように構成したので、ＡＮ
Ｄ演算ではより的確に絞り込み検索が実行でき、ＯＲ演
算ではより広範囲に絞り込み検索が実行できるという効
果を奏する。According to the present invention, the presented search word candidate generation means selects multiple search words and outputs them to the search request creation means, and the search request creation means creates the search request sentence from a logical operation on those search words. With the AND operation the narrowing search can be executed more precisely, and with the OR operation it can be executed over a wider range.

【００９２】この発明によれば、文書集合から検索条件
に適合する文書を検索し出力する文書検索ステップと、
文書に記述されている単語の出現頻度に基づいて、単語
の統計的な重みを算出し、重みから得られる文書ベクト
ル集合を出力する文書特徴抽出ステップと、文書ベクト
ル集合を、文書ベクトル間の類似度に従って分類するこ
とによって話題を作成し、話題に属する文書情報と話題
の重要度付き検索語とを出力する話題分類ステップと、
文書情報と重要度付き検索語とを参照して話題に属する
検索語の絞り込み効果指標を算出し出力する絞り込み効
果推定ステップと、絞り込み効果指標を検索語に付与
し、絞り込み効果指標が高い検索語を選択して提示検索
語候補とし、当該提示検索語候補と当該提示検索語候補
に対応する文書情報とを出力する提示検索語候補生成ス
テップと、提示検索語候補と話題とを対応付けて、絞り
込み効果指標と共に提示し、指示情報または選択情報の
どちらか一方もしくは両方を入力するように促す分類結
果提示ステップと、選択情報に基づいて、文書ベクトル
集合に含まれる文書ベクトルを変更して出力する文書特
徴設定ステップとを有するように構成したので、絞り込
み検索のために提示する提示検索語候補に対して絞り込
み効果指標を付与したから、追加検索語の選択を容易に
することができ、絞り込み検索を効率よく実行できると
いう効果を奏する。According to the present invention, the method comprises: a document search step of searching a document set for documents that match a search condition and outputting them; a document feature extraction step of calculating statistical weights of words based on the frequency of occurrence of the words in the documents and outputting a document vector set obtained from the weights; a topic classification step of creating topics by classifying the document vector set according to the similarity between document vectors, and outputting the document information belonging to each topic and the importance-weighted search words of the topic; a narrowing effect estimation step of calculating and outputting a narrowing effect index for the search words belonging to a topic by referring to the document information and the importance-weighted search words; a presented search word candidate generation step of assigning the narrowing effect index to the search words, selecting search words with a high narrowing effect index as presented search word candidates, and outputting the presented search word candidates together with the corresponding document information; a classification result presentation step of presenting the presented search word candidates in association with the topics together with the narrowing effect index, and prompting input of instruction information, selection information, or both; and a document feature setting step of changing the document vectors included in the document vector set based on the selection information and outputting them. Because a narrowing effect index is assigned to the presented search word candidates offered for a narrowing search, additional search words are easy to select and the narrowing search can be executed efficiently.

【００９３】この発明によれば、絞り込み効果推定ステ
ップが、少なくとも一つまたは複数の話題に対して絞り
込み効果指標を算出し出力するように構成したので、文
書単位よりも大きな単位である話題単位で絞り込み検索
を実行できるから、より広い範囲の検索結果に対する絞
り込み効果指標の傾向が把握できると共に、ユーザが所
望する文書の内容を的確に指定できるという効果を奏す
る。According to the present invention, the narrowing effect estimation step calculates and outputs the narrowing effect index for one or more topics, so the narrowing search can be executed in units of topics, which are larger than individual documents. As a result, the tendency of the narrowing effect index can be grasped over a wider range of search results, and the user can accurately specify the content of the desired documents.

【００９４】この発明によれば、分類結果提示ステップ
が、話題と当該話題に属する提示検索語候補とを行列の
形式で提示し、行列の各要素に絞り込み効果指標を提示
するように構成したので、検索結果の文書全体に対する
絞り込み効果の一覧性が高まり、検索結果全体の内容を
効率的に把握できるという効果を奏する。According to the present invention, the classification result presenting step is configured to present the topic and the proposed search word candidates belonging to the topic in the form of a matrix, and to present a narrowing effect index to each element of the matrix. In addition, it is possible to enhance the listability of the narrowing-down effect of the search result with respect to the entire document, and it is possible to efficiently grasp the contents of the entire search result.

【００９５】この発明によれば、文書検索ステップが出
力する文書から当該文書を特徴付ける単語を話題分類項
目として抽出し、話題分類項目に関連するテキストを参
照して、話題分類項目に対する重みベクトルを学習し、
話題分類項目と重みベクトルとを出力する話題分類項目
取得ステップを有し、話題分類ステップが、話題分類項
目と重みベクトルとに基づいて文書ベクトル集合を分類
することによって話題を作成するように構成したので、
話題分類ステップで用いる話題分類項目が自動的に取得
でき、予め分類カテゴリを設定しておく必要がなくなる
から、検索結果の文書項目の設定作業が不要になると共
に、絞り込み検索の効率がよくなるという効果を奏す
る。According to the present invention, a topic classification item acquisition step is provided which extracts, from the documents output by the document search step, words that characterize those documents as topic classification items, learns a weight vector for each topic classification item by referring to text related to that item, and outputs the topic classification items and weight vectors; the topic classification step creates topics by classifying the document vector set based on the topic classification items and weight vectors. The topic classification items used in the topic classification step are therefore acquired automatically and classification categories need not be set in advance, so the work of setting document items for the search results becomes unnecessary and the efficiency of the narrowing search improves.

【００９６】この発明によれば、話題分類項目取得ステ
ップが、文書検索ステップから出力される文書から当該
文書に記述されている単語の出現頻度に基づいて話題分
類項目を抽出するように構成したので、話題分類項目を
自動的に効率よく抽出することができるという効果を奏
する。According to the present invention, the topic classification item obtaining step is configured to extract a topic classification item from a document output from the document search step based on the frequency of occurrence of a word described in the document. This has the effect of automatically and efficiently extracting topic classification items.

【００９７】この発明によれば、話題分類項目取得ステ
ップが、文書検索ステップから出力される文書から当該
文書に記述されているタグを参照して話題分類項目を抽
出するように構成したので、タグが記述されている文書
から話題分類項目を自動的に効率よく抽出することがで
きるという効果を奏する。According to the present invention, the topic classification item acquisition step extracts topic classification items from the documents output by the document search step by referring to the tags written in those documents, so topic classification items can be extracted automatically and efficiently from documents that contain tags.

【００９８】この発明によれば、分類結果提示ステップ
を介して指定された文書から文書ベクトルを算出し出力
する指定文書特徴抽出ステップを有し、文書特徴設定ス
テップが、指定文書特徴抽出ステップから出力された文
書ベクトルと、文書特徴抽出ステップから出力された文
書ベクトル集合とに基づいて文書ベクトル集合を変更す
るように構成したので、ユーザが所望する文書に近い内
容の文書を直接指定することによって、文書特徴設定ス
テップの文書ベクトルの変更が可能になると共に、絞り
込み検索の効率がよくなるという効果を奏する。According to the present invention, a designated document feature extraction step is provided which calculates and outputs a document vector from a document designated via the classification result presentation step, and the document feature setting step changes the document vector set based on the document vector output by the designated document feature extraction step and the document vector set output by the document feature extraction step. The user can therefore change the document vectors in the document feature setting step by directly designating a document whose content is close to the desired document, and the efficiency of the narrowing search improves.

【００９９】この発明によれば、所定の単語と関連する
単語を定義し関連語として記録する第１の記録ステップ
と、指定された検索語に対応する関連語を抽出して出力
する関連語設定ステップとを有し、文書特徴設定ステッ
プが、関連語と分類結果提示ステップから入力した選択
情報とに基づいて、文書ベクトル集合を変更するように
構成したので、検索語の異表記や類似語などを含む文書
が検索できるようになるから、検索漏れが抑制されると
共に、ユーザが所望する文書を発見するための検索の効
率がよくなるという効果を奏する。According to the present invention, a first recording step is provided which defines words related to predetermined words and records them as related words, together with a related word setting step which extracts and outputs the related words corresponding to a specified search word; the document feature setting step changes the document vector set based on the related words and the selection information input from the classification result presentation step. Documents containing variant notations or similar words of the search word can therefore be retrieved, so search omissions are suppressed and the search for the documents the user wants becomes more efficient.

【０１００】この発明によれば、検索要求文の作成知識
を記録する第２の記録ステップと、提示検索語候補生成
ステップが出力した検索語に対応する検索要求文を作成
し出力する検索要求作成ステップとを有し、提示検索語
候補生成ステップが、絞り込み効果指標に基づいて出力
する検索語を選択するように構成したので、初回の検索
結果に含まれなかったユーザが所望する文書を文書集合
から自動的に検索できるようになると共に、ユーザの絞
り込み検索の効率がよくなるという効果を奏する。According to the present invention, a second recording step is provided which records the knowledge for creating search request sentences, together with a search request creation step which creates and outputs a search request sentence corresponding to the search words output by the presented search word candidate generation step; the presented search word candidate generation step selects the search words to output based on the narrowing effect index. Documents desired by the user that were not included in the initial search results can therefore be retrieved automatically from the document set, and the efficiency of the user's narrowing search improves.

【０１０１】この発明によれば、提示検索語候補生成ス
テップが、複数の検索語を選択して出力し、検索要求作
成ステップが、複数の検索語に対する論理演算から検索
要求文を作成するように構成したので、ＡＮＤ演算では
より的確に絞り込み検索が実行でき、ＯＲ演算ではより
広範囲に絞り込み検索が実行できるという効果を奏す
る。According to the present invention, the suggested search word candidate generating step selects and outputs a plurality of search words, and the search request creating step creates a search request sentence from a logical operation on the plurality of search words. With this configuration, the AND operation can execute a narrowing search more accurately, and the OR operation has an effect that the narrowing search can be executed in a wider range.

[Brief description of the drawings]

【図１】この発明の実施の形態１による文書検索装置
の構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of a document search device according to a first embodiment of the present invention.

【図２】この発明の実施の形態１による文書検索装置
の動作を説明するフローチャートである。FIG. 2 is a flowchart illustrating an operation of the document search device according to the first embodiment of the present invention.

【図３】この発明の実施の形態１における文書ベクト
ルの一例を示す説明図である。FIG. 3 is an explanatory diagram illustrating an example of a document vector according to the first embodiment of the present invention.

【図４】この発明の実施の形態１における検索語−話
題対応表の一例を示す説明図である。FIG. 4 is an explanatory diagram showing an example of a search term-topic correspondence table according to Embodiment 1 of the present invention.

【図５】この発明の実施の形態２による文書検索装置
の構成を示すブロック図である。FIG. 5 is a block diagram showing a configuration of a document search device according to a second embodiment of the present invention.

【図６】この発明の実施の形態３による文書検索装置
の構成を示すブロック図である。FIG. 6 is a block diagram showing a configuration of a document search device according to a third embodiment of the present invention.

【図７】この発明の実施の形態４による文書検索装置
の構成を示すブロック図である。FIG. 7 is a block diagram showing a configuration of a document search device according to a fourth embodiment of the present invention.

【図８】この発明の実施の形態４における単語と関連
語とを定義した一例を示す説明図である。FIG. 8 is an explanatory diagram showing an example in which words and related words are defined in Embodiment 4 of the present invention.

【図９】この発明の実施の形態５による文書検索装置
の構成を示すブロック図である。FIG. 9 is a block diagram showing a configuration of a document search device according to a fifth embodiment of the present invention.

【図１０】従来の文書検索装置を示す構成図である。FIG. 10 is a configuration diagram showing a conventional document search device.

【図１１】従来の文書検索装置における文献単語行列
の一例を示す説明図である。FIG. 11 is an explanatory diagram showing an example of a document word matrix in a conventional document search device.

[Explanation of symbols]

１文書集合、２文書検索部（文書検索手段）、３
検索結果文書集合、４文書特徴抽出部（文書特徴抽出手
段）、５文書ベクトル集合、６話題分類部（話題分
類手段）、７重要度付き検索語集合、８提示検索語
候補生成部（提示検索語候補生成手段）、９重要度付
き検索語集合、１０絞り込み効果推定部（絞り込み効
果推定手段）、１１絞り込み効果指標、１２提示検
索語候補、１３ユーザ、１４分類結果提示部（分類
結果提示手段）、１５分類結果、１６指示情報、１
７検索条件、１８話題、１９選択情報、２０文
書特徴設定部（文書特徴設定手段）、２１文書ベクト
ル、３１話題分類項目取得部（話題分類項目取得手
段）、３２話題分類項目情報、４１指定文書特徴抽
出部（指定文書特徴抽出手段）、４２文書ベクトル、
４３文書、５１関連語辞書（第１の記録手段）、５２
関連語設定部（関連語設定手段）、５３検索語、５４
関連語、６１検索要求作成知識（第２の記録手
段）、６２検索要求作成部（検索要求作成手段）、６３
検索要求検索語、６４検索要求文。1: document set; 2: document search unit (document search means); 3: search result document set; 4: document feature extraction unit (document feature extraction means); 5: document vector set; 6: topic classification unit (topic classification means); 7: importance-weighted search word set; 8: presented search word candidate generation unit (presented search word candidate generation means); 9: importance-weighted search word set; 10: narrowing effect estimation unit (narrowing effect estimation means); 11: narrowing effect index; 12: presented search word candidates; 13: user; 14: classification result presentation unit (classification result presentation means); 15: classification result; 16: instruction information; 17: search condition; 18: topic; 19: selection information; 20: document feature setting unit (document feature setting means); 21: document vector; 31: topic classification item acquisition unit (topic classification item acquisition means); 32: topic classification item information; 41: designated document feature extraction unit (designated document feature extraction means); 42: document vector; 43: document; 51: related word dictionary (first recording means); 52: related word setting unit (related word setting means); 53: search word; 54: related words; 61: search request creation knowledge (second recording means); 62: search request creation unit (search request creation means); 63: search request search words; 64: search request sentence.

Continuation of front page (72) Inventor: Katsushi Suzuki, c/o Mitsubishi Electric Corporation, 2-2-3 Marunouchi, Chiyoda-ku, Tokyo. F-term (reference): 5B075 ND03 NK02 NK31 NR05 NR12 NS01 PP13 PP22 PQ32 PQ46 UU06

Claims

[Claims]

1. A document search device comprising: document search means for retrieving and outputting, from a document set, documents that match a search condition; document feature extraction means for calculating a statistical weight of each word on the basis of the appearance frequency of the words described in the documents, and outputting a document vector set obtained from the weights; topic classification means for creating topics by classifying the document vector set according to the similarity between document vectors, and outputting document information belonging to each topic and search words with importance for the topic; narrowing effect estimating means for calculating and outputting a narrowing effect index of each search word belonging to the topic by referring to the document information and the search words with importance; suggested search word candidate generating means for assigning the narrowing effect index to the search words, selecting search words having a high narrowing effect index as suggested search word candidates, and outputting the suggested search word candidates and the document information corresponding to the suggested search word candidates; classification result presenting means for presenting the suggested search word candidates in association with the topics together with the narrowing effect index, and prompting input of either or both of instruction information and selection information; and document feature setting means for changing and outputting a document vector included in the document vector set on the basis of the selection information.
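
As an illustrative sketch only, the word weighting and vector similarity recited in claim 1 could look like the following. TF-IDF weighting and cosine similarity are assumptions chosen for illustration; the claim itself requires only a statistical weight based on word appearance frequency and a similarity between document vectors. The names `tfidf_vectors` and `cosine` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed weighting: term frequency times log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents in which each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({w: tf[w] * math.log(n / df[w]) for w in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical search results, already split into words.
docs = [
    ["search", "engine", "index"],
    ["search", "engine", "query"],
    ["protein", "gene", "sequence"],
]
vectors = tfidf_vectors(docs)
```

Under these assumptions the first two documents share weighted words and would fall into the same topic, while the third shares none and has similarity 0 with either of them.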

2. The document search device according to claim 1, wherein the narrowing effect estimating means calculates and outputs a narrowing effect index for one or more topics.

3. The document search device as described, wherein the classification result presenting means presents the topics and the suggested search word candidates belonging to the topics in matrix form, and presents a narrowing effect index in each element of the matrix.
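
The matrix presentation of claim 3 can be pictured with the following minimal sketch, which arranges topics as rows and candidate search words as columns and places an index value in each cell. The input format (a dict keyed by topic and word), the function names `build_matrix` and `render`, and the sample values are all assumptions for illustration; how the narrowing effect index is computed is not shown here.

```python
def build_matrix(index_by_pair):
    """Arrange {(topic, word): index} into rows (topics) and columns (words)."""
    topics = sorted({topic for topic, _ in index_by_pair})
    words = sorted({word for _, word in index_by_pair})
    # A cell is None when the word is not a candidate for that topic.
    rows = [[index_by_pair.get((t, w)) for w in words] for t in topics]
    return topics, words, rows


def render(topics, words, rows):
    """Format the matrix as fixed-width text for display to the user."""
    width = max(len(s) for s in topics + words + ["topic"]) + 2
    lines = ["topic".ljust(width) + "".join(w.ljust(width) for w in words)]
    for topic, row in zip(topics, rows):
        cells = ["-" if v is None else f"{v:.2f}" for v in row]
        lines.append(topic.ljust(width) + "".join(c.ljust(width) for c in cells))
    return "\n".join(lines)


# Hypothetical index values for two topics and three candidate words.
sample = {
    ("topic A", "engine"): 0.80,
    ("topic A", "query"): 0.35,
    ("topic B", "gene"): 0.90,
}
topics, words, rows = build_matrix(sample)
table = render(topics, words, rows)
```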

4. The document search device according to claim 1, further comprising topic classification item acquisition means for extracting, from the documents output by the document search means, words that characterize the documents as topic classification items, learning a weight vector for each topic classification item by referring to text related to the topic classification item, and outputting the topic classification items and the weight vectors to the topic classification means, wherein the topic classification means creates topics by classifying the document vector set on the basis of the topic classification items and the weight vectors.

5. The document search device according to claim 4, wherein the topic classification item acquisition means extracts the topic classification items from the documents output by the document search means on the basis of the appearance frequency of the words described in the documents.

6. The document search device as described, wherein the topic classification item acquisition means extracts the topic classification items from the documents output by the document search means by referring to tags described in the documents.

7. The document search device according to claim 1, further comprising designated document feature extraction means for calculating a document vector from a document designated via the classification result presenting means and outputting the document vector to the document feature setting means, wherein the document feature setting means changes the document vector set on the basis of the document vector output by the designated document feature extraction means and the document vector set output by the document feature extraction means.

8. The document search device according to claim 1, further comprising: first recording means for defining words related to a predetermined word and recording them as related words; and related word setting means for extracting, from the first recording means, the related words corresponding to a designated search word and outputting them to the document feature setting means, wherein the document feature setting means changes the document vector set on the basis of the related words and the selection information input from the classification result presenting means.

9. The document search device according to claim 1, further comprising: second recording means for recording knowledge for creating search request sentences; and search request creating means for creating, by referring to the second recording means, a search request sentence corresponding to a search word output by the suggested search word candidate generating means, and outputting the search request sentence to the document search means, wherein the suggested search word candidate generating means selects the search word to be output to the search request creating means on the basis of the narrowing effect index.

10. The document search device according to claim 9, wherein the suggested search word candidate generating means selects a plurality of search words and outputs them to the search request creating means, and the search request creating means creates a search request sentence from a logical operation on the plurality of search words.
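
A search request built from a logical operation on several search words, as in claim 10, might be sketched as below. The quoted `AND`/`OR` query syntax and the names `build_request` and `matches` are assumptions for illustration; the claim does not fix a query language or a set of operators.

```python
def build_request(words, operator="AND"):
    """Join the selected search words into one search request string."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not words:
        raise ValueError("at least one search word is required")
    return f" {operator} ".join(f'"{w}"' for w in words)


def matches(doc_words, words, operator="AND"):
    """Check whether a document (a set of words) satisfies the request."""
    present = [w in doc_words for w in words]
    return all(present) if operator == "AND" else any(present)


# Hypothetical search words selected by their narrowing effect index.
selected = ["engine", "query"]
request = build_request(selected, "AND")
```

With `AND`, only documents containing every selected word match, which narrows the result set; with `OR`, a document containing any one of them matches.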

11. A document search method comprising: a document search step of retrieving and outputting, from a document set, documents that match a search condition; a document feature extraction step of calculating a statistical weight of each word on the basis of the appearance frequency of the words described in the documents, and outputting a document vector set obtained from the weights; a topic classification step of creating topics by classifying the document vector set according to the similarity between document vectors, and outputting document information belonging to each topic and search words with importance for the topic; a narrowing effect estimating step of calculating and outputting a narrowing effect index of each search word belonging to the topic by referring to the document information and the search words with importance; a suggested search word candidate generating step of assigning the narrowing effect index to the search words, selecting search words having a high narrowing effect index as suggested search word candidates, and outputting the suggested search word candidates and the document information corresponding to the suggested search word candidates; a classification result presenting step of presenting the suggested search word candidates in association with the topics together with the narrowing effect index, and prompting input of either or both of instruction information and selection information; and a document feature setting step of changing and outputting a document vector included in the document vector set on the basis of the selection information.

12. The document search method according to claim 11, wherein the narrowing effect estimating step calculates and outputs a narrowing effect index for one or more topics.

13. The document search method as described, wherein the classification result presenting step presents the topics and the suggested search word candidates belonging to the topics in matrix form, and presents a narrowing effect index in each element of the matrix.

14. The document search method according to claim 11, further comprising a topic classification item acquiring step of extracting, from the documents output by the document search step, words that characterize the documents as topic classification items, learning a weight vector for each topic classification item by referring to text related to the topic classification item, and outputting the topic classification items and the weight vectors, wherein the topic classification step creates topics by classifying the document vector set on the basis of the topic classification items and the weight vectors.

15. The document search method according to claim 14, wherein the topic classification item acquiring step extracts the topic classification items from the documents output by the document search step on the basis of the appearance frequency of the words described in the documents.

16. The document search method according to claim 14, wherein the topic classification item acquiring step extracts the topic classification items from the documents output by the document search step by referring to tags described in the documents.

17. The document search method according to claim 11, further comprising a designated document feature extraction step of calculating and outputting a document vector from a document designated via the classification result presenting step, wherein the document feature setting step changes the document vector set on the basis of the document vector output by the designated document feature extraction step and the document vector set output by the document feature extraction step.

18. The document search method according to claim 11, further comprising: a first recording step of defining words related to a predetermined word and recording them as related words; and a related word setting step of extracting and outputting the related words corresponding to a designated search word, wherein the document feature setting step changes the document vector set on the basis of the related words and the selection information input from the classification result presenting step.

19. The document search method according to claim 11, further comprising: a second recording step of recording knowledge for creating search request sentences; and a search request creating step of creating and outputting a search request sentence corresponding to a search word output by the suggested search word candidate generating step, wherein the suggested search word candidate generating step selects the search word to be output on the basis of the narrowing effect index.

20. The document search method according to claim 19, wherein the suggested search word candidate generating step selects and outputs a plurality of search words, and the search request creating step creates a search request sentence from a logical operation on the plurality of search words.