JPH04281558A

JPH04281558A - Document retrieving device

Info

Publication number: JPH04281558A
Application number: JP3069319A
Authority: JP
Inventors: Yasuo Tanosaki; 康雄田野崎; Isamu Iwai; 岩井　勇
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-03-08
Filing date: 1991-03-08
Publication date: 1992-10-07
Anticipated expiration: 2015-06-26
Also published as: JP3056810B2

Abstract

PURPOSE: To reduce the number of screen scroll operations by displaying the simplified skeleton structure of a sentence when the parts of a text that contain designated keywords are displayed as a list. CONSTITUTION: A character string including a keyword is extracted from text data, text analysis processing such as syntax analysis is applied to this character string, and a simplified sentence made up of the words and phrases that form the skeleton structure of the text is displayed as an element of the list of candidate documents. A document selecting part 5h then lets the user select one of the document content expressions already displayed as a list by a candidate document list display part 5g. A document display part 5i then reads out the document data corresponding to the selected document content expression from a candidate document storing buffer 5l and displays its text, charts, and the like on the display screen of a display device.

Description

[Detailed description of the invention]

[Purpose of the invention]

[0001]

[Field of industrial application] The present invention relates to a document retrieval device that can efficiently retrieve a document desired by a user from a document database.

[0002]

[Prior art] Document retrieval systems using mainframe computers or workstations have been put into practical use.

[0003] When a document is searched for with such a document retrieval device, the user first inputs a keyword. The device then searches the database for documents that contain the input keyword in the body text or in the header as a search key, and presents the search results to the user.

[0004] When several documents satisfy the condition, the user must further pick out the required one from among them. The device therefore lists, together with the document number, the title of each retrieved document and a document content list such as the document information or abstract attached to each document. The user refers to the document content shown there, judges whether each document suits the purpose, and only then views the document body.

[0005]

[Problems to be solved by the invention] As described above, in a conventional retrieval device, when there are several candidate documents the user selects the required one by referring to the document content list or the like provided by the device. However, the document content list seldom represents the content of a document accurately, and in some cases a description the user needs exists in the body text but is not reflected in the title or header information of the document. In particular, when the number of candidate documents increases, the burden on the user before the target document is found is heavy. Conversely, if the content of each document is expressed in detail in the document content list, the amount of list data to display grows and no longer fits in the display area of the screen, which causes the operational problem that the user must scroll the screen frequently.

[0006] The present invention has been made in view of the above circumstances, and its object is to provide a document retrieval device that can represent the content of each document in the document content list accurately and with a minimum amount of description.

[0007] [Configuration of the invention]

[0008]

[Means for solving the problems] To achieve the above object, the present invention provides a document retrieval device having document data storage means for storing document data consisting of text data, chart data, and the like, keyword input means for inputting a keyword used to search the document data stored in the document data storage means, and keyword search means for searching the document data storage means for documents containing the keyword input from the keyword input means, the device being characterized by comprising: storage means for storing, for each document data, a candidate sentence that contains the keyword and was extracted by the keyword search means; candidate document list display means for applying sentence analysis processing to the candidate sentences stored in the storage means and displaying simplified sentences containing the keyword as elements of a candidate document list; document selection means for designating one of the elements in the document list displayed by the candidate document list display means; and document display means for displaying the content of the document data corresponding to the document designated by the document selection means.

[0009]

[Operation] With the present invention configured as described above, when the target document is chosen from a plurality of candidate document data obtained by using keywords, each keyword is displayed as an element of the candidate document list in association with the surrounding words in the text. This makes explicit how the keyword appears in the document, so the user can judge accurately whether the content of the whole document matches the purpose.

[0010] Furthermore, sentence analysis processing is applied to the sentences containing the keyword in the candidate document data, and a shortened sentence containing the keyword is displayed as an element of the candidate document list, so the candidate document list occupies a smaller area on the display screen.

[0011]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0012] FIG. 1 is a block diagram showing the configuration of a document retrieval device according to an embodiment of the present invention.

[0013] As shown in the figure, the document retrieval device consists of an input device 1, a display device 2, a document data storage device 3, a control device 4, and a memory 5.

[0014] The input device 1 is a device for inputting character codes, control commands, position information, and the like, and consists of, for example, a keyboard 1a, a mouse 1b, and a device that controls them.

[0015] The display device 2 displays prompt messages that ask the user for input, input character strings, document data obtained after a search, and the like. It consists of, for example, a VRAM and a display that shows the bit information stored in the VRAM as rows of dots.

[0016] The document data storage device 3 stores the individual document data and consists of, for example, a hard disk device. FIG. 2 shows the storage format of document data in the document data storage device 3. One piece of document data consists of a text data section 3a, which contains only the text information of the document, and a non-text data section 3b, which contains image data, format information, and the like. The document data storage device 3 stores a plurality of document data in this format. That is, the document data 31, 32, ..., 3n are stored in the document data storage device 3 in a format consisting of text data sections 31a, 32a, ..., 3na and non-text data sections 31b, 32b, ..., 3nb, respectively.
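
The storage format of paragraph [0016] can be pictured with a small data structure. The following is only an illustrative sketch: the class name `DocumentData`, its field names, and the sample contents are assumptions introduced here and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentData:
    """One stored document: a text part (3a) and a non-text part (3b)."""
    text: str                                      # text data section: text only
    non_text: dict = field(default_factory=dict)   # images, format information, etc.


# The storage device holds several such records (document data 31, 32, ..., 3n).
# The sample contents below are invented for illustration.
store = [
    DocumentData(text="Workstations are widely used.", non_text={"images": []}),
    DocumentData(text="Mainframes remain in service."),
]
```

Keeping the text apart from the non-text data matches the later steps, which search and analyze only the text data section.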

[0017] The control device 4 consists of, for example, a CPU and is connected by a bus to the input device 1, the display device 2, the document data storage device 3, and the memory 5. It controls each device and performs control and processing such as data transfer between devices.

[0018] The memory 5 consists of, for example, dynamic RAM. As shown in FIG. 3, it is made up of a program section 5a, which stores the programs the control device 4 uses to execute the various control and processing operations, and a buffer section 5b, which buffers the data needed during processing. The program section 5a is divided into the following modules: a main processing section 5c, an initialization section 5d, a keyword input section 5e, a keyword search section 5f, a candidate document list display section 5g, a document selection section 5h, and a document display section 5i. The data buffer section 5b consists of a keyword storage buffer 5j, a keyword search buffer 5k, a candidate document storage buffer 5l, a candidate document number storage buffer 5m, a character string storage buffer 5n, a syntax tree storage buffer 5p, and a sentence skeleton storage buffer 5q. The functions of each part of the program section 5a and the buffer section 5b are described below.

[0019] The main processing section 5c governs the processing of the whole device; it performs program branching and calls (activates) each of the modules from the initialization section 5d onward. The initialization section 5d performs the initial setup of each hardware device and initializes the contents of each buffer in the data buffer section 5b.

[0020] The keyword input section 5e has the user input, via the keyboard 1a of the input device 1, a character string that serves as the search keyword, and stores it in the keyword storage buffer 5j.

[0021] The keyword search section 5f reads the document data stored in the document data storage device 3 in the order in which they are stored, places them in the keyword search buffer 5k, and looks in the keyword search buffer 5k for document data containing a character string stored in the keyword storage buffer 5j. The document data obtained from this search are stored as candidate document data in the candidate document storage buffer 5l.

[0022] The candidate document list display section 5g lists, on the display screen of the display device 2, an expression representing the content of each candidate document data stored in the candidate document storage buffer 5l (hereinafter called a document content expression). That is, the document content expressions are listed on the display screen as the elements of the candidate document list.

[0023] The document selection section 5h has the user select one of the document content expressions already listed by the candidate document list display section 5g.

[0024] The document display section 5i reads the document data corresponding to the document content expression selected through the document selection section 5h from the candidate document storage buffer 5l and displays its text, charts, and the like on the display screen of the display device 2.

[0025] The candidate document number storage buffer 5m stores the number of document data held in the candidate document storage buffer 5l.

[0026] In addition, the character string storage buffer 5n stores a character string of one sentence containing the keyword, the syntax tree storage buffer 5p stores the result of syntax analysis, which is one form of sentence analysis processing, and the sentence skeleton data storage buffer 5q stores a character string representing the skeleton of a sentence.

[0027] Next, the specific processing operation of the document retrieval device configured as above is described with reference to the flowchart of FIG. 4, which shows the flow of processing.

[0028] The main processing section 5c controls the overall processing and first activates the initialization section 5d. Once activated, the initialization section 5d initializes the keyword storage buffer 5j, the keyword search buffer 5k, and the candidate document storage buffer 5l of the buffer section 5b, clears the contents of the candidate document number storage buffer 5m, and performs the initial setup of the input device 1 and the display device 2. It also displays the various icons needed for command input (step S1).

[0029] Next, the main processing section 5c activates the keyword input section 5e. Once activated, the keyword input section 5e has the user input keywords, generally several of them, each as a code string via the keyboard 1a of the input device 1. Processing such as kana-kanji conversion is applied to each input code string, and the resulting character string is stored in the keyword storage buffer 5j. After the keywords have been input and stored in the keyword storage buffer 5j, processing moves to step S3 (step S2).

[0030] In step S3 the keyword search section 5f is activated. Once activated, the keyword search section 5f reads the document data stored in the document data storage device 3 in the order in which they are stored, for example document data 31 first, and places it in the keyword search buffer 5k. The keyword search section 5f then refers to the text data section 31a of the document data 31 in the keyword search buffer 5k and checks whether it contains a character string identical to any of the keywords stored in the keyword storage buffer 5j. If it does, the whole of the document data 31 in the keyword search buffer 5k is stored in the candidate document storage buffer 5l as a candidate document, and the content of the candidate document number storage buffer 5m is incremented by 1. The keyword search section 5f then carries out the same series of operations in turn on document data 32 through 3n, that is, on all document data stored in the document data storage device 3 (step S3).
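
The scan of step S3 can be sketched as follows. This is a minimal illustration, not the device's actual program: the function name `keyword_search` and the representation of a document as a `(text, non_text)` pair are assumptions made for the example.

```python
def keyword_search(documents, keywords):
    """Return the documents whose text contains at least one keyword.

    `documents` is a list of (text, non_text) pairs in storage order
    (a hypothetical representation chosen for this sketch).
    """
    candidates = []          # stands in for candidate document storage buffer 5l
    candidate_count = 0      # stands in for candidate document number buffer 5m
    for doc in documents:    # read documents in the order they are stored
        text, _non_text = doc
        if any(kw in text for kw in keywords):
            candidates.append(doc)   # the whole document is kept, not just the text
            candidate_count += 1
    return candidates, candidate_count
```

For example, calling `keyword_search(docs, ["workstation"])` returns every stored document whose text contains the string "workstation", in storage order, together with their count. A document matches if any one of the keywords occurs in it, as described in the paragraph above.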

[0031] When the processing of step S3 ends, the contents of the candidate document storage buffer 5l are referred to, and it is checked whether any document data contains a keyword input in step S2 in its text data, that is, whether any candidate document exists. If the condition is not satisfied (no candidate document exists), processing moves to step S5; if it is satisfied (a candidate document exists), processing moves to step S6 (step S4).

[0032] In step S5, a message stating that no matching document was found is displayed on the display screen of the display device 2, after which processing returns to step S2, the user is asked to input new keywords, and the above processing is repeated.

[0033] In step S6 the candidate document list display section 5g is activated. It refers to the contents of the text data section of each document data stored in the candidate document storage buffer 5l and displays, for each document, its document content expression as an element of the candidate document list. A document content expression consists of a character string; for later processing, each document content expression is placed inside a rectangular area on the screen of the display device 2 and the outline of the rectangle is displayed. Step S6 consists of the five steps S61 to S65, and its processing is described in detail below.

[0034] First, the contents of the text data section of the document data stored in the candidate document storage buffer 5l are referred to, and the portion consisting of a character string that contains a keyword stored in the keyword storage buffer 5j is extracted and stored in the character string storage buffer 5n. The unit of extraction is the sentence, that is, the unit delimited by the full stop ("。") in the text data. If the text section of one candidate document data contains several portions that include a keyword, the first one to appear is used. FIG. 6 shows an example of a character string 11 extracted, with the word "workstation" as the keyword, from the original text 10 shown in FIG. 5 and held in the candidate document storage buffer 5l. This extraction result is stored in the character string storage buffer 5n (step S61).
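
Step S61 amounts to splitting the text at the full stop and taking the first sentence that contains a keyword. A minimal sketch under that reading is given below; the function name `extract_candidate_sentence` and the `delimiter` parameter are assumptions for illustration only.

```python
def extract_candidate_sentence(text, keywords, delimiter="。"):
    """Return the first sentence in `text` that contains any keyword, or None.

    Sentences are the units separated by `delimiter` (the Japanese full stop
    by default); the delimiter is put back on the returned sentence.
    """
    for sentence in text.split(delimiter):
        sentence = sentence.strip()
        if sentence and any(kw in sentence for kw in keywords):
            return sentence + delimiter
    return None
```

Passing `delimiter="."` applies the same rule to English text. Only the first matching sentence is returned, in line with the rule that the first occurrence is used when a document contains the keyword in several places.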

[0035] Next, syntax analysis is performed on the extracted character string held in the character string storage buffer 5n. That is, the extracted character string 11 is first decomposed into subject, predicate, object, complement, and modifiers, as shown in FIG. 7, to obtain syntax tree information in the form of list-format data. The syntax tree information obtained is stored in the syntax tree storage buffer 5p. FIG. 7 shows an example of the contents of the syntax tree information 12 stored in the syntax tree storage buffer 5p as a result of performing syntax analysis on the extracted character string 11 shown in FIG. 6 (step S62).

[0036] After the syntax tree information has been stored in the syntax tree storage buffer 5p, it is referred to, the main verb of the syntax tree and each phrase directly connected to the main verb are taken out, and these are joined to generate sentence skeleton data 13. The generated sentence skeleton data is stored in the sentence skeleton data storage buffer 5q. FIG. 8 shows an example of the sentence skeleton data 13 generated from the syntax tree information shown in FIG. 7 and stored in the sentence skeleton data storage buffer 5q. Sentence skeleton data generated in this way is a shorter, simplified sentence compared with the character string extracted from the candidate document data (step S63).
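
The skeleton generation of step S63 can be sketched on a toy syntax tree. The `Node` class, the function `sentence_skeleton`, and the sample sentence are all assumptions made for this illustration; the contents of FIG. 7 and FIG. 8 are not reproduced in the text, and the parser that would build the tree is outside the scope of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    phrase: str                                    # surface text of the head phrase
    children: list = field(default_factory=list)   # phrases that depend on it


def sentence_skeleton(root, sep=""):
    """Join the phrases directly attached to the main verb with the verb.

    `root` is the main verb. Deeper dependents (modifiers of the attached
    phrases) are dropped, which is what shortens the sentence.
    """
    parts = [child.phrase for child in root.children]
    parts.append(root.phrase)                      # verb-final order, as in Japanese
    return sep.join(parts)


# Invented example: "当社は高性能なワークステーションを開発した"
tree = Node("開発した", [
    Node("当社は"),
    Node("ワークステーションを", [Node("高性能な")]),  # modifier is dropped
])
skeleton = sentence_skeleton(tree)
```

Here `skeleton` is "当社はワークステーションを開発した": the subject, the object, and the main verb remain, while the modifier attached to the object is omitted.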

[0037] Furthermore, the character string held in the sentence skeleton data storage buffer 5q is displayed inside a rectangular area on the screen of the display device 2 as the document content expression of the candidate document, and the outline of the rectangle is displayed (steps S64 and S65).

[0038] As described above, when the candidate document list display section 5g is activated, it executes the processing of steps S61 to S65 for each of the document data stored in the candidate document storage buffer 5l. The document content expressions corresponding to the documents are displayed on the screen in the order in which the documents are stored in the candidate document storage buffer 5l. FIG. 9 shows an example of a list 14 of candidate documents displayed on the screen of the display device 2 in this way.

[0039] When the processing for displaying the candidate document list in step S6 ends, the document selection section 5h is activated. Once it is activated, the user inputs a position on the screen of the display device 2 via the mouse 1b of the input device 1. If the position designated by the user is inside an icon that represents the end command, of the same kind as the icons displayed in step S1, the series of retrieval processing ends (steps S7 and S8).

[0040] If the position designated by the user is inside one of the rectangular areas on the screen containing a document content expression shown in FIG. 9, the device determines which rectangle on the screen it is, reads the corresponding document data from the candidate document storage buffer 5l, and activates the document display section 5i. Once activated, the document display section 5i displays on the screen the text data, image data, and the like that make up the document data read out. When the display of the document data is finished, control returns to step S7 and a document shown in the candidate document list is selected again so that new document data can be displayed. If the position designated by the user is outside the rectangular areas on the screen containing document content expressions, control returns to step S7 and position input is performed again so that the user can designate a correct position (steps S9 and S10).
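
The position check of steps S9 and S10 is an ordinary hit test against the list rectangles. The sketch below assumes each rectangle is given as `(left, top, width, height)` in screen coordinates and that rectangles are listed in display order; the function name `hit_test` and this representation are hypothetical.

```python
def hit_test(rects, x, y):
    """Return the index of the rectangle containing (x, y), or None.

    `rects` holds (left, top, width, height) tuples in display order, so the
    returned index also identifies the entry in the candidate document buffer.
    """
    for index, (left, top, width, height) in enumerate(rects):
        if left <= x < left + width and top <= y < top + height:
            return index
    return None
```

A return value of `None` corresponds to a position outside every rectangle, in which case the position input is simply requested again. Because the list is displayed in buffer order, the index can be used directly to read the matching document data.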

[0041] In the above embodiment, when the candidate document list is displayed, syntax analysis is performed sentence by sentence and the result is used as an element of the candidate document list. The invention is not limited to this: syntax analysis may be performed on several sentences contained in one paragraph and the results combined into a single element of the candidate document list.

[0042] Also, in the above embodiment sentence skeleton data obtained by syntax analysis is displayed as an element of the candidate document list, but the invention is not limited to this, and analysis data obtained by other sentence analysis processing may be displayed instead. For example, morphological analysis is performed on the character string containing the keyword held in the character string storage buffer 5n, and the region containing the keyword and a fixed number of words before and after it, for example up to two words, is extracted. Here, function words (for example, the particles の, を, に) are not counted as words, and if fewer than the specified number of qualifying words precede the keyword in the target character string, the extracted sentence starts at the beginning of the target sentence. FIG. 10 shows an example of a character string 15 obtained by performing morphological analysis on the character string 11 containing the keyword, extracted from the candidate document data and shown in FIG. 6. In this case too, as with syntax analysis, the result is a simplified sentence that contains the keyword and is shorter than the character string extracted from the candidate document data. In short, any sentence analysis method may be used as long as it simplifies the character string containing the keyword, that is, the candidate sentence, and converts it into a shorter sentence.
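
The morphological variant of paragraph [0042] can be sketched as a window around the keyword. The sketch assumes the sentence has already been split by a morphological analyzer into `(surface, is_particle)` pairs and that the keyword is a single token; the function name `keyword_window`, the token format, and the handling of the end of the sentence are assumptions, since the text states the rule only for the start of the sentence.

```python
def keyword_window(tokens, keyword, n=2):
    """Return the keyword plus up to `n` content words on each side.

    `tokens` is a list of (surface, is_particle) pairs. Particles inside the
    window are kept in the output but do not count toward `n`.
    """
    surfaces = [surface for surface, _ in tokens]
    if keyword not in surfaces:
        return None
    k = surfaces.index(keyword)

    # Walk backward until n content words are included or the sentence starts.
    start, counted = k, 0
    while start > 0 and counted < n:
        start -= 1
        if not tokens[start][1]:
            counted += 1

    # Walk forward in the same way (assumed symmetric with the backward rule).
    end, counted = k, 0
    while end < len(tokens) - 1 and counted < n:
        end += 1
        if not tokens[end][1]:
            counted += 1

    return "".join(surfaces[start:end + 1])
```

If fewer than `n` content words precede the keyword, the backward walk stops at index 0, so the extract begins at the start of the sentence as the paragraph above requires.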

[0043] The present invention is not limited to the above embodiments and can of course be modified in various ways without departing from its gist.

[0044]

[Effects of the invention] As described in detail above, according to the document retrieval device of the present invention, when the portions of text containing the designated keywords are listed as elements of the list of candidate documents obtained by a keyword search, the simplified skeleton of the sentence is displayed. This increases the number of candidate documents that can be shown on the display screen at one time, which reduces the number of operations such as screen scrolling and improves operability.

[0045] In addition, because the simplified skeleton of a sentence containing the keyword in the text is displayed as an element of the candidate document list, the user can judge instantly and accurately whether a document offered as a candidate is the one sought. As a result, the effort the user needs to retrieve the target document from the document database is greatly reduced, and the practical effect is considerable.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a document retrieval device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the storage format of document data in the document data storage device.

FIG. 3 is a diagram showing the internal configuration of the memory.

FIG. 4 is a flowchart outlining the flow of processing.

FIG. 5 is a diagram showing an example of original text data.

FIG. 6 is a diagram showing an example of an extracted candidate sentence.

FIG. 7 is a diagram showing an example of the contents of the syntax tree storage buffer.

FIG. 8 is a diagram showing an example of the contents of the sentence skeleton data storage buffer.

FIG. 9 is a diagram showing an example in which a document content expression is displayed for each document.

FIG. 10 is a diagram showing another embodiment.

[Explanation of symbols]

1 ... Input device (keyword input means)
3 ... Document data storage device (document data storage means)
5f ... Keyword search section (keyword search means)
5g ... Candidate document list display section (document list display means)
5h ... Document selection section (document selection means)
5i ... Document display section (document display means)

Claims

[Claims]

Claim 1: A document retrieval device having document data storage means for storing document data consisting of text data, chart data, and the like, keyword input means for inputting a keyword used to search the document data stored in the document data storage means, and keyword search means for searching the document data storage means for documents containing the keyword input from the keyword input means, the device being characterized by comprising: storage means for storing, for each document data, a candidate sentence that contains the keyword and was extracted by the keyword search means; candidate document list display means for applying sentence analysis processing to the candidate sentences stored in the storage means and displaying simplified sentences containing the keyword as elements of a candidate document list; document selection means for designating one of the elements in the document list displayed by the candidate document list display means; and document display means for displaying the content of the document data corresponding to the document designated by the document selection means.