JP2937519B2

JP2937519B2 - Document search device

Info

Publication number: JP2937519B2
Application number: JP3069321A
Authority: JP
Inventors: 康雄田野崎; 幸夫中本
Original assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Current assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Priority date: 1991-03-08
Filing date: 1991-03-08
Publication date: 1999-08-23
Anticipated expiration: 2014-08-23
Also published as: JPH04281565A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Object of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a document search device that performs a full-text search to extract documents containing an input keyword.

[0002]

[Prior Art] In recent years, documents have been digitized and large volumes of document data are in circulation. To extract what a user needs from this large volume of document data, the prevailing approach is to enter a keyword consisting of a character string and run a search. Keyword search falls broadly into the following two methods.

[0003] (1) A method in which keywords are assigned to every document in advance, and the documents to which the user's input keyword has been assigned are extracted.

[0004] (2) A full-text search method that checks whether the text of each document contains the keyword input by the user.

[0005] In method (1) above, limiting the number of keywords assigned to each document keeps string matching against the user's input keyword to a minimum, which makes the search fast. However, keywords must be assigned to documents in advance, and if the user specifies a keyword that has not been assigned, the document cannot be extracted. Moreover, assigning keywords is a burden on the document's author, and the choice of keywords is sometimes left to the author, which makes it difficult to keep keywords consistent.

[0006] Method (2), by contrast, extracts every document that contains the character string input by the user, so few relevant documents are missed.

[0007] In both methods (1) and (2), when documents satisfying the condition are found, the documents containing the character string input by the user are listed on the display as the search result. The user must then scroll through this large output, judge in turn whether each document suits the purpose, and pick out the ones needed. At this point the user is not shown how the keyword appears in each document.

[0008]

[Problem to Be Solved by the Invention] As described above, in a conventional search device based on full-text search, the search result lists the documents containing the user's input keyword in the order in which the files are stored, and the user has to read all of the listed documents.

[0009] The present invention has been made in view of the above circumstances, and its object is to provide a document search device that infers the importance of the position at which a character string containing the keyword appears in a document and outputs search results accordingly.

[0010] [Structure of the Invention]

[0011]

[Means for Solving the Problem] To achieve the above object, the present invention provides a document search device that extracts documents containing an input keyword in their text, the device comprising: input means for inputting a keyword; document dividing means for dividing a document into parts such as a title, a preamble, and a body; character string matching means for determining whether the keyword is contained in the document; priority calculation means for calculating the priority of a retrieved document based on the position in the document of the keyword found by the character string matching means and on the document division information from the document dividing means; and search result output means for outputting documents in order of the priority obtained by the priority calculation means.

[0012]

[Operation] With the above configuration, the priority of each retrieved document is calculated from the position at which the keyword input through the input means appears in the document (for example, the title, the preamble, or the body), and the documents are output as the search result in order of the calculated priority, so that documents are searched efficiently.

[0013]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0014] FIG. 1 is a block diagram showing the configuration of a document search device according to one embodiment of the present invention.

[0015] In FIG. 1, reference numeral 1 denotes an external storage device, consisting of, for example, a floppy disk drive or a hard disk drive, that stores document data already created. Document data read from the external storage device 1 is stored in a document data memory 2 consisting of, for example, dynamic RAM. Each item of document data consists of a text data part, which contains only the text information of the document, and a non-text data part, which contains image data, format information, and the like.

[0016] Reference numeral 3 denotes an input unit for entering search keywords, commands, and the like; it consists of, for example, a keyboard, a mouse, and the devices that control them. Search keywords (character strings) and commands entered through the input unit 3 are displayed, under the control of a control unit 4, on a display unit 5 consisting of, for example, VRAM and a CRT display that shows the bit information stored in the VRAM as rows of dots. The display unit 5 also displays search results, the document data stored in the document data memory 2, and the like.

[0017] The control unit 4 consists of RAM, which stores the system program and also serves as buffer memory, a CPU that executes control operations, and the like. It is connected by a bus to the devices described above and below, and performs control and processing such as controlling each device and transferring data between devices. The control unit 4 also contains the buffers and counters needed for control and processing; for example, the number of document data items stored in the external storage device 1 is held in a document count buffer (not shown).

[0018] Reference numeral 6 denotes a document division unit that divides the text data part of the document data into a title part 7, a preamble part 8, and a body part 9, as shown in FIG. 2. It reads the document data stored in the document data memory 2 one sentence at a time, where a sentence extends up to a line feed code or a full stop (。). The document division unit 6 classifies the text as follows: the span from the first sentence of a text document up to the sentence before the first sentence that ends with a full stop is the title part 7; the sentences that precede a sentence containing a phrase such as 「はじめに」 or 「初めに」 ("Introduction"), which signals that the document begins to describe its content in detail, that is, the abstract, are the preamble part 8; and the span from the sentence following the preamble part to the last sentence of the text document is the body part 9. This document division information is output to the control unit 4.
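The division rule of paragraph [0018] can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the function names `split_units` and `divide_document`, the section labels, and the sample text are all hypothetical, and the sketch assumes that a sentence ends with "。" and that the two marker phrases quoted in the paragraph are the only ones checked.

```python
import re

# Marker phrases quoted in paragraph [0018]; other phrases may apply in practice.
BODY_MARKERS = ("はじめに", "初めに")


def split_units(text):
    """Split text into sentence units that end at a line feed or a full stop."""
    units = re.findall(r"[^。\n]+。?", text)
    return [u.strip() for u in units if u.strip()]


def divide_document(units):
    """Return one label per unit: 'title', 'preamble', or 'body' (hypothetical names)."""
    labels = []
    section = "title"
    for unit in units:
        if section == "title" and unit.endswith("。"):
            # First unit that ends with a full stop starts the preamble.
            section = "preamble"
        elif section == "preamble" and any(m in unit for m in BODY_MARKERS):
            # A unit containing a marker phrase starts the body.
            section = "body"
        labels.append(section)
    return labels


# Hypothetical sample text; the actual text of FIG. 2 is not reproduced here.
sample = (
    "文書検索装置の提案\n"
    "著者名\n"
    "本稿では文書検索装置について述べる。\n"
    "1. はじめに\n"
    "近年、文書が電子化されている。"
)
units = split_units(sample)
labels = divide_document(units)
```

As in the flowchart described later, a unit that moves the document from title to preamble is not also checked for the body marker in the same pass.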

[0019] Reference numeral 10 denotes a matching unit, which reads one sentence at a time from the document data memory 2 holding the data of one text document and determines whether that sentence contains the keyword entered through the input unit 3. Reference numeral 11 denotes a per-section value table, which holds the values used to rank the documents the user is looking for; a value is set according to the part of the document, that is, the document division, in which the keyword entered through the input unit 3 appears. For example, as shown in FIG. 3, the value is "10" for the title part 7, "5" for the preamble part 8, and "2" for the body part 9. When a text document contains the keyword, the matching unit 10 uses the document division information output from the document division unit 6 to the control unit 4 to look up, in the per-section value table 11, the value for the position at which the keyword appears, and adds it to a calculated value storage buffer 12. FIG. 4 shows an example of the contents of the calculated value storage buffer 12 after this addition has been carried out from the first sentence to the last sentence of each of a number of text documents. The calculated value held in the calculated value storage buffer 12 indicates the output priority of each document when the search result is output.
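The accumulation described in paragraph [0019] amounts to a table lookup and a running sum. The sketch below assumes a single keyword and plain substring matching; `SECTION_VALUES`, `score_document`, and the sample sentences are hypothetical names and data, while the values 10, 5, and 2 are the ones the paragraph gives for value table 11.

```python
# Values given in paragraph [0019] for value table 11 (FIG. 3).
SECTION_VALUES = {"title": 10, "preamble": 5, "body": 2}


def score_document(labeled_units, keyword):
    """Sum the section value of every sentence unit that contains the keyword.

    labeled_units is a list of (section, text) pairs; the return value plays
    the role of one document's entry in calculated value storage buffer 12.
    """
    total = 0
    for section, text in labeled_units:
        if keyword in text:  # plain substring match (an assumption)
            total += SECTION_VALUES[section]
    return total


# Hypothetical document: keyword in one title unit and one preamble unit.
document_1 = [
    ("title", "文書検索装置の提案"),
    ("title", "著者名"),
    ("preamble", "本稿では文書検索装置について述べる。"),
    ("body", "1. はじめに"),
]
score = score_document(document_1, "文書検索装置")  # 10 + 5
```

Each matching sentence contributes once, so a keyword that appears in several body sentences adds "2" for each of them.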

[0020] Reference numeral 13 denotes a search result output unit. The search result output unit 13 refers to the contents of the calculated value storage buffer 12, which holds the calculated value for each document, and displays the documents on the display unit 5 in descending order of calculated value, that is, in order of priority, as shown for example in FIG. 5.
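The ordering step of paragraph [0020] is a descending sort on the calculated values. In the sketch below the function name `rank_documents` and the buffer contents are hypothetical (the actual values of FIG. 4 are not reproduced in the text), and how ties or zero-valued documents are handled is an assumption, since the description does not say.

```python
def rank_documents(calculated_values):
    """Return document names in descending order of calculated value.

    Python's sort is stable, so documents with equal values keep their
    original (storage) order; the source does not specify tie handling.
    """
    return sorted(calculated_values, key=lambda name: calculated_values[name], reverse=True)


# Hypothetical buffer contents for illustration only.
buffer_12 = {"document 1": 15, "document 2": 2, "document 3": 10}
output_order = rank_documents(buffer_12)
```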

[0021] Next, the specific processing of the document search device configured as above is described with reference to the flowchart of FIG. 6.

[0022] First, the buffers and counters in the control unit 4 and the calculated value storage buffer 12 are initialized, and the user then enters a plurality of keywords, each a character string to search for, through the input unit 3 (steps S1 and S2).

[0023] When keyword entry is complete, the data of one text document is read from the external storage device 1, which holds a number of document data items, into the document data memory 2. Once one text document has been read, the ICHI flag, which holds the document division information, is reset and a text counter N (not shown) in the control unit 4 is set to "1" (steps S3 and S4).

[0024] Next, the first sentence delimited by a line feed code or a full stop, for example the first sentence 14 shown in FIG. 2 (the text document of FIG. 2 is used as the concrete example below), is read from the document data memory 2 through the control unit 4 into the document division unit 6 and the matching unit 10 (step S5). When the first sentence 14 is read, the ICHI flag has just been reset in step S4, so processing passes through steps S6 and S7, which check whether the ICHI flag indicates the body part 9 or the preamble part 8, and moves to step S8.

[0025] In step S8, because it is the first sentence 14 that has been read, the document division unit 6 classifies it as the title part 7. Based on this document division information, the matching unit 10 looks up the value "10" for the title part 7 in the per-section value table 11, and the ICHI flag is set to "10". Step S9 then checks whether the sentence that was read ends with a full stop; the first sentence 14 does not, so processing moves to step S10.

[0026] In step S10, the matching unit 10 uses string matching to check whether the sentence that was read contains the keyword entered through the input unit 3. If the keyword is the character string "文書検索装置" ("document search device"), the first sentence 14 contains that string, so processing moves to step S11.

[0027] In step S11, the matching unit 10 adds to the calculated value storage buffer 12. That is, the ICHI flag value "10" is added to file[1], the area for "document 1". After the addition, step S12 checks whether there is another sentence to read. In the example of FIG. 2 there is, so processing returns to step S5.

[0028] In step S5 the second sentence 15 is read and processed in the same way. Because sentence 15 does not contain the keyword "検索" ("search"), step S11 is skipped (that is, nothing is added to the "document 1" area), processing moves from step S10 to step S12, and then returns from step S12 to step S5.

[0029] Back at step S5, the third sentence 16 is read and processed in the same way. Because the third sentence 16 ends with a full stop, processing moves from step S9 to step S13.

[0030] In step S13, because the third sentence 16 follows the title part 7 and ends with a full stop, the document division unit 6 classifies it as the preamble part 8. Based on this document division information, the value "5" for the preamble part 8 is looked up in the per-section value table 11 through the matching unit 10, and the ICHI flag is set to "5". Once the ICHI flag has been set, processing moves to step S10.

[0031] In step S10, the third sentence 16 contains the character string "文書検索装置", so processing moves to step S11. In step S11 the addition to the calculated value storage buffer 12 is performed; because the ICHI flag is set to "5", "5" is added to the "document 1" area. After this addition the "document 1" area of the calculated value storage buffer 12 holds "15" (10 + 5). The calculated value storage buffer 12 holds one calculated value per document for all documents, and the output priority of the search results is decided from these values. When step S11 is finished, processing again returns from step S12 to step S5 to read the next sentence of the preamble part 8.

[0032] When a sentence of the preamble part 8 after the third sentence 16 is read, the processing from step S7 onward differs in part from the above. In step S7 the ICHI flag is found to be set to "5", the value for the preamble part 8, so processing moves to step S14 rather than step S8.

[0033] Step S14 checks whether the sentence that was read contains the character string 「はじめに」 ("Introduction"), which marks the body part 9. If it does not, processing moves to step S10 and the same processing as above is repeated.

[0034] If the sentence that was read is the first sentence 17 of the body part 9, processing proceeds to step S15. In step S15, because the sentence contains the phrase 「はじめに」, the document division unit 6 classifies it as the body part 9. Based on this document division information, the value "2" for the body part 9 is looked up in the per-section value table 11 through the matching unit 10, and the ICHI flag is set to "2". Once the ICHI flag has been set, processing moves to step S10.

[0035] Steps S10, S11, and S12 are then carried out as described above; if a sentence of the body part 9 contains the keyword, the value "2" held in the ICHI flag is added to the "document 1" area of the calculated value storage buffer 12. In the example text of FIG. 2 the body part 9 does not contain the keyword, so "2" is not added.

[0036] When a subsequent sentence of the body part 9 is read, the ICHI flag has already been set in step S15 to "2", the value for the body part 9, so processing skips the intermediate steps and moves from step S6 directly to step S10.

[0037] When the above has been repeated up to the last sentence of the body part 9, no sentences remain to be read and the search of "document 1" ends. At this point a calculated value such as that shown in FIG. 4 is stored in the "document 1" area of the calculated value storage buffer 12.

[0038] When no sentences remain to be read, processing returns from step S16 to step S3 to read the next document, for example "document 2".

[0039] Back at step S3, the text document "document 2" is read from the external storage device 1 into the document data memory 2 as before, and in step S4 the ICHI flag, which holds the document division information, is reset and the text counter N in the control unit 4 is set to "2". From step S5 onward the same processing as above is repeated.

[0040] When all text documents stored in the external storage device 1 have been searched in this way, that is, when the value of the text counter N equals the value in the document count buffer in the control unit 4, processing moves from step S16 to step S17.

[0041] In step S17 the search result output unit 13 is started and refers to the contents of the calculated value storage buffer 12. If the buffer holds, for example, the contents shown in FIG. 4, the search result output unit 13 displays the text documents on the display unit 5 in descending order of calculated value, in the order shown in FIG. 5.

[0042] As described above, documents with a high calculated value in the calculated value storage buffer 12 are treated as being close to what the user is looking for and are output first, so that documents are searched efficiently.
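Steps S1 to S17 as described above can be condensed into one loop. The sketch below is an interpretation, not the flowchart of FIG. 6 itself: it assumes a single keyword, plain substring matching, and documents already split into sentence units. The function name `search` and the sample documents are hypothetical. As in the description, the ICHI flag holds the section value itself (10, 5, or 2), which works only because the three values differ.

```python
TITLE, PREAMBLE, BODY = 10, 5, 2          # values described for value table 11
BODY_MARKERS = ("はじめに", "初めに")      # marker phrases named in the description


def search(documents, keyword):
    """Return (document number, calculated value) pairs in priority order."""
    file_values = [0] * len(documents)            # S1: initialise the buffer
    for n, units in enumerate(documents):         # S3, S16: one document at a time
        ichi = None                               # S4: reset the ICHI flag
        for unit in units:                        # S5, S12: one sentence unit at a time
            if ichi == BODY:                      # S6: already in the body
                pass
            elif ichi == PREAMBLE:                # S7: in the preamble
                if any(m in unit for m in BODY_MARKERS):   # S14
                    ichi = BODY                   # S15
            else:
                ichi = TITLE                      # S8
                if unit.endswith("。"):           # S9: sentence-final full stop
                    ichi = PREAMBLE               # S13
            if keyword in unit:                   # S10: string matching
                file_values[n] += ichi            # S11: add the flag value
    order = sorted(range(len(documents)), key=lambda n: file_values[n], reverse=True)
    return [(n + 1, file_values[n]) for n in order]   # S17: priority order


# Hypothetical documents; the text of FIG. 2 is not reproduced here.
doc_1 = ["文書検索装置の提案", "著者名", "本稿では文書検索装置について述べる。", "1. はじめに"]
doc_2 = ["議事録", "本日の議題を示す。", "はじめに", "文書検索装置の予算を検討する。"]
ranking = search([doc_2, doc_1], "文書検索装置")
```

With these inputs the second stored document is listed first, because its keyword hits fall in the title and preamble rather than the body.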

[0043] In the above embodiment the matching unit 10 both determines whether a sentence contains the keyword and adds to the calculated value storage buffer 12 based on the document division information and the position at which the keyword appears. The same effect is obtained if the matching unit 10 is split into character string matching means, which determines whether the keyword is contained, and priority calculation means, which adds to the calculated value storage buffer 12 based on the document division information and the position at which the keyword appears.
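The split described in paragraph [0043] can be pictured as two independent functions, one per means. This is a minimal sketch under assumptions: the names `contains_keyword` and `add_priority`, the dictionary-based buffer, and the sample data are hypothetical and are not taken from the patent.

```python
def contains_keyword(unit, keyword):
    """Character string matching means: is the keyword in this sentence unit?"""
    return keyword in unit


def add_priority(buffer, doc_id, section, values):
    """Priority calculation means: add the section's value to the document's total."""
    buffer[doc_id] = buffer.get(doc_id, 0) + values[section]
    return buffer


# Hypothetical usage with the values described for value table 11.
values = {"title": 10, "preamble": 5, "body": 2}
buffer_12 = {}
for section, unit in [("title", "文書検索装置の提案"), ("preamble", "概要を述べる。")]:
    if contains_keyword(unit, "文書検索装置"):
        add_priority(buffer_12, "document 1", section, values)
```

Keeping the two functions separate means the matching rule can change without touching the scoring, which is the point of the variation the paragraph describes.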

[0044] In the above embodiment the document is divided into the title part 7, the preamble part 8, and the body part 9, but the division is not limited to this. The number of divisions may be increased or reduced as desired, for example by adding a postscript part or by dividing only into the title part 7 and the body part 9.

[0045] In the above embodiment the document is divided in the order title part 7, preamble part 8, body part 9, but this is not a limitation; the order of division can of course be changed according to the technical field to which the document belongs.

[0046] The present invention is not limited to the above embodiment and can of course be modified in various ways without departing from its gist.

[0047]

[Effects of the Invention] As described in detail above, the document search device of the present invention estimates the importance of a keyword within a document from the position at which the keyword appears and assigns priorities to the output of search results. This makes document search efficient and, as a result, greatly reduces the effort a user needs to find the desired item in a document database, which is a substantial practical benefit.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing the configuration of a document search device according to one embodiment of the present invention.

[FIG. 2] A diagram showing an example of dividing one document into title, preamble, and body.

[FIG. 3] A diagram showing an example of the contents of the per-section value table, which stores the values used to prioritize search results.

[FIG. 4] A diagram showing an example of the contents of the calculated value storage buffer, which stores the calculated value for each document.

[FIG. 5] A diagram showing the output order of search results.

[FIG. 6] A flowchart outlining the processing flow.

[Explanation of symbols]

3: input unit (input means)
6: document division unit (document dividing means)
10: matching unit (character string matching means, priority calculation means)
13: search result output unit (search result output means)

Continuation of the front page

(56) References
JP-A-58-44536 (JP, A)
JP-A-2-224069 (JP, A)
JP-A-3-294963 (JP, A)
JP-A-2-44465 (JP, A)
JP-A-58-50071 (JP, A)
Kyo Kageura et al., "Full-text search system considering the logical structure of documents," Bulletin of the National Center for Science Information Systems, No. 3, pp. 49-58 (September 30, 1990)

(58) Fields searched (Int. Cl.6, DB name)
G06F 17/30

Claims

(57) [Claims]

A document search device comprising: an input unit for inputting a keyword; a document division unit for dividing a document into parts such as a title, a preamble, and a body; a character string matching unit for determining whether the keyword is contained in the document; a priority calculation unit for calculating the priority of a retrieved document based on the position in the document of the keyword found by the character string matching unit and on the document division information from the document division unit; and a search result output unit for outputting documents in order of the priority obtained by the priority calculation unit.