JPH0991311A

JPH0991311A - Information storage and retrieval device and its control method

Info

Publication number: JPH0991311A
Application number: JP7270735A
Authority: JP
Inventors: Satoru Yashiro; 哲八代
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1995-09-26
Filing date: 1995-09-26
Publication date: 1997-04-04

Abstract

PROBLEM TO BE SOLVED: To provide an information storage and retrieval device, and a control method therefor, that improve the efficiency of document registration and document retrieval by analyzing a document, informing the operator of words that appear to be new or coined words, registering them efficiently in a word dictionary, and generating an index. SOLUTION: Words registered in the word dictionary are extracted from an input document, and coined-word candidates are then extracted from the characters of the document other than the extracted words and registered in a new-word candidate list. When the operator instructs a new-word registration process, the new words registered in the new-word candidate list are displayed as a list of candidates. When the '○' or '×' mark in the 'choice' area R21 is specified by a mouse click or the like, the selection state is inverted on each click, which selects whether or not the new word is to be registered. The area R24 selects all new words written in katakana (the angular Japanese syllabary) at once, so that the selection state of the katakana new words can be set to '○' more easily than by clicking the '○' and '×' marks of the individual words in the area R21.

Description

Detailed Description of the Invention

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to an information storage and retrieval device that extracts words from document information using a word dictionary and builds an index for retrieval, and to a control method therefor.

【０００２】[0002]

[Prior Art] A conventional information storage and retrieval device improves document retrieval efficiency by extracting words from document information according to a word dictionary and creating an index, which is secondary information derived from the document information.

【０００３】[0003]

[Problem to Be Solved by the Invention] However, with the conventional information storage and retrieval device described above, new words and coined words contained in a document, such as product names, had to be newly added to the word dictionary when the operator registered the document, so the operator had to read through the whole document and check for new or coined words. Such words are often used as keywords when searching for documents, and if no index was created for them because their registration in the word dictionary was neglected, the retrieval capability of the system was significantly reduced.

[0004] The present invention has been made in view of the above problem. Its object is to provide an information storage and retrieval device, and a control method therefor, that can improve the efficiency of document registration and document retrieval by analyzing a document, informing the operator of words that appear to be new or coined words so that they can be registered in the word dictionary efficiently, and creating an index.

【０００５】[0005]

[Means for Solving the Problem] To achieve the above object, the information storage and retrieval device of the present invention, which comprises a word dictionary that stores known words, extracting means for extracting words stored in the word dictionary from document information, and index assigning means for assigning to each extracted word an index for retrieving that word, is characterized by further comprising: new-word determining means for determining, for the set of characters in the document information other than the extracted words, whether the characters belonging to the set are candidates for a new word; new-word candidate character storage means for storing the characters determined to be new-word candidates; new-word selecting means for selecting a new word from the stored new-word candidate characters in accordance with an instruction from the operator; and word dictionary registering means for registering the selected new word in the word dictionary as a known word.

[0006] Preferably, the new-word selecting means includes katakana character batch selecting means for selecting, in a single operation, the katakana characters among the stored new-word candidate characters.

[0007] The control method of the present invention is a method for controlling an information storage and retrieval device that comprises a word dictionary that stores known words, extracting means for extracting words stored in the word dictionary from document information, and index assigning means for assigning to each extracted word an index for retrieving that word. The method is characterized by: determining, for the set of characters in the document information other than the extracted words, whether the characters belonging to the set are candidates for a new word; storing the characters determined to be new-word candidates in new-word candidate character storage means; selecting a new word from the stored new-word candidate characters in accordance with an instruction from the operator; and registering the selected new word in the word dictionary as a known word.

【０００８】[0008]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0009] FIG. 1 is a block diagram showing the schematic configuration of an information storage and retrieval device according to an embodiment of the present invention.

[0010] In FIG. 1, reference numeral 1 denotes a CPU that controls the entire information storage and retrieval device of this embodiment. The CPU 1 is connected via a bus 8 to a display 2, a command input device 3, a ROM 4, a RAM 5, a secondary storage device 6, and a document input device 7, so that these components can exchange data with one another.

[0011] The display 2 is, for example, a CRT or a liquid crystal display, and shows the information being created, the available choices, and the like.

[0012] The command input device 3 consists of, for example, a keyboard and a mouse, and is used to issue commands for the various processes, to move the point of interest, to make selections, and to enter characters and symbols.

[0013] The ROM 4 is a read-only memory that stores the various processing programs executed by the CPU 1. One part of it, the document registration program section 4a, stores the programs and data used to register documents and to output new-word candidates.

[0014] The RAM 5 is a readable and writable memory that temporarily stores various calculation results and input information. One part of it, the new-word candidate list section 5a, is used as an area for temporarily storing the new-word candidate list output by the CPU 1.

[0015] The secondary storage device 6 is, for example, a hard disk or an MO (magneto-optical disk). It has a document information area 6a that stores the document information (the primary information), an index area 6b that stores the index (the secondary information), a word dictionary area 6c that stores the word dictionary, and so on.

[0016] The document input device 7 consists of, for example, an OCR document reader or communication equipment, and is used to input the documents to be stored in the information storage and retrieval device of this embodiment.

[0017] FIG. 2 shows an example of a document that is input from the document input device 7 and stored in the document information area 6a.

[0018] FIG. 3 shows a small part of the words stored in the word dictionary area 6c. As shown in the figure, the word dictionary area 6c stores known words, each with additional information indicating whether or not the word serves as a keyword for retrieval. When a keyword is extracted from a document, the storage location of that document and the position at which the keyword appears are stored in the index area 6b.
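The relationship between the keyword flag in the word dictionary and the index entries can be pictured with a minimal Python sketch. The dictionary contents, the `index_document` helper, and the `"doc-001"` location string are hypothetical; the patent does not specify the actual layout of areas 6b and 6c.

```python
from collections import defaultdict

# Hypothetical word dictionary: word -> True if it is a search keyword.
word_dictionary = {"情報": True, "検索": True, "する": False}

# Hypothetical index: keyword -> list of (document location, position).
index = defaultdict(list)

def index_document(doc_location, text):
    """Record every occurrence of each keyword found in text."""
    for word, is_keyword in word_dictionary.items():
        if not is_keyword:
            continue  # non-keyword words are known but not indexed
        start = text.find(word)
        while start != -1:
            index[word].append((doc_location, start))
            start = text.find(word, start + 1)

index_document("doc-001", "情報を検索する。情報を蓄積する。")
```

Only words flagged as keywords produce index entries; other dictionary words are recognized but leave the index untouched.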

[0019] FIG. 4 shows an example of the new-word candidate list after new-word candidates extracted from several documents have been added to it.

[0020] The control processing executed by the CPU 1 of the information storage and retrieval device configured as above is described below with reference to FIGS. 5 to 11.

[0021] FIG. 5 is a flowchart showing the procedure of the control processing executed by the CPU 1. The program represented by this flowchart is stored in the document registration program section 4a of the ROM 4 in FIG. 1.

[0022] When the operator turns on the power of the system, this control processing program is called. Processing first proceeds to step S1, where the necessary initialization is performed.

[0023] Next, in step S2, the program waits for a command to be input from the keyboard 3. When a command is input, processing proceeds to step S3.

[0024] In step S3, the command is evaluated and processing branches according to its content. If the command is to register a new document in the secondary storage device 6, processing proceeds to step S4.

[0025] In step S4, the document to be registered is input from the document input device 7. In step S5, the document is compared with the words in the word dictionary area 6c, and the words contained in the document are extracted. FIG. 8 illustrates the extracted words; in the figure, the character strings enclosed in parentheses are the extracted words. The extracted words output here are used to create the index in step S8, described later.
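The matching in step S5 could be sketched in Python as follows. The patent does not state which matching strategy is used, so the longest-match rule, the function name `extract_known_words`, and the sample dictionary are all assumptions made for illustration.

```python
def extract_known_words(text, dictionary):
    """Return (matches, leftovers).

    matches   -- list of (position, word) for dictionary words found
    leftovers -- substrings of text not covered by any match
    """
    max_len = max((len(w) for w in dictionary), default=0)
    matches, leftovers = [], []
    pending = ""  # unmatched characters collected so far
    i = 0
    while i < len(text):
        # Assumption: prefer the longest dictionary word starting at i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                if pending:
                    leftovers.append(pending)
                    pending = ""
                matches.append((i, candidate))
                i += length
                break
        else:
            # No dictionary word starts here; keep the character aside.
            pending += text[i]
            i += 1
    if pending:
        leftovers.append(pending)
    return matches, leftovers

known = {"新しい", "プリンタ", "を", "発売"}
matches, leftovers = extract_known_words("新しいプリンタABC123を発売", known)
```

The leftover substrings are the material that the new-word extraction of step S6 would go on to examine.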

[0026] Next, in step S6, the new-word extraction subroutine, which extracts new words, is executed.

[0027] FIG. 6 is a flowchart showing the detailed procedure of this new-word extraction subroutine, and FIG. 9 illustrates how new words are extracted by this processing. In FIG. 9, a slash "/" represents a word boundary. The new-word extraction processing is described in detail below using this flowchart.

[0028] In FIG. 6, first, in step S61, a pointer to the document is added to the new-word candidate list 5a. The new-word candidate list 5a behaves as a FIFO (first in, first out) queue: information is appended to the end of the list, so reading from the head of the list retrieves the information in the order in which it was added. When a new word is extracted as a candidate, no index has yet been created for that word. Storing the document pointer is one way of identifying, in a later operation, the documents that need to be indexed once the candidate has been confirmed as a new word.
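A minimal Python sketch of such a FIFO candidate list is shown below. The tagged-tuple entry format and the helper names `begin_document` and `add_candidate` are assumptions; the patent only says that document pointers and candidates are appended to the end of the list.

```python
from collections import deque

# Stands in for the new-word candidate list 5a (entry layout is assumed).
new_word_candidates = deque()

def begin_document(doc_pointer):
    """Step S61: append a pointer to the document being registered."""
    new_word_candidates.append(("doc", doc_pointer))

def add_candidate(word):
    """Append a new-word candidate found in the current document."""
    new_word_candidates.append(("word", word))

begin_document("doc-001")
add_candidate("ハイパーテキスト")
begin_document("doc-002")
add_candidate("BJC")

# Reading from the head returns entries in the order they were added.
in_order = list(new_word_candidates)
```

Because each document pointer precedes the candidates found in that document, a later pass over the queue can tell which documents were registered at or after the point where a given candidate appeared.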

[0029] Next, in step S62, the target is narrowed down to the character strings that remain after word extraction. These are the character strings that may contain new words.

[0030] Next, in step S63, word boundaries are inserted at whitespace characters and at the points where the character type changes, for example between kanji, hiragana, katakana, digits, alphabetic characters, and symbols.
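Splitting at character-type boundaries, as in step S63, might look like the following Python sketch. The Unicode ranges used to classify characters are a simplification chosen for illustration (for example, rarer kanji outside U+4E00 to U+9FFF are not covered), and the function names are hypothetical.

```python
from itertools import groupby

def char_type(ch):
    """Classify a character into a coarse type (simplified ranges)."""
    if ch.isspace():
        return "space"
    if "\u3040" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alphabet"
    return "symbol"

def split_by_char_type(text):
    """Cut text wherever the character type changes; drop whitespace."""
    segments = []
    for kind, group in groupby(text, key=char_type):
        if kind != "space":
            segments.append(("".join(group), kind))
    return segments

segments = split_by_char_type("新製品のレーザープリンタLBP 1995年")
```

`itertools.groupby` groups consecutive characters with the same type, which is exactly the "boundary where the type changes" rule; whitespace runs act as separators and are discarded.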

[0031] Next, in step S64, hiragana words, words of one character or fewer, and words consisting of digits are removed from the delimited words.

[0032] The new-word candidates extracted in this way are sorted in spelling order in step S65, duplicate words are removed in step S66, and the result is added to the new-word candidate list 5a in step S67.
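Steps S64 to S66 can be sketched in Python as a filter followed by a sort and a duplicate-removal pass. The input format (a list of `(word, kind)` pairs), the function name `select_candidates`, and the use of code-point order as the "spelling order" are assumptions for illustration.

```python
def select_candidates(segments):
    """Filter, sort, and de-duplicate (word, kind) segments."""
    # S64: drop hiragana words, digit-only words, and words of one
    # character or fewer.
    kept = [
        word for word, kind in segments
        if kind not in ("hiragana", "digit") and len(word) > 1
    ]
    # S65: sort in spelling order (assumed here to be code-point order).
    kept.sort()
    # S66: after sorting, duplicates are adjacent, so one pass removes them.
    unique = []
    for word in kept:
        if not unique or unique[-1] != word:
            unique.append(word)
    return unique

candidates = select_candidates([
    ("プリンタ", "katakana"),
    ("の", "hiragana"),
    ("1995", "digit"),
    ("年", "kanji"),
    ("LBP", "alphabet"),
    ("プリンタ", "katakana"),
])
```

Sorting before removing duplicates is what makes the single adjacent-comparison pass sufficient; the caller would then append the result to the candidate list as in step S67.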

[0033] Returning to FIG. 5, when the new-word extraction processing of step S6 is finished, processing proceeds to step S7, where the document information is stored in the primary information area 6a (the document information area) in the secondary storage device 6.

[0034] Processing then proceeds to step S8, where an index that associates the words extracted in step S5 with the registration location of the document is built, and the processing ends.

[0035] When the processing of step S8 is finished, processing returns to the command wait state of step S2.

[0036] If the command evaluation in step S3 finds a new-word registration command, processing proceeds to step S9, where the new-word registration subroutine, which registers new words, is executed.

[0037] FIG. 7 is a flowchart showing the detailed procedure of this new-word registration subroutine. The new-word registration processing is described in detail below using this flowchart.

[0038] In FIG. 7, first, in step S91, it is checked whether the new-word candidate list 5a has any content. If the new-word candidate list 5a has just been cleared, processing proceeds to step S92 as in the conventional device, and the operator enters the word to be registered.

[0039] FIG. 10 shows an example of the input screen, a GUI (graphical user interface) of a window system, that is displayed on the display 2 in step S92. Input and execution are performed by specifying areas of the window from the command input device 3 with a mouse click or the like. In the figure, area R11 is the area in which the word to be registered is entered. Specifying area R12 confirms the word entered in area R11 and advances processing to the next step.

[0040] Returning to FIG. 7, processing proceeds to step S93, where the word is registered in the word dictionary.

[0041] If the determination in step S91 finds that the new-word candidate list 5a has content, processing proceeds to step S94.

[0042] In step S94, the words to be registered are displayed from the new-word candidate list 5a, and the operator is asked whether or not to register them. The operator only has to specify, for each word, whether or not it is to be registered.

[0043] FIG. 11 shows an example of the input screen, a GUI of a window system, that is displayed on the display 2 in step S94. Input and execution are performed by specifying areas of the window from the command input device 3 with a mouse click or the like. In FIG. 11, area R21 shows, with a ○ or × mark, whether or not each new word is to be registered; specifying a mark with a mouse click or the like inverts its selection state. Areas R22 and R23 are scroll buttons used to move the displayed range when the new words do not fit in the display area: pressing area R22 moves the displayed range upward, and pressing area R23 moves it downward. Area R24 selects all katakana new words at once, so that their selection state can be set to ○ more easily than by clicking the ○/× mark of each word in area R21. Area R25 selects all new words in the list at once, so that the selection state of every new word in the list can be set to ○ more easily than by clicking the ○/× mark of each word in area R21. Specifying area R26 deletes from the new-word candidate list 5a the words whose selection state in area R21 is ×, and advances processing to the next step.
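The selection behavior of areas R21, R24, R25, and R26 can be modeled without any GUI code, as in the Python sketch below. The class name `CandidateSelection`, the initial state of × for every word, and the katakana test based on the Unicode block U+30A0 to U+30FF are assumptions; the patent does not state the default selection state or how katakana words are recognized.

```python
class CandidateSelection:
    """Tracks the ○/× state of each new-word candidate (True means ○)."""

    def __init__(self, words):
        # Assumption: every candidate starts in the × (unselected) state.
        self.selected = {word: False for word in words}

    def toggle(self, word):
        """Area R21: each click inverts the state of one word."""
        self.selected[word] = not self.selected[word]

    def select_all_katakana(self):
        """Area R24: set every all-katakana word to ○."""
        for word in self.selected:
            if all("\u30a0" <= ch <= "\u30ff" for ch in word):
                self.selected[word] = True

    def select_all(self):
        """Area R25: set every word in the list to ○."""
        for word in self.selected:
            self.selected[word] = True

    def confirm(self):
        """Area R26: drop the × words and return the words to register."""
        return [w for w, chosen in self.selected.items() if chosen]

selection = CandidateSelection(["プリンタ", "LBP", "新製品"])
selection.select_all_katakana()
selection.toggle("LBP")
to_register = selection.confirm()
```

The batch operations only ever set marks to ○, matching the description of R24 and R25, while a click on an individual mark flips it in either direction.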

[0044] Returning to FIG. 7, when the selection work is finished, processing proceeds to step S95, where the selected words are registered in the word dictionary.

[0045] Next, processing proceeds to step S96, where the index of the related documents is created. The related documents are identified from the document pointer information added to the new-word candidate list 5a in step S3. The document from which the new word was extracted, and every document added after it, is checked to see whether it contains the new word; if it does, the word is added to the index.
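The re-indexing of step S96 could be sketched in Python as a scan over the documents named by the stored pointers. The function name `reindex_new_words`, the mapping from pointer to document text, and the index layout are hypothetical stand-ins for the document information area 6a and the index area 6b.

```python
def reindex_new_words(new_words, doc_pointers, documents, index):
    """Add index entries for newly registered words.

    new_words    -- words just added to the word dictionary
    doc_pointers -- pointers taken from the candidate list, in FIFO order
    documents    -- mapping from pointer to stored document text (assumed)
    index        -- mapping from word to list of (pointer, position)
    """
    for pointer in doc_pointers:
        text = documents[pointer]
        for word in new_words:
            start = text.find(word)
            while start != -1:
                index.setdefault(word, []).append((pointer, start))
                start = text.find(word, start + 1)
    return index

documents = {"doc-001": "LBPを発売", "doc-002": "新型LBPとLBP用紙"}
index = reindex_new_words(["LBP"], ["doc-001", "doc-002"], documents, {})
```

Only the documents recorded in the candidate list are rescanned, which is the point of storing the pointers: documents registered before the candidate first appeared need no second pass.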

[0046] Next, processing proceeds to step S97, where the new-word candidate list 5a, whose processing is now complete, is cleared, and the processing ends.

[0047] Returning to FIG. 5, when the processing of step S9 is finished, processing returns to the command wait state of step S2.

[0048] If the command evaluation in step S3 finds any other command, processing proceeds to step S10, where the processing required for that command is performed, and then returns to the command wait state of step S2.
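The overall control flow of FIG. 5 (wait for a command, branch on it, return to waiting) can be summarized in a short Python sketch. The command names, the handler table, and the `run_command_loop` function are hypothetical; the handlers here only return labels for the steps they would perform.

```python
def run_command_loop(commands, handlers, default_handler):
    """Dispatch each command to its handler, then wait for the next one."""
    log = []
    for command in commands:  # S2: obtain the next command
        # S3: evaluate the command and branch; unknown commands go to S10.
        handler = handlers.get(command, default_handler)
        log.append(handler(command))
    return log

handlers = {
    "register_document": lambda command: "S4-S8",
    "register_new_words": lambda command: "S9",
}
log = run_command_loop(
    ["register_document", "print", "register_new_words"],
    handlers,
    lambda command: "S10",
)
```

A lookup table with a default handler mirrors the three-way branch of step S3 and keeps the loop itself independent of the individual commands.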

[0049] As described above, in this embodiment the input document is analyzed; characters not found in the word dictionary, that is, words that appear to be new or coined words, are registered in the new-word candidate list 5a; the registered words are displayed as a list of new-word candidates; and the operator registers the desired new words in the word dictionary by specifying them from the displayed candidates with a mouse click or the like, whereupon an index is assigned to each new word. New words can therefore be registered in the word dictionary efficiently, and creating the index improves the efficiency of document retrieval.

[0050] In addition, because new words written in katakana can be selected all at once during new-word registration, new words can be registered even more easily.

[0051] The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the present invention can also be applied where it is achieved by supplying a system or a program. In that case, the system or apparatus can obtain the effects of the present invention by reading a storage medium that stores a program, represented by software, for achieving the present invention.

【００５２】[0052]

[Effects of the Invention] As described above, according to the present invention, for the set of characters in the document information other than the words extracted by the extracting means, it is determined whether the characters belonging to the set are candidates for a new word; the characters determined to be new-word candidates are stored in the new-word candidate character storage means; a new word is selected from the stored new-word candidate characters in accordance with an instruction from the operator; and the selected new word is registered in the word dictionary as a known word. Registration in the word dictionary can therefore be performed efficiently. Furthermore, when a new word is registered in the word dictionary, an index is assigned to it, so the efficiency of document retrieval can be improved.

[0053] In addition, because the katakana characters among the stored new-word candidate characters can be selected all at once when selecting new words, new words can be registered even more easily.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of an information storage and retrieval device according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of a document input from the document input device of FIG. 1 and stored in the document information area.

FIG. 3 is a diagram showing a small part of the words stored in the word dictionary area of FIG. 1.

FIG. 4 is a diagram showing an example of the new-word candidate list after new-word candidates extracted from several documents have been added to it.

FIG. 5 is a flowchart showing the procedure of the control processing executed by the CPU of FIG. 1.

FIG. 6 is a flowchart showing the detailed procedure of the new-word extraction subroutine of step S6 in FIG. 5.

FIG. 7 is a flowchart showing the detailed procedure of the new-word registration subroutine of step S9 in FIG. 5.

FIG. 8 is a diagram illustrating the words extracted in step S5 of FIG. 5.

FIG. 9 is a diagram illustrating how new words are extracted by the new-word extraction processing of FIG. 6.

FIG. 10 is a diagram showing an example of the GUI input screen displayed on the display of FIG. 1 in step S92 of FIG. 7.

FIG. 11 is a diagram showing an example of the GUI input screen displayed on the display of FIG. 1 in step S94 of FIG. 7.

[Explanation of symbols]

1 CPU (extracting means, index assigning means, new-word determining means, new-word selecting means, word dictionary registering means, katakana character batch selecting means)
2 Display (new-word selecting means, katakana character batch selecting means)
3 Command input device (new-word selecting means, katakana character batch selecting means)
5a New-word candidate list (new-word candidate character storage means)
6c Word dictionary

Claims

[Claims]

1. An information storage and retrieval device comprising a word dictionary that stores known words, extracting means for extracting words stored in the word dictionary from document information, and index assigning means for assigning to each extracted word an index for retrieving that word, the device further comprising: new-word determining means for determining, for the set of characters in the document information other than the extracted words, whether the characters belonging to the set are candidates for a new word; new-word candidate character storage means for storing the characters determined to be new-word candidates; new-word selecting means for selecting a new word from the stored new-word candidate characters in accordance with an instruction from an operator; and word dictionary registering means for registering the selected new word in the word dictionary as a known word.

2. The information storage and retrieval device according to claim 1, wherein the new-word selecting means includes katakana character batch selecting means for selecting, in a single operation, the katakana characters among the stored new-word candidate characters.

3. A method for controlling an information storage and retrieval device comprising a word dictionary that stores known words, extracting means for extracting words stored in the word dictionary from document information, and index assigning means for assigning to each extracted word an index for retrieving that word, the method comprising: determining, for the set of characters in the document information other than the extracted words, whether the characters belonging to the set are candidates for a new word; storing the characters determined to be new-word candidates in new-word candidate character storage means; selecting a new word from the stored new-word candidate characters in accordance with an instruction from an operator; and registering the selected new word in the word dictionary as a known word.