JPH10187699A

JPH10187699A - Document processor and its method

Info

Publication number: JPH10187699A
Application number: JP8348598A
Authority: JP
Inventors: Masaru Washikita; 賢鷲北
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1996-12-26
Filing date: 1996-12-26
Publication date: 1998-07-21

Abstract

PROBLEM TO BE SOLVED: To estimate and automatically register the semantic classification of an unknown word in a document processor that converts an input character string while referring, as appropriate, to a dictionary containing the semantic classifications of words. SOLUTION: A program module 252 has program modules 261 to 264 as subroutines. The module 261 extracts an unknown word and its postpositional particle; the module 262 extracts a verb that has a dependency relation with the extracted unknown word; the module 263 retrieves, from a verb valence dictionary (a dictionary that stores, for each verb, the semantic classifications of the nouns that can accompany it together with their postpositional particles), the verb valence patterns corresponding to the extracted verb and estimates the semantic classification of the unknown word from the pattern that matches the particle of the unknown word; and the module 264 registers the semantic classification of the unknown word.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document processing apparatus and method, and more particularly to a document processing apparatus and method that create mixed kanji-kana text by means of a kana-kanji conversion mechanism.

[0002]

[Description of the Related Art] Conventionally, document processing apparatuses such as Japanese word processors have generally accepted mixed kanji-kana text through a kana-kanji conversion mechanism. The kana-kanji conversion mechanism converts an input reading sequence into kanji by referring to a kana-kanji dictionary, in which part-of-speech information is recorded for each word. The mechanism analyzes the reading sequence to create possible phrase (bunsetsu) candidates, combines them to determine conversion candidates, and presents the candidates in order of likelihood. The operator then selects the desired candidate from those presented.

[0003] For example, when the reading sequence 「わたしはねこをかっている」 (watashi wa neko o katte iru) is input, the kana-kanji conversion mechanism creates phrase candidates such as 「和」, 「棉」, 「私は」, 「他誌」, 「他誌は」, 「寝」, 「猫」, 「猫を」, 「子」, 「子を」, 「買っている」, 「飼っている」, 「勝っている」, 「勝手」, 「居る」, and 「入る」, and presents the combined conversion candidates 「私は猫を買っている」 ("I am buying a cat") and 「私は猫を飼っている」 ("I keep a cat") in this order. Because 「買う」 (to buy) is generally more frequent than 「飼う」 (to keep), 「私は猫を飼っている」 is rarely presented as the first candidate.

[0004] To raise conversion efficiency, a conversion method that uses verb valence has been proposed. This method treats the dependency relation between the semantic classification of a noun and a verb in a sentence as a valence pattern (a patterned relation between a verb and a word having a given semantic classification). When a reading sequence is converted, the dependency relations among the words it contains are examined and compared with the valence patterns; if a matching valence pattern exists, the conversion candidate that follows that pattern is presented preferentially.

[0005] For example, when the valence patterns 「＜物＞を買う」 ("buy <thing>") and 「＜動物＞を飼う」 ("keep <animal>") are registered in the verb valence dictionary, then for the input 「ねこをかう」 (neko o kau), 「猫を飼う」 ("keep a cat") is presented as the first candidate because of the relation 「猫」 = <animal>.
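
The preference described in paragraph [0005] can be illustrated with a minimal Python sketch. The dictionary contents, the names WORD_CLASSES, VALENCE_PATTERNS and rank_verbs, and the tuple layout are assumptions made for illustration only; the specification does not prescribe a data format.

```python
# Hypothetical miniature dictionaries; contents are illustrative only.
WORD_CLASSES = {"猫": {"<animal>"}, "本": {"<thing>"}}

# Each pattern: (verb, particle, semantic class required of the noun).
VALENCE_PATTERNS = [
    ("買う", "を", "<thing>"),
    ("飼う", "を", "<animal>"),
]


def rank_verbs(noun, particle, candidate_verbs):
    """Move verbs whose valence pattern fits the noun to the front."""
    classes = WORD_CLASSES.get(noun, set())

    def matches(verb):
        return any(v == verb and p == particle and c in classes
                   for v, p, c in VALENCE_PATTERNS)

    # sorted() is stable, so verbs without a matching pattern keep their
    # original (for example, frequency-based) order.
    return sorted(candidate_verbs, key=lambda verb: not matches(verb))


print(rank_verbs("猫", "を", ["買う", "飼う"]))  # ['飼う', '買う']
```

A noun that is missing from WORD_CLASSES leaves the candidate order unchanged, which corresponds to the unknown-word problem that the invention addresses.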

[0006]

[Problems to Be Solved by the Invention] In the conventional example above, however, the semantic classification of an unknown word that is not registered in the dictionary is unknown (for instance, when it is not known that 「猫」 = <animal> in the example above), so no comparison with the valence patterns can be made. Consequently, when the input contains an unknown word, the kana-kanji conversion mechanism cannot present an appropriate conversion candidate as the first candidate. To obtain appropriate conversion candidates, the operator therefore had to register the semantic classification of the unknown word along with its reading, notation, and part of speech.

[0007] The present invention has been made in view of the above problem, and its object is to provide a document processing apparatus and method that estimate the semantic classification of an unknown word and register it automatically.

[0008]

[Means for Solving the Problems] To solve the above problem, a document processing apparatus according to the present invention converts an input character string while referring, as appropriate, to a dictionary containing the semantic classifications of words, and comprises: unknown word extracting means for extracting, from the converted character string, an unknown word whose semantic classification is unknown; dependency word extracting means for extracting a dependency word that has a dependency relation with the extracted unknown word; semantic classification estimating means for estimating the semantic classification of the unknown word from the relation between the particle of the unknown word and the dependency word; and unknown word registering means for registering the estimated semantic classification in the dictionary together with the unknown word.

[0009] According to a preferred embodiment of the present invention, for example, the dependency word is a verb, and the semantic classification estimating means preferably has valence dictionary means that stores, for each verb, the semantic classification of at least one word that can have a dependency relation with the verb and the particle attached to a word having that semantic classification, and estimates the semantic classification of the unknown word from the particle of the unknown word by referring to the valence dictionary means.

[0010] A document processing apparatus according to the present invention also operates on the basis of a program on a memory medium and converts an input character string while referring, as appropriate, to a dictionary containing the semantic classifications of words, wherein the memory medium comprises: procedure code for an unknown word extracting step of extracting, from the converted character string, an unknown word whose semantic classification is unknown; procedure code for a dependency word extracting step of extracting a dependency word that has a dependency relation with the extracted unknown word; procedure code for a semantic classification estimating step of estimating the semantic classification of the unknown word from the relation between the particle of the unknown word and the dependency word; and procedure code for an unknown word registering step of registering the estimated semantic classification in the dictionary together with the unknown word.

[0011] A document processing method according to the present invention converts an input character string while referring, as appropriate, to a dictionary containing the semantic classifications of words, and comprises: an unknown word extracting step of extracting, from the converted character string, an unknown word whose semantic classification is unknown; a dependency word extracting step of extracting a dependency word that has a dependency relation with the extracted unknown word; a semantic classification estimating step of estimating the semantic classification of the unknown word from the relation between the particle of the unknown word and the dependency word; and an unknown word registering step of registering the estimated semantic classification in the dictionary together with the unknown word.

[0012] According to a preferred embodiment of the present invention, for example, the dependency word is a verb, and the semantic classification estimating step preferably refers to a valence dictionary that stores, for each verb, the semantic classification of at least one word that can have a dependency relation with the verb and the particle attached to a word having that semantic classification, and estimates the semantic classification of the unknown word from the particle of the unknown word.

[0013]

[Embodiments of the Invention] An example of an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the system configuration of the document processing apparatus according to this embodiment. The CPU 110 is a microprocessor that performs computation, control, and the like for character processing, and controls each device via a CPU bus consisting of a control bus (CB) 211, a data bus (DB) 212, and an address bus (AB) 213. The CB 211 is a control bus carrying a read signal, a write signal, an interrupt signal, and the like. The DB 212 is a data bus that transfers data between the CPU 110 and each device. The AB 213 is an address bus that specifies the device to be controlled by the CPU 110.

[0014] The ROM 120 is a read-only memory and, depending on the embodiment, stores the programs to be supplied to the CPU 110. These programs include an operating system, the document processing program, and the like. The document processing program is described in detail later.

[0015] The RAM 130 is a random access memory that serves as the work memory of the CPU 110 and has the following areas: IBUF 131, OBUF 132, DIC 133, KDIC 134, TBUF 135, and a program load area 136. The IBUF 131 is an input buffer that temporarily stores key input data (the reading sequence). The OBUF 132 is an output buffer that temporarily stores the result of kana-kanji conversion. The DIC 133 is the word dictionary used for kana-kanji conversion. The KDIC 134 is the verb valence dictionary, which records the valence patterns of verbs. The TBUF 135 is a text buffer that stores the document being edited. The load area 136 is an area into which, depending on the embodiment, the document processing program and the like stored on the DISK 140 are loaded so that they can be supplied to the CPU 110.

[0016] The KB 150 is a keyboard, including a keyboard interface. In addition to character and symbol input keys such as alphabet keys, hiragana keys, and katakana keys, it has various function keys for giving instructions to the document processing apparatus, such as a conversion key, a confirmation key that confirms a conversion, a copy key, and a delete key.

[0017] The DISK 140 is an external storage unit including a hard disk, a floppy disk, and the like. It stores document data edited in the text buffer (TBUF) 135 and similar data. In addition, the document processing program described later can be stored on a floppy disk or the like and transferred from there to the load area 136 of the RAM 130. Stored document data can be read into the text buffer (TBUF) 135 and used as needed. When the document processing program is supplied from the DISK 140, there is no need to store a duplicate of it in the ROM 120.

[0018] The cursor register (CR) 160 holds address information indicating the position of the cursor displayed on the CRT 190. The CPU 110 can change the cursor position by rewriting the address information in the cursor register (CR) 160.

[0019] The display buffer memory (DBUF) 170 holds the data to be displayed on the CRT 190 (excluding the cursor). The CPU 110 expands a bitmap image in the display buffer memory (DBUF) 170 while referring to the character generator (CG) 200 and the like.

[0020] The CRT controller (CRTC) 180 supplies an image signal to the CRT 190, the display device, on the basis of the information supplied from the cursor register (CR) 160 and the display buffer memory (DBUF) 170, and also controls the CRT 190 by supplying synchronization signals and the like. The CRT 190 is a display device using a cathode ray tube or the like. The character generator (CG) 200 generates or holds the pattern data of the characters and symbols displayed on the CRT 190.

[0021] In the above configuration, when the operator presses a key on the keyboard (KB) 150, the keyboard (KB) 150 issues an interrupt request to the CPU 110. Execution by the CPU 110 then moves from the main routine to the interrupt routine that handles key input, namely the key input determination routine described later.

[0022] Next, the configuration of the document processing program in this embodiment is described. FIG. 2 is a diagram showing a conceptual configuration example of the document processing program, which, as noted above, can be stored in the ROM 120 or on the DISK 140.

[0023] Reference numeral 250 denotes a program module for key input determination processing, which determines the key input and performs the corresponding processing; it has program modules 251 to 253 as subroutines. 251 is a program module that performs kana-kanji conversion of the input characters; 252 is a program module for confirmation processing, which performs learning when a kana-kanji conversion is confirmed; and 253 is a program module for editing processing according to ordinary techniques, such as moving, deleting, inserting, and copying text.

[0024] The program module 252 has program modules 261 to 264 as subroutines. 261 is a program module that extracts an unknown word and its particle; 262 is a program module that extracts a verb having a dependency relation with the extracted unknown word; 263 is a program module that retrieves the verb valence patterns corresponding to the extracted verb from the verb valence dictionary (KDIC) 134 and estimates the semantic classification of the unknown word from the verb valence pattern that matches the particle of the unknown word; and 264 is a program module that registers the semantic classification of the unknown word.

[0025] FIG. 3 is a diagram showing a conceptual configuration example of the input buffer (IBUF) 131 and the output buffer (OBUF) 132. In both buffers the first two bytes hold the buffer size information, a value obtained by subtracting 1 from the number of characters stored in the buffer and doubling the result. The 「／／」 at the end of the input buffer (IBUF) 131 indicates that the conversion key was pressed at that point. Each character occupies two bytes and is stored in, for example, JIS X 0208 code.

[0026] FIG. 4 is a diagram showing a conceptual configuration example of the word dictionary (DIC) 133. As shown, the word dictionary (DIC) 133 consists of the fields "reading", "notation", "part of speech", "word likelihood", and "semantic classification".

[0027] "Reading" stores the reading of the word, "notation" its written form, and "part of speech" its part of speech. "Word likelihood" stores information indicating how plausible the word itself is, such as usage frequency, as a value from 1 to 5. A likelihood value of 5 means most likely, and smaller values are interpreted as increasingly doubtful. A likelihood value of 0 would mean that the word is entirely inconceivable, so it does not occur as a word likelihood value. "Semantic classification" stores classifications of meaning such as <organization>, <animal>, <food>, and <human> (in general, more than one). A semantic classification is recorded when the word is a noun.
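
As a rough illustration of the record layout in paragraphs [0026] and [0027], one word dictionary entry might be modeled as follows. This is a sketch under assumptions: the class name WordEntry, the English field names, and the choice to reject semantic classifications on non-nouns are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """One record of the word dictionary (field names are hypothetical)."""
    reading: str
    notation: str
    part_of_speech: str
    likelihood: int  # 1 (doubtful) .. 5 (most likely); 0 never occurs
    semantic_classes: set = field(default_factory=set)

    def __post_init__(self):
        if not 1 <= self.likelihood <= 5:
            raise ValueError("word likelihood must be between 1 and 5")
        # Assumption: classifications are recorded for nouns only.
        if self.semantic_classes and self.part_of_speech != "noun":
            raise ValueError("semantic classes are recorded for nouns")


cat = WordEntry("ねこ", "猫", "noun", 4, {"<animal>"})
print(cat.notation, sorted(cat.semantic_classes))
```

An entry created without semantic_classes stands for a word whose semantic classification is still unknown.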

[0028] FIG. 5 is a diagram showing a conceptual configuration example of the verb valence dictionary (KDIC) 134. As shown, the verb valence dictionary (KDIC) 134 consists of a "verb" field and the fields "first slot" to "fifth slot".

[0029] The "verb" field stores the headword verb. "First slot" to "fifth slot" store the verb valence patterns of that verb. Each slot stores the "semantic classification" of a noun and a "particle", together with a "likelihood" indicating how plausible that slot is within the verb valence pattern. The two verb valence patterns shown in the figure are 「＜人間＞は＜動物＞を飼う」 ("<human> keeps <animal>") and 「＜人間＞は＜物＞を買う」 ("<human> buys <thing>").

[0030] When the reading sequence entered in the input buffer (IBUF) 131 contains an unknown word, the semantic classification of that word is naturally unknown, so comparing the reading sequence with the verb valence patterns cannot yield an appropriate conversion candidate. To obtain an appropriate candidate the next time a similar reading sequence is input, the semantic classification must be learned.

[0031] In this embodiment, when a reading sequence containing an unknown word is input (that is, when the conversion result contains an unknown word), the semantic classification of the unknown word is learned automatically from the conversion result produced under the operator's instructions.

[0032] <Principle of Learning Semantic Classifications> FIG. 6 is a diagram showing the principle of learning semantic classifications. As an example, consider the case where the reading sequence 「わたしはねずみをかっている」 (watashi wa nezumi o katte iru) is entered in the input buffer (IBUF) 131 and the semantic classification of the word 「鼠」 (nezumi, "mouse") is unknown.

[0033] In this example, if the conversion result finally confirmed by the operator is 「私は鼠を飼っている」 ("I keep a mouse"), attention is paid to the particle 「を」 attached to the unknown word 「鼠」, and the verb valence dictionary (KDIC) 134 is searched for the particle 「を」 in the verb valence patterns of the verb 「飼う」 (to keep) in the sentence. If a slot matching this condition exists, the correspondence with the verb valence pattern identifies the semantic classification of the unknown word 「鼠」 as <animal>. When this semantic classification is registered in the word dictionary (DIC) 133 together with the reading, notation, and so on, then the next time a reading sequence containing 「ねずみ」 is input, 「ねずみ」 is already known to be <animal>, and kana-kanji conversion can be performed efficiently.
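
The learning principle of paragraph [0033] can be sketched in a few lines of Python. The KDIC contents, the function names estimate_classes and learn, and the dictionary layout are hypothetical; the sketch only shows a lookup by verb and particle followed by registration.

```python
# Hypothetical verb valence dictionary: verb -> [(semantic class, particle)].
KDIC = {
    "飼う": [("<human>", "は"), ("<animal>", "を")],
    "買う": [("<human>", "は"), ("<thing>", "を")],
}


def estimate_classes(verb, particle):
    """Return the classes of every slot of `verb` that carries `particle`."""
    return {cls for cls, p in KDIC.get(verb, []) if p == particle}


def learn(word_dict, reading, notation, particle, verb):
    """Register the unknown word if a semantic class can be estimated."""
    classes = estimate_classes(verb, particle)
    if classes:
        word_dict[notation] = {
            "reading": reading,
            "part_of_speech": "noun",
            "classes": classes,
        }
    return classes


word_dict = {}
print(learn(word_dict, "ねずみ", "鼠", "を", "飼う"))  # {'<animal>'}
```

If no slot of the verb carries the particle, nothing is registered and the word remains unknown.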

[0034] An operation example of the document processing apparatus is described below with reference to flowcharts.

[0035] <Operation Example of the Key Input Determination Routine> FIG. 7 is a flowchart showing the flow of processing in the key input determination routine. Control passes to this routine when a key is pressed on the keyboard (KB) 150 and the keyboard (KB) 150 issues an interrupt request to the CPU 110.

[0036] First, in step S701, the key input from the keyboard (KB) 150 is read. In step S702, the type of the input key is determined and processing branches on the result: if the input key is the conversion key (the 「／／」 mentioned above), processing proceeds to step S703; if it is the confirmation key, to step S704; and if it is any other key (a key other than the conversion key and the confirmation key), such as the copy key or the delete key, to step S705.

[0037] Step S703 (conversion processing) and step S704 (confirmation processing) are described later. In step S705, editing processing according to ordinary techniques, such as moving, deleting, inserting, or copying text, is performed on the basis of the key input. When the processing of step S703, S704, or S705 is finished, control returns to the main routine.
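
The branch made by the key input determination routine can be pictured with the following sketch. The key codes and the handler callables are placeholders invented for this example, not identifiers from the specification.

```python
CONVERT_KEY = "convert"  # hypothetical key codes
CONFIRM_KEY = "confirm"


def handle_key(key, convert, confirm, edit):
    """Branch on the type of the key that was read."""
    if key == CONVERT_KEY:
        return convert()  # conversion processing (step S703)
    if key == CONFIRM_KEY:
        return confirm()  # confirmation processing (step S704)
    return edit(key)      # ordinary editing (step S705)


log = [
    handle_key(k, lambda: "S703", lambda: "S704", lambda key: f"S705:{key}")
    for k in ("convert", "confirm", "copy")
]
print(log)  # ['S703', 'S704', 'S705:copy']
```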

[0038] <Operation Example of the Conversion Processing Routine> FIG. 8 is a flowchart showing the flow of the conversion processing in step S703. In step S801, the reading string stored in the input buffer (IBUF) 131 is analyzed and a group of phrase candidates is created. For example, when the reading sequence 「わたしはねこをかっている」 is input, the phrase candidates are, for example, 「和」, 「棉」, 「私は」, 「他誌」, 「他誌は」, 「寝」, 「猫」, 「猫を」, 「子」, 「子を」, 「買っている」, 「飼っている」, 「勝っている」, 「勝手」, 「居る」, and 「入る」.

[0039] In step S802, first candidate determination processing is performed to determine the first conversion candidate from the phrase candidate group created. In step S803, a conversion result is created from the determined first candidate and output to the output buffer (OBUF) 132, and control returns to the key input determination routine.

[0040] FIG. 9 is a flowchart showing the flow of the first candidate determination processing in step S802. First, in step S901, the maximum sentence likelihood is initialized to, for example, the smallest value the processing allows (for example, -32767).

[0041] In step S902, one phrase candidate sequence (in the example above, 「私は猫を飼っている」, 「私は猫を買っている」, and so on) is taken from the phrase candidate group created in step S801 (phrase candidate creation processing).

[0042] In step S903, the verb valence patterns applicable to the extracted phrase candidate sequence are retrieved from the verb valence dictionary (KDIC) 134, and the "verb valence likelihood sum", the sum of the likelihoods contained in those patterns, is calculated (details are given later).

[0043] In step S904, the sentence likelihood, which expresses how plausible the phrase candidate sequence under consideration is, is calculated from the usual phrase likelihood, inter-phrase likelihood, and example likelihood together with the verb valence likelihood sum described above.

[0044] In step S905, it is determined whether the sentence likelihood calculated in step S904 (the current sentence likelihood) is greater than the maximum sentence likelihood, that is, more plausible. If it is, processing proceeds to step S906; if not, processing proceeds directly to step S908.

[0045] In step S906, the maximum sentence likelihood is updated to the current sentence likelihood calculated in step S904, and in step S907 the phrase candidate sequence under consideration is stored (replacing any sequence already stored).

[0046] In step S908, it is determined whether another phrase candidate sequence remains. If one does, processing returns to step S902 and the above processing is repeated for that sequence, so that the phrase candidate sequence with the greatest sentence likelihood is found. When the above processing has been completed for all phrase candidate sequences, control returns to the conversion processing routine.
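
The loop of steps S901 to S908 amounts to a running-maximum search. A minimal sketch follows, assuming a caller-supplied scoring function; the example scores are invented, and how the sentence likelihood is actually combined from its components is not specified here.

```python
MIN_LIKELIHOOD = -32767  # example initial value given for step S901


def pick_first_candidate(candidate_sequences, sentence_likelihood):
    """Return the candidate sequence with the greatest sentence likelihood."""
    best_score = MIN_LIKELIHOOD                 # S901
    best_sequence = None
    for sequence in candidate_sequences:        # S902, S908
        score = sentence_likelihood(sequence)   # S903, S904
        if score > best_score:                  # S905
            best_score = score                  # S906
            best_sequence = sequence            # S907
    return best_sequence


# Hypothetical scores, for illustration only.
scores = {"私は猫を買っている": 4, "私は猫を飼っている": 9}
print(pick_first_candidate(list(scores), scores.get))
```

With these made-up scores the sequence containing 「飼っている」 is selected; an empty candidate list yields None.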

[0047] FIG. 10 is a flowchart showing the flow of the valence likelihood sum calculation in step S903. First, in step S1001, the valence likelihood sum is initialized to 0. In step S1002, the verb is extracted from the phrase candidate sequence and the verb valence patterns for that verb are retrieved from the verb valence dictionary (KDIC) 134. In step S1003, the phrase candidate sequence is searched for a noun that has a dependency relation with the extracted verb, and for the particle of that noun.

[0048] In step S1004, it is determined whether step S1003 found a noun having a dependency relation with the extracted verb, and its particle. If none was found (including the case where all applicable nouns have already been processed), the processing ends and control returns to the first candidate determination routine.

[0049] In step S1005, the verb valence pattern corresponding to the verb extracted from the phrase candidate sequence is compared with the semantic classification of the noun extracted from the phrase candidate sequence (obtained by referring to the word dictionary (DIC) 133) and with the particle of that noun, and the likelihood of the matching slot is retrieved. The retrieved likelihood is added to the valence likelihood sum, and the process returns to step S1003. Thereafter, the next noun and its particle are searched for in the phrase candidate sequence, and step S1003 and the subsequent steps are repeated until all nouns have been processed.
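
Steps S1001 to S1005 can be sketched in Python as follows. This is only a minimal illustration, not the implementation of the embodiment: the `Slot` record, the dictionary layouts, the sample entries, and the likelihood values are all assumptions made for the example, and the dependency analysis is taken as given (the caller passes in the nouns that depend on the verb).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One slot of a verb valence pattern (hypothetical layout)."""
    particle: str        # particle the slot expects
    semantic_class: str  # semantic classification the slot expects
    likelihood: float    # likelihood stored for the slot


def valence_likelihood_sum(verb, dependents, kdic, dic):
    """Sum slot likelihoods for the nouns that depend on `verb`.

    dependents: list of (noun, particle) pairs found in the phrase
                candidate sequence (assumed to be given).
    kdic:       verb -> list of Slot (stand-in for KDIC 134).
    dic:        noun -> semantic classification (stand-in for DIC 133).
    """
    total = 0.0                                # S1001: initialize the sum
    pattern = kdic.get(verb, [])               # S1002: look up the pattern
    for noun, particle in dependents:          # S1003/S1004: next noun, if any
        noun_class = dic.get(noun)             # classification from the word dictionary
        for slot in pattern:                   # S1005: find the matching slot
            if slot.particle == particle and slot.semantic_class == noun_class:
                total += slot.likelihood
                break
    return total


# Hypothetical sample data, invented for this example only.
SAMPLE_KDIC = {"食べる": [Slot("が", "<human>", 0.5), Slot("を", "<food>", 0.25)]}
SAMPLE_DIC = {"私": "<human>", "パン": "<food>"}
```

With the sample data, `valence_likelihood_sum("食べる", [("私", "が"), ("パン", "を")], SAMPLE_KDIC, SAMPLE_DIC)` adds the likelihoods of both matching slots, while a noun missing from the word dictionary contributes nothing.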

[0050] <Operation Example of the Confirmation Processing Routine> An operation example of the confirmation processing routine is described below. FIG. 11 is a flowchart showing the flow of the confirmation processing of step S704. First, in step S1101, "semantic classification unknown word processing" is performed in accordance with the <Principle of Learning Semantic Classifications> described above. In step S1102, other learning processing (for example, learning the relationship between readings and notations) is performed using conventional techniques, and the process returns to the key input determination routine.

[0051] FIG. 12 is a flowchart showing the flow of the semantic classification unknown word processing of step S1101. First, in step S1201, the word dictionary (DIC) 133 is consulted to determine whether the output of the kana-kanji conversion, that is, the output held in the output buffer (OBUF) 132, contains an unknown word whose semantic classification is unknown. If it contains no unknown word, there is nothing for the following processing to handle, so the process returns to the confirmation processing routine. If it does contain an unknown word, the process proceeds to step S1202, where the unknown word and the particle suffixed to it are extracted.

[0052] In step S1203, a verb having a dependency relationship with the extracted unknown word is extracted from the conversion result in the output buffer (OBUF) 132. In step S1204, the verb valence dictionary (KDIC) 134 is consulted to obtain the verb valence pattern of the extracted verb. In step S1205, the particle extracted in step S1202 is compared with the particle of each slot of the verb valence pattern obtained in step S1204. If a slot with a matching particle exists, then in step S1206 the semantic classification of that slot is presumed to be the semantic classification of the unknown word, the unknown word is registered in the word dictionary (DIC) 133, and the process returns to step S1201 to handle the next unknown word in the same way. If no slot matches, the semantic classification of the unknown word cannot be estimated by this method, so the unknown word is not registered; the process returns to step S1201 and handles the next unknown word in the same way.
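
The loop of steps S1201 to S1206 might look like the following Python sketch. It rests on several assumptions that are not taken from the embodiment: the conversion result is represented as already-analyzed `(word, particle, verb)` triples, the word dictionary is a plain mapping in which a missing key means "semantic classification unknown", each valence pattern is a list of `(particle, semantic_class)` pairs, and the sample entries are invented.

```python
def register_unknown_words(analysis, kdic, dic):
    """Estimate and register semantic classifications of unknown words.

    analysis: list of (word, particle, verb) triples, where `verb` is the
              verb the word depends on (stands in for S1202 and S1203).
    kdic:     verb -> list of (particle, semantic_class) slots
              (stand-in for KDIC 134).
    dic:      word -> semantic classification (stand-in for DIC 133);
              updated in place.
    Returns the list of words that were newly registered.
    """
    registered = []
    for word, particle, verb in analysis:
        if word in dic:                       # S1201: classification already known
            continue
        pattern = kdic.get(verb, [])          # S1204: pattern of the governing verb
        for slot_particle, slot_class in pattern:
            if slot_particle == particle:     # S1205: particles match
                dic[word] = slot_class        # S1206: adopt the slot's classification
                registered.append(word)
                break
        # No matching slot: the word is left unregistered.
    return registered


# Hypothetical sample data, invented for this example only.
kdic = {"食べる": [("が", "<human>"), ("を", "<food>")]}
dic = {"私": "<human>"}
analysis = [("私", "が", "食べる"), ("ドリアン", "を", "食べる"), ("箸", "で", "食べる")]
newly_registered = register_unknown_words(analysis, kdic, dic)
```

In this example only "ドリアン" is registered (as `<food>`); "箸" stays unregistered because no slot of the sample pattern carries the particle "で".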

[0053] As described above, when the conversion result contains a word whose semantic classification is unknown, the semantic classification of that unknown word is estimated by referring to the verb valence pattern of the verb that has a dependency relationship with it, and is then registered. This eliminates the need for the operator to register each unknown word manually.

[0054] <Other Processing Examples for Unknown Words> The above description covers an example in which the semantic classification of an unknown word is estimated by comparing the particle of the unknown word with the corresponding verb valence pattern. When no verb valence pattern matches the particle of the unknown word, the semantic classification can instead be estimated as follows, for example. If the unknown word has a specific particle, it may be given a semantic classification that follows unambiguously from that particle, or one that is highly likely to be associated with it; for instance, if the particle suffixed to the unknown word is "de" (で), the unknown word is given the semantic classification "<place>".
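
One way to picture this fallback is the Python sketch below. Only the mapping from "で" to `<place>` comes from the text; the function name, the default table as a data structure, and the `(particle, semantic_class)` slot layout are assumptions made for illustration.

```python
# Particle -> default semantic classification.
# Only the "で" entry is taken from the description; any further
# entries would be design choices of an implementer.
PARTICLE_DEFAULTS = {"で": "<place>"}


def estimate_class(particle, pattern, defaults=PARTICLE_DEFAULTS):
    """Return an estimated semantic classification, or None.

    pattern: list of (particle, semantic_class) slots of the governing
             verb (hypothetical layout).
    The verb valence pattern is tried first; the particle-based default
    is used only when no slot matches.
    """
    for slot_particle, slot_class in pattern:
        if slot_particle == particle:
            return slot_class
    return defaults.get(particle)


# Hypothetical pattern, invented for this example only.
sample_pattern = [("が", "<human>"), ("を", "<food>")]
```

Here `estimate_class("を", sample_pattern)` is resolved from the pattern, `estimate_class("で", sample_pattern)` falls back to `<place>`, and a particle with neither a matching slot nor a default yields `None`, in which case the word would simply not be registered.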

[0055] The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copying machine or a facsimile machine). It goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a storage medium on which is recorded the program code of software that implements the functions of the embodiment described above, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the storage medium.

[0056] In this case, the program code read from the storage medium itself implements the functions of the embodiment described above, and the storage medium storing that program code constitutes the present invention.

[0057] As the storage medium for supplying the program code, for example, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a nonvolatile memory card, or a ROM can be used.

[0058] It also goes without saying that the functions of the embodiment described above are realized not only when the computer executes the program code it has read, but also when an OS (operating system) or the like running on the computer performs part or all of the actual processing in accordance with the instructions of that program code and the functions of the embodiment are realized by that processing.

[0059] It further goes without saying that the invention covers the case in which the program code read from the storage medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion board or function expansion unit performs part or all of the actual processing in accordance with the instructions of the program code, and the functions of the embodiment described above are realized by that processing.

【００６０】[0060]

[Effects of the Invention] As described above, according to the present invention, the semantic classification of an unknown word is estimated and registered automatically, so that semantic classifications can be learned without the operator registering each unknown word, which improves operability.

【００６１】[0061]

[Brief description of the drawings]

FIG. 1 is a block diagram showing the system configuration of the document processing apparatus according to the embodiment.

FIG. 2 is a diagram showing a conceptual configuration example of the document processing program.

FIG. 3 is a diagram showing a conceptual configuration example of the input buffer and the output buffer.

FIG. 4 is a diagram showing a conceptual configuration example of the word dictionary.

FIG. 5 is a diagram showing a conceptual configuration example of the verb valence dictionary.

FIG. 6 is a diagram showing the principle of learning semantic classifications.

FIG. 7 is a flowchart showing the flow of processing of the key input determination routine.

FIG. 8 is a flowchart showing the flow of the conversion processing.

FIG. 9 is a flowchart showing the flow of the first candidate determination processing.

FIG. 10 is a flowchart showing the flow of the valence likelihood sum calculation processing.

FIG. 11 is a flowchart showing the flow of the confirmation processing.

FIG. 12 is a flowchart showing the flow of the semantic classification unknown word processing.

[Explanation of symbols]

131 input buffer
132 output buffer
133 word dictionary
134 verb valence dictionary
135 text buffer
140 external storage unit

Claims

[Claims]

1. A document processing apparatus for converting an input character string while referring as appropriate to a dictionary storing semantic classifications of words, comprising: unknown word extraction means for extracting, from the converted character string, an unknown word whose semantic classification is unknown; dependency word extraction means for extracting a dependency word having a dependency relationship with the extracted unknown word; semantic classification estimating means for estimating the semantic classification of the unknown word from the relationship between the particle of the unknown word and the dependency word; and unknown word registering means for registering the estimated semantic classification in the dictionary together with the unknown word.

2. The document processing apparatus according to claim 1, wherein the dependency word is a verb, and the semantic classification estimating means includes valence dictionary means that stores, for each verb, the semantic classification of at least one word that may have a dependency relationship with the verb and the particle suffixed to a word having that semantic classification, and estimates the semantic classification of the unknown word from the particle of the unknown word by referring to the valence dictionary means.

3. A document processing apparatus that operates based on a program on a memory medium and converts an input character string while referring as appropriate to a dictionary storing semantic classifications of words, wherein the memory medium comprises: procedure code for an unknown word extraction step of extracting, from the converted character string, an unknown word whose semantic classification is unknown; procedure code for a dependency word extraction step of extracting a dependency word having a dependency relationship with the extracted unknown word; procedure code for a semantic classification estimation step of estimating the semantic classification of the unknown word from the relationship between the particle of the unknown word and the dependency word; and procedure code for an unknown word registration step of registering the estimated semantic classification in the dictionary together with the unknown word.

4. A document processing method for converting an input character string while referring as appropriate to a dictionary storing semantic classifications of words, comprising: an unknown word extraction step of extracting, from the converted character string, an unknown word whose semantic classification is unknown; a dependency word extraction step of extracting a dependency word having a dependency relationship with the extracted unknown word; a semantic classification estimating step of estimating the semantic classification of the unknown word from the relationship between the particle of the unknown word and the dependency word; and an unknown word registering step of registering the estimated semantic classification in the dictionary together with the unknown word.

5. The document processing method according to claim 4, wherein the dependency word is a verb, and the semantic classification estimating step estimates the semantic classification of the unknown word from the particle of the unknown word by referring to a valence dictionary that stores, for each verb, the semantic classification of at least one word that may have a dependency relationship with the verb and the particle suffixed to a word having that semantic classification.