JP2862236B2 - Character processor

Info

Publication number: JP2862236B2
Application number: JP62246302A
Authority: JP
Inventors: 英一朗戸島; 駿平竹中
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1987-09-30
Filing date: 1987-09-30
Publication date: 1999-03-03
Anticipated expiration: 2014-03-03
Also published as: JPS6488869A

Description

【発明の詳細な説明】［産業上の利用分野］本発明は仮名漢字混じり文字を入力、入力、編集する
文字処理装置に関するものである。［従来の技術］従来、仮名漢字混じりの日本文を入力する方法とし
て、タツチタイプ法、仮名漢字変換法等の入力方式が考
案されてきた。タツチタイプ法は、各漢字をキーボード上のキーのユ
ニークなシーケンスでコード化し、対応するキーを操作
することで直接に漢字を入力する方式である。各漢字の
コードを全て記憶する必要はあるが、習熟すると高速で
漢字入力が可変であるという利点を持っている。仮名漢字変換法は、漢字に対応する読みを入力し、表
示された候補のうちから目的のものをオペレータが選択
する方式である。漢字の読み方を知っていればすぐに入
力が可能なので習熟が非常に速いという利点を持ってい
る。［発明が解決しようとしている問題点］しかしながら、これらの従来方式には欠点があった。タツチタイプ法は、漢字のコードを記憶しなければな
らないため、習熟に時間がかかり、かつ、習熟が進んで
いないときは入力速度が他の方式（例えば、仮名漢字変
換法）よりもかえって遅くなるため、相当訓練を行なっ
たあとでないと日常業務に使用することができなかっ
た。すなわち、業務に使用しながら練習するということ
ができなかった。仮名漢字変換法は直接にユニークな漢字を入力する方
式ではなく、対話式に漢字を選んでいくためある一定以
上には入力速度が上がらないという欠点があった。［問題点を解決するための手段および作用］上記の問題点を解決するために、本発明によれば、文
字処理装置に、仮名と漢字とを含んだ仮名漢字混じり文
字列を入力する入力手段と、該入力手段より入力された
仮名漢字混じり文字列を記憶する記憶手段と、単語の読
みと、漢字表記とを、該漢字表記中の各漢字に対応する
読み毎に前記単語の読みに区切り情報を付加して記憶す
る辞書手段と、該辞書手段の漢字表記の一部の漢字を前
記区切り情報に基づいて対応する読みに展開した仮名漢
字混じり表記を参照して、前記記憶手段に記憶された仮
名漢字混じり文字列中の仮名を、該仮名を読みとする漢
字であって、該漢字と前記仮名に連続する漢字とが１単
語の漢字表記を構成するような漢字に変換する変換手段
とを備えたことにより、仮名漢字混じり文字列中の仮名
を、該仮名を読みとする漢字であって、該漢字と前記仮
名に連続する漢字とが１単語の漢字表記を構成するよう
な漢字に変換するようにしたものである。［実施例］以下図面を参照しなから本発明を詳細に説明する。第１図は本発明の全体構成の一例である。図示の構成において、CPUは、マイクロプロセツサで
あり、文字処理のための演算、論理判断等を行ない、ア
ドレスバスAB、コントロールバスCB、データバスDBを介
して、それらのバスに接続された各構成要素を制御す
る。アドレスバスABはマイクロプロセツサCPUの制御の対
象とする構成要素を指示するアドレス信号を転送する。
コントロールバスCBはマイクロプロセツサCPUの制御の
対象とする各構成要素のコントロール信号を転送して印
加する。データバスDBは各構成機器相互間のデータの転
送を行なう。つぎにROMは、読出し専用の固定メモリであり、第７
図〜第９図につき後述するマイクロプロセツサCPUによ
る制御の手順等を記憶させておく。また、RAMは、１ワード16ビツトの構成の書込み可能
のランダムアクセスメモリであって、各構成要素からの
各種データの一時記憶に用いる。 DICは辞書であり、仮名漢字混じりの単語を漢字表記
に変換するための対応表を記憶する。 IBUFは入力された仮名漢字混じり文を蓄えるための入
力バツフアである。 OBUFは変換結果の表記列を蓄えるための出力バツフア
である。 BUNTBは文節テーブルであり、入力仮名漢字混じり文
に含まれる文節を記憶するテーブルである。 TBUFはテキストバツフアであり、編集中の文書データ
を一時記憶するエリアである。 KBはキーボードであって、アルフアベツトキー、ひら
かなキー、カタカナキー等の文字記号入力キー、及び、
変換キー等の本発明文字処理装置に対する各種機能を指
示するための各種のフアンクシヨンキーを備えている。 DISKは文書データ、及び辞書データを記憶するための
外部記憶であり、作成された文書の保管を行ない、保管
された文書はキーボードの指示により、必要な時呼び出
される。また、辞書データは適当なタイミングでRAM上
のエリアDICにロードされ、参照される。 CRはカーソルレジスタである。CPUにより、カーソル
レジスタの内容を読み書きできる。後述するCRTコント
ローラCRTCは、ここに蓄えられたアドレスに対応する表
示装置CRT上の位置にカーソルを表示する。 DBUFは表示用バツフアメモリで、表示すべきデータの
パターンを蓄える。 CRTCはカーソルレジスタCR及びバツフアDBUFに蓄えら
れた内容を表示器CRTに表示する役割を担う。またCRTは陰極線管等を用いた表示装置であり、その
表示装置CRTにおけるドツト構成の表示パターンおよび
カーソルの表示をCRTコントローラで制御する。さらに、CGはキヤラクタジエネレータであて、表示装
置CRTに表示する文字、記号のパターンを記憶するもの
である。かかる各構成要素からなる本発明文字処理装置におい
て、キーボードKBからの各種の入力に応じて作動するも
のであって、キーボードKBからの入力が供給されると、
まず、インタラプト信号がマイクロプロセツサCPUに送
られ、そのマイクロプロセツサCPUがROM内に記憶してあ
る各種の制御信号を読出し、それらの制御信号に従って
各種の制御が行なわれる。第２図は本発明の有用性を示した図である。図におい
てTSが文書画面を示し、MSはモニタ画面を示す。キーボ
ードから入力したデータは一旦モニタ画面MSに表示さ
れ、変換後文書画面TSに表示される。（ａ）はオペレータがキーボードより「大さん事のゆ
う因をつい跡する」入力した図である。ここで、「大」
「事」「因」「跡」はコードを記憶していたのでタツチ
タイプ法で入力している。「さん」「ゆう」「つい」に
ついても本当はタツチタイプで「惨」「誘」「追」とダ
イレクト入力したがったがコードを記憶していなかった
のでやむなく読みで入力したのである。（ｂ）は変換キーを入力したあとを示した図である。
「さん」「ゆう」「つい」がそれぞれ、「惨」「誘」
「追」に変換されている。例えば、「さん」については
「三」「山」「参」「産」「散」など多数の同音語が存
在するが、「大惨事」という単語が辞書に登録されてい
るので一意に「惨」と変換されたのである。第３図は辞書（DIC）の構成を示した図である。各レ
コードは単語を記憶し、１単語26バイト（固定長）で記
憶する。レコード先頭20バイトは見出しである。単語の見出し
が１文字２バイトで格納される。文字コードはJIS X 02
08コードで記憶される。次の４バイトはポインタである。単語の漢字表記を持
つレコードの存在するアドレスを記憶する。例えば、
「大さん事」は「大惨事」のレコードをポイントする。最後の２バイトは文法情報である。例えば、「大さん
事」であれば名詞、「つい跡」であればサ変名詞と記憶
する。第４図は入力バツフア（IBUF）出力バツフア（OBUF）
の構成を示した図である。キーボードから入力されるキ
ーは全てタツチタイプ法によるコード変換を受け、対応
する漢字等に変換され、JIS X 0208コードとしてIBUFに
入る。 IBUFは１文字２バイトで構成され、各文字はJIS X 02
08Dコードで記述される。入力がまだ終っていない末尾
の部分についてはOFF Hが埋まっている。 OBUFの構成もIBUFと同一である。変換が指示されると、IBUFの内容が変換されてOBUFに
格納され、その後テキストバツフア（TBUF）中に転送さ
れる。第５図は文節テーブルの構成を説明した図である。入
力バツフアIBUF上の文の文節が解析され、BUNTB上に記
憶される。入力バツフア先頭の文節から１文節５ワードで記憶さ
れる。最終文節の次からはOFF Hを埋める。各文節の構成を次に示す。先頭１ワードは文節開始位置を記憶する。その文節の
開始点のIBUF先頭からの相対アドレスを記憶する。例え
ば文節「大さん事の」であれば、「大」の位置に相当す
るアドレスを記憶する。次の１ワードの文節終了位置を記憶する。その分折の
終了点のIBUF先頭からの相対アドレスを記憶する。例え
ば文節「大さん事の」であれば、「の」の位置に相当す
るアドレスを記憶する。次の１ワードは文節先頭にある自立語部分の文字数を
記憶する。例えば、文節「大さん事の」であれば、自立
語は「大さん事」であれから、「４」が記憶される。最後の２ワードは自立語の表記が存在する辞書上のア
ドレスを記憶する。例えば文節「大さん事」のであれ
ば、自立語は「大さん事」でありその表記は「大惨事」
であるから、「大惨事」という表記が存在する辞書上の
アドレスが記憶される。第６図（ａ）は仮名漢字変換の通過を説明した図であ
る。入力「大さん事のゆう因をつい跡する」にたいして、
まず、入力中の文節が解析され、文節区切が付けられ
る。区切りの求め方は、最長一致法、２文節最長一致
法、文節数最小法等の方式があり、任意に選ぶことがで
きる。区切りが決まると、各文節自立語の仮名部分が漢
字に変換される。（ｂ）は文節の区切り方として最長一致法を採用した
ときの説明である。まず入力列の最初の文節の切り出しが行なわれる。そ
の結果、「大」「大さん」（人名）「大さん事」「大さ
ん事の」という文節候補が切り出される。このうち、最
長一致する「大さん事の」が採用される。次にそれに引き続き文節の切り出しが行われる。その
結果、「ゆ」（「湯」）「ゆう」（「結う」「夕」）
「ゆう因」「ゆう因を」が切り出され、最長一致する
「ゆう因を」が採用される。以下同様にして文節の区切りが確定していく。上述の時の動作をフローに従って説明する。第７図はキー入力を取り込み、処理を行なう部分のフ
ローチヤートである。ステツプ７−１はキーボードからのデータを入力バツ
フアIBUFに取り込む処理である。IBUF内にもし変換キー
のデータが含まれていたときはかな漢字変換を行なわな
ければならず、ステツプ７−２に進む。そうでなければ
通常の編集処理を行なうのでステツプ７−６に進む。ステツプ７−２において第８図に詳細するようにIBUF
上に入力列を文節に分割する。ステツプ７−３において第９図に詳述するように分折
に分割された入力列をOBUF上に漢字に変換する。ステツプ７−４においてOBUF上に作成された変換結果
をテキストバツフアTBUF上に出力する。更にステツプ７−５において出力された変換結果を表
示する。ステツプ７−６はカーソル移動、文字入力、文書保
存、等の通常の文字処理装置で公知の処理を行なうもの
であり、説明は省略する。第８図はステツプ７−２の文節分割処理を詳細化した
ものである。ステツプ８−１は変数の初期化処理である。入力バツ
フアIBUFの先頭から何文字目を処理しているかを監視す
る変数をｉを１に初期化し、文節テーブルBUNTBの現在
作成中のアドレスを管理する変数ｊを０に初期化する。ステツプ８−２においてIBUF上に処理すべき文字がも
はや存在しないかどうかチエツクする。具体的にはIBUF
のｉ文字目がOFF Hであるかどうかで判定する。もし、I
BUFが終了していればステツプ８−８に進み、文節テー
ブルをクローズする。IBUFが終了していなければ、ステ
ツプ８−３に進む。ステツプ８−３において入力列上に作成できるあらゆ
る文節の可能性をチエツクするために、IBUFのｉ文字目
から始まる単語を全てサーチする。ステツプ８−４において、サーチされた単語につなが
る付属語列を全て解析する。ステツプ８−５において解析された文節の候補のうち
最長のものを取り出し決定する。ステツプ８−６において決定された最長の文節候補を
文節テーブルに登録する。登録するときはBUNTBのｊバ
イト目から作成する。作成後、ｊの値を作成したバイト
数、すなわち10だけ加算する。ステツプ８−７においてｉの値を現在処理された最長
の文節の読み数分だけ加算し、ステツプ８−２にループ
する。ステツプ８−８は文節テーブルをクローズする処理で
あり、BUNTBのｊの示すバイト以降をOFF Hでクリアす
る。第９図はステツプ７−３の漢字変換処理を更に詳細化
したものである。ステツプ９−１は変数の初期化処理である。現在、文
節テーブルBUNTBの何文節目を処理しているかを管理す
る変数ｉを１に初期化する。また、出力バツフアOBUFを
次に何バイト目から作成すれば良いかを管理する変数ｊ
を０に初期化する。ステツプ９−２において文節テーブル上の全ての文節
に対する処理が終了したかどうかをチェツクする。具体
的には文節テーブルのｉ番目の文節がOFF Hで始まって
いるかどうかで判定する。文節が終了していると判定さ
れたときはとステツプ９−８に進み、出力バツフアをク
ローズする。文節が終了していないときはステツプ９−
３に進む。ステツプ９−３において文節テーブルｉ番目の文節に
登録されている文節中の自立語部分を漢字に変換し出力
バツフアOBUFのｊバイト目からに出力する。漢字への変
換の仕方は単なる辞書引きであり、辞書構成より処理は
明らかであるので特に説明は行なわない。ステツプ９−４において自立語部分を出力した分（自
立語長×２バイト）だけ、出力バツフアのポインタｊを
進める。ステツプ９−５において文節テーブルｉ番目の文節の
送り仮名部分を出力バツフアに出力する。ステツプ９−６において送り仮名部分を出力した部分
（送り仮名長×２バイト）だけ、出力バツフアのポイン
タｊを進める。ステツプ９−７においてｉ−値の１加算して次の文節
の処理に移り、ステツプ９−２にループする。ステツプ９−８は出力バツフアをクローズする処理で
あり、具体的には出力バツフアｊバイト目以降をOFF H
でクリアする。［他の実施例］以上の説明において、辞書構成は各単語のあらゆる仮
名漢字の組合せを見出しとして網羅する構成を説明し
た。例えば、「大惨事」であれば、「だいさんじ」「だ
いさん事」「だい惨事」「大さんじ」「大さん事」「大
惨じ」「大惨事」の全ての見出しをもつように説明し
た。が、更に工夫することもできる。例えば、第10図の
ような辞書構成をもつこともできる。この時、見出しは
単語の表記を記述する。読みは単語の読みを記述し、更
に読みの区切を「／」で記述する。分法情報は品詞等の
分法情報を記述する。このように構成すると、辞書をコ
ンパクトに実現できる。処理的には辞書サーチ時に辞書
の内容を展開してマツチングを取るようにサーチすれば
良い。また、実施例はタツチタイプ法により入力した仮名漢
字混じり分中の仮名を漢字に変換する方法を説明してい
るが、一度入力された文章の仮名部分を漢字変換するよ
うな用途であれば、どのようなものにでも応用が可能で
ある。例えば、常用漢字内の漢字だけで作成した文書を
常用漢字外の漢字も使用した文書に変換する装置を構成
することも可能である。この時、辞書としては常用漢字
外の漢字を使用した単語を重点的に登録することにな
る。［発明の効果］以上説明したように、本発明によれば、単語の漢字仮
名混じり表示中の仮名部分を、その単語の漢字表記にお
ける漢字に、少ない辞書容量の辞書を用いて正確に変換
できるという効果がある。

Description

[Technical Field] The present invention relates to a character processing apparatus for inputting and editing text written in a mixture of kana and kanji.

[Prior Art] Input methods such as the touch-typing method and the kana-kanji conversion method have been devised for entering Japanese text written in a mixture of kana and kanji.

In the touch-typing method, each kanji is encoded as a unique sequence of keys on the keyboard, and the operator enters the kanji directly by pressing that sequence. The operator must memorize the code of every kanji, but once the codes are mastered, kanji can be entered at high speed.

In the kana-kanji conversion method, the operator types the reading of a kanji and then selects the intended character from the displayed candidates. Anyone who knows how a kanji is read can enter it immediately, so the method is very quick to learn.

[Problems to Be Solved by the Invention] These conventional methods nevertheless have drawbacks.

The touch-typing method requires memorizing kanji codes, so it takes a long time to learn. Until the operator becomes proficient, input is actually slower than with other methods (for example, kana-kanji conversion), so the method cannot be used for everyday work until after considerable training. In other words, the operator cannot practice while using the method on the job.

The kana-kanji conversion method does not enter a unique kanji directly. Because the operator selects kanji interactively, input speed cannot rise beyond a certain level.
[Means and Operation for Solving the Problems] To solve the above problems, the character processing apparatus according to the present invention comprises: input means for inputting a kana-kanji mixed character string containing kana and kanji; storage means for storing the kana-kanji mixed character string input from the input means; dictionary means for storing the reading and the kanji notation of each word, with delimiter information added to the reading so that it is divided into the readings corresponding to the individual kanji of the kanji notation; and conversion means that refers to kana-kanji mixed notations, obtained by expanding some of the kanji in a kanji notation of the dictionary means into their corresponding readings on the basis of the delimiter information, and converts kana in the stored character string into kanji. The kana is thereby converted into a kanji that has the kana as its reading and that, together with the kanji adjacent to the kana, forms the kanji notation of a single word.

[Embodiment] The present invention is described in detail below with reference to the drawings.

FIG. 1 shows an example of the overall configuration of the present invention. In the illustrated configuration, CPU is a microprocessor that performs the arithmetic operations and logical decisions needed for character processing and, through the address bus AB, the control bus CB, and the data bus DB, controls the components connected to those buses.

The address bus AB carries address signals that designate the component to be controlled by the microprocessor CPU.
The control bus CB carries and applies the control signals for the components controlled by the microprocessor CPU. The data bus DB transfers data among the components.

ROM is a read-only memory that stores, among other things, the control procedures of the microprocessor CPU described later with reference to FIGS. 7 to 9.

RAM is a writable random access memory organized as 16-bit words and is used for temporary storage of various data from the components.

DIC is the dictionary. It stores a correspondence table for converting words written in a mixture of kana and kanji into their kanji notation.

IBUF is the input buffer, which holds the kana-kanji mixed text that has been entered.

OBUF is the output buffer, which holds the character string resulting from conversion.

BUNTB is the phrase table, which stores the phrases (bunsetsu) contained in the input kana-kanji mixed text.

TBUF is the text buffer, an area that temporarily stores the document data being edited.

KB is the keyboard. It has character and symbol input keys such as alphabet, hiragana, and katakana keys, as well as function keys, such as the conversion key, for instructing the functions of the character processing apparatus.

DISK is external storage for document data and dictionary data. It stores created documents, which are recalled when needed by an instruction from the keyboard. The dictionary data is loaded into the area DIC in RAM at an appropriate time and referred to there.

CR is the cursor register, whose contents the CPU can read and write. The CRT controller CRTC, described below, displays the cursor at the position on the display device CRT that corresponds to the address held in this register.

DBUF is the display buffer memory, which holds the patterns of the data to be displayed.

CRTC displays the contents held in the cursor register CR and the buffer DBUF on the display device CRT.

CRT is a display device using a cathode ray tube or the like. The dot-matrix display patterns and the cursor shown on it are controlled by the CRT controller.

CG is a character generator that stores the patterns of the characters and symbols displayed on the display device CRT.

The character processing apparatus made up of these components operates in response to input from the keyboard KB. When input arrives from the keyboard KB, an interrupt signal is first sent to the microprocessor CPU, which reads the control signals stored in ROM and performs the various controls according to them.

FIG. 2 illustrates the usefulness of the present invention. In the figure, TS denotes the document screen and MS the monitor screen. Data entered from the keyboard is first displayed on the monitor screen MS and, after conversion, on the document screen TS.

(a) shows the state after the operator has typed "大さん事のゆう因をつい跡する" from the keyboard. The operator had memorized the codes for "大", "事", "因", and "跡" and entered them by the touch-typing method. The operator would also have liked to enter "惨", "誘", and "追" directly by touch typing but had not memorized their codes, and so had no choice but to enter the readings "さん", "ゆう", and "つい".

(b) shows the state after the conversion key has been pressed. "さん", "ゆう", and "つい" have been converted to "惨", "誘", and "追", respectively. The reading "さん", for example, has many homophones, such as "三", "山", "参", "産", and "散", but because the word "大惨事" (catastrophe) is registered in the dictionary, it is converted uniquely to "惨".

FIG. 3 shows the structure of the dictionary (DIC). Each record stores one word in a fixed length of 26 bytes.

The first 20 bytes of a record are the headword, stored at two bytes per character in JIS X 0208 code.

The next 4 bytes are a pointer holding the address of the record that carries the kanji notation of the word. For example, "大さん事" points to the record for "大惨事".

The last 2 bytes are grammatical information. For example, "大さん事" is recorded as a noun and "つい跡" as a sa-hen noun (a noun that forms a verb with "する").
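The 26-byte record described for FIG. 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the description does not give the byte order, the padding value for unused headword bytes, or the numeric grammar codes, so big-endian order, FFh padding, and the constants `NOUN` and `SAHEN_NOUN` are all hypothetical. The raw 2-byte JIS X 0208 code is derived here from Python's `euc_jp` codec by clearing the high bit of each byte.

```python
import struct

# 20-byte headword, 4-byte pointer, 2-byte grammar code = 26 bytes.
# Assumption: big-endian fields; the description does not state a byte order.
RECORD = struct.Struct(">20sIH")
PAD = b"\xff"          # assumption: unused headword bytes are filled with FFh
NOUN, SAHEN_NOUN = 1, 2  # hypothetical grammar codes


def to_jis(text):
    """Encode kana/kanji as raw 2-byte JIS X 0208 codes."""
    euc = text.encode("euc_jp")
    if len(euc) != 2 * len(text):
        raise ValueError("only 2-byte JIS X 0208 characters are supported")
    return bytes(b & 0x7F for b in euc)


def from_jis(data):
    """Decode raw 2-byte JIS X 0208 codes back to text."""
    return bytes(b | 0x80 for b in data).decode("euc_jp")


def pack_record(headword, pointer, grammar):
    """Build one fixed-length dictionary record."""
    head = to_jis(headword)
    if len(head) > 20:
        raise ValueError("headword longer than 10 characters")
    return RECORD.pack(head.ljust(20, PAD), pointer, grammar)


def unpack_record(record):
    """Split a record back into (headword, pointer, grammar)."""
    head, pointer, grammar = RECORD.unpack(record)
    return from_jis(head.rstrip(PAD)), pointer, grammar
```

With this layout a headword holds at most ten characters, which follows directly from 20 bytes at two bytes per character.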
FIG. 4 shows the structure of the input buffer (IBUF) and the output buffer (OBUF). Every key entered from the keyboard undergoes code conversion by the touch-typing method, is converted into the corresponding kanji or other character, and enters IBUF as a JIS X 0208 code.

IBUF holds two bytes per character, each character being written in JIS X 0208 code. The trailing part that has not yet received input is filled with 0FFH.

OBUF has the same structure as IBUF.

When conversion is instructed, the contents of IBUF are converted and stored in OBUF, and then transferred to the text buffer (TBUF).

FIG. 5 illustrates the structure of the phrase table. The text in the input buffer IBUF is analyzed into phrases, which are stored in BUNTB.

Phrases are stored from the first phrase in the input buffer onward, five words per phrase. The area after the last phrase is filled with 0FFH. Each phrase entry is laid out as follows.

The first word holds the phrase start position: the address, relative to the head of IBUF, at which the phrase begins. For the phrase "大さん事の", for example, this is the address corresponding to the position of "大".

The next word holds the phrase end position: the address, relative to the head of IBUF, at which the phrase ends. For the phrase "大さん事の", this is the address corresponding to the position of "の".

The next word holds the number of characters in the independent word at the head of the phrase. For the phrase "大さん事の", the independent word is "大さん事", so "4" is stored.

The last two words hold the dictionary address at which the notation of the independent word is found. For the phrase "大さん事の", the independent word is "大さん事" and its notation is "大惨事", so the dictionary address of the notation "大惨事" is stored.

FIG. 6(a) illustrates the process of kana-kanji conversion.

For the input "大さん事のゆう因をつい跡する", the phrases in the input are first analyzed and phrase boundaries are placed. Boundaries can be found by the longest-match method, the two-phrase longest-match method, the minimum-phrase-count method, or other methods, and any of these may be chosen. Once the boundaries are fixed, the kana part of the independent word in each phrase is converted to kanji.

(b) explains the case in which the longest-match method is used for phrase division.

First, the first phrase of the input string is extracted. This yields the phrase candidates "大", "大さん" (a personal name), "大さん事", and "大さん事の". Of these, the longest match, "大さん事の", is adopted.

The next phrase is then extracted. This yields "ゆ" ("湯"), "ゆう" ("結う", "夕"), "ゆう因", and "ゆう因を", and the longest match, "ゆう因を", is adopted.

The remaining phrase boundaries are fixed in the same way.

This operation is now described with reference to the flowcharts.

FIG. 7 is a flowchart of the part that takes in key input and processes it.

Step 7-1 takes data from the keyboard into the input buffer IBUF. If IBUF contains conversion-key data, kana-kanji conversion must be performed and the process proceeds to step 7-2. Otherwise ordinary editing is performed and the process proceeds to step 7-6.
In step 7-2, the input string in IBUF is divided into phrases, as detailed in FIG. 8.

In step 7-3, the input string that has been divided into phrases is converted to kanji in OBUF, as detailed in FIG. 9.

In step 7-4, the conversion result created in OBUF is output to the text buffer TBUF.

In step 7-5, the output conversion result is displayed.

Step 7-6 performs processing well known in ordinary character processing apparatuses, such as cursor movement, character input, and document saving. Its description is omitted.

FIG. 8 shows the phrase division of step 7-2 in detail.

Step 8-1 initializes the variables. The variable i, which tracks which character from the head of the input buffer IBUF is being processed, is initialized to 1, and the variable j, which tracks the address currently being written in the phrase table BUNTB, is initialized to 0.

Step 8-2 checks whether any characters remain to be processed in IBUF, specifically by testing whether the i-th character of IBUF is 0FFH. If IBUF is exhausted, the process proceeds to step 8-8 and closes the phrase table. Otherwise it proceeds to step 8-3.

Step 8-3 searches for every word that starts at the i-th character of IBUF, so as to examine every phrase that could be formed from the input string.

Step 8-4 analyzes every attached-word string that can follow the words found.

Step 8-5 selects the longest of the phrase candidates obtained by the analysis.

Step 8-6 registers the selected longest phrase candidate in the phrase table, writing the entry from byte j of BUNTB. After the entry is written, j is increased by the number of bytes written, that is, by 10.

Step 8-7 increases i by the number of characters in the longest phrase just processed and loops back to step 8-2.

Step 8-8 closes the phrase table by clearing BUNTB from byte j onward with 0FFH.

FIG. 9 shows the kanji conversion of step 7-3 in more detail.

Step 9-1 initializes the variables. The variable i, which tracks which phrase of the phrase table BUNTB is being processed, is initialized to 1. The variable j, which tracks the byte of the output buffer OBUF at which the next output is to be written, is initialized to 0.

Step 9-2 checks whether all phrases in the phrase table have been processed, specifically by testing whether the i-th phrase of the phrase table begins with 0FFH. If the phrases are exhausted, the process proceeds to step 9-8 and closes the output buffer. Otherwise it proceeds to step 9-3.

Step 9-3 converts the independent word of the i-th phrase in the phrase table to kanji and outputs it starting at byte j of the output buffer OBUF. The conversion is a simple dictionary lookup whose processing is evident from the dictionary structure, so it is not described further.

Step 9-4 advances the output buffer pointer j by the amount output for the independent word (independent word length × 2 bytes).

Step 9-5 outputs the trailing kana part of the i-th phrase in the phrase table to the output buffer.

Step 9-6 advances the output buffer pointer j by the amount output for the trailing kana part (trailing kana length × 2 bytes).

Step 9-7 adds 1 to i to move on to the next phrase and loops back to step 9-2.

Step 9-8 closes the output buffer. Specifically, the output buffer from byte j onward is cleared with 0FFH.
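The longest-match phrase division of FIG. 8 and the dictionary-lookup conversion of FIG. 9 can be sketched together in Python. This is a simplified illustration, not the patented implementation: the miniature `DICTIONARY`, the `ATTACHED` list of attached-word strings, and the fallback for unknown characters are hypothetical, character positions are 0-based, and Python lists and strings stand in for the byte-addressed BUNTB and OBUF buffers with their 0FFH terminators.

```python
# Hypothetical miniature dictionary: mixed kana-kanji headword -> kanji notation.
DICTIONARY = {
    "大": "大",
    "大さん": "大さん",
    "大さん事": "大惨事",
    "ゆ": "湯",
    "ゆう": "夕",
    "ゆう因": "誘因",
    "つい跡": "追跡",
}
# Hypothetical attached-word strings that may follow an independent word.
ATTACHED = ("", "の", "を", "する")


def segment(text):
    """Divide text into phrases by longest match (cf. steps 8-2 to 8-7).

    Returns (start, end, independent_length, notation) tuples, mirroring
    the fields of one phrase-table entry.
    """
    phrases = []
    i = 0
    while i < len(text):                      # step 8-2: input exhausted?
        best = None
        for headword, notation in DICTIONARY.items():   # step 8-3
            if not text.startswith(headword, i):
                continue
            for tail in ATTACHED:                        # step 8-4
                if text.startswith(tail, i + len(headword)):
                    length = len(headword) + len(tail)
                    if best is None or length > best[0]:  # step 8-5
                        best = (length, len(headword), notation)
        if best is None:
            # Assumption: an unknown character becomes a one-character phrase.
            best = (1, 1, text[i])
        length, independent_length, notation = best
        phrases.append((i, i + length - 1, independent_length, notation))
        i += length                           # step 8-7
    return phrases


def convert(text):
    """Replace each independent word by its kanji notation (cf. FIG. 9)."""
    out = []
    for start, end, independent_length, notation in segment(text):
        out.append(notation)                                  # step 9-3
        out.append(text[start + independent_length:end + 1])  # step 9-5
    return "".join(out)


print(convert("大さん事のゆう因をつい跡する"))
```

For the example input of FIG. 2, the first phrase found is "大さん事の" with a four-character independent word, matching the entry described for FIG. 5.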
[Other Embodiments] In the description above, the dictionary holds as headwords every combination of kana and kanji for each word. For "大惨事", for example, it was described as holding all of the headwords "だいさんじ", "だいさん事", "だい惨事", "大さんじ", "大さん事", "大惨じ", and "大惨事".

This can be refined further. For example, the dictionary may be structured as shown in FIG. 10. Here the headword field holds the notation of the word. The reading field holds the reading of the word, with the boundaries within the reading marked by "/". The grammar field holds grammatical information such as the part of speech. A dictionary structured this way can be made compact. At search time, the dictionary contents are expanded and matched against the input.

The embodiment describes converting to kanji the kana in kana-kanji mixed text entered by the touch-typing method, but the invention can be applied to any use in which the kana portions of already entered text are converted to kanji. For example, an apparatus can be built that converts a document written using only jōyō kanji (the official common-use set) into one that also uses kanji outside that set. In that case the dictionary would concentrate on words that use non-jōyō kanji.

[Effects of the Invention] As described above, the present invention makes it possible to convert the kana portions of a word written in a mixture of kanji and kana accurately into the kanji of that word's kanji notation, using a dictionary of small capacity.
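The search-time expansion of a compact FIG. 10 style entry, described under Other Embodiments, can be sketched as follows. The function names `expand` and `build_index` are hypothetical, and the sketch assumes that every character of the notation is a kanji with exactly one "/"-delimited reading segment. Words containing okurigana would need additional handling that the description does not specify.

```python
from itertools import product


def expand(notation, reading):
    """List every kana-kanji mixed spelling of one dictionary entry.

    notation: kanji notation, e.g. "大惨事"
    reading:  reading with "/" between per-kanji segments, e.g. "だい/さん/じ"
    """
    segments = reading.split("/")
    if len(segments) != len(notation):
        raise ValueError("need exactly one reading segment per kanji")
    # For each position, choose either the kanji or its kana reading.
    choices = [(kanji, kana) for kanji, kana in zip(notation, segments)]
    return ["".join(combo) for combo in product(*choices)]


def build_index(entries):
    """Map every expanded spelling back to its kanji notation."""
    index = {}
    for notation, reading in entries:
        for spelling in expand(notation, reading):
            index[spelling] = notation
    return index


index = build_index([("大惨事", "だい/さん/じ"), ("誘因", "ゆう/いん")])
print(index["大さん事"], index["ゆう因"])
```

A three-kanji word expands to eight spellings. Seven of them are the headwords listed in the description for "大惨事"; the eighth, "だい惨じ", is produced by this sketch but does not appear in that list.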

【図面の簡単な説明】
第１図は本発明の全体構成のブロツク図
第２図は本発明の有用性を示した図
第３図は辞書の構成を示した図
第４図は入力バツフア、出力バツフアの構成を示した図
第５図は文節テーブルの構成を示した図
第６図は文節の解析手順を示した図
第７図〜第９図は本発明文字処理装置の動作を示すフローチヤート
第10図は辞書構成の他の実施例を示した図

DISK……外部記憶
CPU……マイクロプロセツサ
ROM……読出し専用メモリ
RAM……ランダムアクセスメモリ
DIC……辞書
IBUF……入力バツフア
OBUF……出力バツフア
BUNTB……文節テーブル
TBUF……テキストバツフア

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the overall configuration of the present invention.
FIG. 2 illustrates the usefulness of the present invention.
FIG. 3 shows the structure of the dictionary.
FIG. 4 shows the structure of the input buffer and the output buffer.
FIG. 5 shows the structure of the phrase table.
FIG. 6 shows the phrase analysis procedure.
FIGS. 7 to 9 are flowcharts showing the operation of the character processing apparatus of the present invention.
FIG. 10 shows another embodiment of the dictionary structure.

DISK: external storage
CPU: microprocessor
ROM: read-only memory
RAM: random access memory
DIC: dictionary
IBUF: input buffer
OBUF: output buffer
BUNTB: phrase table
TBUF: text buffer

Claims

(57) [Claims] A character processing apparatus comprising: input means for inputting a kana-kanji mixed character string containing kana and kanji; storage means for storing the kana-kanji mixed character string input from the input means; dictionary means for storing the reading and the kanji notation of a word, with delimiter information added to the reading of the word for each reading corresponding to each kanji in the kanji notation; and conversion means for referring to a kana-kanji mixed notation, obtained by expanding some of the kanji in the kanji notation of the dictionary means into their corresponding readings on the basis of the delimiter information, and converting kana in the kana-kanji mixed character string stored in the storage means into a kanji that has the kana as its reading and that, together with the kanji adjacent to the kana, forms the kanji notation of one word.