JPH0469766A

JPH0469766A - Kana to kanji converter

Info

Publication number: JPH0469766A
Application number: JP2182076A
Authority: JP
Inventors: Katsuhiko Toida; 勝彦樋田
Original assignee: KANSAI PANAFUAKOMU KK; PFU Ltd
Current assignee: KANSAI PANAFUAKOMU KK; PFU Ltd
Priority date: 1990-07-10
Filing date: 1990-07-10
Publication date: 1992-03-04

Abstract

PURPOSE: To improve conversion speed by fetching words from the segmentation data section of a dictionary for each character of an input character string, judging whether the maximum likelihood word can connect to the immediately preceding word in the input string, and, when it cannot, backtracking to the immediately preceding word and selecting another maximum likelihood word. CONSTITUTION: A character fetching means 101 takes characters one at a time from the input character string and stores them in a reading buffer 105. A maximum likelihood evaluating and selecting means 103 then takes each word in a dictionary buffer 106 that can connect to the immediately preceding maximum likelihood candidate word, stores these words in a candidate word buffer 107 while sorting them by frequency, selects the top-ranked word, and stores it in a maximum likelihood word buffer 108. The same operations are then performed for the word corresponding to the next clause, and when that word is found, the current maximum likelihood candidate is written to a confirmed word buffer 109. Because the output data used for conversion into a mixed kanji-kana sentence is accessed only once, processing efficiency improves.

Description

[Detailed Description of the Invention] [Summary] The invention relates to a kana-kanji conversion device that converts a character string entered in kana into a mixed kanji-kana sentence. Its object is to provide a kana-kanji conversion device that, even when backtracking is performed, shortens the data length of the words handled, thereby reducing buffer capacity and improving conversion speed. The device has a dictionary composed of a segmentation data section, which contains the readings and parts of speech used only for segmentation lookup, and an output data section, which contains readings, kanji, and the like. For each character of the input string, words are fetched from the segmentation data section of the dictionary, and candidate words are extracted from the fetched words by maximum likelihood selection. For the maximum likelihood word, the device judges whether it can connect to the immediately preceding word in the input string. If it can, the device proceeds to the subsequent characters. If it cannot, the device backtracks to the immediately preceding word in the input string, using the segmentation data, and selects another maximum likelihood word. The selected maximum likelihood words are then converted into a mixed kanji-kana sentence using the output data of the dictionary and output.

[Industrial Field of Application] The present invention relates to a kana-kanji conversion device that converts a character string entered in kana into a mixed kanji-kana sentence.

Systems that accept Japanese input, such as word processors and workstations, provide a mechanism that accepts a character string spanning multiple clauses as unsegmented kana input, automatically segments it into clauses or words, and converts it into a mixed kanji-kana sentence.

In conventional devices that perform such kana-kanji conversion, the dictionary consulted during conversion stores the words corresponding to each segmented "reading". The word data in this case includes the output data needed to output kanji. When the most suitable word candidate (the maximum likelihood candidate) is being selected and a connection check shows that a previously selected word is inappropriate, processing goes back and performs kana-kanji conversion on a more appropriate clause (character string). This is called backtracking.

It is desirable that kana-kanji conversion including such backtracking be achieved at high speed and with as little memory as possible.

[Prior Art] FIG. 6 is a block diagram of a conventional example, and FIG. 7 is a flow diagram of the conversion processing of the conventional example.

In FIG. 6, 61 is a keyboard, 62 is a file, 63 is an input reading buffer, 64 is a reading buffer for dictionary access, 65 is a dictionary, 66 is a dictionary data buffer, 67 is a candidate word buffer, 68 is a maximum likelihood candidate buffer, 69 is a confirmed word buffer, and 70 is a display device such as a CRT.

As shown in the figure, the dictionary 65 stores, for each word, a "reading length" (the length of the reading string), a "reading" (the reading itself), a "data length", "output data" (kanji data), a "part of speech", and a "frequency". When several words share the same reading (homophones), the "data length", "output data (kanji data)", "part of speech", and "frequency" for each word are stored in sequence.

The operation of the configuration in FIG. 6 is explained step by step.

(1) When the character string (reading characters) to be converted is input from the keyboard 61, the file 62, or the like in FIG. 6, the string data is stored in the input reading buffer 63.

(2) The reading characters are fetched one at a time into the reading buffer 64 for dictionary access.

(3) Each time one more reading character is fetched into the reading buffer 64 for dictionary access, the dictionary 65 is searched and the words found are read into the dictionary data buffer 66.

(4) Words that can connect to the maximum likelihood candidate word of the immediately preceding clause are written to the candidate word buffer 67 while being sorted by frequency. The maximum likelihood ranking here gives priority first to longer readings and then to higher frequency, and all words of the same clause are sorted.

(5) The top-ranked word from the sort is written to the maximum likelihood candidate buffer 68.

(6) When the maximum likelihood word of the next clause is found, the maximum likelihood candidate is written to the confirmed word buffer 69.

However, if the input reading data runs out while the steps for fetching the next clause are being repeated, the word that follows the current maximum likelihood candidate in the candidate word buffer is written to the maximum likelihood candidate buffer 68 (the top-candidate step above). If no candidate word remains for the same clause, the clause being processed is moved back to the previous clause, the last word in the candidate word buffer is returned to the maximum likelihood candidate buffer 68, the word that follows this maximum likelihood candidate word in the candidate word buffer 67 is written to the maximum likelihood candidate buffer 68, and the search for words continues. This kind of processing is generally called backtracking.

(7) When all the input readings have been converted, the output data is taken from the confirmed word buffer 69 and output as a mixed kanji-kana sentence.

Next, the conversion processing flow of the conventional example is explained with reference to FIG. 7.

The input reading (kana input) is fetched. If there is no input reading data, conversion into a mixed kanji-kana sentence is performed (for the text whose processing has already finished). If there is, all words that match the reading are read from the dictionary (65 in FIG. 6) into memory (70 to 72 in FIG. 7). In this case, dictionary data including the output data corresponding to each reading is read from the dictionary (65 in FIG. 6).

If words were read in as a result, the maximum likelihood word (described above with reference to FIG. 6) is extracted. If there is a maximum likelihood word, the part-of-speech connectability (whether the combination of the parts of speech of the preceding and following words is allowed) is then judged (73 to 76 in FIG. 7). If connection is possible, preprocessing is performed so that processing starts from the reading of the clause that follows the maximum likelihood word (the reading after the maximum likelihood word is used when the next input reading is fetched), and the flow returns to step 70 via the path shown in the figure (77 in FIG. 7).

If the part-of-speech check shows that connection is not possible, the flow moves to step 74 via the path shown in the figure and the next-ranked maximum likelihood word is extracted.

If the dictionary contains no word matching the input reading, or if there is no maximum likelihood word, it is judged whether an immediately preceding clause exists. If it does, preprocessing for backtracking is performed and the flow returns to processing the immediately preceding clause (78 and 79 in FIG. 7).

If step 78 finds that there is no immediately preceding clause, the input is treated as unconvertible, all the input readings are converted to full-width hiragana, and processing ends (80 in FIG. 7).

Finally, when no input reading remains (the end of the clauses), conversion into a mixed kanji-kana sentence is performed using the kanji codes contained in the selected maximum likelihood words, and processing ends.

As described above, conventional Japanese conversion repeats backtracking to find the pattern that forms the longest clauses, at which point conversion into a mixed kanji-kana sentence is complete.

[Problems to Be Solved by the Invention] The conventional method described above has the following problems.

(1) Because the data length of a single word is large, when words matching a reading are fetched, the amount of data stored in the candidate word buffer (67 in FIG. 6) grows and puts pressure on memory capacity. When memory is constrained and a sufficiently large candidate word buffer cannot be allocated, not all candidate words can be stored in the buffer. If backtracking then occurs, the data that overflowed memory must be read again, which increases file accesses and reduces speed.

(2) When the dictionary file stores many words, the number of words read in as candidate data increases, so the longer the data length of one word, the lower the processing speed.

(3) When the size of the dictionary file is constrained, the number of words must be reduced, and reducing the number of words makes unconvertible cases more likely.

An object of the present invention is to provide a kana-kanji conversion device that, even when backtracking is performed, shortens the data length of the words handled, thereby reducing buffer capacity and improving conversion speed.

[Means for Solving the Problems] FIG. 1 is a diagram of the basic configuration of the present invention.

In FIG. 1, 10 is a processing device including a CPU and memory, 11 is a dictionary, 12 is an input unit, and 13 is an output unit.

The processing device 10 includes a character fetching means 101, a segmentation data retrieval means 102, a maximum likelihood word evaluation and selection means 103, and an output data retrieval means 104, together with a plurality of buffers 105 to 109. The dictionary 11 consists of a segmentation data section 111, which stores data including the part of speech and frequency for each reading, and an output data section 112, which stores kanji or kana data and frequencies for each reading.

The present invention stores two sections separately in the dictionary: a segmentation data section containing the parts of speech corresponding to each reading, and an output data section containing the kana and kanji information corresponding to each reading. The segmentation data is used when selecting the maximum likelihood candidate words, and the output data section is used for kanji conversion after the words have been confirmed.
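As a rough illustration of this separation, the following Python sketch splits conventional-style records into the two sections. The record layout, the sample words and frequencies, the function name `split_dictionary`, and the choice to keep one segmentation entry per reading and part of speech (with the highest frequency) are all assumptions made for the example, not details taken from the patent.

```python
from collections import defaultdict

def split_dictionary(full_entries):
    """Split (reading, surface, pos, freq) records into two lookup tables.

    seg: reading -> {pos: highest frequency}    (used while segmenting)
    out: reading -> [(pos, surface, freq), ...] (used once, at output time)
    """
    seg = defaultdict(dict)
    out = defaultdict(list)
    for reading, surface, pos, freq in full_entries:
        # Assumption: homophones with the same part of speech share one entry.
        seg[reading][pos] = max(freq, seg[reading].get(pos, 0))
        out[reading].append((pos, surface, freq))
    return dict(seg), dict(out)

# Hypothetical sample records in the conventional single-dictionary style.
FULL = [
    ("きょう", "今日", "noun", 9),
    ("きょう", "京", "noun", 4),
    ("きょう", "経", "noun", 2),
    ("は", "は", "particle", 10),
    ("は", "歯", "noun", 3),
]

seg_section, out_section = split_dictionary(FULL)
```

In this toy data the three noun homophones of きょう collapse into a single segmentation entry, while all three written forms remain available in the output section.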

[Operation] First, the character string to be converted is input to the processing device 10 from an input unit such as a keyboard or a file. The character fetching means 101 takes characters one at a time from the input string and stores them in the reading buffer 105. Each time a character is fetched, the segmentation data retrieval means 102 searches the segmentation data section of the dictionary 11 using the contents of the reading buffer 105 as the search key. The segmentation data obtained by the search (often many entries) is stored in the dictionary buffer 106.

Next, for the words in the dictionary buffer 106, the maximum likelihood word evaluation and selection means 103 stores the words that can connect to the immediately preceding maximum likelihood candidate word (judged by part of speech) in the candidate word buffer 107 while sorting them by frequency (with priority given to words with longer readings). When the words of the same clause have all been read in, the top-ranked word in the sort order is selected and stored in the maximum likelihood word buffer 108. The means 101 to 103 then operate on the word corresponding to the next clause, and when the word of the next clause is found, the current maximum likelihood candidate is written to the confirmed word buffer 109.

By repeating this operation, confirmed words are written to the confirmed word buffer. If the dictionary contains no word that connects to the next clause, backtracking is performed: processing goes back to the previous word and the same processing is repeated.

When processing has finished for all the input readings, the output data section 112 of the dictionary 11 is read according to the contents of the confirmed word buffer 109, and the output data for the mixed kanji-kana sentence is extracted and output to the output unit 13.

In the above operation, the data handled by the means 102 and 103 and the buffers 105 to 109 is short segmentation data, so the amount of data to be processed is small even when backtracking occurs. In addition, the output data used for conversion into the mixed kanji-kana sentence is accessed only once, so processing efficiency can be improved.
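The operation described above can be sketched as a depth-first search that backtracks when no connectable word follows. This is only a minimal illustration: the toy dictionary `SEG`, the connection set `CONNECT`, the part-of-speech names, and the frequencies are invented for the example, and the recursion stands in for the buffers 105 to 109 rather than reproducing them.

```python
# Hypothetical segmentation data: reading -> [(part of speech, frequency)].
SEG = {
    "きょうは": [("noun", 2)],
    "きょう": [("noun", 9)],
    "はは": [("noun", 6)],
    "は": [("particle", 10)],
    "はれ": [("noun", 7)],
    "で": [("particle", 8)],
    "です": [("aux", 9)],
}

# Hypothetical allowed (preceding, following) part-of-speech pairs.
# None stands for "start of input".
CONNECT = {
    (None, "noun"),
    ("noun", "particle"),
    ("noun", "aux"),
    ("particle", "noun"),
}

def segment(text, start=0, prev_pos=None):
    """Return [(reading, pos), ...] covering text[start:], or None if impossible."""
    if start == len(text):
        return []
    candidates = []
    # Grow the reading one character at a time and look each prefix up.
    for end in range(start + 1, len(text) + 1):
        reading = text[start:end]
        for pos, freq in SEG.get(reading, []):
            if (prev_pos, pos) in CONNECT:
                candidates.append((reading, pos, freq))
    # Longer readings first, then higher frequency.
    candidates.sort(key=lambda c: (-len(c[0]), -c[2]))
    for reading, pos, _ in candidates:
        rest = segment(text, start + len(reading), pos)
        if rest is not None:
            return [(reading, pos)] + rest
    return None  # no candidate worked: the caller tries its next candidate

result = segment("きょうははれです")
```

With this toy data the search first tries the longer reading きょうは, reaches a dead end, and falls back to きょう. The function returns `None` when no segmentation exists, whereas the device described here outputs the input as full-width hiragana in that case. Only readings and parts of speech are touched during the search, so no written forms need to be held while backtracking.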

[Embodiment] FIG. 2 is a processing flow diagram of an embodiment of the present invention, FIG. 3 shows an example structure of the dictionary data according to the present invention, FIG. 4 is a table showing whether words can connect according to their parts of speech, and FIG. 5 shows an example of operation according to the present invention.

First, the structure of the dictionary data according to the present invention is explained with reference to FIG. 3.

FIG. 3A shows the data structure of the segmentation data section (the same as the segmentation data section in FIG. 1). For each reading, the total data length (the length of all data corresponding to the same reading; denoted DL) is stored first, followed by the reading length (the length of the reading data; denoted YL), the reading, the part of speech, and the frequency. If other words share the same reading, their parts of speech and frequencies are stored after this in order.

FIG. 3B shows the data structure of the output data section (the same as the output data section in FIG. 1). For each reading, the data length (DL), the reading length (YL), and the reading are stored first, as in FIG. 3A. These are followed by the number of words within the same part of speech (denoted IIc) and the output data length (denoted KL), and then by the kanji data and its frequency. Here too, when several kanji exist for the same reading, the data length, reading length, reading, output data length, kanji data, and frequency are stored for each word.
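To make the segmentation record of FIG. 3A more concrete, the sketch below packs and unpacks one record with Python's `struct` module. The field widths (one byte each for DL, YL, part-of-speech code, and frequency), the UTF-8 encoding of the reading, the numeric part-of-speech codes, and the convention that DL counts the whole record including itself are all assumptions for illustration; the patent does not specify them.

```python
import struct

def pack_seg_record(reading, entries):
    """Pack DL, YL, reading, then (pos_code, freq) pairs into bytes."""
    raw = reading.encode("utf-8")
    body = struct.pack("B", len(raw)) + raw
    for pos_code, freq in entries:
        body += struct.pack("BB", pos_code, freq)
    # Assumption: DL is the length of the whole record, DL byte included.
    return struct.pack("B", len(body) + 1) + body

def unpack_seg_record(buf):
    """Inverse of pack_seg_record for a single record."""
    dl, yl = buf[0], buf[1]
    reading = buf[2:2 + yl].decode("utf-8")
    pairs = buf[2 + yl:dl]
    entries = [(pairs[i], pairs[i + 1]) for i in range(0, len(pairs), 2)]
    return reading, entries

# Hypothetical codes: 1 = particle, 2 = noun.
record = pack_seg_record("は", [(1, 10), (2, 3)])
decoded = unpack_seg_record(record)
```

Each additional word with the same reading costs only two bytes in this layout, which is the kind of saving the separate segmentation section is meant to provide.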

FIGS. 3C and 3D show concrete examples of the data corresponding to some of the readings stored in the dictionary. C is an example of the segmentation data section, and D is an example of the output data section corresponding to the data in C.

FIG. 4 is a part-of-speech connection table. When maximum likelihood candidates are selected, this table is used to select the word that follows a previously confirmed word.

In the figure, column A gives the part of speech of the preceding word and B gives the part of speech of the word that follows A. A ○ at their intersection means the connection is allowed, and a × means it is not. The part-of-speech classification used in this table can be subdivided as needed, and some examples of this are shown.
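A connection table of this kind reduces to a two-dimensional lookup. In the sketch below, the part-of-speech categories and every ○ and × mark are hypothetical placeholders, not the contents of FIG. 4, and `can_connect` is a name chosen for the example.

```python
# Hypothetical part-of-speech categories, in row and column order.
POS = ["noun", "particle", "verb", "aux"]

# Rows: part of speech of the preceding word (A).
# Columns: part of speech of the following word (B), in POS order.
TABLE = [
    "×○×○",  # after a noun
    "○×○×",  # after a particle
    "×○×○",  # after a verb
    "×○××",  # after an auxiliary
]

def can_connect(prev_pos, next_pos):
    """True when the table has ○ at the (prev_pos, next_pos) intersection."""
    row = TABLE[POS.index(prev_pos)]
    return row[POS.index(next_pos)] == "○"
```

Subdividing a category, as the text allows, would amount to adding rows and columns to `TABLE` without changing the lookup.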
B represents the part of speech of the word following A; 1. If the intersection of the two is ○, the connection is possible; if the intersection is ○, the connection is not possible. The classification of parts of speech used in this table can be subdivided as necessary, and some examples are shown.

Next, the conversion processing flow of FIG. 2 is explained.

First, the data entered from a keyboard, a file, or the like is held in an input buffer (not shown), and then one character of the input reading is fetched (20 in FIG. 2). It is judged whether an input reading exists (21 in FIG. 2). If not, selection of the maximum likelihood words is regarded as finished and the flow moves to the conversion into a mixed kanji-kana sentence described later (32 to 38 in FIG. 2).

If an input reading (character data) exists, all words (segmentation data) matching the reading are read from the segmentation data section of the dictionary (C in FIG. 3) (22 in FIG. 2). It is judged whether any words were read (23 in FIG. 2). If not, the processing of step 29 described later is performed. If so, the candidates are sorted over all the segmentation data by the maximum likelihood evaluation method, which gives priority to longer clauses and then to higher frequency (24 in FIG. 2).

That is, longer data is given priority, then higher frequency. For each item of segmentation data, its part of speech and the part of speech of the immediately preceding word (the maximum likelihood word selected in the previous pass) are checked against the part-of-speech connection table (see FIG. 4) to judge whether connection is possible, and the connectable segmentation data is sorted in order of frequency.
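The two-level priority of step 24 can be written as a single sort key. The candidate records, their field names, and the frequencies below are made up for illustration; only the ordering rule (longer reading first, then higher frequency) comes from the text.

```python
def rank(candidates):
    """Order candidates: longer reading first, then higher frequency."""
    return sorted(candidates, key=lambda c: (-len(c["reading"]), -c["freq"]))

# Hypothetical candidates that already passed the connection check.
candidates = [
    {"reading": "きょう", "pos": "noun", "freq": 9},
    {"reading": "きょうは", "pos": "noun", "freq": 2},
    {"reading": "き", "pos": "noun", "freq": 5},
    {"reading": "き", "pos": "verb", "freq": 7},
]

ranked = rank(candidates)
```

Here きょうは ranks first despite its low frequency because length takes precedence, and the two き entries are separated by frequency alone.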

When sorting finishes, the highest-priority word among the sorted maximum likelihood candidates is extracted as the maximum likelihood word (25 in FIG. 2). It is then judged whether a maximum likelihood word exists (26 in FIG. 2). If not, it is judged whether an immediately preceding clause exists (29 in FIG. 2), and if it does, preprocessing for backtracking (returning to the processing of the immediately preceding clause) is performed (30 in FIG. 2). In this case the flow returns to step 25 via the path shown in the figure, the next-ranked word among the candidate words for the immediately preceding clause is extracted, and if such a word exists, it is judged whether part-of-speech connection is possible (27 in FIG. 2).

If connection is found to be possible, the connectable maximum likelihood word is stored in the confirmed word buffer (109 in FIG. 1), and preprocessing is performed so that processing starts from the reading of the clause following the maximum likelihood word (28 in FIG. 2). The flow then returns to step 20 via the path shown in the figure, fetches the subsequent reading from the input buffer, and repeats the same processing.

If step 27 finds that part-of-speech connection is not possible, the flow returns to step 25 via the path shown in the figure and another candidate word is extracted.

The processing of steps 29 and 30 is also performed when the dictionary contains no word (segmentation data) corresponding to the input reading (step 23).

If step 29 finds that there is no immediately preceding clause, all the input readings are converted to full-width hiragana (31 in FIG. 2) and processing ends. (Output displayed entirely in hiragana therefore means that the input could not be converted at all.)

When step 21 finds that no input reading remains (the end state), the flow moves to step 32. In the flow from this point on, "a data" denotes the segmentation data (A and C in FIG. 3) and "b data" denotes the output data (B and D in FIG. 3).

In step 32, the b data of the dictionary is read for the conversion result held as a data (the first item when several words are included). It is judged whether this b data contains output data. If not, the a data is converted to full-width hiragana (37 in FIG. 2), and it is judged whether any a data remains in the sentence (38 in FIG. 2).

If a data remains, the flow returns to step 32 via the path shown in the figure. If not, conversion into a mixed kanji-kana sentence has finished for all clauses, and the converted sentence is output.

If step 33 finds that the b data contains kanji, it is judged whether the b data contains the same part of speech (34 in FIG. 2). If it does, there are several kanji with the same part of speech, so the a data is replaced with b data according to frequency (35 in FIG. 2), the processing target in the string being converted is moved to the next clause, the presence of remaining a data is judged (38 in FIG. 2), and the same processing as above is performed. If step 34 finds that the b data does not contain the same part of speech, the a data is converted to hiragana.
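Steps 32 to 38 amount to a single pass over the confirmed words, with one output-data lookup per word. The sketch below assumes a table `OUT_SECTION` keyed by reading, a tuple layout of (part of speech, written form, frequency), and sample entries that are all hypothetical. Picking the most frequent form with a matching part of speech and falling back to the reading follow the description above.

```python
# Hypothetical output data: reading -> [(pos, written form, frequency)].
OUT_SECTION = {
    "きょう": [("noun", "今日", 9), ("noun", "京", 4)],
    "は": [("particle", "は", 10)],
    "はれ": [("noun", "晴れ", 7)],
    "です": [("aux", "です", 9)],
}

def to_output(confirmed):
    """Replace each confirmed (reading, pos) pair with its written form."""
    pieces = []
    for reading, pos in confirmed:
        # Keep only written forms whose part of speech matches the a data.
        forms = [f for f in OUT_SECTION.get(reading, []) if f[0] == pos]
        if forms:
            # Several forms may share the part of speech: take the most frequent.
            pieces.append(max(forms, key=lambda f: f[2])[1])
        else:
            # No output data, or no matching part of speech: keep the hiragana.
            pieces.append(reading)
    return "".join(pieces)

sentence = to_output(
    [("きょう", "noun"), ("は", "particle"), ("はれ", "noun"), ("です", "aux")]
)
```

Because this pass runs only after segmentation has settled, backtracking never causes the output data to be read again.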

Next, the example of operation of the present invention shown in FIG. 5 is explained.

(1) As shown in (1), the input reading "きょうははれです" (kyou wa hare desu) is taken as the example.

(2) Processing of the first clause: Segmentation data is read for the readings "き", "きょ", "きょう", "きょうは", "きょうはは", and so on, and arranging the results from highest evaluation downward gives the order shown in FIG. 5 (2) (the maximum likelihood candidates). "きょうは" is the first candidate because longer clauses are given top priority.

(3) Processing of the second clause: Candidates are sought for the readings that follow the first item of the first clause, "きょうは (教派)", but the dictionary has none, so the device backtracks, looks for readings that follow the second item of the first clause, "きょう (今日)" ("は", "はは", "ははれ", "ははれで", and so on), and creates the candidate group shown in FIG. 5 (3).

(4) Processing of the third clause: Candidates are sought for the readings that follow the first item of the second clause, "はは (母)", but the dictionary has none, so the device backtracks, looks for readings that follow the second item of the second clause, "は (は)" ("は", "はれ", "はれです"), and creates the candidate group shown in FIG. 5 (4).

(5) Processing of the fourth clause: Candidates are sought for the readings that follow the first item of the third clause, "は (ば)", and the second item, "は (は)", but the dictionary has none, so the device backtracks, looks for readings that follow the third item of the third clause, "はれ (晴れ)", and creates the candidate group shown in FIG. 5 (5).

(6) Candidates are sought for the readings that follow the first item of the fourth clause, "で (で)", and the second item, "で (で)", but the dictionary has none, so the device backtracks. With the third item of the fourth clause, "です (です)", the entire reading string has been converted, so the output data is created (not shown).

(7) Creating the output data: The data for "今日", "は", "晴れ", and "です" is read from the output data section of the dictionary, and the converted string is created and output (not shown).

[Effects of the Invention] In Japanese conversion that completes by repeating backtracking to find the pattern that forms the longest clauses, the present invention reduces the amount of data handled when an unsegmented input string is segmented into clauses and words during backtracking, so processing becomes more efficient.

Japanese has many homophones, but the number of parts of speech is limited to some extent, and many words share both reading and part of speech while differing in written form. By extracting only the reading and part of speech, the number of data items handled during segmentation can therefore also be reduced.

Reducing the data length and the number of data items handled leaves memory headroom and reduces the number of file accesses.

[Brief Description of the Drawings]

FIG. 1 is a diagram of the basic configuration of the present invention, FIG. 2 is a processing flow diagram of an embodiment of the present invention, FIG. 3 shows an example structure of the dictionary data according to the present invention, FIG. 4 is a table showing whether words can connect according to their parts of speech, FIG. 5 shows an example of operation according to the present invention, FIG. 6 is a block diagram of a conventional example, and FIG. 7 is a flow diagram of the conversion processing of the conventional example.

In FIG. 1, 10: processing device; 11: dictionary; 111: segmentation data section; 112: output data section; 12: input unit; 13: output unit; 101: character fetching means; 102: segmentation data retrieval means; 103: maximum likelihood word evaluation and selection means; 104: output data retrieval means.

Patent applicant: PFU Ltd. (and one other). Sub-agent: Patent attorney Kazuo Hosaka.

Claims

[Scope of Claims] A kana-kanji conversion device that converts a character string entered in kana into a mixed kanji-kana sentence, characterized in that: a dictionary is provided that is composed of a segmentation data section, which contains the readings and parts of speech used only for segmentation lookup, and an output data section, which contains readings, kanji, and the like; for each character of the input character string, words are fetched from the segmentation data section of the dictionary; candidate words are extracted from the fetched words by maximum likelihood selection; for the maximum likelihood word, it is judged whether connection to the immediately preceding word in the input string is possible; if connection is possible, the operation proceeds to the subsequent characters, and if it is not, the device backtracks to the immediately preceding word in the input string, using the segmentation data, and selects a maximum likelihood word; and the selected maximum likelihood word is converted into a mixed kanji-kana sentence using the output data of the dictionary and output.