JP2812495B2

JP2812495B2 - Syllabic input of language using kanji

Info

Publication number: JP2812495B2
Application number: JP1168660A
Authority: JP
Inventors: 健楠井
Original assignee: 健楠井
Priority date: 1989-06-30
Filing date: 1989-06-30
Publication date: 1998-10-22
Anticipated expiration: 2013-10-22
Also published as: JPH0334058A

Description

DETAILED DESCRIPTION OF THE INVENTION

3.1 Field of Industrial Application

The present invention relates to word processors for languages whose sentences are written either entirely in kanji or in a mixture of kanji and phonetic characters. Specifically, it relates to Chinese, Japanese, and Korean word processors. More specifically, it relates to a word-sound segmentation method in which the reading of a sentence is input syllable by syllable using phonetic characters (pinyin romanization for Chinese, kana or romaji for Japanese, Hangul for Korean) and the optimal segmentation into word sounds is performed automatically as input proceeds, and to a word-sound-to-kanji conversion method that converts to kanji each word sound delimited by that segmentation method.

3.2 Prior Art

3.2.1 Weaknesses of kana-kanji conversion by the longest-match method in Japanese word processors

For the input "あいしょか", kana-kanji conversion cannot easily decide between "愛／初夏" ("love / early summer") and "愛書／家" ("book lover") (the slash / marks a word boundary; the same applies below). For "じょうきげん", it is hard to judge between "上／機嫌" ("good mood") and "蒸気／源" ("steam source"). "周期的発病" ("periodic onset of illness") and "周期的振動" ("periodic oscillation") convert correctly at the first attempt, yet "しゅうきてきせいちょう" comes out as "しゅうきて紀勢町". Whenever so-called multi-word batch conversion is attempted, word-boundary errors of this kind occur frequently. Even where current Japanese word processors (hereinafter sometimes abbreviated simply as "word processors") offer whole-sentence batch conversion, its performance does not truly satisfy users, and in practice the function is often left unused.

Some Japanese word processors perform so-called word-connection processing to raise the correct-conversion rate of whole-sentence batch conversion. This processing classifies the vocabulary into main words, such as nouns and verb stems, and function words, such as particles, auxiliary verbs, inflectional endings, prefixes, and suffixes. The permitted connections between main words and function words are stored as auxiliary data attached to the words in the system dictionary, and word segmentation uses that data. Word-connection processing improves the correct-conversion rate considerably, but apart from the inflectional endings of verbs and adjectives, misconversions remain common. "ごへんかん" becomes "ご返還" or "御返還" ("return", with an honorific prefix); a user who wants to write "誤変換" ("misconversion") usually gets a wrong result at the first attempt unless the boundary is marked by hand as "ご／へんかん".

The principal kana-kanji conversion scheme of current Japanese word processors is longest-match word segmentation combined with word-connection processing. As long as longest-match segmentation is used, conversion results are poor whenever a phrase (bunsetsu) mixes readings of various syllable lengths.

The present invention uses the statistical frequency rate of word sounds (the readings of words) as a weight and judges which sequence of word sounds is optimal for the sentence as a whole by the "principle of minimum word-sound entropy": the sequence for which the product of the frequency rates of the word sounds making up the sentence is largest is the one closest to the truth. Once this optimal sequence has been determined, each word sound is converted to kanji individually. The longest-match method also optimizes the sequence of word sounds, but it does so by the overly simple principle that a word sound with more syllables carries more weight. Because the present invention segments words using weights derived from frequency statistics, it rests on a sound theoretical basis.
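Stated as a formula, the principle above amounts to the following identity. This is a restatement under the assumption that a candidate segmentation consists of word sounds $w_1, \ldots, w_k$ with frequency rates $p(w_i)$; the symbols $w_i$ and $k$ are introduced here for illustration and do not appear in the original text.

```latex
\[
\operatorname*{arg\,max}_{w_1,\ldots,w_k} \; \prod_{i=1}^{k} p(w_i)
\;=\;
\operatorname*{arg\,min}_{w_1,\ldots,w_k} \; \sum_{i=1}^{k} \bigl(-\log_2 p(w_i)\bigr)
\]
```

Because the logarithm is monotonic, maximizing the product of frequency rates is equivalent to minimizing the sum of their negative logarithms, which is the form in which the sections below work with sums.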

3.2.2 Key points of pinyin-input kanji conversion in Chinese word processors

From an ergonomic standpoint, the natural approach for a Chinese word processor is to key in pinyin, the Chinese-style romanization, convert the pinyin to kanji, and output kanji text. There are, however, several cultural hurdles. The most serious is the fact that Chinese writers make word-segmentation mistakes when asked to write in pinyin, a phonetic script. One reason is that Chinese has been written since antiquity using only ideographic kanji, so Chinese text never needed spaces between words. In addition, Chinese has always had many monosyllabic words: 喫糖 ("to eat candy") is valid either as the single word 喫糖 or as the two words 喫／糖, and 胡説 ("to talk like a Hu", that is, "to talk nonsense") may be segmented as 胡説 or as 胡／説 without either being wrong. About 90% of the modern Chinese vocabulary is said to consist of words whose segmentation is ambiguous in this way (武占坤 (Wu Zhankun) et al., 漢字・漢字改革史 (History of Chinese Characters and Their Reform), 1988, Hunan People's Publishing House, p. 257).

The pinyin-input Chinese word processors developed so far and already on sale in China and Japan all convert to kanji word by word (Chinese has no phrase unit like the Japanese bunsetsu, so the longest-match method, which presupposes phrase-level conversion, cannot be used). Because 97% of the Chinese vocabulary consists of monosyllabic or disyllabic words, this approach requires the user to press the conversion key very frequently, and human segmentation mistakes inevitably occur throughout. For Chinese word processors it is therefore most desirable to introduce either whole-sentence batch kanji conversion with automatic word segmentation, or sequential word segmentation and kanji conversion that keeps pace with syllable input.

For Chinese word processors that perform whole-sentence batch kanji conversion with automatic word segmentation, the inventor of the present application has made the inventions "Word-sound segmentation method and word-sound-to-kanji conversion method for Chinese" (Japanese Patent Application No. Sho 63-105030) and "Word-sound segmentation method for Chinese" (Japanese Patent Application No. Sho 63-172163). These inventions adopt a segmentation method called the phonetic-frequency method, which uses the statistical frequency of occurrence of word sounds (the readings of words) as a weight and judges the optimal sequence of word sounds for the sentence as a whole (that is, the word-sound segmentation) by the "principle of minimum word-sound entropy".

As noted above, monosyllabic and disyllabic words make up most of the vocabulary, and this brevity of Chinese readings is a great advantage when automatic word segmentation by the phonetic-frequency method is applied. Practical tests have amply demonstrated this (see Fig. 8). For sentences containing many word sounds of three or more syllables, however, the two inventions above may prove somewhat ill-suited. The invention of the present application removes that concern.

3.2.3 Korean word processors and the state of Hangul-kanji conversion

Korean is commonly thought to be written entirely in Hangul, but the traditional mixed kanji-Hangul script remains influential, and kanji are gradually being re-evaluated. Current Korean word processors mostly use Hangul-element input with Hangul conversion, or that scheme supplemented by Hangul-to-kanji conversion of individual kanji compounds. In the future, following the path of Japanese word processors, word processors that take Hangul-element input and convert it to mixed kanji-Hangul text may become widespread. Fig. 13 shows Korean examples corresponding to Japanese.

As the example in Fig. 13 shows, Korean closely resembles Japanese in that it consists of Sino-Korean words (shown in katakana using the Korean readings of the kanji) and native Korean vocabulary (shown in hiragana). Its grammar and word order are also close to those of Japanese. A word processor that handles mixed kanji-Hangul text can therefore follow basically the same design policy as a Japanese word processor. As described later, the present invention is fully effective for word processors for mixed kanji-Hangul text as well.

3.3 Problems to Be Solved by the Invention

As stated above, the present inventor has already filed patent applications for two inventions relating to Chinese word processors (Japanese Patent Application No. Sho 63-105030, "Word-sound segmentation method and word-sound-to-kanji conversion method for Chinese", and Japanese Patent Application No. Sho 63-172163, "Word-sound segmentation method for Chinese"). Both inventions presuppose pinyin input. For monosyllabic and disyllabic words, they define the statistical frequency of a word sound as a frequency class, which is a quantity of information, and store it in the dictionary data. They then segment word sounds automatically by applying the "principle of minimum word-sound information entropy": the sequence of word sounds that minimizes the sum of the frequency classes of the words making up the sentence indicates the most probable segmentation. This automatic word-sound segmentation technique is the "phonetic-frequency method".

In those two inventions, however, the phonetic-frequency method was applied only to monosyllabic and disyllabic words. The reason was a concern that, in general, processing a sequence containing word sounds of up to M syllables (M being in theory any positive integer) by the phonetic-frequency method would become complicated and slow, and would then fail the design requirement, unavoidable for a word processor, that reading-to-kanji conversion be performed in real time within a short period.

To complement the two inventions above, the present invention aims to provide a phonetic-frequency word-sound processing algorithm that is as simple as possible for processing sections containing word sounds of three or more syllables.

Compared with Chinese, Japanese and Korean have more syllables per word. The present invention therefore seeks to provide a word-sound segmentation method and a kanji conversion method, both based on the phonetic-frequency method, that can segment word sounds of three or more syllables in a relatively short processing time and that can be applied to Japanese and Korean as well as Chinese.

3.4 Means for Solving the Problems

To solve the problems described above, the present invention provides a syllable-input method of sequential word-sound segmentation and sequential kanji conversion for sentences of a language that uses kanji. Syllables written in phonetic characters are input one at a time; the resulting syllable sequence is segmented into word sounds so that the optimal word-sound sequence is obtained sequentially; each word sound of that sequence is converted to kanji in turn; and the most probable kanji word sequence is obtained.

The method uses the following definitions. The shared reading of statistically significant words that are read alike is treated as one word sound. Let $f$ be the statistical frequency of occurrence of an individual word sound used in sentences of the language, $s$ its length in syllables, and $F_t$ the total number of syllables over all word sounds in the word-sound statistics; the frequency rate of the word sound is $p = (f \times s)/F_t$. The frequency class of the word sound is the integer $I = \operatorname{int}(-\log_a p)$ with $a = 2$. When a continuous word-sound sequence is input in a sentence of the language, the sequence from its first syllable to the most recently input $n$-th syllable is called the phonetic-frequency phrase, and this phrase is the object of the sequential segmentation and sequential kanji conversion at the current moment. The sum of the frequency classes of the word sounds in the phonetic-frequency phrase is called the frequency-class sum.

The method comprises a word-sound frequency-class dictionary, whose headwords are the statistically significant word sounds of 1 to $M$ syllables in the language ($M$ being an integer of 2 or more) and which stores the frequency class of each word sound, and a word-sound kanji-word dictionary, whose headwords are the same word sounds and which stores, as kanji character strings, the homophonous kanji words read with each word sound.

The method comprises word-sound frequency-class search means which, each time one syllable is input in the phonetic-frequency phrase, looks up in the word-sound frequency-class dictionary the frequency class of each of the $M$ word sounds of 1 to $M$ syllables that end with that syllable, and sends each word sound and its frequency class to the sequential optimal frequency-class-sum calculating means described next.

For the sequential optimal frequency-class-sum calculating means, let $n$ ($= 1, 2, 3, \ldots$) be the syllable input number within the phonetic-frequency phrase; let $R_{n,1}, R_{n,2}, R_{n,3}, \ldots, R_{n,M}$ be the $M$ word sounds of 1 to $M$ syllables that end with the most recently input, $n$-th syllable; and let $I_{n,1}, I_{n,2}, I_{n,3}, \ldots, I_{n,M}$ be their frequency classes. All segmentation types possible for a word-sound sequence of $n$ syllables are classified into $M$ groups according to the syllable length $m = 1, 2, \ldots, M$ of the final word sound, whose readings are $R_{n,1}, R_{n,2}, R_{n,3}, \ldots, R_{n,M}$ respectively. Let $P_{n,1}, P_{n,2}, P_{n,3}, \ldots, P_{n,M}$ be the minimum frequency-class sums of the $M$ groups, and let the optimal frequency-class sum $P_n$ be the smallest of them. As $n$ increases from 1 in steps of 1 with each syllable input, $P_n$ is obtained successively as follows.

When $n = 1$, $P_1$ is calculated by $P_{1,1} = I_{1,1} = P_1$.

When $1 < n \le M$, for the $n-1$ values of $m$ with $m \le n-1$, the $n-1$ sums are calculated by $P_{n,m} = P_{n-m} + I_{n,m}$, and for the single value $m = n$ the sum is calculated by $P_{n,n} = I_{n,n}$. This yields the $n$ minimum frequency-class sums $P_{n,1}, P_{n,2}, \ldots, P_{n,n}$.

When $M < n$, for the $M$ values of $m$ with $1 \le m \le M$, the $M$ minimum frequency-class sums are calculated by $P_{n,m} = P_{n-m} + I_{n,m}$. This yields the $M$ minimum frequency-class sums $P_{n,1}, P_{n,2}, \ldots, P_{n,M}$.

In short, the sequential minimum frequency-class-sum calculating means obtains the minimum frequency-class sum of the optimal segmentation type for each of the $n$ or $M$ groups by adding each of the $n$ or $M$ frequency classes $I_{n,m}$ just looked up to the corresponding one of the $n$ or $M$ optimal sums $P_{n-m}$, which were already obtained and stored by the processing performed immediately after the syllable input $m$ syllables before the current input number $n$.

The method comprises optimal frequency-class-sum segmentation-type selecting means which, each time one syllable is input, receives the $n$ or $M$ minimum frequency-class sums obtained by the calculating means, compares them to select one out of the $n$ or $M$, takes the smallest value $P_{n,m}$ among them as the optimal frequency-class sum $P_n$, stores the value of $P_n$, and at the same time obtains the final word sound $R_{n,m_o}$ of the unique word-sound sequence having that sum, together with the number of syllables $m_o$ of that word sound.

The method comprises most-probable kanji-word search and conversion means which receives the word sound $R_{n,m_o}$ obtained by the selecting means, uses it as a headword to read from the word-sound kanji-word dictionary the kanji word $H_{n,m_o}$ that is currently most probable among the homophonous kanji words of $m_o$ syllables ending with the currently input syllable, and sends it to the most-probable kanji-string calculating means described next.

The method comprises most-probable kanji-string calculating means which receives $m_o$ from the selecting means and the most probable kanji word $H_{n,m_o}$ from the search and conversion means and, each time $n$ (with initial value 1) increases by 1 on a syllable input, obtains the current most probable kanji string $K_{n,m_o}$ by the string concatenation $K_{n,m_o} = K_{n-m_o} + H_{n,m_o}$ and delivers it as the output of the means of the present invention.

In summary, when the language has word sounds of 1 to $M$ syllables, the syllables of an $N$-syllable sentence are input in order with the syllable number $n$ starting at 1. Each time $n$ increases by 1, the up to $M$ (or $n$ when $n \le M$) word sounds of $m$ syllables that can stand at the end of the sentence so far are obtained together with their frequency classes. The frequency class of each of these final word sounds is added to the corresponding already-obtained frequency-class sum of the optimal word-sound sequence covering the sentence from its beginning to the syllable immediately before that final word sound, which gives the frequency-class sums of up to $M$ word-sound sequences of $n$ syllables. The unique sequence with the smallest of these sums is taken as the current optimally segmented word-sound sequence, and its final word sound is determined to be the current optimal final word sound. The kanji string obtained by converting that word sound is taken as the final-word-sound kanji string and is appended to the kanji conversion of the known most probable word-sound sequence preceding that word sound, which gives a new kanji string. The method is characterized in that this kanji string is newly output each time $n$ increases by 1; it is a syllable-input method of sequential word-sound segmentation and sequential kanji conversion for a language that uses kanji.
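The means described above can be sketched as a small incremental converter. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `IncrementalConverter`, the dictionaries `FREQ_CLASS` and `KANJI`, and every frequency-class number are hypothetical, and the sketch assumes that each single syllable has a dictionary entry and that $P_0 = 0$ and $K_0$ is the empty string. The example reading "じょうきげん" is taken from section 3.2.1.

```python
import math

# Hypothetical dictionary data for illustration only; the numbers are invented.
FREQ_CLASS = {
    "じょ": 14, "う": 14, "き": 14, "げ": 14, "ん": 14,
    "じょう": 9, "じょうき": 11, "げん": 10, "きげん": 10,
}
KANJI = {"じょう": "上", "じょうき": "蒸気", "げん": "源", "きげん": "機嫌"}


class IncrementalConverter:
    """Keeps P_n and K_n for every prefix and updates them per input syllable."""

    def __init__(self, freq_class, kanji, max_len):
        self.freq_class = freq_class  # word sound -> frequency class I
        self.kanji = kanji            # word sound -> most probable kanji word H
        self.max_len = max_len        # M, the longest word sound in syllables
        self.syllables = []
        self.best_sum = [0]           # best_sum[n] is P_n, with P_0 = 0 assumed
        self.best_text = [""]         # best_text[n] is K_n, with K_0 = "" assumed

    def feed(self, syllable):
        """Input one syllable and return the current most probable kanji string."""
        self.syllables.append(syllable)
        n = len(self.syllables)
        best_total, best_m = math.inf, None
        for m in range(1, min(n, self.max_len) + 1):
            reading = "".join(self.syllables[n - m:])   # R_{n,m}
            if reading not in self.freq_class:
                continue                                # no such word sound
            total = self.best_sum[n - m] + self.freq_class[reading]  # P_{n,m}
            if total < best_total:
                best_total, best_m = total, m
        if best_m is None:
            raise ValueError("no dictionary entry ends at this syllable")
        reading = "".join(self.syllables[n - best_m:])
        # K_n = K_{n - m_o} + H_{n, m_o}; fall back to the reading itself.
        text = self.best_text[n - best_m] + self.kanji.get(reading, reading)
        self.best_sum.append(best_total)
        self.best_text.append(text)
        return text


converter = IncrementalConverter(FREQ_CLASS, KANJI, max_len=3)
for s in ["じょ", "う", "き", "げ", "ん"]:
    print(converter.feed(s))
```

With these invented numbers the output after the third syllable is 蒸気, and it is revised to 上機嫌 once the fifth syllable arrives, which illustrates why the kanji string is output anew at every step. Each `feed` call does at most M dictionary lookups and additions, independent of the sentence length.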

3.5 Operation

3.5.1 All combinations of word sounds of 1 to m syllables in a phonetic-frequency processing section of n syllables, and their systematic arrangement

Fig. 1 lists, as an example, the word-sound segmentation types that are possible in phonetic-frequency processing sections of 1 to 7 syllables when word sounds of every length from 1 to 7 syllables exist.

Here the lowercase Roman letters a, b, c, ... denote syllables; for example, abc denotes a three-syllable word sound made up of the three syllables a, b, and c. The slash / denotes a boundary between word sounds.

First, the main terms used in the description of the present invention are defined as follows.

The shared reading of statistically significant words that are read alike is treated as one word sound. Let $f$ be the statistical frequency of occurrence of an individual word sound used in sentences of the language, $s$ its length in syllables, and $F_t$ the total number of syllables over all word sounds in the word-sound statistics; the frequency rate of the word sound is $p = (f \times s)/F_t$. The frequency class of the word sound is the integer $I = \operatorname{int}(-\log_a p)$ with $a = 2$. In a continuous run of word sounds in a sentence of the language, when no word sound of two or more syllables ends with the most recently input one-syllable word sound, the point immediately before that one-syllable word sound is called a node. The syllable sequence of the word sounds between two adjacent nodes is called a phonetic-frequency phrase, and the sum of the frequency classes of the consecutive word sounds starting from the first syllable of the phrase is called the frequency-class sum.
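The two definitions $p = (f \times s)/F_t$ and $I = \operatorname{int}(-\log_2 p)$ can be computed as below. This is a minimal sketch; the function names `frequency_rate` and `frequency_class` and the sample counts are assumptions made for illustration, not values from the patent.

```python
import math


def frequency_rate(f, s, total_syllables):
    """p = (f * s) / F_t: share of all counted syllables taken by this word sound."""
    return (f * s) / total_syllables


def frequency_class(f, s, total_syllables):
    """I = int(-log2 p): integer information content; smaller means more frequent."""
    p = frequency_rate(f, s, total_syllables)
    return int(-math.log2(p))


# Hypothetical counts: a 2-syllable word sound seen 3 times in a 1024-syllable corpus.
print(frequency_class(f=3, s=2, total_syllables=1024))
```

Since $p \le 1$, the value $-\log_2 p$ is never negative, so the truncation performed by `int` coincides with rounding down.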

As shown in the bottom row of Fig. 1, when the syllable length of the phonetic-frequency processing section (the span of the phonetic-frequency phrase being processed) is n, the number of theoretically possible segmentation types is $U_n = 2^{n-1}$. For n = 8 there are thus 128 segmentation types, and selecting from among them the single optimal type with the smallest frequency-class sum clearly takes considerable computation. However, as Fig. 1 shows, the segmentation types can be arranged systematically in a definite order (a tree structure of segmentation types), and as n increases by 1, the types that exist for the next n can be computed from that order.

For each n, the segmentation types in Fig. 1 are arranged in the following order.

(1) The type in the top row is a sequence of n monosyllabic word sounds, and the type in the bottom row is a sequence consisting of a single n-syllable word sound (both are shown in outline characters).

(2) The segmentation type of a word-sound sequence is expressed as a binary number B as follows.

(a) A 1 is written if there is a boundary / between two syllables, and a 0 if there is not.

(b) The first boundary position in the syllable sequence (between a and b) gives the first digit of the binary number, the second boundary position (between b and c) gives the second digit, and so on, so that an (n-1)-digit binary number is formed for an n-syllable sequence and expresses its segmentation type. For example, a/b/cd/efg gives B = 001011.

(3) For n = 7, for example, the binary number B starts at 000000, which represents abcdefg in the bottom row, and increases by 1 per row going upward until it reaches 111111, which represents a/b/c/d/e/f/g in the top row. B therefore takes all $2^{7-1} = 64$ binary values, and the segmentation types it represents cover every segmentation of a 7-syllable sequence into word sounds of 1 to 7 syllables.
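The binary coding in items (2) and (3) can be sketched as a pair of helper functions. The names `encode` and `decode` are hypothetical, and the sketch assumes, as in Fig. 1, that each syllable is written as a single letter and that the first boundary is the lowest (rightmost) digit of B.

```python
def encode(segmented):
    """Return B for a string such as 'a/b/cd/efg' (one letter per syllable)."""
    words = segmented.split("/")
    bits = []  # bits[0] is the first boundary position (between a and b)
    for word in words[:-1]:
        bits.extend([0] * (len(word) - 1) + [1])   # a boundary follows this word
    bits.extend([0] * (len(words[-1]) - 1))        # no boundary after the last word
    return "".join(str(b) for b in reversed(bits)) # lowest digit written last


def decode(code, syllables):
    """Rebuild the segmentation of `syllables` described by the binary string."""
    words, start = [], 0
    for i, bit in enumerate(reversed(code), start=1):
        if bit == "1":
            words.append(syllables[start:i])
            start = i
    words.append(syllables[start:])
    return "/".join(words)


print(encode("a/b/cd/efg"))
print(decode("001011", "abcdefg"))
```

Counting `k` from 0 to 63 and calling `decode(format(k, "06b"), "abcdefg")` walks through the rows of the n = 7 column from bottom to top.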

Reading Fig. 1 from left to right shows that the segmentation types for n syllables fall into groups with the following structure.

(a) The types ending in a one-syllable word sound are the types for n-1 syllables, each with the same one-syllable word sound appended.

(b) The types ending in a two-syllable word sound are the types for n-2 syllables, each with the same two-syllable word sound appended.

...... ......

(m) The types ending in an m-syllable word sound are the types for n-m syllables, each with the same m-syllable word sound appended.
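Rules (a) to (m) translate directly into a recursive construction, sketched below. The function name `build_types` and the optional `max_len` parameter (a cap on the length of a word sound, corresponding to M) are assumptions for illustration; syllables are again written as single letters.

```python
from functools import lru_cache


def build_types(syllables, max_len=None):
    """List every segmentation type, grouping by the length m of the final word."""
    limit = len(syllables) if max_len is None else max_len

    @lru_cache(maxsize=None)
    def types(n):
        if n == 0:
            return ("",)  # empty prefix: nothing precedes the final word
        found = []
        for m in range(1, min(n, limit) + 1):
            tail = syllables[n - m:n]          # the final m-syllable word sound
            for head in types(n - m):          # every type for n - m syllables
                found.append(tail if not head else head + "/" + tail)
        return tuple(found)

    return list(types(len(syllables)))


print(build_types("abc"))
print(len(build_types("abcdefg")))
print(len(build_types("abcdefghij", max_len=4)))
```

Without a cap, the number of types doubles with each added syllable, giving $2^{n-1}$. With word sounds capped at 4 syllables, the count for 10 syllables is 401, which agrees with the figure $U_{10} = 401$ quoted for M = 4 later in this section.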

3.5.2 Simultaneous progress of the minimum frequency-class sum computation and the optimal word-sound segmentation and kanji conversion

Viewed from another angle, the vertical and horizontal order described above means the following. The vertical order shows that Fig. 1 covers every theoretically possible segmentation type. The horizontal order shows that, taking as material the m kinds of word sounds (the one-syllable word sound a at n = 1, the two-syllable word sound ab at n = 2, ..., the m-syllable word sound abcd...m at n = m) and starting from a at n = 1, procedures (a) to (m) above systematically generate, each time n increases by 1, all the types belonging to the new n. Fig. 1 shows the resulting tree structure of segmentation types.

From this understanding of the order in the tree structure of segmentation types, the following important conclusion about the sequential calculation of the minimum frequency-class sum follows.

When the word sounds handled by phonetic-frequency processing are limited to word sounds of 1 to M syllables, the following holds for word-sound sequences of n syllables.

(1) Among the sequences ending in a one-syllable word sound, the minimum frequency-class sum is obtained by adding the frequency class of the monosyllabic word sound formed by the n-th input syllable to the minimum frequency-class sum among sequences of n-1 syllables. Let this value be $P_{n,1}$. The final word sound of the sequence having this sum has 1 syllable.

(2) Among the sequences ending in a two-syllable word sound, the minimum frequency-class sum is obtained by adding the frequency class of the two-syllable word sound formed by joining the (n-1)-th and n-th input syllables to the minimum frequency-class sum among sequences of n-2 syllables. Let this value be $P_{n,2}$. The final word sound of the sequence having this sum has 2 syllables.

...... ......

(M) Among the sequences ending in an M-syllable word sound, the minimum frequency-class sum is obtained by adding the frequency class of the M-syllable word sound formed by joining in order the M syllables from the (n-M+1)-th to the n-th input syllable to the minimum frequency-class sum among sequences of n-M syllables. Let this value be $P_{n,M}$. The final word sound of the sequence having this sum has M syllables.

Ultimately, the segmentation pattern with the minimum frequency-class sum among word-sound sequences of length n syllables is the single pattern having the smallest value among the M minimum sums $P_{n1}, P_{n2}, \dots, P_{nM}$ of (1) to (M) above. Its value, the overall minimum frequency-class sum, is denoted $P_n$. By the principle of the sound-frequency method this pattern is the optimal word-sound segmentation. If $m_o$ is the number of syllables of its last word sound and $I_{m_o}$ is the frequency class of that word sound, then

$$P_n = P_{n-m_o} + I_{m_o}.$$

n is defined as the syllable input number, which indicates the order in which the syllables of a sound-frequency phrase are input. n starts at 1 and increases by one each time a syllable is input.

Let $R_{n1}, R_{n2}, \dots, R_{nM}$ be the M word sounds whose readings are the syllable strings of length 1 to M syllables that end with the syllable currently being input (the n-th).

Let $I_{n1}, I_{n2}, \dots, I_{nM}$ be the frequency classes of the word sounds $R_{n1}, R_{n2}, \dots, R_{nM}$, respectively.

Likewise, let $H_{n1}, H_{n2}, \dots, H_{nM}$ be the kanji words that are currently most probable for the word sounds $R_{n1}, R_{n2}, \dots, R_{nM}$, respectively.

Let n be the syllable length of the current sound-frequency processing segment and M the maximum syllable length of a word sound, and let $U_n$ be the maximum number of possible segmentation patterns. When $n \le M$,

$$U_n = 2^{n-1}.$$

When $n > M$, with initial values $U_M = 2^{M-1},\ U_{M-1} = 2^{M-2},\ \dots,\ U_1 = 1$,

$$U_n = U_{n-1} + U_{n-2} + \dots + U_{n-M}.$$

For example, $U_{10} = 401$ when M = 4 and n = 10. In an actual written language, not all of the word sounds $R_{n1}$ to $R_{nM}$ exist in a processing segment of length n, so $U_n$ is usually smaller than the value above. Even so, $U_n$ clearly becomes very large when both n and M are large. Computing the frequency-class sum of every one of these patterns, selecting and fixing the pattern with the minimum sum, and then performing word-sound-to-kanji conversion takes a long time, and it becomes difficult to finish in the short interval between one syllable input and the next. The sequential sound-frequency processing method described here removes this difficulty.
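The count $U_n$ can be checked numerically. The following is a minimal sketch, not part of the patent; the function name `count_segmentations` and the convenience value $U_0 = 1$ (the empty prefix has one trivial split) are assumptions introduced here for illustration.

```python
def count_segmentations(n: int, max_len: int) -> int:
    """Count the ways to split n syllables into words of 1..max_len syllables."""
    counts = [1]  # counts[0] = 1 is an assumed convenience value for the empty prefix
    for length in range(1, n + 1):
        # The last word may have m = 1..max_len syllables, but never more than `length`.
        counts.append(
            sum(counts[length - m] for m in range(1, min(max_len, length) + 1))
        )
    return counts[n]


print([count_segmentations(n, 4) for n in range(1, 5)])  # [1, 2, 4, 8]
print(count_segmentations(10, 4))                        # 401
```

With M = 4 the first four values follow $2^{n-1}$ and the tenth is 401, which agrees with the example given in the text.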

The $U_n$ segmentation patterns above are divided into M groups according to whether the last word sound is $R_{n1}, R_{n2}, \dots, R_{nM}$. The minimum frequency-class sums $P_{n1}, P_{n2}, \dots, P_{nM}$ of the groups need not be found by comparing the sums of the many patterns within each group. They are obtained simply by adding the values $I_{n1}, I_{n2}, \dots, I_{nM}$ found at the current n to the overall minimum sums $P_{n-1}, P_{n-2}, \dots, P_{n-M}$ already obtained at the earlier stages n−1, n−2, ……, n−M, using the following formulas.

Let m be the number of syllables of the last word sound in a segmentation pattern. When n is in the range $1 \le n \le M$, the n−1 values $P_{nm}$ for $m \le n-1$ are computed by

$$P_{nm} = P_{n-m} + I_{nm},$$

and the single value for m = n is computed by

$$P_{nm} = P_{nn} = I_{nm},$$

which gives the n minimum frequency-class sums $P_{n1}, P_{n2}, \dots, P_{nn}$. When $n > M$, the M values $P_{nm}$ for $1 \le m \le M$ are computed by

$$P_{nm} = P_{n-m} + I_{nm},$$

which gives the M minimum frequency-class sums $P_{n1}, P_{n2}, \dots, P_{nM}$.

From $P_{n1}$ to $P_{nm}$ above, the sum with the smallest value is chosen by an m-way selection and taken as the overall minimum frequency-class sum $P_n$. The m-way selection is carried out as m−1 two-way selections.
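The m-way selection by repeated two-way comparison can be sketched as follows. This is an illustrative sketch only; the function name `select_minimum` and the dictionary layout are assumptions, and ties are simply kept on the earlier candidate here, which is a simplification rather than the tie rule of the patent.

```python
def select_minimum(candidates):
    """Pick the smallest sum from {m: P_nm} using len(candidates) - 1 comparisons.

    Returns (P_n, m_o, number_of_comparisons).
    """
    items = sorted(candidates.items())  # visit m = 1, 2, ... in order
    best_m, best_value = items[0]
    comparisons = 0
    for m, value in items[1:]:
        comparisons += 1                # one two-way selection per remaining candidate
        if value < best_value:
            best_m, best_value = m, value
    return best_value, best_m, comparisons


print(select_minimum({1: 12, 2: 7, 3: 9, 4: 15}))  # (7, 2, 3)
```

Four candidates need three two-way selections, matching the m−1 count stated above.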

When a single $P_n$ is selected from $P_{n1}, P_{n2}, \dots, P_{nm}$, the segmentation pattern with the minimum frequency-class sum is selected at the same time. If $m_o$ is the number of syllables of the last word sound of the pattern having $P_n$, then among the m values $P_{n1}, P_{n2}, \dots$ it is the one with subscript $m_o$, namely $P_{nm_o}$, that has been selected as $P_n$. In other words, for the n-syllable processing segment currently being handled, this m-way selection determines (1) the value $P_n$ of the minimum frequency-class sum of the optimal segmentation pattern and (2) the number of syllables $m_o$ of the last word sound of that pattern. In the formula $P_n = P_{n-m_o} + I_{m_o}$ by which $P_n$ was computed, $P_{n-m_o}$ was already obtained at the earlier stage $n - m_o$. The number of syllables of the last word sound in the pattern having the overall minimum sum $P_{n-m_o}$ is therefore known as well. Tracing this reasoning back through the stages to n = 1, the operation of this sound-frequency processing method reduces to the following two points.

(1) Each time a new syllable is input and the sound-frequency phrase grows by one syllable, the syllable length $m_o$ of the last word sound in the optimal word-sound sequence, that is, the sequence giving the minimum frequency-class sum $P_n$, is found in order to update the optimal segmentation (input-following action).

(2) The optimal word-sound sequence is the previously obtained optimal sequence for the phrase shorter by $m_o$ syllables, with a word sound of $m_o$ syllables appended to its end. Its frequency-class sum $P_n$ is the previously obtained minimum sum for the phrase shorter by $m_o$ syllables plus the frequency class of that $m_o$-syllable word sound (sequential action).
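The two points above can be sketched as a short dynamic-programming routine. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `segment`, the romanized toy dictionary, and every frequency-class value in it are hypothetical, and one-syllable readings missing from the dictionary are given an assumed fallback class of 32 so that a segmentation always exists.

```python
IMAGINARY_CLASS = 32  # assumed fallback class for one-syllable readings not in the dictionary


def segment(syllables, freq_class, max_len):
    """Return (P_n, optimal segmentation) for a list of syllables."""
    best = [0]      # best[n] = P_n, with P_0 = 0
    last_len = [0]  # last_len[n] = m_o for the first n syllables
    for n in range(1, len(syllables) + 1):
        candidates = []
        for m in range(1, min(max_len, n) + 1):
            word = tuple(syllables[n - m:n])
            cls = freq_class.get(word, IMAGINARY_CLASS if m == 1 else None)
            if cls is not None:
                candidates.append((best[n - m] + cls, m))  # P_nm = P_(n-m) + I_nm
        p_n, m_o = min(candidates)  # ties fall to the shorter last word (a simplification)
        best.append(p_n)
        last_len.append(m_o)
    # Walk back through the stored m_o values to recover the word boundaries.
    words, n = [], len(syllables)
    while n > 0:
        words.append(list(syllables[n - last_len[n]:n]))
        n -= last_len[n]
    return best[-1], words[::-1]


# Hypothetical classes: a lower class means a more frequent word.
toy = {("a", "i"): 5, ("sho", "ka"): 6, ("a", "i", "sho"): 9, ("ka",): 7}
print(segment(["a", "i", "sho", "ka"], toy, 4))  # (11, [['a', 'i'], ['sho', 'ka']])
```

Each stage reuses only the earlier results `best[n - m]`, so the work per syllable is bounded by M lookups regardless of how long the phrase has grown.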

The processing needed for this input-following sequential segmentation consists of at most M frequency-class lookups and integer additions to find $P_n$ and $m_o$, plus at most M−1 two-way magnitude comparisons, and it is always the same regardless of the syllable input number, that is, the phrase length n. Optimal word-sound segmentation by this method therefore not only has a very simple algorithm, but its processing time after each syllable input is also always the same, entirely independent of the length of the syllable string being processed. This is a great practical advantage.

When processing of the n-th syllable of a sound-frequency phrase is finished, word-sound-to-kanji conversion is performed using the $m_o$ obtained together with $P_n$ by the optimal segmentation above, through the following kanji-string concatenation formula.

$$K_n = K_{n-m_o} + H_{m_o} \qquad \text{(K)}$$

This calculation appends, to the kanji string $K_{n-m_o}$ already obtained at stage $n - m_o$, the most probable kanji word $H_{m_o}$ retrieved at the current stage n from the word-sound/kanji-word dictionary, using the $m_o$-syllable word sound that ends at the n-th syllable as the key. The most probable kanji string $K_n$ is thus obtained sequentially each time n advances by one with a new syllable input.
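The way formula (K) follows the input can be sketched with a small incremental converter. This is an illustrative sketch, not the patented device: the class name `IncrementalConverter`, the romanized toy dictionary, and all frequency-class values are hypothetical. A one-syllable reading missing from the dictionary is assumed to have class 32 and to convert to itself.

```python
class IncrementalConverter:
    """Keeps P_n and K_n up to date as syllables arrive one at a time."""

    def __init__(self, dictionary, max_len):
        self.dictionary = dictionary  # reading tuple -> (frequency class, kanji word)
        self.max_len = max_len
        self.syllables = []
        self.P = [0]   # P_0 = 0
        self.K = [""]  # K_0 = empty string

    def feed(self, syllable):
        self.syllables.append(syllable)
        n = len(self.syllables)
        best = None
        for m in range(1, min(self.max_len, n) + 1):
            entry = self.dictionary.get(tuple(self.syllables[n - m:]))
            if entry is None:
                if m > 1:
                    continue             # no such word: this candidate is skipped
                entry = (32, syllable)   # assumed fallback for a bare syllable
            cls, kanji = entry
            total = self.P[n - m] + cls
            if best is None or total < best[0]:
                best = (total, self.K[n - m] + kanji)  # K_n = K_(n-m_o) + H_(m_o)
        self.P.append(best[0])
        self.K.append(best[1])
        return best[1]


toy = {("a", "i"): (5, "愛"), ("sho", "ka"): (6, "初夏"),
       ("a", "i", "sho"): (9, "愛書"), ("ka",): (7, "家")}
conv = IncrementalConverter(toy, 4)
print([conv.feed(s) for s in ["a", "i", "sho", "ka"]])
# ['a', '愛', '愛書', '愛初夏']
```

In this toy run the displayed string is revised from 愛書 to 愛初夏 when the fourth syllable arrives, because the earlier results for shorter prefixes are reused rather than discarded.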

Fig. 2 shows, for M = 4 and n from 1 to 7, Algorithm A: the sequential processing that obtains each $P_n$ from $P_1$ to $P_7$ and, proceeding at the same time, the sequential optimal segmentation and word-sound-to-kanji conversion that obtain $K_n$. This algorithm is used in Embodiment A of Fig. 5, described later. For one syllable input, it is not necessary to first prepare M values $K_{nm}$ (when n ≥ M) and then pick the best one as $K_n$ following the selection of $P_n$; as in Fig. 5, one $K_{nm}$ can be computed for each two-way selection among the $P_{nm}$. This algorithm is useful for sound-frequency processing with M = 2 in a Chinese word processor.

Fig. 3 shows, for M = 4 and n = 1 to 7, Algorithm B: sequential optimal segmentation and word-sound-to-kanji conversion in which the $m_o$ found in processing each $P_n$ from $P_1$ to $P_7$ is used so that, for one syllable input, the lookup of $H_{m_o}$ and the string concatenation of formula (K) are executed only once to obtain $K_n$. This algorithm is the simplest and clearest, and is useful for sound-frequency processing in Japanese and Korean word processors, which require M of 3 or more. Algorithm B is used in the second embodiment, shown in Fig. 7 below.

When a word sound R is not in the dictionary, and so neither its frequency class I nor its kanji word H exists, the corresponding frequency-class sum naturally does not exist either. The two-way selection concerned is then unnecessary and is skipped in the processing flow. In actual sound-frequency processing by this method, many word sounds do not exist, especially long ones, in Japanese, Chinese, and Korean alike. As to which syllable counts are statistically most frequent, one- and two-syllable word sounds are overwhelmingly common in Chinese, whereas Japanese and Korean differ from this. These language-specific circumstances should be taken into account when designing the processing software.

In every language the syllable length of a word sound (the reading of a word) has an upper limit. That limit is M. M can vary with the definition of "word" or with the design policy of the dictionary, but it is probably about 6 for Japanese, 4 for Chinese, and 4 for Korean.

When n is smaller than M, there are only n values $P_{nm}$, so the procedure for obtaining $P_n$ must be reduced accordingly. The maximum value N of n (the maximum syllable length of one sound-frequency phrase) and the maximum value of M differ by language. The rate at which this reduced procedure is used during processing probably also differs by language; for Chinese it is considerably high. This too should be considered when designing the software.

The frequency class I (a positive integer variable) and the kanji word H (a character variable) are updated each time a syllable is input. The variables actually needed for I and H are therefore only M each, $I_1$ to $I_M$ and $H_1$ to $H_M$; there is no need to keep M of them for every n as drawn in Figs. 2 and 3. For the same reason, only M variables need be reserved for the minimum frequency-class sums $P_{n1}$ to $P_{nM}$.

For the n-th syllable input, the P values needed to compute $P_{n1}$ to $P_{nM}$ are the M values $P_{n-1}$ to $P_{n-M}$. In the sequential computation of P and K, the n−M−1 older values from $P_1$ to $P_{n-M-1}$ are therefore unnecessary. Likewise, for the most probable kanji string, the data needed for sequential processing are only the M values $K_{n-1}$ to $K_{n-M}$.
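The bounded-memory property can be sketched with fixed-size buffers. This is a minimal sketch under assumptions: the function name `stream_minimum_sums`, the toy dictionary, and its frequency-class values are hypothetical, and a missing one-syllable reading is given an assumed class of 32. Only the sums are tracked here; the kanji strings could be kept in a buffer of the same size.

```python
from collections import deque


def stream_minimum_sums(syllables, freq_class, max_len):
    """Yield P_n for each input syllable while storing at most max_len earlier sums."""
    recent = deque([0], maxlen=max_len)  # recent[-m] is P_(n-m); starts as [P_0]
    window = deque(maxlen=max_len)       # the last max_len syllables
    for syllable in syllables:
        window.append(syllable)
        tail = list(window)
        candidates = []
        for m in range(1, min(len(recent), len(tail)) + 1):
            cls = freq_class.get(tuple(tail[-m:]), 32 if m == 1 else None)
            if cls is not None:
                candidates.append(recent[-m] + cls)  # P_nm = P_(n-m) + I_nm
        p_n = min(candidates)
        recent.append(p_n)  # the oldest sum drops out automatically
        yield p_n


toy = {("a", "i"): 5, ("sho", "ka"): 6, ("a", "i", "sho"): 9, ("ka",): 7}
print(list(stream_minimum_sums(["a", "i", "sho", "ka"], toy, 4)))  # [32, 5, 9, 11]
print(list(stream_minimum_sums(["a", "i", "sho", "ka"], toy, 2)))  # [32, 5, 37, 11]
```

The `deque` with `maxlen` discards sums older than $P_{n-M}$, which are exactly the values the text identifies as no longer needed.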

Japanese Patent Application Nos. 63-105030 and 63-172163, already filed by the present inventor, state with regard to a "node": "When no word sound spans the virtual point between two consecutive syllables, that point is called a node. The sound-frequency processing segment is cut at a node." When a node is found during sound-frequency processing, n is reset to its initial value (1 in the present description), and processing continues with a new processing segment beginning at the next syllable.

The definition and utility of the "breakpoint" in the above earlier applications do not hold when M > 2.

As described above, sound-frequency word-sound segmentation and kanji conversion by sequential processing is practical with a very simple algorithm even when N and M are considerably large.

3.5.3 Handling ties in frequency-class sums
During sound-frequency processing, when frequency-class sums and overall frequency-class sums are selected by two-way magnitude comparison, the two values are often equal, because the sums are positive integers. The following countermeasures are available.

(1) Set up the algorithm to select, of the two word-sound sequences, the one whose last word sound has more syllables.

(2) Set up the algorithm to select, of the two word-sound sequences, the one whose last word sound has fewer syllables.

Which of these to choose should depend on the characteristics of the target language. The embodiments described later adopt method (1).
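The two tie rules can be sketched as a single two-way selection with a switch. This is an illustrative sketch; the function name `pick` and the tuple layout `(sum, syllables_of_last_word)` are assumptions introduced here.

```python
def pick(shorter, longer, prefer_longer=True):
    """Two-way selection between candidates given as (sum, syllables_of_last_word).

    `shorter` is the candidate whose last word has fewer syllables.
    prefer_longer=True corresponds to rule (1), False to rule (2).
    """
    if longer[0] < shorter[0]:
        return longer
    if longer[0] > shorter[0]:
        return shorter
    return longer if prefer_longer else shorter  # equal sums: apply the tie rule


print(pick((10, 1), (10, 2)))                       # (10, 2)
print(pick((10, 1), (10, 2), prefer_longer=False))  # (10, 1)
print(pick((9, 1), (10, 2)))                        # (9, 1)
```

The tie rule matters only when the sums are equal; otherwise the smaller sum always wins.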

3.6 Embodiments
3.6.1 Configuration example of sequential sound-frequency segmentation and kanji conversion
Fig. 4 is a block diagram showing the configuration of an embodiment of the invention which, based on the principle described under "Operation" in 3.5, applies optimal word-sound segmentation to a "sound-frequency phrase" and converts the word sounds selected by that segmentation into kanji words. The word-sound segmentation and word-sound-to-kanji conversion method of the invention is described concretely with reference to Fig. 4.

In Fig. 4, the maximum syllable length of a word sound is M and the syllable input number n in a processing segment starts at 1; the processing for n < M and for word sounds absent from the dictionary is omitted. No consideration is given to efficient memory use or to the order of processing.

The functions of the means blocks in Fig. 4 are as follows.

1: Syllable input means
Inputs the "reading" from a keyboard or the like. It is specifically a "syllable" input means because Chinese, Japanese, and Korean, the languages addressed by the invention, all have a marked syllable structure phonologically (the reading of a Chinese character is one syllable; in Japanese one mora is one syllable; a hangul character is itself a complete syllabic character), so that even when input is in Roman letters it is efficient to store the word-sound headings R in the dictionary in units of syllables.

3: Dictionary
The subscripts of R, I, and H below indicate the number of syllables. The vocabulary of word sounds of 1 to M syllables is stored in the dictionary.

A storage means holding data such as:
(1) the word-sound headings R expressed as syllable strings (the subscript indicates the number of syllables; the vocabulary of word sounds of 1 to M syllables is stored in the dictionary);
(2) the frequency class I of each word sound;
(3) the homophonous kanji vocabulary H for each word sound (expressed in kanji codes).

2: Word-sound generation means
Each time a syllable $R_n$ is input from syllable input means 1, means 2 concatenates the syllables $R_{n-1}, R_{n-2}, \dots, R_{n-M+1}$, which were already input and are stored in 2, with $R_n$, thereby generating and storing the M word sounds $R_{n1}, R_{n2}, \dots, R_{nM}$.

4: Word-sound frequency-class and kanji data retrieval means
Means 3 retrieves the following data from the dictionary of 4, using the word sounds $R_{n1}, R_{n2}, \dots, R_{nM}$ generated by 2 as keys, and stores them.

(1) The frequency classes $I_{n1}, I_{n2}, \dots, I_{nM}$ of the word sounds $R_{n1}, R_{n2}, \dots, R_{nM}$.
(2) The kanji words $H_{n1}, H_{n2}, \dots, H_{nM}$ having the word sounds $R_{n1}, R_{n2}, \dots, R_{nM}$ (among homophones, the one currently most probable is chosen by self-learning such as a most-recently-accessed-first scheme).

5: Minimum frequency-class sum generation means
Receives the frequency classes $I_{n1}, I_{n2}, \dots, I_{nM}$ from 2 and, for m from 1 to M, obtains and stores the M minimum frequency-class sums $P_{n1}, P_{n2}, \dots, P_{nM}$ by the formula $P_{nm} = P_{n-m} + I_{nm}$.

6: Kanji string generation means
Receives the kanji words $H_{n1}, H_{n2}, \dots, H_{nM}$ from 2 and, for m from 1 to M, obtains and stores the M kanji strings $K_{n1}, K_{n2}, \dots, K_{nM}$ by the formula $K_{nm} = K_{n-m} + H_{nm}$.

7: Overall minimum frequency-class sum selection means
Receives the minimum frequency-class sums $P_{n1}, P_{n2}, \dots, P_{nM}$ from 4, selects the one with the smallest value, and stores it as $P_n$.

8: Most probable kanji string generation means
Receives the kanji strings $K_{n1}, K_{n2}, \dots, K_{nM}$ from 5 and stores the one corresponding to $P_n$ as the most probable kanji string $K_n$.

9: Kanji string display means
Displays $K_n$ on the display.

12: Kanji word-sound determination means
When a symbol or other non-kanji input is entered, displays it on the display as non-kanji and, to close the sound-frequency processing segment, resets the syllable input number n to 1 in preparation for the start of the next segment.

3.6.2 First embodiment
Fig. 5 is a flowchart of the processing procedure in a word-sound segmentation and word-sound-to-kanji conversion device (the first embodiment of the invention) in which word sounds of one, two, three, or four syllables are input syllable by syllable, and which uses the two-way-selection automatic sound-frequency segmentation and sequential kanji conversion method that is the core of the invention. The algorithm of this example is that of Fig. 2 above.

In this figure, word sounds are 1 to 4 syllables long and the syllable input number n starts at 1.

[Explanation of variables]

n: Positive integer variable.

Initial value 1. The syllable input number; it is reset to 2 when a node is detected. The reason it is reset to 2 rather than 1 is given later.

m: Positive integer variable; 1 to 4 in Fig. 5.

Combined with n as a subscript of the other variables. It is the number of syllables of the last word sound in a segmentation pattern.

R: Character variable.

The single syllable input each time.

$R_m$: Character variable; $R_1$ to $R_4$ in Fig. 5.

A word sound, the same as $R_{n1}$ to $R_{nm}$ of 3.6.1. These are not made two-index character variables such as $R_{n1}$ because they are updated at every syllable input.

$R_1$ to $R_4$ are all initialized to the non-string (nstr). Immediately after R is input, the subscript m is initialized to 4 and the calculation $R_m = R_{m-1} + R$ is repeated, decreasing m by one each time, until m reaches 2 (here $R_{m-1}$ is the value from one step earlier in n). This yields $R_4$, $R_3$, and $R_2$; finally, for m = 1, setting $R_1 = R$ gives $R_1$ to $R_4$. When m > n the calculation is skipped and only m = m − 1 is performed, so that when n is smaller than 4 the logically unnecessary $R_m$ all remain nstr. In the flowchart of Fig. 5 this processing is performed by the $R_1$ to $R_4$ generation block (1), which is shown in detail in Fig. 6.
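The update of $R_1$ to $R_4$ described above can be sketched as follows. This is a minimal sketch, not a transcription of Fig. 6; the function name `update_readings`, the 1-based list layout, and the use of an empty string for nstr are assumptions.

```python
NSTR = ""  # stands in for the non-string (nstr)


def update_readings(R, syllable, n, max_len=4):
    """Update R[1..max_len] in place after the n-th syllable is input.

    R[0] is unused so that R[m] matches the subscript in the text.
    """
    m = max_len
    while m >= 2:
        if m <= n:                       # skipped when m > n, so R[m] stays NSTR
            R[m] = R[m - 1] + syllable   # R[m - 1] still holds the previous step's value
        m -= 1
    R[1] = syllable
    return R


R = [None, NSTR, NSTR, NSTR, NSTR]
for n, s in enumerate(["a", "i", "sho", "ka"], start=1):
    update_readings(R, s, n)
    print(n, R[1:])
# 1 ['a', '', '', '']
# 2 ['i', 'ai', '', '']
# 3 ['sho', 'isho', 'aisho', '']
# 4 ['ka', 'shoka', 'ishoka', 'aishoka']
```

Counting m downward is what allows the update to be done in place: each `R[m - 1]` is read before it is overwritten.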

$I_m$: Positive integer variable; $I_1$ to $I_4$ in Fig. 5.

The frequency class of each of the word sounds $R_1$ to $R_4$. At each syllable input it is looked up in the dictionary using $R_m$ as the key; the lookup is not drawn in Fig. 5. Even when a one-syllable reading has no word sound, a word sound, I, and H are deemed to exist formally; this is called an imaginary word sound. In that case I = 32 is used so that it does not affect sound-frequency processing. I need not be defined for readings of two or more syllables that have no real word sound.

$H_m$: Kanji variable; $H_1$ to $H_4$ in Fig. 5.

Among the homophonous kanji words into which each of the word sounds $R_1$ to $R_4$ converts, the one currently most probable. At each syllable input it is looked up in the dictionary using $R_m$ as the key (omitted from the figure). For a one-syllable reading with no real word sound, the kanji word $H_1$ is left as the reading itself, that is, $H_1 = R_1$. In Japanese and Korean, when a real word sound exists and its output form is kana or hangul, $H_m = R_m$. For example, the Japanese particle 「の」 has $I_1 = 5$ and $H_1$ = "の". The imaginary word sound 「ん」 has $I_1 = 32$ and $H_1$ = "ん". The real word sound 「から」 has $I_2 = 8$ and $H_2$ = "から", with homophones $H_2$ of "殻", "空", and "唐". Whether H is kanji, kana, or a mixture of the two, in the invention every H is formally called "kanji".

$PN_m$: Positive integer variable; $PN_1$ to $PN_4$ in Fig. 5.

The minimum frequency-class sums of the word-sound sequences whose last word sound has 1 to 4 syllables, respectively. Computed by $PN_m = P_{n-m} + I_m$, that is, by adding I to an already computed P. Updated at every syllable input.

$P_A$: Positive integer variable.

$PN_1$ and $PN_2$ are compared; if $PN_2 \le PN_1$ then $P_A = PN_2$, and if $PN_2 > PN_1$ then $P_A = PN_1$. If neither $R_1$ nor $R_2$ has a word sound, $P_A$ is undetermined, but in Japanese no countermeasure is needed because readings without a word sound, such as 「ん」 and 「っ」, never occur two or more in succession. In Chinese, there are no syllables without a word sound. $P_A$ is updated at every syllable input.

$K_A$: Character variable.

In the selection of $P_A$ above, $K_A = K_{n-2} + H_2$ if $PN_2 \le PN_1$, and $K_A = K_{n-1} + H_1$ if $PN_2 > PN_1$. As already explained under "Operation" in 3.5, $K_A$ is the kanji string that ends with the kanji word H whose syllable count equals that of the last word sound of the segmentation pattern selected for $P_A$. $K_A$ is updated at every syllable input.

$P_B$: Positive integer variable.

$PN_3$ and $PN_4$ are compared; if $PN_4 \le PN_3$ then $P_B = PN_4$, and if $PN_4 > PN_3$ then $P_B = PN_3$. When neither $R_3$ nor $R_4$ has a word sound, $P_B = 255$. This is because the largest meaningful value of I is 18 and, in experience, sound-frequency processing segments 14 syllables long never occur.

$K_B$: Character variable.

In the selection of $P_B$ above, $K_B = K_{n-4} + H_4$ if $PN_4 \le PN_3$, and $K_B = K_{n-3} + H_3$ if $PN_4 > PN_3$. When $P_B = 255$, $K_B$ = nwrd.

$P_n$: Positive integer variable; $P_0$ to $P_n$ in Fig. 5.

The overall minimum frequency-class sum at that n. It is obtained by an M-way selection among the $PN_m$. In the example of Fig. 5, m = 1 to 4, so obtaining one $P_n$ requires 4 − 1 = 3 two-way selections. The step that yields $P_n$ is the third selection: if $P_B \le P_A$ then $P_n = P_B$, and if $P_B > P_A$ then $P_n = P_A$. $P_0 = 0$ throughout.

$K_n$: Character variable; $K_0$ to $K_n$ in Fig. 5.

The kanji word string given by the optimal word-sound segmentation at that n. The process of obtaining $K_n$ proceeds in parallel with each two-way selection made in obtaining $P_n$. If $P_n = P_B$ then $K_n = K_B$, and if $P_n = P_A$ then $K_n = K_A$. When both $R_2$ and $K_B$ are nwrd, $K_n$ = nwrd. This indicates that the point immediately before the currently input syllable R is a node. $K_0$ is the non-string (nstr) throughout.
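The three two-way selections that produce $P_A$/$K_A$, $P_B$/$K_B$, and $P_n$/$K_n$ can be sketched as follows. This is a sketch under assumptions rather than a transcription of Fig. 5: the function name `select_three_stage` and the dictionary layout are hypothetical, a candidate that is missing on only one side of a pair is assumed to lose to the existing one, and node handling is left out.

```python
NWRD = None   # stands in for the non-word marker (nwrd)
NO_SUM = 255  # value the text assigns to P_B when neither R_3 nor R_4 exists


def select_three_stage(PN, K_cand):
    """PN and K_cand map m (1..4) to PN_m and its candidate kanji string.

    Entries for word sounds that do not exist are simply absent.
    """
    def stage(short, long_):
        has_short, has_long = short in PN, long_ in PN
        if not has_short and not has_long:
            return NO_SUM, NWRD
        # "<=" makes ties go to the longer last word sound.
        if has_long and (not has_short or PN[long_] <= PN[short]):
            return PN[long_], K_cand[long_]
        return PN[short], K_cand[short]

    P_A, K_A = stage(1, 2)   # first selection
    P_B, K_B = stage(3, 4)   # second selection
    if P_B <= P_A:           # third selection
        return P_B, K_B
    return P_A, K_A


print(select_three_stage({1: 16, 2: 11}, {1: "愛書家", 2: "愛初夏"}))  # (11, '愛初夏')
```

In the usage line only the one- and two-syllable candidates exist, so the second selection returns the placeholder 255 and the third selection keeps $P_A$ and $K_A$.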

[Explanation of process]

Fig. 5 shows the main part of an embodiment of a sequential word-sound segmentation and word-sound-to-kanji conversion device that suits all of the following: a Japanese word processor in which readings are input in kana or Roman letters, words are segmented automatically, and the result is converted into mixed kanji-kana text; a Korean word processor in which input is in hangul, words are segmented automatically, and the result is converted into mixed kanji-hangul text; and a Chinese word processor in which input is in romanized sounds, words are segmented automatically, and the result is converted into kanji text. The maximum number of syllables of a word sound is taken to be 4. The processing flow is described below.

Immediately after the start, the syllable input number n is set to 1, the four word sounds $R_1$ to $R_4$ are initialized to the non-string (nstr), and $P_0$ and $K_0$ are initialized to 0 and nstr, respectively.

One syllable or symbol R is input.

0: The non-character processing block processes and displays any input other than a syllable that has a reading. When the "free separation" key is pressed, it returns to step […] without displaying anything. (The invention is intended from the outset to make conversion operations by pressing a conversion key unnecessary, but it does not rule out manual segmentation performed at the operator's discretion; this is called "free separation".)

（１）R₁〜R₄生成ブロックは、第６図に詳細を示す。 ₍₁₎ R 1 ~R ₄ generating block, shown in detail in Figure 6.

In the description below, variables such as the frequency class I_m and the kanji word H_m of a word sound (m = 1 to 4) appear. These are retrieved from the dictionary using the reading R_m as the headword. If R_m is found, these values are obtained; if it is not, R_m is set to nwrd (non-word). The lookup itself is omitted from Fig. 5.
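The lookup described above can be sketched as follows. This is a minimal Python illustration, not the apparatus of Fig. 5 or Fig. 6: the names `make_readings`, `look_up` and `DICTIONARY` are hypothetical, each kana is assumed to count as one syllable, and the entry for こ is invented. Only the classes 10 for こう and 15 for いど are taken from the worked example in section 3.7.2.

```python
NSTR = ""      # non-string: fewer than m syllables have been entered
NWRD = None    # non-word: the reading has no dictionary entry

# Hypothetical toy dictionary: reading -> (frequency class I, kanji word H).
DICTIONARY = {
    "こ": (12, "子"),      # invented entry, for illustration only
    "こう": (10, "高"),
    "いど": (15, "緯度"),
}


def make_readings(syllables, max_len=4):
    """Return [R_1, ..., R_max_len]: the last m syllables joined, or NSTR."""
    return [
        "".join(syllables[-m:]) if len(syllables) >= m else NSTR
        for m in range(1, max_len + 1)
    ]


def look_up(reading):
    """Return (I_m, H_m) for a reading, or NWRD when it has no entry."""
    if reading == NSTR:
        return NWRD
    return DICTIONARY.get(reading, NWRD)


readings = make_readings(["こ", "う", "い", "ど"])
entries = [look_up(r) for r in readings]
```

With this toy dictionary only R₂ (いど) is found after the fourth syllable; the other three readings come back as nwrd.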

1. First-stage two-way selection block: checks whether a one-syllable word sound and a two-syllable word sound exist at the end of the word-sound string, decides by a two-way choice which string gives the optimal segmentation, and performs kanji conversion on that string.

As noted earlier, no processing is needed when R₁ and R₂ are both absent, but Fig. 5 includes this case for convenience of explanation.

The frequency-class sum PN₁ of the word-sound string ending in a one-syllable word sound is obtained in preparation for the PN selection. When n = 1, the subscript n−1 of P is 0, which is why P_{n-1} = P₀ had to be set to 0 beforehand.

When R₂ has no word sound, R₂ is set to nwrd for use in the node determination.

The frequency-class sum PN₂ of the word-sound string ending in a two-syllable word sound is obtained.

PN₁ and PN₂ are compared and the smaller is selected. When the two are equal, PN₂ is selected (this choice has no statistical basis).

P_A and K_A are set according to the result. This fixes the word-sound string and the kanji string of the segmentation with the smaller frequency-class sum. The kanji string that has the same word boundaries as the minimum-sum segmentation is determined here for the first time.
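The first-stage selection can be sketched in Python as below. The function name `first_stage` and the representation of a dictionary hit as an `(I, H)` tuple are assumptions made for illustration; the tie rule (equal sums select PN₂) is the one stated in the text.

```python
NWRD = None  # "non-word": the reading has no dictionary entry


def first_stage(p_prev1, k_prev1, p_prev2, k_prev2, entry1, entry2):
    """Choose between a one-syllable and a two-syllable final word sound.

    p_prev1, k_prev1: P_{n-1} and K_{n-1}.
    p_prev2, k_prev2: P_{n-2} and K_{n-2}.
    entry1, entry2:   (I, H) for R_1 and R_2, or NWRD.
    Returns (P_A, K_A), or NWRD when neither reading exists.
    """
    pn1 = NWRD if entry1 is NWRD else (p_prev1 + entry1[0], k_prev1 + entry1[1])
    pn2 = NWRD if entry2 is NWRD else (p_prev2 + entry2[0], k_prev2 + entry2[1])
    if pn1 is NWRD:
        return pn2
    if pn2 is NWRD:
        return pn1
    # Strict "<" means a tie falls through to PN_2, as the text specifies.
    return pn1 if pn1[0] < pn2[0] else pn2
```

Returning the kanji string together with the sum mirrors the first embodiment, where K_A is built at the same moment P_A is chosen.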

2. Second-stage two-way selection block: checks whether a three-syllable word sound and a four-syllable word sound exist at the end of the word-sound string, decides by a two-way choice which string gives the optimal segmentation, and performs kanji conversion on the selected string.

When the three-syllable word sound R₃ exists, processing passes to the step that obtains PN₃; when it does not, processing passes to the branch for a missing R₃.

The frequency-class sum PN₃ of the word-sound string ending in a three-syllable word sound is obtained.

When the four-syllable word sound R₄ exists, processing passes to the step that obtains PN₄; when it does not, that step is bypassed.

The frequency-class sum PN₄ of the word-sound string ending in a four-syllable word sound is obtained.

PN₃ and PN₄ are compared and the smaller is selected. When the two are equal, PN₄ wins (this choice has no statistical basis).

P_B and K_B are set according to the result. This fixes the word-sound string and the kanji string of the segmentation with the smaller frequency-class sum.

The branch for a missing R₃ tests R₄ in the same way, but because there is no R₃ the two-way selection between PN₃ and PN₄ is unnecessary and processing passes straight on.

When R₃ and R₄ are both absent, P_B is set to 255 so that selection (a) of the third stage is certain to give P_n = P_A. The value 255 was chosen because, in both Japanese and Chinese, the largest statistically meaningful frequency class I is 18, while experience shows that a sound-frequency processing section, running from one node to the next, is never as long as 14 syllables (18 × 14 = 252). K_B is also set to nwrd, for use in the node determination routine.
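A small Python sketch of the second stage, including the sentinel, is given below. The names `second_stage`, `P_SENTINEL`, `MAX_CLASS` and `SECTION_BOUND` are hypothetical; the values 255, 18 and 14 and the tie rule (equal sums select PN₄) are taken from the text.

```python
NWRD = None           # "non-word" marker
P_SENTINEL = 255      # P_B when neither R_3 nor R_4 exists
MAX_CLASS = 18        # largest meaningful frequency class, per the text
SECTION_BOUND = 14    # section length that, per the text, never occurs


def second_stage(p_prev3, k_prev3, p_prev4, k_prev4, entry3, entry4):
    """Return (P_B, K_B) for a three- or four-syllable final word sound."""
    pn3 = NWRD if entry3 is NWRD else (p_prev3 + entry3[0], k_prev3 + entry3[1])
    pn4 = NWRD if entry4 is NWRD else (p_prev4 + entry4[0], k_prev4 + entry4[1])
    if pn3 is NWRD and pn4 is NWRD:
        # The sentinel guarantees that the third stage picks P_A.
        return P_SENTINEL, NWRD
    if pn3 is NWRD:
        return pn4
    if pn4 is NWRD:
        return pn3
    return pn3 if pn3[0] < pn4[0] else pn4   # a tie selects PN_4


# Any real sum over fewer than 14 syllables stays below the sentinel.
assert MAX_CLASS * SECTION_BOUND == 252 < P_SENTINEL
```

The value 255 also fits in one unsigned byte, which may be why it was preferred over a larger constant; the text itself gives only the 18 × 14 = 252 argument.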

3. Third-stage two-way selection block: compares P_A, the sum of the best string ending in a one- or two-syllable word sound chosen in the first stage, with P_B, the sum of the best string ending in a three- or four-syllable word sound chosen in the second stage, and finally selects the one with the smaller value as the string with the optimal segmentation. Its frequency-class sum P_n and kanji word string K_n are thereby determined and are stored for the input and processing of the next R.

A node is a point in the frequency-class network, lying between the frequency classes of two adjacent one-syllable word sounds, through which every path must pass. No word sound of two or more syllables can straddle a node. Nodes delimit the sound-frequency processing sections. The node determination routine examines the point between the current R and the R input one step earlier. R₂ = nwrd indicates that no two-syllable word sound straddles that point, and K_B = nwrd indicates that there is no three- or four-syllable word sound.
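The node test reduces to two checks, which can be written as a short Python predicate. The function name `is_node` and the use of `None` for nwrd are assumptions; note that, as in the text, the test looks only at word sounds that end on the current syllable.

```python
NWRD = None  # "non-word" marker


def is_node(r2_entry, k_b):
    """True when the point between the previous R and the current R is a node.

    r2_entry: dictionary entry for the two-syllable reading R_2, or NWRD.
    k_b:      K_B from the second stage, or NWRD when R_3 and R_4 are absent.
    """
    return r2_entry is NWRD and k_b is NWRD
```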

When that point is not a node, n is incremented and processing returns to the R input routine.

When that point is a node, the current R must be processed as the first syllable of the next sound-frequency phrase. In the routine that prepares for this, R₁ is set to R, R₂ to R₄ are set to nstr, and I₁ and H₁ are stored as P₁ and K₁ respectively. K₁ is then displayed as the first character, n is set to 2, and processing returns to the R input routine.
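Putting the three stages and the node restart together gives the per-syllable loop sketched below in Python. This is an illustrative reading of the flow, not the circuit of Fig. 5: `pick`, `convert` and the toy dictionary are hypothetical, the third-stage tie rule (a tie selects P_B) is an assumption because the text does not state it, and unreachable positions are tracked with `None` instead of the 255 sentinel.

```python
NWRD = None  # "non-word": no entry, or no segmentation available


def pick(a, b):
    """Two-way selection between (P, K) candidates; a tie goes to b."""
    if a is NWRD:
        return b
    if b is NWRD:
        return a
    return a if a[0] < b[0] else b


def convert(syllables, dictionary):
    """Return the kanji string chosen for each processing section."""
    sections = []
    buf, P, K = [], [0], [""]              # P_0 = 0, K_0 = nstr
    for r in syllables:
        buf.append(r)
        n = len(buf)
        # entries[m-1] is (I_m, H_m) for R_m, or NWRD.
        entries = [
            dictionary.get("".join(buf[-m:]), NWRD) if n >= m else NWRD
            for m in range(1, 5)
        ]
        # Node test: no 2-, 3- or 4-syllable word ends on the current syllable.
        if n >= 2 and all(e is NWRD for e in entries[1:]):
            done = K[n - 1]
            sections.append(done if done is not NWRD else "".join(buf[:-1]))
            buf = [r]                      # R becomes syllable 1 of a new section
            P = [0, NWRD if entries[0] is NWRD else entries[0][0]]
            K = ["", NWRD if entries[0] is NWRD else entries[0][1]]
            continue
        cand = [
            NWRD if e is NWRD or P[n - m] is NWRD
            else (P[n - m] + e[0], K[n - m] + e[1])
            for m, e in enumerate(entries, start=1)
        ]
        best = pick(pick(cand[0], cand[1]), pick(cand[2], cand[3]))
        P.append(NWRD if best is NWRD else best[0])
        K.append(NWRD if best is NWRD else best[1])
    if buf:
        sections.append(K[-1] if K[-1] is not NWRD else "".join(buf))
    return sections
```

For example, with a toy dictionary in which こう has class 10 and いど has class 15, the input こ, う, い, ど yields one section per node-delimited stretch, here 高 followed by 緯度.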

In this way, sequential word-sound segmentation and word-sound-to-kanji conversion can be carried out with every word sound of one to four syllables subject to sound-frequency processing.

In Fig. 5 the maximum syllable length of the word sounds subject to sound-frequency processing (M in Fig. 6) is 4. In theory, however, M can be made as large as desired. With M = 8, for example, the first stage of two-way selection obtains P_A from PN₁ and PN₂, the second stage P_B from PN₃ and PN₄, the third stage P_C from PN₅ and PN₆, and the fourth stage P_D from PN₇ and PN₈; P_X is then obtained from P_A and P_B, P_Y from P_C and P_D, and in the final, seventh stage P_n from P_X and P_Y. In parallel with this processing of PN and P, the P and K of each stage are obtained, arriving finally at K_n. Note that the algorithm of every stage has exactly the same form as the second and third stages of Fig. 5. The number of two-way selections is M − 1, so for large M the time needed to obtain K_n for one input R is roughly proportional to M. Even as M grows and the sound-frequency network becomes ever more complex, the processing time increases only linearly in M; this is an advantage of the sound-frequency segmentation of the invention.
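The M − 1 count can be checked with a short Python sketch of the pairwise reduction. The function name `tournament_min` is hypothetical, and applying the "tie goes to the longer word sound" rule in the later rounds is an assumption; the text states it only for the first two stages.

```python
def tournament_min(pn):
    """Reduce PN_1..PN_M by repeated two-way selection.

    Returns (minimum sum, syllable count m of the winner, selections made).
    """
    current = [(value, m) for m, value in enumerate(pn, start=1)]
    selections = 0
    while len(current) > 1:
        following = []
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            following.append(a if a[0] < b[0] else b)  # a tie keeps the longer one
            selections += 1
        if len(current) % 2:               # an odd entry advances unchanged
            following.append(current[-1])
        current = following
    value, m = current[0]
    return value, m, selections


best_sum, best_m, count = tournament_min([30, 28, 35, 28, 40, 41, 50, 60])
```

For M = 8 the rounds use 4, 2 and 1 selections, seven in all, which matches the seven stages listed in the text.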

In the sound-frequency processing of the invention, the time needed to obtain K_n for one input R is the same for every increment of n, regardless of the length of the sound-frequency phrase. This is another advantage of the invention, and it meets the requirement that reading-to-kanji conversion keep pace with each syllable as it is entered.

3.6.3 Second Embodiment

Fig. 7 shows a second embodiment. Compared with the first embodiment of Fig. 5, the variables m_A, m_B and mo (all positive integer variables) are added. Each is the number of syllables in the final word sound of the minimum-sum segmentation pattern: the subscripts A and B refer to the first-stage and second-stage two-way selections respectively, and mo is the value at the final selection.

The processing algorithm of Fig. 7 follows the algorithm of Fig. 3 described earlier, and differs from Fig. 5 in the following respects.

(1) K_A is not obtained from an expression such as K_A = K_{n-1} + H₁ immediately after the two-way selection that compares the PN values. Instead, m_A and m_B are obtained. Unlike Fig. 5, no kanji word lookup or kanji string concatenation is performed at this point.

(2) The third-stage two-way selection does not obtain K_n from P_A and P_B; it obtains only mo.

(3) Five- and six-syllable word sounds R₅ and R₆ are added, and these are converted to kanji by the longest-match method. Only the fourth-stage longest-match segmentation block, block 4 in the figure, is described next.

(a) When R₅ and R₆ both exist, mo = 6 is fixed; the six-syllable word sound prevails over all others and is converted to kanji. The kanji string becomes K_n = K_{n-6} + H₆, and the kanji word H₆ of the six-syllable word sound is fixed as the last word of the string.

(b) When R₅ exists and R₆ does not, mo = 5 and the expression becomes K_n = K_{n-5} + H₅; the kanji word H₅ of the five-syllable word sound is fixed as the last word of the kanji string K_n.

(c) When R₅ does not exist and R₆ does, the kanji word H₆ of the six-syllable word sound is fixed as the last word of the kanji string.

(d) When R₅ and R₆ are both absent, and provided no node is detected, processing passes straight through and the K_n of the third-stage two-way selection is the answer.

(4) Throughout the above processing, word-sound-to-kanji conversion is performed only in the final steps. Everything before that is integer arithmetic alone: adding frequency-class sums, comparing them, and fixing the syllable count of the final word sound. A single word-sound-to-kanji conversion is executed at the very end. This algorithm reduces the number of dictionary searches, the most time-consuming operation, to the minimum and shortens the processing time.
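The idea of deferring the kanji lookup can be sketched in Python as follows. All names (`two_way`, `choose_tail`, `convert`) and the letter-based toy data are hypothetical, the sketch relies on the premise of Fig. 7 that every single syllable has a word sound, node handling is left out, and the third-stage tie rule is an assumption. The selection works on integers only and returns mo; the kanji table is consulted once per syllable, for the winning reading.

```python
def two_way(a, b):
    """Select between (sum, m) candidates; a tie goes to b, the longer tail."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a[0] < b[0] else b


def choose_tail(P, readings, freq_class):
    """Return (P_n, mo) using integer work only.

    P holds P_0..P_{n-1}; readings[m-1] is R_m, or None when n < m.
    """
    n = len(P)

    def cand(m):
        r = readings[m - 1]
        if r is None or r not in freq_class:
            return None
        return (P[n - m] + freq_class[r], m)

    # Fourth stage: a six- or five-syllable reading wins by longest match.
    for m in (6, 5):
        if cand(m) is not None:
            return cand(m)
    return two_way(two_way(cand(1), cand(2)), two_way(cand(3), cand(4)))


def convert(syllables, freq_class, kanji):
    """Return (K_n, number of kanji lookups performed)."""
    P, K, lookups = [0], [""], 0
    for n in range(1, len(syllables) + 1):
        readings = ["".join(syllables[n - m:n]) if m <= n else None
                    for m in range(1, 7)]
        p_n, mo = choose_tail(P, readings, freq_class)
        K.append(K[n - mo] + kanji[readings[mo - 1]])  # the only kanji lookup
        lookups += 1
        P.append(p_n)
    return K[-1], lookups


# Toy data: letters stand in for syllables, bracketed capitals for kanji words.
freq_class = {"a": 3, "b": 3, "c": 3, "d": 3, "e": 3, "ab": 12, "abcde": 17}
kanji = {r: "[" + r.upper() + "]" for r in freq_class}
result, lookups = convert(list("abcde"), freq_class, kanji)
```

In this toy run the five-syllable reading is taken by longest match even though five single syllables would give a smaller sum (15 against 17), which is the behavior cases (a) to (c) describe.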

As in the second embodiment, sound-frequency segmentation and longest-match segmentation can be chained together. The significance of doing so is as follows.

(1) In general, a long reading is worth putting in the dictionary only when it is a frequently used compound word or collocation. There are not many such compounds. When one of these readings is detected in a sentence, only one word corresponds to it, and the probability that this word is correct is almost 1.

(2) Longest-match segmentation is truly effective in this case. In Japanese, for example, さんぎいん can only be 参議院, whatever particle syllable precedes it.

(3) In Japanese, sound-frequency segmentation is effective for resolving the mutual interference and overlap of one- to four-syllable word sounds on the word-sound network. The longest-match method is particularly useful for separating long word sounds.

In Fig. 7 the processing that precedes the first-stage two-way selection block is largely omitted. To keep the presentation simple, the figure is also drawn on the premise that a one-syllable word sound exists for every syllable.

3.7 Effects of the Invention

3.7.1 Effect on Chinese sentences

Chinese sentences have three characteristic features. First, word sounds are short: in business writing, one- and two-syllable words account for 97% of all syllables. Second, particular one-syllable words are used very frequently. Third, three- and four-syllable words have almost no homophones. These features indicate that the sound-frequency automatic word-sound segmentation and sequential kanji conversion of the invention are well suited to Chinese word processors. The effect of the invention is described below using an example sentence. The word-sound frequency-class statistics were taken from 現代漢語品率詞典, edited by the Beijing Language Institute (1986) (total syllables covered F_t = 1,807,405).

[Example sentence] (from 機関応用文, 案出版社, 1885, p. 83)

石油是重要的戦略物資。自従七十年代初発生石油危機以来，合理使用和節約石油，己経成為全世界普遍関注的問題。目前，我国毎年焼掉的重油和原油数量很大，占原油総産量的百分之四十左右，其中相当一部分使用不合理。為了節約能源，必須大力圧縮焼油，使石油更多地用作軽紡化工原料，並生産更好的成品油，以満足四化建設的需要。為此，発布如下指令：

(In English: Petroleum is an important strategic material. Since the oil crisis of the early 1970s, the rational use and conservation of petroleum has become a matter of worldwide concern. At present China burns a large quantity of heavy oil and crude oil every year, about forty percent of total crude oil output, and a considerable part of this use is unreasonable. To save energy, oil burning must be cut sharply so that more petroleum is used as raw material for the light, textile and chemical industries and better refined oil products are produced, meeting the needs of the Four Modernizations. To this end, the following directive is issued:)

[Romanized sounds]

Shiyou shi zhongyao de zhanle wuzhi。Zicong 七十 niandai chu fasheng shiyou weiji yilai, heli shiyong he jieyue shiyou, yijing chengwei quan shijie pubian guanzhu de wenti。Muqian, woguo meinian shaodao de zhongyou he yuanyou shuliang hen da, zhan yuanyou zongchanliang de 百 fenzhi 四十 zouyou, zhizhong xiangdang yi bufen shiyong bu heli。Weile jieyue nengyuan, bixu dali yasuo shao you, shi shiyou geng duo de yong zuo qingfang huagong yuanliao, bing shengchan geng hao de chengpin you, yi manzu sihua jianshe de xuyao。weici, fabu ruxia zhiling:

Fig. 8 shows an example of the result obtained when the romanized text above is entered and then output as kanji text by the automatic word-sound segmentation and sequential kanji conversion of the invention.

In the figure, the pattern resembling two courses of brickwork, made up of three horizontal lines and a few vertical lines that join two of them here and there, is the sound-frequency network. Fig. 9 excerpts two parts, A and B, of a row of Fig. 8 (each a sound-frequency phrase containing a three-syllable word sound). In Fig. 9 the left side is phrase A and the right side is phrase B. 1A and 1B are sound-frequency networks annotated with word sounds and frequency classes; 2A and 2B show the frequency classes only. 2A contains five one-syllable, three two-syllable and one three-syllable word sounds, and has eight segmentation patterns, (1) to (8). The bold path shows the path of each pattern, and the outlined number to the right of each network is its frequency-class sum. Of these, segmentation (8) gives the minimum frequency-class sum, 30.

Fig. 10 shows, for the five-syllable phrase A above, the progress after each syllable is input in two-way sequential word-sound segmentation and word-sound-to-kanji conversion. In the figure, (1) is the sound-frequency network annotated with frequency classes, and (2) shows how the total minimum frequency-class sum P_N is obtained step by step, by two-way selection, at each stage N. The optimal segmentation is settled together with P_N. (3) shows the kanji conversion advancing in step with the segmentation. The right-hand column shows one way of displaying kanji when each syllable is entered with two keystrokes, an initial (consonant) and a final (vowel): pressing the initial key displays the initial as a capital letter, and pressing the final key displays one kanji at that position. This display is very easy to read.

Fig. 8 is the result of segmenting and converting the example sentence above (a directive of the State Council of China) from romanized sounds back into the original kanji text, using the two-way word-sound segmentation and sequential kanji conversion described with Figs. 9 and 10. Conversion is triggered only at the points immediately before punctuation marks and Chinese numerals (numerals are keyed in directly without conversion). In the figure, a bold dashed path shows the path after a mis-segmentation has been re-segmented; there are two such places. Excluding the numerals, the text has 141 kanji (syllables). One of the errors segmented ab/c as a/bc, and the other treated a/b as ab. That whole-sentence conversion of 141 syllables yields so few segmentation errors shows that the method is considerably effective. The shaded kanji words in the figure mark homophone errors: nine kanji in total were segmented correctly but converted to the wrong homophone, which is also not many.

As shown above, the automatic word-sound segmentation and sequential word-sound-to-kanji conversion of the invention produce effective automatic segmentation for Chinese word processors that convert entered readings into kanji.

3.7.2 Effect on Japanese sentences

Figs. 11(a) to 11(c) show the result of building a frequency-class network for Japanese example sentences written entirely in kana, finding on that network the optimal segmentation path by the sound-frequency method of the invention, and segmenting in one pass each stretch of text from one punctuation mark to the next. Fig. 11 shows how accurate the sound-frequency method is on Japanese text.

Word-sound frequency statistics are indispensable for sound-frequency segmentation. Here the National Institute for Japanese Language's 中学教科書の語彙調査 (Vocabulary Survey of Junior High School Textbooks; Shuei Shuppan, 1986) was used (total syllables covered F_t = 457,845). From the vocabulary and frequencies in this source, each word sound and its frequency class were calculated and the word-sound networks in the figure were built. In the figure, the hiragana above each network is the unbroken syllable string, and the numbers inside the network are frequency classes. As an aid to understanding, Fig. 14 gives an example of the processing (s is the number of syllables in a word sound).

As Fig. 14 shows, the optimal segmentation and kanji string for this example are obtained as follows. Optimal segmentation: こう 10 / いど 15 (the frequency-class sum 10 + 15 = 25 is the minimum, so the solution is こう/いど). Kanji selection: こう緯度 → 光緯度 → 校緯度 → 高緯度. Thereafter こう is converted to 高.

The syllable string こういど of (1) is processed by the sound-frequency segmentation method.

The word sounds that can occur in (1) are the seven listed in (2). The words having each of these readings, taken from the vocabulary statistics above, are listed in (3); the number to the right of each word is its word frequency in the statistics. The syllable frequency in (4) is the word frequency multiplied by the number of syllables in the word. The frequency rate in (5) is the syllable frequency divided by the total number of syllables in the statistics, F_t = 457,845. The frequency class in (6) is the frequency rate converted to an amount of information.
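The chain from word frequency to frequency class can be sketched in Python as below. The text says only that the class is the frequency rate "converted to an amount of information"; taking that to mean the negative base-2 logarithm rounded to the nearest integer is an assumption of this sketch, as is the function name `frequency_class`. The word frequencies 7 and 224 used in the example are hypothetical values, chosen because under this assumption they give the classes 15 and 10 quoted for いど and こう.

```python
import math


def frequency_class(word_frequency, syllables_in_word, total_syllables):
    """Frequency class I of a word sound (assumed: rounded -log2 of the rate)."""
    syllable_frequency = word_frequency * syllables_in_word   # step (4)
    rate = syllable_frequency / total_syllables                # step (5)
    return round(-math.log2(rate))                             # step (6), assumed


F_T = 457_845  # total syllables in the Japanese statistics, from the text

class_ido = frequency_class(7, 2, F_T)     # hypothetical word frequency 7
class_kou = frequency_class(224, 2, F_T)   # hypothetical word frequency 224
```

Under this reading a smaller class means a more frequent word sound, so minimizing the sum of classes along a path corresponds to maximizing the product of the rates.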

For the syllable string (1), the word-sound network (9) and the corresponding frequency-class network (8) are built from the word sounds (2). The path on the network with the smallest sum of frequency classes is the optimal path, and the kinds and order of the word sounds along it give the optimal segmentation of the syllable string (1). The bold paths in (8) and (9) are the solution.
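Finding the minimum-sum path can be illustrated with a small dynamic-programming sketch in Python. The function name `best_segmentation` and most of the table are hypothetical: only the classes 10 for こう and 15 for いど come from the text, the other five entries are invented to make up seven word sounds, and ties are resolved toward the longer final word sound as an assumption.

```python
def best_segmentation(syllables, freq_class, max_len=4):
    """Return (minimum frequency-class sum, list of word sounds), or None."""
    n = len(syllables)
    INF = float("inf")
    best = [0] + [INF] * n      # best[i]: minimum sum over the first i syllables
    back = [0] * (n + 1)        # back[i]: syllable count of the final word sound
    for i in range(1, n + 1):
        for m in range(1, min(max_len, i) + 1):
            word = "".join(syllables[i - m:i])
            if word in freq_class and best[i - m] < INF:
                total = best[i - m] + freq_class[word]
                if total <= best[i]:        # a tie keeps the longer word sound
                    best[i], back[i] = total, m
    if best[n] == INF:
        return None
    words, i = [], n
    while i > 0:                # follow the back-pointers from the end
        words.append("".join(syllables[i - back[i]:i]))
        i -= back[i]
    return best[n], words[::-1]


# Only こう = 10 and いど = 15 are from the text; the rest are invented.
FREQ_CLASS = {"こ": 13, "う": 14, "い": 12, "ど": 13,
              "こう": 10, "うい": 17, "いど": 15}
answer = best_segmentation(["こ", "う", "い", "ど"], FREQ_CLASS)
```

With these values the result is the sum 25 along こう/いど, in agreement with the worked example of Fig. 14.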

Fig. 11 shows the frequency-class networks used for sound-frequency processing of the example sentences. The bold lines in the figure mark the optimal path. Across all 495 syllables of Fig. 11 there are only two mis-segmentations; bold dashed lines show the result after they are corrected. Sound-frequency segmentation can segment a syllable string however long it is, so it is inherently suited to whole-text kana-to-kanji conversion. The frequency-class networks in Fig. 11 nevertheless look quite complicated. Because Japanese word sounds tend to have many syllables, the networks are far more intricate than those for Chinese.

The sequential word-sound segmentation of the invention remains fully effective on networks this complex. Fig. 12 gives an example sentence and shows how sequential segmentation proceeds.

In Fig. 12, (1) is the syllable string R of the example sentence and (2) is its frequency-class network. A point in the network that no word sound straddles is a node; this example has six nodes. They divide the example sentence into five sound-frequency processing sections, and the syllable input number n, which starts from 1, is reset to 1 each time a node is detected. Sound-frequency processing, that is, optimal word-sound segmentation, is carried out section by section, so the syllable string of a section never grows without limit. Some sections are very short: one consists of the single syllable を.

(4) shows the process by which, following the syllable input number n, the algorithm of the invention obtains the minimum frequency-class sums P_nm in turn and then the total minimum frequency-class sum P_n. Here m is the number of syllables in the final word sound of the string. In this example the full set of minimum sums does not exist for some n. In one section, for example, the processing at n = 4 has P₄₁, P₄₂ and P₄₄ but no P₄₃, because there is no word sound んだい. Likewise, in the section at n = 2 there is P₂₂ but no P₂₁, because Japanese has no word sound ん. In Chinese, as a rule, every single syllable has a word sound; in Japanese this is not so.

Japanese has no one-syllable word sounds such as ん, っ or にゃ, nor any multi-syllable word sounds that begin with ん or っ. In a Japanese frequency-class or word-sound network, the positions these nonexistent word sounds would occupy therefore become holes. Similarly, the two-syllable word sound いみ that ends at the fifth syllable of one section is followed by ん, so いみ/ん cannot stand, which is why P₆₁ is missing at n = 6. The shaded frequency classes in Fig. 12 are existing word sounds that become invalid as a result of these gaps. The sound-frequency processing algorithm of the invention must work properly in the presence of such gaps.

The bold paths in the frequency-class networks of Figs. 11(a) to 11(c) are the optimal segmentation paths produced by the sound-frequency processing algorithm of the invention. The results obtained in Fig. 11 are summarized below to evaluate the effect of the invention on word-sound segmentation of Japanese text.

(1) Particles such as の, は, が, を, に and と are separated out with almost 100% reliability. The points immediately before and after them are nodes in a high proportion of cases.

(2) Collocations of a one-kanji word followed by a two-kanji word, which the conventional longest-match method frequently mis-segments, are segmented with ease; examples are き/きんぞく → 貴金属, そう/きんぞく/せい → 総金属製 and のう/りゅうさん → 濃硫酸. These cases are illustrated in [C] to [K] of Fig. 11(c) and in example sentence 6 of Fig. 11(b).

(3) As already noted, the accuracy of the invention in word-sound segmentation is even higher for Japanese text than for Chinese. Over the 495 syllables of the examples there are only two mis-segmentations, covering six syllables. Moreover, は/ちゅう in (1) of [B] in Fig. 11(c) was probably mistaken for はちゅう because the frequency source is a vocabulary count of junior high school social studies and science textbooks, in which the normally rare word sound はちゅう (爬虫) is ranked too high in frequency. With more general statistics this error would presumably not occur. In that case the only mis-segmentation in the whole example set is the three syllables at the start of the fourth row of example sentence 3 in (b), where きし/べ was segmented as き/しべ, and the error rate in syllable terms is only 3/495 = 0.6%. The word-sound segmentation capability of the invention can fairly be called a breakthrough.

(4) The basic idea of the present invention is to separate "reading-to-kanji conversion" into two functions: a first stage of "sequential word-sound segmentation" and a second stage of "word-sound-to-kanji conversion". The present invention concerns the first stage. "Whole-sentence batch conversion" with "no conversion key required", the ideal form of a word processor, cannot possibly be achieved unless the automatic word-sound segmentation function is excellent to begin with. The present invention fully satisfies that condition and takes a step toward the goal of an easy-to-use Japanese word processor.
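The second stage of this two-stage split can be sketched as a dictionary lookup applied to word sounds that have already been segmented. The following Python is a minimal illustration only: the function name `convert_word_sounds`, the dictionary layout, and the counts are hypothetical, and choosing the homophone with the highest count is an assumed stand-in for whatever ranking a real word-sound kanji dictionary would provide.

```python
def convert_word_sounds(word_sounds, kanji_dict):
    """Stage 2 only: map each already-segmented reading to one kanji word.

    kanji_dict maps a reading to a list of (kanji_word, count) pairs.
    Picking the highest count is an assumption made for this sketch.
    """
    converted = []
    for reading in word_sounds:
        candidates = kanji_dict.get(reading)
        if not candidates:
            # Assumption: readings missing from the dictionary stay as kana.
            converted.append(reading)
        else:
            best_word, _ = max(candidates, key=lambda pair: pair[1])
            converted.append(best_word)
    return "".join(converted)


# Hypothetical dictionary entries and counts, invented for illustration.
kanji_dict = {
    "き": [("貴", 40), ("木", 30)],
    "きんぞく": [("金属", 90), ("勤続", 20)],
}

# Stage 1 is assumed to have produced the segmentation き / きんぞく.
result = convert_word_sounds(["き", "きんぞく"], kanji_dict)
print(result)
```

Keeping stage 2 as a pure function of the segmented readings means the segmentation stage can be evaluated and improved on its own, which is the point of the functional separation.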

(5) After word-sound segmentation has been performed with close to 100% accuracy by the automatic sequential word-sound segmentation function of the present invention, each segmented word sound should be resolved to a word from among its homophones using the grammatical rules that hold between adjacent words, so as to raise the final rate of correct reading-to-kanji conversion.

(6) The word-sound segmentation system of the present invention follows each successive syllable input and, every time a syllable is entered, determines the word-sound segmentation that is most probable from an information-theoretic standpoint for the entire syllable string entered so far. The word-sound segmentation system of the present invention is therefore essentially sequential processing performed in direct and immediate response to syllable input.
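The syllable-by-syllable behavior described here can be sketched as an incremental minimum-sum search. The Python below is an illustrative sketch, not the patented apparatus: the class name `SequentialSegmenter`, the frequency-class values, and the choice to report `None` when no dictionary segmentation covers the input are all assumptions. It keeps, for each prefix length n, the best frequency-class sum and the length of the last word sound, and updates both each time a syllable arrives.

```python
import math


class SequentialSegmenter:
    """Re-segments the whole input after every syllable (illustrative sketch)."""

    def __init__(self, classes, max_len):
        self.classes = classes    # tuple of syllables -> frequency class I
        self.max_len = max_len    # M: longest word sound, in syllables
        self.syllables = []
        self.best_sum = [0]       # best_sum[n]: best class sum for first n syllables
        self.last_len = [0]       # last_len[n]: syllable count of the last word sound

    def feed(self, syllable):
        """Accept one syllable and return the current best segmentation."""
        self.syllables.append(syllable)
        n = len(self.syllables)
        best, best_m = math.inf, 0
        for m in range(1, min(n, self.max_len) + 1):
            i_nm = self.classes.get(tuple(self.syllables[n - m:]))
            if i_nm is None:
                continue          # no word sound of length m ends here
            total = self.best_sum[n - m] + i_nm
            if total < best:
                best, best_m = total, m
        self.best_sum.append(best)
        self.last_len.append(best_m)
        return self.segmentation()

    def segmentation(self):
        """Trace back from the end; None if no segmentation exists (assumption)."""
        n = len(self.syllables)
        if math.isinf(self.best_sum[n]):
            return None
        words = []
        while n > 0:
            m = self.last_len[n]
            words.append("".join(self.syllables[n - m:n]))
            n -= m
        return words[::-1]


# Hypothetical frequency classes, invented for illustration only.
classes = {
    ("ki",): 10,
    ("ki", "n", "zo", "ku"): 9,
    ("ki", "ki", "n"): 12,
    ("zo", "ku"): 12,
}
segmenter = SequentialSegmenter(classes, max_len=4)
history = [segmenter.feed(s) for s in ["ki", "ki", "n", "zo", "ku"]]
print(history[-1])
```

With these invented values the segmentation changes as input proceeds: after three syllables the best reading is the single word sound "kikin", and only after the fifth syllable does "ki / kinzoku" win. This illustrates why the best segmentation has to be re-evaluated at every syllable rather than fixed greedily.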

3.7.3 Effect on Korean mixed kanji-Hangul text
As already stated in section 3.2.3 above, Korean and Japanese are both agglutinative languages in linguistic terms and closely resemble each other in word order, grammar, the large number of words of Chinese origin, and so on. The present inventor has not examined the effect of the present invention on Korean sentences in as much detail as for Japanese and Chinese, but from what is common knowledge in linguistics, the present invention is naturally also effective for Korean word processors that handle mixed kanji-Hangul text.

3.7.4 Effects of the present invention on input processing for languages that use kanji
(1) The languages that currently use kanji in their written form are Japanese, Chinese, and Korean. Sentences in these languages use kanji partly or entirely. Because kanji are ideographic characters, their use creates a gap between "reading" and "notation", since homophones are unavoidable among kanji words.

(2) In Japanese, Chinese, and Korean alike, readings happen to be written in characters whose unit is the syllable. In Chinese, each kanji is read as one syllable. In Korean, each Hangul character is read as one syllable, and the Sino-Korean reading of each kanji is also one syllable. In Japanese, kanji readings can be written in kana, one character per syllable. The three languages therefore share the property that input is best done in units of syllables.

(3) When text in any of the three languages is entered as syllable-by-syllable readings, it does not become acceptable written text unless reading-to-kanji conversion is performed. For that purpose the sentence is divided into units of some kind, and reading-to-kanji conversion is performed unit by unit. In Japanese and Korean, that unit is the phrase (bunsetsu). In both languages a phrase, as written, generally begins with kanji and ends with kana or Hangul. In Korean Hangul text, a space is placed after each phrase. What is required, therefore, is automatic segmentation of kanji words within a phrase. Chinese has no concept of the phrase. Nor is there any custom of writing with spaces between words, so when Chinese writers are asked to write with words separated, it is difficult for them to divide the text consistently. Chinese reading input therefore requires a technique that performs word segmentation and reading-to-kanji conversion automatically over the whole sentence at once. If a whole-sentence word segmentation and kanji conversion technique that is effective for Chinese is invented, that technique should naturally also be effective for reading-input kanji conversion in Japanese and Korean.

(4) The "law of minimum linguistic information entropy" on which the present invention is based can be stated concretely as follows: "a sentence is constructed so that the sum of the information content of the readings of the words in it takes its minimum value." The reason the law holds is that "when a sentence of a given length is written, what is required of the information conveyed by each part of the sentence is not a large amount of information but a small amount, that is, clarity." This requirement should be the same for written and spoken sentences.

(5) The "sound-frequency word-sound segmentation method" at the core of the present invention derives the information content of a word from the statistical frequency of its reading. The definition of the frequency class of a word's reading, I = int(−log₂ p), shows that the frequency class I is itself the information content. The fact that the word-sound sequence minimizing ΣI over a sentence gives the optimal word-sound segmentation of that sentence has been verified empirically on many example sentences in Chinese and Japanese. The practical effect of the present invention follows from the correctness of this theory.
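As a small worked illustration of the frequency class and of comparing ΣI between two candidate segmentations, the sketch below uses the frequency rate p = (f × s) / F_t given in the claims, where f is the occurrence count of a word sound, s its syllable length, and F_t the total syllable count in the statistics. The function names and all counts are hypothetical, invented only to make the arithmetic concrete.

```python
import math


def frequency_class(f, s, total_syllables):
    """Frequency class I = int(-log2 p), with frequency rate p = (f * s) / F_t."""
    p = (f * s) / total_syllables
    return int(-math.log2(p))


def class_sum(word_sounds, classes):
    """Sum of frequency classes (the quantity written as ΣI) for one segmentation."""
    return sum(classes[w] for w in word_sounds)


# Hypothetical statistics: reading -> (occurrence count f, syllable length s).
F_T = 1_000_000
stats = {
    "ki": (900, 1),
    "kinzoku": (400, 4),
    "kikin": (50, 3),
    "zoku": (120, 2),
}
classes = {w: frequency_class(f, s, F_T) for w, (f, s) in stats.items()}

# Two ways to segment the five syllables ki-ki-n-zo-ku.
candidates = [["ki", "kinzoku"], ["kikin", "zoku"]]
sums = [class_sum(c, classes) for c in candidates]
best = candidates[sums.index(min(sums))]
print(classes, sums, best)
```

With these invented counts the classes are 10, 9, 12, and 12, so "ki / kinzoku" has ΣI = 19 against 24 for "kikin / zoku" and is selected as the segmentation carrying less information.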

(6) When the sound-frequency processing section is long and the word-sound network is complex, there is a concern that, however valid the theory of the sound-frequency method, the amount of processing needed to find the optimal word-sound segmentation becomes so large and complex that the method loses practicality. The technique that characterizes the present invention, obtaining the minimum frequency-class sum and the optimal word-sound segmentation sequentially in step with syllable input, is unaffected by the length of the sound-frequency processing section. Even if the upper limit on word-sound syllable length is large, the increase in the time required for automatic word-sound segmentation is not significant.
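The sequential procedure can be summarized as a recurrence, written here in the notation of the claims (P_n for the optimal frequency-class sum over the first n syllables, I_nm for the frequency class of the m-syllable word sound ending at syllable n, M for the maximum word-sound length). The convention P_0 = 0 for the empty prefix, and treating a reading absent from the dictionary as I_nm = ∞, are assumptions added for compactness.

```latex
P_0 = 0, \qquad
P_n = \min_{1 \le m \le \min(n,\,M)} \bigl( P_{n-m} + I_{nm} \bigr),
\qquad n = 1, 2, \dots, N
```

Each new syllable therefore needs at most M dictionary lookups and M additions, and uses only the stored values P_{n-M}, ..., P_{n-1}. The work per syllable does not grow with n, and the total for an N-syllable section is at most N × M steps.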

(7) For Japanese word processors, whole-sentence batch kanji conversion has long been sought as a way of making operation accessible to the general public, but a truly practical and accurate technique has not yet been developed. Automatic word-sound segmentation by the sound-frequency method of the present invention makes whole-sentence batch kanji conversion possible at a practical level and, at the same time, makes possible kanji conversion that immediately follows syllable input.

(8) The functional requirement placed on a Chinese word processor is the demanding one of automatic word segmentation. The present invention meets that requirement.

(9) For Korean word processors, the present invention is considered effective for automatically separating kanji from Hangul within the phrases of Korean text, which is already written with spaces between phrases, and for kanji conversion of long kanji strings.

[Brief description of the drawings]

FIG. 1 is a diagram showing the tree structure of word-sound segmentation patterns containing word sounds of 1 to N syllables.
FIG. 2 is a diagram showing word-sound-to-kanji conversion processing executed in synchronization with the sequential processing that obtains the minimum frequency-class sum and the optimal segmentation pattern.
FIG. 3 is a diagram showing word-sound-to-kanji conversion processing based on mo, performed after the minimum frequency-class sum and the syllable count mo of the final word sound of the optimal segmentation have been obtained.
FIG. 4 is a block diagram showing the sound-frequency word-sound segmentation and kanji conversion apparatus.
FIG. 5 is a diagram showing a first embodiment of the sequential word-sound segmentation and word-sound-to-kanji conversion apparatus.
FIG. 6 is a diagram showing details of the generation of word-sound readings R_m and of node processing.
FIG. 7 is a diagram showing a second embodiment of the sequential word-sound segmentation and word-sound-to-kanji conversion apparatus.
FIG. 8 is a diagram showing an example of automatic word-sound segmentation and sequential kanji conversion of a Chinese sentence by the sound-frequency method.
FIG. 9 is a diagram showing the optimal word-sound segmentation of a word-sound network.
FIG. 10 is a diagram showing an example of two-way-choice sequential word-sound segmentation and kanji conversion.
FIG. 11 is a diagram showing examples of word-sound segmentation of Japanese sentences by the sound-frequency method.
FIG. 12 is a diagram illustrating the progress of sequential word-sound segmentation of a Japanese sentence by the sound-frequency method.
FIG. 13 is a diagram illustrating the correspondence between Korean and Japanese.
FIG. 14 is a diagram showing examples of word-sound segmentation obtained by applying the present invention to Japanese.

Claims

(57) [Claims]

A syllable-input sequential word-sound segmentation and sequential kanji conversion system for sentences of a language that uses kanji, in which a syllable string, entered one syllable at a time using phonetic characters, is segmented into word sounds, the optimal word-sound sequence is determined sequentially, and each word sound in that sequence is converted to kanji in turn to obtain the most probable kanji-word string, wherein:

the reading of a single word is called a word sound; where f is the statistical frequency of occurrence of each word sound in sentences of the language, s is the syllable length of the word sound, and F_t is the total number of syllables of all word sounds in the word-sound statistics, the frequency rate p of each word sound is p = (f × s) / F_t, and the frequency class I of each word sound is the integer I = int(−log_a p) with a = 2;

when a continuous word-sound sequence in a sentence of the language is entered, the syllable string from the first syllable at the head of the syllable string to the n-th, most recently entered syllable is called the sound-frequency processing section, this section being the object of the sequential word-sound segmentation and sequential kanji conversion processing at the present moment, and the sum of the frequency classes of the word sounds is called the frequency-class sum;

the system comprising:

a word-sound frequency-class dictionary that has as headings the statistically significant word sounds of the language with lengths of 1 to M syllables (M being an integer of 2 or more) and stores the frequency class of each word sound;

a word-sound kanji-word dictionary that has each word sound as a heading and stores, in the form of kanji character strings, the homophonous kanji words read as that word sound;

word-sound frequency-class search means which, each time a syllable is entered, searches the word-sound frequency-class dictionary for each kind of word sound and sends each word sound found and its frequency class to the minimum frequency-class sum sequential calculation means described next;

minimum frequency-class sum sequential calculation means for which the syllable input number within the sound-frequency processing section is n (= 1, 2, 3, ..., N); the readings of the M word sounds of lengths 1 to M syllables that end with the n-th, most recently entered syllable are R_n1, R_n2, R_n3, ..., R_nM, and their frequency classes are I_n1, I_n2, I_n3, ..., I_nM, respectively; the word-sound segmentation patterns possible for the syllable string of length n syllables form M sets according to the syllable length m (= 1, 2, ..., M) of the final word sound, whose readings are R_n1, R_n2, R_n3, ..., R_nM respectively; the minimum frequency-class sums of the M sets are P_n1, P_n2, P_n3, ..., P_nM; and the smallest value among P_n1, P_n2, P_n3, ..., P_nM is the optimal frequency-class sum P_n; and which, in obtaining P_n as n increases by 1 with each successive syllable input, operates as follows: when 1 ≤ n ≤ M, for n = 1, P_1 is calculated by P_nm = I_11 = P_1; for 1 < n ≤ M, for the (n−1) values of m with m ≤ n−1, (n−1) values of P_nm are calculated by P_nm = P_{n−m} + I_nm, and for the single value m = n, one P_nm is calculated by P_nm = P_nn = I_nm, so that in total the n minimum frequency-class sums P_n1, P_n2, ..., P_nn are obtained; whereas when M < n, for the M values of m with 1 ≤ m ≤ M, M minimum frequency-class sums are calculated by P_nm = P_{n−m} + I_nm, so that P_n1, P_n2, ..., P_nM are obtained; that is, the minimum frequency-class sum of the word-sound segmentation patterns of each of the n or M sets is calculated by adding each of the n or M frequency classes I_nm found by the current search to the corresponding sum P_{n−m} already obtained and stored by the processing immediately after the syllable input m syllables before the current syllable input number n;

optimal frequency-class sum segmentation pattern selection means which, each time one syllable is entered, receives the n or M minimum frequency-class sums obtained by the minimum frequency-class sum sequential calculation means, compares their magnitudes to select one out of the n or M, takes the smallest value P_nm as the optimal frequency-class sum P_n, stores the value of P_n, and at the same time obtains the final word sound R_nmo of the unique word-sound sequence having that frequency-class sum together with the syllable count mo of that word sound;

most probable kanji word search and conversion means which receives the word sound R_nmo determined by the optimal frequency-class sum segmentation pattern selection means and, using R_nmo as the heading, reads out from the word-sound kanji-word dictionary the most probable kanji word H_nmo among the homophonous kanji words of mo syllables ending with the currently entered syllable, and sends it to the most probable kanji string calculation means described next; and

most probable kanji string calculation means which receives mo obtained by the optimal frequency-class sum segmentation pattern selection means and the kanji word H_nmo obtained by the most probable kanji word search and conversion means and, each time n increases by 1 with a syllable input, obtains the current most probable kanji string K_nmo by the string concatenation K_nmo = K_{n−mo} + H_nmo and outputs it;

whereby, in summary, where the language has word sounds of 1 to M syllables, in a sentence of N syllables, with the syllable number n starting from 1 and the syllables entered sequentially, each time n increases by 1: the up to M (n when n ≤ M) final word sounds of m syllables that may be present at the end of the sentence are obtained; to each of the frequency-class sums of the up to M optimal word-sound sequences running from the beginning of the sentence to the syllable immediately before each final word sound, the frequency class of the corresponding final word sound is added, yielding the frequency-class sums of up to M word-sound sequences of length n syllables; the unique word-sound sequence having the smallest of these sums is taken as the word-sound sequence of the current optimal segmentation; the final word sound of that sequence is determined as the current optimal word sound and converted to kanji, the result being the final kanji string; a new kanji string is obtained by appending the final kanji string to the converted kanji string for the already known most probable word-sound sequence preceding that word sound; and a new kanji string is thus output sequentially each time n increases by 1.