JPS58166435A

JPS58166435A - Japanese syllabary to Chinese character conversion system using probability matrix

Info

Publication number: JPS58166435A
Application number: JP57050434A
Authority: JP
Inventors: Kenji Sugiyama (杉山 健司)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-03-29
Filing date: 1982-03-29
Publication date: 1983-10-01

Abstract

PURPOSE: To express word-to-word connection information in detail and to achieve high-performance kana (Japanese syllabary)-kanji (Chinese character) conversion, by providing probability information holding means that indicates the probability of connection between words and extracting a specific word on the basis of the inter-word connection probability obtained from said means.
CONSTITUTION: A candidate extraction device 6 takes words, one by one, from a candidate stack 5 and sets them in a candidate extraction buffer 7. A probability search device 8 then accesses a connection probability matrix 9 on the basis of the part of speech of the word set in a previous word buffer 2 and that of the word in said buffer 7, and sets the connection probability in a probability buffer 10. An evaluation value calculation device 11, meanwhile, sets the evaluation value of the word in an evaluation value buffer 12 on the basis of the data set in the candidate extraction buffer 7. A multiplication unit 13 multiplies the value set in the probability buffer 10 by that in the evaluation value buffer 12 and sets the product in a priority number buffer 14.

Description

DETAILED DESCRIPTION OF THE INVENTION

(1) Technical Field of the Invention

The present invention relates to a kana-kanji conversion system that can automatically produce text in which kanji and kana characters are mixed, and in particular to a kana-kanji conversion system that achieves high-performance conversion by expressing the connection information between words in greater detail.

(2) Prior Art and Problems

For example, as shown in FIG. 1(a), when "ナイ" (nai) follows the word "会社" (kaisha, "company"), the conversion "会社内" ("inside the company") is what is required. In this case, the conversion is carried out using the inter-word connection information shown in FIG. 1(b), which states, for example, that a noun can connect to a noun and that a noun cannot connect to a verb.

Here "ナイ" can be converted either to the negative auxiliary verb "ない" or to the noun "内", so the inter-word connection information alone cannot decide between "会社ない" and "会社内". In such a case, frequency information is used next. Because "ない" is used more frequently than "内" in Japanese text, the undesired conversion "会社ない" is produced, which is a problem.

This point is explained further below.

The inter-word connection information used in conventional kana-kanji conversion devices is a possibility bit matrix that expresses which kinds of words can follow a given kind of word. Under this representation, a connection that normally occurs only rarely and a connection that occurs often are both expressed with a single bit, so both appear as "1". For example, in most cases a noun does not connect to the negative auxiliary verb "ない", but certain special nouns do. Thus "会社" (noun) does not connect to "ない", whereas "仕方" (noun) does connect to "ない".

A problem arises when such a non-uniform relationship is expressed with possibility bits.

Suppose that "カイシャナイ" (kaishanai) is input to the kana-kanji conversion device in order to produce the text "会社内".

The kana-kanji conversion device uses the inter-word connection information (connection possibility matrix) shown in FIG. 1(b) and performs a tree search as shown in FIG. 2.

In this kana-kanji conversion, at level 1 the dictionary is first searched for words whose reading partially matches the left end of the input ("会社", "貝", and so on). Then, assuming that the word considered most likely among these, "会社", is correct, the remaining input string "ナイ" is analyzed at level 2.
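The level-1 lookup described above can be sketched as a prefix match against a dictionary keyed by reading. This is only an illustration: the toy dictionary, its contents and the function name `prefix_candidates` are assumptions, since the description does not specify how the dictionary is organized.

```python
# Hypothetical toy dictionary: kana reading -> list of surface forms.
TOY_DICTIONARY = {
    "カイシャ": ["会社"],
    "カイ": ["貝"],
    "ナイ": ["ない", "内"],
}


def prefix_candidates(kana_input, dictionary):
    """Return (reading, surface) pairs whose reading matches the left end of the input."""
    matches = []
    for reading, surfaces in dictionary.items():
        if kana_input.startswith(reading):
            matches.extend((reading, surface) for surface in surfaces)
    # Longer readings first, mirroring a longest-match preference.
    matches.sort(key=lambda pair: len(pair[0]), reverse=True)
    return matches


level1 = prefix_candidates("カイシャナイ", TOY_DICTIONARY)
best_reading, best_surface = level1[0]
# The part of the input left over for the level-2 analysis.
remaining = "カイシャナイ"[len(best_reading):]
```

Here `remaining` is the string that would be handed to level 2 once "会社" has been assumed correct.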

For this "ナイ" as well, "ない", "内", and so on are retrieved from the dictionary. The most likely of these words is then selected. Which is most likely is decided by the longest-match method, which gives priority to the longest reading, or by a method that takes both frequency and reading length into account (for example, the maximum likelihood evaluation method, evaluated as frequency × 32 × [illegible]; patent application filed). Whichever method is used, however, "ない" and "内" have the same reading length, so their priority is decided by frequency. Considering how often "ない" and "内" appear in Japanese text, "ない" has the higher frequency and is therefore given priority. The connection between parts of speech is then checked against the inter-word connection information shown in FIG. 1(b), but because the matrix states that a noun and an auxiliary verb can connect, the check passes, and the result is the unnatural conversion "会社ない".
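The failure mode just described can be reproduced with a small sketch of the conventional selection. Everything concrete here is an assumption for illustration: the matrix entries, the frequency figures and the function name are hypothetical. The sketch also filters by the connection bit first and then takes the most frequent candidate, which gives the same outcome as selecting first and checking afterwards in this example.

```python
# Hypothetical possibility bit matrix: 1 means "may follow", 0 means "may not".
CAN_FOLLOW = {
    ("noun", "noun"): 1,
    ("noun", "auxiliary verb"): 1,  # set because a few nouns such as 仕方 take ない
    ("noun", "verb"): 0,
}

# Hypothetical candidates for the reading ナイ: (surface, part of speech, frequency).
CANDIDATES = [
    ("ない", "auxiliary verb", 900),
    ("内", "noun", 300),
]


def select_with_bit_matrix(previous_pos, candidates, can_follow):
    """Pick the most frequent candidate whose connection bit is 1."""
    allowed = [c for c in candidates if can_follow.get((previous_pos, c[1]), 0) == 1]
    return max(allowed, key=lambda c: c[2])


chosen = select_with_bit_matrix("noun", CANDIDATES, CAN_FOLLOW)
```

Because both connections carry the same bit, the matrix cannot distinguish them and frequency alone decides, so "ない" is chosen after "会社".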

(3) Object of the Invention

The object of the present invention is to provide a kana-kanji conversion system using a probability matrix that remedies the problem described above, namely that information about frequently occurring word sequences is rendered ineffective by word sequences that occur only rarely.

(4) Constitution of the Invention

To achieve this object, the kana-kanji conversion system using a probability matrix of the present invention is characterized in that, in a kana-kanji conversion device that has a word file and automatically converts Japanese input in katakana or kana into mixed kanji-kana text using inter-word connection information, probability information holding means indicating the probability of connection between words is provided, and a specific word is extracted on the basis of the inter-word connection probability obtained from this probability information holding means.

(5) Embodiment of the Invention

Before the present invention is described in detail on the basis of an embodiment, its principle is briefly explained with reference to FIG. 3.

In the present invention, the inter-word connection information is expressed as a probability matrix, as shown in FIG. 3(a), and the connection probabilities given in this matrix are used to determine the priority of word selection. In this probability matrix, the connection state for each part of speech is expressed so that the entries in each row sum to 1. For example, the entry for a verb following a case particle gives the share of verbs among all parts of speech that follow a case particle (the specific fraction is illegible in the source). These probabilities can be obtained statistically from Japanese text.
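One way such a row-normalized matrix might be obtained from text is to count adjacent part-of-speech pairs and divide each count by its row total. The following is a minimal sketch under that assumption; the function name, the tag labels and the sample data are hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict


def build_connection_matrix(tag_sequences):
    """Estimate P(next part of speech | previous part of speech) from tagged text."""
    pair_counts = defaultdict(Counter)
    for tags in tag_sequences:
        for previous, following in zip(tags, tags[1:]):
            pair_counts[previous][following] += 1
    matrix = {}
    for previous, counts in pair_counts.items():
        total = sum(counts.values())
        # Dividing by the row total makes every row sum to 1.
        matrix[previous] = {pos: n / total for pos, n in counts.items()}
    return matrix


# Hypothetical tagged sentences, each a list of part-of-speech labels.
SAMPLE = [
    ["noun", "case particle", "verb"],
    ["noun", "case particle", "noun", "case particle", "verb"],
    ["noun", "case particle", "adjective"],
]

matrix = build_connection_matrix(SAMPLE)
```

In this toy sample a case particle is followed by a verb in two of four cases, so `matrix["case particle"]["verb"]` is 0.5 and the "case particle" row sums to 1.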

Accordingly, when "カイシャナイ" is input and "会社" has been converted first, the matrix gives the probability that "内" follows, treating "内" as a suffix, and the probability that an auxiliary verb follows (both fractions are illegible in the source).

The evaluation values of "内" and "ない", F(l_内, h_内) and F(l_ない, h_ない), are therefore obtained separately by the longest-match method or the maximum likelihood evaluation method, each is multiplied by the corresponding connection probability, and whichever candidate has the larger resulting priority number is output as the selection result.

In this case, even though F(l_ない, h_ない) > F(l_内, h_内), the priorities are obtained as p_内 × F(l_内, h_内) and p_ない × F(l_ない, h_ない), where p_内 and p_ない denote the respective connection probabilities, so the order may be reversed. (The example figures at this point are garbled in the source: the F values read 1000 for "ない" and 5000 for "内", which does not match the stated inequality, and the probability coefficients are illegible.)
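The condition for this reversal can be stated explicitly. Writing p_内 and p_ない for the connection probabilities of the two candidates (notation introduced here for illustration, not taken from the source) and assuming all quantities are positive, "内" is selected exactly when the ratio of the connection probabilities exceeds the inverse ratio of the evaluation values:

```latex
\[
  p_{\text{内}}\, F(l_{\text{内}}, h_{\text{内}}) \;>\; p_{\text{ない}}\, F(l_{\text{ない}}, h_{\text{ない}})
  \quad\Longleftrightarrow\quad
  \frac{p_{\text{内}}}{p_{\text{ない}}} \;>\; \frac{F(l_{\text{ない}}, h_{\text{ない}})}{F(l_{\text{内}}, h_{\text{内}})}
\]
```

So a candidate with a lower evaluation value can still win if its connection probability after the preceding word is proportionally higher.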

An embodiment of the present invention will now be described with reference to FIG. 4.

FIG. 4(a) shows the configuration up to the generation of the priority numbers, and FIG. 4(b) shows how the priority-ordered candidate list is obtained in the priority order buffer on the basis of the priority numbers.

In the figures, 1 is an input buffer, 2 is a previous word buffer, 3 is a dictionary, 4 is a search device, 5 is a candidate stack, 6 is a candidate extraction device, 7 is a candidate extraction buffer, 8 is a probability search device, 9 is a connection probability matrix, 10 is a probability buffer, 11 is an evaluation value calculation device, 12 is an evaluation value buffer, 13 is a multiplication unit, 14 is a priority number buffer, 15 is a sorting device, 16 is a priority order buffer, and P is a priority number calculation device.

The input buffer 1 holds the kana input to be converted. The previous word buffer 2 holds the word that was converted earlier from the kana input; in the case of "カイシャナイ", "会社" is set in it.

The dictionary 3 is a file in which the various words are stored. Each entry in the dictionary 3 also stores the kana reading, kanji, frequency, part of speech, and so on.

The search device 4 searches the dictionary 3 using the kana input set in the input buffer 1 and sets the search results in the candidate stack 5.

The candidate extraction device 6 takes the search results out of the candidate stack 5 one after another and sets each in the candidate extraction buffer 7.

The probability search device 8 accesses the connection probability matrix 9 using the part of speech of the word set in the previous word buffer 2 and the part of speech of the word set in the candidate extraction buffer 7, and reads out the connection probability between those parts of speech. The probability read out is set in the probability buffer 10. The connection probability matrix 9 is configured as shown in FIG. 3(a).

The evaluation value calculation device 11 calculates the evaluation value of the word set in the candidate extraction buffer 7. The calculation may use the known longest-match evaluation method or the maximum likelihood evaluation method mentioned above. The resulting evaluation value is output to the evaluation value buffer 12.

The multiplication unit 13 multiplies the evaluation value set in the evaluation value buffer 12 by the probability set in the probability buffer 10 to obtain the priority number of each word. The resulting priority number is set in the priority number buffer 14.

The sorting device 15, on the basis of the priority numbers set in the priority number buffer 14, sets the candidate words in the priority order buffer 16 in descending order of priority number, excluding those whose priority number is 0, and thereby produces the candidate list in output priority order.

The priority number calculation device P obtains the priority number for each word and comprises the probability search device 8, the connection probability matrix 9, the probability buffer 10, the evaluation value calculation device 11, the evaluation value buffer 12, the multiplication unit 13, and so on.

Next, the operation of the device shown in FIGS. 4(a) and 4(b) will be described.

The case described here is one in which "カイシャナイ" has been input, "カイシャ" has already been converted to "会社" and set in the previous word buffer 2, and "ナイ" is now to be converted.

① The search device 4 searches the dictionary 3 for words matching the character string "ナイ" in the input buffer 1 and enters the results in the candidate stack 5.

② The candidate extraction device 6 takes words out of the candidate stack 5 one at a time and sets each in the candidate extraction buffer 7.

③ The probability search device 8 then accesses the connection probability matrix 9 using the part of speech of the word set in the previous word buffer 2 (a noun in this case) and the part of speech of the word set in the candidate extraction buffer 7 (a noun in the case of FIG. 4(a)), reads out the connection probability, and sets it in the probability buffer 10.

④ Meanwhile, the evaluation value calculation device 11 calculates the evaluation value of the word on the basis of the data set in the candidate extraction buffer 7 and sets the resulting evaluation value (say, 32) in the evaluation value buffer 12.

⑤ The multiplication unit 13 multiplies the probability set in the probability buffer 10 by the 32 set in the evaluation value buffer 12 to obtain 1.6, which implies a probability of 1/20 in this example, and sets the result in the priority number buffer 14.

⑥ This calculation of the priority number is performed for every candidate word output to the candidate stack 5. As a result, as shown in FIG. 4(b), the sorting device 15 uses the priority numbers set in the priority number buffer 14 to arrange the candidates in the priority order buffer 16 in descending order of priority number, excluding those whose priority number is 0, and so obtains the priority-ordered candidate list. In this way, as shown in FIG. 4(b), "内" can be taken out as the top-priority conversion output candidate.
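The operation steps above can be sketched end to end as follows. This is an illustrative sketch, not the patented implementation: the evaluation value 32 and the priority number 1.6 for the selected candidate follow the description, while the class and function names, the part-of-speech labels, the other probabilities, the evaluation value for "ない" and the dummy zero-probability entry are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    surface: str
    pos: str
    evaluation: float  # value from the evaluation step (method left open here)


# Hypothetical connection probabilities P(candidate POS | previous POS).
CONNECTION_PROBABILITY = {
    ("noun", "suffix"): 0.05,
    ("noun", "auxiliary verb"): 0.01,
    ("noun", "verb"): 0.0,
}


def rank_candidates(previous_pos, candidate_stack, probabilities):
    """Multiply evaluation value by connection probability, drop zeros, sort descending."""
    scored = []
    for entry in candidate_stack:
        probability = probabilities.get((previous_pos, entry.pos), 0.0)
        priority = probability * entry.evaluation
        if priority > 0:  # candidates with priority number 0 are excluded
            scored.append((priority, entry.surface))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


stack = [
    Entry("ない", "auxiliary verb", 64.0),  # assumed higher evaluation value
    Entry("内", "suffix", 32.0),
    Entry("(dummy)", "verb", 1.0),  # hypothetical entry with zero probability
]
ranking = rank_candidates("noun", stack, CONNECTION_PROBABILITY)
```

With these assumed figures "内" comes first even though "ない" has the higher evaluation value, and the zero-probability entry is dropped from the list.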

Through this processing, the priority of "内" and "ない", which could not be handled with a fixed evaluation, can be reversed from what it was before, and practical kana-kanji conversion can thus be performed.

(6) Effect of the Invention

According to the present invention, because the inter-word connection information is constituted as a probability matrix, the priority of words, which tends to be fixed, can be varied flexibly according to the kind of word that has already appeared, which contributes to a higher accuracy rate and a higher speed of kana-kanji conversion.

[Brief explanation of the drawing]

FIGS. 1 and 2 are explanatory diagrams of a conventional conversion system using inter-word connection information, and FIGS. 3 and 4 are configuration diagrams of an embodiment of the present invention. In the figures, 1 is an input buffer, 2 is a previous word buffer, 3 is a dictionary, 4 is a search device, 5 is a candidate stack, 6 is a candidate extraction device, 7 is a candidate extraction buffer, 8 is a probability search device, 9 is a connection probability matrix, 10 is a probability buffer, 11 is an evaluation value calculation device, 12 is an evaluation value buffer, 13 is a multiplication unit, 14 is a priority number buffer, 15 is a sorting device, 16 is a priority order buffer, and P is a priority number calculation device.

Patent applicant: Fujitsu Ltd. Agent: Patent attorney Yamatani

Claims

[Claims]

(1) A kana-kanji conversion system using a probability matrix, characterized in that, in a kana-kanji conversion device that has a word file and automatically converts Japanese input in katakana or kana into mixed kanji-kana text using connection information between words, probability information holding means indicating the probability of connection between words is provided, and a specific word is extracted on the basis of the inter-word connection probability obtained from the probability information holding means.