JPS61221868A

JPS61221868A - Kana-to-kanji converting system

Info

Publication number: JPS61221868A
Application number: JP60062557A
Authority: JP
Inventors: Yasuo Koyama; 小山　泰男
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1985-03-27
Filing date: 1985-03-27
Publication date: 1986-10-02

Abstract

PURPOSE: To accurately convert a KANA (Japanese syllabary) character string into a mixed KANA-KANJI (Chinese character) sentence by selecting independent words with priority. CONSTITUTION: A CPU 1 allocates different weight values to independent words and affixes and calculates the sum total of the weight values of each group extracted as word candidates. When a weight value lower than that of the affixes is allocated to the independent words, a group having a small sum total of weight values is selected with priority; when a weight value higher than that of the affixes is allocated to the independent words, a group having a large sum total of weight values is selected with priority. Different weight values are allocated to independent words and to other additional words, including affixes, numerals, etc., to increase the probability that independent words are selected. Thus an input KANA character string can be converted quickly into the desired mixed KANA-KANJI sentence without being divided into words of a small number of characters.

Description

DETAILED DESCRIPTION OF THE INVENTION (Technical Field) The present invention relates to a kana-kanji conversion system that converts text input in kana characters into mixed kana-kanji text.

(Prior Art) When a kana character string input from a keyboard or a voice recognition device is converted into a sentence containing kanji, the so-called longest match method is used, in which the kana string that matches over the maximum number of characters is fixed as a headword and converted into the corresponding kanji.
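The longest match method mentioned above can be illustrated with a minimal Python sketch. The function name `longest_match_segment` and the toy dictionary are assumptions made for illustration; the publication does not describe the prior-art implementation in detail.

```python
def longest_match_segment(kana, dictionary):
    """Greedy longest match: at each position take the longest headword found."""
    result = []
    pos = 0
    while pos < len(kana):
        # Try the longest substring first, then progressively shorter ones.
        for end in range(len(kana), pos, -1):
            piece = kana[pos:end]
            if piece in dictionary:
                result.append((piece, dictionary[piece]))
                pos = end
                break
        else:
            # No headword matches here: pass the single kana through unchanged.
            result.append((kana[pos], kana[pos]))
            pos += 1
    return result


# Toy dictionary (hypothetical contents): reading -> kanji.
TOY_DICTIONARY = {"コテイ": "固定", "シサン": "資産", "コテ": "小手", "コ": "小"}

segments = longest_match_segment("コテイシサン", TOY_DICTIONARY)
```

Because the headword is fixed greedily at each position, a locally longest match is committed before later context is considered.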

This method often produces errors in which a headword judged correct probabilistically is incorrect semantically, and it has the disadvantage that selecting the optimal headword takes a long time.

To solve this problem, a kana-kanji conversion system has been proposed (Japanese Patent Laid-Open No. 59-121529) that reduces ambiguity by assigning, among the headwords stored in the kana-kanji conversion dictionary, higher weight values to adjunct words such as affixes, numerals, and time expressions than to independent words, so that these adjunct words are selected preferentially.

With that system, however, a problem arises for words that contain the same sounds as a prefix or suffix, such as "コ" (ko) and "サン" (san). When, for example, the kana character string "コテイシサン" (koteishisan, 固定資産, "fixed assets") is converted into a mixed kana-kanji sentence, the prefix "ko" and the suffix "san" are selected preferentially. The result is the mixed kana-kanji text "こ停止さん" (ko + teishi, "stop" + san), and the independent word "固定資産" is unlikely to be selected.

(Objective) In view of this problem, an object of the present invention is to provide a kana-kanji conversion system that selects independent words preferentially and can thereby accurately convert a kana character string into a mixed kana-kanji sentence.

(Structure) The details of the present invention are described below based on the illustrated embodiment.

FIG. 1 shows an embodiment of a device used in the present invention. Reference numeral 1 denotes a CPU that controls the entire device based on data input from an input unit 2 such as a keyboard, and is configured to output the input kana character string and control codes to a kana-kanji conversion unit 3, described below. In the kana-kanji conversion unit 3, reference numeral 4 denotes a grammar analysis unit. It outputs the kana character data taken in from a data bus 6 by a kana character string input unit 5 to an independent word search unit 7, an adjunct word search unit 8, and an affix search unit 9. It also attaches a weight value to each headword selected by the search units, namely "2" for independent words, "3" for affixes, and "0" for adjunct words other than affixes, and stores the result in a word candidate storage unit 10 as word candidate data. The independent word search unit 7 consists of an independent word expansion unit 7a, an independent word dictionary index unit 7b, and an independent word dictionary 7c; the adjunct word search unit 8 consists of an adjunct word index unit 8a and an adjunct word dictionary 8b; and the affix search unit 9 consists of an affix index unit 9a and an affix dictionary 9b. Reference numeral 11 denotes a clause construction unit, which reads the word candidate data stored in the word candidate storage unit 10 and outputs it to the data bus 6 via a kana-kanji character string output unit 12 in ascending order of total weight value.
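The weight tagging performed by the grammar analysis unit 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the weight values 2, 3, and 0 come from the description, while the names `WEIGHTS`, `WordCandidate`, and `tag_candidate` are hypothetical.

```python
from dataclasses import dataclass

# Weight per word category, as given in the embodiment:
# independent word = 2, affix = 3, adjunct word other than an affix = 0.
WEIGHTS = {"independent": 2, "affix": 3, "adjunct": 0}


@dataclass(frozen=True)
class WordCandidate:
    """One entry of the word candidate data (hypothetical layout)."""
    reading: str
    category: str
    weight: int


def tag_candidate(reading, category):
    """Attach the weight of the search unit that produced the headword."""
    return WordCandidate(reading, category, WEIGHTS[category])


candidates = [
    tag_candidate("コテイ", "independent"),  # kotei
    tag_candidate("コ", "affix"),            # ko
    tag_candidate("ヲ", "adjunct"),          # wo
]
```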

The remaining reference numerals in the figure denote the following: 13 is a kana-kanji candidate selection unit, 14 is a kana-kanji candidate display unit, 15 is a kana-kanji candidate confirmation unit, and 16 is a display unit.

In the device configured in this way, when the kana character string "コテイシサンヲヒョウカスル" (koteishisan wo hyouka suru) is input from the input unit 2, followed by a kana-kanji conversion execution command, the kana character string is passed to the grammar analysis unit 4 via the kana character string input unit 5. Taking the first character "コ" (ko) as the analysis start position, the grammar analysis unit 4 dispatches the kana character string to the independent word search unit 7, the adjunct word search unit 8, and the affix search unit 9, which search the headwords of the dictionaries 7c, 8b, and 9b from the beginning of the kana character string.
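The lookup from a given analysis start position can be sketched as a longest-prefix search run against each dictionary separately. This is a sketch under assumptions: the dictionary contents below are limited to headwords named in the description, and `longest_prefix` is a hypothetical helper, not a function disclosed in the publication.

```python
def longest_prefix(kana, start, headwords):
    """Return the longest headword that begins at index `start`, or None."""
    for end in range(len(kana), start, -1):
        if kana[start:end] in headwords:
            return kana[start:end]
    return None


# Toy stand-ins for dictionaries 7c, 8b and 9b (contents are assumptions).
INDEPENDENT = {"コテイ", "コテ", "テイシ", "イシ", "シサン", "ヒョウカ"}
ADJUNCT = {"ヲ", "スル"}
AFFIX = {"コ", "サン"}

text = "コテイシサンヲヒョウカスル"

# Analysis start position 0, i.e. the first character "コ".
hits = {
    "independent": longest_prefix(text, 0, INDEPENDENT),
    "adjunct": longest_prefix(text, 0, ADJUNCT),
    "affix": longest_prefix(text, 0, AFFIX),
}
```

Calling `longest_prefix(text, 1, INDEPENDENT)` moves the start position to the second character, which corresponds to shifting the string by one character.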

In this way, the headwords of the search units 7, 8, and 9 that match the longest run of consecutive characters in the kana character string are extracted:
■ from the independent word search unit 7: "コテイ" (kotei, 固定), "シサン" (shisan, 資産), and "ヒョウカ" (hyouka, 評価)
■ from the adjunct word search unit 8: "ヲ" (wo) and "スル" (suru)

When the first extraction is finished, the kana character string is shifted by one character and the same analysis is performed while cutting out independent words starting from the second character "テ" (te):
■ from the independent word search unit 7: "テイシ" (teishi, 停止) and "ヒョウカ" (hyouka, 評価)
■ from the adjunct word search unit 8: "ヲ" (wo) and "スル" (suru)
■ from the affix search unit 9: "コ" (ko)

When the second extraction is finished, the kana character string is shifted by one more character and the same analysis is performed while cutting out independent words starting from the third character "イ" (i):
■ from the independent word search unit 7: "コテ" (kote, 小手), "イシ" (ishi, 石), and "ヒョウカ" (hyouka, 評価)
■ from the adjunct word search unit 8: "ヲ" (wo) and "スル" (suru)
■ from the affix search unit 9: "サン" (san)

All headwords extracted in this way are given the weight value assigned to the search unit 7, 8, or 9 that produced them and are stored in the word candidate storage unit 10.

That is, in the first group, a weight value of 2 is attached to each of "kotei", "shisan", and "hyouka" extracted from the independent word search unit 7, and a weight value of 0 is attached to each of "wo" and "suru" extracted from the adjunct word search unit 8. In the second group, a weight value of 2 is attached to each of "teishi" and "hyouka", a weight value of 3 to "ko", and a weight value of 0 to each of "wo" and "suru". In the third group, a weight value of 2 is attached to each of "kote", "ishi", and "hyouka", a weight value of 3 to "san", and a weight value of 0 to each of "wo" and "suru".

Each group stored in the word candidate storage unit 10 in this way is then tested for whether its headwords can be connected to one another, using the part-of-speech information of the headwords and a word connection test table. Groups containing headwords that cannot be connected are excluded.
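The connection test can be sketched as a check of adjacent part-of-speech pairs against a table. The publication does not disclose the contents of the word connection test table, so the table `CONNECTABLE`, the part-of-speech tags, and the second (deliberately unconnectable) group below are all hypothetical.

```python
# Hypothetical connection table: (left part of speech, right part of speech)
# pairs that are allowed to stand next to each other.
CONNECTABLE = {
    ("noun", "noun"),
    ("noun", "particle"),
    ("particle", "noun"),
    ("noun", "verb"),
    ("prefix", "noun"),
    ("noun", "suffix"),
    ("suffix", "particle"),
}


def is_connectable(group):
    """True if every adjacent pair of headwords in the group may be connected."""
    tags = [tag for _, tag in group]
    return all(pair in CONNECTABLE for pair in zip(tags, tags[1:]))


groups = [
    # kotei + shisan + wo + hyouka + suru
    [("コテイ", "noun"), ("シサン", "noun"), ("ヲ", "particle"),
     ("ヒョウカ", "noun"), ("スル", "verb")],
    # Made-up group: a prefix directly followed by a particle.
    [("コ", "prefix"), ("ヲ", "particle"), ("スル", "verb")],
]

# Groups containing an unconnectable pair are excluded.
surviving = [g for g in groups if is_connectable(g)]
```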

When this test is finished, the total weight value of each group is calculated:
■ Group 1: (kotei (2) + shisan (2) + hyouka (2)) + (wo (0) + suru (0)) = 6
■ Group 2: (teishi (2) + hyouka (2)) + (ko (3)) + (wo (0) + suru (0)) = 7
■ Group 3: (kote (2) + ishi (2) + hyouka (2)) + (san (3)) + (wo (0) + suru (0)) = 9
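The totals above can be reproduced with a few lines of Python. The headwords and weights are those given in the description; the variable names and the tuple layout are assumptions chosen for illustration.

```python
# Each group is a list of (reading, weight) pairs taken from the description.
GROUPS = {
    1: [("コテイ", 2), ("シサン", 2), ("ヒョウカ", 2), ("ヲ", 0), ("スル", 0)],
    2: [("テイシ", 2), ("ヒョウカ", 2), ("コ", 3), ("ヲ", 0), ("スル", 0)],
    3: [("コテ", 2), ("イシ", 2), ("ヒョウカ", 2), ("サン", 3), ("ヲ", 0), ("スル", 0)],
}

# Sum the weight values of the headwords in each group.
totals = {gid: sum(weight for _, weight in words) for gid, words in GROUPS.items()}

# With independent words weighted lower than affixes, the smallest total wins.
first_candidate = min(totals, key=totals.get)
```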

Based on this calculation result, the CPU 1 selects as the first candidate the group whose total weight value is closest to the weight value of the independent words, that is, the group with the smallest total, which is Group 1, and outputs it to the display unit 16 via the kana-kanji candidate display unit 14.

As a result, "固定資産を評価する" ("evaluate fixed assets") is displayed on the display unit 16, and since this is the intended mixed kana-kanji sentence, it is confirmed by the kana-kanji candidate confirmation unit 15.

If the mixed kana-kanji sentence displayed as the first candidate differs from the intended one, the candidates can be selected in turn with the kana-kanji candidate selection unit 13 until the intended sentence containing kanji is displayed.

In this embodiment, because the weight value of independent words is set lower than that of affixes, the group with the smallest total weight value is output preferentially when a group is selected. If the weight of independent words is instead set higher than that of affixes, preferentially selecting the group with the largest total weight value achieves the same effect.
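The dependence of the selection direction on the weight assignment can be sketched as follows. The function `rank_groups` and its parameters are hypothetical; the sketch only shows how the ordering flips and makes no claim about which group wins under any particular reversed weighting.

```python
def rank_groups(totals, independent_weight, affix_weight):
    """Order group ids from most to least preferred.

    If independent words weigh less than affixes, smaller totals come first;
    if they weigh more, larger totals come first.
    """
    if independent_weight == affix_weight:
        raise ValueError("independent words and affixes need different weights")
    prefer_large = independent_weight > affix_weight
    return sorted(totals, key=totals.get, reverse=prefer_large)


example_totals = {"a": 6, "b": 7, "c": 9}

ascending = rank_groups(example_totals, independent_weight=2, affix_weight=3)
descending = rank_groups(example_totals, independent_weight=3, affix_weight=2)
```

Returning the full ranking rather than a single winner matches the description, in which later candidates can be selected in turn if the first is not the intended one.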

In this embodiment, the same weight value is assigned to all affixes, but the intended kanji can be selected even more accurately by differentiating the weight values among the affixes. Ambiguity at the time of kanji conversion can also be reduced by weighting adjunct words and independent words according to their frequency of use and similar factors.

(Effects) As described above, according to the present invention, different weight values are assigned to independent words and to other adjunct words such as affixes and numerals so as to raise the probability that independent words are selected. An input kana character string can therefore be converted quickly into the intended sentence containing kanji without being chopped into words of few characters.

[Brief explanation of drawings]

The figure is a configuration diagram showing an embodiment of a device used in the present invention.

Claims

[Claims]

A kana-kanji conversion system characterized by assigning different weight values to independent words and affixes, calculating the total weight value of each group extracted as word candidates, and preferentially selecting a group with a small total weight value when independent words are assigned a lower weight value than affixes, or a group with a large total weight value when independent words are assigned a higher weight value than affixes.