JP2744241B2

JP2744241B2 - Character processor

Info

Publication number: JP2744241B2
Application number: JP63028199A
Authority: JP
Inventors: 英一朗戸島
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1988-02-08
Filing date: 1988-02-08
Publication date: 1998-04-28
Anticipated expiration: 2013-04-28
Also published as: JPH01204173A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field] The present invention relates to a character processing device that creates Japanese documents by converting input kana into kanji, and in particular to a character processing device that converts the input reading of a name (a personal name) into kanji.

[Prior Art] Conventional devices that convert the reading of a name into kanji hold a dictionary in which the reading of each name is stored in association with its notation, and perform kana-kanji conversion against that dictionary. For example, the names 「二郎」 (Jiro) and 「純平」 (Junpei) are registered in the dictionary under the readings 「じろう」 and 「じゅんぺい」, respectively, and are converted from those readings.

[Problems to Be Solved by the Invention] However, names can be assigned freely and vary widely, so the conventional technique of storing each name as a whole word requires an enormous dictionary. Moreover, the readings and kanji used in one name share parts with many other names, so the dictionary contents are highly redundant and memory is used inefficiently. Finally, a name that has not appeared before is not registered in the dictionary as a whole, even if each of its parts is shared with existing names, so its reading cannot be converted into its notation.

[Means for Solving the Problems] To solve the above problems, the character processing device of the present invention comprises: a reading element dictionary that stores reading elements, which are sub-word components of the reading of a name, in association with the type and the notation of each reading element; a combination rule dictionary that stores combination rules between reading element types; input means for inputting the reading of a name; decomposition means for decomposing the reading input from the input means into a plurality of reading elements by referring to the reading element dictionary; combination determination means for determining whether the decomposed reading elements can be combined, by referring to the combination rule dictionary on the basis of the types of those reading elements stored in the reading element dictionary; and conversion means for converting each of the reading elements determined to be combinable into the notation associated with it in the reading element dictionary.

[Operation] With the above configuration, the decomposition means decomposes the reading of the name input from the input means into a plurality of reading elements by referring to the reading element dictionary. The combination determination means determines whether the decomposed reading elements can be combined, by referring to the combination rule dictionary on the basis of the types of those reading elements stored in the reading element dictionary. The conversion means converts each of the reading elements determined to be combinable into the notation associated with it in the reading element dictionary.

[Embodiments] The present invention is described in detail below with reference to the drawings.

FIG. 1 shows an example of the overall configuration of the present invention.

In the illustrated configuration, CPU is a microprocessor that performs the arithmetic operations and logical decisions required for character processing and, through the address bus AB, the control bus CB, and the data bus DB, controls the components connected to those buses.

The address bus AB carries address signals that designate the component to be controlled by the microprocessor CPU. The control bus CB carries and applies control signals to each component controlled by the microprocessor CPU. The data bus DB transfers data among the components.

ROM is a read-only fixed memory that stores, among other things, the control procedures of the microprocessor CPU described later with reference to FIGS. 9 to 12.

RAM is a writable random access memory organized as 16-bit words, used to temporarily store various data from the components.

EDIC is the reading element dictionary. It stores each reading element of a name in association with its reading element type and its notation.

RDIC is the combination rule dictionary. It stores the combination rules between reading elements.

Y1TBL is the first reading table. It stores the candidates for the first reading element of a name.

Y2TBL is the second reading table. It stores the candidates for the second reading element of a name.

CTBL is the combination table. It stores the candidate combinations of a first reading element candidate and a second reading element candidate.

HTBL is the conversion candidate table. It stores the candidates output as the result of kana-kanji conversion.

TBUF is the text buffer, an area that temporarily stores the document data being edited.

KB is a keyboard with character and symbol input keys, such as alphabet keys, hiragana keys, and katakana keys, and with function keys, such as a conversion key, for instructing the various functions of the character processing device.

DISK is external storage for document data and dictionary data. Created documents are saved there and recalled when needed by an instruction from the keyboard. The dictionary data is loaded at an appropriate time into the area DIC in RAM and referenced there.

CR is a cursor register whose contents the CPU can read and write. The CRT controller CRTC, described below, displays a cursor at the position on the display device CRT that corresponds to the address stored in CR.

DBUF is a display buffer memory that holds the patterns of the data to be displayed.

CRTC displays the contents held in the cursor register CR and the buffer DBUF on the display device CRT.

CRT is a display device using a cathode ray tube or the like; the CRT controller controls the display of dot-matrix patterns and of the cursor on it. CG is a character generator that stores the patterns of the characters and symbols displayed on the display device CRT.

The character processing device of the present invention, composed of these components, operates in response to input from the keyboard KB. When input arrives from the keyboard KB, an interrupt signal is first sent to the microprocessor CPU, which reads the control signals stored in ROM and performs the corresponding control operations.

FIG. 2 illustrates the principle of the present invention.

(a) is a sample of the reading element dictionary. For example, the reading 「じゅん」 (jun) has the notations 「純」 and 「順」, and its reading element type is 1.

(b) is a sample of the combination rules. An entry with "3" as the first reading element and "4" as the second means that a type 3 reading element can be followed by a type 4 reading element; concretely, the names 「次郎」 and 「二郎」 (both read Jiro) are possible. A combination of reading element types that is not registered here cannot form a name. For example, 「二平」 (じぺい, jipei) is not produced as a name because "3-5" is not registered in the combination rules.
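The principle of FIG. 2 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. Only the type of 「じゅん」 (1), the rule 3-4, and the absence of rule 3-5 are stated in the text; the types given to 「じ」, 「ろう」, and 「ぺい」 are inferred from the examples, and the rule (1, 5) is an assumption added so that 「純平」 can be formed.

```python
# Reading element dictionary: reading -> (type, notations).
# Only the "じゅん" entry is quoted from FIG. 2(a); the rest are inferred
# from the examples in the text and should be read as assumptions.
READING_ELEMENTS = {
    "じゅん": (1, ["純", "順"]),
    "じ": (3, ["次", "二"]),
    "ろう": (4, ["郎"]),
    "ぺい": (5, ["平"]),
}

# Combination rules: permitted (first type, second type) pairs.
# (3, 4) comes from the text; (1, 5) is an assumption.
COMBINATION_RULES = {(3, 4), (1, 5)}


def convert(first, second):
    """Return every notation for two reading elements, or [] if the
    combination of their types is not registered."""
    type1, notations1 = READING_ELEMENTS[first]
    type2, notations2 = READING_ELEMENTS[second]
    if (type1, type2) not in COMBINATION_RULES:
        return []
    return [a + b for a in notations1 for b in notations2]
```

With these tables, `convert("じ", "ろう")` yields 「次郎」 and 「二郎」, while `convert("じ", "ぺい")` yields nothing because the pair (3, 5) is not registered.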

FIG. 3 shows the structure of the reading element dictionary (EDIC).

Each reading element occupies 8 bytes.

The first four bytes hold the reading, stored at one byte per character. Hiragana are represented by the lower byte of their JIS X 0208 code, and a space is represented by 20H. Unused positions are filled with spaces.

The next two bytes hold the notation of the reading element, stored in JIS X 0208 code at two bytes per character.

The next byte holds the reading element type.

The last byte holds the frequency, which expresses how often the element is used as part of a name. It takes a value from 0 to 255; a larger value means a higher frequency.

FIG. 4 shows the structure of the combination rule dictionary.

Each combination entry occupies 3 bytes and covers the combination of two reading elements.

The first byte gives the type of the first reading element.

The next byte gives the type of the second reading element.

The last byte gives the likelihood of the combination. It takes a value from 0 to 255; a larger value means a more plausible combination.
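A lookup over the 3-byte entries of the combination rule dictionary might look like the following sketch. The function names and the likelihood values are hypothetical; a linear scan is used for simplicity, and the source does not say how the dictionary is actually searched.

```python
import struct

# first type (1 byte) + second type (1 byte) + likelihood (1 byte)
RULE = struct.Struct("BBB")


def build_rdic(rules):
    """Concatenate (type1, type2, likelihood) triples into one byte string."""
    return b"".join(RULE.pack(*rule) for rule in rules)


def lookup(rdic, type1, type2):
    """Return the likelihood of the rule type1-type2, or None if the
    combination is not registered."""
    for t1, t2, likelihood in RULE.iter_unpack(rdic):
        if (t1, t2) == (type1, type2):
            return likelihood
    return None
```

Returning `None` for an unregistered pair mirrors the text: a combination that is not in the dictionary cannot form a name.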

FIG. 5 shows the structure of the input buffer (IBUF).

All kana entered from the keyboard are first stored in IBUF; they are then converted and output to the document. Characters in IBUF are written in JIS X 0208 code at two bytes per character. The area after the last character is filled with 0FFH.
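The buffer layout of FIG. 5 can be sketched as follows. The buffer size of 32 bytes is an assumption (the source gives none), the function name is hypothetical, and the JIS X 0208 codes are obtained from Python's `euc_jp` codec by clearing the high bit of each byte.

```python
IBUF_SIZE = 32  # assumed size in bytes; not specified in the source


def fill_ibuf(kana, size=IBUF_SIZE):
    """Store kana as 2-byte JIS X 0208 codes and pad the rest with 0FFH."""
    data = bytearray()
    for ch in kana:
        euc = ch.encode("euc_jp")
        if len(euc) != 2 or euc[0] < 0xA1:
            raise ValueError(f"{ch!r} is not a JIS X 0208 character")
        data += bytes(b & 0x7F for b in euc)
    if len(data) > size:
        raise ValueError("input does not fit in the buffer")
    return bytes(data).ljust(size, b"\xff")
```

For the input 「じゅんぺい」 the first ten bytes hold the five characters (beginning with `24H 38H` for 「じ」) and the remaining bytes are `0FFH`.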

FIG. 6 shows the structure of the first reading table (Y1TBL) and the second reading table (Y2TBL).

Y1TBL stores the candidates for the leading reading element (the first reading element) found by decomposing the input reading string and searching the reading element dictionary.

Y2TBL stores the candidates for the second reading element found in the same way, and has the same structure as Y1TBL.

Each reading element candidate occupies 4 bytes.

The first byte gives the reading start position, that is, the character position in the input buffer at which the reading element begins. For example, when the input is 「じゅんぺい」, the reading element 「ぺい」 begins at the fourth character, so "4" is stored as the reading start position.

The next byte gives the reading length, that is, the number of characters in the reading of the element. For example, "2" is stored for the reading element 「ぺい」.

The last two bytes store the address of the reading element in the reading element dictionary (EDIC).

FIG. 7 shows the structure of the combination table.

This table stores combination entries, each indicating how the input reading string is decomposed and which reading elements are joined.

Every entry registered here must satisfy the combination rules.

Each combination entry occupies 5 bytes.

The first two bytes represent the first reading element and store its address in EDIC.

The next two bytes represent the second reading element and store its address in EDIC.

The last byte gives the likelihood of the combination.

FIG. 8 shows the structure of the conversion candidate table (HTBL), which stores the candidates obtained as the result of kana-kanji conversion.

The first byte holds the number of candidates, followed by that many candidates.

Each candidate is structured as follows.

The first byte gives the number of characters in the candidate.

The next byte stores the likelihood of the candidate.

The characters of the candidate follow in order at two bytes per character, in JIS X 0208 code.
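Because HTBL entries are variable in length, a short sketch of writing and reading the table may help. The function names and the likelihood values are hypothetical, and the sketch assumes every likelihood already fits in one byte (0 to 255), as the layout requires.

```python
def pack_htbl(candidates):
    """Serialize [(text, likelihood), ...] in the HTBL layout of FIG. 8."""
    table = bytearray([len(candidates)])          # candidate count
    for text, likelihood in candidates:
        table += bytes([len(text), likelihood])   # character count, likelihood
        for ch in text:
            euc = ch.encode("euc_jp")
            if len(euc) != 2 or euc[0] < 0xA1:
                raise ValueError(f"{ch!r} is not a JIS X 0208 character")
            table += bytes(b & 0x7F for b in euc)  # 2-byte JIS X 0208 code
    return bytes(table)


def unpack_htbl(table):
    """Parse an HTBL byte string back into [(text, likelihood), ...]."""
    count, pos = table[0], 1
    candidates = []
    for _ in range(count):
        n_chars, likelihood = table[pos], table[pos + 1]
        pos += 2
        raw = table[pos:pos + 2 * n_chars]
        pos += 2 * n_chars
        text = bytes(b | 0x80 for b in raw).decode("euc_jp")
        candidates.append((text, likelihood))
    return candidates
```

Two two-character candidates occupy 1 + 2 × (2 + 4) = 13 bytes in this layout.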

The operation of the embodiment described above is explained below with reference to the flowcharts.

FIG. 9 is a flowchart of the part that takes in key input and processes it.

Step 9-1 reads data from the keyboard into the input buffer IBUF. If IBUF contains conversion-key data, kana-kanji conversion must be performed, so processing proceeds to step 9-2. Otherwise ordinary editing is performed, so processing proceeds to step 9-4.

In step 9-2, the input string in IBUF is converted into kanji and output to the text buffer TBUF, as detailed in FIG. 10.

In step 9-3, the conversion result output to the text is displayed.

Step 9-4 performs processing that is well known in ordinary character processing devices, such as cursor movement, character input, and document saving, and is not described further.

FIG. 10 details the kana-kanji conversion of step 9-2.

In step 10-1, ordinary kana-kanji conversion is performed and the conversion candidates are output to the conversion candidate table HTBL.

In step 10-2, the name conversion detailed in FIG. 11 is performed, and its conversion candidates are likewise output to HTBL, merged with the results of the preceding kana-kanji conversion.

In step 10-3, the conversion candidates in the conversion candidate table are sorted by likelihood, with higher likelihood ranked first.

In step 10-4, the candidates in the conversion candidate table are output to the document in TBUF. Normally only the first candidate is displayed; the second and later candidates are displayed through a special operation such as a next-candidate key. This candidate selection control is well known from ordinary Japanese word processors and is not described here.
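The flow of FIG. 10 (steps 10-1 to 10-4) can be sketched as a merge followed by a sort. The two converter functions are hypothetical stand-ins with made-up candidates and likelihoods; the real ordinary conversion and name conversion are described elsewhere in the text.

```python
def convert_candidates(reading, ordinary_convert, name_convert):
    """Return conversion candidates for `reading`, best first."""
    htbl = list(ordinary_convert(reading))              # step 10-1
    htbl.extend(name_convert(reading))                  # step 10-2: merge
    htbl.sort(key=lambda cand: cand[1], reverse=True)   # step 10-3
    return htbl                                         # step 10-4: to TBUF


def ordinary_stub(reading):
    """Hypothetical ordinary conversion: leaves the kana unconverted."""
    return [(reading, 10)]


def name_stub(reading):
    """Hypothetical name conversion results for one sample reading."""
    if reading == "じゅんぺい":
        return [("順平", 120), ("純平", 200)]
    return []
```

The caller would display only the first element of the returned list and step through the rest when the user asks for the next candidate.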

FIG. 11 details the name conversion of step 10-2.

In step 11-1, every combination of reading elements found in the given input in IBUF is listed in the combination table CTBL, as detailed in FIG. 12.

In step 11-2, the combinations of reading elements created in CTBL are expanded into their notations and output to the conversion candidate table HTBL. The likelihood held in CTBL is registered in HTBL together with each candidate.

FIG. 12 details the combination creation of step 11-1.

Step 12-1 searches for every candidate for the first reading element in the input buffer IBUF. For example, for the input 「じゅんぺい」, the reading element dictionary EDIC is searched for the reading elements 「じ」, 「じゅ」, 「じゅん」, 「じゅんぺ」, and 「じゅんぺい」. The candidates found are registered in the first reading table Y1TBL. In the case of 「じゅんぺい」, only 「じゅん」 is registered in Y1TBL.
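The prefix search of step 12-1 can be sketched as below, returning the start position and reading length that a reading table entry holds according to FIG. 6. The set of readings is an assumed sample chosen so that the result matches the example in the text, and the function name is hypothetical.

```python
# Assumed sample of readings present in EDIC; chosen to match the example.
EDIC_READINGS = {"じゅん", "ぺい", "ろう"}


def search_prefixes(reading, start=1):
    """Return (start position, reading length) for every dictionary reading
    that begins at the 1-based character position `start`."""
    entries = []
    remaining = reading[start - 1:]
    for length in range(1, len(remaining) + 1):
        if remaining[:length] in EDIC_READINGS:
            entries.append((start, length))
    return entries
```

Calling it with `start=4` performs the same search for the element that follows 「じゅん」 and finds 「ぺい」 with start position 4 and length 2.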

In step 12-2, the first reading element counter i is initialized to 1. The variable i tracks which entry of the first reading table is currently being processed.

In step 12-3, the i-th reading element is fetched (GET) from Y1TBL as the first reading element; that is, its address in the reading element dictionary, its reading element type, its frequency, and related information are retrieved.

In step 12-4, it is determined whether the GET succeeded. If it did not, processing returns. If it did, processing proceeds to step 12-5.

In step 12-5, candidates for the reading element that follows the first reading element currently being processed are searched for. For example, for the input 「じゅんぺい」, the first reading element being processed is 「じゅん」, so the reading element dictionary EDIC is searched for the reading elements 「ぺ」 and 「ぺい」. The candidates found are registered in the second reading table Y2TBL. In the case of 「じゅんぺい」, only 「ぺい」 is registered in Y2TBL.

In step 12-6, the second reading element counter j is initialized to 1. The variable j tracks which entry of the second reading table is currently being processed.

In step 12-7, the j-th reading element is fetched (GET) from Y2TBL as the second reading element; that is, its address in the reading element dictionary, its reading element type, its frequency, and related information are retrieved.

In step 12-8, it is determined whether the GET succeeded. If it did not, processing branches to step 12-13. If it did, processing proceeds to step 12-9.

In step 12-9, the combination rule dictionary is searched to determine whether the first and second reading elements that have been fetched may be combined. If the combination is permitted, its likelihood is retrieved.

In step 12-10, it is determined whether the combination is permitted. If it is, processing proceeds to step 12-11; if it is not, processing skips to step 12-12.

In step 12-11, the permitted combination is registered in the combination table CTBL. At registration, the likelihood of the combination is computed by some function of the frequency of the first reading element, the frequency of the second reading element, and the likelihood of the combination rule. One example of such a function is simple addition: frequency of the first reading element + frequency of the second reading element + likelihood of the combination rule.

In step 12-12, 1 is added to j so that the next second reading element is fetched, and processing loops back to step 12-7.

In step 12-13, 1 is added to i so that the next first reading element is fetched, and processing loops back to step 12-3.
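The loop of FIG. 12, together with the expansion of step 11-2, can be sketched as follows. All table contents (types other than those quoted in FIG. 2, frequencies, likelihoods) are made-up sample values, and the function names are hypothetical. Two further assumptions are made: the two elements must cover the whole input reading, which the text implies but does not state, and the summed score is not clamped to one byte.

```python
# EDIC records: (reading, notation, type, frequency); the list index stands
# in for the EDIC address. Values are illustrative assumptions.
EDIC = [
    ("じゅん", "純", 1, 200),
    ("じゅん", "順", 1, 120),
    ("じ", "次", 3, 160),
    ("じ", "二", 3, 140),
    ("ろう", "郎", 4, 220),
    ("ぺい", "平", 5, 150),
]
# RDIC: (first type, second type) -> likelihood. (1, 5) is an assumption.
RDIC = {(3, 4): 180, (1, 5): 100}


def search_elements(text):
    """Addresses of EDIC records whose reading is a prefix of `text`."""
    return [addr for addr, rec in enumerate(EDIC) if text.startswith(rec[0])]


def build_ctbl(reading):
    """Return CTBL entries (first address, second address, likelihood)."""
    ctbl = []
    y1tbl = search_elements(reading)                    # step 12-1
    for i in y1tbl:                                     # steps 12-2 to 12-4, 12-13
        read1, _, type1, freq1 = EDIC[i]
        y2tbl = search_elements(reading[len(read1):])   # step 12-5
        for j in y2tbl:                                 # steps 12-6 to 12-8, 12-12
            read2, _, type2, freq2 = EDIC[j]
            if read1 + read2 != reading:                # assumption: full cover
                continue
            rule_likelihood = RDIC.get((type1, type2))  # step 12-9
            if rule_likelihood is None:                 # step 12-10
                continue
            # step 12-11: simple addition, the example function in the text
            ctbl.append((i, j, freq1 + freq2 + rule_likelihood))
    return ctbl


def expand(ctbl):
    """Step 11-2: turn CTBL entries into (notation, likelihood) candidates."""
    return [(EDIC[i][1] + EDIC[j][1], score) for i, j, score in ctbl]
```

With these sample values, `expand(build_ctbl("じゅんぺい"))` gives 「純平」 with score 450 and 「順平」 with score 370, while `build_ctbl("じぺい")` is empty because no rule 3-5 is registered.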

[Other Embodiments] The description above limited the combination to two reading elements, but the number of elements can easily be extended. For example, 「大五郎」 (Daigoro) and 「裕次郎」 (Yujiro) can be divided into the three elements 「大」「五」「郎」 and 「裕」「次」「郎」, respectively. To support combinations of up to n elements, reading tables up to an n-th reading table are provided in addition to the first and second reading tables, the size of each combination table entry is enlarged so that it can hold up to the n-th reading element, and new reading element types are added as needed.
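One way to picture the extension to n elements is a recursive decomposition, sketched below. This is not the table-based extension the text describes; it swaps in recursion for brevity. The types, the readings assigned to each kanji, and the idea of checking rules only between adjacent elements are all assumptions made for illustration.

```python
# reading -> (notation, type); types 6 and 7 are hypothetical new types.
EDIC = {
    "だい": ("大", 6),
    "ご": ("五", 7),
    "じ": ("次", 3),
    "ろう": ("郎", 4),
}
# Permitted adjacent type pairs (assumed).
RULES = {(6, 7), (7, 4), (3, 4)}


def decompose(reading, prev_type=None):
    """Return every list of notations whose readings concatenate to
    `reading` and whose adjacent types are all permitted by RULES."""
    if not reading:
        return [[]]
    results = []
    for length in range(1, len(reading) + 1):
        head = reading[:length]
        if head not in EDIC:
            continue
        notation, elem_type = EDIC[head]
        if prev_type is not None and (prev_type, elem_type) not in RULES:
            continue
        for tail in decompose(reading[length:], elem_type):
            results.append([notation] + tail)
    return results
```

Under these assumptions `decompose("だいごろう")` returns the single decomposition 「大」「五」「郎」, and the two-element case 「じろう」 still works unchanged.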

In addition, the tables are structured so that one reading element represents one kanji, but they can easily be structured so that a reading element represents several kanji. For example, it is better to treat 「ゆり」 (yuri) as representing 「百合」 than to split 「百合」 into 「百」 (ゆ) and 「合」 (り). This requires only enlarging the notation field of the reading element dictionary.

Although the case of converting a name by phrase (bunsetsu) conversion has been described, batch conversion can also be realized by treating the synthesized name as one phrase and applying a method such as the longest-match method, the two-phrase longest-match method, or the minimum-phrase-count method.

[Effects of the Invention] As described above, according to the present invention, names are stored by storing the type and notation of each reading element, a sub-word component of the reading of a name, in association with that reading element, and by storing the combination rules between reading element types. Memory is therefore used efficiently, and a large number of personal names can be stored in a small storage capacity. Furthermore, even a name that has not appeared before can be converted from its reading into its notation, provided that it is a combination of components shared with existing names.

[Brief description of the drawings]

FIG. 1 is a block diagram of the overall configuration of the present invention.
FIG. 2 illustrates the principle of the present invention.
FIG. 3 shows the structure of the reading element dictionary.
FIG. 4 shows the structure of the combination rule dictionary.
FIG. 5 shows the structure of the input buffer.
FIG. 6 shows the structure of the first reading table and the second reading table.
FIG. 7 shows the structure of the combination table.
FIG. 8 shows the structure of the conversion candidate table.
FIGS. 9 to 12 are flowcharts showing the operation of the character processing device of the present invention.

DISK: external storage
CPU: microprocessor
ROM: read-only memory
RAM: random access memory
EDIC: reading element dictionary
RDIC: combination rule dictionary
IBUF: input buffer
Y1TBL: first reading table
Y2TBL: second reading table
CTBL: combination table
HTBL: conversion candidate table
TBUF: text buffer

Claims

(57) [Claims]

1. A character processing device comprising: a reading element dictionary that stores reading elements, which are sub-word components of the reading of a name, in association with the type and the notation of each reading element; a combination rule dictionary that stores combination rules between reading element types; input means for inputting the reading of a name; decomposition means for decomposing the reading input from the input means into a plurality of reading elements by referring to the reading element dictionary; combination determination means for determining whether the plurality of reading elements decomposed by the decomposition means can be combined, by referring to the combination rule dictionary on the basis of the types of those reading elements stored in the reading element dictionary; and conversion means for converting each of the plurality of reading elements determined to be combinable by the combination determination means into the notation associated with it in the reading element dictionary.