JP2000259616A

JP2000259616A - Device for collating user defined word dictionary

Info

Publication number: JP2000259616A
Application number: JP11064278A
Authority: JP
Inventors: Hidetoshi Ono; 秀敏小野
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1999-03-11
Filing date: 1999-03-11
Publication date: 2000-09-22

Abstract

PROBLEM TO BE SOLVED: To obtain a matching device capable of making effective use of a plurality of user dictionaries. SOLUTION: A user-defined word dictionary matching device holds a plurality of user dictionaries, each storing at least a plurality of word data, collates the word data with character code data, and performs processing based on the collation result. The device is provided with a text analysis unit 2 that stores the user-defined word data in a user dictionary table, which is a data storage area, in a predetermined priority order of the user dictionaries, collates the word data stored in the user dictionary table with character code data input from a text input unit 1, and performs processing.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a user-defined word dictionary matching device. In particular, in a device that applies speech synthesis processing to text data and pronounces the result as speech, the invention is intended to make effective use of the user-defined word dictionaries that are consulted when the text data is converted, for example, word by word.

[0002]

[Prior Art] There is an existing technology (hereinafter referred to as speech synthesis technology) in which text data (data in which characters have been converted into predetermined character codes, including data in which kanji and kana character strings are mixed) is subjected to speech synthesis processing, converted into speech synthesis data, and sounded by a sound output means such as a speaker. Speech synthesis is performed by a device called a text-to-speech synthesis engine, which converts text data into speech synthesis data.

[0003] Consider the process of converting text data into speech synthesis data. The text-to-speech synthesis engine collates the input text data with data on words recorded in certain units (generally word units, so hereinafter referred to as word data) and splits the text into words. Word data can be divided into two kinds: a system word data dictionary (hereinafter referred to as a system dictionary), which stores general registered word data, and a user-defined word data dictionary (hereinafter referred to as a user dictionary), which stores registered word data that a user has registered individually. When collating against text data, the word data in the user dictionary is collated in preference to the word data in the system dictionary, and the speech synthesis data is created from the result. In particular, the registered word data in a user dictionary may have readings that differ from the ordinary readings of the characters, so using the user dictionary efficiently is important for accuracy.
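The precedence of the user dictionary over the system dictionary can be illustrated with a minimal Python sketch. The function name `lookup_reading`, the plain `dict` representation, and the sample words and readings are all assumptions made for illustration; the patent does not specify a data structure.

```python
from typing import Dict, Optional


def lookup_reading(word: str,
                   user_dict: Dict[str, str],
                   system_dict: Dict[str, str]) -> Optional[str]:
    """Return the reading for `word`, consulting the user dictionary first."""
    if word in user_dict:
        # A user-registered reading takes precedence over the system one.
        return user_dict[word]
    # Fall back to the system dictionary; None if the word is unknown.
    return system_dict.get(word)


# Hypothetical sample data: the user overrides the reading of one word.
system_dict = {"東海": "とうかい", "音声": "おんせい"}
user_dict = {"東海": "とうみ"}
```

With this data, `lookup_reading("東海", user_dict, system_dict)` yields the user-registered reading, while a word that appears only in the system dictionary falls through to it.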

[0004]

[Problems to Be Solved by the Invention] However, conventional text-to-speech synthesis engines did not take the use of a plurality of user dictionaries into account, so only one user dictionary could be specified. Consequently, even if a user created, for example, one user dictionary of personal names and another of place names, only one of them could be registered. A user who needed both sets of word data had no choice but to combine the independently created word data into a single dictionary and register that as the user dictionary. In that case, however, the number of word data registered in the one dictionary grows, so that, for example, correcting the word data for a particular personal name requires a great deal of time just to find that word data. There was also the problem that the amount of memory used when editing word data inevitably increases with the number of word data.

[0005] It has therefore been desired to realize a matching device that can make effective use of a plurality of user dictionaries.

[0006]

[Means for Solving the Problems] A user-defined word dictionary matching device according to the present invention comprises user-defined word data storage means for storing user-defined word data in a user dictionary table, which is a data storage area, in a predetermined priority order of the user-defined word dictionaries, and matching processing means for collating the user-defined word data stored in the user dictionary table with input character code data and performing processing. In the present invention, the collation priority order of the plurality of user-defined word dictionaries is determined in advance, and the user-defined word data storage means stores the user-defined word data in the user dictionary table in that order. The matching processing means then collates the user-defined word data stored in the user dictionary table with the input character code data and performs processing.

[0007] Another user-defined word dictionary matching device according to the present invention comprises user-defined word data storage means for merging the user-defined word data in a plurality of user-defined word dictionaries, sorting it in descending order of the number of words, and storing it in a user dictionary table, which is a data storage area, and matching processing means for collating the user-defined word data stored in the user dictionary table with input character code data and performing processing. In the present invention, the user-defined word data storage means merges the user-defined word data in the plurality of user-defined word dictionaries and sorts it in descending order of the number of words. The matching processing means then collates the user-defined word data stored in the user dictionary table with the input character code data and performs processing.

[0008]

[Embodiments of the Invention] Embodiment 1. FIG. 1 is a block diagram of a text-to-speech synthesis engine to which a user-defined word dictionary matching device according to a first embodiment of the present invention is applied. In the figure, reference numeral 1 denotes a text input unit. Text data from, for example, a clipboard (an area that temporarily stores data) or a text file is input to the text input unit 1. Text data is data in which characters, control characters, and the like are encoded. Character codes include full-width codes, which are represented by 2 bytes of data, and half-width codes, which are represented by 1 byte. If the text data contains half-width codes, the text input unit 1 converts all of them into the corresponding full-width codes. Reference numeral 2 denotes a text analysis unit. The text analysis unit 2 divides the text data input to the text input unit 1 into word-unit text data. In doing so, it refers to the word data held in the user dictionary 3 and the system dictionary 4, and it transmits the reading data corresponding to each matched word data. Reading data is data that records how the word data is pronounced (kana reading, accent, and so on).
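The half-width to full-width normalization performed by the text input unit 1 might look roughly like the following Python sketch. It is an assumption-laden illustration: the patent speaks of 1-byte and 2-byte character codes without naming an encoding, whereas this sketch works on Unicode code points and handles only printable ASCII and the space character (half-width katakana is left untouched). The function name `to_fullwidth` is hypothetical.

```python
def to_fullwidth(text: str) -> str:
    """Convert half-width ASCII characters to their full-width forms."""
    converted = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            # The full-width (ideographic) space is U+3000.
            converted.append("\u3000")
        elif 0x21 <= code <= 0x7E:
            # Printable ASCII maps to the full-width block U+FF01..U+FF5E,
            # which sits at a fixed offset of 0xFEE0.
            converted.append(chr(code + 0xFEE0))
        else:
            # Anything else (kanji, kana, control characters) is kept as is.
            converted.append(ch)
    return "".join(converted)
```

Normalizing the input once means that dictionary entries only need to be registered in full-width form for collation to succeed.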

[0009] The voice data generation unit 5 generates voice data by referring to the voice data dictionary 6 on the basis of the reading data transmitted from the text analysis unit 2. The voice output unit 7 transmits the voice data generated by the voice data generation unit 5 to a voice output device such as a speaker, which sounds it. Instead of being transmitted to a voice output device, the voice data can also be saved as a file in a storage means or the like.

[0010] At least the text input unit 1, the text analysis unit 2, and the voice data generation unit 5 of the speech synthesis engine are in practice implemented as a computer and a program that operates the computer. The operations of the text input unit 1, the text analysis unit 2, and the voice data generation unit 5 are therefore performed according to this program. The program is held on a recording medium or the like, and the computer reads and executes it.

[0011] In this embodiment, a plurality of user dictionaries are set and their word data is merged inside the speech synthesis engine, so that several user dictionaries can be set without merging them beforehand, even though normally only one user dictionary can be set. In addition, because priorities are assigned, the user dictionary of personal names can, for example, be given priority for text data that contains many personal names, which can also improve efficiency.

[0012] FIG. 2 is a diagram showing the operation of the speech synthesis engine of FIG. 1. The operation of the speech synthesis engine of this embodiment, centering on the text analysis unit 2, is described with reference to the figure. First, in text input processing S1, text data from a text file, the clipboard, or the like is input to the text input unit 1. Next, in user dictionary application processing S2, the text analysis unit 2 stores the registered word data of each user dictionary in the user dictionary table according to a predetermined priority order (highest priority first). In this embodiment, an application program interface (hereinafter referred to as an API) or a system call is used to set the collation priority order. A system call is a means of using the services (file system operations, input/output operations, and so on) provided by an operating system (OS). An API is a more advanced form of system call, and the term is sometimes also used to mean a system call. For example, a programmer uses the API to set in advance the names of the user dictionaries to be used and their priorities. When collating, the text analysis unit 2 calls the API and, based on it, stores the registered word data of each user dictionary in the user dictionary table in priority order.
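User dictionary application processing S2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `build_user_table`, the convention that a smaller number means a higher priority, and the sample dictionaries are all hypothetical, and the patent does not define the actual API.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (word, reading)


def build_user_table(dictionaries: Dict[str, Dict[str, str]],
                     priorities: Dict[str, int]) -> List[Entry]:
    """Store the entries of each user dictionary, highest priority first.

    Assumption: a smaller priority number means a higher priority.
    """
    table: List[Entry] = []
    # Visit the dictionaries in priority order and append their entries.
    for name in sorted(priorities, key=lambda n: priorities[n]):
        for word, reading in dictionaries[name].items():
            table.append((word, reading))
    return table


# Hypothetical user dictionaries: personal names and place names.
dictionaries = {
    "places": {"神戸": "こうべ"},
    "names": {"小野": "おの", "神戸": "かんべ"},
}
# The names dictionary is given priority over the places dictionary.
priorities = {"names": 1, "places": 2}
user_table = build_user_table(dictionaries, priorities)
```

Because the entries of the higher-priority dictionary come first in the table, a collation routine that scans the table from the top will encounter the personal-name reading of 神戸 before the place-name reading.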

[0013] In word division processing S3, the text analysis unit 2 refers to the user dictionary table and the system dictionary table and divides the text data. The text data is divided by collating it against the word data held in the user dictionary table and the system dictionary table; when matching word data is found, the text data is recognized as containing the same word as that word data. The user dictionary table and the system dictionary table hold reading data for each word data. The text analysis unit 2 transmits the reading data corresponding to the recognized word data.
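Word division processing S3 could be sketched as a simple left-to-right scan in which the first matching table entry wins. The patent does not describe the matching algorithm at this level of detail, so the scan strategy, the function name `segment`, and the handling of unmatched characters (emitted one at a time with no reading) are assumptions.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # (word, reading)


def segment(text: str,
            user_table: List[Entry],
            system_table: List[Entry]) -> List[Tuple[str, Optional[str]]]:
    """Split `text` into words, trying user table entries before system ones."""
    # User entries are tried first, in the order they were stored.
    candidates = list(user_table) + list(system_table)
    result: List[Tuple[str, Optional[str]]] = []
    pos = 0
    while pos < len(text):
        for word, reading in candidates:
            # Skip empty words so the scan always advances.
            if word and text.startswith(word, pos):
                result.append((word, reading))
                pos += len(word)
                break
        else:
            # No entry matched: emit the single character without a reading.
            result.append((text[pos], None))
            pos += 1
    return result


# Hypothetical tables: the user table overrides the system reading of 神戸.
user_table = [("神戸", "かんべ")]
system_table = [("神戸", "こうべ"), ("さん", "さん")]
words = segment("神戸さんへ", user_table, system_table)
```

The resulting list pairs each recognized word with the reading data that would be passed on to the voice data generation unit.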

[0014] In voice data generation processing S4, speech synthesis data is generated on the basis of the transmitted reading data and the voice data recorded in the voice data dictionary 6. In voice output processing S5, the generated speech synthesis data is output through the voice output unit 7 to a voice output device such as a speaker and sounded, or is saved as a file.

[0015] As described above, according to the first embodiment, priorities are assigned to the user dictionaries by means of an API or a system call and the text data is divided accordingly, so speech synthesis data can be created with reference to a plurality of user dictionaries. A plurality of dictionaries can therefore be used without merging the user dictionaries beforehand, and user dictionaries can be created per purpose, so that, for example, dictionary data can be corrected easily and in less time.

[0016] Embodiment 2. In the first embodiment described above, the priority order of the user dictionaries is assigned by means of an API or a system call, but the present invention is not limited to this. For example, a settings file, such as the registry or resource files used by various operating systems, may be provided, and the priorities may be set by writing them in it. The text analysis unit 2 then stores the word data of each user dictionary in the user dictionary table in priority order according to the settings written in that file. In the case of a registry, for example, the name of each user dictionary to be used is written under a registry key as a key, with its priority as the value. Setting the priorities in a settings file in this way allows the priority order of the user dictionaries to be set from outside the program, unlike setting via an API, which makes the approach more general and makes corrections simpler.
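The settings-file approach can be illustrated with a short Python sketch. As a stand-in for a registry, it uses INI-style text parsed with the standard `configparser` module, where each key is a user dictionary name and each value is its priority. The section name `UserDictionaries`, the dictionary names, and the convention that a smaller number means a higher priority are assumptions for illustration.

```python
import configparser
from typing import List

# Hypothetical settings text: dictionary name = priority.
SETTINGS = """
[UserDictionaries]
places = 2
names = 1
terms = 3
"""


def load_priority_order(settings_text: str) -> List[str]:
    """Return the user dictionary names, highest priority first."""
    parser = configparser.ConfigParser()
    parser.read_string(settings_text)
    section = parser["UserDictionaries"]
    # Sort the dictionary names by their numeric priority value.
    return sorted(section, key=lambda name: int(section[name]))


order = load_priority_order(SETTINGS)
```

Changing the priority order then only requires editing the settings text; the program that builds the user dictionary table stays unchanged.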

[0017] Embodiment 3. In the embodiments described above, the plurality of user dictionaries are prioritized and stored in the user dictionary table, and each user dictionary is referred to, but the present invention is not limited to this. For example, the plurality of user dictionaries may be kept as they are and, when the table is populated, merged, sorted in descending order of the number of characters, stored, and then collated with the text data. Collation with the text data is therefore performed starting from the data with the most characters (when all codes are full-width character codes, the data with the largest data size).
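The merge-and-sort variant can be sketched in Python as follows. The function names `build_merged_table` and `first_match` and the sample entries are hypothetical; the sketch assumes that collation simply scans the sorted table from the top, so that the longest registered word is tried first.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, str]  # (word, reading)


def build_merged_table(dictionaries: List[Dict[str, str]]) -> List[Entry]:
    """Merge all user dictionaries and sort by character count, longest first."""
    merged: List[Entry] = []
    for entries in dictionaries:
        merged.extend(entries.items())
    # The sort is stable, so entries of equal length keep their merge order.
    merged.sort(key=lambda entry: len(entry[0]), reverse=True)
    return merged


def first_match(text: str, table: List[Entry]) -> Optional[Entry]:
    """Return the first table entry that `text` starts with, if any."""
    for word, reading in table:
        if text.startswith(word):
            return (word, reading)
    return None


# Hypothetical user dictionaries with overlapping prefixes.
names = {"小野": "おの"}
places = {"小野田": "おのだ", "東京都": "とうきょうと"}
merged_table = build_merged_table([names, places])
```

Because 小野田 is stored ahead of 小野, the text 小野田へ is matched against the three-character entry first rather than being split after the shorter word.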

[0018] Normally, a character code string in the text data is most likely to match a given word data at a place in the sentence represented by the text data where the word meant by that word data actually occurs. Moreover, the more characters the word data has, the higher this probability becomes. Collating from the data with the most characters therefore improves the accuracy with which the text data is divided and allows speech synthesis processing to be performed more efficiently.

[0019] Embodiment 4. The embodiments above describe applying user dictionaries in a speech synthesis engine, but the present invention is not limited to this and can also be applied to other systems whose purpose is to convert input character codes, such as automatic kanji conversion in a word processor.

[0020]

[Effects of the Invention] As described above, according to the present invention, the collation priority order of the plurality of user-defined word dictionaries is determined in advance, the user-defined word data storage means stores the user-defined word data in the user dictionary table, which is a data storage area, and the matching processing means collates the user-defined word data stored in the user dictionary table with the input character code data and performs processing. A plurality of user-defined word dictionaries can therefore be used without merging them beforehand, and user-defined word dictionaries can be created per purpose, so that, for example, user-defined word data can be corrected easily and in less time.

[0021] Further, according to the present invention, the user-defined word data storage means merges the user-defined word data in the plurality of user-defined word dictionaries and sorts it in descending order of the number of words, and the matching processing means collates the user-defined word data stored in the user dictionary table with the input character code data and performs processing. This exploits the fact that, although a longer character string is less likely to match, when it does match it is unlikely to be open to any other interpretation, and the accuracy of the collation result can thereby be improved.

[Brief description of the drawings]

FIG. 1 is a block diagram of a text-to-speech synthesis engine to which a user-defined word dictionary matching device according to a first embodiment of the present invention is applied.

FIG. 2 is a diagram showing the operation of the speech synthesis engine of FIG. 1.

[Explanation of symbols]

1 Text input unit
2 Text analysis unit
3 User-defined word data dictionary
4 System word data dictionary
5 Voice data generation unit
6 Voice data dictionary
7 Voice output unit

Claims

[Claims]

1. A user-defined word dictionary matching device that holds a plurality of user-defined word dictionaries, each storing at least a plurality of user-defined word data, collates the user-defined word data with character code data, and performs processing based on the result, the device comprising: user-defined word data storage means for storing the user-defined word data in a user dictionary table, which is a data storage area, in a predetermined priority order of the user-defined word dictionaries; and matching processing means for collating the user-defined word data stored in the user dictionary table with input character code data and performing processing.

2. The user-defined word dictionary matching device according to claim 1, wherein the priority order is described and set by an application program interface or a system call.

3. The user-defined word dictionary matching device according to claim 1, wherein the priority order is described and set in a setting information file, and the user-defined word data storage means stores the user-defined word data based on the setting information file.

4. A user-defined word dictionary matching device that holds a plurality of user-defined word dictionaries, each storing at least a plurality of user-defined word data, collates the user-defined word data with character code data, and performs processing based on the result, the device comprising: user-defined word data storage means for merging the user-defined word data in the plurality of user-defined word dictionaries, sorting it in descending order of the number of words, and storing it in a user dictionary table, which is a data storage area; and matching processing means for collating the user-defined word data stored in the user dictionary table with input character code data and performing processing.

5. The user-defined word dictionary matching device according to claim 1, further comprising voice conversion means for converting reading data, which indicates how to read the word represented by the user-defined word data, into voice data, wherein the matching processing means collates the user-defined word data with the character code data and outputs, to the voice conversion means, the reading data corresponding to the user-defined word data that matches the character code data.

6. The user-defined word dictionary matching device according to claim 5, further comprising a voice output unit that converts the voice data into a sound.