JP2004037528A

JP2004037528A - Information processor and information processing method

Info

Publication number: JP2004037528A
Application number: JP2002190545A
Authority: JP
Inventors: Kenichiro Nakagawa; 中川　賢一郎; Hiroki Yamamoto; 山本　寛樹; Tsuyoshi Yagisawa; 八木沢　津義
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-06-28
Filing date: 2002-06-28
Publication date: 2004-02-05

Abstract

PROBLEM TO BE SOLVED: A method that produces a recognition vocabulary merely from an input sentence containing that vocabulary cannot create an utterance dictionary for speech synthesis, which requires accents and parts of speech, and it also makes it difficult to edit erroneous readings automatically attached to the recognition vocabulary.

SOLUTION: The notation (orthographic) information of a word is input (S211) or edited (S213); on the basis of the input or edited notation information, word information indicating the reading, accent, and part of speech is complemented (S214); the complemented word information is corrected as necessary (S215-S217); and the created word information is output (S219).

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は情報処理装置およびその方法に関し、例えば、音声認識や音声合成で用いられる単語情報を効率よく編集、作成するための情報処理に関する。
【０００２】
【従来の技術】
近年の機器性能およびソフトウェア技術の向上により、ユーザが発声した音声を認識して、機器に対するコマンドとして用いるシステムが開発されている。これらシステムに用いられる多くの音声認識装置は、前もって認識可能な語彙（認識語彙）を登録しておく必要がある。認識語彙の登録は、音声あるいは発声内容を含むテキスト（例えば単語の読みを表すカタカナ）によって行う。また、音声によって認識語彙を登録する場合は、登録ユーザと利用ユーザとが異なると認識率が下がる欠点がある。このため、現状は、テキストによって認識語彙を登録する装置が多い。
【０００３】
また、電子化されたテキストを音声情報に変換する音声合成技術も実用化されている。音声合成は、基本的に、ユーザが単語を登録する必要はないが、装置にとって未知の語彙を正確に発声させようとする場合、語彙の発声辞書が必要になる。発声辞書は、語彙の読みのほか、アクセント情報（読みの何処でアクセントが上がる、下がる）や、その単語の品詞情報も含む場合がある。
【０００４】
【発明が解決しようとする課題】
音声認識を利用するには、上述したように認識語彙が必要である。また、音声合成で単語を正確に発声させるには、上述したように発声辞書が必要である。これらのデータ（以下、まとめて「単語辞書」と呼ぶ）は、認識語彙や発声内容が変化しなければ、システムの構築時に一度作成すれば済むが、システムの対話内容が変化する場合には単語辞書をメンテナンスする必要が生じる。
【０００５】
特開２００２−４１０８１公報には、文字列を入力として受け付け、その文字列を構成単語ごとに分割し、各単語の読み情報を自動補完することで、音声認識用の認識語彙を作成する技術が開示されている。この技術は、認識語彙が含まれる文章を入力するだけで認識語彙が生成されるため、ユーザの負荷が少ない。しかし、この方式ではアクセント、品詞などが必要な音声合成用の発声辞書を作成することはできず、また、認識語彙に自動的に付加された誤った読みを編集することも難しい。
【０００６】
本発明は、上述の問題を個々にまたはまとめて解決するためのもので、音声認識や音声合成で用いられる単語情報を効率よく編集、作成することを目的とする。
【０００７】
また、単語情報に自動付加された情報の訂正を可能にすることを他の目的とする。
【０００８】
【課題点を解決するための手段】
本発明は、前記の目的を達成する一手段として、以下の構成を備える。
【０００９】
本発明にかかる情報処理装置は、単語の表記情報を入力または編集する入力編集手段と、入力または編集された表記情報に基づき、その単語の読み、アクセントおよび品詞を示す単語情報を補完する補完手段と、前記補間手段によって補完された単語情報を、必要に応じて訂正する訂正手段と、作成された単語情報を出力する出力手段とを有することを特徴とする。
【００１０】
本発明にかかる情報処理方法は、単語の表記情報を入力または編集し、入力または編集された表記情報に基づき、その単語の読み、アクセントおよび品詞を示す単語情報を補完し、補完された単語情報を、必要に応じて訂正し、作成された単語情報を出力することを特徴とする。
【００１１】
好ましくは、さらに、前記単語情報の利用形態を設定し、前記利用形態に応じて前記単語情報を出力することを特徴とする。
【００１２】
【発明の実施の形態】
以下、本発明にかかる一実施形態の単語辞書を編集する情報処理装置（以下「単語辞書編集装置」と呼ぶ）を図面を参照して詳細に説明する。
【００１３】
［構成］
図１は単語辞書を編集する装置の構成例を示すブロック図である。
【００１４】
単語辞書編集装置１０１には、入力装置としてキーボード１０４およびマウスなどのポインティングデバイス１０５が接続され、編集作業や登録単語の情報をユーザに通知する出力装置としてディスプレイ１０２およびスピーカ１０３が接続されている。なお、これら装置を一体に構成することも可能である。また、単語辞書編集装置１０１は専用の装置として構成することもできるが、単語辞書編集を実行するソフトウェアをコンピュータ機器（ＰＣ）に供給することでも実現可能である。
【００１５】
入力装置によって入力される、単語編集者（以下「ユーザ」と呼ぶ）の入力操作を示す情報（操作情報）は、単語辞書編集装置１０１の操作情報入力部１１０によって解釈され、その操作情報は単語データ編集部１０８へ送られて、単語データ編集部１０８により、各単語の表記や読み情報の編集、アクセント設定などが行われる。そして、編集された単語データは、単語データ管理部１０９へ送られて、単語辞書１１３へ格納される。単語辞書１１３は、例えばハードディスクや半導体メモリカードに割り付けられた領域に格納されていて、外部の音声認識装置や音声合成装置は、所定のインタフェイスを介して、単語辞書１１３に格納された単語データを読み込み、音声認識や音声合成処理を行う。
【００１６】
単語データ編集部１０８によって単語表記が編集された場合、単語データ自動補完部１１１は、その単語表記から読みやアクセント情報を自動的に付加する自動補完処理を行う。この自動補間処理により、表記に対応する読みやアクセントを入力するユーザの手間を省くことができ、ユーザによる読みやアクセントの設定は、主に、自動補完処理に誤りがあった場合に限られる。自動補完処理は、詳細は後述するが、単語の表記に基づき言語辞書１１４を検索し、言語辞書１１４に格納された読み、アクセントおよび品詞を設定する処理である。
【００１７】
情報出力部１０６は、出力装置を用いて、ユーザの操作情報をフィードバックし、また、単語データ管理部１０９が管理する単語データを、逐次、ユーザに提示する。具体的には、ユーザが音声出力によって単語を確認するコマンドを入力すると、情報出力部１０６は、選択された単語情報（読み、アクセントおよび品詞）および音声合成用素片データ１１２を用いる音声合成処理を音声合成部１０７に実行させ、生成された音声波形をスピーカ１０３に出力する。なお、音声合成処理は、公知の技術を用いているため、詳しい説明は省略する。
【００１８】
なお、言語辞書１１４および音声合成用素片データ１１２は、単語辞書１１３と同様に、例えばハードディスクや半導体メモリカードに割り付けられた領域に格納されている。
【００１９】
［処理］
図２は単語辞書の編集処理の一例を示すフローチャートである。なお、以下では、単語辞書編集装置１０１が、図３に示すようなユーザインタフェイス（ＵＩ）画面を有するダイアログベースのアプリケーションソフトウェアとしてＰＣなどに実装された場合を説明する。
【００２０】
単語辞書編集装置１０１は、起動されるとユーザ操作情報を読み込み（Ｓ２０１）、読み込んだユーザ操作情報に対応する処理を行う下記のループに入る。言い換えれば、ユーザの操作によって発生するイベントに基づき各処理を呼び出す。
【００２１】
ＵＩ画面のボタンあるいはメニューによって「新規単語追加」が指示されると（Ｓ２０２）、新規単語用に空のレコード（情報群）を作成する（Ｓ２１１）。図３に示すＵＩ画面は一行が一つの単語情報を示し、新規単語追加は行を追加する処理に対応し、具体的には、ＵＩ画面のＩＤ列４０２に新規単語のＩＤを例えば数字列で付加する。なお、ＩＤは、登録単語に対してユニークに割り振ることが望ましい。
【００２２】
ＵＩ画面に表示された特定の単語が選択され、ボタンあるいはメニューによって「単語削除」が指示されると（Ｓ２０３）、その単語情報のレコードを削除する（Ｓ２１２）。
【００２３】
マウス１０５などにより、表記列４０３のセルが選択されると（Ｓ２０４）、「表記編集」が指示されたとして、キーボード１０４などを介して入力される文字列を選択セルに表示して、その文字列を単語の表記情報として設定し（Ｓ２１３）、その表記情報を用いて単語情報（読み、アクセントおよび品詞）を自動補完する（Ｓ２１４）。
【００２４】
マウス１０５などにより、読み列４０４のセルが選択されると（Ｓ２０５）、「読み修正処理」が指定されたとして、キーボード１０４などを介して入力される文字列を選択セルに表示して、その文字列を単語の読み情報として設定する（Ｓ２１５）。
【００２５】
マウス１０５などにより、アクセント列４０５のセルが選択されると（Ｓ２０６）、「アクセント修正処理」が指定されたとして、キーボード１０４などを介して入力される文字列を選択セルに表示して、その文字列を単語のアクセント情報として設定する（Ｓ２１６）。その際、図４に示すようなアクセント設定用の別のＵＩ画面を開き、グラフィカルユーザインタフェイス（ＧＵＩ）を利用してアクセントを設定することも可能である。図４に示すＧＵＩの例は、単語の読み情報のモーラ（発声の単位）ごとに一つずつアクセント指定ボタン６０２を割り当て、各ボタンの状態によってアクセントを指定するものである。ボタンの状態と、アクセントの高低との関係は、例えば、次のように定める。
ボタンが押された状態　　　　…　アクセントが低い
ボタンが押されていない状態　…　アクセントが高い
【００２６】
マウス１０５などにより、品詞列４０６のセルが選択されると（Ｓ２０７）、「品詞修正処理」が指定されたとして、キーボード１０４などを介して入力されるまたはリストボックスなどから選択される品詞名を示す文字列を選択セルに表示して、その品詞名を単語の品詞情報として設定する（Ｓ２１７）。
【００２７】
特定の単語（複数の単語でもよい）が選択され、ボタンあるいはメニューによって「音声確認」が指示されると（Ｓ２０８）、その単語の読み／アクセントをユーザに確認させるための音声合成処理および音声出力を行う（Ｓ２１８）。なお、複数の単語が選択された場合、例えばＵＩ画面の上から順に単語の音声を出力する。また、図４に示すように、アクセント設定用のＧＵＩにも音声確認ボタン６０３を用意して、アクセント設定時に音声確認が行えるようにしてもよい。
【００２８】
ボタンあるいはメニューによって「ファイル書き出し」が指示された場合（Ｓ２０９）、現在登録されている全単語を単語辞書１１３としてファイル出力する（Ｓ２１９）。その際、出力する単語辞書１１３の形式を選択できるようにしてもよいし、音声合成用の発声辞書や、音声認識用の認識語彙データ（音声認識用の認識文法に従う単語情報）として出力することを明示して、ファイルに書き出してもよい。図５はファイルの保存先を設定するＵＩ画面の一例であるが、このＵＩ画面の中で、データフォーマット選定リスト７０２により、書き出すファイルの形式を選択できる。図６はファイル出力された単語辞書１１３の一例を示す図である。なお、ファイルの出力先は、装置１０１のハードディスクや半導体メモリカードであるが、所定のインタフェイスを備えることによって、ＩＥＥＥ１３９４やＵＳＢ（Ｕｎｉｖｅｒｓａｌ　Ｓｅｒｉａｌ　Ｂｕｓ）などのシリアルバスや、ＢｌｕｅｔｏｏｔｈやＩｒＤＡなどの無線インタフェイスを介して、音声認識装置や音声合成装置のメモリに出力することも可能である。
【００２９】
また、ボタンあるいはメニューによって「終了」が指示された場合（Ｓ２１０）、処理を終了する。その際、ファイル出力していない単語情報があれば、ユーザにファイル出力を促すダイアログを提示してもよい。
【００３０】
［単語情報の補完］
図７は単語情報の自動補完処理の一例を示すフローチャートで、単語の表記が新規に入力または編集された場合にこの処理が実行される。
【００３１】
まず、単語の表記を用いて言語辞書１１４を検索（辞書引き）し（Ｓ３０１）、その単語表記に対応する単語が言語辞書１１４に格納されている否かを判定する（Ｓ３０２）。対応する単語が格納されている場合、言語辞書１１４から対応する単語情報（読み、アクセントおよび品詞）を取り出し（Ｓ３０６）、取り出した単語情報を、入力または編集された単語表記に対応する読み、アクセントおよび品詞に設定する（Ｓ３０７）。
【００３２】
また、入力された単語表記に対応する単語が検索されなかった場合は、その単語表記を細分して個々の表記を検索する（Ｓ３０３）。単語表記の分割は、形態素ごとに区切ればよいが、形態素に区切っても検索されない場合は一文字ごとに分割する。そして、分割された表記それぞれに対応する読みを、分割前の表記順に繋げ（Ｓ３０４）、アクセントはデフォルト値（例えば０型）に、品詞は名詞にして（Ｓ３０５）、入力または編集された単語表記に対応する読み、アクセントおよび品詞に設定する（Ｓ３０７）。
【００３３】
例えば「ＹＲＰ野比」という表記から読み、アクセントおよび品詞を自動補完する場合、次のような処理になる。
【００３４】
まず「ＹＲＰ野比」が言語辞書１１４から辞書引きされ、「ＹＲＰ野比」が検索されなかったとすると、「ＹＲＰ野比」が形態素解析される。解析の結果、「Ｙ」「Ｒ」「Ｐ」および「野比」に分割され、それぞれが辞書引きされて「ワイ」「アール」「ピー」「ノビ」という読みが得られる。これらを一つの読み「ワイアールピーノビ」と繋げて読みとし、アクセントは０型、品詞は名詞に設定する。
【００３５】
［キーオペレーション］
図８は単語情報補完装置１０１で可能なキーオペレーション用のキー割り当ての一覧を示す図である。図８に示すように、装置１０１はＵＩ画面のボタンやメニューからだけではなく、キー操作によっても各種コマンド指示を行うことができる。
【００３６】
【変形例】
図９は、図３に示したＵＩ画面の変形例を示す図である。この例では、ＵＩ画面に単語利用先を指定する列９０２が追加されている。列９０２の各セルにはリストボックスなどから選択可能な利用先を示す「音声合成」「音声認識」「音声合成・認識両方」が入力可能である。
【００３７】
利用先情報は、その単語を音声認識用の認識語彙データとして用いるか、音声合成用の発声辞書として用いるか、あるいは、その双方に用いるかを示す情報で、この設定値により、単語辞書編集装置１０１の処理を変えることができる。例えば、ステップＳ２１９において、認識語彙データを出力する場合は「音声認識」「音声合成・認識両方」が設定された単語だけを出力し、同様に、発声辞書を出力する場合は「音声合成」「音声合成・認識両方」が設定された単語だけを出力することができる。
【００３８】
このようにすれば、図９に示す一つのＵＩ画面で、音声認識用および音声合成用の語彙を同時に管理することが可能になる。
【００３９】
図１０は、図４に示したＵＩ画面の変形例を示す図である。この例では、アクセントの高低をボタンで指示するのではなく、スライダバー１００２を用いてより細かく指示することができる。また、スライダバー１００２の各スライダを結ぶ曲線（アクセントイメージ）１００４により、単語に設定されたアクセントをグラフィックス表示することができる。
【００４０】
なお、上記では、、単語情報が表記、読み、アクセントおよび品詞で構成されると説明したが、それら以外の情報を含めることができる。例えば、出現確率、重要度、音声認識された際に実行する処理名、音声合成する際の波形辞書名、並びに、音声合成する際の速度・音程のパラメータなどを含めることができる。
【００４１】
勿論、単語辞書編集装置１０１は日本語以外の言語にも適用可能である。
【００４２】
【他の実施形態】
なお、本発明は、複数の機器（例えばホストコンピュータ、インタフェイス機器、リーダ、プリンタなど）から構成されるシステムに適用しても、一つの機器からなる装置（例えば、複写機、ファクシミリ装置など）に適用してもよい。
【００４３】
また、本発明の目的は、前述した実施形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体（または記録媒体）を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（またはＣＰＵやＭＰＵ）が記憶媒体に格納されたプログラムコードを読み出し実行することによっても、達成されることは言うまでもない。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体は本発明を構成することになる。また、コンピュータが読み出したプログラムコードを実行することにより、前述した実施形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼働しているオペレーティングシステム（ＯＳ）などが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００４４】
さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張カードやコンピュータに接続された機能拡張ユニットに備わるメモリに書込まれた後、そのプログラムコードの指示に基づき、その機能拡張カードや機能拡張ユニットに備わるＣＰＵなどが実際の処理の一部または全部を行い、その処理によって前述した実施形態の機能が実現される場合も含まれることは言うまでもない。
【００４５】
本発明を上記記憶媒体に適用する場合、その記憶媒体には、先に説明したフローチャートに対応するプログラムコードが格納されることになる。
【００４６】
【発明の効果】
以上説明したように、本発明によれば、
【００４７】
本単語辞書編集装置において、単語の表記を入力すれば、その読み、アクセント、品詞などが自動補完される。これら補完された単語情報が間違っていた場合だけ、それらの内容を編集すればよいため、ユーザの手間を大幅に削減することが可能になる。
【図面の簡単な説明】
【図１】単語辞書を編集する装置の構成例を示すブロック図、
【図２】単語辞書の編集処理の一例を示すフローチャート、
【図３】ユーザインタフェイス画面の一例を示す図、
【図４】アクセント設定用のユーザインタフェイス画面の一例を示す図、
【図５】ファイルの保存先を設定するユーザインタフェイス画面の一例、
【図６】ファイル出力された単語辞書の一例を示す図、
【図７】単語情報の自動補完処理の一例を示すフローチャート、
【図８】単語情報補完装置で可能なキーオペレーション用のキー割り当ての一覧を示す図、
【図９】図３に示したユーザインタフェイス画面の変形例を示す図、
【図１０】図４に示したユーザインタフェイス画面の変形例を示す図である。
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an information processing apparatus and method, for example, to information processing for efficiently editing and creating word information used in speech recognition and speech synthesis.
[0002]
[Prior art]
Owing to recent improvements in device performance and software technology, systems have been developed that recognize speech uttered by a user and use it as commands for a device. Many of the speech recognition devices used in these systems require the recognizable vocabulary (recognition vocabulary) to be registered in advance. The recognition vocabulary is registered either by voice or by text containing the utterance content (for example, katakana representing the reading of a word). Registration by voice has the drawback that the recognition rate drops when the user who registered the vocabulary differs from the user who uses the device. For this reason, most current devices register the recognition vocabulary by text.
[0003]
Speech synthesis techniques for converting digitized text into speech have also been put to practical use. Speech synthesis basically does not require the user to register words, but an utterance dictionary for the vocabulary is needed when vocabulary unknown to the device is to be uttered accurately. In addition to the reading of each word, the utterance dictionary may include accent information (where in the reading the accent rises or falls) and part-of-speech information for the word.
[0004]
[Problems to be solved by the invention]
To use speech recognition, a recognition vocabulary is required as described above. Likewise, to utter words accurately in speech synthesis, an utterance dictionary is required as described above. These data (hereinafter collectively referred to as a “word dictionary”) need only be created once, when the system is built, as long as the recognition vocabulary and utterance content do not change; if the dialog content of the system changes, however, the word dictionary needs to be maintained.
[0005]
Japanese Patent Application Laid-Open No. 2002-41081 discloses a technique that creates a recognition vocabulary for speech recognition by accepting a character string as input, dividing it into its constituent words, and automatically complementing the reading information of each word. With this technique, the recognition vocabulary is generated merely by inputting a sentence that contains it, so the load on the user is small. However, this method cannot create an utterance dictionary for speech synthesis, which requires accents, parts of speech, and the like, and it is also difficult to edit erroneous readings automatically added to the recognition vocabulary.
[0006]
The present invention is to solve the above-mentioned problems individually or collectively, and aims to efficiently edit and create word information used in speech recognition and speech synthesis.
[0007]
Another object of the present invention is to enable correction of information automatically added to word information.
[0008]
[Means for solving the problems]
The present invention has the following configuration as one means for achieving the above object.
[0009]
An information processing apparatus according to the present invention comprises: input editing means for inputting or editing notation information of a word; complementing means for complementing, based on the input or edited notation information, word information indicating the reading, accent, and part of speech of the word; correction means for correcting, as necessary, the word information complemented by the complementing means; and output means for outputting the created word information.
[0010]
An information processing method according to the present invention inputs or edits notation information of a word; complements, based on the input or edited notation information, word information indicating the reading, accent, and part of speech of the word; corrects the complemented word information as necessary; and outputs the created word information.
[0011]
Preferably, the method further comprises setting a usage mode of the word information, and outputting the word information according to the usage mode.
[0012]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an information processing apparatus for editing a word dictionary (hereinafter, referred to as a “word dictionary editing apparatus”) according to an embodiment of the present invention will be described in detail with reference to the drawings.
[0013]
[Configuration]
FIG. 1 is a block diagram showing a configuration example of a device for editing a word dictionary.
[0014]
A keyboard 104 and a pointing device 105 such as a mouse are connected as input devices to the word dictionary editing device 101, and a display 102 and a speaker 103 are connected as output devices for notifying a user of editing work and information of registered words. In addition, it is also possible to integrally configure these devices. The word dictionary editing device 101 can be configured as a dedicated device, but can also be realized by supplying software for executing word dictionary editing to a computer device (PC).
[0015]
Information (operation information) indicating an input operation performed by a word editor (hereinafter referred to as the “user”) through the input devices is interpreted by the operation information input unit 110 of the word dictionary editing device 101 and sent to the word data editing unit 108, which edits the notation and reading information of each word, sets accents, and so on. The edited word data is then sent to the word data management unit 109 and stored in the word dictionary 113. The word dictionary 113 is stored in, for example, an area allocated on a hard disk or a semiconductor memory card, and an external speech recognition device or speech synthesis device reads the word data stored in the word dictionary 113 via a predetermined interface and performs speech recognition or speech synthesis processing.
[0016]
When the word data editing unit 108 edits a word notation, the word data automatic complementing unit 111 performs automatic completion processing that automatically adds reading and accent information derived from that notation. This automatic completion processing saves the user the trouble of entering the reading and accent corresponding to the notation, so the user mainly needs to set the reading or accent only when the automatic completion is wrong. As described in detail later, the automatic completion processing searches the language dictionary 114 based on the notation of the word and sets the reading, accent, and part of speech stored in the language dictionary 114.
[0017]
The information output unit 106 uses the output devices to feed back the user's operation information and to present the word data managed by the word data management unit 109 to the user as it changes. Specifically, when the user inputs a command to confirm a word by voice output, the information output unit 106 causes the speech synthesis unit 107 to execute speech synthesis processing using the selected word information (reading, accent, and part of speech) and the speech synthesis segment data 112, and outputs the generated speech waveform to the speaker 103. Because the speech synthesis processing uses known techniques, a detailed description is omitted.
[0018]
Note that, similarly to the word dictionary 113, the language dictionary 114 and the speech synthesis segment data 112 are stored in, for example, an area allocated to a hard disk or a semiconductor memory card.
[0019]
[Processing]
FIG. 2 is a flowchart illustrating an example of a word dictionary editing process. The following describes a case in which the word dictionary editing apparatus 101 is implemented on a PC or the like as dialog-based application software having a user interface (UI) screen as shown in FIG. 3.
[0020]
When activated, the word dictionary editing device 101 reads user operation information (S201), and enters the following loop for performing processing corresponding to the read user operation information. In other words, each process is called based on an event generated by a user operation.
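The event-driven loop of FIG. 2 can be illustrated with a small dispatch table. This is only a sketch under assumptions: the event names, the `HANDLERS` table, and the placeholder handlers that merely log a step label are all hypothetical and do not come from the patent; only the step numbers follow the flowchart.

```python
from typing import Callable, Dict, List

# Records which flowchart steps were executed; stands in for real processing.
log: List[str] = []


def make_handler(step: str) -> Callable[[], None]:
    """Build a placeholder handler that only logs its step label."""
    def handler() -> None:
        log.append(step)
    return handler


# Hypothetical event names mapped to the steps of FIG. 2.
HANDLERS: Dict[str, Callable[[], None]] = {
    "add_word": make_handler("S211"),
    "delete_word": make_handler("S212"),
    "edit_notation": make_handler("S213-S214"),
    "edit_reading": make_handler("S215"),
    "edit_accent": make_handler("S216"),
    "edit_pos": make_handler("S217"),
    "voice_check": make_handler("S218"),
    "write_file": make_handler("S219"),
}


def run(events: List[str]) -> None:
    """Read each user event (S201) and dispatch it until "quit" (S210)."""
    for event in events:
        if event == "quit":
            break
        handler = HANDLERS.get(event)
        if handler is not None:  # unknown events are ignored in this sketch
            handler()


run(["add_word", "edit_notation", "write_file", "quit", "delete_word"])
print(log)
```

A table keyed by event keeps the loop itself free of per-command branching, which roughly mirrors the chain of decisions S202 to S210 in the flowchart.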
[0021]
When "Add new word" is instructed by a button or menu on the UI screen (S202), an empty record (information group) is created for the new word (S211). On the UI screen shown in FIG. 3, each row represents one piece of word information, and adding a new word corresponds to adding a row; specifically, an ID for the new word, for example a numeric string, is added to the ID column 402 of the UI screen. IDs are preferably assigned uniquely to the registered words.
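As an illustration of step S211, the sketch below creates empty records with unique numeric IDs. The `WordRecord` and `WordTable` names and the field layout are assumptions made for this example, as is the policy of never reusing the ID of a deleted word; the patent only says that unique IDs are desirable.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict


@dataclass
class WordRecord:
    """One row of the UI table; all fields except the ID start empty."""
    word_id: int
    notation: str = ""
    reading: str = ""
    accent: str = ""
    pos: str = ""


class WordTable:
    """Hypothetical in-memory store for the registered words."""

    def __init__(self) -> None:
        self._next_id = count(1)  # monotonically increasing, never reused
        self.records: Dict[int, WordRecord] = {}

    def add_word(self) -> WordRecord:
        """Create an empty record with a fresh ID (S211)."""
        record = WordRecord(word_id=next(self._next_id))
        self.records[record.word_id] = record
        return record

    def delete_word(self, word_id: int) -> None:
        """Remove a record if it exists; its ID is not handed out again."""
        self.records.pop(word_id, None)


table = WordTable()
first = table.add_word()
second = table.add_word()
table.delete_word(first.word_id)
third = table.add_word()
print(sorted(table.records))
```

Drawing IDs from a counter rather than from the current row count keeps them unique even after rows are deleted.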
[0022]
When a specific word displayed on the UI screen is selected and "delete word" is instructed by a button or a menu (S203), the record of the word information is deleted (S212).
[0023]
When a cell of the notation column 403 is selected with the mouse 105 or the like (S204), it is determined that "edit notation" has been instructed; a character string input via the keyboard 104 or the like is displayed in the selected cell and set as the notation information of the word (S213), and the word information (reading, accent, and part of speech) is automatically complemented using that notation information (S214).
[0024]
When a cell of the reading column 404 is selected by the mouse 105 or the like (S205), it is determined that "reading correction process" has been designated, and a character string input via the keyboard 104 or the like is displayed in the selected cell. The character string is set as word reading information (S215).
[0025]
When a cell of the accent column 405 is selected with the mouse 105 or the like (S206), it is determined that “accent correction processing” has been designated, and a character string input via the keyboard 104 or the like is displayed in the selected cell and set as the accent information of the word (S216). At this time, it is also possible to open a separate UI screen for accent setting, as shown in FIG. 4, and set the accent using a graphical user interface (GUI). In the GUI example shown in FIG. 4, one accent designation button 602 is assigned to each mora (unit of utterance) of the word's reading information, and the accent is designated by the state of each button. The relationship between the button state and the accent level is determined, for example, as follows.
Button pressed … accent is low
Button not pressed … accent is high
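The button-to-accent mapping of FIG. 4 can be sketched as follows. The mora splitter is a simplified assumption (small kana attach to the preceding character; everything else counts as its own mora), the `H`/`L` string encoding is invented for this example, and the button states shown are arbitrary rather than the correct accent of the sample reading.

```python
from typing import List

# Small kana that combine with the preceding kana into one mora (assumption).
SMALL_KANA = set("ャュョァィゥェォヮ")


def split_morae(reading: str) -> List[str]:
    """Split a katakana reading into morae with a simplified rule."""
    morae: List[str] = []
    for ch in reading:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # e.g. キ + ャ -> キャ
        else:
            morae.append(ch)
    return morae


def accent_pattern(pressed: List[bool]) -> str:
    """Map button states to accent levels: pressed = low, not pressed = high."""
    return "".join("L" if p else "H" for p in pressed)


morae = split_morae("キャノン")
buttons = [False, True, True]  # one button per mora, arbitrary example states
assert len(buttons) == len(morae)
pattern = accent_pattern(buttons)
print(morae, pattern)
```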
[0026]
When a cell of the part-of-speech column 406 is selected with the mouse 105 or the like (S207), it is determined that “part-of-speech correction processing” has been designated; a character string indicating a part-of-speech name, either input via the keyboard 104 or the like or selected from a list box or the like, is displayed in the selected cell, and that part-of-speech name is set as the part-of-speech information of the word (S217).
[0027]
When a specific word (or a plurality of words) is selected and "voice confirmation" is instructed by a button or menu (S208), speech synthesis processing and speech output are performed so that the user can confirm the reading and accent of the word (S218). When a plurality of words are selected, the speech for the words is output, for example, in order from the top of the UI screen. Also, as shown in FIG. 4, a voice confirmation button 603 may be provided in the accent-setting GUI so that the speech can be confirmed while the accent is being set.
[0028]
When "file writing" is instructed by a button or menu (S209), all currently registered words are output to a file as the word dictionary 113 (S219). At this time, the format of the output word dictionary 113 may be made selectable, or the file may be written with an explicit indication that it is output as an utterance dictionary for speech synthesis or as recognition vocabulary data for speech recognition (word information conforming to a recognition grammar for speech recognition). FIG. 5 shows an example of a UI screen for setting the save destination of the file; on this UI screen, the format of the file to be written can be selected from the data format selection list 702. FIG. 6 is a diagram showing an example of the word dictionary 113 output as a file. The file is output to the hard disk or semiconductor memory card of the device 101, but by providing a predetermined interface it can also be output to the memory of a speech recognition device or speech synthesis device via a serial bus such as IEEE1394 or USB (Universal Serial Bus), or via a wireless interface such as Bluetooth or IrDA.
[0029]
When "end" is instructed by the button or the menu (S210), the process ends. At this time, if there is word information that has not been output to a file, a dialog prompting the user to output a file may be presented.
[0030]
[Completion of word information]
FIG. 7 is a flowchart illustrating an example of a word information auto-completion process. This process is executed when a word notation is newly input or edited.
[0031]
First, the language dictionary 114 is searched (dictionary lookup) using the word notation (S301), and it is determined whether a word corresponding to the word notation is stored in the language dictionary 114 (S302). If the corresponding word is stored, the corresponding word information (reading, accent, and part of speech) is retrieved from the language dictionary 114 (S306), and the retrieved word information is set as the reading, accent, and part of speech for the input or edited word notation (S307).
[0032]
If no word corresponding to the input word notation is found, the word notation is subdivided and the individual pieces are looked up (S303). The notation is normally divided into morphemes, but if a piece is still not found after morpheme division, it is divided character by character. The readings corresponding to the divided pieces are then concatenated in their original order (S304), the accent is set to a default value (for example, type 0) and the part of speech to noun (S305), and these are set as the reading, accent, and part of speech for the input or edited word notation (S307).
[0033]
For example, when the reading, accent, and part of speech are automatically complemented from the notation “YRP Nobi” (YRP野比), the processing is as follows.
[0034]
First, “YRP Nobi” (YRP野比) is looked up in the language dictionary 114; assuming it is not found, it is subjected to morphological analysis. The analysis divides it into “Y”, “R”, “P”, and “Nobi” (野比), and each piece is looked up in the dictionary to obtain the readings “ワイ” (wai), “アール” (aaru), “ピー” (pii), and “ノビ” (nobi). These are joined into a single reading, “ワイアールピーノビ”, and the accent is set to type 0 and the part of speech to noun.
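The completion flow of FIG. 7 and the “YRP Nobi” example can be sketched as below. Several parts are assumptions made for illustration: the toy `LANGUAGE_DICT` stands in for the language dictionary 114 (the accent values stored in it are placeholders), a greedy longest-match split stands in for real morphological analysis, and pieces with no dictionary entry are simply skipped, a case the patent does not address.

```python
from typing import Dict, List, NamedTuple


class WordInfo(NamedTuple):
    reading: str
    accent: int  # accent type; 0 is the default named in the text
    pos: str


DEFAULT_ACCENT = 0
DEFAULT_POS = "noun"

# Toy stand-in for the language dictionary 114 (contents are illustrative).
LANGUAGE_DICT: Dict[str, WordInfo] = {
    "Y": WordInfo("ワイ", 0, "noun"),
    "R": WordInfo("アール", 0, "noun"),
    "P": WordInfo("ピー", 0, "noun"),
    "野比": WordInfo("ノビ", 0, "noun"),
}


def segment(notation: str) -> List[str]:
    """Greedy longest-match split; unmatched text falls back to one character."""
    pieces: List[str] = []
    i = 0
    while i < len(notation):
        for j in range(len(notation), i, -1):
            if notation[i:j] in LANGUAGE_DICT or j == i + 1:
                pieces.append(notation[i:j])
                i = j
                break
    return pieces


def complete(notation: str) -> WordInfo:
    """Return reading, accent, and part of speech for a notation."""
    entry = LANGUAGE_DICT.get(notation)            # S301, S302
    if entry is not None:
        return entry                               # S306, S307
    pieces = segment(notation)                     # S303
    reading = "".join(                             # S304
        LANGUAGE_DICT[p].reading for p in pieces if p in LANGUAGE_DICT
    )
    return WordInfo(reading, DEFAULT_ACCENT, DEFAULT_POS)  # S305, S307


info = complete("YRP野比")
print(segment("YRP野比"), info)
```

Because the fallback always assigns accent type 0 and noun, the result is only a starting point; the correction steps S215 to S217 are where a user would fix it.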
[0035]
[Key operation]
FIG. 8 is a diagram showing a list of key assignments for key operations that can be performed by the word information complementing apparatus 101. As shown in FIG. 8, the apparatus 101 can issue various command instructions not only from buttons and menus on the UI screen but also from key operations.
[0036]
[Modification]
FIG. 9 is a diagram showing a modification of the UI screen shown in FIG. 3. In this example, a column 902 for specifying the usage destination of each word is added to the UI screen. Each cell of the column 902 accepts one of the usage destinations "speech synthesis", "speech recognition", or "both speech synthesis and recognition", selectable from a list box or the like.
[0037]
The usage destination information indicates whether the word is used as recognition vocabulary data for speech recognition, as an utterance dictionary entry for speech synthesis, or as both, and the processing of the word dictionary editing device 101 can be changed according to this setting. For example, in step S219, when recognition vocabulary data is output, only words set to "speech recognition" or "both speech synthesis and recognition" are output; similarly, when an utterance dictionary is output, only words set to "speech synthesis" or "both speech synthesis and recognition" are output.
[0038]
In this way, the vocabularies for speech recognition and for speech synthesis can be managed together on the single UI screen shown in FIG. 9.
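The usage-dependent output described for step S219 amounts to a filter over the registered words. The sketch below assumes a hypothetical record layout (plain dictionaries with a `usage` key) and hypothetical constant names; the three usage values themselves come from the text.

```python
from typing import Dict, List

USAGE_SYNTHESIS = "speech synthesis"
USAGE_RECOGNITION = "speech recognition"
USAGE_BOTH = "both speech synthesis and recognition"


def select_for_output(words: List[Dict[str, str]], target: str) -> List[Dict[str, str]]:
    """Keep words whose usage is the target or covers both targets."""
    allowed = {target, USAGE_BOTH}
    return [w for w in words if w["usage"] in allowed]


# Illustrative records; the notations are made up for this example.
words = [
    {"notation": "word-a", "usage": USAGE_SYNTHESIS},
    {"notation": "word-b", "usage": USAGE_RECOGNITION},
    {"notation": "word-c", "usage": USAGE_BOTH},
]

recognition_vocab = select_for_output(words, USAGE_RECOGNITION)
utterance_dict = select_for_output(words, USAGE_SYNTHESIS)
print([w["notation"] for w in recognition_vocab])
print([w["notation"] for w in utterance_dict])
```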
[0039]
FIG. 10 is a diagram showing a modification of the UI screen shown in FIG. 4. In this example, instead of specifying the accent as high or low with buttons, the accent can be specified more finely using slider bars 1002. In addition, a curve (accent image) 1004 connecting the sliders of the slider bars 1002 graphically displays the accent set for the word.
[0040]
In the above description, it has been described that the word information is composed of notation, reading, accent, and part of speech, but other information can be included. For example, an appearance probability, a degree of importance, a name of a process to be executed when speech recognition is performed, a waveform dictionary name for speech synthesis, and parameters of speed and pitch for speech synthesis can be included.
[0041]
Of course, the word dictionary editing apparatus 101 can be applied to languages other than Japanese.
[0042]
[Other embodiments]
The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, a printer, etc.) or to an apparatus consisting of a single device (for example, a copier, a facsimile machine, etc.).
[0043]
Needless to say, the object of the present invention is also achieved by supplying a storage medium (or recording medium) on which software program code realizing the functions of the above-described embodiment is recorded to a system or an apparatus, and having a computer (or a CPU or MPU) of that system or apparatus read and execute the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiment, and the storage medium storing the program code constitutes the present invention. Needless to say, this includes not only the case where the functions of the above-described embodiment are realized by the computer executing the read program code, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code and the functions of the above-described embodiment are realized by that processing.
[0044]
Furthermore, needless to say, this also includes the case where the program code read from the storage medium is written into a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion card or in the function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are realized by that processing.
[0045]
When the present invention is applied to the storage medium, the storage medium stores program codes corresponding to the flowcharts described above.
[0046]
[Effect of the invention]
As described above, according to the present invention,
[0047]
In the present word dictionary editing device, when a word notation is input, its reading, accent, part of speech, and the like are automatically complemented. Only when the complemented word information is wrong, it is sufficient to edit the contents thereof, so that it is possible to greatly reduce the trouble of the user.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration example of a device for editing a word dictionary;
FIG. 2 is a flowchart showing an example of a word dictionary editing process;
FIG. 3 is a diagram showing an example of a user interface screen.
FIG. 4 is a diagram showing an example of a user interface screen for setting an accent.
FIG. 5 is an example of a user interface screen for setting a save destination of a file;
FIG. 6 is a diagram showing an example of a word dictionary output as a file.
FIG. 7 is a flowchart showing an example of an automatic completion process of word information;
FIG. 8 is a diagram showing a list of key assignments for key operations that can be performed by the word information complementing device;
FIG. 9 is a view showing a modification of the user interface screen shown in FIG. 3;
FIG. 10 is a diagram showing a modification of the user interface screen shown in FIG. 4.

Claims

Input editing means for inputting or editing word notation information;
A complementing means for complementing word information indicating the reading, accent, and part of speech of the word based on the input or edited written information;
Correction means for correcting the word information complemented by the complementing means, if necessary,
An output unit that outputs the created word information.

Further, there is a setting means for setting a use form of the word information,
The information processing apparatus according to claim 1, wherein the output unit outputs the word information according to the use mode.

The information processing apparatus according to claim 1, wherein the output unit outputs the word information as information according to a recognition grammar for voice recognition.

The information processing apparatus according to claim 1, wherein the output unit outputs the word information as an utterance dictionary for speech synthesis.

The information processing apparatus according to any one of claims 1 to 4, further comprising a confirmation unit configured to confirm reading of the selected word information by voice.

Furthermore, it has accent setting means for setting accent information for each mora of the word,
The information processing apparatus according to claim 1, wherein the correction unit reflects the accent information set by the accent setting unit in the word information.

7. The information processing apparatus according to claim 6, wherein the accent information is set by a slider bar displayed in a graphic form.

Enter or edit the word's notation information,
Based on the input or edited written information, complement the word information indicating the reading, accent and part of speech of the word,
Correct the complemented word information as necessary,
An information processing method characterized by outputting the created word information.

Further, a use mode of the word information is set,
The information processing method according to claim 8, wherein the word information is output according to the use mode.

A program for controlling an information processing apparatus to execute the information processing according to claim 8.

A recording medium on which the program according to claim 10 is recorded.