JP4311710B2 - Speech synthesis controller

Info

Publication number: JP4311710B2
Application number: JP2003036524A
Authority: JP
Inventors: 成一天白; 康雄傍島
Original assignee: ARCADIA, INC.
Current assignee: ARCADIA, INC.
Priority date: 2003-02-14
Filing date: 2003-02-14
Publication date: 2009-08-12
Anticipated expiration: 2023-02-14
Also published as: JP2004246129A

Abstract

PROBLEM TO BE SOLVED: To provide a controller that facilitates the generation and correction of speech synthesis data.
SOLUTION: A speech synthesis controller 2 is provided in addition to a speech synthesis unit 4. The speech synthesis controller 2 displays the parameters on a screen 6 in a form that is easy to understand intuitively. An operator obtains the desired speech synthesis data by changing the parameters through operations on the screen display.
COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の背景】
テキストデータに基づいて音声合成を行う装置において、与えられたテキストデータを形態素解析した後、各単語についてアクセントを付与して音声合成データを得るようにしている（特許文献１）。しかしながら、形態素解析が必ずしも正確に行われるわけでなく、また、各単語のアクセントが常に正確に付与されるとは限らない。
【０００２】
この問題は、音声合成エンジンの辞書に多くの単語を登録したり、アクセントの異なる単語を登録したりすることである程度解決できる。しかし、このような方法では、辞書が肥大化するという他の問題を引き起こしてしまうことになる。
【０００３】
また、音声合成エンジンに適切なパラメータを与えて、所望の音声合成データを得る作業は、音声に対する専門知識が必要であり、簡単ではなかった。
【０００４】
そこで、この発明では、音声合成データの修正を容易化することにより、上記のような問題点を解決しようとするものである。
【０００５】
【特許文献１】
特開2003-005776号公報
【発明の概要および効果】
(1)この発明に係る音声合成制御装置は、音声合成部に対するインターフェイスをとるための音声合成制御装置であって、
音声合成指令を受けると、与えられた文字列を音声合成部に与え、音声合成部から、当該文字列に対応する音声合成データおよび音声合成の際に用いたパラメータを取得し、前記パラメータに基づいて文字列を修飾して表示し、操作者によって、前記文字列の修飾が修正され、音声合成指令が与えられると、当該修正された修飾に基づいてパラメータを修正して音声合成部に与え、音声合成部から修正した音声合成データを取得するものである。
【０００６】
したがって、音声合成部の作成したパラメータを利用しつつ、操作者がこれを修正して所望の音声合成データを得ることができる。また、パラメータに基づいて文字列が修飾されて表示されており、この修飾を修正することによりパラメータを修正できるので、修正の操作が容易である。
【０００７】
(3)この発明に係る音声合成制御装置は、文字列が、漢字まじり文字列またはかな文字列であることを特徴としている。
【０００８】
したがって、漢字まじり文字列またはかな文字列に対して、パラメータに基づいた修飾が施されて表示される。
【０００９】
(4)この発明に係る音声合成制御装置は、操作者から与えられる文字列は漢字まじりの文字列であり、音声合成部は、与えられた漢字まじり文字列に対応するかな文字列を生成し、音声合成部から受けたかな文字列に対して、前記修飾を施して表示することを特徴としている。
【００１０】
したがって、漢字まじり文字列を入力すれば、対応するかな文字列が生成され、修飾が施されたかな文字列が表示される。
【００１１】
(5)この発明に係るインターフェイスプログラムは、コンピュータを用いて音声合成部に対するインターフェイスを実現するためのインターフェイスプログラムであって、
音声合成指令を受けると、与えられた漢字まじり文字列を音声合成部に与え、音声合成部から、当該漢字まじり文字列に対応する音声合成データ、当該漢字まじり文字列に対応するかな文字列および音声合成の際に用いたパラメータを取得し、前記パラメータに基づいて前記かな文字列を修飾して表示し、操作者によって、前記かな文字列の修飾が修正され、音声合成指令が与えられると、当該修正された修飾に基づいてパラメータを修正して音声合成部に与え、音声合成部から修正した音声合成データを取得する処理をコンピュータに行わせるものである。
【００１２】
したがって、音声合成部の作成したパラメータを利用しつつ、操作者がこれを修正して所望の音声合成データを得ることができる。また、パラメータに基づいて文字列が修飾されて表示されており、この修飾を修正することによりパラメータを修正できるので、修正の操作が容易である。さらに、漢字まじり文字列を入力すれば、対応するかな文字列が生成され、修飾が施されたかな文字列が表示される。
【００１３】
(6)この発明に係るインターフェイスプログラムは、操作者によって、表示されたかな文字列が修正され、音声合成指令が与えられると、当該修正されたかな文字列を音声合成部に与え、音声合成部から修正した音声合成データを取得することを特徴としている。
【００１４】
したがって、音声合成部によって生成されたかな文字列が誤っている場合、操作者がこれを修正して、音声合成データを得ることができる。
【００１５】
(7)この発明に係る音声合成制御装置は、前記パラメータが、文字に対応する音の長さに関するパラメータであることを特徴としている。
【００１６】
したがって、音の長さを文字の修飾によって直感的に認識することができ、その修正が容易である。
【００１７】
(8)この発明に係る音声合成制御装置は、前記パラメータが、アクセントに関するパラメータであることを特徴としている。
【００１８】
したがって、アクセントに関するパラメータを文字の修飾によって直感的に認識することができ、その修正が容易である。
【００１９】
(9)この発明に係る音声合成装置は、パラメータがアクセント区切または形態素区切またはその双方であり、文字列の修飾は、前記区切の位置において、表示上の区切が設けられるような修飾であることを特徴としている。
【００２０】
したがって、アクセント区切・形態素区切を文字の修飾によって直感的に認識することができる。
【００２１】
(10)この発明に係る音声合成装置は、表示上の区切を変更することにより、これに応じてパラメータとしてのアクセント区切・形態素区切が修正されることを特徴としている。
【００２２】
したがって、アクセント区切・形態素区切を文字の修飾によって直感的に認識することができ、その修正が容易である。
【００２３】
(11)この発明に係る音声合成装置は、パラメータがアクセントの高低であり、文字列の修飾は、アクセントの高低に合わせて、文字列の配列方向に垂直な方向に、各文字の位置を移動したような修飾であることを特徴としている。
【００２４】
したがって、アクセントの高低を文字の位置によって直感的に認識することができる。
【００２５】
(12)この発明に係る音声合成装置は、文字の位置を変更することにより、これに応じてパラメータとしてのアクセントの高低が修正されることを特徴としている。
【００２６】
したがって、アクセントの高低を文字の位置によって直感的に認識することができ、その修正が容易である。
【００２７】
(13)この発明に係る音声合成装置は、音声合成部が、各文字について音声合成の候補とした複数の音素片の特性情報を出力し、これに応じて、文字について、複数の音素片候補の特性を表示し、操作者によって、用いる音素片が変更され、音声合成指令が与えられると、当該変更された音素片の特性を音声合成部に与え、音声合成部から変更した音声合成データを取得することを特徴としている。
【００２８】
したがって、操作者が適切な音素片を選択して、音声合成データを得ることができる。
【００２９】
(14)この発明に係る音声合成装置は、音素片の特性が、音声合成部に記録されている当該音素片を含む一連の音声データにおける、当該音素片およびその前後の音素片の時間長またはアクセントの高低であることを特徴としている。
【００３０】
したがって、操作者は、この特性に基づいて、適切な音素片を選択することができる。
【００３１】
(15)この発明に係る音声合成装置は、保存指令に応じて、音声合成データを音声ファイルとして保存することを特徴としている。
【００３２】
したがって、生成された音声合成データをファイルとして保存することができる。
【００３３】
(16)この発明に係る音声合成装置は、保存指令に応じて、文字列およびパラメータを音声特性ファイルとして保存することを特徴としている。
【００３４】
したがって、音声合成ために必要なデータを保存することができる。
【００３５】
(17)この発明に係る音声合成装置は、文字列の一部に、特定の文字列を決定しない差替部分を設け、当該差替部分については、パラメータを生成するための情報を記録することを特徴としている。
【００３６】
したがって、差替部分について、文字列を挿入することにより、適切なパラメータにて音声合成を行うことのできる音声特性ファイルを得ることができる。
【００３７】
(18)この発明に係る音声特性ファイルは、音声の発話順に並べられた文字列部分と差替部分を備えており、文字列部分には、文字列および各文字に対応するパラメータが記録されており、差替部分には、文字列が挿入された場合に当該文字列の各文字にどのようなパラメータを与えるかを決定するための情報が記録されていることを特徴としている。
【００３８】
したがって、差替部分について、文字列を挿入することにより、適切なパラメータにて音声合成を行うことができる。
【００３９】
(19)この発明に係るプログラムは、音声特性ファイルに基づいて音声合成部に音声合成を行わせる処理をコンピュータに行わせるためのプログラムであって、
前記音声ファイルには、音声の発話順に並べられた文字列部分と差替部分が設けられており、文字列部分には、文字列および各文字に対応するパラメータが記録されており、差替部分には、文字列が挿入された場合に当該文字列の各文字にどのようなパラメータを与えるかを決定するための情報が記録されており、与えられた音声特性ファイルに基づいて、文字列部分を文字で表示し、差替部分を入力領域として、発話順に表示し、操作者によって、前記入力領域に文字列が挿入されて、音声合成指令が与えられると、文字列部分に対応するパラメータを音声特性ファイルから読み出し、挿入された文字列に対するパラメータを、前記差替部分に対応して記録されている情報を考慮して生成し、音声合成部に与えて音声合成データを得る処理をコンピュータに行わせるためのプログラムである。
【００４０】
したがって、差替部分について、文字列を挿入することにより、適切なパラメータにて音声合成を行うことができる。
【００４１】
(20)この発明に係る音声合成装置は、文字列を受けてパラメータを算出し、文字列およびパラメータに基づいて、文字列に対応する音声合成データを生成し、前記パラメータに基づいて文字列を修飾して表示し、操作者によって、前記文字列の修飾が修正され、音声合成指令が与えられると、当該修正された修飾に基づいてパラメータを修正して音声合成データを生成する。
【００４２】
したがって、パラメータに基づいて文字列が修飾されて表示されており、この修飾を修正することによりパラメータを修正できるので、修正の操作が容易である。
【００４３】
(22)この発明に係る音声合成サーバ装置は、端末装置と通信可能な音声合成サーバ装置であって、
文字列を受けてパラメータを算出し、文字列およびパラメータに基づいて、文字列に対応する音声合成データを生成し、前記パラメータに基づいて文字列を修飾して表示するためのデータを端末装置に送信し、端末装置の操作者によって、前記文字列の修飾が修正され、音声合成指令が送信されてくると、当該修正された修飾に対応するパラメータに基づいて音声合成データを生成して端末装置に送信する。
【００４４】
したがって、端末装置において、パラメータに基づいて文字列を修飾して表示することができ、この修飾を修正することによりサーバ装置において修正した音声合成データを生成できるので、修正の操作が容易である。
【００４５】
(24)この発明に係るサーバ装置は、端末装置の要求に応じて、音声特性ファイルを端末装置に送信するものであり、
音声特性ファイルは、音声の発話順に並べられた文字列部分と差替部分を備えており、文字列部分には、文字列および各文字に対応するパラメータが記録されており、差替部分には、文字列が挿入された場合に当該文字列の各文字にどのようなパラメータを与えるかを決定するための情報が記録されている。
【００４６】
したがって、端末装置に対して、音声特性ファイルを送信することができる。
【００４７】
(20)この発明に係る音声合成方法は、文字列を受けてパラメータを算出し、文字列およびパラメータに基づいて、文字列に対応する音声合成データを生成し、前記パラメータに基づいて文字列を修飾して表示し、操作者によって、前記文字列の修飾が修正され、音声合成指令が与えられると、当該修正された修飾に基づいてパラメータを修正して音声合成データを生成する。
【００４８】
したがって、パラメータに基づいて文字列が修飾されて表示されており、この修飾を修正することによりパラメータを修正できるので、修正の操作が容易である。
【００４９】
この発明において、「文字列の修飾」とは、文字列を構成する文字について、その大きさ、色、配置などの視覚的属性を、他の文字との比較において認識可能なように変更することをいう。
【００５０】
「漢字まじり文字列」とは、少なくとも一以上の漢字を含む文字列をいう。
【００５１】
「パラメータ」とは、文字列を用いて音声合成を行う際に必要な特性データであって、たとえば、アクセントの高低、アクセントの位置、発話長などである。
【００５２】
「プログラム」とは、ＣＰＵにより直接実行可能なプログラムだけでなく、ソース形式のプログラム、圧縮処理がされたプログラム、暗号化されたプログラム等を含む概念である。
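[0053]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
1. First Embodiment
(1) Overall Configuration and Overview
FIG. 1 shows the overall configuration of a speech synthesis control device and a speech synthesis unit according to an embodiment of the present invention. The speech synthesis unit 4 creates speech synthesis data from a given character string. When it is given a character string without parameters for speech synthesis, the speech synthesis unit 4 generates the parameters itself and creates the speech synthesis data. When it is given a character string together with parameters, it creates the speech synthesis data according to those parameters.
[0054]
The speech synthesis control device 2 is a device that interfaces with the speech synthesis unit 4. When the speech synthesis control device 2 gives a character string, the speech synthesis unit 4 returns the speech synthesis data and its parameters to the speech synthesis control device 2. The speech synthesis control device 2 displays the character string with a modification based on these parameters (see display screen 6). On screen 6 in FIG. 1, for example, each kana character is displayed at a higher or lower position according to its accent height.
[0055]
The operator listens to the sound based on the speech synthesis data. If it is not the desired sound, the operator changes the modification of the displayed character string, which changes the parameters, and runs speech synthesis again. In this way, the operator can obtain the desired speech synthesis data.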
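[0056]
(2) Hardware Configuration
FIG. 2 shows the hardware configuration when the device is implemented using a CPU. A display 12, a memory 14, a mouse/keyboard 16, a hard disk (recording device) 18, a CD-ROM drive 20, and a sound card 22 are connected to the CPU 10.
[0057]
The hard disk 18 stores an operating system (not shown) such as WINDOWS (trademark), a speech synthesis engine 28 (speech synthesis program) and its dictionary 30, and an interface program 26. These programs and data were recorded on a CD-ROM 32 and installed on the hard disk 18 via the CD-ROM drive 20. The speech synthesis engine 28 and the interface program 26 perform their functions in cooperation with the operating system. For details of the speech synthesis engine 28, see, for example, Japanese Patent No. 3220163 by the present applicant.
[0058]
The sound card 22 converts the given speech synthesis data into an analog waveform and outputs it to a speaker 24.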
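[0059]
(3) Processing of the Interface Program
FIG. 3 shows a flowchart of the interface program 26 recorded on the hard disk 18.
[0060]
First, in step S1, the CPU 10 displays the initial screen shown in FIG. 8 on the display 12. Using the keyboard 16, the operator enters a kanji-mixed character string in the text input area 40. FIG. 8 shows the state in which the operator has entered "安楽島町の道".
[0061]
Next, when the operator clicks the speech synthesis command button 42 with the mouse 16 and a speech synthesis command is given (step S2), the CPU 10 outputs the entered kanji-mixed character string to the speech synthesis engine 28 (step S3).
[0062]
In response, the speech synthesis engine 28 returns the parameters for speech synthesis to the CPU 10 (step S3). The CPU 10 temporarily stores these parameters in the memory 14. Here, the parameters are the information required for speech synthesis, such as the accent height of each character, the accent breaks, and the morpheme breaks. The character string to be given is itself also a parameter.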
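[0063]
FIG. 9 shows part of the parameters. From the given kanji-mixed character string "安楽島町の道", the reading "あんらくとーちょーのみち" (a kana character string) has been generated. Morpheme break information 46, obtained by morphological analysis, is shown, together with part-of-speech information 48 for each morpheme. Accent break information 50, which marks each accent unit, is also shown. Note that the accent break information 50 also serves as morpheme break information 46. In addition, accent height information 52 (H or L) is shown for each kana character.
[0064]
On receiving the parameters, the CPU 10 modifies the character string based on them and displays it (step S4). As shown in FIG. 10, the result appears in the parameter display field 54. The kana character string "あんらくとーちょーのみち" is displayed with each character shifted up or down according to the accent height information 52. That is, characters with a high accent (for example, "ん") are displayed higher, and characters with a low accent (for example, "あ") are displayed lower.
[0065]
The kanji-mixed character string "安楽島町の道" is displayed with each morpheme enclosed in a morpheme frame 56, based on the morpheme break information 46 and the accent break information 50, so that the morphemes can be told apart.
[0066]
Furthermore, based on the accent break information 50, both the kana character string and the kanji-mixed character string are enclosed in accent frames 58, which clearly show the accent breaks.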
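As an illustration only, the accent-height display described above can be imitated in plain text with two rows, one for high-accent characters and one for low-accent characters. The function name `render_accent_rows` and the list-based input format are assumptions made for this sketch and do not come from the patent, which describes a graphical screen.

```python
def render_accent_rows(chars, heights):
    """Place each kana on an upper (H) or lower (L) text row.

    Hypothetical helper: `chars` is a list of kana characters and
    `heights` is a parallel list of "H"/"L" accent heights.
    """
    if len(chars) != len(heights):
        raise ValueError("chars and heights must have the same length")
    upper, lower = [], []
    for char, height in zip(chars, heights):
        blank = "\u3000" * len(char)  # full-width space keeps columns aligned
        if height == "H":
            upper.append(char)
            lower.append(blank)
        elif height == "L":
            upper.append(blank)
            lower.append(char)
        else:
            raise ValueError(f"unknown accent height: {height!r}")
    return "".join(upper), "".join(lower)


if __name__ == "__main__":
    top, bottom = render_accent_rows(list("あんらく"), ["L", "H", "H", "H"])
    print(top)     # high-accent row
    print(bottom)  # low-accent row
```

Keeping the two rows the same width means a character's column identifies it in either row, which is roughly what the vertical offset achieves on the screen of FIG. 10.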
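[0067]
Next, in step S5, the CPU 10 determines which command has been given. Here, a speech synthesis command has been given, so processing proceeds to the speech synthesis processing of step S6.
[0068]
FIG. 4 shows a flowchart of the speech synthesis processing. The CPU 10 gives the parameters currently stored in the memory 14 (see FIG. 9) to the speech synthesis engine 28 (step S61). In response, the speech synthesis engine 28 generates speech synthesis data and returns it to the CPU 10. The CPU 10 temporarily stores this speech synthesis data on the hard disk 18 (step S62).
[0069]
Next, the CPU 10 gives this speech synthesis data to the sound card 22 (step S63), so that sound is output from the speaker 24. The CPU 10 then returns to step S5 and waits for the next command.
[0070]
The operator listens to the sound from the speaker 24 and, if it is not the desired sound, edits the parameters.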
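[0071]
FIG. 5 shows a flowchart of the editing processing. The case of changing the reading is described first. Suppose, for example, that the speech synthesis engine 28 produced "あんらくとーちょーのみち" but the correct reading is "あらしまちょーのみち".
[0072]
In this case, the operator first selects, on the screen of FIG. 10, the kanji whose reading is to be corrected. Here, "安楽", "島", and "町" are selected by clicking them with the mouse 16. Next, the operator selects reading edit from the edit menu 60. The CPU 10 then displays the reading edit screen shown in FIG. 11 (step S92). The operator enters the correct reading (a kana character string) in the reading input field 62. In the figure, the correct reading "あらしまちょー" has been entered. At this point, the operator also selects the correct part of speech in the part-of-speech selection field 64. Here, "place name" is selected.
[0073]
Based on the part of speech of a morpheme, the speech synthesis engine 28 can select an appropriate accent for that morpheme itself, and can determine an appropriate accent from its relationship with the parts of speech of the preceding and following morphemes. Giving each morpheme the correct part of speech is therefore important for obtaining the desired synthesized sound.
[0074]
When the operator clicks the edit end button 66, the CPU 10 corrects the parameters based on the above edits (step S96). That is, the parameters stored in the memory 14 are corrected as shown in FIG. 12. Because the reading has changed, the original accent height information can no longer be used. The CPU 10 therefore assigns the flat-type accent, which is the most common, as the accent height. That is, for "あらしまちょー", only the first character "あ" is given a low accent, and the second and subsequent characters "らしまちょー" are given a high accent.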
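The flat-type fallback described above (first character low, all later characters high) can be sketched as follows. The function name `flat_accent` and the "H"/"L" string encoding are illustrative assumptions; the patent only states the rule itself.

```python
def flat_accent(kana_chars):
    """Return "L" for the first character and "H" for every later one.

    Hypothetical helper; `kana_chars` is a list of the characters of the
    corrected reading, in order.
    """
    if not kana_chars:
        return []
    return ["L"] + ["H"] * (len(kana_chars) - 1)


if __name__ == "__main__":
    reading = list("あらしまちょー")
    for char, height in zip(reading, flat_accent(reading)):
        print(char, height)
```

The operator can still override any of these default heights afterwards, so the fallback only needs to be a reasonable starting point.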
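[0075]
Based on these edited parameters, the CPU 10 displays the modified character string as shown in FIG. 13. As the figure shows, the operator can visually confirm that the reading, the accent heights, and the morpheme breaks have changed.
[0076]
The CPU 10 then returns to step S5 in FIG. 3 and waits for the next command. When the operator clicks the playback button 42 on the screen of FIG. 13 and a speech synthesis command is given, the CPU 10 executes the speech synthesis processing of step S6. That is, it gives the parameters of FIG. 12 stored in the memory 14 to the speech synthesis engine 28. The CPU 10 receives the speech synthesis data from the speech synthesis engine 28 and temporarily stores it on the hard disk 18. The CPU 10 then gives this speech synthesis data to the sound card 22 to output it as sound (see FIG. 4). After that, the CPU 10 returns to step S5 in FIG. 3 and waits for the next command.
[0077]
The operator listens to this sound and judges whether it is the desired sound. If it is, the operator clicks the save button 70. In response, the CPU 10 executes the speech synthesis data saving processing of step S7 in FIG. 3.
[0078]
FIG. 6 shows a flowchart of the speech synthesis data saving processing. The CPU 10 records the speech synthesis data temporarily stored on the hard disk 18 as a speech synthesis data file (step S71). In this way, the desired sound can be saved. The speech synthesis data file can also be recorded on a portable recording medium such as a flexible disk, or sent as an e-mail attachment.
[0079]
If, on the other hand, the sound is not the desired one, the operator edits the accent breaks, morpheme breaks, accent heights, phoneme pieces, and so on (see FIG. 5).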
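[0080]
The operator can edit the accent breaks as follows (step S43). Consider, for example, merging the accent phrase "あらしまちょーの" and the accent phrase "みち" into a single accent phrase on the display screen of FIG. 13. First, the operator uses the mouse 16 to select the accent frame 58 of "あらしまちょーの" and the accent frame 58 of "みち". In this state, the operator selects accent phrase merge from the edit menu 60. In response, the CPU 10 merges the selected "あらしまちょーの" and "みち" into the single accent phrase "あらしまちょーのみち".
[0081]
The CPU 10 corrects the parameters stored in the memory 14 as shown in FIG. 14. The accent break that was between "の" and "みち" is changed to a morpheme break. The CPU 10 accordingly displays the whole of "あらしまちょーのみち" enclosed in a single accent frame 58 (step S96). The accent height information 52, the part-of-speech information 48, and so on are carried over unchanged.
[0082]
Although merging of accent phrases has been described above, one accent phrase can also be split into two. In this case, the operator designates the accent phrase to be split with the mouse 16 and then selects accent phrase split from the edit menu 60. The operator then designates the split position with the mouse 16, which splits the accent phrase. In this case as well, the corrected parameters are stored in the memory 14 and the display is updated (step S96).
[0083]
Morphemes can be merged and split in the same manner. In this case as well, the corrected parameters are stored in the memory 14 and the display is updated (step S96).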
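One way to picture the merge and split operations is to store a break kind for each boundary between adjacent morphemes, so that merging two accent phrases demotes an accent break to a plain morpheme break and splitting does the reverse. The sketch below assumes that representation; the names `accent_phrases` and `set_break` and the "accent"/"morpheme" labels are hypothetical and are not taken from the patent's parameter format.

```python
def accent_phrases(morphemes, breaks):
    """Group morphemes into accent phrases.

    breaks[i] describes the boundary between morphemes[i] and
    morphemes[i + 1]: "accent" (an accent break, which is also a morpheme
    break) or "morpheme" (a morpheme break only).
    """
    if len(breaks) != len(morphemes) - 1:
        raise ValueError("need exactly one break between adjacent morphemes")
    phrases, current = [], [morphemes[0]]
    for morpheme, kind in zip(morphemes[1:], breaks):
        if kind == "accent":
            phrases.append("".join(current))
            current = [morpheme]
        else:
            current.append(morpheme)
    phrases.append("".join(current))
    return phrases


def set_break(breaks, index, kind):
    """Return a copy of `breaks` with the boundary at `index` set to `kind`."""
    if kind not in ("accent", "morpheme"):
        raise ValueError(f"unknown break kind: {kind!r}")
    updated = list(breaks)
    updated[index] = kind
    return updated


if __name__ == "__main__":
    morphemes = ["あらしまちょー", "の", "みち"]
    breaks = ["morpheme", "accent"]
    print(accent_phrases(morphemes, breaks))
    merged = set_break(breaks, 1, "morpheme")  # merge the two accent phrases
    print(accent_phrases(morphemes, merged))
```

Because only the boundary kind changes, the per-character accent heights and part-of-speech data stay untouched, which matches the statement that they are carried over unchanged.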
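[0084]
Accent heights are edited as follows. The operator uses the mouse 16 to select the accent phrase whose accent heights are to be edited. When the operator then selects accent height edit from the edit menu 60, the CPU 10 displays the accent height edit screen shown in FIG. 15 on the display 12. In FIG. 15, each kana character is placed at an upper or lower position corresponding to its accent height. In this figure, every kana character other than "あ" has a high accent.
[0085]
To lower the accent of the kana character "ま", the operator uses the mouse 16 to drag its kana character frame 72 downward (step S94). In response, the CPU 10 changes the parameters in the memory 14 so that the accent of "ま" is low. It also moves the kana character frame 72 of "ま" downward in the display, as shown in FIG. 16 (step S96). In this way, the accent heights can be edited.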
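The drag operation can be read as a direct mapping from a vertical move to a new accent height for one character. A minimal sketch, assuming heights are stored as a list of "H"/"L" strings and that the user interface reports the drag as "up" or "down" (both assumptions for illustration):

```python
def apply_drag(heights, index, direction):
    """Return a copy of `heights` with one character raised or lowered.

    Hypothetical helper: dragging a kana frame "up" sets its accent to "H",
    dragging it "down" sets it to "L".
    """
    if direction not in ("up", "down"):
        raise ValueError(f"unknown drag direction: {direction!r}")
    if not 0 <= index < len(heights):
        raise IndexError("character index out of range")
    updated = list(heights)
    updated[index] = "H" if direction == "up" else "L"
    return updated


if __name__ == "__main__":
    chars = list("あらしま")
    heights = ["L", "H", "H", "H"]
    print(apply_drag(heights, chars.index("ま"), "down"))
```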
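[0086]
During speech synthesis, the speech synthesis engine 28 selects and uses suitable sounds from the many sample voices recorded in the speech dictionary 30. That is, for each kana character it selects one phoneme piece from a plurality of phoneme pieces. The speech synthesis engine 28 returns to the CPU 10, as parameters, not only the characteristic information of the selected phoneme piece but also that of the other phoneme pieces that were candidates but were not selected (see step S3).
[0087]
This embodiment therefore allows the phoneme piece to be changed. Phoneme pieces are edited as follows. On the accent height edit screen of FIG. 16, the operator uses the mouse 16 to double-click the kana character frame 72 of the character whose phoneme piece is to be edited. For example, when the kana character frame 72 of "し" is double-clicked, the CPU 10 displays the screen shown in FIG. 17.
[0088]
In the figure, a phoneme piece candidate field 90 is displayed below the kana character "し", showing the characteristics of five phoneme pieces. At the left end of the phoneme piece candidate field 90 are the codes (numbers) 1 to 5 that identify the phoneme pieces. The characteristics of each phoneme piece are shown according to the rule illustrated in FIG. 18. The "2M" "a" immediately to the right of the code "1" indicates the phoneme piece that immediately precedes this phoneme piece in the sample sound recorded in the dictionary 30. The "3M" "shi" to its right indicates the phoneme piece itself. The "4M" "b" at the right end indicates the phoneme piece that immediately follows it in the sample sound recorded in the dictionary 30.
[0089]
The leading digits "2", "3", and "4" of "2M", "3M", and "4M" indicate the mora position. The "M" indicates the accent height in the sample sound: "H" is a high accent, "L" is a low accent, and "M" is an intermediate accent.
[0090]
"a", "shi", and "b" are phonological notations. That is, they indicate that in the sample sound the immediately preceding sound is "a" and the immediately following sound is "b".
[0091]
The operator looks at the characteristics of each phoneme piece candidate shown in this notation and selects the desired phoneme piece with the mouse 16 (step S95). The CPU 10 stores the code of the selected phoneme piece in the parameters in the memory 14.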
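The candidate notation described above (a mora position, an accent letter H/M/L, and a phonological label, given for the preceding piece, the piece itself, and the following piece) can be parsed mechanically. The sketch below assumes each candidate row is supplied as a single string such as "2M a / 3M shi / 4M b"; the "/" separator, the `PieceContext` class, and the function names are all hypothetical, since the patent only shows the notation on screen.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PieceContext:
    mora_position: int  # leading digit(s), e.g. 3 in "3M"
    accent: str         # "H" (high), "M" (intermediate), or "L" (low)
    phoneme: str        # phonological label, e.g. "shi"


_TOKEN = re.compile(r"(\d+)([HML])\s+(\S+)")


def parse_piece(token):
    """Parse one entry such as "3M shi" into a PieceContext."""
    match = _TOKEN.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"cannot parse phoneme piece entry: {token!r}")
    position, accent, phoneme = match.groups()
    return PieceContext(int(position), accent, phoneme)


def parse_candidate(row):
    """Parse "prev / target / next" into a tuple of three PieceContext values."""
    parts = [parse_piece(part) for part in row.split("/")]
    if len(parts) != 3:
        raise ValueError("expected preceding, target, and following entries")
    return tuple(parts)


if __name__ == "__main__":
    preceding, target, following = parse_candidate("2M a / 3M shi / 4M b")
    print(preceding, target, following)
```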
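[0092]
After editing as described above, when the operator clicks the speech synthesis button 42, the CPU 10 gives the edited parameters to the speech synthesis engine 28. The operator can thus obtain speech synthesis data generated with the edited parameters and hear the sound from the speaker 24.
[0093]
Once the desired sound has been obtained through editing, the operator can click the speech synthesis data save button 70 to record the speech synthesis data file on the hard disk 18.
[0094]
In this embodiment, instead of saving the speech synthesis data as it is, the data can also be saved as a template (voice characteristic file). A template leaves the characters in part of the character string unspecified, and those characters are designated at synthesis time. It is effective in cases such as a lost-child announcement, where only the name changes and the rest can be reused as is.
[0095]
When the operator clicks the template button 92 on the screen of FIG. 13, the CPU 10 executes the template data saving processing of step S8 in FIG. 3.
[0096]
FIG. 7 shows a flowchart of the template data saving processing. First, the CPU 10 displays a template editing screen such as that shown in FIG. 19 on the display 12. With the mouse 16, the operator designates the morpheme of the part whose character string is to be replaceable. Here, suppose the frame 56 of "安楽島町" is designated. In response, the CPU 10 displays the part-of-speech selection shown in FIG. 20. The operator selects the part of speech of the character string that is to go in the "安楽島町" part (step S81). Here, suppose, for example, that place name is selected. The CPU 10 then produces the display shown in FIG. 21.
[0097]
When the operator clicks the save button 94 (see FIG. 19), the CPU 10 reads the parameters stored in the memory 14 (step S82). Based on these parameters, the CPU 10 generates template data such as that shown in FIG. 22.
[0098]
The "の" and "みち" parts are character string parts, for which a specific character string is designated. The (＄地名) part (that is, "$place name") is a replacement part, into which a character string is inserted by substitution at the time of use. In the replacement part, no specific character string is designated; instead, its part of speech is designated. Designating the part of speech allows parameters such as the accent heights to be determined accurately at synthesis time, for example from the relationship with the parts of speech before and after it.
[0099]
The CPU 10 records the generated data of FIG. 22 on the hard disk 18 as template data. The template data can also be recorded on a portable recording medium such as a flexible disk, or sent as an e-mail attachment.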
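The template described above mixes two kinds of entries in utterance order: character string parts that carry fixed text and per-character parameters, and a replacement part that carries only the information needed to derive parameters later. A minimal sketch of such a structure follows. The class names, field names, and the accent values in the example are assumptions for illustration; the actual layout of FIG. 22 is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextPart:
    text: str            # fixed kana text, e.g. "みち"
    accents: List[str]   # one "H"/"L" per character (illustrative values)


@dataclass
class ReplacementPart:
    part_of_speech: str  # e.g. "地名" (place name)


Template = List[Union[TextPart, ReplacementPart]]


def template_label(template: Template) -> str:
    """Render the template as text, showing replacement parts as ($pos)."""
    pieces = []
    for part in template:
        if isinstance(part, ReplacementPart):
            pieces.append(f"(${part.part_of_speech})")
        else:
            pieces.append(part.text)
    return "".join(pieces)


if __name__ == "__main__":
    template = [
        ReplacementPart("地名"),
        TextPart("の", ["H"]),
        TextPart("みち", ["L", "H"]),
    ]
    print(template_label(template))
```

Storing the part of speech rather than any text in the replacement part is what lets a later program choose parameters for whatever string is inserted.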
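[0100]
In the above embodiment, a kanji-mixed character string is given in step S1, but a kana character string may be given instead.
[0101]
Also, although the above embodiment generates a template that includes a replacement part, voice characteristic data consisting entirely of character string parts may be generated and recorded.
[0102]
Although the above embodiment shows the speech synthesis control device 2 and the speech synthesis unit 4 as separate, the two may be integrated into a single speech synthesizer.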
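[0103]
2. Second Embodiment
Next, a template processing program that performs speech synthesis based on the above template data is described. The hardware configuration is the same as in FIG. 2, except that the hard disk 18 stores the template processing program in place of the interface program.
[0104]
FIG. 23 shows a flowchart of the template processing program. The CPU 10 first reads the template data and displays an editing screen on the display 12 (step S101). FIG. 24 shows an example of the editing screen when the template data of FIG. 22 has been read. The replacement part of the template data is displayed as a character string input section 120, and each character string part is displayed as its character string. The part of speech of the replacement part is displayed below the character string input section 120 as input guidance for the operator.
[0105]
Using the keyboard 16, the operator enters the desired character string in the character string input section 120. Here, suppose "箕面" is entered. When the operator finishes the input and clicks the speech synthesis command button (not shown), the CPU 10 determines parameters such as the accent heights and the reading for the character string entered in the character string input section 120 (step S103). In doing so, it takes into account the part-of-speech information given to the replacement part (here, place name) when determining the reading, accent, and other parameters.
[0106]
Next, the CPU 10 gives the parameters to the speech synthesis engine 28 and obtains the speech synthesis data for "箕面の道" (step S104). The CPU 10 then gives this speech synthesis data to the sound card 22 to obtain audio output (step S105). This speech synthesis data can also be saved.
[0107]
As described above, using a template makes it possible to change the character string in the replacement part while maintaining the quality of the synthesized speech.
[0108]
In this embodiment, part-of-speech information is used as the information for determining the parameters, but rules for determining the parameters or the like may be used instead.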
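The template processing flow (read the template, accept a string for the replacement part, derive its parameters using the recorded part of speech, then pass everything to the engine) can be sketched as below. The dictionary layout, the function names, and in particular `estimate_accents` are assumptions for illustration: the patent does not say how the part of speech is turned into accent heights, so this placeholder simply applies a flat-type pattern and accepts the part of speech without using it.

```python
def estimate_accents(reading, part_of_speech):
    """Placeholder estimator (hypothetical): flat-type accent for any input.

    A real system would use `part_of_speech` and the neighbouring parts;
    that logic is not specified in the source and is not modelled here.
    """
    if not reading:
        return []
    return ["L"] + ["H"] * (len(reading) - 1)


def fill_template(template, reading, estimate=estimate_accents):
    """Return a fully specified part list with the replacement part filled in.

    `template` is a list of dicts: {"text", "accents"} for character string
    parts and {"pos"} for the replacement part.
    """
    filled = []
    for part in template:
        if "pos" in part:
            filled.append(
                {"text": reading, "accents": estimate(reading, part["pos"])}
            )
        else:
            filled.append({"text": part["text"], "accents": list(part["accents"])})
    return filled


if __name__ == "__main__":
    template = [
        {"pos": "地名"},
        {"text": "の", "accents": ["H"]},
        {"text": "みち", "accents": ["L", "H"]},
    ]
    for part in fill_template(template, "みのお"):
        print(part["text"], part["accents"])
```

The parameters of the fixed parts are copied through unchanged, which is the property the text relies on for keeping the quality of the already tuned portion.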
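[0109]
3. Third Embodiment
FIG. 25 shows an embodiment in which the speech synthesis control device 2 and the speech synthesis unit 4 described above are operated on a server device 204. A terminal device 200 can access the server device 204 via the Internet 202. The hardware configurations of the terminal device 200 and the server device 204 are the same as in FIG. 2. The server device 204 also stores the template processing program.
[0110]
The terminal device 200 stores a browser program, which can display information from the server device 204. By accessing the server device 204 and giving it a character string, the operator can obtain the corresponding speech synthesis data. The operator can also edit the parameters.
[0111]
FIGS. 26 and 27 show flowcharts of the processing in which the server device 204 generates speech synthesis data in response to a request from the terminal device 200 and the data is downloaded. In these flowcharts, the processing of the speech synthesis engine 28 and that of the interface program 26 are shown without distinction.
[0112]
In step S101, the terminal device 200 requests an input screen from the server device 204. In response, the server device 204 sends an input screen for speech synthesis (step S201). The terminal device 200 displays this input screen (step S102).
[0113]
On the input screen, the operator of the terminal device 200 enters the character string to be synthesized. FIG. 8 shows the screen with a character string entered. When the operator of the terminal device 200 clicks the speech synthesis command button 42 on this screen, a speech synthesis command is sent to the server device 204 (step S104).
[0114]
The server device 204 generates parameters based on the entered character string and performs speech synthesis (step S202). The server device 204 sends the speech synthesis data to the terminal device 200. It also sends the terminal device 200 a screen that displays the character string modified based on the generated parameters (step S203).
[0115]
The terminal device 200 plays the speech synthesis data as sound (step S105). It also displays the screen sent from the server device 204. As shown in FIG. 10, the character string on this screen is modified according to the parameters.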
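[0116]
If the played sound is not the desired one, the operator of the terminal device 200 performs editing such as reading editing, break editing, accent editing, and phoneme piece editing. The correction command resulting from the editing is sent to the server device 204 (step S107).
[0117]
Based on this correction command, the server device 204 sends the terminal device 200 a corrected screen in which the character positions and so on have been updated. It also corrects the parameters (step S204). The terminal device 200 displays the corrected screen (step S108), for example a screen such as that shown in FIG. 13.
[0118]
When the operator clicks the speech synthesis command button 42 on this screen, a speech synthesis command is sent to the server device 204 (step S109). In response, the server device 204 performs speech synthesis based on the corrected parameters (step S205) and sends the speech synthesis data to the terminal device 200 (step S206).
[0119]
The terminal device 200 plays this speech synthesis data as sound (step S110). The operator repeats the above editing until the desired sound is obtained.
[0120]
Once the desired sound is obtained, the operator clicks the save button 70. The terminal device 200 then sends a download request for the speech synthesis data to the server device 204 (step S111).
[0121]
In response, the server device 204 causes the terminal device 200 to record the speech synthesis data (step S207). The terminal device 200 can thereby save the speech synthesis data as a file.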
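The exchange between terminal and server described above is essentially a small stateful protocol: the server keeps the current parameters, applies correction commands to them, and re-synthesizes on request. The in-process sketch below illustrates that state handling only. `FakeEngine`, `ServerSession`, the request dictionaries, and the byte format of the "audio" are all invented for illustration; no real network transport or speech engine is involved.

```python
import copy


class FakeEngine:
    """Stand-in for the speech synthesis engine (hypothetical)."""

    def analyze(self, text):
        accents = ["L"] + ["H"] * (len(text) - 1) if text else []
        return {"text": text, "accents": accents}

    def synthesize(self, params):
        # Fake "audio": just an encoding of the parameters.
        return ("".join(params["accents"]) + ":" + params["text"]).encode("utf-8")


class ServerSession:
    """Keeps the parameters between requests from one terminal."""

    def __init__(self, engine):
        self.engine = engine
        self.params = None

    def handle(self, request):
        kind = request["type"]
        if kind == "synthesize":
            if "text" in request:  # first request: generate parameters
                self.params = self.engine.analyze(request["text"])
            if self.params is None:
                raise ValueError("no character string has been given yet")
            return {
                "audio": self.engine.synthesize(self.params),
                "display": copy.deepcopy(self.params),
            }
        if kind == "correct":  # correction command from the terminal
            if self.params is None:
                raise ValueError("nothing to correct yet")
            self.params["accents"][request["index"]] = request["accent"]
            return {"display": copy.deepcopy(self.params)}
        raise ValueError(f"unknown request type: {kind!r}")


if __name__ == "__main__":
    session = ServerSession(FakeEngine())
    print(session.handle({"type": "synthesize", "text": "みち"}))
    print(session.handle({"type": "correct", "index": 1, "accent": "L"}))
    print(session.handle({"type": "synthesize"}))
```

Keeping the parameters on the server side means the terminal only needs a browser that can show the returned screen and send commands, which is the division of labor the embodiment describes.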
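[0122]
The above describes downloading speech synthesis data. When a template is downloaded after the desired sound has been synthesized, the processing follows the flowchart of FIG. 28.
[0123]
When the operator of the terminal device 200 clicks the template creation button 92, a template creation screen request is sent to the server device 204 (step S121). In response, the server device 204 sends the template creation screen (step S211), and the terminal device 200 displays it (step S122). This is, for example, a screen such as that of FIG. 19.
[0124]
The operator of the terminal device 200 enters the designation of the replacement part, the part of speech, and so on (step S123). During this input, the server device 204 creates updated screens based on the entered data, but this is omitted from the flowchart. As a result of the data input, a screen such as that of FIG. 21 is displayed.
[0125]
When the operator of the terminal device 200 clicks the template save button 94, a template download request is sent to the server device 204 (step S124). The server device 204 creates the template (step S212) and causes the terminal device to save the created template data (step S213). The terminal device 200 can thereby save template data such as that shown in FIG. 22 (step S125).
[0126]
The speech synthesis data and templates obtained in this way can be distributed to other people via the Internet 202 or the like. A person who receives the speech synthesis data can listen to the synthesized speech with a sound card 22. A person who receives a template can access the server device 204 from a terminal device 206, run the template processing program, and obtain the desired speech synthesis data.
[0127]
FIG. 29 shows a flowchart of the processing for obtaining speech synthesis data based on a template. The operator of the terminal device 206 accesses the server device 204 and sends the template (step S151). In response, the server device 204 sends a template screen (step S251), for example a screen such as that shown in FIG. 24. The terminal device 200 displays it.
[0128]
The operator enters the desired character string in the replacement part 120 of this template screen (step S152). The operator then clicks the speech synthesis command button to send a speech synthesis command to the server device 204 (step S153).
[0129]
In response, the server device 204 generates the parameters (step S252) and performs speech synthesis (step S253). It then sends the generated speech synthesis data to the terminal device 200 (step S254). The terminal device 200 plays this speech synthesis data (step S154). Speech synthesis can be performed in this way. The terminal device can also save this speech synthesis data.
[0130]
In each of the above embodiments, each function is implemented by a program, but some or all of the functions may be implemented by logic circuits.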
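BRIEF DESCRIPTION OF THE DRAWINGS
[FIG. 1] A diagram showing the overall configuration of a speech synthesis control device and a speech synthesis unit according to an embodiment of the present invention.
[FIG. 2] A diagram showing the hardware configuration when the device of FIG. 1 is implemented using a CPU.
[FIG. 3] A flowchart of the interface program.
[FIG. 4] A flowchart of the speech synthesis processing portion.
[FIG. 5] A flowchart of the editing processing portion.
[FIG. 6] A flowchart of speech synthesis data saving.
[FIG. 7] A flowchart of template data saving.
[FIG. 8] An example of the input/work screen.
[FIG. 9] A diagram showing the generated parameters.
[FIG. 10] An example of a screen on which the character string is displayed in modified form based on the parameters.
[FIG. 11] A diagram showing the reading edit screen.
[FIG. 12] A diagram showing the corrected parameters.
[FIG. 13] A diagram showing the corrected input/work screen.
[FIG. 14] A diagram showing the corrected parameters.
[FIG. 15] A diagram showing the accent edit screen.
[FIG. 16] A diagram showing the accent edit screen.
[FIG. 17] A diagram showing the phoneme piece edit screen.
[FIG. 18] An example of how phoneme piece characteristics are displayed.
[FIG. 19] A diagram showing the template creation screen.
[FIG. 20] A diagram showing the screen for part-of-speech selection.
[FIG. 21] A diagram showing the template creation screen.
[FIG. 22] A diagram showing the template data.
[FIG. 23] A flowchart of the template processing program.
[FIG. 24] A screen used when performing speech synthesis from a template.
[FIG. 25] The system configuration for performing speech synthesis from the terminal device 200 using the server device 204.
[FIG. 26] A flowchart of the speech synthesis processing. The terminal device side shows the processing of the browser program, and the server device side shows the processing of the interface program and the speech synthesis engine.
[FIG. 27] A flowchart of the speech synthesis processing.
[FIG. 28] A flowchart for template creation.
[FIG. 29] A flowchart showing playback processing using a template.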
DESCRIPTION OF SYMBOLS
2 ... Speech synthesis control device
4 ... Speech synthesis unit
6 ... Interface screen
[0001]
BACKGROUND OF THE INVENTION
In a device that performs speech synthesis based on text data, the given text data is subjected to morphological analysis, and an accent is then assigned to each word to obtain speech synthesis data (Patent Document 1). However, morphological analysis is not always performed accurately, and the accent of each word is not always assigned correctly.
[0002]
This problem can be solved to some extent by registering many words in the dictionary of the speech synthesis engine, or by registering words with different accents. However, this approach causes another problem: the dictionary becomes bloated.
[0003]
In addition, obtaining the desired speech synthesis data by giving appropriate parameters to the speech synthesis engine requires specialized knowledge of speech and has not been easy.
[0004]
The present invention therefore seeks to solve the above problems by making speech synthesis data easier to correct.
[0005]
[Patent Document 1]
JP 2003-005776 A
SUMMARY OF THE INVENTION AND EFFECT
(1) A speech synthesis control device according to the present invention is a speech synthesis control device that interfaces with a speech synthesis unit.
Upon receiving a speech synthesis command, the device gives the given character string to the speech synthesis unit, acquires from the speech synthesis unit the speech synthesis data corresponding to the character string and the parameters used in the synthesis, and displays the character string with a modification based on those parameters. When the operator corrects the modification of the character string and a speech synthesis command is given, the device corrects the parameters based on the corrected modification, gives them to the speech synthesis unit, and acquires the corrected speech synthesis data from the speech synthesis unit.
[0006]
Therefore, the operator can obtain the desired speech synthesis data by correcting the parameters created by the speech synthesis unit while still making use of them. In addition, the character string is displayed with a modification based on the parameters, and the parameters can be corrected by correcting this modification, so the correction operation is easy.
[0007]
(3) The speech synthesis control device according to the present invention is characterized in that the character string is a kanji-mixed character string or a kana character string.
[0008]
Therefore, the kanji-mixed character string or kana character string is displayed with a modification based on the parameters.
[0009]
(4) The speech synthesis control device according to the present invention is characterized in that the character string given by the operator is a kanji-mixed character string, the speech synthesis unit generates a kana character string corresponding to the given kanji-mixed character string, and the kana character string received from the speech synthesis unit is displayed with the modification applied.
[0010]
Therefore, when a kanji-mixed character string is input, the corresponding kana character string is generated and displayed with the modification applied.
[0011]
(5) An interface program according to the present invention is an interface program for implementing, on a computer, an interface to a speech synthesis unit.
The program causes the computer to perform the following processing: upon receiving a speech synthesis command, give the given kanji-mixed character string to the speech synthesis unit; acquire from the speech synthesis unit the speech synthesis data corresponding to the kanji-mixed character string, the kana character string corresponding to the kanji-mixed character string, and the parameters used in the synthesis; display the kana character string with a modification based on the parameters; and, when the operator corrects the modification of the kana character string and a speech synthesis command is given, correct the parameters based on the corrected modification, give them to the speech synthesis unit, and acquire the corrected speech synthesis data from the speech synthesis unit.
[0012]
Therefore, the operator can obtain the desired speech synthesis data by correcting the parameters created by the speech synthesis unit while still making use of them. In addition, the character string is displayed with a modification based on the parameters, and the parameters can be corrected by correcting this modification, so the correction operation is easy. Furthermore, when a kanji-mixed character string is input, the corresponding kana character string is generated and displayed with the modification applied.
[0013]
(6) The interface program according to the present invention is characterized in that, when the operator corrects the displayed kana character string and a speech synthesis command is given, the corrected kana character string is given to the speech synthesis unit and the corrected speech synthesis data is acquired from the speech synthesis unit.
[0014]
Therefore, when the kana character string generated by the speech synthesis unit is incorrect, the operator can correct it and obtain speech synthesis data.
[0015]
(7) The speech synthesis control device according to the present invention is characterized in that the parameter is a parameter related to a length of a sound corresponding to a character.
[0016]
Therefore, the length of the sound can be intuitively recognized by the modification of the characters, and the correction is easy.
[0017]
(8) The speech synthesis control device according to the present invention is characterized in that the parameter is a parameter related to an accent.
[0018]
Therefore, the parameters regarding the accent can be intuitively recognized by the modification of the characters, and the correction is easy.
[0019]
(9) The speech synthesizer according to the present invention is characterized in that the parameter is an accent break, a morpheme break, or both, and the modification of the character string provides a visual break in the display at the position of each such break.
[0020]
Therefore, accent breaks and morpheme breaks can be intuitively recognized from the modification of the characters.
[0021]
(10) The speech synthesizer according to the present invention is characterized in that changing a break in the display correspondingly corrects the accent break or morpheme break held as a parameter.
[0022]
Therefore, accent breaks and morpheme breaks can be intuitively recognized from the modification of the characters, and they are easy to correct.
[0023]
(11) The speech synthesizer according to the present invention is characterized in that the parameter is the accent height, and the modification of the character string shifts the position of each character, in the direction perpendicular to the direction in which the characters are arranged, in accordance with its accent height.
[0024]
Therefore, the accent height can be intuitively recognized from the position of each character.
[0025]
(12) The speech synthesizer according to the present invention is characterized in that changing the position of a character correspondingly corrects the accent height held as a parameter.
[0026]
Therefore, the accent height can be intuitively recognized from the position of each character, and it is easy to correct.
[0027]
(13) The speech synthesizer according to the present invention is characterized in that the speech synthesis unit outputs, for each character, characteristic information of the plurality of phoneme pieces that were candidates for speech synthesis; the characteristics of the plurality of phoneme piece candidates are displayed for the character accordingly; and, when the operator changes the phoneme piece to be used and a speech synthesis command is given, the characteristics of the changed phoneme piece are given to the speech synthesis unit and the changed speech synthesis data is acquired from the speech synthesis unit.
[0028]
Therefore, the operator can select an appropriate phoneme piece and obtain speech synthesis data.
[0029]
(14) The speech synthesizer according to the present invention is characterized in that the characteristics of a phoneme piece are the time length or the accent height of that phoneme piece and of the phoneme pieces before and after it, in the series of speech data containing that phoneme piece recorded in the speech synthesis unit.
[0030]
Therefore, the operator can select an appropriate phoneme piece based on this characteristic.
[0031]
(15) The speech synthesizer according to the present invention is characterized in that speech synthesis data is saved as a speech file in response to a save command.
[0032]
Therefore, the generated speech synthesis data can be saved as a file.
[0033]
(16) The speech synthesizer according to the present invention is characterized in that a character string and a parameter are stored as a speech characteristic file in response to a storage command.
[0034]
Therefore, data necessary for speech synthesis can be stored.
[0035]
(17) The speech synthesizer according to the present invention is characterized in that a replacement part, for which no specific character string is fixed, is provided in part of the character string, and information for generating parameters is recorded for the replacement part.
[0036]
Therefore, a voice characteristic file can be obtained with which speech synthesis can be performed with appropriate parameters when a character string is inserted into the replacement part.
[0037]
(18) The voice characteristic file according to the present invention is characterized in that it comprises character string parts and a replacement part arranged in utterance order; each character string part records a character string and the parameters corresponding to each of its characters; and the replacement part records information for determining what parameters are to be given to each character of a character string when that character string is inserted.
[0038]
Therefore, speech synthesis can be performed with appropriate parameters by inserting a character string into the replacement part.
[0039]
(19) A program according to the present invention is a program for causing a computer to perform processing that causes a speech synthesis unit to perform speech synthesis based on a voice characteristic file.
The voice characteristic file contains character string parts and a replacement part arranged in utterance order. Each character string part records a character string and the parameters corresponding to each of its characters, and the replacement part records information for determining what parameters are to be given to each character of a character string when that character string is inserted. The program causes the computer to display, in utterance order and based on the given voice characteristic file, each character string part as characters and the replacement part as an input area. When the operator inserts a character string into the input area and a speech synthesis command is given, the program causes the computer to read the parameters corresponding to the character string parts from the voice characteristic file, generate parameters for the inserted character string in consideration of the information recorded for the replacement part, and give them to the speech synthesis unit to obtain speech synthesis data.
[0040]
Therefore, speech synthesis can be performed with appropriate parameters by inserting a character string for the replacement part.
[0041]
(20) The speech synthesizer according to the present invention receives a character string and calculates parameters, generates speech synthesis data corresponding to the character string based on the character string and the parameters, and displays the character string with a modification based on the parameters. When the operator corrects the modification of the character string and a speech synthesis command is given, it corrects the parameters based on the corrected modification and generates speech synthesis data.
[0042]
Therefore, the character string is displayed with a modification based on the parameters, and the parameters can be corrected by correcting this modification, so the correction operation is easy.
[0043]
(22) A speech synthesis server device according to the present invention is a speech synthesis server device capable of communicating with a terminal device.
The server device receives a character string and calculates parameters, generates speech synthesis data corresponding to the character string based on the character string and the parameters, and sends the terminal device data for displaying the character string with a modification based on the parameters. When the operator of the terminal device corrects the modification of the character string and a speech synthesis command is sent, the server device generates speech synthesis data based on the parameters corresponding to the corrected modification and sends it to the terminal device.
[0044]
Therefore, the terminal device can display the character string with a modification based on the parameters, and corrected speech synthesis data can be generated on the server device by correcting this modification, so the correction operation is easy.
[0045]
(24) The server device according to the present invention transmits a voice characteristic file to the terminal device in response to a request from the terminal device.
The voice characteristic file includes a character string portion and a replacement portion arranged in the order of utterance. The character string portion records a character string and the parameters corresponding to each character, and the replacement portion records information for determining what parameters are given to each character of a character string when that character string is inserted.
[0046]
Therefore, the voice characteristic file can be transmitted to the terminal device.
[0047]
(20) A speech synthesis method according to the present invention receives a character string, calculates parameters, generates speech synthesis data corresponding to the character string based on the character string and the parameters, and displays the character string modified based on the parameters. When the operator corrects the modification of the character string and gives a speech synthesis command, the parameters are corrected based on the corrected modification and speech synthesis data is generated.
[0048]
Therefore, the character string is modified and displayed based on the parameters, and the parameters can be corrected by correcting this modification, so the correction operation is easy.
[0049]
In this invention, “modification of a character string” means changing visual attributes such as the size, color, and arrangement of the characters constituting the character string so that they can be distinguished from other characters.
[0050]
The “kanji-mixed character string” refers to a character string including at least one kanji character.
[0051]
The “parameter” is characteristic data necessary for speech synthesis using a character string, and includes, for example, accent height, accent position, speech length, and the like.
[0052]
The “program” is a concept that includes not only a program that can be directly executed by the CPU, but also a source format program, a compressed program, an encrypted program, and the like.
[0053]
DETAILED DESCRIPTION OF THE INVENTION
1. First embodiment
(1) Overall configuration and overview
FIG. 1 shows the overall configuration of a speech synthesis control device and a speech synthesis unit according to an embodiment of the present invention. The speech synthesis unit 4 creates speech synthesis data when it is given a character string. When it is given a character string without parameters for speech synthesis, the speech synthesis unit 4 generates the parameters itself and creates speech synthesis data. When a character string is given together with parameters, speech synthesis data is created according to those parameters.
[0054]
The speech synthesis control device 2 is a device that provides an interface to the speech synthesis unit 4. When a character string is given from the speech synthesis control device 2, the speech synthesis unit 4 returns speech synthesis data and its parameters to the speech synthesis control device 2. The speech synthesis control device 2 displays the character string modified according to these parameters (see display screen 6). On the screen 6 of FIG. 1, for example, kana characters are displayed so that their positions rise and fall according to the height of the accent.
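The exchange between the control device and the speech synthesis unit described above might be sketched as follows. This is only an illustration under stated assumptions: the names `Params` and `synthesize`, the space-separated input, and the byte string standing in for speech synthesis data are all hypothetical and do not appear in the original.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Params:
    reading: List[str]  # one romanized kana unit per entry (assumed layout)
    accent: List[str]   # "H" or "L" for each kana unit


def synthesize(text: str, params: Optional[Params] = None) -> Tuple[bytes, Params]:
    """Hypothetical stand-in for the speech synthesis unit 4."""
    if params is None:
        # No parameters given: the unit generates them itself.
        # Placeholder rule only; a real engine would use its dictionary.
        reading = text.split()
        accent = ["L" if i == 0 else "H" for i in range(len(reading))]
        params = Params(reading, accent)
    # Placeholder "speech synthesis data": a byte string derived from the parameters.
    pairs = zip(params.reading, params.accent)
    data = " ".join(f"{k}/{a}" for k, a in pairs).encode("utf-8")
    return data, params


# First pass: text only, the unit returns data and the parameters it used.
data, params = synthesize("a ra shi")
# The operator edits a parameter, and the edited parameters are given back.
params.accent[2] = "L"
data2, _ = synthesize("a ra shi", params)
```

The point of the sketch is only that the same call accepts either a bare character string or a character string with parameters, which is what lets the operator's corrections be fed back.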
[0055]
When the operator listens to the sound based on the speech synthesis data and finds that it is not the desired sound, the operator can change the parameters by editing the displayed character string and redo the speech synthesis. In this way, the operator can obtain the desired speech synthesis data.
[0056]
(2) Hardware configuration
FIG. 2 shows a hardware configuration when implemented using a CPU. Connected to the CPU 10 are a display 12, a memory 14, a mouse / keyboard 16, a hard disk (recording device) 18, a CD-ROM drive 20, and a sound card 22.
[0057]
In addition to an operating system (not shown) such as WINDOWS (trademark), the hard disk 18 stores a speech synthesis engine 28 (speech synthesis program), its dictionary 30, and an interface program 26. These programs and data were recorded on the CD-ROM 32 and installed on the hard disk 18 via the CD-ROM drive 20. Note that the speech synthesis engine 28 and the interface program 26 exhibit their functions in cooperation with the operating system. For details of the speech synthesis engine 28, see, for example, Japanese Patent No. 32020163 by the present applicant.
[0058]
The sound card 22 converts the given voice synthesis data into an analog waveform and outputs it to the speaker 24.
[0059]
(3) Interface program processing
A flowchart of the interface program 26 recorded on the hard disk 18 is shown in FIG. 3.
[0060]
First, in step S1, the CPU 10 displays the initial screen shown in FIG. 8. The operator uses the keyboard 16 to input a kanji-mixed character string into the text input area 40. FIG. 8 shows a state in which the operator has input “Arashima Town Road”.
[0061]
Next, when the operator clicks the voice synthesis command button 42 with the mouse 16 and a voice synthesis command is given (step S2), the CPU 10 outputs the input kanji character string to the voice synthesis engine 28 (step S3).
[0062]
In response to this, the speech synthesis engine 28 returns parameters for speech synthesis to the CPU 10 (step S3). The CPU 10 temporarily stores this parameter in the memory 14. Here, the parameter is information necessary for speech synthesis, and is, for example, accent height for each character, accent segmentation, morpheme segmentation, or the like. A character string to be given is also a parameter.
[0063]
Some of the parameters are shown in FIG. 9. Based on the given kanji character string “Arajimacho no Michi”, the reading “Ankara Tocho no Michi” (kana character string) is generated. Morpheme delimiter information 46 obtained by morphological analysis is shown, together with part-of-speech information 48 for each morpheme. Accent delimiter information 50, which marks each accent group, is also shown; the accent delimiter information 50 also serves as morpheme delimiter information 46. In addition, accent height information 52 (H or L) is shown for each kana character.
[0064]
Having received the parameters, the CPU 10 modifies the character string based on them and displays it (step S4). As shown in FIG. 10, the result appears in the parameter display field 54. The kana character string “Antarakuchocho no Michi” is displayed with each character raised or lowered based on the accent height information 52. That is, a character with a high accent (for example, “n”) is displayed higher, and a character with a low accent (for example, “a”) is displayed lower.
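The raised and lowered display described above might be approximated in plain text as follows. This is a minimal sketch, assuming romanized kana units and a two-row layout; the function name `render_accent` is hypothetical, and the actual screen of FIG. 10 is graphical.

```python
from typing import List


def render_accent(kana: List[str], heights: List[str]) -> str:
    """Place high-accent kana on the upper row and low-accent kana on the lower row."""
    upper, lower = [], []
    for k, h in zip(kana, heights):
        blank = " " * len(k)  # keep the columns aligned
        upper.append(k if h == "H" else blank)
        lower.append(k if h == "L" else blank)
    return " ".join(upper).rstrip() + "\n" + " ".join(lower).rstrip()


screen = render_accent(["a", "ra", "shi"], ["L", "H", "H"])
print(screen)
```

Each character keeps its horizontal position and only its vertical position carries the accent height, which is the property the modification relies on.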
[0065]
Further, the kanji-mixed character string “Arajimacho no Michi” is displayed with each morpheme enclosed in a morpheme frame 56, based on the morpheme delimiter information 46 and the accent delimiter information 50, so that the morphemes can be distinguished.
[0066]
Further, based on the accent delimiter information 50, both the kana character string and the kanji character string are surrounded by the accent frame 58, and the accent delimiter is clearly shown.
[0067]
Next, the CPU 10 determines which command is given in step S5. Here, since the voice synthesis command is given, the process proceeds to the voice synthesis process in step S6.
[0068]
A flowchart of the speech synthesis process is shown in FIG. 4. The CPU 10 gives the parameters (see FIG. 9) currently stored in the memory 14 to the speech synthesis engine 28 (step S61). In response to this, the speech synthesis engine 28 generates speech synthesis data and returns it to the CPU 10. The CPU 10 temporarily stores this voice synthesis data on the hard disk 18 (step S62).
[0069]
Next, the CPU 10 gives the voice synthesis data to the sound card 22 (step S63). As a result, sound is output from the speaker 24. The CPU 10 returns to step S5 and waits for the next command.
[0070]
The operator listens to the sound from the speaker 24 and, if this is not the desired sound, edits the parameters.
[0071]
FIG. 5 shows a flowchart of the editing process. First, the case of changing the reading will be described. For example, suppose that the correct reading is “Arashimacho no Michi” rather than the “Antarakucho no Michi” produced by the speech synthesis engine 28.
[0072]
In this case, the operator first selects, on the screen of FIG. 10, the kanji whose reading is to be corrected. Here, “Easy”, “Island” and “Town” are selected by clicking with the mouse 16. Next, reading editing is selected from the edit menu 60. As a result, the CPU 10 displays a reading editing screen as shown in FIG. 11 (step S92). The operator inputs the correct reading (kana character string) in the reading input field 62. In the figure, the correct reading “Arashimacho” is entered. At this time, the correct part of speech is also selected in the part-of-speech selection field 64. Here, “place name” is selected.
[0073]
The speech synthesis engine 28 can appropriately select an accent of the morpheme itself based on the part of speech of the morpheme, or can determine an appropriate accent based on the relationship with the part of speech of the preceding and following morphemes. Therefore, giving a correct part of speech for a morpheme is important for obtaining a desired synthesized sound.
[0074]
When the operator clicks the editing end button 66, the CPU 10 corrects the parameters based on the editing content (step S96). That is, the parameters stored in the memory 14 are corrected as shown in FIG. 12. Because the reading has changed, the original accent height information can no longer be used. The CPU 10 therefore assigns the flat (heiban) accent pattern, which is the most common pattern. That is, for “Arashimacho”, only the first character “A” has a low accent, and the second and subsequent characters “Rashimacho” have a high accent.
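The default accent assignment after a reading change might be sketched as follows. The function names and the list-based parameter layout are assumptions made for illustration, and the split of “Arashimacho” into romanized units is likewise only illustrative.

```python
from typing import List, Tuple


def flat_accent(units: List[str]) -> List[str]:
    """Flat pattern as described in the text: first unit low, all following units high."""
    return ["L" if i == 0 else "H" for i in range(len(units))]


def replace_reading(reading: List[str], accent: List[str],
                    start: int, end: int,
                    new_units: List[str]) -> Tuple[List[str], List[str]]:
    """Replace reading[start:end] and reset the accent of the replaced span."""
    new_reading = reading[:start] + new_units + reading[end:]
    new_accent = accent[:start] + flat_accent(new_units) + accent[end:]
    return new_reading, new_accent


# Hypothetical old reading of three units followed by "no", "mi", "chi".
old_reading = ["x1", "x2", "x3", "no", "mi", "chi"]
old_accent = ["L", "H", "L", "L", "H", "L"]
reading, accent = replace_reading(
    old_reading, old_accent, 0, 3, ["a", "ra", "shi", "ma", "cho"])
```

Only the replaced span is reset; the accent heights of the untouched units are carried over unchanged.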
[0075]
Based on the edited parameters, the CPU 10 displays the modified character string as shown in FIG. 13. As can be seen from the figure, the changes to the reading, the accent height, and the morpheme division can be confirmed visually.
[0076]
Thereafter, the CPU 10 returns to step S5 in FIG. 3 and waits for the next command. When the operator clicks the playback button 42 on the screen of FIG. 13 and a voice synthesis command is given, the CPU 10 executes the voice synthesis process of step S6. That is, the parameters of FIG. 12 stored in the memory 14 are given to the speech synthesis engine 28. The CPU 10 receives voice synthesis data from the voice synthesis engine 28 and temporarily stores it in the hard disk 18. Further, the CPU 10 gives the voice synthesis data to the sound card 22 and outputs it as sound (see FIG. 4). Thereafter, the CPU 10 returns to step S5 in FIG. 3 and waits for the next command.
[0077]
The operator listens to this sound and determines whether it is the desired sound. If the desired sound has been obtained, the operator clicks the save button 70. In response to this, the CPU 10 executes the speech synthesis data storage process of step S7 in FIG. 3.
[0078]
A flowchart of the speech synthesis data storage process is shown in FIG. 6. The CPU 10 records the voice synthesis data temporarily stored on the hard disk 18 as a voice synthesis data file (step S71). In this way, a desired sound can be stored. Note that the voice synthesis data file can be recorded on a portable recording medium such as a flexible disk, or can be transmitted as an attachment to an e-mail or the like.
[0079]
On the other hand, if the desired sound is not obtained, edits such as accent segmentation, morpheme segmentation, accent height and phoneme segment are performed (see FIG. 5).
[0080]
The operator can edit accent breaks as follows (step S43). For example, the case where the accent phrase “Arashima Chono” and the accent phrase “Michi” are combined into one accent phrase on the display screen of FIG. 13 will be described. First, the “Arashima Chono” accent frame 58 and the “Michi” accent frame 58 are selected by the mouse 16. In this state, the accent phrase combination is selected from the edit menu 60. In response to this, the CPU 10 combines the selected “Arashima Chono” and “Michi” into one accent phrase “Arashima Cho no Michi”.
[0081]
The CPU 10 corrects the parameters stored in the memory 14 as shown in FIG. 14. The accent division between “no” and “michi” has been changed to a morpheme division. Therefore, the CPU 10 displays the entire “Arashima Cho no Michi” surrounded by one accent frame 58 (step S96). The accent height information 52 and the part-of-speech information 48 are the same as before.
[0082]
In the above description, the combination of accent phrases has been described. However, one accent phrase can be edited so as to be divided into two accent phrases. In this case, the operator specifies an accent phrase to be divided by the mouse 16 and then selects accent phrase division from the edit menu 60. Furthermore, the accent phrase can be divided by designating the position to be divided with the mouse 16. Also in this case, the corrected parameter is stored in the memory 14 and the display is corrected (step S96).
[0083]
Combination and division of morpheme phrases can be performed in the same manner as described above. Also in this case, the corrected parameter is stored in the memory 14 and the display is corrected (step S96).
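Combining and dividing accent phrases, as described above, might be modeled by changing the type of the break between two morphemes. This is a sketch under assumptions: the representation of breaks as a list of `"accent"` and `"morpheme"` strings and all function names are hypothetical.

```python
from typing import List


def accent_phrases(morphemes: List[str], breaks: List[str]) -> List[List[str]]:
    """Group morphemes into accent phrases; breaks[i] is the break after morphemes[i]."""
    phrases, current = [], []
    for i, m in enumerate(morphemes):
        current.append(m)
        is_last = i == len(morphemes) - 1
        if is_last or breaks[i] == "accent":
            phrases.append(current)
            current = []
    return phrases


def combine(breaks: List[str], index: int) -> List[str]:
    """Turn an accent break into a plain morpheme break (combines two phrases)."""
    out = list(breaks)
    out[index] = "morpheme"
    return out


def divide(breaks: List[str], index: int) -> List[str]:
    """Turn a morpheme break into an accent break (divides one phrase)."""
    out = list(breaks)
    out[index] = "accent"
    return out


morphemes = ["Arashimacho", "no", "michi"]
breaks = ["morpheme", "accent"]  # accent break between "no" and "michi"
before = accent_phrases(morphemes, breaks)
after = accent_phrases(morphemes, combine(breaks, 1))
```

Because only the break type changes, the accent height and part-of-speech information can stay as they were, consistent with the description of the combination above.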
[0084]
Accent height is edited as follows. The operator uses the mouse 16 to select the accent phrase whose accent height is to be edited. Next, when the operator selects accent height editing from the edit menu 60, the CPU 10 displays an accent height editing screen as shown in FIG. 15. In FIG. 15, each kana character is placed at a vertical position corresponding to the height of its accent. In this figure, all kana characters other than the kana character “A” are given high accents.
[0085]
Here, if the accent of the kana character “ma” is to be lowered, the operator drags its kana character frame 72 downward with the mouse 16 (step S94). In response to this, the CPU 10 changes the parameters in the memory 14 so as to lower the accent of the kana character “ma”. Further, as shown in FIG. 16, the kana character frame 72 of the kana character “ma” is moved downward and displayed (step S96). In this way, the accent height can be edited.
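How a vertical drag might be turned into a parameter change can be sketched as follows. The function name, the two-level "H"/"L" model, and the screen-coordinate convention (positive `dy` means downward) are assumptions for illustration only.

```python
from typing import List


def drag_character(accent: List[str], index: int, dy: int) -> List[str]:
    """Return a new accent list after the frame at `index` is dragged by `dy` pixels.

    Assumes screen coordinates: dy > 0 is a downward drag (lower accent),
    dy < 0 is an upward drag (higher accent), dy == 0 changes nothing.
    """
    out = list(accent)
    if dy > 0:
        out[index] = "L"
    elif dy < 0:
        out[index] = "H"
    return out


# "a ra shi ma cho" with the flat pattern; the operator drags "ma" downward.
accent = ["L", "H", "H", "H", "H"]
edited = drag_character(accent, 3, dy=12)
```

Returning a new list rather than editing in place keeps the previous parameters available, which would make an undo step straightforward if one were wanted.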
[0086]
Note that, at the time of speech synthesis, the speech synthesis engine 28 selects and uses an appropriate sound from the many sample sounds recorded in the speech dictionary 30. That is, for each kana character, one phoneme piece is selected from a plurality of phoneme pieces. The speech synthesis engine 28 returns to the CPU 10, as parameters, not only the characteristic information of the selected phoneme piece but also the characteristic information of the other phoneme pieces that were candidates but were not selected (see step S3).
[0087]
Therefore, in this embodiment, the phoneme piece can be changed. The phoneme piece is edited as follows. On the accent height editing screen of FIG. 16, the operator double-clicks with the mouse 16 the character frame 72 whose phoneme piece is to be edited. For example, when the character frame 72 of “shi” is double-clicked, the CPU 10 displays a screen as shown in FIG. 17.
[0088]
In the figure, a phoneme piece candidate field 90 is displayed below the kana character “shi”, and the characteristics of five phoneme pieces are shown. At the left end of the phoneme piece candidate field 90, codes (numbers) 1 to 5 for specifying the phoneme pieces are shown. The characteristics of each phoneme piece are shown according to the rules shown in FIG. 18. “2M” and “a” immediately to the right of the code “1” indicate the phoneme piece immediately before this phoneme piece in the sample sound recorded in the dictionary 30. “3M” and “shi” to their right indicate the phoneme piece itself. “4M” and “b” at the right end indicate the phoneme piece immediately after this phoneme piece in the sample sound recorded in the dictionary 30.
[0089]
The numbers “2”, “3”, and “4” at the beginning of “2M”, “3M”, and “4M” indicate mora positions. “M” indicates the height of the accent in the sample sound. “H” is a high accent, “L” is a low accent, and “M” is an intermediate accent.
[0090]
“a”, “shi”, and “b” are the notations of the phoneme pieces. That is, in the sample sound the immediately preceding sound is “a” and the immediately following sound is “b”.
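The notation explained above (a code, then a mora position and accent height with a phoneme notation for the preceding piece, the piece itself, and the following piece) might be parsed as follows. This is a sketch only: the single-line, space-separated row format is an assumption, since the actual layout of FIG. 18 is not reproduced here, and the names `Piece`, `parse_tag`, and `parse_candidate` are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class Piece(NamedTuple):
    mora_position: int  # leading number, e.g. 3 in "3M"
    accent: str         # "H" (high), "L" (low) or "M" (intermediate)
    phoneme: str        # notation such as "shi"


def parse_tag(tag: str, phoneme: str) -> Piece:
    m = re.fullmatch(r"(\d+)([HLM])", tag)
    if m is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    return Piece(int(m.group(1)), m.group(2), phoneme)


def parse_candidate(row: str) -> Tuple[int, Piece, Piece, Piece]:
    """Parse 'code tag phoneme tag phoneme tag phoneme' into (code, before, this, after)."""
    fields = row.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    code = int(fields[0])
    before, this, after = (parse_tag(fields[i], fields[i + 1]) for i in (1, 3, 5))
    return code, before, this, after


candidate = parse_candidate("1 2M a 3M shi 4M b")
```

With the candidates in this form, the code of the chosen phoneme piece is the only value that needs to be stored back into the parameters.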
[0091]
The operator looks at the characteristics of each phoneme candidate written in this way, and selects a desired phoneme piece with the mouse 16 (step S95). The CPU 10 stores the code of the selected phoneme piece as a parameter of the memory 14.
[0092]
When the speech synthesis button 42 is clicked after editing as described above, the CPU 10 gives the edited parameters to the speech synthesis engine 28. Therefore, it is possible to obtain voice synthesis data generated with the edited parameters and listen to the sound from the speaker 24.
[0093]
If the desired sound is obtained by editing, the voice synthesis data save button 70 can be clicked to record the voice synthesis data file on the hard disk 18.
[0094]
In this embodiment, the voice synthesis data can also be saved as a template (voice characteristic file) instead of being saved as it is. In a template, the characters of part of the character string are left unspecified and are specified at the time of speech synthesis. This is effective when only the name part changes and the rest can be used repeatedly, as in a paging announcement for a lost child.
[0095]
When the operator clicks the template button 92 on the screen of FIG. 13, the CPU 10 executes the template data storage process in step S8 of FIG. 3.
[0096]
A flowchart of the template data storage process is shown in FIG. 7. First, the CPU 10 displays a template editing screen on the display 12 as shown in FIG. 19. The operator uses the mouse 16 to specify the morpheme whose character string is to be replaced at the time of use. Here, it is assumed that the frame 56 of “Arajimacho” is designated. In response to this, the CPU 10 displays a part-of-speech selection screen as shown in FIG. 20. The operator selects the part of speech of the character string to be put in the “Arajimacho” part (step S81). Here, for example, it is assumed that a place name is selected. The CPU 10 then displays a screen as shown in FIG. 21.
[0097]
When the operator clicks the save button 94 (see FIG. 19), the CPU 10 reads out the parameters stored in the memory 14 (step S82). Based on these parameters, the CPU 10 generates template data as shown in FIG. 22.
[0098]
The “no” and “michi” portions are character string portions in which specific character strings are designated. The part of ($ place name) is a replacement part into which a character string is inserted by replacement at the time of use. In the replacement part, a specific character string is not specified, and its part of speech is specified. By specifying the part of speech, it is possible to accurately determine parameters such as the height of an appropriate accent according to the relationship with the preceding and following parts of speech during speech synthesis.
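The distinction between character string parts and a replacement part might be represented as follows. This is a sketch under assumptions: the class names, field names, and the accent values given for “no” and “michi” are illustrative and are not taken from FIG. 22.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StringPart:
    text: str
    reading: List[str]
    accent: List[str]   # parameters recorded for each character


@dataclass
class ReplacementPart:
    part_of_speech: str  # information used later to determine parameters


Part = Union[StringPart, ReplacementPart]


def make_template(parts: List[StringPart], index: int, part_of_speech: str) -> List[Part]:
    """Replace the designated morpheme with a replacement part carrying only its part of speech."""
    template: List[Part] = list(parts)
    template[index] = ReplacementPart(part_of_speech)
    return template


def describe(template: List[Part]) -> str:
    return " ".join(
        f"(${p.part_of_speech})" if isinstance(p, ReplacementPart) else p.text
        for p in template
    )


parts = [
    StringPart("Arajimacho", ["a", "ra", "shi", "ma", "cho"], ["L", "H", "H", "H", "H"]),
    StringPart("no", ["no"], ["H"]),
    StringPart("michi", ["mi", "chi"], ["H", "L"]),
]
template = make_template(parts, 0, "place name")
summary = describe(template)
```

The replacement part deliberately stores no reading or accent, only the part of speech, so that the parameters can be determined when a concrete character string is supplied.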
[0099]
The CPU 10 records the generated data of FIG. 22 on the hard disk 18 as template data. The template data can be recorded on a portable recording medium such as a flexible disk, or can be transmitted as an attachment to an e-mail or the like.
[0100]
In the above embodiment, a kanji character string is given in step S1, but a kana character string may be given.
[0101]
Moreover, although the above embodiment showed an example of generating a template that includes a replacement part, voice characteristic data consisting entirely of character string parts may also be generated and recorded.
[0102]
In the above embodiment, the voice synthesis control device 2 and the voice synthesis unit 4 are separated from each other. However, a voice synthesis device in which both are integrated may be used.
[0103]
2. Second embodiment
Next, a template processing program for performing speech synthesis based on the template data will be described. The hardware configuration is the same as in FIG. 2. However, the hard disk 18 stores a template processing program instead of the interface program.
[0104]
A flowchart of the template processing program is shown in FIG. 23. First, the CPU 10 reads template data and displays an editing screen on the display 12 (step S101). FIG. 24 shows a display example of the editing screen when the template data of FIG. 22 is read. The replacement portion of the template data is displayed as the character string input unit 120, and the character string portion is displayed as a character string. Note that the part of speech of the replacement part is displayed below the character string input unit 120 as input guidance for the operator.
[0105]
The operator inputs a desired character string into the character string input unit 120 using the keyboard 16. In this case, it is assumed that “Minoh” has been input. When the input is completed and a voice synthesis command button (not shown) is clicked, the CPU 10 determines parameters such as accent height and reading for the character string input to the character string input unit 120 (step S103). At this time, parameters such as reading and accent are determined in consideration of part-of-speech information (place name here) given to the replacement part.
[0106]
Next, the CPU 10 gives parameters to the speech synthesis engine 28 to obtain speech synthesis data of “Esaka no Michi” (step S104). Further, the CPU 10 gives this voice synthesis data to the sound card 22 to obtain a voice output (step S105). Note that this speech synthesis data can also be saved.
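Filling the replacement part and determining its parameters from the part-of-speech information might be sketched as follows. Everything here is an assumption for illustration: the dictionary layout of the template, the rule table `ACCENT_RULES`, the use of a flat pattern for place names, and the romanized units of the inserted string.

```python
from typing import Callable, Dict, List, Tuple


def flat_accent(units: List[str]) -> List[str]:
    """First unit low, the rest high (placeholder rule)."""
    return ["L" if i == 0 else "H" for i in range(len(units))]


# Hypothetical rule table: part of speech -> accent rule for the inserted string.
ACCENT_RULES: Dict[str, Callable[[List[str]], List[str]]] = {
    "place name": flat_accent,
}


def fill_template(template: List[dict], inserted: List[str]) -> Tuple[List[str], List[str]]:
    """Build the full reading and accent lists in utterance order."""
    reading: List[str] = []
    accent: List[str] = []
    for part in template:
        if part["kind"] == "slot":
            rule = ACCENT_RULES[part["part_of_speech"]]
            reading += inserted
            accent += rule(inserted)
        else:
            # Character string part: reuse the parameters recorded in the template.
            reading += part["reading"]
            accent += part["accent"]
    return reading, accent


template = [
    {"kind": "slot", "part_of_speech": "place name"},
    {"kind": "string", "reading": ["no"], "accent": ["H"]},
    {"kind": "string", "reading": ["mi", "chi"], "accent": ["H", "L"]},
]
reading, accent = fill_template(template, ["mi", "no", "o"])
```

The recorded parameters of the fixed parts are reused as they are, which is why the quality of the tuned portion is preserved when only the inserted string changes.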
[0107]
As described above, if a template is used, the character string in the replacement part can be changed while maintaining the quality of the synthesized speech.
[0108]
In this embodiment, part-of-speech information is used as information for determining parameters, but rules for determining parameters may be used.
[0109]
3. Third embodiment
FIG. 25 shows an embodiment in which the voice synthesis control device 2 and the voice synthesis unit 4 are operated on the server device 204. The terminal device 200 can access the server device 204 via the Internet 202. The hardware configurations of the terminal device 200 and the server device 204 are the same as those in FIG. 2. The server device 204 also stores a template processing program.
[0110]
The terminal device 200 stores a browser program, and information from the server device 204 can be displayed by the browser program. The operator can obtain speech synthesis data corresponding to the character string by accessing the server device 204 and giving a character string. It is also possible to edit parameters.
[0111]
FIG. 26 and FIG. 27 show flowcharts of processing in which the server device 204 generates and downloads speech synthesis data in response to a request from the terminal device 200. In this flowchart, the process of the speech synthesis engine 28 and the process of the interface program 26 are shown without being distinguished.
[0112]
In step S101, the terminal device 200 requests an input screen from the server device 204. In response to this, the server device 204 transmits an input screen for speech synthesis (step S201). The terminal device 200 displays this input screen (step S102).
[0113]
The operator of the terminal device 200 inputs the character string to be synthesized on the input screen. A screen in which a character string has been input is shown in FIG. 8. On this screen, when the operator of the terminal device 200 clicks the voice synthesis command button 42, the voice synthesis command is transmitted to the server device 204 (step S104).
[0114]
The server device 204 generates a parameter based on the input character string and performs speech synthesis (step S202). The server device 204 transmits the speech synthesis data to the terminal device 200. Further, the server device 204 transmits a screen to be displayed by modifying the character string to the terminal device 200 based on the generated parameter (step S203).
[0115]
The terminal device 200 reproduces the voice synthesis data as sound (step S105). Also, the screen sent from the server device 204 is displayed. In this screen, as shown in FIG. 10, the character string is modified by a parameter.
[0116]
If the reproduced sound is not the desired sound, the operator of the terminal device 200 performs editing such as reading editing, segment editing, accent editing, and phoneme piece editing. The correction command resulting from the editing is transmitted to the server device 204 (step S107).
[0117]
The server device 204 transmits to the terminal device 200 a corrected screen in which the position of the character string has been corrected based on the correction command. It also corrects the parameters (step S204). The terminal device 200 displays the corrected screen (step S108). For example, a screen as shown in FIG. 13 is displayed.
[0118]
When the operator clicks the voice synthesis command button 42 on this screen, the voice synthesis command is transmitted to the server device 204 (step S109). In response to this, the server device 204 performs speech synthesis based on the corrected parameters (step S205). Furthermore, the speech synthesis data is transmitted to the terminal device 200 (step S206).
[0119]
The terminal device 200 reproduces and outputs the voice synthesis data as sound (step S110). The operator repeats the above editing until a desired sound is obtained.
[0120]
If the desired sound is obtained, the operator clicks the save button 70. Thereby, the terminal device 200 transmits a download request for speech synthesis data to the server device 204 (step S111).
[0121]
In response to this, the server device 204 causes the terminal device 200 to record speech synthesis data (step S207). Thereby, the terminal device 200 can store the speech synthesis data as a file.
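The sequence of requests in this embodiment (synthesize, correct, synthesize again, download) might be sketched as an in-process exchange as follows. This is not a network implementation: the class `SynthesisServer`, the dictionary-based messages, and the byte string standing in for speech synthesis data are all assumptions made for illustration.

```python
import copy
from typing import Optional


class SynthesisServer:
    """Hypothetical stand-in for the server device 204; state is kept per instance."""

    def __init__(self) -> None:
        self.params: Optional[dict] = None
        self.data: Optional[bytes] = None

    def _synthesize(self) -> bytes:
        # Placeholder for the speech synthesis engine: encode the parameters as bytes.
        pairs = zip(self.params["reading"], self.params["accent"])
        return " ".join(f"{k}/{a}" for k, a in pairs).encode("utf-8")

    def handle(self, request: dict) -> dict:
        kind = request["type"]
        if kind == "synthesize":
            if "text" in request:
                # New text: generate parameters (placeholder rule).
                units = request["text"].split()
                accent = ["L" if i == 0 else "H" for i in range(len(units))]
                self.params = {"reading": units, "accent": accent}
            self.data = self._synthesize()
            return {"audio": self.data, "screen": copy.deepcopy(self.params)}
        if kind == "correct":
            # Correction command: update the parameters and return the corrected screen.
            self.params["accent"][request["index"]] = request["accent"]
            return {"screen": copy.deepcopy(self.params)}
        if kind == "download":
            return {"file": self.data}
        raise ValueError(f"unknown request type: {kind!r}")


server = SynthesisServer()
first = server.handle({"type": "synthesize", "text": "a ra shi"})
fixed = server.handle({"type": "correct", "index": 2, "accent": "L"})
second = server.handle({"type": "synthesize"})
saved = server.handle({"type": "download"})
```

In this sketch the parameters stay on the server and the terminal only sends commands and receives screens and audio, mirroring the division of roles between the terminal device 200 and the server device 204.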
[0122]
In the above description, the case of downloading speech synthesis data has been described. The process for downloading a template after synthesizing a desired sound is shown in the flowchart of FIG. 28.
[0123]
When the operator of the terminal device 200 clicks the template creation button 92, a template creation screen request is transmitted to the server device 204 (step S121). In response to this, a template creation screen is sent from the server device 204 (step S211), and the terminal device 200 displays this screen (step S122). This screen is, for example, a screen as shown in FIG. 19.
[0124]
The operator of the terminal device 200 inputs replacement part designation, part of speech designation, and the like (step S123). In this input process, the server device 204 creates a change screen based on the input data, but this is omitted in the flowchart. As a result of the data input, for example, a screen as shown in FIG. 21 is displayed.
[0125]
When the operator of the terminal device 200 clicks the template save button 94, a template download request is transmitted to the server device 204 (step S124). The server device 204 creates a template (step S212), and stores the created template data in the terminal device (step S213). Thereby, template data as shown in FIG. 22 can be saved in the terminal device 200 (step S125).
[0126]
Speech synthesis data and templates obtained in this way can be distributed to other people via the Internet 202 or the like. Others who have received the voice synthesis data can listen to the synthesized voice if they have the sound card 22. Also, another person who has received the template can access the server device 204 from the terminal device 206 and execute the template processing program to obtain desired speech synthesis data.
[0127]
FIG. 29 shows a processing flowchart for obtaining speech synthesis data based on a template. The operator of the terminal device 206 accesses the server device 204 and transmits a template (step S151). In response to this, the server device 204 transmits a template screen (step S251). For example, a screen as shown in FIG. 24 is transmitted. The terminal device 200 displays this.
[0128]
The operator inputs a desired character string in the replacement part 120 of the template screen (step S152). Further, the operator clicks a voice synthesis command button, and transmits the voice synthesis command to the server device 204 (step S153).
[0129]
In response to this, the server device 204 generates a parameter (step S252) and performs speech synthesis (step S253). Further, the generated speech synthesis data is transmitted to the terminal device 200 (step S254). The terminal device 200 reproduces this voice synthesis data (step S154). In this way, speech synthesis can be performed. Further, the speech synthesis data can be stored in the terminal device.
[0130]
In each of the above embodiments, each function is realized by a program, but a part or all of the functions may be realized by a logic circuit.
[Brief description of the drawings]
FIG. 1 is a diagram showing an overall configuration of a speech synthesis control device and a speech synthesis unit according to an embodiment of the present invention.
FIG. 2 is a diagram showing a hardware configuration when the apparatus of FIG. 1 is realized using a CPU.
FIG. 3 is a flowchart of an interface program.
FIG. 4 is a flowchart of a speech synthesis processing part.
FIG. 5 is a flowchart of an editing process part.
FIG. 6 is a flowchart of storing voice synthesis data.
FIG. 7 is a flowchart of template data storage.
FIG. 8 is an example of an input / work screen.
FIG. 9 is a diagram showing generated parameters.
FIG. 10 is an example of a screen displayed by modifying the form of a character string based on parameters.
FIG. 11 is a diagram showing an editing screen for reading.
FIG. 12 is a diagram showing corrected parameters.
FIG. 13 is a diagram showing a corrected input / work screen.
FIG. 14 is a diagram showing corrected parameters.
FIG. 15 is a diagram showing an accent editing screen.
FIG. 16 is a diagram showing an accent editing screen.
FIG. 17 is a diagram showing a phoneme piece editing screen.
FIG. 18 is a display example of characteristics of phoneme pieces.
FIG. 19 is a diagram showing a template creation screen.
FIG. 20 is a diagram showing a screen for part-of-speech selection.
FIG. 21 is a diagram showing a template creation screen.
FIG. 22 is a diagram showing template data.
FIG. 23 is a flowchart of a template processing program.
FIG. 24 is a screen when performing speech synthesis using a template.
FIG. 25 shows a system configuration when speech synthesis is performed from the terminal device 200 using the server device 204.
FIG. 26 is a flowchart of speech synthesis processing. The terminal device side represents browser program processing, and the server device side represents interface program and speech synthesis engine processing.
FIG. 27 is a flowchart of speech synthesis processing.
FIG. 28 is a flowchart for creating a template.
FIG. 29 is a flowchart showing a reproduction process using a template.
[Explanation of symbols]
2 ... Speech synthesis control device
4 ... Speech synthesis unit
6 ... Interface screen

Claims

A speech synthesis control device for interfacing with a speech synthesis unit,
Upon receiving a voice synthesis command, the given character string is given to the voice synthesizer, and from the voice synthesizer, the voice synthesis data corresponding to the character string and the parameters used for the voice synthesis are acquired,
Display a modified string based on the parameters,
When the modification of the character string is corrected by the operator and a speech synthesis command is given, the parameters are corrected based on the corrected modification and given to the speech synthesis unit, and corrected speech synthesis data is acquired from the speech synthesis unit,
The parameters include accent information and characteristic information of a plurality of phonemes that are candidates for speech synthesis for each character.
In the modification of the character string, the position of each character is moved in a direction perpendicular to the arrangement direction of the character string according to the height of the accent, and a series of speech including a corresponding phoneme piece for each character. In the data, addition of a display of a plurality of phoneme candidate candidates by the height of the accent of the phoneme and the phoneme before and after the phoneme,
When the operator changes the vertical position of each character or changes the phoneme candidate to be used and gives a speech synthesis command, the speech synthesizer displays the changed accent and the characteristics of the changed phoneme candidate. A speech synthesis control device that acquires the synthesized speech synthesis data from the speech synthesis unit.
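
As an illustration only, the relationship between the accent parameter and the displayed vertical position of each character might be sketched as follows. The names `CharParam`, `decorate`, `apply_edit`, and the pixel step of 8 are assumptions made for this sketch and do not appear in the patent; the accent scale (a larger number meaning a higher pitch) is likewise assumed.

```python
from dataclasses import dataclass, field


@dataclass
class CharParam:
    """Hypothetical per-character parameters returned by a synthesis unit."""
    char: str
    accent: int                                      # assumed scale: larger = higher
    candidates: list = field(default_factory=list)   # candidate phoneme piece ids
    selected: int = 0                                # index of the candidate in use


def decorate(params, step=8):
    """Return (char, y_offset) pairs; a higher accent moves the character up."""
    return [(p.char, -p.accent * step) for p in params]


def apply_edit(params, index, new_y=None, new_candidate=None, step=8):
    """Map an operator's edit of the display back onto the parameters."""
    p = params[index]
    if new_y is not None:
        # Dragging a character vertically changes its accent height.
        p.accent = round(-new_y / step)
    if new_candidate is not None:
        if not 0 <= new_candidate < len(p.candidates):
            raise IndexError("no such phoneme piece candidate")
        p.selected = new_candidate
    return params


params = [CharParam("あ", 0, ["a1", "a2"]), CharParam("め", 1, ["m1"])]
print(decorate(params))
apply_edit(params, 0, new_y=-16, new_candidate=1)
print(params[0].accent, params[0].candidates[params[0].selected])
```

The corrected `params` list is what would then be handed back to the speech synthesis unit together with the next speech synthesis command.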

An interface program for realizing, using a computer, an interface to a speech synthesis unit, the program causing the computer to perform processing in which:
upon receiving a speech synthesis command, a given character string is given to the speech synthesis unit, and the speech synthesis data corresponding to the character string and the parameters used for the speech synthesis are acquired from the speech synthesis unit;
the character string is modified based on the parameters and displayed;
when the modification of the character string is corrected by an operator and a speech synthesis command is given, the parameters are corrected based on the corrected modification and given to the speech synthesis unit, and corrected speech synthesis data is acquired from the speech synthesis unit;
the parameters include accent information and characteristic information of a plurality of phoneme pieces that are candidates for speech synthesis for each character;
the modification of the character string consists of moving the position of each character in a direction perpendicular to the arrangement direction of the character string according to the height of the accent, and adding, for each character, a display of a plurality of phoneme piece candidates, each shown by the accent heights of the phoneme piece and of the phoneme pieces before and after it within the series of speech data that includes the corresponding phoneme piece; and
when the operator changes the vertical position of a character or changes the phoneme piece candidate to be used and gives a speech synthesis command, the changed accent and the characteristics of the changed phoneme piece candidate are given to the speech synthesis unit, and the changed speech synthesis data is acquired from the speech synthesis unit.

The apparatus of claim 1 or the program of claim 2,
wherein the character string is a character string containing kanji or a kana character string.

The apparatus or program of any one of claims 1 to 3, wherein:
the character string given by the operator is a character string containing kanji;
the speech synthesis unit generates a kana character string corresponding to the given character string containing kanji; and
the kana character string received from the speech synthesis unit is displayed with the modification applied.

An interface program for realizing, using a computer, an interface to a speech synthesis unit, the program causing the computer to perform processing in which:
upon receiving a speech synthesis command, a given kanji character string is given to the speech synthesis unit, and the speech synthesis data corresponding to the kanji character string, the kana character string corresponding to the kanji character string, and the parameters used for the speech synthesis are acquired from the speech synthesis unit;
the kana character string is modified based on the parameters and displayed;
when the modification of the kana character string is corrected by an operator and a speech synthesis command is given, the parameters are corrected based on the corrected modification and given to the speech synthesis unit, and corrected speech synthesis data is acquired from the speech synthesis unit;
the parameters include accent information and characteristic information of a plurality of phoneme pieces that are candidates for speech synthesis for each character;
the modification of the character string consists of moving the position of each character in a direction perpendicular to the arrangement direction of the character string according to the height of the accent, and adding, for each character, a display of a plurality of phoneme piece candidates, each shown by the accent heights of the phoneme piece and of the phoneme pieces before and after it within the series of speech data that includes the corresponding phoneme piece; and
when the operator changes the vertical position of a character or changes the phoneme piece candidate to be used and gives a speech synthesis command, the changed accent and the characteristics of the changed phoneme piece candidate are given to the speech synthesis unit, and the changed speech synthesis data is acquired from the speech synthesis unit.

The program of claim 5,
wherein, when the displayed kana character string is corrected by the operator and a speech synthesis command is given, the corrected kana character string is given to the speech synthesis unit, and corrected speech synthesis data is acquired from the speech synthesis unit.

The apparatus or program of any one of claims 1 to 6,
wherein the parameters include a parameter relating to the sound length corresponding to each character.

The apparatus or program of any one of claims 1 to 6,
wherein the plurality of phoneme piece candidates for a character are not displayed until that character is clicked by the user.

The apparatus or program of claim 8, wherein:
the parameter is an accent break, a morpheme break, or both; and
the character string is modified such that a displayed delimiter is provided at the position of the break.

The apparatus or program of claim 9,
wherein, when the displayed delimiter is changed, the accent break held as a parameter is corrected accordingly.
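
A minimal sketch of how a displayed delimiter could mirror an accent break parameter is given below. The representation of breaks as a list of character indices, the `|` mark, and the function names `render_with_breaks` and `move_break` are assumptions made for illustration, not details taken from the patent.

```python
def render_with_breaks(kana, breaks, mark="|"):
    """Insert a visible delimiter before each character index listed in breaks."""
    out = []
    for i, ch in enumerate(kana):
        if i in breaks:
            out.append(mark)
        out.append(ch)
    return "".join(out)


def move_break(breaks, old, new):
    """Moving a delimiter on screen updates the break parameter to match."""
    if old not in breaks:
        raise ValueError("no break at that position")
    return sorted((set(breaks) - {old}) | {new})


breaks = [4]
print(render_with_breaks("きょうはあめ", breaks))
breaks = move_break(breaks, 4, 3)
print(render_with_breaks("きょうはあめ", breaks))
```

Because the display is derived entirely from the break list, correcting the delimiter and correcting the parameter are the same operation in this sketch.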

The apparatus or program of any one of claims 1 to 10,
wherein the speech synthesis data is saved as a voice file in response to a save command.

The apparatus or program of any one of claims 1 to 11,
wherein the character string and the parameters are saved as a voice characteristic file in response to a save command.

The apparatus or program of claim 12,
wherein a replacement part, for which no specific character string is determined, is provided in a part of the character string, and information for generating parameters is recorded for the replacement part.

A speech synthesis control device, wherein:
a character string is received and parameters are calculated;
speech synthesis data corresponding to the character string is generated based on the character string and the parameters;
the character string is modified based on the parameters and displayed;
when the modification of the character string is corrected by an operator and a speech synthesis command is given, the parameters are corrected based on the corrected modification and speech synthesis data is generated;
the parameters include accent information and characteristic information of a plurality of phoneme pieces that are candidates for speech synthesis for each character;
the modification of the character string consists of moving the position of each character in a direction perpendicular to the arrangement direction of the character string according to the height of the accent, and adding, for each character, a display of a plurality of phoneme piece candidates, each shown by the accent heights of the phoneme piece and of the phoneme pieces before and after it within the series of speech data that includes the corresponding phoneme piece; and
when the operator changes the vertical position of a character or changes the phoneme piece candidate to be used and gives a speech synthesis command, speech synthesis data is generated based on the changed accent and the characteristics of the changed phoneme piece candidate.

A speech synthesis program for performing speech synthesis processing using a computer, the program causing the computer to perform processing in which:
a character string is received and parameters are calculated;
speech synthesis data corresponding to the character string is generated based on the character string and the parameters;
the character string is modified based on the parameters and displayed;
when the modification of the character string is corrected by an operator and a speech synthesis command is given, the parameters are corrected based on the corrected modification and speech synthesis data is generated;
the parameters include accent information and characteristic information of a plurality of phoneme pieces that are candidates for speech synthesis for each character;
the modification of the character string consists of moving the position of each character in a direction perpendicular to the arrangement direction of the character string according to the height of the accent, and adding, for each character, a display of a plurality of phoneme piece candidates, each shown by the accent heights of the phoneme piece and of the phoneme pieces before and after it within the series of speech data that includes the corresponding phoneme piece; and
when the operator changes the vertical position of a character or changes the phoneme piece candidate to be used and gives a speech synthesis command, speech synthesis data is generated based on the changed accent and the characteristics of the changed phoneme piece candidate.

A speech synthesis server device capable of communicating with a terminal device, wherein:
a character string is received and parameters are calculated;
speech synthesis data corresponding to the character string is generated based on the character string and the parameters;
data for displaying the character string modified based on the parameters is sent to the terminal device;
when the modification of the character string is corrected by the operator of the terminal device and a speech synthesis command is transmitted, speech synthesis data is generated based on the parameters corresponding to the corrected modification and transmitted to the terminal device;
the parameters include accent information and characteristic information of a plurality of phoneme pieces that are candidates for speech synthesis for each character;
the modification of the character string consists of moving the position of each character in a direction perpendicular to the arrangement direction of the character string according to the height of the accent, and adding, for each character, a display of a plurality of phoneme piece candidates, each shown by the accent heights of the phoneme piece and of the phoneme pieces before and after it within the series of speech data that includes the corresponding phoneme piece; and
when the operator changes the vertical position of a character or changes the phoneme piece candidate to be used and gives a speech synthesis command, speech synthesis data is generated based on the changed accent and the characteristics of the changed phoneme piece candidate.
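
The exchange between terminal device and server device described above could look roughly like the following sketch. The JSON message shape, the function names `fake_engine` and `handle_request`, and the default accent pattern are all hypothetical; the stand-in engine produces placeholder bytes, not real audio.

```python
import json


def fake_engine(text, accents=None):
    """Hypothetical stand-in for the speech synthesis engine."""
    if accents is None:
        # Assumed default: first character low, the rest high.
        accents = ([0] + [1] * (len(text) - 1)) if text else []
    audio = bytes(a & 0xFF for a in accents)  # placeholder, not real audio
    return audio, accents


def handle_request(body):
    """Server side: synthesize, then return the data the terminal needs for display."""
    req = json.loads(body)
    audio, accents = fake_engine(req["text"], req.get("accents"))
    reply = {"text": req["text"], "accents": accents, "audio_len": len(audio)}
    return json.dumps(reply, ensure_ascii=False)


# First request: the server calculates the parameters itself.
first = json.loads(handle_request(json.dumps({"text": "あめ"})))
# Second request: the operator has corrected the accents on the terminal.
second = json.loads(handle_request(json.dumps({"text": "あめ", "accents": [1, 0]})))
print(first["accents"], second["accents"])
```

In this sketch the terminal only ever sends the character string and, after an edit, the corrected parameters, so all synthesis stays on the server side.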

A program for realizing, by a computer, a speech synthesis server device capable of communicating with a terminal device, the program causing the computer to perform processing in which:
a character string is received and parameters are calculated;
speech synthesis data corresponding to the character string is generated based on the character string and the parameters;
data for displaying the character string modified based on the parameters is sent to the terminal device;
when the modification of the character string is corrected by the operator of the terminal device and a speech synthesis command is transmitted, speech synthesis data is generated based on the parameters corresponding to the corrected modification and transmitted to the terminal device;
the parameters include accent information and characteristic information of a plurality of phoneme pieces that are candidates for speech synthesis for each character;
the modification of the character string consists of moving the position of each character in a direction perpendicular to the arrangement direction of the character string according to the height of the accent, and adding, for each character, a display of a plurality of phoneme piece candidates, each shown by the accent heights of the phoneme piece and of the phoneme pieces before and after it within the series of speech data that includes the corresponding phoneme piece; and
when the operator changes the vertical position of a character or changes the phoneme piece candidate to be used and gives a speech synthesis command, speech synthesis data is generated based on the changed accent and the characteristics of the changed phoneme piece candidate.

The server device or program of claim 16 or claim 17, wherein:
the server device transmits a voice characteristic file to the terminal device in response to a request from the terminal device;
the voice characteristic file has a character string part and a replacement part arranged in the order of utterance;
the character string part records a character string and the parameters corresponding to each of its characters; and
the replacement part records information for determining what parameters are to be given to each character of a character string when that character string is inserted.
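
One possible in-memory shape for such a voice characteristic file is sketched below. The dictionary layout, the rule names `flat` and `rise_after_first`, and the functions `accents_from_rule` and `fill_template` are assumptions chosen for illustration; the patent does not specify a file format here.

```python
def accents_from_rule(rule, n):
    """Generate accent parameters for n inserted characters from a recorded rule."""
    if rule["type"] == "flat":
        return [rule["accent"]] * n
    if rule["type"] == "rise_after_first":
        return ([0] + [1] * (n - 1)) if n else []
    raise ValueError("unknown rule type")


def fill_template(parts, inserts):
    """Walk the parts in utterance order and build the full string and parameters."""
    chars, accents = [], []
    for part in parts:
        if part["kind"] == "string":
            # Character string part: text and per-character parameters are stored.
            chars.extend(part["text"])
            accents.extend(part["accents"])
        else:
            # Replacement part: only the rule for generating parameters is stored.
            text = inserts[part["name"]]
            chars.extend(text)
            accents.extend(accents_from_rule(part["rule"], len(text)))
    return "".join(chars), accents


template = [
    {"kind": "string", "text": "こんにちは", "accents": [0, 1, 1, 1, 1]},
    {"kind": "replace", "name": "name", "rule": {"type": "rise_after_first"}},
    {"kind": "string", "text": "さん", "accents": [1, 0]},
]
print(fill_template(template, {"name": "たなか"}))
```

Storing a rule rather than fixed values for the replacement part is what lets the same file be reused with inserted strings of different lengths.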

A speech synthesis method, wherein:
a character string is received and parameters are calculated;
speech synthesis data corresponding to the character string is generated based on the character string and the parameters;
the character string is modified based on the parameters and displayed;
when the modification of the character string is corrected by an operator and a speech synthesis command is given, the parameters are corrected based on the corrected modification and speech synthesis data is generated;
the parameters include accent information and characteristic information of a plurality of phoneme pieces that are candidates for speech synthesis for each character;
the modification of the character string consists of moving the position of each character in a direction perpendicular to the arrangement direction of the character string according to the height of the accent, and adding, for each character, a display of a plurality of phoneme piece candidates, each shown by the accent heights of the phoneme piece and of the phoneme pieces before and after it within the series of speech data that includes the corresponding phoneme piece; and
when the operator changes the vertical position of a character or changes the phoneme piece candidate to be used and gives a speech synthesis command, the changed accent and the characteristics of the changed phoneme piece candidate are given to a speech synthesis unit, and the changed speech synthesis data is obtained from the speech synthesis unit.

  A speech synthesis control device for interfacing with a speech synthesis unit, wherein:
  upon receiving a speech synthesis command, a given character string is given to the speech synthesis unit, and the speech synthesis data corresponding to the character string and the parameters used for the speech synthesis are acquired from the speech synthesis unit;
  the character string is modified based on the parameters and displayed;
  when the modification of the character string is corrected by an operator and a speech synthesis command is given, the parameters are corrected based on the corrected modification and given to the speech synthesis unit, and corrected speech synthesis data is acquired from the speech synthesis unit;
  the device is configured to save the character string and the parameters as a voice characteristic file in response to a save command; and
  a replacement part, for which no specific character string is determined, is provided in a part of the character string, and information for generating parameters is recorded for the replacement part.

  An interface program for realizing, using a computer, an interface to a speech synthesis unit, the program causing the computer to perform:
  processing in which, upon receiving a speech synthesis command, a given character string is given to the speech synthesis unit, and the speech synthesis data corresponding to the character string and the parameters used for the speech synthesis are acquired from the speech synthesis unit;
  processing in which the character string is modified based on the parameters and displayed;
  processing in which, when the modification of the character string is corrected by an operator and a speech synthesis command is given, the parameters are corrected based on the corrected modification and given to the speech synthesis unit, and corrected speech synthesis data is acquired from the speech synthesis unit;
  processing in which the character string and the parameters are saved as a voice characteristic file in response to a save command; and
  processing in which a replacement part, for which no specific character string is determined, is provided in a part of the character string, and information for generating parameters is recorded for the replacement part.

  A speech synthesis server device capable of communicating with a terminal device, wherein:
  a character string is received and parameters are calculated;
  speech synthesis data corresponding to the character string is generated based on the character string and the parameters;
  data for displaying the character string modified based on the parameters is sent to the terminal device;
  when the modification of the character string is corrected by the operator of the terminal device and a speech synthesis command is transmitted, speech synthesis data is generated based on the parameters corresponding to the corrected modification and transmitted to the terminal device;
  the server device transmits a voice characteristic file to the terminal device in response to a request from the terminal device;
  the voice characteristic file has a character string part and a replacement part arranged in the order of utterance;
  the character string part records a character string and the parameters corresponding to each of its characters; and
  the replacement part records information for determining what parameters are to be given to each character of a character string when that character string is inserted.

  A program for realizing, by a computer, a speech synthesis server device capable of communicating with a terminal device, the program causing the computer to perform processing in which:
  a character string is received and parameters are calculated;
  speech synthesis data corresponding to the character string is generated based on the character string and the parameters;
  data for displaying the character string modified based on the parameters is sent to the terminal device;
  when the modification of the character string is corrected by the operator of the terminal device and a speech synthesis command is transmitted, speech synthesis data is generated based on the parameters corresponding to the corrected modification and transmitted to the terminal device;
  the server device transmits a voice characteristic file to the terminal device in response to a request from the terminal device;
  the voice characteristic file has a character string part and a replacement part arranged in the order of utterance;
  the character string part records a character string and the parameters corresponding to each of its characters; and
  the replacement part records information for determining what parameters are to be given to each character of a character string when that character string is inserted.