JP5787794B2 - Speech synthesis system, speech conversion support device, and speech conversion support method

Info

Publication number: JP5787794B2
Application number: JP2012048135A
Authority: JP
Inventors: 町田　淳
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2012-03-05
Filing date: 2012-03-05
Publication date: 2015-09-30
Anticipated expiration: 2032-03-05
Also published as: JP2013182256A

Description

Embodiments of the present invention relate to a speech synthesis system, a speech conversion support apparatus, and a speech conversion support method used in a service that converts characters such as text (letters, symbols, etc.) and graphics into speech.

In recent years, services that convert text into speech have been launched, for example on the Internet, and these services use speech synthesizers.

In general, a speech synthesizer converts the wording (character string) of text that a user enters from a terminal into synthesized speech waveform data and returns an audio signal or an audio file to the terminal.

When the text entered by the user is written in hiragana, for example, it may contain homonyms, that is, words that are written identically but pronounced differently depending on their meaning. When speech conversion is performed on wording that contains such words, an audio signal or audio file with a pronunciation different from what the user intended may be returned to the terminal.

Japanese Utility Model Publication No. 06-28900

In this case, the user has the speech synthesizer reconvert the corrected text, that is, redo the speech conversion. Such redoing not only places a heavy load on the speech synthesizer but also takes a corresponding amount of time, so it is processing that should be avoided if possible.

The problem to be solved by the present invention is to provide a speech synthesis system, a speech conversion support apparatus, and a speech conversion support method that can eliminate pronunciation errors in the speech obtained by converting the wording of text into speech.

The speech synthesis system and the speech conversion support apparatus of the embodiment include a storage unit, a search unit, a test playback unit, and a control unit. The storage unit stores audio data already converted by a speech conversion apparatus together with the source text data corresponding to that audio data. The search unit takes, as a keyword, the text data of a designated test playback portion of the conversion-target text data newly input from a terminal, and searches whether audio data for the test playback portion exists in the storage unit. If the search by the search unit finds the audio data in the storage unit, the test playback unit plays that audio data back as a test. When the control unit receives from the terminal an instruction that the audio data played back by the test playback unit is correct, it combines the audio data obtained by having the speech conversion apparatus convert the text data excluding the test playback portion with the existing audio data that was test-played, and transfers the result to the terminal.

FIG. 1 is a diagram showing the overall configuration of the speech synthesis system of the embodiment.
FIG. 2 is a block diagram of the application server.
FIG. 3 is a diagram showing an example of index information.
FIG. 4 is a flowchart showing the operation of the speech synthesis system.
FIG. 5 is a flowchart showing the operation of the speech synthesis system.
FIG. 6 is a flowchart showing the search operation for an existing audio file.

Hereinafter, embodiments will be described in detail with reference to the drawings.
(First Embodiment) FIG. 1 is a diagram showing the configuration of the speech synthesis system of the embodiment.

As shown in FIG. 1, the speech synthesis system of this embodiment comprises a computer 1 (hereinafter "user PC 1"), which is a terminal operated by a service user (hereinafter "user"); a speech synthesis server 3, which is a computer serving as the speech conversion apparatus; and a computer 2 (hereinafter "application server 2") serving as the speech conversion support apparatus. These devices are connected via a network 4.

The speech synthesis server 3 is equipped with a speech synthesis engine and converts text data into audio data. Specifically, the speech synthesis server 3 converts an intermediate file (pairs of text data and accent symbols) sent from the application server 2 into audio data (hereinafter "audio file") and returns it to the application server 2.

The application server 2 sits between the speech synthesis server 3 and the user PC 1 and exchanges text data, intermediate files, and audio files. Specifically, the application server 2 has the speech synthesis server 3 convert the text data input from the user PC 1 and transfers the resulting audio data to the user PC 1.

As shown in FIG. 2, the application server 2 includes a graphical user interface unit 21 (hereinafter "GUI unit 21"), a memory 22, an intermediate file generation unit 23, an index information storage unit 24 for managing cached data, a cache data storage unit 25, a search unit 26, a communication processing unit 27, a data processing unit 28, a registration unit 29, a test playback unit 30, and so on.

Using the text data input from the user PC 1 as a keyword (search key), the application server 2 consults the index information in the index information storage unit 24 to search for an audio file cached (stored) in the cache data storage unit 25. On a hit, it reads the cached audio file and returns it to the requesting user PC 1 without asking the speech synthesis server 3 to synthesize speech.

In other words, the application server 2 checks whether the data is cached on its own hard disk device. If it is not cached, the application server 2 outputs the text data input from the user PC 1 to the speech synthesis server 3, obtains the audio file converted (synthesized) by the speech synthesis server 3 as the response, and sends it to the user PC 1.
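The cache-first behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `get_audio`, the dictionary-based cache, and the `synthesize` callback standing in for the speech synthesis server 3 are all assumptions introduced here.

```python
from typing import Callable, Dict


def get_audio(
    text: str,
    cache: Dict[str, bytes],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Return audio for `text`, contacting the synthesizer only on a cache miss.

    `cache` is a hypothetical stand-in for the cache data storage unit 25,
    and `synthesize` for a request to the speech synthesis server 3.
    """
    if text in cache:
        # Hit: reuse the existing audio file, no synthesis request is made.
        return cache[text]
    # Miss: have the synthesis server convert the text.
    return synthesize(text)
```

The sketch deliberately does not write the new audio back into the cache, since the text registers results only at the end of the session.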

The GUI unit 21 displays the screen for logging in to the application server 2 from the user PC 1, the search screen, the registration screen, and so on. It accepts speech synthesis requests and text data input from the user PC 1 and sends the audio file to the user PC 1 as the response to a request. On the search screen, for example, the GUI unit 21 functions as an accepting unit that accepts the designated test playback portion of the conversion-target text data newly input from the user PC 1.

That is, the GUI unit 21 implements the input/output interface between the user PC 1 and the application server 2.

The memory 22 is used as a work area when the data processing unit 28, the search unit 26, the registration unit 29, and other units execute their processing, and as a temporary storage area for the intermediate file created when a conversion is requested.

The intermediate file generation unit 23 splits the conversion-target text data input from the user PC 1 into words or clauses. For each piece of split text data that is not cached, or whose pronunciation the user marked as incorrect during test playback, it uses the piece as a keyword to consult the speech conversion dictionary, reads the corresponding accent symbol from the dictionary, and generates an intermediate file of text data and accent symbol pairs. The intermediate file is stored in the memory 22 as the data for a conversion request and is sent to the speech synthesis server 3 as the source data for speech synthesis.
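As a rough sketch of this step, the function below pairs each unit that still needs synthesis with its accent symbol. The names `build_intermediate_file`, `accent_dict`, `cached`, and `rejected` are hypothetical, the input is assumed to be already split into words or clauses, and falling back to an empty accent string for units missing from the dictionary is an assumption the text does not address.

```python
from typing import Dict, List, Set, Tuple


def build_intermediate_file(
    units: List[str],
    accent_dict: Dict[str, str],
    cached: Set[str],
    rejected: Set[str],
) -> List[Tuple[str, str]]:
    """Build (text, accent symbol) pairs for units that must be synthesized.

    A unit needs synthesis when it is not cached, or when the user marked
    its test playback as mispronounced (`rejected`).
    """
    pairs: List[Tuple[str, str]] = []
    for unit in units:
        needs_synthesis = unit not in cached or unit in rejected
        if needs_synthesis:
            # Assumption: a unit missing from the dictionary gets an empty accent.
            pairs.append((unit, accent_dict.get(unit, "")))
    return pairs
```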

Text data that the user has already approved and that carries the conversion-unnecessary (confirmed) flag is not converted to speech, so no intermediate file is generated for it.

The cache data storage unit 25 stores files converted in the past (audio data files, text data files, intermediate data files, and so on).

That is, the cache data storage unit 25 stores the audio data already converted by the speech synthesis server 3 together with the source text data corresponding to that audio data.

The index information storage unit 24 stores, in ranked order, multiple pieces of index information with different narrowing ranges, which are used to search the past data (files) cached (stored) in the cache data storage unit 25 in descending order of affinity with the user.

The index information storage unit 24 stores a speech conversion dictionary, a conversion history, and so on for each user. The dictionary holds index information indicating where the audio data that the user registered independently is stored. The conversion history retains the index information produced each time an individual user performed a speech conversion.

The index information is a reference dictionary that stores pairs of conversion-source text data, which is associated with information (a link) indicating where the audio data is stored, and accent symbols (accent information) that specify the stress of the sound when the text data is converted into speech. The audio data may be audio data converted (synthesized) by the speech synthesis server 3. This speech conversion dictionary is used by the intermediate file generation unit 23.

That is, the index information storage unit 24 stores the index information for managing the data cached in the cache data storage unit 25. The index information is described in detail with reference to FIG. 3.

The search unit 26 takes, as a keyword, the text data of the designated test playback portion of the conversion-target text data newly input from the user PC 1, and searches whether audio data for the test playback portion exists in the cache data storage unit 25.

The search unit 26 searches the registered dictionary 42 for each user ID, the conversion history 43 for each user ID, and the server usage history 44 in a preset order of priority. Based on the text data input from the requester, the search unit 26 searches the multiple pieces of index information in the index information storage unit 24 in ranked order (the first, second, third, and fourth file collations in FIG. 6) to check whether an existing audio file (audio data) exists.

In the first file collation, the search unit 26 uses the text data of the test playback portion as a keyword and consults the registered dictionary 42 (User1) of the logged-in user's own user ID to search for the audio file (audio data) of the test playback portion.

In the second file collation, the search unit 26 uses the text data of the test playback portion as a keyword and consults the conversion history 43 (User1) of the logged-in user's own user ID to search for the audio file (audio data) of the test playback portion.

In the third file collation, the search unit 26 uses the text data of the test playback portion as a keyword and consults the registered dictionaries 42 (User2, ...) of multiple users to search for the audio file (audio data) of the test playback portion.

In the fourth file collation, the search unit 26 uses the text data of the test playback portion as a keyword and consults the conversion histories 43 (User2, ...) of the user IDs of multiple users to search for the audio file (audio data) of the test playback portion.
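The four collations can be read as a prioritized lookup that stops at the first hit. The sketch below illustrates that ordering under stated assumptions: the function name, the nested dictionaries keyed by user ID, and the choice to skip the logged-in user's own data in the third and fourth stages are all hypothetical and not specified by the text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: user ID -> {text -> storage location of the audio file}
UserIndex = Dict[str, Dict[str, str]]


def find_existing_audio(
    text: str,
    user_id: str,
    registered: UserIndex,
    history: UserIndex,
) -> Optional[Tuple[int, str]]:
    """Return (collation stage, storage location) of the first hit, or None."""
    stages: List[List[Dict[str, str]]] = [
        [registered.get(user_id, {})],                            # 1: own dictionary
        [history.get(user_id, {})],                               # 2: own history
        [d for uid, d in registered.items() if uid != user_id],   # 3: others' dictionaries
        [d for uid, d in history.items() if uid != user_id],      # 4: others' histories
    ]
    for stage_number, indexes in enumerate(stages, start=1):
        for index in indexes:
            if text in index:
                return stage_number, index[text]
    return None
```

Searching the user's own entries first reflects the stated goal of preferring results with higher affinity to the user.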

In addition, when a test playback portion of the text data input from the user PC 1 is designated on the screen that the GUI unit 21 displays on the user PC 1, the search unit 26 searches whether past converted data for the test playback portion accepted by the GUI unit 21, that is, an audio file (audio data), exists in the cache data storage unit 25.

If the search by the search unit 26 finds past converted data cached (stored) in the cache data storage unit 25, that is, an existing audio file (audio data), the test playback unit 30 plays that existing audio file back as a test.

The communication processing unit 27 exchanges data with the speech synthesis server 3 over TCP (HTTP) communication.

Following the search by the search unit 26, the data processing unit 28 reads the audio file contained in the cached existing file from the storage location of the past data indicated by the index information in which the text data (also called the search key or keyword) was found, that is, from the cache data storage unit 25, and transfers it to the user PC 1 that requested the conversion.

When the user indicates on the screen of the user PC 1 that the pronunciation of the audio data of the test playback portion played back by the test playback unit 30 is correct, and the data processing unit 28 receives that instruction from the user PC 1, the data processing unit 28 functions as a control unit: it combines the audio data obtained by having the speech synthesis server 3 convert the text data excluding the test playback portion with the existing audio data that was test-played, and returns the result to the user PC 1.

On the other hand, if the keyword is not found in any of the index information, the data processing unit 28 sends the intermediate file, which the intermediate file generation unit 23 generated from the input text data and stored in the memory 22, to the speech synthesis server 3, and transfers the audio file converted (synthesized) by the speech synthesis server 3 to the requesting user PC 1. The input text data may be sent instead of the intermediate file.

On the dictionary registration screen displayed by the GUI unit 21, the registration unit 29 registers the information that the user entered or edited independently (audio data, text data, and accent symbols) in the registered dictionary 42 for each user ID in the memory 22, and registers the management information of the registered dictionary 42 in the memory 22 in the index information storage unit 24.

As shown in FIG. 3, the index information storage unit 24 stores multiple pieces of index information linked to each user ID in the user ID table 41. There are three kinds of index information: the registered dictionary 42 for each user ID (User 1, 2, ...), the conversion history 43 for each user ID (User 1, 2, ...), and the usage history 44 of all users of this application server 2 (hereinafter "server usage history 44"). The conversion history 43 of a single user logged in to the application server 2 is called the first conversion history. The conversion history of multiple users who have logged in to the application server 2 (for example, all users who have ever logged in) is called the second conversion history.

The user ID table 41 holds the user IDs, which are the identification information of users who can log in to the application server 2. The user identification information includes not only the user ID but also a password and the like.

The registered dictionary 42 for each user ID (User 1, 2, ...) stores the text data that the user registered independently (referred to as "text"), the intermediate file, which is a pair of the text and its accent symbol, and a storage location index indicating where the audio file corresponding to these data is stored.

The registered dictionary 42 for each user ID (User 1, 2, ...) is assigned the first and second ranks in the search order and is consulted first when the search unit 26 searches the cache data. Within the dictionary, the first rank is the intermediate file and the second rank is the text. As index information, its search order is first.

The conversion history 43 for each user ID (User 1, 2, ...) stores, for each logged-in user ID, the text used when the speech conversion function was used, the intermediate file, which is a pair of the text and its accent symbol, and a storage location index indicating where the audio file corresponding to these data is stored. The text, intermediate file, storage location index, and so on are collectively called the usage history.

The conversion history 43 for each user ID (User 1, 2, ...) is assigned the third and fourth ranks in the search order and is consulted third and fourth when the search unit 26 searches the cache data. Within the history, the third rank is the intermediate file and the fourth rank is the text. As index information, its search order is second.

The server usage history 44 stores the usage history of all users who logged in to the application server 2 and used the speech conversion function. The server usage history 44 is assigned the fifth and sixth ranks in the search order and is consulted fifth and sixth when the search unit 26 searches the cache data. Within the history, the fifth rank is the intermediate file and the sixth rank is the text. As index information, its search order is third.
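One way to picture the six ranks is as two passes over each of the three indexes: a match on the intermediate file (text plus accent symbol) first, then a match on the text alone. The sketch below is only an illustration; the `IndexEntry` record, its field names, and the function `lookup_by_rank` are assumptions, and the text does not state how a match on an intermediate file is actually decided.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical record: text, its accent symbol, and the audio file location."""
    text: str
    accent: str
    location: str


def lookup_by_rank(
    text: str,
    accent: str,
    registered: List[IndexEntry],
    conversion_history: List[IndexEntry],
    server_history: List[IndexEntry],
) -> Optional[Tuple[int, str]]:
    """Return (rank 1-6, location) of the highest-priority match, or None."""
    rank = 0
    for entries in (registered, conversion_history, server_history):
        rank += 1  # odd ranks (1, 3, 5): match on the intermediate file
        for entry in entries:
            if entry.text == text and entry.accent == accent:
                return rank, entry.location
        rank += 1  # even ranks (2, 4, 6): match on the text only
        for entry in entries:
            if entry.text == text:
                return rank, entry.location
    return None
```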

Next, the operation of the speech synthesis system of this embodiment is described with reference to the flowcharts of FIGS. 4 to 6. First, the operation of the system as a whole is described with reference to the flowcharts of FIGS. 4 and 5.
In the speech synthesis system of this embodiment, when the user enters a predetermined URL from the user PC 1 and accesses the application server 2, the GUI unit 21 displays a login screen on the user PC 1. The user enters login information, such as a login ID (for example, "User1"), in the input field of the displayed login screen (step S101 in FIG. 4). A password or the like may also be entered as login information.

The GUI unit 21 then checks the entered login information against the user IDs in the user ID table 41 in the memory 22 to determine whether the login information is registered (step S102). If it is registered (Yes in step S102), the GUI unit 21 permits login to the application server 2 and displays the speech conversion screen (step S103).

When the user moves the cursor to the character input field of the speech conversion screen and types the characters to be converted (text data) (step S104), the GUI unit 21 accepts the input and displays the entered text data in the character input field (step S105).

When the button for instructing audio file creation displayed on the speech conversion screen is pressed (Yes in step S106), processing proceeds to the audio file creation routine.

On the other hand, when the user designates a test playback portion of the text data displayed in the character input field by an operation such as range selection (for example, a mouse drag) and presses the test playback button displayed on the speech conversion screen (Yes in step S107), the GUI unit 21 accepts the button press and notifies the search unit 26.

The search unit 26 searches whether an existing audio file exists within the apparatus (the memory 22 or the index information storage unit 24) (step S108).

More specifically, using the text data of the designated test playback portion of the conversion-target text data as a keyword, the search unit 26 consults the index information storage unit 24 to search whether index information for the file of the test playback portion exists. If the index information exists, the search unit 26 determines that an existing audio file is present (step S109), reads the audio file of the test playback portion from the cache data storage unit 25 according to the index information, and caches it in the playback work area of the memory 22 (step S110). This audio file is referred to as <data 1>.

If the search by the search unit 26 finds an existing file, the test playback unit 30 reads the existing audio file of the test playback portion cached in the playback work area and plays the audio from the speaker of the user PC 1 (step S111).

When the user who heard the audio checks the pronunciation and, judging it correct, selects a button such as "OK" on the speech conversion screen (Yes in step S112), the data processing unit 28 confirms the text data of the test-played portion (step S113) and attaches the conversion-unnecessary flag to the confirmed text data. If there is further text data to test-play, processing returns to step S104 and the above processing is repeated.

When there is no more text data to test-play, test playback of all test playback portions has finished, and the button for instructing audio file creation is pressed (step S115), the data processing unit 28 splits the text data of the wording entered in the character input field into words or clauses (step S116), passes the text data to the search unit 26 in those split units, and has it search for existing audio files stored in the apparatus in the same way as in step S108 (step S117). The details of the search for existing audio files are described later with reference to FIG. 6. In this case, test-played text data that the user has already approved and that carries the conversion-unnecessary (confirmed) flag is excluded from the search because it will not be converted to speech.

If the search finds an existing audio file, the search unit 26 determines that an existing audio file is present (Yes in step S118), reads the audio file from the cache data storage unit 25 according to the index information, and caches it in the playback work area of the memory 22 (step S119). This audio file is referred to as <data 2>.

If the search finds no existing audio file in the apparatus (No in step S118), the data processing unit 28 generates an intermediate file for the split text data that was searched, transfers it to the speech synthesis server 3 (step S120), and obtains the audio file converted by the speech synthesis server 3 (step S121).

The obtained audio file (the product) is then cached in the playback work area of the memory 22 (step S122). This audio file is referred to as <data 3>.

The data processing unit 28 then combines the audio files cached in the playback work area (<data 1>, <data 2>, and <data 3>) in the order in which the text was split (step S123), generates an audio file that matches the wording, and plays the combined audio file from the speaker of the user PC 1 (step S124).
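The combining step can be sketched as walking the split units in their original order and taking each unit's audio from whichever source holds it. This is an illustrative sketch only: `assemble_audio` and its parameters are hypothetical names, and the audio is assumed to be raw PCM bytes in a single shared format so that plain concatenation is valid. A container format with headers would need a proper audio library instead.

```python
from typing import Dict, List


def assemble_audio(
    units: List[str],
    confirmed: Dict[str, bytes],    # <data 1>: test-played and approved
    cached: Dict[str, bytes],       # <data 2>: found by the cache search
    synthesized: Dict[str, bytes],  # <data 3>: newly converted by the server
) -> bytes:
    """Concatenate per-unit audio in split order (assumes headerless PCM)."""
    parts: List[bytes] = []
    for unit in units:
        for source in (confirmed, cached, synthesized):
            if unit in source:
                parts.append(source[unit])
                break
        else:
            # No source holds audio for this unit; fail loudly rather than skip it.
            raise KeyError(f"no audio available for unit {unit!r}")
    return b"".join(parts)
```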

The user who heard the audio checks the pronunciation of the playback. If the pronunciation is wrong, an editing screen showing the intermediate file (text data and accent symbols) of the audio file is displayed (step S126), together with a message such as "Please correct the accent symbol at the incorrect location" to prompt the user to make a correction. When the user corrects the accent symbol (step S127), processing returns to step S124 and the audio is played again.

When the user, having confirmed that the pronunciation is correct, selects the save-audio button on the screen (step S128), the data processing unit 28 displays a screen for designating the save destination. Once a destination is designated (step S129), the data processing unit 28 transfers the audio file to the designated location on the user PC 1 and saves it there (step S130).

Thereafter, processing branches depending on whether it is to continue. If no end operation is performed, processing is treated as continuing (Yes in step S131), and the flow returns to step S104 to wait for the next text input.

If an end operation is performed, processing is treated as not continuing (No in step S131). The data processing unit 28 saves the audio file in the cache data storage unit 25, which serves as the history recording area (step S132), registers the storage location information in the index information storage unit 24 (step S133), and ends the series of text-to-speech conversion processing.
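Steps S132 and S133 amount to writing the audio to a store and recording where it went, keyed by the source text. The following Python sketch shows one way this could look. The function `save_and_register`, the hash-based file name, and the plain dictionary used as the index are all assumptions made for illustration; the patent does not specify file naming or the index format.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def save_and_register(
    text: str, audio: bytes, cache_dir: Path, index: Dict[str, str]
) -> Path:
    """Save audio to the cache directory and record its location under the source text."""
    # Hypothetical naming scheme: a short hash of the text keeps file names filesystem-safe.
    name = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16] + ".bin"
    path = cache_dir / name
    path.write_bytes(audio)       # S132: save into the cache data store
    index[text] = str(path)       # S133: register the storage location in the index
    return path


with tempfile.TemporaryDirectory() as tmp:
    index: Dict[str, str] = {}
    saved = save_and_register("hello", b"audio-bytes", Path(tmp), index)
    # A later search can follow the index entry back to the stored audio.
    restored = Path(index["hello"]).read_bytes()
```

Keeping the index separate from the stored files reflects the split between the index information storage unit 24 and the cache data storage unit 25 described in the text.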

The search operation for existing audio files performed by the search unit 26 (steps S108 and S117) is now described in detail with reference to FIG. 6.
The search unit 26 uses the text data as a keyword to search the index information in a preset priority order (first file collation, second file collation, third file collation, fourth file collation), thereby identifying the storage location of an existing audio file, and reads the existing audio file from the identified location.

In this case, as shown in FIG. 6, the search unit 26 uses the text data as a keyword to search, within the index information stored in the index information storage unit 24, the index (intermediate file, text, and so on) of the registration dictionary 42 (see FIG. 3) for the user ID of the logged-in user (User1), which is the target of the first file collation (step S201). It then looks up the storage location linked from the matching index entry (step S202) and reads the audio file.

If the first file collation finds no match, the search unit 26 searches the history data (intermediate file, text, and so on) of the conversion history 43 (see FIG. 3) for the user ID of the logged-in user (User1), which is the target of the second file collation (step S203). It then looks up the storage location linked from the matching history data (step S204) and reads the audio file.

If the second file collation finds no match, the search unit 26 searches the index (intermediate file, text, and so on) of the registration dictionary 42 (see FIG. 3) for the user IDs of other users (User2 ...), which is the target of the third file collation (step S205). It then looks up the storage location linked from the matching index entry (step S206) and reads the audio file.

If the third file collation finds no match, the search unit 26 searches the history data (intermediate file, text, and so on) of the conversion history 43 (see FIG. 3) for the user IDs of other users (User2 ...), which is the target of the fourth file collation (step S207). It then looks up the storage location linked from the matching history data (step S208) and reads the audio file.
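The four-stage priority search of steps S201 to S208 can be sketched in Python as follows. This is an illustration under assumptions, not the apparatus itself: `UserIndex`, `find_audio_path`, the sample keys, and the file paths are hypothetical, and matching is simplified to an exact lookup on the text, whereas the embodiment's index also covers intermediate files.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class UserIndex:
    # Stand-in for registration dictionary 42: text -> storage location
    registered: Dict[str, str] = field(default_factory=dict)
    # Stand-in for conversion history 43: text -> storage location
    history: Dict[str, str] = field(default_factory=dict)


def find_audio_path(
    text: str, login_user: str, index: Dict[str, UserIndex]
) -> Optional[Tuple[str, str]]:
    """Return (stage, storage location) for the first match in priority order, else None."""
    own = index.get(login_user, UserIndex())
    others = [entry for uid, entry in index.items() if uid != login_user]
    stages = [
        ("first", [own.registered]),                 # S201/S202: own dictionary
        ("second", [own.history]),                   # S203/S204: own history
        ("third", [o.registered for o in others]),   # S205/S206: other users' dictionaries
        ("fourth", [o.history for o in others]),     # S207/S208: other users' histories
    ]
    for stage, tables in stages:
        for table in tables:
            if text in table:
                return stage, table[text]
    return None


index = {
    "User1": UserIndex(registered={"hashi": "/cache/u1/hashi.wav"}),
    "User2": UserIndex(
        registered={"ame": "/cache/u2/ame.wav"},
        history={"hashi": "/cache/u2/hashi.wav", "kumo": "/cache/u2/kumo.wav"},
    ),
}
own_hit = find_audio_path("hashi", "User1", index)
shared_hit = find_audio_path("kumo", "User1", index)
```

Here "hashi" exists both in User1's dictionary and in User2's history, and the logged-in user's own entry wins because it is checked first; "kumo" is found only at the fourth stage.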

As described above, according to this embodiment, locations where partial misconversion occurred in earlier voice conversion processing are checked and confirmed in advance by test playback rather than after the final conversion. This makes it possible to eliminate pronunciation errors in the speech obtained by converting the text, and it also shortens the time required for voice conversion.

In addition, because voice data is obtained from previously converted data stored in the application server 2 without requesting conversion from the speech synthesis server 3, the load on the speech synthesis server 3 is reduced and processing speed is improved. The amount of data that the speech synthesis server 3 has to convert is also reduced. Furthermore, the locations requiring correction can be kept to a minimum, which ultimately shortens the time the user spends on voice conversion work.

The described embodiment is presented by way of example and is not intended to limit the scope of the invention. The novel embodiment can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the spirit of the invention. The embodiment and its modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

Each component shown in the above embodiment may be realized by a program installed in storage such as a hard disk device of a computer. Alternatively, the program may be stored on computer-readable electronic media, and the computer may realize the functions of the present invention by reading the program from the electronic media. Examples of electronic media include recording media such as CD-ROMs, flash memory, and removable media. Furthermore, the components may be distributed and stored across different computers connected via a network, and the functions realized through communication between the computers on which the components operate.

DESCRIPTION OF SYMBOLS: 1 ... user PC, 2 ... application server, 3 ... speech synthesis server, 31 ... graphical user interface unit (GUI unit), 22 ... memory, 23 ... intermediate file generation unit, 24 ... index information storage unit, 25 ... cache data storage unit, 26 ... search unit, 27 ... communication processing unit, 28 ... data processing unit, 29 ... registration unit, 30 ... test playback unit, 41 ... user ID table, 42 ... registration dictionary for each user ID, 43 ... conversion history for each user ID, 44 ... server usage history.

Claims

1. A voice conversion support device connected via a network to a user terminal and to a voice conversion device that converts text data into voice data, the voice conversion support device comprising:
a storage unit that stores the voice data converted by the voice conversion device and the source text data corresponding to the voice data;
a search unit that, using as a keyword the text data of a designated test playback location within text data to be converted that is newly input from the terminal, searches whether voice data for the test playback location exists in the storage unit;
a test playback unit that performs test playback of the voice data when the search by the search unit finds the voice data in the storage unit; and
a control unit that, upon receiving from the terminal an indication that the voice data test-played by the test playback unit is correct, combines voice data obtained by having the voice conversion device convert the text data excluding the test playback location with the existing voice data that was test-played, and transfers the combined data to the terminal.

2. The voice conversion support device according to claim 1, wherein
the storage unit is provided with a per-user dictionary storing index information that indicates the storage destination of voice data registered by a user logged in to the device, and
the search unit searches for the voice data of the test playback location by referring to the dictionary of the logged-in user, using the text data of the test playback location as a keyword.

3. The voice conversion support device according to claim 1 or 2, wherein
the storage unit is provided with a first conversion history storing index information that indicates the storage destination of converted voice data whose conversion was previously instructed by a user logged in to the device, and
the search unit searches for the voice data of the test playback location by referring to the first conversion history, using the text data of the test playback location as a keyword.

4. The voice conversion support device according to claim 1, wherein
the storage unit is provided with a plurality of user dictionaries storing index information that indicates the storage destinations of voice data registered by a plurality of users who have logged in to the device, and
the search unit searches for the voice data of the test playback location by referring to the plurality of user dictionaries, using the text data of the test playback location as a keyword.

5. The voice conversion support device according to claim 1, wherein
the storage unit is provided with a plurality of second conversion histories storing index information, for a plurality of users, that indicates the storage destinations of converted voice data whose conversion was previously instructed by the plurality of users who have logged in to the device, and
the search unit searches for the voice data of the test playback location by referring to the plurality of second conversion histories, using the text data of the test playback location as a keyword.

6. A speech synthesis system in which a user terminal, a voice conversion device that converts text data into voice data, and a voice conversion support device that transfers to the terminal voice data obtained by having the voice conversion device convert text data input from the terminal are connected via a network, wherein
the voice conversion support device comprises:
a storage unit that stores the voice data converted by the voice conversion support device and the source text data corresponding to the voice data;
a search unit that, using as a keyword the text data of a designated test playback location within text data to be converted that is newly input from the terminal, searches whether voice data for the test playback location exists in the storage unit;
a test playback unit that performs test playback of the voice data when the search by the search unit finds the voice data in the storage unit; and
a control unit that, upon receiving from the terminal an indication that the voice data test-played by the test playback unit is correct, combines voice data obtained by having the voice conversion device convert the text data excluding the test playback location with the existing voice data that was test-played, and transfers the combined data to the terminal.

7. A voice conversion support method in a voice conversion support device connected via a network to a user terminal and to a voice conversion device that converts text data into voice data, wherein
the voice conversion support device stores the voice data converted by the voice conversion device and the source text data corresponding to the voice data,
the voice conversion support device, using as a keyword the text data of a designated test playback location within text data to be converted that is newly input from the terminal, searches whether voice data for the test playback location exists among the stored converted voice data,
the voice conversion support device performs test playback of the voice data when the search finds the voice data for the test playback location among the stored converted voice data, and
the voice conversion support device, upon receiving from the terminal an indication that the test-played voice data is correct, combines voice data obtained by having the voice conversion device convert the text data excluding the test playback location with the existing voice data that was test-played, and transfers the combined data to the terminal.