JP2015022162A - Voice synthesizing system, and voice conversion supporting system

Info

Publication number: JP2015022162A
Application number: JP2013150532A
Authority: JP
Inventors: Atsushi Machida (町田 淳)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2013-07-19
Filing date: 2013-07-19
Publication date: 2015-02-02
Anticipated expiration: 2033-07-19
Also published as: JP6117040B2

Abstract

PROBLEM TO BE SOLVED: To improve the response speed of a speech synthesizer while reducing its load by making the greatest possible use of the user's conversion history.

SOLUTION: A speech conversion support device comprises a converted audio data storage unit, a history information storage unit, a search unit, a test playback unit, and a control unit. The search unit refers to the history information storage unit, using as a keyword the text data of a designated test playback portion of the text data input from a user terminal for conversion, and identifies where the audio data for the test playback portion is stored. If the storage location is the converted audio data storage unit in the device itself, that unit is designated as the source from which the audio data is read; if the storage location is another speech conversion support device on the network, an acquisition request is sent to that device and the audio data is acquired from it.

Description

Embodiments of the present invention relate to a speech synthesis system and a speech conversion support device used in a service that converts characters, such as text (letters, symbols, and the like) and figures, into speech.

In recent years, services that convert text into speech have been launched, for example on the Internet, and these services use speech synthesizers.

In general, a speech synthesizer converts the text (character string) that a user enters at a terminal into synthesized speech waveform data and returns an audio signal or an audio file to the terminal.

When the text entered by the user is written in hiragana, for example, it may contain homonyms, that is, words that are written identically but pronounced differently depending on their meaning. When speech conversion is performed on text containing such words, an audio signal or audio file whose pronunciation differs from what the user intended may be returned to the terminal.

JP 2004-117778 A (Japanese Unexamined Patent Application Publication No. 2004-117778)

In this case, the user has the speech synthesizer reconvert the corrected text, that is, redo the speech conversion. Such redoing not only places a heavy load on the speech synthesizer but also takes a corresponding amount of time, so the response to the user deteriorates.

The problem to be solved by the present invention is to provide a speech conversion system and a speech conversion support device that can improve response speed while reducing the load on the speech synthesizer by using, as far as possible, the history of conversions the user has previously performed.

The speech conversion support device of the embodiment includes a converted audio data storage unit, a history information storage unit, a search unit, a test playback unit, and a control unit. The converted audio data storage unit stores audio data already converted by a speech conversion device together with the source text data corresponding to that audio data. The history information storage unit stores history information for searching for the storage location of the converted audio data, covering both the converted audio data storage unit and other speech conversion support devices on the network. The search unit refers to the history information storage unit, using as a keyword the text data of a designated test playback portion of the text data to be converted that was input from the terminal, and identifies the storage location of the audio data for the test playback portion. If the storage location is the converted audio data storage unit inside this device, the search unit designates that unit as the source from which the audio data is read; if the storage location is another speech conversion support device on the network, the search unit sends an acquisition request for the audio data to that device and acquires the audio data. The test playback unit performs test playback of the audio data read from the converted audio data storage unit designated by the search unit, or of the audio data acquired from the other speech conversion support device. When the control unit receives an instruction from the terminal indicating that the test-played audio data is correct, it has the speech conversion device convert the text data excluding the test playback portion, combines the resulting audio data with the previously converted audio data that was test-played, and transmits the result to the terminal.

FIG. 1 is a diagram showing the overall configuration of the speech synthesis system of the embodiment.
FIG. 2 is a block diagram of an application server.
FIG. 3 is a diagram showing an example of index information.
FIG. 4 is a flowchart showing the operation of the speech synthesis system.
FIG. 5 is a flowchart showing the operation of the speech synthesis system.
FIG. 6 is a flowchart showing the search operation for an existing audio file.

Embodiments are described in detail below with reference to the drawings. FIG. 1 shows the configuration of the speech synthesis system of the embodiment.

As shown in FIG. 1, the speech synthesis system of this embodiment comprises a computer 1 (hereinafter "user PC 1"), which is a terminal operated by a service user (hereinafter "user"); a speech synthesis server 3, which is a computer serving as the speech conversion device; and computers 2a to 2n (hereinafter "application servers 2a to 2n"), which serve as speech conversion support devices. These devices are connected via a network 4.

The user PC 1 sends requests to the application servers 2a to 2n to convert text data into audio data. The speech synthesis server 3 is a speech conversion device equipped with a speech synthesis engine: in response to a conversion request from one of the application servers 2a to 2n, it converts the text data received from that server into audio data and returns the audio data to the requesting application server.

More specifically, the speech synthesis server 3 converts an intermediate file (pairs of text data and accent symbols) sent from the application servers 2a to 2n into audio data (hereinafter "audio file") and returns it to the application servers 2a to 2n.

The application servers 2a to 2n are connected to the user PC 1 and the speech synthesis server 3 via the network 4. They sit between the speech synthesis server 3 and the user PC 1 and exchange text data, intermediate files, and audio files.

More specifically, when there is no past conversion history for the text data in a conversion request from the user PC 1, the application servers 2a to 2n have the speech synthesis server 3 convert the text data and transmit the audio data converted by the speech synthesis server 3 to the user PC 1.

Of the application servers 2a to 2n connected to the network 4, the application server 2a, for example, includes, as shown in FIG. 2, a graphical user interface unit 21 (hereinafter "GUI unit 21"), a memory 22, an intermediate file generation unit 23, an index information storage unit 24 serving as the history information storage unit for managing cached data, a cache data storage unit 25 serving as the converted audio data storage unit, a search unit 26, a communication processing unit 27, a data processing unit 28, a registration unit 29, a test playback unit 30, and a distribution unit 31, among others.

The other application servers 2b to 2n have the same configuration as the application server 2a, so the configuration of the application server 2a is described below as representative.

The application server 2a uses the text data input from the user PC 1 as a keyword (search key) and, using the index information in the index information storage unit 24, searches for an audio file cached (stored) in the cache data storage unit 25. On a hit, it reads the cached audio file and returns it to the requesting user PC 1 without asking the speech synthesis server 3 to perform speech synthesis.

That is, the application server 2a checks whether the audio file is cached on its own hard disk device. If it is not, the application server 2a outputs the text data input from the user PC 1 to the speech synthesis server 3, obtains the audio file converted (synthesized) by the speech synthesis server 3 in response, and sends it to the user PC 1.
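
The cache-first behavior described above can be illustrated with a minimal Python sketch. The function names, the in-memory dictionary standing in for the cache data storage unit 25, and the fake synthesizer standing in for the speech synthesis server 3 are all hypothetical. Writing a newly synthesized result back into the cache is an assumption of this sketch, not something this passage states.

```python
from typing import Callable, Dict


def get_audio(text: str,
              cache: Dict[str, bytes],
              synthesize: Callable[[str], bytes]) -> bytes:
    """Return audio for `text`, contacting the synthesizer only on a cache miss."""
    cached = cache.get(text)
    if cached is not None:
        return cached              # hit: no request to the speech synthesis server
    audio = synthesize(text)       # miss: delegate to the speech synthesis server
    cache[text] = audio            # assumption: keep the result for later requests
    return audio


# Hypothetical stand-in for the speech synthesis server 3; it records each call.
synth_calls = []


def fake_synthesize(text: str) -> bytes:
    synth_calls.append(text)
    return ("wave:" + text).encode("utf-8")


cache = {"hello": b"cached-hello"}
first = get_audio("hello", cache, fake_synthesize)   # served from the cache
second = get_audio("world", cache, fake_synthesize)  # synthesized once
third = get_audio("world", cache, fake_synthesize)   # served from the cache
```

In this sketch the synthesizer is called only once, for the text that was not cached, which is the load reduction the embodiment aims for.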

The GUI unit 21 displays screens such as a login screen for logging in to the application server 2a from the user PC 1, a search screen, and a registration screen. It accepts speech synthesis requests, text data input, and the like from the user PC 1, and sends audio files to the user PC 1 in response to requests. On the search screen, for example, the GUI unit 21 functions as a reception unit that accepts the designated test playback portion of the text data to be converted that was newly input from the user PC 1.

In other words, the GUI unit 21 provides the input/output interface between the user PC 1 and the application server 2a.

The memory 22 is used as a work area when the data processing unit 28, the search unit 26, the registration unit 29, and other units execute their processing, and as a temporary storage area for the intermediate file created when a conversion is requested.

The intermediate file generation unit 23 divides the text data to be converted, input from the user PC 1, into words or phrases. For each divided piece of text data that is not cached, or whose pronunciation the user marked as incorrect during test playback, it uses the piece as a keyword to look up the speech conversion dictionary, reads the corresponding accent symbol, and generates an intermediate file of text data and accent symbol pairs. The intermediate file is stored in the memory 22 as data for the conversion request and is transmitted to the speech synthesis server 3 as the source data for speech synthesis.
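
As a rough illustration of the intermediate file generation described above, the following Python sketch builds the list of (text, accent symbol) pairs. It assumes the text has already been split into segments, since word or phrase segmentation of Japanese would need a morphological analyzer that is outside this sketch. The function name, the placeholder accent strings, and the use of an empty string for a segment missing from the dictionary are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_intermediate_file(segments: List[str],
                            accent_dictionary: Dict[str, str],
                            cached: Set[str],
                            rejected: Set[str]) -> List[Tuple[str, str]]:
    """Pair each segment that still needs synthesis with its accent symbol."""
    pairs = []
    for segment in segments:
        # A segment needs conversion if it is not cached, or if the user
        # marked its test playback as incorrectly pronounced.
        needs_conversion = segment not in cached or segment in rejected
        if not needs_conversion:
            continue
        accent = accent_dictionary.get(segment, "")  # "" if no entry (assumption)
        pairs.append((segment, accent))
    return pairs


# Placeholder data: the accent strings are not a real accent notation.
segments = ["kyou", "wa", "ame"]
accent_dictionary = {"kyou": "ACC_A", "wa": "ACC_B", "ame": "ACC_C"}
intermediate = build_intermediate_file(
    segments,
    accent_dictionary,
    cached={"wa", "ame"},   # already converted in the past
    rejected={"ame"},       # user said the cached pronunciation was wrong
)
```

Here "wa" is skipped because it is cached and accepted, while "ame" is regenerated even though it is cached, because the user rejected it.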

Text data that has already been approved by the user and flagged as not requiring conversion (confirmed) is not converted to speech, so no intermediate file is generated for it.

The cache data storage unit 25 stores previously converted files (audio data files, text data files, intermediate data files, and the like).

That is, the cache data storage unit 25 stores audio data already converted by the speech synthesis server 3 together with the source text data corresponding to that audio data.

The index information storage unit 24 stores multiple pieces of ranked index information, each with a different narrowing range, for searching the past data (files) cached (stored) in the cache data storage unit 25 in descending order of affinity with the user.

The index information storage unit 24 also stores a speech conversion dictionary and a conversion history for each user, among other data. The dictionary holds index information indicating the storage location of audio data that the user registered independently. The conversion history retains the index information from each speech conversion performed by the individual user.

Index information is file management information that holds storage location information indicating where the audio data is stored (directory, link destination, server ID, and the like) together with the corresponding pair of source text data and the accent symbols (accent information) that specify stress when the text data is converted to speech.

In other words, the index information is history information for searching for the storage location of converted audio files, covering both the cache data storage unit 25 of this application server 2a and the other application servers on the network 4 (among the application servers 2a to 2n).

The audio data may also be audio data converted (synthesized) by the speech synthesis server 3. The speech conversion dictionary is used by the intermediate file generation unit 23.

In short, the index information storage unit 24 stores index information for managing the data cached in the cache data storage unit 25. The index information is described in detail with reference to FIG. 3.

The search unit 26 uses, as a keyword, the text data of the designated test playback portion of the text data to be converted that was newly input from the user PC 1, and searches whether audio data for the test playback portion exists in the cache data storage unit 25 of this application server 2a (the local server) or in the other application servers 2b to 2n (remote servers).

The search unit 26 searches the registered dictionary 42 for each user ID, the conversion history 43 for each user ID, and the server history 44 in a preset order of priority. Based on the text data input from the request source, the search unit 26 searches the pieces of index information in the index information storage unit 24 in ranked order (the first through fifth file collations in FIG. 6) to check whether an existing audio file (audio data) is present.

In the first file collation, the search unit 26 uses the text data of the test playback portion as a keyword and refers to the registered dictionary 42 (User1) for the logged-in user's own user ID to search for the audio file (audio data) of the test playback portion.

In the second file collation, the search unit 26 uses the text data of the test playback portion as a keyword and refers to the conversion history 43 (User1) for the logged-in user's own user ID to search for the audio file (audio data) of the test playback portion.

In the third file collation, the search unit 26 uses the text data of the test playback portion as a keyword and refers to the registered dictionaries 42 (User2, ...) of multiple users to search for the audio file (audio data) of the test playback portion.

In the fourth file collation, the search unit 26 uses the text data of the test playback portion as a keyword and refers to the conversion histories 43 (User2, ...) for the user IDs of multiple users to search for the audio file (audio data) of the test playback portion.

In the fifth file collation, the search unit 26 uses the text data of the test playback portion as a keyword and refers to the server history 44 of the application server 2a to search for the audio file (audio data) of the test playback portion stored on any of the application servers 2a to 2n, local or remote.
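
The five file collations above amount to a fixed-priority lookup that stops at the first hit. The Python sketch below is one way to express that order; the data layout (plain dictionaries mapping keyword text to a storage location string) and all names are hypothetical. Reading "multiple users" in the third and fourth collations as "users other than the logged-in user" is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

Index = Dict[str, str]  # keyword text -> storage location (hypothetical layout)


def merge_other_users(table: Dict[str, Index], user_id: str) -> Index:
    """Merge the indexes of every user except `user_id` (first entry wins)."""
    merged: Index = {}
    for uid, index in table.items():
        if uid == user_id:
            continue
        for keyword, location in index.items():
            merged.setdefault(keyword, location)
    return merged


def find_audio_location(keyword: str,
                        user_id: str,
                        dictionaries: Dict[str, Index],
                        histories: Dict[str, Index],
                        server_history: Index) -> Optional[Tuple[int, str]]:
    """Return (collation number, storage location) for the first hit, else None."""
    stages = [
        dictionaries.get(user_id, {}),             # 1: own registered dictionary
        histories.get(user_id, {}),                # 2: own conversion history
        merge_other_users(dictionaries, user_id),  # 3: other users' dictionaries
        merge_other_users(histories, user_id),     # 4: other users' histories
        server_history,                            # 5: server history
    ]
    for number, index in enumerate(stages, start=1):
        if keyword in index:
            return number, index[keyword]
    return None


dictionaries = {"User1": {"a": "loc1"}, "User2": {"b": "loc3"}}
histories = {"User1": {"c": "loc2"}, "User2": {"d": "loc4"}}
server_history = {"e": "loc5"}
hit = find_audio_location("b", "User1", dictionaries, histories, server_history)
```

Searching the logged-in user's own data first means that a pronunciation the user registered personally takes precedence over one found in shared histories.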

The search unit 26 refers to the index information storage unit 24, using as a keyword the text data of the designated test playback portion of the text data to be converted that was input from the user PC 1, and identifies the storage location of the audio file for the test playback portion.

If the identified storage location is the cache data storage unit 25 in this application server 2a, the search unit 26 designates the cache data storage unit 25 as the source from which the audio file is read.

If the identified storage location is one of the other application servers 2b to 2n on the network 4, the search unit 26 sends an acquisition request for the audio file to that external application server and acquires the audio file.
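
The local-versus-remote branch described above can be sketched as follows in Python. The `StorageLocation` record, the server ID strings, and the `fetch_remote` callable standing in for the acquisition request to another application server are hypothetical; the source does not specify the request format.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class StorageLocation:
    server_id: str  # which application server holds the file
    path: str       # directory or link on that server


def load_audio(location: StorageLocation,
               own_server_id: str,
               local_cache: Dict[str, bytes],
               fetch_remote: Callable[[str, str], bytes]) -> bytes:
    """Read from the local cache, or request the file from the server that holds it."""
    if location.server_id == own_server_id:
        return local_cache[location.path]      # local cache data storage unit
    return fetch_remote(location.server_id, location.path)


# Hypothetical stand-in for the acquisition request sent over the network.
remote_requests = []


def fake_fetch_remote(server_id: str, path: str) -> bytes:
    remote_requests.append((server_id, path))
    return b"remote-audio"


local_cache = {"/cache/ame.wav": b"local-audio"}
local_result = load_audio(StorageLocation("2a", "/cache/ame.wav"),
                          "2a", local_cache, fake_fetch_remote)
remote_result = load_audio(StorageLocation("2b", "/cache/kumo.wav"),
                           "2a", local_cache, fake_fetch_remote)
```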

More specifically, when a test playback portion of the text data input from the user PC 1 is designated on the screen that the GUI unit 21 displays on the user PC 1, the search unit 26 searches for which of the application servers 2a to 2n holds, in its cache data storage unit 25, the previously converted data for the test playback portion accepted by the GUI unit 21, that is, the converted audio file (audio data).

The test playback unit 30 performs test playback of the audio file read from the cache data storage unit 25 designated by the search unit 26 as the read source, or of the audio file acquired from one of the other application servers 2b to 2n.

The communication processing unit 27 exchanges data with the speech synthesis server 3 by TCP (HTTP) communication.

When the search by the search unit 26 finds the text data (also called the search key or keyword), the data processing unit 28 reads the audio file contained in the cached existing file from the storage location of the past data indicated by the matching index information, that is, from the cache data storage unit 25, and transfers it to the user PC 1 that requested the conversion.

When the user indicates on the screen of the user PC 1 that the pronunciation of the audio data for the test playback portion played by the test playback unit 30 is correct, and the data processing unit 28 receives that instruction from the user PC 1, the data processing unit 28 functions as a control unit: it has the speech synthesis server 3 convert the text data excluding the test playback portion, combines the resulting audio data with the existing audio data that was test-played, and returns the result to the user PC 1.
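
The combining step described above can be sketched in Python as follows. The source does not say how the pieces are ordered or joined, so two assumptions are made here: the pieces are assembled in the original text order, and joining is shown as plain byte concatenation (a real audio format would need format-aware joining). The function name, the character offsets marking the test playback portion, and the fake synthesizer are hypothetical.

```python
from typing import Callable


def assemble_audio(text: str,
                   start: int,
                   end: int,
                   confirmed_audio: bytes,
                   synthesize: Callable[[str], bytes]) -> bytes:
    """Synthesize only the text outside [start, end) and splice in the confirmed audio."""
    before, after = text[:start], text[end:]
    parts = []
    if before:
        parts.append(synthesize(before))   # newly converted text before the portion
    parts.append(confirmed_audio)          # existing audio the user approved
    if after:
        parts.append(synthesize(after))    # newly converted text after the portion
    return b"".join(parts)                 # placeholder for real audio joining


# Hypothetical stand-in for the speech synthesis server 3.
def fake_synthesize(text: str) -> bytes:
    return ("<" + text + ">").encode("utf-8")


middle = assemble_audio("abcdef", 2, 4, b"[CD]", fake_synthesize)
leading = assemble_audio("abcdef", 0, 2, b"[AB]", fake_synthesize)
```

Only the text outside the approved portion reaches the synthesizer, which is where the reduction in synthesis load comes from.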

If, on the other hand, the keyword is not found in any of the index information, the data processing unit 28 sends the intermediate file, generated by the intermediate file generation unit 23 from the input text data and stored in the memory 22, to the speech synthesis server 3, and transfers (transmits) the audio file converted (synthesized) by the speech synthesis server 3 to the requesting user PC 1. The input text data may be sent instead of the intermediate file.

On the dictionary registration screen displayed by the GUI unit 21, the registration unit 29 registers the information that the user entered or edited (audio data, text data, and accent symbols) in the registered dictionary 42 for each user ID in the memory 22, and registers the management information of the registered dictionary 42 in the memory 22 in the index information storage unit 24. The management information includes the target text, the intermediate file, and the storage location. The registration unit 29 also registers, in the index information storage unit 24, the storage location information of audio files distributed from the other application servers 2b to 2n on the network 4.

The distribution unit 31 reads from the index information storage unit 24 the storage location information of an audio file newly registered in the cache data storage unit 25 and distributes it to the other application servers 2b to 2n on the network 4.
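
The interplay between the distribution unit 31 on one server and the registration unit 29 on its peers can be sketched as below. This is a simplified model under stated assumptions: direct method calls stand in for network transfer, the class and method names are hypothetical, and only storage location information (never the audio itself) is propagated. Keeping an existing entry when a peer announces the same text is also an assumption.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (server ID, path), a hypothetical layout


class AppServer:
    """Toy model of an application server's index and its peer list."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.index: Dict[str, Location] = {}  # text -> where its audio is stored
        self.peers: List["AppServer"] = []

    def register_local(self, text: str, path: str) -> None:
        """Record a newly cached audio file and announce its location to peers."""
        entry = (self.server_id, path)
        self.index[text] = entry
        for peer in self.peers:                # role of the distribution unit 31
            peer.receive_location(text, entry)

    def receive_location(self, text: str, entry: Location) -> None:
        """Store a location announced by another server (role of registration unit 29)."""
        self.index.setdefault(text, entry)     # assumption: keep any existing entry


server_a = AppServer("2a")
server_b = AppServer("2b")
server_a.peers = [server_b]
server_a.register_local("ame", "/cache/ame.wav")
```

After the call, the peer knows where the file lives and can request it later if one of its own users asks for a test playback of the same text.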

As shown in FIG. 3, the index information storage unit 24 stores multiple pieces of index information linked to the user IDs in a user ID table 41: three kinds of index information, namely the registered dictionary 42 for each user ID (User1, User2, ...), the conversion history 43 for each user ID (User1, User2, ...), and the usage history 44 of all users of this application server 2 (hereinafter "server history 44"). The conversion history 43 of a single user logged in to this application server 2 is called the first conversion history. The conversion history of multiple users who have logged in to this application server 2 (for example, all users who have logged in so far) is called the second conversion history.

The user ID table 41 holds the user IDs, which are the identification information of users permitted to log in to this application server 2. The user identification information includes not only the user ID but also a password and the like.

The registered dictionary 42 for each user ID (User1, User2, ...) stores text data that the user registered independently (referred to as "text"), an intermediate file consisting of pairs of text and its accent symbols, and a storage location index indicating where the audio file corresponding to these data is stored.

The registered dictionary 42 for each user ID (User1, User2, ...) is assigned the first and second search ranks and is referred to first when the search unit 26 searches the cache data. Within the dictionary, the intermediate file has the first rank and the text has the second rank. As index information, its search order is first.

The conversion history 43 for each user ID (User1, User2, ...) stores, for each logged-in user ID, the text used with the speech conversion function, an intermediate file consisting of pairs of text and its accent symbols, and a storage location index indicating where the audio file corresponding to these data is stored. The text, intermediate file, storage location index, and so on are collectively called the usage history.

The conversion history 43 for each user ID (User1, User2, ...) is assigned the third and fourth search ranks and is referred to third and fourth when the search unit 26 searches the cache data. Within the history, the intermediate file has the third rank and the text has the fourth rank. As index information, its search order is second.

The server history 44 stores the usage histories of all users who have logged in to this application server 2 and used the speech conversion function. It is assigned the fifth and sixth search ranks and is referred to fifth and sixth when the search unit 26 searches the cache data. Within the history, the intermediate file has the fifth rank and the text has the sixth rank. As index information, its search order is third.
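
The six search ranks described for the three kinds of index information can be written down as a small table and walked in order. The Python sketch below does that; the index and field names, the nested-dictionary layout, and the use of a (text, accent symbol) tuple as the intermediate file key are hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, Optional, Tuple

# (rank, index name, field searched), following the ranks given in the text.
SEARCH_ORDER = [
    (1, "registered_dictionary", "intermediate_file"),
    (2, "registered_dictionary", "text"),
    (3, "conversion_history", "intermediate_file"),
    (4, "conversion_history", "text"),
    (5, "server_history", "intermediate_file"),
    (6, "server_history", "text"),
]

Indexes = Dict[str, Dict[str, Dict[Hashable, str]]]


def lookup(indexes: Indexes, text: str, accent: str) -> Optional[Tuple[int, str]]:
    """Return (rank, storage location) for the highest-ranked match, else None."""
    keys = {"intermediate_file": (text, accent), "text": text}
    for rank, index_name, field in SEARCH_ORDER:
        table = indexes.get(index_name, {}).get(field, {})
        location = table.get(keys[field])
        if location is not None:
            return rank, location
    return None


indexes: Indexes = {
    "registered_dictionary": {
        "intermediate_file": {("ame", "ACC_A"): "dict/ame.wav"},
        "text": {},
    },
    "conversion_history": {
        "intermediate_file": {},
        "text": {"ame": "hist/ame.wav"},
    },
    "server_history": {
        "intermediate_file": {},
        "text": {"sora": "srv/sora.wav"},
    },
}
exact = lookup(indexes, "ame", "ACC_A")
text_only = lookup(indexes, "ame", "ACC_B")
```

Because the intermediate file is checked before the plain text within each index, a match on both text and accent symbol is preferred over a match on text alone.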

The information registered in the storage location field of the server history 44 is, for example, a directory of the cache data storage unit 25 if the file is inside the server itself, or, if the file is on another server connected to the network 4, that server's link information, its URL, or a server identifier (server ID) that identifies the server.

Next, the operation of the speech synthesis system of this embodiment is described with reference to the flowcharts of FIGS. 4 to 6. First, the operation of the system as a whole is described with reference to the flowcharts of FIGS. 4 and 5.

In the speech synthesis system of this embodiment, when the user enters a predetermined URL on the user PC 1 and accesses the application server 2, the GUI unit 21 displays a login screen on the user PC 1. The user enters login information, such as a login ID like "User1", in the input field of the displayed login screen (step S101 in FIG. 4). A password or the like may also be entered as login information.

The GUI unit 21 then checks the entered login information against the user IDs in the user ID table 41 in the memory 22 to determine whether the login information is registered (step S102). If it is registered (Yes in step S102), the GUI unit 21 permits login to the application server 2 and displays the speech conversion screen (step S103).

When the user moves the cursor to the text input field of the speech conversion screen and types the characters (text data) to be converted (step S104), the GUI unit 21 accepts the input and displays the entered text data in the text input field (step S105).

When the button for instructing audio file creation displayed on the voice conversion screen is pressed (Yes in step S106), processing proceeds to the audio file creation routine.

On the other hand, when the user designates a test playback location within the text data displayed in the character input field by an operation such as range selection (for example, a mouse drag) and presses the test playback button displayed on the voice conversion screen (Yes in step S107), the GUI unit 21 accepts the button press and notifies the search unit 26.

The search unit 26 searches whether an existing audio file is present in this device (the memory 22 or the index information storage unit 24) (step S108).

More specifically, using the text data of the designated test playback location within the text data to be converted as a keyword, the search unit 26 refers to the index information storage unit 24 and searches for index information of the file for the test playback location. If the index information exists, the search unit 26 determines that an existing audio file is present (step S109), reads the audio file for the test playback location from the cache data storage unit 25 according to the index information, and caches it in the playback work area of the memory 22 (step S110). This audio file is referred to as <data 1>.
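The lookup in steps S108 to S110 can be sketched as follows. This is a minimal Python illustration, not the implementation of the embodiment: the dictionaries standing in for the index information storage unit 24, the cache data storage unit 25, and the playback work area, as well as the function name `load_test_playback_audio` and the sample data, are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-ins for the units named in the text.
index_info: Dict[str, str] = {"hello": "cache/0001.wav"}           # index information storage unit 24
cache_store: Dict[str, bytes] = {"cache/0001.wav": b"audio-0001"}  # cache data storage unit 25
playback_work_area: Dict[str, bytes] = {}                           # playback work area in memory 22


def load_test_playback_audio(selected_text: str) -> Optional[bytes]:
    """Look up the selected text and, on a hit, cache its audio for playback."""
    location = index_info.get(selected_text)   # keyword lookup in the index (S108)
    if location is None:
        return None                             # no existing audio file
    audio = cache_store[location]               # read according to the index information
    playback_work_area[selected_text] = audio   # cached as <data 1> (S110)
    return audio
```

A `None` result corresponds to the case in which no existing audio file is found for the selected text.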

If the search by the search unit 26 finds an existing file, the test playback unit 30 reads the existing audio file for the test playback location cached in the playback work area and plays the audio through the speaker of the user PC 1 (step S111).

The user listens to this audio and checks the pronunciation. When the user judges the pronunciation to be correct and selects a button such as "OK" on the voice conversion screen (Yes in step S112), the data processing unit 28 confirms the text data of the test-played portion (step S113) and attaches a conversion-unnecessary flag to the confirmed text data. If there is further text data to be test-played, the process returns to step S104 and the above processing is repeated.

When no further text data remains to be test-played, test playback of all test playback locations is complete, and the button for instructing audio file creation is pressed (step S115), the data processing unit 28 divides the text data entered in the character input field into words or clauses (step S116). It passes the text data to the search unit 26 in the divided units and has the search unit 26 search for existing audio files stored in this device in the same manner as in step S108 (step S117). The search operation for existing audio files is described in detail later with reference to FIG. 6. Test-played text data that the user has already approved and that carries the conversion-unnecessary (confirmed) flag is not converted to speech again and is therefore excluded from the search.

If the search finds an existing audio file, the search unit 26 determines that an existing audio file is present (Yes in step S118), reads the audio file from the cache data storage unit 25 according to the index information, and caches it in the playback work area of the memory 22 (step S119). This audio file is referred to as <data 2>.

If the search finds no existing audio file in this device (No in step S118), the data processing unit 28 generates an intermediate file for the searched divided text data, transfers it to the speech synthesis server 3 (step S120), and acquires the audio file converted by the speech synthesis server 3 (step S121).

The obtained audio file (product) is then cached in the playback work area of the memory 22 (step S122). This audio file is referred to as <data 3>.

The data processing unit 28 then combines the audio files cached in the playback work area (<data 1>, <data 2>, and <data 3>) in the order in which the text was divided (step S123), generates an audio file that matches the entered text, and plays the combined audio file through the speaker of the user PC 1 (step S124).
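The per-segment handling in steps S116 to S123 can be sketched as below. This Python example is illustrative only: the function `build_audio`, its parameters, and the use of plain byte concatenation to stand in for combining audio files are assumptions. Real audio formats generally need header-aware merging, which is outside the scope of this sketch.

```python
from typing import Callable, Dict, List


def build_audio(
    segments: List[str],
    confirmed: Dict[str, bytes],
    cached: Dict[str, bytes],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Assemble audio for already-divided text, preserving the division order."""
    parts: List[bytes] = []
    for segment in segments:
        if segment in confirmed:
            # <data 1>: test-played and approved, flagged as conversion-unnecessary
            parts.append(confirmed[segment])
        elif segment in cached:
            # <data 2>: existing converted audio found by the search
            parts.append(cached[segment])
        else:
            # <data 3>: no existing file, so request conversion from the synthesizer
            parts.append(synthesize(segment))
    # Placeholder for step S123: combine in the order in which the text was divided.
    return b"".join(parts)
```

Only segments that miss both lookups reach `synthesize`, which is how reuse of past conversions reduces the load on the speech synthesis server 3.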

The user listens to this audio and checks the pronunciation of the played audio. If the pronunciation is wrong, an editing screen showing the intermediate file (text data and accent symbols) of the audio file is displayed (step S126), together with a message such as "Please correct the accent symbols at the incorrect locations" to prompt the user to make corrections. When the user corrects the accent symbols (step S127), the process returns to step S124 and the audio is played again.

When the user checks the pronunciation, judges it to be correct, and selects the audio save button on the screen (step S128), the data processing unit 28 displays a screen for designating a save destination. When a save destination is designated (step S129), the data processing unit 28 transfers the audio file to the designated save destination on the user PC 1 and saves it there (step S130).

After that, processing branches depending on whether it is to continue. For example, if no end operation is performed, processing is treated as continuing (Yes in step S131), and the process returns to step S104 and waits for the next text input.

When an end operation is performed, processing is treated as not continuing (No in step S131). The data processing unit 28 saves the audio file in the cache data storage unit 25, which serves as the history recording area (step S132), and registers the storage destination information (the device ID and the text data, hereinafter referred to as the "phrase") in the index information storage unit 24 (step S133). It also distributes the storage destination information to the other application servers 2b to 2n on the network 4 (step S134), and then ends the series of text-to-speech conversion processing.

When storage destination information for an audio file (link information or a server ID, together with the phrase (text data)) is distributed from the other application servers 2b to 2n and received by the communication processing unit 27, the registration unit 29 registers the received storage destination information (device ID and phrase) in the index information storage unit 24.
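The saving, registration, and distribution of storage destination information (steps S132 to S134, plus registration on the receiving side) can be sketched as follows. The Python class `ApplicationServer`, its attributes, and the direct method call standing in for network communication are hypothetical choices for this illustration.

```python
from typing import Dict, List


class ApplicationServer:
    """Hypothetical application server that shares where audio is stored."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.index: Dict[str, str] = {}    # index information storage unit 24: phrase -> server ID
        self.cache: Dict[str, bytes] = {}  # cache data storage unit 25: phrase -> audio
        self.peers: List["ApplicationServer"] = []

    def save_and_distribute(self, phrase: str, audio: bytes) -> None:
        self.cache[phrase] = audio           # S132: save the audio file locally
        self.index[phrase] = self.server_id  # S133: register the storage destination
        for peer in self.peers:              # S134: distribute to the other servers
            peer.receive_destination(self.server_id, phrase)

    def receive_destination(self, server_id: str, phrase: str) -> None:
        # Role of the registration unit 29: record where the audio lives,
        # not the audio itself.
        self.index[phrase] = server_id


server_a = ApplicationServer("2a")
server_b = ApplicationServer("2b")
server_a.peers.append(server_b)
server_a.save_and_distribute("hello", b"audio-hello")
```

In this sketch only the small index entry travels to the peers, while the audio stays on the server that produced it, which matches the distributed storage described in the text.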

The search operation for existing audio files by the search unit 26 (steps S108 and S117) will now be described in detail with reference to FIG. 6.
Using the text data as a keyword, the search unit 26 searches the index information in a preset order of priority (first file collation, second file collation, third file collation, fourth file collation, fifth file collation) to identify the storage destination of an existing audio file, and reads or acquires the existing audio file from the identified storage destination.

In this case, as shown in FIG. 6, the search unit 26 uses the text data as a keyword to search, among the index information stored in the index information storage unit 24, the index (intermediate file, text, and so on) of the registration dictionary 42 (see FIG. 3) for the user ID (User1) of the logged-in user, which is the target of the first file collation (step S201). By identifying the storage destination linked from the retrieved index (step S202), the search unit 26 reads the audio file.

If the first file collation finds no matching information, the search unit 26 searches the history data (intermediate file, text, and so on) of the conversion history 43 (see FIG. 3) for the user ID (User1) of the logged-in user, which is the target of the second file collation (step S203). By identifying the storage destination linked from the retrieved history data (step S204), the search unit 26 reads the audio file.

If the second file collation finds no matching information, the search unit 26 searches the index (intermediate file, text, and so on) of the registration dictionary 42 (see FIG. 3) for the user IDs of other users (User2 and so on), which is the target of the third file collation (step S205). By identifying the storage destination linked from the retrieved index (step S206), the search unit 26 reads the audio file.

If the third file collation finds no matching information, the search unit 26 searches the history data (intermediate file, text, and so on) of the conversion history 43 (see FIG. 3) for the user IDs of other users (User2 and so on), which is the target of the fourth file collation (step S207). By identifying the storage destination linked from the retrieved history data (step S208), the search unit 26 reads the audio file.

If the fourth file collation finds no matching information (No in step S207), the search unit 26 searches the history data (text, intermediate file, storage destination, and so on) of the server history 44 (see FIG. 3) of the application servers 2a to 2n, which is the target of the fifth file collation, to determine whether the audio file exists (step S209), and identifies the storage destination linked from the history data that matches the keyword.
If the identified storage destination is the cache data storage unit 25 (Yes in step S210), the search unit 26 designates the cache data storage unit 25 to the test playback unit 30 as the location from which to read the audio file (step S211).
If the identified storage destination is one of the application servers 2b to 2n on the network 4 outside this application server 2a (for example, the application server 2b) (No in step S210), the search unit 26 issues an audio file acquisition request to the storage destination application server, for example the application server 2b, and acquires the audio file from it (step S212).
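The prioritized search of FIG. 6 amounts to trying each index in a fixed order and stopping at the first hit. The Python sketch below illustrates that idea under stated assumptions: the function names, the tier labels, the sample phrases, and the `"<server ID>:<path>"` destination format are all invented for this example.

```python
from typing import Dict, List, Optional, Tuple

Index = Dict[str, str]  # maps a phrase to a storage destination string


def find_storage_destination(
    phrase: str, tiers: List[Tuple[str, Index]]
) -> Optional[Tuple[str, str]]:
    """Return (tier name, destination) from the first tier that matches."""
    for tier_name, index in tiers:
        destination = index.get(phrase)
        if destination is not None:
            return tier_name, destination
    return None  # no existing audio file anywhere


def is_local(destination: str, own_server_id: str) -> bool:
    """Counterpart of step S210: does the destination point at this server?"""
    return destination.split(":", 1)[0] == own_server_id


# Hypothetical index contents, listed in the priority order of FIG. 6.
tiers: List[Tuple[str, Index]] = [
    ("1: own registration dictionary", {"Toshiba": "2a:dict/toshiba.wav"}),
    ("2: own conversion history", {"hello": "2a:hist/hello.wav"}),
    ("3: other users' registration dictionaries", {"hello": "2a:dict2/hello.wav"}),
    ("4: other users' conversion histories", {"goodbye": "2a:hist2/goodbye.wav"}),
    ("5: server history", {"thanks": "2b:hist/thanks.wav"}),
]

hit = find_storage_destination("thanks", tiers)
```

Because the loop returns on the first match, a phrase present in several tiers resolves to the highest-priority one. A destination for which `is_local` is false would lead to an acquisition request to the other server, as in step S212.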

Thus, according to this embodiment, information on cached audio files from past text-to-speech conversions is shared and managed not only for the application server 2a (also called the own machine or own device) but also for the other application servers 2b to 2n (other machines) on the network 4, so that the search range for converted audio files can be identified immediately. This makes the greatest possible use of the user's previous conversion history, reducing the load on the speech synthesis server 3 while improving response speed.

Distributing the converted audio files across the plurality of application servers 2a to 2n on the network 4 keeps down the storage capacity and processing performance required of each application server, which lowers the cost of the devices.

Furthermore, using audio files that were converted in the past and accumulated on the application servers 2a to 2n, instead of requesting speech conversion from the speech synthesis server 3, reduces the load on the speech synthesis server 3, shortens conversion time, and improves response speed for the user. It also reduces the amount of data the speech synthesis server 3 has to convert. In addition, the number of locations that need correction is minimized, which ultimately shortens the time the user spends on speech conversion work.

In other words, because the speech conversion support devices share management information about where audio data is stored, the device holding the data can be identified and searched immediately, which improves processing speed and reduces load.

The embodiments described above are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

Each component described in the above embodiment may be implemented as a program installed in storage such as a computer's hard disk device. Alternatively, the program may be stored on a computer-readable electronic medium, and the computer may implement the functions of the present invention by reading the program from that medium. Examples of electronic media include recording media such as CD-ROMs, flash memory, and removable media. The components may also be distributed across and stored on different computers connected via a network, and implemented through communication among the computers on which the components run.

DESCRIPTION OF SYMBOLS: 1 ... user PC; 2a to 2n ... application servers; 3 ... speech synthesis server; 21 ... graphical user interface unit (GUI unit); 22 ... memory; 23 ... intermediate file generation unit; 24 ... index information storage unit; 25 ... cache data storage unit; 26 ... search unit; 27 ... communication processing unit; 28 ... data processing unit; 29 ... registration unit; 30 ... test playback unit; 41 ... user ID table; 42 ... registration dictionary for each user ID; 43 ... conversion history for each user ID; 44 ... server history.

Claims

Connected via a network to a user terminal that makes a conversion request from text data to voice data and a voice conversion device that converts the text data to voice data, and in response to the text data conversion request from the terminal In the speech conversion support device for causing the speech conversion device to convert the text data when there is no conversion history of the text data in the past, and transmitting the converted speech data to the terminal,
A converted voice data storage unit in which the voice data converted by the voice converter and the text data of the conversion source corresponding to the voice data are stored;
A history information storage unit storing history information for searching for a storage destination of the converted voice data including the converted voice data storage unit and other voice conversion support devices on the network;
Using the text data of the designated test playback portion of the text data to be converted input from the terminal as a keyword, the history information storage unit is referred to specify the storage location of the audio data of the test playback portion, and the storage When the destination is the converted voice data storage unit in the device, the converted voice data storage unit is designated as the voice data read destination, and the storage destination is another voice conversion support device on the network A search unit for requesting acquisition of the voice data to the other voice conversion support device, and acquiring the voice data;
A test reproduction unit for performing test reproduction of the audio data read from the converted audio data storage unit designated by the search unit as a read destination of the audio data or the audio data acquired from the other audio conversion support device; and
When receiving an instruction from the terminal that the audio data test-reproduced by the test reproduction unit is correct, the audio data obtained by converting the text data excluding the test reproduction location to the audio conversion device and the test A voice conversion support apparatus comprising: a control unit that combines the reproduced voice data that has been converted in the past and transmits the voice data to the terminal.

The speech conversion support device according to claim 1, further comprising: a distribution unit that distributes storage destination information of the speech data newly registered in the converted speech data storage unit to another speech conversion support device on the network.

The speech conversion support apparatus according to claim 1, further comprising a registration unit that registers storage destination information of speech data distributed from another speech conversion support apparatus on the network in the history information storage unit.

Connected via a network to a user terminal that makes a conversion request from text data to voice data and a voice conversion device that converts the text data to voice data, and in response to the text data conversion request from the terminal In the speech conversion support method in the speech conversion support device for causing the speech conversion device to convert the text data when there is no conversion history of the text data in the past, and transmitting the converted speech data to the terminal,
Storing the voice data converted by the voice converter and the text data of the conversion source corresponding to the voice data in the converted voice data storage unit;
Storing history information for searching a storage destination of the converted voice data including the converted voice data storage unit and other voice conversion support devices on the network in the history information storage unit;
The search unit specifies the storage destination of the audio data of the test reproduction part by referring to the history information storage unit using the text data of the designated test reproduction part among the text data to be converted inputted from the terminal as a keyword. When the storage destination is the converted voice data storage unit inside the apparatus, the converted voice data storage unit is designated as a reading destination of the voice data, and the storage destination is another voice conversion support on the network. In the case of a device, it makes an acquisition request for the voice data to the other voice conversion support device, acquires the voice data,
The test reproduction unit performs test reproduction of the audio data read from the converted audio data storage unit designated by the search unit as the audio data read destination or the audio data acquired from the other audio conversion support device, and
When the control unit receives an instruction from the terminal that the audio data test-reproduced by the test reproduction unit is correct, the audio obtained by converting the text data excluding the test reproduction part to the audio conversion device An audio conversion support method for combining data and test-reproduced audio data converted in the past and transmitting the data to the terminal.

A user terminal that makes a conversion request from text data to voice data, a voice conversion device that converts the text data to voice data, and conversion of the text data in the past in response to the text data conversion request from the terminal In the speech synthesis system in which the speech conversion device is connected via a network with the speech conversion support device that converts the text data to the speech conversion device when there is no history, and transmits the converted speech data to the terminal.
The voice conversion support device
A converted voice data storage unit in which the voice data converted by the voice converter and the text data of the conversion source corresponding to the voice data are stored;
A history information storage unit storing history information for searching for a storage destination of the converted voice data including the converted voice data storage unit and other voice conversion support devices on the network;
Using the text data of the designated test playback portion of the text data to be converted input from the terminal as a keyword, the history information storage unit is referred to specify the storage location of the audio data of the test playback portion, and the storage When the destination is the converted voice data storage unit in the device, the converted voice data storage unit is designated as the voice data read destination, and the storage destination is another voice conversion support device on the network A search unit for requesting acquisition of the voice data to the other voice conversion support device, and acquiring the voice data;
A test reproduction unit for performing test reproduction of the audio data read from the converted audio data storage unit designated by the search unit as a read destination of the audio data or the audio data acquired from the other audio conversion support device; and
When receiving an instruction from the terminal that the audio data test-reproduced by the test reproduction unit is correct, the audio data obtained by converting the text data excluding the test reproduction location to the audio conversion device and the test A speech synthesis system comprising: a control unit that combines the reproduced speech data that has been converted in the past and transmits the combined speech data to the terminal.