JP5502787B2

JP5502787B2 - Voice conversion support device, program, and voice conversion support method

Info

Publication number: JP5502787B2
Application number: JP2011057229A
Authority: JP
Inventors: 淳町田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2011-03-15
Filing date: 2011-03-15
Publication date: 2014-05-28
Anticipated expiration: 2031-03-15
Also published as: JP2012194284A

Description

Embodiments described herein relate generally to a voice conversion support device, a program, and a voice conversion support method used for a service that converts characters, such as text, into speech.

In recent years, services that convert text into speech have been launched, for example on the Internet, and these services use speech synthesizers.

In general, a speech synthesizer converts the text (character string) entered by a user into synthesized speech waveform data and outputs a speech signal or an audio file.

Converting text into speech requires a speech synthesizer with high processing performance, so launching this type of service is costly. If a low-performance speech synthesizer is used instead, speech synthesis takes longer and responsiveness suffers.

JP 2004-117778 A

In general, this type of service merely returns uniform, featureless voice data for the text entered by the user, so some added value is needed to give the service a distinctive character.

The problem to be solved by the present invention is to provide a voice conversion support device, a program, and a voice conversion support method that reduce cost while maintaining the responsiveness of text-to-speech conversion and that make a distinctive service possible.

The voice conversion support device according to the embodiment is connected via a network to a voice conversion device that converts text data into voice data. The voice conversion support device includes a cache data storage unit, an index information storage unit, a search unit, and a data processing unit. The cache data storage unit stores, as past data, voice data previously converted by the voice conversion device in association with the text data from which that voice data was converted. The index information storage unit stores, in ranked order, a plurality of pieces of index information with different narrowing ranges for searching the past data stored in the cache data storage unit in descending order of affinity with the user. The search unit searches the plurality of pieces of index information in the index information storage unit in the ranked order, based on text data input from a request source. When the search finds the text data in a piece of index information, the data processing unit reads the previously used voice data indicated by that index information from the cache data storage unit and returns it to the request source; when the text data is in none of the index information, the data processing unit sends the text data to the voice conversion device and returns the voice data converted by the voice conversion device to the request source. The index information storage unit includes a user registration dictionary registered individually by a user logged in to the device, a first usage history that is the usage history of that user, and a second usage history that is the usage history of all users who have logged in to the device. In the search step, the user registration dictionary is searched first; if the input text data is not in the user registration dictionary, the first usage history is searched second; and if the text data is not in the first usage history, the second usage history is searched third.

FIG. 1 is a diagram showing the overall configuration of the speech synthesis system according to the embodiment.
FIG. 2 is a block diagram of the application server.
FIG. 3 is a diagram showing an example of the index information.
FIG. 4 is a flowchart showing the operation of the speech synthesis system.
FIG. 5 is a flowchart showing the dictionary registration operation for each user.
FIG. 6 is a flowchart showing the cache data search operation in the application server.

Hereinafter, embodiments will be described in detail with reference to the drawings.
(First Embodiment) FIG. 1 is a diagram showing the configuration of the speech synthesis system according to the first embodiment.

As shown in FIG. 1, the speech synthesis system of this embodiment includes a computer 1 (hereinafter "user PC 1"), which is a terminal device operated by a service user (hereinafter "user"); a speech synthesis server 3, which is a computer equipped with a speech synthesis engine; a computer 2 (hereinafter "application server 2") serving as the voice conversion support device; and a network 4 that connects these devices.

The speech synthesis server 3 converts an intermediate file (a pair of text data and accent symbols) transferred (input) from the application server 2 into voice data (hereinafter "audio file") and returns it to the application server 2.

The application server 2 sits between the speech synthesis server 3 and the user PC 1 and exchanges text data, intermediate files, and audio files with them.

As shown in FIG. 2, the application server 2 includes a graphical user interface unit 21 (hereinafter "GUI unit 21"), a memory 22, an intermediate file generation unit 23, an index information storage unit 24 for managing cached data, a cache data storage unit 25, a search unit 26, a communication processing unit 27, a data processing unit 28, a registration unit 29, and the like.

Using the text data input from the user PC 1 as a keyword (search key), the application server 2 searches for an audio file cached (stored) in the cache data storage unit 25 by way of the index information in the index information storage unit 24. On a hit, it reads the cached audio file and returns it to the requesting user PC 1 without asking the speech synthesis server 3 to synthesize speech.

In other words, the application server 2 checks whether the audio file is cached on its own hard disk device. If it is not, the application server 2 outputs the text data input from the user PC 1 to the speech synthesis server 3, obtains the audio file converted (synthesized) by the speech synthesis server 3 in response, and sends it to the user PC 1.

The GUI unit 21 displays screens such as the screen for logging in to the application server 2 from the user PC 1, the search screen, and the registration screen. It accepts speech synthesis requests, text data input, and the like from the user PC 1, and sends the audio file to the user PC 1 in response to a request.

That is, the GUI unit 21 implements the input/output interface between the user PC 1 and the application server 2.

The memory 22 stores a voice conversion dictionary. The voice conversion dictionary is a reference dictionary that associates the source text data of voice data to be converted (synthesized) by the speech synthesis server 3 with accent symbols (accent information) that specify the stress of the sound when the text data is converted into speech. The intermediate file generation unit 23 uses this voice conversion dictionary.

The memory 22 is also used as a work area when the data processing unit 28, the search unit 26, the registration unit 29, and the like execute their respective processes.

Using the conversion-target text data input from the user PC 1 as a key, the intermediate file generation unit 23 looks up the voice conversion dictionary, reads the corresponding accent symbols, generates an intermediate file consisting of the pair of text data and accent symbols, and stores it in the memory 22. The intermediate file is sent to the speech synthesis server 3 as the source data for speech synthesis.
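The generation of an intermediate file can be illustrated with a minimal Python sketch. The dictionary entries, the apostrophe accent notation, the `IntermediateFile` type, and the fallback for unknown text are all assumptions made for illustration; the embodiment does not specify a data format or what happens when the dictionary has no entry.

```python
from dataclasses import dataclass

# Hypothetical voice conversion dictionary: text -> accent symbols.
# The entries and the apostrophe notation are invented for this sketch.
VOICE_CONVERSION_DICTIONARY = {
    "hello": "he'llo",
    "toshiba": "to'shiba",
}


@dataclass(frozen=True)
class IntermediateFile:
    """A pair of text data and its accent symbols."""
    text: str
    accent: str


def generate_intermediate_file(text, dictionary=VOICE_CONVERSION_DICTIONARY):
    """Look up the accent symbols for the text and pair them with it."""
    # Assumption: fall back to the plain text when no entry exists.
    accent = dictionary.get(text, text)
    return IntermediateFile(text=text, accent=accent)
```

Because the type is frozen and therefore hashable, an intermediate file built this way could also serve directly as a lookup key in a cache index.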

The cache data storage unit 25 stores previously converted audio files, text data, intermediate files, and the like.

The index information storage unit 24 stores, in ranked order, a plurality of pieces of index information with different narrowing ranges for searching the past data cached (stored) in the cache data storage unit 25 in descending order of affinity with the user.

That is, the index information storage unit 24 stores index information for managing the data cached in the cache data storage unit 25. The index information is described in detail with reference to FIG. 3.

The search unit 26 searches the registration dictionary 42 set for each user ID, the usage history 43 for each user ID, and the server usage history 44 in ranked order. That is, the search unit 26 searches the plurality of pieces of index information in the index information storage unit 24 in the ranked order, based on the text data input from the request source.

The communication processing unit 27 exchanges data with the speech synthesis server 3 by TCP (HTTP) communication.

When the search by the search unit 26 finds the text data (also called the search key or keyword) in a piece of index information, the data processing unit 28 reads the cached audio file from the storage location of the past data indicated by that index information, that is, from the cache data storage unit 25, and returns it to the user PC 1 that requested the conversion.

If the keyword is in none of the index information, the data processing unit 28 sends the intermediate file, which the intermediate file generation unit 23 generated from the input text data and stored in the memory 22, to the speech synthesis server 3, and returns the audio file converted (synthesized) by the speech synthesis server 3 to the requesting user PC 1. The text data may be sent instead.

On the dictionary registration screen displayed by the GUI unit 21, the registration unit 29 registers dictionary information (text data and accent symbols) that the user has entered or edited in the registration dictionary 42 for that user ID.

As shown in FIG. 3, the index information storage unit 24 stores a plurality of pieces of index information linked to each user ID in the user ID table 41. There are three pieces of index information: the registration dictionary 42 for each user ID, the usage history 43 for each user ID, and the usage history 44 of all users of the application server 2 (hereinafter "server usage history 44").

The user ID table 41 holds user IDs, which are identification information for users who can log in to the application server 2. The user identification information includes not only the user ID but also a password and the like.

The registration dictionary 42 for each user ID stores text data registered individually by the user (referred to as "text"), an intermediate file that is the pair of the text and its accent symbols, and a storage location index indicating where the audio file corresponding to these data is stored.

The registration dictionary 42 for each user ID is assigned the first and second search ranks and is the first index consulted when the search unit 26 searches the cache data. Within this dictionary, the intermediate file has the first rank and the text has the second rank. As index information, its search order is first.

The usage history 43 for each user ID stores, for each logged-in user ID, the text used with the voice conversion function, an intermediate file that is the pair of the text and its accent symbols, and a storage location index indicating where the audio file corresponding to these data is stored. The text, intermediate file, storage location index, and the like are collectively called the usage history.

The usage history 43 for each user ID is assigned the third and fourth search ranks and is consulted third and fourth when the search unit 26 searches the cache data. Within this history, the intermediate file has the third rank and the text has the fourth rank. As index information, its search order is second.

The server usage history 44 stores the usage history of all users who have logged in to the application server 2 and used the voice conversion function. The server usage history 44 is assigned the fifth and sixth search ranks and is consulted fifth and sixth when the search unit 26 searches the cache data. Within this history, the intermediate file has the fifth rank and the text has the sixth rank. As index information, its search order is third.
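The relationship between the three pieces of index information and the six search ranks can be sketched in Python as follows. The class names, field names, and the `build_search_order` helper are hypothetical and serve only to make the ranking explicit; the embodiment does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    text: str            # text data
    intermediate: tuple  # (text, accent symbols) pair
    storage_index: str   # where the audio file is stored


@dataclass
class IndexInfo:
    name: str
    entries: list = field(default_factory=list)


def build_search_order(registration_dictionary, user_history, server_history):
    """Return (rank, index name, key field) triples in search order.

    Each piece of index information contributes two consecutive ranks:
    first its intermediate files, then its text.
    """
    order = []
    rank = 1
    for info in (registration_dictionary, user_history, server_history):
        for key_field in ("intermediate", "text"):
            order.append((rank, info.name, key_field))
            rank += 1
    return order
```

Calling `build_search_order` with the registration dictionary 42, the usage history 43, and the server usage history 44 yields ranks 1 through 6, alternating between intermediate file and text.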

Next, the operation of the speech synthesis system of this embodiment is described with reference to the flowcharts of FIGS. 4 to 6. First, the operation of the entire system is described with reference to the flowchart of FIG. 4.

In the speech synthesis system of this embodiment, when the user enters a predetermined URL on the user PC 1 and accesses the application server 2, the GUI unit 21 displays a login screen on the user PC 1. The user enters login information such as a login ID in the input field of the login screen (step S101 in FIG. 4). A password or the like may also be entered as login information.

The GUI unit 21 then checks the entered login information against the user IDs in the user ID table 41 in the memory 22 to determine whether the login information is registered (step S102). If it is registered (Yes in step S102), the GUI unit 21 permits login to the application server 2 and displays the voice conversion screen (step S103).

When the user types the characters to be converted (text data) into the character input field of the voice conversion screen (step S104), the GUI unit 21 accepts the input.

When the user then presses the button for instructing audio file creation displayed on the voice conversion screen (step S105), the intermediate file generation unit 23 looks up the voice conversion dictionary using the accepted conversion-target text data as a key, reads the corresponding accent symbols, generates an intermediate file consisting of the pair of text data and accent symbols (step S106), and displays the generated intermediate file on the voice conversion screen (step S107).

Next, when the user presses the search button displayed on the voice conversion screen (step S108), the search unit 26 searches the index information in the assigned rank order, using the accepted text data as a key (step S109). The search operation is described in detail with reference to FIG. 6.

If the search finds the target text data in none of the pieces of index information in the index information storage unit 24 (No in step S110), the data processing unit 28 transfers (transmits) a voice conversion request, together with the intermediate file generated by the intermediate file generation unit 23, to the speech synthesis server 3 (step S111).

On receiving the intermediate file and the voice conversion request, the speech synthesis server 3 synthesizes speech from the intermediate file and returns the generated audio file to the application server 2 over the network (step S112).

In the application server 2, when the communication processing unit 27 receives the audio file sent by the speech synthesis server 3, it passes the audio file to the data processing unit 28, which thereby obtains it (step S113).

The data processing unit 28 stores the obtained audio file in the cache data storage unit 25 and sends it to the user PC 1, where the speech is output from the speaker of the user PC 1 (step S114).

If the search in step S109 finds the target text data in any of the pieces of index information in the index information storage unit 24 (Yes in step S110), the data processing unit 28 reads the corresponding audio file from the storage location (cache data storage unit 25) indicated by the matching index information (step S113) and transfers (transmits) it to the user PC 1 without making a request to the speech synthesis server 3 (step S114).
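The hit and miss branches of steps S109 to S114 amount to a cache-aside pattern, which the following Python sketch illustrates. The `FakeSynthesisServer` class, the `convert` function, the storage key format, and the decision to update the index on a miss are assumptions for illustration; a single dictionary stands in for the ranked index information, and an in-process object stands in for the networked speech synthesis server 3.

```python
class FakeSynthesisServer:
    """Stand-in for the speech synthesis server 3; counts synthesis calls."""

    def __init__(self):
        self.calls = 0

    def synthesize(self, intermediate_file):
        self.calls += 1
        text, accent = intermediate_file
        return f"AUDIO[{text}|{accent}]".encode("utf-8")


def convert(intermediate_file, index, cache, server):
    """Return the audio file for an intermediate file, using the cache first.

    index: intermediate file -> storage key (stand-in for index information)
    cache: storage key -> audio bytes (stand-in for cache data storage unit 25)
    """
    storage_key = index.get(intermediate_file)   # S109, S110
    if storage_key is not None:
        return cache[storage_key]                # S113 on a hit, no server call
    audio = server.synthesize(intermediate_file)  # S111 to S113 on a miss
    storage_key = f"audio-{len(cache)}"          # hypothetical key format
    cache[storage_key] = audio                   # S114: store in the cache
    index[intermediate_file] = storage_key       # assumption: index updated here
    return audio
```

Requesting the same intermediate file twice reaches the synthesis server only once; the second request is served from the cache.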

After that, if no end operation is performed (No in step S115), the system waits for the next text input. When an end operation is performed (Yes in step S115), the data processing unit 28 ends the text-to-speech conversion process.

Next, the caching of an audio file once it has been created is described with reference to FIG. 5.
In the application server 2, the created intermediate file and audio file are temporarily cached in the memory 22.

When the user designates, on the voice conversion screen, the intermediate file from which the audio file was converted (step S121), the registration unit 29 reads the intermediate file from the memory 22 and displays its contents (text data and accent symbols) on the voice conversion screen (step S122).

After editing the text data and accent symbols that make up the intermediate file displayed on the voice conversion screen (step S123), the user presses the audio file save button on the voice conversion screen (step S124), and a dialog box for designating the save destination is displayed.

When the user designates a save destination in this dialog box (step S125), the registration unit 29 saves the audio file in the designated folder (step S126). When an end operation is performed (Yes in step S127), the registration unit 29 registers the index information of the saved audio file in the registration dictionary 42 for that user ID in the index information storage unit 24 (step S128).
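The registration in step S128 can be sketched in Python as follows. The `register_entry` function, the dictionary layout, the example path, and the rule that a new entry replaces an existing one with the same text are assumptions; the embodiment only states that index information for the saved audio file is registered in the registration dictionary 42 of the user.

```python
def register_entry(index_storage, user_id, text, accent, save_path):
    """Register an edited entry in the per-user registration dictionary.

    index_storage maps a user ID to that user's list of entries, standing in
    for the registration dictionary 42 held per user ID.
    """
    dictionary = index_storage.setdefault(user_id, [])
    entry = {
        "text": text,
        "intermediate": (text, accent),  # text paired with its accent symbols
        "storage_index": save_path,      # where the audio file was saved
    }
    # Assumption: a later registration of the same text replaces the earlier one.
    dictionary[:] = [e for e in dictionary if e["text"] != text]
    dictionary.append(entry)
    return entry
```

For example, `register_entry(storage, "user01", "hello", "he'llo", "/voices/hello.wav")` would add one entry under the hypothetical user ID `user01`.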

Next, the search process in step S109 is described in detail with reference to FIG. 6.
In this case, when the search button is operated on the voice conversion screen displayed by the GUI unit 21, the search unit 26 uses the text data and accent symbols of the created intermediate file as the search key and searches the intermediate files in the registration dictionary 42 for each user ID, which has the first rank among the index information stored in the index information storage unit 24 (step S201: first search).

If the first search finds the same data as the search key among the intermediate files in the registration dictionary 42 for each user ID (Yes in step S202), then in the following step S110 the search unit 26 reads the audio file corresponding to that data from the cache data storage unit 25 and passes it to the data processing unit 28. The data processing unit 28 thereby obtains the audio file.

If the first search does not find the same data as the search key among the intermediate files in the registration dictionary 42 for each user ID (No in step S202), the search unit 26 next uses the text data of the same intermediate file as the search key and searches the text in the registration dictionary 42 for each user ID, which has the second rank among the index information stored in the index information storage unit 24 (step S203: second search).

If the second search finds the same data as the search key in the text in the registration dictionary 42 for each user ID (Yes in step S204), the process proceeds to step S110.

If the second search does not find the same data as the search key in the text in the registration dictionary 42 for each user ID (No in step S204), the search unit 26 next uses the text data and accent symbols of the same intermediate file as the search key and searches the intermediate files in the usage history 43 for each user ID, which has the third rank among the index information stored in the index information storage unit 24 (step S205: third search).

If the third search finds the same data as the search key among the intermediate files in the usage history 43 for each user ID (Yes in step S206), the process proceeds to step S110.

If the third search does not find the same data as the search key among the intermediate files in the usage history 43 for each user ID (No in step S206), the search unit 26 next uses the text data of the same intermediate file as the search key and searches the text in the usage history 43 for each user ID, which has the fourth rank among the index information stored in the index information storage unit 24 (step S207: fourth search).

If the fourth search finds the same data as the search key in the text in the usage history 43 for each user ID (Yes in step S208), the process proceeds to step S110.

If the fourth search does not find the same data as the search key in the text in the usage history 43 for each user ID (No in step S208), the search unit 26 next uses the text data and accent symbols of the same intermediate file as the search key and searches the intermediate files in the server usage history 44, which has the fifth rank among the index information stored in the index information storage unit 24 (step S209: fifth search).

If the fifth search finds the same data as the search key among the intermediate files in the server usage history 44 (Yes in step S210), the process proceeds to step S110.

If the fifth search does not find the same data as the search key among the intermediate files in the server usage history 44 (No in step S210), the search unit 26 next uses the text data of the same intermediate file as the search key and searches the text in the server usage history 44, which has the sixth rank among the index information stored in the index information storage unit 24 (step S211: sixth search).

If the sixth search finds the same data as the search key in the text in the server usage history 44 (Yes in step S212), the process proceeds to step S110.

If the sixth search does not find the same data as the search key in the text in the server usage history 44 (No in step S212), the search unit 26 notifies the data processing unit 28 that no data exists in the cache data storage unit 25. On receiving this notification, the data processing unit 28 transmits the voice conversion request and the intermediate file to the speech synthesis server 3.
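The six searches of steps S201 to S212 can be condensed into a single loop, shown here as a minimal Python sketch. The `tiered_search` function and the entry layout (dictionaries with `text`, `intermediate`, and `storage_index` keys) are assumptions for illustration, not the data format of the embodiment.

```python
def tiered_search(text, accent, registration_dictionary, user_history,
                  server_history):
    """Return (rank, storage_index) for the first match, or None on a miss.

    Odd ranks compare the (text, accent) pair against intermediate files;
    even ranks compare the text alone.
    """
    rank = 0
    for entries in (registration_dictionary, user_history, server_history):
        for key_field, key in (("intermediate", (text, accent)),
                               ("text", text)):
            rank += 1
            for entry in entries:
                if entry[key_field] == key:
                    return rank, entry["storage_index"]
    # No index information contains the key: the caller would then send the
    # intermediate file to the speech synthesis server.
    return None
```

A match at rank 1 means the user's own registered accent was found; a match at rank 6 means only the text matched an entry in the history of all users.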

Thus, according to this embodiment, the function of converting text into speech in response to a user request is split between the application server 2, which is the user interface part, and the speech synthesis server 3, which is the speech synthesis engine part, so that the load can be distributed.

In addition, when the characters or sentences (text data, character strings) that the user inputs to the application server 2 are identical to previously used ones, the application server 2 reads and returns the audio file cached on its own hard disk device without requesting processing from the speech synthesis server 3, which reduces the processing load on the speech synthesis server 3.

Furthermore, by searching first the registration dictionary 42 for each user ID, which the user has registered independently, it is possible to provide the user with, for example, speech in the user's own accent.

As a result, costs can be reduced while the responsiveness of text-to-speech conversion is maintained, and a distinctive service can be realized.

The described embodiments are presented by way of example and are not intended to limit the scope of the invention. The novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The above embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

In the above embodiment, the intermediate file and the text data are searched alternately. However, ranks may instead be assigned to a plurality of pieces of index information using only the intermediate file, or only the text data, and the index information may be searched in order of those ranks.
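The difference between the alternating search and the single-key variation can be shown by listing the resulting search order. This is only a sketch: `TIERS`, `search_plan`, and the key-type labels are hypothetical names, and the three tiers are assumed to be the registration dictionary, the per-user usage history, and the server usage history.

```python
# Index tiers in rank order (assumed from the embodiment).
TIERS = ("registration dictionary", "user usage history", "server usage history")


def search_plan(key_types=("intermediate", "text")):
    """Return the ordered list of (tier, key type) searches to perform."""
    return [(tier, key) for tier in TIERS for key in key_types]


alternating = search_plan()          # six searches, intermediate then text per tier
text_only = search_plan(("text",))   # three searches using text data only
```

With both key types the plan has six entries, matching the first through sixth searches; with a single key type it shrinks to one search per tier.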

Each component shown in the above embodiment may be realized by a program installed in storage such as a hard disk device of a computer. Alternatively, the program may be stored on computer-readable electronic media, and the computer may realize the functions of the present invention by reading the program from the electronic media. The electronic media include, for example, recording media such as a CD-ROM, flash memory, and removable media. Furthermore, the components may be distributed and stored on different computers connected via a network, and the functions may be realized through communication between the computers on which the components run.

DESCRIPTION OF SYMBOLS: 1 ... user PC, 2 ... application server, 3 ... speech synthesis server, 31 ... graphical user interface unit (GUI unit), 22 ... memory, 23 ... intermediate file generation unit, 24 ... index information storage unit, 25 ... cache data storage unit, 26 ... search unit, 27 ... communication processing unit, 28 ... data processing unit, 29 ... registration unit, 41 ... user ID table, 42 ... registration dictionary for each user ID, 43 ... usage history for each user ID, 44 ... server usage history.

Claims

A speech conversion support device connected via a network to a speech conversion device that converts text data into speech data, the speech conversion support device comprising:
a cache data storage unit in which speech data previously converted by the speech conversion device and the text data from which the speech data was converted are stored as past data;
an index information storage unit in which a plurality of pieces of index information with different narrowing ranges, used for searching the past data stored in the cache data storage unit, are ranked and stored in descending order of affinity with the user;
a search unit that searches the plurality of pieces of index information in the index information storage unit in order of rank, based on text data input from a request source; and
a data processing unit that, as a result of the search by the search unit, reads from the cache data storage unit the previously used speech data indicated by the index information in which the text data exists and returns it to the request source, and that sends text data not present in any of the index information to the speech conversion device and returns the speech data converted by the speech conversion device to the request source,
wherein the index information storage unit has a user registration dictionary uniquely registered by a user who has logged in to the device, a first usage history that is the usage history of that user, and a second usage history that is the usage history of all users who have logged in to the device, and
wherein the search unit
first searches the user registration dictionary, second searches the first usage history when the input text data does not exist in the user registration dictionary, and third searches the second usage history when the text data does not exist in the first usage history.

A speech conversion support device connected via a network to a speech conversion device that converts text data into speech data, the speech conversion support device comprising:
a cache data storage unit in which speech data previously converted by the speech conversion device, the text data from which the speech data was converted, and accent information that specifies the strength of the sound when the text data is converted into speech are stored in association with one another as past data;
an index information storage unit in which a plurality of pieces of index information with different narrowing ranges, used for searching the past data stored in the cache data storage unit, are ranked and stored in descending order of affinity with the user;
a search unit that searches the plurality of pieces of index information in the index information storage unit in order of rank, based on text data input from a request source; and
a data processing unit that, as a result of the search by the search unit, reads from the cache data storage unit the past speech data indicated by the index information in which the text data exists and returns it to the request source, and that sends text data not present in any of the index information, together with the accent information corresponding to the text data, to the speech conversion device as data for speech synthesis and returns the speech data converted by the speech conversion device to the request source,
wherein the index information storage unit has a user registration dictionary uniquely registered by a user who has logged in to the device, a first usage history that is the usage history of that user, and a second usage history that is the usage history of all users who have logged in to the device, and
wherein the search unit
first searches the user registration dictionary, second searches the first usage history when the input text data does not exist in the user registration dictionary, and third searches the second usage history when the text data does not exist in the first usage history.

The speech conversion support device according to claim 1 or 2, wherein the user registration dictionary, the first usage history, and the second usage history store text data from which previously converted speech data was converted and intermediate data containing the text data and accent symbols, and
wherein the search unit
first searches the intermediate data and, when the input text data does not exist in the intermediate data, second searches the conversion-source text data.

A speech synthesis system having a speech conversion device that converts input text data into speech data, and a speech conversion support device that is connected to the speech conversion device via a network, sends the speech conversion device a conversion request for converting text data into speech data, acquires the speech data output from the speech conversion device in response to the conversion request, and returns it to the input source of the text data,
wherein the speech conversion support device comprises:
a cache data storage unit in which speech data previously converted by the speech conversion device and the text data from which the speech data was converted are stored as past data;
an index information storage unit in which a plurality of pieces of index information with different narrowing ranges, used for searching the past data stored in the cache data storage unit, are ranked and stored in descending order of affinity with the user;
a search unit that searches the plurality of pieces of index information in the index information storage unit in order of rank, based on text data input from a request source; and
a data processing unit that, as a result of the search by the search unit, reads from the cache data storage unit the previously used speech data indicated by the index information in which the text data exists and returns it to the request source, and that sends text data not present in any of the index information to the speech conversion device and returns the speech data converted by the speech conversion device to the request source,
wherein the index information storage unit has a user registration dictionary uniquely registered by a user who has logged in to the device, a first usage history that is the usage history of that user, and a second usage history that is the usage history of all users who have logged in to the device, and
wherein the search unit
first searches the user registration dictionary, second searches the first usage history when the input text data does not exist in the user registration dictionary, and third searches the second usage history when the text data does not exist in the first usage history.

A speech synthesis system having a speech conversion device that converts input text data into speech data, and a speech conversion support device that is connected to the speech conversion device via a network, sends the speech conversion device a conversion request for converting text data into speech data, acquires the speech data output from the speech conversion device in response to the conversion request, and returns it to the input source of the text data,
wherein the speech conversion support device comprises:
a cache data storage unit in which speech data previously converted by the speech conversion device, the text data from which the speech data was converted, and accent information that specifies the strength of the sound when the text data is converted into speech are stored in association with one another as past data;
an index information storage unit in which a plurality of pieces of index information with different narrowing ranges, used for searching the past data stored in the cache data storage unit, are ranked and stored in descending order of affinity with the user;
a search unit that searches the plurality of pieces of index information in the index information storage unit in order of rank, based on text data input from a request source; and
a data processing unit that, as a result of the search by the search unit, reads from the cache data storage unit the past speech data indicated by the index information in which the text data exists and returns it to the request source, and that sends text data not present in any of the index information, together with the accent information corresponding to the text data, to the speech conversion device as data for speech synthesis and returns the speech data converted by the speech conversion device to the request source,
wherein the index information storage unit has
a user registration dictionary uniquely registered by a user who has logged in to the device, a first usage history that is the usage history of that user, and a second usage history that is the usage history of all users who have logged in to the device, and
wherein the search unit
first searches the user registration dictionary, second searches the first usage history when the input text data does not exist in the user registration dictionary, and third searches the second usage history when the text data does not exist in the first usage history.

A program for a speech conversion support device connected via a network to a speech conversion device that converts text data into speech data,
the program causing the speech conversion support device to function as:
a cache data storage unit in which speech data previously converted by the speech conversion device, the text data from which the speech data was converted, and accent information that specifies the strength of the sound when the text data is converted into speech are stored in association with one another as past data;
an index information storage unit in which a plurality of pieces of index information with different narrowing ranges, used for searching the past data stored in the cache data storage unit, are ranked and stored in descending order of affinity with the user;
a search unit that searches the plurality of pieces of index information in the index information storage unit in order of rank, based on text data input from a request source; and
a data processing unit that, as a result of the search by the search unit, reads from the cache data storage unit the past speech data indicated by the index information in which the text data exists and returns it to the request source, and that sends text data not present in any of the index information, together with the accent information corresponding to the text data, to the speech conversion device as data for speech synthesis and returns the speech data converted by the speech conversion device to the request source,
wherein the index information storage unit has
a user registration dictionary uniquely registered by a user who has logged in to the device, a first usage history that is the usage history of that user, and a second usage history that is the usage history of all users who have logged in to the device, and
wherein the search unit
first searches the user registration dictionary, second searches the first usage history when the input text data does not exist in the user registration dictionary, and third searches the second usage history when the text data does not exist in the first usage history.

A speech conversion support method in a speech conversion support device that is connected via a network to a speech conversion device that converts text data into speech data, and that comprises a cache data storage unit in which speech data previously converted by the speech conversion device, the text data from which the speech data was converted, and accent information that specifies the strength of the sound when the text data is converted into speech are stored in association with one another as past data, and an index information storage unit in which a plurality of pieces of index information with different narrowing ranges, used for searching the past data stored in the cache data storage unit, are ranked and stored in descending order of affinity with the user, the method comprising:
a search step of searching the plurality of pieces of index information in the index information storage unit in order of rank, based on text data input from a request source; and
a data processing step of, as a result of the search in the search step, reading from the cache data storage unit the past speech data indicated by the index information in which the text data exists and returning it to the request source, and of sending text data not present in any of the index information, together with the accent information corresponding to the text data, to the speech conversion device as data for speech synthesis and returning the speech data converted by the speech conversion device to the request source,
wherein the index information storage unit has
a user registration dictionary uniquely registered by a user who has logged in to the device, a first usage history that is the usage history of that user, and a second usage history that is the usage history of all users who have logged in to the device, and
wherein, in the search step, the user registration dictionary is searched first, the first usage history is searched second when the input text data does not exist in the user registration dictionary, and the second usage history is searched third when the text data does not exist in the first usage history.