JP2012037726A

JP2012037726A - Voice synthesizer and computer program

Info

Publication number: JP2012037726A
Application number: JP2010177776A
Authority: JP
Inventors: Reiko Saito; 礼子齋藤; Toru Tsugi; 徹都木; Nobumasa Seiyama; 信正清山
Original assignee: Nippon Hoso Kyokai NHK; NHK Engineering Services Inc
Current assignee: Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 2010-08-06
Filing date: 2010-08-06
Publication date: 2012-02-23
Anticipated expiration: 2030-08-06
Also published as: JP5518621B2

Abstract

[Problem] To correct speech data easily and efficiently.
[Solution] A speech synthesizer includes: a control information table storage unit 51 that stores control information tables; a fundamental frequency analysis unit 31 that generates time-variation information of the fundamental frequency of speech data; a speech rate calculation unit 32 that calculates a speech rate value; a control condition setting unit 52 that sets a control method type and key information designating either the speech rate control amount or the control amount of the variation width of the fundamental frequency; a control information acquisition unit 53 that reads the control information table associated with the control method type and acquires the speech rate control amount and the control amount of the variation width of the fundamental frequency based on the key information, the speech rate value, and the time-variation information of the fundamental frequency; a target fundamental frequency generation unit 54 that generates time-variation information of a target fundamental frequency from the time-variation information of the fundamental frequency, based on the control amount of the variation width of the fundamental frequency; and a speech data correction unit 80 that corrects the speech data based on the speech rate control amount and the time-variation information of the target fundamental frequency.
[Selected drawing] Figure 1

Description

The present invention relates to a speech synthesizer and a computer program that correct the prosody of input speech.

Changing prosody-related acoustic features is an effective way to convert speech data, such as synthesized speech, into speech data that is easy to listen to. A known technique treats, for example, pitch, power, and duration as the prosodic acoustic features to be controlled, and changes the voice quality by modifying the pitch or power, modifying the dynamic range of the temporal variation of the pitch pattern, or changing the pause length (see, for example, Patent Document 1).

Patent Document 1: JP-A-10-97267

To convert the speech in speech data into speech that is easy to listen to, it is effective to correct multiple types of prosody-related acoustic features. However, adjusting multiple types of acoustic features to approach speech with the desired prosody requires specialized acoustic knowledge, such as how the acoustic features relate to one another and how they relate to ease of listening, as well as extensive adjustment experience. Performing such adjustment work manually is also very complex and tedious.
The present invention has been made in view of these circumstances. Its object is to provide a speech synthesizer and a computer program that can correct speech data without specialized acoustic knowledge or extensive adjustment experience, and that can obtain easy-to-listen speech data easily and effectively.

[1] To solve the above problem, a speech synthesizer according to one aspect of the present invention comprises: a control information table storage unit that stores a control information table, in which a speech rate control amount is associated with a control amount of the variation width of the fundamental frequency, in association with a control method type; a speech data acquisition unit that acquires speech data; a fundamental frequency analysis unit that generates time-variation information of the fundamental frequency based on the speech data acquired by the speech data acquisition unit; a speech rate calculation unit that calculates a speech rate value based on the speech data; a control condition setting unit that sets a desired control method type and key information designating either the speech rate control amount or the control amount of the variation width of the fundamental frequency in the control information table; a control information acquisition unit that reads, from the control information table storage unit, the control information table associated with the control method type set by the control condition setting unit, and acquires the speech rate control amount and the control amount of the variation width of the fundamental frequency based on the key information set by the control condition setting unit, the speech rate value calculated by the speech rate calculation unit, and the time-variation information of the fundamental frequency generated by the fundamental frequency analysis unit; a target fundamental frequency generation unit that generates time-variation information of a target fundamental frequency from the time-variation information of the fundamental frequency, based on the control amount of the variation width of the fundamental frequency acquired by the control information acquisition unit; and a speech data correction unit that corrects the speech data based on the speech rate control amount acquired by the control information acquisition unit and the time-variation information of the target fundamental frequency generated by the target fundamental frequency generation unit, and outputs corrected speech data.
Here, the speech rate control amount is correction data, or data representing a correction amount, that the speech data correction unit uses to correct the speech rate, which is a prosodic acoustic feature. The control amount of the variation width of the fundamental frequency is correction data, or data representing a correction amount, that the speech data correction unit uses to correct the variation width of the fundamental frequency (in other words, the pitch range), which is also a prosodic acoustic feature.
With this configuration, designating either the speech rate control amount or the control amount of the variation width of the fundamental frequency causes the speech synthesizer to acquire the other, corresponding control amount from the control information table storage unit. The speech synthesizer then generates time-variation information of the target fundamental frequency based on the acquired control amount of the variation width of the fundamental frequency, and corrects the prosody of the speech data with respect to the speech rate and the variation width of the fundamental frequency, based on the acquired speech rate control amount and the generated time-variation information of the target fundamental frequency.
The speech synthesizer according to this aspect of the present invention therefore corrects speech data into corrected speech data easily and efficiently.

[2] In the speech synthesizer according to [1], the absolute value of the speech rate is used as the speech rate control amount and the absolute value of the variation width of the fundamental frequency is used as the control amount of the variation width of the fundamental frequency, and the control information table storage unit stores, in association with the control method type, a control information table in which each of a plurality of absolute speech rate values is associated with an absolute value of the variation width of the fundamental frequency.
[3] In the speech synthesizer according to [1], a speech rate magnification is used as the speech rate control amount and a magnification of the variation width of the fundamental frequency is used as the control amount of the variation width of the fundamental frequency; the control information table storage unit stores, in association with the control method type, a control information table in which each of a plurality of speech rate magnifications is associated with a magnification of the variation width of the fundamental frequency; the control condition setting unit further sets a desired magnification for the key information; and the control information acquisition unit acquires the speech rate control amount and the control amount of the variation width of the fundamental frequency based on the key information and the magnification set by the control condition setting unit, the speech rate value calculated by the speech rate calculation unit, and the time-variation information of the fundamental frequency generated by the fundamental frequency analysis unit.
[4] In the speech synthesizer according to [3], the control information table storage unit stores, in association with the control method type, control information tables that each correspond to one of a plurality of designated speech rate values and in each of which each of a plurality of speech rate magnifications is associated with a magnification of the variation width of the fundamental frequency; and the control information acquisition unit reads, from the control information table storage unit, the control information table that relates to the control method type set by the control condition setting unit and corresponds to the designated speech rate value equal to the speech rate value calculated by the speech rate calculation unit, and acquires the speech rate control amount and the control amount of the variation width of the fundamental frequency based on the key information and the magnification set by the control condition setting unit, the speech rate value, and the time-variation information of the fundamental frequency generated by the fundamental frequency analysis unit.
[5] In the speech synthesizer according to [1], the control information table storage unit stores, in association with the control method type, a control information table in which a speech rate magnification serving as the speech rate control amount is associated with a magnification of the variation width of the fundamental frequency serving as the control amount of the variation width of the fundamental frequency; the control condition setting unit sets a desired control method type; and the control information acquisition unit reads, from the control information table storage unit, the control information table associated with the control method type set by the control condition setting unit, and acquires the speech rate control amount and the control amount of the variation width of the fundamental frequency from that control information table.

[6] To solve the above problem, a computer program according to one aspect of the present invention causes a computer that includes a control information table storage unit, which stores a control information table associating a speech rate control amount with a control amount of the variation width of the fundamental frequency in association with a control method type, to function as: speech data acquisition means for acquiring speech data; fundamental frequency analysis means for generating time-variation information of the fundamental frequency based on the speech data acquired by the speech data acquisition means; speech rate calculation means for calculating a speech rate value based on the speech data; control condition setting means for setting a desired control method type and key information indicating either the speech rate control amount or the control amount of the variation width of the fundamental frequency in the control information table; control information acquisition means for reading, from the control information table storage unit, the control information table associated with the control method type set by the control condition setting means, and acquiring the speech rate control amount and the control amount of the variation width of the fundamental frequency based on the key information set by the control condition setting means, the speech rate value calculated by the speech rate calculation means, and the time-variation information of the fundamental frequency generated by the fundamental frequency analysis means; target fundamental frequency generation means for generating time-variation information of a target fundamental frequency from the time-variation information of the fundamental frequency, based on the control amount of the variation width of the fundamental frequency acquired by the control information acquisition means; and speech data correction means for correcting the speech data based on the speech rate control amount acquired by the control information acquisition means and the time-variation information of the target fundamental frequency generated by the target fundamental frequency generation means, and outputting corrected speech data.

According to the present invention, speech data can be corrected without specialized acoustic knowledge or extensive adjustment experience, and easy-to-listen speech data can be obtained easily and efficiently.

A block diagram showing the functional configuration of a speech synthesizer according to the first embodiment of the present invention.
A diagram showing an example of text data representing phoneme segmentation time information.
A diagram showing an example of a control information table in which absolute speech rate values are associated with absolute values of the variation width of the fundamental frequency.
A diagram showing an example of a control information table in which each of a plurality of speech rate magnifications is associated with a magnification of the variation width of the fundamental frequency.
A diagram showing an example of control information tables, one for each of three designated speech rate values, in which each of a plurality of speech rate magnifications is associated with a magnification of the variation width of the fundamental frequency.
A diagram schematically showing the time-variation information of the target fundamental frequency that the target fundamental frequency generation unit generates based on the control amount of the variation width of the fundamental frequency.
A flowchart showing the prosody correction procedure of the speech synthesizer in the same embodiment.
A block diagram showing the functional configuration of a speech synthesizer according to the second embodiment of the present invention.
A diagram showing an example of a control information table storing control information for hearing-impaired listeners.
A diagram showing an example of a control information table storing control information for rapid listening.
A diagram showing an example of a control information table storing control information for fast speech.
A diagram showing an example of a control information table for noisy environments.
A flowchart showing the prosody correction procedure of the speech synthesizer in the same embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing the functional configuration of a speech synthesizer according to the first embodiment of the present invention. As shown in the figure, the speech synthesizer 100 includes a speech data acquisition unit 10, a speech attribute information acquisition unit 20, a speech analysis unit 30, a control condition information generation unit 40, a target value generation unit 50, a control information table update unit 60, a generation condition acquisition unit 70, and a speech data correction unit 80.

The speech data acquisition unit 10 acquires speech data supplied from outside and supplies it to the speech analysis unit 30. The speech data is digital speech data obtained by sampling an analog speech signal at a sampling frequency and quantizing it. Examples are recorded speech data captured by a recording device (not shown) and synthesized speech data produced by a speech synthesis device (not shown). The speech data is, for example, PCM (Pulse Code Modulation) data, and its format is, for example, WAVE.

The speech attribute information acquisition unit 20 receives speech attribute information related to the speech data supplied to the speech data acquisition unit 10, and supplies that speech attribute information to the speech analysis unit 30. The speech attribute information is either text information that transcribes the utterance content of the speech data, or phoneme segmentation time information of the speech data.
When the utterance content of the speech data is "おはようございます (blank) ごきげんいかがですか (blank)" ("Good morning (blank) How are you? (blank)"), the text information is, for example, "おはようございます。ごきげんいかがですか。". Here, "(blank)" indicates a section between the preceding and following speech in which nothing is uttered. The sentences in the text information may be written in hiragana, katakana, romaji, or mixed kanji and kana. This embodiment describes the case where the utterance content of the speech data is Japanese, but it is also applicable to other languages.
The phoneme segmentation time information indicates the utterance time of each phoneme in the speech data, measured from the time point corresponding to the beginning of the speech in the speech data. A phoneme is the smallest unit of sound in phonology; each vowel and each consonant corresponds to one phoneme. The moraic nasal, the long vowel, and the geminate consonant also each correspond to one phoneme. A specific example of the phoneme segmentation time information is given later.

The speech analysis unit 30 takes in the speech data supplied from the speech data acquisition unit 10 and the speech attribute information supplied from the speech attribute information acquisition unit 20, analyzes them to obtain time-variation information of the fundamental frequency of the speech (hereinafter simply "fundamental frequency") and a speech rate value, and supplies both to the target value generation unit 50.

The speech analysis unit 30 includes, as its functional components, a fundamental frequency analysis unit 31 and a speech rate calculation unit 32.
Based on the speech data, the fundamental frequency analysis unit 31 generates time-variation information of the fundamental frequency in voiced sections (sections of voiced speech, that is, speech accompanied by vocal cord vibration). It then smooths the generated fundamental frequency values, for example with a spline function fitted to the analysis values of the voiced sections, to produce time-variation information of the fundamental frequency that changes smoothly. For unvoiced sections (sections of unvoiced speech, that is, speech not accompanied by vocal cord vibration), where the fundamental frequency cannot be extracted, the fundamental frequency analysis unit 31 fills in the fundamental frequency values by interpolating from the values in the voiced sections immediately before and after the unvoiced section.
The fundamental frequency analysis unit 31 may also skip the smoothing and use the generated fundamental frequency values themselves as the time-variation information of the fundamental frequency.
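The interpolation over unvoiced sections can be illustrated with a minimal sketch. It assumes a frame-wise contour in which unvoiced frames are marked `None`, and it uses linear interpolation, which is only one possible choice because the text does not fix the interpolation method. The function name `interpolate_unvoiced` and the edge handling (holding the nearest voiced value) are hypothetical, and the spline smoothing step is not shown.

```python
from typing import List, Optional


def interpolate_unvoiced(f0: List[Optional[float]]) -> List[float]:
    """Fill unvoiced frames (None) from the neighbouring voiced frames.

    Assumption: linear interpolation between the nearest voiced frame on
    each side; leading and trailing unvoiced frames take the nearest
    voiced value.
    """
    voiced = [i for i, v in enumerate(f0) if v is not None]
    if not voiced:
        raise ValueError("no voiced frames to interpolate from")

    out: List[float] = []
    for i, v in enumerate(f0):
        if v is not None:
            out.append(float(v))
            continue
        # Nearest voiced frame indices before and after frame i, if any.
        left = max((j for j in voiced if j < i), default=None)
        right = min((j for j in voiced if j > i), default=None)
        if left is None:
            out.append(float(f0[right]))
        elif right is None:
            out.append(float(f0[left]))
        else:
            w = (i - left) / (right - left)
            out.append(f0[left] + w * (f0[right] - f0[left]))
    return out
```

For a contour such as `[None, 100.0, None, None, 130.0, None]`, the two inner unvoiced frames are placed on the straight line between 100 Hz and 130 Hz, and the outer frames copy the nearest voiced value.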

When the speech attribute information is text information, the speech rate calculation unit 32 calculates the speech rate value of the speech data by one of the following two methods.
The first method generates phoneme segmentation time information from the speech data and the text information, and calculates the speech rate from that phoneme segmentation time information. Specifically, the speech rate calculation unit 32 obtains the total duration of the speech sections from the speech data, and generates phoneme segmentation time information from this total duration and the text information using known speech recognition technology. The speech rate calculation unit 32 then obtains the speech rate value (in morae per second) by determining the number of morae per unit time from the phoneme segmentation time information. A mora is a phonological unit of sound length. In Japanese, broadly speaking, two kana characters correspond to one mora for a contracted sound (yōon), and one kana character corresponds to one mora for a plain sound. One mora consists of one or more phonemes.
The second method obtains the total duration of the speech sections from the speech data and the number of morae from the text information, and calculates the speech rate value from this total duration and mora count.
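The second method can be sketched as follows, under the assumption that the text information is already written in kana and that the total duration of the speech sections has been measured separately. The helper names `count_morae` and `speech_rate` are hypothetical. Text containing kanji or romaji would first need conversion to a kana reading, which is not shown.

```python
# Small kana that merge with the preceding character into one mora (yōon etc.).
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(kana: str) -> int:
    """Count morae in a hiragana/katakana string.

    Each kana counts as one mora, except small kana that combine with the
    preceding character (for example "きょ" is one mora). The moraic nasal,
    the geminate "っ", and the long-vowel mark "ー" each count as one mora.
    Punctuation and spaces are ignored.
    """
    count = 0
    for ch in kana:
        if ch in SMALL_KANA:
            continue
        if "ぁ" <= ch <= "ゖ" or "ァ" <= ch <= "ヺ" or ch == "ー":
            count += 1
    return count


def speech_rate(kana: str, speech_seconds: float) -> float:
    """Speech rate in morae per second over the given speech duration."""
    if speech_seconds <= 0:
        raise ValueError("speech duration must be positive")
    return count_morae(kana) / speech_seconds
```

With the example utterance from this description, "おはようございます。" counts as 9 morae and "ごきげんいかがですか。" as 10. If the speech sections lasted 2.5 seconds in total (an illustrative figure, not taken from the source), the speech rate would be 19 / 2.5 morae per second.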

When the speech attribute information is phoneme segmentation time information, the speech rate calculation unit 32 obtains the speech rate value by determining the number of morae per unit time from that phoneme segmentation time information.

The control condition information generation unit 40 receives a desired control method type, desired control parameters, and a desired correction purpose type, which the user enters with an input device such as a keyboard or mouse. It generates control condition information containing the control method type, the control parameters, and the correction purpose type, and supplies it to the target value generation unit 50.
The control method type is data that distinguishes how the speech rate control amount and the control amount of the variation width of the fundamental frequency are controlled. The speech rate control amount is correction data, or data representing a correction amount, that the speech data correction unit 80 uses to correct the speech rate, which is the first prosodic acoustic feature. It is expressed as the absolute value of the target speech rate or as a speech rate magnification. The control amount of the variation width of the fundamental frequency is correction data, or data representing a correction amount, that the speech data correction unit 80 uses to correct the variation width of the fundamental frequency (in other words, the pitch range), which is the second prosodic acoustic feature. It is expressed as the absolute value of the variation width of the fundamental frequency or as a magnification of that variation width.
The control method types are described concretely below. In this embodiment, the control method types are methods that designate the control amounts quantitatively, and they are classified as shown in Table 1 below.

In Table 1, control method 1 designates the speech rate control amount and the control amount of the variation width of the fundamental frequency as absolute numerical values. Control method 2 designates the speech rate control amount and the control amount of the variation width of the fundamental frequency as magnifications. Control method 3 designates the speech rate control amount and the control amount of the variation width of the fundamental frequency as magnifications, in accordance with the speech rate value (designated speech rate value) supplied from the speech analysis unit 30. The control method type is data associated with each of control methods 1 to 3.

The control parameters are key information that the target value generation unit 50 uses to acquire the speech rate control amount and the control amount of the variation width of the fundamental frequency. Specifically, the control parameter for control method 1 is information designating either the speech rate control amount or the control amount of the variation width of the fundamental frequency. The control parameters for control methods 2 and 3 are information designating either the speech rate control amount or the control amount of the variation width of the fundamental frequency, together with a desired magnification.

The correction purpose type is information that designates the purpose of the prosody control. In other words, it designates the correction tendency of the speech rate control amount and the control amount of the variation width of the fundamental frequency that the target value generation unit 50 acquires. Specifically, the correction purpose type is information designating, for example, "for hearing-impaired listeners", "for rapid listening", or "for fast speech".
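The control condition information described above (control method type, control parameters, correction purpose type) could be modeled as a small record. This is only a sketch: the class and member names (`ControlMethod`, `Key`, `ControlCondition`) are hypothetical and do not come from the source. The one rule it enforces is the one stated in the description, that control methods 2 and 3 also require a desired magnification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ControlMethod(Enum):
    ABSOLUTE = 1        # control method 1: absolute values
    MAGNIFICATION = 2   # control method 2: magnifications
    RATE_DEPENDENT = 3  # control method 3: magnifications per designated speech rate


class Key(Enum):
    SPEECH_RATE = "speech_rate"  # key designates the speech rate control amount
    F0_WIDTH = "f0_width"        # key designates the F0 variation-width control amount


@dataclass(frozen=True)
class ControlCondition:
    method: ControlMethod
    key: Key
    purpose: str                           # e.g. "for hearing-impaired listeners"
    magnification: Optional[float] = None  # required for methods 2 and 3

    def __post_init__(self) -> None:
        needs_magnification = self.method in (
            ControlMethod.MAGNIFICATION,
            ControlMethod.RATE_DEPENDENT,
        )
        if needs_magnification and self.magnification is None:
            raise ValueError("control methods 2 and 3 require a magnification")
```

Validating at construction time keeps an incomplete condition from reaching the table lookup stage. Whether the actual device validates at this point is not stated in the source.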

The target value generation unit 50 takes in the speech rate value and the time-variation information of the fundamental frequency supplied from the voice analysis unit 30, and obtains the control amount of the speech rate and the control amount of the variation width of the fundamental frequency in accordance with the control condition information supplied from the control condition information generation unit 40. The target value generation unit 50 further generates time-variation information of the target fundamental frequency based on the control amount of the variation width of the fundamental frequency. The target value generation unit 50 then supplies the control amount of the speech rate and the time-variation information of the target fundamental frequency to the voice data correction unit 80.

Next, the target value generation unit 50 is described in detail. As its functional configuration, the target value generation unit 50 includes a control information table storage unit 51, a control condition setting unit 52, a control information acquisition unit 53, and a target fundamental frequency generation unit 54.

The control information table storage unit 51 stores a plurality of control information tables, each associating a control amount of the speech rate with a control amount of the variation width of the fundamental frequency, in association with a control method type and a correction purpose type. There are three kinds of control information table: a table that associates absolute speech rate values with absolute values of the variation width of the fundamental frequency; a table that associates each of a plurality of speech rate magnifications with a magnification of the variation width of the fundamental frequency; and a table that, for each of a plurality of designated speech rate values, associates each of a plurality of speech rate magnifications with a magnification of the variation width of the fundamental frequency. Specific examples of each kind of control information table are described later.
The control information table storage unit 51 is a readable and writable storage device, for example a magnetic hard disk device or a semiconductor storage device. The control information table storage unit 51 may also be a portable recording medium such as a magneto-optical disk.

The control condition setting unit 52 takes in the control condition information supplied from the control condition information generation unit 40, and sets the control method type, the control parameter, and the correction purpose type contained in that control condition information in the control information acquisition unit 53.

The control information acquisition unit 53 reads, from the control information table storage unit 51, the control information table associated with the control method type and the correction purpose type set by the control condition setting unit 52. The control information acquisition unit 53 then obtains the control amount of the speech rate and the control amount of the variation width of the fundamental frequency based on the control information table it has read, the control parameter set by the control condition setting unit 52, and the speech rate value and the time-variation information of the fundamental frequency supplied from the voice analysis unit 30. The control information acquisition unit 53 further supplies the control amount of the speech rate to the voice data correction unit 80, and supplies the control amount of the variation width of the fundamental frequency to the target fundamental frequency generation unit 54.

Specifically, suppose the control condition setting unit 52 sets a control method type (for example, control method 1), a control parameter (for example, key information designating the control amount of the speech rate), and a correction purpose type (for example, for hearing-impaired listeners). In this case, the control information acquisition unit 53 reads, from the control information table storage unit 51, the control information table associated with control method 1 and with "for hearing-impaired listeners" (the table for hearing-impaired listeners that associates absolute speech rate values with absolute values of the variation width of the fundamental frequency). Then, in accordance with the control parameter, the control information acquisition unit 53 uses the absolute speech rate value supplied from the voice analysis unit 30 as a key, and extracts from the control information table the absolute value of the variation width of the fundamental frequency, which is the control amount associated with that key. The control information acquisition unit 53 takes the key, that is, the absolute speech rate value, as the control amount of the speech rate, and takes the absolute value of the variation width of the fundamental frequency extracted from the control information table as the control amount of the variation width of the fundamental frequency.

Alternatively, suppose the control condition setting unit 52 sets a control method type (for example, control method 1), a control parameter (for example, key information designating the control amount of the variation width of the fundamental frequency), and a correction purpose type (for example, for hearing-impaired listeners). In this case, the control information acquisition unit 53 reads the same table from the control information table storage unit 51, namely the control information table associated with control method 1 and with "for hearing-impaired listeners". Then, in accordance with the control parameter, the control information acquisition unit 53 uses as a key the absolute value of the variation width of the fundamental frequency, obtained from the time-variation information of the fundamental frequency supplied from the voice analysis unit 30, and extracts from the control information table the absolute speech rate value, which is the control amount associated with that key. The control information acquisition unit 53 takes the key, that is, the absolute value of the variation width of the fundamental frequency, as the control amount of the variation width of the fundamental frequency, and takes the absolute speech rate value extracted from the control information table as the control amount of the speech rate.
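The two lookups for control method 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the table values, the names `HEARING_IMPAIRED_ABS` and `lookup_method1`, and the nearest-row matching used when the key is not an exact table entry are all assumptions, since the text does not say how an inexact key is handled.

```python
# Hypothetical control information table for control method 1:
# (absolute speech rate [mora/s], absolute F0 variation width [Hz]).
# The numbers are illustrative only and are NOT taken from FIG. 3.
HEARING_IMPAIRED_ABS = [(4.0, 180.0), (5.0, 160.0), (6.0, 140.0)]


def lookup_method1(table, key_kind, key_value):
    """Return (speech_rate_control, f0_width_control) for control method 1.

    key_kind is "speech_rate" or "f0_width" and selects which column acts
    as the key. The key itself becomes one control amount; the other is
    read from the matching row. Nearest-row matching is an assumption.
    """
    if key_kind == "speech_rate":
        col = 0
    elif key_kind == "f0_width":
        col = 1
    else:
        raise ValueError(f"unknown key kind: {key_kind}")
    row = min(table, key=lambda r: abs(r[col] - key_value))
    if col == 0:
        return (key_value, row[1])
    return (row[0], key_value)
```

For example, `lookup_method1(HEARING_IMPAIRED_ABS, "speech_rate", 5.1)` keeps 5.1 as the speech rate control amount and reads the variation width from the closest row.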

Suppose next that the control condition setting unit 52 sets a control method type (for example, control method 2), a control parameter (for example, key information designating the control amount of the speech rate, with a magnification also specified), and a correction purpose type (for example, for fast listening). In this case, the control information acquisition unit 53 reads, from the control information table storage unit 51, the control information table associated with control method 2 and with "for fast listening" (the table for fast listening that associates each of a plurality of speech rate magnifications with a magnification of the variation width of the fundamental frequency). The control information acquisition unit 53 then uses the speech rate magnification given by the control parameter as a key, and extracts from the control information table the magnification of the variation width of the fundamental frequency, which is the control amount associated with that key. The control information acquisition unit 53 takes the speech rate magnification given by the control parameter as the control amount of the speech rate, and takes the extracted magnification of the variation width of the fundamental frequency as the control amount of the variation width of the fundamental frequency.

Suppose instead that the control condition setting unit 52 sets a control method type (for example, control method 2), a control parameter (for example, key information designating the control amount of the variation width of the fundamental frequency, with a magnification also specified), and a correction purpose type (for example, for fast listening). In this case, the control information acquisition unit 53 reads the same table from the control information table storage unit 51, namely the control information table associated with control method 2 and with "for fast listening". The control information acquisition unit 53 then uses the magnification of the variation width of the fundamental frequency given by the control parameter as a key, and extracts from the control information table the speech rate magnification, which is the control amount associated with that key. The control information acquisition unit 53 takes the magnification of the variation width of the fundamental frequency given by the control parameter as the control amount of the variation width of the fundamental frequency, and takes the extracted speech rate magnification as the control amount of the speech rate.

Suppose next that the control condition setting unit 52 sets a control method type (for example, control method 3), a control parameter (for example, key information designating the control amount of the speech rate, with a magnification also specified), and a correction purpose type (for example, for fast speakers). In this case, the control information acquisition unit 53 reads, from the control information table storage unit 51, the control information table that is associated with control method 3 and with "for fast speakers" and whose designated speech rate value equals the speech rate value supplied from the voice analysis unit 30 (the table for fast speakers that, for the designated speech rate value, associates each of a plurality of speech rate magnifications with a magnification of the variation width of the fundamental frequency). The control information acquisition unit 53 then uses the speech rate magnification given by the control parameter as a key, and extracts from the control information table the magnification of the variation width of the fundamental frequency, which is the control amount associated with that key. The control information acquisition unit 53 takes the speech rate magnification given by the control parameter as the control amount of the speech rate, and takes the extracted magnification of the variation width of the fundamental frequency as the control amount of the variation width of the fundamental frequency.

Suppose finally that the control condition setting unit 52 sets a control method type (for example, control method 3), a control parameter (for example, key information designating the control amount of the variation width of the fundamental frequency, with a magnification also specified), and a correction purpose type (for example, for fast speakers). In this case, the control information acquisition unit 53 reads the same table from the control information table storage unit 51, namely the control information table that is associated with control method 3 and with "for fast speakers" and whose designated speech rate value equals the speech rate value supplied from the voice analysis unit 30. The control information acquisition unit 53 then uses the magnification of the variation width of the fundamental frequency given by the control parameter as a key, and extracts from the control information table the speech rate magnification, which is the control amount associated with that key. The control information acquisition unit 53 takes the magnification of the variation width of the fundamental frequency given by the control parameter as the control amount of the variation width of the fundamental frequency, and takes the extracted speech rate magnification as the control amount of the speech rate.
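The table selection and magnification lookup for control methods 2 and 3 might be organized as in the sketch below. Everything here is hypothetical: the dictionary layout `TABLES`, the function names, and all magnification values are invented for illustration. The text states that the method 3 table is the one whose designated speech rate value equals the measured speech rate; choosing the nearest designated value, as done here, is an added assumption for measured values that fall between designated ones.

```python
# Hypothetical storage: tables keyed by (control method, correction purpose).
# Method 2 entries are lists of (speech-rate magnification, F0-width magnification).
# Method 3 entries map a designated speech rate [mora/s] to such a list.
# All numbers are illustrative only.
TABLES = {
    (2, "fast_listening"): [(1.5, 0.9), (2.0, 0.8)],
    (3, "fast_speaker"): {
        6: [(0.8, 1.1), (0.9, 1.05)],
        8: [(0.8, 1.2), (0.9, 1.1)],
        10: [(0.8, 1.3), (0.9, 1.15)],
    },
}


def select_table(method, purpose, measured_rate=None):
    """Pick the table for the given control method and correction purpose."""
    entry = TABLES[(method, purpose)]
    if method == 3:
        if measured_rate is None:
            raise ValueError("control method 3 needs the measured speech rate")
        # Assumption: fall back to the nearest designated speech rate value.
        designated = min(entry, key=lambda rate: abs(rate - measured_rate))
        return entry[designated]
    return entry


def lookup_magnification(table, key_kind, magnification):
    """Return (speech_rate_magnification, f0_width_magnification)."""
    if key_kind not in ("speech_rate", "f0_width"):
        raise ValueError(f"unknown key kind: {key_kind}")
    col = 0 if key_kind == "speech_rate" else 1
    row = min(table, key=lambda r: abs(r[col] - magnification))
    if col == 0:
        return (magnification, row[1])
    return (row[0], magnification)
```

With a measured speech rate of 7.6 mora/second, `select_table(3, "fast_speaker", 7.6)` returns the table stored for the designated value 8 in this sketch.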

Based on the control amount of the variation width of the fundamental frequency supplied from the control information acquisition unit 53, the target fundamental frequency generation unit 54 generates time-variation information of the target fundamental frequency from the time-variation information of the fundamental frequency generated by the fundamental frequency analysis unit 31, and supplies it to the voice data correction unit 80. In doing so, the target fundamental frequency generation unit 54 generates the time-variation information of the target fundamental frequency in accordance with generation condition information that it stores internally.
The generation condition information specifies the condition under which the target fundamental frequency generation unit 54 generates the time-variation information of the target fundamental frequency, and designates one of the following: average-value-fixed designation, minimum-value-fixed designation, or specified-value-fixed designation.

The average-value-fixed designation specifies that the time-variation information of the target fundamental frequency in a predetermined section is generated so that its average value equals the average of the frequency values, in that section, of the time-variation information of the fundamental frequency generated by the fundamental frequency analysis unit 31.
The minimum-value-fixed designation specifies that the time-variation information of the target fundamental frequency in a predetermined section is generated with reference to the minimum of the frequency values, in that section, of the time-variation information of the fundamental frequency generated by the fundamental frequency analysis unit 31, so that every frequency value is equal to or greater than that minimum.
The specified-value-fixed designation comes in two forms. The first specified-value-fixed designation specifies that the time-variation information of the target fundamental frequency in a predetermined section is generated with reference to any one frequency value, in that section, of the time-variation information of the fundamental frequency generated by the fundamental frequency analysis unit 31.
The second specified-value-fixed designation specifies that the time-variation information of the target fundamental frequency in a predetermined section is generated with reference to two frequency values corresponding respectively to the start point and the end point of that section.
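One way to read the average-value-fixed, minimum-value-fixed, and first specified-value-fixed designations is as a rescaling of the contour about a reference frequency. The sketch below takes that reading. The function name `rescale_f0`, the mode strings, and the choice of linear scaling on a Hz axis are assumptions; the text fixes only which value serves as the reference. The second specified-value-fixed designation is omitted because the text does not say how the two endpoint values are combined.

```python
from statistics import mean


def rescale_f0(f0, factor, mode, ref=None):
    """Scale the excursions of an F0 contour about a reference frequency.

    f0     : list of F0 values [Hz] in the section of interest
    factor : magnification applied to excursions from the reference
    mode   : "mean_fixed", "min_fixed", or "value_fixed" (hypothetical names)
    ref    : reference frequency [Hz], required for "value_fixed"
    """
    if mode == "mean_fixed":
        anchor = mean(f0)   # the section average is unchanged by the scaling
    elif mode == "min_fixed":
        anchor = min(f0)    # with factor >= 0, every output stays >= the minimum
    elif mode == "value_fixed":
        if ref is None:
            raise ValueError("value_fixed needs a reference frequency")
        anchor = ref
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Assumption: linear scaling on a Hz axis.
    return [anchor + factor * (v - anchor) for v in f0]
```

Under this reading, a factor above 1 enlarges the contour and a factor below 1 reduces it, while the chosen reference value stays where it was.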

When the target fundamental frequency generation unit 54 receives generation condition information from the generation condition acquisition unit 70, it overwrites the internally stored generation condition information with the received information.

When the control amount of the variation width of the fundamental frequency supplied from the control information acquisition unit 53 is an "absolute numerical value of the variation width of the fundamental frequency", the target fundamental frequency generation unit 54 generates, in accordance with the generation condition information supplied from the generation condition acquisition unit 70, time-variation information of the target fundamental frequency whose frequency variation width matches that absolute value.
When the control amount of the variation width of the fundamental frequency supplied from the control information acquisition unit 53 is a "magnification of the variation width of the fundamental frequency", the target fundamental frequency generation unit 54 generates, in accordance with the generation condition information supplied from the generation condition acquisition unit 70, time-variation information of the target fundamental frequency whose frequency variation width conforms to that magnification.
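The difference between the two kinds of control amount can be illustrated by reducing both to a single scaling factor for the contour. This sketch assumes that the "variation width" of a section is its maximum F0 minus its minimum F0; that definition and the name `width_factor` are assumptions made for illustration and are not stated in this passage.

```python
def width_factor(f0, control, kind):
    """Return the factor by which F0 excursions would be scaled.

    f0      : list of F0 values [Hz] in the section of interest
    control : control amount of the F0 variation width
    kind    : "absolute" (target width in Hz) or "magnification"
    """
    if kind == "magnification":
        return control
    if kind == "absolute":
        # Assumption: variation width = max - min over the section.
        width = max(f0) - min(f0)
        if width == 0:
            raise ValueError("flat contour: width cannot be rescaled")
        return control / width
    raise ValueError(f"unknown kind: {kind}")
```

For a contour spanning 100 Hz to 200 Hz, an absolute target width of 50 Hz gives a factor of 0.5 under this definition.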

The control information table update unit 60 receives a control information table through the user's operation of an input device such as a keyboard or a mouse, and either writes that control information table as a new entry in the control information table storage unit 51 or overwrites an already stored control information table with it.
The generation condition acquisition unit 70 receives, through the user's operation of an input device such as a keyboard or a mouse, a designation of one of the average-value-fixed designation, the minimum-value-fixed designation, the first specified-value-fixed designation, and the second specified-value-fixed designation, generates generation condition information indicating that designation, and supplies it to the target fundamental frequency generation unit 54.

The voice data correction unit 80 takes in the control amount of the speech rate supplied from the control information acquisition unit 53 and the time-variation information of the target fundamental frequency supplied from the target fundamental frequency generation unit 54, corrects the prosody of the voice data acquired by the voice data acquisition unit 10, and outputs the corrected voice data.
Specifically, for example, the voice data correction unit 80 applies the time-variation information of the target fundamental frequency to the voice data to correct the time-variation information of its fundamental frequency. The voice data correction unit 80 then corrects the speech rate value calculated by the speech rate calculation unit 32 based on the control amount of the speech rate (an absolute speech rate value or a speech rate magnification), and corrects the phoneme segmentation time information of the voice data based on the corrected speech rate value. Finally, the voice data correction unit 80 applies the corrected phoneme segmentation time information to the voice data whose fundamental-frequency time-variation information has been corrected, thereby generating corrected voice data whose time axis has been corrected.
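The time-axis part of this correction can be sketched as a rescaling of the phoneme boundaries. The sketch assumes that every segment, including silence, is stretched by the same ratio of measured to corrected speech rate; the text does not specify this. The names `corrected_rate` and `rescale_boundaries` are hypothetical.

```python
def corrected_rate(measured, control, kind):
    """Apply the speech-rate control amount to the measured rate [mora/s]."""
    if kind == "magnification":
        return measured * control
    if kind == "absolute":
        return control
    raise ValueError(f"unknown kind: {kind}")


def rescale_boundaries(segments, measured, target):
    """Stretch (start, end, label) boundaries for the corrected speech rate.

    Times are integer ticks (for example 1/10,000 s). A slower target rate
    lengthens every segment. Uniform stretching is an assumption.
    """
    stretch = measured / target
    return [(round(s * stretch), round(e * stretch), label)
            for s, e, label in segments]
```

Halving the speech rate (magnification 0.5) doubles every boundary time in this sketch, so a segment ending at tick 2740 ends at tick 5480.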

FIG. 2 shows an example of text data representing phoneme segmentation time information. As shown in the figure, the phoneme segmentation time information is text data consisting of a plurality of lines, each line corresponding to one phoneme. Lines 1 to 17 of this phoneme segmentation time information are time information indicating the phoneme boundaries for the portion "(blank) おはようございます (blank)" ("good morning") of the voice data.
Each line of the phoneme segmentation time information has three columns separated by a delimiter (for example, a space or a tab). Taking the time point corresponding to the start of the voice data as the reference time, the first column gives the time from the reference time to the start of the phoneme in units of 1/10,000 second, the second column gives the time from the reference time to the end of the phoneme in units of 1/10,000 second, and the third column gives the phoneme label. For example, in the figure, "0 2740 sil" indicates that the interval from the reference time until 0.274 seconds have elapsed is a silent interval. Likewise, "2740 3168 o (お)" indicates that the phoneme between 0.274 seconds and 0.3168 seconds after the reference time is "o (お)". The phoneme label "sil" indicates that there is no phoneme.
Although times are expressed here in units of 1/10,000 second as an example, they may be expressed in other units such as 1/1,000 second (milliseconds).
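A reader for this three-column format could look like the following sketch. The function name `parse_segments` and the `ticks_per_second` parameter are hypothetical; the default of 10,000 follows the 1/10,000-second unit described above, and whitespace (space or tab) is assumed as the delimiter.

```python
def parse_segments(text, ticks_per_second=10_000):
    """Parse phoneme segmentation lines of the form "<start> <end> <label>".

    Returns a list of (start_seconds, end_seconds, label) tuples.
    Blank lines are skipped.
    """
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on any whitespace; the label is whatever remains.
        start, end, label = line.split(maxsplit=2)
        segments.append((int(start) / ticks_per_second,
                         int(end) / ticks_per_second,
                         label.strip()))
    return segments
```

Passing `ticks_per_second=1000` would read the same layout with millisecond time stamps.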

FIGS. 3(a) to 3(c) each show an example of a control information table that associates absolute speech rate values with absolute values of the variation width of the fundamental frequency. The control information tables in (a) to (c) of the figure are each associated with the control method type indicating control method 1. In these control information tables, the absolute speech rate value is the control amount of the speech rate, and the absolute value of the variation width of the fundamental frequency is the control amount of the variation width of the fundamental frequency. Part (a) of the figure is the table for hearing-impaired listeners and is associated with the correction purpose type "for hearing-impaired listeners". Part (b) is the table for fast listening and is associated with the correction purpose type "for fast listening". Part (c) is the table for fast speakers and is associated with the correction purpose type "for fast speakers".

FIGS. 4(a) to 4(c) each show an example of a control information table that associates each of a plurality of speech rate magnifications with a magnification of the variation width of the fundamental frequency. The control information tables in (a) to (c) of the figure are each associated with the control method type indicating control method 2. In these control information tables, the speech rate magnification is the control amount of the speech rate, and the magnification of the variation width of the fundamental frequency is the control amount of the variation width of the fundamental frequency. Part (a) of the figure is the table for hearing-impaired listeners and is associated with the correction purpose type "for hearing-impaired listeners". Part (b) is the table for fast listening and is associated with the correction purpose type "for fast listening". Part (c) is the table for fast speakers and is associated with the correction purpose type "for fast speakers".

FIGS. 5(a) to 5(c) each show an example of a control information table that, for each of three designated speech rate values, associates each of a plurality of speech rate magnifications with a magnification of the variation width of the fundamental frequency. The control information tables in (a) to (c) of the figure are each associated with the control method type indicating control method 3. In these control information tables, the speech rate magnification is the control amount of the speech rate, and the magnification of the variation width of the fundamental frequency is the control amount of the variation width of the fundamental frequency. Part (a) of the figure is the table for hearing-impaired listeners and is associated with the correction purpose type "for hearing-impaired listeners". Part (b) is the table for fast listening and is associated with the correction purpose type "for fast listening". Part (c) is the table for fast speakers and is associated with the correction purpose type "for fast speakers". Parts (a), (b), and (c) all show examples in which the designated speech rate values are 6, 8, and 10 (mora/second).

Note that the example control information tables in FIGS. 3 to 5 are for the case where the speech rate value of the voice data supplied from the speech rate calculation unit 32 is approximately 7 to 8 mora/second.

FIG. 6 schematically shows the time-variation information of the target fundamental frequency that the target fundamental frequency generation unit 54 generates based on the control amount of the variation width of the fundamental frequency supplied from the control information acquisition unit 53.
Part (a) of the figure is an example of the time-variation information of the target fundamental frequency generated by the target fundamental frequency generation unit 54 when the generation condition information indicates the average-value-fixed designation. Part (a) shows time-variation information 1 of the fundamental frequency of the voice data in the period from time t1 to time t2 (t1 < t2), together with time-variation information 1a, obtained by enlarging time-variation information 1 based on the control amount of the variation width of the fundamental frequency, and time-variation information 1r, obtained by reducing it. In part (a), the average of the frequency values of time-variation information 1 from time t1 to time t2 is the average value f_ave. As shown in part (a), the target fundamental frequency generation unit 54 enlarges or reduces time-variation information 1 of the fundamental frequency so that the average value of the time-variation information of the fundamental frequency from time t1 to time t2 remains f_ave.

FIG. 6(b) is an example of the time change information of the target fundamental frequency generated by the target fundamental frequency generation unit 54 when the generation condition information specifies that the minimum value is fixed. FIG. 6(b) shows time change information 2 of the fundamental frequency of the voice data in the period from time t1 to time t2 (t1 < t2), together with time change information 2a, obtained by expanding time change information 2 based on the control amount of the change width of the fundamental frequency, and time change information 2r, obtained by reducing it. In FIG. 6(b), the minimum of the frequency values of time change information 2 from time t1 to time t2 is the minimum value f_min. As shown in FIG. 6(b), the target fundamental frequency generation unit 54 expands or reduces time change information 2 of the fundamental frequency from time t1 to time t2 with the minimum frequency value f_min as the reference.

FIG. 6(c) is an example of the time change information of the target fundamental frequency generated by the target fundamental frequency generation unit 54 when the generation condition information specifies the first specified-value-fixed mode. FIG. 6(c) shows time change information 3 of the fundamental frequency of the voice data in the period from time t1 to time t2 (t1 < t2), together with time change information 3a, obtained by expanding time change information 3 based on the control amount of the change width of the fundamental frequency, and time change information 3r, obtained by reducing it. As shown in FIG. 6(c), the target fundamental frequency generation unit 54 expands or reduces time change information 3 of the fundamental frequency from time t1 to time t2 with an arbitrary frequency value f_s as the reference.

FIG. 6(d) is an example of the time change information of the target fundamental frequency generated by the target fundamental frequency generation unit 54 when the generation condition information specifies the second specified-value-fixed mode. FIG. 6(d) shows time change information 4 of the fundamental frequency of the voice data in the period from time t1 to time t2 (t1 < t2), together with time change information 4a, obtained by expanding time change information 4 based on the control amount of the change width of the fundamental frequency, and time change information 4r, obtained by reducing it. As shown in FIG. 6(d), the target fundamental frequency generation unit 54 expands or reduces time change information 4 of the fundamental frequency from time t1 to time t2 with the frequency value f_s1 at time t1 and the frequency value f_s2 at time t2 (t1 < t2) as the references.
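
The four fixing modes of FIG. 6 can be illustrated with a minimal Python sketch. It assumes, which the text does not state, that expansion and reduction are a linear scaling of each frequency value about a reference on a linear Hz axis, and that in the second specified-value-fixed mode the reference is the straight line joining f_s1 at t1 and f_s2 at t2. The function name `scale_f0`, the mode labels, and the argument names are hypothetical.

```python
from statistics import mean


def scale_f0(times, f0, k, mode, f_s=None, f_s1=None, f_s2=None):
    """Scale an F0 contour about a reference chosen by `mode` (illustrative only).

    times, f0: equal-length sequences sampled from t1 to t2.
    k: control amount of the F0 change width (k > 1 expands, k < 1 reduces).
    mode: "mean", "min", "value" (needs f_s) or "endpoints" (needs f_s1, f_s2).
    """
    if len(times) != len(f0) or len(f0) < 2:
        raise ValueError("times and f0 must have the same length (>= 2)")
    t1, t2 = times[0], times[-1]
    if mode == "mean":            # FIG. 6(a): average value fixed
        ref = [mean(f0)] * len(f0)
    elif mode == "min":           # FIG. 6(b): minimum value fixed
        ref = [min(f0)] * len(f0)
    elif mode == "value":         # FIG. 6(c): arbitrary value f_s fixed
        if f_s is None:
            raise ValueError("mode 'value' requires f_s")
        ref = [f_s] * len(f0)
    elif mode == "endpoints":     # FIG. 6(d): f_s1 at t1 and f_s2 at t2 fixed
        if f_s1 is None or f_s2 is None:
            raise ValueError("mode 'endpoints' requires f_s1 and f_s2")
        # Assumption: the reference is interpolated linearly between the two values.
        ref = [f_s1 + (f_s2 - f_s1) * (t - t1) / (t2 - t1) for t in times]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Move each sample away from (k > 1) or toward (k < 1) its reference value.
    return [r + k * (f - r) for f, r in zip(f0, ref)]
```

With k > 1 the contour is expanded (1a to 4a in the figure) and with k < 1 it is reduced (1r to 4r). In the "mean" mode the sample mean of the result equals that of the input by construction, and in the "min" mode the minimum sample stays where it was.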

Next, the operation of the speech synthesizer 100 according to the first embodiment will be described. FIG. 7 is a flowchart showing the procedure of the prosody correction processing of the speech synthesizer 100.
First, in step S101, when the control condition information generation unit 40 receives a desired control method type, a desired control parameter, and a correction purpose type through the user's operation of an input device such as a keyboard or a mouse, it generates control condition information containing the control method type, the control parameter, and the correction purpose type, and supplies it to the target value generation unit 50.
Next, in step S102, the control condition setting unit 52 of the target value generation unit 50 takes in the control condition information supplied from the control condition information generation unit 40 and sets the control method type, the control parameter, and the correction purpose type contained in it in the control information acquisition unit 53.
Next, in step S103, when the voice data acquisition unit 10 is supplied with voice data, it takes in the voice data and supplies it to the voice analysis unit 30. In addition, when the voice attribute information acquisition unit 20 is supplied with voice attribute information related to the voice data supplied to the voice data acquisition unit 10, it takes in the voice attribute information and supplies it to the voice analysis unit 30.

Next, in step S104, the voice analysis unit 30 takes in the voice data supplied from the voice data acquisition unit 10 and the voice attribute information supplied from the voice attribute information acquisition unit 20, analyzes them, obtains the time change information of the fundamental frequency of the voice and the speech speed value, and supplies both to the target value generation unit 50. Specifically, the fundamental frequency analysis unit 31 of the voice analysis unit 30 generates the time change information of the fundamental frequency based on the voice data and supplies it to the target value generation unit 50. When the voice attribute information is text information, the speech speed calculation unit 32 of the voice analysis unit 30 either generates phoneme segmentation time information from the voice data and the text information and calculates the speech speed value from it (the first method of calculating the speech speed value), or obtains the total duration of the speech sections from the voice data and the number of morae from the text information and calculates the speech speed value from these two quantities (the second method of calculating the speech speed value). When the voice attribute information is phoneme segmentation time information, the speech speed calculation unit 32 obtains the speech speed value by finding the number of morae per unit time from that information.
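
As an illustration of the second method, the speech speed value is simply the mora count divided by the total duration of the speech sections. The Python sketch below assumes the text is already available as a kana reading (conversion from kanji is out of scope) and counts morae by the usual rule that small kana such as ゃ, ゅ, ょ merge with the preceding character while っ, ん and ー each count as one mora. All function names and the `(start, end, morae)` tuple layout are hypothetical, not taken from the patent.

```python
# Small kana that combine with the preceding character and add no mora.
SMALL_KANA = set("ぁぃぅぇぉゃゅょゎァィゥェォャュョヮ")


def is_kana(ch):
    """True for hiragana, katakana and the prolonged sound mark."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "ー"


def count_morae(kana_reading):
    """Count morae in a kana string; punctuation and spaces are ignored."""
    return sum(1 for ch in kana_reading if is_kana(ch) and ch not in SMALL_KANA)


def speech_speed(mora_count, speech_seconds):
    """Speech speed value in morae per second."""
    if speech_seconds <= 0:
        raise ValueError("speech_seconds must be positive")
    return mora_count / speech_seconds


def speech_speed_from_segments(segments):
    """segments: iterable of (start_s, end_s, mora_count) for each speech section.

    Pauses between sections are excluded because only section durations are summed.
    """
    segments = list(segments)
    total_time = sum(end - start for start, end, _ in segments)
    total_morae = sum(morae for _, _, morae in segments)
    return speech_speed(total_morae, total_time)
```

For example, 15 morae spoken over 2.0 seconds of speech sections give 7.5 morae per second, which falls in the 7 to 8 morae per second range that the example tables assume.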

Next, in step S105, the control information acquisition unit 53 reads, from the control information table storage unit 51, the control information table associated with the control method type and the correction purpose type set by the control condition setting unit 52. The control information acquisition unit 53 then obtains the control amount of the speech speed and the control amount of the change width of the fundamental frequency based on the read control information table, the control parameter set by the control condition setting unit 52, and the speech speed value and the time change information of the fundamental frequency supplied from the voice analysis unit 30. Finally, the control information acquisition unit 53 supplies the control amount of the speech speed to the voice data correction unit 80 and the control amount of the change width of the fundamental frequency to the target fundamental frequency generation unit 54.

Next, in step S106, the target fundamental frequency generation unit 54 generates the time change information of the target fundamental frequency from the time change information of the fundamental frequency generated by the fundamental frequency analysis unit 31, based on the control amount of the change width of the fundamental frequency supplied from the control information acquisition unit 53, and supplies it to the voice data correction unit 80. In doing so, the target fundamental frequency generation unit 54 follows the generation condition information that it stores internally.
Next, in step S107, the voice data correction unit 80 takes in the control amount of the speech speed supplied from the control information acquisition unit 53 and the time change information of the target fundamental frequency supplied from the target fundamental frequency generation unit 54, corrects the prosody of the voice data acquired by the voice data acquisition unit 10, and outputs the corrected voice data.

As described above, the speech synthesizer 100 according to the first embodiment lets the user specify, by operating the input device, a correction purpose and either the control amount of the speech speed or the control amount of the change width of the fundamental frequency. The other control amount, corresponding to the specified one, is then obtained easily and efficiently in accordance with the correction purpose. Based on the two control amounts thus obtained, the speech synthesizer 100 can correct the prosody of the voice data, namely its speech speed and its fundamental frequency change width, in line with the user's correction purpose.
Therefore, the speech synthesizer 100 according to this embodiment can easily and efficiently correct voice data into easy-to-hear corrected voice data without requiring specialized knowledge of acoustics or extensive tuning experience.

[Second Embodiment]
FIG. 8 is a block diagram showing the functional configuration of a speech synthesizer according to the second embodiment of the present invention. In this embodiment, components identical to those of the speech synthesizer 100 according to the first embodiment are given the same reference numerals, and their description is omitted. The speech synthesizer 100a shown in FIG. 8 is obtained by replacing the control condition information generation unit 40 and the target value generation unit 50 of the speech synthesizer 100 with a control condition information generation unit 40a and a target value generation unit 50a, respectively.

The control condition information generation unit 40a receives a desired control method type through the user's operation of an input device such as a keyboard or a mouse, generates control condition information containing this control method type, and supplies it to the target value generation unit 50a.
The control method types in this embodiment are described concretely below. Each control method type in this embodiment specifies the control amounts qualitatively, and the types are classified as shown in Table 2.

In Table 2, control method A specifies that the speech speed of the voice data be made "slow" and that the change width of the fundamental frequency of the voice data be "expanded". Specifically, for example, control method A specifies that the speech speed value supplied from the voice analysis unit 30 be multiplied by a factor between about 0.5 and about 0.9, and that the change width of the fundamental frequency, obtained from the time change information of the fundamental frequency supplied from the voice analysis unit 30, be multiplied by a factor between about 1.15 and about 1.05.
Control method A is suited to providing elderly listeners with age-related hearing loss, and other hearing-impaired listeners, with easy-to-hear speech that is slow and has a widened pitch range.

In Table 2, control method B specifies that the speech speed of the voice data be made "fast" and that the change width of the fundamental frequency of the voice data be "slightly reduced". Specifically, for example, control method B specifies that the speech speed value supplied from the voice analysis unit 30 be multiplied by a factor between about 1.5 and about 2.5, and that the change width of the fundamental frequency, obtained from the time change information of the fundamental frequency supplied from the voice analysis unit 30, be multiplied by a factor between about 0.97 and about 0.9.
Control method B is suited to providing speech for fast listening, at a speed that is high but still intelligible and with a slightly narrowed pitch range.

In Table 2, control method C specifies that the speech speed of the voice data be made "slightly fast" and that the change width of the fundamental frequency of the voice data be "reduced". Specifically, for example, control method C specifies that the speech speed value supplied from the voice analysis unit 30 be multiplied by a factor between about 1.1 and about 1.5, and that the change width of the fundamental frequency, obtained from the time change information of the fundamental frequency supplied from the voice analysis unit 30, be multiplied by a factor between about 0.96 and about 0.8.
Control method C is suited to providing fast-talking speech, at a speed that is high but does not sound unnatural and with a narrowed pitch range.

In Table 2, control method D specifies that the speech speed of the voice data be "fixed" and that the change width of the fundamental frequency of the voice data be "expanded". Specifically, for example, control method D specifies that the speech speed value supplied from the voice analysis unit 30 be left unchanged, and that the change width of the fundamental frequency, obtained from the time change information of the fundamental frequency supplied from the voice analysis unit 30, be multiplied by a factor between about 1.1 and about 1.2.
Control method D is suited to providing speech that is easy to hear, for example in a noisy environment, by widening the pitch range without changing the speech speed.
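
The multiplier ranges stated for control methods A to D can be collected into a small lookup, sketched here in Python. The text gives the bounds only approximately ("about"), so treating them as hard limits is a simplifying assumption, and the names `MethodRange`, `TABLE2_RANGES`, and `within_method` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodRange:
    speed: tuple      # (low, high) multiplier applied to the speech speed value
    f0_width: tuple   # (low, high) multiplier applied to the F0 change width


# Approximate ranges as described for each control method.
TABLE2_RANGES = {
    "A": MethodRange(speed=(0.5, 0.9), f0_width=(1.05, 1.15)),  # slow, expanded
    "B": MethodRange(speed=(1.5, 2.5), f0_width=(0.9, 0.97)),   # fast, slightly reduced
    "C": MethodRange(speed=(1.1, 1.5), f0_width=(0.8, 0.96)),   # slightly fast, reduced
    "D": MethodRange(speed=(1.0, 1.0), f0_width=(1.1, 1.2)),    # fixed, expanded
}


def within_method(method, speed_mult, f0_mult):
    """Check whether a pair of multipliers lies inside the range of a control method."""
    r = TABLE2_RANGES[method]
    return (r.speed[0] <= speed_mult <= r.speed[1]
            and r.f0_width[0] <= f0_mult <= r.f0_width[1])
```

A check such as `within_method("A", 0.7, 1.1)` passes, whereas `within_method("A", 1.2, 1.1)` fails because method A never speeds speech up.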

The target value generation unit 50a takes in the speech speed value and the time change information of the fundamental frequency supplied from the voice analysis unit 30 and, in accordance with the control condition information supplied from the control condition information generation unit 40a, obtains the control amount of the speech speed and the control amount of the change width of the fundamental frequency. The target value generation unit 50a then generates the time change information of the target fundamental frequency based on the control amount of the change width of the fundamental frequency, and supplies the control amount of the speech speed and the time change information of the target fundamental frequency to the voice data correction unit 80.

Next, the target value generation unit 50a will be described in detail. The target value generation unit 50a is obtained by replacing the control information table storage unit 51, the control condition setting unit 52, and the control information acquisition unit 53 of the target value generation unit 50 of the speech synthesizer 100 with a control information table storage unit 51a, a control condition setting unit 52a, and a control information acquisition unit 53a, respectively.

The control information table storage unit 51a stores a plurality of control information tables, each associating a control amount of the speech speed with a control amount of the change width of the fundamental frequency, in association with the control method types. Each control information table associates a multiplier for the change width of the fundamental frequency with a multiplier for the speech speed for one correction purpose (for example, for hearing-impaired listeners, fast listening, fast talking, or noisy environments). Specific examples of the control information tables for the individual correction purposes are given later.
The control information table storage unit 51a is a readable and writable storage device, for example a magnetic hard disk device or a semiconductor storage device. The control information table storage unit 51a may also be a portable recording medium such as a magneto-optical disk.

The control condition setting unit 52a takes in the control condition information supplied from the control condition information generation unit 40a and sets the control method type contained in it in the control information acquisition unit 53a.

The control information acquisition unit 53a reads, from the control information table storage unit 51a, the control information table associated with the control method type set by the control condition setting unit 52a. The control information acquisition unit 53a then supplies the control amount of the speech speed stored in the read control information table to the voice data correction unit 80, and supplies the control amount of the change width of the fundamental frequency stored in that table to the target fundamental frequency generation unit 54.

Specifically, for example, when the control condition setting unit 52a sets control method A as the control method type, the control information acquisition unit 53a reads the control information table associated with control method A (the table storing control information for hearing-impaired listeners) from the control information table storage unit 51a. The control information acquisition unit 53a then uses the speech speed multiplier and the fundamental frequency change width multiplier stored in the read table as the control amount of the speech speed and the control amount of the change width of the fundamental frequency, respectively.

Likewise, when the control condition setting unit 52a sets control method B, the control information acquisition unit 53a reads the control information table associated with control method B (the table storing control information for fast listening) from the control information table storage unit 51a. The control information acquisition unit 53a then uses the speech speed multiplier and the fundamental frequency change width multiplier stored in the read table as the control amount of the speech speed and the control amount of the change width of the fundamental frequency, respectively.

Likewise, when the control condition setting unit 52a sets control method C, the control information acquisition unit 53a reads the control information table associated with control method C (the table storing control information for fast talking) from the control information table storage unit 51a. The control information acquisition unit 53a then uses the speech speed multiplier and the fundamental frequency change width multiplier stored in the read table as the control amount of the speech speed and the control amount of the change width of the fundamental frequency, respectively.

Likewise, when the control condition setting unit 52a sets control method D, the control information acquisition unit 53a reads the control information table associated with control method D (the table for noisy environments) from the control information table storage unit 51a. The control information acquisition unit 53a then uses a speech speed multiplier of 1 as the control amount of the speech speed, and uses the fundamental frequency change width multiplier stored in the read table as the control amount of the change width of the fundamental frequency.

FIG. 9 shows an example of a control information table storing control information for hearing-impaired listeners. This table contains a specification to multiply the speech speed of the voice data by 0.7 and a specification to multiply the change width of the fundamental frequency of the voice data by 1.15.
FIG. 10 shows an example of a control information table storing control information for fast listening. This table contains a specification to multiply the speech speed of the voice data by 2.0 and a specification to multiply the change width of the fundamental frequency of the voice data by 0.93.
FIG. 11 shows an example of a control information table storing control information for fast talking. This table contains a specification to multiply the speech speed of the voice data by 1.4 and a specification to multiply the change width of the fundamental frequency of the voice data by 0.84.
FIG. 12 shows an example of a control information table for noisy environments. This table contains a specification to keep the speech speed of the voice data fixed (multiplier 1.0) and a specification to multiply the change width of the fundamental frequency of the voice data by 1.15.
Note that the example control information tables in FIGS. 9 to 12 apply when the speech speed value of the voice data supplied from the speech speed calculation unit 32 is about 7 to 8 morae per second.
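
The four example tables amount to a lookup from control method type to a pair of multipliers. The Python sketch below encodes the values from FIGS. 9 to 12 and derives two quantities from them; the dictionary layout and the helper names are hypothetical, and the relation "duration scales by the reciprocal of the speech speed multiplier" is an assumption about how a correction stage might apply the control amount, not something the text specifies.

```python
# Multipliers taken from the example tables of FIGS. 9 to 12.
PRESETS = {
    "A": {"purpose": "hearing-impaired listeners", "speed": 0.7, "f0_width": 1.15},
    "B": {"purpose": "fast listening",             "speed": 2.0, "f0_width": 0.93},
    "C": {"purpose": "fast talking",               "speed": 1.4, "f0_width": 0.84},
    "D": {"purpose": "noisy environments",         "speed": 1.0, "f0_width": 1.15},
}


def control_amounts(method):
    """Return (speech speed control amount, F0 change width control amount)."""
    preset = PRESETS[method]
    return preset["speed"], preset["f0_width"]


def target_speed(measured_morae_per_s, method):
    """Speech speed after correction, in morae per second."""
    return measured_morae_per_s * PRESETS[method]["speed"]


def duration_scale(method):
    """Assumed time-stretch factor: speaking r times faster takes 1/r of the time."""
    return 1.0 / PRESETS[method]["speed"]
```

For input speech at 7.5 morae per second, method A yields a target of about 5.25 morae per second, and method B halves the duration while narrowing the fundamental frequency change width to 0.93 of its original value.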

Next, the operation of the speech synthesizer 100a according to the second embodiment will be described. FIG. 13 is a flowchart showing the procedure of the prosody correction processing of the speech synthesizer 100a. In this embodiment, processing steps identical to those of the speech synthesizer 100 according to the first embodiment are given the same reference symbols, and their description is omitted.
First, in step S101a, when the control condition information generation unit 40a receives a desired control method type through the user's operation of an input device such as a keyboard or a mouse, it generates control condition information containing this control method type and supplies it to the target value generation unit 50a.
Next, in step S102a, the control condition setting unit 52a of the target value generation unit 50a takes in the control condition information supplied from the control condition information generation unit 40a and sets the control method type contained in it in the control information acquisition unit 53a.

In step S105a, the control information acquisition unit 53a reads, from the control information table storage unit 51a, the control information table associated with the control method type set by the control condition setting unit 52a. The control information acquisition unit 53a then supplies the control amount of the speech speed stored in the read table to the voice data correction unit 80, and supplies the control amount of the change width of the fundamental frequency stored in that table to the target fundamental frequency generation unit 54.

As described above, the speech synthesizer 100a according to the second embodiment lets the user specify, by operating the input device, a control method type that matches the correction purpose, and thereby easily and efficiently obtains the control amount of the speech speed and the control amount of the change width of the fundamental frequency corresponding to that purpose. Based on the two control amounts thus obtained, the speech synthesizer 100a can correct the prosody of the voice data, namely its speech speed and its fundamental frequency change width, in line with the user's correction purpose.
Therefore, the speech synthesizer 100a according to this embodiment can easily and efficiently correct voice data into easy-to-hear corrected voice data without requiring specialized knowledge of acoustics or extensive tuning experience.

Note that the speech synthesizer 100 according to the first embodiment and the speech synthesizer 100a according to the second embodiment may each be configured without the voice attribute information acquisition unit 20. In that case, the speech speed calculation unit 32 of the voice analysis unit 30 calculates the speech speed value of the voice data as follows. The speech speed calculation unit 32 either generates text information by applying known speech recognition processing to the voice data, or generates phoneme segmentation time information from the voice data. Then, as detailed in the first embodiment, the speech speed calculation unit 32 either generates phoneme segmentation time information from the voice data and the text information and calculates the speech speed value from it (the first method of calculating the speech speed value), or obtains the total duration of the speech sections from the voice data and the number of morae from the text information and calculates the speech speed value from these two quantities (the second method of calculating the speech speed value). Alternatively, the speech speed calculation unit 32 obtains the speech speed value by finding the number of morae per unit time from the phoneme segmentation time information.

The control information table storage unit 51a of the speech synthesizer 100a of the second embodiment stores, in association with the type of control method, a plurality of control information tables that each associate a magnification of the speech rate with a magnification of the change width of the fundamental frequency. Alternatively, the control information table storage unit 51a may store, in association with the type of control method, a plurality of control information tables that each associate an absolute value of the speech rate with an absolute value of the change width of the fundamental frequency.
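To make the two table variants concrete, the following minimal sketch stores magnification tables keyed by control method type and shows how a looked-up magnification pair could be turned into absolute values. The method names ("calm", "lively"), all numeric values, the nearest-row matching rule, and the function names are assumptions made for illustration; the embodiment does not specify them.

```python
from typing import Dict, List, Tuple

# One table row: (speech-rate amount, F0 change-width amount).
Row = Tuple[float, float]

# Hypothetical magnification tables keyed by control-method type.
# Every value here is illustrative only.
MAGNIFICATION_TABLES: Dict[str, List[Row]] = {
    "calm": [(0.8, 0.7), (0.9, 0.85), (1.0, 1.0)],
    "lively": [(1.0, 1.0), (1.1, 1.2), (1.2, 1.4)],
}


def lookup(tables: Dict[str, List[Row]], method: str,
           key_column: str, key_value: float) -> Row:
    """Return the row whose key column is closest to key_value.
    Nearest-row matching is an assumption, not the patent's stated rule."""
    if key_column not in ("rate", "f0_width"):
        raise ValueError("key_column must be 'rate' or 'f0_width'")
    col = 0 if key_column == "rate" else 1
    return min(tables[method], key=lambda row: abs(row[col] - key_value))


def to_absolute(row: Row, current_rate: float, current_f0_width: float) -> Row:
    """Convert a magnification row into absolute targets by scaling the
    measured speech rate and the measured F0 change width."""
    return (row[0] * current_rate, row[1] * current_f0_width)
```

A table of absolute values could be searched with the same `lookup` function; in that case the `to_absolute` step would simply be skipped.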

Some of the functions of the speech synthesizers 100 and 100a may be realized by a computer. In this case, a program for realizing those functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. The "computer system" here includes an OS (Operating System) and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable recording medium such as a flexible disk, a magneto-optical disk, an optical disk, or a memory card, or to a storage device such as a hard disk built into the computer system. Furthermore, the "computer-readable recording medium" may include a medium that holds a program dynamically for a short time, such as a communication line used when a program is transmitted via a network such as the Internet or via a communication line such as a telephone line, and a medium that holds a program for a certain period of time, such as volatile memory inside a computer system serving as a server or a client in that case. The above program may realize only some of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Embodiments of the present invention have been described in detail above with reference to the drawings. However, the specific configuration is not limited to these embodiments, and designs and the like within a scope that does not depart from the gist of the present invention are also included.

DESCRIPTION OF SYMBOLS
10 Voice data acquisition unit
20 Voice attribute information acquisition unit
30 Voice analysis unit
31 Fundamental frequency analysis unit
32 Speech rate calculation unit
40, 40a Control condition information generation unit
50, 50a Target value generation unit
51, 51a Control information table storage unit
52, 52a Control condition setting unit
53, 53a Control information acquisition unit
54 Target fundamental frequency generation unit
60 Control information table update unit
70 Generation condition acquisition unit
80 Voice data correction unit

Claims

A speech synthesizer comprising:
a control information table storage unit that stores, in association with a type of control method, a control information table in which a control amount of a speech rate is associated with a control amount of a change width of a fundamental frequency;
a voice data acquisition unit that acquires voice data;
a fundamental frequency analysis unit that generates time change information of the fundamental frequency based on the voice data acquired by the voice data acquisition unit;
a speech rate calculation unit that calculates a speech rate value based on the voice data;
a control condition setting unit that sets a type of a desired control method and key information that specifies a control amount of the speech rate or a control amount of the change width of the fundamental frequency in the control information table;
a control information acquisition unit that reads, from the control information table storage unit, the control information table associated with the type of control method set by the control condition setting unit, and acquires a control amount of the speech rate and a control amount of the change width of the fundamental frequency based on the key information set by the control condition setting unit, the speech rate value calculated by the speech rate calculation unit, and the time change information of the fundamental frequency generated by the fundamental frequency analysis unit;
a target fundamental frequency generation unit that generates time change information of a target fundamental frequency from the time change information of the fundamental frequency based on the control amount of the change width of the fundamental frequency acquired by the control information acquisition unit; and
a voice data correction unit that corrects the voice data based on the control amount of the speech rate acquired by the control information acquisition unit and the time change information of the target fundamental frequency generated by the target fundamental frequency generation unit, and outputs corrected voice data.

The speech synthesizer according to claim 1, wherein the control amount of the speech rate is an absolute value of the speech rate, the control amount of the change width of the fundamental frequency is an absolute value of the change width of the fundamental frequency, and the control information table storage unit stores, in association with the type of control method, a control information table in which an absolute value of the change width of the fundamental frequency is associated with each of a plurality of absolute values of the speech rate.

The speech synthesizer according to claim 1, wherein the control amount of the speech rate is a magnification of the speech rate, the control amount of the change width of the fundamental frequency is a magnification of the change width of the fundamental frequency, and the control information table storage unit stores, in association with the type of control method, a control information table in which a magnification of the change width of the fundamental frequency is associated with each of a plurality of magnifications of the speech rate,
the control condition setting unit further sets a desired magnification for the key information, and
the control information acquisition unit acquires the control amount of the speech rate and the control amount of the change width of the fundamental frequency based on the key information and the magnification set by the control condition setting unit, the speech rate value calculated by the speech rate calculation unit, and the time change information of the fundamental frequency generated by the fundamental frequency analysis unit.

The speech synthesizer according to claim 3, wherein the control information table storage unit stores, for each of a plurality of designated speech rate values and in association with the type of control method, a control information table in which a magnification of the change width of the fundamental frequency is associated with each of a plurality of magnifications of the speech rate, and
the control information acquisition unit reads, from the control information table storage unit, the control information table that is associated with the type of control method set by the control condition setting unit and that corresponds to the designated speech rate value equal to the speech rate value calculated by the speech rate calculation unit, and acquires the control amount of the speech rate and the control amount of the change width of the fundamental frequency based on the key information and the magnification set by the control condition setting unit, the speech rate value, and the time change information of the fundamental frequency generated by the fundamental frequency analysis unit.

The speech synthesizer according to claim 1, wherein the control information table storage unit stores, in association with the type of control method, a control information table in which a magnification of the speech rate, serving as the control amount of the speech rate, is associated with a magnification of the fundamental frequency, serving as the control amount of the fundamental frequency,
the control condition setting unit sets a type of a desired control method, and
the control information acquisition unit reads, from the control information table storage unit, the control information table associated with the type of control method set by the control condition setting unit, and acquires the control amount of the speech rate and the control amount of the change width of the fundamental frequency from the control information table.

A computer program that causes a computer including a control information table storage unit, which stores a control information table in which a control amount of a speech rate and a control amount of a change width of a fundamental frequency are associated with each other, to function as:
voice data acquisition means for acquiring voice data;
fundamental frequency analysis means for generating time change information of the fundamental frequency based on the voice data acquired by the voice data acquisition means;
speech rate calculation means for calculating a speech rate value based on the voice data;
control condition setting means for setting a type of a desired control method and key information indicating a control amount of the speech rate or a control amount of the change width of the fundamental frequency in the control information table;
control information acquisition means for reading, from the control information table storage unit, the control information table associated with the type of control method set by the control condition setting means, and acquiring a control amount of the speech rate and a control amount of the change width of the fundamental frequency based on the key information set by the control condition setting means, the speech rate value calculated by the speech rate calculation means, and the time change information of the fundamental frequency generated by the fundamental frequency analysis means;
target fundamental frequency generation means for generating time change information of a target fundamental frequency from the time change information of the fundamental frequency based on the control amount of the change width of the fundamental frequency acquired by the control information acquisition means; and
voice data correction means for correcting the voice data based on the control amount of the speech rate acquired by the control information acquisition means and the time change information of the target fundamental frequency generated by the target fundamental frequency generation means, and outputting corrected voice data.