JP6347536B2 - Sound synthesis method and sound synthesizer

Info

Publication number: JP6347536B2
Application number: JP2014036603A
Authority: JP
Inventors: 秀樹坂野; 裕展西脇
Original assignee: Meijo University
Current assignee: Meijo University
Priority date: 2014-02-27
Filing date: 2014-02-27
Publication date: 2018-06-27
Anticipated expiration: 2034-02-27
Also published as: JP2015161774A

Description

The present invention relates to a sound synthesis method and a sound synthesizer.

The sound synthesis method and sound synthesizer of the present invention extend the speech analysis-synthesis method that serves as the underlying technology for converting the timbre of a speech signal. In most cases, a speech analysis-synthesis method analyzes a speech waveform to extract three time-varying pieces of information and generates a synthesized sound from them: the fundamental frequency, which carries information on voice pitch; the spectral envelope, which represents vocal tract information; and the voiced/unvoiced decision, which indicates whether the sound is unvoiced or voiced.

The spectral envelope is obtained from the amplitude spectrum found in the analysis of a speech signal by removing the fine fluctuations along the frequency axis, which are also related to voice pitch. The spectral envelope is also closely tied to timbre. In speech, it is closely related to vocal tract information and carries much of the speaker information, which indicates who is speaking, and the phonological information, which indicates what is being said.

Because a speech analysis-synthesis method represents vocal tract information separately from the rest of the speech signal, the vocal tract is easy to control. However, of the information contained in the speech waveform, the method retains almost none of the phase spectrum that pairs with the amplitude spectrum. The phase spectrum carries much of the information on, for example, the noisiness of vocal fold vibration. The method therefore generates synthesized sound by replacing the phase spectrum with degenerate information, namely the voiced/unvoiced decision. For a voice with regular vocal fold vibration, such as an announcer's, this degenerate information causes few problems, but when a voice with unusual vocal fold vibration, such as a husky or raspy voice, is input and resynthesized, the synthesized sound is degraded.

Furthermore, in recent years there have been growing expectations for singing voice synthesis systems that can reproduce techniques such as shouting and screaming, in which the singer actively controls vocal fold vibration. Vocal fold vibration in these voices is also highly unusual, and high-quality reproduction is difficult with existing speech analysis-synthesis methods. One approach is to record a large number of waveforms of voices with unusual vocal fold vibration and use them directly, but because only what was recorded can be reproduced, it is difficult to reproduce the vocal fold vibration that a user wants.

Non-Patent Document 1 therefore discloses a conventional speech analysis-synthesis method that can resynthesize, with high quality, speech having strongly noisy vocal fold vibration and can also emphasize that noisiness. This method extracts from the speech signal a group delay spectrum, which carries information equivalent to the phase spectrum, and amplifies its values to emphasize the noisiness of vocal fold vibration.

Patent Document 1 discloses a conventional speech synthesis method that alters the phase spectrum information.

Patent Document 1: JP-A-10-97287

Non-Patent Document 1: Hideki Banno, Jinlin Lu, Satoshi Nakamura, Kiyohiro Shikano, Hideki Kawahara, "Voice quality control method using phase control by time-domain smoothed group delay", IEICE Transactions D-II, Vol. J83-D-II, No. 11, pp. 2276-2282, November 2000

However, the speech analysis-synthesis method of Non-Patent Document 1 requires a preprocessing step called pitch marking in order to extract the group delay spectrum stably. Pitch marking is an error-prone automatic process, and when many errors occur, high-quality synthesis is not possible without manual correction. The speech synthesis method of Patent Document 1, for its part, does not use the phase spectrum information of the input speech; it merely alters phase spectrum information.

The present invention has been made in view of the conventional circumstances described above, and the problem to be solved is to provide a sound synthesis method and a sound synthesizer that can easily generate high-quality synthesized sound.

The sound synthesis method of the present invention is a sound synthesis method for generating a synthesized sound, comprising:
a first step of generating, based on a predetermined correspondence between values of the kurtosis or spectral flatness of a band-divided spectrum and amounts of variation of a group delay spectrum, the amount of variation of the group delay spectrum corresponding to an arbitrary value of the kurtosis or spectral flatness of the band-divided spectrum;
a second step of generating a group delay spectrum using the amount of variation of the group delay spectrum generated in the first step;
a third step of converting the group delay spectrum generated in the second step into a phase spectrum by integration or by computing a cumulative sum;
a fourth step of obtaining a complex spectrum by combining an arbitrary amplitude spectrum or an arbitrary spectral envelope with the phase spectrum obtained in the third step, and applying an inverse Fourier transform to the complex spectrum to generate a one-pitch waveform, which is a signal for one period; and
a fifth step of generating a synthesized sound by overlap-adding the one-pitch waveforms generated in the fourth step,
wherein the second to fifth steps are repeated to generate a synthesized sound of the synthesis frame length.

The sound synthesizer of the present invention is a sound synthesizer for generating a synthesized sound, comprising:
a storage unit that stores a predetermined correspondence between values of the kurtosis or spectral flatness of a band-divided spectrum and amounts of variation of a group delay spectrum;
a group delay variation generation unit that generates, based on the correspondence stored in the storage unit, the amount of variation of the group delay spectrum corresponding to an arbitrary value of the kurtosis or spectral flatness of the band-divided spectrum;
a group delay generation unit that generates a group delay spectrum using the amount of variation of the group delay spectrum generated by the group delay variation generation unit;
a phase generation unit that converts the group delay spectrum generated by the group delay generation unit into a phase spectrum by integration or by computing a cumulative sum;
a one-pitch waveform generation unit that obtains a complex spectrum by combining an arbitrary amplitude spectrum or an arbitrary spectral envelope with the phase spectrum obtained by the phase generation unit, and applies an inverse Fourier transform to the complex spectrum to generate a one-pitch waveform, which is a signal for one period; and
an overlap-add unit that generates a synthesized sound by overlap-adding the one-pitch waveforms generated by the one-pitch waveform generation unit.

This sound synthesis method and sound synthesizer generate a synthesized sound using the kurtosis or spectral flatness of a band-divided spectrum, which corresponds to the amount of variation of the group delay spectrum. Because of this correspondence, the relationship can be determined in advance, and the amount of variation of the group delay spectrum can be generated for any value of the kurtosis or spectral flatness of the band-divided spectrum. Rather than reproducing the group delay spectrum faithfully, this approach reproduces the degree of variation of the group delay spectrum in each band, which makes it possible to generate synthesized sound that has noisiness.

Therefore, the sound synthesis method and sound synthesizer of the present invention can easily generate high-quality synthesized sound.

FIG. 1 is a block diagram showing the sound synthesizer of Example 1.
FIG. 2 is a flowchart showing the sound synthesis method of Example 1.
FIG. 3 is a graph showing kurtosis values.
FIG. 4 is a graph showing the amount of variation of the group delay spectrum.
FIG. 5 is a graph showing the correspondence between the index and the group delay variation.
FIG. 6 is a graph showing a group delay spectrum.
FIG. 7 is a graph showing a phase spectrum.
FIG. 8 is a graph showing a spectral envelope.
FIG. 9 is a graph showing a one-pitch waveform.
FIG. 10 is a graph showing a synthesized sound.
FIG. 11 is a block diagram showing the sound synthesizer of Example 2.
FIG. 12 is a flowchart showing the sound synthesis method of Example 2.

Preferred embodiments of the present invention are described below.

In the sound synthesis method of the present invention, the arbitrary value of the kurtosis or spectral flatness of the band-divided spectrum may be extracted from an analysis signal whose frame length is a time length set for the input sound signal. Because the kurtosis or spectral flatness of the band-divided spectrum is easier to extract from the input sound signal than the group delay spectrum, the input sound signal can be analyzed easily.

In the sound synthesis method of the present invention, the arbitrary amplitude spectrum or arbitrary spectral envelope may be extracted from an analysis signal whose frame length is a time length set for the input sound signal. Using an amplitude spectrum or spectral envelope extracted from the analysis signal when forming the complex spectrum with the phase spectrum yields a synthesized sound closer to the input sound signal.

In the sound synthesis method of the present invention, the group delay spectrum generated in the second step may be generated by multiplying the amount of variation of the group delay spectrum by a predetermined coefficient. This coefficient makes it possible to amplify or attenuate the noisiness of the synthesized sound.

In the sound synthesis method of the present invention, the group delay spectrum generated in the second step may be generated by multiplying the amount of variation of the group delay spectrum by random numbers. The random numbers make it possible to amplify or attenuate the noisiness of the synthesized sound and also give the synthesized sound a more convincing impression of noise.

The sound synthesizer of the present invention may include an analysis signal extraction unit that extracts an analysis signal from the input sound signal for each frame of a set time length, and an index extraction unit that extracts the value of the kurtosis or spectral flatness of the band-divided spectrum from the analysis signal extracted by the analysis signal extraction unit. Because the kurtosis or spectral flatness of the band-divided spectrum is easier to extract from the analysis signal than the group delay spectrum, the input sound signal can be analyzed easily.

The sound synthesizer of the present invention may include an analysis signal extraction unit that extracts an analysis signal from the input sound signal for each frame of a set time length, and a spectrum extraction unit that extracts the amplitude spectrum or the spectral envelope from the analysis signal extracted by the analysis signal extraction unit. In this case, the analysis signal extraction unit extracts the analysis signal from the input sound signal, and the spectrum extraction unit extracts the amplitude spectrum or spectral envelope from the analysis signal. The amplitude spectrum or spectral envelope combined with the phase spectrum to form the complex spectrum is thus taken from the analysis signal, so a synthesized sound closer to the input sound signal can be generated.

Next, Examples 1 and 2, which embody the sound synthesis method and sound synthesizer of the present invention, are described with reference to the drawings.

<Example 1>
As shown in FIG. 1, the sound synthesizer of Example 1 includes an analysis unit 10 and a synthesis unit 20. The analysis unit 10 has an analysis signal extraction unit 11, a spectrum extraction unit 12, a fundamental frequency extraction unit 13, and an index extraction unit 14. The synthesis unit 20 has a group delay variation generation unit 21, a group delay generation unit 22, a phase generation unit 23, a one-pitch waveform generation unit 24, an overlap-add unit 25, and a storage unit 26. In the sound synthesis method using this synthesizer, the analysis unit 10 analyzes the sound signal input to the synthesizer, and the synthesis unit 20 generates a synthesized sound based on the information obtained by the analysis.

As shown in FIG. 2, in the sound synthesis method using this synthesizer, the analysis signal extraction unit 11 first extracts, from the sound signal input to the synthesizer, an analysis signal whose frame length is a set time length starting at the analysis start point (step S1). If necessary, the extracted analysis signal is multiplied by an analysis window. Below, the frame number of this frame is denoted m.
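Step S1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `extract_frame`, the zero-padding past the end of the signal, and the choice of a Hann window are all assumptions, since the text only says that an analysis window is applied if necessary.

```python
import math

def extract_frame(signal, start, frame_len, window=True):
    """Cut frame_len samples starting at `start`; samples outside the signal are zero."""
    frame = [signal[i] if 0 <= i < len(signal) else 0.0
             for i in range(start, start + frame_len)]
    if window and frame_len > 1:
        # Hann window: an assumed choice, the source does not name a window type.
        frame = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / (frame_len - 1)))
                 for k, x in enumerate(frame)]
    return frame
```

Advancing `start` and calling the function again corresponds to updating the analysis start point for the next frame m.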

Next, the index extraction unit 14 extracts, from the analysis signal extracted by the analysis signal extraction unit 11, the kurtosis K_m(f) of the band-divided spectrum, which serves as an index of periodicity (step S2). As shown in FIG. 3, K_m(f) depends on the frequency f and corresponds to the amount of variation W_m(f) of the group delay spectrum. Because K_m(f) can be extracted from a sound signal easily and stably, the input sound signal can be analyzed easily. In addition, using K_m(f), which varies with frequency, makes it possible to generate high-quality synthesized sound.
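As a rough illustration of step S2, the sketch below computes one kurtosis value per band of an amplitude spectrum. The source does not spell out how the bands are formed or which kurtosis estimator is used, so the equal-width bands, the fourth-standardized-moment definition, and the names `kurtosis` and `band_kurtosis` are assumptions.

```python
def kurtosis(values):
    """Fourth standardized moment m4 / m2**2; returns 0.0 for a constant band."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) if m2 > 0.0 else 0.0

def band_kurtosis(amplitude, num_bands):
    """Split the amplitude spectrum into equal-width bands and return one value per band."""
    size = len(amplitude) // num_bands
    return [kurtosis(amplitude[b * size:(b + 1) * size]) for b in range(num_bands)]
```

A band dominated by a few sharp harmonic peaks gives a larger value than a band whose amplitudes are spread evenly, which is consistent with using the value as a periodicity index.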

Next, the group delay variation generation unit 21 executes the first step, generating the amount of variation W_m(f) of the group delay spectrum (step S3). FIG. 4 shows the W_m(f) generated in the first step. W_m(f) is generated based on the "index to group delay variation correspondence information" (see FIG. 5) stored in the storage unit 26. The correspondence between the kurtosis K_m(f) of the band-divided spectrum and W_m(f) is determined experimentally in advance and stored in the storage unit 26 as this correspondence information. If the function Ψ expressing the correspondence between the index K_m(f) and W_m(f) is taken to depend on the frequency f and on K_m(f), then W_m(f) = Ψ(f, K_m(f)).

Here, signals were first created by artificially applying an amount of variation W_m(f) to the group delay spectrum, the kurtosis K_m(f) of the band-divided spectrum of those signals was observed, and the relationship between the two was approximated by Equation 1, which is based on a sigmoid function.

Here, b, c, and d are constants chosen so that the kurtosis K_m(f) observed in the experimental data and the amount of variation W_m(f) of the group delay spectrum correspond as closely as possible. a(f) is likewise a function chosen so that K_m(f) and W_m(f) correspond as closely as possible; for example, Equation 2, which is based on a sigmoid function, can be used.

Here, p and q, like b, c, and d, are constants chosen so that the kurtosis K_m(f) observed in actual data and the amount of variation W_m(f) of the group delay spectrum correspond as closely as possible. The approximation is written as Ψ^{-1}(f, W), and the function obtained by inverting it with respect to W is taken as Ψ. Equation 3 follows from Equations 1 and 2. If necessary, a(f) may be a constant that does not depend on f.
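The idea of fitting a sigmoid from W to K and then inverting it can be sketched as below. Equations 1 to 3 are not reproduced in this text, so the sigmoid form, the constant values, and the function names are purely hypothetical; the sketch also treats a(f) as a constant, which the description permits.

```python
import math

# Assumed form: K = d + c / (1 + exp(a * (W - b))); all constants are made-up examples.
def kurtosis_from_variation(w, a=2.0, b=1.5, c=8.0, d=2.0):
    """Hypothetical stand-in for the forward approximation (W to K)."""
    return d + c / (1.0 + math.exp(a * (w - b)))

def variation_from_kurtosis(k, a=2.0, b=1.5, c=8.0, d=2.0, eps=1e-9):
    """Inverse of the assumed sigmoid (K to W), playing the role of Psi."""
    # Clamp K into the open interval (d, d + c) where the inverse is defined.
    k = min(max(k, d + eps), d + c - eps)
    return b + math.log(c / (k - d) - 1.0) / a
```

With these example constants a larger group delay variation maps to a smaller kurtosis, and the inverse recovers the variation from a measured kurtosis value.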

Next, the group delay generation unit 22 executes the second step, using the amount of variation W_m(f) generated in the first step to generate the group delay spectrum D_n(f) for a synthesis frame (frame number n) (step S4). FIG. 6 shows the D_n(f) generated in the second step. D_n(f) need only have a variation that depends on W_m(f); here, a random number generator produces random numbers N_n(f) with mean 0 and variance 1, which are multiplied by W_m(f). To amplify or attenuate the noisiness of the synthesized sound, this weight is multiplied by a nonzero coefficient α, so that the generated group delay spectrum is D_n(f) = αW_m(f)N_n(f). Multiplying by the random numbers N_n(f) in this way gives the synthesized sound a convincing impression of noise.
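The relation D_n(f) = αW_m(f)N_n(f) can be illustrated with a few lines of code. The function name, the list-per-bin representation, and the use of Gaussian random numbers are assumptions; the text only requires random numbers with mean 0 and variance 1.

```python
import random

def generate_group_delay(variation, alpha=1.0, rng=None):
    """Return D[k] = alpha * W[k] * N[k], with N[k] drawn from a zero-mean, unit-variance Gaussian."""
    rng = rng or random.Random()
    return [alpha * w * rng.gauss(0.0, 1.0) for w in variation]
```

Calling the function once per synthesis frame n with the same `variation` gives a fresh random group delay each time, while `alpha` scales the noisiness up or down.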

Next, the phase generation unit 23 executes the third step, integrating the group delay spectrum D_n(f) generated in the second step to convert it into the phase spectrum θ_n(f) (step S5). FIG. 7 shows the θ_n(f) obtained in the third step. This conversion is expressed by Equation 4. Other modifications may also be applied to θ_n(f), for example a modification that reproduces the positional shift depending on the fundamental frequency value.
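A cumulative-sum version of the third step might look like the sketch below. Equation 4 is not reproduced in this text, so the sign and scale are assumptions: the code uses the common convention that group delay (in seconds) is the negative derivative of phase with respect to angular frequency, with `delta_f` the bin spacing in Hz and zero phase at bin 0.

```python
import math

def group_delay_to_phase(group_delay, delta_f):
    """Cumulative sum: theta[0] = 0, theta[k] = theta[k-1] - 2*pi*D[k-1]*delta_f."""
    if not group_delay:
        return []
    phase = [0.0]
    for d in group_delay[:-1]:
        phase.append(phase[-1] - 2.0 * math.pi * d * delta_f)
    return phase
```

A constant group delay therefore yields a linear phase, which corresponds to a pure time shift of the waveform.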

Next, the one-pitch waveform generation unit 24 combines the spectral envelope A_m(f) shown in FIG. 8 with the phase spectrum θ_n(f) generated in the third step to obtain the complex spectrum Y_n(f). The spectral envelope A_m(f) is extracted from the analysis signal by the spectrum extraction unit 12 of the analysis unit 10 (step S6-1), so a synthesized sound closer to the input sound signal can be generated. The complex spectrum Y_n(f) is expressed by Equation 5. The fourth step is then executed: the complex spectrum Y_n(f) is inverse Fourier transformed to generate the one-period signal (one-pitch waveform) y_n(t) shown in FIG. 9 (step S6).
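The fourth step can be sketched with a direct inverse DFT. Equation 5 is not reproduced in this text; the code assumes the usual polar form Y = A·exp(jθ), takes the inputs on bins 0 to N/2, and mirrors them with conjugate symmetry so that the waveform is real. The function name and the naive O(N²) transform are illustrative choices only.

```python
import cmath
import math

def one_pitch_waveform(amplitude, phase):
    """Build Y[k] = A[k] * exp(j*theta[k]) on bins 0..N/2 and return the real inverse DFT."""
    half = len(amplitude) - 1          # index of the Nyquist bin (needs at least 2 bins)
    n = 2 * half                       # length of the time-domain waveform
    spec = [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)]
    # DC and Nyquist bins must be real for a real-valued waveform.
    spec[0] = complex(spec[0].real, 0.0)
    spec[half] = complex(spec[half].real, 0.0)
    # Mirror bins 1..half-1 as complex conjugates (Hermitian symmetry).
    full = spec + [spec[k].conjugate() for k in range(half - 1, 0, -1)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]
```

With a flat amplitude and zero phase the result is a single impulse at t = 0; a randomized phase spreads that energy over the period.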

Next, the fifth step is executed: the one-pitch waveforms generated in the fourth step are overlap-added, using the fundamental frequency value extracted from the analysis signal by the fundamental frequency extraction unit 13 of the analysis unit 10 (step S7-1), to generate a synthesized sound (step S7). The addition start position is updated based on the fundamental period. The second to fifth steps are repeated until the analysis frame needs to be updated.

When the analysis frame needs to be updated (step S8), the analysis start point is updated, the analysis signal extraction unit 11 extracts the next analysis signal (step S1), and the processes described above are executed. Letting t_n be the addition start position for synthesis frame n, the synthesized sound s_m(t) after repeating the second to fifth steps is expressed by Equation 6 in terms of the synthesized sound s_{m-1}(t) before the repetition. Here, n_m is the number of the first synthesis frame within analysis frame m, and N_m is the number of synthesis repetitions within analysis frame m.
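The overlap-add of the fifth step can be sketched as follows. Equation 6 is not reproduced in this text; the code assumes the natural reading s_m(t) = s_{m-1}(t) + Σ y_n(t − t_n) over the synthesis frames of analysis frame m, with t_n advanced by one fundamental period per waveform. The function names and the rounding of the period to whole samples are assumptions.

```python
def overlap_add(output, waveform, start):
    """Add `waveform` into `output` in place, beginning at sample index `start`."""
    for i, v in enumerate(waveform):
        if 0 <= start + i < len(output):
            output[start + i] += v

def synthesize(waveforms, f0, sample_rate, length):
    """Place successive one-pitch waveforms one fundamental period apart."""
    output = [0.0] * length
    period = max(1, round(sample_rate / f0))   # fundamental period in samples
    start = 0                                  # addition start position t_n
    for w in waveforms:
        overlap_add(output, w, start)
        start += period
    return output
```

In a full system `f0` would change from one analysis frame to the next, so the period would be recomputed whenever the analysis frame is updated.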

FIG. 10 shows the synthesized sound generated in this way. This sound synthesis method and sound synthesizer generate a synthesized sound using the kurtosis of the band-divided spectrum as an index of periodicity that corresponds to the amount of variation of the group delay spectrum. Because the kurtosis K_m(f) of the band-divided spectrum corresponds to the amount of variation W_m(f) of the group delay spectrum, the relationship can be determined in advance, and W_m(f) can be generated from the K_m(f) extracted from the analysis signal. Rather than reproducing the group delay spectrum faithfully, this approach reproduces the degree of variation of the group delay spectrum in each band, which makes it possible to generate synthesized sound that has noisiness.

Therefore, the sound synthesis method and sound synthesizer of Example 1 make it easy to generate synthesized sound that has noisiness.

<Example 2>
As shown in FIG. 11, the sound synthesizer of Example 2 differs from Example 1 in that the analysis unit 110 has a linear prediction analysis unit 15 and a linear prediction residual extraction unit 16, and the synthesis unit 120 has a residual-driven synthesis unit 27. The rest of the configuration is the same as in Example 1; identical components are given the same reference numerals, and their detailed description is omitted.

As shown in FIG. 12, the sound synthesis method using this synthesizer employs a linear-prediction residual-driven analysis-synthesis method. That is, the one-pitch waveform generation unit 24 combines the amplitude spectrum A_m(f) of the linear prediction residual signal extracted by the linear prediction residual extraction unit 16 (step S6-2) with the phase spectrum θ_n(f) generated in the third step (step S5) to obtain the complex spectrum Y_n(f), and executes the fourth step of applying an inverse Fourier transform to generate a one-period signal (one-pitch waveform) (step S6).

The overlap-add unit 25 then executes the fifth step of generating a synthesized sound (step S7), and the result is used as the linear prediction residual signal supplied to the linear-prediction residual-driven analysis-synthesis method. The residual-driven synthesis unit 27 then uses the linear prediction coefficients extracted for each analysis frame by the linear prediction analysis unit 15 (step S9-1) and, driven by this linear prediction residual signal, generates the synthesized sound (step S9).
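Step S9 amounts to driving an all-pole filter with the residual. The sketch below assumes the predictor convention s[t] = e[t] + Σ a_k·s[t−k]; the source does not state its sign convention, and the function name and the use of a single fixed coefficient set (rather than per-frame coefficients) are simplifications for illustration.

```python
def residual_driven_synthesis(residual, lpc):
    """All-pole synthesis: s[t] = e[t] + sum over k of lpc[k-1] * s[t-k]."""
    out = []
    for t, e in enumerate(residual):
        # Prediction from previously synthesized samples; terms before t = 0 are skipped.
        pred = sum(a * out[t - k] for k, a in enumerate(lpc, start=1) if t - k >= 0)
        out.append(e + pred)
    return out
```

Feeding in a unit impulse as the residual returns the impulse response of the synthesis filter, which is a quick way to sanity-check a coefficient set.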

This sound synthesis method and sound synthesizer likewise generate a synthesized sound using the kurtosis of the band-divided spectrum as an index of periodicity that corresponds to the amount of variation of the group delay spectrum. Because the kurtosis K_m(f) of the band-divided spectrum corresponds to the amount of variation W_m(f) of the group delay spectrum, the relationship can be determined in advance, and W_m(f) can be generated from the K_m(f) extracted from the analysis signal. Rather than reproducing the group delay spectrum faithfully, this approach reproduces the degree of variation of the group delay spectrum in each band, which makes it possible to generate synthesized sound that has noisiness.

Therefore, the sound synthesis method and sound synthesizer of the second embodiment can also facilitate the generation of a synthesized sound having noisiness.

The present invention is not limited to the first and second embodiments described above with reference to the drawings. For example, the following embodiments are also included in the technical scope of the present invention.
(1) In the first and second embodiments, the sound synthesizer has an analysis unit that analyzes the input sound signal, and the synthesis unit generates the synthesized sound from the analyzed signals. Alternatively, the analysis unit may be omitted and the synthesis unit may generate the synthesized sound from stored signals.
(2) In the first and second embodiments, the kurtosis of the band-divided spectrum is used as the index of periodicity, but the spectral flatness of the band-divided spectrum may be used instead.
(3) In the first and second embodiments, the kurtosis of the spectrum is extracted from the analysis signal, but it may instead be extracted from the amplitude spectrum after the spectral envelope information has been removed, or from the amplitude spectrum of the linear prediction residual signal.
(4) In the first and second embodiments, the group delay generation unit multiplies the variation amount of the group delay spectrum by a random number. Instead of random numbers, suitably generated group delay spectra may be prepared in advance as a group delay database and processed in the same way.
(5) In the first and second embodiments, the phase generation unit integrates the group delay spectrum to convert it into a phase spectrum, but the conversion may instead be performed by calculating the cumulative sum of the group delay spectrum.
(6) In the first and second embodiments, the one-pitch waveform generation unit obtains the complex spectrum by combining the spectral envelope and the phase spectrum, but an amplitude spectrum may be used instead of the spectral envelope.
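Variations (2), (4) and (5) above can be illustrated with a short Python sketch. The function names, the Gaussian distribution of the random numbers, and the sign convention (group delay taken as the negative derivative of phase with respect to angular frequency) are assumptions and are not specified in this description.

```python
import math
import random
from itertools import accumulate


def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of positive power values.

    The result is 1.0 for a flat (noise-like) band and approaches 0 for a
    band dominated by a few peaks.
    """
    n = len(power)
    log_mean = sum(math.log(p) for p in power) / n
    return math.exp(log_mean) / (sum(power) / n)


def random_group_delay(variation, rng):
    """Per-bin group delay: the variation amount times a random number.

    The Gaussian distribution is an assumption made for this sketch.
    """
    return [w * rng.gauss(0.0, 1.0) for w in variation]


def phase_from_group_delay(group_delay, delta_omega):
    """Cumulative-sum approximation of theta(w) = -integral of tau(w) dw."""
    return [-s * delta_omega for s in accumulate(group_delay)]


# Usage: a seeded generator keeps the sketch reproducible.
rng = random.Random(0)
tau = random_group_delay([0.5, 0.5, 1.0, 1.0], rng)
theta = phase_from_group_delay(tau, delta_omega=2 * math.pi / 8)
```

Replacing the integral with a cumulative sum is the discrete counterpart of the conversion, with `delta_omega` as the spacing between frequency bins.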

The present invention is applicable to real-time voice quality conversion systems capable of reproducing and emphasizing huskiness, singing voice synthesis systems capable of reproducing and emphasizing shout and scream singing styles, text-to-speech synthesis systems capable of controlling huskiness, and music synthesizers in which the noisiness of the timbre can be freely controlled.

S3…First step
S4…Second step
S5…Third step
S6…Fourth step
S7…Fifth step
11…Analysis signal extraction unit
12…Spectrum extraction unit
14…Index extraction unit
21…Group delay variation generation unit
22…Group delay generation unit
23…Phase generation unit
24…One-pitch waveform generation unit
25…Superposition addition unit
26…Storage unit

Claims

1. A sound synthesis method for generating a synthesized sound, comprising:
a first step of generating a variation amount of a group delay spectrum corresponding to an arbitrary value of the kurtosis or spectral flatness of a band-divided spectrum, based on a predetermined correspondence between the value of the kurtosis or spectral flatness of the band-divided spectrum and the variation amount of the group delay spectrum;
a second step of generating a group delay spectrum using the variation amount of the group delay spectrum generated in the first step;
a third step of calculating an integral or cumulative sum of the group delay spectrum generated in the second step and converting it into a phase spectrum;
a fourth step of obtaining a complex spectrum by combining an arbitrary amplitude spectrum or an arbitrary spectral envelope with the phase spectrum converted in the third step, and applying an inverse Fourier transform to the complex spectrum to generate a one-pitch waveform, which is a signal for one period; and
a fifth step of generating a synthesized sound by superimposing and adding the one-pitch waveforms generated in the fourth step,
wherein the second to fifth steps are repeated to generate a synthesized sound having a synthesis frame length.

2. The sound synthesis method according to claim 1, wherein the arbitrary value of the kurtosis or spectral flatness of the band-divided spectrum is extracted from an analysis signal whose frame length is a time length set for an input sound signal.

3. The sound synthesis method according to claim 1 or 2, wherein the arbitrary amplitude spectrum or the arbitrary spectral envelope is extracted from an analysis signal whose frame length is a time length set for an input sound signal.

4. The sound synthesis method according to claim 1, wherein the group delay spectrum generated in the second step is generated by multiplying the variation amount of the group delay spectrum by a predetermined coefficient.

5. The sound synthesis method according to any one of claims 1 to 4, wherein the group delay spectrum generated in the second step is generated by multiplying the variation amount of the group delay spectrum by a random number.

6. A sound synthesizer for generating a synthesized sound, comprising:
a storage unit storing a predetermined correspondence between the value of the kurtosis or spectral flatness of a band-divided spectrum and a variation amount of a group delay spectrum;
a group delay variation generation unit that generates, based on the correspondence stored in the storage unit, a variation amount of the group delay spectrum corresponding to an arbitrary value of the kurtosis or spectral flatness of the band-divided spectrum;
a group delay generation unit that generates a group delay spectrum using the variation amount of the group delay spectrum generated by the group delay variation generation unit;
a phase generation unit that calculates an integral or cumulative sum of the group delay spectrum generated by the group delay generation unit and converts it into a phase spectrum;
a one-pitch waveform generation unit that obtains a complex spectrum by combining an arbitrary amplitude spectrum or an arbitrary spectral envelope with the phase spectrum converted by the phase generation unit, and applies an inverse Fourier transform to the complex spectrum to generate a one-pitch waveform, which is a signal for one period; and
a superposition addition unit that superimposes and adds the one-pitch waveforms generated by the one-pitch waveform generation unit to generate a synthesized sound.

7. The sound synthesizer according to claim 6, further comprising:
an analysis signal extraction unit that extracts an analysis signal for each frame length of a time length set for an input sound signal; and
an index extraction unit that extracts the value of the kurtosis or spectral flatness of the band-divided spectrum from the analysis signal extracted by the analysis signal extraction unit.

8. The sound synthesizer according to claim 6 or 7, further comprising:
an analysis signal extraction unit that extracts an analysis signal for each frame length of a time length set for an input sound signal; and
a spectrum extraction unit that extracts the amplitude spectrum or the spectral envelope from the analysis signal extracted by the analysis signal extraction unit.