JP3918606B2

JP3918606B2 - Speech synthesis apparatus, speech synthesis method, speech synthesis program, and computer-readable recording medium storing the program

Info

Publication number: JP3918606B2
Application number: JP2002092450A
Authority: JP
Inventors: 裕司久湊
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2002-03-28
Filing date: 2002-03-28
Publication date: 2007-05-23
Anticipated expiration: 2022-03-28
Also published as: JP2003288095A

Description

[0001]
[Technical field of the invention]
The present invention relates to a speech synthesizer that synthesizes speech based on input performance data, a speech synthesis method, a speech synthesis program, and a computer-readable recording medium on which the program is recorded. More specifically, it relates to a speech synthesizer, a speech synthesis method, a speech synthesis program, and a computer-readable recording medium on which the program is recorded, each having a function of imparting breathiness to the synthesized and output speech.
[0002]
[Prior art]
Breathiness is a term that describes a characteristic of the human voice. It is an index of how loud the breath sound is: saying that breathiness is high means that the breath sound is perceived as loud. Since breathiness is one of the characteristics of a speaker or singer, it is preferable for a speech synthesizer to take breathiness into account when synthesizing speech.
[0003]
It is known that breathiness, and dynamics (the perceived loudness of a voice), change as the ratio between the harmonic component and the non-harmonic component of the voice changes. Here, the harmonic component is the periodic component of the voice produced by vocal-cord vibration, and the non-harmonic component is the noise-like component of the voice produced when the airflow from the lungs passes the narrowed glottis or vocal cords.
[0004]
[Problems to be solved by the invention]
Conventionally, a speech synthesizer that can change the ratio between the harmonic component and the non-harmonic component is known (see, for example, JP-A-10-187180).
The method described in that publication can, as a result, also control breathiness and dynamics. However, with that method, breathiness and the like merely change as a consequence of changing the ratio between the harmonic component and the non-harmonic component; breathiness and the like cannot be controlled directly.
The present invention has been made in view of this point, and its object is to provide a speech synthesizer, a speech synthesis method, a speech synthesis program, and a computer-readable recording medium on which the program is recorded, which make it possible to control the magnitude of breathiness easily and as desired.
[0005]
[Means for Solving the Problems]
To achieve the above object, a speech synthesizer according to a first invention of the present application is a speech synthesizer that synthesizes and outputs speech based on input performance data, comprising: a performance data input unit to which performance data including breathiness data Br indicating the magnitude of breathiness and dynamics data Dy indicating dynamics is input; a harmonic/non-harmonic component generation unit that generates a harmonic component H and a non-harmonic component NH of the speech based on the performance data;
a breathiness imparting unit that uses the breathiness data Br and the dynamics data Dy to change the magnitudes of the harmonic component H and the non-harmonic component NH into a changed harmonic component H′ and a changed non-harmonic component NH′, respectively, according to the following equations, thereby imparting breathiness to the speech,
H′ = H + ΔH × Br
NH′ = NH + (ΔNH1 + ΔNH2 × Dy) × Br
(Here, ΔH, ΔNH1, and ΔNH2 are numbers representing the degree of influence of increases and decreases in the breathiness data and the dynamics data.)
and a mixer that combines the changed harmonic component H′ and the changed non-harmonic component NH′ output from the breathiness imparting unit and outputs a synthesized speech signal.
[0010]
To achieve the above object, a speech synthesis method according to a second invention of the present application is a speech synthesis method for synthesizing and outputting speech by a speech synthesizer based on input performance data, comprising: a performance data input step of inputting, to the speech synthesizer, performance data including breathiness data Br indicating the magnitude of breathiness and dynamics data Dy indicating dynamics; a harmonic/non-harmonic component generation step of causing the speech synthesizer to generate a harmonic component H and a non-harmonic component NH of the speech based on the performance data; a breathiness imparting step of causing the speech synthesizer to use the breathiness data Br and the dynamics data Dy to change the magnitudes of the harmonic component H and the non-harmonic component NH into a changed harmonic component H′ and a changed non-harmonic component NH′, respectively, according to the following equations, thereby imparting breathiness to the speech,
H′ = H + ΔH × Br
NH′ = NH + (ΔNH1 + ΔNH2 × Dy) × Br
(Here, ΔH, ΔNH1, and ΔNH2 are numbers representing the degree of influence of increases and decreases in the breathiness data and the dynamics data.)
and a synthesis step of causing the speech synthesizer to combine the changed harmonic component H′ and the changed non-harmonic component NH′ output from the breathiness imparting step and to output a synthesized speech signal.
[0011]
To achieve the above object, a speech synthesis program according to a third invention of the present application is a speech synthesis program for causing a computer to execute a procedure of synthesizing and outputting speech based on input performance data, comprising: a performance data input step of inputting, to the computer, performance data including breathiness data Br indicating the magnitude of breathiness and dynamics data Dy indicating dynamics; a harmonic/non-harmonic component generation step of causing the computer to generate a harmonic component H and a non-harmonic component NH of the speech based on the performance data; a breathiness imparting step of causing the speech synthesizer to use the breathiness data Br and the dynamics data Dy to change the magnitudes of the harmonic component H and the non-harmonic component NH into a changed harmonic component H′ and a changed non-harmonic component NH′, respectively, according to the following equations, thereby imparting breathiness to the speech,
H′ = H + ΔH × Br
NH′ = NH + (ΔNH1 + ΔNH2 × Dy) × Br
(Here, ΔH, ΔNH1, and ΔNH2 are numbers representing the degree of influence of increases and decreases in the breathiness data and the dynamics data.)
and a synthesis step of causing the computer to combine the changed harmonic component H′ and the changed non-harmonic component NH′ output from the breathiness imparting step and to output a synthesized speech signal.
[0012]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described by taking a singing voice synthesis device as an example.
As shown in FIG. 1, the singing voice synthesizer according to the present embodiment includes a performance data input unit 10, a harmonic/non-harmonic component generator 20, a breathiness imparting unit 30, and a mixer 40. These components can be realized by an ordinary computer and a computer program, but they can of course also be implemented as independent hardware.
The performance data input unit 10 is the part that inputs the various performance data used to synthesize the singing voice. In this embodiment, the performance data is assumed to include pitch data P, lyrics data L, singer name data S, dynamics data Dy, breathiness data Br, and volume data V.
[0013]
The pitch data P indicates the pitch of the singing voice. The lyrics data L represents the lyrics to be sung. The singer name data S is a singer identification number used to reflect the characteristics of the singer's voice in the synthesized singing voice. The breathiness data Br represents the magnitude of breathiness and is expressed here as a numerical value between 0 and 1. Increasing or decreasing the breathiness data Br changes the way the harmonic component H and the non-harmonic component NH vary. Details will be described later.
[0014]
The dynamics data Dy represents the perceived sense of dynamics and is expressed here as a numerical value between 0 and 1. When the dynamics data Dy is 0, the synthesized singing voice has the minimum sense of dynamics (the voice of a person singing as softly as possible); when the dynamics data Dy is 1, the synthesized singing voice has the maximum sense of dynamics (the voice of a person singing as loudly as possible).
[0015]
The volume data V is for determining the volume of the synthesized singing voice and is expressed by a numerical value between 0 and 1. When the volume is 0, the volume of the synthesized singing voice is minimized, and when the volume is 1, the volume of the synthesized singing voice is maximized.
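As an illustrative sketch only, the performance data items P, L, S, Dy, Br, and V described above could be held in a small record type that enforces the stated 0 to 1 ranges. The class name, the field names, and the choice of Hz as the pitch unit are assumptions made for this example and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceData:
    """Hypothetical container for one set of performance data."""
    pitch: float        # P: pitch of the singing voice (Hz is an assumption)
    lyrics: str         # L: lyrics to be sung
    singer_id: int      # S: singer identification number
    dynamics: float     # Dy: 0 (softest) to 1 (loudest)
    breathiness: float  # Br: 0 (minimum) to 1 (maximum)
    volume: float       # V: 0 (minimum) to 1 (maximum)

    def __post_init__(self):
        # Dy, Br, and V are each described as values between 0 and 1.
        for name in ("dynamics", "breathiness", "volume"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
```

Rejecting out-of-range values at construction keeps later stages free of range checks; clamping instead would be an equally plausible design.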
[0016]
The harmonic/non-harmonic component generator 20 is the part that outputs a harmonic component H and a non-harmonic component NH that match the input performance data. Here, the harmonic component H and the non-harmonic component NH are assumed to be expressed as frequency spectra, but they can also be expressed as time waveforms. The harmonic/non-harmonic component generator 20 includes a database DB that stores harmonic component data and non-harmonic component data that differ for each type of performance data. The generator acquires from the database DB the harmonic component and non-harmonic component that match the performance data input from the performance data input unit 10, and outputs them. If the database DB contains no harmonic component and non-harmonic component matching the input performance data, the closest harmonic and non-harmonic components may be read out and adjusted, for example by linear interpolation.
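The linear interpolation mentioned above could look like the following minimal sketch. It assumes, purely for illustration, that the database holds spectra for a few stored Dy values only and that each spectrum is a list of magnitudes of equal length; the function name and the data layout are hypothetical and are not taken from the specification.

```python
from bisect import bisect_left


def interpolate_component(stored, dy):
    """Linearly interpolate a spectrum for the dynamics value `dy`.

    `stored` is a hypothetical list of (dy_value, spectrum) pairs sorted by
    dy_value; every spectrum is a list of magnitudes with the same length.
    """
    keys = [k for k, _ in stored]
    # Outside the stored range, fall back to the nearest stored spectrum.
    if dy <= keys[0]:
        return list(stored[0][1])
    if dy >= keys[-1]:
        return list(stored[-1][1])
    i = bisect_left(keys, dy)
    if keys[i] == dy:
        return list(stored[i][1])  # exact match, no interpolation needed
    (k0, s0), (k1, s1) = stored[i - 1], stored[i]
    t = (dy - k0) / (k1 - k0)  # position of dy between the two neighbours
    return [a + t * (b - a) for a, b in zip(s0, s1)]
```

For example, with stored spectra at Dy = 0.0 and Dy = 1.0, a request for Dy = 0.25 returns a spectrum one quarter of the way from the first to the second in every bin.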
[0017]
The breathiness imparting unit 30 is the part that modifies the harmonic component H and the non-harmonic component NH output from the harmonic/non-harmonic component generator 20, based on the breathiness data Br and other data input at the performance data input unit 10. The mixer 40 is the part that combines the modified harmonic component and non-harmonic component output from the breathiness imparting unit 30 and outputs the resulting speech signal.
[0018]
Next, the operation of this embodiment will be described based on the flowchart shown in FIG. 2.
First, various performance data are input in the performance data input unit 10 (S1).
[0019]
Of the performance data input from the performance data input unit 10, the harmonic/non-harmonic component generator 20 receives the pitch data P, the lyrics data L, the singer name data S, and the dynamics data Dy, and reads the harmonic component data and non-harmonic component data matching these data from the database DB, thereby generating the harmonic component H and the non-harmonic component NH of the voice (S2). As shown in FIG. 3(a), the harmonic component H generated here increases as the dynamics data Dy increases. The non-harmonic component NH, on the other hand, is substantially constant regardless of the magnitude of the dynamics data Dy, as shown in FIG. 3(b). The curves take this form because the harmonic/non-harmonic component generator 20 does not take the breathiness data Br into account as a factor.
[0020]
The breathiness imparting unit 30 receives the harmonic component H and the non-harmonic component NH and changes their magnitudes based on the singer name data S, the dynamics data Dy, and the breathiness data Br input from the performance data input unit 10 (S3).
[0021]
The magnitude H′ of the changed harmonic component and the magnitude NH′ of the changed non-harmonic component are expressed by the following equations in terms of the magnitude H of the harmonic component before the change and the magnitude NH of the non-harmonic component before the change.
[0022]
[Expression 1]
H′ = H + ΔH(S) × Br [dB] ... (1)
NH′ = NH + (ΔNH1(S) + ΔNH2(S) × Dy) × Br [dB] ... (2)
[0023]
Here, ΔH(S), ΔNH1(S), and ΔNH2(S) are coefficients determined by the singer name data S. As is clear from equations (1) and (2), the larger ΔH(S) is, the greater the influence on H′ of an increase or decrease in the breathiness data Br. Likewise, the larger ΔNH1(S) is, the greater the influence on NH′ of an increase or decrease in the breathiness data Br, while the influence on NH′ of an increase or decrease in the dynamics data Dy does not change. The larger ΔNH2(S) is, the greater the influence on NH′ of an increase or decrease in both the breathiness data Br and the dynamics data Dy.
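Equations (1) and (2) can be written directly as a short function. The sketch below is illustrative only: the function and parameter names are hypothetical, it treats H and NH as single magnitudes in dB (for a spectrum, the same offset could be applied to every bin, which is an assumption), and the coefficient values in the usage note are arbitrary, since the specification gives no numerical values for ΔH(S), ΔNH1(S), or ΔNH2(S).

```python
def apply_breathiness(h_db, nh_db, br, dy, delta_h, delta_nh1, delta_nh2):
    """Return (H', NH') in dB according to equations (1) and (2).

    h_db, nh_db: magnitudes H and NH before the change, in dB.
    br, dy: breathiness data Br and dynamics data Dy, each in [0, 1].
    delta_h, delta_nh1, delta_nh2: per-singer coefficients dH(S), dNH1(S),
    dNH2(S); callers must supply them (no values are given in the source).
    """
    h_changed = h_db + delta_h * br                           # equation (1)
    nh_changed = nh_db + (delta_nh1 + delta_nh2 * dy) * br    # equation (2)
    return h_changed, nh_changed
```

With the arbitrary inputs H = -10 dB, NH = -30 dB, Br = 0.5, Dy = 0.5, ΔH = -4, ΔNH1 = 4, and ΔNH2 = 8, the function returns H′ = -12 dB and NH′ = -26 dB. With Br = 0, both components pass through unchanged.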
[0024]
The change amount of H′ given by equation (1) of [Expression 1], ΔH(S) × Br, is shown in the graph of FIG. 4(a), and the change amount given by equation (2), (ΔNH1(S) + ΔNH2(S) × Dy) × Br, is shown in the graph of FIG. 4(b).
In both FIG. 4(a) and FIG. 4(b), the horizontal axis represents the dynamics data Dy and the vertical axis represents the magnitude of the change (dB).
[0025]
FIG. 5 is a graph showing how the magnitude H′ of the changed harmonic component and the magnitude NH′ of the changed non-harmonic component vary with the dynamics data Dy.
As shown in FIG. 5(a), the slope of the straight line relating the dynamics data Dy to the harmonic component H′ does not change when the breathiness data Br changes, but its intercept on the vertical axis does. That is, a change in the breathiness data Br translates the Dy versus H′ line parallel to the vertical axis.
[0026]
On the other hand, as shown in FIG. 5(b), when the breathiness data Br is 0, the magnitude of the non-harmonic component NH′ is constant regardless of the dynamics data Dy. When the breathiness data Br is greater than 0, the non-harmonic component NH′ increases as the dynamics data Dy increases, and the larger the breathiness data Br, the greater the change in NH′ for a given increase in Dy. That is, as shown in FIG. 5(b), the larger the breathiness data Br, the steeper the slope of the Dy versus NH′ curve.
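The behaviour in FIG. 5 can be checked by differentiating equations (1) and (2) with respect to Dy. The short derivation below is a worked illustration rather than part of the original text; it assumes that Br and the coefficients do not depend on Dy, that NH is approximately constant in Dy as stated for FIG. 3(b), and that ΔNH2(S) is positive.

```latex
\begin{align*}
\frac{\partial H'}{\partial Dy}
  &= \frac{\partial H}{\partial Dy}
  && \text{(the term } \Delta H(S)\,Br \text{ does not depend on } Dy\text{)} \\
\frac{\partial NH'}{\partial Dy}
  &= \frac{\partial NH}{\partial Dy} + \Delta NH_2(S)\,Br
   \approx \Delta NH_2(S)\,Br
  && \text{(} NH \text{ is roughly constant in } Dy\text{)}
\end{align*}
```

The slope of H′ is therefore independent of Br, so the line only shifts vertically by ΔH(S) × Br, while the slope of NH′ is zero at Br = 0 and grows in proportion to Br.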
[0027]
FIG. 6 shows the relationship between the dynamics data Dy and the harmonic component H′ and non-harmonic component NH′ when the breathiness data Br = 0.0 (minimum) (FIG. 6(a)), and the same relationship when the breathiness data Br = 1.0 (maximum) (FIG. 6(b)).
[0028]
As shown in FIG. 6(a), when the breathiness data Br = 0.0, the harmonic component H′ increases as the dynamics data Dy increases, but the non-harmonic component NH′ is constant regardless of the dynamics data Dy. On the other hand, as shown in FIG. 6(b), when the breathiness data Br = 1.0, the harmonic component H′ increases as the dynamics data Dy increases, and the non-harmonic component NH′ also increases as the dynamics data Dy increases. Thus, when the value of the breathiness data Br differs, the ratio between the harmonic component H′ and the non-harmonic component NH′ changes in a different way even for the same change in the dynamics data Dy.
[0029]
In actual human vocalization, a voice is said to have a "high degree of breathiness" when the glottal closure interval is long, or when the closure is incomplete and the proportion of steady (DC-like) airflow from the lungs is large. In such a case, when the speaker tries to increase the dynamics, the steady airflow from the lungs itself also increases, so the non-harmonic component increases as the dynamics increase.
When the degree of breathiness is low, there is almost no such steady airflow from the lungs, so the non-harmonic component remains low and almost constant regardless of the dynamics.
The graphs of FIG. 6 agree with these characteristics of actual human vocalization.
Finally, the mixer 40 combines the changed harmonic component H′ and the changed non-harmonic component NH′ output from the breathiness imparting unit 30 and outputs the resulting speech signal (S4).
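The specification does not state how the mixer 40 combines the two components. As one possible sketch, if H′ and NH′ are magnitude spectra in dB over the same frequency bins and the two components are treated as uncorrelated, they can be combined by summing power in each bin. Both of those conditions, as well as the function name, are assumptions made for this example.

```python
import math


def mix_db_spectra(h_db, nh_db):
    """Combine two dB magnitude spectra bin by bin using a power sum.

    Assumes (not stated in the source) that both spectra cover the same bins
    and that the harmonic and non-harmonic parts are uncorrelated.
    """
    if len(h_db) != len(nh_db):
        raise ValueError("spectra must have the same number of bins")
    mixed = []
    for h, nh in zip(h_db, nh_db):
        # Convert each dB value to linear power, add, and convert back.
        power = 10.0 ** (h / 10.0) + 10.0 ** (nh / 10.0)
        mixed.append(10.0 * math.log10(power))
    return mixed
```

Two bins of equal level combine to a level about 3 dB higher, and a non-harmonic bin far below the harmonic bin leaves the harmonic level essentially unchanged.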
[0030]
As described above, the singing voice synthesizer of the present embodiment controls the harmonic component and the non-harmonic component of the synthesized voice by means of the breathiness data and the dynamics data, which makes it possible to synthesize a natural and characteristic voice easily.
In addition, since the dynamics and the degree of breathiness can be controlled independently, a voice with natural breathiness closer to human singing can be synthesized even when the dynamics are varied so that the voice gradually becomes louder or softer.
[0031]
In addition, by setting the dynamics and the degree of breathiness appropriately, differences in breathiness between singers can be reproduced easily.
Moreover, even if the database contains no harmonic component and non-harmonic component matching the input performance data, the voice for the target performance data can be synthesized by interpolating between the closest harmonic and non-harmonic components. It is therefore unnecessary to store harmonic and non-harmonic components for every combination of dynamics and breathiness, and the database can be made smaller.
[0032]
[Effects of the invention]
As described above, according to the present invention, the magnitude of breathiness can be controlled easily and as desired.
[Brief description of the drawings]
FIG. 1 shows the configuration of a singing voice synthesizer according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing performed by the apparatus of FIG. 1.
FIG. 3 is a graph showing the relationship between the dynamics data Dy and the harmonic component H and non-harmonic component NH of the voice output from the harmonic/non-harmonic component generator 20 of FIG. 1.
FIG. 4 is a graph showing the relationship between the dynamics data Dy and the change amounts that the breathiness imparting unit 30 of FIG. 1 applies to the harmonic component H and the non-harmonic component NH.
FIG. 5 is a graph showing the relationship between the dynamics data Dy and the changed harmonic component H′ and changed non-harmonic component NH′ output from the breathiness imparting unit 30.
FIG. 6 is a graph for explaining how the relationship between the dynamics data Dy and the harmonic component H′ and non-harmonic component NH′ changes when the breathiness data Br differs.
[Explanation of symbols]
10 ... Performance data input unit
20 ... Harmonic/non-harmonic component generator
30 ... Breathiness imparting unit
40 ... Mixer

Claims

A speech synthesizer that synthesizes and outputs speech based on input performance data, comprising:
a performance data input unit to which performance data including breathiness data Br indicating the magnitude of breathiness and dynamics data Dy indicating dynamics is input;
a harmonic/non-harmonic component generation unit that generates a harmonic component H and a non-harmonic component NH of the speech based on the performance data;
a breathiness imparting unit that uses the breathiness data Br and the dynamics data Dy to change the magnitudes of the harmonic component H and the non-harmonic component NH into a changed harmonic component H′ and a changed non-harmonic component NH′, respectively, according to the following equations, thereby imparting breathiness to the speech,
H′ = H + ΔH × Br
NH′ = NH + (ΔNH1 + ΔNH2 × Dy) × Br
(Here, ΔH, ΔNH1, and ΔNH2 are numbers representing the degree of influence of increases and decreases in the breathiness data and the dynamics data.)
and a mixer that combines the changed harmonic component H′ and the changed non-harmonic component NH′ output from the breathiness imparting unit and outputs a synthesized speech signal.

The speech synthesizer according to claim 1, wherein ΔH, ΔNH1, and ΔNH2 are coefficients determined by singer name data S.

A speech synthesis method for synthesizing and outputting speech by a speech synthesizer based on input performance data, comprising:
a performance data input step of inputting, to the speech synthesizer, performance data including breathiness data Br indicating the magnitude of breathiness and dynamics data Dy indicating dynamics;
a harmonic/non-harmonic component generation step of causing the speech synthesizer to generate a harmonic component H and a non-harmonic component NH of the speech based on the performance data;
a breathiness imparting step of causing the speech synthesizer to use the breathiness data Br and the dynamics data Dy to change the magnitudes of the harmonic component H and the non-harmonic component NH into a changed harmonic component H′ and a changed non-harmonic component NH′, respectively, according to the following equations, thereby imparting breathiness to the speech,
H′ = H + ΔH × Br
NH′ = NH + (ΔNH1 + ΔNH2 × Dy) × Br
(Here, ΔH, ΔNH1, and ΔNH2 are numbers representing the degree of influence of increases and decreases in the breathiness data and the dynamics data.)
and a synthesis step of causing the speech synthesizer to combine the changed harmonic component H′ and the changed non-harmonic component NH′ output from the breathiness imparting step and to output a synthesized speech signal.

A speech synthesis program for causing a computer to execute a procedure of synthesizing and outputting speech based on input performance data, comprising:
a performance data input step of inputting, to the computer, performance data including breathiness data Br indicating the magnitude of breathiness and dynamics data Dy indicating dynamics;
a harmonic/non-harmonic component generation step of causing the computer to generate a harmonic component H and a non-harmonic component NH of the speech based on the performance data;
a breathiness imparting step of causing the speech synthesizer to use the breathiness data Br and the dynamics data Dy to change the magnitudes of the harmonic component H and the non-harmonic component NH into a changed harmonic component H′ and a changed non-harmonic component NH′, respectively, according to the following equations, thereby imparting breathiness to the speech,
H′ = H + ΔH × Br
NH′ = NH + (ΔNH1 + ΔNH2 × Dy) × Br
(Here, ΔH, ΔNH1, and ΔNH2 are numbers representing the degree of influence of increases and decreases in the breathiness data and the dynamics data.)
and a synthesis step of causing the computer to combine the changed harmonic component H′ and the changed non-harmonic component NH′ output from the breathiness imparting step and to output a synthesized speech signal.

A computer-readable recording medium on which the speech synthesis program according to claim 4 is recorded.