JP4296767B2

JP4296767B2 - Breath sound synthesis method, breath sound synthesis apparatus and program

Info

Publication number: JP4296767B2
Application number: JP2002306798A
Authority: JP
Inventors: 崇野口
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2002-10-22
Filing date: 2002-10-22
Publication date: 2009-07-15
Anticipated expiration: 2022-10-22
Also published as: JP2004144814A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声機器に用いて好適なブレス音合成方法、ブレス音合成装置およびプログラムに関する。
【０００２】
【従来の技術】
歌唱音等の音声データまたはシーケンスデータを、パーソナルコンピュータ上で修正することは周知技術である。それゆえ、手動により適切な位置にブレス（息継ぎ）音の波形データまたはイベントデータを該音声データまたは該シーケンスデータに付加することは可能であると考えられる。
一方、特許文献１には、楽器演奏のためのメロディデータと歌詞データとを含む演奏データに基づいて歌唱音を自動的に合成する歌唱音声合成装置が開示されている。該演奏データには歌詞データに対する息継ぎ指定（呼気フラグ）が適宜設けられ、該指定があった場合に、発音中の音源チャンネルに対してノートオフ操作を行う旨が開示されている。
【０００３】
【特許文献１】
特許第３２３９７０６号公報
【０００４】
【発明が解決しようとする課題】
しかし、特許文献１に示される技術においては、息継ぎ指定が呼気フラグの挿入のみによって行われている。そして、該呼気フラグのタイミングで発音中の歌唱音はノートオフされるが、具体的な息継ぎ音の発生指示や特性の指定については開示がない。また、手動で音声データにブレス音を付加する技術においても、フレーズの長短や強弱に関係なく同じブレス音を毎回同じように再生すると、不自然な歌唱音に聞こえる問題点がある。一方、自然な歌唱音を合成するために、ブレスのタイミング、長さ、強度を毎回手動で指定することはきわめて煩雑である。
この発明は、上述した事情に鑑みてなされたものであり、使用者がブレスのことを意識せず曲制作を行うことが出来るブレス音合成方法、ブレス音合成装置およびプログラムを提供することを目的とする。
【０００５】
【課題を解決するための手段】
上記課題を解決するため本発明にあっては、下記構成を具備することを特徴とする。なお、括弧内は例示である。
請求項１記載のブレス音合成方法にあっては、各々が連続的な発音期間（フレーズ）であって順次発生する第１および第２の発音期間を音声データ内またはシーケンスデータ内から検出する検出過程と、前記音声データ中または前記シーケンスデータ中、前記第１の発音期間の終了後、前記第２の発音期間の開始前のタイミングにブレス音または該ブレス音を発生させるブレスイベントを挿入するブレス音挿入過程とを有し、前記ブレス音の波形は、前記第２の発音期間に係る連続波形強度（フレーズ強度）が強いほどブレス量が多く、また、連続波形長（フレーズ長）が長いほど該ブレス量が多くなるように設定される波形であり、前記ブレス音の長さであるブレス長は、前記第１の発音期間が終了した後に前記第２の発音期間が開始されるまでの期間において、該ブレス音が終了する時刻から前記第２の発音期間が開始されるまでの所定の間隔（Ｔ _BP2 ）を除いた範囲内で設定されるものであり、前記ブレス音の強度であるブレス強度は、前記ブレス量が多いほど大きく、また前記ブレス長が長いほど小さくなるように算出されるものであり、前記ブレス音の開始タイミングは、前記ブレス長と、前記所定の間隔（Ｔ_BP2）とに基づいて算出されるものであることを特徴とする。
さらに、請求項２記載の構成にあっては、請求項１記載のブレス音合成方法において、前記第２の発音期間に係る連続波形強度（フレーズ強度）および連続波形長（フレーズ長）に応じて決定されるブレス量の値により、前記ブレス音または前記ブレスイベントを挿入するか否かの判定を行うブレス音挿入判定過程をさらに有することを特徴とする。
また、請求項３記載のブレス音合成装置にあっては、請求項１または２に記載のブレス音合成方法を実行することを特徴とする。
また、請求項４記載のプログラムにあっては、請求項１または２に記載のブレス音合成方法を処理装置（５０）に実行させることを特徴とする。
【０００６】
【発明の実施の形態】
1. 実施形態の構成
本発明の一実施形態であるブレス音合成装置の構成を図１を参照して説明する。
図において、１０は液晶表示パネルであり、データファイルの指定に関する表示、該データファイルについてのリアルタイム再生波形の表示が行われる。２０はＩ／Ｏ制御部であり、ＭＩＤＩインターフェース、ＵＳＢインターフェースにより、音声データ、ＭＩＤＩデータ（シーケンスデータ）の通信が行われる。尚、各種設定を行うためのポインティングデバイスが含まれる。３０はＦＤＤ制御部であり、フレキシブルディスクを介して、STANDARD_MIDI_FORMATファイル、音声ファイルの交換が行われる。４０はＨＤＤ制御部であり、一時ファイル、パラメータが記憶される。５０はＣＰＵであり、各部を制御する。６０はＲＯＭであり、ＣＰＵ５０を駆動するプログラムが記憶される。７０はＲＡＭであり、ワークメモリとして使用される。９０はバスラインであり、各部を接続する。以上により、ブレス音合成装置１００が構成される。
【０００７】
２．実施形態の動作
ブレス音合成装置１００は、リアルタイムな振幅データである音声データ、あるいはイベントデータの集合であるシーケンスデータのいずれに対しても対応している。以下、各々のデータに対する処理を場合を分けて説明する。
（１）音声データでの動作
ブレス音合成装置１００について音声データでの動作を図３のフローチャートを参照して説明する。所定の操作を行うことにより、図３のルーチンが起動する。
ステップＳＰ１０１において、曲調・テンポ・曲データの情報を含む音声データファイルの全体がハードディスクに読み込まれる。読み込みは、ＵＳＢインターフェースあるいはフレキシブルディスク等を介して行われる。ここで、図２（ａ）に、読み込まれた音声データについてのリアルタイム再生波形の一部を示す。一息で連続的に発音される連続波形の期間をフレーズといい、該フレーズが時系列的に複数存在する。また、前のフレーズと後ろのフレーズとの間に空白期間が存在し、該空白期間に、目的とするブレス音が挿入される。なお、一息で発音されるフレーズは、五線譜において「スラー」により表現される。そして、読み込み終了後、処理はステップＳＰ１０３に進む。
【０００８】
ステップＳＰ１０３においては、フレーズの検出処理が行われる。すなわち、読み込まれた音声データファイルが振幅データに変換され、連続波形の期間すなわち、所定の閾値以上の信号レベルが連続している期間がフレーズであると判断される。なお、読み込まれた音声ファイルの全体についてこの処理が行われる。そして、処理はステップＳＰ１０５に進み、フレーズ分析が行われる。
【０００９】
ステップＳＰ１０５においては、ｎ番目のフレーズのデータおよび（ｎ＋１）番目のフレーズのデータがワークメモリに読み込まれ、フレーズ間隔が検出される。ここで図２（ａ）に示されるように、フレーズ間隔（PHRASE_INTERVAL）とは、前のフレーズと後ろのフレーズとの間隔をいう。特に、最初の処理においては、ｎ＝１に設定され、第１番目のフレーズと第２番目のフレーズとのフレーズ間隔が検出される。そして、処理はステップＳＰ１０７に進む。ステップＳＰ１０７においては、後ろのフレーズ長（PHRASE_LENGTH）およびフレーズ強度（PHRASE_STRENGTH）の検出が行われる。ここで、フレーズ強度は音量レベルの平均値で定義される。そして、処理はステップＳＰ１０９に進む。
【００１０】
ステップＳＰ１０９においては、ブレス量の算出処理が行われる。ブレス量（BREATH_POWER）は、後ろのフレーズのフレーズ長（PHRASE_LENGTH）およびフレーズ強度（PHRASE_STRENGTH）の関数で与えられ、具体的にはフレーズ長とフレーズ強度との積で与えられる。すなわち、後ろのフレーズが長いほど、あるいは、強いほど必要なブレス量が多くなる。そして、処理はステップＳＰ１１１に進む。
【００１１】
ステップＳＰ１１１においては、ブレス音を挿入するか否かが判定される。ブレス量（BREATH_POWER）が所定の閾値未満であればブレス音を付加する必要がなく「ＮＯ」と判定され、処理はステップＳＰ１１７に進む。ステップＳＰ１１７においては、フレーズの更新が行われる。すなわち、ｎの値が増加され、次のフレーズのデータがワークメモリに読み込まれる。そして、処理はステップＳＰ１０５に戻る。一方、ブレス量が所定の閾値以上であれば、ブレス音を付加する必要があり、「ＹＥＳ」と判定され、処理はステップＳＰ１１３に進む。
【００１２】
ステップＳＰ１１３においては、ブレス強度、ブレス長、ブレスタイミングの算出処理が行われる。ここで、ブレス長（BREATH_LENGTH）は、フレーズ間隔（PHRASE_INTERVAL）と、ブレス終了時刻から後ろのフレーズの開始時刻までの時間Ｔ_BP2（図２（ｂ）参照）との関数である。ここで、Ｔ_BP2の値は予め所定の値に設定される。すなわち、ブレスの終了時刻と後ろのフレーズの開始時刻との時間が一定間隔にされ、ブレス長はフレーズ間隔の範囲内でなるべく大きな値にされる。但し、ブレス長の最大値は所定値に制限される。特に、前のフレーズが存在せず、フレーズ間隔が無限大とみなされる場合には、該所定値がブレス長に設定される。なお、前のフレーズの終了時刻とブレスの開始時刻との間の時間Ｔ_BP1が設定されることがある。
【００１３】
また、ブレス強度（BREATH_STRENGTH）は、ブレス量（BREATH_POWER）およびブレス長（BREATH_LENGTH）の関数であり、ブレス量が大きければ、ブレス強度が大きくなり、ブレス長が長ければブレス強度が小さくなる（図２（ｃ）参照）。また、ブレス強度は、ブレス波形の振幅の平均値に等しい。ここで、該ブレス波形は、波形の中心で最大値になりブレス開始時刻およびブレス終了時刻に近づくにつれて対称的に低下するような波形にされる（図２（ｂ）実線）。
【００１４】
さらに、ブレスタイミング（ブレス再生開始時刻）は、ブレス長（BREATH_LENGTH）および時間Ｔ_BP2（図２（ｂ）参照）の関数である。たとえば、後ろのフレーズの開始時刻から該所定値およびＴ_BP2の値を減算した時刻がブレスタイミングに設定される。そして、処理はステップＳＰ１１４に進む。
【００１５】
ステップＳＰ１１４においては、曲調・テンポによる修正処理が行われる。たとえば、曲調が落ち着いた曲であれば、ブレス強度を小さくし、ブレス長を長くすると好適である。一方、テンポが早い曲であれば、ブレス長を短くし、ブレス強度を大きくすると好適である。そして、処理はステップＳＰ１１５に進む。
【００１６】
ステップＳＰ１１５においては、最終フレーズまで処理が完了したか否かが判定される。処理が完了しなければ、「ＮＯ」と判定され、処理はステップＳＰ１１７に進む。ステップＳＰ１１７においては、フレーズの更新が行われ、処理はステップＳＰ１０５に戻る。一方、最終フレーズまで処理が完了すれば、「ＹＥＳ」と判定され、本ルーチンは終了する。
【００１７】
そして、算出されたブレス強度、ブレス長、ブレスタイミングに基づき、音声データの合成処理が行われる。すなわち、ブレス強度、ブレス長の情報を用いて音声データにブレス音が追加される。
【００１８】
（２）シーケンスデータの場合の動作
以下、ＭＩＤＩのシステムエクスクルーシブメッセージなどに挿入されたシーケンスデータ（歌詞データおよびメロディデータ）についての動作を、上記と同様に図３を参照して説明する。
ステップＳＰ１０１において、シーケンスデータの読み込みがＭＩＤＩインターフェース、ＵＳＢインターフェースあるいはフレキシブルディスクを介して、行われる。ここで、歌詞データは、歌詞の一音素ずつに対応する複数の音声シーケンスデータ等から構成されている。さらに、該音声シーケンスデータは、フォルマントデータを指定するインデックスデータから構成される。
【００１９】
なお、かかるシーケンスデータの具体例は上述した特許文献１に開示されており、シーケンスデータには各音素のノートオンタイミングを特定する情報は含まれている。但し、特許文献１においては、一の音素に対してノートオンが発生した後、次の音素に対してノートオンが発生するまで、該一の音素の発音状態が継続することが原則であるため、ノートオフタイミングを直接的に指定するための情報が含まれていない。これに対して、本実施形態においては、各音素に対するノートオフタイミングを特定するために、各音素の発音時間を特定するデュレーションデータが含まれている。従って、各音素のノートオフタイミングは、対応するノートオンタイミングにデュレーションデータを加算したタイミングに等しい。
【００２０】
次に、処理がステップＳＰ１０３に進むと、決定されたノートオン／オフタイミングに基づいて、「フレーズ」が検出される。すなわち、「任意の音素のノートオフタイミングから次の音素のノートオンタイミングまでの時間が所定時間以下である区間」が各々「フレーズ」を構成すると判定される。次に、処理がステップＳＰ１０５に進むと、これら複数のフレーズ相互間の間隔が「フレーズ間隔」として検出される。以後の処理は「音声データ」に対する処理と同様であり、各フレーズ間隔、フレーズ長およびフレーズ強度等に基づいて、挿入されるブレス音のブレス強度、ブレス長、ブレスタイミング等が決定される。そして、本ルーチンの処理が終了すると、これらブレス音を発生させるブレスイベントデータがシーケンスデータに追加される。なお、音源が多数のブレス音を有していれば、ブレス強度、ブレス長によって定まる音色選択情報（プログラムチェンジ）およびノートオン情報等を音声ファイルに挿入しても良い。
【００２１】
以上の動作により、フレーズ長、フレーズ強度、フレーズ間隔に応じて、自然なブレス音が自動的に挿入されるので、使用者がブレスのことを意識することなく音声データ・シーケンスデータの制作を行うことが出来る。
【００２２】
３．変形例
本発明は上述した実施形態に限定されるものではなく、例えば以下のような種々の変形が可能であり、全て本発明の範疇に含まれる。
（１）上記実施形態は、フレーズの強度をレベルの平均強度で定義したが、最大強度、二乗平均強度によって定義しても良い。
（２）上記実施形態は、ブレス長をフレーズ間隔の範囲内で大きな値に設定し、ブレス強度を可変したが、ブレス強度を所定値に固定し、ブレス長を可変しても良い。ただし、ブレス長の計算値がフレーズ間隔よりも長い値になる場合においては、例外的に、ブレス強度は該所定値よりも大きな値にされる。
（３）上記実施形態では、波形の中心で最大値になりブレス開始時刻およびブレス終了時刻に近づくにつれて対称的に低下するようにブレス波形を設定したが（図２（ｂ）実線参照）、矩形波状のブレス波形（図２（ｃ）参照）でもよい。また、ブレス長が短くなる場合においては、ブレス最大値に早く到達し、徐々に低下するような波形（図２（ｂ）破線参照）にしてもよい。
（４）上記実施形態は、音声データファイルの全体を読み込み、ブレス音データを必要に応じて挿入したが、リアルタイムに音声データを読み込み、後ろのフレーズが読み込まれた時点でブレス音に係るデータを挿入した音声データファイルを生成してもよい。
（５）上記実施形態においては、ＲＯＭ６０に記憶されたプログラムによってブレス音合成方法を実行する機能を実現したが、例えばパーソナルコンピュータ上で動作するアプリケーションプログラムによっても同様の機能を実現することができる。このアプリケーションプログラムあるいはブレス音合成方法を実行して得た音声ファイルをＣＤ−ＲＯＭ、フレキシブルディスク等の記憶媒体に格納して頒布し、あるいは電気通信回線を通じて頒布してもよい。
【００２３】
【発明の効果】
以上説明したように本発明によれば、
第１の発音期間と第２の発音期間との間にブレス音またはブレスイベントが挿入される。
【図面の簡単な説明】
【図１】本発明の一実施形態であるブレス音合成装置の構成図である。
【図２】音声ファイルのリアルタイム再生波形図および該説明図である。
【図３】フローチャートを示す図である。
【符号の説明】
１０…液晶表示パネル、２０…Ｉ／Ｏ制御部、３０…ＦＤＤ制御部、４０…ＨＤＤ制御部、５０…ＣＰＵ、６０…ＲＯＭ、７０…ＲＡＭ、９０…バスライン、１００…ブレス音合成装置
[0001]
[Technical field to which the invention belongs]
The present invention relates to a breath sound synthesizing method, a breath sound synthesizing apparatus, and a program suitable for use in audio equipment.
[0002]
[Prior art]
Editing audio data such as singing sounds, or sequence data, on a personal computer is a well-known technique. It is therefore considered possible to manually add waveform data or event data for a breath (breath-intake) sound at an appropriate position in the audio data or the sequence data.
On the other hand, Patent Document 1 discloses a singing voice synthesizer that automatically synthesizes a singing sound based on performance data that includes melody data for musical instrument performance and lyric data. According to that document, a breath designation (exhalation flag) is provided for the lyric data where appropriate, and when this designation is present, a note-off operation is performed on the sound source channel that is currently sounding.
[0003]
[Patent Document 1]
Japanese Patent No. 3239706 Publication
[0004]
[Problems to be solved by the invention]
However, in the technique disclosed in Patent Document 1, a breath is designated only by inserting an exhalation flag. The singing sound being produced is noted off at the timing of the exhalation flag, but nothing is disclosed about how a breath sound itself is to be generated or how its characteristics are to be specified. Moreover, even when a breath sound is added to audio data manually, reproducing the same breath sound in the same way every time, regardless of the length or strength of the phrase, makes the singing sound unnatural. On the other hand, manually specifying the timing, length, and strength of every breath in order to synthesize a natural singing sound is extremely cumbersome.
The present invention has been made in view of the above circumstances, and an object thereof is to provide a breath sound synthesis method, a breath sound synthesis device, and a program that allow a user to produce a song without having to think about breaths.
[0005]
[Means for Solving the Problems]
In order to solve the above problems, the present invention is characterized by having the following configuration. Items in parentheses are examples.
The breath sound synthesis method according to claim 1 comprises: a detection step of detecting, from audio data or sequence data, first and second sound generation periods that occur in succession, each being a continuous sound generation period (phrase); and a breath sound insertion step of inserting a breath sound, or a breath event for generating the breath sound, into the audio data or the sequence data at a timing after the end of the first sound generation period and before the start of the second sound generation period. The waveform of the breath sound is set such that the breath amount increases as the continuous waveform intensity (phrase strength) of the second sound generation period increases, and also increases as its continuous waveform length (phrase length) increases. The breath length, which is the length of the breath sound, is set within the period from the end of the first sound generation period to the start of the second sound generation period, excluding a predetermined interval (T_BP2) from the time at which the breath sound ends until the second sound generation period starts. The breath strength, which is the intensity of the breath sound, is calculated so as to increase as the breath amount increases and to decrease as the breath length increases. The start timing of the breath sound is calculated based on the breath length and the predetermined interval (T_BP2).
Furthermore, the configuration according to claim 2 is the breath sound synthesis method according to claim 1, further comprising a breath sound insertion determination step of determining whether to insert the breath sound or the breath event according to the value of the breath amount, which is determined from the continuous waveform intensity (phrase strength) and the continuous waveform length (phrase length) of the second sound generation period.
The breath sound synthesizing apparatus according to claim 3 is characterized in that the breath sound synthesizing method according to claim 1 or 2 is executed.
The program according to claim 4 causes the processing device (50) to execute the breath sound synthesis method according to claim 1 or 2.
[0006]
[Embodiments of the invention]
1. Configuration of Embodiment
The configuration of a breath sound synthesizer according to an embodiment of the present invention will be described with reference to FIG. 1.
In the figure, reference numeral 10 denotes a liquid crystal display panel, which displays information for specifying a data file and displays the real-time reproduction waveform of that data file. Reference numeral 20 denotes an I/O control unit, which communicates audio data and MIDI data (sequence data) through a MIDI interface and a USB interface. It also includes a pointing device for making various settings. Reference numeral 30 denotes an FDD control unit, which exchanges STANDARD_MIDI_FORMAT files and audio files via a flexible disk. Reference numeral 40 denotes an HDD control unit, which stores temporary files and parameters. Reference numeral 50 denotes a CPU, which controls each unit. Reference numeral 60 denotes a ROM, which stores the program that drives the CPU 50. Reference numeral 70 denotes a RAM, which is used as a work memory. Reference numeral 90 denotes a bus line, which connects the units. Together, these constitute the breath sound synthesizing apparatus 100.
[0007]
2. Operation of Embodiment
The breath sound synthesizing apparatus 100 supports both audio data, which is real-time amplitude data, and sequence data, which is a set of event data. The processing for each kind of data is described separately below.
(1) Operation with Audio Data
The operation of the breath sound synthesizer 100 on audio data will be described with reference to the flowchart of FIG. 3. The routine of FIG. 3 is started by performing a predetermined operation.
In step SP101, the entire audio data file, which includes information on the musical mood, the tempo, and the song data, is read onto the hard disk. Reading is performed via a USB interface, a flexible disk, or the like. FIG. 2(a) shows part of the real-time reproduction waveform of the audio data that has been read. A period of continuous waveform that is sounded continuously in a single breath is called a phrase, and a plurality of phrases exist in time series. A blank period exists between the preceding phrase and the following phrase, and the breath sound in question is inserted into this blank period. Note that a phrase sounded in a single breath is expressed by a slur in staff notation. After reading is completed, the process proceeds to step SP103.
[0008]
In step SP103, phrase detection is performed. That is, the audio data file that has been read is converted into amplitude data, and each period of continuous waveform, that is, each period in which the signal level stays at or above a predetermined threshold, is judged to be a phrase. This processing is applied to the entire audio file. The process then proceeds to step SP105, where phrase analysis is performed.
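The threshold-based detection in step SP103 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the audio has already been reduced to a list of non-negative signal levels (one per frame), and the function name `detect_phrases` and the threshold value are hypothetical.

```python
def detect_phrases(levels, threshold):
    """Return (start, end) frame index pairs for each run where level >= threshold.

    `end` is exclusive. `levels` is assumed to be a per-frame signal level
    (an envelope), not raw oscillating samples; that is an assumption of
    this sketch, as is the threshold value itself.
    """
    phrases = []
    start = None  # index where the current phrase began, or None if in silence
    for i, level in enumerate(levels):
        loud = level >= threshold
        if loud and start is None:
            start = i                      # a phrase begins
        elif not loud and start is not None:
            phrases.append((start, i))     # the phrase ended at frame i
            start = None
    if start is not None:                  # phrase runs to the end of the data
        phrases.append((start, len(levels)))
    return phrases
```

For raw audio samples, a level envelope (for example a short-window average of absolute values) would have to be computed first, since a waveform crosses zero many times inside a single phrase.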
[0009]
In step SP105, the data of the n-th phrase and the data of the (n + 1)-th phrase are read into the work memory, and the phrase interval is detected. As shown in FIG. 2(a), the phrase interval (PHRASE_INTERVAL) is the interval between the preceding phrase and the following phrase. In the first pass, n = 1 is set, and the phrase interval between the first phrase and the second phrase is detected. The process then proceeds to step SP107. In step SP107, the phrase length (PHRASE_LENGTH) and the phrase strength (PHRASE_STRENGTH) of the following phrase are detected. Here, the phrase strength is defined as the average value of the volume level. The process then proceeds to step SP109.
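As a rough sketch of steps SP105 and SP107, the three quantities can be derived from a pair of adjacent phrases. The representation of a phrase as a `(start, end)` frame pair and the function name `analyze_pair` are assumptions made for illustration.

```python
def analyze_pair(levels, prev_phrase, next_phrase):
    """Compute PHRASE_INTERVAL, PHRASE_LENGTH and PHRASE_STRENGTH.

    Phrases are (start, end) frame pairs with `end` exclusive (an assumed
    representation). Length and strength refer to the FOLLOWING phrase,
    as in step SP107; strength is the mean level over that phrase.
    """
    _, prev_end = prev_phrase
    next_start, next_end = next_phrase
    phrase_interval = next_start - prev_end   # silent gap between the phrases
    phrase_length = next_end - next_start
    segment = levels[next_start:next_end]
    phrase_strength = sum(segment) / len(segment)
    return phrase_interval, phrase_length, phrase_strength
```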
[0010]
In step SP109, the breath amount is calculated. The breath amount (BREATH_POWER) is given as a function of the phrase length (PHRASE_LENGTH) and the phrase strength (PHRASE_STRENGTH) of the following phrase; specifically, it is the product of the phrase length and the phrase strength. In other words, the longer or the stronger the following phrase is, the greater the amount of breath required. The process then proceeds to step SP111.
[0011]
In step SP111, it is determined whether or not to insert a breath sound. If the breath amount (BREATH_POWER) is less than the predetermined threshold value, it is not necessary to add a breath sound and it is determined “NO”, and the process proceeds to step SP117. In step SP117, the phrase is updated. That is, the value of n is increased and the data for the next phrase is read into the work memory. Then, the process returns to step SP105. On the other hand, if the amount of breath is equal to or greater than a predetermined threshold, it is necessary to add a breath sound, it is determined “YES”, and the process proceeds to step SP113.
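Steps SP109 and SP111 amount to a product followed by a threshold comparison. The sketch below uses hypothetical function names, and the threshold value is left to the caller because the source only calls it "predetermined".

```python
def breath_power(phrase_length, phrase_strength):
    """BREATH_POWER as the product of phrase length and phrase strength (SP109)."""
    return phrase_length * phrase_strength


def should_insert_breath(power, threshold):
    """SP111: insert a breath only if the breath amount reaches the threshold."""
    return power >= threshold
```

A short or quiet following phrase therefore yields a small breath amount and no breath is inserted before it.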
[0012]
In step SP113, the breath strength, the breath length, and the breath timing are calculated. The breath length (BREATH_LENGTH) is a function of the phrase interval (PHRASE_INTERVAL) and the time T_BP2 from the end of the breath to the start of the following phrase (see FIG. 2(b)). The value of T_BP2 is set to a predetermined value in advance. That is, the time between the end of the breath and the start of the following phrase is kept constant, and the breath length is made as large as possible within the phrase interval. However, the breath length is capped at a predetermined maximum value. In particular, when there is no preceding phrase and the phrase interval is regarded as infinite, that predetermined value is used as the breath length. Note that a time T_BP1 between the end of the preceding phrase and the start of the breath may also be set.
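One way to read the breath length rule is "fill the gap, minus the fixed margins, up to a cap". The sketch below follows that reading; the function and parameter names are hypothetical, times are in arbitrary units such as milliseconds, and clamping to zero when the gap is too short is an assumption not stated in the source.

```python
import math


def breath_length(phrase_interval, t_bp2, max_length, t_bp1=0):
    """BREATH_LENGTH: as long as possible inside the gap, capped at max_length.

    t_bp2 is the fixed margin between breath end and the next phrase;
    t_bp1 is the optional margin after the previous phrase. With no
    previous phrase the interval is treated as infinite and the cap is used.
    """
    if math.isinf(phrase_interval):
        return max_length
    available = phrase_interval - t_bp2 - t_bp1
    # Clamp at zero when the gap cannot hold any breath (assumed behaviour).
    return max(0, min(available, max_length))
```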
[0013]
The breath strength (BREATH_STRENGTH) is a function of the breath amount (BREATH_POWER) and the breath length (BREATH_LENGTH): the greater the breath amount, the greater the breath strength, and the longer the breath length, the smaller the breath strength (see FIG. 2(c)). The breath strength is equal to the average amplitude of the breath waveform. The breath waveform is shaped so that it reaches its maximum at the center of the waveform and falls off symmetrically toward the breath start time and the breath end time (solid line in FIG. 2(b)).
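The source fixes only the direction of the dependence, not the formula. The sketch below assumes the simplest function with the stated behaviour, strength proportional to breath amount divided by breath length, and a triangular envelope as one possible symmetric shape; the constant `k` and both function names are hypothetical.

```python
def breath_strength(power, length, k=1.0):
    """BREATH_STRENGTH grows with breath amount and shrinks with breath length.

    The form k * power / length is an assumption; the source states only
    the monotonic behaviour.
    """
    return k * power / length


def breath_envelope(n_samples, strength):
    """Symmetric envelope peaking at the center, with mean amplitude == strength.

    A triangle is used here as an assumed shape for the solid curve of
    FIG. 2(b); the envelope is rescaled so its average equals `strength`.
    """
    shape = [1.0 - abs(2.0 * (i + 0.5) / n_samples - 1.0) for i in range(n_samples)]
    mean_shape = sum(shape) / n_samples
    return [strength * s / mean_shape for s in shape]
```

Rescaling by the mean keeps the relationship "breath strength equals the average amplitude of the breath waveform" regardless of the number of samples.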
[0014]
Furthermore, the breath timing (the time at which breath reproduction starts) is a function of the breath length (BREATH_LENGTH) and the time T_BP2 (see FIG. 2(b)). For example, the time obtained by subtracting that predetermined value (the breath length) and the value of T_BP2 from the start time of the following phrase is set as the breath timing. The process then proceeds to step SP114.
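The timing rule is a subtraction from the start of the following phrase. A minimal sketch, with a hypothetical function name and times in arbitrary units such as milliseconds:

```python
def breath_start_time(next_phrase_start, breath_len, t_bp2):
    """Breath timing: the breath ends t_bp2 before the next phrase starts."""
    return next_phrase_start - breath_len - t_bp2


# Example: next phrase at 5000 ms, 400 ms breath, 100 ms margin.
start = breath_start_time(5000, 400, 100)
end = start + 400
```

In this example the breath occupies 4500 ms to 4900 ms, leaving exactly T_BP2 = 100 ms of silence before the phrase.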
[0015]
In step SP114, a correction based on the musical mood and tempo is applied. For example, for a calm piece, it is preferable to reduce the breath strength and increase the breath length. Conversely, for a piece with a fast tempo, it is preferable to shorten the breath length and increase the breath strength. The process then proceeds to step SP115.
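The source gives only the direction of the SP114 correction, so any concrete numbers are guesses. The sketch below uses placeholder scale factors (0.8 and 1.2) and hypothetical flags `calm` and `fast` purely to show the shape of such a correction.

```python
def adjust_for_style(strength, length, calm=False, fast=False,
                     strength_factor=0.8, length_factor=1.2):
    """Apply the mood/tempo correction of step SP114.

    The factors are placeholders, not values from the source. A calm piece
    gets a weaker, longer breath; a fast piece gets a stronger, shorter one.
    """
    if calm:
        strength *= strength_factor
        length *= length_factor
    if fast:
        strength /= strength_factor
        length /= length_factor
    return strength, length
```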
[0016]
In step SP115, it is determined whether or not processing has been completed up to the last phrase. If the process is not completed, “NO” is determined, and the process proceeds to step SP117. In step SP117, the phrase is updated, and the process returns to step SP105. On the other hand, if the processing is completed up to the last phrase, “YES” is determined, and this routine ends.
[0017]
The audio data is then synthesized based on the calculated breath strength, breath length, and breath timing. That is, a breath sound is added to the audio data using the breath strength and breath length information.
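Adding the breath sound to the audio data can be pictured as mixing a short breath waveform into the silent gap at the computed start position. The sketch below assumes both signals are plain lists of samples at the same sample rate; `mix_breath` is a hypothetical helper name.

```python
def mix_breath(audio, breath, start_index):
    """Return a copy of `audio` with `breath` added starting at `start_index`.

    Samples falling outside the audio are dropped. Both inputs are assumed
    to share one sample rate; the original list is left unmodified.
    """
    out = list(audio)
    for i, sample in enumerate(breath):
        j = start_index + i
        if 0 <= j < len(out):
            out[j] += sample
    return out
```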
[0018]
(2) Operation with Sequence Data
The operation on sequence data (lyric data and melody data) embedded in MIDI system exclusive messages or the like will be described below, again with reference to FIG. 3.
In step SP101, the sequence data is read via a MIDI interface, a USB interface, or a flexible disk. Here, the lyric data is composed of a plurality of voice sequence data items and the like, each corresponding to one phoneme of the lyrics. Each voice sequence data item is in turn composed of index data that specifies formant data.
[0019]
A specific example of such sequence data is disclosed in Patent Document 1 described above, and the sequence data includes information specifying the note-on timing of each phoneme. However, in Patent Document 1, after note-on occurs for one phoneme, it is a principle that the sounding state of the one phoneme continues until note-on occurs for the next phoneme. Information for directly specifying the note-off timing is not included. On the other hand, in the present embodiment, in order to specify the note-off timing for each phoneme, duration data for specifying the pronunciation time of each phoneme is included. Therefore, the note-off timing of each phoneme is equal to the timing obtained by adding duration data to the corresponding note-on timing.
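The relationship between note-on, duration, and note-off described above can be sketched with a small record type. The class name `Phoneme`, its fields, and the use of integer ticks are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Phoneme:
    """One phoneme event with its note-on time and duration (in ticks, assumed)."""
    symbol: str
    note_on: int
    duration: int

    @property
    def note_off(self) -> int:
        # Note-off timing = note-on timing + duration data.
        return self.note_on + self.duration
```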
[0020]
Next, when the process proceeds to step SP103, "phrases" are detected based on the note-on and note-off timings determined above. That is, each section in which the time from the note-off timing of any phoneme to the note-on timing of the next phoneme is equal to or less than a predetermined time is judged to constitute a "phrase". Next, when the process proceeds to step SP105, the intervals between these phrases are detected as "phrase intervals". The subsequent processing is the same as for audio data: the breath strength, breath length, breath timing, and so on of each breath sound to be inserted are determined based on each phrase interval, phrase length, phrase strength, and so on. When the processing of this routine ends, breath event data for generating these breath sounds is added to the sequence data. If the sound source has a large number of breath sounds, tone color selection information (program change) determined by the breath strength and breath length, together with note-on information and the like, may be inserted into the audio file.
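For sequence data, phrase detection reduces to merging consecutive notes whose silent gap does not exceed a limit. The sketch below assumes notes are given as `(note_on, note_off)` pairs sorted by note-on time; `group_phrases` and `max_gap` are hypothetical names.

```python
def group_phrases(notes, max_gap):
    """Merge notes into phrases when the gap to the next note is <= max_gap.

    `notes` is a list of (note_on, note_off) pairs sorted by note_on
    (an assumed representation). Returns a list of (start, end) phrases.
    """
    phrases = []
    for on, off in notes:
        if phrases and on - phrases[-1][1] <= max_gap:
            # Gap is short enough: extend the current phrase.
            phrases[-1] = (phrases[-1][0], max(phrases[-1][1], off))
        else:
            # Gap is too long (or this is the first note): start a new phrase.
            phrases.append((on, off))
    return phrases
```

The phrase interval of step SP105 is then the difference between one phrase's end and the next phrase's start.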
[0021]
With the above operation, natural breath sounds are inserted automatically according to the phrase length, phrase strength, and phrase interval, so the user can produce audio data and sequence data without having to think about breaths.
[0022]
3. Modifications
The present invention is not limited to the embodiment described above. For example, the following modifications are possible, and all of them fall within the scope of the present invention.
(1) In the above embodiment, the phrase strength is defined as the average level, but it may instead be defined as the maximum level or the mean-square level.
(2) In the above embodiment, the breath length is set to a large value within the phrase interval and the breath strength is varied. Alternatively, the breath strength may be fixed to a predetermined value and the breath length varied. However, when the calculated breath length would be longer than the phrase interval, the breath strength is exceptionally set to a value larger than the predetermined value.
(3) In the above embodiment, the breath waveform is shaped so that it reaches its maximum at the center of the waveform and falls off symmetrically toward the breath start time and the breath end time (see the solid line in FIG. 2(b)), but a rectangular breath waveform (see FIG. 2(c)) may also be used. When the breath length is short, a waveform that reaches the breath maximum early and then decreases gradually (see the broken line in FIG. 2(b)) may be used.
(4) In the above embodiment, the entire audio data file is read and breath sound data is inserted as necessary. Alternatively, the audio data may be read in real time, and an audio data file with the breath sound data inserted may be generated at the point when the following phrase has been read.
(5) In the above embodiment, the function of executing the breath sound synthesis method is realized by the program stored in the ROM 60, but the same function can also be realized by, for example, an application program running on a personal computer. This application program, or an audio file obtained by executing the breath sound synthesis method, may be stored on a storage medium such as a CD-ROM or a flexible disk and distributed, or may be distributed through a telecommunication line.
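Modification (1) above allows the phrase strength to be defined in more than one way. The sketch below shows the three definitions side by side; the function name, the `method` strings, and the use of a plain list of levels are assumptions for illustration.

```python
def phrase_strength(levels, method="mean"):
    """Phrase strength under the definitions named in modification (1).

    "mean" is the embodiment's default (average level); "max" and
    "mean_square" are the alternatives. The selector strings are hypothetical.
    """
    if method == "mean":
        return sum(levels) / len(levels)
    if method == "max":
        return max(levels)
    if method == "mean_square":
        return sum(x * x for x in levels) / len(levels)
    raise ValueError(f"unknown method: {method}")
```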
[0023]
[Effects of the invention]
As described above, according to the present invention,
A breath sound or a breath event is inserted between the first sound generation period and the second sound generation period.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a breath sound synthesizer according to an embodiment of the present invention.
FIG. 2 is a waveform diagram for real-time reproduction of an audio file and an explanatory diagram thereof.
FIG. 3 is a diagram showing a flowchart.
[Explanation of symbols]
10 ... Liquid crystal display panel, 20 ... I/O control unit, 30 ... FDD control unit, 40 ... HDD control unit, 50 ... CPU, 60 ... ROM, 70 ... RAM, 90 ... Bus line, 100 ... Breath sound synthesizer

Claims

A breath sound synthesis method comprising:
a detection step of detecting, from audio data or sequence data, first and second sound generation periods that occur in succession, each being a continuous sound generation period; and
a breath sound insertion step of inserting a breath sound, or a breath event for generating the breath sound, into the audio data or the sequence data at a timing after the end of the first sound generation period and before the start of the second sound generation period,
wherein the waveform of the breath sound is set such that the breath amount increases as the continuous waveform intensity of the second sound generation period increases, and increases as the continuous waveform length increases,
the breath length, which is the length of the breath sound, is set within the period from the end of the first sound generation period to the start of the second sound generation period, excluding a predetermined interval from the time at which the breath sound ends until the second sound generation period starts,
the breath strength, which is the intensity of the breath sound, is calculated so as to increase as the breath amount increases and to decrease as the breath length increases, and
the start timing of the breath sound is calculated based on the breath length and the predetermined interval.

The breath sound synthesis method according to claim 1, further comprising a breath sound insertion determination step of determining whether to insert the breath sound or the breath event according to the value of the breath amount, which is determined from the continuous waveform intensity and the continuous waveform length of the second sound generation period.

A breath sound synthesizing apparatus that executes the breath sound synthesizing method according to claim 1.

A program for causing a processing device to execute the breath sound synthesis method according to claim 1.