JP4432834B2 - Singing composition device and singing composition program

Info

Publication number: JP4432834B2
Application number: JP2005157758A
Authority: JP
Inventors: 秀紀劔持; ボナダジョルディ
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2005-05-30
Filing date: 2005-05-30
Publication date: 2010-03-17
Anticipated expiration: 2025-05-30
Also published as: JP2006330615A

Description

The present invention relates to a singing synthesis device and a singing synthesis program for synthesizing singing sounds.

Various singing synthesis devices have been proposed that store lyric data and note data, read out the lyric data as the note data is read out, and sing the lyrics by sounding the phonemes corresponding to the lyric data. Such devices are expected to synthesize natural, human-like singing. In human singing, the pitch does not change stepwise as written in the note data; it follows a continuous curve as the voice moves from the utterance for one note to the utterance for the next. This behavior is considered one of the factors that make singing sound human. For this reason, ways of giving the pitch of the synthesized singing sound human-like variation have long been studied. For example, Patent Document 1 proposes a device that, when synthesizing the singing sound for two consecutive notes, connects the pitches of the two notes with a straight line, obtains a pitch curve with overshoot portions before and after this line, and synthesizes a singing sound whose pitch follows that curve.
Patent Document 1: JP 2003-323188 A

However, in the singing synthesis device disclosed in Patent Document 1, the pitch of the synthesized singing sound does not change once it has reached the pitch of the note. The same timbre therefore continues, and the result tends to be an artificial singing voice resembling a buzzer.

The present invention has been made in view of the circumstances described above. Its object is to provide a singing synthesis device and a singing synthesis program that can vary the pitch even in the section after the pitch of the synthesized singing sound has reached the pitch of the note, and can thereby produce a natural singing sound.

The present invention provides a singing synthesis device comprising: control point setting means for setting, for each note constituting a song, control points that serve as passing points of the pitch trajectory of the singing sound, in accordance with data defining the relative positions of three types of control points that serve as passing points of the pitch trajectory within the section of one note; dispersion means for generating pseudo-random numbers and dispersing the times of the control points set by the control point setting means in accordance with the generated pseudo-random numbers; pitch data generation means for obtaining, in accordance with a calculation method defined for each section between two adjacent types of control points, a trajectory that passes through each control point set by the control point setting means, generating pitch data that varies the pitch along that trajectory, and storing the pitch data in storage means; and output means for reading the pitch data from the storage means and outputting it to a sound source that synthesizes the singing sound. The invention also provides a singing synthesis program that causes a computer to function as this singing synthesis device.
According to this invention, control points are set for each note, and the pitch of the synthesized singing sound can be varied along a trajectory passing through those control points. The pitch can therefore be varied even in the section after it has reached the pitch of the note, and a natural singing sound can be obtained.
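The dispersion means can be illustrated with a short Python sketch. The text states only that pseudo-random numbers are used to disperse the control-point times; the uniform distribution, the `max_jitter` bound, the clamping to the note boundaries, and the function name `disperse_times` are assumptions made here for illustration, not details taken from the patent.

```python
import random


def disperse_times(times, max_jitter, note_on, note_off, seed=None):
    """Shift each control-point time by a pseudo-random offset.

    Assumptions (not from the source): offsets are drawn uniformly from
    [-max_jitter, +max_jitter], shifted times are clamped to the note,
    and the result is sorted so the control points keep their order.
    """
    rng = random.Random(seed)  # seeded generator gives repeatable output
    jittered = []
    for t in times:
        shifted = t + rng.uniform(-max_jitter, max_jitter)
        jittered.append(min(max(shifted, note_on), note_off))
    return sorted(jittered)


if __name__ == "__main__":
    # Nominal times for control points A, B, C within a one-second note.
    print(disperse_times([0.1, 0.5, 0.9], 0.05, 0.0, 1.0, seed=1))
```

Passing a fixed `seed` makes the dispersion reproducible, which is convenient when the same song should render identically on each run.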

Embodiments of the present invention will be described below with reference to the drawings.
<First Embodiment>
FIG. 1 is a block diagram showing the configuration of a singing synthesis device according to the first embodiment of the present invention. This device is a computer, such as a personal computer, that is capable of synthesizing and outputting audio and on which a singing synthesis program has been installed. In FIG. 1, the CPU 1 is the control center that controls each part of the device. The ROM 2 is a read-only memory that stores control programs, such as a loader, for controlling the basic operation of the device. The display unit 3 displays the operating state of the device, input data, messages to the operator, and the like. The operation unit 4 is the means by which the device receives commands and other information from the user, and consists of operators such as a keyboard and a mouse. The interface group 5 consists of a network interface for data communication with other devices over a network, drivers for exchanging data with external storage media such as magnetic disks and CD-ROMs, and the like. The HDD (hard disk drive) 6 is a non-volatile storage device for storing information such as programs and databases. The RAM 7 is a volatile memory used by the CPU 1 as a work area. In accordance with instructions given via the operation unit 4, the CPU 1 loads a program from the HDD 6 into the RAM 7 and executes it.

The formant sound source 8 is a sound source that synthesizes singing sounds under the control of the CPU 1. It is given formant center frequency data FFreq, formant level data FLevel, and formant shape data FShape as information specifying the formants of the singing sound, together with pitch data PITCH specifying its pitch. The formant sound source 8 synthesizes a singing sound having the formants and pitch specified by these data and outputs it from the sound system 9.

The information stored in the HDD 6 includes a song editing program 61, song data 62, a phoneme database 63, and a singing synthesis program 64. The song data 62 consists of note data representing the series of notes that make up a song and lyric data representing the lyrics to be voiced with those notes; it is edited song by song and stored in the HDD 6. The song editing program 61 is executed by the CPU 1 to edit song data. In a preferred aspect, the song editing program 61 displays on the display unit 3 a GUI (graphical user interface) consisting of an image of a piano keyboard. By operating the operation unit 4, the user can select the image of a desired key on the displayed keyboard and enter the lyrics to be voiced with that note. In this way the song editing program 61 receives from the user, via the operation unit 4, information on each note and the lyrics to be voiced with it, and stores the note data and lyric data for each note in the HDD 6 as song data 62.

The note data for one note contains information indicating the onset time, the pitch, and the length of the note. The lyric data defines, note by note, the lyrics to be sounded with each note. The song data 62 arranges the note data and lyric data for the individual notes in time series, in order of occurrence from the start of the song. Within the song data 62, the note data and lyric data are associated with each other note by note.

The singing synthesis program 64 causes the CPU 1 to execute processing that makes the formant sound source 8 synthesize singing sounds in accordance with the song data 62. In a preferred aspect, the singing synthesis program 64 and the song editing program 61 are downloaded, for example from a site on the Internet, via an appropriate member of the interface group 5 and installed in the HDD 6. In another aspect, the singing synthesis program 64 and the like are distributed stored on a computer-readable storage medium such as a CD-ROM or MD. In that case the programs are read from the storage medium via an appropriate member of the interface group 5 and installed in the HDD 6.
The phoneme database 63 is a collection of groups of phoneme data referenced by the singing synthesis program 64. A group of phoneme data is prepared for each type of voice to be produced, for example a male voice, a female voice, or a particular singer. When singing is synthesized by the singing synthesis program 64, the user can operate the operation unit 4 to select the group of phoneme data to use according to the desired voice quality. The phoneme data in each group are parameters for formant synthesis of the individual phonemes. For each phoneme they consist of formant shape data FShape specifying the shape of each formant used to produce that phoneme, formant center frequency data FFreq specifying the center frequency of each formant, and formant level data FLevel specifying the output level of each formant.

FIG. 2 shows the processing performed by the singing synthesis program 64. The program is broadly divided into a singing synthesis score generation process 64A and a singing synthesis score output process 64B. The score generation process 64A generates a singing synthesis score 70, consisting of a pitch data track 71 and a phoneme data track 72, in a work area in the RAM 7 (see FIG. 1), based on the song data 62 specified by the user and the phoneme data of the group the user has specified in the phoneme database 63. The pitch data track 71 is a data stream in which pitch data PITCH, which specifies the pitch of the singing sound, is arranged in time series according to the time elapsed from the start of the song. The phoneme data track 72 is a data stream in which the formant shape data FShape, formant center frequency data FFreq, and formant level data FLevel that characterize the formants of the singing sound are arranged in time series in the same way.

To generate the pitch data track 71, the singing synthesis score generation process 64A includes a control point setting process 641 and a pitch data generation process 642. In this embodiment, to make the singing sound more expressive, the pitch of the singing sound after it has reached the pitch of the note is varied along a trajectory, and the formant sound source 8 is made to generate a singing sound with this variation of pitch over time. To vary the pitch in this way, the control point setting process 641 defines, for each note, target passing points for the pitch trajectory of the synthesized singing sound: normally three control points A, B, and C, and in exceptional cases only two or one of them. Control point A is the first target passing point after the pitch of the singing sound has reached the pitch of the note, control point B is the second, and control point C is the last.

To define these control points for each note, this embodiment prepares data on control points A, B, and C in advance. The data for control points A and B contain information indicating the elapsed time from the start of the note (note-on timing) to control point A or B, and a cent value indicating the difference between the pitch of control point A or B and the pitch of the note as determined by equal temperament. The data for control point C contain information indicating the remaining time from control point C to the end of the note (note-off timing), and a cent value indicating the difference between the pitch of control point C and the pitch of the note as determined by equal temperament. For each note, the control point setting process 641 determines the timing and pitch of control points A, B, and C from the note data for that note and these control point data.
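The placement of control points A, B, and C from the prepared data can be sketched in Python as follows. The class names, the `TEMPLATE` values, the use of MIDI note numbers for the equal-tempered pitch, and the A4 = 440 Hz reference are assumptions chosen for illustration; the source specifies only that A and B are timed from note-on, C is timed back from note-off, and each carries a cent offset from the note pitch.

```python
from dataclasses import dataclass


@dataclass
class Note:
    onset: float     # note-on time, seconds
    duration: float  # note length, seconds
    midi_pitch: int  # equal-tempered pitch as a MIDI note number (assumption)


@dataclass
class ControlPoint:
    label: str
    time: float   # absolute time, seconds
    cents: float  # pitch in cents above MIDI note 0 (100 cents per semitone)


# Hypothetical template: (time offset in seconds, pitch offset in cents).
# A and B are measured forward from note-on; C is measured back from note-off.
TEMPLATE = {"A": (0.05, 10.0), "B": (0.20, -15.0), "C": (0.05, 10.0)}


def place_control_points(note, template=TEMPLATE):
    """Return control points A, B, C for one note."""
    base = note.midi_pitch * 100.0
    note_off = note.onset + note.duration
    (a_dt, a_c), (b_dt, b_c), (c_dt, c_c) = (template[k] for k in "ABC")
    return [
        ControlPoint("A", note.onset + a_dt, base + a_c),
        ControlPoint("B", note.onset + b_dt, base + b_c),
        ControlPoint("C", note_off - c_dt, base + c_c),
    ]


def cents_to_hz(cents):
    """Convert cents above MIDI note 0 to Hz, taking MIDI 69 (A4) as 440 Hz."""
    return 440.0 * 2.0 ** ((cents - 6900.0) / 1200.0)


if __name__ == "__main__":
    for cp in place_control_points(Note(onset=1.0, duration=1.0, midi_pitch=69)):
        print(cp.label, round(cp.time, 3), round(cents_to_hz(cp.cents), 2))
```

The example template places A and C slightly above the note pitch and B below it, which mirrors the arrangement the description gives for FIG. 3(b).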

The pitch data generation process 642 obtains a pitch trajectory passing through these control points A, B, and C, generates pitch data that varies the pitch along this trajectory, and stores the pitch data in the pitch data track 71. In this embodiment, functions for obtaining the pitch trajectory of each section are defined in advance for the section between control points A and B, the section between control points B and C, and the section between control point C and control point A of the next note. The pitch data generation process 642 uses these functions to obtain a pitch trajectory that passes through each control point.

To generate the phoneme data track 72, the singing synthesis score generation process 64A includes a phoneme extraction process 643 and a phoneme data generation process 644. The phoneme extraction process 643 sequentially extracts phonemes from the lyrics indicated by the lyric data 62B. The phoneme data generation process 644 reads, from the group specified by the user among the phoneme data in the phoneme database 63, the phoneme data for synthesizing the phonemes extracted by the phoneme extraction process 643, specifically the formant shape data FShape, formant center frequency data FFreq, and formant level data FLevel, and stores them in the phoneme data track 72.

The singing synthesis score output process 64B plays back the pitch data track 71 and the phoneme data track 72 generated by the score generation process 64A in synchronization, and supplies the pitch data PITCH and the phoneme data (formant shape data FShape, formant center frequency data FFreq, and formant level data FLevel) to the formant sound source 8. More specifically, the score output process 64B reads the pitch data PITCH and phoneme data for each note from the pitch data track 71 and the phoneme data track 72 as the song progresses and supplies them to the formant sound source 8. The timing at which each item of phoneme data is supplied to the formant sound source 8 is controlled so that a syllable consisting only of a vowel is output from the formant sound source 8 at the note-on timing of the note, and, for a syllable consisting of a consonant and a vowel, the vowel portion is output at the note-on timing of the note.
The above is an outline of the processing performed by the singing synthesis program. To avoid repetition, the details of the program, in particular of the control point setting process 641 and the pitch data generation process 642, are given in the description of the operation of this embodiment.

Next, the operation of this embodiment will be described. Only the singing synthesis operation that characterizes this embodiment is described below; other operations are omitted. When a predetermined instruction is given via the operation unit 4, the CPU 1 loads the singing synthesis program 64 from the HDD 6 into the RAM 7 and executes it. During execution of the program, when the song data 62 to be synthesized and the group of phoneme data to be used are specified through the operation unit 4 and the user gives an instruction to start synthesis, execution of the score generation process 64A starts first, and execution of the score output process 64B starts slightly later.

In the score generation process 64A, the pitch data track 71 is generated from the note data 62A by executing the control point setting process 641 and the pitch data generation process 642, and the phoneme data track 72 is generated from the lyric data 62B by executing the phoneme extraction process 643 and the phoneme data generation process 644.

In the score output process 64B, the pitch data track 71 and the phoneme data track 72 generated by the score generation process 64A are played back in synchronization, and the pitch data PITCH and the phoneme data are supplied to the formant sound source 8. As a result, the formant sound source 8 successively synthesizes singing sounds that have the formants characterized by the phoneme data and the pitch specified by the pitch data PITCH, and these are output from the sound system 9.

In this embodiment, the control point setting process 641 and the pitch data generation process 642 are executed as the means of controlling the pitch of the singing sound synthesized by the formant sound source 8. The way in which that pitch varies is therefore specific to this embodiment, as described below.

FIGS. 3(a) to 3(c) show examples of the execution of the control point setting process 641 and the pitch data generation process 642. FIG. 3(a) illustrates the content of the note data 62A. The horizontal axis is time and the vertical axis is the pitch of the notes. As the figure shows, the pitch of the series of notes N0 to N4 represented by the note data 62A changes stepwise. The control point setting process 641 sets control points for these notes N0 to N4, and the pitch data generation process 642 obtains a pitch trajectory passing through each control point and generates pitch data that varies the pitch along this trajectory. FIG. 3(b) shows control points A, B, and C set for note N1 and the pitch trajectory determined between them. In the example of FIG. 3(b), the pitches of control points A and C are slightly higher than the pitch of the note, and the pitch of control point B is lower than the pitch of the note.

In the control point setting process 641, three control points A, B, and C are normally set for one note. If the length of the note is shorter than a first threshold, however, the last control point C is dropped and only control points A and B are set. If the length of the note is also shorter than a second threshold, which is shorter than the first threshold, so that the note does not warrant two control points, only control point A is set for that note. FIG. 3(c) shows an example in which three control points A, B, and C are set for the long notes N1 and N4, two control points A and B are set for note N3, whose length is shorter than the first threshold, and one control point A is set for note N2, whose length is shorter than the second threshold.
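The threshold rule above can be written as a small Python function. The function name and the threshold values in the usage example are hypothetical; the source gives no numeric thresholds, only that the second threshold is shorter than the first.

```python
def control_point_labels(note_length, first_threshold, second_threshold):
    """Return the control points to set for a note of the given length.

    Follows the rule in the text: A, B, C normally; A and B when the note
    is shorter than the first threshold; A alone when it is also shorter
    than the (smaller) second threshold.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be shorter than first_threshold")
    if note_length < second_threshold:
        return ["A"]
    if note_length < first_threshold:
        return ["A", "B"]
    return ["A", "B", "C"]


if __name__ == "__main__":
    # Hypothetical thresholds of 0.5 s and 0.2 s.
    for length in (1.0, 0.3, 0.1):
        print(length, control_point_labels(length, 0.5, 0.2))
```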

In the pitch data generation process 642, a method of calculating the pitch trajectory is defined for each section: between control points A and B, between control points B and C, and between the last control point (normally control point C, exceptionally control point B or A) and control point A of the following note. The trajectory of each section is calculated by the corresponding method. Various forms are possible for the calculation between control points. In FIG. 3(b), a first form, in which the straight line connecting control points A and B is the pitch trajectory, is shown by a broken line, and a second form, in which a gentle valley connecting control points A and B is the pitch trajectory, is shown by a solid line.

In the first form, the pitch p at time t on the trajectory between control points A and B can be obtained, for example, from the following equation.
$$p = p_A + \frac{p_B - p_A}{t_B - t_A}\,(t - t_A) \tag{1}$$
Here, pA is the pitch of control point A, pB is the pitch of control point B, tA is the time of control point A, and tB is the time of control point B.

In the second form, the pitch p on the trajectory between control points A and B can be obtained from the following equation.
$$p = p_A + \frac{p_B - p_A}{t_B - t_A}\,(t - t_A) - v_1 \sin^2\!\left(\frac{\pi\,(t - t_A)}{t_B - t_A}\right) \tag{2}$$
Here, v1 is a parameter representing the depth of the valley.
The same applies to the pitch trajectory between control points B and C, which may be either a straight line or a curve connecting the control points.
In this embodiment, to match the behavior of pitch in actual human singing, the trajectory between control points A and B is a valley and the trajectory between control points B and C is a straight line.
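As a minimal Python sketch, equations (1) and (2) translate directly into two functions. The function and argument names are chosen here for illustration, and pitches are assumed to be expressed in cents so that the valley depth `v1` is in the same unit.

```python
import math


def linear_pitch(t, tA, pA, tB, pB):
    """Equation (1): straight line from control point A to control point B."""
    return pA + (pB - pA) / (tB - tA) * (t - tA)


def valley_pitch(t, tA, pA, tB, pB, v1):
    """Equation (2): the line of equation (1) minus a sin^2 valley of depth v1."""
    phase = math.pi * (t - tA) / (tB - tA)
    return linear_pitch(t, tA, pA, tB, pB) - v1 * math.sin(phase) ** 2


if __name__ == "__main__":
    # Sample the valley between A (t=0.0, 6910 cents) and B (t=0.4, 6885 cents).
    for i in range(5):
        t = 0.1 * i
        print(round(t, 1), round(valley_pitch(t, 0.0, 6910.0, 0.4, 6885.0, 20.0), 2))
```

Because the sin² term vanishes at both ends of the section, the valley trajectory still passes through control points A and B, and it lies exactly `v1` below the straight line at the midpoint.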

The pitch trajectory at a note transition is a straight line or curve connecting the last control point of the preceding note (normally control point C) and the first control point A of the following note. In the example of FIG. 3(c), the curve connecting the last control point C of the preceding note N0 and the first control point A of the following note N1 is used as the pitch trajectory at the transition from note N0 to note N1. Observation of pitch movement in actual singing shows that the pitch trajectory at a note transition is often bent, as shown by the solid line in FIG. 4. In a preferred aspect, the pitch data generation process 642 therefore obtains the trajectory of the section between the last control point of the preceding note (normally control point C, exceptionally control point B or A) and the first control point of the following note as follows.

Let t be the time, tC the time of the last control point of the preceding note, pC the pitch of that control point, tA the time of the first control point A of the following note, and pA the pitch of that control point. The pitch p at the note transition is obtained from equation (3) in the section t ≤ (tA + tC)/2 and from equation (4) in the section t > (tA + tC)/2.
$$p = \frac{p_A + p_C}{2} - \frac{p_A - p_C}{2}\left(1 - \frac{2\,(t - t_C)}{t_A - t_C}\right)^{\alpha} \tag{3}$$
$$p = \frac{p_A + p_C}{2} + \frac{p_A - p_C}{2}\left(\frac{2\,(t - t_C)}{t_A - t_C} - 1\right)^{\alpha} \tag{4}$$

In equations (3) and (4), α is a parameter for adjusting how strongly the pitch change is bent. When α is 1, the trajectory between the control points is a straight line, as shown by the broken line in FIG. 4. When α is a positive number smaller than 1, as shown by the solid line in FIG. 4, the trajectory in the first half, t ≤ (tA + tC)/2, is a curve bent downward, and the trajectory in the second half, t > (tA + tC)/2, is a curve bent upward. In a preferred aspect, this curve is used as the pitch trajectory at the note transition. Conversely, when α is a positive number larger than 1, the trajectory in the first half is a curve bent upward and the trajectory in the second half is a curve bent downward.
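Equations (3) and (4) can be sketched in Python as a single function. The function name and the normalized time `u` are introduced here for illustration. Branching on `u <= 0.5` is equivalent to the condition t ≤ (tA + tC)/2 when tA > tC, and it keeps the base of the power non-negative for times inside the section; this is an implementation choice, not something the source prescribes.

```python
def transition_pitch(t, tC, pC, tA, pA, alpha):
    """Pitch at a note transition, per equations (3) and (4).

    (tC, pC) is the last control point of the preceding note and
    (tA, pA) is the first control point A of the following note.
    """
    mid_pitch = (pA + pC) / 2.0
    half_step = (pA - pC) / 2.0
    u = (t - tC) / (tA - tC)  # 0 at the preceding point, 1 at the following one
    if u <= 0.5:
        return mid_pitch - half_step * (1.0 - 2.0 * u) ** alpha  # equation (3)
    return mid_pitch + half_step * (2.0 * u - 1.0) ** alpha      # equation (4)


if __name__ == "__main__":
    # Rising transition of 200 cents over one second, sampled at quarter points.
    for alpha in (1.0, 0.5):
        samples = [transition_pitch(0.25 * i, 0.0, 6000.0, 1.0, 6200.0, alpha)
                   for i in range(5)]
        print(alpha, [round(p, 1) for p in samples])
```

For a rising transition with `alpha = 0.5`, the sampled pitch lies below the straight line in the first half and above it in the second half, which corresponds to the bent solid-line shape described for FIG. 4.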

The pitch trajectory at a note transition may also be given a valley like the one applied to the trajectory between control points A and B described above (see FIG. 3B). With this mode, singing synthesis can reproduce the phenomenon in which, when the pitch rises, the singing voice first dips and then rises, and, when the pitch falls, the singing voice first drops too far and then comes back up.

In actual singing, a special pitch movement often occurs at the start of singing, that is, in the part corresponding to the first note or to a note that follows a rest of at least a predetermined length. To reproduce this phenomenon, when there is no note before the target note, or when a rest of at least the predetermined length precedes the target note, the control point setting process 641 of this embodiment places one or more additional control points (control points P and Q in FIG. 5) before the first control point A of the target note, as shown in FIG. 5. The pitch data generation process 642 then obtains a trajectory connecting these additional control points and the first control point A of the target note, and generates pitch data that changes the pitch along this trajectory. The trajectory between control points may be a straight line, as indicated by the broken line, or a curve, as indicated by the solid line. Because no phoneme data can be generated from the lyrics for the period at the start of singing in which no lyrics exist, the phoneme data generation process 644 generates phoneme data that transitions from silence to the phoneme corresponding to the first lyric and stores it in the phoneme data track 72.

In the example shown in FIG. 5, control point P is placed a predetermined time before the note-on timing of the first note, and its pitch is lower by a predetermined amount than the pitch pS corresponding to the first note. Control point Q is placed close to the note-on timing of the first note; its pitch is higher than that of control point P and closer to that of control point P than to that of control point A. In this case, the pitch of the synthesized singing sound at the start of singing begins to rise gradually at the time of control point P, before the note-on timing of the first note, and, from the time of control point Q near the note-on timing, rises along a steeper slope up to the pitch of control point A. This mode thus makes it possible to reproduce the special nuance of the start of singing in singing synthesis.
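The placement of control points P and Q in FIG. 5 can be sketched as follows. This is an illustrative reading that uses the straight-line (broken-line) trajectory; the function names and all default values (`lead_time`, `drop`, `q_offset`, `q_fraction`) are hypothetical, since the text only says "predetermined".

```python
from bisect import bisect_right


def onset_control_points(note_on, p_s, point_a,
                         lead_time=0.2, drop=2.0, q_offset=0.02, q_fraction=0.25):
    """Return [P, Q, A] as (time, pitch) pairs for the start of singing.

    Assumed parameters: P lies `lead_time` seconds before note-on and `drop`
    below the note pitch p_s; Q lies `q_offset` seconds before note-on at a
    fraction `q_fraction` of the way from P's pitch to A's pitch.
    """
    t_a, p_a = point_a
    if not 0.0 < q_fraction < 0.5:
        raise ValueError("q_fraction must keep Q closer to P than to A")
    p_time, p_pitch = note_on - lead_time, p_s - drop
    q_time = note_on - q_offset
    if not p_time < q_time < t_a or p_a <= p_pitch:
        raise ValueError("control points must rise in time order P, Q, A")
    q_pitch = p_pitch + q_fraction * (p_a - p_pitch)
    return [(p_time, p_pitch), (q_time, q_pitch), (t_a, p_a)]


def piecewise_linear(points, t):
    """Pitch at time t on the straight-line trajectory through the points."""
    times = [pt[0] for pt in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t) - 1       # index of the segment containing t
    (t0, p0), (t1, p1) = points[i], points[i + 1]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)
```

With the defaults above, the segment from Q to A is steeper than the segment from P to Q, which corresponds to the gradual rise followed by a steeper slope described for FIG. 5.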

At the start of singing, placing the first control point A of the first sung note at a position different from the usual one, in addition to placing the additional control points P and Q, also helps reproduce the special nuance of the start of singing. For example, at the start of singing it often takes some time for the singing voice to reach the correct pitch of the note. To reproduce this, in a preferred mode the pitch of control point A is specially set lower than the pitch corresponding to the note for the first note of the start of singing. This makes it possible to reproduce a more natural singing style in singing synthesis.

Similarly, in actual singing, a special pitch movement, such as the pitch sagging at the very end, often occurs at the end of singing, that is, in the part corresponding to the last note of the song or to a note that precedes a rest of at least a predetermined length. To reproduce this phenomenon, when there is no note after the target note, or when a rest of at least the predetermined length follows the target note, the control point setting process 641 of this embodiment places one or more additional control points (control points R and S in FIG. 6) after the last control point C of the target note, as shown in FIG. 6. The pitch data generation process 642 then obtains a trajectory connecting these additional control points and the last control point C of the target note, and generates pitch data that changes the pitch along this trajectory. The trajectory between control points may be a straight line, as indicated by the broken line, or a curve, as indicated by the solid line. As phoneme data, the phoneme uttered for the last note at the end of singing is simply carried over and stored in the phoneme data track 72.

In the example shown in FIG. 6, control point S is placed a predetermined time after the note-off timing of the last note, and its pitch is lower by a predetermined amount than the pitch pE corresponding to the last note. Control point R is placed close to the note-off timing of the last note; its pitch is higher than that of control point S and closer to that of control point S than to that of the last control point C of the last note. In this case, the pitch of the synthesized singing sound at the end of singing falls from the time of the last control point C of the last note to the time of control point R near the note-off timing, and thereafter falls along a gentler slope down to the pitch of control point S. This mode thus makes it possible to reproduce the special nuance of the end of singing in singing synthesis.
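The conditions that trigger the additional control points at the start and the end of singing (no neighbouring note, or a rest of at least a predetermined length) can be expressed compactly. The sketch below is only an illustration; the `Note` type, the function name, and the 0.5-second threshold `min_rest` are assumptions, as the text does not give a concrete value.

```python
from typing import NamedTuple


class Note(NamedTuple):
    on: float     # note-on time in seconds
    off: float    # note-off time in seconds
    pitch: float  # pitch of the note


def phrase_boundaries(notes, min_rest=0.5):
    """For each note, return (starts_singing, ends_singing).

    A note starts singing if it is the first note or follows a rest of at
    least `min_rest` seconds; it ends singing if it is the last note or is
    followed by such a rest. `notes` must be sorted by time.
    """
    flags = []
    for i, note in enumerate(notes):
        starts = i == 0 or note.on - notes[i - 1].off >= min_rest
        ends = i == len(notes) - 1 or notes[i + 1].on - note.off >= min_rest
        flags.append((starts, ends))
    return flags
```

A note flagged as a start would receive control points such as P and Q before its control point A, and a note flagged as an end would receive control points such as R and S after its control point C.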

As described above, in this embodiment one to three control points, which serve as passing points of the pitch trajectory of the singing sound, are set for each note indicated by the note data; a trajectory passing through the control points is obtained according to a trajectory calculation method defined for each section between two adjacent types of control points; and the pitch of the synthesized singing sound is changed along this trajectory. The pitch of the synthesized singing sound can therefore be varied even in the section after the pitch corresponding to the note has been reached, so that natural singing close to human singing can be reproduced. In addition, at a note transition, the pitch trajectory of the section between the last control point of the preceding note and the first control point of the following note can be given a desired variation, so that natural singing can be reproduced at note transitions. Furthermore, additional control points are placed before the first note at the start of singing and after the last note at the end of singing, and the pitch of the synthesized singing sound is changed along the trajectory passing through these control points, so that the special nuances of the start and the end of singing can be reproduced in singing synthesis.

<Second Embodiment>
Next, a second embodiment of the present invention will be described with reference to FIG. 7. This embodiment is a modification of the first embodiment, in which a pseudo-random number generation process 645 is added to the singing synthesis score generation process 64A of the first embodiment. In this embodiment, before the control point setting process 641 sets a control point, the pseudo-random number generation process 645 generates a pseudo-random number. This pseudo-random number is then used to disperse the time or pitch of each control point away from the position determined by the data indicating the relative position of the control point.

In this embodiment, a pseudo-random number is generated each time the control point being set advances, for example from control points A, B, and C of one note to control points A, B, and C of the next note, and this pseudo-random number gives a fluctuation to the time or pitch of each control point. The positions of the control points are therefore dispersed, the manner in which the pitch changes is no longer uniform, and a natural singing sound free of monotony can be synthesized.

Further, in addition to the pseudo-random number generation process 645, this embodiment provides multiplication processes M1 to M7, which change the range of the pseudo-random numbers used for dispersion for each type of control point and thereby adjust the degree of dispersion of the control points. The degree of dispersion can be set for each type of control point by operating the operation unit 4. The user can therefore adjust the degree of dispersion for each type of control point and synthesize a singing sound that suits his or her taste.
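The combination of pseudo-random number generation and per-type multiplication can be sketched as below. This is a minimal illustration under stated assumptions: the function name `disperse`, the use of a uniform distribution on [-1, 1], and the representation of each multiplication process as a per-type scale factor are all choices made here, not details given in the text.

```python
import random


def disperse(control_points, scales, rng):
    """Jitter control points with pseudo-random numbers scaled per type.

    control_points: list of (kind, time, pitch), e.g. ("A", 1.0, 60.0).
    scales: dict mapping kind -> (time_scale, pitch_scale); these play the
            role assumed here for the multiplication processes. Kinds that
            are missing from the dict are left unchanged.
    rng: a random.Random instance (seeded for reproducible results).
    """
    dispersed = []
    for kind, t, p in control_points:
        time_scale, pitch_scale = scales.get(kind, (0.0, 0.0))
        # One pseudo-random draw per quantity, multiplied by the scale
        # chosen for this type of control point.
        dt = rng.uniform(-1.0, 1.0) * time_scale
        dp = rng.uniform(-1.0, 1.0) * pitch_scale
        dispersed.append((kind, t + dt, p + dp))
    return dispersed
```

Setting a scale to zero switches dispersion off for that type of control point. A practical version would presumably also need to keep the dispersed control points in time order, which this sketch does not enforce.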

The following modification of this embodiment is conceivable: the degree of dispersion of the control points of the first note of the song and of notes following a long rest is made adjustable by operating the operation unit 4, independently of the degree of dispersion of the control points of the other notes. This mode prevents the singing style at the start of singing from becoming uniform, so that a natural singing sound can be synthesized.

<Third Embodiment>
Next, a third embodiment of the present invention will be described with reference to FIG. 8. This embodiment is also a modification of the first embodiment, in which a parameter database 65 is stored in the HDD 6 of the first embodiment. For each of various singing styles such as "normal", "sharp", and "smooth", the parameter database 65 defines a set of parameters that determine the pitch trajectory: the depth v1 of the valley between control points A and B, the parameter α that determines how strongly the pitch change at a note transition is bent, parameters that determine the position of each control point relative to the note, the function of the curve connecting the control points, and so on. When a singing style is selected by operating the operation unit 4, the parameter set corresponding to that style is read from the parameter database 65 and given to the singing synthesis score generation process 64A. The singing synthesis score generation process 64A then sets the control points and calculates the pitch trajectory connecting them on the basis of the parameter set read from the parameter database 65.
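A parameter database of this kind can be pictured as a mapping from a style name to a parameter set. The sketch below only illustrates the idea: the field names, the `select_style` function, and every numeric value are placeholders invented for the example and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleParams:
    valley_depth: float  # depth v1 of the valley between control points A and B
    alpha: float         # bend of the pitch change at a note transition
    a_offset: float      # assumed: delay of control point A after note-on, in seconds
    curve: str           # assumed: name of the curve function between control points


# Placeholder values only; the text does not disclose concrete numbers.
PARAMETER_DB = {
    "normal": StyleParams(valley_depth=0.3, alpha=0.6, a_offset=0.05, curve="power"),
    "sharp": StyleParams(valley_depth=0.5, alpha=0.4, a_offset=0.02, curve="power"),
    "smooth": StyleParams(valley_depth=0.1, alpha=0.9, a_offset=0.08, curve="power"),
}


def select_style(name):
    """Return the parameter set for the singing style chosen by the user."""
    try:
        return PARAMETER_DB[name]
    except KeyError:
        raise ValueError(f"unknown singing style: {name!r}") from None
```

The user only names a style; the score generation step would then read fields such as `alpha` from the returned set when it places control points and computes trajectories.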

According to this embodiment, the user only has to select a preferred singing style, without knowing the meaning of each parameter; the parameters for that singing style are then selected automatically and used for singing synthesis. The user can therefore easily obtain a synthesized singing sound in the preferred singing style.

<Other Embodiments>
The first to third embodiments of the present invention have been described above, but various other embodiments are conceivable. Examples are as follows.
(1) In the first embodiment, the data defining the relative positions of control points A, B, and C and the various parameters that determine the trajectories connecting the control points may be made adjustable by the user through operation of the operation unit 4.
(2) The sound source that synthesizes the singing sound need not be a formant sound source. For example, the singing synthesis apparatus may use a sound source that has a memory storing waveform sample data of various phonemes, or a memory storing the spectral envelopes of the harmonic components and the spectra of the non-harmonic components of various phonemes, and that reads the sample data of a designated phoneme from this memory to synthesize the singing sound.
(3) The pitch trajectory obtained by the pitch data generation process 642 may be displayed on the display unit 3 in response to a request given by operating the operation unit 4. The apparatus may also be configured so that a user who has checked this display can correct the position of a desired control point of a desired note by operating the operation unit 4.

FIG. 1 is a block diagram showing the configuration of a singing synthesis apparatus according to a first embodiment of the present invention.
FIG. 2 is a diagram showing the content of the singing synthesis process in the embodiment.
FIG. 3 is a diagram showing an execution example of the control point setting process and the pitch data generation process in the embodiment.
FIG. 4 is a waveform diagram illustrating how the pitch changes at a note transition in the embodiment.
FIG. 5 is a waveform diagram illustrating how the pitch changes at the start of singing in the embodiment.
FIG. 6 is a waveform diagram illustrating how the pitch changes at the end of singing in the embodiment.
FIG. 7 is a block diagram showing the configuration of the characteristic part of a singing synthesis apparatus according to a second embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of the characteristic part of a singing synthesis apparatus according to a third embodiment of the present invention.

Explanation of symbols

1: CPU, 4: operation unit, 6: HDD, 7: RAM, 8: formant sound source, 63: phoneme database, 64: singing synthesis program, 70: singing synthesis score, 71: pitch data track, 72: phoneme data track, 62A: note data, 641: control point setting process, 642: pitch data generation process, 64A: singing synthesis score generation process, 64B: singing synthesis score output process.

Claims

A singing synthesis apparatus comprising:
control point setting means for setting, for each note constituting a song, control points that serve as passing points of the pitch trajectory of a singing sound, in accordance with data defining the relative positions, within a one-note interval, of three types of control points that serve as passing points of the pitch trajectory of the singing sound;
dispersion means for generating pseudo-random numbers and dispersing the times of the control points set by the control point setting means in accordance with the generated pseudo-random numbers;
pitch data generation means for obtaining a trajectory passing through each control point set by the control point setting means, generating pitch data for changing the pitch along the trajectory, and storing the pitch data in storage means; and
output means for reading the pitch data from the storage means and outputting it to a sound source for the singing sound.

The singing synthesis apparatus according to claim 1, wherein the pitch data generation means obtains the trajectory passing through each control point set by the control point setting means in accordance with a calculation method defined for each section between two adjacent control points.

The singing synthesis apparatus given in the clause, wherein, when the target note is the first note or when a rest of at least a predetermined length precedes the note, the control point setting means sets one or more additional control points before the first control point of the note, and the pitch data generation means obtains a trajectory connecting the additional control points and the first control point of the note.

The singing synthesis apparatus according to claim 3, wherein, when the target note is the first note or when a rest of at least a predetermined length precedes the note, the control point setting means sets the first control point of the note at a position different from the position determined according to the data defining the relative positions.

The singing synthesis apparatus according to claim 3, further comprising adjusting means for adjusting the degree of dispersion of the control points of the first note of the song and of notes following a long rest, independently of the degree of dispersion of the control points of the other notes.

The singing synthesis apparatus according to any one of claims 1 to 5, wherein, when the target note is the last note or when a rest of at least a predetermined length follows the note, the control point setting means sets one or more additional control points after the last control point of the note, and the pitch data generation means obtains a trajectory connecting the additional control points and the last control point of the note.

The singing synthesis apparatus according to any one of claims 1 to 6, further comprising:
operation means; and
means for changing, in accordance with an operation on the operation means, the data defining the relative positions of the three types of control points.

The singing synthesis apparatus according to any one of claims 2 to 6, further comprising:
operation means; and
means for changing, in accordance with an operation on the operation means, the data defining the relative positions of the three types of control points or the method of calculating the trajectory of the section between the two adjacent types of control points.

A singing synthesis program that causes a computer to execute:
a control point setting process of setting, for each note constituting a song, at least one type of control point serving as a passing point of the pitch trajectory of a singing sound, in accordance with data indicating the positions, relative to the note, defined for three types of control points that serve as passing points of the pitch trajectory of the singing sound;
a dispersion process of generating pseudo-random numbers and dispersing the times of the control points set by the control point setting process in accordance with the generated pseudo-random numbers;
a pitch data generation process of obtaining a trajectory passing through each control point set by the control point setting process and storing, in storage means, pitch data for changing the pitch along the trajectory; and
an output process of reading the pitch data from the storage means and outputting it to a sound source for the singing sound.