JP4858173B2

JP4858173B2 - Singing sound synthesizer and program

Info

Publication number: JP4858173B2
Application number: JP2007000412A
Authority: JP
Inventors: 隼人大下; 秀紀劔持
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-01-05
Filing date: 2007-01-05
Publication date: 2012-01-18
Anticipated expiration: 2027-01-05
Also published as: JP2008165130A

Description

The present invention relates to a singing sound synthesizer that electrically synthesizes a singing voice, and more particularly to a technique for supporting the input of music data representing notes, lyrics, expression, and the like.

Various singing sound synthesizers have been proposed that store lyric data and note data, read out the lyric data in step with the reading of the note data, generate in advance sound data indicating the phonemes corresponding to the lyric data, and sing the lyrics by producing sound according to that sound data. A singing sound synthesizer of this type is expected to synthesize natural, human-like singing. Observation of human singing shows that it contains changes in the strength of utterance, changes in volume, breath components, and so on that are not directly represented by note data alone, and these are considered to contribute to the human quality of singing. Efforts have therefore been made to give the singing sound synthesized by a singing sound synthesizer a human-like expression (that is, changes in utterance, changes in volume, breath components, and the like). For example, Non-Patent Document 1 proposes providing a singing sound synthesizer with a user interface that prompts the user to input various parameters for giving expression to the singing sound and lets the user check each input individually, and applying to the singing sound the acoustic effects and edits indicated by the parameters input through that user interface.
Non-Patent Document 1: “Introduction to Cubase products”, [online], [retrieved January 5, 2007], Internet <URL: http://www.japan.steinberg.net/731.html>

However, in the technique disclosed in Non-Patent Document 1, each parameter that gives expression to the singing voice is displayed in its own display area, so a large display area is required when there are many parameters. Moreover, although the parameters are expressed as time-series data sharing a time axis, each is displayed in a different area, which makes it difficult to compare and cross-reference them.

The present invention has been made in view of the above problems, and its object is to provide a technique that facilitates the input or editing of a plurality of time-series data sharing a time axis.

To solve the above problems, the present invention provides a singing sound synthesizer comprising: acquisition means for acquiring score data, in which note data indicating the pitch and sounding period of each note constituting a piece of music and lyric data indicating the lyrics to be uttered on that note are arranged in order of utterance, together with a plurality of types of time-series data that share a time axis with the score data and represent the time variation of parameters indicating the expression to be given to the singing sound synthesized according to the score data; display control means for causing a display device to display, according to the score data acquired by the acquisition means, a score image showing the time variation of the pitch and of the lyrics of the piece, while synthesizing, according to each of the time-series data, an image representing graphs that show the time variation of each of the plurality of parameters along the time axis of the score image, and causing the display device to display that image alongside the score image with the time axis of each graph aligned with the time axis of the score image; and singing sound synthesis means for synthesizing the singing sound of the piece according to the score data acquired by the acquisition means and outputting it with the expression indicated by each parameter applied.

In a more preferred aspect, the display control means of the singing sound synthesizer applies, to the image of each graph showing the time variation of a parameter, transparency processing that makes its background transparent, then superimposes the graphs with their time axes aligned and displays them on the display device alongside the score image, with each time axis aligned with that of the score image. In another preferred aspect, the display control means colors the region between each graph and the time axis with a color that differs for each parameter, superimposes the graphs with their time axes aligned, converts any mutually overlapping colored regions to a color that reflects how they overlap, and displays the result on the display device. In yet another preferred aspect, the singing sound synthesizer has operation means and update means for updating the time-series data according to operations on the operation means. When one of the graphs displayed on the display device is selected via the operation means, the display control means controls the transparent display so that the image of that graph appears frontmost, and when an operation rewriting the selected graph is performed via the operation means, the update means updates the time-series data corresponding to that graph according to the rewriting operation.

To solve the above problems, the present invention also provides a program that causes a computer device to function as: acquisition means for acquiring, for each of a plurality of parameters that vary over time along the same time axis, time-series data indicating the time variation of the parameter value; and display control means for synthesizing, according to each of the time-series data, an image representing graphs that show the time variation of each of the plurality of parameters, and causing a display device to display the graphs superimposed with their time axes coinciding.

According to the present invention, the graphs represented by a plurality of time-series data sharing a time axis are displayed combined on a shared time axis, which makes it easier for the user to compare and cross-reference the time-series data.

(A: First Embodiment)
FIG. 1 is a block diagram showing a configuration example of a singing sound synthesizer 10 according to an embodiment of the present invention. As shown in FIG. 1, the singing sound synthesizer 10 includes a control unit 110, an operation unit 120, a display unit 130, an audio output unit 140, a storage unit 150, and a bus 160 that mediates data exchange between these components.

The control unit 110 is, for example, a CPU (Central Processing Unit), and serves as the control center that controls the operation of each part of the singing sound synthesizer 10. The operation unit 120 includes a keyboard provided with a plurality of operators, such as a numeric keypad, and a pointing device such as a mouse (neither is shown). The operation unit 120 passes data representing operations performed on the operators or the pointing device (for example, an identifier of the pressed operator, or data indicating the amount of movement of the pointing device) to the control unit 110. The user's operations on the operation unit 120 are thereby conveyed to the control unit 110.

The display unit 130 is, for example, a liquid crystal display and its drive circuit (neither is shown), and displays the image represented by image data passed from the control unit 110. The audio output unit 140 includes a D/A converter, an amplifier, and a speaker (none is shown). The audio output unit 140 converts audio data received from the control unit 110 into an audio signal with the D/A converter, amplifies that signal as appropriate with the amplifier, and supplies it to the speaker, which outputs the sound represented by the audio data.

As shown in FIG. 1, the storage unit 150 includes a volatile storage unit 150a and a nonvolatile storage unit 150b. The volatile storage unit 150a is, for example, a RAM (Random Access Memory), and is used as a work area by the control unit 110 operating according to various software. The nonvolatile storage unit 150b is, for example, a hard disk, and stores various data and programs.

Examples of the data stored in the nonvolatile storage unit 150b include the data held in the music database 152 and the phoneme database 154 shown in FIG. 1. The music database 152 stores, in association with a music identifier that uniquely identifies a piece of music (for example, character string data giving the title), the score data used to synthesize the singing sound of that piece and the parameters indicating the expression to be given to that singing sound. The phoneme database 154 is a collection of phoneme data referred to during singing synthesis. In the phoneme database 154, a collection of phoneme data is prepared in advance for each type of voice to be produced, for example a male voice, a female voice, or a particular singer.

The score data includes note data indicating the pitch and sounding period (for example, the utterance time and the note length) of each note constituting the piece of music identified by the associated music identifier, and lyric data indicating the lyrics to be uttered on each note. The score data arranges the note data and lyric data for the individual notes in time series, following the order of utterance from the start of the piece, and within the score data the note data and lyric data are associated note by note.
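The score data layout described above can be illustrated with a minimal Python sketch. The class name `Note`, its field names, the use of MIDI note numbers for pitch, and seconds as the time unit are all assumptions made for illustration; the patent does not specify a concrete encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Note:
    """One note of the score: note data plus the lyric tied to it."""
    pitch: int     # assumed encoding: MIDI note number
    onset: float   # utterance time, in seconds from the start of the piece
    length: float  # note length in seconds
    lyric: str     # lyric uttered on this note


def make_score(notes: List[Note]) -> List[Note]:
    """Return the notes arranged in order of utterance."""
    return sorted(notes, key=lambda n: n.onset)


# Notes may be supplied in any order; the score keeps them in time order.
score = make_score([
    Note(pitch=62, onset=0.5, length=0.5, lyric="ra"),
    Note(pitch=60, onset=0.0, length=0.5, lyric="sa"),
])
```

Keeping the lyric inside the same record as the note is one simple way to express the note-by-note association between note data and lyric data.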

The parameters, in turn, are classified into parameters indicating how a note is voiced and parameters relating to pitch bend. There are seven parameters of the first kind: “VEL”, the strength of the note's utterance; “DYN”, the volume; “BRE”, the strength of the breath component; “BRI”, the tone of the voice; “CLE”, the clearness of the voice; “GEN”, changes related to gender and age; and “POR”, the smoothness of the connection between notes. There are two pitch bend parameters: “PIT”, the presence or absence of pitch bend, and “PBS”, the maximum width of the pitch bend. Each of these nine parameters is stored in the music database 152, for each piece of music, as an array of parameter values at times on a time axis whose origin is the start of playback of the piece (that is, as time-series data sharing a time axis). This embodiment describes the case where these nine parameters are used to give human-like expression to the electrically synthesized singing voice, but any eight or fewer of the nine may be used, and one or more other parameters may be added to the nine.
In the following, the set of score data and parameters stored in the music database 152 in association with a music identifier is also called “music data”.
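As a rough illustration of parameters held as time-series data on a shared time axis, the sketch below stores each of the nine parameters as a list of (time, value) points. The step-hold lookup in `value_at`, the integer value range, and the default of 0 are assumptions; the text only says that each parameter is an array of values at times measured from the start of playback.

```python
from typing import Dict, List, Tuple

# Parameter names as given in the description.
VOICING_PARAMS = ("VEL", "DYN", "BRE", "BRI", "CLE", "GEN", "POR")
PITCH_BEND_PARAMS = ("PIT", "PBS")
ALL_PARAMS = VOICING_PARAMS + PITCH_BEND_PARAMS

# Assumed representation: (time in seconds, value), sorted by time.
TimeSeries = List[Tuple[float, int]]


def value_at(series: TimeSeries, t: float, default: int = 0) -> int:
    """Step-hold lookup (an assumption): last value set at or before t."""
    current = default
    for time, value in series:
        if time > t:
            break
        current = value
    return current


# All parameters of one piece share the same time origin.
song_params: Dict[str, TimeSeries] = {name: [] for name in ALL_PARAMS}
song_params["DYN"] = [(0.0, 64), (1.0, 90), (2.5, 40)]
```

Because every series uses the same origin, values of different parameters can be compared at any given time with the same lookup.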

Examples of the programs stored in the nonvolatile storage unit 150b include an OS program (not shown) that causes the control unit 110 to implement a so-called operating system (hereinafter “OS”), a song editing program 156, and a singing synthesis program 158. The control unit 110 operating according to the OS program is given a function of controlling the operation of each part of the singing sound synthesizer 10, a function of providing a GUI that prompts the user for various inputs, and a function of executing other programs in response to user instructions. The song editing program 156 causes the control unit 110 to display the music data input/output screen shown in FIG. 2 on the display unit 130 and to prompt the user to view and edit the music data. This screen is described in detail later. The singing synthesis program 158 causes the control unit 110 to synthesize and output the singing sound of the piece identified by a user-specified music identifier, with the voice quality and expression specified by the user, according to the music data for that piece and the contents of the phoneme database.
The processing that the control unit 110 operating according to the OS program performs under the singing synthesis program 158 is thus the same as general singing synthesis processing. The following description therefore focuses on the processing performed by the control unit 110 operating according to the song editing program 156.

When an operation instructing execution of the song editing program 156 is performed on the operation unit 120, the control unit 110 operating according to the OS program receives data indicating that operation from the operation unit 120, reads the song editing program 156 from the nonvolatile storage unit 150b into the volatile storage unit 150a as instructed, and starts executing it. As shown in FIG. 3, the control unit 110 operating according to the song editing program functions as three software modules: data access means 310, display control means 320, and operation content determination means 330.

The data access means 310 is responsible for acquiring music data from the music database 152 and for updating the contents of the music database 152. More specifically, the data access means 310 reads the music data corresponding to a music identifier input via the operation unit 120 from the music database 152 and writes it, in association with that music identifier, to a predetermined area in the volatile storage unit 150a. The music data written to this area is used by the display control means 320 when displaying the music data input/output screen (see FIG. 2) on the display unit 130. The music data held in this area is also updated as appropriate in response to editing operations the user performs on the music data input/output screen. When a registration instruction is input via the operation unit 120, the data access means 310 overwrites the corresponding music data in the music database 152 with the music data held in the predetermined area.
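The load, edit, and register flow of the data access means can be sketched as follows. The class `DataAccess`, its method names, and the dictionary-based database are hypothetical stand-ins for the music database 152 and the work area in the volatile storage unit 150a; this is an illustration of the flow, not the patented implementation.

```python
import copy


class DataAccess:
    """Hypothetical stand-in for the data access means 310."""

    def __init__(self, database: dict):
        self.database = database  # stands in for the music database 152
        self.work_area = {}       # stands in for the area in volatile storage 150a

    def load(self, song_id: str) -> dict:
        """Copy music data into the work area, keyed by music identifier."""
        self.work_area[song_id] = copy.deepcopy(self.database[song_id])
        return self.work_area[song_id]

    def register(self, song_id: str) -> None:
        """On a registration instruction, overwrite the stored music data."""
        self.database[song_id] = copy.deepcopy(self.work_area[song_id])


db = {"demo": {"score": [], "params": {"DYN": [(0.0, 64)]}}}
access = DataAccess(db)
working = access.load("demo")
working["params"]["DYN"].append((1.0, 90))      # an edit touches only the work area
points_before_register = len(db["demo"]["params"]["DYN"])
access.register("demo")
points_after_register = len(db["demo"]["params"]["DYN"])
```

The deep copy keeps edits isolated in the work area until the user explicitly registers them, which mirrors the overwrite-on-registration behavior described above.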

The display control means 320 displays the music data input/output screen shown in FIG. 2 on the display unit 130 according to the music data written to the predetermined area in the volatile storage unit 150a by the data access means 310. As shown in FIG. 2, the music data input/output screen is divided into three input/output areas: a score data input/output area 210, a parameter selection area 220, and a parameter input/output area 230.

The score data input/output area 210 is an input/output area for displaying an image that shows the time variation of the pitch and of the lyrics of the singing sound of the piece identified by a music identifier, according to the score data included in the music data stored in association with that identifier in the predetermined area of the volatile storage unit 150a. In this area, the display control means 320 displays a score image in which, on a coordinate plane whose vertical axis indicates pitch and whose horizontal axis indicates time, a bar-shaped image whose length corresponds to the note length indicated by each item of note data is placed at the coordinates corresponding to the pitch and utterance time indicated by that note data. As shown in FIG. 2, this score image is also called a piano roll image, because images imitating a piano keyboard are arranged along the vertical axis as a pitch scale. As is also clear from FIG. 2, in the score image of this embodiment a character string image showing the lyrics indicated by the corresponding lyric data is placed near the bar-shaped image for each item of note data.
A user viewing the score image displayed in the score data input/output area 210 can intuitively grasp the time variation of the pitch and lyrics of the piece from the arrangement of the bar-shaped images. This embodiment describes the case where a piano roll image is used as the score image representing the time variation of the pitch and lyrics, but a score image in which notes are written on a staff according to the note data may of course be used instead.
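The piano roll layout described above amounts to mapping each note to a rectangle on a pitch-versus-time plane. The sketch below shows one such mapping. The function name `note_to_rect`, the pixel scale, the row height, and the choice of MIDI note numbers with the highest pitch at the top are assumptions for illustration only.

```python
from typing import Tuple


def note_to_rect(pitch: int, onset: float, length: float,
                 px_per_sec: float = 100.0, row_height: int = 10,
                 top_pitch: int = 127) -> Tuple[float, int, float, int]:
    """Return (x, y, width, height) of the bar for one note.

    x grows with time, y grows downward so higher pitches sit higher
    on screen, and the bar width is proportional to the note length.
    """
    x = onset * px_per_sec
    y = (top_pitch - pitch) * row_height
    width = length * px_per_sec
    return (x, y, width, row_height)


# A quarter-second note at pitch 60 starting half a second into the piece.
rect = note_to_rect(pitch=60, onset=0.5, length=0.25)
```

The same `px_per_sec` scale would be reused for the parameter graphs so that their time axis lines up with the score image.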

The parameter selection area 220 displays an image that lets the user indicate which of the nine parameters described above to display or edit. As shown in FIG. 2, this image includes check boxes with which the user indicates which of the nine parameters to display (hereinafter, display instruction boxes) and check boxes with which the user indicates which of the selected parameters to edit (hereinafter, edit instruction boxes). By operating the operation unit 120 and clicking these check boxes, the user can select the parameters to display and select one of the displayed parameters for editing. The operation content determination means 330 determines which check box the user has checked or cleared by analyzing the data passed from the operation unit 120, and passes data indicating the result to the display control means 320. Each time the display control means 320 receives data indicating the user's operation from the operation content determination means 330, it displays in the parameter input/output area 230 an image of a graph showing the time variation of the parameter designated for display.
While no parameter has yet been selected for display, the display control means 320 displays a background image, for example plain white, in the parameter input/output area 230.

More specifically, when the display control means 320 receives from the operation content determination means 330 data indicating that a display instruction box in the parameter selection area 220 has been selected, it generates, in transparent mode, image data representing a graph that shows the time variation of the corresponding parameter on the same time axis as the score image. Transparent mode here means a mode in which image data is generated with transparency processing applied (that is, processing that makes the background of the graph transparent). The display control means 320 then superimposes the image represented by the image data generated in transparent mode on the parameter input/output area 230. For example, if the display instruction boxes for the three parameters “VEL” (strength of utterance), “DYN” (volume), and “BRE” (strength of the breath component) are selected in that order, the display control means 320 superimposes the graphs showing the time variation of each parameter on the background image in the order of selection and displays them in the parameter input/output area 230, as shown in FIG. 4(A).
If these three parameters vary over time as shown in FIGS. 5(A), 5(B), and 5(C), the music data input/output screen shown in FIG. 6 is displayed on the display unit 130. As is clear from FIG. 6, because the image of each graph superimposed on the parameter input/output area 230 has undergone transparency processing, the user can grasp the time variation of all three parameters at once.
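The effect of transparent-mode layering can be sketched with a toy raster in which `None` marks a transparent pixel. The functions `render_graph` and `composite`, the grid representation, and the use of parameter names as pixel labels are all illustrative assumptions; a real implementation would use the alpha channel of an image or GUI toolkit.

```python
from typing import List, Optional

Pixel = Optional[str]        # None means "transparent"
Layer = List[List[Pixel]]    # rows of pixels, row 0 at the top


def render_graph(values: List[int], height: int, label: str) -> Layer:
    """Plot one point per time step on a fully transparent background.

    Each value must lie in 0..height-1; larger values are drawn higher.
    """
    layer: Layer = [[None] * len(values) for _ in range(height)]
    for x, v in enumerate(values):
        layer[height - 1 - v][x] = label
    return layer


def composite(base: Layer, layers: List[Layer]) -> Layer:
    """Stack layers on a copy of base; later layers are drawn on top."""
    out = [row[:] for row in base]
    for layer in layers:
        for y, row in enumerate(layer):
            for x, px in enumerate(row):
                if px is not None:   # transparent pixels leave the lower layers visible
                    out[y][x] = px
    return out


base = [["white"] * 4 for _ in range(3)]          # plain background image
vel = render_graph([0, 1, 2, 2], height=3, label="VEL")
dyn = render_graph([2, 1, 0, 2], height=3, label="DYN")
screen = composite(base, [vel, dyn])              # selection order: VEL, then DYN
```

Both curves remain visible wherever they do not coincide, and where they do coincide the later-selected graph wins, which is why a single area can show several parameters at once.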

When the display control means 320 receives from the operation content determination means 330 data indicating that an edit instruction box has been selected, it changes the line of the graph for the corresponding parameter to a dotted line and then moves that graph image to the front. The line type of the parameter designated for editing is made different from that of the other graphs so that the user can clearly identify the graph being edited. The same effect may instead be obtained by giving the line of the graph being edited a different color from the other graphs.

For example, if the edit instruction box for “VEL” (strength of utterance) is selected while the music data input/output screen shown in FIG. 6 is displayed on the display unit 130, the display control means 320 changes the line type of the graph showing the time variation of this parameter (see FIG. 5(A)) and then moves it to the front of the parameter input/output area 230, as shown in FIG. 4(B). As a result, the music data input/output screen shown in FIG. 7 is displayed on the display unit 130. In this example, the graph displayed frontmost in the parameter input/output area 230 is drawn with a different type of line from the other graphs, so the user can clearly identify the graph being edited. In addition, because each graph image displayed in the parameter input/output area 230 of the screen shown in FIG. 7 has undergone transparency processing, it becomes easy to update the time variation of the parameter being edited while referring to the time variation of the other parameters.
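The reordering that happens when a graph is chosen for editing can be sketched as a small operation on a back-to-front list of layers. `GraphLayer`, `select_for_edit`, and the rule that all other graphs revert to solid lines are assumptions made for illustration; the text only states that the edited graph becomes dotted and frontmost.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GraphLayer:
    name: str
    line_style: str = "solid"


def select_for_edit(stack: List[GraphLayer], name: str) -> List[GraphLayer]:
    """Return a new back-to-front stack with `name` dotted and frontmost.

    Other graphs keep their relative order and are drawn solid,
    assuming only one graph is edited at a time.
    """
    others = [GraphLayer(g.name, "solid") for g in stack if g.name != name]
    target = [GraphLayer(g.name, "dotted") for g in stack if g.name == name]
    return others + target


# Display order VEL, DYN, BRE (back to front), then VEL chosen for editing.
stack = [GraphLayer("VEL"), GraphLayer("DYN"), GraphLayer("BRE")]
edited = select_for_edit(stack, "VEL")
```

Returning a new list rather than mutating the old one makes it easy to redraw the area from scratch after each selection.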

As described above, with the singing sound synthesizer 10 of this embodiment, an image formed by superimposing the graphs showing the time variation of the parameters selected by the user is displayed along the time axis of the score image, together with the score image showing the time variation of the pitch of the singing sound. There is therefore no need for a separate input/output area for each parameter, the drawing area of the display unit 130 is not wasted, and it becomes easy to compare, cross-reference, and edit the time variation of the parameters represented by the multiple types of time-series data sharing a time axis. Although this embodiment describes the case where the time-series data display function characteristic of the singing sound synthesizer according to the present invention is implemented as software modules, it may of course be implemented as hardware modules.

(B: Other Embodiments)
One embodiment of the present invention has been described above, but the following modifications may of course be made to it.
(1) The embodiment described above has a single computer device execute both the music data editing process and the singing synthesis process that performs singing synthesis according to the music data. However, the music data editing process and the singing synthesis process may each be executed by a separate computer device.

(2) In the embodiment described above, the graphs showing the time changes of the parameters designated for display by the user are made transparent and then superimposed. To make the time change of each parameter easier to grasp, the line type used to draw the graph may be varied from parameter to parameter, and the line color may of course be varied as well. Alternatively, instead of distinguishing the parameters by line color or line type, the region between each graph curve and the time axis may be filled with a color that differs for each parameter or that depends on the order of selection. Specifically, the region between the graph curve and the time axis may be colored red for the first parameter selected, yellow for the second, and blue for the third. When these colored graph images are made transparent and superimposed, each colored region is converted to a colored transparent region, and colored regions that overlap one another are converted to a color that reflects how they overlap. In the example above, where a yellow region lies on top of a red region, the overlapping part is converted to a yellowish orange; where a blue region lies on top of a yellow region, the overlapping part is converted to a bluish green. This conversion lets the user tell the stacking order of the parameters from the hue of the overlapping part. Existing image processing algorithms can be used to determine the hue of the overlapping part and to convert the colors.
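The patent leaves the choice of color-mixing algorithm open. As one possible illustration, the Python sketch below mixes hues on an artist's red-yellow-blue (RYB) color wheel and weights the result toward the foreground region. The RYB wheel, the hue angles in `RYB_HUE`, the `mix_hue` function, and the default weight of 0.6 are all assumptions made for this example; they are not taken from the patent, and a plain RGB alpha blend would not reproduce the yellow-plus-blue-gives-green example in the text.

```python
# Hue angles in degrees on an assumed RYB wheel with evenly spaced primaries:
# red 0, orange 60, yellow 120, green 180, blue 240, purple 300.
RYB_HUE = {"red": 0.0, "yellow": 120.0, "blue": 240.0}


def mix_hue(back: float, front: float, front_weight: float = 0.6) -> float:
    """Mix two hue angles along the shorter arc of the wheel.

    `front_weight` above 0.5 pulls the result toward the foreground hue,
    so swapping the stacking order changes the resulting hue.
    """
    if not 0.0 <= front_weight <= 1.0:
        raise ValueError("front_weight must be between 0 and 1")
    # Signed angular distance from back to front, in the range [-180, 180).
    delta = (front - back + 180.0) % 360.0 - 180.0
    return (back + front_weight * delta) % 360.0


yellow_over_red = mix_hue(RYB_HUE["red"], RYB_HUE["yellow"])
blue_over_yellow = mix_hue(RYB_HUE["yellow"], RYB_HUE["blue"])
red_over_yellow = mix_hue(RYB_HUE["yellow"], RYB_HUE["red"])
print(yellow_over_red, blue_over_yellow, red_over_yellow)
```

With the assumed weight, yellow over red lands at about 72 degrees, slightly on the yellow side of orange, and blue over yellow lands at about 192 degrees, slightly on the blue side of green. Reversing the order (red over yellow) gives about 48 degrees, on the red side of orange, which is how the overlap hue can reveal the stacking order.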

(3) In the embodiment described above, the score image showing the time change of the pitch of the music from its beginning to its end and the images showing the time changes of the various parameters are displayed on the display unit 130 all at once. However, when the music editing program is run on a computer device with a smaller drawing area than a desktop personal computer, such as a notebook personal computer or a PDA (Personal Digital Assistant), the score image for an entire piece may not fit on the screen at once, or may fit only after being reduced to the point of illegibility. When the drawing area is too small to show the score image of an entire piece at once, a scroll bar and scroll buttons may be provided, as on a typical word processor or spreadsheet screen, and the score image may be displayed in sections, one predetermined time range and one predetermined pitch range at a time. When the score image is displayed one time range at a time and the time range is switched with a scroll button or the like, however, the time range of the graphs showing the time changes of the parameters must be switched in synchronization with the time range of the score image.
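One simple way to keep the score and the parameter graphs in step while scrolling is to derive both views from a single shared time window. The Python sketch below illustrates that idea only; the `TimeWindow` class, the `visible` function, the tick-based time unit, and the sample data are hypothetical and are not described in the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TimeWindow:
    """Time range currently shown on screen, in ticks (assumed unit)."""
    start: int
    length: int

    def scroll(self, delta: int, song_length: int) -> None:
        """Shift the window by `delta` ticks, clamped to the song bounds."""
        max_start = max(0, song_length - self.length)
        self.start = min(max(self.start + delta, 0), max_start)

    def contains(self, tick: int) -> bool:
        return self.start <= tick < self.start + self.length


def visible(points: List[Tuple[int, float]], window: TimeWindow) -> List[Tuple[int, float]]:
    """Keep only the (tick, value) points that fall inside the window."""
    return [(tick, value) for tick, value in points if window.contains(tick)]


# Made-up sample data: notes as (onset tick, pitch), "VEL" as (tick, value).
notes = [(0, 60.0), (2, 62.0), (5, 64.0), (8, 65.0)]
vel = [(0, 64.0), (3, 80.0), (4, 90.0), (9, 70.0)]

window = TimeWindow(start=0, length=4)
window.scroll(4, song_length=10)
# Both views are filtered through the same window, so they cannot drift apart.
print(visible(notes, window), visible(vel, window))
```

Because the score view and every parameter view read the same `TimeWindow` object, a single scroll operation switches all of them to the same time range.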

(4) In the embodiment described above, the time-series data display control program is installed in the singing sound synthesizer 10 in advance. This program causes the control unit 110 to combine, into a single image, the graphs representing the time changes of the parameters expressed by plural types of time-series data sharing a time axis, and to display that image on the display unit 130. However, the time-series data display control program may instead be distributed on a computer-readable recording medium such as a CD-ROM (Compact Disk-Read Only Memory) or a DVD (Digital Versatile Disk), or distributed by download over a telecommunication line such as the Internet. Installing a time-series data display control program distributed in this way on a general-purpose computer device gives that device the same functions as the parameter display control module described above.

FIG. 1 is a block diagram showing a configuration example of the singing sound synthesizer 10 according to one embodiment of the present invention.
FIG. 2 is a diagram showing the screen layout of the music data input/output screen displayed on the display unit 130 of the singing sound synthesizer 10.
FIG. 3 is a diagram showing an example of the software modules realized when the control unit 110 of the singing sound synthesizer 10 operates according to the music editing program.
FIG. 4 is a diagram for explaining the image composition processing that the control unit 110 executes according to the music editing program.
FIG. 5 is an example of a graph showing the time change of a parameter.
FIG. 6 is a diagram showing an example of the music data input/output screen displayed on the display unit 130 of the singing sound synthesizer 10.
FIG. 7 is a diagram showing an example of the music data input/output screen displayed on the display unit 130 of the singing sound synthesizer 10.

Explanation of symbols

10 ... singing sound synthesizer, 110 ... control unit, 120 ... operation unit, 130 ... display unit, 140 ... audio output unit, 150 ... storage unit, 150a ... volatile storage unit, 150b ... nonvolatile storage unit, 160 ... bus.

Claims

A singing sound synthesizer comprising: acquisition means for acquiring musical score data, in which note data indicating the pitch and sounding period of each note constituting a piece of music and lyrics data indicating the lyrics to be uttered with each note are arranged in order of utterance, and a plurality of types of time-series data that share a time axis with the musical score data and that each represent the time change of a parameter indicating an expression to be given to the singing sound synthesized according to the musical score data;
display control means for displaying on a display device, in accordance with the musical score data acquired by the acquisition means, a score image showing the time change of the pitch of the music and the time change of the lyrics, for synthesizing, in accordance with each of the time-series data, an image representing graphs that show the time change of each of the plurality of parameters along the time axis of the score image, and for displaying that image on the display device alongside the score image with the time axis of each graph aligned with the time axis of the score image; and
singing sound synthesis means for synthesizing the singing sound of the music according to the musical score data acquired by the acquisition means, imparting the expression indicated by each parameter, and outputting the result,
wherein the display control means,
in each of the graph images showing the time change of each parameter, colors the region between the graph curve representing the time change of the parameter and the time axis with a color that differs for each parameter, then superimposes the graph images with their time axes aligned, converts colored regions that overlap one another to a color obtained by mixing the colors of the respective regions and tinted toward the color of the frontmost region, and displays the result on the display device.

The singing sound synthesizer according to claim 1, wherein the display control means draws the graph displayed in the foreground with a line of a type different from that of the other graphs.

The singing sound synthesizer according to claim 1 or 2, further comprising operation means and update means for updating the time-series data in accordance with an operation on the operation means,
wherein the display control means, when one of the plurality of graphs displayed on the display device is selected via the operation means, performs display control so that the image of that graph appears in the foreground, and
the update means, when an operation to redraw the selected graph is performed via the operation means, updates the time-series data corresponding to that graph in accordance with the redrawing operation.

A program that causes a computer to function as:
acquisition means for acquiring musical score data, in which note data indicating the pitch and sounding period of each note constituting a piece of music and lyrics data indicating the lyrics to be uttered with each note are arranged in order of utterance, and a plurality of types of time-series data that share a time axis with the musical score data and that each represent the time change of a parameter indicating an expression to be given to the singing sound synthesized according to the musical score data;
display control means for displaying on a display device, in accordance with the musical score data acquired by the acquisition means, a score image showing the time change of the pitch of the music and the time change of the lyrics, for synthesizing, in accordance with each of the time-series data, an image representing graphs that show the time change of each of the plurality of parameters along the time axis of the score image, and for displaying that image on the display device alongside the score image with the time axis of each graph aligned with the time axis of the score image, the display control means, in each of the graph images showing the time change of each parameter, coloring the region between the graph curve representing the time change of the parameter and the time axis with a color that differs for each parameter, then superimposing the graph images with their time axes aligned, converting colored regions that overlap one another to a color obtained by mixing the colors of the respective regions and tinted toward the color of the frontmost region, and displaying the result on the display device; and
singing sound synthesis means for synthesizing the singing sound of the music according to the musical score data acquired by the acquisition means, imparting the expression indicated by each parameter, and outputting the result.