JP2013045082A - Musical piece generation device

Info

Publication number: JP2013045082A
Application number: JP2011185125A
Authority: JP
Inventors: Tsutomu Miyaki; 強宮木; Yoshikazu Kimura; 義一木村; Kenichiro Yamaguchi; 健一郎山口; Koichi Fujimoto; 功一藤本
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2011-08-26
Filing date: 2011-08-26
Publication date: 2013-03-04
Anticipated expiration: 2031-08-26
Also published as: CN102956224B; CN102956224A; TWI471853B; TW201312547A; JP5974436B2

Abstract

PROBLEM TO BE SOLVED: To generate musically natural and varied melodies. SOLUTION: A musical piece generation device includes: a first acquisition unit 42 that generates section designation data GA corresponding to one of a plurality of pieces of section designation data DA, each designating a time series of sounding sections; a second acquisition unit 44 that selects one of a plurality of pieces of pitch designation data DB, each designating a time series of pitches, as pitch designation data GB; and a melody generation unit that generates melody data designating a time series of notes, each note corresponding to a sounding section designated by the section designation data GA acquired by the first acquisition unit 42 and to the pitch that the pitch designation data GB acquired by the second acquisition unit 44 designates at the start point of that sounding section.

Description

The present invention relates to a technique for generating a musical piece (melody).

Techniques for automatically generating the melody of a musical piece (automatic composition techniques) have been proposed. For example, Patent Document 1 discloses a technique that generates a melody by assigning a pitch to each note value (sounding section) designated in time series by note value sequence data. The pitch assigned to each note value is selected at random. Patent Document 2 discloses a technique that generates a melody by changing each pitch designated in time series by melody material data in accordance with background information such as key and scale.

Japanese Patent Laid-Open No. 9-081141
Japanese Patent Laid-Open No. 9-081143

In the technique of Patent Document 1, the pitch assigned to each note value is selected at random, so it is difficult in practice to generate a musically natural melody. In the technique of Patent Document 2, the pitches designated by the melody material data are merely changed, so it is difficult to generate diverse melodies. In view of these circumstances, an object of the present invention is to generate musically natural and diverse melodies.

The means adopted by the present invention to solve the above problem are described below. To facilitate understanding, the following description notes in parentheses the correspondence between each element of the invention and the elements of the embodiments described later; this is not intended to limit the scope of the invention to the illustrated embodiments.

The musical piece generation device of the present invention comprises: first acquisition means (e.g., first acquisition unit 42) that acquires section designation data (e.g., section designation data GA) corresponding to any of a plurality of pieces of section designation data (e.g., section designation data DA), each designating a time series of sounding sections; second acquisition means (e.g., second acquisition unit 44) that acquires pitch designation data (e.g., pitch designation data GB) corresponding to any of a plurality of pieces of pitch designation data (e.g., pitch designation data DB), each designating a time series of pitches; and melody generation means (e.g., melody generation unit 522) that generates melody data (e.g., melody data DM) designating a time series of notes according to each sounding section designated by the section designation data acquired by the first acquisition means and the pitch that the pitch designation data acquired by the second acquisition means designates for a reference time point of that sounding section (e.g., its start point). In this configuration, the melody data is generated from section designation data corresponding to any of the plurality of pieces of section designation data and pitch designation data corresponding to any of the plurality of pieces of pitch designation data, so a greater variety of musical pieces can be generated than with, for example, the technique of Patent Document 2, which merely changes the pitches designated by the melody material data. Moreover, because the melody data is generated using the section designation data and the pitch designation data as templates, a more musically natural melody can be generated than with, for example, the technique of Patent Document 1, which selects pitches at random.
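The combination step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Section`, `PitchNote`, `pitch_at`, and `generate_melody` are hypothetical, times are assumed to be in beats, pitches are assumed to be MIDI note numbers, and holding the most recent pitch when the pitch designation data has no note starting exactly at the reference time is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Section:
    """One sounding section taken from section designation data GA."""
    start: float   # position on the time axis, in beats (assumed unit)
    length: float  # duration, in beats


@dataclass
class PitchNote:
    """One note taken from pitch designation data GB."""
    start: float
    length: float
    pitch: int     # MIDI note number (assumed representation)


def pitch_at(pitch_notes: List[PitchNote], t: float) -> int:
    """Pitch that GB designates at time t.

    Assumption: notes are sorted by start time, and the most recent note
    starting at or before t supplies the pitch (the first note is used if
    t precedes every note).
    """
    current = pitch_notes[0].pitch
    for note in pitch_notes:
        if note.start <= t:
            current = note.pitch
        else:
            break
    return current


def generate_melody(sections, pitch_notes) -> List[Tuple[float, float, int]]:
    """Give each sounding section the pitch found at its start point."""
    return [(s.start, s.length, pitch_at(pitch_notes, s.start)) for s in sections]


ga = [Section(0.0, 1.0), Section(1.0, 0.5), Section(2.5, 1.5)]
gb = [PitchNote(0.0, 2.0, 60), PitchNote(2.0, 2.0, 64)]
melody = generate_melody(ga, gb)
```

Here the rhythm comes entirely from `ga` and the pitch contour entirely from `gb`, which is why pairing different templates yields different melodies.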

In a preferred aspect of the present invention, the melody generation means includes: pitch extraction means (e.g., pitch extraction unit 62) that generates an assembled note sequence (e.g., assembled note sequence M) in which notes are arranged, each note having a sounding section designated by the section designation data and the pitch that the pitch designation data designates at the reference time point of that sounding section; and pitch adjustment means (e.g., pitch adjustment unit 64) that generates the melody data by adjusting the pitch of each note of the assembled note sequence generated by the pitch extraction means. In this aspect, the pitch of each note of the assembled note sequence generated from the section designation data and the pitch designation data is adjusted, so a more musically natural melody can be generated than with a configuration that does not adjust the assembled note sequence.

The content of the adjustment performed by the pitch adjustment means is arbitrary. For example, in a configuration comprising third acquisition means (e.g., third acquisition unit 46) that acquires chord progression data designating a chord progression (e.g., chord progression data corresponding to any of a plurality of pieces of chord progression data), the pitch adjustment means executes a first adjustment process (e.g., first adjustment process SB1) that changes the pitch of at least some of the notes of the assembled note sequence (e.g., notes designated as important notes or ending notes) to the pitch of a constituent tone of the chord that the chord progression data designates for the sounding section of that note. In this aspect, the pitches of at least some notes of the assembled note sequence are changed to constituent tones of the chords designated by the chord progression data, so a musical piece in which the pitch of each note transitions in a musically natural way can be generated. A configuration in which the pitch adjustment means executes a second adjustment process (e.g., second adjustment process SB2), which changes the pitch of at least some of the notes of the assembled note sequence (e.g., notes designated as passing notes) to a pitch within a predetermined range corresponding to the pitch of the immediately preceding or following note, likewise has the advantage that a musical piece in which the pitch of each note transitions in a musically natural way can be generated.
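The two adjustment processes can be illustrated with a short Python sketch. The text only says that a pitch is changed to a chord tone (first process) or to a pitch within a predetermined range of a neighboring note (second process); choosing the nearest chord tone, preferring the lower candidate on ties, clamping relative to the preceding note, and a range of two semitones are all assumptions made here for illustration. The function names are hypothetical.

```python
def nearest_chord_tone(pitch, chord_pitch_classes):
    """First adjustment (sketch): move to the closest pitch whose pitch
    class (MIDI number mod 12) belongs to the chord. Ties go downward."""
    for distance in range(12):
        for candidate in (pitch - distance, pitch + distance):
            if candidate % 12 in chord_pitch_classes:
                return candidate
    return pitch  # reached only if the chord has no tones at all


def clamp_to_neighbor(pitch, neighbor_pitch, max_interval=2):
    """Second adjustment (sketch): keep the pitch within max_interval
    semitones of the neighboring note."""
    low = neighbor_pitch - max_interval
    high = neighbor_pitch + max_interval
    return max(low, min(high, pitch))


def adjust(notes, chords, max_interval=2):
    """notes: list of (pitch, attribute); chords: one pitch-class set per note.
    Important and ending notes are snapped to chord tones; passing notes
    are kept close to the already adjusted preceding note."""
    adjusted = []
    for (pitch, attribute), chord in zip(notes, chords):
        if attribute in ("important", "ending"):
            pitch = nearest_chord_tone(pitch, chord)
        elif attribute == "passing" and adjusted:
            pitch = clamp_to_neighbor(pitch, adjusted[-1], max_interval)
        adjusted.append(pitch)
    return adjusted


c_major = {0, 4, 7}  # pitch classes of C, E, G
result = adjust(
    [(61, "important"), (70, "passing"), (66, "ending")],
    [c_major, c_major, c_major],
)
```

In this example the important note C#4 (61) moves to C4 (60), the passing note is pulled to within two semitones of it, and the ending note F#4 (66) moves to G4 (67).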

A musical piece generation device according to a preferred aspect of the present invention comprises: character string setting means (e.g., character string setting unit 34) that acquires a designated character string in which a plurality of speech units (e.g., syllables or phonemes) are arranged in time series; and speech synthesis means (e.g., speech synthesis unit 524) that generates a voice signal of a voice pronouncing the designated character string on the plurality of notes designated by the melody data generated by the melody generation means. This aspect has the advantage that a voice signal of a voice singing a musically natural and diverse melody can be generated.

In a preferred example of the configuration comprising the speech synthesis means, the first acquisition means includes: selection means (e.g., selection unit 422) that selects any of the plurality of pieces of section designation data; and lyric allocation means (e.g., lyric allocation unit 424) that associates each sounding section designated by the section designation data selected by the selection means with each speech unit of the designated character string. The speech synthesis means generates a voice signal of a voice that pronounces, in the sounding section of each of the plurality of notes designated by the melody data, the speech unit that the lyric allocation means has associated with that sounding section. This aspect has the advantage that a natural-sounding voice signal is generated in which each speech unit of the designated character string is pronounced in the sounding section of a note designated by the melody data.

In a preferred aspect of the present invention, the selection means selects, from the plurality of pieces of section designation data, section designation data with more sounding sections as the number of speech units in the designated character string set by the character string setting means increases. In this aspect, section designation data with more sounding sections is selected as the number of speech units in the designated character string increases, so each sounding section can be associated with each speech unit without strain.

In a preferred aspect of the present invention, the lyric allocation means divides the designated character string block by block so that, for each of a plurality of blocks set on the time axis, the number of sounding sections that the section designation data designates within the block approximates the number of speech units of the designated character string within the block, and associates the sounding sections with the speech units block by block. According to this aspect, sounding sections and speech units can be associated without strain in each block. Furthermore, with a configuration that divides a sounding section within a block when the block has fewer sounding sections than speech units, and adds a predetermined speech unit (e.g., a long-vowel mark) to the designated character string within a block when the block has more sounding sections than speech units, the sounding sections and the speech units can be made to correspond one to one.
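The per-block reconciliation described above (dividing a sounding section when there are too few, adding a long-vowel mark when there are too many) can be sketched as follows. The text does not say which section is divided, how it is divided, or where the added speech unit is placed; splitting the longest section into two equal halves and appending the filler at the end of the block are assumptions made for this illustration, and `match_block` is a hypothetical name.

```python
def match_block(sections, syllables, filler="ー"):
    """Return one (section, syllable) pair per sounding section in a block.

    sections:  list of (start, length) tuples inside one block
    syllables: list of speech units assigned to the same block
    """
    if not sections:
        raise ValueError("a block needs at least one sounding section")
    sections = list(sections)
    syllables = list(syllables)

    # Too few sounding sections: split the longest one in half (assumed rule).
    while len(sections) < len(syllables):
        i = max(range(len(sections)), key=lambda k: sections[k][1])
        start, length = sections[i]
        half = length / 2
        sections[i:i + 1] = [(start, half), (start + half, half)]

    # Too many sounding sections: pad the lyric with a long-vowel mark.
    while len(sections) > len(syllables):
        syllables.append(filler)

    return list(zip(sections, syllables))


split_case = match_block([(0, 2), (2, 1)], ["sa", "ku", "ra"])
pad_case = match_block([(0, 1), (1, 1), (2, 1)], ["a", "i"])
```

After either branch the two lists have equal length, so the final `zip` gives the one-to-one correspondence the paragraph describes.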

A musical piece generation device according to a preferred aspect of the present invention comprises: accompaniment sound generation means (e.g., accompaniment sound generation unit 54) that generates an accompaniment signal representing an accompaniment sound; and mixing means (e.g., mixing unit 56) that mixes the voice signal generated by the speech synthesis means with the accompaniment signal generated by the accompaniment sound generation means. In this aspect, the accompaniment signal is mixed with the voice signal, so a musically richer piece can be generated than with a configuration that generates the voice signal alone. In addition, setting the voice signal generated by the speech synthesis means and the accompaniment signal generated by the accompaniment sound generation means to the same tempo has the advantage that a musical piece in which melody and accompaniment match naturally can be generated.

The musical piece generation device according to each of the above aspects may be realized by hardware (electronic circuitry) such as a dedicated DSP (Digital Signal Processor), or by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. The program according to the present invention causes a computer to execute: a first acquisition process of acquiring section designation data corresponding to any of a plurality of pieces of section designation data, each designating a time series of sounding sections; a second acquisition process of acquiring pitch designation data corresponding to any of a plurality of pieces of pitch designation data, each designating a time series of pitches; and a melody generation process of generating melody data designating a time series of notes according to each sounding section designated by the section designation data acquired in the first acquisition process and the pitch that the pitch designation data acquired in the second acquisition process designates for the reference time point of that sounding section. This program achieves the same operation and effects as the musical piece generation device according to the present invention. The program of the present invention may be provided stored on a computer-readable recording medium and installed on a computer, or provided by distribution over a communication network and installed on a computer.

The present invention is also realized as a method for generating a musical piece. The musical piece generation method of the present invention acquires section designation data corresponding to any of a plurality of pieces of section designation data, each designating a time series of sounding sections; acquires pitch designation data corresponding to any of a plurality of pieces of pitch designation data, each designating a time series of pitches; and generates melody data designating a time series of notes according to each sounding section designated by the acquired section designation data and the pitch that the acquired pitch designation data designates for the reference time point of that sounding section. This method achieves the same operation and effects as the musical piece generation device according to the present invention.

A block diagram of a musical piece generation device according to a first embodiment of the present invention.
An explanatory diagram of section designation data.
An explanatory diagram of pitch designation data.
An explanatory diagram of the division of a designated character string.
A block diagram of the first acquisition unit.
A flowchart of the lyric allocation process.
A block diagram of the singing sound generation unit.
An explanatory diagram of the operation of the melody generation unit.
A flowchart of the adjustment process.
A block diagram of the accompaniment sound generation unit.
A block diagram of a musical piece generation device according to a second embodiment.

<First Embodiment>
FIG. 1 is a block diagram of a musical piece generation device 100A according to the first embodiment of the present invention. The musical piece generation device 100A is a signal processing device (automatic composition device) that generates a musical piece and outputs an acoustic signal V of its performance sound. The first embodiment illustrates a configuration that generates a musical piece (a song) in which an accompaniment of instrument sounds is added to the melody of a singing voice.

An input device 12 and a sound emitting device 14 are connected to the musical piece generation device 100A. The input device 12 is a device (e.g., a mouse or keyboard) that accepts instructions from a user. The sound emitting device 14 (e.g., a speaker or headphones) emits sound waves corresponding to the acoustic signal V supplied from the musical piece generation device 100A. For convenience, a D/A converter that converts the acoustic signal V from digital to analog and similar components are not shown.

The musical piece generation device 100A of the first embodiment is realized by a computer system comprising an arithmetic processing device 22 and a storage device 24. By executing a program PGM stored in the storage device 24, the arithmetic processing device 22 realizes a plurality of functions for generating a musical piece (variable setting unit 32, character string setting unit 34, first acquisition unit 42, second acquisition unit 44, third acquisition unit 46, singing sound generation unit 52, accompaniment sound generation unit 54, and mixing unit 56). Some of the functions of the arithmetic processing device 22 may also be delegated to a dedicated electronic circuit (DSP).

The storage device 24 stores the program PGM executed by the arithmetic processing device 22 and various data used by the arithmetic processing device 22. Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of multiple types of recording media may be employed as the storage device 24.

The storage device 24 of the first embodiment stores a plurality of pieces of section designation data DA, a plurality of pieces of pitch designation data DB, a plurality of pieces of chord progression data DC, and a chord table TBL. Each piece of section designation data DA, pitch designation data DB, and chord progression data DC is a template generated from a predetermined section (hereinafter "material section") of one of a plurality of existing musical pieces selected as material for generation.

As shown in FIGS. 2 and 3, in the first embodiment a predetermined number of consecutive bars (eight in this example) are extracted from each musical piece as a material section Q. All material sections Q share the same time signature (e.g., 4/4). Consequently, when the material sections Q are played back at a common tempo, their durations are equal. As shown in FIGS. 2 and 3, the material section Q is divided into four blocks QB (QB1 to QB4), each spanning two bars. However, the number of blocks QB in the material section Q and the number of bars in each block QB may be changed as desired.

As shown in FIG. 2, each piece of section designation data DA stored in the storage device 24 designates a time series of sections S in which the notes constituting the melody in the material section Q are sounded (hereinafter "sounding sections"). In other words, each piece of section designation data DA expresses the rhythm pattern of the melody in the material section Q. Each of the plurality of pieces of section designation data DA is generated from a different material section Q (material sections Q of different musical pieces, or different material sections Q within one piece). Therefore, the time-series form of the sounding sections S designated by the section designation data DA (the position on the time axis, the duration, and the number of sounding sections S) differs from one piece of section designation data DA to another.

As shown in FIG. 2, one piece of section designation data DA consists of a time series of unit data UA, each corresponding to a different sounding section S in the material section Q. Each unit data UA includes attribute information UA1 and time information UA2. The time information UA2 designates the position of the sounding section S on the time axis (e.g., the time of its start point) and its duration (note value).

The attribute information UA1 designates the musical attribute of the note in the sounding section S (its musical significance within the piece). Specifically, as shown in FIG. 2, the attribute information UA1 designates one of an important note, an ending note, and a passing note. An important note is a musically important note in the material section Q; specifically, notes with a long duration and constituent tones of the chords of the piece are classified as important notes. An ending note is the note (other than an important note) located at the end of each block QB in the material section Q. A passing note is any note other than an important note or an ending note. The attribute information UA1 of each unit data UA is set manually, for example by the provider of the musical piece generation device 100A after analyzing the content of each material section Q.
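One way to picture the unit data UA is as a small record type. The sketch below is only an illustration of the structure described above, with hypothetical names (`Attribute`, `UnitDataUA`, `sections_per_block`); it assumes times are measured in beats from the start of the material section Q and uses the 4/4, two-bars-per-block example from the text, so each block QB spans 8 beats.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Attribute(Enum):
    """Attribute information UA1."""
    IMPORTANT = "important"
    ENDING = "ending"
    PASSING = "passing"


@dataclass(frozen=True)
class UnitDataUA:
    attribute: Attribute  # UA1: musical role of the note
    start: float          # UA2: position in beats from the start of Q (assumed unit)
    length: float         # UA2: duration (note value) in beats


BEATS_PER_BLOCK = 8.0  # two bars of 4/4, as in the example of the first embodiment


def sections_per_block(da: List[UnitDataUA], n_blocks: int = 4) -> List[int]:
    """Count the sounding sections that start inside each block QB."""
    counts = [0] * n_blocks
    for unit in da:
        counts[int(unit.start // BEATS_PER_BLOCK)] += 1
    return counts


da = [
    UnitDataUA(Attribute.IMPORTANT, 0.0, 2.0),
    UnitDataUA(Attribute.PASSING, 4.0, 1.0),
    UnitDataUA(Attribute.ENDING, 8.0, 4.0),
    UnitDataUA(Attribute.ENDING, 30.0, 2.0),
]
counts = sections_per_block(da)
```

Per-block counts of this kind are what a block-wise comparison between sounding sections and syllables would operate on.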

Each piece of section designation data DA can be described as a music file in SMF (Standard MIDI File) format conforming to the MIDI (Musical Instrument Digital Interface) standard. Specifically, SMF-format time-series data is created as the section designation data DA and stored in the storage device 24; it arranges in time series event data that, for convenience, designates the attribute information UA1 as a note number value, together with timing data that designates the processing interval between event data as the time information UA2.

As shown in FIG. 3, each piece of pitch designation data DB stored in the storage device 24 designates a time series of the pitches of the notes constituting the melody in the material section Q. In other words, each piece of pitch designation data DB expresses the melody line in the material section Q. The time-series form of the pitches designated differs from one piece of pitch designation data DB to another.

As shown in FIG. 3, one piece of pitch designation data DB consists of a time series of unit data UB, each corresponding to a different note in the material section Q. Each unit data UB includes pitch information UB1 and time information UB2. The pitch information UB1 designates the pitch of the note, and the time information UB2 designates the position of the note on the time axis and its duration. Each piece of pitch designation data DB can be described as an SMF-format music file that arranges in time series event data designating the pitch information UB1 as a note number value, together with timing data designating the processing interval between event data as the time information UB2.

Each piece of chord progression data DC stored in the storage device 24 designates the chord progression (time series of chords) in the material section Q. Specifically, each piece of chord progression data DC designates a chord for each unit time into which the material section Q is divided (e.g., the duration of an eighth note). The time series of chords designated differs from one piece of chord progression data DC to another. The chord table TBL in FIG. 1 designates, for each chord that can be designated by the chord progression data DC, the constituent tones of that chord. The above are the main data stored in the storage device 24 of the first embodiment.
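The relationship between chord progression data DC and the chord table TBL can be sketched as a per-unit list plus a lookup table. The chord names, the table contents, the use of pitch classes (0 = C), and the function names below are illustrative assumptions; only the eighth-note unit time comes from the text.

```python
# Hypothetical chord table: chord name -> pitch classes of its constituent tones.
CHORD_TABLE = {
    "C": {0, 4, 7},
    "F": {5, 9, 0},
    "G": {7, 11, 2},
    "Am": {9, 0, 4},
}

UNIT = 0.5  # one eighth note, in beats of 4/4 time


def chord_at(dc, t):
    """Chord designated by DC for the unit time containing beat position t."""
    index = min(int(t // UNIT), len(dc) - 1)
    return dc[index]


def chord_tones_at(dc, t):
    """Constituent tones (pitch classes) of the chord sounding at time t."""
    return CHORD_TABLE[chord_at(dc, t)]


# One bar of C followed by one bar of G, one entry per eighth note.
dc = ["C"] * 8 + ["G"] * 8
tones = chord_tones_at(dc, 4.0)
```

Storing one chord per unit time makes the lookup for any sounding section a simple index computation, at the cost of repeating chord names that last longer than one unit.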

The variable setting unit 32 in FIG. 1 variably sets the variables applied to musical piece generation. Specifically, the variable setting unit 32 sets the tempo Xt and the musical style Xs (e.g., a genre such as rock or jazz) of the piece in accordance with instructions from the user via the input device 12. The tempo Xt is set, for example, to an arbitrary value specified by the user, and the style Xs is set, for example, to the option the user selects from a plurality of choices.

The character string setting unit 34 variably sets the character string DY of the lyrics of the musical piece (hereinafter "designated character string"). Specifically, the character string setting unit 34 sets the designated character string DY in accordance with an instruction (character input) from the user via the input device 12. As shown in FIG. 4, the designated character string DY consists of a plurality of syllables y (speech units).

The designated character string DY is divided into a character string Ya and a character string Yb. Specifically, in a configuration in which the user specifies two lines of designated character string DY, it is divided into the character string Ya before the line break and the character string Yb after the line break. The method of dividing the designated character string DY is arbitrary. In a configuration in which the user specifies a designated character string DY consisting of two sentences, it is divided into the character string Ya corresponding to the first sentence and the character string Yb corresponding to the second sentence.
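The line-break division can be illustrated in a few lines of Python. The splitting rule follows the text; the syllable segmentation is a simplifying assumption that is not taken from the source (each kana character counts as one syllable, with small kana such as ゃ attached to the preceding character), and the function names are hypothetical.

```python
SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def split_lyrics(dy):
    """Split the designated character string DY at the first line break
    into Ya (before the break) and Yb (after the break)."""
    ya, _, yb = dy.partition("\n")
    return ya, yb


def syllables(text):
    """Rough kana segmentation (assumption): one syllable per character,
    with small kana merged into the preceding syllable."""
    units = []
    for ch in text:
        if ch.isspace():
            continue
        if ch in SMALL_KANA and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


ya, yb = split_lyrics("きょうは\nはれ")
n_ya = len(syllables(ya))
n_yb = len(syllables(yb))
```

The two counts obtained this way play the role of the syllable counts of Ya and Yb that the selection of section designation data relies on.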

The first acquisition unit 42 in FIG. 1 generates section designation data GA corresponding to any of the plurality of pieces of section designation data DA stored in the storage device 24. FIG. 5 is a block diagram of the first acquisition unit 42 of the first embodiment. As shown in FIG. 5, the first acquisition unit 42 includes a selection unit 422 and a lyric allocation unit 424.

The selection unit 422 selects one of the plurality of pieces of section designation data DA stored in the storage device 24. In the first embodiment, the section designation data DA is selected according to the number of syllables y in the designated character string DY set by the character string setting unit 34. Specifically, the more syllables y the designated character string DY contains, the more sounding sections S there are in the section designation data DA that the selection unit 422 selects.

For example, the selection unit 422 calculates the index value α of the following formula (1) for each of the plurality of pieces of section designation data DA stored in the storage device 24.

$$\alpha = |N_a - N_{Ya}| + |N_b - N_{Yb}| \tag{1}$$

The first term |Na − NYa| of formula (1) is the absolute value of the difference between the number Na of sounding sections S that the section designation data DA designates within the first half of the material section Q (blocks QB1 and QB2) and the number NYa of syllables y in the front character string Ya of the designated character string DY. Similarly, the second term |Nb − NYb| of formula (1) is the absolute value of the difference between the number Nb of sounding sections S that the section designation data DA designates within the second half of the material section Q (blocks QB3 and QB4) and the number NYb of syllables y in the rear character string Yb of the designated character string DY. That is, the index value α becomes smaller as the number Na of sounding sections S in the first half of the material section Q approaches the number NYa of syllables y in the character string Ya, or as the number Nb of sounding sections S in the second half of the material section Q approaches the number NYb of syllables y in the character string Yb.

The selection unit 422 searches the storage device 24 for section designation data DA whose index value α lies within a predetermined range (for example, 5 or less), that is, section designation data DA designating a number of sounding sections S close to the number of syllables y in the designated character string DY. It then randomly selects one piece of section designation data DA from the plurality of pieces found by the search.
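The selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `SectionData` class, its field names, and the `None` return for an empty candidate pool are hypothetical, since the text does not say how the data is stored or what happens when no entry satisfies the threshold.

```python
import random
from dataclasses import dataclass


@dataclass
class SectionData:
    """Hypothetical stand-in for one piece of section designation data DA."""
    name: str
    n_first: int   # Na: sounding sections in blocks QB1 and QB2
    n_second: int  # Nb: sounding sections in blocks QB3 and QB4


def alpha(da, ny_a, ny_b):
    """Index value of formula (1): |Na - NYa| + |Nb - NYb|."""
    return abs(da.n_first - ny_a) + abs(da.n_second - ny_b)


def select_section_data(candidates, ny_a, ny_b, threshold=5, rng=random):
    """Pick one entry at random among those with alpha <= threshold."""
    pool = [da for da in candidates if alpha(da, ny_a, ny_b) <= threshold]
    if not pool:
        return None  # assumption: the source does not describe this case
    return rng.choice(pool)
```

With lyrics of 7 and 9 syllables in Ya and Yb, an entry with 8 + 8 sounding sections has α = 2 and stays in the pool, while an entry with 20 + 20 sections has α = 24 and is excluded.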

The staff division unit 424 in FIG. 5 executes a staff division process that associates each sounding section S indicated by the section designation data DA selected by the selection unit 422 with each syllable y of the designated character string DY set by the character string setting unit 34. Specifically, as shown in FIG. 4, for each of the four blocks QB (QB1 to QB4) in the material section Q, the staff division unit 424 associates the syllables y of the designated character string DY with the sounding sections S in the material section Q so as to minimize the difference between the number Ni of sounding sections S that the section designation data DA designates within the block QBi (i = 1 to 4) and the number NYi of syllables y of the designated character string DY within that block QBi.

FIG. 6 is a flowchart of the staff division process. When the staff division process starts, the staff division unit 424 divides the designated character string DY among the blocks QB in the material section Q, as shown in FIG. 4 (SA1). The division boundaries are selected from the phrase boundaries of the designated character string DY. Any known natural language processing technique, such as morphological analysis, may be used to identify the phrases of the designated character string DY. For convenience, FIG. 4 illustrates a case where the boundary between the character string Ya and the character string Yb coincides with the boundary between block QB2 and block QB3, but the boundary between the character strings Ya and Yb does not necessarily coincide with a boundary between blocks QB.

Specifically, the staff division unit 424 divides the designated character string DY at phrase boundaries so that the index value β defined by the following formula (2) is minimized.

$$\beta = |N_1 - N_{Y1}|^2 + |N_2 - N_{Y2}|^2 + |N_3 - N_{Y3}|^2 + |N_4 - N_{Y4}|^2 \tag{2}$$

As formula (2) shows, the index value β is the sum, over the blocks QB1 to QB4, of the squared difference |Ni − NYi|² between the number Ni of sounding sections S that the section designation data DA designates within block QBi and the number NYi of syllables y of the designated character string DY that fall within block QBi. That is, the staff division unit 424 partitions the designated character string DY among the blocks QB so that the difference between the number Ni of sounding sections S and the number NYi of syllables y is minimized across the blocks QB1 to QB4.
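A minimal sketch of the β-minimizing division follows, assuming the lyrics have already been split into phrases and that each phrase is represented only by its syllable count. The exhaustive search over cut positions and the possibility of empty blocks are assumptions made for illustration; the text does not specify the search method.

```python
from itertools import combinations_with_replacement


def beta(section_counts, syllable_counts):
    """Index value of formula (2): sum of squared per-block differences."""
    return sum((n - ny) ** 2 for n, ny in zip(section_counts, syllable_counts))


def split_into_blocks(phrase_syllables, section_counts):
    """Assign consecutive phrases to blocks so that beta is minimal.

    phrase_syllables: syllable count of each phrase, in lyric order.
    section_counts:   Ni for each block QBi.
    Returns (beta, cut positions, syllable count NYi per block).
    """
    k = len(section_counts)
    n = len(phrase_syllables)
    best = None
    # Every non-decreasing tuple of k-1 cut positions is one candidate division.
    for cuts in combinations_with_replacement(range(n + 1), k - 1):
        bounds = (0, *cuts, n)
        counts = [sum(phrase_syllables[bounds[i]:bounds[i + 1]]) for i in range(k)]
        score = beta(section_counts, counts)
        if best is None or score < best[0]:
            best = (score, list(cuts), counts)
    return best
```

For phrases of 3, 2, 4, 3, 2 and 2 syllables and blocks with 5, 4, 4 and 4 sounding sections, the best division places 5, 4, 3 and 4 syllables in the four blocks, giving β = 1. The remaining mismatch in the third block is what steps SA2 and SA3 resolve.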

At the stage where the designated character string DY has been divided by the above process, the number Ni of sounding sections S in each block QBi does not necessarily match the number NYi of syllables y of the designated character string DY. The staff division unit 424 therefore executes processing (SA2, SA3) to make the number Ni of sounding sections S match the number NYi of syllables y within each block QB. That is, the staff division unit 424 puts the sounding sections S and the syllables y into one-to-one correspondence within each block QB.

First, for each block QBi in which the number Ni of sounding sections S is less than the number NYi of syllables y (Ni < NYi), the staff division unit 424 divides sounding sections S as needed so that the number Ni of sounding sections S in the block QBi increases to the number NYi of syllables y (SA2). Specifically, the staff division unit 424 repeats a process of dividing the longest sounding section S in the block QBi (for example, into two equal parts) until the number Ni of sounding sections S reaches the number NYi of syllables y. The attribute (important sound / final sound / passing sound) of each sounding section S produced by a division is set to the attribute of the sounding section S before the division.

Second, for each block QBi in which the number Ni of sounding sections S exceeds the number NYi of syllables y (Ni > NYi), the staff division unit 424 inserts a predetermined adjustment syllable y into the designated character string DY as needed so that the number NYi of syllables y in the block QBi increases to the number Ni of sounding sections S (SA3). Specifically, the staff division unit 424 repeats a process of inserting the long-vowel mark "ー", which signifies continuation of the immediately preceding syllable y, into the portion of the designated character string DY within the block QBi as an adjustment syllable y, until the number NYi of syllables y reaches the number Ni of sounding sections S. The position at which the adjustment syllable y is inserted is selected, for example, at random within the portion of the designated character string DY in the block QBi.

As a result of the staff division process described above, the sounding sections S and the syllables y in each block QBi correspond one-to-one. The staff division unit 424 generates section designation data GA, which designates the time series of sounding sections S after the staff division process, and a designated character string GY, which designates the time series of syllables y after the staff division process. That is, the section designation data GA is time-series data designating the time series of sounding sections S obtained by replacing some of the sounding sections S designated by the section designation data DA selected by the selection unit 422 with the sounding sections S produced by the division in step SA2. Like the section designation data DA, it includes a plurality of unit data UA. The designated character string GY is the designated character string DY set by the character string setting unit 34 with the adjustment syllables y added in step SA3. As the above description makes clear, each sounding section S designated by the section designation data GA and each syllable y of the designated character string GY correspond one-to-one.
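Steps SA2 and SA3 can be sketched for a single block as follows. This is an illustrative sketch under stated assumptions: sounding sections are modeled as hypothetical `(duration, attribute)` tuples, ties for the longest section are broken by taking the earliest one, and the long-vowel mark is only ever inserted after an existing syllable. None of these details is fixed by the text.

```python
import random


def split_longest(sections, target):
    """SA2 sketch: halve the longest sounding section until len == target.

    sections: list of (duration, attribute) tuples for one block (assumed layout).
    Both halves inherit the attribute of the section they came from.
    """
    if not sections and target > 0:
        raise ValueError("cannot split an empty block")
    out = list(sections)
    while len(out) < target:
        i = max(range(len(out)), key=lambda j: out[j][0])  # earliest longest
        dur, attr = out[i]
        out[i:i + 1] = [(dur / 2, attr), (dur / 2, attr)]
    return out


def pad_syllables(syllables, target, rng=random):
    """SA3 sketch: insert the long-vowel mark at random positions."""
    if not syllables and target > 0:
        raise ValueError("a long-vowel mark needs a preceding syllable")
    out = list(syllables)
    while len(out) < target:
        pos = rng.randint(1, len(out))  # never before the first syllable
        out.insert(pos, "ー")
    return out
```

For example, a block holding sections of length 2.0 (important) and 1.0 (passing) that must carry four syllables ends up with sections of length 0.5, 0.5, 1.0 and 1.0, the first three marked important. The total duration of the block is unchanged by the division.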

The second acquisition unit 44 in FIG. 1 selects one of the plurality of pieces of pitch designation data DB stored in the storage device 24 as pitch designation data GB and acquires it from the storage device 24. The method of selecting the pitch designation data DB is arbitrary. For example, one of the plurality of pieces of pitch designation data DB may be selected at random as the pitch designation data GB.

The third acquisition unit 46 in FIG. 1 selects one of the plurality of pieces of chord progression data DC stored in the storage device 24 as chord progression data GC and acquires it from the storage device 24. For example, the third acquisition unit 46 randomly selects one of the plurality of pieces of chord progression data DC as the chord progression data GC.

The singing sound generation unit 52 generates an audio signal VA of a voice that sings, with the designated character string GY generated by the first acquisition unit 42 as lyrics, a melody corresponding to the section designation data GA generated by the first acquisition unit 42, the pitch designation data GB acquired by the second acquisition unit 44, and the chord progression data GC acquired by the third acquisition unit 46.

FIG. 7 is a block diagram of the singing sound generation unit 52. As shown in FIG. 7, the singing sound generation unit 52 includes a melody generation unit 522 and a speech synthesis unit 524. The melody generation unit 522 is an element that generates melody data DM using the section designation data GA, the pitch designation data GB, and the chord progression data GC as templates, and includes a pitch extraction unit 62 and a pitch adjustment unit 64.

As shown in FIG. 8, the pitch extraction unit 62 generates a time series M of notes n (hereinafter, the "organized note string"), each defined by a sounding section S designated by the section designation data GA and the pitch that the pitch designation data GB designates at the start point of that sounding section S. Specifically, the m-th note n (m is a natural number) of the organized note string M extends from the start point to the end point of the m-th sounding section S designated by the section designation data GA, and is held at the pitch that the pitch designation data GB designates at the start point of that sounding section S. For example, the first note n of the organized note string M illustrated in FIG. 8 is held, throughout the first sounding section S designated by the section designation data GA, at the pitch "C#" that the pitch designation data GB designates at the start point of that sounding section S. As the above description makes clear, each note n of the organized note string M and each sounding section S designated by the section designation data GA correspond one-to-one.
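The extraction step amounts to sampling the pitch contour of GB at the start of each sounding section of GA. The sketch below assumes hypothetical data layouts: GA as a list of `(start, end)` pairs, GB as a list of `(start, MIDI pitch)` changes sorted by time, and a fallback to the first pitch when a section begins before any pitch is defined. The actual formats are not shown in this passage.

```python
from bisect import bisect_right


def extract_notes(sections, pitch_track):
    """Build the organized note string M from GA and GB.

    sections:    [(start, end), ...] sounding sections from GA (assumed layout).
    pitch_track: [(start, midi_pitch), ...] sorted by start, from GB (assumed).
    Returns [(start, end, midi_pitch), ...], one note per sounding section.
    """
    starts = [t for t, _ in pitch_track]
    notes = []
    for start, end in sections:
        # Index of the last pitch change at or before the section start.
        i = bisect_right(starts, start) - 1
        if i < 0:
            i = 0  # assumption: before the first pitch, reuse the first pitch
        notes.append((start, end, pitch_track[i][1]))
    return notes
```

With sections (0, 1), (1, 1.5) and (1.5, 3) and a pitch track that changes to 61 (C#) at time 0, 64 at 0.75 and 66 at 2, the resulting notes carry pitches 61, 64 and 64. The pitch 66 is never used because no section starts at or after time 2.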

The pitch adjustment unit 64 in FIG. 7 adjusts the pitch of each note n of the organized note string M generated by the pitch extraction unit 62 to generate the melody data DM. FIG. 9 is a flowchart of the adjustment processing performed by the pitch adjustment unit 64. As shown in FIG. 9, the pitch adjustment unit 64 sequentially executes a first adjustment process SB1, a second adjustment process SB2, and a third adjustment process SB3. The order of the adjustment processes (SB1 to SB3) may be changed as appropriate.

The first adjustment process SB1 adjusts, according to the chord progression data GC acquired by the third acquisition unit 46, the pitch of each note n of the organized note string M that the section designation data GA designates as an important sound or a final sound. Specifically, the pitch adjustment unit 64 changes the pitch of each such note n to the pitch closest to it among the constituent tones of the chord that the chord progression data GC designates for the sounding section S of that note n (for example, the chord at the start point of the sounding section S). The constituent tones of a chord designated by the chord progression data GC are identified from a chord table TBL stored in the storage device 24.

However, the following exceptions apply in order to keep the result musically natural. First, when the pitch of a note n that is an important sound is "F" or "B" and the root of the chord that the chord progression data GC designates for that note n is "C", the pitch of the note n is not changed. That is, the pitch of the note n remains "F" or "B". Second, when the root of the chord that the chord progression data GC designates for a note n that is a final sound is "C", "F", or "G", the pitch of the note n is changed to the pentatonic tone (C, D, E, G, A) closest to that pitch.
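The first adjustment process and its two exceptions can be sketched as below. Pitches are MIDI note numbers, and `CHORD_TABLE` is a hypothetical four-entry excerpt standing in for the chord table TBL, whose real contents are not given. When two candidate tones are equally close, this sketch picks the lower one, which is an assumption.

```python
PENTATONIC = {0, 2, 4, 7, 9}  # pitch classes C, D, E, G, A

# Hypothetical excerpt of the chord table TBL: name -> (root, chord tones).
CHORD_TABLE = {
    "C": (0, {0, 4, 7}),
    "F": (5, {5, 9, 0}),
    "G": (7, {7, 11, 2}),
    "Am": (9, {9, 0, 4}),
}


def nearest_in(pitch, pitch_classes):
    """Closest MIDI pitch whose pitch class is in the set (lower wins ties)."""
    for distance in range(12):
        for candidate in (pitch - distance, pitch + distance):
            if candidate % 12 in pitch_classes:
                return candidate
    raise ValueError("empty pitch-class set")


def adjust_structural(pitch, role, chord):
    """SB1 sketch for a note whose role is 'important' or 'final'."""
    root, tones = CHORD_TABLE[chord]
    # Exception 1: an important F or B over a chord rooted on C is kept.
    if role == "important" and pitch % 12 in (5, 11) and root == 0:
        return pitch
    # Exception 2: a final sound over C, F or G roots snaps to the pentatonic.
    if role == "final" and root in (0, 5, 7):
        return nearest_in(pitch, PENTATONIC)
    return nearest_in(pitch, tones)
```

For instance, an important D (62) over a C chord moves to C (60), an important F (65) over the same chord is left alone, and a final F (65) over a C chord moves to E (64).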

The second adjustment process SB2 adjusts the pitch of each note n of the organized note string M that the section designation data GA designates as a passing sound. Specifically, the pitch adjustment unit 64 changes the pitch of each such note n to a pitch within a predetermined range that contains the pitch of the note n immediately before or after it. For example, the pitch of each passing-sound note n is set to a natural note (C, D, E, F, G, A, B) within a predetermined range (for example, a range of four semitones) whose center or end point (upper or lower limit) is the pitch of the immediately preceding or following note n.
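One possible reading of the second adjustment process is sketched below. It assumes the range is centered on the neighboring note and that, among the natural notes in that range, the one closest to the original pitch is chosen (the lower note on a tie). The text leaves the choice within the range open, so both points are assumptions.

```python
NATURALS = {0, 2, 4, 5, 7, 9, 11}  # pitch classes C, D, E, F, G, A, B


def adjust_passing(pitch, neighbor, span=4):
    """SB2 sketch: move a passing sound near its neighboring note.

    pitch:    MIDI pitch of the passing-sound note.
    neighbor: MIDI pitch of the note immediately before or after it.
    span:     width of the allowed range in semitones, centered on neighbor.
    """
    low, high = neighbor - span // 2, neighbor + span // 2
    candidates = [p for p in range(low, high + 1) if p % 12 in NATURALS]
    # Closest natural note to the original pitch; the lower one wins a tie.
    return min(candidates, key=lambda p: (abs(p - pitch), p))
```

A passing sound at A# (70) next to a C (60) is pulled down to D (62), the natural note inside the range 58 to 62 that lies closest to the original pitch.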

The third adjustment process SB3 is exception handling for cases where the chord progression data GC designates a particular chord for a note n of the organized note string M (whether an important sound, a final sound, or a passing sound). Specifically, when the pitch of a note n is "A" and the chord designated for that note n is an Fm-type chord (for example, Fm or Fm7), the pitch adjustment unit 64 changes the pitch of the note n from "A" to "G#". When the pitch of a note n is "B" and the chord designated for that note n is an Fm-type chord, a Gm-type chord, an A#-type chord, or C7, the pitch adjustment unit 64 changes the pitch of the note n from "B" to "A#".
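The chord-specific exceptions can be sketched as a small lookup. The representation of a chord as a separate root name and quality suffix is an assumption made here so that minor chords such as "Fm7" are not confused with major-seventh chords such as "Fmaj7". An "A#-type" chord is read as any chord rooted on A#.

```python
def is_minor(quality):
    """True for suffixes such as 'm' or 'm7', but not 'maj7'."""
    return quality.startswith("m") and not quality.startswith("maj")


def adjust_for_chord(pitch, root, quality):
    """SB3 sketch: lower A or B by a semitone under specific chords.

    pitch:   MIDI pitch of the note.
    root:    chord root name such as 'F', 'G', 'A#', 'C' (assumed encoding).
    quality: chord suffix such as '', 'm', 'm7', '7', 'maj7' (assumed encoding).
    """
    pitch_class = pitch % 12
    # A becomes G# under Fm-type chords.
    if pitch_class == 9 and root == "F" and is_minor(quality):
        return pitch - 1
    # B becomes A# under Fm-type, Gm-type, A#-type chords and C7.
    if pitch_class == 11 and (
        (root in ("F", "G") and is_minor(quality))
        or root == "A#"
        or (root == "C" and quality == "7")
    ):
        return pitch - 1
    return pitch
```

Both exceptions replace a note that clashes with a chord tone a semitone below it: A against the A♭/G# of an F minor chord, and B against the B♭/A# found in Gm, A# and C7 chords.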

The melody generation unit 522 (pitch adjustment unit 64) in FIG. 7 generates melody data DM designating the organized note string M after the adjustments described above. As shown in FIG. 8, the melody data DM consists of a time series of unit data UM, one for each note n of the adjusted organized note string M. Each unit data UM includes pitch information UM1 designating the pitch of the note n and time information UM2 designating the sounding section S of the note n (its position on the time axis and its duration). Specifically, the melody data DM is described as a music file in SMF format, like the pitch designation data DB. The relationship between each note n designated by the melody data DM and each syllable y of the designated character string GY remains the relationship established by the first acquisition unit 42 (staff division unit 424), in which the sounding section S of each note n and each syllable y of the designated character string GY correspond one-to-one.

The speech synthesis unit 524 in FIG. 7 generates the audio signal VA according to the melody data DM generated by the melody generation unit 522 and the designated character string GY generated by the first acquisition unit 42. The audio signal VA is an acoustic signal of the singing sound produced when each syllable y of the designated character string GY is uttered at the pitch that the melody data DM designates for the note n corresponding to that syllable y. That is, the audio signal VA corresponds to the sound of singing the organized note string M (the melody line) with the designated character string GY as lyrics. Known concatenative (unit-connection) speech synthesis, for example, is well suited to generating the audio signal VA. That is, the speech synthesis unit 524 sequentially selects the speech unit corresponding to each syllable y of the designated character string GY, adjusts each speech unit to the sounding section S and pitch of the corresponding note n designated by the melody data DM, and connects the units to one another to generate the audio signal VA. The tempo of the audio signal VA is set to the tempo Xt set by the variable setting unit 32.

The accompaniment sound generation unit 54 in FIG. 1 generates an accompaniment signal VB representing an accompaniment sound. As shown in FIG. 10, the accompaniment sound generation unit 54 includes an accompaniment data generation unit 542 and an accompaniment signal generation unit 544. The accompaniment data generation unit 542 generates accompaniment data DE according to the chord progression data GC acquired by the third acquisition unit 46. The accompaniment data DE is a music file in SMF format that designates, in time series, accompaniment sounds corresponding to the chords sequentially designated by the chord progression data GC. The type and rhythm of the accompaniment sounds designated by the accompaniment data DE are set according to the style Xs set by the variable setting unit 32. The tempo of the accompaniment data DE is set to the tempo Xt set by the variable setting unit 32. The accompaniment signal generation unit 544 generates the accompaniment signal VB by executing predetermined processing on the accompaniment data DE (for example, musical tone generation by a MIDI sound source). As the above description makes clear, the audio signal VA and the accompaniment signal VB are set to the same tempo.

The mixing unit 56 in FIG. 1 generates an acoustic signal V by mixing (taking a weighted sum of) the audio signal VA generated by the singing sound generation unit 52 and the accompaniment signal VB generated by the accompaniment sound generation unit 54. The acoustic signal V generated by the mixing unit 56 is supplied to the sound emitting device 14 and played back as sound. As the above description makes clear, the playback of the acoustic signal V is the performance of a song in which accompaniment is added to a singing voice that sings the melody indicated by the melody data DM with the designated character string GY as lyrics.
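The weighted-sum mixing can be illustrated with plain lists of samples. The weight values and the zero-padding of the shorter signal are assumptions for this sketch; the text specifies only that the mix is a weighted sum.

```python
def mix(voice, accompaniment, w_voice=0.5, w_accompaniment=0.5):
    """Weighted sum of two sample sequences (VA and VB) into V.

    The shorter signal is padded with silence so both have equal length.
    """
    n = max(len(voice), len(accompaniment))
    va = list(voice) + [0.0] * (n - len(voice))
    vb = list(accompaniment) + [0.0] * (n - len(accompaniment))
    return [w_voice * a + w_accompaniment * b for a, b in zip(va, vb)]
```

Raising `w_voice` relative to `w_accompaniment` brings the singing voice forward in the mix. Because both signals share the tempo Xt, no time alignment is needed before summing.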

As described above, in the first embodiment the melody data DM is generated using section designation data GA corresponding to one of the plurality of pieces of section designation data DA and pitch designation data GB corresponding to one of the plurality of pieces of pitch designation data DB. Therefore, even with the same section designation data GA, different pitch designation data GB yields melody data DM for a different melody, and even with the same pitch designation data GB, different section designation data GA yields melody data DM for a different melody. That is, compared with the technique of Patent Document 2, which merely changes the pitches designated by melody material data, the first embodiment has the advantage that a wide variety of music can be generated even when only a few music pieces are available as source material.

In addition, because the melody data DM is generated using as templates, for example, section designation data DA representing the rhythm pattern of an existing music piece and pitch designation data DB representing the melody line of an existing music piece, there is the further advantage that a more musically natural melody can be generated than with the technique of Patent Document 1, which selects pitches at random. That is, according to the first embodiment, melodies that are both musically natural and varied can be generated.

In the first embodiment, the pitch adjustment unit 64 adjusts the pitch of each note n of the organized note string M generated from the section designation data DA and the pitch designation data DB, so a more musically natural melody can be generated than in a configuration that does not adjust the organized note string M. Specifically, because the pitch of each note n that is an important sound or a final sound is changed to the pitch of a constituent tone of the chord designated by the chord progression data GC (first adjustment process SB1), the pitches of the notes that matter most in the melody progress in a musically natural way. There is also the advantage that a variety of melodies can be generated according to the combination of section designation data DA, pitch designation data DB, and chord progression data DC. Furthermore, because the pitch of each note n that is a passing sound is changed to a pitch within a predetermined range relative to the pitch of the immediately preceding or following note n (second adjustment process SB2), the pitches of passing sounds also progress in a musically natural way.

In the first embodiment, the staff division unit 424 associates each sounding section S designated by the section designation data GA with each syllable y of the designated character string GY, so an audio signal VA of a natural singing sound, in which the correspondence between sounding sections S and syllables y is clear, can be generated. In particular, because section designation data DA with more sounding sections S is selected as the number of syllables y in the designated character string DY increases, the sounding sections S and the syllables y can be matched without strain, compared with a configuration in which the number of sounding sections S and the number of syllables y may differ greatly. In addition, because the designated character string DY is partitioned among the blocks QB so that the difference between the number Ni of sounding sections S and the number NYi of syllables y is minimized across the blocks QB1 to QB4 before the sounding sections S and syllables y are associated, the sounding sections S and the syllables y can be matched without strain in each of the blocks QB1 to QB4. Moreover, because the division of sounding sections S (SA2) and the insertion of syllables y into the designated character string DY (SA3) are executed so that the sounding sections S and the syllables y correspond one-to-one, an audio signal VA of a natural singing sound, in which exactly one syllable y is assigned to each note n of the melody, can be generated.

In the first embodiment, the accompaniment signal VB generated by the accompaniment sound generation unit 54 is added to the audio signal VA, so musically richer music can be generated than in a configuration that plays back the audio signal VA alone. In particular, because the accompaniment signal VB is generated using chord progression data GC selected from a plurality of pieces of chord progression data DC, more varied music can be generated than in a configuration in which the accompaniment is fixed to a single type. Moreover, because the accompaniment data DE is generated according to the style Xs specified by the user, music that matches the user's intentions and preferences can be generated. Also, because the audio signal VA and the accompaniment signal VB are set to a common tempo Xt, music in which the accompaniment and the melody fit together naturally can be generated.

<Second Embodiment>
A second embodiment of the present invention is described below. The first embodiment illustrated a configuration in which the music generation device 100A is realized as a single device. In the second embodiment, a plurality of server devices that can communicate with one another cooperate to realize a music generation device 100B with the same functions as the music generation device 100A of the first embodiment. In each aspect illustrated below, elements whose operation and function are equivalent to those of the first embodiment are given the reference signs used above, and their detailed description is omitted as appropriate.

FIG. 11 is a block diagram of the music generation device 100B according to the second embodiment. As shown in FIG. 11, the music generation device 100B is a communication system (music generation system) that generates an acoustic signal V and provides it to a terminal device 10. It comprises a management server device 70, a melody generation server device 72, a voice signal synthesis server device 74, an accompaniment generation server device 76, and an accompaniment signal synthesis server device 78. The server devices can communicate with one another via a communication network such as the Internet. The terminal device 10 is a communication terminal such as a mobile phone or a personal computer, and includes an input device 12 and a sound emitting device 14.

The management server device 70 is a web server that communicates with the terminal device 10, and includes a variable setting unit 32, a character string setting unit 34, a storage unit 242, a third acquisition unit 46, and a mixing unit 56. The variable setting unit 32 sets the tempo Xt and the style Xs in response to user instructions given through the input device 12 of the terminal device 10, and the character string setting unit 34 sets the designated character string DY in response to user instructions given through the input device 12. The storage unit 242 stores a plurality of pieces of chord progression data DC. The third acquisition unit 46 acquires one of the pieces of chord progression data DC as chord progression data GC. The designated character string DY, the tempo Xt, and the chord progression data GC are transmitted to the melody generation server device 72, while the chord progression data GC, the tempo Xt, and the style Xs are transmitted to the accompaniment generation server device 76.

The melody generation server device 72 includes a storage unit 244, a first acquisition unit 42, a second acquisition unit 44, and a melody generation unit 522. The storage unit 244 stores a plurality of pieces of section designation data DA and a plurality of pieces of pitch designation data DB. The first acquisition unit 42 selects one of the pieces of section designation data DA stored in the storage unit 244 according to the designated character string DY and executes the score allocation process of FIG. 6, thereby generating the section designation data GA and the designated character string GY. The second acquisition unit 44 selects one of the pieces of pitch designation data DB stored in the storage unit 244 as pitch designation data GB. The melody generation unit 522 generates melody data DM according to the section designation data GA, the pitch designation data GB, and the chord progression data GC. The melody data DM, the designated character string GY, and the tempo Xt are transmitted to the voice signal synthesis server device 74.

The voice signal synthesis server device 74 includes a voice synthesis unit 524 that generates a voice signal VA according to the melody data DM, the designated character string GY, and the tempo Xt. The voice signal VA generated by the voice synthesis unit 524 is transmitted to the management server device 70 via the melody generation server device 72.

The accompaniment generation server device 76 generates accompaniment data DE according to the chord progression data GC, the tempo Xt, and the style Xs. The accompaniment signal synthesis server device 78 includes an accompaniment signal generation unit 544 that generates an accompaniment signal VB according to the accompaniment data DE supplied from the accompaniment generation server device 76. The accompaniment signal VB generated by the accompaniment signal generation unit 544 is transmitted to the management server device 70 via the accompaniment generation server device 76.

The mixing unit 56 of the management server device 70 mixes the voice signal VA generated by the voice signal synthesis server device 74 with the accompaniment signal VB generated by the accompaniment signal synthesis server device 78 to generate the acoustic signal V. The acoustic signal V is transmitted to the terminal device 10 and reproduced as sound waves by the sound emitting device 14.
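
The data flow among the server devices can be summarized in a short Python sketch. It is illustrative only: the function names, the dictionary-shaped request, the random choice of chord progression data, and the representation of signals as lists of samples are assumptions, and the two callables stand in for the server chains 72/74 and 76/78, whose outputs the source routes back to the management server device 70.

```python
import random


def mix(va, vb):
    """Mixing unit 56 (sketch): sample-wise sum, zero-padding the shorter signal."""
    n = max(len(va), len(vb))
    va = list(va) + [0.0] * (n - len(va))
    vb = list(vb) + [0.0] * (n - len(vb))
    return [a + b for a, b in zip(va, vb)]


def management_server(request, chord_library, melody_chain, accompaniment_chain,
                      rng=random):
    """Hypothetical stand-in for management server device 70."""
    xt = request["tempo"]   # tempo Xt (variable setting unit 32)
    xs = request["style"]   # style Xs (variable setting unit 32)
    dy = request["text"]    # designated character string DY (unit 34)
    # Third acquisition unit 46; choosing at random is an assumption.
    gc = rng.choice(chord_library)
    # DY, Xt and GC go to server 72; VA comes back from 74 via 72.
    va = melody_chain(dy, xt, gc)
    # GC, Xt and Xs go to server 76; VB comes back from 78 via 76.
    vb = accompaniment_chain(gc, xt, xs)
    return mix(va, vb)      # acoustic signal V sent to terminal device 10
```

Passing the two chains as callables keeps the sketch independent of any particular transport, in line with the statement that the servers only need to be able to communicate over a network.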

The second embodiment achieves the same effects as the first embodiment. The number of server devices constituting the music generation device 100B and the functions assigned to each may be changed as appropriate from the example of FIG. 11. For example, a single server device may take on the functions of both the melody generation server device 72 and the voice signal synthesis server device 74 of FIG. 11, or a single server device may take on the functions of both the accompaniment generation server device 76 and the accompaniment signal synthesis server device 78.

<Modifications>
Each of the above embodiments can be modified in various ways. Specific modifications are illustrated below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate.

(1) The method by which the first acquisition unit 42 acquires the section designation data GA may be changed as appropriate. For example, the first acquisition unit 42 may randomly select one of the pieces of section designation data DA as the section designation data GA (that is, a configuration in which the score allocation unit 424 is omitted). In other words, the selection of the section designation data DA according to the designated character string DY and the execution of the score allocation process on the section designation data DA may both be omitted. As this explanation shows, the first acquisition unit 42 is comprehensively defined as an element that acquires section designation data GA corresponding to one of the pieces of section designation data DA. The section designation data GA thus covers both the section designation data DA after score allocation by the score allocation unit 424, as in the first embodiment, and the section designation data DA itself. The operation of the second acquisition unit 44 and the third acquisition unit 46 may likewise be changed as appropriate. For example, the second acquisition unit 44 may select one of the pieces of pitch designation data DB from the storage device 24 and apply predetermined processing (for example, raising the pitch of each note by a predetermined amount) to generate the pitch designation data GB. The same applies to the third acquisition unit 46.
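
The two acquisition variants in modification (1) can be sketched as follows. This is a hedged illustration: the function names, the representation of pitch designation data as `(time, MIDI note number)` pairs, and the use of a semitone shift as the "predetermined processing" are assumptions, not details given in the source.

```python
import random


def acquire_section_data(da_list, rng=random):
    """First acquisition unit 42 without score allocation:
    one piece of section designation data DA is used directly as GA."""
    return rng.choice(da_list)


def acquire_pitch_data(db_list, shift=0, rng=random):
    """Second acquisition unit 44 with post-processing:
    select one DB, then raise every pitch by `shift` (assumed semitones)."""
    db = rng.choice(db_list)
    return [(time, pitch + shift) for time, pitch in db]
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which can be convenient when comparing generated melodies.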

(2) In each of the above embodiments, the voice synthesis unit 524 generates the voice signal VA according to the designated character string GY, but the character string setting unit 34 and the voice synthesis unit 524 may be omitted. That is, the present invention can also be realized as a device whose purpose is to generate the melody data DM designating the assembled note sequence M, or as a device that generates an acoustic signal representing instrument sounds of the assembled note sequence M designated by the melody data DM.

(3) The method by which the accompaniment sound generation unit 54 generates the accompaniment signal VB may be changed as appropriate. For example, a plurality of pieces of accompaniment data DE corresponding to different styles may be held in the storage device 24, and the accompaniment signal VB may be generated by selecting the accompaniment data DE that corresponds to the style Xs designated by the user. A configuration in which the accompaniment sound generation unit 54 is omitted (that is, a configuration that generates only the melody) may also be adopted.

(4) In each of the above embodiments, the pitch extraction unit 62 extracts the pitch that the pitch designation data GB designates at the start point of each sounding section S designated by the section designation data GA, but the time point at which the pitch is extracted is not limited to the start point of the sounding section S. For example, the pitch that the pitch designation data GB designates at a time point a predetermined time after the start of the sounding section S, or at the midpoint of the sounding section S, may be extracted as the pitch of the note n corresponding to that sounding section S. As this explanation shows, the melody generation unit 522 is comprehensively defined as an element that generates melody data DM designating a time series of notes according to each sounding section S designated by the section designation data GA and the pitch that the pitch designation data GB designates for a reference time point of that sounding section S. The reference time point of a sounding section S means a time point in a predetermined positional relationship to the sounding section S (for example, its start point or midpoint).
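
The choice of reference time point can be made concrete with a small Python sketch. The data layout is an assumption: pitch designation data is modeled as a sorted list of `(onset, pitch)` pairs in which each pitch holds until the next onset, sounding sections are `(start, end)` pairs, and the function names and the `mode` parameter are hypothetical.

```python
from bisect import bisect_right


def pitch_at(pitch_series, t):
    """Pitch designated at time t by a sorted list of (onset, pitch) pairs."""
    onsets = [onset for onset, _ in pitch_series]
    i = bisect_right(onsets, t) - 1   # last onset that is <= t
    if i < 0:
        raise ValueError("no pitch is designated at this time")
    return pitch_series[i][1]


def reference_time(start, end, mode="start", offset=0.0):
    """Reference time point of a sounding section (start, midpoint, or offset)."""
    if mode == "start":
        return start
    if mode == "midpoint":
        return (start + end) / 2
    if mode == "offset":
        return start + offset
    raise ValueError(f"unknown mode: {mode}")


def extract_notes(sections, pitch_series, mode="start", offset=0.0):
    """Return (start, end, pitch) notes, one per sounding section."""
    return [(s, e, pitch_at(pitch_series, reference_time(s, e, mode, offset)))
            for s, e in sections]
```

With the same sections and pitch series, switching `mode` from `"start"` to `"midpoint"` can change the extracted pitches, which is how a different reference time point leads to a different melody.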

(5) In each of the above embodiments, the first adjustment process SB1 is executed on the notes n of the assembled note sequence M that are important tones or ending tones, and the second adjustment process SB2 is executed on the notes n that are passing tones. However, the first adjustment process SB1 or the second adjustment process SB2 may instead be executed on all notes n of the assembled note sequence M.

(6) The data formats illustrated in the above embodiments may be changed freely. For example, the section designation data DA, the pitch designation data DB, the melody data DM, and the accompaniment data DE are not limited to the SMF format and may be described in any format. The chord progression data DC may also be described in the SMF format. Likewise, the acoustic signal V, the voice signal VA, and the accompaniment signal VB are not limited to sample sequences of a time waveform and may be expressed as data that designates the time waveform in time series (for example, performance data conforming to the MIDI standard).

100A, 100B ... music generation device, 10 ... terminal device, 12 ... input device, 14 ... sound emitting device, 22 ... arithmetic processing device, 24 ... storage device, 32 ... variable setting unit, 34 ... character string setting unit, 42 ... first acquisition unit, 422 ... selection unit, 424 ... score allocation unit, 44 ... second acquisition unit, 46 ... third acquisition unit, 52 ... singing sound generation unit, 522 ... melody generation unit, 524 ... voice synthesis unit, 54 ... accompaniment sound generation unit, 542 ... accompaniment data generation unit, 544 ... accompaniment signal generation unit, 56 ... mixing unit, 62 ... pitch extraction unit, 64 ... pitch adjustment unit, 70 ... management server device, 72 ... melody generation server device, 74 ... voice signal synthesis server device, 76 ... accompaniment generation server device, 78 ... accompaniment signal synthesis server device.

Claims

A music generation device comprising:
first acquisition means for acquiring section designation data corresponding to any of a plurality of pieces of section designation data, each designating a time series of sounding sections;
second acquisition means for acquiring pitch designation data corresponding to any of a plurality of pieces of pitch designation data, each designating a time series of pitches; and
melody generation means for generating melody data designating a time series of notes, each note corresponding to a sounding section designated by the section designation data acquired by the first acquisition means and to the pitch that the pitch designation data acquired by the second acquisition means designates for a reference time point of that sounding section.

The music generation device according to claim 1, wherein the melody generation means includes:
pitch extraction means for generating an assembled note sequence in which notes are arranged, each note having a sounding section designated by the section designation data and the pitch that the pitch designation data designates at the reference time point of that sounding section; and
pitch adjustment means for generating the melody data by adjusting the pitch of each note of the assembled note sequence generated by the pitch extraction means.

The music generation device according to claim 1, further comprising:
character string setting means for acquiring a designated character string in which a plurality of speech units are arranged in time series; and
voice synthesis means for generating a voice signal of a voice that pronounces the designated character string on the plurality of notes designated by the melody data generated by the melody generation means.

The music generation device, wherein the first acquisition means includes:
selection means for selecting any of the plurality of pieces of section designation data; and
score allocation means for associating each sounding section designated by the section designation data selected by the selection means with a speech unit of the designated character string,
and wherein the voice synthesis means generates a voice signal of a voice that pronounces, in the sounding section of each of the plurality of notes designated by the melody data, the speech unit corresponding to that sounding section.

The music generation device according to claim 3 or 4, further comprising:
accompaniment sound generation means for generating an accompaniment signal representing an accompaniment sound; and
mixing means for mixing the voice signal generated by the voice synthesis means and the accompaniment signal generated by the accompaniment sound generation means.