JP2016206323A

JP2016206323A - Singing sound synthesis device

Info

Publication number: JP2016206323A
Application number: JP2015085604A
Authority: JP
Inventors: 桂三濱野; Keizo Hamano; 智子奥村; Tomoko Okumura
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2015-04-20
Filing date: 2015-04-20
Publication date: 2016-12-08
Anticipated expiration: 2035-04-20
Also published as: JP6485185B2

Abstract

PROBLEM TO BE SOLVED: To advance lyrics to a desired position with a simple operation when generating singing sounds in real-time performance. SOLUTION: When the key A is pressed at time t0 (key press k0), the velocity V0 of key press k0 is detected upon receipt of the sound generation instruction. The character “sa” at the cursor position is then acquired, and the speech segment data for “sa” is selected. Based on that speech segment data, the sound source generates the singing sound “sa” at pitch A and at a predetermined volume. If key press k2 is missed at time t2 and the key A is then pressed at time t4 with velocity V1, the character “sa” that follows “ra” at the cursor position is acquired, and the sound source generates the singing sound “sa” at pitch A and at the predetermined volume. SELECTED DRAWING: Figure 3

Description

The present invention relates to a singing sound synthesizer that allows lyrics to be advanced to a desired position with a simple operation when singing sounds are produced in real-time performance.

Conventionally, a singing sound synthesizer is known in which a user specifies an arbitrary section of lyrics displayed on a touch panel, and the specified lyrics are output as a singing voice at the pitch corresponding to each pressed key (see Patent Literature 1). This singing sound synthesizer has a first mode, which advances to the next syllable each time a key is pressed, and a second mode, which repeatedly reads out only the selected syllable each time a key is pressed.
In such a conventional singing sound synthesizer, the user can select the desired lyrics by operating the touch panel and, by pressing keys on the keyboard, output each syllable of the selected lyrics as a singing voice at the desired pitch and at the desired timing.

Patent Literature 1: JP 2014-10190 A

In a conventional singing sound synthesizer that automatically advances one character or one syllable at a time in response to key presses, a performance mistake or the like can cause the lyric position to drift out of step with what is being played. Because the user is performing in real time, it is difficult at that point to control the lyric position by operating a user interface such as a touch panel.

Accordingly, an object of the present invention is to provide a singing sound synthesizer that allows lyrics to be advanced to a desired position with a simple operation when singing sounds are produced in real-time performance.

To achieve the above object, the singing sound synthesizer of the present invention comprises detection means for detecting the start of operation and the operation intensity of a performance operator, acquisition means for acquiring character information from a storage device in response to the start of operation of the performance operator detected by the detection means, and sound generation control means. Its principal feature is that the position of the character acquired by the acquisition means is controlled according to the operation intensity of the performance operator detected by the detection means, and that the sound generation control means sounds the character acquired by the acquisition means as a voice.

In the singing sound synthesizer of the present invention, when a character acquired in response to operation of the performance operator is sounded as a voice, the character to be acquired can be controlled according to the operation intensity of the performance operator. During real-time performance, the lyrics can therefore be advanced to a desired position by varying the operation intensity of the performance operator.

FIG. 1 is a functional block diagram showing the hardware configuration of the singing sound synthesizer of the present invention.
FIG. 2 is a diagram explaining the structure of the lyrics data and a mode of velocity setting in the singing sound synthesizer according to the present invention.
FIG. 3 is a diagram showing operation examples of the singing sound synthesizer according to the present invention.
FIG. 4 is a flowchart of the singing sound synthesis process executed by the singing sound synthesizer according to the present invention.

FIG. 1 is a functional block diagram showing the hardware configuration of the singing sound synthesizer of the present invention.
In the singing sound synthesizer 1 shown in FIG. 1, a CPU (Central Processing Unit) 10 is the central processing unit that controls the singing sound synthesizer 1 as a whole, a ROM (Read Only Memory) 11 is a non-volatile memory that stores the control program and various data, and a RAM (Random Access Memory) 3 is a volatile memory used as the work area of the CPU 10 and as various buffers. A data memory 18, which is a rewritable memory such as a flash memory, stores character information including the text data of lyrics, a phoneme database containing the speech segment data of singing sounds, and the like. The display unit 15 consists of a liquid crystal display or the like that shows the operating state, various setting screens, messages to the user, and so on. The performance operator 16 consists of a keyboard or the like and generates performance information such as key-on, key-off, pitch, and velocity. The setting operators 17 are the various operation knobs, operation buttons, and the like used to configure the singing sound synthesizer 1.

The sound source 13 has a plurality of sound generation channels. Under the control of the CPU 10, it assigns one sound generation channel in response to the user's real-time performance on the performance operator 16 and, in the assigned channel, reads the speech segment data corresponding to the performance from the data memory 18 to generate singing sound data. The sound system 14 converts the singing sound data generated by the sound source 13 into an analog signal with a digital-to-analog converter, amplifies the resulting analog singing sound, and outputs it to a speaker or the like. The bus 19 transfers data between the units of the singing sound synthesizer 1.

FIG. 2(a) shows the structure of the lyrics data in the singing sound synthesizer 1 according to the present invention, and FIG. 2(b) shows a mode of velocity setting. The lyrics data consists of text data 30 representing the content of the lyrics, and each character making up the text data 30 is character information 31. The character information 31 defines the text data 30 in units of one syllable, the unit read out by a single key press. In the case shown in FIG. 2(a), the text data 30 is “sakura sakura noyamamo” (さくらさくらのやまも), and c11 to c34 are its character information 31. The character information c11, c12, and c13 indicates that the first three characters, “sakura”, form the first phrase 32a; c21, c22, and c23 indicate that the following three characters, “sakura”, form the second phrase 32b; and c31, c32, c33, and c34 indicate that the following four characters, “noyamamo”, form the third phrase 32c. That is, c11 records that the first character “sa” belongs to the first phrase 32a, c12 that the second character “ku” belongs to the first phrase 32a, and c13 that the third character “ra” belongs to the first phrase 32a. The same holds for c21 to c34: for example, c21 records that the fourth character “sa” belongs to the second phrase 32b, and c31 that the seventh character “no” belongs to the third phrase 32c.
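As a rough illustration of the lyrics data described above, the following Python sketch stores each character information entry with a label, its syllable, and the phrase it belongs to. The names CharInfo, build_lyrics, and LYRICS, the romanized syllables, and the label format are assumptions made for this example; the source does not specify how the data is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharInfo:
    label: str     # hypothetical label such as "c11" (phrase 1, character 1)
    syllable: str  # romanized syllable read out by one key press
    phrase: int    # 1-based number of the phrase the character belongs to


def build_lyrics(phrases):
    """Flatten a list of phrases (each a list of syllables) into CharInfo entries."""
    chars = []
    for p, syllables in enumerate(phrases, start=1):
        for i, syllable in enumerate(syllables, start=1):
            chars.append(CharInfo(f"c{p}{i}", syllable, p))
    return chars


# The lyrics of FIG. 2(a): "sakura" / "sakura" / "noyamamo".
LYRICS = build_lyrics([
    ["sa", "ku", "ra"],
    ["sa", "ku", "ra"],
    ["no", "ya", "ma", "mo"],
])

for info in LYRICS:
    print(info.label, info.syllable, info.phrase)
```

Keeping the phrase number on every entry is one simple way to let a later step find the first character of the next phrase by scanning forward from the cursor.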

Next, FIG. 2(b) shows one mode of velocity setting in the singing sound synthesizer 1 according to the present invention. The singing sound synthesizer 1 controls the position of the character read out for sound generation according to the velocity V of the key press. In FIG. 2(b) the vertical axis is the intensity of the velocity V. In the velocity setting shown, a velocity V of Va or less is defined as velocity V0; when a sound generation instruction is received with velocity V0, the character at the normal position is read out and sounded. With Va < Vb, a velocity V greater than Va and not greater than Vb is defined as velocity V1; when a key is pressed with velocity V1, the character following the normal position is read out and sounded. A velocity V greater than Vb is defined as velocity V2; when a key is pressed with velocity V2, the first character of the next phrase is read out and sounded. For example, Va is an intensity corresponding to mezzo forte (mf), and Vb is an intensity corresponding to forte (f) or fortissimo (ff). As described next, even when a key that should have been pressed is missed, the position of the lyrics read out can thus be controlled by the velocity of the following key press, so that the lyric position can be brought back into line with what is being played in real time.
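The velocity setting of FIG. 2(b) can be read as a three-way classification. A minimal Python sketch follows; the function name classify_velocity and the numeric thresholds (MIDI-style values of 80 and 112) are hypothetical, since the source only ties Va and Vb to dynamics such as mf and f or ff.

```python
VA = 80   # hypothetical threshold standing in for "mezzo forte"
VB = 112  # hypothetical threshold standing in for "forte"


def classify_velocity(v, va=VA, vb=VB):
    """Return "V0", "V1", or "V2" for a key-press velocity v."""
    if not va < vb:
        raise ValueError("Va must be smaller than Vb")
    if v <= va:
        return "V0"  # read the character at the normal (cursor) position
    if v <= vb:
        return "V1"  # read the character after the normal position
    return "V2"      # read the first character of the next phrase


print([classify_velocity(v) for v in (64, 80, 81, 112, 113)])
```

The boundary values follow the text: a velocity equal to Va still counts as V0, and a velocity equal to Vb still counts as V1.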

FIGS. 3(a), 3(b), and 3(c) show operation examples of the singing sound synthesizer 1 according to the present invention, and FIG. 4 is a flowchart of the singing sound synthesis process executed in those examples.
Operation example 1, shown in FIG. 3(a), is described with reference to the flowchart of FIG. 4. In FIG. 3(a) the vertical axis is pitch and the horizontal axis is time. Pitch is given by note name, and times t0 to t12 on the time axis mark the beat timing. Before the user begins the real-time performance, the lyrics data shown in FIG. 2(a) has been selected from the lyrics data stored in the data memory 18, and the cursor indicating the character to be read is placed on the first character, “sa” of c11.

When the user starts playing and presses the key A of the performance operator 16 at time t0 (key press k0), the CPU 10 detects key press k0 and the first singing sound synthesis process starts. Within the singing sound synthesis process, the speech segment data selection of step S14 and the sound generation of step S15 are executed in the sound source 13 under the control of the CPU 10; all other steps are executed by the CPU 10. In step S10, the CPU 10 accepts the sound generation instruction based on key press k0 of the performance operator 16 at time t0. At this point the CPU 10 acquires performance information such as the timing t0 of key press k0, the pitch information A of the operated key, and the velocity. The keyboard serving as the performance operator 16 is, for example, a three-make keyboard provided with a first sensor, a second sensor, and a third sensor that detect the depression of a key in three stages. The CPU 10 acquires the intensity of the velocity V calculated from the time between the first sensor turning on and the second sensor turning on. In step S11, it is determined whether the velocity V of key press k0 has the intensity of velocity V0; if the velocity V calculated by the CPU 10 is Va or less, the velocity V is judged to be velocity V0 (Yes) and the process proceeds to step S13. Suppose the user deliberately plays key press k0 at a key-pressing speed that gives velocity V0. The CPU 10 then determines Yes in step S11, proceeds to step S13, and acquires the character at the cursor position from the data memory 18. Here the lyrics data shown in FIG. 2(a) has been selected and the cursor is on c11, the first character of the text data 30 representing the selected lyrics. Accordingly, on receiving the sound generation instruction based on key press k0 at time t0, the CPU 10 reads the character “sa” of c11, where the cursor is placed, from the data memory 18.

Next, speech segment data selection is performed in step S14. This process is carried out by the sound source 13 under the control of the CPU 10 and selects, from the phoneme database stored in the data memory 18, the speech segment data needed to sound the acquired character. The phoneme database stores “phoneme chain data” and “stationary part data”. Phoneme chain data is segment data for points where the sound changes, such as silence (#) to consonant, consonant to vowel, and vowel to the consonant or vowel of the next character. Stationary part data is segment data for the sustained portion of a vowel. When the sound generation instruction based on key press k0 is accepted and the acquired character is “sa” of c11, the sound source 13 selects from the phoneme chain data the speech segment data “#-s”, corresponding to “silence → consonant s”, and “s-a”, corresponding to “consonant s → vowel a”, and selects from the stationary part data the speech segment data “a”, corresponding to “vowel a”. Then, in step S15, the sound source 13 under the control of the CPU 10 starts sound generation based on the speech segment data selected in step S14. With the speech segment data selected as above, the sound source 13 sounds the segments “#-s” → “s-a” → “a” in sequence in the sound generation start process of step S15, producing the character “sa” of c11. The singing sound “sa” is produced at the pitch A acquired when the sound generation instruction based on key press k0 was accepted, and at a predetermined volume. Once sound generation has started in step S15, the process proceeds to step S16, where the CPU 10 advances the cursor to the next character, “ku” of c12. Next, in step S17, the CPU 10 determines whether the velocity of the key being held has changed by Vth or more. Here the user plays without changing the velocity of key press k0, so the CPU 10 determines No in step S17 and proceeds to step S18, where it determines whether key press k0 has been released, that is, whether a sound generation stop instruction has been issued. If the CPU 10 determines that key press k0 has not yet been released (No), the process returns to step S17, and steps S17 and S18 are repeated. When key press k0 is released just before time t1, as shown in FIG. 3(a), the CPU 10 determines in step S18 that a sound generation stop instruction has been received and mutes the character “sa” of c11, so the singing sound “sa” stops. This ends the singing sound synthesis process.
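The selection in step S14 for a syllable that follows silence might be sketched in Python as below. The function select_segments and the romanized input are assumptions for illustration. The sketch only handles a syllable made of an optional consonant followed by one vowel, and its treatment of a bare vowel (a single “#-vowel” chain segment) is a guess that the source does not spell out.

```python
VOWELS = set("aiueo")


def select_segments(syllable):
    """Return the segment names to sound, in order, for one syllable after silence.

    The first entries stand for phoneme chain data (transitions) and the last
    entry stands for stationary part data (the sustained vowel).
    """
    if not syllable or syllable[-1] not in VOWELS:
        raise ValueError("sketch expects a syllable ending in a vowel")
    consonant, vowel = syllable[:-1], syllable[-1]
    if consonant:
        # silence -> consonant, then consonant -> vowel
        chain = [f"#-{consonant}", f"{consonant}-{vowel}"]
    else:
        # assumption: a bare vowel is entered directly from silence
        chain = [f"#-{vowel}"]
    return chain + [vowel]


for s in ("sa", "ku", "ra", "no"):
    print(s, select_segments(s))
```

For “sa” this yields the same three segments named in the text: “#-s”, “s-a”, and “a”.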

Next, as the performance proceeds, the user presses the key A at time t1 (key press k1); the CPU 10 detects key press k1 and the second singing sound synthesis process starts. Suppose the user deliberately plays key press k1 at a key-pressing speed that gives velocity V0. The singing sound synthesis process proceeds as described above: in step S11 the CPU 10 detects that the velocity V of key press k1 is velocity V0, and in step S13 it reads “ku”, the character of c12 at the cursor position, from the data memory 18. In step S14 the sound source 13 selects the speech segment data for sounding “ku” from the phoneme database in the data memory 18; in this case “#-k”, “k-u”, and “u” are selected. In step S15 the sound source 13 starts sounding the character “ku” of c12 based on the selected speech segment data, sounding the segments “#-k” → “k-u” → “u” in sequence at the pitch A of key press k1 and at the predetermined volume. The singing sound “ku” is thus produced. In step S16 the cursor is advanced to the next character, “ra” of c13. Key press k1 is released before time t2; the CPU 10 detects the sound generation stop instruction for key press k1 in step S18 and the singing sound “ku” of c12 is muted. When step S18 is complete, the singing sound synthesis process ends.

Next, as the performance proceeds, the key B should be pressed at time t2 (key press k2), but suppose the user misses it. Because key press k2 does not occur, the CPU 10 receives no sound generation instruction and the singing sound synthesis process is not started. The character “ra” of c13 at the cursor position is therefore not read out, no sound generation start process is performed for it, and the cursor stays on the character “ra”.

As the performance proceeds further, the user presses the key A at time t4 (key press k3); the CPU 10 detects key press k3 and the third singing sound synthesis process starts. Suppose the user deliberately plays key press k3 at a key-pressing speed that gives velocity V1. In step S11 the CPU 10 then detects that the velocity V of key press k3 exceeds velocity V0 and branches to step S12. In step S12, the character at the position corresponding to the velocity V calculated by the CPU 10 is acquired. Here the CPU 10 detects that the velocity V is velocity V1 and, as shown in FIG. 2(b), reads the character following the one under the cursor: that is, “sa” of c21, the character following “ra” of c13 at the cursor position. In step S14 the sound source 13 selects the speech segment data for sounding “sa” from the phoneme database in the data memory 18; in this case “#-s”, “s-a”, and “a” are selected. In step S15 the sound source 13 starts sounding the character “sa” of c21 based on the selected speech segment data, sounding the segments “#-s” → “s-a” → “a” in sequence at the pitch A of key press k3 and at the predetermined volume. The singing sound “sa” is thus produced. In step S16 the cursor is advanced to the next character, “ku” of c22. Key press k3 is released before time t5; the CPU 10 detects the sound generation stop instruction for key press k3 in step S18 and the singing sound “sa” of c21 is muted. When step S18 is complete, the singing sound synthesis process ends.
Thus, when key press k2 is missed, the lyric position at which sounding starts on the next key press k3 would be off by one character, but this can be corrected by playing key press k3 with velocity V1, which exceeds velocity V0. Pressing a key with velocity V1 reads out and starts sounding the character following the cursor position. Consequently, after a performance mistake (a note that should have been played was not), playing the next key press with velocity V1 immediately sounds the character that key press was originally meant to sound.
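The cursor handling of steps S11 to S16 in operation example 1 can be traced with a small simulation. The Python sketch below is illustrative only: the names read_position and play, the tuple layout of LYRICS, and the way the next phrase is located for velocity V2 are assumptions, and audio generation is left out entirely.

```python
# (syllable, phrase number) for the lyrics of FIG. 2(a)
LYRICS = [("sa", 1), ("ku", 1), ("ra", 1), ("sa", 2), ("ku", 2), ("ra", 2),
          ("no", 3), ("ya", 3), ("ma", 3), ("mo", 3)]


def read_position(cursor, velocity_class):
    """Index of the character to read for a key press (steps S11 to S13)."""
    if velocity_class == "V0":
        return cursor            # character at the cursor
    if velocity_class == "V1":
        return cursor + 1        # character after the cursor
    if velocity_class == "V2":
        # assumption: "next phrase" is measured from the cursor's phrase
        current_phrase = LYRICS[cursor][1]
        for i in range(cursor + 1, len(LYRICS)):
            if LYRICS[i][1] != current_phrase:
                return i
        raise IndexError("no next phrase")
    raise ValueError(f"unknown velocity class: {velocity_class}")


def play(velocity_classes, cursor=0):
    """Apply one key press per velocity class; return (sung syllables, cursor)."""
    sung = []
    for velocity_class in velocity_classes:
        position = read_position(cursor, velocity_class)
        sung.append(LYRICS[position][0])
        cursor = position + 1    # step S16: advance past the sounded character
    return sung, cursor


# k0 and k1 at V0, k2 missed (no key press at all), k3 at V1, k4 and k5 at V0
print(play(["V0", "V0", "V1", "V0", "V0"]))
```

The missed key press k2 simply does not appear in the input list, which mirrors the text: with no sound generation instruction, nothing is read and the cursor does not move.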

As the performance proceeds further, the user presses the key A at time t5 (key press k4); the CPU 10 detects key press k4 and the fourth singing sound synthesis process starts. Suppose the user deliberately plays key press k4 at a key-pressing speed that gives velocity V0. The singing sound synthesis process proceeds as described above: in step S11 the CPU 10 detects that the velocity V of key press k4 is velocity V0, and in step S13 it reads “ku”, the character of c22 at the cursor position, from the data memory 18. In step S14 the sound source 13 selects the speech segment data for sounding “ku” from the phoneme database in the data memory 18; in this case “#-k”, “k-u”, and “u” are selected. In step S15 the sound source 13 starts sounding the character “ku” of c22 based on the selected speech segment data, sounding the segments “#-k” → “k-u” → “u” in sequence at the pitch A of key press k4 and at the predetermined volume. The singing sound “ku” is thus produced. In step S16 the cursor is advanced to the next character, “ra” of c23. Key press k4 is released before time t6; the CPU 10 detects the sound generation stop instruction for key press k4 in step S18 and the singing sound “ku” of c22 is muted. When step S18 is complete, the singing sound synthesis process ends.

As the performance proceeds further, the user presses the key B at time t6 (key press k5); the CPU 10 detects key press k5 and the fifth singing sound synthesis process starts. Suppose the user deliberately plays key press k5 at a key-pressing speed that gives velocity V0. The singing sound synthesis process proceeds as described above: in step S11 the CPU 10 detects that the velocity V of key press k5 is velocity V0, and in step S13 it reads “ra”, the character of c23 at the cursor position, from the data memory 18. In step S14 the sound source 13 selects the speech segment data for sounding “ra” from the phoneme database in the data memory 18; in this case “#-r”, “r-a”, and “a” are selected. In step S15 the sound source 13 starts sounding the character “ra” of c23 based on the selected speech segment data, sounding the segments “#-r” → “r-a” → “a” in sequence at the pitch B of key press k5 and at the predetermined volume. The singing sound “ra” is thus produced. In step S16 the cursor is advanced to the next character, “no” of c31. Key press k5 is released after time t7 and before time t8; the CPU 10 detects the sound generation stop instruction for key press k5 in step S18 and the singing sound “ra” of c23 is muted. When step S18 is complete, the singing sound synthesis process ends.

The singing sound synthesizer 1 according to the present invention can also be made to operate differently from the operation described above when a key is pressed with velocity V1. This variation is described for time t8 onward in FIG. 3(a). When the performance progresses and the user presses the A key at time t8 (key press k6), the CPU 10 detects the key press k6 and the sixth singing sound synthesis process is started. Assume that the user deliberately makes the key press k6 at a key pressing speed that gives velocity V0. The singing sound synthesis process proceeds as described above: in step S11 the CPU 10 detects that the velocity V of the key press k6 is velocity V0, and in step S13 it reads “no”, the character at c31 at the cursor position, from the data memory 18. Next, in step S14, the sound source 13 selects the speech segment data for sounding “no” from the phoneme database in the data memory 18. In this case, the speech segment data “#-n”, “n-o”, and “o” are selected. In step S15, the sound source 13 starts sounding the character “no” at c31 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “#-n” → “n-o” → “o” at pitch A of the key press k6 and at a predetermined volume, which produces the singing sound “no”. Next, in step S16, the cursor is advanced to the position of the next character, “ya” at c32. The key press k6 is released at approximately time t9; in step S18 the CPU 10 detects the sound generation stop instruction for the key press k6, and the singing sound “no” at c31 is muted. When the process of step S18 ends, the singing sound synthesis process ends.
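The segment names quoted in the description follow a regular pattern: a transition into the consonant, a consonant-to-vowel transition, and the sustained vowel, where “#” stands for silence. The following is a minimal sketch of that selection rule, not the patent's implementation. The function names and the romanized syllable input are hypothetical, and the handling of a vowel-only syllable is an assumption, since the description gives no such example.

```python
from typing import List, Optional, Tuple

VOWELS = set("aiueo")


def split_syllable(romaji: str) -> Tuple[str, str]:
    """Split a romanized syllable such as "no" into (consonant, vowel)."""
    consonant, vowel = romaji[:-1], romaji[-1]
    if vowel not in VOWELS:
        raise ValueError(f"syllable must end in a vowel: {romaji!r}")
    return consonant, vowel


def select_segments(syllable: str, previous: Optional[str] = None) -> List[str]:
    """Return the speech segment names for `syllable` in sounding order.

    `previous` is the syllable sounded immediately before, with no silence in
    between. None means the syllable starts from silence, written "#".
    """
    consonant, vowel = split_syllable(syllable)
    lead = "#" if previous is None else split_syllable(previous)[1]
    if not consonant:
        # Assumption: a vowel-only syllable needs only one transition.
        return [f"{lead}-{vowel}", vowel]
    return [f"{lead}-{consonant}", f"{consonant}-{vowel}", vowel]


print(select_segments("no"))        # syllable sounded from silence
print(select_segments("ma", "ya"))  # syllable sounded right after "ya"
```

Passing the previous syllable replaces the leading “#” with that syllable's vowel, which matches the “a-m” and “a-k” segments the description lists for syllables sounded without an intervening silence.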

Next, assume that the B key should be pressed at time t9 (key press k7) but the user misses it. Because the key press k7 does not occur, the CPU 10 receives no sound generation instruction and the singing sound synthesis process is not started. Consequently, the character “ya” at c32 at the cursor position is not read, no sound generation start process is performed, and the cursor remains at the position of “ya”.

As the performance proceeds further and the user presses the C key at time t10 (key press k8), the CPU 10 detects the key press k8 and the seventh singing sound synthesis process is started. Assume that the user deliberately makes the key press k8 at a key pressing speed that gives velocity V1. In step S11 of the singing sound synthesis process, the CPU 10 therefore detects that the velocity V of the key press k8 is velocity V0 or higher, and the process branches to step S12. In step S12, the CPU 10 acquires the character at the position corresponding to the calculated velocity V. In this variation, when the velocity V of the key press k8 is detected to be velocity V1, the character at the cursor position is sounded briefly and then the next character is sounded. That is, when the CPU 10 detects that the velocity V is velocity V1, it reads both “ya” at c32 at the cursor position and the next character, “ma” at c33. Next, in step S14, the sound source 13 selects from the phoneme database in the data memory 18 the speech segment data for sounding “ya” and the speech segment data for sounding “ma”. Because “ma” is sounded immediately after “ya”, the speech segment data “#-y”, “y-a”, “a” and “a-m”, “m-a”, “a” are selected. In step S15, the sound source 13 starts sounding the character “ya” at c32 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “#-y” → “y-a” → “a” at pitch C of the key press k8 and at a predetermined volume. This produces the singing sound “ya”, but only for a short period. Following the stopped “ya”, the sound source 13 starts sounding the character “ma” at c33 based on the selected speech segment data, sounding “a-m” → “m-a” → “a” in order at pitch C of the key press k8 and at a predetermined volume. The singing sound “ma” is thus produced right after the singing sound “ya”. Next, in step S16, the cursor is advanced to the position of the next character, “mo” at c34. The key press k8 is released at approximately time t11; in step S18 the CPU 10 detects the sound generation stop instruction for the key press k8, and the singing sound “ma” at c33 is muted. When the process of step S18 ends, the singing sound synthesis process ends.
As described above, missing the key press k7 shifts by one character the lyric position at which sounding starts on the next key press k8, but in this variation the shift can be corrected by making the key press k8 with velocity V1, which exceeds velocity V0. That is, in this variation, pressing a key with velocity V1 causes the character at the cursor position and then the next character to be read, and the character at the cursor position is sounded only briefly. As a result, even when the player makes a mistake (fails to press a note that should have been pressed), no character is dropped, and the audience can still follow the meaning of the lyrics.
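The cursor behavior in this variation (key presses k6 to k9) can be sketched as below. This is an illustrative model under stated assumptions, not the device's firmware. The names `LYRICS` and `on_key_press`, the flat zero-based cursor, and the integer velocity levels 0 (V0) and 1 (V1) are hypothetical. Holding the cursor on the last character when no next character exists follows the note that step S16 is skipped in that case.

```python
from typing import List, Tuple

# Lyric characters c11..c13, c21..c23, c31..c34 from the description.
LYRICS = ["sa", "ku", "ra", "sa", "ku", "ra", "no", "ya", "ma", "mo"]


def on_key_press(cursor: int, level: int) -> Tuple[List[str], int]:
    """Return (syllables sounded in order, new cursor) for one key press.

    level 0 (V0): sound the character at the cursor.
    level 1 (V1): sound the character at the cursor briefly, then the next one.
    """
    count = 2 if level == 1 else 1
    sounded = LYRICS[cursor:cursor + count]
    # Step S16 is skipped when there is no next character, so clamp the cursor.
    new_cursor = min(cursor + count, len(LYRICS) - 1)
    return sounded, new_cursor


cursor = 6                                # cursor at c31 "no"
sounded, cursor = on_key_press(cursor, 0) # k6 with V0 sounds "no"
# k7 is missed: no handler runs, so the cursor stays at c32 "ya".
sounded, cursor = on_key_press(cursor, 1) # k8 with V1 sounds "ya" then "ma"
print(sounded, LYRICS[cursor])
```

A missed key press never reaches the handler, which is why the cursor lags behind the performance until a V1 press consumes two characters at once.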

As the performance proceeds further and the user presses the B key at time t11 (key press k9), the CPU 10 detects the key press k9 and the eighth singing sound synthesis process is started. Assume that the user deliberately makes the key press k9 at a key pressing speed that gives velocity V0. The singing sound synthesis process proceeds as described above: in step S11 the CPU 10 detects that the velocity V of the key press k9 is velocity V0, and in step S13 it reads “mo”, the character at c34 at the cursor position, from the data memory 18. Next, in step S14, the sound source 13 selects the speech segment data for sounding “mo” from the phoneme database in the data memory 18. In this case, the speech segment data “#-m”, “m-o”, and “o” are selected. In step S15, the sound source 13 starts sounding the character “mo” at c34 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “#-m” → “m-o” → “o” at pitch B of the key press k9 and at a predetermined volume, which produces the singing sound “mo”. Next, in step S16, the cursor is advanced to the position of the next character; if there is no next character, this step is skipped. The key press k9 is released at approximately time t12; in step S18 the CPU 10 detects the sound generation stop instruction for the key press k9, and the singing sound “mo” at c34 is muted. When the process of step S18 ends, the singing sound synthesis process ends.

Next, FIG. 3(b) shows operation example 2 of the singing sound synthesizer 1 according to the present invention. Operation example 2 is described here with respect to the operations that differ from operation example 1 shown in FIG. 3(a). In FIG. 3(b) as well, the vertical axis is pitch expressed as note names, and the horizontal time axis marks the beat timings at times t0 to t12. It is assumed that, before starting the real-time performance, the user has selected the lyric data shown in FIG. 2(a) and that the cursor indicating the character to be read is at the first character, “sa” at c11.
When the user starts the performance and presses the A key among the performance operators 16 at time t0 (key press k0), the CPU 10 detects the key press k0 and the first singing sound synthesis process is started. The processing executed at this time is the same as in operation example 1, so its description is omitted. When the performance progresses and the user presses the A key at time t1 (key press k1), the CPU 10 detects the key press k1 and the second singing sound synthesis process is started. The processing executed at this time is also the same as in operation example 1, so its description is omitted.

As the performance proceeds further and the user presses the B key at time t2 (key press k2), the CPU 10 detects the key press k2 and the third singing sound synthesis process is started. Assume that the user deliberately makes the key press k2 at a key pressing speed that gives velocity V0. In the singing sound synthesis process, the CPU 10 detects in step S11 that the velocity V of the key press k2 is velocity V0, and in step S13 it reads “ra”, the character at c13 at the cursor position, from the data memory 18. Next, in step S14, the sound source 13 selects the speech segment data for sounding “ra” from the phoneme database in the data memory 18. In this case, the speech segment data “#-r”, “r-a”, and “a” are selected. In step S15, the sound source 13 starts sounding the character “ra” at c13 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “#-r” → “r-a” → “a” at pitch B of the key press k2 and at a predetermined volume, which produces the singing sound “ra”. Next, in step S16, the cursor is advanced to the position of the next character, “sa” at c21. The key press k2 is released after time t3 has passed and before time t4 is reached; in step S18 the CPU 10 detects the sound generation stop instruction for the key press k2, and the singing sound “ra” at c13 is muted. When the process of step S18 ends, the singing sound synthesis process ends.

As the performance proceeds further and the user presses the A key at time t4 (key press k3), the CPU 10 detects the key press k3 and the fourth singing sound synthesis process is started. Assume that the user deliberately makes the key press k3 at a key pressing speed that gives velocity V0. In the singing sound synthesis process, the CPU 10 detects in step S11 that the velocity V of the key press k3 is velocity V0, and in step S13 it reads “sa”, the character at c21 at the cursor position, from the data memory 18. Next, in step S14, the sound source 13 selects the speech segment data for sounding “sa” from the phoneme database in the data memory 18. In this case, the speech segment data “#-s”, “s-a”, and “a” are selected. In step S15, the sound source 13 starts sounding the character “sa” at c21 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “#-s” → “s-a” → “a” at pitch A of the key press k3 and at a predetermined volume, which produces the singing sound “sa”. Next, in step S16, the cursor is advanced to the position of the next character, “ku” at c22. The key press k3 is released before time t5 is reached; in step S18 the CPU 10 detects the sound generation stop instruction for the key press k3, and the singing sound “sa” at c21 is muted. When the process of step S18 ends, the singing sound synthesis process ends.

Next, assume that the A key should be pressed at time t5 (key press k4) but the user misses it. Because the key press k4 does not occur, the CPU 10 receives no sound generation instruction and the singing sound synthesis process is not started. Consequently, the character “ku” at c22 at the cursor position is not read, no sound generation start process is performed, and the cursor remains at the position of “ku”.
Assume further that the B key should be pressed at time t6 (key press k5) but the user misses this key press as well. Because the key press k5 does not occur, the CPU 10 receives no sound generation instruction and the singing sound synthesis process is not started. Consequently, the character “ku” at c22 at the cursor position is still not read, no sound generation start process is performed, and the cursor remains at the position of “ku”.

As the performance proceeds further and the user presses the A key at time t8 (key press k6), the CPU 10 detects the key press k6 and the fifth singing sound synthesis process is started. Assume that the user deliberately makes the key press k6 at a key pressing speed that gives velocity V2. In step S11 of the singing sound synthesis process, the CPU 10 therefore detects that the velocity V of the key press k6 is velocity V0 or higher, and the process branches to step S12. In step S12, the CPU 10 acquires the character at the position corresponding to the calculated velocity V. When the velocity V of the key press k6 is detected to be velocity V2, a character of the next phrase is read, as shown in FIG. 2(b). Here the cursor is at c22, and c22 belongs to the second phrase 32b, so the CPU 10 reads “no” at c31, the first character of the following third phrase 32c. Next, in step S14, the sound source 13 selects the speech segment data for sounding “no” from the phoneme database in the data memory 18. In this case, the speech segment data “#-n”, “n-o”, and “o” are selected. In step S15, the sound source 13 starts sounding the character “no” at c31 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “#-n” → “n-o” → “o” at pitch A of the key press k6 and at a predetermined volume, which produces the singing sound “no”. Next, in step S16, the cursor is advanced to the position of the next character, “ya” at c32. The key press k6 is released at approximately time t9; in step S18 the CPU 10 detects the sound generation stop instruction for the key press k6, and the singing sound “no” at c31 is muted. When the process of step S18 ends, the singing sound synthesis process ends.

As described above, because the key press k4 at time t5 and the key press k5 at time t6 were missed in succession, the lyric position falls two characters behind the performance. This lag can be corrected by making the next key press k6 with velocity V2, which reads the phrase following the one that contains the cursor. In other words, when the player makes a mistake (fails to press notes that should have been pressed), playing the next key press with velocity V2 immediately sounds the character that the key press was originally meant to sound. The operation from time t9 onward in FIG. 3(b) can be readily understood from the description above and is therefore omitted.
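The velocity V2 behavior of operation example 2 amounts to moving the read position to the first character of the phrase after the one containing the cursor. The sketch below illustrates this with the lyric text grouped into phrases 32a to 32c. The data layout, the `(phrase, character)` cursor tuple, and the function names are hypothetical, and reading in place when no next phrase exists is an assumption the description does not cover.

```python
from typing import List, Tuple

# Lyric text from the description, grouped into phrases 32a, 32b, 32c.
PHRASES: List[List[str]] = [
    ["sa", "ku", "ra"],        # c11, c12, c13
    ["sa", "ku", "ra"],        # c21, c22, c23
    ["no", "ya", "ma", "mo"],  # c31, c32, c33, c34
]

Cursor = Tuple[int, int]  # (phrase index, character index), both zero-based


def advance(cursor: Cursor) -> Cursor:
    """Move to the next character; stay put on the very last character."""
    phrase, char = cursor
    if char + 1 < len(PHRASES[phrase]):
        return phrase, char + 1
    if phrase + 1 < len(PHRASES):
        return phrase + 1, 0
    return cursor


def press_with_v2(cursor: Cursor) -> Tuple[str, Cursor]:
    """Velocity V2: sound the first character of the next phrase."""
    phrase, _ = cursor
    if phrase + 1 < len(PHRASES):
        target = (phrase + 1, 0)
    else:
        target = cursor  # assumption: no next phrase, so read in place
    syllable = PHRASES[target[0]][target[1]]
    return syllable, advance(target)


# Cursor stuck at c22 "ku" after the missed key presses k4 and k5.
syllable, cursor = press_with_v2((1, 1))
print(syllable, cursor)
```

Jumping by phrase rather than by character count means the player does not need to know how many key presses were missed, only that the next note begins a new phrase.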

Next, FIG. 3(c) shows operation example 3 of the singing sound synthesizer 1 according to the present invention. As in FIG. 3(a), the vertical axis in FIG. 3(c) is pitch expressed as note names, and the horizontal time axis marks the beat timings at times t20 to t23; the interval shown is only part of the performance. It is assumed that, before starting the real-time performance, the user has selected the lyric data shown in FIG. 2(a) and that the cursor indicating the character to be read is at the first character, “sa” at c11.
When, during the performance, the user presses the A key among the performance operators 16 at time t20 (key press k10), the CPU 10 detects the key press k10 and the singing sound synthesis process is started. Assume that the user deliberately makes the key press k10 at a key pressing speed that gives velocity V0. In the singing sound synthesis process, the CPU 10 detects in step S11 that the velocity V of the key press k10 is velocity V0, and in step S13 it reads “sa”, the character at c11 at the cursor position, from the data memory 18. Next, in step S14, the sound source 13 selects the speech segment data for sounding “sa” from the phoneme database in the data memory 18. In this case, the speech segment data “#-s”, “s-a”, and “a” are selected. In step S15, the sound source 13 starts sounding the character “sa” at c11 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “#-s” → “s-a” → “a” at pitch A of the key press k10 and at a predetermined volume, which produces the singing sound “sa”. Next, in step S16, the cursor is advanced to the position of the next character, “ku” at c12.

Assume that, while the key press k10 is being held, the user pushes the held A key further down at around time t21. In step S17, the CPU 10 determines whether the velocity of the held key has changed by Vth or more; here the user deliberately pushes the A key so that the velocity of the key press k10 changes by Vth or more. The CPU 10 therefore determines Yes in step S17, returns to step S13, and reads “ku”, the character at c12 at the cursor position, from the data memory 18. Next, in step S14, the sound source 13 selects the speech segment data for sounding “ku” from the phoneme database in the data memory 18. Because “ku” is sounded immediately after “sa”, the speech segment data “a-k”, “k-u”, and “u” are selected. In step S15, the sound source 13 starts sounding the character “ku” at c12 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “a-k” → “k-u” → “u” at pitch A of the key press k10 and at a predetermined volume. When this sounding starts, the singing sound “sa” is stopped, so that the singing sound “ku” follows the singing sound “sa”. Next, in step S16, the cursor is advanced to the position of the next character, “ra” at c13. The key press k10 is released at approximately time t22; in step S18 the CPU 10 detects the sound generation stop instruction for the key press k10, and the singing sound “ku” at c12 is muted. When the process of step S18 ends, the singing sound synthesis process ends.

As the performance proceeds further and the user presses the B key at time t22 (key press k12), the CPU 10 detects the key press k12 and the second singing sound synthesis process is started. Assume that the user deliberately makes the key press k12 at a key pressing speed that gives velocity V0. The singing sound synthesis process proceeds as described above: in step S11 the CPU 10 detects that the velocity V of the key press k12 is velocity V0, and in step S13 it reads “ra”, the character at c13 at the cursor position, from the data memory 18. Next, in step S14, the sound source 13 selects the speech segment data for sounding “ra” from the phoneme database in the data memory 18. In this case, the speech segment data “#-r”, “r-a”, and “a” are selected. In step S15, the sound source 13 starts sounding the character “ra” at c13 based on the selected speech segment data. The sound source 13 sounds the speech segment data in the order “#-r” → “r-a” → “a” at pitch B of the key press k12 and at a predetermined volume, which produces the singing sound “ra”. Next, in step S16, the cursor is advanced to the position of the next character, “sa” at c21. The key press k12 is released after time t23 has passed; in step S18 the CPU 10 detects the sound generation stop instruction for the key press k12, and the singing sound “ra” at c13 is muted. When the process of step S18 ends, the singing sound synthesis process ends.
As described above, in the singing sound synthesizer 1 according to the present invention, when the player increases the velocity by a predetermined amount (Vth) or more while a key is held, that is, by aftertouch, the character following the one currently being sounded can be sounded at the pitch of the held key.
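The step S17 check in operation example 3 can be modeled as a comparison between the current key pressure and a reference value captured when the last character started. The following is a rough sketch under several assumptions: the class name `Voice`, the numeric pressure scale, and the threshold value `VTH = 20` are all hypothetical, and the check tests for an increase of Vth or more, as in the summary sentence above.

```python
from typing import List, Optional

LYRICS = ["sa", "ku", "ra"]  # c11, c12, c13
VTH = 20                     # hypothetical threshold for step S17


class Voice:
    """Tracks the lyric cursor and the currently held key."""

    def __init__(self) -> None:
        self.cursor = 0
        self.pitch: Optional[str] = None  # pitch of the held key, if any
        self.reference = 0                # velocity when the last character started
        self.log: List[str] = []          # "syllable@pitch" for each sounded character

    def _sound_next(self) -> None:
        # Steps S13 to S16: read the character, sound it, advance the cursor.
        self.log.append(f"{LYRICS[self.cursor]}@{self.pitch}")
        self.cursor = min(self.cursor + 1, len(LYRICS) - 1)

    def key_on(self, pitch: str, velocity: int) -> None:
        self.pitch = pitch
        self.reference = velocity
        self._sound_next()

    def pressure(self, velocity: int) -> None:
        # Step S17: did the velocity of the held key rise by VTH or more?
        if self.pitch is not None and velocity - self.reference >= VTH:
            self.reference = velocity
            self._sound_next()  # the previous character stops, the next one starts

    def key_off(self) -> None:
        # Step S18: mute the sounding character.
        self.pitch = None


voice = Voice()
voice.key_on("A", 40)  # k10 sounds "sa" at pitch A
voice.pressure(50)     # below the threshold, nothing happens
voice.pressure(65)     # rise of 25 >= VTH sounds "ku" at pitch A
voice.key_off()
voice.key_on("B", 40)  # k12 sounds "ra" at pitch B
print(voice.log)
```

Resetting the reference after each trigger means a second character change requires a further rise of Vth, which avoids retriggering on every pressure reading above the first threshold.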

In the singing sound synthesizer of the present invention described above, many variations are possible in how velocity maps to the way the character to be sounded is advanced. For example, velocity ranges may be set in N stages, with the character to be sounded advanced by one character, two characters, ..., or N characters according to the range, so that the number of characters to advance can be controlled by key press velocity. In the description above, the character to be sounded is advanced by multiple characters when the velocity is high; instead, it may be advanced by multiple characters when the velocity is low. The character to be sounded may also be moved back according to the velocity value, for example to the beginning of the phrase or by one character.
The more stages the velocity range is divided into, however, the more precisely the player must control velocity, and the harder the performance becomes. The character to be sounded may therefore be advanced or moved back by combining keyboard operation with the operation of another performance operator such as a pedal. For example, in the singing sound synthesizer of the present invention operating as described above, playing with velocity V1 while depressing the pedal may move the character to be sounded back by one character, and playing with velocity V2 while depressing the pedal may return it to the beginning of the phrase. In this way, even with only three velocity stages, five kinds of control over the character to be sounded become available.
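The combination of three velocity stages with a pedal can be pictured as a lookup table keyed by (velocity stage, pedal state). The sketch below is only illustrative. The boundary values in `BOUNDS` assume a 0 to 127 velocity scale and are invented, the action names are hypothetical labels, and treating a pedaled V0 press the same as a plain V0 press is an assumption chosen to match the count of five controls.

```python
from bisect import bisect_right

# Hypothetical stage boundaries on a 0-127 velocity scale:
# below 64 is V0, 64 to 99 is V1, 100 and above is V2.
BOUNDS = [64, 100]


def velocity_level(velocity: int) -> int:
    """Map a raw velocity to a stage index (0 for V0, 1 for V1, 2 for V2)."""
    return bisect_right(BOUNDS, velocity)


# (stage, pedal depressed) -> character control. The labels are illustrative.
ACTIONS = {
    (0, False): "read_at_cursor",
    (1, False): "advance_extra_character",
    (2, False): "jump_to_next_phrase",
    (1, True): "back_one_character",
    (2, True): "back_to_phrase_head",
}


def choose_action(velocity: int, pedal: bool) -> str:
    """Pick the character control for one key press."""
    # Assumption: a pedaled V0 press behaves like a plain V0 press.
    return ACTIONS.get((velocity_level(velocity), pedal), "read_at_cursor")


print(choose_action(110, False))
print(choose_action(110, True))
print(len(set(ACTIONS.values())))
```

Adding a stage to `BOUNDS` extends the scheme toward the N-stage variation, at the cost of narrower velocity windows for the player to hit.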

Furthermore, in the singing sound synthesizer of the present invention, the performance operator need not be a keyboard. For example, with a percussion (pad) type operator the character to be sounded may be controlled according to the striking force, with a stringed instrument type according to the strength with which the string is plucked, and with a wind instrument type according to the amount of breath.
Furthermore, in the singing sound synthesizer of the present invention, the character information and related data may be stored on a hard disk, in built-in memory, in external memory, on a server accessed over a network, or the like, instead of in the data memory that stores the character information.
In the singing sound synthesizer of the present invention, key press velocity is treated only as information for character control and is not used for volume control. However, when a key is pressed while a specific switch, knob, or performance operator is being operated, the volume may be controlled by the key press velocity.

DESCRIPTION OF SYMBOLS: 1 singing sound synthesizer, 10 CPU, 11 ROM, 12 RAM, 13 sound source, 14 sound system, 15 display unit, 16 performance operator, 17 setting operator, 18 data memory, 19 bus, 30 text data, 31 character information, 32a first phrase, 32b second phrase, 32c third phrase

Claims

A singing sound synthesizer comprising:
detection means for detecting an operation start and an operation intensity of a performance operator;
acquisition means for acquiring character information from a storage device in response to the operation start of the performance operator detected by the detection means; and
sound generation control means,
wherein the position of the character acquired by the acquisition means is controlled according to the operation intensity of the performance operator detected by the detection means, and the sound generation control means sounds the character acquired by the acquisition means as voice.

The singing sound synthesizer according to claim 1, wherein the position of the character acquired by the acquisition means is controlled according to the operation intensity at the start of operation of the performance operator detected by the detection means.

The singing sound synthesizer according to claim 1, wherein, in response to a change in the operation intensity during operation of the performance operator detected by the detection means, the acquisition means acquires the character positioned next, and the sound generation control means sounds the character acquired by the acquisition means as voice.