JPH09134188A

JPH09134188A - Singing voice synthesizer and musical tone reproducing device

Info

Publication number: JPH09134188A
Application number: JP7292403A
Authority: JP
Inventors: Katsuhiko Hayashi; 克彦林; Daisuke Mori; 大輔森
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1995-11-10
Filing date: 1995-11-10
Publication date: 1997-05-20

Abstract

PROBLEM TO BE SOLVED: To prevent degradation of back chorus quality when the playing speed or pitch is changed, by applying a singing voice synthesizer capable of synthesizing a PCM-quality back chorus to a synthesizer-based karaoke machine. SOLUTION: Waveform data read out by a first reader 106a and a second reader 106b is level-converted by level converters 108a and 108b according to the output values of first and second envelope formers 107a and 107b. The outputs of the level converters 108a and 108b are summed by an adder 109, and the synthesized singing voice is obtained at an output terminal 110. The device also has an interval difference generator 111 which, when the pitches of a preceding phoneme and a following phoneme differ, controls the readers 106a and 106b so that the pitch of the preceding phoneme gradually approaches that of the following phoneme during the transition. As a result, successive phonemes are connected very smoothly.

Description

Detailed Description of the Invention

[0001]

Technical Field of the Invention: The present invention relates to a music reproducing device used in a karaoke machine that performs karaoke with a back chorus, and in particular to a singing voice synthesizer and a music reproducing device in which the quality of the back chorus is not degraded by changes to the tempo (performance speed) or pitch (transposition) of a song.

[0002]

Description of the Related Art: In recent years, with advances in digital technology, various karaoke systems have been proposed. In particular, synthesizer-based karaoke (hereinafter "synth karaoke"), which uses a musical tone synthesizer of the kind found in electronic musical instruments, has been attracting attention.

[0003] Synth karaoke drives a musical tone synthesizer with performance instruction information, generally called MIDI, to generate backing music for karaoke. Its distinguishing feature is that the amount of data needed to play one song is very small compared with conventional karaoke (for example, LaserDisc karaoke).

[0004] This feature of the synthesizer approach is exploited in synth karaoke that transfers performance information over ISDN or telephone lines (so-called communication karaoke) and in synth karaoke that stores performance information in semiconductor memory or the like, and growth of synth karaoke in these fields is expected.

[0005] On the other hand, synthesizer-based karaoke has the drawback that voices and back choruses are difficult to perform. This is because synth karaoke is built around the musical tone synthesizer of an electronic musical instrument, and such a synthesizer is designed to reproduce instrument sounds (non-vocal instruments such as piano and guitar).

[0006] A conventional synthesizer-based karaoke music reproducing device devised to overcome this drawback is disclosed in, for example, Japanese Patent Application Laid-Open No. 6-161479.

[0007] This conventional music reproducing device is described below with reference to the drawings of that publication. In FIG. 1 of the publication, 1 is a musical tone data recording unit, 2 is a musical tone performance unit, 3 is an input unit, 4 is a character data recording unit, 5 is a voice synthesis unit, 6 is a synthesis amplification unit, 7 is an output unit, and 8 is a synchronization unit.

[0008] The conventional music reproducing device configured as described above operates as follows. The musical tone data recording unit 1 records the recorded data of the song to be played and consists of, for example, a compact disc (CD), LaserDisc (LD), or magnetic recording medium together with its drive. The musical tone performance unit 2 receives musical tone data from the musical tone data recording unit 1 and converts it into an analog musical tone signal. If the musical tone data recorded in the recording unit 1 is MIDI code, the musical tone performance unit 2 can be a synthesizer that generates sound with electronic oscillators. The input unit 3 takes in the singer's voice through a microphone or the like. The character data recording unit 4 records character code strings; like the musical tone data recording unit 1, it consists of a magnetic recording medium or the like and its drive, and it can share hardware with the musical tone data recording unit 1. The voice synthesis unit 5 receives the character data string recorded in the character data recording unit 4 and converts it into a voice signal. The synthesis amplification unit 6 mixes and amplifies the output of the musical tone performance unit 2, the voice signal from the input unit 3, and the output of the voice synthesis unit 5. The result is sent to the output unit 7, such as a speaker. The synchronization unit 8 directs the control of the character data sent from the character data recording unit 4 to the voice synthesis unit 5 in step with the performance of the musical tone performance unit 2.

[0009] Assume that MIDI code is recorded in the musical tone data recording unit 1. FIG. 2 of the publication shows an example of the character data recorded in the character data recording unit 4. The operation of the device is described with reference to FIGS. 1 and 2. When playback of the MIDI code begins, the reading speed from the musical tone data recording unit 1 that is set in the musical tone performance unit 2, that is, the musical tone performance speed, is sent to the synchronization unit 8 and stored in the synchronization unit memory 9. Next, the data for the first bar of the MIDI code is read from the musical tone data recording unit 1 and sent to the musical tone performance unit 2, where it is converted into a musical tone signal. At the same time, the synchronization unit 8 is notified that one bar of MIDI code has been received. On learning that one bar of MIDI code has been sent to the musical tone performance unit 2, the synchronization unit 8 reads one line of data from the character data recording unit 4, counts the characters, and calculates the speech speed value. Here, the character data for one bar is one line, up to the line break. The speed designation and the character code data to be pronounced are then sent to the voice synthesis unit 5. The speed value is calculated as follows: speed value = (duration of one bar) / (number of characters in one line).

[0010] For example, suppose the song about to be played is set to be played at 5 seconds per bar and one line of the back chorus contains 10 characters. The speed value is then 5 seconds / 10 characters = 0.5 seconds per character, and the voice synthesis unit 5 converts the characters it is sent into a voice signal at this designated speed and sends the signal to the synthesis amplification unit 6. As a result, the musical tone generated by the musical tone performance unit 2 and the voice generated by the voice synthesis unit 5 are synchronized and sent from the synthesis amplification unit 6 to the output unit 7. Because the voice data is recorded as a character code string and speech is generated from that string, the voice data can be encoded and recorded in the same way as the MIDI data of the musical tones, and the data can be compressed substantially.
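
As a rough illustration of the speed calculation in this prior-art device, the following Python sketch evaluates the formula for the example above and lists the evenly spaced character onsets it implies. The function names `speed_value` and `onset_times` are hypothetical and do not come from the cited publication; the choice of seconds as the unit follows the 5-second example.

```python
def speed_value(bar_seconds, chars_in_line):
    """Seconds allotted to each character:
    (duration of one bar) / (number of characters in one line)."""
    if chars_in_line <= 0:
        raise ValueError("a line must contain at least one character")
    return bar_seconds / chars_in_line


def onset_times(bar_start, bar_seconds, line):
    """Spread the characters of one line evenly across one bar.

    Returns (character, onset time in seconds) pairs. Every character
    gets the same duration, which is the only timing this scheme allows.
    """
    step = speed_value(bar_seconds, len(line))
    return [(ch, bar_start + i * step) for i, ch in enumerate(line)]


if __name__ == "__main__":
    print(speed_value(5.0, 10))            # seconds per character
    print(onset_times(0.0, 5.0, "abcde"))  # one onset per character
```

Because the onsets depend only on the bar length and the character count, the sketch carries no per-character pitch or timing information.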

[0011] However, in the music reproducing device disclosed there, the character data recording unit 4 stores only characters, as shown in FIG. 2. The voice synthesis unit 5 can be instructed to convert these characters into phonemes, but it cannot be told what pitch to use, and the sounding timing of the phonemes cannot be finely controlled because they are spaced evenly within each bar. Because of these constraints, only back choruses for special kinds of songs can be realized.

[0012] A conventional synth karaoke music reproducing device that improves on this point is the karaoke device disclosed in, for example, Japanese Patent Application Laid-Open No. 7-140991.

[0013] This conventional music reproducing device is described below with reference to the drawings of that publication.

[0014] In FIG. 1 of the publication, 1 is a CPU, 2 is a RAM, 3 is a bus, 4 is a song data storage device, 5 is a panel I/F, 6 is a controller, 7 is a background video storage/playback unit, 8 is an image/lyrics display unit, 9 is a video selector, 10 is a monitor, 11 is a microphone, 12 is a mixer and effector, 13 is an amplifier and speaker, 14 is a sound source, 15 is a program ROM and sequencer, and 16 is a digital voice decoder.

[0015] In the same publication, FIG. 2 is a structural diagram of the song data, FIG. 3 is an explanatory diagram showing details of the performance data track, FIG. 4 is an explanatory diagram showing details of the voice data instruction track, FIG. 5 is a diagram of the time-series structure of each track, FIG. 6 is an explanatory diagram of the portion that decodes voice data, and FIG. 7 is a time chart of chorus voice control.

[0016] The operation of the conventional karaoke device configured as described above is explained below.

[0017] FIG. 1 is a block diagram of the entire conventional karaoke device. In this figure, 1 is a CPU (central processing unit) that controls and manages the operation of the entire system, 2 is a RAM (random access memory) used by the CPU 1 when controlling and managing the operation of the entire system, and 3 is a data and address bus that ties the entire system together.

[0018] 4 is a storage device holding multiple pieces of song data, for example an HDD (hard disk drive); 5 is a panel I/F (interface); 6 denotes controllers, such as multiple remote controls, that give instructions to the system through the panel I/F 5; 7 is a background video storage/playback unit; 8 is an image/lyrics display unit for displaying background still images and lyrics; 9 is a video selector that selects and combines the background video (moving images) from the background video storage/playback unit 7 and the images from the image/lyrics display unit 8; and 10 is a monitor that displays the image selected and combined by the video selector.

[0019] 11 is a microphone for inputting the singer's singing voice, and 12 is a mixer that mixes the singing voice with the musical tones of the song, together with an effector that applies various acoustic effects. 13 is a speaker that outputs the mixed singing voice and song after amplification by an amplifier. 14 is a sound source that produces the musical tones of the song, and 15 is a sequencer that controls the sound source 14 and the mixer and effector 12. The sequencer 15 has a program ROM storing the programs used by the CPU 1. 16 is a voice decoder that decodes encoded digital voice data (for example, PCM or ADPCM data).

[0020] The operation is as follows. When a song is specified through the controller 6, the CPU 1 consults the song list stored in the storage device 4, transfers the song data for that song and the encoded voice data for its back chorus to the RAM 2, and passes control to the sequencer 15. Based on the multiple kinds of event data contained in the song data, the sequencer 15 executes multiple events, including the song performance, simultaneously and in parallel.

[0021] Specifically, the sequencer 15 supplies the data on the timbres of the music and of the pseudo voice to the sound source 14, the encoded voice data to the digital voice decoder 16, the background video number to the background video storage/playback unit 7, and the lyrics number to the image/lyrics display unit 8. As a result, the background video is shown across the whole of the monitor 10 with the lyrics superimposed on part of it, while the speaker 13 outputs the song and the back chorus.

[0022] As shown in FIG. 2, the song data consists of a header section, a data sequence section, and a voice data section. The header section holds information specific to the song: song number, song title, composer name, singer name, background image selection information, and lyrics font information.

[0023] The data sequence section contains multiple tracks, each describing one of several kinds of events that are executed simultaneously and in parallel. The voice data section holds multiple voice numbers that are selected by the voice data instruction data in the data sequence section.

[0024] Among the tracks in the data sequence section, the performance data track describes, in time series, the data that makes the sound source 14 produce the musical tones of the song. The pseudo voice data track describes, in time series, the data that makes the sound source 14 produce pseudo chorus voices (for example "waa" and "uu"). The voice data instruction track describes the voice data number, which specifies the kind of true chorus voice (for example "Hakodate" and "Nagasaki") to be decoded by the decoder 16, together with the pitch and volume data. The lyrics data track describes, in time series, the data specifying which lyrics to display on the monitor 10 in step with the performance. The effect control data track describes, in time series, the control data for the mixer and effector 12.
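
The song data layout described in [0022] to [0024] can be pictured as nested records. The Python sketch below is only an illustration under stated assumptions: every class and field name is hypothetical, and the cited publication does not specify field types or any binary format.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Header:
    """Song-specific information held in the header section."""
    song_number: int
    title: str
    composer: str
    singer: str
    background_image_selection: str
    lyrics_font: str


@dataclass
class DataSequence:
    """One list of events per track; all tracks run in parallel."""
    performance: list = field(default_factory=list)        # musical tones
    pseudo_voice: list = field(default_factory=list)       # "waa", "uu"
    voice_instruction: list = field(default_factory=list)  # true chorus
    lyrics: list = field(default_factory=list)             # on-screen lyrics
    effect_control: list = field(default_factory=list)     # mixer/effector


@dataclass
class SongData:
    header: Header
    sequence: DataSequence = field(default_factory=DataSequence)
    # Encoded chorus recordings, keyed by voice number (assumed to be bytes).
    voice_data: dict = field(default_factory=dict)


def track_names():
    """Names of the tracks in the data sequence section."""
    return [f.name for f in fields(DataSequence)]
```

Keying the voice data by number mirrors the way the voice data instruction track refers to a recording by its voice data number.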

[0025] FIG. 3 is an explanatory diagram showing details of the performance data track. This track describes note events, timbre change events, and pitch bend events. A note event contains the CH number, which designates the channel (CH) of the sound source 14 that is to sound, the note number (pitch), the velocity (volume), and the note length. A timbre change event contains the CH number and timbre data. A pitch bend event contains the CH number and pitch bend information. The note events of the performance data track are for sounding the musical tones of the song; the pseudo voice data track has note events of the same structure, differing only in CH number.

[0026] FIG. 4 is an explanatory diagram showing details of the voice data instruction track. This track describes the information for each voice instruction event, namely the voice data number, pitch, and volume. The voice data number is the number of the encoded true chorus voice data to be decoded by the decoder 16, that is, the voice number of the voice data in FIG. 2. As shown in FIG. 5, each of the tracks is structured as a time series of event types, each paired with the waiting time Δt until the next event occurs.
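
The track structure of FIG. 5, a time series of events each followed by a waiting time Δt, can be turned into absolute event times by accumulating the waits. The sketch below is a minimal illustration; the function name `schedule` is hypothetical, and treating Δt as seconds at 100% tempo that scale inversely with the tempo percentage is an assumption, since the text does not state the unit of Δt.

```python
def schedule(track, tempo_percent=100.0):
    """Convert (event, delta_t) pairs into (start_time, event) pairs.

    delta_t is the wait AFTER an event until the next one begins.
    Assumption: delta_t is given for 100% tempo, so a slower tempo
    stretches every wait by 100 / tempo_percent.
    """
    if tempo_percent <= 0:
        raise ValueError("tempo must be positive")
    scale = 100.0 / tempo_percent
    now = 0.0
    timeline = []
    for event, delta_t in track:
        timeline.append((now, event))
        now += delta_t * scale
    return timeline


if __name__ == "__main__":
    track = [("note A", 1.0), ("note B", 2.0), ("note C", 0.0)]
    print(schedule(track))         # nominal tempo
    print(schedule(track, 50.0))   # half speed doubles every wait
```

Because every track uses the same representation, a sequencer can run several tracks in parallel by scheduling each one against a shared clock.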

[0027] FIG. 6 is an explanatory diagram of the portion that decodes voice data according to the voice instruction event. Using the voice data number of the voice instruction event, the sequencer 15 reads the encoded digital voice data with that number from the RAM 2 and feeds it to the digital voice decoder 16. In this conventional example the digital voice data is true chorus voice data. If, for example, the digital voice data is adaptive delta (AD) PCM data compressed to reduce its size, the decoder 16 is a PCM decoder that expands it by converting the bit width and the frequency.

[0028] A processor 17 that controls pitch and volume is placed after the decoder 16. It adjusts the pitch and volume of the decoded analog waveform according to the pitch and volume information in the voice instruction event and then feeds the result to the mixer and effector 12. The true chorus voice is played through this path only while the playback speed is 100% and the amount of pitch change is within a predetermined range. When the playback speed of the song is changed to anything other than 100% by an instruction from the controller 6, the processor 17 sets the volume of the output of the decoder 16 to 0 and stops playback of the true chorus voice. The same applies when the pitch change exceeds the predetermined amount of 200 to 300 cents.

[0029] The true chorus voice is not used after a pitch change for the following reason. If the pitch/volume control unit 17 changes the pitch of the true chorus voice by more than a certain amount, the voice quality deteriorates and the voice becomes practically unusable. As a rough guide, "a certain amount" is ±2 to 3 semitones (200 to 300 cents). The pseudo chorus voice is therefore used not only when the tempo (playback speed) is changed but also when the pitch is changed by the predetermined amount or more.
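
For reference, a cent is 1/1200 of an octave, so a pitch change of c cents corresponds to a frequency ratio of 2^(c/1200), and one equal-tempered semitone is 100 cents. The short Python sketch below evaluates this for the 200 to 300 cent range quoted above; the function names are hypothetical.

```python
def semitones_to_cents(semitones):
    """One equal-tempered semitone is 100 cents."""
    return semitones * 100


def cents_to_ratio(cents):
    """Frequency ratio for a pitch change given in cents."""
    return 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    for cents in (200, 300):
        ratio = cents_to_ratio(cents)
        print(f"{cents} cents -> frequency ratio {ratio:.4f}")
```

A shift of 200 cents therefore changes the frequency by roughly 12%, and 300 cents by roughly 19%.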

[0030] For these reasons, when the playback speed of the song is anything other than 100%, or when the pitch change exceeds the predetermined amount of 200 to 300 cents, the sequencer 15 makes the sound source 14 produce the pseudo chorus according to the data in the pseudo voice data track. For the pseudo chorus voice, channels for voices such as "waa" and "uu" are set up in the sound source 14 in advance in the same way as musical tones, and the pseudo voice data selects among them to produce sound.
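
The selection rule of this prior-art device can be summarized as a single predicate. The Python sketch below is an illustration only: the function name and the default threshold of 200 cents are assumptions (the text gives the limit as a range of 200 to 300 cents rather than one value).

```python
def use_true_chorus(tempo_percent, pitch_shift_cents, max_shift_cents=200.0):
    """Return True when the recorded (true) chorus may be played.

    The true chorus is used only at exactly 100% tempo and while the
    transposition stays within the allowed range; otherwise the
    sound source plays the pseudo chorus instead.
    """
    at_nominal_tempo = tempo_percent == 100
    within_pitch_range = abs(pitch_shift_cents) <= max_shift_cents
    return at_nominal_tempo and within_pitch_range


if __name__ == "__main__":
    print(use_true_chorus(100, 0))     # unchanged song
    print(use_true_chorus(70, 0))      # slowed down
    print(use_true_chorus(100, 350))   # transposed too far
```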

[0031] FIG. 7 shows how the sequencer 15 performs control when the tempo (playback speed) varies around 100%. In this example, the events of the voice data track change in order at times t1, t5, and t8, while the tempo changes from 100% to 70% at time t4, from 70% to 120% at time t6, and from 120% to 100% at time t7.

[0032] Because the tempo at time t1 is 100%, decoding of the true voice data "Hakodate" starts at this point according to the event in the voice data track, and the volume of the voice data is set to the predetermined value described in the voice event. The pseudo voice data track is not left unprocessed at this time: the voice event corresponding to the first syllable "ha" is read and its pseudo voice (for example "waa") is sounded by the sound source 14. In practice, however, the volume on the pseudo voice side is set to 0 so that nothing is output from the speaker 13. At the same time, the volume value for the true voice data (the predetermined value above) is stored for later restoration.

[0033] The tempo is still 100% at time t2, so output of the true chorus voice continues. The pseudo voice track enters the sounding event for "ko" at this time, but the volume on the pseudo voice side remains 0. The tempo is still 100% at time t3 as well, so output of the true chorus voice continues. The pseudo voice track enters the sounding event for "da", but the volume on the pseudo voice side remains 0.

[0034] At time t4 the tempo drops from 100% to 70%, so the true voice is faded out and the sounding channel of the pseudo voice is faded in to the previously stored predetermined value. Because the true voice may be in the middle of playback at this point, a gentle crossfade is used to avoid a dropout.

[0035] At time t5 the voice data side enters the next event ("shinjiru koto sa"), but because the tempo has not returned to 100%, the true voice is not sounded and its volume stays at 0; instead the sound source 14 produces the pseudo voice (for example "waa") for "shi". This pseudo voice is controlled using the CH number, note number (pitch), velocity (volume), and note length described in the event.

[0036] At time t6 the tempo changes from 70% to 120%, but since it is still not 100%, sounding of the pseudo voice (for example "waa") corresponding to "n" continues. At time t7 the tempo changes from 120% to 100%, which satisfies one condition for returning to the true voice. At this stage, however, decoding of the previous voice data event is still in progress, so the pseudo voice continues to sound until the next voice data event begins.

[0037] At time t8 the tempo is 100% and a new voice data event begins, so the true voice is faded in at this timing. Because this is the start of the true voice, an abrupt fade-in causes no problem. The pseudo voice also has a sounding part at this time, corresponding to "do", but the volume of its sounding channel is set to 0 to suppress it. With the sequencer 15 performing the control described above, a karaoke song with a chorus part can always be provided without timing drift even when the playback speed changes.
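
The switching behavior walked through for FIG. 7 amounts to a small state machine: leave the true chorus as soon as the tempo departs from 100%, and return to it only when the tempo is 100% and a new voice data event starts. The Python sketch below replays the t1 to t8 timeline under that reading. The function name, the tuple layout, and the assumption that playback starts on the pseudo voice are all hypothetical, and fades are omitted (only the audible source is tracked).

```python
def chorus_source_timeline(steps):
    """Track which chorus is audible at each step.

    steps: list of (label, tempo_percent, new_voice_event) tuples.
    Returns a list of (label, "true" or "pseudo") pairs.
    """
    source = "pseudo"  # assumed state before the first voice event
    timeline = []
    for label, tempo_percent, new_voice_event in steps:
        if tempo_percent != 100:
            # Off-nominal tempo: switch to the pseudo voice at once.
            source = "pseudo"
        elif new_voice_event:
            # Nominal tempo and a fresh event: true voice can start.
            source = "true"
        # Nominal tempo but mid-event: keep whatever is playing.
        timeline.append((label, source))
    return timeline


FIG7_STEPS = [
    ("t1", 100, True),   # "Hakodate" event starts
    ("t2", 100, False),
    ("t3", 100, False),
    ("t4", 70, False),   # tempo drops
    ("t5", 70, True),    # next event starts, tempo still off
    ("t6", 120, False),
    ("t7", 100, False),  # tempo restored, but mid-event
    ("t8", 100, True),   # new event at nominal tempo
]

if __name__ == "__main__":
    for label, source in chorus_source_timeline(FIG7_STEPS):
        print(label, source)
```

The step at t7 shows why restoring the tempo alone is not enough: the true voice resumes only at the next event boundary, t8.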

[0038]

Problems to Be Solved by the Invention: In the conventional configuration described above, however, changing the performance speed or the pitch switches the back chorus from one formed from PCM or compressed-PCM waveforms to a pseudo back chorus produced by the musical tone synthesizer, so the sound quality changes. In particular, whenever the performance speed or pitch is outside the specified values, the pseudo back chorus is used and the quality of the back chorus is degraded. The mismatch is especially noticeable because the PCM or compressed-PCM back chorus reproduces the lyrics, whereas the pseudo back chorus consists only of sounds such as "aa" and "waa".

[0039] In addition, the data volume of a back chorus in PCM or compressed-PCM format is far larger (100 times or more) than that of the performance information, generally called MIDI, that drives the musical tone synthesizer. In communication karaoke and similar systems, transmitting this data over a communication line for each song therefore takes a great deal of time, and standalone synth karaoke systems need a large data storage device. Avoiding this requires using the pseudo back chorus even when the performance speed and pitch are within the specified values.

[0040] The present invention solves these conventional problems. It provides a singing voice synthesizer that can synthesize a PCM-quality back chorus and, by applying it to synth karaoke, provides a music reproducing device in which the quality of the back chorus does not degrade even when the performance speed or pitch changes.

[0041]

Means for Solving the Problems: To solve these problems, the singing voice synthesizer of the present invention comprises: an assigner that outputs input performance information, consisting of phoneme, pitch, volume, and sounding timing, alternately to a first and a second output series; an interval difference generator that receives pitch and sounding timing from the first and second output series of the assigner and outputs first and second interval change values at its first and second outputs, respectively; first and second readers that each receive pitch, phoneme, and sounding timing from the first and second output series of the assigner, respectively, and an interval change value from the first and second outputs of the interval difference generator, respectively; first and second envelope formers that receive volume and sounding timing from the first and second output series of the assigner, respectively, and generate waveform envelope data; first and second waveform memories that store in advance voice waveforms for multiple phonemes and pitches and from which data is read by the first and second readers, respectively; first and second level converters that convert the level of the outputs of the first and second readers using the outputs of the first and second envelope formers, respectively; and an adder that sums the outputs of the first and second level converters.

[0042] The music reproducing device of the present invention comprises: a performance data memory that stores performance information for musical instruments and singing; a sequencer that outputs performance instruction information while controlling timing according to the performance information read from the performance data memory, a transposition input, and a performance speed input; a musical tone synthesizer that synthesizes musical tones according to the performance instructions of the sequencer; a singing voice synthesizer that synthesizes a singing voice according to the performance instructions of the sequencer; and an adder that adds the output of the musical tone synthesizer and the output of the singing voice synthesizer.

【００４３】[0043]

[Embodiments of the Invention] The invention according to claim 1 comprises: an assigner that alternately outputs performance information, consisting of an input phoneme, pitch, volume, and sounding timing, to first and second output series; first and second readers that are instructed of pitch, phoneme, and sounding timing by the first and second output series of the assigner, respectively; first and second envelope formers that are instructed of volume and sounding timing by the first and second output series of the assigner, respectively, and generate waveform envelope data; first and second waveform memories that store in advance speech waveforms for a plurality of phonemes and pitches and from which data is read by the first and second readers, respectively; first and second level converters that level-convert the outputs of the first and second readers with the outputs of the first and second envelope formers, respectively; and an adder that adds the outputs of the first and second level converters. The speech waveform for the specified phoneme and pitch is read by the two readers alternately, one phoneme at a time, and is output smoothly with crossfading. Because everything outside the crossfade portion is the original PCM sound unchanged, a high-quality singing voice can be synthesized.

[0044] Further, when there is a pitch difference between the preceding phoneme and the following phoneme, the pitch of the preceding phoneme is made to approach the following pitch gradually during the crossfade, so the change in pitch is smooth and a natural singing voice can be synthesized.

[0045] Furthermore, because the above singing voice synthesizer and a musical tone synthesizer are built into a karaoke machine to generate the back chorus, changing the performance pitch only changes the read position of the speech waveform, so a high-quality back chorus is obtained regardless of performance speed and pitch.

[0046] Embodiments of the present invention are described below with reference to the drawings.
(Embodiment 1) FIG. 1 is a block diagram showing the configuration of the singing voice synthesizer according to Embodiment 1 of the present invention. In FIG. 1, 100, 101, 102, and 103 are input terminals, 104 is an assigner, 105a and 105b are waveform memories, 106a and 106b are readers, 107a and 107b are envelope formers, 108a and 108b are level converters, 109 is an adder, 110 is an output terminal, and 111 is a pitch difference generator.

[0047] FIG. 2 shows the operation and configuration of the readers 106a and 106b in this embodiment. In FIG. 2(b), 701, 702, 703, and 708 are input terminals, 707 is an output terminal, 709 and 706 are adders, 704 is an address generator, and 705 is a read start address generator.

[0048] FIG. 3 illustrates the operation of the pitch difference generator 111 in this embodiment, FIG. 4 illustrates the data storage state of the waveform memories 105a and 105b, FIG. 5 illustrates loop reading of the waveform memories 105a and 105b, FIG. 6 illustrates the crossfade processing in singing voice synthesis, and FIG. 7 illustrates the operation of the assigner 104.

[0049] The operation of the singing voice synthesizer of this embodiment, configured as described above, is as follows.

[0050] Performance information for the singing voice to be synthesized, consisting of phoneme, pitch, volume, and sounding timing, is supplied to the assigner 104 from input terminals 100, 101, 102, and 103, respectively. The assigner 104 outputs the input performance information alternately to the first and second output series.
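The alternating distribution performed by the assigner can be illustrated with a minimal Python sketch. The names `NoteEvent` and `assign`, the use of MIDI-style note numbers for pitch, and the representation of timing in seconds are assumptions made for illustration; the specification does not define a data encoding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NoteEvent:
    """One item of performance information (hypothetical encoding)."""
    phoneme: str   # e.g. "a"
    pitch: int     # assumed MIDI-style note number
    volume: float  # assumed range 0.0 to 1.0
    onset: float   # sounding timing in seconds (assumption)


def assign(events: List[NoteEvent]) -> Tuple[List[NoteEvent], List[NoteEvent]]:
    """Send successive events alternately to the first and second output series."""
    first_series = events[0::2]   # 1st, 3rd, 5th, ... events
    second_series = events[1::2]  # 2nd, 4th, 6th, ... events
    return first_series, second_series


# Four successive phonemes at assumed pitches do, la, fa, sol.
score = [
    NoteEvent("a", 60, 1.0, 0.0),
    NoteEvent("shi", 69, 1.0, 0.5),
    NoteEvent("ta", 65, 1.0, 1.0),
    NoteEvent("no", 67, 1.0, 1.5),
]
first, second = assign(score)
```

With this input, the first series carries "a" and "ta" and the second carries "shi" and "no", so each series always has a free reader available while the other is still sounding the previous phoneme.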

[0051] The first reader 106a and the second reader 106b are connected to the first and second output series of the assigner 104, respectively, and receive pitch, phoneme, and sounding timing instructions from the assigner 104.

[0052] The readers 106a and 106b operate as follows. When a phoneme and pitch are specified and the start of sounding is instructed, the readers 106a and 106b read waveform data sequentially from the head address in the waveform memory at which the specified phoneme and pitch are stored.

[0053] In general, the first several hundred milliseconds of a speech waveform after the start of sounding form a transient portion, after which relatively similar waveforms appear repeatedly. It is known that this repeating portion can also be represented in several hundred milliseconds. That is, if the original speech waveform is as shown in FIG. 5(a), storing the sounding start portion plus the repeating portion in the waveform memory and reading the repeating portion in a loop, as shown in FIG. 5(b), reduces the waveform memory required to about 500 ms to 1 s per phoneme.

[0054] The readers 106a and 106b, which read the waveform memories 105a and 105b in this way, are configured as shown in FIG. 2. A phoneme is specified from input terminal 703 and a pitch from input terminal 708. The read start address generator 705 outputs a read start address according to the specified combination of phoneme and pitch.
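One simple way to picture the read start address generator 705 is as a lookup from a (phoneme, pitch) pair to the head address of the corresponding waveform. The sketch below is only an illustration under stated assumptions: the function name `build_start_addresses` is hypothetical, and laying the waveforms out in fixed-length slots is an assumption, since the specification does not describe the memory layout in this detail.

```python
from typing import Dict, List, Tuple


def build_start_addresses(
    phonemes: List[str], pitches: List[int], slot_length: int
) -> Dict[Tuple[str, int], int]:
    """Map each (phoneme, pitch) pair to an assumed head address.

    Waveforms are assumed to be stored back to back in fixed-length slots,
    phoneme by phoneme and, within each phoneme, pitch by pitch.
    """
    table: Dict[Tuple[str, int], int] = {}
    address = 0
    for phoneme in phonemes:
        for pitch in pitches:
            table[(phoneme, pitch)] = address
            address += slot_length
    return table


start_addresses = build_start_addresses(["a", "shi"], [60, 61], 1000)
```

In this layout the pair ("shi", 60) starts at address 2000, directly after the two slots used by the phoneme "a".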

[0055] At the same time, the sounding timing is supplied from input terminal 701 and the pitch difference value from input terminal 702. The adder 709 adds the pitch difference value to the specified pitch value and outputs the sum as ΔA. ΔA is supplied to the address generator 704, which essentially integrates ΔA. However, each time the integrated value exceeds the waveform data length, a value corresponding to the length of the repeating portion is subtracted from it, producing address values as shown in FIG. 2(a). The adder 706 adds this address value to the output of the read start address generator 705, and the resulting address for the waveform memory is output at output terminal 707.
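The address generation described above (integrate ΔA, wrap back by the loop length, then add the read start address) can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 2: the name `generate_addresses` and its parameters are hypothetical, ΔA is treated as a constant address increment per output sample, the wrap test uses "greater than or equal to" so that offsets stay inside the stored data, and fractional addresses are simply truncated. All of these are assumptions.

```python
from typing import List


def generate_addresses(
    delta_a: float,
    start_address: int,
    data_length: int,
    loop_length: int,
    n_samples: int,
) -> List[int]:
    """Produce waveform memory addresses with loop reading.

    delta_a       : address increment per output sample (assumed constant here)
    start_address : head address of the selected phoneme/pitch waveform
    data_length   : length of the stored waveform (start portion + loop portion)
    loop_length   : length of the repeating portion at the end of the waveform
    """
    accumulator = 0.0  # integrated value of delta_a
    addresses = []
    for _ in range(n_samples):
        addresses.append(start_address + int(accumulator))
        accumulator += delta_a
        # On running past the stored data, jump back by the loop length.
        while accumulator >= data_length:
            accumulator -= loop_length
    return addresses


addrs = generate_addresses(
    delta_a=1.0, start_address=100, data_length=6, loop_length=3, n_samples=10
)
```

Here the first six addresses run through the whole stored waveform (100 to 105) and the following ones cycle over the last three samples (103, 104, 105), which is the sawtooth-like pattern the text attributes to FIG. 2(a). A larger `delta_a` steps through the same data faster and so raises the pitch.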

[0056] The pitch difference generator 111 receives pitch and sounding timing instructions from both the first and second output series of the assigner 104. As shown in FIG. 3, when a transition is made from a preceding phoneme to a following phoneme, the pitch difference generator 111 produces time-varying first and second correction values as its first and second outputs so that the pitch of the preceding phoneme gradually approaches the pitch of the following phoneme. These outputs are supplied as the pitch change value inputs of the first reader 106a and the second reader 106b, respectively.

[0057] Further, the first envelope former 107a and the second envelope former 107b, which generate waveform envelope data, are connected to the first and second output series of the assigner 104, respectively, and receive volume and sounding timing instructions from them.

[0058] The first and second waveform memories 105a and 105b, which are read by the first reader 106a and the second reader 106b, store waveform data for all phonemes and all pitches that are normally pronounceable or needed for performance in Japanese, English, and other languages, as shown in FIG. 4. The first reader 106a and the second reader 106b sequentially read the waveform data for the specified pitch and phoneme.

[0059] The waveform data read by the first reader 106a and the second reader 106b are level-converted by the level converters 108a and 108b according to the output values of the first and second envelope formers 107a and 107b, respectively.

[0060] The outputs of the level converters 108a and 108b are then added by the adder 109, and the synthesized singing voice output is obtained at output terminal 110.

[0061] As an example of the above operation, consider synthesizing the singing voice shown in the score in FIG. 7.

[0062] In this example, the input of the assigner 104 and the outputs of its first and second output series are as shown in FIG. 7.

[0063] The output of the assigner 104 is supplied to the first reader 106a and the second reader 106b. The first reader 106a reads the phoneme "a" (あ, pitch do) from time t1 and the phoneme "ta" (た, pitch fa) from time t3. Likewise, the second reader 106b reads the phoneme "shi" (し, pitch la) from time t2 and the phoneme "no" (の, pitch sol) from time t4.

[0064] The output of the assigner 104 is also supplied to the first envelope former 107a and the second envelope former 107b, which generate the waveform envelopes shown as Ea and Eb in FIG. 6, respectively. That is, the envelopes are generated so that the two phoneme waveforms are crossfaded where one phoneme gives way to the next.

[0065] The waveform envelopes Ea and Eb formed in this way, together with the output Sa of the reader 106a and the output Sb of the reader 106b, are supplied to the level converters 108a and 108b, respectively.

[0066] The level converters 108a and 108b compute Ea × Sa and Eb × Sb, respectively. Their outputs are added by the adder 109, which completes the singing voice synthesis, and the output waveform is obtained at output terminal 110.
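The level conversion and addition described in paragraphs [0064] to [0066] amount to computing Ea × Sa + Eb × Sb sample by sample. The sketch below shows this with complementary linear envelopes; the linear shape, the function names `crossfade_envelopes` and `mix`, and the sample-index parameters are assumptions for illustration, since the actual envelope shapes are given only in FIG. 6.

```python
from typing import List, Tuple


def crossfade_envelopes(
    n_total: int, fade_start: int, fade_length: int
) -> Tuple[List[float], List[float]]:
    """Return (Ea, Eb): Ea falls from 1 to 0 while Eb rises from 0 to 1.

    A linear ramp over fade_length samples is assumed.
    """
    ea, eb = [], []
    for i in range(n_total):
        if i < fade_start:
            gain = 0.0
        elif i >= fade_start + fade_length:
            gain = 1.0
        else:
            gain = (i - fade_start) / fade_length
        ea.append(1.0 - gain)
        eb.append(gain)
    return ea, eb


def mix(sa: List[float], sb: List[float], ea: List[float], eb: List[float]) -> List[float]:
    """Level-convert each reader output by its envelope and add the results."""
    return [e_a * s_a + e_b * s_b for s_a, s_b, e_a, e_b in zip(sa, sb, ea, eb)]


ea, eb = crossfade_envelopes(n_total=8, fade_start=2, fade_length=4)
out = mix([2.0] * 8, [4.0] * 8, ea, eb)
```

Outside the fade region only one of the two products is non-zero, so the output there is the stored PCM waveform of a single phoneme unchanged, which is the property the specification relies on for sound quality.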

[0067] At the same time, the pitch difference generator 111 operates as follows. On receiving the output of the assigner 104, the pitch difference generator 111 generates the first and second correction outputs as shown in FIG. 3. For example, when sounding of the phoneme "shi" (pitch la) starts at time t2, the phoneme "a" (pitch do) is being sounded immediately before, and there is a pitch difference of 900 cents between them. To reduce this pitch difference gradually, the pitch difference generator 111 supplies the first correction output to the first reader 106a as shown in FIG. 3.

[0068] As described above, the readers 106a and 106b reflect the sum of the specified pitch value and the correction value in the read speed of the waveform data, so the phoneme "a" changes gradually from pitch do to pitch la. The pitch of the preceding phoneme thus approaches the pitch of the next phoneme smoothly and, in combination with the crossfade processing described above, the phonemes are connected extremely smoothly, so a high-quality singing voice can be synthesized.
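The 900-cent example can be made concrete with a short sketch. By the standard definition of the cent (1200 cents per octave), a pitch offset of c cents corresponds to a read speed ratio of 2^(c/1200). The linear ramp shape and the names `cents_to_ratio` and `pitch_correction_ramp` are assumptions for illustration; the actual correction curve is shown only in FIG. 3.

```python
from typing import List


def cents_to_ratio(cents: float) -> float:
    """Convert a pitch offset in cents to a read speed (frequency) ratio."""
    return 2.0 ** (cents / 1200.0)


def pitch_correction_ramp(diff_cents: float, n_steps: int) -> List[float]:
    """Correction values for the preceding phoneme, rising from 0 to diff_cents.

    A linear ramp over n_steps control steps is assumed.
    """
    if n_steps < 2:
        return [diff_cents]
    return [diff_cents * i / (n_steps - 1) for i in range(n_steps)]


# Preceding phoneme at do, following phoneme at la: 900 cents higher.
ramp = pitch_correction_ramp(900.0, 4)
ratios = [cents_to_ratio(c) for c in ramp]
```

At the end of the ramp the preceding phoneme is read about 1.68 times faster than at its nominal pitch, which places it at the pitch of the following phoneme just as the crossfade completes.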

[0069] As described above, according to this embodiment, the waveform memories 105a and 105b, which store in advance waveform data for a plurality of pitches and phonemes, are read alternately by the two readers 106a and 106b. The envelope formers 107a and 107b generate envelopes that crossfade the read outputs, and the outputs of the readers 106a and 106b are level-converted with these envelopes and added. In addition, the pitch difference generator 111 controls the readers 106a and 106b so that, when the pitches of successive phonemes differ, the pitch of the preceding phoneme gradually approaches that of the following phoneme during the transition. As a result, the phonemes are connected extremely smoothly and a high-quality singing voice can be synthesized.

[0070] Although this embodiment uses separate waveform memories 105a and 105b, readers 106a and 106b, envelope formers 107a and 107b, and level converters 108a and 108b, it goes without saying that these can be implemented in a single piece of hardware by time-division multiplexing.

[0071] Also, although level conversion is performed here by multiplication, a bit shift or the like may be used instead.
(Embodiment 2) Next, Embodiment 2 of the present invention is described with reference to the drawings.

[0072] FIG. 8 is a block diagram showing the configuration of the music reproducing device according to this embodiment. In FIG. 8, 200 is a performance data memory, 201 is a sequencer, and 202 is a singing voice synthesizer, which is the same as that described in Embodiment 1. 203 is a musical tone synthesizer, 204 is an adder, 205 is an output terminal, 206 is a transposition input terminal, and 207 is a performance speed input terminal.

[0073] The operation of the music reproducing device of this embodiment, configured as described above, is as follows.

[0074] Performance information is stored in the performance data memory 200. Its storage format is as described for the conventional example and is not repeated here. The sequencer 201 reads the performance information from the performance data memory 200. For a singing voice part, it supplies the singing voice synthesizer 202 with performance information consisting of the phoneme, pitch, volume, and sounding timing of the singing voice to be synthesized. For an instrument part, it supplies the musical tone synthesizer 203 with sounding information consisting of pitch, volume, and sounding timing.

[0075] The outputs of the singing voice synthesizer 202 and the musical tone synthesizer 203 are added by the adder 204, so the singing voice and the instrument sound are obtained together at output terminal 205.

[0076] 206 is the input terminal for the transposition value. For example, when the value 1 is input, the sequencer 201 raises the pitch instruction by one semitone and supplies it to the singing voice synthesizer 202 and the musical tone synthesizer 203. 207 is the input terminal for the performance speed value; the sequencer speeds up or slows down the performance according to the value supplied here.
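The effect of the two control inputs on the sequencer output can be sketched as a pure transformation of the note events. The function name `apply_key_and_tempo`, the event layout of (onset in seconds, pitch in semitones), and the interpretation of the speed value as a multiplicative ratio are assumptions for illustration; the specification only says that the value 1 on terminal 206 raises the pitch by a semitone and that the value on terminal 207 speeds up or slows down the performance.

```python
from typing import List, Tuple

Event = Tuple[float, int]  # (onset in seconds, pitch in semitones), assumed layout


def apply_key_and_tempo(
    events: List[Event], transpose_semitones: int, speed_ratio: float
) -> List[Event]:
    """Shift every pitch by the transposition value and rescale every onset.

    speed_ratio > 1 plays faster, speed_ratio < 1 plays slower (assumed encoding).
    """
    return [
        (onset / speed_ratio, pitch + transpose_semitones)
        for onset, pitch in events
    ]


original = [(0.0, 60), (1.0, 69)]
shifted = apply_key_and_tempo(original, transpose_semitones=1, speed_ratio=2.0)
```

Only the instructions sent to the synthesizers change. The stored waveform data is untouched, which is why the following paragraphs can state that transposition and speed changes do not degrade the synthesized voice.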

[0077] Because this embodiment uses a singing voice synthesizer 202 with the same configuration as in Embodiment 1, when a transposition instruction is given at input terminal 206 and the sequencer 201 changes the pitch instruction value, the addresses used to read the waveform memories 105a and 105b change.

[0078] However, this does not change the quality of the waveform data that is read, so the output of the singing voice synthesizer does not degrade whether or not transposition is applied.

[0079] Similarly, when the performance speed value changes, only the timing information supplied to the readers 106a and 106b, the envelope formers 107a and 107b, and so on changes. As with transposition, this does not change the quality of the waveform data that is read, so the output of the singing voice synthesizer does not degrade in this case either.

[0080] As described above, by providing the performance data memory 200 storing performance information, the sequencer 201, the singing voice synthesizer 202 according to Embodiment 1, and the musical tone synthesizer 203, music can be reproduced without degrading the quality of the singing voice even when the performance speed or pitch is changed. The quality of the music therefore does not degrade even when key control or similar operations are used during music playback such as karaoke.

【００８１】[0081]

[Effects of the Invention] As described above, the present invention comprises: an assigner that alternately outputs performance information, consisting of a phoneme, pitch, volume, and sounding timing, to first and second output series; a pitch difference generator that is instructed of pitch and sounding timing by the first and second output series of the assigner and outputs first and second pitch change values at its first and second outputs, respectively; first and second readers that are instructed of pitch, phoneme, and sounding timing by the first and second output series of the assigner, respectively, and of a pitch change value by the first and second outputs of the pitch difference generator, respectively; first and second envelope formers that are instructed of volume and sounding timing by the first and second output series of the assigner, respectively, and generate waveform envelope data; first and second waveform memories that store in advance speech waveforms for a plurality of phonemes and pitches and from which data is read by the first and second readers, respectively; first and second level converters that level-convert the outputs of the first and second readers with the outputs of the first and second envelope formers, respectively; and an adder that adds the outputs of the first and second level converters. With this configuration, the speech waveform for the specified phoneme and pitch is read by the two readers alternately, one phoneme at a time, and is output smoothly with crossfading. Because everything outside the crossfade portion is the original PCM sound unchanged, a high-quality singing voice can be synthesized. Moreover, when there is a pitch difference between the preceding phoneme and the following phoneme, the pitch of the preceding phoneme is made to approach the following pitch gradually during the crossfade, so the change in pitch is smooth and a natural singing voice can be synthesized.

[0082] Further, by providing a performance data memory that stores performance information for musical instruments and singing, a sequencer that outputs performance instruction information while controlling timing according to the performance information read from the performance data memory, a transposition input, and a performance speed input, a musical tone synthesizer that synthesizes musical tones according to the performance instructions of the sequencer, a singing voice synthesizer that synthesizes a singing voice according to the performance instructions of the sequencer, and an adder that adds the output of the musical tone synthesizer and the output of the singing voice synthesizer, music with a high-quality singing voice can be reproduced even when the pitch or speed of the performance is changed. Applying this to the music reproducing device of a synthesizer karaoke machine provides a synthesizer karaoke machine in which the quality of the back chorus does not degrade even when the performance speed or pitch changes, which is an excellent practical effect.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the singing voice synthesizer according to Embodiment 1 of the present invention.

FIG. 2 is an explanatory diagram showing the operation and configuration of the reader in the same embodiment.

FIG. 3 is an explanatory diagram illustrating the operation of the pitch difference generator in the same embodiment.

FIG. 4 is an explanatory diagram of the data storage state of the waveform memory in the same embodiment.

FIG. 5 is an explanatory diagram of loop reading of the waveform memory in the same embodiment.

FIG. 6 is an explanatory diagram of the crossfade processing in singing voice synthesis in the same embodiment.

FIG. 7 is an explanatory diagram of the operation of the assigner in the same embodiment.

FIG. 8 is a block diagram showing the configuration of the music reproducing device according to Embodiment 2 of the present invention.

[Description of Reference Signs]
104 Assigner
105a, 105b Waveform memory
106a, 106b Reader
107a, 107b Envelope former
108a, 108b Level converter
109 Adder
111 Pitch difference generator
200 Performance data memory
201 Sequencer
202 Singing voice synthesizer
203 Musical tone synthesizer
204 Adder

Claims

[Claims]

1. A singing voice synthesizer comprising: an assigner that alternately outputs performance information consisting of an input phoneme, pitch, volume, and sounding timing to first and second output series; first and second readers that are instructed of pitch, phoneme, and sounding timing by the first and second output series of the assigner, respectively; first and second envelope formers that are instructed of volume and sounding timing by the first and second output series of the assigner, respectively, and generate waveform envelope data; first and second waveform memories that store in advance speech waveforms of a plurality of phonemes and pitches and from which data is read by the first and second readers, respectively; first and second level converters that level-convert the outputs of the first and second readers with the outputs of the first and second envelope formers, respectively; and an adder that adds the outputs of the first and second level converters.

2. A singing voice synthesizer comprising: an assigner that alternately outputs performance information consisting of an input phoneme, pitch, volume, and sounding timing to first and second output series; a pitch difference generator that is instructed of pitch and sounding timing by the first and second output series of the assigner and outputs first and second pitch change values at its first and second outputs, respectively; first and second readers that are instructed of pitch, phoneme, and sounding timing by the first and second output series of the assigner, respectively, and of a pitch change value by the first and second outputs of the pitch difference generator, respectively; first and second envelope formers that are instructed of volume and sounding timing by the first and second output series of the assigner, respectively, and generate waveform envelope data; first and second waveform memories that store in advance speech waveforms of a plurality of phonemes and pitches and from which data is read by the first and second readers, respectively; first and second level converters that level-convert the outputs of the first and second readers with the outputs of the first and second envelope formers, respectively; and an adder that adds the outputs of the first and second level converters.

3. A music reproducing device comprising: a performance data memory that stores performance information of musical instruments and singing; a sequencer that outputs performance instruction information while controlling timing according to the performance information read from the performance data memory, a transposition input, and a performance speed input; a musical tone synthesizer that synthesizes musical tones according to the performance instructions of the sequencer; a singing voice synthesizer that synthesizes a singing voice according to the performance instructions of the sequencer; and an adder that adds the output of the musical tone synthesizer and the output of the singing voice synthesizer.

4. The music reproducing device according to claim 3, wherein the singing voice synthesizer has the configuration according to claim 1 or 2.