JP3173310B2

JP3173310B2 - Harmony generator

Info

Publication number: JP3173310B2
Application number: JP04176795A
Authority: JP
Inventors: Yasuo Kageyama (蔭山保夫); Shuichi Matsumoto (松本秀一)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1995-03-01
Filing date: 1995-03-01
Publication date: 2001-06-04
Anticipated expiration: 2016-06-04
Also published as: JPH08234784A

Abstract

PURPOSE: To provide a karaoke device that generates a harmony voice signal even when the pitch of the input voice signal cannot be detected. CONSTITUTION: A singing voice signal input to a voice processing DSP 30 is supplied to a peak detection part 41, a phoneme detection part 42, an average volume detection part 43, and a multiplier 45. The multiplier 45 multiplies the singing voice signal by a window function, and the one-period waveform element data thus cut out are stored in a memory 46. A harmony data readout control part 48 accesses the memory 46; the signal obtained by repeatedly reading out the waveform element data at intervals corresponding to a harmony frequency is the harmony voice signal. The window function is one period long, as given by the melody data, and its start timing is controlled so that the peak detected by the peak detection part 41 falls at the center of the window. The window function generation part 44 cuts out waveform element data at intervals of several tens of ms, so that waveform element data matching the current timbre are written into the memory 46; when the phoneme changes, the phoneme detection part 42 notifies the window function generation part 44, which then generates a window function.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a harmony generation device that generates a harmony voice signal for a singing or performance voice signal, and more particularly to a harmony generation device capable of generating an accurate harmony voice signal even when the frequency of the voice signal cannot be detected.

[0002]

[Prior Art] Some karaoke devices now in practical use have a function that adds a harmony voice (for example, a melody a third above the main melody) to the singer's voice and outputs it, to liven up the singing or to make it sound better. Such harmony functions commonly form the harmony voice signal by shifting the frequency of the singer's voice signal input from the microphone, so that the harmony voice has a timbre very similar to the singer's voice and the same tempo.

[0003]

[Problem to Be Solved by the Invention] However, the harmony technique described above detects the fundamental frequency of the singer's singing voice signal and performs the shift based on that fundamental frequency. It therefore has the drawback that no harmony voice signal can be formed when the frequency of the input singing voice signal cannot be detected.

[0004] An object of the present invention is to provide a harmony generation device that can form a harmony voice signal even when the frequency of the input voice signal cannot be detected.

[0005]

[Means for Solving the Problem] The invention of claim 1 of this application comprises: voice signal input means for inputting a singing or performance voice signal; melody information supply means for supplying melody information, which is frequency information of the voice signal; harmony frequency supply means for supplying a harmony frequency, which is a frequency consonant with the frequency of the voice signal; waveform element data extraction means for cutting out from the voice signal, as waveform element data, a waveform one period of the melody information long; and harmony forming means for forming a harmony voice signal by repeatedly reading out the waveform element data at the harmony frequency.

[0006] The invention of claim 2 of this application comprises: voice signal input means for inputting a singing or performance voice signal; frequency detection means for detecting the frequency of the voice signal input from the voice signal input means; melody information supply means for supplying melody information, which is frequency information of the voice signal; harmony frequency supply means for supplying a harmony frequency, which is a frequency consonant with the singing or performance; waveform element data extraction means that, when the frequency detection means has detected the frequency of the voice signal, cuts out from the voice signal a waveform one period of that frequency long as waveform element data, and, when the frequency detection means cannot detect the frequency of the voice signal, cuts out a waveform one period of the melody information long as waveform element data; and harmony forming means for forming a harmony voice signal by repeatedly reading out the waveform element data at the harmony frequency.

[0007] The invention of claim 3 of this application is characterized in that the melody information supply means consists of melody information storage means for storing a plurality of pieces of melody information in time series and means for sequentially reading the melody information from the melody information storage means and supplying it to the waveform element data extraction means, and the harmony frequency supply means consists of harmony frequency storage means for storing a plurality of harmony frequencies in time series and means for sequentially reading the harmony frequencies from the harmony frequency storage means and supplying them to the harmony forming means.

[0008]

[Operation] The harmony generation device of claim 1 receives a singing or performance voice signal and cuts out from it, as waveform element data, a waveform one period of the melody information long. The melody information is supplied by the melody information supply means. Although the waveform element data span only one period of the fundamental frequency, they contain more than one period (two or more periods) of each harmonic component, so the overtone characteristics (formants) of the input voice signal are preserved. The waveform element data are read out repeatedly at the harmony frequency to form a harmony voice signal. The harmony frequency supply means supplies, as the harmony frequency, a frequency consonant with the frequency of the input voice signal, for example a frequency related to it by a third (4/3 times) or a fifth (3/2 times). Because the harmony voice signal repeats the waveform element data at the harmony frequency, its fundamental frequency becomes the harmony frequency; but because the harmonic components of the waveform element data are included unchanged, the formants of the input voice signal are reproduced and the harmony has the same timbre.
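The operation described above (cut one period, then repeat it at the harmony frequency) can be sketched in Python as below. This is a minimal illustration under assumptions, not the patented implementation: the 8 kHz sample rate, the 200 Hz melody pitch, the fifth (3/2) harmony, and the names `cut_element` and `repeat_at_harmony` are all hypothetical. The copies are overlap-added with a sin² taper (the window the embodiment adopts later) so that copies spaced closer than one period blend smoothly.

```python
import math

def cut_element(signal, start, period):
    """Cut one period of `signal` starting at `start`, tapered by sin^2."""
    return [signal[start + i] * math.sin(math.pi * i / period) ** 2
            for i in range(period)]

def repeat_at_harmony(element, harmony_period, length):
    """Overlap-add one copy of `element` every `harmony_period` samples."""
    out = [0.0] * length
    for onset in range(0, length, harmony_period):
        for i, v in enumerate(element):
            if onset + i >= length:
                break
            out[onset + i] += v
    return out

# Hypothetical example: 8 kHz sampling, melody note at 200 Hz (40 samples
# per period), harmony a fifth above (300 Hz, about 27 samples per period).
FS = 8000
voice = [math.sin(2 * math.pi * 200 * n / FS)
         + 0.5 * math.sin(2 * math.pi * 400 * n / FS) for n in range(400)]
element = cut_element(voice, 0, round(FS / 200))
harmony = repeat_at_harmony(element, round(FS / 300), 800)
```

Once the first few copies overlap, the output repeats every 27 samples (about 296 Hz), while each copy still carries the harmonics captured from the voice, which is roughly the formant-preserving effect the paragraph describes.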

[0009] The harmony generation device of claim 2 detects the frequency of the input voice signal and determines from it the one period that serves as the unit for cutting out waveform element data. When the frequency cannot be detected, one period of the melody information supplied by the melody information supply means is used as the cutting unit, as in claim 1. Determining the period from the input voice signal allows waveform element data of a more ideal length to be cut out; and even when the frequency cannot be detected because the pitch of the singing is unstable, waveform element data can still be cut out on the basis of the melody information, so formation of the harmony voice signal is never interrupted.

[0010] In the invention of claim 3, the melody information supply means consists of melody information storage means for storing a plurality of pieces of melody information in time series and means for sequentially reading the melody information from the melody information storage means and supplying it to the waveform element data extraction means, and the harmony frequency supply means consists of harmony frequency storage means for storing a plurality of harmony frequencies in time series and means for sequentially reading the harmony frequencies from the harmony frequency storage means and supplying them to the harmony forming means. This makes it possible to supply melody information and harmony frequencies synchronized with the progress of the singing or performance.
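One way to picture the time-series storage of claim 3 is a list of time-stamped values read forward as playback advances. The sketch below assumes a hypothetical `(time_in_seconds, frequency_hz)` layout and a hypothetical class name `SequentialReader`; in the embodiment this information is held as track data in the music data.

```python
class SequentialReader:
    """Hands out time-stamped values in order as playback time advances."""

    def __init__(self, events):
        # events: (time_s, value) pairs; kept sorted by time
        self.events = sorted(events)
        self.index = -1  # no event reached yet

    def advance(self, now):
        """Return the value in force at time `now` (None before the first event)."""
        while (self.index + 1 < len(self.events)
               and self.events[self.index + 1][0] <= now):
            self.index += 1
        return self.events[self.index][1] if self.index >= 0 else None

# One reader for the melody information, one for the harmony frequencies.
melody = SequentialReader([(0.0, 220.0), (1.0, 246.9)])
harmony = SequentialReader([(0.0, 330.0), (1.0, 370.0)])
```

Calling `melody.advance(t)` and `harmony.advance(t)` with the same playback time keeps both values in step with the song. The reader only moves forward, matching the sequential readout described for claim 3.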

[0011]

[Embodiment] A karaoke device with a harmony adding function according to an embodiment of the present invention will be described with reference to the drawings. This karaoke device is a sound-source karaoke device with a communication function and a harmony adding function. A sound-source karaoke device generates karaoke accompaniment by driving a sound source device with music data. The music data are sequence data made up of multiple tracks, such as performance data strings specifying pitch and sounding timing. The communication function connects to a host station over a communication line, downloads music data from the host station, and stores them on the hard disk device 17 (see FIG. 1). The hard disk device 17 can store music data for several hundred to several thousand songs. The harmony adding function adds to the singer's singing voice signal a harmony voice signal at an interval of a third or a fifth. The harmony voice signal is generated by pitch-shifting the singer's singing voice.

[0012] FIG. 1 is a block diagram of the karaoke device. Connected via a bus to the CPU 10, which controls the operation of the whole device, are a ROM 11, a RAM 12, a hard disk drive (HDD) 17, an ISDN controller 16, a remote control receiver 13, a display panel 14, a panel switch 15, a sound source device 18, an audio data processing unit 19, an effect DSP 20, a character display unit 23, an LD changer 24, a display control unit 25, and a voice processing DSP 30.

[0013] The ROM 11 stores a system program, application programs, a loader, and font data. The system program controls the basic operation of the device and data exchange with peripheral devices. The application programs include peripheral device control programs, sequence programs, and the like. During a karaoke performance, the CPU 10 executes a sequence program to generate musical tones and play back video based on the music data. The loader is a program for downloading music data from the host station. The font data are used to display lyrics, song titles, and the like, and include fonts in several typefaces such as Mincho and Gothic. A work area is allocated in the RAM 12, and a music data file is set up on the HDD 17.

[0014] The ISDN controller 16 communicates with the host station over an ISDN line and downloads music data and the like from it. The ISDN controller 16 also has a built-in DMA circuit and writes downloaded music data and application programs directly to the HDD 17 without going through the CPU 10.

[0015] The remote control receiver 13 receives the infrared signal sent from the remote controller 31 and restores the data. The remote controller 31 has command switches such as a song selection switch, numeric key switches, and the like; when the user operates these switches, it transmits an infrared signal modulated with a code corresponding to the operation. The display panel 14, on the front of the karaoke device, displays the code of the song currently playing, the number of reserved songs, and the like. The panel switch 15 is provided on the front operation section of the karaoke device and includes a song code input switch, a key change switch, and the like.

[0016] During a karaoke performance, the sound source device 18 forms tone signals based on event data input from the CPU 10. The event data are stored in the tone tracks of the music data. Since a plurality of tone tracks are provided, as shown in FIG. 4, several streams of event data are input in parallel; the sound source device 18 receives them and forms several tone signals simultaneously. The audio data processing unit 19 forms an audio signal of a specified length and specified pitch from audio data, which are ADPCM data included in the music data. The audio data are signal waveforms that the sound source device 18 has difficulty generating electronically, such as backing chorus and model singing, digitized and stored as they are.

[0017] Meanwhile, the singing voice signal input from the singing microphone 27 is amplified by the preamplifier 28, converted to a digital signal by the A/D converter 29, and then input to the effect DSP 20 and the voice processing DSP 30. The voice processing DSP 30 also receives main melody information and harmony information from the CPU 10. Based on this information, the voice processing DSP 30 cuts out waveform element data from the singer's singing voice signal and synthesizes the waveform element data into a harmony voice signal, which it outputs to the effect DSP 20.

[0018] The effect DSP 20 receives the tone signals formed by the sound source device 18, the audio signal formed by the audio data processing unit 19, the singing voice signal digitized by the A/D converter 29, and the harmony voice signal formed by the voice processing DSP 30. The effect DSP 20 applies effects such as reverb and echo to these input signals. The type and depth of the effects are controlled by the event data (DSP control data) of the effect track of the music data; the CPU 10 inputs the DSP control data to the effect DSP 20 at predetermined timings according to a DSP control sequence program. The tone and voice signals with effects applied are converted to analog signals by the D/A converter 21 and output to the amplifier/speaker 22, which amplifies and emits them.

[0019] The character display unit 23 generates character patterns such as song titles and lyrics from the input character data. The LD changer 24 plays back the background video on the corresponding laser disc based on the input video selection data (chapter number). The video selection data are determined from the genre data of the karaoke song and the like. The genre data are written in the header of the music data and are read by the CPU 10 when the karaoke performance starts. The CPU 10 decides which background video to play based on the genre data and outputs video selection data specifying that video to the LD changer 24. The LD changer 24 holds about five laser discs and can play about 120 scenes of background video; one of them is selected by the video selection data and output as video data. The character patterns and video data are input to the display control unit 25, which superimposes them and displays the result on the monitor 26.

[0020] Next, the structure of the music data used for karaoke performance in this karaoke device will be described with reference to FIGS. 3 to 5. FIG. 3 shows the structure of the music data, and FIGS. 4 and 5 show it in detail.

[0021] In FIG. 3, one piece of music data consists of a header, tone tracks, a main melody track, a harmony track, a lyrics track, an audio track, an effect track, and an audio data section.

[0022] The header holds various data about the music data, such as the song title, genre, release date, and playing time (length). When executing the main sequence program, the CPU 10 determines the background video to be shown on the monitor 26 from the genre data and sends the chapter number of that video to the LD changer 24. For example, a snowy-country scene is selected for a winter-themed enka song, and a foreign scene for a pop song.

[0023] As shown in FIGS. 4 and 5, each track from the tone tracks to the effect track consists of sequence data made up of multiple pieces of event data and duration data Δt indicating the time interval between events. During a karaoke performance, the CPU 10 reads the event data of each track according to a sequence program. The sequence program counts Δt with a predetermined tempo clock; when Δt has been counted out, it treats this as the read timing of the following event data, reads those event data, and outputs them to the appropriate processing unit.
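The Δt-driven readout can be sketched as an accumulator that turns relative durations into absolute times. The clock resolution (`ticks_per_beat`) and the tempo in beats per minute are assumptions introduced for illustration; the patent says only that Δt is counted with a predetermined tempo clock.

```python
def schedule(track, tempo_bpm, ticks_per_beat):
    """Yield (seconds, event) for a track of (delta_t, event) pairs.

    delta_t is the duration data, in clock ticks since the previous event.
    """
    tick = 0
    for delta_t, event in track:
        tick += delta_t  # count off delta_t; the next event is then due
        yield tick * 60.0 / (tempo_bpm * ticks_per_beat), event

# Hypothetical track: 480 ticks per beat at 120 beats per minute.
track = [(0, "note on"), (480, "note off"), (480, "note on")]
timeline = list(schedule(track, tempo_bpm=120, ticks_per_beat=480))
```

A real-time player would instead wait until the clock reaches each computed time before dispatching the event, but the accumulation of Δt is the same.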

[0024] The tone tracks include tracks for various parts, starting with a melody track and a rhythm track. The CPU 10 outputs the event data read by the tone sequence program to the sound source device 18, which selects a sounding channel based on the channel designation data included in the event data and executes the event on that channel.

[0025] The main melody track holds the sequence data of the main melody of the karaoke song, that is, the melody the singer is supposed to sing. These data are input from the CPU 10 to the voice processing DSP 30, which uses them to cut out waveform element data from the singing voice signal. The harmony track is structured like the main melody track and holds the sequence data of the harmony melody of the karaoke song. These data are also input from the CPU 10 to the voice processing DSP 30, which uses them to determine the frequency (pitch) of the harmony voice signal.

[0026] The lyrics track stores sequence data for displaying lyrics on the monitor 26. Although these sequence data are not tone data, this track is also written in MIDI data format to unify the implementation and simplify production. The data type is the system exclusive message. In the lyrics track, one line of lyrics is normally handled as one unit of lyrics display data, which consists of the character data for the line (the character codes and display coordinates of the characters), the display time of the lyrics (normally around 30 seconds), and wipe sequence data. The wipe sequence data change the display color of the lyrics as the song progresses: they record, in order across the length of one line, the timing of each color change (time since the line was displayed) and the position of the change (coordinates).
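A wipe sequence of (time, position) pairs can be looked up with a binary search to find how far the color change has progressed at a given moment. The millisecond and pixel units and the function name `wipe_position` are assumptions for this sketch.

```python
from bisect import bisect_right

def wipe_position(wipe_sequence, elapsed_ms):
    """Return the x coordinate up to which the line should be recolored.

    wipe_sequence: (ms_since_line_shown, x) pairs in ascending time order.
    """
    times = [t for t, _ in wipe_sequence]
    i = bisect_right(times, elapsed_ms)  # number of changes already due
    return wipe_sequence[i - 1][1] if i else 0

# Hypothetical line: recolor 32 more pixels every half second.
line_wipe = [(0, 0), (500, 32), (1000, 64), (1500, 96)]
```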

[0027] The audio track is a sequence track that specifies, among other things, when the audio data n (n = 1, 2, 3, ...) stored in the audio data section are to be produced. The audio data section stores human voices, such as backing chorus and harmony singing, that are difficult to synthesize with the sound source device 18. The audio track contains audio designation data and duration data Δt giving the read interval of the audio designation data, that is, the timing at which audio data are output to the audio data processing unit 19 to form an audio signal. The audio designation data consist of an audio data number, pitch data, and volume data. The audio data number is the identification number n of each piece of audio data recorded in the audio data section. The pitch data and volume data indicate the pitch and volume of the audio signal to be formed. Wordless backing chorus such as "aah" or "wawawawa" can be reused many times by changing its pitch and volume, so a single copy is stored at a basic pitch and volume and reused repeatedly with the pitch and volume shifted according to these data. The audio data processing unit 19 sets the output level from the volume data and sets the pitch of the audio signal by changing the readout interval of the audio data according to the pitch data.
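Shifting pitch by changing the readout interval amounts to stepping through the stored samples at a rate other than one stored sample per output sample. The sketch assumes the ADPCM data have already been decoded to a list of samples and uses linear interpolation between samples; `pitch_ratio` and `volume` are hypothetical stand-ins for the pitch data and volume data.

```python
def render_voice(samples, pitch_ratio, volume):
    """Read `samples` every `pitch_ratio` samples and scale by `volume`.

    pitch_ratio > 1 raises the pitch (and shortens the sound);
    pitch_ratio < 1 lowers it. Linear interpolation fills fractional reads.
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        value = samples[i] * (1.0 - frac) + samples[i + 1] * frac
        out.append(volume * value)
        pos += pitch_ratio
    return out
```

For instance, a `pitch_ratio` of `2 ** (3 / 12)` would raise a stored phrase by three equal-tempered semitones. Note that this simple resampling also changes the duration of the phrase, which is acceptable for short wordless chorus fragments.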

[0028] The effect track holds the DSP control data for controlling the effect DSP 20. The effect DSP 20 applies reverberation-type effects such as reverb to the signals input from the sound source device 18, the audio data processing unit 19, and the voice processing DSP 30. The DSP control data consist of data designating the type of such effects, data on their amount of change, and the like.

[0029] FIG. 2 illustrates the operation of the voice processing DSP 30. The voice processing DSP 30 forms a harmony voice signal for the input singing voice signal by means of a built-in microprogram; divided into blocks, this microprogram can be represented as shown in the figure.

[0030] The singing voice signal input from the microphone 27, amplified by the amplifier 28, and converted to a digital signal by the A/D converter 29 is input to the period detection part 40, the peak detection part 41, the phoneme detection part 42, the average volume detection part 43, and the multiplier 45 of the voice processing DSP 30.

[0031] The period detection part 40 detects the period T of the input singing voice signal from its waveform (see FIG. 6(A)). The period detection part 40 also receives main melody data, which represent the frequency of the main melody, from the CPU 10. When the pitch of the singing voice signal becomes indeterminate, for example in a consonant or at a transition between notes, the period detection part 40 estimates the period of the singing voice signal from the main melody data and outputs the resulting period information. The period detection part 40 outputs the period information to the peak detection part 41 and the window function generation part 44.
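The behavior of the period detection part 40 (measure the period when possible, otherwise fall back on the main melody) can be sketched with a plain autocorrelation search. Autocorrelation is a substitute chosen for illustration, since the patent does not say how the period is measured; the confidence threshold, the search range, and the function name are likewise assumptions.

```python
import math

def detect_period(frame, fs, melody_hz, threshold=0.8, min_hz=80.0, max_hz=1000.0):
    """Return (period_in_samples, measured).

    measured is False when the frame looked aperiodic (for example a consonant
    or silence) and the period was estimated from the main melody instead.
    """
    energy = sum(x * x for x in frame)
    best_lag, best_r = None, 0.0
    if energy > 0.0:
        longest = min(int(fs / min_hz), len(frame) - 1)
        for lag in range(int(fs / max_hz), longest + 1):
            # Dividing by the full-frame energy (a biased estimate) favours
            # shorter lags, which helps avoid picking a multiple of the period.
            r = sum(frame[n] * frame[n + lag]
                    for n in range(len(frame) - lag)) / energy
            if r > best_r:
                best_lag, best_r = lag, r
    if best_lag is not None and best_r >= threshold:
        return best_lag, True
    return round(fs / melody_hz), False

# Hypothetical frames at 8 kHz: a clean 200 Hz vowel and a silent gap.
FS = 8000
vowel = [math.sin(2 * math.pi * 200 * n / FS) for n in range(400)]
silence = [0.0] * 400
```

With a main melody of 220 Hz, the vowel frame yields its own 40-sample period, while the silent frame falls back to the melody estimate of 36 samples.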

[0032] The peak detection part 41 detects the local peak within one period of the input singing voice signal (see FIG. 6(A)); the length of one period is determined by the period information from the period detection part 40. The peak detection part 41 outputs the detected peak timing information to the window function generation part 44.
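Locating the local peak within one period reduces to an argmax over that span. The sketch treats the largest positive sample as the peak, which is an assumption; the patent does not state whether the peak is taken on the signed or the absolute value.

```python
import math

def local_peak(signal, start, period):
    """Index of the largest sample in signal[start:start + period]."""
    segment = signal[start:start + period]
    offset = max(range(len(segment)), key=segment.__getitem__)
    return start + offset

# Hypothetical 200 Hz vowel sampled at 8 kHz: 40 samples per period.
FS = 8000
vowel = [math.sin(2 * math.pi * 200 * n / FS) for n in range(120)]
peaks = [local_peak(vowel, s, 40) for s in range(0, 120, 40)]
```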

[0033] The phoneme detection part 42 detects boundaries between phonemes from breaks in the level of the input singing voice signal and from changes in its frequency components. Here, a phoneme means a segment obtained by dividing the pronunciation into individual consonants and vowels. In FIG. 6(B), the lyric "akashiyano" consists of the five syllables "a", "ka", "shi", "ya", and "no", which can be divided into the nine phonemes "a", "k", "a", "sh", "i", "y", "a", "n", and "o". There is a level dip between syllables, and consonants have a noise-like aperiodic waveform whereas vowels have a periodic waveform; the phonemes are divided on the basis of such features. When the phoneme detection part 42 detects a phoneme boundary, it outputs information indicating the boundary to the window function generation part 44.
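A crude stand-in for the phoneme boundary test can combine frame level (for the dips between syllables) with zero-crossing rate (noise-like consonants cross zero far more often than periodic vowels). The frame length, thresholds, labels, and function names below are illustrative assumptions, not values from the patent.

```python
import math

def classify_frame(frame, silence_rms=0.01, noisy_zcr=0.3):
    """Label a frame 'silence', 'consonant' (noise-like) or 'vowel' (periodic)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    if rms < silence_rms:
        return "silence"
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return "consonant" if crossings / (len(frame) - 1) > noisy_zcr else "vowel"

def phoneme_boundaries(signal, frame_len):
    """Sample indices at which the frame label changes."""
    labels = [classify_frame(signal[i:i + frame_len])
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [k * frame_len for k in range(1, len(labels))
            if labels[k] != labels[k - 1]]

# Hypothetical test signal: 400 samples of a 200 Hz vowel, then 400 samples of
# an alternating-sign burst standing in for a noise-like consonant.
FS = 8000
vowel = [math.sin(2 * math.pi * 200 * n / FS) for n in range(400)]
burst = [0.5 if n % 2 else -0.5 for n in range(400)]
boundaries = phoneme_boundaries(vowel + burst, 80)
```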

[0034] The average volume detector 43 smooths the amplitude level of the input singing voice signal to detect its average volume, and outputs the detected average volume information to the volume controller 50.
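The text does not specify how the amplitude level is smoothed. A minimal sketch, assuming a one-pole low-pass (envelope follower) applied to the rectified signal, with a hypothetical smoothing coefficient `alpha`:

```python
def average_volume(signal, alpha=0.01):
    """Smooth |x| with a one-pole low-pass filter (envelope follower).

    alpha is an assumed coefficient: larger values track faster,
    smaller values give a steadier average.
    """
    level = 0.0
    out = []
    for x in signal:
        level += alpha * (abs(x) - level)  # move a fraction alpha toward |x|
        out.append(level)
    return out
```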

[0035] The window function generator 44 outputs a window function such as the one shown in FIG. 6(C) to the multiplier 45. Because the singing voice signal is also input to the multiplier 45, as described above, only the portion of the singing voice signal that lies within the window is cut out (see FIG. 6(C)). The window function should preferably be smooth (continuously differentiable) from start to end: with such a function, no noise is produced at the boundaries of the cut, even though only a part (one cycle) of the singing voice signal is extracted. This DSP 30 therefore uses sin²(ωt/2) (t = 0 to T, where T is one period of the singing voice signal). As the expression shows, the window is one period of the singing voice signal long; this length is given by the period information from the period detector 40. The window function generator 44 generates the window repeatedly at a suitable interval of several tens of milliseconds to 100 ms. Windows are spaced this far apart because listeners do not recognize the timbre of a piece of waveform element data unless the same data continues for a certain time. However, whenever the phoneme detector 42 reports a phoneme boundary, a window is always generated and waveform element data for the new phoneme is cut out, because the timbre changes completely when the phoneme changes and the harmony must follow it. The start timing of the window is controlled so that the peak reported by the peak detector 41 falls at the center of the window; that is, the window starts at the midpoint between one peak and the next, which is the lowest-level point.
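The window can be written out explicitly. The text does not define ω; assuming ω = 2π/T (the angular frequency of a signal with period T), both the window and its first derivative vanish at t = 0 and t = T, which is the smoothness property relied on above, and a window centered on a peak at time t_peak starts half a period earlier:

```latex
\begin{aligned}
w(t) &= \sin^2\!\left(\frac{\omega t}{2}\right), \qquad 0 \le t \le T, \qquad \omega = \frac{2\pi}{T} \\
w(0) &= w(T) = 0 \\
w'(t) &= \frac{\omega}{2}\,\sin(\omega t) \;\Rightarrow\; w'(0) = w'(T) = 0 \\
w\!\left(\tfrac{T}{2}\right) &= 1, \qquad t_{\mathrm{start}} = t_{\mathrm{peak}} - \frac{T}{2}
\end{aligned}
```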

[0036] Waveform element data cut out with such a window preserves the timbre of the singing voice signal, that is, its formants (overtone components), almost unchanged.

[0037] When the window function generator 44 generates a window, it simultaneously notifies the write controller 47 that a window is being generated and of its length. In response, from the start to the end of the window, the write controller 47 supplies the memory 46 with a write address that advances in step with the sampling clock (44.1 kHz). The waveform element data cut out by the multiplier 45 is thereby stored in the memory 46.
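A minimal sketch of the cut-out performed by the multiplier 45 and stored via the write controller 47, assuming a 44.1 kHz sampling rate, a list of float samples, and the discrete window sin²(πn/T), which is sin²(ωt/2) with the assumed ω = 2π/T. The helper names are hypothetical.

```python
import math

SAMPLE_RATE = 44_100  # sampling clock stated for the write controller 47

def period_in_samples(freq_hz, fs=SAMPLE_RATE):
    """Length of one cycle in samples for a given frequency."""
    return max(2, round(fs / freq_hz))

def sin2_window(length):
    """w[n] = sin^2(pi * n / length): 0 at n = 0, 1.0 at n = length / 2."""
    return [math.sin(math.pi * n / length) ** 2 for n in range(length)]

def extract_grain(signal, peak_index, period):
    """Cut one windowed cycle centred on peak_index (hypothetical helper).

    Samples outside the signal are treated as zero.
    """
    start = peak_index - period // 2  # window starts half a period before the peak
    grain = []
    for n, w in enumerate(sin2_window(period)):
        i = start + n
        x = signal[i] if 0 <= i < len(signal) else 0.0
        grain.append(w * x)
    return grain
```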

[0038] With the configuration described above, the memory 46 holds one cycle of waveform element data from the current singing voice signal. By reading this data out repeatedly at an arbitrary period, an audio signal can be synthesized whose fundamental frequency corresponds to that period and whose timbre (overtone structure) is that of the waveform element data, i.e. of the singing voice signal. Therefore, by reading the waveform element data out repeatedly at the period of a harmony frequency that stands in a consonant interval to the singing voice signal, such as a third or a fifth, a harmony voice signal can be formed at that frequency with the same timbre as the singing voice signal. In this embodiment, the frequency of the singing voice signal is assumed to roughly match the frequency given by the main melody data, and the harmony voice signal is synthesized using the frequencies of the harmony melody data stored in the music data. As described with reference to FIGS. 3 and 4, the harmony melody data is sequence data stored in the harmony track, and its frequencies are consonant with those of the main melody data, which is the sequence data stored in the main melody track.
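In the text, the harmony frequencies come directly from the harmony track. Purely to illustrate the frequency relationships involved, assuming twelve-tone equal temperament and MIDI-style note numbers (neither is stated in the text), a harmony note can be converted to a frequency and a read-out period as follows:

```python
def note_to_freq(note_number, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI-style note number (69 = A4)."""
    return a4_hz * 2 ** ((note_number - 69) / 12)

def harmony_period(note_number, fs=44_100):
    """Read-out period, in samples, for a harmony note (assumed fs)."""
    return fs / note_to_freq(note_number)
```

For example, with the melody on A4 (440 Hz), a fifth above (note 76) is about 659.3 Hz, a read-out period of roughly 66.9 samples at 44.1 kHz.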

[0039] Reading from the memory 46 is controlled by the read controller 48, which receives harmony data from the CPU 10. The harmony data is event data read from the harmony track of the music data. The read controller 48 accesses the memory 46 repeatedly at the frequency of this harmony data; that is, it reads the waveform element data out as many times per second as the frequency of the harmony melody data. When the harmony melody is lower in frequency than the main melody, the harmony voice signal consists of waveform element data placed at intervals T1 longer than the data length T, as shown in FIG. 7(A). When the harmony melody is higher in frequency than the main melody, the waveform element data are placed at intervals T2 shorter than the data length T and overlap one another, as shown in FIG. 7(B). The fundamental frequency of the harmony voice signal is therefore 1/T1 or 1/T2, respectively, while the overtone components within each piece of waveform element data are preserved as they are, so formants similar to those of the singing voice signal are produced. Moreover, because the window function is continuously differentiable, no noise is generated.
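The read-out of FIG. 7 can be sketched as an overlap-add: copies of the stored grain are laid down every fs / f_h samples, leaving gaps when that spacing (T1) exceeds the grain length T and overlapping when the spacing (T2) is shorter. This is a minimal sketch under those assumptions, not the DSP's actual implementation; `render_harmony` is a hypothetical name.

```python
def render_harmony(grain, harmony_hz, num_samples, fs=44_100):
    """Overlap-add copies of one grain every fs / harmony_hz samples."""
    out = [0.0] * num_samples
    hop = fs / harmony_hz  # spacing between grain onsets (T1 or T2 in FIG. 7)
    onset = 0.0
    while onset < num_samples:
        base = int(round(onset))
        for n, g in enumerate(grain):
            i = base + n
            if i >= num_samples:
                break
            out[i] += g  # overlapping grains simply add
        onset += hop
    return out
```

With a rectangular test grain of length 4 and fs = 10, a 2 Hz harmony (hop 5) leaves one-sample gaps, while a 5 Hz harmony (hop 2) makes neighbouring grains overlap.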

[0040] The harmony voice signal formed by repeatedly reading waveform element data from the memory 46 in this way passes through the switch 49 to the multiplier 51, which receives volume control data from the volume controller 50. The volume controller 50 receives the average volume information of the singing voice signal from the average volume detector 43 and generates the volume control data from it; the volume control data is set, for example, to 80 percent of the average volume. The harmony voice signal whose volume has been controlled by the multiplier 51 is output to the effect DSP 20. The switch 49 is used to force the output to zero, for example at breaks between phrases.
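A sketch of the multiplier 51 and the switch 49, assuming the harmony signal is normalized to roughly unit amplitude so that multiplying it by 80 percent of the smoothed input level yields the target level (the normalization and the function name are assumptions):

```python
def apply_volume(harmony, avg_volume, ratio=0.8, muted=False):
    """Scale each harmony sample by ratio * average input volume.

    muted models the switch 49 forcing the output to zero,
    e.g. at a break between phrases.
    """
    if muted:
        return [0.0] * len(harmony)
    return [h * ratio * v for h, v in zip(harmony, avg_volume)]
```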

[0041] Through the operation of the voice processing DSP 30 described above, a harmony voice signal can be formed that preserves the timbre of the singer's voice, and it can be formed without difficulty even when the period (frequency) of the singing voice signal cannot be detected. For an ordinary singer, the frequency of the singing voice signal is expected to differ little from the frequency given by the main melody data; by cutting out waveform element data with that period and resynthesizing it at a prescribed frequency, a harmony voice signal that largely retains the singer's timbre can be formed.

[0042] In the embodiment above, the voice processing DSP 30 includes the period detector 40, which detects the period of the singing voice signal to determine the window length; the main melody data input from the CPU 10 is used to determine the window length only when the period cannot be detected. Alternatively, as shown in FIG. 8, the window length can always be determined from the main melody data, without the DSP 30 determining the period of the singing voice signal, in which case the period detector 40 is unnecessary. The peak detector 41 then also detects local peaks on the basis of the period (frequency) given by the main melody data. In this figure, both the peak detector 41 and the window function generator 44 receive the main melody data directly and derive the period from it. Because the main melody data is frequency data to begin with, deriving the period from it is extremely simple, and no separate period calculation unit is needed. As noted above, for ordinary singing there is almost no difference between the frequency of the singing voice signal and the frequency given by the main melody data, so this configuration poses no practical problem.
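The period selection of the first embodiment (and of claim 2) can be sketched as follows, assuming periods are measured in samples at 44.1 kHz and that an undetectable period is signalled by `None` (both assumptions; the function name is hypothetical). The FIG. 8 variant corresponds to always taking the melody branch.

```python
def choose_period(detected_period, melody_hz, fs=44_100):
    """Window length in samples: detected period if available, else melody period."""
    if detected_period is not None:
        return detected_period
    # Main melody data is already frequency data, so the period is fs / f.
    return round(fs / melody_hz)
```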

[0043] Furthermore, even if the phoneme detector 42, which detects changes of phoneme in the singing voice signal, is omitted, the harmony voice signal can be made to follow the singing voice signal without an audibly noticeable lag by generating the window function repeatedly at a suitable interval (for example, 50 ms). In addition, because the window function is continuously differentiable, waveform element data can be cut out at any phase of the waveform without generating noise. Therefore, if it is acceptable for a peak of the singing voice signal to fall near the edge of the window, where it is attenuated and the overtone structure changes slightly, the peak detector 41 may be omitted and the window function generated without regard to its phase relationship with the singing voice signal. Likewise, if the harmony need not follow the singer's volume, the average volume detector 43, the volume controller 50 and the multiplier 51 are also unnecessary. Accordingly, the configuration shown in FIG. 9 suffices to realize the minimum basic function of the present invention: "forming a harmony voice signal by cutting out waveform element data from a singing voice signal using a window function and repeating this waveform element data at a predetermined period".
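A self-contained sketch of the minimal FIG. 9 configuration, assuming a 44.1 kHz sample rate, a fixed 50 ms re-extraction interval, grains cut at an arbitrary phase (no peak or phoneme detection), no volume control, and no real-time latency handling. All names are hypothetical.

```python
import math

def minimal_harmony(signal, melody_hz, harmony_hz, fs=44_100, refresh_s=0.05):
    """Hypothetical sketch of the minimal FIG. 9 configuration.

    Every refresh_s seconds, one melody period is cut from the input at the
    current position, windowed with sin^2(pi*n/T), and overlap-added at the
    harmony period until the next grain is cut.
    """
    period = max(2, round(fs / melody_hz))            # window length from melody data
    window = [math.sin(math.pi * n / period) ** 2 for n in range(period)]
    hop = fs / harmony_hz                             # read-out period
    refresh = max(1, round(fs * refresh_s))           # re-extraction interval
    out = [0.0] * len(signal)
    grain, next_cut, onset = None, 0, 0.0
    while onset < len(out):
        base = int(round(onset))
        if base >= next_cut:
            # Cut a new grain here; zero-pad if the input ends early.
            seg = signal[base:base + period]
            seg = seg + [0.0] * (period - len(seg))
            grain = [w * x for w, x in zip(window, seg)]
            next_cut = base + refresh
        for n, g in enumerate(grain):
            if base + n >= len(out):
                break
            out[base + n] += g
        onset += hop
    return out
```

With a constant input, a 441 Hz melody (T = 100 samples) and an 882 Hz harmony, adjacent sin² grains overlap by half and sum to 1, since sin²x + cos²x = 1.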

[0044] In the above embodiment, the voice processing DSP 30 that forms the harmony voice signal is built into the karaoke apparatus. However, the voice processing DSP 30, together with the A/D converter 29 and the D/A converter placed before and after it (see the broken-line portions in FIGS. 2 and 8), may instead be integrated into an external device (a harmony generation device) as shown in FIG. 10. In this case, as shown in the figure, the device receives main melody information and harmony information from a karaoke performance device 61, receives the singer's analog singing voice signal from a microphone 62, and generates a harmony voice signal from these. The karaoke performance device 61 performs song selection (selecting music data), pitch shifting (rewriting the pitch information of the event data being read) and so on in response to operations on its operation panel, outputs musical tone data and the like to a tone generator, and outputs lyric data and the like to an image controller.

[0045] Although this embodiment has described a karaoke apparatus that processes a singing signal, the present invention is not limited to this and can also be applied to musical tone signals produced by playing an instrument.

[0046] Inventions disclosed in the embodiments but not recited in the claims are stated below in dependent-claim form.

[0047] [Claim 4] The harmony generation device according to claim 1 or 2, wherein the waveform element data extraction means repeatedly extracts waveform element data at intervals sufficiently longer than one cycle defined by the melody information. In this invention, waveform element data is extracted repeatedly at intervals sufficiently longer than one cycle defined by the melody information. Because the harmony forming means forms the harmony voice signal from the same waveform element data until new data is cut out, the same waveform is repeated for a certain number of cycles; this lets the listener perceive the timbre of the waveform and makes it easier to recognize that it matches the timbre of the input voice signal.

[0048] [Claim 5] The harmony generation device according to claim 1 or 2, wherein the waveform element data extraction means extracts the waveform element data by multiplying it by a continuously differentiable window function. In this invention, waveform element data is extracted by multiplication with a continuously differentiable window function. The extracted waveform element data is therefore also smooth at its start and end points, so no noise arises at those points when it is played back.

[0049] [Claim 6] The harmony generation device according to claim 5, further comprising peak detection means for detecting peaks in the waveform of the input voice signal, wherein the waveform element data extraction means cuts out the waveform element data so that a detected peak lies at its center. In this invention, peaks in the waveform of the input voice signal are detected and the waveform element data is cut out with a detected peak at its center. As a result, high-level parts of the waveform lie at the center of the waveform element data and low-level parts at its ends, which suppresses noise and better preserves the overtone components.

[0050] [Claim 7] The harmony generation device according to claim 1 or 2, further comprising phoneme detection means for detecting switches between consonants and vowels in the input voice signal, wherein the waveform element data extraction means extracts waveform element data when a phoneme switch is detected. In this invention, switches between consonants and vowels in the input voice signal are detected, and waveform element data is extracted at each switch. This allows the harmony to follow changes in the waveform of the input voice signal promptly.

[0051] [Claim 8] The harmony generation device according to claim 1 or 2, further comprising volume detection means for detecting the volume of the input voice signal, and volume control means for controlling the volume of the harmony voice signal formed by the harmony forming means on the basis of the value detected by the volume detection means. In this invention, the volume of the input voice signal is detected and the volume of the harmony voice signal is controlled accordingly, so the volume balance between the input voice signal and the harmony voice signal is maintained at all times.

[0052]

[Effects of the Invention] As described above, the harmony generation device of the present invention forms a harmony voice signal by repeating one cycle of waveform element data from the input voice signal at the harmony frequency, and can therefore form a harmony voice signal of the same timbre that contains the overtone components of the input waveform element data unchanged. Furthermore, because the waveform element data is cut out on the basis of the melody information supplied by the melody information supply means, accurate waveform element data can be cut out and a harmony voice signal formed even when the frequency of the voice signal cannot be detected. It is also possible to cut out waveform element data without detecting the frequency of the voice signal at all.

[Brief description of the drawings]

FIG. 1 is a block diagram of a voice conversion karaoke apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of the voice processing DSP of the voice conversion karaoke apparatus.

FIG. 3 is a diagram showing the structure of music data used in the voice conversion karaoke apparatus.

FIG. 4 is a diagram showing the structure of music data used in the voice conversion karaoke apparatus.

FIG. 5 is a diagram showing the structure of music data used in the voice conversion karaoke apparatus.

FIG. 6 is a diagram explaining how waveform element data is cut out from a singing voice signal.

FIG. 7 is a diagram explaining how a harmony voice signal is formed.

FIG. 8 is a diagram showing another embodiment of the voice processing DSP.

FIG. 9 is a diagram showing another embodiment of the voice processing DSP.

FIG. 10 is a diagram showing a harmony generation device according to another embodiment of the present invention.

[Explanation of symbols]

30 - Voice processing DSP, 44 - Window function generator, 48 - Read controller

Continuation of the front page: (58) Fields searched (Int. Cl.⁷, DB name): G10K 15/04 302; G10H 1/00; G10H 1/10; G10L 21/04

Claims

(57) [Claims]

1. A harmony generation device comprising: audio signal input means for inputting a singing or performance voice signal; melody information supply means for supplying melody information, which is frequency information of the audio signal; harmony frequency supply means for supplying a harmony frequency, which is a frequency consonant with the frequency of the audio signal; waveform element data extraction means for cutting out from the audio signal, as waveform element data, a waveform one cycle of the melody information in length; and harmony forming means for forming a harmony audio signal by repeatedly reading out the waveform element data at the harmony frequency.

2. A harmony generation device comprising: audio signal input means for inputting a singing or performance voice signal; frequency detection means for detecting the frequency of the audio signal input from the audio signal input means; melody information supply means for supplying melody information, which is frequency information of the audio signal; harmony frequency supply means for supplying a harmony frequency, which is a frequency consonant with the singing or performance; waveform element data extraction means which, when the frequency detection means can detect the frequency of the audio signal, cuts out one cycle of that frequency from the audio signal as waveform element data, and, when the frequency detection means cannot detect the frequency of the audio signal, cuts out one cycle of the melody information as waveform element data; and harmony forming means for forming a harmony audio signal by repeatedly reading out the waveform element data at the harmony frequency.

3. The harmony generation device according to claim 1 or 2, wherein the melody information supply means comprises melody information storage means for storing a plurality of pieces of melody information in time series, and means for sequentially reading melody information from the melody information storage means and supplying it to the waveform element data extraction means; and the harmony frequency supply means comprises harmony frequency storage means for storing a plurality of harmony frequencies in time series, and means for sequentially reading harmony frequencies from the harmony frequency storage means and supplying them to the harmony forming means.