JPH10268895A - Voice signal processing device

Info

Publication number: JPH10268895A
Application number: JP9077080A
Authority: JP
Inventors: Shuichi Matsumoto; 秀一松本
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1997-03-28
Filing date: 1997-03-28
Publication date: 1998-10-09

Abstract

PROBLEM TO BE SOLVED: To provide a voice signal processing device that can convert the singing voice signal of a singer into the voice of the original singer. SOLUTION: A pitch/volume extraction unit 31 extracts pitch (frequency) data and volume data from the singing voice signal of a singer. An impulse generation unit 32 generates an impulse train at intervals based on the pitch data. The impulse train has a broad frequency spectrum close to that of white noise, and a filter 33 cuts this spectrum with the formants of the original singer to produce a signal with the original singer's timbre. The formant data is stored as sequence data together with the music data for karaoke performance, read out in parallel with the karaoke performance, and written into a formant data buffer 36. By gain-controlling the signal with the original singer's timbre according to the volume data, the output follows the singer's own singing exactly while sounding like the original singer.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a voice signal processing device for converting a singer's voice into another person's voice.

[0002]

[Description of the Related Art] Karaoke apparatuses have been proposed that include a range conversion function for converting a male voice into a female voice and a female voice into a male voice, so that a man can sing a song originally sung by a woman and a woman can sing a song originally sung by a man.

[0003] If the frequency of a voice signal is converted by simply compressing or expanding the signal waveform, the resulting timbre is far removed from a normal human voice, as when a recording tape is played fast or slow. The range conversion function therefore uses a formant shift method.

[0004] In the formant shift method, a continuous waveform of about 30 to 60 ms is cut out of the singer's voice signal with a Hamming function, and the cut-out waveforms are arranged at the time interval of the target frequency. This converts the singing frequency while preserving the singer's characteristic frequency components (formants).
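The windowing-and-rearranging idea described above can be sketched roughly in Python as follows. This is only an illustration under stated assumptions, not the method of any particular karaoke apparatus: the function names, the 8 kHz sampling rate, the 40 ms segment length, and the test tone are all hypothetical, and for brevity the sketch reuses a single windowed segment instead of cutting successive segments from the input.

```python
import math

def hamming_window(length):
    """Hamming window of `length` samples (length must be at least 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def cut_segment(signal, start, length):
    """Cut `length` samples starting at `start` and taper them with the window."""
    window = hamming_window(length)
    return [signal[start + n] * window[n] for n in range(length)]

def rearrange(segment, target_period, repeats):
    """Overlap-add copies of `segment`, one every `target_period` samples."""
    out = [0.0] * (target_period * (repeats - 1) + len(segment))
    for k in range(repeats):
        offset = k * target_period
        for n, value in enumerate(segment):
            out[offset + n] += value
    return out

# Hypothetical values: 8 kHz sampling, a 40 ms segment, 220 Hz target pitch.
SAMPLE_RATE = 8000
source = [math.sin(2.0 * math.pi * 110.0 * n / SAMPLE_RATE)
          for n in range(SAMPLE_RATE)]
segment = cut_segment(source, start=0, length=320)   # 320 samples = 40 ms
shifted = rearrange(segment, target_period=SAMPLE_RATE // 220, repeats=50)
```

Because the segment keeps its own spectral envelope and only the spacing between copies changes, the repetition rate (pitch) moves while the envelope stays that of the singer, which is the limitation the following paragraphs address.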

[0005]

[Problem to Be Solved by the Invention] With the range conversion function described above, however, even though a man can sing in a female voice and a woman in a male voice, the formants remain unchanged, so the function cannot satisfy a singer who wants to sing in a voice like that of the original singer. This desire to sing in the original singer's voice exists equally when a man sings a man's song and when a woman sings a woman's song, and existing karaoke apparatuses have not been able to satisfy it.

[0006] An object of the present invention is to provide a voice signal processing device capable of converting the singing voice of a singer into the voice of the original singer.

[0007]

[Means for Solving the Problem] The invention of claim 1 of this application comprises: storage means for storing changes in the formants of a specific person's singing voice as sequence data; frequency detection means for detecting the frequency of an input singing voice signal in real time; impulse generation means for generating an impulse train at intervals corresponding to the frequency detected by the frequency detection means; readout means for reading out the sequence data in step with the progress of the singing; and converted voice synthesis means for shaping the frequency components of the impulse train generated by the impulse generation means with the formants read out by the readout means and outputting the result as a voice signal.

[0008] The invention of claim 2 of this application further comprises: volume detection means for detecting the volume of the input singing voice signal in real time; and volume control means for controlling the volume of the voice signal synthesized by the converted voice synthesis means to the volume detected by the volume detection means.

[0009] In the invention of claim 1, the storage means stores, as sequence data, the changes in the formants of the voice of a specific person, such as the original singer, singing a song. The sequence data is obtained by extracting the formants of the singing voice at every predetermined frame period and storing them in time series; it is time-series data in which the characteristics of the specific person's singing voice are recorded as the song progresses.

[0010] Meanwhile, the singing voice signal of an ordinary singer, input from a microphone or the like, is processed by the frequency detection means to detect its frequency. The impulse generation means generates an impulse train at intervals corresponding to this frequency. As shown in FIG. 4(A), the impulse train has a broad frequency spectrum and carries the ordinary singer's singing frequency as information. The shaping means shapes the frequency spectrum of the impulse train so that it takes on the frequency characteristics of the formants read out from the storage means, and thereby synthesizes a singing voice that has the singer's singing frequency but the timbre of the specific person. Because the storage means stores the changes in the formants as sequence data, as described above, the converted singing voice undergoes the same timbre changes as when the specific person sang, and sounds just as if the specific person were singing. To convert a male voice into a female voice, the interval of the generated impulse train is halved to raise the pitch by one octave; to convert a female voice into a male voice, the interval is doubled to lower the pitch by one octave.
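The relationship between the detected frequency, the octave conversion, and the impulse spacing can be written out as a short Python sketch. The function name and the conversion labels are hypothetical and chosen only for illustration; the halving and doubling of the interval follow the description above.

```python
def impulse_interval(pitch_hz, conversion="none"):
    """Return the spacing between impulses, in seconds, for a detected pitch.

    conversion is one of:
      "none"            keep the singer's own pitch
      "male_to_female"  interval halved, one octave up
      "female_to_male"  interval doubled, one octave down
    """
    if pitch_hz <= 0:
        raise ValueError("pitch must be positive")
    scale = {"none": 1.0, "male_to_female": 0.5, "female_to_male": 2.0}[conversion]
    return scale / pitch_hz
```

For example, a detected pitch of 220 Hz with the male-to-female setting gives a spacing of 1/440 second, the same spacing as an unconverted 440 Hz input.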

[0011] Furthermore, in the invention of claim 2, the volume detection means detects the volume from the ordinary singer's singing voice signal, and the volume control means controls the voice signal output by the shaping means so that it has the volume detected by the volume detection means. This makes it possible to synthesize a singing voice that has the voice quality of the specific person at the singer's frequency and volume.

[0012]

[Embodiments of the Invention] An embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram of a karaoke apparatus to which the present invention is applied. The karaoke apparatus comprises a karaoke apparatus main body 1, a control amplifier 2, a voice signal processing device 3, an LD changer 4, a speaker 5, a monitor 6, a microphone 7, and an infrared remote controller 8. A CPU 10 of the karaoke apparatus main body 1, which controls the operation of the entire apparatus, is connected via an internal bus to a ROM 11, a RAM 12, a hard disk storage device 17, a communication control unit 16, a remote control receiving unit 13, a display panel 14, panel switches 15, a tone generator 18, a voice data processing unit 19, a character display unit 20, and a display control unit 21. The external devices, namely the control amplifier 2, the voice signal processing device 3, and the LD changer 4, are connected to the CPU 10 via interfaces.

[0013] The ROM 11 stores the startup program and other data needed to start the apparatus. The system program and application programs that control the operation of the apparatus are stored in the hard disk storage device 17. The application programs include a karaoke performance program. When the karaoke apparatus is powered on, the startup program loads the system program and the karaoke performance program into the RAM 12. In addition to the system program and application programs, the hard disk storage device 17 stores music data for karaoke performance for about 10,000 songs.

[0014] The communication control unit 16 downloads music data and the like from a distribution center via an ISDN line and writes it to the hard disk storage device 17. This write is performed directly to the hard disk storage device 17 using a DMA circuit.

[0015] The remote controller 8 has various key switches such as a numeric keypad; when the user operates one of these switches, a code signal corresponding to the operation is output as infrared light. The remote control receiving unit 13 receives the infrared signal sent from the remote controller 8, restores the code signal, and inputs it to the CPU 10. The remote controller 8 has a voice conversion mode switch and a harmony mode switch. The voice conversion mode converts the singer's voice into the voice of the original singer and is turned on and off with the voice conversion mode switch. The harmony mode adds a harmony singing voice to the singer's singing of the main melody and is turned on and off with the harmony mode switch.

[0016] The display panel 14 is provided on the front of the karaoke apparatus main body 1 and includes a matrix display that shows the number of the song currently being played and the number of reserved songs, and a group of LEDs that show the currently set key and tempo. The panel switches 15 include a numeric keypad for entering song numbers, like that of the remote controller 8, as well as a key change switch and a tempo change switch.

[0017] The tone generator 18 forms musical tone signals based on the data of the musical tone track of the music data. The musical tone track comprises a plurality of tracks, and the tone generator 18 forms musical tone signals for a plurality of parts simultaneously based on this data. The voice data processing unit 19 forms a voice signal of a specified length and a specified pitch based on the voice data included in the music data. The voice data stores, as PCM signals, waveforms that are difficult to form electronically, such as the human voices of a backing chorus. The musical tone signals formed by the tone generator 18 and the voice signal reproduced by the voice data processing unit 19 are input to the control amplifier 2.

[0018] The microphone 7 is connected to the control amplifier 2, which receives the singing voice signal of the karaoke singer. In the normal mode, the control amplifier 2 applies predetermined effects such as echo to the karaoke performance sound, the backing chorus, and the singing voice signal, mixes them at a predetermined balance, amplifies the result, and outputs it to the speaker 5. In the voice conversion mode, the control amplifier 2 outputs the singing voice signal input from the microphone 7 to the voice signal processing device 3 as it is, without processing. It then applies effects to the converted singing voice signal returned from the voice signal processing device 3, mixes and amplifies it as described above, and outputs it from the speaker 5.

[0019] In the voice conversion mode, the voice signal processing device 3 converts the singing voice signal input from the control amplifier 2 into a signal with the voice of the original singer. This voice conversion function is described in detail with reference to FIG. 3. In the harmony mode, the voice signal processing device 3 detects the frequency of the singing voice signal and converts that frequency to generate a singing voice signal for the harmony melody part. The voice conversion mode operation and the harmony mode operation can be processed in parallel.

[0020] The character display unit 20 generates character patterns such as the song title and lyrics based on input character data. The LD changer 4, an external device, reproduces moving video as a background image based on video selection data input from the CPU 10. The video selection data is determined based on the genre data and the like written in the header of the music data. The display control unit 21 superimposes the character patterns such as lyrics input from the character display unit 20 on the background image input from the LD changer 4 and displays the result on the monitor 6.

[0021] FIG. 2 shows the structure of the music data used in the karaoke apparatus. The music data consists of a header, a musical tone track, a guide melody track, a harmony melody track, a lyrics track, a voice track, an effect control track, a formant sequence track, and a voice data section. The header holds data on the attributes of the music data, such as the song title, genre, release date, and playing time. Each track from the musical tone track through the effect control track is written in MIDI format, consisting of a plurality of event data items and duration data indicating the time interval between successive event data items. The data in the lyrics track through the effect control track is not musical tone data, but these tracks are also written in MIDI format to unify the implementation and simplify the production process.
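The event-plus-duration layout of these tracks can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each duration value is the wait between an event and the next one and that time is counted in abstract ticks; the function name, the tick values, and the lyric events are hypothetical and are not taken from the actual data format.

```python
from typing import List, Tuple

def schedule(track: List[Tuple[str, int]]) -> List[Tuple[int, str]]:
    """Turn (event, duration_ticks) pairs into (absolute_tick, event) pairs.

    Assumption: each duration is the wait between this event and the next.
    """
    timeline = []
    now = 0
    for event, duration in track:
        timeline.append((now, event))
        now += duration
    return timeline

# Hypothetical lyrics-track content.
lyrics_track = [("show line 1", 480), ("show line 2", 960), ("clear", 0)]
timeline = schedule(lyrics_track)
```

Storing only the gaps between events keeps every track in the same shape, which is what lets one sequencer loop drive the tone generator, the character display, and the effect control alike.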

[0022] The formant sequence track stores the formants of the singing voice of the original singer singing this song, as time-series data at 10 ms intervals. Formants are the dominant frequency components in the voice spectrum and reflect the shape and dimensions of the oral cavity. They are called the first formant, second formant, and so on in order of increasing frequency, and it is mainly the formants up to the third that contribute to phonemic quality. One formant data item recorded in the formant sequence track consists of the first to third formant frequencies and their levels. If each formant frequency is represented with 16 bits (2 bytes) and each level with 8 bits (1 byte), one formant data item consisting of the first to third formant frequencies and levels occupies 9 bytes. Accordingly, for a song with 3 minutes of singing time excluding the introduction, interludes, and the like, the formant sequence track, in which one formant data item is written every 10 ms, has a data size of about 150 kbytes.
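The 9-byte item and the resulting track size can be checked with a small Python sketch. The byte order and field order used here (little-endian, frequency followed by level for each of the three formants) are assumptions made for illustration, as are the function names and the sample formant values; the text only specifies the field widths.

```python
import struct

# Assumed layout: (frequency uint16, level uint8) x 3, little-endian, no padding.
FRAME_FORMAT = "<HBHBHB"
FRAME_PERIOD_MS = 10

def pack_frame(formants):
    """Pack three (frequency_hz, level) pairs into one formant data item."""
    (f1, l1), (f2, l2), (f3, l3) = formants
    return struct.pack(FRAME_FORMAT, f1, l1, f2, l2, f3, l3)

def track_size_bytes(singing_seconds):
    """Size of a formant sequence track holding one item every 10 ms."""
    frames = singing_seconds * 1000 // FRAME_PERIOD_MS
    return frames * struct.calcsize(FRAME_FORMAT)

frame = pack_frame([(700, 200), (1200, 150), (2600, 90)])
size_3min = track_size_bytes(180)   # 18,000 items of 9 bytes each
```

Under these assumptions, 3 minutes of singing gives 18,000 items of 9 bytes, or 162,000 bytes, which is on the order of the roughly 150 kbytes quoted above.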

[0023] The musical tone track consists of tracks for a plurality of parts, which drive the tone generator 18 to form a plurality of musical tone signals. The guide melody track holds the data of the main melody of the karaoke song, that is, the melody the singer is to sing. The harmony melody track holds the data of the harmony melody part. The lyrics track stores sequence data for displaying the lyrics on the monitor 6; its event data consists of the character codes of the lyrics, data specifying their display positions, and the like. The voice control track specifies, among other things, the playback timing of the voice data stored in the voice data section. The voice data section stores PCM data such as human voices, and each event in the voice control track specifies which voice data to play at that event's timing. The effect control track holds effect control data for controlling the control amplifier 2. Based on this effect control data, the control amplifier 2 applies reverberation-type effects such as reverb to the musical tone signals.

[0024] When the karaoke performance starts, the data of each track is read out in parallel according to a clock and output to the corresponding processing unit. The event data of the musical tone track is output to the tone generator 18, the data of the lyrics track to the character display unit 20, and the data of the effect control track to the control amplifier 2. The formant data of the formant sequence track is read out every 10 ms and output to the voice signal processing device 3.

[0025] FIG. 3 is a block diagram showing the functions of the voice signal processing device 3. The voice signal processing device 3 incorporates a DSP and processes voice signals with a microprogram; the figure represents the functions of this microprogram as blocks. FIG. 4 shows the signals at each stage of the functional block diagram of FIG. 3. The singing voice signal input from the microphone 7 via the control amplifier 2 is converted into digital data by an A/D converter 30. This digital data is input to a pitch/volume extraction unit 31, which extracts pitch data and volume data from it. The pitch data is output to an impulse generation unit 32, and the volume data is output to a gain controller 34.

The impulse generation unit 32 serves as the sound source of this device and is a generator that produces an impulse train at a given time interval. The time interval is controlled by the pitch data input from the pitch/volume extraction unit 31, so that impulses are generated at the period corresponding to the pitch data. That is, when pitch data of 440 Hz is input, an impulse train is generated at intervals of 1/440 second. To convert a female voice into a male voice, when pitch data of 440 Hz is input, the frequency is halved and the impulse train is generated at intervals of 1/220 second; to convert a male voice into a female voice, when pitch data of 220 Hz is input, the frequency is doubled and the impulse train is generated at intervals of 1/440 second. The time interval is updated each time new pitch data is input from the pitch/volume extraction unit 31. The impulse train generated by the impulse generation unit 32 has the waveform shown in the left column of FIG. 4(A), and its amplitude spectrum has the shape shown in the right column of FIG. 4(A).

The impulse train is input to a filter 33. The formant data read out from the formant sequence track of the music data is set in the filter 33 as filter coefficients. The formant data input from the CPU 10 every 10 ms is stored in a formant data buffer 36, and the contents of the buffer 36 are supplied to the filter 33. As described above, the formant data is information on the characteristic frequency components that determine the phonemic quality of the original singer's singing voice for the karaoke song, as illustrated in the left column of FIG. 4(B). The original singer's singing voice is represented by the changes in these formants, which are updated every 10 ms. The filter 33 is set to a moderately broad filtering characteristic with each of these frequency components as a center frequency (see the right column of FIG. 4(B)). When the impulse train passes through the filter 33, the characteristic frequency components of the original singer are cut out of its broad spectrum, and the signal is shaped into a waveform that sounds like the original singer's voice (see FIG. 4(C)).

The shaped waveform data is input to the gain controller 34. The gain controller 34 multiplies the waveform data by the volume data extracted by the pitch/volume extraction unit 31, giving it a volume envelope that matches the singing voice signal. The resulting waveform data is input to a D/A converter 35 and converted into an analog signal. This analog signal is returned to the control amplifier 2 as the singing voice signal converted to the original singer's voice.
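The chain from impulse generation through filtering to gain control can be sketched in Python for a single 10 ms frame. This is a minimal sketch under several assumptions, not the DSP microprogram itself: the pitch and volume are taken as already extracted, the filter is modeled as three parallel two-pole resonators (a swapped-in textbook technique, since the text only calls for a moderately broad characteristic centered on each formant), levels are treated as linear weights between 0 and 1, and the 16 kHz sampling rate, 100 Hz bandwidth, and all names are hypothetical.

```python
import math

SAMPLE_RATE = 16000  # assumed sampling rate in Hz

def impulse_train(pitch_hz, num_samples):
    """Unit impulses spaced one pitch period apart (rounded to whole samples)."""
    period = max(1, round(SAMPLE_RATE / pitch_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]

def resonator(signal, center_hz, bandwidth_hz):
    """Two-pole resonator centred on one formant frequency (assumed filter type)."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / SAMPLE_RATE)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def convert_frame(pitch_hz, volume, formants, num_samples, bandwidth_hz=100.0):
    """Impulse train -> parallel formant resonators -> gain control."""
    source = impulse_train(pitch_hz, num_samples)
    shaped = [0.0] * num_samples
    for center_hz, level in formants:
        band = resonator(source, center_hz, bandwidth_hz)
        for n in range(num_samples):
            shaped[n] += level * band[n]
    return [volume * s for s in shaped]

# One 10 ms frame: detected pitch 220 Hz, volume 0.5, three hypothetical formants.
frame = convert_frame(220.0, 0.5,
                      [(700.0, 1.0), (1200.0, 0.5), (2600.0, 0.25)], 160)
```

A real-time version would also have to carry the filter state and the impulse phase from one frame to the next so that the 10 ms updates do not produce clicks; the sketch restarts both at every frame for simplicity.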

[0026] In the voice conversion mode, the control amplifier 2 does not mix the singing voice signal input from the microphone 7 with the karaoke performance sound; instead it mixes the singing voice signal converted to the original singer's voice with the karaoke performance sound and emits the result from the speaker 5.

[0027] When the voice conversion mode and the harmony mode are both set, the harmony melody voice signal can be created by converting the frequency of the converted singing voice signal.

[0028] In this way, filtering the broad spectrum of the impulse train with the formants of the original singer produces a singing voice signal with the original singer's timbre.

[0029] Although this embodiment shows an example in which the voice signal processing device of the present invention is applied to a karaoke apparatus, the device can also be applied to other apparatuses.

[0030]

[Effects of the Invention] As described above, according to the invention of claim 1, changes in the formants of the singing voice of a specific person such as the original singer are stored as sequence data, and an impulse train generated according to the frequency detected from the input singing voice signal is filtered with the time-varying formant data read out in step with the singing. This produces a singing voice signal that follows the same phrasing as the input singing voice signal while having the timbre changes that accompanied the progress of the song when the specific person sang it.

[0031] Furthermore, according to the invention of claim 2, controlling the volume of this signal based on the volume data detected from the singing voice signal produces a singing voice signal in the specific person's voice that sings in exactly the same way as the input singing voice signal. Applying this to a karaoke apparatus or the like makes karaoke singing more enjoyable.

[Brief description of the drawings]

FIG. 1 is a block diagram of a karaoke apparatus to which the present invention is applied.

FIG. 2 is a diagram showing the structure of the music data used in the karaoke apparatus.

FIG. 3 is a functional block diagram of a voice signal processing device according to an embodiment of the present invention.

FIG. 4 is a diagram showing the signals processed by the respective processing units of the voice signal processing device.

[Explanation of symbols]

3: voice signal processing device; 7: microphone; 31: pitch/volume extraction unit; 32: impulse generation unit; 33: filter; 36: formant data buffer

Claims

1. A voice signal processing device comprising: storage means for storing changes in the formants of a specific person's singing voice as sequence data; frequency detection means for detecting the frequency of an input singing voice signal in real time; impulse generation means for generating an impulse train at intervals corresponding to the frequency detected by the frequency detection means; readout means for reading out the sequence data in step with the progress of the singing; and converted voice synthesis means for shaping the frequency components of the impulse train generated by the impulse generation means with the formants read out by the readout means and outputting the result as a voice signal.

2. The voice signal processing device according to claim 1, further comprising: volume detection means for detecting the volume of the input singing voice signal in real time; and volume control means for controlling the volume of the voice signal synthesized by the converted voice synthesis means to the volume detected by the volume detection means.