JP4371156B2

JP4371156B2 - Karaoke equipment

Info

Publication number: JP4371156B2
Application number: JP2007180379A
Authority: JP
Inventors: 隆宏川嶋
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-07-09
Filing date: 2007-07-09
Publication date: 2009-11-25
Anticipated expiration: 2018-07-29
Also published as: JP2007304619A

Description

The present invention relates to a karaoke apparatus that estimates the gender and age of a singer from the input singing voice.

Conventional karaoke apparatuses have offered a scoring function as a way of analyzing a singer's performance and reporting a result. The scoring function compares the singer's pitch and volume with a reference (guide melody data) and scores the singing according to the degree of agreement.

However, because this function judges singing skill, it creates a competitive atmosphere, and not everyone feels free to take part. Moreover, once the singing skill had been judged, that information could not be used to provide other services or to control other functions.

An object of the present invention is to provide a karaoke apparatus that estimates gender, age, and the like by analyzing formants, and can thereby offer amusements that users can enjoy casually.

The invention of claim 1 comprises: performance means for executing a karaoke performance; singing voice input means for inputting a singing voice; formant extraction means for extracting formants from the singing voice; estimation means for estimating, based on the extracted formants, a gender degree, which is an index indicating how masculine or how feminine the singing voice is; and performance control means for controlling, based on the gender degree estimated by the estimation means, elements relating to the timbre of the musical tones generated by the performance means.

In the present invention, formant data is extracted from the singer's singing voice and analyzed. Formants are the dominant frequency components in the spectrum of an uttered vowel, and are called the first formant, the second formant, and so on in order of increasing frequency. Of these, the formants up to the third are said to contribute to phonemic quality. As shown in FIG. 5, male and female formant patterns differ clearly, so extracting formants from the singing voice and analyzing them makes it possible to estimate whether the singer is male or female. It is also possible to determine how masculine or how feminine the voice is. Age can likewise be estimated from the formants.

In this way, the singer's gender and age are estimated from the karaoke singing voice, and information corresponding to them is displayed. The singer can thus receive a service suited to him or her simply by singing, without doing anything else. By giving this information display an amusement character, such as a game, a singing game that anyone can join casually can be provided. In addition, by controlling the karaoke performance based on the estimated gender and age, the performance can be adjusted to the accompaniment that best sets off a singing voice of that gender and age. The singer need not adjust anything; simply by singing, the karaoke performance is controlled to be the one that is easiest for the singer to sing to.

As described above, according to the present invention, formants are extracted from the singing voice, the singer's gender and age are estimated, and information display and karaoke performance control are carried out accordingly. The singer therefore obtains information display and performance control simply by singing, without entering anything manually, which has the advantage of improving services and functions.

Furthermore, the above service depends on the formants of the singing voice and, unlike a conventional scoring game, does not grade singing skill, so it has the advantage that anyone can take part casually.

An embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a functional block diagram of a karaoke apparatus according to an embodiment of the present invention. This karaoke apparatus has the function of executing an ordinary karaoke performance, and also captures the singing voice during the performance, analyzes its formants, and determines the gender degree and estimated age of the singing voice. It then reads today's or tomorrow's fortune corresponding to the determined gender (degree) and (estimated) age from a fortune database and displays it on the monitor.

The apparatus also judges what kind of accompaniment (karaoke performance) suits a voice quality of this gender degree and estimated age, and adjusts the karaoke performance accordingly. The karaoke performance can thus be adjusted automatically so that the singing sounds its best, without any adjustment by the singer.

In FIG. 1, the singing voice input unit 1 includes a microphone for karaoke singing. The singing voice 10 input from the singing voice input unit 1 is taken into the formant extraction unit 2. The formant extraction unit 2 cuts out the vowels from the input singing voice 10 and extracts the formants of each vowel. A vowel is a periodic signal that, in karaoke singing, lasts from several tens of milliseconds to several seconds, so cutting out waveform sections that repeat with the same period distinguishes vowels from consonants, which are short aperiodic signals. The shape of the periodic waveform also makes it possible to identify which of the vowels a, i, u, e, and o it is. Formants are the dominant frequency components in the frequency spectrum of a vowel, and are called the first, second, third, ... formants in order of increasing frequency. The formant extraction unit 2 extracts the first to third formants of each vowel that has been cut out. This extraction may be performed by FFT (fast Fourier transform) analysis or the like. By extracting formants for each vowel of the singing voice 10 as it is input along with the progress of the lyrics of the karaoke song, the formant extraction unit 2 can output formant frequency data about once per second.
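The extraction step described above can be illustrated with a minimal sketch. It assumes that the three strongest, well-separated peaks of a windowed magnitude spectrum stand in for the first to third formants; the function names, the Hann window, and the 200 Hz separation are illustrative assumptions, not details from the patent. A naive DFT is used only to keep the sketch dependency-free, and a practical implementation would more likely estimate a smoothed spectral envelope first, because raw spectral peaks of a sung vowel are harmonics of the fundamental.

```python
import cmath
import math


def magnitude_spectrum(frame, sample_rate):
    """Return (frequency_hz, magnitude) pairs for the lower half of a naive DFT."""
    n = len(frame)
    # A Hann window reduces spectral leakage between neighbouring bins.
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2):
        acc = sum(
            windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        spectrum.append((k * sample_rate / n, abs(acc)))
    return spectrum


def pick_formants(spectrum, count=3, min_gap_hz=200.0):
    """Pick the `count` strongest local maxima at least `min_gap_hz` apart.

    The result is sorted by frequency, so index 0 plays the role of the
    first formant. (Assumption: strongest separated peaks ~ formants.)
    """
    peaks = [
        spectrum[i]
        for i in range(1, len(spectrum) - 1)
        if spectrum[i][1] > spectrum[i - 1][1]
        and spectrum[i][1] >= spectrum[i + 1][1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    chosen = []
    for freq, _magnitude in peaks:
        if all(abs(freq - c) >= min_gap_hz for c in chosen):
            chosen.append(freq)
        if len(chosen) == count:
            break
    return sorted(chosen)
```

Calling `pick_formants(magnitude_spectrum(frame, sample_rate))` on one vowel frame yields up to three frequencies in ascending order, which corresponds to the formant frequency data 11 that the text passes on to the comparison stage.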

The extracted formants 11 are input to the formant comparison unit 3. The formant comparison unit 3 compares the input formants 11 with the reference 12 to determine the gender degree and estimated age of the singing voice. The gender degree is an index indicating how masculine or how feminine the singing voice is, and the estimated age indicates how old a person the singing voice sounds like.

The reference 12 stored in the reference database 4 is the information used to determine the gender degree and estimated age of the formants 11 extracted from the singing voice. In this embodiment, formants are extracted from a large number of sample voices and averaged separately for males and females and for each age group, and the averaged data is used as the reference.

The formant comparison unit 3 compares the input formant data 11 with the male average formants and the female average formants of the reference 12, and determines the gender degree from how closely the input approximates each of them. It further compares the input formant data 11 with the average formants of each age group in the reference 12, and determines the estimated age from how closely the input approximates each of them. The determined gender degree and estimated age 13 are input to the karaoke performance control unit 7 and to the fortune database 5.
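The patent does not specify how the degree of approximation is computed. One possible reading, sketched here under stated assumptions, uses Euclidean distance in (F1, F2, F3) space: the gender degree is the normalized difference of the distances to the male and female averages, and the estimated age is the age group whose average is nearest. The signed scale from -1 (masculine) to +1 (feminine), the reference values, and the representative ages are all hypothetical.

```python
import math

# Hypothetical per-vowel averages (F1, F2, F3 in Hz); placeholder values,
# not data from the patent.
GENDER_REFERENCE = {
    "male": (730.0, 1090.0, 2440.0),
    "female": (850.0, 1220.0, 2810.0),
}

# Hypothetical age-group averages keyed by a representative age (assumption).
AGE_REFERENCE = {
    15: (800.0, 1200.0, 2750.0),
    25: (780.0, 1150.0, 2650.0),
    35: (760.0, 1120.0, 2550.0),
    50: (740.0, 1100.0, 2450.0),
}


def gender_degree(formants, reference=GENDER_REFERENCE):
    """Return a value in [-1, 1]: -1 matches the male average, +1 the female."""
    d_male = math.dist(formants, reference["male"])
    d_female = math.dist(formants, reference["female"])
    if d_male + d_female == 0:
        return 0.0
    # The closer the input is to the female average, the more positive the value.
    return (d_male - d_female) / (d_male + d_female)


def estimated_age(formants, reference=AGE_REFERENCE):
    """Return the representative age of the nearest age-group average."""
    return min(reference, key=lambda age: math.dist(formants, reference[age]))
```

A continuous score rather than a binary male/female label fits the text's description of the gender degree as an index of how masculine or feminine the voice is.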

The fortune database 5 accumulates the input gender degrees and estimated ages until the song ends. When the karaoke song ends, it averages the accumulated values to calculate the average gender degree and estimated age for the whole song. It then searches using this average gender degree and estimated age as the search key and reads out the corresponding fortune text. The fortune text that has been read out is input to the display unit 6 together with the gender degree and estimated age. The display unit 6 has a monitor and displays lyrics, background video, and the like during the karaoke performance. When the performance of the karaoke song ends, it clears the lyrics and displays the fortune text, gender degree, and estimated age input from the fortune database 5.
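The averaging and lookup described above might be sketched as follows. The key layout of the fortune table (a sex label plus an age bracket), the bracket boundaries, and the sign convention of the gender degree (negative for masculine) are assumptions made for illustration.

```python
from statistics import mean


def age_bracket(age):
    """Map an age to a bracket label (boundaries are an assumption)."""
    if age <= 20:
        return "-20"
    if age <= 30:
        return "-30"
    if age <= 40:
        return "-40"
    return "40-"


def summarize(samples):
    """Average the (gender_degree, estimated_age) pairs collected during a song."""
    if not samples:
        return None
    return mean(g for g, _ in samples), mean(a for _, a in samples)


def lookup_fortune(fortunes, gender_degree, estimated_age):
    """Use the song-wide averages as the search key into the fortune table."""
    sex = "male" if gender_degree < 0 else "female"
    return fortunes.get((sex, age_bracket(estimated_age)), "")
```

For example, two samples `(-0.6, 24)` and `(-0.2, 30)` average to a masculine-leaning voice of age 27, which selects the `("male", "-30")` entry of the table.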

The karaoke performance control unit 7 controls the karaoke performance unit 8 in real time based on the gender degree and estimated age 13 input moment by moment from the formant comparison unit 3. Because the input gender degree and estimated age 13 represent the singer's voice quality, the karaoke performance is controlled so that the performance (accompaniment) is the one best suited to this voice quality, that is, the one that best sets it off. Examples of control rules are:
If the male degree is high, strengthen the attack.
If the female degree is high, play legato.
If the estimated age is young, raise the filter cutoff to increase the harmonic content and produce a brilliant sound.
If the estimated age is high, lower the filter cutoff to produce a round sound with few overtones.
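The four example rules can be sketched as a small function. The thresholds, gain value, cutoff frequencies, and the signed gender degree (negative for masculine, positive for feminine) are all assumptions for illustration; the patent states only the direction of each adjustment.

```python
from dataclasses import dataclass


@dataclass
class PerformanceSettings:
    attack_gain: float = 1.0   # relative strength of note attacks
    legato: bool = False       # play notes smoothly connected
    cutoff_hz: float = 4000.0  # low-pass filter cutoff of the sound source


def control_settings(gender_degree, estimated_age):
    """Apply the example rules; all numeric values are hypothetical."""
    settings = PerformanceSettings()
    if gender_degree <= -0.5:      # high male degree: stronger attack
        settings.attack_gain = 1.3
    elif gender_degree >= 0.5:     # high female degree: legato
        settings.legato = True
    if estimated_age < 30:         # young: more harmonics, brilliant sound
        settings.cutoff_hz = 6000.0
    elif estimated_age >= 40:      # older: fewer overtones, round sound
        settings.cutoff_hz = 2500.0
    return settings
```

Returning a fresh settings object on every call suits the real-time control the text describes, since each new estimate simply replaces the previous settings.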

FIG. 2 is a hardware block diagram of a karaoke apparatus that implements the above functions. The functions shown in FIG. 1 are implemented by executing a program such as that shown in FIG. 4 on this hardware.

This karaoke apparatus comprises a karaoke apparatus main body 21, a control amplifier 22, an audio signal processing device 23, a CD-ROM changer 24, a speaker 25, a monitor 26, a microphone 27, and an infrared remote control device 28. The karaoke apparatus main body 21 controls the operation of the entire karaoke apparatus. A CPU 30, which is the control device of the karaoke apparatus main body 21, is connected via an internal bus to a ROM 31, a RAM 32, a hard disk storage device 37, a communication control unit 36, a remote control receiving unit 33, a display panel 34, panel switches 35, a sound source device 38, an audio data processing unit 39, a character display unit 40, and a display control unit 41. The external devices, namely the control amplifier 22, the audio signal processing device 23, and the CD-ROM changer 24, are connected to the CPU 30 via interfaces.

The ROM 31 stores the startup program and other data needed to start the apparatus. The system program that controls the operation of the apparatus, the karaoke performance execution program, and the like are stored in the hard disk storage device 37. When the karaoke apparatus is powered on, the startup program loads the system program and the karaoke performance program into the RAM 32.

In addition to the above programs and music data, the hard disk storage device 37 stores a reference database 171 for analyzing the formants of the singing voice and determining the gender degree and estimated age, a display wording database 172 for expressing the gender degree in words for display, a fortune database 173 for displaying a fortune based on the determined gender degree and estimated age, a performance control database 174 for controlling the karaoke performance based on the determined gender degree and estimated age, and so on (see FIG. 3). The RAM 32 is allocated a program storage area into which programs are loaded from the hard disk storage device 37 when the apparatus starts up, an execution song data storage area into which the music data of the karaoke song to be performed is loaded, and an accumulation storage area for accumulating the gender degrees and estimated ages determined during the karaoke performance.

The communication control unit 36 is connected to a distribution center 19 via an ISDN line. The distribution center 19 periodically places a call to the karaoke apparatus and downloads music data of new songs, upgraded control programs, and the like to it. The reference database 171, display wording database 172, fortune database 173, and performance control database 174 shown in FIG. 3 are also downloaded from the distribution center 19. In particular, because the contents of the fortune database 173 change every day, either that day's fortune database is downloaded daily or fortune databases for several days are downloaded periodically.

The remote control device 28 has key switches such as a numeric keypad, and when the user operates these switches, a code signal corresponding to the operation is output as infrared light. The remote control receiving unit 33 receives the infrared signal sent from the remote control device 28, restores the code signal, and inputs it to the CPU 30.

The display panel 34 is provided on the front of the karaoke apparatus main body 21 and includes a matrix display that shows the number of the song currently being performed and the number of reserved songs, a group of LEDs that show the currently set key and tempo, and so on. The panel switches 35 include a numeric keypad for entering song numbers, like that of the general-purpose remote control device 28.

The sound source device 38 generates musical tone signals based on the music data. The music data contains performance data for a plurality of tracks, and the sound source device 38 generates musical tone signals for a plurality of parts simultaneously based on this data. The audio data processing unit 39 generates an audio signal of a specified length and a specified pitch based on the audio data included in the music data. The audio data stores, directly as PCM signals, signal waveforms that are difficult to generate electronically, such as human voices in a backing chorus. The musical tone signals generated by the sound source device 38 and the audio signals reproduced by the audio data processing unit 39 are input to the control amplifier 22.

Two microphones 27a and 27b are also connected to the control amplifier 22, and the singing voices of the karaoke singers are input through them. The control amplifier 22 applies predetermined effects such as echo to each of these audio signals, then amplifies them and outputs them to the speaker 25. The audio signal processing device 23 converts the singing voice signal input from the control amplifier 22 (the signal of one of the microphones) into digital data, cuts out the periodic signal (vowel), and extracts the formants by FFT analysis of this periodic signal. Based on the shape of the periodic waveform, it also identifies which of the vowels a, i, u, e, and o it is, and generates vowel information indicating the result. The extracted formant data and vowel information are input to the CPU 30. The audio signal processing device 23 also has functions for correcting pitch deviations in the singing voice and for creating harmony vocals for other parts. The corrected singing voice and the harmony vocals for other parts are input to the control amplifier 22 again. This correction function may be applied to the signals of both microphones.

The sound source 38 and the control amplifier 22 are controlled, based on the gender degree and estimated age, so that the karaoke performance matches the singer's voice quality.

The character display unit 40 has a VRAM, and expands image data, in which the lyrics of the karaoke song and the like are rendered as character patterns, into a matrix corresponding to the display area of the monitor 26. The expanded matrix data is scanned sequentially and input to the display control unit 41 as a video signal. During a karaoke performance, the CD-ROM changer 24 plays back the background video, and this video signal is also input to the display control unit 41. The display control unit 41 superimposes the character patterns of the lyrics on the background video and displays the result on the monitor 26. After the karaoke song ends, the CPU 30 inputs the fortune text, the wording indicating the gender degree, and the characters of the estimated age to the character display unit 40. The character display unit 40 converts these into a video signal and displays them on the monitor 26.

FIG. 3 shows part of the contents stored on the hard disk 37. The reference database 171, display wording database 172, fortune database 173, and performance control database 174 are set up on the hard disk 37.

The reference database 171 stores the references used to examine the properties of the formants extracted from the singing voice input from the microphone 27. In this embodiment, the references are the first, second, and third formants of each of the vowels a, i, u, e, and o, stored for males overall, for females overall, for males by age group (up to 20, up to 30, up to 40, and 40 and over), and for females by age group (up to 20, up to 30, up to 40, and 40 and over). The formants for each gender and age group can be obtained by sampling a large number of human voices, extracting their formants, and taking the average.
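A minimal sketch of how such a reference table might be built from labeled samples follows. The key layout `(vowel, sex, group)`, the bracket labels, and whether each bracket includes its upper bound are assumptions; the patent specifies only which averages are stored.

```python
from collections import defaultdict


def age_bracket(age):
    """Map an age to one of the four groups (boundary handling is an assumption)."""
    if age <= 20:
        return "-20"
    if age <= 30:
        return "-30"
    if age <= 40:
        return "-40"
    return "40-"


def build_reference(samples):
    """Average formants per (vowel, sex, group).

    `samples` is an iterable of (vowel, sex, age, (f1, f2, f3)).
    Each sample contributes to its sex-wide "all" group and to its age group.
    """
    buckets = defaultdict(list)
    for vowel, sex, age, formants in samples:
        buckets[(vowel, sex, "all")].append(formants)
        buckets[(vowel, sex, age_bracket(age))].append(formants)
    return {
        key: tuple(sum(column) / len(column) for column in zip(*rows))
        for key, rows in buckets.items()
    }
```

Keying the table by vowel first matches the later description in which the CPU reads out only the reference for the vowel that was identified.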

The display wording database shown in part (B) of the figure stores the wording used when the gender degree and estimated age, determined by comparing the formants extracted from the audio signal with the reference, are displayed on the monitor 26. Both the gender degree and the estimated age are calculated as numerical values. The estimated age can be understood when displayed as a number, but a numerical gender degree is hard for ordinary users to interpret, so it is replaced with wording for display. The wording for this purpose is stored in the display wording database.
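Replacing a numerical gender degree with wording could be sketched as a threshold lookup. The five phrases, the boundaries, and the -1 to +1 scale (negative for masculine) are hypothetical; the patent does not list the stored wording.

```python
import bisect

# Hypothetical boundaries on a -1 (masculine) to +1 (feminine) scale.
BOUNDS = [-0.6, -0.2, 0.2, 0.6]
WORDINGS = [
    "a very masculine voice",
    "a somewhat masculine voice",
    "a neutral voice",
    "a somewhat feminine voice",
    "a very feminine voice",
]


def gender_wording(degree):
    """Return the phrase for the interval that contains `degree`."""
    # bisect_right counts how many boundaries are <= degree,
    # which is exactly the index of the matching phrase.
    return WORDINGS[bisect.bisect_right(BOUNDS, degree)]
```

Keeping the boundaries and phrases as plain data mirrors the text's design, in which the wording lives in a database that can be replaced by download without changing the program.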

The fortune database 173 stores the fortune for the current day or the following day as determined by gender and age. This database is downloaded from the distribution center 19 so that the latest fortune can always be displayed. The download may be performed every day, or several days' worth may be downloaded together.

The performance control database 174 stores the ways in which the karaoke performance is controlled, according to the gender degree and estimated age, to bring out the singer's voice quality; rules such as those described above are stored in it. Fuzzy rules may be used to describe these rules.
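The patent mentions fuzzy rules without giving details. As one hedged illustration, the two age rules for the filter cutoff can be written with linear membership functions and a weighted-average defuzzification. The membership breakpoints (20, 30, 40, 50 years) and the three cutoff values are assumptions.

```python
def clamp01(x):
    """Limit a membership grade to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def young(age):
    """Membership in 'young': 1 at age 20 or below, falling to 0 at 40."""
    return clamp01((40.0 - age) / 20.0)


def old(age):
    """Membership in 'old': 0 at age 30 or below, rising to 1 at 50."""
    return clamp01((age - 30.0) / 20.0)


def fuzzy_cutoff(age, bright_hz=6000.0, neutral_hz=4000.0, mellow_hz=2000.0):
    """Blend the rule outputs by their membership grades.

    Rule 1: if young, use a high cutoff (bright).
    Rule 2: if old, use a low cutoff (mellow).
    The remaining weight goes to a neutral cutoff.
    """
    w_young, w_old = young(age), old(age)
    w_neutral = max(0.0, 1.0 - w_young - w_old)
    total = w_young + w_old + w_neutral
    return (w_young * bright_hz + w_old * mellow_hz + w_neutral * neutral_hz) / total
```

Compared with hard thresholds, this makes the cutoff change gradually as the estimated age drifts during a song, which avoids audible jumps when estimates arrive about once per second.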

When a karaoke performance is executed on the karaoke apparatus configured as above, the singing voice input from the microphone 27 is input to the audio signal processing device 23 via the control amplifier 22. The audio signal processing device 23 converts this signal into digital data, locates the sections containing a periodic signal, and cuts them out. It then determines which of the vowels a, i, u, e, and o each section is, for example by matching against sample data for a, i, u, e, and o. It then extracts the formants of the vowel by FFT analysis, and inputs the formant data and the vowel information to the CPU 30.
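The patent does not say how periodic sections are located. A common technique, swapped in here as an assumption, is normalized autocorrelation: a frame is treated as vowel-like when it correlates strongly with a copy of itself shifted by some lag in the expected pitch range. The function names and the 0.8 threshold are illustrative.

```python
import math


def periodicity(frame, min_lag, max_lag):
    """Return (best_lag, score), where score is the highest normalized
    autocorrelation found for lags in [min_lag, max_lag]."""
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        head = frame[: len(frame) - lag]
        tail = frame[lag:]
        # Normalize by the energies of the two overlapping segments.
        denom = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        if denom == 0:
            continue
        score = sum(x * y for x, y in zip(head, tail)) / denom
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


def is_vowel_like(frame, min_lag, max_lag, threshold=0.8):
    """Treat a frame as a periodic (vowel) section if its score is high."""
    return periodicity(frame, min_lag, max_lag)[1] >= threshold
```

The lag range would be chosen from the expected pitch range of singing; for example, at an 8000 Hz sample rate, lags of 20 to 100 samples cover fundamentals from 400 Hz down to 80 Hz.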

Based on the input vowel information, the CPU 30 reads out the reference for that vowel and compares it with the input formant data to determine the gender degree and estimated age of the singing voice. It accumulates the gender degree and estimated age data determined in this way in the RAM 32, and also searches the performance control database 174 to read out a control mode and controls the karaoke performance accordingly. When the karaoke performance ends, it calculates the averages of the gender degrees and estimated ages accumulated in the RAM 32, and uses these values to search the display wording database 172 and the fortune database 173, reading out wording that evaluates the singer's voice quality and today's or tomorrow's fortune. It then displays these on the monitor 26 together with the estimated age.

FIG. 4 is a flowchart showing the operation of the karaoke apparatus during a karaoke performance. When the karaoke performance starts (s1), the singing voice is input from the microphone 27 to the audio signal processing device 23. This singing voice is also output from the control amplifier 22 to the speaker 25 together with the karaoke performance sound synthesized by the sound source 38 and other units. The audio signal processing device 23 extracts the formants from the input singing voice (s3), and the formant data is input to the CPU 30. The CPU 30 reads out the reference data corresponding to the formant data (s4) and compares the two (s5) to determine the gender degree and estimated age (s6). It accumulates the determined gender degree and estimated age (s7), uses them to search the performance control database 174 to determine the control mode (s8), and controls the karaoke performance in progress accordingly (s9). The operations from s2 onward are executed continuously until the karaoke song ends.

When the performance of the karaoke song ends (s10), the accumulated gender degrees and estimated ages are averaged to calculate the average gender degree and average estimated age (s11). Based on these, the display wording database 172 is searched to select the display wording, and the fortune database 173 is searched to read out today's or tomorrow's fortune (s13). These are then displayed on the monitor 26 (s14).
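The ordering of the flowchart steps can be sketched as a single function. The processing stages are passed in as callables, and all of the names are hypothetical; the sketch only illustrates which steps run for every frame during the song and which run once after it ends.

```python
def run_song(frames, extract_formants, estimate, control, report):
    """Process one karaoke song following the step order of FIG. 4.

    extract_formants(frame) -> formants, or None for a non-vowel frame
    estimate(formants)      -> (gender_degree, estimated_age)
    control(gender, age)    -> adjusts the performance in real time
    report(gender, age)     -> shows wording and fortune after the song
    """
    history = []
    for frame in frames:                      # repeated until the song ends
        formants = extract_formants(frame)    # s3
        if formants is None:
            continue                          # skip consonants and silence
        gender, age = estimate(formants)      # s4 to s6
        history.append((gender, age))         # s7
        control(gender, age)                  # s8 and s9
    if not history:                           # nothing to average
        return None
    avg_gender = sum(g for g, _ in history) / len(history)   # s11
    avg_age = sum(a for _, a in history) / len(history)
    report(avg_gender, avg_age)               # s13 and s14
    return avg_gender, avg_age
```

Passing the stages as parameters keeps the control flow testable with simple stubs and leaves the actual analysis and display methods open, as the patent does.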

In the above embodiment, the karaoke performance is controlled and a fortune is displayed based on the gender degree and estimated age determined from the singer's formants, but the control and services performed by the karaoke apparatus are not limited to these.

FIG. 1: Functional block diagram of a karaoke apparatus according to an embodiment of the present invention
FIG. 2: Hardware block diagram of the karaoke apparatus
FIG. 3: Diagram showing part of the configuration of the hard disk of the karaoke apparatus
FIG. 4: Flowchart showing the operation of the karaoke apparatus
FIG. 5: Diagram showing a typical distribution of formant frequencies

Explanation of symbols

1 ... Singing voice input unit, 2 ... Formant extraction unit, 3 ... Formant comparison unit, 4 ... Reference database, 5 ... Fortune database, 6 ... Display unit, 7 ... Karaoke performance control unit, 8 ... Karaoke performance unit,
10 ... Singing voice, 11 ... Formants, 12 ... Reference, 13 ... Gender degree and estimated age,
23 ... Audio signal processing device, 27 ... Singing microphone, 30 ... CPU, 32 ... RAM, 37 ... Hard disk storage device, 40 ... Character display unit,
171 ... Reference database, 172 ... Display wording database, 173 ... Fortune database, 174 ... Performance control database

Claims

A karaoke apparatus comprising:
performance means for executing a karaoke performance;
singing voice input means for inputting a singing voice;
formant extraction means for extracting formants from the singing voice;
estimation means for estimating, based on the extracted formants, a gender degree, which is an index indicating how masculine or how feminine the singing voice is; and
performance control means for controlling, based on the gender degree estimated by the estimation means, elements relating to the timbre of the musical tones generated by the performance means.