JP2007140548A

JP2007140548A - Portrait output device and karaoke device

Info

Publication number: JP2007140548A
Application number: JP2007018526A
Authority: JP
Inventors: Takahiro Kawashima; 隆宏川嶋
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-01-29
Filing date: 2007-01-29
Publication date: 2007-06-07
Anticipated expiration: 2018-08-10
Also published as: JP4808641B2

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke device with an amusement function that anyone can enjoy casually.

SOLUTION: Portrait data and voice formant data of a plurality of singers are managed in a database. When the singing voice of a singing person is input, the formants of the singing voice are analyzed and their similarities to the formant data in the database are calculated. The portraits of two or three singers with high similarity are read out and combined to generate and display a portrait of the singing person. Alternatively, the portrait of the singer with the highest formant similarity is read out and displayed with a message such as "You look like this singer."

COPYRIGHT: (C)2007, JPO & INPIT

Description

The present invention relates to a portrait output device and a karaoke device that take in a voice signal, such as a singing voice, and select or synthesize a portrait from that voice.

Some karaoke devices in practical use not only play karaoke songs but also provide game and service functions that use the voice singing the song. One such function is a scoring game that calculates and displays a score according to how closely the pitch and volume of the singing voice match the pitch and volume indicated by the song's guide melody.

However, because this scoring function ultimately judges how skillfully the singer sings, not everyone could join in casually. In addition, even when the singing is scored and the score is displayed, the singer cannot tell how to use that score as a reference for future singing.

An object of the present invention is to provide a portrait output device that anyone can use casually, and a karaoke device that is casual to use and whose output portrait can serve as a reference for song selection.

The invention of claim 1 comprises: voice input means for inputting a voice; formant analysis means for analyzing the formants of the input voice; storage means that stores a plurality of sample formants in association with portraits of face shapes having the resonance characteristics of those sample formants; and portrait selection means that searches the storage means with the analyzed formants and selects and outputs the portrait corresponding to the closest sample formant.

The invention of claim 2 comprises: voice input means for inputting a voice; formant analysis means for analyzing the formants of the input voice; material storage means for storing portrait material for creating a portrait; and portrait output means that, based on the formants analyzed by the formant analysis means, uses the portrait material to synthesize and output a portrait of a face shape having the resonance characteristics of those formants.

The invention of claim 3 comprises: voice input means for inputting a singing voice; formant analysis means for analyzing the formants of the input voice; singing analysis means for analyzing the singing mode of the input voice; material storage means for storing portrait material for creating a portrait; and portrait synthesis means that, based on the formants analyzed by the formant analysis means and the singing mode analyzed by the singing analysis means, uses the portrait material to synthesize and output a portrait that has a face shape with the resonance characteristics of those formants and that sings in that singing mode.

In the present invention, formant data is extracted from the input voice (vowels). Formant data are the dominant frequency components in the spectrum of an uttered vowel, called the first formant, second formant, and so on in order of increasing frequency. Of these, the formants up to the third are said to contribute to phonemic quality. Formants depend on the shape of the speaker's resonating structures, such as the vocal cords and face, and it is said that people with similar faces have similar voices.
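As a rough illustration of "dominant frequency components in order of increasing frequency", the sketch below picks the lowest local maxima of a spectral envelope. It assumes the envelope has already been smoothed so that its peaks correspond to formants rather than individual harmonics; the function name `pick_formants` and the data layout are hypothetical and do not come from the patent.

```python
def pick_formants(freqs_hz, envelope, count=3):
    """Return the `count` lowest-frequency local maxima of a spectral envelope.

    freqs_hz and envelope are equal-length sequences, with freqs_hz ascending.
    Each result is a (frequency, level) pair; the first is the first formant.
    """
    peaks = []
    for i in range(1, len(envelope) - 1):
        # A local maximum rises from the left and does not rise to the right.
        if envelope[i] > envelope[i - 1] and envelope[i] >= envelope[i + 1]:
            peaks.append((freqs_hz[i], envelope[i]))
    # Peaks were collected in ascending frequency order.
    return peaks[:count]
```

Returning the level together with the frequency is one possible design; a frequency-only result would work equally well for a frequency-only comparison.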

Accordingly, the present invention stores a plurality of formants together with portraits of persons having those formants, and selects and outputs the portrait whose formants are similar to those of the input voice. The output portrait may be presented to the user by displaying it on a monitor or printing it on a printer. This lets the user know what kind of face their voice suggests, which provides amusement. The formants and portraits may be taken from real persons, or may be created by simulation from a resonator model of the head.

In addition, a karaoke device may store portraits of the original singers of karaoke songs together with those singers' formants and, when a user inputs their voice, indicate which singer's voice it resembles by displaying (or printing) that singer's portrait. Because a user who chooses that singer's songs can sing with a voice that suits them, the displayed portrait can serve as a reference for song selection.

The present invention may also store portrait material in the storage means and synthesize a portrait from this material based on the formants extracted from the voice. As material, multiple variants of each component, such as the face outline, eyebrows, eyes, nose, and mouth, may be stored and one of each selected based on the formants. Alternatively, multiple sample portraits may be stored as material, and components such as the face outline, eyebrows, eyes, nose, and mouth may be taken from different portraits and combined. A sample portrait serving as material may also be deformed directly to synthesize the user's portrait. In any case, creating a portrait of a face that would have the extracted formants as its resonance frequencies allows the user's face to be estimated well.

As described above, the present invention extracts the formants of a voice and can display a portrait of the user (singer) or a portrait of a singer with a very similar voice, providing amusement that anyone can enjoy casually. Moreover, once the portrait of a singer with a similar voice is displayed, the user can thereafter choose that singer's songs and sing in a way that suits their voice quality, which makes song selection easier and serves as a reference for karaoke singing.

Embodiments of the present invention will be described with reference to the drawings. FIGS. 1 to 3 are functional block diagrams of a karaoke device according to embodiments of the present invention. This karaoke device stores, as sample data in a sample database 4, portrait data and singing-voice formant data for a plurality of singers. An example of the sample database 4 is shown in FIG. 4. The device takes in the voice sung along with the performance of a karaoke song, compares the formant data extracted from this singing voice (extracted formant data) with the formant data in the sample database (sample formant data), selects a singer with a similar voice, and either displays that singer's portrait or synthesizes and displays a portrait of the singing person based on it.

In FIG. 1, the singing voice input unit 1 includes a microphone for karaoke singing. The singing voice, converted into an electrical signal by the singing voice input unit 1, is input to the formant extraction unit 2 and the karaoke performance unit 6. The formant extraction unit 2 cuts vowels out of the input singing voice and extracts the formants of each vowel. A vowel is a periodic signal that lasts from several tens of milliseconds to several seconds in karaoke singing, so cutting out waveform sections with a constant period distinguishes vowels from consonants, which are short aperiodic signals. The shape of the periodic waveform also identifies which of the vowels a, i, u, e, or o it is. A formant is a dominant frequency component in the frequency spectrum of a vowel; the formants are called the first, second, third, and so on in order of increasing frequency. The formant extraction unit 2 extracts the first to third formants of each cut-out vowel, for example by FFT (fast Fourier transform) analysis. The formant extraction unit 2 extracts formants from the singing voice over the whole karaoke song, accumulates them per vowel, and, after the performance of the song ends, outputs their averages as the extracted formant data.
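The per-vowel accumulation and averaging described above can be sketched as follows. This is a minimal illustration, assuming each analysis frame has already been labeled with its vowel and reduced to a tuple of formant frequencies in Hz; the class name `FormantAccumulator` is hypothetical.

```python
from collections import defaultdict


class FormantAccumulator:
    """Collects formant frames per vowel during a song and averages them afterwards."""

    def __init__(self):
        self._frames = defaultdict(list)  # vowel -> list of (F1, F2, F3) tuples

    def add(self, vowel, formants_hz):
        """Store one frame of formant frequencies for the given vowel."""
        self._frames[vowel].append(tuple(formants_hz))

    def average(self):
        """Return vowel -> tuple of mean formant frequencies (the extracted formant data)."""
        result = {}
        for vowel, frames in self._frames.items():
            n = len(frames)
            # zip(*frames) groups all F1 values, all F2 values, and so on.
            result[vowel] = tuple(sum(column) / n for column in zip(*frames))
        return result
```

Calling `add` once per detected vowel during the performance and `average` once after it ends mirrors the timing described in the text.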

This extracted formant data is input to the formant comparison unit 3, which compares it with the sample formant data in the sample database 4. As shown in FIG. 4, the sample database 4 consists of a portrait database 4b, which stores portrait data of a plurality of singers, and a corresponding formant database 4a, which stores formant data of each singer's singing voice. The formant comparison unit 3 performs a correlation comparison between the extracted formant data and each set of sample formant data, and calculates a similarity indicating how closely the extracted formant data (the singing person's voice) resembles each set of sample formant data (each singer's voice). These similarities are output to the portrait selection unit 5.
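The text does not specify how the correlation comparison is computed. The sketch below therefore substitutes a simple stand-in measure, the mean relative difference of formant frequencies mapped to a score between 0 and 1, purely to illustrate how a similarity per singer might be produced and ranked. The function names and the database layout (singer number mapped to per-vowel formant tuples) are assumptions.

```python
def similarity(extracted, sample):
    """Return a score in (0, 1]; 1.0 means identical formants on the shared vowels.

    extracted and sample map a vowel to a tuple of formant frequencies in Hz.
    Stand-in measure (an assumption): 1 / (1 + mean relative difference).
    """
    diffs = []
    for vowel in extracted.keys() & sample.keys():
        for f_user, f_ref in zip(extracted[vowel], sample[vowel]):
            diffs.append(abs(f_user - f_ref) / f_ref)
    if not diffs:
        return 0.0  # no vowel in common, nothing to compare
    return 1.0 / (1.0 + sum(diffs) / len(diffs))


def rank_singers(extracted, formant_db):
    """Return (singer_number, similarity) pairs, most similar first."""
    scores = {sid: similarity(extracted, data) for sid, data in formant_db.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Any measure that grows as the formants get closer could replace `similarity` without changing `rank_singers`.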

The portrait selection unit 5 selects the portrait of the sample with the highest of the input similarities and reads this portrait data from the portrait database 4b. The portrait data read out by the portrait selection unit 5 is output to and displayed on the display unit 9, where the lyrics display has ended, and is also printed by the printing unit 10. At the same time, a message such as "Your voice resembles that of the singer XX" is displayed and printed so that it can serve as a reference for later song selection. The device may also search for that singer's karaoke songs at this point to assist song selection.

The example of FIG. 1 selects the single sample (singer) with the most similar formants and displays that singer's portrait, but the portraits of several samples with high formant similarity may instead be selected and combined to synthesize one portrait.

FIG. 2 shows a functional block diagram of a karaoke device with this portrait synthesis function. It differs from the functional block diagram of FIG. 1 in that the portrait synthesis unit 15, instead of selecting the portrait of the sample with the highest similarity input from the formant comparison unit 3, selects a plurality of samples with high formant similarity and synthesizes one portrait based on their portrait data.

Based on the input similarities, the portrait synthesis unit 15 selects from the portrait database 4b the portrait data of two or three singers whose voices closely resemble the input voice, and synthesizes a portrait of the singing person from these portraits. A technique such as the component combination method or the morphing method may be used for this synthesis.

The component combination method selects facial components, such as the face outline, eyebrows, eyes, nose, and mouth, from the selected sample portrait data as appropriate and combines them to synthesize a portrait. For example, the face outline may be taken from the portrait of the singer whose first formant is most similar, the mouth from the portrait of the singer whose second formant is most similar, and the nose from the portrait of the singer whose third formant is most similar.
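The example mapping above (first formant to outline, second to mouth, third to nose) can be sketched as follows. The data layout, the names `PART_FOR_FORMANT`, `closest_singer`, and `combine_parts`, and the use of absolute frequency difference as "most similar" are all assumptions made for illustration.

```python
# Which facial part is taken from the singer closest in each formant (index 0 = F1).
PART_FOR_FORMANT = {0: "outline", 1: "mouth", 2: "nose"}


def closest_singer(user_formants, singers, index):
    """Return the id of the singer whose formant at `index` is nearest the user's."""
    return min(
        singers,
        key=lambda sid: abs(singers[sid]["formants"][index] - user_formants[index]),
    )


def combine_parts(user_formants, singers):
    """Build a portrait (part name -> part data) from the best-matching singers."""
    portrait = {}
    for index, part in PART_FOR_FORMANT.items():
        sid = closest_singer(user_formants, singers, index)
        portrait[part] = singers[sid]["parts"][part]
    return portrait
```

Parts not covered by the mapping, such as eyebrows and eyes, would need their own selection rule, which the text leaves open.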

In this example, facial components such as the face outline, eyebrows, eyes, nose, and mouth are selected from among several portraits. Alternatively, a separate database may be kept for each component, and a suitable item picked from each based on the extracted formants to compose the portrait.

The morphing method synthesizes an intermediate figure of the selected portrait data. For each facial component, such as the face outline, eyebrows, eyes, nose, and mouth, an intermediate shape of the several portraits is determined and placed at a position intermediate between its positions in those portraits. The synthesis is weighted so that the result is closer to the sample portraits with higher formant similarity.
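One simple way to realize such a similarity-weighted intermediate shape is a weighted average of corresponding outline points. The sketch below assumes every portrait describes each part with the same number of (x, y) points in the same order; that representation and the function name `morph` are illustrative assumptions, not details given in the text.

```python
def morph(portraits, similarities):
    """Blend portraits point by point, weighting each by its similarity.

    portraits: list of dicts mapping part name -> list of (x, y) points.
    similarities: one positive weight per portrait; higher pulls the result closer.
    """
    total = sum(similarities)
    weights = [s / total for s in similarities]
    result = {}
    for part in portraits[0]:
        n_points = len(portraits[0][part])
        result[part] = [
            (
                sum(w * p[part][i][0] for w, p in zip(weights, portraits)),
                sum(w * p[part][i][1] for w, p in zip(weights, portraits)),
            )
            for i in range(n_points)
        ]
    return result
```

Because both the shape and the position of a part are encoded in its point coordinates, a single weighted average covers the intermediate shape and the intermediate placement together.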

The portrait synthesized in this way is displayed on the display unit 9 after the karaoke performance has ended and is printed by the printing unit 10. Displaying the names of the selected sample singers and their blending ratios can further enhance the amusement effect.

Besides synthesizing a portrait by combining components, a model of the head as a resonator may be stored; when a singing voice is input, a head resonator that would produce that voice (its formants) is simulated, and a portrait is rendered from it.

In the example of FIG. 2, the portrait is synthesized according to formant similarity; in addition, the shape of the portrait may be adjusted according to the singing mode of the singing person. The singing mode refers to the manner of singing, such as legato and accents, and to volume dynamics.

FIG. 3 is a functional block diagram of a karaoke device with this function. It differs from FIG. 2 in that it includes a singing mode detection unit 11 and in that the synthesis performed by the portrait synthesis unit 16 is controlled based on the singing mode data detected and output by the singing mode detection unit 11. The singing voice signal, converted into an electrical signal by the singing voice input unit 1, is input to the singing mode detection unit 11 as well as to the formant extraction unit 2 and the performance unit 6. The singing mode detection unit 11 receives the data of the song being played from the performance unit 6 and, with reference to this song data, detects how the singing person is singing. As noted above, the singing mode refers to the manner of singing, such as legato and accents, and to volume dynamics. The detected singing mode information is input to the portrait synthesis unit 16. The portrait synthesis unit 16 synthesizes a portrait according to formant similarity as in the example of FIG. 2, and then deforms the synthesized portrait according to the singing mode information. For example, for a person singing with strong accents and wide dynamics the eyebrows may be made thicker, and for a person singing legato the outer corners of the eyes may be lowered. Creating the portrait to match the singing mode as well as the formants in this way yields a portrait synthesis function that is more accurate or more amusing.
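The two example deformations (thicker eyebrows for strongly accented singing, lowered eye corners for legato singing) could be expressed as a small rule set. In the sketch below the `Portrait` fields, the 0-to-1 mode scores, the thresholds, and the deformation amounts are all hypothetical values chosen only to make the idea concrete.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Portrait:
    eyebrow_thickness: float = 1.0   # relative stroke thickness
    eye_corner_offset: float = 0.0   # negative means the outer corner is lowered


def apply_singing_mode(portrait, accent_strength, legato_ratio,
                       accent_threshold=0.7, legato_threshold=0.7):
    """Return a deformed copy of the portrait based on singing-mode scores in 0..1."""
    if accent_strength >= accent_threshold:
        # Strong accents and wide dynamics: thicken the eyebrows.
        portrait = replace(portrait,
                           eyebrow_thickness=portrait.eyebrow_thickness * 1.5)
    if legato_ratio >= legato_threshold:
        # Legato singing: lower the outer corners of the eyes.
        portrait = replace(portrait,
                           eye_corner_offset=portrait.eye_corner_offset - 1.0)
    return portrait
```

Keeping the deformation as a separate step after synthesis matches the order described in the text, where the synthesized portrait is deformed afterwards.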

FIG. 5 is a hardware block diagram of a karaoke device that realizes the above functions. The functions shown in FIGS. 1 to 3 are realized by executing a program such as that shown in FIG. 6 on this hardware.

This karaoke device comprises a karaoke device main body 21, a control amplifier 22, an audio signal processing device 23, a CD-ROM changer 24, a speaker 25, a monitor 26, a microphone 27, an infrared remote control device 28, and a printer 29. The karaoke device main body 21 controls the operation of the entire karaoke device. A CPU 30, the control device of the karaoke device main body 21, is connected via an internal bus to a ROM 31, a RAM 32, a hard disk storage device 37, a communication control unit 36, a remote control receiving unit 33, a display panel 34, a panel switch 35, a sound source device 38, an audio data processing unit 39, a pattern development unit 40, and a display control unit 41. The external devices, namely the control amplifier 22, the audio signal processing device 23, and the CD-ROM changer 24, are connected to the CPU 30 via interfaces.

The ROM 31 stores the startup program and other data needed to start the device. The system program that controls the operation of the device, the karaoke performance program, and the like are stored in the hard disk storage device 37. When the karaoke device is powered on, the startup program loads the system program and the karaoke performance program into the RAM 32.

The hard disk storage device 37 stores the program files of the above programs and a song database consisting of a large number of song data items, as well as a sample database 37a. The sample database 37a has a formant database that stores the formant data of the original singers of the karaoke songs and a portrait database that stores a portrait of each original singer. Each original singer is identified by a singer number.

The RAM 32 contains a program storage area into which programs are loaded from the hard disk storage device 37 at startup, a current-song data storage area into which the song data of the karaoke song being played is loaded, and a formant accumulation area 32a in which the formants detected during the karaoke performance are accumulated.

The communication control unit 36 is connected to a distribution center 19 via an ISDN line. The distribution center 19 periodically calls the karaoke device and downloads to it the song data of new songs, updated control programs, and the like. The sample database 37a is also downloaded from the distribution center 19.

The remote control device 28 has key switches such as a numeric keypad. When the user operates these switches, a code signal, such as a song number, corresponding to the operation is output by infrared. The remote control receiving unit 33 receives the infrared signal sent from the remote control device 28, restores the code signal, and inputs it to the CPU 30.

The song number of each karaoke song (song data item) consists of six digits: a four-digit singer number followed by a two-digit per-singer song number. The upper four digits therefore readily identify which singer's song a karaoke song is. The singer portraits and formants stored in the sample database 37a are also identified by this four-digit singer number.
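The six-digit numbering scheme can be illustrated with a short sketch. It assumes song numbers are handled as strings so that leading zeros are preserved; the function names are hypothetical.

```python
def split_song_number(song_number):
    """Split a 6-digit song number into (4-digit singer number, 2-digit song number)."""
    if len(song_number) != 6 or not (song_number.isascii() and song_number.isdigit()):
        raise ValueError(f"expected six ASCII digits, got {song_number!r}")
    return song_number[:4], song_number[4:]


def songs_by_singer(song_numbers, singer_number):
    """Return the song numbers whose upper four digits match the singer number."""
    return [n for n in song_numbers if split_song_number(n)[0] == singer_number]
```

A lookup like `songs_by_singer` is one way the device could list the songs of the singer whose voice was found to be most similar.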

The display panel 34 is provided on the front of the karaoke device main body 21 and includes a matrix display that shows the number of the song currently playing and the number of reserved songs, and a group of LEDs that show the currently set key and tempo. The panel switch 35 has a numeric keypad for entering song numbers and other keys similar to those of the general-purpose remote control device 28.

The sound source device 38 forms musical tone signals based on the song data. The song data contains performance data for multiple tracks, and the sound source device 38 forms the tone signals of multiple parts simultaneously from this data. The audio data processing unit 39 forms an audio signal of a specified length and pitch based on audio data contained in the song data. The audio data stores, directly as PCM signals, waveforms that are difficult to generate electronically, such as the human voices of a backing chorus. The tone signals formed by the sound source device 38 and the audio signals reproduced by the audio data processing unit 39 are input to the control amplifier 22.

Two microphones 27a and 27b are also connected to the control amplifier 22, through which the singing voices of the karaoke singers are input. The control amplifier 22 applies predetermined effects, such as echo, to each of these audio signals, amplifies them, and outputs them to the speaker 25. The audio signal processing device 23 converts the singing voice signal input from the control amplifier 22 (the signal of one of the microphones) into digital data, cuts out the periodic signals (vowels), and extracts the formants by FFT analysis of these periodic signals. It also identifies, from the shape of the periodic waveform, which of the vowels a, i, u, e, or o it is, and generates vowel information indicating the result. The extracted formant data and vowel information are input to the CPU 30. The audio signal processing device 23 also has functions for correcting pitch deviations in the singing voice and for creating harmony vocals for other parts. The corrected singing voice and the harmony vocals are fed back to the control amplifier 22. This correction function may be applied to the signals of both microphones.

The pattern development unit 40 has a VRAM and expands pattern data input from the CPU 30 into a matrix corresponding to the display area of the monitor 26. The pattern data includes lyrics (character pattern) data during the performance of a karaoke song and portrait data after the performance ends. The expanded matrix data is scanned sequentially and input to the display control unit 41 as a video signal. During a karaoke performance the CD-ROM changer 24 plays back background video, and this video signal is also input to the display control unit 41. The display control unit 41 superimposes the character patterns of the lyrics on the background video and displays the result on the monitor 26. After the karaoke performance ends, no background video is input, so the display control unit 41 shows a blue background and displays on it the portrait data input from the CPU 30.

When a karaoke performance is executed on the karaoke device configured as above, the singing voice input from the microphone 27 is input to the audio signal processing device 23 via the control amplifier 22. The audio signal processing device 23 digitizes this signal, locates the sections containing periodic signals, and cuts them out. It then determines which of the vowels a, i, u, e, or o each section is, for example by matching against sample data for a, i, u, e, and o, and extracts the formants of that vowel by FFT analysis. The formant data and the vowel information are input to the CPU 30.

The CPU 30 accumulates the formant data for each vowel. When the karaoke performance ends, the formant data accumulated in the formant accumulation area 32a of the RAM 32 is averaged for each vowel to calculate the extracted formant data values. These values are compared with the sample formant data read from the sample database 37a to determine the similarity to each set of sample formant data. Based on these similarities, the portrait database is accessed, and the one portrait with the most similar formants is selected and displayed. Alternatively, several portraits with similar formants are read out, and their components are combined or morphed to synthesize one portrait. At this point the portrait may also be deformed according to the singing mode.

FIG. 6 is a flowchart showing the operation of the karaoke device, specifically the portrait selection and synthesis operation performed when a karaoke song is played. When a karaoke song is played and the singing person sings into the microphone 27, the singing voice is captured and the formant data of its vowels is extracted (s1). The extracted formants are then compared with the sample formants in the formant database 4a to obtain the similarities (s2). The formant data is extracted continuously from the start to the end of the karaoke song and the accumulated data is averaged to obtain the extracted formant data, but formant data extracted from only part of the song may be used as the extracted formant data instead.

A portrait is either synthesized or selected according to the similarity obtained by the correlation comparison, so the apparatus first determines which mode is currently set (s3). This mode may be selectable by the user, or it may be set by an attendant or set automatically online. In the selection mode, the portrait of the sample with the highest similarity in the above comparison is selected and read from the portrait database 4b (s4), then displayed on the monitor 26 and printed out (s5).

In the synthesis mode, on the other hand, the portrait data of the two or three samples that showed high similarity in the above comparison is read from the portrait database 4b (s6), and a portrait of the singer is synthesized from it (s7). The synthesis may use the parts combination method or the morphing method described above. The synthesized portrait is displayed on the monitor 26 and printed out (s5).
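The branch between the selection mode (s4) and the synthesis mode (s6, s7) might look like the sketch below. The function name, the mode strings, and in particular the idea of weighting each portrait by its normalized similarity are assumptions made for illustration; the specification only states that two or three similar portraits are combined by parts combination or morphing.

```python
def choose_portraits(scores, mode, top_n=3):
    """Return [(singer_id, blend_weight)] for the given mode.

    scores: dict mapping singer_id -> similarity (higher means more similar).
    mode: "select" picks the single best sample (s4);
          "synthesize" picks the top_n samples (s6) and assigns
          assumed blend weights for the later synthesis step (s7).
    """
    if not scores:
        raise ValueError("no samples to compare against")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if mode == "select":
        return [(ranked[0][0], 1.0)]
    if mode == "synthesize":
        top = ranked[:top_n]
        total = sum(score for _, score in top)
        # Assumes similarities are positive, so total is non-zero.
        return [(singer, score / total) for singer, score in top]
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical similarity scores from step s2.
scores = {"singer_1": 0.5, "singer_2": 0.25, "singer_3": 0.25, "singer_4": 0.1}
selected = choose_portraits(scores, "select")
blend = choose_portraits(scores, "synthesize")
```

In either mode the result would then be passed to the display and print step (s5); the weights matter only if the synthesis step is a morph that accepts them.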

In this embodiment, the first, second, and third formants are used, but the first and second formants alone may be used, or fourth and higher-order formants may also be used. The formant database shown in FIG. 4 stores both formant frequencies and formant levels, but it may store formant frequencies only. A continuous spectral waveform may also be used as the formant.
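The variations listed above (how many formants to use, and whether to include levels) can be expressed as options when building the vector that is compared against the database. The sketch below is hypothetical: the function name, parameter names, and the flat list layout are not taken from the specification.

```python
def feature_vector(formants, num_formants=3, use_levels=True):
    """Build a comparison vector from measured formants.

    formants: list of (frequency_hz, level_db) pairs, ordered from the
              first formant upward.
    num_formants: how many formants to include (2, 3, 4, ...).
    use_levels: if False, only the formant frequencies are kept.
    """
    if len(formants) < num_formants:
        raise ValueError("not enough formants measured")
    vector = []
    for frequency, level in formants[:num_formants]:
        vector.append(frequency)
        if use_levels:
            vector.append(level)
    return vector

# Hypothetical measurement of four formants for one vowel.
measured = [(800.0, -6.0), (1200.0, -12.0), (2500.0, -20.0), (3500.0, -30.0)]
default_vector = feature_vector(measured)
frequencies_only = feature_vector(measured, num_formants=2, use_levels=False)
```

Whichever options are chosen, the sample database entries would need to be built with the same options so that the two vectors are comparable.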

FIG. 7 is a flowchart showing the song selection support operation of the karaoke apparatus. In this operation, the formants of the user's (singer's) voice are extracted, songs by a singer whose voice has formants closely resembling the extracted formants are selected, and those songs are presented to the user.

First, formants are extracted from the voice input through the microphone 27 (s11). This input voice may be the singing voice of a karaoke song or a voice input specifically for song selection. It is assumed that the formant song selection mode has been set on the karaoke apparatus in advance. The extracted user formants are compared with the sample formants stored in the sample database 37a (s12), and the most similar sample data is chosen (s13). The singer number of this sample data is read out (s14), the song database is searched with this singer number, and the karaoke songs with this singer number are extracted (s15). These karaoke songs are then displayed as a list on the monitor (s16). When the user enters the number of a listed karaoke song from the remote controller (s17), that song is selected and played (s18).
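Steps s13 to s16 amount to a lookup from the best-matching sample to a singer number and then to that singer's songs. The following is a minimal sketch under assumed data structures; the dictionary layouts, key names, and example entries are hypothetical and do not reflect the actual format of the sample database or the song database.

```python
def songs_for_voice(similarities, singer_of_sample, song_db):
    """Return the karaoke songs of the singer whose sample is most similar.

    similarities: dict sample_id -> similarity to the user's formants (from s12).
    singer_of_sample: dict sample_id -> singer number.
    song_db: list of dicts with "song_no", "title", and "singer_no" keys.
    """
    best_sample = max(similarities, key=similarities.get)  # s13
    singer_no = singer_of_sample[best_sample]              # s14
    return [song for song in song_db if song["singer_no"] == singer_no]  # s15

# Hypothetical comparison results and databases.
similarities = {"sample_a": 0.42, "sample_b": 0.91}
singer_of_sample = {"sample_a": 101, "sample_b": 202}
song_db = [
    {"song_no": 1, "title": "Song X", "singer_no": 101},
    {"song_no": 2, "title": "Song Y", "singer_no": 202},
    {"song_no": 3, "title": "Song Z", "singer_no": 202},
]
candidates = songs_for_voice(similarities, singer_of_sample, song_db)
for song in candidates:  # s16: list the candidate songs for the user
    print(song["song_no"], song["title"])
```

The user's choice from this list (s17) would then be handed to the normal song request path (s18), which is outside the scope of this sketch.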

The portraits stored in the portrait database 4b are not limited to line drawings such as those shown in FIG. 4; photographic data or the like may also be used.

In addition to the invention described in the claims, this embodiment also describes an invention in which sample formants are associated with the music data of karaoke songs, a sample formant similar to the formants of the voice input by the user is searched for, and the corresponding music data is selected. The association between sample formants and music data may be made, for example, by associating a singer's formants (the sample formants) with the karaoke songs that singer sings. Alternatively, the association may be based on voice quality and the mood of the song.

As a result, the most suitable karaoke song can be selected automatically simply by inputting a voice. The song selection means is not limited to selecting a single song; it also includes means that extract a plurality of songs as candidates and let the user choose one of them. In either case, the user can select a song easily and can select a karaoke song that suits his or her own voice quality.

FIG. 1: Functional block diagram of a karaoke apparatus according to an embodiment of the present invention
FIG. 2: Functional block diagram of a karaoke apparatus according to an embodiment of the present invention
FIG. 3: Functional block diagram of a karaoke apparatus according to an embodiment of the present invention
FIG. 4: Diagram showing an example of the sample database
FIG. 5: Hardware block diagram of the karaoke apparatus
FIG. 6: Flowchart showing the operation of the karaoke apparatus
FIG. 7: Flowchart showing the operation of the karaoke apparatus

Explanation of symbols

1 ... Singing voice input unit
2 ... Formant extraction unit
3 ... Formant comparison unit
4 ... Sample database
5 ... Portrait selection unit
9 ... Display unit
10 ... Printing unit
11 ... Singing mode detection unit
15, 16 ... Portrait synthesis unit
23 ... Audio signal processing device
27 ... Singing microphone
29 ... Printer
30 ... CPU
32 ... RAM
32a ... Formant accumulation storage area
37 ... Hard disk storage device
37a ... Sample database
40 ... Pattern development unit

Claims

A portrait output device comprising:
voice input means for inputting a voice;
formant analysis means for analyzing the formants of the input voice;
storage means for storing a plurality of sample formants in association with portraits of face shapes having the resonance characteristics of those sample formants; and
means for searching the storage means with the analyzed formants, selecting the portrait corresponding to the closest sample formant, and outputting that portrait.

A portrait output device comprising:
voice input means for inputting a voice;
formant analysis means for analyzing the formants of the input voice;
material storage means for storing portrait material for creating a portrait; and
portrait output means for synthesizing and outputting, using the portrait material and based on the formants analyzed by the formant analysis means, a portrait of a face shape having the resonance characteristics of those formants.

A karaoke device comprising:
voice input means for inputting a singing voice;
formant analysis means for analyzing the formants of the input voice;
singing analysis means for analyzing the singing mode of the input voice;
material storage means for storing portrait material for creating a portrait; and
portrait synthesis means for synthesizing and outputting, using the portrait material and based on the formants analyzed by the formant analysis means and the singing mode analyzed by the singing analysis means, a portrait that has a face shape having the resonance characteristics of those formants and that is singing in that singing mode.