JP2009244607A

JP2009244607A - Duet part singing generation system

Info

Publication number: JP2009244607A
Application number: JP2008091207A
Authority: JP
Inventors: Takahiro Aoyanagi; 孝裕青柳; Tomoaki Nakamura; 友昭中村
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2008-03-31
Filing date: 2008-03-31
Publication date: 2009-10-22
Anticipated expiration: 2028-03-31
Also published as: JP5193654B2

Abstract

PROBLEM TO BE SOLVED: To provide a system that generates ideal duet singing for karaoke duet songs. A user can sing a virtual duet of any desired duet song with a desired duet partner, for example a close friend or acquaintance, even when that partner is absent and regardless of the partner's singing ability; the key and tempo of the reproduced singing voice can be changed; and the generated part is sung with accurate pitch and timing, like a professional singer.

SOLUTION: In a karaoke machine, the system synthesizes the duet part singing from a user's singing voice obtained beforehand. It comprises a user ID obtaining means, a user-specific phoneme data extraction and management means, a partner designation means, and a duet part singing creation means. The singing voice is synthesized from phoneme data extracted from the recorded singing voice and from score data for voice synthesis created from the note data and lyrics data of the duet song, so that the duet part singing can be output as audio.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a system that, when a user sings a duet song on a karaoke apparatus, can generate the partner's duet part singing by speech synthesis.

In current online karaoke systems, most karaoke songs played by a karaoke apparatus are sung by a single singer, but the catalog also includes so-called "duet" songs, which a man and a woman sing in alternation. Singing such a duet song requires two singers, one male and one female, but a partner singer is not always available.

For such cases, a system has been proposed in which, in addition to the accompaniment track, a male singing voice signal track and a female singing voice signal track are provided separately. The female singing track is selected when a man sings and the male singing track is selected when a woman sings, so that the singing voice of the partner is obtained (see Patent Document 1).

The present applicant has also proposed a karaoke system that can inform a karaoke user that, even when the duet partner the user wants is not present, the user can sing a favorite duet song with singing voice that the duet partner recorded in advance (see Patent Document 2).
Patent Document 1: JP-A-5-109187
Patent Document 2: JP 2006-208961 A

In the technique disclosed in Patent Document 1, male and female voice signals are recorded on multiple channels so that they can be played back continuously. When a man sings a duet song, designating a female partner cancels playback of the male duet part, and the female singing is played back when the female duet part is reached.

A system configured this way must handle both the case where a man sings the duet song and the case where a woman sings it, so it needs both a male singing track and a female singing track in addition to the accompaniment track. The resulting volume of audio data requires a storage medium with a large capacity. Moreover, in this technique the duet partner's voice is a pre-recorded performance by a professional singer. A virtual duet (a one-person duet) is indeed possible without a partner, but the partner is limited to a professional singer. The user therefore cannot sing a virtual duet with a person of the user's choosing, such as a close friend or acquaintance.

In this respect, the technique disclosed in Patent Document 2 allows a user to sing a virtual duet with a person the user chooses. It acquires the singing voice signal from the microphone input while a karaoke song is being played and records it on a hard disk device as singing voice data. The recorded singing voice data is managed by a duet partner management table, and the user designates the singing voice of the desired duet partner by selecting an icon displayed on the remote control device.
With this technique, however, the desired duet partner must naturally be able to sing the duet song that the user wants. For a duet song the desired partner cannot sing, no virtual duet with that partner is possible, so the choice of duet songs is limited.
Furthermore, in the prior art the recorded singing voice data is based on the voice as actually sung, so the singer's key, tempo, and level of skill are recorded as they are. The reproduced singing voice of the duet partner can hardly be expected to harmonize with the user's own singing.

Meanwhile, speech synthesis technology has advanced in recent years, and singing synthesizers that sample an individual's voice and reproduce singing that the person never actually performed have reached high performance. For example, JP 2007-240564 A discloses the following. Speech segment data representing various speech segments is sampled and stored in a speech segment (phoneme) database. Based on song data input by the user, which includes note data and lyrics data, a singing synthesis score (score data) is generated that arranges in time series, along the progression of the song, information specifying the speech segments, sounding timings, and pitches used for singing voice synthesis. The speech segment data corresponding to the segments specified in the singing synthesis score is read from the speech segment database, and singing voice is synthesized by applying predetermined pitch conversion and segment concatenation.

The object of the present invention is therefore to use this advanced speech synthesis technology to provide a system for karaoke duet songs with the following properties. A user can sing a virtual duet of any duet song (among the supported songs) with a desired duet partner, for example a close friend or acquaintance, even if that person is absent and regardless of whether that person can sing the song. The generated part can be matched to the key and tempo the user selects when singing. The system generates ideal duet singing whose pitch and timing do not drift, like that of a professional singer.
In view of this object, the present inventors solve the problem by the means described below. The duet part singing generation system according to claim 1 of the present invention is a system that includes note data and lyrics data corresponding to a plurality of duet parts and that synthesizes the duet part singing of a duet song played according to a performance sequence. The system acquires the user ID of a user logged in at any karaoke apparatus to identify the user. When that user sings any song, it extracts phoneme data from the singing voice acquired by a predetermined recording means and manages the phoneme data in a user-specific phoneme data management table, linked to the user's user ID. When any user selects a predetermined duet song, the system can accept that user's designation of a duet partner. When a designation is accepted, the system retrieves the partner's phoneme data from the user-specific phoneme data management table based on the duet partner's user ID, synthesizes singing voice from that phoneme data and from score data for speech synthesis created from the note data and lyrics data of the duet song, and generates the duet part singing so that it can be output as audio.

The duet part singing generation system according to claim 2 of the present invention is the invention of claim 1 in which expression data, in addition to phoneme data, is extracted from the singing voice captured by the predetermined recording means and is managed in the user-specific phoneme data management table, linked to the user's user ID. Based on the user ID, the expression data is retrieved from the user-specific phoneme data management table together with the phoneme data, singing voice is synthesized from these and from the score data for speech synthesis created from the note data and lyrics data of the duet song, and the duet part singing is generated so that it can be output as audio.

According to the invention of claim 1, when a user sings any song, phoneme data is extracted from the singing voice acquired by the predetermined recording means and managed per user. The phoneme data of the duet partner designated by a user is retrieved, singing voice is synthesized from it and from the score data for speech synthesis created from the note data and lyrics data of the selected duet song, and the duet part singing is generated so that it can be output as audio. As a result, regardless of whether the duet partner can sing the duet song, a user can sing a virtual duet of any duet song with a desired partner, such as a close friend or acquaintance, using phoneme data extracted from that person's singing of any song. In addition, unlike the prior art, there is no need to hold the large-volume singing voice recordings of the male part or female part. Relatively small phoneme data is used instead, which reduces the amount of data stored for audio output of a duet song. Furthermore, because the voice is output through a synthesizer standard, for example the MIDI (registered trademark) standard, and its sound source, the key and tempo settings can be changed (matched) and the pitch and timing do not drift, so ideal duet singing can be generated.
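Because the duet part is held as note data rather than as a recording, a key or tempo change amounts to a transformation of the notes before synthesis. The following is a minimal sketch of that idea only. The `Note` record, its fields, and the `retarget` function are hypothetical names chosen for illustration; the patent does not specify a data layout.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    """One note of a duet part (hypothetical layout, for illustration only)."""
    pitch: int       # MIDI note number, 0-127
    start: float     # onset in seconds at the original tempo
    duration: float  # length in seconds at the original tempo


def retarget(notes, semitones=0, tempo_ratio=1.0):
    """Return notes transposed by `semitones` and played `tempo_ratio` times faster."""
    if tempo_ratio <= 0:
        raise ValueError("tempo_ratio must be positive")
    out = []
    for n in notes:
        pitch = n.pitch + semitones
        if not 0 <= pitch <= 127:
            raise ValueError(f"transposed pitch {pitch} is outside the MIDI range")
        # Onsets and lengths shrink together, so the rhythm is preserved.
        out.append(replace(n, pitch=pitch,
                           start=n.start / tempo_ratio,
                           duration=n.duration / tempo_ratio))
    return out


part = [Note(60, 0.0, 0.5), Note(64, 0.5, 1.0)]
shifted = retarget(part, semitones=-2, tempo_ratio=2.0)
```

Here `shifted` holds the same two notes a whole tone lower and at double speed. A recorded voice track would need audio pitch shifting and time stretching to achieve the same result.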

According to the invention of claim 2, in addition to the effects of the invention of claim 1, expression data is added, so individuality can be given to the reproduced singing voice, the partner's singing voice can be made into what the singer wants, and the amusement value of the karaoke apparatus is improved.

Embodiments of the present invention are described below based on an example that assumes an online karaoke apparatus. The invention is not limited to this, however, and achieves equivalent effects when implemented in a stand-alone karaoke apparatus.

As shown in FIG. 1, the karaoke apparatus 1 embodying the present invention consists mainly of a central control means 2, which controls the operation of the whole apparatus, and various devices connected to it. The central control means 2 incorporates a communication control means 13 for controlling communication with the karaoke host apparatus, an operation control means 16 for processing operation signals sent from the remote control device R to the karaoke apparatus 1, a singing voice recording means 17 for capturing a user's singing voice, a partner designation means 18 for designating a duet partner, a song display means 19 for displaying information about songs so that they can be selected, and so on, and controls them in an integrated manner. The apparatus also incorporates a hard disk device 4, RAM 5, a performance sound source 6, a mixer 7, a sound system 8, a PCM decoder 8, an MPEG decoder 11, a synthesis circuit 12, and, as the user ID acquisition means, an ID card reading/writing device 3.

The performance sound source 6 forms musical tone signals according to performance data input through the song sequencer processing of the sequencer 15, which the central control means 2 executes. The musical tone signals are input to the mixer 7, which mixes, in a suitable balance, the multiple musical tone signals generated by the performance sound source 6 and the user's singing voice signal input from the karaoke microphone M via the A/D converter 9. The mixed digital audio signal is input to the sound system 8. The sound system 8 has a power amplifier; it converts the input digital signal to an analog signal, amplifies it, and emits the singing voice from the speaker S.
The A/D converter 9 is an electronic circuit that, when a song selected with a user ID attached at selection time is played (it need not be a duet song; any song will do), digitizes the singing voice input from the karaoke microphone M and sends it to the singing voice recording means 17. The singing voice data captured by the singing voice recording means 17 is stored as singing voice data 23, managed in the user-specific singing voice management table T1 on the HDD 4 in association with the user ID attached at selection time. The user-specific phoneme data extraction and management means 24 then extracts from the singing voice data 23 at least phoneme data 25, and preferably also expression data (not shown) as appropriate, and manages them in the user-specific phoneme data management table T2 in association with the user ID.

The hard disk device 4 stores a song database 19 together with the performance data 20, video data (not shown), and other data it manages. Each item of video data is, for example, encoded in MPEG format. A background video playback means (not shown) executed by the central control means 2 performs playback processing, reading the data and inputting it to the MPEG decoder 11. The MPEG decoder 11 converts the input MPEG data into an NTSC signal and inputs it to the synthesis circuit 12, which composites OSD elements such as lyric captions and score displays onto the background video signal; the composited video signal is shown on the display means. The performance data 20 of each song (not only duet songs) stores note data and lyrics data (not shown) and, where appropriate, song singing-style data in which singing techniques such as vibrato and intonation are encoded as frequency information or the like.
Specifically, the database has a song table that associates song ID, song title, and artist ID (artist name), and it stores song data (files), each named by its song code, in which the note data of the karaoke song in a predetermined format managed by song ID (for example, note data in MIDI (registered trademark) format) and the lyrics data (lyric caption data) are synchronized. For a predetermined duet song, score data 21 for singing voice synthesis is created from its note data and lyrics data and stored per song. The score data can be created at various times. In this embodiment it is created in advance and embedded in the song data, but the invention is not limited to this; for example, the score data of a song may be created when that duet song is registered in the reservation queue.
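One way to picture the score data is as the note data of each duet part paired with the lyric syllable sung on each note. The sketch below illustrates that pairing under loud assumptions: the `ScoreEvent` layout, the tick-based timing, and the one-syllable-per-note rule are all hypothetical simplifications, not the format used by the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreEvent:
    """One entry of the synthesis score (hypothetical layout)."""
    part: str        # e.g. "male" or "female"
    start: int       # onset in ticks
    duration: int    # length in ticks
    pitch: int       # MIDI note number
    syllable: str    # lyric syllable sung on this note


def build_score(notes, lyrics):
    """Pair each part's notes with its syllables, one syllable per note.

    notes:  {part: [(start, duration, pitch), ...]}
    lyrics: {part: [syllable, ...]}
    """
    score = []
    for part, part_notes in notes.items():
        syllables = lyrics.get(part, [])
        if len(syllables) != len(part_notes):
            raise ValueError(f"part {part!r}: {len(part_notes)} notes but "
                             f"{len(syllables)} syllables")
        for (start, duration, pitch), syllable in zip(part_notes, syllables):
            score.append(ScoreEvent(part, start, duration, pitch, syllable))
    # Time order across parts, so the score follows the progression of the song.
    return sorted(score, key=lambda e: (e.start, e.part))


notes = {"male": [(0, 480, 57), (480, 480, 59)], "female": [(960, 960, 64)]}
lyrics = {"male": ["a", "i"], "female": ["ne"]}
score = build_score(notes, lyrics)
```

Because the result depends only on the song, it can be built once ahead of time or when the song enters the reservation queue, matching the two timings the text mentions.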

The sequencer 15 reads the performance data identified by the song code from the performance data 20 on the hard disk device 4. It consists mainly of a song sequencer and a lyrics sequencer. The song sequencer reads track data such as the performance data track and guide melody track from the performance data and plays the karaoke song by controlling the performance sound source 6 with this data. The lyrics sequencer has a character pattern creation means; it reads the lyrics track data from the performance data, creates a lyric caption image pattern from it, and outputs the pattern to the synthesis circuit 12.

The karaoke apparatus 1 comes with a multifunction karaoke remote control device R that has a song search function. Using the remote control device R, the user sends the desired performance reservation command to the karaoke apparatus 1 by infrared signal. When the operation control means 16 detects this operation signal, it performs the corresponding processing. For example, when a song code is entered on the remote control device R, it treats the input as a karaoke song request and registers the song code in the reservation queue in RAM 5.

FIG. 2 is a front perspective view of the exterior of the remote control device R in this example. The remote control device R has a reading/writing device for ID cards carrying a contactless IC module, which constitutes the user ID acquisition means 3. Strictly speaking, the ID card reading/writing device is the interface of the user ID acquisition means. A LAN port is provided at a predetermined position on the lower back of the remote control device, and various data is transmitted over a LAN cable. The read information is checked against a user database (not shown) via a dedicated communication cradle (not shown) that also serves as a charger, and the user ID is actually regarded as acquired only once the ID has been authenticated there. The user ID acquisition means is not limited to this; it may be, for example, a reading/writing device for a mobile phone with a built-in IC module, a password input device, or a device with a biometric identification function such as fingerprint or voice recognition.

The front of the remote control device R has a user interface built around a display screen D2, a liquid crystal display with a touch sensor layer, which provides a GUI (Graphical User Interface) environment. The search function is implemented by searching a song index database (not shown) through interaction in this GUI environment and displaying the results. The selection of a duet song and the designation of a duet partner required by the present invention are also performed through this GUI.

FIG. 3 shows the display screen D2 for per-user login and logout instructions, which uses the LCD user interface of the remote control device R. The display screen D2 shows a row 30 of per-user icons with face photos, a "Login" icon 31 for logging in to the karaoke system, a "Logout" icon 32 for logging out, and so on. With the remote control device R mounted on the dedicated cradle described above, the user first selects the "Login" icon 31 and then places his or her ID card in the ID card reading/writing device. The user ID linked to the card is identified and, at the same time, the login to the system is recognized.

When a login is recognized, the user's face photo registered in advance on the karaoke host apparatus is used or, if none is registered, a face photo taken with the digital camera function is newly registered, and a per-user interface is created in which icons with face photos are added one by one for each user. Each icon 30 with a face photo that appears in the icon row 30 is regarded as representing a currently logged-in user. For example, the song information display means 19 in FIG. 1 can display information about duet songs to the currently logged-in users so that a song can be selected. When a user designates a duet partner for a predetermined duet song, the partner designation means 18 can identify the partner through the selection of the icon 30a with the desired person's photo, as described below.

Next, the interface display screen D2 for designating a duet partner, shown in FIG. 4, is described. After selecting the icon 30a with the user's face photo in FIG. 3, the user can reach this display screen D2 through a predetermined operation. The display screen D2 provides various icons for linking the user ID to the duet song the user has selected and for designating the desired duet partner. When the icon 30a with a face photo in FIG. 3 is selected, the user field 33 shows, for example, the face photo 33a identifying the user along with the user's pre-registered name, nickname, and so on.

As shown in FIG. 4, the user identified in this way first selects the desired duet song by entering its song code in the song code input area of the duet song designation field 34, and then designates the duet part (male part or female part) that the duet partner is to sing by making a selection in the duet part type field 34b. In this example, the duet part of the opposite sex to the user is set automatically as the default, based on the gender in the identified user's attribute data.
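The default described above is a simple mapping from the user's recorded gender to the part offered to the partner. As a sketch only: the string values `"male"` and `"female"` and the choice to return `None` when no gender is on record are assumptions for illustration, not details given in the text.

```python
def default_partner_part(user_gender):
    """Return the duet part pre-selected for the partner: the opposite of the user's."""
    opposite = {"male": "female", "female": "male"}
    # No usable gender on record: return None so the screen can ask the user instead.
    return opposite.get(user_gender)


preselected = default_partner_part("male")
```

The user remains free to override the pre-selected value in the duet part type field.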

The duet partner is designated by input in the duet partner designation field 35. In this example, entering the predetermined ID number printed on the duet partner's ID card or the like displays the face photo 35a for that ID together with the name or nickname 35b. When the user, having settled on a duet partner by specifying the icon through the GUI, selects the "Designate" icon 36, the partner designation means 18 in FIG. 1 stores the designation in RAM 5 as duet partner data 5a, with the user ID and the song ID linked.
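The stored designation can be pictured as a small record keyed by the requesting user and the song. The sketch below is an in-memory stand-in under stated assumptions: the class names, the `partner_part` field, and the rule that a later designation replaces an earlier one are hypothetical, since the text says only that the user ID and song ID are linked and stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuetPartnerData:
    """One designation (hypothetical layout)."""
    user_id: str       # the user who will sing live
    song_id: str       # the duet song selected
    partner_id: str    # user ID of the designated partner
    partner_part: str  # part the synthesized partner sings


class PartnerStore:
    """In-memory stand-in for the duet partner data kept in RAM."""

    def __init__(self):
        self._by_key = {}

    def designate(self, user_id, song_id, partner_id, partner_part):
        entry = DuetPartnerData(user_id, song_id, partner_id, partner_part)
        # Assumption: a new designation for the same user and song replaces the old one.
        self._by_key[(user_id, song_id)] = entry
        return entry

    def lookup(self, user_id, song_id):
        """Return the designation, or None if the user chose no partner for this song."""
        return self._by_key.get((user_id, song_id))


store = PartnerStore()
store.designate("U001", "S1234", partner_id="U002", partner_part="female")
```

When the song comes up for playback, `store.lookup("U001", "S1234")` yields the partner whose phoneme data should drive the synthesis.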

Next, the user-specific phoneme data management table T2 is described. As shown in FIG. 5, the user-specific phoneme data management table T2 corresponds to the user-specific singing voice management table T1 and manages the following data: a common "Number" field f1 for identifying each record; a "User ID" field f2 for identifying the user; a "Song ID" field f3 for identifying the song the user sang; a "Singing voice data" field f4 holding the singing voice data itself or a reference identifying it; a "Phoneme data" field f5 holding the phoneme data extracted from that singing voice data or a reference identifying it; and, likewise, an "Expression data" field f6 holding the expression data extracted from that singing voice data or a reference identifying it.
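The six fields map naturally onto a record type. The following sketch shows one possible shape, assuming the data fields hold references (file names here) rather than the data itself; the Python names, the sample values, and the optional expression field are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class T2Record:
    """One row of table T2; comments give the field labels used in the text."""
    number: int                           # f1: record number
    user_id: str                          # f2: who sang
    song_id: str                          # f3: which song was sung
    singing_voice_ref: str                # f4: reference to the recording
    phoneme_ref: str                      # f5: reference to extracted phoneme data
    expression_ref: Optional[str] = None  # f6: reference to expression data, if extracted


def records_for_user(table, user_id):
    """Return every record whose phoneme data belongs to the given user."""
    return [r for r in table if r.user_id == user_id]


# Sample rows with made-up identifiers.
table = [
    T2Record(1, "U001", "S1000", "voice_0001.pcm", "phon_0001.dat", "expr_0001.dat"),
    T2Record(2, "U002", "S2000", "voice_0002.pcm", "phon_0002.dat"),
    T2Record(3, "U001", "S3000", "voice_0003.pcm", "phon_0003.dat"),
]
mine = records_for_user(table, "U001")
```

A user who has sung several songs has several rows, so the phoneme data available for a partner can grow with each recorded performance.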

The user-specific phoneme data extraction and management means is described in more detail below. Phoneme data is extracted from the singing voice data and generated for each user ID. Phoneme data is an element required by current speech synthesis: an individual's vocal sounds are divided into five kinds of speech segments (vowels, leading consonants, trailing consonants, consonant-to-vowel transitions, and vowel-to-consonant transitions) and stored as data. The present invention includes, for example, a singing voice processing device, whose specific configuration is shown in FIG. 6. The singing voice processing device 40 has a phoneme database 41 and an expression database 42. The phoneme database 41 stores, for each user ID, the speech segments obtained by analyzing the audio signal of the singing voice data with an analysis method such as SMS analysis or LPC analysis and cutting it into individual sounds.

The expression database 42 stores vibrato information and other elements that are important to singing expression. Expression data encodes, as frequency information or the like, the singing techniques that give a song its character, such as vibrato and intonation, so that those techniques can be reflected in the singing voice. Vibrato information in particular is an important factor in determining the individuality of the reproduced singing voice. By adjusting its parameters, the singer's attributes (gender, child, young adult, middle-aged or older, and so on) or the genre of the song (pop style, enka style, and so on) can be specified as desired. Various singing expressions can therefore be generated by giving the expression data suitable parameters.
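Vibrato is commonly modeled as a periodic modulation of pitch, which makes the role of the parameters concrete. The sketch below uses a plain sinusoid with a rate in hertz and a depth in cents. This model, the function name, and the preset values are assumptions for illustration; the text does not state how the expression data is parameterized.

```python
import math


def vibrato_pitch(base_hz, t, rate_hz=5.5, depth_cents=50.0):
    """Instantaneous pitch in Hz at time t (seconds) with sinusoidal vibrato.

    rate_hz and depth_cents are illustrative parameters, not values from the patent.
    """
    cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
    # 1200 cents equal one octave, so the offset scales the frequency by 2^(cents/1200).
    return base_hz * 2.0 ** (cents / 1200.0)


# Hypothetical presets keyed by (singer attribute, genre); the numbers are placeholders.
PRESETS = {
    ("female", "enka"): {"rate_hz": 5.0, "depth_cents": 80.0},
    ("female", "pop"): {"rate_hz": 6.0, "depth_cents": 30.0},
}

peak = vibrato_pitch(440.0, t=0.05, **PRESETS[("female", "enka")])
```

Choosing a different preset changes only the two parameters, which is the sense in which one stored expression model can yield different singing characters.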

Next, the speech segment selection unit 43 receives performance data (for example, MIDI information containing notes, lyrics, pitch bend, dynamics, and similar information) frame by frame as frame data, and selects and reads out from the phoneme database 41 the speech segment data corresponding to the lyrics data in each input frame. When a user ID is designated, the speech segment data belonging to that user ID is obtained.
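The selection step can be sketched as a lookup from the lyric carried by each frame into the designated user's segments. The frame layout, the dictionary structure, and the segment names below are all assumptions for illustration and are not taken from the patent.

```python
def select_segments(frame, phoneme_db, user_id):
    """Return the speech segments for the lyric syllable carried by one frame.

    Assumed (hypothetical) layouts:
      frame      -- dict with at least a "lyric" key, e.g. {"note": 60, "lyric": "ka"}
      phoneme_db -- dict: user ID -> syllable -> list of segment identifiers
    """
    user_segments = phoneme_db.get(user_id)
    if user_segments is None:
        raise LookupError(f"no phoneme data registered for user {user_id!r}")
    syllable = frame["lyric"]
    if syllable not in user_segments:
        raise LookupError(f"user {user_id!r} has no segments for {syllable!r}")
    return user_segments[syllable]


# Illustrative data only; the segment names are made up for this sketch.
phoneme_db = {
    "user42": {"ka": ["k", "k-a", "a"], "sa": ["s", "s-a", "a"]},
}
frames = [{"note": 60, "lyric": "ka"}, {"note": 62, "lyric": "sa"}]
selected = [select_segments(f, phoneme_db, "user42") for f in frames]
```

Raising an error for a missing user or syllable is a design choice of this sketch; a real system would need some fallback, which the description does not discuss.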

The expression template selection unit 44 reads from the expression database 42 the specific vibrato data corresponding to the user's designations (for example, female / enka) received from the setting unit 45. The setting unit 45 receives the performance data (MIDI) and setting data from the remote control device R or the like. Based on the setting data, the expression template selection unit 44 selects vibrato information and the speech segment selection unit 43 selects speech segments. When MIDI data is used as the performance data, the speech segment selection unit 43 reads from the phoneme database 41 the speech segments corresponding to lyrics information in which preset syllables are assigned to the MIDI-standard note data.
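Template selection of this kind amounts to a keyed lookup from the user's designations to stored vibrato parameters. The sketch below assumes a (gender, genre) key and a rate/depth pair per template; the keys, the numeric values, and the fallback template are invented for illustration and do not come from the patent.

```python
from typing import NamedTuple


class VibratoTemplate(NamedTuple):
    rate_hz: float       # vibrato cycles per second
    depth_cents: float   # peak pitch deviation in cents


# Hypothetical contents of the expression database; values are illustrative.
EXPRESSION_DB = {
    ("female", "enka"): VibratoTemplate(5.0, 80.0),
    ("female", "pops"): VibratoTemplate(5.5, 30.0),
    ("male", "enka"): VibratoTemplate(4.8, 90.0),
    ("male", "pops"): VibratoTemplate(5.5, 25.0),
}
DEFAULT_TEMPLATE = VibratoTemplate(5.5, 30.0)


def select_template(settings):
    """Pick the template matching the designations held in the settings dict."""
    key = (settings.get("gender"), settings.get("genre"))
    # Fall back to a neutral template when no designation matches.
    return EXPRESSION_DB.get(key, DEFAULT_TEMPLATE)
```

For example, `select_template({"gender": "female", "genre": "enka"})` returns the template stored under that pair, while an empty settings dict yields the default.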

The speech segments and vibrato information determined in this way are input to the control unit 46 in accordance with the performance sequence. The singing voice sound source 47, which is controlled by the control unit 46, holds information such as human-voice timbre data for generating an audio signal from MIDI data, and generates the audio signal of the singing sound under the control of the control unit 46. The audio processing unit 48 holds the various information needed to process the audio signal (reverb, chorus, variation, and so on) and applies that processing to the audio signal under the control of the control unit 46.

Because the singing voice processing device 40 is configured in this way, when performance data is input in accordance with the performance sequence, speech segments are read out one after another based on the lyrics information, and the desired synthesized singing voice CV, processed according to the selected vibrato information, is obtained. Since the settings of the setting unit 45 are easy to change, the key and tempo of the output synthesized singing voice CV can be changed freely. The output synthesized singing voice CV is then combined in the mixer 7 with the output of the performance sound source 6, yielding the synthesized singing voice CV together with the performance sound of the karaoke song.
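Because the voice is synthesized from note data rather than played back from a recording, a key or tempo change reduces to arithmetic on the notes before synthesis. The sketch below assumes notes are (MIDI note number, start, duration) tuples; the function names and this tuple layout are hypothetical. The note-to-frequency formula is the standard equal-temperament mapping with A4 (MIDI note 69) at 440 Hz.

```python
def midi_to_hz(note):
    """Equal-temperament frequency of a MIDI note number (69 -> 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def apply_key_and_tempo(notes, key_shift, tempo_ratio):
    """Transpose and retime a note list before synthesis.

    notes       -- list of (midi_note, start_seconds, duration_seconds)
    key_shift   -- semitones to transpose (positive raises the key)
    tempo_ratio -- playback speed factor (2.0 plays twice as fast)
    """
    if tempo_ratio <= 0:
        raise ValueError("tempo_ratio must be positive")
    return [
        (note + key_shift, start / tempo_ratio, duration / tempo_ratio)
        for note, start, duration in notes
    ]
```

Raising the key by two semitones and doubling the tempo turns a note `(60, 1.0, 0.5)` into `(62, 0.5, 0.25)`; the same segments are then synthesized at the new pitch and timing.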

When the singing voice processing device 40 operates in this way, the synthesized singing voice CV is reproduced together with the performance sound. To use the machine as an ordinary karaoke device, setting data that disables the singing voice processing device 40 is given to the setting unit 45; the synthesized singing voice CV is then not output, and karaoke performance proceeds with only the performance sound from the performance sound source 6.

The present invention makes effective use of this function. Based on an instruction from the remote control device R, data designating the male or female duet part is included in the setting data given to the setting unit 45. As a result, for a karaoke duet song, the female duet part is output as the synthesized singing voice CV when a man sings, and the male duet part is output as the synthesized singing voice CV when a woman sings.
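The part designation can be pictured as a filter over the performance frames: only the frames belonging to the part the live singer is not singing are passed to synthesis. The sketch assumes each frame carries a "part" tag of "male" or "female"; that tag and the function names are hypothetical.

```python
def counterpart_part(singer_part):
    """Return the duet part to synthesize, given the live singer's part."""
    counterparts = {"male": "female", "female": "male"}
    if singer_part not in counterparts:
        raise ValueError(f"unknown duet part: {singer_part!r}")
    return counterparts[singer_part]


def frames_for_synthesis(frames, singer_part):
    """Keep only the frames of the part the live singer is NOT singing."""
    target = counterpart_part(singer_part)
    return [frame for frame in frames if frame["part"] == target]
```

With a man singing, `frames_for_synthesis(frames, "male")` keeps only the frames tagged "female", so the synthesized voice covers exactly the partner's lines.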

Thus, according to the present invention, the duet partner's singing voice in a duet song is obtained as a synthesized singing voice, so it can be matched to the key and pitch of the person who actually sings, and the timbre of the synthesized singing voice can be chosen freely by selecting any expression template. Moreover, regardless of whether the duet partner is able to sing the duet song in question, a virtual duet of any duet song becomes possible with the duet partner a user desires, for example a close friend or acquaintance, using phoneme data extracted from that person's singing of an arbitrary song. This improves entertainment value.

Furthermore, because the present invention generates the synthesized singing voice from singing voice data, there is no need to record and store a huge amount of singing data for each duet song, and a wide variety of arrangements becomes possible through software processing. The invention thus provides effects of its own, such as markedly improving the functionality of the karaoke device.

Block diagram outlining the present invention.
Front perspective view of the remote control device.
Interface display screen for per-user login and logout instructions.
Interface display screen for designating a duet partner.
Conceptual diagram of the user-specific phoneme data management table.
Block diagram showing the configuration of the singing voice processing device.

Explanation of symbols

1 ...... Karaoke device
2 ...... Control means
3 ...... ID card reading/writing device (user ID acquisition means)
4 ...... Hard disk device
5 ...... RAM
5a ..... Designated partner data
17 ..... Singing voice recording means
18 ..... Partner designation means
21 ..... Score data
24 ..... Singer-specific phoneme data extraction management means
40 ..... Singing voice processing device
41 ..... Phoneme database
42 ..... Expression database
43 ..... Speech segment selection unit
44 ..... Expression template selection unit
45 ..... Setting unit
46 ..... Control unit
47 ..... Singing voice sound source
48 ..... Audio processing unit
M ...... Karaoke microphone
CV ..... Synthesized singing voice
T1 ..... User-specific singing voice management table
T2 ..... User-specific phoneme data management table

Claims

A duet part singing generation system for a karaoke device that holds duet songs, each comprising note data and lyrics data corresponding to a plurality of duet parts and played according to a performance sequence, the system synthesizing the voice of a duet part based on a user's singing voice acquired in advance, and comprising user ID acquisition means, user-specific phoneme data extraction management means, partner designation means, and duet part singing generation means, wherein:
(A) the user ID acquisition means acquires the user ID of a user logged in at any karaoke device and identifies that user;
(B) the user-specific phoneme data extraction management means extracts phoneme data from the singing voice captured by predetermined recording means when the identified user sings an arbitrary song, and manages the phoneme data in a user-specific phoneme data management table in association with that user's user ID;
(C) the partner designation means accepts, when any user selects a particular duet song, that user's designation of a duet partner; and
(D) the duet part singing generation means, upon receiving the designation of the duet partner, extracts the phoneme data from the user-specific phoneme data management table based on the duet partner's user ID, synthesizes a singing voice from the phoneme data and from score data for voice synthesis created from the note data and lyrics data of the duet song, and generates the duet part singing so that it can be output as audio.

The duet part singing generation system according to claim 1, wherein the user-specific phoneme data extraction management means extracts expression data in addition to the phoneme data from the singing voice captured by the predetermined recording means and manages it in the user-specific phoneme data management table in association with the user's user ID, and the duet part singing generation means extracts the expression data in addition to the phoneme data from the user-specific phoneme data management table based on the duet partner's user ID, synthesizes a singing voice from the phoneme data, the expression data, and the score data for voice synthesis created from the note data and lyrics data of the duet song, and generates the duet part singing so that it can be output as audio.