JP6406273B2

JP6406273B2 - Karaoke device and program

Info

Publication number: JP6406273B2
Application number: JP2016017177A
Authority: JP
Inventors: 佳紀原; 典昭阿瀬見
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2016-02-01
Filing date: 2016-02-01
Publication date: 2018-10-17
Anticipated expiration: 2036-02-01
Also published as: JP2017138359A

Description

The present invention relates to a karaoke apparatus that generates and outputs virtual vocals, and to a program executed by a computer included in the karaoke apparatus.

As described in Patent Document 1, a karaoke apparatus is known that displays the lyrics of a song in time with the performance of the song. In this karaoke apparatus, when a karaoke song whose original version is sung by a group of several singers is sung by at least as many users as there are members in the group, the singing voice of each user is compared with the singing voice of each singer with respect to at least one of voice quality and singing characteristics. A different singing part is then assigned to each user, in descending order of similarity to the singers' singing parts, and a single song having a plurality of singing parts is played so that the users sing the group's song together.

Patent Document 1: Japanese Patent Laying-Open No. 2015-45671

However, when a song composed of a plurality of parts sung by a group is sung by fewer users than the number of parts in the song, the technique described in Patent Document 1 produces fewer voices than the group's original performance, so the original song is reproduced less faithfully.

Accordingly, an object of the present invention is to provide a karaoke apparatus that, when the karaoke accompaniment of a song having a plurality of singing parts is played, determines which part the user is singing and outputs the performance sound together with the singing voice of the user who sings one of the singing parts, thereby enabling a performance close to the original song.

One aspect of the present invention, made to achieve the above object, relates to a karaoke apparatus comprising reproducing means, acquiring means, generating means, calculating means, specifying means, and output means.
The reproducing means reproduces the accompaniment of a designated song, which is a song that has been designated and that has a plurality of singing parts differing from one another in at least some note attributes. The acquiring means acquires, as singing data, one or more voices input via a microphone while the reproducing means is reproducing the designated song.

Further, the generating means generates an integrated vocal for each combination of singing parts of the designated song. Each integrated vocal is a mix of the singing data and a number of virtual vocal data items equal to or greater than the difference between the number of singing parts in the designated song and the number of voices in the singing data, with each item associated with a singing part of the designated song. The virtual vocal data are singing voices that reproduce each of the singing parts of the designated song in the singing style of the original singers.

For each integrated vocal generated by the generating means, the calculating means compares the integrated vocal against a comparison vocal, which is the singing voice of the designated song with each of the singing parts sung by the original singers, and calculates its similarity to the comparison vocal. The specifying means identifies the integrated vocal with the highest similarity among those calculated by the calculating means. The output means outputs the virtual vocal data contained in the integrated vocal identified by the specifying means, together with the reproduction of the accompaniment of the designated song.
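
The calculating and specifying steps can be pictured with a short sketch. This is only an illustration under stated assumptions: the passage does not say how similarity is computed, so cosine similarity over raw samples is used purely as a stand-in, and the names `cosine_similarity` and `pick_best_vocal` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in similarity measure (an assumption, not taken from the source)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_best_vocal(
    integrated_vocals: Dict[str, Sequence[float]],
    comparison_vocal: Sequence[float],
) -> Tuple[str, float]:
    """Return the key and score of the candidate most similar to the comparison vocal."""
    scores = {
        key: cosine_similarity(wave, comparison_vocal)
        for key, wave in integrated_vocals.items()
    }
    best_key = max(scores, key=scores.get)
    return best_key, scores[best_key]


# Toy waveforms: each candidate assumes the user sang a different part.
comparison = [1.0, 0.5, -0.5, -1.0]
candidates = {
    "user_on_part_A": [0.9, 0.6, -0.4, -1.1],
    "user_on_part_B": [-1.0, 0.2, 0.8, 0.1],
}
best_key, best_score = pick_best_vocal(candidates, comparison)
print(best_key, round(best_score, 3))
```

Whatever similarity measure is actually used, the selection itself reduces to taking the maximum over the candidate integrated vocals.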

With such a karaoke apparatus, the integrated vocal most similar to the comparison vocal can be identified. This integrated vocal is the singing voice of the entire designated song, in which the singing parts other than the part sung by the user are supplemented with virtual vocal data.

The karaoke apparatus can then output the virtual vocal data contained in the identified integrated vocal. By outputting the user's singing voice in addition to the virtual vocal data, the karaoke apparatus can therefore output a performance close to the full vocals of the original song.

In other words, when the karaoke apparatus plays a song having a plurality of singing parts, it can output a performance as close as possible to the original song, matched to the singing of a user who sings any one of the singing parts.

The acquiring means of the karaoke apparatus may acquire each of a plurality of voices input via two or more microphones as singing data. In this case, the generating means may assign the plurality of singing data acquired by the acquiring means to a number of singing parts no greater than the number of singing data and, for every combination of singing parts for the plurality of singing data, generate an integrated vocal that mixes the singing data with a number of virtual vocal data items equal to or greater than the difference between the number of singing parts in the designated song and the number of assigned singing parts.

With such a karaoke apparatus, each voice input via two or more microphones can be acquired as singing data, and a single integrated vocal can be generated.
That is, even when a plurality of users sing, the karaoke apparatus can identify the integrated vocal with the highest similarity and output appropriate virtual vocal data.

The generating means of the karaoke apparatus may generate integrated vocals for combinations in which the plurality of singing data are assigned to mutually different singing parts.
With such a karaoke apparatus, the integrated vocal with the highest similarity can be identified even when each of a plurality of users sings a separate singing part.

The generating means of the karaoke apparatus may also generate integrated vocals for combinations that include the case where at least two of the plurality of singing data are assigned to the same singing part.

With such a karaoke apparatus, the integrated vocal with the highest similarity can be identified even when a plurality of users sing the same singing part.
Further, the generating means of the karaoke apparatus may generate the integrated vocal after correcting the power of the singing data so that it falls within a range defined with reference to the power of at least one singing part in the virtual vocal data.

With such a karaoke apparatus, it is less likely that the power of either the singing data or the virtual vocal data becomes extremely large relative to the other, and therefore less likely that one is drowned out by the other.

As a result, the karaoke apparatus can improve the accuracy of the similarity calculation.
The output means of the karaoke apparatus may include adjusting means and may output the virtual vocal data or the singing data as adjusted by the adjusting means. The adjusting means preferably adjusts at least one of the sound level of the virtual vocal data and the sound level of the singing data so that it falls within a relatively defined range.

With this karaoke apparatus, the volume balance of the output vocals as a whole can be adjusted.
As a result, the karaoke apparatus can output vocals that are closer to the overall vocals of the original recording.

One aspect of the present invention may also be a program executed by a computer.
This program causes the computer to execute a reproduction procedure of reproducing the accompaniment of the designated song, an acquisition procedure of acquiring singing data, a generation procedure of generating integrated vocals, a calculation procedure of calculating similarities, a specifying procedure of identifying the integrated vocal with the highest similarity, and an output procedure of outputting the virtual vocal data contained in that integrated vocal together with the reproduction of the accompaniment of the designated song.

When the present invention is implemented as a program, it can be used by loading it into a computer from a recording medium and starting it as needed, or by having the computer obtain it via a communication line and starting it as needed.

The recording medium referred to here includes computer-readable electronic media such as a DVD-ROM, a CD-ROM, and a hard disk.

A diagram showing the schematic configuration of the karaoke system.
A flowchart showing the procedure of the virtual vocal output process.
An explanatory diagram of the first method of power adjustment.
An explanatory diagram of the second method of power adjustment.
A diagram showing example combinations of virtual vocal data and singing data for a designated song, where (A) is an example for one user and (B) is an example for two users.
A flowchart showing the procedure of the optimum specifying process.
An explanatory diagram of the method of calculating similarity.

Embodiments of the present invention are described below with reference to the drawings.
<Karaoke system>
The karaoke system 1 shown in FIG. 1 includes an information processing server 10 and at least one karaoke apparatus 30. The karaoke system 1 plays a designated song, that is, a song designated by a user of the karaoke apparatus 30.

A song here is a piece of music in which lyrics are assigned to at least some of a plurality of notes arranged along a time axis, and which has a plurality of singing parts. A singing part is a voice part sung by one of the singers making up a music group. The singing parts differ from one another in at least some note attributes.

The note attributes that differ between singing parts are the melody of each singing part and the lyrics assigned to the notes making up that melody. However, the differing note attributes are not limited to these: they may be the melody of each singing part alone, the pitch of the notes to which lyrics are assigned alone, the content of the lyrics alone, or something else. In short, note attributes are the pitch and note value (duration) of a note, the lyrics assigned to it, and the like.

A music group is a group made up of a plurality of original singers who performed the song. The music group is, for example, a group that sings the song as part of its own repertoire. An original singer is a specific singer who is a member of the music group, namely a singer who is responsible for and sings one of the plurality of singing parts in the song.
<Information processing server>
The information processing server 10 includes a communication unit 12, a storage unit 14, and a control unit 16.

The communication unit 12 is connected to the karaoke apparatus 30 via a communication network and allows the information processing server 10 to communicate with external devices.
The storage unit 14 is a known storage device configured so that its contents can be read and written. The storage unit 14 stores at least one MIDI music MD, at least one virtual output data SD, and at least one comparison vocal CD, each of which is described in detail later.

The symbol "n" in FIG. 1 is an identifier for the MIDI music MD and is a natural number of 1 or more. The symbol "l" is an identifier for the virtual output data SD and is a natural number of 1 or more. The symbol "m" is an identifier for the comparison vocal CD and is a natural number of 1 or more.

The control unit 16 is a known control device built around a known microcomputer including a ROM 18, a RAM 20, and a CPU 22. The ROM 18 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 20 temporarily stores processing programs and data. The CPU 22 executes each process in accordance with the processing programs stored in the ROM 18 or the RAM 20.
<MIDI music>
The MIDI music MD is data prepared in advance for each song and includes score data, lyrics data, and song information.

The score data represents the score of the accompaniment melody of one song in accordance with the well-known MIDI (Musical Instrument Digital Interface) standard. For each note that is reproduced and output by a MIDI sound source, the score data specifies at least a pitch (the so-called note number) and a note length. The note length in the score data is defined by the note-on timing and the note-off timing of the note. When the score data is reproduced, the accompaniment is output.
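
As a small illustration of how a note length follows from note-on and note-off timing, consider the sketch below. The `Note` class, its field names, and the use of seconds as the time unit are assumptions made for the example; the source does not describe the data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    note_number: int  # MIDI pitch (note number), 0-127
    note_on: float    # note-on timing in seconds (unit is an assumption)
    note_off: float   # note-off timing in seconds

    @property
    def length(self) -> float:
        """Note length as the interval between note-on and note-off."""
        return self.note_off - self.note_on


notes = [Note(60, 0.0, 0.5), Note(64, 0.5, 1.5)]
lengths = [n.length for n in notes]
print(lengths)
```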

The lyrics data is data on the lyrics of the song. It includes lyrics text data representing the text of the lyrics and lyrics output data that associates the output timing of the lyrics text with the reproduction of the song based on the score data.

The song information is information about the song and includes a song ID and music group information.
The song ID is identification information that identifies the song. The music group information is information about the music group. It includes a group ID identifying the music group, the name of the music group, the names of the original singers, and information indicating the singing part each original singer is responsible for.
<Comparison vocal and virtual output data>
The comparison vocal CD is the singing voice of one entire song in which each of the singing parts is sung by the original singers making up the music group. A comparison vocal CD is generated in advance for each song.

One possible method of generating the comparison vocal CD is to identify and separate the vocal sound, by a known method, from the master recording waveform of a song in which instrumental accompaniment and vocals are mixed.

The virtual output data SD is data needed to generate virtual vocal data and is prepared in advance for each music group and for each original singer. The virtual vocal data are singing voices that render each of the singing parts of the designated song in the singing style of the original singer or in a style approximating it. Singing style here includes voice quality in addition to singing techniques such as vibrato, shakuri (scooping up into a note), and kobushi (ornamental pitch turns).

The virtual output data SD in this embodiment may be a singing voice in which the singing part of the song is sung by the original singer, a singing voice in which the singing part is sung by a singer whose style resembles that of the original singer, or data used for speech synthesis. The data used for speech synthesis may be speech segments used for waveform-concatenation synthesis or speech parameters used for formant synthesis.

When the virtual output data SD is data used for speech synthesis, it includes feature data. The feature data represents the characteristics of each original singer's singing style for each combination of a note's pitch and note value with the syllable of the lyrics assigned to that note, so that the original singer's singing style can be reproduced.
<Karaoke apparatus>
The karaoke apparatus 30 includes a communication unit 32, an input receiving unit 34, a music reproducing unit 36, a storage unit 38, an audio control unit 40, a video control unit 46, and a control unit 50.

The communication unit 32 allows the karaoke apparatus 30 to communicate with external devices via the communication network. The input receiving unit 34 is an input device that accepts information and commands in response to external operations. Examples of such input devices include keys, switches, and a remote-control receiver.

The music reproducing unit 36 reproduces the accompaniment of a song based on the MIDI music MD downloaded from the information processing server 10. The music reproducing unit 36 is, for example, a MIDI sound source. The audio control unit 40 is a device that controls audio input and output and includes an output unit 42 and a microphone input unit 44.

The microphone input unit 44 acquires audio input via at least one microphone 62. In this embodiment, two microphones 62 are described as being connected to the microphone input unit 44, but the number of microphones 62 is not limited to two and may be one, or three or more. The connection may be wired or wireless.

The output unit 42 outputs to the speaker 60 the sound source signal of the song reproduced by the music reproducing unit 36 and the sound source signal of the singing from the microphone input unit 44. The speaker 60 converts the sound source signal output from the output unit 42 into sound and outputs it. The video control unit 46 outputs video or images based on video data sent from the control unit 50. A display unit 64 that displays the video or images is connected to the video control unit 46.

The storage unit 38 is a known storage device configured so that its contents can be read and written.
The control unit 50 is built around a known computer having at least a ROM 52, a RAM 54, and a CPU 56. The ROM 52 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 54 temporarily stores processing programs and data. The CPU 56 executes each process in accordance with the processing programs stored in the ROM 52 or the RAM 54.

The ROM 52 of this embodiment stores a processing program that causes the control unit 50 to execute a virtual vocal output process, which reproduces the accompaniment of the designated song, displays the lyrics, and generates and outputs appropriate virtual vocal data.
<Virtual vocal output process>
When the virtual vocal output process shown in FIG. 2 is started, the control unit 50 first acquires the song ID of the song designated via the input receiving unit 34 (that is, the designated song) (S110). The control unit 50 then acquires the MIDI music MD containing the song ID acquired in S110 from the storage unit 14 of the information processing server 10 (S120).

The control unit 50 then reproduces the accompaniment of the designated song based on the MIDI music MD acquired in S120 (S130). In S130, the control unit 50 sequentially outputs the MIDI music MD to the music reproducing unit 36 along the time axis. On receiving the MIDI music MD, the music reproducing unit 36 reproduces the accompaniment of the designated song and outputs the sound source signal of the designated song to the speaker 60 via the output unit 42. The performance sound of the designated song is thus emitted from the speaker 60.

In S130, the control unit 50 also outputs the lyrics output data and the lyrics text data to the video control unit 46. The video control unit 46 then sequentially displays the lyrics on the display unit 64 in time with the reproduction of the accompaniment of the designated song.

Next, the control unit 50 acquires the number of users singing the designated song (hereinafter, singing users) (S140). In S140, the control unit 50 may, for example, take the number of microphones 62 that are powered on as the number of singing users, or take a number entered via the input receiving unit 34 as the number of singing users.

Further, in the virtual vocal output process, the control unit 50 acquires each voice input via each microphone 62 as singing data and stores it in the storage unit 38 (S150). Singing data is audio input via a microphone 62, that is, a singing voice singing at least one singing part of the designated song.

The control unit 50 then determines whether reproduction of a comparison target section, which is a predetermined section of the designated song, has finished (S160). If reproduction of the comparison target section has not finished (S160: NO), the control unit 50 returns the virtual vocal output process to S150. When reproduction of the comparison target section has finished (S160: YES), the control unit 50 advances the virtual vocal output process to S170.

In S170, the control unit 50 acquires virtual vocal data. Specifically, in S170 of this embodiment, when a singing voice of the song sung by the original singer, or by a singer whose style resembles that of the original singer, has been prepared as the virtual output data SD, the control unit 50 acquires the virtual output data SD corresponding to the designated song as the virtual vocal data. When data used for speech synthesis has been prepared as the virtual output data SD, the control unit 50 generates and acquires the virtual vocal data by speech synthesis in accordance with the MIDI music MD (score data and lyrics data) acquired in S120 and the feature data included in the virtual output data SD. In this case, the virtual vocal data is generated as synthesized speech: a singing voice that follows the score of each singing part and reflects the characteristics of the original singer's singing style.

In the virtual vocal output process, the control unit 50 then adjusts the power of the singing data (S180). In S180, the control unit 50 corrects the power of the singing data so that it falls within a range defined with reference to the power of at least one singing part in the virtual vocal data.

Power is an index based on the amplitude of the audio waveform, for example an index expressing volume or sound pressure. An index based on the amplitude of the audio waveform also represents the level of sound intensity or loudness.
The range defined with reference to that power is a range of power at which the singing data is not drowned out by the virtual vocal data. Specifically, the reference may be the average power of the plurality of virtual vocal data, as shown in FIG. 3, or the power of the virtual vocal data for the singing part corresponding to the singing data, as shown in FIG. 4.
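
The two reference choices for the power adjustment in S180 can be sketched as follows. This is a minimal illustration, not the apparatus's actual processing: root-mean-square amplitude is assumed as the power index, the tolerance factors `low` and `high` are hypothetical, and all function names are invented for the example.

```python
import math
from typing import Dict, List, Sequence


def rms_power(wave: Sequence[float]) -> float:
    """Root-mean-square amplitude, assumed here as the 'power' index."""
    if not wave:
        return 0.0
    return math.sqrt(sum(x * x for x in wave) / len(wave))


def reference_mean(virtual_vocals: Dict[str, Sequence[float]]) -> float:
    """First method: average power over all virtual vocal data."""
    powers = [rms_power(w) for w in virtual_vocals.values()]
    return sum(powers) / len(powers)


def reference_part(virtual_vocals: Dict[str, Sequence[float]], part: str) -> float:
    """Second method: power of the virtual vocal for the part matched to the singing data."""
    return rms_power(virtual_vocals[part])


def adjust_power(
    singing: Sequence[float], reference: float, low: float = 0.8, high: float = 1.25
) -> List[float]:
    """Scale the singing data so its power lies in [low * reference, high * reference].

    low and high are hypothetical tolerance factors, not values from the source.
    """
    power = rms_power(singing)
    if power == 0.0:
        return list(singing)  # silence cannot be rescaled
    target = min(max(power, low * reference), high * reference)
    gain = target / power
    return [x * gain for x in singing]


virtual = {"A": [0.5, -0.5, 0.5, -0.5], "B": [0.25, -0.25, 0.25, -0.25]}
singing = [0.1, -0.1, 0.1, -0.1]  # much quieter than either virtual vocal

adjusted_mean = adjust_power(singing, reference_mean(virtual))
adjusted_part = adjust_power(singing, reference_part(virtual, "A"))
print(rms_power(adjusted_mean), rms_power(adjusted_part))
```

Singing data that already lies inside the range is left unchanged, because the gain is then 1.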

Next, the control unit 50 generates one integrated vocal by mixing the virtual vocal data generated in S170 with the singing data adjusted in S180 (S190).
An integrated vocal is an audio waveform in which singing data is mixed with virtual vocal data. The number of virtual vocal data items mixed into an integrated vocal is equal to or greater than the difference between the number of singing parts in the designated song and the number of voices in the singing data.

In S190, when there is one singing user, the control unit 50 assigns one singing part to the singing data and mixes the singing data into the combination of virtual vocal data for the remaining singing parts. In other words, the control unit 50 mixes the singing user's singing data into one of the combinations of virtual vocal data whose count is the number of singing parts in the designated music piece minus 1. One total vocal is thereby generated.

When there are a plurality of singing users, the control unit 50 may generate the total vocal on the assumption that the singing parts corresponding to the respective singing data are separate singing parts, or on the assumption that the singing parts corresponding to at least two of the plurality of singing data are the same singing part.

That is, when there are a plurality of singing users, the control unit 50 mixes the plurality of singing data into one of the combinations of virtual vocal data whose count is the number of singing parts in the designated music piece minus "a number equal to or less than the number of singing users".
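The mixing in S190 can be sketched as below. This is a sketch under stated assumptions: waveforms are equal-length lists of samples, mixing is plain sample-wise addition, and the helper names `mix` and `total_vocal` and the part labels are hypothetical. The patent does not specify the mixing operation itself.

```python
def mix(waveforms):
    """Sum equal-length waveforms sample by sample."""
    return [sum(samples) for samples in zip(*waveforms)]

def total_vocal(singing_data, virtual_by_part, sung_parts):
    """Mix the singing data with the virtual vocals of every part nobody sings.

    singing_data:    list of user waveforms (one per singing user)
    virtual_by_part: dict mapping part name -> virtual vocal waveform
    sung_parts:      set of part names assumed to be covered by the users
    """
    remaining = [w for part, w in virtual_by_part.items() if part not in sung_parts]
    return mix(list(singing_data) + remaining)
```

Because `sung_parts` is a set, two users assumed to sing the same part remove only one virtual vocal, which matches the "number of singing parts minus a number equal to or less than the number of singing users" wording.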

Further, the control unit 50 determines whether total vocals have been generated for all combinations of virtual vocal data and singing data (S200). If the determination in S200 shows that total vocals have not been generated for all combinations (S200: NO), the control unit 50 returns the virtual vocal output process to S190. In S190, it then generates a total vocal realizing one of the combinations not yet generated.

If total vocals have been generated for all combinations (S200: YES), the control unit 50 advances the virtual vocal output process to S210.
Here, "all combinations" means, for example, the three combinations shown in FIG. 5(A) when there is one singing user for a music piece with three types of singing part: in each, the singing voice for one singing part is the singing data and the singing voices for the remaining singing parts are based on the virtual output data SD. When there are two singing users for a music piece with three types of singing part, "all combinations" means a total of six combinations, as shown in FIG. 5(B). These six consist of three combinations in which the singing voices for two singing parts are each singing data and the singing voice for the remaining singing part is based on the virtual output data SD, and three combinations in which the singing voice for one singing part is both singing data and the singing voices for the remaining singing parts are based on the virtual output data SD.
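The counts above (three combinations for one user, six for two users on a three-part piece) are what one gets by treating the users as interchangeable and allowing several users on one part, i.e. combinations with repetition. The following sketch reproduces those counts; the function name `part_assignments` and the part labels are hypothetical, and treating the users as interchangeable is an assumption inferred from the counts in the text.

```python
from itertools import combinations_with_replacement

def part_assignments(parts, num_users):
    """All unordered ways to assign `num_users` singers to singing parts.

    Several users may share one part, so this is a combination with repetition.
    """
    return list(combinations_with_replacement(parts, num_users))

parts = ["part1", "part2", "part3"]
one_user = part_assignments(parts, 1)   # each user-sung part in turn
two_users = part_assignments(parts, 2)  # distinct pairs plus shared parts
```

For two users, the assignments with two different parts correspond to the first group of three in FIG. 5(B), and those where both users share a part correspond to the second group of three.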

In S210, the control unit 50 executes the optimum specifying process, which specifies the most appropriate total vocal. The optimum specifying process is described in detail later.
Subsequently, the control unit 50 adjusts the sound level of the virtual vocal data constituting the most appropriate total vocal (S220). In S220, the adjustment is made so that the sound level of one virtual vocal data falls within a range defined relative to the sound level of the other virtual vocal data. This relatively defined range is a range defined in advance as lying within the differences in sound level between the singing voices of the respective original singers in the designated music piece.
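One possible reading of the S220 adjustment is sketched below. It is only an illustration under several assumptions that the text does not state: levels are expressed in decibels, the relative range is a single maximum allowed difference `max_diff_db`, and quiet parts are raised toward the loudest one. The function name `adjust_levels` is hypothetical.

```python
def adjust_levels(levels_db, max_diff_db):
    """Raise quiet parts so no part is more than `max_diff_db` below the loudest.

    levels_db:   dict mapping part name -> sound level in dB (assumed unit)
    max_diff_db: hypothetical allowed level difference between parts
    """
    loudest = max(levels_db.values())
    floor = loudest - max_diff_db
    return {part: max(level, floor) for part, level in levels_db.items()}
```

After this adjustment any two parts differ by at most `max_diff_db`; an implementation could equally lower loud parts or centre the range on a reference part.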

The sound level is an index that represents the intensity or loudness of a sound.
The control unit 50 then outputs the virtual vocal data whose sound level was adjusted in S220 to the output unit 42 (S230). The output unit 42 outputs the virtual vocal represented by the virtual vocal data from the speaker 60.

Further, in S230, the control unit 50 outputs the singing data to the output unit 42. The output unit 42 outputs the singing data, that is, the singing voice, from the speaker 60.
The control unit 50 then ends the virtual vocal output process.
<Optimum specifying process>
In the optimum specifying process started in S210 of the virtual vocal output process, as shown in FIG. 6, the control unit 50 first acquires the comparison vocal CD of the designated music piece from the storage unit 14 of the information processing server 10 (S310). Next, the control unit 50 calculates a comparison spectrum, which is the spectrogram of the comparison vocal CD acquired in S310 (S320). The control unit 50 then smooths the comparison spectrum calculated in S320 along the frequency axis for each unit time (S330).

Next in the optimum specifying process, the control unit 50 acquires one of the total vocals generated in S190 (S340). The control unit 50 then calculates a vocal spectrum, which is the spectrogram of the total vocal acquired in S340 (S350), and smooths the vocal spectrum calculated in S350 along the frequency axis for each unit time (S360).
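The smoothing applied in S330 and S360 can be sketched as follows. The text does not say which smoothing method is used, so this example assumes a simple moving average over neighbouring frequency bins; the spectrogram is assumed to be a list of time frames, each a list of magnitudes per frequency bin. The names `smooth_frame` and `smooth_spectrogram` and the `width` parameter are hypothetical.

```python
def smooth_frame(frame, width=3):
    """Moving average over neighbouring frequency bins of one time frame."""
    half = width // 2
    smoothed = []
    for i in range(len(frame)):
        # The window shrinks at the edges instead of padding with zeros.
        window = frame[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

def smooth_spectrogram(spectrogram, width=3):
    """Smooth each time frame (row) independently along the frequency axis."""
    return [smooth_frame(frame, width) for frame in spectrogram]
```

Each frame is processed on its own, which corresponds to smoothing "along the frequency axis for each unit time"; no averaging happens across time.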

Further, in the optimum specifying process, the control unit 50 collates the smoothed vocal spectrum with the smoothed comparison spectrum and calculates a similarity (S370). The similarity is an index representing the correlation between the smoothed vocal spectrum and the smoothed comparison spectrum, and may be calculated in S370 by obtaining that correlation.
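A correlation-based similarity for S370 might look like the sketch below. The text only says that a correlation is obtained; the choice of the Pearson correlation coefficient over all time-frequency bins, and the function name `similarity`, are assumptions made for this example.

```python
import math

def similarity(vocal_spec, comparison_spec):
    """Pearson correlation between two equally shaped spectrograms."""
    x = [v for frame in vocal_spec for v in frame]
    y = [v for frame in comparison_spec for v in frame]
    if len(x) != len(y) or not x:
        raise ValueError("spectrograms must be non-empty and the same shape")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # correlation is undefined for a constant spectrum
    return cov / math.sqrt(var_x * var_y)
```

The result lies between -1 and 1 and is insensitive to an overall gain difference between the two spectrograms, which suits a comparison between a mixed total vocal and a reference recording.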

Subsequently, the control unit 50 determines whether the similarity to the comparison vocal CD has been obtained for all total vocals (S380). If the determination in S380 shows that the similarity has not been obtained for all total vocals (S380: NO), the control unit 50 returns the optimum specifying process to S340. In S340, the control unit 50 acquires one total vocal from among those whose similarity has not yet been calculated and proceeds to S350. In this way, as shown in FIG. 7, the vocal spectra of all total vocals are collated with the comparison spectrum, and a similarity is obtained for every total vocal.

If, on the other hand, the similarity has been calculated for all total vocals (S380: YES), the control unit 50 advances the optimum specifying process to S390.
In S390, the total vocal with the highest of all the calculated similarities is specified as the most appropriate total vocal.
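The loop over S340 to S380 and the selection in S390 amount to an arg-max over the candidate total vocals. A minimal sketch follows; the function name `most_appropriate` and the pluggable `score` callback are hypothetical, and ties are resolved in favour of the first candidate, which the text does not specify.

```python
def most_appropriate(total_vocals, comparison, score):
    """Return (index, similarity) of the total vocal closest to `comparison`.

    `score` is any similarity function taking (total_vocal, comparison);
    higher means more similar.
    """
    best_index, best_score = None, float("-inf")
    for index, candidate in enumerate(total_vocals):  # S340 to S380 loop
        value = score(candidate, comparison)          # S350 to S370
        if value > best_score:
            best_index, best_score = index, value
    return best_index, best_score                     # S390
```

In the described process, `score` would compute the spectrogram, smooth it, and correlate it with the smoothed comparison spectrum.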

The control unit 50 then ends the optimum specifying process and returns to S220 of the virtual vocal output process.
[Effects of the embodiment]
(1) As described above, the virtual vocal output process can specify the total vocal most similar to the comparison vocal CD. This total vocal is the singing voice of the entire designated music piece, in which the singing voices of the singing parts other than those sung by the users are supplied by virtual vocal data.

The virtual vocal output process can then output the virtual vocal data included in the specified total vocal. Moreover, because the virtual vocal output process outputs the user's singing voice together with the singing voice represented by the virtual vocal data, the karaoke apparatus 30 can output a performance sound close to the overall singing voice of the original song.

In other words, when the karaoke apparatus 30 plays a music piece having a plurality of singing parts, it can output a performance sound as close as possible to the original song, matched to the singing of a user who sings any one of the singing parts.

(2) In the virtual vocal output process, the total vocal is generated after the power of the singing data has been corrected to fall within the defined range.
The virtual vocal output process therefore makes it less likely that the power of either the singing data or the virtual vocal data becomes extremely large compared with the other, and less likely that one is buried in the other. The accuracy of the similarity calculation can thus be improved.

(3) In the optimum specifying process, the similarity is calculated after both the comparison spectrum and the vocal spectrum have been smoothed along the frequency axis for each unit time. The similarity obtained by the optimum specifying process is therefore less affected by fine variations in the comparison spectrum and the vocal spectrum.

As a result, the virtual vocal output process can specify a more suitable total vocal as the most appropriate total vocal.
(4) In the virtual vocal output process, when there are a plurality of singing users, the control unit 50 mixes the plurality of singing data into each of the combinations of virtual vocal data whose count is the number of singing parts in the designated music piece minus "a number equal to or less than the number of singing users".

That is, even when a plurality of users sing, the virtual vocal output process can specify the total vocal with the highest similarity and output appropriate virtual vocal data.

Furthermore, the virtual vocal output process can specify the total vocal with the highest similarity both when each of a plurality of users sings a separate singing part and when a plurality of users sing the same singing part.

(5) In the virtual vocal output process, the sound level of the virtual vocal data is corrected so that the sound level of one virtual vocal data falls within a range defined relative to the sound level of the other virtual vocal data.

The virtual vocal output process can therefore adjust the volume balance of all the vocals output from the speaker 60 so that it approaches that of the overall singing voice on the master recording. A singing voice close to the volume balance of the overall singing voice on the master recording can thus be output.
[Other embodiments]
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various forms without departing from the gist of the present invention.

(1) For example, in the virtual vocal output process of the above embodiment, the sound level adjusted in S220 is that of the virtual vocal data, but the sound level of the singing data may be adjusted instead.

(2) In the above embodiment the optimum specifying process is executed by the karaoke apparatus 30, but it may be executed by the information processing server 10.
(3) A form in which part of the configuration of the above embodiment is omitted is also an embodiment of the present invention. A form configured by appropriately combining the above embodiment with a modification is also an embodiment of the present invention. Any form conceivable without departing from the essence of the invention specified by the wording of the claims is also an embodiment of the present invention.

(4) Besides the karaoke apparatus described above, the present invention can be realized in various forms, such as a program executed by a computer included in a karaoke apparatus in order to generate a virtual vocal by speech synthesis and output it, and a generation method for generating a virtual vocal by speech synthesis and outputting it.
[Example of correspondence]
The function obtained by executing S130 in the virtual vocal output process corresponds to an example of the reproduction means. The function obtained by executing S150 corresponds to an example of the acquisition means. The function obtained by executing S170 to S190 corresponds to an example of the generation means. The function obtained by executing S230 corresponds to an example of the output means.

Further, the function obtained by executing S310 to S380 of the optimum specifying process corresponds to an example of the calculation means. The function obtained by executing S390 of the optimum specifying process corresponds to an example of the specifying means.

The function obtained by executing S220 in the virtual vocal output process corresponds to an example of the adjustment means.

DESCRIPTION OF SYMBOLS: 1: karaoke system; 10: information processing server; 12, 32: communication unit; 14, 38: storage unit; 16, 50: control unit; 18, 52: ROM; 20, 54: RAM; 22, 56: CPU; 30: karaoke apparatus; 34: input reception unit; 36: music reproduction unit; 40: audio control unit; 42: output unit; 44: microphone input unit; 46: video control unit; 60: speaker; 62: microphone; 64: display unit

Claims

A karaoke apparatus comprising:
reproduction means for reproducing the accompaniment sound of a designated music piece, which is a music piece having a plurality of singing parts that differ in at least some note attributes;
acquisition means for acquiring, as singing data, one or more voices input via a microphone while the reproduction means reproduces the designated music piece;
generation means for generating, for each combination of singing parts of the designated music piece, a total vocal in which the singing data and virtual vocal data are mixed in correspondence with the singing parts of the designated music piece, the virtual vocal data being singing voices that each reproduce the singing voice of one of the plurality of singing parts in the original singer's singing style, and the number of virtual vocal data mixed being equal to or greater than the difference between the number of singing parts in the designated music piece and the number of voices in the singing data;
calculation means for collating each total vocal generated by the generation means with a comparison vocal in which each of the plurality of singing parts is sung by the original singer, and calculating the similarity between the total vocal and the comparison vocal;
specifying means for specifying the total vocal with the highest similarity among the similarities calculated by the calculation means; and
output means for outputting the virtual vocal data included in the total vocal specified by the specifying means together with reproduction of the accompaniment sound of the designated music piece.

The karaoke apparatus according to claim 1, wherein
the acquisition means acquires, as the singing data, each of a plurality of voices input via two or more microphones, and
the generation means assigns the plurality of singing data acquired by the acquisition means to singing parts whose number is equal to or less than the number of the singing data and,
for every combination of singing parts assigned to the plurality of singing data, generates the total vocal by mixing the singing data with virtual vocal data whose number is equal to or greater than the difference between the number of singing parts in the designated music piece and the number of assigned singing parts.

The karaoke apparatus according to claim 2, wherein
the generation means generates the total vocal for combinations in which the singing parts of the plurality of singing data are mutually different singing parts.

The karaoke apparatus according to claim 2, wherein
the generation means generates the total vocal for combinations including the case where the singing parts of at least two of the plurality of singing data are the same singing part.

The karaoke apparatus according to any one of claims 1 to 4, wherein
the generation means generates the total vocal after correcting the power of the singing data so that the power falls within a range defined with reference to the power of at least one singing part in the virtual vocal data.

The karaoke apparatus according to any one of claims 1 to 5, wherein
the output means includes adjustment means for adjusting at least one of the sound level of the virtual vocal data and the sound level of the singing data to fall within a relatively defined range, and
outputs the virtual vocal data or the singing data as adjusted by the adjustment means.

A program causing a computer to execute:
a reproduction procedure of reproducing the accompaniment sound of a designated music piece, which is a music piece having a plurality of singing parts that differ in at least some note attributes;
an acquisition procedure of acquiring, as singing data, one or more voices input via a microphone while the designated music piece is reproduced in the reproduction procedure;
a generation procedure of generating, for each combination of singing parts of the designated music piece, a total vocal in which the singing data and virtual vocal data are mixed in correspondence with the singing parts of the designated music piece, the virtual vocal data being singing voices that each reproduce the singing voice of one of the plurality of singing parts in the original singer's singing style, and the number of virtual vocal data mixed being equal to or greater than the difference between the number of singing parts in the designated music piece and the number of voices in the singing data;
a calculation procedure of collating each total vocal generated in the generation procedure with a comparison vocal in which each of the plurality of singing parts is sung by the original singer, and calculating the similarity between the total vocal and the comparison vocal;
a specifying procedure of specifying the total vocal with the highest similarity among the similarities calculated in the calculation procedure; and
an output procedure of outputting the virtual vocal data included in the total vocal specified in the specifying procedure together with reproduction of the accompaniment sound of the designated music piece.